Best Ways to Safely Delete 10 Million Records in MySQL Without Crashing Your Server


Deleting 10 million rows is doable, but you need to be careful: big deletes can lock tables, bloat InnoDB undo/redo/ibdata files, block replication, and take the server down if done naively. Below are the best approaches (fast → safe), when to use each, concrete commands/scripts you can run, and a checklist of gotchas.


Top 3 strategies (pick one)

1) DROP a partition (fastest, safest if data is partitioned)

If the rows to remove map cleanly to one or more partitions (e.g., by date or ID range), just ALTER TABLE ... DROP PARTITION. This instantly removes the data with minimal redo and no row-by-row delete.

When to use: your table is partitioned by the column used in the delete condition (e.g., DATE or id_range).

Example:

-- assume table partitioned by RANGE (YEAR(created_at)) or by id ranges
ALTER TABLE mytable
  DROP PARTITION p2022, p2023;

Benefit: near-instant, minimal server load. Afterward run ANALYZE/OPTIMIZE if needed.
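
For reference, a layout that makes such drops possible might look like this (a sketch; the columns and ranges are illustrative, not from your schema — note that MySQL requires the partitioning column to appear in every unique key, including the primary key):

-- illustrative only: partition by year so old years can be dropped wholesale
CREATE TABLE mytable (
  id BIGINT NOT NULL,
  created_at DATETIME NOT NULL,
  payload TEXT,
  PRIMARY KEY (id, created_at)            -- must include the partition column
)
PARTITION BY RANGE (YEAR(created_at)) (
  PARTITION p2022 VALUES LESS THAN (2023),
  PARTITION p2023 VALUES LESS THAN (2024),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);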


2) Create new table with only kept rows, swap (fast, minimal logging)

When you can identify the rows to keep rather than the rows to delete, copy the kept rows to a new table and swap the tables with RENAME. This avoids row-by-row deleting and is usually faster than a massive DELETE.

When to use: you’re deleting a large fraction of rows and can afford a maintenance window and enough disk space to hold a copy.

Steps:

-- 1) create new table like old one (indexes too)
CREATE TABLE mytable_new LIKE mytable;

-- 2) copy the rows you want to keep (use chunking if needed)
INSERT INTO mytable_new
SELECT * FROM mytable WHERE <KEEP_CONDITION>;   -- e.g. created_at >= '2024-01-01'

-- 3) swap names atomically (RENAME TABLE is atomic; note it implicitly commits)
RENAME TABLE mytable TO mytable_old, mytable_new TO mytable;

-- 4) drop old
DROP TABLE mytable_old;
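
If the keep-set is large, the single INSERT ... SELECT in step 2 becomes one huge transaction. A chunked variant (a sketch, assuming an integer PK id) keeps each copy transaction small:

-- copy in fixed-size id windows; advance the window and repeat until past MAX(id)
INSERT INTO mytable_new
SELECT * FROM mytable
WHERE <KEEP_CONDITION>
  AND id >= 0 AND id < 100000;   -- next pass: id >= 100000 AND id < 200000, ...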

Notes: copying will generate binlog/redo activity and I/O. Keep constraints and foreign keys in mind: if there are FK dependencies, handle them (disable/recreate them, or use a pt-online-schema-change-style approach).
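
Before the swap, you can check whether any other table references mytable via information_schema (a sketch; adjust the schema name to yours):

-- list foreign keys in other tables that point at mytable
SELECT TABLE_NAME, CONSTRAINT_NAME, COLUMN_NAME
FROM information_schema.KEY_COLUMN_USAGE
WHERE REFERENCED_TABLE_SCHEMA = 'mydb'
  AND REFERENCED_TABLE_NAME = 'mytable';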


3) Chunked DELETE loop (safe, simplest if partitioning/copy not possible)

Delete rows in small batches (e.g., 1k–100k rows per statement) and commit between batches. This keeps transactions small, avoids long locks, and keeps InnoDB undo space under control.

When to use: you can’t partition and can’t copy the whole table.

Recommended pattern (by PK range or LIMIT with ORDER BY):

Option A — delete by primary key range (recommended)

-- find smallest and largest id to delete then loop in app/script
DELETE FROM mytable
WHERE id >= 1000000 AND id < 1010000;
-- repeat for next range
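
The range-discovery step from the comment above might look like this (a sketch; <condition> stands for your delete predicate, as elsewhere in this post):

-- find the id window to walk
SELECT MIN(id) AS lo, MAX(id) AS hi
FROM mytable
WHERE <condition>;

-- then, per batch, your script issues:
--   DELETE FROM mytable WHERE <condition> AND id >= ? AND id < ?;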

Option B — delete using LIMIT (works well when <condition> and the ORDER BY can use an index; otherwise each batch may rescan the table)

-- repeat until no rows deleted
DELETE FROM mytable
WHERE <condition>
ORDER BY id
LIMIT 10000;

Example shell loop (bash + mysql client):

# repeat until a batch deletes zero rows (assumes credentials come from ~/.my.cnf or similar)
while true; do
  affected=$(mysql -N -B -e "DELETE FROM mydb.mytable WHERE <condition> ORDER BY id LIMIT 10000; SELECT ROW_COUNT();" | tail -n1)
  if [ "$affected" -eq 0 ]; then break; fi   # last batch was empty: done
  sleep 0.5                                  # throttle between batches
done

Or as a stored procedure (MySQL only allows WHILE inside a stored program, so the loop must be wrapped in CREATE PROCEDURE; the procedure name here is illustrative):

DELIMITER $$
CREATE PROCEDURE purge_rows()
BEGIN
  DECLARE rows_deleted INT DEFAULT 1;
  WHILE rows_deleted > 0 DO
    DELETE FROM mytable WHERE <condition> ORDER BY id LIMIT 10000;
    SET rows_deleted = ROW_COUNT();
    DO SLEEP(0.2); -- optional throttle
  END WHILE;
END$$
DELIMITER ;

CALL purge_rows();

Tuning:

  • Batch size: start with 5k–50k depending on row size and server. If deletes are causing IO spikes, lower the batch size.
  • Sleep between batches (0.1s–1s) to reduce load.
  • Use primary-key order to avoid expensive scans.
  • Commit after each DELETE (each statement auto-commits unless you're inside an explicit transaction; see the sketch below).
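
If your session has autocommit disabled, wrap each batch explicitly so undo stays small (a sketch):

SELECT @@autocommit;   -- 1 means each DELETE commits on its own

-- with autocommit disabled, commit each batch yourself:
START TRANSACTION;
DELETE FROM mytable WHERE <condition> ORDER BY id LIMIT 10000;
COMMIT;   -- releases locks and keeps the undo log small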

Practical considerations & tips

  1. BACKUP first: always take a logical or physical backup or snapshot before massive deletes.
  2. Foreign keys & cascading
  • If FKs exist, a DELETE may cascade to child tables or force FK checks to scan. Consider deleting children first (see the sketch after this list) or temporarily dropping and recreating the FKs (only if you’re 100% sure it’s safe).
  3. Binary log / Replication
  • On a replicated setup, deleting many rows will replicate and can cause lag. Consider throttling, or use pt-archiver, which can be replication-friendly. Disabling the binlog is risky and usually not recommended unless you fully understand the implications.
  4. Transaction size
  • InnoDB handles deletes within transactions; large transactions consume undo/redo logs and may blow up disk usage. Keep transactions small by batching.
  5. Indexes
  • Deleting many rows updates every index on the table. With the copy-to-new-table approach, you can create non-unique indexes after copying for faster inserts.
  6. Triggers
  • If triggers exist, they fire once per row. That can be very slow; consider disabling triggers if possible.
  7. Server settings to watch
  • innodb_log_file_size (large deletes produce redo), innodb_buffer_pool_size (cache), and disk I/O. Monitor with SHOW PROCESSLIST and SHOW ENGINE INNODB STATUS.
  8. Use tools if available
  • Percona Toolkit’s pt-archiver is designed for large deletes with throttling and safe archiving. (If you use it, test first.)
  9. Post-cleanup
  • Reclaim space with OPTIMIZE TABLE mytable; for InnoDB this rebuilds the table and frees space (expensive). Alternatively, ALTER TABLE ... FORCE rebuilds it as well. For large tables, the create-new-table-and-swap approach gives better control.
  10. Estimate time & test
  • Test your chosen approach on a copy or a small slice, measure the time per batch, then extrapolate.
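
For item 2, a minimal sketch of chunked child-row deletion (the table mychild and its parent_id column are hypothetical names standing in for your schema):

-- hypothetical schema: mychild.parent_id references mytable.id;
-- an index on mychild(parent_id) keeps each batch cheap
DELETE FROM mychild
WHERE parent_id IN (
  SELECT id FROM mytable WHERE created_at < '2024-01-01'
)
LIMIT 10000;
-- repeat until ROW_COUNT() = 0, then chunk-delete the parent rows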

Example recommended plan (practical, low-risk)

  1. Take a backup / snapshot.
  2. If table is partitionable and deletions map to partitions → drop partition.
  3. Else if you are deleting >30–40% of rows → create new table with keep rows and swap.
  4. Else → use chunked DELETE with LIMIT 10k and a 0.2s pause, monitor server and adjust.
  5. After completion: ANALYZE TABLE and then OPTIMIZE TABLE (or rebuild indexes) during low traffic.

Quick cheat-sheet commands

Chunked delete example:

-- run repeatedly until 0 rows deleted
DELETE FROM mydb.mytable
WHERE created_at < '2024-01-01'
ORDER BY id
LIMIT 10000;

Copy & swap example:

CREATE TABLE mytable_new LIKE mytable;
INSERT INTO mytable_new SELECT * FROM mytable WHERE created_at >= '2024-01-01';
RENAME TABLE mytable TO mytable_old, mytable_new TO mytable;
DROP TABLE mytable_old;

Partition drop example:

ALTER TABLE mytable DROP PARTITION p_old;
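
Post-cleanup example (step 5 of the plan above):

ANALYZE TABLE mytable;    -- refresh index statistics (cheap)
OPTIMIZE TABLE mytable;   -- rebuild and reclaim space (expensive; run off-peak)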