I recently found and fixed a bug in a site I was working on that resulted in millions of duplicate rows of data in a table that will be quite large even without them. One low-impact way to remove them is to delete in batches:
DELETE FROM `table`
WHERE (whatever criteria)
ORDER BY `id`
LIMIT 1000
Wash, rinse, repeat until zero rows affected. Maybe in a script that sleeps for a second or three between iterations.
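A minimal sketch of such a script, assuming the mysql command-line client; the table, criteria, and database name below are placeholders:
while true; do
    # ROW_COUNT() reports how many rows the preceding DELETE removed
    affected=$(mysql -N -e "DELETE FROM \`table\` WHERE (whatever criteria) ORDER BY \`id\` LIMIT 1000;
                            SELECT ROW_COUNT();" my_database)
    [ "$affected" -eq 0 ] && break
    sleep 2   # pause between batches so other queries and replication keep up
done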
For us, the DELETE ... WHERE %s ORDER BY %s LIMIT %d answer was not an option, because the WHERE criteria were slow (a non-indexed column) and would hit the master.
SELECT from a read-replica a list of primary keys that you wish to delete. Export with this kind of format:
00669163-4514-4B50-B6E9-50BA232CA5EB
00679DE5-7659-4CD4-A919-6426A2831F35
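One way to produce such a file (a sketch; the login path, table, and criteria are placeholders) is to run the SELECT in batch mode with column headers suppressed and redirect the output:
mysql --login-path=replica -N -e \
    "SELECT id FROM my_cool_table WHERE (whatever criteria)" > ids.txt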
Use the following bash script, sql-chunker.sh, to grab this input and chunk it into DELETE statements (it requires bash ≥ 4 because of the mapfile built-in). Remember to chmod +x it, and change the shebang to point to your bash 4 executable:
#!/usr/local/Cellar/bash/4.4.12/bin/bash

# Expected input format:
: <<!
00669163-4514-4B50-B6E9-50BA232CA5EB
00669DE5-7659-4CD4-A919-6426A2831F35
!

if [ -z "$1" ]
then
    echo "No chunk size supplied. Invoke: ./sql-chunker.sh 1000 ids.txt"
    exit 1
fi

if [ -z "$2" ]
then
    echo "No file supplied. Invoke: ./sql-chunker.sh 1000 ids.txt"
    exit 1
fi

# Join the remaining arguments with the given delimiter.
function join_by {
    local d=$1
    shift
    echo -n "$1"
    shift
    printf "%s" "${@/#/$d}"
}

# Read $1 ids at a time from file $2 and emit one DELETE statement per chunk.
while mapfile -t -n "$1" ary && ((${#ary[@]})); do
    printf "DELETE FROM my_cool_table WHERE id IN ('%s');\n" "$(join_by "','" "${ary[@]}")"
done < "$2"
Invoke like so:
./sql-chunker.sh 1000 ids.txt > batch_1000.sql
This will give you a file with output formatted like so (I've used a batch size of 2):
DELETE FROM my_cool_table WHERE id IN ('006CC671-655A-432E-9164-D3C64191EDCE','006CD163-794A-4C3E-8206-D05D1A5EE01E');
DELETE FROM my_cool_table WHERE id IN ('006CD837-F1AD-4CCA-82A4-74356580CEBC','006CDA35-F132-4F2C-8054-0F1D6709388A');
Then execute the statements like so:
mysql --login-path=master billing < batch_1000.sql
For those unfamiliar with --login-path, it's just a shortcut that lets you log in without typing the password on the command line.
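If you haven't set one up yet, a login path is stored with mysql_config_editor, roughly like this (the host and user here are placeholders; it will prompt for the password):
mysql_config_editor set --login-path=master --host=master.example.com --user=admin --password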
I'd also recommend adding some constraints to your table to make sure that this doesn't happen to you again. A million rows, at 1,000 per shot, will take 1,000 repetitions of the script to complete. If the script runs once every 3.6 seconds, you'll be done in an hour. No worries. Your clients are unlikely to notice.
Here's the recommended practice:
rows_affected = 0
do {
rows_affected = do_query(
"DELETE FROM messages WHERE created < DATE_SUB(NOW(),INTERVAL 3 MONTH)
LIMIT 10000"
)
} while rows_affected > 0
Deleting 10,000 rows at a time is typically a large enough task to make each query efficient, and a short enough task to minimize the impact on the server (transactional storage engines might benefit from smaller transactions). It might also be a good idea to add some sleep time between the DELETE statements to spread the load over time and reduce the amount of time locks are held.
Reference: High Performance MySQL
The following deletes 1,000,000 records, one at a time:
for i in $(seq 1 1000); do
    # NR>1 skips the header row of mysql's tab-separated batch output
    mysql -e "SELECT id FROM table_name WHERE (condition) ORDER BY id DESC LIMIT 1000" \
        | awk 'NR>1 {print "DELETE FROM table_name WHERE id = " $1 ";"}' \
        | mysql
done
You could also group them together and do DELETE FROM table_name WHERE id IN (id1, id2, ..., idN) without much difficulty, I'm sure.
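As a sketch of that (same placeholder table and condition as above, and assuming numeric ids that don't need quoting), each batch of selected ids can be joined into a single IN list:
mysql -N -e "SELECT id FROM table_name WHERE (condition) ORDER BY id DESC LIMIT 1000" \
    | paste -sd, - \
    | awk '{print "DELETE FROM table_name WHERE id IN (" $0 ");"}' \
    | mysql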
I'd use mk-archiver from the excellent Maatkit utilities package (a bunch of Perl scripts for MySQL management). Maatkit is from Baron Schwartz, the author of the O'Reilly book High Performance MySQL.
The goal is a low-impact, forward-only job to nibble old data out of the table without impacting OLTP queries much. You can insert the data into another table, which need not be on the same server. You can also write it to a file in a format suitable for LOAD DATA INFILE. Or you can do neither, in which case it's just an incremental DELETE.
It's already built for archiving your unwanted rows in small batches and as a bonus, it can save the deleted rows to a file in case you screw up the query that selects the rows to remove.
No installation required, just grab http://www.maatkit.org/get/mk-archiver and run perldoc on it (or read the web site) for documentation.
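As a very rough example of what a purge run might look like (the connection details, table, and WHERE clause below are placeholders; check perldoc mk-archiver for the exact options):
mk-archiver --source h=localhost,D=billing,t=my_cool_table \
    --where "created < NOW() - INTERVAL 3 MONTH" \
    --limit 1000 --commit-each --purge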