Deleting millions of rows in MySQL

春和景丽 2020-12-02 07:05

I recently found and fixed a bug in a site I was working on that resulted in millions of duplicate rows of data in a table that will be quite large even without them (still

14 answers
  • 2020-12-02 07:48
    DELETE FROM `table`
    WHERE (whatever criteria)
    ORDER BY `id`
    LIMIT 1000
    

    Wash, rinse, repeat until zero rows affected. Maybe in a script that sleeps for a second or three between iterations.
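    A minimal sketch of that loop as a shell script (the database name, table, and WHERE clause below are placeholder assumptions, not from the question): it reissues the batched DELETE, reads ROW_COUNT() back from the same session, and stops once a pass deletes nothing.

    #!/usr/bin/env bash
    # Hypothetical schema: substitute your own database, table, and criteria.
    DB="my_db"
    SQL="DELETE FROM my_cool_table WHERE is_duplicate = 1 ORDER BY id LIMIT 1000; SELECT ROW_COUNT();"

    while :; do
        # -N suppresses the column header, so $deleted is just the number
        deleted=$(mysql -N -e "$SQL" "$DB")
        echo "deleted $deleted rows"
        [ "$deleted" -eq 0 ] && break
        sleep 2   # pause between batches so replication and other queries can keep up
    done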

  • 2020-12-02 07:49

    For us, the DELETE WHERE %s ORDER BY %s LIMIT %d answer was not an option, because the WHERE criteria were slow (an unindexed column) and the deletes would have hammered the master.

    Instead, SELECT the list of primary keys you wish to delete from a read replica, and export them in this format:

    00669163-4514-4B50-B6E9-50BA232CA5EB
    00679DE5-7659-4CD4-A919-6426A2831F35
    

    Use the following bash script to grab this input and chunk it into DELETE statements [requires bash ≥ 4 because of the mapfile built-in]:

    sql-chunker.sh (remember to chmod +x me, and change the shebang to point to your bash 4 executable):

    #!/usr/local/Cellar/bash/4.4.12/bin/bash
    
    # Expected input format:
    : <<!
    00669163-4514-4B50-B6E9-50BA232CA5EB
    00669DE5-7659-4CD4-A919-6426A2831F35
    !
    
    if [ -z "$1" ]
      then
        echo "No chunk size supplied. Invoke: ./sql-chunker.sh 1000 ids.txt"
    fi
    
    if [ -z "$2" ]
      then
        echo "No file supplied. Invoke: ./sql-chunker.sh 1000 ids.txt"
    fi
    
    function join_by {
        local d=$1
        shift
        echo -n "$1"
        shift
        printf "%s" "${@/#/$d}"
    }
    
    while mapfile -t -n "$1" ary && ((${#ary[@]})); do
        printf "DELETE FROM my_cool_table WHERE id IN ('%s');\n" `join_by "','" "${ary[@]}"`
    done < "$2"
    

    Invoke like so:

    ./sql-chunker.sh 1000 ids.txt > batch_1000.sql
    

    This will give you a file with output formatted like so (I've used a batch size of 2):

    DELETE FROM my_cool_table WHERE id IN ('006CC671-655A-432E-9164-D3C64191EDCE','006CD163-794A-4C3E-8206-D05D1A5EE01E');
    DELETE FROM my_cool_table WHERE id IN ('006CD837-F1AD-4CCA-82A4-74356580CEBC','006CDA35-F132-4F2C-8054-0F1D6709388A');
    

    Then execute the statements like so:

    mysql --login-path=master billing < batch_1000.sql
    

    For those unfamiliar with --login-path, it's just a shortcut to log in without typing a password on the command line.
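    If you haven't created one, a login path is set up once with mysql_config_editor, which prompts for the password and stores the credentials in ~/.mylogin.cnf (the host and user below are placeholders):

    mysql_config_editor set --login-path=master --host=master.example.com --user=billing --password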

  • 2020-12-02 07:52

    I'd also recommend adding some constraints to your table to make sure that this doesn't happen to you again. A million rows, at 1000 per shot, will take 1000 repetitions of a script to complete. If the script runs once every 3.6 seconds you'll be done in an hour. No worries. Your clients are unlikely to notice.
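    For duplicate prevention in particular, a uniqueness constraint is the usual guard. A sketch, assuming a hypothetical table and a pair of columns that together define a "duplicate" (substitute your own):

    mysql my_db <<'SQL'
    -- Any future insert that repeats this key combination is rejected.
    ALTER TABLE my_cool_table
        ADD UNIQUE KEY uniq_dedupe_key (user_id, created_at);
    SQL

    The ALTER will only succeed once the existing duplicates are gone, so run it after the cleanup finishes.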

  • 2020-12-02 07:56

    Here's the recommended practice:

    rows_affected = 0
    do {
     rows_affected = do_query(
       "DELETE FROM messages WHERE created < DATE_SUB(NOW(),INTERVAL 3 MONTH)
       LIMIT 10000"
     )
    } while rows_affected > 0
    

    Deleting 10,000 rows at a time is typically a large enough task to make each query efficient, and a short enough task to minimize the impact on the server (transactional storage engines might benefit from smaller transactions). It might also be a good idea to add some sleep time between the DELETE statements to spread the load over time and reduce the amount of time locks are held.

    Reference: High Performance MySQL

  • 2020-12-02 07:59

    The following deletes up to 1,000,000 records, one at a time (1,000 passes, each selecting and then deleting up to 1,000 ids):

     for i in $(seq 1 1000); do
         mysql -e "select id from table_name where (condition) order by id desc limit 1000" \
             | sed 's/|//g' \
             | awk '{ if (NR > 1) print "delete from table_name where id = ", $1, ";" }' \
             | mysql
     done
    

    You could also group them together and run DELETE FROM table_name WHERE id IN (id1, id2, .., idN), I'm sure, without much difficulty.

  • 2020-12-02 07:59

    I'd use mk-archiver from the excellent Maatkit utilities package (a bunch of Perl scripts for MySQL management). Maatkit is from Baron Schwartz, the author of the O'Reilly "High Performance MySQL" book.

    The goal is a low-impact, forward-only job to nibble old data out of the table without impacting OLTP queries much. You can insert the data into another table, which need not be on the same server. You can also write it to a file in a format suitable for LOAD DATA INFILE. Or you can do neither, in which case it's just an incremental DELETE.

    It's already built for archiving your unwanted rows in small batches and as a bonus, it can save the deleted rows to a file in case you screw up the query that selects the rows to remove.

    No installation required, just grab http://www.maatkit.org/get/mk-archiver and run perldoc on it (or read the web site) for documentation.
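    For orientation only, a purge-only run looks roughly like the following; the option names are recalled from the mk-archiver/pt-archiver documentation and the host, database, table, and WHERE clause are placeholders, so check perldoc before running anything:

    mk-archiver --source h=localhost,D=billing,t=my_cool_table \
        --where "is_duplicate = 1" --limit 1000 --purge --sleep 1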
