I recently found and fixed a bug in a site I was working on that resulted in millions of duplicate rows of data in a table that will be quite large even without them (still in the millions).
For us, the DELETE WHERE %s ORDER BY %s LIMIT %d answer was not an option, because the WHERE criterion was slow (a non-indexed column), and it would hit the master.
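For context, that approach boils down to repeating something like the following until no rows remain (the created_at filter here is a made-up stand-in for our slow criterion):

-- sketch of the chunked-delete pattern from that answer;
-- created_at is a hypothetical unindexed column
DELETE FROM my_cool_table
WHERE created_at < '2018-01-01'
ORDER BY id
LIMIT 1000;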
From a read replica, SELECT the primary keys you wish to delete and export them in this kind of format:
00669163-4514-4B50-B6E9-50BA232CA5EB
00679DE5-7659-4CD4-A919-6426A2831F35
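One way to produce that file is a sketch like this; the replica login path and the WHERE clause are placeholders for your own setup:

# -N drops the header row, -B emits bare tab-separated values;
# the is_duplicate filter is a hypothetical example
mysql --login-path=replica -N -B -e "SELECT id FROM my_cool_table WHERE is_duplicate = 1" > ids.txt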
Use the following bash script to grab this input and chunk it into DELETE statements [requires bash ≥ 4 because of mapfile built-in]:
sql-chunker.sh (remember to chmod +x me, and change the shebang to point to your bash 4 executable):
#!/usr/local/Cellar/bash/4.4.12/bin/bash
# Usage: ./sql-chunker.sh <batch_size> <file_of_ids>  (one id per line)
mapfile -t ids < "$2"                        # slurp the id file into an array (bash >= 4)
for ((i = 0; i < ${#ids[@]}; i += $1)); do   # step through the array in chunks of $1
    printf -v in_list "'%s'," "${ids[@]:i:$1}"   # quote and comma-join one chunk
    echo "DELETE FROM my_cool_table WHERE id IN (${in_list%,});"
done
Invoke like so:
./sql-chunker.sh 1000 ids.txt > batch_1000.sql
This will give you a file with output formatted like so (I've used a batch size of 2):
DELETE FROM my_cool_table WHERE id IN ('006CC671-655A-432E-9164-D3C64191EDCE','006CD163-794A-4C3E-8206-D05D1A5EE01E');
DELETE FROM my_cool_table WHERE id IN ('006CD837-F1AD-4CCA-82A4-74356580CEBC','006CDA35-F132-4F2C-8054-0F1D6709388A');
Then execute the statements like so:
mysql --login-path=master billing < batch_1000.sql
For those unfamiliar with --login-path, it's just a shortcut that lets you log in without typing the password on the command line.
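If you haven't set one up before, login paths are created with mysql_config_editor, along these lines (the host and user below are placeholders; you'll be prompted for the password, and the credentials are stored in ~/.mylogin.cnf):

# host and user are placeholders for your master's details
mysql_config_editor set --login-path=master --host=master.db.example.com --user=deploy_user --password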