Question
I have a MySQL table (50 million rows), and new data is inserted into it periodically.
The table has the following structure:
CREATE TABLE `values` (
  id double NOT NULL AUTO_INCREMENT,
  channel_id int(11) NOT NULL,
  val text NOT NULL,
  date_time datetime NOT NULL,
  PRIMARY KEY (id),
  KEY channel_date_index (channel_id,date_time)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Two rows must never share the same channel_id and date_time, but if such an insert occurs, it is important to keep the newest value.
Is there a way to check for duplicates in real time before the insert, or should I keep inserting all data and check for duplicates periodically in a separate cycle?
Realtime speed is important here, because 100 inserts occur per second.
Answer 1:
To prevent future duplicates:
- Change KEY channel_date_index (channel_id,date_time) to UNIQUE (channel_id,date_time)
- Change the INSERT to INSERT ... ON DUPLICATE KEY UPDATE ... to change the timestamp when that pair exists.
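The two changes above could look roughly like the following. This is only a sketch: the index name channel_date_unique and the sample values are made up for illustration, and the ALTER will fail if the table still contains duplicate pairs. Since date_time is part of the unique key, the column overwritten on a duplicate here is val, which is an assumption about what "keep the newest value" means.

```sql
-- Replace the non-unique index with a unique one on the same columns.
-- (channel_date_unique is a hypothetical name; this fails while duplicates exist.)
ALTER TABLE `values`
  DROP INDEX channel_date_index,
  ADD UNIQUE KEY channel_date_unique (channel_id, date_time);

-- Insert a reading. If a row with the same (channel_id, date_time) already
-- exists, overwrite its val with the incoming one instead of adding a row.
INSERT INTO `values` (channel_id, val, date_time)
VALUES (42, 'example reading', '2015-03-30 12:00:00')
ON DUPLICATE KEY UPDATE val = VALUES(val);
```

With the unique key in place, the duplicate check happens inside the insert itself, so no separate lookup is needed before each write.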
To fix the existing table, you could do ALTER IGNORE TABLE ... ADD UNIQUE(...). However that would not give you the latest timestamps.
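If the newest row of each pair has to survive, one possible alternative is to delete the older duplicates first and add the unique key afterwards. The sketch below assumes that a larger id means a later insert, which the question does not state explicitly. ALTER IGNORE TABLE was also removed in MySQL 5.7.4, so on newer servers something along these lines is needed anyway.

```sql
-- Delete every row that has a later-inserted twin (same channel_id and
-- date_time, larger id), so only the newest row of each pair remains.
-- Assumption: id order reflects insertion order.
DELETE older
FROM `values` AS older
JOIN `values` AS newer
  ON  newer.channel_id = older.channel_id
  AND newer.date_time  = older.date_time
  AND newer.id         > older.id;
```

On 50 million rows a single DELETE like this may run for a long time and hold many locks, so it is probably worth trying on a copy first, or restricting it to a range of channel_id values per run.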
For minimum downtime (not maximum speed), use pt-online-schema-change.
Source: https://stackoverflow.com/questions/29350826/what-is-the-fastest-procedure-to-remove-duplicates-from-a-big-table-in-mysql