问题
How could I optimise query, which will find all records, which:
- have activation_request.date_confirmed not null
and
- do not have related string value in another table: activation_request.email = user.username shouldn't return any record
I tried:
SELECT email
FROM activation_request l
LEFT JOIN user r ON r.username = l.email
WHERE l.date_confirmed is not null
AND r.username IS NULL
and
SELECT email
FROM activation_request
WHERE date_confirmed is not null
AND NOT EXISTS (SELECT 1
FROM user
WHERE user.username = activation_request.email
)
but both tables have xxx.xxx.xxx records hence after all night running those queries unfortunatelly I haven't got any results.
Create statements:
CREATE TABLE `activation_request` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`version` bigint(20) NOT NULL,
`date_confirmed` datetime DEFAULT NULL,
`email` varchar(255) NOT NULL,
(...)
PRIMARY KEY (`id`),
KEY `emailIdx` (`email`),
KEY `reminderSentIdx` (`date_reminder_sent`),
KEY `idx_resent_needed` (`date_reminder_sent`,`date_confirmed`),
) ENGINE=InnoDB AUTO_INCREMENT=103011867 DEFAULT CHARSET=utf8;
CREATE TABLE `user` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`version` bigint(20) NOT NULL,
`username` varchar(255) NOT NULL,
(...)
PRIMARY KEY (`id`),
UNIQUE KEY `Q52plW9W7TJWZcLj00K3FmuhwMSw4F7vmxJGyjxz5iiINVR9fXyacEoq4rHppb` (`username`),
) ENGINE=InnoDB AUTO_INCREMENT=431400048 DEFAULT CHARSET=latin1;
Explain for LEFT JOIN:
[[id:1, select_type:SIMPLE, table:l, type:ALL, possible_keys:null, key:null, key_len:null, ref:null, rows:49148965, Extra:Using where], [id:1, select_type:SIMPLE, table:r, type:index, possible_keys:null, key:Q52plW9W7TJWZcLj00K3FmuhwMSw4F7vmxJGyjxz5iiINVR9fXyacEoq4rHppb, key_len:257, ref:null, rows:266045508, Extra:Using where; Not exists; Using index; Using join buffer (Block Nested Loop)]] [[id:1, select_type:SIMPLE, table:l, type:ALL, possible_keys:null, key:null, key_len:null, ref:null, rows:49148965, Extra:Using where], [id:1, select_type:SIMPLE, table:r, type:index, possible_keys:null, key:Q52plW9W7TJWZcLj00K3FmuhwMSw4F7vmxJGyjxz5iiINVR9fXyacEoq4rHppb, key_len:257, ref:null, rows:266045508, Extra:Using where; Not exists; Using index; Using join buffer (Block Nested Loop)]]
After adding indexes on staging db (with slightly less data, but the same structure) query is now running ~24h and still no results):
$ show processlist;
| Id | User | Host | db | Command | Time | State | Info
| 64 | root | localhost | staging_db | Query | 110072 | Sending data | SELECT ar.email FROM activation_request ar WHERE ar.date_confirmed is not null AND NOT EXISTS (SELE |
Mysql version:
$ select version();
5.6.16-1~exp1
All other commands on the list are Sleep
so there is no other query running and possibly disturbing/locking rows.
回答1:
For this query:
SELECT ar.email
FROM activation_request ar
WHERE ar.date_confirmed is not null AND
NOT EXISTS (SELECT 1
FROM user u
WHERE u.username = ar.email
)
I would recommend indexes on activation_request(date_confirmed, email)
and user(username)
.
Unless you have a really humongous amount of data, though, your problem may be that tables are locked.
来源:https://stackoverflow.com/questions/59669491/optimise-comparing-data-in-two-big-mysql-tables