Optimise comparing data in two big MySQL tables

Posted by 南笙酒味 on 2020-01-16 08:40:09

Question


How can I optimise a query that finds all records which:

  • have a non-null activation_request.date_confirmed

and

  • have no matching string value in the other table, i.e. there is no row in user where user.username = activation_request.email

I tried:

SELECT  email 
FROM activation_request l 
    LEFT JOIN user r ON r.username = l.email 
WHERE l.date_confirmed is not null 
AND r.username IS NULL

and

SELECT email 
FROM  activation_request 
WHERE  date_confirmed is not null 
AND NOT EXISTS (SELECT 1 
                FROM user  
                WHERE  user.username = activation_request.email
                )

but both tables have xxx.xxx.xxx records, so after running those queries all night I unfortunately still haven't got any results.

Create statements:

CREATE TABLE `activation_request` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `version` bigint(20) NOT NULL,
  `date_confirmed` datetime DEFAULT NULL,
  `email` varchar(255) NOT NULL,
  (...)
  PRIMARY KEY (`id`),
  KEY `emailIdx` (`email`),
  KEY `reminderSentIdx` (`date_reminder_sent`),
  KEY `idx_resent_needed` (`date_reminder_sent`,`date_confirmed`)
) ENGINE=InnoDB AUTO_INCREMENT=103011867 DEFAULT CHARSET=utf8;




CREATE TABLE `user` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `version` bigint(20) NOT NULL,
  `username` varchar(255) NOT NULL,
  (...)
  PRIMARY KEY (`id`),
  UNIQUE KEY `Q52plW9W7TJWZcLj00K3FmuhwMSw4F7vmxJGyjxz5iiINVR9fXyacEoq4rHppb` (`username`)
) ENGINE=InnoDB AUTO_INCREMENT=431400048 DEFAULT CHARSET=latin1;

Explain for LEFT JOIN:

[[id:1, select_type:SIMPLE, table:l, type:ALL, possible_keys:null, key:null, key_len:null, ref:null, rows:49148965, Extra:Using where],
 [id:1, select_type:SIMPLE, table:r, type:index, possible_keys:null, key:Q52plW9W7TJWZcLj00K3FmuhwMSw4F7vmxJGyjxz5iiINVR9fXyacEoq4rHppb, key_len:257, ref:null, rows:266045508, Extra:Using where; Not exists; Using index; Using join buffer (Block Nested Loop)]]

After adding the indexes on the staging DB (slightly less data, but the same structure), the query has now been running for ~24h and still has not returned any results:

$ show processlist;

| Id | User | Host      | db         | Command | Time   | State        | Info |
| 64 | root | localhost | staging_db | Query   | 110072 | Sending data | SELECT ar.email FROM  activation_request ar WHERE ar.date_confirmed is not null AND NOT EXISTS (SELE |

MySQL version:

$ select version();
5.6.16-1~exp1

All other entries in the process list are in the Sleep state, so no other query is running that could be interfering with this one or locking rows.


Answer 1:


For this query:

SELECT ar.email 
FROM  activation_request ar
WHERE ar.date_confirmed is not null AND
      NOT EXISTS (SELECT 1 
                  FROM user u
                  WHERE u.username = ar.email
                 )

I would recommend indexes on activation_request(date_confirmed, email) and user(username).
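
As a rough sketch of what that could look like: the index name idx_confirmed_email is made up for illustration, and user(username) already has a unique key in the posted schema, so only activation_request should need a new index. One more thing that may be worth checking, and this is an assumption based only on the table defaults in the CREATE statements: activation_request is utf8 while user is latin1. When columns in different character sets are compared, MySQL has to convert one side, which can prevent it from using the index on user.username for lookups. The posted EXPLAIN (possible_keys:null and a full index scan on that table) would be consistent with that. Converting the email to latin1 inside the comparison is one possible workaround, but emails containing characters outside latin1 would not convert cleanly, so treat it as something to test, not as a definitive fix.

```sql
-- Hypothetical index name; the columns are the ones recommended above.
-- With both columns in the index, the query can be answered from the index alone.
ALTER TABLE activation_request
  ADD INDEX idx_confirmed_email (date_confirmed, email);

-- Check which character set and collation the two compared columns really use
-- (the table defaults suggest utf8 vs latin1, but column definitions can override them).
SELECT table_name, column_name, character_set_name, collation_name
FROM information_schema.COLUMNS
WHERE table_schema = DATABASE()
  AND ((table_name = 'activation_request' AND column_name = 'email')
    OR (table_name = 'user' AND column_name = 'username'));

-- Assumption: username is latin1 and email is utf8.
-- Converting email to latin1 keeps the comparison in the character set of the
-- indexed column, so the unique key on user.username can be used for lookups.
EXPLAIN
SELECT ar.email
FROM activation_request ar
WHERE ar.date_confirmed IS NOT NULL
  AND NOT EXISTS (SELECT 1
                  FROM user u
                  WHERE u.username = CONVERT(ar.email USING latin1));
```

If the EXPLAIN for the last statement shows a keyed lookup on user (for example eq_ref or ref on the unique key) instead of a full index scan, the index is being used for the comparison.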

Unless you have a really humongous amount of data, though, your problem may be that tables are locked.
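
One way to check for locking while the query is running is sketched below. It assumes MySQL 5.6 or 5.7, where these INFORMATION_SCHEMA tables exist (INNODB_LOCK_WAITS was removed in 8.0), and an account with the PROCESS privilege.

```sql
-- Open InnoDB transactions, oldest first.
SELECT trx_id, trx_state, trx_started, trx_mysql_thread_id, trx_query
FROM information_schema.INNODB_TRX
ORDER BY trx_started;

-- Which thread is waiting on which, if any lock waits exist right now.
SELECT r.trx_mysql_thread_id AS waiting_thread,
       b.trx_mysql_thread_id AS blocking_thread,
       b.trx_query           AS blocking_query
FROM information_schema.INNODB_LOCK_WAITS w
JOIN information_schema.INNODB_TRX r ON r.trx_id = w.requesting_trx_id
JOIN information_schema.INNODB_TRX b ON b.trx_id = w.blocking_trx_id;
```

An empty result from the second query while the SELECT is still in "Sending data" would suggest the time is going into the scan itself and not into waiting for locks.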



Source: https://stackoverflow.com/questions/59669491/optimise-comparing-data-in-two-big-mysql-tables
