Why is LEFT JOIN slower than INNER JOIN?

ぃ、小莉子 提交于 2019-12-22 06:49:23

问题


I have two queries, the first one (inner join) is super fast, and the second one (left join) is super slow. How do I make the second query fast?

EXPLAIN SELECT saved.email FROM saved INNER JOIN finished ON finished.email = saved.email;

id  select_type table   type    possible_keys   key key_len ref rows    Extra
1   SIMPLE  finished    index   NULL    email   258 NULL    32168   Using index
1   SIMPLE  saved   ref email   email   383 func    1   Using where; Using index

EXPLAIN SELECT saved.email FROM saved LEFT JOIN finished ON finished.email = saved.email;

id  select_type table   type    possible_keys   key key_len ref rows    Extra
1   SIMPLE  saved   index   NULL    email   383 NULL    40971   Using index
1   SIMPLE  finishedindex   NULL    email   258 NULL    32168   Using index

Edit: I have added table info for both tables down below.

CREATE TABLE `saved` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `slug` varchar(255) DEFAULT NULL,
  `email` varchar(127) NOT NULL,
  [omitted fields include varchar, text, longtext, int],
  PRIMARY KEY (`id`),
  KEY `slug` (`slug`),
  KEY `email` (`email`)
) ENGINE=MyISAM AUTO_INCREMENT=56329 DEFAULT CHARSET=utf8;

CREATE TABLE `finished` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `slug` varchar(255) DEFAULT NULL,
  `submitted` int(11) DEFAULT NULL,
  `status` int(1) DEFAULT '0',
  `name` varchar(255) DEFAULT NULL,
  `email` varchar(255) DEFAULT NULL,
  [omitted fields include varchar, text, longtext, int],
  PRIMARY KEY (`id`),
  KEY `assigned_user_id` (`assigned_user_id`),
  KEY `event_id` (`event_id`),
  KEY `slug` (`slug`),
  KEY `email` (`email`),
  KEY `city_id` (`city_id`),
  KEY `status` (`status`),
  KEY `recommend` (`recommend`),
  KEY `pending_user_id` (`pending_user_id`),
  KEY `submitted` (`submitted`)
) ENGINE=MyISAM AUTO_INCREMENT=33063 DEFAULT CHARSET=latin1;

回答1:


With INNER JOIN, MySQL generally will start with the table with the smallest number of rows. In this case, it starts with table finished and does a look up for the corresponding record in saved using the index on saved.email.

For a LEFT JOIN, (excluding some optimizations) MySQL generally joins the records in order (starting with the left most table). In this case, MySQL starts with the table saved, then attempts to find each corresponding record in finished. Since there is no usable index on finished.email, it must do a full scan for each look up.

Edit

Now that you posted your schema, I can see that MySQL is ignoring the index (finished.email) when going from utf8 to latin1 character set. You've not posted the character sets and collations for each column, so I'm going by the default character set for the table. The collations must be compatible in order for MySQL to use the index.

MySQL can coerce (upgrade) a latin1 collation, which is very limited, up to a utf8 collation such as unicode_ci (so the first query can use the index on saved.email by upgrading latin1 collation to utf8), but the opposite is not true (the second query can't use the index on finished.email since it can't downgrade a utf8 collation down to latin1).

The solution is to change both email columns to a compatible collation, perhaps most easily by making them identical character sets and collations.




回答2:


The LEFT JOIN query is slower than the INNER JOIN query because it's doing more work.

From the EXPLAIN output, it looks like MySQL is doing nested loop join. (There's nothing wrong with nested loops; I think that's the only join operation that MySQL uses in version 5.5 and earlier.)

For the INNER JOIN query, MySQL is using an efficient "ref" (index lookup) operation to locate the matching rows.

But for the LEFT JOIN query, it looks like MySQL is doing a full scan of the index to find the matching rows. So, with the nested loops join operation, MySQL is doing a full index scan scan for each row from the other table. So, that's on the order of tens of thousands of scans, and each of those scans is inspecting tens of thousands of rows.

Using the estimated row counts from the EXPLAIN output, that's going to require (40971*32168=) 1,317,955,128 string comparisons.

The INNER JOIN query avoids a lot of that work, so it's a lot faster. (It's avoiding all those string comparisons by using an index operation.

-- LEFT JOIN
id select table    type   key   key_len ref    rows  Extra
-- ------ -------- -----  ----- ------- ----  -----  ------------------------
1  SIMPLE saved    index  email     383 NULL  40971  Using index
1  SIMPLE finished index  email     258 NULL  32168  Using index

-- INNER JOIN 
id select table    type   key   key_len ref    rows  Extra
-- ------ -------- -----  ----- ------- ----  -----  ------------------------  
1  SIMPLE finished index  email     258 NULL  32168  Using index
1  SIMPLE saved    ref    email     383 func      1  Using where; Using index
                   ^^^^^                ^^^^  ^^^^^  ^^^^^^^^^^^^

NOTE: Markus Adams spied the difference in characterset in the email columns CREATE TABLE statements that were added to your question.

I believe that it's the difference in the characterset that's preventing MySQL from using an index for your query.


Q2: How do I make the LEFT JOIN query faster?

A: I don't believe it's going to be possible to get that specific query to run faster, without a schema change, such as changing the characterset of the two email columns to match.

The only affect that the "outer join" to the finished table looks like it is to produce "duplicate" rows whenever more than one matching row is found. I'm not understanding why the outer join is needed. Why not just get rid of it altogether, and just do:

SELECT saved.email FROM saved



回答3:


I'm afraid more info will probably be needed.

However, inner joins eliminate any item that has a null foreign key (no match, if you will). This means that there are less rows to scan to associate.

For a left join however, any non-match needs to be given a blank row, so all of the rows are scanned regardless -- nothing can be eliminated.

This makes the data set larger and requires more resources to process. Also, when you write your select, don't do select * -- instead, explicitly state which columns you want.




回答4:


The data types of saved.email and finished.email differ in two respects. First, they have different lengths. Second, finished.email can be NULL. So, your LEFT JOIN operation can't exploit the index on finished.email.

Can you change the definition of finished.email to this, so it matches the field you're joining it with?

`email` varchar(127) NOT NULL

If you do you'll probably get a speedup.



来源:https://stackoverflow.com/questions/27787269/why-is-left-join-slower-than-inner-join

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!