Question
Given the following two queries:
Query #1
SELECT log.id
FROM log
WHERE user_id IN
(188858, 188886, 189854, 203623, 204072)
and type in (14, 15, 17)
ORDER BY log.id DESC
LIMIT 25 OFFSET 0;
Query #2 (4 IDs instead of 5)
SELECT log.id
FROM log
WHERE user_id IN
(188858, 188886, 189854, 203623)
and type in (14, 15, 17)
ORDER BY log.id DESC
LIMIT 25 OFFSET 0;
Explain Plan
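-- Query #1
id  select_type  table  type   possible_keys           key                     key_len  rows   Extra
1   SIMPLE       log    range  idx_user_id_and_log_id  idx_user_id_and_log_id  4        41280  Using index condition; Using where; Using filesort
-- Query #2
id  select_type  table  type   possible_keys           key                     key_len  rows   Extra
1   SIMPLE       log    index  idx_user_id_and_log_id  PRIMARY                 4        53534  Using where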
Why does adding a single ID make the execution plan so different? The difference in run time is between milliseconds and roughly a minute. I thought it could be related to the eq_range_index_dive_limit parameter, but the number of IDs is below 10 (the default) in both cases anyway. I know that I can force MySQL to use the index instead of the clustered index, but I wanted to know why MySQL made that decision.
Should I try to understand this, or is it sometimes simply not possible to understand query planner decisions?
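The eq_range_index_dive_limit check mentioned in the question can be sketched as follows. The rule encoded here is the one documented for MySQL 5.6 (index dives are used when the number of equality ranges is below the limit, and a limit of 0 means always use dives); the function names are hypothetical, and counting ranges as the product of the IN-list lengths on the index columns used for the range is an assumption about how the range optimizer expands them.

```python
from math import prod


def equality_ranges(in_list_lengths):
    """Hypothetical helper: equality ranges produced when each index column
    used for the range is compared with IN (...), ASSUMING the range
    optimizer forms the cross product of the IN lists."""
    return prod(in_list_lengths)


def uses_index_dives(n_ranges, eq_range_index_dive_limit=10):
    """MySQL 5.6 rule as documented: index dives for fewer than `limit`
    equality ranges, index statistics otherwise; a limit of 0 means always dive."""
    if eq_range_index_dive_limit == 0:
        return True
    return n_ranges < eq_range_index_dive_limit


# idx_user_id_and_log_id(user_id, id): only user_id forms the ranges,
# so Query #1 has 5 ranges and Query #2 has 4.
print(uses_index_dives(equality_ranges([5])))  # True
print(uses_index_dives(equality_ranges([4])))  # True
```

Under these assumptions both queries stay on the index-dive side of the default threshold, which is consistent with the question's reasoning that this parameter alone does not explain the different plans.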
Extra Details
- Table Size: 11GB
- Rows: 108 Million
- MySQL: 5.6.7
- It doesn't matter which ID is removed from the IN clause.
- The index: idx_user_id_and_log_id(user_id, id)
Answer 1:
As you have shown, MySQL has two alternative query plans for queries with ORDER BY ... LIMIT n:
- Read all qualifying rows, sort them, and pick the n top rows.
- Read the rows in sorted order and stop when n qualifying rows have been found.
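The two plans can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not how MySQL executes queries: rows are hypothetical (id, user_id, type) tuples, the function names are made up, and the returned "rows examined" count is only a stand-in for the I/O each plan performs.

```python
from typing import List, Set, Tuple

Row = Tuple[int, int, int]  # hypothetical layout: (id, user_id, type)


def plan_range_then_sort(rows: List[Row], user_ids: Set[int], types: Set[int], n: int):
    """Plan 1: read every row in the user_id index range, apply the type filter,
    sort by id DESC and keep the top n. Returns (ids, rows_examined)."""
    fetched = [r for r in rows if r[1] in user_ids]  # index range read on user_id
    matches = [r for r in fetched if r[2] in types]  # rest of the WHERE clause
    matches.sort(key=lambda r: r[0], reverse=True)   # the "Using filesort" step
    return [r[0] for r in matches[:n]], len(fetched)


def plan_ordered_scan(rows: List[Row], user_ids: Set[int], types: Set[int], n: int):
    """Plan 2: walk rows in id DESC order (the PRIMARY key) and stop as soon as
    n rows satisfy the WHERE clause. Returns (ids, rows_examined)."""
    ids, examined = [], 0
    for r in sorted(rows, key=lambda row: row[0], reverse=True):
        examined += 1
        if r[1] in user_ids and r[2] in types:
            ids.append(r[0])
            if len(ids) == n:
                break
    return ids, examined


# Usage on synthetic (hypothetical) data.
rows = [(i, i % 7, i % 20) for i in range(1, 10_001)]
ids_1, examined_1 = plan_range_then_sort(rows, {1, 2}, {14, 15, 17}, 25)
ids_2, examined_2 = plan_ordered_scan(rows, {1, 2}, {14, 15, 17}, 25)
print(ids_1 == ids_2, examined_1, examined_2)
```

Both functions return the same ids; only the amount of work differs, and that amount is what the optimizer has to predict before running the query.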
In order to decide which is the better option, the optimizer needs to estimate the filtering effect of your WHERE condition. This is not straightforward, especially for columns that are not indexed or whose values are correlated. In your case, reading in sorted order probably has to go through far more of the table to find the first 25 qualifying rows than the optimizer expected.
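The estimation problem can be put into numbers. The sketch assumes the simplification that qualifying rows are spread evenly over the id order, so a descending scan is expected to read about n / selectivity rows before it has n matches; this is an illustrative model, not MySQL's actual cost formula. When the qualifying rows are instead concentrated among the oldest ids, the real count is far higher. The function names and the synthetic data are hypothetical.

```python
def expected_rows_uniform(n: int, matching_rows: int, total_rows: int) -> float:
    """Rows an id-DESC scan is expected to read before finding n matches,
    ASSUMING matches are spread evenly (n / selectivity)."""
    return n * total_rows / matching_rows


def actual_rows_scanned(match_flags_desc: list, n: int) -> int:
    """Rows an id-DESC scan really reads before finding n matches.
    match_flags_desc[k] says whether the k-th row in id DESC order qualifies."""
    found = 0
    for examined, is_match in enumerate(match_flags_desc, start=1):
        if is_match:
            found += 1
            if found == n:
                return examined
    return len(match_flags_desc)  # fewer than n matches: the whole table is read


# Hypothetical data: 100,000 rows, of which the 1,000 with the lowest ids qualify.
flags = [row_id <= 1_000 for row_id in range(100_000, 0, -1)]
print(expected_rows_uniform(25, 1_000, 100_000))  # 2500.0
print(actual_rows_scanned(flags, 25))             # 99025

# With the numbers from the question, treating Query #1's 41,280-row range
# estimate as the qualifying-row count (this ignores the type filter):
print(expected_rows_uniform(25, 41_280, 108_000_000))  # roughly 65,400
```

In this toy case the uniform estimate is off by a factor of about 40, which is the kind of misjudgment that can make the "read in sorted order" plan look cheap when it is not.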
There have been several improvements in how LIMIT queries are handled, both in later releases of 5.6 (you are running a pre-GA release!) and in newer releases (5.7, 8.0). I suggest upgrading to a later release and checking whether this is still an issue.
In general, if you want to understand query planner decisions, you should look at the optimizer trace for the query.
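A minimal way to capture that trace from Python, assuming a DB-API 2.0 connection to MySQL 5.6.3 or later (for example one from mysql-connector-python or PyMySQL); the helper name is hypothetical. The SQL it sends, SET optimizer_trace='enabled=on' followed by a read of INFORMATION_SCHEMA.OPTIMIZER_TRACE, is MySQL's documented interface for the trace, which comes back as a JSON document.

```python
import json


def optimizer_trace(cursor, query: str) -> dict:
    """Run `query` with MySQL's optimizer trace enabled and return the trace
    as parsed JSON. `cursor` is assumed to be a DB-API 2.0 cursor on a
    MySQL 5.6.3+ connection; this helper's name is hypothetical."""
    cursor.execute("SET optimizer_trace='enabled=on'")
    try:
        cursor.execute(query)
        cursor.fetchall()  # drain the result set before issuing the next statement
        cursor.execute("SELECT TRACE FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE")
        rows = cursor.fetchall()
    finally:
        cursor.execute("SET optimizer_trace='enabled=off'")
    return json.loads(rows[0][0])  # TRACE column of the single trace row
```

Running it once for each of the two queries and comparing the traces side by side should show where the range analysis and the ORDER BY access-path decision diverge.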
Answer 2:
A JOIN is much more efficient.
Create a temporary table holding the values from the IN list, then JOIN the 'log' table to that temporary table.
Refer to this answer for more info.
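The temporary-table rewrite described above can be sketched with Python's built-in sqlite3 module standing in for MySQL (an assumption made so the example is self-contained; in MySQL the equivalent statement is CREATE TEMPORARY TABLE). The log table and its columns follow the question; the wanted_users table, the helper name and the tiny data set are hypothetical. This shows the shape of the query only and says nothing about its speed on a 108-million-row table.

```python
import sqlite3


def recent_log_ids(conn: sqlite3.Connection, user_ids, types, limit=25):
    """Newest `limit` log ids for the given users and types, joining `log`
    against a temporary table of user ids instead of using user_id IN (...)."""
    cur = conn.cursor()
    cur.execute("DROP TABLE IF EXISTS temp.wanted_users")
    cur.execute("CREATE TEMP TABLE wanted_users (user_id INTEGER PRIMARY KEY)")
    cur.executemany("INSERT OR IGNORE INTO wanted_users (user_id) VALUES (?)",
                    [(u,) for u in user_ids])
    type_marks = ", ".join("?" for _ in types)  # only "?" marks; values are bound
    cur.execute(
        f"""SELECT log.id
              FROM log
              JOIN wanted_users ON wanted_users.user_id = log.user_id
             WHERE log.type IN ({type_marks})
             ORDER BY log.id DESC
             LIMIT ?""",
        (*types, limit),
    )
    return [row[0] for row in cur.fetchall()]


# Hypothetical schema and data mirroring the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, type INTEGER NOT NULL)")
conn.execute("CREATE INDEX idx_user_id_and_log_id ON log (user_id, id)")
conn.executemany(
    "INSERT INTO log (id, user_id, type) VALUES (?, ?, ?)",
    [(i, 188858 if i % 2 else 203623, 14 if i % 3 else 99) for i in range(1, 101)],
)
print(recent_log_ids(conn, [188858, 188886, 189854, 203623, 204072], [14, 15, 17], limit=5))
# [100, 98, 97, 95, 94]
```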
Answer 3:
Add these two indexes:
ALTER TABLE log
    ADD INDEX (user_id, type, id),
    ADD INDEX (type, user_id, id);
Each of these is a "covering" index. As such, the entire query can be performed by looking only in one index, without touching the 'data'.
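Whether an index is covering can be checked from the query plan. The sketch below again uses Python's sqlite3 as a stand-in for MySQL (an assumption): SQLite reports "USING COVERING INDEX" where MySQL's EXPLAIN shows "Using index" in the Extra column. The index names and the payload column are hypothetical; the column lists are the two suggested above. In SQLite, id INTEGER PRIMARY KEY is the rowid, which every index entry already carries, much as InnoDB secondary indexes carry the primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE log (
        id      INTEGER PRIMARY KEY,  -- rowid alias, comparable to InnoDB's clustered key
        user_id INTEGER NOT NULL,
        type    INTEGER NOT NULL,
        payload TEXT                  -- hypothetical stand-in for the rest of the row
    );
    CREATE INDEX idx_user_type_id ON log (user_id, type, id);
    CREATE INDEX idx_type_user_id ON log (type, user_id, id);
""")

QUERY = """
    SELECT log.id FROM log
    WHERE user_id IN (188858, 188886, 189854, 203623, 204072)
      AND type IN (14, 15, 17)
    ORDER BY log.id DESC
    LIMIT 25
"""


def plan_details(conn, sql):
    """Return the 'detail' column (the last one) of SQLite's EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


for line in plan_details(conn, QUERY):
    print(line)
```

If the plan names one of the two new indexes as covering, the table rows (here the payload column) are never read, which is the property the answer relies on.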
I am giving the Optimizer two choices; hopefully it will be able to tell whether user_id IN (...) or type IN (...) is more selective, and pick the better index accordingly.
If, after adding those, you no longer have any use for idx_user_id_and_log_id(user_id, id), DROP it.
(No, I can't explain why query 2 chose to do a table scan.)
Source: https://stackoverflow.com/questions/51793915/mysql-why-5th-id-in-the-in-clause-drastically-changes-query-plan