问题
I am using MariaDB 10.0.21 and running a query similar to the following query against 12 Million Rows:
SELECT
`primary_key`
FROM
`texas_parcels`
WHERE
`zip_code`
IN ('28461', '48227', '60411', '65802', '75215', '75440', '75773', '75783', '76501', '76502', '76504', '76511', '76513', '76518', '76519', '76520', '76522', '76525', '76527', '76528', '76530', '76537', '76539', '76541', '76542', '76543', '76548', '76549', '76550', '76556', '76567', '76571', '76574', '76577', '76578', '76642', '76704', '76853', '77418', '77434', '77474', '77833', '77835', '77836', '77845', '77853', '77879', '77964', '77975', '78002', '78003', '78006', '78013', '78028', '78056', '78064', '78070', '78114', '78123', '78130', '78132', '78133', '78154', '78155', '78359', '78382', '78602', '78605', '78606', '78607', '78608', '78609', '78610', '78611', '78612', '78613', '78614', '78615', '78616', '78617', '78619', '78620', '78621', '78623', '78624', '78626', '78628', '78629', '78632', '78633', '78634', '78636', '78638', '78639', '78640', '78641', '78642', '78643', '78644', '78645', '78648', '78650', '78652', '78653', '78654', '78655', '78656', '78657', '78659', '78660', '78662', '78663', '78664', '78665', '78666', '78669', '78672', '78676', '78681', '78699', '78701', '78702', '78703', '78704', '78705', '78717', '78719', '78721', '78722', '78723', '78724', '78725', '78726', '78727', '78728', '78729', '78730', '78731', '78732', '78733', '78734', '78735', '78736', '78737', '78738', '78739', '78741', '78744', '78745', '78746', '78747', '78748', '78749', '78750', '78751', '78752', '78753', '78754', '78756', '78757', '78758', '78759', '78828', '78934', '78940', '78941', '78942', '78945', '78946', '78947', '78948', '78953', '78954', '78956', '78957', '78963', '92536')
ORDER BY
`timestamp_updated` ASC
LIMIT 1000;
I have a compound index on (zip_code,timestamp_updated)
, I obtain results in ~1.6 Seconds. In the next query I am still seraching the same zipcodes but I am using OR instead of IN().
SELECT
`primary_key`
FROM
`texas_parcels`
WHERE
(`zip_code` = '28461' OR `zip_code` = '48227' OR `zip_code` = '60411' OR `zip_code` = '65802' OR `zip_code` = '75215' OR `zip_code` = '75440' OR `zip_code` = '75773' OR `zip_code` = '75783' OR `zip_code` = '76501' OR `zip_code` = '76502' OR `zip_code` = '76504' OR `zip_code` = '76511' OR `zip_code` = '78957' OR `zip_code` = '78963' OR `zip_code` = '92536' OR `zip_code` = '76513' OR `zip_code` = '76518' OR `zip_code` = '76519' OR `zip_code` = '76520' OR `zip_code` = '76522' OR `zip_code` = '76525' OR `zip_code` = '76527' OR `zip_code` = '76528' OR `zip_code` = '76530' OR `zip_code` = '76537' OR `zip_code` = '76539' OR `zip_code` = '76541' OR `zip_code` = '76542' OR `zip_code` = '76543' OR `zip_code` = '76548' OR `zip_code` = '76549' OR `zip_code` = '76550' OR `zip_code` = '76556' OR `zip_code` = '76567' OR `zip_code` = '76571' OR `zip_code` = '76574' OR `zip_code` = '76577' OR `zip_code` = '76578' OR `zip_code` = '76642' OR `zip_code` = '76704' OR `zip_code` = '76853' OR `zip_code` = '77418' OR `zip_code` = '77434' OR `zip_code` = '77474' OR `zip_code` = '77833' OR `zip_code` = '77835' OR `zip_code` = '77836' OR `zip_code` = '77845' OR `zip_code` = '77853' OR `zip_code` = '77879' OR `zip_code` = '77964' OR `zip_code` = '77975' OR `zip_code` = '78002' OR `zip_code` = '78003' OR `zip_code` = '78006' OR `zip_code` = '78013' OR `zip_code` = '78028' OR `zip_code` = '78056' OR `zip_code` = '78064' OR `zip_code` = '78070' OR `zip_code` = '78114' OR `zip_code` = '78123' OR `zip_code` = '78130' OR `zip_code` = '78132' OR `zip_code` = '78133' OR `zip_code` = '78154' OR `zip_code` = '78155' OR `zip_code` = '78359' OR `zip_code` = '78382' OR `zip_code` = '78602' OR `zip_code` = '78605' OR `zip_code` = '78606' OR `zip_code` = '78607' OR `zip_code` = '78608' OR `zip_code` = '78609' OR `zip_code` = '78610' OR `zip_code` = '78611' OR `zip_code` = '78612' OR `zip_code` = '78613' OR `zip_code` = '78614' OR `zip_code` = '78615' OR `zip_code` = '78616' OR `zip_code` = '78617' OR `zip_code` = '78619' OR `zip_code` = '78620' OR `zip_code` = '78621' OR `zip_code` = '78623' OR `zip_code` = '78624' OR `zip_code` = '78626' OR `zip_code` = '78628' OR `zip_code` = '78629' OR `zip_code` = '78632' OR `zip_code` = '78633' OR `zip_code` = '78634' OR `zip_code` = '78636' OR `zip_code` = '78638' OR `zip_code` = '78639' OR `zip_code` = '78640' OR `zip_code` = '78641' OR `zip_code` = '78642' OR `zip_code` = '78643' OR `zip_code` = '78644' OR `zip_code` = '78645' OR `zip_code` = '78648' OR `zip_code` = '78650' OR `zip_code` = '78652' OR `zip_code` = '78653' OR `zip_code` = '78654' OR `zip_code` = '78655' OR `zip_code` = '78656' OR `zip_code` = '78657' OR `zip_code` = '78659' OR `zip_code` = '78660' OR `zip_code` = '78662' OR `zip_code` = '78663' OR `zip_code` = '78664' OR `zip_code` = '78665' OR `zip_code` = '78666' OR `zip_code` = '78669' OR `zip_code` = '78672' OR `zip_code` = '78676' OR `zip_code` = '78681' OR `zip_code` = '78699' OR `zip_code` = '78701' OR `zip_code` = '78702' OR `zip_code` = '78703' OR `zip_code` = '78704' OR `zip_code` = '78705' OR `zip_code` = '78717' OR `zip_code` = '78719' OR `zip_code` = '78721' OR `zip_code` = '78722' OR `zip_code` = '78723' OR `zip_code` = '78724' OR `zip_code` = '78725' OR `zip_code` = '78726' OR `zip_code` = '78727' OR `zip_code` = '78728' OR `zip_code` = '78729' OR `zip_code` = '78730' OR `zip_code` = '78731' OR `zip_code` = '78732' OR `zip_code` = '78733' OR `zip_code` = '78734' OR `zip_code` = '78735' OR `zip_code` = '78736' OR `zip_code` = '78737' OR `zip_code` = '78738' OR `zip_code` = '78739' OR `zip_code` = '78741' OR `zip_code` = '78744' OR `zip_code` = '78745' OR `zip_code` = '78746' OR `zip_code` = '78747' OR `zip_code` = '78748' OR `zip_code` = '78757' OR `zip_code` = '78758' OR `zip_code` = '78759' OR `zip_code` = '78828' OR `zip_code` = '78934' OR `zip_code` = '78940' OR `zip_code` = '78941' OR `zip_code` = '78942' OR `zip_code` = '78945' OR `zip_code` = '78946' OR `zip_code` = '78947' OR `zip_code` = '78948' OR `zip_code` = '78953' OR `zip_code` = '78954' OR `zip_code` = '78956')
ORDER BY
`timestamp_updated` ASC
LIMIT 1000;
This second query obtains the same results in the same order in ~7.8 Seconds. When running each query through an explain, they are nearly the same, they give me a slightly different rows
amount.
id select_type table type possible_keys key key_len ref rows filtered Extra
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 SIMPLE TX_Property range Zip Code Search Zip Code Search 15 (NULL) 2402699 99.88 Using where; Using index; Using filesort
2 SIMPLE TX_Property range Zip Code Search Zip Code Search 15 (NULL) 2321908 99.91 Using where; Using index; Using filesort
When Profiling the two queries the only major increase in time is the Sorting Result
time, in the second query it took up to 7.2 Seconds.
I guess what I don't understand is how the different operator in the where could make such a huge difference when it comes to order, it would make sense if there was a major difference in time for executing? Maybe I am just not exactly sure how profiling works, and it actually is just the time of executing the where portion, but it is just labeled in a convoluted way?
I also wanted to note that when I ran the queries without the ORDER BY timestamp_updated ASC
the top query took ~0.106 Seconds and the second query took ~0.157 Seconds.
回答1:
Removing the ORDER BY
runs so much faster because it can stop after only 1000 rows. How many rows match that OR/IN?
Notice that the EXPLAINs
say that the query is Using index
. That means that you have a "covering" index. That is all the fields in the SELECT
are in the one index.
In InnoDB, each secondary key implicitly includes the PK, so INDEX(zip_code, timestamp_updated)
is effectively INDEX(zip_code, timestamp_updated, primaryKey)
The index is not very efficient since you have two non-trivial things going on: (1) IN or OR, (2) ORDER BY. Only one or the other can be handled by the index. Your index lets it use zip_code
. It
- finds the rows in the index that match any of those zipcodes,
- gathers the timestamp and pk, putting the 3 columns in a tmp table
- sorts
- delivers the first 1000.
If, instead, you said INDEX(timestamp_updated, zip_code)
you would still have a 'covering' index, but in this flavor, the index would (I hope) prevent the need for the SORT. Oh, given that, it might be able to stop after 1000 rows. Here's how it will work:
- Scan through the index in timestamp order.
- Check each row for being one of those zips. (Here the test might be faster in IN format)
- If match, deliver row; if 1000, stop.
But wait... Now you are at the mercy of the 12M rows. If 1000 rows with those zips occur early (old timestamps), it can stop fast. If you need to check all the rows to find 1000 (or there aren't even 1000), then it is a full index scan, and this arrangement of the index is 'bad'.
If you give the optimizer both INDEXes
, it will dutifully make an intelligent choice based on inadequate information (no distribution of the values), and might pick the worse one.
You effectively need a 2-dimensional index. Such don't exist. (Well, maybe Spatial could be kludged?) But...
PARTITION BY RANGE(timestamp)
together with the INDEX
starting with zip might work better. But I doubt if the optimizer is smart enough to realize that if it found 1000 rows in the first partition it could quit. And it still fails badly if there aren't 1000 results.
PARTITION BY RANGE(zip)
together with the INDEX
starting with timestamp probably will not help, since that many zips won't do much pruning.
Please provide EXPLAIN FORMAT=JSON SELECT...;
for each of your attempts. There may be some subtle clues there to explain the wide time variations.
Did you run each timing twice? (Otherwise, caching may have colored the results.)
Another approach
I do not know how well this will perform, but here goes:
SELECT primary_key
FROM (
( SELECT primary_key, timestamp_updated
FROM texas_parcels
WHERE zip_code = '28461'
ORDER BY timestamp_updated
LIMIT 1000
)
UNION ALL (
SELECT primary_key, timestamp_updated
FROM texas_parcels
WHERE zip_code = '48227'
ORDER BY timestamp_updated
LIMIT 1000
)
UNION ALL (
SELECT primary_key, timestamp_updated
FROM texas_parcels
WHERE zip_code = '60411'
ORDER BY timestamp_updated
LIMIT 1000 ) ...
) x
ORDER BY timestamp_updated
LIMIT 1000
It seems like x
will have only a few hundred thousand rows, not 1.3M. But UNION
has some overhead, etc. Note the LIMIT
in each subquery and on the outside. If you need OFFSET
, too, it gets trickier.
回答2:
You have a pretty long list of zip codes that you are comparing. MySQL has an optimization that affects why the execution time without the order by
is a bit different. With a list of constants, MySQL sorts the list and does a binary search. I could see this explaining the last results.
With the order by
, I am not sure. The actual execution might be affected by other things running on the server. Do you know if anything else is running?
回答3:
The MYSQL has an optimization , your select when you use OR the number of comparisons increases.
来源:https://stackoverflow.com/questions/32256402/sql-performance-using-or-is-slower-than-in-when-using-order-by