Why is my multi-column query dramatically slower than the corresponding single-column queries, even with a multi-column index?

Posted by 有些话、适合烂在心里 on 2019-12-23 03:25:14

Question


I have the following query:

SELECT * 
from stop_times 
WHERE (departure_time BETWEEN '02:41' AND '05:41' 
       OR departure_time BETWEEN '26:41' AND '29:41') 
    AND stop_times.stop_id IN(51511,51509,51508,51510,6,53851,51522,51533)

that returns 134 rows in ~800ms. If I split it:

SELECT * 
from stop_times 
WHERE (departure_time BETWEEN '02:41' AND '05:41' 
       OR departure_time BETWEEN '26:41' AND '29:41')

returns ~110k rows in ~10ms and

SELECT * 
from stop_times 
WHERE stop_times.stop_id IN(51511,51509,51508,51510,6,53851,51522,51533)

returns ~5k rows in ~100ms.

I tried both a multi-column index (departure_time and stop_id) and 2 separate indexes, but in either case the first query can't seem to take less than ~800ms. My stop_times table has about 3.5M rows. Is there anything I'm missing that would significantly speed up that first query?

UPDATE 1: SHOW CREATE TABLE:

CREATE TABLE `stop_times` (
  `trip_id` varchar(20) DEFAULT NULL,
  `departure_time` time DEFAULT NULL,
  `stop_id` varchar(20) DEFAULT NULL,
  KEY `index_stop_times_on_trip_id` (`trip_id`),
  KEY `index_stop_times_on_departure_time_and_stop_id` (`departure_time`,`stop_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

stop_id and trip_id being varchars instead of integers is beyond my control unfortunately...

UPDATE 2: EXPLAIN for departure_time, stop_id multi-column index:

select_type: SIMPLE
type: range
rows: 239084

EXPLAIN for stop_id, departure_time multi-column index:

select_type: SIMPLE
type: range
rows: 141

UPDATE 3: EXPLAIN for IN(51511,51509,51508,51510,6,53851,51522,51533)

select_type: SIMPLE
type: ALL
rows: 3556973 (lol)

EXPLAIN for IN("51511","51509","51508","51510","6","53851","51522","51533")

select_type: SIMPLE
type: range
rows: 141
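
The difference between the two plans in UPDATE 3 most likely comes from stop_id being a varchar column: when a string column is compared with numeric literals, MySQL has to convert values for the comparison and cannot look them up in the index. Below is a minimal Python sketch of one way to avoid that when the query is built in application code. It assumes a DB-API style driver that uses %s placeholders, and the helper name build_stop_times_query is hypothetical, not something from the original post.

```python
def build_stop_times_query(stop_ids, windows):
    """Return (sql, params) for the original query, with stop IDs bound as strings.

    stop_ids: iterable of stop IDs (ints or strings)
    windows:  list of (start, end) departure-time strings, e.g. ("02:41", "05:41")
    """
    # stop_id is a varchar column, so bind strings rather than integers.
    stop_ids = [str(s) for s in stop_ids]
    if not stop_ids or not windows:
        raise ValueError("need at least one stop ID and one time window")

    id_marks = ", ".join(["%s"] * len(stop_ids))
    time_clause = " OR ".join(["departure_time BETWEEN %s AND %s"] * len(windows))
    sql = (
        "SELECT * FROM stop_times "
        f"WHERE ({time_clause}) "
        f"AND stop_id IN ({id_marks})"
    )
    # Parameter order follows placeholder order: time bounds first, then IDs.
    params = [bound for window in windows for bound in window] + stop_ids
    return sql, params
```

The returned pair would be passed to something like cursor.execute(sql, params); the driver then sends the IDs as quoted strings, which matches the second EXPLAIN above.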

Answer 1:


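Did you create an index on (stop_id, departure_time)? Because an index on (departure_time, stop_id) will do next to nothing here: the range condition on the leading departure_time column stops the stop_id part of the index from narrowing the scan.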

This is a really hard one - it has every possible bad thing for dealing with indexes :(

You have a range, an OR and a non-contiguous IN - it doesn't get worse than that.

Try (stop_id, departure_time), and if that doesn't help then there is not much you can do short of switching to PostgreSQL.


You can also try rewriting the query as:

-- stop_id is a varchar, so the IDs are quoted (see UPDATE 3 in the question)
SELECT *
FROM stop_times
WHERE ( stop_times.stop_id IN('51511','51509','51508','51510','6','53851','51522','51533')
      AND departure_time BETWEEN '02:41' AND '05:41'
      )
   OR ( stop_times.stop_id IN('51511','51509','51508','51510','6','53851','51522','51533')
      AND departure_time BETWEEN '26:41' AND '29:41'
      )

or:

-- stop_id is a varchar, so the IDs are quoted (see UPDATE 3 in the question)
SELECT *
FROM stop_times
WHERE ( stop_times.stop_id IN('51511','51509','51508','51510','6','53851','51522','51533')
      AND departure_time BETWEEN '02:41' AND '05:41'
      )
UNION ALL
SELECT *
FROM stop_times
WHERE ( stop_times.stop_id IN('51511','51509','51508','51510','6','53851','51522','51533')
      AND departure_time BETWEEN '26:41' AND '29:41'
      )



Answer 2:


There is one possibility you could try: prepare a list of all the times that fall within either range first, and then put them together in one large IN clause. It may look horrible, but it removes the OR condition, which isn't helping your query... And you should be able to build the IN string using your favourite programming language :)

WHERE departure_time IN ('02:41','02:42','02:43', ... '26:41','26:42','26:43', ... etc )

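Your query contains two blocks of three hours, which equates to roughly 6 * 60 = 360 entries in the IN clause (362 if you count both inclusive endpoints of each range). Note that this only matches the BETWEEN version if the departure times are stored to the minute, with no seconds.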

Worth a try at least...
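
Building that IN list by hand would be tedious, so here is a small Python sketch of how it might be generated. It assumes departure times are stored at whole-minute granularity (no seconds), that the query uses %s placeholders, and that hours above 24 such as '26:41' should be kept as-is; minute_range and build_time_in_clause are hypothetical names.

```python
def minute_range(start, end):
    """Yield 'HH:MM' strings for every minute from start to end, inclusive.

    Hours are not wrapped at 24, so '26:41' stays '26:41'.
    """
    start_h, start_m = map(int, start.split(":"))
    end_h, end_m = map(int, end.split(":"))
    first = start_h * 60 + start_m
    last = end_h * 60 + end_m
    for total in range(first, last + 1):
        yield f"{total // 60:02d}:{total % 60:02d}"


def build_time_in_clause(windows):
    """Return (clause, params) listing every minute in the given windows."""
    times = [t for start, end in windows for t in minute_range(start, end)]
    placeholders = ", ".join(["%s"] * len(times))
    return f"departure_time IN ({placeholders})", times
```

For the two windows in the question this produces 362 values, since both ends of each BETWEEN range are included.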



Source: https://stackoverflow.com/questions/8223765/why-is-my-multi-column-query-dramatically-slower-than-the-corresponding-single-c
