A SQL query searching for rows that satisfy Column1 <= X <= Column2 is very slow

盖世英雄少女心 2021-01-11 16:27

I am using a MySQL DB, and have the following table:

CREATE TABLE SomeTable (
  PrimaryKeyCol BIGINT(20) NOT NULL,
  A BIGINT(20) NOT NULL,
  FirstX INT(11) N         


        
12 answers
  •  甜味超标
    2021-01-11 17:06

    Indexes will not help you in this scenario, except for a small percentage of all possible values of X.

    Let's say, for example, that:

    • FirstX contains values from 1 to 1000 evenly distributed
    • LastX contains values from 1 to 1042 evenly distributed

    And you have the following indexes:

    1. (FirstX, LastX)
    2. (LastX, FirstX)
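
The two indexes in this example could be created as follows. This is only a minimal sketch: the index names idx_first_last and idx_last_first are hypothetical, and the table is assumed to be the testdata table used later in this answer.

```sql
-- Hypothetical index names; the column order matches the two indexes listed above.
ALTER TABLE testdata
  ADD INDEX idx_first_last (FirstX, LastX),
  ADD INDEX idx_last_first (LastX, FirstX);
```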

    Now:

    • If X is 50, the clause FirstX <= 50 matches approximately 5% of the rows, while LastX >= 50 matches approximately 95% of the rows. MySQL will use the first index.

    • If X is 990, the clause FirstX <= 990 matches approximately 99% of the rows, while LastX >= 990 matches approximately 5% of the rows. MySQL will use the second index.

    • Any X between these two will cause MySQL to use neither index (I don't know the exact threshold, but 5% worked in my tests). Even if MySQL does use an index, there are simply too many matches, and the index will most likely be used as a covering index rather than for seeking.
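
To see which of these cases a given X falls into, one option is to measure how selective each half of the predicate is and then ask MySQL for its plan. The following is a sketch, assuming the testdata table with the two composite indexes from the example; the column aliases are illustrative only.

```sql
-- Fraction of rows matched by each half of the predicate for X = 500.
-- In MySQL a comparison evaluates to 1 or 0, so SUM() counts the matching rows.
SELECT
  SUM(FirstX <= 500) / COUNT(*) AS first_fraction,
  SUM(LastX  >= 500) / COUNT(*) AS last_fraction
FROM testdata;

-- Show the plan the optimizer chooses for the same X.
-- In the output, the "key" column names the chosen index (NULL means none was used).
EXPLAIN
SELECT *
FROM testdata
WHERE FirstX <= 500
  AND LastX  >= 500;
```

If both fractions are large, neither index narrows the search much, which matches the behaviour described in the bullets above.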

    Your solution is the best. What you are doing is defining the upper and lower bounds of a "range" search:

    WHERE FirstX <= 500      -- 500 is the middle (worst case) value
    AND   FirstX >= 500 - 42 -- range matches approximately 4.3% rows
    AND   ...
    

    In theory, this should work even when you search FirstX for values in the middle. Having said that, you got lucky with the 4200000 value, possibly because the maximum difference between FirstX and LastX is a smaller percentage.
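
Written out in full, the bounded search might look like the sketch below. It assumes the testdata table and a maximum range width of 42, as in the example distribution; the user variables @x and @max_width are names chosen here for illustration.

```sql
SET @x = 500;          -- value being searched for
SET @max_width = 42;   -- assumed maximum of (LastX - FirstX) over all rows

SELECT *
FROM testdata
WHERE FirstX <= @x
  AND FirstX >= @x - @max_width  -- lower bound keeps the scanned FirstX range narrow
  AND LastX  >= @x;
```

The extra condition is safe only if @max_width is at least the true maximum width: any row with FirstX <= X <= LastX then also has FirstX >= X - @max_width. If the bound is too small, matching rows would be silently dropped.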


    If it helps, you can do the following after loading the data:

    ALTER TABLE testdata ADD COLUMN delta INT NOT NULL;
    UPDATE testdata SET delta = LastX - FirstX;
    ALTER TABLE testdata ADD INDEX delta (delta);
    

    This makes selecting MAX(LastX - FirstX) easier.
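
With the delta column in place, the maximum width can be read once and reused as the lower bound of the search. This is a sketch under the same assumptions as the statements above (the testdata table and the delta index); @max_width and @x are illustrative variable names.

```sql
-- Widest range in the table; with the index on delta this should not require a full scan.
SELECT MAX(delta) INTO @max_width FROM testdata;

-- Use the measured width to bound FirstX from below.
SET @x = 500;

SELECT *
FROM testdata
WHERE FirstX BETWEEN @x - @max_width AND @x
  AND LastX >= @x;
```

Note that delta is filled in by a one-off UPDATE here, so it would have to be kept up to date if rows are inserted or changed afterwards.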


    I also tested MySQL spatial indexes, which could be used in this scenario. Unfortunately, I found that they were slower and come with many constraints.
