MySQL query optimization of LIKE term% ORDER BY int

白昼怎懂夜的黑 提交于 2019-12-20 02:26:20

问题


My question is regarding the handling of MySQL index on VARCHAR combined with an int COLUMN when using prefix matching. e.g. if I have such a query:

SELECT * FROM tbl WHERE name LIKE 'query%' ORDER BY weight DESC LIMIT 5

Considering I have one index one name->weight, does that index need to find all apperances of the prefix query and then ORDER BY, or does he keeps the cross calculation indexed even with the use of prefix matching (%). I'm troubled by it, because for popular names (e.g. query=john) I might find myself searching for a long time for all appearances of john, and that will make the limit useless and the query to become slow as I'm dealing with a large dataset.


回答1:


You asked another question "Creating an Index that is best for wildcard search through 40Million names". Okay, you have 40 million records.

Now consider following formula:

x = COUNT(DISTINCT values in a column) / COUNT(values in a column)

An index on a column is much the better, the nearer x is to 1. If it is 1, all values are distinct, there are no duplicates and an index is therefore quite fast.

Now you are looking for 'john%'. That's 4 letters and an open end. Which letters is not important, your DB has to deal with 26*26*26*26=456976 distinct values. Put that in above formula and your 40 million records. You get an x of 0,0114244.

I don't know what's the threshold again, but IIRC it's 0,1 or something. So, if you're x is above 0,1 the index is used, if it's lower, it's not.

Why is that so? Using an index can even slow things down, cause your DB has to look at the index, see in that index on which position on your physical hard drive the appropriate record is and then get that record. Therefore, when x is below 10% it's faster just to do a whole table scan.

To summarize: Filtering 40 million records with only one weak index like yours is simply useless.




回答2:


Provided that 'query' is of equal or shorter length than the indexed prefix of name:

  1. A composite BTREE index on (name, weight) will be ordered by name then weight. Conceptually:

    +---------+--------+---------+
    | name(7) | weight | address |
    +---------+--------+---------+
    | queryaa |    500 | 0x1.... |
    | queryaa |    500 | 0xe.... |
    | queryaa |    498 | 0x8.... |
    | queryaa |    491 | 0xb.... |
    | queryaa |    486 | 0xc.... |
    | queryaa |    430 | 0x3.... |
    | queryab |    600 | 0x2.... |
    | queryab |    592 | 0x7.... |
    | queryab |    550 | 0x4.... |
    | queryab |    321 | 0xa.... |
    | queryab |    321 | 0x6.... |
    | queryab |    304 | 0x9.... |
    | queryab |    297 | 0x5.... |
    | querybc |    800 | 0xd.... |
    :         :        :         :
    
  2. MySQL can very quickly traverse such an index to find the top 5 weights for each indexed prefix within the range defined by the filter name LIKE 'query%' (I'm not certain that it does this step, but I'd be surprised if it did not):

    +---------+--------+---------+
    | name(7) | weight | address |
    +---------+--------+---------+
    | queryaa |    500 | 0x1.... |
    | queryaa |    500 | 0xe.... |
    | queryaa |    498 | 0x8.... |
    | queryaa |    491 | 0xb.... |
    | queryaa |    486 | 0xc.... |
    | queryab |    600 | 0x2.... |
    | queryab |    592 | 0x7.... |
    | queryab |    550 | 0x4.... |
    | queryab |    321 | 0xa.... |
    | queryab |    321 | 0x6.... |
    | querybc |    800 | 0xd.... |
    :         :        :         :
    
  3. At this point, MySQL must perform a filesort on the results:

    +---------+--------+---------+
    | name(7) | weight | address |
    +---------+--------+---------+
    | querybc |    800 | 0xd.... |
    | queryab |    600 | 0x2.... |
    | queryab |    592 | 0x7.... |
    | queryab |    550 | 0x4.... |
    | queryaa |    500 | 0x1.... |
    | queryaa |    500 | 0xe.... |
    | queryaa |    498 | 0x8.... |
    | queryaa |    491 | 0xb.... |
    | queryaa |    486 | 0xc.... |
    | queryab |    321 | 0xa.... |
    | queryab |    321 | 0x6.... |
    :         :        :         :
    
  4. And only then can it use the top 5 results to fetch the associated records from the table:

    +---------+--------+---------+
    | name(7) | weight | address |
    +---------+--------+---------+
    | querybc |    800 | 0xd.... |  --> fetch from table
    | queryab |    600 | 0x2.... |  --> fetch from table
    | queryab |    592 | 0x7.... |  --> fetch from table
    | queryab |    550 | 0x4.... |  --> fetch from table
    | queryaa |    500 | 0x1.... |  --> fetch from table
    +---------+--------+---------+
    

If 'query' is longer than the indexed prefix of name, then MySQL must perform lookups into the table in step 1 above in order to adequately filter the records which are subsequently ordered.



来源:https://stackoverflow.com/questions/12296258/mysql-query-optimization-of-like-term-order-by-int

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!