Question
My question is about how MySQL handles a composite index on a VARCHAR column and an INT column when the query uses prefix matching. For example, given a query like this:
SELECT * FROM tbl WHERE name LIKE 'query%' ORDER BY weight DESC LIMIT 5
Assuming I have a single index on (name, weight), does MySQL have to find all occurrences of the prefix query and then apply the ORDER BY, or does it keep the combined ordering usable even with prefix matching (%)? This worries me because for popular names (e.g. query=john) it might spend a long time searching for all occurrences of john, which would make the LIMIT useless and the query slow, as I'm dealing with a large dataset.
Answer 1:
You asked another question "Creating an Index that is best for wildcard search through 40Million names". Okay, you have 40 million records.
Now consider the following formula:
x = COUNT(DISTINCT values in a column) / COUNT(values in a column)
The closer x is to 1, the better an index on that column works. If it is 1, all values are distinct, there are no duplicates, and an index is therefore quite fast.
Now you are looking for 'john%'. That's 4 letters and an open end. Which letters they are is not important; your DB has to deal with 26*26*26*26 = 456976 distinct values. Put that into the formula above together with your 40 million records and you get an x of 0.0114244.
I don't remember the exact threshold, but IIRC it's 0.1 or something like that. So if your x is above 0.1 the index is used; if it's lower, it's not.
Why is that so? Using an index can even slow things down, because your DB has to look at the index, find in it the position on your physical hard drive where the matching record is stored, and then fetch that record. Therefore, when x is below 10%, it's faster just to do a full table scan.
To summarize: Filtering 40 million records with only one weak index like yours is simply useless.
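The arithmetic behind that estimate can be reproduced with a short Python sketch. The function name `selectivity` is hypothetical, the 26-letter alphabet is the answer's own simplification, and the 0.1 threshold is only what the answer recalls, not a verified MySQL constant.

```python
def selectivity(distinct_values: int, total_rows: int) -> float:
    """Ratio of distinct values to total rows (the x in the formula above)."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    return distinct_values / total_rows


# Assumptions taken from the answer: a 4-letter prefix over 26 letters
# and a table of 40 million rows.
PREFIX_LENGTH = 4
ALPHABET_SIZE = 26
TOTAL_ROWS = 40_000_000

# Threshold as recalled (not verified) by the answer's author.
RECALLED_THRESHOLD = 0.1

distinct_prefixes = ALPHABET_SIZE ** PREFIX_LENGTH
x = selectivity(distinct_prefixes, TOTAL_ROWS)

print(f"distinct prefixes: {distinct_prefixes}")
print(f"x = {x:.7f}")
print(f"above recalled threshold: {x > RECALLED_THRESHOLD}")
```

Under these assumptions x falls well below the recalled threshold, which is the basis for the answer's conclusion.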
Answer 2:
Provided that 'query' is of equal or shorter length than the indexed prefix of name:
A composite BTREE index on (name, weight) will be ordered by name then weight. Conceptually:
+---------+--------+---------+
| name(7) | weight | address |
+---------+--------+---------+
| queryaa |    500 | 0x1.... |
| queryaa |    500 | 0xe.... |
| queryaa |    498 | 0x8.... |
| queryaa |    491 | 0xb.... |
| queryaa |    486 | 0xc.... |
| queryaa |    430 | 0x3.... |
| queryab |    600 | 0x2.... |
| queryab |    592 | 0x7.... |
| queryab |    550 | 0x4.... |
| queryab |    321 | 0xa.... |
| queryab |    321 | 0x6.... |
| queryab |    304 | 0x9.... |
| queryab |    297 | 0x5.... |
| querybc |    800 | 0xd.... |
:         :        :         :
MySQL can very quickly traverse such an index to find the top 5 weights for each indexed prefix within the range defined by the filter name LIKE 'query%' (I'm not certain that it does this step, but I'd be surprised if it did not):
+---------+--------+---------+
| name(7) | weight | address |
+---------+--------+---------+
| queryaa |    500 | 0x1.... |
| queryaa |    500 | 0xe.... |
| queryaa |    498 | 0x8.... |
| queryaa |    491 | 0xb.... |
| queryaa |    486 | 0xc.... |
| queryab |    600 | 0x2.... |
| queryab |    592 | 0x7.... |
| queryab |    550 | 0x4.... |
| queryab |    321 | 0xa.... |
| queryab |    321 | 0x6.... |
| querybc |    800 | 0xd.... |
:         :        :         :
At this point, MySQL must perform a filesort on the results:
+---------+--------+---------+
| name(7) | weight | address |
+---------+--------+---------+
| querybc |    800 | 0xd.... |
| queryab |    600 | 0x2.... |
| queryab |    592 | 0x7.... |
| queryab |    550 | 0x4.... |
| queryaa |    500 | 0x1.... |
| queryaa |    500 | 0xe.... |
| queryaa |    498 | 0x8.... |
| queryaa |    491 | 0xb.... |
| queryaa |    486 | 0xc.... |
| queryab |    321 | 0xa.... |
| queryab |    321 | 0x6.... |
:         :        :         :
And only then can it use the top 5 results to fetch the associated records from the table:
+---------+--------+---------+
| name(7) | weight | address |
+---------+--------+---------+
| querybc |    800 | 0xd.... | --> fetch from table
| queryab |    600 | 0x2.... | --> fetch from table
| queryab |    592 | 0x7.... | --> fetch from table
| queryab |    550 | 0x4.... | --> fetch from table
| queryaa |    500 | 0x1.... | --> fetch from table
+---------+--------+---------+
If 'query' is longer than the indexed prefix of name, then MySQL must perform lookups into the table in step 1 above in order to adequately filter the records which are subsequently ordered.
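The conceptual steps in this answer (range scan on the name prefix, then a sort by weight, then taking the top 5) can be modelled with a small Python sketch. This is only an illustration of the idea, not MySQL's actual implementation: the function name `top_n_by_prefix` is hypothetical, the 'paul' and 'robert' rows are invented to show the range boundaries, the address strings are kept as the elided placeholders from the tables, and the optional per-prefix pruning step is skipped.

```python
import bisect
import heapq

# Index entries as (name_prefix, weight, address), mirroring the first table.
# The 'paul' and 'robert' rows are hypothetical additions outside the range.
ROWS = [
    ("queryaa", 500, "0x1...."), ("queryaa", 500, "0xe...."),
    ("queryaa", 498, "0x8...."), ("queryaa", 491, "0xb...."),
    ("queryaa", 486, "0xc...."), ("queryaa", 430, "0x3...."),
    ("queryab", 600, "0x2...."), ("queryab", 592, "0x7...."),
    ("queryab", 550, "0x4...."), ("queryab", 321, "0xa...."),
    ("queryab", 321, "0x6...."), ("queryab", 304, "0x9...."),
    ("queryab", 297, "0x5...."), ("querybc", 800, "0xd...."),
    ("paul", 900, "hypothetical"), ("robert", 950, "hypothetical"),
]

# Ordered by name, then by weight descending, as drawn in the tables.
INDEX = sorted(ROWS, key=lambda entry: (entry[0], -entry[1]))


def top_n_by_prefix(index, prefix, n):
    """Return the n heaviest entries whose name starts with prefix."""
    # Range scan: binary-search for the first entry with name >= prefix.
    # The 1-tuple (prefix,) compares against each entry by name first.
    start = bisect.bisect_left(index, (prefix,))
    matches = []
    for entry in index[start:]:
        if not entry[0].startswith(prefix):
            break  # names are ordered, so the matching range has ended
        matches.append(entry)
    # Sort step: weights are only ordered within each name, so every match
    # in the range must be considered before the top n can be known.
    return heapq.nlargest(n, matches, key=lambda entry: entry[1])


for name, weight, address in top_n_by_prefix(INDEX, "query", 5):
    print(name, weight, address)
```

The sketch makes the asker's concern concrete: the binary search finds the start of the range quickly, but the number of entries fed into the sort grows with the number of names matching the prefix, regardless of the LIMIT.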
Source: https://stackoverflow.com/questions/12296258/mysql-query-optimization-of-like-term-order-by-int