MySQL not using index?

终归单人心 2020-12-19 23:25

I have a table with columns like word, A_, E_, U_, and so on. The X_ columns are tinyints holding the number of times the specific letter occurs in the word (to later h

2 answers
  •  清歌不尽
    2020-12-20 00:23

    You're asking about the backend query optimizer. In particular you're asking: "how does it choose an access path? Why index here but tablescan there?"

    Let's think about that optimizer. What is it optimizing? Elapsed time, in expectation. It has a model for how long sequential reads and random reads take, and for query selectivity, that is, expected number of rows returned by a query. From several alternative access paths it chooses the one that appears to require the least elapsed time.
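
    To make that comparison concrete, here is a toy cost model in Python. It is only a sketch of the idea, not MySQL's actual cost model: the function names, the 4x penalty for a random read, and the table sizes are all assumptions picked for illustration.

```python
def table_scan_cost(data_blocks, seq_read_cost=1.0):
    """Cost of reading every data block once, sequentially."""
    return data_blocks * seq_read_cost


def index_path_cost(total_rows, selectivity, index_blocks,
                    seq_read_cost=1.0, random_read_cost=4.0, covering=False):
    """Cost of a range scan on a secondary index, plus row lookups if needed."""
    # Reading the matching slice of the index is treated as roughly sequential.
    cost = index_blocks * selectivity * seq_read_cost
    if not covering:
        # Pessimistic assumption: one random read per matching row.
        cost += total_rows * selectivity * random_read_cost
    return cost


def choose_access_path(total_rows, rows_per_block, index_blocks,
                       selectivity, covering=False):
    """Pick whichever path has the lower estimated cost."""
    data_blocks = total_rows / rows_per_block
    scan = table_scan_cost(data_blocks)
    index = index_path_cost(total_rows, selectivity, index_blocks,
                            covering=covering)
    return "index" if index < scan else "table scan"


# Hypothetical table: 1,000,000 rows, 100 rows per block, 2,000 index blocks.
selective = choose_access_path(1_000_000, 100, 2_000, selectivity=0.001)
unselective = choose_access_path(1_000_000, 100, 2_000, selectivity=0.25)
```

    With these made-up numbers the selective predicate picks "index" and the unselective one picks "table scan", which is the same shape of decision the real optimizer makes with its own statistics.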

    Your id > 250000 query had a few things going for it:

    1. good selectivity, so less than 1% of rows will appear in the result set
    2. id is the Primary Key, so all columns are immediately available upon navigating to the right place in the btree

    This led the optimizer to estimate an expected elapsed time for the indexed access path that was much smaller than the expected time for a table scan.

    On the other hand, your u_ > 0 query has very poor selectivity, dragging nearly a quarter of the rows into the result set. Additionally, the index is not a covering index for your SELECT * request, which copies every column value into the result set. So the optimizer predicts it will have to read a quarter of the index blocks, and then essentially all of the data row blocks that they point to. Remember that multiple rows often fit within a single disk block, or within a single read request, so when a quarter of the rows match, nearly every block holds at least one match.

    Compared to a table scan, we'd have to read more blocks from disk, and they would be random reads instead of sequential reads. Both of those argue against using the index, so the table scan was selected because it was cheapest. We would call it a pessimizer if it always chose the indexed access path, even in cases where indexed disk I/O would take longer.
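
    The claim that a quarter of the rows drags in essentially all of the data blocks can be checked with a little arithmetic. The sketch below assumes matching rows are spread uniformly and independently across blocks, and the figure of 50 rows per block is a made-up example, not something measured on your table.

```python
def fraction_of_blocks_touched(selectivity, rows_per_block):
    """Expected share of data blocks holding at least one matching row.

    A block is skipped only if none of its rows match, which happens
    with probability (1 - selectivity) ** rows_per_block.
    """
    return 1.0 - (1.0 - selectivity) ** rows_per_block


# Hypothetical packing of 50 rows per block.
poor = fraction_of_blocks_touched(0.25, 50)    # predicate like u_ > 0
good = fraction_of_blocks_touched(0.001, 50)   # highly selective predicate
print(f"selectivity 25%:  {poor:.4%} of blocks touched")
print(f"selectivity 0.1%: {good:.4%} of blocks touched")
```

    Under those assumptions the 25% predicate touches practically every block, while the 0.1% predicate touches only a few percent of them.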

    Summary advice

    Use an index on a single column when your queries have good selectivity, returning much less than 1% of a relation's rows. Use a covering index when your queries have poor selectivity and you're willing to make a space vs. time tradeoff.
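
    If you want to see what "covering" means in practice, here is a small runnable sketch. Note that it uses Python's built-in sqlite3 module rather than MySQL, purely so it runs anywhere, and the table name words, the index name idx_u_word, and the sample rows are assumptions modelled on your description. In MySQL the equivalent check is running EXPLAIN and looking for "Using index" in the Extra column.

```python
import sqlite3


def plan_for(conn, sql):
    """Return the detail strings from SQLite's EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE words ("
    "id INTEGER PRIMARY KEY, word TEXT, a_ INTEGER, e_ INTEGER, u_ INTEGER)"
)
conn.executemany(
    "INSERT INTO words (word, a_, e_, u_) VALUES (?, ?, ?, ?)",
    [("queue", 0, 2, 2), ("banana", 3, 0, 0), ("under", 0, 1, 1)],
)

# Composite index: u_ for the WHERE clause, word so the query never
# has to visit the table rows themselves.
conn.execute("CREATE INDEX idx_u_word ON words (u_, word)")

# Only columns stored in the index are requested, so the index covers the query.
covered_plan = plan_for(conn, "SELECT word FROM words WHERE u_ > 0")

# SELECT * also needs a_ and e_, which the index does not hold.
star_plan = plan_for(conn, "SELECT * FROM words WHERE u_ > 0")

print(covered_plan)
print(star_plan)
```

    The space vs. time tradeoff is visible in the index definition: every extra column you add so the index covers a query is stored a second time inside the index.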
