MySQL index cardinality - performance vs storage efficiency

孤街浪徒 2020-12-23 14:47

Say you have a MySQL 5.0 MyISAM table with 100 million rows, with one index (other than primary key) on two integer columns.

From my admittedly poor understanding of

1 Answer
  • 2020-12-23 15:40

    Whereas a higher cardinality means less efficient storage, but faster read performance, because it has to navigate through less branches to get to whatever data it is looking for to narrow down the rows for the query.

    Higher cardinality means better read performance because, by definition, there are fewer records to read for any given key value.

    To process a query like this:

    SELECT  *
    FROM    mytable
    WHERE   indexed_col = @myvalue
    

    , the engine has to perform the following steps:

    1. Find the first entry satisfying the condition.

      This is done traversing the B-Tree, starting from the root entry.

      Across the pages, the search is performed by following B-Tree links; within a page, the search is performed using binary search (unless your keys are compressed, in which case it's a linear search).

      This algorithm has the same efficiency for both high-cardinality and low-cardinality columns. Finding the first 3 (as opposed to any 3) in these lists:

      1  2  3  4  5  6  7  8  9  10
      
      3  3  3  3  3  3  3  3  4  4
      

      requires the same O(log(n)) steps.

    2. Traverse the index until the key value changes. This, of course, requires linear time: the more matching records you have, the more entries you need to traverse.
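
    The two steps above can be sketched in Python. This is only an illustration, not how MyISAM is implemented: a sorted list stands in for the leaf level of the B-Tree, `bisect_left` stands in for the tree descent, and the function name `lookup` is made up for this example.

```python
from bisect import bisect_left

def lookup(index, key):
    """Return (position of the first match, list of all matching entries)."""
    # Step 1: find the first entry >= key with a binary search, O(log n).
    first = bisect_left(index, key)
    # Step 2: walk forward until the key value changes, O(number of matches).
    end = first
    while end < len(index) and index[end] == key:
        end += 1
    return first, index[first:end]

# The two lists from the answer above.
high_cardinality = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
low_cardinality = [3, 3, 3, 3, 3, 3, 3, 3, 4, 4]

print(lookup(high_cardinality, 3))  # (2, [3])
print(lookup(low_cardinality, 3))   # (0, [3, 3, 3, 3, 3, 3, 3, 3])
```

    Step 1 costs the same for both lists; only step 2 grows with the number of entries that share the key.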

    If you only need the first record:

    SELECT  *
    FROM    mytable
    WHERE   indexed_col = @myvalue
    LIMIT 1
    

    , the column cardinality does not affect read performance.
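
    A rough Python sketch of why LIMIT 1 removes the linear part, under the same assumption that a sorted list models the index. The names `scan` and `first_match` are hypothetical; `visited` only exists to count how many index entries were touched.

```python
from bisect import bisect_left
from itertools import islice

def scan(index, key, visited):
    """Lazily yield matching entries, recording every entry that is touched."""
    pos = bisect_left(index, key)          # O(log n) descent to the first match
    while pos < len(index):
        visited.append(index[pos])
        if index[pos] != key:              # key value changed: stop the scan
            break
        yield index[pos]
        pos += 1

def first_match(index, key):
    """Emulate LIMIT 1: take one row, report how many entries were touched."""
    visited = []
    rows = list(islice(scan(index, key, visited), 1))
    return rows, len(visited)

print(first_match([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3))  # ([3], 1)
print(first_match([3, 3, 3, 3, 3, 3, 3, 3, 4, 4], 3))   # ([3], 1)
```

    In both cases exactly one entry is touched after the search, regardless of how many duplicates follow it.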

    How does cardinality affect write performance?

    Each index key has a hidden additional value: a record pointer. This is the whole point of having an index: you need to know which record it points to.

    Since a record pointer, by definition, is unique, each index key is unique too. The index entries sharing the same key value are sorted by the record pointer.

    This is to make the index maintainable: if you delete a record whose indexed column value is shared by a million other records, the corresponding index entry has to be deleted too. The engine does not look through the whole million index entries to find it: instead, the record pointer is used as an additional search condition.

    Each index key is in fact unique (even if you don't define the index as unique), and, hence, has maximum cardinality possible.

    So the answer to your questions is: no, the column cardinality does not affect the index write performance.
