How does SQL Server work out the estimated number of rows?

有刺的猬 2020-12-20 23:19

I'm trying to debug a fairly complex stored procedure that joins across many tables (10-11). I'm seeing that for a part of the tree the estimated number of rows drastically d

4 answers
  •  独厮守ぢ
    2020-12-20 23:58

    For each index, SQL Server keeps a histogram on the leading key column with up to 200 steps (ranges), each holding the following data (from here):

    • RANGE_HI_KEY

      A key value showing the upper boundary of a histogram step.

    • RANGE_ROWS

      Specifies how many rows fall inside the range: their value is smaller than this RANGE_HI_KEY but bigger than the previous step's RANGE_HI_KEY.

    • EQ_ROWS

      Specifies how many rows are exactly equal to RANGE_HI_KEY.

    • AVG_RANGE_ROWS

      Average number of rows per distinct value inside the range.

    • DISTINCT_RANGE_ROWS

      Specifies how many distinct key values are inside this range (not including the previous RANGE_HI_KEY or this RANGE_HI_KEY itself).
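As a rough illustration of how these five fields feed an equality estimate, here is a minimal Python sketch. It is a simplified model, not SQL Server's actual algorithm: `HistogramStep` and `estimate_equal_rows` are hypothetical names, keys are assumed to be integers, and a value above the last step simply returns 0 here, whereas the real optimizer has its own handling for that case.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class HistogramStep:
    range_hi_key: int          # upper boundary of the step
    range_rows: float          # rows strictly inside the range
    eq_rows: float             # rows exactly equal to range_hi_key
    distinct_range_rows: int   # distinct values strictly inside the range
    avg_range_rows: float      # average rows per distinct value in the range


def estimate_equal_rows(steps, value):
    """Estimate rows for `key = value`; `steps` must be sorted by range_hi_key."""
    keys = [s.range_hi_key for s in steps]
    i = bisect_left(keys, value)   # first step whose boundary is >= value
    if i == len(steps):
        return 0.0                 # simplification: value lies above the histogram
    step = steps[i]
    if step.range_hi_key == value:
        return step.eq_rows        # exact hit on a step boundary
    return step.avg_range_rows     # value falls inside the step's range


steps = [HistogramStep(range_hi_key=3, range_rows=2, eq_rows=10000,
                       distinct_range_rows=2, avg_range_rows=1)]
print(estimate_equal_rows(steps, 3))
print(estimate_equal_rows(steps, 2))
```

The point of the sketch is the asymmetry: a boundary value gets its own exact count (EQ_ROWS), while every value inside a range shares the same average (AVG_RANGE_ROWS).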

    Usually, the most populated values become RANGE_HI_KEY boundaries.

    However, such a value can end up inside a range instead, and this skews the estimated distribution.

    Imagine this data (among other rows):

    Key value  Count of rows
    1          1
    2          1
    3          10000
    4          1

    SQL Server usually builds two ranges, 1 to 3 and 4 to the next populated value, which produces these statistics:

    RANGE_HI_KEY  RANGE_ROWS  EQ_ROWS  AVG_RANGE_ROWS  DISTINCT_RANGE_ROWS
    3             2           10000    1               2
    

    This means that when searching for, say, 2, the optimizer expects just 1 row, so index access is the better choice.

    But if 3 goes inside the range, the statistics are these:

    RANGE_HI_KEY  RANGE_ROWS  EQ_ROWS  AVG_RANGE_ROWS  DISTINCT_RANGE_ROWS
    4             10002       1        3334            3
    

    The optimizer now estimates 3334 rows for key 2 and concludes that index access is too expensive.
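The two scenarios can be reproduced with a little arithmetic. The Python sketch below is only an illustration under stated assumptions: `build_step` is a hypothetical helper, the step boundaries are chosen by hand rather than by SQL Server's own step-selection logic, and AVG_RANGE_ROWS is taken as RANGE_ROWS / DISTINCT_RANGE_ROWS (reported as 1 when the range is empty).

```python
def build_step(counts, lo_exclusive, hi_key):
    """Summarise the keys in (lo_exclusive, hi_key] as one histogram step."""
    # Keys strictly between the previous boundary and this step's boundary.
    inside = {k: n for k, n in counts.items() if lo_exclusive < k < hi_key}
    range_rows = sum(inside.values())
    distinct = len(inside)
    return {
        "RANGE_HI_KEY": hi_key,
        "RANGE_ROWS": range_rows,
        "EQ_ROWS": counts.get(hi_key, 0),
        "DISTINCT_RANGE_ROWS": distinct,
        # Average rows per distinct value inside the range.
        "AVG_RANGE_ROWS": range_rows / distinct if distinct else 1.0,
    }


# Key value -> count of rows, as in the example table.
counts = {1: 1, 2: 1, 3: 10000, 4: 1}

boundary_at_3 = build_step(counts, lo_exclusive=0, hi_key=3)
boundary_at_4 = build_step(counts, lo_exclusive=0, hi_key=4)

print(boundary_at_3["AVG_RANGE_ROWS"])  # estimate for key 2 when 3 is the boundary
print(boundary_at_4["AVG_RANGE_ROWS"])  # estimate for key 2 when 3 is inside the range
```

With 3 as the boundary, its 10000 rows are isolated in EQ_ROWS and the average for keys 1 and 2 stays at 1. With 4 as the boundary, the 10000 rows are averaged over three distinct keys, giving 10002 / 3 = 3334 for each of them.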
