SQL Server 2005 indexes and low cardinality

Question

How does SQL Server determine whether a table column has low cardinality?

I ask because the query optimizer would most probably not use an index on a gender column (values 'm' and 'f'). But how does it determine the cardinality of the gender column in order to reach that decision?

On top of this, in the unlikely event that I had a million rows in my table and only one of them had 'm' in the gender column, would SQL Server be able to determine this and use the index to retrieve that single row? Or would it just know there are only 2 distinct values in the column and not use the index?

I appreciate that the above describes some poor database design, but I'm just trying to understand how the query optimizer comes to its decisions.

Many thanks.

Answer 1:

See Statistics Used by the Query Optimizer in Microsoft SQL Server 2005.

With 1 value 'm' and 999999 'f', the statistics will give a cardinality estimate of 1 for 'm' and something close to 1M for 'f'. But whether the index will actually be used depends on more factors than that.
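To make the per-value estimate concrete, here is a toy sketch in Python. It is not SQL Server's actual statistics format or algorithm: it only assumes that the statistics record roughly how many rows carry each distinct value, and the names `build_histogram` and `estimate_rows` are made up for illustration.

```python
from collections import Counter


def build_histogram(values):
    """Count rows per distinct value (a toy stand-in for a statistics histogram)."""
    return Counter(values)


def estimate_rows(histogram, value):
    """Estimated number of rows matching `column = value` (0 if never seen)."""
    return histogram.get(value, 0)


# Hypothetical data matching the example: one 'm' and 999,999 'f'.
gender_column = ["m"] + ["f"] * 999_999

histogram = build_histogram(gender_column)
total_rows = sum(histogram.values())

for value in ("m", "f"):
    rows = estimate_rows(histogram, value)
    # Fraction of the table the predicate is expected to return.
    print(value, rows, rows / total_rows)
```

The point is that the estimate is per value, not per column: an equality predicate on 'm' looks highly selective even though the column has only two distinct values.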

In general, such a low-selectivity column does not make sense as an index on its own. However, it does make sense as the leftmost column of a more complex index, and even as the leftmost column of the clustered index. And even if an index on the column would make sense for 'm' and not for 'f', query auto-parameterization may play a trick on you and generate a plan for a variable @gender instead.
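The following Python sketch illustrates why a plan built for an unknown @gender can be a poor fit. It assumes, as a simplification and not a description of SQL Server internals, that when the compared value is not known the estimate falls back to an average based on the number of distinct values. The function `density_estimate` and the row counts are hypothetical.

```python
def density_estimate(row_counts):
    """Average rows per distinct value: total_rows * (1 / distinct_values)."""
    total_rows = sum(row_counts.values())
    distinct_values = len(row_counts)
    return total_rows / distinct_values


# Hypothetical per-value row counts from the example above.
row_counts = {"m": 1, "f": 999_999}

# Assumed estimate when the value of @gender is unknown at compile time.
unknown_value_estimate = density_estimate(row_counts)

for value, actual_rows in row_counts.items():
    # Compare the averaged estimate with the real row count per value.
    print(value, actual_rows, unknown_value_estimate)
```

Under that assumption, both 'm' and 'f' get the same estimate of half the table, so the single-row 'm' case no longer looks selective enough to favour the index.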

You'll have to either read more or give more details. Some good resources are the blogs of the QO team and its members:

QO team blog
Conor Cunningham blog
Craig Freedman blog

Source: https://stackoverflow.com/questions/2615617/sql-server-2005-indexes-and-low-cardinality

Tags

sql-server

indexing

cardinality