How to choose the clustered index in SQL Server?

时光总嘲笑我的痴心妄想 submitted on 2019-11-30 14:54:34

Question


Usually the clustered index is created in SQL Server Management Studio by setting the primary key. However, my recent question about PK vs. clustered index (Meaning of Primary Key to Microsoft SQL Server 2008) showed that the primary key and the clustered index do not have to be the same.

So how should we choose clustered indexes, then? Let's take the following example:

    create table Customers (ID int, ...)
    create table Orders (ID int, CustomerID int)

We would usually create the PK/CI on the ID column of both tables, but I thought about creating the clustered index for Orders on CustomerID instead. Is that the best choice?


Answer 1:


According to the "Queen of Indexing", Kimberly Tripp, what she looks for in a clustered index is primarily:

  • Unique
  • Narrow
  • Static

And if you can also guarantee:

  • Ever-increasing pattern

then you're pretty close to having your ideal clustering key!
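Why "narrow" matters can be shown with some arithmetic. In SQL Server, each nonclustered index row carries the clustering key as its row locator, so the key's width is paid once per row in every nonclustered index. The sketch below is a rough model under stated assumptions: it counts only key bytes (4 for int, 8 for bigint or datetime, 16 for uniqueidentifier) and ignores row headers and page overhead. The table size and index count are hypothetical.

```python
# Rough model (assumption): every nonclustered index row stores one copy of the
# clustering key as its row locator. Only key bytes are counted; row headers,
# page overhead and fill factor are ignored.
KEY_WIDTH_BYTES = {
    "int": 4,
    "bigint": 8,
    "datetime": 8,
    "int + datetime": 12,
    "uniqueidentifier": 16,  # GUID
}


def locator_overhead_bytes(row_count: int, nonclustered_indexes: int, key_type: str) -> int:
    """Bytes spent on clustering-key copies across all nonclustered indexes."""
    return row_count * nonclustered_indexes * KEY_WIDTH_BYTES[key_type]


if __name__ == "__main__":
    rows, indexes = 10_000_000, 4  # hypothetical table size and index count
    for key_type in KEY_WIDTH_BYTES:
        megabytes = locator_overhead_bytes(rows, indexes, key_type) / 1_000_000
        print(f"{key_type:>16}: {megabytes:,.0f} MB of clustering-key copies")
```

With these hypothetical numbers an int clustering key costs about 160 MB of locator copies, while a GUID key costs about 640 MB.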

Check out her full blog post on the topic, and another really interesting one about how the clustering key affects table operations: The Clustered Index Debate Continues.

Anything like an INT (especially an INT IDENTITY), or possibly an INT plus a DATETIME, is an ideal candidate. GUIDs, on the other hand, are poor candidates: they are wide (16 bytes) and, when generated randomly, not ever-increasing. So you might have a GUID as your PK, but don't cluster your table on it: it'll be fragmented beyond recognition and performance will suffer.
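The fragmentation caused by random keys can be illustrated with a toy model of the clustered index leaf level. This is not SQL Server's actual page allocation logic; it assumes a fixed page capacity, a 50/50 split when a full page must accept a new key, and a fresh page (no split, no rows moved) when the key goes past the end of the index. Shuffled integers stand in for GUIDs generated with NEWID().

```python
import bisect
import random


def simulate_inserts(keys, page_capacity=100):
    """Insert keys into a toy clustered-index leaf level; return (pages, splits).

    Toy model (assumption): a full page is split 50/50, except when the new key
    goes past the end of the last page, where a fresh page is started and no
    rows move. Keys are assumed to be distinct.
    """
    assert page_capacity >= 2
    pages = [[]]  # each page is a sorted list of keys; pages are kept in key order
    splits = 0
    for key in keys:
        # Route the key to the last page whose first key is <= key
        # (page 0 takes everything below the second page's first key).
        first_keys = [page[0] for page in pages[1:]]
        idx = bisect.bisect_right(first_keys, key)
        page = pages[idx]
        if len(page) < page_capacity:
            bisect.insort(page, key)
        elif idx == len(pages) - 1 and key > page[-1]:
            pages.append([key])  # appending at the end: new page, nothing moves
        else:
            half = page_capacity // 2
            right = page[half:]  # upper half moves to a newly allocated page
            del page[half:]
            pages.insert(idx + 1, right)
            splits += 1
            bisect.insort(page if key < right[0] else right, key)
    return pages, splits


if __name__ == "__main__":
    n, capacity = 20_000, 100
    random_keys = list(range(n))
    random.Random(42).shuffle(random_keys)  # stand-in for NEWID()-style GUIDs
    for label, keys in (("ever-increasing", range(n)), ("random", random_keys)):
        pages, splits = simulate_inserts(keys, capacity)
        fill = n / (len(pages) * capacity)
        print(f"{label:>15}: {len(pages)} pages, {splits} splits, {fill:.0%} average fill")
```

In this model the ever-increasing keys fill every page without a single split, while the random keys trigger splits and leave pages partly empty, which is the fragmentation described above.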




Answer 2:


The best candidate for a CLUSTERED index is the key you use to refer to your records most often.

Usually, this is the PRIMARY KEY, since it's what is used in searches and/or FOREIGN KEY relationships.

In your case, Orders.ID will most probably participate in searches and references, so it is the best candidate for the clustering key.

If you create the CLUSTERED index on Orders.CustomerID, the following things will happen:

  1. CustomerID is not unique. To ensure uniqueness, SQL Server adds a special hidden 4-byte (32-bit) column known as the uniquifier, which gives rows with a duplicate CustomerID distinct values.

  2. Records in the table will be stored according to this pair of columns (CustomerID, uniquifier).

  3. The index on Orders.ID (for example, the one backing a nonclustered PRIMARY KEY) becomes a secondary index that uses (CustomerID, uniquifier) as its record pointers.

  4. Queries like this:

    SELECT  *
    FROM    Orders
    WHERE   ID = 1234567
    

    will have to do an additional operation, a clustered index seek (shown as a Key Lookup in the execution plan), since not all columns are stored in the index on ID. To retrieve all columns, the record is first found in the index on ID and then located in the clustered table.
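Steps 1 and 2 can be modeled in a few lines. This is a simplified sketch, not SQL Server internals: it assumes the first row with a given key gets uniquifier 0 and each later duplicate gets the next integer (in the spirit of SQL Server adding a uniquifier value only when a duplicate requires one), so the pair (CustomerID, uniquifier) is always unique.

```python
from collections import defaultdict


def add_uniquifiers(keys):
    """Return (key, uniquifier) pairs in clustered-index order.

    Simplified model (assumption): the first row with a given key gets
    uniquifier 0 and each later duplicate gets the next integer, so every
    (key, uniquifier) pair is unique even though the keys are not.
    """
    next_uniquifier = defaultdict(int)
    rows = []
    for key in keys:  # keys arrive in insertion order
        rows.append((key, next_uniquifier[key]))
        next_uniquifier[key] += 1
    return sorted(rows)  # rows are stored ordered by (key, uniquifier)


if __name__ == "__main__":
    customer_ids_of_new_orders = [42, 7, 42, 42, 7]
    print(add_uniquifiers(customer_ids_of_new_orders))
    # [(7, 0), (7, 1), (42, 0), (42, 1), (42, 2)]
```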

The extra lookup in step 4 requires as many page reads as a plain clustered seek, namely IndexDepth reads per row, where IndexDepth grows as O(log n) in the total number of records in the table.
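A back-of-the-envelope sketch of that cost, under loudly simplified assumptions: each index is a B-tree whose depth is the smallest d with fanout**d >= row count, and the fanout values (100 for the wide clustered index, 500 for the narrower index on ID) are hypothetical. It compares fetching one order by ID when Orders is clustered on ID versus clustered on CustomerID.

```python
def index_depth(row_count: int, fanout: int) -> int:
    """Levels in a toy B-tree: smallest d >= 1 with fanout**d >= row_count.

    Assumption: every level (leaf included) has the same fanout; real SQL
    Server indexes hold different numbers of rows per page at each level.
    """
    assert fanout >= 2
    depth, reachable_rows = 1, fanout
    while reachable_rows < row_count:
        depth += 1
        reachable_rows *= fanout
    return depth


def reads_to_fetch_order_by_id(row_count: int, clustered_on_id: bool,
                               ci_fanout: int = 100, nc_fanout: int = 500) -> int:
    """Page reads to fetch one full Orders row by ID (hypothetical fanouts)."""
    clustered_depth = index_depth(row_count, ci_fanout)
    if clustered_on_id:
        return clustered_depth  # a single clustered index seek
    # Seek the narrower index on ID, then do a key lookup into the clustered index.
    return index_depth(row_count, nc_fanout) + clustered_depth


if __name__ == "__main__":
    rows = 10_000_000  # hypothetical Orders table size
    print(reads_to_fetch_order_by_id(rows, clustered_on_id=True))   # 4
    print(reads_to_fetch_order_by_id(rows, clustered_on_id=False))  # 7
```

Both designs grow logarithmically, but clustering on CustomerID roughly doubles the reads for every lookup by ID in this model.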




Answer 3:


If you're concerned about clustering, it's usually to improve data retrieval. In your example, you're probably going to want all records for a given customer at once. Clustering on CustomerID keeps those rows together on the same or adjacent pages rather than scattered across many pages in your file.
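The effect on one customer's orders can be sketched by laying rows out on pages in clustering-key order and counting how many pages that customer's rows occupy. Assumptions (all hypothetical): a fixed number of rows per page, completely full pages, and a workload where 1,000 customers place orders round-robin. Clustering on (CustomerID, ID) stands in for CustomerID plus the uniquifier.

```python
def pages_touched(orders, customer_id, rows_per_page, cluster_key):
    """Count the distinct leaf pages holding one customer's orders.

    orders: list of (order_id, customer_id) tuples.
    cluster_key: function returning a row's clustering key. Rows are laid out
    in that order on completely full pages (a simplifying assumption).
    """
    layout = sorted(orders, key=cluster_key)
    return len({position // rows_per_page
                for position, (_, cust) in enumerate(layout)
                if cust == customer_id})


if __name__ == "__main__":
    # Hypothetical workload: 100,000 orders placed round-robin by 1,000 customers.
    orders = [(order_id, order_id % 1_000) for order_id in range(100_000)]
    by_id = pages_touched(orders, 7, 50, cluster_key=lambda row: row[0])
    by_customer = pages_touched(orders, 7, 50, cluster_key=lambda row: (row[1], row[0]))
    print(by_id, by_customer)  # 100 2
```

Clustered on ID, each of customer 7's 100 orders sits on a different page; clustered on CustomerID, the same orders fit on 2 pages.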

Rule of thumb (ROT): cluster on what you want to show a collection of. Line items in a purchase order are the classic example.



Source: https://stackoverflow.com/questions/2267326/how-to-choose-the-clustered-index-in-sql-server
