Should primary keys always be assigned as the clustered index?

[亡魂溺海] submitted on 2019-11-30 12:38:58

The ideal clustered index key is:

  1. Sequential
  2. Selective (no dupes, unique for each record)
  3. Narrow
  4. Used in Queries

In general it is a very bad idea to use a GUID as a clustered index key, since new rows land at effectively random positions in the index, which leads to heavy fragmentation as rows are added.

EDIT FOR CLARITY:

PK and Clustered key are indeed separate concepts. Your PK does not need to be your clustered index key.

In my own practical experience, the field that is your PK usually should be your clustered key as well, since it tends to meet the criteria listed above.

Remus Rusanu

Yes, it is possible to have a non-clustered primary key, and it is possible to have a clustered key that is completely unrelated to the primary key. By default a primary key becomes the clustered index key too, but this is not a requirement.

The primary key is a logical concept: it is the key used in your data model to reference entities.
The clustered index key is a physical concept: it is the order in which you want the rows to be stored on disk.

Choosing a different clustered key is driven by a variety of factors. One is key width, when you want a narrower clustered key than the primary key (because the clustered key gets replicated in every non-clustered index). Another is support for frequent range scans (common in time series), when the data is frequently accessed with queries like date between '20100101' and '20100201'; in that case a clustered index key on date would be appropriate.
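To put a rough number on the key-width argument, here is a back-of-the-envelope sketch in Python. It assumes SQL Server's storage sizes of 4 bytes for an int and 16 bytes for a uniqueidentifier. The row count and the number of non-clustered indexes are made-up figures for illustration, and the estimate ignores row overhead, page fill, and compression.

```python
# Rough estimate of the bytes spent repeating the clustered key
# in every non-clustered index row. Not a precise storage model.
INT_BYTES = 4    # size of a SQL Server int
GUID_BYTES = 16  # size of a SQL Server uniqueidentifier


def clustered_key_overhead(row_count, nonclustered_indexes, key_bytes):
    """Bytes used by copies of the clustered key across all non-clustered indexes."""
    return row_count * nonclustered_indexes * key_bytes


ROWS = 10_000_000  # hypothetical table size
NC_INDEXES = 5     # hypothetical number of non-clustered indexes

int_overhead = clustered_key_overhead(ROWS, NC_INDEXES, INT_BYTES)
guid_overhead = clustered_key_overhead(ROWS, NC_INDEXES, GUID_BYTES)

print(f"int clustered key:  {int_overhead / 1e6:.0f} MB")
print(f"GUID clustered key: {guid_overhead / 1e6:.0f} MB")
```

Under these assumptions the int key costs about 200 MB across the five indexes and the GUID key about 800 MB, which is the kind of difference that makes a narrower clustered key attractive.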

This subject has been discussed here ad nauseam before, see also What column should the clustered index be put on?.

First, I have to say that I have misgivings about the choice of a GUID as the primary key for this table. I am of the opinion that EmployeeNumber would probably be a better choice, and something naturally unique about the employee would be better than that, such as an SSN (or ATIN), which employers must legally obtain anyway (at least in the US).

Putting that aside, you should never base a clustered index on a GUID column. The clustered index specifies the physical order of rows in the table. Since GUID values are (in theory) completely random, every new row will fall at a random location. This is very bad for performance. There are so-called 'sequential' GUIDs, but I would consider them a bit of a hack.
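The effect of random insert locations can be illustrated with a toy model in Python. This is only a sketch, not how SQL Server actually manages pages. It assumes a hypothetical page capacity of 8 rows, that a full page splits in half when a row lands in the middle of it, and that appending past the last key simply starts a new page. Random 128-bit integers stand in for GUIDs.

```python
import bisect
import random

PAGE_CAPACITY = 8  # hypothetical rows per page, kept small for the demo


def simulate_inserts(keys, capacity=PAGE_CAPACITY):
    """Insert keys into key-ordered pages; return (page_count, rows_moved)."""
    pages = [[]]    # each page is a sorted list of keys
    rows_moved = 0  # rows relocated by page splits
    for key in keys:
        # Locate the page whose key range should hold this key.
        firsts = [page[0] for page in pages if page]
        idx = max(0, bisect.bisect_right(firsts, key) - 1)
        page = pages[idx]
        if len(page) == capacity:
            if idx == len(pages) - 1 and key > page[-1]:
                # Appending past the end: start a fresh page, nothing moves.
                pages.append([key])
                continue
            # Insert into the middle of a full page: split it in half.
            half = capacity // 2
            new_page = page[half:]
            del page[half:]
            rows_moved += len(new_page)
            pages.insert(idx + 1, new_page)
            if key >= new_page[0]:
                page = new_page
        bisect.insort(page, key)
    return len(pages), rows_moved


N = 2000
rng = random.Random(42)  # fixed seed so the run is repeatable
random_keys = [rng.getrandbits(128) for _ in range(N)]  # stand-in for GUIDs

seq_pages, seq_moved = simulate_inserts(range(N))
rnd_pages, rnd_moved = simulate_inserts(random_keys)

print(f"sequential keys: {seq_pages} pages, {seq_moved} rows moved")
print(f"random keys:     {rnd_pages} pages, {rnd_moved} rows moved")
```

In this model sequential keys fill every page completely and never move a row, while random keys trigger splits that move rows and leave pages partly empty, so the same number of rows occupies more pages.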

Clustered indexes cause the data to be physically stored in that order. For this reason, clustered indexes help a lot when querying ranges of consecutive rows.
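A small Python sketch can show why physical order matters for range queries. It is a toy model with assumed numbers: 10,000 rows, a hypothetical 50 rows per page, and a random day-of-year value per row. It counts how many pages hold at least one matching row when the rows are stored in date order versus an order unrelated to the date.

```python
import random

ROWS_PER_PAGE = 50  # hypothetical page capacity
N = 10_000          # hypothetical row count

rng = random.Random(7)  # fixed seed so the run is repeatable
days = [rng.randrange(365) for _ in range(N)]  # made-up day-of-year per row


def pages_touched(storage_order, lo, hi):
    """Count distinct pages containing a row with lo <= day <= hi."""
    return len({
        position // ROWS_PER_PAGE
        for position, day in enumerate(storage_order)
        if lo <= day <= hi
    })


stored_by_date = sorted(days)  # rows physically ordered by the date column
stored_by_other = days         # rows ordered by something unrelated to date

by_date = pages_touched(stored_by_date, 0, 30)
by_other = pages_touched(stored_by_other, 0, 30)

print(f"stored in date order:  {by_date} pages hold the matching rows")
print(f"stored in other order: {by_other} pages hold the matching rows")
```

When the rows are stored in date order, the matching rows sit on a short run of adjacent pages. When they are stored in an unrelated order, the same rows are spread across nearly every page of the table.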

GUIDs make really bad clustered index keys, since their order is not a sensible pattern to sort on. Int identity columns aren't much better unless order of entry helps (e.g. most recent hires).

Since you're probably not looking for ranges of employees, it probably doesn't matter much which column is the clustered index, unless you can segment off blocks of employees that you often aren't interested in (e.g. by termination date).

Since EmployeeNumber is unique, I would make it the PK. In SQL Server, a PK is often a clustered index.

Joins on GUIDs are just horrible. @JNK answers this well.

Using a clustered index on something other than the primary key will improve performance for SELECT queries that can take advantage of this index.

But you will lose performance on UPDATE queries, because in most scenarios they rely on the primary key to find the specific row you want to update.

INSERT queries could also lose performance, because when you add a new row in the middle of the index, a lot of rows may have to be moved physically. This won't happen with an auto-incrementing primary key, since new records are always added at the end and no other rows need to move.

If you don't know which kind of operation needs the most performance, I recommend leaving the clustered index on the primary key and using nonclustered indexes on common search criteria.
