What's the difference between a Table Scan and a Clustered Index Scan?

后端 未结 3 1525
离开以前
离开以前 2020-12-04 09:37

Since both a Table Scan and a Clustered Index Scan essentially scan all records in the table, why is a Clustered Index Scan supposedly better?

3条回答
  •  失恋的感觉
    2020-12-04 10:06

    In a table without a clustered index (a heap table), data pages are not linked together - so traversing pages requires a lookup into the Index Allocation Map.

    A clustered table, however, has it's data pages linked in a doubly linked list - making sequential scans a bit faster. Of course, in exchange, you have the overhead of dealing with keeping the data pages in order on INSERT, UPDATE, and DELETE. A heap table, however, requires a second write to the IAM.

    If your query has a RANGE operator (e.g.: SELECT * FROM TABLE WHERE Id BETWEEN 1 AND 100), then a clustered table (being in a guaranteed order) would be more efficient - as it could use the index pages to find the relevant data page(s). A heap would have to scan all rows, since it cannot rely on ordering.

    And, of course, a clustered index lets you do a CLUSTERED INDEX SEEK, which is pretty much optimal for performance...a heap with no indexes would always result in a table scan.

    So:

    • For your example query where you select all rows, the only difference is the doubly linked list a clustered index maintains. This should make your clustered table just a tiny bit faster than a heap with a large number of rows.

    • For a query with a WHERE clause that can be (at least partially) satisfied by the clustered index, you'll come out ahead because of the ordering - so you won't have to scan the entire table.

    • For a query that is not satisified by the clustered index, you're pretty much even...again, the only difference being that doubly linked list for sequential scanning. In either case, you're suboptimal.

    • For INSERT, UPDATE, and DELETE a heap may or may not win. The heap doesn't have to maintain order, but does require a second write to the IAM. I think the relative performance difference would be negligible, but also pretty data dependent.

    Microsoft has a whitepaper which compares a clustered index to an equivalent non-clustered index on a heap (not exactly the same as I discussed above, but close). Their conclusion is basically to put a clustered index on all tables. I'll do my best to summarize their results (again, note that they're really comparing a non-clustered index to a clustered index here - but I think it's relatively comparable):

    • INSERT performance: clustered index wins by about 3% due to the second write needed for a heap.
    • UPDATE performance: clustered index wins by about 8% due to the second lookup needed for a heap.
    • DELETE performance: clustered index wins by about 18% due to the second lookup needed and the second delete needed from the IAM for a heap.
    • single SELECT performance: clustered index wins by about 16% due to the second lookup needed for a heap.
    • range SELECT performance: clustered index wins by about 29% due to the random ordering for a heap.
    • concurrent INSERT: heap table wins by 30% under load due to page splits for the clustered index.

提交回复
热议问题