What do Clustered and Non clustered index actually mean?

前端 未结 11 2146
粉色の甜心
粉色の甜心 2020-11-22 06:40

I have a limited exposure to DB and have only used DB as an application programmer. I want to know about Clustered and Non clustered indexes. I goo

11条回答
  •  天命终不由人
    2020-11-22 07:40

    Let me offer a textbook definition on "clustering index", which is taken from 15.6.1 from Database Systems: The Complete Book:

    We may also speak of clustering indexes, which are indexes on an attribute or attributes such that all of tuples with a fixed value for the search key of this index appear on roughly as few blocks as can hold them.

    To understand the definition, let's take a look at Example 15.10 provided by the textbook:

    A relation R(a,b) that is sorted on attribute a and stored in that order, packed into blocks, is surely clusterd. An index on a is a clustering index, since for a given a-value a1, all the tuples with that value for a are consecutive. They thus appear packed into blocks, execept possibly for the first and last blocks that contain a-value a1, as suggested in Fig.15.14. However, an index on b is unlikely to be clustering, since the tuples with a fixed b-value will be spread all over the file unless the values of a and b are very closely correlated.

    Note that the definition does not enforce the data blocks have to be contiguous on the disk; it only says tuples with the search key are packed into as few data blocks as possible.

    A related concept is clustered relation. A relation is "clustered" if its tuples are packed into roughly as few blocks as can possibly hold those tuples. In other words, from a disk block perspective, if it contains tuples from different relations, then those relations cannot be clustered (i.e., there is a more packed way to store such relation by swapping the tuples of that relation from other disk blocks with the tuples the doesn't belong to the relation in the current disk block). Clearly, R(a,b) in example above is clustered.

    To connect two concepts together, a clustered relation can have a clustering index and nonclustering index. However, for non-clustered relation, clustering index is not possible unless the index is built on top of the primary key of the relation.

    "Cluster" as a word is spammed across all abstraction levels of database storage side (three levels of abstraction: tuples, blocks, file). A concept called "clustered file", which describes whether a file (an abstraction for a group of blocks (one or more disk blocks)) contains tuples from one relation or different relations. It doesn't relate to the clustering index concept as it is on file level.

    However, some teaching material likes to define clustering index based on the clustered file definition. Those two types of definitions are the same on clustered relation level, no matter whether they define clustered relation in terms of data disk block or file. From the link in this paragraph,

    An index on attribute(s) A on a file is a clustering index when: All tuples with attribute value A = a are stored sequentially (= consecutively) in the data file

    Storing tuples consecutively is the same as saying "tuples are packed into roughly as few blocks as can possibly hold those tuples" (with minor difference on one talking about file, the other talking about disk). It's because storing tuple consecutively is the way to achieve "packed into roughly as few blocks as can possibly hold those tuples".

提交回复
热议问题