Question
From Database System Concepts
We use the term hash index to denote hash file structures as well as secondary hash indices. Strictly speaking, hash indices are only secondary index structures. A hash index is never needed as a clustering index structure, since, if a file itself is organized by hashing, there is no need for a separate hash index structure on it. However, since hash file organization provides the same direct access to records that indexing provides, we pretend that a file organized by hashing also has a clustering hash index on it.
Is "secondary index" the same concept as "nonclustering index" (which is what I understood from the book)?
Is a hash index never a clustering index or not?
Could you rephrase or explain why the reason "A hash index is never needed as a clustering index structure" is "if a file itself is organized by hashing, there is no need for a separate hash index structure on it"? What about "if a file itself is not organized by hashing"?
Thanks.
Answer 1:
The text tries to explain something but unfortunately creates more confusion than it resolves.
At the logical level, database tables (correct term: "relations") are made up of rows (correct term: "tuples"), which represent facts about the real world that the database is meant to reflect. Don't ever call those rows/tuples "records", because "record" is a concept pertaining to the physical level, which is distinct from the logical level.
Typically, though this is not a universal law cast in stone, the physical organization consists of a "main" datastore that holds one record per tuple, where that record contains every attribute (column) value of the tuple (row). (That is, unless LOBs or the like are in play.) Those records must be given a physical location in the store, and this is usually done using a B-tree on the primary key values. This facilitates:
- retrieving only [tuples/rows with] specific primary key values from the relation/table.
- traversing the [tuples of the] relation in primary key order.
- retrieving only [tuples/rows within] specific ranges of primary key values from the relation/table.
This B-tree on the primary key values is typically called the "clustering" index.
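The three access patterns listed above can be sketched in a few lines of Python. This is only a toy illustration, not how any real engine stores data: a sorted list plus the standard `bisect` module stands in for the B-tree, and the `ClusteredStore` class, its method names, and the sample rows are all made up for the example.

```python
import bisect


class ClusteredStore:
    """Toy stand-in for a main datastore ordered by primary key.

    A sorted list plus bisect plays the role of the B-tree here; a real
    engine would use disk pages and a tree, not one in-memory list.
    """

    def __init__(self):
        self._keys = []      # primary keys, kept sorted
        self._records = []   # full records, in the same order as _keys

    def insert(self, pk, record):
        i = bisect.bisect_left(self._keys, pk)
        if i < len(self._keys) and self._keys[i] == pk:
            raise KeyError(f"duplicate primary key: {pk!r}")
        self._keys.insert(i, pk)
        self._records.insert(i, record)

    def get(self, pk):
        """Point lookup by primary key; returns None if absent."""
        i = bisect.bisect_left(self._keys, pk)
        if i < len(self._keys) and self._keys[i] == pk:
            return self._records[i]
        return None

    def scan(self):
        """In-order traversal by primary key."""
        return list(zip(self._keys, self._records))

    def range(self, low, high):
        """All (pk, record) pairs with low <= pk <= high."""
        lo = bisect.bisect_left(self._keys, low)
        hi = bisect.bisect_right(self._keys, high)
        return list(zip(self._keys[lo:hi], self._records[lo:hi]))


store = ClusteredStore()
for pk, name in [(30, "Carol"), (10, "Alice"), (20, "Bob"), (40, "Dave")]:
    store.insert(pk, {"id": pk, "name": name})
```

Whatever order the rows are inserted in, `store.scan()` returns them in primary key order, and `store.range(15, 30)` only touches the neighbouring entries. That ordering is the property an ordered "clustering" structure gives you.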
Often, there is also a frequent need to retrieve only [tuples/rows with] specific values of attributes that are not the primary key. If that needs to be as fast as lookups by primary key value, we use similar indexes, which are then sometimes called "secondary". Those indexes typically do not contain all the attribute/column values of the tuple/row indexed, but only the attribute values being indexed plus the primary key value (so we can find the rest of the attributes in the "main" datastore).
Those "secondary" indexes will mostly also be B-tree indexes, which permit in-order traversal of the attributes being indexed. They can also be hash indexes, which only permit looking up tuples/rows by equality comparison with a given key value. ("Key" here means the index key, which has nothing to do with the keys on the relation/table, though obviously most keys on the table/relation will have a dedicated index whose index key consists of the same attributes as the table key it supports.)
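To make the difference concrete, here is a rough Python sketch of both kinds of secondary index over the same data. Everything in it is an assumption for illustration: the `main_store` dict stands in for the main datastore, a plain dict plays the hash index, a sorted list of pairs plays the ordered (B-tree-like) index, and the helper names are invented.

```python
import bisect
from collections import defaultdict

# Hypothetical main datastore: primary key -> full record.
main_store = {
    1: {"id": 1, "name": "Alice", "city": "Oslo"},
    2: {"id": 2, "name": "Bob", "city": "Lima"},
    3: {"id": 3, "name": "Carol", "city": "Oslo"},
    4: {"id": 4, "name": "Dave", "city": "Cairo"},
}


def build_hash_index(store, attribute):
    """Secondary hash index: attribute value -> list of primary keys."""
    index = defaultdict(list)
    for pk, record in store.items():
        index[record[attribute]].append(pk)
    return dict(index)


def build_sorted_index(store, attribute):
    """Secondary ordered index: sorted (attribute value, primary key) pairs."""
    return sorted((record[attribute], pk) for pk, record in store.items())


def lookup_equal(hash_index, store, value):
    """Equality lookup: one hash probe, then fetch each record by primary key."""
    return [store[pk] for pk in hash_index.get(value, [])]


def lookup_range(sorted_index, store, low, high):
    """Range lookup (low <= value <= high); needs the ordered index."""
    values = [value for value, _ in sorted_index]  # rebuilt per call for brevity
    lo = bisect.bisect_left(values, low)
    hi = bisect.bisect_right(values, high)
    return [store[pk] for _, pk in sorted_index[lo:hi]]


city_hash = build_hash_index(main_store, "city")
city_sorted = build_sorted_index(main_store, "city")

oslo = lookup_equal(city_hash, main_store, "Oslo")
cairo_to_lima = lookup_range(city_sorted, main_store, "Cairo", "Lima")
```

Note that both indexes store only the indexed value and the primary key, and every lookup ends with a second fetch from `main_store`. The hash index has no useful order, so a range query against it would have to examine every entry.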
Finally, there is no theoretical reason why a "primary" (or "clustered") index could not be a hash index (the text kind of suggests the opposite, but that is plain wrong). But given the poor level of the explanation in your textbook, you are probably not expected to be taught that.
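As a rough sketch of what a hash-organized main datastore could look like, the Python below places each full record directly in a bucket chosen from its primary key. The `HashOrganizedFile` class, the bucket count, and the use of modulo on integer keys as the hash function are all assumptions made for the example.

```python
class HashOrganizedFile:
    """Toy main datastore whose records are placed by hashing the primary key."""

    def __init__(self, n_buckets=4):
        # Each bucket holds (primary key, full record) pairs.
        self._buckets = [[] for _ in range(n_buckets)]

    def _bucket_for(self, pk):
        # Assumes integer primary keys; modulo stands in for a real hash function.
        return self._buckets[pk % len(self._buckets)]

    def insert(self, pk, record):
        bucket = self._bucket_for(pk)
        for i, (existing_pk, _) in enumerate(bucket):
            if existing_pk == pk:
                bucket[i] = (pk, record)  # overwrite the existing record
                return
        bucket.append((pk, record))

    def get(self, pk):
        """Equality lookup: hash to one bucket, then search only that bucket."""
        for existing_pk, record in self._bucket_for(pk):
            if existing_pk == pk:
                return record
        return None

    def scan_keys(self):
        """Primary keys in physical (bucket) order, which is not sorted order."""
        return [pk for bucket in self._buckets for pk, _ in bucket]


hashed = HashOrganizedFile(n_buckets=4)
for pk in (10, 7, 5, 12):
    hashed.insert(pk, {"id": pk})
```

Here the hashing scheme itself decides where each record lives, so `hashed.get(7)` needs no separate index structure, which is the situation the quoted passage describes. The price is that `hashed.scan_keys()` comes back in bucket order rather than primary key order, so in-order traversal and range retrieval are lost.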
Also note that there are still other ways to physically organize a database than just using B-tree or hash indexes.
So to sum up:
- "Clustered" usually refers to the index on the primary data record store. It is usually a B-tree [or some such] on the primary key, and the textbook presumably does not want you to know about more advanced possibilities.
- "Secondary" usually refers to additional indexes that provide additional "fast access to specific tuples/rows". Such an index is usually also a B-tree, which permits in-order traversal just like the "clustered"/"primary" index, but it can also be a hash index, which permits only "access by given value" and no in-order traversal.
Hope it helps.
Source: https://stackoverflow.com/questions/50909435/is-a-hash-index-never-a-clustering-index