This seems like a duplicate even as I ask it, but I searched and didn\'t find it. It seems like a good question for SO -- even though I\'m sure I can find it on many blogs e
If you have modelled, designed, normalised etc, then you will have no identity columns.
You will have identified natural and candidate keys for your tables.
You may decide on a surrogate key because of the physical architecture (eg narrow, numeric, strictly monotonically increasing), say, because using a nvarchar(100) column is not a good idea (still need unique constraint).
Or because of ideology: they appeal to OO developers I've found.
Ok, assume ID columns. As your db gets more complex, say several layers, how can you jon parent and grand-.child tables directly. You can't: you always need intermediate tables and well indexed PK-FL columns. With a composite key, it's all there for you...
Don't get me wrong: I use them. But I know why I use them...
Edit:
I'd be interested to collate "always ID"+"no stored procs" matches on one hand, with "use stored procs"+"IDs when they benefit" on the other...
There are two concepts that are close but should not be confused: IDENTITY
and PRIMARY KEY
Every table (except for the rare conditions) should have a PRIMARY KEY
, that is a value or a set of values that uniquely identify a row.
See here for discussion why.
IDENTITY
is a property of a column in SQL Server
which means that the column will be filled automatically with incrementing values.
Due to the nature of this property, the values of this column are inherently UNIQUE
.
However, no UNIQUE
constraint or UNIQUE
index is automatically created on IDENTITY
column, and after issuing SET IDENTITY_INSERT ON
it's possible to insert duplicate values into an IDENTITY
column, unless it had been explicity UNIQUE
constrained.
The IDENTITY
column should not necessarily be a PRIMARY KEY
, but most often it's used to fill the surrogate PRIMARY KEY
s
It may or may not be useful in any particular case.
Therefore, the answer to your question:
The question: should every table in a database have an IDENTITY field that's used as the PK?
is this:
IDENTITY
field as a PRIMARY KEY
.Three cases come into my mind when it's not the best idea to have an IDENTITY
as a PRIMARY KEY
:
PRIMARY KEY
is composite (like in many-to-many link tables)PRIMARY KEY
is natural (like, a state code)PRIMARY KEY
should be unique across databases (in this case you use GUID
/ UUID
/ NEWID
)All these cases imply the following condition:
IDENTITY
when you care for the values of your PRIMARY KEY
and explicitly insert them into your table.Update:
Many-to-many link tables should have the pair of id
's to the table they link as the composite key.
It's a natural composite key which you already have to use (and make UNIQUE
), so there is no point to generate a surrogate key for this.
I don't see why would you want to reference a many-to-many
link table from any other table except the tables they link, but let's assume you have such a need.
In this case, you just reference the link table by the composite key.
This query:
CREATE TABLE a (id, data)
CREATE TABLE b (id, data)
CREATE TABLE ab (a_id, b_id, PRIMARY KEY (a_id, b_id))
CREATE TABLE business_rule (id, a_id, b_id, FOREIGN KEY (a_id, b_id) REFERENCES ab)
SELECT *
FROM business_rule br
JOIN a
ON a.id = br.a_id
is much more efficient than this one:
CREATE TABLE a (id, data)
CREATE TABLE b (id, data)
CREATE TABLE ab (id, a_id, b_id, PRIMARY KEY (id), UNIQUE KEY (a_id, b_id))
CREATE TABLE business_rule (id, ab_id, FOREIGN KEY (ab_id) REFERENCES ab)
SELECT *
FROM business_rule br
JOIN a_to_b ab
ON br.ab_id = ab.id
JOIN a
ON a.id = ab.a_id
, for obvious reasons.
Recognize the distinction between an Identity field and a key... Every table should have a key, to eliminate the data corruption of inadvertently entering multiple rows that represent the same 'entity'. If the only key a table has is a meaningless surrogate key, then this function is effectively missing.
otoh, No table 'needs' an identity, and certainly not every table benefits from one... Examples are: A table with a short and functional key, a table which does not have any other table referencing it through a foreign Key, or a table which is in a one to zero-or-one relationship with another table... none of these need an Identity
Every table should have some set of field(s) that uniquely identify it. Whether or not there is a numeric identifier field separate from the data fields will depend on the domain you are attempting to model. Not all data easily falls into the 'single numeric id' paradigm, and as such it would be inappropriate to force it. Given that, a lot of data does easily fit in this paradigm and as such would call for such an identifier. There is no one answer to always do X in any programming environment, and this is another example.