Should I have a dedicated primary key field?

Posted by 五迷三道 on 2019-12-17 07:34:02

Question


I'm designing a small SQL database to be used by a web application.

Let's say a particular table has a Name field for which no two rows will be allowed to have the same value. However, users will be able to change the Name field at any time.

The primary key from this table will be used as a foreign key in other tables. So if the Name field was used as the primary key, any changes would need to be propagated to those other tables. On the other hand, the uniqueness requirement would be handled automatically.

My instinct would be to add an integer field to act as the primary key, which could be automatically populated by the database. Is there any point in having this field or would it be a waste of time?


Answer 1:


I would use a generated PK myself, just for the reasons you mentioned. Also, indexing and comparing by integer is faster than comparing by strings. You can put a unique index on the name field too without making it a primary key.
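As a minimal sketch of that layout, assuming SQLite through Python's built-in sqlite3 module (the table and column names here are hypothetical, not from the question): the integer id is the primary key that other tables reference, and a UNIQUE constraint keeps names distinct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled
conn.executescript("""
CREATE TABLE customer (
    id   INTEGER PRIMARY KEY,   -- populated automatically by the database
    name TEXT NOT NULL UNIQUE   -- uniqueness enforced without being the primary key
);
CREATE TABLE customer_order (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    item        TEXT NOT NULL
);
""")

cur = conn.execute("INSERT INTO customer (name) VALUES (?)", ("Acme",))
customer_id = cur.lastrowid
conn.execute(
    "INSERT INTO customer_order (customer_id, item) VALUES (?, ?)",
    (customer_id, "widget"),
)

# Renaming touches one row only; the order row keeps pointing at the same id.
conn.execute("UPDATE customer SET name = ? WHERE id = ?", ("Acme Corp", customer_id))

# A second row with the same name is rejected by the UNIQUE constraint.
try:
    conn.execute("INSERT INTO customer (name) VALUES (?)", ("Acme Corp",))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

order_owner = conn.execute(
    "SELECT c.name FROM customer_order AS o JOIN customer AS c ON c.id = o.customer_id"
).fetchone()[0]
```

The rename needs no change in customer_order, which is the propagation problem the question was worried about.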




Answer 2:


What you are describing is called a surrogate key. See the Wikipedia article for the long answer.




Answer 3:


Though it's faster to search and join on an integer column (as many have pointed out), it's even faster to never join in the first place. By storing a natural key, you can often eliminate the need for a join.

For a smallish database, the CASCADE updates to the foreign key references wouldn't have much performance impact, unless they were changing extremely often.
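For comparison, here is a rough sketch of the natural-key variant with cascading updates, again assuming SQLite via Python's sqlite3 module and hypothetical table names. The child table stores the name itself, so reading it needs no join, and ON UPDATE CASCADE propagates a rename.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to apply the cascade
conn.executescript("""
CREATE TABLE team (
    name TEXT PRIMARY KEY   -- natural key: the name itself identifies the row
);
CREATE TABLE player (
    id        INTEGER PRIMARY KEY,
    team_name TEXT NOT NULL REFERENCES team(name) ON UPDATE CASCADE,
    nickname  TEXT NOT NULL
);
""")

conn.execute("INSERT INTO team (name) VALUES ('Red')")
conn.execute("INSERT INTO player (team_name, nickname) VALUES ('Red', 'ace')")

# Renaming the parent rewrites every referencing row as part of the same statement.
conn.execute("UPDATE team SET name = 'Crimson' WHERE name = 'Red'")

# The team name is available on the child row directly, with no join.
team_names = [row[0] for row in conn.execute("SELECT team_name FROM player")]
```

The cost of the rename grows with the number of referencing rows, which is why it matters how often names change.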

That being said, you should probably use an integer or GUID as a surrogate key in this case. An updateable-by-design primary key isn't the best idea, and unless your application has a very compelling business reason to be unique by name - you will inevitably have conflicts.




Answer 4:


Yes - and as a rule of thumb, always, for every table.

You should definitely not use a changeable field as a primary key and in the vast majority of circumstances you don't want to use a field that has any other purpose as a primary key.

This is basic good practice for db schemas.




Answer 5:


Having an integer primary key is always a good thing from a performance perspective. All of your relationships will be much more efficient with an integer primary key. For example, JOINs will be much faster (SQL Server).

It will also make future modifications of the database easier. Quite often you have a unique name column, only to find out later that the name is not unique at all.

For now, you can still enforce the uniqueness of the Name column by putting a unique index on it.




Answer 6:


I would use an auto-generated ID field for the primary key. It's easier to join tables on integer IDs than on text. Also, if the Name field were the primary key and were updated often, the database would be put under stress from updating the index on that field much more frequently.

If the Name field is always unique, you should still mark it as unique in the database. However, there will often be a possibility (maybe not now, but perhaps in the future in your case) of two rows having the same name, so I do not recommend using it as the primary key.

Another advantage of using IDs shows up if you need reporting on your database. If you want a report for a given set of names, an ID filter on the report stays consistent even when the names change.




Answer 7:


If you're living in the rarefied circles of theoretical mathematicians (like C. J. Date does in the-land-where-there-are-no-nulls, because all data values are known and correct), then primary keys can be built from the components of the data that identify the idealized platonic entity to which you are referring (e.g. name + birthday + place of birth + parents' names). In the messy real world, though, "synthetic keys" that can identify your real-world entities within the context of your database are a much more practical way to do things. (And nullable fields can be very useful too. Take that, relational-design-theory people!)




Answer 8:


If your name column will be changing it isn't really a good candidate for a primary key. A primary key should define a unique row of a table. If it can be changed it's not really doing that. Without knowing more specifics about your system I can't say, but this might be a good time for a surrogate key.

I'll also add this in hopes of dispelling the myths of using auto-incrementing integers for all of your primary keys. It is NOT always a performance gain to use them. In fact, quite often it's the exact opposite. If you have an auto-incrementing column that means that every INSERT in the system now has that added overhead of generating a new value.

Also, as Mark points out, with surrogate IDs on all of your tables, if you have a chain of related tables you might have to join all of them together to get from one end to the other. With natural primary keys that is usually not the case. Joining 6 tables on integers is usually going to be slower than joining 2 tables on a string.

You also often lose the ability to do set-based operations when you have auto-incrementing IDs on all of your tables. Instead of inserting 1000 rows into a parent table, then inserting 5000 rows into a child table, you now have to insert the parent rows one at a time in a cursor or some other loop just to get the generated IDs so that you can assign them to the related children. I've seen a 30-second process turned into a 20-minute process because someone insisted on using auto-incrementing IDs on all of the tables in a database.
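How much of this cost is unavoidable depends on the schema and the database. As a sketch only, assuming the parent rows also carry a unique natural column (the hypothetical code column below) and using SQLite via Python's sqlite3 module, the generated IDs can be resolved for all children in one INSERT ... SELECT rather than one parent at a time:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parent (
    id   INTEGER PRIMARY KEY,    -- auto-generated surrogate key
    code TEXT NOT NULL UNIQUE    -- assumed natural identifier, used only for matching
);
CREATE TABLE child (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER NOT NULL REFERENCES parent(id),
    payload   TEXT NOT NULL
);
CREATE TEMP TABLE staging_child (
    parent_code TEXT NOT NULL,
    payload     TEXT NOT NULL
);
""")

# 1000 parents and 5 children each, mirroring the numbers in the answer.
parents = [(f"P{i}",) for i in range(1000)]
children = [(f"P{i}", f"row {n}") for i in range(1000) for n in range(5)]

conn.executemany("INSERT INTO parent (code) VALUES (?)", parents)
conn.executemany(
    "INSERT INTO staging_child (parent_code, payload) VALUES (?, ?)", children
)

# One set-based statement looks up every generated parent id through the natural column.
conn.execute("""
INSERT INTO child (parent_id, payload)
SELECT p.id, s.payload
FROM staging_child AS s
JOIN parent AS p ON p.code = s.parent_code
""")

child_count = conn.execute("SELECT COUNT(*) FROM child").fetchone()[0]
orphans = conn.execute("""
SELECT COUNT(*) FROM child AS c
LEFT JOIN parent AS p ON p.id = c.parent_id
WHERE p.id IS NULL
""").fetchone()[0]
```

This only works when such a natural column exists to match on; without one, the row-by-row pattern described above is harder to avoid.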

Finally (at least for reasons I'm listing here - there are certainly others), using auto-incrementing IDs on all of your tables promotes poor design. When the designer no longer has to think about what a natural key might be for a table it usually results in erroneous duplicates ending up in the data. You can try to avoid the problem with unique indexes, but in my experience developers and designers don't go through that extra effort and after a year of using their new system they find that the data is a mess because the database didn't have proper constraints on the data through natural keys.

There's certainly a time for using surrogate keys, but using them blindly on all tables is almost always a mistake.




Answer 9:


The primary key for a record must be unique and permanent. If a record naturally has a simple key which fulfills both of those, then use it. However, such keys don't come around very often. For a person record, the person's name is neither unique nor permanent, so you pretty much have to use an auto-increment key.

The one place where natural keys do work is on a code table, for example, a table mapping a status value to its description. There is little sense in giving "Active" a primary key of 1, "Delayed" a primary key of 2, etc., when it is just as easy to give "Active" a primary key of "ACT"; "Delayed", "DLY"; "On Hold", "HLD"; and so on.
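A small sketch of such a code table, assuming SQLite via Python's sqlite3 module. The status codes come from the paragraph above; the task table is a hypothetical referencing table added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable foreign key enforcement in SQLite
conn.executescript("""
CREATE TABLE status (
    code        TEXT PRIMARY KEY CHECK (length(code) = 3),  -- short natural key
    description TEXT NOT NULL
);
CREATE TABLE task (
    id          INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    status_code TEXT NOT NULL REFERENCES status(code)
);
INSERT INTO status (code, description) VALUES
    ('ACT', 'Active'),
    ('DLY', 'Delayed'),
    ('HLD', 'On Hold');
INSERT INTO task (title, status_code) VALUES
    ('Write report', 'ACT'),
    ('Ship order',   'DLY');
""")

# The code is readable on its own, so filtering by status needs no join.
delayed = [
    row[0]
    for row in conn.execute("SELECT title FROM task WHERE status_code = 'DLY'")
]
```

With an integer key, the same query would need either a join to status or a magic number in the WHERE clause.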

Note also that some say you should use integers over strings because they compare faster. That's not really true. Comparing two 4-byte character fields will take exactly as long as comparing two 4-byte integer fields. Longer strings will, of course, take longer, but if you keep the codes short, there's no difference.




Answer 10:


The primary key must be unique for every row. An auto_increment integer is a very good idea, and if you don't have other ideas about populating the primary key then this is the best way.




Answer 11:


In addition to what has already been said, consider using a UUID as the PK. It will allow you to create keys that are unique across multiple databases.

If you ever need to export or merge data with another database, the keys will stay unique and relationships can be easily maintained.
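To illustrate the merge scenario, here is a minimal sketch assuming Python's standard uuid and sqlite3 modules, with a hypothetical item table and a hypothetical make_db helper. Two independent databases generate their own keys, and the rows can then be copied into one table without renumbering.

```python
import sqlite3
import uuid

SCHEMA = "CREATE TABLE item (id TEXT PRIMARY KEY, name TEXT NOT NULL)"

def make_db(names):
    """Create an independent in-memory database whose rows get random UUID keys."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO item (id, name) VALUES (?, ?)",
        [(str(uuid.uuid4()), name) for name in names],
    )
    return conn

site_a = make_db(["alpha", "beta"])
site_b = make_db(["gamma", "delta"])

# Merge both sources into one table; the keys are copied as-is.
merged = sqlite3.connect(":memory:")
merged.execute(SCHEMA)
for source in (site_a, site_b):
    rows = source.execute("SELECT id, name FROM item").fetchall()
    merged.executemany("INSERT INTO item (id, name) VALUES (?, ?)", rows)

merged_count = merged.execute("SELECT COUNT(*) FROM item").fetchone()[0]
```

With auto-incrementing integers, both sources would have produced ids 1 and 2, and the second batch would have collided with the first.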



Source: https://stackoverflow.com/questions/166750/should-i-have-a-dedicated-primary-key-field
