Why use an auto-incrementing primary key when other unique fields exist?

问题

I'm taking a course called "database systems" and for our class project I have to design a website.

Here's an example of a table I created:

CREATE TABLE users
(
  uid INT NOT NULL AUTO_INCREMENT,
  username VARCHAR(60),
  passhash VARCHAR(255),
  email VARCHAR(60),
  rdate DATE,
  PRIMARY KEY(uid)
);

The professor told me "uid" (user id) was completely useless and unnecessary and I should have used the username as the primary key, since no two users can have the same username.

I told him it was convenient for me use a user id because when I call something like domain.com/viewuser?id=5 I just check the parameter with: is_numeric($_GET['id'])... needless to say he was not convinced.

Since I've seen user_id and other similar attributes (thread_id, comment_id, among others) on plenty of tutorials and looking at the database schema of popular software (eg. vbulletin) there must be plenty of other (stronger) reasons.

So my question is: How would you justify the need of a not null auto incrementing id as a primary key vs using another attribute like the username?

回答1:

Auto-incrementing primary keys are useful for several reasons:

They allow duplicate user names as on Stack Overflow
They allow the user name (or email address, if that's used to login) to be changed (easily)
Selects, joins and inserts are faster than varchar primary keys as its much faster to maintain a numeric index
As you mentioned, validation becomes very simple: if ((int)$id > 0) { ... }
Sanitation of input is trivial: $id = (int)$_GET['id']
There is far less overhead as foreign keys don't have to duplicate potentially large string values

I would say trying to use any piece of string information as a unique identifier for a record is a bad idea when an auto-incrementing numeric key is so readily available.

Systems with unique user names are fine for very small numbers of users, but the Internet has rendered them fundamentally broken. When you consider the sheer number of people named "john" that might have to interact with a website, it's ridiculous to require each of them to use a unique display name. It leads to the awful system we see so frequently with random digits and letters decorating a username.

However, even in a system where you enforced unique usernames, it's still a poor choice for a primary key. Imagine a user with 500 posts: The foreign key in the posts table is going to contain the username, duplicated 500 times. The overhead is prohibitive even before you consider that somebody might eventually need to change their username.

回答2:

If the username is the primary key and a user changes his/her username, you will need to update all the tables which have foreign key references to the users table.

回答3:

If you have demonstrated to your professor that assigning a unique arbitrary integer to each user is of value to your application then of course he would be wrong to say that it is "completely useless and unnecessary".

However, maybe you missed his point. If he told you that the requirement is that "no two users can have the same username" then you haven't met that requirement.

Sincere thanks for posting your SQL DDL, it is very useful but most don't bother on SO.

Using your table, I can do this:

INSERT INTO users (username) VALUES (NULL);
INSERT INTO users (username) VALUES (NULL);
INSERT INTO users (username) VALUES (NULL);
INSERT INTO users (username) VALUES (NULL);
INSERT INTO users (username) VALUES (NULL);

Which results in this:

SELECT uid, username, passhash, email, rdate 
FROM users;

uid   username   passhash   email   rdate
1     <NULL>     <NULL>     <NULL>  <NULL>
2     <NULL>     <NULL>     <NULL>  <NULL>
3     <NULL>     <NULL>     <NULL>  <NULL>
4     <NULL>     <NULL>     <NULL>  <NULL>

I think is the point your professor was trying to make: without enforcing the natural key on username you don't really have any data integrity at all.

If I was the prof, I'd also urge you to remove nullable columns from your design.

回答4:

This is typically called a surrogate key and it has many benefits. One of which is insulating your database relationships from the application data. More details and the corresponding disadvantages can be found at the wiki link provided above.

回答5:

Because someone might want to change their username (or any name for that matter).

回答6:

Your professor is doing the right thing by pointing out that you should have made username unique and not nullable if it was a requirement that user names should be unique. The uid could be a key as well but unless you are actually using it somewhere then it isn't needed. The more important aspect of the design ought to be to implement the natural key. So I agree with your professor's comment.

回答7:

I'll need someone with more database knowledge to back me up on this one, but i believe you get a faster response in foreign key lookup time.

Additionally, you may decide later that you want usernames to change, or that the requirements for usernames may change (maybe a longer string?). Using an ID prevents having to change all foreign keys.

Lets face it, most projects aren't going to expand that much, but do you really want to risk the headache 12 months down the road, when you could conform to good programming standards now?

回答8:

For instance, integer search (?id=5) is much way faster and has higher cardinality than string search (?username=bob). Another example, uid is auto_increment, so you don't have to insert it explicitly but it will auto increment in each insert query.

PS: Your prof is soooo wrong about it :D

回答9:

we use ID to prevent duplication data and it can make some procces become not complicated (if we want to update or delete data), it more simple if we use ID.

if you dont want to use ID you can use another fields. but dont forget to make them become UNIQUE. it can make your data become preventive from duplication data.

another way outside PRIMARY is UNIQUE.

回答10:

I go with all the answers above. I would say an ID is easy to implement and when it comes to indexing, Int is always preferred compared to a varchar. Your professor should know better, why would he say no to Int id is above me!

回答11:

Because userid is supposed to be unique (cannot be duplicated) & sometimes is index.

回答12:

And do you want to store your usernames in clear text for any one to steal? I would never consider using a natural key that I might want to encrypt someday (or want to encrypt now).

来源：https://stackoverflow.com/questions/4103433/why-use-an-auto-incrementing-primary-key-when-other-unique-fields-exist

标签

sql

database

database-design

data-modeling