Use item-specific prefixes and autonumber for primary keys?

Question

We had a meeting this morning about how we should store the IDs for some assets in the database we are building. The discussion generated a bit of heat, so I decided to consult the experts of SO.

The table structure that I believe we should have (short version) is the following:

Example 1)

AssetId - int(32) - Primary Key
Type - string

So some example data looks like this:

==AssetId======Type===
  12345        "Manhole"
  155415       "Pit"

etc.

Another member of the team suggested something like this:

Example 2)

AssetId - string - Primary Key
Type - string

So some example data looks like this:

==AssetId======Type===
  "MH12345"    "Manhole"
  "P155415"    "Pit"

where we make a short version of the type, prepend it to the ID, and store the result in the database. I have seen a few asset databases that do this and have never really liked this approach.

I have never really liked the idea of using strings as IDs, for sorting reasons. I also feel it stores useless information just for the sake of it, when the asset type is already stored anyway.
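To make the sorting concern concrete, here is a small Python sketch with hypothetical IDs. It relies on Python's default string ordering, which compares character by character much like a plain binary collation in a database; other collations may order differently.

```python
# Hypothetical prefixed IDs in the style of Example 2, and their integer counterparts.
prefixed_ids = ["MH9", "MH10", "P155415", "MH100"]
int_ids = [9, 10, 155415, 100]

# Strings compare character by character, so "MH10" sorts before "MH9" ("1" < "9").
print(sorted(prefixed_ids))  # ['MH10', 'MH100', 'MH9', 'P155415']

# Integers compare by numeric value.
print(sorted(int_ids))       # [9, 10, 100, 155415]
```

Note also that the string order groups rows by prefix (all "MH" before all "P") rather than by the numeric part.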

What approach would you take? And why? Are there any benefits to using approach 1 over 2?

EDIT: Yes, I will be using AUTO_INCREMENT for approach 1.
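A minimal sketch of approach 1, assuming SQLite (via Python's built-in `sqlite3` module) as a stand-in for the real DBMS. SQLite's equivalent of MySQL's `AUTO_INCREMENT` is `AUTOINCREMENT` on an `INTEGER PRIMARY KEY` column; the table and column names are illustrative.

```python
import sqlite3

# In-memory SQLite database standing in for the real DBMS (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE asset (
        asset_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, assigned by the DB
        type     TEXT NOT NULL
    )
    """
)

# The application never supplies asset_id; the database generates it.
for asset_type in ("Manhole", "Pit", "Manhole"):
    cur = conn.execute("INSERT INTO asset (type) VALUES (?)", (asset_type,))
    print(cur.lastrowid, asset_type)

rows = conn.execute("SELECT asset_id, type FROM asset ORDER BY asset_id").fetchall()
print(rows)  # [(1, 'Manhole'), (2, 'Pit'), (3, 'Manhole')]
```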

Answer 1:

Usually the rule of thumb is never to use meaningful information in primary keys (like a Social Security number or a barcode); just use a plain auto-incremented integer. However constant the data seems, it may change at some point (new legislation comes in and all SSNs are recalculated).

Answer 2:

This is a decision between surrogate and natural keys, the first being surrogate (or "technical") and the second being natural.

I've come to the conclusion that you should pretty much always use surrogate keys. If you use natural keys, those may change, and updating primary/foreign keys is generally not a good idea.
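A sketch of why surrogate keys make such changes cheap, assuming SQLite via Python's `sqlite3`. The `inspection` table, the `code` column and the new code value are hypothetical. Because referring rows store the surrogate `asset_id`, changing the business identifier touches exactly one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript(
    """
    CREATE TABLE asset (
        asset_id INTEGER PRIMARY KEY,        -- surrogate key
        code     TEXT NOT NULL UNIQUE        -- natural/business identifier; may change
    );
    CREATE TABLE inspection (
        inspection_id INTEGER PRIMARY KEY,
        asset_id      INTEGER NOT NULL REFERENCES asset(asset_id),
        note          TEXT
    );
    INSERT INTO asset (asset_id, code) VALUES (1, 'MH12345');
    INSERT INTO inspection (asset_id, note) VALUES (1, 'lid cracked'), (1, 'cleaned');
    """
)

# The business code changes (e.g. a new naming standard): one row is updated.
cur = conn.execute("UPDATE asset SET code = 'SWMH-12345' WHERE asset_id = 1")
print(cur.rowcount)  # 1

# Referring rows still point at asset_id 1; none of them had to be rewritten.
rows = conn.execute(
    """
    SELECT a.code, i.note
    FROM inspection i JOIN asset a ON a.asset_id = i.asset_id
    ORDER BY i.inspection_id
    """
).fetchall()
print(rows)  # [('SWMH-12345', 'lid cracked'), ('SWMH-12345', 'cleaned')]
```

With the code itself as the primary key, the same change would also have to be applied to every referring row (or cascaded by the DBMS).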

Answer 3:

I'd go for the former. Creating unique IDs should be left to the SQL server, and you can't have those created automagically in a thread-safe manner if they're strings. As I understand it, you'd have to handle that yourself somehow.
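If a prefixed label is still wanted for display, one option is to let the database assign the integer key and derive the label on read, so the application never generates IDs itself. A minimal sketch, assuming SQLite via Python's `sqlite3`; the `asset_label` view and the 'MH'/'P' abbreviations are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE asset (
        asset_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- assigned by the database
        type     TEXT NOT NULL
    );
    -- Prefixed label computed on read from the DB-assigned integer; never stored.
    -- The 'MH'/'P' abbreviations are hypothetical.
    CREATE VIEW asset_label AS
    SELECT asset_id,
           CASE type WHEN 'Manhole' THEN 'MH' WHEN 'Pit' THEN 'P' END || asset_id AS label
    FROM asset;
    """
)

for asset_type in ("Manhole", "Pit"):
    conn.execute("INSERT INTO asset (type) VALUES (?)", (asset_type,))

labels = conn.execute("SELECT label FROM asset_label ORDER BY asset_id").fetchall()
print(labels)  # [('MH1',), ('P2',)]
```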

Speed is another factor: dealing with int values is always going to be faster than dealing with strings. I'd say there are other performance benefits around indexing that someone much more SQL-savvy than me could elaborate on ;)

In my experience, having string IDs has been a fail.

Answer 4:

I would choose a numeric primary key for performance reasons. Integer comparisons are much cheaper than string comparisons, and it will occupy less space in the DB.
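A rough illustration of the space argument, assuming a fixed-width 32-bit integer encoding. Real on-disk formats differ by DBMS (SQLite, for one, stores integers in a variable number of bytes), and string columns usually carry extra per-value length overhead on top of the characters.

```python
import struct

# A 32-bit signed integer key occupies 4 bytes in a fixed-width encoding.
int_key = struct.pack("<i", 155415)
print(len(int_key))  # 4

# The prefixed string needs one byte per ASCII character here,
# before any length header or collation overhead the DBMS adds.
str_key = "P155415".encode("utf-8")
print(len(str_key))  # 7
```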

Answer 5:

Well, I want to make a few points and suggestions:

Consider having a separate table for Type, say with the columns Id and Desc, and then make a foreign key TypeId in this table. That takes normalization one step further, but it may not be desirable; do it only if you think it serves some purpose.
Making the key a string does make sense if you later think of shifting towards UUIDs; you wouldn't need to change the data type then.
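A sketch of the separate Type table suggested above, assuming SQLite via Python's `sqlite3`. The suggested `Desc` column is renamed `description` here because `DESC` is an SQL keyword; all table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable FK enforcement in SQLite
conn.executescript(
    """
    CREATE TABLE asset_type (
        id          INTEGER PRIMARY KEY,
        description TEXT NOT NULL UNIQUE   -- 'Desc' renamed: DESC is an SQL keyword
    );
    CREATE TABLE asset (
        asset_id INTEGER PRIMARY KEY AUTOINCREMENT,
        type_id  INTEGER NOT NULL REFERENCES asset_type(id)
    );
    INSERT INTO asset_type (id, description) VALUES (1, 'Manhole'), (2, 'Pit');
    INSERT INTO asset (type_id) VALUES (1), (2);
    """
)

rows = conn.execute(
    "SELECT a.asset_id, t.description FROM asset a "
    "JOIN asset_type t ON t.id = a.type_id ORDER BY a.asset_id"
).fetchall()
print(rows)  # [(1, 'Manhole'), (2, 'Pit')]

# The foreign key rejects a type that does not exist in the lookup table.
try:
    conn.execute("INSERT INTO asset (type_id) VALUES (99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```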

[Edited]

I agree with Cletus here. Surrogate keys have proved beneficial in some real-life projects. They allow for change, and as you well know, change is the only constant.

Answer 6:

I personally believe the first approach is far, far better. It lets the database software do simple integer comparisons to find and sort by the key, which will improve table operation performance (SELECTs, complex JOINs, by-key INDEX lookups, etc.)

Of course, I'm assuming that either way, you're using some kind of auto-incrementing method to produce the IDs - either a sequence, an AUTO_INCREMENT, or something similar. Do me a favor, and don't build those in your program's code, OK?

Answer 7:

I prefer Example 1 for the reasons you mentioned. The only argument I can think of for Example 2 is if you are trying to accommodate string IDs from an existing database (quite common); however, even in that scenario I prefer the following approach:

==AssetId(PK)==Type========DeprecatedId====
  12345        "Manhole"   "MH64247"
  155415       "Pit"       "P6487246"
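A sketch of the legacy-ID column approach, assuming SQLite via Python's `sqlite3`; `find_by_legacy_id` is a hypothetical helper. The `UNIQUE` constraint keeps legacy IDs unambiguous while still allowing new assets to leave the column NULL (SQLite permits multiple NULLs in a `UNIQUE` column).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE asset (
        asset_id      INTEGER PRIMARY KEY,   -- surrogate key used for all joins
        type          TEXT NOT NULL,
        deprecated_id TEXT UNIQUE            -- legacy string ID, kept only for lookups
    );
    INSERT INTO asset VALUES (12345, 'Manhole', 'MH64247'),
                             (155415, 'Pit', 'P6487246');
    """
)

def find_by_legacy_id(conn, legacy_id):
    """Translate a legacy string ID into the surrogate key (None if unknown)."""
    row = conn.execute(
        "SELECT asset_id FROM asset WHERE deprecated_id = ?", (legacy_id,)
    ).fetchone()
    return row[0] if row else None

print(find_by_legacy_id(conn, "P6487246"))  # 155415
print(find_by_legacy_id(conn, "MH00000"))   # None
```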

Answer 8:

If your assets already have unique natural identifiers (such as employees with their employee IDs), use them. There's no point creating another unique identifier.

On the other hand, if there's no natural unique ID, use the shortest one you can that'll ensure enough unique keys for your expected table size (such as your integer). It'll require less disk space and probably be faster. In addition, if you find yourself needing a string-based key later, it's a simple substitution job:

Add a string primary key to the asset table.
Add a string foreign key to the referring tables.
Populate the string relationships with a simple UPDATE command that uses the integer relationships.
Add foreign key constraints for the string columns.
Remove the foreign key constraints for the integer columns.
Remove the integer columns altogether.

Some of these steps may be problematic on specific DBMSs, perhaps requiring a table unload/reload to delete the integer primary key columns, but that strategy is basically what's required.
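The steps above can be sketched in SQLite via Python's `sqlite3`, under these assumptions: the `inspection` table and the 'MH'/'P' code scheme are hypothetical, and because SQLite cannot add or drop foreign-key constraints in place, the referring table is rebuilt (the "unload/reload" case). For brevity this sketch only rebuilds the referring table; the asset table keeps its integer column, and removing it would need a similar rebuild.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Starting point: integer surrogate keys.
    CREATE TABLE asset (asset_id INTEGER PRIMARY KEY, type TEXT NOT NULL);
    CREATE TABLE inspection (
        inspection_id INTEGER PRIMARY KEY,
        asset_id      INTEGER NOT NULL REFERENCES asset(asset_id),
        note          TEXT
    );
    INSERT INTO asset VALUES (12345, 'Manhole'), (155415, 'Pit');
    INSERT INTO inspection (asset_id, note)
        VALUES (12345, 'lid cracked'), (155415, 'silted');

    -- Add the string key to asset (hypothetical code scheme) and make it unique.
    ALTER TABLE asset ADD COLUMN asset_code TEXT;
    UPDATE asset SET asset_code =
        CASE type WHEN 'Manhole' THEN 'MH' ELSE 'P' END || asset_id;
    CREATE UNIQUE INDEX asset_code_uq ON asset(asset_code);

    -- Add the string FK column and fill it using the integer relationship.
    ALTER TABLE inspection ADD COLUMN asset_code TEXT;
    UPDATE inspection SET asset_code =
        (SELECT a.asset_code FROM asset a WHERE a.asset_id = inspection.asset_id);

    -- Swap the constraints and drop the integer FK by rebuilding the table.
    CREATE TABLE inspection_new (
        inspection_id INTEGER PRIMARY KEY,
        asset_code    TEXT NOT NULL REFERENCES asset(asset_code),
        note          TEXT
    );
    INSERT INTO inspection_new (inspection_id, asset_code, note)
        SELECT inspection_id, asset_code, note FROM inspection;
    DROP TABLE inspection;
    ALTER TABLE inspection_new RENAME TO inspection;
    """
)

rows = conn.execute(
    "SELECT i.note, a.type FROM inspection i "
    "JOIN asset a ON a.asset_code = i.asset_code ORDER BY i.inspection_id"
).fetchall()
print(rows)  # [('lid cracked', 'Manhole'), ('silted', 'Pit')]

cols = [r[1] for r in conn.execute("PRAGMA table_info(inspection)")]
print(cols)  # ['inspection_id', 'asset_code', 'note']
```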

Answer 9:

The one and only advantage of example 2 is that you can easily tell, from the primary key alone, which row of which table the key applies to. The idea is nice, but whether or not it is useful depends on your logging and error-message strategies. It probably does have a performance disadvantage, so I would not use it unless you can name specific reasons to.

(You can have this advantage also by using a global sequence to generate numerical keys, or by using different numeric ranges, last digits or whatever. Then you don't have performance disadvantages, but maybe you won't find the table so easily.)
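A sketch of the "last digits" variant, assuming a hypothetical one-digit table code appended to a value drawn from a sequence (so at most ten tables). The key stays numeric, yet the table can be recovered with simple arithmetic.

```python
# Hypothetical table codes; any small fixed mapping (0-9) would do.
TABLE_CODES = {"manhole": 1, "pit": 2}
CODE_TABLES = {code: name for name, code in TABLE_CODES.items()}

def make_key(sequence_value: int, table: str) -> int:
    """Append a one-digit table code to a sequence value (e.g. 15541, 'pit' -> 155412)."""
    return sequence_value * 10 + TABLE_CODES[table]

def table_of(key: int) -> str:
    """Recover the table name from the key's last digit."""
    return CODE_TABLES[key % 10]

k = make_key(15541, "pit")
print(k, table_of(k))  # 155412 pit
```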

Source: https://stackoverflow.com/questions/506164/use-item-specific-prefixes-and-autonumber-for-primary-keys

Tags

database-design

primary-key