Is this a good way to model address information in a relational database?

删除回忆录丶 提交于 2019-11-27 17:32:09

I actually use this as one of my interview questions. The following is a good place to start:

Addresses
---------
AddressId (PK)
Street1
... (etc)

and

AddressTypes
------------
AddressTypeId
AddressTypeName

and

UserAddresses (substitute "Company", "Account", whatever for Users)
-------------
UserId
AddressTypeId
AddressId

This way, your addresses are totally unaware of how they are being used, and your entities (Users, Accounts) don't directly know anything about addresses either. It's all up to the linking tables you create (UserAddresses in this case, but you can do whatever fits your model).

One piece of somewhat contradictory advice for a potentially large database: go ahead and put a "primary" address directly on your entities (in the Users table in this case) along with a "HasMoreAddresses" field. It seems icky compared to just using the clean design above, but can simplify coding for typical use cases, and the denormalization can make a big difference for performance.

Option 2, without a doubt.

Some important things to keep in mind: it's an important aspect of design to indicate to the users when addresses are linked to one another. I.e. corporate address being the same as shipping address; if they want to change the shipping address, do they want to change the corporate address too, or do they want to specify a new loading dock? This sort of stuff, and the ability to present users with this information and to change things with this sort of granularity is VERY important. This is important, too, about the updates; give the user the granularity to "split" entries. Not that this sort of UI is easy to design; in point of fact, it's a bitch. But it's really important to do; anything less will almost certainly cause your users to get very frustrated and annoyed.

Also; I'd strongly recommend keeping around the old address data; don't run a process to clean it up. Unless you have a VERY busy database, your database software will be able to handle the excess data. Really. One common mistake I see about databases is attempting to overoptimize; you DO want to optimize the hell out of your queries, but you DON'T want to optimize your unused data out. (Again, if your database activity is VERY HIGH, you may need to have something does this, but it's almost a certainty that your database will work well with still having excess data around in the tables.) In most situations, it's actually more advantageous to simply let your database grow than it is to attempt to optimize it. (Deletion of sporadic data from your tables won't cause a significant reduction in the size of your database, and when it does... well, the reindexing that that causes can be a gigantic drain on the database.)

I think I agree with JohnFx..

Another thing about (snail-)mail addresses, since you want to include country I assume you want to ship/mail internationally, please keep the address field mostly freeform text. It's really annoying having to make up an 5 digit zip code when Norway don't have zip-codes, we have 4 digit post-numbers.

The best fields would be:

  • Name/Company
  • Address (multiline textarea)
  • Country

This should be pretty global, if the US-postal system require zip-codes in a specific format, then include that too but make it optional unless USA is selected as country. Everyone know how to format the address in their country, so as long as you keep the linebreaks it should be okay...

Do you want to keep a historical record of what address was originally on the purchase order?

If yes go with option 1, otherwise store it in the supplier table and link each purchase order to the supplier.

BTW: A sure sign of a poor DB design is the need for an automated job to keep the data "cleaned up" or in synch. Option 2 is likely a bad idea by that measure

Why would any of the rows on the address table become unused? Surely they would still be pointed at by the purchase order that used them?

It seems to me that stopping the duplicates should be the priority, thus negating the need for any cleanup.

In the case of orders, you would never want to update the address as the person (or company) address changed if the order has been sent. You meed the record of where the order was actually sent if there is an issue with the order.

The address table is a good idea. Make a unique constraint on it so that the same entity cannot have duplicate addresses. You may still get them as users may add another one instead of looking them up and if they spell things slightly differently (St. instead of Street) the unique constraint won't prevent that. Copy the data at the time the order is created to the order. This is one case where you want the multiple records because you need a historical record of what you sent where. Only allowing inserts and deletes to the table makes no sense to me as they aren't any safer than updates and involve more work for the database. An update is done in one call to the database. If an address changes in your idea, then you must first delete the old address and then insert the new one. Not only more calls to the databse but twice the chance of making a code error.

I've seen every system using option 1 get into data quality trouble. After 5 years 30% of all addresses will no longer be current.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!