DB design critique / suggestions

Question

I am redesigning a pharmacy db system and need inputs to see if the new design is optimal or requires tweaking.

Here's a snapshot of the old system.

As can be seen, the pharmacies table stores pharmacy information along with its address and contact information. Pharmacies are grouped together for invoicing purposes (pharmacygroup) or for sales, advertising, and other purposes (banner group). The invoice group may have a different physical address and different contact information.

Here's my new design. I have split the address out of both the pharmacy and pharmacygroup tables into a table of its own and made a new table for contacts. There could be technical contacts, account contacts, owner contacts, etc., hence the contacttypes table. The pharmacy and the pharmacygroup can have separate contact info. I thought of making a single contact table with 'linktype' and 'linkid' columns to indicate whether a row is a pharmacy contact or a pharmacy group contact, but I am not sure if this is the right approach. Is this a good design, or will it be costly in terms of data retrieval because of the number of joins? Another thing I noticed is that the old design didn't create any foreign key constraints, although the pharmacy table had groupid and bannergroupid references to pharmacygroup and bannergroup, possibly to save time on data retrieval. Is this a good approach?

Answer 1:

Your design looks good to me. I always prefer to have a couple of extra joins at the design stage over spending time reorganizing data after the system has gone into production. You never know in advance what kind of reports will be requested by management, sales, or finance people, and a proper relational design will give you more freedom.

Also, you cannot blame a couple of extra JOINs alone for your performance issues. You should always look at:

data volumes (and physical data layout),
transaction volume and density,
I/O, CPU, and memory usage,
your RDBMS configuration,
the quality of the SQL queries.

In my view, JOINs will be at the bottom of this list.

As to the RI (Referential Integrity) constraints, I've seen a couple of projects that had been running without any primary/foreign keys for the sake of performance. The main excuse was: we have all the checks embedded in the application, and the application is the only source of changes in the system. On the other hand, they admitted that it was not known whether the systems were in a consistent state (in fact, analysis showed they were not).

I always stick to creating all possible keys/constraints at the design stage, as there will always be some “cowboys” around who will dig into your database and “adjust” data as they see fit. Still, you might want to temporarily disable or even drop some constraints/indexes for bulk data manipulations, which is also an official recommendation.
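As a rough sketch of the drop-and-recreate approach around a bulk load in PostgreSQL (the table, column, and constraint names below are assumptions for illustration, not taken from your schema):

```sql
-- Hypothetical tables standing in for the real schema.
CREATE TABLE pharmacygroup (
    groupid int4 PRIMARY KEY,
    name    text NOT NULL
);

CREATE TABLE pharmacy (
    pharmacyid int4 PRIMARY KEY,
    groupid    int4,
    name       text NOT NULL,
    CONSTRAINT f_pharmacy_groupid
        FOREIGN KEY (groupid) REFERENCES pharmacygroup (groupid)
);

BEGIN;

-- Drop the FK so the load is not checked row by row.
ALTER TABLE pharmacy DROP CONSTRAINT f_pharmacy_groupid;

-- The bulk load goes here (COPY, INSERT ... SELECT, etc.);
-- two single-row inserts stand in for it.
INSERT INTO pharmacygroup (groupid, name) VALUES (1, 'Group A');
INSERT INTO pharmacy (pharmacyid, groupid, name)
VALUES (10, 1, 'Main Street Pharmacy');

-- Re-adding the constraint validates all existing rows in one pass
-- and fails if the load introduced orphaned references.
ALTER TABLE pharmacy
    ADD CONSTRAINT f_pharmacy_groupid
    FOREIGN KEY (groupid) REFERENCES pharmacygroup (groupid);

COMMIT;
```

Because the constraint is re-added inside the same transaction, a load that leaves orphaned rows rolls back as a whole instead of leaving the table unprotected.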

If uncertain, create two test databases, one with constraints and one without. Load some data and compare query performance. I think it will be similar.
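A minimal sketch of such a comparison, simplified to two tables in a single test database rather than two databases. All names and row counts are made up for illustration, and the data is synthetic:

```sql
-- Hypothetical parent table shared by both variants.
CREATE TABLE grp (
    groupid int4 PRIMARY KEY,
    name    text NOT NULL
);
INSERT INTO grp
SELECT g, 'group ' || g FROM generate_series(1, 1000) AS g;

-- Variant 1: with a foreign key.
CREATE TABLE pharmacy_fk (
    pharmacyid int4 PRIMARY KEY,
    groupid    int4 REFERENCES grp (groupid)
);

-- Variant 2: same columns, no foreign key.
CREATE TABLE pharmacy_nofk (
    pharmacyid int4 PRIMARY KEY,
    groupid    int4
);

-- Load identical synthetic data into both variants.
INSERT INTO pharmacy_fk
SELECT p, 1 + p % 1000 FROM generate_series(1, 100000) AS p;
INSERT INTO pharmacy_nofk
SELECT p, 1 + p % 1000 FROM generate_series(1, 100000) AS p;

-- Refresh planner statistics before measuring.
ANALYZE grp;
ANALYZE pharmacy_fk;
ANALYZE pharmacy_nofk;

-- Compare the plans and timings of the same join on both variants.
EXPLAIN ANALYZE
SELECT g.name, count(*)
FROM pharmacy_fk p
JOIN grp g ON g.groupid = p.groupid
GROUP BY g.name;

EXPLAIN ANALYZE
SELECT g.name, count(*)
FROM pharmacy_nofk p
JOIN grp g ON g.groupid = p.groupid
GROUP BY g.name;
```

Foreign keys are checked on writes, so any difference should show up mainly in the load step rather than in the SELECTs; timing the two INSERT statements separately makes that visible.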

And here are my comments on your sketches; the decisions are all yours.

You might want to create a common contacts table the same way you did for addresses, i.e. add contact_id, owner_contact_id, etc. columns to the target relations instead of referencing those relations from the contacts table;
Since the contacttype table has only one column (and especially if you go with a common contacts table), it's better to move that single field out and drop the table;
You seem to have a mixture of singular and plural names for your tables; better to stick to a common pattern here. I personally prefer singular;
In pharmacygroup your PK is named id, while all the other PKs follow the tableid pattern; it will be easier to write scripts later if you use a common pattern here;
In the addresses table you have fields with underscores, like street_name, while elsewhere you avoid _; consider making this consistent;
References are named differently. Although it is not critically important, I do have a couple of systems where I have to rely on the constraints' names, so it's better to use some pattern here. I use the following one:
1. prefix p_, f_, c_, t_, u_ or i_ for primary keys, foreign keys, check constraints, triggers, unique indexes, and other indexes;
2. name of the table;
3. name of the column the constraint/index/trigger refers to.
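To illustrate the prefix, table, column naming pattern above, here is a small sketch. The tables and columns are assumptions chosen for the example, not your actual schema:

```sql
-- Hypothetical tables; only the constraint/index names matter here.
CREATE TABLE pharmacygroup (
    groupid int4 NOT NULL,
    name    text NOT NULL,
    CONSTRAINT p_pharmacygroup_groupid PRIMARY KEY (groupid),  -- p_ = primary key
    CONSTRAINT u_pharmacygroup_name    UNIQUE (name)           -- u_ = unique
);

CREATE TABLE pharmacy (
    pharmacyid int4 NOT NULL,
    groupid    int4,
    name       text NOT NULL,
    CONSTRAINT p_pharmacy_pharmacyid PRIMARY KEY (pharmacyid),
    CONSTRAINT f_pharmacy_groupid                               -- f_ = foreign key
        FOREIGN KEY (groupid) REFERENCES pharmacygroup (groupid),
    CONSTRAINT c_pharmacy_name CHECK (name <> '')               -- c_ = check
);

-- i_ = any other index.
CREATE INDEX i_pharmacy_groupid ON pharmacy (groupid);
```

With predictable names, a script can derive the constraint name from the table and column alone, for example when dropping and re-creating f_pharmacy_groupid around a bulk load.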

Why do I prefer naming tables in the singular form? Because I always name the PK using the table_id pattern, and IMHO pharmacy_id looks better than pharmacies_id. I use this approach because I have a bunch of general-purpose scripts that rely on this pattern when performing data consistency checks prior to loading data into the main tables.

EDIT: More on contacts. You can use contact_id in all your tables, making it a primary contact, whatever this might mean in your application. Should you need more contacts to be there for some relations, then you can go with different prefixes, like owner_contact_id, sales_contact_id, etc.
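A minimal sketch of this layout, assuming a common contact table with made-up columns (full_name, phone, email) and a simplified pharmacy table:

```sql
-- Hypothetical common contact table.
CREATE TABLE contact (
    contact_id int4 PRIMARY KEY,
    full_name  text NOT NULL,
    phone      text,
    email      text
);

-- Each role gets its own FK column pointing at the same table.
CREATE TABLE pharmacy (
    pharmacy_id      int4 PRIMARY KEY,
    name             text NOT NULL,
    contact_id       int4 REFERENCES contact (contact_id),  -- primary contact
    owner_contact_id int4 REFERENCES contact (contact_id),
    sales_contact_id int4 REFERENCES contact (contact_id)
);

-- Fetching a pharmacy with its primary and owner contacts
-- costs one join per role.
SELECT p.name,
       c.full_name AS primary_contact,
       o.full_name AS owner_contact
FROM pharmacy p
LEFT JOIN contact c ON c.contact_id = p.contact_id
LEFT JOIN contact o ON o.contact_id = p.owner_contact_id;
```

The LEFT JOINs keep pharmacies that have no contact assigned for a given role in the result.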

In case you expect a large number of contacts for some relations, like pharmacygroup, you can add an extra table like this:

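CREATE TABLE pharmacygroupcontact (
    contactid     int4,  -- FK to the common contacts table
    groupid       int4,  -- FK to pharmacygroup
    contact_desc  text   -- free-form description of the contact's role
);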

It partially copies your initial groupcontacts table, but consists of two FKs and a description. I cannot tell which approach is better, as I'm not aware of how the application is designed.

Answer 2:

You have two contact tables. I would create one, then use linking tables (groupcontacts and pharmacycontacts) to connect contacts to groups and pharmacies. I would definitely want to have the FK and PK relationships set up too.
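A rough sketch of what that could look like, with assumed table and column names and the parent tables cut down to the bare minimum:

```sql
-- Hypothetical parent tables.
CREATE TABLE contact (
    contactid int4 PRIMARY KEY,
    full_name text NOT NULL
);

CREATE TABLE pharmacy (
    pharmacyid int4 PRIMARY KEY,
    name       text NOT NULL
);

CREATE TABLE pharmacygroup (
    groupid int4 PRIMARY KEY,
    name    text NOT NULL
);

-- Linking table: which contacts belong to which pharmacy.
CREATE TABLE pharmacycontact (
    pharmacyid int4 NOT NULL REFERENCES pharmacy (pharmacyid),
    contactid  int4 NOT NULL REFERENCES contact (contactid),
    PRIMARY KEY (pharmacyid, contactid)
);

-- Linking table: which contacts belong to which pharmacy group.
CREATE TABLE groupcontact (
    groupid   int4 NOT NULL REFERENCES pharmacygroup (groupid),
    contactid int4 NOT NULL REFERENCES contact (contactid),
    PRIMARY KEY (groupid, contactid)
);
```

Unlike a single table with linktype/linkid columns, each link here is backed by a real foreign key, so the database itself rejects a contact that points at a non-existent pharmacy or group.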

Source: https://stackoverflow.com/questions/10099684/db-design-critic-suggestions

Tags

sql

database

postgresql