normalization in database

问题

I have one table. How can I normalize this.

回答1:

Good question given your data set. Keep in mind the whole point of normalization is to reduce duplication. 3NF is often the best way to go. But given my experience I've found very little benefit to pulling a repeated value out into a different table if it will be the only value in that table. Take for example your most duplicated column, emp_type. If you were to normalize it into a separate table , it would look like this:

Emp_Type_Id | Emp_type
------------------------
1           | Manager
2           | Engineer
3           | Tech Support

And your current table would look like:

Emp_ID | Emp_Name | Salary | Emp_Type_Id | Emp_Skill
----------------------------------------------------
1      | raj      | 90000  | 1           | Department
2      | ravi     | 50000  | 2           | Software
3      | shyam    | 70000  | 2           | Hardware
.
.
.

This is technically more normalized than before as the emp_type value is no longer repeated in your database. But it's significantly more hassle to deal with a separate table and a relationship for a single value. If emp_type included other information as well such as Valid_Salary_Range, Department_Location, etc. then it would absolutely be best to normalize it out into a separate table. But if it's only a single value you are normalizing, what's the difference between storing the value as opposed to storing an ID pointed to that value? In my opinion it's a pointless extra step.

Long story short, I would not normalize your table at all, it's already normalized to a sufficient level.

回答2:

Even before we start, I'll mention each step's rules first. Why so? Because I don't know the dependencies in your table. And so putting forth the rules and my assumptions about your data will clarify how I arrived at the final solution.

In the end, if you aren't happy with the answer, for the least you will take away the understanding of how normalization works and can go ahead with doing it by yourself on the data that only you understand the best.

The Natural Order of Database Designing

Start with specifying the entities (these are the tables)
Specify the needed attributes of your entities (these are columns)
Specify the unique property of your entity (the primary key of your table). If there is none, give one yourself (also called synthetic or surrogate primary keys).
Finally specify the relation between these entities (primary key-foreign key relation and 1-N, N-N relation etc.)

----------

Now, that the basic designing rule is in place, it is easy to see that your single table actually has four separate entities fused together. They are:

1. Employee - who has id and name

2. Role - which can be Engineer, Tech Support etc

3. Department - which can be Software, Hardware, Department (could use better word though), Voice etc.

4. Salary - which will have EmpId, DateOfChange and Amount. This is because employees have different salaries and also the salaries of same employee changes over time.

So, we are going to break this table to four tables as mentioned above. Out of the four tables, Employee table has id as the primary key, Role and Department need synthetic (maybe even auto-incrementing) keys and Salary will have {EmpId, DataOfChange} together as the primary key. This will look like:

Table Name    Columns
Employee      Id, Name, RoleId, DeptId
Role          RoleId, RoleType
Department    DeptId, DeptName
Salary        EmpId, DateOfChange, Amount

All the above table can have more items. I'm trying to hit the design that has minimalistic difference from that table you have already given. Like Salary table can also have a field like ReasonOfSalaryChange which could have values like NewHiring, Promotion etc... But we hold the changes to minimal.

Hope all is ok until now. If that is so, we move on to the actual normalization that you have asked.

Common Use Normalizations

I'm mentioning common use rules because honestly I never needed BCNF, 4th and 5th NF in practice.

The rules -

1NF: simply states that all columns must have atomic values. If a column needs more than one value, create another table. Your new tables pass that test. All values in your table's columns are atomic.

2NF: requires 1NF qualification and that any non-key field should be dependent on the entire primary key. All your non-key fields (Name in Employee, RoleType in Role, DeptName in Department, Amount in Salary) depend on the respective table's primary key (Id, RoleId, DeptId and {EmpId, DateOfChange}). So the tables qualify 2NF fine.

3NF: requires 2NF qualification and that no non-key field should depend upon any other non-key field. This means that there should be no dependencies between table's columns except on the primary keys. Role, Department and Salary tables are default 3NF qualified as they have only one non-key column and it's dependent on PK. Employee table, you can verify yourself, has no non-key element that is dependent on any other non-key element. So the tables are perfectly 3NF qualified the way they currently are.

All that's left now is to point out that RoleId, DeptId and EmpId are foreign keys from the Role, Department and Employee table respectively. And that would be my final redesigned and normalized database submission to you.

来源：https://stackoverflow.com/questions/32190581/normalization-in-database

标签

database-design

database-normalization