What is Normalization?
Normalization is the process of splitting large database tables into smaller, related tables to reduce data redundancy.
What is data redundancy?
Data redundancy is the unnecessary duplication of data in a database. It not only increases the size of the database but also creates data anomalies.
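To make the idea of redundancy concrete, here is a minimal Python sketch that counts how many times the same project and manager pair is stored again in a flat table. The rows, the column layout, and the placeholder names ("Employee A", "Manager X") are assumptions made up for illustration; only GLO, NJAS, Qadir Shaikh, and Umer Shaikh come from the example used in this article.

```python
from collections import Counter

# Hypothetical rows of an unnormalized employee table:
# (employee_id, employee_name, project_code, project_manager)
rows = [
    (1, "Employee A", "GLO", "Qadir Shaikh"),
    (2, "Employee B", "GLO", "Qadir Shaikh"),
    (3, "Employee C", "GLO", "Qadir Shaikh"),
    (4, "Umer Shaikh", "NJAS", "Manager X"),
]


def redundant_copies(rows):
    """Return how many extra copies of each (project, manager) pair are stored."""
    counts = Counter((code, manager) for _, _, code, manager in rows)
    # One copy is needed; every copy beyond the first is redundant.
    return {pair: n - 1 for pair, n in counts.items() if n > 1}


print(redundant_copies(rows))  # {('GLO', 'Qadir Shaikh'): 2}
```

Every extra copy is one more place that has to be kept in sync whenever the project details change.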
What are data anomalies?
Anomalies are problems that occur when a table holds too much redundant data, which indicates that the database is poorly planned and unnormalized. There are three types of data anomalies:
- Update Anomaly: This happens when we try to update a record in our table. In the employee table, we have a project manager for each project. Most projects have more than one employee, which means more than one record in our table, and for each record we repeat the project code, project name, and project manager. Now suppose we have to change the project manager of the GLO project from Qadir Shaikh to Hamza Imran. We have to change it in every record where the project is GLO. If we miss one or two entries, the database is left in an inconsistent state, where the project manager is Qadir Shaikh in some records and Hamza Imran in others.
- Insertion Anomaly: Let’s say we want to add a new employee who was hired recently, is currently in training, and hasn’t been assigned to any project yet. We can’t add the employee without project details unless we fill the project fields with null values. Or let’s say we want to add a new project that we have received from a client but haven’t started working on yet, so no employee is assigned to it. We can’t store the project details without an employee, because employee_id is the primary key of the table and every record needs one.
- Deletion Anomaly: Let’s say our project NJAS is completed and we want to remove it from our table, so we delete all the entries where the project is NJAS. But if we do so, we also remove the records of all the employees (Umer Shaikh and all others) who were working on this project.
We can remove these anomalies by normalizing our Employees table. There are five rules for normalizing a table, and each rule is called a normal form. There is also another normal form called Boyce-Codd normal form, which is also called 3.5 normal form. Let’s look at these one by one.
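The update and deletion anomalies described above can be reproduced in a few lines. The following is only a sketch using Python's built-in sqlite3 module with an in-memory database. The table layout and the placeholder names ("Employee A", "Employee B", "Manager X") are assumptions for illustration and are not taken from the original Employees table.

```python
import sqlite3

# Hypothetical flat (unnormalized) table: project details repeat per employee.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    " employee_id INTEGER PRIMARY KEY,"
    " employee_name TEXT NOT NULL,"
    " project_code TEXT,"
    " project_manager TEXT)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "Employee A", "GLO", "Qadir Shaikh"),
        (2, "Employee B", "GLO", "Qadir Shaikh"),
        (3, "Umer Shaikh", "NJAS", "Manager X"),
    ],
)

# Update anomaly: the update reaches only one of the two GLO rows.
conn.execute(
    "UPDATE employees SET project_manager = 'Hamza Imran' WHERE employee_id = 1"
)
glo_managers = [
    manager
    for (manager,) in conn.execute(
        "SELECT DISTINCT project_manager FROM employees"
        " WHERE project_code = 'GLO' ORDER BY project_manager"
    )
]
print(glo_managers)  # ['Hamza Imran', 'Qadir Shaikh'] -> inconsistent

# Deletion anomaly: removing the NJAS project also removes its employees.
conn.execute("DELETE FROM employees WHERE project_code = 'NJAS'")
umer_rows = conn.execute(
    "SELECT COUNT(*) FROM employees WHERE employee_name = 'Umer Shaikh'"
).fetchone()[0]
print(umer_rows)  # 0 -> the employee record is gone
```

After the partial update, the GLO project has two different managers on record, and after the delete there is no trace of Umer Shaikh at all.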
Types of Normal Forms:
- 1st Normal Form says each cell should hold a single (atomic) value, never more than one
- 2nd Normal Form removes Partial Dependency
- 3rd Normal Form removes Transitive Dependency
- Boyce-Codd Normal Form (the stricter version of 3NF)
- 4th Normal Form removes Multi-valued Dependency
- 5th Normal Form removes Join Dependency
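As a rough illustration of what removing a transitive dependency (the 3rd Normal Form step) buys us, the sketch below moves the project details into their own table so that the manager is stored once per project. It again uses Python's sqlite3 module. The schema, the `ON DELETE SET NULL` choice, and the placeholder names ("NEWP", "Manager Y", "Trainee T") are assumptions for illustration, not the article's actual design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE projects (
        project_code    TEXT PRIMARY KEY,
        project_manager TEXT NOT NULL
    );
    CREATE TABLE employees (
        employee_id   INTEGER PRIMARY KEY,
        employee_name TEXT NOT NULL,
        project_code  TEXT REFERENCES projects(project_code) ON DELETE SET NULL
    );
    """
)

# Insertion: a project with no employees and an employee with no project both fit.
conn.executemany(
    "INSERT INTO projects VALUES (?, ?)",
    [("GLO", "Qadir Shaikh"), ("NJAS", "Manager X"), ("NEWP", "Manager Y")],
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        (1, "Employee A", "GLO"),
        (2, "Employee B", "GLO"),
        (3, "Umer Shaikh", "NJAS"),
        (4, "Trainee T", None),
    ],
)

# Update: the manager lives in exactly one row, so no record can be missed.
conn.execute(
    "UPDATE projects SET project_manager = 'Hamza Imran' WHERE project_code = 'GLO'"
)
glo_managers = [
    manager
    for (manager,) in conn.execute(
        "SELECT DISTINCT p.project_manager FROM employees e"
        " JOIN projects p ON p.project_code = e.project_code"
        " WHERE e.project_code = 'GLO'"
    )
]
print(glo_managers)  # ['Hamza Imran']

# Deletion: removing the project keeps the employee, now unassigned.
conn.execute("DELETE FROM projects WHERE project_code = 'NJAS'")
umer = conn.execute(
    "SELECT employee_name, project_code FROM employees WHERE employee_id = 3"
).fetchone()
print(umer)  # ('Umer Shaikh', None)
```

With this split, each of the three anomalies from earlier no longer occurs: the update touches one row, new projects and new employees can be stored independently, and deleting a project leaves the employee records in place.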
For more details on all the normal forms and how to apply them, you can read this article:
What is Normalization? | How to apply Normalization up to the 5th Normal Form?