In previous chapter, “Types of keys“, we had a good discussion on keys and their types in SQL Server. In this chapter “Database normalization”, we are going to discuss database normalization process and normal forms. Lets start our discussion with definition of Normalization.
Normalization
Normalization is a process of minimizing redundant data from database by decomposing the columns of a table into multiple tables. This process increases the number of tables in the database but it minimizes the redundant (duplicate), inaccurate and inconsistent data in database. Decomposed tables are connected using relationships (primary keys and foreign keys) to reduce the insert, update and delete anomalies. During normalization process, database designer decides the actual structure of the tables and their relationships. It is easy to find and fix any design problem at this early stage. During normalization process, we have various normal forms with some conditions. A table must satisfy all conditions of that normal form to qualify be in that normal form. Lets discuss what is a normal form in next sub section.
Normal Form
Normal form defines a set of standard which must be followed for a good database design. During normalization process database designer converts the database tables in normal forms and check its functional dependency. Each normal form has some predefined standards which must be followed to qualify that normal form. To qualify each normal form, a table must qualify the previous normal form conditions first i.e. a table in 3NF must satisfy all conditions of 2NF which in turn must satisfy all conditions of 1NF.
Lets try to understand the types of normal form with an example of employee table:
First Normal Form – 1NF
A table can be in First Normal Form (1NF), when it satisfies below conditions:
- All columns of a table must have a single atomic value in each row of a table i.e. a cell (intersection of rows and columns) in a table must contain a single atomic value.
- Each row of a table should have an unique identifier to uniquely identify the rows of the table i.e. no two rows of a table can be identical in 1NF. A primary key or composite key can be used to uniquely identify records of the table.
To understand the concept of First Normal Form (1NF), consider this employee table for an example:
EmpId | EmpName | ProjectId | ProjectName | Zip | City |
---|---|---|---|---|---|
100 | Rohit | 300,301 | ERP, Finance Management | 123456 | Kolkata |
101 | Ramesh | 300,302 | ERP, KPO Management | 654321 | New Delhi |
To bring the employee Table in First Normal Form; 1) Organize the ProjectId and ProjectName attribute values in single atomic values and then 2) EmpId and ProjectId attributes can uniquely identify each row of the employee table.
Have a look on employee table after bringing it in First Normal Form:
EmpId | ProjectId | EmpName | ProjectName | Zip | City |
---|---|---|---|---|---|
100 | 300 | Rohit | ERP | 123456 | Kolkata |
100 | 301 | Rohit | Finance Management | 123456 | Kolkata |
101 | 300 | Ramesh | ERP | 654321 | New Delhi |
101 | 302 | Ramesh | KPO Management | 654321 | New Delhi |
Note: EmpId and ProjectId uniquely identifies each row of employee table.
Second Normal Form – 2NF
A table can be in Second Normal Form (2NF), if satisfies below conditions:
- The table must satisfy all the conditions of 1NF.
- All non-key attributes (columns) are dependent on key columns i.e. Each column of the table must be fully functionally dependent on key column (or set of columns). Partial dependencies must be removed from the table in case table has a composite primary key.
Note: A table with a single column primary key is automatically in 2NF and does not need to be tested for partial dependency.
In continuation of employee table as an example, to bring this table in 2NF, we have to find and remove any partial dependency of non key columns (EmpName, ProjectName, Zip and City) on key columns (EmpId and ProjectId). We can see that column ProjectName functionally depends on ProjectId but not on EmpId attribute which shows a partial dependency in the table. To bring this table in Second Normal Form, we need to break this table in two tables, employee table and project table as below:
Employee Table in 2NF:
EmpId | EmpName | Zip | City |
---|---|---|---|
100 | Rohit | 123456 | Kolkata |
101 | Ramesh | 654321 | New Delhi |
Project Table in 2NF:
EmpId | ProjectId | ProjectName |
---|---|---|
100 | 300 | ERP |
100 | 301 | Finance Management |
101 | 300 | ERP |
101 | 300 | KPO Management |
Third Normal Form – 3NF
A table must satisfy below conditions to be in Third Normal Form (3NF):
- The table must satisfy all conditions of 2NF.
- Transitive dependency of non-key attributes on key column must be removed i.e. if column A depends on column B and column B depends on column C, column A is transitively depends on column C. Any non-key column must not dependent on another non-key column.
To bring employee table in Third Normal Form (3NF), we have to find and remove any transitive dependency from this table. We can see that attribute City depends on Zip, and attribute Zip depends on EmpId which shows a transitive dependency of city attribute on EmpId. To bring this table in 3NF, split the employee table as below:
Employee table in 3NF:
EmpId | EmpName | Zip |
---|---|---|
100 | Rohit | 123456 |
101 | Ramesh | 654321 |
Zip – City table:
Zip | City |
---|---|
123456 | Kolkata |
654321 | New Delhi |
Boyce-Codd Normal Form
Boyce-Codd Normal Form (BCNF) is an extension of 3NF with strict condition. A table must satisfy below conditions to be in BCNF:
- Table must be in 3NF.
- For any functional dependency X -> Y, X should be a super key.
In employee table, in dependency EmpId -> EmpName, Zip, EmpId is the super key and in Zip – City table, in dependency Zip -> City, Zip is the super key for city.
We might need to test the above BCNF functional dependency in tables which have overlapping composite candidate keys.
Normal forms at a glance
Its time to summarize our reading. We have below image to summarize the reading on normal forms:
Note on normalization
Normalization helps in managing a database efficiently by reducing inconsistent, inaccurate and redundant data from database but over normalization must be avoided during this practice. Over normalization might be occurred as a result of decomposing a table in many small tables in such a way so that a reference table need to refer another table to get a single unit of information. Also under normalization leads to inconsistent, inaccurate and redundant data in database. We must balance database normalization between over normalization and under normalization to get the benefit of fast query processing and efficiency of database management.
Till now, we have discussed below database concepts in this tutorial:
Now we are good to start with T-SQL and its concepts from next chapter.
Pingback: Tech Recruiter Tries Coding pt. 2 — Google Apps (Type) Script | FusionWorks