Normal Forms

4.5 Normal Forms

Normalization is a procedure in relational database design that aims at converting relational schemas into a more desirable form. The goal is to remove redundancy in relations and the problems that follow from it, namely insertion, deletion and update anomalies.

The normal forms progress towards an optimal design. Normalization is a stepwise process in which each step transforms the relational schemas into a higher normal form. Each normal form satisfies all the requirements of the previous normal forms and adds a further restriction of its own.

4.5.1 First Normal Form (1NF)

A relation is considered to be in first normal form if all of its attributes have domains that are indivisible or atomic.

Requiring atomic values for every attribute ensures that there are no ‘repeating groups’. A relational database management system can store only a single value at the intersection of a row and a column. A repeating group arises when we attempt to store multiple values at such an intersection, and a table that contains such a value is not strictly relational.

As per C. J. Date’s extended definition [4.8], “A table is in 1NF if and only if it satisfies the following five conditions:

• There is no top-to-bottom ordering to the rows.
• There is no left-to-right ordering to the columns.
• There are no duplicate rows.

Chapter 4 – Relational Database Design 97

• Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else).
• All columns are regular [i.e. rows have no hidden components such as row IDs, object IDs, or hidden timestamps].”

For example, a column storing "Relatives of a family" is not an atomic-valued attribute, because each value refers to a set of names. By contrast, a column such as Employee Number, which cannot be broken down further, holds atomic values.

Example

Consider the following table that shows the Movie relation. In the relation, {Movie_Title, Year} form a candidate key.

Movie_Title   Year  Type      Director    Director_DOB  Yr_releases_cnt  Actors
Notting Hill  1999  Romantic  Roger M     05/06/1956    30               Hugh G, Rhys I
Lagaan        2000  Drama     Ashutosh G  15/02/1968    50               Aamir K, Gracy S

Table 4.8 - Non-normalized relation Movie

The above relation is not in 1NF and is not even strictly relational, because the Actors attribute contains values that are further divisible. To convert it into a 1NF relation, we decompose the table into a Movie table and a Cast table, as shown in Figure 4.2 below.

Database Fundamentals

Figure 4.2 – Converting to First Normal Form. Example of a relation in 1NF

In Figure 4.2, the intersection of each row and column in each table now holds an atomic value that cannot be broken down further. Thus, the decomposition has produced relations in 1NF, assuming that an actor name in the Actors column is not to be divided further into ‘first name’ and ‘surname’.
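The decomposition into a Movie table and a Cast table can be sketched in Python as follows. This is only an illustration under stated assumptions: the rows are those of Table 4.8, the function name `to_first_normal_form` is hypothetical, and the Cast table layout (Movie_Title, Year, Actor) is assumed, since the contents of Figure 4.2 are not reproduced here.

```python
# Rows of Table 4.8; Actors holds several values per row (a repeating group).
movies_unnormalized = [
    {"Movie_Title": "Notting Hill", "Year": 1999, "Type": "Romantic",
     "Director": "Roger M", "Director_DOB": "05/06/1956",
     "Yr_releases_cnt": 30, "Actors": ["Hugh G", "Rhys I"]},
    {"Movie_Title": "Lagaan", "Year": 2000, "Type": "Drama",
     "Director": "Ashutosh G", "Director_DOB": "15/02/1968",
     "Yr_releases_cnt": 50, "Actors": ["Aamir K", "Gracy S"]},
]


def to_first_normal_form(rows):
    """Split rows into a Movie table without Actors and a Cast table
    holding one actor per row (assumed layout, see lead-in)."""
    movie_table, cast_table = [], []
    for row in rows:
        # Keep every attribute except the multi-valued one.
        movie_table.append({k: v for k, v in row.items() if k != "Actors"})
        # Emit one Cast row per actor, carrying the candidate key of the movie.
        for actor in row["Actors"]:
            cast_table.append({"Movie_Title": row["Movie_Title"],
                               "Year": row["Year"],
                               "Actor": actor})
    return movie_table, cast_table


movie_table, cast_table = to_first_normal_form(movies_unnormalized)
```

Every cell in both resulting tables holds a single value, and each Cast row can be joined back to its movie through {Movie_Title, Year}.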

4.5.2 Second Normal Form (2NF)

A relation is in second normal form when it is in 1NF and no non-key attribute depends on only part of a candidate key; every non-key attribute must depend on the entire candidate key.

It follows from the above definition that a relation that has a single attribute as its candidate key is always in 2NF.

Example

To normalize the above 1NF movie relation further, we try to convert it to 2NF by eliminating any dependency on part of the candidate key. In the Movie relation, Yr_releases_cnt depends on Year alone. That is, Year → Yr_releases_cnt, but Year is only part of the candidate key {Movie_Title, Year}.

So to achieve 2NF, we further decompose the above tables into Movie relation, Yearly releases relation and Cast relation as shown in Figure 4.3.

Figure 4.3 – Converting to Second Normal Form

In the above figure, each non–key attribute is now dependent on the entire candidate key and not a part of it. Thus, the above decomposition has now produced a relation in 2NF.
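A minimal Python sketch of this step is given below. The helper names `fd_holds` and `project` are hypothetical, and the rows are the 1NF Movie rows derived from Table 4.8. Note that checking an FD against sample rows can only refute it; whether Year → Yr_releases_cnt really holds is a property of the schema, which the text asserts.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key.
        if seen.setdefault(key, value) != value:
            return False
    return True


def project(rows, attrs):
    """Project rows onto attrs, dropping duplicates, keeping first-seen order."""
    out = []
    for row in rows:
        item = {a: row[a] for a in attrs}
        if item not in out:
            out.append(item)
    return out


# 1NF Movie relation (Actors already moved to a separate Cast relation).
movie_1nf = [
    {"Movie_Title": "Notting Hill", "Year": 1999, "Type": "Romantic",
     "Director": "Roger M", "Director_DOB": "05/06/1956", "Yr_releases_cnt": 30},
    {"Movie_Title": "Lagaan", "Year": 2000, "Type": "Drama",
     "Director": "Ashutosh G", "Director_DOB": "15/02/1968", "Yr_releases_cnt": 50},
]

# Year is only part of the candidate key {Movie_Title, Year}.
partial_dependency = fd_holds(movie_1nf, ["Year"], ["Yr_releases_cnt"])

# Move the partially dependent attribute into its own relation keyed by Year.
yearly_releases = project(movie_1nf, ["Year", "Yr_releases_cnt"])
movie_2nf = project(
    movie_1nf, ["Movie_Title", "Year", "Type", "Director", "Director_DOB"])
```

Year stays in the Movie relation so that the two relations can still be joined.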

4.5.3 Third Normal Form (3NF)

A relation is in third normal form if it is in 2NF and no non-key attribute depends transitively on the candidate key. That is, every non-key attribute depends directly on the primary key, and not through a transitive chain in which an attribute Z depends on a non-key attribute Y and Y in turn depends on the primary key X.

Transitivity, as seen earlier, means that when X → Y and Y → Z, then X → Z. It follows that in a 3NF relation the non-key attributes are mutually independent.

Example

To normalize the above 2NF movie relation further, we try to convert it to 3NF by eliminating any transitive dependency of a non-prime attribute on the primary key. In Figure 4.3, Director_DOB depends on Director, that is, Director → Director_DOB.

The candidate key, however, is {Movie_Title, Year}. So here {Movie_Title, Year} → Director and Director → Director_DOB; hence Director_DOB depends transitively on the candidate key.

Therefore, to achieve 3NF, we further decompose the above tables into Movie relation, Director Relation, Yearly releases relation and Cast relation as shown in Figure 4.4.

Figure 4.4 – Converting to Third Normal Form

In the figure above, the non-key attributes are mutually independent and each depends only on the whole candidate key. Thus, the above decomposition has now produced a relation in 3NF.
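The 3NF step can be sketched at the schema level in Python. The function names `transitive_dependencies` and `split_relation` are hypothetical, and the check is deliberately simplified: it flags FDs in which both sides consist only of non-key attributes, which matches the definition used above for a relation with a single candidate key.

```python
CANDIDATE_KEY = frozenset({"Movie_Title", "Year"})
MOVIE_2NF = ["Movie_Title", "Year", "Type", "Director", "Director_DOB"]

# FDs on the 2NF Movie relation, written as (determinant, dependents).
FDS = [
    (frozenset({"Movie_Title", "Year"}), frozenset({"Type", "Director"})),
    (frozenset({"Director"}), frozenset({"Director_DOB"})),
]


def transitive_dependencies(fds, key):
    """Return FDs whose determinant and dependents are all non-key attributes."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not (lhs & key) and not (rhs & key)]


def split_relation(attributes, lhs, rhs):
    """Move rhs into a new relation keyed by lhs; lhs stays behind as a reference."""
    new_relation = sorted(lhs | rhs)
    remaining = [a for a in attributes if a not in rhs]
    return remaining, new_relation


violations = transitive_dependencies(FDS, CANDIDATE_KEY)
lhs, rhs = violations[0]
movie_3nf, director = split_relation(MOVIE_2NF, lhs, rhs)
# movie_3nf -> ['Movie_Title', 'Year', 'Type', 'Director']
# director  -> ['Director', 'Director_DOB']
```

Director remains in the Movie relation and becomes the key of the new Director relation.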

4.5.4 Boyce-Codd Normal Form (BCNF)

Boyce-Codd Normal Form is a stricter version of 3NF that applies to relations where there may be overlapping candidate keys.

A relation is said to be in Boyce-Codd normal form if it is in 3NF and every non-trivial FD given for this relation has a candidate key as its determinant. That is, for every non-trivial FD X → Y, X is a candidate key (or, more generally, a superkey).

Example

Consider a Guest Lecture relation for a college as shown in Table 4.9 below. Assume each teacher teaches only one subject.

Candidate Keys: {Subject, Lecture_Day}, {Lecture_Day, Teacher}

Subject    Lecture_Day  Teacher
Graphics   Monday       Dr. Arindham Singh
Databases  Monday       Dr. Emily Jose
Java       Wednesday    Dr. Prasad
Graphics   Tuesday      Dr. Arindham Singh
Java       Thursday     Dr. George White

Table 4.9 - Guest Lecture relation
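As a sanity check on the candidate keys stated above, the following Python sketch searches the rows of Table 4.9 for minimal attribute sets whose values are unique across all rows. The function names are hypothetical. Uniqueness in one sample of rows is only consistent with a set being a candidate key; it does not prove it, since keys are constraints on every legal state of the relation.

```python
from itertools import combinations

ATTRS = ["Subject", "Lecture_Day", "Teacher"]
ROWS = [
    ("Graphics", "Monday", "Dr. Arindham Singh"),
    ("Databases", "Monday", "Dr. Emily Jose"),
    ("Java", "Wednesday", "Dr. Prasad"),
    ("Graphics", "Tuesday", "Dr. Arindham Singh"),
    ("Java", "Thursday", "Dr. George White"),
]


def is_unique(rows, positions):
    """Return True if the projection onto the given column positions has no duplicates."""
    projected = [tuple(row[i] for i in positions) for row in rows]
    return len(set(projected)) == len(projected)


def minimal_unique_sets(attrs, rows):
    """Return attribute sets that are unique in rows and have no unique proper subset."""
    found = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(range(len(attrs)), size):
            # Skip supersets of an already found unique set: they are not minimal.
            if any(set(f) <= set(combo) for f in found):
                continue
            if is_unique(rows, combo):
                found.append(combo)
    return [tuple(attrs[i] for i in combo) for combo in found]


keys_in_sample = minimal_unique_sets(ATTRS, ROWS)
# keys_in_sample -> [('Subject', 'Lecture_Day'), ('Lecture_Day', 'Teacher')]
```

No single attribute is unique in the sample, and {Subject, Teacher} repeats for Graphics, which agrees with the two candidate keys listed.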

In the above relation, there are no non-atomic values; hence, it is in 1NF. All the attributes are part of candidate keys; hence, it is in 2NF and 3NF. However, the FD Teacher → Subject holds for the above relation, and Teacher alone is not a candidate key; therefore, the above relation is not in BCNF. To convert it into BCNF, we decompose it into the relations Subject area experts and Lecture timetable, as shown in Figure 4.5.

Figure 4.5 – Converting to Boyce-Codd Normal Form

The relation in Table 4.9 has now been decomposed into BCNF: Teacher, the determinant of the non-trivial FD Teacher → Subject, is a candidate key of the Subject area experts relation in Figure 4.5 (a).
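The BCNF test can be sketched in Python using attribute closure. The function names `closure` and `bcnf_violations` are hypothetical, and the attribute lists of the two decomposed relations are assumptions, because Figure 4.5 is not reproduced here: Subject area experts is taken to be (Teacher, Subject) and Lecture timetable to be (Lecture_Day, Teacher).

```python
def closure(attrs, fds):
    """Compute the closure of attrs under fds by applying FDs until nothing changes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return non-trivial FDs whose determinant is not a superkey of attributes."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != set(attributes)]


# Guest Lecture relation of Table 4.9 with the FDs stated in the text.
guest_lecture = {"Subject", "Lecture_Day", "Teacher"}
guest_fds = [
    (frozenset({"Subject", "Lecture_Day"}), frozenset({"Teacher"})),
    (frozenset({"Teacher"}), frozenset({"Subject"})),
]
before = bcnf_violations(guest_lecture, guest_fds)
# before -> [(frozenset({'Teacher'}), frozenset({'Subject'}))]

# Assumed decomposition (see lead-in).
experts = {"Teacher", "Subject"}
experts_fds = [(frozenset({"Teacher"}), frozenset({"Subject"}))]
timetable = {"Lecture_Day", "Teacher"}
timetable_fds = []  # no non-trivial FDs remain in this relation

after = bcnf_violations(experts, experts_fds) + bcnf_violations(timetable, timetable_fds)
# after -> []
```

In the original relation the closure of {Teacher} is {Teacher, Subject}, which is not the full attribute set, so Teacher → Subject violates BCNF. In the assumed Subject area experts relation the same closure covers every attribute, so the violation disappears.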