Introduction to Schema Refinement
CHAPTER 5
SCHEMA REFINEMENT
Prepared By :- Vijaykumar Mantri, BVRIT, NSP
• Conceptual database design gives us a set
of relation schemas and integrity constraints
(ICs) that can be regarded as a good starting
point for the final database design.
• This initial design must be refined by taking
the ICs into account more fully than is
possible with just the ER model constructs
and also by considering performance criteria
and typical workloads.
• We now present an overview of the
problems that schema refinement is
intended to address and a refinement
approach based on decompositions.
• Redundant storage of information is the
root cause of these problems.
• Although decomposition can eliminate
redundancy, it can lead to problems of
its own and should be used with
caution.
1) Problems caused by Redundancy
• Redundant Storage
• Update Anomalies
• Insertion Anomalies
• Deletion Anomalies
Hourly_Emps (SSN, Name, Lot, Rating, Hourly_wages, Hours_worked)
SSN | Name | Lot | Rating | Hourly_wages | Hours_worked
123 | Rajesh | 48 | 8 | 10 | 40
456 | Ajay | 22 | 8 | 10 | 30
326 | Arun | 35 | 5 | 7 | 30
434 | Kamal | 35 | 5 | 7 | 32
612 | Nitin | 35 | 8 | 10 | 40
2) Decompositions
• The problems arising from redundancy can be
solved by replacing a relation with a collection of
smaller relations.
• A Decomposition of a relation schema R
consists of replacing the relation schema by two
(or more) relation schemas that each contain a
subset of attributes of R and together include all
attributes of R.
• Hourly_Emps2 (SSN, Name, Lot, Rating,
Hours_worked)
• Wages( Rating, Hourly_wages)
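The decomposition above can be tried on the Hourly_Emps instance. The following is a minimal sketch, assuming the relations are held as plain Python tuples and sets; the variable names are illustrative only.

```python
# Instance of Hourly_Emps as (SSN, Name, Lot, Rating, Hourly_wages, Hours_worked).
hourly_emps = [
    (123, "Rajesh", 48, 8, 10, 40),
    (456, "Ajay", 22, 8, 10, 30),
    (326, "Arun", 35, 5, 7, 30),
    (434, "Kamal", 35, 5, 7, 32),
    (612, "Nitin", 35, 8, 10, 40),
]

# Project onto the two smaller schemas; sets drop duplicate tuples.
hourly_emps2 = {(ssn, name, lot, rating, hours)
                for ssn, name, lot, rating, _, hours in hourly_emps}
wages = {(rating, wage) for _, _, _, rating, wage, _ in hourly_emps}

# A natural join on Rating rebuilds the original tuples.
rejoined = {(ssn, name, lot, rating, wage, hours)
            for ssn, name, lot, rating, hours in hourly_emps2
            for w_rating, wage in wages
            if rating == w_rating}

print(sorted(wages))                 # [(5, 7), (8, 10)]
print(rejoined == set(hourly_emps))  # True
```

Each rating-to-wage association is now stored once in Wages instead of once per employee, and for this instance the join gives back exactly the original relation.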
Problems related to Decomposition
• Unless we are careful, decomposing a relation
schema can create more problems than it
solves.
We need to ask two questions repeatedly:
1) Is there a reason to decompose a relation?
• To answer this question, several normal forms
have been proposed for relations.
• If a relation schema is in one of these normal
forms, we know that certain kinds of problems
cannot arise.
2) What problems (if any) does the decomposition
cause?
• With respect to the second question, two properties
of decompositions are of particular interest. The
lossless-join property enables us to recover any
instance of the decomposed relation from
corresponding instances of the smaller relations.
• The dependency-preservation property enables us to
enforce any constraint on the original relation by
simply enforcing some constraints on each of the
smaller relations. That is, we need not perform joins
of the smaller relations to check whether a constraint
on the original relation is violated.
Functional Dependencies
• A functional dependency (FD) is a kind of
IC that generalizes the concept of a key.
• Let R be a relation schema and let X and Y be
nonempty sets of attributes in R. An
instance r of R satisfies the FD X → Y if the
following holds for every pair of tuples t1 and t2
in r:
If t1.X = t2.X then t1.Y = t2.Y
A | B | C | D
a1 | b1 | c1 | d1
a1 | b1 | c1 | d2
a1 | b2 | c2 | d1
a2 | b1 | c3 | d1
This instance satisfies AB → C.
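The FD condition can be checked mechanically on an instance. The following is a minimal sketch, assuming rows are represented as Python dictionaries; `satisfies_fd` is a hypothetical helper name, not part of any library.

```python
def satisfies_fd(rows, lhs, rhs):
    """Return True if every pair of rows that agrees on lhs also agrees on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first Y value seen for each X value and compare later ones.
        if seen.setdefault(x, y) != y:
            return False
    return True

instance = [
    {"A": "a1", "B": "b1", "C": "c1", "D": "d1"},
    {"A": "a1", "B": "b1", "C": "c1", "D": "d2"},
    {"A": "a1", "B": "b2", "C": "c2", "D": "d1"},
    {"A": "a2", "B": "b1", "C": "c3", "D": "d1"},
]

print(satisfies_fd(instance, ["A", "B"], ["C"]))  # True: AB -> C holds
print(satisfies_fd(instance, ["A"], ["B"]))       # False: a1 appears with b1 and b2
```

A check like this can only show that an FD is violated by a given instance. An FD is a statement about all legal instances, so a single instance that satisfies it does not prove that it holds.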
Closure of a Set of FDs
• We say that an FD f is implied by a given set F
of FDs if f holds on every relation instance that
satisfies all dependencies in F; that is, f holds
whenever all FDs in F hold.
• The set of all FDs implied by a given set F of
FDs is called the closure of F, denoted by F+.
• Three rules, called Armstrong’s Axioms, can
be applied repeatedly to infer all FDs implied by
a set F of FDs.
Armstrong’s Axioms
Here X, Y and Z denote sets of attributes of relation
R:
• Reflexivity : If X ⊇ Y, then X → Y.
• Augmentation :
If X → Y, then XZ → YZ for any Z.
• Transitivity :
If X → Y and Y → Z, then X → Z.
Two additional rules follow from the axioms:
• Union : If X → Y and X → Z, then X → YZ.
• Decomposition :
If X → YZ, then X → Y and X → Z.
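As a short worked derivation, the Union and Decomposition rules can be obtained from Reflexivity, Augmentation and Transitivity alone. The steps below are one possible derivation, not necessarily the one intended by the slides.

```latex
\begin{align*}
&\text{Union: given } X \to Y \text{ and } X \to Z \\
&\quad X \to XY && \text{(augment } X \to Y \text{ with } X\text{)} \\
&\quad XY \to YZ && \text{(augment } X \to Z \text{ with } Y\text{)} \\
&\quad X \to YZ && \text{(transitivity)} \\[4pt]
&\text{Decomposition: given } X \to YZ \\
&\quad YZ \to Y,\quad YZ \to Z && \text{(reflexivity)} \\
&\quad X \to Y,\quad X \to Z && \text{(transitivity)}
\end{align*}
```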
• Contracts (contractid, supplierid, projectid,
deptid, partid, qty, value)
• This can be denoted as CSJDPQV.
The meaning of a tuple is that the contract with
contractid C is an agreement that supplier S
will supply Q items of part P to project J
associated with department D; the value V of
this contract is equal to value.
• The ICs known to hold are:
1. The contract id C is a key: C → CSJDPQV
2. A project purchases a given part using a single
contract: JP → C
3. A department purchases at most one part from
a supplier: SD → P
Some additional FDs hold in the
closure of the set of given FDs:
• From JP → C, C → CSJDPQV and transitivity:
JP → CSJDPQV
• From SD → P and augmentation:
SDJ → JP
• From SDJ → JP, JP → CSJDPQV and
transitivity:
SDJ → CSJDPQV
• From C → CSJDPQV using decomposition:
C → C, C → S, C → J, etc.
• We also obtain a number of trivial FDs from
reflexivity.
Attribute Closure
• If we just want to check whether a given
dependency, say, X → Y, is in the closure of a
set F of FDs, we can do so efficiently without
computing F+.
• We first compute the attribute closure X+ with
respect to F, which is the set of attributes A such that
X → A can be inferred using Armstrong’s
Axioms. Then X → Y is in F+ exactly when Y ⊆ X+.
We can find the attribute closure using this
algorithm:
closure = X
Repeat until there is no change: {
If there is an FD V → W in F such that
V ⊆ closure,
then set closure = closure ∪ W
}
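The algorithm above translates almost line for line into code. The following is a minimal sketch, assuming each attribute is a single letter and each FD is a pair of strings; `attribute_closure` is a hypothetical name chosen for illustration.

```python
def attribute_closure(attrs, fds):
    """Compute X+ for the attribute set attrs under fds, a list of (lhs, rhs) pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply V -> W whenever V is already inside the closure.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

# FDs of the Contracts relation CSJDPQV, one letter per attribute.
contracts_fds = [("C", "CSJDPQV"), ("JP", "C"), ("SD", "P")]

print("".join(sorted(attribute_closure("SDJ", contracts_fds))))  # CDJPQSV
print("".join(sorted(attribute_closure("SD", contracts_fds))))   # DPS
```

The closure of SDJ contains all seven attributes, which agrees with the derived FD SDJ → CSJDPQV, while the closure of SD is only SDP.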
Definitions
• We already know the definitions of Key, Candidate Key
and Primary Key.
• Superkey – A superkey of a relation schema
R = {A1, A2, …, An} is a set of attributes S ⊆ R with the
property that no two distinct tuples t1 and t2 in any legal
relation state r of R will have t1[S] = t2[S].
• Prime Attribute – An attribute of relation schema
R is called a prime attribute of R if it is a member of
some candidate key of R.
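These definitions can be illustrated on the Contracts relation CSJDPQV. The following is a brute-force sketch, assuming single-letter attributes and FDs given as pairs of strings; the helper names are hypothetical, and the enumeration is exponential in the number of attributes, so it suits only small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """Return the minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            # A superset of a key already found is a superkey, not a candidate key.
            if any(k <= s for k in keys):
                continue
            if closure(s, fds) == set(schema):
                keys.append(s)
    return keys

fds = [("C", "CSJDPQV"), ("JP", "C"), ("SD", "P")]
keys = candidate_keys("CSJDPQV", fds)
prime = set().union(*keys)

print(["".join(sorted(k)) for k in keys])  # ['C', 'JP', 'DJS']
print("".join(sorted(prime)))              # CDJPS
```

Under these FDs the candidate keys are C, JP and SDJ, so Q and V are the only nonprime attributes.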
• In the above example, Marks is fully functionally
dependent on STUDENT# COURSE# and not on any proper
subset of STUDENT# COURSE#. This means Marks
cannot be determined by either STUDENT# or
COURSE# alone; it can be determined only by
STUDENT# and COURSE# together.
• CourseName is not fully functionally dependent on
STUDENT# COURSE# because a subset of
STUDENT# COURSE#, namely COURSE# alone,
determines CourseName; STUDENT# plays
no role in deciding CourseName.
• In the above relation, CourseName,
IName and Room# are partially dependent on the
composite attribute STUDENT# COURSE#
because COURSE# alone determines
CourseName, IName and Room#.
• In the above example, Room# depends on IName
and IName in turn depends on COURSE#.
Hence Room# transitively depends on
COURSE#.
• Similarly, Grade depends on Marks, and
Marks in turn depends on STUDENT# COURSE#;
hence Grade transitively depends on
STUDENT# COURSE#.
• Transitive: indirect
Normal Forms
• First Normal Form (1NF)
– Atomic values
• Second Normal Form (2NF), Third Normal
Form (3NF) & Boyce-Codd Normal Form
(BCNF)
– based on primary keys
• Fourth Normal Form (4NF)
– based on keys, multi-valued
dependencies
• Fifth Normal Form (5NF)
– based on keys, join dependencies
• Domain-Key Normal Form
Levels of Normalization
1NF
2NF
3NF
4NF
5NF
DKNF
Each higher level is a subset of the lower level
Normalization
• 1NF: Functional dependency of nonkey attributes on the primary key; atomic values only
• 2NF: Full functional dependency of nonkey attributes on the primary key
• 3NF: No transitive dependency between nonkey attributes
• Boyce-Codd and higher: All determinants are candidate keys; single multivalued dependency
Most databases should be 3NF or BCNF in
order to avoid the database anomalies.
First Normal Form (1NF)
• Historically, it was designed to
disallow
– Composite attributes
– Multivalued attributes
– Or the combination of both
• All the values need to be
atomic
In relational database design it is not practically
possible to have a table which is not in 1NF.
ISBN | Title | AuName | AuPhone | PubName | PubPhone | Price
0-321-32132-1 | Balloon | Sleepy, Snoopy, Grumpy | 321-321-1111, 232-234-1234, 665-235-6532 | Small House | 714-000-0000 | $34.00
0-55-123456-9 | Main Street | Jones, Smith | 123-333-3333, 654-223-3455 | Small House | 714-000-0000 | $22.95
0-123-45678-0 | Ulysses | Joyce | 666-666-6666 | Alpha Press | 999-999-9999 | $34.00
1-22-233700-0 | Visual Basic | Roman | 444-444-4444 | Big House | 123-456-7890 | $25.00
The AuName and AuPhone columns are multivalued.
ISBN | AuName | AuPhone
0-321-32132-1 | Sleepy | 321-321-1111
0-321-32132-1 | Snoopy | 232-234-1234
0-321-32132-1 | Grumpy | 665-235-6532
0-55-123456-9 | Jones | 123-333-3333
0-55-123456-9 | Smith | 654-223-3455
0-123-45678-0 | Joyce | 666-666-6666
1-22-233700-0 | Roman | 444-444-4444

ISBN | Title | PubName | PubPhone | Price
0-321-32132-1 | Balloon | Small House | 714-000-0000 | $34.00
0-55-123456-9 | Main Street | Small House | 714-000-0000 | $22.95
0-123-45678-0 | Ulysses | Alpha Press | 999-999-9999 | $34.00
1-22-233700-0 | Visual Basic | Big House | 123-456-7890 | $25.00
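The split into atomic rows can be sketched in a few lines. This is an illustration only, assuming the multivalued cells are comma-separated strings and that each author is paired with the phone number in the same list position; the variable names are hypothetical.

```python
# Non-1NF rows: AuName and AuPhone hold comma-separated lists.
books = [
    ("0-321-32132-1", "Balloon", "Sleepy, Snoopy, Grumpy",
     "321-321-1111, 232-234-1234, 665-235-6532",
     "Small House", "714-000-0000", "$34.00"),
    ("0-55-123456-9", "Main Street", "Jones, Smith",
     "123-333-3333, 654-223-3455",
     "Small House", "714-000-0000", "$22.95"),
    ("0-123-45678-0", "Ulysses", "Joyce", "666-666-6666",
     "Alpha Press", "999-999-9999", "$34.00"),
    ("1-22-233700-0", "Visual Basic", "Roman", "444-444-4444",
     "Big House", "123-456-7890", "$25.00"),
]

authors = []  # (ISBN, AuName, AuPhone), one atomic value per cell
titles = []   # (ISBN, Title, PubName, PubPhone, Price)

for isbn, title, au_names, au_phones, pub, pub_phone, price in books:
    titles.append((isbn, title, pub, pub_phone, price))
    names = [n.strip() for n in au_names.split(",")]
    phones = [p.strip() for p in au_phones.split(",")]
    # Assumption: the i-th phone number belongs to the i-th author.
    for name, phone in zip(names, phones):
        authors.append((isbn, name, phone))

print(len(authors), len(titles))  # 7 4
```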
Result Table
Second Normal Form (2NF)
• fd1 and fd4 are partial functional
dependencies. Normalize to:
– Emp (eno, ename, title, bdate, salary, supereno,
dno)
– WorksOn (eno, pno, resp, hours)
– Proj (pno, pname, budget)
Old Scheme {Studio, Movie, Budget, Studio_City}
1. Key {Studio, Movie}
2. {Studio, Movie} → {Budget}
3. {Studio} → {Studio_City}
4. Studio_City is not part of a key
5. Studio_City functionally depends on Studio, which is a
proper subset of the key
New Scheme {Studio, Movie, Budget}
New Scheme {Studio, Studio_City}
Scheme {City, Street, HouseNumber,
HouseColor, CityPopulation}
1. Key {City, Street, HouseNumber}
2. {City, Street, HouseNumber} → {HouseColor}
3. {City} → {CityPopulation}
4. CityPopulation does not belong to any key.
5. CityPopulation is functionally dependent on City,
which is a proper subset of the key
New Scheme {City, Street, HouseNumber,
HouseColor}
New Scheme {City, CityPopulation}
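The 2NF test used in these two examples, looking for a nonprime attribute determined by a proper subset of the key, can be sketched as follows. The function names are hypothetical, and the sketch assumes a single candidate key that is passed in explicitly.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) tuples."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(key, nonprime, fds):
    """Find minimal proper subsets of key that determine a nonprime attribute."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            for attr in sorted(closure(subset, fds) & set(nonprime)):
                # Report only the smallest determining subset for each attribute.
                if not any(set(s) <= set(subset) and a == attr for s, a in found):
                    found.append((subset, attr))
    return found

key = ("City", "Street", "HouseNumber")
fds = [(key, ("HouseColor",)), (("City",), ("CityPopulation",))]

before = partial_dependencies(key, {"HouseColor", "CityPopulation"}, fds)
after = partial_dependencies(key, {"HouseColor"}, fds[:1])

print(before)  # [(('City',), 'CityPopulation')]
print(after)   # []
```

The original scheme has one partial dependency, City → CityPopulation. The scheme {City, Street, HouseNumber, HouseColor} that remains after the decomposition has none.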
Third Normal Form (3NF)
• Third normal form (3NF) is based on the
concept of transitive dependency.
• A functional dependency X → Y in a
relation schema R is a transitive dependency
if there is a set of attributes Z that is neither
a candidate key nor a subset of any key of
R, and both X → Z and Z → Y hold.
• Definition : A relation schema R is in 3NF if
it satisfies 2NF and no nonprime attribute of
R is transitively dependent on the primary
key.
• Let R be a relation schema, F be the
set of FDs given to hold over R, X be a
subset of the attributes of R and A be an
attribute of R.
R is in third normal form if, for every FD X → A
in F, one of the following statements is
true:
• A ∈ X, that is, it is a trivial FD, or
• X is a superkey, or
• A is part of some key for R.
Result Table
RESULTMARKS TABLE
fd2 results in a transitive dependency eno →
salary. Remove it.
Scheme {Title, PubID, PageCount, Price}
1. Key {Title, PubID}
2. {Title, PubID} → {PageCount}
3. {PageCount} → {Price}
4. Both Price and PageCount depend on the key, hence 2NF
5. Transitively, {Title, PubID} → {Price}, hence not in 3NF
New Scheme {PubID, PageCount, Price}
New Scheme {Title, PubID, PageCount}
Scheme {BuildingID, Contractor, Fee}
1. Primary Key {BuildingID}
2. {BuildingID} → {Contractor}
3. {Contractor} → {Fee}
4. {BuildingID} → {Fee}
5. Fee transitively depends on the BuildingID
6. Both Contractor and Fee depend on the entire key, hence 2NF
New Scheme {BuildingID, Contractor}
New Scheme {Contractor, Fee}
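The three-condition test for 3NF can be applied to the BuildingID example. The following is a minimal sketch, assuming the candidate keys are supplied by the caller; `violates_3nf` is a hypothetical helper name.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) tuples."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def violates_3nf(schema, fds, candidate_keys):
    """Return the FDs X -> A (single attribute A) that fail all three 3NF conditions."""
    prime = set().union(*candidate_keys)
    bad = []
    for lhs, rhs in fds:
        for a in rhs:
            trivial = a in lhs
            superkey = closure(lhs, fds) >= set(schema)
            if not (trivial or superkey or a in prime):
                bad.append((lhs, a))
    return bad

schema = ("BuildingID", "Contractor", "Fee")
fds = [(("BuildingID",), ("Contractor",)), (("Contractor",), ("Fee",))]

print(violates_3nf(schema, fds, [{"BuildingID"}]))
# [(('Contractor',), 'Fee')]

# The two new schemes, each checked with the FD that applies to it.
print(violates_3nf(("BuildingID", "Contractor"), fds[:1], [{"BuildingID"}]))  # []
print(violates_3nf(("Contractor", "Fee"), fds[1:], [{"Contractor"}]))         # []
```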
Boyce-Codd Normal Form (BCNF)
• Most 3NF relations are also BCNF
relations.
• A 3NF relation can fail to be in BCNF only if
all of the following hold:
– Candidate keys in the relation are composite
keys (they are not single attributes)
– There is more than one candidate key in the
relation
– The keys are not disjoint, that is, some
attributes in the keys are common
• Let R be a relation schema, F be the set of FDs
given to hold over R, X be a subset of the
attributes of R and A be an attribute of R. R is in
Boyce-Codd normal form if, for every FD X → A in
F, one of the following statements is true:
– A ∈ X, that is, it is a trivial FD, or
– X is a superkey.
• The difference between 3NF and BCNF is that 3NF
allows an FD X → Y to remain in the relation if X is a
superkey or Y is a prime attribute. BCNF only
allows this FD if X is a superkey.
• Thus, BCNF is more restrictive than 3NF.
However, in practice most relations in 3NF are also
in BCNF.
BCNF versus 3NF
• We can always decompose to BCNF, but sometimes we do
not want to because the decomposition would lose an FD.
• The decision to use 3NF or BCNF depends on the
amount of redundancy we are willing to accept and
the willingness to lose a functional dependency.
• Note that we can always preserve the lossless-join
property (recovery) with a BCNF decomposition,
but we do not always get dependency preservation.
• In contrast, we can always get both recovery and dependency
preservation with a 3NF decomposition.
An example of not having dependency preservation with
BCNF:
Scheme {City, Street, ZipCode}
1. Key1 {City, Street}
2. Key2 {ZipCode, Street}
3. No non-key attribute, hence 3NF
4. {City, Street} → {ZipCode}
5. {ZipCode} → {City}
6. Dependency between attributes belonging to a key
New Scheme1 {ZipCode, Street}
New Scheme2 {ZipCode, City}
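The BCNF condition (every nontrivial FD has a superkey on its left side) can be checked for this scheme. The following is a minimal sketch; `bcnf_violations` is a hypothetical helper name.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) tuples."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Return the nontrivial FDs whose left side is not a superkey of schema."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs)
            and not closure(lhs, fds) >= set(schema)]

schema = {"City", "Street", "ZipCode"}
fds = [(("City", "Street"), ("ZipCode",)), (("ZipCode",), ("City",))]

print(bcnf_violations(schema, fds))
# [(('ZipCode',), ('City',))]
```

Only {ZipCode} → {City} is reported: ZipCode alone is not a superkey, although City is a prime attribute, which is why the scheme is in 3NF but not in BCNF.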
• Consider the relation schema LOTS1A
shown in Figure, which describes land for sale
in various countries. Suppose that there are
two candidate keys:
PROPERTY_ID#
and {COUNTY_NAME, LOT#}
that is, lot numbers are unique only within
each country, but PROPERTY_ID numbers
are unique across all countries.
• Suppose that we have thousands of lots in
the relation but the lots are from only two
countries: Nepal and Sri Lanka.
• Suppose also that lot sizes in Nepal are only
0.5, 0.6, 0.7, 0.8, 0.9, and 1.0 acres,
whereas lot sizes in Sri Lanka are restricted to
1.1, 1.2, ... , 1.9, and 2.0 acres.
• In such a situation we would have the
additional functional dependency FD3: AREA →
COUNTY_NAME.
• If we add this to the other dependencies, the
relation schema LOTS1A is still in 3NF
because COUNTY_NAME is a prime attribute.
• The dependency of the country on the area of a lot, as
specified by FD3, can be represented by 16 tuples
in a separate relation R(AREA,
COUNTY_NAME), since there are only 16
possible AREA values. This representation
reduces the redundancy of repeating the same
information in the thousands of LOTS1A tuples.
• We can decompose LOTS1A into two BCNF
relations LOTS1AX and LOTS1AY.
• This decomposition loses the functional dependency
FD2 because its attributes no longer coexist in the same
relation after decomposition.
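• Consider a relation R with attributes ABC that is decomposed
into AB and BC, where F contains A → B, B → C and C → A.
• The closure F+ contains all dependencies in F plus
A → C, B → A and C → B.
• Consequently F_AB also contains B → A, and F_BC
contains C → B. Therefore F_AB ∪ F_BC contains
A → B, B → C, B → A and C → B.
• The closure of the dependencies in F_AB and F_BC now
includes C → A.
• Thus the decomposition preserves the dependency
C → A.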
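Dependency preservation can be tested without computing F+ in full. The sketch below uses a standard textbook technique that is not described in the slides: it grows the left side of an FD using only attribute closures restricted to each smaller relation. The function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_preserved(fd, parts, fds):
    """Check whether fd = (lhs, rhs) follows from the FDs projected onto each part."""
    lhs, rhs = fd
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for part in parts:
            # Attributes of this part determined by what we already hold in it.
            gained = closure(z & set(part), fds) & set(part)
            if not gained <= z:
                z |= gained
                changed = True
    return set(rhs) <= z

# R = ABC decomposed into AB and BC.
abc_fds = [("A", "B"), ("B", "C"), ("C", "A")]
print(all(is_preserved(fd, ["AB", "BC"], abc_fds) for fd in abc_fds))  # True

# {City, Street, ZipCode} decomposed into {ZipCode, Street} and {ZipCode, City}.
zip_fds = [(("City", "Street"), ("ZipCode",)), (("ZipCode",), ("City",))]
zip_parts = [("ZipCode", "Street"), ("ZipCode", "City")]
print(is_preserved(zip_fds[0], zip_parts, zip_fds))  # False
```

C → A is preserved in the ABC example even though C and A never appear together in one of the smaller relations, whereas {City, Street} → {ZipCode} is lost in the ZipCode example.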
Multivalued Dependencies
• Suppose that we have a relation with
attributes course, teacher, and book, which we
denote as CTB.
• The meaning of a tuple is that teacher T can
teach course C, and book B is a
recommended text for the course.
• There are no FDs; the key is CTB.
• However, the recommended texts for a course
are independent of the instructor.
• The instance shown in Figure illustrates this
situation.
Course | Teacher | Book
Physics101 | Green | Mechanics
Physics101 | Green | Optics
Physics101 | Brown | Mechanics
Physics101 | Brown | Optics
Math301 | Green | Mechanics
Math301 | Green | Vectors
Math301 | Green | Geometry
Figure: Instance of CTB
• The schema is in BCNF.
• There is redundancy in the schema.
• The fact that Green can teach Physics101 is recorded once per
recommended text for the course.
• Similarly, the fact that Optics is a text for
Physics101 is recorded once per potential teacher.
• The redundancy can be eliminated by
decomposing CTB into CT and CB.
• The redundancy in this example is due to the
constraint that the texts for a course are independent of
the instructors, which cannot be expressed in
terms of FDs.
• This constraint is an example of a Multivalued
Dependency or MVD.
• Let R be a relation schema and let X and Y
be subsets of the attributes of R. Intuitively,
the Multivalued Dependency X →→ Y is
said to hold over R if, in every legal instance
r of R, each X value is associated with a set
of Y values and this set is independent of the
values in the other attributes.
• Formally, if the MVD X →→ Y holds over R
and Z = R - XY, the following must be true
for every legal instance r of R:
If t1 ∈ r, t2 ∈ r and t1.X = t2.X,
then there must be some t3 ∈ r such that
t1.XY = t3.XY and t2.Z = t3.Z.
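• If we are given the first
two tuples and told that
the MVD X →→ Y
holds over this relation,
we can infer that the
relation instance must
also contain the third
tuple. Interchanging the roles of the first two
tuples shows that it must contain the fourth
tuple as well.
X | Y | Z
A | B1 | C1
A | B2 | C2
A | B1 | C2
A | B2 | C1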
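The formal MVD condition can be checked directly on an instance. The following is a minimal sketch, assuming rows are dictionaries and that X, Y and Z are disjoint attribute lists with Z = R - XY; `satisfies_mvd` is a hypothetical helper name.

```python
def satisfies_mvd(rows, x, y, z):
    """Check X ->-> Y on rows (a list of dicts), where z lists the attributes R - XY."""
    def proj(row, attrs):
        return tuple(row[a] for a in attrs)

    present = {(proj(r, x), proj(r, y), proj(r, z)) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if proj(t1, x) == proj(t2, x):
                # The required t3 takes X and Y from t1 and Z from t2.
                if (proj(t1, x), proj(t1, y), proj(t2, z)) not in present:
                    return False
    return True

rows = [
    {"X": "A", "Y": "B1", "Z": "C1"},
    {"X": "A", "Y": "B2", "Z": "C2"},
    {"X": "A", "Y": "B1", "Z": "C2"},
    {"X": "A", "Y": "B2", "Z": "C1"},
]

print(satisfies_mvd(rows, ["X"], ["Y"], ["Z"]))      # True
print(satisfies_mvd(rows[:2], ["X"], ["Y"], ["Z"]))  # False
```

With only the first two tuples the check fails, because the tuple (A, B1, C2) required by the definition is missing.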
Fourth Normal Form
• Fourth Normal Form (4NF) is a direct
generalization of BCNF. Let R be a relation
schema, X and Y be nonempty subsets of
the attributes of R, and F be a set of
dependencies that includes both FDs and
MVDs. R is said to be in Fourth Normal Form
(4NF) if, for every MVD X →→ Y that holds
over R, one of the following statements is
true:
• Y ⊆ X or XY = R, or
• X is a superkey.
• The relation CTB is not in 4NF because
C →→ T is a nontrivial MVD and C is not a
key.
• We can eliminate the resulting redundancy
by decomposing CTB into CT and CB; each
of these relations is then in 4NF.
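The decomposition of CTB can be tried on the instance from the figure. This is a minimal sketch using Python sets of tuples; the variable names are illustrative only.

```python
# Instance of CTB as (Course, Teacher, Book).
ctb = {
    ("Physics101", "Green", "Mechanics"),
    ("Physics101", "Green", "Optics"),
    ("Physics101", "Brown", "Mechanics"),
    ("Physics101", "Brown", "Optics"),
    ("Math301", "Green", "Mechanics"),
    ("Math301", "Green", "Vectors"),
    ("Math301", "Green", "Geometry"),
}

# Project onto CT and CB; sets drop the repeated facts.
ct = {(c, t) for c, t, _ in ctb}
cb = {(c, b) for c, _, b in ctb}

# Joining CT and CB on Course rebuilds the original instance.
rejoined = {(c, t, b) for c, t in ct for c2, b in cb if c == c2}

print(len(ct), len(cb))  # 3 5
print(rejoined == ctb)   # True
```

Each teaching fact and each textbook fact is now stored once, and the join of CT and CB returns exactly the seven original tuples because C →→ T holds on this instance.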