COP 5725 Fall 2012 Database Management Systems
COP 5725 Fall 2012 Database Management Systems
University of Florida, CISE Department Prof. Daisy Zhe Wang
Schema Refinement and Normal Forms
Functional Dependencies Normal Forms
Decompositions Review: Database Design Steps
- Requirements Analysis • Conceptual Database Design
- – E/R model
- Logical Database Design
- – E.g., relational data model
- – Output: logical/conceptual schema
- Schema Refinement • Physical Database Design • Application and Security Design
Why Schema Refinement?
- After Conceptual and Logical Database Design relational schemas + IC’s
- Problems caused by bad schema design
- – Redundant storage
- – : one occurrence of a fact
Update anomaly is changed, but not all occurrences.
- – : cannot insert a piece
Insertion anomaly of data without first inserting another
- – : valid fact is lost when a
Deletion anomaly tuple is deleted. Example of Bad Design Drinkers(name, addr, beersLiked, manf, favBeer) name addr beersLiked manf favBeer Janeway Voyager Bud A.B. WickedAle Janeway ??? WickedAle ??? Pete’s Spock Enterprise Bud ??? Bud
Data is redundant, because each of the ??? ’s can be figured
out by using the FD’s name -> addr favBeer and beersLiked -> manf .This Bad Design Also Exhibits Anomalies name addr beersLiked manf favBeer
Janeway Voyager Bud A.B. WickedAle
Janeway Voyager WickedAle Pete’s WickedAle Spock Enterprise Bud A.B. Bud- Update anomaly
- Insert anomaly
- Deletion anomaly
Decompositions
- Integrity constraints, in particular
, can be used to
functional dependencies
identify schemas with such problems and to suggest refinements.
- Main refinement technique:
decomposition (replacing ABCD with, say, AB and BCD, or ACD and ABD).
Problems Related to Decompositions
- Do we need to decompose a relation?
- – Use FDs and normal forms
- What problems (if any) does the decomposition cause?
- – 2 important properties:
- Lossless-join
- Dependency-preservation (decide efficiency in checking the IC’s)
- – Performance vs. redundancy
Functional Dependencies
- > is an assertion about a relation
X A R
- Say “X determines A holds in R.”
- – Convention : …, , , represent sets of
X Y Z attributes; , , ,… represent single attributes. A B C
: sets of attributes can be written
- – Convention
without set formers: e.g., just , rather than ABC { , , }.
A B C Example name addr beersLiked manf favBeer
Janeway Voyager Bud A.B. WickedAle
Janeway Voyager WickedAle Pete’s WickedAle Spock Enterprise Bud A.B. Bud Because name -> addrBecause name -> favBeer Because beersLiked -> manf Drinkers(name, addr, beersLiked, manf, favBeer)
Reasonable FD’s to assert: 1. name -> addr 2. name -> favBeer
FD’s With Multiple Attributes
- No need for FD’s with > 1 attribute on right.
- > 1 attribute on left may be essential.
Where Do FD’s Come From?
- Key constraints on entities and relationships
- – ->
K A holds for R (candidate keys)
- – many-one, one-one relationships
- – domain knowledge of the application
Inferring FD’s
- We are given a set of FD’s : -> ,
F
X A
1
1
- > -> , we want to know
X A ,…,
X A
2
2 n n
whether an FD Y -> B must hold in every relation instance that satisfies the given FD’s in F . If so, we say Y -> B is implied by .
F
- Example: -> is implied by ->
A C A B
and B -> C
Closure and Armstrong’s Axioms
Examples
Attribute Closure
Closure Test
- An easier way to test is to compute the
- of , denoted .
closure Y Y : = .
- Basis Y Y
- Induction : Look for an FD
X -> A , which
has left side that is a subset of the
X
- current and right side A is not in .
Y Y
- If exist, add A to Y .
- + X A Exercise, FD inference
- R(A,B,C,D,E): AB->C, BC->AD, D->E,
- 1. what is AB
- AB =AB, given AB->C
- AB =ABC, given BC->D
- AB =ABCD, given D->E
- AB =ABCDE
- 2. D contained in AB ? Yes: AB->D
- Finding All Implied FD’s on a Projected Schema : Decomposi>Motiva
- Example: ABCD with FD’s AB -> C , -> , and -> .
- – Decompose into ABC AD . What FD’s hold in ?
- > , but also -> !
- – Not only AB C C A
- 1. For each set of attributes , compute .
- 2. Add -> for all in .
- 3. However, drop -> whenever we
- No need to compute the closure of the empty set or of the set of all attributes.
- If we find
- = all attributes, so is the closure of any superset of
- ABC
- >
- >
- – A + = ABC ; yields A -> B , A -> C .
- We do not need to compute AB + or
- – B + = BC ; yields B -> C .
- – C +
- – BC + =
- Resulting FD’s: -> , -> ,
- – Only FD that involves a subset of { A C
- Relation R with FDs is in BCNF if, for
- all -> in
- – -> A is a trivial (X contains A) FD, or X contains a key for R.
- – • In other words, R is in BCNF if the only non-trivial FDs that hold over R are key constraints.
- name->manf manf->manfAddr does.
- In each FD, the left side is a
- Any one of these FD’s shows Drinkers is not in BCNF
- BCNF ensures that no redundancy can be detected using FD information alone.
- – Most desirable normal form (in terms of redundancy)
- – 1. If XY->A
- – 2. If X->A
- Look among the given FD’s for a BCNF violation -> .
- – If any FD following from violates BCNF,
- .
- Compute
- Replace
- Project given FD’s F onto the two new relations.
- Lossless decomposition
- We are not done; we need to check Drinkers1 and Drinkers2 for BCNF.
- The resulting decomposition of :
- Notice: tells us about drinkers,
- Do we need to decompose a relation?
- – Use FDs and normal forms
- What problems (if any) does the decomposition cause?
- – 2 important properties:
- Lossless-join
- Dependency-preservation (decide efficiency in checking the IC’s)
- – Performance vs. redundancy
- There is one structure of FD’s that causes trouble when we decomp
- >
- >
- AB
- – Example:
- There are two keys,
- C -> B is a BCNF violation, so we must
- The problem is that if we use and
- Example with A B C = zip on the next slide.
- 3 Normal Form (3NF) modifies the
- An attribute is if it is a member of
- > violates 3NF if and only if is not
- In our problem situation with FD’s
- >
- >
- Thus
- Although
- >
- There are two important properties of a decomposition:
- If R is in 3NF, some redundancy is possible.
- It is a compromise, used when BCNF not achievable (e.g., no ``good’’ decomp, or performance considerations).
- –
decomposition of R into a collection of 3NF
relations always possible. - We can get both (1) and (2) with a 3NF decomposition.
- But we can’t always get (1) and (2) with a BCNF decomposition.
- – street-city-zip is an example.
- If a relation is in BCNF, it is free of
redundancies that can be detected using FDs.
- If a relation is not in BCNF, we can try to decompose it into a collection of BCNF relations. All FDs are preserved?
- If a lossless-join, dependency preserving decomposition into BCNF is not possible (or unsuitable, given typical queries), should consider decomposition into 3NF.
- keep
CF->B, can AB->D be inferred? Approach 1: (Using AA and additional rules) AB->C => AB->BC (augmentation) BC->AD => BC->D (decomposition) AB->BC, BC->D => AB->D (transitivity)
Approach 2: Closure Test
1
2
3
2
C D D A ,
ABC
Basic Idea
1. Start with given FD’s and find all
nontrivial FD’s that follow from the given FD’s.
2. Restrict to those FD’s that involve only attributes of the projected schema.
Simple, Exponential Algorithm
X X
X A A
X X
XY A
discover X -> A .
Because -> follows from -> in any
XY A
X A projection .
4. Finally, use only FD’s involving projected attributes.
X
A Few Tricks
X .
Example
with FD’s
A
B
and
B
C .
Project onto
AC .
AC + .
= C ; yields nothing.
BC ; yields nothing. Example --- Continued
A B A C and -> . B C • Projection onto : -> .
AC A C
, }.Normal Forms
Boyce-Codd Normal Form
F
X A F
X
Example Beers(name, manf, manfAddr)
FD’s: name->manf , manf->manfAddr • Only key is {name} . does not violate BCNF, but
Example (II) Drinkers(name, addr, beersLiked, manf, favBeer) FD’s: name->addr favBeer , beersLiked->manf • Only key is {name, beersLiked} .
not superkey.
X Y A x y1 a x y2 ? Decomposition into BCNF • Given: relation with FD’s .
Redundancy in BCNF
R F
X B
F then there will surely be an FD in itself
F that violates BCNF.
X – Not all attributes, or else X is a superkey. Decompose R Using X -> B
R
by relations with schemas: 1.
R 1 =
X + .
2 = R – (
X + – X ).
Decomposition Picture R-X +
X X + -X R 2 R
1 R Example
Drinkers(name, addr, beersLiked, manf, favBeer)
= name->addr , name -> favBeer , F beersLiked->manf
Steps for BCNF decomposition:
1. Key: name, beersliked 2. find BCNF violation FD X->A: name->addr 3. compute X+: {name, addr, favbeer} 4. decompose to R1 and R2
R1: Drinkers1(name, addr, favBeer) R2: Drinkers2(name, beersLiked, manf)
Example, Continued
For Drinkers1(name, addr, favBeer) 1.
Projecting FD’s: name->addr, name->favbeer
2. Key: name 3. find BCNF violation FD X->A: NO Violation
Example, Continued
For Drinkers2(name, beersLiked, manf) ,
2. Key: name, beersliked 3. find BCNF violation FD X->A: beersliked- >manf 4. compute X+: {beersliked, manf}
5. decompose to R3 and R4
R3: Drinkers3(beersLiked, manf) FDs? key?
R4: Drinkers4(name, beersLiked) FDs? key?
Are they all in BCNF normal form?Example, Concluded
Drinkers
1. Drinkers1(name, addr, favBeer)
2. Drinkers3(beersLiked, manf)
3. Drinkers4(name, beersLiked)
Drinkers1 tells us about beers, and
Drinkers3 tells us the relationship between
Drinkers4 drinkers and the beers they like.
Review: Problems Related to Decompositions
,
AC
decompose into
, C } .
} and { A
, B
{ A
C = zip code.
B = city,
A = street address,
B .
C
and
C
Problem with BCNF
BC . We Cannot Enforce FD’s
AC
as our database schema, we cannot
BC
enforce the FD AB -> C by checking FD’s in these decomposed relations.
= street, = city, and
An Unenforceable FD
street zip 545 Tech Sq. 02138 545 Tech Sq. 02139 city zip
Cambridge 02138 Cambridge 02139 Join tuples with equal zip codes. street city zip 545 Tech Sq. Cambridge 02138 545 Tech Sq. Cambridge 02139
Although no FD’s were violated in the decomposed relations,
FD street city -> zip is violated by the database as a whole. zip -> city street city -> zip
3NF rd
BCNF condition so we do not have to decompose in this problem situation.
prime any key.
X A X a superkey, and also A is not prime.
and AC .
B
C
C are each prime.
, and
B
,
A
AB
, we have keys
B
C
and
C
AB
Example
violates BCNF, it does not violate 3NF.
3NF decomposition
3NF decomposition (II)
Decomposition Properties
1. Lossless-Join : it should be possible to
project the original relations onto the
decomposed schema, and then reconstruct the original.: it should be
2. Dependency Preservation possible to check in the projected relations whether all the given FD’s are satisfied.
3NF vs. BCNF
Lossless-join, dependency-preserving
3NF vs. BCNF (II) • We can get (1) with a BCNF decomposition.
Exercises
R(C,S,J,D,P,Q,V) Key is C, JP->C, SD->P, J->S Does JP->C violates BCNF? 3NF? What is a BCNF decomposition? Lossless? Dependency-Preserving? How to fix it?
Summary
in mind performance requirements