COP 5725 Fall 2012 Database Management Systems

University of Florida, CISE Department Prof. Daisy Zhe Wang

Schema Refinement and Normal Forms

Functional Dependencies
Normal Forms
Decompositions

Review: Database Design Steps

Requirements Analysis

Conceptual Database Design

– E/R model

Logical Database Design

– E.g., relational data model
– Output: logical/conceptual schema

Schema Refinement

Physical Database Design

Application and Security Design

Why Schema Refinement?

After Conceptual and Logical Database Design: relational schemas + IC’s
Problems caused by bad schema design

– Redundant storage
– Update anomaly: one occurrence of a fact is changed, but not all occurrences.
– Insertion anomaly: cannot insert a piece of data without first inserting another.
– Deletion anomaly: a valid fact is lost when a tuple is deleted.

Example of Bad Design

Drinkers(name, addr, beersLiked, manf, favBeer)

name     addr        beersLiked  manf    favBeer
Janeway  Voyager     Bud         A.B.    WickedAle
Janeway  ???         WickedAle   Pete’s  ???
Spock    Enterprise  Bud         ???     Bud

Data is redundant, because each of the ???’s can be figured out by using the FD’s name -> addr favBeer and beersLiked -> manf.

This Bad Design Also Exhibits Anomalies

name     addr        beersLiked  manf    favBeer
Janeway  Voyager     Bud         A.B.    WickedAle
Janeway  Voyager     WickedAle   Pete’s  WickedAle
Spock    Enterprise  Bud         A.B.    Bud

Update anomaly
Insert anomaly
Deletion anomaly

Decompositions

Integrity constraints, in particular functional dependencies, can be used to identify schemas with such problems and to suggest refinements.

Main refinement technique: decomposition (replacing ABCD with, say, AB and BCD, or ACD and ABD).

Problems Related to Decompositions

Do we need to decompose a relation?

– Use FDs and normal forms

What problems (if any) does the decomposition cause?

– 2 important properties:

Lossless-join
Dependency-preservation (determines how efficiently the IC’s can be checked)

– Performance vs. redundancy

Functional Dependencies

X -> A is an assertion about a relation R: whenever two tuples of R agree on all the attributes of X, they must also agree on attribute A.

Say “X -> A holds in R” (read “X determines A”).

– Convention: …, X, Y, Z represent sets of attributes; A, B, C,… represent single attributes.
– Convention: sets of attributes can be written without set formers: e.g., just ABC, rather than {A, B, C}.

Example

name     addr        beersLiked  manf    favBeer
Janeway  Voyager     Bud         A.B.    WickedAle
Janeway  Voyager     WickedAle   Pete’s  WickedAle
Spock    Enterprise  Bud         A.B.    Bud

– The two Janeway tuples agree on addr because name -> addr.
– The two Janeway tuples agree on favBeer because name -> favBeer.
– The two Bud tuples agree on manf because beersLiked -> manf.

Drinkers(name, addr, beersLiked, manf, favBeer)

Reasonable FD’s to assert:
1. name -> addr
2. name -> favBeer
3. beersLiked -> manf
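The definition of an FD can be checked mechanically against a relation instance. The sketch below is only an illustration: `fd_holds` is a hypothetical helper name, and rows are assumed to be Python dicts keyed by attribute name. Note that an instance can refute an FD but can never prove that it holds for every legal instance; asserting an FD remains a design decision.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but disagree on rhs."""
    seen = {}  # lhs values -> rhs values first seen with them
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # two tuples agree on lhs but differ on rhs
    return True


COLUMNS = ("name", "addr", "beersLiked", "manf", "favBeer")
drinkers = [
    dict(zip(COLUMNS, t))
    for t in [
        ("Janeway", "Voyager", "Bud", "A.B.", "WickedAle"),
        ("Janeway", "Voyager", "WickedAle", "Pete's", "WickedAle"),
        ("Spock", "Enterprise", "Bud", "A.B.", "Bud"),
    ]
]

print(fd_holds(drinkers, ["name"], ["addr"]))        # True
print(fd_holds(drinkers, ["beersLiked"], ["manf"]))  # True
print(fd_holds(drinkers, ["name"], ["beersLiked"]))  # False: Janeway likes two beers
```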

FD’s With Multiple Attributes

No need for FD’s with more than one attribute on the right: X -> AB is equivalent to X -> A and X -> B.
More than one attribute on the left may be essential.

Where Do FD’s Come From?

Key constraints on entities and relationships

– K -> A holds for R (candidate keys)
– many-one, one-one relationships
– domain knowledge of the application

Inferring FD’s

We are given a set F of FD’s: X₁ -> A₁, X₂ -> A₂, …, Xₙ -> Aₙ, and we want to know whether an FD Y -> B must hold in every relation instance that satisfies the given FD’s in F. If so, we say Y -> B is implied by F.

Example: A -> C is implied by A -> B and B -> C.

Closure and Armstrong’s Axioms

Examples

Attribute Closure

Closure Test

An easier way to test is to compute the closure of Y, denoted Y⁺.

Basis: Y⁺ = Y.

Induction: Look for an FD X -> A whose left side X is a subset of the current Y⁺ and whose right side A is not in Y⁺.

If one exists, add A to Y⁺.

R(A,B,C,D,E,F): AB->C, BC->AD, D->E, CF->B. Can AB->D be inferred?

Approach 1: (Using AA and additional rules)

AB->C => AB->BC (augmentation)
BC->AD => BC->D (decomposition)
AB->BC, BC->D => AB->D (transitivity)

Approach 2: Closure Test

1. What is AB⁺?
AB⁺ = AB
given AB->C: AB⁺ = ABC
given BC->D: AB⁺ = ABCD
given D->E: AB⁺ = ABCDE

2. Is D contained in AB⁺? Yes: AB->D
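The closure test is short enough to write out directly. This is a minimal sketch, assuming single-letter attribute names so that a string such as "AB" can stand for the set {A, B}; the names `closure` and `implies` are illustrative, not from the slides.

```python
def closure(attrs, fds):
    """Compute the closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)  # basis: Y+ = Y
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # induction: left side inside the current closure, right side adds something new
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(lhs, rhs, fds):
    """Y -> B is implied by fds exactly when B is contained in Y+."""
    return set(rhs) <= closure(lhs, fds)


fds = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]
print("".join(sorted(closure("AB", fds))))  # ABCDE
print(implies("AB", "D", fds))              # True
```

The loop stops once a full pass over the FD's adds nothing, so it terminates after at most one pass per attribute added.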

Finding All Implied FD’s on a Projected Schema

Motivation: decomposition

Example: ABCD with FD’s AB -> C, C -> D, and D -> A.

– Decompose into ABC, AD. What FD’s hold in ABC?
– Not only AB -> C, but also C -> A!

Basic Idea

1. Start with given FD’s and find all nontrivial FD’s that follow from the given FD’s.

2. Restrict to those FD’s that involve only attributes of the projected schema.

Simple, Exponential Algorithm

1. For each set of attributes X, compute X⁺.

2. Add X -> A for all A in X⁺ - X.

3. However, drop XY -> A whenever we discover X -> A.

– Because XY -> A follows from X -> A in any projection.

4. Finally, use only FD’s involving projected attributes.
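The four steps above can be sketched as follows. This is an illustration under stated assumptions: single-letter attributes written as strings, and hypothetical helper names `closure` and `project_fds`. As a shortcut, the sketch enumerates only subsets of the projected attributes, since any other left side is discarded in step 4 anyway.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def project_fds(fds, target):
    """Nontrivial FD's X -> A over target with no smaller left side (exponential)."""
    target = sorted(set(target))
    found = []  # (frozenset lhs, single attribute rhs)
    for size in range(1, len(target) + 1):  # small left sides first
        for combo in combinations(target, size):
            x = frozenset(combo)
            for a in sorted((closure(x, fds) & set(target)) - x):
                # step 3: drop XY -> A if X -> A was already discovered
                if not any(lhs < x and rhs == a for lhs, rhs in found):
                    found.append((x, a))
    return found


def show(found):
    return ["".join(sorted(lhs)) + "->" + a for lhs, a in found]


print(show(project_fds([("AB", "C"), ("C", "D"), ("D", "A")], "ABC")))  # ['C->A', 'AB->C']
```

The result has minimal left sides but is not necessarily a minimal basis: FD's that follow from others by transitivity are still listed.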

A Few Tricks

No need to compute the closure of the empty set or of the set of all attributes.
If we find X⁺ = all attributes, so is the closure of any superset of X.

Example

ABC with FD’s A -> B and B -> C. Project onto AC.

– A⁺ = ABC; yields A -> B, A -> C.
  We do not need to compute AB⁺ or AC⁺.
– B⁺ = BC; yields B -> C.
– C⁺ = C; yields nothing.
– BC⁺ = BC; yields nothing.

Example --- Continued

Resulting FD’s: A -> B, A -> C, and B -> C.

Projection onto AC: A -> C.

– Only FD that involves a subset of {A, C}.

Normal Forms

Boyce-Codd Normal Form

Relation R with FDs F is in BCNF if, for all X -> A in F:

– X -> A is a trivial FD (X contains A), or
– X contains a key for R.

In other words, R is in BCNF if the only non-trivial FDs that hold over R are key constraints.

Example

Beers(name, manf, manfAddr)

FD’s: name->manf, manf->manfAddr

Only key is {name}.

name->manf does not violate BCNF, but manf->manfAddr does.

Example (II)

Drinkers(name, addr, beersLiked, manf, favBeer)

FD’s: name->addr favBeer, beersLiked->manf

Only key is {name, beersLiked}.

In each FD, the left side is not a superkey.

Any one of these FD’s shows Drinkers is not in BCNF.
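Both examples can be checked by testing, for each given FD, whether it is nontrivial and whether its left side is a superkey (its closure covers the whole schema). The following is a sketch only; `bcnf_violations` is a hypothetical name, and attribute sets are represented as Python sets of strings.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the given FD's that are nontrivial and whose left side is not a superkey."""
    schema = set(schema)
    violations = []
    for lhs, rhs in fds:
        nontrivial = not set(rhs) <= set(lhs)
        is_superkey = closure(lhs, fds) >= schema
        if nontrivial and not is_superkey:
            violations.append((set(lhs), set(rhs)))
    return violations


beers = {"name", "manf", "manfAddr"}
beers_fds = [({"name"}, {"manf"}), ({"manf"}, {"manfAddr"})]
print(bcnf_violations(beers, beers_fds))  # [({'manf'}, {'manfAddr'})]

drinkers = {"name", "addr", "beersLiked", "manf", "favBeer"}
drinkers_fds = [({"name"}, {"addr", "favBeer"}), ({"beersLiked"}, {"manf"})]
print(len(bcnf_violations(drinkers, drinkers_fds)))  # 2
```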

Redundancy in BCNF

BCNF ensures that no redundancy can be detected using FD information alone.

– Most desirable normal form (in terms of redundancy)
– 1. If XY->A
– 2. If X->A

X  Y   A
x  y1  a
x  y2  ?

Decomposition into BCNF

Given: relation R with FD’s F.

Look among the given FD’s for a BCNF violation X -> B.

– If any FD following from F violates BCNF, then there will surely be an FD in F itself that violates BCNF.

Compute X⁺.

– Not all attributes, or else X is a superkey.

Decompose R Using X -> B

Replace R by relations with schemas:

1. R₁ = X⁺.
2. R₂ = R – (X⁺ – X).

Project given FD’s F onto the two new relations.
Lossless decomposition

Decomposition Picture

[Figure: R drawn as three regions, R – X⁺, X, and X⁺ – X. R₁ covers X and X⁺ – X; R₂ covers R – X⁺ and X.]

Example

Drinkers(name, addr, beersLiked, manf, favBeer)

F = name->addr, name->favBeer, beersLiked->manf

Steps for BCNF decomposition:

1. Key: name, beersLiked
2. Find BCNF violation FD X->A: name->addr
3. Compute X⁺: {name, addr, favBeer}
4. Decompose to R1 and R2

R1: Drinkers1(name, addr, favBeer)
R2: Drinkers2(name, beersLiked, manf)

Example, Continued

We are not done; we need to check Drinkers1 and Drinkers2 for BCNF.

For Drinkers1(name, addr, favBeer)

1. Projecting FD’s: name->addr, name->favBeer
2. Key: name
3. Find BCNF violation FD X->A: no violation

Example, Continued

For Drinkers2(name, beersLiked, manf)

1. Projecting FD’s: beersLiked->manf
2. Key: name, beersLiked
3. Find BCNF violation FD X->A: beersLiked->manf
4. Compute X⁺: {beersLiked, manf}
5. Decompose to R3 and R4

R3: Drinkers3(beersLiked, manf) FDs? key?

R4: Drinkers4(name, beersLiked) FDs? key?

Example, Concluded

The resulting decomposition of Drinkers:

1. Drinkers1(name, addr, favBeer)

2. Drinkers3(beersLiked, manf)

3. Drinkers4(name, beersLiked)

Notice: Drinkers1 tells us about drinkers, Drinkers3 tells us about beers, and Drinkers4 tells us the relationship between drinkers and the beers they like.
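The decomposition worked through above can be sketched as a short recursive procedure. This is an illustration, not the slides' exact procedure: instead of projecting FD's explicitly, the hypothetical helper `find_violation` searches every subset X of a sub-relation and uses closures under the original F, which is exponential but keeps the code small. It may pick violations in a different order than the slides (here beersLiked->manf is found first), and in general different choices can lead to different decompositions; for Drinkers the final schemas are the same.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(rel, fds):
    """Return (X, X+ restricted to rel) for some BCNF violation in rel, or None."""
    attrs = sorted(rel)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            x = set(combo)
            x_plus = closure(x, fds) & rel
            # nontrivial (X+ adds attributes) and X is not a superkey of rel
            if x_plus != x and x_plus != rel:
                return x, x_plus
    return None


def bcnf_decompose(rel, fds):
    """Split rel on a violation X -> B until every piece is in BCNF."""
    rel = frozenset(rel)
    violation = find_violation(rel, fds)
    if violation is None:
        return [rel]
    x, x_plus = violation
    r1 = x_plus                # R1 = X+
    r2 = rel - (x_plus - x)    # R2 = R - (X+ - X)
    return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)


drinkers = {"name", "addr", "beersLiked", "manf", "favBeer"}
F = [({"name"}, {"addr"}), ({"name"}, {"favBeer"}), ({"beersLiked"}, {"manf"})]
for piece in bcnf_decompose(drinkers, F):
    print(sorted(piece))
```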

Review: Problems Related to Decompositions

Do we need to decompose a relation?

– Use FDs and normal forms

What problems (if any) does the decomposition cause?

– 2 important properties:

Lossless-join
Dependency-preservation (determines how efficiently the IC’s can be checked)

– Performance vs. redundancy

Problem with BCNF

There is one structure of FD’s that causes trouble when we decompose.

– AB -> C and C -> B.
– Example: A = street address, B = city, C = zip code.

There are two keys, {A, B} and {A, C}.
C -> B is a BCNF violation, so we must decompose into AC, BC.

We Cannot Enforce FD’s

The problem is that if we use AC and BC as our database schema, we cannot enforce the FD AB -> C by checking FD’s in these decomposed relations.

Example with A = street, B = city, and C = zip on the next slide.

An Unenforceable FD

FD’s: street city -> zip, zip -> city

street        zip
545 Tech Sq.  02138
545 Tech Sq.  02139

city       zip
Cambridge  02138
Cambridge  02139

Join tuples with equal zip codes.

street        city       zip
545 Tech Sq.  Cambridge  02138
545 Tech Sq.  Cambridge  02139

Although no FD’s were violated in the decomposed relations, FD street city -> zip is violated by the database as a whole.
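The same situation can be replayed in a few lines. This is a small sketch with rows as Python dicts; `fd_holds` is a hypothetical helper, and the join is written as a plain list comprehension rather than any particular database API.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but disagree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


street_zip = [
    {"street": "545 Tech Sq.", "zip": "02138"},
    {"street": "545 Tech Sq.", "zip": "02139"},
]
city_zip = [
    {"city": "Cambridge", "zip": "02138"},
    {"city": "Cambridge", "zip": "02139"},
]

# zip -> city is the only given FD that lies inside one decomposed relation.
print(fd_holds(city_zip, ["zip"], ["city"]))  # True

# Join tuples with equal zip codes.
joined = [{**s, **c} for s in street_zip for c in city_zip if s["zip"] == c["zip"]]

print(fd_holds(joined, ["street", "city"], ["zip"]))  # False
```

Detecting the violation required the join; neither decomposed relation contains street, city, and zip together.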

3NF

3rd Normal Form (3NF) modifies the BCNF condition so we do not have to decompose in this problem situation.

An attribute is prime if it is a member of any key.

X -> A violates 3NF if and only if X is not a superkey, and also A is not prime.

Example

In our problem situation with FD’s AB -> C and C -> B, we have keys AB and AC.

Thus A, B, and C are each prime.

Although C -> B violates BCNF, it does not violate 3NF.
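The 3NF test can be sketched by first finding all keys by brute force and then collecting the prime attributes. The helper names (`candidate_keys`, `violates_bcnf`, `violates_3nf`) are illustrative, and the key search is exponential in the number of attributes. Trivial FD's are excluded explicitly, which the slide's wording leaves implicit.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    schema = set(schema)
    keys = []
    for size in range(1, len(schema) + 1):  # smallest sets first
        for combo in combinations(sorted(schema), size):
            x = set(combo)
            if closure(x, fds) >= schema and not any(k <= x for k in keys):
                keys.append(x)
    return keys


def violates_bcnf(lhs, a, schema, fds):
    """X -> A is nontrivial and X is not a superkey."""
    return a not in set(lhs) and not closure(lhs, fds) >= set(schema)


def violates_3nf(lhs, a, schema, fds):
    """A BCNF violation whose right side A is also not prime."""
    prime = set().union(*candidate_keys(schema, fds))
    return violates_bcnf(lhs, a, schema, fds) and a not in prime


fds = [("AB", "C"), ("C", "B")]
print([sorted(k) for k in candidate_keys("ABC", fds)])  # [['A', 'B'], ['A', 'C']]
print(violates_bcnf("C", "B", "ABC", fds))              # True
print(violates_3nf("C", "B", "ABC", fds))               # False
```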

3NF decomposition

3NF decomposition (II)

Decomposition Properties

There are two important properties of a decomposition:

1. Lossless-Join: it should be possible to project the original relations onto the decomposed schema, and then reconstruct the original.

2. Dependency Preservation: it should be possible to check in the projected relations whether all the given FD’s are satisfied.
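Both properties can be tested with closures alone. The sketch below relies on two standard textbook tests that are not stated on these slides and should be read as assumptions: a decomposition into two relations is lossless exactly when the common attributes functionally determine all of R1 or all of R2, and an FD X -> Y is preserved when Y can be reached from X by repeatedly taking closures restricted to one decomposed relation at a time. Function names are hypothetical.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """Binary test: (R1 ∩ R2)+ must contain R1 or R2."""
    common_plus = closure(set(r1) & set(r2), fds)
    return common_plus >= set(r1) or common_plus >= set(r2)


def preserves(fd, parts, fds):
    """Check whether fd follows from the FD's projected onto the parts."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for part in parts:
            part = set(part)
            # attributes derivable using only FD's that live inside this part
            gained = closure(result & part, fds) & part
            if not gained <= result:
                result |= gained
                changed = True
    return set(rhs) <= result


# street-city-zip: A = street, B = city, C = zip
fds = [("AB", "C"), ("C", "B")]
print(lossless_binary("AC", "BC", fds))           # True
print(preserves(("C", "B"), ["AC", "BC"], fds))   # True
print(preserves(("AB", "C"), ["AC", "BC"], fds))  # False
```

The BCNF decomposition into AC and BC therefore has property 1 but not property 2.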

3NF vs. BCNF

If R is in 3NF, some redundancy is possible.
It is a compromise, used when BCNF is not achievable (e.g., no “good” decomposition, or performance considerations).

– A lossless-join, dependency-preserving decomposition of R into a collection of 3NF relations is always possible.

3NF vs. BCNF (II)

We can get (1) with a BCNF decomposition.
We can get both (1) and (2) with a 3NF decomposition.
But we can’t always get (1) and (2) with a BCNF decomposition.

– street-city-zip is an example.

Exercises

R(C,S,J,D,P,Q,V)
Key is C; FD’s: JP->C, SD->P, J->S
Does JP->C violate BCNF? 3NF?
What is a BCNF decomposition? Is it lossless? Dependency-preserving? How to fix it?

Summary

If a relation is in BCNF, it is free of redundancies that can be detected using FDs.
If a relation is not in BCNF, we can try to decompose it into a collection of BCNF relations. Are all FDs preserved?
If a lossless-join, dependency-preserving decomposition into BCNF is not possible (or is unsuitable, given typical queries), we should consider decomposition into 3NF.
Keep in mind performance requirements.
