COP 5725 Fall 2012 Database Management Systems


  University of Florida, CISE Department Prof. Daisy Zhe Wang

Schema Refinement and Normal Forms

  • Functional Dependencies
  • Normal Forms
  • Decompositions

  Review: Database Design Steps

  • Requirements Analysis
  • Conceptual Database Design
    – E/R model
  • Logical Database Design
    – E.g., relational data model
    – Output: logical/conceptual schema
  • Schema Refinement
  • Physical Database Design
  • Application and Security Design

  Why Schema Refinement?

  • After Conceptual and Logical Database Design → relational schemas + IC’s
  • Problems caused by bad schema design
    – Redundant storage
    – Update anomaly: one occurrence of a fact is changed, but not all occurrences.
    – Insertion anomaly: cannot insert a piece of data without first inserting another.
    – Deletion anomaly: valid fact is lost when a tuple is deleted.

  Example of Bad Design

  Drinkers(name, addr, beersLiked, manf, favBeer)

  name     addr        beersLiked  manf    favBeer
  Janeway  Voyager     Bud         A.B.    WickedAle
  Janeway  ???         WickedAle   Pete’s  ???
  Spock    Enterprise  Bud         ???     Bud

  Data is redundant, because each of the ???’s can be figured out by using the FD’s name -> addr favBeer and beersLiked -> manf.

  This Bad Design Also Exhibits Anomalies

  name     addr        beersLiked  manf    favBeer
  Janeway  Voyager     Bud         A.B.    WickedAle
  Janeway  Voyager     WickedAle   Pete’s  WickedAle
  Spock    Enterprise  Bud         A.B.    Bud

  • Update anomaly
  • Insert anomaly
  • Deletion anomaly

  Decompositions

  • Integrity constraints, in particular functional dependencies, can be used to identify schemas with such problems and to suggest refinements.
  • Main refinement technique: decomposition (replacing ABCD with, say, AB and BCD, or ACD and ABD).

  Problems Related to Decompositions

  • Do we need to decompose a relation?
    – Use FDs and normal forms
  • What problems (if any) does the decomposition cause?
    – 2 important properties:
      • Lossless-join
      • Dependency-preservation (decides efficiency in checking the IC’s)
    – Performance vs. redundancy

  Functional Dependencies

  • X -> A is an assertion about a relation R: whenever two tuples of R agree on all the attributes of X, they must also agree on the attribute A.
  • Say “X determines A” or “X -> A holds in R.”
    – Convention: …, X, Y, Z represent sets of attributes; A, B, C, … represent single attributes.
    – Convention: sets of attributes can be written without set formers: e.g., just ABC, rather than {A, B, C}.

  Example

  Drinkers(name, addr, beersLiked, manf, favBeer)

  name     addr        beersLiked  manf    favBeer
  Janeway  Voyager     Bud         A.B.    WickedAle
  Janeway  Voyager     WickedAle   Pete’s  WickedAle
  Spock    Enterprise  Bud         A.B.    Bud

  • The two Janeway tuples agree on addr because name -> addr.
  • The two Janeway tuples agree on favBeer because name -> favBeer.
  • The two Bud tuples agree on manf because beersLiked -> manf.

  Reasonable FD’s to assert:
  1. name -> addr
  2. name -> favBeer
  3. beersLiked -> manf
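The definition above can be checked mechanically against a relation instance. The following is a minimal sketch, assuming rows are stored as Python dicts; the helper name `fd_holds` is hypothetical and not part of the course material. Note that an instance can only refute an FD: passing the check on one instance does not prove the FD holds for every legal instance.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs is satisfied by rows (a list of dicts)."""
    seen = {}  # maps each combination of lhs values to the rhs values first seen with it
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # two tuples agree on lhs but differ on rhs
    return True


# The Drinkers instance from the slide.
drinkers = [
    {"name": "Janeway", "addr": "Voyager", "beersLiked": "Bud",
     "manf": "A.B.", "favBeer": "WickedAle"},
    {"name": "Janeway", "addr": "Voyager", "beersLiked": "WickedAle",
     "manf": "Pete's", "favBeer": "WickedAle"},
    {"name": "Spock", "addr": "Enterprise", "beersLiked": "Bud",
     "manf": "A.B.", "favBeer": "Bud"},
]

print(fd_holds(drinkers, ["name"], ["addr", "favBeer"]))  # True
print(fd_holds(drinkers, ["beersLiked"], ["manf"]))       # True
print(fd_holds(drinkers, ["name"], ["beersLiked"]))       # False
```

The last call fails because the two Janeway tuples agree on name but list different beers, so name -> beersLiked is not a reasonable FD for this relation.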

FD’s With Multiple Attributes

  • No need for FD’s with > 1 attribute on right.
  • > 1 attribute on left may be essential.

  Where Do FD’s Come From?

  • Key constraints on entities and relationships
    – K -> A holds for R (candidate keys)
    – many-one, one-one relationships
    – domain knowledge of the application

  Inferring FD’s

  • We are given a set of FD’s F: X1 -> A1, X2 -> A2, …, Xn -> An, and we want to know whether an FD Y -> B must hold in every relation instance that satisfies the given FD’s in F. If so, we say Y -> B is implied by F.
  • Example: A -> C is implied by A -> B and B -> C.

  Closure and Armstrong’s Axioms

  Examples

  Attribute Closure

  Closure Test

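  • An easier way to test whether Y -> B is implied is to compute the closure of Y, denoted Y+.
  • Basis: Y+ = Y.
  • Induction: Look for an FD X -> A whose left side X is a subset of the current Y+ and whose right side A is not in Y+.
  • If such an FD exists, add A to Y+. Repeat until no such FD remains.
  • Y -> B is implied exactly when B is in the final Y+.

  (Figure: X lies inside the current Y+; adding A gives the new Y+.)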
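The basis and induction steps of the closure test translate almost line for line into code. This is a minimal sketch, assuming FDs are represented as (left side, right side) pairs of attribute collections; the function name `closure` is a choice made here for illustration.

```python
def closure(attrs, fds):
    """Compute the closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)            # basis: Y+ = Y
    changed = True
    while changed:                 # induction: repeat until nothing new is added
        changed = False
        for lhs, rhs in fds:
            # left side inside the current Y+, right side not fully inside yet
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# FD's of the Drinkers relation used earlier in the slides.
fds = [({"name"}, {"addr", "favBeer"}), ({"beersLiked"}, {"manf"})]

print(sorted(closure({"name"}, fds)))  # ['addr', 'favBeer', 'name']
```

To test whether Y -> B is implied, compute `closure(Y, fds)` and check that B is in the result. Single-letter attributes can be passed as strings, for example `closure("AB", [("AB", "C")])`, because each string is converted to a set of characters.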
  Exercise, FD inference

  • R(A,B,C,D,E): AB->C, BC->AD, D->E, CF->B. Can AB->D be inferred?

  Approach 1: (Using AA and additional rules)
    AB->C => AB->BC (augmentation)
    BC->AD => BC->D (decomposition)
    AB->BC, BC->D => AB->D (transitivity)

  Approach 2: Closure Test
  1. What is AB+?
    • AB+ = AB; then apply AB->C
    • AB+ = ABC; then apply BC->D
    • AB+ = ABCD; then apply D->E
    • AB+ = ABCDE
  2. Is D contained in AB+? Yes: AB->D.
  Projecting FD’s

  • Finding all implied FD’s on a projected schema. Motivation: decomposition.
  • Example: ABCD with FD’s AB -> C, C -> D, and D -> A.
    – Decompose into ABC, AD. What FD’s hold in ABC?
    – Not only AB -> C, but also C -> A!

  Basic Idea

  1. Start with given FD’s and find all nontrivial FD’s that follow from the given FD’s.
  2. Restrict to those FD’s that involve only attributes of the projected schema.

  Simple, Exponential Algorithm

  1. For each set of attributes X, compute X+.
  2. Add X -> A for all A in X+ - X.
  3. However, drop XY -> A whenever we discover X -> A.
    – Because XY -> A follows from X -> A in any projection.
  4. Finally, use only FD’s involving projected attributes.

  A Few Tricks

  • No need to compute the closure of the empty set or of the set of all attributes.
  • If we find X+ = all attributes, so is the closure of any superset of X.

  Example

  • ABC with FD’s A -> B and B -> C. Project onto AC.
    – A+ = ABC; yields A -> B, A -> C.
      • We do not need to compute AB+ or AC+.
    – B+ = BC; yields B -> C.
    – C+ = C; yields nothing.
    – BC+ = BC; yields nothing.

  Example, Continued

  • Resulting FD’s: A -> B, A -> C, and B -> C.
  • Projection onto AC: A -> C.
    – Only FD that involves a subset of {A, C}.
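The projection procedure can be sketched in a few lines. This is an illustrative version, not the slides' exact algorithm: it assumes FDs are (left side, right side) pairs, and it enumerates only subsets of the projected attributes, which gives the same result as computing every closure and filtering at the end. The names `closure` and `project_fds` are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def project_fds(fds, target):
    """Nontrivial FDs X -> A over the target attributes, with minimal left sides."""
    target = sorted(target)
    found = []  # list of (frozenset X, single attribute A)
    for size in range(1, len(target) + 1):       # smaller left sides first
        for combo in combinations(target, size):
            x = frozenset(combo)
            for a in sorted(closure(x, fds) - x):
                if a not in target:
                    continue                     # keep projected attributes only
                # drop XY -> A when a smaller X -> A was already discovered
                if any(lhs < x and rhs == a for lhs, rhs in found):
                    continue
                found.append((x, a))
    return found


# The example above: ABC with A -> B and B -> C, projected onto AC.
for lhs, a in project_fds([("A", "B"), ("B", "C")], "AC"):
    print("".join(sorted(lhs)), "->", a)   # A -> C
```

Running the same function on ABCD with AB -> C, C -> D, D -> A and target ABC returns C -> A and AB -> C, matching the earlier slide. The loop is exponential in the number of projected attributes, as the slide title warns.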

      Normal Forms

  Boyce-Codd Normal Form

  • Relation R with FDs F is in BCNF if, for all X -> A in F:
    – X -> A is a trivial FD (X contains A), or
    – X contains a key for R.
  • In other words, R is in BCNF if the only non-trivial FDs that hold over R are key constraints.

  Example

  Beers(name, manf, manfAddr)
  FD’s: name->manf, manf->manfAddr

  • Only key is {name}.
  • name->manf does not violate BCNF, but manf->manfAddr does.

  Example (II)

  Drinkers(name, addr, beersLiked, manf, favBeer)
  FD’s: name->addr favBeer, beersLiked->manf

  • Only key is {name, beersLiked}.
  • In each FD, the left side is not a superkey.
  • Any one of these FD’s shows Drinkers is not in BCNF.
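The BCNF condition can be tested with attribute closures: a left side X is a superkey exactly when X+ contains every attribute of the relation. The sketch below assumes FDs are (left side, right side) pairs of attribute sets and checks only the given FDs; the function names are hypothetical.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Return the given FDs that are nontrivial and whose left side is not a superkey."""
    attrs = set(attrs)
    bad = []
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        trivial = rhs <= lhs
        superkey = closure(lhs, fds) >= attrs
        if not trivial and not superkey:
            bad.append((lhs, rhs))
    return bad


beers_attrs = {"name", "manf", "manfAddr"}
beers_fds = [({"name"}, {"manf"}), ({"manf"}, {"manfAddr"})]

drinkers_attrs = {"name", "addr", "beersLiked", "manf", "favBeer"}
drinkers_fds = [({"name"}, {"addr", "favBeer"}), ({"beersLiked"}, {"manf"})]

print(bcnf_violations(beers_attrs, beers_fds))           # only manf -> manfAddr
print(len(bcnf_violations(drinkers_attrs, drinkers_fds)))  # 2
```

For Beers, name+ covers all three attributes, so name->manf passes, while manf+ = {manf, manfAddr} does not. For Drinkers, both given FDs are reported.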

  Redundancy in BCNF

  • BCNF ensures that no redundancy can be detected using FD information alone.
    – Most desirable normal form (in terms of redundancy)
  • Consider two tuples that agree on X:

    X  Y   A
    x  y1  a
    x  y2  ?

    – 1. If XY->A
    – 2. If X->A

  Decomposition into BCNF

  • Given: relation R with FD’s F.
  • Look among the given FD’s for a BCNF violation X -> B.
    – If any FD following from F violates BCNF, then there will surely be an FD in F itself that violates BCNF.
  • Compute X+.
    – Not all attributes, or else X is a superkey.

  Decompose R Using X -> B

  • Replace R by relations with schemas:
    1. R1 = X+.
    2. R2 = R – (X+ – X).
  • Project given FD’s F onto the two new relations.
  • Lossless decomposition

  Decomposition Picture

  (Figure: R is split into R1, made of X and X+ – X, and R2, made of X and R – X+.)

  Example

  Drinkers(name, addr, beersLiked, manf, favBeer)

  F = name->addr, name->favBeer, beersLiked->manf

  Steps for BCNF decomposition:

  1. Key: name, beersLiked
  2. Find BCNF violation FD X->A: name->addr
  3. Compute X+: {name, addr, favBeer}
  4. Decompose to R1 and R2
    – R1: Drinkers1(name, addr, favBeer)
    – R2: Drinkers2(name, beersLiked, manf)

  Example, Continued

  • We are not done; we need to check Drinkers1 and Drinkers2 for BCNF.

  For Drinkers1(name, addr, favBeer):

  1. Projecting FD’s: name->addr, name->favBeer
  2. Key: name
  3. Find BCNF violation FD X->A: no violation

  Example, Continued

  For Drinkers2(name, beersLiked, manf):

  1. Projecting FD’s: beersLiked->manf
  2. Key: name, beersLiked
  3. Find BCNF violation FD X->A: beersLiked->manf
  4. Compute X+: {beersLiked, manf}
  5. Decompose to R3 and R4
    – R3: Drinkers3(beersLiked, manf) FDs? key?
    – R4: Drinkers4(name, beersLiked) FDs? key?

  Are they all in BCNF?

  Example, Concluded

  • The resulting decomposition of Drinkers:
    1. Drinkers1(name, addr, favBeer)
    2. Drinkers3(beersLiked, manf)
    3. Drinkers4(name, beersLiked)
  • Notice: Drinkers1 tells us about drinkers, Drinkers3 tells us about beers, and Drinkers4 tells us the relationship between drinkers and the beers they like.
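The whole worked example can be reproduced with a short recursive routine. This is a sketch under stated assumptions, not a definitive implementation: FDs are (left side, right side) pairs, projection uses the simple exponential method, and the first violating FD found is used for the split, so a different FD order may give a different (still valid) decomposition. All function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def project(fds, target):
    """Project fds onto target: one FD X -> (X+ within target, minus X) per proper subset X."""
    target = frozenset(target)
    out = []
    for size in range(1, len(target)):
        for combo in combinations(sorted(target), size):
            x = frozenset(combo)
            rhs = (closure(x, fds) & target) - x
            if rhs:
                out.append((x, frozenset(rhs)))
    return out


def bcnf_decompose(attrs, fds):
    """Split attrs on the first BCNF violation found, then recurse on both parts."""
    attrs = frozenset(attrs)
    for lhs, rhs in fds:
        x_plus = closure(lhs, fds) & attrs
        if set(rhs) <= set(lhs) or x_plus == attrs:
            continue                                  # trivial FD, or lhs is a superkey
        r1 = x_plus                                   # R1 = X+
        r2 = attrs - (x_plus - frozenset(lhs))        # R2 = R - (X+ - X)
        return (bcnf_decompose(r1, project(fds, r1))
                + bcnf_decompose(r2, project(fds, r2)))
    return [attrs]                                    # no violation: already in BCNF


drinkers = {"name", "addr", "beersLiked", "manf", "favBeer"}
fds = [({"name"}, {"addr", "favBeer"}), ({"beersLiked"}, {"manf"})]

result = bcnf_decompose(drinkers, fds)
for schema in result:
    print(sorted(schema))
```

With the FD order shown, the routine first splits on name, then splits the remainder on beersLiked, and returns the three schemas Drinkers1, Drinkers3, and Drinkers4 from the slides.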

  Review: Problems Related to Decompositions

  • Do we need to decompose a relation?
    – Use FDs and normal forms
  • What problems (if any) does the decomposition cause?
    – 2 important properties:
      • Lossless-join
      • Dependency-preservation (decides efficiency in checking the IC’s)
    – Performance vs. redundancy

  Problem with BCNF

  • There is one structure of FD’s that causes trouble when we decompose.
  • AB -> C and C -> B.
    – Example: A = street address, B = city, C = zip code.
  • There are two keys, {A, B} and {A, C}.
  • C -> B is a BCNF violation, so we must decompose into AC, BC.

  We Cannot Enforce FD’s

  • The problem is that if we use AC and BC as our database schema, we cannot enforce the FD AB -> C by checking FD’s in these decomposed relations.
  • Example with A = street, B = city, and C = zip on the next slide.

  An Unenforceable FD

  FD’s: street city -> zip, zip -> city

  street        zip
  545 Tech Sq.  02138
  545 Tech Sq.  02139

  city       zip
  Cambridge  02138
  Cambridge  02139

  Join tuples with equal zip codes.

  street        city       zip
  545 Tech Sq.  Cambridge  02138
  545 Tech Sq.  Cambridge  02139

  Although no FD’s were violated in the decomposed relations, FD street city -> zip is violated by the database as a whole.
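The street-city-zip situation can be replayed on the two small tables. The sketch below assumes rows are Python dicts and uses a hypothetical helper `fd_holds` to test an FD on an instance; it checks zip -> city locally, joins on zip, and then checks street city -> zip on the join.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs is satisfied by rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


street_zip = [
    {"street": "545 Tech Sq.", "zip": "02138"},
    {"street": "545 Tech Sq.", "zip": "02139"},
]
city_zip = [
    {"city": "Cambridge", "zip": "02138"},
    {"city": "Cambridge", "zip": "02139"},
]

# Natural join of the two relations on zip.
joined = [
    {"street": s["street"], "city": c["city"], "zip": s["zip"]}
    for s in street_zip
    for c in city_zip
    if s["zip"] == c["zip"]
]

print(fd_holds(city_zip, ["zip"], ["city"]))           # True: the local FD is fine
print(fd_holds(joined, ["street", "city"], ["zip"]))   # False: violated after the join
```

The only FD that can be checked inside a single decomposed relation is zip -> city, and it holds. The violation of street city -> zip becomes visible only once the join is computed, which is what makes the FD expensive to enforce.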

  3NF

  • 3rd Normal Form (3NF) modifies the BCNF condition so we do not have to decompose in this problem situation.
  • An attribute is prime if it is a member of any key.
  • X -> A violates 3NF if and only if X is not a superkey, and also A is not prime.

  Example

  • In our problem situation with FD’s AB -> C and C -> B, we have keys AB and AC.
  • Thus A, B, and C are each prime.
  • Although C -> B violates BCNF, it does not violate 3NF.
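The 3NF test needs the set of prime attributes, which in turn needs all candidate keys. The following is a minimal sketch that finds keys by brute force (exponential in the number of attributes) and then applies the violation rule "X is not a superkey and A is not prime" to the given FDs. Function names are hypothetical, and single-letter attributes are passed as strings for brevity.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure is the whole relation."""
    attrs = frozenset(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):            # smallest sets first
        for combo in combinations(sorted(attrs), size):
            x = frozenset(combo)
            if closure(x, fds) >= attrs and not any(k < x for k in keys):
                keys.append(x)
    return keys


def violates_3nf(attrs, fds):
    """Return (X, A) pairs from the given FDs where X is not a superkey and A is not prime."""
    prime = set().union(*candidate_keys(attrs, fds))
    bad = []
    for lhs, rhs in fds:
        superkey = closure(lhs, fds) >= set(attrs)
        for a in sorted(set(rhs) - set(lhs)):        # skip trivial parts
            if not superkey and a not in prime:
                bad.append((set(lhs), a))
    return bad


fds = [("AB", "C"), ("C", "B")]
print(candidate_keys("ABC", fds))   # keys AB and AC
print(violates_3nf("ABC", fds))     # []
```

For the problem situation, the keys are AB and AC, every attribute is prime, and no FD violates 3NF even though C -> B violates BCNF. Applied to Beers(name, manf, manfAddr), the same check reports manf -> manfAddr, since manfAddr is not part of any key.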

      3NF decomposition

      3NF decomposition (II)

  Decomposition Properties

  • There are two important properties of a decomposition:
    1. Lossless-Join: it should be possible to project the original relations onto the decomposed schema, and then reconstruct the original.
    2. Dependency Preservation: it should be possible to check in the projected relations whether all the given FD’s are satisfied.
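For a decomposition into exactly two relations, the lossless-join property can be tested with a closure computation. The sketch below uses the standard textbook criterion, which is not stated on these slides: the decomposition of R into R1 and R2 is lossless with respect to F when the common attributes functionally determine all of R1 or all of R2. The function name `lossless_binary` is hypothetical.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """True if the common attributes of r1 and r2 determine all of r1 or all of r2."""
    r1, r2 = set(r1), set(r2)
    common_plus = closure(r1 & r2, fds)
    return common_plus >= r1 or common_plus >= r2


fds = [({"name"}, {"addr", "favBeer"}), ({"beersLiked"}, {"manf"})]

# The split produced by the BCNF step on name -> addr.
print(lossless_binary({"name", "addr", "favBeer"},
                      {"name", "beersLiked", "manf"}, fds))   # True

# A split with no common attributes cannot be rejoined correctly.
print(lossless_binary({"name", "addr"},
                      {"beersLiked", "manf", "favBeer"}, fds))  # False
```

This also explains why the BCNF step is lossless: R1 = X+ and R2 = R – (X+ – X) share exactly X, and X determines all of X+.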

  3NF vs. BCNF

  • If R is in 3NF, some redundancy is possible.
  • It is a compromise, used when BCNF is not achievable (e.g., no “good” decomp, or performance considerations).
  • A lossless-join, dependency-preserving decomposition of R into a collection of 3NF relations is always possible.

  3NF vs. BCNF (II)

  • We can get (1) with a BCNF decomposition.
  • We can get both (1) and (2) with a 3NF decomposition.
  • But we can’t always get (1) and (2) with a BCNF decomposition.
    – street-city-zip is an example.

  Exercises

  R(C,S,J,D,P,Q,V)
  Key is C, JP->C, SD->P, J->S

  • Does JP->C violate BCNF? 3NF?
  • What is a BCNF decomposition? Lossless? Dependency-preserving?
  • How to fix it?

  Summary

  • If a relation is in BCNF, it is free of redundancies that can be detected using FDs.
  • If a relation is not in BCNF, we can try to decompose it into a collection of BCNF relations. Are all FDs preserved?
  • If a lossless-join, dependency-preserving decomposition into BCNF is not possible (or unsuitable, given typical queries), we should consider decomposition into 3NF.
  • Keep in mind performance requirements.