normalization
play

Normalization Improve the schema by decomposing relations and - PDF document

Normalization Improve the schema by decomposing relations and removing anomalies. CS 235: Boyce-Codd Normal Form (BCNF): all FDs Introduction to Databases follow from the fact key everything . Formally, R is in BCNF if every


  1. Normalization • Improve the schema by decomposing relations and removing anomalies. CS 235: • Boyce-Codd Normal Form (BCNF): all FD’s Introduction to Databases follow from the fact key → everything . • Formally, R is in BCNF if every nontrivial FD for Svetlozar Nestorov R , say X → A , has X a superkey. – “Nontrivial” = right-side attribute not in left side. Lecture Notes #6 BCNF properties Example 1. Guarantees no redundancy due to FD’s. Beers(name, manf, manfAddr). • FD’s: 2. Guarantees no update anomalies = one – name → manf, occurrence of a fact is updated, not all. – manf → manfAddr . 3. Guarantees no deletion anomalies = • Only key is name. valid fact is lost when tuple is deleted. • manf → manfAddr violates BCNF with a left side unrelated to any key. Decomposition into BCNF Proof • Setting: relation R , given FD’s F . Suppose • Let Y → A is a BCNF violation and follows relation R has BCNF violation X → B . from F • Then the computation of Y + used at least • We need only look among FD’s of F for a BCNF violation. one FD X → B from F . • If there are no violations, then the relation • X must be a subset of Y . is in BCNF. • Thus, if Y is not a superkey, X cannot be a • Don’t we have to consider implied FD’s? superkey either, and X → B is also a BCNF violation. • No, because… 1

  2. Decomposition Algorithm (1/2) Decomposition Algorithm (2/2) For every violation X → B among given 3. Find the FD’s for the decomposed FD’s: relations. 1. Compute X + . – Project the FD’s from F = calculate all consequents of F that involve only attributes � Cannot be all attributes – why? from X + or only from ( R − X + ) ∪ X . Decompose R into X + and ( R – X + ) ∪ X . 2. X R X+ Example (1/3) Example (2/3) R = Drinkers(name, addr, beersLiked, manf, favoriteBeer) • BCNF violations? F D ’s: • name → addr – For Drinkers1, name is key and all left sides • name → favoriteBeer of FD’s are superkeys. • beersLiked → manf – For Drinkers2, {name, beersLiked} is the key, Pick BCNF violation name → addr. and beersLiked → manf violates BCNF. • Close the left side: name + = name addr favoriteBeer . • Decomposed relations: Drinkers1(name, addr, favoriteBeer) Drinkers2(name, beersLiked, manf) • Projected FD’s (skipping a lot of work): – For Drinkers1: name → addr and name → favoriteBeer . – For Drinkers2: beersLiked → manf . Third Normal Form (3NF) Example (3/3) • Sometimes we have a dilemma: – If you decompose, you can’t check the FD’s in the decomposed relations. • Decompose Drinkers2 – If you don’t decompose, you violate BCNF. • Close beersLiked + = beersLiked, manf . • Abstractly: AB → C and C → B . • Decompose: • In book: title city → theatre and theatre → city. Drinkers3(beersLiked, manf) • Another example: street city → zip, Drinkers4(name, beersLiked) zip → city. • Resulting relations are all in BCNF: • Keys: AB and AC , but C → B has a left side not a superkey. Drinkers1(name, addr, favoriteBeer) • Suggests decomposition into BC and AC . Drinkers3(beersLiked, manf) – But you can’t check the FD AB → C in these Drinkers4(name, beersLiked) relations. 2

  3. Example “Elegant” Workaround • What can go wrong if we decompose: • Define the problem away. street zip A = street , • A relation R is in 3NF iff for every nontrivial 545 Tech Sq. 02138 B = city , FD X → A , either: 545 Tech Sq. 02139 C = zip. 1. X is a superkey, or city zip Cambridge 02138 2. A is prime = member of at least one Cambridge 02139 key. • Thus, the canonical problem goes away: city street zip you don’t have to decompose because all Cambridge 545 Tech Sq. 02138 Join: attributes are prime. Cambridge 545 Tech Sq. 02139 Decomposition Properties 3NF vs. BCNF 1. We should be able to recover from the • Without proof, we assert that it is always decomposed relations the data of the possible to decompose into BCNF and original. satisfy (1). – Recovery involves projection and join (next • Also without proof, we can decompose time). into 3NF and satisfy both (1) and (2). 2. We should be able to check that the FD’s • But it is not possible to decompose into for the original relation are satisfied by BCNF and get both (1) and (2). checking the projections of those FD’s in – Street-city-zip is an example of this point. the decomposed relations. Multivalued Dependencies Example • The multivalued dependency X →→ Y • Drinkers(name, addr, phones, beersLiked) holds in a relation R if whenever we have • MVD name →→ phones . two tuples of R that agree in all the • If Drinkers has the two tuples: attributes of X , then we can swap their Y name addr phones beersLiked components and get two new tuples that sue a p1 b1 sue a p2 b2 are also in R . it must also have the same tuples with phones components swapped: name addr phones beersLiked sue a p1 b2 sue a p2 b1 3

  4. MVD Rules Splitting Doesn’t Hold • Every FD is an MVD: if X → Y , then • Sometimes you need to have several attributes on the right of an MVD. swapping Y ’s between tuples that agree • For example: Drinkers(name, areaCode, phones, on X doesn’t create new tuples. beersLiked, beerManf) • Example, in Drinkers: name →→ addr . name areaCode phones beersLiked beerManf • Complementation : if X →→ Y , then Leo 773 555-1111 Bud A.B. Leo 773 555-1111 Honkers G.I. X →→ Z , where Z is all attributes not in X Leo 800 555-9999 Bud A.B. or Y . Leo 800 555-9999 Honkers G.I. • Example: since name →→ phones • name →→ areaCode phones holds, but neither holds in Drinkers , so does name →→ areaCode nor name →→ phones do. name →→ addr beersLiked . Fourth Normal Form (4NF) • Eliminate redundancy due to multiplicative effect of MVD's. • Roughly: treat MVD's as FD's for decomposition, but not for finding keys. • Formally: R is in Fourth Normal Form if whenever MVD X →→ Y is nontrivial ( Y is not a subset of X , and X ∪ Y is not all attributes), then X is a superkey. – Remember, X → Y implies X →→ Y , so 4NF is more stringent than BCNF. • Decompose R , using 4NF violation X →→ Y , into XY and X ∪ ( R—Y ). 4

Recommend


More recommend