why is this important
play

Why Is This Important? Many ways to model a given scenario in a - PDF document

Why Is This Important? Many ways to model a given scenario in a database Schema Refinement and How do we find the best one? We will discuss objective criteria for evaluating Normal Forms database design quality Formally define


  1. Why Is This Important?  Many ways to model a given scenario in a database Schema Refinement and  How do we find the best one?  We will discuss objective criteria for evaluating Normal Forms database design quality  Formally define desired properties Chapter 19  Algorithms for determining if a database has these properties  Algorithms for fixing problems 1 2 The Evils of Redundancy Functional Dependencies (FDs)  A functional dependency X  Y holds over relation R  Redundancy is at the root of several problems associated with relational schemas: if, for every allowable instance r of R:  Redundant storage  t1  r, t2  r,  X (t1) =  X (t2) implies  Y (t1) =  Y (t2)  Insert, delete, update anomalies  I.e., given two tuples in r, if the X values agree, then the Y values must also agree. (X and Y are sets of attributes.)  Integrity constraints can be used to identify schemas  An FD is a statement about all allowable relations. with such problems and to suggest refinements.  Must be identified based on semantics of application.  Main refinement technique: decomposition  Given some allowable instance r1 of R, we can check if it  Replacing ABCD with, say, AB and BCD, or ACD and ABD. violates some FD f, but we cannot tell if f holds over R.  Decomposition should be used judiciously:  K is a candidate key for R means that K  R  Is there reason to decompose a relation?  However, K  R does not require K to be minimal.  What problems (if any) does the decomposition cause? 3 4 Wages R W 8 10 Example: Constraints on Entity Set Example (Contd.) 5 7 Hourly_Emps2 S N L R H Are the two smaller  Consider a relation obtained from Hourly_Emps: 123-22-3666 Attishoo 48 8 40 tables better?  Hourly_Emps (ssn, name, lot, rating, hrly_wages, 231-31-5368 Smiley 22 8 30  Problems in single “wide” hrs_worked) table due to R  W: 131-24-3650 Smethurst 35 5 30  Notation: We will denote this relation schema by  Update anomaly: Can we 434-26-3751 Guldu 35 5 32 listing the attributes: SNLRWH change W in just the first 612-67-4134 Madayan 35 8 40 tuple of SNLRWH?  This is really the set of attributes {S,N,L,R,W,H}.  Insertion anomaly: What S N L R W H if we want to insert an  Sometimes, we will refer to all attributes of a relation by employee and don’t know 123-22-3666 Attishoo 48 8 10 40 using the relation name. (e.g., Hourly_Emps for SNLRWH) the hourly wage for his rating? 231-31-5368 Smiley 22 8 10 30  Some FDs on Hourly_Emps:  Deletion anomaly: If we 131-24-3650 Smethurst 35 5 7 30 delete all employees with  ssn is the key: S  SNLRWH rating 5, we lose the 434-26-3751 Guldu 35 5 7 32  rating determines hrly_wages: R  W information about the wage for rating 5. 612-67-4134 Madayan 35 8 10 40 5 6

  2. Reasoning About FDs Reasoning About FDs (Contd.)  Given some FDs, we can infer additional FDs:  Additional rules (that follow from the AA):  ssn  did, did  lot implies ssn  lot  Union: If X  Y and X  Z, then X  YZ  Decomposition: If X  YZ, then X  Y and X  Z  An FD f is implied by a set of FDs F if f holds whenever all FDs in F hold.  Example: Contracts(cid, sid, jid, did, pid, qty, value) and:  F + = closure of F; is the set of all FDs that are implied by F.  C is the key: C  CSJDPQV  Project purchases each part using single contract: JP  C  Armstrong’s Axioms (X, Y, Z are sets of attributes):  Dept purchases at most one part from a supplier: SD  P  Reflexivity: If X  Y, then Y  X.  JP  C, C  CSJDPQV imply JP  CSJDPQV  Augmentation: If X  Y, then XZ  YZ for any Z.  Transitivity: If X  Y and Y  Z, then X  Z.  SD  P implies SDJ  JP  These are sound (generate only FDs in F + ) and complete  SDJ  JP, JP  CSJDPQV imply SDJ  CSJDPQV (generate all FDs in F + ) inference rules for FDs. 7 8 Reasoning About FDs (Contd.) So, What Do We Do Now With FDs?  Computing the closure of a set of FDs can be expensive.  Essential for identifying problems in a database  Size of closure is exponential in # attributes design  Typically, we just want to check if a given FD X  Y is in  Provide a way for “fixing” the problem the closure of a set of FDs F. An efficient algorithm:  Compute attribute closure of X (denoted X + ) wrt F:  Key concept: normal forms • Set of all attributes A such that X  A is in F+  A relation that is in a certain normal form has certain • There is a linear time algorithm to compute this.  Check if Y is in X + desirable properties  Does F = {A  B, B  C, CD  E} imply A  E?  I.e, is A  E in the closure F + ? Equivalently, is E in A+? 9 10 Normal Forms Boyce-Codd Normal Form (BCNF)  Reln R with FDs F is in BCNF if, for all X  A in F +  Returning to the issue of schema refinement, the first  A  X (called a trivial FD), or question to ask is whether any refinement is needed.  X is a superkey for R.  If a relation is in a certain normal form (BCNF, 3NF etc.),  In other words, R is in BCNF if the only non-trivial FDs it is known that certain kinds of problems are avoided or that hold over R are key constraints. minimized.  R is free of any redundancy caused by FDs alone.  Helps deciding whether decomposing the relation will help. • No field of any tuple can be inferred (using only FDs) from the values in the other fields in the relation instance  Role of FDs in detecting redundancy: • For X  A, consider two tuples with the same X Y A  Consider a relation R with three attributes, ABC. X value. x y1 a • No FDs hold: There is no redundancy here. • They should have the same A value. Redundancy? • Given A  B: Several tuples could have the same A value, and if so, • No. Since R is in BCNF, X is a superkey and hence x y2 ? they all have the same B value. the “two” tuples must be identical. 11 12

  3. Problems Prevented By BCNF Third Normal Form (3NF)  If BCNF is violated by (non-trivial) FD X  A, one of the  Reln R with FDs F is in 3NF if, for all X  A in F + following holds:  A  X (called a trivial FD), or  X is a subset of some key K.  X is a superkey for R, or • We store (X, A) pairs redundantly.  A is part of some key for R. • E.g., Reserves(S, B, D, C) with SBD as only key and FD S  C  Minimality of a key is crucial in third condition above. • Credit card number of a sailor stored for each reservation  If R is in BCNF, is it automatically in 3NF? What about the  X is not a proper subset of any key. other direction? • Redundant storage of (X, A) pairs as above • And there is a chain of FDs K  X  A, which means that we cannot  If R is in 3NF, some redundancy is possible. associate an X value with a K value unless we also associate an A value  3NF is a compromise, used when BCNF is not achievable (e.g., with an X value. no ``good’’ decomposition, or performance considerations). • E.g., Hourly_Emps(S, N, L, R, W, H) with S as only key and FD R  W  Lossless-join, dependency-preserving decomposition of R into a • Have chain S  R  W, hence cannot record the fact that employee S has rating R without knowing the hourly wage for that rating collection of 3NF relations is always possible. (covered soon) 13 14 What Does 3NF Achieve? Footnote About Other Normal Forms  Prevents same problems as BCNF, except for FDs where  1NF: every field contains only atomic values, i.e., no A is part of some key lists or sets  Consider FD X  A where X is no superkey, but A is part of some  2NF: 1NF, and all attributes that are not part of any key  E.g., Reserves(S, B, D, C) with only key SBD and FDs S  C and candidate key are functionally dependent on the C  S is in 3NF whole of every candidate key • Notice: same example as before, but adding C  S made it 3NF  3NF implies 2NF • Why? Since C  S and SBD is a key, CBD is also a key. Hence for S  C, C is part of a key  4NF: prevents redundancy from multi-valued • Redundancy problem: for each reservation of sailor S, same (S, C) pair is stored. dependencies (see book)  BCNF did not suffer from this redundancy problem.  5NF: addresses redundancy based on join  So, why do we need 3NF? Let’s look at decompositions dependencies, which generalize multi-valued first. dependencies (see book) 15 16 Example Decomposition Decomposition of a Relation Schema  Decompositions should be used only when needed.  Suppose relation R contains attributes A1,..., An. A  Let SNLRWH have FDs S  SNLRWH and R  W decomposition of R replaces R by two or more  Second FD causes violation of 3NF relations such that: • W values repeatedly associated with R values.  Each new relation schema contains a subset of the  Easiest fix: create a relation RW to store these associations attributes of R (and no attributes that do not appear in R), and remove W from the main schema: and • I.e., we decompose SNLRWH into SNLRH and RW  Every attribute of R appears as an attribute of at least one  Each SNLRWH tuple will now be projected into two of the new relations. tuples, SNLRH and RW, each stored in the  Intuition: decomposing R means we will store corresponding relation instances of the relation schemes produced by the  Are there any potential problems with this approach? decomposition, instead of instances of R. 17 18

Recommend


More recommend