What and Why Normalization? � To remove potential redundancy in design Normalization � Redundancy causes several anomalies : insert, delete and update � Normalization uses concept of dependencies � Functional Dependencies � Idea used: Decomposition � Break R (A, B, C, D) into R1 (A, B) and R2 (B, C, D) � Use decomposition judiciously. Murali Mani Murali Mani Delete Anomaly Insert Anomaly Student sNumber sName pNumber pName s1 Dave p1 MM s2 Greg p2 ER Note: We cannot delete a student that is the only student of a professor. Student sNumber sName pNumber pName Delete Anomaly: We are not able to perform a delete without losing some “valid” information. s1 Dave p1 MM s2 Greg p2 ER Note: In both cases, minimum cardinality of Professor in the corresponding ER schema is 0 Note: We cannot insert a professor who has no students. Insert Anomaly: We are not able to insert “valid” value/(s) Murali Mani Murali Mani Update Anomaly Keys : Revisited Student sNumber sName pNumber pName s1 Dave p1 MM � A key for a relation R (a1, a2, …, an) is a set s2 Greg p1 MM of attributes, K that uniquely determine the Note: To update the name of a professor, we have to update in multiple tuples. values for all attributes of R. Update anomalies are due to redundancy. Update Anomaly: To update a value, we have to update multiple rows. � A key K is minimal: no proper subset of K is a key. Note the maximum cardinality of Professor in the corresponding ER schema is * � A superkey need not be minimal sNumber pNumber � Prime Attribute : An attribute of a key (1,1) Has (0,*) Student Professor Advisor sName pName years Murali Mani Murali Mani 1
Functional Dependencies (FDs) Keys: Example Student Student sNumber sName address 1 Dave 144FL sNumber sName address 2 Greg 320FL 1 Dave 144FL 2 Greg 320FL Suppose we have the FD sName → address • for any two rows in the Student relation with the same value for sName, the Primary Key: <sNumber> value for address must be the same Candidate key: <sName> Some superkeys: {<sNumber, address>, • i.e., there is a function from sName to address <sName>, <sNumber>, Note: <sNumber, sName>, • We will assume no null values. <sNumber, sName, address>} • Any key (primary or candidate) or superkey of a relation R functionally Prime Attribute: {sNumber, sName} determines all attributes of R Murali Mani Murali Mani Properties of FDs Inferring FDs � Consider A, B, C, Z are sets of attributes � Why? � Suppose we have a relation R (A, B, C) and we have functional dependencies A → B, B → C, C � Reflexive (also called trivial FD): if A ⊇ B, then → A A → B � what is a key for R? � Transitive : if A → B, and B → C, then A → C � Should we split R into multiple relations? � Augmentation : if A → B, then AZ → BZ � We can infer A → ABC, B → ABC, C → ABC. � Union : if A → B, A → C, then A → BC Hence A, B, C are all keys. � Decomposition : if A → BC, then A → B, A → C Murali Mani Murali Mani Algorithm for inference of FDs Inferring FDs: Example 1 � Given R (A, B, C), and FDs A → B, B → C, C Computing the closure of set of attributes � → A, what are possible keys for R {A1, A2, …, An}, denoted {A1, A2, …, An} + Let X = {A1, A2, …, An} � Compute the closure of attributes: 1. If there exists a FD B1, B2, …, Bm → C, such � {A} + = {A, B, C} 2. that every Bi ∈ X, then X = X ∪ C � {B} + = {A, B, C} Repeat step 2 till no more attributes can be � {C} + = {A, B, C} 3. added. � So keys for R are <A>, <B>, <C> {A1, A2, …, An} + = X 4. Murali Mani Murali Mani 2
Decomposing Relations Inferring FDs: Example 2 StudentProf � Consider R (A, B, C, D, E) with FDs A → B, sNumber sName pNumber pName B → C, CD → E, does A → E? s1 Dave p1 MM s2 Greg p2 MM � Let us compute {A} + FDs: pNumber → pName � {A} + = {A, B, C} Student Professor � Therefore A → E is false sNumber sName pNumber pNumber pName s1 Dave p1 p1 MM s2 Greg p2 p2 MM Murali Mani Murali Mani Decomposition: Lossless Join Normalization Step Property � Consider relation R with set of attributes A R . Generating spurious tuples Consider a FD A → B (such that no other Student Professor attribute in (A R – A – B) is functionally sNumber sName pName pNumber pName determined by A). S1 Dave MM p1 MM � If A is not a superkey for R, we may S2 Greg MM p2 MM decompose R as: StudentProf � Create R’ (A R – B) sNumber sName pNumber pName � Create R’’ with attributes A ∪ B s1 Dave p1 MM � Key for R’’ = A s1 Dave p2 MM s2 Greg p1 MM s2 Greg p2 MM Murali Mani Murali Mani Normal Forms: BCNF BCNF example SCI (student, course, instructor) � Boyce Codd Normal Form (BCNF): For every non-trivial FD X → a in R, X is a superkey of FDs: student, course → instructor R. instructor → course Decomposition: SI (student, instructor) Instructor (instructor, course) Murali Mani Murali Mani 3
Dependency Preservation Normal Forms: 3NF We might want to ensure that all specified FDs are captured. BCNF does not necessarily preserve FDs. 2NF, 3NF preserve FDs. � Third Normal Form (3NF): For every non- Instructor trivial FD X → a in R, either a is a prime SI student instructor instructor course attribute or X is a superkey of R. Dave MM MM DB 1 Dave ER ER DB 1 SCI (from SI and Instructor) student instructor course SCI violates the FD Dave MM DB 1 student, course → instructor Dave ER DB 1 Murali Mani Murali Mani 3NF - example 3NF - example Lot (propNo, county, lotNum, area, price) Lot (propNo, county, lotNum, area, price, taxRate) County (county, taxRate) Candidate key: <county, lotNum> Candidate key for Lot: <county, lotNum> FDs: FDs: county → taxRate county → taxRate area → price area → price Decomposition: Decomposition: Lot (propNo, county, lotNum, area) Lot (propNo, county, lotNum, area, price) County (county, taxRate) County (county, taxRate) Area (area, price) Murali Mani Murali Mani Extreme Example Define Foreign Keys? Consider relation R (A, B, C, D) with primary key (A, B, C), and FDs B → D, and C → D. R violates 3NF. Decomposing it, we get 3 relations as: � Consider the normalization step: A relation R R1 (A, B, C), R2 (B, D), R3 (C, D) with set of attributes A R , and a FD A → B, Let us consider an instance where we need these 3 relations and how we do a and A is not a key for R, we decompose R natural join ⋈ as: R1 ⋈ R2: violates C → D R2 � Create R’ (A R – B) R1 A B C D B D � Create R’’ with attributes A ∪ B a1 b1 c1 d1 A B C b1 d1 � Key for R’’ = A a2 b2 c1 d2 a1 b1 c1 b2 d2 a3 b1 c2 d1 a2 b2 c1 � We can also define foreign key R3 a3 b1 c2 R1 ⋈ R2 ⋈ R3: no FD is violated � R’ (A) references R’’ (A) C D A B C D � Question: What is key for R’? c1 d1 a1 b1 c1 d1 c2 d2 Murali Mani Murali Mani 4
How does Normalization Help? Employee (ssn, name, lot, dept) Dept (did, dName) name FK: Employee (dept) did dName ssn lot REFERENCES Dept (did) Works Employee Dept (1, 1) For (0, *) Suppose: employees of a dept are in the same lot FD: dept → lot Decomposing: lot name Employee (ssn, name, dept) did dName ssn Dept (did, dName) DeptLot (dept, lot) Works Employee Dept (1, 1) For (0, *) (or) Employee (ssn, name, dept) Dept (did, dName, lot) Murali Mani 5
Recommend
More recommend