Lecture 2 Database Management Limitations of Relational Database Designs Systems • Provides a set of guidelines, does not result in a unique database schema Winter 2004 • Does not provide a way of evaluating alternative schemas CMPUT 391: Database Design Theory • Pitfalls: or Relational Normalization Theory – Repetition of information Dr. Osmar R. Zaïane – Inability to represent certain information – Loss of information � Normalization theory provides a mechanism for Chapter 8 University of Alberta analyzing and refining the schema produced by an of Textbook E-R design Based on slides by Lewis, Bernstein and Kifer. 1 2 Dr. Osmar Zaïane, 2004 CMPUT 391 – Database Management Systems University of Alberta Dr. Osmar Zaïane, 2004 CMPUT 391 – Database Management Systems University of Alberta Redundancy and Other Problems Redundancy • Set valued attributes in the E-R diagram result in • Dependencies between attributes cause multiple rows in corresponding table redundancy • Example: Person Person ( SSN, Name, Address, Hobbies ) – Ex. All addresses in the same town have the same zip code – A person entity with multiple hobbies yields multiple rows in table Person Person • Hence, the association between Name and Address for the SSN Name Town Zip same person is stored redundantly Redundancy 1234 Joe Stony Brook 11790 – SSN is key of entity set, but ( SSN, Hobby ) is key of 4321 Mary Stony Brook 11790 corresponding relation 5454 Tom Stony Brook 11790 …………………. • The relation Person Person can’t describe people without hobbies 3 4 Dr. Osmar Zaïane, 2004 CMPUT 391 – Database Management Systems University of Alberta Dr. Osmar Zaïane, 2004 CMPUT 391 – Database Management Systems University of Alberta
Anomalies Example • Redundancy leads to anomalies: ER Model – Update anomaly : A change in Address must be SSN Name Address Hobby 1111 Joe 123 Main {biking, hiking} made in several places Hobbies Person – Deletion anomaly : Suppose a person gives up all hobbies. Do we: SIN Name Address • Set Hobby attribute to null? No, since Hobby is part of key Relational Model • Delete the entire row? No, since we lose other SSN Name Address Hobby information in the row 1111 Joe 123 Main biking – Insertion anomaly : Hobby value must be 1111 Joe 123 Main hiking supplied for any inserted row since Hobby is ……………. part of key Redundancy 5 6 Dr. Osmar Zaïane, 2004 CMPUT 391 – Database Management Systems University of Alberta Dr. Osmar Zaïane, 2004 CMPUT 391 – Database Management Systems University of Alberta Decomposition Normalization Theory • Solution : use two relations to store Person Person information • Result of E-R analysis need further – Person1 Person1 ( SSN, Name, Address ) – refinement – Hobbies Hobbies ( SSN, Hobby ) – • Appropriate decomposition can solve • The decomposition is more general: people problems with hobbies can now be described • The underlying theory is referred to as • No update anomalies: normalization theory and is based on normalization theory – Name and address stored once functional dependencies (and other kinds, functional dependencies – A hobby can be separately supplied or like multivalued multivalued dependencies dependencies ) deleted 7 8 Dr. Osmar Zaïane, 2004 CMPUT 391 – Database Management Systems University of Alberta Dr. Osmar Zaïane, 2004 CMPUT 391 – Database Management Systems University of Alberta
Functional Dependencies Example • Definition: A functional dependency functional dependency (FD) on a relation schema R is a constraint X → Y , where X and Y are subsets ER Model name of attributes of R. ssn lot • Definition : An FD X → Y is satisfied satisfied in an instance r of hours_worked R if for every pair of tuples, t and s: if t and s agree on all rating Hourly Emps attributes in X then they must agree on all attributes in Y hourly_wages • Definition : A constraint on a relation schema R is a condition that has to be satisfied in every allowable Relational Model instance of R. • Hourly_Emps ( ssn, name, lot, rating, hrly_wages , hrs_worked ) � FDs must be identified based on semantics of application. • Some functional dependencies on Hourly_Emps: � Given a particular allowable instance r1 of R, we can check if it → violates some FD f , but we cannot tell if f holds over the schema R! – ssn is the key: S SNLRWH → � A key constraint is a special kind of functional dependency: all – rating determines hrly_wages : R W attributes of relation occur on the right-hand side of the FD: • Are there anomalies? • SSN → SSN, Name, Address 9 10 Dr. Osmar Zaïane, 2004 CMPUT 391 – Database Management Systems University of Alberta Dr. Osmar Zaïane, 2004 CMPUT 391 – Database Management Systems University of Alberta Functional Dependency - Example Functional Dependencies • Address → ZipCode • Brokerage firm allows multiple clients to share an – Stony Brook’s ZIP is 11733 account, but each account is managed from a single • ArtistName → BirthYear office and a client can have no more than one account in an office – Picasso was born in 1881 • Autobrand → Manufacturer , Engine type – HasAccount HasAccount ( AcctNum, ClientId, OfficeId ) – • keys are ( ClientId, OfficeId ), ( AcctNum, ClientId ) – Pontiac is built by General Motors with – ClientId, OfficeId → AcctNum gasoline engine – AcctNum → OfficeId • Author, Title → PublDate • Thus, attribute values need not depend only on key values – Shakespeare’s Hamlet published in 1600 11 12 Dr. Osmar Zaïane, 2004 CMPUT 391 – Database Management Systems University of Alberta Dr. Osmar Zaïane, 2004 CMPUT 391 – Database Management Systems University of Alberta
Entailment (cont’d) Entailment, Closure, Equivalence • Satisfaction, entailment, and equivalence are semantic concepts – defined in terms of the actual relations in the • Definition : If F is a set of FDs on schema R and f is “real world.” another FD on R , then F entails entails f if every instance r of – They define what these notions are , not how to compute them R that satisfies every FD in F also satisfies f • How to check if F entails f or if F and G are equivalent? – Ex : F = {A → B, B → C} and f is A → C – Apply the respective definitions for all possible relations? • If Streetaddr → Town and Town → Zip then Streetaddr → Zip • Bad idea : might be infinite in number for infinite domains closure of F , denoted F + , is the set of • Definition : The closure • Even for finite domains, we have to look at relations of all arities all FDs entailed by F – Solution : find algorithmic, syntactic ways to compute these • Definition : F and G are equivalent equivalent if F entails G and G notions entails F • Important : The syntactic solution must be “correct” with respect to the semantic definitions • Correctness has two aspects: soundness soundness and completeness completeness 13 14 Dr. Osmar Zaïane, 2004 CMPUT 391 – Database Management Systems University of Alberta Dr. Osmar Zaïane, 2004 CMPUT 391 – Database Management Systems University of Alberta Armstrong’s Axioms for FDs Armstrong’s Axioms for FDs (cont.) • This is the syntactic way of computing/testing • Two more rules (which can be derived from the the various properties of FDs axioms) can be useful: • Reflexivity : If Y ⊆ X then X → Y (trivial FD) – Union : If X → Y and X → Z then X → YZ – Name, Address → Name • Augmentation : If X → Y then X Z → YZ – Decomposition : If X → YZ then X → Y and – If Town → Zip then Town, Name → Zip, Name X → Z • Transitivity : If X → Y and Y → Z then X → Z 15 16 Dr. Osmar Zaïane, 2004 CMPUT 391 – Database Management Systems University of Alberta Dr. Osmar Zaïane, 2004 CMPUT 391 – Database Management Systems University of Alberta
Recommend
More recommend