Database Management Course Content Systems • Introduction • Database Design Theory • Query Processing and Optimisation Winter 2003 • Concurrency Control • Data Base Recovery and Security CMPUT 391: Database Design Theory • Object-Oriented Databases • Inverted Index for IR Dr. Osmar R. Zaïane • Spatial Data Management • XML • Data Warehousing • Data Mining University of Alberta Chapter 19 • Parallel and Distributed Databases of Textbook Dr. Osmar R. Zaïane, 2001-2003 1 Dr. Osmar R. Zaïane, 2001-2003 2 2 CMPUT 391 – Database Management Systems University of Alberta CMPUT 391 – Database Management Systems University of Alberta Database Design Theory Objectives of Lecture 2 Database Design Theory • Database Design Process • Understand some limitations of Entity • Redundancy Anomalies Relationship Model • Functional Dependencies • Introduce Functional Dependencies in Relational Database Design • Armstrong Axioms and Derived Rules • Introduce Decomposition and • Normal Forms Normalization • Decomposition of Relations Dr. Osmar R. Zaïane, 2001-2003 Dr. Osmar R. Zaïane, 2001-2003 CMPUT 391 – Database Management Systems University of Alberta 3 CMPUT 391 – Database Management Systems University of Alberta 4
Database Design Process Choices of DBMS Manufacturing, Hospital, Bank, University, etc. Real World • Technical Factors Requirements – data model, user interfaces Analysis Functional Database Requirements Requirements – programming languages, E-R Modeling – application development tools Functional Analysis Conceptual Design – storage structures, access methods Choice of a DBMS Access Conceptual • Economic Factors Specifications Schema – software, hardware, database Data Model Mapping Logical Design – acquisition, maintenance – personnel, training, operation Logical Application Pgm Design Schema • Political Factors Physical Design Dr. Osmar R. Zaïane, 2001-2003 5 Dr. Osmar R. Zaïane, 2001-2003 6 CMPUT 391 – Database Management Systems University of Alberta CMPUT 391 – Database Management Systems University of Alberta Logical Database Design Physical Database Design • Purpose • System independent phase – to specify the appropriate file structures and indexes • Criteria – obtain a desirable database scheme in the – efficiency database model of the chosen database • Approach management system – analyzing the database queries and transactions, including expected frequency • System dependent phase – specifying the general user requirements – adjust the database scheme obtained in the • Guideline previous phase to conform to the chose – speeding natural join operations database management system – separate read-only and update transactions – index files for search and hashing for random access – DDL statements – focus on attributes used most frequently Dr. Osmar R. Zaïane, 2001-2003 Dr. Osmar R. Zaïane, 2001-2003 CMPUT 391 – Database Management Systems University of Alberta 7 CMPUT 391 – Database Management Systems University of Alberta 8
Implementation Bad Database Design • Pitfalls in Relational Database Design – Repetition of information • Coding – Inability to represent certain information – DDL (Data Definition Language) for database scheme – Loss of information – SDL (Storage Definition Language) for physical scheme Consider the following relation schemes: – develop application programs Branch = (branch-name, assets, branch-city) • Testing Borrow = (branch-name, loan-number, customer-name, amount) Deposit = (branch-name, account-number, customer-name, amount) • Operation and Maintenance Dr. Osmar R. Zaïane, 2001-2003 9 Dr. Osmar R. Zaïane, 2001-2003 10 CMPUT 391 – Database Management Systems University of Alberta CMPUT 391 – Database Management Systems University of Alberta Repetition of Information Repetition of Information Consider an alternative design with the single scheme below Consider an alternative design Lending = (branch-name, assets, branch-city, loan-number, customer-name, amount) Branch-Cust = (branch-name, assets, branch-city, customer-name) Cust-Loan = (customer-name, loan-number, amount) customer loan amount branch-name assets branch-city loan-number customer-name amount branch-name assets branch-city customer-name Jones 17 1000 Downtown 9000 Edmonton 17 Jones 1000 Downtown 9000 Edmonton Jones Smith 93 2000 Downtown 9000 Edmonton 93 Smith 2000 Downtown 9000 Edmonton Smith Hays 93 2900 Downtown 9000 Edmonton 93 Hays 2900 Downtown 9000 Edmonton Hays Jackson 23 1200 Redwood 21000 Edmonton 23 Jackson 1200 Redwood 21000 Edmonton Jackson Smith 23 2000 Redwood 21000 Edmonton 23 Smith 2000 Redwood 21000 Edmonton Smith Hays 19 2900 SUB 17000 Edmonton 19 Hays 2900 SUB 17000 Edmonton Hays Turner 19 500 SUB 17000 Edmonton 19 Turner 500 SUB 17000 Edmonton Turner Brooks 19 2200 SUB 17000 Edmonton 19 Brooks 2200 SUB 17000 Edmonton Brooks What if a customer wishes to open an account but not a loan ? What will happen if we do a join ? Dr. Osmar R. Zaïane, 2001-2003 Dr. Osmar R. Zaïane, 2001-2003 CMPUT 391 – Database Management Systems University of Alberta 11 CMPUT 391 – Database Management Systems University of Alberta 12
The Evils of Redundancy Database Design Theory • Redundancy is at the root of several problems associated with relational schemas: • Database Design Process – redundant storage, insert/delete/update anomalies • Integrity constraints, in particular functional • Redundancy Anomalies dependencies , can be used to identify schemas with • Functional Dependencies such problems and to suggest refinements. • Main refinement technique: decomposition • Armstrong Axioms and Derived Rules (replacing ABCD with, say, AB and BCD, or ACD • Normal Forms and ABD). • Decomposition of Relations • Decomposition should be used judiciously: – Is there reason to decompose a relation? – What problems (if any) does the decomposition cause? Dr. Osmar R. Zaïane, 2001-2003 13 Dr. Osmar R. Zaïane, 2001-2003 14 CMPUT 391 – Database Management Systems University of Alberta CMPUT 391 – Database Management Systems University of Alberta Functional Dependencies (FDs) Database Design Theory → • A functional dependency X Y holds over relation R if, for every allowable instance r of R: • Database Design Process ∈ π X π X π Y π Y ∈ ( t1 ) = ( t2 ) implies ( t1 ) = ( t2 ) – t1 r, t2 r, • Redundancy Anomalies – i.e., given two tuples in r , if the X values agree, then the Y values • Functional Dependencies must also agree. (X and Y are sets of attributes.) • An FD is a statement about all allowable relations. • Armstrong Axioms and Derived Rules – Must be identified based on semantics of application. • Normal Forms – Given some allowable instance r1 of R, we can check if it violates some FD f , but we cannot tell if f holds over R! • Decomposition of Relations → • K is a candidate key for R means that K R → – However, K R does not require K to be minimal ! Dr. Osmar R. Zaïane, 2001-2003 Dr. Osmar R. Zaïane, 2001-2003 CMPUT 391 – Database Management Systems University of Alberta 15 CMPUT 391 – Database Management Systems University of Alberta 16
Recommend
More recommend