Normalization Cs386 - Introduction to Database Systems Jay Urbain, PhD Credits: Data Management, Richard Watson Database Management Systems, Raghu Ramkrishnan, Johannes Gehrke
Normalization n An alternative/adjunct database design tool to data modeling. n Schema refinement. n A theoretical foundation for the relational model. n Application of a series of rules that gradually improve the design.
Functional dependency n A relationship between the attributes of an entity where: ¨ One or more attributes determine the value of another attribute n An identifier can functionally determine all of the attributes of an entity ¨ stock code -> firm name, stock price, stock quantity, stock dividend ¨ If we know stock code, we know the value of firm name, etc. n Multi-valued dependency ¨ Formula n (stock dividend, stock price) - > yield
Full functional dependency n Yield is fully functionally dependent on stock dividend and stock price because both of these attributes are required to determine the value of yield ¨ (stock dividend, stock price) - > yield n Determinant ¨ An attribute that fully functionally determines another attribute n e.g., stock code determines stock PE
Multidetermination n A given value can determine multiple values ¨ A multidetermines B ¨ A - > B ¨ E.g., Department multidetermines course
Attribute relationships n One-to-one ¨ A value of an attribute determines the value of another attribute and vice versa ¨ A - > B and B - > A ¨ e.g.: n CH - > Switzerland n Switzerland - > CH
Attribute relationships n One-to-many ¨ A value of one attribute determines the value of another attribute but not vice versa ¨ A - > B ¨ e.g.: n country name - > currency unit n currency unit NOT - > country name
Attribute relationships n Many-to-many ¨ Neither attribute determines the other ¨ A not - > B ¨ B not - > A n country name not - > language n language not - > country name ¨ French and Flemish is spoken in Belgium ¨ French is spoken in many countries
Normal forms n A classification of relations n Innermost is first normal form ¨ I.e. each normal form is a prerequisite for the next higher normal form.
First normal form (1NF) n Only scalar attributes ¨ All rows must have the same number of columns ¨ Single valued attributes only ¨ "A relation is in first normal form if it has the property that none of its domains has elements which are themselves sets.” n E. F. Codd (Oct 1972). "Further normalization of the database relational model”.
Second normal form (2NF) n All columns are fully functionally dependent on the primary key ¨ Violated when a nonkey column is a fact about part of the primary key ¨ customer-credit in this case is not dependent on itemno order itemno customerid quantity customer-credit 12 57 25 OK 34 679 3 POOR
Third normal form (3NF) n No Transitive dependencies ¨ Violated when a nonkey column is a fact about another nonkey column ¨ A column is not fully functionally dependent on the primary key ¨ exchange rate in this case is a fact about nation stock stock code nation exchange rate MG USA 0.67 IR AUS 0.46
Boyce-Codd normal form (BCNF) n Violation of BCNF arises when a table ¨ has multiple candidate keys ¨ the candidate keys are composite ¨ the candidate keys overlap advisor client probtype consultant Alpha Marketing Gomez Alpha Production Raginiski
Fourth normal form (4NF) n A row should not contain two or more multivalued independent facts student studentid sport subject … 50 Football English … 50 Football Music … 50 Tennis Botany … 50 Karate Botany …
Fifth normal form (5NF) n 5NF if decomposition would not remove any redundancies. n Not 5NF if a table can be reconstructed from other tables ¨ There exists some rule that enables a relation to be inferred n Base case ¨ Consultants provide skills to one or more firms, and firms can use many consultants; a consultant has many skills and a skill can be used by many firms; and a firm can have a need for many skills and the same skill can be required by many firms advisor client probtype consultant Alpha Marketing Gomez Alpha Production Raginiski
Fifth normal form (5NF) n Designed to reduce redundancy in databases recording multi-valued facts by isolating semantically related multiple relationships. n The rule - If a consultant has a certain skill (e.g., database) and has a contract with the firm that requires that skill (e.g., IBM), then the consultant advises the firm on that skill (i.e., he advises IBM on database).
Domain key/ normal form (DK/NF) n Every constraint on the table is a logical consequence of the table's domain constraints and key constraints . n Key: unique identifier n Constraint: rule governing attribute values n Domain: set of values of the same data type
Referential integrity n Property of data which, when satisfied, requires every value of one attribute of a relation to exist as a value of another attribute in a different (or the same) relation (table). n Foreign Key constraint
Data modeling and normalization n Trick: look for invalid relationships: ¨ Expressed as 1:1 or single entity, when relationship is 1:m ¨ 1:m when relationship is m:m n Data modeling is often an easier path to good database design n A high-fidelity data model will be of high normal form n 5NF is likely to create the most problems ¨ Check for special rules
Key points n Normalization is one approach to data modeling. n Based on eliminating redundancies. ¨ Note: eliminating redundancies can increase the number of table joins which can reduce performance. n Use normalization to complement relational modeling . n A data model can have multiple representations. n Learning to model is difficult. n Learning to represent a model is relatively easy.
Recommend
More recommend