SEM Feedback • Mostly positive feedback • Some issues: Normalisation • Some examples don’t map to real life scenarios • Large gap until the 4pm lecture • 9am labs are too early! Database Systems Michael Pound SEM Feedback This Lecture • More detail on SQL insertion/injection attacks • Normalisation • I’ll be showing you how to do (and defend against) • Data Redundancy SQL insertion attacks in the security lecture. • Functional Dependencies • More information on data storage • Normal Forms • I’ll cover data storage in the lecture on efficiency • First, Second and Third Normal Forms and storage • Further reading • Solutions to labs • The Manga Guide to Databases, Chapter 3 • Will appear on the website • Dim the lights a little! • Database Systems, Chapter 14 Redundancy and Normalisation Normalisation • Redundant data • Normalisation • Normalisation is a formal process for • Can be determined from • Aims to reduce data something that you will often do naturally other data in the redundancy • When you create your tables, they’ll often be database • Redundancy is expressed normalised already • Leads to various in terms of functional • E/R diagrams help produce normalised tables problems dependencies • INSERT Anomalies • Normal forms are • Despite being somewhat common sense, it’s • UPDATE Anomalies defined that don’t good to have a formal process we can use • DELETE Anomalies contain specific types of functional dependency 1
First Normal Form Normalisation to 1NF • In most definitions of • A relation is said to be • To convert any relation into 1NF, split any non- the relational model first normal form (1NF) atomic values • All data values should be if all data values are 1NF Module Dept Lecturer Text atomic atomic Unnormalised M1 D1 L1 T1 • This means that table Module Dept Lecturer Texts M1 D1 L1 T2 entries should be single M1 D1 L1 T1, T2 M2 D1 L1 T1 values, not sets or M2 D1 L1 T1, T3 M2 D1 L1 T3 composite objects M3 D2 L2 T4 M3 D1 L2 T4 • Simplifies queries and M4 D2 L3 T1, T5 M4 D2 L3 T1 data comparisons M5 D2 L4 T6 M4 D2 L3 T5 M5 D2 L4 T6 Problems with 1NF Functional Dependencies • Redundancy is often • A set of attributes, A, • INSERT Anomalies caused by a functional functionally determines 1NF • Can’t add a module with dependency another set, B, if Module Dept Lecturer Text no texts whenever two rows of • A functional dependency M1 D1 L1 T1 the relation have the • UPDATE Anomalies (FD) is a link between two M1 D1 L1 T2 same values for all the sets of attributes in a M2 D1 L1 T1 • To change the lecture for attributes in A, then they relation M2 D1 L1 T3 also have the same values M1, we will need to • We can normalise a for all the attributes in B. M3 D1 L2 T4 update two rows relation by removing • In this case, we can say M4 D2 L3 T1 • DELETE Anomalies undesirable FDs there exists a functional M4 D2 L3 T5 • If we remove M3, we will dependency between A M5 D2 L4 T6 and B (A B), remove L2 as well Example Properties of FDs • Three notable functional dependencies exist in this • In any relation • Rules for FDs relation: • The primary key • Reflexivity: If B is a • {ID} {First, Last} functionally determines subset of A then • {moduleCode} {moduleName} any set of attributes in A B • {ID, moduleCode} {First, Last, moduleName} that relation • Augmentation: If A B K X then • K is the primary key, X is a A C B C set of attributes ID First Last moduleCode moduleName • Transitivity: If A B and • Same for candidate keys 111 Joe Smith G51PRG Programming B C then • Any set of attributes is A C 222 Anne Jones G51DBS Databases FD on itself X X 2
FDs and Normalisation FD Example • We define a set of • Not all FDs cause a • The Primary Key is 1NF {Module, Text} so 'normal forms' problem Module Dept Lecturer Text • {Module, Text} {Dept, • Each normal form • We identify various M1 D1 L1 T1 Lecturer} M1 D1 L1 T2 has fewer FDs than sorts of FD that do • 'Trivial' FDs, eg: M2 D1 L1 T1 the last • Each normal form M2 D1 L1 T3 • {Text, Dept} {Text} • Since FDs represent removes a type of FD M3 D1 L2 T4 • {Module} {Module} redundancy, each that causes problems M4 D2 L3 T1 • {Dept, Lecturer} { } M4 D2 L3 T5 normal form has less M5 D2 L4 T6 redundancy than the last FD Example FD Diagrams • Other FDs are • Rather than an entire table, FDs can be 1NF • {Module} {Lecturer} represented simply using the headings: Module Dept Lecturer Text • {Module} {Dept} M1 D1 L1 T1 • {Lecturer} {Dept} M1 D1 L1 T2 M2 D1 L1 T1 • These are non-trivial and Module Dept Lecturer Text M2 D1 L1 T3 the determinants (left M3 D1 L2 T4 hand side of the M4 D2 L3 T1 dependency) are not • {Module , Text} is a candidate key, so we put a double box around them • {Lecturer} {Dept}, so we have an arrow from Lecturer to Dept M4 D2 L3 T5 candidate keys. • {Module} {Dept} and {Module} {Lecturer} , so we have M5 D2 L4 T6 {Module} {Dept, Lecturer} Note: Trivial FDs and FDs dependent on an entire candidate key are not included Second Normal Form Normalising to 2NF 1NF • Partial FDs: • Second normal form: • ‘1NF’ is not in 2NF Module Dept Lecturer Text • A FD, A B is a partial • A relation is in second • We have the FD M1 D1 L1 T1 FD, if some attribute of A normal form (2NF) if it is {Module, Text} M1 D1 L1 T2 can be removed and the in 1NF and no non-key M2 D1 L1 T1 {Lecturer, Dept} FD still holds attribute is partially M2 D1 L1 T3 • But also • Formally, there is some dependent on a M3 D1 L2 T4 {Module} {Lecturer, Dept} candidate key proper subset of A , M4 D2 L3 T1 • And so Lecturer and • In other words, no C B C A , such that C B M4 D2 L3 T5 Dept are partially where C is a strict subset M5 D2 L4 T6 dependent on the of a candidate key and B primary key is a non-key attribute. Module Dept Lecturer Text 3
Normalising to 2NF Normalising to 2NF • We need to remove FD A B in order to • Suppose we have a relation • It turns out that we can split R with scheme S and the FD R into two parts: convert the relation to 2NF A B where • R1, with scheme A C A B = { } • R2, with scheme A B • The original relation can be • Let C = S - (A B) recovered as the natural Module Dept Lecturer Text • In other words: join of R1 and R2: • A – attributes on the left • R = R1 NATURAL JOIN R2 hand side of the FD • B – attributes on the right hand side of the FD • C – all other attributes Normalising to 2NF Normalising to 2NF • We need to remove FD A B in order to • We need to remove FD A B in order to convert the relation to 2NF convert the relation to 2NF Module Dept Lecturer Text Module Dept Lecturer Text A B A – The determinant of the functional dependency B – The dependant attributes of the functional dependency Normalising to 2NF Normalising to 2NF • We need to remove FD A B in order to • To convert to 2NF, create two relations A B and A C convert the relation to 2NF Module Dept Lecturer Text Module Dept Lecturer Text C A B C C – All remaining attributes in the relation 4
Normalising to 2NF Normalising to 2NF 2NFb Module Text 1NF 2NFa M1 T1 Module Dept Lecturer Module Dept Lecturer Text M1 T2 M1 D1 L1 M2 T1 M2 D1 L1 M2 T3 M3 D1 L2 M3 T4 M4 D2 L3 M4 T1 M5 D2 L4 2NF A B 2NF A C M4 T5 M5 T6 Module Dept Lecturer Module Text Module Dept Lecturer Module Text Problems Resolved in 2NF Problems Remaining in 2NF • INSERT Anomalies • INSERT Anomalies • We can’t add lecturers • We can now add a who don’t currently 2NFa 2NFa module without texts teach modules Module Dept Lecturer Module Dept Lecturer • UPDATE Anomalies • UPDATE Anomalies M1 D1 L1 M1 D1 L1 • We only need to change M2 D1 L1 • To change the M2 D1 L1 a single row when M3 D1 L2 department for L1, we M3 D1 L2 changing a module must change two rows M4 D2 L3 M4 D2 L3 lecturer • DELETE Anomalies M5 D2 L4 M5 D2 L4 • To delete module M3, we must delete L2 Transitive FDs and 3NF Normalising to 3NF • To move a relationship from • Transitive FDs: • Third normal form 2NFa 2NF to 3NF: • A FD, A C is a • A relation is in third Module Dept Lecturer • Given the transitive FD A transitive FD, if there is normal form (3NF) if it is B C some set B such that in 2NF and no non-key • We split the relation into A B and B C are attribute is transitively two new relations • 2NFa is not in 3NF • The first contains all of the dependent on a non-trivial FDs • There are FDs columns contained in B and candidate key {Module} {Lecturer} • A B non-trivial means: C {Lecturer} {Dept} B is not a subset of A • The second contains all of • So there is a transitive FD the columns which are not • Essentially from Primary key {Module} to contained in A, B or C and A B C {Dept} the columns contained in A and B 5
Recommend
More recommend