bcnf revisited 40 years normal forms
play

BCNF revisited: 40 Years Normal Forms Part III : BCNF for SQL After - PowerPoint PPT Presentation

Full set of slides A. Skrobov BCNF revisited: 40 Years Normal Forms Part III : BCNF for SQL After F. Ferrarotti, S. Hartmann, H. K ohler, S. Link, M. Vincent , The Boyce-Codd-Heath Normal Form for SQL , to appear in 18th Workshop on Logic,


  1. Full set of slides A. Skrobov BCNF revisited: 40 Years Normal Forms Part III : BCNF for SQL After F. Ferrarotti, S. Hartmann, H. K¨ ohler, S. Link, M. Vincent , The Boyce-Codd-Heath Normal Form for SQL , to appear in 18th Workshop on Logic, Language, Information and Computation (WoLLIC), 2011 Presentation prepared by Artyom Skrobov (329000723), tyomitch@cs.technion.ac.il Faculty of Computer Science Technion - IIT, Haifa 2011 1

  2. Full set of slides A. Skrobov Overview Part I • Normal forms and functional dependencies • BCNF and redundancy • BCNF and update anomalies Part II • BCNF and storage saving • Achieving BCNF • Other normal forms Part III • BCNF for SQL 2

  3. Full set of slides A. Skrobov Part III: BCNF for SQL SQL supports data model that diverges from the conventional relational model. • Bag semantics : duplicate rows allowed in a table. • Incomplete data : NULL values are allowed, and treated specially. Nevertheless, syntactic tools are needed to reason about dependencies and their inference in SQL data model. Such tools can be applied to reason about non-redundancy of the data, and thus to improve the DBMS performance, while maintaining data integrity. 3

  4. BCNF for SQL A. Skrobov Real-world databases To be practical, the relational model needs adaptation. • Bag (multiset) semantics : removing duplicate tuples from the relation after each operation is prohibitively expensive; thus, deduplication is only performed when requested explicitly. • NULL values : alleviate insertion and deletion anomalies in “kind-of”- denormalized schemata. Title Rating Director Apocalypse Now 8.6 Coppola Example: Blade Runner 8.3 Scott The Birds NULL Hitchcock [Title,Rating,Director] with { Title → Rating,Director } is in BCNF, but in the relational model we can’t represent a movie title without a rating; we can’t add a new title to the database until we know both its rating and the director; and we can’t delete an expired rating from the database without deleting the movie title as well. 4

  5. BCNF for SQL A. Skrobov Real-world databases The relational solution is to use two relations having a common key: Title Rating Apocalypse Now 8.6 Blade Runner 8.3 Title Director Apocalypse Now Coppola Blade Runner Scott The Birds Hitchcock Obviously it’s a waste of space if most rows are total (include all the data), since the decomposed schema stores each key multiple times. 5

  6. BCNF for SQL A. Skrobov Bag schemata • bag schema is a finite, non-empty set B of attributes . Each attribute A is associated with a domain of possible values, denoted dom ( A ) • bag tuple is a function t : B → ∪ A ∈ B dom ( A ) such that ∀ A ∈ B : t ( A ) ∈ dom ( A ) • bag (instance) is a finite multiset b of tuples. Title Rating Director Apocalypse Now 8.6 Coppola Example: Blade Runner 8.3 Scott Blade Runner 8.3 Scott Note that though this bag has duplicate rows, it still satisfies the classical definition of functional dependency Title → Rating,Director: for each movie title, there is only a single rating and only a single director. 6

  7. Bag schemata A. Skrobov FD’s vs. keys in bags • The classical definition of functional dependency (FD) still applies: for X, Y ⊆ B , a bag b is said to satisfy the FD X → Y iff ∀ t 1 , t 2 ∈ b : t 1 [ X ] = t 2 [ X ] → t 1 [ Y ] = t 2 [ Y ]; that is, if tuples having the same value for X must also have the same value for Y . • In a relation R , a set of attributes X ⊆ R was called a (super)key if X → R ; that is, if no two distinct tuples can have the same value for X . This definition needs generalization for bags. A bag b is said to satisfy the key constraint (a.k.a. unique constraint ) key ( X ) iff ∀ t 1 , t 2 ∈ b : t 1 [ X ] = t 2 [ X ] → t 1 = t 2 . Obviously, key ( X ) | = X → B ; but the reverse doesn’t hold: in the pre- vious example, | = b Title → B but �| = b key ( Title ) because of the duplicate rows. A bag having duplicate rows violates every key. 7

  8. Bag schemata A. Skrobov Axiomatization for total bags The following set of inference rules is sound and complete for bags: key ( X ) • (demotion) X → Y key ( Y ) , X → Y • (pullback) key ( X ) – These two rules specify the equivalence between key ( X ) and X → B . key ( X ) The superkey rule key ( XY ) is a special case of the pullback rule. • XY → X (reflexivity) X → Y • XZ → Y Z (augmentation) X → Y, Y → Z • (transitivity) X → Z – These three rules are unchanged from the axiomatization for FD’s on relations. Dubious: what is cited in place of the classic augmentation doesn’t look equivalent! 8

  9. Bag schemata A. Skrobov BCNF for total bags The semantics of BCNF remains unchanged: “no fact is stored twice”. The last example satisfies the relational definition of BCNF (for each non- trivial X → Y ∈ Σ ∗ , there must be X → R ∈ Σ ∗ ) but is blatantly redundant because of the duplicate row. For a tuple t ∈ b and an attribute A ∈ B , a replacement of t [ A ] is defined as a tuple t ′ such that t ′ [ B \ A ] = t [ B \ A ] , t ′ [ A ] � = t [ A ]. Given a set Σ of keys and FD’s, the data value t [ A ] is called redundant iff every replacement t ′ of t [ A ] results in a bag b ′ = ( b \ { t } ) ∪ { t ′ } that violates Σ. In the example, the name “Scott” in the last row is redundant, because any its replacement would violate Title → Director . BCNF is now defined for bags as: for each non-trivial X → Y ∈ Σ ∗ , there must be key ( X ) ∈ Σ ∗ (generalization of the original formulation, using the notion of key constraints) 9

  10. Bag schemata A. Skrobov BCNF for total bags An equivalent definition of BCNF: for each non-trivial X → Y ∈ Σ, there must be X → B ∈ Σ ∗ , and additionally, if Σ has non-trivial FD’s, then it must also have a key. Intuition: for BCNF, the schema must have a key, and therefore, must be a relation. Theorem: B , Σ is in BCNF iff no bag satisfying Σ has redundant data values. The implication of FD’s and keys on (total) bags is decidable in linear time; therefore, a schema can be checked for BCNF compliance in O ( | Σ FD ||×� Σ � ), where Σ FD is the set of functional dependencies defined over the schema. 10

  11. Partial schemata A. Skrobov Incomplete data • Incomplete data is represented as partial relations : some of the attributes in a tuple may not have a value. – E.g., in SQL, some columns are defined as nullable , and the others are constrained as NOT NULL. • A tuple that has values assigned to attributes X ⊆ R is called an X-total tuple , and a relation R consisting only of R -total tuples is a total relation . • A set X of non-nullable attributes is called a null-free subschema (NFS); the NFS is satisfied by a relation if all its tuples are X -total. NFS of a relation R is normally denoted as R s . 11

  12. Partial schemata A. Skrobov Incomplete data NULL value is included in every domain, and it’s special with regard to dealing ? with the equality NULL =NULL. It may be interpreted as “missing value” (not participating in any constraints) or as “existing but unknown value” (there must exist a matching total relation that satisfies the constraints). The prohibition of duplicate tuples is generalized for partial relations: a rela- tion R must not have tuples t 1 , t 2 that subsume one another, i.e., ∀ A ∈ R : ( t 1 [ A ] = t 2 [ A ]) ∨ ( t 2 [ A ] = NULL ) Partial bags are defined identically, but without the restriction of non-subsumption. 12

  13. Partial schemata A. Skrobov NULL consistency In the “missing value” interpretation, only X -total tuples are considered in joins R ⊲ ⊳ X = R ∩ S S , dependencies X → Y , X ։ Y , and constraints key ( X ). r Article Supplier Location Cost Example: Kiwi G6Kiwi NULL 1.50 t 1 Kiwi G6Kiwi NULL 2.50 t 2 The set of dependencies Σ = { Article → Supplier ; Article, Location → Cost ; Supplier ։ Location } would in a null-free schema imply Article → Cost and Article ։ Location . One can see that | = r Σ; specifically, | = r → and Article Supplier = r Supplier → Location in the normal sense, and | = r Location → Cost because | there are no Location -total tuples in r . Nevertheless, �| = r Article → Cost , indi- cating that the null-free inference rules aren’t sound for the partial schema. As for the “unknown value” interpretation, note that Article, Location → Cost requires t 1 [ Location ] � = t 2 [ Location ], while at the same time, Supplier ։ Location requires t 1 [ Location ] = t 2 [ Location ]. Therefore, �| = r Σ. 13

  14. Partial schemata A. Skrobov Logic of paradox • NULL-consistency substitutes the classical propositional logic with logic of paradox (LP) based on three boolean values: T (true), F (false), and P (paradoxical) – The latter is represented in SQL as boolean NULL – Equality of NULL value to any data value is P • A logic formula is valid if it evaluates to T or P . Then, it’s possible to map FD’s, MVD’s, and key constraints to logic for- mulae, and partial bags – to LP interpretations, so that the constructed interpretation violates the formulae iff the original bag violates the dependen- cies; in other words, LP is sufficient to reason about dependency implications. NFS is accounted for by restricting the interpretation to not assign P to the variables corresponding to non-nullable attributes. 14

Recommend


More recommend