Databases Relational algebra Lectures for mathematics students Zbigniew Jurkiewicz, Institute of Informatics UW March 5, 2017 Zbigniew Jurkiewicz, Institute of Informatics UW Databases Relational algebra Lectures for mathematics students
Relational algebra Theoretical model for describing the semantics of relational databases, proposed by T. Codd (who authored the concept of relational databases). Algebra over the domain of relations, i.e. variables occuring in its expressions represent relations. Operators are defined to parallel typical operations needed to search for information in database tables. Originally postulated also as a query language for relational databases, each expression being equivalent to some query. Zbigniew Jurkiewicz, Institute of Informatics UW Databases Relational algebra Lectures for mathematics students
Relations Each relation is represented by name. With each relation name we associate its schema — a sequence of attributes (corresponding to the columns of the table being modeled), e.g. R ( A , B , C ) Student(id, first-name, last-name) Attribute names in a relation schema must be different. Zbigniew Jurkiewicz, Institute of Informatics UW Databases Relational algebra Lectures for mathematics students
Basic operations Ordinary binary set-theoretic operations: the union of sets ( ∪ ), the intersection of sets ( ∩ ) and the difference of sets ( − ), but both arguments must have the same schema Cartesian product R × S As arguments could have attributes with the same names, the column names in result schema have sometimes to be prefixed with names of relations of origin Thus for relations R ( A , B , C ) i S ( C , D , E ) the schema of their cartesian product will be R × S ( A , B , R . C , S . C , D , E ) Better yet, we should use renaming. Selection σ condition ( R ) : we select only the tuples of relation R satisfying condition . Zbigniew Jurkiewicz, Institute of Informatics UW Databases Relational algebra Lectures for mathematics students
Basic operations Projection π attribute 1 ,..., attribute n ( R ) : we select only the indicated attributes from relation R . Renaming ρ S ( R ) : changes the relation name or the names of some of its attributes ρ R ( X , Y , Z ) R , sometimes both at once ρ S ( X , Y , ZX ) R . Join R ✶ θ S : similar to cartesian product, but joins only the pairs of tuples satisfying a given condition θ . We will look at different variants in a moment. Zbigniew Jurkiewicz, Institute of Informatics UW Databases Relational algebra Lectures for mathematics students
Selection Relation Animals : species name weight Parrot Kropka 3.50 Parrot Lulu 5.35 Parrot Hipek 3.50 Fox Fufu 6.35 Crocodile Czako 75.00 σ species = ′ Parrot ′ Animals : species name weight Parrot Kropka 3.50 Parrot Lulu 5.35 Parrot Hipek 3.50 Zbigniew Jurkiewicz, Institute of Informatics UW Databases Relational algebra Lectures for mathematics students
Projection Duplicate rows are eliminated (we deal with relations !) π species , weight Animals : species weight Parrot 3.50 Parrot 5.35 Fox 6.35 Crocodile 75.00 Zbigniew Jurkiewicz, Institute of Informatics UW Databases Relational algebra Lectures for mathematics students
Generalized projection In addition to the names of attributes, we allow expressions on attributes, e.g. arithmetical expressions. They should be explicitly named: A + B → C Some attributes may occur more than once. Zbigniew Jurkiewicz, Institute of Informatics UW Databases Relational algebra Lectures for mathematics students
Generalized projection R = A B 1 2 3 4 π A + B → C , A , A → A 1 R = C A A1 3 1 1 7 3 3 Zbigniew Jurkiewicz, Institute of Informatics UW Databases Relational algebra Lectures for mathematics students
Cartesian product R2 = B C R1 = A B 1 2 5 6 3 4 7 8 9 10 R1 × R2 = A R1.B R2.B C 1 2 5 6 1 2 7 8 1 2 9 10 3 4 5 6 3 4 7 8 3 4 9 10 Zbigniew Jurkiewicz, Institute of Informatics UW Databases Relational algebra Lectures for mathematics students
Theta-join θ S = σ θ ( R × S ) R ✶ θ stands for any expression on attributes from joined relations, e.g. A < C . A theta-join where the condition has the form of equality of two attributes is called equijoin . Dangling tuple : a tuple from one of relations, which matches no tuple from the other relation. Its contents will be eliminated from the result. Zbigniew Jurkiewicz, Institute of Informatics UW Databases Relational algebra Lectures for mathematics students
Theta-join Animals Species species name weight name continent Parrot Kropka 3.50 Parrot America Parrot Lulu 5.35 Fox Europe Parrot Hipek 3.50 Crocodile Africa Fox Fufu 6.35 Crocodile Czako 75.00 Animals species = name Species ✶ species Animals.name weight Species.name continent Parrot Kropka 3.50 Parrot Ameryka Parrot Lulu 5.35 Parrot Ameryka Parrot Hipek 3.50 Parrot Ameryka Fox Fufu 6.35 Fox Europa Crocodile Czako 75.00 Crocodile Africa Zbigniew Jurkiewicz, Institute of Informatics UW Databases Relational algebra Lectures for mathematics students
Natural join Notation: R ✶ S . The relations to be joined should have at least one common attribute with the same name (otherwise we will have cartesian product). The condition is a conjunction of equalities for all pairs of common attributes. The result will contain only one attribute from each pair of common attributes. Zbigniew Jurkiewicz, Institute of Informatics UW Databases Relational algebra Lectures for mathematics students
Natural join Animals Species species name weight species continent Parrot Kropka 3.50 Parrot America Parrot Lulu 5.35 Fox Europe Parrot Hipek 3.50 Crocodile Africa Fox Fufu 6.35 Crocodile Czako 75.00 Animals ✶ Species species name weight continent Parrot Kropka 3.50 America Parrot Lulu 5.35 America Parrot Hipek 3.50 America Fox Fufu 6.35 Europe Crocodile Czako 75.00 Africa Zbigniew Jurkiewicz, Institute of Informatics UW Databases Relational algebra Lectures for mathematics students
Renaming Used to name result relations: ρ RS ( A , B , X , C , D , E ) ( R × S ) . Simplified notation: R 1 ( A 1 , B , X , C , D , E ) := ( R × S ) . Zbigniew Jurkiewicz, Institute of Informatics UW Databases Relational algebra Lectures for mathematics students
Compound expressions As in any algebra, the operations may be composed and nested to obtain compound expressions. The equivalence of expressions is often used with the optimization of queries to replace the entered expression with the equivalent, more effective one. Zbigniew Jurkiewicz, Institute of Informatics UW Databases Relational algebra Lectures for mathematics students
Compound expressions: self-join Animals species name weight Parrot Kropka 3.50 Parrot Lulu 5.35 Parrot Hipek 3.50 Fox Fufu 6.35 Crocodile Czako 75.00 Find the names of all pairs of animals of the same species π Z 1 . name , Z 2 . name ( ρ Z 1 Animals ρ Z 2 Animals ) ✶ Z 1 . species = Z 2 . species ∧ Z 1 . name < Z 2 . name (the second condition was added to prevent “duplicates”). Zbigniew Jurkiewicz, Institute of Informatics UW Databases Relational algebra Lectures for mathematics students
Multisets (bags) According to mathematical definition of relation the duplicate tuples occuring in results of some operations (e.g. projection) should be eliminated. By allowing repetitions we can extend this algebra to multisets. This is harder than it sounds: there is a problem of the appropriate semantics for set theoretic operations of intersection and difference. The intuitive extensions of most operations are closed on relations with the exception of sum, which for two relations may return multiset. Some laws of relational algebra are not true for multisets, np. ( R ∪ S ) − T = ( R − T ) ∪ ( S − T ) There is an operator to convert multiset to relation by eliminating duplicates: δ ( R ) . Zbigniew Jurkiewicz, Institute of Informatics UW Databases Relational algebra Lectures for mathematics students
Grouping To model statistical queries on databases we introduce grouping operator with optional computation of aggregate functions γ A , MIN ( B ) → MinB ( R ) Note that γ A 1 ,..., A n ( R ) = δ ( R ) if A i are all attributes of R . Zbigniew Jurkiewicz, Institute of Informatics UW Databases Relational algebra Lectures for mathematics students
Sorting The sorting operator is written as τ C , B ( R ) . As it does not have sense for relational or multiset algebras (only for possible list algebra), it should always be the outermost operator of any expression! Zbigniew Jurkiewicz, Institute of Informatics UW Databases Relational algebra Lectures for mathematics students
Outer joins They try to preserve dangling tuples when building the result The special value ⊥ stands for missing values from the other side; ◦ Full: R ✶ S ◦ Left: R ✶ L S We take only dangling tuples from the first argument; ◦ Right: R ✶ R S ; Of course instead of natural join theta-join could be used. Zbigniew Jurkiewicz, Institute of Informatics UW Databases Relational algebra Lectures for mathematics students
Recommend
More recommend