How do we query (specify what info we want from) the database? Find all the employees who earn more CS411 than $50,000 and pay taxes in Champaign-Urbana. Database Systems • Could write in C++/Java, but who would want to? 05: Relational Algebra • Instead use high-level query languages: Kazuhiro Minami – Theoretical: Relational algebra – Practical: SQL Relational algebra has 5 operations Union takes the set union of two relations Input = relation(s), output = relations OldDiagnosis NewDiagnosis Patient Disease Patient Disease –Set union: ∪ Winslett Strep Winslett Hantavirus Zhai Meningitis –Set difference: − Zhai Meningitis Han Ebola Chang Cholera –Selection: σ OldDiagnosis ∪ NewDiagnosis –Projection: π Patient Disease –Cartesian product: × Reminder: Input and output Winslett Strep sets have no relations need to Zhai Meningitis duplicates have the same Han Ebola Can add some syntactic sugar and/or define Winslett Hantavirus schema new operators in terms of these Chang Cholera
Difference takes the set Sometimes we’d like to name or difference of two relations rename the output (syntactic sugar) OldDiagnosis NewDiagnosis Patient Disease AllDiag(Patients, Diseases) := OldDiagnosis ∪ NewDiagnosis Patient Disease Winslett Strep Winslett Hantavirus or Zhai Meningitis Zhai Meningitis ρ AllDiag( Patients, Diseases) (OldDiagnosis ∪ NewDiagnosis) Han Ebola Chang Cholera WrongDiagnosis := OldDiagnosis − NewDiagnosis Patients Diseases Winslett Strep Zhai Meningitis Patient Disease Han Ebola Winslett Strep Winslett Hantavirus Han Ebola Chang Cholera Selection conditions can be Selection keeps only the tuples that relatively complex satisfy a particular condition = < > ≤ ≠ Diagnosis Attribute names Patient Disease Temperature Patient Patient = “Winslett” Find all patients who Winslett Strep 98.9 have a fever Salary < 40,000 Disease Zhai Meningitis 101.1 Han Ebola 96.6 Salary σ Temperature > 98.6 (Diagnosis) Winslett Hantavirus 98.6 Constants and or not Chang Cholera 102.3 “ Winslett” Patient = “Winslett” and Temperature > “Meningitis” Better for everyone’s sake if we write this as 98.6 40,000 σ[ Temperature > 98.6] ( Diagnosis) You can write it any of these two ways in this class.
Projection eliminates all but the listed Selection Example columns, and puts them in the listed order Employee Diagnosis SSN Name DepartmentID Salary Patient Disease Temperature ⊆ List all the patients and 999999999 John 1 30,000 Winslett Strep 98.9 their diagnoses 777777777 Tony 1 32,000 Zhai Meningitis 101.1 Han Ebola 96.6 888888888 Alice 2 45,000 Winslett Hantavirus 98.6 Find all employees with salary more than $40,000. π Disease, Patient (Diagnosis) σ Salary > 40000 (Employee) Disease Patient For convenience, we may write SSN Name DepartmentID Salary Strep Winslett π [Disease, Patient] (Diagnosis) 888888888 Alice 2 45,000 Meningitis Zhai Ebola Han Hantavirus Winslett The columns you project onto have Projection Example to actually exist Employee SSN Name DepartmentID Salary 999999999 John 1 30,000 π [Salary, Town] Diagnosis 777777777 Tony 1 32,000 π[ Disease] Employee 888888888 Alice 2 45,000 Π SSN, Name (Employee) Formally, π A1 , …, An ( R ) is a legal SSN Name relational algebra expression if each 999999999 John of A1 , …, An is an attribute of R 777777777 Tony 888888888 Alice
The cartesian product of two relations Cartesian Product Example is usually enormous Employee Diagnosis Name SSN RareDiseases John 999999999 Patient Disease Disease Tony 777777777 Winslett Strep Take each Zhai Meningitis Ebola possible Dependents Han Ebola Hantavirus combination of EmployeeSSN Dname one tuple from 999999999 Emily Diagnosis × RareDiseases the first relation 777777777 Joe and one tuple Patient Diagnosis.Disease RareDiseases.Disease from the second Employee x Dependents Winslett Strep Ebola relation Name SSN EmployeeSSN Dname Zhai Meningitis Ebola John 999999999 999999999 Emily Han Ebola Ebola John 999999999 777777777 Joe Winslett Strep Hantavirus (may need to Tony 777777777 999999999 Emily Zhai Meningitis Hantavirus rename some Tony 777777777 777777777 Joe Han Ebola Hantavirus attributes) Relational algebra = every expression you can make using these 5 operators (plus renaming) Derived RA Any relation name is a relational algebra expression. Operations If R and S are relational algebra expressions, then so are R – S, R ∪ S and R × S. If R is a relational algebra expression and θ is a selection condition, then σ [ θ ] R is a relational algebra expression. Intersection, join If R is a relational algebra expression and L is a list of attributes of R , then π [L] R is a relational algebra expression. Nothing else is a relational algebra expression.
Intersection can be defined in A join is a cartesian product followed terms of difference immediately by a selection OldDiagnosis NewDiagnosis OldDiagnosis NewDiagnosis Patient Disease Patient Disease Patient Disease Winslett Strep Patient Disease Winslett Hantavirus Winslett Strep Zhai Meningitis Winslett Hantavirus Zhai Meningitis Zhai Meningitis Han Ebola Zhai Meningitis Chang Cholera Han Ebola Chang Cholera Who has an old diagnosis that is different from one of their RightDiagnosis = OldDiagnosis ∩ NewDiagnosis new diagnoses? = OldDiagnosis – (OldDiagnosis – NewDiagnosis) π [#1] σ [#1 = #3 and #2 ≠ #4] (OldDiagnosis × NewDiagnosis) More generally, R ∩ S = R – (R – (S)). n n o o i i j j A A Patient Patient Disease Winslett Zhai Meningitis How does that work? BothDiagnoses = OldDiagnosis NewDiagnosis σ [Pat1 = Pat2 and Dis1 ≠ Dis2] (Temp) Patient Disease Patient Disease Winslett Strep Winslett Hantavirus Zhai Meningitis Zhai Meningitis Temp Han Ebola Chang Cholera Pat1 Dis1 Pat2 Dis2 Temp(Pat1, Dis1, Pat2, Dis2) = OldDiagnosis × NewDiagnosis Winslett Strep Winslett Hantavirus Pat1 Dis1 Pat2 Dis2 Zhai Meningitis Winslett Hantavirus Winslett Strep Winslett Hantavirus Han Ebola Winslett Hantavirus Zhai Meningitis Winslett Hantavirus Winslett Strep Zhai Meningitis Han Ebola Winslett Hantavirus Zhai Meningitis Zhai Meningitis Winslett Strep Zhai Meningitis Han Ebola Zhai Meningitis Zhai Meningitis Zhai Meningitis Winslett Strep Chang Cholera Han Ebola Zhai Meningitis 19 Zhai Meningitis Chang Cholera Winslett Strep Chang Cholera Han Ebola Chang Cholera Zhai Meningitis Chang Cholera Han Ebola Chang Cholera
FinalAnswer = BothDiagnoses = σ [Pat1 = Pat2 and Dis1 ≠ Dis2] (Temp) π [Pat1] BothDiagnoses BothDiagnoses Pat1 Dis1 Pat2 Dis2 Winslett Strep Winslett Hantavirus BothDiagnoses Pat1 Dis1 Pat2 Dis2 FinalAnswer Winslett Strep Winslett Hantavirus Pat1 Winslett There is a convenient Natural joins join on attributes with shorthand for joins the same name Relational algebra Employees Managers expressions Emp Dept Dept Mgr Winslett Complaint Complaint Mendez = σ θ ( R × S ) R ⋈ θ S Zhai Toy Toy Smith Han Toy Returns Chu I’ll let you write it as Employees ⋈ Managers R ⋈ [ q ] S A selection Emp Dept Mgr condition Winslett Complaint Mendez Zhai Toy Smith This is called a θ -join, or an equijoin when θ is = . Han Toy Smith
A natural join is an equijoin on all Natural joins don’t always attributes with the same name, followed make sense by removal of the duplicate attributes R ⋈ S = Emp(name, dept) Dept (name, mgr) π everything but the duplicate attributes σ R.A1=S.A1 and…and R.An=S.An ( R × S ) Emp ⋈ Dept is nonsensical A1 through An are all the attributes R and S have in common E M Emp Dept Sal Mgr Dept Your first real query: who makes more Jones Missiles 10K Mendez Tanks than their manager? Chu Tanks 20K Swami Explosives Swami Explosives 50K Jones Missiles Mendez Tanks 10K E(emp, dept, sal) M(mgr, dept) Benson Explosives 40K ESM = π [Emp, Sal, Mgr] (E ⋈ E ⋈ M ESM(emp, sal, mgr) = π [emp, sal, mgr] (E ⋈ M) M) Emp Dept Sal Mgr Emp Sal Mgr Jones 10K Jones Jones Missiles 10K Jones Chu 20K Mendez Chu Tanks 20K Mendez π [ESM.emp](ESM ⋈ [mgr = E.emp AND ESM.sal > E.sal] E) Swami Explosives 50K Swami Swami 50K Swami Mendez Tanks 10K Mendez Mendez 10K Mendez Benson Explosives 40K Swami Why??? Benson 40K Swami ESM ⋈ [ESM.Mgr = E.Emp AND ESM.Sal > E.Sal] E Emp Sal Mgr Emp Dept Sal 28 Chu 20K Mendez Mendez Tanks 10K
You can define relational algebra on bags Summary of relational algebra instead of sets (closer match to SQL) Basic primitives: • Union: {a,b,b,c} U {a,b,b,b,e,f,f} = {a,a,b,b,b,b,b,c,e,f,f} σ [C] (E) – add the number of occurrences π [A 1 , …, A n ] (E) E1 × E2 • Difference: {a,b,b,b,c,c} – {b,c,c,c,d} = {a,b,b} E1 ∪ E2 – subtract the number of occurrences • Intersection: {a,b,b,b,c,c} {b,b,c,c,c,c,d} = {b,b,c,c} E1 – E2 ρ [S(A 1 , …, A n )] (E) – minimum of the two numbers of occurrences • Selection: preserve the number of occurrences Abbreviations: • Projection: preserve the number of occurrences (no E1 ⋈ E2 duplicate elimination) E1 ⋈ C E2 • Cartesian product, join: no duplicate elimination E1 ∩ E2 More detail in the book (Chapter 5.1)
Recommend
More recommend