Optimization of Logical Queries Task: Consider the following relational schema: • Emp(eid, did, sal, hobby) • Dept(did, dname, floor, phone) • Finance(did, budget, sales, expenses) For the following SQL statement: 1. Translate the query into the relational algebra. 2. Remove redundant joins from the select-project-join subexpressions in the obtained logical query plan. 3. By means of the algebraic laws, further optimize the obtained expression. 1 Solution of the exercises
Optimization of Logical Queries Task (continued) SELECT D.floor FROM Dept D, Emp E WHERE (D.floor = 1 OR D.floor IN ( SELECT D2.floor FROM Dept D2, Finance F1 WHERE F1.budget > 150 AND D2.did = F1.did) ) AND E.did = D.did AND E.did IN (SELECT F2.did FROM Finance F2, Emp E2 WHERE F2.did = E.did AND E2.did = D.did AND E2.eid = E.eid AND F2.expenses = 300) 2 Solution of the exercises
Optimization of Logical Queries Solution: translation into the relational algebra First, we normalize the query to a form with only EXISTS and NOT EXISTS subqueries: SELECT D.floor FROM Dept D, Emp E WHERE (D.floor = 1 OR EXIST ( SELECT D2.floor FROM Dept D2, Finance F1 WHERE F1.budget > 150 AND D2.did = F1.did AND D2.floor = D.floor) ) AND E.did = D.did AND EXISTS (SELECT F2.did FROM Finance F2, Emp E2 WHERE F2.did = E.did AND E2.did = D.did AND E2.eid = E.eid AND F2.expenses = 300 AND E.did = F2.did) 3 Solution of the exercises
Optimization of Logical Queries Conjunctive Normal Form SELECT D.floor FROM Dept D, Emp E WHERE ( D.floor = 1 AND E.did = D.did AND EXISTS (SELECT F2.did FROM Finance F2, Emp E2 WHERE F2.did = E.did AND E2.did = D.did AND E2.eid = E.eid AND F2.expenses = 300 AND E.did = F2.did) ) OR ( EXIST ( SELECT D2.floor FROM Dept D2, Finance F1 WHERE F1.budget > 150 AND D2.did = F1.did AND D2.floor = D.floor) AND E.did = D.did AND EXISTS (SELECT F2.did FROM Finance F2, Emp E2 WHERE F2.did = E.did AND E2.did = D.did AND E2.eid = E.eid AND F2.expenses = 300 AND E.did = F2.did) ) 4 Solution of the exercises
Optimization of Logical Queries Normalize to UNION Q1 = SELECT D.floor FROM Dept D, Emp E WHERE D.floor = 1 AND E.did = D.did AND EXISTS (SELECT F2.did FROM Finance F2, Emp E2 WHERE F2.did = E.did AND E2.did = D.did AND E2.eid = E.eid AND F2.expenses = 300 AND E.did = F2.did) 5 Solution of the exercises
Optimization of Logical Queries Normalize to UNION Q2 = SELECT D.floor FROM Dept D, Emp E WHERE EXIST ( SELECT D2.floor FROM Dept D2, Finance F1 WHERE F1.budget > 150 AND D2.did = F1.did AND D2.floor = D.floor) AND E.did = D.did AND EXISTS (SELECT F2.did FROM Finance F2, Emp E2 WHERE F2.did = E.did AND E2.did = D.did AND E2.eid = E.eid AND F2.expenses = 300 AND E.did = F2.did) The new query is Q1 UNION Q2. 6 Solution of the exercises
Optimization of Logical Queries Translation of the innermost subqueries SELECT F2.did FROM Finance F2, Emp E2 WHERE F2.did = E.did AND E2.did = D.did AND E2.eid = E.eid AND F2.expenses = 300 AND E.did = F2.did This subquery is translated as follows: e 1 = π F 2 . did ,E. ∗ ,D. ∗ σ F 2 . did = E. did ∧ E 2 . did = D. did ∧ E 2 . eid = E. eid σ F 2 . expenses =300 ∧ E. did = F 2 . did ( ρ D ( Dept ) × ρ E ( Emp ) × ρ F 2 ( Finance ) × ρ E 2 ( Emp )) 7 Solution of the exercises
Optimization of Logical Queries Translation of the innermost subqueries SELECT D2.floor FROM Dept D2, Finance F1 WHERE F1.budget > 150 AND D2.did = F1.did AND D2.floor = D.floor This subquery is translated as follows: e 2 = π D 2 . floor ,D. ∗ σ F 1 . budget > 150 ∧ D 2 . did = F 1 . did σ D 2 . floor = D. floor ( ρ D ( Dept ) × ρ D 2 ( Dept ) × ρ F 1 ( Finance )) 8 Solution of the exercises
Optimization of Logical Queries Translation of the Middle Queries Q1 = SELECT D.floor FROM Dept D, Emp E WHERE D.floor = 1 AND E.did = D.did AND EXISTS (SELECT F2.did FROM Finance F2, Emp E2 WHERE F2.did = E.did AND E2.did = D.did AND E2.eid = E.eid AND F2.expenses = 300 AND E.did = F2.did) The translation of the from part gives e 3 = ( ρ D ( Dept ) × ρ E ( Emp )) To de-correlate we compute: f = ˆ ⋉ π D. ∗ ,E. ∗ ( e 1 ) e 3 ⋊ Note that ˆ e 3 is empty and hence f = π D. ∗ ,E. ∗ ( e 1 ) To this expression we add the WHERE and SELECT clause: e 4 = π D. floor ( σ D. floor =1 ∧ E.did = D.did ( π D. ∗ ,E. ∗ ( e 1 )) 9 Solution of the exercises
Optimization of Logical Queries Translation of the Middle Queries Q2 = SELECT D.floor FROM Dept D, Emp E WHERE EXIST ( SELECT D2.floor FROM Dept D2, Finance F1 WHERE F1.budget > 150 AND D2.did = F1.did AND D2.floor = D.floor) AND E.did = D.did AND EXISTS (SELECT F2.did FROM Finance F2, Emp E2 WHERE F2.did = E.did AND E2.did = D.did AND E2.eid = E.eid AND F2.expenses = 300 AND E.did = F2.did) The translation of the from part gives e 5 = ( ρ D ( Dept ) × ρ E ( Emp )) To de-correlate we compute: f ′ = ˆ ⋉ ( π D. ∗ ,E. ∗ ( e 1 ) ⋊ ⋉ π D. ∗ ( e 2 )) = ( π D. ∗ ,E. ∗ ( e 1 ) ⋊ ⋉ π D. ∗ ( e 2 )) e 5 ⋊ 10 Solution of the exercises
To this expression we add the WHERE and SELECT clause: e 6 = π D. floor σ E.did = D.did ( π D. ∗ ,E. ∗ ( e 1 ) ⋊ ⋉ π D. ∗ ( e 2 )) 11 Solution of the exercises
Optimization of Logical Queries Translation of the Whole Query Q1 UNION Q2 Since the schemas of e 4 and e 6 are the same, the union is straightforward: e = e 4 ∪ e 6 Written in full: e = π D. floor σ D. floor =1 ∧ E.did = D.did π D. ∗ ,E. ∗ σ F 2 . did = E. did ∧ E 2 . did = D. did ∧ E 2 . eid = E. eid ∧ F 2 . expenses =300 ∧ E. did = F 2 . did ( ρ D ( Dept ) × ρ E ( Emp ) × ρ F 2 ( Finance ) × ρ E 2 ( Emp )) ∪ π D. floor σ E.did = D.did ( [ π D. ∗ ,E. ∗ σ F 2 . did = E. did ∧ E 2 . did = D. did ∧ E 2 . eid = E. eid ∧ F 2 . expenses =300 ∧ E. did = F 2 . did ( ρ D ( Dept ) × ρ E ( Emp ) × ρ F 2 ( Finance ) × ρ E 2 ( Emp ))] ⋉ [ π D. ∗ σ F 1 . budget > 150 ∧ D 2 . did = F 1 . did ∧ D 2 . floor = D. floor ⋊ ( ρ D ( Dept ) × ρ D 2 ( Dept ) × ρ F 1 ( Finance ))]) 12 Solution of the exercises
Optimization of Logical Queries Redundant Joins Removal The query comprises the following maximal select-project-join subexpressions: • π D. floor σ D. floor =1 ∧ E.did = D.did π D. ∗ ,E. ∗ σ ... ( ρ D ( Dept ) × ρ E ( Emp ) × ρ F 2 ( Finance ) × ρ E 2 ( Emp )) • [ π D. ∗ ,E. ∗ σ ... ( ρ D ( Dept ) × ρ E ( Emp ) × ρ F 2 ( Finance ) × ρ E 2 ( Emp ))] • ( ρ D ( Dept ) × ρ D 2 ( Dept ) × ρ F 1 ( Finance )) Note that “ F 1 . budget > 150 ” cannot be included in a select-project-join expression. Also note that the third expression does not contain redundant joins (Why?). 13 Solution of the exercises
Optimization of Logical Queries Redundant Joins Removal The first expression corresponds to: Q 1 (“1”) ← Dept ( a 1 , a 2 , “1” , a 4 ) , Emp ( b 1 , a 1 , b 3 , b 4 ) , Finance ( a 1 , c 2 , c 3 , “300”) , Emp ( b 1 , a 1 , d 3 , d 4 ) The first and third atoms cannot be removed (Why?) We check whether we can remove the second atom: Q 2 (“1”) ← Dept ( a 1 , a 2 , “1” , a 4 ) , Finance ( a 1 , c 2 , c 3 , “300”) , Emp ( b 1 , a 1 , d 3 , d 4 ) The corresponding canonical database: D 2 (“1”) = c 3 , “300”) , Emp ( ˙ a 1 , ˙ d 3 , ˙ { Dept ( ˙ a 1 , ˙ a 2 , “1” , ˙ a 4 ) , Finance ( ˙ a 1 , ˙ c 2 , ˙ b 1 , ˙ d 4 ) } Clearly (“1”) ∈ Q 1 ( D 2 ) because of the matching a 1 �→ ˙ a 2 �→ ˙ a 4 �→ ˙ a 1 a 2 a 4 b 1 �→ ˙ b 3 �→ ˙ b 4 �→ ˙ b 1 d 3 d 4 d 3 �→ ˙ d 4 �→ ˙ c 2 �→ ˙ c 3 �→ ˙ c 2 c 3 d 3 d 4 hence Q 2 ⊆ Q 1 . The other direction always holds. Hence Q 1 ≡ Q 2 14 Solution of the exercises
Optimization of Logical Queries Redundant Joins Removal No other atom can be removed (Why?). The optimal query is hence Q 2 (“1”) ← Dept ( a 1 , a 2 , “1” , a 4 ) , Finance ( a 1 , c 2 , c 3 , “300”) , Emp ( b 1 , a 1 , d 3 , d 4 ) Translating this query back to the relational algebra, we obtain: π D. floor ([ σ D. floor =1 ∧ E 2 .did = D.did ∧ F 2 . did = E 2 . did ∧ E 2 . did = D. did ∧ F 2 . expenses =300 ( ρ D ( Dept ) × ρ F 2 ( Finance ) × ρ E 2 ( Emp ))]) 15 Solution of the exercises
Optimization of Logical Queries Redundant Joins Removal The second expression is: [ π D. ∗ ,E. ∗ σ F 2 . did = E. did ∧ E 2 . did = D. did ∧ E 2 . eid = E. eid ∧ F 2 . expenses =300 ∧ E. did = F 2 . did ( ρ D ( Dept ) × ρ E ( Emp ) × ρ F 2 ( Finance ) × ρ E 2 ( Emp ))] Translated: Q 3 ( a 1 , a 2 , a 3 , a 4 , b 1 , b 2 , b 3 , b 4 ) ← Dept ( a 1 , a 2 , a 3 , a 4 ) , Emp ( b 1 , b 2 , b 3 , b 4 ) , Finance ( a 1 , c 2 , c 3 , “300”) , Emp ( b 1 , a 1 , d 3 , d 4 ) We cannot remove the second atom, this time (why?) 16 Solution of the exercises
Optimization of Logical Queries Redundant Joins Removal Let us try to remove the fourth atom, let Q 4 ( a 1 , a 2 , a 3 , a 4 , b 1 , b 2 , b 3 , b 4 ) ← Dept ( a 1 , a 2 , a 3 , a 4 ) , Emp ( b 1 , b 2 , b 3 , b 4 ) , Finance ( a 1 , c 2 , c 3 , “300”) By evaluating Q 3 on the canonical database of Q 4 we see that Q 4 �⊆ Q 3 . Hence Q 3 �≡ Q 4 and the fourth atom can hence not be removed. This original SPJ expression was hence optimal. 17 Solution of the exercises
Recommend
More recommend