CMU SCS CMU SCS Today’s Class Carnegie Mellon Univ. • History & Background Dept. of Computer Science • Relational Algebra Equivalences 15-415/615 - DB Applications • Plan Cost Estimation • Plan Enumeration C. Faloutsos – A. Pavlo Lecture#15: Query Optimization Faloutsos/Pavlo CMU SCS 15-415/615 2 CMU SCS CMU SCS Query Optimization 1970s – Relational Model • Remember that SQL is declarative. • Ted Codd saw the maintenance overhead for IMS/Codasyl. – User tells the DBMS what answer they want, not how to get the answer. • Proposed database abstraction based • There can be a big difference in on relations: Codd performance based on plan is used: – Store database in simple data structures. – See last week: 5.7 days vs. 45 seconds – Access it through high-level language. – Physical storage left up to implementation. Faloutsos/Pavlo CMU SCS 15-415/615 3 Faloutsos/Pavlo CMU SCS 15-415/615 4
CMU SCS CMU SCS IBM System R IBM System R • Skunkworks project at IBM Research in • First implementation of a query optimizer. San Jose to implement Codd’s ideas. • People argued that the DBMS could never • Had to figure out all of the things that we choose a query plan better than what a are discussing in this course themselves. human could write. • IBM never commercialized System R. • A lot of the concepts from System R’s optimizer are still used today. Faloutsos/Pavlo CMU SCS 15-415/615 5 Faloutsos/Pavlo CMU SCS 15-415/615 6 CMU SCS CMU SCS Today’s Class Relational Algebra Equivalences • History & Background • A query can be expressed in different ways. • Relational Algebra Equivalences • The optimizer considers variations and • Plan Cost Estimation choose the one with the lowest cost. • Plan Enumeration • How do we know whether two queries are • Nested Sub-queries equivalent? Faloutsos/Pavlo CMU SCS 15-415/615 7 Faloutsos/Pavlo CMU SCS 15-415/615 8
CMU SCS CMU SCS Relational Algebra Equivalences Predicate Pushdown SELECT name, cid • Two relational algebra expressions are FROM student, enrolled WHERE student.sid = equivalent if they generate the same set of enrolled.sid tuples. AND enrolled.grade = ‘A’ π π name, cid name, cid σ ⨝ sid=sid grade=‘A’ ⨝ σ sid=sid grade=‘A’ student enrolled student enrolled Faloutsos/Pavlo CMU SCS 15-415/615 9 Faloutsos/Pavlo CMU SCS 15-415/615 10 CMU SCS CMU SCS Relational Algebra Equivalences Relational Algebra Equivalences SELECT name, cid FROM student, enrolled • Selections: WHERE student.sid = – Perform them early enrolled.sid AND enrolled.grade = ‘A’ – Break a complex predicate, and push down σ p1 ∧ p2 ∧ …pn (R) = σ p1 ( σ p2 ( σ … pn (R))…) π name, cid ( σ grade=‘A’ (student ⋈ enrolled )) • Simplify a complex predicate = – ( X=Y AND Y=3 ) → X=3 AND Y=3 π name, cid ( student ⋈ ( σ grade=‘A’ (enrolled ))) Faloutsos/Pavlo CMU SCS 15-415/615 11 Faloutsos/Pavlo CMU SCS 15-415/615 12
CMU SCS CMU SCS Relational Algebra Equivalences Projection Pushdown SELECT name, cid • Projections: FROM student, enrolled WHERE student.sid = – Perform them early enrolled.sid • Smaller tuples AND enrolled.grade = ‘A’ • Fewer tuples (if duplicates are eliminated) π π – Project out all attributes except the ones name, cid name, cid requested or required (e.g., joining attr.) σ ⨝ sid=sid grade=‘A’ ⨝ σ sid=sid grade=‘A’ • This is not important for a column store… student enrolled student enrolled Faloutsos/Pavlo CMU SCS 15-415/615 13 Faloutsos/Pavlo CMU SCS 15-415/615 14 CMU SCS CMU SCS Projection Pushdown Relational Algebra Equivalences SELECT name, cid FROM student, enrolled • Joins: WHERE student.sid = – Commutative, associative enrolled.sid R ⋈ S = S ⋈ R AND enrolled.grade = ‘A’ (R ⋈ S) ⋈ T = R ⋈ (S ⋈ T) π π name, cid name, cid σ ⨝ sid=sid grade=‘A’ • Q: How many different orderings are there π ⨝ π sid, cid sid=sid for an n -way join? sid, name σ grade=‘A’ student enrolled student enrolled Faloutsos/Pavlo CMU SCS 15-415/615 14 Faloutsos/Pavlo CMU SCS 15-415/615 15
CMU SCS CMU SCS Relational Algebra Equivalences Today’s Class • Joins: How many different orderings are • History & Background there for an n-way join? • Relational Algebra Equivalences • A: Catalan number ~ 4 n • Plan Cost Estimation – Exhaustive enumeration: too slow. • Plan Enumeration • We’ll see in a second how an optimizer limits the search space... Faloutsos/Pavlo CMU SCS 15-415/615 16 Faloutsos/Pavlo CMU SCS 15-415/615 17 CMU SCS CMU SCS Cost Estimation Cost Estimation – Statistics S R • For each relation R we keep: • How long will a query take? #1 – N R → # tuples – CPU : Small cost; tough to estimate #2 #3 – S R → size of tuple in bytes – Disk : # of block transfers – V(A,R) → # of distinct values – Memory : Amount of DRAM used of attribute ‘A’ – Network : # of messages … • How many tuples will be read/written? • What statistics do we need to keep? #N R Faloutsos/Pavlo CMU SCS 15-415/615 18 Faloutsos/Pavlo CMU SCS 15-415/615 19
CMU SCS CMU SCS Derivable Statistics Derivable Statistics S R • F R → max# records/block • SC(A,R) → Selection Cardinality avg# of records with A=given • B R → # blocks F R #1 → N R / V(A,R) • SC(A,R) → selection cardinality #2 • Note that this assumes data uniformity avg# of records with A=given #3 – 10,000 students, 10 colleges – how many … students in SCS? #B R Faloutsos/Pavlo CMU SCS 15-415/615 20 Faloutsos/Pavlo CMU SCS 15-415/615 21 CMU SCS CMU SCS Additional Statistics Statistics • For index i : • Where do we store them? – F i → average fanout (~50 -100) HT i • How often do we update them? – HT i → # levels of index i (~2-3) ~ log(#entries)/log( F i ) • Manual invocations: – LB i # → blocks at leaf level – Postgres/SQLite: ANALYZE – MySQL: ANALYZE TABLE Faloutsos/Pavlo CMU SCS 15-415/615 22 Faloutsos/Pavlo CMU SCS 15-415/615 23
CMU SCS CMU SCS Selection Statistics Selections – Complex Predicates • Selectivity sel(P) of predicate P : • We saw simple predicates ( name=“Kayne” ) == fraction of tuples that qualify • How about more complex predicates, like – salary > 10000 • Formula depends on type of predicate. – age=30 AND jobTitle=“Costermonger” – Equality • What is their selectivity? – Range – Negation – Conjunction – Disjunction Faloutsos/Pavlo CMU SCS 15-415/615 24 Faloutsos/Pavlo CMU SCS 15-415/615 25 CMU SCS CMU SCS Selections – Complex Predicates Selections – Complex Predicates • Selectivity sel(P) of predicate P : • Assume that V(rating, sailors) has 5 == fraction of tuples that qualify distinct values (0–4) and N R = 5 • Equality Predicate : A=constant • Formula depends on type of predicate. – Equality – sel(A=constant) = SC(P) / V(A,R) – Range – Example: sel(rating=‘2’) = – Negation – Conjunction – Disjunction Faloutsos/Pavlo CMU SCS 15-415/615 25 26
CMU SCS CMU SCS Selections – Complex Predicates Selections – Complex Predicates • Assume that V(rating, sailors) has 5 • Assume that V(rating, sailors) has 5 distinct values (0–4) and N R = 5 distinct values (0–4) and N R = 5 • Equality Predicate : A=constant • Equality Predicate : A=constant – sel(A=constant) = SC(P) / V(A,R) – sel(A=constant) = SC(P) / V(A,R) – Example: sel(rating=‘2’) = – Example: sel(rating=‘2’) = count count V(rating,R)=5 0 1 2 3 4 0 1 2 3 4 26 26 rating rating CMU SCS CMU SCS Selections – Complex Predicates Selections – Complex Predicates • Assume that V(rating, sailors) has 5 • Assume that V(rating, sailors) has 5 distinct values (0–4) and N R = 5 distinct values (0–4) and N R = 5 • Equality Predicate : A=constant • Equality Predicate : A=constant – sel(A=constant) = SC(P) / V(A,R) – sel(A=constant) = SC(P) / V(A,R) – Example: sel(rating=‘2’) = – Example: sel(rating=‘2’) = 1/5 SC(rating=‘2’)=1 SC(rating=‘2’)=1 count count V(rating,R)=5 V(rating,R)=5 0 1 2 3 4 0 1 2 3 4 26 26 rating rating
CMU SCS CMU SCS Selections – Complex Predicates Selections – Complex Predicates • Range Query: • Range Query: – sel( A>a ) = (A max – a) / (A max – A min ) – sel( A>a ) = (A max – a) / (A max – A min ) – Example: sel(rating >= ‘2’) – Example: sel(rating >= ‘2’) = (4 – 2) / (4 – 0) = 1/2 rating min = 0 rating max = 4 count count 0 1 2 3 4 0 1 2 3 4 27 27 rating rating CMU SCS CMU SCS Selections – Complex Predicates Selections – Complex Predicates • Negation Query • Negation Query – sel(not P ) = 1 – sel( P ) – sel(not P ) = 1 – sel( P ) – Example: sel(rating != ‘2’) – Example: sel(rating != ‘2’) SC(rating=‘2’)=1 count count 0 1 2 3 4 0 1 2 3 4 28 28 rating rating
Recommend
More recommend