CMU SCS CMU SCS Administrivia Carnegie Mellon Univ. • HW5 is due Thursday March 17th. Dept. of Computer Science • You can pick up your mid-term from 15-415/615 - DB Applications Marilyn Walgora’s office (GHC 8120). • Christos is out of the country. • No office hours next week. C. Faloutsos – A. Pavlo Lecture#15: Query Optimization Faloutsos/Pavlo CMU SCS 15-415/615 2 CMU SCS CMU SCS Cost-based Query Sub-System Today’s Class Select * Queries From Blah B Where B.blah = blah • History & Background • Relational Algebra Equivalences Query Parser • Plan Cost Estimation • Plan Enumeration Query Optimizer Plan Plan Cost Catalog Manager Generator Estimator Schema Statistics Query Plan Evaluator Faloutsos/Pavlo CMU SCS 15-415/615 3 4
CMU SCS CMU SCS Query Optimization • Remember that SQL is declarative. – User tells the DBMS what answer they want, Quick DB History Lesson not how to get the answer. • There can be a big difference in performance based on plan is used: – See last week: 5.7 days vs. 45 seconds Faloutsos/Pavlo CMU SCS 15-415/615 5 Faloutsos/Pavlo CMU SCS 15-415/615 6 CMU SCS CMU SCS 1960s – IBM IMS Hierarchical Data Model • First database system. Schema Instance • Hierarchical data model. SUPPLIER 3, “Dirty Rick’s Supplies”, • Programmer-defined physical storage New York, NY (sno,sname,scity,sstate) format. • Tuple-at-a-time queries. PART 1001, “Battery Pack”, Large, 500, $100 (pno,pname,psize,qty,price) Faloutsos/Pavlo CMU SCS 15-415/615 7 Faloutsos/Pavlo CMU SCS 15-415/615 8
CMU SCS CMU SCS Hierarchical Data Model 1970s – CODASYL • COBOL people got together and Schema Instance proposed a standard based on a Duplicate Data network data model. SUPPLIER 3, “Dirty Rick’s Supplies”, New York, NY (sno,sname,scity,sstate) • Tuple-at-a-time queries. Bachman – This forces the programmer to do manual No Independence query optimization. PART 1001, “Battery Pack”, Large, 500, $100 (pno,pname,psize,qty,price) Faloutsos/Pavlo CMU SCS 15-415/615 9 Faloutsos/Pavlo CMU SCS 15-415/615 10 CMU SCS CMU SCS Network Data Model Network Data Model Schema Schema SUPPLIER PART SUPPLIER PART (sno,sname,scity,sstate) (pno,pname,psize) (sno,sname,scity,sstate) (pno,pname,psize) Complex Queries Supplies SuppliedBy Supplies SuppliedBy Supply Supply (qty,price) (qty,price) Faloutsos/Pavlo CMU SCS 15-415/615 11 Faloutsos/Pavlo CMU SCS 15-415/615 12
CMU SCS CMU SCS 1970s – Relational Model IBM System R • Ted Codd saw the maintenance • Skunkworks project at IBM Research in overhead for IMS/Codasyl. San Jose to implement Codd’s ideas. • Proposed database abstraction based • Had to figure out all of the things that we on relations: are discussing in this course themselves. Codd – Store database in simple data structures. • IBM never commercialized System R. – Access it through high-level language. – Physical storage left up to implementation. Faloutsos/Pavlo CMU SCS 15-415/615 13 Faloutsos/Pavlo CMU SCS 15-415/615 14 CMU SCS CMU SCS IBM System R Today’s Class • First implementation of a query optimizer. • History & Background • People argued that the DBMS could never • Relational Algebra Equivalences choose a query plan better than what a • Plan Cost Estimation human could write. • Plan Enumeration • A lot of the concepts from System R’s • Nested Sub-queries optimizer are still used today. Faloutsos/Pavlo CMU SCS 15-415/615 15 Faloutsos/Pavlo CMU SCS 15-415/615 16
CMU SCS CMU SCS Relational Algebra Equivalences Relational Algebra Equivalences • A query can be expressed in different • Two relational algebra expressions are ways. equivalent if they generate the same set of tuples. • The optimizer considers variations and choose the one with the lowest cost. • How do we know whether two queries are equivalent? – Equivalence Rules (chapter 15.3) Faloutsos/Pavlo CMU SCS 15-415/615 17 Faloutsos/Pavlo CMU SCS 15-415/615 18 CMU SCS CMU SCS Relational Algebra Equivalences Relational Algebra Equivalences SELECT cname, amt SELECT cname, amt FROM customer, account FROM customer, account WHERE customer.acctno = WHERE customer.acctno = account.acctno account.acctno AND account.amt > 1000 AND account.amt > 1000 π π π cname, amt ( σ amt>1000 (customer ⋈ account )) cname, amt cname, amt σ ⨝ acctno=acctno = amt>1000 ⨝ σ acctno=acctno π cname, amt ( customer ⋈ ( σ amt>1000 (account )) amt>1000 CUSTOMER ACCOUNT CUSTOMER ACCOUNT Faloutsos/Pavlo CMU SCS 15-415/615 19 Faloutsos/Pavlo CMU SCS 15-415/615 20
CMU SCS CMU SCS Equivalence of Expressions Equivalence of Expressions σ ∪ = σ ∪ σ • Q: How to prove a transf. rule? ( 1 2 ) ( 1 ) ( 2 ) R R R R P P P σ = σ σ ( 1 2 ) ( 1 ) ( 2 ) R R R R P P P • Use relational tuple calculus to show that ∈ ⇔ t LHS LHS = RHS: ∈ ∪ ∧ ⇔ t ( R 1 R 2 ) P ( t ) σ ∪ = σ ∪ σ ( R 1 R 2 ) ( R 1 ) ( R 2 ) ∈ ∨ ∈ ∧ ⇔ P P P ( 1 2 ) ( ) t R t R P t LHS RHS ∈ ∧ ∨ ∈ ∧ ⇔ ( 1 ( )) ( 2 ) ( )) t R P t t R P t Faloutsos/Pavlo CMU SCS 15-415/615 21 Faloutsos/Pavlo CMU SCS 15-415/615 22 CMU SCS CMU SCS Equivalence of Expressions Equivalence of Expressions A B σ ∪ = σ ∪ σ R1 ( 1 2 ) ( 1 ) ( 2 ) • Q: How to disprove a rule? R R R R − X Christos squirrels P P P π = π − π ... A B R2 ( 1 2 ) ( 1 ) ( 2 ) R R R R A A A Christos knifefights ∈ ∧ ∨ ∈ ∧ ⇔ ( 1 ( )) ( 2 ) ( )) t R P t t R P t ∈ σ ∨ ∈ σ ⇔ ( ( 1 )) ( ( 2 )) t R t R P P A B ≠ ∈ σ ∪ σ ⇔ Ø ( 1 ) ( 2 ) t R R Christos squirrels P P ∈ t RHS QED Faloutsos/Pavlo CMU SCS 15-415/615 23 Faloutsos/Pavlo CMU SCS 15-415/615 24
CMU SCS CMU SCS Equivalence of Expressions Equivalence of Expressions • Selections: • Projections: – Perform them early – Perform them early – Break a complex predicate, and push • Smaller tuples σ = σ σ σ • Fewer tuples (if duplicates are eliminated) ( ) ( (... ( ))...) R R p 1 ^ p 2 ^... pn p 1 p 2 pn – Project out all attributes except the ones • Simplify a complex predicate requested or required (e.g., joining attr.) – ( X=Y AND Y=3 ) → X=3 AND Y=3 Faloutsos/Pavlo CMU SCS 15-415/615 25 Faloutsos/Pavlo CMU SCS 15-415/615 26 CMU SCS CMU SCS Equivalence of Expressions Equivalence of Expressions • Joins: • Joins: How many different orderings are there for an n-way join? – Commutative, associative = • A: Catalan number ~ 4^n R S S R = ( ) ( ) – Exhaustive enumeration: too slow. R S T R S T • We’ll see in a second how an optimizer limits the search space... • Q: How many different orderings are there for an n -way join? Faloutsos/Pavlo CMU SCS 15-415/615 27 Faloutsos/Pavlo CMU SCS 15-415/615 28
CMU SCS CMU SCS Today’s Class Query Optimization • History & Background • Bring query in internal form (eg., parse tree) • Relational Algebra Equivalences • … into “canonical form” (syntactic q-opt) • Plan Cost Estimation • Generate alternative plans. • Plan Enumeration • Estimate cost for each plan. • Pick the best one. Faloutsos/Pavlo CMU SCS 15-415/615 29 Faloutsos/Pavlo CMU SCS 15-415/615 30 CMU SCS CMU SCS Cost Estimation Cost Estimation – Statistics S R • For each relation R we keep: • How long will a query take? #1 – N R → # tuples – CPU : Small cost; tough to estimate. #2 #3 – S R → size of tuple in bytes – Disk : # of block transfers. – V(A,R) → # of distinct values – Network : # of messages of attribute ‘A’ • How many tuples will qualify? … • What statistics do we need to keep? #N R Faloutsos/Pavlo CMU SCS 15-415/615 31 Faloutsos/Pavlo CMU SCS 15-415/615 32
CMU SCS CMU SCS Derivable Statistics Derivable Statistics S R • F R → max# records/block • SC(A,R) → Selection Cardinality avg# of records with A=given • B R → # blocks F R #1 → N R / V(A,R) • SC(A,R) → selection cardinality #2 • Note that this assumes data uniformity avg# of records with A=given #3 – 10,000 students, 10 colleges – how many … students in SCS? #B R Faloutsos/Pavlo CMU SCS 15-415/615 33 Faloutsos/Pavlo CMU SCS 15-415/615 34 CMU SCS CMU SCS Additional Statistics Statistics • For index i : • Where do we store them? – F i → average fanout (~50 -100) HTi • How often do we update them? – HT i → # levels of index i (~2-3) ~ log(#entries)/log( F i ) – LB i # → blocks at leaf level Faloutsos/Pavlo CMU SCS 15-415/615 35 Faloutsos/Pavlo CMU SCS 15-415/615 36
Recommend
More recommend