query optimization
play

Query Optimization Lecture # 13 Database Systems Andy Pavlo AP - PowerPoint PPT Presentation

Query Optimization Lecture # 13 Database Systems Andy Pavlo AP AP Computer Science 15-445/15-645 Carnegie Mellon Univ. Fall 2018 2 ADM IN ISTRIVIA Mid-term Exam is on Wednesday October 17 th See mid-term exam guide for more info.


  1. Query Optimization Lecture # 13 Database Systems Andy Pavlo AP AP Computer Science 15-445/15-645 Carnegie Mellon Univ. Fall 2018

  2. 2 ADM IN ISTRIVIA Mid-term Exam is on Wednesday October 17 th → See mid-term exam guide for more info. Project #2 – Checkpoint #2 is due Friday October 19 th @ 11:59pm. CMU 15-445/645 (Fall 2018)

  3. 4 Q UERY O PTIM IZATIO N Remember that SQL is declarative. → User tells the DBMS what answer they want, not how to get the answer. There can be a big difference in performance based on plan is used: → See last week: 1.3 hours vs. 0.45 seconds CMU 15-445/645 (Fall 2018)

  4. 5 IBM SYSTEM R First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. A lot of the concepts from System R ’s optimizer are still used today. CMU 15-445/645 (Fall 2018)

  5. 6 Q UERY O PTIM IZATIO N Heuristics / Rules → Rewrite the query to remove stupid / inefficient things. → Does not require a cost model. Cost-based Search → Use a cost model to evaluate multiple equivalent plans and pick the one with the lowest cost. CMU 15-445/645 (Fall 2018)

  6. 7 Q UERY PLAN N IN G OVERVIEW System Cost Catalog Model SQL Query Abstract Annotated Annotated Syntax AST AST Tree Parser Binder Optimizer Rewriter (Optional) Name→Internal ID Query Plan CMU 15-445/645 (Fall 2018)

  7. 8 TO DAY'S AGEN DA Relational Algebra Equivalences Plan Cost Estimation Plan Enumeration Nested Sub-queries Mid-Term Review CMU 15-445/645 (Fall 2018)

  8. 9 RELATIO N AL ALGEBRA EQ UIVALEN CES Two relational algebra expressions are equivalent if they generate the same set of tuples. The DBMS can identify better query plans without a cost model. This is often called query rewriting . CMU 15-445/645 (Fall 2018)

  9. 10 PREDICATE PUSH DOWN SELECT s.name, e.cid FROM student AS s, enrolled AS e WHERE s.sid = e.sid AND e.grade = 'A' p p s.name,e.cid s.name,e.cid s ⨝ grade='A' s.sid=e.sid s ⨝ s.sid=e.sid grade='A' stud student ent enrolled enr lled student stud ent enrolled enr lled CMU 15-445/645 (Fall 2018)

  10. 11 RELATIO N AL ALGEBRA EQ UIVALEN CES SELECT s.name, e.cid FROM student AS s, enrolled AS e WHERE s.sid = e.sid AND e.grade = 'A' name, cid ( σ grade='A' ( student ⋈ enrolled )) = name, cid ( student ⋈ ( σ grade='A' ( enrolled ))) CMU 15-445/645 (Fall 2018)

  11. 12 RELATIO N AL ALGEBRA EQ UIVALEN CES Selections: → Perform filters as early as possible. → Reorder predicates so that the DBMS applies the most selective one first. → Break a complex predicate, and push down σ p1 ∧ p2 ∧ … p n ( R ) = σ p1 ( σ p2 (… σ p n ( R ))) Simplify a complex predicate → (X=Y AND Y=3) → X=3 AND Y=3 CMU 15-445/645 (Fall 2018)

  12. 13 RELATIO N AL ALGEBRA EQ UIVALEN CES Projections: → Perform them early to create smaller tuples and reduce intermediate results (if duplicates are eliminated) → Project out all attributes except the ones requested or required (e.g., joining keys) This is not important for a column store… CMU 15-445/645 (Fall 2018)

  13. 14 PRO J ECTIO N PUSH DOWN SELECT s.name, e.cid FROM student AS s, enrolled AS e WHERE s.sid = e.sid AND e.grade = 'A' p p s.name,e.cid s.name,e.cid ⨝ s.sid=e.sid ⨝ s.sid=e.sid p s sid,name p sid,cid s grade='A' grade='A' stud student ent enr enrolled lled student stud ent enr enrolled lled CMU 15-445/645 (Fall 2018)

  14. 15 CREATE TABLE A ( id INT PRIMARY KEY , M O RE EXAM PLES val INT NOT NULL ); Impossible / Unnecessary Predicates SELECT * FROM A WHERE 1 = 0; X Source: Lukas Eder CMU 15-445/645 (Fall 2018)

  15. 15 CREATE TABLE A ( id INT PRIMARY KEY , M O RE EXAM PLES val INT NOT NULL ); Impossible / Unnecessary Predicates X SELECT * FROM A WHERE 1 = 0; SELECT * FROM A WHERE 1 = 1; Source: Lukas Eder CMU 15-445/645 (Fall 2018)

  16. 15 CREATE TABLE A ( id INT PRIMARY KEY , M O RE EXAM PLES val INT NOT NULL ); Impossible / Unnecessary Predicates X SELECT * FROM A WHERE 1 = 0; SELECT * FROM A WHERE 1 = 1; SELECT * FROM A; Source: Lukas Eder CMU 15-445/645 (Fall 2018)

  17. 15 CREATE TABLE A ( id INT PRIMARY KEY , M O RE EXAM PLES val INT NOT NULL ); Impossible / Unnecessary Predicates X SELECT * FROM A WHERE 1 = 0; SELECT * FROM A WHERE 1 = 1; SELECT * FROM A; Join Elimination SELECT A1.* FROM A AS A1 JOIN A AS A2 ON A1.id = A2.id; Source: Lukas Eder CMU 15-445/645 (Fall 2018)

  18. 15 CREATE TABLE A ( id INT PRIMARY KEY , M O RE EXAM PLES val INT NOT NULL ); Impossible / Unnecessary Predicates X SELECT * FROM A WHERE 1 = 0; SELECT * FROM A WHERE 1 = 1; SELECT * FROM A; Join Elimination SELECT * FROM A; Source: Lukas Eder CMU 15-445/645 (Fall 2018)

  19. 16 CREATE TABLE A ( id INT PRIMARY KEY , M O RE EXAM PLES val INT NOT NULL ); Ignoring Projections SELECT * FROM A AS A1 WHERE EXISTS ( SELECT * FROM A AS A2 WHERE A1.id = A2.id); Source: Lukas Eder CMU 15-445/645 (Fall 2018)

  20. 16 CREATE TABLE A ( id INT PRIMARY KEY , M O RE EXAM PLES val INT NOT NULL ); Ignoring Projections SELECT * FROM A; Source: Lukas Eder CMU 15-445/645 (Fall 2018)

  21. 16 CREATE TABLE A ( id INT PRIMARY KEY , M O RE EXAM PLES val INT NOT NULL ); Ignoring Projections SELECT * FROM A; Merging Predicates SELECT * FROM A WHERE val BETWEEN 1 AND 100 OR val BETWEEN 50 AND 150; Source: Lukas Eder CMU 15-445/645 (Fall 2018)

  22. 16 CREATE TABLE A ( id INT PRIMARY KEY , M O RE EXAM PLES val INT NOT NULL ); Ignoring Projections SELECT * FROM A; Merging Predicates SELECT * FROM A WHERE val BETWEEN 1 AND 150; Source: Lukas Eder CMU 15-445/645 (Fall 2018)

  23. 17 RELATIO N AL ALGEBRA EQ UIVALEN CES Joins: → Commutative, associative R ⋈ S = S ⋈ R (R ⋈ S) ⋈ T = R ⋈ (S ⋈ T) How many different orderings are there for an n - way join? CMU 15-445/645 (Fall 2018)

  24. 18 RELATIO N AL ALGEBRA EQ UIVALEN CES How many different orderings are there for an n - way join? Catalan number ≈4 n → Exhaustive enumeration will be too slow. We’ll see in a second how an optimizer limits the search space... CMU 15-445/645 (Fall 2018)

  25. 19 CO ST ESTIM ATIO N How long will a query take? → CPU: Small cost; tough to estimate → Disk: # of block transfers → Memory: Amount of DRAM used → Network: # of messages How many tuples will be read/written? What statistics do we need to keep? CMU 15-445/645 (Fall 2018)

  26. 20 STATISTICS The DBMS stores internal statistics about tables, attributes, and indexes in its internal catalog. Different systems update them at different times. Manual invocations: → Postgres/SQLite: ANALYZE → Oracle/MySQL: ANALYZE TABLE → SQL Server: UPDATE STATISTICS → DB2: RUNSTATS CMU 15-445/645 (Fall 2018)

  27. 21 STATISTICS For each relation R , the DBMS maintains the following information: → N R : Number of tuples in R . → V(A,R) : Number of distinct values for attribute A . CMU 15-445/645 (Fall 2018)

  28. 22 DERIVABLE STATISTICS The selection cardinality SC(A,R) is the average number of records with a value for an attribute A given N R / V(A,R) Note that this assumes data uniformity. → 10,000 students, 10 colleges – how many students in SCS? CMU 15-445/645 (Fall 2018)

  29. 23 SELECTIO N STATISTICS Equality predicates on unique keys are SELECT * FROM people easy to estimate. WHERE id = 123 What about more complex SELECT * FROM people predicates? What is their selectivity? WHERE val > 1000 SELECT * FROM people WHERE age = 30 AND status = 'Lit' CMU 15-445/645 (Fall 2018)

  30. 24 CO M PLEX PREDICATES The selectivity ( sel ) of a predicate P is the fraction of tuples that qualify. Formula depends on type of predicate: → Equality → Range → Negation → Conjunction → Disjunction CMU 15-445/645 (Fall 2018)

  31. 24 CO M PLEX PREDICATES The selectivity ( sel ) of a predicate P is the fraction of tuples that qualify. Formula depends on type of predicate: → Equality → Range → Negation → Conjunction → Disjunction CMU 15-445/645 (Fall 2018)

  32. 25 SELECTIO N S CO M PLEX PREDICATES Assume that V(age,people) has five distinct values (0 – 4) and N R = 5 SELECT * FROM people Equality Predicate: A=constant WHERE age = 2 → sel(A=constant) = SC(P) / V(A,R) → Example: sel(age=2) = count 0 1 2 3 4 age CMU 15-445/645 (Fall 2018)

  33. 25 SELECTIO N S CO M PLEX PREDICATES Assume that V(age,people) has five distinct values (0 – 4) and N R = 5 SELECT * FROM people Equality Predicate: A=constant WHERE age = 2 → sel(A=constant) = SC(P) / V(A,R) → Example: sel(age=2) = count V(age,people)=5 0 1 2 3 4 age CMU 15-445/645 (Fall 2018)

  34. 25 SELECTIO N S CO M PLEX PREDICATES Assume that V(age,people) has five distinct values (0 – 4) and N R = 5 SELECT * FROM people Equality Predicate: A=constant WHERE age = 2 → sel(A=constant) = SC(P) / V(A,R) → Example: sel(age=2) = 1/5 SC(age=2)=1 count V(age,people)=5 0 1 2 3 4 age CMU 15-445/645 (Fall 2018)

  35. 26 SELECTIO N S CO M PLEX PREDICATES Range Query: → sel(A>=a) = (A max – a) / (A max – A min ) SELECT * FROM people → Example: sel(age>=2) = (4 – 2) / (4 – 0) WHERE age >= 2 = 1/2 count age min = 0 age max = 4 0 1 2 3 4 age CMU 15-445/645 (Fall 2018)

  36. 27 SELECTIO N S CO M PLEX PREDICATES Negation Query: → sel(not P) = 1 – sel(P) SELECT * FROM people → Example: sel(age != 2) WHERE age != 2 SC(age=2)=1 count 0 1 2 3 4 age CMU 15-445/645 (Fall 2018)

Recommend


More recommend