Query Optimization Lecture # 13 Database Systems Andy Pavlo AP AP Computer Science 15-445/15-645 Carnegie Mellon Univ. Fall 2018
2 ADM IN ISTRIVIA Mid-term Exam is on Wednesday October 17 th → See mid-term exam guide for more info. Project #2 – Checkpoint #2 is due Friday October 19 th @ 11:59pm. CMU 15-445/645 (Fall 2018)
4 Q UERY O PTIM IZATIO N Remember that SQL is declarative. → User tells the DBMS what answer they want, not how to get the answer. There can be a big difference in performance based on plan is used: → See last week: 1.3 hours vs. 0.45 seconds CMU 15-445/645 (Fall 2018)
5 IBM SYSTEM R First implementation of a query optimizer. People argued that the DBMS could never choose a query plan better than what a human could write. A lot of the concepts from System R ’s optimizer are still used today. CMU 15-445/645 (Fall 2018)
6 Q UERY O PTIM IZATIO N Heuristics / Rules → Rewrite the query to remove stupid / inefficient things. → Does not require a cost model. Cost-based Search → Use a cost model to evaluate multiple equivalent plans and pick the one with the lowest cost. CMU 15-445/645 (Fall 2018)
7 Q UERY PLAN N IN G OVERVIEW System Cost Catalog Model SQL Query Abstract Annotated Annotated Syntax AST AST Tree Parser Binder Optimizer Rewriter (Optional) Name→Internal ID Query Plan CMU 15-445/645 (Fall 2018)
8 TO DAY'S AGEN DA Relational Algebra Equivalences Plan Cost Estimation Plan Enumeration Nested Sub-queries Mid-Term Review CMU 15-445/645 (Fall 2018)
9 RELATIO N AL ALGEBRA EQ UIVALEN CES Two relational algebra expressions are equivalent if they generate the same set of tuples. The DBMS can identify better query plans without a cost model. This is often called query rewriting . CMU 15-445/645 (Fall 2018)
10 PREDICATE PUSH DOWN SELECT s.name, e.cid FROM student AS s, enrolled AS e WHERE s.sid = e.sid AND e.grade = 'A' p p s.name,e.cid s.name,e.cid s ⨝ grade='A' s.sid=e.sid s ⨝ s.sid=e.sid grade='A' stud student ent enrolled enr lled student stud ent enrolled enr lled CMU 15-445/645 (Fall 2018)
11 RELATIO N AL ALGEBRA EQ UIVALEN CES SELECT s.name, e.cid FROM student AS s, enrolled AS e WHERE s.sid = e.sid AND e.grade = 'A' name, cid ( σ grade='A' ( student ⋈ enrolled )) = name, cid ( student ⋈ ( σ grade='A' ( enrolled ))) CMU 15-445/645 (Fall 2018)
12 RELATIO N AL ALGEBRA EQ UIVALEN CES Selections: → Perform filters as early as possible. → Reorder predicates so that the DBMS applies the most selective one first. → Break a complex predicate, and push down σ p1 ∧ p2 ∧ … p n ( R ) = σ p1 ( σ p2 (… σ p n ( R ))) Simplify a complex predicate → (X=Y AND Y=3) → X=3 AND Y=3 CMU 15-445/645 (Fall 2018)
13 RELATIO N AL ALGEBRA EQ UIVALEN CES Projections: → Perform them early to create smaller tuples and reduce intermediate results (if duplicates are eliminated) → Project out all attributes except the ones requested or required (e.g., joining keys) This is not important for a column store… CMU 15-445/645 (Fall 2018)
14 PRO J ECTIO N PUSH DOWN SELECT s.name, e.cid FROM student AS s, enrolled AS e WHERE s.sid = e.sid AND e.grade = 'A' p p s.name,e.cid s.name,e.cid ⨝ s.sid=e.sid ⨝ s.sid=e.sid p s sid,name p sid,cid s grade='A' grade='A' stud student ent enr enrolled lled student stud ent enr enrolled lled CMU 15-445/645 (Fall 2018)
15 CREATE TABLE A ( id INT PRIMARY KEY , M O RE EXAM PLES val INT NOT NULL ); Impossible / Unnecessary Predicates SELECT * FROM A WHERE 1 = 0; X Source: Lukas Eder CMU 15-445/645 (Fall 2018)
15 CREATE TABLE A ( id INT PRIMARY KEY , M O RE EXAM PLES val INT NOT NULL ); Impossible / Unnecessary Predicates X SELECT * FROM A WHERE 1 = 0; SELECT * FROM A WHERE 1 = 1; Source: Lukas Eder CMU 15-445/645 (Fall 2018)
15 CREATE TABLE A ( id INT PRIMARY KEY , M O RE EXAM PLES val INT NOT NULL ); Impossible / Unnecessary Predicates X SELECT * FROM A WHERE 1 = 0; SELECT * FROM A WHERE 1 = 1; SELECT * FROM A; Source: Lukas Eder CMU 15-445/645 (Fall 2018)
15 CREATE TABLE A ( id INT PRIMARY KEY , M O RE EXAM PLES val INT NOT NULL ); Impossible / Unnecessary Predicates X SELECT * FROM A WHERE 1 = 0; SELECT * FROM A WHERE 1 = 1; SELECT * FROM A; Join Elimination SELECT A1.* FROM A AS A1 JOIN A AS A2 ON A1.id = A2.id; Source: Lukas Eder CMU 15-445/645 (Fall 2018)
15 CREATE TABLE A ( id INT PRIMARY KEY , M O RE EXAM PLES val INT NOT NULL ); Impossible / Unnecessary Predicates X SELECT * FROM A WHERE 1 = 0; SELECT * FROM A WHERE 1 = 1; SELECT * FROM A; Join Elimination SELECT * FROM A; Source: Lukas Eder CMU 15-445/645 (Fall 2018)
16 CREATE TABLE A ( id INT PRIMARY KEY , M O RE EXAM PLES val INT NOT NULL ); Ignoring Projections SELECT * FROM A AS A1 WHERE EXISTS ( SELECT * FROM A AS A2 WHERE A1.id = A2.id); Source: Lukas Eder CMU 15-445/645 (Fall 2018)
16 CREATE TABLE A ( id INT PRIMARY KEY , M O RE EXAM PLES val INT NOT NULL ); Ignoring Projections SELECT * FROM A; Source: Lukas Eder CMU 15-445/645 (Fall 2018)
16 CREATE TABLE A ( id INT PRIMARY KEY , M O RE EXAM PLES val INT NOT NULL ); Ignoring Projections SELECT * FROM A; Merging Predicates SELECT * FROM A WHERE val BETWEEN 1 AND 100 OR val BETWEEN 50 AND 150; Source: Lukas Eder CMU 15-445/645 (Fall 2018)
16 CREATE TABLE A ( id INT PRIMARY KEY , M O RE EXAM PLES val INT NOT NULL ); Ignoring Projections SELECT * FROM A; Merging Predicates SELECT * FROM A WHERE val BETWEEN 1 AND 150; Source: Lukas Eder CMU 15-445/645 (Fall 2018)
17 RELATIO N AL ALGEBRA EQ UIVALEN CES Joins: → Commutative, associative R ⋈ S = S ⋈ R (R ⋈ S) ⋈ T = R ⋈ (S ⋈ T) How many different orderings are there for an n - way join? CMU 15-445/645 (Fall 2018)
18 RELATIO N AL ALGEBRA EQ UIVALEN CES How many different orderings are there for an n - way join? Catalan number ≈4 n → Exhaustive enumeration will be too slow. We’ll see in a second how an optimizer limits the search space... CMU 15-445/645 (Fall 2018)
19 CO ST ESTIM ATIO N How long will a query take? → CPU: Small cost; tough to estimate → Disk: # of block transfers → Memory: Amount of DRAM used → Network: # of messages How many tuples will be read/written? What statistics do we need to keep? CMU 15-445/645 (Fall 2018)
20 STATISTICS The DBMS stores internal statistics about tables, attributes, and indexes in its internal catalog. Different systems update them at different times. Manual invocations: → Postgres/SQLite: ANALYZE → Oracle/MySQL: ANALYZE TABLE → SQL Server: UPDATE STATISTICS → DB2: RUNSTATS CMU 15-445/645 (Fall 2018)
21 STATISTICS For each relation R , the DBMS maintains the following information: → N R : Number of tuples in R . → V(A,R) : Number of distinct values for attribute A . CMU 15-445/645 (Fall 2018)
22 DERIVABLE STATISTICS The selection cardinality SC(A,R) is the average number of records with a value for an attribute A given N R / V(A,R) Note that this assumes data uniformity. → 10,000 students, 10 colleges – how many students in SCS? CMU 15-445/645 (Fall 2018)
23 SELECTIO N STATISTICS Equality predicates on unique keys are SELECT * FROM people easy to estimate. WHERE id = 123 What about more complex SELECT * FROM people predicates? What is their selectivity? WHERE val > 1000 SELECT * FROM people WHERE age = 30 AND status = 'Lit' CMU 15-445/645 (Fall 2018)
24 CO M PLEX PREDICATES The selectivity ( sel ) of a predicate P is the fraction of tuples that qualify. Formula depends on type of predicate: → Equality → Range → Negation → Conjunction → Disjunction CMU 15-445/645 (Fall 2018)
24 CO M PLEX PREDICATES The selectivity ( sel ) of a predicate P is the fraction of tuples that qualify. Formula depends on type of predicate: → Equality → Range → Negation → Conjunction → Disjunction CMU 15-445/645 (Fall 2018)
25 SELECTIO N S CO M PLEX PREDICATES Assume that V(age,people) has five distinct values (0 – 4) and N R = 5 SELECT * FROM people Equality Predicate: A=constant WHERE age = 2 → sel(A=constant) = SC(P) / V(A,R) → Example: sel(age=2) = count 0 1 2 3 4 age CMU 15-445/645 (Fall 2018)
25 SELECTIO N S CO M PLEX PREDICATES Assume that V(age,people) has five distinct values (0 – 4) and N R = 5 SELECT * FROM people Equality Predicate: A=constant WHERE age = 2 → sel(A=constant) = SC(P) / V(A,R) → Example: sel(age=2) = count V(age,people)=5 0 1 2 3 4 age CMU 15-445/645 (Fall 2018)
25 SELECTIO N S CO M PLEX PREDICATES Assume that V(age,people) has five distinct values (0 – 4) and N R = 5 SELECT * FROM people Equality Predicate: A=constant WHERE age = 2 → sel(A=constant) = SC(P) / V(A,R) → Example: sel(age=2) = 1/5 SC(age=2)=1 count V(age,people)=5 0 1 2 3 4 age CMU 15-445/645 (Fall 2018)
26 SELECTIO N S CO M PLEX PREDICATES Range Query: → sel(A>=a) = (A max – a) / (A max – A min ) SELECT * FROM people → Example: sel(age>=2) = (4 – 2) / (4 – 0) WHERE age >= 2 = 1/2 count age min = 0 age max = 4 0 1 2 3 4 age CMU 15-445/645 (Fall 2018)
27 SELECTIO N S CO M PLEX PREDICATES Negation Query: → sel(not P) = 1 – sel(P) SELECT * FROM people → Example: sel(age != 2) WHERE age != 2 SC(age=2)=1 count 0 1 2 3 4 age CMU 15-445/645 (Fall 2018)
Recommend
More recommend