Query Execution 12 Part I Intro to Database Systems Andy Pavlo AP AP 15-445/15-645 Computer Science Carnegie Mellon University Fall 2019
2 ADM IN ISTRIVIA Homework #3 is due Wed Oct 9 th @ 11:59pm Mid-Term Exam is Wed Oct 16th @ 12:00pm Project #2 is due Sun Oct 20 th @ 11:59pm CMU 15-445/645 (Fall 2019)
3 Q UERY PLAN SELECT R.id, S.cdate The operators are arranged in a tree. FROM R JOIN S ON R.id = S.id Data flows from the leaves of the tree WHERE S.value > 100 p up towards the root. R.id, S.value The output of the root node is the ⨝ R.id=S.id result of the query. s value>100 R S CMU 15-445/645 (Fall 2019)
4 TO DAY'S AGEN DA Processing Models Access Methods Expression Evaluation CMU 15-445/645 (Fall 2019)
5 PRO CESSIN G M O DEL A DBMS's processing model defines how the system executes a query plan. → Different trade-offs for different workloads. Approach #1: Iterator Model Approach #2: Materialization Model Approach #3: Vectorized / Batch Model CMU 15-445/645 (Fall 2019)
6 ITERATO R M O DEL Each query plan operator implements a Next function. → On each invocation, the operator returns either a single tuple or a null marker if there are no more tuples. → The operator implements a loop that calls next on its children to retrieve their tuples and then process them. Also called Volcano or Pipeline Model. CMU 15-445/645 (Fall 2019)
7 ITERATO R M O DEL SELECT R.id, S.cdate Next() for t in child.Next() : FROM R JOIN S emit ( projection (t)) ON R.id = S.id WHERE S.value > 100 Next() for t 1 in left.Next() : p buildHashTable (t 1 ) for t 2 in right.Next() : R.id, S.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ R.id=S.id Next() for t in child.Next() : s if evalPred (t): emit (t) value>100 Next() Next() for t in R: for t in S: R S emit (t) emit (t) CMU 15-445/645 (Fall 2019)
7 ITERATO R M O DEL SELECT R.id, S.cdate for t in child.Next() : 1 FROM R JOIN S emit ( projection (t)) ON R.id = S.id WHERE S.value > 100 for t 1 in left.Next() : p buildHashTable (t 1 ) for t 2 in right.Next() : R.id, S.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ R.id=S.id for t in child.Next() : s if evalPred (t): emit (t) value>100 for t in R: for t in S: R S emit (t) emit (t) CMU 15-445/645 (Fall 2019)
7 ITERATO R M O DEL SELECT R.id, S.cdate for t in child.Next() : 1 FROM R JOIN S emit ( projection (t)) ON R.id = S.id WHERE S.value > 100 for t 1 in left.Next() : p 2 buildHashTable (t 1 ) for t 2 in right.Next() : R.id, S.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ R.id=S.id for t in child.Next() : s if evalPred (t): emit (t) value>100 for t in R: for t in S: R S emit (t) emit (t) CMU 15-445/645 (Fall 2019)
7 ITERATO R M O DEL SELECT R.id, S.cdate for t in child.Next() : 1 FROM R JOIN S emit ( projection (t)) ON R.id = S.id WHERE S.value > 100 for t 1 in left.Next() : p 2 buildHashTable (t 1 ) for t 2 in right.Next() : R.id, S.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ R.id=S.id for t in child.Next() : Single Tuple s if evalPred (t): emit (t) value>100 for t in R: for t in S: 3 R S emit (t) emit (t) CMU 15-445/645 (Fall 2019)
7 ITERATO R M O DEL SELECT R.id, S.cdate for t in child.Next() : 1 FROM R JOIN S emit ( projection (t)) ON R.id = S.id WHERE S.value > 100 for t 1 in left.Next() : p 2 buildHashTable (t 1 ) for t 2 in right.Next() : R.id, S.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ R.id=S.id for t in child.Next() : 4 s if evalPred (t): emit (t) value>100 for t in R: for t in S: 3 5 R S emit (t) emit (t) CMU 15-445/645 (Fall 2019)
7 ITERATO R M O DEL SELECT R.id, S.cdate for t in child.Next() : 1 FROM R JOIN S emit ( projection (t)) ON R.id = S.id WHERE S.value > 100 for t 1 in left.Next() : p 2 buildHashTable (t 1 ) for t 2 in right.Next() : R.id, S.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ R.id=S.id for t in child.Next() : 4 s if evalPred (t): emit (t) value>100 for t in R: for t in S: 3 5 R S emit (t) emit (t) CMU 15-445/645 (Fall 2019)
8 ITERATO R M O DEL This is used in almost every DBMS. Allows for tuple pipelining . Some operators have to block until their children emit all of their tuples. → Joins, Subqueries, Order By Output control works easily with this approach. CMU 15-445/645 (Fall 2019)
9 M ATERIALIZATIO N M O DEL Each operator processes its input all at once and then emits its output all at once. → The operator "materializes" its output as a single result. → The DBMS can push down hints into to avoid scanning too many tuples. → Can send either a materialized row or a single column. The output can be either whole tuples (NSM) or subsets of columns (DSM) CMU 15-445/645 (Fall 2019)
10 M ATERIALIZATIO N M O DEL out = [ ] for t in child.Output() : 1 SELECT R.id, S.cdate out.add ( projection (t)) FROM R JOIN S return out ON R.id = S.id out = [ ] WHERE S.value > 100 for t 1 in left.Output() : p buildHashTable (t 1 ) for t 2 in right.Output() : R.id, S.value if probe (t 2 ): out.add (t 1 ⨝ t 2 ) return out ⨝ R.id=S.id out = [ ] for t in child.Output() : s if evalPred (t): out.add (t) value>100 return out out = [ ] out = [ ] R S for t in R: for t in S: out.add (t) out.add (t) return out return out CMU 15-445/645 (Fall 2019)
10 M ATERIALIZATIO N M O DEL out = [ ] for t in child.Output() : 1 SELECT R.id, S.cdate out.add ( projection (t)) FROM R JOIN S return out ON R.id = S.id out = [ ] WHERE S.value > 100 for t 1 in left.Output() : p 2 buildHashTable (t 1 ) for t 2 in right.Output() : R.id, S.value if probe (t 2 ): out.add (t 1 ⨝ t 2 ) return out ⨝ R.id=S.id out = [ ] for t in child.Output() : s if evalPred (t): out.add (t) All Tuples value>100 return out out = [ ] out = [ ] R S 3 for t in R: for t in S: out.add (t) out.add (t) return out return out CMU 15-445/645 (Fall 2019)
10 M ATERIALIZATIO N M O DEL out = [ ] for t in child.Output() : 1 SELECT R.id, S.cdate out.add ( projection (t)) FROM R JOIN S return out ON R.id = S.id out = [ ] WHERE S.value > 100 for t 1 in left.Output() : p 2 buildHashTable (t 1 ) for t 2 in right.Output() : R.id, S.value if probe (t 2 ): out.add (t 1 ⨝ t 2 ) return out ⨝ R.id=S.id out = [ ] for t in child.Output() : s 4 if evalPred (t): out.add (t) value>100 return out out = [ ] out = [ ] R S 3 5 for t in R: for t in S: out.add (t) out.add (t) return out return out CMU 15-445/645 (Fall 2019)
10 M ATERIALIZATIO N M O DEL out = [ ] for t in child.Output() : 1 SELECT R.id, S.cdate out.add ( projection (t)) FROM R JOIN S return out ON R.id = S.id out = [ ] WHERE S.value > 100 for t 1 in left.Output() : p 2 buildHashTable (t 1 ) for t 2 in right.Output() : R.id, S.value if probe (t 2 ): out.add (t 1 ⨝ t 2 ) return out ⨝ R.id=S.id out = [ ] for t in child.Output() : s 4 if evalPred (t): out.add (t) value>100 return out out = [ ] out = [ ] R S 3 5 for t in R: for t in S: out.add (t) out.add (t) return out return out CMU 15-445/645 (Fall 2019)
11 M ATERIALIZATIO N M O DEL Better for OLTP workloads because queries only access a small number of tuples at a time. → Lower execution / coordination overhead. → Fewer function calls. Not good for OLAP queries with large intermediate results. CMU 15-445/645 (Fall 2019)
12 VECTO RIZATIO N M O DEL Like the Iterator Model where each operator implements a Next function in this model. Each operator emits a batch of tuples instead of a single tuple. → The operator's internal loop processes multiple tuples at a time. → The size of the batch can vary based on hardware or query properties. CMU 15-445/645 (Fall 2019)
13 VECTO RIZATIO N M O DEL out = [ ] 1 for t in child.Next() : SELECT R.id, S.cdate out.add ( projection (t)) FROM R JOIN S if | out |> n : emit ( out ) ON R.id = S.id out = [ ] 2 WHERE S.value > 100 for t 1 in left.Next() : p buildHashTable (t 1 ) for t 2 in right.Next() : R.id, S.value if probe (t 2 ): out.add (t 1 ⨝ t 2 ) if | out |> n : emit ( out ) ⨝ R.id=S.id out = [ ] for t in child.Next() : s if evalPred (t): out.add (t) if | out |> n : emit ( out ) value>100 Tuple Batch out = [ ] out = [ ] 3 R S for t in R: for t in S: out.add (t) out.add (t) if | out |> n : emit ( out ) if | out |> n : emit ( out ) CMU 15-445/645 (Fall 2019)
13 VECTO RIZATIO N M O DEL out = [ ] 1 for t in child.Next() : SELECT R.id, S.cdate out.add ( projection (t)) FROM R JOIN S if | out |> n : emit ( out ) ON R.id = S.id out = [ ] 2 WHERE S.value > 100 for t 1 in left.Next() : p buildHashTable (t 1 ) for t 2 in right.Next() : R.id, S.value if probe (t 2 ): out.add (t 1 ⨝ t 2 ) if | out |> n : emit ( out ) ⨝ R.id=S.id out = [ ] 4 for t in child.Next() : s if evalPred (t): out.add (t) if | out |> n : emit ( out ) value>100 Tuple Batch out = [ ] out = [ ] 3 5 R S for t in R: for t in S: out.add (t) out.add (t) if | out |> n : emit ( out ) if | out |> n : emit ( out ) CMU 15-445/645 (Fall 2019)
14 VECTO RIZATIO N M O DEL Ideal for OLAP queries because it greatly reduces the number of invocations per operator. Allows for operators to use vectorized (SIMD) instructions to process batches of tuples. CMU 15-445/645 (Fall 2019)
Recommend
More recommend