query processing
play

Query Processing Lecture # 10 Database Systems Andy Pavlo AP AP - PowerPoint PPT Presentation

Query Processing Lecture # 10 Database Systems Andy Pavlo AP AP Computer Science 15-445/15-645 Carnegie Mellon Univ. Fall 2018 2 ADM IN ISTRIVIA Project #2 Checkpoint #1 is due Monday October 9 th @ 11:59pm Mid-term Exam is on


  1. Query Processing Lecture # 10 Database Systems Andy Pavlo AP AP Computer Science 15-445/15-645 Carnegie Mellon Univ. Fall 2018

  2. 2 ADM IN ISTRIVIA Project #2 – Checkpoint #1 is due Monday October 9 th @ 11:59pm Mid-term Exam is on Wednesday October 17 th (in class) CMU 15-445/645 (Fall 2018)

  3. 3 UPCO M IN G DATABASE EVEN TS SQream DB Tech Talk → Thursday Oct 4 th @ 12:00pm → CIC 4 th Floor CMU 15-445/645 (Fall 2018)

  4. 4 Q UERY PLAN SELECT A.id, B.value The operators are arranged in a tree. FROM A, B Data flows from the leaves toward the WHERE A.id = B.id AND B.value > 100 root. p A.id, B.value The output of the root node is the result of the query. ⨝ A.id=B.id s value>100 A B CMU 15-445/645 (Fall 2018)

  5. 5 TO DAY'S AGEN DA Processing Models Access Methods Expression Evaluation CMU 15-445/645 (Fall 2018)

  6. 6 PRO CESSIN G M O DEL A DBMS's processing model defines how the system executes a query plan. → Different trade-offs for different workloads. Three approaches: → Iterator Model → Materialization Model → Vectorized / Batch Model CMU 15-445/645 (Fall 2018)

  7. 7 ITERATO R M O DEL Each query plan operator implements a next function. → On each invocation, the operator returns either a single tuple or a null marker if there are no more tuples. → The operator implements a loop that calls next on its children to retrieve their tuples and then process them. Top-down plan processing. Also called Volcano or Pipeline Model. CMU 15-445/645 (Fall 2018)

  8. 8 ITERATO R M O DEL SELECT A.id, B.value for t in child.Next() : FROM A, B emit ( projection (t)) WHERE A.id = B.id AND B.value > 100 for t 1 in left.Next() : p buildHashTable (t 1 ) for t 2 in right.Next() : A.id, B.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ A.id=B.id for t in child.Next() : s if evalPred (t): emit (t) value>100 for t in A : for t in B : emit (t) A B emit (t) CMU 15-445/645 (Fall 2018)

  9. 8 ITERATO R M O DEL SELECT A.id, B.value for t in child.Next() : 1 FROM A, B emit ( projection (t)) WHERE A.id = B.id AND B.value > 100 for t 1 in left.Next() : p buildHashTable (t 1 ) for t 2 in right.Next() : A.id, B.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ A.id=B.id for t in child.Next() : s if evalPred (t): emit (t) value>100 for t in A : for t in B : emit (t) A B emit (t) CMU 15-445/645 (Fall 2018)

  10. 8 ITERATO R M O DEL SELECT A.id, B.value for t in child.Next() : 1 FROM A, B emit ( projection (t)) WHERE A.id = B.id AND B.value > 100 for t 1 in left.Next() : p 2 buildHashTable (t 1 ) for t 2 in right.Next() : A.id, B.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ A.id=B.id for t in child.Next() : s if evalPred (t): emit (t) value>100 for t in A : for t in B : 3 emit (t) A B emit (t) CMU 15-445/645 (Fall 2018)

  11. 8 ITERATO R M O DEL SELECT A.id, B.value for t in child.Next() : 1 FROM A, B emit ( projection (t)) WHERE A.id = B.id AND B.value > 100 for t 1 in left.Next() : p 2 buildHashTable (t 1 ) for t 2 in right.Next() : A.id, B.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ A.id=B.id for t in child.Next() : 4 s if evalPred (t): emit (t) value>100 for t in A : for t in B : 3 5 emit (t) A B emit (t) CMU 15-445/645 (Fall 2018)

  12. 9 ITERATO R M O DEL This is used in almost every DBMS. Allows for tuple pipelining . Some operators will block until children emit all of their tuples. → Joins, Subqueries, Order By Output control works easily with this approach. → Limit CMU 15-445/645 (Fall 2018)

  13. 10 M ATERIALIZATIO N M O DEL Each operator processes its input all at once and then emits its output all at once. → The operator "materializes" it output as a single result. → The DBMS can push down hints into to avoid scanning too many tuples. Bottom-up plan processing. CMU 15-445/645 (Fall 2018)

  14. 11 M ATERIALIZATIO N M O DEL SELECT A.id, B.value out = { } FROM A, B for t in child.Output() : out.add ( projection (t)) WHERE A.id = B.id AND B.value > 100 out = { } p for t 1 in left.Output() : buildHashTable (t 1 ) A.id, B.value for t 2 in right.Output() : if probe (t 2 ): out.add (t 1 ⨝ t 2 ) ⨝ A.id=B.id out = { } for t in child.Output() : s if evalPred (t): out.add (t) value>100 out = { } out = { } 1 for t in A : for t in B : A B out.add (t) out.add (t) CMU 15-445/645 (Fall 2018)

  15. 11 M ATERIALIZATIO N M O DEL SELECT A.id, B.value out = { } FROM A, B for t in child.Output() : out.add ( projection (t)) WHERE A.id = B.id AND B.value > 100 out = { } p for t 1 in left.Output() : buildHashTable (t 1 ) A.id, B.value for t 2 in right.Output() : if probe (t 2 ): out.add (t 1 ⨝ t 2 ) ⨝ A.id=B.id out = { } for t in child.Output() : s if evalPred (t): out.add (t) value>100 out = { } out = { } 1 for t in A : for t in B : A B out.add (t) out.add (t) CMU 15-445/645 (Fall 2018)

  16. 11 M ATERIALIZATIO N M O DEL SELECT A.id, B.value out = { } FROM A, B for t in child.Output() : out.add ( projection (t)) WHERE A.id = B.id AND B.value > 100 out = { } p for t 1 in left.Output() : buildHashTable (t 1 ) A.id, B.value for t 2 in right.Output() : if probe (t 2 ): out.add (t 1 ⨝ t 2 ) ⨝ A.id=B.id out = { } 3 for t in child.Output() : s if evalPred (t): out.add (t) value>100 out = { } out = { } 1 2 for t in A : for t in B : A B out.add (t) out.add (t) CMU 15-445/645 (Fall 2018)

  17. 11 M ATERIALIZATIO N M O DEL SELECT A.id, B.value out = { } 5 FROM A, B for t in child.Output() : out.add ( projection (t)) WHERE A.id = B.id AND B.value > 100 out = { } p 4 for t 1 in left.Output() : buildHashTable (t 1 ) A.id, B.value for t 2 in right.Output() : if probe (t 2 ): out.add (t 1 ⨝ t 2 ) ⨝ A.id=B.id out = { } 3 for t in child.Output() : s if evalPred (t): out.add (t) value>100 out = { } out = { } 1 2 for t in A : for t in B : A B out.add (t) out.add (t) CMU 15-445/645 (Fall 2018)

  18. 12 M ATERIALIZATIO N M O DEL Better for OLTP workloads because queries typically only access a small number of tuples at a time. → Lower execution / coordination overhead. Not good for OLAP queries with large intermediate results. CMU 15-445/645 (Fall 2018)

  19. 13 VECTO RIZATIO N M O DEL Like Iterator Model, each operator implements a next function. Each operator emits a batch of tuples instead of a single tuple. → The operator's internal loop processes multiple tuples at a time. → The size of the batch can vary based on hardware or query properties. CMU 15-445/645 (Fall 2018)

  20. 14 VECTO RIZATIO N M O DEL out = { } 1 for t in child.Output() : SELECT A.id, B.value out.add ( projection (t)) FROM A, B if | out |> n : emit ( out ) WHERE A.id = B.id out = { } AND B.value > 100 2 for t 1 in left.Output() : p buildHashTable (t 1 ) for t 2 in right.Output() : A.id, B.value if probe (t 2 ): out.add (t 1 ⨝ t 2 ) if | out |> n : emit ( out ) ⨝ A.id=B.id out = { } for t in child.Output() : s if evalPred (t): out.add (t) if | out |> n : emit ( out ) value>100 out = { } out = { } A B 3 for t in A : for t in B : out.add (t) out.add (t) if | out |> n : emit ( out ) if | out |> n : emit ( out ) CMU 15-445/645 (Fall 2018)

  21. 14 VECTO RIZATIO N M O DEL out = { } 1 for t in child.Output() : SELECT A.id, B.value out.add ( projection (t)) FROM A, B if | out |> n : emit ( out ) WHERE A.id = B.id out = { } AND B.value > 100 2 for t 1 in left.Output() : p buildHashTable (t 1 ) for t 2 in right.Output() : A.id, B.value if probe (t 2 ): out.add (t 1 ⨝ t 2 ) if | out |> n : emit ( out ) ⨝ A.id=B.id out = { } 4 for t in child.Output() : s if evalPred (t): out.add (t) if | out |> n : emit ( out ) value>100 out = { } out = { } A B 3 5 for t in A : for t in B : out.add (t) out.add (t) if | out |> n : emit ( out ) if | out |> n : emit ( out ) CMU 15-445/645 (Fall 2018)

  22. 15 VECTO RIZATIO N M O DEL Ideal for OLAP queries → Greatly reduces the number of invocations per operator. → Allows for operators to use vectorized (SIMD) instructions to process batches of tuples. CMU 15-445/645 (Fall 2018)

  23. 16 PRO CESSIN G M O DELS SUM M ARY Iterator / Volcano → Direction: Top-Down → Emits: Single Tuple → Target: General Purpose Vectorized Materialization → Direction: Top-Down → Direction: Bottom-Up → Emits: Tuple Batch → Emits: Entire Tuple Set → Target: OLAP → Target: OLTP CMU 15-445/645 (Fall 2018)

  24. 17 ACCESS M ETH O DS SELECT A.id, B.value An access method is a way that the FROM A, B DBMS can access the data stored in a WHERE A.id = B.id AND B.value > 100 table. p → Not defined in relational algebra. A.id, B.value Three basic approaches: ⨝ A.id=B.id → Sequential Scan s → Index Scan → Multi-Index / "Bitmap" Scan value>100 A B CMU 15-445/645 (Fall 2018)

  25. 18 SEQ UEN TIAL SCAN For each page in the table: for page in table.pages: → Retrieve it from the buffer pool. for t in page.tuples: → Iterate over each tuple and check whether if evalPred (t): to include it. // Do Something! The DBMS maintains an internal cursor that tracks the last page / slot it examined. CMU 15-445/645 (Fall 2018)

  26. 19 SEQ UEN TIAL SCAN : O PTIM IZATIO N S This is almost always the worst thing that the DBMS can do to execute a query. Sequential Scan Optimizations: → Prefetching → Parallelization → Buffer Pool Bypass → Zone Maps → Late Materialization → Heap Clustering CMU 15-445/645 (Fall 2018)

Recommend


More recommend