advanced database systems
play

ADVANCED DATABASE SYSTEMS Query Execution & Processing @ - PowerPoint PPT Presentation

Lect ure # 15 ADVANCED DATABASE SYSTEMS Query Execution & Processing @ Andy_Pavlo // 15- 721 // Spring 2019 CMU 15-721 (Spring 2019) 2 ARCH ITECTURE OVERVIEW Networking Layer SQL Query SQL Parser Planner Binder Rewriter


  1. Lect ure # 15 ADVANCED DATABASE SYSTEMS Query Execution & Processing @ Andy_Pavlo // 15- 721 // Spring 2019

  2. CMU 15-721 (Spring 2019) 2 ARCH ITECTURE OVERVIEW Networking Layer SQL Query SQL Parser Planner Binder Rewriter Optimizer / Cost Models Compiler Scheduling / Placement We Are Here Execution Engine Concurrency Control Operator Execution Indexes Storage Manager Storage Models Logging / Checkpoints

  3. CMU 15-721 (Spring 2019) 3 O PERATO R EXECUTIO N Query Plan Processing Application Logic Execution (UDFs) Parallel Join Algorithms Vectorized Operators Query Compilation

  4. CMU 15-721 (Spring 2019) 4 Q UERY EXECUTIO N A query plan is comprised of operators . An operator instance is an invocation of an operator on some segment of data. A task is the execution of a sequence of one or more operator instances.

  5. CMU 15-721 (Spring 2019) 5 EXECUTIO N O PTIM IZATIO N We are now going to start discussing ways to improve the DBMS's query execution performance for data sets that fit entirely in memory. There are other bottlenecks to target when we remove the disk.

  6. CMU 15-721 (Spring 2019) 6 O PTIM IZATIO N GOALS Approach #1: Reduce Instruction Count → Use fewer instructions to do the same amount of work. Approach #2: Reduce Cycles per Instruction → Execute more CPU instructions in fewer cycles. → This means reducing cache misses and stalls due to memory load/stores. Approach #3: Parallelize Execution → Use multiple threads to compute each query in parallel.

  7. CMU 15-721 (Spring 2019) 7 MonetDB/X100 Analysis Processing Models Parallel Execution

  8. CMU 15-721 (Spring 2019) 8 M O N ETDB/ X10 0 Low-level analysis of execution bottlenecks for in- memory DBMSs on OLAP workloads. → Show how DBMS are designed incorrectly for modern CPU architectures. Based on these findings, they proposed a new DBMS called MonetDB/X100. → Renamed to Vectorwise and acquired by Actian in 2010. → Rebranded as Vector and Avalance MONETDB/X100: HYPER- PIPELINING QUERY EXECUTION CIDR 2 2005

  9. CMU 15-721 (Spring 2019) 8 M O N ETDB/ X10 0 Low-level analysis of execution bottlenecks for in- memory DBMSs on OLAP workloads. → Show how DBMS are designed incorrectly for modern CPU architectures. Based on these findings, they proposed a new DBMS called MonetDB/X100. → Renamed to Vectorwise and acquired by Actian in 2010. → Rebranded as Vector and Avalance MONETDB/X100: HYPER- PIPELINING QUERY EXECUTION CIDR 2 2005

  10. CMU 15-721 (Spring 2019) 8 M O N ETDB/ X10 0 Low-level analysis of execution bottlenecks for in- memory DBMSs on OLAP workloads. → Show how DBMS are designed incorrectly for modern CPU architectures. Based on these findings, they proposed a new DBMS called MonetDB/X100. → Renamed to Vectorwise and acquired by Actian in 2010. → Rebranded as Vector and Avalance MONETDB/X100: HYPER- PIPELINING QUERY EXECUTION CIDR 2 2005

  11. CMU 15-721 (Spring 2019) 9 CPU OVERVIEW CPUs organize instructions into pipeline stages . The goal is to keep all parts of the processor busy at each cycle by masking delays from instructions that cannot complete in a single cycle. Super-scalar CPUs support multiple pipelines. → Execute multiple instructions in parallel in a single cycle if they are independent. → Flynn's Taxonomy: Single Instruction stream, Single Data stream ( SISD )

  12. CMU 15-721 (Spring 2019) 10 DBM S / CPU PRO BLEM S Problem #1: Dependencies → If one instruction depends on another instruction, then it cannot be pushed immediately into the same pipeline. Problem #2: Branch Prediction → The CPU tries to predict what branch the program will take and fill in the pipeline with its instructions. → If it gets it wrong, it has to throw away any speculative work and flush the pipeline.

  13. CMU 15-721 (Spring 2019) 11 BRAN CH M ISPREDICTIO N Because of long pipelines, CPUs will speculatively execute branches. This potentially hides the long stalls between dependent instructions. The most executed branching code in a DBMS is the filter operation during a sequential scan. But this is (nearly) impossible to predict correctly.

  14. CMU 15-721 (Spring 2019) 12 SELECTIO N SCAN S SELECT * FROM table WHERE key >= $(low) AND key <= $(high) Source: Bogdan Raducanu

  15. CMU 15-721 (Spring 2019) 12 SELECTIO N SCAN S Scalar (Branching) i = 0 for t in table : key = t.key if ( key≥ low ) && ( key≤ high ): copy (t, output[i]) i = i + 1 Source: Bogdan Raducanu

  16. CMU 15-721 (Spring 2019) 12 SELECTIO N SCAN S Scalar (Branching) i = 0 for t in table : key = t.key if ( key≥ low ) && ( key≤ high ): copy (t, output[i]) i = i + 1 Source: Bogdan Raducanu

  17. CMU 15-721 (Spring 2019) 12 SELECTIO N SCAN S Scalar (Branching) Scalar (Branchless) i = 0 i = 0 for t in table : for t in table : key = t.key copy (t, output[i]) if ( key≥ low ) && ( key≤ high ): key = t.key copy (t, output[i]) m = ( key≥ low ? 1 : 0) && ⮱ ( key≤ high ? 1 : 0) i = i + 1 i = i + m Source: Bogdan Raducanu

  18. CMU 15-721 (Spring 2019) 12 SELECTIO N SCAN S Scalar (Branching) Scalar (Branchless) i = 0 i = 0 for t in table : for t in table : key = t.key copy (t, output[i]) if ( key≥ low ) && ( key≤ high ): key = t.key copy (t, output[i]) m = ( key≥ low ? 1 : 0) && ⮱ ( key≤ high ? 1 : 0) i = i + 1 i = i + m Source: Bogdan Raducanu

  19. CMU 15-721 (Spring 2019) 13 SELECTIO N SCAN S Source: Bogdan Raducanu

  20. CMU 15-721 (Spring 2019) 14 EXCESSIVE IN STRUCTIO NS The DBMS needs to support different data types, so it must check a values type before it performs any operation on that value. → This is usually implemented as giant switch statements. → Also creates more branches that can be difficult for the CPU to predict reliably. Example: Postgres' addition for NUMERIC types.

  21. CMU 15-721 (Spring 2019) 14 EXCESSIVE IN STRUCTIO NS The DBMS needs to support different data types, so it must check a values type before it performs any operation on that value. → This is usually implemented as giant switch statements. → Also creates more branches that can be difficult for the CPU to predict reliably. Example: Postgres' addition for NUMERIC types.

  22. CMU 15-721 (Spring 2019) 15 PRO CESSIN G M O DEL A DBMS's processing model defines how the system executes a query plan. → Different trade-offs for different workloads. Approach #1: Iterator Model Approach #2: Materialization Model Approach #3: Vectorized / Batch Model

  23. CMU 15-721 (Spring 2019) 16 ITERATO R M O DEL Each query plan operator implements a next function. → On each invocation, the operator returns either a single tuple or a null marker if there are no more tuples. → The operator implements a loop that calls next on its children to retrieve their tuples and then process them. Also called Volcano or Pipeline Model.

  24. CMU 15-721 (Spring 2019) 17 ITERATO R M O DEL SELECT A.id, B.value for t in child.Next() : FROM A, B emit ( projection (t)) WHERE A.id = B.id AND B.value > 100 for t 1 in left.Next() : p buildHashTable (t 1 ) for t 2 in right.Next() : A.id, B.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ A.id=B.id for t in child.Next() : s if evalPred (t): emit (t) value>100 for t in A : for t in B : emit (t) A B emit (t)

  25. CMU 15-721 (Spring 2019) 17 ITERATO R M O DEL SELECT A.id, B.value for t in child.Next() : 1 FROM A, B emit ( projection (t)) WHERE A.id = B.id AND B.value > 100 for t 1 in left.Next() : p buildHashTable (t 1 ) for t 2 in right.Next() : A.id, B.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ A.id=B.id for t in child.Next() : s if evalPred (t): emit (t) value>100 for t in A : for t in B : emit (t) A B emit (t)

  26. CMU 15-721 (Spring 2019) 17 ITERATO R M O DEL SELECT A.id, B.value for t in child.Next() : 1 FROM A, B emit ( projection (t)) WHERE A.id = B.id AND B.value > 100 for t 1 in left.Next() : p 2 buildHashTable (t 1 ) for t 2 in right.Next() : A.id, B.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ A.id=B.id for t in child.Next() : s if evalPred (t): emit (t) value>100 for t in A : for t in B : emit (t) A B emit (t)

  27. CMU 15-721 (Spring 2019) 17 ITERATO R M O DEL SELECT A.id, B.value for t in child.Next() : 1 FROM A, B emit ( projection (t)) WHERE A.id = B.id AND B.value > 100 for t 1 in left.Next() : p 2 buildHashTable (t 1 ) for t 2 in right.Next() : A.id, B.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ A.id=B.id for t in child.Next() : Single Tuple s if evalPred (t): emit (t) value>100 for t in A : for t in B : 3 emit (t) A B emit (t)

  28. CMU 15-721 (Spring 2019) 17 ITERATO R M O DEL SELECT A.id, B.value for t in child.Next() : 1 FROM A, B emit ( projection (t)) WHERE A.id = B.id AND B.value > 100 for t 1 in left.Next() : p 2 buildHashTable (t 1 ) for t 2 in right.Next() : A.id, B.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ A.id=B.id for t in child.Next() : 4 s if evalPred (t): emit (t) value>100 for t in A : for t in B : 3 5 emit (t) A B emit (t)

  29. CMU 15-721 (Spring 2019) 17 ITERATO R M O DEL SELECT A.id, B.value for t in child.Next() : 1 FROM A, B emit ( projection (t)) WHERE A.id = B.id AND B.value > 100 for t 1 in left.Next() : p 2 buildHashTable (t 1 ) for t 2 in right.Next() : A.id, B.value if probe (t 2 ): emit (t 1 ⨝ t 2 ) ⨝ A.id=B.id for t in child.Next() : 4 s if evalPred (t): emit (t) value>100 for t in A : for t in B : 3 5 emit (t) A B emit (t)

Recommend


More recommend