database system implementation
play

DATABASE SYSTEM IMPLEMENTATION GT 4420/6422 // SPRING 2019 // - PowerPoint PPT Presentation

DATABASE SYSTEM IMPLEMENTATION GT 4420/6422 // SPRING 2019 // @JOY_ARULRAJ LECTURE #17: QUERY EXECUTION & SCHEDULING 2 TODAYS AGENDA Process Models Query Parallelization Data Placement Scheduling 3 QUERY EXECUTION A query plan is


  1. DATABASE SYSTEM IMPLEMENTATION GT 4420/6422 // SPRING 2019 // @JOY_ARULRAJ LECTURE #17: QUERY EXECUTION & SCHEDULING

  2. 2 TODAY’S AGENDA Process Models Query Parallelization Data Placement Scheduling

  3. 3 QUERY EXECUTION A query plan is comprised of operators . An operator instance is an invocation of an operator on some segment of data. A task is the execution of a sequence of one or more operator instances.

  4. 4 PROCESS MODEL A DBMS’s process model defines how the system is architected to support concurrent requests from a multi-user application. A worker is the DBMS component that is responsible for executing tasks on behalf of the client and returning the results. ARCHITECTURE OF A DATABASE SYSTEM Foundations and Trends in Databases 2007

  5. 5 PROCESS MODELS Approach #1: Process per DBMS Worker Approach #2: Process Pool Approach #3: Thread per DBMS Worker

  6. 6 PROCESS PER WORKER Each worker is a separate OS process. → Relies on OS scheduler. → Use shared-memory for global data structures. → A process crash doesn’t take down entire system. → Examples: IBM DB2, Postgres, Oracle Dispatcher Worker

  7. 7 PROCESS PER WORKER Each worker is a separate OS process. → Relies on OS scheduler. → Use shared-memory for global data structures. → A process crash doesn’t take down entire system. → Examples: IBM DB2, Postgres, Oracle Dispatcher Worker

  8. 8 PROCESS PER WORKER Each worker is a separate OS process. → Relies on OS scheduler. → Use shared-memory for global data structures. → A process crash doesn’t take down entire system. → Examples: IBM DB2, Postgres, Oracle Dispatcher Worker

  9. 9 PROCESS PER WORKER Each worker is a separate OS process. → Relies on OS scheduler. → Use shared-memory for global data structures. → A process crash doesn’t take down entire system. → Examples: IBM DB2, Postgres, Oracle Dispatcher Worker

  10. 10 PROCESS PER WORKER Each worker is a separate OS process. → Relies on OS scheduler. → Use shared-memory for global data structures. → A process crash doesn’t take down entire system. → Examples: IBM DB2, Postgres, Oracle Dispatcher Worker

  11. 11 PROCESS PER WORKER Each worker is a separate OS process. → Relies on OS scheduler. → Use shared-memory for global data structures. → A process crash doesn’t take down entire system. → Examples: IBM DB2, Postgres, Oracle Dispatcher Worker

  12. 12 PROCESS POOL A worker uses any process that is free in a pool → Still relies on OS scheduler and shared memory. → Bad for CPU cache locality. → Examples: IBM DB2, Postgres (2015) Dispatcher Worker Pool

  13. 13 PROCESS POOL A worker uses any process that is free in a pool → Still relies on OS scheduler and shared memory. → Bad for CPU cache locality. → Examples: IBM DB2, Postgres (2015) Dispatcher Worker Pool

  14. 14 PROCESS POOL A worker uses any process that is free in a pool → Still relies on OS scheduler and shared memory. → Bad for CPU cache locality. → Examples: IBM DB2, Postgres (2015) Dispatcher Worker Pool

  15. 15 THREAD PER WORKER Single process with multiple worker threads. → DBMS has to manage its own scheduling. → May or may not use a dispatcher thread. → Thread crash (may) kill the entire system. → Examples: IBM DB2, MSSQL, MySQL, Oracle (2014) Worker Threads

  16. 16 THREAD PER WORKER Single process with multiple worker threads. → DBMS has to manage its own scheduling. → May or may not use a dispatcher thread. → Thread crash (may) kill the entire system. → Examples: IBM DB2, MSSQL, MySQL, Oracle (2014) Worker Threads

  17. 17 PROCESS MODELS Using a multi-threaded architecture has several advantages: → Less overhead per context switch. → Don’t have to manage shared memory. The thread per worker model does not mean that you have intra-query parallelism. I am not aware of any new DBMS built in the last 7-8 years that doesn’t use threads.

  18. 18 SCHEDULING For each query plan, the DBMS has to decide where, when, and how to execute it. → How many tasks should it use? → How many CPU cores should it use? → What CPU core should the tasks execute on? → Where should a task store its output? The DBMS always knows more than the OS.

  19. 19 INTER-QUERY PARALLELISM Improve overall performance by allowing multiple queries to execute simultaneously. → Provide the illusion of isolation through concurrency control scheme. The difficulty of implementing a concurrency control scheme is not significantly affected by the DBMS’s process model.

  20. 20 INTRA-QUERY PARALLELISM Improve the performance of a single query by executing its operators in parallel. Approach #1: Intra-Operator (Horizontal) → Operators are decomposed into independent instances that perform the same function on different subsets of data. Approach #2: Inter-Operator (Vertical) → Operations are overlapped in order to pipeline data from one stage to the next without materialization.

  21. 21 INTRA-OPERATOR PARALLELISM SELECT A.id, B.value FROM A, B WHERE A.id = B.id AND A.value < 99 AND B.value > 100 p ⨝ s s A B

  22. 22 INTRA-OPERATOR PARALLELISM SELECT A.id, B.value FROM A, B WHERE A.id = B.id AND A.value < 99 AND B.value > 100 p ⨝ s s A B

  23. 23 INTRA-OPERATOR PARALLELISM SELECT A.id, B.value FROM A, B WHERE A.id = B.id AND A.value < 99 AND B.value > 100 p ⨝ s s A 1 A 2 A 3 A B

  24. 24 INTRA-OPERATOR PARALLELISM SELECT A.id, B.value FROM A, B WHERE A.id = B.id AND A.value < 99 AND B.value > 100 p ⨝ s s A 1 A 2 A 3 A B 1 2 3

  25. 25 INTRA-OPERATOR PARALLELISM SELECT A.id, B.value FROM A, B WHERE A.id = B.id AND A.value < 99 AND B.value > 100 p ⨝ s s A 1 A 2 A 3 A B 1 2 3

  26. 26 INTRA-OPERATOR PARALLELISM SELECT A.id, B.value FROM A, B WHERE A.id = B.id AND A.value < 99 AND B.value > 100 p ⨝ s s s s s A 1 A 2 A 3 A B 1 2 3

  27. 27 INTRA-OPERATOR PARALLELISM SELECT A.id, B.value FROM A, B WHERE A.id = B.id AND A.value < 99 AND B.value > 100 p ⨝ s s s s s A 1 A 2 A 3 A B 1 2 3

  28. 28 INTRA-OPERATOR PARALLELISM SELECT A.id, B.value FROM A, B WHERE A.id = B.id AND A.value < 99 AND B.value > 100 p p p p ⨝ s s s s s A 1 A 2 A 3 A B 1 2 3

  29. 29 INTRA-OPERATOR PARALLELISM SELECT A.id, B.value FROM A, B WHERE A.id = B.id AND A.value < 99 AND B.value > 100 p Build HT Build HT Build HT p p p ⨝ s s s s s A 1 A 2 A 3 A B 1 2 3

  30. 30 INTRA-OPERATOR PARALLELISM SELECT A.id, B.value FROM A, B WHERE A.id = B.id AND A.value < 99 AND B.value > 100 p Exchange Build HT Build HT Build HT p p p ⨝ s s s s s A 1 A 2 A 3 A B 1 2 3

  31. 31 INTRA-OPERATOR PARALLELISM SELECT A.id, B.value FROM A, B WHERE A.id = B.id ⨝ AND A.value < 99 AND B.value > 100 p Exchange Build HT Build HT Build HT p p p ⨝ s s s s s A 1 A 2 A 3 A B 1 2 3

  32. 32 INTRA-OPERATOR PARALLELISM SELECT A.id, B.value FROM A, B WHERE A.id = B.id ⨝ AND A.value < 99 AND B.value > 100 p Exchange Build HT Build HT Build HT p p p ⨝ s s s s s A 1 A 2 A 3 B 1 B 2 B 3 A B 1 2 3 1 2 3

  33. 33 INTRA-OPERATOR PARALLELISM SELECT A.id, B.value FROM A, B WHERE A.id = B.id ⨝ AND A.value < 99 AND B.value > 100 p Exchange Build HT Build HT Build HT p p p ⨝ s s s s s A 1 A 2 A 3 B 1 B 2 B 3 A B 1 2 3 1 2 3

  34. 34 INTRA-OPERATOR PARALLELISM SELECT A.id, B.value FROM A, B WHERE A.id = B.id ⨝ AND A.value < 99 AND B.value > 100 p Exchange Build HT Build HT Build HT p p p p p p ⨝ s s s s s s s s A 1 A 2 A 3 B 1 B 2 B 3 A B 1 2 3 1 2 3

  35. 35 INTRA-OPERATOR PARALLELISM SELECT A.id, B.value FROM A, B WHERE A.id = B.id ⨝ AND A.value < 99 AND B.value > 100 p Exchange Probe HT Probe HT Probe HT Build HT Build HT Build HT p p p p p p ⨝ s s s s s s s s A 1 A 2 A 3 B 1 B 2 B 3 A B 1 2 3 1 2 3

  36. 36 INTRA-OPERATOR PARALLELISM SELECT A.id, B.value FROM A, B Exchange WHERE A.id = B.id ⨝ AND A.value < 99 AND B.value > 100 p Exchange Probe HT Probe HT Probe HT Build HT Build HT Build HT p p p p p p ⨝ s s s s s s s s A 1 A 2 A 3 B 1 B 2 B 3 A B 1 2 3 1 2 3

  37. 37 INTER-OPERATOR PARALLELISM SELECT A.id, B.value FROM A, B WHERE A.id = B.id AND A.value < 99 AND B.value > 100 p ⨝ s s A B

  38. 38 INTER-OPERATOR PARALLELISM SELECT A.id, B.value FROM A, B WHERE A.id = B.id AND A.value < 99 AND B.value > 100 p ⨝ s s A B

  39. 39 INTER-OPERATOR PARALLELISM SELECT A.id, B.value FROM A, B WHERE A.id = B.id AND A.value < 99 AND B.value > 100 p ⨝ s s 1 ⨝ for r 1 ∊ outer: for r 2 ∊ inner: A B emit (r 1 ⨝ r 2 )

  40. 40 INTER-OPERATOR PARALLELISM SELECT A.id, B.value FROM A, B WHERE A.id = B.id 2 p AND A.value < 99 AND B.value > 100 for r ∊ incoming: p emit ( p r) ⨝ s s 1 ⨝ for r 1 ∊ outer: for r 2 ∊ inner: A B emit (r 1 ⨝ r 2 )

  41. 41 INTER-OPERATOR PARALLELISM SELECT A.id, B.value FROM A, B WHERE A.id = B.id 2 p AND A.value < 99 AND B.value > 100 for r ∊ incoming: p emit ( p r) ⨝ s s 1 ⨝ for r 1 ∊ outer: for r 2 ∊ inner: A B emit (r 1 ⨝ r 2 )

Recommend


More recommend