parallel query execution in polardb for mysql
play

Parallel Query Execution in POLARDB for MySQL ystein Grvlen Benny - PowerPoint PPT Presentation

Parallel Query Execution in POLARDB for MySQL ystein Grvlen Benny Wang Alibaba Cloud Agenda What is Parallel Query? Parallel Query Design How to use Parallel Query Parallel Query Performance Future Work Agenda


  1. Parallel Query Execution in POLARDB for MySQL Øystein Grøvlen Benny Wang Alibaba Cloud

  2. Agenda What is Parallel Query? • Parallel Query Design • How to use Parallel Query • Parallel Query Performance • Future Work •

  3. Agenda What is Parallel Query? • Parallel Query Design • How to use Parallel Query • Parallel Query Performance • Future Work •

  4. What is Parallel Query? Parallel Query is an innovative method to accelerate MySQL queries from Alibaba Cloud. • Traditionally, 1 MySQL query runs with just 1 thread, and can not take advantage of multiple cores on modern processors. • Parallel Query takes advantage of modern processors to distribute work across many or all available cores: • 8 parallel threads can be up to 8 times faster • 32 parallel threads can be up to 32 times faster

  5. Why Parallel Query? • 2003: CPUs stopped getting faster • 2004-2019 focus on more cores, sockets. • PQ lets MySQL take advantage of last 15 years of progress.

  6. How to Use Parallel Query Parallel Query runs against your existing InnoDB data. No data extraction to another system is required. No query modifications are required. Parallel Query within InnoDB (no extraction needed) is an amazing feature exclusive to Alibaba Cloud

  7. Query with Parallelism SELECT count(*) FROM production.product; Serial execution plan: SQL Thread 1: Scan, Count Client 1 active thread 63 idle threads .

  8. Parallel Execution Plan Thread 1: Scan, Count Thread 2: Scan, Count With 64 parallel Thread 3: Scan, Count threads, each thread SQL Thread 4: Scan, Count Sum Client does < 2% of the work. Thread 5: Scan, Count Thread 6: Scan, Count Thread 7: Scan, Count . . . Thread 64: Scan, Count

  9. Agenda What is Parallel Query? • Parallel Query Design • How to use Parallel Query • Parallel Query Performance • Future Work •

  10. Parallel Query Architecture " ! ⑅ LEADER … ⑅ T T Gather … … MESSAGE QUEUE " " " " " ! ! ! ! ! … … ⑅ WORKERs ⑅ ⑅ ⑅ ⑅ … ⑅ … … … … ⑅ ⑅ ⑅ ⑅ T T T T T T T T T T T T T T T … … … … … PARALLEL SCAN

  11. Partitioning 11 17 25 5 8 14 20 22 28 31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32

  12. Partitioning InnoDB partitions the B-tree 2 partitions 11 17 25 5 8 14 20 22 28 31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 Partition 1 Partition 2

  13. Partitioning Workers see only one partition (at a time) 2 partitions 11 17 25 5 8 14 20 22 28 31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 Partition 1 Partition 2

  14. Partitioning 2 nd level of B-tree may be used to split into many partitions 6 partitions 11 17 25 5 8 14 20 22 28 31 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 Part. 5 Part. 4 Part. 6 Part. 1 Part. 3 Part. 2

  15. Partitioning • Server will normally request 100 partitions per worker thread • “Fast” workers may process more partitions than “slow” workers • Partitions of more equal size • When finished with one partition, a worker may be automatically attached to a new partition.

  16. Parallel Query Sort Parallel threads run local sort SELECT col1, col2, col3 FROM t1 ORDER BY 1,2; Thread 1: Scan, Sort Thread 2: Scan, Sort Thread 3: Scan, Sort Merge SQL Thread 4: Scan, Sort Sort Client Thread 5: Scan, Sort Thread 6: Scan, Sort 1. Parallel data access (table scan or index) Thread 7: Scan, Sort 2. Parallel order by of the data handled by each worker . . . Thread 64: Scan, Sort 3. Final merge sort of the results and return to client.

  17. Parallel Query GROUP BY Parallel threads run local group SELECT col1, col2, SUM (col3) FROM t1 GROUP BY 1,2; Thread 1: Scan, Group Thread 2: Scan, Group Thread 3: Scan, Group Merge SQL Thread 4: Scan, Group Groups Client Thread 5: Scan, Group 1. Parallel data access (table scan or index) Thread 6: Scan, Group Thread 7: Scan, Group 2. Parallel group by of the data handled by each worker . . . 3. Final merge of the local group by and return results Thread 64: Scan, Group DISTINCT operation will be similar to GROUP BY.

  18. Parallel Query Nested-Loops JOIN Parallel scan and join SELECT * FROM t1 JOIN t3 ON t1.id = t3. id ; Thread 1: Scan, Join Thread 2: Scan, Join Thread 3: Scan, Join SQL Thread 4: Scan, Join Merge Client Thread 5: Scan, Join 1. Parallel data access (table scan or index) of driving table Thread 6: Scan, Join 2. Parallel join of the local data handled by each worker Thread 7: Scan, Join . . . 3. Final merge of the and return to client Thread 64: Scan, Join

  19. Parallel Query Usage System variables to control parallel query To enable parallel execution for a session: set max_parallel_degree = n Maximum n worker threads will be used. MySQL may still decide to not use parallelization. If so, parallel execution may be forced with set force_parallel_mode = on

  20. Parallel Query Usage Hints to control parallel query To force parallel query execution for a single query: SELECT /*+ PARALLEL () */ * FROM ... To force the use of a specific number of worker threads, n : SELECT /*+ PARALLEL (n) */ * FROM ... More hints are on the way.

  21. Agenda What is Parallel Query? • Parallel Query Design • How to use Parallel Query • Parallel Query Performance • Future Work •

  22. Parallel Query Resource control Control number of parallel workers To control worker threads for parallel query: set max_parallel_workers = n Only maximum n worker threads will be allowed in total. MySQL will decide to whether to use parallel query and how many workers for current query. total number_of_workers <= max_parallel_workers; actual_workers_number <= max_data_partitions; actual_workers_number <= max_parallel_degree;

  23. Parallel Query Resource control Monitor the running status of parallel query mysql> select * from performance_schema.events_parallel_query_current\G *************************** 2. row *************************** *************************** 1. row *************************** THREAD_ID: 95 THREAD_ID: 94 PARENT_THREAD_ID: 94 PARENT_THREAD_ID: 0 PARALLEL_TYPE: WORKER PARALLEL_TYPE: GATHER EVENT_ID: 2 EVENT_ID: 11 END_EVENT_ID: NULL END_EVENT_ID: NULL EVENT_NAME: parallel_query EVENT_NAME: parallel_query STATE: COMPLETED STATE: COMPLETED PLANNED_DOP: 0 PLANNED_DOP: 16 ACTUAL_DOP: 0 ACTUAL_DOP: 16 NUMBER_OF_PARTITIONS: 0 NUMBER_OF_PARTITIONS: 36 PARTITIONED_OBJECT: PARTITIONED_OBJECT: t1 ROWS_SCANED: 718 ROWS_SCANED: 10189 ROWS_SENT: 8 ROWS_SENT: 77 ROWS_SORTED: 0 ROWS_SORTED: 0 EXECUTION_TIME: 423644016 EXECUTION_TIME: 435373818 NESTING_EVENT_ID: 1 NESTING_EVENT_ID: 9 NESTING_EVENT_TYPE: STATEMENT NESTING_EVENT_TYPE: STATEMENT **************************** 3. row***************************

  24. Parallel Query Resource control Constrain total memory To control maximum memory consumed by parallel query: set query_memory_hard_limit = n set query_memory_soft_limit = n If the total memory usage is above query_memory_hard_limit , the parallel query will be aborted and an error will be reported. If the total memory usage is above query_memory_soft_limit , parallel query execution will not be chosen. The following memory buffers we considered mainly • Internal temporary table • Sort buffer • Join cache

  25. Parallel Query Resource control Constrain total memory - example mysql> set global query_memory_soft_limit = 99 * 1024 * 1024; Query OK, 0 rows affected (0.00 sec) mysql> set join_buffer_size = 100 * 1024 * 1024; Query OK, 0 rows affected (0.00 sec) mysql> select x.b from t1 x, t1 y where x.a = y.a; … mysql> show warnings; +---------+-------+----------------------------------------------------------------------------------------------------------------+ | Level | Code | Message | +---------+-------+----------------------------------------------------------------------------------------------------------------+ | Warning | 13250 | Could not choose parallel plan due to total query memory temporarily overflow, consider increasing @@query_memory_soft_limit value. | +---------+-------+----------------------------------------------------------------------------------------------------------------+ 1 row in set (0.01 sec) n set, 1 warning (0.00 sec)

  26. Parallel Query Resource control Control total memory - example mysql> set global query_memory_hard_limit = 99 * 1024 * 1024; Query OK, 0 rows affected (0.00 sec) mysql> set join_buffer_size=100 * 1024 * 1024; Query OK, 0 rows affected (0.00 sec) mysql> select x.b from t1 x, t1 y where x.a = y.a; ERROR 13249 (HY000): Failed to allocate memory for query due to overflow, consider increasing @@query_memory_hard_limit value.

  27. Parallel Query Resource control Monitor memory status SHOW GLOBAL STATUS WHERE variable_name = ‘total_running_parallel_workers’; SHOW GLOBAL STATUS WHERE variable_name = ‘total_used_query_memory';

Recommend


More recommend