are hybrid physical
play

Are Hybrid Physical Designs Important? 1 B+ tree 2 C O L B+ - PowerPoint PPT Presentation

Columnstore and B+ tree Are Hybrid Physical Designs Important? 1 B+ tree 2 C O L B+ tree 3 ? C O L C O L B+ tree B+ tree B+ tree & Columnstore on same table = Hybrid design 4 ? C O L C O L B+ tree B+ tree Are Hybrid


  1. Columnstore and B+ tree – Are Hybrid Physical Designs Important? 1

  2. B+ tree 2

  3. C O L B+ tree 3

  4. ? C O L C O L B+ tree B+ tree B+ tree & Columnstore on same table = Hybrid design 4

  5. ? C O L C O L B+ tree B+ tree Are Hybrid Designs important and which workloads can benefit? 5

  6. Hybrid design = B+ tree & Columnstore C O L B+ tree 6

  7. Hybrid design = B+ tree & Columnstore

  8. Mix: Scans & Concurrency Selectivity Sort order Updates Updates C O L C O L B+ tree B+ tree 8

  9. ▪ ▪ ▪ ▪ ▪ ▪ 9

  10. SELECT sum(col1) FROM table WHERE col1 < {1} 100,000 10 GB, 1 int col Execution time (millisec) 10,000 1,000 100 10 B+ tree hot 1 0 0 8.00E-06 4.00E-05 0.0002 0.001 0.008 0.03 0.06 0.09 0.3 0.7 2 10 25 40 60 100 Selectivity (%) 10

  11. SELECT sum(col1) FROM table WHERE col1 < {1} 100,000 10 GB, 1 int col Execution time (millisec) 10,000 0.2% 1,000 100 10 B+ tree hot 1 0 0 8.00E-06 4.00E-05 0.0002 0.001 0.008 0.03 0.06 0.09 0.3 0.7 2 10 25 40 60 100 Selectivity (%) Skip data effectively & run single-threaded 11

  12. SELECT sum(col1) FROM table WHERE col1 < {1} 100,000 10 GB, 1 int col Execution time (millisec) 10,000 1,000 0.03% 100 10 Col hot B+ tree hot 1 0 0 8.00E-06 4.00E-05 0.0002 0.001 0.008 0.03 0.06 0.09 0.3 0.7 2 10 25 40 60 100 Selectivity (%) Superior performance of Columnstore scans 12

  13. SELECT sum(col1) FROM table WHERE col1 < {1} 100,000 10 GB, 1 int col Execution time (millisec) 10,000 1,000 0.03% 100 10 Col cold B+ tree cold 1 Col hot B+ tree hot 0 0 8.00E-06 4.00E-05 0.0002 0.001 0.008 0.03 0.06 0.09 0.3 0.7 2 10 25 40 60 100 Selectivity (%) B+ tree helps for low selectivity & slower storage 13

  14. SELECT sum(col1) FROM table WHERE col1 < {1} 100,000 10% 10 GB, 1 int col Execution time (millisec) 10,000 1,000 0.03% 100 10 Col cold B+ tree cold 1 Col hot B+ tree hot 0 0 8.00E-06 4.00E-05 0.0002 0.001 0.008 0.03 0.06 0.09 0.3 0.7 2 10 25 40 60 100 Selectivity (%) B+ tree helps for low selectivity & slower storage 14

  15. SELECT col1, sum(col2) FROM table 90,000 Execution time (millisec) GROUP BY col1 80,000 20 GB, 2 int col, 70,000 vary number of distinct values in col1 60,000 B+ tree sorted on col1 50,000 40,000 B+ tree Col 30,000 20,000 10,000 0 100 10,000 1,000,000 # of groups 15

  16. SELECT col1, sum(col2) FROM table 90,000 Execution time (millisec) GROUP BY col1 80,000 20 GB, 2 int col, 70,000 vary number of distinct values in col1 60,000 B+ tree sorted on col1 50,000 40,000 B+ tree Col 30,000 20,000 10,000 0 100 10,000 1,000,000 # of groups Scanning & hashing Col faster than reading sorted B+ tree 16

  17. SELECT col1, sum(col2) FROM table 90,000 Execution time (millisec) GROUP BY col1 80,000 20 GB, 2 int col, 70,000 vary number of distinct values in col1 60,000 B+ tree sorted on col1 50,000 40,000 B+ tree Col 30,000 20,000 10,000 0 100 10,000 1,000,000 # of groups Sort order of B+ tree beneficial for scarce query memory 17

  18. Primary Columnstores ▪ ▪ ▪ ▪ 18

  19. Secondary Columnstores Key not in ▪ ▪ Delete ▪ buffer 19

  20. Update top 10 rows Hybrid design Pri. B+ tree Pri. B+ tree with Sec. Col Pri. Col 1,000,000 TPC-H 30 GB, 10 concurrent queries, Read Committed TPC-H 30 GB, 10 concurrent queries, Read Committed Execution time (millisec) 100,000 10,000 1,000 100 10 1 scan: 0,update: 100 scan: 1, update: 99 scan: 5, update: 95 Percentage (%) for scans and updates 20

  21. Update top 10 rows Hybrid design Pri. B+ tree Pri. B+ tree with Sec. Col Pri. Col 1,000,000 TPC-H 30 GB, 10 concurrent queries, Read Committed TPC-H 30 GB, 10 concurrent queries, Read Committed Execution time (millisec) 100,000 10,000 1,000 100 10 1 scan: 0,update: 100 scan: 1, update: 99 scan: 5, update: 95 Percentage (%) for scans and updates B+ trees cheaper than Columnstores for pure updates 21

  22. Update top 10 rows, Select sum of quantity & price for a single shipdate from lineitem table Pri. B+ tree Pri. B+ tree with Sec. Col Pri. Col 1,000,000 TPC-H 30 GB, 10 concurrent queries, Read Committed TPC-H 30 GB, 10 concurrent queries, Read Committed 100,000 Total execution time 10,000 (millisec) 1,000 100 10 1 scan: 0,update: 100 scan: 1, update: 99 scan: 5, update: 95 Percentage (%) for scans and updates Secondary Columnstore: balance small updates & large scans 22

  23. Hybrid design = B+ tree & Columnstore 1. 1. Micro-benchmarks

  24. Database Server Query Workload , what-if DTA Optimizer Constraints (e.g. storage budget) Create Index Output Drop Index … 24

  25. Database Server Query Workload , what-if DTA Optimizer Constraints (e.g. storage budget) Create Index Output Drop Index … 25

  26. Database Server Query Workload , what-if DTA Optimizer Constraints (e.g. storage budget) Create Index Output Drop Index … 26

  27. Database Server Query Workload , what-if DTA Optimizer Constraints (e.g. storage budget) Create Index Output Drop Index … 27

  28. 1. Costing queries on Hypothetical Columnstores 28

  29. 1. Costing queries on Hypothetical Columnstores 2. Per-column size estimation - stay within storage budget - per-column access cost - hard problem 29

  30. 1. Optimal designs searched over: combined space of Columnstores & B+ trees 30

  31. 1. Optimal designs searched over: combined space of Columnstores & B+ trees 2. Released as part of SQL Server 2017 CTP 31

  32. Hybrid design = B+ tree & Columnstore 1. 1. Micro-benchmarks 2. 2. Auto Recommend Hybrid Designs

  33. ▪ ▪ ▪ ▪ ▪ DTA ▪ ▪ ▪ ▪ ▪ 33

  34. 100% Percentage of query plans 80% 60% 40% 20% 0% Cust1 Cust2 Cust3 Cust4 Cust5 TPC-DS Hybrid Query Plans 34

  35. 100% Percentage of query plans 80% 60% 40% 20% 0% Cust1 Cust2 Cust3 Cust4 Cust5 TPC-DS Hybrid Query Plans Hybrid Query Plans (Same Table) 35

  36. 50 46 TPC-DS benchmark Hybrid Vs B+ tree only 100 GB 40 Hybrid Vs Columnstore only # of Queries 27 30 23 20 16 11 11 10 10 10 7 10 5 5 4 4 4 1 0 0.5 0.8 1.2 1.5 2 5 10 >10 Bins of Speedup (CPU time) 36

  37. 50 46 TPC-DS benchmark Hybrid Vs B+ tree only 100 GB 40 Hybrid Vs Columnstore only # of Queries 27 30 23 20 16 11 11 10 10 10 7 10 5 5 4 4 4 1 0 0.5 0.8 1.2 1.5 2 5 10 >10 Bins of Speedup (CPU time) 37

  38. 50 46 TPC-DS benchmark Hybrid Vs B+ tree only 100 GB 40 Hybrid Vs Columnstore only # of Queries 27 30 23 20 16 11 11 10 10 10 7 10 5 5 4 4 4 1 0 0.5 0.8 1.2 1.5 2 5 10 >10 Bins of Speedup (CPU time) 38

  39. 50 46 40 # of Queries 27 30 23 20 16 11 11 10 10 10 7 10 5 5 4 4 4 1 0 0.5 0.8 1.2 1.5 2 5 10 >10 Bins of Speedup (CPU time) Hybrid designs significantly improve decision support workload 39

  40. ▪ ▪ ▪ 20X ▪ 10X B+tree seek Col scan hash join date_dim date_dim nested- loop join B+tree seek web_sales Col scan concatenate web_sales B+tree seek date_dim nested-loop Col scan join B+tree seek catalog_sales Columnstore-only design Hybrid design catalog_sales

  41. 1K warehouses 20 Hybrid Vs B+ tree only (Snapshot Isolation) No. of Queries 11 10 4 3 3 3 2 1 0 0 0.5 0.8 1.2 1.5 2 5 10 >10 Bins of Speedup (wall-clock time) 41

  42. 1K warehouses 20 Hybrid Vs B+ tree only (Snapshot Isolation) No. of Queries 11 10 4 3 3 3 2 1 0 0 0.5 0.8 1.2 1.5 2 5 10 >10 Bins of Speedup (wall-clock time) 2X slowdown of transactions & 10X speed-up of analytics 42

  43. ▪ ▪ ▪ 10X 0X ▪ DTA ▪ ▪ ▪ 44

  44. ▪ ▪ ▪ ▪ ▪ ▪ ▪ 47

  45. VECTORIZED Selectivity C O L values COMPRESSED LATE MATERIALIZATION Sortedness SORTED GLOBALLY FAST MODIFICATIONS Concurrency … B+ tree MEMORY EFFICIENT 48

  46. point lookups streaming via & short scans sortedness B+ tree balance scans Leverage both & updates B+ tree & Col C O L Batch-mode & large & fast compression scans 49

  47. ▪ ▪ ▪ 50

  48. A B A B A B 30 0.0 3 0 0 0 A B 3 1 30 0.1 1 0 0, 1 0, 3 0 0 0 0.0 3 0 1, 1 1, 3 10 0.0 1 0 3 1 3, 4 3 1 30 0.1 3 1 3 1 30 0.1 3 1 GEE estimator groups that occur once in the sample are scaled by total_size / sample_size (e.g. [0,1] and [1,1]), other groups are counted once in total 51

  49. ▪ ▪ ▪ ▪ 52

  50. Work- DB # of Max Avg. # # of Avg. # Avg. # load Size tables table of cols queries of joins of ops (GB) size per per (GB) table plan Cust1 172 23 63.8 14.1 36 7.2 29.1 Cust2 44.6 614 44.6 23.5 40 8.1 28.3 Cust3 138.4 3394 79.8 26.3 40 8.7 24.1 Cust4 93 22 54.8 20.3 24 6.9 24.4 Cust5 9.83 474 1.52 5.5 47 23.1 57.7 TPC-DS 87.7 24 34.9 17.2 97 7.9 28.2 TPC-CH 11 27 53

  51. ▪ ▪ ▪ ▪ 54

  52. ▪ ▪ ▪ ▪ ▪ ▪ ▪ ▪ 55

  53. Key not in ▪ ▪ ▪ Delete buffer 57

Recommend


More recommend