15 721
play

15-721 ADVANCED DATABASE SYSTEMS Lecture #18 Parallel Join - PowerPoint PPT Presentation

15-721 ADVANCED DATABASE SYSTEMS Lecture #18 Parallel Join Algorithms (Hashing) Andy Pavlo / / Carnegie Mellon University / / Spring 2016 @Andy_Pavlo // Carnegie Mellon University // Spring 2017 2 TODAYS AGENDA Background Parallel


  1. 15-721 ADVANCED DATABASE SYSTEMS Lecture #18 – Parallel Join Algorithms (Hashing) Andy Pavlo / / Carnegie Mellon University / / Spring 2016 @Andy_Pavlo // Carnegie Mellon University // Spring 2017

  2. 2 TODAY’S AGENDA Background Parallel Hash Join Hash Functions Hash Table Implementations Evaluation CMU 15-721 (Spring 2017)

  3. 3 PARALLEL JOIN ALGORITHMS Perform a join between two relations on multiple threads simultaneously to speed up operation. Two main approaches: → Hash Join → Sort-Merge Join We won’t discuss nested-loop joins… CMU 15-721 (Spring 2017)

  4. 4 OBSERVATION Many OLTP DBMSs don’t implement hash join. But a index nested-loop join with a small number of target tuples is more or less equivalent to a hash join. CMU 15-721 (Spring 2017)

  5. 5 HASHING VS. SORTING 1970s – Sorting 1980s – Hashing 1990s – Equivalent 2000s – Hashing 2010s – ??? CMU 15-721 (Spring 2017)

  6. 6 PARALLEL JOIN ALGORITHMS SO SORT VS. S. HASH SH REV EVISI SITED ED: FAST ST JOIN I JO N IMPLEMENT NTATION O N ON N MO MODERN MUL MULTI-CO CORE CPU CPUS VLDB 2009 → Hashing is faster than Sort-Merge. → Sort-Merge will be faster with wider SIMD. MASSI SSIVEL ELY PARA RALLEL S SORT ORT- MAI AIN-MEMORY ORY HASH JOI OINS ON ON MERG RGE JOI OINS I IN MAIN MEMORY ORY MULTI-CO MUL CORE CPU CPUS: : TUNING T TO MULTI-CORE D MUL E DATABASE SY SE SYST STEM EMS THE U UNDERL RLYI YING H HARD RDWARE RE VLDB 2012 ICDE 2013 → Sort-Merge is already faster, → New optimizations and results even without SIMD. for Radix Hash Join. Source: Cagri Balkesen CMU 15-721 (Spring 2017)

  7. 7 JOIN ALGORITHM DESIGN GOALS Goal #1: Minimize Synchronization → Avoid taking latches during execution. Goal #2: Minimize CPU Cache Misses → Ensure that data is always local to worker thread. CMU 15-721 (Spring 2017)

  8. 8 IMPROVING CACHE BEHAVIOR Factors that affect cache misses in a DBMS: → Cache + TLB capacity. → Locality (temporal and spatial). Non-Random Access (Scan): → Clustering to a cache line. → Execute more operations per cache line. Random Access (Lookups): → Partition data to fit in cache + TLB. Source: Johannes Gehrke CMU 15-721 (Spring 2017)

  9. 9 PARALLEL HASH JOINS Hash join is the most important operator in a DBMS for OLAP workloads. It’s important that we speed it up by taking advantage of multiple cores. → We want to keep all of the cores busy, without becoming memory bound DESIGN AND EVALUATION OF MAIN MEMORY HASH JOIN ALGORITHMS FOR MULTI-CORE CPUS SIGMOD 2011 CMU 15-721 (Spring 2017)

  10. 10 CLOUDERA IMPALA % of Total CPU Time Spent in Query Operators Workload: TPC-H Benchmark HASH JOIN 25.0% SEQ SCAN 49.6% UNION 3.1% AGGREGATE 19.9% OTHER 2.4% CMU 15-721 (Spring 2017)

  11. 11 HASH JOIN (R ⨝ S) Phase #1: Partition ( optional ) → Divide the tuples of R and S into sets using a hash on the join key. Phase #2: Build → Scan relation R and create a hash table on join key. Phase #3: Probe → For each tuple in S , look up its join key in hash table for R . If a match is found, output combined tuple. CMU 15-721 (Spring 2017)

  12. 12 PARTITION PHASE Split the input relations into partitioned buffers by hashing the tuples’ join key(s). → The hash function used for this phase should be different than the one used in the build phase. → Ideally the cost of partitioning is less than the cost of cache misses during build phase. Contents of buffers depends on storage model: → NSM : Either the entire tuple or a subset of attributes. → DSM : Only the columns needed for the join + offset. CMU 15-721 (Spring 2017)

  13. 13 PARTITION PHASE Approach #1: Non-Blocking Partitioning → Only scan the input relation once. → Produce output incrementally. Approach #2: Blocking Partitioning (Radix) → Scan the input relation multiple times. → Only materialize results all at once. CMU 15-721 (Spring 2017)

  14. 14 NON-BLOCKING PARTITIONING Scan the input relation only once and generate the output on-the-fly. Approach #1: Shared Partitions → Single global set of partitions that all threads update. → Have to use a latch to synchronize threads. Approach #2: Private Partitions → Each thread has its own set of partitions. → Have to consolidate them after all threads finish. CMU 15-721 (Spring 2017)

  15. 15 SHARED PARTITIONS Data Table A B C CMU 15-721 (Spring 2017)

  16. 15 SHARED PARTITIONS Data Table hash P (key) A B C # p # p # p CMU 15-721 (Spring 2017)

  17. 15 SHARED PARTITIONS Data Table Partitions hash P (key) A B C P 1 # p P 2 # p ⋮ # p P n CMU 15-721 (Spring 2017)

  18. 15 SHARED PARTITIONS Data Table Partitions hash P (key) A B C P 1 # p P 2 # p ⋮ # p P n CMU 15-721 (Spring 2017)

  19. 16 PRIVATE PARTITIONS Data Table Partitions hash P (key) A B C # p # p # p CMU 15-721 (Spring 2017)

  20. 16 PRIVATE PARTITIONS Data Table Partitions hash P (key) A B C # p # p # p CMU 15-721 (Spring 2017)

  21. 16 PRIVATE PARTITIONS Data Table Partitions Combined hash P (key) A B C P 1 # p P 2 # p ⋮ # p P n CMU 15-721 (Spring 2017)

  22. 17 RADIX PARTITIONING Scan the input relation multiple times to generate the partitions. Multi-step pass over the relation: → Step #1: Scan R and compute a histogram of the # of tuples per hash key for the radix at some offset. → Step #2: Use this histogram to determine output offsets by computing the prefix sum . → Step #3: Scan R again and partition them according to the hash key. CMU 15-721 (Spring 2017)

  23. 18 RADIX The radix is the value of an integer at a particular position (using its base). Input 89 12 23 08 41 64 CMU 15-721 (Spring 2017)

  24. 18 RADIX The radix is the value of an integer at a particular position (using its base). Input 89 12 23 08 41 64 Radix 9 2 3 8 1 4 CMU 15-721 (Spring 2017)

  25. 18 RADIX The radix is the value of an integer at a particular position (using its base). Input 89 12 23 08 41 64 Radix 8 1 2 0 4 6 CMU 15-721 (Spring 2017)

  26. 19 PREFIX SUM The prefix sum of a sequence of numbers ( x 0 , x 1 , …, x n ) is a second sequence of numbers ( y 0 , y1, …, y n ) that is a running total of the input sequence. Input 1 2 3 4 5 6 CMU 15-721 (Spring 2017)

  27. 19 PREFIX SUM The prefix sum of a sequence of numbers ( x 0 , x 1 , …, x n ) is a second sequence of numbers ( y 0 , y1, …, y n ) that is a running total of the input sequence. Input 1 2 3 4 5 6 Prefix Sum 1 CMU 15-721 (Spring 2017)

  28. 19 PREFIX SUM The prefix sum of a sequence of numbers ( x 0 , x 1 , …, x n ) is a second sequence of numbers ( y 0 , y1, …, y n ) that is a running total of the input sequence. Input 1 2 3 4 5 6 + Prefix Sum 1 3 CMU 15-721 (Spring 2017)

  29. 19 PREFIX SUM The prefix sum of a sequence of numbers ( x 0 , x 1 , …, x n ) is a second sequence of numbers ( y 0 , y1, …, y n ) that is a running total of the input sequence. Input 1 2 3 4 5 6 + + + + + Prefix Sum 1 3 6 10 15 21 CMU 15-721 (Spring 2017)

  30. 20 RADIX PARTITIONS Step #1: Inspect input, create histograms # 0 7 p # 1 8 p 0 hash P (key) # 1 9 p # 0 7 p # 0 3 p # 1 1 p 1 # 1 5 p # 1 0 p Source: Spyros Blanas CMU 15-721 (Spring 2017)

  31. 20 RADIX PARTITIONS Step #1: Inspect input, create histograms # 0 7 p # 1 8 p 0 hash P (key) # 1 9 p # 0 7 p # 0 3 p # 1 1 p 1 # 1 5 p # 1 0 p Source: Spyros Blanas CMU 15-721 (Spring 2017)

  32. 20 RADIX PARTITIONS Step #1: Inspect input, create histograms # 0 7 p # 1 8 p 0 hash P (key) # 1 9 p # 0 7 p # 0 3 p # 1 1 p 1 # 1 5 p # 1 0 p Source: Spyros Blanas CMU 15-721 (Spring 2017)

  33. 20 RADIX PARTITIONS Step #1: Inspect input, create histograms # 0 7 p # 1 8 p 0 hash P (key) # 1 9 p Partition 0: 2 # 0 7 Partition 1: 2 p # 0 3 p # 1 1 p 1 # 1 5 p Partition 0: 1 # 1 0 Partition 1: 3 p Source: Spyros Blanas CMU 15-721 (Spring 2017)

  34. 20 RADIX PARTITIONS Step #2: Compute output offsets Partition 0 , CPU 0 # 0 7 p # 1 8 p 0 Partition 0, CPU 1 hash P (key) # 1 9 p Partition 0: 2 Partition 1 , CPU 0 # 0 7 Partition 1: 2 p # 0 3 p Partition 1, CPU 1 # 1 1 p 1 # 1 5 p Partition 0: 1 # 1 0 Partition 1: 3 p Source: Spyros Blanas CMU 15-721 (Spring 2017)

  35. 20 RADIX PARTITIONS Step #3: Read input and partition Partition 0 , CPU 0 # 0 7 0 7 p # 1 8 p 0 Partition 0, CPU 1 hash P (key) # 1 9 0 3 p Partition 0: 2 Partition 1 , CPU 0 # 0 7 Partition 1: 2 p # 0 3 p Partition 1, CPU 1 # 1 1 p 1 # 1 5 p Partition 0: 1 # 1 0 Partition 1: 3 p Source: Spyros Blanas CMU 15-721 (Spring 2017)

  36. 20 RADIX PARTITIONS Step #3: Read input and partition Partition 0 , CPU 0 # 0 7 0 7 p # 1 8 0 7 p 0 Partition 0, CPU 1 hash P (key) # 1 9 0 3 p Partition 0: 2 Partition 1 , CPU 0 # 0 7 1 8 Partition 1: 2 p # 0 3 1 9 p Partition 1, CPU 1 # 1 1 1 1 p 1 # 1 5 1 5 p Partition 0: 1 # 1 0 1 0 Partition 1: 3 p Source: Spyros Blanas CMU 15-721 (Spring 2017)

  37. 20 RADIX PARTITIONS Partition 0 # 0 7 0 7 p # 1 8 0 7 p 0 hash P (key) # 1 9 0 3 p Partition 0: 2 Partition 1 # 0 7 1 8 Partition 1: 2 p # 0 3 1 9 p # 1 1 1 1 p 1 # 1 5 1 5 p Partition 0: 1 # 1 0 1 0 Partition 1: 3 p Source: Spyros Blanas CMU 15-721 (Spring 2017)

Recommend


More recommend