Locality-Adaptive Parallel Hash Joins Using Hardware Transactional Memory
Anil Shanbhag, Holger Pirk, Sam Madden
MIT CSAIL
History of Parallel Hash Joins
Two main families: Shared Hash Table based Join and Radix Partitioning based Join.
[Pictures from "Main-Memory Hash Joins on Multi-Core CPUs: Tuning to the Underlying Hardware", Balkesen et al.]
Motivation
Data can have spatial locality. This may arise because of:
◦ Periodic bulk updates => locality in date and correlated attributes
◦ Trickle loading in OLTP systems => locality in date
◦ Automatically assigned IDs => monotonically increasing counters
[From "Column Imprints: A Secondary Index Structure", Sidirourgos et al., SIGMOD 2013]
Motivation
Simple experiment: compare the hash build phase time of three approaches:
◦ Global hash table using atomics (Atomic)
◦ Parallel Radix Join (PRJ)
◦ Global hash table with no concurrency control (NoCC)
NoCC is incorrect, but the existing approaches are more than 3x slower than it.
Can we do as well as NoCC? Yes, we can!
Rest of this talk:
◦ Using HTM to achieve better performance
◦ Making the HTM-based hash join self-tuning
◦ Adaptively falling back to radix join
Hardware Transactional Memory
A sequence of instructions with ACI(D) properties.

Balance Transfer, three ways:

Using HTM:
  _xbegin()
  A_balance -= 10
  B_balance += 10
  _xend()

Using a Global Lock:
  Lock()
  A_balance -= 10
  B_balance += 10
  Unlock()

Using Fine-Grained Locks:
  A.lock(); B.lock()
  A_balance -= 10
  B_balance += 10
  B.unlock(); A.unlock()

Intel Haswell uses the L1 cache as the transactional staging area.
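The HTM variant can be sketched in C with Intel's RTM intrinsics. This is a minimal sketch, not the paper's code: the retry count is an arbitrary assumption, and the lock-based fallback path is required in practice because an RTM transaction is never guaranteed to commit. On a machine without RTM support the code simply takes the lock path.

```c
#include <pthread.h>
#if defined(__RTM__)
#include <immintrin.h>   /* _xbegin, _xend, _XBEGIN_STARTED */
#endif

static pthread_mutex_t fallback = PTHREAD_MUTEX_INITIALIZER;
static long a_balance = 100, b_balance = 0;

/* Transfer using an HTM transaction when available; after repeated
   aborts (or without RTM), fall back to a global lock. */
void transfer(long amount) {
#if defined(__RTM__)
    for (int retry = 0; retry < 3; retry++) {
        unsigned status = _xbegin();
        if (status == _XBEGIN_STARTED) {
            a_balance -= amount;
            b_balance += amount;
            _xend();          /* commit: both writes become visible atomically */
            return;
        }
        /* transaction aborted: retry a few times, then give up on HTM */
    }
#endif
    pthread_mutex_lock(&fallback);
    a_balance -= amount;
    b_balance += amount;
    pthread_mutex_unlock(&fallback);
}
```

Note that a production fallback must also keep transactional and locked executions mutually consistent (e.g. by reading the lock word inside the transaction); that detail is elided here.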
HTM vs. Using Atomics
The gap between HTM and NoCC is the overhead of using HTM.
HTM always does better than Atomic. The larger gap on shuffled data shows the overhead of an atomic operation versus an optimistic load/store.
Reducing Transaction Overhead
To reduce the per-transaction overhead, perform multiple insertions per transaction.
[Results shown for shuffled data and sorted data.]
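Batching can be sketched as follows. The open-addressing table, its size, and the non-transactional fallback path are illustrative assumptions, not the paper's actual hash table; the point is the loop structure, where one `_xbegin()`/`_xend()` pair is amortized over a whole batch of insertions.

```c
#include <stddef.h>
#if defined(__RTM__)
#include <immintrin.h>
#endif

#define NBUCKETS 1024            /* power of two; illustrative size */
#define EMPTY    0               /* assumes keys are nonzero */

/* Toy open-addressing table with linear probing. */
static unsigned long table[NBUCKETS];

static void insert_one(unsigned long key) {
    size_t h = key & (NBUCKETS - 1);
    while (table[h] != EMPTY)
        h = (h + 1) & (NBUCKETS - 1);
    table[h] = key;
}

/* Insert n keys inside a single transaction, amortizing the fixed
   _xbegin/_xend cost across the batch. */
void insert_batch(const unsigned long *keys, size_t n) {
#if defined(__RTM__)
    unsigned status = _xbegin();
    if (status == _XBEGIN_STARTED) {
        for (size_t i = 0; i < n; i++)
            insert_one(keys[i]);
        _xend();
        return;
    }
    /* on abort, fall through to a fallback path (shown unlocked for brevity) */
#endif
    for (size_t i = 0; i < n; i++)
        insert_one(keys[i]);
}
```

Larger batches amortize overhead better but touch more cache lines per transaction, which raises the abort probability; this tension is exactly what the adaptive scheme later in the talk resolves.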
With Respect to Data Locality
Our Hash Table So Far
Adaptive Transaction Size Selection
Transaction size remains a variable that would otherwise require manual tuning; optimal performance hinges on selecting it appropriately.
Our simple adaptation strategy:
◦ Start with TS = 16
◦ Process the input in batches of 16k tuples and monitor the abort rate
◦ If abort rate > high watermark: TS /= 2
◦ Else if abort rate < low watermark: TS *= 2
We chose 0.4% as the low watermark and 2% as the high watermark.
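The adaptation rule reduces to a small controller. The watermarks are the slide's values (0.4% low, 2% high); the `TS_MIN`/`TS_MAX` bounds are assumptions added here to keep the transaction size in a sane range, not something the slides specify.

```c
#define TS_MIN 1
#define TS_MAX 1024   /* upper bound is an assumption, not from the slides */

/* Given the current transaction size and the abort rate observed over
   the last 16k-tuple batch, return the transaction size for the next
   batch: halve above the high watermark, double below the low one. */
int adapt_ts(int ts, double abort_rate) {
    if (abort_rate > 0.02 && ts > TS_MIN)
        return ts / 2;
    if (abort_rate < 0.004 && ts < TS_MAX)
        return ts * 2;
    return ts;
}
```

Multiplicative increase/decrease lets the controller converge in a logarithmic number of batches while reacting quickly when locality degrades mid-stream.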
Fallback for Fully-Shuffled Data
With sufficient locality, the HTM-based approach performs best; for large shuffle windows, radix join performs better.
Key insight: large shuffle windows also coincide with high transaction abort rates.
Hybrid approach:
◦ Process the first batch of 16k tuples on each thread and inspect the abort rate (takes ~4 ms)
◦ If abort rate > threshold: switch to radix join
We found a threshold of 4% appropriate for our experiments.
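The hybrid decision is a single check after the first batch. The 4% threshold is the slide's value; the function and type names here are illustrative, not from the paper.

```c
typedef enum { HTM_BUILD, RADIX_JOIN } strategy_t;

/* Decide, from the abort count observed over the first batch on a
   thread, whether to continue the HTM-based build or switch to
   radix join. */
strategy_t choose_strategy(long aborts, long batch_size) {
    double abort_rate = (double)aborts / (double)batch_size;
    return abort_rate > 0.04 ? RADIX_JOIN : HTM_BUILD;
}
```

Because the probe batch costs only ~4 ms, the wrong-guess penalty is negligible relative to the build phase, which is what makes the adaptive fallback essentially free.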
Build Phase Performance
Complete Hash Join (with Probe)
We also compare against the No-Partitioning Join (as implemented by Balkesen et al.) and a Sort-Merge Join based on TimSort.
HTM-Adaptive matches or beats all other approaches.
Conclusion
HTM is great for low-overhead, fine-grained concurrency control.
HTM-based hash building with an adaptive transaction size comes very close to memory bandwidth on data with locality.
Abort rates can be used to detect a lack of locality and fall back to radix join.
The resulting join algorithm is the best global-hash-table-based approach:
◦ Beats radix join by 3x on data with locality
◦ Falls back to radix join in its absence
Thank You!
Performance on Uniform Data
Abort Code?