Accelerating the merge phase of sort-merge join
FPL 2019 – The 29th International Conference on Field-Programmable Logic and Applications
Philippos Papaphilippou, Holger Pirk, Wayne Luk
Dept. of Computing, Imperial College London, UK
{pp616, pirk, w.luk}@imperial.ac.uk
Source code: philippos.info/mergejoin
9/9/2019

The task: equi-join

Table A          Table B          Result (A ⨝ B)
A-Key  Value     B-Key  Value     A-Key  B-Key  Value
A1     2         B1     2         A1     B1     2
A2     2         B2     2         A1     B2     2
A3     3         B3     3         A2     B1     2
A4     3         B4     5         A2     B2     2
A5     3         B5     6         A3     B3     3
A6     11                         A4     B3     3
                                  A5     B3     3

● Equi-join
  – Join two tables based on key equality
  – Cartesian product when a key appears more than once in either table
● Popular algorithms
  – Hash-join → Random access pattern
  – Sort-merge join → Streaming access pattern → FPGA friendly
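
The equi-join on the slide can be reproduced with a small software sort-merge join. This is a minimal sketch for illustration only, not the hardware design from the talk; the function name `sort_merge_join` and the `(row_id, key)` row layout are assumptions.

```python
def sort_merge_join(table_a, table_b):
    """Equi-join two lists of (row_id, key) pairs.

    Returns (a_row_id, b_row_id, key) rows. Hypothetical helper for
    illustration; not part of the presented design.
    """
    # Sort phase: order both tables by key (sorted() is stable).
    a = sorted(table_a, key=lambda row: row[1])
    b = sorted(table_b, key=lambda row: row[1])

    out = []
    i = j = 0
    # Merge phase: advance whichever side holds the smaller key.
    while i < len(a) and j < len(b):
        if a[i][1] < b[j][1]:
            i += 1
        elif a[i][1] > b[j][1]:
            j += 1
        else:
            key = a[i][1]
            # Find the end of the run of equal keys on each side.
            i_end = i
            while i_end < len(a) and a[i_end][1] == key:
                i_end += 1
            j_end = j
            while j_end < len(b) and b[j_end][1] == key:
                j_end += 1
            # Duplicate keys produce a Cartesian product of the two runs.
            for row_a in a[i:i_end]:
                for row_b in b[j:j_end]:
                    out.append((row_a[0], row_b[0], key))
            i, j = i_end, j_end
    return out


# The two input tables from the slide.
TABLE_A = [("A1", 2), ("A2", 2), ("A3", 3), ("A4", 3), ("A5", 3), ("A6", 11)]
TABLE_B = [("B1", 2), ("B2", 2), ("B3", 3), ("B4", 5), ("B5", 6)]
RESULT = sort_merge_join(TABLE_A, TABLE_B)
```

Key 2 appears twice in both tables and contributes 2 × 2 rows; key 3 appears three times in A and once in B and contributes 3 × 1 rows. Both inputs are read sequentially in the merge phase, which is the streaming access pattern the slide refers to.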

Challenges in related work
● Input properties
  – Presence of duplicate keys → complicates the hardware and access patterns
  – Long input → limited storage inside the FPGA
  – Wide input → moving big rows is expensive
  – Some designs are inapplicable or slow down
● Data movement
  – Narrow inter-chip (CPU ↔ FPGA) communication
  – Induced latency
● Scalability
  – Future technologies (High-throughput)
  – Big data → arbitrarily long tables

Abstracted solution
● High-Throughput Stream processor
● Inputs
  – Sorted keys of table A
  – Sorted keys of table B
● Output
  – Index ranges where the key was the same
● Expand on demand (late materialisation)
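
A sketch of what "expand on demand" could look like on the consumer side: the join output stays as compact index ranges, and row pairs are only generated when something iterates over them. The tuple layout `(a_start, a_end, b_start, b_end, key)` with inclusive end indexes is an assumption, as are the names `expand`, `ROWS_A`, `ROWS_B` and `RANGES`; the sample ranges are worked out by hand from the example tables (sorted keys of A: 2, 2, 3, 3, 3, 11; sorted keys of B: 2, 2, 3, 5, 6).

```python
import itertools
from typing import Iterator, List, Tuple

# (a_start, a_end, b_start, b_end, key); end indexes are inclusive (assumption).
JoinRange = Tuple[int, int, int, int, int]


def expand(ranges: List[JoinRange], rows_a: list, rows_b: list) -> Iterator[tuple]:
    """Lazily turn index ranges into joined row pairs (hypothetical helper)."""
    for a_start, a_end, b_start, b_end, key in ranges:
        for i in range(a_start, a_end + 1):
            for j in range(b_start, b_end + 1):
                # Rows are only touched here, at materialisation time.
                yield (rows_a[i], rows_b[j], key)


ROWS_A = ["A1", "A2", "A3", "A4", "A5", "A6"]
ROWS_B = ["B1", "B2", "B3", "B4", "B5"]
# Two matching keys (2 and 3) give two range tuples instead of seven rows.
RANGES = [(0, 1, 0, 1, 2), (2, 4, 2, 2, 3)]

# Only the first three pairs are materialised; the rest are never computed.
FIRST_THREE = list(itertools.islice(expand(RANGES, ROWS_A, ROWS_B), 3))
```

Because the ranges carry indexes rather than row contents, their size does not depend on how wide the rows are or on how large each Cartesian product is.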

Proposal
● Building blocks
  – Round-robin module
  – Co-grouping engine
  – Modified FLiMS

Round-robin module
● Stream processor
● Rearranges sparse input, before writing in multiple banks
● Round-robin effect, but in parallel
[Figure: round-robin module block diagram; labels: CAS network (bitonic sorter), Barrel Shifters, MSB, +, SR]
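
A behavioural sketch of the round-robin effect, under assumptions: each cycle delivers a fixed number of lanes, some of them empty (`None`), and the valid entries are written to consecutive banks starting where the previous cycle stopped. This models only the observable result. It does not model the CAS network or barrel shifters named in the figure, and `round_robin_write` and its parameters are hypothetical names.

```python
def round_robin_write(cycles, num_banks):
    """Distribute the valid entries of each cycle over banks in round-robin order.

    cycles: list of per-cycle lane lists, where None marks an empty lane.
    Returns the contents of each bank. Software model only (assumption).
    """
    banks = [[] for _ in range(num_banks)]
    next_bank = 0  # bank that receives the next valid entry
    for lanes in cycles:
        # Compaction: keep the valid entries, preserving their order.
        valid = [value for value in lanes if value is not None]
        # Rotation: consecutive entries go to consecutive banks.
        for offset, value in enumerate(valid):
            banks[(next_bank + offset) % num_banks].append(value)
        next_bank = (next_bank + len(valid)) % num_banks
    return banks


# Three cycles of sparse 4-lane input.
CYCLES = [
    [None, "a", None, "b"],
    ["c", None, None, None],
    ["d", "e", "f", None],
]
BANKS = round_robin_write(CYCLES, num_banks=4)
```

In this model every bank receives at most one entry per cycle as long as the lane count does not exceed the bank count, so all writes of a cycle could be issued in parallel.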

Co-grouping engine
● Stream processor
● Provides ranges of indexes, <index_start, index_end, key>, where the key was the same
● Input: Sorted keys <index, key>
● Output: Unique keys, index ranges
[Figure: co-grouping engine datapath; labels: 1 cycle delay, f and g blocks for lanes 0 to P-1, Round Robin]
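
The input/output contract of the co-grouping engine can be sketched sequentially as follows. The hardware handles several lanes per cycle; this software version processes one key at a time and only illustrates the mapping from sorted keys to ranges. The name `co_group` and the inclusive end index are assumptions.

```python
def co_group(sorted_keys):
    """Collapse runs of equal keys into (index_start, index_end, key) tuples.

    sorted_keys: keys in non-decreasing order; the index is the list position.
    End indexes are inclusive (assumption). Sequential model only.
    """
    groups = []
    for index, key in enumerate(sorted_keys):
        if groups and groups[-1][2] == key:
            # Same key as the previous entry: extend the current range.
            start, _, _ = groups[-1]
            groups[-1] = (start, index, key)
        else:
            # New key: open a range that covers just this index.
            groups.append((index, index, key))
    return groups


# Sorted keys of the example tables A and B.
GROUPS_A = co_group([2, 2, 3, 3, 3, 11])
GROUPS_B = co_group([2, 2, 3, 5, 6])
```

After this step each key appears exactly once per stream, so duplicates no longer need special handling downstream.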

Join module
● Task: merge 2 co-grouped streams
● Output: tuples of the form <index_A_start, index_A_end, index_B_start, index_B_end, key>
● Main idea: Sort them together
  – Based on a high-throughput H/W merge sorter (FLiMS [FPT’18])
  – Match same-key groups by only looking at consecutive entries
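
The "sort them together" idea can be sketched in software: tag each co-grouped entry with its source, merge the two streams by key, and compare neighbours. Since keys are unique within each co-grouped stream, a match can only show up as two adjacent entries with the same key. Python's `heapq.merge` stands in for the hardware merge sorter here; it is not the FLiMS algorithm, and `join_groups` is a hypothetical name.

```python
import heapq


def join_groups(groups_a, groups_b):
    """Join two co-grouped streams of (index_start, index_end, key) tuples.

    Returns (a_start, a_end, b_start, b_end, key) tuples. Software model
    only; heapq.merge replaces the hardware merge sorter (assumption).
    """
    # Tag 0 = stream A, tag 1 = stream B, so A sorts before B on equal keys.
    tagged_a = [(key, 0, start, end) for start, end, key in groups_a]
    tagged_b = [(key, 1, start, end) for start, end, key in groups_b]
    merged = list(heapq.merge(tagged_a, tagged_b))

    out = []
    # Only consecutive entries need to be compared.
    for left, right in zip(merged, merged[1:]):
        if left[0] == right[0]:
            # Equal keys are adjacent, with the A entry first.
            out.append((left[2], left[3], right[2], right[3], left[0]))
    return out


# Co-grouped streams for the example tables (inclusive end indexes).
GROUPS_A = [(0, 1, 2), (2, 4, 3), (5, 5, 11)]
GROUPS_B = [(0, 1, 2), (2, 2, 3), (3, 3, 5), (4, 4, 6)]
JOINED = join_groups(GROUPS_A, GROUPS_B)
```

Keys 5, 6 and 11 each occur in only one stream, so they have no equal-key neighbour after merging and produce no output.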

Advantages
● Input agnostic
  – Index-based
  – Big data analytics
● Stream processor
  – FPGA-friendly
● Modular design
  – Novel building blocks
  – Can be combined with other blocks: H/W sorters, filters, ...
● High-throughput design
  – Scalable for future architectures
  – Lower resources than related work

Evaluation on a heterogeneous system
● Platform
  – Zynq UltraScale+ device
  – Operating system: Petalinux
  – Communication: DMA transfers
● Speedup of up to 3.1 times
  – 1-port (H/W) vs 1-thread (S/W)
● Input design space exploration
  – Fraction of distinct keys (%)
  – Fraction of key matches (%) (directly related to the output size)
● Speedup variation factors
  – CPU performance
  – Length of the DMA transfers (CPU → FPGA)
[Figure: FPGA speedup (colour scale, 1.5 to 3); axes: Distinct keys in A, B (%), 0 to 100, and Output size (# of rows), 0 to 16384. Annotation: Empty space: no more key matches than the number of distinct keys]

END
Thank you for your attention!
Source code for Ultra96: philippos.info/mergejoin