Toward GPU Accelerated Data Stream Processing
Marcus Pinnecke, David Broneske and Gunter Saake
University of Magdeburg, Germany
May 27, 2015

Background and Motivation
Fundamentals, Windowing, GPU Acceleration in DBMS/SPS

Data Stream Processing: Application Requirements

Examples
■ System monitoring and fraud prevention: log files about load, network activity, storage
■ Social media: identify topics of interest online, such as top-k hashtags on Twitter
■ …

Requirements
■ Real-time response
■ Continuous processing and analysis
■ High-volume data, potentially infinite
■ High-velocity data (many changes)

Data Stream Processing: Processing Model and Windowing

Infinite streams of data, but:
■ Limited main memory
■ Only sequential access

Solutions
■ Reduction of the data amount (e.g., sampling), or
■ Buffering (windowing)

[Figure: windowing turns an infinite stream of events into a stream of windows, each of which is finite.]

Windows are either time-based or count-based. Time-based windows are
■ More common in real applications
■ Variable in the number of events per window
■ Problematic due to limited GPU memory
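
The difference between the two window types can be made concrete with a minimal sketch. It assumes tumbling (non-overlapping) windows and events that arrive ordered by timestamp; the slides do not specify either. The `Event` layout and both function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, payload); hypothetical layout


def count_based_windows(events: Iterable[Event], size: int) -> Iterator[List[Event]]:
    """Tumbling windows holding `size` events each (the last one may be shorter)."""
    window: List[Event] = []
    for event in events:
        window.append(event)
        if len(window) == size:
            yield window
            window = []
    if window:
        yield window


def time_based_windows(events: Iterable[Event], span: float) -> Iterator[List[Event]]:
    """Tumbling windows covering `span` seconds each; the event count varies."""
    window: List[Event] = []
    window_end = None
    for event in events:
        timestamp = event[0]
        if window_end is None:
            window_end = timestamp + span  # first window starts at the first event
        while timestamp >= window_end:
            yield window  # may be empty if no event fell into this time span
            window = []
            window_end += span
        window.append(event)
    if window:
        yield window
```

A count-based window has a known upper bound on its size, whereas the size of a time-based window depends entirely on how many events happen to arrive within the time span.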

Data Stream Processing: Bottleneck, Example Join Algorithm (⨝)
■ The number of join candidates depends on the number of events inside the window
■ Time-based windows can contain many events from the same instant
■ Throughput decreases
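
To illustrate why the window size drives the join cost, here is a small sketch that counts join candidates. It assumes a plain nested-loop join over two windows; the slides do not name a specific join algorithm, and `window_join` is a hypothetical name.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

L = TypeVar("L")
R = TypeVar("R")


def window_join(
    left: Sequence[L],
    right: Sequence[R],
    predicate: Callable[[L, R], bool],
) -> Tuple[List[Tuple[L, R]], int]:
    """Join two windows and report how many candidate pairs were inspected."""
    candidates = 0
    matches: List[Tuple[L, R]] = []
    for l_event in left:
        for r_event in right:
            candidates += 1  # every pair of events is a join candidate
            if predicate(l_event, r_event):
                matches.append((l_event, r_event))
    return matches, candidates
```

Under this assumption the number of candidates is the product of the two window sizes, so doubling the number of events in both windows quadruples the work.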

Data Stream Processing: Bottleneck, Back Pressure

Data flow systems (e.g., stream processing) suffer from back pressure.

Back pressure
■ A decrease of throughput that propagates upstream
■ Down to the level of the slowest component

The result is a need for load shedding.
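
The effect of back pressure can be sketched as a simple model, assuming a linear pipeline in which each stage has a fixed maximum rate in events per second. Both function names and the idea of expressing load shedding as a fraction of the input are assumptions made for illustration.

```python
from typing import Sequence


def effective_throughput(stage_rates: Sequence[float]) -> float:
    """A pipeline cannot run faster than its slowest component."""
    return min(stage_rates)


def shed_fraction(input_rate: float, stage_rates: Sequence[float]) -> float:
    """Fraction of incoming events that must be dropped to keep up."""
    bottleneck = effective_throughput(stage_rates)
    if input_rate <= bottleneck:
        return 0.0  # the pipeline keeps up, nothing needs to be shed
    return 1.0 - bottleneck / input_rate
```

For example, with stages that can handle 1000, 200 and 800 events per second, the whole pipeline is limited to 200 events per second, and an input of 400 events per second would require shedding half of the load.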

Data Stream Processing: Bottleneck

[Figure, animated: a pipeline of operators (σ, …, ⨝, σ, …) whose throughput drops to that of the slowest component.]

Data Stream Processing: Bottleneck, Solutions
■ Parallelization of operators [Figure: operators A, B, C]
■ Distributed computation [Figure: operators A, B, C spread across Site 1 and Site 2]

Both provide more computation resources.

What about the GPU as a computation resource next to the CPU? How is it used in DBMS?

Database Management Systems: GPUs in DBMS
■ … Efficient co-processor
■ … Might outperform CPUs for certain operations
■ … Computations are highly parallel (SIMD)
■ … Huge corpus of research results

Some conclusions
■ Data transfer costs to and from the graphics card are critical
■ Operations should match the GPU architecture (e.g., branch-free)
■ Operations must be expensive enough to amortize transfer costs
■ Column-oriented architectures save transfer costs
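
The amortization argument can be written down as a rough cost model. This is a sketch under simplifying assumptions: transfer time is modeled as bytes divided by a constant bandwidth, and all names and parameters are hypothetical rather than taken from the slides.

```python
def transfer_time(bytes_moved: float, bandwidth_bytes_per_s: float) -> float:
    """Time to copy data to and from the graphics card (simplified model)."""
    return bytes_moved / bandwidth_bytes_per_s


def gpu_offload_pays_off(
    cpu_time_s: float,
    gpu_time_s: float,
    bytes_moved: float,
    bandwidth_bytes_per_s: float,
) -> bool:
    """Offloading helps only if GPU compute plus transfer beats the CPU."""
    total_gpu = gpu_time_s + transfer_time(bytes_moved, bandwidth_bytes_per_s)
    return total_gpu < cpu_time_s
```

In this model a column-oriented layout helps because only the columns an operation actually needs are counted in `bytes_moved`, whereas a row-oriented layout would ship whole tuples.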

GPU Acceleration for Data Stream Processing: Challenges
■ Limited memory on graphics cards vs. (time-based) windows that can be huge
■ The event representation (tuple) does not match the GPU architecture

GPU-ready Stream Processing
Our 1st contribution: handle the graphics card memory limitation for very large windows via bucketing

GPU-ready Stream Processing: Bucketing

We suggest portioning each stream of variable-length windows of tuples into a stream of "buckets".

Bucket: a fixed-size window portion with column-oriented event representation.
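
A minimal sketch of this idea, assuming a window is given as a list of tuples and a bucket is represented as a mapping from attribute name to a column of values. The function name `bucketize` and the handling of a shorter final portion are assumptions; the slides do not say how a window remainder is treated.

```python
from typing import Dict, Iterator, List, Sequence, Tuple


def bucketize(
    window: Sequence[Tuple],
    schema: Sequence[str],
    bucket_size: int,
) -> Iterator[Dict[str, List]]:
    """Split one window of tuples into column-oriented buckets."""
    for start in range(0, len(window), bucket_size):
        portion = window[start:start + bucket_size]
        # Flip the representation: one list per attribute instead of one tuple per event.
        yield {
            name: [event[index] for event in portion]
            for index, name in enumerate(schema)
        }
```

Whatever the window length, no bucket holds more than `bucket_size` events, which is what bounds the memory needed on the graphics card per processing step.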

GPU-ready Stream Processing: Bucketing (2)

[Figure, animated: a bucketing operator feeds two operators that work bucket-at-a-time, one with bucket size 3 and one with bucket size 5. As events arrive, each operator receives column-oriented buckets of 3 or 5 events, respectively.]

GPU-ready Stream Processing: Benefits through Bucketing

We suggest a technique called bucketing, which portions each stream of variable-length windows of tuples (events) into a stream of fixed-size window portions with column-oriented event representation (buckets).

■ Each operator requests its own bucket size k
■ The bucket size is independent of the actual window length
■ Memory allocation on the graphics card has an upper bound for input
■ Bucketing flips the event representation
■ Processing entire columns
■ If the window length exceeds the bucket size, the window is split into portions
■ A single bucketing operator can be subscribed to by many operators

GPU-ready Stream Processing: Buckets versus Windows

                       Windowing                   Bucketing
Purpose                Bounding infinite stream    Portioning windows
Consumes               Stream of events            Stream of windows
Produces               Stream of windows           Stream of buckets
#Events                Might be huge               Has upper bound
Event representation   Tuples                      Column-wise

GPU-ready Stream Processing: Achieve Bucketing

[Figure, animated: the stream schema is mapped to ring buffers 1 … n. An incoming event (a1, b1, c1) is split so that a1 goes to ring buffer 1, b1 to ring buffer 2, and c1 to ring buffer n. Subscribers 1 … n each receive slices of these buffers. Further labels: length, actual view.]
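
The following sketch shows how one bucketing operator could serve several subscribers with different bucket sizes. It is an illustration under stated assumptions, not the authors' implementation: plain Python lists stand in for the per-attribute ring buffers (so eviction and wrap-around are omitted), and the class name, method names and callback interface are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Bucket = Dict[str, list]


class BucketingOperator:
    """Stores events column-wise and hands fixed-size slices to subscribers."""

    def __init__(self, schema: Sequence[str]) -> None:
        self.schema = tuple(schema)
        # One buffer per attribute; an unbounded list replaces the ring buffer here.
        self.columns: Dict[str, list] = {name: [] for name in self.schema}
        self.length = 0
        # Each subscriber entry is [bucket_size, read_position, callback].
        self.subscribers: List[list] = []

    def subscribe(self, bucket_size: int, callback: Callable[[Bucket], None]) -> None:
        """Register an operator that wants buckets of `bucket_size` events."""
        self.subscribers.append([bucket_size, self.length, callback])

    def push(self, event: Sequence) -> None:
        """Split one tuple across the column buffers and notify subscribers."""
        for name, value in zip(self.schema, event):
            self.columns[name].append(value)
        self.length += 1
        for subscriber in self.subscribers:
            size, position, callback = subscriber
            if self.length - position >= size:
                # Slice the same columns for every subscriber, at its own size.
                callback({
                    name: column[position:position + size]
                    for name, column in self.columns.items()
                })
                subscriber[1] = position + size
```

With two subscribers registered for bucket sizes 3 and 5 and eight events pushed, the first subscriber receives two buckets and the second receives one, while the events are stored only once.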