Window-Based Hybrid Stream Processing for Heterogeneous Architectures
github.com/lsds/saber
Alexandros Koliousis, a.koliousis@imperial.ac.uk
Joint work with Matthias Weidlich, Raul Castro Fernandez, Alexander L. Wolf, Paolo Costa & Peter Pietzuch
Large-Scale Distributed Systems Group (LSDS), Department of Computing, Imperial College London
http://lsds.doc.ic.ac.uk

High-Throughput Low-Latency Analytics
– NovaSparks: 150M stock options/s, in less than 1 ms
– Facebook Insights: 9 GB of page metrics/s, in less than 10 s
– Google Zeitgeist: 40K user queries/s, within ms
– Feedzai: 40K card transactions/s, in 25 ms
[Figure: a window sliding over the stream from time t to t+1]

Exploit Single-Node Heterogeneous Hardware
Servers with CPUs and GPUs now common
– 10x higher linear memory access throughput
– Limited data transfer throughput
[Figure: two CPU sockets with 8 cores each (C1 to C8), L3 caches, DRAM and 10s GB of RAM, connected over the PCIe bus (DMA, command queue) to a GPU with 10s of streaming processors, 1000s of cores, an L2 cache and DRAM]
Use both CPU & GPU resources for stream processing

With Well-Defined High-Level Queries
CQL: SQL-based declarative language for continuous queries [Arasu et al., VLDBJ’06]
Credit card fraud detection example:
– Find attempts to use same card in different regions within 5-min window
CQL offers correct window semantics
Self-join over Payments:

    select distinct W.cid
    from Payments [range 300 seconds] as W,
         Payments [partition-by 1 row] as L
    where W.cid = L.cid and W.region != L.region
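
To make the window semantics of the query concrete, here is a minimal Python sketch of the same check. It is not how SABER evaluates CQL. The tuple layout `(timestamp, cid, region)`, the function name `suspicious_cards` and the decision to test only the newest payment against the window are assumptions made for illustration.

```python
from collections import deque

WINDOW_SECONDS = 300  # mirrors [range 300 seconds] in the query


def suspicious_cards(payments):
    """Yield (timestamp, cid) for each payment whose card was also used in a
    different region during the preceding 300 seconds.

    `payments` is an iterable of (timestamp, cid, region) tuples ordered by
    timestamp; this layout is an assumption of the sketch.
    """
    window = deque()  # contents of the time-based window W
    for ts, cid, region in payments:
        # Evict tuples that have fallen out of the 300-second range.
        while window and window[0][0] <= ts - WINDOW_SECONDS:
            window.popleft()
        window.append((ts, cid, region))
        # The newest tuple plays the role of L (the latest row for its card).
        if any(c == cid and r != region for _, c, r in window):
            yield ts, cid


payments = [(0, "A", "UK"), (100, "B", "UK"), (200, "A", "FR"), (600, "A", "UK")]
print(list(suspicious_cards(payments)))
```

Card A is flagged at time 200 because it was seen in a different region 200 seconds earlier; by time 600 the earlier payments have left the window.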

SABER: Window-Based Hybrid Stream Processing Engine for CPUs & GPUs
Challenges & Contributions
1. How to parallelise sliding-window queries across CPU and GPU?
   Decouple query semantics from system parameters
2. When to use CPU or GPU for a CQL operator?
   Hybrid processing: offload tasks to both CPU and GPU
3. How to reduce GPU data movement costs?
   Amortise data movement delays with deep pipelining

How to Parallelise Window Computation?
Problem: Window semantics affect system throughput and latency
– Pick task size based on window size?
[Figure: stream of tuples 1 to 6 with window size 4 sec and slide 1 sec; tasks T1 and T2 each process one whole window, and window results are output in order]
Window-based parallelism results in redundant computation

How to Parallelise Window Computation?
Problem: Window semantics affect system throughput and latency
– Pick task size based on window size? On window slide?
[Figure: stream of tuples 1 to 6 with window size 4 sec and slide 1 sec; tasks T1 to T5 each process one slide, and window results are composed from partial results]
Slide-based parallelism limits GPU parallelism
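
The trade-off between the two options can be sketched in a few lines of Python. With size 4 and slide 1, window-based tasks read every tuple up to four times, whereas slide-based tasks read each tuple once and compose window results from partial results. The sketch uses count-based windows and a sum aggregate for simplicity (the slides use time-based windows); the function name and the requirement that `size` is a multiple of `slide` are assumptions of the example.

```python
def window_sums_from_slides(values, size, slide):
    """Sum each sliding window by combining per-slide partial sums.

    Windows are count-based: `size` and `slide` are numbers of tuples, and
    `size` is assumed to be a multiple of `slide`.
    """
    if size % slide != 0:
        raise ValueError("size must be a multiple of slide")
    # One partial result per slide-sized chunk; each tuple is read once.
    partials = [sum(values[i:i + slide])
                for i in range(0, len(values) - slide + 1, slide)]
    per_window = size // slide
    # A window result is composed from `per_window` consecutive partials.
    return [sum(partials[i:i + per_window])
            for i in range(len(partials) - per_window + 1)]


print(window_sums_from_slides([1, 2, 3, 4, 5, 6], size=4, slide=1))
```

Each per-slide partial is small (one tuple here), which illustrates why slide-sized tasks offer little work for a GPU to parallelise.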

SABER’s Window Processing Model
Idea: Decouple task size from window size/slide
– Pick based on underlying hardware features
  • e.g. PCIe throughput
– Task contains one or more window fragments
  • e.g. closing/pending/opening windows in T2
[Figure: stream of tuples 1 to 15 split into tasks T1, T2 and T3 of 5 tuples/task; windows w1 to w5 with size 7 rows and slide 2 rows]
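
The fragment types can be worked out from the figure's parameters. The following Python sketch classifies the windows that overlap one task; it assumes count-based windows numbered from w1, 1-based tuple positions, and adds a fourth label, "complete", for a window that fits entirely inside a task. The function name is hypothetical.

```python
def window_fragments(task_start, task_end, size, slide):
    """Classify the windows that overlap a task covering tuples
    task_start..task_end (1-based, inclusive).

    Window k (k = 1, 2, ...) covers tuples
    (k - 1) * slide + 1 .. (k - 1) * slide + size.
    """
    fragments = {}
    k = 1
    while (k - 1) * slide + 1 <= task_end:
        w_start = (k - 1) * slide + 1
        w_end = w_start + size - 1
        if w_end >= task_start:  # the window overlaps the task
            opens = w_start >= task_start   # first tuple is in this task
            closes = w_end <= task_end      # last tuple is in this task
            if opens and closes:
                kind = "complete"
            elif closes:
                kind = "closing"
            elif opens:
                kind = "opening"
            else:
                kind = "pending"
            fragments[f"w{k}"] = kind
        k += 1
    return fragments


# Task T2 holds tuples 6..10 when tasks contain 5 tuples each.
print(window_fragments(6, 10, size=7, slide=2))
```

For T2 this gives w1 and w2 as closing, w3 as pending, and w4 and w5 as opening, in line with the example on the slide.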

Merging Window Fragment Results
Idea: Decouple task size from window size/slide
– Assemble window fragment results
– Output them in correct order
[Figure: Worker A (T1) and Worker B (T2) write window fragment results for w1 to w5 into slots 1 and 2 of the result stage's circular buffer; merged results for w1, w2 and w3 are output]
Worker A stores T1 results, merges window fragment results and forwards complete windows downstream
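
A minimal Python sketch of this merge step, under stated assumptions: the aggregate is a sum, windows are count-based, a plain dictionary stands in for the circular buffer of slots, and task results may arrive in any order. The names `task_partials` and `result_stage` are hypothetical and are not taken from the SABER code base.

```python
def task_partials(values, first, size, slide):
    """Partial sum per window for one task.

    `values` are the task's tuples and `first` is the 1-based stream position
    of values[0]. Returns {window_id: (partial_sum, closes)}, where `closes`
    tells whether the window's last tuple lies inside this task.
    """
    last = first + len(values) - 1
    out = {}
    k = 1
    while (k - 1) * slide + 1 <= last:
        w_start = (k - 1) * slide + 1
        w_end = w_start + size - 1
        lo, hi = max(w_start, first), min(w_end, last)
        if lo <= hi:  # the window has a fragment in this task
            out[k] = (sum(values[lo - first:hi - first + 1]), w_end <= last)
        k += 1
    return out


def result_stage(arrivals):
    """Merge fragment results and emit complete windows in window order.

    `arrivals` is a sequence of (task_id, partials) pairs in completion order,
    which may differ from task order.
    """
    slots, pending, output = {}, {}, []
    next_task = 1
    for task_id, partials in arrivals:
        slots[task_id] = partials
        # Drain slots strictly in task order so results stay ordered.
        while next_task in slots:
            for k, (partial, closes) in sorted(slots.pop(next_task).items()):
                pending[k] = pending.get(k, 0) + partial
                if closes:
                    output.append((k, pending.pop(k)))
            next_task += 1
    return output


stream = list(range(1, 16))  # tuples 1..15, as in the earlier figure
t1 = task_partials(stream[0:5], 1, size=7, slide=2)
t2 = task_partials(stream[5:10], 6, size=7, slide=2)
t3 = task_partials(stream[10:15], 11, size=7, slide=2)
print(result_stage([(2, t2), (1, t1), (3, t3)]))  # T2 finishes before T1
```

Although T2 finishes first, nothing is emitted until T1's slot is filled, so windows w1 to w5 still leave the result stage in order.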

SABER’s Hybrid Stream Processing Model
Idea: Enable tasks to run on both processors
– Scheduler assigns tasks to idle processors
Past behavior (task time per query):
    Query   CPU    GPU
    QA      3 ms   2 ms
    QB      3 ms   1 ms
Task queue (CPU comes first): T1 (QB), T2 (QA), T3 (QB), T4 (QB), T5 (QB), T6 (QB), T7 (QA), T8 (QB), T9 (QA), T10 (QA)
First-Come First-Served schedule (timeline 0 to 12 ms):
    CPU: T1, T4, T8, T10
    GPU: T2, T3, T5, T6, T7, T9, then idle
FCFS ignores effectiveness of processor for given task
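
The FCFS schedule can be reproduced with a short simulation. This is a sketch under assumptions: all ten tasks are queued at time 0, the query of each task is as read from the slide's task queue, and ties between idle processors go to the CPU ("CPU comes first"). The names `COST_MS`, `QUEUE` and `fcfs` are made up for the example.

```python
# Past behaviour: task time in ms per query and processor.
COST_MS = {"QA": {"CPU": 3, "GPU": 2}, "QB": {"CPU": 3, "GPU": 1}}
# Query of tasks T1..T10, as read from the task queue (an assumption).
QUEUE = ["QB", "QA", "QB", "QB", "QB", "QB", "QA", "QB", "QA", "QA"]


def fcfs(queue, cost, processors=("CPU", "GPU")):
    """First-come first-served: whenever a processor becomes idle it takes
    the task at the head of the queue. Ties go to the processor listed first."""
    free_at = {p: 0 for p in processors}
    schedule = {p: [] for p in processors}
    for i, query in enumerate(queue, start=1):
        # Earliest idle processor; min() keeps the first one on ties.
        p = min(processors, key=lambda proc: free_at[proc])
        schedule[p].append(f"T{i}")
        free_at[p] += cost[query][p]
    return schedule, max(free_at.values())


schedule, makespan = fcfs(QUEUE, COST_MS)
print(schedule, makespan)
```

Under these assumptions the CPU runs T1, T4, T8 and T10, the GPU runs the other six tasks and then idles, and the last task finishes at 12 ms.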

Heterogeneous Look-Ahead Scheduler (HLS)
Idea: Idle processor skips tasks that could be executed faster by another processor
– Decision based on observed query task throughput
Past behavior (task time per query):
    Query   CPU    GPU
    QA      3 ms   2 ms
    QB      3 ms   1 ms
Task queue (CPU comes first): T1 (QB), T2 (QA), T3 (QB), T4 (QB), T5 (QB), T6 (QB), T7 (QA), T8 (QB), T9 (QA), T10 (QA)
HLS schedule (timeline 0 to 12 ms):
    CPU: T3, T7, T10
    GPU: T1, T2, T4, T5, T6, T8, T9
HLS fully utilises processors
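
A small simulation shows how skipping produces this schedule. The slide does not spell out the exact skipping rule, so the one used here is an assumption: an idle processor walks the queue, skips each task that another processor runs faster, and adds that task's time on the faster processor to a running delay; it takes the first task for which it is the faster processor or for which the accumulated delay is at least its own task time. If the look-ahead finds nothing, the sketch falls back to the head of the queue so that it always makes progress (also an assumption). All names are hypothetical.

```python
# Past behaviour: task time in ms per query and processor.
COST_MS = {"QA": {"CPU": 3, "GPU": 2}, "QB": {"CPU": 3, "GPU": 1}}
# Query of tasks T1..T10, as read from the task queue (an assumption).
QUEUE = ["QB", "QA", "QB", "QB", "QB", "QB", "QA", "QB", "QA", "QA"]


def hls(queue, cost, processors=("CPU", "GPU")):
    """Look-ahead scheduling sketch: an idle processor may skip tasks that
    another processor would execute faster."""
    pending = list(enumerate(queue, start=1))
    free_at = {p: 0 for p in processors}
    schedule = {p: [] for p in processors}
    while pending:
        # Next processor to become idle; ties go to the one listed first.
        p = min(processors, key=lambda proc: free_at[proc])
        delay = 0  # work skipped so far, measured on the faster processor
        for pos, (_, query) in enumerate(pending):
            preferred = min(processors, key=lambda proc: cost[query][proc])
            if preferred == p or delay >= cost[query][p]:
                break  # run this task on p
            delay += cost[query][preferred]
        else:
            pos = 0  # nothing selected: fall back to the head of the queue
        task_id, query = pending.pop(pos)
        schedule[p].append(f"T{task_id}")
        free_at[p] += cost[query][p]
    return schedule, max(free_at.values())


schedule, makespan = hls(QUEUE, COST_MS)
print(schedule, makespan)
```

With the same queue and task times as the FCFS example, this rule assigns T3, T7 and T10 to the CPU and the remaining tasks to the GPU, and both processors finish at 9 ms.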

The SABER Architecture
Implementation: Java (15K LOC), C & OpenCL (4K LOC)
– Dispatching stage: dispatch fixed-size tasks
– Scheduling & execution stage: dequeue tasks based on HLS
– Result stage: merge & forward partial window results
[Figure: tasks T1 and T2 flow from the dispatching stage to operators (op) running on the CPU and the GPU, then to the result stage (α)]

Is Hybrid Stream Processing Effective?
Different queries result in different CPU:GPU processing split that is hard to predict offline
[Figure: stacked bars of throughput (10^6 tuples/s, 0 to 50) showing SABER (CPU contrib.) and SABER (GPU contrib.) for queries CM2 (Cluster Mgmt.), SG1 and SG2 (Smart Grid), LRB3 and LRB4 (LRB); operator labels include select, aggr avg, group-by avg and group-by cnt]
Hardware: Intel Xeon 2.6 GHz, 16 cores; NVIDIA Quadro K5200, 2,304 cores

Is Hybrid Stream Processing Effective?
Aggregate throughput of CPU and GPU is always higher than that of either processor alone
[Figure: throughput (GB/s) of SABER (CPU only), SABER (GPU only) and SABER for Aggregation, Group-by and θ-join, on two scales (0 to 6 and 0 to 0.3); annotations: "GPU is faster", "CPU is faster", "Not additive due to queue contention"]

Is Heterogeneous Look-Ahead Scheduling Effective?
[Figure: throughput (GB/s, 0 to 5) of FCFS, Static and HLS for workloads W1 (project, group-by cnt) and W2 (project, aggr sum); per-operator CPU/GPU annotations: π 5x, π 1.5x, α 1.5x, γ 6x]
W1 benefits from static scheduling but HLS fully utilises GPU:
– GPU also runs ~1% of group-by tasks
W2 benefits from FCFS but HLS better utilises GPU:
– HLS CPU:GPU split is 1:2.5 for project and 1:0.5 for aggr

Summary
Window processing model
– Decouples query semantics from system parameters
Hybrid stream processing model
– Can achieve aggregate throughput of heterogeneous processors
Hybrid Look-ahead Scheduling (HLS)
– Allows use of both CPU and GPU opportunistically for arbitrary workloads
Thank you! Any Questions?
Alexandros Koliousis
github.com/lsds/saber
