Spinning Relations: High-Speed Networks for Distributed Join - PowerPoint PPT Presentation

Spinning Relations: High-Speed Networks for Distributed Join Processing
Philip Frey, Romulo Goncalves, Martin Kersten, Jens Teubner

Problem Statement
We address a core database problem, but for large problem sizes:
Process a join R ⋈_θ S (arbitrary join predicate).
R and S are large (many gigabytes, even terabytes).
Traditional approach:
Use a big machine and/or suffer the severe disk I/O bottleneck of block nested loops join.
Can do distributed evaluation only for certain θ or certain data distributions (or suffer high network I/O cost).
Today:
Assume a cluster of commodity machines only.
Leverage modern high-speed networks (10 Gb/s and beyond).
Jens Teubner · Spinning Relations: High-Speed Networks for Distributed Join Processing 2 / 11

Modern Networks: High Speed?
It is actually very hard to saturate modern (e.g., 10 Gb/s) networks.
[Figure: System 1 and System 2, each with CPU, RAM, and NIC, connected by an underutilized network.]
High CPU demand
◮ Rule of thumb: 1 GHz CPU per 1 Gb/s network throughput (!)
Memory bus contention
◮ Data typically has to cross the memory bus three times → ≈ 3 GB/s bus capacity needed for 10 Gb/s network
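The two rules of thumb on this slide can be worked through numerically. The sketch below is only an illustration: the function names are hypothetical, and it assumes decimal units (1 GB = 8 Gb). Under those assumptions three bus crossings at full 10 Gb/s line rate come out at 3.75 GB/s, the same order of magnitude as the ≈ 3 GB/s quoted on the slide.

```python
def cpu_ghz_needed(link_gbit_per_s: float, ghz_per_gbit: float = 1.0) -> float:
    """Rule of thumb from the slide: about 1 GHz of CPU per 1 Gb/s of throughput."""
    return link_gbit_per_s * ghz_per_gbit


def bus_gbyte_per_s_needed(link_gbit_per_s: float, bus_crossings: int = 3) -> float:
    """Memory bus load if every byte crosses the bus `bus_crossings` times.

    Assumes decimal units: 8 Gb/s of payload equals 1 GB/s.
    """
    payload_gbyte_per_s = link_gbit_per_s / 8.0
    return payload_gbyte_per_s * bus_crossings


if __name__ == "__main__":
    print(cpu_ghz_needed(10))          # 10.0 (GHz of CPU capacity)
    print(bus_gbyte_per_s_needed(10))  # 3.75 (GB/s of bus bandwidth)
```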

RDMA: Remote Direct Memory Access
RDMA-capable network cards (RNICs) can saturate the link using
direct data placement (avoid unnecessary bus transfers),
OS bypassing (avoid context switches), and
TCP offloading (avoid CPU load).
[Figure: System 1 and System 2, each with CPU, RAM, and RNIC, connected by a fully utilized network.]
Data is read/written on both ends using intra-host DMA.
Asynchronous transfer after work request issued by CPU.

Cyclo-Join Idea
[Figure: six hosts H0 to H5 arranged in a ring and connected by RDMA links. Each host holds one fragment of input S (S0 to S5); the fragments R0 to R5 of input R travel around the ring.]
1 distribute
2 join locally
3 rotate
RDMA: join and rotate
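The rotation step can be sketched as a schedule that says which fragment of R sits at which host in each round. This is a minimal illustration, not the authors' implementation: the name `rotation_schedule`, the assumption that fragment R_i starts at host H_i, and the direction of rotation (from H_h to H_(h+1)) are all choices made here for the example.

```python
from typing import Dict, List


def rotation_schedule(n_hosts: int) -> List[Dict[int, int]]:
    """Return one dict per round mapping host index -> index of the R fragment it holds.

    Assumption: R_i starts at host i and every round each fragment moves
    to the next host on the ring, so host h holds R_((h - k) mod n) in round k.
    """
    return [
        {host: (host - rnd) % n_hosts for host in range(n_hosts)}
        for rnd in range(n_hosts)
    ]


if __name__ == "__main__":
    for rnd, placement in enumerate(rotation_schedule(6)):
        print(f"round {rnd}: {placement}")
```

After as many rounds as there are hosts, every host has seen every fragment of R exactly once, which is what lets each host join all of R against its stationary fragment of S.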

Analysis
Cyclo-join has similarities to block nested loops join.
Cut input data into blocks R_i and S_j.
Join all combinations R_i ⋈ S_j in memory.
As such, cyclo-join
can be paired with any in-memory join algorithm,
can be used to distribute the processing of any join predicate.
Cyclo-join fits into a “cloud-style” environment:
additional nodes can be hooked in as needed,
arbitrary assignment host ↔ task,
cyclo-join consumes and produces distributed tables → n-way joins.

Cyclo-Join Put Into Practice
We implemented a prototype of cyclo-join:
four processing nodes
◮ Intel Xeon quad-core 2.33 GHz
◮ 6 GB RAM per node; memory bandwidth: 3.4 GB/s (measured)
10 Gb/s Ethernet
◮ Chelsio T3 RDMA-enabled network cards
◮ Nortel 10 Gb/s Ethernet switch
in-memory hash join
◮ hash phase physically re-organizes data (on each node) → better cache efficiency during join phase
◮ I/O complexity: O(|R| + |S|)
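The slide does not spell out the hash join, so the following is only a generic partitioned hash join in the same spirit: a first phase physically regroups both inputs by hash value, and the join phase then builds and probes one small table per partition. The function name, the `fanout` parameter, and the tuple layout are assumptions made for this sketch, and it should not be read as the prototype's actual code.

```python
from collections import defaultdict
from typing import Callable, Hashable, List, Sequence, Tuple

Row = Tuple  # any tuple-shaped record


def partitioned_hash_join(r: Sequence[Row],
                          s: Sequence[Row],
                          key_r: Callable[[Row], Hashable],
                          key_s: Callable[[Row], Hashable],
                          fanout: int = 8) -> List[Tuple[Row, Row]]:
    """Equi-join r and s on key_r(row) == key_s(row)."""
    # Hash phase: physically regroup both inputs so that matching keys
    # always land in the same partition.
    r_parts: List[List[Row]] = [[] for _ in range(fanout)]
    s_parts: List[List[Row]] = [[] for _ in range(fanout)]
    for row in r:
        r_parts[hash(key_r(row)) % fanout].append(row)
    for row in s:
        s_parts[hash(key_s(row)) % fanout].append(row)

    # Join phase: one small build-and-probe per partition pair.
    out: List[Tuple[Row, Row]] = []
    for r_part, s_part in zip(r_parts, s_parts):
        table = defaultdict(list)
        for row in r_part:
            table[key_r(row)].append(row)
        for row in s_part:
            for match in table.get(key_s(row), ()):
                out.append((match, row))
    return out


if __name__ == "__main__":
    r = [(1, "a"), (2, "b"), (2, "c")]
    s = [(2, "x"), (3, "y"), (1, "z")]
    print(sorted(partitioned_hash_join(r, s, lambda t: t[0], lambda t: t[0])))
```

Each input row is touched once in the hash phase and once in the join phase, which matches the linear O(|R| + |S|) behaviour named on the slide (plus the size of the output).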

Experiments
Experiment 1: Distribute evaluation of a join where |R| = |S| = 1.8 GB.
[Figure: stacked bar chart of wall-clock time [s] (axis 0 to 80) for 1, 2, 3, and 4 hosts, each joining 1.8 ⋈ 1.8 GB (sizes of S ⋈ R), split into hash buildup, synchronization, and join execution, with MonetDB (single-host) shown for comparison.]
Main benefit: reduced hash buildup time.

Experiments
Experiment 2: Scale up and join larger S (hash buildup ignored here).
[Figure: stacked bar chart of wall-clock time [s] (axis 0 to 4), split into join execution and synchronization, for 1 host (1.8 ⋈ 1.8), 2 hosts (3.6 ⋈ 1.8), 3 hosts (5.4 ⋈ 1.8), and 4 hosts (7.2 ⋈ 1.8); sizes of S ⋈ R in GB. Bar labels: 1.35, 2.08, 2.83, 3.54 and 0.26, 0.58, 0.80.]
System scales like a machine with large RAM would.
CPUs have to wait for network transfers (“synchronization”).

Memory Transfers
Need to wait for network: Does that mean RDMA doesn’t work at all?
1.8 GB / 10 Gb/s = 1.44 s
[Figure: memory bandwidth [GB/s] over time [s] for the 3-host run (5.4 ⋈ 1.8 GB), showing the RDMA transfer and the join R_i ⋈ S_j against the available bus bandwidth; time labels 2.83 s and 0.58 s.]
The culprit is the local memory bus!
If RDMA hadn’t saved us some bus transfers, this would be worse.
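The transfer-time figure on this slide follows from simple unit conversion, and comparing it with the measured times shows why the link itself is not the suspect. The sketch below assumes decimal units (1 GB = 8 Gb) and reads the 2.83 s and 0.58 s labels as join execution and synchronization time of the 3-host run; the function name is hypothetical.

```python
def transfer_seconds(size_gbyte: float, link_gbit_per_s: float) -> float:
    """Time to ship `size_gbyte` over a link running at full line rate."""
    return size_gbyte * 8.0 / link_gbit_per_s


if __name__ == "__main__":
    ideal = transfer_seconds(1.8, 10.0)  # full copy of R over 10 Gb/s
    join_time = 2.83                     # read from the slide (3 hosts)
    sync_time = 0.58                     # read from the slide (3 hosts)

    print(f"ideal transfer time: {ideal:.2f} s")  # 1.44 s
    # At line rate the transfer would fit inside the join phase, so a
    # perfectly overlapped transfer would cause no waiting at all.
    print(f"fits inside join phase: {ideal < join_time}")
    print(f"observed waiting time: {sync_time:.2f} s")
```

Since the ideal transfer is shorter than the join phase, the observed waiting time points to something other than raw link speed, which is the slide's argument for blaming the local memory bus.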

Conclusions
I demonstrated cyclo-join:
ring topology to process large joins,
use distributed memory to process arbitrary joins,
hardware acceleration via RDMA is crucial:
◮ reduce CPU load and memory bus contention.
Cyclo-join is part of the Data Cyclotron project:
support for more local join algorithms,
process full queries in a merry-go-round setup.
