Data Processing at the Speed of 100 Gbps using Apache Crail

Patrick Stuedi, IBM Research

Apache Crail (crail.apache.org)

Ephemeral Data
[Figure: a map-reduce job reads input data from HDFS/S3, runs Broadcast, Map, Shuffle and Reduce stages, and writes output data to HDFS/S3. Later builds of the slide add Apache Crail inside the job, next to the Broadcast and Shuffle stages, and label the data between jobs as intermediate data on HDFS/S3.]

Ephemeral Data
[Figure: a two-job pipeline. ML pre-processing (map-reduce job) reads input data and produces normalized images, which ML training (Tensorflow job) consumes. The stages are annotated with HDFS/S3, and Apache Crail is shown as a store within the pipeline.]

Why/when to use Crail
No Crail needed: 100MB/s, 10ms; 10Gb/s, 20us
Crail land (100x): 10GB/s, 10us; 200Gb/s, 1us
[Figure: Terasort, 12.8 TB data, 128 nodes. Throughput (Gbit/s) against elapsed time (seconds), with the hardware limit marked. Spark/Crail: 88.3s. Spark/Vanilla: 527.6s.]
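The Terasort numbers on this slide (12.8 TB, 128 nodes, 88.3s versus 527.6s) can be turned into a speedup and a rough per-node data rate. The sketch below is only arithmetic on those figures; it assumes decimal units (1 TB = 10^12 bytes) and data spread evenly over the nodes, neither of which the slide states. The per-node value is a whole-run average of the sorted volume, not the instantaneous network throughput plotted in the figure.

```python
# Figures quoted on the Terasort slide.
DATA_TB = 12.8            # total data sorted, in terabytes
NODES = 128
CRAIL_SECONDS = 88.3      # Spark/Crail elapsed time
VANILLA_SECONDS = 527.6   # Spark/Vanilla elapsed time


def speedup(baseline_seconds, improved_seconds):
    """How many times faster the improved run is than the baseline."""
    return baseline_seconds / improved_seconds


def avg_gbit_per_node(data_tb, nodes, seconds):
    """Average sorted data volume per node, in Gbit/s, over the whole run.

    Assumptions (not from the slide): 1 TB = 10**12 bytes and the data is
    evenly distributed over the nodes.
    """
    bits_per_node = data_tb * 1e12 * 8 / nodes
    return bits_per_node / seconds / 1e9


crail_speedup = speedup(VANILLA_SECONDS, CRAIL_SECONDS)
crail_rate = avg_gbit_per_node(DATA_TB, NODES, CRAIL_SECONDS)
vanilla_rate = avg_gbit_per_node(DATA_TB, NODES, VANILLA_SECONDS)

print(f"speedup: {crail_speedup:.2f}x")
print(f"Spark/Crail average per node: {crail_rate:.2f} Gbit/s")
print(f"Spark/Vanilla average per node: {vanilla_rate:.2f} Gbit/s")
```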

Performance Challenge
[Figure: software stack of a sorting application. Layers shown: Sorting Application (Sorter, Serializer), Data Processing Framework, sockets and filesystem, Netty, TCP/IP and block layer, JVM, Ethernet and iSCSI, NIC and SSD. In the reduce task, a chunk is fetched over the network and then processed.]
Software overheads are spread over the entire stack. (HotNets’16)

Crail Overview
Multiple interfaces
Multiple storage backends (pluggable, open interface)
[Figure callout: primary high-performance storage backends]

Crail Architecture & API
[Figure: Crail architecture and node types. Callouts: MultiFile, optimized for shuffle data; key-value semantics; append-only file.]
[Java and C++ API listings, not captured in this transcript. Callouts: node type; non-blocking & asynchronous.]
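The "non-blocking & asynchronous" callout describes a calling pattern: an operation returns a handle immediately, and the caller waits for completion only when it needs the result. The sketch below illustrates that pattern with Python's asyncio on a purely in-memory, append-only file. `AppendOnlyFile` and its methods are hypothetical names made up for this illustration; they are not Crail's Java or C++ API, which this transcript does not capture.

```python
import asyncio


class AppendOnlyFile:
    """Hypothetical in-memory stand-in for an append-only file (not Crail's API)."""

    def __init__(self):
        self._chunks = []

    async def _do_append(self, data):
        # Stand-in for an I/O operation that is in flight.
        await asyncio.sleep(0)
        self._chunks.append(data)
        return len(data)

    def append(self, data):
        # Non-blocking: schedules the operation and returns a task right away.
        return asyncio.create_task(self._do_append(data))

    def contents(self):
        return b"".join(self._chunks)


async def main():
    f = AppendOnlyFile()
    # Issue two appends without waiting for either to finish.
    pending = [f.append(b"key1=a;"), f.append(b"key2=b;")]
    # The caller could do other work here while the operations are in flight.
    written = await asyncio.gather(*pending)
    return written, f.contents()


written, data = asyncio.run(main())
print(written, data)
```

The caller decides where to wait, so several operations can be outstanding at once instead of being serialized by blocking calls.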

Where does the performance come from?

User-Level I/O: Metadata
[Figure: numbered steps (1, 2) of a metadata operation through the Crail client library.]
No threads. No context switches.

User-Level I/O: Data
[Figure: numbered steps (1, 2) of a data operation issued by the application.]
Zero-copy; transfer only data that is requested.
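As an in-process analogy for "zero-copy" and "transfer only data that is requested", the sketch below uses Python's `memoryview` to hand out a byte range of a buffer without copying it. This illustrates the idea only; it says nothing about how Crail moves data over the network or storage hardware, and the buffer contents and offsets are made up.

```python
# A buffer standing in for a stored block (contents are illustrative).
block = bytearray(b"0123456789" * 4)

# A memoryview exposes the buffer without copying it.
view = memoryview(block)

# Request only bytes 10..14; slicing a memoryview does not copy either.
requested = view[10:15]

# The slice still refers to the original buffer object.
shares_memory = requested.obj is block

# Because no copy was made, a later write to the buffer is visible
# through the slice.
block[10] = ord("X")
visible = bytes(requested)

print(shares_memory, visible)
```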

Crail Deployment Modes
[Figure: three deployment modes: compute/storage co-located; storage disaggregation; flash storage disaggregation.]

YCSB KeyValue Workload
[Figure: two panels of GET latency [us], one for value size 1KB and one for value size 100KB.]
Crail offers Get latencies of ~12us and 30us for DRAM and NVM for 100 byte KV pairs.
Crail offers Get latencies of ~30us and 40us for DRAM and NVM for 1000 byte KV pairs.

Spark GroupBy (80M keys, 4K)
[Figure: two panels, Spark/Vanilla and Spark/Crail, each plotting throughput (Gbit/s) against elapsed time (seconds) for 1 core, 4 cores and 8 cores. Annotations on the Spark/Crail panel: 2x, 2.5x, 5x.]
Spark shuffling via Crail on a single core is 2x faster than vanilla Spark on 8 cores per executor (8 executors).

DRAM & Flash Disaggregation Crail enables disaggregation of temporary data at no cost

DRAM/Flash Tiering
[Figure: runtime (seconds), split into Map and Reduce, for Vanilla Spark (100% memory) and for memory-to-flash ratios of 100/0, 80/20, 60/40, 40/60, 20/80 and 0/100.]
Using flash only increases the sorting time by around 48%.
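The memory-to-flash ratio on this slide can be read as the share of data blocks held in each tier. The sketch below is a toy model under stated assumptions: blocks fill DRAM first and spill to flash once DRAM is full (a policy assumed here for illustration; the slide does not state it), and `place_blocks`, `memory_to_flash_ratio` and `flash_only_runtime` are hypothetical helper names. The last helper simply applies the "around 48%" figure quoted on the slide.

```python
def place_blocks(num_blocks, dram_capacity):
    """Assign blocks to 'dram' until it is full, then to 'flash'.

    Assumed policy for illustration: fill the faster tier first.
    """
    return ["dram" if i < dram_capacity else "flash" for i in range(num_blocks)]


def memory_to_flash_ratio(placement):
    """Return (memory %, flash %) for a placement, as on the slide's x-axis."""
    total = len(placement)
    dram = placement.count("dram")
    return round(100 * dram / total), round(100 * (total - dram) / total)


def flash_only_runtime(all_memory_seconds, increase=0.48):
    """Sorting time with flash only, using the ~48% increase quoted on the slide."""
    return all_memory_seconds * (1 + increase)


placement = place_blocks(num_blocks=10, dram_capacity=6)
ratio = memory_to_flash_ratio(placement)
print(ratio)
```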

Conclusions
● Apache Crail: Fast distributed “tmp”
– User-level I/O
– Storage disaggregation
– Memory/flash convergence
● Applications
– Intra-job scratch space (shuffle, broadcast, etc.)
– Multi-job pipelines
● Coming soon
– Native Crail (C++)
– Tensorflow-Crail
