Improving Spark Performance with Zero-copy Buffer Management and RDMA
Hu Li, Charley Chen, and Wei Xu
Institute for Interdisciplinary Information Sciences, Tsinghua University, China
Latency matters in big data
[Figure: spectrum of job latencies, from MapReduce batch jobs [2004] at ~10 min, through Hive queries [2009] and in-memory Spark queries [2010] at ~10 sec, down to Dremel [2010], Impala [2012], and Spark Streaming [2013] at ~100 ms–1 ms]
Big data systems must be not only capable, but also interactive [Kay@SOSP13]
Overview of our work
• NetSpark: a reliable Spark package that takes advantage of an RDMA over Converged Ethernet (RoCE) fabric
• A combination of memory-management optimizations that let JVM-based applications use RDMA more efficiently
• Improves latency-sensitive task performance while staying fully compatible with off-the-shelf Spark
Background: Remote Direct Memory Access (RDMA) Lower CPU utilization and lower latency
An overview of the NetSpark transfer model
[Figure: on Machine A, the executor serializes objects from the JVM heap directly into an off-heap byte array in user space; the RNIC DMA-reads that buffer and transfers it over the network; on Machine B, the RNIC DMA-writes into an off-heap byte array, which is then deserialized back into objects on the JVM heap]
Zero-copy network transfer
[Figure: in the traditional path, an object on the JVM heap is serialized into a heap byte array, copied through the network API into a kernel-space buffer via a system call, and only then reaches the RNIC; in our path, the object is serialized directly into an off-heap byte array, which the RNIC DMA-reads with no extra copy]
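The copy-count difference between the two paths can be sketched in plain Java NIO (a minimal illustration, not NetSpark's actual code; the "registered" buffer stands in for memory that would be registered with the RNIC):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ZeroCopyDemo {
    // Traditional path: serialize to a heap byte[], then copy it into an
    // off-heap buffer the NIC can DMA from -- two copies before the wire.
    static ByteBuffer traditional(String record) {
        byte[] heapBytes = record.getBytes(StandardCharsets.UTF_8); // copy 1
        ByteBuffer offHeap = ByteBuffer.allocateDirect(heapBytes.length);
        offHeap.put(heapBytes);                                     // copy 2
        offHeap.flip();
        return offHeap;
    }

    // Zero-copy-style path: write the serialized fields straight into the
    // off-heap buffer (assumed to be RDMA-registered), skipping heap staging.
    static ByteBuffer direct(int id, double value, ByteBuffer registered) {
        registered.clear();
        registered.putInt(id);        // fields land directly in off-heap memory
        registered.putDouble(value);
        registered.flip();
        return registered;
    }
}
```

The RNIC can then DMA-read the direct buffer's memory without any kernel-space staging copy.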
Implementation: Spark executors
[Figure: in a stock Spark executor, worker threads 1..N share a BlockManager with a TCP-based BlockTransferService and its sending/receiving connections; a NetSpark executor keeps the same structure but swaps in an RDMA-based BlockTransferService plus a BufferManager]
RDMA buffer management
• RDMA requires buffers at fixed physical memory addresses
  • For Java: off-heap memory
  • Significant allocate/de-allocate cost
• Buffers must be registered with the RDMA NIC
  • High overhead
Simple solution: pre-allocate the RDMA buffer space to avoid the allocation/registration overhead
RDMA Buffer Management (cont'd)
• A small number of large-enough, fixed-size off-heap buffers
• Like the Linux kernel buffers, but in user space
• But… we still need to copy from the heap to off-heap
Serializing directly into the off-heap RDMA buffer
• Rewrite Java InputStream and OutputStream to take advantage of the new buffer manager
• Details in the paper
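The OutputStream side of that rewrite can be sketched like this (a hypothetical simplification, not the paper's implementation): an OutputStream whose sink is an off-heap ByteBuffer, so any standard Java serializer wrapped around it writes serialized bytes directly into RDMA-registered memory with no intermediate heap byte array.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;

public class DirectBufferOutputStream extends OutputStream {
    private final ByteBuffer target;   // assumed to come from the RDMA buffer pool

    public DirectBufferOutputStream(ByteBuffer target) {
        this.target = target;
    }

    @Override
    public void write(int b) throws IOException {
        if (!target.hasRemaining()) throw new IOException("RDMA buffer full");
        target.put((byte) b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        if (target.remaining() < len) throw new IOException("RDMA buffer full");
        target.put(b, off, len);       // bulk write straight into off-heap memory
    }
}
```

Wrapping, e.g., an ObjectOutputStream or a Kryo Output around this stream would make serialization land directly in the off-heap buffer.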
Evaluation: Testbed
1. 3 switches, 34 servers
2. RoCE, 10GbE
3. Priority flow control enabled for RDMA to avoid packet loss
[Figure: network topology of our testbed — 3× 40Gb Ethernet switches, with 10Gb Ethernet links to each server]
Evaluation: Experiment Setup
We compared four different executor implementations (Spark version: 1.5.0):
1. Java NIO
2. Netty
3. Naive RDMA
4. NetSpark
[Box plots report min / 25th / 50th / 75th percentile / max latency]
Group-by performance on a small dataset
• Spark's group-by example
• 2.5 GB of data shuffled
• About 17% improvement over naive RDMA
Why do we see an improvement?
• Reduced CPU blocked time
• Measured from Spark logs
• Following the methodology of [Kay@NSDI15]
Group-by on larger data: the entire reduce stage
• A larger dataset: about 107.3 GB shuffled
• ~40% faster than Netty
PageRank on a large graph
• Twitter graph dataset [Kwak@WWW2010]: 41 million nodes, 1.5 billion edges
• 20% faster than Netty
• 10% faster than naive RDMA
Conclusion
• NetSpark: a reliable Spark package that takes advantage of an RDMA over Converged Ethernet (RoCE) fabric
• A combination of memory-management optimizations that let JVM-based applications use RDMA more efficiently
• Improves latency-sensitive task performance while staying fully compatible with off-the-shelf Spark
Wei Xu  weixu@tsinghua.edu.cn