
Leverage GPU Acceleration for Your Program on Apache Spark
GTC 2017
Kazuaki Ishizaki (IBM Research – Tokyo), Madhusudanan Kandasamy (IBM India), Gita Koblents (IBM Canada)

Spark is Becoming Popular for Parallel Computing

▪ Write a Scala/Java/Python program using parallel functions with distributed in-memory data structures on a cluster
– Can call APIs in domain-specific libraries (e.g. machine learning)

    val dataset = …((x1, y1), (x2, y2), …)…   // input points
    val model = KMeans.fit(dataset)            // train k-means model
    ...
    val vecs = model.clusterCenters.map(vec => (vec(0)*2, vec(1)*2))  // x2 to all centers

[Figure: Spark architecture. The libraries MLlib (machine learning), Spark Streaming (real-time), SparkSQL (SQL), and GraphX (graph) sit on the Spark Runtime (written in Java and Scala), which runs on the Java virtual machine. A driver sends tasks to executors and receives results. Executors hold in-memory data loaded from a data source (HDFS, DB, file, etc.) on a cluster of machines.]

Latest version is 2.1.1, released in 2017/4 (http://spark.apache.org/)

Spark is Becoming a Friend of GPUs

What You Will Learn from This Talk (1/2)

▪ How to easily accelerate your code using GPUs on a cluster
– Hand-tuned GPU program in CUDA

    __global__ void yourGPUKernel(double *in, double *out, long size) {
      long i = threadIdx.x + blockIdx.x * blockDim.x;
      if (size <= i) return;   // skip threads past the end of the data
      out[i] = in[i] * PI;
    }

    val mapFunction = new CUDAFunction(…, "yourGPUKernel.ptx")
    val output = data.mapExtFunc(…, mapFunction)

– Spark program with automatic translation to GPU code

    val output = data.map(p => Point(p.x * 2, p.y * 2))

What You Will Learn from This Talk (2/2)

▪ How to easily accelerate your code using GPUs on a cluster
– Hand-tuned GPU program in CUDA
– Spark program
▪ Achieve good performance results using one P100 card over 160-CPU-thread parallel execution on POWER8
– 3.6x for CUDA-based mini-batch logistic regression
– 1.7x for Spark vector multiplication
▪ The focus is ease of programming for non-experts, not the state-of-the-art performance that expert ("Ninja") programmers can achieve

Comparison of Two Approaches

▪ Non-expert programmers can use GPU without writing GPU code

|                                                                                                        | GPU program                                                                           | Spark program                          |
| Use case                                                                                               | Prepare highly-optimized algorithms for GPU in domain-specific library (e.g. MLlib)   | Write more generic code in an application |
| GPU code                                                                                               | Hand-tuned by programmer                                                              | Automatically generated                |
| How to write GPU code                                                                                  | CUDA                                                                                  | Spark code (Scala/Java)                |
| Spark enhancement                                                                                      | Plug-in                                                                               | Changing Spark and Java compiler       |
| GPU memory management, data copy between CPU and GPU, data conversion between Spark and GPU           | Automatically performed                                                               | Automatically performed                |

Outline

▪ Goal
▪ Motivation
▪ How to Execute Your GPU Program on Spark
▪ How to Execute Your Spark Program on GPU
▪ Performance Evaluation
▪ Conclusion

Why We Want to Use Spark for Parallel Programming

▪ High productivity
– Ease of writing a parallel program on a cluster
– At scale
▪ Write once, run on any cluster
– Rich set of domain-specific libraries
▪ Computation-intensive applications in non-HPC areas
– Data analytics (e.g. The Weather Company)
– Log analysis (e.g. Cable TV company)
– Natural language processing (e.g. Real-time Sentiment Analysis)

Programmability of CUDA vs. Spark on a node

▪ CUDA requires programmers to explicitly write operations for
– managing device memories
– copying data between CPU and GPU
– expressing parallelism

    // code for GPU
    __global__ void GPU(float *d_a, float *d_b, int n) {
      // global index: the launch below uses n blocks of one thread each
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (n <= i) return;
      d_b[i] = d_a[i] * 2.0f;
    }

    // code for CPU
    void fooCUDA(float *A, float *B, int N) {
      float *d_A, *d_B;
      int sizeN = N * sizeof(float);
      cudaMalloc(&d_A, sizeN); cudaMalloc(&d_B, sizeN);      // manage device memory
      cudaMemcpy(d_A, A, sizeN, cudaMemcpyHostToDevice);      // copy data from CPU to GPU
      GPU<<<N, 1>>>(d_A, d_B, N);                             // express parallelism
      cudaMemcpy(B, d_B, sizeN, cudaMemcpyDeviceToHost);      // copy data from GPU to CPU
      cudaFree(d_B); cudaFree(d_A);
    }

▪ Spark enables programmers to just focus on
– expressing parallelism

    val datasetA = ...
    val datasetB = datasetA.map(e => e * 2.0)
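
To make "expressing parallelism" concrete, here is a minimal CPU-side sketch in plain Scala of what a kernel launch computes. It assumes a kernel that derives a global index from the block and thread indices and guards against indices past the end of the data, as the multiplyBy2 kernel later in this talk does. The object and method names (LaunchSketch, launch) are hypothetical and not part of CUDA or Spark.

```scala
object LaunchSketch {
  // Emulates, sequentially on the CPU, a launch of numBlocks blocks with
  // threadsPerBlock threads each. Every (block, thread) pair plays the role
  // of one GPU thread running the kernel body once.
  def launch(numBlocks: Int, threadsPerBlock: Int, in: Array[Float]): Array[Float] = {
    val n = in.length
    val out = new Array[Float](n)
    for (blockIdx <- 0 until numBlocks; threadIdx <- 0 until threadsPerBlock) {
      val i = blockIdx * threadsPerBlock + threadIdx // global index of this thread
      if (i < n) out(i) = in(i) * 2.0f               // guard: surplus threads do nothing
    }
    out
  }

  def main(args: Array[String]): Unit = {
    val in = Array(1.0f, 2.0f, 3.0f, 4.0f, 5.0f)
    // 2 blocks of 4 threads give 8 threads for 5 elements; the guard masks the last 3.
    println(launch(2, 4, in).mkString(", "))
    // The Spark-style formulation states only the per-element operation.
    println(in.map(e => e * 2.0f).mkString(", "))
  }
}
```

The map version leaves out the index arithmetic and the guard entirely, which is the programmability gap this slide points at.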

Your Hand-tuned GPU Program in a Nutshell

▪ This is available at https://github.com/IBMSparkGPU/GPUEnabler
– Blog entry: http://spark.tc/gpu-acceleration-on-apache-spark-2/
▪ It is implemented as a Spark package
– Can be dropped into your version of Apache Spark
▪ The Spark package accepts PTX (an assembly language file that can be generated from a CUDA file) as a GPU program
– Converts data between Spark and GPU, manages GPU memory, and copies data between GPU and CPU
▪ The Spark package launches the GPU program from the map() or reduce() parallel function

How to Write and Execute Your GPU Program

1. Write a GPU program and create a PTX

    __global__ void multiplyBy2(int *inx, int *iny, int *outx, int *outy, long size) {
      long i = threadIdx.x + blockIdx.x * blockDim.x;
      if (size <= i) return;
      outx[i] = inx[i] * 2; outy[i] = iny[i] * 2;
    }

    $ nvcc example.cu -ptx

2. Write a Spark program

    case class Point(x: Int, y: Int)
    object SparkExample {
      val mapFunction = new CUDAFunction(
        "multiplyBy2",
        Array("this.x", "this.y"),
        Array("this.x", "this.y"),
        "example.ptx")
      val output = sc.parallelize(1 to 65536, 24).map(e => Point(e, -e))
        .cache
        .mapExtFunc(p => Point(p.x*2, p.y*2), mapFunction).show
    }

3. Compile and submit them (spark-submit options such as --packages must come before the application JAR)

    $ mvn package
    $ bin/spark-submit --class SparkExample --packages com.ibm:gpu-enabler_2.11:1.0.0 SparkExample.jar
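
In step 2, the lambda passed to mapExtFunc expresses in Scala the same computation that the multiplyBy2 kernel performs. One way to check a GPU run is to compare its output with a CPU-only reference built from that lambda. The sketch below uses plain Scala collections and needs neither Spark nor a GPU. The names CpuReference and matchesReference are hypothetical helpers for illustration, not part of GPUEnabler.

```scala
object CpuReference {
  case class Point(x: Int, y: Int)

  // Same input as the slide: Point(e, -e) for e from 1 to 65536.
  def input: Vector[Point] = (1 to 65536).map(e => Point(e, -e)).toVector

  // The same per-element computation as the lambda and the CUDA kernel.
  def multiplyBy2(p: Point): Point = Point(p.x * 2, p.y * 2)

  // Expected output, computed on the CPU only.
  def expected: Vector[Point] = input.map(multiplyBy2)

  // True when a collected result has the same elements, in order, as the reference.
  def matchesReference(result: Seq[Point]): Boolean = result.toVector == expected

  def main(args: Array[String]): Unit = {
    println(expected.take(3))
    println(matchesReference(expected))
  }
}
```

With a real job, the collected output of the GPU-backed pipeline would be passed to matchesReference in place of expected.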

How Your GPU Program is Executed

▪ Optimize data layout for GPU
– Columnar oriented layout
▪ Copy data between CPU and GPU
▪ Exploit parallelism
– among GPU kernels
– among CUDA cores

[Figure: data flow for .mapExtFunc(p => Point(p.x*2, p.y*2), mapFunction). On the CPU, Point rows (x, y) = (1, -1), (2, -2), (3, -3), (4, -4), ... are converted to a columnar layout (optimize layout): 1 2 -1 -2 3 4 -3 -4. The data is copied to the GPU, where the multiplyBy2 kernel doubles each element on the CUDA cores (outx[i] = inx[i] * 2; outy[i] = iny[i] * 2), giving 2 4 -2 -4 6 8 -6 -8. The result is copied back to the CPU and converted back to rows (deoptimize layout): 2 -2 4 -4 6 -6 8 -8.]
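
The layout steps above can be sketched in plain Scala. This is an illustration only, assuming the simplest possible representation with one Int array per field. The names ColumnarSketch, toColumns, and toRows are hypothetical, and the actual package may organize its buffers differently.

```scala
object ColumnarSketch {
  case class Point(x: Int, y: Int)

  // "Optimize layout": row-oriented Points become one array per field,
  // so each kernel argument is a contiguous array of primitives.
  def toColumns(rows: Seq[Point]): (Array[Int], Array[Int]) =
    (rows.map(_.x).toArray, rows.map(_.y).toArray)

  // "Deoptimize layout": per-field arrays become row-oriented Points again.
  def toRows(xs: Array[Int], ys: Array[Int]): Seq[Point] =
    xs.indices.map(i => Point(xs(i), ys(i)))

  def main(args: Array[String]): Unit = {
    val rows = Seq(Point(1, -1), Point(2, -2), Point(3, -3), Point(4, -4))
    val (xs, ys) = toColumns(rows)
    // Stand-in for the GPU kernel: double every element of each column.
    val outX = xs.map(_ * 2)
    val outY = ys.map(_ * 2)
    println(toRows(outX, outY))
  }
}
```

Keeping each field contiguous means neighbouring GPU threads read neighbouring memory addresses, which is the usual motivation for a columnar layout on GPUs.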

Spark Program in a Nutshell

▪ This is an ongoing project
– Blog entry: http://spark.tc/simd-and-gpu/
▪ We are enhancing Spark by modifying the Spark source code
– We also apply changes to the Java just-in-time compiler
▪ The enhanced Spark accepts an expression in map() for now
▪ The enhanced Spark handles low-level operations for GPU
– Generates GPU code from the Spark program
– Converts data between Spark and GPU, manages GPU memory, and copies data between GPU and CPU

How Scala Code is Executed

▪ Already optimized data layout for GPU
– Modified Spark to use columnar oriented layout
▪ Generate GPU code from Scala code
▪ Copy data between CPU and GPU
▪ Exploit parallelism
– among kernels
– among CUDA cores

[Figure: data flow for .map(p => Point(p.x*2, p.y*2)). Point data (x, y) is already held on the CPU in columnar layout: 1 2 -1 -2 3 4 -3 -4. The data is copied to the GPU, where the generated multiplyBy2 kernel doubles each element on the CUDA cores (outx[i] = inx[i] * 2; outy[i] = iny[i] * 2), giving 2 4 -2 -4 6 8 -6 -8. The result is copied back to the CPU in the same columnar layout.]
