GPU Enabled Spark MLlib Lingyun Li & Lei Yao CS 848 University of Waterloo
Outline
● Motivation
● GPU calculation model
● GPUEnabler
● Spark MLlib algorithms for GPU computation
● Implementation using GPUEnabler
● Performance evaluation
● Current & future work
Motivation
● Problem
  ○ Computation-heavy Spark machine learning applications
  ○ CPU computation is the bottleneck
● Goal
  ○ Accelerate Spark MLlib
  ○ Leverage high-performance GPUs
  ○ Add a second dimension of distribution
  ○ Without changes to user programs
GPU Calculation Model
● Five steps for GPU programming
  ○ Allocate GPU device memory
  ○ Copy data from CPU main memory to GPU device memory
  ○ Launch a GPU kernel to be executed in parallel
  ○ Copy data back from GPU memory to main memory
  ○ Free GPU memory
[Figure: GPU memory hierarchy — per-thread local memory, per-block shared memory, global memory visible to all threads; CPU main memory alongside the GPU]
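The five steps above can be sketched as a minimal CUDA host program (the `scale` kernel and its doubling logic are illustrative assumptions, not code from the talk):

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Illustrative kernel: each thread scales one element of the array.
__global__ void scale(float *data, int n) {
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    if (idx < n) data[idx] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *h = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) h[i] = 1.0f;

    float *d;
    cudaMalloc(&d, bytes);                           // 1. allocate device memory
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice); // 2. copy host -> device
    scale<<<(n + 255) / 256, 256>>>(d, n);           // 3. launch the kernel
    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost); // 4. copy device -> host
    cudaFree(d);                                     // 5. free device memory

    printf("%f\n", h[0]);
    free(h);
    return 0;
}
```

Steps 2 and 4 (the host↔device copies) are the overhead that GPUEnabler must amortize for offloading to pay off.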
GPU Calculation Model
[Figure: a grid of M thread blocks, each containing blockDim.x = N threads, all reading and writing global memory]
● Each thread computes its global index: int idx = threadIdx.x + blockIdx.x * blockDim.x
● Data parallelism: Single Instruction, Multiple Data (SIMD)
GPUEnabler
● Offloads specific tasks (GPU kernels) to the GPU
● Gets the data into a format the GPU can consume
● Moves data from local memory to GPU memory and back
● Applications can work in a heterogeneous environment
● Two transformation APIs: mapExtFunc(), cacheGpu()
● One action API: reduceExtFunc()
Algorithms Suitable for GPU Computation
● Large dataset
● Complex mathematical computation
● Low data inter-dependency
● Low dependency between cluster nodes
Spark MLlib Algorithms for GPU Acceleration
● Naive Bayes
  ○ Mainly counting and aggregation
  ○ Not enough mathematical computation
● Decision tree learning
  ○ Mathematical computation (information gain) hidden deep inside nested map functions
● L-BFGS
  ○ Calculation delegated to the external numerical processing library Breeze
● SVM and linear regression
  ○ Not enough mathematical computation
● Logistic regression
  ○ Good candidate for GPU acceleration
Implementation using GPUEnabler
● Write the CUDA kernel
● Create and broadcast CUDAFunction objects
  ○ Carry information about the CUDA kernel: input/output data types, constant arguments, etc.
● Call mapExtFunc and reduceExtFunc instead of map and reduce
  ○ Executes the CUDA kernel in parallel
CUDA Kernel
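The kernel image from this slide is not preserved; below is a sketch of the kind of logistic regression gradient kernel the talk describes. The name `lrGradient`, the flat row-major layout of `x`, and the per-point output buffer are assumptions for illustration, not the authors' actual code:

```cuda
#include <math.h>

// Sketch: one thread per data point. Inputs are n points with d features
// (x, row-major), labels y in {-1, +1}, and the current weights w.
// Each thread writes its point's gradient contribution to grad[i*d..];
// the contributions are summed afterwards (e.g. via reduceExtFunc).
extern "C" __global__
void lrGradient(int n, int d,
                const double *x, const double *y, const double *w,
                double *grad) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i >= n) return;

    // dot = w . x_i
    double dot = 0.0;
    for (int j = 0; j < d; j++) dot += w[j] * x[i * d + j];

    // Logistic-loss gradient for point i:
    // (1 / (1 + exp(-y_i * dot)) - 1) * y_i * x_i
    double coeff = (1.0 / (1.0 + exp(-y[i] * dot)) - 1.0) * y[i];
    for (int j = 0; j < d; j++) grad[i * d + j] = coeff * x[i * d + j];
}
```

Note the low inter-dependency: every thread touches only its own point, matching the "algorithms suitable for GPU computation" criteria above.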
GPUEnabler APIs
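The API screenshot from this slide is not preserved; the following Scala sketch follows the GPUEnabler README pattern. The kernel names (`lrGradient`, `sumGradients`), the PTX resource path, and `pointsRdd` are illustrative assumptions:

```scala
import com.ibm.gpuenabler.CUDARDDImplicits._
import com.ibm.gpuenabler.CUDAFunction

// Compiled CUDA kernels packaged as a PTX resource (path is an assumption).
val ptxURL = getClass.getResource("/lrKernels.ptx")

// A CUDAFunction names the kernel and describes its input/output column
// order; it is shipped to executors with the closure.
val mapFunction = new CUDAFunction(
  "lrGradient", Array("this"), Array("this"), ptxURL)
val reduceFunction = new CUDAFunction(
  "sumGradients", Array("this"), Array("this"), ptxURL)

// Drop-in replacements for map/reduce: each partition's data is copied to
// GPU memory, the kernel runs over it, and results are copied back.
// pointsRdd: RDD[Double] here, purely for illustration.
val gradientSum = pointsRdd
  .mapExtFunc((p: Double) => p, mapFunction) // Scala function gives the CPU semantics
  .reduceExtFunc((a: Double, b: Double) => a + b, reduceFunction)
```

Because the user-facing calls mirror map/reduce, existing Spark programs need only swap the operator names, which is how the "without changes to user programs" goal is approximated.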
Performance Evaluation
● Use logistic regression for classification
● GPU: Nvidia Tesla K80

# of data points | # of features per point | # of machines in cluster | Use GPU | Runtime (ms)
1,000,000        | 10                      | 1                        | No      | 1182
1,000,000        | 10                      | 1                        | Yes     | 2826
1,000,000        | 10                      | 2                        | No      | 1276
1,000,000        | 10                      | 2                        | Yes     | 3494
2,000,000        | 15                      | 1                        | No      | 6511
2,000,000        | 15                      | 1                        | Yes     | 5938
2,000,000        | 15                      | 2                        | No      | 5760
2,000,000        | 15                      | 2                        | Yes     | 5639
Our Work
● Set up a cluster with GPU, CUDA, Spark, HDFS and GPUEnabler
● Learned Spark MLlib algorithms
● Studied Spark MLlib & GPUEnabler source code
● Integrated GPUEnabler & Spark
● Implemented GPU-enabled MLlib algorithms
● Deployed and ran GPU code on the cluster
● Performance evaluation
● Future work:
  ○ Implement and evaluate more algorithms
  ○ Investigate the GPU computation bottleneck
Thank you Questions?