Beyond Data and Model Parallelism for Deep Neural Networks (PowerPoint PPT Presentation)


  1. Beyond Data and Model Parallelism for Deep Neural Networks ZHIHAO JIA, MATEI ZAHARIA, ALEX AIKEN SYSML 2019 PRESENTED BY JULIUS LISCHEID

  2. Existing Parallelisation Approaches (1/2)
     DATA PARALLELISM
     • Replica of neural network on each device
     • Each device processes subset of training data
     • After each iteration, parameters are synchronised
     • Works well for compute-heavy operations with few parameters (e.g. convolutions)
     MODEL PARALLELISM
     • Disjoint subsets of neural network assigned to devices
     • No parameter synchronisation, but requires data transfers between operations
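
To make the contrast concrete, here is a minimal NumPy sketch (my own illustration, not from the slides) for a single linear layer y = x @ W: data parallelism splits the batch across full replicas of W, while model parallelism splits W itself across devices. Device names and shapes are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4))   # batch of 8 samples, 4 features
W = rng.standard_normal((4, 6))   # one 4 -> 6 linear layer

# Data parallelism: each "device" holds a full replica of W and a slice of the
# batch; gradients would be averaged (all-reduced) after each iteration.
x_dev0, x_dev1 = np.split(x, 2, axis=0)
y_data = np.concatenate([x_dev0 @ W, x_dev1 @ W], axis=0)

# Model parallelism: each "device" holds a disjoint slice of W and sees the full
# batch; no parameter synchronisation, but activations must be gathered.
W_dev0, W_dev1 = np.split(W, 2, axis=1)
y_model = np.concatenate([x @ W_dev0, x @ W_dev1], axis=1)

# Both partitionings reproduce the unpartitioned computation.
assert np.allclose(y_data, x @ W)
assert np.allclose(y_model, x @ W)
```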

  3. Existing Parallelisation Approaches (2/2)
     EXPERT-DESIGNED STRATEGIES
     • A. Krizhevsky. One weird trick for parallelizing convolutional neural networks. CoRR 2014.
       - Data parallelism for convolutional layers, model parallelism for fully-connected layers
     • Y. Wu et al. Google’s neural machine translation system: bridging the gap between human and machine translation. CoRR 2016.
       - Data parallelism for compute nodes, model parallelism for intra-node computation
     AUTOMATED FRAMEWORKS
     • A. Mirhoseini et al. Device Placement Optimization with Reinforcement Learning. ICML 2017.
       - Reinforcement learning for model parallelism
     • Z. Jia et al. Exploring hidden dimensions in parallelizing convolutional neural networks. CoRR 2018.
       - Dynamic programming for parallelisation of DNNs with linear computation graphs
     • D. Narayanan et al. PipeDream: generalized pipeline parallelism for DNN training. SOSP 2019.
     • …

  4. The SOAP Search Space
     • Samples (data parallelism)
     • Operators (model parallelism)
     • Attributes (e.g. pixels)
     • Parameters (≈ model parallelism)
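
One way to picture a SOAP strategy is as a partition degree along each dimension of an operator's output. The sketch below is a hypothetical illustration only (the Strategy class and its fields are my names, not FlexFlow's API): the degrees multiply to give the number of parallel tasks, so pure data parallelism and a hybrid sample/parameter split can occupy the same number of devices.

```python
from dataclasses import dataclass
from math import prod

@dataclass
class Strategy:
    samples: int     # S: split the training batch (data parallelism)
    operators: int   # O: split across operators (model parallelism / placement)
    attributes: int  # A: split an attribute such as the pixel dimensions
    parameters: int  # P: split the parameter/channel dimension (≈ model parallelism)

    def num_parallel_tasks(self) -> int:
        # Partition degrees multiply: one task per combination of partitions.
        return prod((self.samples, self.operators, self.attributes, self.parameters))

# Pure data parallelism vs. a hybrid strategy, both filling 4 devices.
data_parallel = Strategy(samples=4, operators=1, attributes=1, parameters=1)
hybrid        = Strategy(samples=2, operators=1, attributes=1, parameters=2)
assert data_parallel.num_parallel_tasks() == hybrid.num_parallel_tasks() == 4
```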

  5. Hybrid Parallelism in SOAP: example parallelization strategies for 1D convolution
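
Since the figure itself is not reproduced here, the following toy NumPy sketch (my own; the conv1d helper and the split sizes are assumptions) shows three such strategies for a 1D convolution and checks that each reproduces the unpartitioned result: splitting samples (batch), parameters (output channels), or the attribute (spatial) dimension, where the two spatial halves' input regions overlap by k-1 elements (a halo).

```python
import numpy as np

def conv1d(x, w):
    """Valid 1D convolution: x is (batch, length, c_in), w is (k, c_in, c_out)."""
    b, n, _ = x.shape
    k, _, c_out = w.shape
    out = np.zeros((b, n - k + 1, c_out))
    for i in range(n - k + 1):
        # Contract each (k, c_in) input window with the (k, c_in, c_out) kernel.
        out[:, i, :] = np.tensordot(x[:, i:i + k, :], w, axes=([1, 2], [0, 1]))
    return out

rng = np.random.default_rng(1)
x = rng.standard_normal((4, 16, 3))   # batch=4, length=16, 3 input channels
w = rng.standard_normal((5, 3, 8))    # kernel k=5, 8 output channels
ref = conv1d(x, w)
k = w.shape[0]

# Sample (S) partition: split the batch, concatenate outputs along the batch dim.
y_s = np.concatenate([conv1d(part, w) for part in np.split(x, 2, axis=0)], axis=0)

# Parameter (P) partition: split the output channels, concatenate along channels.
y_p = np.concatenate([conv1d(x, part) for part in np.split(w, 2, axis=2)], axis=2)

# Attribute (A) partition: split the spatial dim; input regions overlap by k-1.
y_a = np.concatenate([conv1d(x[:, :8 + k - 1, :], w), conv1d(x[:, 8:, :], w)], axis=1)

for y in (y_s, y_p, y_a):
    assert np.allclose(y, ref)
```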

  6. FlexFlow
     • Trying out strategies on hardware is expensive due to long iteration times
     • Execution optimizer uses simulator instead
       - Measures operator runtime on hardware
       - Estimates runtime of parallelisation strategies
       - Delta simulation algorithm uses incremental updates for acceleration
     • Execution optimizer explores search space with Markov Chain Monte Carlo algorithm
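
The slide does not show the search loop itself, so here is a hedged sketch of a Metropolis-style MCMC search over per-operator configurations, with a made-up cost function standing in for FlexFlow's simulator (operator names and candidate configurations are invented). Note that FlexFlow's delta simulation only re-simulates the part of the strategy a proposal changes, whereas this sketch recomputes the full cost each step.

```python
import math
import random

OPERATORS = ["conv1", "conv2", "fc1"]          # hypothetical operators
CANDIDATE_CONFIGS = [(4, 1), (2, 2), (1, 4)]   # (sample degree, parameter degree)

def simulated_runtime(strategy):
    """Stand-in for the simulator: estimated per-iteration runtime of a strategy.
    The real simulator measures each operator once on hardware and estimates
    compute and transfer costs; here we just use an arbitrary smooth function."""
    return sum((s - 2) ** 2 + 0.5 * (p - 2) ** 2 for s, p in strategy.values())

def mcmc_search(steps=1000, beta=1.0, seed=0):
    rng = random.Random(seed)
    current = {op: rng.choice(CANDIDATE_CONFIGS) for op in OPERATORS}
    current_cost = simulated_runtime(current)
    best, best_cost = dict(current), current_cost
    for _ in range(steps):
        # Propose: re-parallelise one randomly chosen operator.
        proposal = dict(current)
        proposal[rng.choice(OPERATORS)] = rng.choice(CANDIDATE_CONFIGS)
        cost = simulated_runtime(proposal)
        # Metropolis rule: always accept improvements, sometimes accept regressions.
        if cost <= current_cost or rng.random() < math.exp(-beta * (cost - current_cost)):
            current, current_cost = proposal, cost
            if cost < best_cost:
                best, best_cost = dict(proposal), cost
    return best, best_cost

print(mcmc_search())   # best per-operator configuration found and its estimated cost
```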

  7. Evaluation (1/2)

  8. Evaluation (2/2)

  9. Review (1/2)
     STRENGTHS/AGREEMENTS
     • Expands search space for parallelisation strategies
     • Proposes a way to efficiently explore that search space
     • Leads to an actual speed-up
     WEAKNESSES/DISAGREEMENTS
     • Unclear how much SOAP and execution optimiser contribute to training acceleration
     • Usefulness of Attribute dimension is questionable
     • More end-to-end performance benchmarks would have been useful

  10. Review (2/2)
     KEY TAKEAWAYS
     • Training performance of parallelisation strategies can be efficiently and accurately predicted
     • The resulting speed-up allows for the exploration of a wider search space
     POTENTIAL IMPACT
     • Usage of other search algorithms to explore parallelisation search space in simulation
     • Combination of parallelisation search space with computation graph substitutions (compare Tim’s presentation next week)

  11. Questions?

  12. Image Citations
     Images with beige background retrieved from Zhihao Jia’s SysML 2019 talk: https://www.youtube.com/watch?v=81l6kkV-OkE
     All other images extracted from Z. Jia, M. Zaharia, and A. Aiken: Beyond Data and Model Parallelism for Deep Neural Networks, SysML 2019.
