Distributed Training on HPC
Presented by: Aaron D. Saxton, PhD
7/11/19
Statistics Review
• Simple regression: $y = m x + b$
  • Least squares to find $m$, $b$
  • With data set $\{(x_i, y_i)\}_{i=1,\dots,N}$
  • Very special; often hard to measure $y_i$
• Let the error be $L = \sum_{i=1}^{N} [y_i - (m x_i + b)]^2$
  • Minimize $L$ with respect to $m$ and $b$
• Simultaneously solve
  • $L_m(m, b) = 0$
  • $L_b(m, b) = 0$
  • A linear system
• We will consider the more general $y = f(x)$
  • $L_m(m, b) = 0$ and $L_b(m, b) = 0$ may not be linear
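Below is a minimal sketch of this least-squares fit in NumPy (the synthetic data, slope 2.5, and intercept 1.0 are made up for illustration); solving $L_m = 0$ and $L_b = 0$ is exactly the linear system that `np.linalg.lstsq` solves.

```python
import numpy as np

# Synthetic, noisy observations of y = 2.5*x + 1.0 (illustrative values only)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.5 * x + 1.0 + rng.normal(scale=0.5, size=x.shape)

# Design matrix [x, 1]; least squares minimizes sum_i (y_i - (m*x_i + b))^2
A = np.column_stack([x, np.ones_like(x)])
(m, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(m, b)  # should be close to 2.5 and 1.0
```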
Statistics Review
• Regressions with parameterized sets of functions, e.g.
  • $y = a x^2 + b x + c$ (quadratic)
  • $y = \sum_i a_i x^i$ (polynomial)
  • $y = a e^{b x}$ (exponential)
  • $y = \dfrac{1}{1 + e^{-k(x - x_0)}}$ (logistic)
Statistics Review
• Polynomial model of degree "n"
• "Degrees of freedom": the model's capacity
• Deep Learning, Goodfellow et al., MIT Press, http://www.deeplearningbook.org, 2016
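A hedged sketch of the degree/capacity trade-off (the data, degrees, and noise level are all made up): fitting polynomials of increasing degree to the same points drives the training error down, with the usual risk of overfitting at high degree.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 20)
y = np.sin(np.pi * x) + rng.normal(scale=0.1, size=x.shape)  # noisy target

for degree in (1, 3, 9, 15):
    coeffs = np.polyfit(x, y, deg=degree)                 # least-squares polynomial fit
    train_err = np.sum((np.polyval(coeffs, x) - y) ** 2)  # error on the training points
    print(degree, train_err)  # error shrinks as the degrees of freedom grow
```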
Gradient Descent
• Searching for a minimum: $\nabla L = (L_{w_1}, L_{w_2}, \dots, L_{w_n})$
• $\vec{w}^{\,t+1} = \vec{w}^{\,t} - \delta \nabla L$
  • $\delta$: learning rate
• Recall, the loss depends on the data; expand notation: $\vec{w}^{\,t}; \{(x_i, y_i)\}$
• Recall $L$ (and hence $\nabla L$) is a sum over $N$: $L = \sum_{i=1}^{N} [y_i - f_{\vec{w}}(x_i)]^2$
• Intuitively, want $L$ with ALL DATA ..... ?
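A minimal full-batch gradient-descent loop for a linear model (learning rate, step count, and data are illustrative assumptions); note that every update sums the gradient over all $N$ data points, which is what the next slides relax.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 100)
y = 3.0 * x - 0.5 + rng.normal(scale=0.05, size=x.shape)  # noisy y = 3x - 0.5

m, b, delta = 0.0, 0.0, 0.1       # delta is the learning rate
for step in range(500):
    pred = m * x + b
    # Gradient of L = sum_i (y_i - (m*x_i + b))^2, averaged over N for a stable step size
    grad_m = -2.0 * np.mean((y - pred) * x)
    grad_b = -2.0 * np.mean(y - pred)
    m -= delta * grad_m
    b -= delta * grad_b
print(m, b)  # approaches 3.0 and -0.5
```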
Gradient Descent
Stochastic Gradient Descent
• Recall $L$ is a sum over $N$: $L = \sum_{i=1}^{N} [y_i - f_{\vec{w}}(x_i)]^2$
• Single training example $(x_i, y_i)$: sum over only one training example
  • $\nabla L_{x_i, y_i} = (L_{w_1}, L_{w_2}, \dots, L_{w_n})\big|_{x_i, y_i}$
  • $\vec{w}^{\,t+1} = \vec{w}^{\,t} - \delta \nabla L_{x_i, y_i}$
  • $\delta$: learning rate
  • Choose the next $(x_{i+1}, y_{i+1})$ (shuffled training set)
• SGD with mini batches
  • Many training examples $(x_i, y_i)$: sum over many training examples
  • Batch size or mini-batch size (this gets ambiguous with distributed training)
• SGD often outperforms traditional GD; want small batches
  • https://arxiv.org/abs/1609.04836, On Large-Batch Training ... Sharp Minima
  • https://arxiv.org/abs/1711.04325, Extremely Large ... in 15 Minutes
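A sketch of mini-batch SGD for the same linear model (batch size, learning rate, and epoch count are illustrative): each epoch shuffles the training set and updates the weights on one small batch at a time instead of the full sum.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 1.0, 1000)
y = 3.0 * x - 0.5 + rng.normal(scale=0.05, size=x.shape)

m, b, delta, batch_size = 0.0, 0.0, 0.05, 32
for epoch in range(20):
    order = rng.permutation(len(x))                # shuffled training set
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]      # one mini batch
        xb, yb = x[idx], y[idx]
        pred = m * xb + b
        grad_m = -2.0 * np.mean((yb - pred) * xb)  # gradient over the batch only
        grad_b = -2.0 * np.mean(yb - pred)
        m -= delta * grad_m
        b -= delta * grad_b
print(m, b)
```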
Neural Networks
• Activation functions
  • Logistic: $f(x) = \dfrac{1}{1 + e^{-x}}$
  • ReLU (Rectified Linear Unit): $f(x) = \max(0, x)$
  • Arctan: $f(x) = \arctan(x)$
• Softmax
  • $\sigma_j(x_1, x_2, \dots, x_K) = \dfrac{e^{x_j}}{\sum_i e^{x_i}}$
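These activations are one-liners in NumPy; the sketch below just restates the formulas from the slide (the max-subtraction in softmax is a standard numerical-stability trick, not something the slide mentions).

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def arctan(x):
    return np.arctan(x)

def softmax(x):
    z = np.exp(x - np.max(x))  # subtracting the max avoids overflow; result is unchanged
    return z / np.sum(z)

print(softmax(np.array([1.0, 2.0, 3.0])))  # components sum to 1
```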
Neural Networks
• Parameterized function
  • $\vec{h} = f(\beta_0 + \beta \vec{x})$
  • $\vec{g} = \gamma_0 + \gamma \vec{h}$
  • $\vec{g}(\vec{x}) = F_{\vec{w}}(\vec{x})$
• Linear transformations with pointwise evaluation of a nonlinear function, $f(\vec{x}) \to f(W \vec{x})$
• $\gamma_0, \gamma, \beta_0, \beta$: weights to be optimized
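A sketch of this two-layer forward pass in NumPy (the layer sizes, random weights, and the choice of ReLU as the pointwise nonlinearity are assumptions for illustration).

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=4)                                       # input vector

beta0, beta = rng.normal(size=8), rng.normal(size=(8, 4))    # layer-1 weights
gamma0, gamma = rng.normal(size=3), rng.normal(size=(3, 8))  # layer-2 weights

f = lambda z: np.maximum(0.0, z)   # pointwise nonlinearity (ReLU assumed here)

h = f(beta0 + beta @ x)            # h = f(beta_0 + beta x)
g = gamma0 + gamma @ h             # g = gamma_0 + gamma h, i.e. g(x) = F_w(x)
print(g)
```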
Faux Model Example
Distributed Training, data distributed
Distributed Training, data distributed
Distributed Training, All Reduce Collective
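A hedged sketch of data-parallel SGD with an allreduce collective, here via mpi4py (the library choice, data, and hyperparameters are assumptions, not taken from the slides): each rank computes a gradient on its own shard of the data, the gradients are summed across ranks with `Allreduce`, and every rank applies the same averaged update. Run with something like `mpiexec -n 4 python train.py`.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

rng = np.random.default_rng(rank)                # each rank holds a different data shard
x = rng.uniform(0.0, 1.0, 256)
y = 3.0 * x - 0.5 + rng.normal(scale=0.05, size=x.shape)

m, b, delta = 0.0, 0.0, 0.1
for step in range(200):
    pred = m * x + b
    local_grad = np.array([-2.0 * np.mean((y - pred) * x),   # dL/dm on this shard
                           -2.0 * np.mean(y - pred)])        # dL/db on this shard
    global_grad = np.empty_like(local_grad)
    comm.Allreduce(local_grad, global_grad, op=MPI.SUM)      # the allreduce collective
    global_grad /= size                                      # average over ranks
    m -= delta * global_grad[0]
    b -= delta * global_grad[1]

if rank == 0:
    print(m, b)  # identical on every rank after the synchronous updates
```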
Distributed TensorFlow: Parameter Server/Worker (the default, a bad way on HPC)
[Diagram: parameter-server tasks ps:0 and ps:1 aggregate and update the model parameters; worker tasks worker:0, worker:1, and worker:2 each hold the model, compute the cross-entropy loss, and run gradient-descent optimization.]
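A hedged TF 1.x-style sketch of the parameter-server layout in the diagram (hostnames, ports, and task indices are placeholders, and this is only a skeleton, not a full training script): ps tasks hold the variables, worker tasks build the model and optimizer.

```python
import tensorflow as tf  # TF 1.x API assumed

# Two parameter servers and three workers, matching ps:0-1 / worker:0-2 above
cluster = tf.train.ClusterSpec({
    "ps":     ["node01:2222", "node02:2222"],
    "worker": ["node03:2222", "node04:2222", "node05:2222"],
})

# Each process starts one server for its own job/task (worker:0 shown here)
server = tf.train.Server(cluster, job_name="worker", task_index=0)

# replica_device_setter places variables on the ps tasks and ops on this worker
with tf.device(tf.train.replica_device_setter(cluster=cluster)):
    # ... build the model, the cross-entropy loss, and a gradient-descent optimizer ...
    pass
```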
Other models: Sequence Modeling
• Autoregression
  • $X_t = c + \sum_{i=1}^{p} \varphi_i B^i X_t + \varepsilon_t$
  • Back shift operator: $B^i X_t = X_{t-i}$
• Autocorrelation
  • $R_{XX}(t_1, t_2) = E[X_{t_1} X_{t_2}]$
• Other tasks
  • Semantic labeling:
    [art.] [adj.] [adj.] [n.] [v.] [adverb] [art.] [adj.] [adj.] [d.o.]
    The quick red fox jumps over the lazy brown dog
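A small sketch of an autoregressive model in NumPy (the AR(2) coefficients, noise level, and series length are made up): simulate $X_t = c + \varphi_1 X_{t-1} + \varphi_2 X_{t-2} + \varepsilon_t$, then recover $c$ and $\varphi$ by least squares on lagged values.

```python
import numpy as np

rng = np.random.default_rng(5)
c, phi = 0.1, np.array([0.6, -0.2])        # illustrative AR(2) coefficients

x = np.zeros(500)
for t in range(2, len(x)):                 # simulate the series
    x[t] = c + phi[0] * x[t - 1] + phi[1] * x[t - 2] + rng.normal(scale=0.1)

# Regress X_t on (1, X_{t-1}, X_{t-2}) to recover c, phi_1, phi_2
A = np.column_stack([np.ones(len(x) - 2), x[1:-1], x[:-2]])
coef, *_ = np.linalg.lstsq(A, x[2:], rcond=None)
print(coef)  # roughly [0.1, 0.6, -0.2]
```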
Recurrent Neural Networks: Sequence Modeling
• Few projects use pure RNNs; this example is only for pedagogy
• An RNN is a model that is as "deep" as the modeled sequence is long
• LSTMs, gated recurrent units
• No model-parallel distributed training on the market (June 2019)
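A toy forward pass of a vanilla RNN cell in NumPy (sizes and weights are made up): the same cell is applied once per time step, so the unrolled computation is as deep as the sequence is long, which is the point the slide makes about why distributing it is hard.

```python
import numpy as np

rng = np.random.default_rng(6)
seq = rng.normal(size=(12, 4))         # 12 time steps, 4 input features each
W_xh = 0.1 * rng.normal(size=(8, 4))   # input-to-hidden weights
W_hh = 0.1 * rng.normal(size=(8, 8))   # hidden-to-hidden weights
b_h = np.zeros(8)

h = np.zeros(8)                        # initial hidden state
for x_t in seq:                        # one application of the cell per element
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
print(h)                               # final hidden state summarizes the sequence
```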