The Frontier of Define-by-Run Deep Learning Frameworks
GTC 2019 @ San Jose, Mar. 20, 2019
Seiya Tokui, Preferred Networks, Inc. (S9380)
Deep Learning Framework for fast iterative research/development
Define-by-Run frameworks (by default from 2.0)
# Write forward prop as a plain Python script.
x = numpy.array(…)
h1 = layer1(x, W1)
h2 = layer2(h1, W2)
loss = loss_func(h2)

# Variables hold how they were computed.
# Use it to compute the gradient.
loss.backward()
W1.array -= lr * W1.grad
W2.array -= lr * W2.grad
Deep learning framework optimized for the Define-by-Run API design
✓ Model description
✓ Distributed training
✓ Serialization, export
…
Everything is optimized for Define-by-Run style programming
# Tie parameters to the forward code using OOP.
class Linear(chainer.Link):
    def __init__(self, n_in, n_out):
        super().__init__()
        with self.init_scope():
            self.W = chainer.Parameter(I.HeNormal(), (n_in, n_out))
            self.b = chainer.Parameter(0, (n_out,))

    def forward(self, x):
        return x @ self.W + self.b
# Object structure = composition of NN fragments
class MLP(chainer.Chain):
    def __init__(self):
        super().__init__()
        with self.init_scope():
            self.l1 = Linear(784, 200)
            self.l2 = Linear(200, 100)
            self.l3 = Linear(100, 10)

    def forward(self, x):
        h1 = F.relu(self.l1(x))
        h2 = F.relu(self.l2(h1))
        return self.l3(h2)
for batch in iterator:       # fetch the next minibatch
    x, t = converter(batch)  # concat, transfer to the device
    loss = loss_fun(x, t)    # forward prop
    loss.backward()          # backprop
    optimizer.update()       # update parameters
    model.cleargrads()       # clean up gradients

Every part is plain, customizable Python code
Fast GPU computation
import numpy as np

def logsumexp(x):
    x_max = x.max(axis=1, keepdims=True)
    x0 = x - x_max
    lse = np.log(np.exp(x0).sum(axis=1, keepdims=True))
    lse += x_max
    return lse

x = np.array([...], dtype=np.float32)
print(logsumexp(x))
import cupy as cp

def logsumexp(x):
    x_max = x.max(axis=1, keepdims=True)
    x0 = x - x_max
    lse = cp.log(cp.exp(x0).sum(axis=1, keepdims=True))
    lse += x_max
    return lse

x = cp.array([...], dtype=cp.float32)
print(logsumexp(x))
import cupy as cp, numpy as np

def logsumexp(x):
    x_max = x.max(axis=1, keepdims=True)
    x0 = x - x_max
    lse = np.log(np.exp(x0).sum(axis=1, keepdims=True))
    lse += x_max
    return lse

x = cp.array([...], dtype=np.float32)
print(logsumexp(x))
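A device-agnostic variant is also possible: cupy.get_array_module picks NumPy or CuPy depending on where the input array lives. A minimal sketch (not from the slides):

import cupy as cp
import numpy as np

def logsumexp(x):
    xp = cp.get_array_module(x)  # numpy for host arrays, cupy for GPU arrays
    x_max = x.max(axis=1, keepdims=True)
    x0 = x - x_max
    lse = xp.log(xp.exp(x0).sum(axis=1, keepdims=True))
    lse += x_max
    return lse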
✓ cuDNN support (conv, pooling, LSTM, …)
✓ Easy custom kernels compiled at runtime (see the sketch below)
✓ FP16 support
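To illustrate runtime-compiled custom kernels, a minimal sketch using CuPy's ElementwiseKernel; the kernel name and values here are made up for illustration:

import cupy as cp

# The CUDA C snippet is compiled on first call and cached afterwards.
squared_diff = cp.ElementwiseKernel(
    'float32 x, float32 y',    # input parameters
    'float32 z',               # output parameter
    'z = (x - y) * (x - y)',   # elementwise operation in CUDA C
    'squared_diff')            # kernel name

a = cp.arange(4, dtype=cp.float32)
b = cp.full(4, 2, dtype=cp.float32)
print(squared_diff(a, b))      # => [4. 1. 0. 1.]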
[Training pipeline diagram] load & make minibatch (DALI, multiprocessing) → forward / backward (float16 mode, TensorCore) → parameter update (distributed training)
Mixed precision training
> TensorCore support automatically available
> Techniques for mixed precision training:
    optimizer.set_loss_scale(scale)
    optimizer.use_fp32_update()
> mixed16 mode (coming soon): CHAINER_DTYPE=mixed16
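A minimal sketch of wiring the two calls above into an optimizer (assumes `model` is a float16 Chainer model; the scale value 128 is an arbitrary example):

import chainer

optimizer = chainer.optimizers.Adam()
optimizer.setup(model)
optimizer.set_loss_scale(128)  # scale the loss so fp16 gradients do not underflow
optimizer.use_fp32_update()    # keep fp32 master weights for the update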
Distributed training
[Data parallelism diagram] Four processes (process 0 on node 0 / GPU 0 through process 3 on node 1 / GPU 1) each run forward → backward → optimize, with an ALL-REDUCE of gradients between backward and optimize.
Data parallelism

comm = chainermn.create_communicator()
device = comm.intra_rank  # use this device
optimizer = chainermn.create_multi_node_optimizer(…, comm)

Scaled to a 512x V100 environment (https://arxiv.org/abs/1809.00778)
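A minimal sketch of a full data-parallel setup (assumes `model` is defined and the script is launched with mpiexec, one process per GPU):

import chainer
import chainermn

comm = chainermn.create_communicator()
device = comm.intra_rank                       # GPU id local to this node
chainer.cuda.get_device_from_id(device).use()
model.to_gpu()                                 # move parameters to this GPU

optimizer = chainermn.create_multi_node_optimizer(
    chainer.optimizers.Adam(), comm)           # inserts the all-reduce into update()
optimizer.setup(model)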
Model parallelism
> Each node computes a different part of the network (the model itself is parallelized)
> MPI communication primitives with backprop

# rank 0
phi = send(x, comm, rank=1)
h = recv(comm, rank=1, delegate_variable=phi)

# rank 1
x = recv(comm, rank=0)
h = f(x)
phi = send(h, comm, rank=0)
Model parallelism
> send returns a pseudo variable φ; it simulates the topology of the full computational graph
> Collective communication routines, e.g. bcast, scatter, allgather, etc., are also available

# rank 0
phi = send(x, comm, rank=1)
h = recv(comm, rank=1, delegate_variable=phi)
loss(h).backward()

# rank 1
x = recv(comm, rank=0)
h = f(x)
phi = send(h, comm, rank=0)
phi.backward()
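The snippets above are slide shorthand; the actual primitives live in chainermn.functions. A minimal two-rank sketch (assumes an MPI launch with 2 processes; x, f, and loss are placeholders from the slide):

import chainermn
import chainermn.functions as mnF

comm = chainermn.create_communicator()
if comm.rank == 0:
    phi = mnF.send(x, comm, rank=1)                    # returns the pseudo variable φ
    h = mnF.recv(comm, rank=1, delegate_variable=phi)
    loss(h).backward()                                 # backprop crosses process boundaries
else:  # rank 1
    x1 = mnF.recv(comm, rank=0)
    h = f(x1)
    phi = mnF.send(h, comm, rank=0)
    phi.backward()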
Domain-specific add-on packages
ChainerCV
✓ Supports standard computer vision tasks: classification, object detection, semantic/instance segmentation
✓ Simple, unified interface: easy to use and compose, optimized for computer vision workloads
✓ Guaranteed reproduction: every implemented method is confirmed to reproduce the performance reported in the original paper
ChainerRL
✓ Wide range of deep RL methods covered: DQN, Categorical DQN, IQN, DDPG, A3C, ACER, NSQ, PCL, PPO, TRPO
✓ Clean API and abstractions: easy to combine multiple orthogonal design choices, e.g. discrete/continuous actions, recurrent models, async training, ...
✓ Environment support: compatible with the OpenAI Gym interface
Chainer Chemistry, ChainerUI
What is needed for modern deep learning frameworks?
Speed: faster trial-and-error, larger scale
Environment support: quick adoption of new hardware/environments
Quick deployment: quick application of research outcomes
ChainerX included in Chainer v6 beta1
ChainerX = NumPy-like ndarray + autograd
• Speed: implemented in C++ with a thin binding layer = far less host-side overhead
• Environment support: pluggable device backends = open to quickly adding support for new devices
• Quick deployment: pure C++ API = available for Python-free native apps
[Architecture diagram]
High-level API (Chainer): existing code using Chainer, written in Python
ChainerX Python API: low-overhead computation
ChainerX (with C++ API): portable code with much less overhead, in C++; sits alongside NumPy / CuPy and runs on a native backend, a CUDA backend, or a custom backend, …
ChainerX Python API: the chainerx namespace
> NumPy-compatible API
> NN-specific functions: conv, batch_norm, …
> Device support
> require_grad() to make an array differentiable

import chainerx as chx

x = chx.ones((2, 3), dtype=chx.float32, device='cuda:0')
y = (x + 1).require_grad()
z = chx.exp(y).sum()
z.backward()
Chainer on ChainerX
> Wraps chx.ndarray with Variable
> FunctionNode falls back to NumPy/CuPy for the actual computation
> Uses the ChainerX (C++) computational graph, with lower overhead in backprop

arr = chx.ones((2, 3), dtype=chx.float32)
x = chainer.Variable(arr)
y = model(x)
y.backward()
ChainerX C++ API
> Has an almost one-to-one mapping to the Python API
> Runs without a CPython environment

chainerx::Array x = chainerx::ones(
    {2, 3}, chainerx::Dtype::kFloat32, chainerx::GetDevice("cuda:0"));
chainerx::Array y = (x + 1).RequireGrad();
chainerx::Array z = chainerx::Exp(y).Sum();
chainerx::Backward(z);
Host logic overhead, measured as time per iteration (fwd + bwd + update):

Framework/API        Time per iteration (msec)
Chainer on NumPy     14.48
Chainer on ChainerX   7.54
ChainerX Python       1.88
PyTorch               2.45
ChainerX: Roadmap
v6 (May 2019): ChainerX with basic ops; integration into Chainer
v7 (Nov 2019): wide coverage of ops; ready for most users; C++ API made more accessible
Future (2020+): easier deployment; wider coverage of "compiled models"
Chainer Compiler: https://github.com/pfnet-research/chainer-compiler
[Pipeline diagram] Python → tracing (ONNX-Chainer) or translation (Chainer to ONNX) → ONNX+ → execution with ChainerX + Chainer VM, or vendor-specific graph formats
Features: graph-based optimization, graph-based autodiff, dynamic shapes, control flow, native binary
> Pioneering Define-by-Run API design
> Being made faster and more portable with ChainerX and the Chainer Compiler

WE ARE HIRING!
@ChainerOfficial on Twitter
https://bit.ly/join-chainer-slack
https://preferred-networks.jp/en/jobs