INTRODUCTION TO PYTORCH
Caio Corro
Computation Graph
➤ Dynamic: the computation graph is re-built for each input
➤ Eager: each operation is immediately computed (no lazy computation)

Interfaces
➤ Python
➤ C++ (somewhat experimental)

Components
➤ torch: tensors (with gradient computation ability)
➤ torch.nn.functional: functions that manipulate tensors
➤ torch.nn: neural network (sub-)components (e.g. affine transformation)
➤ torch.optim: optimizers
1. TENSORS
TENSORS

torch.Tensor attributes
➤ dtype: type of the elements
➤ shape: shape/size of the tensor
➤ device: device where the tensor is stored (i.e. cpu, gpu)
➤ requires_grad: do we want to backpropagate a gradient to this tensor?

dtype
➤ torch.long (signed integer)
➤ torch.float / torch.float32 (default type)
➤ torch.bool
➤ torch.double / torch.float64
➤ …
https://pytorch.org/docs/stable/tensor_attributes.html#torch.torch.dtype
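A minimal sketch (the example tensor is chosen here for illustration) showing how these attributes can be inspected on any tensor:

import torch

# build a small tensor just to inspect its attributes
t = torch.zeros((2, 3), dtype=torch.float, device="cpu", requires_grad=True)

print(t.dtype)          # torch.float32
print(t.shape)          # torch.Size([2, 3])
print(t.device)         # cpu
print(t.requires_grad)  # True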
CREATING TENSORS

Creating an uninitialized tensor

import torch

t = torch.empty(
    (2, 4, 4),  # shape
    dtype=torch.float,
    device="cpu",
    requires_grad=True
)

Default arguments
➤ Float
➤ CPU
➤ No grad

Creating an initialized tensor

torch.zeros((2, 4, 4), dtype=torch.float, requires_grad=True)
torch.ones((2, 4, 4), dtype=torch.float, requires_grad=True)
torch.rand((2, 4, 4), dtype=torch.float, requires_grad=True)

https://pytorch.org/docs/stable/torch.html#creation-ops
CREATING TENSORS FROM OTHER TENSORS

*_like() functions
Create a new tensor with the same attributes as the argument:
➤ Specific attributes can be overridden
➤ The shape cannot be changed

t2 = torch.zeros_like(t)
t_bool = torch.zeros_like(t, dtype=torch.bool)

clone()
Create a copy of a tensor

t1 = torch.ones((1,))
t2 = t1.clone()
t2[0] = 3
print(t1, t2)

tensor([1.]) tensor([3.])
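A small sketch (not from the slides) checking that *_like() copies the attributes of its argument, and that clone(), unlike the *_like() functions, is recorded by autograd so gradients flow back to the original tensor:

import torch

t = torch.rand((2, 3), dtype=torch.float, requires_grad=True)

t2 = torch.zeros_like(t)
print(t2.dtype, t2.device, t2.shape)  # same dtype / device / shape as t

# clone() is part of the computation graph:
c = t.clone()
c.sum().backward()
print(t.grad)  # a gradient of ones flows back to t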
CREATING TENSORS FROM DATA

From python data

t1 = torch.tensor([0, 1, 2, 3], dtype=torch.long)

➤ Creates a vector with integers 0, 1, 2, 3
➤ Elements are signed integers (longs)

Using iterables

t2 = torch.tensor(range(10))

➤ Vector with values from 0 to 9
➤ The dtype is inferred from the data: here signed integers (longs), not floats

Creating matrices

t3 = torch.tensor([[0, 1], [2, 3]])

➤ First row: 0, 1
➤ Second row: 2, 3
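A quick sketch (not from the slides) of how torch.tensor infers the dtype from Python data, and how to force it explicitly:

import torch

print(torch.tensor([0, 1, 2, 3]).dtype)                   # torch.int64 (long)
print(torch.tensor([0.0, 1.0]).dtype)                     # torch.float32
print(torch.tensor(range(10)).dtype)                      # torch.int64
print(torch.tensor(range(10), dtype=torch.float).dtype)   # torch.float32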
OPERATIONS

Out-of-place operations
➤ Create a new tensor, i.e. memory is allocated to store the result
➤ Set back-propagation information if required (i.e. if at least one of the inputs has requires_grad=True)

In-place operations
➤ Modify the data of the tensor (no memory allocation)
➤ Easy to identify: name ending with an underscore
➤ Can be problematic for gradient computation:
  • Can break the backpropagation algorithm
  • Forget the forward value
  ⇒ Be careful when requires_grad=True
https://pytorch.org/docs/stable/notes/autograd.html#in-place-operations-with-autograd
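A minimal sketch (not from the slides) of both behaviours: an out-of-place operation propagates requires_grad to its result, while an in-place operation on a leaf tensor that requires grad is refused by PyTorch:

import torch

a = torch.rand((4, 4), requires_grad=True)
b = torch.rand((4, 4))

c = a + b                # out-of-place: new memory is allocated
print(c.requires_grad)   # True: back-propagation information was set

try:
    a.add_(b)            # in-place on a leaf tensor that requires grad
except RuntimeError as e:
    print("in-place op refused:", e)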
OUT-OF-PLACE OPERATIONS

t1 = torch.rand((4, 4))
t2 = torch.rand((4, 4))

t3 = torch.sub(t1, t2)
t3 = t1.sub(t2)
t3 = t1 - t2

t3 = torch.add(t1, t2)
t3 = t1.add(t2)
t3 = t1 + t2

t3 = torch.mul(t1, t2)  # element-wise: this is NOT matrix multiplication!
t3 = t1.mul(t2)
t3 = t1 * t2

t3 = torch.div(t1, t2)
t3 = t1.div(t2)
t3 = t1 / t2

t3 = torch.matmul(t1, t2)  # matrix multiplication
t3 = t1.matmul(t2)
t3 = t1 @ t2
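A short sketch (values chosen here for illustration) making the difference between element-wise multiplication and matrix multiplication explicit:

import torch

t1 = torch.tensor([[1., 2.], [3., 4.]])
t2 = torch.tensor([[10., 0.], [0., 10.]])

print(t1 * t2)   # element-wise: [[10., 0.], [0., 40.]]
print(t1 @ t2)   # matrix product: [[10., 20.], [30., 40.]]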
IN-PLACE OPERATIONS

t1.add_(t2)
t1.sub_(t2)
t1.mul_(t2)
t1.div_(t2)

Note: no in-place matrix multiplication
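A small sketch (not part of the slides) checking that in-place operations reuse the existing storage, while out-of-place operations allocate a new tensor:

import torch

t1 = torch.rand((4, 4))
t2 = torch.rand((4, 4))

before = t1.data_ptr()
t1.add_(t2)                     # in-place
print(t1.data_ptr() == before)  # True: same memory

t3 = t1.add(t2)                 # out-of-place
print(t3.data_ptr() == before)  # False: new memory was allocated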
ELEMENT-WISE OPERATIONS: IN-PLACE

[Figure: computation graph of tmp = a × b and d = tmp × c]

Back-propagation to tmp
∂d/∂tmp = c    ∂d/∂c = tmp
➤ We need the value of « c »
➤ We don't need the value of « tmp »

a = torch.rand((1,))
b = torch.rand((1,))
c = torch.rand((1,), requires_grad=True)

tmp = a * b
d = tmp * c

# erase the data of tmp
torch.zero_(tmp)

! Backprop will FAIL: the gradient w.r.t. c is ∂d/∂c = tmp, and the forward value of tmp has been erased in place
ELEMENT-WISE OPERATIONS: IN-PLACE

a = torch.rand((1,), requires_grad=True)
b = torch.rand((1,), requires_grad=True)
c = torch.rand((1,))

tmp = a * b
d = tmp * c

# erase the data of tmp
torch.zero_(tmp)

This is OK! Back-propagating to a and b only needs the value of c, not the value of tmp
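A runnable sketch (not from the slides) checking both cases by actually calling backward():

import torch

# OK case: gradients only flow to a and b, the value of tmp is not needed
a = torch.rand((1,), requires_grad=True)
b = torch.rand((1,), requires_grad=True)
c = torch.rand((1,))
tmp = a * b
d = tmp * c
torch.zero_(tmp)       # erase tmp in place
d.backward()
print(a.grad, b.grad)  # gradients are computed correctly

# FAILING case: the gradient w.r.t. c needs the erased value of tmp
a = torch.rand((1,))
b = torch.rand((1,))
c = torch.rand((1,), requires_grad=True)
tmp = a * b
d = tmp * c
torch.zero_(tmp)
try:
    d.backward()
except RuntimeError as e:
    print("backward failed:", e)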
ACTIVATION FUNCTIONS

import torch
import torch.nn
import torch.nn.functional as F

t1 = torch.rand((2, 10))

« Standard » activations

t2 = torch.relu(t1)
t2 = torch.tanh(t1)
t2 = torch.sigmoid(t1)

torch.relu_(t1)
torch.tanh_(t1)
torch.sigmoid_(t1)

Other activations

t2 = F.leaky_relu(t1)
t2 = F.leaky_relu_(t1)
t2 = F.elu(t1)
t2 = F.elu_(t1)
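A small sketch (not from the slides) checking two of these activations against their textbook definitions:

import torch

t1 = torch.rand((2, 10))

# relu(x) = max(0, x)
print(torch.allclose(torch.relu(t1), torch.clamp(t1, min=0)))       # True
# sigmoid(x) = 1 / (1 + exp(-x))
print(torch.allclose(torch.sigmoid(t1), 1 / (1 + torch.exp(-t1))))  # True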
BROADCASTING

[Figure: c = a + b with a of shape (3, 3) and b of shape (1, 3): invalid dimensions, unless the rows of b are copied so that dimensions are correct]

Explicit broadcasting

a = torch.rand((3, 3))
b = torch.rand((1, 3))

# explicitly copy the data
b.repeat((3, 1))

# implicit construction
# (no duplicated memory)
b.expand((3, -1))

Implicit broadcasting
Many operations will automatically broadcast dimensions ⇒ RTFM!

a = torch.rand((3, 3))
b = torch.rand((1, 3))
c = a + b

https://pytorch.org/docs/stable/notes/broadcasting.html#broadcasting-semantics
https://pytorch.org/docs/stable/torch.html#torch.add
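A short sketch (not from the slides) checking that implicit broadcasting gives the same result as the explicit copies:

import torch

a = torch.rand((3, 3))
b = torch.rand((1, 3))

c_implicit = a + b                   # b is broadcast over the rows of a
c_explicit = a + b.repeat((3, 1))    # explicit copy of the data
c_expanded = a + b.expand((3, -1))   # view without duplicated memory

print(torch.equal(c_implicit, c_explicit))  # True
print(torch.equal(c_implicit, c_expanded))  # True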
GRADIENT COMPUTATION

➤ backward() launches the back-prop algorithm, but only if a gradient is required

a = torch.rand((1,))
b = torch.rand((1,))
c = a * b

# after the call a.grad contains the gradient
c.backward()

! No gradient is required, so the call to backward will fail!
GRADIENT COMPUTATION

a = torch.rand((1,), requires_grad=True)
b = torch.rand((1,))
c = a * b

# after the call a.grad contains the gradient
c.backward()

# let's do something else...
b2 = torch.rand((1,))
c2 = a * b2

# by default the gradient is accumulated,
# so if we want to recompute a gradient,
# we have to erase the previous one manually!
a.grad.zero_()
c2.backward(torch.tensor([2.]))  # explicitly set the incoming sensitivity
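A runnable sketch (not from the slides) showing why the manual zeroing matters: without it, a second backward() call adds to the existing gradient:

import torch

a = torch.tensor([3.], requires_grad=True)

c = a * 2
c.backward()
print(a.grad)   # tensor([2.])

# backward again WITHOUT zeroing: gradients accumulate
c2 = a * 2
c2.backward()
print(a.grad)   # tensor([4.])

# zero the gradient before recomputing
a.grad.zero_()
c3 = a * 2
c3.backward()
print(a.grad)   # tensor([2.])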
2. NEURAL NETWORKS
MODULES AND PARAMETERS

torch.nn.Module
To build a neural network, we store in a module:
➤ Parameters of the network
➤ Other modules

Benefits
➤ Execution mode: we can set the network in training or in test mode (e.g. to automatically apply or discard dropout)
➤ Move the whole network to a device
➤ Retrieve all learnable parameters of the network
➤ …
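A minimal sketch of these benefits (nn.Sequential, nn.Linear and the method names are standard PyTorch, but this particular network is just an illustration):

import torch

# a tiny network built from existing modules
model = torch.nn.Sequential(
    torch.nn.Linear(10, 20),
    torch.nn.ReLU(),
    torch.nn.Dropout(0.5),
    torch.nn.Linear(20, 2),
)

model.train()    # training mode: dropout is applied
model.eval()     # test mode: dropout is discarded
model.to("cpu")  # move the whole network to a device

# retrieve all learnable parameters (e.g. to give them to an optimizer)
for name, p in model.named_parameters():
    print(name, p.shape, p.requires_grad)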
SINGLE HIDDEN LAYER 1/2

class HiddenLayer(torch.nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.W = torch.nn.Parameter(
            torch.empty(output_dim, input_dim)
        )
        self.bias = torch.nn.Parameter(
            torch.empty(output_dim, 1)
        )

    def forward(self, inputs):
        # Transpose everything because of the PyTorch data format
        # (inputs are batched as rows: shape (batch, input_dim))
        z = inputs @ self.W.transpose(0, 1) \
            + self.bias.transpose(0, 1)
        # Non-linearity!
        return torch.relu(z)
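A hypothetical usage sketch for the HiddenLayer module above (the initialization scheme and the dummy loss are my choice, not from the slides); the parameters created with torch.empty must be filled explicitly before training:

import torch

layer = HiddenLayer(input_dim=10, output_dim=5)

# torch.empty leaves the parameters uninitialized: fill them explicitly
torch.nn.init.xavier_uniform_(layer.W)
torch.nn.init.zeros_(layer.bias)

x = torch.rand((32, 10))    # batch of 32 inputs
h = layer(x)                # calls forward(), output shape (32, 5)

loss = h.sum()              # dummy loss, just to test back-propagation
loss.backward()
print(layer.W.grad.shape)   # gradients reach the parameters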