INTRODUCTION TO PYTORCH
Caio Corro
Computation Graph
➤ Dynamic: the computation graph is re-built for each input
➤ Eager: each operation is immediately computed (no lazy computation)

Interfaces
➤ Python
➤ C++ (somewhat experimental)

Components
➤ torch: tensors (with gradient computation ability)
➤ torch.nn.functional: functions that manipulate tensors
➤ torch.nn: neural network (sub-)components (e.g. affine transformation)
➤ torch.optim: optimizers
1. TENSORS
TENSORS

torch.Tensor attributes
➤ dtype: type of the elements
➤ shape: shape/size of the tensor
➤ device: device where the tensor is stored (i.e. cpu, gpu)
➤ requires_grad: do we want to backpropagate a gradient to this tensor?

dtype
➤ torch.long (signed integer)
➤ torch.float / torch.float32 (default type)
➤ torch.bool
➤ torch.double / torch.float64
➤ …
https://pytorch.org/docs/stable/tensor_attributes.html#torch.torch.dtype
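A minimal sketch (the example tensor is chosen here for illustration) showing how these attributes can be inspected on any tensor:

import torch

# build a small tensor just to inspect its attributes
t = torch.zeros((2, 3), dtype=torch.float, device="cpu", requires_grad=True)

print(t.dtype)          # torch.float32
print(t.shape)          # torch.Size([2, 3])
print(t.device)         # cpu
print(t.requires_grad)  # True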
CREATING TENSORS

Creating an uninitialized tensor

import torch

t = torch.empty(
    (2, 4, 4),  # shape
    dtype=torch.float,
    device="cpu",
    requires_grad=True
)

Default arguments
➤ Float
➤ CPU
➤ No grad

Creating an initialized tensor

torch.zeros((2, 4, 4), dtype=torch.float, requires_grad=True)
torch.ones((2, 4, 4), dtype=torch.float, requires_grad=True)
torch.rand((2, 4, 4), dtype=torch.float, requires_grad=True)

https://pytorch.org/docs/stable/torch.html#creation-ops
CREATING TENSORS FROM OTHER TENSORS

*_like() functions
Create a new tensor with the same attributes as the argument:
➤ Specific attributes can be overridden
➤ The shape cannot be changed

t2 = torch.zeros_like(t)
t_bool = torch.zeros_like(t, dtype=torch.bool)

clone()
Create a copy of a tensor

t1 = torch.ones((1,))
t2 = t1.clone()
t2[0] = 3
print(t1, t2)

tensor([1.]) tensor([3.])
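A small sketch (not from the slides) checking that *_like() copies the attributes of its argument, and that clone(), unlike the *_like() functions, is recorded by autograd so gradients flow back to the original tensor:

import torch

t = torch.rand((2, 3), dtype=torch.float, requires_grad=True)

t2 = torch.zeros_like(t)
print(t2.dtype, t2.device, t2.shape)  # same dtype / device / shape as t

# clone() is part of the computation graph:
c = t.clone()
c.sum().backward()
print(t.grad)  # a gradient of ones flows back to t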
CREATING TENSORS FROM DATA

From python data

t1 = torch.tensor([0, 1, 2, 3], dtype=torch.long)

➤ Creates a vector with integers 0, 1, 2, 3
➤ Elements are signed integers (longs)

Using iterables

t2 = torch.tensor(range(10))

➤ Vector with values from 0 to 9
➤ The dtype is inferred from the data: here signed integers (longs), not floats

Creating matrices

t3 = torch.tensor([[0, 1], [2, 3]])

➤ First row: 0, 1
➤ Second row: 2, 3
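A quick sketch (not from the slides) of how torch.tensor infers the dtype from Python data, and how to force it explicitly:

import torch

print(torch.tensor([0, 1, 2, 3]).dtype)                   # torch.int64 (long)
print(torch.tensor([0.0, 1.0]).dtype)                     # torch.float32
print(torch.tensor(range(10)).dtype)                      # torch.int64
print(torch.tensor(range(10), dtype=torch.float).dtype)   # torch.float32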
OPERATIONS

Out-of-place operations
➤ Create a new tensor, i.e. memory is allocated to store the result
➤ Set back-propagation information if required (i.e. if at least one of the inputs has requires_grad=True)

In-place operations
➤ Modify the data of the tensor (no memory allocation)
➤ Easy to identify: name ending with an underscore
➤ Can be problematic for gradient computation:
  • Can break the backpropagation algorithm
  • Forget the forward value
  ⇒ Be careful when requires_grad=True
https://pytorch.org/docs/stable/notes/autograd.html#in-place-operations-with-autograd
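A minimal sketch (not from the slides) of both behaviours: an out-of-place operation propagates requires_grad to its result, while an in-place operation on a leaf tensor that requires grad is refused by PyTorch:

import torch

a = torch.rand((4, 4), requires_grad=True)
b = torch.rand((4, 4))

c = a + b                # out-of-place: new memory is allocated
print(c.requires_grad)   # True: back-propagation information was set

try:
    a.add_(b)            # in-place on a leaf tensor that requires grad
except RuntimeError as e:
    print("in-place op refused:", e)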
OUT-OF-PLACE OPERATIONS

t1 = torch.rand((4, 4))
t2 = torch.rand((4, 4))

t3 = torch.sub(t1, t2)
t3 = t1.sub(t2)
t3 = t1 - t2

t3 = torch.add(t1, t2)
t3 = t1.add(t2)
t3 = t1 + t2

t3 = torch.mul(t1, t2)  # element-wise: this is NOT matrix multiplication!
t3 = t1.mul(t2)
t3 = t1 * t2

t3 = torch.div(t1, t2)
t3 = t1.div(t2)
t3 = t1 / t2

t3 = torch.matmul(t1, t2)  # matrix multiplication
t3 = t1.matmul(t2)
t3 = t1 @ t2
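A short sketch (values chosen here for illustration) making the difference between element-wise multiplication and matrix multiplication explicit:

import torch

t1 = torch.tensor([[1., 2.], [3., 4.]])
t2 = torch.tensor([[10., 0.], [0., 10.]])

print(t1 * t2)   # element-wise: [[10., 0.], [0., 40.]]
print(t1 @ t2)   # matrix product: [[10., 20.], [30., 40.]]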
IN-PLACE OPERATIONS

t1.add_(t2)
t1.sub_(t2)
t1.mul_(t2)
t1.div_(t2)

Note: no in-place matrix multiplication
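A small sketch (not part of the slides) checking that in-place operations reuse the existing storage, while out-of-place operations allocate a new tensor:

import torch

t1 = torch.rand((4, 4))
t2 = torch.rand((4, 4))

before = t1.data_ptr()
t1.add_(t2)                     # in-place
print(t1.data_ptr() == before)  # True: same memory

t3 = t1.add(t2)                 # out-of-place
print(t3.data_ptr() == before)  # False: new memory was allocated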
ELEMENT-WISE OPERATIONS: IN-PLACE

[Figure: computation graph of tmp = a × b and d = tmp × c]

Back-propagation to tmp
∂d/∂tmp = c    ∂d/∂c = tmp
➤ We need the value of « c »
➤ We don't need the value of « tmp »

a = torch.rand((1,))
b = torch.rand((1,))
c = torch.rand((1,), requires_grad=True)

tmp = a * b
d = tmp * c

# erase the data of tmp
torch.zero_(tmp)

! Backprop will FAIL: the gradient w.r.t. c is ∂d/∂c = tmp, and the forward value of tmp has been erased in place
ELEMENT-WISE OPERATIONS: IN-PLACE

a = torch.rand((1,), requires_grad=True)
b = torch.rand((1,), requires_grad=True)
c = torch.rand((1,))

tmp = a * b
d = tmp * c

# erase the data of tmp
torch.zero_(tmp)

This is OK! Back-propagating to a and b only needs the value of c, not the value of tmp
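A runnable sketch (not from the slides) checking both cases by actually calling backward():

import torch

# OK case: gradients only flow to a and b, the value of tmp is not needed
a = torch.rand((1,), requires_grad=True)
b = torch.rand((1,), requires_grad=True)
c = torch.rand((1,))
tmp = a * b
d = tmp * c
torch.zero_(tmp)       # erase tmp in place
d.backward()
print(a.grad, b.grad)  # gradients are computed correctly

# FAILING case: the gradient w.r.t. c needs the erased value of tmp
a = torch.rand((1,))
b = torch.rand((1,))
c = torch.rand((1,), requires_grad=True)
tmp = a * b
d = tmp * c
torch.zero_(tmp)
try:
    d.backward()
except RuntimeError as e:
    print("backward failed:", e)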
ACTIVATION FUNCTIONS

import torch
import torch.nn
import torch.nn.functional as F

t1 = torch.rand((2, 10))

« Standard » activations

t2 = torch.relu(t1)
t2 = torch.tanh(t1)
t2 = torch.sigmoid(t1)

torch.relu_(t1)
torch.tanh_(t1)
torch.sigmoid_(t1)

Other activations

t2 = F.leaky_relu(t1)
t2 = F.leaky_relu_(t1)
t2 = F.elu(t1)
t2 = F.elu_(t1)
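A small sketch (not from the slides) checking two of these activations against their textbook definitions:

import torch

t1 = torch.rand((2, 10))

# relu(x) = max(0, x)
print(torch.allclose(torch.relu(t1), torch.clamp(t1, min=0)))       # True
# sigmoid(x) = 1 / (1 + exp(-x))
print(torch.allclose(torch.sigmoid(t1), 1 / (1 + torch.exp(-t1))))  # True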
BROADCASTING

[Figure: c = a + b with a of shape (3, 3) and b of shape (1, 3): invalid dimensions, unless the rows of b are copied so that dimensions are correct]

Explicit broadcasting

a = torch.rand((3, 3))
b = torch.rand((1, 3))

# explicitly copy the data
b.repeat((3, 1))

# implicit construction
# (no duplicated memory)
b.expand((3, -1))

Implicit broadcasting
Many operations will automatically broadcast dimensions ⇒ RTFM!

a = torch.rand((3, 3))
b = torch.rand((1, 3))
c = a + b

https://pytorch.org/docs/stable/notes/broadcasting.html#broadcasting-semantics
https://pytorch.org/docs/stable/torch.html#torch.add
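A short sketch (not from the slides) checking that implicit broadcasting gives the same result as the explicit copies:

import torch

a = torch.rand((3, 3))
b = torch.rand((1, 3))

c_implicit = a + b                   # b is broadcast over the rows of a
c_explicit = a + b.repeat((3, 1))    # explicit copy of the data
c_expanded = a + b.expand((3, -1))   # view without duplicated memory

print(torch.equal(c_implicit, c_explicit))  # True
print(torch.equal(c_implicit, c_expanded))  # True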
GRADIENT COMPUTATION

➤ backward() launches the back-prop algorithm, but only if a gradient is required

a = torch.rand((1,))
b = torch.rand((1,))
c = a * b

# after the call a.grad contains the gradient
c.backward()

! No gradient is required, so the call to backward will fail!
GRADIENT COMPUTATION

a = torch.rand((1,), requires_grad=True)
b = torch.rand((1,))
c = a * b

# after the call a.grad contains the gradient
c.backward()

# let's do something else...
b2 = torch.rand((1,))
c2 = a * b2

# by default the gradient is accumulated,
# so if we want to recompute a gradient,
# we have to erase the previous one manually!
a.grad.zero_()
c2.backward(torch.tensor([2.]))  # explicitly set the incoming sensitivity
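A runnable sketch (not from the slides) showing why the manual zeroing matters: without it, a second backward() call adds to the existing gradient:

import torch

a = torch.tensor([3.], requires_grad=True)

c = a * 2
c.backward()
print(a.grad)   # tensor([2.])

# backward again WITHOUT zeroing: gradients accumulate
c2 = a * 2
c2.backward()
print(a.grad)   # tensor([4.])

# zero the gradient before recomputing
a.grad.zero_()
c3 = a * 2
c3.backward()
print(a.grad)   # tensor([2.])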
2. NEURAL NETWORKS
MODULES AND PARAMETERS

torch.nn.Module
To build a neural network, we store in a module:
➤ Parameters of the network
➤ Other modules

Benefits
➤ Execution mode: we can set the network in training or in test mode (e.g. to automatically apply or discard dropout)
➤ Move the whole network to a device
➤ Retrieve all learnable parameters of the network
➤ …
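A minimal sketch of these benefits (nn.Sequential, nn.Linear and the method names are standard PyTorch, but this particular network is just an illustration):

import torch

# a tiny network built from existing modules
model = torch.nn.Sequential(
    torch.nn.Linear(10, 20),
    torch.nn.ReLU(),
    torch.nn.Dropout(0.5),
    torch.nn.Linear(20, 2),
)

model.train()    # training mode: dropout is applied
model.eval()     # test mode: dropout is discarded
model.to("cpu")  # move the whole network to a device

# retrieve all learnable parameters (e.g. to give them to an optimizer)
for name, p in model.named_parameters():
    print(name, p.shape, p.requires_grad)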
SINGLE HIDDEN LAYER 1/2

class HiddenLayer(torch.nn.Module):
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.W = torch.nn.Parameter(
            torch.empty(output_dim, input_dim)
        )
        self.bias = torch.nn.Parameter(
            torch.empty(output_dim, 1)
        )

    def forward(self, inputs):
        # Transpose everything because of the PyTorch data format
        # (inputs are batched as rows: shape (batch, input_dim))
        z = inputs @ self.W.transpose(0, 1) \
            + self.bias.transpose(0, 1)
        # Non-linearity!
        return torch.relu(z)
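A hypothetical usage sketch for the HiddenLayer module above (the initialization scheme and the dummy loss are my choice, not from the slides); the parameters created with torch.empty must be filled explicitly before training:

import torch

layer = HiddenLayer(input_dim=10, output_dim=5)

# torch.empty leaves the parameters uninitialized: fill them explicitly
torch.nn.init.xavier_uniform_(layer.W)
torch.nn.init.zeros_(layer.bias)

x = torch.rand((32, 10))    # batch of 32 inputs
h = layer(x)                # calls forward(), output shape (32, 5)

loss = h.sum()              # dummy loss, just to test back-propagation
loss.backward()
print(layer.W.grad.shape)   # gradients reach the parameters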