Introduction to PyTorch

  1. INTRODUCTION TO PYTORCH Caio Corro

  2. Computation Graph
  ➤ Dynamic: you re-build the computation graph for each input
  ➤ Eager: each operation is immediately computed (no lazy computation)
  Interfaces
  ➤ Python
  ➤ C++ (somewhat experimental)
  Components
  ➤ torch: tensors (with gradient computation ability)
  ➤ torch.nn.functional: functions that manipulate tensors
  ➤ torch.nn: neural network (sub-)components (e.g. affine transformation)
  ➤ torch.optim: optimizers
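A minimal sketch (not from the slides) of what "eager" means in practice: every operation runs immediately and its result can be inspected right away, with no separate graph-compilation step.

    import torch

    a = torch.tensor([1.0, 2.0])
    b = a * 3          # computed immediately (eager execution)
    print(b)           # tensor([3., 6.]) -- no session or compile step needed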

  3. 1. TENSORS

  4. TENSORS
  torch.Tensor
  ➤ dtype: type of elements
  ➤ shape: shape/size of the tensor
  ➤ device: device where the tensor is stored (i.e. cpu, gpu)
  ➤ requires_grad: do we want to backpropagate gradient to this tensor?
  dtype
  ➤ torch.long (signed integer)
  ➤ torch.float/torch.float32 (default type)
  ➤ torch.bool
  ➤ torch.double/torch.float64
  ➤ …
  https://pytorch.org/docs/stable/tensor_attributes.html#torch.torch.dtype
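A short illustrative snippet (not from the slides; assumes CPU and default settings) showing how these attributes can be read off any tensor:

    import torch

    t = torch.zeros((2, 3))
    print(t.dtype)          # torch.float32 (default type)
    print(t.shape)          # torch.Size([2, 3])
    print(t.device)         # cpu
    print(t.requires_grad)  # False (default)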

  5. CREATING TENSORS
  Creating an uninitialized tensor
      import torch

      t = torch.empty(
          (2, 4, 4),  # shape
          dtype=torch.float,
          device="cpu",
          requires_grad=True,
      )
  Default arguments:
  ➤ Float
  ➤ CPU
  ➤ No grad

  6. CREATING TENSORS
  Creating an initialized tensor
      torch.zeros((2, 4, 4), dtype=torch.float, requires_grad=True)
      torch.ones((2, 4, 4), dtype=torch.float, requires_grad=True)
      torch.rand((2, 4, 4), dtype=torch.float, requires_grad=True)
  https://pytorch.org/docs/stable/torch.html#creation-ops

  7. CREATING TENSORS FROM OTHER TENSORS
  *_like() functions
  Create a new tensor with the same attributes as the argument:
  ➤ Specific attributes can be overridden
  ➤ Shape cannot be changed
      t2 = torch.zeros_like(t)
      t_bool = torch.zeros_like(t, dtype=torch.bool)

  8. CREATING TENSORS FROM OTHER TENSORS
  clone()
  Create a copy of a tensor
      t1 = torch.ones((1,))
      t2 = t1.clone()
      t2[0] = 3
      print(t1, t2)
      # tensor([1.]) tensor([3.])

  9. CREATING TENSORS FROM DATA
  From Python data
      t1 = torch.tensor([0, 1, 2, 3], dtype=torch.long)
  ➤ Creates a vector with integers 0, 1, 2, 3
  ➤ Elements are signed integers (longs)

  10. CREATING TENSORS FROM DATA
  Using iterables
      t2 = torch.tensor(range(10))
  ➤ Vector with values from 0 to 9 (the dtype is inferred from the data: integers here, i.e. torch.long)

  11. CREATING TENSORS FROM DATA
  Creating matrices
      t3 = torch.tensor([[0, 1], [2, 3]])
  ➤ First row: 0, 1
  ➤ Second row: 2, 3
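A small sketch (not from the slides) showing that torch.tensor() infers the dtype from the Python values it receives, unless a dtype is given explicitly:

    import torch

    print(torch.tensor(range(10)).dtype)                  # torch.int64 (integers in, longs out)
    print(torch.tensor([0.0, 1.0]).dtype)                 # torch.float32
    print(torch.tensor([0, 1], dtype=torch.float).dtype)  # torch.float32 (explicit override)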

  12. OPERATIONS
  Out-of-place operations
  ➤ Create a new tensor, i.e. memory is allocated to store the result
  ➤ Set back-propagation information if required (i.e. if at least one of the inputs has requires_grad=True)
  In-place operations
  ➤ Modify the data of the tensor (no memory allocation)
  ➤ Easy to identify: name ending with an underscore
  ➤ Can be problematic for gradient computation (be careful when requires_grad=True):
    • Can break the backpropagation algorithm
    • Forget the forward value
  https://pytorch.org/docs/stable/notes/autograd.html#in-place-operations-with-autograd

  13. OUT-OF-PLACE OPERATIONS
      t1 = torch.rand((4, 4))
      t2 = torch.rand((4, 4))

      t3 = torch.sub(t1, t2)
      t3 = torch.add(t1, t2)
      t3 = t1.sub(t2)
      t3 = t1.add(t2)
      t3 = t1 - t2
      t3 = t1 + t2

  14. OUT-OF-PLACE OPERATIONS
      t3 = torch.mul(t1, t2)   # element-wise: this is NOT matrix multiplication!
      t3 = t1.mul(t2)
      t3 = t1 * t2

      t3 = torch.div(t1, t2)
      t3 = t1.div(t2)
      t3 = t1 / t2

      t3 = torch.matmul(t1, t2)
      t3 = t1.matmul(t2)
      t3 = t1 @ t2
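A tiny sketch (not from the slides) making the element-wise vs. matrix product distinction concrete on 2×2 matrices:

    import torch

    a = torch.tensor([[1., 2.], [3., 4.]])
    b = torch.tensor([[10., 20.], [30., 40.]])
    print(a * b)   # element-wise: [[10., 40.], [90., 160.]]
    print(a @ b)   # matrix product: [[70., 100.], [150., 220.]]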

  15. IN-PLACE OPERATIONS
      t1.add_(t2)
      t1.sub_(t2)
      t1.mul_(t2)
      t1.div_(t2)
  Note: no in-place matrix multiplication

  16. ELEMENT-WISE OPERATIONS: IN-PLACE
  [Figure: computation graph with two multiplication nodes]
      tmp = a × b
      d = tmp × c

  17. ELEMENT-WISE OPERATIONS: IN-PLACE
  Back-propagation to tmp, where tmp = a × b and d = tmp × c:
      ∂d/∂tmp = c        ∂d/∂c = tmp
  ➤ We need the value of « c »
  ➤ We don't need the value of « tmp »

  18. ELEMENT-WISE OPERATIONS: IN-PLACE
      a = torch.rand((1,))
      b = torch.rand((1,))
      c = torch.rand((1,), requires_grad=True)

      tmp = a * b
      d = tmp * c

      # erase the data of tmp
      torch.zero_(tmp)
  ! Backprop will FAIL! (the value of tmp is needed to compute ∂d/∂c, but it has been erased)

  19. ELEMENT-WISE OPERATIONS: IN-PLACE
      a = torch.rand((1,), requires_grad=True)
      b = torch.rand((1,), requires_grad=True)
      c = torch.rand((1,))

      tmp = a * b
      d = tmp * c

      # erase the data of tmp
      torch.zero_(tmp)
  This is OK! (only the value of c is needed to back-propagate to a and b)
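A hedged sketch (not in the slides) completing the failing example with an actual backward() call; on recent PyTorch versions the in-place modification is detected and a RuntimeError is raised:

    import torch

    a = torch.rand((1,))
    b = torch.rand((1,))
    c = torch.rand((1,), requires_grad=True)

    tmp = a * b
    d = tmp * c
    tmp.zero_()        # erase the forward value of tmp (same effect as torch.zero_(tmp) on the slide)

    try:
        d.backward()   # needs the value of tmp to compute the gradient w.r.t. c
    except RuntimeError as e:
        print("backward failed:", e)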

  20. ACTIVATION FUNCTIONS
      import torch
      import torch.nn
      import torch.nn.functional as F

      t1 = torch.rand((2, 10))
  « Standard » activations
      t2 = torch.relu(t1)
      t2 = torch.tanh(t1)
      t2 = torch.sigmoid(t1)

      torch.relu_(t1)
      torch.tanh_(t1)
      torch.sigmoid_(t1)
  Other activations
      t2 = F.leaky_relu(t1)
      t2 = F.leaky_relu_(t1)
      t2 = F.elu(t1)
      t2 = F.elu_(t1)

  21. BROADCASTING
  [Figure: c = a + b with tensors of different shapes]
  ! Invalid dimensions

  22. BROADCASTING
  [Figure: c = a + b] Copy rows so that dimensions are correct

  23. BROADCASTING
  Explicit broadcasting
      a = torch.rand((3, 3))
      b = torch.rand((1, 3))

      # explicitly copy the data
      b.repeat((3, 1))

      # implicit construction
      # (no duplicated memory)
      b.expand((3, -1))

  24. BROADCASTING
  Implicit broadcasting
  Many operations will automatically broadcast dimensions ⇒ RTFM!
      a = torch.rand((3, 3))
      b = torch.rand((1, 3))
      c = a + b
  https://pytorch.org/docs/stable/notes/broadcasting.html#broadcasting-semantics
  https://pytorch.org/docs/stable/torch.html#torch.add
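A small sketch (not from the slides) checking the resulting shapes of the explicit and implicit variants:

    import torch

    a = torch.rand((3, 3))
    b = torch.rand((1, 3))

    print(b.repeat((3, 1)).shape)   # torch.Size([3, 3]) -- data is copied
    print(b.expand((3, -1)).shape)  # torch.Size([3, 3]) -- a view, no copy
    print((a + b).shape)            # torch.Size([3, 3]) -- implicit broadcasting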

  25. GRADIENT COMPUTATION
  ➤ backward() launches the back-prop algorithm, if and only if a gradient is required
      a = torch.rand((1,))
      b = torch.rand((1,))
      c = a * b

      # after the call a.grad contains the gradient
      c.backward()
  ! No gradient is required, so the call to backward() will fail!

  26. GRADIENT COMPUTATION
      a = torch.rand((1,), requires_grad=True)
      b = torch.rand((1,))
      c = a * b

      # after the call a.grad contains the gradient
      c.backward()

  27. GRADIENT COMPUTATION
      # let's do something else...
      b2 = torch.rand((1,))
      c2 = a * b2

  28. GRADIENT COMPUTATION
      # by default gradient is accumulated,
      # so if we want to recompute a gradient,
      # we have to erase the previous one manually!
      a.grad.zero_()

      # explicitly set the incoming sensitivity
      c2.backward(torch.tensor([2.]))
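A minimal sketch (not from the slides) showing the accumulation behaviour: two backward() calls add their gradients into a.grad unless it is zeroed in between.

    import torch

    a = torch.rand((1,), requires_grad=True)

    (a * 2).backward()
    print(a.grad)      # tensor([2.])

    (a * 3).backward()
    print(a.grad)      # tensor([5.]) -- gradients accumulate

    a.grad.zero_()
    (a * 3).backward()
    print(a.grad)      # tensor([3.]) -- fresh gradient after zeroing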

  29. 2. NEURAL NETWORKS

  30. MODULES AND PARAMETERS
  torch.nn.Module
  To build a neural network, we store in a module:
  ➤ Parameters of the network
  ➤ Other modules
  Benefits (illustrated in the sketch below)
  ➤ Execution mode: we can set the network in training or in test mode (e.g. to automatically apply or discard dropout)
  ➤ Move the whole network to a device
  ➤ Retrieve all learnable parameters of the network
  ➤ …
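A hedged sketch (names and dimensions are made up, not from the slides) illustrating those benefits on a small built-in network:

    import torch

    net = torch.nn.Sequential(
        torch.nn.Linear(10, 20),
        torch.nn.ReLU(),
        torch.nn.Dropout(p=0.5),
        torch.nn.Linear(20, 2),
    )

    net.train()                      # training mode: dropout is applied
    net.eval()                       # test mode: dropout is discarded
    net.to("cpu")                    # move the whole network to a device
    params = list(net.parameters())  # retrieve all learnable parameters
    print(sum(p.numel() for p in params))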

  31. SINGLE HIDDEN LAYER 1/2
      class HiddenLayer(torch.nn.Module):
          def __init__(self, input_dim, output_dim):
              super().__init__()
              self.W = torch.nn.Parameter(
                  torch.empty(output_dim, input_dim)
              )
              self.bias = torch.nn.Parameter(
                  torch.empty(output_dim, 1)
              )

  32. SINGLE HIDDEN LAYER 1/2
      class HiddenLayer(torch.nn.Module):
          # __init__ as on the previous slide

          def forward(self, inputs):
              # transpose everything because of the PyTorch data format
              # (inputs are batch-first: one row per input)
              # use the parameters directly (not .data) so that autograd tracks W and bias
              z = inputs @ self.W.transpose(0, 1) \
                  + self.bias.transpose(0, 1)
              # non-linearity!
              return torch.relu(z)
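A hedged usage sketch (dimensions and initialisation are made up, not from the slides), assuming the HiddenLayer class defined on the two slides above:

    import torch

    layer = HiddenLayer(input_dim=10, output_dim=5)

    # torch.empty leaves the parameters uninitialised, so give them values before use
    torch.nn.init.normal_(layer.W, std=0.1)
    torch.nn.init.zeros_(layer.bias)

    x = torch.rand((32, 10))   # a batch of 32 inputs, one per row
    h = layer(x)               # calls forward() under the hood
    print(h.shape)             # torch.Size([32, 5])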
