

1. BBM406 Fundamentals of Machine Learning, Lecture 12: Computational Graph and Backpropagation. Aykut Erdem // Hacettepe University // Fall 2019. (Illustration: 3Blue1Brown)

2. Last time… Multilayer Perceptron
• Layer representation: y_i = W_i x_i, x_{i+1} = σ(y_i)
• (Typically) iterate between a linear mapping Wx and a nonlinear function
• Loss function l(y, y_i) to measure quality of the estimate so far
slide by Alex Smola

3. Last time… Forward Pass
• Output of the network can be written as:
  h_j(x) = f(v_{j0} + Σ_{i=1}^{D} x_i v_{ji})
  o_k(x) = g(w_{k0} + Σ_{j=1}^{J} h_j(x) w_{kj})
  (j indexing the hidden units, k indexing the output units, D the number of inputs)
• Activation functions f, g: sigmoid/logistic, tanh, or rectified linear (ReLU)
  σ(z) = 1 / (1 + exp(−z)),  tanh(z) = (exp(z) − exp(−z)) / (exp(z) + exp(−z)),  ReLU(z) = max(0, z)
slide by Raquel Urtasun, Richard Zemel, Sanja Fidler

4. Last time… Forward Pass in Python
• Example code for a forward pass for a 3-layer network in Python
• Can be implemented efficiently using matrix operations
• Example above: W1 is a matrix of size 4 × 3, W2 is 4 × 4. What about the biases and W3?
slide by Raquel Urtasun, Richard Zemel, Sanja Fidler [http://cs231n.github.io/neural-networks-1/]
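The code itself appears only as an image on the slide; a minimal sketch of such a 3-layer forward pass, assuming a sigmoid nonlinearity and the layer sizes quoted above (the shape of W3 and the linear output are assumptions), might look like:

    import numpy as np

    f = lambda z: 1.0 / (1.0 + np.exp(-z))  # sigmoid; tanh or ReLU would also fit the slide

    x = np.random.randn(3, 1)                               # input vector, 3 dimensions
    W1, b1 = np.random.randn(4, 3), np.random.randn(4, 1)   # W1 is 4 x 3, as on the slide
    W2, b2 = np.random.randn(4, 4), np.random.randn(4, 1)   # W2 is 4 x 4, as on the slide
    W3, b3 = np.random.randn(1, 4), np.random.randn(1, 1)   # output layer, shape assumed

    h1 = f(W1 @ x + b1)    # first hidden layer activations
    h2 = f(W2 @ h1 + b2)   # second hidden layer activations
    out = W3 @ h2 + b3     # output (linear; no nonlinearity at the output here)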

5. Backpropagation

6. Recap: Loss function / Optimization
We defined a (linear) score function. TODO:
1. Define a loss function that quantifies our unhappiness with the scores across the training data.
2. Come up with a way of efficiently finding the parameters that minimize the loss function (optimization).
[figure: example images with their per-class scores]
slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

7.–16. Softmax Classifier (Multinomial Logistic Regression), developed step by step over slides 7–16. slides by Fei-Fei Li & Andrej Karpathy & Justin Johnson
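The derivation on those slides builds up the softmax cross-entropy loss L_i = −log(e^{s_{y_i}} / Σ_j e^{s_j}), where s are the class scores and y_i is the correct class. A minimal, numerically stable sketch (the function name and example scores are illustrative, not from the slides):

    import numpy as np

    def softmax_loss(scores, y):
        """Cross-entropy loss for one example: scores is a 1-D array of
        class scores, y is the index of the correct class."""
        shifted = scores - np.max(scores)   # shift scores for numerical stability
        log_probs = shifted - np.log(np.sum(np.exp(shifted)))
        return -log_probs[y]                # L_i = -log P(correct class)

    print(softmax_loss(np.array([3.2, 5.1, -1.7]), 0))  # unhappiness with class 0's score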

17. Optimization

18. Gradient Descent. slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

19. Mini-batch Gradient Descent
• Only use a small portion of the training set to compute the gradient
slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

20. Mini-batch Gradient Descent
• Only use a small portion of the training set to compute the gradient
• There are also fancier update formulas (momentum, Adagrad, RMSProp, Adam, …)
slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
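The update loop on these slides is shown as an image; a sketch of the vanilla mini-batch update they describe, where sample_training_data and evaluate_gradient are placeholder helpers rather than real library calls:

    # vanilla mini-batch gradient descent (helper names are placeholders)
    while True:
        data_batch = sample_training_data(data, 256)  # sample, e.g., 256 examples
        weights_grad = evaluate_gradient(loss_fun, data_batch, weights)
        weights += -step_size * weights_grad          # step against the gradient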

21. The effects of different update formulas. slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson (image credits to Alec Radford)

22.–23. [figures] slides by Fei-Fei Li & Andrej Karpathy & Justin Johnson

24. Computational Graph
[figure: inputs x and W feed a multiply node (*) producing the scores s; the hinge loss and a regularization term R feed an add node (+) that outputs the loss L]
slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

25. Convolutional Network (AlexNet)
[figure: a computational graph from the input image and weights to the loss]
slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

26. Neural Turing Machine
[figure: a computational graph from the input tape to the loss]
slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

27.–39. A worked example, developed over slides 27–39: given the inputs x = −2, y = 5, z = −4, we want the gradient of the output with respect to each input. The forward pass fills in the intermediate values; the backward pass applies the chain rule (stated explicitly on slides 37 and 39) one node at a time. slides by Fei-Fei Li & Andrej Karpathy & Justin Johnson
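The circuit itself appears only in the slide images; in the CS231n lecture these slides follow, the function paired with these input values is f(x, y, z) = (x + y) · z, so a sketch under that assumption:

    # worked example, assuming f(x, y, z) = (x + y) * z as in the CS231n original
    x, y, z = -2.0, 5.0, -4.0

    # forward pass
    q = x + y            # intermediate node: q = 3
    f = q * z            # output: f = -12

    # backward pass: chain rule, one node at a time
    dfdz = q             # df/dz = q = 3
    dfdq = z             # df/dq = z = -4
    dfdx = dfdq * 1.0    # dq/dx = 1, so df/dx = (df/dq)(dq/dx) = -4
    dfdy = dfdq * 1.0    # dq/dy = 1, so df/dy = -4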

40.–45. The general picture, built up over slides 40–45: on the forward pass a gate f receives input activations and produces an output; on the backward pass it takes the gradient flowing back from above and multiplies it by its "local gradient" to obtain the gradients on its inputs. slides by Fei-Fei Li & Andrej Karpathy & Justin Johnson
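In symbols (a restatement of the slides' picture, with z = f(x, y) the gate's output and L the final loss):

    \frac{\partial L}{\partial x} = \frac{\partial L}{\partial z} \cdot \frac{\partial z}{\partial x},
    \qquad
    \frac{\partial L}{\partial y} = \frac{\partial L}{\partial z} \cdot \frac{\partial z}{\partial y}

Here ∂z/∂x and ∂z/∂y are the local gradients, and ∂L/∂z is the gradient arriving from above.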

46.–56. Another example, worked gate by gate over slides 46–56: a neuron with a sigmoid output, backpropagated one node at a time; e.g., one step computes (−1) · (−0.20) = 0.20. slides by Fei-Fei Li & Andrej Karpathy & Justin Johnson

57. Another example: [local gradient] × [its gradient]: [1] × [0.2] = 0.2 and [1] × [0.2] = 0.2 (both inputs!), i.e. the add gate passes the gradient 0.2 on to both of its inputs. slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

58. Another example (continued). slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

59. Another example: [local gradient] × [its gradient]: x0: [2] × [0.2] = 0.4, w0: [−1] × [0.2] = −0.2. For a multiply gate, the local gradient with respect to each input is the value of the other input. slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

60.–61. The sigmoid function σ(x) = 1 / (1 + e^(−x)) has the convenient derivative dσ/dx = (1 − σ(x)) σ(x), so the entire sigmoid gate can be backpropagated in a single step: (0.73) · (1 − 0.73) = 0.2. slides by Fei-Fei Li & Andrej Karpathy & Justin Johnson
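Putting slides 46–61 together, a sketch of this circuit's forward and backward pass; the concrete weights and inputs (w = [2, −3, −3], x = [−1, −2]) are taken from the CS231n notes this example follows, and they reproduce the numbers above (0.73, 0.2, 0.4, −0.2):

    import math

    # f(w, x) = 1 / (1 + exp(-(w0*x0 + w1*x1 + w2))): a 2-D neuron with a sigmoid output
    w = [2.0, -3.0, -3.0]
    x = [-1.0, -2.0]

    # forward pass
    dot = w[0] * x[0] + w[1] * x[1] + w[2]   # = 1.0
    f = 1.0 / (1.0 + math.exp(-dot))         # sigmoid output ~= 0.73

    # backward pass: sigmoid gate in one step, then the multiply and add gates
    ddot = (1.0 - f) * f                          # ~= 0.2, as on slide 61
    dw = [x[0] * ddot, x[1] * ddot, 1.0 * ddot]   # dw0 ~= -0.2, as on slide 59
    dx = [w[0] * ddot, w[1] * ddot]               # dx0 ~= 0.4, as on slide 59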

62. Patterns in backward flow
• add gate: gradient distributor
• max gate: gradient router
• mul gate: gradient… "switcher"?
slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

63. Gradients add at branches: when a variable feeds into several parts of the graph, the gradients flowing back along those branches are summed (+). slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

64. Implementation: forward/backward API. A Graph (or Net) object (rough pseudocode). slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
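The pseudocode is shown only as an image; a minimal sketch of such a Graph object, assuming each gate exposes forward() and backward() methods and an output attribute (all names here are assumptions):

    class ComputationalGraph:
        def __init__(self, gates):
            self.gates = gates                 # gates sorted in topological order

        def forward(self):
            for gate in self.gates:            # run every gate forward, in order
                gate.forward()
            return self.gates[-1].output       # the loss comes out of the last gate

        def backward(self):
            for gate in reversed(self.gates):  # reverse-topological order
                gate.backward()                # chain rule, applied gate by gate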

65.–66. Implementation: forward/backward API for a single gate: a multiply gate * with inputs x and y and output z (x, y, z are scalars). slides by Fei-Fei Li & Andrej Karpathy & Justin Johnson
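The gate's code is again in the slide images; a sketch of such a multiply gate, caching its inputs on the forward pass so the backward pass can form the local gradients (attribute and method names are assumptions):

    class MultiplyGate:
        def forward(self, x, y):
            self.x, self.y = x, y   # cache the inputs for the backward pass
            return x * y            # z = x * y

        def backward(self, dz):
            dx = self.y * dz        # dz/dx = y, chained with the upstream gradient dz
            dy = self.x * dz        # dz/dy = x, chained with the upstream gradient dz
            return dx, dy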
