  1. Administrative - A1 is due today (midnight). You can use up to 3 late days. - A2 will be up this Friday; it's due the Wednesday after next (Feb 4). - Project Proposal is due next Friday at midnight (~one paragraph, 200-400 words, sent as email).

  2. Lecture 5: Backprop and intro to Neural Nets (Fei-Fei Li & Andrej Karpathy, 21 Jan 2015)

  3. Linear Classification: SVM and Softmax (loss formulas shown on the slide)

  4. Optimization Landscape

  5. Gradient Descent - Numerical gradient: slow :(, approximate :(, easy to write :) - Analytic gradient: fast :), exact :), error-prone :( - In practice: derive the analytic gradient, then check your implementation against the numerical gradient.
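
In practice this check is a short numpy routine; a minimal gradient-check sketch under assumed names (the toy function, shapes, and tolerance below are illustrative, not from the slides):

    import numpy as np

    def eval_numerical_gradient(f, x, h=1e-5):
        # Centered finite-difference estimate of the gradient of f at x.
        grad = np.zeros_like(x)
        it = np.nditer(x, flags=['multi_index'])
        while not it.finished:
            ix = it.multi_index
            old = x[ix]
            x[ix] = old + h
            fxph = f(x)                    # f(x + h)
            x[ix] = old - h
            fxmh = f(x)                    # f(x - h)
            x[ix] = old                    # restore the original value
            grad[ix] = (fxph - fxmh) / (2 * h)
            it.iternext()
        return grad

    # Example: compare an analytic gradient against the numerical one.
    f = lambda x: np.sum(x ** 2)           # toy function (assumption)
    x = np.random.randn(3, 4)
    num_grad = eval_numerical_gradient(f, x)
    ana_grad = 2 * x                        # analytic gradient of sum(x^2)
    rel_err = np.abs(num_grad - ana_grad) / np.maximum(1e-8, np.abs(num_grad) + np.abs(ana_grad))
    print(rel_err.max())                    # should be very small (e.g. < 1e-6)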

  6. This class: Becoming a backprop ninja


  8. Example: x = 4, y = -3 => f(x,y) = -12. Partial derivatives, gradient.

  9. Example: x = 4, y = -3 => f(x,y) = -12. Partial derivatives, gradient. Question: If I increase x by h, how would the output of f change?
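
Assuming f(x, y) = x * y (consistent with f(4, -3) = -12), a quick numeric answer to the question:

    # Numeric check; f(x, y) = x * y is an assumption consistent with f(4, -3) = -12.
    f = lambda x, y: x * y
    x, y, h = 4.0, -3.0, 1e-4
    print((f(x + h, y) - f(x, y)) / h)   # ~ -3.0 = df/dx = y, so f changes by about -3h
    print((f(x, y + h) - f(x, y)) / h)   # ~  4.0 = df/dy = x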

  10. Compound expressions:

  11. Compound expressions: Chain rule:
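
The compound expression itself is only shown as an image, so as a stand-in here is the chain rule applied to an assumed example f(x, y, z) = (x + y) * z:

    # Minimal chain-rule sketch; f(x, y, z) = (x + y) * z is an assumed stand-in
    # for the compound expression on the slides.
    x, y, z = -2.0, 5.0, -4.0

    # forward pass, one intermediate at a time
    q = x + y          # q = 3.0
    f = q * z          # f = -12.0

    # backward pass: chain the local gradients from the output back to the inputs
    df_dq = z          # d(q*z)/dq
    df_dz = q          # d(q*z)/dz
    df_dx = df_dq * 1  # dq/dx = 1, chained with df/dq
    df_dy = df_dq * 1  # dq/dy = 1
    print(df_dx, df_dy, df_dz)   # -4.0 -4.0 3.0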

  14. Another example:

  15. Another example: -1/(1.37^2) = -0.53

  16. Another example: [local gradient] x [its gradient]: [1] x [-0.53] = -0.53

  17. Another example: [local gradient] x [its gradient]: [e^(-1)] x [-0.53] = -0.20

  18. Another example: [local gradient] x [its gradient]: [-1] x [-0.20] = 0.20

  19. Another example: [local gradient] x [its gradient]: [1] x [0.2] = 0.2 and [1] x [0.2] = 0.2 (both inputs!)

  20. Another example: [local gradient] x [its gradient]: x0: [2] x [0.2] = 0.4, w0: [-1] x [0.2] = -0.2
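
The numbers above are consistent with a two-input sigmoid neuron f(w, x) = 1 / (1 + exp(-(w0*x0 + w1*x1 + w2))); a gate-by-gate sketch under that assumption (the input values are inferred to match the intermediate results, not quoted from the slides):

    import math

    # assumed inputs, chosen to reproduce the intermediate values above
    w0, x0 = 2.0, -1.0
    w1, x1 = -3.0, -2.0
    w2 = -3.0

    # forward pass, one gate at a time
    dot = w0 * x0 + w1 * x1 + w2   # = 1.0
    neg = -dot                     # = -1.0
    e = math.exp(neg)              # ~ 0.37
    den = 1.0 + e                  # ~ 1.37
    f = 1.0 / den                  # ~ 0.73

    # backward pass: [local gradient] x [upstream gradient] at every gate
    dden = -1.0 / den ** 2             # 1/x gate:   ~ -0.53
    de = 1.0 * dden                    # +1 gate:    ~ -0.53
    dneg = e * de                      # exp gate:   ~ -0.20
    ddot = -1.0 * dneg                 # *(-1) gate: ~  0.20
    dw0, dx0 = x0 * ddot, w0 * ddot    # ~ -0.2 and 0.4
    dw1, dx1 = x1 * ddot, w1 * ddot    # ~ -0.4 and -0.6
    dw2 = 1.0 * ddot                   # ~  0.2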

  21. A gate "hanging out": during backprop every gate computes, for each of its inputs, [LOCAL GRADIENT] x [GATE GRADIENT]. The local gradient can be computed right away, even during the forward pass; the gate gradient is what the gate receives during backpropagation.
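
A minimal sketch of that gate contract in code (the class and method names are illustrative, not from the slides):

    class MultiplyGate:
        # Toy gate: z = x * y. Caches its inputs in forward so that backward can
        # form [local gradient] x [upstream gradient] for each input.
        def forward(self, x, y):
            self.x, self.y = x, y       # local gradients depend only on these
            return x * y

        def backward(self, dz):
            dx = self.y * dz            # dz/dx = y, chained with the upstream dz
            dy = self.x * dz            # dz/dy = x
            return dx, dy

    gate = MultiplyGate()
    out = gate.forward(4.0, -3.0)       # -12.0
    print(gate.backward(1.0))           # (-3.0, 4.0)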

  22. sigmoid function

  23. sigmoid function: (0.73) * (1 - 0.73) ≈ 0.2
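
The product (0.73) * (1 - 0.73) is the sigmoid's simplified local gradient, which follows from the standard identity (derivation spelled out here, not shown in this text):

    \sigma(x) = \frac{1}{1 + e^{-x}}, \qquad
    \frac{d\sigma}{dx} = \frac{e^{-x}}{(1 + e^{-x})^2}
                       = \frac{1 + e^{-x} - 1}{1 + e^{-x}} \cdot \frac{1}{1 + e^{-x}}
                       = \bigl(1 - \sigma(x)\bigr)\,\sigma(x)

With sigma(1) ≈ 0.73, this gives 0.73 * 0.27 ≈ 0.2, the value on the slide.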

  26. We are ready:

  28. forward pass was:

  36. Patterns in backward flow - add gate: gradient distributor - max gate: gradient router - mul gate: gradient… "switcher"?
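
A small numeric illustration of the three patterns (the input values are made up):

    # add gate distributes: both inputs receive the upstream gradient unchanged
    # max gate routes: only the larger input gets the gradient, the other gets 0
    # mul gate "switches": each input gets the upstream gradient scaled by the OTHER input
    x, y, upstream = 3.0, -1.0, 2.0

    # add: z = x + y
    dx_add, dy_add = upstream, upstream              # (2.0, 2.0)

    # max: z = max(x, y)
    dx_max = upstream if x >= y else 0.0             # 2.0 (x was the max)
    dy_max = upstream if y > x else 0.0              # 0.0

    # mul: z = x * y
    dx_mul, dy_mul = y * upstream, x * upstream      # (-2.0, 6.0)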

  37. Gradients for vectorized code

  38. Gradients for vectorized code: X is [10 x 3], dD is [5 x 3]; dW must be [5 x 10] and dX must be [10 x 3].
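
These shapes only work out if the forward op is D = W.dot(X) with W of shape [5 x 10], which is assumed in the sketch below; the backward expressions can then be recovered just by matching shapes:

    import numpy as np

    W = np.random.randn(5, 10)
    X = np.random.randn(10, 3)
    D = W.dot(X)                     # [5 x 3]

    dD = np.random.randn(*D.shape)   # upstream gradient, same shape as D

    # the only products with the right shapes:
    dW = dD.dot(X.T)                 # [5 x 3] . [3 x 10] -> [5 x 10], same shape as W
    dX = W.T.dot(dD)                 # [10 x 5] . [5 x 3] -> [10 x 3], same shape as X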

  39. Gradients for vectorized code

  40. In summary - in practice it is rarely necessary to derive long gradients by hand on pen and paper - structure your code in stages (layers) where you can derive the local gradients, then chain the gradients during backprop - caveat: sometimes gradients simplify (e.g. for sigmoid, also softmax); group these.
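
A sketch of the "group gates whose gradients simplify" caveat: a sigmoid stage whose backward uses sigma * (1 - sigma) directly instead of backpropagating through exp, add, and divide separately (class and method names are illustrative):

    import numpy as np

    class SigmoidStage:
        def forward(self, x):
            self.out = 1.0 / (1.0 + np.exp(-x))   # cache the output
            return self.out

        def backward(self, dout):
            # grouped local gradient: d(sigma)/dx = sigma * (1 - sigma)
            return self.out * (1.0 - self.out) * dout

    stage = SigmoidStage()
    print(stage.forward(1.0))        # ~0.73
    print(stage.backward(1.0))       # ~0.73 * 0.27 ~ 0.2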

  41. NEURAL NETWORKS


  45. sigmoid activation function


  47. A single neuron can be used as a binary linear classifier. Regularization has the interpretation of "gradual forgetting".
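
One way to see the "gradual forgetting" reading of L2 regularization for a single neuron's weights (the learning rate, regularization strength, and weights below are made-up values):

    import numpy as np

    w = np.array([2.0, -3.0, -3.0])   # hypothetical neuron weights
    learning_rate, reg = 0.1, 0.1

    # the L2 penalty 0.5 * reg * sum(w**2) contributes reg * w to the gradient,
    # so every update pulls each weight a little toward zero:
    data_grad = np.zeros_like(w)      # pretend the data gradient is zero for a moment
    for _ in range(5):
        w -= learning_rate * (data_grad + reg * w)
    print(w)                          # slightly shrunk toward 0 -> "gradual forgetting"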

  48. Be very careful with your brain analogies. Biological neurons: - Many different types - Dendrites can perform complex non-linear computations - Synapses are not a single weight but a complex non-linear dynamical system - Rate code may not be adequate [Dendritic Computation. London and Häusser]

  49. Activation Functions
