Administrative
- A1 is due today (midnight). You can use up to 3 late days.
- A2 will be up this Friday; it is due the Wednesday after next (Feb 4).
- Project Proposal is due next Friday at midnight (about one paragraph, 200-400 words; send as email).

Fei-Fei Li & Andrej Karpathy, Lecture 5, 21 Jan 2015
Lecture 5: Backprop and intro to Neural Nets
Linear Classification: SVM and Softmax
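The loss formulas on this slide did not survive extraction. As a reminder, here is a minimal sketch that assumes the standard multiclass SVM (hinge) loss with margin 1 and the standard softmax cross-entropy loss, both for a single example. The function names and the example scores are illustrative, not taken from the slide.

```python
import math

def svm_loss(scores, correct, margin=1.0):
    """Multiclass hinge loss for one example (assumed standard form)."""
    return sum(max(0.0, s - scores[correct] + margin)
               for j, s in enumerate(scores) if j != correct)

def softmax_loss(scores, correct):
    """Cross-entropy loss of the softmax probabilities for one example."""
    shift = max(scores)  # subtract the max for numerical stability
    log_sum = math.log(sum(math.exp(s - shift) for s in scores))
    return -(scores[correct] - shift - log_sum)

scores = [3.2, 5.1, -1.7]  # hypothetical class scores
print(svm_loss(scores, correct=0))
print(softmax_loss(scores, correct=0))
```

Both losses take the same score vector; they differ only in how they penalize the gap between the correct class score and the others.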
Optimization Landscape
Gradient Descent
- Numerical gradient: slow :(, approximate :(, easy to write :)
- Analytic gradient: fast :), exact :), error-prone :(
In practice: derive the analytic gradient, then check your implementation with the numerical gradient.
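A gradient check along these lines can be sketched as follows. The test function `f`, the evaluation point, and the helper names are hypothetical; the centered difference is one common choice for the numerical estimate.

```python
def numerical_gradient(f, x, h=1e-5):
    """Centered-difference estimate of df/dx_i for each coordinate."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_plus[i] += h
        x_minus = list(x)
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad

def f(x):
    # hypothetical test function: f(x) = x0^2 + 3*x0*x1
    return x[0] ** 2 + 3 * x[0] * x[1]

def analytic_gradient(x):
    # derived by hand: df/dx0 = 2*x0 + 3*x1, df/dx1 = 3*x0
    return [2 * x[0] + 3 * x[1], 3 * x[0]]

point = [1.5, -2.0]
num = numerical_gradient(f, point)
ana = analytic_gradient(point)
# the relative error should be tiny when the analytic gradient is right
rel_err = max(abs(a - n) / max(1e-12, abs(a) + abs(n))
              for a, n in zip(ana, num))
print(num, ana, rel_err)
```

A large relative error usually points to a bug in the analytic derivation rather than in the numerical estimate.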
This class: Becoming a backprop ninja
Example: f(x, y) = xy with x = 4, y = -3, so f(x, y) = -12.
Partial derivatives: df/dx = y = -3, df/dy = x = 4.
Gradient: [df/dx, df/dy] = [y, x] = [-3, 4].
Question: If I increase x by h, how would the output of f change?
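One way to answer the question numerically is sketched below. The slide's numbers are consistent with f(x, y) = xy, which is assumed here; the step size `h` is an arbitrary small value.

```python
def f(x, y):
    return x * y  # assumed from f(4, -3) = -12

x, y = 4.0, -3.0
dfdx, dfdy = y, x  # partial derivatives of x*y

h = 0.001
actual_change = f(x + h, y) - f(x, y)
predicted_change = dfdx * h  # first-order estimate: h times df/dx
print(f(x, y), (dfdx, dfdy), actual_change, predicted_change)
```

Under this assumption, increasing x by a small h changes the output by about -3h: the derivative on x is the sensitivity of f to x.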
Compound expressions: Chain rule:
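The expression on these slides did not survive extraction. As an illustration of the chain rule on a compound expression, the sketch below assumes f(x, y, z) = (x + y) * z with made-up inputs; neither the expression nor the values are confirmed by the text.

```python
# Assumed compound expression: f(x, y, z) = (x + y) * z
x, y, z = -2.0, 5.0, -4.0  # illustrative inputs

# forward pass, one intermediate value at a time
q = x + y   # q = 3
f = q * z   # f = -12

# backward pass, in reverse order
dfdq = z           # d(q*z)/dq
dfdz = q           # d(q*z)/dz
dfdx = 1.0 * dfdq  # chain rule: dq/dx * df/dq
dfdy = 1.0 * dfdq  # chain rule: dq/dy * df/dq
print(f, dfdx, dfdy, dfdz)
```

Each input's gradient is the product of the local derivative of the node it feeds and the gradient already computed for that node's output.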
Another example:
Each step computes [local gradient] x [its gradient]:
- -1/(1.37^2) = -0.53
- [1] x [-0.53] = -0.53
- [e^(-1)] x [-0.53] = -0.20
- [-1] x [-0.2] = 0.2
- [1] x [0.2] = 0.2 and [1] x [0.2] = 0.2 (both inputs!)
- x0: [2] x [0.2] ~= 0.4
- w0: [-1] x [0.2] = -0.2
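The numbers above match a sigmoid of a weighted sum, 1 / (1 + e^-(w0*x0 + w1*x1 + w2)), evaluated where the sum equals 1. A stage-by-stage sketch follows. The values w0 = 2 and x0 = -1 can be read off the local gradients on the slide; w1, x1, and w2 are assumed values chosen so that the sum is 1.

```python
import math

# w0 and x0 follow from the slide; w1, x1, w2 are assumptions
w0, x0 = 2.0, -1.0
w1, x1 = -3.0, -2.0
w2 = -3.0

# forward pass
dot = w0 * x0 + w1 * x1 + w2   # 1.0
neg = -dot                     # -1.0
ex = math.exp(neg)             # ~0.37
den = 1.0 + ex                 # ~1.37
f = 1.0 / den                  # ~0.73

# backward pass: [local gradient] x [gradient from above]
dden = (-1.0 / den ** 2) * 1.0  # ~-0.53
dex = 1.0 * dden                # ~-0.53
dneg = math.exp(neg) * dex      # ~-0.20
ddot = -1.0 * dneg              # ~0.20
dw0 = x0 * ddot                 # ~-0.20
dx0 = w0 * ddot                 # ~0.39 (the slide rounds 2 x 0.2 to 0.4)
dw1 = x1 * ddot
dx1 = w1 * ddot
dw2 = 1.0 * ddot
print(f, dw0, dx0, dw1, dx1, dw2)
```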
A gate hanging out
Every gate during backprop computes, for all its inputs: [LOCAL GRADIENT] x [GATE GRADIENT]
- LOCAL GRADIENT: can be computed right away, even during the forward pass
- GATE GRADIENT: the gate receives this during backpropagation
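In code, a gate of this kind might look like the sketch below. The `MultiplyGate` class and its `forward`/`backward` method names are hypothetical; they only illustrate caching what the local gradient needs during the forward pass and multiplying by the incoming gradient during the backward pass.

```python
class MultiplyGate:
    """Hypothetical gate object with a forward/backward interface."""

    def forward(self, x, y):
        # cache the inputs: they are the local gradients needed later
        self.x, self.y = x, y
        return x * y

    def backward(self, dout):
        # [local gradient] x [gradient flowing into the gate's output]
        dx = self.y * dout
        dy = self.x * dout
        return dx, dy

gate = MultiplyGate()
out = gate.forward(4.0, -3.0)
dx, dy = gate.backward(1.0)
print(out, dx, dy)
```

The gate never needs to know what the rest of the circuit computes; it only needs its own inputs and the gradient on its output.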
sigmoid function
(0.73) * (1 - 0.73) = 0.2
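The shortcut (0.73) * (1 - 0.73) = 0.2 relies on the derivative of the sigmoid simplifying to an expression in its own output. The derivation is not in the extracted text; it is the standard one, written out here:

```latex
\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad
\frac{d\sigma(x)}{dx}
  = \frac{e^{-x}}{(1 + e^{-x})^{2}}
  = \left(\frac{1 + e^{-x} - 1}{1 + e^{-x}}\right)
    \left(\frac{1}{1 + e^{-x}}\right)
  = \bigl(1 - \sigma(x)\bigr)\,\sigma(x)
```

With a forward output of 0.73, the whole chain of gates inside the sigmoid collapses into a single local gradient of about 0.2.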
We are ready:
forward pass was:
Patterns in backward flow
- add gate: gradient distributor
- max gate: gradient router
- mul gate: gradient… “switcher”?
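The three patterns can be sketched as small backward functions. The function names and the example values are illustrative; the tie-breaking rule for the max gate (first input wins) is an assumption.

```python
def add_backward(dout):
    # add gate: passes the same gradient to both inputs
    return dout, dout

def max_backward(x, y, dout):
    # max gate: routes the whole gradient to the larger input
    return (dout, 0.0) if x >= y else (0.0, dout)

def mul_backward(x, y, dout):
    # mul gate: each input gets the other input's value times dout
    return y * dout, x * dout

dout = 2.0  # hypothetical upstream gradient
print(add_backward(dout))
print(max_backward(3.0, -1.0, dout))
print(mul_backward(3.0, -1.0, dout))
```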
Gradients for vectorized code
X is [10 x 3], dD is [5 x 3]
dW must be [5 x 10]
dX must be [10 x 3]
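The shapes above are consistent with a matrix multiply D = W.dot(X) where W is [5 x 10]; that forward expression is assumed here, since the slide's code was not extracted. A minimal NumPy sketch with random data shows how matching dimensions pins down the backward expressions:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((5, 10))
X = rng.standard_normal((10, 3))

# forward pass (assumed form)
D = W.dot(X)                       # [5 x 3]

# backward pass, given some upstream gradient on D
dD = rng.standard_normal(D.shape)  # [5 x 3], same shape as D
dW = dD.dot(X.T)                   # [5 x 3] . [3 x 10] -> [5 x 10]
dX = W.T.dot(dD)                   # [10 x 5] . [5 x 3] -> [10 x 3]

# a gradient always has the shape of the variable it belongs to
assert dW.shape == W.shape and dX.shape == X.shape
```

When in doubt about which operand to transpose, the dimension check usually leaves only one way to multiply the matrices so that the result has the right shape.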
In summary
- In practice it is rarely necessary to derive long gradients of variables with pen and paper.
- Structure your code in stages (layers) where you can derive the local gradients, then chain the gradients during backprop.
- Caveat: sometimes gradients simplify (e.g. for sigmoid, also softmax). Group these.
NEURAL NETWORKS
sigmoid activation function
A single neuron can be used as a binary linear classifier
Regularization has the interpretation of “gradual forgetting”
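Both statements can be illustrated with a small sketch. The `Neuron` class, its method names, and all numeric values are hypothetical; the `decay` method applies only the L2 regularization part of a gradient step, to show the weights shrinking toward zero when no data gradient pushes back.

```python
import math

class Neuron:
    """Hypothetical single neuron with a sigmoid activation."""

    def __init__(self, weights, bias):
        self.weights = list(weights)
        self.bias = bias

    def forward(self, inputs):
        # weighted sum of the inputs followed by the sigmoid
        s = sum(w * x for w, x in zip(self.weights, inputs)) + self.bias
        return 1.0 / (1.0 + math.exp(-s))

    def predict(self, inputs):
        # sigmoid(s) > 0.5 exactly when s > 0, so the boundary is linear
        return 1 if self.forward(inputs) > 0.5 else 0

    def decay(self, lr, reg):
        # L2 regularization step alone: every weight shrinks toward zero
        self.weights = [w - lr * reg * w for w in self.weights]

neuron = Neuron(weights=[2.0, -3.0], bias=-3.0)
print(neuron.forward([-1.0, -2.0]), neuron.predict([-1.0, -2.0]))
neuron.decay(lr=0.1, reg=0.5)
print(neuron.weights)
```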
Be very careful with your brain analogies.
Biological neurons:
- Many different types
- Dendrites can perform complex non-linear computations
- Synapses are not a single weight but a complex non-linear dynamical system
- Rate code may not be adequate
[Dendritic Computation. London and Hausser]
Activation Functions
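The figures for this slide did not survive extraction, so the specific functions it listed are unknown. The sigmoid appears earlier in the lecture; tanh and ReLU are included below only as an assumption, being other widely used choices.

```python
import math

def sigmoid(x):
    # squashes any real input into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    # squashes any real input into (-1, 1), zero-centered
    return math.tanh(x)

def relu(x):
    # passes positive inputs through, clamps negatives to zero
    return max(0.0, x)

for x in (-2.0, 0.0, 2.0):
    print(x, sigmoid(x), tanh(x), relu(x))
```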