  1. Administrative - A1 is due today (midnight). You can use up to 3 late days. - A2 will be up this Friday; it's due the Wednesday after next (Feb 4). - Project Proposal is due next Friday at midnight (~one paragraph, 200-400 words, sent as email).

  2. Lecture 5: Backprop and intro to Neural Nets (Fei-Fei Li & Andrej Karpathy, 21 Jan 2015)

  3. Linear Classification: SVM and Softmax (loss formulas shown on the slide)

  4. Optimization Landscape

  5. Gradient Descent - Numerical gradient: slow :(, approximate :(, easy to write :) - Analytic gradient: fast :), exact :), error-prone :( - In practice: derive the analytic gradient, then check your implementation against the numerical gradient.
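
In practice this check is a short numpy routine; a minimal gradient-check sketch under assumed names (the toy function, shapes, and tolerance below are illustrative, not from the slides):

    import numpy as np

    def eval_numerical_gradient(f, x, h=1e-5):
        # Centered finite-difference estimate of the gradient of f at x.
        grad = np.zeros_like(x)
        it = np.nditer(x, flags=['multi_index'])
        while not it.finished:
            ix = it.multi_index
            old = x[ix]
            x[ix] = old + h
            fxph = f(x)                    # f(x + h)
            x[ix] = old - h
            fxmh = f(x)                    # f(x - h)
            x[ix] = old                    # restore the original value
            grad[ix] = (fxph - fxmh) / (2 * h)
            it.iternext()
        return grad

    # Example: compare an analytic gradient against the numerical one.
    f = lambda x: np.sum(x ** 2)           # toy function (assumption)
    x = np.random.randn(3, 4)
    num_grad = eval_numerical_gradient(f, x)
    ana_grad = 2 * x                        # analytic gradient of sum(x^2)
    rel_err = np.abs(num_grad - ana_grad) / np.maximum(1e-8, np.abs(num_grad) + np.abs(ana_grad))
    print(rel_err.max())                    # should be very small (e.g. < 1e-6)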

  6. This class: Becoming a backprop ninja


  8. Example: x = 4, y = -3 => f(x,y) = -12. Partial derivatives, gradient.

  9. Example: x = 4, y = -3 => f(x,y) = -12. Partial derivatives, gradient. Question: If I increase x by h, how would the output of f change?
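
Assuming f(x, y) = x * y (consistent with f(4, -3) = -12), a quick numeric answer to the question:

    # Numeric check; f(x, y) = x * y is an assumption consistent with f(4, -3) = -12.
    f = lambda x, y: x * y
    x, y, h = 4.0, -3.0, 1e-4
    print((f(x + h, y) - f(x, y)) / h)   # ~ -3.0 = df/dx = y, so f changes by about -3h
    print((f(x, y + h) - f(x, y)) / h)   # ~  4.0 = df/dy = x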

  10. Compound expressions:

  11. Compound expressions: Chain rule:
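
The compound expression itself is only shown as an image, so as a stand-in here is the chain rule applied to an assumed example f(x, y, z) = (x + y) * z:

    # Minimal chain-rule sketch; f(x, y, z) = (x + y) * z is an assumed stand-in
    # for the compound expression on the slides.
    x, y, z = -2.0, 5.0, -4.0

    # forward pass, one intermediate at a time
    q = x + y          # q = 3.0
    f = q * z          # f = -12.0

    # backward pass: chain the local gradients from the output back to the inputs
    df_dq = z          # d(q*z)/dq
    df_dz = q          # d(q*z)/dz
    df_dx = df_dq * 1  # dq/dx = 1, chained with df/dq
    df_dy = df_dq * 1  # dq/dy = 1
    print(df_dx, df_dy, df_dz)   # -4.0 -4.0 3.0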

  14. Another example:

  15. Another example: -1/(1.37^2) = -0.53

  16. Another example: [local gradient] x [its gradient]: [1] x [-0.53] = -0.53

  17. Another example: [local gradient] x [its gradient]: [e^(-1)] x [-0.53] = -0.20

  18. Another example: [local gradient] x [its gradient]: [-1] x [-0.20] = 0.20

  19. Another example: [local gradient] x [its gradient]: [1] x [0.2] = 0.2 and [1] x [0.2] = 0.2 (both inputs!)

  20. Another example: [local gradient] x [its gradient]: x0: [2] x [0.2] = 0.4, w0: [-1] x [0.2] = -0.2
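
The numbers above are consistent with a two-input sigmoid neuron f(w, x) = 1 / (1 + exp(-(w0*x0 + w1*x1 + w2))); a gate-by-gate sketch under that assumption (the input values are inferred to match the intermediate results, not quoted from the slides):

    import math

    # assumed inputs, chosen to reproduce the intermediate values above
    w0, x0 = 2.0, -1.0
    w1, x1 = -3.0, -2.0
    w2 = -3.0

    # forward pass, one gate at a time
    dot = w0 * x0 + w1 * x1 + w2   # = 1.0
    neg = -dot                     # = -1.0
    e = math.exp(neg)              # ~ 0.37
    den = 1.0 + e                  # ~ 1.37
    f = 1.0 / den                  # ~ 0.73

    # backward pass: [local gradient] x [upstream gradient] at every gate
    dden = -1.0 / den ** 2             # 1/x gate:   ~ -0.53
    de = 1.0 * dden                    # +1 gate:    ~ -0.53
    dneg = e * de                      # exp gate:   ~ -0.20
    ddot = -1.0 * dneg                 # *(-1) gate: ~  0.20
    dw0, dx0 = x0 * ddot, w0 * ddot    # ~ -0.2 and 0.4
    dw1, dx1 = x1 * ddot, w1 * ddot    # ~ -0.4 and -0.6
    dw2 = 1.0 * ddot                   # ~  0.2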

  21. A gate "hanging out": during backprop every gate computes, for each of its inputs, [LOCAL GRADIENT] x [GATE GRADIENT]. The local gradient can be computed right away, even during the forward pass; the gate gradient is what the gate receives during backpropagation.
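
A minimal sketch of that gate contract in code (the class and method names are illustrative, not from the slides):

    class MultiplyGate:
        # Toy gate: z = x * y. Caches its inputs in forward so that backward can
        # form [local gradient] x [upstream gradient] for each input.
        def forward(self, x, y):
            self.x, self.y = x, y       # local gradients depend only on these
            return x * y

        def backward(self, dz):
            dx = self.y * dz            # dz/dx = y, chained with the upstream dz
            dy = self.x * dz            # dz/dy = x
            return dx, dy

    gate = MultiplyGate()
    out = gate.forward(4.0, -3.0)       # -12.0
    print(gate.backward(1.0))           # (-3.0, 4.0)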

  22. sigmoid function

  23. sigmoid function: (0.73) * (1 - 0.73) ≈ 0.2
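
The product (0.73) * (1 - 0.73) is the sigmoid's simplified local gradient, which follows from the standard identity (derivation spelled out here, not shown in this text):

    \sigma(x) = \frac{1}{1 + e^{-x}}, \qquad
    \frac{d\sigma}{dx} = \frac{e^{-x}}{(1 + e^{-x})^2}
                       = \frac{1 + e^{-x} - 1}{1 + e^{-x}} \cdot \frac{1}{1 + e^{-x}}
                       = \bigl(1 - \sigma(x)\bigr)\,\sigma(x)

With sigma(1) ≈ 0.73, this gives 0.73 * 0.27 ≈ 0.2, the value on the slide.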

  26. We are ready:

  28. forward pass was:

  36. Patterns in backward flow - add gate: gradient distributor - max gate: gradient router - mul gate: gradient… "switcher"?
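
A small numeric illustration of the three patterns (the input values are made up):

    # add gate distributes: both inputs receive the upstream gradient unchanged
    # max gate routes: only the larger input gets the gradient, the other gets 0
    # mul gate "switches": each input gets the upstream gradient scaled by the OTHER input
    x, y, upstream = 3.0, -1.0, 2.0

    # add: z = x + y
    dx_add, dy_add = upstream, upstream              # (2.0, 2.0)

    # max: z = max(x, y)
    dx_max = upstream if x >= y else 0.0             # 2.0 (x was the max)
    dy_max = upstream if y > x else 0.0              # 0.0

    # mul: z = x * y
    dx_mul, dy_mul = y * upstream, x * upstream      # (-2.0, 6.0)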

  37. Gradients for vectorized code

  38. Gradients for vectorized code: X is [10 x 3], dD is [5 x 3]; dW must be [5 x 10] and dX must be [10 x 3].
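
These shapes only work out if the forward op is D = W.dot(X) with W of shape [5 x 10], which is assumed in the sketch below; the backward expressions can then be recovered just by matching shapes:

    import numpy as np

    W = np.random.randn(5, 10)
    X = np.random.randn(10, 3)
    D = W.dot(X)                     # [5 x 3]

    dD = np.random.randn(*D.shape)   # upstream gradient, same shape as D

    # the only products with the right shapes:
    dW = dD.dot(X.T)                 # [5 x 3] . [3 x 10] -> [5 x 10], same shape as W
    dX = W.T.dot(dD)                 # [10 x 5] . [5 x 3] -> [10 x 3], same shape as X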

  39. Gradients for vectorized code

  40. In summary - in practice it is rarely necessary to derive long gradients by hand on pen and paper - structure your code in stages (layers) where you can derive the local gradients, then chain the gradients during backprop - caveat: sometimes gradients simplify (e.g. for sigmoid, also softmax); group these.
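
A sketch of the "group gates whose gradients simplify" caveat: a sigmoid stage whose backward uses sigma * (1 - sigma) directly instead of backpropagating through exp, add, and divide separately (class and method names are illustrative):

    import numpy as np

    class SigmoidStage:
        def forward(self, x):
            self.out = 1.0 / (1.0 + np.exp(-x))   # cache the output
            return self.out

        def backward(self, dout):
            # grouped local gradient: d(sigma)/dx = sigma * (1 - sigma)
            return self.out * (1.0 - self.out) * dout

    stage = SigmoidStage()
    print(stage.forward(1.0))        # ~0.73
    print(stage.backward(1.0))       # ~0.73 * 0.27 ~ 0.2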

  41. NEURAL NETWORKS


  45. sigmoid activation function


  47. A single neuron can be used as a binary linear classifier. Regularization has the interpretation of "gradual forgetting".
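
One way to see the "gradual forgetting" reading of L2 regularization for a single neuron's weights (the learning rate, regularization strength, and weights below are made-up values):

    import numpy as np

    w = np.array([2.0, -3.0, -3.0])   # hypothetical neuron weights
    learning_rate, reg = 0.1, 0.1

    # the L2 penalty 0.5 * reg * sum(w**2) contributes reg * w to the gradient,
    # so every update pulls each weight a little toward zero:
    data_grad = np.zeros_like(w)      # pretend the data gradient is zero for a moment
    for _ in range(5):
        w -= learning_rate * (data_grad + reg * w)
    print(w)                          # slightly shrunk toward 0 -> "gradual forgetting"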

  48. Be very careful with your brain analogies. Biological neurons: - Many different types - Dendrites can perform complex non-linear computations - Synapses are not a single weight but a complex non-linear dynamical system - Rate code may not be adequate [Dendritic Computation. London and Häusser]

  49. Activation Functions
