Backpropagation TA: Yi Wen April 17, 2020 CS231n Discussion - PowerPoint PPT Presentation

Backpropagation. TA: Yi Wen. April 17, 2020. CS231n Discussion Section. Slide credits: Barak Oshri, Vincent Chen, Nish Khandwala, Yi Wen

Agenda ● Motivation ● Backprop Tips & Tricks ● Matrix calculus primer

Motivation. Recall: the optimization objective is to minimize the loss. Goal: how should we tweak the parameters to decrease the loss?

A Simple Example. Goal: tweak the parameters to minimize the loss => minimize a multivariable function in parameter space.

A Simple Example => minimize a multivariable function (figure: the function plotted on WolframAlpha)

Approach #1: Random Search. Intuition: try random steps in the domain of the function (the parameter space) and keep a step only if it decreases the loss.
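A minimal sketch of random local search, assuming a hypothetical two-parameter loss f(x, y) = (x - 1)^2 + (y + 2)^2 in place of the slide's function. The function names, step size, and iteration count are illustrative choices, not taken from the slides.

```python
import random

def f(params):
    # Hypothetical loss: a bowl with its minimum at (1, -2).
    x, y = params
    return (x - 1) ** 2 + (y + 2) ** 2

def random_local_search(loss, start, step_size=0.1, iters=2000, seed=0):
    """Propose a random step; keep it only if the loss goes down."""
    rng = random.Random(seed)
    best = list(start)
    best_loss = loss(best)
    for _ in range(iters):
        # Random direction in parameter space, scaled by step_size.
        candidate = [p + step_size * rng.gauss(0.0, 1.0) for p in best]
        cand_loss = loss(candidate)
        if cand_loss < best_loss:
            best, best_loss = candidate, cand_loss
    return best, best_loss

params, final_loss = random_local_search(f, start=[0.0, 0.0])
print(params, final_loss)
```

Random search needs no gradient at all, which is why it is a useful baseline, but it wastes most function evaluations on rejected steps.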

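Approach #2: Numerical Gradient. Intuition: the rate of change of a function with respect to a variable, measured over a small region around the current point. Finite differences: approximate this rate by evaluating the function at nearby points.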
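A minimal sketch of the numerical gradient, assuming the centered finite difference (f(x + h) - f(x - h)) / (2h) with a hypothetical step h = 1e-5, applied to a hypothetical function f(x, y) = x^2 y + 3y whose exact gradient at (2, 1) is (4, 7):

```python
def numerical_gradient(f, params, h=1e-5):
    """Approximate each partial derivative with a centered finite difference."""
    grad = []
    for i in range(len(params)):
        plus = list(params)
        minus = list(params)
        plus[i] += h   # nudge only the i-th variable up
        minus[i] -= h  # ... and down
        grad.append((f(plus) - f(minus)) / (2 * h))
    return grad

def f(params):
    # Hypothetical example function: f(x, y) = x^2 * y + 3y
    x, y = params
    return x ** 2 * y + 3 * y

g = numerical_gradient(f, [2.0, 1.0])
print(g)  # approximately [4.0, 7.0]
```

Each partial costs two extra function evaluations, so this approach scales poorly with the number of parameters.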

Approach #3: Analytical Gradient. Recall: the partial derivative by its limit definition.
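For a function of two variables f(x, y), the standard limit definition of the partial derivative with respect to x reads:

```latex
\frac{\partial f}{\partial x}(x, y) = \lim_{h \to 0} \frac{f(x + h, y) - f(x, y)}{h}
```

Holding y fixed and differentiating symbolically gives an exact expression, rather than the approximation obtained by plugging in a small h.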

Approach #3: Analytical Gradient. Recall: the chain rule. Intuition: upstream gradient values propagate backwards, so we can reuse them!
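A small sketch of the chain rule on a hypothetical compound function f(x, y, z) = (x + y) z, with illustrative inputs x = -2, y = 5, z = -4. The upstream value dq is computed once and reused for both x and y:

```python
def forward_backward(x, y, z):
    # Forward pass: break f = (x + y) * z into intermediate variables.
    q = x + y
    f = q * z
    # Backward pass: start from df/df = 1 and apply the chain rule.
    df = 1.0
    dq = z * df        # local gradient of f = q * z w.r.t. q is z
    dz = q * df        # ... and w.r.t. z is q
    # q feeds both x and y; the upstream value dq is reused for each.
    dx = 1.0 * dq      # local gradient of q = x + y w.r.t. x is 1
    dy = 1.0 * dq      # ... and w.r.t. y is 1
    return f, (dx, dy, dz)

f, grads = forward_backward(-2.0, 5.0, -4.0)
print(f, grads)  # -12.0 (-4.0, -4.0, 3.0)
```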

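Gradient: “direction and rate of fastest increase”. Numerical gradient vs. analytical gradient: the numerical gradient is approximate and slow to compute, while the analytical gradient is exact and fast but error-prone to derive.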
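A sketch of a gradient check that uses the numerical gradient to verify an analytical one. It assumes the common relative-error measure |a - n| / max(|a|, |n|) between an analytical value a and a numerical value n; the test function f(x, y) = x y^2 and the deliberately buggy gradient are hypothetical:

```python
def relative_error(a, n, tiny=1e-12):
    # |a - n| / max(|a|, |n|), guarded against division by zero.
    return abs(a - n) / max(abs(a), abs(n), tiny)

def grad_check(f, analytic_grad, params, h=1e-5):
    """Return the worst relative error between analytic and numerical partials."""
    worst = 0.0
    for i, a in enumerate(analytic_grad(params)):
        plus, minus = list(params), list(params)
        plus[i] += h
        minus[i] -= h
        n = (f(plus) - f(minus)) / (2 * h)   # centered finite difference
        worst = max(worst, relative_error(a, n))
    return worst

def f(p):
    # Hypothetical test function f(x, y) = x * y^2.
    return p[0] * p[1] ** 2

def good_grad(p):
    return [p[1] ** 2, 2 * p[0] * p[1]]

def buggy_grad(p):
    return [p[1] ** 2, p[0] * p[1]]   # forgot the factor of 2 in df/dy

print(grad_check(f, good_grad, [1.5, -2.0]))   # close to 0
print(grad_check(f, buggy_grad, [1.5, -2.0]))  # about 0.5
```

The design choice here is to trust the slow numerical gradient only as a reference, while the fast analytical gradient is what training actually uses.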

What about Autograd? ● Deep learning frameworks can perform backprop automatically! ● But problems related to the underlying gradients can surface when you debug your models. See “Yes You Should Understand Backprop”: https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b

Problem Statement: Backpropagation. Given a function f of inputs x, labels y, and parameters θ, compute the gradient of the loss with respect to θ.

Problem Statement: Backpropagation. An algorithm for computing the gradient of a compound function as a series of local, intermediate gradients:
1. Identify intermediate functions (forward prop).
2. Compute local gradients (chain rule).
3. Combine with the upstream error signal to get the full gradient.
(Diagram: a module takes input x and parameters W, b and produces output y via local(x, W, b) => y; on the backward pass it receives dy and returns dx, dW, db <= grad_local(dy, x, W, b).)
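A minimal sketch of one such module, assuming the local function is a linear layer y = Wx + b on a single feature vector and using plain Python lists. Only the names local and grad_local come from the slide diagram; the bodies are an illustration:

```python
def local(x, W, b):
    """Forward: y = W x + b for one feature vector x."""
    return [sum(W[i][j] * x[j] for j in range(len(x))) + b[i] for i in range(len(W))]

def grad_local(dy, x, W, b):
    """Backward: turn the upstream signal dy = dL/dy into dL/dx, dL/dW, dL/db.

    b is accepted only to mirror the slide's signature; the gradients do not use it.
    """
    # dx = W^T dy, same shape as x
    dx = [sum(W[i][j] * dy[i] for i in range(len(W))) for j in range(len(x))]
    # dW = dy x^T (outer product), same shape as W
    dW = [[dy[i] * x[j] for j in range(len(x))] for i in range(len(W))]
    # db = dy, same shape as b
    db = list(dy)
    return dx, dW, db

W = [[1.0, 2.0, 0.0], [0.0, -1.0, 3.0]]   # shape (2, 3)
b = [0.5, -0.5]
x = [1.0, 1.0, 2.0]
y = local(x, W, b)                         # [3.5, 4.5]
dx, dW, db = grad_local([1.0, 2.0], x, W, b)
print(y, dx, dW, db)
```

Because each module only needs its own inputs and the upstream dy, modules can be chained in any order the variable graph dictates.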

Modularity: Previous Example. Rewrite the compound function in terms of intermediate variables (forward propagation).

Modularity: 2-Layer Neural Network. Rewrite the compound function in terms of intermediate variables (forward propagation); the loss is the squared Euclidean distance between the network output and the label y.
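A sketch of the forward and backward passes for a 2-layer network, assuming a sigmoid hidden activation (the slide's exact layer definitions are not reproduced here): h1 = sigmoid(W1 x + b1), y_hat = W2 h1 + b2, J = ||y_hat - y||^2. All names and sizes are hypothetical; one entry of dW1 is spot-checked against a centered finite difference:

```python
import math
import random

def matvec(W, v):
    # Matrix-vector product for lists of lists.
    return [sum(w * vj for w, vj in zip(row, v)) for row in W]

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(x, y, W1, b1, W2, b2):
    # Intermediate variables (forward propagation).
    z1 = [a + b for a, b in zip(matvec(W1, x), b1)]
    h1 = [sigmoid(a) for a in z1]
    y_hat = [a + b for a, b in zip(matvec(W2, h1), b2)]
    # Squared Euclidean distance between the output and the label.
    loss = sum((p - t) ** 2 for p, t in zip(y_hat, y))
    return loss, (h1, y_hat)

def backward(x, y, W2, cache):
    h1, y_hat = cache
    dy_hat = [2.0 * (p - t) for p, t in zip(y_hat, y)]           # dJ/dy_hat
    dW2 = [[d * hj for hj in h1] for d in dy_hat]                 # dy_hat h1^T
    db2 = list(dy_hat)
    dh1 = [sum(W2[i][j] * dy_hat[i] for i in range(len(W2)))      # W2^T dy_hat
           for j in range(len(h1))]
    dz1 = [d * hj * (1.0 - hj) for d, hj in zip(dh1, h1)]         # sigmoid' = h1 (1 - h1)
    dW1 = [[d * xj for xj in x] for d in dz1]                     # dz1 x^T
    db1 = list(dz1)
    return dW1, db1, dW2, db2

rng = random.Random(0)
x, y = [0.5, -1.0, 2.0], [1.0, 0.0]
W1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]   # 4 hidden units
b1 = [0.0] * 4
W2 = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]   # 2 outputs
b2 = [0.0] * 2

loss, cache = forward(x, y, W1, b1, W2, b2)
dW1, db1, dW2, db2 = backward(x, y, W2, cache)

# Spot-check dJ/dW1[1][2] with a centered finite difference.
eps = 1e-5
W1_plus = [row[:] for row in W1]
W1_plus[1][2] += eps
W1_minus = [row[:] for row in W1]
W1_minus[1][2] -= eps
numeric = (forward(x, y, W1_plus, b1, W2, b2)[0]
           - forward(x, y, W1_minus, b1, W2, b2)[0]) / (2 * eps)
print(dW1[1][2], numeric)
```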

Intermediate Variables (forward propagation): f(x; W, b) = Wx + b, as in the lecture notes, takes one feature vector as input; the same layer can instead take a batch of data as input, stacked into a matrix.

1. Intermediate functions: intermediate variables (forward propagation) 2. Local gradients: intermediate gradients (backward propagation) 3. Full gradients: ?

Derivative w.r.t. Vector Scalar-by-Vector Vector-by-Vector

Derivative w.r.t. Vector: Chain Rule. 1. Intermediate functions 2. Local gradients 3. Full gradients ?
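A sketch of the vector chain rule with an explicit Jacobian, assuming a hypothetical y = Wx followed by the scalar loss L = sum of y_i^2. The scalar-by-vector gradient dL/dy has the shape of y, the vector-by-vector Jacobian dy/dx is W, and the full gradient combines the two:

```python
def jacobian_linear(W):
    # For y = W x, the Jacobian dy/dx has entries J[i][j] = dy_i/dx_j = W[i][j],
    # so J = W with shape (len(y), len(x)).
    return [row[:] for row in W]

def grad_loss_wrt_y(y):
    # Scalar-by-vector: for L = sum(y_i^2), dL/dy_i = 2 y_i (same shape as y).
    return [2.0 * yi for yi in y]

def chain_rule(J, upstream):
    # dL/dx_j = sum_i (dL/dy_i) * (dy_i/dx_j), i.e. J^T times the upstream gradient.
    n_out, n_in = len(J), len(J[0])
    return [sum(upstream[i] * J[i][j] for i in range(n_out)) for j in range(n_in)]

W = [[1.0, 2.0], [3.0, 4.0], [0.0, -1.0]]
x = [1.0, -1.0]
y = [sum(w * v for w, v in zip(row, x)) for row in W]   # [-1.0, -1.0, 1.0]
dx = chain_rule(jacobian_linear(W), grad_loss_wrt_y(y))
print(dx)  # [-8.0, -14.0]
```

In practice the Jacobian is rarely built explicitly; for a linear map the product collapses to W^T (dL/dy), which is cheaper to compute.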

Derivative w.r.t. Vector: Takeaway

Derivative w.r.t. Matrix Scalar-by-Matrix Vector-by-Matrix ?

Derivative w.r.t. Matrix: Dimension Balancing. When you take a scalar-by-matrix gradient, the gradient has the shape of the denominator (the matrix). ● Dimension balancing is a “cheap” but efficient approach to gradient calculations in most practical settings.
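A worked instance of dimension balancing, assuming a linear layer y = Wx + b with W of shape m × n and an upstream signal δ = ∂L/∂y of shape m × 1 (the symbols m, n, and δ are introduced here for illustration). Each gradient must have the shape of its denominator, and only one product of the available factors has that shape:

```latex
\begin{aligned}
\underbrace{\frac{\partial L}{\partial W}}_{m \times n} &= \underbrace{\delta}_{m \times 1}\,\underbrace{x^{\top}}_{1 \times n} \\
\underbrace{\frac{\partial L}{\partial x}}_{n \times 1} &= \underbrace{W^{\top}}_{n \times m}\,\underbrace{\delta}_{m \times 1} \\
\underbrace{\frac{\partial L}{\partial b}}_{m \times 1} &= \delta
\end{aligned}
```

Dimension balancing is a shape heuristic rather than a proof, so a numerical gradient check remains a useful safeguard.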

Derivative w.r.t. Matrix: Takeaway

1. Intermediate functions: intermediate variables (forward propagation) 2. Local gradients: intermediate gradients (backward propagation) 3. Full gradients

Backprop Menu for Success
1. Write down the variable graph.
2. Keep track of error signals.
3. Compute the derivative of the loss function.
4. Enforce the shape rule on error signals, especially when deriving over a linear transformation.

Vector-by-vector ?

Matrix multiplication [Backprop] ? ?
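A sketch of backprop through a matrix multiplication Y = XW, assuming X has shape N × D and W has shape D × M. The shape rule gives dX = dY W^T and dW = X^T dY. The hypothetical loss L = sum(G * Y) makes dL/dY = G, so the result can be checked with a finite difference:

```python
def matmul(A, B):
    # (n x k) @ (k x m) for lists of lists.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul_backward(dY, X, W):
    """Backprop through Y = X W, given the upstream gradient dY = dL/dY."""
    dX = matmul(dY, transpose(W))   # (N, M) @ (M, D) -> (N, D): shape of X
    dW = matmul(transpose(X), dY)   # (D, N) @ (N, M) -> (D, M): shape of W
    return dX, dW

def loss(X, W, G):
    # Hypothetical scalar loss L = sum(G * Y), chosen so that dL/dY = G.
    Y = matmul(X, W)
    return sum(g * y for g_row, y_row in zip(G, Y) for g, y in zip(g_row, y_row))

X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]                   # N = 3, D = 2
W = [[1.0, 0.0, -1.0], [2.0, 1.0, 0.0]]                    # D = 2, M = 3
G = [[1.0, -1.0, 0.5], [0.0, 2.0, 1.0], [-1.0, 0.0, 1.0]]  # upstream dL/dY

dX, dW = matmul_backward(G, X, W)

# Finite-difference check of one entry of dW.
eps = 1e-5
W_plus = [row[:] for row in W]
W_plus[1][2] += eps
W_minus = [row[:] for row in W]
W_minus[1][2] -= eps
numeric = (loss(X, W_plus, G) - loss(X, W_minus, G)) / (2 * eps)
print(dW[1][2], numeric)  # both about 11.0
```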

Elementwise function [Backprop] ?
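A sketch of backprop through elementwise functions, using ReLU and sigmoid as illustrative choices. Because y_i depends only on x_i, the Jacobian is diagonal and the backward pass reduces to an elementwise product dx_i = dy_i f'(x_i). The ReLU sketch assumes the convention f'(0) = 0:

```python
import math

def relu_backward(dy, x):
    # y_i = max(0, x_i); assumes the convention f'(0) = 0.
    return [d if xi > 0 else 0.0 for d, xi in zip(dy, x)]

def sigmoid_backward(dy, x):
    # sigma'(x) = sigma(x) * (1 - sigma(x)), applied elementwise.
    grads = []
    for d, xi in zip(dy, x):
        s = 1.0 / (1.0 + math.exp(-xi))
        grads.append(d * s * (1.0 - s))
    return grads

x = [-1.0, 0.0, 2.0]
dy = [3.0, 3.0, 3.0]            # upstream gradient dL/dy
print(relu_backward(dy, x))     # [0.0, 0.0, 3.0]
print(sigmoid_backward(dy, x))  # middle entry is 3 * 0.25 = 0.75
```

Storing the diagonal as a vector, instead of a full Jacobian matrix, keeps the backward pass linear in the input size.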