Backpropagation and Gradients
Agenda
● Motivation
● Matrix calculus primer
● Backprop Tips & Tricks
● Example: 2-layer Neural Network
Motivation
Recall: the optimization objective is to minimize the loss.
Goal: how should we tweak the parameters to decrease the loss slightly?
(Loss surface plotted on WolframAlpha.)
Approach #1: Random search
Intuition: the way we tweak the parameters is the direction we step in during optimization. What if we simply chose a random direction?
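A minimal sketch of this idea, assuming a generic scalar-valued loss function and a NumPy weight matrix W (both names are illustrative, not from the slides): propose random perturbations and keep only the ones that lower the loss.

import numpy as np

def random_search(loss, W, num_tries=1000, step_size=1e-3):
    # loss: callable mapping a weight matrix to a scalar (assumed interface)
    best_loss = loss(W)
    for _ in range(num_tries):
        # step in a randomly chosen direction in parameter space
        W_try = W + step_size * np.random.randn(*W.shape)
        loss_try = loss(W_try)
        if loss_try < best_loss:   # keep the step only if it helped
            W, best_loss = W_try, loss_try
    return W, best_loss

This works, but it ignores all structure of the problem, which is why the next two approaches are needed.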
Approach #2: Numerical gradient
Intuition: the gradient describes the rate of change of a function with respect to a variable within an infinitesimally small surrounding region.
Finite differences: perturb one input at a time and measure the change in the output.
Challenge: how do we compute the gradient without a separate function evaluation for each input dimension?
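A sketch of the centered finite-difference estimate, (f(x + h·e_i) − f(x − h·e_i)) / 2h, assuming a scalar-valued function f and a NumPy parameter array x (names are illustrative). Every coordinate costs two extra evaluations of f, which is what makes this approach impractical for models with many parameters.

import numpy as np

def numerical_gradient(f, x, h=1e-5):
    # f: callable returning a scalar; x: float NumPy array of inputs/parameters
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        old_value = x[idx]
        x[idx] = old_value + h
        f_plus = f(x)              # f evaluated at x + h along this coordinate
        x[idx] = old_value - h
        f_minus = f(x)             # f evaluated at x - h along this coordinate
        x[idx] = old_value         # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * h)
        it.iternext()
    return grad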
Approach #3: Analytical gradient
Recall: the chain rule. Assuming we know the structure of the computational graph beforehand…
Intuition: upstream gradient values propagate backwards, and we can reuse them!
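As a concrete instance of the reuse idea (the slide's own computational graph is not reproduced here, so the function below is an assumed stand-in): for L = f(q) with q = x + y,

\[
\frac{\partial L}{\partial x} = \frac{\partial L}{\partial q}\cdot\frac{\partial q}{\partial x},
\qquad
\frac{\partial L}{\partial y} = \frac{\partial L}{\partial q}\cdot\frac{\partial q}{\partial y}.
\]

The upstream value ∂L/∂q is computed once and reused for both inputs; this is the saving that backpropagation exploits at every node.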
What about autograd?
● Deep learning frameworks can perform backprop automatically!
● Problems related to the underlying gradients can still surface when debugging your models.
"Yes You Should Understand Backprop" https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b
Problem Statement
Given a function f with respect to inputs x, labels y, and parameters θ, compute the gradient of the loss with respect to θ.
Backpropagation An algorithm for computing the gradient of a compound function as a series of local, intermediate gradients
Backpropagation
1. Identify intermediate functions (forward prop)
2. Compute local gradients
3. Combine with the upstream error signal to get the full gradient
Modularity - Simple Example
Compound function → intermediate variables (forward propagation)
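The slide's compound function is not reproduced in this extraction; as an illustrative stand-in, take f(x, y, z) = (x + y) · z and name the intermediate value q = x + y:

# forward propagation: break the compound function into intermediate variables
x, y, z = -2.0, 5.0, -4.0   # example inputs (chosen only for illustration)
q = x + y                   # intermediate variable
f = q * z                   # output of the compound function

# backward propagation: local gradients combined with the upstream signal
df_dq = z                   # local gradient of q*z with respect to q
df_dz = q                   # local gradient of q*z with respect to z
df_dx = df_dq * 1.0         # chain rule: dq/dx = 1, reuse df_dq as upstream
df_dy = df_dq * 1.0         # chain rule: dq/dy = 1, reuse df_dq as upstream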
Modularity - Neural Network Example
Compound function → intermediate variables (forward propagation)
Intermediate variables (forward propagation) and intermediate gradients (backward propagation)
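For a two-layer network like the one worked through at the end of this deck (z₁ = XW₁, h₁ = ReLU(z₁), ŷ = h₁W₂, L = Σŷ²; the specific architecture is taken from that later example), the forward intermediates and backward intermediates pair up schematically as

\[
\frac{\partial L}{\partial W_1}
= \underbrace{\frac{\partial L}{\partial \hat{y}}
  \cdot \frac{\partial \hat{y}}{\partial h_1}
  \cdot \frac{\partial h_1}{\partial z_1}}_{\text{error signal, computed once and passed backwards}}
  \cdot \frac{\partial z_1}{\partial W_1}.
\]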
Chain Rule Behavior Key chain rule intuition: Slopes multiply
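A one-line numeric instance of "slopes multiply" (values chosen purely for illustration): if h(x) = f(g(x)) with g′(x) = 2 and f′(g(x)) = 3 at some point x, then

\[
h'(x) = f'(g(x)) \cdot g'(x) = 3 \cdot 2 = 6.
\]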
Circuit Intuition
Matrix Calculus Primer: scalar-by-vector and vector-by-vector derivatives
Matrix Calculus Primer: scalar-by-matrix and vector-by-matrix derivatives
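The slides' tables of definitions are not reproduced in this extraction; the shape conventions they refer to, stated in one common layout consistent with the shape rule used later in this deck, are:

\[
y \in \mathbb{R},\ \mathbf{x} \in \mathbb{R}^{n}:\ \ \frac{\partial y}{\partial \mathbf{x}} \in \mathbb{R}^{n}
\quad \text{(scalar-by-vector: the gradient)}
\]
\[
\mathbf{y} \in \mathbb{R}^{m},\ \mathbf{x} \in \mathbb{R}^{n}:\ \ \frac{\partial \mathbf{y}}{\partial \mathbf{x}} \in \mathbb{R}^{m \times n}
\quad \text{(vector-by-vector: the Jacobian)}
\]
\[
y \in \mathbb{R},\ X \in \mathbb{R}^{m \times n}:\ \ \frac{\partial y}{\partial X} \in \mathbb{R}^{m \times n}
\quad \text{(scalar-by-matrix: same shape as } X\text{)}
\]

A vector-by-matrix derivative is a three-dimensional array in general, which is why in practice we only ever differentiate the scalar loss and rely on the shape rule below.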
Vector-by-Matrix Gradients
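The slide's definition is truncated in this extraction; a standard instance of the setup it refers to (an assumption, matching the usual lecture example) is y = Wx with W ∈ ℝ^{m×n} and x ∈ ℝ^{n}. For a scalar loss L, the gradients backprop needs are then

\[
\frac{\partial L}{\partial \mathbf{x}} = W^{\top}\frac{\partial L}{\partial \mathbf{y}} \in \mathbb{R}^{n},
\qquad
\frac{\partial L}{\partial W} = \frac{\partial L}{\partial \mathbf{y}}\,\mathbf{x}^{\top} \in \mathbb{R}^{m \times n},
\]

so each result has exactly the shape of its denominator.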
Backpropagation Shape Rule
When you take gradients against a scalar, the gradient at each intermediate step has the shape of the denominator.
Dimension Balancing
Dimension balancing is the "cheap" but efficient approach to gradient calculations in most practical settings. Read the gradient computation notes to understand how to derive matrix expressions for gradients from first principles.
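A sketch of dimension balancing for a single linear layer Y = XW (the shapes below are illustrative): given the upstream gradient dY, the only way to combine it with X and W so that each result matches the shape of its denominator already fixes the transposes and the multiplication order.

import numpy as np

N, D, M = 4, 3, 2                # batch size, input dim, output dim (illustrative)
X = np.random.randn(N, D)
W = np.random.randn(D, M)
Y = X @ W                        # forward: (N, D) @ (D, M) -> (N, M)

dY = np.random.randn(N, M)       # upstream gradient dL/dY, same shape as Y
dW = X.T @ dY                    # (D, N) @ (N, M) -> (D, M): matches W's shape
dX = dY @ W.T                    # (N, M) @ (M, D) -> (N, D): matches X's shape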
Activation Function Gradients
The activation is an element-wise function applied to each index of h (scalar-to-scalar). Officially, its Jacobian is a diagonal matrix, which represents that output element i and input element j have no dependence if i ≠ j.
Activation Function Gradients
Element-wise multiplication (Hadamard product) corresponds to a matrix product with a diagonal matrix.
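A quick numerical check of this equivalence, assuming a ReLU-style element-wise derivative (all names and shapes are illustrative):

import numpy as np

z = np.random.randn(5)
upstream = np.random.randn(5)            # upstream gradient flowing into the activation
local = (z > 0).astype(z.dtype)          # element-wise derivative of ReLU at z

via_diagonal = np.diag(local) @ upstream # full Jacobian: product with a diagonal matrix
via_hadamard = local * upstream          # element-wise (Hadamard) product
assert np.allclose(via_diagonal, via_hadamard)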
Backprop Menu for Success
1. Write down the variable graph
2. Compute the derivative of the cost function
3. Keep track of error signals
4. Enforce the shape rule on error signals
5. Use dimension balancing when deriving over a linear transformation
As promised: a matrix example... (slide adapted from Fei-Fei Li, Justin Johnson & Serena Yeung, Lecture 4, April 13, 2017)
import numpy as np

# inputs and parameters (shapes here are illustrative placeholders)
N, D, H, M = 4, 3, 5, 2
X = np.random.randn(N, D)
W_1 = np.random.randn(D, H)
W_2 = np.random.randn(H, M)

# forward prop
z_1 = np.dot(X, W_1)          # (N, H) pre-activation
h_1 = np.maximum(z_1, 0)      # (N, H) ReLU activation
y_hat = np.dot(h_1, W_2)      # (N, M) network output
L = np.sum(y_hat**2)          # scalar loss

# backward prop
dy_hat = 2.0 * y_hat          # dL/dy_hat
dW2 = h_1.T.dot(dy_hat)       # (H, M): matches W_2's shape
dh1 = dy_hat.dot(W_2.T)       # (N, H): matches h_1's shape
dz1 = dh1.copy()
dz1[z_1 < 0] = 0              # ReLU gradient: zero out entries where z_1 < 0
dW1 = X.T.dot(dz1)            # (D, H): matches W_1's shape
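One way to sanity-check the backward pass above is a centered finite-difference comparison on a single entry of W_1. This is a usage sketch that reuses X, W_1, W_2, and dW1 from the block above; the entry checked and the step size are arbitrary choices.

def loss_given_W1(W1):
    # recompute the forward pass for a perturbed first-layer weight matrix
    h = np.maximum(np.dot(X, W1), 0)
    return np.sum(np.dot(h, W_2) ** 2)

eps = 1e-5
i, j = 0, 0                                  # arbitrary entry of W_1 to check
W_plus, W_minus = W_1.copy(), W_1.copy()
W_plus[i, j] += eps
W_minus[i, j] -= eps
numeric = (loss_given_W1(W_plus) - loss_given_W1(W_minus)) / (2 * eps)
print(numeric, dW1[i, j])                    # the two values should agree closely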