
Lecture 12: Computational Graph, Backpropagation. Aykut Erdem, March 2016 (PowerPoint presentation)



  1. Lecture 12: − Computational Graph − Backpropagation Aykut Erdem March 2016 Hacettepe University

  2. Administrative
 • Assignment 2 is due March 20, 2016!
 • Midterm exam on Thursday, March 24, 2016
 − You are responsible for the material from the beginning of the course up to the end of this class
 − You may prepare and bring a one-page cheat sheet (A4 paper, both sides) to the exam.
 • Assignment 3 will be out soon!
 − It is due April 7, 2016
 − You will implement a 2-layer Neural Network

  3. Last time… Multilayer Perceptron
 • Layer representation: y_i = W_i x_i, x_{i+1} = σ(y_i)
 • (typically) iterate between a linear mapping Wx and a nonlinear function
 • Loss function l(y, y_i) to measure the quality of the estimate so far
 [figure: stacked layers x1 → W1 → x2 → W2 → x3 → W3 → x4 → W4 → y]
 slide by Alex Smola

  4. Last time… Forward Pass
 • Output of the network can be written as:
 h_j(x) = f(v_{j0} + Σ_{i=1}^{D} x_i v_{ji})
 o_k(x) = g(w_{k0} + Σ_{j=1}^{J} h_j(x) w_{kj})
 (j indexing the hidden units, k indexing the output units, D the number of inputs)
 • Activation functions f, g: sigmoid/logistic, tanh, or rectified linear (ReLU)
 σ(z) = 1 / (1 + exp(−z)),  tanh(z) = (exp(z) − exp(−z)) / (exp(z) + exp(−z)),  ReLU(z) = max(0, z)
 slide by Raquel Urtasun, Richard Zemel, Sanja Fidler

  5. Last time… Forward Pass in Python
 • Example code for a forward pass of a 3-layer network in Python (see the sketch below)
 • Can be implemented efficiently using matrix operations
 • Example above: W1 is a matrix of size 4 × 3, W2 is 4 × 4. What about the biases and W3?
 slide by Raquel Urtasun, Richard Zemel, Sanja Fidler
 [http://cs231n.github.io/neural-networks-1/]
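The code on the slide was an image; here is a minimal sketch in the spirit of the cs231n note it links to, assuming sigmoid activations and randomly initialized weights and biases (the variable names are illustrative):

```python
import numpy as np

# weight shapes follow the slide: W1 is 4x3, W2 is 4x4
f = lambda z: 1.0 / (1.0 + np.exp(-z))             # sigmoid activation

x = np.random.randn(3, 1)                           # random 3-dimensional input (3x1)
W1, b1 = np.random.randn(4, 3), np.zeros((4, 1))    # first layer
W2, b2 = np.random.randn(4, 4), np.zeros((4, 1))    # second layer
W3, b3 = np.random.randn(1, 4), np.zeros((1, 1))    # output layer

h1 = f(np.dot(W1, x) + b1)     # first hidden layer activations (4x1)
h2 = f(np.dot(W2, h1) + b2)    # second hidden layer activations (4x1)
out = np.dot(W3, h2) + b3      # output (1x1)
```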

  6. Today • Backpropagation and Neural Networks • Tips and Tricks 6

  7. Backpropagation and Neural Networks 7

  8. Recap: Loss function / Optimization
 TODO:
 1. Define a loss function that quantifies our unhappiness with the scores across the training data.
 2. Come up with a way of efficiently finding the parameters that minimize the loss function. (optimization)
 We defined a (linear) score function:
 [figure: the score function and example class scores for a few training images]
 slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
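The formula itself was part of the slide image; assuming it is the same linear score function used in the earlier lectures (with the bias folded into W), it reads:

```latex
s = f(x; W) = W x
```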

  9.–18. Softmax Classifier (Multinomial Logistic Regression)
 (worked out step by step on the slide images)
 slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
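The worked steps live only in the slide images; as a minimal sketch (not the slides' exact example), the softmax classifier's loss for a single example with unnormalized class scores s and correct class index y can be computed as:

```python
import numpy as np

def softmax_loss(s, y):
    """Cross-entropy loss of the softmax classifier for one example.
    s: 1D array of unnormalized class scores, y: index of the correct class."""
    s = s - np.max(s)                        # shift scores for numerical stability
    probs = np.exp(s) / np.sum(np.exp(s))    # softmax probabilities
    return -np.log(probs[y])                 # negative log-likelihood of the correct class

# example: three class scores, correct class is 2
print(softmax_loss(np.array([3.2, 5.1, -1.7]), y=2))
```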

  19. Optimization 19

  20. Gradient Descent
 slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

  21. Mini-batch Gradient Descent
 • only use a small portion of the training set to compute the gradient
 slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

  22. Mini-batch Gradient Descent
 • only use a small portion of the training set to compute the gradient
 • there are also fancier update formulas (momentum, Adagrad, RMSProp, Adam, …); a sketch of the plain version is given below
 slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
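The update loop shown on the slide was an image; below is a minimal, self-contained sketch of vanilla mini-batch gradient descent on a toy one-parameter least-squares problem (the data, batch size of 256, and step size are made up for illustration):

```python
import numpy as np

# toy data: y = 3*x + noise; we fit a single weight w with mini-batch gradient descent
rng = np.random.default_rng(0)
X = rng.standard_normal(10_000)
Y = 3.0 * X + 0.1 * rng.standard_normal(10_000)

w, step_size = 0.0, 0.1
for _ in range(200):
    idx = rng.integers(0, len(X), size=256)      # sample a mini-batch of 256 examples
    xb, yb = X[idx], Y[idx]
    grad = np.mean(2.0 * (w * xb - yb) * xb)     # gradient of the mean squared error
    w += -step_size * grad                       # parameter update
print(w)   # converges to roughly 3.0
```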

  23. The effects of different update formulas
 slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson (image credits to Alec Radford)

  24.–25. [figure-only slides]
 slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

  26. Back-propagation 26

  27. Computational Graph
 [figure: x and W feed a multiply node (*) producing the scores s; s feeds the hinge loss; the regularization term R is added (+) to give the total loss L]
 slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
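Reading the graph bottom-up, and assuming the hinge loss is the multiclass SVM loss from the earlier lectures with regularizer R(W) and weight λ, the graph computes:

```latex
s = W x, \qquad
L = \frac{1}{N} \sum_{i} \sum_{j \neq y_i} \max\left(0,\; s_j - s_{y_i} + 1\right) + \lambda\, R(W)
```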

  28. Convolutional Network (AlexNet)
 [figure: computational graph from the input image and weights through the network to the loss]
 slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

  29. Neural Turing Machine
 [figure: computational graph from the input tape to the loss]
 slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

  30.–42. Worked example: e.g. x = -2, y = 5, z = -4
 Want: the gradients of the output with respect to x, y, and z, computed step by step with the chain rule (see the sketch below)
 slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
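The expression itself only appears in the slide images; a minimal sketch assuming the usual example f(x, y, z) = (x + y) * z, with the forward pass and the backward (chain-rule) pass written out:

```python
# forward pass (assumed example f(x, y, z) = (x + y) * z)
x, y, z = -2.0, 5.0, -4.0
q = x + y          # q = 3
f = q * z          # f = -12

# backward pass: apply the chain rule from the output back to the inputs
df_df = 1.0              # gradient of f with respect to itself
df_dz = q * df_df        # d(q*z)/dz = q             -> 3
df_dq = z * df_df        # d(q*z)/dq = z             -> -4
df_dx = 1.0 * df_dq      # d(x+y)/dx = 1, chain rule -> -4
df_dy = 1.0 * df_dq      # d(x+y)/dy = 1, chain rule -> -4
print(df_dx, df_dy, df_dz)   # -4.0 -4.0 3.0
```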

  43.–48. [figure: a single gate f in the circuit, its activations on the forward pass, its "local gradient", and the gradients flowing backward through it during backprop]
 slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
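In symbols, for a gate z = f(x, y) inside a larger circuit that computes a loss L, backprop multiplies the gradient arriving from above by the gate's local gradients (the chain rule):

```latex
\frac{\partial L}{\partial x} = \frac{\partial L}{\partial z}\,\frac{\partial z}{\partial x},
\qquad
\frac{\partial L}{\partial y} = \frac{\partial L}{\partial z}\,\frac{\partial z}{\partial y}
```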

  49.–57. Another example: (a larger circuit, evaluated forward and then backpropagated gate by gate on the slide images)
 slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

  58. Another example: (-1) * (-0.20) = 0.20
 slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

  59. 59 Another example: slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

  60. Another example: [local gradient] x [its gradient]
 [1] x [0.2] = 0.2
 [1] x [0.2] = 0.2 (both inputs!)
 slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

  61. 61 Another example: slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

  62. Another example: [local gradient] x [its gradient]
 x0: [2] x [0.2] = 0.4
 w0: [-1] x [0.2] = -0.2
 slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

  63. Sigmoid function, sigmoid gate
 slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

  64. Sigmoid function, sigmoid gate: the local gradient of σ is (1 − σ(x)) σ(x), so here (0.73) * (1 - 0.73) = 0.2
 slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
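A minimal sketch of the sigmoid gate as a single unit, using the fact that its local gradient is (1 − σ(x)) σ(x):

```python
import numpy as np

def sigmoid_forward(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_backward(out, dout):
    # local gradient (1 - sigma) * sigma, times the gradient flowing in from above
    return (1.0 - out) * out * dout

out = sigmoid_forward(1.0)             # ~0.73, the value on the slide
dx = sigmoid_backward(out, dout=1.0)   # (0.73) * (1 - 0.73) = 0.2
print(out, dx)
```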

  65. Patterns in backward flow • add gate: gradient distributor • max gate: gradient router • mul gate: gradient… “switcher”? slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson 65
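A minimal sketch of these three patterns (function names are illustrative), showing how each gate routes the incoming gradient dout to its two inputs:

```python
def add_backward(x, y, dout):
    # add gate: distributes the same gradient to both inputs
    return dout, dout

def max_backward(x, y, dout):
    # max gate: routes the full gradient only to the larger input
    return (dout, 0.0) if x >= y else (0.0, dout)

def mul_backward(x, y, dout):
    # mul gate: "switches" the inputs, scaling the gradient by the other operand
    return y * dout, x * dout

print(add_backward(3.0, -1.0, 2.0))  # (2.0, 2.0)
print(max_backward(3.0, -1.0, 2.0))  # (2.0, 0.0)
print(mul_backward(3.0, -1.0, 2.0))  # (-2.0, 6.0)
```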

  66. Gradients add at branches (+)
 slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

  67. Implementation: forward/backward API
 Graph (or Net) object. (Rough pseudo code; a sketch follows below.)
 slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
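The pseudo code on the slide is an image; a minimal sketch of what such a Graph/Net object might look like, assuming each gate object exposes forward() and backward() and the gates are stored in topological order:

```python
class Net:
    """Rough sketch of a Graph (or Net) object with a forward/backward API.
    Each gate is assumed to expose forward() and backward() and to know
    where to read its inputs and where to write its gradients."""

    def __init__(self, gates):
        self.gates = gates                     # gates in topologically sorted order

    def forward(self):
        for gate in self.gates:                # run the graph left to right
            gate.forward()
        return self.gates[-1].output           # the final gate outputs the loss

    def backward(self):
        for gate in reversed(self.gates):      # run the graph right to left
            gate.backward()                    # chain rule applied at each gate
```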

  68. Implementation: forward/backward API
 [figure: a single multiply gate z = x * y (x, y, z are scalars)]
 slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson
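A minimal sketch of the multiply gate under this API (again a sketch, not the slide's exact code), caching its inputs on the forward pass so the backward pass can apply the chain rule:

```python
class MultiplyGate:
    def forward(self, x, y):
        # cache the inputs; they are needed for the local gradients
        self.x, self.y = x, y
        return x * y

    def backward(self, dz):
        # chain rule: [local gradient] x [gradient coming from above]
        dx = self.y * dz   # dz/dx = y
        dy = self.x * dz   # dz/dy = x
        return dx, dy

gate = MultiplyGate()
z = gate.forward(3.0, -4.0)      # z = -12.0
dx, dy = gate.backward(1.0)      # dx = -4.0, dy = 3.0
print(z, dx, dy)
```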
