CS 4803 / 7643: Deep Learning


  1. CS 4803 / 7643: Deep Learning Topics: – Specifying Layers – Forward & Backward autodifferentiation – (Beginning of) Convolutional neural networks Zsolt Kira Georgia Tech

  2. Administrivia • PS0 released – mean of 20.7 – standard deviation of 3.4 – median of 21 – max of 25 – See me if you did not pass • PS1/HW1 out • Start thinking about project topics/teams – More details on project next time (C) Dhruv Batra & Zsolt Kira 2

  3. Recap from last time (C) Dhruv Batra & Zsolt Kira 3

  4. Gradient Descent Pseudocode for i in {0, …, num_epochs}: for x, y in data: … Some design decisions: • How many examples to use to calculate the gradient per iteration? • What should alpha (the learning rate) be? • Should it be constant throughout? • How many epochs to run?
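
A minimal sketch of this training loop in Python, assuming NumPy and a hypothetical loss_and_grad(W, x, y) helper that returns the mini-batch loss and gradient; batch_size, alpha, and num_epochs are exactly the design decisions listed above.

```python
import numpy as np

def sgd(W, data, loss_and_grad, alpha=1e-3, batch_size=32, num_epochs=10):
    """Mini-batch SGD sketch. `data` is a list of (x, y) pairs and
    `loss_and_grad` is a hypothetical helper returning (loss, dL/dW)."""
    for epoch in range(num_epochs):                  # how many epochs to run?
        np.random.shuffle(data)                      # new example order each epoch
        for i in range(0, len(data), batch_size):    # examples per gradient step?
            batch = data[i:i + batch_size]
            x = np.stack([ex[0] for ex in batch])
            y = np.stack([ex[1] for ex in batch])
            loss, grad = loss_and_grad(W, x, y)      # forward + backward pass
            W = W - alpha * grad                     # alpha: the learning rate
    return W
```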

  5. Computational Graph Any DAG of differentiable modules is allowed! (C) Dhruv Batra & Zsolt Kira 5 Slide Credit: Marc'Aurelio Ranzato

  6. Key Computation: Back-Prop (C) Dhruv Batra & Zsolt Kira 6 Slide Credit: Marc'Aurelio Ranzato, Yann LeCun

  7. Neural Network Training • Step 1: Compute Loss on mini-batch [F-Pass] (C) Dhruv Batra & Zsolt Kira 7 Slide Credit: Marc'Aurelio Ranzato, Yann LeCun

  8. Neural Network Training • Step 1: Compute Loss on mini-batch [F-Pass] • Step 2: Compute gradients wrt parameters [B-Pass] (C) Dhruv Batra & Zsolt Kira 8 Slide Credit: Marc'Aurelio Ranzato, Yann LeCun

  9. General Flow Graphs “Deep Learning” book, Bengio


  12. Jacobian of ReLU g(x) = max(0, x), applied elementwise: 4096-d input vector → 4096-d output vector Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  13. Jacobian of ReLU g(x) = max(0, x), applied elementwise: 4096-d input vector → 4096-d output vector Q: what is the size of the Jacobian matrix? 13 Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  14. Jacobian of ReLU g(x) = max(0, x), applied elementwise: 4096-d input vector → 4096-d output vector Q: what is the size of the Jacobian matrix? [4096 x 4096!] 14 Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  15. Jacobian of ReLU g(x) = max(0, x), applied elementwise: 4096-d input vector → 4096-d output vector Q: what is the size of the Jacobian matrix? [4096 x 4096!] Q2: what does it look like? Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
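
To make the answer to Q2 concrete: the Jacobian of an elementwise ReLU is diagonal (1 where the input is positive, 0 elsewhere), so backprop never materializes the 4096 x 4096 matrix. A small NumPy sketch:

```python
import numpy as np

x = np.random.randn(4096)            # 4096-d input vector
y = np.maximum(0, x)                 # g(x) = max(0, x), elementwise

# The full Jacobian is diagonal: dy_i/dx_j = 1 if i == j and x_i > 0, else 0.
J = np.diag((x > 0).astype(x.dtype))     # 4096 x 4096, but almost entirely zeros

# In practice backprop never builds J; it applies the mask elementwise:
upstream = np.random.randn(4096)         # dL/dy flowing in from above
dx_dense = J @ upstream                  # explicit Jacobian-vector product
dx_fast = upstream * (x > 0)             # equivalent elementwise computation
assert np.allclose(dx_dense, dx_fast)
```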

  16. Plan for Today • Specifying Layers • Forward & Backward auto-differentiation • (Beginning of) Convolutional neural networks (C) Dhruv Batra & Zsolt Kira 17

  17. Deep Learning = Differentiable Programming • Computation = Graph – Input = Data + Parameters – Output = Loss – Scheduling = Topological ordering • What do we need to do? – Generic code for representing the graph of modules – Specify modules (both forward and backward function) (C) Dhruv Batra & Zsolt Kira 18

  18. Modularized implementation: forward / backward API Graph (or Net) object (rough pseudocode) 19 Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
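
One hedged reconstruction of that rough pseudocode in Python; the method and attribute names (nodes, forward, backward, output, grad) are illustrative, not the actual CS 231n code, and the node list is assumed to already be topologically sorted.

```python
class Net:
    """Sketch of a graph (or net) object with a forward / backward API."""
    def __init__(self, nodes):
        self.nodes = nodes                    # modules, assumed topologically sorted

    def forward(self):
        for node in self.nodes:               # scheduling = topological ordering
            node.forward()
        return self.nodes[-1].output          # the last node produces the loss

    def backward(self):
        for node in reversed(self.nodes):     # visit modules in reverse order
            node.backward()                   # each module computes grads w.r.t. its inputs
        return [node.grad for node in self.nodes]
```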

  19. Modularized implementation: forward / backward API x, y → * → z (x, y, z are scalars) 20 Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  20. Modularized implementation: forward / backward API x, y → * → z (x, y, z are scalars) 21 Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
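
For the scalar multiply gate above, a minimal sketch of the per-module forward/backward contract: forward caches its inputs, backward multiplies the upstream gradient dz = dL/dz by the local gradients.

```python
class MultiplyGate:
    """z = x * y for scalars; backward applies the chain rule:
    dL/dx = dL/dz * dz/dx = dz * y,  dL/dy = dz * x."""
    def forward(self, x, y):
        self.x, self.y = x, y        # cache inputs needed by backward
        return x * y

    def backward(self, dz):          # dz = upstream gradient dL/dz
        dx = dz * self.y             # local gradient dz/dx = y
        dy = dz * self.x             # local gradient dz/dy = x
        return dx, dy

# Usage: forward, then backward with an upstream gradient of 1.0
gate = MultiplyGate()
z = gate.forward(3.0, -4.0)          # z = -12.0
dx, dy = gate.backward(1.0)          # dx = -4.0, dy = 3.0
```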

  21. Example: Caffe layers Caffe is licensed under BSD 2-Clause 22 Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  22. Caffe Sigmoid Layer * top_diff (chain rule) Caffe is licensed under BSD 2-Clause 23 Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
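
Caffe's SigmoidLayer is C++, but the slide's point is just the chain rule: backward scales top_diff by the local derivative σ(x)(1 − σ(x)) = y(1 − y). A Python sketch of the same forward/backward pair (not Caffe's actual code):

```python
import numpy as np

class SigmoidLayer:
    def forward(self, bottom):
        # Caffe-style naming: "bottom" is the input blob, "top" the output blob.
        self.top = 1.0 / (1.0 + np.exp(-bottom))
        return self.top

    def backward(self, top_diff):
        # chain rule: bottom_diff = top_diff * d(sigmoid)/dx = top_diff * y * (1 - y)
        return top_diff * self.top * (1.0 - self.top)
```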

  23. Deep Learning = Differentiable Programming • Computation = Graph – Input = Data + Parameters – Output = Loss – Scheduling = Topological ordering • Auto-Diff – A family of algorithms for implementing chain-rule on computation graphs (C) Dhruv Batra & Zsolt Kira 24

  24. Forward mode vs Reverse Mode • Key Computations (C) Dhruv Batra & Zsolt Kira 25

  25. Forward mode AD 26

  26. Reverse mode AD 27

  27. Example: Forward mode AD + sin( ) * x1 x2 (C) Dhruv Batra & Zsolt Kira 28

  28. Example: Forward mode AD + sin( ) * x1 x2 (C) Dhruv Batra & Zsolt Kira 29

  29. Example: Forward mode AD + sin( ) * x1 x2 (C) Dhruv Batra & Zsolt Kira 30

  30. Example: Forward mode AD Q: What happens if there’s another input variable x3? + sin( ) * x1 x2 (C) Dhruv Batra & Zsolt Kira 31

  31. Example: Forward mode AD Q: What happens if there’s another input variable x3? A: more sophisticated graph; d “forward props” for d variables + sin( ) * x1 x2 (C) Dhruv Batra & Zsolt Kira 32

  32. Example: Forward mode AD Q: What happens if there’s another output variable f2? + sin( ) * x1 x2 (C) Dhruv Batra & Zsolt Kira 33

  33. Example: Forward mode AD Q: What happens if there’s another output variable f2? A: more sophisticated graph; single “forward prop” + sin( ) * x1 x2 (C) Dhruv Batra & Zsolt Kira 34
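
The exact function on these slides isn't recoverable from the transcript; assuming for concreteness f(x1, x2) = sin(x1) + x1·x2, which matches the +, sin( ), and * nodes shown, forward-mode AD carries a (value, derivative) pair through the graph and needs one pass per input variable:

```python
import math

def f_forward(x1, x2, dx1, dx2):
    """Forward-mode AD sketch for the assumed f(x1, x2) = sin(x1) + x1 * x2.
    Each intermediate carries (value, derivative in the seeded direction)."""
    a, da = math.sin(x1), math.cos(x1) * dx1      # sin node
    b, db = x1 * x2, dx1 * x2 + x1 * dx2          # * node (product rule)
    f, df = a + b, da + db                        # + node
    return f, df

# One forward pass per input variable: seed dx1=1 for df/dx1, dx2=1 for df/dx2.
_, df_dx1 = f_forward(0.5, 2.0, 1.0, 0.0)   # cos(0.5) + 2.0
_, df_dx2 = f_forward(0.5, 2.0, 0.0, 1.0)   # 0.5
```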

  34. Example: Reverse mode AD + sin( ) * x1 x2 (C) Dhruv Batra & Zsolt Kira 35

  35. Example: Reverse mode AD + sin( ) * x1 x2 (C) Dhruv Batra & Zsolt Kira 36

  36. Gradients add at branches + Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  37. Example: Reverse mode AD Q: What happens if there’s another input variable x3? + sin( ) * x1 x2 (C) Dhruv Batra & Zsolt Kira 38

  38. Example: Reverse mode AD Q: What happens if there’s another input variable x3? A: more sophisticated graph; single “backward prop” + sin( ) * x1 x2 (C) Dhruv Batra & Zsolt Kira 39

  39. Example: Reverse mode AD Q: What happens if there’s another output variable f2? + sin( ) * x1 x2 (C) Dhruv Batra & Zsolt Kira 40

  40. Example: Reverse mode AD Q: What happens if there’s another output variable f2? A: more sophisticated graph; c “backward props” for c vars + sin( ) * x1 x2 (C) Dhruv Batra & Zsolt Kira 41
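
Reverse mode on the same assumed f(x1, x2) = sin(x1) + x1·x2: one forward pass stores intermediates, one backward pass yields all input gradients, and because x1 feeds both the sin( ) and * nodes its gradient contributions add, as on the "gradients add at branches" slide.

```python
import math

def f_reverse(x1, x2):
    """Reverse-mode AD sketch for the assumed f(x1, x2) = sin(x1) + x1 * x2."""
    # forward pass: store intermediates
    a = math.sin(x1)          # sin node
    b = x1 * x2               # * node
    f = a + b                 # + node

    # backward pass, seeded with df/df = 1
    df = 1.0
    da = df * 1.0             # + node routes the gradient to both inputs
    db = df * 1.0
    dx1 = da * math.cos(x1)   # through the sin node
    dx1 += db * x2            # x1 also feeds the * node: gradients ADD at branches
    dx2 = db * x1
    return f, dx1, dx2

f, dx1, dx2 = f_reverse(0.5, 2.0)    # dx1 = cos(0.5) + 2.0, dx2 = 0.5
```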

  41. Forward mode vs Reverse Mode • x → Graph → L • Intuition of Jacobian (C) Dhruv Batra & Zsolt Kira 42

  42. Forward mode vs Reverse Mode • What are the differences? • Which one is faster to compute? – Forward or backward? (C) Dhruv Batra & Zsolt Kira 43

  43. Forward mode vs Reverse Mode • What are the differences? • Which one is faster to compute? – Forward or backward? • Which one is more memory efficient (less storage)? – Forward or backward? [the example graph, + sin( ) * x1 x2, shown twice] (C) Dhruv Batra & Zsolt Kira 44

  44. Practical Note 2: Software Frameworks A few weeks ago! +Keras Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  45. PyTorch
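
PyTorch's autograd is exactly this reverse-mode scheme. A minimal example, reusing the assumed f(x1, x2) = sin(x1) + x1·x2 from the AD slides:

```python
import torch

x1 = torch.tensor(0.5, requires_grad=True)
x2 = torch.tensor(2.0, requires_grad=True)

f = torch.sin(x1) + x1 * x2   # builds the computation graph on the fly
f.backward()                  # reverse-mode AD: one backward pass

print(x1.grad)                # cos(0.5) + 2.0
print(x2.grad)                # 0.5
```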

  46. Plan for Today (Cont.) • Specifying Layers • Forward & Backward auto-differentiation • (Beginning of) Convolutional neural networks – What is a convolution? – FC vs Conv Layers (C) Dhruv Batra & Zsolt Kira 48

  47. Recall: Linear Classifier f(x, W) = Wx + b. Input image: array of 32x32x3 numbers (3072 numbers total), stretched into x (3072x1). W: 10x3072 parameters or weights; b: 10x1. Output f(x, W): 10x1, i.e., 10 numbers giving class scores. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  48. Example with an image with 4 pixels, and 3 classes (cat/dog/ship). Stretch pixels into a column: x = [56, 231, 24, 2]. W = [[0.2, -0.5, 0.1, 2.0], [1.5, 1.3, 2.1, 0.0], [0.0, 0.25, 0.2, -0.3]], b = [1.1, 3.2, -1.2]. Scores Wx + b: -96.8 (cat), 437.9 (dog), 61.95 (ship). 50 Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
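
A NumPy check of the numbers above (nothing beyond the slide's values is assumed). Note that with the listed bias of -1.2 the ship score evaluates to 60.75; the slide's 61.95 matches the Wx term for the ship row before the bias is added.

```python
import numpy as np

W = np.array([[0.2, -0.5, 0.1,  2.0],    # cat row
              [1.5,  1.3, 2.1,  0.0],    # dog row
              [0.0, 0.25, 0.2, -0.3]])   # ship row
x = np.array([56.0, 231.0, 24.0, 2.0])   # 4 pixels stretched into a column
b = np.array([1.1, 3.2, -1.2])

scores = W @ x + b
# cat = -96.8, dog = 437.9, ship = 60.75 (the slide's 61.95 is the ship row of
# W @ x before the -1.2 bias is added)
```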

  49. Recall: (Fully-Connected) Neural networks. (Before) Linear score function; (Now) 2-layer Neural Network: x (3072-d) → W1 → h (100-d) → W2 → s (10-d) 51 Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
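
A shape-level sketch of the two layers (3072 → 100 → 10), assuming max(0, ·), i.e., ReLU, as the elementwise nonlinearity as in the CS 231n example; biases are omitted as on the slide.

```python
import numpy as np

x  = np.random.randn(3072)          # flattened 32x32x3 image
W1 = np.random.randn(100, 3072)     # first layer weights
W2 = np.random.randn(10, 100)       # second layer weights

h = np.maximum(0, W1 @ x)           # hidden layer: h = max(0, W1 x), 100-d
s = W2 @ h                          # class scores: s = W2 h, 10-d
```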

  50. Convolutional Neural Networks (without the brain stuff) Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

  51. Fully Connected Layer Example: 200x200 image, 40K hidden units → ~2B parameters!!! - Spatial correlation is local - Waste of resources + we don't have enough training samples anyway.. 53 Slide Credit: Marc'Aurelio Ranzato

  52. Locally Connected Layer Example: 200x200 image, 40K hidden units, “filter” size: 10x10 → 4M parameters. Note: This parameterization is good when the input image is registered (e.g., face recognition). 54 Slide Credit: Marc'Aurelio Ranzato

  53. Locally Connected Layer STATIONARITY? Statistics similar at all locations 55 Slide Credit: Marc'Aurelio Ranzato

  54. Convolutional Layer Share the same parameters across different locations (assuming input is stationary): Convolutions with learned kernels 56 Slide Credit: Marc'Aurelio Ranzato
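
A back-of-the-envelope check of the parameter counts on the last three slides (fully connected vs. locally connected vs. convolutional), assuming a single-channel 200x200 input, 40K hidden units, and 10x10 filters; the convolutional count is per learned filter, and the slide does not specify how many filters are used.

```python
image = 200 * 200          # 40,000 input pixels (single channel assumed)
hidden = 40_000            # 40K hidden units
filt = 10 * 10             # 10x10 "filter"-sized local window

fully_connected = image * hidden        # 1.6e9  (~2B parameters on the slide)
locally_connected = hidden * filt       # 4e6    (each unit sees only a 10x10 patch)
convolutional = filt                    # 100 weights per learned filter, shared
                                        # across all spatial locations
print(fully_connected, locally_connected, convolutional)
```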

  55. What filter to use?

  56. Discrete Convolution • Very similar to correlation, but associative [figures: 1D Convolution, 2D Convolution, Filter]
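
A quick NumPy check of the "very similar to correlation" point: 1D discrete convolution equals correlation with the filter flipped (the input and filter below are made up for illustration).

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # 1D input
w = np.array([1.0, 0.0, -1.0])            # 1D filter

conv = np.convolve(x, w, mode='valid')        # convolution flips the filter
corr = np.correlate(x, w, mode='valid')       # correlation does not
assert np.allclose(conv, np.correlate(x, w[::-1], mode='valid'))
```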

  57. A note on sizes: Input N x N, Filter m x m, Output (N-m+1) x (N-m+1). MATLAB to the rescue! • conv2(x, w, ‘valid’)
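
A Python analogue of the MATLAB call, assuming SciPy is available: scipy.signal.convolve2d with mode='valid' produces the (N-m+1) x (N-m+1) output described above.

```python
import numpy as np
from scipy.signal import convolve2d

N, m = 7, 3
x = np.random.randn(N, N)      # N x N input
w = np.random.randn(m, m)      # m x m filter

out = convolve2d(x, w, mode='valid')   # analogue of MATLAB conv2(x, w, 'valid')
assert out.shape == (N - m + 1, N - m + 1)
```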

  58. Convolutions! • Math vs. CS vs. programming viewpoints (C) Dhruv Batra & Zsolt Kira 60
