CS 4803 / 7643: Deep Learning
Topics: Regularization, Neural Networks, Optimization, Computing Gradients
Zsolt Kira, Georgia Tech
Administrivia • HW0 Reminder – Due: 01/18, 11:55pm • Plagiarism – No Tolerance • Office hours have started (one for every day!) – CCB 222 for instructor – CCB 345 for TAs • Sign up for Piazza if you haven’t! (C) Dhruv Batra 2
Computing • Major bottleneck – GPUs • Options – Google colaboratory allows free TPU access!! • https://colab.research.google.com/notebooks/welcome.ipynb – Google Cloud Credits • courtesy Google – details forthcoming for next HW – PACE-ICE • https://pace.gatech.edu/sites/default/files/pace-ice_orientation_1.pdf (C) Dhruv Batra and Zsolt Kira 3
Recap from last time (C) Dhruv Batra and Zsolt Kira 4
Parametric Approach: Linear Classifier
f(x, W) = Wx + b
The image (an array of 32x32x3 = 3072 numbers) is stretched into a 3072x1 column x; W is a 10x3072 matrix of parameters (weights) and b is a 10x1 bias, so f(x, W) produces 10 numbers giving the class scores.
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
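To make the shapes concrete, here is a minimal NumPy sketch of the score function f(x, W) = Wx + b; the random weights and variable names are illustrative stand-ins, not the course's starter code.

```python
import numpy as np

# Illustrative sketch of the linear score function f(x, W) = Wx + b
# for CIFAR-10-sized inputs; random values stand in for learned weights.
num_classes, num_pixels = 10, 32 * 32 * 3             # 10 classes, 3072 input values

x = np.random.rand(num_pixels)                        # image stretched into a 3072-vector
W = 0.01 * np.random.randn(num_classes, num_pixels)   # 10 x 3072 weight matrix
b = np.zeros(num_classes)                             # 10 bias values

scores = W.dot(x) + b                                 # 10 class scores
print(scores.shape)                                   # (10,)
```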
Error Decomposition
(Figure: an HxWx3 input mapped through a fully-connected layer and softmax, i.e. multi-class logistic regression, compared against reality.)
(C) Dhruv Batra and Zsolt Kira 6
Example with an image with 4 pixels, and 3 classes (cat/dog/ship)
The input image's pixels (56, 231, 24, 2) are stretched into a column x = [56, 231, 24, 2], and f(x, W) = Wx + b is computed with
W = [[0.2, -0.5, 0.1, 2.0], [1.5, 1.3, 2.1, 0.0], [0.0, 0.25, 0.2, -0.3]],  b = [1.1, 3.2, -1.2],
giving the class scores: cat -96.8, dog 437.9, ship 61.95.
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
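The arithmetic can be checked directly; this sketch just reproduces the slide's matrix-vector computation in NumPy.

```python
import numpy as np

# Reproducing the 4-pixel / 3-class example above.
W = np.array([[0.2, -0.5, 0.1,  2.0],    # cat template
              [1.5,  1.3, 2.1,  0.0],    # dog template
              [0.0, 0.25, 0.2, -0.3]])   # ship template
b = np.array([1.1, 3.2, -1.2])
x = np.array([56., 231., 24., 2.])       # pixels stretched into a column

scores = W.dot(x) + b
print(scores)   # cat ≈ -96.8, dog ≈ 437.9
# Note: the slide's ship score (61.95) equals W·x for that row before the bias is added.
```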
Linear Classifier: Three Viewpoints
Algebraic viewpoint: f(x, W) = Wx. Visual viewpoint: one template per class. Geometric viewpoint: hyperplanes cutting up space.
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Recall from last time: Linear Classifier. TODO: 1. Define a loss function that quantifies our unhappiness with the scores across the training data. 2. Come up with a way of efficiently finding the parameters that minimize the loss function (optimization). Cat image by Nikita is licensed under CC-BY 2.0; Car image is CC0 1.0 public domain; Frog image is in the public domain. Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Softmax vs. SVM Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Suppose: 3 training examples, 3 classes. With some W the scores are (rows = classes, columns = the cat, car, and frog training images):
  cat:   3.2   1.3   2.2
  car:   5.1   4.9   2.5
  frog: -1.7   2.0  -3.1
Multiclass SVM loss ("hinge loss"): given an example (x_i, y_i), where x_i is the image and y_i is the (integer) label, and using the shorthand s = f(x_i, W) for the scores vector, the SVM loss has the form
L_i = Σ_{j ≠ y_i} max(0, s_j − s_{y_i} + 1)
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
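A small sketch of the hinge loss above, applied to the three examples on the slide; the function name and layout are my own, not from the assignment code.

```python
import numpy as np

def svm_loss(scores, y, margin=1.0):
    """Multiclass SVM (hinge) loss for one example: sum over j != y of max(0, s_j - s_y + margin)."""
    margins = np.maximum(0.0, scores - scores[y] + margin)
    margins[y] = 0.0                       # the correct class contributes nothing
    return margins.sum()

# Scores for each training image (cat, car, frog classes) and its correct label index.
examples = [
    (np.array([3.2, 5.1, -1.7]), 0),       # cat image
    (np.array([1.3, 4.9,  2.0]), 1),       # car image
    (np.array([2.2, 2.5, -3.1]), 2),       # frog image
]
losses = [svm_loss(s, y) for s, y in examples]
print(losses)              # [2.9, 0.0, 12.9]
print(np.mean(losses))     # full loss L ≈ 5.27
```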
Softmax vs. SVM Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Softmax Classifier (Multinomial Logistic Regression)
Want to interpret raw classifier scores as probabilities. The softmax function maps unnormalized log-probabilities (logits) to probabilities that must be >= 0 and must sum to 1:
  scores (cat 3.2, car 5.1, frog -1.7) → exp → unnormalized probabilities (24.5, 164.0, 0.18) → normalize → probabilities (0.13, 0.87, 0.00)
The loss is the negative log-probability of the correct class: L_i = -log(0.13) = 2.04.
Maximum Likelihood Estimation: choose probabilities to maximize the likelihood of the observed data.
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
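A sketch of the softmax function and the resulting loss on the slide's scores; the subtract-the-max step is a standard numerical-stability trick, not something shown on the slide.

```python
import numpy as np

def softmax(logits):
    """Convert raw class scores (logits) into probabilities that are >= 0 and sum to 1."""
    shifted = logits - np.max(logits)     # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

scores = np.array([3.2, 5.1, -1.7])      # cat, car, frog scores from the slide
probs = softmax(scores)
print(probs.round(2))                     # ≈ [0.13, 0.87, 0.00]
print(-np.log(probs[0]))                  # L_i = -log P(correct class) ≈ 2.04
```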
Log-Likelihood / KL-Divergence / Cross-Entropy (C) Dhruv Batra and Zsolt Kira 21
Softmax Classifier (Multinomial Logistic Regression)
The predicted probabilities (cat 0.13, car 0.87, frog 0.00) are compared against the correct probabilities (cat 1.00, car 0.00, frog 0.00) using the Kullback–Leibler divergence / cross-entropy between the two distributions.
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
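A sketch of cross-entropy and KL divergence between the predicted and correct probability vectors; the small epsilon inside the log is only there to avoid log(0) and is not part of the definitions.

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """Cross-entropy H(p, q) = -sum_i p_i * log q_i between target p and prediction q."""
    return -np.sum(p * np.log(q + eps))

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) = H(p, q) - H(p); equals the cross-entropy when p is one-hot."""
    return cross_entropy(p, q, eps) - cross_entropy(p, p, eps)

target = np.array([1.0, 0.0, 0.0])        # correct probs: all mass on cat
pred   = np.array([0.13, 0.87, 0.00])     # softmax output from the slide
print(cross_entropy(target, pred))        # ≈ 2.04, same as L_i = -log(0.13)
print(kl_divergence(target, pred))        # equal here, since H(target) = 0 for a one-hot target
```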
Plan for Today • Regularization • Neural Networks • Optimization • Computing Gradients (C) Dhruv Batra and Zsolt Kira 26
Regularization
L(W) = (1/N) Σ_i L_i(f(x_i, W), y_i) + λ R(W)
Data loss: model predictions should match training data. Regularization: prevent the model from doing too well on training data. λ = regularization strength (hyperparameter).
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
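A sketch of the full objective, data loss plus λ R(W); the L2 penalty and the λ value below are illustrative choices, not something fixed by the slide.

```python
import numpy as np

def l2_regularized_loss(W, per_example_losses, reg_lambda):
    """L(W) = (1/N) * sum_i L_i  +  lambda * R(W), using R(W) = sum of squared weights.

    per_example_losses: data-loss values L_i already computed for each training example
    (e.g. the hinge losses 2.9, 0.0, 12.9 from the SVM example earlier).
    """
    data_loss = np.mean(per_example_losses)
    reg_loss = reg_lambda * np.sum(W * W)   # L2 penalty: prefers smaller, more spread-out weights
    return data_loss + reg_loss

W = np.array([[0.2, -0.5, 0.1,  2.0],
              [1.5,  1.3, 2.1,  0.0],
              [0.0, 0.25, 0.2, -0.3]])
print(l2_regularized_loss(W, [2.9, 0.0, 12.9], reg_lambda=0.1))
```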
Model Complexity
(Figure: a set of training points plotted in the x–y plane.)
30 Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Polynomial Regression
(Figure: a curve f fit through the training points in the x–y plane.)
31 Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Regularization: Prefer Simpler Models
(Figure: two fits to the same points in the x–y plane, a complex curve f1 that passes through every training point and a simpler curve f2.)
Regularization pushes against fitting the data too well so we don’t fit noise in the data.
32 Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
Polynomial Regression (C) Dhruv Batra and Zsolt Kira 33
Polynomial Regression
• Demo: https://arachnoid.com/polysolve/
• Data, as (x, y) pairs: (10, 6), (15, 9), (20, 11), (25, 12), (29, 13), (40, 11), (50, 10), (60, 9)
(C) Dhruv Batra and Zsolt Kira 35
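A quick way to reproduce the demo's behavior on this data with NumPy; the polynomial degrees chosen below are illustrative.

```python
import numpy as np

# Fit the demo data above with polynomials of increasing degree to see
# how higher-degree (more complex) models chase individual points.
x = np.array([10, 15, 20, 25, 29, 40, 50, 60], dtype=float)
y = np.array([ 6,  9, 11, 12, 13, 11, 10,  9], dtype=float)

for degree in (1, 2, 7):
    coeffs = np.polyfit(x, y, degree)        # least-squares polynomial fit
    y_hat = np.polyval(coeffs, x)
    mse = np.mean((y - y_hat) ** 2)
    print(f"degree {degree}: training MSE = {mse:.4f}")
# The degree-7 fit can pass (nearly) through all 8 points, giving near-zero training error
# (NumPy may warn it is poorly conditioned), but it oscillates between and beyond them --
# exactly the behavior regularization discourages.
```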
Regularization
L(W) = (1/N) Σ_i L_i(f(x_i, W), y_i) + λ R(W)
Data loss: model predictions should match training data. Regularization: prevent the model from doing too well on training data. λ = regularization strength (hyperparameter).
37 Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n