Administrative
- A2 is out. It was 2 days late, so the due date will be shifted by ~2 days.
- We updated the project page with many pointers to datasets.
Backpropagation (recursive chain rule)
Mini-batch gradient descent
Loop:
1. Sample a batch of data
2. Backprop to calculate the analytic gradient
3. Perform a parameter update
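A minimal sketch of this loop in numpy-style Python. The loss_and_grad function, batch size, and learning rate are illustrative placeholders, not the course's actual training code.

```python
import numpy as np

def sgd_loop(X_train, y_train, W, loss_and_grad,
             learning_rate=1e-3, batch_size=256, num_iters=1000):
    """Vanilla mini-batch gradient descent (illustrative sketch)."""
    N = X_train.shape[0]
    for it in range(num_iters):
        # 1. sample a batch of data
        idx = np.random.choice(N, batch_size, replace=False)
        X_batch, y_batch = X_train[idx], y_train[idx]
        # 2. backprop to calculate the analytic gradient
        loss, grad = loss_and_grad(X_batch, y_batch, W)
        # 3. perform a parameter update
        W -= learning_rate * grad
    return W
```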
A bit of history
Widrow and Hoff, ~1960: Adaline
A bit of history
Rumelhart et al., 1986: the first time back-propagation became popular (recognizable maths)
A bit of history
[Hinton and Salakhutdinov, 2006]: reinvigorated research in Deep Learning
Training Neural Networks
Step 1: Preprocess the data
(Assume X [N x D] is the data matrix, with each example in a row.)
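A short sketch of the usual zero-centering and normalization, assuming X is the [N x D] data matrix described above:

```python
import numpy as np

# X: [N x D] data matrix, one example per row (assumed to exist already)
X = X.astype(np.float64)
X -= np.mean(X, axis=0)               # zero-center: subtract the per-feature mean
X /= (np.std(X, axis=0) + 1e-8)       # normalize each feature (epsilon avoids division by zero)
```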
Step 1: Preprocess the data
In practice, you may also see PCA and whitening of the data (PCA: the data has a diagonal covariance matrix; whitening: the covariance matrix is the identity matrix).
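A sketch of how PCA and whitening are typically computed, assuming X has already been zero-centered as above:

```python
import numpy as np

# X: [N x D] zero-centered data matrix (assumed from the previous step)
cov = np.dot(X.T, X) / X.shape[0]     # data covariance matrix, [D x D]
U, S, V = np.linalg.svd(cov)          # columns of U are the eigenvectors
Xrot = np.dot(X, U)                   # PCA: decorrelated data, diagonal covariance
Xwhite = Xrot / np.sqrt(S + 1e-5)     # whitened: covariance is ~identity
```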
Step 2: Choose the architecture
Say we start with one hidden layer of 50 neurons:
- input layer: CIFAR-10 images, 3072 numbers
- hidden layer: 50 hidden neurons
- output layer: 10 output neurons, one per class
Before we try training, let's initialize well:
- set weights to small random numbers (a matrix of small numbers drawn randomly from a Gaussian)
- set biases to zero
Warning: this is not optimal, but it is the simplest choice! (More on this later.)
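A minimal sketch of this initialization for the 3072 -> 50 -> 10 network above; the 0.0001 scale is illustrative:

```python
import numpy as np

D, H, C = 3072, 50, 10                 # input size, hidden neurons, classes
W1 = 0.0001 * np.random.randn(D, H)    # small random Gaussian weights
b1 = np.zeros(H)                       # biases set to zero
W2 = 0.0001 * np.random.randn(H, C)
b2 = np.zeros(C)
```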
Double check that the loss is reasonable:
Disable regularization. The loss function returns the loss and the gradient for all parameters. A loss of ~2.3 is the "correct" value for 10 classes (softmax, random weights).
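Why ~2.3: with regularization off and small random weights, the softmax classifier assigns roughly uniform probability to each class, so the expected loss is -log(1/10):

```python
import numpy as np

# expected initial softmax loss for 10 classes with ~uniform class probabilities
print(-np.log(1.0 / 10))   # ~2.3026
```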
Double check that the loss is reasonable:
Crank up the regularization. The loss went up, good. (sanity check)
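The reason the loss goes up: the L2 penalty is added on top of the data loss, so for any reg > 0 the total can only increase. A sketch, assuming the W1/W2 names and the unregularized data_loss from above; the 0.5 factor and the reg value are just one common convention:

```python
import numpy as np

reg = 1e3   # illustrative "cranked up" regularization strength
reg_loss = 0.5 * reg * (np.sum(W1 * W1) + np.sum(W2 * W2))
total_loss = data_loss + reg_loss   # strictly larger than data_loss whenever reg > 0
```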
Let's try to train now…
Tip: make sure that you can overfit a very small portion of the data.
The code on the slide (a rough sketch follows below):
- takes the first 20 examples from CIFAR-10
- turns off regularization (reg = 0.0)
- uses simple vanilla 'sgd'
Details:
- learning_rate_decay = 1 means no decay; the learning rate stays constant
- sample_batches = False means we're doing full gradient descent, not mini-batch SGD
- we'll perform 200 updates (epochs = 200)
"epoch": the number of times we see the training set
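A rough sketch of what that setup does; the model/trainer names below are hypothetical stand-ins for the assignment's code, not its real API:

```python
# X_train, y_train and `model` are assumed to exist; names are illustrative
X_tiny, y_tiny = X_train[:20], y_train[:20]   # first 20 CIFAR-10 examples

learning_rate = 1e-3                          # constant: learning_rate_decay = 1
for epoch in range(200):                      # 200 full-batch updates (epochs = 200)
    # reg = 0.0 turns regularization off; full gradient descent, no mini-batches
    loss, grads = model.loss_and_grads(X_tiny, y_tiny, reg=0.0)
    for name in grads:                        # simple vanilla 'sgd' update
        model.params[name] -= learning_rate * grads[name]
# success criterion: loss near zero, training accuracy of 1.00
```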
Let's try to train now…
Tip: make sure that you can overfit a very small portion of the data.
Very small loss, train accuracy 1.00, nice!
Let's try to train now…
I like to start with small regularization and find a learning rate that makes the loss go down.
- loss not going down: learning rate too low
- loss exploding: learning rate too high
Let's try to train now…
I like to start with small regularization and find a learning rate that makes the loss go down.
- loss not going down: learning rate too low
- loss exploding: learning rate too high
Loss barely changing: the learning rate must be too low (could also be reg too high). Notice that train/val accuracy goes to 20% though; what's up with that? (Remember this is softmax.)
Let's try to train now…
I like to start with small regularization and find a learning rate that makes the loss go down.
- loss not going down: learning rate too low
- loss exploding: learning rate too high
Okay, now let's try learning rate 1e6. What could possibly go wrong?
Let's try to train now…
I like to start with small regularization and find a learning rate that makes the loss go down.
- loss not going down: learning rate too low
- loss exploding: learning rate too high
cost: NaN almost always means the learning rate is too high...
Let's try to train now…
I like to start with small regularization and find a learning rate that makes the loss go down.
- loss not going down: learning rate too low
- loss exploding: learning rate too high
3e-3 is still too high; the cost explodes…
=> A rough range for the learning rate we should be cross-validating is somewhere in [1e-3 … 1e-5].
Cross-validation strategy
I like to do coarse -> fine cross-validation in stages:
First stage: only a few epochs to get a rough idea of what params work.
Second stage: longer running time, finer search.
… (repeat as necessary)
Tip for detecting explosions in the solver: if the cost is ever > 3 * the original cost, break out early.
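A sketch of that early-break check inside the coarse stage; model, train_one_epoch, original_cost, lr, and reg are hypothetical stand-ins for the assignment's solver, and only the break logic is the point here:

```python
for epoch in range(num_epochs):
    cost = train_one_epoch(model, lr, reg)   # one pass over the training data
    if cost > 3 * original_cost:             # the cost has exploded
        break                                # give up on this hyperparameter setting early
```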
For example: run a coarse search for 5 epochs. Note that it's best to optimize in log space.
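A sketch of sampling hyperparameters in log space for the coarse search; the exact ranges are illustrative:

```python
import numpy as np

for trial in range(100):
    # sample the exponent uniformly, so values are spread evenly in log space
    lr = 10 ** np.random.uniform(-6, -3)    # learning rate
    reg = 10 ** np.random.uniform(-5, 5)    # regularization strength
    # ... train for ~5 epochs with (lr, reg) and record the validation accuracy
```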
Now run a finer search... (adjust the range)
53% is relatively good for a 2-layer neural net with 50 hidden neurons.
Now run a finer search... (adjust the range)
53% is relatively good for a 2-layer neural net with 50 hidden neurons. But this best cross-validation result is worrying. Why? (Hint: look at where the best hyperparameters fall relative to the edges of the searched range.)
Normally you can't afford a huge computational budget for expensive cross-validations. You need to rely more on intuitions and visualizations…
Visualizations to play with:
- loss function
- validation and training accuracy
- min, max, std of the values and of the updates (and monitor their ratio)
- first-layer visualization of the weights (if working with images)
Monitor and visualize the loss curve.
If it looks too linear: the learning rate is low. If it doesn't decrease much: the learning rate might be too high.
Monitor and visualize the loss curve.
If it looks too linear: the learning rate is low. If it doesn't decrease much: the learning rate might be too high.
The "width" of the curve is related to the batch size. This one looks too wide (noisy) => you might want to increase the batch size.
Monitor and visualize the accuracy:
- big gap between train and validation accuracy = overfitting => increase the regularization strength
- no gap => increase the model capacity
Track the ratio of the weight update magnitudes to the weight magnitudes:
(the plot shows the max, mean, and min over time)
The ratio of the updates to the values: ~ 0.0002 / 0.02 = 0.01 (about okay).
You want this to be somewhere around 0.01 - 0.001 or so.
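One way to compute this ratio for a single weight matrix (the slide plots the min/mean/max of the raw values over time; the norm-based summary below is a common alternative), assuming W, its gradient dW, and learning_rate come from the training loop:

```python
import numpy as np

param_scale = np.linalg.norm(W.ravel())       # scale of the parameter values
update = -learning_rate * dW                  # the actual vanilla SGD update
update_scale = np.linalg.norm(update.ravel()) # scale of the update
print(update_scale / param_scale)             # want roughly 1e-3 .. 1e-2
```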