Lecture 3: Loss functions and Optimization
Fei-Fei Li & Andrej Karpathy & Justin Johnson, Lecture 3, 11 Jan 2016
Administrative: A1 is due Jan 20 (Wednesday), ~9 days left. Warning: Jan 18 (Monday) is a holiday (no class/office hours).
Recall from last time… Challenges in Visual Recognition: deformation, camera pose, illumination, occlusion, intraclass variation, background clutter.
Recall from last time… the data-driven approach, kNN.
[Figure panels: the data; NN classifier; 5-NN classifier]
Recall from last time… Linear classifier: f(x, W) takes an image x, a [32x32x3] array of numbers 0...1 (3072 numbers total), and parameters W, and outputs 10 numbers indicating class scores.
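A minimal numpy sketch of the shapes involved, assuming the linear form f(x, W) = Wx with the bias term omitted (the slide names only x and W). The random image and weights are placeholders, and the function name `f` is used only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder input: a 32x32x3 image with values in [0, 1).
image = rng.random((32, 32, 3))
x = image.reshape(-1)  # flatten to a vector of 3072 numbers

# Placeholder parameters: one row of 3072 weights per class (10 classes).
W = 0.01 * rng.standard_normal((10, 3072))

def f(x, W):
    """Class scores, assuming the linear form f(x, W) = W x (no bias)."""
    return W @ x

scores = f(x, W)
print(x.shape, scores.shape)  # (3072,) (10,)
```

Each of the 10 scores is a dot product between one row of W and the flattened image, so W has one row per class.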
Recall from last time… Going forward: Loss function/Optimization
[Figure: 10 class scores for each of three example images]
TODO:
1. Define a loss function that quantifies our unhappiness with the scores across the training data.
2. Come up with a way of efficiently finding the parameters that minimize the loss function (optimization).
Suppose: 3 training examples, 3 classes. With some W the scores are:

         cat image   car image   frog image
cat         3.2         1.3         2.2
car         5.1         4.9         2.5
frog       -1.7         2.0        -3.1
Multiclass SVM loss: given an example $(x_i, y_i)$, where $x_i$ is the image and $y_i$ is the (integer) label, and using the shorthand $s = f(x_i, W)$ for the scores vector, the SVM loss has the form:
$$L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$$
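The formula translates almost term by term into a loop over classes. This is a sketch: `svm_loss_loop` and its `delta` argument (the margin, 1 in the formula) are hypothetical names, and the class order (cat, car, frog) follows the score table.

```python
import numpy as np

def svm_loss_loop(scores, y, delta=1.0):
    """Multiclass SVM loss L_i for a single example.

    scores: 1-D array of class scores s = f(x_i, W)
    y: integer index of the correct class y_i
    delta: the margin (the "+ 1" in the formula)
    """
    loss = 0.0
    for j in range(scores.shape[0]):
        if j == y:
            continue  # the sum runs over j != y_i only
        loss += max(0.0, scores[j] - scores[y] + delta)
    return float(loss)

# Score vector of the cat image (one column of the table), class order (cat, car, frog).
cat_scores = np.array([3.2, 5.1, -1.7])
print(svm_loss_loop(cat_scores, 0))
```

Applied to the cat and car columns of the table, it gives about 2.9 and 0, matching the hand computations.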
For the cat image:
$L_1 = \max(0, 5.1 - 3.2 + 1) + \max(0, -1.7 - 3.2 + 1) = \max(0, 2.9) + \max(0, -3.9) = 2.9 + 0 = 2.9$
Losses: 2.9
For the car image:
$L_2 = \max(0, 1.3 - 4.9 + 1) + \max(0, 2.0 - 4.9 + 1) = \max(0, -2.6) + \max(0, -1.9) = 0 + 0 = 0$
Losses: 2.9, 0
For the frog image:
$L_3 = \max(0, 2.2 - (-3.1) + 1) + \max(0, 2.5 - (-3.1) + 1) = \max(0, 6.3) + \max(0, 6.6) = 6.3 + 6.6 = 12.9$
Losses: 2.9, 0, 12.9

The full training loss is the mean over all examples in the training data:
$$L = \frac{1}{N} \sum_{i=1}^{N} L_i$$
$L = (2.9 + 0 + 12.9)/3 \approx 5.27$

Q1: what if the sum was instead over all classes (including $j = y_i$)?
Q2: what if we used a mean instead of a sum over $j$ in $L_i$?
Q3: what if we used …
Q4: what is the min/max possible loss?
Q5: usually at initialization the entries of W are small numbers, so all $s \approx 0$. What is the loss?
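Q1 and Q5 can be probed numerically. This sketch assumes a vectorized per-example loss; `svm_loss_i` and its `include_correct` flag are hypothetical names, the flag existing only to toggle the $j = y_i$ term.

```python
import numpy as np

def svm_loss_i(scores, y, delta=1.0, include_correct=False):
    """Per-example multiclass SVM loss.

    include_correct=True keeps the j = y_i term (Q1). That term is
    max(0, s_y - s_y + delta) = delta, so it always adds exactly delta.
    """
    margins = np.maximum(0.0, scores - scores[y] + delta)
    if not include_correct:
        margins[y] = 0.0  # drop the j = y_i term, as in the formula
    return float(margins.sum())

# Q5: with all scores 0, each j != y_i term is max(0, 0 - 0 + 1) = 1,
# so the loss equals (number of classes - 1).
print(svm_loss_i(np.zeros(3), 0))  # 2.0

# Q1: summing over all classes shifts every L_i up by delta = 1.
frog_scores = np.array([2.2, 2.5, -3.1])
print(svm_loss_i(frog_scores, 2, include_correct=True) - svm_loss_i(frog_scores, 2))
```

The Q5 value is a handy sanity check: a freshly initialized classifier with C classes should report a loss near C - 1.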
Example numpy code:
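A sketch of what such code might look like (not necessarily the code on the slide): a fully vectorized version that computes the mean loss over a batch without Python loops. `svm_loss_batch` is a hypothetical name, and scores are laid out one row per example (N x C), the transpose of the table above.

```python
import numpy as np

def svm_loss_batch(scores, y, delta=1.0):
    """Mean multiclass SVM loss over a batch.

    scores: array of shape (N, C), one row of class scores per example
    y: integer array of shape (N,), the correct class of each row
    """
    n = scores.shape[0]
    rows = np.arange(n)
    correct = scores[rows, y]                                # s_{y_i} per row, shape (N,)
    margins = np.maximum(0.0, scores - correct[:, None] + delta)  # broadcast over classes
    margins[rows, y] = 0.0                                   # drop the j = y_i terms
    return float(margins.sum(axis=1).mean())                 # L_i per row, then the mean

# Rows are the cat, car and frog images; columns are class scores (cat, car, frog).
scores = np.array([[3.2, 5.1, -1.7],
                   [1.3, 4.9, 2.0],
                   [2.2, 2.5, -3.1]])
y = np.array([0, 1, 2])
print(svm_loss_batch(scores, y))  # approximately 5.27
```

On the three-example table this gives about 5.27, i.e. (2.9 + 0 + 12.9)/3. Indexing with `scores[rows, y]` picks one correct-class score per row, and `correct[:, None]` broadcasts it across that row's classes.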
There is a bug with the loss: e.g., suppose that we found a W such that L = 0. Is this W unique?
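One way to see that W is not unique, assuming the linear score function f(x, W) = Wx: scaling W by any c ≥ 1 scales every score difference by c, so margins that already held ($s_{y_i} - s_j \ge 1$) still hold and L stays 0. The tiny W, x and label below are made-up values for illustration, and `svm_loss_i` is a hypothetical helper.

```python
import numpy as np

def svm_loss_i(scores, y, delta=1.0):
    """Per-example multiclass SVM loss."""
    margins = np.maximum(0.0, scores - scores[y] + delta)
    margins[y] = 0.0  # skip the j = y_i term
    return float(margins.sum())

# Made-up 3-class problem with 2-D inputs and linear scores s = W @ x.
W = np.array([[ 1.0,  0.0],
              [ 0.0,  1.0],
              [-1.0, -1.0]])
x = np.array([0.5, 3.0])
y = 1  # correct class

for c in (1.0, 2.0, 10.0):
    print(c, svm_loss_i((c * W) @ x, y))  # loss is 0.0 for every scale
```

Here W, 2W and 10W all reach zero loss, so the loss alone cannot pick among them.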