Administrative
- How is the assignment going?
- btw, the notes get updated all the time based on your feedback
- no lecture on Monday
Lecture 4: Optimization
Fei-Fei Li & Andrej Karpathy, 7 Jan 2015
Image Classification: assume a given set of discrete labels {dog, cat, truck, plane, ...}. [Figure: example image classified as "cat"]
Data-driven approach
Three key components of training Neural Nets:
1. Score function
2. Loss function
3. Optimization
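A minimal sketch of the linear score function from the previous lectures, f(x, W) = Wx + b (the shapes below assume CIFAR-10: 10 classes and 32x32x3 = 3072-dimensional flattened images; the weight scale is an assumption):

```python
import numpy as np

# Linear score function: maps a flattened image to one score per class.
def scores(W, b, x):
    return W.dot(x) + b  # (10, 3072) x (3072,) + (10,) -> (10,)

W = np.random.randn(10, 3072) * 0.0001  # small random weights (assumed init)
b = np.zeros(10)
x = np.random.randn(3072)               # stand-in for a flattened image
print(scores(W, b, x).shape)            # (10,)
```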
Brief aside: Image Features
- In practice, it is very rare to see Computer Vision applications that train linear classifiers directly on pixel values.
Example: Color (Hue) Histogram. Bin the hue channel; each pixel adds +1 to its hue bin.
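A hedged sketch of this feature (the number of bins is an assumption, not from the slide):

```python
import numpy as np

# Color (hue) histogram feature: count pixels per hue bin (+1 per pixel),
# then normalize so images of different sizes are comparable.
def hue_histogram(hue, n_bins=16):
    # hue: per-pixel hue values in [0, 1)
    hist, _ = np.histogram(hue, bins=n_bins, range=(0.0, 1.0))
    return hist / hist.sum()

hue = np.random.rand(32 * 32)  # stand-in for the hue channel of a 32x32 image
print(hue_histogram(hue))
```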
Example: HOG features. In each 8x8 pixel region, quantize the edge orientation into 9 bins. (images from vlfeat.org)
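One way to compute such descriptors in Python is scikit-image's hog (an illustration, not the vlfeat implementation shown on the slide; the 8x8 cells and 9 orientation bins match the slide, the remaining parameters are defaults):

```python
import numpy as np
from skimage.feature import hog

image = np.random.rand(32, 32)          # stand-in grayscale patch
features = hog(image,
               orientations=9,          # quantize edge orientation into 9 bins
               pixels_per_cell=(8, 8),  # 8x8 pixel regions
               cells_per_block=(1, 1))
print(features.shape)                   # 4x4 cells * 9 bins = (144,)
```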
Example: Bag of Words
1. Resize patch to a fixed size (e.g. 32x32 pixels)
2. Extract HOG on the patch (get 144 numbers)
Repeat for each detected feature: this gives a matrix of size [number_of_features x 144].
Problem: different images will have different numbers of features, but we need fixed-sized vectors for linear classification.
Example: Bag of Words. Learn k-means centroids over the 144-d visual word vectors (a "vocabulary" of visual words, e.g. 1000 centroids); each image then becomes a fixed-size 1000-d histogram of visual words.
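A sketch of the two stages under these assumptions (scikit-learn's KMeans stands in for whatever clustering implementation is used; `all_descriptors` is assumed to be the stacked [total_features x 144] matrix from the previous step):

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(all_descriptors, k=1000):
    # Learn k-means centroids: the "vocabulary" of visual words.
    return KMeans(n_clusters=k).fit(all_descriptors)

def encode(kmeans, descriptors):
    # Assign each descriptor to its nearest centroid, then histogram the
    # assignments: every image becomes a fixed-size k-d vector.
    words = kmeans.predict(descriptors)
    hist = np.bincount(words, minlength=kmeans.n_clusters)
    return hist / max(1, hist.sum())
```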
Brief aside: Image Features
Most recognition systems are built on the same architecture. CNNs: end-to-end models. (slide from Yann LeCun)
Visualizing the (SVM) loss function
Visualizing the (SVM) loss function: the full data loss is

L = \frac{1}{N} \sum_{i=1}^{N} \sum_{j \neq y_i} \max\left(0,\; w_j^\top x_i - w_{y_i}^\top x_i + 1\right)
Visualizing the (SVM) loss function. Suppose there are 3 examples with 3 classes (classes 0, 1, 2 in sequence); then the full data loss becomes:

L = \frac{1}{3} \big[ \max(0, w_1^\top x_0 - w_0^\top x_0 + 1) + \max(0, w_2^\top x_0 - w_0^\top x_0 + 1)
  + \max(0, w_0^\top x_1 - w_1^\top x_1 + 1) + \max(0, w_2^\top x_1 - w_1^\top x_1 + 1)
  + \max(0, w_0^\top x_2 - w_2^\top x_2 + 1) + \max(0, w_1^\top x_2 - w_2^\top x_2 + 1) \big]

Question: CIFAR-10 has 50,000 training images (5,000 per class, 10 labels). How many occurrences of one classifier row are there in the full data loss?
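A vectorized numpy sketch of this computation for the 3-example, 3-class setup (the score values are placeholders; delta = 1 as in the formula above):

```python
import numpy as np

def svm_loss(scores, y, delta=1.0):
    # scores: [num_classes x num_examples]; y: correct label per example
    n = scores.shape[1]
    correct = scores[y, np.arange(n)]                  # score of the correct class
    margins = np.maximum(0, scores - correct + delta)  # hinge term per class
    margins[y, np.arange(n)] = 0                       # skip j == y_i
    return margins.sum() / n                           # average data loss

scores = np.random.randn(3, 3)   # placeholder scores, 3 classes x 3 examples
y = np.array([0, 1, 2])          # classes 0, 1, 2 in sequence, as above
print(svm_loss(scores, y))
```

One way to count for the question above: a row w_j appears once in each of the 45,000 examples where class j is incorrect, and in all 9 terms of each of the 5,000 examples where it is the correct class, i.e. 45,000 + 45,000 = 90,000 occurrences.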
Optimization
Strategy #1: A first very bad idea solution: Random search. (what's up with 0.0001?)
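A sketch of what that search might look like (the loss function L and the data X_train, Y_train are assumed to exist; the 0.0001 on the slide is presumably the scale of the random weights, keeping them small):

```python
import numpy as np

# Random search: try random weight matrices, remember the best one.
# X_train is assumed to be [3073 x 50,000] with a bias row folded in,
# hence 3073; L is an assumed loss function over the whole training set.
bestloss = float('inf')
bestW = None
for num in range(1000):
    W = np.random.randn(10, 3073) * 0.0001  # generate small random parameters
    loss = L(X_train, Y_train, W)           # loss over the training data
    if loss < bestloss:
        bestloss = loss
        bestW = W
```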
Let's see how well this works on the test set...
Fun aside: When W = 0, what is the CIFAR-10 loss for SVM and Softmax?
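A quick sanity check, assuming a margin of 1 for the SVM and CIFAR-10's 10 classes: with W = 0 every score is 0, so

```latex
% SVM: each of the 9 incorrect classes contributes max(0, 0 - 0 + 1) = 1
L_{\text{SVM}} = \frac{1}{N}\sum_i \sum_{j \neq y_i} \max(0,\, 0 - 0 + 1) = 9
% Softmax: all 10 classes get equal probability 1/10
L_{\text{softmax}} = -\log\frac{1}{10} = \log 10 \approx 2.3
```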
Strategy #2: A better but still very bad idea solution: Random local search. Gives 21.4%!
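A sketch of random local search under the same assumptions as before (L, X_train, Y_train assumed): start at a random W, perturb it, and keep the perturbation only if it lowers the loss:

```python
import numpy as np

# Random local search: take a random step and keep it only when it helps.
W = np.random.randn(10, 3073) * 0.001   # start at a random point
bestloss = float('inf')
for i in range(1000):
    step_size = 0.0001
    Wtry = W + np.random.randn(10, 3073) * step_size  # random nearby point
    loss = L(X_train, Y_train, Wtry)                  # assumed loss function
    if loss < bestloss:
        W = Wtry
        bestloss = loss
```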
Strategy #3: Following the gradient
In 1 dimension, the derivative of a function:

\frac{df(x)}{dx} = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}

In multiple dimensions, the gradient is the vector of partial derivatives.
Evaluating the gradient numerically: the finite difference approximation

\frac{\partial f}{\partial x_i} \approx \frac{f(x + h e_i) - f(x)}{h} \quad \text{for small } h

In practice, use the "centered difference formula":

\frac{\partial f}{\partial x_i} \approx \frac{f(x + h e_i) - f(x - h e_i)}{2h}
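A sketch of the numerical gradient using the centered difference formula (f is any function that maps a parameter array to a scalar loss):

```python
import numpy as np

def eval_numerical_gradient(f, x, h=1e-5):
    # Evaluate (f(x + h) - f(x - h)) / 2h independently in every dimension.
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        ix = it.multi_index
        old = x[ix]
        x[ix] = old + h
        fxph = f(x)                         # f(x + h)
        x[ix] = old - h
        fxmh = f(x)                         # f(x - h)
        x[ix] = old                         # restore the original value
        grad[ix] = (fxph - fxmh) / (2 * h)  # centered difference slope
        it.iternext()
    return grad
```

Note the cost: two loss evaluations per parameter dimension, which is why the numerical gradient is far too slow for anything but checking an analytic gradient.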
Performing a parameter update
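A minimal sketch of one update step (step_size, also called the learning rate, is a hyperparameter whose value here is assumed; loss_fun and W are assumed to be defined as above):

```python
# Step in the direction of the negative gradient, since the gradient
# points uphill and we want the loss to decrease.
step_size = 1e-3                                     # assumed value
weights_grad = eval_numerical_gradient(loss_fun, W)  # gradient at current W
W += -step_size * weights_grad                       # parameter update
```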
[Figure: weight space, showing the original W and a step in the negative gradient direction]