Lecture 14: Linear Classifiers
Justin Johnson, EECS 442 WI 2020, February 25, 2020
Administrative
• HW3 due Wednesday, March 4, 11:59pm
• TAs will not be checking Piazza over Spring Break. You are strongly encouraged to finish the assignment by Friday, February 28.
Last Time: Supervised Learning
1. Collect a dataset of images and labels
2. Use machine learning to train a classifier
3. Evaluate the classifier on new images
(Figure: example training set)
Last Time: Least Squares
Training $(x_i, y_i)$: $\arg\min_w \|y - Xw\|_2^2$, or equivalently $\arg\min_w \sum_{i=1}^{n} (w^T x_i - y_i)^2$
Inference $(x)$: compute $w^T x = w_1 x_1 + \cdots + w_d x_d$
Testing/Inference: given a new input, what's the prediction?
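A minimal NumPy sketch of this train/inference split (variable names and the synthetic data are illustrative, not from the slides):

import numpy as np

# Synthetic training data: N examples, d features
X = np.random.randn(100, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * np.random.randn(100)

# Training: w = argmin_w ||y - Xw||^2
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Inference: the prediction for a new input x is w^T x
x_new = np.array([0.2, 1.0, -0.3])
y_pred = w @ x_new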
Last Time: Regularization
Objective: $\arg\min_w \|y - Xw\|_2^2 + \lambda \|w\|_2^2$ (loss + trade-off weight × regularization)
What happens (and why) if:
• λ = 0: plain least squares
• λ = ∞: w = 0
• 0 < λ < ∞: something sensible?
Hyperparameters
Objective: $\arg\min_w \|y - Xw\|_2^2 + \lambda \|w\|_2^2$ (loss + trade-off weight × regularization)
w is a parameter, since we optimize for it on the training set.
λ is a hyperparameter, since we choose it before fitting the training set.
Choosing Hyperparameters
Idea #1: Choose hyperparameters that work best on the training data. BAD: λ = 0 always works best on the training data.
Idea #2: Split data into train and test; choose hyperparameters that work best on test data. BAD: no idea how we will perform on new data.
Idea #3: Split data into train, val, and test; choose hyperparameters on val and evaluate on test. Better!
Choosing Hyperparameters
Idea #4: Cross-Validation: split data into folds, try each fold as validation and average the results.
(Figure: the training data split into fold 1 through fold 5 plus a held-out test set; each fold takes a turn as the validation set.)
Useful for small datasets, but (unfortunately) not used too frequently in deep learning.
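A rough sketch of the fold rotation in NumPy (the fit and error functions are placeholders you would supply, not names from the slides):

import numpy as np

def cross_validate(X, y, fit, error, n_folds=5):
    # Split the example indices into n_folds roughly equal folds
    folds = np.array_split(np.random.permutation(len(X)), n_folds)
    errs = []
    for k in range(n_folds):
        val = folds[k]
        train = np.concatenate(folds[:k] + folds[k + 1:])
        model = fit(X[train], y[train])            # fit on everything except fold k
        errs.append(error(model, X[val], y[val]))  # validate on fold k
    return np.mean(errs)                           # average over the folds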
Training and Testing
Fit model parameters on the training set; find hyperparameters by testing on the validation set; evaluate on an entirely unseen test set.
Training: use these data points to fit $w^* = (X^T X + \lambda I)^{-1} X^T y$
Validation: evaluate on these points for different λ, pick the best
Test: evaluate once on these points at the end
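The closed-form solution above is one linear solve in NumPy (a sketch; the split names X_train/X_val are assumptions):

import numpy as np

def fit_ridge(X, y, lam):
    # w* = (X^T X + lam I)^{-1} X^T y, computed via a solve rather than an explicit inverse
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Sweep lambda, keep whichever does best on the validation split:
# best_lam = min([0.01, 0.1, 1.0, 10.0],
#                key=lambda lam: np.mean((X_val @ fit_ridge(X_train, y_train, lam) - y_val) ** 2))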
Image Classification
Start with the simplest example: binary classification. Cat or not cat?
Input x = (x_1, x_2, ..., x_N): actually a feature vector representing the image.
Classification with Least Squares
Treat it as regression: x_i is an image feature; y_i is 1 if it's a cat, 0 if it's not a cat. Minimize the least-squares loss.
Training $(x_i, y_i)$: $\arg\min_w \sum_{i=1}^{n} (w^T x_i - y_i)^2$
Inference $(x)$: predict cat if $w^T x > t$ for some threshold t
Unprincipled in theory, but often effective in practice. The reverse (regression via discrete bins) is also common.
Rifkin, Yeo, Poggio. Regularized Least Squares Classification (http://cbcl.mit.edu/publications/ps/rlsc.pdf). 2003.
Redmon, Divvala, Girshick, Farhadi. You Only Look Once: Unified, Real-Time Object Detection. CVPR 2016.
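A hedged sketch of this recipe (the 0.5 threshold is an assumption for illustration, not a value from the slides):

import numpy as np

def train_ls_classifier(X, y):
    # X: N x d feature matrix, y: 0/1 labels; fit by plain least squares
    w, *_ = np.linalg.lstsq(X, y.astype(float), rcond=None)
    return w

def predict_cat(w, x, t=0.5):
    # Predict "cat" when the regression output clears the threshold t
    return (w @ x) > t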
Classification via Memorization
Just memorize (as in a Python dictionary). Consider cat/dog/hippo classification:
If this image: cat. If this image: dog. If this image: hippo.
Classification via Memorization
Where does this go wrong? Rule: if this image, then cat. But the test image: hmmm, not quite the same.
Classification via Memorization
Given a test image $I_T$ and known images $I_1, \ldots, I_N$ with known labels (Cat, ..., Dog):
(1) compute the distance $D(I_i, I_T)$ between feature vectors,
(2) find the nearest known image,
(3) use its label. Here: Cat!
Nearest Neighbor "Algorithm"
Training $(x_i, y_i)$: memorize the training set.
Inference $(x)$:
bestDist, prediction = float('inf'), None
for i in range(N):
    d = dist(X[i], x)
    if d < bestDist:
        bestDist, prediction = d, y[i]
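The same lookup vectorized with NumPy (a sketch; Xtr and ytr are assumed arrays holding the memorized training set):

import numpy as np

def nearest_neighbor_predict(Xtr, ytr, x):
    # Distance from the query x to every stored training example (L2 here)
    dists = np.linalg.norm(Xtr - x, axis=1)
    # Copy the label of the closest one
    return ytr[np.argmin(dists)]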
Nearest Neighbor Decision Boundaries
Nearest-neighbor decision boundaries in two dimensions: points are training examples, colors give training labels, and background colors give the category a test point at x = (x_0, x_1) would be assigned.
The decision boundary is the boundary between two classification regions.
Decision boundaries can be noisy and affected by outliers. How to smooth them out? Use more neighbors!
K-Nearest Neighbors (K = 1 vs. K = 3)
Instead of copying the label from the nearest neighbor, take a majority vote from the K closest points.
K-Nearest Neighbors (K = 1 vs. K = 3)
Using more neighbors helps smooth out rough decision boundaries.
K-Nearest Neighbors (K = 1 vs. K = 3)
Using more neighbors helps reduce the effect of outliers.
K-Nearest Neighbors (K = 1 vs. K = 3)
When K > 1 there can be ties! Need to break them somehow.
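A small K-NN sketch with one possible tie-break (among tied labels, take the one whose example is closest to the query; that choice is an assumption, not from the slides):

import numpy as np
from collections import Counter

def knn_predict(Xtr, ytr, x, k=3):
    dists = np.linalg.norm(Xtr - x, axis=1)
    nearest = np.argsort(dists)[:k]              # indices of the k closest, nearest first
    votes = Counter(ytr[nearest].tolist())       # label -> number of votes
    top = max(votes.values())
    tied = [label for label, count in votes.items() if count == top]
    for i in nearest:                            # break ties by distance
        if ytr[i] in tied:
            return ytr[i]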
K-Nearest Neighbors: Distance Metric
L1 (Manhattan) distance: $d(x, y) = \sum_i |x_i - y_i|$
L2 (Euclidean) distance: $d(x, y) = \left( \sum_i (x_i - y_i)^2 \right)^{1/2}$
K-Nearest Neighbors: Distance Metric
L1 (Manhattan) distance: $d(x, y) = \sum_i |x_i - y_i|$
L2 (Euclidean) distance: $d(x, y) = \left( \sum_i (x_i - y_i)^2 \right)^{1/2}$
(Figure: K = 1 decision boundaries under L1 vs. L2 distance.)
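Both metrics are one line each in NumPy (a sketch):

import numpy as np

def l1_distance(x, y):
    # Manhattan distance: sum of absolute coordinate differences
    return np.sum(np.abs(x - y))

def l2_distance(x, y):
    # Euclidean distance: square root of the summed squared differences
    return np.sqrt(np.sum((x - y) ** 2))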
K-Nearest Neighbors
What distance metric? What value for K? Choose them like any other hyperparameters:
Training: use these data points for lookup
Validation: evaluate on these points for different K and distance metrics, pick the best
Test: evaluate once at the end
K-Nearest Neighbors
• No learning going on, but usually effective
• Same algorithm for every task
• As the number of datapoints → ∞, the error rate is guaranteed to be at most 2x worse than the best you could do on the data
• Training is fast, but inference is slow. The opposite of what we want!
Linear Classifiers
Example setup: 3 classes (cat, dog, hippo). Model: one weight vector per class.
$w_0^T x$ big if cat
$w_1^T x$ big if dog
$w_2^T x$ big if hippo
Stack the weight vectors together into a matrix and compute $Wx$, where $x \in \mathbb{R}^F$.
Linear Classifiers
The weight matrix W is a collection of scoring functions, one per class; the prediction $Wx$ is a vector whose jth component is the "score" for the jth class.
Worked example from the diagram, with input features x = [56, 231, 24, 2] and per-class scores $w^T x + b$:
Cat weight vector   [0.2, -0.5, 0.1, 2.0],  bias  1.1  →  cat score   -96.8
Dog weight vector   [1.5,  1.3, 2.1, 0.0],  bias  3.2  →  dog score   437.9
Hippo weight vector [0.0,  0.3, 0.2, -0.3], bias -1.2  →  hippo score  61.95
Diagram by: Karpathy, Fei-Fei
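In code, the whole classifier is one matrix-vector product plus a bias (numbers follow the diagram above and are illustrative only):

import numpy as np

W = np.array([[0.2, -0.5, 0.1,  2.0],    # cat scoring function
              [1.5,  1.3, 2.1,  0.0],    # dog scoring function
              [0.0,  0.3, 0.2, -0.3]])   # hippo scoring function
b = np.array([1.1, 3.2, -1.2])           # one bias per class
x = np.array([56.0, 231.0, 24.0, 2.0])   # unrolled image features

scores = W @ x + b                        # one score per class
prediction = np.argmax(scores)            # index of the highest-scoring class (dog here)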
Linear Classifiers: Geometric Intuition
What does a linear classifier look like in 2D?
Be aware: intuition from 2D doesn't always carry over to high-dimensional spaces. See: On the Surprising Behavior of Distance Metrics in High Dimensional Space. Aggarwal, Hinneburg, Keim. ICDT 2001.
Diagram credit: Karpathy & Fei-Fei
Linear Classifiers: Visual Intuition
CIFAR-10: 32x32x3 images, 10 classes
• Turn each image into a feature vector by unrolling all pixels
• Train a linear model to recognize the 10 classes
Linear Classifiers: Visual Intuition
Decision rule is $w^T x$. If $w_i$ is big, then big values of $x_i$ are indicative of the class. Deer or plane?
Linear Classifiers: Visual Intuition
Decision rule is $w^T x$. If $w_i$ is big, then big values of $x_i$ are indicative of the class. Ship or dog?
Linear Classifiers: Visual Intuition
Decision rule is $w^T x$. If $w_i$ is big, then big values of $x_i$ are indicative of the class.
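One way to make this concrete is to reshape each class's learned weight vector back into image shape and look at it as a template: bright pixels mark locations and colors whose large values push that class's score up. This visualization is a standard trick rather than code from the slides, so treat it as a sketch (assumes a trained 10 x 3072 weight matrix W and a list of the 10 CIFAR-10 class names):

import numpy as np
import matplotlib.pyplot as plt

def show_templates(W, class_names):
    # W: 10 x 3072 (each row holds one class's weights over the 32*32*3 unrolled pixels)
    for j, name in enumerate(class_names):
        template = W[j].reshape(32, 32, 3)
        # Rescale to [0, 1] so large positive weights show up as bright pixels
        template = (template - template.min()) / (template.max() - template.min() + 1e-8)
        plt.subplot(2, 5, j + 1)
        plt.imshow(template)
        plt.title(name)
        plt.axis('off')
    plt.show()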