Lecture 14: Linear Classifiers



1. Lecture 14: Linear Classifiers. Justin Johnson, EECS 442 WI 2020, February 25, 2020.

2. Administrative. HW3 is due Wednesday, March 4 at 11:59pm. TAs will not be checking Piazza over Spring Break; you are strongly encouraged to finish the assignment by Friday, February 28.

3. Last Time: Supervised Learning. 1. Collect a dataset of images and labels. 2. Use machine learning to train a classifier. 3. Evaluate the classifier on new images. (Figure: example training set.)

4. Last Time: Least Squares. Training (x_i, y_i): w* = argmin_w ||y - Xw||_2^2, or equivalently argmin_w Σ_{i=1}^N (w^T x_i - y_i)^2, where w^T x = w_1 x_1 + ... + w_F x_F. Inference (x): given a new input x, what's the prediction? It is w^T x.
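
A minimal numpy sketch of this recap (the data here is random and hypothetical; only the shapes matter):

    import numpy as np

    # Training: solve argmin_w ||y - Xw||_2^2 on hypothetical data.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 4))   # 50 training examples, 4 features each
    y = rng.normal(size=50)        # regression targets
    w, *_ = np.linalg.lstsq(X, y, rcond=None)

    # Inference: for a new input x, the prediction is w^T x.
    x_new = rng.normal(size=4)
    prediction = w @ x_new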

5. Last Time: Regularization. Objective: argmin_w ||y - Xw||_2^2 + λ||w||_2^2 (the first term is the loss, λ is the trade-off, the second term is the regularization). What happens (and why) if λ = 0? Plain least squares. If λ = ∞? w = 0. Something in between is sensible.

6. Hyperparameters. Objective: argmin_w ||y - Xw||_2^2 + λ||w||_2^2. w is a parameter, since we optimize for it on the training set. λ is a hyperparameter, since we choose it before fitting the training set.

7. Choosing Hyperparameters. Idea #1: Choose hyperparameters that work best on the training data. BAD: λ = 0 always works best on training data. Idea #2: Split data into train and test; choose hyperparameters that work best on test data. BAD: No idea how we will perform on new data. Idea #3: Split data into train, val, and test; choose hyperparameters on val and evaluate on test. Better!

8. Choosing Hyperparameters. Idea #4: Cross-Validation: split the data into folds, try each fold as validation, and average the results (figure: five folds taking turns as the validation set, plus a held-out test set). Useful for small datasets, but (unfortunately) not used too frequently in deep learning.
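
A minimal sketch of the fold mechanics (fit_and_score is a hypothetical callback; any train/evaluate routine, such as the ridge fit on the next slide, could fill it in):

    import numpy as np

    def cross_val_score(num_points, num_folds, fit_and_score):
        # Split the example indices into folds; each fold takes a turn as the validation set.
        folds = np.array_split(np.arange(num_points), num_folds)
        scores = []
        for i in range(num_folds):
            val_idx = folds[i]
            train_idx = np.concatenate(folds[:i] + folds[i + 1:])
            scores.append(fit_and_score(train_idx, val_idx))  # hypothetical train/evaluate helper
        return np.mean(scores)  # average the per-fold results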

9. Training and Testing. Fit model parameters on the training set; find hyperparameters by testing on the validation set; evaluate on an entirely unseen test set. Training: use these data points to fit w* = (X^T X + λI)^{-1} X^T y. Validation: evaluate on these points for different λ and pick the best. Test: report performance on the held-out test points.
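
A minimal sketch of this protocol with the closed-form ridge solution above (data shapes and the candidate λ values are hypothetical):

    import numpy as np

    def fit_ridge(X, y, lam):
        # w* = (X^T X + lambda I)^{-1} X^T y
        return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

    def mse(X, y, w):
        return np.mean((X @ w - y) ** 2)

    rng = np.random.default_rng(0)
    X_train, y_train = rng.normal(size=(80, 5)), rng.normal(size=80)
    X_val, y_val = rng.normal(size=(20, 5)), rng.normal(size=20)

    # Fit on train for each candidate lambda; keep the one with the lowest validation error.
    lams = [0.0, 0.01, 0.1, 1.0, 10.0]
    best_lam = min(lams, key=lambda lam: mse(X_val, y_val, fit_ridge(X_train, y_train, lam)))
    w_star = fit_ridge(X_train, y_train, best_lam)  # finally, evaluate once on the test set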

10. Image Classification. Start with the simplest example: binary classification. The input (x_1, x_2, ..., x_N) is actually a feature vector representing the image. Cat or not cat?

11. Classification with Least Squares. Treat it as regression: x_i is the image feature; y_i is 1 if it's a cat, 0 if it's not a cat. Minimize the least-squares loss. Training (x_i, y_i): argmin_w Σ_{i=1}^N (w^T x_i - y_i)^2. Inference (x): predict cat if w^T x > t for some threshold t. Unprincipled in theory, but often effective in practice; the reverse (regression via discrete bins) is also common. Rifkin, Yeo, Poggio. Regularized Least Squares Classification (http://cbcl.mit.edu/publications/ps/rlsc.pdf), 2003. Redmon, Divvala, Girshick, Farhadi. You Only Look Once: Unified, Real-Time Object Detection. CVPR 2016.
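
A minimal sketch of this regression-as-classification idea (labels, features, and the threshold are hypothetical):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 16))              # 100 image feature vectors
    y = (rng.random(100) > 0.5).astype(float)   # 1 = cat, 0 = not cat

    # Training: least-squares regression onto the 0/1 labels.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)

    def predict_cat(x, t=0.5):
        # Inference: call it a cat when the regression output exceeds the threshold t.
        return (x @ w) > t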

12. Classification via Memorization. Just memorize (as in a Python dictionary). Consider cat/dog/hippo classification: if this image, cat; if this one, dog; if this one, hippo.

13. Classification via Memorization. Where does this go wrong? Rule: if this, then cat. Hmmm, not quite the same.

14. Classification via Memorization. Given a test image and a set of known images with labels: (1) compute the distance between the test image's feature vector and each known image's feature vector; (2) find the nearest known image; (3) use its label (e.g. cat or dog).

15. Nearest Neighbor "Algorithm".
Training (x_i, y_i): memorize the training set.
Inference (x):
    bestDist, prediction = Inf, None
    for i in range(N):
        if dist(x_i, x) < bestDist:
            bestDist = dist(x_i, x)
            prediction = y_i

16. Nearest Neighbor Decision Boundaries. Decision boundaries in two dimensions: points are training examples and their colors give the training labels; background colors give the category a test point at that location would be assigned; the decision boundary is the boundary between two classification regions (axes x_0, x_1). Nearest neighbors can be noisy and affected by outliers. How to smooth out decision boundaries? Use more neighbors!

17. K-Nearest Neighbors. Instead of copying the label from the single nearest neighbor, take a majority vote from the K closest points (figures: decision regions for K = 1 vs. K = 3).
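
A minimal sketch of the majority vote (the L1 distance and the tie-breaking behavior are implementation choices, not part of the slide):

    import numpy as np
    from collections import Counter

    def knn_predict(train_X, train_y, x, k=3):
        dists = np.sum(np.abs(train_X - x), axis=1)   # L1 distance to every training point
        nearest = np.argsort(dists)[:k]               # indices of the K closest points
        votes = Counter(train_y[i] for i in nearest)  # count the neighbors' labels
        return votes.most_common(1)[0][0]             # majority vote; ties broken arbitrarily here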

18. K-Nearest Neighbors. Using more neighbors helps smooth out rough decision boundaries (figures: K = 1 vs. K = 3).

19. K-Nearest Neighbors. Using more neighbors helps reduce the effect of outliers (figures: K = 1 vs. K = 3).

20. K-Nearest Neighbors. When K > 1 there can be ties! Need to break them somehow (figures: K = 1 vs. K = 3).

21. K-Nearest Neighbors: Distance Metric. L1 (Manhattan) distance: d(x, y) = Σ_i |x_i - y_i|. L2 (Euclidean) distance: d(x, y) = (Σ_i (x_i - y_i)^2)^{1/2}.
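
A tiny worked example of the two metrics on hypothetical 3-dimensional feature vectors:

    import numpy as np

    x = np.array([1.0, 4.0, 2.0])
    y = np.array([3.0, 1.0, 2.0])

    l1 = np.sum(np.abs(x - y))          # |1-3| + |4-1| + |2-2| = 5.0
    l2 = np.sqrt(np.sum((x - y) ** 2))  # sqrt(4 + 9 + 0) ≈ 3.61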

22. K-Nearest Neighbors: Distance Metric. L1 (Manhattan) distance: d(x, y) = Σ_i |x_i - y_i|. L2 (Euclidean) distance: d(x, y) = (Σ_i (x_i - y_i)^2)^{1/2}. (Figures: K = 1 decision boundaries under each metric.)

23. K-Nearest Neighbors. What distance? What value for K? These are hyperparameters. Training: use these data points for lookup. Validation: evaluate on these points for different values of K and different distances. Test: report final performance on the held-out points.
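
A minimal sketch of that selection loop (random arrays stand in for real features; integer labels, the L2 distance, and the candidate K values are assumptions of the sketch):

    import numpy as np

    def knn_accuracy(train_X, train_y, val_X, val_y, k):
        # Pairwise L2 distances between every validation point and every training point.
        d = np.linalg.norm(val_X[:, None, :] - train_X[None, :, :], axis=2)
        nearest = np.argsort(d, axis=1)[:, :k]     # K closest training indices per validation point
        votes = train_y[nearest]                   # the neighbors' (integer) labels
        preds = np.array([np.bincount(v).argmax() for v in votes])  # majority vote per point
        return np.mean(preds == val_y)

    rng = np.random.default_rng(0)
    train_X, train_y = rng.normal(size=(200, 10)), rng.integers(0, 3, 200)
    val_X, val_y = rng.normal(size=(50, 10)), rng.integers(0, 3, 50)

    best_k = max([1, 3, 5, 7], key=lambda k: knn_accuracy(train_X, train_y, val_X, val_y, k))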

24. K-Nearest Neighbors. No learning going on, but usually effective. The same algorithm works for every task. As the number of datapoints goes to infinity, the error rate is guaranteed to be at most 2x worse than the best you could do on the data. Training is fast, but inference is slow: the opposite of what we want!

25. Linear Classifiers. Example setup: 3 classes. Model: one weight vector per class, w_0, w_1, w_2: w_0^T x is big if cat, w_1^T x is big if dog, w_2^T x is big if hippo. Stack the weight vectors together into a matrix W, so the scores are Wx, where x is in R^F.
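
A minimal sketch of the stacked model (the weights and features here are random placeholders):

    import numpy as np

    num_classes, F = 3, 4
    rng = np.random.default_rng(0)
    W = rng.normal(size=(num_classes, F))  # one F-dimensional weight vector per class
    x = rng.normal(size=F)                 # image feature vector

    scores = W @ x                         # scores[j] = w_j^T x, the score for class j
    predicted = int(np.argmax(scores))     # 0 = cat, 1 = dog, 2 = hippo in this setup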

26. Linear Classifiers. Worked example (diagram by Karpathy & Fei-Fei): the input image is stretched into a pixel vector (56, 231, 24, ...) and each class has a weight vector and a bias.
Cat weight vector: (0.2, -0.5, 0.1, 2.0), bias 1.1, cat score -96.8.
Dog weight vector: (1.5, 1.3, 2.1, 0.0), bias 3.2, dog score 437.9.
Hippo weight vector: (0.0, 0.3, 0.2, -0.3), bias -1.2, hippo score 61.95.
The prediction is the score vector Wx (plus the per-class bias), whose jth component is the "score" for the jth class; the weight matrix is a collection of scoring functions, one per class.

27. Linear Classifiers: Geometric Intuition. What does a linear classifier look like in 2D? Be aware: intuition from 2D doesn't always carry over into high-dimensional spaces. See: On the Surprising Behavior of Distance Metrics in High Dimensional Space. Aggarwal, Hinneburg, Keim. ICDT 2001. Diagram credit: Karpathy & Fei-Fei.

28. Linear Classifiers: Visual Intuition. CIFAR-10: 32x32x3 images, 10 classes. Turn each image into a feature vector by unrolling all of its pixels, then train a linear model to recognize the 10 classes.
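
A minimal sketch of the unrolling step (a random array stands in for a real CIFAR-10 image):

    import numpy as np

    image = np.random.rand(32, 32, 3)   # stand-in for one 32x32x3 CIFAR-10 image
    x = image.reshape(-1)               # unroll all pixels into a 3072-dim feature vector
    assert x.shape == (3072,)           # 32 * 32 * 3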

29. Linear Classifiers: Visual Intuition. Decision rule is w^T x. If w_i is big, then big values of x_i are indicative of the class. Deer or Plane?

30. Linear Classifiers: Visual Intuition. Decision rule is w^T x. If w_i is big, then big values of x_i are indicative of the class. Ship or Dog?

31. Linear Classifiers: Visual Intuition. Decision rule is w^T x. If w_i is big, then big values of x_i are indicative of the class.
