Softmax Classifier + Generalization
CS4501: Introduction to Computer Vision



  1. CS4501: Introduction to Computer Vision. Softmax Classifier + Generalization. Various slides from previous courses by: D.A. Forsyth (Berkeley / UIUC), I. Kokkinos (Ecole Centrale / UCL), S. Lazebnik (UNC / UIUC), S. Seitz (MSR / Facebook), J. Hays (Brown / Georgia Tech), A. Berg (Stony Brook / UNC), D. Samaras (Stony Brook), J. M. Frahm (UNC), V. Ordonez (UVA), Steve Seitz (UW).

  2. Last Class • Introduction to Machine Learning • Unsupervised Learning: Clustering (e.g. k-means clustering) • Supervised Learning: Classification (e.g. k-nearest neighbors)

  3. Today’s Class • Softmax Classifier (Linear Classifiers) • Generalization / Overfitting / Regularization • Global Features

  4. Supervised Learning vs Unsupervised Learning: x → y. [Figure: cat, dog, and bear images with their labels.]

  5. Supervised Learning vs Unsupervised Learning: x → y. [Figure: the same cat, dog, and bear images.]

  6. Supervised Learning vs Unsupervised Learning: x → y. Supervised learning of labels is Classification; unsupervised grouping is Clustering. [Figure: cat, dog, and bear images.]

  7. Supervised Learning – k-Nearest Neighbors. With k = 3, the query's nearest neighbors are cat, cat, dog, so it is labeled cat. [Figure: query image among labeled cat, dog, and bear examples.]

  8. Supervised Learning – k-Nearest Neighbors. With k = 3, the query's nearest neighbors are bear, dog, dog, so it is labeled dog. [Figure: query image among labeled cat, dog, and bear examples.]

  9. Supervised Learning – k-Nearest Neighbors • How do we choose the right k? • How do we choose the right features? • How do we choose the right distance metric?

  10. Supervised Learning – k-Nearest Neighbors • How do we choose the right k? • How do we choose the right features? • How do we choose the right distance metric? Answer: Just choose the one combination that works best! BUT not on the test data. Instead, split the training data into a "Training set" and a "Validation set" (also called "Development set").
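A minimal sketch of that split in Python (the 80/20 ratio, array names, and random data here are illustrative assumptions, not from the slides):

import numpy as np

train_x = np.random.rand(100, 4)            # made-up training features (100 examples, 4 features)
train_y = np.random.randint(0, 3, 100)      # made-up labels: 0 = cat, 1 = dog, 2 = bear

idx = np.random.permutation(len(train_x))   # shuffle once
split = int(0.8 * len(train_x))             # keep 80% for training, 20% for validation
x_tr, y_tr = train_x[idx[:split]], train_y[idx[:split]]
x_val, y_val = train_x[idx[split:]], train_y[idx[split:]]

# Try each candidate k / feature set / distance metric on (x_tr, y_tr), measure accuracy
# on (x_val, y_val), and keep the best combination. The test data is never used here.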

  11. Supervised Learning – Classification. [Figure: training data (labeled cat, dog, and bear images) and test data.]

  12. Supervised Learning – Classification. [Figure: training data (cat, dog, ..., bear) paired with labels, and test data.]

  13. Supervised Learning – Classification. Training Data: each example is a feature vector with a label: x_1 = [ ], y_1 = [ ] (cat); x_2 = [ ], y_2 = [ ] (dog); x_3 = [ ], y_3 = [ ] (cat); ...; x_n = [ ], y_n = [ ] (bear).

  14. Supervised Learning – Classification. Training Data: inputs x_i, targets / labels / ground truth y_i, and predictions ŷ_i. We need to find a function that maps x to y for any of them: ŷ_i = f(x_i; θ). For example: x_1 = [x_11 x_12 x_13 x_14], y_1 = 1, ŷ_1 = 1; x_2 = [x_21 x_22 x_23 x_24], y_2 = 2, ŷ_2 = 2; x_3 = [x_31 x_32 x_33 x_34], y_3 = 1, ŷ_3 = 2; ...; x_n = [x_n1 x_n2 x_n3 x_n4], y_n = 3, ŷ_n = 1. How do we "learn" the parameters of this function? We choose the ones that make the following quantity small: Σ_{i=1}^{n} Cost(ŷ_i, y_i).

  15. Supervised Learning – Softmax Classifier. Training Data: inputs and targets / labels / ground truth. x_1 = [x_11 x_12 x_13 x_14], y_1 = 1; x_2 = [x_21 x_22 x_23 x_24], y_2 = 2; x_3 = [x_31 x_32 x_33 x_34], y_3 = 1; ...; x_n = [x_n1 x_n2 x_n3 x_n4], y_n = 3.

  16. Supervised Learning – Softmax Classifier. Training Data with one-hot targets and predictions: x_1 = [x_11 x_12 x_13 x_14], y_1 = [1 0 0], ŷ_1 = [0.85 0.10 0.05]; x_2 = [x_21 x_22 x_23 x_24], y_2 = [0 1 0], ŷ_2 = [0.20 0.70 0.10]; x_3 = [x_31 x_32 x_33 x_34], y_3 = [1 0 0], ŷ_3 = [0.40 0.45 0.05]; ...; x_n = [x_n1 x_n2 x_n3 x_n4], y_n = [0 0 1], ŷ_n = [0.40 0.25 0.35].

  17. Supervised Learning – Softmax Classifier. For one example x_i = [x_i1 x_i2 x_i3 x_i4] with label y_i = [1 0 0], the prediction is ŷ_i = [f_c f_d f_b], where
  g_c = w_c1 x_i1 + w_c2 x_i2 + w_c3 x_i3 + w_c4 x_i4 + b_c
  g_d = w_d1 x_i1 + w_d2 x_i2 + w_d3 x_i3 + w_d4 x_i4 + b_d
  g_b = w_b1 x_i1 + w_b2 x_i2 + w_b3 x_i3 + w_b4 x_i4 + b_b
  f_c = e^{g_c} / (e^{g_c} + e^{g_d} + e^{g_b})
  f_d = e^{g_d} / (e^{g_c} + e^{g_d} + e^{g_b})
  f_b = e^{g_b} / (e^{g_c} + e^{g_d} + e^{g_b})
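A small numeric sketch of these formulas (the feature values, weights, and biases below are made up for illustration):

import numpy as np

x_i = np.array([0.2, 1.0, 0.5, 0.3])        # one feature vector [x_i1 x_i2 x_i3 x_i4] (made up)
W = np.array([[ 0.5, -0.2,  0.1, 0.0],      # weights, one row per class: cat, dog, bear (made up)
              [-0.3,  0.4,  0.2, 0.1],
              [ 0.1,  0.0, -0.5, 0.3]])
b = np.array([0.1, 0.0, -0.1])              # one bias per class (made up)

g = W @ x_i + b                             # scores g_c, g_d, g_b
f = np.exp(g) / np.sum(np.exp(g))           # softmax probabilities f_c, f_d, f_b
print(f, f.sum())                           # y_hat_i; the probabilities sum to 1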

  18. How do we find a good w and b? For x_i = [x_i1 x_i2 x_i3 x_i4] with label y_i = [1 0 0], the prediction is ŷ_i = [f_c(w, b) f_d(w, b) f_b(w, b)]. We need to find w and b that minimize the following function L:
  L(w, b) = Σ_{i=1}^{n} Σ_{j=1}^{3} −y_ij log(ŷ_ij) = Σ_{i=1}^{n} −log(ŷ_{i, label_i}) = Σ_{i=1}^{n} −log f_{i, label_i}(w, b)
  Why?
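A small sketch of this loss, assuming the predicted probabilities and labels from slide 16 (classes stored 0-indexed: 0 = cat, 1 = dog, 2 = bear):

import numpy as np

y_hat = np.array([[0.85, 0.10, 0.05],       # predicted probabilities from slide 16
                  [0.20, 0.70, 0.10],
                  [0.40, 0.45, 0.05],
                  [0.40, 0.25, 0.35]])
labels = np.array([0, 1, 0, 2])             # correct classes (cat, dog, cat, bear)

# L(w, b) = sum_i -log( y_hat[i, label_i] )
L = -np.sum(np.log(y_hat[np.arange(len(labels)), labels]))
print(L)   # gets smaller as the classifier puts more probability on the correct classes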

  19. How do we find a good w and b? Problem statement: Find w and b such that L(w, b) is minimal. Solution from calculus: set ∂L(w, b)/∂w = 0 and solve for w; set ∂L(w, b)/∂b = 0 and solve for b.

  20. https://courses.lumenlearning.com/businesscalc1/chapter/reading-curve-sketching/

  21. How do we find a good w and b? Problem statement: Find w and b such that L(w, b) is minimal. Solution from calculus: set ∂L(w, b)/∂w = 0 and solve for w; set ∂L(w, b)/∂b = 0 and solve for b.

  22. Problems with this approach: • Some functions L(w, b) are very complicated compositions of many functions, so finding their analytical derivative is tedious. • Even if the function is simple to differentiate, it might not be easy to solve for w, e.g. ∂L(w, b)/∂w = e^w + w = 0. How do you find w in that equation?

  23. Solution: Iterative Approach: Gradient Descent (GD). 1. Start with a random value of w (e.g. w = 12). 2. Compute the gradient (derivative) of L(w) at point w = 12 (e.g. dL/dw = 6). 3. Recompute w as: w = w - lambda * (dL/dw). [Figure: L(w) curve with the point w = 12 marked.]

  24. Solution: Iterative Approach: Gradient Descent (GD). 2. Compute the gradient (derivative) of L(w) at point w = 12 (e.g. dL/dw = 6). 3. Recompute w as: w = w - lambda * (dL/dw). [Figure: L(w) curve with the updated point w = 10 marked.]
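(The step size lambda is not given on these slides; a value of lambda = 1/3 would be consistent with the numbers shown: w_new = 12 - (1/3) * 6 = 10.)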

  25. Gradient Descent (GD). Problem: expensive!
  λ = 0.01
  L(w, b) = Σ_{i=1}^{n} −log f_{i, label_i}(w, b)
  Initialize w and b randomly
  for e = 0, num_epochs do
    Compute: ∂L(w, b)/∂w and ∂L(w, b)/∂b
    Update w: w = w − λ ∂L(w, b)/∂w
    Update b: b = b − λ ∂L(w, b)/∂b
    Print: L(w, b)   // Useful to see if this is becoming smaller or not.
  end
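A minimal sketch of this loop on a toy one-dimensional loss, L(w) = (w - 3)^2, which is made up so the example stays self-contained; the classifier loss above would take its place in practice:

lam = 0.01                       # step size (lambda)
w = 12.0                         # starting value, as on slide 23

def loss(w):
    return (w - 3.0) ** 2        # toy L(w); its minimum is at w = 3

def dloss_dw(w):
    return 2.0 * (w - 3.0)       # analytical derivative dL/dw

for e in range(500):             # num_epochs
    w = w - lam * dloss_dw(w)    # w = w - lambda * (dL/dw)
    # print(loss(w))             # useful to see if this is becoming smaller or not
print(w)                         # ends up close to 3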

  26. Solution: (mini-batch) Stochastic Gradient Descent (SGD).
  λ = 0.01
  L(w, b) = Σ_{i∈B} −log f_{i, label_i}(w, b)
  Initialize w and b randomly
  for e = 0, num_epochs do
    for b = 0, num_batches do   // B is a small set of training examples.
      Compute: ∂L(w, b)/∂w and ∂L(w, b)/∂b
      Update w: w = w − λ ∂L(w, b)/∂w
      Update b: b = b − λ ∂L(w, b)/∂b
      Print: L(w, b)   // Useful to see if this is becoming smaller or not.
    end
  end
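A sketch of the same idea for the softmax classifier itself, on made-up data. It uses the standard gradient of the softmax cross-entropy loss (gradient with respect to the scores = predicted probabilities minus the one-hot label), which the next slides cover:

import numpy as np

def softmax(G):
    G = G - G.max(axis=1, keepdims=True)              # subtract the max for numerical stability
    E = np.exp(G)
    return E / E.sum(axis=1, keepdims=True)

n, d, c = 1000, 4, 3                                  # examples, features, classes (made-up sizes)
X = np.random.randn(n, d)                             # made-up feature vectors
y = np.random.randint(0, c, n)                        # made-up labels: 0 = cat, 1 = dog, 2 = bear

W = 0.01 * np.random.randn(c, d)                      # initialize w and b randomly
b = np.zeros(c)
lam, num_epochs, batch_size = 0.01, 20, 32

for e in range(num_epochs):
    order = np.random.permutation(n)                  # new random visiting order every epoch
    for start in range(0, n, batch_size):
        B = order[start:start + batch_size]           # B is a small set of training examples
        f = softmax(X[B] @ W.T + b)                   # predicted probabilities, shape (|B|, c)
        dG = (f - np.eye(c)[y[B]]) / len(B)           # gradient of the mean loss w.r.t. the scores
        W = W - lam * (dG.T @ X[B])                   # update w
        b = b - lam * dG.sum(axis=0)                  # update b
    probs = softmax(X @ W.T + b)
    print(-np.mean(np.log(probs[np.arange(n), y])))   # useful to see if this is becoming smaller or not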

  27. Source: Andrew Ng

  28. Three more things • How to compute the gradient • Regularization • Momentum updates

  29. SGD Gradient for the Softmax Function

  30. SGD Gradient for the Softmax Function

  31. SGD Gradient for the Softmax Function
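For reference, the standard gradient of the softmax cross-entropy loss in the notation of slides 17–18 (the usual result; the derivation steps on the slides themselves may differ). For one example x_i with one-hot label y_i, scores g_k, probabilities f_k, and loss L_i = −log f_{label_i}:
  ∂L_i/∂g_k = f_k − y_ik
  ∂L_i/∂w_kj = (f_k − y_ik) x_ij
  ∂L_i/∂b_k = f_k − y_ik
This is the gradient that plugs into the GD / SGD updates on slides 25–26.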

  32. Supervised Learning – Softmax Classifier.
  Extract features: x_i = [x_i1 x_i2 x_i3 x_i4]
  Run features through classifier:
  g_c = w_c1 x_i1 + w_c2 x_i2 + w_c3 x_i3 + w_c4 x_i4 + b_c
  g_d = w_d1 x_i1 + w_d2 x_i2 + w_d3 x_i3 + w_d4 x_i4 + b_d
  g_b = w_b1 x_i1 + w_b2 x_i2 + w_b3 x_i3 + w_b4 x_i4 + b_b
  Get predictions: ŷ_i = [f_c f_d f_b]
  f_c = e^{g_c} / (e^{g_c} + e^{g_d} + e^{g_b})
  f_d = e^{g_d} / (e^{g_c} + e^{g_d} + e^{g_b})
  f_b = e^{g_b} / (e^{g_c} + e^{g_d} + e^{g_b})

  33. Supervised Machine Learning Steps. [Diagram. Training: training images → image features, combined with training labels → training → learned model. Testing: test image → image features → learned model → prediction.] Slide credit: D. Hoiem

  34. Generalization • Generalization refers to the ability to correctly classify never-before-seen examples • Can be controlled by turning “knobs” that affect the complexity of the model • Training set (labels known) vs. Test set (labels unknown)

  35. Overfitting. f is linear: Loss(w) is high (underfitting, high bias). f is cubic: Loss(w) is low. f is a polynomial of degree 9: Loss(w) is zero! (overfitting, high variance).
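A small sketch of this effect, fitting polynomials of degree 1, 3, and 9 to ten noisy points (the data is made up; it is only meant to illustrate under- and overfitting):

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(10)   # ten noisy samples of a smooth curve

for degree in (1, 3, 9):
    coeffs = np.polyfit(x, y, degree)                        # least-squares polynomial fit
    train_loss = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(degree, train_loss)

# degree 1: large training loss  -> underfitting (high bias)
# degree 3: small training loss, smooth fit
# degree 9: near-zero training loss, but wild between the points -> overfitting (high variance)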

  36. Questions?
