  1. Administrative: how is the assignment going? btw, the notes get updated all the time based on your feedback; no lecture on Monday.

  2. Lecture 4: Optimization

  3. Image Classification: assume a given set of discrete labels {dog, cat, truck, plane, ...} (example image: cat)

  4. Data-driven approach

  5. 1. Score function
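
The score function in these lectures is the linear map f(x, W) = Wx + b. A minimal numpy sketch, assuming CIFAR-10-sized inputs (the names and shapes are illustrative, not from the slides):

```python
import numpy as np

def scores(x, W, b):
    """Linear score function f(x, W) = W x + b.
    x: [3072] flattened 32x32x3 image, W: [10 x 3072], b: [10] -> 10 class scores."""
    return W.dot(x) + b

x = np.random.randn(3072)               # stand-in for a flattened image
W = np.random.randn(10, 3072) * 0.0001  # small random weights
b = np.zeros(10)
print(scores(x, W, b).shape)            # (10,)
```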

  6. [figure]

  7. 1. Score function; 2. Two loss functions
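
A minimal sketch of the two loss functions the course pairs with the score function, the multiclass SVM (hinge) loss and the softmax cross-entropy loss, for a single example (names are illustrative):

```python
import numpy as np

def svm_loss(s, y, delta=1.0):
    """Multiclass SVM loss for one example: s = class scores, y = correct class index."""
    margins = np.maximum(0, s - s[y] + delta)
    margins[y] = 0                       # the correct class contributes no margin term
    return margins.sum()

def softmax_loss(s, y):
    """Softmax cross-entropy loss for one example."""
    s = s - s.max()                      # shift scores for numeric stability
    p = np.exp(s) / np.exp(s).sum()
    return -np.log(p[y])

s = np.array([3.2, 5.1, -1.7])           # example scores; correct class is 0
print(svm_loss(s, 0), softmax_loss(s, 0))
```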

  8. [figure]

  9. Three key components to training Neural Nets: 1. Score function, 2. Loss function, 3. Optimization

  10–11. Brief aside: Image Features. In practice, it is very rare to see Computer Vision applications that train linear classifiers directly on pixel values.

  12. Example: Color (Hue) Histogram: each pixel casts a +1 vote into one of the hue bins.
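
A sketch of such a hue-histogram feature (the bin count and names are assumptions, not from the slides):

```python
import colorsys
import numpy as np

def hue_histogram(img_rgb, bins=16):
    """img_rgb: [H x W x 3] array with values in [0, 1].
    Each pixel adds +1 to the bin of its hue; returns a fixed-size feature vector."""
    H, W, _ = img_rgb.shape
    hues = np.array([colorsys.rgb_to_hsv(*img_rgb[i, j])[0]
                     for i in range(H) for j in range(W)])
    hist, _ = np.histogram(hues, bins=bins, range=(0.0, 1.0))
    return hist

img = np.random.rand(32, 32, 3)          # stand-in image
print(hue_histogram(img))                 # 16 bin counts
```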

  13. Example: HOG features: over each 8x8 pixel region, quantize the edge orientation into 9 bins (images from vlfeat.org).
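
A toy version of the idea; real HOG adds block normalization and interpolation (see vlfeat.org), while this sketch only does the 9-bin orientation quantization per 8x8 cell:

```python
import numpy as np

def hog_like_features(gray, cell=8, bins=9):
    """For each cell x cell region of a grayscale image, build a bins-bin
    histogram of gradient orientations, weighted by gradient magnitude."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)   # unsigned orientation in [0, pi)
    H, W = gray.shape
    feats = []
    for i in range(0, H - cell + 1, cell):
        for j in range(0, W - cell + 1, cell):
            m = mag[i:i + cell, j:j + cell].ravel()
            a = ang[i:i + cell, j:j + cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, np.pi), weights=m)
            feats.append(hist)
    return np.concatenate(feats)

print(hog_like_features(np.random.rand(32, 32)).shape)  # 16 cells * 9 bins = (144,)
```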

  14–15. Example: Bag of Words: 1. Resize each patch to a fixed size (e.g. 32x32 pixels); 2. Extract HOG on the patch (get 144 numbers); repeat for each detected feature, giving a matrix of size [number_of_features x 144]. Problem: different images will have different numbers of features, but we need fixed-size vectors for linear classification.

  16. Example: Bag of Words: learn k-means centroids, a "vocabulary" of visual words (e.g. 1000 centroids), over the 144-d descriptors; each image is then encoded as a fixed-size 1000-d histogram of visual words. A sketch follows:
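
A minimal sketch of this vocabulary-plus-histogram step (the k-means here is a toy implementation and the function names are illustrative; a small k is used for the demo):

```python
import numpy as np

def nearest_word(descriptors, centroids):
    """Index of the nearest centroid for each descriptor (squared Euclidean)."""
    d = ((descriptors ** 2).sum(1)[:, None]
         + (centroids ** 2).sum(1)[None, :]
         - 2.0 * descriptors @ centroids.T)
    return d.argmin(1)

def build_vocabulary(descriptors, k=1000, iters=10):
    """Toy k-means over the pooled [num_features x 144] descriptors;
    the k centroids form the "vocabulary" of visual words."""
    centroids = descriptors[np.random.choice(len(descriptors), k, replace=False)].copy()
    for _ in range(iters):
        assign = nearest_word(descriptors, centroids)
        for c in range(k):
            if (assign == c).any():
                centroids[c] = descriptors[assign == c].mean(0)
    return centroids

def bow_histogram(descriptors, centroids):
    """Encode one image's descriptors as a fixed-size k-d histogram of visual words."""
    hist = np.bincount(nearest_word(descriptors, centroids), minlength=len(centroids))
    return hist / max(hist.sum(), 1)

vocab = build_vocabulary(np.random.randn(5000, 144), k=100)    # small k for the demo
print(bow_histogram(np.random.randn(80, 144), vocab).shape)    # (100,)
```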

  17. Brief aside: Image Features

  18. Most recognition systems are built on the same architecture (slide from Yann LeCun).

  19. Most recognition systems are built on the same architecture; CNNs are end-to-end models (slide from Yann LeCun).

  20. Visualizing the loss function

  21–23. Visualizing the (SVM) loss function

  24. Visualizing the (SVM) loss function: the full data loss:
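
For reference, the multiclass SVM full data loss used in this course (with margin Δ, typically 1) averages the per-example hinge losses over all N training examples:

```latex
L = \frac{1}{N} \sum_{i=1}^{N} \sum_{j \neq y_i}
    \max\!\left(0,\; f(x_i; W)_j - f(x_i; W)_{y_i} + \Delta\right)
```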

  25–27. Visualizing the (SVM) loss function. Suppose there are 3 examples with 3 classes (class 0, 1, 2 in sequence); then this becomes:
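
Writing s_j^(i) for the j-th class score of example i (an assumption about the slide's notation), with margin 1 the loss expands to:

```latex
L = \tfrac{1}{3}\Big[ \max(0,\, s^{(1)}_1 - s^{(1)}_0 + 1) + \max(0,\, s^{(1)}_2 - s^{(1)}_0 + 1)
  + \max(0,\, s^{(2)}_0 - s^{(2)}_1 + 1) + \max(0,\, s^{(2)}_2 - s^{(2)}_1 + 1)
  + \max(0,\, s^{(3)}_0 - s^{(3)}_2 + 1) + \max(0,\, s^{(3)}_1 - s^{(3)}_2 + 1) \Big]
```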

  28. Visualizing the (SVM) loss function. Question: CIFAR-10 has 50,000 training images (5,000 per class, 10 labels); how many occurrences of one classifier row are there in the full data loss?

  29. Optimization

  30–31. Strategy #1: A first very bad idea solution: Random search

  32. Strategy #1: A first very bad idea solution: Random search. What's up with 0.0001?
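
A sketch close to the course notes' random search; the names L, X_train, Y_train are assumptions (a full-data loss function and CIFAR-10 arrays), and 0.0001 sets the scale of the random weights:

```python
import numpy as np

# assume X_train is [3073 x 50000] (CIFAR-10 with a bias row), Y_train the labels,
# and L(X, y, W) evaluates the full data loss -- these names are assumptions
bestloss = float("inf")
for num in range(1000):
    W = np.random.randn(10, 3073) * 0.0001   # generate random parameters
    loss = L(X_train, Y_train, W)            # loss over the whole training set
    if loss < bestloss:                      # keep track of the best W seen
        bestloss = loss
        bestW = W
    print('in attempt %d the loss was %f, best %f' % (num, loss, bestloss))
```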

  33. Let's see how well this works on the test set...

  34. Fun aside: When W = 0, what is the CIFAR-10 loss for SVM and Softmax?
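
A quick check, assuming margin 1 and the 10 CIFAR-10 classes: with W = 0 every score is 0, so each example contributes

```latex
L^{\text{SVM}}_i = \sum_{j \neq y_i} \max(0,\, 0 - 0 + 1) = 9,
\qquad
L^{\text{softmax}}_i = -\log\frac{e^0}{\sum_{j=1}^{10} e^0} = \log 10 \approx 2.30
```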

  35–36. Strategy #2: A better but still very bad idea solution: Random local search. Gives 21.4%!
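
A sketch in the spirit of the course notes (same assumed L, X_train, Y_train as above): start from a random W and accept a perturbation only if it lowers the loss:

```python
import numpy as np

W = np.random.randn(10, 3073) * 0.001        # a random starting point
bestloss = float("inf")
for i in range(1000):
    step_size = 0.0001
    Wtry = W + np.random.randn(10, 3073) * step_size   # perturb W locally
    loss = L(X_train, Y_train, Wtry)
    if loss < bestloss:                       # keep the perturbation only if it helps
        W = Wtry
        bestloss = loss
```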

  37. [figure]

  38. [figure]

  39. Strategy #3: Following the gradient. In 1 dimension, the derivative of a function is defined below; in multiple dimensions, the gradient is the vector of partial derivatives.
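
The standard definitions the slide refers to:

```latex
\frac{df(x)}{dx} = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h},
\qquad
\nabla f(x) = \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right)
```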

  40. Evaluating the gradient numerically

  41. Evaluating the gradient numerically: the "finite difference approximation"

  42. Evaluating the gradient numerically: in practice, the "centered difference formula"

  43. Evaluating the gradient numerically
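
A helper in the style of the course notes, using the centered difference formula (f is any function of a numpy array x; the name is illustrative):

```python
import numpy as np

def eval_numerical_gradient(f, x, h=1e-5):
    """Numerical gradient of f at x via the centered difference (f(x+h) - f(x-h)) / 2h,
    evaluated one coordinate at a time."""
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        ix = it.multi_index
        old = x[ix]
        x[ix] = old + h
        fxph = f(x)              # f(x + h) in coordinate ix
        x[ix] = old - h
        fxmh = f(x)              # f(x - h) in coordinate ix
        x[ix] = old              # restore the original value
        grad[ix] = (fxph - fxmh) / (2 * h)
        it.iternext()
    return grad
```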

  44–45. Performing a parameter update
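
The update steps along the negative gradient. A toy end-to-end run using the helper above (the quadratic loss and step size are illustrative):

```python
import numpy as np

def loss_fun(w):
    return float(((w - 3.0) ** 2).sum())     # toy loss, minimized at w = 3

w = np.zeros(5)
step_size = 0.1
for _ in range(100):
    grad = eval_numerical_gradient(loss_fun, w)
    w += -step_size * grad                   # parameter update: follow the -gradient
print(w)                                     # approaches [3. 3. 3. 3. 3.]
```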

  46. [figure: original W and the negative gradient direction]
