Support vector machines and kernels, Thurs Nov 19, Kristen Grauman

  1. Support vector machines and kernels. Thurs Nov 19. Kristen Grauman, UT Austin.

Last time
• Sliding window object detection: pros and cons
• Attentional cascade
• Object proposals for detection
• Nearest neighbor classification
• Scene recognition example with global descriptors

  2. Today
• HMM examples
• Support vector machines (SVM)
  – Basic algorithm
  – Kernels
    • Structured input spaces: pyramid match kernels
  – Multi-class
  – HOG + SVM for person detection
• Visualizing a feature: HOGgles
• Evaluating an object detector

Window-based models: three case studies
• Boosting + face detection, e.g., Viola & Jones
• SVM + person detection, e.g., Dalal & Triggs
• NN + scene Gist classification, e.g., Hays & Efros
Slide credit: Kristen Grauman

  3. Recall: Nearest neighbor classification
• Assign the label of the nearest training data point to each test data point.
• [Figure: black = negative, red = positive. A novel test example is closest to a positive example from the training set, so it is classified as positive. From Duda et al.: Voronoi partitioning of the feature space for 2-category 2D data.]
• Data: 6+ million geotagged photos by 109,788 photographers, annotated by Flickr users.
Slide credit: James Hays
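A minimal sketch of the 1-NN rule described above, in plain NumPy. This is not from the slides; the toy points and labels are made up for illustration:

```python
import numpy as np

def nn_classify(train_X, train_y, test_x):
    """Assign the label of the nearest training point (1-NN)."""
    # Euclidean distance from the test point to every training point
    dists = np.linalg.norm(train_X - test_x, axis=1)
    return train_y[np.argmin(dists)]

# Toy 2D data: -1 = negative ("black"), +1 = positive ("red")
X = np.array([[0.0, 0.0], [1.0, 0.2], [3.0, 3.0], [3.2, 2.8]])
y = np.array([-1, -1, +1, +1])

# The novel test example is closest to a positive training point
print(nn_classify(X, y, np.array([2.9, 3.1])))  # -> 1
```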

  4. Im2gps: scene matches
[Hays and Efros. im2gps: Estimating Geographic Information from a Single Image. CVPR 2008.]
Slide credit: James Hays

  5. The importance of data
[Hays and Efros. im2gps: Estimating Geographic Information from a Single Image. CVPR 2008.]
Slide credit: James Hays

HMM example: photo geo-location. Where was this picture taken?
Slide credit: Kristen Grauman

  6. Example: photo geo-location. Where was this picture taken?
Slide credit: Kristen Grauman

  7. Example: photo geo-location. Where was each picture in this sequence taken?

Idea: exploit the beaten path
• Learn a dynamics model from “training” tourist photos.
• Exploit timestamps and sequences for novel “test” photos.
[Chen & Grauman, CVPR 2011]
Slide credit: Kristen Grauman

  8. Idea: exploit the beaten path [Chen & Grauman, CVPR 2011]

Hidden Markov Model
[Figure: three hidden states, State 1–3, with transition probabilities P(S_j | S_i) between every pair of states (including self-transitions), a prior P(State), and an observation emitted from each state with likelihood P(Observation | State).]
Slide credit: Kristen Grauman
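To make the ingredients in the diagram concrete, here is a minimal Viterbi decoding sketch for a 3-state HMM. The transition matrix, prior, and observation likelihoods are made-up values, not numbers from the lecture:

```python
import numpy as np

# A[i, j] = P(S_j | S_i): transitions; pi[i] = P(S_i): prior;
# B[t, i] = P(observation at time t | S_i): per-step likelihoods.
# All values are hypothetical.
A  = np.array([[0.7, 0.2, 0.1],
               [0.1, 0.8, 0.1],
               [0.2, 0.3, 0.5]])
pi = np.array([0.5, 0.3, 0.2])
B  = np.array([[0.9, 0.1, 0.3],
               [0.2, 0.7, 0.4]])

# Viterbi recursion: most probable state sequence for the observations
delta, back = pi * B[0], []
for t in range(1, len(B)):
    scores = delta[:, None] * A         # scores[i, j]: end in j coming from i
    back.append(scores.argmax(axis=0))  # best predecessor for each state j
    delta = scores.max(axis=0) * B[t]

path = [int(delta.argmax())]            # best final state, then backtrack
for ptr in reversed(back):
    path.append(int(ptr[path[-1]]))
print(path[::-1])                       # e.g., [0, 0] for these numbers
```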

  9. Discovering a city’s locations
• Define states with a data-driven approach: mean shift clustering on the GPS coordinates of the training images (example: New York).
• Observation model: P(Observation | State), e.g., P(image | Liberty Island).
[Figure: three location states, Location 1–3, with transition probabilities P(L_j | L_i) between every pair, mirroring the HMM diagram on the previous slide.]
Slide credit: Kristen Grauman
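A sketch of the state-discovery step using scikit-learn's MeanShift; the library choice and the coordinates are illustrative assumptions, not from the lecture:

```python
import numpy as np
from sklearn.cluster import MeanShift

# Hypothetical (latitude, longitude) pairs from geotagged training photos
coords = np.array([[40.6892, -74.0445],   # near Liberty Island
                   [40.6890, -74.0440],
                   [40.7484, -73.9857],   # near the Empire State Building
                   [40.7480, -73.9860]])

# Mean shift finds modes of the point density; each mode becomes one
# HMM state ("location"). bandwidth sets the spatial scale (in degrees).
ms = MeanShift(bandwidth=0.01).fit(coords)
print(ms.labels_)           # discovered location index per training photo
print(ms.cluster_centers_)  # one center per discovered location
```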

  10. Observation model [results figure]
Location estimation accuracy [results figure]
Slide credit: Kristen Grauman

  11. Qualitative result: New York

Discovering travel guides’ beaten paths: routes from a travel guide book for New York vs. random walks in the learned HMM.
Slide credit: Kristen Grauman

  12. Video textures
• Schödl, Szeliski, Salesin, Essa; SIGGRAPH 2000.
• http://www.cc.gatech.edu/cpl/projects/videotexture/

Today
• HMM examples
• Support vector machines (SVM)
  – Basic algorithm
  – Kernels
    • Structured input spaces: pyramid match kernels
  – Multi-class
  – HOG + SVM for person detection
• Visualizing a feature: HOGgles
• Evaluating an object detector

  13. Window-based models: three case studies
• Boosting + face detection, e.g., Viola & Jones
• SVM + person detection, e.g., Dalal & Triggs
• NN + scene Gist classification, e.g., Hays & Efros
Slide credit: Kristen Grauman

Linear classifiers

  14. Linear classifiers
• Find a linear function to separate the positive and negative examples:
  positive: $w \cdot x_i + b \ge 0$
  negative: $w \cdot x_i + b < 0$
• Which line is best?

Support Vector Machines (SVMs)
• Discriminative classifier based on the optimal separating line (for the 2D case).
• Maximize the margin between the positive and negative training examples.

  15. Support vector machines
• Want the line that maximizes the margin:
  positive ($y_i = 1$): $w \cdot x_i + b \ge 1$
  negative ($y_i = -1$): $w \cdot x_i + b \le -1$
• For support vectors, $w \cdot x_i + b = \pm 1$.
• Distance between point $x_i$ and the line: $|w \cdot x_i + b| \,/\, \|w\|$
• For support vectors: $\frac{w^\top x_i + b}{\|w\|} = \frac{\pm 1}{\|w\|}$, so the margin is $M = \frac{1}{\|w\|} - \left(-\frac{1}{\|w\|}\right) = \frac{2}{\|w\|}$
[C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998]

  16. Support vector machines
• For support vectors the distance to the line is $1/\|w\|$; therefore the margin is $2/\|w\|$.

Finding the maximum margin line
1. Maximize the margin $2/\|w\|$.
2. Correctly classify all training data points:
  positive ($y_i = 1$): $w \cdot x_i + b \ge 1$
  negative ($y_i = -1$): $w \cdot x_i + b \le -1$
• Quadratic optimization problem: minimize $\frac{1}{2} w^\top w$ subject to $y_i (w \cdot x_i + b) \ge 1$ for all $i$.
[C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998]
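A quick way to see the max-margin solution numerically is to fit a linear SVM with a large C, which approximates the hard-margin quadratic program above. This scikit-learn sketch and its toy data are illustrative, not part of the slides:

```python
import numpy as np
from sklearn.svm import SVC

# Linearly separable toy data
X = np.array([[0., 0.], [0., 1.], [3., 3.], [3., 4.]])
y = np.array([-1, -1, 1, 1])

# Large C approximates the hard-margin quadratic program
clf = SVC(kernel="linear", C=1e6).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

print("w =", w, "b =", b)
print("margin = 2/||w|| =", 2.0 / np.linalg.norm(w))
print("support vectors:\n", clf.support_vectors_)
```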

  17. Finding the maximum margin line
• Solution: $w = \sum_i \alpha_i y_i x_i$ (learned weights $\alpha_i$, nonzero only for the support vectors)
• $b = y_i - w \cdot x_i$ for any support vector $x_i$.
• Classification function:
  $f(x) = \operatorname{sign}(w \cdot x + b) = \operatorname{sign}\big(\sum_i \alpha_i y_i \, x_i \cdot x + b\big)$
• If $f(x) < 0$, classify as negative; if $f(x) > 0$, classify as positive.
[C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998]
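The classification function can be evaluated directly from a fitted model's dual coefficients (in scikit-learn, dual_coef_ stores $\alpha_i y_i$ for each support vector). Again a sketch on made-up data:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0., 0.], [0., 1.], [3., 3.], [3., 4.]])
y = np.array([-1, -1, 1, 1])
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# f(x) = sign( sum_i alpha_i y_i (x_i . x) + b );
# dual_coef_[0][i] holds alpha_i * y_i for support vector i.
def f(x):
    score = clf.dual_coef_[0] @ (clf.support_vectors_ @ x) + clf.intercept_[0]
    return np.sign(score)

print(f(np.array([2.5, 3.5])))  # positive side of the boundary -> 1.0
print(f(np.array([0.2, 0.3])))  # negative side -> -1.0
```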

  18. Questions
• What if the data is not linearly separable?

What if the data is not linearly separable?
• Separable: $\min_{w,b} \frac{1}{2}\|w\|^2$ subject to $y_i(w \cdot x_i + b) \ge 1$
• Non-separable: $\min_{w,b} \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i$ subject to $y_i(w \cdot x_i + b) \ge 1 - \xi_i$, $\xi_i \ge 0$
• $C$: tradeoff constant; $\xi_i$: slack variable (positive).
• Whenever the margin is $\ge 1$, $\xi_i = 0$; whenever the margin is $< 1$, $\xi_i = 1 - y_i(w \cdot x_i + b)$.
Slide credit: Lana Lazebnik
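A small sketch of how the slack variables behave for a candidate (w, b); the points and weights below are made up so that exactly one point sits inside the margin:

```python
import numpy as np

# xi_i = max(0, 1 - y_i (w . x_i + b)): zero for points on or beyond
# the margin, positive for margin violations.
def slacks(w, b, X, y):
    return np.maximum(0.0, 1.0 - y * (X @ w + b))

# Soft-margin objective with tradeoff constant C
def objective(w, b, X, y, C):
    return 0.5 * w @ w + C * slacks(w, b, X, y).sum()

X = np.array([[0., 0.], [3., 3.], [1.5, 1.4]])  # last point violates the margin
y = np.array([-1., 1., -1.])
w, b = np.array([0.4, 0.4]), -1.2

print(slacks(w, b, X, y))            # [0.  0.  0.96]: only the violator
print(objective(w, b, X, y, C=1.0))  # larger C punishes slack more
```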

  19. Today
• HMM examples
• Support vector machines (SVM)
  – Basic algorithm
  – Kernels
    • Structured input spaces: pyramid match kernels
  – Multi-class
  – HOG + SVM for person detection
• Visualizing a feature: HOGgles
• Evaluating an object detector

Non-linear SVMs
• Datasets that are linearly separable with some noise work out great.
• But what are we going to do if the dataset is just too hard?
• How about mapping the data to a higher-dimensional space, e.g., $x \mapsto (x, x^2)$?
[Figure: 1-D data on the $x$ axis that no threshold separates, then the same data lifted to the $(x, x^2)$ plane, where a line separates it.]

  20. Non-linear SVMs: feature spaces
• General idea: the original input space can be mapped to some higher-dimensional feature space where the training set is separable: $\Phi: x \rightarrow \varphi(x)$
Slide from Andrew Moore’s tutorial: http://www.autonlab.org/tutorials/svm.html

Nonlinear SVMs
• The kernel trick: instead of explicitly computing the lifting transformation $\varphi(x)$, define a kernel function $K$ such that $K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$
• This gives a nonlinear decision boundary in the original feature space: $\sum_i \alpha_i y_i K(x_i, x) + b$
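A sketch of the kernel trick in practice: scikit-learn's SVC accepts a callable kernel, so we can supply $K(x_i, x_j) = (x_i \cdot x_j + 1)^2$, a quadratic kernel equivalent to lifting into second-order features, without ever forming $\varphi(x)$. The data and kernel choice are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import SVC

# 1-D data that is not linearly separable: positives in the middle
X = np.array([[-2.0], [-1.5], [-0.2], [0.1], [1.4], [2.1]])
y = np.array([-1, -1, 1, 1, -1, -1])

# K(xi, xj) = (xi . xj + 1)^2: quadratic kernel computed on the raw
# inputs; the lifting phi is never constructed explicitly.
def quad_kernel(A, B):
    return (A @ B.T + 1.0) ** 2

clf = SVC(kernel=quad_kernel).fit(X, y)
print(clf.predict(np.array([[0.0], [2.5]])))  # expect [ 1 -1 ]
```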
