deformable part models

Deformable part models. Ross Girshick, UC Berkeley. CS231B, Stanford.

Deformable part models. Ross Girshick, UC Berkeley. CS231B, Stanford University, guest lecture, April 16, 2013.


  1. Deformable part models. Ross Girshick, UC Berkeley. CS231B, Stanford University, guest lecture, April 16, 2013.

  2. Image understanding. [Photo: “Snack time in the lab”, by “thomas pix”, http://www.flickr.com/photos/thomaspix/2591427106]

  3. What objects are where? I see twinkies! ... Robot: “I see a table with twinkies, pretzels, fruit, and some mysterious chocolate things...”

  4. DPM lecture overview. Part 1: modeling. Part 2: learning. [Figure: AP by year: 2005: 12%, 2008: 27%, 2009: 36%, 2010: 45%, 2011: 49%]

  5. Formalizing the object detection task. Many possible ways.

  6. Formalizing the object detection task. Many possible ways; this one is popular. Input: an image and a set of classes (cat, dog, chair, cow, person, motorbike, car, ...). Desired output: the objects in the image, localized and labeled (in the example: person, motorbike).

  7. Formalizing the object detection task. Many possible ways; this one is popular. Input: an image and a set of classes (cat, dog, chair, cow, person, motorbike, car, ...). Desired output: the objects in the image, localized and labeled (in the example: person, motorbike). Performance summary: Average Precision (AP); 0 is worst, 1 is perfect.
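The AP summary on this slide can be illustrated as the area under a precision-recall curve. The following is a minimal sketch, not the benchmark's official scoring code: the function name `average_precision`, its arguments, and the choice of summing over every recall step are assumptions made for illustration.

```python
import numpy as np

def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the precision-recall curve (hypothetical helper)."""
    # Rank detections from highest to lowest score.
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_true_positive, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / num_ground_truth
    precision = cum_tp / (cum_tp + cum_fp)
    # Make precision non-increasing in recall (upper envelope of the curve).
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    # Sum precision over each step in recall.
    recall_steps = np.diff(np.concatenate(([0.0], recall)))
    return float(np.sum(recall_steps * precision))
```

With this definition, a ranking that places every true positive first scores 1, and a ranking with no true positives scores 0, matching the "0 is worst, 1 is perfect" scale.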

  8. Benchmark datasets: PASCAL VOC 2005 – 2012 - 54k objects in 22k images - 20 object classes - annual competition


  10. Reduction to binary classification: pos = { ... ... }, neg = { ... background patches ... }. HOG features + SVM: the “sliding window” detector of Dalal & Triggs (CVPR’05).

  11. Sliding window detection: $\mathrm{score}(I, p) = \mathbf{w} \cdot \phi(I, p)$, where $p$ is a location in the feature pyramid. [Figure: image pyramid and HOG feature pyramid] • Compute HOG of the whole image at multiple resolutions • Score every subwindow of the feature pyramid • Apply non-maxima suppression
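The last two bullets can be sketched in a few lines. This is an illustrative sketch only: `score_windows` and `non_max_suppression` are hypothetical names, the feature map stands in for one level of a HOG pyramid (HOG computation itself is not shown), and the greedy suppression rule with an intersection-over-union threshold of 0.5 is an assumption, not necessarily the rule used in the lecture's system.

```python
import numpy as np

def score_windows(feature_map, w):
    """score(p) = w . phi(p) for every subwindow p of one pyramid level.

    feature_map: (H, W, D) array of features; w: (h, ww, D) linear filter.
    """
    H, W, _ = feature_map.shape
    h, ww, _ = w.shape
    scores = np.empty((H - h + 1, W - ww + 1))
    for y in range(scores.shape[0]):
        for x in range(scores.shape[1]):
            scores[y, x] = np.sum(w * feature_map[y:y + h, x:x + ww])
    return scores

def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(boxes, scores, max_overlap=0.5):
    """Greedy NMS: visit boxes by decreasing score, keep a box only if it
    does not overlap an already kept box by more than max_overlap."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(box_iou(boxes[i], boxes[k]) <= max_overlap for k in keep):
            keep.append(i)
    return keep
```

A full detector would run `score_windows` on every pyramid level, map high-scoring locations back to image boxes, and then apply the suppression step.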

  12. Detection. Number of locations $p$: ~250,000 per image.

  13. Detection. Number of locations $p$: ~250,000 per image. The test set has ~5,000 images, so about $1.3 \times 10^9$ windows to classify.

  14. Detection. Number of locations $p$: ~250,000 per image. The test set has ~5,000 images, so about $1.3 \times 10^9$ windows to classify. Typically only ~1,000 true positive locations.

  15. Detection. Number of locations $p$: ~250,000 per image. The test set has ~5,000 images, so about $1.3 \times 10^9$ windows to classify. Typically only ~1,000 true positive locations. Extremely unbalanced binary classification.
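The counts on this slide multiply out as follows. This is a small arithmetic check; reading the ~1,000 true positive locations as a count over the whole test set is an assumption about the slide's meaning, and the variable names are illustrative.

```python
locations_per_image = 250_000   # approximate, from the slide
test_images = 5_000             # approximate, from the slide
true_positive_locations = 1_000 # approximate; assumed to be per test set

# Total number of windows the classifier has to score.
windows = locations_per_image * test_images

# Fraction of windows that are positives under the assumption above.
positive_fraction = true_positive_locations / windows

print(f"{windows:.2e} windows, positive fraction {positive_fraction:.1e}")
```

The product is 1.25 × 10^9, which the slide rounds to 1.3 × 10^9; under the stated assumption, fewer than one window in a million is a positive.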

  16. Dalal & Triggs detector on INRIA. [Figure: recall-precision curves for different descriptors on the INRIA static person database; x-axis: recall, y-axis: precision; curves: Ker. R-HOG, Lin. R-HOG, Lin. R2-HOG, Wavelet, PCA-SIFT, Lin. E-ShapeC] • AP = 75% • (79% in my implementation) • Very good • Declare victory and go home?

  17. Dalal & Triggs on PASCAL VOC 2007 AP = 12% (using my implementation)

  18. How can we do better? Revisit an old idea: part-based models (“pictorial structures”) - Fischler & Elschlager ’73, Felzenszwalb & Huttenlocher ’00. Combine with modern features and machine learning.

  19. Part-based models • Parts — local appearance templates • “Springs” — spatial connections between parts (geom. prior) Image: [Felzenszwalb and Huttenlocher 05]

  20. Part-based models • Local appearance is easier to model than the global appearance - Training data shared across deformations - “part” can be local or global depending on resolution • Generalizes to previously unseen configurations

  21. General formulation: $G = (V, E)$, $V = (v_1, \ldots, v_n)$, $E \subseteq V \times V$. $(p_1, \ldots, p_n) \in P^n$: part locations in the image (or feature pyramid). [Figure: parts $v_1$, $v_2$ and a location $p$]

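  22. Part configuration score function: $\mathrm{score}(p_1, \ldots, p_n) = \sum_{i=1}^{n} m_i(p_i) - \sum_{(i,j) \in E} d_{ij}(p_i, p_j)$, where the first sum collects the part match scores and the second the spring costs. [Figure: parts $v_1$, $v_2$, a location $p$, and the highest scoring configurations]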

  23. Part configuration score function: $\mathrm{score}(p_1, \ldots, p_n) = \sum_{i=1}^{n} m_i(p_i) - \sum_{(i,j) \in E} d_{ij}(p_i, p_j)$, where the first sum collects the part match scores and the second the spring costs. • Objective: maximize score over $p_1, \ldots, p_n$ • $h^n$ configurations! ($h = |P|$, about 250,000) • Dynamic programming - If $G = (V, E)$ is a tree, $O(nh^2)$ general algorithm ‣ $O(nh)$ with some restrictions on $d_{ij}$
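The dynamic programming bullet can be sketched for a tree-structured model. This is a minimal sketch of the general O(nh^2) case only, with locations indexed 0..h-1; the function name `best_tree_score`, the `parent` array encoding of the tree, and the dense cost tables are assumptions made for illustration, and the O(nh) speed-up is not shown.

```python
import numpy as np

def best_tree_score(match, parent, cost):
    """Maximize sum_i m_i(p_i) - sum_{(i,j) in E} d_ij(p_i, p_j) over a tree.

    match:  (n, h) array, match[i, p] = m_i(p)
    parent: parent[i] is the parent of part i, with parent[0] = -1 (root);
            parts are ordered so that parent[i] < i
    cost:   dict mapping part i >= 1 to an (h, h) array with
            cost[i][q, p] = spring cost when the parent is at q and part i at p
    """
    n, h = match.shape
    # best[i, p]: best score of the subtree rooted at part i when i sits at p.
    best = match.astype(float).copy()
    for i in range(n - 1, 0, -1):  # children are processed before parents
        # For each parent location q, choose the best location p of part i.
        # This is the O(h^2) step, repeated once per edge.
        best_for_parent = np.max(best[i][None, :] - cost[i], axis=1)
        best[parent[i]] += best_for_parent
    return float(np.max(best[0]))
```

Each edge costs O(h^2) and a tree has n - 1 edges, which gives the O(nh^2) total instead of enumerating all h^n configurations.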

  24. Star-structured deformable part models. [Figure: “star” model with a root and parts; test image; detection]

  25. Recall the Dalal & Triggs detector: $\mathrm{score}(I, p) = \mathbf{w} \cdot \phi(I, p)$, where $p$ is a location in the feature pyramid. [Figure: image pyramid and HOG feature pyramid] • HOG feature pyramid • Linear filter / sliding-window detector • SVM training to learn parameters $\mathbf{w}$

  26. D&T + parts [FMR CVPR’08] [FGMR PAMI’10]. [Figure: image pyramid and HOG feature pyramid, with root location $p_0$ and part placements $z$] • Add parts to the Dalal & Triggs detector - HOG features - Linear filters / sliding-window detector - Discriminative training

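  27. Sliding window DPM score function: $z = (p_1, \ldots, p_n)$, $\mathrm{score}(I, p_0) = \max_{p_1, \ldots, p_n} \left[ \sum_{i=0}^{n} m_i(I, p_i) - \sum_{i=1}^{n} d_i(p_0, p_i) \right]$, where the first sum collects the filter scores and the second the spring costs. [Figure: image pyramid and HOG feature pyramid, with root location $p_0$ and part placements $z$]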

  28. Detection in a slide. [Figure: detection pipeline. From the test image, compute a feature map and a feature map at 2x resolution. Apply the model's root filter to get the response of the root filter, and the 1st to n-th part filters to get the responses of the part filters. Transform each part response by $\max_{p_i} [m_i(p_i) - d_i(p_0, p_i)]$, then add the transformed responses to the root response to get detection scores for each root location. Color encoding of filter response values: low value to high value.]
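The pipeline in this figure can be sketched on precomputed filter responses. This is an illustrative sketch under several assumptions: all responses are taken at a single resolution (the 2x part resolution is ignored), the spring cost is the quadratic form d · (dx^2, dy^2, dx, dy), each part has a hypothetical `anchor` offset relative to the root, and the maximization is done by brute force rather than by any faster method. The function names are made up for illustration.

```python
import numpy as np

def transformed_response(part_response, d, anchor):
    """out[y0, x0] = max over (y, x) of part_response[y, x] - spring cost,
    where the spring cost is d . (dx^2, dy^2, dx, dy) and (dx, dy) is the
    part's displacement from its anchor position relative to the root."""
    H, W = part_response.shape
    ys, xs = np.mgrid[0:H, 0:W]
    out = np.empty((H, W))
    for y0 in range(H):
        for x0 in range(W):
            dx = xs - (x0 + anchor[0])
            dy = ys - (y0 + anchor[1])
            spring = d[0] * dx**2 + d[1] * dy**2 + d[2] * dx + d[3] * dy
            out[y0, x0] = np.max(part_response - spring)
    return out

def dpm_scores(root_response, part_responses, deformations, anchors):
    """Detection score for each root location: root response plus the
    transformed response of every part."""
    total = root_response.astype(float).copy()
    for response, d, anchor in zip(part_responses, deformations, anchors):
        total += transformed_response(response, d, anchor)
    return total
```

For example, a single part whose response peaks one cell to the right of its anchor contributes its peak value minus the cost of a one-cell displacement.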

  29. What are the parts?

  30. Aspect soup General philosophy: enrich models to better represent the data

  31. Mixture models. Data driven: aspect, occlusion modes, subclasses. FMR CVPR ’08: AP = 0.27 (person). FGMR PAMI ’10: AP = 0.36 (person).

  32. Pushmi–pullyu? Good generalization properties on Doctor Dolittle’s farm. [Figure: two images averaged, ( + ) / 2, and the result] This was supposed to detect horses.

  33. Latent orientation. Unsupervised left/right orientation discovery. Horse AP: 0.42, 0.47, 0.57. FGMR PAMI ’10: AP = 0.36 (person). voc-release5: AP = 0.45 (person). Publicly available code for the whole system: current voc-release5.

  34. Summary of results: [DT’05] AP 0.12; [FMR’08] AP 0.27; [FGMR’10] AP 0.36; [GFM voc-release5] AP 0.45; [GFM’11] AP 0.49.

  35. Part 2: DPM parameter learning. Fixed model structure (component 1, component 2); all parameter values unknown (“?”).

  36. Part 2: DPM parameter learning. Fixed model structure (component 1, component 2); all parameter values unknown (“?”). Training images with label y = +1.

  37. Part 2: DPM parameter learning. Fixed model structure (component 1, component 2); all parameter values unknown (“?”). Training images with labels y = +1 and y = -1.

  38. Part 2: DPM parameter learning. Fixed model structure (component 1, component 2); all parameter values unknown (“?”). Training images with labels y = +1 and y = -1. Parameters to learn: – biases (per component) – deformation costs (per part) – filter weights

  39. Linear parameterization: $z = (p_1, \ldots, p_n)$, $\mathrm{score}(I, p_0) = \max_{p_1, \ldots, p_n} \left[ \sum_{i=0}^{n} m_i(I, p_i) - \sum_{i=1}^{n} d_i(p_0, p_i) \right]$ (filter scores minus spring costs). Filter scores: $m_i(I, p_i) = \mathbf{w}_i \cdot \phi(I, p_i)$. Spring costs: $d_i(p_0, p_i) = \mathbf{d}_i \cdot (dx^2, dy^2, dx, dy)$. Together: $\mathrm{score}(I, p_0) = \max_{z} \mathbf{w} \cdot \Phi(I, (p_0, z))$
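The step from the separate filter and spring terms to a single dot product can be checked numerically for one fixed placement z. This is a sketch under assumptions: `stack_parameters` and `stack_features` are hypothetical helper names, the bias terms are left out, and the spring features are negated inside the feature vector so that the deformation weights enter with a plus sign.

```python
import numpy as np

def stack_parameters(filters, deformations):
    """w = (w_0, ..., w_n, d_1, ..., d_n) as one flat vector."""
    parts = [np.asarray(f, dtype=float).ravel() for f in filters]
    parts += [np.asarray(d, dtype=float) for d in deformations]
    return np.concatenate(parts)

def stack_features(window_features, displacements):
    """Phi for a fixed placement: the features under each filter, followed by
    the negated spring features -(dx^2, dy^2, dx, dy) for each part."""
    parts = [np.asarray(f, dtype=float).ravel() for f in window_features]
    parts += [-np.array([dx**2, dy**2, dx, dy], dtype=float)
              for dx, dy in displacements]
    return np.concatenate(parts)
```

With these two vectors, `stack_parameters(...) @ stack_features(...)` equals the sum of filter scores minus the sum of spring costs for that placement, so maximizing over placements gives the max over z of w · Φ.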

  40. Positive examples (y = +1). x specifies an image and bounding box (here labeled “person”). We want $f_{\mathbf{w}}(x) = \max_{z \in Z(x)} \mathbf{w} \cdot \Phi(x, z)$ to score >= +1. $Z(x)$ includes all z with more than 70% overlap with ground truth.
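The 70% overlap rule can be sketched as a filter on candidate boxes. This sketch assumes that "overlap" means intersection over union of the candidate's box with the ground-truth box and that boxes are given as corner coordinates; both are assumptions, as are the names `overlap` and `allowed_boxes`.

```python
def overlap(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def allowed_boxes(candidates, ground_truth, min_overlap=0.7):
    """Keep only candidates that overlap the ground truth by more than min_overlap."""
    return [c for c in candidates if overlap(c, ground_truth) > min_overlap]
```

Only latent placements whose box passes this filter would be considered when maximizing over z for a positive example.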

  41. Negative examples (y = -1). x specifies an image and a HOG pyramid location $p_0$. We want $f_{\mathbf{w}}(x) = \max_{z \in Z(x)} \mathbf{w} \cdot \Phi(x, z)$ to score <= -1. $Z(x)$ restricts the root to $p_0$ and allows any placement of the other filters.

  42. Typical dataset: 300 – 8,000 positive examples; 500 million to 1 billion negative examples (not including latent configurations!). Large-scale* (*unless someone from Google is here)

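  43. How we learn parameters: latent SVM. $E(\mathbf{w}) = \frac{1}{2} \|\mathbf{w}\|^2 + C \sum_{i} \max\{0, 1 - y_i f_{\mathbf{w}}(x_i)\}$

  44. How we learn parameters: latent SVM. $E(\mathbf{w}) = \frac{1}{2} \|\mathbf{w}\|^2 + C \sum_{i} \max\{0, 1 - y_i f_{\mathbf{w}}(x_i)\}$. Split into positive examples ($i \in P$) and negative examples ($i \in N$): $E(\mathbf{w}) = \frac{1}{2} \|\mathbf{w}\|^2 + C \sum_{i \in P} \max\{0, 1 - \max_{z \in Z(x_i)} \mathbf{w} \cdot \Phi(x_i, z)\} + C \sum_{i \in N} \max\{0, 1 + \max_{z \in Z(x_i)} \mathbf{w} \cdot \Phi(x_i, z)\}$

  45. How we learn parameters: latent SVM. $E(\mathbf{w}) = \frac{1}{2} \|\mathbf{w}\|^2 + C \sum_{i} \max\{0, 1 - y_i f_{\mathbf{w}}(x_i)\}$. Split into positive examples ($i \in P$) and negative examples ($i \in N$): $E(\mathbf{w}) = \frac{1}{2} \|\mathbf{w}\|^2 + C \sum_{i \in P} \max\{0, 1 - \max_{z \in Z(x_i)} \mathbf{w} \cdot \Phi(x_i, z)\} + C \sum_{i \in N} \max\{0, 1 + \max_{z \in Z(x_i)} \mathbf{w} \cdot \Phi(x_i, z)\}$. [Figure: score plotted against $\mathbf{w}$ for latent values $z_1$, $z_2$, $z_3$, $z_4$; the maximum over $z$ is labeled convex]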
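The objective on this slide can be evaluated directly once each example's allowed latent placements have been turned into feature vectors. This is a minimal sketch for evaluating E(w) only, not a training procedure; the names `latent_score` and `lsvm_objective` and the representation of an example as a matrix with one row Φ(x, z) per allowed z are assumptions for illustration.

```python
import numpy as np

def latent_score(w, features):
    """f_w(x) = max over z of w . Phi(x, z).

    features: (num_z, dim) array with one row Phi(x, z) per allowed z.
    """
    return float(np.max(features @ w))

def lsvm_objective(w, examples, labels, C):
    """E(w) = 1/2 ||w||^2 + C * sum_i max(0, 1 - y_i * f_w(x_i))."""
    hinge = sum(max(0.0, 1.0 - y * latent_score(w, F))
                for F, y in zip(examples, labels))
    return 0.5 * float(w @ w) + C * hinge
```

For a negative example the hinge term is a maximum of functions that are linear in w, which is why it is convex; the inner maximum for a positive example enters with a minus sign.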
