Category-level localization




  1. Category-level localization, Cordelia Schmid

  2. Recognition
• Classification
– Object present/absent in an image
– Often presence of a significant amount of background clutter
• Localization / Detection
– Localize object within the frame
– Bounding box or pixel-level segmentation

  3. Pixel-level object classification

  4. Difficulties
• Intra-class variations
• Scale and viewpoint change
• Multiple aspects of categories

  5. Approaches
• Intra-class variation => model the variations, mainly by learning from a large dataset, for example with SVMs
• Scale + limited viewpoint changes => multi-scale approach or invariant local features
• Multiple aspects of categories => separate detectors for each aspect (e.g. front/profile face), or build an approximate 3D "category" model

  6. Outline
1. Sliding window detectors
2. Features and adding spatial information
3. Histogram of Oriented Gradients (HOG)
4. State-of-the-art algorithms and PASCAL VOC

  7. Sliding window detector
• Basic component: a binary car/non-car classifier applied to each window ("Yes, a car" / "No, not a car")

  8. Sliding window detector
• Detect objects in clutter by search with the car/non-car classifier
• Sliding window: exhaustive search over position and scale

  9. Sliding window detector
• Detect objects in clutter by search with the car/non-car classifier
• Sliding window: exhaustive search over position and scale

  10. Detection by classification
• Detect objects in clutter by search with the car/non-car classifier
• Sliding window: exhaustive search over position and scale (can use the same size window over a spatial pyramid of images)
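The exhaustive search over position and scale described on the last few slides can be sketched as a triple loop. The 8-pixel stride and the three scales below are illustrative assumptions, not values from the lecture.

```python
def sliding_windows(img_h, img_w, win_h, win_w, stride=8, scales=(1.0, 1.5, 2.0)):
    """Enumerate (x, y, scale) candidate windows over an image.

    Each window would then be passed independently to the binary
    car/non-car classifier: this is the exhaustive search over
    position and scale.
    """
    windows = []
    for s in scales:
        # Scale the window instead of resizing the image; searching a
        # spatial pyramid of images with a fixed window is equivalent.
        h, w = int(win_h * s), int(win_w * s)
        for y in range(0, img_h - h + 1, stride):
            for x in range(0, img_w - w + 1, stride):
                windows.append((x, y, s))
    return windows

# One (x, y, scale) triple per candidate window on a 320x240 image.
wins = sliding_windows(240, 320, 64, 64)
```

Even with this coarse stride the window count runs into the thousands, which is why the later slides on acceleration matter.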

  11. Classification vs. detection
• Does the image contain a car?
• Classification: unknown location + clutter => lots of invariance
• Detection: uncluttered, normalized window => more "detail"

  12. Window (image) classification
Training data -> feature extraction -> classifier -> car/non-car
• Features usually engineered
• Classifier learnt from data

  13. Problems with sliding windows
• aspect ratio
• granularity (finite grid)
• partial occlusion
• multiple responses
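The last problem, multiple responses around each true object, is commonly handled with greedy non-maximum suppression. A minimal sketch, assuming axis-aligned (x1, y1, x2, y2) boxes and a 0.5 overlap threshold (neither specified in the slides):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / float(area(a) + area(b) - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring window,
    drop its overlapping neighbours, repeat on what remains."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < iou_thresh]
    return keep

# Two heavily overlapping detections plus one distant one:
kept = nms([(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)],
           [0.9, 0.8, 0.7])
```

The second box overlaps the first with IoU ≈ 0.68, so only the first and the distant third survive.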

  14. Outline
1. Sliding window detectors
2. Features and adding spatial information
3. Histogram of Oriented Gradients (HOG)
4. State-of-the-art algorithms and PASCAL VOC

  15. BoW + spatial pyramids
Start from BoW for a region of interest (ROI)
• no spatial information recorded
• sliding window detector
Bag of Words -> feature vector

  16. Adding spatial information to Bag of Words
• Concatenate the per-region BoW histograms into one feature vector
• Keeps a fixed-length feature vector for a window

  17. Spatial pyramid – represent correspondence
• 1 BoW + 4 BoW + 16 BoW (histograms over a 1x1, 2x2 and 4x4 grid, concatenated)
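The 1 + 4 + 16 BoW construction above can be sketched directly: bin each visual word into the grid cell containing it at every pyramid level, then concatenate the per-cell histograms. Function name and arguments are illustrative, not from the lecture.

```python
import numpy as np

def spatial_pyramid_bow(word_ids, xs, ys, win_w, win_h, vocab_size,
                        levels=(1, 2, 4)):
    """Concatenate per-cell BoW histograms over 1x1, 2x2 and 4x4 grids,
    giving a fixed (1 + 4 + 16) * vocab_size dimensional vector."""
    feats = []
    for g in levels:
        hist = np.zeros((g, g, vocab_size))
        for w, x, y in zip(word_ids, xs, ys):
            cx = min(int(x * g / win_w), g - 1)   # which grid cell holds (x, y)
            cy = min(int(y * g / win_h), g - 1)
            hist[cy, cx, w] += 1
        feats.append(hist.ravel())
    return np.concatenate(feats)

# Two visual words (ids 0 and 2) inside a 10x10 window, vocabulary of 3.
feat = spatial_pyramid_bow([0, 2], [1, 8], [1, 8], 10, 10, vocab_size=3)
```

Each word is counted once per level, so the vector sums to (number of words) × (number of levels) while its length stays fixed regardless of how many words fall in the window.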

  18. Dense visual words
• Why extract only sparse image fragments?
• Good where lots of invariance is needed, but not relevant to sliding window detection?
• Extract dense visual words on an overlapping grid: patch / SIFT -> quantize -> word
• More "detail" at the expense of invariance
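The dense-grid extraction can be sketched as below. As a simplifying assumption, raw pixel patches stand in for the SIFT descriptors the slide mentions; the quantization step (nearest codebook centre) is the same either way.

```python
import numpy as np

def dense_words(image, centers, patch=8, stride=4):
    """Extract patches on an overlapping grid and assign each one to the
    nearest codebook centre, yielding (x, y, word_id) triples.

    `centers` is a (num_words, patch*patch) codebook; raw pixels stand
    in for SIFT descriptors here."""
    H, W = image.shape
    words = []
    for y in range(0, H - patch + 1, stride):
        for x in range(0, W - patch + 1, stride):
            p = image[y:y + patch, x:x + patch].ravel()
            d = ((centers - p) ** 2).sum(axis=1)  # squared distance to each centre
            words.append((x, y, int(d.argmin())))
    return words

# Toy codebook of two "words": an all-dark patch and an all-bright patch.
img = np.zeros((12, 12))
codebook = np.stack([np.zeros(64), np.ones(64)])
ws = dense_words(img, codebook)
```

With stride smaller than the patch size the grid overlaps, which is what buys the extra "detail" at the expense of invariance.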

  19. Outline
1. Sliding window detectors
2. Features and adding spatial information
3. Histogram of Oriented Gradients + linear SVM classifier
4. State-of-the-art algorithms and PASCAL VOC

  20. Feature: Histogram of Oriented Gradients (HOG)
• tile the 64 x 128 pixel window into 8 x 8 pixel cells
• each cell represented by a histogram over 8 orientation bins (i.e. angles in range 0-180 degrees)
(Figure: HOG image next to a per-cell histogram of frequency vs. orientation, showing the dominant direction.)

  21. Histogram of Oriented Gradients (HOG) continued
• Adds a second level of overlapping spatial bins, re-normalizing orientation histograms over a larger spatial area
• Feature vector dimension (approx.) = 16 x 8 (for tiling) x 8 (orientations) x 4 (for blocks) = 4096
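The basic building block of that 4096-dimensional vector is the per-cell orientation histogram from slide 20: each pixel votes for the bin of its gradient orientation (unsigned, 0-180 degrees), weighted by gradient magnitude. A minimal sketch of one cell, without the block normalization:

```python
import numpy as np

def cell_hog(cell, bins=8):
    """Magnitude-weighted histogram of unsigned gradient orientations
    (0-180 degrees) for a single 8x8 cell."""
    gy, gx = np.gradient(cell.astype(float))       # per-axis image gradients
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # fold into unsigned range
    hist = np.zeros(bins)
    for m, a in zip(mag.ravel(), ang.ravel()):
        hist[min(int(a / (180.0 / bins)), bins - 1)] += m
    return hist

# A vertical step edge: gradients are horizontal, orientation ~0 degrees.
cell = np.tile(np.r_[np.zeros(4), np.ones(4)], (8, 1))
h = cell_hog(cell)
```

All the gradient energy of this edge lands in bin 0, which is the "dominant direction" the slide's figure illustrates; the full descriptor then stacks 16 x 8 such cell histograms and re-normalizes each over 4 overlapping blocks.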

  22. Window (image) classification
Training data -> feature extraction -> classifier -> pedestrian/non-pedestrian
• HOG features
• Linear SVM classifier

  23. Averaged examples

  24. Dalal and Triggs, CVPR 2005

  25. Positive training data, their average, and the learned model f(x) = w^T x + b
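Scoring a window with the learned linear model is a single dot product. The weight vector below is a made-up 4-dimensional toy, not the real HOG-sized model:

```python
import numpy as np

# Hypothetical learned weights for a 4-dim toy feature (illustration only).
w = np.array([0.5, -0.2, 0.1, 0.3])
b = -0.1

def window_score(x):
    """Linear SVM window score f(x) = w^T x + b; a window is declared a
    detection when the score exceeds a chosen threshold."""
    return float(w @ x + b)

x = np.array([1.0, 0.0, 0.0, 2.0])
s = window_score(x)
```

Because f is linear, w has the same dimensionality as the HOG feature and can itself be visualized as a HOG template, which is what the slide's "learned model" image shows.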

  26. Training a sliding window detector
• Unlike training an image classifier, there are a (virtually) infinite number of possible negative windows
• Training (learning) generally proceeds in three distinct stages:
1. Bootstrapping: learn an initial window classifier from positives and random negatives
2. Hard negatives: use the initial window classifier for detection on the training images (inference) and identify false positives with a high score
3. Retraining: use the hard negatives as additional training data

  27. Training a sliding window detector
• Object detection is inherently asymmetric: much more "non-object" than "object" data
• Classifier needs to have a very low false positive rate
• Non-object category is very complex – need lots of data

  28. Bootstrapping
1. Pick negative training set at random
2. Train classifier
3. Run on training data
4. Add false positives to training set
5. Repeat from 2
• Collect a finite but diverse set of non-object windows
• Force classifier to concentrate on hard negative examples
• For some classifiers, can ensure equivalence to training on the entire data set
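The three-stage pipeline of slides 26-28 can be sketched end to end on toy 2-D data. A perceptron stands in for the linear SVM (assumption: any linear trainer illustrates the mining loop), and Gaussian blobs stand in for HOG features:

```python
import numpy as np

def train_linear(pos, neg, epochs=100, lr=0.1):
    """Perceptron stand-in for the lecture's linear SVM."""
    X = np.vstack([pos, neg])
    y = np.r_[np.ones(len(pos)), -np.ones(len(neg))]
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:          # misclassified -> update
                w, b = w + lr * yi * xi, b + lr * yi
    return w, b

rng = np.random.default_rng(0)
pos = rng.normal(2.0, 0.5, size=(50, 2))        # toy "object" windows
neg_pool = rng.normal(0.0, 1.0, size=(2000, 2)) # huge pool of negative windows

neg = neg_pool[:50]                             # 1. random negatives
w1, b1 = train_linear(pos, neg)                 #    bootstrap classifier

scores = neg_pool @ w1 + b1                     # 2. run detector on negatives,
hard = neg_pool[np.argsort(scores)[::-1][:100]] #    keep top-scoring false positives

w2, b2 = train_linear(pos, np.vstack([neg, hard]))  # 3. retrain with hard negatives
```

Mining replaces an intractably large negative set with the finite subset the current model finds hardest, which is exactly the point of the loop on this slide.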

  29. Example: train an upper body detector
– Training data – used for training and validation sets
• 33 Hollywood2 training movies
• 1122 frames with upper bodies marked
– First stage training (bootstrapping)
• 1607 upper body annotations jittered to 32k positive samples
• 55k negatives sampled from the same set of frames
– Second stage training (retraining)
• 150k hard negatives found in the training data

  30. Training data – positive annotations

  31. Positive windows Note: common size and alignment

  32. Jittered positives

  33. Jittered positives

  34. Random negatives

  35. Random negatives

  36. Window (image) first-stage classification
Jittered positives + random negatives -> HOG feature extraction -> linear SVM classifier, f(x) = w^T x + b
• find high-scoring false positive detections
• these are the hard negatives for the next round of training
• cost = # training images x inference on each image

  37. Hard negatives

  38. Hard negatives

  39. First stage performance on validation set

  40. Precision – recall curve
• Precision: % of returned windows that are correct
• Recall: % of correct windows that are returned
(Figure: precision vs. recall; the curve is traced out as the classifier score threshold decreases.)
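Tracing the curve means sweeping the score threshold from high to low and recording precision and recall after each returned window. A minimal sketch on hypothetical detections (scores and correctness labels are made up):

```python
def precision_recall(scores, labels):
    """Sweep the detector threshold from the highest score downwards.
    After returning k windows: precision = correct returned / k,
    recall = correct returned / all correct windows."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(labels)
    tp, prec, rec = 0, [], []
    for k, i in enumerate(order, start=1):
        tp += labels[i]                 # label 1 = correct window
        prec.append(tp / k)
        rec.append(tp / n_pos)
    return prec, rec

# Four detections sorted by score; the second is a false positive.
prec, rec = precision_recall([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 1])
```

Recall can only rise as the threshold drops, while precision dips at every false positive, giving the characteristic sawtooth of the slide's plot.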

  41. Effects of retraining

  42. Side by side: before retraining / after retraining

  43. Side by side: before retraining / after retraining

  44. Accelerating sliding window search
• Sliding window search is slow because so many windows are needed, e.g. x × y × scale ≈ 100,000 for a 320×240 image
• Most windows are clearly not the object class of interest
• Can we speed up the search?
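The x × y × scale count above comes from sliding a fixed window over every level of an image pyramid. A sketch of the arithmetic, with an assumed 4-pixel stride, 64-pixel window, and 1.2x pyramid step (the slide does not specify these):

```python
def n_windows(img_w, img_h, win=64, stride=4, levels=8, scale_step=1.2):
    """Count sliding windows over an image pyramid: at each level the
    image shrinks by scale_step and the fixed-size window is re-slid."""
    total, w, h = 0, float(img_w), float(img_h)
    for _ in range(levels):
        if w >= win and h >= win:
            total += (((int(w) - win) // stride + 1)
                      * ((int(h) - win) // stride + 1))
        w, h = w / scale_step, h / scale_step   # next pyramid level
    return total

single_scale = n_windows(320, 240, levels=1)    # one scale only
pyramid = n_windows(320, 240)                   # full pyramid
```

With a dense stride and enough pyramid levels the total reaches the order of magnitude the slide quotes, and each of those windows pays full classification cost, motivating the cascade on the next slide.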

  45. Cascaded classification
• Build a sequence of classifiers with increasing complexity: later stages are more complex, slower, with a lower false positive rate
• Window -> Classifier 1 -> Classifier 2 -> ... -> Classifier N -> face; at each stage "possibly a face" passes forward, "non-face" is rejected immediately
• Reject easy non-objects using simpler and faster classifiers
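The cascade logic itself is a short early-exit loop: run the stages in order of increasing cost and stop at the first rejection. The two toy stages below are assumptions for illustration, each thresholding one feature of the window:

```python
def cascade(window, stages):
    """Apply classifiers in order of increasing cost; reject ("non-face")
    at the first stage whose score falls below its threshold, and report
    a face only if every stage passes."""
    for classify, thresh in stages:
        if classify(window) < thresh:
            return False      # early rejection: cheap stages remove easy non-faces
    return True               # survived every stage

# Hypothetical stages: each scores one component of a toy feature pair.
stages = [(lambda x: x[0], 0.5),   # fast, coarse first stage
          (lambda x: x[1], 0.5)]   # slower stage, run only on survivors
```

Since most windows are easy non-objects, the expensive later stages run on only a tiny fraction of the ~100,000 candidates, which is where the speed-up comes from.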
