Histogram of Oriented Gradients (HOG) for Object Detection


  1. Histogram of Oriented Gradients (HOG) for Object Detection
     Navneet DALAL. Joint work with Bill TRIGGS and Cordelia SCHMID.

  2. Goal & Challenges
     Goal: detect and localise people in images and videos.
     - Wide variety of articulated poses
     - Variable appearance and clothing
     - Complex backgrounds
     - Unconstrained illumination
     - Occlusions, different scales
     - Video sequences involve motion of the subject, the camera and the objects in the background
     Main assumption: upright, fully visible people.

  3. Chronology
     - Haar wavelets as features + AdaBoost for learning
       - Viola & Jones, ICCV 2001
       - De facto standard for detecting faces in images
     - Another approach: Haar wavelets + SVM
       - Papageorgiou & Poggio, 2000; Mohan et al., 2000
     [Figure: Haar wavelet masks with weights -1, +1, -2, +1, +1]

  4. Chronology
     - Edge templates from Gavrila et al.
     - Based on the information bottleneck principle of Tishby et al.
     - Maximise mutual information (MI) between edge fragments and the detection task
     Pros:
     - Supports irregular shapes and partial occlusions
     - Window-free framework
     Cons:
     - Sensitive to edge detection and edge threshold
     - Not resistant to local illumination changes
     - Needs segmented positive images
     On par with the then state of the art.

  5. Chronology
     - Key point detectors repeat on backgrounds
     - Key point detectors do not repeat on people, even across two consecutive frames of a video
     - Leibe et al., 2005; Mikolajczyk et al., 2004
     A different approach was needed.

  6. Overview of Methodology
     Detection phase:
     1. Scan image(s) at all scales and locations (scale-space pyramid)
     2. Extract features over windows (detection window)
     3. Run linear SVM classifier on all locations
     4. Fuse multiple detections in 3-D position & scale space
     5. Output object detections with bounding boxes
     Focus on building robust feature sets (static & motion).
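The dense scan in the detection phase can be sketched as an enumeration of window positions over a scale pyramid. This is only an illustration: the function names, the 64 x 128 window, the 8-pixel stride and the 1.2 scale ratio are assumptions chosen for the example, not values stated on this slide.

```python
def pyramid_windows(img_w, img_h, win_w=64, win_h=128, stride=8, scale_ratio=1.2):
    """Yield (scale, x, y) for every window position at every pyramid level.

    `scale` is the factor by which the image has been shrunk; (x, y) is the
    window's top-left corner in the coordinates of the shrunken image.
    All default values are illustrative assumptions.
    """
    scale = 1.0
    # Keep shrinking the image until the window no longer fits inside it.
    while img_w / scale >= win_w and img_h / scale >= win_h:
        level_w = int(img_w / scale)
        level_h = int(img_h / scale)
        for y in range(0, level_h - win_h + 1, stride):
            for x in range(0, level_w - win_w + 1, stride):
                yield scale, x, y
        scale *= scale_ratio


def to_image_box(scale, x, y, win_w=64, win_h=128):
    """Map a window at a pyramid level back to (x, y, w, h) in the original image."""
    return (x * scale, y * scale, win_w * scale, win_h * scale)
```

Each yielded window would then be encoded as a feature vector and scored by the linear SVM; the resulting (position, scale, score) triples are what the fusion step operates on.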

  7. HOG for Finding People in Images
     N. Dalal and B. Triggs. Histograms of Oriented Gradients for Human Detection. CVPR, 2005.

  8. Static Feature Extraction
     Processing chain:
     1. Input image, detection window
     2. Normalise gamma
     3. Compute gradients
     4. Weighted vote into spatial & orientation cells
     5. Contrast normalise over overlapping spatial blocks of cells
     6. Collect HOGs over the detection window into a feature vector f = [..., ..., ...]
     7. Linear SVM
     N. Dalal and B. Triggs. Histograms of Oriented Gradients for Human Detection. CVPR, 2005.
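A minimal pure-Python sketch of the chain above, from gradients to the final feature vector. It is a simplification under stated assumptions: gamma normalisation is skipped, votes go to the nearest orientation bin with no interpolation, orientations are unsigned (0 to 180 degrees), and the 8-pixel cells, 2 x 2-cell blocks and `eps` value are illustrative choices. The function names are hypothetical.

```python
import math


def cell_histograms(img, cell=8, bins=9):
    """img: 2-D list of grey levels. Returns hist[cy][cx][bin]."""
    h, w = len(img), len(img[0])
    ncy, ncx = h // cell, w // cell
    hist = [[[0.0] * bins for _ in range(ncx)] for _ in range(ncy)]
    for y in range(ncy * cell):
        for x in range(ncx * cell):
            # Centred differences (a 3-tap derivative mask); borders are replicated.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            b = min(int(angle * bins / 180.0), bins - 1)
            hist[y // cell][x // cell][b] += mag  # vote weighted by gradient magnitude
    return hist


def hog_descriptor(img, cell=8, block=2, bins=9, eps=1e-3):
    """Concatenate L2-normalised blocks of cell histograms (block stride: one cell)."""
    hist = cell_histograms(img, cell, bins)
    ncy, ncx = len(hist), len(hist[0])
    feat = []
    for by in range(ncy - block + 1):
        for bx in range(ncx - block + 1):
            v = [val
                 for dy in range(block)
                 for dx in range(block)
                 for val in hist[by + dy][bx + dx]]
            norm = math.sqrt(sum(a * a for a in v) + eps * eps)
            feat.extend(a / norm for a in v)
    return feat
```

With these assumed settings, a 64 x 128 window has 8 x 16 cells and 7 x 15 overlapping blocks of 36 values each, so the descriptor has 3780 entries.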

  9. Overview of Learning Phase
     Learning phase:
     1. Input: annotations on training images
     2. Create fixed-resolution normalised training image data set
     3. Encode images into feature spaces
     4. Learn binary classifier
     5. Resample negative training images to create hard examples
     6. Encode images into feature spaces
     7. Learn binary classifier
     8. Object/non-object decision
     Retraining reduces false positives by an order of magnitude!
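The retraining step can be sketched as a single round of hard-negative mining. The classifier is deliberately abstract here: `train_fn` and `score_fn` are hypothetical callables standing in for SVM training and scoring, and the toy one-dimensional trainer below exists only to make the example runnable.

```python
def retrain_with_hard_negatives(pos, neg, neg_pool, train_fn, score_fn, threshold=0.0):
    """Train, collect false positives from a negative pool, then retrain.

    pos, neg  : initial positive and negative feature vectors
    neg_pool  : features of windows scanned from person-free images
    train_fn  : (pos, neg) -> model            (hypothetical interface)
    score_fn  : (model, feature) -> float      (hypothetical interface)
    """
    model = train_fn(pos, neg)
    # Hard examples are negatives that the first classifier wrongly accepts.
    hard = [x for x in neg_pool if score_fn(model, x) > threshold]
    return train_fn(pos, neg + hard), hard


# Toy stand-in for a linear classifier on 1-D features: a decision boundary
# halfway between the lowest positive and the highest negative.
def toy_train(pos, neg):
    return (min(pos) + max(neg)) / 2.0


def toy_score(boundary, x):
    return x - boundary


model, hard = retrain_with_hard_negatives(
    pos=[5.0, 6.0], neg=[0.0, 1.0], neg_pool=[0.5, 3.5, 4.0, 2.0],
    train_fn=toy_train, score_fn=toy_score)
```

In the toy run the first boundary sits at 3.0, so 3.5 and 4.0 are mined as hard negatives and the retrained boundary moves to 4.5.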

  10. HOG Descriptors
      Parameters:
      - Gradient scale
      - Orientation bins
      - Percentage of block overlap
      Schemes:
      - RGB or Lab, colour or grey space
      - Block normalisation:
        L2-norm: $v \leftarrow v / \sqrt{\|v\|_2^2 + \varepsilon^2}$, or
        L1-norm: $v \leftarrow v / (\|v\|_1 + \varepsilon)$
      [Figure: R-HOG/SIFT layout (block, cell) and C-HOG layout (centre bin)]

  11. Evaluation Data Sets
      MIT pedestrian database:
      - Train: 507 positive windows; negative data unavailable
      - Test: 200 positive windows; negative data unavailable
      - Overall: 709 annotations + reflections
      INRIA person database:
      - Train: 1208 positive windows; 1218 negative images
      - Test: 566 positive windows; 453 negative images
      - Overall: 1774 annotations + reflections

  12. Overall Performance
      [Plots: MIT pedestrian database; INRIA person database]
      - R/C-HOG give near-perfect separation on the MIT database
      - 1-2 orders of magnitude fewer false positives than other descriptors

  13. Performance on INRIA Database

  14. Effect of Parameters
      Gradient smoothing, σ:
      - Reducing gradient scale from 3 to 0 decreases false positives by 10 times
      Orientation bins, β:
      - Increasing orientation bins from 4 to 9 decreases false positives by 10 times

  15. Normalisation Method & Block Overlap
      Normalisation method:
      - Strong local normalisation is essential
      Block overlap:
      - Overlapping blocks improve performance, but descriptor size increases

  16. Effect of Block and Cell Size
      [Figure: 64 x 128 detection window]
      - Trade-off between the need for local spatial invariance and the need for finer spatial resolution

  17. Descriptor Cues
      [Figure panels: input example; average gradients; weighted pos wts; weighted neg wts; outside-in weights]
      - Most important cues are head, shoulder and leg silhouettes
      - Vertical gradients inside a person are counted as negative
      - Overlapping blocks just outside the contour are most important

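  18. Multi-Scale Object Localisation
      - Multi-scale dense scan of the detection window
      - Detection score: threshold, bias, clip
      - Apply robust mode detection, like mean shift
      - Final detections
      Each detection i has position $(x_i, y_i)$, scale $s_i$ (in log) and score $w_i$. With $\mathbf{x} = (x, y, s)$:
      $H_i = \mathrm{diag}\left[(\exp(s_i)\,\sigma_x)^2,\ (\exp(s_i)\,\sigma_y)^2,\ \sigma_s^2\right]$
      $f(\mathbf{x}) = \sum_{i=1}^{n} w_i \exp\left(-\left\| H_i^{-1/2} (\mathbf{x} - \mathbf{x}_i) \right\|^2 / 2\right)$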
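The mode-detection step can be illustrated with a fixed-point (mean-shift style) iteration on a weighted Gaussian kernel density over (x, y, log scale), where each detection's spatial bandwidth grows with its scale. This is a sketch, not the authors' implementation: the function name, the default `sigma` values and the stopping rule are assumptions.

```python
import math


def mean_shift_mode(start, points, weights, sigma=(8.0, 16.0, 0.5), iters=100, tol=1e-6):
    """Climb from `start` to a nearby mode of a weighted kernel density.

    points  : list of (x, y, s) detections, with s = log(scale)
    weights : non-negative detection scores (e.g. clipped SVM outputs)
    sigma   : (sigma_x, sigma_y, sigma_s) smoothing values; illustrative only
    """
    cur = list(start)
    for _ in range(iters):
        num = [0.0, 0.0, 0.0]
        den = [0.0, 0.0, 0.0]
        for p, w in zip(points, weights):
            # Per-detection diagonal bandwidth: spatial spread scales with exp(s_i).
            var = ((math.exp(p[2]) * sigma[0]) ** 2,
                   (math.exp(p[2]) * sigma[1]) ** 2,
                   sigma[2] ** 2)
            d2 = sum((cur[k] - p[k]) ** 2 / var[k] for k in range(3))
            k_i = w * math.exp(-0.5 * d2)
            for k in range(3):
                num[k] += k_i * p[k] / var[k]
                den[k] += k_i / var[k]
        if den[0] == 0.0:  # no detection has any influence here
            break
        new = [num[k] / den[k] for k in range(3)]
        shift = max(abs(new[k] - cur[k]) for k in range(3))
        cur = new
        if shift < tol:
            break
    return tuple(cur)
```

Running the iteration from every detection and merging the end points that coincide gives one fused detection per mode; two equally weighted detections 4 pixels apart at the same scale, for instance, converge to their midpoint.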

  19. Effect of Spatial Smoothing
      - Spatial smoothing: aspect ratio follows the window shape; the smallest sigma is approximately equal to the stride/cell size
      - Results are relatively independent of scale smoothing; a sigma of 0.4 to 0.7 octaves gives good results

  20. Effect of Other Parameters
      Different mappings:
      - Hard clipping of SVM scores gives better results than a simple probabilistic mapping of these scores
      Effect of scale ratio:
      - Fine scale sampling helps improve recall

  21. HOGs vs Approaches to Date
      - HOG is still among the best detectors in terms of false positives per image (FPPI)
      - See Dollar et al., "Pedestrian Detection: A Benchmark", CVPR 2009
      [Plot (b), typical aspect ratios: miss rate vs false positives per image, 10^-2 to 10^2. Legend: VJ (0.85), HOG (0.44), FtrMine (0.55), Shapelet (0.80), MultiFtr (0.54), LatSvm (0.63), HikSvm (0.77)]

  22. Results Using Static HOG
      No temporal smoothing of detections.

  23. Conclusions for Static Case
      - Fine-grained features improve performance
        - Rectify fine gradients, then pool spatially
          - No gradient smoothing, [1 0 -1] derivative mask
          - Orientation voting into fine bins
          - Spatial voting into coarser bins
        - Use gradient magnitude (no thresholding)
        - Strong local normalisation
        - Use overlapping blocks
        - Robust non-maximum suppression
          - Fine scale sampling, hard clipping & anisotropic kernel
      Pro: human detection rate of 90% at 10^-4 false positives per window.
      Con: slower than the integral images of Viola & Jones, 2001.

  24. Applications to Other Classes
      M. Everingham et al. The 2005 PASCAL Visual Object Classes Challenge. Proceedings of the PASCAL Challenge Workshop, 2006.

  25. Motion HOG for Finding People in Videos
      N. Dalal, B. Triggs and C. Schmid. Human Detection Using Oriented Histograms of Flow and Appearance. ECCV, 2006.

  26. Finding People in Videos
      - Motivation
        - Human motion is very characteristic
      - Requirements
        - Must work for moving camera and background
        - Robust coding of relative motion of human parts
      - Previous works
        - Viola et al., 2003
        - Gavrila et al., 2004
        - Efros et al., 2003
      [Figure courtesy: R. Blake, Vanderbilt Univ.]
      N. Dalal, B. Triggs and C. Schmid. Human Detection Using Oriented Histograms of Flow and Appearance. ECCV, 2006.

  27. Handling Camera Motion
      - Camera motion characterisation
        - Pan and tilt are locally translational
        - The rest is depth-induced motion parallax
      - Use local differential of flow
        - Cancels out effects of camera rotation
        - Highlights 3D depth boundaries
        - Highlights motion boundaries
      - Robust encoding into oriented histograms
        - Some encodings focus on capturing motion boundaries
        - Others focus on capturing internal motion or the relative dynamics of different limbs
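A small sketch of why a local differential of flow is insensitive to locally translational camera motion: adding the same displacement to every flow vector leaves neighbour-to-neighbour differences unchanged, while a motion boundary still shows up. The function names and the tiny integer flow field are made up for illustration.

```python
def flow_differences(flow):
    """flow[y][x] = (u, v). Return differences between horizontal neighbours."""
    return [[(row[x + 1][0] - row[x][0], row[x + 1][1] - row[x][1])
             for x in range(len(row) - 1)]
            for row in flow]


def add_translation(flow, du, dv):
    """Simulate a pan/tilt that shifts every flow vector by the same (du, dv)."""
    return [[(u + du, v + dv) for (u, v) in row] for row in flow]


# Static background on the left, a region moving by (2, 1) on the right.
flow = [[(0, 0), (0, 0), (2, 1), (2, 1)] for _ in range(3)]
panned = add_translation(flow, 3, -2)

# The only non-zero difference sits on the motion boundary, with or without the pan.
same = flow_differences(panned) == flow_differences(flow)
```

Here `same` is `True`, and each row of differences is `[(0, 0), (2, 1), (0, 0)]`.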

  28. Motion HOG Processing Chain
      1. Input image and consecutive image; detection windows
      2. Normalise gamma & colour
      3. Compute optical flow (flow field, magnitude of flow)
      4. Compute differential flow (differential flow X, differential flow Y)
      5. Accumulate votes for differential flow orientation over spatial cells
      6. Normalise contrast within overlapping blocks of cells
      7. Collect HOGs for all blocks over the detection window

  29. Overview of Feature Extraction
      Feature extraction:
      - Input image: appearance channel, static HOG encoding
      - Consecutive image(s): motion channel, motion HOG encoding
      - Collect HOGs over detection window
      - Linear SVM
      - Object/non-object decision
      Data set:
      - Train: 5 DVDs, 182 shots; 5562 positive windows
      - Test 1: same 5 DVDs, 50 shots; 1704 positive windows
      - Test 2: 6 new DVDs, 128 shots; 2700 positive windows

  30. Coding Motion Boundaries
      - Treat x- and y-flow components as independent images
      - Take their local gradients separately, and compute HOGs as in static images
      Motion Boundary Histograms (MBH) encode depth and motion boundaries.
      [Figure panels: first frame; second frame; estimated flow; flow magnitude; x-flow diff; y-flow diff; average x-flow diff; average y-flow diff]
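The two bullets above can be sketched for a single cell: each flow component is treated as a grey-level image, its gradient orientations are histogrammed with magnitude weighting, and the two histograms are concatenated. The names are hypothetical, and the sketch leaves out block normalisation, vote interpolation and the optical flow estimation itself, which is assumed to be given.

```python
import math


def orientation_histogram(chan, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations of one 2-D channel."""
    h, w = len(chan), len(chan[0])
    hist = [0.0] * bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = chan[y][x + 1] - chan[y][x - 1]
            gy = chan[y + 1][x] - chan[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag > 0.0:
                angle = math.degrees(math.atan2(gy, gx)) % 180.0
                hist[min(int(angle * bins / 180.0), bins - 1)] += mag
    return hist


def mbh_cell(flow_x, flow_y, bins=9):
    """Histogram the gradients of the x- and y-flow images separately, then concatenate."""
    return orientation_histogram(flow_x, bins) + orientation_histogram(flow_y, bins)
```

A patch whose left half is static and whose right half moves horizontally puts all of its votes in the first x-flow bin, which is the signature of a vertical motion boundary.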

  31. Coding Internal Dynamics
      - Ideally, compute relative displacements of different limbs
        - Requires reliable part detectors
      - Parts are relatively localised in our detection windows
      - Allows different coding schemes based on fixed spatial differences
      Internal Motion Histograms (IMH) encode relative dynamics of different regions.
