People-Tracking-by-Detection and People-Detection-by-Tracking
Mykhaylo Andriluka, Stefan Roth, Bernt Schiele
Department of Computer Science, TU Darmstadt
People-Tracking-by-Detection and People-Detection-by-Tracking - CVPR 2008
Motivation
• Goal: Detection and tracking of people in complex scenes
• Challenges for detection:
‣ Partial occlusions
‣ Appearance variation
‣ Data association difficult
• Challenges for tracking:
‣ Dynamic backgrounds
‣ Multiple people
‣ Frequent long-term occlusions
Overview
Three stages of our multi-person detection and tracking system:
1. Single-frame detection
2. Tracklet detection
3. Tracking through occlusion
Previous Work
• People Detection & Tracking:
‣ [Fossati et al., CVPR 2007]: 3D articulated tracking aided by detection; single person, ground plane needed.
‣ [Leibe et al., ICCV 2007]: Detection and tracking of multiple people; high viewpoint → no full-body occlusions.
‣ [Ramanan et al., PAMI 2007]: Appearance model learned from people detection, then used for tracking and data association.
‣ [Wu & Nevatia, IJCV 2007]: Use detection for tracking and work for multiple people, but no articulations, and the detector is not aided by tracking.
• Here:
‣ More people
‣ Significant, long-term full-body occlusions
‣ However: more restricted scenario (2-D, people in side views)
Single-frame Detector: partISM
• Appearance of parts: Implicit Shape Model (ISM) [Leibe, Seemann & Schiele, CVPR 2005]
• Part decomposition and inference: Pictorial structures model [Felzenszwalb & Huttenlocher, IJCV 2005]
$p(L \mid E) \propto p(E \mid L)\, p(L)$, where $L$ denotes the body-part positions and $E$ the image evidence.
[Figure: body model with object center $x^o$ and parts $x_1, \ldots, x_8$]
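Because the prior is star-shaped, maximizing $p(L \mid E)$ decomposes once the object center is fixed: each part can then be placed independently. The following is a minimal brute-force sketch under stated assumptions, not the partISM implementation: per-part log-likelihood maps stand in for the ISM part evidence, the articulation is held fixed, part offsets from the center are Gaussian with diagonal covariance, and the center prior is uniform. The function name `star_map_inference` is hypothetical.

```python
import math

def star_map_inference(part_loglik, offset_means, offset_vars):
    """Brute-force MAP inference for a star-shaped pictorial structure.

    part_loglik[i][y][x]: log p(E | part i at (y, x)), e.g. from part detectors.
    offset_means[i]: mean offset (dy, dx) of part i relative to the center x^o.
    offset_vars[i]: per-axis variances (vy, vx) of that offset.
    Assumes a uniform prior over the center and a fixed articulation.
    """
    H, W = len(part_loglik[0]), len(part_loglik[0][0])
    best_score, best_center, best_parts = -math.inf, None, None
    for cy in range(H):
        for cx in range(W):
            score, parts = 0.0, []
            # Given the center, each part is optimized independently (star model).
            for i, loglik in enumerate(part_loglik):
                my, mx = offset_means[i]
                vy, vx = offset_vars[i]
                best_i, pos_i = -math.inf, None
                for y in range(H):
                    for x in range(W):
                        # Gaussian log-prior log p(x_i | x^o), up to a constant.
                        dy, dx = y - cy - my, x - cx - mx
                        s = loglik[y][x] - 0.5 * (dy * dy / vy + dx * dx / vx)
                        if s > best_i:
                            best_i, pos_i = s, (y, x)
                score += best_i
                parts.append(pos_i)
            if score > best_score:
                best_score, best_center, best_parts = score, (cy, cx), parts
    return best_center, best_parts, best_score

# Toy example: two parts with strong evidence 2 px above and 3 px below (5, 5).
H = W = 12
loglik = [[[-10.0] * W for _ in range(H)] for _ in range(2)]
loglik[0][3][5] = 0.0
loglik[1][8][5] = 0.0
center, parts, score = star_map_inference(
    loglik, offset_means=[(-2, 0), (3, 0)], offset_vars=[(1, 1), (1, 1)])
# center == (5, 5), parts == [(3, 5), (8, 5)], score == 0.0
```

Felzenszwalb & Huttenlocher use generalized distance transforms to make the inner maximization linear in the number of positions; the nested loops here trade that efficiency for readability.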
Part Decomposition
• $L = \{x^o, x_1, \ldots, x_8\}$: configuration of body parts
• Structure of the prior distribution $p(L)$:
‣ Articulation variable $a$ models correlations between part positions.
‣ Given the articulation, the prior on the configuration becomes a star model.
[Figure: graphical model with articulation $a$, object center $x^o$ and part positions $x_i$; covariance and mean part positions for $p(x_i \mid x^o)$]
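One way to write this structure out, assuming Gaussian offsets of each part from the object center (as the mean/covariance figure suggests) and marginalizing over the articulation; the symbols $\mu_i^{a}$ and $\Sigma_i^{a}$ are introduced here for illustration only:

$$
p(L) = \int p(x^o) \prod_{i=1}^{8} p(x_i \mid x^o, a)\, p(a)\, \mathrm{d}a,
\qquad
p(x_i \mid x^o, a) = \mathcal{N}\!\left(x_i - x^o;\ \mu_i^{a}, \Sigma_i^{a}\right)
$$

For a fixed $a$ the product is exactly a star model: every part depends only on the center $x^o$. With a discrete set of articulation states, the integral becomes a sum.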
Single-frame Detection
• Detections at equal error rate:
[Figure panels: HOG | 4D-ISM | partISM]
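Equal error rate is assumed here to mean, as is common for recall-precision evaluation of detectors, the operating point where precision equals recall. A minimal sketch that locates it from scored detections; `equal_error_rate` is a hypothetical helper, and matching detections to ground truth is assumed to have been done beforehand:

```python
def equal_error_rate(scores, is_true_positive, num_ground_truth):
    """Approximate the recall-precision EER of a detector.

    scores: detection confidences.
    is_true_positive: True if the detection matched a ground-truth person.
    num_ground_truth: number of annotated people.
    Returns (threshold, precision, recall) where |precision - recall| is smallest.
    """
    # Sweep the threshold from the most to the least confident detection.
    ranked = sorted(zip(scores, is_true_positive), key=lambda t: -t[0])
    tp = fp = 0
    best = None
    for score, hit in ranked:
        if hit:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / num_ground_truth
        gap = abs(precision - recall)
        if best is None or gap < best[0]:
            best = (gap, score, precision, recall)
    _, threshold, precision, recall = best
    return threshold, precision, recall

print(equal_error_rate([0.9, 0.8, 0.7, 0.6, 0.5],
                       [True, True, False, True, False], 4))
# → (0.6, 0.75, 0.75)
```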
Single-frame Detection Results
[Plot: TUD pedestrians data, no occlusions]
• partISM clearly outperforms 4D-ISM [Seemann et al., DAGM’06].
• partISM outperforms HOG [Dalal & Triggs, CVPR’05] with much less training data (note: we only use side views).
Tracklet Detection in Short Subsequences
• Given: image evidence $E = [E_1, \ldots, E_m]$ in overlapping subsequences of $m$ frames
• Want: body positions $X^{o*} = [x^{o*}_1, \ldots, x^{o*}_m]$ and body configurations $Y^* = [y^*_1, \ldots, y^*_m]$
• Posterior over positions and configurations:
$p(X^{o*}, Y^* \mid E) \propto p(E \mid X^{o*}, Y^*)\, p(X^{o*})\, p(Y^*)$.
[Figure: frames 1, 2, …, m of a subsequence; body model with parts $x^o, x_1, \ldots, x_8$; plot of body configurations]
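As a simplified illustration of maximizing this posterior, the sketch below drops the configurations $Y^*$ and searches only over body positions. It assumes per-frame candidate positions scored by a detector (standing in for $\log p(E_t \mid x^o_t)$) and a first-order Gaussian random-walk prior for $p(X^{o*})$; neither assumption comes from the slides, and the helper names `overlapping_windows` and `best_tracklet` are hypothetical. Under a first-order prior, the MAP path is found exactly by Viterbi dynamic programming.

```python
def overlapping_windows(num_frames, m, step):
    """(start, end) indices of overlapping subsequences of m frames."""
    return [(s, s + m) for s in range(0, num_frames - m + 1, step)]

def best_tracklet(candidates, sigma):
    """Viterbi MAP over per-frame candidate body positions.

    candidates[t]: list of ((x, y), log_likelihood) for frame t.
    The position prior is assumed to be a Gaussian random walk with
    standard deviation sigma (an illustrative assumption).
    """
    def log_motion(p, q):
        d2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
        return -0.5 * d2 / sigma ** 2

    scores = [ll for _, ll in candidates[0]]
    back = []
    for t in range(1, len(candidates)):
        prev = candidates[t - 1]
        new_scores, pointers = [], []
        for pos, ll in candidates[t]:
            # Best predecessor for this candidate under the motion prior.
            j = max(range(len(prev)),
                    key=lambda k: scores[k] + log_motion(prev[k][0], pos))
            new_scores.append(scores[j] + log_motion(prev[j][0], pos) + ll)
            pointers.append(j)
        scores = new_scores
        back.append(pointers)
    # Trace back from the highest-scoring candidate in the last frame.
    k = max(range(len(scores)), key=lambda i: scores[i])
    best_score, path = scores[k], [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return [candidates[t][i][0] for t, i in enumerate(path)], best_score

print(overlapping_windows(10, 4, 2))  # → [(0, 4), (2, 6), (4, 8), (6, 10)]
candidates = [
    [((0, 0), -1.0), ((10, 0), -0.5)],
    [((1, 0), -1.0), ((20, 0), -0.2)],
    [((2, 0), -1.0), ((30, 0), -0.1)],
]
print(best_tracklet(candidates, sigma=2.0))
# → ([(0, 0), (1, 0), (2, 0)], -3.25): smooth motion beats stronger but jumpy detections
```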