Pictorial Structures Revisited: People Detection and Articulated Pose Estimation
Mykhaylo Andriluka, Stefan Roth, Bernt Schiele
Department of Computer Science, TU Darmstadt
CVPR 2009
Generic model for human detection and pose estimation
Human pose estimation
[Felzenszwalb&Huttenlocher, IJCV’05], [Ren et al., ICCV’05], [Sigal&Black, CVPR’06], [Zhang et al., CVPR’06], [Jiang&Martin, CVPR’08], [Ramanan, NIPS’06], [Ferrari et al., CVPR’08], [Ferrari et al., CVPR’09]
- often rather simple appearance model
- focus on finding optimal assembly of parts
People Detection
[Viola et al., ICCV’03], [Dalal&Triggs, CVPR’05], [Leibe et al., CVPR’05], [Andriluka et al., CVPR’08]
- complex appearance model
- no pose model or limited to walking motion
Can we make the pictorial structures model [Fischler&Elschlager, 1973] effective for these tasks?
Yes... if the model components are chosen right.
Pictorial Structures Model
• Body is represented as a flexible configuration $L$ of body parts
- configuration of parts: $L = \{l_0, l_1, \ldots, l_N\}$
- part evidence: $D = \{d_0, d_1, \ldots, d_N\}$
• Posterior over body poses:
$$p(L \mid D) \propto p(D \mid L)\, p(L)$$
where $p(D \mid L)$ is the likelihood of observations and $p(L)$ is the prior on body poses.
Pictorial Structures Model
Pictorial structures allow exact and efficient inference:
- tree-structured prior
- Gaussian pairwise part relationships
- independent part appearance model
- discretized part locations
Posterior marginals are computed with sum-product BP:
$$p(l_i \mid D) \propto \sum_{L \setminus l_i} p(L \mid D)$$
[Figure: tree-structured body model with parts $l_1, \ldots, l_{10}$]
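The marginalization over all parts except $l_i$ can be illustrated with a small sum-product pass on a toy tree. This is only a sketch under stated assumptions, not the authors' implementation: the function names (`tree_marginals`, `brute_force_marginals`), the 4-part tree, the 3 discrete states per part, and the random potentials are all hypothetical, and the root prior $p(l_0)$ is assumed to be folded into the root's unary term. Edges are assumed to be listed so that each part's edge to its parent appears before the edges of its own children.

```python
import itertools
import numpy as np

def tree_marginals(unary, edges, pairwise):
    """Sum-product marginals on a tree rooted at part 0.

    unary[i][k]          ~ p(d_i | l_i = k)
    edges                : list of (child, parent), parents listed before children
    pairwise[(c, p)][kc, kp] ~ p(l_c = kc | l_p = kp)
    """
    n = len(unary)
    children = {i: [] for i in range(n)}
    for c, p in edges:
        children[p].append(c)

    # Upward pass (leaves to root): up[c] is the message from c to its parent.
    up = {}
    for c, p in reversed(edges):
        b = unary[c].copy()
        for g in children[c]:
            b = b * up[g]
        up[c] = pairwise[(c, p)].T @ b

    # Downward pass (root to leaves): down[c] is the message from parent to c.
    down = {0: np.ones_like(unary[0])}
    for c, p in edges:
        b = unary[p] * down[p]
        for s in children[p]:
            if s != c:
                b = b * up[s]
        down[c] = pairwise[(c, p)] @ b

    # Combine local evidence with all incoming messages and normalize.
    marginals = []
    for i in range(n):
        b = unary[i] * down[i]
        for g in children[i]:
            b = b * up[g]
        marginals.append(b / b.sum())
    return marginals

def brute_force_marginals(unary, edges, pairwise):
    """Reference: enumerate every configuration L (exponential cost)."""
    n, k = len(unary), len(unary[0])
    marg = np.zeros((n, k))
    for L in itertools.product(range(k), repeat=n):
        p = np.prod([unary[i][L[i]] for i in range(n)])
        for c, par in edges:
            p *= pairwise[(c, par)][L[c], L[par]]
        for i in range(n):
            marg[i, L[i]] += p
    return marg / marg.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
K = 3                                   # hypothetical number of discrete locations
edges = [(1, 0), (2, 0), (3, 1)]        # hypothetical toy tree, root is part 0
unary = [rng.random(K) + 0.1 for _ in range(4)]
pairwise = {e: rng.random((K, K)) + 0.1 for e in edges}

bp = tree_marginals(unary, edges, pairwise)
bf = brute_force_marginals(unary, edges, pairwise)
print(np.allclose(np.array(bp), bf))
```

The two message passes touch each edge twice, so the cost grows linearly in the number of parts, whereas the brute-force reference grows exponentially; that contrast is the reason the tree structure matters.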
Can we make the pictorial structures model effective for these tasks?
So... what are the right components?
Model Components
[Figure: model overview. Appearance Model: local features, AdaBoost, part likelihoods for orientation 1 of part 1 through orientation K of part N. Prior and Inference: likelihood, part posteriors, estimated pose.]
Likelihood Model
• Build on recent advances in object detection:
‣ state-of-the-art image descriptor: Shape Context [Belongie et al., PAMI’02; Mikolajczyk&Schmid, PAMI’05]
‣ dense representation
‣ discriminative model: AdaBoost classifier for each body part
- Shape Context: 96 dimensions (4 angular, 3 radial, 8 gradient orientations)
- Feature Vector: concatenate the descriptors inside part bounding box
- head: 4032 dimensions
- torso: 8448 dimensions
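The dimensions quoted above are consistent with a plain concatenation of 96-dimensional descriptors. The short sketch below checks that arithmetic; the descriptor counts per bounding box (42 and 88) are not stated on the slide and are only what the quoted sizes would imply if nothing else is appended to the feature vector. The helper `part_feature` is a hypothetical name.

```python
import numpy as np

ANGULAR_BINS, RADIAL_BINS, ORIENTATION_BINS = 4, 3, 8
descriptor_dim = ANGULAR_BINS * RADIAL_BINS * ORIENTATION_BINS  # 4 * 3 * 8 = 96

def part_feature(descriptors):
    """Concatenate an (n, 96) array of descriptors into one feature vector."""
    descriptors = np.asarray(descriptors, dtype=float)
    if descriptors.shape[1] != descriptor_dim:
        raise ValueError("expected 96-dimensional descriptors")
    return descriptors.reshape(-1)

# Implied descriptor counts, assuming pure concatenation (an assumption).
head_count = 4032 // descriptor_dim     # 42
torso_count = 8448 // descriptor_dim    # 88

head_feature = part_feature(np.zeros((head_count, descriptor_dim)))
print(descriptor_dim, head_count, torso_count, head_feature.shape)
```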
Likelihood Model
• Part likelihood derived from the boosting score:
$$\tilde{p}(d_i \mid l_i) = \max\left( \frac{\sum_t \alpha_{i,t}\, h_t(x(l_i))}{\sum_t \alpha_{i,t}},\ \varepsilon_0 \right)$$
- $\alpha_{i,t}$: decision stump weight
- $h_t(x(l_i))$: decision stump output
- $l_i$: part location
- $\varepsilon_0$: small constant to deal with part occlusions
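A minimal sketch of this normalized, clipped boosting score follows. It assumes the decision stumps output values in {-1, +1} and uses a hypothetical default of `eps0 = 1e-4`; neither the output range nor the value of the constant is given on the slide, and `part_likelihood` is an illustrative name.

```python
import numpy as np

def part_likelihood(alphas, stump_outputs, eps0=1e-4):
    """Pseudo-likelihood of a part at one location.

    alphas        : stump weights alpha_{i,t}, shape (T,)
    stump_outputs : stump outputs h_t(x(l_i)), shape (T,), assumed in {-1, +1}
    eps0          : small floor (hypothetical value) so a missed part
                    does not drive the product of likelihoods to zero
    """
    alphas = np.asarray(alphas, dtype=float)
    h = np.asarray(stump_outputs, dtype=float)
    score = (alphas * h).sum() / alphas.sum()   # normalized to [-1, 1]
    return max(score, eps0)

confident = part_likelihood([0.5, 0.3, 0.2], [+1, +1, -1])
rejected = part_likelihood([0.5, 0.3, 0.2], [-1, -1, +1])
print(confident, rejected)
```

Dividing by the sum of the weights keeps the score in a fixed range regardless of how many boosting rounds each part classifier uses, and the floor is what lets the model tolerate a part whose detector does not fire.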
Likelihood Model
[Figure: input image and part likelihoods for head, torso, and upper leg; rows: our part likelihoods, [Ramanan, NIPS’06]]
Kinematic Tree Prior
• Represent pairwise part relations [Felzenszwalb & Huttenlocher, IJCV’05]:
$$p(L) = p(l_0) \prod_{(i,j) \in E} p(l_i \mid l_j)$$
$$p(l_2 \mid l_1) = \mathcal{N}\left( T_{12}(l_2) \mid T_{21}(l_1),\ \Sigma_{12} \right)$$
where $T_{12}(l_2)$ and $T_{21}(l_1)$ are the part locations transformed to the joint.
[Figure: part locations $l_1$, $l_2$ and relative part locations transformed to the joint]
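The pairwise term can be sketched as a Gaussian log-density evaluated on the difference between the two parts after each is mapped to the shared joint. Everything about the transform here is an assumption made for illustration: a part location is taken to be (x, y, theta), the joint is placed at a fixed offset in the part's local frame, the relative orientation has zero mean, and angle wrap-around is ignored. The names `to_joint` and `log_pairwise_prior` are hypothetical.

```python
import numpy as np

def to_joint(l, offset):
    """Map a part location (x, y, theta) to its joint (assumed rigid offset)."""
    x, y, theta = l
    dx, dy = offset
    c, s = np.cos(theta), np.sin(theta)
    return np.array([x + c * dx - s * dy, y + s * dx + c * dy, theta])

def log_pairwise_prior(l2, l1, off12, off21, cov):
    """log N(T12(l2) | T21(l1), cov), with angle wrap-around ignored."""
    diff = to_joint(l2, off12) - to_joint(l1, off21)
    k = diff.size
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (maha + logdet + k * np.log(2.0 * np.pi))

cov = np.diag([4.0, 4.0, 0.25])          # hypothetical covariance
l1 = (0.0, 0.0, 0.0)                     # e.g. upper part, joint 10 px below
l2_aligned = (0.0, 20.0, 0.0)            # lower part, joint 10 px above
l2_shifted = (6.0, 20.0, 0.0)            # same part displaced sideways

lp_aligned = log_pairwise_prior(l2_aligned, l1, (0.0, -10.0), (0.0, 10.0), cov)
lp_shifted = log_pairwise_prior(l2_shifted, l1, (0.0, -10.0), (0.0, 10.0), cov)
print(lp_aligned, lp_shifted)
```

When both parts agree on where the joint lies, the difference is zero and the density is at its peak; displacing one part lowers the log-probability according to the covariance.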
Kinematic Tree Prior
• Prior parameters: $\{T_{ij}, \Sigma_{ij}\}$
• Parameters of the prior are estimated with maximum likelihood
[Figure: mean pose and several independent samples]
Figure 2. (left) Kinematic prior learned on the multi-view and ...
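For a Gaussian, maximum likelihood estimation reduces to the sample mean and the covariance normalized by N (not N - 1). The sketch below fits both to synthetic relative part offsets; the data, the "true" parameters, and the name `fit_gaussian_ml` are made up for illustration, and how $T_{ij}$ itself is parameterized is not spelled out on the slide.

```python
import numpy as np

def fit_gaussian_ml(diffs):
    """ML estimate of mean and covariance from rows of observed offsets."""
    diffs = np.asarray(diffs, dtype=float)
    mean = diffs.mean(axis=0)
    centered = diffs - mean
    cov = centered.T @ centered / len(diffs)   # divide by N: the ML estimate
    return mean, cov

# Synthetic (dx, dy, dtheta) offsets between two connected parts.
rng = np.random.default_rng(1)
true_mean = np.array([0.0, 20.0, 0.1])               # hypothetical
true_cov = np.diag([4.0, 9.0, 0.05])                 # hypothetical
samples = rng.multivariate_normal(true_mean, true_cov, size=5000)

mean_hat, cov_hat = fit_gaussian_ml(samples)
print(mean_hat)
print(cov_hat)
```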
Evaluation Scenarios
1. Human Pose Estimation: “People” dataset [Ramanan, NIPS’06]
2. Upper-body Pose Estimation: “Buffy” dataset [Ferrari et al., CVPR’08]
3. Pedestrian Detection: “TUD Pedestrians” dataset [Andriluka et al., CVPR’08]
Scenario 1: Qualitative Results
[Figure: example pose estimates with per-image scores, in the order listed on the slide]
Panels (g), (a), (d):
- Our model: 8/10, 8/10, 7/10
- [Ramanan, NIPS’06]: 7/10, 0/10, 3/10
Panels (l), (k), (i):
- Our model: 6/10, 7/10, 8/10
- [Ramanan, NIPS’06]: 3/10, 3/10, 4/10
... (bottom). The numbers on the left of ...
Scenario 1: Quantitative Results
Method                                                Torso  Upper legs  Lower legs  Upper arm  Forearm  Head  Total
[Ramanan, NIPS’06] 2nd parse                            52       30          29          17        13      37    27
Our inference, edge features from [Ramanan, NIPS’06]    63       48          37          26        20      45    37
Our part detectors (SC)                                 29       12          18           3         4      40    14
Our prior, our part detectors (SC)                      81       63          55          47        31      75    55
Our prior, our part detectors (SIFT)                    78       58          54          44        31      66    52