Lifting from the Deep: Convolutional 3D Pose Estimation from a Single Image
Denis Tomè, Chris Russell, Lourdes Agapito
We introduce a novel approach to the problem of 3D human pose estimation from a single RGB image (input: image; output: 3D pose).
Our method reasons jointly about 2D joint estimation and 3D pose reconstruction to improve both tasks.
Our approach
• First, we learn a probabilistic model of 3D human pose from 3D mocap data. This model lifts 2D joint positions (landmarks) into 3D.
• We integrate this model within a novel end-to-end CNN architecture for joint 2D and 3D human pose estimation.
• Our method achieves state-of-the-art results on the Human3.6M dataset.
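As a rough illustration of this kind of lifting model (a minimal sketch, not the exact formulation in the paper), a linear basis of 3D poses can be learned from aligned mocap skeletons with PCA, and 2D landmarks can then be lifted by solving a least-squares problem over the basis coefficients. The function names and the assumed orthographic camera R are illustrative assumptions:

```python
import numpy as np

def learn_pose_basis(mocap_poses, n_components=10):
    """Learn a linear basis of 3D poses from mocap data via PCA.

    mocap_poses: (N, J, 3) array of aligned 3D skeletons.
    Returns (mean, basis) with shapes (J*3,) and (K, J*3).
    """
    N, J, _ = mocap_poses.shape
    X = mocap_poses.reshape(N, J * 3)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def lift_2d_to_3d(joints_2d, mean, basis, R=None):
    """Lift 2D landmarks (J, 2) to a 3D pose (J, 3) by least squares over
    the basis coefficients, assuming a known orthographic camera R (2x3).
    This is a simplified stand-in for the probabilistic model."""
    J = joints_2d.shape[0]
    if R is None:
        R = np.eye(3)[:2]                       # frontal orthographic camera
    K = basis.shape[0]
    proj_basis = basis.reshape(K, J, 3) @ R.T   # project each basis pose: (K, J, 2)
    A = proj_basis.reshape(K, -1).T             # (2J, K) design matrix
    b = (joints_2d - mean.reshape(J, 3) @ R.T).ravel()
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return (mean + coeffs @ basis).reshape(J, 3)
```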
Our approach
• Next, we train a novel end-to-end multi-stage CNN for 2D landmark estimation (stages 1 to 6).
• Each stage includes a new layer, based on our probabilistic 3D pose model, that enforces 3D pose constraints.
Detailed architecture
[Architecture diagram: feature extraction (convolutional and pooling layers, 9x9 and 5x5 kernels) followed by 2D joint prediction (9x9 and 1x1 convolutions)]
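A minimal PyTorch sketch of this kind of feature-extraction-plus-prediction block; the kernel sizes follow the diagram, but the layer counts, channel widths, and the name make_stage1_predictor are illustrative assumptions:

```python
import torch.nn as nn

def make_stage1_predictor(n_joints=14, n_feats=128):
    """Feature extraction (conv + pooling) followed by a 2D joint prediction
    head that outputs one belief map per joint plus a background map."""
    return nn.Sequential(
        # Feature extraction: convolution blocks with max pooling.
        nn.Conv2d(3, n_feats, kernel_size=9, padding=4), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(n_feats, n_feats, kernel_size=9, padding=4), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(n_feats, n_feats, kernel_size=9, padding=4), nn.ReLU(),
        nn.MaxPool2d(2),
        # 2D joint prediction head.
        nn.Conv2d(n_feats, 512, kernel_size=5, padding=2), nn.ReLU(),
        nn.Conv2d(512, 512, kernel_size=9, padding=4), nn.ReLU(),
        nn.Conv2d(512, 512, kernel_size=1), nn.ReLU(),
        nn.Conv2d(512, n_joints + 1, kernel_size=1),   # belief maps
    )
```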
For each landmark, a 2D belief map is generated. This defines how confident the architecture is that a specific landmark occurs at any given pixel (u,v) of the input image.
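For training such a predictor, the target belief map for a landmark is commonly a 2D Gaussian centred on the annotated pixel; a minimal sketch, where the Gaussian width sigma is an assumed hyper-parameter:

```python
import numpy as np

def gaussian_belief_map(u, v, height, width, sigma=7.0):
    """Target belief map: confidence that the landmark lies at each pixel,
    modelled as a 2D Gaussian centred on the annotated location (u, v)."""
    ys, xs = np.mgrid[0:height, 0:width]
    d2 = (xs - u) ** 2 + (ys - v) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))
```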
Our pre-learned probabilistic model lifts the 2D landmarks into 3D and injects 3D pose information.
The 3D pose is then used to generate a new set of 2D belief maps.
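One simple way to turn a lifted 3D pose back into 2D belief maps is to re-project its joints and render a Gaussian at each re-projected location; a minimal sketch assuming an orthographic camera R (the paper's exact projection layer may differ):

```python
import numpy as np

def belief_maps_from_3d(pose_3d, R, height, width, sigma=7.0):
    """Re-project a lifted 3D pose (J, 3) with an orthographic camera R (2x3)
    and render one Gaussian belief map per joint at the projected location."""
    joints_2d = pose_3d @ R.T                        # (J, 2) projected joints
    ys, xs = np.mgrid[0:height, 0:width]
    maps = [np.exp(-((xs - u) ** 2 + (ys - v) ** 2) / (2.0 * sigma ** 2))
            for u, v in joints_2d]
    return np.stack(maps)                            # (J, H, W)
```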
The two sets of belief maps are then fused together.
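The fusion can be sketched as a learned combination of the CNN belief maps and the 3D-projected belief maps; the per-joint mixing weights below are an assumed scheme (a 1x1 convolution over the concatenated maps would be another reasonable choice):

```python
import torch
import torch.nn as nn

class BeliefFusion(nn.Module):
    """Fuse CNN belief maps with belief maps re-projected from the 3D pose
    using a learned per-joint mixing weight."""

    def __init__(self, n_joints):
        super().__init__()
        self.weight = nn.Parameter(torch.zeros(n_joints, 1, 1))

    def forward(self, maps_2d, maps_3d):
        w = torch.sigmoid(self.weight)      # per-joint weight in (0, 1)
        return w * maps_3d + (1.0 - w) * maps_2d
```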
Together, these operations (2D joint prediction, 3D lifting, and fusion) constitute a single stage (t = 1) of the network.
The 2D belief maps from each stage are used as input to the next stage (feature extraction is shared across stages). The accuracy of the belief maps increases progressively through the stages.
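A sketch of how the stages chain together: each stage after the first consumes shared image features concatenated with the fused belief maps from the previous stage. The module names and the lift_and_project callable (2D belief maps to 3D pose and back to belief maps) are placeholders, not the paper's exact interface:

```python
import torch
import torch.nn as nn

class MultiStagePose(nn.Module):
    """Multi-stage 2D landmark estimator: stage t consumes shared image
    features plus the fused belief maps from stage t-1."""

    def __init__(self, backbone, stages, fusions, lift_and_project):
        super().__init__()
        self.backbone = backbone                  # shared feature extractor
        self.stages = nn.ModuleList(stages)       # per-stage prediction heads
        self.fusions = nn.ModuleList(fusions)     # per-stage fusion layers
        self.lift_and_project = lift_and_project  # 2D maps -> 3D pose -> 2D maps

    def forward(self, image):
        feats = self.backbone(image)
        prev_maps = None
        outputs = []
        for stage, fusion in zip(self.stages, self.fusions):
            x = feats if prev_maps is None else torch.cat([feats, prev_maps], dim=1)
            maps_2d = stage(x)                        # belief maps from the CNN
            maps_3d = self.lift_and_project(maps_2d)  # via the 3D pose model
            prev_maps = fusion(maps_2d, maps_3d)      # fused maps feed the next stage
            outputs.append(prev_maps)
        return outputs
```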
The belief maps output by the final stage (t = 6) are lifted one last time to produce the output 3D pose. The whole multi-stage network is learned end-to-end.
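End-to-end learning of such a multi-stage network typically supervises the belief maps of every stage against the ground-truth Gaussian targets (intermediate supervision); a minimal sketch of this kind of loss, with model and target_maps assumed from the sketches above:

```python
import torch.nn.functional as F

def multi_stage_loss(model, image, target_maps):
    """Sum of per-stage belief-map losses, so every stage receives a
    direct gradient signal (intermediate supervision)."""
    stage_outputs = model(image)                 # list of (B, J, H, W) tensors
    return sum(F.mse_loss(maps, target_maps) for maps in stage_outputs)
```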
Our approach achieves state-of-the-art results on the Human3.6M dataset
Example results on the Human3.6M dataset