Lifting from the Deep: Convolutional 3D Pose Estimation from a Single Image (PowerPoint presentation)



  1. Lifting from the Deep: Convolutional 3D Pose Estimation from a Single Image. Denis Tomè, Chris Russell, Lourdes Agapito

  2. We introduce a novel approach to solve the problem of 3D human pose estimation from a single RGB image. [Figure: input image and output 3D pose]

  3. Our method reasons jointly about 2D joint estimation and 3D pose reconstruction to improve both tasks.

  4. Our approach
     • First, we learn a probabilistic model of 3D human pose from 3D mocap data. This model lifts 2D joint positions (landmarks) into 3D.
     • We integrate this model within a novel end-to-end CNN architecture for joint 2D and 3D human pose estimation.
     • Our method achieves state-of-the-art results on the Human3.6M dataset.
     [Figure: 2D landmarks → probabilistic 3D pose model → 3D pose]
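As a rough illustration of this lifting step (this is not the authors' code: the paper learns a probabilistic 3D pose model, whereas the sketch below substitutes plain PCA for the pose basis and an unconstrained affine camera; all function names are hypothetical):

```python
# Minimal sketch of "lifting" 2D landmarks into 3D with a linear pose basis
# learned from mocap data. Assumption: plain PCA and an affine camera stand in
# for the paper's probabilistic model and its camera estimation.
import numpy as np

def learn_pose_basis(mocap_poses, n_components=10):
    """mocap_poses: (N, J, 3) array of 3D training poses."""
    N, J, _ = mocap_poses.shape
    flat = mocap_poses.reshape(N, J * 3)
    mean = flat.mean(axis=0)
    # Principal components of the centred training poses act as the pose basis.
    _, _, Vt = np.linalg.svd(flat - mean, full_matrices=False)
    return mean.reshape(J, 3), Vt[:n_components].reshape(n_components, J, 3)

def lift_2d_to_3d(landmarks_2d, mean_pose, basis, n_iters=20):
    """landmarks_2d: (J, 2). Alternates camera and pose-coefficient estimation."""
    coeffs = np.zeros(len(basis))
    for _ in range(n_iters):
        pose = mean_pose + np.tensordot(coeffs, basis, axes=1)        # (J, 3)
        # 1) Fit an affine camera [M | t] minimising ||x2d - pose @ M.T - t||.
        P = np.hstack([pose, np.ones((len(pose), 1))])                # (J, 4)
        Mt, *_ = np.linalg.lstsq(P, landmarks_2d, rcond=None)         # (4, 2)
        M, t = Mt[:3].T, Mt[3]                                        # (2, 3), (2,)
        # 2) Fit basis coefficients given the camera (linear least squares).
        A = np.stack([(b @ M.T).ravel() for b in basis], axis=1)      # (2J, K)
        r = (landmarks_2d - mean_pose @ M.T - t).ravel()
        coeffs, *_ = np.linalg.lstsq(A, r, rcond=None)
    return mean_pose + np.tensordot(coeffs, basis, axes=1)
```

Alternating the two linear solves is the simplest way to handle the coupling between camera and pose coefficients; the paper's probabilistic formulation additionally accounts for uncertainty in the 2D landmarks.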

  5. Our approach
     • Next, we train a novel end-to-end multi-stage CNN for 2D landmark estimation.
     [Figure: multi-stage network, Stage 1 → Stage 2 → … → Stage 6]

  6. Our approach
     • Next, we train a novel end-to-end multi-stage CNN for 2D landmark estimation.
     • Each stage includes a new layer, based on our probabilistic 3D model of human pose, that enforces 3D pose constraints.
     [Figure: multi-stage network, Stage 1 → Stage 2 → … → Stage 6]

  7. Detailed architecture

  8. [Architecture diagram: convolutional layers for feature extraction followed by 2D joint prediction; feature extraction interleaves convolution (C) and pooling (P) layers with 9×9 and 5×5 kernels, and 2D joint prediction uses further 9×9 and 1×1 convolutions]
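A minimal PyTorch sketch of the kind of stack this diagram describes (the channel counts, layer counts and input resolution below are assumptions, not the paper's exact configuration): a convolution/pooling trunk extracts features, and a small convolutional head predicts one belief map per joint.

```python
# Sketch only: feature-extraction trunk plus a 2D joint prediction head.
import torch
import torch.nn as nn

N_JOINTS = 14  # assumed number of landmarks; the paper's value may differ

feature_extractor = nn.Sequential(
    nn.Conv2d(3, 128, kernel_size=9, padding=4), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(128, 128, kernel_size=9, padding=4), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(128, 128, kernel_size=9, padding=4), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(128, 32, kernel_size=5, padding=2), nn.ReLU(),
)

joint_predictor = nn.Sequential(
    nn.Conv2d(32, 512, kernel_size=9, padding=4), nn.ReLU(),
    nn.Conv2d(512, 512, kernel_size=1), nn.ReLU(),
    nn.Conv2d(512, N_JOINTS + 1, kernel_size=1),  # +1 for a background map
)

image = torch.randn(1, 3, 368, 368)             # example input resolution
belief_maps = joint_predictor(feature_extractor(image))
print(belief_maps.shape)                        # -> torch.Size([1, 15, 46, 46])
```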

  9. For each landmark, a 2D belief map is generated. This defines how confident the architecture is that a specific landmark occurs at any given pixel (u, v) of the input image. [Diagram: feature extraction → 2D joint prediction → belief maps]
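To make the belief-map idea concrete, here is a small illustrative sketch (assumed sigma and resolution; Gaussian targets of this kind are common in heatmap-based pose estimation, though the paper's exact target definition may differ):

```python
# Sketch: a belief map assigns each pixel (u, v) a confidence that the landmark
# lies there; the peak of the map is read back out as the 2D estimate.
import numpy as np

def gaussian_belief_map(center_uv, height, width, sigma=2.0):
    """Return an (height, width) map peaking at the landmark location."""
    u, v = center_uv
    us, vs = np.meshgrid(np.arange(width), np.arange(height))
    return np.exp(-((us - u) ** 2 + (vs - v) ** 2) / (2.0 * sigma ** 2))

def belief_map_to_landmark(belief_map):
    """Read the most confident pixel back out as the 2D landmark estimate."""
    v, u = np.unravel_index(np.argmax(belief_map), belief_map.shape)
    return u, v

target = gaussian_belief_map((30, 12), height=46, width=46)
print(belief_map_to_landmark(target))  # -> (30, 12)
```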

  10. Our pre-learned probabilistic model lifts 2D landmarks into 3D and injects 3D pose information. [Diagram: belief maps → probabilistic 3D pose model → 3D pose]

  11. The 3D pose is used to generate a new set of 2D belief maps. [Diagram: probabilistic 3D pose model → 3D pose → projected belief maps]
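A sketch of this projection step, under the same assumptions as the lifting sketch above (an affine camera M, t; hypothetical helper names): the lifted 3D pose is reprojected and re-rendered as Gaussian belief maps, which is how 3D information is fed back to the 2D side of the network.

```python
# Sketch: render belief maps from a 3D pose and an estimated affine camera.
import numpy as np

def render_gaussian(u, v, height, width, sigma=2.0):
    us, vs = np.meshgrid(np.arange(width), np.arange(height))
    return np.exp(-((us - u) ** 2 + (vs - v) ** 2) / (2.0 * sigma ** 2))

def belief_maps_from_3d(pose_3d, M, t, height=46, width=46):
    """pose_3d: (J, 3); M: (2, 3) affine camera; t: (2,) image translation."""
    landmarks_2d = pose_3d @ M.T + t                 # reproject each joint to 2D
    return np.stack([render_gaussian(u, v, height, width)
                     for u, v in landmarks_2d])      # (J, height, width)
```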

  12. Belief maps are fused together. [Diagram: CNN belief maps and model-projected belief maps → 2D fusion → fused belief maps]
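One simple way to realise such a fusion, shown purely as an assumption (the paper learns its own fusion of the two sets of maps, whose exact form may differ): concatenate the two sets channel-wise and mix them with a learned 1×1 convolution.

```python
# Sketch of a learned fusion of CNN belief maps and model-projected belief maps.
import torch
import torch.nn as nn

N_JOINTS = 14  # assumed number of landmarks

fusion = nn.Conv2d(2 * N_JOINTS, N_JOINTS, kernel_size=1)

cnn_maps = torch.rand(1, N_JOINTS, 46, 46)     # belief maps from the 2D CNN
model_maps = torch.rand(1, N_JOINTS, 46, 46)   # maps rendered from the 3D pose
fused_maps = fusion(torch.cat([cnn_maps, model_maps], dim=1))
print(fused_maps.shape)                        # -> torch.Size([1, 14, 46, 46])
```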

  13. [Diagram: full pipeline of one stage, feature extraction → 2D joint prediction → belief maps → probabilistic 3D pose model → 3D pose → 2D fusion → fused belief maps]

  14. [Diagram: the complete pipeline shown above constitutes stage t = 1]

  15. The 2D belief maps from each stage are used as input to the next stage. The accuracy of the belief maps increases progressively through the stages. [Diagram: stages t = 1 … 6, each performing 2D joint prediction on shared extracted features, applying the probabilistic 3D pose model and fusing the belief maps]
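A compact sketch of the multi-stage idea (illustrative sizes; in the actual network each stage has its own parameters and also runs the 3D lifting and fusion, which are omitted here for brevity):

```python
# Sketch: every refinement stage sees the shared image features plus the
# previous stage's belief maps, so later stages can correct earlier mistakes.
import torch
import torch.nn as nn

N_JOINTS, N_STAGES, FEAT = 14, 6, 32   # assumed sizes

features = nn.Sequential(nn.Conv2d(3, FEAT, 9, padding=4), nn.ReLU(),
                         nn.MaxPool2d(8))                  # shared feature extractor
first_stage = nn.Conv2d(FEAT, N_JOINTS, 1)                 # stage 1: features only
refine_stage = nn.Sequential(                              # stages 2..6 (one module
    nn.Conv2d(FEAT + N_JOINTS, 64, 9, padding=4), nn.ReLU(),  # reused here for brevity)
    nn.Conv2d(64, N_JOINTS, 1),
)

image = torch.randn(1, 3, 368, 368)
feats = features(image)
beliefs = first_stage(feats)
for _ in range(N_STAGES - 1):
    beliefs = refine_stage(torch.cat([feats, beliefs], dim=1))
print(beliefs.shape)   # -> torch.Size([1, 14, 46, 46])
```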

  16. The whole network is learned end-to-end. The output belief maps of the final stage are passed through the probabilistic 3D pose model (final lifting) to produce the output 3D pose. [Diagram: stage t = 6 → output belief maps → final lifting → output 3D pose]
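End-to-end learning of such a multi-stage network typically supervises the belief maps of every stage; the sketch below assumes a simple per-stage mean-squared-error loss against Gaussian target maps, which may differ from the paper's exact objective.

```python
# Sketch of an intermediate-supervision loss summed over all stages (assumed form).
import torch
import torch.nn.functional as F

def multistage_loss(stage_beliefs, target_maps):
    """stage_beliefs: list of (B, J, H, W) tensors, one per stage.
    target_maps: (B, J, H, W) ground-truth belief maps."""
    return sum(F.mse_loss(b, target_maps) for b in stage_beliefs)
```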

  17. Our approach achieves state-of-the-art results on the Human3.6M dataset

  18. Example results on the Human3.6M dataset
