Human Pose Estimation with Deep Learning Wei Yang
Applications Understand Activities Family Robots 2 American Heist (2014) - The Bank Robbery Scene
What do we need to know to recognize a crime scene? 3
Cues Scene: bank Abnormal pose Activity: robbery [Figure: people in the scene labeled stand, hands up, and lay down.] 4
Why is human pose estimation challenging? 5
#1. Articulation #2. Occlusion #3. Scale variation 6
Applications Understand Activities Family Robots 9
3D Human Poses Real-Time Imitation of Human Whole-Body Motions by Humanoids. J. Koenemann, F. Burget, and M. Bennewitz. ICRA, 2014. 10
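Deep Learning Based Methods Fully Convolutional Network $K$ heatmaps Regression with Euclidean Loss: $L = \frac{1}{K} \sum_{k=1}^{K} \| H_k - \hat{H}_k \|_2^2$, where the target heatmap $\hat{H}_k \sim \mathcal{N}(z_k, \Sigma)$ is a Gaussian centered at the location $z_k$ of joint $k$, $k = 1, \dots, K$ 11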
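The heatmap regression on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the author's code: it assumes an isotropic Gaussian (covariance sigma^2 times the identity), and the names `gaussian_heatmap`, `heatmap_loss`, and the joint locations are hypothetical.

```python
import numpy as np

def gaussian_heatmap(height, width, center, sigma=1.0):
    """Render a 2D Gaussian peak centered at (x, y) = center."""
    ys, xs = np.mgrid[0:height, 0:width]
    cx, cy = center
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

def heatmap_loss(pred, target):
    """Mean over K joints of the squared Euclidean distance between heatmaps."""
    # pred, target: arrays of shape (K, H, W)
    per_joint = ((pred - target) ** 2).sum(axis=(1, 2))
    return float(per_joint.mean())

# Hypothetical joint locations (x, y) on a 64 x 64 heatmap grid.
joints = [(10, 20), (32, 32), (50, 12)]
target = np.stack([gaussian_heatmap(64, 64, j) for j in joints])

loss_zero_pred = heatmap_loss(np.zeros_like(target), target)
loss_perfect = heatmap_loss(target, target)
```

At test time, a joint location is typically read off as the argmax of its predicted heatmap.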
Outline Scale: Feature pyramid learning (ICCV 2017) 3D Pose: In-the-wild 3D pose estimation (CVPR 2018) 12
Why the Scale Matters? 14 Yipin Yang, Yao Yu, Yu Zhou, Sidan Du, James Davis, Ruigang Yang. Semantic Parametric Reshaping of Human Body Models. In 3DV Workshop on Dynamic Shape Measurement and Analysis, 2014.
Why the Scale Matters? Learning Feature Pyramids for Human Pose Estimation Wei Yang , Shuang Li, Wanli Ouyang, Hongsheng Li, Xiaogang Wang ICCV, 2017 15
Previous work Multi-scale testing: the model itself is not scale invariant. Felzenszwalb, Pedro F., et al. "Object detection with discriminatively trained part-based models." TPAMI, 2010. Multi-branch network: needs much more memory and computation. Tompson, Jonathan, et al. "Efficient object localization using convolutional networks." CVPR, 2015. 16
Hourglass Newell A, Yang K, Deng J. Stacked hourglass networks for human pose estimation. European Conference on Computer Vision. Springer, Cham, 2016: 483-499. 17
Pyramid Residual Modules [Figure: network overview with resolutions 256x256, 128 × 128, and 64 × 64, Conv and PRM + Pool layers, and stacked hourglass modules; detailed hourglass structure; pyramid residual module (PRM) taking $x^{(l)}$ to $x^{(l+1)}$ with an identity mapping and branches at several scale ratios. Legend: convolution, pyramid residual module, score maps, addition.] Newell et al. Stacked Hourglass Networks for Human Pose Estimation. ECCV, 2016 18
Initialization of Multi-Branch Networks Single-branch networks: VGG Multi-branch networks: Inceptions Traditional weight initialization methods, e.g., Gaussian, Xavier, and MSRA (Kaiming), are not applicable to multi-branch networks. Xavier Glorot, Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, PMLR 9:249-256, 2010. 19
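Initialization of Multi-Branch Networks [Figure: a Conv / FC layer whose input branches $x_1^{(l)}, x_2^{(l)}, \dots, x_{C_i}^{(l)}$ are summed into $x^{(l)}$ and whose output $y^{(l)}$ feeds branches $y_1^{(l)}, y_2^{(l)}, \dots, y_{C_o}^{(l)}$, with the forward and backward directions marked.]
Forward: $y^{(l)} = W^{(l)} \sum_{c=1}^{C_i^{(l)}} x_c^{(l)} + b^{(l)}$, $x^{(l+1)} = f(y^{(l)})$, with condition $\alpha C_i^{(l)} n_i^{(l)} \mathrm{Var}[w^{(l)}] = 1$
Backward: $\Delta x^{(l)} = \sum_{c=1}^{C_o^{(l)}} W_c^{(l)T} \Delta y^{(l)}$, $\Delta y^{(l)} = f'(y^{(l)}) \Delta x^{(l+1)}$, with condition $\alpha C_o^{(l)} n_o^{(l)} \mathrm{Var}[w^{(l)}] = 1$
* $\alpha = 0.5$ for ReLU and 1 for Tanh and Sigmoid. 20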
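Reading the condition on this slide as alpha * C * n * Var[w] = 1, the weight standard deviation follows directly. The sketch below is an illustration under that reading, not the released implementation: `multi_branch_std` is a hypothetical helper, `fan` stands for the per-branch element count n (an assumption about what n denotes), and the layer sizes are made up.

```python
import math

def multi_branch_std(num_branches, fan, alpha=0.5):
    """Std of zero-mean weights satisfying alpha * C * n * Var[w] = 1."""
    return math.sqrt(1.0 / (alpha * num_branches * fan))

# Hypothetical 3x3 convolution with 256 input channels.
fan = 3 * 3 * 256

# Forward condition with C = 4 input branches and ReLU (alpha = 0.5).
std_four_branches = multi_branch_std(num_branches=4, fan=fan)

# With a single branch and ReLU the rule reduces to Var[w] = 2 / n,
# which is the MSRA (Kaiming) initialization.
std_single_branch = multi_branch_std(num_branches=1, fan=fan)
```

Under this reading, quadrupling the number of summed branches halves the weight standard deviation, which is the correction a single-branch rule misses.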
Initialization of Multi-Branch Networks [Plot: output standard deviation versus layer index (layers 1 to 11) for MSR init and our init.] He, Kaiming, et al. "Delving deep into rectifiers: Surpassing human-level performance on imagenet classification." ICCV, 2015. 21
Qualitative Results MPII dataset LSP dataset 22
Evaluation Metric PCK: Percentage of Correct Keypoints $\alpha \cdot \max(h, w)$ 23
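A minimal sketch of the PCK metric follows. It assumes the expression on the slide is the distance threshold alpha * max(h, w), with h and w taken as the height and width of the person bounding box; the function name `pck`, the default `alpha=0.2`, and the sample coordinates are illustrative assumptions.

```python
import numpy as np

def pck(pred, gt, box_hw, alpha=0.2):
    """Fraction of keypoints within alpha * max(h, w) of the ground truth."""
    # pred, gt: (N, K, 2) arrays of (x, y); box_hw: (N, 2) array of (h, w)
    dist = np.linalg.norm(pred - gt, axis=-1)            # (N, K) distances
    thresh = alpha * box_hw.max(axis=1, keepdims=True)   # (N, 1) thresholds
    return float((dist <= thresh).mean())

# One person, two keypoints; the threshold is 0.2 * max(100, 40) = 20 pixels.
gt = np.array([[[10.0, 10.0], [50.0, 50.0]]])
pred = np.array([[[12.0, 10.0], [50.0, 90.0]]])
box_hw = np.array([[100.0, 40.0]])
score = pck(pred, gt, box_hw)  # first keypoint is 2 px off, second is 40 px off
```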
Results on MPII Human Pose State-of-the-art performance 24 http://human-pose.mpi-inf.mpg.de/#results
Image Classification Top-1 Test Error on CIFAR-10 25
Semantic Segmentation: PASCAL VOC 2012 dataset (a) Image (b) DeepLab (c) DeepLab+PRM
Section Summary • Feature pyramid module • Generalizable for various networks and tasks • Weight initialization for multi-branch networks Learning Feature Pyramids for Human Pose Estimation Wei Yang, Shuang Li, Wanli Ouyang, Hongsheng Li, Xiaogang Wang ICCV, 2017 27
Outline Scale: Feature pyramid learning (ICCV 2017) 3D Pose: In-the-wild 3D pose estimation (CVPR 2018) 28
Challenges: No Annotation Constrained scenes In-the-wild scenes Domain discrepancy No annotation 29
Which one is more plausible? Discriminator 30
Weakly Supervised Adversarial Learning [Figure: images without ground truth and a 3D dataset feed the 3D human pose estimator $G$; its predictions and the ground-truth poses go to the multi-source discriminator $D$, which labels them real or fake.] 31
Adversarial Learning Generator: Euclidean loss, $Loss_G$, tries to fool the discriminator Discriminator: classification loss, $Loss_D$, tries to tell real from fake 32
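The fool/tell game can be illustrated with the standard binary cross-entropy formulation of adversarial training. This is a generic sketch and an assumption about the form of the losses; the slide only names a Euclidean loss for the generator and a classification loss for the discriminator, and the function names here are hypothetical.

```python
import numpy as np

def bce(prob, label):
    """Binary cross-entropy for probabilities in (0, 1)."""
    prob = np.clip(prob, 1e-7, 1.0 - 1e-7)
    return float(-(label * np.log(prob) + (1 - label) * np.log(1 - prob)).mean())

def discriminator_loss(d_real, d_fake):
    # "Tell": ground-truth poses are labeled 1, predicted poses are labeled 0.
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))

def generator_adv_loss(d_fake):
    # "Fool": the generator wants its predictions to be labeled as real.
    return bce(d_fake, np.ones_like(d_fake))

# Discriminator outputs (probability of "real") for two hypothetical cases.
caught = np.array([0.1])   # discriminator spots the predicted pose as fake
fooled = np.array([0.9])   # discriminator accepts the predicted pose as real
real = np.array([0.9])

loss_g_caught = generator_adv_loss(caught)
loss_g_fooled = generator_adv_loss(fooled)
loss_d = discriminator_loss(real, caught)
```

In a full training loop this adversarial term would be added to the supervised Euclidean loss wherever ground truth exists.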
Generator [Figure: a 2D module (256x256 input reduced to 128 × 128 and 64 × 64 by Conv and Residual layers, followed by stacked hourglass modules) outputs 2D score maps; a depth module (Residual, Depth) outputs the 3D poses.] 33
Discriminator 34
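Multi-Source Discriminator [Figure: three sources, each processed by a CNN: the image, a geometric descriptor built from $[\Delta x, \Delta y, \Delta z]$ and $[\Delta x^2, \Delta y^2, \Delta z^2]$, and the raw poses (2D heatmaps and depth maps). The features (sizes 256, 64, and 64 appear in the figure) are concatenated and passed to fully connected layers that classify the sample as real or fake.] 35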
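One way to build a geometric descriptor from the offsets and squared offsets named in the figure is shown below. This is a sketch under the assumption that the offsets are taken between every pair of joints; `geometric_descriptor` and the toy pose are hypothetical.

```python
import numpy as np

def geometric_descriptor(pose):
    """Pairwise offsets and squared offsets between all joints."""
    # pose: (K, 3) array of joint coordinates (x, y, z)
    delta = pose[:, None, :] - pose[None, :, :]           # (K, K, 3): [dx, dy, dz]
    return np.concatenate([delta, delta ** 2], axis=-1)   # (K, K, 6)

# Toy pose with two joints.
pose = np.array([[0.0, 0.0, 0.0],
                 [1.0, 2.0, 3.0]])
desc = geometric_descriptor(pose)
```

The resulting (K, K, 6) array can be treated as a 6-channel image, which makes it a natural input for a small CNN.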
Effectiveness of Adversarial Learning 36
Ablation Study on H36M Dataset
MPJPE (error in mm) on H36M, 8% less error
Full (Image+Pose+Geo, ours): 59.7
Geo (Image+Geo): 60.3
Pose (Image+Pose): 61.3
Baseline (jointly learn 2D + depth): 64.8
Baseline (fix 2D, finetune depth): 65.2
State-of-art* (Zhou et al. ICCV'17): 64.9
*Zhou et al. ICCV'17 37
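MPJPE (mean per-joint position error) is the average Euclidean distance between predicted and ground-truth joints. The sketch below uses a made-up 17-joint pose, and the relative reduction is computed against the 64.8 mm baseline, which is an assumption about which bar the 8% figure refers to.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance over joints."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

# Hypothetical pose with 17 joints, each predicted 30 mm off in x and 40 mm in y.
gt = np.zeros((17, 3))
pred = gt.copy()
pred[:, 0] = 30.0
pred[:, 1] = 40.0
err = mpjpe(pred, gt)  # every joint is off by sqrt(30^2 + 40^2) = 50 mm

# Relative error reduction of the full model over the jointly learned baseline.
relative_reduction = (64.8 - 59.7) / 64.8
```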
Results on Images in the Wild baseline Ours 38
Multi-view Results 39
Section Summary • Weakly supervised adversarial learning for 3D pose estimation in the wild • Multi-source discriminator 3D Human Pose Estimation in the Wild by Adversarial Learning Wei Yang, Wanli Ouyang, Xiaolong Wang, Hongsheng Li, Xiaogang Wang CVPR, 2018 40
Code • Open-source PyTorch code • https://github.com/bearpaw/pytorch-pose • ICCV 17 • https://github.com/bearpaw/PyraNet 41
Thanks! wyang@ee.cuhk.edu.hk http://www.ee.cuhk.edu.hk/~wyang/ @bearpaw 42