Learning Human Pose from Unaligned Data through Image Translation. Tomas Jakab, Ankush Gupta, Andrea Vedaldi, Hakan Bilen. Presented by Triantafyllos Afouras.
Goal: Learn human-body landmark detectors from unlabelled videos and unaligned annotations. [Figure: human images and unaligned poses as inputs; pose estimate as output.]
Model architecture
Autoencoding: input image → encoder → code → decoder → reconstruction. Problem: the code is not interpretable.
Filtering geometric information: input image → encoder → 2D keypoints → decoder → reconstruction. Problem: the 2D keypoints carry no appearance information for image reconstruction.
Conditional generation: the geometry encoder extracts 2D keypoints and the appearance encoder extracts appearance from the input images; the decoder combines both to produce the reconstruction.
Result: unsupervised 2D keypoint discovery. Reference: Jakab, Gupta, Bilen, Vedaldi. Unsupervised learning of object landmarks through conditional image generation. Proc. NeurIPS, 2018.
Unsupervised 2D keypoints: discovered landmarks vs. what we actually want. Reference: Jakab, Gupta, Bilen, Vedaldi. Unsupervised learning of object landmarks through conditional image generation. Proc. NeurIPS, 2018.
Learning to label as image translation: input image → encoder → bottleneck → decoder → reconstruction. A discriminator inspects the bottleneck: does it look like a skeleton?
Image Translation = CycleGAN. [Figure: translation between rgb images and skeleton images, with reconstruction of the original domain.] Reference: Zhu and Park et al. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. 2017.
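The cycle-consistency idea behind CycleGAN can be illustrated with a toy sketch: translate an image to the other domain and back, then penalise the L1 difference from the original. The functions `to_skeleton`, `to_rgb`, and `lossy_to_rgb` are hypothetical stand-ins for learned translators, images are flattened lists of floats, and the adversarial (discriminator) term is left out entirely.

```python
from typing import Callable, List

Image = List[float]  # toy flattened "image"; real models operate on tensors


def l1_distance(a: Image, b: Image) -> float:
    """Mean absolute difference between two equally sized images."""
    if len(a) != len(b):
        raise ValueError("images must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(
    x: Image,
    forward: Callable[[Image], Image],
    backward: Callable[[Image], Image],
) -> float:
    """L1 distance between x and its round trip backward(forward(x))."""
    return l1_distance(x, backward(forward(x)))


# Hypothetical stand-ins for the two translators (assumptions, not the
# networks from the slides).
def to_skeleton(image: Image) -> Image:
    return [0.5 * v for v in image]


def to_rgb(skeleton: Image) -> Image:
    return [2.0 * v for v in skeleton]


def lossy_to_rgb(skeleton: Image) -> Image:
    # Discards everything, so the round trip cannot recover the input.
    return [0.0 for _ in skeleton]


rgb = [0.2, 0.4, 0.8]
perfect = cycle_consistency_loss(rgb, to_skeleton, to_rgb)
lossy = cycle_consistency_loss(rgb, to_skeleton, lossy_to_rgb)
print(perfect, lossy)
```

The loss is zero only when the backward translator can fully recover the input from the intermediate result. That requirement is what pushes the intermediate skeleton to carry everything needed for reconstruction.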
Cheating CycleGANs: input image → encoder → bottleneck → decoder → reconstruction. The bottleneck needs to smuggle appearance to facilitate the reconstruction. The discriminator asks of the bottleneck: does it look like a skeleton?
Cheating CycleGANs. [Figure panels: source, reconstruction, bottleneck, log-bottleneck.] The model cheats and encodes appearance information together with geometry, smuggling the appearance through the bottleneck.
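Tightening the screw. [Figure: input image → image encoder → skeleton image → handcrafted bottleneck (keypoint detector → keypoints → analytical renderer) → decoder → reconstruction. A style image enters the decoder through the appearance encoder. A discriminator inspects the skeleton image: does it look like a skeleton? Figure annotation: pre-trained offline.]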
Our model in detail. [Figure: input image → image encoder → skeleton image → keypoint detector → detected keypoints → analytical renderer → clean skeleton image → decoder → reconstruction. The style image enters the decoder through the appearance encoder. The discriminator compares skeleton images against unpaired skeleton images: does it look like a skeleton?]
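To make the analytical renderer concrete, the following is a minimal sketch of one way keypoints could be turned into a clean skeleton image. The slides do not specify the formula, so the rendering rule here is an assumption: each pixel gets the value exp(-gamma * d), where d is the squared distance to the nearest limb segment. The names `render_skeleton`, `edges`, and `gamma` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates


def sq_dist_to_segment(u: Point, a: Point, b: Point) -> float:
    """Squared distance from point u to the line segment from a to b."""
    ax, ay = a
    bx, by = b
    ux, uy = u
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        t = 0.0  # degenerate segment: both endpoints coincide
    else:
        # Project u onto the line, then clamp to stay on the segment.
        t = ((ux - ax) * dx + (uy - ay) * dy) / seg_len_sq
        t = max(0.0, min(1.0, t))
    px, py = ax + t * dx, ay + t * dy
    return (ux - px) ** 2 + (uy - py) ** 2


def render_skeleton(
    keypoints: Sequence[Point],
    edges: Sequence[Tuple[int, int]],
    height: int,
    width: int,
    gamma: float = 0.5,  # assumed sharpness parameter
) -> List[List[float]]:
    """Render limbs as soft lines; returns a height x width image in [0, 1]."""
    image = []
    for row in range(height):
        line = []
        for col in range(width):
            pixel = (float(col), float(row))
            nearest = min(
                sq_dist_to_segment(pixel, keypoints[i], keypoints[j])
                for i, j in edges
            )
            line.append(math.exp(-gamma * nearest))
        image.append(line)
    return image


# One horizontal limb from (1, 1) to (5, 1) on a 3 x 7 canvas.
skeleton = render_skeleton([(1.0, 1.0), (5.0, 1.0)], [(0, 1)], height=3, width=7)
for row in skeleton:
    print(" ".join(f"{v:.2f}" for v in row))
```

Because every pixel value is a smooth function of the keypoint coordinates almost everywhere, the same rule written in an automatic-differentiation framework would let gradients flow from the rendered image back to the keypoints. The rendered image depends only on the keypoint coordinates, so this stage has no channel through which appearance could pass.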
Results
Human pose estimation. [Figure: predictions on the Simplified Human3.6M, Human3.6M, and Penn Action datasets.]
Human pose estimation. [Figure: predictions on Penn Action and Human3.6M.]
Unsupervised to labelled keypoints. Unsupervised methods: discovered landmarks → supervised linear regression → what we actually want. Our method: directly predicting labelled keypoints.
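The supervised linear regression step used by the unsupervised methods can be sketched as an ordinary least-squares fit from discovered landmark coordinates to labelled keypoint coordinates. This is a minimal illustration with made-up toy data, solving the normal equations in plain Python; the helper names are hypothetical, and a real evaluation would use a numerical library and many more samples.

```python
from typing import List

Matrix = List[List[float]]


def solve_linear_system(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy so the inputs stay untouched.
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: samples are not varied enough")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_linear_regressor(landmarks: Matrix, labels: Matrix) -> Matrix:
    """Least-squares weights mapping landmarks to labels; last row is the bias."""
    x = [list(row) + [1.0] for row in landmarks]  # append a constant bias feature
    d, k = len(x[0]), len(labels[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(d)] for i in range(d)]
    xty = [[sum(r[i] * y[j] for r, y in zip(x, labels)) for j in range(k)]
           for i in range(d)]
    return solve_linear_system(xtx, xty)


def predict(weights: Matrix, landmark_vector: List[float]) -> List[float]:
    """Apply the fitted linear map to one flattened landmark vector."""
    features = list(landmark_vector) + [1.0]
    k = len(weights[0])
    return [sum(f * weights[i][j] for i, f in enumerate(features))
            for j in range(k)]


# Toy data (made up): labels are (2x + 1, y - 3) of one discovered landmark.
discovered = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 3.0]]
labelled = [[1.0, -3.0], [3.0, -3.0], [1.0, -2.0], [5.0, 0.0]]
weights = fit_linear_regressor(discovered, labelled)
print(predict(weights, [4.0, 5.0]))
```

The regressor needs paired landmark and label data to fit, which is the supervision that the slide contrasts with directly predicting labelled keypoints.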
Human pose estimation: unsupervised discovery + supervised regression. [Figure: bar charts for Simplified Human3.6M and Human3.6M. Axes: %-MSE normalised by image size; MSE in pixels. Bars: hourglass, Thewlis et al., Zhang et al., ours, hourglass, ours, ours; three of the bars are marked "(supervised)". Annotation: no paired data.]
Ablations. [Figure: bar chart on Simplified Human3.6M; y-axis: %-MSE normalised by image size. Bars: CycleGAN; + appearance conditioning; + clean skeleton (analytical renderer bottleneck); - 2nd cycle = ours.]
Disentangling style and geometry
Disentangling style and geometry. Mixing appearance and geometry by conditioning on a different identity. [Figure columns: geometry, style, reconstruction.]
Conclusion
Learn landmark detectors from unlabelled videos and unaligned pose annotations, using no paired data / labelled images.
Prevent appearance leakage in CycleGAN through:
  (a) a novel bottleneck with a differentiable sketch renderer;
  (b) conditioning the generator on an appearance image.
Outperform state-of-the-art supervised and unsupervised landmark detectors for human pose.
Method factorizes object appearance and geometry → transfer style / pose.
Learning Human Pose from Unaligned Data through Image Translation: www.robots.ox.ac.uk/~vgg/research/unsupervised_pose/