
Learning Human Pose from Unaligned Data through Image Translation



  1. Learning Human Pose from Unaligned Data through Image Translation. Tomas Jakab, Ankush Gupta, Andrea Vedaldi, Hakan Bilen. Presented by Triantafyllos Afouras.

  2. Goal: Learn human-body landmark detectors from unlabelled videos and unaligned annotations. [Figure: human images + unaligned poses → pose estimate]

  3. Model architecture

  4. Autoencoding: input image → encoder → code → decoder → reconstruction.

  5. Autoencoding: input image → encoder → code → decoder → reconstruction. Problem: the code is not interpretable.

  6. Filtering geometric information: input image → encoder → 2D keypoints → decoder → reconstruction.

  7. Filtering geometric information: input image → encoder → 2D keypoints → decoder → reconstruction. Problem: the keypoints carry no appearance information for image reconstruction.

  8. Conditional generation: [Diagram: input images; geometry encoder → 2D keypoints; appearance encoder; decoder → reconstruction]

  9. Result: unsupervised 2D keypoints discovery. Reference: Unsupervised learning of object landmarks through conditional image generation. Jakab, Gupta, Bilen, Vedaldi. Proc. NeurIPS, 2018.

  10. Unsupervised 2D keypoints: discovered landmarks vs. what we actually want. Reference: Unsupervised learning of object landmarks through conditional image generation. Jakab, Gupta, Bilen, Vedaldi. Proc. NeurIPS, 2018.

  11. Learning to label as image translation: input image → encoder → bottleneck → decoder → reconstruction. A discriminator inspects the bottleneck: looks like a skeleton?

  12. Image Translation = CycleGAN. [Diagram: cycles between the rgb and skeleton domains, ending in a (reconstruction)] Reference: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. Zhu and Park et al., 2017.
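The cycle-consistency idea behind CycleGAN can be illustrated with a toy sketch: an input translated to the other domain and back should match the original. The code below is only an illustration under simplifying assumptions. Images are flat lists of floats, and `to_skeleton`, `to_rgb`, `lossy_to_rgb`, and `cycle_loss` are hypothetical names, not the networks or losses used in the talk. A real CycleGAN also trains adversarial discriminators, which are not shown.

```python
from typing import Callable, List

Image = List[float]  # toy stand-in for an image: a flat list of pixel values


def l1_distance(a: Image, b: Image) -> float:
    """Mean absolute difference between two equally sized images."""
    if len(a) != len(b):
        raise ValueError("images must have the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_loss(x: Image,
               forward: Callable[[Image], Image],
               backward: Callable[[Image], Image]) -> float:
    """L1 cycle-consistency: distance between backward(forward(x)) and x."""
    return l1_distance(backward(forward(x)), x)


# Hypothetical toy translators (assumptions for illustration only).
def to_skeleton(rgb: Image) -> Image:
    return [2.0 * v for v in rgb]


def to_rgb(skeleton: Image) -> Image:
    return [0.5 * v for v in skeleton]


def lossy_to_rgb(skeleton: Image) -> Image:
    # Discards all information, so the cycle cannot be closed.
    return [0.0 for _ in skeleton]
```

With these toy translators, `cycle_loss(x, to_skeleton, to_rgb)` is zero because the two maps invert each other, while swapping in `lossy_to_rgb` gives a positive loss.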

  13. Cheating CycleGANs: the model needs to smuggle appearance to facilitate the reconstruction. [Diagram: input image → encoder → bottleneck → decoder → reconstruction; discriminator on the bottleneck: looks like a skeleton?]

  14. Cheating CycleGANs: The model cheats and encodes appearance information together with geometry, smuggling the appearance. [Figure panels: source, reconstruction, bottleneck, log-bottleneck]

  15. Tightening the screw. [Diagram labels: input image; image encoder; skeleton image; keypoint detector (pre-trained offline); handcrafted bottleneck; analytical renderer; decoder; reconstruction; style image; appearance encoder; discriminator: looks like a skeleton?]

  16. Tightening the screw. [Same diagram as slide 15]

  17. Our model in detail. [Diagram: input image → image encoder → skeleton image → keypoint detector → detected keypoints → analytical renderer → clean skeleton images → decoder → reconstruction. A style image feeds the appearance encoder. The discriminator compares against unpaired skeleton images: looks like a skeleton?]
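To give a feel for what an analytical renderer from keypoints to a skeleton image might look like, here is a minimal sketch. It assumes each limb is drawn as a soft line whose intensity decays with the squared distance from the pixel to the nearest limb segment. The formula, the `gamma` sharpness parameter, and the function names are assumptions for illustration. The slides do not give the renderer's exact form.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates


def point_segment_sq_dist(u: Point, a: Point, b: Point) -> float:
    """Squared distance from pixel u to the line segment a-b."""
    ax, ay = a
    bx, by = b
    ux, uy = u
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        t = 0.0  # degenerate segment: both endpoints coincide
    else:
        # Project u onto the line through a and b, then clamp to the segment.
        t = max(0.0, min(1.0, ((ux - ax) * dx + (uy - ay) * dy) / seg_len_sq))
    px, py = ax + t * dx, ay + t * dy
    return (ux - px) ** 2 + (uy - py) ** 2


def render_skeleton(keypoints: Sequence[Point],
                    edges: Sequence[Tuple[int, int]],
                    height: int, width: int,
                    gamma: float = 1.0) -> List[List[float]]:
    """Render limbs as soft lines: exp(-gamma * min squared distance to a limb)."""
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            d2 = min(point_segment_sq_dist((x, y), keypoints[i], keypoints[j])
                     for i, j in edges)
            row.append(math.exp(-gamma * d2))
        image.append(row)
    return image
```

Because the output depends on the keypoints only through simple arithmetic, the same expression written in an autodiff framework would let gradients flow from the skeleton image back to the keypoints. That is the property a differentiable renderer needs.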

  18. Results

  19. Human pose estimation. [Figure: predictions on the Simplified Human3.6M dataset, Human3.6M, and Penn Action]

  20. Human pose estimation. [Figure: examples from Penn Action and Human3.6M]

  21. Unsupervised to labelled keypoints. Unsupervised methods: discovered landmarks → supervised linear regression → what we actually want. Our method: directly predicting labelled keypoints.
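The supervised linear regression step that landmark-discovery methods rely on can be sketched as an ordinary least-squares fit from flattened discovered landmarks to flattened labelled keypoints. The code below is a minimal standard-library version that solves the normal equations. The function names and the bias term are assumptions for illustration, and real evaluations typically use a numerical library instead.

```python
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b]; copying keeps the inputs untouched.
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system; add more samples or regularise")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_linear_regressor(discovered: Matrix, labelled: Matrix) -> Matrix:
    """Least-squares W (with bias row) so that [discovered, 1] @ W ~ labelled."""
    x = [list(row) + [1.0] for row in discovered]  # append bias feature
    d, k = len(x[0]), len(labelled[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(d)] for i in range(d)]
    xty = [[sum(r[i] * t[j] for r, t in zip(x, labelled)) for j in range(k)]
           for i in range(d)]
    return solve(xtx, xty)


def predict(weights: Matrix, discovered_row: List[float]) -> List[float]:
    """Map one flattened landmark vector to labelled keypoint coordinates."""
    features = list(discovered_row) + [1.0]
    return [sum(f * w[j] for f, w in zip(features, weights))
            for j in range(len(weights[0]))]
```

Fitting this regressor requires paired landmark and label data. The slide contrasts that requirement with predicting labelled keypoints directly.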

  22. Human pose estimation. [Bar charts: %-MSE normalised by image size on Simplified Human3.6M, and MSE in pixels on Human3.6M. Bar labels: hourglass, Thewlis et al., Zhang et al., ours; three bars are marked "(supervised)". Annotations: "unsupervised discovery + supervised regression", "no paired data".]
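The slides do not define the error metric, so the following is only one plausible reading of an error "normalised by image size": the mean Euclidean distance between predicted and ground-truth keypoints, expressed as a percentage of the image side length. The function name and this exact definition are assumptions. The benchmark's own protocol may differ, for example by squaring distances.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def normalised_keypoint_error(pred: Sequence[Point],
                              target: Sequence[Point],
                              image_size: float) -> float:
    """Mean Euclidean keypoint error as a percentage of the image size."""
    if len(pred) != len(target) or not pred:
        raise ValueError("need equally sized, non-empty keypoint lists")
    total = sum(math.hypot(px - tx, py - ty)
                for (px, py), (tx, ty) in zip(pred, target))
    return 100.0 * total / (len(pred) * image_size)
```

For example, two keypoints with errors of 5 and 0 pixels in a 100-pixel image give a mean error of 2.5 pixels, which is 2.5% of the image size.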

  23. Ablations. [Bar chart: %-MSE normalised by image size on Simplified Human3.6M. Configurations: CycleGAN; + appearance conditioning; + clean skeleton bottleneck with (analytical) renderer; - 2nd cycle = ours.]

  24. Disentangling style and geometry

  25. Disentangling style and geometry: mixing appearance and geometry by conditioning on a different identity. [Figure columns: geometry, style, reconstruction]

  26. Conclusion: Learn landmark detectors from unlabelled videos and unaligned pose annotations, using no paired data or labelled images. Prevent appearance leakage in CycleGAN through (a) a novel bottleneck with a differentiable sketch renderer and (b) conditioning the generator on an appearance image. Outperforms state-of-the-art supervised and unsupervised landmark detectors for human pose. The method factorizes object appearance and geometry → transfer of style / pose.

  27. Learning Human Pose from Unaligned Data through Image Translation. www.robots.ox.ac.uk/~vgg/research/unsupervised_pose/ Tomas Jakab, Ankush Gupta, Andrea Vedaldi, Hakan Bilen. Presented by Triantafyllos Afouras.
