
Supervision-by-Registration: An Unsupervised Approach to Improve the Precision of Facial Landmark Detectors



  1. Supervision-by-Registration: An Unsupervised Approach to Improve the Precision of Facial Landmark Detectors
     Xuanyi Dong¹, Shoou-I Yu², Xinshuo Weng², Shih-En Wei², Yi Yang¹, Yaser Sheikh²
     ¹CAI, University of Technology Sydney; ²Oculus Research, Facebook
     CVPR 2018, Salt Lake City

  2. Facial Landmark Detection

  3. A Challenging Problem
     ● Poses (expressions/viewpoints)
     ● Identity
     ● Temporal consistency
     Sagonas et al. 300 Faces in-the-Wild Challenge: The First Facial Landmark Localization Challenge. ICCV, 2013.

  4. Landmark Detection Methods: Image-based Detection and Video-based Detection
     Image-based Detection
     ● DeepReg [Shi et al, NNLS’14]
     ● Convolutional Pose Machine [Wei et al, CVPR’16]
     ● Hourglass Network [Newell et al, ECCV’16]
     ● …
     ● Pros
       ○ Accurate across poses/identity
     ● Cons
       ○ Lack of temporal consistency (jittering)

  5. Landmark Detection Methods: Image-based Detection and Video-based Detection
     Image-based Detection
     ● DeepReg [Shi et al, NNLS’14]
     ● Convolutional Pose Machine [Wei et al, CVPR’16]
     ● Hourglass Network [Newell et al, ECCV’16]
     ● …
     ● Pros
       ○ Accurate across poses/identity
     ● Cons
       ○ Lack of temporal consistency (jittering)
     Video-based Detection
     ● Recurrent Encoder-Decoder Network [Peng et al, ECCV’16]
     ● Two-Streams Transformer [Liu et al, TPAMI’17]
     ● Supervision-by-Registration [Ours]
     ● …
     ● Pros
       ○ Temporally consistent
     ● Cons
       ○ Require per-frame annotations, difficult to scale up

  6. What is Supervision-by-Registration?

  7. Lucas-Kanade Tracking Operation: Differentiable
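To make the tracking step concrete, here is a minimal translation-only Lucas-Kanade sketch in plain Python. It is not the authors' implementation: the function names `bilinear`, `lk_track`, and `blob`, the window `radius`, and the iteration count are all illustrative assumptions. Every step (bilinear sampling, finite-difference gradients, a 2x2 linear solve) is a smooth function of the input point, which is what allows such a tracker to be written with differentiable operations inside a network.

```python
import math


def bilinear(img, x, y):
    """Sample a list-of-rows image at real-valued (x, y), clamped to the border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = min(int(x), w - 2), min(int(y), h - 2)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x0 + 1]
    bot = (1 - ax) * img[y0 + 1][x0] + ax * img[y0 + 1][x0 + 1]
    return (1 - ay) * top + ay * bot


def lk_track(prev, nxt, point, radius=3, iters=20):
    """Estimate where `point` (x, y) from `prev` lies in `nxt` (translation only)."""
    px, py = point
    offs = [(dx, dy) for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)]
    # Template patch and its central-difference gradients, taken from `prev`.
    tmpl = [bilinear(prev, px + dx, py + dy) for dx, dy in offs]
    gx = [(bilinear(prev, px + dx + 1, py + dy)
           - bilinear(prev, px + dx - 1, py + dy)) / 2 for dx, dy in offs]
    gy = [(bilinear(prev, px + dx, py + dy + 1)
           - bilinear(prev, px + dx, py + dy - 1)) / 2 for dx, dy in offs]
    # Entries of the 2x2 Gauss-Newton matrix [[a, b], [b, c]].
    a = sum(g * g for g in gx)
    b = sum(u * v for u, v in zip(gx, gy))
    c = sum(g * g for g in gy)
    det = a * c - b * b
    if abs(det) < 1e-12:
        return (px, py)  # textureless patch: no reliable motion estimate
    qx, qy = px, py
    for _ in range(iters):
        # Residual between the patch at the current guess and the template.
        err = [bilinear(nxt, qx + dx, qy + dy) - t
               for (dx, dy), t in zip(offs, tmpl)]
        ex = sum(g * e for g, e in zip(gx, err))
        ey = sum(g * e for g, e in zip(gy, err))
        # Solve the 2x2 system in closed form and step against the residual.
        qx -= (c * ex - b * ey) / det
        qy -= (a * ey - b * ex) / det
    return (qx, qy)


def blob(cx, cy, size=21, sigma=3.0):
    """Synthetic test image: a Gaussian bump centred at (cx, cy)."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(size)] for y in range(size)]


# The bump moves by (+2, -1) between frames; the tracked point should follow it.
print(lk_track(blob(10, 10), blob(12, 9), (10.0, 10.0)))
```

The gradients and the 2x2 matrix are computed once from the first frame and reused in every iteration, which keeps the loop cheap; a pyramid or a richer warp model would be needed for large motions.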

  8. Registration Loss: Forward-Backward Scheme
     Noh et al. Learning Deconvolution Network for Semantic Segmentation. ICCV, 2015.
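The slide names the scheme without spelling it out, so the following is only one plausible reading, sketched under stated assumptions: a landmark detected in frame t is tracked forward to frame t+1 and compared with the detection there, and a landmark is ignored when tracking it forward and then backward does not return close to where it started. The function `registration_loss`, the tracker callables `track_fwd` and `track_bwd`, and the `fb_threshold` value are hypothetical names introduced for illustration.

```python
import math


def registration_loss(det_t, det_t1, track_fwd, track_bwd, fb_threshold=1.0):
    """Mean distance between tracked and detected landmarks on reliable tracks.

    det_t, det_t1 : lists of (x, y) detections in frames t and t+1
    track_fwd     : callable mapping a point in frame t to frame t+1
    track_bwd     : callable mapping a point in frame t+1 back to frame t
    """
    total, kept = 0.0, 0
    for p, q in zip(det_t, det_t1):
        fwd = track_fwd(p)
        back = track_bwd(fwd)
        # Forward-backward check: skip landmarks whose round trip drifts.
        if math.dist(p, back) > fb_threshold:
            continue
        total += math.dist(fwd, q)
        kept += 1
    return total / kept if kept else 0.0


# Toy trackers that assume everything moves one pixel to the right.
fwd = lambda p: (p[0] + 1.0, p[1])
bwd = lambda p: (p[0] - 1.0, p[1])
print(registration_loss([(0.0, 0.0), (5.0, 5.0)],
                        [(1.0, 0.0), (6.0, 8.0)], fwd, bwd))  # 1.5
```

In this toy case the first landmark agrees with its track (distance 0) and the second is off by 3 pixels, giving a mean of 1.5. No annotation is involved, only detections and tracks, which is why the signal can be computed on unlabeled video.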

  9. Soft-Argmax: Differentiable Operation (figure: Sample Heatmap, Output)
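As an illustration of why soft-argmax is differentiable where a hard argmax is not, the sketch below computes the expected pixel coordinate under a softmax of the heatmap. This is the generic textbook form, not necessarily the exact variant used in the talk; the function name `soft_argmax` and the `temperature` parameter are assumptions.

```python
import math


def soft_argmax(heatmap, temperature=1.0):
    """Return the softmax-weighted mean (x, y) coordinate of a 2D heatmap."""
    peak = max(max(row) for row in heatmap)
    # Subtracting the peak keeps exp() numerically stable.
    weights = [[math.exp((v - peak) / temperature) for v in row]
               for row in heatmap]
    total = sum(sum(row) for row in weights)
    x = sum(w * col for row in weights for col, w in enumerate(row)) / total
    y = sum(w * r for r, row in enumerate(weights) for w in row) / total
    return x, y


# Two equally strong neighbouring responses yield a sub-pixel location between them.
print(soft_argmax([[0.0, 5.0, 5.0, 0.0]]))  # (1.5, 0.0)
```

Because the output is a weighted average of coordinates, a small change in any heatmap value changes the result smoothly, so gradients can flow from a coordinate-level loss back into the heatmap.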

  10. Implementation
     ● Used VGG16 as the backbone architecture
     ● Used CPM as the base facial landmark detector (can be replaced by others, e.g., a stacked hourglass network)
     ● Operated LK tracking on images/conv1 features

  11. Results: on Image Datasets

  12. Results: on Video Datasets
     ● AUC@0.08 error for each individual video of 300-VW category C. The numbers are percentages.
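For readers unfamiliar with the metric, AUC@0.08 is usually taken to mean the area under the cumulative error distribution curve up to a normalized error of 0.08, divided by 0.08 and reported as a percentage. The sketch below follows that reading; the slides do not state the normalization or the integration step, so the function name `auc_at`, the `steps` parameter, and the trapezoidal rule are assumptions.

```python
def auc_at(errors, threshold=0.08, steps=80):
    """Area under the cumulative error curve on [0, threshold], in percent."""
    xs = [threshold * i / steps for i in range(steps + 1)]
    # Fraction of frames whose normalized error is at most each threshold.
    ced = [sum(e <= x for e in errors) / len(errors) for x in xs]
    # Trapezoidal integration of the curve.
    area = sum((ced[i] + ced[i + 1]) / 2 * (xs[i + 1] - xs[i])
               for i in range(steps))
    return 100.0 * area / threshold


# Half of the frames are perfect and half fail entirely.
print(auc_at([0.0, 1.0]))
```

A detector with zero error on every frame scores 100, and one whose errors all exceed 0.08 scores 0, so higher is better.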

  13. Demo

  14. Take Home Messages
     ● Registration can be a free supervision signal to enforce temporal consistency
     ● More generally, self-supervision is powerful!
