Unsupervised Discovery of Object Landmarks as Structural Representations


  1. Unsupervised Discovery of Object Landmarks as Structural Representations. Yuting Zhang¹, Yijie Guo¹, Yixin Jin¹, Yijun Luo¹, Zhiyuan He¹, Honglak Lee¹,². ¹University of Michigan, Ann Arbor; ²Google Brain

  2. Structural representations of images • Computer vision seeks to understand visual structures. • Poses, contours, 3D shapes, … • Physically conceptualized, perceptible by humans • Deep neural networks can learn latent representations. • Desired properties: distributed, sparse, transferable, … • Not as conceptualized and interpretable as explicit structures • Extra supervision is needed to bridge the gap between latent representations and explicit structures • costly to obtain and often unavailable Our paper: Unsupervised Discovery of Object Landmarks as Structural Representations

  3. Structural representations of images • Computer vision seeks to understand visual structures. • Poses, contours, 3D shapes, … • Physically conceptualized, perceptible by humans • Deep neural networks can learn latent representations. • Desired properties: distributed, sparse, transferable, … • Not as conceptualized and interpretable as explicit structures • Typically, extra supervision is needed to bridge the gap between latent representations and explicit structures • Costly to obtain and often unavailable • Can we train a deep neural network to get image representations of explicit structures without supervision?

  4. The explicit structure • Can we train a deep neural network to get image representations of explicit structures without supervision? • We consider a specific type of explicit structure: object landmarks • Compact representation of object shapes • Generally applicable to many object categories

  5. Our framework [Diagram labels: image representation; unsupervised landmark discovery; latent features; task]

  6. Our framework [Diagram labels: image representation; unsupervised landmark discovery; latent features; image reconstruction]

  7. Our framework [Diagram labels: unsupervised landmark discovery; latent features; image reconstruction]

  8. Technical outline • Unsupervised object landmark discovery [Diagram labels: unsupervised landmark discovery; latent features; image reconstruction]

  9. Technical outline • Unsupervised object landmark discovery • A fully differentiable neural network architecture • Training signal: the image reconstruction can encourage the learning of informative landmarks and features. [Diagram labels: unsupervised landmark discovery; latent features; image reconstruction]

  10. Technical outline [Diagram labels: unsupervised landmark discovery; latent features; image reconstruction]

  11. Overview of our neural network architecture [Diagram labels: input image; landmark coordinates; latent features; reconstructed image]

  12. Overview of our neural network architecture. Unsupervised landmark discovery: • A differentiable formulation • Unsupervised constraints to define a valid landmark detector. Related work: James Thewlis, Hakan Bilen, and Andrea Vedaldi, “Unsupervised learning of object landmarks by factorized spatial embeddings,” in ICCV, 2017. [Diagram labels: input image; landmark coordinates; latent features; reconstructed image]

  13. Landmark detector: Architecture [Diagram labels: input image; encoder-decoder with skip-links; channel-wise softmax; foreground and background heatmaps; heatmap to coordinate; landmark coordinates]
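The channel-wise softmax step named on this slide can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the tensor layout `(K + 1, H, W)`, the placement of the background channel last, and the function name `channelwise_softmax` are all assumptions made for the example.

```python
import numpy as np

def channelwise_softmax(scores):
    """Softmax across the channel axis of a (C, H, W) score map.

    At every pixel, the C channel values are turned into probabilities
    that sum to 1.
    """
    shifted = scores - scores.max(axis=0, keepdims=True)  # numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=0, keepdims=True)

# Assumed layout: K foreground (landmark) channels followed by 1 background channel.
rng = np.random.default_rng(0)
K, H, W = 4, 8, 8
scores = rng.normal(size=(K + 1, H, W))  # stand-in for the encoder-decoder output
probs = channelwise_softmax(scores)
foreground = probs[:K]   # one heatmap per landmark
background = probs[K]    # remaining probability mass
```

Because the softmax runs across channels, each pixel is softly assigned to one landmark or to the background, which is one way to keep the foreground heatmaps from all responding to the same location.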

  15. From heatmaps to coordinates. Ours: a foreground heatmap and its isotropic Gaussian approximation $\mathcal{N}\left( (x, y), \begin{pmatrix} \sigma & 0 \\ 0 & \sigma \end{pmatrix} \right)$. Landmark coordinate: • Averaged coordinate weighted by the heatmap • $(x, y)$ is differentiable with respect to the heatmap
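The heatmap-weighted average coordinate described on this slide can be sketched as below. This is a minimal NumPy illustration under assumptions: the heatmap is a non-negative `(H, W)` array, coordinates are pixel indices, and the function name `heatmap_to_coordinate` is hypothetical.

```python
import numpy as np

def heatmap_to_coordinate(heatmap):
    """Return (x, y) as the average pixel coordinate weighted by the heatmap."""
    h, w = heatmap.shape
    weights = heatmap / heatmap.sum()   # normalize so the weights sum to 1
    ys, xs = np.mgrid[0:h, 0:w]         # row (y) and column (x) index grids
    x = float((weights * xs).sum())
    y = float((weights * ys).sum())
    return x, y

# Two equally weighted peaks on row 2, at columns 3 and 5.
heatmap = np.zeros((5, 7))
heatmap[2, 3] = 1.0
heatmap[2, 5] = 1.0
x, y = heatmap_to_coordinate(heatmap)   # the midpoint between the two peaks
```

Every operation here is a sum or a product, which is why the resulting coordinate is differentiable with respect to the heatmap values, unlike a hard arg-max.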

  16. Landmark discovery. The outputs $(x_1, y_1), (x_2, y_2), \ldots, (x_K, y_K)$ can be arbitrary, without physical meanings. • The neural network can be used to output landmark coordinates. • However, without additional training objectives, the landmark coordinates can be arbitrary latent features. 3 desirable properties for a landmark detector

  17. Property 1: Concentration of heatmap values. For a detector, the output heatmap should concentrate in a local region. • Encourage the Gaussian variance to be small. [Figure: original heatmap and Gaussian heatmap, shown at an earlier stage and a later stage]
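One way to make "encourage the variance to be small" concrete is to measure the spatial variance of the normalized heatmap and penalize it. The slide does not give the exact loss, so the penalty below (the plain sum of the two variances) and the names `heatmap_variance` and `concentration_penalty` are assumptions for illustration only.

```python
import numpy as np

def heatmap_variance(heatmap):
    """Spatial variance of a non-negative (H, W) heatmap along x and y."""
    h, w = heatmap.shape
    weights = heatmap / heatmap.sum()
    ys, xs = np.mgrid[0:h, 0:w]
    mean_x = (weights * xs).sum()
    mean_y = (weights * ys).sum()
    var_x = (weights * (xs - mean_x) ** 2).sum()
    var_y = (weights * (ys - mean_y) ** 2).sum()
    return float(var_x), float(var_y)

def concentration_penalty(heatmap):
    """Assumed penalty: sum of the two variances (smaller means more concentrated)."""
    var_x, var_y = heatmap_variance(heatmap)
    return var_x + var_y

peaked = np.zeros((9, 9))
peaked[4, 4] = 1.0          # all mass at one pixel
spread = np.ones((9, 9))    # mass spread uniformly over the map
```

A heatmap with all its mass on one pixel has zero penalty, while a uniform heatmap has a large one, so minimizing the penalty pushes the detector toward localized responses.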

  18. Property 2: Separation of landmarks • Different landmarks should cover different visual semantics. • Penalize if the pairwise distances among landmarks are too small. $L_{\mathrm{sep}} = \sum_{k \neq k'}^{1,\ldots,K} \exp\left( -\frac{\| (x_{k'}, y_{k'}) - (x_k, y_k) \|_2^2}{2 \sigma_{\mathrm{sep}}^2} \right)$
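The separation loss on this slide can be written out directly. The sketch below assumes the landmarks are stored as a `(K, 2)` NumPy array and sums over ordered pairs with k ≠ k'; the function name `separation_loss` and the example values are illustrative.

```python
import numpy as np

def separation_loss(landmarks, sigma_sep):
    """Sum of exp(-||p_k' - p_k||^2 / (2 * sigma_sep^2)) over ordered pairs k != k'."""
    diff = landmarks[:, None, :] - landmarks[None, :, :]   # (K, K, 2) pairwise differences
    sq_dist = (diff ** 2).sum(axis=-1)                     # (K, K) squared distances
    penalty = np.exp(-sq_dist / (2.0 * sigma_sep ** 2))
    off_diagonal = ~np.eye(len(landmarks), dtype=bool)     # drop the k == k' terms
    return float(penalty[off_diagonal].sum())

close = np.array([[0.0, 0.0], [0.1, 0.0]])   # two nearly coincident landmarks
far = np.array([[0.0, 0.0], [5.0, 0.0]])     # two well separated landmarks
loss_close = separation_loss(close, sigma_sep=1.0)
loss_far = separation_loss(far, sigma_sep=1.0)
```

Each pair contributes close to 1 when the two landmarks nearly coincide and close to 0 once they are several `sigma_sep` apart, so the penalty only acts on landmarks that crowd together.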

  19. Property 3: Equivariance • For a transformation $g$ that does not change local visual semantics, • the landmarks on the two images should satisfy the same transformation $g$. [Figure: an image pair related by $g$]

  23. Property 3: Equivariance • For a transformation $g$ that does not change local visual semantics, • the landmarks on the two images should satisfy the same transformation $g$. $L_{\mathrm{eqv}} = \sum_{k=1}^{K} \| g(x'_k, y'_k) - (x_k, y_k) \|_2^2$ • Equivariance for landmark discovery has been explored by Thewlis et al., 2017. • Ours is directly formulated on the landmark coordinates. (Thewlis et al., 2017) James Thewlis, Hakan Bilen, and Andrea Vedaldi, “Unsupervised learning of object landmarks by factorized spatial embeddings,” in ICCV, 2017.
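The equivariance loss above can be sketched as follows, assuming the landmarks detected on the two images are given as `(K, 2)` arrays and that `g` is a callable mapping coordinates in the primed image to coordinates in the unprimed image. The translation used for `g` and the function name `equivariance_loss` are stand-ins for illustration.

```python
import numpy as np

def equivariance_loss(landmarks, landmarks_prime, g):
    """Sum over k of ||g(x'_k, y'_k) - (x_k, y_k)||_2^2."""
    mapped = g(landmarks_prime)                  # move primed landmarks into the other frame
    return float(((mapped - landmarks) ** 2).sum())

shift = np.array([2.0, -1.0])

def g(points):
    """Hypothetical known transformation: a pure translation."""
    return points + shift

landmarks_prime = np.array([[1.0, 1.0], [3.0, 4.0]])
landmarks = g(landmarks_prime)                   # what a perfectly equivariant detector would output
loss_perfect = equivariance_loss(landmarks, landmarks_prime, g)
loss_off = equivariance_loss(landmarks + 1.0, landmarks_prime, g)
```

The loss is zero exactly when the detections on the two images are related by `g`, and grows quadratically with any disagreement, here 1 pixel in each of the 4 coordinates.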

  24. Property 3: Equivariance – the transformation • Random thin-plate spline (TPS) to synthesize the transformation $g$ • Global affine: translation, scaling, rotation • Local TPS • For videos, optical flow is also used as the transformation $g$
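The global affine part of the random transformation (translation, scaling, rotation) might be sampled as in the sketch below. The local TPS component is omitted, and the sampling ranges, the normalized coordinate convention, and the name `random_affine` are assumptions, since the slide gives no parameter values.

```python
import numpy as np

def random_affine(rng, max_shift=0.1, max_log_scale=0.2, max_angle=np.pi / 6):
    """Sample a random similarity transform g(p) = A p + shift.

    All ranges are illustrative defaults, not values from the slides.
    """
    angle = rng.uniform(-max_angle, max_angle)
    scale = np.exp(rng.uniform(-max_log_scale, max_log_scale))
    shift = rng.uniform(-max_shift, max_shift, size=2)
    rotation = np.array([[np.cos(angle), -np.sin(angle)],
                         [np.sin(angle),  np.cos(angle)]])
    A = scale * rotation

    def g(points):
        """Apply the transform to an (N, 2) array of (x, y) points."""
        return points @ A.T + shift

    return g, A, shift

rng = np.random.default_rng(0)
g, A, shift = random_affine(rng)
points = np.array([[0.0, 0.0], [1.0, 0.0]])
moved = g(points)
```

Because `g` is known in closed form, the same function can warp the image and map landmark coordinates, which is what the equivariance constraint needs.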

  25. Overview of our neural network architecture. Unsupervised landmark discovery [Diagram labels: input image; landmark coordinates; latent features; reconstructed image]

  26. Overview of our neural network architecture [Diagram labels: input image; landmark coordinates; latent features; reconstructed image]

  27. Overview of our neural network architecture. Landmark-based extraction of latent features: • Weighted average-pooling with differentiable pooling masks [Diagram labels: input image; landmark coordinates; latent features; reconstructed image]

  28. Overview of our neural network architecture. Landmark-based extraction of latent features [Diagram labels: input image; latent features]

  29. Landmark-based feature extraction [Diagram labels: Gaussian heatmap; H; W; # channels]

  30. Landmark-based feature extraction. Weighted global average pooling [Diagram labels: H; W; # channels; # channels]
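The weighted global average pooling with Gaussian heatmaps as pooling masks can be sketched as follows. Assumptions: the feature map has shape `(C, H, W)`, each landmark has one Gaussian mask normalized to sum to 1, and the names `gaussian_mask` and `landmark_pooling` are hypothetical.

```python
import numpy as np

def gaussian_mask(x, y, variance, h, w):
    """Isotropic Gaussian centered at (x, y) on an (h, w) grid, normalized to sum to 1."""
    ys, xs = np.mgrid[0:h, 0:w]
    mask = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * variance))
    return mask / mask.sum()

def landmark_pooling(features, masks):
    """Pool a (C, H, W) feature map with K masks of shape (K, H, W) into (K, C) features."""
    return np.einsum("khw,chw->kc", masks, features)

C, H, W = 3, 6, 6
# Channel c holds the constant value c + 1, so each pooled vector should be [1, 2, 3].
features = np.ones((C, H, W)) * np.arange(1, C + 1)[:, None, None]
masks = np.stack([
    gaussian_mask(1.0, 1.0, 1.0, H, W),
    gaussian_mask(4.0, 3.0, 1.0, H, W),
])
latent = landmark_pooling(features, masks)   # one C-dimensional feature per landmark
```

Since the masks are smooth functions of the landmark coordinates, the pooled features stay differentiable with respect to those coordinates, which fits the "differentiable pooling masks" bullet on the earlier slide.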
