Unsupervised Discovery of Object Landmarks as Structural Representations
Yuting Zhang¹, Yijie Guo¹, Yixin Jin¹, Yijun Luo¹, Zhiyuan He¹, Honglak Lee¹,²
¹University of Michigan, Ann Arbor ²Google Brain

Structural representations of images
• Computer vision seeks to understand visual structures.
  • Poses, contours, 3D shapes, …
  • Physically conceptualized, perceptible by humans
• Deep neural networks can learn latent representations.
  • Desired properties: distributed, sparse, transferable, …
  • Not as conceptualized and interpretable as explicit structures
• Typically, extra supervision is needed to bridge the gap between latent representations and explicit structures.
  • Costly to obtain and often unavailable
Can we train a deep neural network to get image representations of explicit structures without supervision?
Our paper: Unsupervised Discovery of Object Landmarks as Structural Representations

The explicit structure
Can we train a deep neural network to get image representations of explicit structures without supervision?
• We consider a specific type of explicit structure: object landmarks
  • Compact representation of object shapes
  • Generally applicable to many object categories

Our framework
[Diagram: image representation (unsupervised landmark discovery + latent features) → task: image reconstruction]

Technical outline
[Diagram: unsupervised landmark discovery + latent features → image reconstruction]
• Unsupervised object landmark discovery
• A fully differentiable neural network architecture
• Training signal
  • The image reconstruction can encourage the learning of informative landmarks and features.

Overview of our neural network architecture
[Diagram: input image → landmark coordinates and latent features → reconstructed image]
Unsupervised landmark discovery
• A differentiable formulation
• Unsupervised constraints to define a valid landmark detector
Related work: James Thewlis, Hakan Bilen, and Andrea Vedaldi, “Unsupervised learning of object landmarks by factorized spatial embeddings,” in ICCV, 2017.

Landmark detector: Architecture
[Diagram: input image → encoder-decoder with skip-links → channel-wise softmax (foreground and background channels) → heatmap to coordinate → landmark coordinates]

From heatmaps to coordinates
Ours: a foreground heatmap → landmark coordinate $(x, y)$ → isotropic Gaussian heatmap approximation $\mathcal{N}\left((x, y), \begin{pmatrix} \sigma & 0 \\ 0 & \sigma \end{pmatrix}\right)$
• Averaged coordinate weighted by the heatmap
• $(x, y)$ is differentiable with respect to the heatmap
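The heatmap-weighted average can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes a non-negative heatmap stored as a plain list of rows and measures coordinates in pixel indices (x over columns, y over rows). The function name `heatmap_to_coordinate` is hypothetical.

```python
def heatmap_to_coordinate(heatmap):
    """Return the heatmap-weighted mean (x, y) of a non-negative 2D heatmap.

    Assumption: `heatmap` is a list of rows; x indexes columns, y indexes rows.
    """
    total = sum(sum(row) for row in heatmap)
    if total <= 0:
        raise ValueError("heatmap must have positive total mass")
    # Each pixel contributes its column (or row) index, weighted by its value.
    x = sum(w * col for row in heatmap for col, w in enumerate(row)) / total
    y = sum(w * r for r, row in enumerate(heatmap) for w in row) / total
    return x, y


# Two equally weighted pixels in the last column, at rows 1 and 2.
example = [
    [0, 0, 0],
    [0, 0, 1],
    [0, 0, 1],
]
print(heatmap_to_coordinate(example))
```

Because the result is a ratio of two sums that are linear in the heatmap values, it is differentiable with respect to the heatmap wherever the total mass is positive.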

Landmark discovery
$(x_1, y_1), (x_2, y_2), \ldots, (x_K, y_K)$: can be arbitrary, without physical meanings
• The neural network can be used to output landmark coordinates.
• However, without additional training objectives, the landmark coordinates can be arbitrary latent features.
3 desirable properties for a landmark detector

Property 1: Concentration of heatmap values
[Figure: original heatmap vs. Gaussian heatmap, at an earlier and a later training stage]
For a detector, the output heatmap should concentrate in a local region.
• Encourage the Gaussian variance to be small.
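One way to picture this property is to measure how spread out a heatmap is around its weighted mean. The sketch below is an assumption-laden illustration: the slide does not give the exact penalty, so the sum of the two variances is used here as a stand-in, and the names `heatmap_variances` and `concentration_penalty` are hypothetical.

```python
def heatmap_variances(heatmap):
    """Return the heatmap-weighted variances (var_x, var_y) in pixel units."""
    total = sum(sum(row) for row in heatmap)
    if total <= 0:
        raise ValueError("heatmap must have positive total mass")
    mean_x = sum(w * c for row in heatmap for c, w in enumerate(row)) / total
    mean_y = sum(w * r for r, row in enumerate(heatmap) for w in row) / total
    var_x = sum(w * (c - mean_x) ** 2
                for row in heatmap for c, w in enumerate(row)) / total
    var_y = sum(w * (r - mean_y) ** 2
                for r, row in enumerate(heatmap) for w in row) / total
    return var_x, var_y


def concentration_penalty(heatmap):
    """Stand-in penalty (assumption): the sum of the two variances."""
    var_x, var_y = heatmap_variances(heatmap)
    return var_x + var_y


peaked = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
spread = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
print(concentration_penalty(peaked), concentration_penalty(spread))
```

A heatmap with all its mass on one pixel gets zero penalty, while a uniform heatmap gets a larger one, so minimizing the penalty pushes the detector toward localized responses.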

Property 2: Separation of landmarks
• Different landmarks should cover different visual semantics.
• Penalize if the pairwise distances among landmarks are too small.
$$L_{\mathrm{sep}} = \sum_{k \neq k'}^{1,\ldots,K} \exp\left( -\frac{\left\| (x_{k'}, y_{k'}) - (x_k, y_k) \right\|_2^2}{2\sigma_{\mathrm{sep}}^2} \right)$$
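The separation loss above can be evaluated directly from a list of landmark coordinates. This is a minimal sketch that sums over all ordered pairs with k ≠ k', as the formula is written; the function name `separation_loss` and the list-of-tuples input are illustrative assumptions.

```python
import math


def separation_loss(landmarks, sigma_sep):
    """Sum exp(-||p_k' - p_k||^2 / (2 * sigma_sep^2)) over ordered pairs k != k'."""
    loss = 0.0
    for k, (xk, yk) in enumerate(landmarks):
        for k2, (xk2, yk2) in enumerate(landmarks):
            if k == k2:
                continue
            sq_dist = (xk2 - xk) ** 2 + (yk2 - yk) ** 2
            loss += math.exp(-sq_dist / (2.0 * sigma_sep ** 2))
    return loss


# Coincident landmarks are penalized most; distant ones contribute almost nothing.
print(separation_loss([(0.0, 0.0), (0.0, 0.0)], sigma_sep=1.0))
print(separation_loss([(0.0, 0.0), (3.0, 4.0)], sigma_sep=1.0))
```

Each pair contributes at most 1 (when the two landmarks coincide) and decays toward 0 as they move apart, with `sigma_sep` setting the distance scale at which the penalty fades.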

Property 3: Equivariance
• For a transformation $g$ that does not change local visual semantics, the landmarks on the two images should satisfy the same transformation $g$.
$$L_{\mathrm{eqv}} = \sum_{k=1}^{K} \left\| g(x'_k, y'_k) - (x_k, y_k) \right\|_2^2$$
Here $(x_k, y_k)$ and $(x'_k, y'_k)$ denote the $k$-th landmark detected on the two images.
• Equivariance for landmark discovery has been explored by Thewlis et al., 2017.
• Ours is formulated directly on the landmark coordinates.
(Thewlis et al., 2017) James Thewlis, Hakan Bilen, and Andrea Vedaldi, “Unsupervised learning of object landmarks by factorized spatial embeddings,” in ICCV, 2017.
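The equivariance loss can be sketched as a sum of squared coordinate differences. The example assumes that `g` is a callable mapping a point on the second image into the coordinate frame of the first; the slide does not spell out the direction, so that choice, like the name `equivariance_loss`, is an assumption.

```python
def equivariance_loss(landmarks, landmarks_other, g):
    """Sum over k of ||g(x'_k, y'_k) - (x_k, y_k)||^2.

    Assumption: g maps points of the second image into the first image's frame.
    """
    if len(landmarks) != len(landmarks_other):
        raise ValueError("both images must have the same number of landmarks")
    loss = 0.0
    for (x, y), (x_other, y_other) in zip(landmarks, landmarks_other):
        gx, gy = g(x_other, y_other)
        loss += (gx - x) ** 2 + (gy - y) ** 2
    return loss


# Toy case: the second image is the first one shifted by (+2, +1),
# so g undoes that shift.
def undo_shift(x, y):
    return x - 2.0, y - 1.0


first = [(1.0, 1.0), (3.0, 2.0)]
consistent = [(3.0, 2.0), (5.0, 3.0)]
inconsistent = [(3.0, 3.0), (5.0, 3.0)]
print(equivariance_loss(first, consistent, undo_shift))
print(equivariance_loss(first, inconsistent, undo_shift))
```

The loss is zero when the detected landmarks move exactly with the transformation and grows with the squared deviation otherwise.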

Property 3: Equivariance – the transformation
• Random thin-plate-spline (TPS) to synthesize the transformation $g$
  • Global affine: translation, scaling, rotation
  • Local TPS
• For videos, also use the optical flows as the transformation $g$
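As a rough illustration of the global part only, the sketch below samples a random translation, scaling, and rotation and returns it as a coordinate map. The local TPS warp is deliberately omitted, and the parameter ranges and the name `random_global_affine` are arbitrary assumptions rather than values from the slides.

```python
import math
import random


def random_global_affine(rng, max_shift=0.1, max_log_scale=0.1,
                         max_angle=math.pi / 12):
    """Sample a random translation + uniform scaling + rotation.

    All ranges are illustrative assumptions. Returns a callable g(x, y).
    """
    tx = rng.uniform(-max_shift, max_shift)
    ty = rng.uniform(-max_shift, max_shift)
    scale = math.exp(rng.uniform(-max_log_scale, max_log_scale))
    theta = rng.uniform(-max_angle, max_angle)
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    def g(x, y):
        # Rotate, scale uniformly, then translate.
        return (scale * (cos_t * x - sin_t * y) + tx,
                scale * (sin_t * x + cos_t * y) + ty)

    return g


g = random_global_affine(random.Random(0))
print(g(0.0, 0.0), g(1.0, 0.0), g(0.0, 1.0))
```

A callable of this kind can be applied both to image sampling grids and to landmark coordinates, which is what an equivariance constraint needs.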

Overview of our neural network architecture
[Diagram: input image → landmark coordinates and latent features → reconstructed image]
Landmark-based extraction of latent features
• Weighted average-pooling with differentiable pooling masks

Landmark-based feature extraction
[Figure: Gaussian heatmap over an H × W × #channels feature map → weighted global average pooling → #channels feature vector]
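Weighted global average pooling with a Gaussian mask can be sketched as follows. This is an illustration under stated assumptions: feature maps are plain nested lists (channels × H × W), the mask is normalized by its own sum, and `sigma` is treated as a standard deviation in pixels. The names `gaussian_mask` and `weighted_global_average_pool` are hypothetical.

```python
import math


def gaussian_mask(height, width, x, y, sigma):
    """Isotropic Gaussian mask centered at (x, y); sigma as std is an assumption."""
    return [[math.exp(-((c - x) ** 2 + (r - y) ** 2) / (2.0 * sigma ** 2))
             for c in range(width)]
            for r in range(height)]


def weighted_global_average_pool(features, mask):
    """Pool each H x W channel into one value using `mask` as the weights."""
    total = sum(sum(row) for row in mask)
    if total <= 0:
        raise ValueError("mask must have positive total weight")
    pooled = []
    for channel in features:
        acc = sum(f * m
                  for f_row, m_row in zip(channel, mask)
                  for f, m in zip(f_row, m_row))
        pooled.append(acc / total)
    return pooled


# Two channels on a 2 x 2 grid; the mask keeps only the right-hand column.
features = [[[1, 2], [3, 4]], [[10, 20], [30, 40]]]
print(weighted_global_average_pool(features, [[0, 1], [0, 1]]))
```

Since the mask is a smooth function of the landmark coordinate, the pooled feature vector stays differentiable with respect to that coordinate, which is the point of using a soft mask rather than a hard crop.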
