GRASS: Generative Recursive Autoencoders for Shape Structures
Jun Li (NUDT), Kai Xu (NUDT, Shenzhen University, Shandong University), Siddhartha Chaudhuri (IIT Bombay), Ersin Yumer (Adobe Research), Hao (Richard) Zhang (Simon Fraser University), Leonidas Guibas (Stanford University)
Shapes have different topologies
Shapes have different geometries Ovsjanikov et al. 2011
Shapes have hierarchical compositionality Wang et al. 2011
Motivating Question
How can we capture
- topological variation
- geometric variation
- hierarchical composition
in a single, generative, fixed-dimensional representation?
"Shape DNA": encode and generate
Prior generative shape representations:
- Sequences of commands to Maya/AutoCAD
- Posed template [Anguelov05]
- Deformable template [Allen03]
- Parametrized procedure [Weber95]
- Probabilistic procedure [Talton09]
- Probabilistic grammar [Müller06]
- Learned grammar, single exemplar [Bokeloh10]
- Learned grammar, multi-exemplar [Talton12]
Structural PGM vs. Volumetric DNN

Structural probabilistic graphical model, strongly supervised [Kalogerakis et al. '12]
- Pros: direct model of compositional structure, (relatively) low-dimensional, high-quality output
- Cons: limited topological variation, no continuous geometric variation (for generation), no hierarchy, huge effort to segment & label training data

Volumetric deep neural network, unsupervised [Wu et al. '15]
- Pros: arbitrary geometry/topology, unsupervised
- Cons: low-resolution, no explicit separation of structure vs. fine geometry, no guarantee of symmetry/adjacency, no hierarchy, lots of parameters, lots of training data

Can we get the best of both? → GRASS
GRASS: Generative neural networks over unlabeled part layouts
GRASS factorizes a shape into a hierarchical layout of simplified parts, plus fine-grained part geometries.
Weakly supervised: requires pre-segmented shapes, but no part labels and no manually-specified "ground truth" hierarchies.
Structure-aware: learns a generative distribution over richly informative structures.
Three Challenges
• Challenge 1: Ingest and generate arbitrary part layouts with a fixed-dimensional network (convolution doesn't work over arbitrary graphs)
• Challenge 2: Map a layout invertibly to a fixed-D code ("Shape DNA") that implicitly captures adjacency, symmetry and hierarchy
• Challenge 3: Map layout features to fine geometry
Huge variety of (attributed) graphs
Arbitrary numbers/types of vertices (parts), arbitrary numbers of connections (adjacencies/symmetries).
For linear graphs (chains) of arbitrary length, we can use a recurrent neural network (RNN/LSTM).
[Images: Li et al. 2008, Wikipedia]
Key Insight • Edges of a graph can be collapsed sequentially to yield a hierarchical structure • Looks like a parse tree for a sentence! • … and there are unsupervised sentence parsers
Recursive Neural Network (RvNN)
Repeatedly merge two nodes into one. Each node has an n-D feature vector, computed recursively:
p = f(W [c1; c2] + b)
[Socher et al. 2011]
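A minimal sketch of this merge operation, assuming PyTorch; the code dimension and nonlinearity are illustrative, not the paper's exact choices:

```python
import torch
import torch.nn as nn

N = 80  # illustrative code dimension; assumed, not the paper's value

class MergeEncoder(nn.Module):
    """Merge two n-D child codes c1, c2 into one n-D parent code: p = f(W [c1; c2] + b)."""
    def __init__(self, n=N):
        super().__init__()
        self.linear = nn.Linear(2 * n, n)

    def forward(self, c1, c2):
        return torch.tanh(self.linear(torch.cat([c1, c2], dim=-1)))
```

Because the parent code has the same dimension as each child, the merge can be applied recursively up a tree until a single fixed-dimensional root code remains.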
Different types of merges, varying cardinalities!
Adjacency | Translational symmetry | Rotational symmetry | Reflectional symmetry
• How to encode them to the same code space?
• How to decode them appropriately, given just a code?
Recursively merging parts
[Figure: bottom-up merging of part codes. Adjacent parts are merged by an adjacency encoder f_a(x1, x2); reflective symmetry groups are merged by a symmetry encoder f_s(x, p), where p holds the symmetry parameters.]
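A hedged sketch of the two merge encoders in the figure: an adjacency encoder over two part codes, and a symmetry encoder over a part code plus a symmetry-parameter vector. The dimensions and the symmetry parameterization are assumptions for illustration, not the paper's values:

```python
import torch
import torch.nn as nn

CODE_DIM = 80   # assumed code dimension
SYM_DIM = 8     # assumed size of the symmetry-parameter vector p

class AdjEncoder(nn.Module):
    """Adjacency merge: y = f_a(x1, x2)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(2 * CODE_DIM, CODE_DIM)

    def forward(self, x1, x2):
        return torch.tanh(self.net(torch.cat([x1, x2], dim=-1)))

class SymEncoder(nn.Module):
    """Symmetry merge: y = f_s(x, p), where p holds the symmetry parameters."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(CODE_DIM + SYM_DIM, CODE_DIM)

    def forward(self, x, p):
        return torch.tanh(self.net(torch.cat([x, p], dim=-1)))
```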
Recursively merging parts
[Figure: bottom-up merging continues until a single root code remains; symmetry encoders consume a symmetry generator and its symmetry parameters, adjacency encoders merge pairs of codes.]
How to determine the merge order?
Training with reconstruction loss
[Figure: box structure X → RvNN encoder → n-D root code → RvNN decoder → X']
L = ||X − X'||²
• Learn weights from a variety of randomly sampled merge orders for each box structure
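A rough sketch of this training idea; sample_merge_order, encode_with_order and decode_to_boxes are hypothetical helpers standing in for the full RvNN encoder/decoder, and the box structure is treated as a tensor for brevity:

```python
# Sketch only: the three helpers below are hypothetical placeholders.
def training_step(boxes, encoder, decoder, optimizer, num_orders=4):
    total_loss = 0.0
    for _ in range(num_orders):
        order = sample_merge_order(boxes)                   # random hierarchy over the parts
        root = encode_with_order(encoder, boxes, order)     # bottom-up RvNN encoding
        boxes_rec = decode_to_boxes(decoder, root, order)   # mirror-image decoding
        total_loss = total_loss + ((boxes - boxes_rec) ** 2).sum()  # L = ||X - X'||^2
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
    return total_loss.item()
```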
In testing
• Encoding: Given a box structure, choose the merge order as the hierarchy that gives the lowest reconstruction error.
[Figure: RvNN encoder → RvNN decoder round trip]
Inferring the symmetry hierarchy via reconstruction loss
[Figure: a good hierarchy yields low reconstruction loss; a poor one yields high reconstruction loss]
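The test-time criterion on these slides can be sketched as follows, reusing the same hypothetical helpers as above: try candidate hierarchies and keep the one whose encode-decode round trip reconstructs the boxes best.

```python
def infer_hierarchy(boxes, encoder, decoder, candidate_orders):
    """Pick the merge order whose round-trip reconstruction error is smallest (sketch)."""
    best_order, best_err = None, float("inf")
    for order in candidate_orders:
        root = encode_with_order(encoder, boxes, order)      # hypothetical helper
        boxes_rec = decode_to_boxes(decoder, root, order)    # hypothetical helper
        err = ((boxes - boxes_rec) ** 2).sum().item()
        if err < best_err:
            best_order, best_err = order, err
    return best_order, best_err
```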
In testing
• Decoding: Given an arbitrary code, how do we generate the corresponding structure?
[Figure: some code → RvNN decoder → box structure?]
How do we know which type of decoder to apply at each node: adjacency or symmetry?
→ Node classifier
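A minimal sketch of how a node classifier could drive recursive decoding, as the slide suggests: at each node the classifier picks which decoder to apply (leaf box, adjacency split, or symmetry expansion). The three-way split, module names, and the replicate_by_symmetry helper are assumptions for illustration:

```python
import torch
import torch.nn as nn

class NodeClassifier(nn.Module):
    """Predicts how a code should be decoded: leaf box, adjacency split, or symmetry group."""
    def __init__(self, code_dim=80, num_types=3):
        super().__init__()
        self.net = nn.Linear(code_dim, num_types)

    def forward(self, code):
        return self.net(code)  # logits over {LEAF, ADJACENCY, SYMMETRY}

def decode_node(code, classifier, box_decoder, adj_decoder, sym_decoder):
    node_type = classifier(code).argmax(dim=-1).item()
    if node_type == 0:                       # LEAF: emit a box
        return [box_decoder(code)]
    if node_type == 1:                       # ADJACENCY: split into two child codes
        c1, c2 = adj_decoder(code)
        return (decode_node(c1, classifier, box_decoder, adj_decoder, sym_decoder) +
                decode_node(c2, classifier, box_decoder, adj_decoder, sym_decoder))
    child, sym_params = sym_decoder(code)    # SYMMETRY: child code + symmetry parameters
    boxes = decode_node(child, classifier, box_decoder, adj_decoder, sym_decoder)
    return replicate_by_symmetry(boxes, sym_params)  # hypothetical helper
```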
Making the network generative
• Variational Auto-Encoder (VAE): learn a distribution that approximates the true data distribution of 3D structures, P(X) ≈ P_data(X)
• Marginalize over a latent "DNA" code z: maximize the likelihood P(X; θ) = ∫ P(X | z; θ) P(z) dz with respect to the parameters θ
Variational Bayes formulation
maximize  E_{z ~ Q(z|X)} [ log P(X | z) ] − D_KL( Q(z|X) ‖ P(z) )
• First term: z should reconstruct X, given that z was drawn from Q(z|X)
• Second term: assumes the z's follow a normal distribution
Variational Autoencoder (VAE)
maximize  E_{z ~ Q(z|X)} [ log P(X | z) ]  −  D_KL( Q(z|X) ‖ P(z) )
          (reconstruction loss)               (KL divergence loss)
[Figure: X → Encoder Q(z|X) → z → Decoder P(X|z) → X' = f(z; θ), with L = ||X − X'||²]
Variational Autoencoder (VAE)
[Figure: the RvNN encoder produces Enc(x); two layers f_μ and f_σ map it to μ and σ; a code z_s ~ N(μ, σ) is sampled and passed to the decoder.]
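A compact sketch tying together the objective from the previous slide and the sampling in this figure, assuming f_μ and f_σ predict the mean and log-variance, a unit-Gaussian prior, and that enc/dec stand for the RvNN encoder and decoder:

```python
import torch

def vae_loss(x, enc, dec, f_mu, f_sigma):
    """VAE objective: reconstruction loss plus KL divergence to a unit-Gaussian prior."""
    h = enc(x)                                   # RvNN root feature Enc(x)
    mu, log_var = f_mu(h), f_sigma(h)            # f_sigma assumed to output log-variance
    z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)   # z_s ~ N(mu, sigma)
    x_rec = dec(z)                               # X' = f(z; theta)
    rec = ((x - x_rec) ** 2).sum()               # reconstruction loss ||X - X'||^2
    kl = -0.5 * torch.sum(1 + log_var - mu ** 2 - log_var.exp())  # KL(Q(z|X) || N(0, I))
    return rec + kl
```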
Sampling near μ is robust
[Figure: codes z_s ~ N(μ, σ) drawn near the posterior mean (μ, σ) decode reliably.]
Sampling far away from μ?
[Figure: prior samples z_p ~ p(z) may fall far from any training code (μ, σ).]
Adversarial training: VAE-GAN
[Figure: the VAE branch encodes x to Enc(x), maps it to μ and σ via f_μ and f_σ, and samples z_s ~ N(μ, σ); the GAN branch draws z_p ~ p(z), generates a structure G(z), and a discriminator compares it against real box structures.]
• Reuse of modules!
• VAE decoder ↔ GAN generator
• VAE encoder ↔ GAN discriminator
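A hedged sketch of the adversarial part: the decoder doubles as the GAN generator, mapping prior samples z_p ~ p(z) to box structures, while a discriminator built on the RvNN encoder tries to separate generated structures from real ones. The module interfaces, losses, and batch handling here are assumptions, not the paper's exact training scheme:

```python
import torch

def adversarial_step(real_boxes, decoder, discriminator, opt_d, opt_g, code_dim=80):
    bce = torch.nn.functional.binary_cross_entropy_with_logits
    n = real_boxes.size(0)

    # Discriminator: real box structures vs. structures decoded from prior samples
    z_p = torch.randn(n, code_dim)                          # z_p ~ p(z)
    fake_boxes = decoder(z_p).detach()
    d_loss = (bce(discriminator(real_boxes), torch.ones(n, 1)) +
              bce(discriminator(fake_boxes), torch.zeros(n, 1)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator (= VAE decoder): try to fool the discriminator
    fake_boxes = decoder(torch.randn(n, code_dim))
    g_loss = bce(discriminator(fake_boxes), torch.ones(n, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```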
Benefit of adversarial training
[Figure: generated results compared against a plain VAE]
Part geometry synthesis
[Figure: a 32-D part code, concatenated with other part features (the "concatenated part code"), is mapped to a 32×32×32 output part volume.]
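An illustrative sketch of a volumetric decoder for this stage: the concatenated part code is projected to a small 3D feature grid and upsampled with transposed 3D convolutions to a 32×32×32 occupancy volume. The input dimension, channel counts, and layer structure are assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class PartGeometryDecoder(nn.Module):
    """Maps a concatenated part code to a 32x32x32 voxel volume (sketch)."""
    def __init__(self, code_dim=32 + 80):   # assumed: 32-D part code + other part features
        super().__init__()
        self.fc = nn.Linear(code_dim, 256 * 4 * 4 * 4)
        self.deconv = nn.Sequential(
            nn.ConvTranspose3d(256, 128, 4, stride=2, padding=1), nn.ReLU(),  # 8^3
            nn.ConvTranspose3d(128, 64, 4, stride=2, padding=1), nn.ReLU(),   # 16^3
            nn.ConvTranspose3d(64, 1, 4, stride=2, padding=1),                # 32^3
        )

    def forward(self, code):
        h = self.fc(code).view(-1, 256, 4, 4, 4)
        return torch.sigmoid(self.deconv(h))  # per-voxel occupancy probabilities
```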
Results: Shape synthesis
Results: Inferring consistent hierarchies
Results: Shape retrieval
Results: Shape retrieval Concatenated part code
Results: Shape interpolation
[Figure labels: the rotational symmetry order changes along the interpolation sequences — 6-fold, 5-fold, 4-fold, 4-fold, 3-fold, 4-fold, 5-fold, 5-fold]
Results: Shape interpolation
Discussion
• What does our model learn?
  • Hierarchical organization of part structures
  • A reasonable way to generate 3D structure: part by part, bottom-up, hierarchically organized
• This is how a human modeler typically creates a 3D model (hierarchical scene graph)
Discussion
• A general guideline for 3D shape generation: coarse-to-fine
  • First generate the coarse structure
  • Then generate fine details
  • The two stages may employ different representations and models
Acknowledgements
• Anonymous reviewers
• Help with data preparation: Yifei Shi, Min Liu, Chengjie Niu and Yizhi Wang
• Research grants from NSFC, NSERC, NSF
• Google Focused Research Award
• Gifts from Adobe, Qualcomm and Vicarious
• Jun Li is a visiting PhD student at the University of Bonn, supported by the CSC
Thank you! Code & data available at www.kevinkaixu.net