

  1. GRASS: Generative Recursive Autoencoders for Shape Structures. Jun Li, Kai Xu, Siddhartha Chaudhuri, Ersin Yumer, Hao (Richard) Zhang, Leonidas Guibas. NUDT, Shenzhen University, IIT Bombay, Shandong University, Adobe Research, Simon Fraser University, Stanford University.

  2. Shapes have different topologies ?

  3. Shapes have different geometries Ovsjanikov et al. 2011

  4. Shapes have hierarchical compositionality Wang et al. 2011

  5. Motivating Question: How can we capture topological variation, geometric variation, and hierarchical composition in a single, generative, fixed-dimensional representation (a “Shape DNA”) that shapes can be encoded into and generated from?

  6. Sequences of commands to Maya/AutoCAD; posed template [Anguelov05]; deformable template [Allen03]; parametrized procedure [Weber95]; probabilistic procedure [Talton09]; learned grammar from a single exemplar [Bokeloh10]; learned grammar from multiple exemplars [Talton12]; probabilistic grammar [Müller06]

  7. Structural PGM vs Volumetric DNN. Structural PGM, strongly supervised [Kalogerakis et al. ’12]. Pros: direct model of compositional structure, (relatively) low-dimensional, high-quality output. Cons: limited topological variation, no continuous geometric variation (for generation), no hierarchy, huge effort to segment and label training data. Volumetric DNN, unsupervised [Wu et al. ’15]. Pros: arbitrary geometry/topology, unsupervised. Cons: low resolution, no explicit separation of structure vs. fine geometry, no guarantee of symmetry/adjacency, no hierarchy, lots of parameters, lots of training data.

  8. Structural PGM vs Volumetric DNN: which approach gives us the best of both?

  9. Structural PGM vs Volumetric DNN: GRASS aims to give the best of both.

  10. GRASS: Generative neural networks over unlabeled part layouts. GRASS factorizes a shape into a hierarchical layout of simplified parts, plus fine-grained part geometries. Weakly supervised: requires pre-segmented parts, but no part labels and no manually-specified “ground truth” hierarchies. Structure-aware: learns a generative distribution over richly informative structures.

  11. Three Challenges. Challenge 1: Ingest and generate arbitrary part layouts with a fixed-dimensional network (convolution doesn’t work over arbitrary graphs). Challenge 2: Map a layout invertibly to a fixed-D code (“Shape DNA”) that implicitly captures adjacency, symmetry and hierarchy. Challenge 3: Map layout features to fine geometry.

  12. Huge variety of (attributed) graphs: arbitrary numbers and types of vertices (parts), arbitrary numbers of connections (adjacencies/symmetries). For linear graphs (chains) of arbitrary length, we can use a recurrent neural network (RNN/LSTM). Li et al. 2008, Wikipedia

  13. Key Insight • Edges of a graph can be collapsed sequentially to yield a hierarchical structure • Looks like a parse tree for a sentence! • … and there are unsupervised sentence parsers

  14. Recursive Neural Network (RvNN): repeatedly merge two nodes into one. Each node has an n-D feature vector, computed recursively as p = f(W [c1; c2] + b). Socher et al. 2011
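
The merge rule above is a single learned layer applied to the concatenation of two child codes. A minimal PyTorch sketch (the module name, code dimension and tanh nonlinearity are illustrative assumptions, not the authors' released code):

```python
import torch
import torch.nn as nn

class MergeEncoder(nn.Module):
    """Merge two n-D child codes c1, c2 into one n-D parent code p = f(W[c1; c2] + b)."""
    def __init__(self, code_dim: int = 80):
        super().__init__()
        self.linear = nn.Linear(2 * code_dim, code_dim)  # W and b over the concatenation

    def forward(self, c1: torch.Tensor, c2: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.linear(torch.cat([c1, c2], dim=-1)))

# Usage: the same module is applied recursively, so every node has a fixed-size code.
enc = MergeEncoder(code_dim=80)
c1, c2, c3 = (torch.randn(1, 80) for _ in range(3))
p12 = enc(c1, c2)    # internal node
root = enc(p12, c3)  # root code, same dimensionality as the leaves
```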

  15. Different types of merges, varying cardinalities: adjacency, translational symmetry, rotational symmetry, reflectional symmetry. How to encode them to the same code space? How to decode them appropriately, given just a code?

  16. Recursively merging parts: bottom-up merging with an adjacency encoder f_a(x1, x2) for pairs of adjacent parts and a symmetry encoder f_s(x, p) for symmetry groups (e.g. reflective symmetry, where p holds the symmetry parameters).

  17. Recursively merging parts: bottom-up merging with adjacency encoders f_a(x1, x2) and symmetry encoders f_s(x, p) (symmetry generator code x plus symmetry parameters p) eventually produces a single root code. How to determine the merge order?
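
A sketch of the two merge-encoder types as small tanh MLPs (an assumed architecture; the hidden width and the symmetry-parameter dimension are placeholders, not values from the paper):

```python
import torch
import torch.nn as nn

class AdjacencyEncoder(nn.Module):
    """f_a(x1, x2): merge two adjacent part codes into one parent code."""
    def __init__(self, code_dim: int = 80, hidden: int = 200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * code_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, code_dim), nn.Tanh())

    def forward(self, x1, x2):
        return self.net(torch.cat([x1, x2], dim=-1))

class SymmetryEncoder(nn.Module):
    """f_s(x, p): fold a symmetry group (generator part code x plus symmetry parameters p)."""
    def __init__(self, code_dim: int = 80, sym_dim: int = 8, hidden: int = 200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(code_dim + sym_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, code_dim), nn.Tanh())

    def forward(self, x, p):
        return self.net(torch.cat([x, p], dim=-1))
```

Both encoders map into the same fixed-size code space, so their outputs can be merged again by either encoder at the next level of the hierarchy.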

  18. Training with reconstruction loss: the RvNN encoder maps a box structure X to an n-D root code, the RvNN decoder maps the code back to X′, and the loss is L = ||X − X′||². Learn weights from a variety of randomly sampled merge orders for each box structure.
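
A sketch of that objective for one sampled merge order. The encoder and decoder are left as placeholders for the recursive modules sketched above; the function names are illustrative:

```python
import torch

def reconstruction_loss(leaf_codes, encode_fn, decode_fn):
    """L = ||X - X'||^2 over the leaf (box) features for one merge order.

    encode_fn: folds a list of leaf codes into a single root code (RvNN encoder)
    decode_fn: unfolds a root code back into the same number of leaves (RvNN decoder)
    Both are placeholders; in training, a random merge order is sampled per structure.
    """
    root = encode_fn(leaf_codes)                          # n-D root code
    recon = decode_fn(root, num_leaves=len(leaf_codes))   # list of reconstructed leaf codes
    return torch.sum((torch.stack(leaf_codes) - torch.stack(recon)) ** 2)
```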

  19. In testing. Encoding: given a box structure, determine the merge order as the hierarchy that gives the lowest reconstruction error through the RvNN encoder and decoder.

  20. Inferring symmetry hierarchies via reconstruction loss: a good hierarchy yields low reconstruction loss, a poor one yields high reconstruction loss.

  21. In testing. Encoding: given a box structure, determine the merge order as the hierarchy that gives the lowest reconstruction error. Decoding: given an arbitrary code, how do we generate the corresponding box structure with the RvNN decoder?

  22. How do we know what type of decoder to apply at each node, adjacency or symmetry? A node classifier predicts the node type from its code.
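
One way to realize this, sketched below with assumed module names: a small classifier looks at a node code and predicts whether it should be emitted as a leaf box, split by the adjacency decoder, or expanded by the symmetry decoder.

```python
import torch
import torch.nn as nn

class NodeClassifier(nn.Module):
    """Predicts how to expand a code: 0 = leaf box, 1 = adjacency split, 2 = symmetry group."""
    def __init__(self, code_dim: int = 80, hidden: int = 200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(code_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 3))

    def forward(self, code):
        return self.net(code)

def decode_structure(code, classifier, adj_dec, sym_dec, box_dec, depth=0, max_depth=10):
    """Recursively expand a code into boxes, guided by the node classifier.
    adj_dec, sym_dec and box_dec are placeholder decoder modules."""
    label = classifier(code).argmax(dim=-1).item()
    if label == 0 or depth >= max_depth:
        return [box_dec(code)]                            # leaf: emit a box
    if label == 1:
        c1, c2 = adj_dec(code)                            # adjacency: split into two children
        return (decode_structure(c1, classifier, adj_dec, sym_dec, box_dec, depth + 1, max_depth)
                + decode_structure(c2, classifier, adj_dec, sym_dec, box_dec, depth + 1, max_depth))
    child, params = sym_dec(code)                         # symmetry: generator code + parameters
    # Applying the symmetry parameters to replicate the generator part is left out of this sketch.
    return [(b, params) for b in
            decode_structure(child, classifier, adj_dec, sym_dec, box_dec, depth + 1, max_depth)]
```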

  23. Making the network generative. Variational Auto-Encoder (VAE): learn a distribution that approximates the data distribution of true 3D structures, p(X) ≈ p_gt(X). Marginalize over a latent “DNA” code z: maximize the likelihood Σ_i log p_θ(X_i), where p_θ(X) = ∫ p_θ(X | z) p(z) dz and θ are the model parameters.

  24. Variational Bayes formulation: maximize the lower bound E_{z ~ q(z|X)}[log p(X | z)] − KL(q(z | X) || p(z)). The code z should reconstruct X, given that it was drawn from q(z | X), while assuming the z's follow a normal distribution.

  25. Variational Autoencoder (VAE): maximize the sum of a reconstruction term and a KL divergence term L_KL. The encoder q(z | X) maps X to a code z; the decoder p(X | z) produces X′ = f(z; θ); the reconstruction loss is L = ||X − X′||².
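
In code, the objective pairs the reconstruction term with the closed-form KL term of the standard VAE; a generic sketch, not the authors' implementation:

```python
import torch

def vae_loss(x, x_recon, mu, logvar, kl_weight: float = 1.0):
    """Reconstruction loss + KL(q(z|x) || N(0, I)), the standard VAE objective."""
    recon = torch.sum((x - x_recon) ** 2)
    # Closed-form KL divergence between N(mu, sigma^2) and the unit Gaussian prior.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl_weight * kl
```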

  26. Variational Autoencoder (VAE): the encoder feature Enc(x) is mapped by two heads f_μ and f_σ to (μ, σ); a latent sample z_s ~ N(μ, σ) is drawn and passed to the decoder.
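
The sampling step is the usual reparameterization trick; a sketch with an assumed module name and illustrative dimensions:

```python
import torch
import torch.nn as nn

class GaussianHead(nn.Module):
    """Maps an encoder feature to (mu, logvar) and samples z = mu + sigma * eps."""
    def __init__(self, feat_dim: int = 80, z_dim: int = 80):
        super().__init__()
        self.to_mu = nn.Linear(feat_dim, z_dim)
        self.to_logvar = nn.Linear(feat_dim, z_dim)

    def forward(self, h):
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        eps = torch.randn_like(mu)                   # reparameterization trick
        z = mu + torch.exp(0.5 * logvar) * eps       # differentiable sample z_s ~ N(mu, sigma)
        return z, mu, logvar
```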

  27. Sampling z_s ~ N(μ, σ) near μ is robust.

  28. What about sampling far away from μ, e.g. a prior sample z_p ~ p(z)?

  29. Adversarial training: VAE-GAN. Sample z_p ~ p(z), decode it with the generator, and let a discriminator tell generated box structures apart from real ones. Reuse of modules: the VAE decoder becomes the GAN generator and the VAE encoder becomes the GAN discriminator.
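
A sketch of one adversarial training step under those assumptions (the generator is the reused RvNN decoder, the discriminator an RvNN-encoder-like network ending in a single real/fake logit; the function and argument names are illustrative):

```python
import torch
import torch.nn.functional as F

def adversarial_step(real_structs, z_prior, generator, discriminator, g_opt, d_opt):
    """One GAN step over a batch of real box structures and prior samples z_prior."""
    # Discriminator update: real structures vs. structures decoded from the prior.
    fake_structs = generator(z_prior).detach()
    d_loss = (F.binary_cross_entropy_with_logits(discriminator(real_structs),
                                                 torch.ones(len(real_structs), 1)) +
              F.binary_cross_entropy_with_logits(discriminator(fake_structs),
                                                 torch.zeros(len(fake_structs), 1)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generator update: try to fool the discriminator.
    fake_structs = generator(z_prior)
    g_loss = F.binary_cross_entropy_with_logits(discriminator(fake_structs),
                                                torch.ones(len(z_prior), 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()
```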

  30. Benefit of adversarial training (comparison figure: VAE samples vs. adversarially trained samples)

  31. Part geometry synthesis: map a concatenated 32-D part code to a 32×32×32 output part volume.
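
A sketch of such a part-geometry decoder built from 3D transposed convolutions; the layer widths and the sigmoid occupancy output are illustrative assumptions, only the 32-D input and 32³ output follow the slide:

```python
import torch
import torch.nn as nn

class PartGeometryDecoder(nn.Module):
    """Maps a 32-D part code to a 32x32x32 occupancy volume (layer sizes illustrative)."""
    def __init__(self, code_dim: int = 32):
        super().__init__()
        self.fc = nn.Linear(code_dim, 256 * 4 * 4 * 4)   # seed a 4^3 feature grid
        self.deconv = nn.Sequential(
            nn.ConvTranspose3d(256, 128, kernel_size=4, stride=2, padding=1), nn.ReLU(),  # 8^3
            nn.ConvTranspose3d(128, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(),   # 16^3
            nn.ConvTranspose3d(64, 1, kernel_size=4, stride=2, padding=1), nn.Sigmoid())  # 32^3

    def forward(self, code):
        h = self.fc(code).view(-1, 256, 4, 4, 4)
        return self.deconv(h)

# vol = PartGeometryDecoder()(torch.randn(1, 32))  # -> (1, 1, 32, 32, 32)
```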

  32. Results: Shape synthesis

  33. Results: Inferring consistent hierarchies

  34. Results: Shape retrieval

  35. Results: Shape retrieval using the concatenated part code as the descriptor

  36. Results: Shape interpolation (rotational symmetry order varies along the sequence: 6-fold, 5-fold, 4-fold, 4-fold, 3-fold, 4-fold, 5-fold, 5-fold)

  37. Results: Shape interpolation

  38. Discussion. What does our model learn? A hierarchical organization of part structures, and a reasonable way to generate 3D structure: part by part, bottom-up, with hierarchical organization. This is the usual way a human modeler creates a 3D model, e.g. via a hierarchical scene graph.

  39. Discussion. A general guideline for 3D shape generation: coarse-to-fine. First generate the coarse structure, then generate the fine details; the two stages may employ different representations and models.

  40. Acknowledgements: anonymous reviewers; help on data preparation from Yifei Shi, Min Liu, Chengjie Niu and Yizhi Wang; research grants from NSFC, NSERC and NSF; a Google Focused Research Award; gifts from Adobe, Qualcomm and Vicarious. Jun Li is a visiting PhD student at the University of Bonn, supported by the CSC.

  41. Thank you! Code & data available at www.kevinkaixu.net
