Synthesizing 3D Shapes via Modeling Multi-View Depth Maps and Silhouettes with Deep Generative Networks
Amir A. Soltani, Haibin Huang, Jiajun Wu, Tejas Kulkarni, Josh Tenenbaum
07/21/2017
[Figure panels: Samples; Out-of-Sample Generalization]

Motivation - Autonomous Vehicles

Motivation - Robotics

Motivation
● Computer vision cannot simply rely on 2D data to solve 3D problems
● We need good 3D representations to solve inverse problems
● A generative model for 3D is a good starting point (a lot more is needed, though)
● Good progress has been made in the past 2 or 3 years
● Still, the choice of 3D representation is being debated
● Each representation has advantages and disadvantages
● So far there is no agreement on which representation to use

Choice of Representation
● Voxels
● Multi-view
● Meshes
● Point clouds
● Template-based

3D Representation - Voxels
● Computational complexity is very high, O(n³) in the grid resolution n, if used naively
● Cannot Model High-Res Shapes
● Details can easily get lost
● Highly sparse at higher resolutions
● Cannot model regular structures easily
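To make the cubic growth concrete, here is a small arithmetic sketch comparing the number of cells in a dense voxel grid with the number of pixels in a multi-view stack. The 20 views come from the data set slide; the 224 x 224 view size is an assumption borrowed from the canvas size mentioned in the out-of-sample slide, and the function names are illustrative only.

```python
def voxel_cells(resolution: int) -> int:
    """Cells in a dense cubic voxel grid: grows as resolution ** 3."""
    return resolution ** 3


def multiview_pixels(num_views: int, height: int, width: int) -> int:
    """Pixels in a stack of depth maps: grows as resolution ** 2 per view."""
    return num_views * height * width


# Assumed view size (224 x 224); 20 views as stated in the deck.
stack = multiview_pixels(20, 224, 224)

for res in (32, 64, 128, 256, 512):
    cells = voxel_cells(res)
    print(f"{res:>3}^3 voxels = {cells:>11,} cells "
          f"({cells / stack:6.2f}x the 20-view stack)")
```

Under these assumptions the 20-view stack holds about one million values, which sits between a 64³ and a 128³ dense grid, while each doubling of voxel resolution multiplies the cell count by eight.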

3D Representation - Voxels
● Directly predicting high-res voxel-based outputs is very hard
● Highest so far is 64 x 64 x 64
● One model per object category
Wu et al., NIPS 2016

3D Representation - Point Clouds
● Things start to get mathematically involved from here
● The choice of loss function, non-differentiability issues, etc.
● Not obvious how many points to have
● Details Will Be Missing
● Not a lot of work done using point clouds so far
Image courtesy: Hao Su

3D Representation - Point Clouds Su et al, CVPR 2017

3D Representation - Meshes
● Cannot directly apply out-of-the-box models to meshes
● Need to Construct Special Kinds of Kernels for CNNs
● Mathematically Involved
● Can be seen as a graph as well
Image courtesy: Hao Su

3D Representation - Template-Based (CAD)
● Again, Cannot Easily Apply Out-of-the-Box Models to Templates
● Data Is Very Hard to Obtain
● Hard to Model Shapes Never Seen Before
● Offers Compositionality Intrinsically and Explicitly
● Might Be a Good Option for Learning Functionalities
Image courtesy: Haibin Huang

3D Representation - Multi-View
● Multi-view representation is very lightweight
● Offers Flexibility (Depth Maps) and Eases the Computation Significantly
● Although 2D, Still Explicitly Models 3D Shapes
● Allows Generating Hi-Res, Detailed, Novel Objects
● Without the machinery required for new voxel-based models
● Can easily apply out-of-the-box CNN models to it
● Not Mathematically Involved
● More Intuitive

Motivations
● Synthesize/Generate Hi-Res, Detailed and Novel Shapes
● Use a Representation Whose Data Is Easily Obtainable
● 2D Images, RGB-D, or Depth-Only Data Are Very Easy to Obtain
● Have Out-of-Sample Generalizability
● A Step Forward Towards Obtaining 3D Concepts Efficiently to Solve Inverse Vision Problems
● Model 3D via 2D (inspired by biological vision)
● Share the Same Representations For All Categories

Pipeline - Data Set
● Used ShapeNet Core
● Contains Aligned, Normalized Shapes
● ~37k for train, ~3k for test
● Render 20 views of depth maps
● Camera Positions Fixed

Pipeline - Architectures
● Train 3 Different VAE Models
● AllVPNet: Train with All 20 Views
● DropoutNet: Train with 2-5 Randomly Chosen Views
● SingleVPNet: Train with 1 Randomly Chosen View
● Z Layer Has 100 Nodes for Unconditional and 40 for Conditional
● L1 Loss Function Is Used During Training
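The three training regimes and the loss can be sketched as follows. This is a minimal illustration, not the authors' code: the slides only state the number of views per model and that an L1 loss is used. The KL term is the standard VAE regularizer toward a unit Gaussian, and `kl_weight`, the function names, and the flat-list data layout are assumptions made for the example.

```python
import math
import random

NUM_VIEWS = 20  # number of rendered views, from the data set slide


def choose_views(model: str, rng: random.Random) -> list:
    """Pick which view indices the encoder sees for one training example."""
    if model == "AllVPNet":
        k = NUM_VIEWS
    elif model == "DropoutNet":
        k = rng.randint(2, 5)  # 2-5 randomly chosen views (inclusive)
    elif model == "SingleVPNet":
        k = 1
    else:
        raise ValueError(f"unknown model: {model}")
    return sorted(rng.sample(range(NUM_VIEWS), k))


def l1_loss(pred, target) -> float:
    """Mean absolute error between predicted and target pixel values."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def kl_to_unit_gaussian(mu, log_var) -> float:
    """KL(N(mu, sigma^2) || N(0, 1)) summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def vae_loss(pred, target, mu, log_var, kl_weight: float = 1.0) -> float:
    """L1 reconstruction term plus a weighted KL term (weight is assumed)."""
    return l1_loss(pred, target) + kl_weight * kl_to_unit_gaussian(mu, log_var)


rng = random.Random(0)
for name in ("AllVPNet", "DropoutNet", "SingleVPNet"):
    print(name, choose_views(name, rng))
```

With a 100-dimensional latent, `mu` and `log_var` would each have 100 entries; how unselected views are presented to the encoder (for example, as blank inputs) is not stated in the slides and is left out here.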

Pipeline - Architectures [Figure: network architecture diagrams, each labeled with an L1 loss]

Pipeline - 3D Reconstruction
● A Deterministic Function Is Used to Generate the Final 3D Point Cloud
● Number of Points Is Between ~30k and ~400k, Depending on Shape Complexity
● Not fixed!
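One way such a deterministic function could work is plain back-projection: every foreground pixel of every depth map becomes one 3D point, so the point count follows the silhouette area rather than being fixed. The sketch below assumes an orthographic camera, a background value of 0, and views that differ only by a rotation about the vertical axis; none of these details are given in the slides, and the function names are illustrative.

```python
import math


def depth_map_to_points(depth, background=0.0):
    """Back-project one depth map (list of rows) to camera-space points.

    Assumes an orthographic camera: pixel centers map to x, y in [-1, 1]
    and the stored depth value is used directly as z.
    """
    h, w = len(depth), len(depth[0])
    points = []
    for row in range(h):
        for col in range(w):
            d = depth[row][col]
            if d == background:
                continue  # pixel is outside the silhouette
            x = (col + 0.5) / w * 2.0 - 1.0
            y = 1.0 - (row + 0.5) / h * 2.0
            points.append((x, y, d))
    return points


def rotate_y(point, angle):
    """Rotate a point about the vertical (y) axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def fuse_views(depth_maps, yaw_angles):
    """Merge all views into one cloud using each view's assumed yaw angle."""
    cloud = []
    for depth, angle in zip(depth_maps, yaw_angles):
        cloud.extend(rotate_y(p, angle) for p in depth_map_to_points(depth))
    return cloud


view = [[0.0, 1.0],
        [2.0, 0.0]]
cloud = fuse_views([view, view], [0.0, math.pi / 2])
print(len(cloud), "points")
```

A real pipeline would also need the actual camera poses and probably some filtering of inconsistent points across views; this sketch only shows why the output size varies with the shape.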

Results - Sampling Random Sampling

Results - Sampling: Random Samples' Nearest Neighbors [Figure columns: Training set, Reconstruction, Random Sample]

Results - Sampling Random Samples

Results - Sampling More Random Samples

Results Samples Conditional Sampling

Results - Sampling Conditional Sampling

Results - Sampling Conditional Samples

Results - Conditional Sampling: Nearest Neighbors [Figure columns: Training set, Reconstruction, Cond. Sample]

Results - Conditional Sampling: Nearest Neighbors [Figure columns: Training set, Reconstruction, Cond. Sample]

Results - Reconstruction

Results - Classification Classification, Reconstruction Error

Results - Reconstruction: Out-of-Sample Generalization
● Put Silhouettes/Depth Maps into 224 x 224 canvases
● Images Scaled to Fit
● Camera Pose Not Fixed
● Different Size and Orientation
● NYUD and Silhouettes from the Internet
● The Rest of the Results Are All Obtained Through the SingleVPNet Model
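The canvas step can be illustrated with a short sketch that scales an input silhouette or depth map to fit a 224 x 224 canvas and centers it. The aspect-ratio-preserving scale, the centering, the nearest-neighbor resampling, and the background value of 0 are all assumptions for the example; the slides only say that images are scaled to fit.

```python
CANVAS = 224  # canvas size stated in the slides


def fit_to_canvas(image, canvas=CANVAS, background=0):
    """Scale `image` (list of rows) to fit a square canvas and center it.

    Uses nearest-neighbor resampling and preserves the aspect ratio, so the
    longer side fills the canvas and the shorter side is padded.
    """
    h, w = len(image), len(image[0])
    scale = canvas / max(h, w)
    new_h = max(1, round(h * scale))
    new_w = max(1, round(w * scale))
    top = (canvas - new_h) // 2
    left = (canvas - new_w) // 2

    out = [[background] * canvas for _ in range(canvas)]
    for r in range(new_h):
        src_r = min(h - 1, int(r / scale))
        for c in range(new_w):
            src_c = min(w - 1, int(c / scale))
            out[top + r][left + c] = image[src_r][src_c]
    return out


# A 2 x 4 all-foreground silhouette becomes a centered 112 x 224 band.
mask = [[1, 1, 1, 1],
        [1, 1, 1, 1]]
placed = fit_to_canvas(mask)
print(len(placed), len(placed[0]), sum(map(sum, placed)))
```

For depth maps, nearest-neighbor resampling avoids blending foreground depth with the background at silhouette edges, which is why it is used here instead of interpolation.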

Results - Reconstruction Out-of-Sample Generalization (NYUD)

Results - Reconstruction Out-of-Sample Generalization (Uncond. SingleVPNet - NYUD Silhouettes)

Results - Reconstruction Out-of-Sample Generalization (Silhouettes From Web)

Results - Representation Analysis: Consistent Representation
● Naturally, We Would Like to Get the Same Shape Across All Views
● Intuitively, Uncertainty Is Actually Part of Consistency
● Obtaining Good Priors Is Important!

Results - Analysis Consistent Representation

Results - Analysis Priors Matter!

Results - Analysis What 3D shape is this?

Results - Analysis

Results - Analysis
● Model's Prediction: "airplane"
● Quite meaningful and intuitive
● Obtaining good inductive biases is hard but helps a lot!
● Behaves like a hierarchical prior

Results - Analysis Implicitly Learning About Parts

Conclusion
● We showed an effective paradigm for learning 3D shapes using a multi-view representation
● Samples obtained look realistic, novel and detailed
● Out-of-sample generalization is attainable via good generative models + meaningful priors
● Hierarchical priors can effectively induce enough bias to generate meaningful results
● Strong inductive biases help get meaningful 3D shapes on highly occluded inputs
● Parts can be learned implicitly; it is hard to explicitly learn parts for real-world tasks

Future Directions and Challenges
● Current data sets are not sufficient to learn about 3D vision
● 3D shapes are the end product of an underlying process: physics
● Current data-driven approaches do get us to where we want to be
● 3D shapes are composed of things like material, mass, etc.
● To meaningfully interact with 3D shapes we need to do more!
● Learning fast and accurate physics simulators might be a good starting point

Future Directions and Challenges Thank you!

Results - Conditional Sampling Conditional Samples

Results - Classification, Recon. Err.
● The Goal Is Not to Do Classification or Reconstruction, But to Have Hierarchical Priors
● Strong Regularization

Results - IoU IoU numbers for ShapeNet Core
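For reference, intersection over union (IoU) between a predicted and a ground-truth shape can be computed on occupied voxel coordinates as in the sketch below. How the deck's point clouds were voxelized and at what resolution is not stated, so the set-of-coordinates input and the convention for two empty shapes are assumptions.

```python
def iou(pred_voxels, true_voxels) -> float:
    """Intersection over union of two sets of occupied voxel coordinates."""
    pred, truth = set(pred_voxels), set(true_voxels)
    union = pred | truth
    if not union:
        return 1.0  # assumed convention: two empty shapes match perfectly
    return len(pred & truth) / len(union)


pred = {(0, 0, 0), (1, 0, 0)}
truth = {(1, 0, 0), (2, 0, 0)}
print(iou(pred, truth))  # one shared voxel out of three occupied overall
```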

Results - Conditional Sampling More Conditional Samples

Results - Conditional Sampling More Conditional Samples

Results - Analysis What about this?

Results - Analysis
