Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations. Vincent Sitzmann, Michael Zollhöfer, Gordon Wetzstein
[Figure: from a single image with its camera pose and intrinsics, SRNs render novel views and surface normals.]
Self-supervised Scene Representation Learning. Latent 3D scenes {…} give rise to observations {…}, each an image plus camera pose and intrinsics. What can we learn about latent 3D scenes from these observations? Vision: learn rich representations just by watching video!
Self-supervised Scene Representation Learning. Observations → Model → Re-rendered observations, trained with an image loss.
Self-supervised Scene Representation Learning. Observations → Neural Scene Representation (a persistent feature representation of the scene) → Re-rendered observations, trained with an image loss.
Self-supervised Scene Representation Learning. Observations → Neural Scene Representation (a persistent feature representation of the scene) → Neural Renderer (renders from different camera perspectives) → Re-rendered observations, trained with an image loss.
2D baseline: Autoencoder. Observations → convolutional encoder → latent code; latent code + pose → convolutional decoder → output (re-rendered observations), trained with an image loss.
2D baseline: Autoencoder. Latent code + pose → convolutional decoder → output (re-rendered observations), trained with an image loss.
This doesn't capture the 3D properties of scenes (trained on ~2,500 ShapeNet cars with 50 observations each). We need a 3D inductive bias!
Related Work (axes: 3D inductive bias / 3D structure; self-supervised with posed images)
• Scene Representation Learning: Tatarchenko et al., 2015; Worrall et al., 2017; Eslami et al., 2018; …
• 2D Generative Models: Goodfellow et al., 2014; Kingma et al., 2013; Kingma et al., 2018; …
• 3D Computer Vision: Choy et al., 2016; Huang et al., 2018; Park et al., 2018; …
• Voxel-based Representations: Sitzmann et al., 2019; Lombardi et al., 2019; Phuoc et al., 2019; …
  • Memory inefficient: $\mathcal{O}(n^3)$.
  • Doesn't parameterize scene surfaces smoothly.
  • Generalization is hard.
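Voxel grids store a feature vector in every cell, so memory grows cubically with grid resolution. The sketch below makes that arithmetic concrete and contrasts it with an MLP, whose parameter count does not depend on the resolution at which it is later sampled. The channel count (64), float32 storage, and the MLP layer sizes are illustrative assumptions, not values from the slides.

```python
def voxel_grid_bytes(n: int, channels: int = 64, bytes_per_value: int = 4) -> int:
    """Memory of a dense n x n x n feature grid (assumed float32): grows as O(n^3)."""
    return n ** 3 * channels * bytes_per_value


def mlp_param_count(in_dim: int = 3, hidden: int = 256, layers: int = 4, out_dim: int = 64) -> int:
    """Parameter count of a fully connected MLP (hypothetical sizes)."""
    dims = [in_dim] + [hidden] * layers + [out_dim]
    # Each layer has a weight matrix (d_in x d_out) plus a bias vector (d_out).
    return sum(d_in * d_out + d_out for d_in, d_out in zip(dims[:-1], dims[1:]))


for n in (32, 64, 128, 256):
    print(f"{n}^3 grid: {voxel_grid_bytes(n) / 2**20:.0f} MiB")
# The MLP's size is fixed; it can be queried at any resolution.
print(f"MLP: {mlp_param_count() * 4 / 2**20:.2f} MiB")
```

Doubling the grid resolution multiplies voxel memory by 8 (8 MiB at 32³ versus 4096 MiB at 256³ under these assumptions), while the MLP stays at roughly 0.82 MiB.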
Scene Representation Networks. Observations → Neural Scene Representation → Neural Renderer → Re-rendered observations, trained with an image loss.
[Figure: a 3D scene made up of free space and objects.]
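Model the scene as a function $\Phi$ that maps coordinates to features: $\Phi: \mathbb{R}^3 \to \mathbb{R}^n$. Every point $\mathbf{x} \in \mathbb{R}^3$, whether in free space or on an object, is assigned a feature vector $\Phi(\mathbf{x})$.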
A Scene Representation Network parameterizes $\Phi: \mathbb{R}^3 \to \mathbb{R}^n$ as an MLP, assigning a feature vector to every point in free space and on objects.
A Scene Representation Network parameterizes $\Phi: \mathbb{R}^3 \to \mathbb{R}^n$ as an MLP.
• Can be sampled anywhere, at arbitrary resolutions.
• Parameterizes scene surfaces smoothly.
• Memory scales with scene complexity.
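A minimal sketch of $\Phi: \mathbb{R}^3 \to \mathbb{R}^n$ as a plain MLP, showing that it can be queried at any continuous coordinate without a grid. The layer sizes, ReLU activations, and random initialization are assumptions for illustration; this is not the SRN paper's architecture.

```python
import math
import random


def init_mlp(dims, seed=0):
    """Random weights for a fully connected net with layer sizes `dims` (hypothetical init)."""
    rng = random.Random(seed)
    layers = []
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        scale = 1.0 / math.sqrt(d_in)
        weights = [[rng.gauss(0.0, scale) for _ in range(d_in)] for _ in range(d_out)]
        biases = [0.0] * d_out
        layers.append((weights, biases))
    return layers


def phi(params, x):
    """Phi: R^3 -> R^n. Maps a world coordinate x to a feature vector."""
    h = list(x)
    for i, (weights, biases) in enumerate(params):
        h = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weights, biases)]
        if i < len(params) - 1:  # ReLU on hidden layers, linear output layer
            h = [max(0.0, v) for v in h]
    return h


params = init_mlp([3, 32, 32, 8])  # n = 8 feature channels (illustrative)
# Query at an arbitrary continuous coordinate: no grid, no fixed resolution.
v = phi(params, (0.1, -0.4, 2.0))
print(len(v))  # 8
```

The memory cost is the weights alone, regardless of how densely the function is later sampled.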
Scene Representation Networks. Observations → Neural Scene Representation $\Phi: \mathbb{R}^3 \to \mathbb{R}^n$ → Neural Renderer → Re-rendered observations, trained with an image loss.
Neural Renderer. [Figure: a camera ray passing through free space toward the scene.]
Neural Renderer, Step 1: Intersection Testing. Idea: march along each camera ray until it arrives at a surface.
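Before marching, each pixel needs a ray. A minimal sketch of generating one from the camera intrinsics and pose, assuming a pinhole model with OpenCV-style axes (+z forward), intrinsics `fx, fy, cx, cy`, and a camera-to-world rotation `R` with camera center `t`. These conventions are assumptions; the slides do not specify them.

```python
import math


def pixel_ray(u, v, fx, fy, cx, cy, R, t):
    """Return (origin, unit direction) of the world-space ray through pixel (u, v).

    Assumes a pinhole camera with +z forward and a camera-to-world pose given by
    rotation R (3x3 nested lists) and camera center t.
    """
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)  # direction in camera coordinates
    d_world = [sum(R[i][j] * d_cam[j] for j in range(3)) for i in range(3)]
    norm = math.sqrt(sum(c * c for c in d_world))
    return tuple(t), tuple(c / norm for c in d_world)


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
origin, direction = pixel_ray(64, 64, fx=100.0, fy=100.0, cx=64.0, cy=64.0,
                              R=identity, t=(0.0, 0.0, -3.0))
print(origin, direction)  # (0.0, 0.0, -3.0) (0.0, 0.0, 1.0)
```

The principal-point pixel looks straight down the camera's optical axis; every other pixel's ray is tilted in proportion to its offset from `(cx, cy)`.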
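Neural Renderer, Step 1: Intersection Testing. A world coordinate $\mathbf{x}_i$ along the ray is passed through the scene representation $\Phi: \mathbb{R}^3 \to \mathbb{R}^n$ to obtain a feature vector $\mathbf{v}_i = \Phi(\mathbf{x}_i)$.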
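Neural Renderer, Step 1: Intersection Testing. Starting from $\mathbf{x}_0$, the feature vector $\mathbf{v}_i = \Phi(\mathbf{x}_i)$ at world coordinate $\mathbf{x}_i$ is fed to a ray-marching LSTM, which predicts a step length $\delta_{i+1}$; the next point along the ray is $\mathbf{x}_{i+1} = \mathbf{x}_i + \delta_{i+1}\,\mathbf{d}$, where $\mathbf{d}$ is the ray direction. Feasible step length: the distance to the closest scene surface.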
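The slide's renderer learns its step lengths with an LSTM. As a swapped-in illustration of the same "step by the distance to the closest surface" idea, here is classic sphere tracing with a known analytic signed distance function. The unit sphere, `max_steps`, and `eps` are assumptions; nothing here is learned.

```python
import math


def sphere_sdf(p, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance from point p to a sphere (stand-in for a learned scene)."""
    return math.dist(p, center) - radius


def ray_march(origin, direction, sdf, max_steps=64, eps=1e-4):
    """Sphere tracing: x_{i+1} = x_i + delta_{i+1} * d, with delta = distance to closest surface.

    Returns (hit point, distance travelled), or (None, distance) if no surface is reached.
    """
    x = list(origin)
    travelled = 0.0
    for _ in range(max_steps):
        delta = sdf(x)  # feasible step: cannot overshoot the nearest surface
        if delta < eps:
            return tuple(x), travelled
        x = [xi + delta * di for xi, di in zip(x, direction)]
        travelled += delta
    return None, travelled


hit, depth = ray_march((0.0, 0.0, -3.0), (0.0, 0.0, 1.0), sphere_sdf)
print(hit, depth)  # (0.0, 0.0, -1.0) 2.0
```

Because each step is at most the distance to the nearest surface, the march never jumps through geometry; a learned step predictor has to approximate this behavior from features alone.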
Neural Renderer, Step 1: Intersection Testing. [Animation: ray-marching iterations 0, 1, 2, 3, 4, … as the rays advance toward the surface.]
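Neural Renderer, Step 2: Color Generation. At the intersection point, the scene representation $\Phi: \mathbb{R}^3 \to \mathbb{R}^n$ yields a feature vector, which a color MLP maps to the pixel's color.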
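A minimal sketch of the color-generation step: a feature vector at the intersection point is mapped to an RGB triple. A single linear layer with a sigmoid is an assumption for brevity; the slides only say "Color MLP", and its depth, activations, and output range are not given here.

```python
import math
import random


def color_mlp(feature, weights, biases):
    """Map a feature vector to an RGB triple in (0, 1) with one linear layer + sigmoid."""
    logits = [sum(w * f for w, f in zip(row, feature)) + b for row, b in zip(weights, biases)]
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


rng = random.Random(0)
n = 8  # feature channels, matching Phi: R^3 -> R^n (illustrative)
W = [[rng.gauss(0.0, 0.5) for _ in range(n)] for _ in range(3)]  # rows: R, G, B
b = [0.0, 0.0, 0.0]
rgb = color_mlp([0.2] * n, W, b)
print(len(rgb))  # 3
```

Color is predicted from features only at the surface point found in Step 1, so each pixel needs a single color evaluation.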
Can now train end-to-end with posed images only! Observations → Neural Scene Representation $\Phi: \mathbb{R}^3 \to \mathbb{R}^n$ → Neural Renderer → Re-rendered observations, trained with an image loss.
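The only training signal is an image loss between re-rendered and observed images. A minimal sketch, assuming a mean-squared-error loss over RGB pixels; the full objective used in practice may include further terms not shown on these slides. In an end-to-end setup an autodiff framework would backpropagate this value through the renderer and $\Phi$; the snippet only computes the loss.

```python
def image_loss(rendered, observed):
    """Mean squared error over all pixels and color channels.

    `rendered` and `observed` are equal-length lists of (r, g, b) tuples.
    """
    if len(rendered) != len(observed):
        raise ValueError("images must have the same number of pixels")
    total = sum((a - b) ** 2 for p, q in zip(rendered, observed) for a, b in zip(p, q))
    return total / (3 * len(rendered))


observed = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
rendered = [(0.5, 0.0, 0.0), (0.0, 1.0, 0.0)]
print(image_loss(rendered, observed))  # equals 0.25 / 6
```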
Generalizing across a class of scenes
Each scene is represented by its own SRN, with parameters $\phi_1, \phi_2, \phi_3, \phi_4 \in \mathbb{R}^l$.
Each scene is represented by its own SRN, with parameters $\phi_j \in \mathbb{R}^l$. The $\phi_j$ live on a $k$-dimensional subspace of $\mathbb{R}^l$, with $k < l$.
Each scene is represented by its own SRN. Represent each scene $j$ with a low-dimensional embedding $\mathbf{z}_j \in \mathbb{R}^k$, paired with its SRN parameters $\phi_j \in \mathbb{R}^l$.
Each scene is represented by its own SRN. A hypernetwork maps each embedding to that scene's SRN parameters: $\Psi: \mathbb{R}^k \to \mathbb{R}^l,\ \mathbf{z}_j \mapsto \Psi(\mathbf{z}_j) = \phi_j$.
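A minimal sketch of $\Psi: \mathbb{R}^k \to \mathbb{R}^l$, mapping a per-scene embedding $\mathbf{z}_j$ to a flat parameter vector $\phi_j$, which is then unpacked into the layers of a tiny scene network. Using a single linear layer for $\Psi$ and the specific sizes below are assumptions for brevity; a real hypernetwork would typically be a deeper MLP.

```python
import random


def hypernetwork(z, H, c):
    """Psi: R^k -> R^l. Linear map z -> H z + c producing a flat parameter vector phi."""
    return [sum(h * zi for h, zi in zip(row, z)) + ci for row, ci in zip(H, c)]


def unflatten(phi, dims):
    """Split a flat vector into (weights, biases) per layer for an MLP with sizes `dims`."""
    layers, offset = [], 0
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        weights = [phi[offset + r * d_in: offset + (r + 1) * d_in] for r in range(d_out)]
        offset += d_in * d_out
        biases = phi[offset: offset + d_out]
        offset += d_out
        layers.append((weights, biases))
    if offset != len(phi):
        raise ValueError("phi has the wrong length for these layer sizes")
    return layers


dims = [3, 4, 2]  # tiny scene network Phi: R^3 -> R^2 (illustrative sizes)
num_params = sum(d_in * d_out + d_out for d_in, d_out in zip(dims[:-1], dims[1:]))  # l
k = 5  # embedding size, k < l
rng = random.Random(0)
H = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(num_params)]
c = [0.0] * num_params
z_j = [rng.gauss(0.0, 1.0) for _ in range(k)]  # per-scene embedding
phi_j = hypernetwork(z_j, H, c)
layers = unflatten(phi_j, dims)
print(num_params, len(phi_j), len(layers))  # 26 26 2
```

With a linear $\Psi$, every $\phi_j = H\mathbf{z}_j + c$ lies in an affine subspace of $\mathbb{R}^l$ of dimension at most $k$; a nonlinear hypernetwork instead places the $\phi_j$ on a $k$-dimensional manifold.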
Results
Novel View Synthesis – Baseline Comparison. ShapeNet v2 – single-shot reconstruction of objects in the held-out test set.
Training:
• ShapeNet cars / chairs.
• 50 observations per object.
Testing:
• Cars / chairs from the unseen test set.
• Single observation!
Compared: input pose; SRNs (ours); Tatarchenko et al. 2015; Worrall et al. 2017; Deterministic GQN, adapted (Eslami et al. 2018).
Novel View Synthesis – SRN Output. ShapeNet v2 – single-shot reconstruction of objects in the held-out test set. [Figure: input pose and SRN outputs.]
Sampling at arbitrary resolutions: 32x32, 64x64, 128x128, 256x256, 512x512. [Figure: RGB and surface normals at each resolution.]
Generalization to unseen camera poses (camera close-up, camera roll): SRNs vs. Tatarchenko et al. Tatarchenko et al. doesn't reconstruct geometry in either setting.
Latent code interpolation. [Figure: RGB and surface normals.]
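For latent code interpolation, intermediate embeddings between two scenes' codes are each mapped through the hypernetwork to SRN parameters and rendered. A minimal sketch assuming plain linear interpolation between codes; the slides do not state which interpolation scheme was used.

```python
def lerp(z_a, z_b, t):
    """Linear interpolation between two embeddings: (1 - t) * z_a + t * z_b."""
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]


z_a = [1.0, 0.0, -2.0]
z_b = [3.0, 4.0, 2.0]
path = [lerp(z_a, z_b, i / 4) for i in range(5)]  # 5 evenly spaced codes, t = 0 ... 1
print(path[2])  # [2.0, 2.0, 0.0]
```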
Can represent room-scale scenes, but aren't compositional: training-set novel-view synthesis on GQN rooms (Eslami et al. 2018). Work-in-progress: Compositional SRNs generalize to unseen numbers of objects! (ShapeNet cars, 50 observations.)
Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations. Vincent Sitzmann, Michael Zollhöfer, Gordon Wetzstein. Find me at Poster #71! vsitzmann.github.io, @vincesitzmann. Looking for research positions in scene representation learning. [Figure panels: single-shot reconstruction, interpolation, camera pose extrapolation.]