
Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations - PowerPoint PPT Presentation



  1. Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations. Vincent Sitzmann, Michael Zollhöfer, Gordon Wetzstein

  2. Teaser figure: a single image, camera pose, and intrinsics as input; novel views and surface normals as output.

  3. Self-supervised Scene Representation Learning. Latent 3D scenes {…}; observations {…}, each consisting of an image plus pose & intrinsics. What can we learn about latent 3D scenes from observations? Vision: learn rich representations just by watching video!

  4. Self-supervised Scene Representation Learning. A model maps observations {…} to re-rendered observations {…}, trained with an image loss.

  5. Self-supervised Scene Representation Learning. The model contains a neural scene representation: a persistent feature representation of the scene. Training compares re-rendered observations against the observations with an image loss.

  6. Self-supervised Scene Representation Learning. The model consists of a neural scene representation (a persistent feature representation of the scene) and a neural renderer (which renders it from different camera perspectives), trained with an image loss.

  7. 2D baseline: Autoencoder. A convolutional encoder maps the observations to a latent code; the latent code plus the output pose is passed to a convolutional decoder that produces the re-rendered observations, trained with an image loss.

  8. 2D baseline: Autoencoder. The latent code, concatenated with the output pose, is decoded into the re-rendered observations (image loss).
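A minimal sketch of such a pose-conditioned autoencoder baseline in PyTorch. All layer sizes, the 64x64 resolution, and the flattened 3x4 pose are illustrative assumptions, not the exact architecture from the talk:

```python
import torch
import torch.nn as nn

class PoseConditionedAutoencoder(nn.Module):
    """2D baseline sketch: encode image -> latent code, decode (latent, target pose) -> image."""
    def __init__(self, latent_dim=256, pose_dim=12):
        super().__init__()
        self.encoder = nn.Sequential(                                    # 3x64x64 -> latent_dim
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),         # -> 32x32x32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),        # -> 64x16x16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),       # -> 128x8x8
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, latent_dim),
        )
        self.decoder = nn.Sequential(                                    # (latent + pose) -> 3x64x64
            nn.Linear(latent_dim + pose_dim, 128 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, image, target_pose):
        z = self.encoder(image)                                          # latent code
        return self.decoder(torch.cat([z, target_pose], dim=-1))         # re-rendered observation

# Usage: re-render a batch of 64x64 observations from new poses (here, flattened 3x4 matrices).
model = PoseConditionedAutoencoder()
out = model(torch.rand(4, 3, 64, 64), torch.rand(4, 12))
loss = nn.functional.mse_loss(out, torch.rand(4, 3, 64, 64))             # image loss
```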

  9. This doesn’t capture the 3D properties of scenes. Trained on ~2,500 ShapeNet cars with 50 observations each. We need a 3D inductive bias!

  10. Related Work. 3D inductive bias / 3D structure vs. self-supervised scene representation learning with posed images. Scene representation learning: Tatarchenko et al. 2015, Worrall et al. 2017, Eslami et al. 2018, … 2D generative models: Goodfellow et al. 2014, Kingma et al. 2013, Kingma et al. 2018, … 3D computer vision: Choy et al. 2016, Huang et al. 2018, Park et al. 2018, … Voxel-based representations (Sitzmann et al. 2019, Lombardi et al. 2019, Phuoc et al. 2019, …): memory inefficient, O(n^3); don’t parameterize scene surfaces smoothly; generalization is hard.

  11. Scene Representation Networks. Observations {…} are mapped to a neural scene representation, which a neural renderer turns into re-rendered observations, trained with an image loss.

  12. Scene Representation Networks. The same pipeline: neural scene representation → neural renderer → re-rendered observations, compared to the observations with an image loss.

  13. Diagram: a scene consisting of free space and objects, with sample points marked in each region.

  14. Model the scene as a function Φ that maps coordinates to features: Φ: ℝ^3 → ℝ^n. Points x ∈ ℝ^3, whether in free space or inside an object, are each mapped to a feature vector v = Φ(x).

  15. A Scene Representation Network parameterizes Φ as an MLP: Φ: ℝ^3 → ℝ^n. Every coordinate x ∈ ℝ^3, in free space or inside an object, is mapped to a feature vector.

  16. A Scene Representation Network parameterizes Φ: ℝ^3 → ℝ^n as an MLP. It can be sampled anywhere, at arbitrary resolutions; it parameterizes scene surfaces smoothly; and memory scales with scene complexity.
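A minimal sketch of Φ: ℝ^3 → ℝ^n as an MLP in PyTorch; width, depth, and feature dimension are illustrative assumptions rather than the paper's exact settings:

```python
import torch
import torch.nn as nn

class SceneRepresentationMLP(nn.Module):
    """Phi: R^3 -> R^n. Maps a 3D world coordinate to a feature vector.
    Width, depth, and feature size are illustrative choices."""
    def __init__(self, feature_dim=256, hidden_dim=256, num_layers=4):
        super().__init__()
        layers = [nn.Linear(3, hidden_dim), nn.ReLU()]
        for _ in range(num_layers - 1):
            layers += [nn.Linear(hidden_dim, hidden_dim), nn.ReLU()]
        layers += [nn.Linear(hidden_dim, feature_dim)]
        self.net = nn.Sequential(*layers)

    def forward(self, coords):          # coords: (..., 3) world coordinates
        return self.net(coords)         # features: (..., feature_dim)

# Because Phi is a continuous function, it can be queried anywhere at any resolution,
# e.g. at all 128x128 sample points of an image, or at a single point.
phi = SceneRepresentationMLP()
features = phi(torch.rand(128 * 128, 3))   # (16384, 256)
```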

  17. Scene Representation Networks. The neural scene representation Φ: ℝ^3 → ℝ^n feeds a neural renderer that produces re-rendered observations, compared to the observations with an image loss.

  18. Scene Representation Networks. The same pipeline: neural scene representation Φ: ℝ^3 → ℝ^n → neural renderer → re-rendered observations, with an image loss against the observations.

  19. Neural Renderer. Diagram: a camera ray passing through free space toward the scene, with sample points along it.

  20. Neural Renderer.

  21. Neural Renderer.

  22. Neural Renderer Step 1: Intersection Testing. Idea: march along the ray until we arrive at the surface.

  23. Neural Renderer Step 1: Intersection Testing. The scene representation Φ: ℝ^3 → ℝ^n maps the world coordinates x_i along the ray to a feature vector v_i.

  24. Neural Renderer Step 1: Intersection Testing. A Ray Marching LSTM takes the feature vector v_i = Φ(x_i) at the current world coordinates x_i and predicts a step length δ_{i+1}, which advances the sample along the ray to x_{i+1}. The feasible step length is the distance to the closest scene surface.
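A hedged sketch of this differentiable ray marching in PyTorch, assuming a fixed number of steps and an LSTM cell that predicts a non-negative step length from the feature at the current point; dimensions and details are illustrative, not the reference implementation:

```python
import torch
import torch.nn as nn

class RayMarcher(nn.Module):
    """Differentiable ray marching sketch: an LSTM reads the feature Phi(x_i) at the
    current point and predicts the next step length along the ray."""
    def __init__(self, feature_dim=256, hidden_dim=16, num_steps=10):
        super().__init__()
        self.lstm = nn.LSTMCell(feature_dim, hidden_dim)
        self.to_step = nn.Linear(hidden_dim, 1)     # hidden state -> step length
        self.num_steps = num_steps

    def forward(self, phi, ray_origins, ray_dirs):
        # ray_origins, ray_dirs: (num_rays, 3); marching starts at the origins.
        x, state = ray_origins, None
        for _ in range(self.num_steps):
            features = phi(x)                       # query the scene at the current points
            h, c = self.lstm(features, state)
            state = (h, c)
            step = torch.relu(self.to_step(h))      # kept non-negative here as a simplification
            x = x + step * ray_dirs                 # advance each ray by its predicted step
        return x                                    # estimated surface intersection points

# Usage with any Phi mapping (N, 3) -> (N, 256), e.g. the MLP sketched earlier.
marcher = RayMarcher()
phi = nn.Sequential(nn.Linear(3, 256), nn.ReLU(), nn.Linear(256, 256))  # stand-in for Phi
dirs = nn.functional.normalize(torch.rand(8, 3), dim=-1)
intersections = marcher(phi, torch.zeros(8, 3), dirs)                   # (8, 3)
```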

  25. Neural Renderer Step 1: Intersection Testing. Iteration 0

  26. Neural Renderer Step 1: Intersection Testing. Iteration 1

  27. Neural Renderer Step 1: Intersection Testing. Iteration 2

  28. Neural Renderer Step 1: Intersection Testing. Iteration 3

  29. Neural Renderer Step 1: Intersection Testing. Iteration 4

  30. Neural Renderer Step 1: Intersection Testing. Iteration …

  31. Neural Renderer Step 1: Intersection Testing.

  32. Neural Renderer Step 2: Color Generation. The feature vector from the scene representation Φ: ℝ^3 → ℝ^n at the final intersection point is passed to a color MLP that outputs the pixel color.
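A minimal sketch of Step 2: a small MLP (hypothetical layer sizes) maps the feature at each ray's estimated intersection point to an RGB color.

```python
import torch
import torch.nn as nn

class ColorMLP(nn.Module):
    """Maps the feature vector at a ray's surface intersection point to an RGB color.
    Layer sizes are illustrative assumptions."""
    def __init__(self, feature_dim=256, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 3), nn.Sigmoid(),   # RGB in [0, 1]
        )

    def forward(self, features):                      # (num_rays, feature_dim)
        return self.net(features)                     # (num_rays, 3)

# Rendering a 64x64 image: one ray per pixel, then reshape the per-ray colors.
colors = ColorMLP()(torch.rand(64 * 64, 256)).reshape(64, 64, 3)
```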

  33. We can now train end-to-end with posed images only: neural scene representation Φ: ℝ^3 → ℝ^n → neural renderer → re-rendered observations, compared to the observations {…} with an image loss.
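Putting the pieces together, a hedged sketch of one end-to-end training step from posed images only. It reuses the Φ, RayMarcher, and ColorMLP sketches above and adds a simple pinhole-camera get_rays helper; the way the components are wired here is an illustrative assumption, not the authors' code:

```python
import torch
import torch.nn as nn

# Assumed components from the earlier sketches (not the reference implementation).
phi = SceneRepresentationMLP()
marcher = RayMarcher()
color_mlp = ColorMLP()
params = list(phi.parameters()) + list(marcher.parameters()) + list(color_mlp.parameters())
optimizer = torch.optim.Adam(params, lr=1e-4)

def get_rays(pose, intrinsics, height, width):
    """Per-pixel rays for a simple pinhole camera. pose: 4x4 camera-to-world, intrinsics: 3x3."""
    fx, fy = intrinsics[0, 0], intrinsics[1, 1]
    cx, cy = intrinsics[0, 2], intrinsics[1, 2]
    ys, xs = torch.meshgrid(torch.arange(height, dtype=torch.float32),
                            torch.arange(width, dtype=torch.float32), indexing="ij")
    dirs_cam = torch.stack([(xs - cx) / fx, (ys - cy) / fy, torch.ones_like(xs)], dim=-1)
    dirs_world = nn.functional.normalize(dirs_cam.reshape(-1, 3) @ pose[:3, :3].T, dim=-1)
    origins = pose[:3, 3].expand_as(dirs_world)
    return origins, dirs_world                              # (H*W, 3) each

def render(pose, intrinsics, height=64, width=64):
    origins, dirs = get_rays(pose, intrinsics, height, width)
    surface_points = marcher(phi, origins, dirs)            # step 1: intersection testing
    colors = color_mlp(phi(surface_points))                 # step 2: color generation
    return colors.reshape(height, width, 3)

def training_step(image, pose, intrinsics):
    """One step of self-supervised training: only the posed image supervises the model."""
    optimizer.zero_grad()
    rendered = render(pose, intrinsics, *image.shape[:2])
    loss = ((rendered - image) ** 2).mean()                 # image (L2) loss
    loss.backward()
    optimizer.step()
    return loss.item()
```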

  34. Generalizing across a class of scenes

  35. Each scene is represented by its own SRN, with its own parameters φ_j ∈ ℝ^l.

  36. Each scene is represented by its own SRN with parameters φ_j ∈ ℝ^l. The φ_j live on a k-dimensional subspace of ℝ^l, with k < l.

  37. Each scene is represented by its own SRN. Represent each scene with a low-dimensional embedding z_j ∈ ℝ^k in addition to its parameters φ_j ∈ ℝ^l.

  38. Each scene is represented by its own SRN. A hypernetwork Ψ: ℝ^k → ℝ^l, z_j ↦ Ψ(z_j) = φ_j maps each scene's embedding z_j ∈ ℝ^k to its SRN parameters φ_j ∈ ℝ^l.
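A minimal sketch of the hypernetwork Ψ: ℝ^k → ℝ^l, z_j ↦ Ψ(z_j) = φ_j. For simplicity it emits one flat parameter vector per scene; the sizes and this flat parameterization are illustrative assumptions:

```python
import torch
import torch.nn as nn

class HyperNetwork(nn.Module):
    """Psi: R^k -> R^l. Maps a per-scene embedding z_j to the parameters phi_j of
    that scene's SRN. Sizes and the flat-vector output are illustrative."""
    def __init__(self, embedding_dim=256, hidden_dim=256, num_params=100_000):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embedding_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_params),
        )

    def forward(self, z):               # z: (k,) embedding of one scene
        return self.net(z)              # phi: (l,) flat SRN parameter vector

# Each scene j gets a learned embedding z_j; per-scene SRN weights are never stored directly.
num_scenes, k = 2500, 256
embeddings = nn.Embedding(num_scenes, k)
psi = HyperNetwork(embedding_dim=k)
phi_j = psi(embeddings(torch.tensor(42)))   # parameters of scene 42's SRN
```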

  39. Results

  40. Novel View Synthesis – Baseline Comparison: single-shot reconstruction of objects in a held-out test set (ShapeNet v2). Training: ShapeNet cars / chairs, 50 observations per object. Testing: cars / chairs from the unseen test set, given a single observation. Shown: input pose, Tatarchenko et al. 2015, Worrall et al. 2017, deterministic GQN (adapted; Eslami et al. 2018), and SRNs (ours).

  41. Novel View Synthesis – SRN Output: single-shot reconstruction of objects in a held-out test set (ShapeNet v2), shown next to the input pose.

  42. Sampling at arbitrary resolutions: RGB and surface normals rendered at 32x32, 64x64, 128x128, 256x256, and 512x512.

  43. Generalization to unseen camera poses: camera close-up and camera roll, rendered with SRNs.

  44. Generalization to unseen camera poses: camera close-up and camera roll. SRNs generalize; Tatarchenko et al. doesn’t reconstruct geometry in either case.

  45. Latent code interpolation: RGB and surface normals.

  46. Latent code interpolation: RGB and surface normals.
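A hedged sketch of how such an interpolation could be produced with the hypernetwork sketched earlier: blend two scene embeddings linearly and decode each blend into SRN parameters (rendering the resulting parameters, as in the training sketch, is omitted here):

```python
import torch

# Assumes `embeddings` and `psi` from the hypernetwork sketch above.
z_a = embeddings(torch.tensor(0))            # embedding of scene A
z_b = embeddings(torch.tensor(1))            # embedding of scene B

interpolated_params = []
for alpha in torch.linspace(0.0, 1.0, steps=8):
    z = (1 - alpha) * z_a + alpha * z_b      # blend in latent space
    interpolated_params.append(psi(z))       # SRN parameters for the blended scene
# Rendering each parameter vector yields the smoothly morphing RGB and
# surface-normal sequences shown on this slide.
```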

  47. SRNs can represent room-scale scenes, but aren’t compositional. Shown: training-set novel-view synthesis on GQN rooms (Eslami et al. 2018) with ShapeNet cars, 50 observations. Work in progress: compositional SRNs that generalize to unseen numbers of objects!

  48. Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations. Vincent Sitzmann, Michael Zollhöfer, Gordon Wetzstein. Find me at Poster #71! vsitzmann.github.io, @vincesitzmann. Looking for research positions in scene representation learning. Results shown: single-shot reconstruction, interpolation, camera pose extrapolation.
