
Deep Models for 3D Reconstruction
Andreas Geiger, Autonomous Vision Group, MPI for Intelligent Systems, Tübingen / Computer Vision and Geometry Group, ETH Zürich
October 12, 2017


  1. Deep Models for 3D Reconstruction. Andreas Geiger, Autonomous Vision Group, MPI for Intelligent Systems, Tübingen / Computer Vision and Geometry Group, ETH Zürich. October 12, 2017.

  2. 3D Reconstruction [Furukawa & Hernandez: Multi-View Stereo: A Tutorial]. Task: ◮ Given a set of 2D images ◮ Reconstruct 3D shape of object/scene

  3.–9. 3D Reconstruction Pipeline (built up step by step): Input Images → Camera Poses → Dense Correspondences → Depth Maps → Depth Map Fusion → 3D Reconstruction

  10. Large 3D Datasets and Repositories [Newcombe et al., 2011] [Choi et al., 2011] [Dai et al., 2017] [Wu et al., 2015] [Chang et al., 2015] [Chang et al., 2017]

  11. Can we learn 3D Reconstruction from Data?

  12. OctNet: Learning Deep 3D Representations at High Resolutions [Riegler, Ulusoy, & Geiger, CVPR 2017]

  13. Deep Learning in 2D [LeCun, 1998]

  14.–15. Deep Learning in 3D ◮ Existing 3D networks limited to ∼32³ voxels

  16. 3D Data is often Sparse [Geiger et al., 2012]

  17.–18. 3D Data is often Sparse [Li et al., 2016] ◮ Can we exploit sparsity for efficient deep learning?

  19.–21. Network Activations. Layer 1: 32³, Layer 2: 16³, Layer 3: 8³. Idea: ◮ Partition space adaptively based on sparse input
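
A minimal sketch of this partitioning idea (my own illustration in Python/NumPy, not the OctNet implementation): store homogeneous regions as single cells and subdivide only where the data varies, so memory and computation concentrate near occupied voxels.

```python
# Toy illustration (not the OctNet code): partition a sparse occupancy
# grid into shallow octree cells.  Homogeneous blocks become a single
# cell; mixed blocks are subdivided down to the voxel level.
import numpy as np

def partition(block, origin=(0, 0, 0)):
    """Recursively split a cubic occupancy block into constant cells."""
    size = block.shape[0]
    if size == 1 or block.min() == block.max():
        # All-empty or all-occupied: store one cell (origin, size, value).
        return [(origin, size, bool(block.flat[0]))]
    h = size // 2
    cells = []
    for dz in (0, h):
        for dy in (0, h):
            for dx in (0, h):
                sub = block[dz:dz + h, dy:dy + h, dx:dx + h]
                cells += partition(sub, (origin[0] + dz, origin[1] + dy, origin[2] + dx))
    return cells

# Sparse toy volume: a single occupied plane inside an 8x8x8 block.
vol = np.zeros((8, 8, 8), dtype=bool)
vol[4, :, :] = True
cells = partition(vol)
print(len(cells), "cells instead of", vol.size, "voxels")
```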

  22.–37. Convolution (animated): an example 3×3 kernel with weights [0.125 0.250 0.125; 0.000 0.000 0.000; 0.125 0.250 0.125] is slid across the grid.

  38.–40. Convolution ◮ Differentiable ⇒ allows for end-to-end learning

  41. Efficient Convolution. This operation can be implemented very efficiently: ◮ 4 different cases ◮ First case requires only 1 evaluation!
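
A small NumPy sketch of the intuition behind the single-evaluation case (my own illustration, using the 3×3 kernel from the animation): inside a cell with a constant value, every interior output equals that value times the kernel sum, so one multiplication suffices; only outputs whose receptive field crosses a cell border need the full per-voxel sum.

```python
# Why a constant octree cell needs only one kernel evaluation
# (2D for readability; the kernel is the one shown in the animation).
import numpy as np
from scipy.ndimage import convolve

k = np.array([[0.125, 0.250, 0.125],
              [0.000, 0.000, 0.000],
              [0.125, 0.250, 0.125]])

cell_value = 0.7
cell = np.full((8, 8), cell_value)          # one constant 8x8 cell

dense = convolve(cell, k, mode='nearest')   # per-pixel convolution
shortcut = cell_value * k.sum()             # single evaluation

# In the cell interior both agree exactly; in a real volume only the
# voxels at the cell border see neighbouring cells and need the full sum.
assert np.allclose(dense[1:-1, 1:-1], shortcut)
print("interior response:", shortcut)       # 0.7 * 1.0 = 0.7
```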

  42.–45. Pooling (animated) ◮ Unpooling operation defined similarly
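
For reference, a dense-grid sketch of the corresponding pooling and unpooling operations (my own illustration; OctNet applies the analogous operations on octree cells):

```python
# Dense counterparts of the pooling / unpooling used in the talk
# (my own sketch; OctNet performs the analogous ops on octree cells).
import numpy as np

def max_pool_2x(x):
    """2x2x2 max pooling of a cubic volume with even side length."""
    d, h, w = x.shape
    return x.reshape(d // 2, 2, h // 2, 2, w // 2, 2).max(axis=(1, 3, 5))

def unpool_2x(x):
    """Nearest-neighbour unpooling: repeat every cell into a 2x2x2 block."""
    return np.repeat(np.repeat(np.repeat(x, 2, 0), 2, 1), 2, 2)

x = np.random.rand(8, 8, 8)
y = max_pool_2x(x)        # (4, 4, 4)
z = unpool_2x(y)          # back to (8, 8, 8)
print(y.shape, z.shape)
```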

  46. Results: 3D Shape Classification. [Network diagram: airplane voxel grid → convolution and pooling → convolution and pooling → fully connected → fully connected]

  47. Results: 3D Shape Classification. [Plot: memory in GB (0–80) vs. input resolution (8³ to 256³) for OctNet and DenseNet]

  48. Results: 3D Shape Classification. [Plot: runtime in seconds (0–16) vs. input resolution (8³ to 256³) for OctNet and DenseNet]

  49. Results: 3D Shape Classification. [Plot: accuracy (0.70–0.95) vs. input resolution (8³ to 256³) for OctNet and DenseNet] ◮ Input: voxelized meshes from ModelNet

  50. Results: 3D Shape Classification. [Plot: accuracy (0.86–0.94) vs. input resolution (8³ to 256³) for OctNet 1, OctNet 2, and OctNet 3] ◮ Input: voxelized meshes from ModelNet

  51. Results: 3D Shape Classification

  52. Results: 3D Semantic Labeling. [Input vs. prediction renderings] ◮ Dataset: RueMonge2014

  53. Results: 3D Semantic Labeling. [Encoder-decoder diagram: convolution-and-pooling blocks, then unpooling-and-convolution blocks, with skip connections] ◮ Decoder octree structure copied from encoder
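
A compact PyTorch sketch of such an encoder-decoder with skip connections, written on a dense voxel grid as a stand-in for the octree version (layer widths and class count are illustrative, not the published architecture):

```python
# Dense-voxel stand-in for the encoder-decoder with skip connections
# (illustrative widths; the real network runs on the hybrid grid-octree).
import torch
import torch.nn as nn

class EncDec3D(nn.Module):
    def __init__(self, in_ch=1, num_classes=8, f=16):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv3d(in_ch, f, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv3d(f, 2 * f, 3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool3d(2)
        self.up = nn.Upsample(scale_factor=2, mode='nearest')      # "unpooling"
        self.dec2 = nn.Sequential(nn.Conv3d(4 * f, f, 3, padding=1), nn.ReLU())
        self.dec1 = nn.Sequential(nn.Conv3d(2 * f, f, 3, padding=1), nn.ReLU())
        self.out = nn.Conv3d(f, num_classes, 1)

    def forward(self, x):
        e1 = self.enc1(x)                 # full resolution
        e2 = self.enc2(self.pool(e1))     # 1/2 resolution
        b = self.pool(e2)                 # 1/4 resolution (bottleneck)
        d2 = self.dec2(torch.cat([self.up(b), e2], dim=1))   # skip from e2
        d1 = self.dec1(torch.cat([self.up(d2), e1], dim=1))  # skip from e1
        return self.out(d1)               # per-voxel class scores

y = EncDec3D()(torch.zeros(1, 1, 32, 32, 32))   # -> (1, 8, 32, 32, 32)
```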

  54. Results: 3D Semantic Labeling (IoU):
      [Riemenschneider et al., 2014]  42.3
      [Martinovic et al., 2015]       52.2
      [Gadde et al., 2016]            54.4
      OctNet 64³                      45.6
      OctNet 128³                     50.4
      OctNet 256³                     59.2

  55. OctNetFusion: Learning Depth Fusion from Data [Riegler, Ulusoy, Bischof & Geiger, 3DV 2017]

  56. Volumetric Fusion [Curless and Levoy, SIGGRAPH 1996]:
      d_{i+1}(p) = \frac{w_i(p)\, d_i(p) + \hat{w}(p)\, \hat{d}(p)}{w_i(p) + \hat{w}(p)}, \qquad w_{i+1}(p) = w_i(p) + \hat{w}(p)
      ◮ p ∈ R³: voxel location ◮ d: distance, w: weight
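
This running weighted average is only a few lines of code; a minimal NumPy sketch of one fusion step (variable names are mine):

```python
# One Curless & Levoy fusion step for a new depth map:
#   d, w         : current TSDF and weight volumes
#   d_new, w_new : truncated signed distances and weights from the new view
import numpy as np

def fuse(d, w, d_new, w_new):
    """Weighted running average per voxel, as in the update equation above."""
    w_out = w + w_new
    d_out = np.where(w_out > 0, (w * d + w_new * d_new) / np.maximum(w_out, 1e-8), d)
    return d_out, w_out

# Toy example: a 32^3 volume fused with one synthetic observation.
d = np.zeros((32, 32, 32))
w = np.zeros_like(d)
d_obs = np.random.uniform(-1, 1, d.shape)
w_obs = (np.abs(d_obs) < 0.5).astype(float)   # only trust voxels near the surface
d, w = fuse(d, w, d_obs, w_obs)
```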

  57.–58. Volumetric Fusion ◮ Pros: simple, fast, easy to implement; de facto "gold standard" (KinectFusion, Voxel Hashing, ...) ◮ Cons: requires many redundant views to reduce noise; can't handle outliers / complete missing surfaces [Ground Truth vs. Volumetric Fusion]

  59.–60. TV-L1 Fusion ◮ Pros: prior on surface area; noise reduction ◮ Cons: simplistic local prior (penalizes surface area, shrinking bias); can't complete missing surfaces [Ground Truth vs. Volumetric Fusion vs. TV-L1 Fusion]

  61.–62. Learned Fusion ◮ Pros: learn noise suppression from data; learn surface completion from data ◮ Cons: requires large 3D datasets for training; how to scale to high resolutions? [Ground Truth vs. Volumetric Fusion vs. TV-L1 Fusion vs. OctNetFusion]

  63. Learning 3D Fusion. [Encoder-decoder diagram: convolution-and-pooling blocks, then unpooling-and-convolution blocks, with skip connections] Input representation: ◮ TSDF ◮ higher-order statistics. Output representation: ◮ occupancy ◮ TSDF

  64.–65. Learning 3D Fusion. What is the problem? ◮ Octree structure unknown ⇒ needs to be inferred as well!

  66. OctNetFusion Architecture. [Coarse-to-fine diagram: 64³ input → 64³ output features and Δ64 octree structure; 128³ input → 128³ output features and Δ128 octree structure; 256³ input → 256³ output]
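
One way to read the coarse-to-fine scheme in this figure, as a hedged sketch: each stage receives the input resampled to its resolution together with the upsampled estimate from the previous stage and predicts a refined volume. The per-stage networks and resolutions below are placeholders, not the published OctNetFusion architecture.

```python
# Hedged sketch of a coarse-to-fine refinement loop: each stage refines
# the upsampled output of the previous one.  Stage networks and
# resolutions are placeholders, not the published architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Stage(nn.Module):
    def __init__(self, ch=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(2, ch, 3, padding=1), nn.ReLU(),
            nn.Conv3d(ch, 1, 3, padding=1))

    def forward(self, noisy_tsdf, prev):
        # Predict a refined volume from the stage input and the coarse estimate.
        return self.net(torch.cat([noisy_tsdf, prev], dim=1))

stages = [Stage(), Stage(), Stage()]
resolutions = [16, 32, 64]                      # stand-ins for 64/128/256
x = torch.randn(1, 1, 64, 64, 64)               # noisy fused input volume
out = torch.zeros(1, 1, resolutions[0], resolutions[0], resolutions[0])
for res, stage in zip(resolutions, stages):
    inp = F.interpolate(x, size=(res, res, res), mode='trilinear', align_corners=False)
    out = F.interpolate(out, size=(res, res, res), mode='trilinear', align_corners=False)
    out = stage(inp, out)                       # refined estimate at this resolution
```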

  67. Results: Surface Reconstruction. [Qualitative comparison at 64³, 128³, and 256³: VolFus, TV-L1, Ours, Ground Truth]

  68. Results: Volumetric Completion. [Qualitative comparison: [Firman, 2016], Ours, Ground Truth]

  69. Thank you!
