High-Dimensional Convolutional Neural Networks for 3D Perception, Chris Choy, Ph.D. candidate @ Stanford Vision and Learning Lab (PowerPoint presentation)

  1. High-Dimensional Convolutional Neural Networks for 3D Perception. Chris Choy, Ph.D. candidate @ Stanford Vision and Learning Lab

  2. The Success of Convolutional Networks: FCNN [Long et al.], AlexNet [Krizhevsky et al.], R-CNN [Girshick et al.], GAN [Goodfellow et al.]

  3. The Success of Convolutional Networks: versatility, experience, and efficiency. Applications: speech recognition [Abdel-Hamid et al.], object detection, semantic segmentation, machine translation

  4. Examples of 3D Vision Tasks: 3D registration, 3D reconstruction, 3D object pose estimation, 3D object tracking

  5. 3D Vision in Action: Microsoft HoloLens, Amazon AR View, Nvidia Research (2019)

  6. Outline: ● 3D Reconstruction: supervised reconstruction ● 3D Perception: 3D semantic segmentation, 3D feature learning, perception on a set of 3D data, 4D spatio-temporal perception, 4D and 6D for registration

  8. 3D Reconstruction:
     ● 3D-Recurrent Reconstruction Neural Networks, Chris, Danfei, JunYoung, Kevin, Silvio, ECCV’16
     ● Universal Correspondence Network, Chris, JunYoung, Silvio, Manmohan, NIPS’16
     ● Weakly Supervised 3D Reconstruction with Adversarial Constraint, JunYoung, Chris, Manmohan, Animesh, Silvio, 3DV’17
     ● DeformNet: Free-Form Deformation Network for 3D Shape Reconstruction from a Single Image, Andrey, Jingwei, Animesh, Viraj, JunYoung, Chris, Silvio, WACV’18
     ● Text2Shape: Generating Shapes from Natural Language by Learning Joint Embeddings, Kevin, Chris, Manolis, Angel, Thomas, Silvio, ACCV’18
     ● 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, Chris, JunYoung, Silvio, CVPR’19

  9. 3D Reconstruction from Few Images ● Single- or multi-view images of an object ● Online retail stores (figure: input images and their 3D reconstruction)

  10. 3D Reconstruction from Few Images ● Wide baseline ● Specular / texture-less regions ● Single view

  11. 3D Reconstruction: observations (images) → algorithms → 3D representation. Algorithms include depth estimation [Eigen et al., Saxena et al., …], structure from motion [Longuet-Higgins, Haming et al., Snavely et al., …], MVS, tomography, object-centric reconstruction, …

  12. 3D Recurrent Reconstruction Neural Networks ● End-to-end 3D reconstruction ● Unified framework ● Single-view & multi-view reconstruction ● 3D convolutional LSTM ● Updates hidden states (Chris, Danfei, JunYoung, Kevin, Silvio, 3D-Recurrent Reconstruction Neural Networks, ECCV’16)
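
The 3D convolutional LSTM update described above can be illustrated with a minimal pure-Python sketch. It assumes a 4 × 4 × 4 grid with one hidden unit per cell, a global image feature vector mapped to every cell by a per-gate weight matrix, and a 3 × 3 × 3 convolution over neighboring hidden states; the cell omits an output gate. The names (`Conv3DLSTMCell`, `conv3d_same`), sizes, and initialization are hypothetical choices for illustration, not the ECCV’16 implementation.

```python
import math
import random

GATES = ("f", "i", "s")  # forget gate, input gate, candidate state


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def conv3d_same(grid, kernel, n):
    """3x3x3 convolution with zero padding over an n*n*n grid stored as a flat list."""
    out = [0.0] * (n ** 3)
    for x in range(n):
        for y in range(n):
            for z in range(n):
                acc = 0.0
                for dx in (-1, 0, 1):
                    for dy in (-1, 0, 1):
                        for dz in (-1, 0, 1):
                            u, v, w = x + dx, y + dy, z + dz
                            if 0 <= u < n and 0 <= v < n and 0 <= w < n:
                                k = (dx + 1) * 9 + (dy + 1) * 3 + (dz + 1)
                                acc += kernel[k] * grid[(u * n + v) * n + w]
                out[(x * n + y) * n + z] = acc
    return out


class Conv3DLSTMCell:
    """Hypothetical 3D conv-LSTM cell: one hidden unit per grid cell, no output gate."""

    def __init__(self, n, feat_dim, seed=0):
        rng = random.Random(seed)
        self.n = n
        cells = n ** 3
        # Per gate: a fully connected map from the image feature to every grid cell ...
        self.W = {g: [[rng.gauss(0.0, 0.1) for _ in range(feat_dim)] for _ in range(cells)]
                  for g in GATES}
        # ... and a 3x3x3 kernel that mixes neighboring hidden states.
        self.U = {g: [rng.gauss(0.0, 0.1) for _ in range(27)] for g in GATES}
        self.b = {"f": 1.0, "i": 0.0, "s": 0.0}  # positive forget bias: a common heuristic

    def step(self, x, h, s):
        """Consume one view's feature vector x; return the updated (hidden, cell) grids."""
        pre = {}
        for g in GATES:
            mixed = conv3d_same(h, self.U[g], self.n)
            pre[g] = [sum(wk * xk for wk, xk in zip(self.W[g][c], x)) + mixed[c] + self.b[g]
                      for c in range(self.n ** 3)]
        new_s = [sigmoid(pre["f"][c]) * s[c] + sigmoid(pre["i"][c]) * math.tanh(pre["s"][c])
                 for c in range(len(s))]
        new_h = [math.tanh(v) for v in new_s]
        return new_h, new_s


# Feed three random stand-in view features through the cell, one view at a time.
n, feat_dim = 4, 8
cell = Conv3DLSTMCell(n, feat_dim)
h = [0.0] * n ** 3
s = [0.0] * n ** 3
rng = random.Random(1)
history = []
for _ in range(3):
    x = [rng.gauss(0.0, 1.0) for _ in range(feat_dim)]
    h, s = cell.step(x, h, s)
    history.append(list(h))
```

Each call to `step` consumes one view, so the same cell handles single-view input (one call) and multi-view input (several calls); the grid of hidden states is what carries information from one view to the next.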

  18. Update / maintain prediction: confidence on the armrests increases with the number of images (Chris, Danfei, JunYoung, Kevin, Silvio, 3D-Recurrent Reconstruction Neural Networks, ECCV’16)

  19. Robustness to texture and number of views (Chris, Danfei, JunYoung, Kevin, Silvio, 3D-Recurrent Reconstruction Neural Networks, ECCV’16)

  21. 3D Perception:
     ● SegCloud: Semantic Segmentation of 3D Point Clouds, Lyne, Chris, Iro, JunYoung, Silvio, 3DV’17
     ● 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, Chris, JunYoung, Silvio, CVPR’19
     ● Fully Convolutional Geometric Features, Chris, Jaesik, Vladlen, ICCV’19

  22. Sparsity of 3D data: in an N × N × N voxel grid, a surface occupies O(N^2) voxels vs. O(N^3) for the full volume

  23. 20 cm voxels: 18%

  24. 10 cm voxels: 9%

  25. 5 cm voxels: 4.5%

  26. 2.5 cm voxels: 1.8%
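
Read as the fraction of occupied voxels (an interpretation; the slides give only the percentages), the numbers above roughly halve each time the voxel size halves, which matches the surface-to-volume ratio O(N^2)/O(N^3) = O(1/N). The sketch below checks that scaling on a synthetic surface: a unit sphere sampled with a Fibonacci lattice stands in for a scanned scene (an assumption), and `fibonacci_sphere` and `occupancy_ratio` are hypothetical helpers.

```python
import math


def fibonacci_sphere(num_points):
    """Roughly uniform sample points on the unit sphere (Fibonacci lattice)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(num_points):
        y = 1.0 - 2.0 * (k + 0.5) / num_points  # strictly inside (-1, 1)
        radius = math.sqrt(1.0 - y * y)
        theta = golden_angle * k
        points.append((radius * math.cos(theta), y, radius * math.sin(theta)))
    return points


def occupancy_ratio(points, cells_per_axis):
    """Fraction of voxels in the bounding cube [-1, 1]^3 that contain at least one point."""
    voxel = 2.0 / cells_per_axis
    occupied = set()
    for p in points:
        # Clamp so a coordinate of exactly +1 falls into the last cell.
        occupied.add(tuple(min(int((c + 1.0) / voxel), cells_per_axis - 1) for c in p))
    return len(occupied) / cells_per_axis ** 3


surface = fibonacci_sphere(150_000)
ratios = {n: occupancy_ratio(surface, n) for n in (16, 32, 64)}
for n, r in ratios.items():
    print(f"{n:>2} voxels per axis: {100 * r:.1f}% occupied")
```

Doubling the number of voxels per axis (halving the voxel size) should roughly halve the printed percentage; the exact values depend on the synthetic shape and are not meant to reproduce the slides' numbers.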

  27. Sparse Representations and Convolution ● Continuous representation: Occupancy Net [Mescheder et al.], Deep SDF [Park et al.], Deep Level Sets [Michalkiewicz et al.], … ● Discrete representation: Points and PointNet [Qi et al.], OctNet and Octree [Riegler et al.], Sparse Tensor [Graham et al., Choy et al.], … ● Graph representation: Graph Net [Kipf & Welling], Conv on Graph [Defferrard et al.], … ● Hybrid representation: Continuous Convolution (PointCNN, Monte Carlo Conv, Surface / Tangent Conv, …), Continuous + Graph, …

  28. Sparse Matrix ● Majority of elements are 0 ● Efficient representation: store non-zero elements only ● Formats: compressed sparse row (CSR), list of lists, COOrdinate list (COO), etc. ● Example: COO representation of a 2x2 matrix ○ 4 at (0, 0) ○ 1 at (1, 1)
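
The COO example above, and its compression to CSR, can be sketched in a few lines of plain Python. The helper names are hypothetical; libraries such as SciPy (`scipy.sparse.coo_matrix`, `scipy.sparse.csr_matrix`) provide production versions of these formats.

```python
def dense_to_coo(matrix):
    """Return (rows, cols, values) for the non-zero entries, in row-major order."""
    rows, cols, vals = [], [], []
    for r, row in enumerate(matrix):
        for c, v in enumerate(row):
            if v != 0:
                rows.append(r)
                cols.append(c)
                vals.append(v)
    return rows, cols, vals


def coo_to_csr(rows, cols, vals, num_rows):
    """Compress the row indices of a row-sorted COO list into CSR row pointers."""
    indptr = [0] * (num_rows + 1)
    for r in rows:
        indptr[r + 1] += 1          # count entries per row
    for r in range(num_rows):
        indptr[r + 1] += indptr[r]  # prefix sum gives each row's start offset
    return indptr, list(cols), list(vals)


def coo_to_dense(rows, cols, vals, shape):
    out = [[0] * shape[1] for _ in range(shape[0])]
    for r, c, v in zip(rows, cols, vals):
        out[r][c] = v
    return out


A = [[4, 0],
     [0, 1]]
coo = dense_to_coo(A)               # ([0, 1], [0, 1], [4, 1]): 4 at (0, 0), 1 at (1, 1)
csr = coo_to_csr(*coo, num_rows=2)  # indptr [0, 1, 2], indices [0, 1], data [4, 1]
assert coo_to_dense(*coo, shape=(2, 2)) == A
```

COO is simple to build and extend, while CSR trades that flexibility for fast row access; both store only the two non-zero entries.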

  29. Sparse Tensor ● High-dimensional extension of the sparse matrix ● COOrdinate representation ○ 4 at (0, 0, 0) ○ 1 at (1, 1, 0) ○ 9 at (1, 1, 1)
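
A sparse tensor generalizes the same idea: keep a list of integer coordinates and a parallel list of features, and index the coordinates with a hash table so that a feature can be looked up in expected O(1) time. This is a minimal sketch with hypothetical helper names; it stores one scalar per coordinate, whereas a network feature map would store a feature vector per coordinate.

```python
def dense_to_sparse(tensor):
    """Convert a nested-list 3D tensor into coordinate and feature lists (COO)."""
    coords, feats = [], []
    for i, plane in enumerate(tensor):
        for j, row in enumerate(plane):
            for k, v in enumerate(row):
                if v != 0:
                    coords.append((i, j, k))
                    feats.append(v)
    return coords, feats


def build_index(coords):
    """Hash table from coordinate tuple to its row in the feature list."""
    return {c: row for row, c in enumerate(coords)}


# 2 x 2 x 2 tensor holding the three non-zeros from the slide.
T = [[[4, 0], [0, 0]],
     [[0, 0], [1, 9]]]
coords, feats = dense_to_sparse(T)  # [(0, 0, 0), (1, 1, 0), (1, 1, 1)], [4, 1, 9]
index = build_index(coords)


def value_at(coord):
    row = index.get(coord)
    return feats[row] if row is not None else 0  # absent coordinates are implicit zeros
```

The dense tensor needs 2^3 = 8 entries; the sparse one stores three coordinates and three features, and the gap widens quickly as the dimension or resolution grows.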

  30. Convolution on a Sparse Tensor: Sparse Convolution ● Cannot support arbitrary sparsity ● Dense tensor kernel ● Static sparsity pattern [Graham et al., Submanifold Sparse ConvNet, 2017] [Graham and Maaten, 3D Sparse ConvNet, 2018]

  31. Generalized Convolution ● Can support arbitrary sparsity ● Sparse tensor kernel ● Dynamic sparsity pattern [Graham et al.] [Choy et al.] (Choy et al., 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, CVPR’19)
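
One way to write the generalized convolution, following our reading of the CVPR’19 formulation, is x_out[u] = Σ W[i] · x_in[u + i], summed over the kernel offsets i for which u + i is an input coordinate, and evaluated at every output coordinate u in a freely chosen set C_out. The sketch below implements that sum with scalar features and weights stored in dictionaries; in a real network W[i] would be a matrix and each x a feature vector. All names are hypothetical.

```python
from itertools import product


def cube_offsets(kernel_size, dim):
    """Offsets of a dense kernel_size^dim kernel centered at the origin."""
    r = kernel_size // 2
    return list(product(range(-r, r + 1), repeat=dim))


def generalized_conv(in_feats, weights, out_coords):
    """x_out[u] = sum over offsets i with u + i in C_in of W[i] * x_in[u + i].

    in_feats:   dict coordinate tuple -> scalar feature (its keys define C_in)
    weights:    dict offset tuple -> scalar weight (the kernel; may itself be sparse)
    out_coords: iterable of coordinate tuples (C_out, chosen freely)
    """
    out = {}
    for u in out_coords:
        acc = 0.0
        for offset, w in weights.items():
            v = tuple(a + b for a, b in zip(u, offset))
            if v in in_feats:  # only offsets that land on an input coordinate contribute
                acc += w * in_feats[v]
        out[u] = acc
    return out


x = {(0, 0): 1.0, (1, 1): 2.0, (3, 3): 5.0}
W = {o: 1.0 for o in cube_offsets(3, 2)}  # 3x3 box filter with unit weights

same = generalized_conv(x, W, out_coords=x.keys())            # {(0,0): 3.0, (1,1): 3.0, (3,3): 5.0}
grown = generalized_conv(x, W, out_coords=[(2, 2), (0, 1)])   # {(2,2): 7.0, (0,1): 3.0}
```

Choosing C_out = C_in keeps the sparsity pattern fixed (the submanifold-style case), while any other C_out, such as (2, 2) above, produces outputs at coordinates that were empty in the input.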

  32. Generalized Convolution ● Can support arbitrary sparsity: high-dimensional ConvNets ● Sparse tensor kernel: the volume of a dense convolution kernel is O(N^D), while a sparse convolution kernel can be O(D) ● Dynamic sparsity pattern: sparsity pattern manipulation (e.g., C = A + B, pruning) for generative tasks
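
The sparsity-pattern manipulations listed above can be sketched on dictionary-based sparse tensors: addition produces the union of the input coordinate sets, and pruning removes coordinates. The threshold used for pruning here is an arbitrary stand-in (an assumption); in a generative network it would typically come from a predicted score. Function names are hypothetical.

```python
def sparse_add(a, b):
    """C = A + B on dict-based sparse tensors: the output coordinates are the union."""
    c = dict(a)  # copy so that A is left unchanged
    for coord, v in b.items():
        c[coord] = c.get(coord, 0.0) + v
    return c


def prune(t, keep):
    """Drop coordinates for which keep(coord, value) is False, shrinking the pattern."""
    return {coord: v for coord, v in t.items() if keep(coord, v)}


A = {(0, 0, 0): 1.0, (1, 1, 0): 2.0}
B = {(1, 1, 0): 3.0, (2, 0, 1): 4.0}
C = sparse_add(A, B)                     # {(0,0,0): 1.0, (1,1,0): 5.0, (2,0,1): 4.0}
P = prune(C, lambda coord, v: v > 1.5)   # removes (0, 0, 0)
```

Because the output coordinate set is computed rather than fixed in advance, the sparsity pattern can both grow (union) and shrink (pruning) from layer to layer.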

  33. Generalized Convolution: Special Cases (sparse tensor kernel, dynamic sparsity pattern, arbitrary sparsity) • Octree generative networks • Dilated convolution • Separable convolution • Sparse convolution • Dense convolution
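
Several of these special cases differ mainly in which kernel offsets are used. This sketch enumerates a dense hypercube kernel (with optional dilation) and a cross-shaped kernel that keeps only the center and the offsets along each axis. It also illustrates the kernel-volume comparison, taking N as the kernel size per axis (an assumption): the hypercube has N^D offsets, while the cross has 1 + 2·⌊N/2⌋·D, which is O(D). The cross shape is just one example of a sparse kernel, and the function names are hypothetical.

```python
from itertools import product


def hypercube(kernel_size, dim, dilation=1):
    """All offsets of a dense (optionally dilated) kernel: kernel_size ** dim entries."""
    r = kernel_size // 2
    return [tuple(dilation * o for o in offs)
            for offs in product(range(-r, r + 1), repeat=dim)]


def hypercross(kernel_size, dim):
    """Center plus offsets along each axis only: 1 + 2 * (kernel_size // 2) * dim entries."""
    r = kernel_size // 2
    offsets = [(0,) * dim]
    for axis in range(dim):
        for step in list(range(-r, 0)) + list(range(1, r + 1)):
            o = [0] * dim
            o[axis] = step
            offsets.append(tuple(o))
    return offsets


for dim in (2, 3, 4, 6):
    print(dim, len(hypercube(3, dim)), len(hypercross(3, dim)))
# 2 9 5
# 3 27 7
# 4 81 9
# 6 729 13
```

With every coordinate occupied, the hypercube offsets reproduce an ordinary dense convolution, and scaling the offsets (`dilation=2`) gives a dilated one; the cross keeps the kernel size linear in D, which matters most for the 4D and higher-dimensional cases.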

  34. Minkowski Engine: a convolutional neural network library for sparse tensors ● Convolution ● [Max/Avg/Global] pooling ● Broadcast ● [Batch/Instance] normalization ● Tensor arithmetic ● Pruning ● … (Choy et al., 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, CVPR’19)

  35. Minkowski Network ● Very deep convolutional neural networks are possible in 3D ○ 42-layer networks for semantic segmentation ○ 101 layers for classification ● Reuse network architectures from years of research in 2D (figure: 4D MinkNet18 vs. ResNet18) (Choy et al., 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, CVPR’19)

  36. Minkowski Engine for other applications (Choy et al., 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, CVPR’19)

  37. Sparsity Pattern Reconstruction (Choy et al., 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, CVPR’19)

  38. 3D Perception: Semantic Segmentation ● Partition 3D scans or data into semantic parts ● Label each voxel or 3D point with one of the semantic labels

  39. 3D Semantic Segmentation on Sparse Tensors ● Sparse tensors for all input/output feature maps ● U-shaped network ○ Hierarchical map ○ Increases receptive field size exponentially (Choy et al., 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, CVPR’19)
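
The exponential growth of the receptive field in a hierarchical (U-shaped) network follows from the standard receptive-field recurrence: each layer adds (k - 1) times the current input-space stride, and every stride-2 downsampling doubles that stride. The layer configuration below is a hypothetical example, not the architecture from the paper; receptive fields are measured in voxels of the input grid.

```python
def receptive_field(layers):
    """Receptive field (in input voxels) of a stack of (kernel_size, stride) layers.

    rf grows by (k - 1) * jump, where jump is the input-space distance between
    adjacent outputs; jump is multiplied by each layer's stride.
    """
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf


# Hypothetical encoder: per level, one kernel-3 conv (stride 1) and one kernel-2, stride-2 downsampling.
for levels in range(1, 7):
    hierarchical = receptive_field([(3, 1), (2, 2)] * levels)
    flat = receptive_field([(3, 1)] * (2 * levels))  # same depth, no downsampling
    print(levels, hierarchical, flat)
# 1 4 5
# 2 10 9
# 3 22 13
# 4 46 17
# 5 94 21
# 6 190 25
```

The hierarchical stack follows 3·2^L - 2 while the flat stack grows as 1 + 4L, so the downsampling path reaches scene-scale context with far fewer layers.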

  40. Results: ScanNet (Choy et al., 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, CVPR’19)

  41. Results: Stanford 3D (Choy et al., 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks, CVPR’19)

  43. 3D Feature Learning:
     ● Universal Correspondence Network, Chris, JunYoung, Silvio, Manmohan, NIPS’16
     ● Fully Convolutional Geometric Features, Chris, Jaesik, Vladlen, ICCV’19

  44. 3D Geometric Feature ● A vector representation of the local / global 3D geometry ○ Used for correspondence, registration, tracking, scene flow, ...
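
Geometric features support correspondence by matching in feature space. A common recipe, assumed here rather than taken from the slides, is to pair each point with its nearest neighbor among the other scan's features and keep only mutual nearest neighbors. The toy 2D features and the function names are hypothetical.

```python
import math


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, candidates):
    """Index of the candidate feature closest to query."""
    return min(range(len(candidates)), key=lambda j: l2(query, candidates[j]))


def mutual_nn_matches(feats_a, feats_b):
    """Pairs (i, j) where b_j is a_i's nearest neighbor in feature space and vice versa."""
    matches = []
    for i, fa in enumerate(feats_a):
        j = nearest(fa, feats_b)
        if nearest(feats_b[j], feats_a) == i:
            matches.append((i, j))
    return matches


# Toy "features" for points in two scans; feats_b is a shuffled, slightly perturbed copy.
feats_a = [(0.0, 1.0), (1.0, 0.0), (0.7, 0.7)]
feats_b = [(0.71, 0.69), (0.02, 0.98), (0.99, 0.03)]
print(mutual_nn_matches(feats_a, feats_b))  # [(0, 1), (1, 2), (2, 0)]
```

The resulting correspondences are what downstream steps such as registration or tracking consume; the mutual check simply discards one-sided, likely ambiguous matches.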

  45. Prior Works in 3D Geometric Features ● Hand-designed features: Spin Image, USC, SHOT, PFH, FPFH ● Learned features: 3DMatch, CGF, PointNet, PPF, FoldNet, PPFFold, CapsuleNet, DirectReg, SmoothNet ● Extract a small 3D patch ○ Limits context and receptive field ○ Features are extracted separately ● Preprocessing ○ Normals, signed distance function, curvatures (Choy et al., Fully Convolutional Geometric Features, ICCV’19)

  46. Fully Convolutional Metric Learning ● No preprocessing, no patch extraction ○ No receptive-field limit imposed by the crop size ○ Efficient reuse of shared computation ● Hardest negative mining (Choy et al., Universal Correspondence Network, NIPS’16; Choy et al., Fully Convolutional Geometric Features, ICCV’19)
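
Hardest negative mining can be sketched as a contrastive loss in which each positive pair is pulled within a margin and each anchor is pushed away only from its closest non-matching feature. This is a simplified variant for illustration: the margins (0.1 and 1.4), the brute-force search over all candidates, and the function name are assumptions, not the exact loss or sampling scheme of the NIPS’16 or ICCV’19 papers.

```python
import math


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hardest_contrastive_loss(feats_a, feats_b, positives, pos_margin=0.1, neg_margin=1.4):
    """Contrastive loss where each anchor is pushed only against its hardest negative.

    positives: list of (i, j) where feats_a[i] <-> feats_b[j] is a true correspondence.
    The hardest negative for anchor i is the closest feats_b[k] with k != j.
    """
    total = 0.0
    for i, j in positives:
        d_pos = dist(feats_a[i], feats_b[j])
        d_neg = min(dist(feats_a[i], feats_b[k]) for k in range(len(feats_b)) if k != j)
        total += max(d_pos - pos_margin, 0.0) ** 2 + max(neg_margin - d_neg, 0.0) ** 2
    return total / len(positives)


feats_a = [(0.0, 0.0), (2.0, 0.0)]
positives = [(0, 0), (1, 1)]
easy = hardest_contrastive_loss(feats_a, [(0.05, 0.0), (2.0, 0.05)], positives)  # well separated
hard = hardest_contrastive_loss(feats_a, [(0.05, 0.0), (0.5, 0.0)], positives)   # confusable
```

In the well-separated case every positive is within its margin and every hardest negative is beyond its margin, so the loss is zero; in the confusable case both terms become active. Mining only the hardest negative focuses the gradient on the pairs most likely to cause wrong matches; a practical implementation would likely search a sampled subset rather than all candidates.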
