
Scene Understanding with 3D Deep Networks, Thomas Funkhouser



  1. Scene Understanding with 3D Deep Networks Thomas Funkhouser Princeton University

  2. Disclaimer: I am talking about the work of these people: Shuran Song, Fisher Yu, Yinda Zhang, Andy Zeng, Maciej Halber, Jianxiong Xiao, Angela Dai, Matt Fisher, Matthias Niessner

  3. Goal: Understanding indoor scenes observed in RGB-D images • Robotics • Augmented reality • Virtual tourism • Surveillance • Home remodeling • Real estate • Telepresence • Forensics • Games • etc.

  4. Goal: Understanding indoor scenes observed in RGB-D images [Figure: Input RGB-D Image(s); Semantic Segmentation]

  5. Goal: Understanding indoor scenes observed in RGB-D images in 3D [Figure: Input RGB-D Image(s); Semantic Segmentation; 3D Scene Understanding]

  6. Goal: Understanding indoor scenes observed in RGB-D images in 3D • Surface reconstruction • Amodal object detection • Object relationships • Materials, lights, etc. • Physical properties • Novel views • Info sharing • Spatial inference • Simulation • etc. [Figure label: Semantic Segmentation]

  7. Goal for This Talk: Learn ConvNets to recognize patterns in voxels • Local shape descriptor • Amodal object detection • Semantic scene completion

  8. Talk Outline (scale, small to large): Local shape descriptor • Amodal object detection • Semantic scene completion

  9. Talk Outline (scale, small to large): Local shape descriptor • Amodal object detection • Semantic scene completion. A. Zeng, S. Song, M. Niessner, M. Fisher, J. Xiao, T. Funkhouser, “3DMatch: Learning Local Geometric Descriptors from 3D Reconstructions,” submitted to CVPR 2017

  10. Local Shape Descriptor Goal: train a discriminating 3D local shape descriptor from data [Figure: two local surface patches are each passed through the local shape descriptor, giving feature vectors (0.58 0.21 0.92 0.67 0.04 0.53 …); the two vectors agree: Match!]

  11. Local Shape Descriptor Challenge: where to get training data?

  12. Local Shape Descriptor: “3D Match” Approach: train on wide-baseline correspondences in RGB-D reconstructions [Figure: “ground truth” match between RGB-D images from different views]

  13. Local Shape Descriptor: “3D Match” Approach: train on wide-baseline correspondences in RGB-D reconstructions

  14. Local Shape Descriptor: “3D Match” Method: sample true/false correspondences from RGB-D reconstructions, train Siamese network
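
Note on slide 14: the slide names a Siamese network trained on true/false correspondences but does not state the loss. The following is a minimal sketch, assuming a contrastive loss on the distance between the two descriptors (a common choice for Siamese training, not confirmed by the slide). The function names and the margin value are hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(desc_a, desc_b, is_match, margin=1.0):
    """Contrastive loss for one descriptor pair (assumed loss, see note above).

    True correspondences are pulled together (squared distance).
    False correspondences are pushed apart until they are at least
    `margin` away, after which they contribute no loss.
    """
    d = l2_distance(desc_a, desc_b)
    if is_match:
        return d ** 2
    return max(0.0, margin - d) ** 2


# A matching pair with identical descriptors costs nothing;
# a non-matching pair that is too close is penalized.
print(contrastive_loss([0.0, 0.0], [0.0, 0.0], True))
print(contrastive_loss([0.0, 0.0], [0.5, 0.0], False))
```

Both branches of the Siamese network share weights, so the same descriptor function produces `desc_a` and `desc_b`; only the pair label changes which term of the loss applies.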

  15. Local Shape Descriptor: “3D Match” Result: learns to discriminate local shapes found in real-world data

  16. Local Shape Descriptor: “3D Match” Results Result 1: learned feature descriptor predicts RGB-D point correspondences more accurately than hand-tuned descriptors [Charts: match classification error at 95% recall; fragment alignment success rate]
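
Note on slide 16: the slide reports "match classification error at 95% recall" without defining it. One common reading, assumed here, is: pick the descriptor-distance threshold that accepts 95% of the true matches, then report the fraction of non-matches that the same threshold also accepts. The function name and inputs are hypothetical.

```python
import math


def error_at_recall(match_dists, nonmatch_dists, recall=0.95):
    """False-positive rate at the threshold that reaches the given recall.

    match_dists:    descriptor distances for true correspondences
    nonmatch_dists: descriptor distances for false correspondences
    """
    sorted_matches = sorted(match_dists)
    # Smallest number of true matches that must be accepted.
    k = max(1, math.ceil(recall * len(sorted_matches)))
    threshold = sorted_matches[k - 1]
    false_positives = sum(1 for d in nonmatch_dists if d <= threshold)
    return false_positives / len(nonmatch_dists)


# 20 true matches with distances 0.01 .. 0.20, and 4 non-matches.
matches = [i / 100 for i in range(1, 21)]
nonmatches = [0.05, 0.5, 0.6, 0.7]
print(error_at_recall(matches, nonmatches))
```

A lower value means the descriptor separates true from false correspondences better at the chosen operating point.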

  17. Local Shape Descriptor: “3D Match” Results Result 2: feature descriptor learned from RGB-D reconstructions provides matching for recognizing poses of small objects in the Amazon Picking Challenge [Figure: predicting pose of 3D object model in RGB-D scan; table: object pose prediction accuracy]

  18. Local Shape Descriptor: “3D Match” Results Result 3: feature descriptor learned from RGB-D reconstructions provides discriminative matching of semantic correspondences on 3D meshes

  19. Talk Outline (scale, small to large): Local shape descriptor • Amodal object detection • Semantic scene completion. S. Song and J. Xiao, “Deep Sliding Shapes for Amodal 3D Object Detection in RGB-D Images,” CVPR 2016

  20. Object Detection Goal: given an RGB-D image, find objects (labeled 3D amodal bounding boxes) Input: single RGB-D image Output: labeled 3D amodal boxes

  21. Object Detection Most previous work: Image + Depth Map (3D input) → Encode Depth Map as Extra Channels → 2D Contour Detection → 2D Region Proposal → 2D Object Detection → 2D Instance Segmentation → Coarse Pose Classification → Point Cloud Alignment (3D) → 3D Amodal Detection Result (3D output); the intermediate steps are 2D operations. [CVPR13] Perceptual Organization and Recognition of Indoor Scenes from RGB-D Images [IJCV14] Indoor Scene Understanding with RGB-D Images: Bottom-up Segmentation, Object Detection and Semantic Segmentation [ECCV14] Object Detection and Segmentation using Semantically Rich Image and Depth Features [CVPR15] Aligning 3D Models to RGB-D Images of Cluttered Scenes [CVPR16] Cross Modal Distillation for Supervision Transfer

  22. Object Detection: “Deep Sliding Shapes” Approach: Image + Depth Map (3D input) → 3D Deep Learning (3D operations) → 3D Amodal Detection Result (3D output)

  23. Object Detection: “Deep Sliding Shapes” RGB-D Image → Region Proposal Network → Object Recognition Network → “bed”

  24. Object Detection: “Deep Sliding Shapes” RGB-D Image → Region Proposal Network → Object Recognition Network → “bed”

  25. Object Detection: “Deep Sliding Shapes” Data encoding: 1) Estimate major directions of room 2) Compute TSDF

  26. Object Detection: “Deep Sliding Shapes” Data encoding: 1) Estimate major directions of room 2) Compute TSDF [Figure: volume dimensions 2.5 m, 5.2 m, 5.2 m]

  27. Object Detection: “Deep Sliding Shapes” Data encoding: 1) Estimate major directions of room 2) Compute TSDF
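
Note on slides 25-27: the slides say "Compute TSDF" (truncated signed distance function) without giving the formula. The following is a minimal sketch, assuming the simplest projective variant, where the signed distance is measured along a single camera ray. The exact encoding used in the talk may differ; the function names and the truncation value are hypothetical.

```python
def projective_tsdf(voxel_depth, surface_depth, truncation=0.1):
    """Truncated signed distance along one camera ray, scaled to [-1, 1].

    Positive in front of the observed surface (free space),
    negative behind it (occluded), zero on the surface.
    Depths and truncation are in meters.
    """
    signed = surface_depth - voxel_depth
    return max(-1.0, min(1.0, signed / truncation))


def tsdf_along_ray(voxel_depths, surface_depth, truncation=0.1):
    """TSDF values for every voxel that one depth pixel's ray passes through."""
    return [projective_tsdf(z, surface_depth, truncation) for z in voxel_depths]


# A surface observed at 1.0 m: voxels in front, on, and behind it.
print(tsdf_along_ray([0.5, 1.0, 1.5], 1.0))
```

Truncation keeps the values informative only near the surface, which is the region a 3D ConvNet needs in order to see shape.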

  28. Object Detection: “Deep Sliding Shapes” 3D region proposal network: TSDF → Region Proposal Network → 3D Region Proposals

  29. Object Detection: “Deep Sliding Shapes” 3D region proposal network: Physical Size ×50 Pixel Area ×3

  30. Object Detection: “Deep Sliding Shapes” Multiscale 3D region proposal network:

  31. Object Detection: “Deep Sliding Shapes” Multiscale 3D region proposal network: Input: TSDF → Conv 1 → Conv 2, each followed by ReLU + Pool

  32. Object Detection: “Deep Sliding Shapes” Multiscale 3D region proposal network: Input: TSDF → Conv 1 → Conv 2 → Conv 3, each followed by ReLU + Pool; output heads: Class (Conv, Softmax) and 3D Box (Conv, Smooth L1). Receptive field: 0.4 m³

  33. Object Detection: “Deep Sliding Shapes” Multiscale 3D region proposal network: Input: TSDF → Conv 1 → Conv 2 → Conv 3, each followed by ReLU + Pool; output heads: Class (Conv, Softmax) and 3D Box (Conv, Smooth L1). Level 1 Anchors include 0.6×0.2×0.4 m and 0.5×0.5×0.2 m. Receptive field: 0.4 m³

  34. Object Detection: “Deep Sliding Shapes” Multiscale 3D region proposal network: Input: TSDF → Conv 1 → Conv 2 → Conv 3 → Conv 4, each followed by ReLU + Pool; output heads after Conv 3 (receptive field: 0.4 m³) and after Conv 4 (receptive field: 1 m³), each with Class (Conv, Softmax) and 3D Box (Conv, Smooth L1)

  35. Object Detection: “Deep Sliding Shapes” Conv 4 → ReLU + Pool; output heads: Class (Conv, Softmax) and 3D Box (Conv, Smooth L1). Level 2 Anchors. Receptive field: 1 m³
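
Note on slides 32-35: the 3D box heads are labeled "Smooth L1". The following is a minimal sketch of the standard smooth L1 function applied to box regression residuals. The six-number box layout (center x, y, z and size x, y, z) is an assumption for illustration, as is the function naming.

```python
def smooth_l1(x):
    """Standard smooth L1: quadratic near zero, linear for |x| >= 1."""
    ax = abs(x)
    if ax < 1.0:
        return 0.5 * x * x
    return ax - 0.5


def box_regression_loss(pred, target):
    """Sum of smooth L1 over the box parameters.

    pred, target: sequences of equal length, assumed here to be
    (center_x, center_y, center_z, size_x, size_y, size_z).
    """
    return sum(smooth_l1(p - t) for p, t in zip(pred, target))


pred = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
target = [0.5, 0.0, 0.0, 1.0, 1.0, 3.0]
print(box_regression_loss(pred, target))
```

The linear tail keeps a badly wrong proposal from dominating the gradient, while the quadratic region gives smooth convergence for proposals that are already close.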

  36. Object Detection: “Deep Sliding Shapes” RGB-D Image → Region Proposal Network → Object Recognition Network → “bed”

  37. Object Detection: “Deep Sliding Shapes” Joint object recognition network: project to 2D

  38. Object Detection: “Deep Sliding Shapes” Joint object recognition network: inputs are the TSDF and the image patch

  39. Object Detection: “Deep Sliding Shapes” Joint object recognition network:

  40. Object Detection: “Deep Sliding Shapes” Joint object recognition network: 3D ConvNet branch: Conv 1 → ReLU + Pool → Conv 2 → ReLU + Pool → Conv 3 → ReLU → FC 2; 2D branch: VGG on ImageNet → FC; the two branches are joined by Concatenation → FC 3, followed by two outputs: Class (FC, Softmax) and 3D Box (FC, Smooth L1)

  41. Object Detection: “Deep Sliding Shapes” Experiments: train and test on amodal boxes provided in SUN RGB-D. S. Song, S. Lichtenberg, and J. Xiao, “SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite,” CVPR 2015

  42. Object Detection: “Deep Sliding Shapes” Results Quantitative comparisons of 3D non-deep learning, 2D deep learning, and 3D deep learning methods: object detection accuracy on NYU v2 dataset (mAP)
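
Note on slide 42: mAP for detection is typically computed by counting a predicted box as correct when its overlap (intersection over union) with a ground-truth box exceeds a threshold. The slide does not give the criterion, so the following is only a sketch of the overlap measure, assuming axis-aligned boxes for simplicity (the boxes in the talk are aligned to the room directions). The box layout and function names are hypothetical.

```python
def volume(box):
    """Volume of an axis-aligned box (xmin, ymin, zmin, xmax, ymax, zmax).

    Returns 0 for an empty box (any max below its min).
    """
    xmin, ymin, zmin, xmax, ymax, zmax = box
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin) * max(0.0, zmax - zmin)


def iou_3d(a, b):
    """Intersection over union of two axis-aligned 3D boxes."""
    inter_box = (
        max(a[0], b[0]), max(a[1], b[1]), max(a[2], b[2]),
        min(a[3], b[3]), min(a[4], b[4]), min(a[5], b[5]),
    )
    inter = volume(inter_box)
    union = volume(a) + volume(b) - inter
    return inter / union if union > 0 else 0.0


# Two 2 m cubes offset by 1 m along x share one third of their union.
print(iou_3d((0, 0, 0, 2, 2, 2), (1, 0, 0, 3, 2, 2)))
```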

  43. Object Detection: “Deep Sliding Shapes” Results Qualitative comparisons: Sliding Shapes: sofa; Ours: bathtub

  44. Object Detection: “Deep Sliding Shapes” Results Qualitative comparisons: Sliding Shapes: chair; Ours: sofa

  45. Object Detection: “Deep Sliding Shapes” Results Qualitative comparisons: Sliding Shapes: table; Ours: bed

  46. Object Detection: “Deep Sliding Shapes” Results Qualitative comparisons: Sliding Shapes: miss; Ours: table and chairs

  47. Object Detection: “Deep Sliding Shapes” Results Qualitative comparisons: Sliding Shapes: toilet; Ours: garbage bin + bed

  48. Talk Outline (scale, small to large): Local shape descriptor • Amodal object detection • Semantic scene completion. S. Song, F. Yu, A. Zeng, A. Chang, M. Savva, and T. Funkhouser, “Semantic Scene Completion from a Single Depth Image,” submitted to CVPR 2017

  49. Semantic Scene Completion Goal: given an RGB-D image, label all voxels by semantic class Input: single view depth map Output: semantic scene completion

  50. Semantic Scene Completion Goal: given an RGB-D image, label all voxels by semantic class [Figure: 3D scene with regions labeled visible surface, free space, occluded space, outside view, outside room]

  51. Semantic Scene Completion Goal: given an RGB-D image, label all voxels by semantic class [Figure: 3D scene with regions labeled visible surface, free space, occluded space, outside view, outside room]
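
Note on slides 50-51: the figure divides the 3D scene into five kinds of region relative to a single depth view. The following is a minimal sketch of how a voxel could be assigned to one of them, assuming its depth along the camera ray and the observed surface depth on that ray are already known. The function name, the flags, and the tolerance are hypothetical, and this is not the labeling procedure from the paper.

```python
def classify_voxel(voxel_depth, surface_depth, in_view=True, in_room=True, tol=0.02):
    """Assign a voxel to one of the five regions named on the slide.

    voxel_depth:   depth of the voxel center along its camera ray (meters)
    surface_depth: observed depth on that ray (meters)
    in_view:       whether the voxel projects inside the image
    in_room:       whether the voxel lies inside the room bounds
    tol:           half-thickness treated as "on the surface" (meters)
    """
    if not in_room:
        return "outside room"
    if not in_view:
        return "outside view"
    if abs(voxel_depth - surface_depth) <= tol:
        return "visible surface"
    if voxel_depth < surface_depth:
        return "free space"
    return "occluded space"


# Voxels along one ray whose observed surface is at 2.0 m.
for z in (1.0, 2.01, 3.0):
    print(z, classify_voxel(z, 2.0))
```

Only the "visible surface" voxels are directly observed; semantic scene completion has to predict occupancy and class for the occluded voxels as well.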

  52. Semantic Scene Completion Prior work: segmentation OR completion. Silberman et al.: surface segmentation. Firman et al.: scene completion. This paper: semantic scene completion. The occupancy and the object identity are tightly intertwined!
