Deep Hough Voting for 3D Object Detection in Point Clouds

  1. Deep Hough Voting for 3D Object Detection in Point Clouds
     Charles R. Qi, Or Litany, Kaiming He, Leonidas J. Guibas; The IEEE International Conference on Computer Vision (ICCV), 2019
     Jan Bayer, Computational Robotics Laboratory, Czech Technical University in Prague

  2. Introduction
     ● Goal: detect object classes and bounding boxes from 3D point clouds
     ● Input: uncolored 3D point clouds
       – Robust to illumination changes
     ● Contribution
       – A reformulation of Hough voting in the context of deep learning through an end-to-end differentiable architecture
       – State-of-the-art 3D object detection performance on SUN RGB-D and ScanNet
       – An in-depth analysis of the importance of voting for 3D object detection in point clouds
     Deep Hough Voting for 3D Object Detection in Point Clouds, Qi et al. ICCV 2019

  3. 3D object detection methods
     ● Extended 2D-based detectors to 3D
       – 3D-SIS: 3D Semantic Instance Segmentation of RGB-D Scans, Hou et al. CVPR 2019.
       – Deep sliding shapes for amodal 3d object detection in rgb-d images, Song et al. CVPR 2016
       – 3D CNN detectors → high cost of 3D convolutions
     ● Projection to 2D bird’s eye view images
       – Multi-view 3d object detection network for autonomous driving, Chen et al. CVPR 2017
       – Designed for outdoor LIDAR data
     ● 2D-based detectors, projection to point cloud
       – Frustum pointnets for 3d object detection from rgb-d data, Qi et al. CVPR 2018
       – 2d-driven 3d object detection in rgb-d images, Lahoud et al. ICCV 2017
       – Strictly dependent on the 2D detector
       – 2D object detection quickly reduces the search space

  4. Extended 2D-based detectors to 3D
     ● 3D-SIS: 3D Semantic Instance Segmentation of RGB-D Scans, Hou et al. CVPR 2019.
       – Fuses 2D RGB input features with 3D scan geometry features

  6. Projection to 2D bird’s eye view images
     MV3D: Multi-view 3d object detection network for autonomous driving, Chen et al. CVPR 2017
     ● Data from the front RGB camera and LIDAR → 3 views are used to generate 2D features
     ● Fused features are used to jointly predict the object class and regress an oriented 3D box

  8. 2D-based detectors, projection to point cloud
     F-PointNet: Frustum pointnets for 3d object detection from rgb-d data, Qi et al. CVPR 2018
     ● A 2D CNN object detector proposes 2D regions and classifies their content
     ● Architecture similar to the older 2D-driven approach

  10. VoteNet
      Deep Hough Voting for 3D Object Detection in Point Clouds, Qi et al. ICCV 2019

  11. PointNet
      ● PointNet: Deep learning on point sets for 3d classification and segmentation, Qi et al. CoRR 2016
        – For an input point cloud of size N
      ● Generates N local features (one for each input point)
      ● Generates a single global feature
      ● Processes the combination of local and global features → classification and 3D scene segmentation
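
The local/global split on this slide can be illustrated with a minimal sketch. It assumes a single shared linear layer with ReLU and channel-wise max pooling as the aggregation step; the feature width `C`, the random weights, and all function names are hypothetical and chosen only for illustration, not taken from the PointNet code.

```python
import random

def shared_mlp(point, weights, bias):
    """One linear layer + ReLU applied to a single point (same weights for every point)."""
    return [max(0.0, sum(w * x for w, x in zip(row, point)) + b)
            for row, b in zip(weights, bias)]

def pointnet_features(points, weights, bias):
    # Local features: N input points -> N feature vectors.
    local = [shared_mlp(p, weights, bias) for p in points]
    # Global feature: channel-wise max over all points (independent of point order).
    global_feat = [max(channel) for channel in zip(*local)]
    # Combination used for segmentation: each local feature followed by the global one.
    combined = [f + global_feat for f in local]
    return local, global_feat, combined

rng = random.Random(0)
C = 4  # hypothetical feature width
weights = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(C)]
bias = [0.0] * C
points = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(5)]

local, global_feat, combined = pointnet_features(points, weights, bias)
print(len(local), len(global_feat), len(combined[0]))
```

Because the global feature is a maximum over points, reordering the input leaves it unchanged, which is the property that lets such a network consume unordered point sets.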

  12. PointNet++
      – Improves PointNet's recognition of fine-grained patterns and segmentation of complex scenes
      – For N input points, generates M feature points for classification; segmentation requires upsampling to provide information for all the input points
      PointNet++: Deep hierarchical feature learning on point sets in a metric space, Qi et al. CoRR 2017

  13. VoteNet – point cloud feature learning backbone
      ● 4 set abstraction layers
        – Subsampling: 2048, 1024, 512, 256
      ● 2 upsampling layers
        – Upsampling to 1024 points with C = 256
        – Interpolate the features on input points to output points (weighted average of the 3 nearest input point features)
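
The interpolation step in the upsampling layers can be sketched as follows. The slide only says "weighted average of 3 nearest input point features"; using inverse-distance weights is an assumption here, and the function name and the toy data are hypothetical.

```python
import math

def interpolate_features(src_xyz, src_feat, dst_xyz, k=3, eps=1e-8):
    """Weighted average of the k nearest source features (inverse-distance weights assumed)."""
    out = []
    for q in dst_xyz:
        # k nearest source points as (distance, index) pairs.
        nearest = sorted((math.dist(q, p), i) for i, p in enumerate(src_xyz))[:k]
        w = [1.0 / (d + eps) for d, _ in nearest]
        total = sum(w)
        feat = [sum(wi * src_feat[i][c] for wi, (_, i) in zip(w, nearest)) / total
                for c in range(len(src_feat[0]))]
        out.append(feat)
    return out

# Toy example: 4 source points with 1-channel features, 2 query points.
src_xyz = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [5.0, 5.0, 5.0]]
src_feat = [[1.0], [2.0], [3.0], [100.0]]
dst_xyz = [[0.0, 0.0, 0.0], [0.5, 0.5, 0.0]]

out = interpolate_features(src_xyz, src_feat, dst_xyz)
print(out)
```

A query that coincides with a source point receives (almost exactly) that point's feature, while a query equidistant from its three neighbours receives their plain mean; the distant fourth point never contributes.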

  15. Hough voting
      ● Sampling the image generates patches
      ● A network is used to regress features for k-NN search
      ● The codebook contains pre-computed associations between features and 6D object poses
      ● Deep learning of local rgb-d patches for 3d object detection and 6d pose estimation, Kehl et al. ECCV 2016

  16. VoteNet – voting
      ● A deep NN generates votes directly from the input features
        – More efficient than kNN lookups
      ● MLP with fully connected layers
      ● One vote is generated for each seed, independently of the others
        – A vote is a 3D offset to the object center, relative to the feature position
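
A minimal sketch of the voting step, under stated assumptions: a two-layer MLP with random, untrained weights stands in for the learned voting module, and the feature width, layer sizes, and names (`mlp`, `cast_votes`, `random_layer`) are hypothetical. It only shows the data flow, where each seed's feature is mapped to a 3D offset that is added to the seed position.

```python
import random

def mlp(x, layers):
    """Fully connected layers with ReLU between them (no activation after the last)."""
    for n, (W, b) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]
        if n < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x

def cast_votes(seed_xyz, seed_feat, layers):
    """One vote per seed: seed position plus the predicted offset to the object center."""
    votes = []
    for xyz, feat in zip(seed_xyz, seed_feat):
        offset = mlp(feat, layers)  # 3 values, depends only on this seed's feature
        votes.append([p + o for p, o in zip(xyz, offset)])
    return votes

def random_layer(n_in, n_out, rng):
    # Untrained placeholder weights; a real model would learn these.
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

rng = random.Random(0)
feat_dim = 8  # hypothetical; the slides use C = 256
layers = [random_layer(feat_dim, 16, rng), random_layer(16, 3, rng)]
seed_xyz = [[rng.uniform(0, 1) for _ in range(3)] for _ in range(4)]
seed_feat = [[rng.uniform(0, 1) for _ in range(feat_dim)] for _ in range(4)]

votes = cast_votes(seed_xyz, seed_feat, layers)
print(votes[0])
```

Since each vote is computed from one seed alone, the loop could be replaced by a single batched matrix multiplication, which is where the efficiency gain over per-patch kNN lookups comes from.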

  18. VoteNet – object proposal and classification
      ● Sampling and grouping
        – Votes are divided into K clusters by spatial clustering
      ● Classification, object location, and boundaries
        – A PointNet-like network aggregates the votes in order to generate object proposals
        – Output: a set of object proposals, each with:
          ● Objectness score
          ● Bounding box parameters
            – Center
            – Heading
            – Scale
          ● Semantic classification score
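
One way to realize the "sampling and grouping" step is farthest point sampling to pick K cluster centers among the votes, followed by grouping all votes within a radius of each center. The sketch below assumes that scheme; the radius value, the function names, and the toy votes are hypothetical.

```python
import math

def farthest_point_sampling(points, k):
    """Pick k indices so that each new pick is the point farthest from those already chosen."""
    chosen = [0]  # starting from the first point is an arbitrary choice
    min_dist = [math.dist(p, points[0]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(nxt)
        min_dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(min_dist, points)]
    return chosen

def group_votes(votes, k, radius):
    """Return k clusters, each a list of vote indices within `radius` of a sampled center."""
    centers = farthest_point_sampling(votes, k)
    return [[i for i, v in enumerate(votes) if math.dist(v, votes[c]) <= radius]
            for c in centers]

# Toy votes: two tight groups, as if cast toward two object centers.
votes = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0],
         [5.0, 5.0, 5.0], [5.1, 5.0, 5.0]]
clusters = group_votes(votes, k=2, radius=0.3)
print(clusters)
```

Each cluster of votes would then be passed to the PointNet-like network, which aggregates it into one proposal with an objectness score, box parameters, and class scores.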

  19. Indoor evaluation datasets: description
      ● SUN RGB-D: A RGB-D Scene Understanding Benchmark Suite, Song et al. CVPR 2015
        – Sensors: Asus Xtion, Intel RealSense, Microsoft Kinect
        – Available at: http://rgbd.cs.princeton.edu/
        – 10,335 RGB-D images (including NYU, B3DO, SUN3D)
        – 64,595 annotated 3D bounding boxes
        – 800 object categories, 47 scene categories
        – Built method: SIFT+RANSAC + point-plane ICP
      ● ScanNet: Richly-annotated 3D Reconstructions of Indoor Scenes, Dai et al. CVPR 2017
        – Available at: http://www.scan-net.org/ScanNet/
        – RGB-D data from real-world environments
        – 2.5 million views, 1513 scans, 707 spaces
        – Annotated with 3D camera poses
        – Surface reconstructions, CAD models
        – Instance-level semantic segmentations

  20. Object detection results on SUN RGB-D set
      ● Evaluation metric: mean Average Precision (mAP)
        – Intersection over Union (IoU) is thresholded to decide which objects are correctly matched
      ● 5000 RGB-D training images with amodal oriented 3D bounding boxes for 37 object categories
      ● Evaluated with a 3D IoU threshold of 0.25
      ● The VoteNet model is 4x smaller than the F-PointNet model
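
The matching criterion behind the mAP numbers can be illustrated with a small 3D IoU computation. This sketch assumes axis-aligned boxes given as (min corner, max corner) for simplicity; the SUN RGB-D annotations are oriented boxes, which need a rotated-box intersection that is not shown here. The function names and example boxes are hypothetical.

```python
def box_volume(box):
    """Volume of an axis-aligned box; zero if the corners do not span a valid box."""
    (x0, y0, z0), (x1, y1, z1) = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0) * max(0.0, z1 - z0)

def iou_3d(a, b):
    """Intersection over Union of two axis-aligned boxes (min_corner, max_corner)."""
    lo = tuple(max(p, q) for p, q in zip(a[0], b[0]))
    hi = tuple(min(p, q) for p, q in zip(a[1], b[1]))
    inter = box_volume((lo, hi))
    union = box_volume(a) + box_volume(b) - inter
    return inter / union if union > 0 else 0.0

pred = ((0.0, 0.0, 0.0), (2.0, 2.0, 2.0))
gt = ((1.0, 0.0, 0.0), (3.0, 2.0, 2.0))

iou = iou_3d(pred, gt)   # intersection volume 4, union volume 12
is_match = iou >= 0.25   # threshold used on the slide
print(iou, is_match)
```

With the 0.25 threshold, a prediction that overlaps the ground truth by one third of the union still counts as a correct detection; average precision is then computed per class from the matched and unmatched predictions and averaged into mAP.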

  21. Object detection results – SUN RGB-D dataset
      Deep Hough Voting for 3D Object Detection in Point Clouds, Qi et al. ICCV 2019

  22. Object detection results on ScanNetV2 set
      ● 1200 training examples, hundreds of rooms, 18 object categories
      ● VoteNet used uncolored point clouds, while the other methods did not

  23. Object detection results on ScanNetV2 set
      Deep Hough Voting for 3D Object Detection in Point Clouds, Qi et al. ICCV 2019
