
Lifelong Visual Mapping Linguang Zhang Princeton Vision Group - PowerPoint PPT Presentation



  1. Lifelong Visual Mapping Linguang Zhang Princeton Vision Group

  2. Papers • Toward lifelong object segmentation from change detection in dense RGB-D maps. • Towards Semantic KinectFusion. • SLAM++: Simultaneous localisation and mapping at the level of objects. • 3D mapping, localisation and object retrieval using low cost robotic platforms: A robotic search engine for the real-world.

  3. Current SLAM Systems • There are NO truly ‘pick up and play’ SLAM systems that: • Run on low-cost devices. • Can be used by non-expert users. • Sparse feature-based SLAM: • The world is modeled as an unconnected point cloud. • Can be improved by keyframe SLAM systems (bundle adjustment). • Dense SLAM: • Requires GPGPU processing hardware. • Reconstructs and tracks full surface models. • KinectFusion (extended with a sliding volume in Kintinuous). • A truly scalable, multi-resolution, loop-closure-capable dense non-parametric surface representation has not yet been developed. • Wasteful in environments with symmetry (e.g., repeated objects).

  4. Tools • KinectFusion • Kintinuous: https://www.youtube.com/watch?v=D3yYjaLmiqU • Iterative Closest Point: https://www.youtube.com/watch?v=PCPfuZ7njmQ
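ICP alternates two steps. First it matches each source point to its nearest destination point. Then it solves in closed form for the rigid transform that best aligns the matched pairs. The following is a minimal 2D, point-to-point, brute-force sketch written only for illustration; Kintinuous and the object discovery pipeline work on 3D data. All function names are hypothetical.

```python
import math


def best_rigid_2d(src, dst):
    """Closed-form least-squares rotation + translation mapping paired src points onto dst."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    # Dot and cross terms of the centred pairs give the optimal rotation angle.
    s_dot = 0.0
    s_cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - sx, py - sy
        bx, by = qx - dx, qy - dy
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    # The translation moves the rotated source centroid onto the destination centroid.
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)


def apply(theta, tx, ty, pts):
    """Rotate by theta, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in pts]


def icp_2d(src, dst, iterations=30):
    """Point-to-point ICP with brute-force nearest-neighbour matching."""
    theta, tx, ty = 0.0, 0.0, 0.0  # identity as the initial guess
    for _ in range(iterations):
        moved = apply(theta, tx, ty, src)
        matched = [min(dst, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
                   for p in moved]
        # Re-estimate the full transform from the original src to its current matches.
        theta, tx, ty = best_rigid_2d(src, matched)
    return theta, tx, ty
```

A practical implementation would replace the O(N·M) search with a k-d tree, or with projective data association as KinectFusion does. It would also stop once the transform stops changing. Like any ICP, it needs a reasonable initial guess, which is why the object discovery slides initialize the alignment manually.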

  5. Lifelong Object Segmentation • Input: two RGB-D maps with overlapping regions in which the map has changed (obtained from the Kintinuous system). • Goals: • Discover objects by differencing. • Train segmentation algorithms. • If the same object is discovered again, refine its features and the segmentation method.

  6. Object Discovery • RGB-D SLAM: Kintinuous. • Map alignment: manually initialized and refined by ICP. • Symmetric differencing. • Filtering: • Small scattered points: remove clusters smaller than a 3 × 3 × 3 cm cube (estimated from the Kintinuous volumetric resolution). • Free-space filtering.
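Symmetric differencing and small-cluster removal can be sketched on voxelized maps. This sketch rests on several assumptions that the slides do not specify:
- Both maps are already aligned and stored as sets of integer voxel indices.
- The voxel size is a hypothetical 1 cm.
- "Smaller than a 3 × 3 × 3 cm cube" is read as "fewer voxels than such a cube holds".

```python
from itertools import product


def symmetric_difference(map_a, map_b):
    """Voxels occupied in exactly one of the two aligned maps."""
    return set(map_a) ^ set(map_b)


def clusters(voxels):
    """Group voxels into 26-connected components with an iterative flood fill."""
    remaining = set(voxels)
    offsets = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]
    result = []
    while remaining:
        seed = remaining.pop()
        stack, comp = [seed], {seed}
        while stack:
            x, y, z = stack.pop()
            for dx, dy, dz in offsets:
                nb = (x + dx, y + dy, z + dz)
                if nb in remaining:
                    remaining.remove(nb)
                    comp.add(nb)
                    stack.append(nb)
        result.append(comp)
    return result


def filter_small(comps, voxel_size_m=0.01, cube_side_m=0.03):
    """Drop clusters with fewer voxels than a cube of side cube_side_m contains."""
    min_voxels = round(cube_side_m / voxel_size_m) ** 3
    return [c for c in comps if len(c) >= min_voxels]
```

Working on voxel sets keeps the set operations trivial. A point-cloud version would instead test, for each point in one map, whether the other map has any point within a small radius.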

  7. Free-space Filtering (figure panels: before filtering, after filtering)

  8. Free-space Filtering (figure legend: occupied, unseen, free space) • Only clusters that protrude into free space are assumed to be objects.
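One way to read the free-space rule: a cluster found by differencing is kept only if the other map actually observed the space it occupies as empty. Clusters lying in space the other map never saw may simply be coverage gaps. The per-voxel labels and the `min_free_fraction` threshold below are assumptions of this sketch, not values from the slides.

```python
from enum import Enum


class Cell(Enum):
    OCCUPIED = "occupied"
    FREE = "free"
    UNSEEN = "unseen"


def label(other_map, voxel):
    """State of a voxel in the other map; voxels it never observed count as UNSEEN."""
    return other_map.get(voxel, Cell.UNSEEN)


def protrudes_into_free_space(cluster, other_map, min_free_fraction=0.5):
    """True if enough of the cluster lies where the other map observed empty space."""
    free = sum(1 for v in cluster if label(other_map, v) is Cell.FREE)
    return free / len(cluster) >= min_free_fraction


def free_space_filter(clusters, other_map, min_free_fraction=0.5):
    """Keep only clusters that protrude into the other map's observed free space."""
    return [c for c in clusters
            if protrudes_into_free_space(c, other_map, min_free_fraction)]
```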

  9. Segmentation Methods • Graph structure: • Treat every point in the map as a node. • Neighboring nodes are the points within a radius r’ (twice the volumetric resolution). • Connect neighboring nodes by undirected edges with weights (initialized by T). • Join the endpoints of edges whose weight is below a dynamic threshold. • The threshold (controlled by k) grows after each join. • k correlates with segment size (larger k yields larger segments).
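This procedure appears to follow the greedy graph-based segmentation of Felzenszwalb and Huttenlocher. Edges are processed in increasing weight order. Two components merge when the edge joining them is no heavier than either component's internal difference plus k/|C|. The following sketch implements that criterion on an abstract edge list. Whether the original system uses exactly this threshold function is an assumption, and the class and function names are hypothetical.

```python
class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        # Largest edge weight merged into each component (its "internal difference").
        self.internal = [0.0] * n

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, a, b, w):
        if self.size[a] < self.size[b]:
            a, b = b, a
        self.parent[b] = a
        self.size[a] += self.size[b]
        self.internal[a] = max(self.internal[a], self.internal[b], w)


def segment(num_nodes, edges, k):
    """edges: list of (weight, i, j). Returns the component root for every node."""
    ds = DisjointSet(num_nodes)
    for w, i, j in sorted(edges):
        a, b = ds.find(i), ds.find(j)
        if a == b:
            continue
        # Each component tolerates edges up to its internal difference plus k / |C|.
        thr_a = ds.internal[a] + k / ds.size[a]
        thr_b = ds.internal[b] + k / ds.size[b]
        if w <= min(thr_a, thr_b):
            ds.union(a, b, w)
    return [ds.find(i) for i in range(num_nodes)]
```

The k/|C| term lets small components absorb moderately heavy edges. As components grow, that term shrinks, so k effectively sets the preferred segment scale.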

  10. Edge Weights • Color edge weights. • Normal edge weights: convex vs. concave. • Convex parts are more likely to correspond to objects. • Concave parts correspond to object boundaries.
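The exact weight formulas did not survive on this slide. The following sketch assumes one common construction:
- Color weight: a normalized RGB distance.
- Normal weight: 1 − n_i·n_j for unit normals.
- Convexity: a test on the point and normal differences.

The `convex_scale` discount is a hypothetical parameter encoding "convex parts are more likely to belong to objects". It makes convex edges cheaper to join.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def color_weight(c_i, c_j):
    """Euclidean RGB distance, scaled to [0, 1] for 8-bit colours."""
    return math.dist(c_i, c_j) / (255.0 * math.sqrt(3.0))


def is_convex(p_i, n_i, p_j, n_j):
    # On a convex surface the normals diverge along the direction joining the points.
    return dot(sub(p_j, p_i), sub(n_j, n_i)) > 0.0


def normal_weight(p_i, n_i, p_j, n_j, convex_scale=0.1):
    """1 - cos(angle between unit normals); convex edges are discounted."""
    w = 1.0 - dot(n_i, n_j)
    return w * convex_scale if is_convex(p_i, n_i, p_j, n_j) else w
```

With this choice, a sharp concave crease keeps its full weight and tends to survive as a segment boundary. An equally sharp convex ridge is discounted and tends to be merged.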

  11. Segmentation Fitting • Scoring. • Optimization.

  12. Different Segmentation Methods (figure panels: color, convexity, surface normal)

  13. Segmentation Fitting • Object representation: • Based on Principal Component Analysis (PCA). • Each feature is modeled as a normal distribution with the measured value as its mean and variance being … • Object matching.
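Modeling each feature as a normal distribution turns object matching into a likelihood comparison. This sketch makes three assumptions:
- The features are independent scalars.
- The variance is supplied explicitly, because the slide's variance definition is incomplete.
- The object names and feature values are hypothetical.

It is not the paper's scoring function.

```python
import math
from dataclasses import dataclass


@dataclass
class FeatureModel:
    mean: float
    variance: float

    def log_likelihood(self, x):
        """Log density of x under N(mean, variance)."""
        return -0.5 * (math.log(2.0 * math.pi * self.variance)
                       + (x - self.mean) ** 2 / self.variance)


def match_score(models, observed):
    """Sum of per-feature log-likelihoods, treating features as independent."""
    return sum(m.log_likelihood(x) for m, x in zip(models, observed))


def best_match(library, observed):
    """Name of the stored object whose feature models best explain `observed`."""
    return max(library, key=lambda name: match_score(library[name], observed))
```

Computing the PCA-based features themselves is outside this sketch. Summing log-likelihoods instead of multiplying densities avoids numerical underflow when there are many features.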

  14. Lifelong Learning • Manually group discovered objects together to guarantee correct convergence of the feature variances. • Update scores and feature distributions. • The new score is a weighted average, using the number of times each object was observed as the weights.
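The count-weighted score update can be sketched as follows. For brevity, the feature distribution is a single scalar feature. Its mean and variance are combined with the pairwise (Chan et al.) update, which is my choice of method here, not one stated on the slide. The record layout and names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ObjectRecord:
    score: float           # segmentation score for this object
    count: int             # number of times the object has been observed
    feat_mean: float       # running mean of one (hypothetical) scalar feature
    feat_m2: float = 0.0   # sum of squared deviations from the mean

    def feat_variance(self):
        return self.feat_m2 / (self.count - 1) if self.count > 1 else 0.0


def merge(a, b):
    """Fold two records of the same object together, weighting by observation counts."""
    n = a.count + b.count
    score = (a.count * a.score + b.count * b.score) / n
    delta = b.feat_mean - a.feat_mean
    mean = a.feat_mean + delta * b.count / n
    # Pairwise combination of squared deviations (Chan et al.).
    m2 = a.feat_m2 + b.feat_m2 + delta ** 2 * a.count * b.count / n
    return ObjectRecord(score, n, mean, m2)
```

For example, merging an object seen 3 times with score 0.8 into one seen once with score 0.4 gives (3 · 0.8 + 1 · 0.4) / 4 = 0.7.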

  15. Results (figure panels: trash bin, stuffed bunny, precision-recall curve)

  16. Results

  17. Semantic KinectFusion • Contributions. • using a TSDF volume to build a keyframe representation of the environment. • Semantic Bundle Adjustment.

  18. Keyframe Representation • Surface voxels: voxels holding a non-truncated function value in the TSDF. • Add a new field to each TSDF voxel that keeps track of its list of keyframes. • Append the keyframe index to that list when merging a keyframe into the volume.
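The extra per-voxel field can be sketched as follows. The weighted-average update is the standard KinectFusion-style TSDF fusion. The following are assumptions of this sketch:
- The normalized truncation to [−1, 1].
- The class and function names.
- De-duplicating keyframe indices.

```python
from dataclasses import dataclass, field


@dataclass
class Voxel:
    tsdf: float = 1.0    # normalised truncated signed distance in [-1, 1]
    weight: float = 0.0  # accumulated fusion weight
    keyframes: list = field(default_factory=list)  # indices of keyframes fused here

    def is_surface(self):
        # Surface voxel: observed at least once and holding a non-truncated value.
        return self.weight > 0 and abs(self.tsdf) < 1.0


def fuse(voxel, sdf_obs, keyframe_id, obs_weight=1.0):
    """Running weighted-average TSDF update that also records the contributing keyframe."""
    sdf_obs = max(-1.0, min(1.0, sdf_obs))  # truncate the observed signed distance
    total = voxel.weight + obs_weight
    voxel.tsdf = (voxel.tsdf * voxel.weight + sdf_obs * obs_weight) / total
    voxel.weight = total
    if keyframe_id not in voxel.keyframes:
        voxel.keyframes.append(keyframe_id)
```

Only surface voxels need their keyframe lists consulted during optimization, so a memory-conscious variant could skip recording keyframes for truncated voxels.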

  19. Optimization • Optimize the pose graph (g2o optimizer). • Cost function: • Novel matching scheme, for each keyframe: • Find its corresponding surface voxels by back-projection. • Get each voxel's keyframe list and project the voxel into all corresponding keyframes. • Perform a 2D local search to find the nearest 3D point. • Finally, recreate the TSDF.
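The projection and 2D local search steps can be sketched with a pinhole camera model. This sketch makes several assumptions:
- R and t map world coordinates into the keyframe's camera frame.
- The depth image is a list of rows in metres, with 0 meaning no reading.
- The search is a square window of hypothetical radius `radius` pixels.

The returned point is expressed in the keyframe's camera frame.

```python
import math


def project(point_w, R, t, fx, fy, cx, cy):
    """Pinhole projection of a world point; returns ((u, v), camera_point) or None."""
    xc = tuple(sum(R[r][c] * point_w[c] for c in range(3)) + t[r] for r in range(3))
    if xc[2] <= 0:
        return None  # the point is behind the camera
    return (fx * xc[0] / xc[2] + cx, fy * xc[1] / xc[2] + cy), xc


def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with the given depth to a 3D point in the camera frame."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def nearest_in_window(point_w, R, t, depth_img, intr, radius=2):
    """3D point (camera frame) in a (2*radius+1)^2 window closest to the projected voxel."""
    fx, fy, cx, cy = intr
    proj = project(point_w, R, t, fx, fy, cx, cy)
    if proj is None:
        return None
    (u, v), xc = proj
    u0, v0 = round(u), round(v)
    best, best_d = None, math.inf
    for vv in range(v0 - radius, v0 + radius + 1):
        for uu in range(u0 - radius, u0 + radius + 1):
            if 0 <= vv < len(depth_img) and 0 <= uu < len(depth_img[0]):
                z = depth_img[vv][uu]
                if z > 0:  # skip pixels without a depth reading
                    p = backproject(uu, vv, z, fx, fy, cx, cy)
                    d = math.dist(p, xc)
                    if d < best_d:
                        best, best_d = p, d
    return best
```

Each (voxel, nearest point) pair found this way would contribute a residual to the pose-graph cost. The slide's actual cost function is not recoverable here.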

  20. Semantic KinectFusion • Extract 3D features (in experiments: SIFT3D, Color-SHOT). • “A Representation for 3-D Surface Matching” • “A combined texture-shape descriptor for enhanced 3D feature matching” • Generate a set of candidate hypotheses on objects’ presence. • RANSAC-based 6DOF pose estimation. • Validation graph: • Edges: • Virtual edges: • Remove wrong hypotheses and include all the constraints in the global graph.
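RANSAC-based pose estimation follows a hypothesize-and-verify loop:
1. Fit a rigid transform to a minimal random sample of feature correspondences.
2. Count the correspondences it explains.
3. Keep the best hypothesis.

The following is a 2D analogue for illustration; the system itself estimates 6DOF poses from 3D descriptor matches. The function names, the two-point minimal sample, the iteration count, and the inlier threshold are all assumptions of this sketch.

```python
import math
import random


def rigid_from_two(a1, a2, b1, b2):
    """2D rotation + translation mapping segment a1->a2 onto b1->b2 (minimal sample)."""
    theta = (math.atan2(b2[1] - b1[1], b2[0] - b1[0])
             - math.atan2(a2[1] - a1[1], a2[0] - a1[0]))
    theta = math.atan2(math.sin(theta), math.cos(theta))  # wrap to (-pi, pi]
    c, s = math.cos(theta), math.sin(theta)
    return theta, b1[0] - (c * a1[0] - s * a1[1]), b1[1] - (s * a1[0] + c * a1[1])


def transform(T, p):
    theta, tx, ty = T
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + ty)


def ransac_rigid(matches, iters=200, thresh=0.05, seed=0):
    """matches: list of (model_point, scene_point). Returns (best_T, inlier_indices)."""
    rng = random.Random(seed)
    best_T, best_inliers = None, []
    for _ in range(iters):
        i, j = rng.sample(range(len(matches)), 2)
        (a1, b1), (a2, b2) = matches[i], matches[j]
        if math.dist(a1, a2) < 1e-9:
            continue  # degenerate sample: direction undefined
        T = rigid_from_two(a1, a2, b1, b2)
        inliers = [k for k, (a, b) in enumerate(matches)
                   if math.dist(transform(T, a), b) < thresh]
        if len(inliers) > len(best_inliers):
            best_T, best_inliers = T, inliers
    return best_T, best_inliers
```

In practice, the winning hypothesis is usually refined with a least-squares fit over all its inliers. The surviving hypotheses would then be checked in the validation graph before entering the global graph.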

  21. System Overview

  22. Results

  23. Results

  24. Results

  25. Object-oriented SLAM: SLAM++ • Stronger assumptions: • The world has intrinsic symmetry (repeated objects). • Objects are predefined. • Video: https://www.youtube.com/watch?v=tmrAh1CqCRo

  26. Characteristics of Object SLAM • The map contains only a few discrete entities. • This makes it possible to jointly optimize over all object positions to obtain a globally consistent map. • Tracking a single object in 6DOF is enough to localize the camera. • Relocalization of a lost camera and loop closure detection can be performed based on a small number of object measurements.
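Why one tracked object suffices can be shown by composition. As a notational assumption, write T_{AB} for the rigid transform that maps coordinates in frame B to frame A. The map stores the object's world pose T_{WO}, and tracking the object in the live frame yields T_{CO}. The camera pose then follows:

```latex
T_{WC} = T_{WO}\,T_{OC} = T_{WO}\,T_{CO}^{-1},
\qquad
T_{CO}^{-1} =
\begin{bmatrix} R_{CO}^{\top} & -R_{CO}^{\top}\,t_{CO} \\ \mathbf{0}^{\top} & 1 \end{bmatrix},
\quad\text{where } T_{CO} = \begin{bmatrix} R_{CO} & t_{CO} \\ \mathbf{0}^{\top} & 1 \end{bmatrix}.
```

With several visible objects, each contributes such a constraint on T_{WC}, and the joint graph optimization reconciles them.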

  27. SLAM Map Representation • Graph representation: • object - world pose • object - camera pose • camera - world pose • additional factors: • camera - camera pose (ICP). • structural priors: objects must be grounded on the same plane.
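The joint optimization over this graph can be illustrated with a deliberately simplified 1D analogue:
- Camera and object nodes carry scalar positions.
- Each factor is a relative measurement x_j − x_i ≈ z with weight w.
- One node is anchored to fix the gauge.

Gauss-Seidel sweeps then minimize the weighted squared residuals. SLAM++ optimizes full 6DOF poses with a nonlinear solver. This sketch only shows how camera-camera (ICP) and camera-object factors are solved together; the names are hypothetical.

```python
def optimize_1d(nodes, edges, anchor, iters=200):
    """nodes: dict name -> initial position. edges: list of (i, j, z, w), meaning
    x_j - x_i ~= z. Gauss-Seidel on the cost sum(w * (x_j - x_i - z) ** 2)."""
    x = dict(nodes)
    for _ in range(iters):
        for n in x:
            if n == anchor:
                continue  # one fixed node removes the free global offset
            num = 0.0
            den = 0.0
            for i, j, z, w in edges:
                if i == n:    # x_j - x_n ~= z  =>  x_n ~= x_j - z
                    num += w * (x[j] - z)
                    den += w
                elif j == n:  # x_n - x_i ~= z  =>  x_n ~= x_i + z
                    num += w * (x[i] + z)
                    den += w
            if den > 0:
                x[n] = num / den  # weighted average of what each factor implies
    return x


# Two camera poses linked by ICP odometry, both observing the same object.
nodes = {"c0": 0.0, "c1": 0.0, "obj": 0.0}
edges = [("c0", "c1", 1.0, 1.0),   # camera-camera (ICP)
         ("c0", "obj", 3.0, 1.0),  # camera-object measurement
         ("c1", "obj", 2.2, 1.0)]  # camera-object measurement (slightly inconsistent)
solution = optimize_1d(nodes, edges, anchor="c0")
```

In this example the measurements disagree by 0.2 around the loop. The least-squares solution spreads that error evenly, leaving each residual at about 0.067 in magnitude.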

  28. Real-Time Object Recognition • Point-Pair Features (PPFs): • 4D descriptors of the relative position and normals of pairs of oriented points. • Randomly sample points and pair them up in all possible combinations to generate PPFs. • Match against the models, producing a vote for each match. • Active Object Search: • Generate a mask in image space from view prediction (to avoid occlusion).
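The 4D point-pair feature of Drost et al. is F = (‖d‖, ∠(n1, d), ∠(n2, d), ∠(n1, n2)), with d = p2 − p1. The following sketch computes it for every ordered pair of sampled oriented points. It also discretizes the feature so it could index a hash table of model pairs for voting. The step sizes in `quantize` are illustrative, not SLAM++'s settings.

```python
import math
from itertools import permutations


def angle(a, b):
    """Angle in [0, pi] between two 3D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return math.acos(max(-1.0, min(1.0, dot / (na * nb))))


def ppf(p1, n1, p2, n2):
    """Point-pair feature (|d|, angle(n1, d), angle(n2, d), angle(n1, n2)), d = p2 - p1."""
    d = tuple(b - a for a, b in zip(p1, p2))
    return (math.sqrt(sum(x * x for x in d)), angle(n1, d), angle(n2, d), angle(n1, n2))


def quantize(f, dist_step=0.01, angle_step=math.radians(12)):
    """Discretise a PPF into a hashable key (step sizes are hypothetical)."""
    return (int(f[0] / dist_step),) + tuple(int(a / angle_step) for a in f[1:])


def all_ppfs(points):
    """points: list of (position, normal). One feature per ordered pair."""
    return [ppf(p1, n1, p2, n2) for (p1, n1), (p2, n2) in permutations(points, 2)]
```

The feature is invariant to rigid motion, so identical keys from the scene and a model suggest the same surface pair. Each such match casts a vote for an object pose.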

  29. Active Object Search

  30. Camera Tracking and Object Pose Estimation • Camera-model tracking: • KinectFusion tracks against incomplete models at an early stage. • SLAM++ tracks against high-quality models. • Tracking for model initialization (rejects incorrect objects): • Given a candidate object and its detected pose, run camera-model ICP starting from the detected object pose. • Camera-object pose constraints: • Run dense ICP estimation between the live frame and each visible model object.

  31. Relocalization

  32. Loop Closure • Small loops: • standard ICP tracking mechanism • Large loops: • matching fragments within the main long-term graph (same as relocalization)

  33. Loop Closure

  34. Real-world Example Whelan, Thomas, et al. "3D mapping, localisation and object retrieval using low cost robotic platforms: A robotic search engine for the real-world." https://www.youtube.com/watch?v=XqDUniEY954

  35. System Overview

  36. Steps • Avoidance-based exploration. • Dense SLAM. • Planar simplification. • Object detection. • Path planning. • Onboard localization and control.

  37. Problem: compounding of failure rates • Frame drops in the wireless streaming of the raw RGB-D image sequence. • Failure of the planar segmentation algorithm. • Failure of object segmentation, and hence recognition, due to noise. Conclusion: each technique is quite robust on its own, but the combined reliability is not sufficient when they are chained together into a complete framework.
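The compounding effect is simple arithmetic if stage failures are assumed independent. Overall success is the product of the per-stage success rates. For example, six stages (such as the six steps listed for this system) that are each a hypothetical 95% reliable succeed together only 0.95^6 ≈ 0.735 of the time. The function names below are hypothetical.

```python
import math


def pipeline_success(stage_success_rates):
    """Probability that every stage of a chained pipeline succeeds (independent failures)."""
    return math.prod(stage_success_rates)


def required_stage_rate(target, num_stages):
    """Per-stage success rate needed for num_stages identical stages to reach target overall."""
    return target ** (1.0 / num_stages)
```

Inverting the product shows why "quite robust alone" is not enough. For 90% end-to-end reliability across six such stages, each stage must succeed about 98.3% of the time.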
