Lifelong Visual Mapping Linguang Zhang Princeton Vision Group
Papers • Toward lifelong object segmentation from change detection in dense RGB-D maps. • Towards Semantic KinectFusion. • SLAM++: Simultaneous localisation and mapping at the level of objects. • 3D mapping, localisation and object retrieval using low cost robotic platforms: A robotic search engine for the real-world.
Current SLAM Systems • No truly ‘pick up and play’ SLAM systems. • Low-cost devices. • Used by non-expert users. • Sparse feature-based SLAM: • world is modeled as an unconnected point cloud. • can be improved by keyframe SLAM systems (bundle adjustment). • Dense SLAM: • GPGPU processing hardware. • Reconstruct and track full surface models. • KinectFusion (extension using a sliding volume: Kintinuous). • A truly scalable, multi-resolution, loop-closure-capable dense non-parametric surface representation has not been developed. • Wasteful in environments with symmetry.
Tools • KinectFusion • Kintinuous: https://www.youtube.com/watch?v=D3yYjaLmiqU • Iterative Closest Point: https://www.youtube.com/watch?v=PCPfuZ7njmQ
Lifelong Object Segmentation • Input: two RGB-D maps with regions of overlap where the maps have changed (obtained from the Kintinuous system). • Goals: • Discover objects by differencing. • Train segmentation algorithms. • If the same object is discovered again, refine the features and the segmentation method.
Object Discovery • RGB-D SLAM - Kintinuous. • Map Alignment - manually initialized and refined by ICP. • Differencing (Symmetric). • Filtering. • Small scattered points - remove clusters smaller than a 3 x 3 x 3 cm cube (estimated from Kintinuous volumetric resolution). • Free-space filtering.
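The differencing and small-cluster filtering steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the two maps are already aligned, that points are quantized to a hypothetical 1 cm voxel grid, and that "smaller than a 3 x 3 x 3 cm cube" is interpreted as "fewer than 27 connected voxels". The names `voxelize`, `connected_clusters`, `discover_changes`, `voxel_size` and `min_voxels` are made up for this example.

```python
import math
from itertools import product

# 26-connected neighborhood (assumption: any touching voxel counts as a neighbor).
NEIGHBORS = [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]

def voxelize(points, voxel_size):
    """Quantize (x, y, z) points in meters to a set of integer voxel coordinates."""
    return {tuple(math.floor(c / voxel_size) for c in p) for p in points}

def connected_clusters(voxels):
    """Yield connected components of a voxel set using a flood fill."""
    remaining = set(voxels)
    while remaining:
        seed = remaining.pop()
        cluster, stack = {seed}, [seed]
        while stack:
            x, y, z = stack.pop()
            for dx, dy, dz in NEIGHBORS:
                n = (x + dx, y + dy, z + dz)
                if n in remaining:
                    remaining.remove(n)
                    cluster.add(n)
                    stack.append(n)
        yield cluster

def discover_changes(map_a, map_b, voxel_size=0.01, min_voxels=27):
    """Symmetric difference of two aligned maps, with small clusters removed."""
    changed = voxelize(map_a, voxel_size) ^ voxelize(map_b, voxel_size)
    return [c for c in connected_clusters(changed) if len(c) >= min_voxels]
```

Calling `discover_changes(old_points, new_points)` returns a list of voxel clusters that appear in exactly one of the two maps. Free-space filtering would then be applied to these clusters as a separate step.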
Free-space Filtering • Figure: before filtering vs. after filtering.
Free-space Filtering • Figure legend: occupied, unseen, free space. • Only clusters that protrude into free space are assumed to be objects.
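A minimal sketch of the free-space rule, assuming each changed cluster is a set of integer voxel coordinates and that the voxels observed as free in the other map are available as a set. The function name and the `min_free_voxels` parameter are hypothetical; the slides do not say how much of a cluster must lie in free space.

```python
def free_space_filter(clusters, free_voxels, min_free_voxels=1):
    """Keep only clusters that protrude into space observed as free.

    clusters: iterable of sets of (i, j, k) voxel coordinates.
    free_voxels: voxels the other map observed as free (not occupied, not unseen).
    min_free_voxels: hypothetical cutoff on how many voxels must be in free space.
    """
    free = set(free_voxels)
    return [c for c in clusters if sum(v in free for v in c) >= min_free_voxels]
```

Clusters that lie entirely in unseen space are dropped, since the other map carries no evidence that the space was empty before.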
Segmentation Methods • Graph structure. • Treat every point in the map as a node. • Neighboring nodes are defined by the points within a radius r’ (twice the volumetric resolution). • Connect neighboring nodes by undirected edges with weights (initialized by T). • Kill edges with a weight below a dynamic threshold. • Threshold (controlled by k) grows after each joining. • k is correlated to the segment size.
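The graph procedure above resembles Felzenszwalb-Huttenlocher-style graph segmentation, so a sketch in that style may help. Several things here are assumptions rather than details from the slides: T is taken to be the initial per-segment threshold, an edge below the threshold is consumed by joining the two segments it connects, and the threshold after a join is set to `w + k / size`. Edge construction (points within radius r’) is left out; edges are passed in as `(weight, i, j)` tuples.

```python
def segment(num_nodes, edges, T, k):
    """Greedy graph segmentation with a per-segment dynamic threshold.

    edges: list of (weight, i, j) tuples between neighboring nodes.
    T: initial threshold of every single-node segment (assumed meaning of T).
    k: controls how the threshold depends on segment size (assumed update rule).
    Returns one segment label (a representative node index) per node.
    """
    parent = list(range(num_nodes))
    size = [1] * num_nodes
    threshold = [T] * num_nodes

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for w, i, j in sorted(edges):  # process the cheapest edges first
        a, b = find(i), find(j)
        if a != b and w <= min(threshold[a], threshold[b]):
            if size[a] < size[b]:
                a, b = b, a
            parent[b] = a  # join the smaller segment into the larger one
            size[a] += size[b]
            threshold[a] = w + k / size[a]
    return [find(i) for i in range(num_nodes)]
```

With color-based or normal-based edge weights plugged in, the same routine would produce the different segmentations being compared; only the weight function changes.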
Edge Weights • Color edge weights: • Normal edge weights: • Convex parts more likely correspond to objects. • Concave parts correspond to object boundaries.
Segmentation Fitting • Scoring. • Optimization.
Different Segmentation Methods • Color. • Convexity. • Surface normal.
Segmentation Fitting • Object representation. • based on Principal Component Analysis (PCA). • each feature is modeled as a normal distribution with mean being the measured value and variance being • Object matching.
Lifelong Learning • Manually group discovered objects together to guarantee the correct convergence of the feature variances. • Update scores and feature distributions. • The new score is a weighted average, using the number of times each object was observed as the weights.
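The score update can be illustrated with a small sketch. The function name and argument names are hypothetical, and only the score is handled here; how the feature distributions are merged is not specified on the slide.

```python
def merge_scores(score_a, count_a, score_b, count_b):
    """Observation-count-weighted average of two segmentation scores.

    Returns the merged score and the merged observation count.
    """
    total = count_a + count_b
    merged = (score_a * count_a + score_b * count_b) / total
    return merged, total
```

For example, an object seen three times with score 0.8, merged with a single new observation scoring 0.4, yields a score of about 0.7 over four observations, so long-observed objects change slowly.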
Results • Precision-recall curves: trash bin, stuffed bunny.
Results
Semantic KinectFusion • Contributions. • using a TSDF volume to build a keyframe representation of the environment. • Semantic Bundle Adjustment.
Keyframe Representation • Surface voxels: voxels having a non-truncated function value in TSDF. • Adding a new field to each TSDF voxel to keep track of the list of keyframes. • Appending the keyframe index to the keyframe list when merging a keyframe into the volume.
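A minimal sketch of a TSDF voxel extended with a keyframe list, under stated assumptions: the volume is a sparse dictionary keyed by voxel coordinates, distances are normalized so that values of magnitude 1 are truncated, and fusion uses a simple running average. The names `Voxel`, `integrate_keyframe` and `surface_voxels` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class Voxel:
    tsdf: float = 1.0      # normalized signed distance; |tsdf| >= 1 means truncated
    weight: float = 0.0    # number of fused measurements
    keyframes: list = field(default_factory=list)  # the added per-voxel field

    def is_surface(self):
        """Surface voxel: observed, with a non-truncated function value."""
        return self.weight > 0 and abs(self.tsdf) < 1.0

def integrate_keyframe(volume, keyframe_index, measurements):
    """Fuse one keyframe into the volume and record its index per voxel.

    volume: dict mapping (i, j, k) -> Voxel.
    measurements: dict mapping (i, j, k) -> normalized signed distance.
    """
    for coord, distance in measurements.items():
        voxel = volume.setdefault(coord, Voxel())
        distance = max(-1.0, min(1.0, distance))  # truncate
        voxel.tsdf = (voxel.tsdf * voxel.weight + distance) / (voxel.weight + 1)
        voxel.weight += 1
        if keyframe_index not in voxel.keyframes:
            voxel.keyframes.append(keyframe_index)

def surface_voxels(volume):
    """Coordinates of all surface voxels in the volume."""
    return [coord for coord, voxel in volume.items() if voxel.is_surface()]
```

Looking up `volume[coord].keyframes` for a surface voxel then gives the keyframes that observed it, which is the information needed to project a voxel back into all of its keyframes.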
Optimization • Optimize the pose graph (G2O optimizer). • cost function: • Novel matching scheme - for each keyframe: • find its corresponding surface voxels by back-projection. • get the keyframe list and project the voxel back to all corresponding keyframes. • 2D local search to find the nearest 3D point. • After all keyframes are processed, recreate the TSDF.
Semantic KinectFusion • Extracting 3D features. (In experiments: SIFT3D, Color-SHOT.) • “A Representation for 3-D Surface Matching” • “A combined texture-shape descriptor for enhanced 3D feature matching” • Generate a set of candidate hypotheses on objects’ presence. • RANSAC-based 6DOF pose estimation. • Validation graph. • edges: • virtual edges: • Clear wrong hypotheses and include all the constraints in the global graph.
System Overview
Results
Results
Results
Object-oriented SLAM - SLAM++ • Stronger assumptions: • World has intrinsic symmetry - repetitive objects. • Pre-define the objects. • Video: https://www.youtube.com/watch?v=tmrAh1CqCRo
Characteristics of Object SLAM • The map only contains a few discrete entities. • This makes it possible to jointly optimize over all object positions to produce a globally consistent map. • Tracking one object in 6DOF is enough to localize a camera. • Lost-camera recovery or loop closure detection can be performed based on a small number of object measurements.
SLAM Map Representation • Graph representation: • object - world pose • object - camera pose • camera - world pose • additional factors: • camera - camera pose (ICP). • structural priors: objects must be grounded on the same plane.
Real-Time Object Recognition • Point-Pair Features (PPFs). • 4D descriptors of relative position and normals of pairs of oriented points. • Randomly sample points and pair them up in all possible combinations to generate PPFs. • Match against models, producing a vote for each match. • Active Object Search: • Generate a mask in image space from view prediction. (avoid occlusion)
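The 4D descriptor can be sketched as below, using the commonly cited point-pair feature definition: the distance between the two points, the angle of each normal to the connecting vector, and the angle between the two normals. Whether SLAM++ uses exactly this parameterization is an assumption; sampling, quantization and voting are left out, and the function names are hypothetical.

```python
import math
from itertools import permutations

def _norm(v):
    return math.sqrt(sum(c * c for c in v))

def _angle(u, v):
    """Angle in radians between two non-zero 3D vectors."""
    cos = sum(a * b for a, b in zip(u, v)) / (_norm(u) * _norm(v))
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding error

def point_pair_feature(p1, n1, p2, n2):
    """4D descriptor of two oriented points (positions p, normals n)."""
    d = tuple(b - a for a, b in zip(p1, p2))
    return (_norm(d), _angle(n1, d), _angle(n2, d), _angle(n1, n2))

def all_pair_features(points, normals):
    """Features for every ordered pair of distinct sampled points.

    Assumes no two sampled points coincide, so the connecting vector is non-zero.
    """
    return {
        (i, j): point_pair_feature(points[i], normals[i], points[j], normals[j])
        for i, j in permutations(range(len(points)), 2)
    }
```

In a voting scheme, each feature would typically be quantized and used as a hash key into a table built from the object models, with every hit casting a vote for a model and pose.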
Active Object Search
Camera Tracking and Object Pose Estimation • Camera model tracking. • In KinectFusion, track against incomplete models at early stage. • For SLAM++, track against high quality models. • Tracking for model initialization. (reject incorrect objects) • Given a candidate object and detected pose, run camera-model ICP estimation on the detected object pose. • Camera-Object pose constraints. • Run dense ICP estimate between the live frame and each visible model object.
Relocalization
Loop Closure • Small loops: • standard ICP tracking mechanism • Large loops: • matching fragments within the main long-term graph (same as relocalization)
Loop Closure
Real-world Example Whelan, Thomas, et al. "3D mapping, localisation and object retrieval using low cost robotic platforms: A robotic search engine for the real-world." https://www.youtube.com/watch?v=XqDUniEY954
System Overview
Steps • Avoidance-based exploration. • Dense SLAM. • Planar simplification. • Object detection. • Path planning. • Onboard localization and control.
Problem: compounding of failure rates • Frame drops in the wireless streaming of raw RGB-D image sequences. • Failure of the planar segmentation algorithm. • Failure of object segmentation -> recognition due to noise. Conclusion: the techniques involved are quite robust on their own, but the reliability is not sufficient when they are combined into a complete framework.
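A small arithmetic sketch of why failure rates compound. It assumes, for illustration only, that the stages fail independently and that every stage must succeed; the 0.95 per-stage success rate is a made-up number, not a measurement from the paper.

```python
from math import prod

def pipeline_success_rate(stage_success_rates):
    """Probability that every stage succeeds, assuming independent failures."""
    return prod(stage_success_rates)

# Hypothetical example: six stages, each succeeding 95% of the time.
overall = pipeline_success_rate([0.95] * 6)
```

Under these assumptions the overall success rate drops to about 0.74, even though each stage looks reliable on its own.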