Real-Time Structure and Object-Model Aware Sparse SLAM


  1. Real-Time Structure and Object-Model Aware Sparse SLAM. ARC Centre of Excellence for Robotic Vision. Mehdi Hosseinzadeh, Australian Centre for Robotic Vision, The University of Adelaide. Invited Talk, Second LPM Workshop, ICRA 2019, Montreal, Canada, 23 May 2019.

  2. We want to build a robot that we can ask: “Go get me a spoon!”

  3. “The goal of semantic SLAM is to create maps that include meanings, both to robots and humans. Maps that include semantic information make it easier for robots and humans to communicate and reason about goals.” Meanings = semantics, affordances, relations, …

  4. Q: How to add semantics/objects to the map?

  5. Different approaches
     • Semantic Mapping
       – Offline approach
       – Online/Incremental approach
     • Semantic SLAM
       – Indirect methods
       – Direct methods

  6. Semantic Mapping
     • Incorporates semantics into the mapping without informing localization
     [Factor graph figure: semantic labels are attached to the reconstructed map, separate from the localization factors]
     • Examples: SemanticFusion, …

  7. Semantic Mapping
     • Offline approach
       – Map reconstruction followed by 3D semantic segmentation
     • Online/Incremental approach
       – Incremental map reconstruction and semantic segmentation

  8. Semantic SLAM
     • Semantics/objects also inform localization
       – Indirectly, by improving data association
         • Changing the topology of the factor graph
       – Directly, by taking part in “object space” optimization

  9. Indirect Semantic SLAM
     • High-level landmarks in the graph are typically coordinate frames connected to the object (pose/centre of the object)
     [Factor graph figure: object frames linked to camera poses via ICP-derived constraints]
     • Examples: SLAM++, …

  10. Direct Semantic SLAM
      • High-level landmarks in the graph: latent-space or coarse representations of objects
      • Object or structure landmarks are optimized as independent landmarks directly in BA
      [Figure: input image, coarse representation, fine reconstruction]

  11. Our Goal
      • Incorporate generic objects as quadrics
        – Detected by a real-time deep-learned object detector
      • Incorporate finer reconstructions of objects
        – Reconstructed by a point-set CNN
        – Used to refine the shape of the quadric
      • Incorporate the dominant planar structure of the scene
      • All within a sparse, keyframe-based point SLAM
        – More accurate localization
        – Semantically rich maps

  12. Plane and Quadric Representations in our Sparse SLAM
      • Proposed dual quadric representation
        – Compatible with sparse SLAM
        – Allows online updates
        – Allows semantic constraints
        – Estimates rough extent and orientation
        – Bounding boxes serve as observations in images
      • Structure of the scene: planes
        – Normalized homogeneous representations
      • Priors
        – Manhattan constraints
        – Affordance constraints
        – Shape priors

  13. Quadric Geometry
      • Point quadric $Q$
        – A quadric surface in 3D space (ellipsoids, …) can be represented by a homogeneous quadratic form on the projective space $\mathbb{P}^3$: a homogeneous point $\mathbf{x}$ lies on the quadric iff
          $\mathbf{x}^\top Q \,\mathbf{x} = 0$
        – The relationship between a point quadric and its projection into an image plane (a conic) is not straightforward.
      • Dual quadric $Q^*$
        – Represented as the envelope of the set of tangent planes:
          $\boldsymbol{\pi}^\top Q^* \boldsymbol{\pi} = 0$,
          where $\boldsymbol{\pi}$ is a plane tangent to the point quadric $Q$.
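A minimal numpy sketch of these two incidence relations, assuming an axis-aligned ellipsoid at the origin so that $Q$ is diagonal and $Q^*$ can be taken as $Q^{-1}$ (the dual is only defined up to scale):

```python
import numpy as np

# Axis-aligned ellipsoid at the origin with semi-axes (a, b, c):
# point quadric Q = diag(1/a^2, 1/b^2, 1/c^2, -1), so x^T Q x = 0 on the surface.
a, b, c = 2.0, 1.0, 0.5
Q = np.diag([1 / a**2, 1 / b**2, 1 / c**2, -1.0])

# Dual quadric: envelope of tangent planes. For a non-degenerate Q,
# Q* is proportional to Q^{-1} (here diag(a^2, b^2, c^2, -1)).
Q_star = np.linalg.inv(Q)

# A point on the surface (homogeneous coordinates).
x = np.array([a, 0.0, 0.0, 1.0])
assert abs(x @ Q @ x) < 1e-12          # x^T Q x = 0

# The tangent plane at a surface point x is its polar plane pi = Q x.
pi = Q @ x
assert abs(pi @ Q_star @ pi) < 1e-12   # pi^T Q* pi = 0
print("point and tangent-plane incidence verified")
```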

  14. Quadric Geometry
      • Dual quadric (ellipsoid) decomposition into a pose part $Z$ and a shape part $L$:
        $Q^* = Z \,\breve{Q}^* Z^\top = \begin{bmatrix} R & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{bmatrix} \begin{bmatrix} LL^\top & \mathbf{0} \\ \mathbf{0}^\top & -1 \end{bmatrix} \begin{bmatrix} R^\top & \mathbf{0} \\ \mathbf{t}^\top & 1 \end{bmatrix}$
      • Decoupled update in the underlying manifolds:
        $Q^* \oplus \Delta Q^* = (Z, L) \oplus (\Delta Z, \Delta L) = (Z \cdot \Delta Z,\; L + \Delta L)$
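A sketch of this composition and update, assuming $L = \mathrm{diag}(s_1, s_2, s_3)$ holds the semi-axes so that $\breve{Q}^* = \mathrm{diag}(s_1^2, s_2^2, s_3^2, -1)$, with the rotation increment applied through the exponential map:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def dual_quadric(R, t, s):
    """Compose Q* = Z diag(s1^2, s2^2, s3^2, -1) Z^T from the pose (R, t)
    and the semi-axes s, as in the decomposition on the slide."""
    Z = np.eye(4)
    Z[:3, :3], Z[:3, 3] = R, t
    Q_hat = np.diag([s[0]**2, s[1]**2, s[2]**2, -1.0])
    return Z @ Q_hat @ Z.T

def oplus(R, t, s, delta):
    """Decoupled update on the product manifold SE(3) x R^3: the pose
    block composes multiplicatively, the shape block adds."""
    dR = Rotation.from_rotvec(delta[:3]).as_matrix()  # so(3) increment
    return R @ dR, t + delta[3:6], s + delta[6:9]

R0, t0, s0 = np.eye(3), np.array([1.0, 0.0, 0.5]), np.array([2.0, 1.0, 0.5])
delta = 0.01 * np.ones(9)
R1, t1, s1 = oplus(R0, t0, s0, delta)
Q1 = dual_quadric(R1, t1, s1)   # the updated quadric stays a valid ellipsoid
```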

  15. Landmarks and Constraints
      • 3D Points
        – ORB features
        – Tracked based on inlier matched points and semantics
        – Point reprojection error factor to the camera
      • Objects
        – Represented by a quadric (9D), decomposed into pose and shape (slide 14)
        – Conic observation factor to the camera
      • 3D Planes
        – Minimal representation (normalized homogeneous plane)
        – 3D plane observation factor to the camera
        – Matched by the 3D geometry of the planes and inlier matched points
      • Point-plane constraint between points and their supporting plane
      • Supporting/tangency affordance, imposed based on:
        – Geometric tangency in the map
        – Vicinity of the semantic objects in the frame
      • Manhattan assumption:
        – Orthogonal planes
        – Parallel planes
      (a sketch of the point-plane and Manhattan residuals follows this list)
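A minimal sketch of two of these constraints as scalar/vector residuals, assuming the normalized homogeneous plane convention $\boldsymbol{\pi} = (\mathbf{n}, d)$ with $\|\mathbf{n}\| = 1$; the exact factor formulations in the paper may differ:

```python
import numpy as np

def normalize_plane(pi):
    """Normalized homogeneous plane (n, d) with ||n|| = 1 (the minimal rep)."""
    return pi / np.linalg.norm(pi[:3])

def point_plane_residual(p, pi):
    """Signed distance of a 3D point to a normalized plane: the
    point-plane constraint between a map point and its supporting plane."""
    n, d = pi[:3], pi[3]
    return n @ p + d

def manhattan_residual(pi_a, pi_b, parallel=False):
    """Manhattan prior between two planes: normals orthogonal (dot
    product -> 0) or parallel (cross product -> 0)."""
    na, nb = pi_a[:3], pi_b[:3]
    return np.cross(na, nb) if parallel else na @ nb

floor = normalize_plane(np.array([0.0, 0.0, 2.0, -1.0]))   # plane z = 0.5
wall  = normalize_plane(np.array([1.0, 0.0, 0.0, -3.0]))   # plane x = 3
print(point_plane_residual(np.array([0.0, 0.0, 0.5]), floor))  # ~0: on the plane
print(manhattan_residual(floor, wall))                          # ~0: orthogonal
```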

  16. Quadrics Projective Geometry: Observation Model
      • A dual quadric projects to a dual conic under the camera matrix $P$:
        $C^* = P \, Q^* P^\top$
      • The conic's bounding box in the image is what the detector's bounding box observes (see the sketch below).
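A numpy sketch of this observation model: project the dual quadric with $C^* = P Q^* P^\top$, then read the axis-aligned bounding box off the dual conic (tangent lines at constant $u$ or $v$ satisfy a quadratic whose roots are the extremes). The camera intrinsics below are made-up values for illustration:

```python
import numpy as np

def project_quadric(P, Q_star):
    """Observation model: dual quadric -> dual conic, C* = P Q* P^T,
    with P the 3x4 camera projection matrix."""
    return P @ Q_star @ P.T

def conic_bbox(C):
    """Axis-aligned bounding box of the dual conic C*: tangent lines
    l = (1, 0, -u) and (0, 1, -v) obey l^T C* l = 0, a quadratic in u, v."""
    u0, v0 = C[0, 2] / C[2, 2], C[1, 2] / C[2, 2]
    du = np.sqrt(C[0, 2]**2 - C[0, 0] * C[2, 2]) / abs(C[2, 2])
    dv = np.sqrt(C[1, 2]**2 - C[1, 1] * C[2, 2]) / abs(C[2, 2])
    return (u0 - du, v0 - dv, u0 + du, v0 + dv)

# Unit sphere 5 m in front of a 500 px focal-length camera.
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
Z = np.eye(4); Z[2, 3] = 5.0
Q_star = Z @ np.diag([1.0, 1.0, 1.0, -1.0]) @ Z.T
print(conic_bbox(project_quadric(P, Q_star)))  # box centred near (320, 240)
```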

  17. Quadric Initialization
      • Step 1: Enclosing sphere of the object's 3D points
      • Step 2: Optimize the observation factor $g(Q)$ independently
      • The 3D quadric is then ready for initialization in the factor graph (a sketch of Step 1 follows)
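A minimal sketch of Step 1, assuming the enclosing sphere is approximated by the centroid plus the maximum point distance (the system may compute a tighter minimum enclosing sphere):

```python
import numpy as np

def init_quadric_from_points(pts):
    """Step 1: bootstrap the quadric as an enclosing sphere of the
    object's associated 3D points (a sphere is the special ellipsoid
    with equal semi-axes, so orientation is irrelevant)."""
    centre = pts.mean(axis=0)
    radius = np.linalg.norm(pts - centre, axis=1).max()
    Z = np.eye(4)
    Z[:3, 3] = centre
    Q_hat = np.diag([radius**2] * 3 + [-1.0])
    return Z @ Q_hat @ Z.T   # dual quadric Q* = Z Q_hat* Z^T

pts = np.array([[0.2, 0.1, 1.0], [0.5, -0.1, 1.2], [0.3, 0.0, 0.8]])
Q0 = init_quadric_from_points(pts)  # crude but valid starting ellipsoid;
# Step 2 then refines Q0 by minimizing the observation factor g(Q) alone,
# before inserting the quadric into the factor graph.
```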

  18. Point-Cloud Reconstruction and Shape Priors
      • A point-set CNN reconstructs a normalized point cloud of the object
      • The minimum enclosing ellipsoid of the point cloud yields a quadric prior factor
      • A registration (R, t, s) aligns the reconstruction with the quadric representation in our SLAM map (a registration sketch follows)
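The "Registration (R, t, s)" step could be realised with the closed-form Umeyama similarity alignment; this is an illustrative stand-in, not necessarily the method used in the paper:

```python
import numpy as np

def umeyama_similarity(src, dst):
    """Closed-form similarity (R, t, s) minimizing ||dst - (s R src + t)||^2
    (Umeyama, 1991), e.g. to align the CNN point cloud with the map."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    X, Y = src - mu_s, dst - mu_d
    U, D, Vt = np.linalg.svd(Y.T @ X / len(src))  # cross-covariance SVD
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:  # avoid reflections
        S[2, 2] = -1.0
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / X.var(axis=0).sum()
    t = mu_d - s * R @ mu_s
    return R, t, s

# Quick check: recover a known similarity transform.
rng = np.random.default_rng(0)
src = rng.normal(size=(100, 3))
R_true = np.array([[0.0, -1, 0], [1, 0, 0], [0, 0, 1]])
dst = 2.0 * src @ R_true.T + np.array([1.0, 2.0, 3.0])
R, t, s = umeyama_similarity(src, dst)   # s ~ 2, t ~ (1, 2, 3)
```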

  19. Point-Cloud Reconstructions

  20. Monocular Plane Detection
      • RGB input → joint CNN regressing depth and surface normals
      • Clustering in depth and in normal space
      • Regressing to planar regions
      • Agreement of the first and second plane hypotheses yields the plane detection, π = (a, b, c, d)
      (a clustering sketch follows)
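A toy sketch of the two-stage grouping, assuming per-pixel back-projected points and CNN-predicted normals are given; k-means over normals stands in for the clustering stages on the slide, and the plane offset is averaged per cluster:

```python
import numpy as np
from sklearn.cluster import KMeans

def plane_hypotheses(points, normals, k=4):
    """Cluster pixels in normal space, then estimate each cluster's plane
    pi = (a, b, c, d) with n.p + d = 0. `points` are back-projected 3D
    points (N x 3) and `normals` are CNN-predicted unit normals (N x 3)."""
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(normals)
    planes = []
    for c in range(k):
        mask = labels == c
        n = normals[mask].mean(axis=0)
        n /= np.linalg.norm(n)                  # re-normalize the mean normal
        d = -(points[mask] @ n).mean()          # average offset along n
        planes.append(np.append(n, d))          # pi = (a, b, c, d)
    return planes
```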

  21. Pipeline of our system
      • Front end (per RGB frame), built on ORB-SLAM2:
        – Point feature extraction and matching
        – Joint CNN¹ regresses depth, surface normals, and semantic segmentation → plane detection and plane matching
        – YOLOv3 object detector → object tracking
        – Local map tracking with points, planes, quadrics, and object point-clouds → camera pose optimisation
      • On adding a keyframe:
        – Point-cloud reconstructor CNN² generates point-cloud hypotheses for registration constraints
        – Update local map of points, planes, quadrics
        – Local bundle adjustment
        – Bag-of-words loop detection → global bundle adjustment → estimated global map

      1. V. Nekrasov, T. Dharmasiri, A. Spek, T. Drummond, C. Shen, and I. Reid, “Real-time joint semantic segmentation and depth estimation using asymmetric annotations,” ICRA 2019.
      2. H. Fan, H. Su, and L. J. Guibas, “A point set generation network for 3D object reconstruction from a single image,” CVPR 2017.

  22. Results
      [Figures for fr2/desk, nyu office_1b, and nyu office_1: ORB features, detected objects, segmented planes, and the reconstructed map from side and top views]

  23. Different plane detectors
      [Figure: reconstructed map for fr1/xyz with the baseline plane detector vs. with the proposed plane detector]

  24. Outdoor large scale
      Reconstructed map for the KITTI-7 sequence with our SLAM system from different viewpoints, with and without rendered quadrics. The proposed object observation and point-cloud-induced prior factors are effective in this reconstruction.

  25. Quantitative Results
      • Ablation study against point-based monocular ORB-SLAM2
      Table: Comparison against monocular ORB-SLAM2. PP, PP+M, PO, and PPO+MS denote points-planes only, points-planes with the Manhattan constraint, points-objects only, and all landmarks with Manhattan and supporting constraints, respectively. RMSE of ATE is reported in cm for 7 sequences of the TUM datasets. Bold numbers mark the best performance for each sequence; numbers in [ ] show the percentage improvement over monocular ORB-SLAM2.

  26. Challenges
      • Engineering of a multi-component system
      • Ad-hoc semantic/geometric factors
      • Initialization matters …
        – Quadrics
          • Partial detections and occlusions
          • Missing the real orientation of objects

  27. Challenges

  28. Future Directions
      • Additional learned pose/orientation factors
      [Factor graph figure: camera, point, plane, and object landmarks with a learned 6D-pose factor on the object. Left image from PoseCNN]

  29. Future Directions
      • Topology of the factor graph (observation/constraint factors) based on scene graphs
      [Left image from VGfM]

  30. Demo 1: Real-Time Monocular Object-Model Aware Sparse SLAM (ICRA 2019)

  31. Demo 2: Real-Time Monocular Object-Model Aware Sparse SLAM (ICRA 2019)

  32. roboticvision.org
