Single-View and Multi-View Planar Models for Dense Monocular Mapping
Alejo Concha, José M. Fácil and Javier Civera
SLAMLab – Robotics, Perception and Real-Time Group, Universidad de Zaragoza, Spain
International Workshop on Lines, Planes and Manhattan Models for 3-D Mapping (LPM 2017)
September 28, 2017, IROS 2017, Vancouver.
Index
• Motivation
• Background (direct mapping)
  • Dense monocular mapping
• Superpixels in monocular mapping
  • Superpixel triangulation
  • Dense mapping using superpixels
  • Superpixel fitting
• Learning-based planar models in monocular mapping
  • Data-driven primitives
  • Layout
  • Deep models
• Conclusions
Motivation
• Feature-based monocular SLAM produces only a sparse, limited model of the scene.
• Our goal: dense mapping from monocular (RGB) image sequences.
Background: Dense Monocular Mapping
[Chart: accuracy, density and cost of sparse/semi-dense and dense methods in high-texture and low-texture scenes]
Dense Monocular Mapping: Low Texture
[Chart: accuracy, density and cost of dense mapping in high-texture and low-texture scenes]
Superpixels (mid-level)
• Image segmentation based on color and 2D distance.
• Decent features for textureless areas.
• We assume that homogeneous color regions are almost planar.
[Chart: accuracy, density and cost of sparse/semi-dense, dense, superpixel and dense + superpixel methods in high-texture and low-texture scenes]
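As a concrete illustration, the sketch below segments an image into superpixels with SLIC, which clusters pixels by color and 2D distance. It uses scikit-image with a bundled test image; the parameter values (number of segments, compactness) are assumptions for illustration, not the values used in the talk.

```python
# Minimal sketch (assumed parameters): SLIC superpixel segmentation,
# clustering pixels by color and 2D distance, via scikit-image.
import numpy as np
from skimage import data
from skimage.segmentation import slic

image = data.astronaut()                 # any RGB keyframe; bundled test image here
labels = slic(image,
              n_segments=300,            # target number of superpixels (assumed)
              compactness=10.0)          # color vs. spatial distance trade-off (assumed)

# Each superpixel is the set of pixels sharing a label; its 2D contour is what
# the triangulation later in the talk operates on.
for sp_id in np.unique(labels):
    mask = labels == sp_id               # boolean mask of one superpixel
```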
Dense Mapping: Low Texture
[Chart: accuracy, density and cost of dense mapping in high-texture and low-texture scenes]
Semi-dense Mapping: Low Texture
[Chart: accuracy, density and cost of sparse/semi-dense mapping in high-texture and low-texture scenes]
2D Superpixels: Low Texture
[Chart: accuracy, density and cost of superpixels in high-texture and low-texture scenes]
Superpixel Triangulation
• Multi-view model: plane-induced homography H = K (R + t nᵀ / d) K⁻¹
• Error: contour reprojection error (ε)
• Monte Carlo initialization: for every superpixel we create several reasonable {n, d} hypotheses and rank them by their error.
Superpixel Triangulation
• Multi-view model: plane-induced homography H = K (R + t nᵀ / d) K⁻¹
• Error: contour reprojection error (ε)
• Mapping: minimize the contour reprojection error over {n, d}.
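A minimal numeric sketch of the two ingredients above: the plane-induced homography H = K (R + t nᵀ / d) K⁻¹ between two views, and the contour reprojection error used to rank Monte Carlo {n, d} hypotheses. This is not the paper's implementation; the calibration K, the relative pose (R, t) and the contours are placeholder values.

```python
# Minimal sketch (placeholder data, not the paper's code): plane-induced
# homography H = K (R + t n^T / d) K^{-1} and contour reprojection error
# for a {n, d} plane hypothesis of one superpixel.
import numpy as np

def plane_homography(K, R, t, n, d):
    """Homography mapping reference-view pixels to the second view for a
    plane with unit normal n and distance d in the reference camera frame."""
    return K @ (R + np.outer(t, n) / d) @ np.linalg.inv(K)

def contour_reprojection_error(H, contour_ref, contour_obs):
    """Mean distance between the reference contour warped by H and the
    contour observed in the second view."""
    pts_h = np.column_stack([contour_ref, np.ones(len(contour_ref))])  # homogeneous
    warped = (H @ pts_h.T).T
    warped = warped[:, :2] / warped[:, 2:3]                            # back to pixels
    d2 = ((warped[:, None, :] - contour_obs[None, :, :]) ** 2).sum(-1)
    return np.sqrt(d2.min(axis=1)).mean()

# Monte Carlo initialization: sample several {n, d} hypotheses and keep the
# one with the lowest contour reprojection error (all values below are assumed).
K = np.array([[525.0, 0, 319.5], [0, 525.0, 239.5], [0, 0, 1]])
R, t = np.eye(3), np.array([0.1, 0.0, 0.0])           # relative pose (placeholder)
contour_ref = np.random.rand(50, 2) * 100 + 200        # superpixel contour, view 1
contour_obs = contour_ref + np.array([5.0, 0.0])       # matched contour, view 2

hypotheses = [(np.array([0.0, 0.0, 1.0]), d) for d in np.linspace(1.0, 5.0, 20)]
errors = [contour_reprojection_error(plane_homography(K, R, t, n, d),
                                     contour_ref, contour_obs)
          for n, d in hypotheses]
best_n, best_d = hypotheses[int(np.argmin(errors))]
```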
Superpixels in low-textured areas
[Chart: accuracy, density and cost of superpixels in high-texture and low-texture scenes]
Using Superpixels in Monocular SLAM
Dense + Superpixels
Dense + Superpixels
[Chart: accuracy, density and cost of dense + superpixel mapping in high-texture and low-texture scenes]
Dense + Superpixels (5 centimetres error!)
[Figure panels: Video (input), PMVS (high-gradient pixels), Dense (TV-regularization), Superpixels, PMVS + Superpixels, Dense + Superpixels]
Based on:
• Richard A. Newcombe, Steven J. Lovegrove and Andrew J. Davison. DTAM: Dense tracking and mapping in real-time. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 2320–2327. IEEE, 2011.
• Yasutaka Furukawa and Jean Ponce. Accurate, dense, and robust multiview stereopsis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(8):1362–1376, 2010.
• Alejo Concha and Javier Civera. Using Superpixels in Monocular SLAM. ICRA 2014.
• Alejo Concha, Wajahat Hussain, Luis Montano and Javier Civera. Manhattan and Piecewise-Planar Constraints for Dense Monocular Mapping. RSS 2014.
Fitting 3D Superpixels to Semi-dense Maps
• TV-regularization is expensive; a GPU might be needed for real-time operation.
• Semi-dense mapping plus superpixels is a reasonable, cheaper alternative to TV-regularization (it runs on a CPU), with only a small loss in density.
• Given a semi-dense map, superpixels can be initialized via SVD more accurately and at a lower cost (see the sketch below).
• LIMITATION: we need parallax!
Code at https://github.com/alejocb/dpptam
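The SVD initialization mentioned above can be sketched as a total-least-squares plane fit to the semi-dense 3D points that fall inside a superpixel. The point set below is synthetic, and the function is a simplification rather than the DPPTAM code.

```python
# Minimal sketch (synthetic data, simplified vs. DPPTAM): initialize a
# superpixel plane {n, d} from the semi-dense 3D points inside it, via SVD.
import numpy as np

def fit_plane_svd(points):
    """Return (unit normal n, distance d) of the plane n.x = d that best
    fits the 3D points in the total least-squares sense."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    n = vt[-1]                       # direction of smallest variance
    d = float(n @ centroid)
    return n, d

# Synthetic semi-dense points near the plane z = 2, with mild noise (assumed data).
rng = np.random.default_rng(0)
pts = np.column_stack([rng.uniform(-1, 1, 200),
                       rng.uniform(-1, 1, 200),
                       2.0 + 0.01 * rng.standard_normal(200)])
n, d = fit_plane_svd(pts)
print(n, d)   # approximately [0, 0, 1] and 2.0
```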
Data-driven primitives (mid-level)
• Feature discovery on RGB-D training data: extracts patterns that are consistent in depth (D) and discriminative in RGB.
• At test time, from a single RGB view we can predict mid-level depth patterns.
Multiview Layout (high-level)
(a) Sparse/semi-dense reconstruction.
(b) Plane normals from 3D vanishing points (image VP detection, backprojection, 3D clustering); see the sketch below.
(c) Plane distances from a sparse/semi-dense multi-view reconstruction.
(d) Superpixel segmentation; geometric and photometric feature extraction.
(e), (f) Classification (AdaBoost).
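Step (b) back-projects image vanishing points into 3D directions, which serve as candidate plane normals for the layout. A minimal sketch of that back-projection follows; the calibration matrix K and the vanishing point coordinates are placeholder values.

```python
# Minimal sketch of step (b): back-project an image vanishing point into a
# 3D direction (camera frame) with the calibration K. K and vp are placeholders.
import numpy as np

def vp_to_direction(K, vp_pixel):
    """3D direction corresponding to the vanishing point vp_pixel = (u, v)."""
    v_h = np.array([vp_pixel[0], vp_pixel[1], 1.0])
    d = np.linalg.inv(K) @ v_h
    return d / np.linalg.norm(d)

K = np.array([[525.0, 0, 319.5], [0, 525.0, 239.5], [0, 0, 1]])
print(vp_to_direction(K, (1200.0, 240.0)))   # e.g. one horizontal Manhattan direction
```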
Superpixels, Data-Driven Primitives and Layout
Superpixels, Data-Driven Primitives and Layout
• NYU dataset, high-parallax sequences
Superpixels, Data-Driven Primitives and Layout • NYU dataset, low-parallax sequences
Single-View Depth Prediction
• Several networks already exist (Eigen14, Eigen15, Liu15, Chakrabarti16, Cao16, Godard16, Ummenhofer16, …)
Deep Learning Depth vs. Multiview Depth
Deep learning depth:
• Fairly accurate in all pixels
• Fairly accurate from a single view
• No model for the error
• Approximate scale
• Errors depend on the image content
Multiview depth:
• Very accurate in high-gradient pixels, inaccurate in low-gradient ones
• Very accurate for high-parallax motion, inaccurate for low-parallax one
• Good model for the error
• 3D reconstruction up to scale
• Errors depend on the geometry
Fusing depth from deep learning and multiple views
• The fusion is not trivial: there is no uncertainty model for the CNN depth, and the errors of the two sources come from different causes.
• Our assumption: in general, deep learning depth is more accurate; multi-view depth is more accurate for high-texture, high-parallax pixels (see the toy sketch below).
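The toy sketch below only illustrates the assumption stated above; it is not the authors' fusion method. It trusts the multi-view depth where image gradient (texture) and parallax are high and falls back to the CNN depth elsewhere; all thresholds and the data are made up.

```python
# Toy illustration (NOT the authors' fusion algorithm): per-pixel combination of
# CNN depth and multi-view depth, trusting multi-view depth only where texture
# (image gradient) and parallax are high. All thresholds/data are assumed.
import numpy as np

def fuse_depths(depth_cnn, depth_mv, grad_mag, parallax, g_thr=0.1, p_thr=0.05):
    """Use multi-view depth where it is reliable, CNN depth elsewhere."""
    reliable_mv = (grad_mag > g_thr) & (parallax > p_thr) & np.isfinite(depth_mv)
    return np.where(reliable_mv, depth_mv, depth_cnn)

# Synthetic example (assumed data): 480x640 maps.
h, w = 480, 640
depth_cnn = np.full((h, w), 2.0)                 # dense but only approximately scaled
depth_mv = np.full((h, w), np.nan)               # semi-dense, accurate where defined
depth_mv[:, ::10] = 1.9                          # e.g. high-gradient columns
grad_mag = np.zeros((h, w)); grad_mag[:, ::10] = 0.5
parallax = np.full((h, w), 0.1)
fused = fuse_depths(depth_cnn, depth_mv, grad_mag, parallax)
```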
Results
• The error of deep learning depth is ~50% lower than that of multi-view depth.
• Our fusion reduces the error by a further ~10% over the deep learning results.
• The scale-invariant metric shows that our fusion corrects the structure.
• Deep depth generalizes well (Eigen15 was trained on NYU but is accurate on TUM).
Conclusions (no free lunch!)
• Point-based features (low-level): high accuracy iff ↑texture and ↑parallax.
• Superpixels (mid-level): high accuracy iff ↓texture and ↑parallax.
• Data-driven primitives (mid-level): fair accuracy for ↑texture and ↓parallax. Not fully dense.
• Layout (high-level): fair accuracy even for ↓texture and ↓parallax. Assumes a predetermined scene shape.
• Deep learning (mid/high-level): fair accuracy even for ↓texture and ↓parallax. Fully dense. More general.