SLIDE 1

Single-View and Multi-View Planar Models for Dense Monocular Mapping

Alejo Concha, José M. Fácil and Javier Civera
SLAMLab – Robotics, Perception and Real-Time Group, Universidad de Zaragoza, Spain
International Workshop on Lines, Planes and Manhattan Models for 3-D Mapping (LPM 2017), September 28, 2017, IROS 2017, Vancouver.

SLIDE 2

Index

  • Motivation
  • Background (direct mapping)
    • Dense monocular mapping
  • Superpixels in monocular mapping
    • Superpixel triangulation
    • Dense mapping using superpixels
    • Superpixel fitting
  • Learning-based planar models in monocular mapping
    • Data-driven primitives
    • Layout
    • Deep models
  • Conclusions
SLIDE 3

Motivation

  • The scene model is limited in feature-based monocular SLAM.
  • Our goal: dense mapping from monocular (RGB) image sequences.
SLIDE 4

Background: Dense Monocular Mapping

[Table: Sparse/Semi-dense and Dense methods rated on Accuracy, Density and Cost under high and low texture.]

SLIDE 5

Dense Monocular Mapping: Low Texture

[Table excerpt: the Dense row of the accuracy/density/cost comparison under high and low texture.]

SLIDE 6

Superpixels (mid-level)

[Table: Sparse/Semi-dense, Dense, Superpixels, and Dense + Superpixels rated on Accuracy, Density and Cost under high and low texture.]

  • Image segmentation based on color and 2D distance (a minimal sketch follows this list).
  • They provide usable features for textureless areas.
  • We assume that regions of homogeneous color are almost planar.
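The talk does not name a specific segmentation algorithm; SLIC, which clusters pixels by color and 2D image distance, is one standard choice. A minimal sketch with scikit-image, where the file name and parameter values are illustrative:

```python
# Sketch: superpixel segmentation by color + 2D distance (SLIC).
import numpy as np
from skimage import io
from skimage.segmentation import slic, mark_boundaries

image = io.imread("frame.png")            # any RGB keyframe (name illustrative)
labels = slic(image,
              n_segments=300,             # rough superpixel count
              compactness=10.0)           # balance: color vs. 2D distance
overlay = mark_boundaries(image, labels)  # draw superpixel contours
io.imsave("superpixels.png", (overlay * 255).astype(np.uint8))
```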

SLIDE 7

Dense Mapping: Low Texture

[Table excerpt: the Dense row of the accuracy/density/cost comparison under high and low texture.]

SLIDE 8

Semi-dense Mapping: Low Texture

[Table excerpt: the Sparse/Semi-dense row of the accuracy/density/cost comparison under high and low texture.]

SLIDE 9

2D Superpixels: Low Texture

[Table excerpt: the Superpixels row of the accuracy/density/cost comparison under high and low texture.]

SLIDE 10

Superpixel Triangulation

[Figure: the supporting plane of a superpixel induces a homography H between two views.]

  • Multi-view model: plane-induced homography H = K (R + t nᵀ / d) K⁻¹, where R, t is the relative camera motion, n and d are the plane normal and distance, and K is the calibration matrix.
  • Error: contour reprojection error (ɛ).
  • Monte Carlo initialization: for every superpixel we create several plausible {n, d} hypotheses and rank them by their error (see the sketch below).
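A minimal numpy sketch of the two ingredients above: warp the superpixel contour with a hypothesized plane via H = K (R + t nᵀ / d) K⁻¹, and rank the sampled {n, d} hypotheses by error. The error here is a simplified point-to-point version of the contour reprojection error ɛ, and all names are illustrative:

```python
import numpy as np

def plane_homography(K, R, t, n, d):
    """H = K (R + t n^T / d) K^{-1}: homography induced by the plane n.x = d."""
    return K @ (R + np.outer(t, n) / d) @ np.linalg.inv(K)

def contour_error(H, contour_ref, contour_obs):
    """Simplified contour reprojection error: mean distance between the
    reference contour warped by H and the contour observed in view 2
    (assumes point-to-point correspondences for brevity)."""
    pts = np.c_[contour_ref, np.ones(len(contour_ref))]  # to homogeneous
    warped = (H @ pts.T).T
    warped = warped[:, :2] / warped[:, 2:3]              # back to pixels
    return np.mean(np.linalg.norm(warped - contour_obs, axis=1))

def rank_hypotheses(K, R, t, hypotheses, contour_ref, contour_obs):
    """Monte Carlo initialization: score every sampled (n, d) hypothesis
    and return them sorted from best (lowest error) to worst."""
    scored = [(contour_error(plane_homography(K, R, t, n, d),
                             contour_ref, contour_obs), (n, d))
              for n, d in hypotheses]
    return sorted(scored, key=lambda s: s[0])
```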

SLIDE 11

Superpixel Triangulation

  • Multi-view model: plane-induced homography H = K (R + t nᵀ / d) K⁻¹ (as above).
  • Error: contour reprojection error (ɛ).
  • Mapping: minimize the reprojection error.

[Figure: homography H between two views of the superpixel.]

SLIDE 12

Superpixels in low-textured areas

[Table excerpt: the Superpixels row of the accuracy/density/cost comparison under high and low texture.]

SLIDE 13

Using Superpixels in Monocular SLAM

SLIDE 14

Dense + Superpixels

SLIDE 15

Dense + Superpixels

[Table excerpt: the Dense + Superpixels row of the accuracy/density/cost comparison under high and low texture.]

SLIDE 16

[Video: input sequence compared with PMVS (high-gradient pixels), Dense (TV-regularization), Superpixels, PMVS + Superpixels, and Dense + Superpixels reconstructions.]

Dense + Superpixels: 5 cm error!

Alejo Concha and Javier Civera. Using Superpixels in Monocular SLAM. ICRA 2014.
Yasutaka Furukawa and Jean Ponce. Accurate, dense, and robust multiview stereopsis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(8):1362–1376, 2010.
Based on: Richard A. Newcombe, Steven J. Lovegrove, and Andrew J. Davison. DTAM: Dense tracking and mapping in real-time. ICCV 2011, pages 2320–2327.
Alejo Concha, Wajahat Hussain, Luis Montano and Javier Civera. Manhattan and Piecewise-Planar Constraints for Dense Monocular Mapping. RSS 2014.

SLIDE 17

Fitting 3D Superpixels to Semi-dense Maps

  • TV-regularization is expensive; a GPU might be needed for real-time operation.
  • Semi-dense mapping plus superpixels is a reasonable alternative: cheaper than TV-regularization (it runs on a CPU), with only a small loss in density.
  • Given a semi-dense map, superpixels can be initialized via SVD more accurately and at a lower cost (see the sketch after the link below).
  • LIMITATION: we need parallax!

Code at https://github.com/alejocb/dpptam
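The full implementation lives in the repository above; the following is only a minimal sketch of the SVD initialization step, using the standard total-least-squares plane fit on the semi-dense points that fall inside one superpixel:

```python
import numpy as np

def fit_plane_svd(points):
    """Total least-squares plane fit. points: (N, 3) semi-dense map
    points that project inside one superpixel. Returns (n, d) such
    that the plane satisfies n.x = d."""
    centroid = points.mean(axis=0)
    # The right singular vector with the smallest singular value spans
    # the direction of least variance, i.e. the plane normal.
    _, _, vt = np.linalg.svd(points - centroid)
    n = vt[-1]
    d = float(n @ centroid)
    return n, d
```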

SLIDE 18

Data-driven primitives (mid-level)

  • Feature discovery on RGB-D training data.
  • Extracts patterns that are consistent in depth and discriminative in RGB.
  • At test time, from a single RGB view, we can predict mid-level depth patterns.

SLIDE 19

Multiview Layout (high-level)

(a) Sparse/semi-dense reconstruction. (b) Plane normals from 3D vanishing points (image VPs, backprojection, 3D clustering; see the sketch below). (c) Plane distances from the sparse/semi-dense multi-view reconstruction. (d) Superpixel segmentation, geometric and photometric feature extraction. (e), (f) Classification (AdaBoost).
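The backprojection in step (b) can be made concrete: an image vanishing point v corresponds to the 3D direction K⁻¹v in camera coordinates, and clustering these directions yields the candidate plane normals. A minimal sketch, with illustrative names:

```python
import numpy as np

def vp_to_direction(K, vp):
    """Backproject an image vanishing point (x, y), in pixels, to the
    unit 3D direction K^{-1} v in camera coordinates. Clustering these
    directions gives the candidate plane normals of step (b)."""
    ray = np.linalg.inv(K) @ np.array([vp[0], vp[1], 1.0])
    return ray / np.linalg.norm(ray)
```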

SLIDE 20

Superpixels, Data-Driven Primitives and Layout

SLIDE 21

Superpixels, Data-Driven Primitives and Layout

  • NYU dataset, high-parallax sequences.

SLIDE 22

Superpixels, Data-Driven Primitives and Layout

  • NYU dataset, low-parallax sequences.

SLIDE 23

Single-View Depth Prediction

  • Several networks already exist (Eigen14, Eigen15, Liu15, Liu15, Chakrabarti16, Cao16, Godard16, Ummenhofer16, …).

SLIDE 24

Deep Learning Depth vs. Multiview Depth

Deep Learning Depth                 | Multiview Depth
------------------------------------|------------------------------------
Fairly accurate in all pixels       | Very accurate in high-gradient pixels, inaccurate in low-gradient ones
Fairly accurate for a single view   | Very accurate for high-parallax motion, inaccurate for low-parallax motion
No model for the error              | Good model for the error
Approximate scale                   | 3D reconstruction up to scale
Errors depend on the image content  | Errors depend on the geometry

SLIDE 25

Fusing Depth from Deep Learning and Multiple Views

  • The fusion is not trivial:
    • No uncertainty estimate for CNN depth.
    • Errors come from different sources.
  • Our assumptions (a minimal fusion sketch follows this list):
    • In general, deep learning depth is more accurate.
    • Multi-view depth is more accurate for high texture and high parallax.
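The slides state the assumptions but not a fusion formula; the sketch below is one illustrative way to realize them, not the authors' exact formulation: rescale the CNN depth to the multi-view map's scale and keep multi-view depth only where it is reliable. All thresholds and names are hypothetical.

```python
import numpy as np

def fuse_depth(cnn_depth, mv_depth, grad_mag, parallax,
               grad_thr=0.1, parallax_thr=0.5):
    """Fuse a dense CNN depth map with a (possibly sparse) multi-view
    depth map. mv_depth contains NaN where multi-view mapping produced
    no estimate; thresholds are hypothetical placeholders."""
    # Multi-view depth is trusted only at high-gradient, high-parallax pixels.
    reliable = (np.isfinite(mv_depth)
                & (grad_mag > grad_thr)
                & (parallax > parallax_thr))
    # CNN depth has only approximate scale: rescale it to the multi-view
    # scale with the median depth ratio over the reliable pixels.
    scale = np.median(mv_depth[reliable] / cnn_depth[reliable])
    fused = scale * cnn_depth
    fused[reliable] = mv_depth[reliable]
    return fused
```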

SLIDE 26

Results

  • The error of deep learning depth is ~50% lower than the multi-view one.
  • Our fusion reduces the error by ~10% over the deep learning results.
  • The scale-invariant metric shows that our fusion improves the structure.
  • Deep depth generalizes well (Eigen15 was trained on NYU but is accurate on TUM).
SLIDE 27

Conclusions (no free lunch!)

  • Point-based features (low-level)
    • High accuracy iff ↑texture and ↑parallax.
  • Superpixels (mid-level)
    • High accuracy iff ↓texture and ↑parallax.
  • Data-driven primitives (mid-level)
    • Fair accuracy for ↑texture and ↓parallax.
    • Not fully dense.
  • Layout (high-level)
    • Fair accuracy even for ↓texture and ↓parallax.
    • Assumes a predetermined scene shape.
  • Deep learning (mid/high-level)
    • Fair accuracy even for ↓texture and ↓parallax.
    • Fully dense.
    • More general.
