Region Merging Driven by Deep Learning for RGB-D Segmentation and Labeling

ICDSC 2019
Region Merging Driven by Deep Learning for RGB-D Segmentation and Labeling
U. Michieli, M. Camporese, A. Agiollo, G. Pagnutti, P. Zanuttigh
September 9th, 2019

2 Outline
• Semantic Segmentation
• Proposed Framework
• Pre-processing
• Over-segmentation and Classification
• Merging Phase
• Results
• Conclusions and Future Work

3 Semantic Segmentation
[Figure: example indoor scene with regions labeled wall, objects, furniture, floor]
• Segmentation + labeling (pixel-wise classification)
• Deep learning and consumer depth sensors
• Very useful for free navigation systems that explore their surroundings

4 Semantic Segmentation

5 Proposed Framework

6 Proposed Framework
AIM: propose a CNN that drives region merging and refines the boundaries of shapes
Use normalized cuts spectral clustering extended to RGB-D data → but it is biased toward regions of similar size
Hence a 2-step procedure:
• Initial over-segmentation to properly separate objects
• Region merging procedure to remove the over-segmentation
Framework derived from [1], but much faster and simpler
[1] G. Pagnutti, L. Minto, P. Zanuttigh, "Segmentation and Semantic Labeling of RGBD Data with Convolutional Neural Networks and Surface Fitting", IET Computer Vision, 2017

7 Framework of [1]
[Figure: block diagram of the pipeline of [1]; diagram labels include 320x240x6 and 160x120x6]
• Pre-processing: depth data → (x, y, z) point set → geometry vectors (scaled by 1/σ_g); normals computation → orientation vectors (scaled by 1/σ_n); color data → RGB to CIELab conversion → color vectors (scaled by 1/σ_c)
• Over-segmentation and classification: normalized cuts spectral clustering; segment descriptors from a Convolutional Neural Network (CNN)
• Merge phase: compute similarity of adjacent segments → sort and discard segments below similarity threshold → select two segments to be joined → NURBS fitting on segment 1, segment 2 and their union → surface fitting accuracy improved? Yes: keep union. No: discard union
CONs:
• NURBS fitting very slow
• Many hand-tuned thresholds (on depth, color, normals, NURBS fitting)

8 Proposed Framework
PROs:
• Much faster
• Fewer thresholds
• Same accuracy

9 Proposed Framework – Preprocessing
• 3 channels for 3D location
• 3 channels for surface normals
• 3 channels for color representation → CIELab for perceptual uniformity
• Normalization to achieve a consistent representation across the 3 domains
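The 9-channel representation listed above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `build_9d_features` is hypothetical, the color block is assumed to be already converted to CIELab, and the normalization (dividing each 3-channel domain by the mean of its per-channel standard deviations) is an assumption chosen only to show how the three domains can be brought to a comparable scale.

```python
import numpy as np

def build_9d_features(xyz, normals, lab):
    """Stack geometry, orientation and color into an H x W x 9 array.

    Each 3-channel domain is divided by one scalar spread so that the
    three domains end up on comparable scales (assumed scheme).
    """
    def scale(block):
        # One sigma per domain: mean of the per-channel standard deviations.
        sigma = block.reshape(-1, 3).std(axis=0).mean()
        return block / sigma if sigma > 0 else block

    return np.concatenate([scale(xyz), scale(normals), scale(lab)], axis=-1)

# Synthetic stand-ins for a tiny 4 x 5 frame (illustrative data only).
rng = np.random.default_rng(0)
h, w = 4, 5
xyz = rng.normal(scale=2.0, size=(h, w, 3))
normals = rng.normal(size=(h, w, 3))
normals /= np.linalg.norm(normals, axis=-1, keepdims=True)  # unit normals
lab = rng.uniform(0, 100, size=(h, w, 3))

features = build_9d_features(xyz, normals, lab)
print(features.shape)
```

After scaling, each domain has an average per-channel standard deviation of 1, so no single domain dominates a distance computed on the 9D vectors.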

10 Proposed Framework – Over-segmentation
• Over-segmentation with normalized cuts spectral clustering with Nyström acceleration: 9D input
• CNN for the semantic labeling of each segment and for guiding the region merging process
  • 9 conv layers
  • 15 classes
  • very simple
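To make the normalized cuts step concrete, here is a small sketch of a single two-way normalized cut on 9D points. It deliberately swaps in an exact dense eigendecomposition instead of the Nyström approximation named on the slide, so it only suits tiny inputs. The function name, the Gaussian affinity and its `sigma`, and the zero threshold on the second eigenvector are all assumptions made for illustration.

```python
import numpy as np

def normalized_cut_bipartition(points, sigma=1.0):
    """Split points into two groups with one normalized cut.

    Dense, exact version (no Nystrom acceleration): suitable only for
    small point sets.
    """
    # Gaussian affinity between every pair of 9D feature vectors.
    sq_dist = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    affinity = np.exp(-sq_dist / (2.0 * sigma ** 2))

    # Symmetric normalized Laplacian: I - D^(-1/2) W D^(-1/2).
    d_inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1))
    laplacian = np.eye(len(points)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order; the second eigenvector,
    # mapped back through D^(-1/2), gives the relaxed cut indicator.
    _, vectors = np.linalg.eigh(laplacian)
    indicator = d_inv_sqrt * vectors[:, 1]
    return indicator > 0

# Two synthetic groups of 9D points (illustrative data only).
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 0.1, size=(10, 9))
group_b = rng.normal(1.0, 0.1, size=(10, 9))
side = normalized_cut_bipartition(np.vstack([group_a, group_b]))
print(side.astype(int))
```

Applying such cuts recursively, or clustering several eigenvectors at once, yields the many small segments that the merging stage later recombines.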

11 Proposed Framework – Region Merging
• Compute adjacency map of the segments
• Compute similarity between adjacent segment descriptors with the Bhattacharyya coefficient:
  $b_{i,j} = \sum_{t} \sqrt{s_i^t \, s_j^t}$
  t: class scores
  s_i: descriptors (~PDFs)
• Sort the list on the basis of $b_{i,j}$
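The similarity computation and the sorting step can be sketched as below. The descriptors are treated as discrete probability distributions over the classes; the example values and the `pairs` list are made up for illustration.

```python
import numpy as np

def bhattacharyya_coefficient(s_i, s_j):
    """Sum over classes t of sqrt(s_i[t] * s_j[t])."""
    s_i = np.asarray(s_i, dtype=float)
    s_j = np.asarray(s_j, dtype=float)
    return float(np.sqrt(s_i * s_j).sum())

# Illustrative 3-class descriptors for three segments.
descriptors = {
    0: [0.90, 0.05, 0.05],
    1: [0.80, 0.10, 0.10],
    2: [0.10, 0.80, 0.10],
}
pairs = [(0, 1), (1, 2)]  # adjacent segment pairs (assumed adjacency)

# Score every adjacent pair and sort from most to least similar.
scored = sorted(
    ((bhattacharyya_coefficient(descriptors[i], descriptors[j]), i, j) for i, j in pairs),
    reverse=True,
)
for score, i, j in scored:
    print(f"segments {i} and {j}: b = {score:.3f}")
```

The coefficient equals 1 for identical distributions and 0 for distributions with no class in common, so high values flag pairs that the semantic CNN sees as the same kind of surface.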

12 Proposed Framework
Iterative merging procedure
• Select pairs of adjacent segments with $b_{i,j} > T_{sim}$
• A CNN classifier decides whether the two segments will be joined or not
  • If merged: a new segment for the union is created and the list is updated
  • If not merged: the pair is removed from the list
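The loop above can be sketched as follows. Several parts are assumptions rather than details from the slides: the merging CNN is replaced by a caller-supplied `should_merge` function (here a stub that compares the top class), the descriptor of a merged segment is taken as the area-weighted average of its parts, and candidates are simply re-scored at every iteration instead of maintaining a sorted list.

```python
import numpy as np

def bhattacharyya(p, q):
    return float(np.sqrt(p * q).sum())

def merge_regions(descriptors, areas, adjacency, should_merge, t_sim=0.8):
    """Greedy merging of adjacent segments whose similarity exceeds t_sim."""
    desc, area = dict(descriptors), dict(areas)
    adj = {frozenset(pair) for pair in adjacency}
    rejected = set()            # pairs the classifier refused to merge
    next_id = max(desc) + 1
    while True:
        candidates = []
        for pair in adj - rejected:
            a, b = sorted(pair)
            score = bhattacharyya(desc[a], desc[b])
            if score > t_sim:
                candidates.append((score, a, b))
        if not candidates:
            return desc, area, adj
        _, a, b = max(candidates)           # most similar pair first
        pair = frozenset((a, b))
        if not should_merge(desc[a], desc[b]):
            rejected.add(pair)              # drop the pair from the list
            continue
        # Create the union segment (area-weighted descriptor: assumption).
        new, next_id = next_id, next_id + 1
        total = area[a] + area[b]
        desc[new] = (area[a] * desc[a] + area[b] * desc[b]) / total
        area[new] = total
        # Redirect the neighbours of a and b to the new segment.
        updated = set()
        for other_pair in adj - {pair}:
            if other_pair & pair:
                (neighbour,) = other_pair - pair
                updated.add(frozenset((new, neighbour)))
            else:
                updated.add(other_pair)
        adj = updated
        for old in (a, b):
            del desc[old], area[old]

# Illustrative data: segments 0-1 look like one class, 2-3 like another.
descriptors = {
    0: np.array([0.90, 0.05, 0.05]),
    1: np.array([0.80, 0.10, 0.10]),
    2: np.array([0.10, 0.80, 0.10]),
    3: np.array([0.15, 0.75, 0.10]),
}
areas = {0: 100, 1: 100, 2: 100, 3: 100}
adjacency = [(0, 1), (1, 2), (2, 3)]
same_top_class = lambda p, q: p.argmax() == q.argmax()  # stand-in for the merging CNN

desc, area, adj = merge_regions(descriptors, areas, adjacency, same_top_class)
print(sorted(desc), [sorted(p) for p in adj])
```

In this toy run the two similar pairs are merged and the remaining pair stays below the threshold, so the procedure stops with two segments.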

13 CNN for Region Merging – PDFs
CNN for classification (6 conv. layers, symmetric padding, 2x2 maxpool, ReLU)
• Input: the 2 outputs of the softmax layer of the semantic CNN (15 channels for each candidate segment), 560x425x30
• Architecture:
  • CONV 4@9x9, MAXP 2x2, RELU → 280x212x4
  • CONV 4@7x7, MAXP 2x2, RELU → 140x106x4
  • CONV 4@5x5, MAXP 2x2, RELU → 70x53x4
  • CONV 4@3x3, MAXP 2x2, RELU → 35x26x4
  • CONV 4@3x3, MAXP 2x2, RELU → 17x13x4
  • CONV 2@17x13 → 1x2
  • ARGMAX → Not merged / Merged
• Training: 50 epochs, batch size of 32 samples, cross-entropy (CE) and L2 regularization losses, Adam
• Hyperparameters: $T_{sim} = 0.8$; learning rate and regularization constant are negative powers of ten (exponents illegible in the source)
• Training time: about 11 hours on an NVIDIA Titan X GPU
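The feature-map sizes on this slide can be checked with a few lines of arithmetic. The sketch assumes that symmetric padding keeps the spatial size through each convolution, that each 2x2 max-pooling halves width and height with rounding down, and that the last convolution covers the whole remaining map. The parameter count additionally assumes one bias per filter, which the slides do not state.

```python
# (filters, (kernel_w, kernel_h)) for the five CONV + MAXP + RELU blocks.
BLOCKS = [(4, (9, 9)), (4, (7, 7)), (4, (5, 5)), (4, (3, 3)), (4, (3, 3))]

def trace_shapes(width, height, channels, blocks=BLOCKS, outputs=2):
    """Return the list of (width, height, channels) shapes and a parameter count."""
    shapes = [(width, height, channels)]
    params = 0
    for filters, (kw, kh) in blocks:
        params += filters * (kw * kh * channels + 1)   # weights + bias (assumed)
        # Padded conv keeps the size; 2x2 max-pooling halves it (floor).
        width, height, channels = width // 2, height // 2, filters
        shapes.append((width, height, channels))
    # Final convolution spans the whole map and emits one score per outcome.
    params += outputs * (width * height * channels + 1)
    shapes.append((1, 1, outputs))
    return shapes, params

pdf_shapes, pdf_params = trace_shapes(560, 425, 30)        # two 15-channel PDFs
normal_shapes, normal_params = trace_shapes(560, 425, 6)   # two 3-channel normal maps
for shape in pdf_shapes:
    print("x".join(str(v) for v in shape))
print("parameters (PDF input):", pdf_params)
print("parameters (normals input):", normal_params)
```

Under these assumptions only the first layer depends on the input type, which is why the normals variant is the lighter of the two.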

14 CNN for Region Merging – Normals
CNN for classification (6 conv. layers, symmetric padding, 2x2 maxpool, ReLU)
• Input: the surface normals of the 2 candidate segments (3 channels each), 560x425x6
• Architecture:
  • CONV 4@9x9, MAXP 2x2, RELU → 280x212x4
  • CONV 4@7x7, MAXP 2x2, RELU → 140x106x4
  • CONV 4@5x5, MAXP 2x2, RELU → 70x53x4
  • CONV 4@3x3, MAXP 2x2, RELU → 35x26x4
  • CONV 4@3x3, MAXP 2x2, RELU → 17x13x4
  • CONV 2@17x13 → 1x2
  • ARGMAX → Not merged / Merged
• Training: 50 epochs, batch size of 32 samples, cross-entropy (CE) and L2 regularization losses, Adam
• Hyperparameters: $T_{sim} = 0.75$; learning rate and regularization constant are negative powers of ten (exponents illegible in the source)
• Training time: about 3 hours on an NVIDIA Titan X GPU
→ PDFs give richer descriptions, while normals are faster with limited impact on the final accuracy

15 Experimental Results

16 NYUDv2 Dataset [2]
• 1449 depth maps + color images of indoor scenes acquired with a Kinect sensor
• Training set: 795 scenes
• Test set: 654 scenes
• 894 classes clustered into 15 classes as in [3]
• Unknown and unlabeled classes excluded
[Figure: sample scene with RGB, raw depth and ground truth (GT) panels]
[2] N. Silberman, D. Hoiem, P. Kohli, R. Fergus, "Indoor Segmentation and Support Inference from RGBD Images", ECCV, Springer, 2012
[3] C. Couprie, C. Farabet, L. Najman, Y. LeCun, "Indoor Semantic Segmentation using Depth Information", ICLR, 2013

17 Merging CNN – Ground Truth Generation
Need a dataset to train the merging CNN
• Randomly select 10 couples of adjacent segments in each image
• Assign label 1 if more than 85% of the union of the segments belongs to the same object in the semantic segmentation ground truth
• Assign label 0 otherwise
[Figure: selection of a segment → selection of an adjacent segment → ground truth examination → region appears to be uniform → label 1]
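The labeling rule above can be sketched as below. The function name `merge_label` and the toy maps are hypothetical, and the sketch measures the 85% share against every pixel of the union, since the slides do not say how unlabeled pixels are treated.

```python
import numpy as np

def merge_label(segment_map, gt_map, seg_a, seg_b, threshold=0.85):
    """Return 1 if one ground-truth label covers more than `threshold`
    of the union of the two segments, else 0."""
    union = (segment_map == seg_a) | (segment_map == seg_b)
    _, counts = np.unique(gt_map[union], return_counts=True)
    return int(counts.max() / union.sum() > threshold)

# Toy 2 x 10 over-segmentation with three segments (illustrative data).
segment_map = np.array([[1, 1, 1, 1, 2, 2, 2, 2, 3, 3]] * 2)
# Ground truth: label 5 under segments 1 and 2, label 7 under segment 3.
gt_map = np.array([[5, 5, 5, 5, 5, 5, 5, 5, 7, 7]] * 2)

print(merge_label(segment_map, gt_map, 1, 2))  # union is uniformly label 5
print(merge_label(segment_map, gt_map, 2, 3))  # union mixes labels 5 and 7
```

Sampling 10 random adjacent pairs per image and applying this rule would then give the binary training targets for the merging CNN.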

18 Merging CNN – GT Ambiguities
• Examples of ambiguities in ground truth:
  • Inconsistent labeling
  • Objects not labeled
[Figure: ground truth examples, one annotated "missing". Legend: Bed, Objects, Chair, Furniture, Ceiling, Floor, Picture/Deco, Sofa, Table, Wall, Windows, Books, Monitor/TV, Unknown]

19 Merging CNN – Results
• Good over-segmentation (inter-uniformity)
[Figure: example pairs grouped by Predicted: Merge / Predicted: Not Merged and GT: Merge / GT: Not Merged]

20 Merging CNN – Results
• Bad over-segmentation
[Figure: example pairs grouped by Predicted: Merge / Predicted: Not Merged and GT: Merge / GT: Not Merged]

21 Qualitative Results
[Figure columns: Color view, Semantic CNN, Pagnutti et al. [1], Our Approach, Ground Truth. Legend: Bed, Objects, Chair, Furniture, Ceiling, Floor, Picture/Deco, Sofa, Table, Wall, Windows, Books, Monitor/TV, Unknown]
