Deep learning 8.4. Networks for semantic segmentation François Fleuret https://fleuret.org/ee559/ Nov 2, 2020

The historical approach to image segmentation was to define a measure of similarity between pixels, and to cluster groups of similar pixels. Such approaches account poorly for semantic content.

The deep-learning approach re-casts semantic segmentation as pixel classification, and re-uses networks trained for image classification by making them fully convolutional.
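A minimal sketch of what "making a network fully convolutional" means for a classification head, assuming PyTorch. The sizes (`in_features`, `nb_classes`, the 4 × 5 feature map) are hypothetical values chosen only for illustration. It checks that a linear classifier applied at every location is the same computation as a 1 × 1 convolution, which yields one class prediction per pixel of the feature map.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
in_features, nb_classes = 8, 3

# A classification head that operates on a single feature vector.
fc = nn.Linear(in_features, nb_classes)

# The same weights, viewed as a 1x1 convolution.
conv = nn.Conv2d(in_features, nb_classes, kernel_size=1)
with torch.no_grad():
    conv.weight.copy_(fc.weight.view(nb_classes, in_features, 1, 1))
    conv.bias.copy_(fc.bias)

# A feature map holding one feature vector per location.
features = torch.randn(1, in_features, 4, 5)

# Convolutional form: one score vector per location.
scores_conv = conv(features)  # 1 x nb_classes x 4 x 5

# Linear form: move channels last, apply fc at every location, move back.
scores_fc = fc(features.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)

same = torch.allclose(scores_conv, scores_fc, atol=1e-6)

# Pixel classification: the predicted class is the arg-max over channels.
labels = scores_conv.argmax(dim=1)  # 1 x 4 x 5
```

Since the convolutional form does not depend on the spatial size of `features`, the same head can process inputs of any resolution.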

Shelhamer et al. (2016) proposed the FCN (“Fully Convolutional Network”), which uses a pre-trained classification network (e.g. the 16-layer VGG). The fully connected layers are converted to 1 × 1 convolutional filters, and the final one is retrained for 21 output channels (the 20 VOC classes + “background”).

Since VGG16 has 5 max-pooling layers with 2 × 2 kernels, with proper padding, the output is $1/2^5 = 1/32$ the size of the input. This map is then up-scaled with a de-convolution layer with kernel 64 × 64 and stride 32 × 32 to get a final map of the same size as the input image.

Training is achieved with full images and pixel-wise cross-entropy, starting with a pre-trained VGG16. All layers are fine-tuned, although fixing the up-scaling de-convolution to bilinear does as well.
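The size arithmetic above can be checked with a few lines of plain Python. This is a sketch under stated assumptions: the convolutions are taken to preserve the spatial size, the input size is taken to be a multiple of 32, and the padding of 16 for the de-convolution is an assumption (the slides only give the kernel and the stride). The function names are hypothetical.

```python
def pooled_size(size, nb_pools=5):
    """Spatial size after nb_pools 2x2 max-poolings of stride 2.

    Convolutions are assumed to be padded so that they keep the size.
    """
    for _ in range(nb_pools):
        size = size // 2
    return size


def transposed_conv_size(size, kernel, stride, padding):
    """Output size of a transposed convolution (no output padding, no dilation)."""
    return (size - 1) * stride - 2 * padding + kernel


# A 224-pixel side is reduced by 2^5 = 32 ...
coarse = pooled_size(224)

# ... and brought back to the input size by the 64x64, stride-32 de-convolution.
# padding=16 is an assumption, it is the value for which the output is
# exactly 32 times the input.
full = transposed_conv_size(coarse, kernel=64, stride=32, padding=16)
```

With these values, `(size - 1) * 32 - 32 + 64` simplifies to `32 * size`, so the up-scaling exactly undoes the five poolings.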

FCN architecture (scale relative to the input, number of channels):

  input                       1,    3d
  2 × conv/relu + maxpool     1/2,  64d
  2 × conv/relu + maxpool     1/4,  128d
  3 × conv/relu + maxpool     1/8,  256d
  3 × conv/relu + maxpool     1/16, 512d
  3 × conv/relu + maxpool     1/32, 512d
  2 × fc-conv/relu            1/32, 4096d
  fc-conv                     1/32, 21d
  deconv × 32                 1,    21d

The layers up to the 2 × fc-conv/relu are VGG without its last layer.
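The last three rows of this architecture can be sketched in PyTorch as follows. This is an illustration under assumptions, not the authors' implementation: the class name `FCNHead` and the helper `bilinear_kernel` are hypothetical, the padding of 16 is an assumption, and the example runs on random features with reduced channel counts instead of a pre-trained VGG16. The de-convolution is initialized to bilinear up-scaling, and the loss is the pixel-wise cross-entropy.

```python
import torch
from torch import nn
import torch.nn.functional as F


def bilinear_kernel(channels, kernel_size):
    """Transposed-convolution weights for bilinear up-scaling.

    Each channel is up-scaled independently of the others.
    """
    factor = (kernel_size + 1) // 2
    center = factor - 1 if kernel_size % 2 == 1 else factor - 0.5
    pos = torch.arange(kernel_size, dtype=torch.float)
    profile = 1 - (pos - center).abs() / factor
    weight = torch.zeros(channels, channels, kernel_size, kernel_size)
    for c in range(channels):
        weight[c, c] = profile[:, None] * profile[None, :]
    return weight


class FCNHead(nn.Module):
    """fc-conv layers, 21-channel prediction, and x32 de-convolution."""

    def __init__(self, in_channels=512, hidden=4096, nb_classes=21):
        super().__init__()
        self.fc_conv = nn.Sequential(
            nn.Conv2d(in_channels, hidden, kernel_size=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, kernel_size=1), nn.ReLU(),
        )
        self.score = nn.Conv2d(hidden, nb_classes, kernel_size=1)
        # padding=16 is an assumption so that the output is exactly x32.
        self.upscale = nn.ConvTranspose2d(
            nb_classes, nb_classes, kernel_size=64,
            stride=32, padding=16, bias=False,
        )
        with torch.no_grad():
            self.upscale.weight.copy_(bilinear_kernel(nb_classes, 64))

    def forward(self, features):
        return self.upscale(self.score(self.fc_conv(features)))


torch.manual_seed(0)

# Reduced sizes to keep the example light; random features stand in for
# the 1/32-scale output of the convolutional part of VGG16.
head = FCNHead(in_channels=8, hidden=16)
features = torch.randn(2, 8, 3, 4)
logits = head(features)  # 2 x 21 x 96 x 128

# Pixel-wise cross-entropy against one class index per pixel.
target = torch.randint(0, 21, (2, 96, 128))
loss = F.cross_entropy(logits, target)
```

Keeping the de-convolution fixed to bilinear amounts to excluding `head.upscale.weight` from the optimized parameters.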

Although the FCN achieved almost state-of-the-art results when published, its main weakness is the coarseness of the signal from which the final output is produced (1/32 of the original resolution).

Shelhamer et al. proposed an additional element, which consists of applying the same prediction/up-scaling to intermediate layers of the VGG network.

FCN with predictions from intermediate layers (scale relative to the input, number of channels):

  input                       1,    3d
  2 × conv/relu + maxpool     1/2,  64d
  2 × conv/relu + maxpool     1/4,  128d
  3 × conv/relu + maxpool     1/8,  256d    → fc-conv → 1/8,  21d  (A)
  3 × conv/relu + maxpool     1/16, 512d    → fc-conv → 1/16, 21d  (B)
  3 × conv/relu + maxpool     1/32, 512d
  2 × fc-conv/relu            1/32, 4096d
  fc-conv                     1/32, 21d
  deconv × 2                  1/16, 21d
  + (B)                       1/16, 21d
  deconv × 2                  1/8,  21d
  + (A)                       1/8,  21d
  deconv × 8                  1,    21d
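The fusion of the coarse prediction with predictions from intermediate layers can be sketched in PyTorch as follows. The class name `SkipFusion` and its attribute names are hypothetical, and the de-convolution kernel sizes (twice the stride) and paddings are assumptions chosen so that each up-scaling is exact. Random tensors stand in for the VGG activations at scales 1/8 and 1/16.

```python
import torch
from torch import nn


class SkipFusion(nn.Module):
    """Combine the 1/32 prediction with predictions made at 1/16 and 1/8."""

    def __init__(self, nb_classes=21):
        super().__init__()
        # Per-pixel predictions from the intermediate activations.
        self.score_8 = nn.Conv2d(256, nb_classes, kernel_size=1)
        self.score_16 = nn.Conv2d(512, nb_classes, kernel_size=1)
        # Kernel sizes and paddings are assumptions giving exact x2 and x8.
        self.up_32_to_16 = nn.ConvTranspose2d(
            nb_classes, nb_classes, kernel_size=4, stride=2, padding=1, bias=False)
        self.up_16_to_8 = nn.ConvTranspose2d(
            nb_classes, nb_classes, kernel_size=4, stride=2, padding=1, bias=False)
        self.up_8_to_1 = nn.ConvTranspose2d(
            nb_classes, nb_classes, kernel_size=16, stride=8, padding=4, bias=False)

    def forward(self, act_8, act_16, score_32):
        # 1/32 -> 1/16, then add the prediction made at 1/16.
        x = self.up_32_to_16(score_32) + self.score_16(act_16)
        # 1/16 -> 1/8, then add the prediction made at 1/8.
        x = self.up_16_to_8(x) + self.score_8(act_8)
        # 1/8 -> full resolution.
        return self.up_8_to_1(x)


torch.manual_seed(0)

# Stand-ins for a 64 x 64 input image.
act_8 = torch.randn(1, 256, 8, 8)      # activations at scale 1/8
act_16 = torch.randn(1, 512, 4, 4)     # activations at scale 1/16
score_32 = torch.randn(1, 21, 2, 2)    # coarse prediction at scale 1/32

fusion = SkipFusion()
output = fusion(act_8, act_16, score_32)  # 1 x 21 x 64 x 64
```

The final up-scaling is only × 8 instead of × 32, so the output is produced from a signal four times finer than in the network without intermediate predictions.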

[Figure: segmentation results, with columns FCN-8s, SDS [14], Ground Truth, Image.] Left column is the best network from Shelhamer et al. (2016).

[Figure: columns Image, Ground Truth, Output, Input.] Results with a network trained from mask only (Shelhamer et al., 2016).

The most sophisticated object detection methods achieve instance segmentation and estimate a segmentation mask per detected object.

Mask R-CNN (He et al., 2017) adds a branch to the Faster R-CNN model to estimate a mask for each detected region of interest.

[Figure 1 of He et al. (2017): the Mask R-CNN framework for instance segmentation. Labels: RoIAlign, class, box, conv.]
