An Overview of Semantic Image Segmentation with Deep Learning Simone Bonechi
Outline Ø Semantic Image Segmentation Ø Deep Networks for Semantic Segmentation • FCN (Fully Convolutional Neural Network) • DeconvNet • PSPNet (Pyramid Scene Parsing Network) Ø Work in progress…
Semantic Image Segmentation
Instance-Level Segmentation Ø Its main purpose is to identify objects of the same class and split them into different instances
Results on PASCAL VOC 2012
Fully Convolutional Neural Network (FCN) Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 3431-3440).
FCN Overview Ø Tested with AlexNet, VGG and GoogLeNet Ø Reinterpret standard classification convnets as “Fully convolutional” networks (FCN) for semantic segmentation Ø Combine information from different layers for segmentation
Replace FC with Convolutions A classification network Becoming fully convolutional
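The idea of replacing a fully connected layer with a convolution can be illustrated with a small sketch. It assumes a hypothetical 2-class classifier trained on 2x2 inputs; the names `fully_connected`, `conv2d_valid`, `fc_to_kernels` and all weights are made up for illustration and are not taken from the FCN paper. Reshaping each FC weight row into a kernel gives the same scores on a 2x2 patch, and a spatial map of scores on a larger image.

```python
def fully_connected(x_flat, weights, bias):
    """Dense layer: one score per output unit for a flattened input."""
    return [sum(w * x for w, x in zip(row, x_flat)) + b
            for row, b in zip(weights, bias)]


def conv2d_valid(image, kernels, bias):
    """Slide each kernel over the image (stride 1, no padding).

    Returns one output map per kernel.
    """
    kh, kw = len(kernels[0]), len(kernels[0][0])
    h, w = len(image), len(image[0])
    maps = []
    for kernel, b in zip(kernels, bias):
        fmap = []
        for i in range(h - kh + 1):
            row = []
            for j in range(w - kw + 1):
                s = b
                for di in range(kh):
                    for dj in range(kw):
                        s += kernel[di][dj] * image[i + di][j + dj]
                row.append(s)
            fmap.append(row)
        maps.append(fmap)
    return maps


def fc_to_kernels(weights, kh, kw):
    """Reshape each FC weight row into a kh x kw convolution kernel."""
    return [[row[r * kw:(r + 1) * kw] for r in range(kh)] for row in weights]


# Hypothetical weights (assumption): 2 classes, 2x2 input.
fc_weights = [[1, 0, -1, 2], [2, 1, 0, -1]]
fc_bias = [0, 1]
kernels = fc_to_kernels(fc_weights, 2, 2)

# On a 2x2 patch, both formulations give the same two scores.
patch = [[1, 2], [3, 4]]
fc_scores = fully_connected([1, 2, 3, 4], fc_weights, fc_bias)
conv_scores = conv2d_valid(patch, kernels, fc_bias)

# On a larger 3x4 image, the convolution yields a 2x3 score map per class.
larger = [[1, 2, 0, 1], [3, 4, 1, 0], [0, 1, 2, 2]]
score_maps = conv2d_valid(larger, kernels, fc_bias)
```

The converted network accepts inputs of any size at least as large as the kernel, which is what makes a dense, per-location prediction possible.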
Upsampling the output
Convolution & Deconvolution Ø Deconvolution Ø Transposed convolution Ø Fractionally strided convolution Ø Backward strided convolution Ø Upconvolution Ø …..
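A minimal 1D sketch of the operation these names refer to, assuming no padding; `transposed_conv1d` is an illustrative helper, not a library function. Each input value scatters a scaled copy of the kernel into the output, and the stride controls how far apart those copies land, so the output is longer than the input.

```python
def transposed_conv1d(x, kernel, stride):
    """Scatter a scaled copy of the kernel into the output for each input value."""
    k = len(kernel)
    out = [0] * ((len(x) - 1) * stride + k)
    for i, value in enumerate(x):
        for j, weight in enumerate(kernel):
            out[i * stride + j] += value * weight
    return out


signal = [1, 2, 3]

# A [1, 1] kernel with stride 2 repeats every value (nearest-neighbour upsampling).
nearest = transposed_conv1d(signal, [1, 1], stride=2)
# nearest == [1, 1, 2, 2, 3, 3]

# A [0.5, 1, 0.5] kernel with stride 2 interpolates linearly between neighbours.
linear = transposed_conv1d(signal, [0.5, 1, 0.5], stride=2)
# linear == [0.5, 1, 1.5, 2, 2.5, 3, 1.5]
```

In a network the kernel weights are learned rather than fixed, so the layer can learn an upsampling that suits the task.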
Upsampling the output
FCN Limitations Ø Fixed-size receptive field • FCN has a fixed-size receptive field; objects substantially larger or smaller than the receptive field may be fragmented or mislabeled • The label map is very coarse, so detailed object structures tend to be lost
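To make the "fixed-size receptive field" point concrete, the receptive field of one output unit can be computed from the kernel sizes and strides of the layers below it. The sketch assumes a plain chain of convolution and pooling layers without dilation; the layer stack is a made-up example, not the configuration of any network in these slides.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one output unit.

    layers: list of (kernel_size, stride) tuples, ordered from input to output.
    """
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent units
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


# Hypothetical stack (assumption): two blocks of conv3-conv3-pool2.
block = [(3, 1), (3, 1), (2, 2)]
rf_one_block = receptive_field(block)        # 6
rf_two_blocks = receptive_field(block * 2)   # 16
```

Because this number depends only on the architecture, it is the same for every image, regardless of how large or small the objects in it are.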
FCN skip architecture
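A rough sketch of the skip idea: a coarse score map from a deep layer is upsampled by 2x and summed element-wise with a finer score map from a shallower layer. For simplicity it assumes nearest-neighbour upsampling in place of a learned upsampling layer, and the score values are invented for illustration.

```python
def upsample_nearest_2x(score):
    """Double the height and width by repeating each value."""
    out = []
    for row in score:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fuse(coarse, fine):
    """Upsample the coarse map 2x and add it to the fine map."""
    up = upsample_nearest_2x(coarse)
    return [[a + b for a, b in zip(up_row, fine_row)]
            for up_row, fine_row in zip(up, fine)]


coarse_scores = [[1, 2]]                    # 1x2 map from a deep layer
fine_scores = [[0, 1, 0, 1], [1, 0, 1, 0]]  # 2x4 map from a shallower layer
fused = fuse(coarse_scores, fine_scores)
# fused == [[1, 2, 2, 3], [2, 1, 3, 2]]
```

The fused map keeps the coarse layer's semantic evidence while picking up the finer layer's spatial detail.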
FCN Results Ø Results on PASCAL VOC 2012
DeconvNet Noh, H., Hong, S., & Han, B. (2015). Learning deconvolution network for semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision (pp. 1520-1528).
Pooling & Unpooling Ø Unpooling • Retrieves the structure of the original activation map • Restores the original activation size, but the map is still sparse
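The pooling and unpooling pair can be sketched as follows, assuming 2x2 max pooling with stride 2 on an input with even height and width. The helper names `max_pool_2x2` and `unpool` are illustrative. Pooling records where each maximum came from (the "switches"); unpooling writes each pooled value back to that position and leaves zeros elsewhere.

```python
def max_pool_2x2(x):
    """2x2 max pooling with stride 2; also returns the position of each maximum."""
    h, w = len(x), len(x[0])
    pooled, switches = [], []
    for i in range(0, h - 1, 2):
        pooled_row, switch_row = [], []
        for j in range(0, w - 1, 2):
            window = [(x[i + di][j + dj], (i + di, j + dj))
                      for di in range(2) for dj in range(2)]
            value, position = max(window, key=lambda t: t[0])
            pooled_row.append(value)
            switch_row.append(position)
        pooled.append(pooled_row)
        switches.append(switch_row)
    return pooled, switches


def unpool(pooled, switches, height, width):
    """Place each pooled value back at its recorded position; the rest stays zero."""
    out = [[0] * width for _ in range(height)]
    for pooled_row, switch_row in zip(pooled, switches):
        for value, (r, c) in zip(pooled_row, switch_row):
            out[r][c] = value
    return out


activation = [[1, 3, 2, 0],
              [4, 2, 1, 5],
              [0, 1, 7, 2],
              [6, 2, 3, 1]]
pooled, switches = max_pool_2x2(activation)
# pooled == [[4, 5], [6, 7]]
restored = unpool(pooled, switches, 4, 4)
# restored == [[0, 0, 0, 0], [4, 0, 0, 5], [0, 0, 7, 0], [6, 0, 0, 0]]
```

The restored map has the original size and the maxima sit in their original places, but twelve of its sixteen entries are zero, which is the sparsity the slide refers to.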
Convolution & Deconvolution Ø Deconvolution • Densify sparse activation map
Visualization of activations
Results - Comparisons
PSPNet Zhao, H., et al. (2017). Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
Atrous Convolution Ø Atrous (dilated) convolution upsamples the filter by inserting zeros between its weights, so features are computed densely with a larger field of view
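A minimal 1D sketch of atrous convolution, assuming stride 1 and no padding; `atrous_conv1d` is an illustrative helper, not a library call. The `rate` parameter sets the spacing between the input samples that each kernel weight sees, which is equivalent to inserting `rate - 1` zeros between the kernel weights.

```python
def atrous_conv1d(x, kernel, rate):
    """1D convolution whose kernel taps are spaced `rate` samples apart."""
    k = len(kernel)
    span = (k - 1) * rate + 1  # input samples covered by one kernel placement
    return [sum(kernel[j] * x[i + j * rate] for j in range(k))
            for i in range(len(x) - span + 1)]


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 1, 1]

dense = atrous_conv1d(signal, kernel, rate=1)   # ordinary convolution
# dense == [6, 9, 12, 15, 18]
atrous = atrous_conv1d(signal, kernel, rate=2)  # each output now covers 5 samples
# atrous == [9, 12, 15]
```

With `rate=2` the same three weights cover five input samples instead of three, so the field of view grows without adding parameters or downsampling the signal.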
PSPNet Results