Semantic segmentation (CV3DST, Prof. Leal-Taixé)
Task definition: semantic segmentation • Image classification: classify the main object in the image (e.g., CAT). • Semantic segmentation: there are no objects as such; instead, classify each pixel (CAT, GRASS, TREE, SKY).
Semantic Segmentation • Every pixel in the image needs to be labelled with a category label. • We do not differentiate between instances (note how we do not differentiate between pixels coming from different cows).
Fully Convolutional Networks
Fully convolutional neural networks • An FCN is able to deal with any input/output size. Long, Shelhamer, Darrell - Fully Convolutional Networks for Semantic Segmentation, CVPR 2015, PAMI 2016
Fully convolutional neural networks 1. Replace the FC layers with convolutional layers. 2. Convert the last layer's output back to the original resolution. 3. Apply softmax cross-entropy between the pixelwise predictions and the segmentation ground truth. 4. Backprop and SGD.
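A minimal PyTorch sketch of this recipe (the tiny backbone and all sizes are illustrative, not the original VGG-based FCN):

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFCN(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        # small convolutional backbone (stand-in for a "convolutionalized" classifier)
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
        )
        # 1x1 convolution gives per-pixel class scores (replaces the FC classifier)
        self.classifier = nn.Conv2d(128, num_classes, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        scores = self.classifier(self.backbone(x))       # low-resolution class scores
        # step 2: bring the prediction back to the original resolution
        return F.interpolate(scores, size=(h, w), mode='bilinear', align_corners=False)

model = TinyFCN(num_classes=21)
images = torch.randn(2, 3, 128, 128)                     # dummy batch
targets = torch.randint(0, 21, (2, 128, 128))            # dummy pixelwise ground truth
loss = F.cross_entropy(model(images), targets)           # step 3: softmax cross-entropy per pixel
loss.backward()                                          # step 4: backprop (an SGD step would follow)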
“Convolutionalization” • 1x1 convolutions!
“Convolutionalization” • See a more detailed explanation in this Quora answer.
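As a sketch of the idea (with illustrative shapes, not the exact layers of the paper): a fully connected layer acting on a C x H x W feature map can be rewritten as a convolution with an H x W kernel, and the later FC layers become 1x1 convolutions; afterwards the network accepts inputs of any size.

import torch
import torch.nn as nn

# FC layer: flattens a 512 x 7 x 7 feature map into 4096 outputs
fc = nn.Linear(512 * 7 * 7, 4096)

# Equivalent convolution: a 7x7 kernel over 512 channels, 4096 output channels
conv = nn.Conv2d(512, 4096, kernel_size=7)
conv.weight.data.copy_(fc.weight.data.view(4096, 512, 7, 7))
conv.bias.data.copy_(fc.bias.data)

x = torch.randn(1, 512, 7, 7)
assert torch.allclose(fc(x.flatten(1)), conv(x).flatten(1), atol=1e-5)

# The conv version also runs on larger inputs, giving a spatial map of outputs
y = conv(torch.randn(1, 512, 14, 14))   # -> 1 x 4096 x 8 x 8

Subsequent FC layers (e.g., 4096 -> 4096 or 4096 -> num_classes) become 1x1 convolutions in the same way.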
Semantic Segmentation (FCN) • How do we upsample? Long, Shelhamer, Darrell - Fully Convolutional Networks for Semantic Segmentation, CVPR 2015, PAMI 2016
Network's architecture • Predict the segmentation mask from high-level features. • Additionally predict the segmentation mask from mid-level features. • Additionally predict the segmentation mask from low-level features.
Network's architecture • Hierarchical training: the network is initially trained only on the high-level features and then fine-tuned with the mid- and low-level features.
Network's architecture • This is important because it allows the network to also learn the mid- and low-level details of the image, in addition to the high-level ones.
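One way to picture how the predictions from different depths are combined (a sketch in the spirit of the FCN skip versions; the feature maps, channel counts and resolutions below are made up):

import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes = 21
# hypothetical feature maps from three depths of the backbone (coarse to fine)
feat_high = torch.randn(1, 512, 8, 8)     # high-level, lowest resolution
feat_mid  = torch.randn(1, 256, 16, 16)   # mid-level
feat_low  = torch.randn(1, 128, 32, 32)   # low-level, highest resolution

# a 1x1 "score" layer per depth
score_high = nn.Conv2d(512, num_classes, 1)(feat_high)
score_mid  = nn.Conv2d(256, num_classes, 1)(feat_mid)
score_low  = nn.Conv2d(128, num_classes, 1)(feat_low)

# upsample the coarse prediction and sum it with the finer one, twice
fused = F.interpolate(score_high, scale_factor=2, mode='bilinear', align_corners=False) + score_mid
fused = F.interpolate(fused, scale_factor=2, mode='bilinear', align_corners=False) + score_low
# finally, upsample 'fused' to the input resolution for the pixelwise loss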
Qualitative results (figure): three predictions labelled good, better, and best.
Qualitative results • SDS is an R-CNN-based method, i.e., it uses object proposals. • In general, FCN significantly outperforms (both qualitatively and quantitatively) pre-deep-learning and quasi-deep-learning methods and is recognized as the AlexNet of semantic segmentation.
Autoencoder-style architecture
SegNet • Step-wise upsampling. Badrinarayanan et al. „SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation“. TPAMI 2016
SegNet • Encoder: normal convolutional filters + pooling. • Decoder: upsampling + convolutional filters. • The convolutional filters in the decoder are learned using backprop; their goal is to refine the upsampling. Badrinarayanan et al. „SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation“. TPAMI 2016
Transposed convolution • Unpooling + convolution filter (learned) in one operation. • Also called up-convolution (never deconvolution). • Example in the figure: 3x3 input, 5x5 output.
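For instance, the 3x3 -> 5x5 case corresponds to a transposed convolution with a 3x3 kernel and stride 1 (a minimal sketch; the filter weights would be learned by backprop):

import torch
import torch.nn as nn

up = nn.ConvTranspose2d(in_channels=1, out_channels=1, kernel_size=3, stride=1, padding=0)
x = torch.randn(1, 1, 3, 3)      # 3x3 input
y = up(x)                        # 5x5 output: (3 - 1) * 1 + 3 = 5
print(y.shape)                   # torch.Size([1, 1, 5, 5])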
SegNet • Encoder: normal convolutional filters + pooling. • Decoder: upsampling + convolutional filters. • Softmax layer: the output of the softmax classifier is a K-channel image of probabilities, where K is the number of classes. Badrinarayanan et al. „SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation“. TPAMI 2016
Upsampling
Types of upsampling • 1. Interpolation: nearest neighbor, bilinear, or bicubic (image: Michael Guerzhoy). Interpolation produces few artifacts.
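These interpolation modes are available directly in PyTorch (a minimal illustration; the feature-map size is arbitrary):

import torch
import torch.nn.functional as F

feat = torch.randn(1, 64, 16, 16)                      # low-resolution feature map
up_nearest  = F.interpolate(feat, scale_factor=2, mode='nearest')
up_bilinear = F.interpolate(feat, scale_factor=2, mode='bilinear', align_corners=False)
up_bicubic  = F.interpolate(feat, scale_factor=2, mode='bicubic',  align_corners=False)
# none of these has learned parameters; the choice only changes smoothness/artifacts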
Types of upsampling • 2. Fixed unpooling + convs: efficient. A. Dosovitskiy, “Learning to Generate Chairs, Tables and Cars with Convolutional Networks”. TPAMI 2017
Types of upsampling • 3. Unpooling “à la DeconvNet”: keep the locations where the max came from, which preserves the details of the structures. Zeiler and Fergus. „Visualizing and Understanding Convolutional Networks“. ECCV 2014
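A sketch of this index-based unpooling in PyTorch (the same mechanism SegNet's decoder relies on):

import torch
import torch.nn as nn

pool   = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

x = torch.randn(1, 64, 32, 32)
pooled, indices = pool(x)           # 16x16 output, plus the location of each max
restored = unpool(pooled, indices)  # 32x32 again: the max values go back to their
                                    # original positions, everything else stays zero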
Skip connections (U-Net)
Skip Connections • U-Net: skip connections pass the low-level information from the encoder directly to the decoder, where it meets the high-level information (recall the shortcut connections of ResNet). O. Ronneberger et al. “U-Net: Convolutional Networks for Biomedical Image Segmentation”. MICCAI 2015
Skip Connections • U-Net, zoomed in: the encoder feature map is appended (concatenated) to the decoder feature map. O. Ronneberger et al. “U-Net: Convolutional Networks for Biomedical Image Segmentation”. MICCAI 2015
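The “append” is a channel-wise concatenation; a minimal sketch of one decoder step (channel sizes are illustrative):

import torch
import torch.nn as nn

decoder_feat = torch.randn(1, 128, 64, 64)   # upsampled high-level features
encoder_feat = torch.randn(1, 128, 64, 64)   # skip connection from the encoder

x = torch.cat([decoder_feat, encoder_feat], dim=1)    # concatenate along channels -> 256
x = nn.Conv2d(256, 128, kernel_size=3, padding=1)(x)  # following convs mix both sources

In the original U-Net the encoder map is additionally cropped to match the decoder's spatial size, because the convolutions there are unpadded.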
Skip Connections • Concatenation connections. C. Hazirbas et al. “Deep Depth From Focus”. ACCV 2018
DeepLab
Semantic Segmentation: 3 challenges • Reduced feature resolution – Proposed solution: atrous convolutions. • Objects exist at multiple scales – Proposed solution: pyramid pooling, as in detection. • Poor localization of the edges – Proposed solution: refinement with a Conditional Random Field (CRF).
Wish: no reduced feature resolution • Pixels in, pixels out: map width x height x RGB to width x height x classes using just convs & activations, i.e., a fully convolutional network with no downsampling. • Super expensive!
Alternative: dilated (atrous) convolutions • Sparse feature extraction with standard convolution on a low-resolution input feature map. • Dense feature extraction with atrous convolution with rate r = 2, applied on a high-resolution input feature map.
Dilated (atrous) convolutions in 1D • (a) Sparse feature extraction with standard convolution on a low-resolution input feature map. • (b) Dense feature extraction with atrous convolution with rate r = 2, applied on a high-resolution input feature map.
Dilated (atrous) convolutions in 2D • An analogy for dilated convolution is a convolution filter with holes; a standard convolution has dilation 1. • In PyTorch, dilation is an argument of the convolution layers:
class torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=2)
class torch.nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=2)
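For example (a small sketch), a 3x3 kernel with dilation 2 covers a 5x5 receptive field and, with matching padding, keeps the spatial resolution unchanged:

import torch
import torch.nn as nn

x = torch.randn(1, 64, 32, 32)
standard = nn.Conv2d(64, 64, kernel_size=3, padding=1)              # 3x3 receptive field
dilated  = nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2)  # 5x5 receptive field, same number of weights
print(standard(x).shape, dilated(x).shape)   # both torch.Size([1, 64, 32, 32])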