Semantic segmentation

  1. Semantic segmentation. CV3DST | Prof. Leal-Taixé

  2. Task definition: semantic segmentation. Classify the main object in the image. CAT, GRASS, TREE, SKY. No objects, just classify each pixel.

  3. Semantic Segmentation - Every pixel in the image needs to be labelled with a category label. - Do not differentiate between instances (note that pixels coming from different cows all receive the same label).

  4. Fully Convolutional Networks

  5. Fully convolutional neural networks • An FCN is able to deal with any input/output size. Long, Shelhamer, Darrell - Fully Convolutional Networks for Semantic Segmentation, CVPR 2015, PAMI 2016

  6. Fully convolutional neural networks 1. Replace FC layers with convolutional layers. 2. Convert the last layer output to the original resolution. 3. Apply softmax cross-entropy between the pixelwise predictions and the segmentation ground truth. 4. Train with backprop and SGD.
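
Step 3 above can be illustrated with a minimal pure-Python sketch of a pixelwise softmax cross-entropy loss. The `[H][W][K]` nested-list layout and the function name `pixelwise_cross_entropy` are assumptions made for this example, not something the slides specify.

```python
import math

def pixelwise_cross_entropy(logits, labels):
    """Mean softmax cross-entropy over all pixels.

    logits: nested list [H][W][K] of raw class scores (assumed layout).
    labels: nested list [H][W] of ground-truth class indices.
    """
    total, count = 0.0, 0
    for row_scores, row_labels in zip(logits, labels):
        for scores, target in zip(row_scores, row_labels):
            # log-sum-exp with the max subtracted for numerical stability
            m = max(scores)
            log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
            # -log softmax(scores)[target]
            total += log_norm - scores[target]
            count += 1
    return total / count

# A 2x2 image with 3 classes and uniform scores: every pixel costs log(3).
uniform = [[[0.0, 0.0, 0.0] for _ in range(2)] for _ in range(2)]
labels = [[0, 1], [2, 0]]
loss = pixelwise_cross_entropy(uniform, labels)
```

The loss is averaged over pixels here; summing instead is an equally plausible convention.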

  7. “Convolutionalization”: 1x1 convolutions!
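
One way to see why 1x1 convolutions can stand in for FC layers: a 1x1 convolution applies the same fully connected map independently at every spatial position. The sketch below is illustrative only; `fully_connected` and `conv1x1` are hypothetical helper names and the `[H][W][C]` layout is an assumption.

```python
def fully_connected(weights, bias, features):
    """Dense layer: weights is [C_out][C_in], features is [C_in]."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]

def conv1x1(weights, bias, feature_map):
    """1x1 convolution: run the same dense layer at every pixel.

    feature_map is [H][W][C_in]; the result is [H][W][C_out].
    """
    return [[fully_connected(weights, bias, pixel) for pixel in row]
            for row in feature_map]

weights = [[1.0, 0.0, -1.0],
           [0.5, 0.5, 0.5]]
bias = [0.0, 1.0]
feature_map = [[[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]]  # H=1, W=2, C_in=3
scores = conv1x1(weights, bias, feature_map)
```

Because nothing in `conv1x1` depends on H or W, the same weights work for any input size.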

  8. “Convolutionalization”: see a more detailed explanation in this Quora answer.

  9. Semantic Segmentation (FCN). Fully Convolutional Networks for Semantic Segmentation • How do we upsample? Long, Shelhamer, Darrell - Fully Convolutional Networks for Semantic Segmentation, CVPR 2015, PAMI 2016

  10. Network's architecture. Predict the segmentation mask from high-level features.

  11. Network's architecture. Predict the segmentation mask from high-level features. Predict the segmentation mask from mid-level features.

  12. Network's architecture. Predict the segmentation mask from high-level features. Predict the segmentation mask from mid-level features. Predict the segmentation mask from low-level features.

  13. Network's architecture. Hierarchical training: the network is first trained using only high-level features and then fine-tuned using mid-level and low-level features.

  14. Network's architecture. This is important because it allows the network to also learn the mid-level and low-level details of the image, in addition to the high-level ones.

  15. Qualitative results. Figure labels: Good, Better, Best.

  16. Qualitative results. SDS is an R-CNN-based method, i.e., it uses object proposals. In general, FCN significantly outperforms pre-deep-learning and quasi-deep-learning methods, both qualitatively and quantitatively, and is recognized as the AlexNet of semantic segmentation.

  17. Autoencoder-style architecture

  18. SegNet • Step-wise upsampling. Badrinarayanan et al. “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation”. TPAMI 2016

  19. SegNet • Encoder: normal convolutional filters + pooling • Decoder: upsampling + convolutional filters. Badrinarayanan et al. “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation”. TPAMI 2016

  20. SegNet • Encoder: normal convolutional filters + pooling • Decoder: upsampling + convolutional filters. Badrinarayanan et al. “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation”. TPAMI 2016

  21. SegNet • Encoder: normal convolutional filters + pooling • Decoder: upsampling + convolutional filters • The convolutional filters in the decoder are learned using backprop, and their goal is to refine the upsampling. Badrinarayanan et al. “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation”. TPAMI 2016

  22. Transposed convolution • Transposed convolution (input 3x3, output 5x5) - Unpooling - Convolution filter (learned) - Also called up-convolution (never deconvolution)
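
The 3x3 to 5x5 example can be reproduced with a small scatter-add sketch of a transposed convolution. It assumes a square kernel, no padding, and a single channel; `transposed_conv2d` is a name chosen for this illustration, and real frameworks add padding, channels, and learned weights.

```python
def transposed_conv2d(x, kernel, stride=1):
    """Single-channel transposed convolution without padding.

    Each input value stamps a scaled copy of the kernel into the output,
    and overlapping stamps are summed.
    """
    h, w = len(x), len(x[0])
    k = len(kernel)  # assumes a square k x k kernel
    out_h = (h - 1) * stride + k
    out_w = (w - 1) * stride + k
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(h):
        for j in range(w):
            for a in range(k):
                for b in range(k):
                    out[i * stride + a][j * stride + b] += x[i][j] * kernel[a][b]
    return out

x = [[1.0] * 3 for _ in range(3)]        # 3x3 input
kernel = [[1.0] * 3 for _ in range(3)]   # 3x3 filter (would be learned)
y = transposed_conv2d(x, kernel)         # 5x5 output: (3 - 1) * 1 + 3 = 5
```

With stride 2 the same 3x3 input would give a 7x7 output, which is the usual way such a layer increases resolution.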

  23. SegNet • Encoder: normal convolutional filters + pooling • Decoder: upsampling + convolutional filters • Softmax layer: the output of the softmax classifier is a K-channel image of probabilities, where K is the number of classes. Badrinarayanan et al. “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation”. TPAMI 2016
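
As a rough illustration of the softmax layer, the sketch below turns an `[H][W][K]` score map into a K-channel probability image and then picks the most likely class per pixel. The layout and the name `predict_labels` are assumptions for this example; the argmax step is a common way to read out a label map and is not stated on the slide.

```python
import math

def softmax(scores):
    """Numerically stable softmax over one pixel's K class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def predict_labels(logits):
    """logits [H][W][K] -> (probabilities [H][W][K], labels [H][W])."""
    probs = [[softmax(pixel) for pixel in row] for row in logits]
    # argmax over the K channels gives the predicted class per pixel
    labels = [[max(range(len(p)), key=p.__getitem__) for p in row]
              for row in probs]
    return probs, labels

logits = [[[2.0, 0.0, -1.0], [0.0, 3.0, 0.0]]]  # H=1, W=2, K=3
probs, labels = predict_labels(logits)
```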

  24. Upsampling

  25. Types of upsampling • 1. Interpolation?

  26. Types of upsampling • 1. Interpolation. Figure panels: original image, nearest neighbor interpolation, bilinear interpolation, bicubic interpolation. Image: Michael Guerzhoy
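
Two of the interpolation schemes named above can be sketched in a few lines. This is an illustration on a single-channel image stored as nested lists; the function names are made up for the example, and the bilinear variant assumes the convention in which the corner pixels of input and output coincide (other conventions exist).

```python
def upsample_nearest(img, factor):
    """Repeat every pixel `factor` times along both axes."""
    return [[img[i // factor][j // factor]
             for j in range(len(img[0]) * factor)]
            for i in range(len(img) * factor)]

def upsample_bilinear(img, out_h, out_w):
    """Bilinear resize with aligned corners (assumed convention)."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        y = i * (h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        fy = y - y0
        row = []
        for j in range(out_w):
            x = j * (w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            fx = x - x0
            # blend horizontally on the two neighbouring rows, then vertically
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out

img = [[0.0, 2.0],
       [4.0, 6.0]]
nearest = upsample_nearest(img, 2)       # blocky 4x4 result
bilinear = upsample_bilinear(img, 3, 3)  # smooth 3x3 result
```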

  27. Types of upsampling • 1. Interpolation: few artifacts.

  28. Types of upsampling • 2. Fixed unpooling + convolutions: efficient. A. Dosovitskiy, “Learning to Generate Chairs, Tables and Cars with Convolutional Networks”. TPAMI 2017
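
A minimal sketch of fixed unpooling, assuming the variant in which each value is written to the top-left corner of a `factor` x `factor` block and the rest of the block is zero. The function name and the default factor of 2 are choices made for this illustration; the convolutions that follow the unpooling are omitted.

```python
def fixed_unpool(feature_map, factor=2):
    """Place each value at a fixed position of its enlarged block, zeros elsewhere."""
    h, w = len(feature_map), len(feature_map[0])
    out = [[0.0] * (w * factor) for _ in range(h * factor)]
    for i in range(h):
        for j in range(w):
            # fixed location: top-left corner of every block
            out[i * factor][j * factor] = feature_map[i][j]
    return out

up = fixed_unpool([[1.0, 2.0],
                   [3.0, 4.0]])
```

No indices need to be stored, which is what makes this cheap; the following convolutions are left to fill in the zeros.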

  29. Types of upsampling • 3. Unpooling “à la DeconvNet”: keep the locations where the max came from. Zeiler and Fergus. “Visualizing and understanding convolutional neural networks”. ECCV 2014

  30. Types of upsampling • 3. Unpooling “à la DeconvNet”: keep the details of the structures.
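
The idea of remembering where each maximum came from can be sketched as a pooling function that also returns the argmax positions, plus an unpooling function that writes values back to those positions. This is a single-channel illustration with hypothetical helper names, not the implementation of any particular library.

```python
def max_pool_with_indices(x, size=2):
    """Non-overlapping max pooling that also records each max location."""
    h, w = len(x) // size, len(x[0]) // size
    pooled = [[0.0] * w for _ in range(h)]
    indices = [[(0, 0)] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = [(x[i * size + a][j * size + b], (i * size + a, j * size + b))
                      for a in range(size) for b in range(size)]
            pooled[i][j], indices[i][j] = max(window, key=lambda t: t[0])
    return pooled, indices

def max_unpool(pooled, indices, out_h, out_w):
    """Write each pooled value back to its recorded location, zeros elsewhere."""
    out = [[0.0] * out_w for _ in range(out_h)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out

x = [[1.0, 5.0, 2.0, 0.0],
     [3.0, 4.0, 1.0, 7.0],
     [0.0, 2.0, 9.0, 1.0],
     [6.0, 1.0, 3.0, 2.0]]
pooled, indices = max_pool_with_indices(x)
restored = max_unpool(pooled, indices, 4, 4)
```

Compared with fixed unpooling, the values return to their original positions, which is why fine structures are better preserved.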

  31. Skip connections (U-Net)

  32. Skip Connections • U-Net. Figure labels: pass the low-level information; high-level information. Recall ResNet. O. Ronneberger et al. “U-Net: Convolutional Networks for Biomedical Image Segmentation”. MICCAI 2015

  33. Skip Connections • U-Net: zoom in. Figure label: append. O. Ronneberger et al. “U-Net: Convolutional Networks for Biomedical Image Segmentation”. MICCAI 2015

  34. Skip Connections • Concatenation connections. C. Hazirbas et al. “Deep depth from focus”. ACCV 2018
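
A concatenation skip connection can be sketched as appending the encoder's channels to the decoder's channels at every pixel. The sketch assumes both feature maps already have the same spatial size and an `[H][W][C]` layout; `concat_channels` is a name chosen for this example.

```python
def concat_channels(decoder_features, encoder_features):
    """Concatenate two [H][W][C] feature maps along the channel axis."""
    same_height = len(decoder_features) == len(encoder_features)
    same_width = len(decoder_features[0]) == len(encoder_features[0])
    if not (same_height and same_width):
        raise ValueError("feature maps must share the same spatial size")
    # list + list appends the encoder channels after the decoder channels
    return [[d + e for d, e in zip(d_row, e_row)]
            for d_row, e_row in zip(decoder_features, encoder_features)]

decoder = [[[1.0, 2.0], [3.0, 4.0]]]   # H=1, W=2, 2 high-level channels
encoder = [[[9.0], [8.0]]]             # H=1, W=2, 1 low-level channel
merged = concat_channels(decoder, encoder)
```

The convolutions that follow see both kinds of information at once. An additive skip, as in ResNet, would instead require equal channel counts and sum the two maps.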

  35. DeepLab

  36. DeepLab

  37. Semantic Segmentation: 3 challenges • Reduced feature resolution. Proposed solution: atrous convolutions. • Objects exist at multiple scales. Proposed solution: pyramid pooling, as in detection. • Poor localization of the edges. Proposed solution: refinement with a Conditional Random Field (CRF).

  38. Semantic Segmentation: 3 challenges • Reduced feature resolution. Proposed solution: atrous convolutions. • Objects exist at multiple scales. Proposed solution: pyramid pooling, as in detection. • Poor localization of the edges. Proposed solution: refinement with a Conditional Random Field (CRF).

  39. Wish: no reduced feature resolution. Figure: a fully convolutional network made of just convolutions and activations, with pixels in (width x height x RGB) and pixels out (width x height x classes). Super expensive!

  40. Alternative: Dilated (atrous) convolutions. Sparse feature extraction with standard convolution on a low-resolution input feature map. Dense feature extraction with atrous convolution with rate r = 2, applied on a high-resolution input feature map.

  41. Alternative: Dilated (atrous) convolutions. Sparse feature extraction with standard convolution on a low-resolution input feature map. Dense feature extraction with atrous convolution with rate r = 2, applied on a high-resolution input feature map.

  42. Dilated (atrous) convolutions in 1D. (a) Sparse feature extraction with standard convolution on a low-resolution input feature map. (b) Dense feature extraction with atrous convolution with rate r = 2, applied on a high-resolution input feature map.
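
The 1D case is small enough to write out. The sketch below is an unpadded atrous convolution in the deep-learning sense (cross-correlation, no kernel flip); `atrous_conv1d` is a name chosen for this illustration.

```python
def atrous_conv1d(signal, kernel, rate=1):
    """Unpadded 1D atrous convolution: kernel taps are `rate` samples apart."""
    k = len(kernel)
    span = (k - 1) * rate + 1  # input extent covered by one output value
    return [sum(kernel[t] * signal[i + t * rate] for t in range(k))
            for i in range(len(signal) - span + 1)]

signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
kernel = [1.0, 0.0, -1.0]
standard = atrous_conv1d(signal, kernel, rate=1)  # taps at i, i+1, i+2
dilated = atrous_conv1d(signal, kernel, rate=2)   # taps at i, i+2, i+4
```

With rate 2 the same three weights cover five input samples, so the receptive field grows without extra parameters and without downsampling the input.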

  43. Dilated (atrous) convolutions in 2D. An analogy for dilated convolution is a conv filter with holes. Standard conv has dilation 1. class torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=2) class torch.nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=2) Figure labels: Input, Output.
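
To make the effect of the `dilation` argument concrete, the sketch below computes the effective kernel extent and the output size along one spatial dimension, following the output-shape formula given in the PyTorch `Conv2d` documentation. The helper names are made up for this example and it does not call PyTorch itself.

```python
def effective_kernel_size(kernel_size, dilation):
    """Input extent covered by a dilated kernel along one dimension."""
    return dilation * (kernel_size - 1) + 1

def conv_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one dimension of a (dilated) convolution."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

extent = effective_kernel_size(3, dilation=2)             # a 3x3 filter spans 5x5
shrunk = conv_output_size(7, 3, dilation=2)               # no padding
kept = conv_output_size(7, 3, padding=2, dilation=2)      # padding = dilation
```

For a 3x3 kernel, choosing padding equal to the dilation keeps the feature map at its input resolution, which is the property the atrous approach relies on.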
