
Pixel-Level Image Understanding with Semantic Segmentation - PowerPoint PPT Presentation

Pixel-Level Image Understanding with Semantic Segmentation and Panoptic Segmentation. Hengshuang Zhao, The Chinese University of Hong Kong, May 29, 2019. Part I: Semantic Segmentation. Semantic Segmentation: background, car, person


  1. Pixel-Level Image Understanding with Semantic Segmentation and Panoptic Segmentation. Hengshuang Zhao, The Chinese University of Hong Kong, May 29, 2019

  2. Part I: Semantic Segmentation

  3. Semantic Segmentation. Figure: original images with their per-pixel annotations (labels shown: background, car, person, horse). Images adapted from PASCAL VOC 2012 and ADE20K.

  4. Fully Convolutional Network: FCN [Long et al. 2015]

  5. Conditional Random Field: DeepLabV1 [Chen et al. 2015], DPN [Liu et al. 2015], CRF-RNN [Zheng et al. 2015]

  6. Encoder-Decoder: UNet [Ronneberger et al. 2015], DeconvNet [Noh et al. 2015], SegNet [Badrinarayanan et al. 2015], LRR [Ghiasi et al. 2016], RefineNet [Lin et al. 2017], FRRN [Pohlen et al. 2017]

  7. Atrous Convolution / Dilated Convolution: DeepLabV1 [Chen et al. 2015], Dilation [Yu and Koltun 2016]
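
To make the idea on this slide concrete, here is a minimal one-dimensional sketch of a dilated (atrous) convolution in plain Python. It is an illustration only, not code from DeepLab or any cited work; the function names `dilated_conv1d` and `receptive_field` are hypothetical. It assumes a "valid" convolution with no padding and, as is usual in deep learning, computes a cross-correlation (the kernel is not flipped).

```python
def receptive_field(kernel_size, dilation):
    """Number of input samples spanned by one output of a dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D dilated convolution (cross-correlation, no padding).

    Kernel taps are spaced `dilation` samples apart, so the receptive field
    grows while the number of weights stays the same.
    """
    if dilation < 1:
        raise ValueError("dilation must be >= 1")
    span = receptive_field(len(kernel), dilation)
    return [
        sum(w * signal[start + k * dilation] for k, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]
dense = dilated_conv1d(signal, kernel, dilation=1)   # taps at i, i+1, i+2
atrous = dilated_conv1d(signal, kernel, dilation=2)  # taps at i, i+2, i+4
```

With the same three weights, dilation 2 covers five input samples instead of three, which is why dilated convolutions are used to enlarge context without reducing resolution.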

  8. Context Aggregation. Pooling: ParseNet [Liu et al. 2015], PSPNet [Zhao et al. 2017], DeepLabV2 [Chen et al. 2016]. Large Kernel: GCN [Peng et al. 2017]
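
As a rough illustration of pooling-based context aggregation, the sketch below averages the features of all positions into one global vector and appends it to every position. This is a simplified assumption about the general idea, not the implementation of any cited network, which add learned layers, normalization, or multiple pooling scales that are omitted here. The name `append_global_context` is hypothetical.

```python
def append_global_context(features):
    """features: list of N position vectors, each a list of C numbers.

    Returns N vectors of length 2C: the local feature followed by the
    mean feature over all positions (the global context).
    """
    n = len(features)
    channels = len(features[0])
    # Global average pooling over all positions, one value per channel.
    global_mean = [sum(f[ch] for f in features) / n for ch in range(channels)]
    # Every position receives the same global descriptor.
    return [list(f) + global_mean for f in features]


fused = append_global_context([[1, 2], [3, 4]])
```

Each position keeps its own feature but also sees a summary of the whole image, at the cost of using the same fixed weights for every position.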

  9. Neural Architecture Search. Search for head: DPC [Chen et al. 2018]. Search for backbone: Auto-DeepLab [Liu et al. 2019]

  10. Attention Mechanism. Channel reweighting: SENet [Hu et al. 2018], EncNet [Zhang et al. 2018], DFN [Yu et al. 2018]. Spatial attention (dot product): Transformer [Vaswani et al. 2017], Non-Local-Net [Wang et al. 2018], OCNet [Yuan et al. 2018], DANet [Fu et al. 2018], CCNet [Huang et al. 2018]
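
The dot-product form of spatial attention named on this slide can be sketched in a few lines of plain Python. This is a minimal illustration under strong assumptions: the query, key, and value are all the raw feature itself (the cited networks use learned projections), there is no scaling factor, and the function name `dot_product_attention` is hypothetical.

```python
import math


def dot_product_attention(features):
    """features: list of N position vectors (each a list of C floats).

    Returns (outputs, weights), where weights[i][j] is how strongly
    position i attends to position j and each row sums to 1.
    """
    outputs, weights = [], []
    for query in features:
        # Similarity of this position to every position, itself included.
        scores = [sum(q * k for q, k in zip(query, key)) for key in features]
        # Softmax over positions, shifted by the max for numerical stability.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        row = [e / total for e in exps]
        weights.append(row)
        # Aggregate: weighted sum of the features of all positions.
        outputs.append(
            [sum(w * f[c] for w, f in zip(row, features)) for c in range(len(query))]
        )
    return outputs, weights


feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
outputs, weights = dot_product_attention(feats)
```

Because the weights depend only on feature similarity, two positions with identical features attend identically regardless of where they sit in the image.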

  11. Point-wise Spatial Attention Network (PSANet) • Conv & Dilated Conv: fixed grid; information flow is restricted to local regions • Pooling Operation: fixed weights at each position, applied in a non-adaptive manner • Feature Correlation: relative position information is ignored • Point-wise Spatial Attention: • long-range context aggregation for dense prediction • bi-directional information propagation • self-adaptively learned, location-sensitive masks

  12. Point-wise Spatial Attention Network

  13. Point-wise Spatial Attention Network. Figure labels: information collection branch; information distribution branch; over-completed; compact

  14. Point-wise Spatial Attention Network. Figure labels: information collection branch; information distribution branch; over-completed; compact; feature fusion: local & global

  15. Attention Mask Generation

  16. Incorporation with FCN

  17. Results on ADE20K and VOC 2012. ADE20K: information aggregation approaches; ADE20K: result on val set; PASCAL VOC 2012: result on val set; PASCAL VOC 2012: result on val set

  18. Results on Cityscapes: result on val set; result on test set (train with fine set); result on test set (train with fine+coarse set)

  19. Visual Prediction on ADE20K

  20. Visual Prediction on VOC 2012

  21. Visual Prediction on Cityscapes

  22. Mask Visualization

  23. Part II: Panoptic Segmentation

  24. Semantic Segmentation: instances are indistinguishable

  25. Instance Segmentation: stuff is unsolved

  26. Panoptic Segmentation: stuff and things are both solved, and instances are distinguishable

  27. Heuristic Combination. Instance: Mask R-CNN [He et al. 2017]. Semantic: PSPNet [Zhao et al. 2017]. Redundant computation for independent models.

  28. Heuristic Combination. Instance: Mask R-CNN [He et al. 2017]. Semantic: PSPNet [Zhao et al. 2017]. The two outputs are combined by a heuristic merge. The heuristic merge logic is not end-to-end trainable.
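
For readers unfamiliar with what a heuristic merge looks like, here is a minimal sketch in plain Python. It is an assumption about a typical rule set, not the exact procedure used in the talk: instances are pasted in descending score order, earlier instances win overlaps, and leftover pixels take the semantic prediction only if it is a stuff class. The names `heuristic_merge`, `VOID`, and the dictionary keys are hypothetical.

```python
VOID = -1  # label for pixels that neither model explains


def heuristic_merge(semantic, instances, stuff_classes):
    """Combine a semantic map and instance masks into a panoptic map.

    semantic: H x W list of class ids.
    instances: list of dicts with 'mask' (H x W of 0/1), 'class_id', 'score'.
    Returns an H x W list of (class_id, instance_id); instance_id 0 means stuff/void.
    """
    h, w = len(semantic), len(semantic[0])
    panoptic = [[None] * w for _ in range(h)]
    # Paste instances from most to least confident; earlier ones win overlaps.
    ordered = sorted(instances, key=lambda inst: inst["score"], reverse=True)
    for inst_id, inst in enumerate(ordered, start=1):
        for y in range(h):
            for x in range(w):
                if inst["mask"][y][x] and panoptic[y][x] is None:
                    panoptic[y][x] = (inst["class_id"], inst_id)
    # Fill the rest from the semantic map; thing pixels without an instance become void.
    for y in range(h):
        for x in range(w):
            if panoptic[y][x] is None:
                cls = semantic[y][x]
                panoptic[y][x] = (cls, 0) if cls in stuff_classes else (VOID, 0)
    return panoptic


semantic = [[0, 0, 1], [0, 1, 1]]  # 0 = a stuff class, 1 = a thing class
instances = [
    {"mask": [[0, 0, 1], [0, 1, 0]], "class_id": 1, "score": 0.5},
    {"mask": [[0, 0, 1], [0, 0, 1]], "class_id": 1, "score": 0.9},
]
merged = heuristic_merge(semantic, instances, stuff_classes={0})
```

Every step here is a fixed rule (sorting, overwrite order, fallback), so no gradient can flow through the merge, which is the limitation this slide points out.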

  29. heuristic combination

  30. our end-to-end output

  31. Unified Panoptic Segmentation Network (UPSNet) • Unified Backbone Network: Save Computation! • Pixel-wise Classification: Consistent Estimation!

  32. Semantic & Instance Head. Semantic Head: FPN with Deformable Conv. Instance Head: same as Mask R-CNN

  33. Panoptic Head. Figure labels: mask logits from the instance head (resize/pad); thing & stuff logits from the semantic head; max; logits for unknown; panoptic logits (H x W). Symbols shown: 𝑍, 𝑌mask, 𝑌thing, 𝑌stuff, 𝑂inst, 𝑂stuff, index 𝑗.

  34. Performance Comparison. Charts: results on COCO (800 x 1300) and results on Cityscapes (1024 x 2048), each comparing UPSNet with MR-CNN-PSP.

  35. Detailed Results: result on COCO; result on Cityscapes; result on internal data; run time comparison

  36. Visual Prediction: result on COCO; result on Cityscapes

  37. Code Resources. I. Semantic Segmentation: • Caffe: https://github.com/hszhao/PSPNet, https://github.com/hszhao/PSANet, https://github.com/hszhao/ICNet • PyTorch: https://github.com/hszhao/semseg (new), a highly optimized codebase with better reimplementation results. II. Panoptic Segmentation: • PyTorch: https://github.com/uber-research/UPSNet, the first open-sourced codebase for unified end-to-end panoptic segmentation

  38. Remaining Problems. I. Semantic Segmentation: • imbalanced classes: long-tail distribution • confusable classes: use a human confusion matrix (e.g., on ADE20K) as a prior • data augmentation: adaptive augmentation or auto augmentation • hard mining: effective but not elegant • robustness and generalization: one model for different datasets • accuracy and efficiency: can both be achieved? II. Panoptic Segmentation: • introduce parameters into the panoptic head (e.g., 3D conv) • new frameworks with a single panoptic head

  39. Thanks!
