Pixel-Level Image Understanding with Semantic Segmentation and Panoptic Segmentation
Hengshuang Zhao
The Chinese University of Hong Kong
May 29, 2019
Part I: Semantic Segmentation
Semantic Segmentation [Figure: original images and per-pixel annotations; example labels: background, car, person, horse. Images adapted from PASCAL VOC 2012 and ADE20K.]
Fully Convolutional Network FCN [Long et al. 2015]
Conditional Random Field DeepLabV1 [Chen et al. 2015], DPN [Liu et al. 2015], CRF-RNN [Zheng et al. 2015]
Encoder-Decoder UNet [Ronneberger et al. 2015], DeconvNet [Noh et al. 2015], SegNet [Badrinarayanan et al. 2015], LRR [Ghiasi et al. 2016], RefineNet [Lin et al. 2017], FRRN [Pohlen et al. 2017]
Atrous Convolution / Dilated Convolution DeepLabV1 [Chen et al. 2015], Dilation [Yu and Koltun 2016]
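To make the idea concrete, here is a minimal 1-D sketch of a dilated (atrous) convolution. It is an illustration only, not code from any of the cited networks; the function name `dilated_conv1d` and the toy signal are assumptions. The point it shows: the same three weights cover a wider span of the input as the dilation grows.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (unpadded) 1-D cross-correlation with gaps of `dilation` between taps."""
    # Span of input covered by one output value (the receptive field).
    span = (len(kernel) - 1) * dilation + 1
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(w * signal[start + k * dilation]
                       for k, w in enumerate(kernel)))
    return out


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]

dense = dilated_conv1d(signal, kernel, dilation=1)   # each output sees 3 inputs
atrous = dilated_conv1d(signal, kernel, dilation=2)  # each output sees 5 inputs
print(dense)   # [-2, -2, -2, -2, -2]
print(atrous)  # [-4, -4, -4]
```

The parameter count is unchanged between the two calls; only the sampling grid is stretched.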
Context Aggregation Pooling: ParseNet [Liu et al. 2015], PSPNet [Zhao et al. 2017], DeepLabV2 [Chen et al. 2016] Large Kernel: GCN [Peng et al. 2017]
Neural Architecture Search Search for head: DPC [Chen et al. 2018] Search for backbone: Auto-DeepLab [Liu et al. 2019]
Attention Mechanism Channel reweighting: SENet [Hu et al. 2018], EncNet [Zhang et al. 2018], DFN [Yu et al. 2018] Spatial attention (dot product): Transformer [Vaswani et al. 2017], Non-Local-Net [Wang et al. 2018] OCNet [Yuan et al. 2018], DANet [Fu et al. 2018], CCNet [Huang et al. 2018]
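A minimal sketch of dot-product spatial attention over a flattened feature map, in plain Python. It is a simplification: the learned query/key/value projections used by Transformer-style and non-local blocks are omitted (treated as identity), and the names `softmax` and `dot_product_attention` are illustrative assumptions rather than any library's API.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(features):
    """features: list of N position vectors (each a list of C floats).

    Every position attends to all positions; the weights come from the
    dot product between feature vectors. Returns N aggregated vectors.
    """
    channels = len(features[0])
    out = []
    for query in features:
        scores = [sum(a * b for a, b in zip(query, key)) for key in features]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, features))
                    for c in range(channels)])
    return out


# Two positions with orthogonal features: each attends mostly to itself.
aggregated = dot_product_attention([[1.0, 0.0], [0.0, 1.0]])
```

Note that the weights depend only on feature similarity, so two positions with the same features get the same weight regardless of where they sit in the image.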
Point-wise Spatial Attention Network (PSANet)
• Conv & Dilated Conv: fixed grid; information flow is restricted to local regions
• Pooling Operation: fixed weights at each position, not adaptive
• Feature Correlation: relative position information is ignored
• Point-wise Spatial Attention:
  • Long-range context aggregation for dense prediction
  • Bi-directional information propagation
  • Self-adaptively learned, location-sensitive masks
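A rough sketch of bi-directional information propagation with per-position masks. This is an interpretation under stated assumptions, not the PSANet implementation: the masks are written by hand here (in a network they would be predicted from the features), `masks[i][j]` is taken to mean "the value that the mask predicted at position i assigns to position j", and all function names are hypothetical.

```python
def weighted_sum(weights, features):
    """Sum of feature vectors, each scaled by its weight."""
    channels = len(features[0])
    return [sum(w * f[c] for w, f in zip(weights, features))
            for c in range(channels)]


def collect(features, masks):
    """Position i uses its own mask to gather information from every j."""
    return [weighted_sum(masks[i], features) for i in range(len(features))]


def distribute(features, masks):
    """Every position j uses its own mask to decide what it sends to i."""
    n = len(features)
    return [weighted_sum([masks[j][i] for j in range(n)], features)
            for i in range(n)]


def bidirectional(features, collect_masks, distribute_masks):
    """Concatenate both directions along the channel axis."""
    gathered = collect(features, collect_masks)
    received = distribute(features, distribute_masks)
    return [g + r for g, r in zip(gathered, received)]


features = [[1.0], [10.0], [100.0]]   # three positions, one channel
masks = [[0, 1, 0],                   # position 0 points at position 1
         [0, 0, 1],                   # position 1 points at position 2
         [1, 0, 0]]                   # position 2 points at position 0
fused = bidirectional(features, masks, masks)
print(fused)  # [[10.0, 100.0], [100.0, 1.0], [1.0, 10.0]]
```

The same mask produces different results in the two directions, which is why the two branches carry complementary information.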
Point-wise Spatial Attention Network
Point-wise Spatial Attention Network [Figure: information collection branch and information distribution branch, with over-complete and compact attention maps; feature fusion: local & global]
Attention Mask Generation
Incorporation with FCN
Results on ADE20K and VOC 2012 [Tables: ADE20K, comparison of information aggregation approaches; ADE20K, results on val set; PASCAL VOC 2012, results on val set]
Results on Cityscapes [Tables: results on val set; results on test set (trained on fine set); results on test set (trained on fine+coarse set)]
Visual Prediction on ADE20K
Visual Prediction on VOC 2012
Visual Prediction on Cityscapes
Mask Visualization
Part II: Panoptic Segmentation
Semantic Segmentation: instances are indistinguishable
Instance Segmentation: stuff is left unsolved
Panoptic Segmentation: stuff and things are both solved, and instances are distinguishable
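A toy illustration of how the three outputs relate, assuming a panoptic label is a (class, instance id) pair per pixel. The thing/stuff split, the tiny 2 x 3 "image", and the function names are all made up for this sketch; real datasets use their own encodings.

```python
THING_CLASSES = {"person", "car"}  # hypothetical split; everything else is stuff

# Panoptic labeling: every pixel has a class; things also carry an instance id.
panoptic = [
    [("sky", 0), ("sky", 0), ("sky", 0)],
    [("person", 1), ("person", 2), ("road", 0)],
]


def semantic_view(labels):
    """Drop instance ids: the two persons become indistinguishable."""
    return [[cls for cls, _ in row] for row in labels]


def instance_view(labels):
    """Keep only thing pixels: stuff (sky, road) is left unlabeled."""
    return [[(cls, iid) if cls in THING_CLASSES else None for cls, iid in row]
            for row in labels]


print(semantic_view(panoptic)[1])  # ['person', 'person', 'road']
print(instance_view(panoptic)[1])  # [('person', 1), ('person', 2), None]
```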
Heuristic Combination
• Instance: Mask R-CNN [He et al. 2017]
• Semantic: PSPNet [Zhao et al. 2017]
• Independent models lead to redundant computation
Heuristic Combination
• Instance: Mask R-CNN [He et al. 2017]
• Semantic: PSPNet [Zhao et al. 2017]
• Outputs are combined by a heuristic merge
• The heuristic merge logic is not end-to-end trainable
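The slides do not spell out the merge rule. The sketch below shows one common heuristic as an assumption: paste instance masks in order of confidence, let earlier (higher-scoring) instances win overlaps, then fill the remaining pixels from the semantic prediction. Data layout and names are hypothetical, and details such as overlap thresholds are left out.

```python
def heuristic_merge(semantic, instances):
    """semantic: 2-D list of class names.
    instances: list of dicts with 'score', 'cls', and 'mask' (set of (y, x)).
    Returns a 2-D list of (class, instance_id); id 0 marks non-instance pixels.
    """
    height, width = len(semantic), len(semantic[0])
    panoptic = [[None] * width for _ in range(height)]
    next_id = 1
    # Higher-confidence instances claim their pixels first.
    for inst in sorted(instances, key=lambda d: d["score"], reverse=True):
        free = [(y, x) for (y, x) in inst["mask"] if panoptic[y][x] is None]
        if not free:
            continue  # fully covered by better-scoring instances
        for y, x in free:
            panoptic[y][x] = (inst["cls"], next_id)
        next_id += 1
    # Whatever is left falls back to the semantic prediction.
    for y in range(height):
        for x in range(width):
            if panoptic[y][x] is None:
                panoptic[y][x] = (semantic[y][x], 0)
    return panoptic


semantic = [["road", "road", "road"]]
instances = [
    {"score": 0.9, "cls": "car", "mask": {(0, 0), (0, 1)}},
    {"score": 0.6, "cls": "person", "mask": {(0, 1)}},  # hidden behind the car
]
merged = heuristic_merge(semantic, instances)
print(merged)  # [[('car', 1), ('car', 1), ('road', 0)]]
```

Nothing in this procedure is differentiable, which is the sense in which such a merge cannot be trained end to end.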
heuristic combination
our end-to-end output
Unified Panoptic Segmentation Network (UPSNet)
• Unified backbone network: saves computation
• Pixel-wise classification: consistent estimation
Semantic & Instance Head
• Semantic Head: FPN with Deformable Conv
• Instance Head: same as Mask R-CNN
Panoptic Head [Figure labels: mask logits Z_j from the instance head (resize/pad); Y_mask_j; thing & stuff logits Y_thing and Y_stuff from the semantic head; two max operations; 1 channel of logits for unknown; panoptic logits of size H x W, with O_inst and O_stuff]
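A per-pixel sketch of how such a parameter-free panoptic head could combine the two heads. The combination rule used here (instance channel = semantic thing logit + mask logit; unknown channel = max thing logit minus max per-instance thing evidence) is my reading of the UPSNet design and should be treated as an assumption, since the diagram alone does not confirm it. All names and numbers are illustrative.

```python
def panoptic_logits_at_pixel(stuff_logits, thing_logits, instances):
    """stuff_logits: list of floats from the semantic head (one per stuff class).
    thing_logits: dict mapping thing class -> float from the semantic head.
    instances: list of (cls, mask_logit); mask_logit is the resized/padded
        instance-head value at this pixel, or None if the pixel lies outside
        the instance's box (zero padding).
    Returns stuff channels + one channel per instance + one unknown channel.
    """
    instance_channels = []
    thing_evidence = []
    for cls, mask_logit in instances:
        if mask_logit is None:
            instance_channels.append(0.0)
            thing_evidence.append(0.0)
        else:
            instance_channels.append(thing_logits[cls] + mask_logit)
            thing_evidence.append(thing_logits[cls])
    # High when the semantic head sees a thing that no instance explains.
    unknown = max(thing_logits.values()) - max(thing_evidence, default=0.0)
    return stuff_logits + instance_channels + [unknown]


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


thing = {"person": 3.0, "car": 0.2}
covered = panoptic_logits_at_pixel([0.5, 0.1], thing,
                                   [("person", 2.0), ("car", None)])
missed = panoptic_logits_at_pixel([0.5, 0.1], thing, [("person", None)])
print(covered, argmax(covered))  # [0.5, 0.1, 5.0, 0.0, 0.0] 2
print(missed, argmax(missed))    # [0.5, 0.1, 0.0, 3.0] 3
```

In the first case the pixel goes to the person instance; in the second, the semantic head reports a person but no instance covers the pixel, so the unknown channel wins.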
Performance Comparison [Figure: UPSNet vs. MR-CNN-PSP; results on COCO (800 x 1300) and results on Cityscapes (1024 x 2048)]
Detailed Results [Tables: results on COCO; results on Cityscapes; results on internal data; run time comparison]
Visual Prediction [Figures: results on COCO; results on Cityscapes]
Code Resources
I. Semantic Segmentation:
• Caffe:
  • https://github.com/hszhao/PSPNet
  • https://github.com/hszhao/PSANet
  • https://github.com/hszhao/ICNet
• PyTorch:
  • https://github.com/hszhao/semseg (new)
  • highly optimized codebase with better reimplementation results
II. Panoptic Segmentation:
• PyTorch:
  • https://github.com/uber-research/UPSNet
  • the first open-sourced codebase for unified end-to-end panoptic segmentation
Remaining Problems
I. Semantic Segmentation:
• imbalanced classes: long-tail distribution
• confusable classes: use a human confusion matrix (e.g., from ADE20K) as a prior
• data augmentation: adaptive augmentation or auto augmentation
• hard example mining: effective but not elegant
• robustness and generalization: one model for different datasets
• accuracy and efficiency: can both be achieved?
II. Panoptic Segmentation:
• introduce parameters into the panoptic head (e.g., 3D Conv)
• new frameworks with a single panoptic head
Thanks!