Deep Watershed Transform for Instance Segmentation

Min Bai & Raquel Urtasun. To appear at IEEE CVPR 2017 in Hawaii. Presented at NVIDIA GTC 2017.


  1. Deep Watershed Transform for Instance Segmentation
     Min Bai & Raquel Urtasun
     To appear at IEEE CVPR 2017 in Hawaii
     Presented at NVIDIA GTC 2017

  2. Semantic Segmentation
     ● Input: RGB Image
     ● Output at each pixel:
       ○ Semantic label

  3. Instance Segmentation
     ● Input: RGB Image
     ● Output at each pixel:
       ○ Semantic label
       ○ Instance label
         ■ Same for every pixel in an object
         ■ Different across objects
       ○ Difficulty: how to phrase the problem?
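To make the two per-pixel outputs concrete, here is a minimal Python sketch of a tiny image with one semantic map and one instance map. The class names, the convention that instance id 0 means "no instance", and the helper `instances_per_class` are illustrative assumptions, not taken from the slides.

```python
from collections import defaultdict

# Two cars separated by a strip of road: same semantic label "car",
# but different instance ids (1 and 2). Id 0 marks "no instance" (assumed).
semantic = [
    ["car", "car", "road", "car", "car"],
    ["car", "car", "road", "car", "car"],
]
instance = [
    [1, 1, 0, 2, 2],
    [1, 1, 0, 2, 2],
]

def instances_per_class(semantic, instance):
    """Count distinct instance ids for each semantic class."""
    ids = defaultdict(set)
    for sem_row, inst_row in zip(semantic, instance):
        for sem, inst in zip(sem_row, inst_row):
            if inst != 0:
                ids[sem].add(inst)
    return {cls: len(found) for cls, found in ids.items()}

counts = instances_per_class(semantic, instance)
```

Semantic segmentation alone would report a single "car" region per pixel class here; the instance map is what separates the two cars.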

  4. Applications
     ● Object tracking
     Image credit: Davi Frossard

  5. Applications
     ● Interacting with the environment
     Image credit: http://www.rethinkrobotics.com/build-a-bot/

  6. Applications
     ● Useful information for other algorithms, such as optical flow
     Image credit: Shenlong Wang

  7. Semantic Segmentation
     ● Semantic segmentation is a well-studied problem
       ○ Our instance segmentation method builds on an existing technique
       ○ H. Zhao et al., Pyramid Scene Parsing Network, https://arxiv.org/abs/1612.01105
     Image credit: H. Zhao et al.

  8. Watershed Transform
     ● Classical image segmentation technique
     Image (left) credit: Adrian Fisher

  9. Scalar Field and Gradient
     ● Scalar field: a single number at each pixel
     ● Gradient: a vector at each pixel, pointing in the direction of steepest ascent
     Image source: Wikipedia, by Vivekj78, own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=15346899
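The relationship between a scalar field and its gradient can be sketched numerically. The following is a minimal pure-Python example, assuming the field is stored as a list of rows and that finite differences are an acceptable approximation; the function name `gradient` and the test field are illustrative and not from the slides.

```python
def gradient(field):
    """Finite-difference gradient of a 2D scalar field (list of rows).

    Returns two grids (gy, gx): the rate of change along rows and columns.
    Central differences are used in the interior and one-sided differences
    at the borders. The field must be at least 2 x 2.
    """
    h, w = len(field), len(field[0])
    gy = [[0.0] * w for _ in range(h)]
    gx = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            y0, y1 = max(y - 1, 0), min(y + 1, h - 1)
            x0, x1 = max(x - 1, 0), min(x + 1, w - 1)
            gy[y][x] = (field[y1][x] - field[y0][x]) / (y1 - y0)
            gx[y][x] = (field[y][x1] - field[y][x0]) / (x1 - x0)
    return gy, gx

# A tilted plane f(y, x) = 2x + y: it rises twice as fast along x as along y,
# so the gradient is the same vector (gy, gx) = (1, 2) at every pixel.
plane = [[2 * x + y for x in range(4)] for y in range(3)]
gy, gx = gradient(plane)
```

On this plane every pixel has the same gradient vector, which points "uphill" toward larger x and y.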

  10-11. Overview of Approach (two slides with the same labels)
      [Figure panels: Input Image, Gradient of Energy Landscape, Energy Landscape, Predicted Instances, Semantic Segmentation]

  12. Why Predict Direction First?
      [Figure panels: Input Image, Direction of Gradient, Energy Landscape]
      ● The direction label changes much more sharply at an instance boundary than the energy does!
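A one-dimensional toy case illustrates why the direction label is sharper at a boundary. This is a minimal sketch, assuming the energy is the distance to the nearest pixel outside the instance and the direction is the sign of the energy's slope inside the instance; both function names and the ten-pixel row are hypothetical.

```python
def boundary_distance(labels):
    """Distance from each pixel to the nearest position outside its instance.

    The ends of the row count as boundaries too.
    """
    n = len(labels)
    dist = []
    for i, lab in enumerate(labels):
        d = 1
        while (0 <= i - d and i + d < n
               and labels[i - d] == lab and labels[i + d] == lab):
            d += 1
        dist.append(d)
    return dist

def ascent_direction(labels, dist):
    """Sign of the energy slope within each instance: +1 right, -1 left, 0 at a peak."""
    n = len(labels)
    dirs = []
    for i, lab in enumerate(labels):
        left = dist[i - 1] if i > 0 and labels[i - 1] == lab else 0
        right = dist[i + 1] if i + 1 < n and labels[i + 1] == lab else 0
        slope = right - left
        dirs.append((slope > 0) - (slope < 0))
    return dirs

# Two touching instances, A and B, in a single row of ten pixels.
row = ["A"] * 5 + ["B"] * 5
energy = boundary_distance(row)
direction = ascent_direction(row, energy)
```

At the shared boundary (pixels 4 and 5) the two energies are identical, while the two directions are opposite, so the boundary is far easier to see in the direction map.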

  13. Overall Network

  14. Direction Prediction Network
      [Figure panels: Input Image, Ground Truth Directions, Predicted Directions, Semantic Segmentation]

  15. Energy Prediction Network
      [Figure panels: Ground Truth Energy, Ground Truth Instances, Predicted Energy, Predicted Instances]
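Once an energy map is available, instances can be read off in the spirit of the watershed transform: cut the landscape at a fixed level and treat each connected region above the cut as one instance. Below is a minimal pure-Python sketch, assuming 4-connectivity, a hand-made energy grid and an arbitrary cut level. The function name `instances_from_energy` is hypothetical, and the slides do not specify the actual post-processing.

```python
from collections import deque

def instances_from_energy(energy, threshold):
    """Label 4-connected regions whose energy is >= threshold.

    Returns (labels, count); label 0 means "below the cut".
    """
    h, w = len(energy), len(energy[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for sy in range(h):
        for sx in range(w):
            if energy[sy][sx] < threshold or labels[sy][sx]:
                continue
            count += 1
            labels[sy][sx] = count
            queue = deque([(sy, sx)])
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not labels[ny][nx]
                            and energy[ny][nx] >= threshold):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count

# Two touching objects: low energy near boundaries, higher in the interiors.
energy = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 2, 1, 1, 2, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
labels_hi, n_hi = instances_from_energy(energy, 2)  # cut above the shared rim
labels_lo, n_lo = instances_from_energy(energy, 1)  # cut at the rim level
```

Cutting at level 2 separates the two objects, whereas cutting at level 1 merges them into a single region, which is why the choice of cut level matters.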

  16. Training and Inference
      ● Pre-train both networks
      ● End-to-end fine-tuning
      ● Network trained on NVIDIA DGX-1
        ○ Approximately 25 hours total for training on one GP100 core
        ○ ~0.1 s per image for the forward pass
        ○ Thank you NVIDIA for the generous gift!
      Image source: www.nvidia.com

  17-18. Cityscapes Dataset (two slides with the same text)
      ● 2975 training / 500 validation / 1525 testing images
      ● Instances: car, truck, bus, train, person, rider, motorcycle, bicycle

  19. Cityscapes Instance Segmentation Leaderboard

      Method                  AP*      AP* @ 50%   AP* @ 50m   AP* @ 100m
      van den Brand et al.     2.3%      3.7%        3.9%        4.9%
      Cordts et al.            4.6%     12.9%        7.7%       10.3%
      Uhrig et al.             8.9%     21.1%       15.3%       16.7%
      Ours                    19.4%     35.3%       31.4%       36.8%

      * Average Precision (AP): higher is better
      Recently, new approaches have achieved even higher performance.
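The "@ 50%" column refers to an overlap threshold between predicted and ground-truth masks. The sketch below shows the overlap measure (intersection over union), assuming masks are given as sets of (row, col) pixels. The helper names are hypothetical and this is not the Cityscapes evaluation code, which additionally ranks predictions by confidence and averages precision.

```python
def mask_iou(pred, gt):
    """Intersection over union of two masks given as sets of (row, col) pixels."""
    pred, gt = set(pred), set(gt)
    union = pred | gt
    return len(pred & gt) / len(union) if union else 0.0

def matches_at(pred, gt, threshold=0.5):
    """True if the predicted mask overlaps the ground truth by more than threshold."""
    return mask_iou(pred, gt) > threshold

gt_mask = {(0, 0), (0, 1), (1, 0), (1, 1)}       # a 2 x 2 ground-truth object
good_pred = {(0, 0), (0, 1), (1, 0)}             # misses one pixel: IoU = 3/4
shifted_pred = {(0, 1), (1, 1), (0, 2), (1, 2)}  # shifted by one column: IoU = 2/6

good_iou = mask_iou(good_pred, gt_mask)
shifted_iou = mask_iou(shifted_pred, gt_mask)
```

Under the 50% criterion, `good_pred` would count as a match for this object while `shifted_pred` would not.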

  20-25. Sample Output (six slides with the same labels)
      [Figure panels: Input RGB, Direction Prediction, Energy Prediction, Semantic Segmentation, Predicted Instances, Ground Truth Instances]

  26. Preliminary TorontoCity Aerial Instance Segmentation
      [Figure panels: Semantic Segmentation (ResNet), Input RGB, Predicted Building Instances]

  27. Preliminary TorontoCity Aerial Instance Segmentation

      Method       Weighted Coverage*   AP*       Recall* @ 50%   Precision* @ 50%
      FCN-8        41.92%               11.37%    21.50%          36.00%
      ResNet-56    40.65%               12.13%    18.90%          45.36%
      Ours         56.22%               21.22%    67.16%          63.67%

      * higher is better

  28. In Summary...
      ● Simple technique for instance segmentation
      ● Encodes object instances as an energy map
      ● Predicts the gradient direction as an intermediate task for better supervision
