
Mask R-CNN OBJECT INSTANCE SEGMENTATION AND HUMAN POSE ESTIMATION - PowerPoint PPT Presentation



  1. Mask R-CNN: Object Instance Segmentation and Human Pose Estimation. Kaiming He (Research Scientist), Georgia Gkioxari (Postdoc), Piotr Dollár (Research Scientist), Ross Girshick (Research Scientist). Facebook AI Research (FAIR)

  2. Classic Computer Vision Problems. Image classification: ✓ boat, ✓ person. Source: PASCAL Dataset

  3. Classic Computer Vision Problems. Image classification: ✓ boat, ✓ person. Object detection. Source: PASCAL Dataset

  4. Semantic Segmentation. Semantic segmentation (pixel-level classification): person. Source: PASCAL Dataset

  5. The Instance Segmentation Task. Semantic segmentation (pixel-level classification): person. Instance segmentation (pixel-level detection), our task: Person 1, Person 2, Person 3, Person 4, Person 5. Source: PASCAL Dataset
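
The contrast on this slide can be made concrete with a toy example. The sketch below is only an illustration: the 1 x 6 "image", the mask values, and the helper name `union_of_masks` are all made up for the purpose. Semantic segmentation gives every pixel a class label, so two adjacent people merge into one region; instance segmentation keeps one binary mask per person.

```python
# Hypothetical 1 x 6 "image" with two people standing side by side (0 = background).
PERSON = 1

# Semantic segmentation: one class label per pixel; both people share the label.
semantic_map = [[0, PERSON, PERSON, PERSON, PERSON, 0]]

# Instance segmentation: one binary mask per detected object.
instance_masks = {
    "person_1": [[0, 1, 1, 0, 0, 0]],
    "person_2": [[0, 0, 0, 1, 1, 0]],
}


def union_of_masks(masks):
    """Collapse per-instance masks into a single semantic label map."""
    first = next(iter(masks.values()))
    rows, cols = len(first), len(first[0])
    out = [[0] * cols for _ in range(rows)]
    for mask in masks.values():
        for r in range(rows):
            for c in range(cols):
                if mask[r][c]:
                    out[r][c] = PERSON
    return out
```

Collapsing the instance masks reproduces the semantic map, but the reverse is not possible: the semantic map alone does not say where one person ends and the next begins.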

  6. Source: COCO Dataset

  7. Source: DAVIS Dataset

  8. Mask R-CNN: Talk Outline • Mask R-CNN • Object instance segmentation • Human pose estimation • Role of Caffe2 in our research • Conclusions

  9. Object Detection: R-CNN (Region-based Convolutional Neural Network). Pipeline: image → region proposals (external algorithm) → per-region classification by a CNN. Source: Girshick, Donahue, Darrell, Malik. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. CVPR 2014

  10. Object Detection: R-CNN (Region-based Convolutional Neural Network). Pipeline: image → region proposals (external algorithm) → per-region classification by a CNN → class/box output for each region. Source: Girshick, Donahue, Darrell, Malik. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. CVPR 2014

  11. Fast R-CNN: A Shared CNN Body. Pipeline: CNN applied to entire image, with regions from an external region proposal algorithm (same as R-CNN) → RoIPool op → shared region-wise subnetwork → class/box output for each region. Source: Girshick. Fast R-CNN. ICCV 2015
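
The RoIPool op named on this slide turns a region of any size into a fixed-size grid so that one shared subnetwork can process every region. The following is a minimal pure-Python sketch of that idea, not the Fast R-CNN implementation: the function name `roi_max_pool`, the single-channel list-of-lists feature map, and the integer half-open box convention are assumptions made for illustration.

```python
def roi_max_pool(feature, roi, out_h, out_w):
    """Max-pool one region of a 2-D feature map into an out_h x out_w grid.

    feature: list of rows (a single channel).
    roi: (x0, y0, x1, y1) in feature-map cells, already rounded to integers,
         inclusive of x0/y0 and exclusive of x1/y1 (an assumed convention).
    """
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        # Integer bin edges: floor for the start, ceiling for the end,
        # so every bin covers at least one cell.
        r_start = y0 + (i * roi_h) // out_h
        r_end = y0 + ((i + 1) * roi_h + out_h - 1) // out_h
        row = []
        for j in range(out_w):
            c_start = x0 + (j * roi_w) // out_w
            c_end = x0 + ((j + 1) * roi_w + out_w - 1) // out_w
            row.append(max(feature[r][c]
                           for r in range(r_start, r_end)
                           for c in range(c_start, c_end)))
        pooled.append(row)
    return pooled
```

For a 4 x 4 map holding the values 0 to 15 in row-major order, `roi_max_pool(feature, (0, 0, 4, 4), 2, 2)` returns `[[5, 7], [13, 15]]`. Note that both the box and the bin edges are snapped to whole cells.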

  12. Faster R-CNN: Region Proposal Network. Pipeline: CNN applied to entire image, with in-network region proposals from the RPN → RoIPool op → shared region-wise subnetwork → class/box output for each region. Source: Ren, He, Girshick, Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. NIPS 2015

  13. Mask R-CNN for Instance Segmentation: Overview • An extension of Faster R-CNN • Surprisingly simple • Fast: 200 ms per image • Accurate: state of the art on COCO

  14. Mask R-CNN for Instance Segmentation. Faster R-CNN (CNN applied to entire image, with RoIAlign) plus a mask “head” (region-wise segmentation subnetwork)
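
The slide names RoIAlign without defining it. In the Mask R-CNN paper it avoids rounding region coordinates to whole feature-map cells and instead reads the feature map at real-valued points with bilinear interpolation. The sketch below illustrates that idea under simplifying assumptions: one sample at the centre of each bin (the paper samples several points per bin and aggregates them), a single-channel list-of-lists feature map, and `feature[r][c]` treated as the value at point (r, c). The names `bilinear` and `roi_align` are illustrative.

```python
import math


def bilinear(feature, y, x):
    """Bilinearly interpolate a 2-D feature map at the real-valued point (y, x)."""
    h, w = len(feature), len(feature[0])
    # Clamp the sample point to the valid range of the map.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feature, roi, out_h, out_w):
    """Sample the centre of each output bin; roi = (x0, y0, x1, y1) as floats."""
    x0, y0, x1, y1 = roi
    bin_h, bin_w = (y1 - y0) / out_h, (x1 - x0) / out_w
    return [[bilinear(feature, y0 + (i + 0.5) * bin_h, x0 + (j + 0.5) * bin_w)
             for j in range(out_w)]
            for i in range(out_h)]
```

Because the box is never rounded, shifting a region by a fraction of a cell changes the sampled values smoothly, which matters for pixel-accurate masks.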

  15. Mask R-CNN results on COCO

  16. Mask R-CNN results on COCO

  17. Mask R-CNN results on COCO

  18. Quantitative Results
      Method            Backbone                mask AP   Note
      MNC               ResNet-101-C4           24.6      2015 COCO winner
      FCIS w/ OHEM      ResNet-101-C5-dilated   29.2
      FCIS+++ w/ OHEM   ResNet-101-C5-dilated   33.6      2016 COCO winner [seconds per image]
      Mask R-CNN        ResNet-101-C4           33.1
      Mask R-CNN        ResNet-101-FPN          35.7      Our 200 ms version
      Mask R-CNN        ResNeXt-101-FPN         37.1

  19. Mask R-CNN for Human Pose Estimation: Overview • Keypoint = 1-hot mask • Human pose = 17 keypoints • Represent pose as 17 masks
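
The three bullets above amount to a simple encoding, sketched here in plain Python. The grid size and the helper names (`keypoint_to_mask`, `mask_to_keypoint`, `pose_to_masks`) are assumptions for illustration; only the count of 17 keypoints comes from the slide.

```python
NUM_KEYPOINTS = 17  # from the slide: a human pose is 17 keypoints


def keypoint_to_mask(x, y, size):
    """Return a size x size mask that is 1 at (x, y) and 0 elsewhere."""
    mask = [[0] * size for _ in range(size)]
    mask[y][x] = 1
    return mask


def mask_to_keypoint(mask):
    """Recover (x, y) as the location of the largest value in the mask."""
    best_y = max(range(len(mask)), key=lambda r: max(mask[r]))
    best_x = max(range(len(mask[best_y])), key=lambda c: mask[best_y][c])
    return best_x, best_y


def pose_to_masks(keypoints, size):
    """Encode a pose, given as a list of (x, y) pairs, as one mask per keypoint."""
    return [keypoint_to_mask(x, y, size) for x, y in keypoints]
```

With this encoding the mask head can predict pose without any pose-specific machinery: it outputs 17 masks per person, and each keypoint is read off as the peak of its mask.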

  20. Mask R-CNN results on COCO

  21. Mask R-CNN results on COCO

  22. Mask R-CNN results on COCO

  23. Mask R-CNN results on COCO

  24. Quantitative Results
      Method                          keypoint AP   Note
      CMU-Pose+++                     61.8          2016 COCO winner [seconds per image]
      G-RMI [w/ extra data]           62.4
      Mask R-CNN [keypoint-only]      62.7
      Mask R-CNN [keypoint & mask]    63.1          Our 200 ms version

  25. Caffe2 Accelerated Research

  26. Caffe2 Object Detection Platform: Rapid idea iteration is a key enabling factor in research • Early alpha users starting in May 2016 • Ported py-faster-rcnn from Caffe to Caffe2 • Key design choices • Flexible framework for implementing object detection models • Parallelize data loading with forward/backward computation
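
The last design choice, overlapping data loading with computation, can be sketched with a background thread and a bounded queue. This is a generic Python illustration, not the Caffe2 mechanism, which the slide does not describe; `prefetching_loader` and `train` are hypothetical names, and summing a batch stands in for the forward/backward pass.

```python
import queue
import threading


def prefetching_loader(batches, capacity=2):
    """Yield items from `batches` while a background thread loads ahead."""
    buffer = queue.Queue(maxsize=capacity)
    done = object()  # sentinel marking the end of the data

    def producer():
        for batch in batches:
            buffer.put(batch)  # blocks while the buffer is full
        buffer.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        batch = buffer.get()  # blocks until the next batch is ready
        if batch is done:
            return
        yield batch


def train(batches):
    """Stand-in for forward/backward computation: sums each batch."""
    return [sum(batch) for batch in prefetching_loader(batches)]
```

While the consumer is busy with one batch, the producer is already preparing the next, so the compute step does not wait on input as long as loading keeps pace. The bounded queue caps how far ahead the loader can run.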

  27. Caffe2 Object Detection Platform: Rapid idea iteration is a key enabling factor in research • Sync SGD with 8 GPUs [Tesla M40] in a BigSur server • Rapid prototyping of Mask R-CNN models in 8-12 hours • SOTA Mask R-CNN models train in 44 hours • Previous systems: ~4 days training time [experience from MSRA]
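
Synchronous SGD, mentioned in the first bullet, means every worker computes a gradient on its own share of the batch and the gradients are averaged before a single shared update. The sketch below simulates this on one machine with a toy 1-D least-squares model; the model, learning rate, and function names are illustrative assumptions, and only the worker count of 8 mirrors the slide.

```python
NUM_WORKERS = 8  # matches the 8 GPUs on the slide; everything else is illustrative


def local_gradient(w, shard):
    """Gradient of mean squared error for the 1-D model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def sync_sgd_step(w, shards, lr):
    """One synchronous step: average every worker's gradient, then update once."""
    grads = [local_gradient(w, shard) for shard in shards]  # parallel on real hardware
    return w - lr * sum(grads) / len(grads)


def fit(data, steps=200, lr=0.005):
    """Split the data across workers and run synchronous SGD from w = 0."""
    shards = [data[i::NUM_WORKERS] for i in range(NUM_WORKERS)]
    w = 0.0
    for _ in range(steps):
        w = sync_sgd_step(w, shards, lr)
    return w
```

When the shards have equal size, the averaged gradient equals the gradient over the full batch, so the workers behave like one device with a batch eight times larger.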

  28. From Research to Mobile with Caffe2

  29. Conclusions • Simple and effective • Fast inference • Box, mask, and pose all-in-one network and method • Caffe2 enables extremely fast prototyping, critical to our success
