Recent Progress in Object Detection
Jiaqi Wang
Multimedia Laboratory, The Chinese University of Hong Kong
Task definition • Image classification • Object detection • Semantic segmentation • Instance segmentation
Task definition
Progress
[Timeline figure, 2014 to 2018 and recent] R-CNN, MultiBox, Fast R-CNN, Faster R-CNN, YOLO, SSD, R-FCN, YOLO v2, FPN, Mask R-CNN, RetinaNet, Cascade R-CNN, Relation Network, SNIP, CornerNet
General pipeline: Two-stage Detector
• Feature generation: image -> backbone -> neck -> features
• Region proposal: sliding window, dense anchors -> cls. & reg. -> proposals
• Region recognition: RoI feature extractor -> RoI features -> task head
General pipeline: Single-stage Detector
• Feature generation: image -> backbone -> neck
• sliding window, dense anchors -> cls. & reg.
Faster R-CNN • Region Proposal Network (RPN) • Training pipeline
Faster R-CNN • RPN
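The dense anchors that the RPN classifies and regresses can be sketched as follows. This is a minimal illustration, not the exact Faster R-CNN code: the function name `generate_anchors`, the cell-centre convention, and the default `scales` and `ratios` values are assumptions chosen for the example.

```python
import math

def generate_anchors(feat_h, feat_w, stride, scales=(8,), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) anchors for every cell of a feature map.

    Hypothetical helper: one anchor per (scale, ratio) pair at each
    sliding-window position, expressed in image coordinates.
    """
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Centre of this feature-map cell in image coordinates.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                side = scale * stride  # side length of the square anchor
                for ratio in ratios:   # ratio = height / width
                    w = side / math.sqrt(ratio)
                    h = side * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

anchors = generate_anchors(feat_h=2, feat_w=3, stride=16)
print(len(anchors))  # 2 rows * 3 cols * 1 scale * 3 ratios = 18
```

Every ratio keeps the anchor area fixed at `side * side`, so only the shape changes; the RPN head then predicts an objectness score and four box offsets per anchor.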
Faster R-CNN Training pipeline • Joint training: multi-task (preferred)
[Diagram: backbone -> RPN -> proposals (no gradient) -> Fast R-CNN head]
Feature Pyramid Network (FPN) • Top-down pathway • Multi-level prediction
Feature Pyramid Network (FPN)
Feature Pyramid Network (FPN)
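The top-down pathway can be illustrated on toy 2-D maps. This is a simplified sketch under stated assumptions: the helpers `upsample2x` and `top_down_merge` are hypothetical names, the maps are plain nested lists, and the 1x1 lateral and 3x3 output convolutions of the real FPN are left out.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D list."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))                     # repeat each row
    return out

def top_down_merge(laterals):
    """Merge maps ordered fine -> coarse, each half the size of the previous.

    The coarsest map is passed through; every finer level adds the
    upsampled result of the level above it.
    """
    merged = [None] * len(laterals)
    merged[-1] = laterals[-1]
    for i in range(len(laterals) - 2, -1, -1):
        up = upsample2x(merged[i + 1])
        merged[i] = [[a + b for a, b in zip(r1, r2)]
                     for r1, r2 in zip(laterals[i], up)]
    return merged

c_fine = [[1, 1, 1, 1] for _ in range(4)]   # 4x4 high-resolution map
c_coarse = [[10, 20], [30, 40]]             # 2x2 low-resolution map
p_fine, p_coarse = top_down_merge([c_fine, c_coarse])
print(p_fine[0])  # [11, 11, 21, 21]
```

Each merged level is then used for prediction, which is the multi-level prediction named on the slide.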
Mask R-CNN • RoIAlign • Mask branch
Mask R-CNN: RoI Pooling vs. RoIAlign
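The difference between the two operators comes down to how fractional coordinates are handled. The sketch below assumes RoI coordinates are given in feature-map index units and takes one sample at the centre of each bin; `roi_align` and `roi_pool_quantized` are hypothetical names, and the quantized version is a simplified stand-in for the coordinate rounding in RoI Pooling, not the full max-pooling operator.

```python
import math

def bilinear(fmap, y, x):
    """Bilinearly interpolate a 2-D list at continuous coordinates (y, x)."""
    h, w = len(fmap), len(fmap[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
    bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def roi_align(fmap, roi, out_size):
    """Sample the centre of each bin with bilinear interpolation."""
    x1, y1, x2, y2 = roi
    bin_w = (x2 - x1) / out_size
    bin_h = (y2 - y1) / out_size
    return [[bilinear(fmap, y1 + (i + 0.5) * bin_h, x1 + (j + 0.5) * bin_w)
             for j in range(out_size)] for i in range(out_size)]

def roi_pool_quantized(fmap, roi, out_size):
    """Same bins, but each sample point is snapped down to an integer cell."""
    x1, y1, x2, y2 = roi
    bin_w = (x2 - x1) / out_size
    bin_h = (y2 - y1) / out_size
    return [[fmap[int(math.floor(y1 + (i + 0.5) * bin_h))]
                 [int(math.floor(x1 + (j + 0.5) * bin_w))]
             for j in range(out_size)] for i in range(out_size)]

fmap = [[float(10 * r + c) for c in range(4)] for r in range(4)]
roi = (0.25, 0.25, 2.25, 2.25)
aligned = roi_align(fmap, roi, 2)
quantized = roi_pool_quantized(fmap, roi, 2)
print(aligned)    # [[8.25, 9.25], [18.25, 19.25]]
print(quantized)  # [[0.0, 1.0], [10.0, 11.0]]
```

The quantized read shifts every sample by up to one cell, which matters little for classification but misaligns pixel-level mask targets.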
Mask R-CNN Mask branch
Cascade R-CNN • Cascade architecture • Training distribution
Cascade R-CNN Cascade architecture [Figure panels: Faster R-CNN, Cascade R-CNN]
Cascade R-CNN Training distribution [Figure panels: Regressor, Detector]
Cascade R-CNN Training distribution
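How the training distribution depends on the IoU threshold can be shown with a small example. The thresholds 0.5, 0.6 and 0.7 are an assumption taken from the values usually associated with Cascade R-CNN, not from the slide, and the proposal boxes are made-up toy data.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def positives(proposals, gt, thr):
    """Proposals that count as positive samples at IoU threshold thr."""
    return [p for p in proposals if iou(p, gt) >= thr]

gt = (0, 0, 100, 100)
# Toy proposals shifted progressively further from the ground-truth box.
proposals = [(0, 0, 100, 100), (10, 0, 110, 100), (20, 0, 120, 100),
             (30, 0, 130, 100), (40, 0, 140, 100)]

stage_thresholds = (0.5, 0.6, 0.7)  # assumed per-stage thresholds
counts = [len(positives(proposals, gt, t)) for t in stage_thresholds]
print(counts)  # [4, 3, 2]
```

With a fixed proposal set, a higher threshold leaves fewer positives. In the cascade, each stage's regressor refines the boxes first, so later stages receive proposals of higher quality and can train with a stricter threshold.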
RetinaNet • FPN • Focal Loss
RetinaNet FPN • Heavier head than SSD / Faster R-CNN
RetinaNet Focal Loss
• Problem: class imbalance
  • inefficient training
  • loss is overwhelmed by negative samples
Model: Solution
• Two-stage detectors: 1) proposal 2) mini-batch sampling
• SSD: hard negative mining
• RetinaNet: focal loss
RetinaNet Focal Loss • Solution: high confidence -> small loss
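The "high confidence -> small loss" idea can be written as FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). The sketch below is a per-sample, binary version; the defaults `gamma=2.0` and `alpha=0.25` are the values reported in the RetinaNet paper rather than anything stated on the slide, and the function names are illustrative.

```python
import math

def cross_entropy(p, y):
    """Binary cross-entropy for predicted foreground probability p and label y."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss: cross-entropy down-weighted by (1 - p_t) ** gamma."""
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy_negative = (0.01, 0)  # background predicted confidently as background
hard_negative = (0.90, 0)  # background predicted wrongly as foreground

for p, y in (easy_negative, hard_negative):
    print(cross_entropy(p, y), focal_loss(p, y))
```

For the easy negative the modulating factor is (0.01)^2, so its contribution shrinks by several orders of magnitude, while the hard negative keeps most of its loss. Setting `gamma=0` recovers alpha-weighted cross-entropy.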
COCO Challenge 2018 Comparison of our approach with 2017 winning entries on COCO test-dev.
[Bar charts of mask AP and box AP: 2017 winner (single model), 2017 winner, ours (single model), ours (final results)]
COCO Challenge 2018
1. We developed a hybrid task cascade framework for detection and segmentation.
2. We proposed a feature guided anchoring scheme to improve the average recall (AR) of RPN by 10 points.
3. We designed a new backbone, FishNet.
[Diagram labels: Detection & Segmentation, Proposal, Backbone]
COCO Challenge 2018 Hybrid Task Cascade (HTC) • Cascade Mask R-CNN (Cascade R-CNN + Mask R-CNN)
[Diagram: features F, RPN, three stages; at each stage a pool step feeds box branch B1..B3 and mask branch M1..M3]
Problem: The two branches at each stage are executed in parallel, without interaction.
COCO Challenge 2018 Hybrid Task Cascade (HTC) • Interleaved execution
[Diagram: features F, RPN, box branches B1..B3, mask branches M1..M3, four pool steps]
Problem: No direct information flow between mask branches at different stages.
COCO Challenge 2018 Hybrid Task Cascade (HTC) • Mask information flow
[Diagram: features F, RPN, box branches B1..B3, mask branches M1..M3, four pool steps]
Problem: Spatial context is not much explored.
COCO Challenge 2018 Hybrid Task Cascade (HTC) • Spatial context
[Diagram: features F, RPN, branch S, box branches B1..B3, mask branches M1..M3, four pool steps]
COCO Challenge 2018 Hybrid Task Cascade (HTC)
COCO Challenge 2018 Guided anchoring
• Feature generation: image -> backbone -> neck -> features
• Region proposal: sliding window, learnable anchors (in place of dense anchors) -> cls. & reg. -> proposals
• Region recognition: RoI feature extractor -> RoI features -> task head
COCO Challenge 2018 Guided anchoring
• Our goal: sparse anchors of arbitrary shape
• General rules for anchor design: alignment, consistency
COCO Challenge 2018 Guided anchoring Location prediction
COCO Challenge 2018 Guided anchoring Shape prediction
COCO Challenge 2018 Guided anchoring
COCO Challenge 2018 Guided anchoring Feature adaption
COCO Challenge 2018 Guided anchoring
[Plot: AR 1000 vs. runtime on TITAN X (fps) for RPN (ResNet-50), RPN (ResNet-152), RPN (ResNeXt-101), RPN (SENet-154), GA-RPN (ResNet-50), GA-RPN (SENet-154)]
COCO Challenge 2018 Guided anchoring
COCO Challenge 2018 Implementation details
1. Training scales • short edge: randomly sampled from 400 ~ 1400 • long edge: 1600
2. Test scales • (600, 900), (800, 1200), (1000, 1500), (1200, 1800), (1400, 2100)
3. Pipeline • Joint training • Finetune with GA-RPN proposals • Test with GA-RPN proposals
4. Resources • 32 Tesla V100 GPUs (16GB) for 3 days
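The training-scale rule above can be sketched as a keep-ratio resize. This is one plausible reading, not the authors' code: it assumes the sampled value is a target for the short edge, that 1600 is a cap on the long edge, and that the aspect ratio is preserved. The name `sample_train_size` and its parameters are hypothetical.

```python
import random

def sample_train_size(img_w, img_h, short_range=(400, 1400), long_max=1600, rng=random):
    """Pick a random short-edge target and return the resized (width, height).

    The scale factor is limited so the long edge never exceeds long_max.
    """
    short_target = rng.randint(*short_range)
    scale = min(short_target / min(img_w, img_h),
                long_max / max(img_w, img_h))
    return round(img_w * scale), round(img_h * scale)

rng = random.Random(0)  # seeded only to make the example repeatable
sizes = [sample_train_size(640, 480, rng=rng) for _ in range(100)]
print(sizes[:3])
```

Under this reading, a 640x480 image is never enlarged beyond 1600x1200, because the long-edge cap takes over once the sampled short edge exceeds 1200.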
COCO Challenge 2018 Implementation details Backbones
• SENet-154 (~0.8 points higher)
• ResNeXt101 (64x4d), ResNeXt101 (32x8d), DPN-107, FishNet (comparable)
COCO Challenge 2018 Implementation details Other tricks • w/ SoftNMS • w/o OHEM • w/o classwise balance sampling • w/o voting for bbox or mask
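Soft-NMS lowers the scores of overlapping boxes instead of discarding them. The slide does not say which variant was used, so the sketch below assumes the Gaussian form with `sigma=0.5` and a small score cutoff; both values and the function names are illustrative.

```python
import math

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def soft_nms(boxes, scores, sigma=0.5, score_thr=0.001):
    """Gaussian Soft-NMS; returns (box, score) pairs in selection order."""
    candidates = list(zip(boxes, scores))
    kept = []
    while candidates:
        # Take the highest-scoring remaining box.
        best = max(range(len(candidates)), key=lambda i: candidates[i][1])
        box, score = candidates.pop(best)
        kept.append((box, score))
        # Decay the scores of the others according to their overlap with it.
        rescored = []
        for other_box, other_score in candidates:
            decay = math.exp(-(iou(box, other_box) ** 2) / sigma)
            new_score = other_score * decay
            if new_score >= score_thr:
                rescored.append((other_box, new_score))
        candidates = rescored
    return kept

boxes = [(0, 0, 100, 100), (10, 0, 110, 100), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
kept = soft_nms(boxes, scores)
for box, score in kept:
    print(box, round(score, 3))
```

The heavily overlapping second box survives with a reduced score and drops below the distant third box, whereas hard NMS at a typical IoU threshold would have removed it.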
COCO Challenge 2018 • With bells and whistles
Mask AP on test-dev, cumulative:
• Baseline (R-50 Cascade with mask): 36.9
• HTC: 38.4 (+1.5)
• Deformable conv: 39.5 (+1.1)
• Synchronized BN: 40.7 (+1.2)
• Multi-scale training: 42.5 (+1.8)
• Better backbone: 44.3 (+1.8)
• GA-RPN finetune: 45.3 (+1.0)
• Multi-scale & flip testing: 47.4 (+2.1)
• Model ensemble: 49.0 (+1.6)
mmdetection
• Comprehensive: RPN, Fast/Faster R-CNN, Mask R-CNN, FPN, Cascade R-CNN, RetinaNet, more …
• High performance: better performance, optimized memory consumption, faster speed
• Handy to develop: written with PyTorch, modular design
GitHub: mmdetection
Hybrid Task Cascade for Instance Segmentation (Accepted to CVPR 2019) https://arxiv.org/abs/1901.07518
Region Proposal by Guided Anchoring (Accepted to CVPR 2019) https://arxiv.org/abs/1901.03278
FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction (Accepted to NIPS 2018) https://arxiv.org/abs/1901.03495