LVIS Challenge 2020: A Good Box is not a Guarantee of a Good Mask
Jingru Tan (1), Gang Zhang (2), Hanming Deng (3), Changbao Wang (3), Lewei Lu (3), Quanquan Li (3)
(1) Tongji University (2) Tsinghua University (3) SenseTime Research
Overview
Introduction of LVIS: long-tail distribution; high-quality mask annotations
Training Pipeline: representation learning stage; fine-tuning stage
Our Results: improvements & tricks
Challenges in LVIS: objects that are hard to represent with boxes; inconsistent annotations
Introduction of LVIS: Long-Tail Distribution
The classifier is heavily biased towards head categories, so tail categories are hard to classify.
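LVIS reports AP separately for rare, common and frequent categories; these are the AP@r, AP@c and AP@f columns used throughout this deck. To our knowledge, the LVIS paper defines the groups by how many training images contain a category:

- rare: 1-10 images
- common: 11-100 images
- frequent: more than 100 images

The sketch below implements that split. The category names and counts are hypothetical.

```python
def frequency_group(num_images: int) -> str:
    """Map a category's number of training images to its LVIS frequency group."""
    if num_images < 1:
        raise ValueError("an LVIS category appears in at least one training image")
    if num_images <= 10:
        return "r"  # rare: 1-10 images
    if num_images <= 100:
        return "c"  # common: 11-100 images
    return "f"  # frequent: more than 100 images


def split_by_frequency(image_counts: dict[str, int]) -> dict[str, list[str]]:
    """Bucket category names into the rare / common / frequent groups."""
    groups: dict[str, list[str]] = {"r": [], "c": [], "f": []}
    for name, count in image_counts.items():
        groups[frequency_group(count)].append(name)
    return groups


# Hypothetical image counts, for illustration only.
print(split_by_frequency({"cat_a": 4, "cat_b": 37, "cat_c": 900}))
# {'r': ['cat_a'], 'c': ['cat_b'], 'f': ['cat_c']}
```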
Introduction of LVIS: High-Quality Mask Annotations
COCO: coarse polygon annotations. LVIS: precise polygon annotations.
Training Pipeline
Representation learning: learn a universal representation.
Fine-tuning: balance the classifier (for the long-tail distribution) and pay more attention to mask prediction (for high-quality masks).
Representation Learning
Equalization Loss (Equalization Loss for Long-tailed Object Recognition, CVPR 2020)
Repeat Factor Sampling (LVIS: A Dataset for Large Vocabulary Instance Segmentation, CVPR 2019)
Mosaic & Rotate & Multi-Scale Training (YOLOv4: Optimal Speed and Accuracy of Object Detection, arXiv preprint)
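Repeat factor sampling, as we understand it from the cited LVIS paper, oversamples images that contain rare categories. It works in two steps:

- Each category c gets r(c) = max(1, sqrt(t / f(c))), where f(c) is the fraction of training images containing c.
- Each image gets the largest r(c) over the categories it contains.

The sketch below implements both steps. The paper uses t = 0.001 as far as we recall; the toy example uses t = 0.1 only so that a 100-image set triggers oversampling. Handling fractional factors by stochastic rounding (`epoch_indices`) is one common choice, not something the slides specify.

```python
import math
import random
from collections import Counter


def category_repeat_factors(image_labels: list[set[int]], thresh: float = 0.001) -> dict[int, float]:
    """r(c) = max(1, sqrt(t / f(c))), where f(c) is the fraction of images containing c."""
    num_images = len(image_labels)
    images_per_cat = Counter(c for labels in image_labels for c in labels)
    return {c: max(1.0, math.sqrt(thresh / (n / num_images))) for c, n in images_per_cat.items()}


def image_repeat_factors(image_labels: list[set[int]], thresh: float = 0.001) -> list[float]:
    """r(I) = max of r(c) over the categories c present in image I (1.0 if none)."""
    cat_rf = category_repeat_factors(image_labels, thresh)
    return [max((cat_rf[c] for c in labels), default=1.0) for labels in image_labels]


def epoch_indices(repeat_factors: list[float], rng: random.Random) -> list[int]:
    """Repeat image i floor(r_i) times, plus once more with probability frac(r_i)."""
    indices = []
    for i, r in enumerate(repeat_factors):
        whole = math.floor(r)
        reps = whole + (1 if rng.random() < r - whole else 0)
        indices.extend([i] * reps)
    rng.shuffle(indices)
    return indices


labels = [{0, 1}] + [{0}] * 99  # category 1 appears in only 1 of 100 images
rf = image_repeat_factors(labels, thresh=0.1)
print(round(rf[0], 3), rf[1])  # 3.162 1.0
```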
Representation Learning: Self-Training
To further improve the model, we infer pseudo labels on LVIS and on external datasets such as Open Images, and use them for self-training. During self-training, we ignore all proposals matched with the pseudo boxes.
Pseudo Label | AP@Segm | AP@r | AP@c | AP@f | AP@BBox
Baseline | 26.2 | 17.1 | 26.2 | 30.2 | 27.0
Open Images | 26.8 | 17.5 | 27.2 | 30.5 | 28.1
LVIS (ignore pseudo) | 26.8 | 17.0 | 27.1 | 30.9 | 27.8
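The slide states only that proposals matched with pseudo boxes are ignored. The sketch below shows one way to do that during proposal-label assignment. It rests on several assumptions of ours:

- Ground-truth matches take precedence over pseudo-box matches.
- Background is label 0.
- Ignored proposals get a hypothetical label of -1 and are excluded from the classification loss.
- Both matches use an IoU threshold of 0.5.

```python
IGNORE = -1      # hypothetical label for proposals excluded from the loss
BACKGROUND = 0   # assumption: foreground category ids start at 1


def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def assign_labels(proposals, gt_boxes, gt_labels, pseudo_boxes, fg_iou=0.5, ignore_iou=0.5):
    """GT category if matched to a GT box, IGNORE if instead matched to a pseudo box,
    otherwise BACKGROUND."""
    labels = []
    for p in proposals:
        overlaps = [iou(p, g) for g in gt_boxes]
        best = max(range(len(gt_boxes)), key=overlaps.__getitem__, default=None)
        if best is not None and overlaps[best] >= fg_iou:
            labels.append(gt_labels[best])
        elif any(iou(p, q) >= ignore_iou for q in pseudo_boxes):
            labels.append(IGNORE)
        else:
            labels.append(BACKGROUND)
    return labels


proposals = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
print(assign_labels(proposals, [(0, 0, 10, 10)], [5], [(21, 21, 30, 30)]))  # [5, -1, 0]
```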
Fine-tuning – BBox Head: Balanced Classifier
The classifier is heavily biased towards head categories. We balance it with Balanced Group Softmax (Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax, CVPR 2020).
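Balanced Group Softmax works roughly as follows:

- Categories are split into groups by their number of training instances.
- Each group gets its own softmax, so head categories no longer suppress tail ones inside a single softmax.
- Every group has an extra "others" slot, and background forms a group of its own.

The sketch below covers only the training-side loss. The bin edges 10/100/1000 are illustrative, based on our recollection of the paper. The paper's subsampling of "others" examples and its test-time score handling are omitted.

```python
import math


def build_groups(instance_counts: dict[int, int], edges=(10, 100, 1000)) -> list[list[int]]:
    """Group 0 holds only background (id 0); group g >= 1 holds the categories whose
    training-instance count falls into the g-th bin defined by `edges`."""
    groups = [[0]] + [[] for _ in range(len(edges) + 1)]
    for cat, n in sorted(instance_counts.items()):
        groups[1 + sum(n >= e for e in edges)].append(cat)
    return groups


def group_targets(label: int, groups: list[list[int]]) -> list[int]:
    """Per-group target slot: slot 0 is 'others'; member i of a group is slot i + 1."""
    return [members.index(label) + 1 if label in members else 0 for members in groups]


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def bags_loss(logits_per_group: list[list[float]], label: int, groups: list[list[int]]) -> float:
    """Sum over groups of the cross-entropy of that group's own softmax."""
    return -sum(math.log(softmax(logits)[t])
                for logits, t in zip(logits_per_group, group_targets(label, groups)))


groups = build_groups({1: 5, 2: 50, 3: 500, 4: 5000, 5: 7})
print(groups)                    # [[0], [1, 5], [2], [3], [4]]
print(group_targets(5, groups))  # [0, 2, 0, 0, 0]
```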
Fine-tuning – Mask Head: A Good Box is not a Guarantee of a Good Mask
Category | Frequency | BBox AP | Mask AP | Area Mask/BBox | Mask AP - BBox AP
coatrack | common | 73.6 | 10.1 | 0.29 | -63.5
tripod | frequent | 40.9 | 3.8 | 0.22 | -37.1
necklace | frequent | 32.8 | 3.0 | 0.17 | -29.8
ski pole | frequent | 36.2 | 7.5 | 0.15 | -28.7
fork | frequent | 47.2 | 21.9 | 0.26 | -25.3
windshield wiper | frequent | 29.6 | 5.2 | 0.26 | -24.4
giraffe | frequent | 79.2 | 60.7 | 0.33 | -18.5
Mask AP - BBox AP < -5.0: 270 of 1203 categories (~22%).
Large box, small mask (Mask AP - BBox AP < -5.0 and Area Mask/BBox < 0.5): 168 of 1203 categories.
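Both statistics on this slide can be computed directly. The sketch below makes a few assumptions:

- Masks are binary grids.
- Each instance's box is approximated by the tight box of its mask; LVIS box annotations may differ slightly.
- The slide does not say how per-instance ratios are averaged into a per-category value.

The filter reproduces the "large box, small mask" criterion on the rows of the table above.

```python
def mask_box_area_ratio(mask: list[list[int]]) -> float:
    """Foreground pixel count divided by the area of the mask's tight bounding box."""
    ys = [y for y, row in enumerate(mask) for v in row if v]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not xs:
        return 0.0
    box_area = (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)
    return len(xs) / box_area


# (category, BBox AP, Mask AP, Area Mask/BBox), copied from the table above
ROWS = [
    ("coatrack", 73.6, 10.1, 0.29),
    ("tripod", 40.9, 3.8, 0.22),
    ("necklace", 32.8, 3.0, 0.17),
    ("ski pole", 36.2, 7.5, 0.15),
    ("fork", 47.2, 21.9, 0.26),
    ("windshield wiper", 29.6, 5.2, 0.26),
    ("giraffe", 79.2, 60.7, 0.33),
]


def large_box_small_mask(rows, gap_thresh=-5.0, ratio_thresh=0.5):
    """Categories with Mask AP - BBox AP < gap_thresh and Area Mask/BBox < ratio_thresh."""
    return [name for name, box_ap, mask_ap, ratio in rows
            if mask_ap - box_ap < gap_thresh and ratio < ratio_thresh]


print(mask_box_area_ratio([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))  # 0.3333333333333333
print(len(large_box_small_mask(ROWS)))  # 7
```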
Fine-tuning – Mask Head: A Good Box is not a Guarantee of a Good Mask
The smaller the mask/bbox area ratio, the larger the gap between mask AP and bbox AP.
Fine-tuning – Mask Head: Some Examples (figure): tripod, giraffe, ski pole.
Fine-tuning – Mask Head: Solution for Categories with Small Ratio
New strategy for mask proposal assignment (reference: Feature Pyramid Networks for Object Detection, CVPR 2017)
Method | AP@Segm | AP@r | AP@c | AP@f | AP@BBox
Baseline | 34.7 | 26.1 | 34.9 | 38.1 | 37.6
Ratio Assign | 35.0 | 26.4 | 35.2 | 38.6 | 37.6
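The slide cites FPN but does not give the exact "Ratio Assign" rule. The sketch shows two things:

- `fpn_level`: the standard FPN rule for choosing a pyramid level for an RoI, k = floor(k0 + log2(sqrt(wh) / 224)) with k0 = 4, clamped to the available levels.
- `ratio_aware_level`: a purely hypothetical variant. It sizes the RoI by its expected mask area (box area times mask/box ratio), so thin objects move to finer levels. It illustrates how the ratio could enter the rule and is not the authors' method.

```python
import math


def fpn_level(area: float, k0: int = 4, canonical: float = 224.0,
              k_min: int = 2, k_max: int = 5) -> int:
    """FPN rule: k = floor(k0 + log2(sqrt(area) / 224)), clamped to [k_min, k_max]."""
    if area <= 0:
        raise ValueError("RoI area must be positive")
    k = math.floor(k0 + math.log2(math.sqrt(area) / canonical))
    return min(k_max, max(k_min, k))


def ratio_aware_level(w: float, h: float, mask_box_ratio: float) -> int:
    """HYPOTHETICAL variant: size the RoI by its expected mask area (box area x ratio)."""
    return fpn_level(w * h * mask_box_ratio)


print(fpn_level(224 * 224))               # 4
print(ratio_aware_level(224, 224, 0.25))  # 3 (a finer level for a thin object)
```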
Fine-tuning – Mask Head: Solution for Categories with Small Ratio
Balanced Mask Loss (BML), addressing foreground/background imbalance
Method | AP@Segm | AP@r | AP@c | AP@f | AP@BBox
Baseline | 34.7 | 26.1 | 34.9 | 38.1 | 37.6
+BML | 35.0 | 26.1 | 35.3 | 38.5 | 37.6
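The slides do not define the Balanced Mask Loss, so the sketch below uses a stand-in that is not necessarily the authors' loss. It is the class-balanced binary cross-entropy popularized by HED (Holistically-Nested Edge Detection):

- beta is the fraction of background pixels in the RoI.
- Foreground terms are weighted by beta and background terms by (1 - beta).
- The scarce foreground of a thin object therefore counts more.

```python
import math


def balanced_mask_bce(probs: list[list[float]], target: list[list[int]], eps: float = 1e-7) -> float:
    """HED-style class-balanced BCE averaged over the pixels of one RoI mask.

    beta = fraction of background pixels; foreground terms are weighted by beta
    and background terms by (1 - beta), so the minority class counts more.
    """
    p_flat = [p for row in probs for p in row]
    t_flat = [t for row in target for t in row]
    n = len(t_flat)
    beta = (n - sum(t_flat)) / n
    loss = 0.0
    for p, t in zip(p_flat, t_flat):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        if t:
            loss -= beta * math.log(p)
        else:
            loss -= (1.0 - beta) * math.log(1.0 - p)
    return loss / n


# One foreground pixel out of four: its term gets weight 0.75, each background term 0.25.
print(balanced_mask_bce([[0.5, 0.5], [0.5, 0.5]], [[1, 0], [0, 0]]))
```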
Fine-tuning – Mask Head with Ratio Assign & Balanced Mask Loss (AP Gap = Mask AP - BBox AP)
Category | Frequency | Area Mask/BBox | AP Gap | AP Gap (Ours) | Improvement
coatrack | common | 0.29 | -63.5 | -60.1 | +3.4
tripod | frequent | 0.22 | -37.1 | -33.1 | +4.0
necklace | frequent | 0.17 | -29.8 | -27.7 | +2.1
ski pole | frequent | 0.15 | -28.7 | -24.5 | +4.2
fork | frequent | 0.26 | -25.3 | -21.3 | +4.0
windshield wiper | frequent | 0.26 | -24.4 | -21.1 | +3.3
giraffe | frequent | 0.33 | -18.5 | -14.4 | +4.1
However, this remains an open problem, which we leave for future research.
Fine-tuning – Mask Head: Predicting High-Quality Masks
Method | AP@Segm | AP@r | AP@c | AP@f | AP@BBox
Baseline | 34.7 | 26.1 | 34.9 | 38.1 | 37.6
+ Ratio Assign | 35.0 | 26.2 | 35.2 | 38.5 | 37.6
+ Balanced Mask Loss | 35.2 | 26.0 | 35.4 | 38.9 | 37.6
+ Boundary Supervision* | 35.6 | 26.9 | 35.6 | 39.3 | 37.6
+ 7 Convs for Mask Head | 35.8 | 26.8 | 35.9 | 39.6 | 37.6
+ Deformable RoI Pooling | 36.1 | 28.8 | 35.8 | 39.8 | 38.3
* Boundary-preserving Mask R-CNN, ECCV 2020
Our Results: Baseline: 19.2.
Our Results: Data Augmentation (Mosaic, Rotate): 19.2 → 20.3.
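YOLOv4's Mosaic tiles four training images into one canvas. The sketch below covers only the box bookkeeping; pixel pasting is omitted, and masks would be shifted by the same offsets. How it works:

- Each image is placed so that its inner corner touches a chosen center.
- Its boxes are shifted by the placement offset and clipped to the canvas.
- Boxes that are fully cut off are dropped.

The 2x2 layout follows the common YOLO-style implementation as we understand it. All names are ours, and choosing a random center is left to the caller.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def mosaic_boxes(sizes: list[tuple[int, int]], boxes_per_image: list[list[Box]],
                 center: tuple[float, float], canvas: int) -> list[Box]:
    """Map boxes of 4 images tiled around `center` on a canvas x canvas mosaic.

    Order: top-left, top-right, bottom-left, bottom-right. Each image's inner
    corner touches `center`; boxes are shifted, clipped to the canvas, and dropped
    if nothing of them remains visible.
    """
    cx, cy = center
    out = []
    for k, ((w, h), boxes) in enumerate(zip(sizes, boxes_per_image)):
        ox = cx - w if k in (0, 2) else cx  # left column ends at cx
        oy = cy - h if k in (0, 1) else cy  # top row ends at cy
        for x1, y1, x2, y2 in boxes:
            nx1, nx2 = (min(max(v + ox, 0), canvas) for v in (x1, x2))
            ny1, ny2 = (min(max(v + oy, 0), canvas) for v in (y1, y2))
            if nx2 > nx1 and ny2 > ny1:
                out.append((nx1, ny1, nx2, ny2))
    return out


sizes = [(100, 100)] * 4
boxes = [[(10, 10, 50, 50)]] * 4
print(mosaic_boxes(sizes, boxes, center=(100, 100), canvas=200))
# [(10, 10, 50, 50), (110, 10, 150, 50), (10, 110, 50, 150), (110, 110, 150, 150)]
```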
Our Results: Equalization Loss (EQL): 20.3 → 22.4.
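Equalization Loss, as we understand the cited CVPR 2020 paper, modifies sigmoid cross-entropy with a per-category weight:

w_j = 1 - E(r) T_λ(f_j) (1 - y_j)

- E(r) is 1 for foreground regions.
- T_λ(f_j) is 1 when category j's image frequency f_j is below a threshold λ.
- For foreground regions, the negative terms of tail categories are dropped.

The slides give no λ, so the value in the test is hypothetical.

```python
import math
from typing import Optional


def eql_loss(probs: list[float], label: Optional[int], cat_freq: list[float], lam: float) -> float:
    """Equalization loss for one region, given per-category sigmoid probabilities.

    label is the ground-truth category index, or None for a background region.
    w_j = 1 - E(r) * T_lam(f_j) * (1 - y_j): for foreground regions, the negative
    terms of tail categories (f_j < lam) are dropped.
    """
    is_foreground = label is not None  # E(r)
    loss = 0.0
    for j, p in enumerate(probs):
        y = 1 if j == label else 0
        tail = 1 if cat_freq[j] < lam else 0  # T_lam(f_j)
        w = 1 - int(is_foreground) * tail * (1 - y)
        p_hat = p if y else 1.0 - p
        loss -= w * math.log(max(p_hat, 1e-12))
    return loss
```

For a foreground region of a head category, the tail category's negative term contributes nothing. Background regions still push every classifier down.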
Our Results: Repeat Factor Sampling (RFS): 22.4 → 26.2.
Our Results: HTC without Semantic Branch: 26.2 → 28.8.
Our Results: ResNeSt101 + DCN + 400-1400 Multi-Scale Training (S101 & DCN): 28.8 → 32.0.
Our Results: Some Tricks: 32.0 → 33.2.
Make the sampling probability in Mosaic align with RFS; make rotated boxes align with rotated masks.
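The slide lists both tricks without detail. The sketch below follows our own interpretations:

- Box alignment: after rotation, each box is recomputed as the tight box of the rotated mask polygon. Rotating the old box's corners instead inflates the box, badly so for thin or diagonal objects.
- Sampling alignment: the extra Mosaic tiles are drawn with probability proportional to the image repeat factors (`sample_mosaic_partners` is a hypothetical helper).

```python
import math
import random

Point = tuple[float, float]


def rotate_points(points: list[Point], angle_deg: float, center: Point) -> list[Point]:
    """Rotate points about `center` by angle_deg degrees."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    cx, cy = center
    return [(cx + (x - cx) * cos_a - (y - cy) * sin_a,
             cy + (x - cx) * sin_a + (y - cy) * cos_a) for x, y in points]


def tight_box(points: list[Point]) -> tuple[float, float, float, float]:
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def box_from_rotated_corners(box, angle_deg, center):
    """Naive: rotate the 4 box corners and re-fit a box (usually too loose)."""
    x1, y1, x2, y2 = box
    return tight_box(rotate_points([(x1, y1), (x2, y1), (x2, y2), (x1, y2)], angle_deg, center))


def box_from_rotated_mask(polygon, angle_deg, center):
    """Aligned: rotate the mask polygon and take its tight box."""
    return tight_box(rotate_points(polygon, angle_deg, center))


def sample_mosaic_partners(repeat_factors, k=3, rng=random):
    """HYPOTHETICAL: draw k extra mosaic tiles with probability proportional to RFS factors."""
    return rng.choices(range(len(repeat_factors)), weights=repeat_factors, k=k)


diamond = [(5, 0), (10, 5), (5, 10), (0, 5)]  # its axis-aligned box is (0, 0, 10, 10)
naive = box_from_rotated_corners((0, 0, 10, 10), 45, (5, 5))
aligned = box_from_rotated_mask(diamond, 45, (5, 5))
print(round(naive[2] - naive[0], 2), round(aligned[2] - aligned[0], 2))  # 14.14 7.07
```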
Our Results: Self-Training: 33.2 → 33.7.
Our Results: Mask Scoring + Pseudo Ignore + ResNeSt269 (misc): 33.7 → 36.5.
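Mask scoring, as introduced in Mask Scoring R-CNN, rescales each detection's confidence so that it reflects mask quality. The final mask score is the classification score times the IoU predicted by a MaskIoU head. That head is trained to regress the IoU between the binarized predicted mask and its ground-truth mask. The sketch below shows only the target and the rescoring, not the head itself.

```python
def mask_iou(pred: list[list[int]], gt: list[list[int]]) -> float:
    """IoU between two binary masks (lists of rows of 0/1)."""
    inter = union = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            inter += p & g
            union += p | g
    return inter / union if union else 0.0


def mask_confidence(cls_score: float, predicted_mask_iou: float) -> float:
    """Mask Scoring R-CNN style score: classification score times predicted MaskIoU."""
    return cls_score * predicted_mask_iou


# Training target for the MaskIoU head: IoU of the binarized predicted mask with its GT mask.
target = mask_iou([[1, 1], [0, 0]], [[1, 0], [0, 0]])
print(target)                        # 0.5
print(mask_confidence(0.9, target))  # 0.45
```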
Our Results: Balanced Group Softmax: 36.5 → 37.6.
Our Results: High-Quality Mask: 37.6 → 38.8.
Our Results: Test-Time Augmentation (TTA): 38.8 → 41.5.
TTA: (1) multi-scale testing, (2) scale-aware inference, (3) revised Soft-NMS.
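TTA item (3) mentions a revised Soft-NMS, but the revision is not described. For reference, the sketch below implements standard Gaussian Soft-NMS (Bodla et al., 2017): overlapping detections are not discarded, and their scores are decayed by exp(-IoU^2 / sigma). The values sigma = 0.5 and score threshold 0.001 are common defaults, not values from the slides.

```python
import math


def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: decay overlapping scores by exp(-IoU^2 / sigma) instead of
    discarding them. Returns (index, final score) pairs in selection order."""
    scores = list(scores)
    remaining = list(range(len(boxes)))
    keep = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        keep.append((best, scores[best]))
        for i in remaining:
            scores[i] *= math.exp(-iou(boxes[best], boxes[i]) ** 2 / sigma)
        remaining = [i for i in remaining if scores[i] >= score_thresh]
    return keep


boxes = [(0, 0, 10, 10), (0, 0, 10, 10), (20, 20, 30, 30)]
keep = soft_nms(boxes, [0.9, 0.8, 0.7])
print([i for i, _ in keep])  # [0, 2, 1]: the duplicate survives with a decayed score
```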
Challenges in LVIS: Objects That Are Not Well Boxable
Examples: fire hose (mask AP 3.9), hose (mask AP 6.5).
Challenges in LVIS: Categories That Are Hard to Detect
Examples: hook (mask AP 7.3), stirrup (mask AP 1.2).
Challenges in LVIS: Inconsistent Annotations
Example: crib (Mask AP - BBox AP = -51.6).
LVIS Challenge 2020
Thank you