A Good Box is not a Guarantee of a Good Mask

LVIS Challenge 2020: A Good Box is not a Guarantee of a Good Mask
Jingru Tan (1), Gang Zhang (2), Hanming Deng (3), Changbao Wang (3), Lewei Lu (3), Quanquan Li (3)
(1) Tongji University, (2) Tsinghua University, (3) Sensetime Research

Overview
Introduction of LVIS: long tail distribution; high quality mask annotations
Training Pipeline: representation learning stage; fine-tuning stage
Our Results: improvements & tricks
Challenges in LVIS: inconsistent annotations; objects that are hard to represent with boxes

Overview
Introduction of LVIS: long tail distribution; high quality mask annotations

Introduction of LVIS: Long Tail Distribution
The classifier is heavily biased towards head categories.
Tail categories are hard to classify.

Introduction of LVIS: High Quality Mask Annotations
COCO: coarse polygon annotations
LVIS: precise polygon annotations

Overview
Introduction of LVIS: long tail distribution; high quality mask annotations
Training Pipeline: representation learning stage; fine-tuning stage

Training Pipeline
Representation learning: learn a universal representation.
Fine-tuning: balance the classifier (for the long-tail distribution) and pay more attention to mask prediction (for high quality masks).

Representation Learning
Equalization Loss (Equalization Loss for Long-tailed Object Recognition, CVPR 2020)
Repeat Factor Sampling (LVIS: A Dataset for Large Vocabulary Instance Segmentation, CVPR 2019)
Mosaic, Rotate, and Multi-Scale Training (YOLOv4: Optimal Speed and Accuracy of Object Detection, arXiv preprint)
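Repeat factor sampling oversamples images that contain rare categories. The following is a minimal sketch that assumes the formulation from the LVIS paper cited on this slide: a category repeat factor r(c) = max(1, sqrt(t / f(c))), where f(c) is the fraction of training images containing category c, and an image repeat factor equal to the largest r(c) among its categories. The function name, the default threshold t, and the toy data are illustrative assumptions, not settings taken from these slides.

```python
import math
from collections import Counter


def repeat_factors(image_categories, t=0.001):
    """Per-image repeat factors, following the LVIS paper's formulation.

    image_categories: list of sets, one set of category names per image.
    t: frequency threshold (illustrative default, not from the slides).
    """
    num_images = len(image_categories)
    # f(c): fraction of images that contain category c at least once.
    image_count = Counter(c for cats in image_categories for c in cats)
    freq = {c: n / num_images for c, n in image_count.items()}
    # r(c) = max(1, sqrt(t / f(c))): only categories rarer than t get repeated.
    cat_rf = {c: max(1.0, math.sqrt(t / f)) for c, f in freq.items()}
    # r(i) = max over the categories in image i; unlabeled images keep 1.0.
    return [max((cat_rf[c] for c in cats), default=1.0) for cats in image_categories]


# Toy data: "stirrup" appears in 1 of 4 images, "person" in all 4.
images = [{"person"}, {"person"}, {"person"}, {"person", "stirrup"}]
factors = repeat_factors(images, t=0.5)
```

With this toy threshold, only the image containing the rare category receives a repeat factor above 1, so it would be drawn more often per epoch.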

Representation Learning: Self-Training
To further improve model performance, pseudo labels are inferred on LVIS and on external datasets such as Open Images, then used for self-training.
During self-training, all proposals matched with pseudo boxes are ignored.
Setting             AP@Segm  AP@r  AP@c  AP@f  AP@BBox
Baseline            26.2     17.1  26.2  30.2  27.0
Open Images         26.8     17.5  27.2  30.5  28.1
Ignore LVIS pseudo  26.8     17.0  27.1  30.9  27.8
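One way to read the ignore rule is that proposals overlapping a pseudo box are excluded from the loss, so they count as neither positives nor negatives. The sketch below illustrates that reading with NumPy. The IoU threshold of 0.5, the function names, and the [x1, y1, x2, y2] box format are assumptions; the slides do not state how proposals are matched.

```python
import numpy as np


def pairwise_iou(a, b):
    """IoU between every box in a (N, 4) and b (M, 4), boxes as [x1, y1, x2, y2]."""
    a = np.asarray(a, dtype=float).reshape(-1, 4)
    b = np.asarray(b, dtype=float).reshape(-1, 4)
    top_left = np.maximum(a[:, None, :2], b[None, :, :2])
    bottom_right = np.minimum(a[:, None, 2:], b[None, :, 2:])
    wh = np.clip(bottom_right - top_left, 0.0, None)
    inter = wh[..., 0] * wh[..., 1]
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    union = area_a[:, None] + area_b[None, :] - inter
    return inter / np.maximum(union, 1e-12)


def ignore_mask(proposals, pseudo_boxes, iou_thr=0.5):
    """True for proposals to exclude from the loss (assumed matching rule)."""
    proposals = np.asarray(proposals, dtype=float).reshape(-1, 4)
    pseudo_boxes = np.asarray(pseudo_boxes, dtype=float).reshape(-1, 4)
    if len(pseudo_boxes) == 0:
        return np.zeros(len(proposals), dtype=bool)
    return pairwise_iou(proposals, pseudo_boxes).max(axis=1) >= iou_thr


proposals = [[0, 0, 10, 10], [50, 50, 60, 60]]
pseudo = [[1, 1, 10, 10]]
mask = ignore_mask(proposals, pseudo)  # first proposal overlaps the pseudo box
```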

Fine-tuning – BBox Head: Balanced Classifier
The classifier is heavily biased towards head categories.
Method: Balanced Group Softmax (Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax, CVPR 2020)

Fine-tuning – Mask Head: A Good Box is not a Guarantee of a Good Mask
Category          Frequency  BBox AP  Mask AP  Area Mask/BBox  Mask AP - BBox AP
coatrack          common     73.6     10.1     0.29            -63.5
tripod            frequent   40.9     3.8      0.22            -37.1
necklace          frequent   32.8     3.0      0.17            -29.8
ski pole          frequent   36.2     7.5      0.15            -28.7
fork              frequent   47.2     21.9     0.26            -25.3
windshield wiper  frequent   29.6     5.2      0.26            -24.4
giraffe           frequent   79.2     60.7     0.33            -18.5
Mask AP - BBox AP < -5.0: 270 of 1203 categories (~22%)
Large BBox, small mask (Mask AP - BBox AP < -5.0 and Area Mask/BBox < 0.5): 168 of 1203 categories
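The Area Mask/BBox column can be read as the mask's pixel area divided by the area of its tight bounding box. A minimal NumPy sketch of that ratio for a single binary mask follows. The function name is hypothetical, and the slides do not say how the ratio is aggregated per category, so that step is left out.

```python
import numpy as np


def mask_bbox_area_ratio(mask):
    """Mask pixel area divided by the area of the mask's tight bounding box."""
    mask = np.asarray(mask, dtype=bool)
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return 0.0  # empty mask: no box to compare against
    height = int(ys.max() - ys.min() + 1)
    width = int(xs.max() - xs.min() + 1)
    return float(mask.sum()) / float(height * width)


# A thin diagonal line (think ski pole) fills little of its box ...
thin = np.eye(10, dtype=bool)
ratio_thin = mask_bbox_area_ratio(thin)
# ... while a solid rectangle fills all of it.
solid = np.ones((4, 6), dtype=bool)
ratio_solid = mask_bbox_area_ratio(solid)
```

Thin or sparse objects such as the diagonal line produce small ratios, which is the regime where the table shows the largest gap between mask AP and box AP.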

Fine-tuning – Mask Head: A Good Box is not a Guarantee of a Good Mask
The smaller the mask/bbox ratio, the larger the gap between mask and bbox AP.

Fine-tuning – Mask Head: Some Examples
[Example images: tripod, giraffe, ski pole]

Fine-tuning – Mask Head: Solution for Categories with Small Ratio
New strategy for mask proposal assignment (Feature Pyramid Networks for Object Detection, CVPR 2017)
Setting       AP@Segm  AP@r  AP@c  AP@f  AP@BBox
Baseline      34.7     26.1  34.9  38.1  37.6
Ratio Assign  35.0     26.4  35.2  38.6  37.6
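The slide cites FPN, presumably for its default rule that assigns each RoI to a pyramid level from box size alone. The sketch below shows only that default rule from the FPN paper, k = floor(k0 + log2(sqrt(w * h) / 224)) with k0 = 4, clamped to the available levels. How Ratio Assign modifies it is not specified on the slides, so no ratio-aware variant is attempted here; the function name and the level range 2 to 5 are assumptions.

```python
import math


def fpn_level(w, h, k0=4, canonical=224, k_min=2, k_max=5):
    """Default FPN RoI-to-level assignment; depends only on box width and height."""
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    return min(k_max, max(k_min, k))


level_canonical = fpn_level(224, 224)  # canonical ImageNet-sized box
level_small = fpn_level(32, 32)        # clamped to the finest level
level_thin = fpn_level(40, 400)        # a tall, thin box such as a ski pole
```

Because the rule sees only the box, an object whose mask fills a small fraction of its box is pooled at the same level as a compact object with the same box area.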

Fine-tuning – Mask Head: Solution for Categories with Small Ratio
Balanced Mask Loss (BML): addresses foreground/background imbalance
Setting   AP@Segm  AP@r  AP@c  AP@f  AP@BBox
Baseline  34.7     26.1  34.9  38.1  37.6
+ BML     35.0     26.1  35.3  38.5  37.6
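The slides name the foreground/background imbalance but do not give the loss formula. As an illustration only, the sketch below uses one common way to balance a per-pixel binary cross-entropy: average the loss over foreground and background pixels separately, then weight the two averages equally. This is an assumed stand-in, not the authors' Balanced Mask Loss, and the function names are hypothetical.

```python
import numpy as np


def pixel_bce(prob, target, eps=1e-7):
    """Per-pixel binary cross-entropy for predicted foreground probabilities."""
    prob = np.clip(np.asarray(prob, dtype=float), eps, 1.0 - eps)
    target = np.asarray(target, dtype=bool)
    return -np.where(target, np.log(prob), np.log(1.0 - prob))


def balanced_bce(prob, target):
    """Average foreground and background losses separately, then weight equally."""
    target = np.asarray(target, dtype=bool)
    loss = pixel_bce(prob, target)
    fg = loss[target].mean() if target.any() else 0.0
    bg = loss[~target].mean() if (~target).any() else 0.0
    return 0.5 * (fg + bg)


# One foreground pixel among 99 background pixels, all predicted as 0.1.
target = np.zeros(100, dtype=bool)
target[0] = True
prob = np.full(100, 0.1)
plain = pixel_bce(prob, target).mean()   # dominated by easy background pixels
balanced = balanced_bce(prob, target)    # the missed foreground pixel counts for half
```

In the toy case the plain mean is small even though the only foreground pixel is missed, while the balanced version keeps that error visible.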

Fine-tuning – Mask Head with Ratio Assign & Balanced Mask Loss
Category          Frequency  Area Mask/BBox  AP Gap  AP Gap (Ours)  Improvement
coatrack          common     0.29            -63.5   -60.1          +3.4
tripod            frequent   0.22            -37.1   -33.1          +4.0
necklace          frequent   0.17            -29.8   -27.7          +2.1
ski pole          frequent   0.15            -28.7   -24.5          +4.2
fork              frequent   0.26            -25.3   -21.3          +4.0
windshield wiper  frequent   0.26            -24.4   -21.1          +3.3
giraffe           frequent   0.33            -18.5   -14.4          +4.1
However, this is still an open problem; we leave further research to future work.

Fine-tuning – Mask Head: Predicting High Quality Mask
Setting                    AP@Segm  AP@r  AP@c  AP@f  AP@BBox
Baseline                   34.7     26.1  34.9  38.1  37.6
+ Ratio Assign             35.0     26.2  35.2  38.5  37.6
+ Balanced Mask Loss       35.2     26.0  35.4  38.9  37.6
+ Boundary Supervision*    35.6     26.9  35.6  39.3  37.6
+ 7 Convs for Mask Head    35.8     26.8  35.9  39.6  37.6
+ Deformable RoI Pooling   36.1     28.8  35.8  39.8  38.3
* Boundary-preserving Mask R-CNN, ECCV 2020

Overview
Introduction of LVIS: long tail distribution; high quality mask annotations
Training Pipeline: representation learning stage; fine-tuning stage
Our Results: improvements & tricks

Our Results: Baseline
Chart: baseline 19.2

Our Results: Data Augmentation (Mosaic, Rotate)
Chart: baseline 19.2, data augmentation 20.3

Our Results: Equalization Loss
Chart: baseline 19.2, data augmentation 20.3, EQL 22.4

Our Results: Repeat Factor Sampling
Chart: baseline 19.2, data augmentation 20.3, EQL 22.4, RFS 26.2

Our Results: HTC w/o Semantic Branch
Chart: baseline 19.2, data augmentation 20.3, EQL 22.4, RFS 26.2, HTC 28.8

Our Results: ResNeSt101 + DCN + 400-1400 Multi-Scale Training
Chart: baseline 19.2, data augmentation 20.3, EQL 22.4, RFS 26.2, HTC 28.8, S101 & DCN 32.0

Our Results: Some Tricks
Make the sampling probability in mosaic align with RFS.
Make rotated boxes align with rotated masks.
Chart: baseline 19.2, data augmentation 20.3, EQL 22.4, RFS 26.2, HTC 28.8, S101 & DCN 32.0, tricks 33.2

Our Results: Self-Training
Chart: baseline 19.2, data augmentation 20.3, EQL 22.4, RFS 26.2, HTC 28.8, S101 & DCN 32.0, tricks 33.2, self training 33.7

Our Results: Mask Scoring + Pseudo Ignore + ResNeSt269
Chart: baseline 19.2, data augmentation 20.3, EQL 22.4, RFS 26.2, HTC 28.8, S101 & DCN 32.0, tricks 33.2, self training 33.7, mask scoring + pseudo ignore + S269 36.5

Our Results: Balanced Group Softmax
Chart: baseline 19.2, data augmentation 20.3, EQL 22.4, RFS 26.2, HTC 28.8, S101 & DCN 32.0, tricks 33.2, self training 33.7, mask scoring + pseudo ignore + S269 36.5, balanced group softmax 37.6

Our Results: High Quality Mask
Chart: baseline 19.2, data augmentation 20.3, EQL 22.4, RFS 26.2, HTC 28.8, S101 & DCN 32.0, tricks 33.2, self training 33.7, misc 36.5, balanced group softmax 37.6, high quality mask 38.8
misc: mask scoring, pseudo ignore, ResNeSt269

Our Results: Testing Time Augmentation
Chart: baseline 19.2, data augmentation 20.3, EQL 22.4, RFS 26.2, HTC 28.8, S101 & DCN 32.0, tricks 33.2, self training 33.7, misc 36.5, balanced group softmax 37.6, high quality mask 38.8, TTA 41.5
TTA: (1) multi-scale testing, (2) scale-aware inference, (3) revised Soft-NMS
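The chart values on the results slides can be turned into per-step gains with a few lines of arithmetic. The sketch below only re-uses the numbers and labels shown on the slides; the variable names are illustrative.

```python
# (label, score) pairs in the order the results slides introduce them.
# "misc" stands for mask scoring, pseudo ignore, and ResNeSt269.
steps = [
    ("baseline", 19.2),
    ("data augmentation", 20.3),
    ("EQL", 22.4),
    ("RFS", 26.2),
    ("HTC", 28.8),
    ("S101 & DCN", 32.0),
    ("tricks", 33.2),
    ("self training", 33.7),
    ("misc", 36.5),
    ("balanced group softmax", 37.6),
    ("high quality mask", 38.8),
    ("TTA", 41.5),
]

# Gain contributed by each step over the step before it.
gains = [
    (name, round(score - prev_score, 1))
    for (_, prev_score), (name, score) in zip(steps, steps[1:])
]
total_gain = round(steps[-1][1] - steps[0][1], 1)
largest_step = max(gains, key=lambda item: item[1])

for name, gain in gains:
    print(f"{name}: +{gain}")
print(f"total: +{total_gain}")
```

Read this way, repeat factor sampling is the largest single step (+3.8), and the full pipeline adds 22.3 points over the baseline.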

Overview
Introduction of LVIS: long tail distribution; high quality mask annotations
Training Pipeline: representation learning stage; fine-tuning stage
Our Results: improvements & tricks
Challenges in LVIS: inconsistent annotations; objects that are hard to represent with boxes

Challenges in LVIS: Not-well Boxable Objects
Fire hose (Mask AP 3.9)
Hose (Mask AP 6.5)

Challenges in LVIS: Categories that are Hard to Detect
Hook (Mask AP 7.3)
Stirrup (Mask AP 1.2)

Challenges in LVIS: Inconsistent Annotations
Crib (Mask AP - BBox AP = -51.6)

LVIS Challenge 2020
Thank you
