Seesaw Loss for Long-Tailed Instance Segmentation
Jiaqi Wang¹, Wenwei Zhang², Yuhang Zang², Yuhang Cao¹, Jiangmiao Pang³, Tao Gong⁴, Kai Chen¹, Ziwei Liu¹, Chen Change Loy², Dahua Lin¹
¹The Chinese University of Hong Kong  ²Nanyang Technological University  ³Zhejiang University  ⁴University of Science and Technology of China
Team: MMDet
Results
Comparison of our entry with the official baseline on LVIS v1 test-dev (mask AP): Baseline AP 26.8 vs. MMDet AP 38.9.
Results
Comparison of our entry with the official baseline on LVIS v1 test-dev, broken down by category frequency (mask AP):
APr: Baseline 19.0 vs. MMDet 32.0
APc: Baseline 25.2 vs. MMDet 37.0
APf: Baseline 29.5 vs. MMDet 45.4
Overview
1. We propose Seesaw Loss, which dynamically rebalances the penalty between different categories for long-tailed instance segmentation.
2. We propose HTC-Lite, a lightweight version of Hybrid Task Cascade (HTC).
Seesaw Loss
Existing object detectors struggle on long-tailed datasets, exhibiting unsatisfactory performance on rare classes. The reason is that the overwhelming number of training samples from frequent classes severely suppresses the classifier's confidence on rare classes.
Seesaw Loss
To tackle this problem, we propose Seesaw Loss for long-tailed instance segmentation.
• Dynamic: Seesaw Loss dynamically modifies the penalty according to the relative ratio of instance numbers between each pair of categories.
• Smooth: Seesaw Loss smoothly adjusts the punishment on rare classes when the training instances are positive samples of other, relatively frequent categories.
• Self-calibrated: It directly learns to balance the penalty on each category during training, without relying on a known dataset distribution or a specific data sampler.
Seesaw Loss
Seesaw Loss can be derived from the cross-entropy loss by rescaling the contribution of each negative label:

$$\mathcal{L}_{\text{seesaw}}(\mathbf{z}) = -\sum_{i=1}^{C} y_i \log(\widehat{\sigma}_i), \qquad \widehat{\sigma}_i = \frac{e^{z_i}}{\sum_{k \neq i} \mathcal{S}_{ik}\, e^{z_k} + e^{z_i}},$$

which reduces to the standard softmax cross-entropy loss when all $\mathcal{S}_{ik} = 1$.
Seesaw Loss
Seesaw Loss accumulates the number of training samples $O_i$ of each category during every training iteration. Given an instance with positive label $j$, for each other category $k$, Seesaw Loss dynamically adjusts the penalty on the negative label $k$ according to the ratio of $O_k$ to $O_j$:

$$\mathcal{S}_{jk} = \begin{cases} \left(\dfrac{O_k}{O_j}\right)^{q}, & O_j > O_k \\[2mm] 1, & \text{otherwise.} \end{cases}$$
Seesaw Loss
• When category $j$ is more frequent than category $k$, Seesaw Loss reduces the penalty on category $k$ by a factor of $(O_k / O_j)^{q}$ to protect category $k$.
• Otherwise, Seesaw Loss keeps the penalty on negative classes to reduce misclassification.
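To make the mechanism concrete, here is a minimal PyTorch sketch of the mitigation factor described above. The class and variable names are ours rather than the released implementation's, and it omits the normalized classifier and objectness branch discussed next:

```python
import torch
import torch.nn.functional as F

class SeesawCELoss(torch.nn.Module):
    """Minimal sketch: cross-entropy with the Seesaw mitigation factor S_jk."""

    def __init__(self, num_classes, q=0.8):  # q = 0.8 is an assumed default
        super().__init__()
        self.q = q
        # Accumulated sample counts O_i, updated at every iteration.
        self.register_buffer('counts', torch.ones(num_classes))

    def forward(self, logits, labels):
        # Accumulate O_i with the positive labels seen in this batch.
        self.counts += torch.bincount(labels, minlength=self.counts.numel()).float()
        # ratio[b, k] = O_k / O_j, where j is the positive label of sample b.
        ratio = self.counts[None, :] / self.counts[labels][:, None]
        # S_jk = (O_k / O_j)^q when O_j > O_k, else 1 (the positive label
        # itself has ratio 1, so it is left untouched).
        seesaw = torch.where(ratio < 1, ratio.pow(self.q), torch.ones_like(ratio))
        # Folding log(S_jk) into the logits reproduces
        # sigma_hat_i = exp(z_i) / (sum_{k != i} S_ik * exp(z_k) + exp(z_i)).
        return F.cross_entropy(logits + seesaw.log(), labels)

# Usage: loss = SeesawCELoss(num_classes=1203)(cls_logits, gt_labels)
```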
Seesaw Loss
Normalized Linear Layer: we adopt a normalized linear layer to predict the classification activations.
Objectness Score: we adopt an objectness branch that predicts objectness scores with a normalized linear layer and the cross-entropy loss.
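A normalized linear layer computes the cosine similarity between L2-normalized features and class weights, scaled by a temperature. A minimal sketch, where the temperature value and initialization are assumptions rather than the entry's exact settings:

```python
import torch
import torch.nn.functional as F

class NormedLinear(torch.nn.Module):
    """Normalized linear layer: scaled cosine similarity between
    L2-normalized features and L2-normalized class weights."""

    def __init__(self, in_features, num_classes, tau=20.0):  # tau is assumed
        super().__init__()
        self.weight = torch.nn.Parameter(torch.empty(num_classes, in_features))
        torch.nn.init.normal_(self.weight, std=0.01)
        self.tau = tau

    def forward(self, x):
        # Both the features and the per-class weights are unit-normalized,
        # so logits depend on direction only, not feature magnitude.
        return self.tau * F.linear(F.normalize(x, dim=1),
                                   F.normalize(self.weight, dim=1))
```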
Overview
1. We propose Seesaw Loss that dynamically rebalances the penalty between different categories for long-tailed instance segmentation.
2. We propose HTC-Lite, a lightweight version of Hybrid Task Cascade (HTC).
HTC-Lite
• Original HTC [architecture diagram: semantic segmentation branch; three box heads B1–B3 and three mask heads M1–M3, each pooling from the backbone features F]
HTC-Lite
• Reduce the number of mask heads [diagram: only the last mask head M3 is kept alongside the box heads B1–B3]
HTC-Lite
• Use a context encoding branch rather than a semantic segmentation head
• Does not rely on semantic segmentation annotations
[diagram: the context encoding branch replaces the semantic segmentation branch; box heads B1–B3 and mask head M3 remain]
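For intuition, below is a hypothetical sketch of what a context-encoding branch can look like; the structure is our assumption, not the exact HTC-Lite head. The idea it illustrates is the one stated above: a global context vector supervised with an image-level "which categories are present" signal, which needs no pixel-wise semantic annotation:

```python
import torch.nn as nn

class ContextEncodingBranch(nn.Module):
    """Hypothetical context-encoding branch: global context supervised by
    image-level category presence (multi-label BCE), so only the instance
    annotations already in the dataset are required."""

    def __init__(self, in_channels=256, num_classes=1203):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(nn.Linear(in_channels, in_channels),
                                nn.ReLU(inplace=True))
        self.presence = nn.Linear(in_channels, num_classes)

    def forward(self, feat):
        # feat: (N, C, H, W) backbone/FPN feature map.
        ctx = self.fc(self.pool(feat).flatten(1))   # (N, C) global context vector
        presence_logits = self.presence(ctx)        # image-level presence logits
        fused = feat + ctx[:, :, None, None]        # broadcast context back in
        return fused, presence_logits
```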
HTC-Lite
Comparison with HTC w/o semantic:

Dataset   Method              Bbox AP   Mask AP
COCO      HTC w/o semantic    41.5      36.7
COCO      HTC-Lite            42.5      37.8
LVIS v1   HTC w/o semantic    26.8      24.5
LVIS v1   HTC-Lite            27.2      25.2
Experiments
Training/Testing details
1. Training dataset: LVIS v1 training split. No extra data or annotations are used in our entry.
2. Training scales: long edge randomly sampled from 768–1792 pixels; random crop to 1280 × 1280.
3. Augmentation: InstaBoost.
4. Test-time augmentation: random flip; scales (1200, 1200), (1400, 1400), (1600, 1600), (1800, 1800), (2000, 2000).
(An illustrative pipeline sketch follows this list.)
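The training-side settings above roughly correspond to an MMDetection v2-style data pipeline. The snippet below is a sketch under that assumption; the transform arguments, in particular the scale encoding, are ours and not the team's released config:

```python
# Illustrative MMDetection v2-style training pipeline (arguments assumed).
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='InstaBoost'),                      # instance copy-paste augmentation
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    dict(type='Resize',                           # long edge sampled in [768, 1792]
         img_scale=[(768, 768), (1792, 1792)],
         multiscale_mode='range',
         keep_ratio=True),
    dict(type='RandomCrop', crop_size=(1280, 1280)),
    dict(type='RandomFlip', flip_ratio=0.5),
]
```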
Experiments
Model modifications
• Synchronized BN
• CARAFE
• HTC-Lite
• TSD
• Mask Scoring
• Better neck: FPG + CARAFE + DCNv2
• Better backbone: ResNeSt-200 + DCNv2
Experiments
Step-by-step gains in mask AP on LVIS v1 val:

Baseline                 18.7
+ SyncBN                 18.9  (+0.2)
+ CARAFE upsampling      19.4  (+0.5)
+ HTC-Lite               21.9  (+2.5)
+ TSD                    23.5  (+1.6)
+ Mask Scoring           23.9  (+0.4)
+ Training-time aug.     26.5  (+2.6)
+ FPG                    27.0  (+0.5)
+ ResNeSt-200 + DCNv2    29.9  (+2.9)
+ Seesaw Loss            36.8  (+6.9)
+ Finetune               37.3  (+0.5)
+ Test-time aug.         38.8  (+1.5)  → 38.92 on test-dev
Supported Methods
We recently released MMDetection v2.0 & MMDetection3D (GitHub: MMDet, MMDet3D).
Detectors: RPN, Fast/Faster R-CNN, R-FCN, Grid R-CNN, Libra R-CNN, Mask R-CNN, Dynamic R-CNN, Mask Scoring R-CNN, Double-Head R-CNN, Cascade R-CNN, Hybrid Task Cascade, DetectoRS, RetinaNet, SSD, FCOS, NAS-FCOS, FoveaBox, RepPoints, ATSS, FSAF, Guided Anchoring
Components & losses: GRoIE, HRNet, GCNet, NAS-FPN, PAFPN, PointRend, InstaBoost, Mixed Precision Training, CARAFE, DCN / DCNv2, Weight Standardization, Generalized Attention, Generalized Focal Loss, GHM, OHEM, DIoU, CIoU, BIoU
Thank you!