Decoupling Representation and Classifier for Long-Tailed Recognition
Bingyi Kang, Saining Xie, Marcus Rohrbach, Zhicheng Yan, Albert Gordo, Jiashi Feng, Yannis Kalantidis
Long-tailed classification

Problem statement
❏ Training set: long-tailed distribution (head vs. tail classes)
❏ Testing set: balanced distribution
❏ Evaluation: three splits based on class cardinality (many-/medium-/few-shot)

Existing methods
❏ Rebalancing the data: up-sampling tail classes / down-sampling head classes.
❏ Rebalancing the loss: assigning larger weights to tail classes and smaller weights to head classes, e.g., CB-Focal [1], LDAM [2]. (See the sketch below.)

[1] Cui, Yin, et al. "Class-balanced loss based on effective number of samples." CVPR 2019.
[2] Cao, Kaidi, et al. "Learning imbalanced datasets with label-distribution-aware margin loss." NeurIPS 2019.
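To make the loss-rebalancing idea concrete, here is a minimal PyTorch sketch (our illustration, not the authors' code) of the class-balanced weighting from [1]: each class is weighted by the inverse of its "effective number" of samples. The class counts below are hypothetical.

```python
import torch
import torch.nn.functional as F

def class_balanced_weights(samples_per_class, beta=0.9999):
    """Weights proportional to 1 / effective number, normalized to sum to C."""
    counts = torch.as_tensor(samples_per_class, dtype=torch.float)
    effective_num = (1.0 - beta ** counts) / (1.0 - beta)
    weights = 1.0 / effective_num
    return weights * len(counts) / weights.sum()

# Usage: pass the weights to a standard cross-entropy loss.
counts = [5000, 1200, 300, 40, 8]            # hypothetical long-tailed class sizes
weights = class_balanced_weights(counts)
logits, targets = torch.randn(4, 5), torch.tensor([0, 2, 3, 4])
loss = F.cross_entropy(logits, targets, weight=weights)
```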
The problem behind long-tail

Final classification performance depends on two factors:
● Representation quality
● Classifier quality

NOTE: such observations are drawn empirically!
Notations
● Feature representation: $f(x; \theta) = z$
● Linear classifier: $g(z) = W^\top z + b$
● Final prediction: $\hat{y} = \arg\max_j g(z)_j$
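A minimal PyTorch sketch of this decomposition; the toy backbone and the dimensions are placeholders, not the paper's architecture:

```python
import torch
import torch.nn as nn

feat_dim, num_classes = 2048, 1000           # e.g. ResNeXt-50 features, ImageNet_LT

backbone = nn.Sequential(                    # stand-in for f(x; θ); any CNN works
    nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
classifier = nn.Linear(feat_dim, num_classes)  # g(z) = Wᵀ z + b

x = torch.randn(8, 3, 32, 32)
z = backbone(x)                              # z = f(x; θ)
logits = classifier(z)                       # g(z)
y_hat = logits.argmax(dim=1)                 # ŷ = argmax_j g(z)_j
```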
What is the problem with the classifier?

[Figure: per-class classifier weight norms plotted against the dataset distribution; ImageNet_LT, ResNeXt-50, jointly learned classifier]

After joint training with instance-balanced sampling:
● the norms of the classifier weights are correlated with the sizes of the classes;
● tail classes get small weight norms, hence small confidence scores and poor performance.
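A sketch of the diagnostic behind this slide (assumed setup, not the paper's code): compute each class's weight norm and check its rank correlation with class size. In practice `classifier` is the head of a jointly trained model and `samples_per_class` holds the real training counts; with the random initialization below the correlation is near zero.

```python
import torch
import torch.nn as nn

feat_dim, num_classes = 2048, 1000
classifier = nn.Linear(feat_dim, num_classes)   # stand-in for a jointly trained head
samples_per_class = torch.sort(                 # dummy long-tailed class counts
    torch.randint(5, 1280, (num_classes,)), descending=True).values.float()

norms = classifier.weight.detach().norm(dim=1)  # ‖w_j‖ for each class j
rank = lambda t: t.argsort().argsort().float()  # ranks, for Spearman correlation
spearman = torch.corrcoef(torch.stack([rank(norms), rank(samples_per_class)]))[0, 1]
print(f"Spearman(‖w_j‖, class size) = {spearman:.3f}")  # strongly positive after joint training
```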
How to improve the classifier? -- Three ways

KEY: break the correlation between weight norm and class size.

I. Classifier Retraining (cRT)
❏ Freeze the representation.
❏ Retrain the linear classifier with class-balanced sampling.

II. Tau-Normalization ($\tau$-Norm)
❏ Adjust the classifier weight norms directly: $\tilde{w}_j = w_j / \|w_j\|^\tau$.
❏ $\tau$ is the "temperature" of the normalization.

III. Learnable Weight Scaling (LWS)
❏ Keep the weights fixed and tune the scale of each weight vector through learning.

(A condensed sketch of all three follows below.)
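A condensed PyTorch sketch of the three fixes, a sketch under assumed names (`backbone`, `classifier`, `feat_dim`, `num_classes` as in the notation slide; τ = 0.7 is illustrative). See the released code for the exact recipes.

```python
import torch
import torch.nn as nn

feat_dim, num_classes = 2048, 1000
backbone = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())  # f(x; θ)
classifier = nn.Linear(feat_dim, num_classes)                               # jointly trained head

# I. cRT: freeze the representation, re-initialize and retrain the classifier
# with class-balanced sampling (e.g. torch.utils.data.WeightedRandomSampler
# with per-sample weight 1 / count[label]).
for p in backbone.parameters():
    p.requires_grad = False
classifier = nn.Linear(feat_dim, num_classes)       # fresh head, then retrain

# II. τ-normalization: no retraining; rescale each weight vector by its own
# norm raised to the power τ ∈ (0, 1].
def tau_normalize(weight, tau):
    return weight / weight.norm(dim=1, keepdim=True).pow(tau)

with torch.no_grad():
    classifier.weight.copy_(tau_normalize(classifier.weight, tau=0.7))

# III. LWS: keep W fixed and learn one scale per class (the only trainable
# tensor), again optimized with class-balanced sampling.
scales = nn.Parameter(torch.ones(num_classes))
def lws_logits(z):
    return scales * classifier(z)                   # per-class rescaled logits
```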
Classifier Rebalancing
● Without classifier rebalancing (i.e., joint training), progressively-balanced sampling works best.
● When instance-balanced sampling is used and the classifier is then rebalanced, medium-shot and few-shot performance increase significantly and achieve the best results.
How Does Classifier Rebalancing Work?
● Larger weights ==> wider classification cone
● Un-normalized weights ==> unbalanced decision boundaries
● Classifier rebalancing ==> more balanced decision boundaries
(A toy numerical illustration follows below.)
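A toy numerical illustration (ours, not from the paper) of the "wider cone" intuition: with three unit-norm class weights 120° apart in 2-D, scaling one weight vector up enlarges the fraction of feature directions whose argmax is that class.

```python
import torch

# Three unit-norm class weights, 120° apart in 2-D.
ang = torch.tensor([0.0, 2 * torch.pi / 3, 4 * torch.pi / 3])
W = torch.stack([ang.cos(), ang.sin()], dim=1)          # (3, 2)

angles = torch.linspace(0, 2 * torch.pi, 10000)
z = torch.stack([angles.cos(), angles.sin()], dim=1)    # unit features, (N, 2)

for s in (1.0, 2.0, 4.0):
    Ws = W.clone()
    Ws[0] *= s                                          # grow class 0's weight norm
    frac = (z @ Ws.T).argmax(dim=1).eq(0).float().mean()
    print(f"‖w0‖ scaled by {s:.0f}: class 0 claims {frac:.2f} of directions")
# With s = 1 each class claims ~1/3 of the circle; larger s widens class 0's cone.
```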
Can we fine-tune both trunk and classifier?
● The best performance is achieved when only the classifier is retrained and the backbone model is kept fixed.
Experiments

Datasets
I. ImageNet_LT
❏ Constructed from ImageNet 2012
❏ 1000 categories, 115.8k images
II. iNaturalist 2018
❏ Naturally long-tailed; only species-level categories are used.
❏ 8142 categories, 437.5k images
III. Places_LT
❏ Constructed from Places365
❏ 365 classes
Experiments: ImageNet_LT results
➢ From joint training to cRT/tau-norm/LWS: little sacrifice on many-shot accuracy.
➢ Improvement on medium-shot: ~10 points; few-shot: 20+ points.
➢ A new SOTA can be achieved.
Experiments: iNaturalist 2018 results
➢ From joint training to cRT/tau-norm: little sacrifice on head classes, large gains on tail classes.
➢ Once the representation is sufficiently trained, a new SOTA can easily be obtained.
* Notation: 90 epochs / 200 epochs
Take home messages
❏ For solving the long-tailed recognition problem, representation and classifier should be considered separately.
❏ Our methods achieve performance gains by finding a better trade-off (currently the best one) between head and tail classes.
❏ Future research might focus more on improving representation quality.

Code is available! https://github.com/facebookresearch/classifier-balancing