Learning from Fine-Grained and Long-Tailed Visual Data - Yin Cui (PowerPoint Presentation)


  1. Learning from Fine-Grained and Long-Tailed Visual Data. Yin Cui, Google Research. Dec 11, 2019.

  2. Visual Recognition System: Database → Supervised Learning → Convolutional Neural Network (CNN) → “bird”

  3. Visual Recognition System: Larger Database (more images, more classes) → Supervised Learning → Convolutional Neural Network (CNN) → “Northern cardinal”

  4. Visual Recognition System: Even Larger Database → Supervised Learning → Convolutional Neural Network (CNN) → “Northern cardinal”

  5. Problems occur... ● Long-tailed: the majority of categories are rare. ● Hard to get labels: labeling effort grows dramatically per image and requires human expertise. (Diagram: Even Larger Database → Supervised Learning → Convolutional Neural Network (CNN) → “Northern cardinal”)

  6. In reality... Medium-sized Database → Supervised Learning → Convolutional Neural Network (CNN) → “bird”

  7. But luckily, we have transfer learning: Large-Scale Dataset (transfer learning) + Medium-sized Database (supervised learning) → Convolutional Neural Network (CNN) → “Northern cardinal”
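To make the fine-tuning step concrete, here is a minimal sketch of the recipe the diagram implies, assuming a PyTorch/torchvision setup (the model choice, learning rate, and num_classes are illustrative placeholders, not values from the talk):

```python
# Hypothetical fine-tuning sketch (PyTorch / torchvision assumed; not the talk's own code).
import torch
import torch.nn as nn
import torchvision

num_classes = 200  # placeholder, e.g. CUB-200 as the target dataset

# 1. Start from a CNN pre-trained on a large-scale source dataset (ImageNet here).
model = torchvision.models.resnet50(pretrained=True)

# 2. Replace the source classifier head with one sized for the target classes.
model.fc = nn.Linear(model.fc.in_features, num_classes)

# 3. Fine-tune on the medium-sized target dataset with a small learning rate,
#    so the pre-trained features are adapted rather than overwritten.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```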

  8. A diverse array of data sources: social networks, search engines, communities

  9. Can we build a generic, one-size-fits-all pre-trained model for transfer learning?

  10. Large-scale pre-training C. Sun et al. Revisiting Unreasonable Effectiveness of Data in Deep Learning Era. ICCV 2017. D. Mahajan et al. Exploring the Limits of Weakly Supervised Pretraining. ECCV 2018.

  11. Generic vs. Specialized Model ● ImageNet pre-training vs. iNaturalist pre-training. ● iNaturalist 2017 contains 859k images from 5000+ natural categories. ● Fine-tuned on 7 medium-sized datasets (top-1 accuracy, %):
                      CUB-200   Stanford Dogs   Flowers-102   Stanford Cars   Aircraft   Food-101   NA-Birds
      ImageNet        82.84     84.19           96.26         91.31           85.49      88.65      82.01
      iNat            89.26     78.46           97.64         88.31           82.61      88.80      87.91

  12. Generic vs. Specialized Model ● ImageNet pre-training vs. iNaturalist pre-training. ● iNaturalist 2017 contains 859k images from 5000+ natural categories. ● Fine-tuned on 7 medium-sized datasets (top-1 accuracy, %). ● Combining ImageNet + iNat: more data doesn’t always help.
                      CUB-200   Stanford Dogs   Flowers-102   Stanford Cars   Aircraft   Food-101   NA-Birds
      ImageNet        82.84     84.19           96.26         91.31           85.49      88.65      82.01
      iNat            89.26     78.46           97.64         88.31           82.61      88.80      87.91
      ImageNet + iNat 85.84     82.36           97.07         91.38           85.21      88.45      83.98

  13. Model Capacity is not a problem ● Combined training achieves similar performance on each dataset. ● The model is able to learn well on both datasets, but does not transfer well. ○ Trade-off between quantity and quality in transfer learning. ○ Pre-training a more specialized model could help.

  14. Domain similarity via Earth Mover’s Distance ● Red: source domain. Green: target domain. Cui et al. Large Scale Fine-Grained Categorization and Domain-Specific Transfer Learning. CVPR 2018.
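A rough sketch of how such a domain-similarity score can be computed (my own illustration, not the paper's released code): represent each class in the source and target domains by a feature centroid weighted by its image count, solve the transport problem between the two weighted point sets with a generic LP solver, and map the EMD d to a similarity exp(-gamma * d). The feature extractor, the Euclidean ground distance, and gamma = 0.01 are assumptions here.

```python
# Illustrative EMD-based domain similarity (the paper's pipeline first extracts
# per-class feature centroids with a pre-trained CNN; random centroids used below).
import numpy as np
from scipy.optimize import linprog

def domain_similarity(src_feats, src_counts, tgt_feats, tgt_counts, gamma=0.01):
    """src_feats: (m, d) class centroids of the source domain, src_counts: (m,) image counts.
    tgt_feats: (n, d), tgt_counts: (n,) for the target domain."""
    w_src = np.asarray(src_counts, dtype=float)
    w_src = w_src / w_src.sum()
    w_tgt = np.asarray(tgt_counts, dtype=float)
    w_tgt = w_tgt / w_tgt.sum()
    m, n = len(w_src), len(w_tgt)

    # Ground distance: Euclidean distance between class centroids.
    cost = np.linalg.norm(src_feats[:, None, :] - tgt_feats[None, :, :], axis=-1)

    # Optimal transport as a linear program over the flow matrix f (row-major flattened).
    A_eq = np.zeros((m + n, m * n))
    for i in range(m):
        A_eq[i, i * n:(i + 1) * n] = 1.0   # mass leaving source class i
    for j in range(n):
        A_eq[m + j, j::n] = 1.0            # mass arriving at target class j
    b_eq = np.concatenate([w_src, w_tgt])
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")

    emd = res.fun                          # total transport cost
    return np.exp(-gamma * emd)            # higher = more similar domains

# Toy usage with random centroids (in practice: penultimate-layer CNN features).
rng = np.random.default_rng(0)
sim = domain_similarity(rng.normal(size=(5, 8)), [100, 50, 30, 20, 10],
                        rng.normal(size=(3, 8)), [40, 40, 20])
print(sim)
```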

  15. Source domain selection ● Greedy selection strategy: sort source classes by similarity to the target domain and include the most similar ones. ○ Simple, with no guarantee of optimality, but works well in practice. Cui et al. Large Scale Fine-Grained Categorization and Domain-Specific Transfer Learning. CVPR 2018.
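The greedy strategy could look roughly like this (an illustrative sketch; the per-class similarity scores are assumed to come from an EMD-style computation like the one above):

```python
# Illustrative greedy source-class selection (not the paper's exact code).
def select_source_classes(class_similarities, k):
    """class_similarities: dict mapping each source class to its similarity
    to the target domain. Returns the k most similar source classes."""
    ranked = sorted(class_similarities, key=class_similarities.get, reverse=True)
    return ranked[:k]

# Example: keep the 2 source classes closest to the target domain.
scores = {"cardinal": 0.93, "sports_car": 0.41, "warbler": 0.88, "teapot": 0.12}
print(select_source_classes(scores, k=2))   # ['cardinal', 'warbler']
```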

  16. Improved Transfer Learning ● Comparable to the best of ImageNet and iNat with only a selected subset of 585 source classes (top-1 accuracy, %).
                        CUB-200   Stanford Dogs   Flowers-102   Stanford Cars   Aircraft   Food-101   NA-Birds
      ImageNet          82.84     84.19           96.26         91.31           85.49      88.65      82.01
      iNat              89.26     78.46           97.64         88.31           82.61      88.80      87.91
      ImageNet + iNat   85.84     82.36           97.07         91.38           85.21      88.45      83.98
      Ours (585-class)  88.76     85.23           97.37         90.58           86.13      88.37      87.89

  17. Transfer Learning via Fine-tuning ● Transfer learning performance can be estimated by domain similarity.

  18. Discussion ● In the AutoML setting: ○ We need a model that performs well on a small, usually domain-specific dataset. ○ We have access to large datasets and pre-trained models. ○ The problem cannot be solved by pre-training on a single large source domain. ● Architecture search is one solution. ● Another solution comes from the perspective of source domain selection: ○ Keep a model zoo with models trained on different datasets. ○ Select a source domain / pre-trained model based on domain similarity (sketched below).
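For the model-zoo idea in the last bullet, a hypothetical selection loop might look like this (domain_similarity is the function sketched earlier; the zoo entries are placeholders):

```python
# Hypothetical source-selection loop over a model zoo.
def pick_pretrained_model(model_zoo, target_feats, target_counts):
    """model_zoo: list of (name, src_feats, src_counts) describing the source
    domain of each pre-trained model. Returns the name of the most similar one."""
    best_name, best_sim = None, -1.0
    for name, src_feats, src_counts in model_zoo:
        sim = domain_similarity(src_feats, src_counts, target_feats, target_counts)
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name
```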

  19. Dealing with long-tailed data distribution

  20. The World is Long-Tailed ● A large number of classes are rare in nature. ● Cannot easily scale the data collection for those classes in the long tail. Cui et al. Class-Balanced Loss Based on Effective Number of Samples. CVPR 2019.

  21. Overview ● Effective Number of Samples: E_n = (1 − β^n) / (1 − β), where n is the number of samples and β = (N − 1) / N. ● Class-Balanced Loss: CB(p, y) = (1 / E_{n_y}) · L(p, y) = ((1 − β) / (1 − β^{n_y})) · L(p, y), where n_y is the number of samples in class y.
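The per-class weights implied by these formulas are straightforward to compute; here is a small numpy sketch mirroring them (my own code; samples_per_class and beta = 0.9999 are illustrative, and the re-normalization so that the weights sum to the number of classes is one common convention):

```python
# Class-balanced weights from the effective number of samples.
import numpy as np

def class_balanced_weights(samples_per_class, beta=0.9999):
    """w_y = (1 - beta) / (1 - beta**n_y), re-normalized so the weights sum to the
    number of classes. Beta close to 1 up-weights rare classes more aggressively."""
    n = np.asarray(samples_per_class, dtype=float)
    effective_num = (1.0 - np.power(beta, n)) / (1.0 - beta)
    weights = 1.0 / effective_num
    return weights * len(n) / weights.sum()

# Long-tailed example: 5000 images for the head class, 10 for the tail class.
print(class_balanced_weights([5000, 500, 50, 10]))
```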

  22. The more data, the better, but... ● As the number of samples increases, the marginal benefit a model can extract from the data diminishes. image courtesy: https://me.me/i/ate-too-much-regrets-nothing-5869266

  23. Data Sampling as Random Covering ● To measure data overlap, we associate each sample with a small region of unit volume instead of a point, and assume the volume of all possible data is N.

  24. Theoretical Results
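The proposition behind this slide can be reconstructed from the paper (Cui et al., CVPR 2019); a brief LaTeX sketch of the statement and the induction step, under the random-covering setup of the previous slide:

```latex
% Reconstructed from Cui et al., CVPR 2019. Setup: each sample covers a region of
% unit volume inside a data space of total volume N; E_n is the expected covered
% volume (the "effective number" of samples) after n samples.
\begin{align*}
  E_1 &= 1, \qquad p = \frac{E_{n-1}}{N}
        \;\;\text{(probability that the $n$-th sample overlaps previously covered data)},\\
  E_n &= p\,E_{n-1} + (1 - p)\bigl(E_{n-1} + 1\bigr)
        = 1 + \beta\,E_{n-1}, \qquad \beta = \frac{N-1}{N},\\
  E_n &= 1 + \beta\,\frac{1 - \beta^{\,n-1}}{1 - \beta}
        = \frac{1 - \beta^{\,n}}{1 - \beta} \quad\text{(by induction on $n$)}.
\end{align*}
% Limiting cases: E_n -> n as N -> infinity (samples never overlap),
% and E_n -> 1 as N -> 1 (all samples overlap).
```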

  25. Class-Balanced Loss ● Class-Balanced Softmax Cross-Entropy Loss: ● Class-Balanced Sigmoid Cross-Entropy Loss: ● Class-Balanced Focal Loss:
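A compact PyTorch-flavoured sketch of the three variants listed here (my own paraphrase; the official TensorFlow implementation is linked on the discussion slide). class_weights is a float tensor of per-class weights as computed in the earlier sketch, and gamma = 2.0 for the focal loss is an illustrative default:

```python
# Illustrative class-balanced losses in PyTorch (weights w_y = (1 - beta)/(1 - beta**n_y)).
import torch
import torch.nn.functional as F

def cb_softmax_ce(logits, targets, class_weights):
    # Standard softmax cross-entropy, re-weighted by the ground-truth class weight.
    return F.cross_entropy(logits, targets, weight=class_weights)

def cb_sigmoid_ce(logits, targets, class_weights):
    # One-vs-all sigmoid cross-entropy; each sample's terms are scaled by w_{y_i}.
    onehot = F.one_hot(targets, logits.size(1)).float()
    per_elem = F.binary_cross_entropy_with_logits(logits, onehot, reduction="none")
    return (class_weights[targets].unsqueeze(1) * per_elem).sum(dim=1).mean()

def cb_focal(logits, targets, class_weights, gamma=2.0):
    # Focal loss with the same class-balanced weighting.
    onehot = F.one_hot(targets, logits.size(1)).float()
    bce = F.binary_cross_entropy_with_logits(logits, onehot, reduction="none")
    p_t = torch.exp(-bce)                    # probability assigned to the correct 0/1 label
    focal = (1.0 - p_t) ** gamma * bce
    return (class_weights[targets].unsqueeze(1) * focal).sum(dim=1).mean()
```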

  27. Datasets

  28. Classification Error Rate of ResNet-32 on CIFAR ● Original Losses and best class-balanced loss. ● SM: Softmax; SGM: Sigmoid.

  29. Analysis

  30. Classification Error Rate on ImageNet and iNat

  31. Classification Error Rate on ImageNet and iNat

  32. ResNet-50 Training Curves on ImageNet and iNat

  33. ResNet-50 Training Curves on iNat and ImageNet

  34. Discussion ● The concept of the effective number of samples for long-tailed data distributions. ● A theoretical framework to quantify the effective number of samples. ○ Model each example as a small region instead of a point. ● Class-balanced loss. ● Improved performance on 3 commonly used loss functions. ● Non-parametric: we do not assume a particular distribution of the data. ● Code available at: https://github.com/richardaecn/class-balanced-loss ● Two follow-up works: ○ K. Cao et al. Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss. NeurIPS 2019. ○ B. Kang et al. Decoupling Representation and Classifier for Long-Tailed Recognition. https://arxiv.org/abs/1910.09217.

  35. Thanks!
