

  1. CS839 Special Topics in AI: Deep Learning. Learning with Less Supervision. Sharon Yixuan Li, University of Wisconsin-Madison. October 29, 2020

  2. Overview
  • Weakly Supervised Learning
    • Flickr100M
    • JFT300M (Google)
    • Instagram3B (Facebook)
  • Data augmentation
    • Human heuristics
    • Automated data augmentation
  • Self-supervised Learning
    • Pretext tasks (rotation, patches, colorization, etc.)
    • Invariant vs. covariant learning
    • Contrastive learning based framework (current SoTA)

  3. Part I: Weakly Supervised Learning

  4. Model Complexity Keeps Increasing. [Figure: LeNet (LeCun et al. 1998) vs. ResNet (He et al. 2016); modern models have >100 million parameters.]

  5. [Sun et al. 2017]

  6. Challenge: Limited labeled data. ImageNet has ~1M images (~thousands of annotation hours); scaling to 1B images (1000x) would take ~millions of annotation hours. [Deng et al. 2009]

  7. Training at Scale: levels of supervision, from fully supervised (ImageNet) to weakly supervised (Instagram/Flickr hashtags, e.g. #cat) to unsupervised (crawled web images).

  8. Training at Scale: hashtag supervision is noisy data, with non-visual labels (e.g. #love), incorrect labels, and missing labels (example hashtags: #cat, #dog, #husky).

  9. Flickr 100M [Joulin et al. 2015]

  10. JFT 300M [Sun et al. 2017]

  11. Can we use billions of images with hashtags for pre-training? [Mahajan et al. 2018]

  12. Hashtag Selection: 1.5K hashtags (synonyms of ImageNet labels), ~1B images; 17K hashtags (synonyms of nouns in WordNet), ~3B images. [Mahajan et al. 2018]

  13. Network Architecture and Capacity: ResNeXt-101 32xCd. [Charts: number of parameters and FLOPs as the width C varies over {4, 8, 16, 32, 48}.] [Xie et al. 2016]

  14. Largest Weakly Supervised Training: 3.5B public Instagram images, 17K unique labels, a large-capacity model (ResNeXt-101 32x48), distributed training on 350 GPUs; 85.1% ImageNet top-1 accuracy. [Mahajan et al. 2018]

  15. Results

  16. Transfer Learning Performance. Target task: ImageNet. * With a bigger model, we even got 85.4% top-1 accuracy on ImageNet-1K.

  17. Transfer Learning Performance. Target task: ImageNet. * With a bigger model, we even got 85.4% top-1 accuracy on ImageNet-1K.

  18. Transfer Learning Performance. Target task: ImageNet. * With a bigger model, we even got 85.4% top-1 accuracy on ImageNet-1K.

  19. Transfer Learning Performance. Target tasks: ImageNet; CUB-2011 & Places-365. * With a bigger model, we even got 85.4% top-1 accuracy on ImageNet-1K.
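
The transfer results above follow the usual recipe: pretrain a large backbone on weak hashtag supervision, then fine-tune it (or train only a new classifier head) on the target task. Below is a minimal sketch of that recipe, using torchvision's ImageNet-pretrained ResNeXt-101 32x8d as a stand-in for the Instagram-pretrained weights; the 200-class target task and the hyperparameters are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn
from torchvision import models

# Sketch: transfer a pretrained backbone to a new target task.
# torchvision's ImageNet-pretrained ResNeXt-101 32x8d stands in for the
# Instagram-pretrained models; all hyperparameters are illustrative.
backbone = models.resnext101_32x8d(pretrained=True)

num_target_classes = 200  # hypothetical fine-grained target task (e.g. birds)
backbone.fc = nn.Linear(backbone.fc.in_features, num_target_classes)

# Option 1: full fine-tuning (update all weights at a small learning rate).
optimizer = torch.optim.SGD(backbone.parameters(), lr=1e-3, momentum=0.9)

# Option 2: linear probe -- freeze the features, train only the new head.
for name, p in backbone.named_parameters():
    p.requires_grad = name.startswith("fc")
```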

  20. Models are surprisingly robust to label "noise". Dataset: IG-1B-17k. Network: ResNeXt-101 32x16.

  21. Effect of Model Capacity. Matching hashtags to the target task helps (1.5K tags). Target task: ImageNet-1K.

  22. BiT: Big Transfer [Kolesnikov et al. 2020]

  23. Part II: Data Augmentation

  24. Data Augmentation “Quokka” Figure credit: https://github.com/aleju/imgaug

  25. Data Augmentation: load image and label ("cat"), then feed the data to a CNN.

  26. Data Augmentation: apply a transformation function (TF) to the loaded image before feeding it to the CNN.

  27. Data Augmentation. A transformation function (TF): changes the pixels without changing the label; training on transformed data improves generalization; very widely used.

  28. Examples of Transformation Functions (TFs): original image, color jitter, horizontal flip, random crop.
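
A minimal sketch of these example TFs as a torchvision pipeline; the parameter values are illustrative, not taken from any particular paper.

```python
from torchvision import transforms

# The three example TFs from the slide, composed into one training-time pipeline.
# Parameter values are illustrative.
train_tfs = transforms.Compose([
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),  # color jitter
    transforms.RandomHorizontalFlip(p=0.5),                                # horizontal flip
    transforms.RandomResizedCrop(224),                                     # random crop
    transforms.ToTensor(),
])

# Applying train_tfs to a PIL image yields a new, randomly transformed tensor
# each time, while the label is left unchanged.
```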

  29. Heuristic Data Augmentation: a human expert specifies TF sequences (TF_1, ..., TF_L, e.g. rotation, flip) that map data to augmented data.

  30. Heuristic Data Augmentation: how can we automatically learn the compositions and parameterizations of TFs, instead of relying on a human expert?

  31. TANDA: Transformation Adversarial Networks for Data Augmentations. A generator (LSTM) produces TF sequences (TF_1, ..., TF_L, e.g. rotation, flip) that map data to augmented data. [Ratner et al. 2017]

  32. TANDA: Transformation Adversarial Networks for Data Augmentations. The generator (LSTM) produces TF sequences mapping data to augmented data; a discriminator judges whether an example is real or augmented. [Ratner et al. 2017]

  33. TANDA results: improvements over heuristic augmentation of +2.1% on CIFAR-10, +1.4 on ACE (F1 score), and +3.4% on medical imaging; generated MNIST samples shown. [Ratner et al. 2017]

  34. AutoAugment [Cubuk et al. 2018]

  35. AutoAugment: a controller (RNN) produces the TF sequences (TF_1, ..., TF_L, e.g. rotation, flip) applied to the data; discriminator: real or augmented? [Cubuk et al. 2018]

  36. AutoAugment: a controller (RNN) produces TF sequences; the augmented data trains an end model, and the model's validation accuracy R is the reward for the controller. State-of-the-art performance on various benchmarks, but the computational cost is very high. [Cubuk et al. 2018]

  37. RandAugment: starts from the AutoAugment setup (controller RNN producing TF sequences, end model, validation accuracy R). [Cubuk et al. 2019]

  38. RandAugment: (1) random sampling over the transformation functions; (2) grid search over the parameters of each transformation. Outperforms AutoAugment. Each TF in the applied sequence TF_1, ..., TF_L is randomly sampled. [Cubuk et al. 2019]
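
A minimal sketch of the RandAugment idea behind these two points: sample N transformations uniformly at random from a fixed pool and apply each at a shared magnitude M, with N and M found by a small grid search. The operation list and magnitude scaling below are a simplified, illustrative subset, not the full set from the paper.

```python
import random
from PIL import Image, ImageEnhance, ImageOps

# Simplified RandAugment-style policy: uniformly sample N ops from a fixed pool
# and apply each at a shared magnitude M in [0, 10]. Illustrative ops only.
OPS = {
    "rotate":   lambda img, m: img.rotate(3.0 * m),
    "invert":   lambda img, m: ImageOps.invert(img),
    "contrast": lambda img, m: ImageEnhance.Contrast(img).enhance(1.0 + 0.09 * m),
    "sharpen":  lambda img, m: ImageEnhance.Sharpness(img).enhance(1.0 + 0.09 * m),
}

def rand_augment(img: Image.Image, n: int = 2, m: int = 9) -> Image.Image:
    """Apply n randomly chosen TFs, each at global magnitude m."""
    for name in random.choices(list(OPS), k=n):  # sampling with replacement
        img = OPS[name](img, m)
    return img

# n and m are the only hyperparameters; in practice they are chosen by a small
# grid search (e.g. n in {1, 2, 3}, m in {5, ..., 10}) on validation accuracy.
```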

  39. Adversarial AutoAugment: an adversarial controller (RNN) is rewarded for producing TF sequences that maximize the training loss, while the end model is trained on the augmented data to minimize the training loss. 12x reduction in computing cost on ImageNet compared to AutoAugment; 1.36% top-1 error on CIFAR-10 (new SoTA). [Zhang et al. 2019]

  40. Uncertainty-based sampling augmentation: users provide transformation functions (TFs, e.g. rotate, invert, cutout, mixup); K compositions of TFs are randomly sampled, and the model selects the TFs that provide the most information during training. No policy learning required. [Wu et al. 2020]
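
A minimal sketch of the selection step as described on the slide: apply K randomly sampled compositions of user-provided TFs to an example and keep the augmented copy the current model is most uncertain about, here proxied by its loss. The function name and the uncertainty measure are illustrative assumptions.

```python
import random
import torch
import torch.nn.functional as F

def uncertainty_augment(model, x, y, tf_pool, k=4, length=2):
    """x: one image tensor [C, H, W]; y: its scalar label tensor.
    tf_pool: list of callables tensor -> tensor (user-provided TFs).
    Samples k random TF compositions and returns the candidate on which the
    current model has the highest loss (a proxy for 'most informative')."""
    candidates = []
    for _ in range(k):
        xt = x
        for tf in random.choices(tf_pool, k=length):  # random composition of TFs
            xt = tf(xt)
        candidates.append(xt)

    with torch.no_grad():
        losses = [F.cross_entropy(model(xt.unsqueeze(0)), y.unsqueeze(0))
                  for xt in candidates]
    return candidates[int(torch.stack(losses).argmax())]
```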

  41. Empirical results: state-of-the-art quality. Improves on existing methods across domains; SoTA on CIFAR-10, CIFAR-100, and SVHN; 84.54% on CIFAR-100 using Wide-ResNet-28-10, outperforming RandAugment (Cubuk et al. 2019) by 1.24%; improves accuracy by 0.28 points on a text classification problem. [Charts: CIFAR-10, CIFAR-100, SVHN.]

  42. Check out the blog post series "Automating the Art of Data Augmentation": Part I: Overview; Part II: Practical Methods; Part III: Theory; Part IV: New Direction.

  43. Part III: Self-supervised Learning

  44. Source: Yann LeCun’s talk

  45. What if we could get labels for free for unlabeled data and train on an unsupervised dataset in a supervised manner?

  46. Pretext Tasks

  47. Rotation [Gidaris et al. 2018]

  48. Rotation [Gidaris et al. 2018]

  49. Rotation [Gidaris et al. 2018]
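
A minimal sketch of the rotation pretext task: rotate each unlabeled image by 0/90/180/270 degrees and train the network to predict which rotation was applied, so the labels come for free. The helper names and training-loop details below are illustrative.

```python
import torch
import torch.nn.functional as F

def rotation_pretext_batch(images):
    """images: [B, C, H, W]. Returns 4B rotated images and rotation labels
    (0, 1, 2, 3 for 0/90/180/270 degrees) -- the 'free' pretext labels."""
    rotated, labels = [], []
    for k in range(4):
        rotated.append(torch.rot90(images, k, dims=(2, 3)))
        labels.append(torch.full((images.size(0),), k, dtype=torch.long))
    return torch.cat(rotated), torch.cat(labels)

def pretext_step(model, images, optimizer):
    """One self-supervised step: the model is any backbone with a 4-way head."""
    x, y = rotation_pretext_batch(images)
    loss = F.cross_entropy(model(x), y)  # predict which rotation was applied
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```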

  50. Patches [Doersch et al. 2015]

  51. Colorization [Zhang et al. 2016] http://richzhang.github.io/colorization/

  52. Pretext-Invariant Representation Learning (PIRL) [Misra et al. 2019]

  53. Pretext-Invariant Representation Learning (PIRL): positive pair vs. negative pairs. [Misra et al. 2019]

  54. SimCLR [Chen et al. 2020]

  55. SimCLR [Chen et al. 2020]

  56. SimCLR [Chen et al. 2020]
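
A minimal sketch of a SimCLR-style contrastive (NT-Xent) loss: two augmented views of each image form a positive pair, and all other views in the batch serve as negatives. The temperature value is illustrative, and the encoder and projection head that produce the embeddings are omitted.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """z1, z2: [B, D] projections of two augmented views of the same B images.
    Each view's positive is its counterpart in the other view; the remaining
    2B - 2 embeddings in the batch act as negatives."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # [2B, D], unit norm
    sim = z @ z.t() / temperature                       # scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))                   # mask self-similarity

    b = z1.size(0)
    # The positive for index i is i + B (and vice versa).
    targets = torch.cat([torch.arange(b, 2 * b), torch.arange(0, b)])
    return F.cross_entropy(sim, targets)
```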

  57. Data Augmentation is the key [Chen et al. 2020]

  58. Unsupervised learning benefits more from bigger models [Chen et al. 2020]

  59. Summary
  • Weakly Supervised Learning
    • Flickr100M
    • JFT300M (Google)
    • Instagram3B (Facebook)
  • Data augmentation
    • Human heuristics
    • Automated data augmentation
  • Self-supervised Learning
    • Pretext tasks (rotation, patches, colorization, etc.)
    • Invariant vs. covariant learning
    • Contrastive learning based framework (current SoTA)

  60. Questions?
