CS839 Special Topics in AI: Deep Learning. Learning with Less Supervision. Sharon Yixuan Li, University of Wisconsin-Madison. October 29, 2020
Overview
• Weakly Supervised Learning
  • Flickr100M
  • JFT300M (Google)
  • Instagram3B (Facebook)
• Data Augmentation
  • Human heuristics
  • Automated data augmentation
• Self-supervised Learning
  • Pretext tasks (rotation, patches, colorization, etc.)
  • Invariant vs. covariant learning
  • Contrastive learning based frameworks (current SoTA)
Part I: Weakly Supervised Learning
Model Complexity Keeps Increasing. [Figure: LeNet (LeCun et al. 1998) vs. ResNet (He et al. 2016), >100 million parameters]
[Sun et al. 2017]
Challenge: Limited labeled data. ImageNet: 1M images, ~thousand annotation hours [Deng et al. 2009]; 1B images (1000x more): ~million annotation hours.
TRAINING AT SCALE. Levels of supervision: fully supervised (ImageNet, clean labels such as CAT, DOG), weakly supervised (Instagram/Flickr, hashtags such as #CAT and noisy captions), unsupervised (crawled web images, no labels).
TRAINING AT SCALE. Hashtags are noisy data: non-visual labels (e.g. #LOVE), incorrect labels, and missing labels (e.g. #CAT #DOG #HUSKY).
Flickr 100M [Joulin et al. 2015]
JFT 300M [Sun et al. 2017]
Can we use billions of images with hashtags for pre-training? [Mahajan et al. 2018]
Hashtag Selection: 1.5K hashtags (synonyms of ImageNet labels), ~1B images; 17K hashtags (synonyms of nouns in WordNet), ~3B images. [Mahajan et al. 2018]
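A rough sketch of how such a hashtag vocabulary could be built from WordNet is below; the class names, lowercasing rule, and function name are illustrative, not taken from Mahajan et al.

```python
# Hypothetical sketch of the hashtag-selection idea: collect WordNet synonyms
# of target-class names and use them to match noisy hashtags.
# Requires: pip install nltk; nltk.download('wordnet')
from nltk.corpus import wordnet as wn

def hashtag_candidates(class_names):
    """Return a set of candidate hashtag strings per class (lowercase, no underscores)."""
    candidates = {}
    for name in class_names:
        lemmas = set()
        for synset in wn.synsets(name, pos=wn.NOUN):
            for lemma in synset.lemma_names():
                lemmas.add(lemma.lower().replace("_", ""))
        candidates[name] = lemmas
    return candidates

# hashtag_candidates(["cat", "dog"])  -> e.g. {"cat": {"cat", ...}, "dog": {...}}
```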
Network Architecture and Capacity: ResNeXt-101 32xCd. [Figure: # of params and # of FLOPs grow with group width C in {4, 8, 16, 32, 48}] [Xie et al. 2016]
Largest Weakly Supervised Training: 3.5B public Instagram images, 17K unique labels, large-capacity model (ResNeXt-101 32x48), distributed training (350 GPUs), 85.1% top-1 on ImageNet. [Mahajan et al. 2018]
Results
Transfer Learning Performance. Target task: ImageNet. *With a bigger model, we even got 85.4% top-1 accuracy on ImageNet-1K.
Transfer Learning Performance. Target tasks: CUB-2011 & Places-365.
Models are surprisingly robust to label "noise". Dataset: IG-1B-17k. Network: ResNeXt-101 32x16.
Effect of Model Capacity. Matching hashtags to the target task helps (1.5K tags). Target task: ImageNet-1K.
BiT Transfer [Kolesnikov et al. 2020]
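Both the Instagram-pretrained models and BiT are used the same way downstream: replace the classification head and fine-tune on the target task. A minimal, generic fine-tuning sketch in PyTorch follows; the torchvision ResNet-50 stands in for the actual pretrained backbones, and the number of classes and hyperparameters are placeholders.

```python
# Minimal transfer-learning sketch (not the exact WSL/BiT recipe): load a
# pretrained backbone, swap the head for the target task, and fine-tune.
import torch
import torch.nn as nn
import torchvision

num_target_classes = 200  # e.g. CUB-2011 (placeholder)

model = torchvision.models.resnet50(pretrained=True)        # stand-in for a WSL/BiT backbone
model.fc = nn.Linear(model.fc.in_features, num_target_classes)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()

def finetune_step(images, labels):
    """One fine-tuning step on a batch from the target task."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```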
Part II: Data Augmentation
Data Augmentation “Quokka” Figure credit: https://github.com/aleju/imgaug
Data Augmentation: load an image and its label ("cat") and feed the data to a CNN.
Data Augmentation: insert a transformation function (TF) between the loaded data and the CNN.
Data Augmentation. Transformation function (TF): changes the pixels without changing the label; training on transformed data improves generalization; VERY widely used.
Example of Transformation Functions (TFs) Original image Color jitter Horizontal flip Random crop
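A minimal sketch of this kind of TF pipeline with torchvision is below; the specific parameter values are arbitrary choices, not ones prescribed in the lecture.

```python
# Standard image TFs: random crop, horizontal flip, color jitter.
from torchvision import transforms

train_tfms = transforms.Compose([
    transforms.RandomResizedCrop(224),                        # random crop
    transforms.RandomHorizontalFlip(p=0.5),                   # horizontal flip
    transforms.ColorJitter(brightness=0.4, contrast=0.4,
                           saturation=0.4, hue=0.1),          # color jitter
    transforms.ToTensor(),
])
# augmented = train_tfms(pil_image)   # the label is unchanged by these TFs
```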
Heuristic Data Augmentation: a human expert composes TF sequences (TF 1, ..., TF L, e.g. rotation, flip) that map data to augmented data.
Heuristic Data Augmentation: how can we automatically learn the compositions and parameterizations of TFs?
TANDA: Transformation Adversarial Networks for Data Augmentation. A generator (LSTM) produces TF sequences (TF 1, ..., TF L, e.g. rotation, flip) that map data to augmented data. [Ratner et al. 2017]
TANDA: a discriminator judges whether each example is real or augmented, providing the training signal for the generator. [Ratner et al. 2017]
TANDA: Transformation Adversarial Networks for Data Augmentation. Results vs. heuristic augmentation: +2.1% on CIFAR-10, +1.4 on ACE (F1 score), +3.4% on medical imaging; generated MNIST samples shown. [Ratner et al. 2017]
AutoAugment [Cubuk et al. 2018]
AutoAugment: a controller (RNN) produces TF sequences (TF 1, ..., TF L, e.g. rotation, flip); the augmented data trains the end model, and the validation accuracy R is fed back as the reward. State-of-the-art performance on various benchmarks, but the computational cost of the policy search is very high. [Cubuk et al. 2018]
RandAugment: replaces the learned controller with (1) random sampling over the transformation functions and (2) a grid search over the parameters of each transformation (TF 1, ..., TF L randomly sampled). Outperforms AutoAugment. [Cubuk et al. 2019]
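A from-scratch sketch of the RandAugment idea follows: sample N TFs uniformly per image and apply them all at a single global magnitude M. The three ops and their magnitude scaling are an illustrative subset, not the paper's full op set (recent torchvision also ships a transforms.RandAugment implementation).

```python
# Minimal RandAugment-style sketch over a small, illustrative op set.
import random
from PIL import Image, ImageOps, ImageEnhance

def _rotate(img, m):    return img.rotate(30 * m / 10)                          # up to 30 degrees
def _posterize(img, m): return ImageOps.posterize(img, max(1, 8 - int(4 * m / 10)))
def _contrast(img, m):  return ImageEnhance.Contrast(img).enhance(1 + 0.9 * m / 10)

OPS = [_rotate, _posterize, _contrast]

def rand_augment(img: Image.Image, n: int = 2, m: int = 9) -> Image.Image:
    for op in random.choices(OPS, k=n):   # (1) random sampling over TFs
        img = op(img, m)                  # (2) shared magnitude M, tuned by grid search
    return img
```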
Adversarial AutoAugment: an adversarial controller (RNN) produces TF sequences (e.g. rotation, flip) that maximize the training loss of the end model, while the end model minimizes it; the training loss serves as the reward signal. 12x reduction in computing cost on ImageNet compared to AutoAugment; 1.36% top-1 error on CIFAR-10 (new SoTA). [Zhang et al. 2019]
Uncertainty-based sampling augmentation: users provide transformation functions (TFs: rotate, invert, cutout, mixup, ...); from K randomly sampled compositions of TFs, the model selects the ones that provide the most information during training, so no policy learning is required. [Wu et al. 2020]
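A rough sketch of that selection step, assuming the model and the TFs operate on image tensors of identical shape and that higher training loss is used as the "most informative" signal (a simplification of Wu et al. 2020):

```python
# Sketch: apply K random TF compositions and keep the most uncertain candidate.
import random
import torch
import torch.nn.functional as F

def select_augmentation(model, image, label, tf_pool, K=4, depth=2):
    """image: (C, H, W) tensor; label: scalar long tensor; tf_pool: list of tensor->tensor TFs."""
    candidates = []
    for _ in range(K):
        aug = image
        for tf in random.sample(tf_pool, depth):   # one random composition of TFs
            aug = tf(aug)
        candidates.append(aug)
    with torch.no_grad():
        logits = model(torch.stack(candidates))                          # (K, num_classes)
        losses = F.cross_entropy(logits, label.repeat(K), reduction="none")
    return candidates[losses.argmax().item()]      # highest-loss = most informative candidate
```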
Empirical results: state-of-the-art quality. Improves on existing methods across domains; SoTA on CIFAR-10, CIFAR-100, and SVHN; 84.54% on CIFAR-100 using Wide-ResNet-28-10, outperforming RandAugment (Cubuk et al. '19) by 1.24%; +0.28 points accuracy on a text classification problem.
Check out the blog post series! Automating the Art of Data Augmentation (Part I: Overview) Automating the Art of Data Augmentation (Part II: Practical Methods) Automating the Art of Data Augmentation (Part III: Theory) Automating the Art of Data Augmentation (Part IV: New Direction)
Part III: Self-supervised Learning
Source: Yann LeCun’s talk
What if we could get labels for free for unlabeled data and train on the unlabeled dataset in a supervised manner?
Pretext Tasks
Rotation [Gidaris et al. 2018]
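A minimal sketch of how the rotation pretext task manufactures free labels: every image yields four copies rotated by 0, 90, 180, and 270 degrees, and the label is the rotation index.

```python
# Rotation pretext task: build a "free" 4-way classification problem.
import torch

def rotation_pretext_batch(images):
    """images: (B, C, H, W) tensor -> (4B, C, H, W) rotated images and (4B,) rotation labels."""
    rotated, labels = [], []
    for k in range(4):                                          # k quarter-turns
        rotated.append(torch.rot90(images, k, dims=(2, 3)))
        labels.append(torch.full((images.size(0),), k, dtype=torch.long))
    return torch.cat(rotated), torch.cat(labels)

# x, y = rotation_pretext_batch(batch)
# loss = F.cross_entropy(rotation_classifier(x), y)   # trained exactly like supervised learning
```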
Patches [Doersch et al., 2015]
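A rough sketch of the corresponding patch-based pretext task: crop a central patch and one of its eight neighbors and predict which neighbor it is. The patch size, gap, and fixed grid origin are illustrative simplifications of Doersch et al. (the paper adds jitter and other tricks), and the code assumes an input of at least 224x224.

```python
# Context-prediction pretext task: 8-way "which neighbor is this patch?" classification.
import random

def sample_patch_pair(img_tensor, patch=64, gap=16):
    """img_tensor: (C, H, W) tensor, H and W >= 3 * (patch + gap)."""
    step = patch + gap
    cy, cx = step, step                                   # fixed 3x3 grid origin for brevity
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]
    label = random.randrange(8)
    dy, dx = offsets[label]
    center   = img_tensor[:, cy:cy + patch, cx:cx + patch]
    neighbor = img_tensor[:, cy + dy * step:cy + dy * step + patch,
                              cx + dx * step:cx + dx * step + patch]
    return center, neighbor, label                        # the network predicts `label` from the pair
```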
Colorization [Zhang et al. 2016] http://richzhang.github.io/colorization/
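A minimal sketch of the colorization setup, assuming skimage for the Lab conversion: the lightness channel L is the network input and the ab color channels are the target (Zhang et al. actually quantize ab and use a classification loss rather than plain regression).

```python
# Colorization pretext task: predict color (ab) from lightness (L).
import torch
from skimage import color

def lab_pair(rgb_np):
    """rgb_np: (H, W, 3) float array in [0, 1] -> (L input, ab target) tensors."""
    lab = color.rgb2lab(rgb_np)                       # L in [0, 100], ab roughly in [-110, 110]
    L  = torch.from_numpy(lab[..., :1]).permute(2, 0, 1).float() / 100.0
    ab = torch.from_numpy(lab[..., 1:]).permute(2, 0, 1).float() / 110.0
    return L, ab                                      # network: L -> predicted ab
```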
Pretext Invariant Representation Learning (PIRL) [Misra et al. 2019]
Pretext Invariant Representation Learning (PIRL) [Misra et al. 2019] Positive pair Negative pairs
SimCLR [Chen et al. 2020]
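A compact sketch of the SimCLR contrastive (NT-Xent) loss, assuming z1 and z2 are the projection-head outputs for two augmented views of the same batch; for each example, the other view is the positive and all remaining examples in the batch are negatives.

```python
# NT-Xent (normalized temperature-scaled cross entropy) loss, as used in SimCLR.
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """z1, z2: (N, d) projections of two views of the same N images."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)            # (2N, d), unit norm
    sim = z @ z.t() / temperature                                  # cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim.masked_fill_(mask, float("-inf"))                          # exclude self-similarities
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)                           # positive = the other view
```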
Data Augmentation is the key [Chen et al. 2020]
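The augmentations SimCLR reports as most important are random cropping and strong color distortion; below is a sketch of such a view-generating pipeline with torchvision, with parameter values that only approximate the paper's defaults.

```python
# SimCLR-style view generation: two independent augmentations of the same image.
from torchvision import transforms

simclr_tfms = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomApply([transforms.ColorJitter(0.8, 0.8, 0.8, 0.2)], p=0.8),
    transforms.RandomGrayscale(p=0.2),
    transforms.GaussianBlur(kernel_size=23),
    transforms.ToTensor(),
])
# x_i, x_j = simclr_tfms(img), simclr_tfms(img)   # the two "views" fed to nt_xent
```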
Unsupervised learning benefits more from bigger models [Chen et al. 2020]
Summary
• Weakly Supervised Learning
  • Flickr100M
  • JFT300M (Google)
  • Instagram3B (Facebook)
• Data Augmentation
  • Human heuristics
  • Automated data augmentation
• Self-supervised Learning
  • Pretext tasks (rotation, patches, colorization, etc.)
  • Invariant vs. covariant learning
  • Contrastive learning based frameworks (current SoTA)
Questions?