A Kernel Theory of Modern Data Augmentation
Tri Dao, Albert Gu, Alex Ratner, Virginia Smith, Chris De Sa, Chris Ré
ICML Oral, 06/11/2019 | Poster #227, Tue Jun 11th, 6:30-9:00 PM

Data augmentation is important to accuracy…
• 3.7 pt. average gain across top ten CIFAR-10 models
• 13.9 pt. average gain for CIFAR-100
• A form of weak supervision: expresses domain knowledge (invariance)

… but is not well understood
How does data augmentation affect the model?
• Learning process
• Parameters and decision surface

Augmentation as sequence modeling
• TANDA [Ratner et al., 2017]
• AutoAugment [Cubuk et al., 2018]
Model augmentation as a Markov chain
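One way to picture augmentation as a Markov chain is sketched below. This is only an illustration under stated assumptions: the three transformations, the uniform choice among them, and the `restart_prob` parameter are hypothetical and are not taken from the slides. Each state is a data point; at every step the chain either jumps back to a random training point or applies one random transformation to the current state.

```python
import numpy as np

# Hypothetical transformations on 1-D arrays (stand-ins for flips, shifts, etc.).
def flip(x):
    return x[::-1].copy()

def shift(x):
    return np.roll(x, 1)

def identity(x):
    return x.copy()

TRANSFORMS = [flip, shift, identity]

def augmentation_chain(data, n_steps, restart_prob=0.2, seed=0):
    """Sample a trajectory of augmented points from a simple Markov chain.

    With probability `restart_prob` (an illustrative assumption) the chain
    restarts at a random training point; otherwise it applies a transformation
    chosen uniformly at random to the current state.
    """
    rng = np.random.default_rng(seed)
    state = data[rng.integers(len(data))].copy()
    states = [state]
    for _ in range(n_steps):
        if rng.random() < restart_prob:
            state = data[rng.integers(len(data))].copy()
        else:
            transform = TRANSFORMS[rng.integers(len(TRANSFORMS))]
            state = transform(state)
        states.append(state)
    return states
```

Calling `augmentation_chain(data, n_steps=10)` on an array of shape `(n_points, dim)` returns 11 states, each of which is a transformed copy of some training row.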

Augmentation as kernels
Base classifier: k-nearest neighbors + Data augmentation = Asymptotic kernel classifier
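The left-hand side of that equation, a k-nearest-neighbors classifier trained on augmented data, can be sketched as follows. This is a minimal sketch, assuming binary labels in {-1, +1} and a hypothetical Gaussian-jitter transformation; it shows the setup only and does not demonstrate the asymptotic equivalence to a kernel classifier.

```python
import numpy as np

def jitter(X, rng, sigma=0.1):
    # Hypothetical transformation: small additive Gaussian noise.
    return X + sigma * rng.standard_normal(X.shape)

def augment_dataset(X, y, sample_transform=jitter, n_copies=10, seed=0):
    """Stack the original points with `n_copies` transformed copies."""
    rng = np.random.default_rng(seed)
    X_aug = np.concatenate([X] + [sample_transform(X, rng) for _ in range(n_copies)])
    y_aug = np.tile(y, n_copies + 1)  # labels are preserved by the transformation
    return X_aug, y_aug

def knn_predict(X_train, y_train, X_query, k=5):
    """Majority vote among the k nearest training points (labels in {-1, +1})."""
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = y_train[nearest].sum(axis=1)
    return np.where(votes >= 0, 1, -1)
```

A typical call is `X_aug, y_aug = augment_dataset(X, y)` followed by `knn_predict(X_aug, y_aug, X_query)`.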

Effects of data augmentation on kernel classifiers
[Figure: two-class scatter plots (o vs. x), one panel per effect]
• Invariance
• Regularization
• Practical utility: speeding up training, as a diagnostic

Model of data augmentation: kernel classifier
Non-augmented:
$$\min_w \frac{1}{n} \sum_{i=1}^{n} \ell\left(w^\top \phi(x_i)\right)$$
where $\ell$ is the loss function and $\phi$ is the feature map.
Augmented:
$$\min_w \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}_{z_i \sim T(x_i)}\, \ell\left(w^\top \phi(z_i)\right)$$
where $z_i \sim T(x_i)$ are transformed versions of the data point $x_i$.
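The two objectives can be compared numerically with a short sketch. Several pieces here are assumptions made for illustration and do not come from the slides: the feature map is the identity, the loss is the logistic loss applied to the margin `y * w^T phi(x)` with labels in {-1, +1}, the transformation distribution T(x) is additive Gaussian noise, and the expectation is approximated by Monte Carlo sampling.

```python
import numpy as np

def feature_map(X):
    # Hypothetical feature map phi: identity features.
    return X

def logistic_loss(margin):
    # log(1 + exp(-margin)), computed stably.
    return np.logaddexp(0.0, -margin)

def objective(w, X, y):
    """Non-augmented objective: mean loss over the original points."""
    margins = y * (feature_map(X) @ w)
    return logistic_loss(margins).mean()

def sample_transform(X, rng, sigma=0.1):
    # Hypothetical T(x): additive Gaussian noise with scale sigma.
    return X + sigma * rng.standard_normal(X.shape)

def augmented_objective(w, X, y, sigma=0.1, n_samples=100, seed=0):
    """Augmented objective: Monte Carlo estimate of the expected loss under T."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_samples):
        Z = sample_transform(X, rng, sigma)
        total += objective(w, Z, y)
    return total / n_samples
```

Setting `sigma=0.0` makes every sampled `Z` equal to `X`, so the augmented objective reduces to the non-augmented one, which is a quick sanity check on the estimator.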

Data augmentation effects
$$\frac{1}{n} \sum_{i=1}^{n} \mathbb{E}_{z_i \sim T(x_i)}\, \ell\left(w^\top \phi(z_i)\right) \approx \frac{1}{n} \sum_{i=1}^{n} \ell\left(w^\top \mathbb{E}_{z_i \sim T(x_i)}\, \phi(z_i)\right)$$
The right-hand side uses the average of augmented features (i.e. kernel mean embedding).
• 1st order effect: induces invariance by feature averaging
• 2nd order effect: reduces model complexity via a data-dependent regularization
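The first- and second-order effects can be read off a Taylor expansion of the loss. The following is a sketch of that step, assuming $\ell$ is twice differentiable and writing $\psi(x) = \mathbb{E}_{z \sim T(x)}\, \phi(z)$ for the averaged features; the covariance symbol $\Sigma(x)$ is notation introduced here for illustration.

```latex
\begin{align*}
\mathbb{E}_{z \sim T(x)}\, \ell\left(w^\top \phi(z)\right)
  &\approx \ell\left(w^\top \psi(x)\right)
   + \ell'\left(w^\top \psi(x)\right)\, w^\top \mathbb{E}_{z \sim T(x)}\left[\phi(z) - \psi(x)\right] \\
  &\quad + \tfrac{1}{2}\, \ell''\left(w^\top \psi(x)\right)\,
     \mathbb{E}_{z \sim T(x)}\left[\left(w^\top (\phi(z) - \psi(x))\right)^2\right] \\
  &= \ell\left(w^\top \psi(x)\right)
   + \tfrac{1}{2}\, \ell''\left(w^\top \psi(x)\right)\, w^\top \Sigma(x)\, w,
\qquad \Sigma(x) = \operatorname{Cov}_{z \sim T(x)}\, \phi(z).
\end{align*}
```

The linear term vanishes because $\mathbb{E}_{z \sim T(x)}[\phi(z) - \psi(x)] = 0$, leaving the loss on averaged features (first order) plus a quadratic penalty on $w$ weighted by the feature covariance under the transformations (second order).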

A diagnostic: kernel alignment metric
Averaged features: $\psi(x) = \mathbb{E}_{z \sim T(x)}\, \phi(z)$
Kernel target alignment [Cristianini et al., 2002]: how well separated are features from different classes
[Figure: kernel alignment plots, MNIST]
Kernel alignment correlates with accuracy.
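A minimal sketch of the diagnostic follows, assuming binary labels in {-1, +1} and the standard definition of kernel target alignment as the normalized Frobenius inner product between the kernel matrix $K$ and $y y^\top$. The Monte Carlo averaging, the function names, and the `feature_map` and `sample_transform` arguments are hypothetical placeholders, not the authors' code.

```python
import numpy as np

def averaged_features(X, feature_map, sample_transform, n_samples=50, seed=0):
    """Monte Carlo estimate of psi(x) = E_{z ~ T(x)} phi(z) for every row of X."""
    rng = np.random.default_rng(seed)
    feats = [feature_map(sample_transform(X, rng)) for _ in range(n_samples)]
    return np.mean(feats, axis=0)

def kernel_target_alignment(K, y):
    """<K, y y^T>_F / (||K||_F * ||y y^T||_F), for labels y in {-1, +1}."""
    Y = np.outer(y, y)
    return np.sum(K * Y) / (np.linalg.norm(K) * np.linalg.norm(Y))
```

Given averaged features `Psi = averaged_features(X, feature_map, sample_transform)`, the kernel matrix can be formed as `K = Psi @ Psi.T` and scored with `kernel_target_alignment(K, y)`; a value of 1 is reached when `K` is proportional to `y y^T`.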

Summary
• Data augmentation + k-NN = asymptotic kernel classifier.
• Data augmentation induces invariance and regularizes.
• Applications in speeding up training and diagnostics.
Tri Dao, trid@stanford.edu
Poster #227 on Tuesday, Jun 11th at 6:30pm
