ICML Oral 06/11/2019 | Poster #227 | Tue Jun 11th, 6:30-9:00 PM
A Kernel Theory of Modern Data Augmentation
Tri Dao, Albert Gu, Alex Ratner, Virginia Smith, Chris De Sa, Chris Ré
Data augmentation is important to accuracy…
• 3.7 pt. average gain across top ten CIFAR-10 models
• 13.9 pt. average gain for CIFAR-100
• A form of weak supervision: expresses domain knowledge (invariance)
… but is not well understood
How does data augmentation affect the model?
• Learning process
• Parameters and decision surface
Augmentation as sequence modeling
• TANDA [Ratner et al., 2017]
• AutoAugment [Cubuk et al., 2018]
Model augmentation as a Markov chain
Augmentation as kernels
Base classifier: k-nearest neighbors + Data augmentation = Asymptotic kernel classifier
Effects of data augmentation on kernel classifiers
[Figures: scatter plots of o and x points illustrating each effect]
• Invariance
• Regularization
• Practical utility: speeding up training; as a diagnostic
Model of data augmentation: kernel classifier
Non-augmented: $\min_w \frac{1}{n} \sum_{i=1}^{n} \ell\big(w^\top \phi(x_i)\big)$, where $\ell$ is the loss function and $\phi$ is the feature map
Augmented: $\min_w \frac{1}{n} \sum_{i=1}^{n} \mathbb{E}_{z_i \sim T(x_i)}\, \ell\big(w^\top \phi(z_i)\big)$, where $z_i \sim T(x_i)$ are transformed versions of data point $x_i$
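To make the two objectives concrete, here is a minimal NumPy sketch that evaluates both for a fixed weight vector. Everything specific in it is an assumption for illustration and does not come from the slides: the logistic loss, the quadratic `feature_map`, the Gaussian-noise `sample_transform`, the ±1 labels folded into the loss argument, and the Monte Carlo estimate of the expectation.

```python
import numpy as np

def logistic_loss(margin):
    # Numerically stable log(1 + exp(-margin)); an assumed stand-in for the loss.
    return np.logaddexp(0.0, -margin)

def feature_map(x):
    # Hypothetical feature map: the raw coordinates plus their squares.
    return np.concatenate([x, x ** 2], axis=-1)

def sample_transform(x, rng, num_samples, noise_scale=0.1):
    # Hypothetical augmentation T(x): small Gaussian perturbations of x.
    return x + noise_scale * rng.standard_normal((num_samples,) + x.shape)

def plain_objective(w, X, y):
    # (1/n) * sum_i loss(y_i * w^T phi(x_i))
    return logistic_loss(y * (feature_map(X) @ w)).mean()

def augmented_objective(w, X, y, rng, num_samples=64):
    # (1/n) * sum_i E_{z_i ~ T(x_i)} loss(y_i * w^T phi(z_i)),
    # with the expectation estimated by Monte Carlo sampling.
    total = 0.0
    for x_i, y_i in zip(X, y):
        Z = sample_transform(x_i, rng, num_samples)
        total += logistic_loss(y_i * (feature_map(Z) @ w)).mean()
    return total / len(X)

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))          # toy inputs
y = np.where(X[:, 0] > 0, 1.0, -1.0)      # toy labels in {-1, +1}
w = rng.standard_normal(6)                # one weight per feature
print(plain_objective(w, X, y), augmented_objective(w, X, y, rng))
```

In practice the augmented objective would be minimized over `w`; the sketch only evaluates it, to show that the difference from the plain objective is the inner average over transformed points.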
Data augmentation effects
$$\frac{1}{n} \sum_{i=1}^{n} \mathbb{E}_{z_i \sim T(x_i)}\, \ell\big(w^\top \phi(z_i)\big) \approx \frac{1}{n} \sum_{i=1}^{n} \ell\big(w^\top \mathbb{E}_{z_i \sim T(x_i)}\, \phi(z_i)\big)$$
Here $\mathbb{E}_{z_i \sim T(x_i)}\, \phi(z_i)$ is the average of augmented features (i.e. kernel mean embedding).
• 1st-order effect: induces invariance by feature averaging
• 2nd-order effect: reduces model complexity via a data-dependent regularization
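One way to see where the two effects come from is a Taylor expansion of the loss around the averaged feature. The sketch below assumes $\ell$ is twice differentiable; the symbols $\psi(x_i)$ for the averaged feature and $\Sigma(x_i)$ for the covariance of the augmented features are notation introduced here for illustration.

$$
\begin{aligned}
\psi(x_i) &= \mathbb{E}_{z_i \sim T(x_i)}\, \phi(z_i), \qquad
\Sigma(x_i) = \mathbb{E}_{z_i \sim T(x_i)} \big(\phi(z_i) - \psi(x_i)\big)\big(\phi(z_i) - \psi(x_i)\big)^\top, \\
\mathbb{E}_{z_i \sim T(x_i)}\, \ell\big(w^\top \phi(z_i)\big)
&\approx \ell\big(w^\top \psi(x_i)\big)
+ \ell'\big(w^\top \psi(x_i)\big)\, w^\top \underbrace{\mathbb{E}_{z_i \sim T(x_i)} \big(\phi(z_i) - \psi(x_i)\big)}_{=\,0}
+ \tfrac{1}{2}\, \ell''\big(w^\top \psi(x_i)\big)\, w^\top \Sigma(x_i)\, w .
\end{aligned}
$$

The linear term vanishes, so to first order the augmented loss is the loss on averaged features. The remaining quadratic term penalizes $w$ along directions in which the augmented features vary, which is a data-dependent regularizer when $\ell'' \ge 0$.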
A diagnostic: kernel alignment metric
Averaged features: $\psi(x) = \mathbb{E}_{z \sim T(x)}\, \phi(z)$
Kernel target alignment [Cristianini et al., 2002]: how well separated are features from different classes
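As a rough illustration of the diagnostic, the following NumPy sketch estimates averaged features by sampling and then computes the standard kernel target alignment, $\langle K, yy^\top \rangle_F / (\lVert K \rVert_F \, \lVert yy^\top \rVert_F)$, for binary labels. The linear kernel on the averaged features, the `feature_map`, and the `sample_transform` are hypothetical choices made for this example, not details taken from the slides.

```python
import numpy as np

def feature_map(x):
    # Hypothetical feature map: the raw coordinates plus their squares.
    return np.concatenate([x, x ** 2], axis=-1)

def sample_transform(x, rng, num_samples, noise_scale=0.1):
    # Hypothetical augmentation T(x): small Gaussian perturbations of x.
    return x + noise_scale * rng.standard_normal((num_samples,) + x.shape)

def averaged_features(x, rng, num_samples=256):
    # Monte Carlo estimate of psi(x) = E_{z ~ T(x)} phi(z).
    Z = sample_transform(x, rng, num_samples)
    return feature_map(Z).mean(axis=0)

def kernel_target_alignment(features, y):
    # Alignment between the linear kernel K = F F^T and the target y y^T,
    # for labels y in {-1, +1}. np.linalg.norm on a matrix is the Frobenius norm.
    K = features @ features.T
    Y = np.outer(y, y)
    return np.sum(K * Y) / (np.linalg.norm(K) * np.linalg.norm(Y))

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 2))          # toy inputs
y = np.where(X[:, 0] > 0, 1.0, -1.0)      # toy labels in {-1, +1}
Psi = np.stack([averaged_features(x, rng) for x in X])
print(kernel_target_alignment(Psi, y))
```

Comparing this score across candidate transformations would be one way to use it as a cheap diagnostic before training a model end to end.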
A diagnostic: kernel alignment metric
[Figure: plots with axis label "Kernel alignment"; panel title "MNIST"]
Kernel alignment correlates with accuracy.
Summary
• Data augmentation + k-NN = asymptotic kernel classifier.
• Data augmentation induces invariance and regularizes.
• Application in speeding up training and diagnostics.
Tri Dao, trid@stanford.edu
Poster #227 on Tuesday Jun 11th at 6:30pm