  1. A Kernel Perspective for Regularizing Deep Neural Networks
     Alberto Bietti*, Grégoire Mialon*, Dexiong Chen, Julien Mairal
     Inria
     ICML 2019, Long Beach

  2. Regularization in Deep Learning
     Two issues with today's deep learning models:
     - Poor performance on small datasets
     - Lack of robustness to adversarial perturbations
     Questions: can regularization address this?
       $\min_{f} \; \frac{1}{n} \sum_{i=1}^{n} \ell(y_i, f(x_i)) + \lambda \, \Omega(f)$
     What is a good choice of $\Omega(f)$ for deep (convolutional) networks?
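To make the objective concrete, here is a minimal PyTorch sketch of penalized empirical risk minimization. The names `model`, `loader`, `omega`, and the weight `lam` are placeholders for whichever network, data, and approximation of $\Omega(f)$ one chooses; they are assumptions for illustration, not the paper's exact setup.

```python
import torch.nn.functional as F

def train_epoch(model, loader, optimizer, omega, lam=0.1):
    """One epoch of min_f (1/n) sum_i loss(y_i, f(x_i)) + lam * Omega(f)."""
    for x, y in loader:
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x), y)   # empirical risk term
        loss = loss + lam * omega(model, x)   # regularizer lam * Omega(f)
        loss.backward()
        optimizer.step()
```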

  3. Regularization with the RKHS Norm
     Kernel methods: $f(x) = \langle f, \Phi(x) \rangle_H$
     - $\Phi(x)$ captures useful properties of the data
     - $\|f\|_H$ controls model complexity and smoothness: $|f(x) - f(y)| \leq \|f\|_H \cdot \|\Phi(x) - \Phi(y)\|_H$
     Our work: view a generic CNN $f_\theta$ as an element of an RKHS $H$ and regularize using $\|f_\theta\|_H$.
     Kernels for deep convolutional architectures (Bietti and Mairal, 2019):
     - $\|\Phi(x) - \Phi(y)\|_H \leq \|x - y\|_2$
     - $\|\Phi(x_\tau) - \Phi(x)\|_H \leq C(\tau)$ for a small transformation $x_\tau$ of $x$
     - CNNs $f_\theta$ with ReLUs are (approximately) in the RKHS, with norm $\|f_\theta\|_H^2 \leq \omega(\|W_1\|_2, \ldots, \|W_L\|_2)$
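The last bound says that controlling the layer spectral norms $\|W_l\|_2$ controls $\|f_\theta\|_H$. Below is a minimal PyTorch sketch that estimates them by power iteration; taking $\omega$ as the product of layer norms is an illustrative choice, not the paper's exact $\omega$, and reshaping a conv kernel to a matrix only approximates the true convolution operator norm.

```python
import torch
import torch.nn.functional as F

def spectral_norm(W, n_iter=20):
    """Estimate ||W||_2 (largest singular value) by power iteration."""
    M = W.reshape(W.shape[0], -1)        # flatten conv kernel to a matrix
    v = torch.randn(M.shape[1])
    u = torch.randn(M.shape[0])
    for _ in range(n_iter):
        u = F.normalize(M @ v, dim=0)
        v = F.normalize(M.t() @ u, dim=0)
    return torch.dot(u, M @ v)           # sigma_max ~ u^T W v

def norm_upper_bound(model):
    """Illustrative omega: product of layer spectral norms, one simple
    function that grows with each ||W_l||_2."""
    bound = torch.tensor(1.0)
    for p in model.parameters():
        if p.dim() >= 2:                 # weight matrices; biases skipped
            bound = bound * spectral_norm(p.detach())
    return bound
```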

  4. Approximating the RKHS Norm
     Our approach: use upper- and lower-bound approximations of $\|f\|_H$.
     Upper bound: a constraint or penalty on the spectral norms of the layer weights (see the sketch below).
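One way to implement the upper bound as a hard constraint, sketched below under the same assumptions as before: after each optimizer step, project every weight matrix onto a spectral-norm ball of radius `tau` (a hypothetical hyperparameter) by clipping its singular values.

```python
import torch

def project_spectral(W, tau=1.0):
    """Euclidean projection of a matrix onto {W : ||W||_2 <= tau},
    obtained by clipping singular values at tau."""
    shape = W.shape
    M = W.reshape(shape[0], -1)
    U, S, Vh = torch.linalg.svd(M, full_matrices=False)
    M = U @ torch.diag(torch.clamp(S, max=tau)) @ Vh
    return M.reshape(shape)

@torch.no_grad()
def constrain_spectral_norms(model, tau=1.0):
    """Call after each optimizer step to keep every layer's norm <= tau."""
    for p in model.parameters():
        if p.dim() >= 2:
            p.copy_(project_spectral(p, tau))
```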

     Lower bounds: use $\|f\|_H = \sup_{\|u\|_H \leq 1} \langle f, u \rangle_H$ $\Rightarrow$ consider tractable subsets of the RKHS unit ball:
     - $\|f\|_H \geq \sup_{x, \|\delta\| \leq 1} f(x + \delta) - f(x)$ (adversarial perturbations)
     - $\|f\|_H \geq \sup_{x, C(\tau) \leq 1} f(x_\tau) - f(x)$ (adversarial deformations)
     - $\|f\|_H \geq \sup_x \|\nabla f(x)\|_2$ (gradient penalty)
     Best performance is obtained by combining the upper- and lower-bound approaches.
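The first and last lower bounds translate directly into training-time penalties. A minimal PyTorch sketch follows, assuming `model` returns a single scalar score per example and replacing the sup over $x$ by an average over the minibatch, a common tractable surrogate; `eps`, `step`, and `n_iter` are illustrative values.

```python
import torch

def _per_example_norm(t):
    # L2 norm of each example, shaped to broadcast against t
    return t.flatten(1).norm(dim=1).reshape(-1, *([1] * (t.dim() - 1)))

def adversarial_penalty(model, x, eps=0.1, step=0.05, n_iter=5):
    """Penalty f(x + delta) - f(x), maximized over ||delta||_2 <= eps
    with a few projected gradient-ascent steps."""
    fx = model(x).detach()
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(n_iter):
        gap = (model(x + delta) - fx).sum()
        g, = torch.autograd.grad(gap, delta)
        with torch.no_grad():
            delta += step * g / (_per_example_norm(g) + 1e-12)
            # project each delta_i back onto the L2 ball of radius eps
            scale = torch.clamp(eps / (_per_example_norm(delta) + 1e-12), max=1.0)
            delta *= scale
    return (model(x + delta.detach()) - fx).mean()

def gradient_penalty(model, x):
    """Penalty ||grad_x f(x)||_2, averaged over the batch; create_graph=True
    lets the penalty itself be backpropagated through."""
    x = x.clone().requires_grad_(True)
    g, = torch.autograd.grad(model(x).sum(), x, create_graph=True)
    return _per_example_norm(g).mean()
```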

  5. More Perspectives and Experiments
     Regularization approaches: a unified view of various existing strategies, including links with robust optimization.
     Theoretical insights: guarantees on adversarial generalization through margin bounds, and insights on regularization for training generative models.
     Experiments: improved performance in small-data scenarios on vision and biological datasets, and robustness benefits under large adversarial perturbations.
     Poster #223
