Regularization for Deep Learning


  1. Regularization for Deep Learning. Lecture slides for Chapter 7 of Deep Learning, www.deeplearningbook.org. Ian Goodfellow, 2016-09-27.

  2. Definition • “Regularization is any modification we make to a learning algorithm that is intended to reduce its generalization error but not its training error.” (Goodfellow 2016)

  3. Weight Decay as Constrained Optimization. Figure 7.1: contours of the objective in the (w1, w2) plane, showing the unregularized optimum w* and the regularized solution w̃ where the objective trades off against the norm constraint. (Goodfellow 2016)
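The penalty view and the constraint view of weight decay are two sides of the same objective (Section 7.2 of the book). A sketch of the correspondence in the book's notation:

\[
\tilde{J}(\theta; X, y) = J(\theta; X, y) + \alpha\,\Omega(\theta)
\]

Minimizing this penalized objective behaves like solving the constrained problem

\[
\min_{\theta} J(\theta; X, y) \quad \text{subject to} \quad \Omega(\theta) \le k,
\]

via the generalized Lagrangian \(\mathcal{L}(\theta, \alpha) = J(\theta; X, y) + \alpha\,(\Omega(\theta) - k)\): each penalty coefficient \(\alpha\) corresponds to some constraint size \(k\), with larger \(\alpha\) shrinking the feasible region.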

  4. Norm Penalties • L1: Encourages sparsity, equivalent to MAP Bayesian estimation with Laplace prior • Squared L2: Encourages small weights, equivalent to MAP Bayesian estimation with Gaussian prior (Goodfellow 2016)
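A minimal NumPy sketch of how these two penalties enter a loss; the weight vector, data loss, and coefficient `alpha` below are illustrative assumptions, not values from the slides:

```python
import numpy as np

def penalized_loss(w, data_loss, alpha, penalty="l2"):
    """Add an L1 or squared-L2 norm penalty to a data loss.

    w: parameter vector; data_loss: scalar loss on the data;
    alpha: regularization strength (hypothetical value below).
    """
    if penalty == "l1":
        # L1 encourages sparsity; its subgradient is alpha * sign(w).
        return data_loss + alpha * np.sum(np.abs(w))
    # Squared L2 (weight decay) shrinks weights toward zero;
    # its gradient contribution is 2 * alpha * w.
    return data_loss + alpha * np.sum(w ** 2)

w = np.array([0.5, -1.2, 3.0])
print(penalized_loss(w, data_loss=0.8, alpha=1e-2, penalty="l1"))
print(penalized_loss(w, data_loss=0.8, alpha=1e-2, penalty="l2"))
```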

  5. Dataset Augmentation: affine distortion, elastic deformation, noise injection, random translation, horizontal flip, hue shift. (Goodfellow 2016)
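A minimal NumPy sketch of three of these augmentations (horizontal flip, random translation, additive noise) applied to an image array; the image shape, shift range, and noise level are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img, max_shift=2, noise_std=0.05):
    """Apply a random horizontal flip, translation, and noise.

    img: H x W float array in [0, 1]. The shift and noise
    magnitudes are hypothetical; tune them per dataset.
    """
    if rng.random() < 0.5:
        img = img[:, ::-1]                                 # horizontal flip
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    img = np.roll(img, shift=(dy, dx), axis=(0, 1))        # random translation
    img = img + rng.normal(0.0, noise_std, img.shape)      # noise injection
    return np.clip(img, 0.0, 1.0)

img = rng.random((28, 28))
print(augment(img).shape)  # (28, 28)
```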

  6. Multi-Task Learning. Figure 7.2: a shared representation h(shared) of the input x feeds task-specific layers h(1), h(2), and h(3); h(1) and h(2) produce the task outputs y(1) and y(2). (Goodfellow 2016)
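A minimal NumPy sketch of the shared-bottom architecture in Figure 7.2: one shared hidden layer feeding two task-specific heads. All layer sizes and weights here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical sizes: 10 inputs, 16 shared units, 8 units per task head.
W_shared = rng.normal(size=(10, 16))
W_h1, W_h2 = rng.normal(size=(16, 8)), rng.normal(size=(16, 8))
w_y1, w_y2 = rng.normal(size=8), rng.normal(size=8)

def relu(z):
    return np.maximum(z, 0.0)

def forward(x):
    """Shared representation, then task-specific layers and outputs."""
    h_shared = relu(x @ W_shared)   # shared across all tasks
    h1 = relu(h_shared @ W_h1)      # task-1-specific layer
    h2 = relu(h_shared @ W_h2)      # task-2-specific layer
    return h1 @ w_y1, h2 @ w_y2     # outputs y(1), y(2)

y1, y2 = forward(rng.normal(size=10))
print(y1, y2)
```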

  7. Learning Curves. Early stopping: terminate while validation set performance is better. Figure 7.3: loss (negative log-likelihood) versus time (epochs); the training set loss keeps falling while the validation set loss levels off and then rises, so training should stop at the validation minimum. (Goodfellow 2016)
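A minimal sketch of an early-stopping loop with a patience counter; `train_epoch` and `validate` are assumed placeholders for the real training and evaluation steps:

```python
import copy

def train_with_early_stopping(model, train_epoch, validate,
                              max_epochs=250, patience=10):
    """Stop once validation loss has not improved for `patience` epochs.

    train_epoch(model) runs one epoch of training; validate(model)
    returns a scalar validation loss. Both are hypothetical callables.
    """
    best_loss, best_model, bad_epochs = float("inf"), None, 0
    for epoch in range(max_epochs):
        train_epoch(model)
        val_loss = validate(model)
        if val_loss < best_loss:
            best_loss, bad_epochs = val_loss, 0
            best_model = copy.deepcopy(model)  # keep the best weights seen
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # validation performance stopped improving
    return best_model, best_loss
```

Returning the snapshot with the lowest validation loss, rather than the final weights, is what makes the procedure a regularizer.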

  8. Early Stopping and Weight Decay. Figure 7.4: two (w1, w2) plots, each marking the unregularized optimum w* and a solution w̃; the trajectory halted by early stopping (left) ends at a point analogous to the one selected by an L2 penalty (right). (Goodfellow 2016)
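The book (Section 7.8) makes this correspondence precise for a quadratic approximation of the objective. A sketch of the resulting relation: with learning rate \(\epsilon\) small enough that \(\epsilon\lambda_i \ll 1\) for all Hessian eigenvalues \(\lambda_i\), stopping after \(\tau\) steps matches an L2 penalty with coefficient

\[
\alpha \approx \frac{1}{\tau\,\epsilon},
\]

so the number of training steps plays the role of an inverse regularization strength: fewer steps, stronger effective weight decay.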

  9. Sparse Representations. Equation 7.47: a sparse code h represents the observation y through a dictionary B:

\[
\underbrace{\begin{bmatrix} -14 \\ 1 \\ 19 \\ 2 \\ 23 \end{bmatrix}}_{y \in \mathbb{R}^m}
=
\underbrace{\begin{bmatrix}
3 & -1 & 2 & -5 & 4 & 1 \\
4 & 2 & -3 & -1 & 1 & 3 \\
-1 & 5 & 4 & 2 & -3 & -2 \\
3 & 1 & 2 & -3 & 0 & -3 \\
-5 & 4 & -2 & 2 & -5 & -1
\end{bmatrix}}_{B \in \mathbb{R}^{m \times n}}
\underbrace{\begin{bmatrix} 0 \\ 2 \\ 0 \\ 0 \\ -3 \\ 0 \end{bmatrix}}_{h \in \mathbb{R}^n}
\tag{7.47}
\]

(Goodfellow 2016)
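A short NumPy check of equation 7.47, plus the soft-thresholding operator that an L1 penalty on h induces (the threshold value in the demo is an illustrative assumption):

```python
import numpy as np

B = np.array([[ 3, -1,  2, -5,  4,  1],
              [ 4,  2, -3, -1,  1,  3],
              [-1,  5,  4,  2, -3, -2],
              [ 3,  1,  2, -3,  0, -3],
              [-5,  4, -2,  2, -5, -1]])
h = np.array([0, 2, 0, 0, -3, 0])   # sparse code: only 2 nonzero entries
print(B @ h)                        # [-14  1  19  2  23], i.e. y

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrinks entries toward zero
    and zeroes those with |v_i| <= t, which is what creates sparsity."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

print(soft_threshold(np.array([0.3, -1.5, 0.05]), t=0.1))  # [ 0.2 -1.4  0. ]
```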

  10. Bagging. Figure 7.5: an original dataset (examples of the digit 8) is resampled with replacement into a first and a second resampled dataset, each used to train its own ensemble member. (Goodfellow 2016)
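A minimal NumPy sketch of bagging: bootstrap-resample the training set, fit one model per resample, and average their predictions. `fit_model` and its `.predict` interface are assumed placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

def bagged_predict(X_train, y_train, X_test, fit_model, k=5):
    """Train k models on bootstrap resamples and average their outputs.

    fit_model(X, y) is a hypothetical callable returning a fitted
    model with a .predict(X) method.
    """
    n = len(X_train)
    preds = []
    for _ in range(k):
        idx = rng.integers(0, n, size=n)            # sample with replacement
        model = fit_model(X_train[idx], y_train[idx])
        preds.append(model.predict(X_test))
    return np.mean(preds, axis=0)                   # ensemble average
```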

  11. Dropout. Figure 7.6: a base network with inputs x1, x2, hidden units h1, h2, and output y, shown alongside the ensemble of subnetworks formed by deleting every possible subset of the non-output units. (Goodfellow 2016)
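A minimal NumPy sketch of dropout on a layer's activations, in the common "inverted dropout" form (the book describes the equivalent weight-scaling inference rule); the keep probability is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, keep_prob=0.8, train=True):
    """Randomly zero units at training time.

    Dividing by keep_prob keeps the expected activation unchanged,
    so no rescaling is needed at test time.
    """
    if not train:
        return h
    mask = rng.random(h.shape) < keep_prob   # sample one subnetwork
    return h * mask / keep_prob

h = np.ones(8)
print(dropout(h))               # some units zeroed, rest scaled by 1/0.8
print(dropout(h, train=False))  # identity at test time
```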

  12. Adversarial Examples. Figure 7.8 (fast gradient sign method): an image x classified as “panda” with 57.7% confidence, plus a perturbation ε sign(∇x J(θ, x, y)) with ε = 0.007 (the sign image alone is classified “nematode” with 8.2% confidence), produces x + ε sign(∇x J(θ, x, y)), classified “gibbon” with 99.3% confidence. Training on adversarial examples is mostly intended to improve security, but can sometimes provide generic regularization. (Goodfellow 2016)
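A minimal NumPy sketch of the fast gradient sign method on a logistic-regression loss, chosen because its input gradient has a closed form; the weights, input, and epsilon are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps=0.007):
    """Fast gradient sign method for a logistic-regression model.

    For the negative log-likelihood J and prediction p = sigmoid(w.x + b),
    the input gradient is dJ/dx = (p - y) * w. The perturbation nudges
    x in the direction that increases the loss.
    """
    p = sigmoid(x @ w + b)
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(0)
w, b = rng.normal(size=4), 0.0
x, y = rng.normal(size=4), 1.0
print(fgsm(x, y, w, b))
```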

  13. Tangent Propagation. Figure 7.9: at a point in the (x1, x2) plane, the normal and tangent directions of the class manifold; tangent prop penalizes change in the output along the tangent direction. (Goodfellow 2016)
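The regularizer the slide illustrates, sketched in the book's notation (Section 7.14): given tangent vectors \(v^{(i)}\) of the class manifold at each example \(x\), tangent propagation adds the penalty

\[
\Omega(f) = \sum_i \left( \big(\nabla_{x} f(x)\big)^{\top} v^{(i)} \right)^2,
\]

which drives the directional derivative of the output \(f\) along each tangent direction toward zero, making \(f\) locally invariant to movement along the manifold while leaving the normal direction unconstrained.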
