Training Neural Networks: Some Considerations
Gaurav Kumar
Center for Language and Speech Processing
gkumar@cs.jhu.edu
Universal Approximators
• Neural networks can approximate any function [1].
• In practice, how well depends on:
  • Capacity
    • Number of layers
    • Hidden layer size
  • Absence of regularization
  • Optimal activation functions and hyper-parameters
  • Training data
[1] K. Hornik, M. Stinchcombe, and H. White. 1989. Multilayer feedforward networks are universal approximators. Neural Netw. 2(5) (July 1989). (Proved this for a specific class of functions.)
Universal Approximators
• We will focus on two important aspects of training:
  • Ideal properties of parameters during training
  • Generalization error
• Other things to consider:
  • Hyper-parameter optimization
  • Choice of model and loss functions
  • Learning rates (use Adadelta or Adam)
  • …
Properties of Parameters
• Responsive to activation functions
• Numerically stable
Activation Saturation
[Figure: Sigmoid and ReLU activation curves]
Initialization of weight matrices
• Are you using a non-recurrent NN?
  • Use the Xavier initialization
  • (Use small values to initialize bias vectors)
Glorot & Bengio (2010); He et al. (2015)
Initialization of weight matrices (Xavier, He)
• Tanh
• Sigmoid
• ReLU
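A minimal NumPy sketch of the commonly used scalings (assumed here, since the slide's formulas did not survive extraction): Glorot/Xavier uniform initialization draws from U[−l, l] with l = sqrt(6 / (fan_in + fan_out)) and is the usual choice for tanh, a gain of 4 is often applied for the logistic sigmoid, and He initialization draws from N(0, 2 / fan_in) for ReLU.

    import numpy as np

    def xavier_uniform(fan_in, fan_out, gain=1.0):
        # Glorot & Bengio (2010): U[-limit, +limit], limit = gain * sqrt(6 / (fan_in + fan_out))
        # gain=1.0 is the usual choice for tanh; gain=4.0 is often used for the logistic sigmoid
        limit = gain * np.sqrt(6.0 / (fan_in + fan_out))
        return np.random.uniform(-limit, limit, size=(fan_in, fan_out))

    def he_normal(fan_in, fan_out):
        # He et al. (2015): N(0, 2 / fan_in), intended for ReLU activations
        std = np.sqrt(2.0 / fan_in)
        return np.random.normal(0.0, std, size=(fan_in, fan_out))

    W_tanh = xavier_uniform(512, 256)   # hidden layer with tanh
    W_relu = he_normal(512, 256)        # hidden layer with ReLU
    b = np.zeros(256)                   # small (zero) bias initialization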
Initialization of weight matrices
• Are you using a recurrent NN?
  • With LSTMs: use the Saxe initialization
    • All weight matrices initialized to be orthonormal (Gaussian noise -> SVD)
  • Without LSTMs
    • All weight matrices initialized to the identity
Saxe et al. (2014)
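A minimal sketch of the orthonormal recipe above (sample Gaussian noise, then keep an orthonormal factor from its SVD); the square-matrix shape is an assumption for brevity.

    import numpy as np

    def orthonormal(n):
        # Saxe et al. (2014): sample Gaussian noise, then take an orthonormal factor via SVD
        a = np.random.normal(0.0, 1.0, size=(n, n))
        u, _, vt = np.linalg.svd(a)
        return u  # rows and columns are orthonormal

    W_recurrent = orthonormal(256)   # LSTM recurrent weights
    W_identity = np.eye(256)         # plain RNN: identity initialization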
Watch your input
• A high variance in input features may cause saturation very early
• Mean subtraction: subtract the per-feature mean so every feature is centered at zero
• Normalization: divide by the per-feature scale so every feature has the same scale
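A minimal sketch of both steps, assuming a design matrix with one row per example (the toy data is illustrative):

    import numpy as np

    def standardize(X, eps=1e-8):
        # X: (num_examples, num_features)
        mean = X.mean(axis=0)   # mean subtraction: center every feature at zero
        std = X.std(axis=0)     # normalization: put every feature on the same scale
        return (X - mean) / (std + eps), mean, std

    X_train = np.random.rand(1000, 20) * 100.0   # toy features with large, uneven scales
    X_train_std, mu, sigma = standardize(X_train)
    # Reuse the training statistics (mu, sigma) for validation/test data.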
Numerical stability
• Floating point precision causes values to overflow or underflow
• Instead, compute these quantities in log space (see the log-sum-exp sketch below)
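A minimal sketch of the log-sum-exp / stable-softmax trick (my example; the slide's exact expression was lost in extraction): shifting by the maximum before exponentiating avoids overflow without changing the result.

    import numpy as np

    def log_sum_exp(x):
        # log(sum(exp(x))) computed stably: shift by the max so exp() never overflows
        m = np.max(x)
        return m + np.log(np.sum(np.exp(x - m)))

    def log_softmax(x):
        # log-probabilities without forming huge or tiny intermediate values
        return x - log_sum_exp(x)

    scores = np.array([1000.0, 1001.0, 1002.0])   # naive exp() would overflow here
    print(log_softmax(scores))                    # finite, well-behaved log-probabilities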
Numerical stability
• Cross-entropy loss: L = −t log(p) − (1 − t) log(1 − p)
• Probabilities close to 0 for the correct label will cause underflow
• Use range clipping: keep all probabilities between 0.000001 and 0.999999
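A minimal sketch of the clipped binary cross-entropy above, using the 1e-6 clipping range from the slide:

    import numpy as np

    def binary_cross_entropy(p, t, eps=1e-6):
        # Clip predicted probabilities into [eps, 1 - eps] so log() never sees 0 or 1
        p = np.clip(p, eps, 1.0 - eps)
        return -(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))

    p = np.array([0.0, 0.3, 1.0])      # raw model outputs, including the dangerous 0 and 1
    t = np.array([1.0, 0.0, 1.0])
    print(binary_cross_entropy(p, t))  # finite losses thanks to clipping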
Generalization: Preventing Overfitting
Regularization
• L2 regularization
• L1 regularization
• Gradient clipping (max-norm constraints)
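A minimal sketch of an L2 weight penalty and global-norm gradient clipping (the penalty weight and clipping threshold are illustrative defaults):

    import numpy as np

    def l2_penalty(weights, lam=1e-4):
        # Adds lam * ||W||^2 for every weight matrix to the loss
        return lam * sum(np.sum(W ** 2) for W in weights)

    def clip_gradients(grads, max_norm=5.0):
        # Max-norm constraint: rescale the whole gradient if its global norm is too large
        total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
        if total_norm > max_norm:
            grads = [g * (max_norm / total_norm) for g in grads]
        return grads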
Regularization
• Perform layer-wise regularization
  • After computing the activated value of each layer, normalize it with its L2 norm
• No regularization hyper-parameters
• No waiting until back-propagation for weight penalties to flow in
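A minimal sketch of this layer-wise normalization, assuming the L2 norm is taken per example over each layer's activated output (layer sizes are illustrative):

    import numpy as np

    def l2_normalize_layer(h, eps=1e-8):
        # Normalize each example's activation vector to unit L2 norm
        norms = np.linalg.norm(h, axis=1, keepdims=True)
        return h / (norms + eps)

    def relu(x):
        return np.maximum(0.0, x)

    # Forward pass with per-layer normalization (toy two-layer example)
    X = np.random.rand(32, 100)
    W1 = np.random.randn(100, 50) * 0.1
    W2 = np.random.randn(50, 10) * 0.1
    h1 = l2_normalize_layer(relu(X @ W1))   # normalize right after the activation
    logits = h1 @ W2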
Dropout
Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15.
Dropout
Dropout
• Interpret as regularization
• Interpret as training an ensemble of thinned networks
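A minimal sketch of (inverted) dropout; the keep probability of 0.5 is a common default, not necessarily the one used on the slide:

    import numpy as np

    def dropout(h, keep_prob=0.5, train=True):
        # Inverted dropout: zero out units at random and rescale so the
        # expected activation is unchanged; do nothing at test time.
        if not train:
            return h
        mask = (np.random.rand(*h.shape) < keep_prob) / keep_prob
        return h * mask

    h = np.random.rand(32, 128)          # activations of some hidden layer
    h_train = dropout(h, keep_prob=0.5)  # thinned network during training
    h_test = dropout(h, train=False)     # full network at test time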