

  1. IAML: Artificial Neural Networks
     Chris Williams and Victor Lavrenko
     School of Informatics, Semester 1

  2. Outline
     ◮ Why multilayer artificial neural networks (ANNs)?
     ◮ Representation Power of ANNs
     ◮ Training ANNs: backpropagation
     ◮ Learning Hidden Layer Representations
     ◮ Examples
     ◮ Recurrent Neural Networks
     ◮ W & F sec 6.3, multilayer perceptrons, backpropagation (details on pp 230-232 not required), radial basis function networks

  3. Why we need multilayer networks
     ◮ Networks without hidden units are very limited in the input-output mappings they can represent
     ◮ More layers of linear units do not help: the result is still linear
     ◮ Fixed non-linearities $\phi(\mathbf{x})$ are problematic; what are good basis functions to choose?
       $$f(\mathbf{x}) = g\Big(\sum_j w_j \phi_j(\mathbf{x})\Big)$$
     ◮ We get more power from multiple layers of adaptive non-linear hidden units
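To make the contrast concrete, here is a minimal numpy sketch (not part of the slides): the first model uses fixed, hand-chosen basis functions phi, while the second replaces them with adaptive logistic hidden units whose weights are themselves learned. The function names and layer shapes are illustrative assumptions.

```python
import numpy as np

def fixed_basis_model(x, w, phi):
    """f(x) = g(sum_j w_j * phi_j(x)) with hand-chosen, fixed basis functions phi."""
    features = np.array([p(x) for p in phi])   # phi is a list of fixed functions
    return features @ w                        # g = identity here (regression)

def one_hidden_layer(x, W1, b1, w2, b2):
    """Same idea, but the 'basis functions' are logistic hidden units whose
    weights W1, b1 are learned from data rather than fixed in advance."""
    h = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))   # adaptive non-linear hidden units
    return w2 @ h + b2                         # linear output (regression)
```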

  4. Artificial Neural Networks (ANNs)
     ◮ The field of neural networks grew up out of simple models of neurons
     ◮ Research was done into what networks of these neurons could achieve
     ◮ Neural networks proved to be a reasonable modelling tool
     ◮ Which is funny really, as they never were very good models of neurons... or of neural networks
     ◮ But when understood in terms of learning from data, they proved to be powerful

  5. An example network with 2 hidden layers
     [Figure: a feedforward network with an input layer (x), hidden layer 1, hidden layer 2 and an output layer]

  6. ◮ There can be an arbitrary number of hidden layers
     ◮ Each unit in the first hidden layer computes a non-linear function of the input x
     ◮ Each unit in a higher hidden layer computes a non-linear function of the outputs of the layer below
     ◮ Common choices for the hidden-layer non-linearities are the logistic function $g(z) = 1/(1 + e^{-z})$ or the Gaussian function
     ◮ Logistic non-linearity → multilayer perceptron (MLP)
     ◮ Gaussian non-linearity → radial basis function (RBF) network, normally with only 1 hidden layer

  7. ◮ Output units compute a linear combination of the outputs of the final hidden layer and pass it through a transfer function g()
     ◮ g is the identity function for a regression task (cf linear regression)
     ◮ g is the logistic function for a two-class classification task (cf logistic regression)
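A short sketch (assumed, not from the slides) of the forward pass described on the last two slides: logistic hidden layers followed by a linear combination at the output, passed through g(). The `layers` argument and the `task` flag are hypothetical names.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, layers, task="regression"):
    """Forward pass for an MLP: each hidden layer applies the logistic function to a
    linear combination of the layer below; the output layer applies g().
    `layers` is a list of (W, b) pairs, first hidden layer first, output layer last."""
    a = x
    for W, b in layers[:-1]:
        a = logistic(W @ a + b)          # hidden units
    W_out, b_out = layers[-1]
    z = W_out @ a + b_out                # linear combination at the output
    if task == "regression":
        return z                         # g = identity (cf linear regression)
    return logistic(z)                   # g = logistic for two-class classification
```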

  8. Representation Power of ANNs
     ◮ Boolean functions:
       ◮ Every boolean function can be represented by a network with a single hidden layer,
       ◮ but it might require a number of hidden units that is exponential in the number of inputs
     ◮ Continuous functions:
       ◮ Every bounded continuous function can be approximated with arbitrarily small error by a network with one hidden layer [Cybenko 1989; Hornik et al. 1989]
       ◮ Any function can be approximated to arbitrary accuracy by a network with two hidden layers [Cybenko 1988]
     ◮ Neural networks are universal approximators

  9. ANN predicting 1 of 10 vowel sounds based on formants F1 and F2
     [Figure from Mitchell (1997)]

  10. Limitations of Representation Power Results
      ◮ The fact that a function is representable does not tell us how many hidden units would be required for its approximation
      ◮ Nor does it tell us whether it is learnable (a search problem)
      ◮ Nor does it say anything about how much training data would be needed to learn the function
      ◮ In fact universal approximation has only a limited benefit: we need inductive bias

  11. Training ANNs
      ◮ As in linear and logistic regression, we create an error function that measures the agreement of the target y(x) and the prediction f(x)
      ◮ Linear regression, squared error: $E = \sum_{i=1}^{n} (y_i - f(\mathbf{x}_i))^2$
      ◮ Logistic regression (0/1 labels): $E = \sum_{i=1}^{n} y_i \log f(\mathbf{x}_i) + (1 - y_i) \log(1 - f(\mathbf{x}_i))$
      ◮ These are both related to the log likelihood of the data under the relevant model
      ◮ For linear and logistic regression the optimization problem for w had a unique optimum; this is no longer the case for ANNs (e.g. hidden-layer neurons can be permuted)
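The two score functions above, written out as a minimal numpy sketch (not course code). The second returns the log likelihood exactly as on the slide; in practice its negative is minimized. The clipping constant eps is an added numerical-stability assumption.

```python
import numpy as np

def squared_error(y, f):
    """Linear regression: E = sum_i (y_i - f(x_i))^2."""
    return np.sum((y - f) ** 2)

def log_likelihood(y, f, eps=1e-12):
    """Logistic regression (0/1 labels): E = sum_i y_i log f(x_i) + (1 - y_i) log(1 - f(x_i)).
    This is the log likelihood; its negative is what gets minimized."""
    f = np.clip(f, eps, 1 - eps)   # assumed guard against log(0)
    return np.sum(y * np.log(f) + (1 - y) * np.log(1 - f))
```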

  12. Backpropagation
      ◮ As discussed for logistic regression, we need the gradient of E wrt all the parameters w, i.e. $g(\mathbf{w}) = \partial E / \partial \mathbf{w}$
      ◮ This is in fact an exercise in using the chain rule to compute derivatives; for ANNs this is given the name backpropagation
      ◮ We make use of the layered structure of the net to compute the derivatives, heading backwards from the output layer to the inputs
      ◮ Once you have g(w), you can use your favourite optimization routines to minimize E; see the discussion of gradient descent and other methods in the Logistic Regression slides
      ◮ It can make sense to use a regularization penalty (e.g. $\lambda |\mathbf{w}|^2$) to help control overfitting
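A compact, hand-rolled illustration of the chain-rule computation for the simplest interesting case: one logistic hidden layer, a linear output and squared error. This is a sketch, not the course's code; the variable names and shapes are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop(X, y, W1, b1, w2, b2):
    """Gradients of E = sum_i (y_i - f(x_i))^2 for a 1-hidden-layer regression net,
    computed backwards from the output layer.  X: (n, d), y: (n,),
    W1: (m, d), b1: (m,), w2: (m,), b2: scalar."""
    H = sigmoid(X @ W1.T + b1)                 # forward: hidden activations, (n, m)
    f = H @ w2 + b2                            # forward: linear outputs, (n,)
    d_out = -2.0 * (y - f)                     # dE/df_i at the output layer
    grad_w2 = H.T @ d_out                      # dE/dw2
    grad_b2 = d_out.sum()                      # dE/db2
    d_hid = np.outer(d_out, w2) * H * (1 - H)  # propagate back through the logistic
    grad_W1 = d_hid.T @ X                      # dE/dW1
    grad_b1 = d_hid.sum(axis=0)                # dE/db1
    return grad_W1, grad_b1, grad_w2, grad_b2
```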

  13. Batch vs online
      ◮ Batch learning: use all patterns in the training set, and update the weights after calculating $\frac{\partial E}{\partial \theta} = \sum_i \frac{\partial E_i}{\partial \theta}$
      ◮ On-line learning: adapt the weights after each pattern presentation, using $\partial E_i / \partial \theta$
      ◮ Batch: more powerful optimization methods
      ◮ Batch: easier to analyze
      ◮ On-line: more feasible for huge or continually growing datasets
      ◮ On-line: may have the ability to jump over local optima
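The difference can be sketched as two update rules (illustrative only; grad_fn is a hypothetical function returning dE_i/dtheta for a single pattern):

```python
import numpy as np

def batch_update(params, X, y, grad_fn, lr=0.01):
    """Batch learning: sum the per-pattern gradients dE_i/dtheta over the whole
    training set, then take one step using dE/dtheta."""
    g = sum(grad_fn(params, X[i], y[i]) for i in range(len(X)))
    return params - lr * g

def online_update(params, X, y, grad_fn, lr=0.01):
    """On-line learning: adapt the weights after each pattern presentation,
    using only dE_i/dtheta for that pattern."""
    for i in np.random.permutation(len(X)):
        params = params - lr * grad_fn(params, X[i], y[i])
    return params
```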

  14. Convergence of Backpropagation
      ◮ Dealing with local minima: train multiple nets from different starting places, then choose the best (or combine them in some way)
      ◮ Initialize weights near zero; therefore, initial networks are near-linear
      ◮ Increasingly non-linear functions become possible as training progresses

  15. Training ANNs: Summary
      ◮ Optimize over the vector of all weights/biases in the network
      ◮ All methods considered find local optima
      ◮ Gradient descent is simple but slow
      ◮ In practice, second-order methods (e.g. conjugate gradients) are used for batch learning
      ◮ Overfitting can be a problem
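As an illustration of batch training with conjugate gradients plus the regularization penalty mentioned earlier, one could hand the flattened weight vector to an off-the-shelf optimizer. This is a sketch under stated assumptions: error_fn and grad_fn are hypothetical functions that evaluate E and dE/dw for the whole training set.

```python
import numpy as np
from scipy.optimize import minimize

def train_batch(w0, error_fn, grad_fn, lam=1e-3):
    """Minimize E(w) + lam * |w|^2 over the flattened vector of all weights/biases
    using conjugate gradients.  Returns a local optimum; restart from other w0
    (or combine several runs) to deal with local minima."""
    objective = lambda w: error_fn(w) + lam * np.dot(w, w)
    gradient  = lambda w: grad_fn(w) + 2.0 * lam * w
    result = minimize(objective, w0, jac=gradient, method="CG")
    return result.x
```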

  16. Fitting this into the general structure for learning algorithms:
      ◮ Define the task: classification or regression, discriminative
      ◮ Decide on the model structure: ANN
      ◮ Decide on the score function: log likelihood
      ◮ Decide on the optimization/search method to optimize the score function: numerical optimization routine

  17. Hypothesis space and Inductive Bias for ANNs
      ◮ Hypothesis space: if there are $|\mathbf{w}|$ weights and biases, $H = \{ \mathbf{w} \mid \mathbf{w} \in \mathbb{R}^{|\mathbf{w}|} \}$
      ◮ Inductive bias: hard to characterize; it depends on the search procedure, the regularization and how weight space spans the space of representable functions
      ◮ Approximate statement: smooth interpolation between data points

  18. Learning Hidden Layer Representations
      ◮ Backprop can develop intermediate representations of its inputs in the hidden layers
      ◮ These new features will capture properties of the input instances that are most relevant to learning the target function
      ◮ This ability to automatically discover useful hidden-layer representations is a key feature of ANN learning

  19. Example 1: Neural Net Language Models
      Y. Bengio et al., JMLR 3, 1137-1155 (2003)
      ◮ Predict word $w_t$ given the preceding words $w_{t-1}$, $w_{t-2}$, etc.
      ◮ A simple way is to estimate the trigram model
        $$p(w_t = c \mid w_{t-1} = b, w_{t-2} = a) = \frac{\mathrm{count}(abc)}{\sum_{c'} \mathrm{count}(abc')}$$
      ◮ We can't use a bigger context due to sparse-data problems
      ◮ But this method uses no sharing across related words; we want a feature-based representation, so that e.g. cat and dog may share some features
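A small sketch of the maximum-likelihood trigram estimate above (toy code, not from the paper):

```python
from collections import Counter

def trigram_model(corpus):
    """Build count tables from a list of tokens and return
    p(w_t = c | w_{t-2} = a, w_{t-1} = b) = count(abc) / sum_{c'} count(abc')."""
    tri = Counter(zip(corpus, corpus[1:], corpus[2:]))
    ctx = Counter((a, b) for a, b, _ in zip(corpus, corpus[1:], corpus[2:]))
    def prob(a, b, c):
        return tri[(a, b, c)] / ctx[(a, b)] if ctx[(a, b)] else 0.0
    return prob

# p = trigram_model("the cat sat on the mat".split())
# p("the", "cat", "sat")  -> 1.0 on this toy corpus
```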

  20. [Figure: the neural network language model architecture; figure credit: Bengio et al., 2003]

  21. ◮ Learned distributed encoding of each context word
      ◮ These are transformed by a hidden layer, followed by
      ◮ a softmax distribution over all possible words
      ◮ Predictive performance is measured by perplexity (the geometric average of $1/p(w_t \mid \text{context})$)
      ◮ The neural network is about 24% better on the Brown corpus and 8% better on the AP corpus than the best n-gram results
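For reference, a minimal sketch of the two quantities mentioned: the softmax distribution over words and perplexity as the geometric average of 1/p(w_t | context). The numerical-stability tricks (max-subtraction, log-space averaging) are added assumptions.

```python
import numpy as np

def softmax(z):
    """Distribution over all possible next words from the output-layer scores z."""
    e = np.exp(z - z.max())        # subtract max for numerical stability
    return e / e.sum()

def perplexity(word_probs):
    """Geometric average of 1 / p(w_t | context), computed in log space; lower is better."""
    p = np.asarray(word_probs, dtype=float)
    return np.exp(-np.mean(np.log(p)))
```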

  22. Example 2: Le Net
      e.g. LeCun and Bengio, 1995
      ◮ The task is to recognize handwritten digits
      ◮ "Le Net" is a multilayer backprop net which has many hidden layers
      ◮ Alternation of convolutional features, followed by subsampling
      ◮ The final output is a softmax over the 10 classes
      [Figure credit: LeCun et al., 1995]
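A toy numpy sketch of the two building blocks (plain loops, not the actual LeNet code): a convolutional feature map that applies the same kernel at every position, followed by 2x2 average subsampling.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as usual in convolutional nets):
    the same kernel is applied everywhere, so a feature is detected wherever it appears."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def subsample(fmap, size=2):
    """Average-pool non-overlapping size x size blocks: a small amount of
    translational invariance at each stage."""
    H, W = fmap.shape
    H, W = H - H % size, W - W % size
    return fmap[:H, :W].reshape(H // size, size, W // size, size).mean(axis=(1, 3))
```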

  23. ◮ The convolutional approach allows the net to identify certain features, even if they have been shifted in the image
      ◮ Subsampling affords a small amount of translational invariance at each stage
      ◮ Convolutional nets give the best performance on the MNIST dataset (the best is now 0.39% error)

  24. Recurrent Neural Networks
      Connectivity does not have to be feedforward; there can be directed cycles. This can give rise to richer behaviour:
      ◮ The network can oscillate: good for motor control?
      ◮ It can converge to a point attractor: good for classification?
      ◮ It can behave chaotically: but this is usually a bad idea for information processing
      ◮ It can use its activities as hidden state, to remember things for a long time

  25. [Figure: a recurrent network with units V1, V2 and weights w11, w12, w21, w22, shown unrolled over successive time steps V'1, V'2 and V''1, V''2]
      ◮ Recurrent networks can also be trained using backpropagation
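A minimal sketch of a recurrent net's forward pass (assumed tanh units and weight names, not from the slides): the hidden activities are carried across time steps as state, and unrolling this loop over time is what lets backpropagation be applied.

```python
import numpy as np

def rnn_forward(inputs, W_in, W_rec, W_out, h0):
    """Simple recurrent net: the hidden activities h act as state carried forward
    in time, so the network can remember earlier inputs."""
    h, outputs = h0, []
    for x in inputs:                           # one time step per input vector
        h = np.tanh(W_in @ x + W_rec @ h)      # directed cycle: h feeds back into itself
        outputs.append(W_out @ h)
    return outputs, h
```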

  26. ANNs: Summary
      ◮ Artificial neural networks are a powerful nonlinear modelling tool for classification and regression
      ◮ Trained by optimization methods making use of the backpropagation algorithm to compute derivatives
      ◮ Local optima are present in the optimization, cf. linear and logistic regression (and kernelized versions thereof, e.g. SVM)
      ◮ Ability to automatically discover useful hidden-layer representations
