Perceptrons
Steven J Zeil
Old Dominion Univ.
Fall 2010

Outline
1 Introduction: Neural Networks
2 The Perceptron
    Using Perceptrons
    Training
3 Multilayer Perceptrons
    Structure
4 Training MLPs
    Backpropagation
    Improving Convergence
    OverTraining
    Tuning Network Size
5 Applying MLPs
    Structuring Networks
    Dimensionality Reduction
    Time Delay Neural Networks
    Recurrent Networks

Neural Networks
Networks of processing units (neurons) with connections (synapses) between them:
- Large number of neurons: $10^{10}$
- Large connectivity: $10^5$
- Parallel processing
- Robust

Computing via NN
- Not so much an attempt to imitate the brain as inspired by it
- A model for massively parallel processing
- Simplest building block: the perceptron
The Perceptron
(Rosenblatt, 1962)
- $w_i$ are connection weights
- $y = \vec{w}^T \vec{x}$
- $\vec{w} = [w_0, w_1, \ldots, w_d]$
- $\vec{x} = [1, x_1, x_2, \ldots, x_d]$ (the augmented input vector)

Basic Uses
- $y = \vec{w}^T \vec{x} + w_0$
- Linear regression
- Linear discriminant between 2 classes
- Use multiple perceptrons for $K > 2$ classes

Perceptron Output Functions
- Many perceptrons have a "post-processing" function at the output node. A common choice is the threshold:
  $$y = \begin{cases} 1 & \text{if } \vec{w}^T\vec{x} > 0 \\ 0 & \text{otherwise} \end{cases}$$
- Useful for classification (a minimal sketch follows below).
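To make the threshold perceptron concrete, here is a minimal sketch in Python/NumPy. It is an illustration, not code from the slides; the function name and the hand-picked AND weights are my own assumptions.

```python
import numpy as np

def perceptron_output(w, x):
    """Threshold perceptron: y = 1 if w^T x > 0, else 0.

    w : augmented weight vector [w0, w1, ..., wd] (w0 is the bias weight)
    x : raw input [x1, ..., xd]; a leading 1 is prepended to form the
        augmented input vector.
    """
    x_aug = np.concatenate(([1.0], x))      # augmented input: x0 = 1
    return 1 if w @ x_aug > 0 else 0

# Example: a hand-picked weight vector that realizes logical AND on {0,1}^2
w_and = np.array([-1.5, 1.0, 1.0])
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron_output(w_and, np.array(x, dtype=float)))
```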
Sigmoid Output Functions
- Useful when we need differentiation or the ability to estimate posterior probabilities:
  $$y = \mathrm{sigmoid}(o) = \frac{1}{1 + \exp[-\vec{w}^T\vec{x}]}$$

K Classes
- $o_i = \vec{w}_i^T \vec{x}$
- Use softmax:
  $$y_i = \frac{\exp o_i}{\sum_k \exp o_k}$$
- Choose $C_i$ if $y_i = \max_k y_k$

Training
- Allows online (incremental) training rather than the usual batch training
- No need to store the whole sample
- Adjusts to slow changes in the problem domain
- Incremental form of gradient descent: update after each training instance, stepping against the gradient
- LMS update: $\Delta w_{ij}^t = \eta\,(r_i^t - y_i^t)\,x_j^t$
- $\eta$ is the learning factor; its size controls the rate of convergence and stability

Update Rule: Regression
- The error function is
  $$E^t(\vec{w}\,|\,\vec{x}^t, r^t) = \frac{1}{2}\left(r^t - \vec{w}^T\vec{x}^t\right)^2$$
  with gradient components
  $$\frac{\partial E^t}{\partial w_j} = -\left(r^t - \vec{w}^T\vec{x}^t\right) x_j^t = -(r^t - y^t)\, x_j^t$$
- Therefore, to move against the gradient:
  $$\Delta w_j^t = \eta\,(r^t - y^t)\, x_j^t$$
  (or, with multiple outputs, $\Delta w_{ij}^t = \eta\,(r_i^t - y_i^t)\, x_j^t$), as in the sketch below.
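Here is a minimal sketch of the online LMS update for the regression case, assuming the augmented-input convention above; the function name, default η, and epoch count are illustrative assumptions rather than values from the slides.

```python
import numpy as np

def lms_online(X, r, eta=0.01, epochs=50, rng=None):
    """Online (incremental) LMS training of a linear perceptron.

    X : (N, d) array of raw inputs; r : (N,) array of real-valued targets.
    Returns the learned augmented weight vector [w0, w1, ..., wd].
    """
    rng = rng or np.random.default_rng(0)
    N, d = X.shape
    X_aug = np.hstack([np.ones((N, 1)), X])       # prepend the bias input x0 = 1
    w = rng.uniform(-0.01, 0.01, size=d + 1)      # small random initial weights
    for _ in range(epochs):
        for t in rng.permutation(N):              # one instance at a time
            y = w @ X_aug[t]                      # current prediction
            w += eta * (r[t] - y) * X_aug[t]      # Delta w_j = eta (r - y) x_j
    return w
```

With a small enough η the weights settle near the least-squares solution; too large an η makes the updates oscillate, which is the stability trade-off mentioned above.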
Update Rule: Classification
- $\Delta w_{ij}^t = \eta\,(r_i^t - y_i^t)\,x_j^t$
- For $K = 2$, using $y^t = \mathrm{sigmoid}(\vec{w}^T\vec{x})$ leads to the same update function as for regression.
- For $K > 2$, softmax leads to the same update as well.

Example: Learning Boolean Functions
- A spreadsheet example demonstrates that perceptrons can learn linearly separable functions (AND, OR, NAND, ...) but cannot learn XOR.
- Minsky & Papert, 1969
- This result nearly halted all work on neural networks until 1982.

Multilayer Perceptrons
- Adds one or more hidden layers (Rumelhart et al., 1986):
  $$y_i = \vec{v}_i^T\vec{z} = \sum_{h=1}^{H} v_{ih} z_h + v_{i0}$$
  $$z_h = \mathrm{sigmoid}(\vec{w}_h^T\vec{x}) = \frac{1}{1 + \exp\left[-\left(\sum_{j=1}^{d} w_{hj} x_j + w_{h0}\right)\right]}$$
- A forward-pass sketch follows below.
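The following is a minimal NumPy sketch of the forward pass defined by the two MLP equations above; the matrix shapes, names, and the convention of storing bias weights in column 0 are my own assumptions.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def mlp_forward(W, V, x):
    """Forward pass of a single-hidden-layer MLP.

    W : (H, d+1) hidden-layer weights, row h = [w_h0, w_h1, ..., w_hd]
    V : (K, H+1) output-layer weights, row i = [v_i0, v_i1, ..., v_iH]
    x : (d,) raw input vector.
    Returns (y, z): linear outputs y_i and hidden activations z_h.
    """
    x_aug = np.concatenate(([1.0], x))   # augmented input, x0 = 1
    z = sigmoid(W @ x_aug)               # z_h = sigmoid(w_h^T x)
    z_aug = np.concatenate(([1.0], z))   # augmented hidden vector, z0 = 1
    y = V @ z_aug                        # y_i = sum_h v_ih z_h + v_i0
    return y, z
```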
Learning XOR
(Figure: an MLP solving the XOR problem.)

MLP as a Universal Approximator
- Any function with continuous inputs and outputs can be approximated by an MLP.
- Given two hidden layers, we can use one to divide the input domain and the other to compute a piecewise linear regression function over those regions.
- The hidden layers may need to be arbitrarily wide.

Training MLPs: Backpropagation
- Recall the MLP structure:
  $$y_i = \vec{v}_i^T\vec{z} = \sum_{h=1}^{H} v_{ih} z_h + v_{i0}, \qquad z_h = \mathrm{sigmoid}(\vec{w}_h^T\vec{x})$$
- Given the $z$ values, we could train the $\vec{v}$ as we do a single-layer perceptron:
  $$\Delta v_h = \eta \sum_t (r^t - y^t)\, z_h^t$$
  (a sketch of this output-layer update follows below).
- How do we get the $W$?
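As a small illustration of the point above, here is a sketch (under my own naming assumptions) of the output-layer update alone, treating the hidden activations as if they were the inputs of a single-layer perceptron.

```python
import numpy as np

def output_layer_update(V, Z_aug, R, Y, eta=0.1):
    """Delta-rule update for the output weights of an MLP, with the hidden
    activations held fixed.

    V     : (K, H+1) output weights
    Z_aug : (N, H+1) hidden activations for all instances, column 0 = 1
    R, Y  : (N, K) targets and current outputs
    Implements Delta v_ih = eta * sum_t (r_i^t - y_i^t) z_h^t.
    """
    return V + eta * (R - Y).T @ Z_aug
```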
Backpropagation (cont.)
To update the hidden-layer weights $w_{hj}$, apply the chain rule:
$$\Delta w_{hj} = -\eta \frac{\partial E}{\partial w_{hj}}
= -\eta \sum_t \frac{\partial E^t}{\partial y^t}\,\frac{\partial y^t}{\partial z_h^t}\,\frac{\partial z_h^t}{\partial w_{hj}}$$
$$= -\eta \sum_t -(r^t - y^t)\, v_h\, \frac{\partial z_h^t}{\partial w_{hj}}
= \eta \sum_t (r^t - y^t)\, v_h\, z_h^t (1 - z_h^t)\, x_j^t$$

Backpropagation Algorithm
Initialize all $v_{ih}$ and $w_{hj}$ to rand(-0.01, 0.01)
repeat
    for all $(\vec{x}^t, r^t) \in X$, in random order do
        for h = 1, ..., H do
            $z_h \leftarrow \mathrm{sigmoid}(\vec{w}_h^T\vec{x}^t)$
        end for
        for i = 1, ..., K do
            $y_i \leftarrow \vec{v}_i^T\vec{z}$
        end for
        for i = 1, ..., K do
            $\Delta\vec{v}_i \leftarrow \eta\,(r_i^t - y_i^t)\,\vec{z}$
        end for
        for h = 1, ..., H do
            $\Delta\vec{w}_h \leftarrow \eta\left(\sum_i (r_i^t - y_i^t)\, v_{ih}\right) z_h (1 - z_h)\,\vec{x}^t$
        end for
        for i = 1, ..., K do
            $\vec{v}_i \leftarrow \vec{v}_i + \Delta\vec{v}_i$
        end for
        for h = 1, ..., H do
            $\vec{w}_h \leftarrow \vec{w}_h + \Delta\vec{w}_h$
        end for
    end for
until convergence
(A runnable NumPy sketch of this algorithm appears after the figure below.)

Applying Backpropagation
- Batch learning: make multiple passes over the entire sample
  - Update $\vec{v}$ and $\vec{w}$ after each entire pass
  - Each pass is called an epoch
- Online learning: one pass, smaller $\eta$

Example of Batch Learning
(Figure: batch learning example.)
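Below is a minimal runnable sketch of the online backpropagation algorithm given above, written in NumPy. It follows the pseudocode's update rules, but the interface (function name, arguments, and a fixed epoch count in place of the convergence test) is my own assumption, not part of the slides.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop_train(X, R, H=3, eta=0.1, epochs=1000, rng=None):
    """Online backpropagation for a single-hidden-layer MLP with linear outputs.

    X : (N, d) inputs, R : (N, K) targets.
    Returns (W, V): hidden weights (H, d+1) and output weights (K, H+1).
    """
    rng = rng or np.random.default_rng(0)
    N, d = X.shape
    K = R.shape[1]
    W = rng.uniform(-0.01, 0.01, size=(H, d + 1))   # w_hj, column 0 = bias
    V = rng.uniform(-0.01, 0.01, size=(K, H + 1))   # v_ih, column 0 = bias

    for _ in range(epochs):                         # fixed epochs approximate "until convergence"
        for t in rng.permutation(N):                # instances in random order
            x = np.concatenate(([1.0], X[t]))       # augmented input
            z = sigmoid(W @ x)                      # z_h = sigmoid(w_h^T x)
            z_aug = np.concatenate(([1.0], z))
            y = V @ z_aug                           # y_i = v_i^T z
            e = R[t] - y                            # r_i - y_i
            dV = eta * np.outer(e, z_aug)           # Delta v_i = eta (r_i - y_i) z
            # Delta w_h = eta (sum_i (r_i - y_i) v_ih) z_h (1 - z_h) x
            dW = eta * np.outer((V[:, 1:].T @ e) * z * (1 - z), x)
            V += dV
            W += dW
    return W, V

# Example: learning XOR, tying back to the earlier slide
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
R = np.array([[0], [1], [1], [0]], dtype=float)
W, V = backprop_train(X, R, H=3, eta=0.5, epochs=5000)
```

With enough hidden units and epochs the outputs should approach the 0/1 targets, though convergence is not guaranteed for every random initialization.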
Multiple Hidden Levels
- Multiple hidden levels are possible.
- Backpropagation generalizes to any number of levels.

Improving Convergence
- Momentum: attempts to damp out oscillations by averaging in the "trend" of prior updates:
  $$\Delta w_i^t = -\eta \frac{\partial E^t}{\partial w_i} + \alpha\, \Delta w_i^{t-1}, \qquad 0.5 \le \alpha < 1.0$$
  (a sketch follows at the end of this section).
- Adaptive learning rate: keep $\eta$ large while learning is making progress, decreasing it later:
  $$\Delta\eta = \begin{cases} +a & \text{if } E^{t+\tau} < E^t \\ -b\,\eta & \text{otherwise} \end{cases}$$
- Note that the increase is arithmetic, but the decrease is geometric.

OverTraining
- MLPs are subject to overtraining, partly due to the large number of parameters, but it is also a function of training time:
  - The $w_i$ start near zero; in effect those parameters are ignored.
  - Early training steps move the more important attributes' weights away from zero.
  - As training continues, we start fitting to noise by moving the weights of less important attributes away from zero.
  - In effect, more parameters are added to the model over time.

Overtraining Example
(Figure: overtraining example.)
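A minimal sketch of the momentum update above; `grad_E` in the usage comment is a hypothetical placeholder for whatever computes the gradient, and the default η and α values are illustrative assumptions.

```python
import numpy as np

def momentum_step(w, grad, prev_delta, eta=0.1, alpha=0.9):
    """One gradient step with momentum:
    Delta w^t = -eta * dE/dw + alpha * Delta w^(t-1).
    Returns the updated weights and the update just applied (reused next step).
    """
    delta = -eta * grad + alpha * prev_delta
    return w + delta, delta

# Usage sketch, assuming a hypothetical grad_E(w) that computes dE/dw:
# delta = np.zeros_like(w)
# for _ in range(epochs):
#     w, delta = momentum_step(w, grad_E(w), delta, eta=0.1, alpha=0.9)
```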