Artificial Neural Networks - Oliver Schulte, CMPT 726 (Slide Presentation)

  1. Artificial Neural Networks. Oliver Schulte - CMPT 726. Outline: Feed-forward Networks, Network Training, Error Backpropagation, Applications.

  2. Neural Networks
  • Neural networks arise from attempts to model human/animal brains.
  • There are many models, and many claims of biological plausibility.
  • We will focus on multi-layer perceptrons: mathematical properties rather than biological plausibility.
  • For biologically motivated models, see Prof. Hadley's CMPT 418.

  3. Uses of Neural Networks
  • Pros
    • Good for continuous input variables.
    • General continuous function approximators.
    • Highly non-linear.
    • Learn feature functions.
    • Good to use in continuous domains with little knowledge:
      • When you don't know good features.
      • When you don't know the form of a good functional model.
  • Cons
    • Not interpretable, a "black box".
    • Learning is slow.
    • Good generalization can require many data points.

  4. Applications
  There are many, many applications.
  • World-champion backgammon player (TD-Gammon): http://en.wikipedia.org/wiki/TD-Gammon , http://en.wikipedia.org/wiki/Backgammon
  • "No Hands Across America" tour: http://www.cs.cmu.edu/afs/cs/usr/tjochem/www/nhaa/nhaa_home_page.html
  • Digit recognition with 99.26% accuracy.
  • ...

  5. Outline
  • Feed-forward Networks
  • Network Training
  • Error Backpropagation
  • Applications

  7. Feed-forward Networks
  • We have looked at generalized linear models of the form:
    y(x, w) = f( \sum_{j=1}^{M} w_j \phi_j(x) )
    for fixed non-linear basis functions \phi(\cdot).
  • We now extend this model by allowing adaptive basis functions, and learning their parameters.
  • In feed-forward networks (a.k.a. multi-layer perceptrons) we let each basis function be another non-linear function of a linear combination of the inputs:
    \phi_j(x) = f( ... )
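To make the distinction concrete, here is a minimal Python/NumPy sketch (not from the slides) of a generalized linear model with fixed basis functions; the Gaussian bumps, their centres, and the weights are assumptions chosen purely for illustration. In the networks that follow, the basis functions themselves get learnable parameters.

```python
import numpy as np

# Fixed, non-adaptive basis functions: Gaussian bumps at assumed centres.
centres = np.array([-1.0, 0.0, 1.0])   # illustrative choice
width = 0.5                             # illustrative choice

def phi(x):
    """Fixed basis functions phi_j(x); nothing here is learned."""
    return np.exp(-(x - centres) ** 2 / (2 * width ** 2))

def y(x, w, f=np.tanh):
    """Generalized linear model y(x, w) = f(sum_j w_j * phi_j(x))."""
    return f(np.dot(w, phi(x)))

w = np.array([0.3, -0.8, 0.5])          # only these weights are learned
print(y(0.2, w))
```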

  9. Feed-forward Networks
  • Starting with input x = (x_1, ..., x_D), construct linear combinations:
    a_j = \sum_{i=1}^{D} w_{ji}^{(1)} x_i + w_{j0}^{(1)}
    These a_j are known as activations.
  • Pass through an activation function h(\cdot) to get the unit output z_j = h(a_j).
  • (Figure: model of an individual neuron, from Russell and Norvig, AIMA 2e.)
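As a small illustration (not part of the slides), the two equations above amount to a weighted sum plus bias followed by a non-linearity; the input, weights, bias, and the choice h = tanh below are all assumed values for the example.

```python
import numpy as np

x = np.array([0.5, -1.0, 2.0, 0.0])    # assumed input x = (x_1, ..., x_D), D = 4
w_j = np.array([0.2, -0.4, 0.1, 0.7])  # assumed weights w^(1)_{ji} for one unit j
w_j0 = -0.3                            # assumed bias w^(1)_{j0}

a_j = np.dot(w_j, x) + w_j0            # activation a_j = sum_i w_ji x_i + w_j0
z_j = np.tanh(a_j)                     # unit output z_j = h(a_j), here h = tanh
print(a_j, z_j)
```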

  12. Activation Functions
  • Can use a variety of activation functions:
    • Sigmoidal (S-shaped)
      • Logistic sigmoid 1 / (1 + exp(-a)) (useful for binary classification)
      • Hyperbolic tangent tanh(a)
    • Radial basis function z_j = \sum_i (x_i - w_{ji})^2
    • Softmax (useful for multi-class classification)
    • Hard threshold
    • ...
  • Should be differentiable for gradient-based learning (later).
  • Can use different activation functions in each unit.
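A minimal NumPy sketch (not from the slides) of a few of these activation functions; the hard threshold here maps to {0, 1}, which is one common convention and an assumption of the example.

```python
import numpy as np

def logistic(a):
    """Logistic sigmoid: 1 / (1 + exp(-a)), squashes to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    """Softmax over a vector of activations; useful for multi-class outputs."""
    e = np.exp(a - np.max(a))        # subtract the max for numerical stability
    return e / e.sum()

def hard_threshold(a):
    """Step function; not differentiable, so unsuitable for gradient learning."""
    return (a > 0).astype(float)

a = np.array([-1.0, 0.5, 2.0])
print(logistic(a), np.tanh(a), softmax(a), hard_threshold(a))
```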

  13. Feed-forward Networks
  (Figure: a network with inputs x_1, ..., x_D, one layer of hidden units z_1, ..., z_M, outputs y_1, ..., y_K, first-layer weights w^(1), second-layer weights w^(2), and bias units x_0 and z_0.)
  • Connect together a number of these units into a feed-forward network (a DAG).
  • The figure above shows a network with one layer of hidden units.
  • It implements the function:
    y_k(x, w) = h( \sum_{j=1}^{M} w_{kj}^{(2)} h( \sum_{i=1}^{D} w_{ji}^{(1)} x_i + w_{j0}^{(1)} ) + w_{k0}^{(2)} )
  • See http://aispace.org/neural/ .
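A minimal NumPy sketch (not from the slides) of the one-hidden-layer function above; the layer sizes D, M, K, the random weights, and the choice h = tanh are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
D, M, K = 3, 5, 2                      # assumed layer sizes

W1 = rng.normal(size=(M, D))           # first-layer weights w^(1)_{ji}
b1 = rng.normal(size=M)                # first-layer biases  w^(1)_{j0}
W2 = rng.normal(size=(K, M))           # second-layer weights w^(2)_{kj}
b2 = rng.normal(size=K)                # second-layer biases  w^(2)_{k0}

def forward(x, h=np.tanh):
    """y_k(x, w) = h( sum_j w2_kj * h( sum_i w1_ji x_i + w1_j0 ) + w2_k0 )."""
    z = h(W1 @ x + b1)                 # hidden-unit outputs z_j
    return h(W2 @ z + b2)              # network outputs y_k

x = rng.normal(size=D)
print(forward(x))
```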

  14. A General Network
  (Figure: a general feed-forward network with input x = (x_1, ..., x_d), hidden units y_1, ..., y_{n_H} connected to the inputs by weights w_ji, output units z_1, ..., z_c connected to the hidden layer by weights w_kj, and targets t_1, ..., t_c.)

  15. The XOR Problem Revisited
  (Figure: the XOR problem in the (x_1, x_2) plane with inputs in {-1, +1}; the region R_1 where z = +1 and the regions labelled R_2 where z = -1 cannot be separated by a single linear boundary.)

  16. The XOR Problem Solved
  (Figure: a two-layer network with inputs x_1, x_2, hidden units y_1, y_2, bias units, hidden weights w_ji, and output weights w_kj computes XOR; each hidden unit implements one linear decision boundary and the output unit z combines them, mapping the four input patterns to the correct z = ±1.)
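The exact weights in the figure do not survive the transcription, so the sketch below (not the slide's values) uses one standard hand-chosen weight assignment for a hard-threshold network on {0, 1} inputs that computes XOR.

```python
import numpy as np

def step(a):
    """Hard-threshold activation: 1 if a > 0 else 0."""
    return (a > 0).astype(float)

# Hand-chosen weights (illustrative only, not the slide's values).
W1 = np.array([[1.0, 1.0],    # hidden unit 1 ~ OR(x1, x2)
               [1.0, 1.0]])   # hidden unit 2 ~ AND(x1, x2)
b1 = np.array([-0.5, -1.5])
W2 = np.array([[1.0, -1.0]])  # output ~ OR and not AND  =>  XOR
b2 = np.array([-0.5])

def xor_net(x):
    y = step(W1 @ x + b1)     # hidden-layer outputs
    return step(W2 @ y + b2)[0]

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, int(xor_net(np.array([x1, x2], dtype=float))))
```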

  17. Hidden Units Compute Basis Functions
  (Figure: fits to data, where red dots = network function, dashed lines = hidden unit activation functions, blue dots = data points.)
  The network function is roughly the sum of the hidden units' activation functions.

  18. Hidden Units As Feature Extractors
  (Figure: sample training patterns and the learned input-to-hidden weights.)
  • 64 input nodes
  • 2 hidden units
  • Learned weight matrix at the hidden units

  19. Outline
  • Feed-forward Networks
  • Network Training
  • Error Backpropagation
  • Applications

  20. Network Training
  • Given a specified network structure, how do we set its parameters (weights)?
  • As usual, we define a criterion to measure how well our network performs, and optimize against it.
  • For regression, the training data are (x_n, t_n), with t_n \in \mathbb{R}.
  • Squared error naturally arises:
    E(w) = \sum_{n=1}^{N} \{ y(x_n, w) - t_n \}^2
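A rough sketch (not from the slides) of evaluating this error function for a one-hidden-layer network; the network shape, weights, and data below are hypothetical and exist only so the example runs.

```python
import numpy as np

rng = np.random.default_rng(2)

def y(x, W1, b1, W2, b2, h=np.tanh):
    """Assumed one-hidden-layer network y(x, w) with h = tanh (illustrative)."""
    return h(W2 @ h(W1 @ x + b1) + b2)

def squared_error(X, t, params):
    """E(w) = sum_n { y(x_n, w) - t_n }^2 for scalar targets t_n."""
    return sum((y(x_n, *params)[0] - t_n) ** 2 for x_n, t_n in zip(X, t))

# Hypothetical data and weights, only to make the example runnable.
D, M = 2, 3
X = rng.normal(size=(5, D))
t = rng.normal(size=5)
params = (rng.normal(size=(M, D)), rng.normal(size=M),
          rng.normal(size=(1, M)), rng.normal(size=1))
print(squared_error(X, t, params))
```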

  22. Parameter Optimization
  (Figure: the error surface E(w) over weight space (w_1, w_2), with points w_A, w_B, w_C and the gradient ∇E.)
  • For either of these problems, the error function E(w) is nasty.
  • Nasty = non-convex.
  • Non-convex = has local minima.

  23. Descent Methods
  • The typical strategy for optimization problems of this sort is a descent method:
    w^{(\tau+1)} = w^{(\tau)} + \eta \, \Delta w^{(\tau)}
  • These come in many flavours, differing in how the step \Delta w^{(\tau)} is chosen:
    • Gradient descent: based on \nabla E(w^{(\tau)})
    • Stochastic gradient descent: based on \nabla E_n(w^{(\tau)}) for a single data point n
    • Newton-Raphson (second order): also uses \nabla^2 E
  • All of these can be used here; stochastic gradient descent is particularly effective.
    • Redundancy in training data, escaping local minima.
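As a toy illustration (not from the slides), here is plain gradient descent on the squared error for a single-weight linear model, where the gradient has a simple closed form; the data, learning rate, and iteration count are assumptions. Stochastic gradient descent would instead step along the gradient of a single term E_n at a time.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical 1-D regression data (assumption, just for illustration).
x = rng.normal(size=20)
t = 1.7 * x + 0.1 * rng.normal(size=20)

def E(w):
    """Squared error E(w) = sum_n (w * x_n - t_n)^2 for a single-weight model."""
    return np.sum((w * x - t) ** 2)

def grad_E(w):
    """dE/dw = sum_n 2 * (w * x_n - t_n) * x_n."""
    return np.sum(2 * (w * x - t) * x)

eta = 0.01                   # assumed learning rate
w = 0.0
for tau in range(100):       # gradient descent step: w <- w - eta * dE/dw
    w = w - eta * grad_E(w)
print(w, E(w))               # w should approach roughly 1.7
```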
