
Neural Networks
Oliver Schulte, CMPT 726 (Bishop PRML Ch. 5)


Neural Networks
• Neural networks arise from attempts to model human/animal brains.
• Many models exist, with many claims of biological plausibility.
• We will focus on multi-layer perceptrons and their mathematical properties rather than biological plausibility.
• For biologically motivated models, see Prof. Hadley's CMPT 418.

Uses of Neural Networks
• Pros
  • Good for continuous input variables.
  • General continuous function approximators.
  • Highly non-linear.
  • Learn feature functions.
  • Good to use in continuous domains with little prior knowledge:
    • when you don't know good features;
    • when you don't know the form of a good functional model.
• Cons
  • Not interpretable; a "black box".
  • Learning is slow.
  • Good generalization can require many data points.

Applications
There are many, many applications.
• World-champion backgammon player.
• "No Hands Across America" driving tour.
• Digit recognition with 99.26% accuracy.
• ...

Outline
• Feed-forward Networks
• Network Training
• Error Backpropagation
• Applications


Feed-forward Networks
• We have looked at generalized linear models of the form

    y(x, w) = f\left( \sum_{j=1}^{M} w_j \phi_j(x) \right)

  for fixed non-linear basis functions \phi(\cdot).
• We now extend this model by allowing adaptive basis functions and learning their parameters.
• In feed-forward networks (a.k.a. multi-layer perceptrons) we let each basis function be another non-linear function of a linear combination of the inputs:

    \phi_j(x) = f\left( \sum_{i} w_{ji} x_i + w_{j0} \right)

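For reference, a minimal sketch (Python/NumPy, not from the slides) of the fixed-basis generalized linear model above; the Gaussian basis functions, their centres, and the identity choice of f are illustrative assumptions:

```python
import numpy as np

def gaussian_basis(x, mu, s=1.0):
    """Fixed non-linear basis function phi_j(x) with centre mu and width s."""
    return np.exp(-np.sum((x - mu) ** 2) / (2 * s ** 2))

def glm_predict(x, w, centres, f=lambda a: a):
    """y(x, w) = f( sum_j w_j * phi_j(x) ), with phi_0(x) = 1 as the bias basis."""
    phi = np.array([1.0] + [gaussian_basis(x, mu) for mu in centres])
    return f(np.dot(w, phi))

# Example: three fixed basis functions on a 2-D input.
centres = [np.array([0.0, 0.0]), np.array([1.0, 1.0]), np.array([-1.0, 1.0])]
w = np.array([0.1, 0.5, -0.3, 0.8])        # one weight per basis (incl. bias)
print(glm_predict(np.array([0.5, 0.5]), w, centres))
```

The basis functions here stay fixed; the point of the next slides is to make them adaptive by giving each one its own learned weights.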

Feed-forward Networks
• Starting with input x = (x_1, \ldots, x_D), construct linear combinations

    a_j = \sum_{i=1}^{D} w^{(1)}_{ji} x_i + w^{(1)}_{j0}

  These a_j are known as activations.
• Pass each through an activation function h(\cdot) to get the unit output z_j = h(a_j).
• This is the model of an individual neuron (figure from Russell and Norvig, AIMA 2e).

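A minimal sketch of a single unit as defined above, assuming a tanh activation h; the weights and inputs are made-up numbers:

```python
import numpy as np

def unit_output(x, w, w0, h=np.tanh):
    """Compute a_j = sum_i w_ji * x_i + w_j0, then z_j = h(a_j)."""
    a = np.dot(w, x) + w0      # activation a_j
    return h(a)                # unit output z_j

x = np.array([0.2, -0.5, 1.0])         # input (x_1, ..., x_D)
w = np.array([0.4, 0.1, -0.7])         # first-layer weights w_ji for unit j
print(unit_output(x, w, w0=0.05))
```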

Activation Functions
• A variety of activation functions can be used:
  • Sigmoidal (S-shaped)
    • Logistic sigmoid 1/(1 + \exp(-a)) (useful for binary classification)
    • Hyperbolic tangent \tanh
  • Radial basis function z_j = \sum_i (x_i - w_{ji})^2
  • Softmax (useful for multi-class classification)
  • Identity (useful for regression)
  • Threshold
  • ...
• The activation function needs to be differentiable for gradient-based learning (later).
• Different activation functions can be used in each unit.
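Minimal sketches of several of these activation functions (softmax acts on a whole vector of activations, the others elementwise):

```python
import numpy as np

def logistic(a):                    # sigmoid, for binary classification outputs
    return 1.0 / (1.0 + np.exp(-a))

def tanh(a):                        # hyperbolic tangent, a rescaled sigmoid
    return np.tanh(a)

def identity(a):                    # identity, for regression outputs
    return a

def softmax(a):                     # multi-class classification outputs
    e = np.exp(a - np.max(a))       # shift for numerical stability
    return e / np.sum(e)

def threshold(a):                   # step function; not differentiable at 0
    return np.where(a > 0, 1.0, 0.0)

print(logistic(0.0), softmax(np.array([1.0, 2.0, 3.0])))
```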

Feed-forward Networks
[Network diagram: inputs x_0, x_1, ..., x_D; hidden units z_0, z_1, ..., z_M; outputs y_1, ..., y_K; first-layer weights w^{(1)}, second-layer weights w^{(2)}]
• Connect a number of these units together into a feed-forward network (a DAG).
• The network above has one layer of hidden units.
• It implements the function

    y_k(x, w) = h\left( \sum_{j=1}^{M} w^{(2)}_{kj} \, h\left( \sum_{i=1}^{D} w^{(1)}_{ji} x_i + w^{(1)}_{j0} \right) + w^{(2)}_{k0} \right)
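A minimal sketch of this one-hidden-layer forward pass, assuming tanh hidden units and a logistic-sigmoid output; the bias weights w^{(1)}_{j0}, w^{(2)}_{k0} are stored as the first column of each weight matrix, and the random weight values are placeholders:

```python
import numpy as np

def forward(x, W1, W2, h=np.tanh, out=lambda a: 1.0 / (1.0 + np.exp(-a))):
    """One-hidden-layer network: y_k(x, w) as in the formula above."""
    x_ext = np.concatenate(([1.0], x))      # prepend x_0 = 1 for the bias
    z = h(W1 @ x_ext)                       # hidden unit outputs z_j
    z_ext = np.concatenate(([1.0], z))      # prepend z_0 = 1 for the bias
    return out(W2 @ z_ext)                  # network outputs y_k

rng = np.random.default_rng(0)
D, M, K = 3, 4, 2                           # input, hidden, output dimensions
W1 = rng.normal(size=(M, D + 1))            # first-layer weights (bias in column 0)
W2 = rng.normal(size=(K, M + 1))            # second-layer weights (bias in column 0)
print(forward(np.array([0.1, -0.2, 0.3]), W1, W2))
```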

Hidden Units Compute Basis Functions
[Figure: network function fit to data]
• red dots = network function
• dashed lines = hidden unit activation functions
• blue dots = data points

Outline
• Feed-forward Networks
• Network Training
• Error Backpropagation
• Applications

Network Training
• Given a specified network structure, how do we set its parameters (weights)?
• As usual, we define a criterion that measures how well our network performs and optimize against it.
• For regression, the training data are (x_n, t_n) with t_n \in \mathbb{R}, and the squared error arises naturally:

    E(w) = \sum_{n=1}^{N} \{ y(x_n, w) - t_n \}^2

• For binary classification, this is another discriminative model; maximum likelihood gives

    p(t \mid w) = \prod_{n=1}^{N} y_n^{t_n} \{ 1 - y_n \}^{1 - t_n}

    E(w) = -\sum_{n=1}^{N} \{ t_n \ln y_n + (1 - t_n) \ln(1 - y_n) \}

  where y_n = y(x_n, w).

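A minimal sketch of both error functions, with placeholder network outputs y_n standing in for y(x_n, w):

```python
import numpy as np

def squared_error(y, t):
    """E(w) = sum_n ( y(x_n, w) - t_n )^2, for regression targets t_n in R."""
    return np.sum((y - t) ** 2)

def cross_entropy(y, t, eps=1e-12):
    """E(w) = -sum_n [ t_n ln y_n + (1 - t_n) ln(1 - y_n) ], for t_n in {0, 1}."""
    y = np.clip(y, eps, 1 - eps)            # avoid log(0)
    return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

y = np.array([0.9, 0.2, 0.7])               # placeholder outputs y_n = y(x_n, w)
print(squared_error(y, np.array([1.0, 0.0, 0.5])))
print(cross_entropy(y, np.array([1, 0, 1])))
```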

Parameter Optimization
[Figure: error surface E(w) over weight space (w_1, w_2), showing points w_A, w_B, w_C and the gradient \nabla E]
• For either of these problems, the error function E(w) is nasty.
• Nasty = non-convex.
• Non-convex = has local minima.

Descent Methods
• The typical strategy for optimization problems of this sort is a descent method:

    w^{(\tau + 1)} = w^{(\tau)} + \Delta w^{(\tau)}

• These come in many flavours, differing in how \Delta w^{(\tau)} is chosen:
  • Gradient descent, using \nabla E(w^{(\tau)})
  • Stochastic gradient descent, using \nabla E_n(w^{(\tau)}) for a single data point
  • Newton-Raphson (second order), using the Hessian \nabla^2 E
• All of these can be used here; stochastic gradient descent is particularly effective.
  • It exploits redundancy in the training data and can escape local minima.

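A minimal sketch contrasting batch and stochastic gradient descent; the toy per-example error E_n(w) and the learning rate eta are stand-ins for the network error and for the gradients that backpropagation (next section) would supply:

```python
import numpy as np

# Toy per-example error E_n(w) = ||w - c_n||^2 and its gradient, standing in
# for the per-example network error and its backpropagated gradient.
centres = [np.array([1.0, 2.0]), np.array([3.0, 0.0]), np.array([2.0, 1.0])]
grad_En = lambda w, n: 2 * (w - centres[n])
grad_E = lambda w: sum(grad_En(w, n) for n in range(len(centres)))

eta = 0.05                                   # learning rate (illustrative)
w = np.zeros(2)
for tau in range(100):                       # batch gradient descent:
    w = w - eta * grad_E(w)                  #   w(tau+1) = w(tau) - eta * grad E(w(tau))

w_sgd = np.zeros(2)
for tau in range(100):                       # stochastic gradient descent:
    n = tau % len(centres)                   #   cycle through (or sample) data points
    w_sgd = w_sgd - eta * grad_En(w_sgd, n)  #   step on grad E_n for one example

print(w, w_sgd)                              # both approach the mean of the centres
```

On this convex toy error both variants land in roughly the same place; on a real (non-convex) network error, the noise in the stochastic updates is part of what helps escape poor local minima.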
