CHAPTER VII: Learning in Recurrent Networks
Ugur HALICI - METU EEE - ANKARA, 11/18/2004
EE543 - Artificial Neural Networks
Introduction

We examined the dynamics of recurrent neural networks in detail in Chapter 2, and in Chapter 3 we used them as associative memories with fixed weights. In this chapter, the backpropagation learning algorithm considered for feedforward networks in Chapter 6 is extended to recurrent neural networks [Almeida 87, 88]. The weights of the recurrent network are thus adapted so that it can be used as an associative memory: the network is expected to converge to the desired output pattern whenever the associated pattern is applied at the network inputs.

7.1. Recurrent Backpropagation

Consider the recurrent system shown in Figure 7.1, in which there are N neurons, some of them being input units and some others outputs. The units that are neither input nor output are called hidden neurons.

[Figure 7.1: Recurrent network architecture, showing input, hidden, and output neurons.]

We will assume a network dynamic defined as

\frac{dx_i}{dt} = -x_i + f\!\left(\sum_j w_{ji} x_j + \theta_i\right) \qquad (7.1.1)

Through a linear transformation, this may be written equivalently as

\frac{da_i}{dt} = -a_i + \sum_j w_{ji} f(a_j) + \theta_i \qquad (7.1.2)
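A minimal way to see the dynamics of Eq. (7.1.1) in action is to integrate them with a forward-Euler scheme. The sketch below does this in Python with NumPy; the logistic activation, the step size dt, the iteration count, and the small random weight matrix are illustrative assumptions rather than anything prescribed in these notes.

```python
import numpy as np

def f(a):
    # logistic activation (an illustrative choice; any smooth f works)
    return 1.0 / (1.0 + np.exp(-a))

def relax(W, theta, x0, dt=0.05, steps=5000):
    """Forward-Euler integration of Eq. (7.1.1):
    dx_i/dt = -x_i + f( sum_j w_ji x_j + theta_i ),
    where W[j, i] holds w_ji (the weight from neuron j to neuron i)."""
    x = x0.copy()
    for _ in range(steps):
        a = W.T @ x + theta          # a_i = sum_j w_ji x_j + theta_i
        x = x + dt * (-x + f(a))     # one Euler step
    return x                         # approximate fixed point x(infinity)

# tiny example: 5 neurons, small random weights, constant external input
rng = np.random.default_rng(0)
W = 0.3 * rng.standard_normal((5, 5))
x_inf = relax(W, theta=rng.random(5), x0=np.zeros(5))
print(x_inf)
```

With weights as small as these, the update map is typically a contraction and the state settles to a single fixed point; this fixed-point regime is what the derivation in the rest of the section assumes.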

Our goal is to update the weights of the network so that it will be able to remember predefined associations \mu^k = (u^k, y^k), u^k \in R^N, y^k \in R^N, k = 1..K. With no loss of generality, we extend the input vector u such that u_i = 0 if neuron i is not an input neuron. Furthermore, we simply ignore the outputs of the unrelated neurons.

We apply an input u^k to the network by setting

\theta_i = u_i^k, \quad i = 1..N \qquad (7.1.3)

Therefore, we desire the network, starting from an initial state x(0) = x^{k0}, to converge to

x^k(\infty) = x^{k\infty} = y^k \qquad (7.1.4)

whenever u^k is applied as input to the network.

The recurrent backpropagation algorithm updates the connection weights with the aim of minimizing the error

e^k = \tfrac{1}{2} \sum_i (\varepsilon_i^k)^2 \qquad (7.1.5)

so that the mean error

e = \langle (\varepsilon^k)^2 \rangle \qquad (7.1.6)

is also minimized.
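Eq. (7.1.3) says that an input pattern is presented simply by loading it into the thresholds, with zeros at the non-input neurons. The short sketch below (Python/NumPy, same illustrative activation and Euler relaxation as in the previous sketch; which neurons are inputs and the pattern values are made up) shows this pattern-application step.

```python
import numpy as np

f = lambda a: 1.0 / (1.0 + np.exp(-a))            # same illustrative activation as above

def relax(W, theta, x0, dt=0.05, steps=5000):
    """Euler integration of Eq. (7.1.1) with theta held fixed; W[j, i] holds w_ji."""
    x = x0.copy()
    for _ in range(steps):
        x = x + dt * (-x + f(W.T @ x + theta))
    return x

# 5-neuron network in which only neurons 0 and 1 are input units:
# the pattern u^k is embedded in a length-N vector with zeros at the
# non-input neurons and applied through the thresholds (Eq. 7.1.3).
u_k = np.array([0.9, 0.1, 0.0, 0.0, 0.0])
rng = np.random.default_rng(1)
W = 0.3 * rng.standard_normal((5, 5))
x_k_inf = relax(W, theta=u_k, x0=np.zeros(5))     # hoped to approach y^k, Eq. (7.1.4)
```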

Notice that e^k and e are scalar values, while \varepsilon^k is a vector defined as

\varepsilon^k = y^k - x^k \qquad (7.1.7)

whose i-th component, i = 1..N, is

\varepsilon_i^k = \alpha_i (y_i^k - x_i^k) \qquad (7.1.8)

In Eq. (7.1.8) the coefficient \alpha_i is used to discriminate between the output neurons and the others, its value being set as

\alpha_i = \begin{cases} 1 & \text{if } i \text{ is an output neuron} \\ 0 & \text{otherwise} \end{cases} \qquad (7.1.9)

Therefore, the neurons that are not outputs have no effect on the error.

Notice that if an input u^k is applied to the network and it is allowed to converge to a fixed point x^{k\infty}, the error depends on the weight matrix through these fixed points. The learning algorithm should modify the connection weights so that the fixed points satisfy

x_i^{k\infty} = y_i^k \qquad (7.1.10)

For this purpose, we let the system evolve in the weight space along trajectories in the direction opposite to the gradient, that is,

\frac{d\mathbf{w}}{dt} = -\eta \nabla e^k \qquad (7.1.11)

In particular, w_{ij} should satisfy

\frac{dw_{ij}}{dt} = -\eta \frac{\partial e^k}{\partial w_{ij}} \qquad (7.1.12)

Here \eta is a positive constant named the learning rate, which should be chosen sufficiently small.
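Eqs. (7.1.8)-(7.1.9) amount to an element-wise mask over the state vector, so that only the output neurons contribute to e^k. The sketch below computes \varepsilon^k and e^k this way; the mask, target, and fixed-point values are made-up illustrative numbers, not values from the notes.

```python
import numpy as np

def masked_error(y_k, x_k_inf, alpha):
    """eps_i^k = alpha_i * (y_i^k - x_i^k), Eq. (7.1.8);
    alpha_i is 1 for output neurons and 0 otherwise, Eq. (7.1.9)."""
    eps_k = alpha * (y_k - x_k_inf)
    e_k = 0.5 * np.sum(eps_k ** 2)                 # e^k of Eq. (7.1.5)
    return eps_k, e_k

# illustrative values: neurons 3 and 4 are the output neurons
alpha  = np.array([0.0, 0.0, 0.0, 1.0, 1.0])
y_k    = np.array([0.0, 0.0, 0.0, 0.8, 0.2])       # desired output pattern (made up)
x_kinf = np.array([0.3, 0.6, 0.1, 0.7, 0.4])       # fixed point reached by the network (made up)
eps_k, e_k = masked_error(y_k, x_kinf, alpha)
print(eps_k, e_k)
```

The remainder of the section evaluates the gradient \partial e^k / \partial w_{sr} appearing in Eq. (7.1.12) in closed form at the fixed point.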

Since

\alpha_i \varepsilon_i = \varepsilon_i \qquad (7.1.13)

the partial derivative of e^k given in Eq. (7.1.5) with respect to w_{sr} becomes

\frac{\partial e^k}{\partial w_{sr}} = -\sum_i \varepsilon_i^k \frac{\partial x_i^{k\infty}}{\partial w_{sr}} \qquad (7.1.14)

On the other hand, since x^{k\infty} is a fixed point, it should satisfy

\frac{dx_i^{k\infty}}{dt} = 0 \qquad (7.1.15)

for which Eq. (7.1.1) becomes

x_i^{k\infty} = f\!\left(\sum_j w_{ji} x_j^{k\infty} + u_i^k\right) \qquad (7.1.16)

Therefore,

\frac{\partial x_i^{k\infty}}{\partial w_{sr}} = f'(a_i^{k\infty}) \sum_j \left( \frac{\partial w_{ji}}{\partial w_{sr}} x_j^{k\infty} + w_{ji} \frac{\partial x_j^{k\infty}}{\partial w_{sr}} \right) \qquad (7.1.17)

where

f'(a_i^{k\infty}) = \left.\frac{df(a)}{da}\right|_{a = a_i^{k\infty}}, \qquad a_i^{k\infty} = \sum_j w_{ji} x_j^{k\infty} + u_i^k \qquad (7.1.18)
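As a quick sanity check, the fixed-point condition of Eq. (7.1.16) can be verified numerically once the relaxation has settled: the residual computed below should be close to zero. This is only an illustrative sketch using the same assumed logistic activation as before.

```python
import numpy as np

f = lambda a: 1.0 / (1.0 + np.exp(-a))     # illustrative activation

def fixed_point_residual(W, u_k, x_k_inf):
    """Residual of Eq. (7.1.16): x_i - f( sum_j w_ji x_j + u_i ), where
    a_i^{k,inf} = sum_j w_ji x_j^{k,inf} + u_i^k as in Eq. (7.1.18)."""
    a_k_inf = W.T @ x_k_inf + u_k
    return x_k_inf - f(a_k_inf)            # ~0 at a true fixed point
```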

Notice that

\frac{\partial w_{ji}}{\partial w_{sr}} = \delta_{js} \delta_{ir} \qquad (7.1.19)

where \delta_{ij} is the Kronecker delta, which has value 1 if i = j and 0 otherwise, resulting in

\sum_j \delta_{js} \delta_{ir} x_j^{k\infty} = \delta_{ir} x_s^{k\infty} \qquad (7.1.20)

Hence,

\frac{\partial x_i^{k\infty}}{\partial w_{sr}} = f'(a_i^{k\infty}) \left( \delta_{ir} x_s^{k\infty} + \sum_j w_{ji} \frac{\partial x_j^{k\infty}}{\partial w_{sr}} \right) \qquad (7.1.21)

By reorganizing the above equation, we obtain

\frac{\partial x_i^{k\infty}}{\partial w_{sr}} - f'(a_i^{k\infty}) \sum_j w_{ji} \frac{\partial x_j^{k\infty}}{\partial w_{sr}} = f'(a_i^{k\infty}) \delta_{ir} x_s^{k\infty} \qquad (7.1.22)

Notice that

\frac{\partial x_i^{k\infty}}{\partial w_{sr}} = \sum_j \delta_{ji} \frac{\partial x_j^{k\infty}}{\partial w_{sr}} \qquad (7.1.23)

Therefore, Eq. (7.1.22) can be written equivalently as

\sum_j \delta_{ji} \frac{\partial x_j^{k\infty}}{\partial w_{sr}} - f'(a_i^{k\infty}) \sum_j w_{ji} \frac{\partial x_j^{k\infty}}{\partial w_{sr}} = f'(a_i^{k\infty}) \delta_{ir} x_s^{k\infty} \qquad (7.1.24)

or

\sum_j \left( \delta_{ji} - w_{ji} f'(a_i^{k\infty}) \right) \frac{\partial x_j^{k\infty}}{\partial w_{sr}} = \delta_{ir} f'(a_i^{k\infty}) x_s^{k\infty} \qquad (7.1.25)

If we define the matrix L^{k\infty} and the vector R^{k\infty} such that

L_{ij}^{k\infty} = \delta_{ij} - f'(a_i^{k\infty}) w_{ji} \qquad (7.1.26)

and

R_i^{k\infty} = \delta_{ir} f'(a_i^{k\infty}) \qquad (7.1.27)

then Eq. (7.1.25) results in

L^{k\infty} \frac{\partial x^{k\infty}}{\partial w_{sr}} = R^{k\infty} x_s^{k\infty} \qquad (7.1.28)

Hence, we obtain

\frac{\partial x^{k\infty}}{\partial w_{sr}} = (L^{k\infty})^{-1} R^{k\infty} x_s^{k\infty} \qquad (7.1.29)

In particular, if we consider the i-th row, we observe that

\frac{\partial x_i^{k\infty}}{\partial w_{sr}} = \sum_j (L^{k\infty})^{-1}_{ij} R_j^{k\infty} x_s^{k\infty} \qquad (7.1.30)

Since

\sum_j (L^{k\infty})^{-1}_{ij} R_j^{k\infty} = \sum_j (L^{k\infty})^{-1}_{ij} \delta_{jr} f'(a_j^{k\infty}) = (L^{k\infty})^{-1}_{ir} f'(a_r^{k\infty}) \qquad (7.1.31)

we obtain

\frac{\partial x_i^{k\infty}}{\partial w_{sr}} = (L^{k\infty})^{-1}_{ir} f'(a_r^{k\infty}) x_s^{k\infty} \qquad (7.1.32)
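Eqs. (7.1.26)-(7.1.32) give the derivative of the fixed point with respect to a single weight in closed form, which is easy to cross-check against a finite-difference estimate. The sketch below does exactly that on a small random network; the activation, network size, indices s and r, and the perturbation h are all illustrative assumptions.

```python
import numpy as np

f  = lambda a: 1.0 / (1.0 + np.exp(-a))        # illustrative activation
fp = lambda a: f(a) * (1.0 - f(a))             # its derivative f'(a)

def relax(W, u, x0, dt=0.05, steps=5000):
    """Euler integration of Eq. (7.1.1) with theta_i = u_i; W[j, i] holds w_ji."""
    x = x0.copy()
    for _ in range(steps):
        x = x + dt * (-x + f(W.T @ x + u))
    return x

def dx_dw(W, u, x_inf, s, r):
    """dx_i^{inf}/dw_sr from Eq. (7.1.32): (L^{-1})_{ir} f'(a_r^{inf}) x_s^{inf}."""
    a_inf = W.T @ x_inf + u
    L = np.eye(len(x_inf)) - fp(a_inf)[:, None] * W.T   # L_ij = delta_ij - f'(a_i) w_ji, Eq. (7.1.26)
    return np.linalg.inv(L)[:, r] * fp(a_inf[r]) * x_inf[s]

# finite-difference cross-check on a small random network
rng = np.random.default_rng(2)
n, s, r, h = 4, 2, 1, 1e-6
W = 0.3 * rng.standard_normal((n, n))
u = rng.random(n)
x_inf = relax(W, u, x0=np.zeros(n))
W_pert = W.copy(); W_pert[s, r] += h                    # perturb w_sr only
numeric  = (relax(W_pert, u, x0=np.zeros(n)) - x_inf) / h
analytic = dx_dw(W, u, x_inf, s, r)
print(np.allclose(numeric, analytic, atol=1e-4))        # should print True
```

If the derivation is tracked correctly, the analytic column taken from (L^{k\infty})^{-1} matches the finite-difference column up to the relaxation and differencing error.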

Insertion of (7.1.32) into Eq. (7.1.14), and then into Eq. (7.1.12), results in

\frac{dw_{sr}}{dt} = \eta \sum_i \varepsilon_i^k (L^{k\infty})^{-1}_{ir} f'(a_r^{k\infty}) x_s^{k\infty} \qquad (7.1.33)

When the network with input u^k has converged to x^{k\infty}, the local gradient for recurrent backpropagation at the output of the r-th neuron may be defined, in analogy with standard backpropagation, as

\delta_r^k = f'(a_r^{k\infty}) \sum_i \varepsilon_i^k (L^{k\infty})^{-1}_{ir} \qquad (7.1.34)

so that the update rule becomes simply

\frac{dw_{sr}}{dt} = \eta \, \delta_r^k x_s^{k\infty} \qquad (7.1.35)
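Taken together, Eqs. (7.1.33)-(7.1.35) define a weight update per pattern. The sketch below is the most literal reading of Eq. (7.1.34), obtaining \delta_r^k through an explicit matrix inverse; in practice a linear solve, or the auxiliary error-propagation network of the Almeida/Pineda formulation, would be used instead. The discrete update in place of the continuous-time rule of Eq. (7.1.35), the learning rate, and all pattern values are illustrative assumptions.

```python
import numpy as np

f  = lambda a: 1.0 / (1.0 + np.exp(-a))
fp = lambda a: f(a) * (1.0 - f(a))

def relax(W, u, x0, dt=0.05, steps=5000):
    """Euler integration of Eq. (7.1.1) with theta_i = u_i; W[j, i] holds w_ji."""
    x = x0.copy()
    for _ in range(steps):
        x = x + dt * (-x + f(W.T @ x + u))
    return x

def rbp_update(W, u_k, y_k, alpha, eta=0.1):
    """One recurrent-backpropagation step for the pattern (u^k, y^k).
    W[s, r] holds w_sr; alpha is the output mask of Eq. (7.1.9)."""
    x_inf = relax(W, u_k, x0=np.zeros(len(u_k)))
    a_inf = W.T @ x_inf + u_k
    eps = alpha * (y_k - x_inf)                              # Eq. (7.1.8)
    L = np.eye(len(u_k)) - fp(a_inf)[:, None] * W.T          # Eq. (7.1.26)
    delta = fp(a_inf) * (np.linalg.inv(L).T @ eps)           # delta_r^k, Eq. (7.1.34)
    return W + eta * np.outer(x_inf, delta)                  # w_sr += eta * delta_r^k * x_s, Eq. (7.1.35)

# illustrative training loop on one stored association
rng = np.random.default_rng(3)
W = 0.3 * rng.standard_normal((5, 5))
u_k   = np.array([0.9, 0.1, 0.0, 0.0, 0.0])
y_k   = np.array([0.0, 0.0, 0.0, 0.8, 0.2])
alpha = np.array([0.0, 0.0, 0.0, 1.0, 1.0])
for _ in range(20):
    W = rbp_update(W, u_k, y_k, alpha)
```

Repeating the update over the stored associations should drive the masked error of Eq. (7.1.5) toward zero, provided the relaxation keeps converging to a stable fixed point.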
