CHAPTER VII: Learning in Recurrent Networks
Ugur HALICI - METU EEE - ANKARA, 11/18/2004
EE543 - Artificial Neural Networks
Introduction

We examined the dynamics of recurrent neural networks in detail in Chapter 2, and in Chapter 3 we used them as associative memories with fixed weights. In this chapter, the backpropagation learning algorithm considered for feedforward networks in Chapter 6 is extended to recurrent neural networks [Almeida 87, 88]. The weights of the recurrent network are thus adapted so that it can be used as an associative memory: the network is expected to converge to the desired output pattern whenever the associated pattern is applied at the network inputs.

7.1. Recurrent Backpropagation

Consider the recurrent system shown in Figure 7.1, in which there are N neurons, some of them being input units and some others outputs. The units that are neither input nor output are called hidden neurons.

[Figure 7.1: Recurrent network architecture, showing input, hidden, and output neurons.]

We will assume a network dynamic defined as

\frac{dx_i}{dt} = -x_i + f\!\left(\sum_j w_{ji} x_j + \theta_i\right) \qquad (7.1.1)

Through a linear transformation, this may be written equivalently as

\frac{da_i}{dt} = -a_i + \sum_j w_{ji} f(a_j) + \theta_i \qquad (7.1.2)
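A minimal way to see the dynamics of Eq. (7.1.1) in action is to integrate them with a forward-Euler scheme. The sketch below does this in Python with NumPy; the logistic activation, the step size dt, the iteration count, and the small random weight matrix are illustrative assumptions rather than anything prescribed in these notes.

```python
import numpy as np

def f(a):
    # logistic activation (an illustrative choice; any smooth f works)
    return 1.0 / (1.0 + np.exp(-a))

def relax(W, theta, x0, dt=0.05, steps=5000):
    """Forward-Euler integration of Eq. (7.1.1):
    dx_i/dt = -x_i + f( sum_j w_ji x_j + theta_i ),
    where W[j, i] holds w_ji (the weight from neuron j to neuron i)."""
    x = x0.copy()
    for _ in range(steps):
        a = W.T @ x + theta          # a_i = sum_j w_ji x_j + theta_i
        x = x + dt * (-x + f(a))     # one Euler step
    return x                         # approximate fixed point x(infinity)

# tiny example: 5 neurons, small random weights, constant external input
rng = np.random.default_rng(0)
W = 0.3 * rng.standard_normal((5, 5))
x_inf = relax(W, theta=rng.random(5), x0=np.zeros(5))
print(x_inf)
```

With weights as small as these, the update map is typically a contraction and the state settles to a single fixed point; this fixed-point regime is what the derivation in the rest of the section assumes.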

Our goal is to update the weights of the network so that it will be able to remember predefined associations \mu^k = (u^k, y^k), u^k \in R^N, y^k \in R^N, k = 1..K. With no loss of generality, we extend the input vector u such that u_i = 0 if neuron i is not an input neuron. Furthermore, we simply ignore the outputs of the unrelated neurons.

We apply an input u^k to the network by setting

\theta_i = u_i^k, \quad i = 1..N \qquad (7.1.3)

Therefore, we desire the network, starting from an initial state x(0) = x^{k0}, to converge to

x^k(\infty) = x^{k\infty} = y^k \qquad (7.1.4)

whenever u^k is applied as input to the network.

The recurrent backpropagation algorithm updates the connection weights with the aim of minimizing the error

e^k = \tfrac{1}{2} \sum_i (\varepsilon_i^k)^2 \qquad (7.1.5)

so that the mean error

e = \langle (\varepsilon^k)^2 \rangle \qquad (7.1.6)

is also minimized.
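Eq. (7.1.3) says that an input pattern is presented simply by loading it into the thresholds, with zeros at the non-input neurons. The short sketch below (Python/NumPy, same illustrative activation and Euler relaxation as in the previous sketch; which neurons are inputs and the pattern values are made up) shows this pattern-application step.

```python
import numpy as np

f = lambda a: 1.0 / (1.0 + np.exp(-a))            # same illustrative activation as above

def relax(W, theta, x0, dt=0.05, steps=5000):
    """Euler integration of Eq. (7.1.1) with theta held fixed; W[j, i] holds w_ji."""
    x = x0.copy()
    for _ in range(steps):
        x = x + dt * (-x + f(W.T @ x + theta))
    return x

# 5-neuron network in which only neurons 0 and 1 are input units:
# the pattern u^k is embedded in a length-N vector with zeros at the
# non-input neurons and applied through the thresholds (Eq. 7.1.3).
u_k = np.array([0.9, 0.1, 0.0, 0.0, 0.0])
rng = np.random.default_rng(1)
W = 0.3 * rng.standard_normal((5, 5))
x_k_inf = relax(W, theta=u_k, x0=np.zeros(5))     # hoped to approach y^k, Eq. (7.1.4)
```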

Notice that e^k and e are scalar values, while \varepsilon^k is a vector defined as

\varepsilon^k = y^k - x^k \qquad (7.1.7)

whose i-th component, i = 1..N, is

\varepsilon_i^k = \alpha_i (y_i^k - x_i^k) \qquad (7.1.8)

In Eq. (7.1.8) the coefficient \alpha_i is used to discriminate between the output neurons and the others, its value being set as

\alpha_i = \begin{cases} 1 & \text{if } i \text{ is an output neuron} \\ 0 & \text{otherwise} \end{cases} \qquad (7.1.9)

Therefore, the neurons that are not outputs have no effect on the error.

Notice that if an input u^k is applied to the network and it is allowed to converge to a fixed point x^{k\infty}, the error depends on the weight matrix through these fixed points. The learning algorithm should modify the connection weights so that the fixed points satisfy

x_i^{k\infty} = y_i^k \qquad (7.1.10)

For this purpose, we let the system evolve in the weight space along trajectories in the direction opposite to the gradient, that is,

\frac{d\mathbf{w}}{dt} = -\eta \nabla e^k \qquad (7.1.11)

In particular, w_{ij} should satisfy

\frac{dw_{ij}}{dt} = -\eta \frac{\partial e^k}{\partial w_{ij}} \qquad (7.1.12)

Here \eta is a positive constant named the learning rate, which should be chosen sufficiently small.
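Eqs. (7.1.8)-(7.1.9) amount to an element-wise mask over the state vector, so that only the output neurons contribute to e^k. The sketch below computes \varepsilon^k and e^k this way; the mask, target, and fixed-point values are made-up illustrative numbers, not values from the notes.

```python
import numpy as np

def masked_error(y_k, x_k_inf, alpha):
    """eps_i^k = alpha_i * (y_i^k - x_i^k), Eq. (7.1.8);
    alpha_i is 1 for output neurons and 0 otherwise, Eq. (7.1.9)."""
    eps_k = alpha * (y_k - x_k_inf)
    e_k = 0.5 * np.sum(eps_k ** 2)                 # e^k of Eq. (7.1.5)
    return eps_k, e_k

# illustrative values: neurons 3 and 4 are the output neurons
alpha  = np.array([0.0, 0.0, 0.0, 1.0, 1.0])
y_k    = np.array([0.0, 0.0, 0.0, 0.8, 0.2])       # desired output pattern (made up)
x_kinf = np.array([0.3, 0.6, 0.1, 0.7, 0.4])       # fixed point reached by the network (made up)
eps_k, e_k = masked_error(y_k, x_kinf, alpha)
print(eps_k, e_k)
```

The remainder of the section evaluates the gradient \partial e^k / \partial w_{sr} appearing in Eq. (7.1.12) in closed form at the fixed point.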

Since

\alpha_i \varepsilon_i = \varepsilon_i \qquad (7.1.13)

the partial derivative of e^k given in Eq. (7.1.5) with respect to w_{sr} becomes

\frac{\partial e^k}{\partial w_{sr}} = -\sum_i \varepsilon_i^k \frac{\partial x_i^{k\infty}}{\partial w_{sr}} \qquad (7.1.14)

On the other hand, since x^{k\infty} is a fixed point, it should satisfy

\frac{dx_i^{k\infty}}{dt} = 0 \qquad (7.1.15)

for which Eq. (7.1.1) becomes

x_i^{k\infty} = f\!\left(\sum_j w_{ji} x_j^{k\infty} + u_i^k\right) \qquad (7.1.16)

Therefore,

\frac{\partial x_i^{k\infty}}{\partial w_{sr}} = f'(a_i^{k\infty}) \sum_j \left( \frac{\partial w_{ji}}{\partial w_{sr}} x_j^{k\infty} + w_{ji} \frac{\partial x_j^{k\infty}}{\partial w_{sr}} \right) \qquad (7.1.17)

where

f'(a_i^{k\infty}) = \left.\frac{df(a)}{da}\right|_{a = a_i^{k\infty}}, \qquad a_i^{k\infty} = \sum_j w_{ji} x_j^{k\infty} + u_i^k \qquad (7.1.18)
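As a quick sanity check, the fixed-point condition of Eq. (7.1.16) can be verified numerically once the relaxation has settled: the residual computed below should be close to zero. This is only an illustrative sketch using the same assumed logistic activation as before.

```python
import numpy as np

f = lambda a: 1.0 / (1.0 + np.exp(-a))     # illustrative activation

def fixed_point_residual(W, u_k, x_k_inf):
    """Residual of Eq. (7.1.16): x_i - f( sum_j w_ji x_j + u_i ), where
    a_i^{k,inf} = sum_j w_ji x_j^{k,inf} + u_i^k as in Eq. (7.1.18)."""
    a_k_inf = W.T @ x_k_inf + u_k
    return x_k_inf - f(a_k_inf)            # ~0 at a true fixed point
```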

Notice that

\frac{\partial w_{ji}}{\partial w_{sr}} = \delta_{js} \delta_{ir} \qquad (7.1.19)

where \delta_{ij} is the Kronecker delta, which has value 1 if i = j and 0 otherwise, resulting in

\sum_j \delta_{js} \delta_{ir} x_j^{k\infty} = \delta_{ir} x_s^{k\infty} \qquad (7.1.20)

Hence,

\frac{\partial x_i^{k\infty}}{\partial w_{sr}} = f'(a_i^{k\infty}) \left( \delta_{ir} x_s^{k\infty} + \sum_j w_{ji} \frac{\partial x_j^{k\infty}}{\partial w_{sr}} \right) \qquad (7.1.21)

By reorganizing the above equation, we obtain

\frac{\partial x_i^{k\infty}}{\partial w_{sr}} - f'(a_i^{k\infty}) \sum_j w_{ji} \frac{\partial x_j^{k\infty}}{\partial w_{sr}} = f'(a_i^{k\infty}) \delta_{ir} x_s^{k\infty} \qquad (7.1.22)

Notice that

\frac{\partial x_i^{k\infty}}{\partial w_{sr}} = \sum_j \delta_{ji} \frac{\partial x_j^{k\infty}}{\partial w_{sr}} \qquad (7.1.23)

Therefore, Eq. (7.1.22) can be written equivalently as

\sum_j \delta_{ji} \frac{\partial x_j^{k\infty}}{\partial w_{sr}} - f'(a_i^{k\infty}) \sum_j w_{ji} \frac{\partial x_j^{k\infty}}{\partial w_{sr}} = f'(a_i^{k\infty}) \delta_{ir} x_s^{k\infty} \qquad (7.1.24)

or

\sum_j \left( \delta_{ji} - w_{ji} f'(a_i^{k\infty}) \right) \frac{\partial x_j^{k\infty}}{\partial w_{sr}} = \delta_{ir} f'(a_i^{k\infty}) x_s^{k\infty} \qquad (7.1.25)

If we define the matrix L^{k\infty} and the vector R^{k\infty} such that

L_{ij}^{k\infty} = \delta_{ij} - f'(a_i^{k\infty}) w_{ji} \qquad (7.1.26)

and

R_i^{k\infty} = \delta_{ir} f'(a_i^{k\infty}) \qquad (7.1.27)

then Eq. (7.1.25) results in

L^{k\infty} \frac{\partial x^{k\infty}}{\partial w_{sr}} = R^{k\infty} x_s^{k\infty} \qquad (7.1.28)

Hence, we obtain

\frac{\partial x^{k\infty}}{\partial w_{sr}} = (L^{k\infty})^{-1} R^{k\infty} x_s^{k\infty} \qquad (7.1.29)

In particular, if we consider the i-th row, we observe that

\frac{\partial x_i^{k\infty}}{\partial w_{sr}} = \sum_j (L^{k\infty})^{-1}_{ij} R_j^{k\infty} x_s^{k\infty} \qquad (7.1.30)

Since

\sum_j (L^{k\infty})^{-1}_{ij} R_j^{k\infty} = \sum_j (L^{k\infty})^{-1}_{ij} \delta_{jr} f'(a_j^{k\infty}) = (L^{k\infty})^{-1}_{ir} f'(a_r^{k\infty}) \qquad (7.1.31)

we obtain

\frac{\partial x_i^{k\infty}}{\partial w_{sr}} = (L^{k\infty})^{-1}_{ir} f'(a_r^{k\infty}) x_s^{k\infty} \qquad (7.1.32)
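Eqs. (7.1.26)-(7.1.32) give the derivative of the fixed point with respect to a single weight in closed form, which is easy to cross-check against a finite-difference estimate. The sketch below does exactly that on a small random network; the activation, network size, indices s and r, and the perturbation h are all illustrative assumptions.

```python
import numpy as np

f  = lambda a: 1.0 / (1.0 + np.exp(-a))        # illustrative activation
fp = lambda a: f(a) * (1.0 - f(a))             # its derivative f'(a)

def relax(W, u, x0, dt=0.05, steps=5000):
    """Euler integration of Eq. (7.1.1) with theta_i = u_i; W[j, i] holds w_ji."""
    x = x0.copy()
    for _ in range(steps):
        x = x + dt * (-x + f(W.T @ x + u))
    return x

def dx_dw(W, u, x_inf, s, r):
    """dx_i^{inf}/dw_sr from Eq. (7.1.32): (L^{-1})_{ir} f'(a_r^{inf}) x_s^{inf}."""
    a_inf = W.T @ x_inf + u
    L = np.eye(len(x_inf)) - fp(a_inf)[:, None] * W.T   # L_ij = delta_ij - f'(a_i) w_ji, Eq. (7.1.26)
    return np.linalg.inv(L)[:, r] * fp(a_inf[r]) * x_inf[s]

# finite-difference cross-check on a small random network
rng = np.random.default_rng(2)
n, s, r, h = 4, 2, 1, 1e-6
W = 0.3 * rng.standard_normal((n, n))
u = rng.random(n)
x_inf = relax(W, u, x0=np.zeros(n))
W_pert = W.copy(); W_pert[s, r] += h                    # perturb w_sr only
numeric  = (relax(W_pert, u, x0=np.zeros(n)) - x_inf) / h
analytic = dx_dw(W, u, x_inf, s, r)
print(np.allclose(numeric, analytic, atol=1e-4))        # should print True
```

If the derivation is tracked correctly, the analytic column taken from (L^{k\infty})^{-1} matches the finite-difference column up to the relaxation and differencing error.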

Insertion of (7.1.32) into Eq. (7.1.14), and then into Eq. (7.1.12), results in

\frac{dw_{sr}}{dt} = \eta \sum_i \varepsilon_i^k (L^{k\infty})^{-1}_{ir} f'(a_r^{k\infty}) x_s^{k\infty} \qquad (7.1.33)

When the network with input u^k has converged to x^{k\infty}, the local gradient for recurrent backpropagation at the output of the r-th neuron may be defined, in analogy with standard backpropagation, as

\delta_r^k = f'(a_r^{k\infty}) \sum_i \varepsilon_i^k (L^{k\infty})^{-1}_{ir} \qquad (7.1.34)

so that the update rule becomes simply

\frac{dw_{sr}}{dt} = \eta \, \delta_r^k x_s^{k\infty} \qquad (7.1.35)
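Taken together, Eqs. (7.1.33)-(7.1.35) define a weight update per pattern. The sketch below is the most literal reading of Eq. (7.1.34), obtaining \delta_r^k through an explicit matrix inverse; in practice a linear solve, or the auxiliary error-propagation network of the Almeida/Pineda formulation, would be used instead. The discrete update in place of the continuous-time rule of Eq. (7.1.35), the learning rate, and all pattern values are illustrative assumptions.

```python
import numpy as np

f  = lambda a: 1.0 / (1.0 + np.exp(-a))
fp = lambda a: f(a) * (1.0 - f(a))

def relax(W, u, x0, dt=0.05, steps=5000):
    """Euler integration of Eq. (7.1.1) with theta_i = u_i; W[j, i] holds w_ji."""
    x = x0.copy()
    for _ in range(steps):
        x = x + dt * (-x + f(W.T @ x + u))
    return x

def rbp_update(W, u_k, y_k, alpha, eta=0.1):
    """One recurrent-backpropagation step for the pattern (u^k, y^k).
    W[s, r] holds w_sr; alpha is the output mask of Eq. (7.1.9)."""
    x_inf = relax(W, u_k, x0=np.zeros(len(u_k)))
    a_inf = W.T @ x_inf + u_k
    eps = alpha * (y_k - x_inf)                              # Eq. (7.1.8)
    L = np.eye(len(u_k)) - fp(a_inf)[:, None] * W.T          # Eq. (7.1.26)
    delta = fp(a_inf) * (np.linalg.inv(L).T @ eps)           # delta_r^k, Eq. (7.1.34)
    return W + eta * np.outer(x_inf, delta)                  # w_sr += eta * delta_r^k * x_s, Eq. (7.1.35)

# illustrative training loop on one stored association
rng = np.random.default_rng(3)
W = 0.3 * rng.standard_normal((5, 5))
u_k   = np.array([0.9, 0.1, 0.0, 0.0, 0.0])
y_k   = np.array([0.0, 0.0, 0.0, 0.8, 0.2])
alpha = np.array([0.0, 0.0, 0.0, 1.0, 1.0])
for _ in range(20):
    W = rbp_update(W, u_k, y_k, alpha)
```

Repeating the update over the stored associations should drive the masked error of Eq. (7.1.5) toward zero, provided the relaxation keeps converging to a stable fixed point.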
