Recurrent Neural Networks
CS60010: Deep Learning
Abir Das, IIT Kharagpur
Mar 11, 2020
Agenda
§ Get introduced to different recurrent neural architectures, e.g., RNNs, LSTMs, GRUs.
§ Get introduced to tasks involving sequential inputs and/or sequential outputs.
Resources
§ Deep Learning by I. Goodfellow, Y. Bengio and A. Courville [Link] [Chapter 10]
§ CS231n by Stanford University [Link]
§ Understanding LSTM Networks by Chris Olah [Link]
Why do we Need another NN Model?
§ So far, we focused mainly on prediction problems with fixed-size inputs and outputs.
§ In image classification, the input is a fixed-size image and the output is its class; in video classification, the input is a fixed-size video and the output is its class; in bounding-box regression, the input is a fixed-size region proposal (resized/RoI pooled) and the output is the bounding-box coordinates.
Why do we Need another NN Model?
§ Suppose we want our model to write down the caption of this image.
Figure: Several people with umbrellas walk down a sidewalk on a rainy day. Image source: COCO Dataset, ICCV 2015
Why do we Need another NN Model?
§ Will this work? Several people with umbrellas
§ When the model generates 'people', we need a way to tell the model that 'several' has already been generated, and similarly for the other words.
Why do we Need another NN Model?
§ e.g., Image Captioning: image -> sequence of words ('Several people with umbrellas')
Slide credit: CS231n, Stanford (Fei-Fei Li, Justin Johnson & Serena Yeung), Lecture 10, May 4, 2017
Recurrent Neural Networks: Process Sequences
§ e.g., Image Captioning: image -> sequence of words
§ e.g., Sentiment Classification: sequence of words -> sentiment
§ e.g., Machine Translation: sequence of words -> sequence of words
§ e.g., Frame-Level Video Classification: sequence of frames -> sequence of labels
Image source: CS231n from Stanford
Recurrent Neural Network
§ The fundamental feature of a Recurrent Neural Network (RNN) is that the network contains at least one feedback connection, so that activations can flow in a loop.
§ The feedback connection allows information to persist. Remember that generating 'people' requires remembering that 'several' has already been generated.
§ The simplest form of RNN has the previous set of hidden unit activations feeding back into the network along with the inputs.
[Figure: block diagram of an RNN; the hidden units receive the inputs together with their own previous activations through a delay unit, and produce the outputs.]
Recurrent Neural Network
§ Note that the concept of 'time', or sequential processing, comes into the picture.
§ The activations are updated one time-step at a time.
§ The task of the delay unit is simply to delay the hidden layer activation until the next time-step.
Recurrent Neural Network
§ The recurrence is h_t = f(x_t, h_{t-1}): the new state h_t is some function f of the input vector x_t and the old state h_{t-1}.
§ f, in particular, can be a layer of a neural network.
Recurrent Neural Network
§ Let us unroll the recurrent connection.
[Figure: the RNN unrolled over time-steps t, t+1, ..., T; at every step the input is multiplied by W_i, the previous hidden state by W_h, and the hidden state by W_o to produce the output.]
§ Note that all weight matrices are the same across time-steps. So, the weights are shared for all the time-steps.
Recurrent Neural Network: Forward Pass
a_t = W_h h_{t-1} + W_i x_t    (1)
h_t = g(a_t)    (2)
y_t = W_o h_t    (3)
§ Note that we can have biases too. For simplicity these are omitted.
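§ As a concrete illustration (not from the slides), below is a minimal NumPy sketch of the forward pass in eqns (1)-(3), assuming g = tanh and small hypothetical dimensions; biases are omitted as above, and the names rnn_forward, d_in, d_h, d_out are illustrative.

```python
import numpy as np

def rnn_forward(X, W_i, W_h, W_o, h0):
    """Run eqns (1)-(3) over a sequence X of shape (T, d_in).
    Returns hidden states (T, d_h) and outputs (T, d_out)."""
    h = h0
    hs, ys = [], []
    for x_t in X:                      # one time-step at a time
        a_t = W_h @ h + W_i @ x_t      # eqn (1)
        h = np.tanh(a_t)               # eqn (2), with g = tanh (an assumption)
        ys.append(W_o @ h)             # eqn (3)
        hs.append(h)
    return np.stack(hs), np.stack(ys)

# Tiny usage example with hypothetical dimensions.
rng = np.random.default_rng(0)
T, d_in, d_h, d_out = 5, 3, 4, 2
X = rng.standard_normal((T, d_in))
W_i = 0.1 * rng.standard_normal((d_h, d_in))
W_h = 0.1 * rng.standard_normal((d_h, d_h))
W_o = 0.1 * rng.standard_normal((d_out, d_h))
hs, ys = rnn_forward(X, W_i, W_h, W_o, np.zeros(d_h))
```

Note that the same three matrices W_i, W_h, W_o are applied at every time-step, which is exactly the weight sharing highlighted on the previous slide.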
Recurrent Neural Network: BPTT
[Figure: the unrolled RNN as before, annotated with eqns (1)-(3).]
§ BPTT: Backpropagation through time
§ The total loss is L = Σ_{t=1}^{T} L_t, and we are after ∂L/∂W_o, ∂L/∂W_h and ∂L/∂W_i.
§ Let us compute ∂L/∂y_t:
∂L/∂y_t = (∂L/∂L_t) · (∂L_t/∂y_t) = 1 · ∂L_t/∂y_t    (4)
§ ∂L_t/∂y_t is computable depending on the particular form of the loss function.
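§ As a hedged example of the last point: if we assume a per-time-step squared-error loss L_t = ½‖y_t − ŷ_t‖² with targets ŷ_t (the slides do not fix a particular loss), then eqn (4) gives ∂L/∂y_t = y_t − ŷ_t. A small sketch:

```python
import numpy as np

def loss_and_dLdy(ys, targets):
    """Assumed loss L_t = 0.5 * ||y_t - target_t||^2, summed over t.
    Row t of the returned gradient is dL/dy_t = dL_t/dy_t (eqn (4))."""
    diffs = ys - targets          # shape (T, d_out)
    L = 0.5 * np.sum(diffs ** 2)  # total loss L = sum_t L_t
    return L, diffs
```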
§ Let us compute ∂L/∂h_t. The subtlety here is that all the losses L_{t'} from time-step t onwards are functions of h_t. So, let us first consider ∂L/∂h_T, where T is the last time-step.
∂L/∂h_T = (∂L/∂y_T) · (∂y_T/∂h_T) = (∂L/∂y_T) W_o    (5)
§ ∂L/∂y_T we just computed (eqn. (4)).
§ For a generic t, we need to compute ∂L/∂h_t. Here h_t affects y_t and also h_{t+1}. For this we will use something that we used while studying backpropagation for feedforward networks.
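§ To make the procedure concrete, here is a minimal BPTT sketch under the same assumptions as before (g = tanh, squared-error loss per time-step, no biases); it is an illustrative reconstruction, not the slides' code. The gradient w.r.t. h_t collects the contribution through y_t and, for t < T, the contribution flowing back from h_{t+1}, and the shared weight gradients are accumulated over all time-steps.

```python
import numpy as np

def rnn_bptt(X, targets, W_i, W_h, W_o, h0):
    """Sketch of BPTT for eqns (1)-(3), assuming g = tanh and the
    squared-error loss per time-step used above (biases omitted)."""
    # Forward pass, caching the hidden states needed for the backward pass.
    T = X.shape[0]
    hs, ys = [h0], []
    for t in range(T):
        hs.append(np.tanh(W_h @ hs[-1] + W_i @ X[t]))  # eqns (1)-(2)
        ys.append(W_o @ hs[-1])                        # eqn (3)
    # Backward pass through time.
    dW_i, dW_h, dW_o = np.zeros_like(W_i), np.zeros_like(W_h), np.zeros_like(W_o)
    dL_dh_next = np.zeros_like(h0)        # gradient flowing back from h_{t+1}
    for t in reversed(range(T)):
        dL_dy = ys[t] - targets[t]                     # eqn (4) for this loss
        dW_o += np.outer(dL_dy, hs[t + 1])             # from y_t = W_o h_t
        dL_dh = W_o.T @ dL_dy + dL_dh_next             # h_t affects y_t and h_{t+1}
        dL_da = (1.0 - hs[t + 1] ** 2) * dL_dh         # tanh'(a_t) = 1 - h_t^2
        dW_h += np.outer(dL_da, hs[t])                 # from a_t = W_h h_{t-1} + W_i x_t
        dW_i += np.outer(dL_da, X[t])
        dL_dh_next = W_h.T @ dL_da                     # pass back to h_{t-1}
    return dW_i, dW_h, dW_o
```

Note how the backward loop mirrors the unrolled graph: one pass forward caching each h_t, one pass backward reusing them while accumulating the shared weight gradients.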