  1. Introduction to RNNs
     Arun Mallya
     Best viewed with Computer Modern fonts installed

  2. Outline
     • Why Recurrent Neural Networks (RNNs)?
     • The Vanilla RNN unit
     • The RNN forward pass
     • Backpropagation refresher
     • The RNN backward pass
     • Issues with the Vanilla RNN
     • The Long Short-Term Memory (LSTM) unit
     • The LSTM forward & backward pass
     • LSTM variants and tips
       – Peephole LSTM
       – GRU

  3. Motivation
     • Not all problems can be converted into one with fixed-length inputs and outputs
     • Problems such as speech recognition or time-series prediction require a system to store and use context information
       – Simple case: output YES if the number of 1s is even, else NO
         1000010101 – YES, 100011 – NO, …
     • Hard/impossible to choose a fixed context window
       – There can always be a new sample longer than anything seen

  4. Recurrent Neural Networks (RNNs)
     • Recurrent Neural Networks take the previous output or hidden states as inputs.
       The composite input at time t has some historical information about the happenings at time T < t
     • RNNs are useful as their intermediate values (state) can store information about past inputs for a time that is not fixed a priori

  5. Sample Feed-forward Network
     (Diagram: a single time step t = 1, with input x1 feeding hidden state h1, which produces output y1)

  6. Sample RNN
     (Diagram: three time steps t = 1, 2, 3; each input xt feeds hidden state ht, which produces output yt, and each ht also feeds the next step's hidden state)

  7. Sample RNN
     (Same diagram, now with an initial hidden state h0 feeding the first step t = 1)

  8. The Vanilla RNN Cell
     (Diagram: x_t and h_{t-1} enter the cell with weights W, producing h_t)
     h_t = tanh( W [x_t ; h_{t-1}] )
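For concreteness, here is a minimal NumPy sketch of this cell. The dimension names and the absence of a bias term follow the slide's equation; both are simplifications.

```python
import numpy as np

def rnn_cell_step(x_t, h_prev, W):
    """One vanilla RNN step: h_t = tanh( W [x_t ; h_{t-1}] ).

    x_t    : input at time t,        shape (input_dim,)
    h_prev : previous hidden state,  shape (hidden_dim,)
    W      : shared weight matrix,   shape (hidden_dim, input_dim + hidden_dim)
    """
    stacked = np.concatenate([x_t, h_prev])  # stack input and previous state
    return np.tanh(W @ stacked)              # squash each unit into (-1, 1)
```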

  9. The Vanilla RNN Forward
     (Diagram: the cell unrolled for t = 1, 2, 3, with losses C1, C2, C3 on the outputs y1, y2, y3)
     h_t = tanh( W [x_t ; h_{t-1}] )
     y_t = F(h_t)
     C_t = Loss(y_t, GT_t)

  10. The Vanilla RNN Forward
      (Same diagram; the highlighted connections indicate weights W shared across time steps)
      h_t = tanh( W [x_t ; h_{t-1}] )
      y_t = F(h_t)
      C_t = Loss(y_t, GT_t)

  11. Recurrent Neural Networks (RNNs)
      • Note that the weights are shared over time
      • Essentially, copies of the RNN cell are made over time (unrolling/unfolding), with different inputs at different time steps
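A sketch of this unrolled forward pass, reusing the single matrix W at every step. rnn_cell_step is the cell sketched above, while F and loss_fn stand in for the output map and per-step loss from slide 9:

```python
def rnn_forward(xs, h0, W, F, loss_fn, targets):
    """Unroll the RNN over a sequence, sharing W across all time steps."""
    h, total_loss = h0, 0.0
    for x_t, gt_t in zip(xs, targets):
        h = rnn_cell_step(x_t, h, W)       # same W at every step (weight sharing)
        y_t = F(h)                         # y_t = F(h_t)
        total_loss += loss_fn(y_t, gt_t)   # C_t = Loss(y_t, GT_t)
    return h, total_loss
```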

  12. Sentiment Classification
      • Classify a restaurant review from Yelp!, a movie review from IMDB, etc., as positive or negative
      • Inputs: multiple words, one or more sentences
      • Outputs: positive / negative classification
      • “The food was really good”
      • “The chicken crossed the road because it was uncooked”

  13. Sentiment Classification
      (Diagram: the first word “The” fed into an RNN cell, producing h1)

  14. Sentiment Classification
      (Diagram: “The” and “food” fed into successive RNN cells, producing h1 and h2)

  15. Sentiment Classification
      (Diagram: the full sentence “The food … good” fed in word by word, producing h1 … hn)

  16. Sentiment Classification
      (Diagram: a linear classifier applied to the final hidden state hn)

  17. Sentiment Classification
      (Same diagram; the intermediate hidden states h1 … hn-1 are marked “Ignore” and only hn feeds the linear classifier)

  18. Sentiment Classification
      (Diagram: alternatively, all hidden states h1 … hn are combined into h = Sum(…))
      http://deeplearning.net/tutorial/lstm.html

  19. Sentiment Classification
      (Diagram: the summed state h = Sum(…) feeds the linear classifier)
      http://deeplearning.net/tutorial/lstm.html
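The two read-outs shown on slides 16–19 can be sketched in a few lines: either classify the final hidden state alone, or sum all hidden states first. The classifier weights W_clf and the word vectors are assumed inputs here, not defined in the slides:

```python
def classify_sentiment(word_vectors, h0, W, W_clf, pool_sum=True):
    """Run the RNN over a review and classify it as positive or negative.

    pool_sum=True  : sum all hidden states before the classifier (slides 18-19)
    pool_sum=False : use only the final hidden state h_n (slides 16-17)
    """
    h, states = h0, []
    for x_t in word_vectors:               # "The", "food", ..., "good"
        h = rnn_cell_step(x_t, h, W)
        states.append(h)
    pooled = np.sum(states, axis=0) if pool_sum else states[-1]
    return W_clf @ pooled                  # linear classifier scores
```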

  20. Image Captioning
      • Given an image, produce a sentence describing its contents
      • Inputs: image feature (from a CNN)
      • Outputs: multiple words (let’s consider one sentence)
      • Example output: “The dog is hiding”

  21. Image Captioning
      (Diagram: a CNN encodes the image; its feature feeds the RNN)

  22. Image Captioning
      (Diagram: the CNN feature feeds the RNN; a linear classifier on the hidden state h2 predicts the first word, “The”)

  23. Image Captioning
      (Diagram: the process continues; linear classifiers on successive hidden states predict “The”, “dog”, …)
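A hedged sketch of the captioning loop on slides 21–23: the CNN feature starts the recurrence, and each predicted word is embedded and fed back in at the next step. The helper embed(), the token ids, and the assumption that the CNN feature is projected to the word-embedding dimension are all illustrative, not from the slides:

```python
def caption_image(cnn_feature, h0, W, W_clf, embed, max_len=20, end_token=0):
    """Greedy caption decoding: CNN feature in first, previous word afterwards."""
    h = rnn_cell_step(cnn_feature, h0, W)  # assumes cnn_feature already matches input_dim
    words = []
    for _ in range(max_len):
        scores = W_clf @ h                 # linear classifier over the vocabulary
        word = int(np.argmax(scores))      # greedy choice of the next word
        if word == end_token:              # stop at the end-of-sentence token
            break
        words.append(word)
        h = rnn_cell_step(embed(word), h, W)  # feed the predicted word back in
    return words
```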

  24. RNN Outputs: Image Captions
      Show and Tell: A Neural Image Caption Generator, CVPR 15

  25. RNN Outputs: Language Modeling
      VIOLA:
      Why, Salisbury must find his flesh and thought
      That which I am not aps, not a man and in fire,
      To show the reining of the raven and the wars
      To grace my hand reproach within, and not a fair are hand,
      That Caesar and my goodly father's world;
      When I was heaven of presence and our fleets,
      We spare with hours, but cut thy council I am great,
      Murdered and by thy master's ready there
      My power to give thee but so much as hell:
      Some service in the noble bondman here,
      Would show him to her wine.

      KING LEAR:
      O, if you were a feeble sight, the courtesy of your law,
      Your sight and several breath, will wear the gods
      With his heads, and my hands are wonder'd at the deeds,
      So drop upon your lordship's head, and your opinion
      Shall be against your honour.

      http://karpathy.github.io/2015/05/21/rnn-effectiveness/

  26. Input – Output Scenarios
      • Single - Single: Feed-forward Network
      • Single - Multiple: Image Captioning
      • Multiple - Single: Sentiment Classification
      • Multiple - Multiple: Translation, Image Captioning

  27. Input – Output Scenarios
      Note: We might deliberately choose to frame our problem as a particular input-output scenario for ease of training or better performance.
      For example, for image captioning we can provide the previous word as input at each time step, turning a Single-Multiple problem into a Multiple-Multiple one.

  28. The Vanilla RNN Forward
      h_t = tanh( W [x_t ; h_{t-1}] )
      y_t = F(h_t)
      C_t = Loss(y_t, GT_t)
      “Unfold” the network through time by making copies at each time step
      (Diagram: the unrolled network for t = 1, 2, 3, as on slide 9)

  29. Backpropagation Refresher
      (Diagram: x → f(x; W) → y → C)
      y = f(x; W)
      C = Loss(y, y_GT)
      SGD update: W ← W − η ∂C/∂W
      ∂C/∂W = (∂C/∂y) (∂y/∂W)
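A scalar worked example of this refresher, assuming f(x; W) = W·x and a squared-error loss; both choices are illustrative, not from the slides:

```python
def sgd_step_scalar(x, W, y_gt, lr=0.1):
    """One SGD update for y = W*x, C = (y - y_gt)**2.

    Chain rule: dC/dW = (dC/dy) * (dy/dW) = 2*(y - y_gt) * x
    """
    y = W * x
    dC_dy = 2.0 * (y - y_gt)   # derivative of the squared-error loss w.r.t. y
    dy_dW = x                  # derivative of y = W*x w.r.t. W
    dC_dW = dC_dy * dy_dW      # chain rule
    return W - lr * dC_dW      # W <- W - eta * dC/dW
```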

  30. Multiple Layers
      (Diagram: x → f_1(x; W_1) → y_1 → f_2(y_1; W_2) → y_2 → C)
      y_1 = f_1(x; W_1)
      y_2 = f_2(y_1; W_2)
      C = Loss(y_2, y_GT)
      SGD updates:
      W_2 ← W_2 − η ∂C/∂W_2
      W_1 ← W_1 − η ∂C/∂W_1

  31. Chain Rule for Gradient Computation
      y_1 = f_1(x; W_1)
      y_2 = f_2(y_1; W_2)
      C = Loss(y_2, y_GT)
      Find ∂C/∂W_1 and ∂C/∂W_2
      ∂C/∂W_2 = (∂C/∂y_2) (∂y_2/∂W_2)
      ∂C/∂W_1 = (∂C/∂y_1) (∂y_1/∂W_1)
              = (∂C/∂y_2) (∂y_2/∂y_1) (∂y_1/∂W_1)
      Application of the chain rule

  32. Chain Rule for Gradient Computation
      Given: ∂C/∂y
      We are interested in computing: ∂C/∂W and ∂C/∂x
      Intrinsic to the layer are:
      • ∂y/∂W – how the output changes due to the params
      • ∂y/∂x – how the output changes due to the inputs
      ∂C/∂W = (∂C/∂y) (∂y/∂W)
      ∂C/∂x = (∂C/∂y) (∂y/∂x)

  33. Chain Rule for Gradient Computation
      Given: ∂C/∂y
      We are interested in computing: ∂C/∂W and ∂C/∂x
      Intrinsic to the layer are:
      • ∂y/∂W – how the output changes due to the params
      • ∂y/∂x – how the output changes due to the inputs
      ∂C/∂W = (∂C/∂y) (∂y/∂W)
      ∂C/∂x = (∂C/∂y) (∂y/∂x)
      Equations for common layers: http://arunmallya.github.io/writeups/nn/backprop.html
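For a linear layer y = W x, the two intrinsic quantities above become concrete. A minimal sketch of the resulting backward pass (shapes are assumptions for illustration):

```python
def linear_backward(dC_dy, x, W):
    """Backward pass of y = W @ x, given the upstream gradient dC/dy.

    dC_dy : gradient of the loss w.r.t. the output y, shape (out_dim,)
    x     : input that was fed forward,               shape (in_dim,)
    W     : layer weights,                            shape (out_dim, in_dim)
    """
    dC_dW = np.outer(dC_dy, x)  # (dC/dy) x^T : gradient for the parameter update
    dC_dx = W.T @ dC_dy         # W^T (dC/dy) : gradient passed to the layer below
    return dC_dW, dC_dx
```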

  34. Extension to Computational Graphs
      (Diagrams: a single layer x → f(x; W) → y, and a node whose output y feeds two branches f_1(y; W_1) → y_1 and f_2(y; W_2) → y_2)

  35. Extension to Computational Graphs
      (Diagram: the upstream gradients ∂C_1/∂y_1 and ∂C_2/∂y_2 flow back through f_1 and f_2 to give ∂C_1/∂y and ∂C_2/∂y; a Σ node sums them before backpropagating through f(x; W) to obtain ∂C/∂x)
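When a node's output feeds several branches, the upstream gradients from the branches are summed (the Σ node above) before being pushed through the node's own backward pass. A small sketch, reusing the linear_backward helper and assuming f itself is linear:

```python
def backward_through_fanout(dC1_dy, dC2_dy, x, W):
    """Output y of f(x; W) feeds two branches f1 and f2."""
    dC_dy = dC1_dy + dC2_dy              # gradients from parallel branches add up
    return linear_backward(dC_dy, x, W)  # then backpropagate through f as usual
```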
