Deep-Learning: Recurrent Neural Networks (RNN)
Pr. Fabien MOUTARDE
Center for Robotics, MINES ParisTech, PSL Université Paris
Fabien.Moutarde@mines-paristech.fr
http://people.mines-paristech.fr/fabien.moutarde

Acknowledgements
During the preparation of these slides, I drew inspiration and borrowed some slide content from several sources, in particular:
• Fei-Fei Li, J. Johnson & S. Yeung: slides on "Recurrent Neural Networks" from the "Convolutional Neural Networks for Visual Recognition" course at Stanford
http://cs231n.stanford.edu/slides/2019/cs231n_2019_lecture10.pdf
• Yingyu Liang: slides on "Recurrent Neural Networks" from the "Deep Learning Basics" course at Princeton
https://www.cs.princeton.edu/courses/archive/spring16/cos495/slides/DL_lecture9_RNN.pdf
• Arun Mallya: slides "Introduction to RNNs" from the "Trends in Deep Learning and Recognition" course of Svetlana Lazebnik at University of Illinois at Urbana-Champaign
http://slazebni.cs.illinois.edu/spring17/lec02_rnn.pdf
• Tingwu Wang: slides on "Recurrent Neural Network" for a course at University of Toronto
https://www.cs.toronto.edu/%7Etingwuwang/rnn_tutorial.pdf
• Christopher Olah: online tutorial "Understanding LSTM Networks"
https://colah.github.io/posts/2015-08-Understanding-LSTMs/
Outline
• Standard Recurrent Neural Networks
• Training RNN: BackPropagation Through Time
• LSTM and GRU
• Applications of RNNs

Recurrent Neural Networks (RNN)
[Figure: a small network with inputs x1, x2, x3 and a time-delay on each recurrent connection, shown side by side with its equivalent form.]
Canonical form of RNN
[Figure: canonical form — a non-recurrent network computes the output Y(t) and the new state variables X(t) from the external input U(t) and the previous state variables X(t-1).]

Time unfolding of RNN
[Figure: the same non-recurrent network replicated at times t-2, t-1 and t; each copy takes the external input U(.) and the previous state variables X(.-1), and produces the output Y(.) and the new state variables X(.), starting from the state X(t-3).]
Dynamic systems & RNN
A discrete-time dynamic system is defined by a state-update rule s(t+1) = f( s(t), x(t+1) ). If a Neural Net is used for f, this is EXACTLY an RNN!
[Figures from Deep Learning, Goodfellow, Bengio and Courville]

Standard ("vanilla") RNN
State vector s ↔ vector h of hidden neurons:
h_t = tanh( W_hh h_{t-1} + W_xh x_t )
y_t = softMax( W_hy h_t )
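For illustration (not part of the original slides), a minimal NumPy sketch of one vanilla RNN step and of its unrolling over a sequence; the dimensions and the weight names W_xh, W_hh, W_hy are assumptions chosen to match the formulas above:

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Illustrative sizes and randomly initialized weights.
rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 4, 8, 3
W_xh = rng.normal(scale=0.1, size=(n_hidden, n_in))
W_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
W_hy = rng.normal(scale=0.1, size=(n_out, n_hidden))

def rnn_step(x_t, h_prev):
    """One time step: new hidden state and output."""
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev)   # h_t = tanh(W_hh h_{t-1} + W_xh x_t)
    y_t = softmax(W_hy @ h_t)                    # y_t = softMax(W_hy h_t)
    return h_t, y_t

# Time unfolding: the SAME weights are applied at every step of the sequence.
h = np.zeros(n_hidden)
sequence = [rng.normal(size=n_in) for _ in range(5)]
for x_t in sequence:
    h, y = rnn_step(x_t, h)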
Advantages of RNN
• The hidden state s of the RNN builds a kind of lossy summary of the past
• RNNs are naturally adapted to processing SEQUENTIAL data: the same computation formula is applied at each time step, but modulated by the evolving "memory" contained in state s
• Universality of RNNs: any function computable by a Turing Machine can be computed by a finite-size RNN (Siegelmann and Sontag, 1995)

RNN hyper-parameters
• As for MLP, the main hyper-parameter is the size of the hidden layer (= size of vector h)
Outline
• Standard Recurrent Neural Networks
• Training RNN: BackPropagation Through Time
• LSTM and GRU
• Applications of RNNs

RNN training
[Figure: the network unfolded over a temporal sequence t, t+1, ..., t+4 (training horizon N_t = 4), the same weights W(t) being shared by every copy, with errors computed at the successive time steps.]
• BackPropagation Through Time (BPTT): gradients update for a whole sequence
• or Real Time Recurrent Learning (RTRL): gradients update for each frame in a sequence
BackPropagation THROUGH TIME (BPTT)
• Forward through the entire sequence to compute the SUM of losses at ALL (or part of) time steps
• Then backprop through the ENTIRE sequence to compute gradients

BPTT computation principle
[Figure: the unfolded network seen as successive blocks 1, 2, 3 sharing the same weights W(t); each block receives the input U(.) and the previous state X(.-1), and its output is compared to the desired output D(.). The error gradient dE/dX is propagated backwards from block to block, each block contributes its own weight gradient dW_i, and the total gradient is dW = dW1 + dW2 + dW3.]
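To make the principle concrete, here is a hedged NumPy sketch of BPTT for the vanilla RNN sketched earlier, assuming a squared-error loss summed over all time steps and a linear readout; weight names and shapes are illustrative, not taken from the slides:

import numpy as np

def bptt_grads(xs, ds, W_xh, W_hh, W_hy):
    """Sketch of BPTT for a vanilla RNN with loss E = sum_t ||y_t - d_t||^2."""
    T = len(xs)
    n_hidden = W_hh.shape[0]
    hs, ys = [np.zeros(n_hidden)], []
    # Forward through the ENTIRE sequence, storing hidden states.
    for x_t in xs:
        h_t = np.tanh(W_xh @ x_t + W_hh @ hs[-1])
        hs.append(h_t)
        ys.append(W_hy @ h_t)                     # linear readout for simplicity
    # Backward through the ENTIRE sequence, summing the per-step
    # contributions to each weight gradient (dW = dW1 + dW2 + ...).
    dW_xh, dW_hh, dW_hy = [np.zeros_like(W) for W in (W_xh, W_hh, W_hy)]
    dh_next = np.zeros(n_hidden)
    for t in reversed(range(T)):
        dy = 2.0 * (ys[t] - ds[t])                # dE/dy_t
        dW_hy += np.outer(dy, hs[t + 1])
        dh = W_hy.T @ dy + dh_next                # error from output AND from future steps
        dz = dh * (1.0 - hs[t + 1] ** 2)          # back through tanh
        dW_xh += np.outer(dz, xs[t])
        dW_hh += np.outer(dz, hs[t])
        dh_next = W_hh.T @ dz                     # propagate to the previous time step
    return dW_xh, dW_hh, dW_hy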
BPTT algorithm
[Figure: the feedforward network computes Y(t) and the new state X(t) from the input U(t) and the delayed state X(t-1).]
Weight update over a training horizon N_t:
W(t+N_t) = W(t) - λ · ∇_W E,   with E = Σ_t ( Y_t - D_t )²
Chain rule over the unfolded network:
∂E_t/∂W = Σ_{k≤t} (∂E_t/∂X_t) · (∂X_t/∂X_k) · (∂X_k/∂W)
with ∂X_t/∂X_k = Π_{j=k+1..t} ∂X_j/∂X_{j-1} = product of Jacobian matrices of the feedforward net

Vanishing/exploding gradient problem
• If eigenvalues of the Jacobian matrix are > 1, then gradients tend to EXPLODE
⇒ Learning will never converge.
• Conversely, if eigenvalues of the Jacobian matrix are < 1, then gradients tend to VANISH
⇒ Error signals can only affect small time lags ⇒ short-term memory only.
⇒ Possible solution for exploding gradients: the CLIPPING trick
⇒ Possible solutions for vanishing gradients:
– use ReLU instead of tanh
– change what is inside the RNN!
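A minimal sketch of the clipping trick mentioned above, assuming a global-norm formulation; the threshold value is an arbitrary illustrative choice:

import numpy as np

def clip_gradients(grads, max_norm=5.0):
    """Rescale the whole set of gradients if their global norm exceeds
    max_norm, so that exploding gradients cannot blow up the update."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

# Usage with the BPTT gradients sketched earlier:
# dW_xh, dW_hh, dW_hy = clip_gradients([dW_xh, dW_hh, dW_hy])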
Outline
• Standard Recurrent Neural Networks
• Training RNN: BackPropagation Through Time
• LSTM and GRU
• Applications of RNNs

Long Short-Term Memory (LSTM)
Problem of standard RNNs = no actual LONG-TERM memory
LSTM = RNN variant for solving this issue (proposed by Hochreiter & Schmidhuber in 1997)
• Key idea = use "gates" that modulate the respective influences of input and memory
[Figures from https://colah.github.io/posts/2015-08-Understanding-LSTMs/]
LSTM gates
Gate = pointwise multiplication by a sigmoid output σ ∈ (0,1)
⇒ modulates between "let nothing through" and "let everything through"
• FORGET gate
• INPUT gate
⇒ next state = mix between pure memory and pure new input
[Figures from https://colah.github.io/posts/2015-08-Understanding-LSTMs/]

LSTM summary
• OUTPUT gate
ALL weights W_f, W_i, W_c and W_o (and biases) are LEARNT
[Figure from Deep Learning book by I. Goodfellow, Y. Bengio & A. Courville]
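For concreteness, a minimal NumPy sketch of one LSTM time step following the standard gate equations; the gate names W_f, W_i, W_c, W_o match the slide, but the convention that each weight matrix acts on the concatenation [h_{t-1}, x_t] and the exact shapes are assumptions for illustration:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    """One LSTM time step (standard formulation); each W acts on [h_prev, x_t]."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ z + b_f)            # FORGET gate: what to keep from memory
    i_t = sigmoid(W_i @ z + b_i)            # INPUT gate: what to write
    c_tilde = np.tanh(W_c @ z + b_c)        # candidate new content
    c_t = f_t * c_prev + i_t * c_tilde      # new cell state: mix of memory and new input
    o_t = sigmoid(W_o @ z + b_o)            # OUTPUT gate: what to expose
    h_t = o_t * np.tanh(c_t)                # new hidden state
    return h_t, c_t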
Why does LSTM avoid vanishing gradients?
The cell state is updated additively (c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t), so the error signal can flow backwards along the cell state through the forget-gate path, without being repeatedly multiplied by the recurrent weight matrix and squashed by tanh as in a standard RNN.

Gated Recurrent Unit (GRU)
Simplified variant of LSTM, with only 2 gates: a RESET gate & an UPDATE gate (proposed by Cho et al. in 2014)
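Similarly, a hedged NumPy sketch of one GRU time step under the usual formulation with a RESET gate r and an UPDATE gate z; weight names and shapes are illustrative:

import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_step(x_t, h_prev, W_r, W_z, W_h, b_r, b_z, b_h):
    """One GRU time step (standard formulation); each W acts on [h_prev, x_t]."""
    zx = np.concatenate([h_prev, x_t])
    r_t = sigmoid(W_r @ zx + b_r)                                        # RESET gate
    z_t = sigmoid(W_z @ zx + b_z)                                        # UPDATE gate
    h_tilde = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]) + b_h)   # candidate state
    h_t = (1.0 - z_t) * h_prev + z_t * h_tilde    # interpolate old state and candidate
    return h_t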
Outline
• Standard Recurrent Neural Networks
• Training RNN: BackPropagation Through Time
• LSTM and GRU
• Applications of RNNs

Typical usages of RNNs
[Figure: typical usage patterns — sequence input and/or sequence output, including Sequence-to-Sequence mapping.]
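As one illustrative possibility (not from the slides), sequence-to-one and sequence-to-sequence setups can be sketched with Keras layers; the input sizes, layer widths and number of classes below are assumptions:

import tensorflow as tf
from tensorflow.keras import layers, models

# Sequence-to-one: read a whole sequence (e.g. 50 steps of 16 features),
# keep only the final hidden state, and classify it into 10 classes.
seq_to_one = models.Sequential([
    layers.Input(shape=(50, 16)),
    layers.LSTM(64),                          # returns only the last hidden state
    layers.Dense(10, activation="softmax"),
])

# Sequence-to-sequence (aligned): produce one label per time step.
seq_to_seq = models.Sequential([
    layers.Input(shape=(50, 16)),
    layers.LSTM(64, return_sequences=True),   # returns the full sequence of hidden states
    layers.Dense(10, activation="softmax"),   # applied at every time step
])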
Combining RNN with CNN
Feed into the RNN the features from the last convolutional layer
For example, for image captioning

Deep RNNs
Several RNNs stacked (like layers in an MLP)
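A hedged Keras-style sketch of such a deep (stacked) RNN: every recurrent layer except the last returns its full sequence of hidden states (return_sequences=True) so that the next recurrent layer receives a sequence; the sizes are illustrative:

import tensorflow as tf
from tensorflow.keras import layers, models

# Deep RNN: stacked recurrent layers, each feeding its sequence of hidden
# states to the next one, like layers in an MLP.
deep_rnn = models.Sequential([
    layers.Input(shape=(100, 32)),              # 100 time steps, 32 features
    layers.GRU(128, return_sequences=True),     # lower recurrent layer
    layers.GRU(128, return_sequences=True),     # middle recurrent layer
    layers.GRU(128),                            # top layer: last hidden state only
    layers.Dense(5, activation="softmax"),
])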