Deep-Learning: Recurrent Neural Networks (RNN)
Pr. Fabien MOUTARDE
Center for Robotics, MINES ParisTech, PSL Université Paris
Fabien.Moutarde@mines-paristech.fr
http://people.mines-paristech.fr/fabien.moutarde

Acknowledgements
During the preparation of these slides, I drew inspiration and borrowed some slide content from several sources, in particular:
• Fei-Fei Li, J. Johnson & S. Yeung: slides on "Recurrent Neural Networks" from the "Convolutional Neural Networks for Visual Recognition" course at Stanford
http://cs231n.stanford.edu/slides/2019/cs231n_2019_lecture10.pdf
• Yingyu Liang: slides on "Recurrent Neural Networks" from the "Deep Learning Basics" course at Princeton
https://www.cs.princeton.edu/courses/archive/spring16/cos495/slides/DL_lecture9_RNN.pdf
• Arun Mallya: slides "Introduction to RNNs" from the "Trends in Deep Learning and Recognition" course of Svetlana Lazebnik at University of Illinois at Urbana-Champaign
http://slazebni.cs.illinois.edu/spring17/lec02_rnn.pdf
• Tingwu Wang: slides on "Recurrent Neural Network" for a course at University of Toronto
https://www.cs.toronto.edu/%7Etingwuwang/rnn_tutorial.pdf
• Christopher Olah: online tutorial "Understanding LSTM Networks"
https://colah.github.io/posts/2015-08-Understanding-LSTMs/
Outline
• Standard Recurrent Neural Networks
• Training RNN: BackPropagation Through Time
• LSTM and GRU
• Applications of RNNs

Recurrent Neural Networks (RNN)
[Figure: a small network with inputs x1, x2, x3 and a time-delay on each recurrent connection, shown side by side with its equivalent form.]
Canonical form of RNN
[Figure: canonical form — a non-recurrent network computes the output Y(t) and the new state variables X(t) from the external input U(t) and the previous state variables X(t-1).]

Time unfolding of RNN
[Figure: the same non-recurrent network replicated at times t-2, t-1 and t; each copy takes the external input U(.) and the previous state variables X(.-1), and produces the output Y(.) and the new state variables X(.), starting from the state X(t-3).]
Dynamic systems & RNN
A discrete-time dynamic system is defined by a state-update rule s(t+1) = f( s(t), x(t+1) ). If a Neural Net is used for f, this is EXACTLY an RNN!
[Figures from Deep Learning, Goodfellow, Bengio and Courville]

Standard ("vanilla") RNN
State vector s ↔ vector h of hidden neurons:
h_t = tanh( W_hh h_{t-1} + W_xh x_t )
y_t = softMax( W_hy h_t )
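For illustration (not part of the original slides), a minimal NumPy sketch of one vanilla RNN step and of its unrolling over a sequence; the dimensions and the weight names W_xh, W_hh, W_hy are assumptions chosen to match the formulas above:

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Illustrative sizes and randomly initialized weights.
rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 4, 8, 3
W_xh = rng.normal(scale=0.1, size=(n_hidden, n_in))
W_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
W_hy = rng.normal(scale=0.1, size=(n_out, n_hidden))

def rnn_step(x_t, h_prev):
    """One time step: new hidden state and output."""
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev)   # h_t = tanh(W_hh h_{t-1} + W_xh x_t)
    y_t = softmax(W_hy @ h_t)                    # y_t = softMax(W_hy h_t)
    return h_t, y_t

# Time unfolding: the SAME weights are applied at every step of the sequence.
h = np.zeros(n_hidden)
sequence = [rng.normal(size=n_in) for _ in range(5)]
for x_t in sequence:
    h, y = rnn_step(x_t, h)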
Advantages of RNN
• The hidden state s of the RNN builds a kind of lossy summary of the past
• RNNs are naturally adapted to processing SEQUENTIAL data: the same computation formula is applied at each time step, but modulated by the evolving "memory" contained in state s
• Universality of RNNs: any function computable by a Turing Machine can be computed by a finite-size RNN (Siegelmann and Sontag, 1995)

RNN hyper-parameters
• As for MLP, the main hyper-parameter is the size of the hidden layer (= size of vector h)
Outline
• Standard Recurrent Neural Networks
• Training RNN: BackPropagation Through Time
• LSTM and GRU
• Applications of RNNs

RNN training
[Figure: the network unfolded over a temporal sequence t, t+1, ..., t+4 (training horizon N_t = 4), the same weights W(t) being shared by every copy, with errors computed at the successive time steps.]
• BackPropagation Through Time (BPTT): gradients update for a whole sequence
• or Real Time Recurrent Learning (RTRL): gradients update for each frame in a sequence
BackPropagation THROUGH TIME (BPTT)
• Forward through the entire sequence to compute the SUM of losses at ALL (or part of) time steps
• Then backprop through the ENTIRE sequence to compute gradients

BPTT computation principle
[Figure: the unfolded network seen as successive blocks 1, 2, 3 sharing the same weights W(t); each block receives the input U(.) and the previous state X(.-1), and its output is compared to the desired output D(.). The error gradient dE/dX is propagated backwards from block to block, each block contributes its own weight gradient dW_i, and the total gradient is dW = dW1 + dW2 + dW3.]
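To make the principle concrete, here is a hedged NumPy sketch of BPTT for the vanilla RNN sketched earlier, assuming a squared-error loss summed over all time steps and a linear readout; weight names and shapes are illustrative, not taken from the slides:

import numpy as np

def bptt_grads(xs, ds, W_xh, W_hh, W_hy):
    """Sketch of BPTT for a vanilla RNN with loss E = sum_t ||y_t - d_t||^2."""
    T = len(xs)
    n_hidden = W_hh.shape[0]
    hs, ys = [np.zeros(n_hidden)], []
    # Forward through the ENTIRE sequence, storing hidden states.
    for x_t in xs:
        h_t = np.tanh(W_xh @ x_t + W_hh @ hs[-1])
        hs.append(h_t)
        ys.append(W_hy @ h_t)                     # linear readout for simplicity
    # Backward through the ENTIRE sequence, summing the per-step
    # contributions to each weight gradient (dW = dW1 + dW2 + ...).
    dW_xh, dW_hh, dW_hy = [np.zeros_like(W) for W in (W_xh, W_hh, W_hy)]
    dh_next = np.zeros(n_hidden)
    for t in reversed(range(T)):
        dy = 2.0 * (ys[t] - ds[t])                # dE/dy_t
        dW_hy += np.outer(dy, hs[t + 1])
        dh = W_hy.T @ dy + dh_next                # error from output AND from future steps
        dz = dh * (1.0 - hs[t + 1] ** 2)          # back through tanh
        dW_xh += np.outer(dz, xs[t])
        dW_hh += np.outer(dz, hs[t])
        dh_next = W_hh.T @ dz                     # propagate to the previous time step
    return dW_xh, dW_hh, dW_hy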
BPTT algorithm
[Figure: the feedforward network computes Y(t) and the new state X(t) from the input U(t) and the delayed state X(t-1).]
Weight update over a training horizon N_t:
W(t+N_t) = W(t) - λ · ∇_W E,   with E = Σ_t ( Y_t - D_t )²
Chain rule over the unfolded network:
∂E_t/∂W = Σ_{k≤t} (∂E_t/∂X_t) · (∂X_t/∂X_k) · (∂X_k/∂W)
with ∂X_t/∂X_k = Π_{j=k+1..t} ∂X_j/∂X_{j-1} = product of Jacobian matrices of the feedforward net

Vanishing/exploding gradient problem
• If eigenvalues of the Jacobian matrix are > 1, then gradients tend to EXPLODE
⇒ Learning will never converge.
• Conversely, if eigenvalues of the Jacobian matrix are < 1, then gradients tend to VANISH
⇒ Error signals can only affect small time lags ⇒ short-term memory only.
⇒ Possible solution for exploding gradients: the CLIPPING trick
⇒ Possible solutions for vanishing gradients:
– use ReLU instead of tanh
– change what is inside the RNN!
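A minimal sketch of the clipping trick mentioned above, assuming a global-norm formulation; the threshold value is an arbitrary illustrative choice:

import numpy as np

def clip_gradients(grads, max_norm=5.0):
    """Rescale the whole set of gradients if their global norm exceeds
    max_norm, so that exploding gradients cannot blow up the update."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

# Usage with the BPTT gradients sketched earlier:
# dW_xh, dW_hh, dW_hy = clip_gradients([dW_xh, dW_hh, dW_hy])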
Outline
• Standard Recurrent Neural Networks
• Training RNN: BackPropagation Through Time
• LSTM and GRU
• Applications of RNNs

Long Short-Term Memory (LSTM)
Problem of standard RNNs = no actual LONG-TERM memory
LSTM = RNN variant for solving this issue (proposed by Hochreiter & Schmidhuber in 1997)
• Key idea = use "gates" that modulate the respective influences of input and memory
[Figures from https://colah.github.io/posts/2015-08-Understanding-LSTMs/]
LSTM gates
Gate = pointwise multiplication by a sigmoid output σ ∈ (0,1)
⇒ modulates between "let nothing through" and "let everything through"
• FORGET gate
• INPUT gate
⇒ next state = mix between pure memory and pure new input
[Figures from https://colah.github.io/posts/2015-08-Understanding-LSTMs/]

LSTM summary
• OUTPUT gate
ALL weights W_f, W_i, W_c and W_o (and biases) are LEARNT
[Figure from Deep Learning book by I. Goodfellow, Y. Bengio & A. Courville]
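For concreteness, a minimal NumPy sketch of one LSTM time step following the standard gate equations; the gate names W_f, W_i, W_c, W_o match the slide, but the convention that each weight matrix acts on the concatenation [h_{t-1}, x_t] and the exact shapes are assumptions for illustration:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    """One LSTM time step (standard formulation); each W acts on [h_prev, x_t]."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ z + b_f)            # FORGET gate: what to keep from memory
    i_t = sigmoid(W_i @ z + b_i)            # INPUT gate: what to write
    c_tilde = np.tanh(W_c @ z + b_c)        # candidate new content
    c_t = f_t * c_prev + i_t * c_tilde      # new cell state: mix of memory and new input
    o_t = sigmoid(W_o @ z + b_o)            # OUTPUT gate: what to expose
    h_t = o_t * np.tanh(c_t)                # new hidden state
    return h_t, c_t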
Why does LSTM avoid vanishing gradients?
The cell state is updated additively (c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t), so the error signal can flow backwards along the cell state through the forget-gate path, without being repeatedly multiplied by the recurrent weight matrix and squashed by tanh as in a standard RNN.

Gated Recurrent Unit (GRU)
Simplified variant of LSTM, with only 2 gates: a RESET gate & an UPDATE gate (proposed by Cho et al. in 2014)
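Similarly, a hedged NumPy sketch of one GRU time step under the usual formulation with a RESET gate r and an UPDATE gate z; weight names and shapes are illustrative:

import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_step(x_t, h_prev, W_r, W_z, W_h, b_r, b_z, b_h):
    """One GRU time step (standard formulation); each W acts on [h_prev, x_t]."""
    zx = np.concatenate([h_prev, x_t])
    r_t = sigmoid(W_r @ zx + b_r)                                        # RESET gate
    z_t = sigmoid(W_z @ zx + b_z)                                        # UPDATE gate
    h_tilde = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]) + b_h)   # candidate state
    h_t = (1.0 - z_t) * h_prev + z_t * h_tilde    # interpolate old state and candidate
    return h_t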
Outline
• Standard Recurrent Neural Networks
• Training RNN: BackPropagation Through Time
• LSTM and GRU
• Applications of RNNs

Typical usages of RNNs
[Figure: typical usage patterns — sequence input and/or sequence output, including Sequence-to-Sequence mapping.]
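As one illustrative possibility (not from the slides), sequence-to-one and sequence-to-sequence setups can be sketched with Keras layers; the input sizes, layer widths and number of classes below are assumptions:

import tensorflow as tf
from tensorflow.keras import layers, models

# Sequence-to-one: read a whole sequence (e.g. 50 steps of 16 features),
# keep only the final hidden state, and classify it into 10 classes.
seq_to_one = models.Sequential([
    layers.Input(shape=(50, 16)),
    layers.LSTM(64),                          # returns only the last hidden state
    layers.Dense(10, activation="softmax"),
])

# Sequence-to-sequence (aligned): produce one label per time step.
seq_to_seq = models.Sequential([
    layers.Input(shape=(50, 16)),
    layers.LSTM(64, return_sequences=True),   # returns the full sequence of hidden states
    layers.Dense(10, activation="softmax"),   # applied at every time step
])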
Combining RNN with CNN
Feed into the RNN the features from the last convolutional layer
For example, for image captioning

Deep RNNs
Several RNNs stacked (like layers in an MLP)
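A hedged Keras-style sketch of such a deep (stacked) RNN: every recurrent layer except the last returns its full sequence of hidden states (return_sequences=True) so that the next recurrent layer receives a sequence; the sizes are illustrative:

import tensorflow as tf
from tensorflow.keras import layers, models

# Deep RNN: stacked recurrent layers, each feeding its sequence of hidden
# states to the next one, like layers in an MLP.
deep_rnn = models.Sequential([
    layers.Input(shape=(100, 32)),              # 100 time steps, 32 features
    layers.GRU(128, return_sequences=True),     # lower recurrent layer
    layers.GRU(128, return_sequences=True),     # middle recurrent layer
    layers.GRU(128),                            # top layer: last hidden state only
    layers.Dense(5, activation="softmax"),
])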