How to Construct Deep Recurrent Neural Networks
Authors: R. Pascanu, C. Gulcehre, K. Cho, Y. Bengio
Presentation: Haroun Habeeb
Paper: https://arxiv.org/abs/1312.6026
This presentation
▪ Motivation
▪ Formal RNN paradigm
▪ Deep RNN designs
▪ Experiments
▪ Note on training
▪ Takeaways
Motivation: Better RNNs?
▪ Depth makes feedforward neural networks more expressive.
▪ What about RNNs? How do you make them deep? Does depth help?
Conventional RNNs
h_t = f_h(x_t, h_{t-1})
y_t = f_o(h_t)
Specifically:
f_h(x_t, h_{t-1}; W, U) = \phi_h(W h_{t-1} + U x_t)
f_o(h_t; V) = \phi_o(V h_t)
▪ How general is this?
▪ How easy is it to represent an LSTM/GRU in this form?
▪ What about bias terms?
▪ How would you make an LSTM deep?
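A minimal numpy sketch of one step of this conventional RNN (the tanh/softmax choices and all dimensions are assumptions made for illustration, not taken from the slides):

```python
import numpy as np

def rnn_step(x_t, h_prev, W, U, V):
    """One conventional RNN step: h_t = phi_h(W h_{t-1} + U x_t), y_t = phi_o(V h_t)."""
    h_t = np.tanh(W @ h_prev + U @ x_t)        # transition function f_h
    logits = V @ h_t                           # output function f_o (pre-activation)
    y_t = np.exp(logits - logits.max())
    y_t /= y_t.sum()                           # softmax as an example phi_o
    return h_t, y_t

# Example usage with assumed sizes: hidden = 8, input = 5, output = 3.
rng = np.random.default_rng(0)
W, U, V = rng.normal(size=(8, 8)), rng.normal(size=(8, 5)), rng.normal(size=(3, 8))
h = np.zeros(8)
for x in rng.normal(size=(4, 5)):              # a length-4 input sequence
    h, y = rnn_step(x, h, W, U, V)
```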
THE DEEPENING
DT(S)-RNN (deep transition, with shortcut connections)
y_t = f_o(h_t)
h_t = f_h(x_t, h_{t-1})
Specifically:
y_t = \phi_o( V h_t )
h_t = \phi_h( W_L \phi_{L-1}( \cdots \phi_1( W_1 h_{t-1} + U_1 x_t ) \cdots ) + \bar{W} h_{t-1} + \bar{U} x_t )
(the \bar{W} and \bar{U} terms are the shortcut connections that skip the intermediate transition layers)
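A numpy sketch of the DT(S)-RNN step above (the tanh nonlinearities, the list-of-matrices layout, and having at least two transition layers are assumptions of this sketch):

```python
import numpy as np

def dts_rnn_step(x_t, h_prev, Ws, U1, W_bar, U_bar, V):
    """DT(S)-RNN step: a multi-layer transition MLP plus shortcut connections.

    Ws = [W_1, ..., W_L] (len(Ws) >= 2 assumed); W_bar, U_bar are the shortcut weights.
    """
    a = np.tanh(Ws[0] @ h_prev + U1 @ x_t)                    # phi_1(W_1 h_{t-1} + U_1 x_t)
    for W_l in Ws[1:-1]:
        a = np.tanh(W_l @ a)                                  # intermediate transition layers
    h_t = np.tanh(Ws[-1] @ a + W_bar @ h_prev + U_bar @ x_t)  # top layer + shortcut terms
    y_t = V @ h_t                                             # shallow output (deep transition only)
    return h_t, y_t
```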
DOT(S)-RNN (deep output + deep transition, with shortcut connections)
y_t = f_o(h_t)
h_t = f_h(x_t, h_{t-1})
Specifically:
y_t = \phi_o( V_M \phi_{M-1}( \cdots \phi_1( V_1 h_t ) \cdots ) )
h_t = \phi_h( W_L \phi_{L-1}( \cdots \phi_1( W_1 h_{t-1} + U_1 x_t ) \cdots ) + \bar{W} h_{t-1} + \bar{U} x_t )
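Extending the DT(S) sketch with a deep output MLP gives a DOT(S)-RNN step (again a hedged sketch; Vs = [V_1, ..., V_M] and the tanh choices are assumptions):

```python
import numpy as np

def dots_rnn_step(x_t, h_prev, Ws, U1, W_bar, U_bar, Vs):
    """DOT(S)-RNN step: deep transition with shortcuts plus a deep output MLP."""
    a = np.tanh(Ws[0] @ h_prev + U1 @ x_t)                    # first transition layer
    for W_l in Ws[1:-1]:
        a = np.tanh(W_l @ a)                                  # intermediate transition layers
    h_t = np.tanh(Ws[-1] @ a + W_bar @ h_prev + U_bar @ x_t)  # deep transition + shortcuts
    o = h_t
    for V_m in Vs[:-1]:
        o = np.tanh(V_m @ o)                                  # hidden layers of the output MLP
    y_t = Vs[-1] @ o                                          # final (pre-softmax) output
    return h_t, y_t
```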
sRNN (stacked RNN)
h_t^{(0)} = f_h^{(0)}(x_t, h_{t-1}^{(0)})
h_t^{(l)} = f_h^{(l)}(h_t^{(l-1)}, h_{t-1}^{(l)})   for all l >= 1
y_t = f_o(h_t^{(L)})
Specifically:
h_t^{(0)} = \phi_0( W_0 h_{t-1}^{(0)} + U_0 x_t )
h_t^{(l)} = \phi_l( W_l h_{t-1}^{(l)} + U_l h_t^{(l-1)} )   for all l >= 1
y_t = \phi_o( V h_t^{(L)} )
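A corresponding numpy sketch of one stacked-RNN step (per-layer tanh and the list layout are assumptions):

```python
import numpy as np

def srnn_step(x_t, h_prev_layers, Ws, Us, V):
    """One sRNN step: layer l reads layer l-1 at time t and its own state at t-1.

    h_prev_layers[l] is h_{t-1}^{(l)}; Ws[l], Us[l] are layer l's recurrent/input weights.
    """
    h_layers = []
    below = x_t                                    # layer 0 reads the input x_t
    for W_l, U_l, h_prev in zip(Ws, Us, h_prev_layers):
        h_l = np.tanh(W_l @ h_prev + U_l @ below)  # phi_l(W_l h_{t-1}^{(l)} + U_l h_t^{(l-1)})
        h_layers.append(h_l)
        below = h_l                                # feed upward to the next layer
    y_t = V @ h_layers[-1]                         # output read from the top layer
    return h_layers, y_t
```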
Experiment 0: Parameter count
Food for thought: it is not obvious which model has more parameters, the sRNN or the DOT(S)-RNN.
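A rough counting sketch (biases ignored; all sizes below are assumptions chosen only to show that the two counts can land in the same ballpark):

```python
def srnn_params(n, d, o, L):
    """Parameter count for an L-layer sRNN with hidden size n, input d, output o."""
    bottom = n * n + n * d                   # W_0, U_0
    upper = (L - 1) * (n * n + n * n)        # W_l, U_l for l = 1..L-1
    return bottom + upper + o * n            # plus the output matrix V

def dots_rnn_params(n, d, o, L_h, L_o):
    """Parameter count for a DOT(S)-RNN with transition depth L_h and output depth L_o."""
    transition = (n * n + n * d) + (L_h - 1) * n * n   # W_1, U_1, then W_2..W_{L_h}
    shortcuts = n * n + n * d                          # W_bar, U_bar
    output = (L_o - 1) * n * n + o * n                 # V_1..V_{L_o-1}, then V_{L_o}
    return transition + shortcuts + output

print(srnn_params(n=200, d=100, o=100, L=3))               # 240000
print(dots_rnn_params(n=200, d=100, o=100, L_h=3, L_o=2))  # 260000
```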
Experiment 1: Polyphonic Music Prediction
Task: given a sequence of musical notes, predict the next note(s).
Food for thought: Sure, depth helps, but * helps a lot more in this case. What about RNN* and other models with *?
Experiment 2: Language Modelling
Task: given a sequence of characters/words, predict the next character/word (LM on PTB).
Food for thought: Deepening LSTMs? Stack them or DOT(S) them?
Note on training
▪ Training RNNs can be hard because of vanishing/exploding gradients.
▪ The authors did several things:
  ▪ Clipped gradients (threshold = 1)
  ▪ Sparse weight matrices (\|w\|_0 = 20)
  ▪ Normalized weight matrices (max_{i,j} |w_{i,j}| = 1)
  ▪ Added Gaussian noise to the gradients
  ▪ Used dropout, maxout, and L_p units
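A minimal sketch of the gradient-clipping piece (global-norm rescaling is assumed here; the threshold of 1 comes from the slide, the dict-of-arrays layout does not):

```python
import numpy as np

def clip_gradients(grads, threshold=1.0):
    """Rescale all gradients together if their global norm exceeds the threshold."""
    norm = np.sqrt(sum(np.sum(g ** 2) for g in grads.values()))
    if norm > threshold:
        grads = {name: g * (threshold / norm) for name, g in grads.items()}
    return grads

# Example: clip the gradients of two parameter matrices to global norm 1.
rng = np.random.default_rng(0)
grads = {"W": rng.normal(size=(8, 8)), "U": rng.normal(size=(8, 5))}
grads = clip_gradients(grads, threshold=1.0)
```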
Takeaways
▪ Plain, shallow RNNs are not great.
▪ DOT-RNNs do well.
▪ The following should be deep networks:
  ▪ y = g(h, x)
  ▪ h_t = f(x_t, h_{t-1})
  ▪ i.e. both f and g
▪ Training can be really hard.
  ▪ Thresholding gradients, dropout, and maxout units are helpful/needed.
▪ LSTMs are good.
Questions?