Recurrent Neural Networks CS 6956: Deep Learning for NLP
Overview
1. Modeling sequences
2. Recurrent neural networks: An abstraction
3. Usage patterns for RNNs
4. Bidirectional RNNs
5. A concrete example: The Elman RNN
6. The vanishing gradient problem
7. Long short-term memory units
Recurrent neural networks
• First introduced by Elman (1990)
• Provide a mechanism for encoding sequences of arbitrary length into vectors that capture the sequential information
• Currently, perhaps one of the most commonly used tools in the deep learning toolkit for NLP applications
The RNN abstraction
A high-level overview that doesn't go into details
• An RNN cell is a unit of differentiable compute that maps inputs to outputs.
• So far, there is no way to build a sequence of such cells.
• To allow the ability to compose these cells, each cell takes a recurrent input from the previous such cell.
• In addition to the output, each cell also produces a recurrent output that can serve as a memory of past states for the next such cell.
[Figure: an RNN cell with an input and an output, plus a recurrent input and a recurrent output]
The RNN abstraction
A high-level overview that doesn't go into details
Conceptually, there are two operations. Using the input and the recurrent input (also called the previous cell state), the cell computes:
1. The next cell state
2. The output
The RNN abstraction: A simple example
Consider the sentence "John lives in Salt Lake City". The cell template is unrolled once for each input. Starting from an initial state, the same computation graph is applied at every step: the cell consumes "John" and the initial state to produce Output 1 and a new state; the next copy of the cell consumes "lives" and that state to produce Output 2; and so on, until "City" produces Output 6.
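To make the unrolling concrete, here is a minimal sketch in Python. Everything in it is illustrative: the toy cell function, the random embedding table, and the choice to emit the new state as the output are assumptions for demonstration, not a prescribed implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
tokens = ["John", "lives", "in", "Salt", "Lake", "City"]

# Illustrative stand-ins: a random embedding per token and random weights.
dim = 4
embed = {tok: rng.normal(size=dim) for tok in tokens}
W_s = rng.normal(size=(dim, dim))
W_x = rng.normal(size=(dim, dim))

def cell(state, x):
    """One RNN cell: maps (previous state, input) to (new state, output)."""
    new_state = np.tanh(W_s @ state + W_x @ x)
    return new_state, new_state  # here the output is simply the new state

state = np.zeros(dim)  # the initial state
outputs = []
for token in tokens:
    # Unrolling: the same cell (same parameters) is applied at each step.
    state, output = cell(state, embed[token])
    outputs.append(output)
# outputs[0] ... outputs[5] correspond to Output 1 ... Output 6 above.
```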
The RNN abstraction
Sometimes this is represented as a "neural network with a loop". But really, when unrolled, there are no loops: just a big feedforward network.
An abstract RNN: Notation
• Inputs to cells: $\mathbf{x}_t$ at the $t^{th}$ step
  – These are vectors
• Cell states (i.e. recurrent inputs and outputs): $\mathbf{s}_t$ at the $t^{th}$ step
  – These are also vectors
• Outputs: $\mathbf{y}_t$ at the $t^{th}$ step
  – These are also vectors
• At each step:
  – Compute the next cell state: $\mathbf{s}_t = R(\mathbf{s}_{t-1}, \mathbf{x}_t)$
  – Compute the output: $\mathbf{y}_t = O(\mathbf{s}_t)$
Both these functions can be parameterized. That is, they can be neural networks whose parameters are trained (see the sketch below).
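As one hedged illustration of what "parameterized" can mean, here is a possible concrete choice of $R$ and $O$. The weight names (W_s, W_x, W_o), the bias terms, and the tanh nonlinearity are assumptions for the example; this particular choice is essentially the Elman RNN discussed later in the deck.

```python
import numpy as np

def R(s_prev, x, W_s, W_x, b):
    # One possible state-update function: the next state is a
    # nonlinear function of the previous state and the current input.
    return np.tanh(W_s @ s_prev + W_x @ x + b)

def O(s, W_o, b_o):
    # One possible output function: a linear map of the current state.
    return W_o @ s + b_o
```

In practice the matrices W_s, W_x, W_o and biases b, b_o would be trainable parameters, updated by backpropagating through the unrolled computation graph.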
What does unrolling the RNN do?
• At each step:
  – Compute the next cell state: $\mathbf{s}_t = R(\mathbf{s}_{t-1}, \mathbf{x}_t)$
  – Compute the output: $\mathbf{y}_t = O(\mathbf{s}_t)$
• We can write this as:
  – $\mathbf{s}_1 = R(\mathbf{s}_0, \mathbf{x}_1)$
  – $\mathbf{s}_2 = R(\mathbf{s}_1, \mathbf{x}_2) = R(R(\mathbf{s}_0, \mathbf{x}_1), \mathbf{x}_2)$, which encodes the sequence up to $t = 2$ into a single vector
  – $\mathbf{s}_3 = R(\mathbf{s}_2, \mathbf{x}_3) = R(R(R(\mathbf{s}_0, \mathbf{x}_1), \mathbf{x}_2), \mathbf{x}_3)$, which encodes the sequence up to $t = 3$ into a single vector
  – $\mathbf{s}_4 = R(\mathbf{s}_3, \mathbf{x}_4) = R(R(R(R(\mathbf{s}_0, \mathbf{x}_1), \mathbf{x}_2), \mathbf{x}_3), \mathbf{x}_4)$, which encodes the sequence up to $t = 4$ into a single vector
  – … and so on
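In code, this nested composition is just a loop. A minimal sketch, reusing the illustrative state update from before (the weight names and values are again assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4
W_s = rng.normal(size=(dim, dim))
W_x = rng.normal(size=(dim, dim))

def step(s_prev, x):
    # The illustrative state-update function R, with weights fixed here.
    return np.tanh(W_s @ s_prev + W_x @ x)

def encode(x_seq, s0):
    """Fold the step function over the sequence. The result is
    step(...step(step(s0, x1), x2)..., xT): a single vector that
    encodes the entire sequence, exactly the nesting shown above."""
    s = s0
    for x in x_seq:
        s = step(s, x)
    return s

xs = [rng.normal(size=dim) for _ in range(4)]
s4 = encode(xs, np.zeros(dim))  # = R(R(R(R(s0, x1), x2), x3), x4)
```

Because the same step function is applied at every position, the loop handles sequences of arbitrary length with a fixed number of parameters.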