Lecture 11: Recurrent Neural Networks 2. CS109B Data Science 2. Pavlos Protopapas and Mark Glickman.
Outline
• Forgetting, remembering and updating (review)
• Gated networks, LSTM and GRU
• RNN Structures
• Bidirectional
• Deep RNN
• Sequence to Sequence
• Teacher Forcing
• Attention models
Notation: using conventional and convenient notation, $Y_t$ denotes the input at time $t$, $Z_t$ the output, and $h_t$ the hidden state.
Simple RNN again. [Diagram: the input $Y_t$ enters through weights $V$, the previous state enters through $U$, a $\sigma$ nonlinearity produces the new state $h_t$, and weights $W$ followed by $\sigma$ produce the output $Z_t$.]
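To make the recurrence concrete, here is a minimal NumPy sketch of the simple RNN above, assuming the update $h_t = \sigma(U h_{t-1} + V Y_t + b_h)$ and output $Z_t = \sigma(W h_t + b_z)$ (the assignment of $U$, $V$, $W$ to the arrows follows the diagram; names, dimensions, and data are illustrative, not from the lecture):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def simple_rnn_forward(Y, U, V, W, b_h, b_z):
    """Run the simple RNN over a sequence Y of shape (T, input_dim)."""
    h = np.zeros(U.shape[0])                  # initial state h_0 = 0
    outputs = []
    for y_t in Y:
        h = sigmoid(U @ h + V @ y_t + b_h)    # state update
        outputs.append(sigmoid(W @ h + b_z))  # output Z_t
    return np.stack(outputs), h

# Tiny usage example with random (untrained) weights.
rng = np.random.default_rng(0)
Y = rng.normal(size=(4, 3))                   # T = 4 steps, 3 input features
U, V, W = rng.normal(size=(5, 5)), rng.normal(size=(5, 3)), rng.normal(size=(2, 5))
Z, h_T = simple_rnn_forward(Y, U, V, W, np.zeros(5), np.zeros(2))
print(Z.shape, h_T.shape)                     # (4, 2) (5,)
```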
Simple RNN again, stepping through the same diagram: Memories; Memories - Forgetting; New Events; New Events Weighted; Updated Memories. [The same RNN diagram is shown at each step with the relevant part highlighted.]
RNN + Memory. [Diagram: a chain of RNN cells processing the inputs "dog barking", "get dark", "white shirt", "apple pie", "knee hurts"; at each step the memory holds a weighted list of earlier items (e.g. 0.3 dog barking, 0.1 white shirt, ...), with the weights changing as new inputs arrive.]
RNN + Memory + Output. [Diagram: the same chain, now also emitting an output at each step based on the weighted memory contents.]
Outline
• Forgetting, remembering and updating (review)
• Gated networks, LSTM and GRU
• RNN Structures
• Bidirectional
• Deep RNN
• Sequence to Sequence
• Teacher Forcing
• Attention models
LSTM: Long Short-Term Memory
Gates: A key idea in the LSTM is a mechanism called a gate.
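As an illustrative sketch of what a gate is: a vector of values in (0, 1), typically produced by a sigmoid of a learned function of the current input and previous hidden state, applied by elementwise multiplication (all numbers below are made up for illustration):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

memory = np.array([0.3, 0.1, 0.1, 0.6])           # values currently stored
gate = sigmoid(np.array([4.0, -4.0, 0.0, 4.0]))   # roughly [0.98, 0.02, 0.50, 0.98]
print(gate * memory)                              # near-zero gate entries suppress values
```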
Forgetting: Each value is multiplied by a gate, and the result is stored back into the memory.
Remembering: Remembering involves two steps.
1. We determine how much of each new value we want to remember, using gates to control that.
2. To remember the gated values, we merely add them to the existing contents of the memory.
Remembering (cont). [Diagram illustrating the two steps above.]
Updating: To select from memory, we determine how much of each element we want to use and apply gates to the memory elements; the result is a list of scaled memories.
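Putting the three operations together, here is a minimal NumPy sketch of one forget / remember / select cycle on a memory vector (the numbers and gate values are made up for illustration; in an LSTM the gates come from learned sigmoid layers):

```python
import numpy as np

memory      = np.array([0.3, 0.1, 0.1, 0.6])   # existing memories
new_values  = np.array([0.0, 0.9, 0.2, 0.0])   # candidate new information

forget_gate = np.array([1.0, 0.1, 1.0, 1.0])   # how much of each old value to keep
input_gate  = np.array([0.0, 1.0, 0.5, 0.0])   # how much of each new value to add
output_gate = np.array([1.0, 1.0, 0.0, 1.0])   # how much of each memory to use now

memory = forget_gate * memory                  # forgetting: scale old memories
memory = memory + input_gate * new_values      # remembering: add gated new values
selected = output_gate * memory                # updating/selecting: gated read-out
print(memory, selected)
```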
LSTM. [Diagram: the LSTM cell, with cell state $C_{t-1} \to C_t$ and hidden state $h_{t-1} \to h_t$.]
Before really digging into the LSTM, let's see the big picture. [Diagram: the LSTM cell with forget gate $f_t$, input gate $i_t$, output gate $o_t$, candidate $\tilde{C}_t$, cell state $C_{t-1} \to C_t$, and hidden state $h_{t-1} \to h_t$.] Forget gate: $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$.
LSTMs are recurrent neural networks with:
1. A cell state and a hidden state, both of which are updated at each step and can be thought of as memories.
2. The cell state works as a long-term memory, and its update depends on the relation between the hidden state at $t-1$ and the input.
3. The hidden state of the next step is a transformation of the cell state and the output gate (this is the part generally used to calculate the loss, i.e. the information we want in short-term memory).
[Diagram: the LSTM cell as above.]
Let's think about my cell state: let's predict whether I will help you with the homework at time $t$.
Forget gate: The forget gate tries to estimate which features of the cell state should be forgotten: $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$. (In the example, a forget gate close to zero everywhere means: erase everything!)
Input gate: The input gate layer works in a similar way to the forget gate: $i_t$ estimates the degree of confidence in $\tilde{C}_t$, where $\tilde{C}_t$ is a new estimate of the cell state. Let's say that my input gate estimation is as shown in the figure.
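For reference, in the standard LSTM formulation (consistent with the forget-gate equation above) these quantities are computed as:

$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$
$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$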
Cell state: After computing the forget gate and the input gate, we can update the cell state.
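The standard cell-state update combines the two gates ($\odot$ denotes elementwise multiplication):

$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$

Old memories are scaled by the forget gate, and the gated candidate values are added in, exactly as in the forget/remember picture above.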
Output gate:
• The output gate layer is calculated using the input $x_t$ at time $t$ and the hidden state of the previous step.
• It is important to notice that the hidden state passed to the next step is obtained using the output gate layer, and this is usually the quantity we use when optimizing the loss.
[Diagram: the LSTM cell as above.]
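In the standard formulation:

$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$
$h_t = o_t \odot \tanh(C_t)$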
GRU: A variant of the LSTM is called the Gated Recurrent Unit, or GRU. The GRU is like an LSTM but with some simplifications:
1. The forget and input gates are combined into a single gate.
2. There is no separate cell state.
Since there's a bit less work to be done, a GRU can be a bit faster than an LSTM, and it usually produces results similar to the LSTM. Note: it is worthwhile to try both the LSTM and the GRU to see if either provides more accurate results for a given data set.
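For reference, the standard GRU equations (not shown on the slide) are as follows, with the update gate $z_t$ playing the combined role of the LSTM's forget and input gates and the hidden state doubling as the memory:

$z_t = \sigma(W_z \cdot [h_{t-1}, x_t])$
$r_t = \sigma(W_r \cdot [h_{t-1}, x_t])$
$\tilde{h}_t = \tanh(W \cdot [r_t \odot h_{t-1}, x_t])$
$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$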
GRU (cont). [Diagram: the GRU cell.]
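Following the note above, here is a minimal Keras sketch (layer sizes and input shapes are illustrative, not from the lecture) showing how easily the two units can be swapped when comparing them on a data set:

```python
from tensorflow.keras import layers, models

def make_model(rnn_layer):
    # Many-to-one model: a sequence of 20 steps with 8 features -> one prediction.
    return models.Sequential([
        layers.Input(shape=(20, 8)),
        rnn_layer,                      # the only line that changes
        layers.Dense(1, activation="sigmoid"),
    ])

lstm_model = make_model(layers.LSTM(32))
gru_model = make_model(layers.GRU(32))
lstm_model.compile(optimizer="adam", loss="binary_crossentropy")
gru_model.compile(optimizer="adam", loss="binary_crossentropy")
```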
Backpropagation through the LSTM: to optimize my parameters I basically need to calculate all the derivatives at some time $t$ ('wcct!' = 'we can calculate this!'). So... every derivative is taken with respect to the cell state or the hidden state.
Let's calculate the cell state and the hidden state.
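Writing out the two quantities that all the gradients flow through (using the gate definitions from the previous slides):

$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$
$h_t = o_t \odot \tanh(C_t)$

Note that the direct path from $C_{t-1}$ to $C_t$ is additive and contributes a factor of $f_t$ to $\partial C_t / \partial C_{t-1}$ rather than a repeated product of weight matrices, which is what helps the LSTM mitigate the vanishing-gradient problem of the simple RNN.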
RNN Structures
• The one to one structure is useless: it takes a single input and produces a single output.
• It is not useful because the RNN cell makes little use of its unique ability to remember things about its input sequence.
[Diagram: one to one, $Y_t \to Z_t$.]
RNN Structures (cont): The many to one structure reads in a sequence and gives us back a single value. Example: sentiment analysis, where the network is given a piece of text and then reports on some quality inherent in the writing. A common example is to look at a movie review and determine whether it was positive or negative (see lab on Thursday). [Diagram: many to one, inputs $Y_{t-2}, Y_{t-1}, Y_t \to$ a single output $Z_t$.]
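A minimal Keras sketch of such a many-to-one sentiment model (vocabulary size, sequence length, and layer sizes are made up for illustration; the lab may use different choices):

```python
from tensorflow.keras import layers, models

vocab_size, max_len = 10_000, 200      # illustrative values

model = models.Sequential([
    layers.Input(shape=(max_len,), dtype="int32"),  # a sequence of word indices
    layers.Embedding(vocab_size, 64),               # map each word to a vector
    layers.LSTM(32),                                # many-to-one: only the last state is returned
    layers.Dense(1, activation="sigmoid"),          # positive vs. negative review
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```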
RNN Structures (cont): The one to many structure takes in a single piece of data and produces a sequence. For example, we give it the starting note of a song, and the network produces the rest of the melody for us. [Diagram: one to many, a single input $Y \to$ outputs $Z_{t-2}, Z_{t-1}, Z_t$.]
RNN Structures (cont): The many to many structures are in some ways the most interesting. Example: predict whether it will rain at each time step, given a sequence of inputs. [Diagram: many to many, inputs $Y_{t-2}, Y_{t-1}, Y_t \to$ outputs $Z_{t-2}, Z_{t-1}, Z_t$.]
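A minimal Keras sketch of an aligned many-to-many model, where the network emits one prediction per time step (e.g. rain / no rain at each step; shapes and sizes are illustrative):

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(24, 5)),                                    # 24 time steps, 5 features
    layers.LSTM(32, return_sequences=True),                         # keep an output at every step
    layers.TimeDistributed(layers.Dense(1, activation="sigmoid")),  # one prediction per step
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```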
RNN Structures (cont): This form of many to many can be used for machine translation. For example, the English sentence "The black dog jumped over the cat" is, in Italian, "Il cane nero saltò sopra il gatto". In Italian, the adjective "nero" (black) follows the noun "cane" (dog), so we need some kind of buffer so we can produce the words in their proper order. [Diagram: many to many, with the outputs delayed relative to the inputs.]
Bidirectional: RNNs (LSTMs and GRUs) are designed to analyze sequences of values. For example: "Srivatsan said he needs a vacation." Here "he" means Srivatsan, and we know this because the word "Srivatsan" came before the word "he". However, consider the following sentence: "He needs to work harder, Pavlos said about Srivatsan." Here "He" comes before "Srivatsan", so either the order has to be reversed or we have to combine a forward and a backward pass. Networks that do this are called bidirectional RNNs (BRNNs), or bidirectional LSTMs (BLSTMs) when using LSTM units (BGRUs, etc.).
Bidirectional (cont). [Diagram: a forward pass and a backward pass over the inputs $Y_{t-2}, Y_{t-1}, Y_t$, whose states are combined to produce the outputs $Z_{t-2}, Z_{t-1}, Z_t$; a compact symbol for a BRNN is also shown.]
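In Keras, a bidirectional recurrent layer is obtained by wrapping an LSTM (or GRU) in the Bidirectional wrapper; a minimal sketch with illustrative vocabulary size, sequence length, and layer sizes:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(200,), dtype="int32"),      # sequence of word indices
    layers.Embedding(10_000, 64),
    layers.Bidirectional(layers.LSTM(32)),          # forward + backward pass, states combined
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```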