Class 15 - Long Short-Term Memory (LSTM)


  1. Class 15 - Long Short-Term Memory (LSTM). Study materials: http://colah.github.io/posts/2015-08-Understanding-LSTMs/ , http://karpathy.github.io/2015/05/21/rnn-effectiveness/ , http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf , and Chapter 10 of the textbook.

  2. RNN concepts (based on Christopher Olah's blog on LSTMs). You don’t throw everything away and start thinking from scratch again; your thoughts have persistence. For example, imagine you want to classify what kind of event is happening at every point in a movie. It’s unclear how a traditional neural network could use its reasoning about previous events in the film to inform later ones.

  3. RNN concepts. That's why RNNs have loops, allowing information to persist.

  4. Below is a neural network $A$ that takes some input $x_t$ at time $t$ and outputs a value $h_t$. A loop allows information to be passed from one step of the network to the next.

  5. RNN concepts. A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor. Consider what happens if we unroll the loop:

  6. The problem of long-term dependencies. One of the appeals of RNNs is the idea that they might be able to connect previous information to the present task, such as using previous video frames to inform the understanding of the present frame. Sometimes, we only need to look at recent information to perform the present task.

  7. The problem of long-term dependencies. I was up all night wondering where the Sun had gone. Then it dawned on me. Sky is ___

  8. The problem of long-term dependencies. The answer to life, the universe and everything is _

  9. The problem of long-term dependencies. Due to poor grades in high school, Steven Spielberg was rejected from the University of Southern California three times. He was awarded an honorary degree in 1994 and became a trustee of the university in 1996. "Since 1980, I've been trying to be associated with this school," joked the 62-year-old filmmaker. "I eventually had to buy my way in," he told the Los Angeles Times. Spielberg has to date directed 51 films and won three Oscars. Forbes Magazine puts Spielberg's wealth at $3 billion. He is ___

  10. The problem of long-term dependencies. In such cases, where the gap between the relevant information and the place that it’s needed is small, RNNs can learn to use the past information.

  11. The problem of long-term dependencies. Unfortunately, as that gap grows, RNNs become unable to learn to connect the information. This is because of the "vanishing gradient" problem. LSTMs solve this problem.
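As a small numerical illustration of the vanishing-gradient claim above (not part of the slides; the hidden size, weight scale, and fixed hidden state are arbitrary assumptions), the sketch below multiplies the recurrent Jacobians of a plain tanh RNN across time steps and prints how quickly the gradient norm decays:

```python
import numpy as np

# Illustrative sketch: the gradient flowing back k steps in a tanh RNN is a
# product of k Jacobians diag(1 - s^2) @ W, so its norm tends to shrink.
rng = np.random.default_rng(0)
hidden = 8
W = rng.normal(scale=0.5, size=(hidden, hidden))  # assumed recurrent weights
s = rng.uniform(-1, 1, size=hidden)               # an assumed hidden state, reused each step

grad = np.eye(hidden)                             # d s_T / d s_T
for k in range(1, 51):
    jacobian = np.diag(1 - s**2) @ W              # d s_t / d s_{t-1} when s_t = tanh(...)
    grad = grad @ jacobian
    if k % 10 == 0:
        print(f"after {k:2d} steps back, gradient norm ~ {np.linalg.norm(grad):.2e}")
```

With small recurrent weights the product norm decays roughly geometrically; with large weights it can instead explode, which is the other half of the same difficulty.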

  12. RNN. RNNs are called recurrent because they perform the same task for every element of a sequence, with the output depending on the previous computations. Another way to think about RNNs is that they have a “memory” which captures information about what has been calculated so far. In theory RNNs can make use of information in arbitrarily long sequences, but in practice they are limited to looking back only a few steps.

  13. RNN. $x_t$: the input at time step $t$. $s_t$: the hidden state at time step $t$; it is the memory of the network, calculated from the previous hidden state and the input at the current step: $s_t = f(U x_t + W s_{t-1})$. Here, the function $f$ is usually a non-linearity such as sigmoid, tanh, or ReLU; $s_{-1}$, which is used to compute the first hidden state $s_0$, is typically initialized to 0. $o_t$: the output at step $t$; it is calculated based only on the memory at time $t$: $o_t = \mathrm{softmax}(V s_t)$.
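A minimal numpy sketch of this recurrence may help make the notation concrete (the dimensions, the choice of tanh for $f$, and the random initialization below are illustrative assumptions, not from the slides):

```python
import numpy as np

def softmax(z):
    z = z - z.max()            # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def rnn_forward(xs, U, W, V):
    """xs: list of input vectors x_t; returns hidden states s_t and outputs o_t."""
    s_prev = np.zeros(W.shape[0])             # s_{-1} initialized to 0
    states, outputs = [], []
    for x_t in xs:
        s_t = np.tanh(U @ x_t + W @ s_prev)   # s_t = f(U x_t + W s_{t-1}), f = tanh
        o_t = softmax(V @ s_t)                # o_t = softmax(V s_t)
        states.append(s_t)
        outputs.append(o_t)
        s_prev = s_t
    return states, outputs

# Tiny usage example with random parameters (shared across all time steps).
rng = np.random.default_rng(0)
input_dim, hidden_dim, vocab_dim = 4, 8, 5
U = rng.normal(size=(hidden_dim, input_dim))
W = rng.normal(size=(hidden_dim, hidden_dim))
V = rng.normal(size=(vocab_dim, hidden_dim))
xs = [rng.normal(size=input_dim) for _ in range(3)]
states, outputs = rnn_forward(xs, U, W, V)
print(outputs[-1])   # a probability distribution over the 5 "words" at the last step
```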

  14. RNN. An RNN shares the same parameters $U$, $V$, $W$ across all time steps. It is like we are performing the same task at each step, just with different inputs. This property greatly reduces the total number of parameters we need to learn. The following RNN has outputs at each time step, but depending on the task this may not be necessary. (Remember the types of RNN? one-to-one vs. one-to-many vs. many-to-many.)

  15. Language modeling and generating text. Given a sequence of words, we want to predict the probability of each word given the previous words. Suppose we have a sentence of $m$ words. A language model allows us to predict the probability of observing the sentence: $P(w_1, \cdots, w_m) = \prod_{i=1}^{m} P(w_i \mid w_1, \cdots, w_{i-1})$. For example, $P(ABC) = P(AB)\,P(C \mid AB) = P(A)\,P(B \mid A)\,P(C \mid AB)$.
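A tiny numeric sketch of the chain-rule factorization above, with made-up conditional probabilities (illustrative values only):

```python
# Made-up conditional probabilities for a three-word "sentence" ABC.
p_A = 0.5           # P(A)
p_B_given_A = 0.4   # P(B | A)
p_C_given_AB = 0.3  # P(C | AB)

# P(ABC) = P(A) P(B | A) P(C | AB)
p_ABC = p_A * p_B_given_A * p_C_given_AB
print(p_ABC)        # approximately 0.06
```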

  16. How to train RNN. Note that the parameters $U$, $V$, $W$ are shared across the different time steps: $s_t = \tanh(U x_t + W s_{t-1})$, $\hat{y}_t = \mathrm{softmax}(V s_t)$.

  17. How to train RNN.

  18. The cross-entropy loss at time step $t$ is given by: $E_t(y_t, \hat{y}_t) = -y_t \log \hat{y}_t$. Therefore, the total error is just the sum of the errors at each time step: $E(y, \hat{y}) = \sum_t E_t(y_t, \hat{y}_t) = -\sum_t y_t \log \hat{y}_t$. Here, $y_t$ is the correct word at time step $t$, and $\hat{y}_t$ is the corresponding prediction.
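A short numpy sketch of this loss (assuming, as an illustration, that each $y_t$ is a one-hot vector over the vocabulary and each $\hat{y}_t$ is the softmax output from the forward pass):

```python
import numpy as np

def cross_entropy_loss(ys, yhats):
    """ys: list of one-hot target vectors y_t; yhats: list of predicted distributions yhat_t."""
    total = 0.0
    for y_t, yhat_t in zip(ys, yhats):
        total += -np.sum(y_t * np.log(yhat_t))  # E_t(y_t, yhat_t) = -y_t log yhat_t
    return total                                # E(y, yhat) = sum over t of E_t
```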

  19. How to train RNN. We need to compute the gradients of the error with respect to the parameters $U$, $V$, $W$, and adjust the parameters using stochastic gradient descent:

  20. How to train RNN. $\frac{\partial E}{\partial W} = \sum_t \frac{\partial E_t}{\partial W}$, $\quad \frac{\partial E}{\partial U} = \sum_t \frac{\partial E_t}{\partial U}$, $\quad \frac{\partial E}{\partial V} = \sum_t \frac{\partial E_t}{\partial V}$.

  21. Computing $\frac{\partial E_t}{\partial V}$. $\frac{\partial E_3}{\partial V} = \frac{\partial E_3}{\partial \hat{y}_3} \frac{\partial \hat{y}_3}{\partial V} = \frac{\partial E_3}{\partial \hat{y}_3} \frac{\partial \hat{y}_3}{\partial z_3} \frac{\partial z_3}{\partial V} = (\hat{y}_3 - y_3) \otimes s_3$. Here, $z_3 = V s_3$, and $\otimes$ is the outer product between two vectors.
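A matching numpy sketch of this per-step gradient (shapes as assumed in the earlier forward-pass sketch); the total $\frac{\partial E}{\partial V}$ from slide 20 is just this quantity summed over the time steps:

```python
import numpy as np

def dE_dV_at_step(y_t, yhat_t, s_t):
    """Gradient of the step-t cross-entropy loss w.r.t. V, where z_t = V s_t."""
    dz = yhat_t - y_t          # dE_t/dz_t for softmax followed by cross-entropy
    return np.outer(dz, s_t)   # (yhat_t - y_t) outer product s_t, same shape as V
```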
