Class 15 - Long Short-Term Memory (LSTM)


  1. Class 15 - Long Short-Term Memory (LSTM). Study materials: http://colah.github.io/posts/2015-08-Understanding-LSTMs/ , http://karpathy.github.io/2015/05/21/rnn-effectiveness/ , http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture10.pdf , and Chapter 10 of the textbook.

  2. RNN concepts (based on Christopher Olah's blog on LSTMs). You don’t throw everything away and start thinking from scratch again; your thoughts have persistence. For example, imagine you want to classify what kind of event is happening at every point in a movie. It’s unclear how a traditional neural network could use its reasoning about previous events in the film to inform later ones.

  3. RNN concepts. That's why RNNs have loops, allowing information to persist.

  4. Below is a neural network $A$ that takes some input $x_t$ at time $t$ and outputs a value $h_t$. A loop allows information to be passed from one step of the network to the next.

  5. RNN concepts. A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor. Consider what happens if we unroll the loop:

  6. The problem of long-term dependencies. One of the appeals of RNNs is the idea that they might be able to connect previous information to the present task, such as using previous video frames to inform the understanding of the present frame. Sometimes, we only need to look at recent information to perform the present task.

  7. The problem of long-term dependencies. I was up all night wondering where the Sun had gone. Then it dawned on me. Sky is ___

  8. The problem of long-term dependencies. The answer to life, the universe and everything is _

  9. The problem of long-term dependencies. Due to poor grades in high school, Steven Spielberg was rejected from the University of Southern California three times. He was awarded an honorary degree in 1994 and became a trustee of the university in 1996. "Since 1980, I've been trying to be associated with this school," joked the 62-year-old filmmaker. "I eventually had to buy my way in," he told the Los Angeles Times. Spielberg has to date directed 51 films and won three Oscars. Forbes Magazine puts Spielberg's wealth at $3 billion. He is ___

  10. The problem of long-term dependencies. In such cases, where the gap between the relevant information and the place that it’s needed is small, RNNs can learn to use the past information.

  11. The problem of long-term dependencies. Unfortunately, as that gap grows, RNNs become unable to learn to connect the information. This is because of the "vanishing gradient" problem. LSTMs solve this problem.
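As a small numerical illustration of the vanishing-gradient claim above (not part of the slides; the hidden size, weight scale, and fixed hidden state are arbitrary assumptions), the sketch below multiplies the recurrent Jacobians of a plain tanh RNN across time steps and prints how quickly the gradient norm decays:

```python
import numpy as np

# Illustrative sketch: the gradient flowing back k steps in a tanh RNN is a
# product of k Jacobians diag(1 - s^2) @ W, so its norm tends to shrink.
rng = np.random.default_rng(0)
hidden = 8
W = rng.normal(scale=0.5, size=(hidden, hidden))  # assumed recurrent weights
s = rng.uniform(-1, 1, size=hidden)               # an assumed hidden state, reused each step

grad = np.eye(hidden)                             # d s_T / d s_T
for k in range(1, 51):
    jacobian = np.diag(1 - s**2) @ W              # d s_t / d s_{t-1} when s_t = tanh(...)
    grad = grad @ jacobian
    if k % 10 == 0:
        print(f"after {k:2d} steps back, gradient norm ~ {np.linalg.norm(grad):.2e}")
```

With small recurrent weights the product norm decays roughly geometrically; with large weights it can instead explode, which is the other half of the same difficulty.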

  12. RNN. RNNs are called recurrent because they perform the same task for every element of a sequence, with the output depending on the previous computations. Another way to think about RNNs is that they have a “memory” which captures information about what has been calculated so far. In theory RNNs can make use of information in arbitrarily long sequences, but in practice they are limited to looking back only a few steps.

  13. RNN. $x_t$: the input at time step $t$. $s_t$: the hidden state at time step $t$; it is the memory of the network, calculated from the previous hidden state and the input at the current step: $s_t = f(U x_t + W s_{t-1})$. Here, the function $f$ is usually a non-linearity such as sigmoid, tanh, or ReLU; $s_{-1}$, which is used to compute the first hidden state $s_0$, is typically initialized to 0. $o_t$: the output at step $t$; it is calculated based only on the memory at time $t$: $o_t = \mathrm{softmax}(V s_t)$.
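A minimal numpy sketch of this recurrence may help make the notation concrete (the dimensions, the choice of tanh for $f$, and the random initialization below are illustrative assumptions, not from the slides):

```python
import numpy as np

def softmax(z):
    z = z - z.max()            # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def rnn_forward(xs, U, W, V):
    """xs: list of input vectors x_t; returns hidden states s_t and outputs o_t."""
    s_prev = np.zeros(W.shape[0])             # s_{-1} initialized to 0
    states, outputs = [], []
    for x_t in xs:
        s_t = np.tanh(U @ x_t + W @ s_prev)   # s_t = f(U x_t + W s_{t-1}), f = tanh
        o_t = softmax(V @ s_t)                # o_t = softmax(V s_t)
        states.append(s_t)
        outputs.append(o_t)
        s_prev = s_t
    return states, outputs

# Tiny usage example with random parameters (shared across all time steps).
rng = np.random.default_rng(0)
input_dim, hidden_dim, vocab_dim = 4, 8, 5
U = rng.normal(size=(hidden_dim, input_dim))
W = rng.normal(size=(hidden_dim, hidden_dim))
V = rng.normal(size=(vocab_dim, hidden_dim))
xs = [rng.normal(size=input_dim) for _ in range(3)]
states, outputs = rnn_forward(xs, U, W, V)
print(outputs[-1])   # a probability distribution over the 5 "words" at the last step
```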

  14. RNN. An RNN shares the same parameters $U$, $V$, $W$ across all time steps. It is like we are performing the same task at each step, just with different inputs. This property greatly reduces the total number of parameters we need to learn. The following RNN has outputs at each time step, but depending on the task this may not be necessary. (Remember the types of RNN? one-to-one vs. one-to-many vs. many-to-many.)

  15. Language modeling and generating text. Given a sequence of words, we want to predict the probability of each word given the previous words. Suppose we have a sentence of $m$ words. A language model allows us to predict the probability of observing the sentence: $P(w_1, \cdots, w_m) = \prod_{i=1}^{m} P(w_i \mid w_1, \cdots, w_{i-1})$. For example, $P(ABC) = P(AB)\,P(C \mid AB) = P(A)\,P(B \mid A)\,P(C \mid AB)$.
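A tiny numeric sketch of the chain-rule factorization above, with made-up conditional probabilities (illustrative values only):

```python
# Made-up conditional probabilities for a three-word "sentence" ABC.
p_A = 0.5           # P(A)
p_B_given_A = 0.4   # P(B | A)
p_C_given_AB = 0.3  # P(C | AB)

# P(ABC) = P(A) P(B | A) P(C | AB)
p_ABC = p_A * p_B_given_A * p_C_given_AB
print(p_ABC)        # approximately 0.06
```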

  16. How to train RNN. Note that the parameters $U$, $V$, $W$ are shared across the different time steps: $s_t = \tanh(U x_t + W s_{t-1})$, $\hat{y}_t = \mathrm{softmax}(V s_t)$.

  17. How to train RNN.

  18. The cross-entropy loss at time step $t$ is given by: $E_t(y_t, \hat{y}_t) = -y_t \log \hat{y}_t$. Therefore, the total error is just the sum of the errors at each time step: $E(y, \hat{y}) = \sum_t E_t(y_t, \hat{y}_t) = -\sum_t y_t \log \hat{y}_t$. Here, $y_t$ is the correct word at time step $t$, and $\hat{y}_t$ is the corresponding prediction.
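A short numpy sketch of this loss (assuming, as an illustration, that each $y_t$ is a one-hot vector over the vocabulary and each $\hat{y}_t$ is the softmax output from the forward pass):

```python
import numpy as np

def cross_entropy_loss(ys, yhats):
    """ys: list of one-hot target vectors y_t; yhats: list of predicted distributions yhat_t."""
    total = 0.0
    for y_t, yhat_t in zip(ys, yhats):
        total += -np.sum(y_t * np.log(yhat_t))  # E_t(y_t, yhat_t) = -y_t log yhat_t
    return total                                # E(y, yhat) = sum over t of E_t
```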

  19. How to train RNN. We need to compute the gradients of the error with respect to the parameters $U$, $V$, $W$, and adjust the parameters using stochastic gradient descent:

  20. How to train RNN. $\frac{\partial E}{\partial W} = \sum_t \frac{\partial E_t}{\partial W}$, $\quad \frac{\partial E}{\partial U} = \sum_t \frac{\partial E_t}{\partial U}$, $\quad \frac{\partial E}{\partial V} = \sum_t \frac{\partial E_t}{\partial V}$.

  21. Computing $\frac{\partial E_t}{\partial V}$. $\frac{\partial E_3}{\partial V} = \frac{\partial E_3}{\partial \hat{y}_3} \frac{\partial \hat{y}_3}{\partial V} = \frac{\partial E_3}{\partial \hat{y}_3} \frac{\partial \hat{y}_3}{\partial z_3} \frac{\partial z_3}{\partial V} = (\hat{y}_3 - y_3) \otimes s_3$. Here, $z_3 = V s_3$, and $\otimes$ is the outer product between two vectors.
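A matching numpy sketch of this per-step gradient (shapes as assumed in the earlier forward-pass sketch); the total $\frac{\partial E}{\partial V}$ from slide 20 is just this quantity summed over the time steps:

```python
import numpy as np

def dE_dV_at_step(y_t, yhat_t, s_t):
    """Gradient of the step-t cross-entropy loss w.r.t. V, where z_t = V s_t."""
    dz = yhat_t - y_t          # dE_t/dz_t for softmax followed by cross-entropy
    return np.outer(dz, s_t)   # (yhat_t - y_t) outer product s_t, same shape as V
```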
