  1. CS480/680 Lecture 18: July 8, 2019
     Recurrent and Recursive Neural Networks [GBC] Chap. 10
     University of Waterloo, CS480/680 Spring 2019, Pascal Poupart

  2. Variable length data
     • Traditional feed-forward neural networks can only handle fixed-length data
     • Variable-length data (e.g., sequences, time series, spatial data) leads to a variable number of parameters
     • Solutions:
       – Recurrent neural networks
       – Recursive neural networks

  3. Recurrent Neural Network (RNN)
     • In RNNs, outputs can be fed back to the network as inputs, creating a recurrent structure that can be unrolled to handle variable-length data.
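A minimal sketch of this unrolling idea, assuming a simple tanh cell (the function and weight names rnn_forward, W_xh, W_hh, W_hy are illustrative, not from the slides): the same weights are applied at every time step, so one network handles sequences of any length.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, W_hy, h0):
    """Unroll a simple RNN over a sequence xs of arbitrary length.
    The same weights are reused at every step (weight sharing)."""
    h = h0
    ys = []
    for x in xs:                          # one step per input element
        h = np.tanh(W_xh @ x + W_hh @ h)  # hidden state fed back as input
        ys.append(W_hy @ h)               # output at this step
    return ys, h

# Example: 5-step sequence of 3-dimensional inputs, 4 hidden units, 2 outputs
rng = np.random.default_rng(0)
xs = [rng.normal(size=3) for _ in range(5)]
W_xh, W_hh, W_hy = rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), rng.normal(size=(2, 4))
ys, h_final = rnn_forward(xs, W_xh, W_hh, W_hy, np.zeros(4))
```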

  4. Training
     • Recurrent neural networks are trained by backpropagation on the unrolled network
       – E.g., backpropagation through time (BPTT)
     • Weight sharing:
       – Combine the gradients of shared weights into a single gradient
     • Challenges:
       – Gradient vanishing (and explosion)
       – Long-range memory
       – Prediction drift
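A rough sketch of the weight-sharing point, under the same illustrative tanh cell as above: during backpropagation through time, each step's contribution to the shared matrix W_hh is summed into one gradient, and the repeated multiplication by W_hh^T along the chain is what drives gradient vanishing or explosion. Here hs holds the hidden states from the forward pass and dL_dh_T the loss gradient on the last hidden state; all names are assumptions for illustration.

```python
import numpy as np

def bptt_hidden_grads(hs, W_hh, dL_dh_T):
    """Sketch of backpropagation through time for a tanh RNN:
    push the gradient on the last hidden state back through every step,
    summing each step's contribution to the shared matrix W_hh
    into a single gradient (weight sharing)."""
    dW_hh = np.zeros_like(W_hh)
    dh = dL_dh_T                              # gradient w.r.t. final hidden state
    for t in reversed(range(len(hs))):
        h_prev = hs[t - 1] if t > 0 else np.zeros_like(hs[0])
        dpre = dh * (1.0 - hs[t] ** 2)        # backprop through tanh
        dW_hh += np.outer(dpre, h_prev)       # shared-weight gradients are summed
        dh = W_hh.T @ dpre                    # repeated product: vanishing/explosion
    return dW_hh
```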

  5. RNN for belief monitoring
     • An HMM can be simulated and generalized by an RNN
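One way to read this claim: the HMM forward (filtering) recursion is itself a recurrent update in which the belief state plays the role of the hidden state. A small sketch, with transition matrix T and observation matrix O as illustrative names:

```python
import numpy as np

def hmm_belief_update(belief, obs, T, O):
    """One step of HMM belief monitoring written as a recurrent update:
    the belief plays the role of the RNN hidden state.
    T[i, j] = P(s_t = j | s_{t-1} = i),  O[j, o] = P(o_t = o | s_t = j)."""
    predicted = T.T @ belief          # predict the next state distribution
    updated = O[:, obs] * predicted   # weight by the observation likelihood
    return updated / updated.sum()    # normalize to a distribution

# Example: 2-state HMM, 2 observation symbols
T = np.array([[0.9, 0.1], [0.2, 0.8]])
O = np.array([[0.7, 0.3], [0.1, 0.9]])
belief = np.array([0.5, 0.5])
for obs in [0, 0, 1]:                 # observation sequence
    belief = hmm_belief_update(belief, obs, T, O)
```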

  6. Bi-Directional RNN
     • We can combine past and future evidence in separate chains
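A minimal sketch of the two-chain idea, assuming generic cell functions cell_fwd and cell_bwd (names are illustrative): one chain runs left-to-right over past evidence, the other right-to-left over future evidence, and their hidden states are concatenated at each position.

```python
import numpy as np

def bidirectional_rnn(xs, cell_fwd, cell_bwd, h0_f, h0_b):
    """Run one RNN chain left-to-right (past evidence) and another
    right-to-left (future evidence), then concatenate the hidden
    states at each time step."""
    hs_f, h = [], h0_f
    for x in xs:                       # forward chain
        h = cell_fwd(x, h)
        hs_f.append(h)
    hs_b, h = [], h0_b
    for x in reversed(xs):             # backward chain
        h = cell_bwd(x, h)
        hs_b.append(h)
    hs_b.reverse()                     # re-align with time order
    return [np.concatenate([f, b]) for f, b in zip(hs_f, hs_b)]

# Tiny example with a shared tanh cell (in practice the chains have separate weights)
rng = np.random.default_rng(0)
Wx, Wh = rng.normal(size=(4, 3)), rng.normal(size=(4, 4))
cell = lambda x, h: np.tanh(Wx @ x + Wh @ h)
hs = bidirectional_rnn([rng.normal(size=3) for _ in range(5)], cell, cell, np.zeros(4), np.zeros(4))
```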

  7. Encoder-Decoder Model
     • Also known as sequence2sequence
       – $x^{(t)}$: $t^{th}$ input
       – $y^{(t)}$: $t^{th}$ output
       – $c$: context (embedding)
     • Usage:
       – Machine translation
       – Question answering
       – Dialog
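A minimal sketch of the encoder-decoder structure under the notation above, with illustrative function names (enc_cell, dec_cell, readout): the encoder compresses the input sequence into the context c, and the decoder unrolls from c, feeding each output back in as the next input.

```python
def encode(xs, enc_cell, h0):
    """Encoder: compress the input sequence x^(1..T) into a context vector c."""
    h = h0
    for x in xs:
        h = enc_cell(x, h)
    return h                              # c = final hidden state

def decode(c, dec_cell, readout, y0, max_len):
    """Decoder: generate outputs y^(1..T') one at a time, feeding each
    output back in as the next input, starting from the context c."""
    h, y, ys = c, y0, []
    for _ in range(max_len):
        h = dec_cell(y, h)
        y = readout(h)                    # produce the next output token/vector
        ys.append(y)
    return ys
```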

  8. Machine Translation
     • Cho, van Merrienboer, Gulcehre, Bahdanau, Bougares, Schwenk, Bengio (2014) Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

  9. Long Short-Term Memory (LSTM)
     • Special gated structure to control memorization and forgetting in RNNs
     • Mitigates gradient vanishing
     • Facilitates long-term memory

  10. Unrolled LSTM
     • [Figure: unrolled LSTM]

  11. LSTM cell in practice
     • Adjustments:
       – The hidden state $h_t$ is called the cell state $c_t$
       – The output $y_t$ is called the hidden state $h_t$
     • Update equations:
       – Input gate: $i_t = \sigma(W^{(xi)} x_t + W^{(hi)} h_{t-1})$
       – Forget gate: $f_t = \sigma(W^{(xf)} x_t + W^{(hf)} h_{t-1})$
       – Output gate: $o_t = \sigma(W^{(xo)} x_t + W^{(ho)} h_{t-1})$
       – Process input: $\tilde{c}_t = \tanh(W^{(x\tilde{c})} x_t + W^{(h\tilde{c})} h_{t-1})$
       – Cell update: $c_t = f_t * c_{t-1} + i_t * \tilde{c}_t$
       – Output: $y_t = h_t = o_t * \tanh(c_t)$
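A direct transcription of these update equations into code, as a sketch of a single time step rather than a full library implementation; the dict keys 'xi', 'hi', etc. are illustrative names mirroring the superscripts above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x, h_prev, c_prev, W):
    """One LSTM step following the slide's equations.
    W is a dict of weight matrices, e.g. W['xi'] for W^(xi)."""
    i = sigmoid(W['xi'] @ x + W['hi'] @ h_prev)         # input gate
    f = sigmoid(W['xf'] @ x + W['hf'] @ h_prev)         # forget gate
    o = sigmoid(W['xo'] @ x + W['ho'] @ h_prev)         # output gate
    c_tilde = np.tanh(W['xc'] @ x + W['hc'] @ h_prev)   # processed input
    c = f * c_prev + i * c_tilde                        # cell state update
    h = o * np.tanh(c)                                  # hidden state / output
    return h, c
```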

  12. Gated Recurrent Unit (GRU)
     • Simplified LSTM:
       – No cell state
       – Two gates (instead of three)
       – Fewer weights
     • Update equations:
       – Reset gate: $r_t = \sigma(W^{(xr)} x_t + W^{(hr)} h_{t-1})$
       – Update gate: $z_t = \sigma(W^{(xz)} x_t + W^{(hz)} h_{t-1})$
       – Process input: $\tilde{h}_t = \tanh(W^{(x\tilde{h})} x_t + W^{(h\tilde{h})} (r_t * h_{t-1}))$
       – Hidden state update: $h_t = (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t$
       – Output: $y_t = h_t$
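The same kind of sketch for one GRU step, again with illustrative weight names:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_cell(x, h_prev, W):
    """One GRU step following the slide's equations (weight names illustrative)."""
    r = sigmoid(W['xr'] @ x + W['hr'] @ h_prev)              # reset gate
    z = sigmoid(W['xz'] @ x + W['hz'] @ h_prev)              # update gate
    h_tilde = np.tanh(W['xh'] @ x + W['hh'] @ (r * h_prev))  # processed input
    h = (1.0 - z) * h_prev + z * h_tilde                     # hidden state update
    return h                                                 # output y_t = h_t
```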

  13. Attention
     • Mechanism for alignment in machine translation, image captioning, etc.
     • Attention in machine translation: align each output word with relevant input words by computing a softmax of the inputs
       – Context vector $c_i$: weighted sum of the input encodings $h_j$, i.e. $c_i = \sum_j \alpha_{ij} h_j$
       – where $\alpha_{ij}$ is an alignment weight between input encoding $h_j$ and output encoding $s_i$:
         $\alpha_{ij} = \exp(\mathrm{alignment}(s_{i-1}, h_j)) \, / \, \sum_{j'} \exp(\mathrm{alignment}(s_{i-1}, h_{j'}))$ (softmax)
       – Alignment example: $\mathrm{alignment}(s_{i-1}, h_j) = s_{i-1}^T F h_j$
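A small sketch of the context-vector computation above, assuming the bilinear alignment score $s_{i-1}^T F h_j$ from the slide; the array shapes and names are illustrative.

```python
import numpy as np

def attention_context(s_prev, hs, F):
    """Compute one context vector c_i = sum_j alpha_ij h_j.
    hs: input encodings as a (T x d_h) array, s_prev: previous output encoding,
    F: bilinear alignment matrix of shape (d_s x d_h)."""
    scores = np.array([s_prev @ F @ h for h in hs])   # alignment(s_{i-1}, h_j)
    scores -= scores.max()                            # numerical stability
    alphas = np.exp(scores) / np.exp(scores).sum()    # softmax alignment weights
    return alphas @ hs, alphas                        # weighted sum of encodings
```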

  14. Attention
     • [Figure: attention mechanism]

  15. Machine Translation with Bidirectional RNNs, LSTM units and attention
     • Bahdanau, Cho, Bengio (ICLR-2015)
       – RNNsearch: with attention
       – RNNenc: no attention
     • BLEU: BiLingual Evaluation Understudy
       – Percentage of translated words that appear in the ground truth
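The slide's one-line description of BLEU corresponds roughly to clipped unigram precision; full BLEU also uses higher-order n-grams and a brevity penalty. A toy sketch of that simplified word-overlap score:

```python
from collections import Counter

def unigram_precision(candidate, reference):
    """Simplified BLEU-style score from the slide: fraction of words in the
    candidate translation that also appear in the reference (with clipping)."""
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    matches = sum(min(c, ref[w]) for w, c in cand.items())
    return matches / max(sum(cand.values()), 1)

print(unigram_precision("the cat sat on the mat", "the cat is on the mat"))  # 5/6
```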

  16. Alignment example
     • Bahdanau, Cho, Bengio (ICLR-2015)

  17. Recursive Neural Network
     • Recursive neural networks generalize recurrent neural networks from chains to trees.
     • Weight sharing allows trees of different sizes to fit variable-length data.
     • What structure should the tree follow?
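A minimal sketch of a recursive network over a binary parse tree, with illustrative names: the same combination weights W are applied at every internal node, so trees of any size and shape can be handled.

```python
import numpy as np

def recursive_nn(node, W, embed):
    """Compute a vector for a parse-tree node by recursively combining the
    vectors of its children with shared weights (the same W at every node)."""
    if isinstance(node, str):                           # leaf: a word
        return embed[node]
    left = recursive_nn(node[0], W, embed)
    right = recursive_nn(node[1], W, embed)
    return np.tanh(W @ np.concatenate([left, right]))   # combine the children

# Example: tiny binary tree ((the cat) (sat down)) with 4-dimensional word vectors
rng = np.random.default_rng(1)
embed = {w: rng.normal(size=4) for w in ["the", "cat", "sat", "down"]}
W = rng.normal(size=(4, 8))
tree = (("the", "cat"), ("sat", "down"))
root_vec = recursive_nn(tree, W, embed)
```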

  18. Example: Semantic Parsing
     • Use a parse tree or dependency graph as the structure of the recursive neural network
     • Example: [figure]
