  1. Neural Machine Translation. Philipp Koehn, 6 October 2020.

  2. Language Models
     • Modeling variants
       – feed-forward neural network
       – recurrent neural network
       – long short-term memory neural network
     • May include input context

  3. Feed-Forward Neural Language Model
     [Diagram: the history words w_{i-4}, w_{i-3}, w_{i-2}, w_{i-1} are embedded, combined in a feed-forward hidden layer h, and a softmax output layer predicts the word w_i.]
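A minimal code sketch of this feed-forward language model (not part of the original slides): it is written in PyTorch, the vocabulary, embedding, and hidden sizes are arbitrary placeholders, and tanh is an assumed choice of activation.

```python
import torch
import torch.nn as nn

class FeedForwardLM(nn.Module):
    """Predict word w_i from the embeddings of the previous four words w_{i-4} .. w_{i-1}."""
    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=256, history=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)           # Embed
        self.hidden = nn.Linear(history * embed_dim, hidden_dim)   # FF hidden layer h
        self.out = nn.Linear(hidden_dim, vocab_size)               # scores fed to the softmax

    def forward(self, history_ids):
        # history_ids: (batch, 4) word ids of the history w_{i-4} .. w_{i-1}
        e = self.embed(history_ids)                    # (batch, 4, embed_dim)
        h = torch.tanh(self.hidden(e.flatten(1)))      # concatenate embeddings, apply hidden layer
        return torch.log_softmax(self.out(h), dim=-1)  # log-probabilities of the output word w_i
```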

  4. Recurrent Neural Language Model
     [Diagram: the start symbol <s> is embedded, fed through a recurrent state (RNN), and a softmax layer predicts the output word "the".]
     • Predict the first word of a sentence

  5. Recurrent Neural Language Model
     [Diagram: the same network unrolled over two steps; input "<s> the", predicted output "the house".]
     • Predict the second word of a sentence
     • Re-use the hidden state from the first word prediction

  6. Recurrent Neural Language Model
     [Diagram: unrolled over three steps; input "<s> the house", predicted output "the house is".]
     • Predict the third word of a sentence ... and so on

  7. Recurrent Neural Language Model
     [Diagram: unrolled over the full sentence; input "<s> the house is big .", predicted output "the house is big . </s>".]
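A corresponding sketch of the unrolled recurrent language model, again in PyTorch and not taken from the slides; a GRU is an assumed stand-in for the generic RNN cell, and all sizes are placeholders.

```python
import torch
import torch.nn as nn

class RecurrentLM(nn.Module):
    """Predict each next word from the current word and the hidden state of the previous step."""
    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)             # Embed
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)   # recurrent state
        self.out = nn.Linear(hidden_dim, vocab_size)                 # scores fed to the softmax

    def forward(self, input_ids):
        # input_ids: (batch, seq) word ids, e.g. "<s> the house is big ."
        h, _ = self.rnn(self.embed(input_ids))           # hidden state re-used at every step
        return torch.log_softmax(self.out(h), dim=-1)    # predictions, e.g. "the house is big . </s>"
```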

  8. Recurrent Neural Translation Model
     • We predicted the words of a sentence
     • Why not also predict their translations?

  9. Encoder-Decoder Model
     [Diagram: a single recurrent language model run over the concatenated sequence; input "<s> the house is big . </s> das Haus ist groß .", predicted output "the house is big . </s> das Haus ist groß . </s>".]
     • Obviously madness
     • Proposed by Google (Sutskever et al. 2014)
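To illustrate the idea on this slide, a hedged sketch of greedy decoding with the RecurrentLM class from the earlier sketch: run the language model over the English sentence and keep predicting words until a second end-of-sentence symbol appears. The word ids are hypothetical, and an untrained model would of course not actually produce the German sentence.

```python
import torch

model = RecurrentLM()                        # recurrent LM sketch from above
source = [4, 5, 6, 7, 8, 2]                  # "the house is big . </s>" (placeholder ids; 2 = </s>)
tokens = [1] + source                        # 1 = <s>
for _ in range(20):                          # greedily continue the sequence into the translation
    log_probs = model(torch.tensor([tokens]))
    next_id = int(log_probs[0, -1].argmax()) # most probable next word given everything so far
    tokens.append(next_id)
    if next_id == 2:                         # stop at the second </s>
        break
```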

  10. What is Missing?
     • Alignment of input words to output words
       ⇒ Solution: attention mechanism

  11. Neural Translation Model with Attention

  12. Input Encoding
     [Diagram: the recurrent neural language model from before, run over the input sentence "<s> the house is big .".]
     • Inspiration: recurrent neural network language model on the input side

  13. Hidden Language Model States
     • This gives us the hidden states (one RNN state per input word)
     • These encode left context for each word
     • Same process in reverse: right context for each word

  14. Input Encoder
     [Diagram: a right-to-left and a left-to-right encoder RNN run over the embedded input words "<s> the house is big . </s>".]
     • Input encoder: concatenate bidirectional RNN states
     • Each word representation includes full left and right sentence context

  15. Encoder: Math
     [Diagram: the bidirectional input encoder from the previous slide.]
     • Input is a sequence of words $x_j$, mapped into embedding space $\bar{E} x_j$
     • Bidirectional recurrent neural networks
       $\overleftarrow{h}_j = f(\overleftarrow{h}_{j+1}, \bar{E} x_j)$
       $\overrightarrow{h}_j = f(\overrightarrow{h}_{j-1}, \bar{E} x_j)$
     • Various choices for the function $f()$: feed-forward layer, GRU, LSTM, ...
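A sketch of this bidirectional encoder in the same hedged PyTorch style; bidirectional=True runs a left-to-right and a right-to-left GRU and concatenates their states, corresponding to pairing $\overrightarrow{h}_j$ and $\overleftarrow{h}_j$ on the slide.

```python
import torch
import torch.nn as nn

class BiRNNEncoder(nn.Module):
    """Concatenate left-to-right and right-to-left RNN states to represent each input word."""
    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)   # embedding \bar{E} x_j
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True, bidirectional=True)

    def forward(self, input_ids):
        # input_ids: (batch, src_len) word ids of the input sentence
        h, _ = self.rnn(self.embed(input_ids))
        return h   # (batch, src_len, 2*hidden_dim): h_j = (forward state, backward state)
```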

  16. Decoder
     [Diagram: a chain of decoder RNN states, each followed by a softmax prediction of an output word.]
     • We want to have a recurrent neural network predicting output words

  17. Decoder
     [Diagram: as before, with each predicted output word embedded and fed back into the next decoder state.]
     • We want to have a recurrent neural network predicting output words
     • We feed decisions on output words back into the decoder state

  18. Decoder
     [Diagram: as before, with an input context c_i also feeding into each decoder state.]
     • We want to have a recurrent neural network predicting output words
     • We feed decisions on output words back into the decoder state
     • Decoder state is also informed by the input context

  19. More Detail
     [Diagram: one decoder step with previous output word "das", decoder state $s_i$, and input context $c_i$.]
     • Decoder is also a recurrent neural network over a sequence of hidden states $s_i$
       $s_i = f(s_{i-1}, E y_{i-1}, c_i)$
     • Again, various choices for the function $f()$: feed-forward layer, GRU, LSTM, ...
     • Output word $y_i$ is selected by computing a vector $t_i$ (same size as vocabulary)
       $t_i = W(U s_{i-1} + V E y_{i-1} + C c_i)$
       then finding the highest value in vector $t_i$
     • If we normalize $t_i$, we can view it as a probability distribution over words
     • $E y_i$ is the embedding of the output word $y_i$
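A hedged sketch of one such decoder step; a GRU cell is an assumed choice for $f()$, and $W$, $U$, $V$, $C$ are plain linear layers with placeholder dimensions.

```python
import torch
import torch.nn as nn

class DecoderStep(nn.Module):
    """s_i = f(s_{i-1}, E y_{i-1}, c_i);  t_i = W(U s_{i-1} + V E y_{i-1} + C c_i)."""
    def __init__(self, vocab_size=10000, embed_dim=128, hidden_dim=256, ctx_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)        # E
        self.f = nn.GRUCell(embed_dim + ctx_dim, hidden_dim)    # f(): here a GRU cell
        self.U = nn.Linear(hidden_dim, hidden_dim)
        self.V = nn.Linear(embed_dim, hidden_dim)
        self.C = nn.Linear(ctx_dim, hidden_dim)
        self.W = nn.Linear(hidden_dim, vocab_size)

    def forward(self, prev_state, prev_word, context):
        # prev_state s_{i-1}: (batch, hidden); prev_word y_{i-1}: (batch,); context c_i: (batch, ctx)
        e = self.embed(prev_word)                                      # E y_{i-1}
        s = self.f(torch.cat([e, context], dim=-1), prev_state)       # new decoder state s_i
        t = self.W(self.U(prev_state) + self.V(e) + self.C(context))  # t_i, one score per vocab word
        return s, torch.softmax(t, dim=-1)   # normalized t_i: a distribution over output words
```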

  20. Attention
     [Diagram: the decoder state attends over the bidirectional encoder states via attention weights α_ij, producing an input context.]
     • Given what we have generated so far (decoder hidden state) ...
     • ... which words in the input should we pay attention to (encoder states)?

  21. Attention
     [Diagram: as before.]
     • Given:
       – the previous hidden state of the decoder $s_{i-1}$
       – the representation of input words $h_j = (\overleftarrow{h}_j, \overrightarrow{h}_j)$
     • Predict an alignment probability $a(s_{i-1}, h_j)$ for each input word $j$ (modeled with a feed-forward neural network layer)

  22. Attention
     [Diagram: as before.]
     • Normalize attention (softmax)
       $\alpha_{ij} = \frac{\exp(a(s_{i-1}, h_j))}{\sum_k \exp(a(s_{i-1}, h_k))}$

  23. Attention
     [Diagram: the attention weights form a weighted sum of encoder states, giving the input context c_i.]
     • Relevant input context: weigh input words according to attention
       $c_i = \sum_j \alpha_{ij} h_j$
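The three attention slides above reduce to a few lines of code; a hedged sketch in the same style, where a small feed-forward network is an assumed form of the scoring function $a(s_{i-1}, h_j)$.

```python
import torch
import torch.nn as nn

class Attention(nn.Module):
    """Score each input word, normalize with a softmax, and return the weighted sum c_i."""
    def __init__(self, hidden_dim=256, ctx_dim=512):
        super().__init__()
        self.score = nn.Sequential(                    # feed-forward layer for a(s_{i-1}, h_j)
            nn.Linear(hidden_dim + ctx_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, prev_state, encoder_states):
        # prev_state s_{i-1}: (batch, hidden); encoder_states h_j: (batch, src_len, ctx_dim)
        src_len = encoder_states.size(1)
        s = prev_state.unsqueeze(1).expand(-1, src_len, -1)      # pair s_{i-1} with every h_j
        a = self.score(torch.cat([s, encoder_states], dim=-1))   # alignment scores (batch, src_len, 1)
        alpha = torch.softmax(a, dim=1)                          # normalize over input positions j
        c = (alpha * encoder_states).sum(dim=1)                  # weighted sum: input context c_i
        return c, alpha.squeeze(-1)
```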

  24. Attention
     [Diagram: the input context c_i feeds into the next decoder state and the output word prediction.]
     • Use context to predict next hidden state and output word
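Putting the hedged sketches together, one full decoding step looks roughly like this (all class names and word ids are from the illustrative code above, not from the slides):

```python
import torch

encoder, attention, decoder = BiRNNEncoder(), Attention(), DecoderStep()
src = torch.tensor([[1, 4, 5, 6, 7, 8, 2]])   # "<s> the house is big . </s>" (placeholder ids)
h = encoder(src)                              # encoder states h_j
s = torch.zeros(1, 256)                       # initial decoder state
y = torch.tensor([1])                         # previously produced word, here <s>
c, alpha = attention(s, h)                    # attention weights and input context c_i
s, prob = decoder(s, y, c)                    # next decoder state s_i and word distribution
y = prob.argmax(dim=-1)                       # pick the output word y_i
```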

  25. Training
