  1. CS11-731 Machine Translation and Sequence-to-Sequence Models
     Semisupervised and Unsupervised Methods
     Antonis Anastasopoulos
     Site: https://phontron.com/class/mtandseq2seq2019/

  2. Supervised Learning We are provided the ground truth

  3. Unsupervised Learning No ground-truth labels: the task is to uncover latent structure

  4. Semi-supervised Learning A happy medium: use both annotated and unannotated data (Image: Techerin, own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=19514958)

  5. Incorporating Monolingual Data

  6. On Using Monolingual Corpora in Neural Machine Translation (Gulcehre et al. 2015) Train an NMT model MT_ef on the parallel English-French data; train a language model LM_f on monolingual French; combine the two!

  7. On Using Monolingual Corpora in Neural Machine Translation (Gulcehre et al. 2015)
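
A minimal sketch of one way to "combine the two" at decoding time (shallow fusion): interpolate the NMT model's next-token log-probabilities with the LM's. The function and the weight beta are illustrative; Gulcehre et al. also explore deep fusion, which wires the LM into the decoder itself.

    import torch

    def shallow_fusion_step(nmt_logits, lm_logits, beta=0.3):
        # Interpolate the two models' distributions over the target
        # vocabulary; beta weights the LM's contribution and is tuned
        # on held-out data. The result feeds beam search as usual.
        log_p_nmt = torch.log_softmax(nmt_logits, dim=-1)
        log_p_lm = torch.log_softmax(lm_logits, dim=-1)
        return log_p_nmt + beta * log_p_lm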

  8. Back-translation (Sennrich et al. 2016) Train MT_fe (French->English) on the parallel data; use it to back-translate monolingual French into synthetic English; train the English->French model on the parallel data plus the synthetic pairs.
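
As a pipeline, with hypothetical train/translate helpers (a sketch, not Sennrich et al.'s exact setup):

    # Hypothetical helpers: train(pairs) fits an NMT model on (src, tgt)
    # pairs; model.translate(sents) decodes a list of sentences.
    def backtranslation_pipeline(parallel_en_fr, mono_fr, train):
        # 1) Train the reverse model (French -> English) on parallel data.
        mt_fe = train([(fr, en) for (en, fr) in parallel_en_fr])
        # 2) Back-translate monolingual French into synthetic English.
        synth_en = mt_fe.translate(mono_fr)
        # 3) Train English -> French on real + synthetic pairs; note the
        #    target side of every synthetic pair is genuine French.
        return train(parallel_en_fr + list(zip(synth_en, mono_fr)))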

  9. Dual Learning (He et al. 2016) Assume MT_ef, MT_fe, LM_e, LM_f. The game: sample monolingual English, translate with MT_ef, get a reward from LM_f; sample monolingual French, translate with MT_fe, get a reward from LM_e. (Each round-trip also yields a reconstruction reward, which is what makes the learning "dual".)
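
A sketch of one move of the game, with hypothetical model interfaces (sample, score, log_prob); He et al. optimize such a reward with policy gradient:

    def dual_learning_step(mt_ef, mt_fe, lm_f, en_sent, alpha=0.5):
        # Translate a monolingual English sentence, keeping the sample's
        # log-probability under MT_ef for the REINFORCE update.
        fr_sample, logp = mt_ef.sample(en_sent)
        r_fluency = lm_f.score(fr_sample)             # reward from LM_f
        r_recon = mt_fe.log_prob(en_sent, fr_sample)  # round-trip reward
        reward = alpha * r_fluency + (1 - alpha) * r_recon
        return -reward * logp  # loss for MT_ef; the French side is symmetric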

  10. Semi-Supervised Learning for MT (Cheng et al. 2016) Round-trip translation for supervision: translate monolingual e to f' with MT_ef, translate f' back to e' with MT_fe, and train both models with a loss comparing e and e'.
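
The supervision signal as a sketch (hypothetical interfaces again); Cheng et al. combine this reconstruction term with the usual likelihood on the parallel data:

    def round_trip_loss(mt_ef, mt_fe, en_sent):
        # e -> f' with the forward model, then ask how well the backward
        # model recovers the original e from f'.
        f_prime = mt_ef.translate(en_sent)
        return -mt_fe.log_prob(en_sent, f_prime)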

  11. Another idea: use monolingual data to pretrain model components Use monolingual English and French data (via language models LM_e and LM_f) to pretrain the encoder and the decoder.
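
A sketch of what the warm start can look like with PyTorch-style modules; all attribute names are hypothetical and assume the LMs and the seq2seq model share architectures:

    def init_from_lms(seq2seq, lm_en, lm_fr):
        # Copy LM weights into the matching seq2seq components, then
        # fine-tune the whole model on the parallel data.
        seq2seq.src_embed.load_state_dict(lm_en.embed.state_dict())
        seq2seq.encoder.load_state_dict(lm_en.rnn.state_dict())
        seq2seq.tgt_embed.load_state_dict(lm_fr.embed.state_dict())
        seq2seq.decoder.load_state_dict(lm_fr.rnn.state_dict())
        return seq2seq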

  12. Another idea: use monolingual data to pretrain model components Shaded regions are pre-trained From "Unsupervised Pretraining for Sequence to Sequence Learning", Ramachandran et al. 2017.

  13. Another idea: use monolingual data to pretrain model components Shaded regions are pre-trained From "Unsupervised Pretraining for Sequence to Sequence Learning", Ramachandran et al. 2017.

  14. Another idea: use monolingual data to pretrain model components From "Unsupervised Pretraining for Sequence to Sequence Learning", Ramachandran et al. 2017.

  15. Another idea: use monolingual data to pretrain model components From "MASS: Masked Sequence to Sequence Pre-training for Language Generation", Song et al. 2019.

  16. Another idea: use monolingual data to pretrain model components From "MASS: Masked Sequence to Sequence Pre-training for Language Generation", Song et al. 2019.

  17. Another idea: use monolingual data to pretrain model components From "MASS: Masked Sequence to Sequence Pre-training for Language Generation", Song et al. 2019.
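
A sketch of how a MASS training example can be built; the contiguous span and the roughly-50% fraction follow the paper, the function itself is illustrative:

    import random

    MASK = "<mask>"

    def mass_example(tokens, frac=0.5):
        # Mask one contiguous span of the input; the encoder sees the
        # masked sentence and the decoder must reconstruct exactly the
        # masked span, tying the two components together.
        k = max(1, int(len(tokens) * frac))
        start = random.randrange(len(tokens) - k + 1)
        enc_input = tokens[:start] + [MASK] * k + tokens[start + k:]
        dec_target = tokens[start:start + k]
        return enc_input, dec_target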

  18. Pre-trained Word Embeddings in NMT

  19. Modern neural embeddings (Mikolov et al. 2013) Skip-gram model: predict a word's context. CBOW model: predict a word from its context. Others: GloVe, fastText, etc.
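
A sketch of the skip-gram training signal: each (center, context) pair below is one positive example for "predict a word's context":

    def skipgram_pairs(tokens, window=2):
        # For every center word, emit each word within the window as a
        # context word to be predicted.
        for i, center in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    yield center, tokens[j]

    # e.g. list(skipgram_pairs("the cat sat".split())) starts with
    # ('the', 'cat'), ('the', 'sat'), ('cat', 'the'), ('cat', 'sat'), ...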

  20. Pre-trained embeddings From "A Bag of Useful Tricks for Practical Neural Machine Translation: Embedding Layer Initialization and Large Batch Size", Neishi et al. 2017.
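
The embedding-layer initialization trick as a sketch: copy pretrained vectors into the NMT embedding matrix where available (the fallback random initialization is illustrative):

    import numpy as np

    def init_embedding(vocab, pretrained, dim):
        # vocab: token -> row index; pretrained: token -> vector, e.g.
        # from word2vec. Tokens without a pretrained vector keep their
        # random initialization and are learned from scratch.
        emb = np.random.normal(scale=0.1, size=(len(vocab), dim))
        for tok, idx in vocab.items():
            if tok in pretrained:
                emb[idx] = pretrained[tok]
        return emb.astype("float32")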

  21. Pre-trained embeddings: when are they useful? From "When and Why are Pre-trained Word Embeddings Useful for Neural Machine Translation?", Qi et al. 2018.

  22. Bilingual Lexicon Induction

  23. What is Bilingual Lexicon Induction? From "Learning Bilingual Lexicons from Monolingual Corpora", Haghighi et al. 2008.

  24. What is Bilingual Lexicon Induction? From "Learning Bilingual Lexicons from Monolingual Corpora", Haghighi et al. 2008.

  25. Bilingual Skip-gram model: using translations and alignments. From "Bilingual Word Representations with Monolingual Quality in Mind", Luong et al. 2015.

  26. Mapping two monolingual embedding spaces (via rotation, scaling, and translation). From "Earth Mover's Distance Minimization for Unsupervised Bilingual Lexicon Induction", Zhang et al. 2017.
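
For intuition, the basic supervised version of such a mapping learns a single linear map from a small seed dictionary; the unsupervised work above aims to remove that seed requirement. A sketch, assuming numpy matrices whose i-th rows are the embeddings of the i-th translation pair:

    import numpy as np

    def fit_linear_map(X, Y):
        # Solve min_W ||X W - Y||_F by least squares; X holds source-
        # language vectors, Y their translations' vectors.
        W, *_ = np.linalg.lstsq(X, Y, rcond=None)
        return W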

  27. Finding the best mapping The orthogonality assumption is important! What if we don't have a seed lexicon? From "Word Translation Without Parallel Data", Conneau et al. 2018.
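
Why orthogonality matters: an orthogonal W preserves distances and dot products, and the constrained problem has a closed-form solution (orthogonal Procrustes), which Conneau et al. use during refinement. A sketch with the same row-wise conventions:

    import numpy as np

    def procrustes(X, Y):
        # Orthogonal W minimizing ||X W - Y||_F: W = U V^T, where
        # U S V^T is the SVD of X^T Y.
        U, _, Vt = np.linalg.svd(X.T @ Y)
        return U @ Vt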

  28. Unsupervised Mapping + Refinement From "Word Translation Without Parallel Data", Conneau et al. 2018.
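
A sketch of the refinement loop, assuming an initial map W (e.g. from the adversarial step) and using plain mutual nearest neighbors where Conneau et al. actually use the CSLS similarity:

    import numpy as np

    def refine(X, Y, W, n_iters=5):
        # Alternate between (a) inducing a dictionary of mutual nearest
        # neighbors under the current map and (b) re-fitting the map on
        # that dictionary with orthogonal Procrustes.
        for _ in range(n_iters):
            sim = (X @ W) @ Y.T
            fwd = sim.argmax(axis=1)   # each source word's nearest target
            bwd = sim.argmax(axis=0)   # each target word's nearest source
            mutual = np.array([i for i in range(len(fwd)) if bwd[fwd[i]] == i])
            U, _, Vt = np.linalg.svd(X[mutual].T @ Y[fwd[mutual]])
            W = U @ Vt
        return W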

  29. Issues with mapping methods From "On the Limitations of Unsupervised Bilingual Dictionary Induction", Søgaard et al. 2018.

  30. Unsupervised Translation

  31. … at the core of it all: decipherment. Weaver (1955): "This is really English, encrypted in some strange symbols." From "Deciphering Foreign Language", Ravi and Knight 2011.

  32. Unsupervised MT (Lample et al. and Artetxe et al. 2018)
      1. Train monolingual embeddings + unsupervised BLI
      2. BLI -> word-by-word translations
      3. Train MT_fe and MT_ef systems
      4. Meanwhile, use unsupervised objectives (denoising LM_e and LM_f)
      5. Iterate
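
A sketch of the resulting training loop with a hypothetical shared encoder-decoder interface; the denoising and back-translation objectives are from the papers, the method names are not:

    def unsupervised_mt(model, mono_en, mono_fr, noise, n_iters=10):
        # noise(s) corrupts a sentence (word drops, local shuffles).
        for _ in range(n_iters):
            # Denoising objective: reconstruct each sentence from a
            # corrupted version of itself, in both languages.
            model.train_step([(noise(s), s) for s in mono_en], "en", "en")
            model.train_step([(noise(s), s) for s in mono_fr], "fr", "fr")
            # Iterative back-translation: synthetic source, real target.
            synth_fr = model.translate(mono_en, "en", "fr")
            model.train_step(list(zip(synth_fr, mono_en)), "fr", "en")
            synth_en = model.translate(mono_fr, "fr", "en")
            model.train_step(list(zip(synth_en, mono_fr)), "en", "fr")
        return model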

  33. Unsupervised MT (Lample et al. 2018) Also add an adversarial loss on the intermediate representations. From "Unsupervised MT Using Monolingual Corpora Only", Lample et al. 2018.

  34. Unsupervised MT (Lample et al. 2018) From "Unsupervised MT Using Monolingual Corpora Only", Lample et al. 2018.
