Statistical Machine Translation
May 13th, 2014
Josef van Genabith, DFKI GmbH, Josef.van_Genabith@dfki.de
Language Technology II SS 2014
With some additional slides from Chris Dyer (MT Marathon 2011) and Sabine Hunsiker (LT SS 2012)
Overview
Introduction: the basic idea
IBM models: the noisy channel
Phrase-Based SMT
Want to learn translation from data
Data = bitext: texts and their translations, aligned at sentence level
Brown et al., "The Mathematics of Statistical Machine Translation", Computational Linguistics, 1993: tough going
Fortunately: "A Statistical MT Tutorial Workbook", Kevin Knight, 1999
These slides are based on Kevin Knight's explanations …
Mary did not slap the green witch
Mary not slap slap slap the green witch
Mary not slap slap slap NULL the green witch
Maria no daba una bofetada a la verde bruja
Maria no daba una bofetada a la bruja verde
A generative story
Given a string in the source language, how can we generate a string in the target language that is a translation?
Components of the story:
fertility (n)
translation between words (t)
distortion, i.e. reordering (d)
NULL-generated words (p0)
Putting them into a model
Learning the model (parameters) from data
P(e)
P(e, f) = P(e) × P(f) if e and f are independent
P(e, f) = P(e) × P(f | e) if e and f are not independent
P(e | f) = P(e, f) / P(f)
P(e, f) = P(f, e)
P(e | f) ≠ P(f | e) in general
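These identities are easy to sanity-check numerically. Below is a minimal sketch, assuming a small invented joint distribution over an English word e and a French word f; all words and probability values are made up purely for illustration.

```python
# Minimal numeric check of the probability identities above, using an
# invented joint distribution over (e, f) pairs.
joint = {
    ("she", "sie"): 0.3,
    ("she", "es"): 0.1,
    ("it", "sie"): 0.2,
    ("it", "es"): 0.4,
}

def p_e(e):
    # marginal P(e) = sum over f of P(e, f)
    return sum(p for (e2, _), p in joint.items() if e2 == e)

def p_f(f):
    # marginal P(f) = sum over e of P(e, f)
    return sum(p for (_, f2), p in joint.items() if f2 == f)

# Chain rule: P(e, f) = P(e) * P(f | e)
p_f_given_e = joint[("she", "sie")] / p_e("she")        # 0.3 / 0.4 = 0.75
assert abs(joint[("she", "sie")] - p_e("she") * p_f_given_e) < 1e-12

# Bayes: P(e | f) = P(e, f) / P(f), and P(e | f) != P(f | e) in general
p_e_given_f = joint[("she", "sie")] / p_f("sie")        # 0.3 / 0.5 = 0.6
print(p_e_given_f, p_f_given_e)                         # 0.6 0.75
```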
ê = argmax_e P(e | f)
P(e | f) = P(f | e) × P(e) / P(f)
ê = argmax_e P(e | f) = argmax_e P(f | e) × P(e) / P(f) = argmax_e P(f | e) × P(e)
(we can drop P(f) from the argmax because it does not depend on e)
this is the Noisy Channel Model
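As a concrete illustration of the argmax, here is a minimal noisy-channel decoding sketch over an explicit candidate list; the lm and tm tables and every probability value in them are invented, not taken from any real system.

```python
# Noisy-channel decoding over a tiny, explicit candidate set.
lm = {"the house": 0.009, "house the": 0.0001}        # P(e), language model
tm = {("das Haus", "the house"): 0.2,                 # P(f | e), translation model
      ("das Haus", "house the"): 0.2}

def decode(f, candidates):
    # e_hat = argmax_e  P(f | e) * P(e)
    return max(candidates, key=lambda e: tm.get((f, e), 0.0) * lm.get(e, 0.0))

print(decode("das Haus", ["the house", "house the"]))  # -> the house
```

Note how the channel model alone cannot choose between the two candidates; the language model P(e) breaks the tie in favour of the fluent one, which is exactly the division of labour the noisy channel buys us.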
The Noisy Channel Model
ê = argmax_e P(f | e) × P(e)
"The noisy channel works like this. We imagine that someone has e in his head, but by the time it gets on to the printed page it is corrupted by 'noise' and becomes f. To recover the most likely e, we reason about (1) what kinds of things people say in English, and (2) how English gets turned into French. These are sometimes called 'source modeling' and 'channel modeling.'" (Knight, 1999, p. 2)
"People use the noisy channel metaphor for a lot of engineering problems, like actual noise on telephone transmissions." (ibid.)
The Noisy Channel Model
ê = argmax_e P(f | e) × P(e)
P(e): the source model, the language model
P(f | e): the channel model, the translation model
[Diagram: a source emits e with probability P(e); the channel turns e into f with probability P(f | e); we observe f and ask: what is the most likely e?]
Interlude
Chris Dyer's slides from MT Marathon 2011 on the Noisy Channel and SMT
[Slides 11 to 29: Chris Dyer, MT Marathon 2011]
End of Interlude
Back to our slides based on Kevin Knight's 1999 workbook
Translation Modelling
Remember that when translating f to e we reason backwards:
we observe f
we want to know which e is (most) likely to have been uttered and to have been translated into f
ê = argmax_e P(f | e) × P(e)
Story: replace words in e by French words and scramble them around
"What kind of a crackpot story is that?" (Kevin Knight, 1999)
IBM Model 3
What happens in translation? Actually a lot …
EN: Mary did not slap the green witch
ES: Maria no daba una bofetada a la bruja verde
But from a purely external point of view:
source words get replaced by target words
words in the target are moved around ("reordered")
source and target need not be equally long …
So, minimally, that is what we need to model …
Some parts of the Model
1. For each word e_i in an English sentence (i = 1 … l), we choose a fertility φ_i. The choice of fertility is dependent solely on the English word in question, nothing else.
2. For each word e_i, we generate φ_i French words: t(f | e). The choice of French word is dependent solely on the English word that generates it. It is not dependent on the English context around the English word. It is not dependent on other French words that have been generated from this or any other English word.
3. All those French words are permuted: d(j | i, l, m). Each French word is assigned an absolute target "position slot." For example, one word may be assigned position 3, and another word may be assigned position 2 -- the latter word would then precede the former in the final French sentence. The choice of position j for a French word is dependent solely on the absolute position i of the English word that generates it, given the English sentence length l and the French sentence length m.
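To make the three steps concrete, here is a toy, non-probabilistic sketch of the generative story for the running example. All tables are invented, fertilities and translations are made deterministic, NULL insertion is omitted, and the distortion step is hard-coded rather than sampled from d(j | i, l, m).

```python
# Toy sketch of Model 3's generative story for the running example.
# All tables are invented and deterministic; NULL insertion is omitted.

n = {"Mary": 1, "did": 0, "not": 1, "slap": 3,
     "the": 2, "green": 1, "witch": 1}                 # fertility n(phi | e)
t = {"Mary": ["Maria"], "not": ["no"],
     "slap": ["daba", "una", "bofetada"],
     "the": ["a", "la"], "green": ["verde"], "witch": ["bruja"]}  # t(f | e)

def generate(english):
    # Step 1, fertility: copy each English word n(e) times.
    copies = [e for e in english.split() for _ in range(n[e])]
    # Step 2, translation: each copy becomes one French word. (In the real
    # model every copy is drawn independently from t(. | e); here we just
    # cycle through a fixed list to keep the output deterministic.)
    seen = {}
    french = []
    for e in copies:
        k = seen.get(e, 0)
        french.append(t[e][k % len(t[e])])
        seen[e] = k + 1
    # Step 3, distortion: assign target positions. (In the real model each
    # position is drawn from d(j | i, l, m); here we hard-code the one swap
    # the example needs: "verde bruja" -> "bruja verde".)
    french[-1], french[-2] = french[-2], french[-1]
    return " ".join(french)

print(generate("Mary did not slap the green witch"))
# -> Maria no daba una bofetada a la bruja verde
```

Running it reproduces the string-rewriting chain shown on the next slide.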
Translation as String Rewriting
Mary did not slap the green witch
n → Mary not slap slap slap the the green witch
t → Maria no daba una bofetada a la verde bruja
d → Maria no daba una bofetada a la bruja verde
Parameters
We would like to learn the parameters for fertility, (word) translation and distortion from data.
The parameters look like this:
n(3 | slap)
t(maison | house)
d(5 | 2, 4, 6)
And they have probabilities associated with them.
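To make this concrete, here is a sketch of what such parameter tables might look like as plain Python dictionaries; every probability value below is invented (the real values are estimated from the bitext, e.g. with the EM algorithm, as described in Brown et al. 1993).

```python
# Invented parameter tables for the three parameter types above.
n = {("slap", 3): 0.6, ("slap", 2): 0.3, ("slap", 1): 0.1}   # n(3 | slap) = 0.6
t = {("maison", "house"): 0.8, ("bâtiment", "house"): 0.1}   # t(maison | house) = 0.8
d = {(5, 2, 4, 6): 0.4}  # d(5 | 2, 4, 6): the English word at position 2
                         # (sentence length 4) puts a French word at
                         # position 5 (sentence length 6)

# One concrete generative story is scored by multiplying such factors:
score = n[("slap", 3)] * t[("maison", "house")] * d[(5, 2, 4, 6)]
print(score)  # 0.192
```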