
Statistical Machine Translation, Lecture 2: Theory and Praxis of Decoding (PDF document)



  1. Slide 1: Statistical Machine Translation, Lecture 2: Theory and Praxis of Decoding
     Philipp Koehn, pkoehn@inf.ed.ac.uk
     School of Informatics, University of Edinburgh

     Slide 2: Statistical Machine Translation
     • Components: translation model, language model, decoder
     • Diagram: foreign/English parallel text -> statistical analysis -> translation model; English text -> statistical analysis -> language model; both models feed the decoding algorithm

     Slide 3: Phrase-Based Systems
     • A number of research groups developed phrase-based systems (RWTH Aachen, Univ. of Southern California/ISI, CMU, IBM, Johns Hopkins Univ., Cambridge Univ., Univ. of Catalunya, ITC-irst, Univ. Edinburgh, Univ. of Maryland...)
     • Systems differ in
       – training methods
       – model for phrase translation table
       – reordering models
       – additional feature functions
     • Currently best method for SMT (MT?)
       – top systems in DARPA/NIST evaluation are phrase-based
       – best commercial system for Arabic-English is phrase-based

     Slide 4: Phrase-Based Translation
     Example: Morgen fliege ich nach Kanada zur Konferenz -> Tomorrow I will fly to the conference in Canada
     • Foreign input is segmented into phrases
       – any sequence of words, not necessarily linguistically motivated
     • Each phrase is translated into English
     • Phrases are reordered

     Slide 5: Phrase Translation Table
     • Phrase translations for “den Vorschlag”:

       English            φ(e|f)      English            φ(e|f)
       the proposal       0.6227      the suggestions    0.0114
       ’s proposal        0.1068      the proposed       0.0114
       a proposal         0.0341      the motion         0.0091
       the idea           0.0250      the idea of        0.0091
       this proposal      0.0227      the proposal ,     0.0068
       proposal           0.0205      its proposal       0.0068
       of the proposal    0.0159      it                 0.0068
       the proposals      0.0159      ...                ...

     Slide 6: Decoding Process
     Source: Maria no dio una bofetada a la bruja verde
     • Build translation left to right
       – select foreign words to be translated
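The phrase translation table for “den Vorschlag” can be read as a lookup from a foreign phrase to scored English candidates. The following is a minimal sketch in Python, not the format used by any particular system: the dictionary layout and the function name `translation_options` are assumptions made for illustration, and only the first column of probabilities from the slide is included.

```python
# Phrase translation table for "den Vorschlag".
# Probabilities are copied from the slide; the dict layout is an assumption.
PHRASE_TABLE = {
    "den Vorschlag": {
        "the proposal": 0.6227,
        "'s proposal": 0.1068,
        "a proposal": 0.0341,
        "the idea": 0.0250,
        "this proposal": 0.0227,
        "proposal": 0.0205,
        "of the proposal": 0.0159,
        "the proposals": 0.0159,
    }
}


def translation_options(foreign_phrase, table=PHRASE_TABLE, top_n=3):
    """Return up to top_n (english, probability) pairs, most probable first.

    Hypothetical helper: an unknown phrase simply yields an empty list.
    """
    entries = table.get(foreign_phrase, {})
    ranked = sorted(entries.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    for english, prob in translation_options("den Vorschlag"):
        print(f"{english}\t{prob:.4f}")
```

Sorting by probability is only one way to rank candidates; as the later slides note, a decoder also weighs language model and reordering scores before choosing among them.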

  2. Slides 7-12: Decoding Process
     Source: Maria no dio una bofetada a la bruja verde

     Slides 7-8
     • Build translation left to right
       – select foreign words to be translated
       – find English phrase translation
       – add English phrase to end of partial translation
       – mark foreign words as translated
     Output so far: Mary

     Slide 9
     • One to many translation
     Output so far: Mary did not

     Slide 10
     • Many to one translation
     Output so far: Mary did not slap

     Slide 11
     • Many to one translation
     Output so far: Mary did not slap the

     Slide 12
     • Reordering
     Output so far: Mary did not slap the green
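The left-to-right decoding steps for “Maria no dio una bofetada a la bruja verde” can be replayed in a few lines of Python. This is a sketch under stated assumptions: the span indices in `STEPS` are inferred from the partial outputs shown on the slides (they are not given explicitly), and `apply_steps` is a hypothetical helper that only replays one fixed derivation rather than searching for the best one.

```python
SOURCE = "Maria no dio una bofetada a la bruja verde".split()

# Each step selects a span of foreign words and appends one English phrase.
# Spans are 0-based and end-exclusive; they are inferred, not from the slides.
STEPS = [
    ((0, 1), "Mary"),      # Maria
    ((1, 2), "did not"),   # no: one to many
    ((2, 5), "slap"),      # dio una bofetada: many to one
    ((5, 7), "the"),       # a la: many to one
    ((8, 9), "green"),     # verde: reordering, translated before "bruja"
    ((7, 8), "witch"),     # bruja
]


def apply_steps(source, steps):
    """Build the English output left to right and track covered foreign words."""
    covered = [False] * len(source)
    output = []
    for (start, end), english in steps:
        # A foreign word may be translated only once.
        if any(covered[start:end]):
            raise ValueError(f"span {start}:{end} is already translated")
        for i in range(start, end):
            covered[i] = True          # mark foreign words as translated
        output.append(english)         # add phrase to end of partial translation
    return " ".join(output), covered


if __name__ == "__main__":
    translation, covered = apply_steps(SOURCE, STEPS)
    print(translation)
    print("".join("*" if c else "-" for c in covered))
```

The English side always grows at the end, while the foreign spans may be picked in any order; that asymmetry is what lets “green” precede “witch” in the output.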

  3. Slide 13: Decoding Process
     Source: Maria no dio una bofetada a la bruja verde
     • Translation finished
     Output: Mary did not slap the green witch

     Slide 14: Translation Options
     Source: Maria no dio una bofetada a la bruja verde
     Translation options shown under the source sentence, row by row:
       Mary | not | give | a slap | to | the | witch | green
       did not | a slap | by | green witch
       no | slap | to the
       did not give | to
       the
       slap | the witch
     • Look up possible phrase translations
       – many different ways to segment words into phrases
       – many different ways to translate each phrase

     Slide 15: Hypothesis Expansion
     • Start with empty hypothesis
       – e: no English words
       – f: no foreign words covered
       – p: probability 1
     Hypothesis shown: e: (empty)  f: ---------  p: 1

     Slide 16: Hypothesis Expansion
     • Pick translation option
     • Create hypothesis
       – e: add English phrase Mary
       – f: first foreign word covered
       – p: probability 0.534
     Hypotheses shown: e: (empty)  f: ---------  p: 1; e: Mary  f: *--------  p: .534

     Slide 17: A Quick Word on Probabilities
     • Not going into detail here, but...
     • Translation Model
       – phrase translation probability p(Mary | Maria)
       – reordering costs
       – phrase/word count costs
       – ...
     • Language Model
       – uses trigrams:
       – p(Mary did not) = p(Mary | <s>) * p(did | Mary, <s>) * p(not | Mary did)

     Slide 18: Hypothesis Expansion
     • Add another hypothesis
     Hypotheses shown: e: (empty)  f: ---------  p: 1; e: Mary  f: *--------  p: .534; e: witch  f: -------*-  p: .182
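The hypothesis records (e, f, p) and the trigram decomposition of p(Mary did not) can be sketched in Python as follows. Everything named here (`Hypothesis`, `expand`, `coverage_string`, `trigram_prob`) is a hypothetical illustration, not the lecture's implementation. The values 0.534 and 0.182 are taken from the slides and treated as a single combined score per translation option, which is a simplifying assumption; the language model table passed to `trigram_prob` must be supplied by the caller, since the slides give no trigram probabilities.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Hypothesis:
    english: Tuple[str, ...]    # e: English words produced so far
    coverage: Tuple[bool, ...]  # f: which foreign words are covered
    prob: float                 # p: probability of the partial translation


def empty_hypothesis(source_length: int) -> Hypothesis:
    """Start state: no English words, nothing covered, probability 1."""
    return Hypothesis((), (False,) * source_length, 1.0)


def expand(hyp: Hypothesis, start: int, end: int,
           english: str, option_prob: float) -> Hypothesis:
    """Create a new hypothesis by translating foreign words start..end-1."""
    if any(hyp.coverage[start:end]):
        raise ValueError("foreign words already covered")
    coverage = tuple(c or (start <= i < end)
                     for i, c in enumerate(hyp.coverage))
    return Hypothesis(hyp.english + tuple(english.split()),
                      coverage, hyp.prob * option_prob)


def coverage_string(hyp: Hypothesis) -> str:
    """Render coverage in the slide notation, e.g. '*--------'."""
    return "".join("*" if c else "-" for c in hyp.coverage)


def trigram_prob(words: List[str],
                 lm: Dict[Tuple[Tuple[str, ...], str], float]) -> float:
    """Chain-rule product over at most two words of history, starting at <s>.

    lm maps (context, word) to a probability; the context tuple is in
    left-to-right order, e.g. (("<s>", "Mary"), "did").
    """
    prob = 1.0
    history = ["<s>"]
    for word in words:
        prob *= lm[(tuple(history[-2:]), word)]
        history.append(word)
    return prob


if __name__ == "__main__":
    start = empty_hypothesis(9)
    mary = expand(start, 0, 1, "Mary", 0.534)
    witch = expand(start, 7, 8, "witch", 0.182)
    for h in (start, mary, witch):
        print(" ".join(h.english) or "(empty)", coverage_string(h), h.prob)
```

Both “Mary” and “witch” are expanded from the same empty hypothesis, which mirrors the slide: expansion branches into competing partial translations instead of committing to one.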
