Pharaoh: a Beam Search Decoder for Phrase-Based Statistical Machine Translation
Philipp Koehn, koehn@csail.mit.edu
Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology
Outline
- Phrase-Based Statistical MT
- Beam Search Decoding
- Experiments
- Advanced Features
Machine Translation
- Task: make sense of foreign text like [figure: example of foreign text]
- Long-standing problem in artificial intelligence
- Ultimately requires syntax, semantics, and pragmatics
Statistical Machine Translation
- Components: translation model, language model, decoder
[Figure: foreign/English parallel text -> statistical analysis -> Translation Model; English text -> statistical analysis -> Language Model; both models feed the Decoding Algorithm]
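As background, the classic noisy-channel formulation of SMT shows how the two models in the diagram combine. For a foreign input $f$, the decoder searches for the English sentence $e$ that maximizes the translation model $p(f \mid e)$ times the language model $p(e)$. This is the textbook decomposition; Pharaoh's actual scoring may weight these components differently.

```latex
\hat{e} = \arg\max_{e} \, p(e \mid f) = \arg\max_{e} \, p(f \mid e)\, p(e)
```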
Phrase-Based Translation
[Figure: "Morgen fliege ich nach Kanada zur Konferenz" aligned phrase by phrase to "Tomorrow I will fly to the conference in Canada"]
- Foreign input is segmented into phrases: any sequence of words, not necessarily linguistically motivated
- Each phrase is translated into English
- Phrases are reordered
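The three steps can be made concrete with the slide's German example. This is a minimal sketch: the segmentation, the phrase translations, and the output order are hard-coded by hand (the names `SEGMENTS` and `translate` are hypothetical). A real system learns the phrase translations from parallel text and searches over segmentations and orders.

```python
# Hand-picked segmentation of the foreign sentence, each phrase paired with
# an English translation (hypothetical; a real system learns these).
SEGMENTS = [
    ("Morgen", "Tomorrow"),
    ("fliege ich", "I will fly"),
    ("nach Kanada", "in Canada"),
    ("zur Konferenz", "to the conference"),
]

def translate(segments, order):
    """Translate each foreign phrase, then emit the English phrases in `order`."""
    english = [eng for _, eng in segments]
    return " ".join(english[i] for i in order)

if __name__ == "__main__":
    # Reordering: "to the conference" is emitted before "in Canada".
    print(translate(SEGMENTS, [0, 1, 3, 2]))
```

Keeping the segments in source order makes the reordering explicit: it is only the `order` permutation.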
Phrase-Based Systems
- A number of research groups have developed phrase-based systems (RWTH Aachen, USC/ISI, CMU, IBM, JHU, ITC-irst, MIT, ...)
- Systems differ in
  - training methods
  - model for phrase translation table
  - reordering models
  - additional feature functions
- Currently the best method for SMT (MT?)
  - top systems in the DARPA/NIST evaluation are phrase-based
  - the best commercial system for Arabic-English is phrase-based
Pharaoh
- Translation engine
  - works with various phrase-based models
  - beam search algorithm
  - time complexity roughly linear with input length
  - good quality takes about 1 second per sentence
- Very good performance in the DARPA/NIST evaluation
- Freely available for researchers: http://www.isi.edu/licensed-sw/pharaoh/
Decoding Process
Maria no dio una bofetada a la bruja verde
- Build translation left to right
  - select foreign words to be translated
Decoding Process
Maria no dio una bofetada a la bruja verde
Partial translation: Mary
- Build translation left to right
  - select foreign words to be translated
  - find English phrase translation
  - add English phrase to end of partial translation
Decoding Process
Maria no dio una bofetada a la bruja verde
Partial translation: Mary
- Build translation left to right
  - select foreign words to be translated
  - find English phrase translation
  - add English phrase to end of partial translation
  - mark foreign words as translated
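The four steps listed above amount to one expansion of a partial translation. Here is a minimal sketch, assuming 0-based word positions and a set of translated positions. The names `expand` and `coverage_string` are hypothetical, not Pharaoh's interface.

```python
FOREIGN = "Maria no dio una bofetada a la bruja verde".split()

def expand(english, covered, span, phrase):
    """One decoding step: translate foreign words span[0]..span[1] (inclusive)
    as `phrase`, append it to the English output, and mark those words as
    translated. Returns the new (english, covered) pair."""
    start, end = span
    new_positions = set(range(start, end + 1))
    if covered & new_positions:
        raise ValueError("some of these foreign words are already translated")
    return english + [phrase], covered | new_positions

def coverage_string(covered, n):
    """Render covered positions as on the slides: '*' translated, '-' not yet."""
    return "".join("*" if i in covered else "-" for i in range(n))

if __name__ == "__main__":
    english, covered = expand([], frozenset(), (0, 0), "Mary")
    print(" ".join(english), coverage_string(covered, len(FOREIGN)))
```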
Decoding Process
Maria no dio una bofetada a la bruja verde
Partial translation: Mary did not
- One-to-many translation (no -> did not)
Decoding Process
Maria no dio una bofetada a la bruja verde
Partial translation: Mary did not slap
- Many-to-one translation (dio una bofetada -> slap)
Decoding Process
Maria no dio una bofetada a la bruja verde
Partial translation: Mary did not slap the
- Many-to-one translation (a la -> the)
Decoding Process
Maria no dio una bofetada a la bruja verde
Partial translation: Mary did not slap the green
- Reordering (verde is translated before bruja)
Decoding Process
Maria no dio una bofetada a la bruja verde
Translation: Mary did not slap the green witch
- Translation finished
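The Decoding Process slides can be replayed step by step. This sketch assumes each step is recorded as a (first foreign index, last foreign index, English phrase) triple taken from the slides. It checks that no foreign word is translated twice and that every word is covered at the end. `STEPS` and `replay` are hypothetical names chosen for illustration.

```python
FOREIGN = "Maria no dio una bofetada a la bruja verde".split()

# Steps from the slides: (first foreign index, last foreign index, English phrase).
STEPS = [
    (0, 0, "Mary"),     # Maria
    (1, 1, "did not"),  # no: one-to-many
    (2, 4, "slap"),     # dio una bofetada: many-to-one
    (5, 6, "the"),      # a la: many-to-one
    (8, 8, "green"),    # verde, translated before bruja: reordering
    (7, 7, "witch"),    # bruja
]

def replay(foreign, steps):
    """Apply the steps left to right on the English side and return the output."""
    covered = [False] * len(foreign)
    english = []
    for start, end, phrase in steps:
        for i in range(start, end + 1):
            if covered[i]:
                raise ValueError(f"word {i} ({foreign[i]!r}) translated twice")
            covered[i] = True
        english.append(phrase)
    if not all(covered):
        raise ValueError("translation not finished: some foreign words uncovered")
    return " ".join(english)

if __name__ == "__main__":
    print(replay(FOREIGN, STEPS))
```

The English side is only ever appended to, which is what "build translation left to right" means. Reordering happens because the steps may pick foreign words out of order.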
Translation Options
[Figure: "Maria no dio una bofetada a la bruja verde" with candidate phrase translations beneath it, including: Mary; not, did not, no, did not give; give; a, a slap, slap; to, by, to the; the, the witch; witch; green, green witch]
- Look up possible phrase translations
  - many different ways to segment words into phrases
  - many different ways to translate each phrase
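Looking up translation options means checking every contiguous foreign span against the phrase table. This is a minimal sketch with a hypothetical toy table built from phrases in the figure. Which English phrase goes with which Spanish span is our assumption, and a real table would also carry probabilities. The `max_len` limit on phrase length is likewise an assumption.

```python
FOREIGN = "Maria no dio una bofetada a la bruja verde".split()

# Hypothetical toy phrase table: foreign phrase -> list of English options.
PHRASE_TABLE = {
    "Maria": ["Mary"],
    "no": ["not", "did not", "no"],
    "dio": ["give"],
    "una": ["a"],
    "bofetada": ["slap"],
    "una bofetada": ["a slap"],
    "dio una bofetada": ["slap"],
    "a": ["to", "by"],
    "la": ["the"],
    "a la": ["to the", "the"],
    "la bruja": ["the witch"],
    "bruja": ["witch"],
    "verde": ["green"],
    "bruja verde": ["green witch"],
}

def translation_options(foreign, table, max_len=3):
    """Return (start, end, english) for every span of up to max_len foreign
    words found in the table; `end` is inclusive."""
    options = []
    for start in range(len(foreign)):
        for end in range(start, min(start + max_len, len(foreign))):
            source = " ".join(foreign[start:end + 1])
            for english in table.get(source, []):
                options.append((start, end, english))
    return options

if __name__ == "__main__":
    for start, end, english in translation_options(FOREIGN, PHRASE_TABLE):
        print(" ".join(FOREIGN[start:end + 1]), "->", english)
```

Overlapping spans such as "bruja", "verde", and "bruja verde" all appear as options. This is exactly the segmentation ambiguity the slide points out.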
Hypothesis Expansion
[Figure: translation options for "Maria no dio una bofetada a la bruja verde", as on the Translation Options slide]
Hypotheses:
  e: (none)   f: ---------   p: 1
- Start with null hypothesis
  - e: no English words
  - f: no foreign words covered
  - p: probability 1
Hypothesis Expansion
[Figure: translation options for "Maria no dio una bofetada a la bruja verde", as on the Translation Options slide]
Hypotheses:
  e: (none)   f: ---------   p: 1
  e: Mary     f: *--------   p: .534
- Pick translation option
- Create hypothesis
  - e: add English phrase Mary
  - f: first foreign word covered
  - p: probability 0.534
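A hypothesis is just the triple (e, f, p) shown on the slide. This is a minimal sketch of it as an immutable record. `step_prob` is a hypothetical stand-in for the combined model score of one expansion: the slides give p values such as .534 but do not show how they are computed, so here it is simply multiplied in. `Hypothesis`, `null_hypothesis`, and `extend` are illustrative names.

```python
from dataclasses import dataclass
from typing import Tuple

N_FOREIGN = 9  # "Maria no dio una bofetada a la bruja verde" has 9 words

@dataclass(frozen=True)
class Hypothesis:
    english: Tuple[str, ...]  # e: English phrases produced so far
    covered: frozenset        # f: indices of foreign words already covered
    prob: float               # p: score of this partial translation

    def coverage(self, n=N_FOREIGN):
        """Render f the way the slides do, e.g. '*--------'."""
        return "".join("*" if i in self.covered else "-" for i in range(n))

def null_hypothesis():
    """e: no English words, f: no foreign words covered, p: probability 1."""
    return Hypothesis(english=(), covered=frozenset(), prob=1.0)

def extend(hyp, start, end, phrase, step_prob):
    """New hypothesis translating foreign words start..end (inclusive) as
    `phrase`; step_prob (hypothetical) is multiplied into p."""
    span = frozenset(range(start, end + 1))
    if hyp.covered & span:
        raise ValueError("span overlaps words that are already covered")
    return Hypothesis(hyp.english + (phrase,), hyp.covered | span, hyp.prob * step_prob)

if __name__ == "__main__":
    start = null_hypothesis()
    mary = extend(start, 0, 0, "Mary", 0.534)
    print(mary.english, mary.coverage(), mary.prob)
```

Because hypotheses are immutable, expanding one never changes it. The same null hypothesis can therefore be extended with "Mary" and, separately, with "witch", as on the following slides.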
Hypothesis Expansion
[Figure: translation options for "Maria no dio una bofetada a la bruja verde", as on the Translation Options slide]
Hypotheses:
  e: (none)   f: ---------   p: 1
  e: Mary     f: *--------   p: .534
  e: witch    f: -------*-   p: .182
- Add another hypothesis
Hypothesis Expansion
[Figure: translation options for "Maria no dio una bofetada a la bruja verde", as on the Translation Options slide]
Hypotheses:
  e: (none)      f: ---------   p: 1
  e: Mary        f: *--------   p: .534
  e: witch       f: -------*-   p: .182
  e: ... slap    f: *-***----   p: .043
- Further hypothesis expansion
Hypothesis Expansion
[Figure: translation options for "Maria no dio una bofetada a la bruja verde", as on the Translation Options slide]
Hypotheses along the path to the final translation:
  e: (none)         f: ---------   p: 1
  e: Mary           f: *--------   p: .534
  e: did not        f: **-------   p: .154
  e: slap           f: *****----   p: .015
  e: the            f: *******--   p: .004283
  e: green witch    f: *********   p: .000271
Other hypotheses:
  e: witch          f: -------*-   p: .182
  e: slap           f: *-***----   p: .043
- ... until all foreign words are covered
  - find best hypothesis that covers all foreign words
  - backtrack to read off translation
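The last two bullets correspond to a simple search over completed hypotheses followed by back-pointer traversal. This minimal sketch assumes each hypothesis stores only its newest English phrase and a pointer to the hypothesis it was expanded from. The hypotheses and p values are copied from the slide; `Hyp`, `best_complete`, and `read_off` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Hyp:
    phrase: str                   # English phrase added by the last step ("" for null)
    covered: str                  # coverage in slide notation, e.g. "*--------"
    prob: float                   # p as shown on the slide
    back: Optional["Hyp"] = None  # back-pointer to the hypothesis it came from

def best_complete(hyps: List[Hyp]) -> Optional[Hyp]:
    """Return the most probable hypothesis that covers all foreign words."""
    complete = [h for h in hyps if "-" not in h.covered]
    return max(complete, key=lambda h: h.prob) if complete else None

def read_off(hyp: Optional[Hyp]) -> str:
    """Backtrack through back-pointers and return the English translation."""
    phrases = []
    while hyp is not None:
        if hyp.phrase:
            phrases.append(hyp.phrase)
        hyp = hyp.back
    return " ".join(reversed(phrases))

# Hypotheses and probabilities from the slide.
null = Hyp("", "---------", 1.0)
mary = Hyp("Mary", "*--------", 0.534, null)
did_not = Hyp("did not", "**-------", 0.154, mary)
slap = Hyp("slap", "*****----", 0.015, did_not)
the = Hyp("the", "*******--", 0.004283, slap)
green_witch = Hyp("green witch", "*********", 0.000271, the)
witch = Hyp("witch", "-------*-", 0.182, null)
other_slap = Hyp("slap", "*-***----", 0.043, mary)
ALL = [null, mary, did_not, slap, the, green_witch, witch, other_slap]

if __name__ == "__main__":
    print(read_off(best_complete(ALL)))
```

Storing only the last phrase plus a back-pointer lets hypotheses share their common prefixes. For example, "did not" and "... slap" both point back to the single "Mary" hypothesis.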
Hypothesis Expansion
[Figure: translation options for "Maria no dio una bofetada a la bruja verde", as on the Translation Options slide, with many more hypotheses added]
Hypotheses along the path to the final translation:
  e: (none)         f: ---------   p: 1
  e: Mary           f: *--------   p: .534
  e: did not        f: **-------   p: .154
  e: slap           f: *****----   p: .015
  e: the            f: *******--   p: .004283
  e: green witch    f: *********   p: .000271
Other hypotheses:
  e: witch          f: -------*-   p: .182
  e: slap           f: *-***----   p: .043
- Adding more hypotheses leads to an explosion of the search space
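As the title indicates, Pharaoh contains this explosion with beam search. The following is only a rough sketch of the idea under our own assumptions. Hypotheses are grouped into stacks by how many foreign words they cover. Each stack keeps at most `beam_size` entries (histogram pruning) and drops any entry whose probability falls below `threshold` times the stack's best. The names `prune`, `beam_size`, and `threshold` are hypothetical. A real decoder would typically also add an estimate of the cost of the still-untranslated words before comparing hypotheses; that estimate is omitted here.

```python
from collections import defaultdict

def prune(hypotheses, beam_size, threshold):
    """Group (coverage, prob) pairs into stacks by number of covered foreign
    words; in each stack keep at most beam_size hypotheses whose probability
    is at least threshold times that stack's best."""
    stacks = defaultdict(list)
    for coverage, prob in hypotheses:
        stacks[coverage.count("*")].append((coverage, prob))
    kept = {}
    for n_covered, stack in stacks.items():
        stack.sort(key=lambda h: h[1], reverse=True)
        best = stack[0][1]
        kept[n_covered] = [h for h in stack[:beam_size] if h[1] >= threshold * best]
    return kept

if __name__ == "__main__":
    # Partial hypotheses and probabilities from the slide.
    hyps = [("*--------", 0.534), ("-------*-", 0.182),
            ("**-------", 0.154), ("*-***----", 0.043)]
    for n, stack in sorted(prune(hyps, beam_size=2, threshold=0.5).items()):
        print(n, stack)
```

With `threshold=0.5`, "witch" (.182) is pruned because it falls below half of "Mary" (.534) in the one-word stack. The other stacks each hold a single hypothesis and keep it.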