Syntax-Based Statistical Machine Translation

Statistical Machine Translation
Lecture 5: Syntax-Based Models
Philipp Koehn
pkoehn@inf.ed.ac.uk
School of Informatics, University of Edinburgh

Outline
• Reminder: Modeling and Decoding
• Why Syntax?
• Yamada and Knight: translating into trees
• Wu: tree-based transfer
• Chiang: hierarchical transfer
• Koehn: clause structure
• Other approaches

Phrase-Based Translation Model
[Figure: "Morgen fliege ich nach Kanada zur Konferenz" segmented into phrases and translated as "Tomorrow I will fly to the conference in Canada"]
• Foreign input is segmented into phrases
  – any sequence of words, not necessarily linguistically motivated
• Each phrase is translated into English
• Phrases are reordered

Decoding
[Figure: input "Maria no dio una bofetada a la bruja verde", partial output "Mary did not slap the green"]
• Decoding process builds an English translation left to right, by picking foreign phrases to translate into English phrases

Search Space for Decoding Too Big
[Figure: translation options for "Maria no dio una bofetada a la bruja verde" (e.g., "Mary", "did not", "not give", "a slap", "to the", "the witch", "green witch") above a graph of hypotheses]
Hypotheses on the path to the full translation (e: English words added, f: foreign words covered, p: probability):
  e: (empty)      f: ---------   p: 1
  e: Mary         f: *--------   p: .534
  e: did not      f: **-------   p: .154
  e: slap         f: *****----   p: .015
  e: the          f: *******--   p: .004283
  e: green witch  f: *********   p: .000271
Other hypotheses shown:
  e: witch        f: -------*-   p: .182
  e: slap         f: *-***----   p: .043
• Explosion of search space
  ⇒ Pruning, Beam Search

Word-Based Translation Model
[Figure: step-by-step generation, with the model parameter used at each step]
  Mary did not slap the green witch
    n(3|slap)
  Mary not slap slap slap the green witch
    p-null
  Mary not slap slap slap NULL the green witch
    t(la|the)
  Maria no daba una bofetada a la verde bruja
    d(4|4)
  Maria no daba una bofetada a la bruja verde
• Translation process is broken up into small steps: word translation, reordering, duplication, insertion
• Decoding can be done similarly to phrase-based decoding
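The four steps of the word-based model can be replayed in a few lines. This is only a sketch: the fertility values, the NULL position, the per-position word choices and the final permutation are hand-picked (hypothetical) so that they reproduce the example on the slide. A real model would score each choice with n(.), p-null, t(.) and d(.) instead of fixing it.

```python
english = "Mary did not slap the green witch".split()

# Step 1, fertility n(k|e): how many foreign words each English word produces.
# Hand-picked values; "did" produces nothing, "slap" is duplicated three times.
FERTILITY = {"Mary": 1, "did": 0, "not": 1, "slap": 3,
             "the": 1, "green": 1, "witch": 1}

def apply_fertility(words):
    out = []
    for word in words:
        out.extend([word] * FERTILITY[word])
    return out

# Step 2, NULL insertion (p-null): the position is chosen by hand here.
def insert_null(words, position):
    return words[:position] + ["NULL"] + words[position:]

# Step 3, word translation t(f|e): one (English, foreign) choice per position.
CHOICES = [("Mary", "Maria"), ("not", "no"), ("slap", "daba"),
           ("slap", "una"), ("slap", "bofetada"), ("NULL", "a"),
           ("the", "la"), ("green", "verde"), ("witch", "bruja")]

def translate(words):
    if len(words) != len(CHOICES):
        raise ValueError("one translation choice is needed per word")
    out = []
    for word, (source, target) in zip(words, CHOICES):
        if word != source:
            raise ValueError(f"choice for {source!r} does not match {word!r}")
        out.append(target)
    return out

# Step 4, distortion d(j|i): output position k takes the word at ORDER[k].
ORDER = [0, 1, 2, 3, 4, 5, 6, 8, 7]  # swaps "verde" and "bruja"

def reorder(words):
    return [words[i] for i in ORDER]

after_fertility = apply_fertility(english)
after_null = insert_null(after_fertility, 5)
after_translation = translate(after_null)
foreign = reorder(after_translation)

print(" ".join(after_fertility))    # Mary not slap slap slap the green witch
print(" ".join(after_null))         # Mary not slap slap slap NULL the green witch
print(" ".join(after_translation))  # Maria no daba una bofetada a la verde bruja
print(" ".join(foreign))            # Maria no daba una bofetada a la bruja verde
```

Decoding runs this story in reverse: given the foreign sentence, it searches for the English sentence and the choices that make it most probable.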
The Challenge of Syntax
[Figure: pyramid with levels, bottom to top: foreign words / english words, foreign syntax / english syntax, foreign semantics / english semantics, interlingua]
• The classical machine translation pyramid

Advantages of Syntax-Based Translation
• Reordering for syntactic reasons
  – e.g., move German object to end of sentence
• Better explanation for function words
  – e.g., prepositions, determiners
• Conditioning to syntactically related words
  – translation of verb may depend on subject or object
• Use of syntactic language models

Syntactic Language Model
• Good syntax tree → good English
• Allows for long distance constraints
[Figure: two candidate translations with parse trees. Left: "the house of the man is small", a well-formed S with an NP (containing a PP) and a VP. Right: "the house is the man is small", whose root node is marked "?"]
• Left translation preferred by syntactic LM

String to Tree Translation
[Figure: the translation pyramid again (foreign and english words, syntax, semantics; interlingua)]
• Use of English syntax trees [Yamada and Knight, 2001]
  – exploit rich resources on the English side
  – obtained with statistical parser [Collins, 1997]
  – flattened tree to allow more reorderings
  – works well with syntactic language model

Yamada and Knight [2001]
[Figure: an English parse tree is transformed step by step into a Japanese string]
  original tree leaves:  he adores listening to music
                         (VB over PRP, VB1, VB2; VB2 over VB, TO; TO over TO, MN)
  reorder:               he music to listening adores
  insert:                ha, no, ga, desu are added
  translate:             he → kare, music → ongaku, to → wo, listening → kiku, adores → daisuki
  take leaves:           Kare ha ongaku wo kiku no ga daisuki desu

Reordering Table
  Original Order   Reordering     p(reorder | original)
  PRP VB1 VB2      PRP VB1 VB2    0.074
  PRP VB1 VB2      PRP VB2 VB1    0.723
  PRP VB1 VB2      VB1 PRP VB2    0.061
  PRP VB1 VB2      VB1 VB2 PRP    0.037
  PRP VB1 VB2      VB2 PRP VB1    0.083
  PRP VB1 VB2      VB2 VB1 PRP    0.021
  VB TO            VB TO          0.107
  VB TO            TO VB          0.893
  TO NN            TO NN          0.251
  TO NN            NN TO          0.749
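The reordering table can be applied to the example tree as follows. This is a minimal sketch, not the Yamada and Knight implementation: it always takes the most probable reordering at each node, the tuple-based tree encoding is a choice made for this example, and it assumes the node labelled MN in the figure corresponds to the NN entry of the table.

```python
# p(reorder | original), copied from the reordering table.
REORDER = {
    ("PRP", "VB1", "VB2"): {
        ("PRP", "VB1", "VB2"): 0.074,
        ("PRP", "VB2", "VB1"): 0.723,
        ("VB1", "PRP", "VB2"): 0.061,
        ("VB1", "VB2", "PRP"): 0.037,
        ("VB2", "PRP", "VB1"): 0.083,
        ("VB2", "VB1", "PRP"): 0.021,
    },
    ("VB", "TO"): {("VB", "TO"): 0.107, ("TO", "VB"): 0.893},
    ("TO", "NN"): {("TO", "NN"): 0.251, ("NN", "TO"): 0.749},
}

# A node is (label, word) for a leaf or (label, [children]) otherwise.
TREE = ("VB", [
    ("PRP", "he"),
    ("VB1", "adores"),
    ("VB2", [
        ("VB", "listening"),
        ("TO", [("TO", "to"), ("NN", "music")]),  # NN assumed for the figure's MN
    ]),
])

def reorder(node):
    """Return (reordered node, probability) using the best option per node."""
    label, body = node
    if isinstance(body, str):          # leaf: nothing to reorder
        return node, 1.0
    labels = tuple(child[0] for child in body)
    options = REORDER.get(labels, {labels: 1.0})
    best = max(options, key=options.get)
    prob = options[best]
    # Sibling labels are unique in this example, so a dict lookup is enough.
    by_label = {child[0]: child for child in body}
    children = []
    for child_label in best:
        child, child_prob = reorder(by_label[child_label])
        children.append(child)
        prob *= child_prob
    return (label, children), prob

def leaves(node):
    label, body = node
    if isinstance(body, str):
        return [body]
    return [word for child in body for word in leaves(child)]

reordered, probability = reorder(TREE)
print(" ".join(leaves(reordered)))  # he music to listening adores
```

The probability of the whole reordering is the product of the three chosen table entries (0.723, 0.893 and 0.749); insertion and translation would contribute further factors.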
Decoding as Parsing
• Chart Parsing
• Pick Japanese words
• Translate into tree stumps
• Adding some more entries...
• Combine entries
[Figure sequence: chart built over "kare ha ongaku wo kiku no ga daisuki desu", one state per slide]
  – PRP(he)
  – PRP(he), NN(music), TO(to)
  – PP over NN(music) and TO(to)
  – VB(listening) added
  – VB2 over PP and VB(listening)
  – VB1(adores) added
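The chart-building steps above can be sketched as a small bottom-up parser. Everything in the tables below is hypothetical: the stumps, the choice to attach the particles "ha", "no", "ga", "desu" to a neighbouring stump, and the combination rules are picked so that the example goes through. The sketch also leaves out probabilities and pruning, which a real decoder needs.

```python
# Hypothetical tree stumps: Japanese word sequence -> (label, English words).
LEXICON = {
    ("kare", "ha"): ("PRP", ["he"]),
    ("ongaku",): ("NN", ["music"]),
    ("wo",): ("TO", ["to"]),
    ("kiku", "no"): ("VB", ["listening"]),
    ("ga", "daisuki", "desu"): ("VB1", ["adores"]),
}

# Hypothetical rules: child labels in Japanese order -> (parent label,
# order in which the children are read off on the English side).
RULES = {
    ("NN", "TO"): ("PP", (1, 0)),
    ("PP", "VB"): ("VB2", (1, 0)),
    ("PRP", "VB2", "VB1"): ("VB", (0, 2, 1)),
}

def adjacent_sequences(chart, labels, start=None):
    """Return all runs of adjacent chart entries whose labels match `labels`."""
    if not labels:
        return [[]]
    runs = []
    for (i, j, label) in list(chart):
        if label == labels[0] and (start is None or i == start):
            for rest in adjacent_sequences(chart, labels[1:], j):
                runs.append([(i, j, label)] + rest)
    return runs

def decode(words):
    n = len(words)
    chart = {}  # (start, end, label) -> English words for that span
    # Pick Japanese words and translate them into tree stumps.
    for i in range(n):
        for j in range(i + 1, n + 1):
            stump = LEXICON.get(tuple(words[i:j]))
            if stump is not None:
                label, english = stump
                chart[(i, j, label)] = english
    # Combine entries until nothing new can be added.
    added = True
    while added:
        added = False
        for child_labels, (parent, order) in RULES.items():
            for run in adjacent_sequences(chart, child_labels):
                key = (run[0][0], run[-1][1], parent)
                if key not in chart:
                    chart[key] = [w for k in order for w in chart[run[k]]]
                    added = True
    # Finished when an entry covers all foreign words.
    return chart.get((0, n, "VB")), chart

source = "kare ha ongaku wo kiku no ga daisuki desu".split()
english, chart = decode(source)
print(" ".join(english))  # he adores listening to music
```

Each chart entry stores a source span, so the "all foreign words covered" test is just a lookup of the span from 0 to the sentence length.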
Decoding as Parsing
[Figure: completed chart over "kare ha ongaku wo kiku no ga daisuki desu": VB at the top, over PRP(he), VB2 (over PP and VB(listening)) and VB1(adores)]
• Finished when all foreign words covered

Yamada and Knight: Training
• Parsing of the English side
  – using Collins statistical parser
• EM training
  – translation model is used to map training sentence pairs
  – EM training finds low-perplexity model
  → unity of training and decoding as in IBM models

Is the Model Realistic?
• Do English trees match foreign strings?
• Crossings between French-English [Fox, 2002]
  – 0.29-6.27 per sentence, depending on how it is measured
• Can be reduced by
  – flattening tree, as done by [Yamada and Knight, 2001]
  – detecting phrasal translation
  – special treatment for small number of constructions
• Most coherence between dependency structures

Inversion Transduction Grammars
• Generation of both English and foreign trees [Wu, 1997]
• Rules (binary and unary)
  – A → A1 A2 ‖ A1 A2
  – A → A1 A2 ‖ A2 A1
  – A → e ‖ f
  – A → e ‖ ε
  – A → ε ‖ f
  ⇒ Common binary tree required
  – limits the complexity of reorderings

Syntax Trees
• English binary tree
[Figure: binary tree over "Mary did not slap the green witch"]

Syntax Trees (2)
• Spanish binary tree
[Figure: binary tree over "Maria no daba una bofetada a la bruja verde"]
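The inversion transduction grammar rules can be illustrated with one common binary tree that yields both example sentences. This is a sketch: the bracketing and the word alignment below are hypothetical choices made for this example, not the trees from the slides, and `None` stands for the empty side of a unary rule.

```python
def leaf(e, f):
    """A -> e || f; pass None for an empty English or foreign side."""
    return ("leaf", e, f)

def straight(left, right):
    """A -> A1 A2 || A1 A2: same order in both languages."""
    return ("straight", left, right)

def inverted(left, right):
    """A -> A1 A2 || A2 A1: children swapped on the foreign side."""
    return ("inverted", left, right)

def yields(node):
    """Return (English words, foreign words) generated by the tree."""
    kind = node[0]
    if kind == "leaf":
        _, e, f = node
        return ([e] if e else []), ([f] if f else [])
    _, left, right = node
    e_left, f_left = yields(left)
    e_right, f_right = yields(right)
    if kind == "straight":
        return e_left + e_right, f_left + f_right
    return e_left + e_right, f_right + f_left

# Hypothetical derivation for the example sentence pair.
noun_phrase = straight(leaf("the", "la"),
                       inverted(leaf("green", "verde"), leaf("witch", "bruja")))
object_phrase = straight(leaf(None, "a"), noun_phrase)
verb = straight(leaf(None, "daba"),
                straight(leaf(None, "una"), leaf("slap", "bofetada")))
negation = straight(leaf("did", None), leaf("not", "no"))
sentence = straight(leaf("Mary", "Maria"),
                    straight(negation, straight(verb, object_phrase)))

english_words, foreign_words = yields(sentence)
print(" ".join(english_words))  # Mary did not slap the green witch
print(" ".join(foreign_words))  # Maria no daba una bofetada a la bruja verde
```

The only inverted node is the one over "green witch", which is where the two languages differ in word order. Because every reordering must be expressed as a swap of two sibling subtrees, the grammar cannot produce arbitrary permutations.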