Bilingual Markov Reordering Labels for Hierarchical SMT
Gideon Maillette de Buy Wenniger and Khalil Sima’an
gemdbw AT gmail.com, k.simaan AT uva.nl
http://staff.science.uva.nl/~gemaille/
http://staff.science.uva.nl/~simaan/
Statistical Language Processing and Learning Lab
Institute for Logic, Language and Computation (ILLC)
University of Amsterdam, the Netherlands
October 25th, 2014
The incoherence of translation reordering
Sentence type      Sentence contents
Source sentence    der handlungsspielraum der beiden betroffenen regierung ist also durch das internationale recht begrenzt .
Reference          any action by the two governments concerned is therefore limited by this international law .
Hiero (Baseline)   the margin for manoeuvre of two government is concerned by the international community limited .
Hiero and Memento
Question: what do they have in common?
[Figure: Hiero derivation of "we should tailor our policy accordingly" / "darauf müssen wir unsere politik ausrichten", in which every nonterminal below S carries the same label X]
Lexicalization and Language model: the words are not enough
Coherence demands (reordering) context
Vision: Hierarchical Alignment Trees (HATs)
Outline
Part 1: Bilingual Phrase Reordering Labels
Part 2: Label Substitution Features
Part 3: Experiments
Conclusions
Part 1: Bilingual Phrase Reordering Labels
NDT with Alignment structure
English: we should tailor our policy accordingly
German: darauf müssen wir unsere politik ausrichten
[Figure: word alignment between the two sentences, with the NDT drawn over it]
Node tuples shown on the slide:
([1,6], [1,6], 1)
([1,2], [2,3], 2)
([4,5], [4,5], 3)
([1,1], [3,3], 4)
([2,2], [2,2], 5)
([4,4], [4,4], 6)
([5,5], [5,5], 7)
NDT with Alignment structure = HAT
Reordering Labeled Grammar Extraction
[Figure: extraction pipeline; boxes in slide order: Word Alignment, Hierarchical Alignment Trees, Chart, Extract Reordering labels, Label Chart, Grammar Extractor, SCFG]
Bilingual Phrase Reordering label categories
- Phrase-Centric
- Parent-Relative
Phrase-centric reordering labels
- Complexity relation between base phrase and children in HAT determines label
- Five cases distinguished, ordered by increasing complexity
Label         English                        German                                  Part order (English / German)
Monotonic     this is an important matter    das ist ein wichtige angelegenheit      1 2 / 1 2
Inversion     we all agree on this           das sehen wir alle                      1 2 / 2 1
Permutation   i want to stress two points    auf zwei punkte möchte ich hinweisen    1 2 3 4 / 2 4 1 3
Complex       we owe this to our citizens    das sind wir unsern burgern schuldig    1 2 3 / 2 1 3
Atomic        it would be possible           kann mann                               1 / 1
Known labels from ITG and Phrase pair Theory
Monotonic
Monotonic: if the alignment can be split into two monotonically ordered parts.
Example: "this is an important matter" / "das ist ein wichtige angelegenheit" (part order 1 2 / 1 2)
Inverted
Inverted: if the alignment can be split into two inverted parts.
Example: "we all agree on this" / "das sehen wir alle" (part order 1 2 / 2 1)
Atomic
Atomic: if the alignment does not allow the existence of smaller (child) phrase pairs.
Example: "it would be possible" / "kann mann" (part order 1 / 1)
New labels based on HATs
Permutation
Permutation: if the alignment can be factored as a permutation of more than 3 parts.
Example: "i want to stress two points" / "auf zwei punkte möchte ich hinweisen" (part order 1 2 3 4 / 2 4 1 3)
Complex
Complex: if the alignment cannot be factored as a permutation of parts, but the phrase pair still contains a smaller phrase pair (i.e., it is composite).
Example: "we owe this to our citizens" / "das sind wir unsern burgern schuldig" (part order 1 2 3 / 2 1 3)
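The five-way case distinction can be illustrated with a small brute-force sketch. It is not the authors' HAT-based extraction, only a minimal illustration under stated assumptions: the alignment is given as a set of 0-based (English index, German index) links, every word on both sides is aligned, and the label names and the helper names `target_span` and `phrase_centric_label` are hypothetical. The toy alignments in the usage lines are invented for illustration and are not the alignments behind the slide examples.

```python
from typing import Optional, Set, Tuple

Links = Set[Tuple[int, int]]  # (source index, target index), 0-based


def target_span(links: Links, a: int, b: int) -> Optional[Tuple[int, int]]:
    """Return the target span [c, d) if source span [a, b) is one side of a
    phrase pair (no link crosses the span boundary), else None."""
    targets = [t for s, t in links if a <= s < b]
    if not targets:
        return None
    c, d = min(targets), max(targets) + 1
    # Every link that touches the target span must start inside [a, b).
    if any(c <= t < d and not (a <= s < b) for s, t in links):
        return None
    return c, d


def phrase_centric_label(links: Links, n: int) -> str:
    """Label the phrase pair covering source words 0..n-1.

    Assumes every source and target word is aligned (illustrative only)."""
    # All proper sub-spans of the source side that form phrase pairs.
    proper = {}
    for a in range(n):
        for b in range(a + 1, n + 1):
            if (a, b) != (0, n):
                span = target_span(links, a, b)
                if span is not None:
                    proper[(a, b)] = span
    if not proper:
        return "ATOMIC"  # no smaller (child) phrase pair exists
    # Binary split: both halves are phrase pairs.
    for k in range(1, n):
        if (0, k) in proper and (k, n) in proper:
            left, right = proper[(0, k)], proper[(k, n)]
            return "MONO" if left[1] <= right[0] else "INVERTED"
    # Otherwise, check whether the source side can be segmented into
    # phrase pairs at all; without a binary split this needs 4+ parts.
    reachable = [True] + [False] * n
    for b in range(1, n + 1):
        reachable[b] = any(reachable[a] and (a, b) in proper for a in range(b))
    return "PERMUTATION" if reachable[n] else "COMPLEX"


# Toy alignments (hypothetical, not taken from the slides).
print(phrase_centric_label({(0, 0), (1, 1)}, 2))                   # MONO
print(phrase_centric_label({(0, 1), (1, 0)}, 2))                   # INVERTED
print(phrase_centric_label({(1, 0), (3, 1), (0, 2), (2, 3)}, 4))   # PERMUTATION
print(phrase_centric_label({(0, 0), (0, 2), (1, 1), (2, 2)}, 3))   # COMPLEX
print(phrase_centric_label({(0, 0), (1, 1), (2, 0), (3, 1)}, 4))   # ATOMIC
```

The third toy alignment encodes the part order 2 4 1 3, the smallest permutation that has no binary split; the fourth contains one smaller phrase pair but cannot be segmented into phrase pairs, which is why it falls through to the Complex case.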
Phrase-Centric labeled derivation
[Figure: synchronous derivation of "we should tailor our policy accordingly" / "darauf müssen wir unsere politik ausrichten" with phrase-centric labels on the linked nonterminals: S (10), COMPLEX (11), INVERTED (12), MONO (13), ATOMIC (14), ATOMIC (17)]
Parent-Relative reordering labels
- Describe the type of reordering relative to the embedding “parent” phrase
- First-order view on reordering
(Details omitted due to time constraints)
Part 2: Label Substitution Features
Label substitution features
- Unique feature for every label pair ⟨L_α, L_β⟩
  - Marks that a specific LHS substitutes a specific gap
- Two more coarse features:
  - Match
  - Nomatch
[Figure: decoder chart showing a substituting rule whose LHS (N1, N2) fills a gap (GAP1, GAP2) of another rule, alongside the basic features]
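As a rough illustration of how these features could fire for a single substitution, the sketch below returns one fine-grained feature for the label pair plus one of the two coarse features. The function name `label_substitution_features` and the feature-name format are assumptions made for this example; the slides do not specify how the features are named or stored in the decoder.

```python
from typing import Dict


def label_substitution_features(gap_label: str, lhs_label: str) -> Dict[str, float]:
    """Features fired when a rule with left-hand side `lhs_label`
    is substituted into a gap labeled `gap_label`.

    Feature names are hypothetical; only the feature types follow the slide."""
    # One unique indicator feature per (gap label, LHS label) pair.
    features = {f"subst:{lhs_label}->{gap_label}": 1.0}
    # One of two coarse features: do the labels agree or not?
    coarse = "Match" if gap_label == lhs_label else "Nomatch"
    features[coarse] = 1.0
    return features


# A MONO rule filling an INVERTED gap fires the pair feature and Nomatch.
print(label_substitution_features("INVERTED", "MONO"))
# An ATOMIC rule filling an ATOMIC gap fires the pair feature and Match.
print(label_substitution_features("ATOMIC", "ATOMIC"))
```

Treating label agreement as a feature rather than a hard constraint lets the tuned weights decide how strongly a mismatch is penalized, instead of ruling out such substitutions outright.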
Part 3: Experiments