Chunk-based Verb Reordering in VSO Sentences for Arabic-English SMT
Arianna Bisazza, Marcello Federico
FBK-irst, Trento, Italy
WMT 2010, Uppsala, 15-16 July 2010
Introduction
● English word order: Subject-Verb-Object
● Arabic: both SVO and VSO
● Common errors in phrase-based SMT outputs:
− wrong order of syntactic constituents
− verbless sentences
Outline
● Reordering patterns in Arabic-English
● Chunk-based verb reordering: technique and analysis
● Impact of VSO sentences on translation quality
● Chunk-based reordering lattices
Reordering patterns in Arabic-English
VSO sentence: the Arabic verb comes earlier than in English
Several local reorderings, one long reordering involving the verb
Typical phrase-based SMT outputs:
*The Moroccan monarch King Mohamed VI __ his support to…
*He renewed the Moroccan monarch King Mohamed VI his support to…
Previous work (Habash '07; Crego&Habash '08; Elming&Habash '09)
• preprocess source data to approximate target word order
• address all reorderings
• deterministic reordering => 1 most probable permutation
• non-deterministic => word reordering lattices
Our work:
• only one class of reorderings
• mixed approach: deterministic for training, lattices for test
Reordering patterns in Arabic-English
Working hypothesis: uneven distribution of reordering phenomena
Many local:
− adjectival modifiers following their noun
− head-initial genitive constructions (idafa)
Few global:
− Verb-Subject-Object sentences
Reordering patterns in Arabic-English
VSO sentences: moving the verb after the subject simplifies reordering
Other (local) reorderings: handled inside phrases or through distortion
Chunk-based verb reordering
– Simplifying assumptions:
1) verb reordering only between shallow syntax chunks;
2) no overlap between consecutive verb movements
– Possible movements: move the verb chunk, or the verb chunk + next chunk (e.g. adverbials), by up to X chunks to the right
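The possible movements listed above can be sketched as an enumeration over chunk indices. This is only an illustrative sketch, not the authors' implementation: the function name `candidate_movements`, the parameter `max_len` (standing in for X), and the representation of a sentence as a list of chunk labels are all assumptions. It handles a single verb chunk and does not enforce the no-overlap assumption between consecutive verb movements.

```python
from typing import List, Tuple


def candidate_movements(
    chunks: List[str], verb_idx: int, max_len: int
) -> List[Tuple[int, int, List[int]]]:
    """Enumerate rightward movements of one verb chunk (hypothetical helper).

    Returns (block_size, shift, order) triples, where `order` is the new
    sequence of original chunk indices after moving the block.
    """
    n = len(chunks)
    results = []
    for block in (1, 2):  # verb chunk alone, or verb chunk + next chunk
        end = verb_idx + block  # index of the first chunk after the moved block
        if end > n:
            continue
        for shift in range(1, max_len + 1):  # move by up to max_len chunks
            if end + shift > n:
                break  # cannot move past the end of the sentence
            order = (
                list(range(verb_idx))            # chunks before the verb
                + list(range(end, end + shift))  # chunks the block jumps over
                + list(range(verb_idx, end))     # the moved block itself
                + list(range(end + shift, n))    # untouched tail
            )
            results.append((block, shift, order))
    return results


if __name__ == "__main__":
    # Toy chunk labels (assumed tag set): verb, subject, object, prepositional phrase.
    for block, shift, order in candidate_movements(["VP", "NP", "NP", "PP"], 0, 6):
        print(block, shift, order)
```

For a four-chunk sentence with the verb chunk first, this yields five candidates: three for the verb chunk alone and two for the verb chunk plus the following chunk.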
Chunk-based verb reordering
Best movement: minimizes distortion wrt English translation
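One way to read "minimizes distortion" is sketched below. The slides do not define the distortion measure, so the cost used here (sum of absolute jumps between the English-side positions of consecutive source chunks) is an assumption, as are the names `distortion`, `best_order` and `target_pos`. Each source chunk is assumed to map to a single English-side position derived from word alignments.

```python
from typing import List, Sequence


def distortion(order: Sequence[int], target_pos: Sequence[int]) -> int:
    """Total jump cost when translating chunks in `order` (assumed measure).

    target_pos[i] is the English-side position of source chunk i.
    A perfectly monotone order has cost 0.
    """
    cost = 0
    prev = -1  # position "before" the first English chunk
    for chunk in order:
        cost += abs(target_pos[chunk] - prev - 1)
        prev = target_pos[chunk]
    return cost


def best_order(
    candidates: Sequence[List[int]], target_pos: Sequence[int]
) -> List[int]:
    """Pick the candidate chunk order with the lowest distortion."""
    return min(candidates, key=lambda order: distortion(order, target_pos))


if __name__ == "__main__":
    # Toy VSO sentence: chunk 0 = verb, 1 = subject, 2 = object, 3 = PP.
    # In English the subject comes first, so the verb maps to position 1.
    target_pos = [1, 0, 2, 3]
    candidates = [[0, 1, 2, 3], [1, 0, 2, 3], [1, 2, 0, 3]]
    print(best_order(candidates, target_pos))
```

In this toy case the original VSO order and the over-long movement both cost 4, while moving the verb one chunk to the right costs 0 and is selected.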
Chunk-based verb reordering: corpus analysis
Distribution by movement length (figure: intersection of GIZA++ alignments vs. manual alignments)
=> Good coverage (≥ 99.5%) with max movement length 6
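The coverage figure can be understood as a cumulative share of the movement-length distribution. The sketch below shows that computation on invented toy counts; the numbers are not the corpus statistics from the slides, and the function names are hypothetical.

```python
from typing import Dict


def coverage_by_max_length(counts: Dict[int, int]) -> Dict[int, float]:
    """Map each movement length L to the share of movements with length <= L."""
    total = sum(counts.values())
    covered = 0
    coverage = {}
    for length in sorted(counts):
        covered += counts[length]
        coverage[length] = covered / total
    return coverage


def smallest_sufficient_length(counts: Dict[int, int], threshold: float) -> int:
    """Smallest maximum movement length whose coverage reaches `threshold`."""
    for length, cov in coverage_by_max_length(counts).items():
        if cov >= threshold:
            return length
    raise ValueError("threshold is not reachable with the observed lengths")


if __name__ == "__main__":
    # Invented toy histogram: movement length -> number of verb movements.
    toy_counts = {1: 50, 2: 30, 3: 15, 4: 5}
    print(coverage_by_max_length(toy_counts))
    print(smallest_sufficient_length(toy_counts, 0.9))
```

Applied to a real histogram of best movements, the same calculation would give the maximum movement length needed for a chosen coverage level.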
Impact of VSO sentences on MT quality
• Baseline: Moses, 30M words newswire from NIST09
• Shallow syntax chunking: AMIRA (Diab et al. 2004)
• Verb-reorder the training and dev sets, then re-train the whole system
• Verb-reorder the test set using alignment with the reference (oracle)
• Tested with different Distortion Limits (DL) from 2 to 10 and wide beam search
Impact of VSO sentences on MT quality
%BLEU scores on Eval08-NW (MERT on Dev06-NW):
• Verb reordering of training data only => positive effect (9% more phrases extracted)
• Verb reordering of training and test => further gain (+1.2 with 1/3 of sentences modified)
• Relaxing the DL to high values doesn't help
Impact of VSO sentences on MT quality
To summarize:
• VSO sentences negatively affect phrase-based SMT
• Specific models are needed to handle verb reordering of the test set