Improving Trees and Alignments for Syntax-Based Machine Translation

Kevin Knight, USC/Information Sciences Institute. Joint work with Steven DeNeefe, Daniel Marcu, Wei Wang, and Jonathan May. SRI, July 12, 2007.


  1. Improving Trees and Alignments for Syntax-Based Machine Translation. Kevin Knight, USC/Information Sciences Institute. Joint work with Steven DeNeefe, Daniel Marcu, Wei Wang, and Jonathan May. SRI, July 12, 2007.

  2. Syntactic Approaches to MT
     • Use of syntactic information (noun, verb, etc.) in the translation process:
       – Manually constructed rule-based systems
       – Statistical systems: Wu & Wong, 1998; Yamada & Knight, 2001-2002; Galley et al, 2004
       – Contrast with phrase-based statistical approaches

  3. Phrase-Based Output
     Input: 枪手 被 警方 击毙 .
     Decoder output, hypothesis #1: Gunman of police killed .

  4. Phrase-Based Output
     Input: 枪手 被 警方 击毙 .
     Decoder output, hypothesis #7: Gunman of police attack .

  5. Phrase-Based Output
     Input: 枪手 被 警方 击毙 .
     Decoder output, hypothesis #12: Gunman by police killed .

  6. Phrase-Based Output
     Input: 枪手 被 警方 击毙 .
     Decoder output, hypothesis #134: Killed gunman by police .

  7. Phrase-Based Output
     Input: 枪手 被 警方 击毙 .
     Decoder output, hypothesis #9,329: Gunman killed the police .

  8. Phrase-Based Output
     Input: 枪手 被 警方 击毙 .
     Decoder output, hypothesis #50,654: Gunman killed by police .
     Problematic:
       – Output lacks English auxiliary and determiner
       – Re-ordering relies on luck, instead of on the Chinese passive marker

  9. Syntax-Based Output
     Input: 枪手 被 警方 击毙 .
     Decoder output, hypothesis #1: The gunman killed by police .
     Tree node labels: DT NN VBD IN NN; NPB PP; NP-C VP; S

  10. Syntax-Based Output
     Input: 枪手 被 警方 击毙 .
     Decoder output, hypothesis #16: Gunman by police shot .
     Tree node labels: NN IN NN VBD; NPB PP; NP-C VP; S

  11. Syntax-Based Output
     Input: 枪手 被 警方 击毙 .
     Decoder output, hypothesis #1923: The gunman was killed by police .
     Tree node labels: DT NN AUX VBN IN NN; NPB PP; NP-C VP; S

  12. Why Might Syntax Help?
     • Phrase-based MT output is “n-grammatical”, not grammatical
       – Every sentence needs a subject and a verb
     • Re-ordering is poorly explained as “distortion”; it is better explained as syntactic transformation
       – Arabic to English, VSO → SVO
     • Function words have syntactic effects even if they are not themselves translated

  13. Why Might Syntax Hurt?
     • Less freedom to glue pieces of output together: the search space has fewer output strings
     • Search space is more difficult to navigate
     • Rule extraction from bilingual text has limitations (this talk)
     Figure labels: available phrase-based translations; available syntax-based translations

  16. Comparing Phrase-Based Extraction with Syntax-Based Extraction
     • Quantitatively compare
       – A typical phrase-based bilingual extraction algorithm (ATS, Och & Ney 2004)
       – A typical syntax-based bilingual extraction algorithm (GHKM, Galley et al 2004)
       – These algorithms were picked from two good-scoring NIST-06 systems
     • Identify areas of improvement for syntax-based rule coverage

  17. Phrase-Based and Syntax-Based Pattern Extraction
     • ATS [Och & Ney, 2004]: (estring … alignment cstring) → phrase pairs consistent with word alignment
     • GHKM [Galley et al 2004]: (etree … alignment cstring) → syntax transformation rules consistent with word alignment

  18. ATS (Och & Ney, 2004)
     Sentence pair: i felt obliged to do my part / 我 有 责任 尽 一份 力
     PHRASE PAIRS ACQUIRED:
       felt ↔ 有
       felt obliged ↔ 有 责任
       felt obliged to do ↔ 有 责任 尽
       obliged ↔ 责任
       obliged to do ↔ 责任 尽
       do ↔ 尽
       part ↔ 一份
       part ↔ 一份 力
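The idea of "phrase pairs consistent with word alignment" can be sketched in a few lines of Python. This is a brute-force illustration, not the ATS implementation: the function name `extract_phrase_pairs`, the `max_len` limit, and the word alignment itself are assumptions made for the example (the slide does not show the alignment links). A span pair counts as consistent here when it contains at least one alignment link and no link has exactly one endpoint inside it.

```python
def extract_phrase_pairs(e_words, f_words, links, max_len=4):
    """Return every (English phrase, Chinese phrase) pair consistent with links.

    links is a set of (e_index, f_index) word-alignment pairs.
    Brute force over all span pairs; fine for a single short sentence.
    """
    pairs = set()
    for e_start in range(len(e_words)):
        for e_end in range(e_start + 1, min(e_start + max_len, len(e_words)) + 1):
            for f_start in range(len(f_words)):
                for f_end in range(f_start + 1, min(f_start + max_len, len(f_words)) + 1):
                    inside = 0
                    consistent = True
                    for e, f in links:
                        e_in = e_start <= e < e_end
                        f_in = f_start <= f < f_end
                        if e_in and f_in:
                            inside += 1          # link fully inside the span pair
                        elif e_in or f_in:
                            consistent = False   # link crosses the boundary
                            break
                    if consistent and inside > 0:
                        pairs.add((" ".join(e_words[e_start:e_end]),
                                   " ".join(f_words[f_start:f_end])))
    return pairs


e_words = "i felt obliged to do my part".split()
f_words = "我 有 责任 尽 一份 力".split()

# ASSUMED alignment (hypothetical, chosen to agree with the slide's phrase pairs):
# i-我, felt-有, obliged-责任, do-尽, part-一份; "to", "my" and 力 are unaligned.
links = {(0, 0), (1, 1), (2, 2), (4, 3), (6, 4)}

pairs = extract_phrase_pairs(e_words, f_words, links)
for e_phrase, f_phrase in sorted(pairs):
    print(f"{e_phrase} ↔ {f_phrase}")
```

Under this assumed alignment the sketch also returns pairs that the slide does not list, such as "i ↔ 我" and "obliged to ↔ 责任", because unaligned words can attach freely to a neighbouring phrase.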

  22. GHKM (Galley et al, 2004)
     Sentence pair: i felt obliged to do my part / 我 有 责任 尽 一份 力
     English tree node labels: S; NP-C VP; VBD VP-C; VBN SG-C; VP; TO VP-C; VB NP-C; NPB NPB; PRP PRP$ NN
     RULES ACQUIRED:
       VBD(felt) → 有
       VBN(obliged) → 责任
       VP(x0:VBD VP-C(x1:VBN x2:SG-C)) → x0 x1 x2
       VP(VBD(felt) VP-C(VBN(obliged) x0:SG-C)) → 有 责任 x0
       S(x0:NP-C x1:VP) → x0 x1

  28. GHKM (Galley et al, 2004)
     Sentence pair: i felt obliged to do my part / 我 有 责任 尽 一份 力
     English tree node labels: S; NP-C VP; VBD VP-C; VBN SG-C; VP; TO VP-C; VB NP-C; NPB NPB; PRP PRP$ NN
     RULES ACQUIRED:
       Minimal rules:
         VBD(felt) → 有
         VBN(obliged) → 责任
         VP(x0:VBD VP-C(x1:VBN x2:SG-C)) → x0 x1 x2
         S(x0:NP-C x1:VP) → x0 x1
       Composed rule:
         VP(VBD(felt) VP-C(VBN(obliged) x0:SG-C)) → 有 责任 x0
     Minimal rules tile the tree/string/alignment triple. Composed rules are made by combining those tiles.
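How composed rules arise from minimal rules can be illustrated with a small Python sketch. It is not the GHKM extractor: the `Var`, `Node`, and `Rule` classes and the `compose` and `show` helpers are hypothetical names introduced for this example. The sketch grafts one rule's left-hand side onto a variable of another, splices the right-hand sides accordingly, and renumbers the remaining variables left to right.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    index: object  # variable number (x0, x1, ...); temporarily a tagged tuple
    label: str     # syntactic category the variable stands for


@dataclass(frozen=True)
class Node:
    label: str
    children: tuple  # of Node, Var, or str (an English word)


@dataclass(frozen=True)
class Rule:
    lhs: Node
    rhs: tuple  # ints are variable indices, strs are Chinese words


def _map_vars(tree, fn):
    """Rebuild tree, replacing every Var with fn(var), left to right."""
    if isinstance(tree, Var):
        return fn(tree)
    if isinstance(tree, Node):
        return Node(tree.label, tuple(_map_vars(c, fn) for c in tree.children))
    return tree


def compose(outer, var_index, inner):
    """Substitute rule `inner` for variable x<var_index> of rule `outer`."""
    def graft(var):
        if var.index != var_index:
            return Var(("outer", var.index), var.label)
        if var.label != inner.lhs.label:
            raise ValueError("inner rule root does not match variable label")
        # Tag inner variables so they cannot clash with outer ones.
        return _map_vars(inner.lhs, lambda v: Var(("inner", v.index), v.label))

    lhs = _map_vars(outer.lhs, graft)
    rhs = []
    for item in outer.rhs:
        if isinstance(item, int) and item == var_index:
            rhs.extend(("inner", j) if isinstance(j, int) else j for j in inner.rhs)
        elif isinstance(item, int):
            rhs.append(("outer", item))
        else:
            rhs.append(item)

    order = {}  # tagged index -> new index, in left-to-right LHS order

    def renumber(var):
        order.setdefault(var.index, len(order))
        return Var(order[var.index], var.label)

    new_lhs = _map_vars(lhs, renumber)
    new_rhs = tuple(order[i] if isinstance(i, tuple) else i for i in rhs)
    return Rule(new_lhs, new_rhs)


def show(rule):
    """Render a rule in the slide's notation."""
    def fmt(t):
        if isinstance(t, Var):
            return f"x{t.index}:{t.label}"
        if isinstance(t, Node):
            return f"{t.label}({' '.join(fmt(c) for c in t.children)})"
        return t
    rhs = " ".join(f"x{i}" if isinstance(i, int) else i for i in rule.rhs)
    return f"{fmt(rule.lhs)} → {rhs}"


# Minimal rules taken from the slide.
vp = Rule(Node("VP", (Var(0, "VBD"), Node("VP-C", (Var(1, "VBN"), Var(2, "SG-C"))))),
          (0, 1, 2))
felt = Rule(Node("VBD", ("felt",)), ("有",))
obliged = Rule(Node("VBN", ("obliged",)), ("责任",))

# Fill x0 with "felt"; the old x1 (VBN) becomes x0, which is then filled with "obliged".
composed = compose(compose(vp, 0, felt), 0, obliged)
print(show(composed))
```

Composing the three minimal rules this way yields the composed rule listed on the slide, with the single remaining variable standing for the SG-C subtree.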

  29. GHKM Syntax Rules
     • Phrasal Translation: VP(VBZ(is) VBG(singing)) ↔ está, cantando
     • Non-constituent Phrases: S(PRO(there) VP(VB(are) NP)) ↔ hay, NP
     • Non-contiguous Phrases: VP(VB(put) NP PRT(on)) ↔ poner, NP
     • Context-Sensitive Word Insertion: NPB(DT(the) NNS) ↔ NNS
     • Multilevel Re-Ordering: S(NP1 VP(VB NP2)) ↔ VB, NP1, NP2
     • Lexicalized Re-Ordering: NP(NP2 PP(P(of) NP1)) ↔ NP1, , NP2
