Translation Model Adaptation Using Genre-Revealing Text Features

  1. Translation Model Adaptation Using Genre-Revealing Text Features
     Marlies van der Wees, Arianna Bisazza, Christof Monz

  2. Domain adaptation for SMT
     ✤ Prioritize translation candidates that are most relevant to a specific task
     (Figure: heterogeneous training data vs. a specific translation task)
        source    target           p(f|e)   p(e|f)   …
        الحمد     praise be to     0.1      0.2      …
        الحمد     praise for       0.2      0.2      …
        الحمد     thank            0.1      0.2      …
        حبيبتي    my dear          0.2      0.1      …
        حبيبتي    my love          0.2      0.1      …
        حبيبتي    my sweetheart    0.1      0.1      …
     ✤ What type of domain information to use?
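The slide does not say how candidates are reprioritized. As a generic illustration only (not the authors' method), the sketch below estimates p(f|e) and p(e|f) by relative frequency over phrase-pair counts, as in standard phrase-based SMT, and then linearly interpolates a general table with an in-domain table, which is one widely used form of translation model adaptation. The function names, the toy counts and the weight `lam` are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]        # (source phrase f, target phrase e)
Probs = Tuple[float, float]   # (p(f|e), p(e|f))


def phrase_probs(counts: Dict[Pair, int]) -> Dict[Pair, Probs]:
    """Relative-frequency estimates: p(f|e) = c(f,e)/c(e), p(e|f) = c(f,e)/c(f)."""
    src_totals: Counter = Counter()
    tgt_totals: Counter = Counter()
    for (f, e), c in counts.items():
        src_totals[f] += c
        tgt_totals[e] += c
    return {(f, e): (c / tgt_totals[e], c / src_totals[f])
            for (f, e), c in counts.items()}


def interpolate(general: Dict[Pair, Probs], in_domain: Dict[Pair, Probs],
                lam: float) -> Dict[Pair, Probs]:
    """Linear mixture lam * p_in + (1 - lam) * p_gen, feature by feature.

    Simplification: a pair missing from one table contributes 0 from that
    table; no renormalization or back-off is done.
    """
    mixed: Dict[Pair, Probs] = {}
    for pair in set(general) | set(in_domain):
        g = general.get(pair, (0.0, 0.0))
        d = in_domain.get(pair, (0.0, 0.0))
        mixed[pair] = (lam * d[0] + (1 - lam) * g[0],
                       lam * d[1] + (1 - lam) * g[1])
    return mixed


def rank_targets(table: Dict[Pair, Probs], f: str) -> List[Tuple[str, float]]:
    """Target candidates for source phrase f, sorted by p(e|f), highest first."""
    cands = [(e, probs[1]) for (src, e), probs in table.items() if src == f]
    return sorted(cands, key=lambda item: item[1], reverse=True)


# Hypothetical toy counts, not data from the paper.
general_counts = {
    ("الحمد", "praise be to"): 6,
    ("الحمد", "praise for"): 3,
    ("الحمد", "thank"): 1,
    ("شكرا", "thank"): 4,
}
in_domain_counts = {
    ("الحمد", "praise be to"): 1,
    ("الحمد", "thank"): 4,
}

general = phrase_probs(general_counts)
in_domain = phrase_probs(in_domain_counts)
adapted = interpolate(general, in_domain, lam=0.5)

if __name__ == "__main__":
    for e, p in rank_targets(general, "الحمد"):
        print(f"general  {e:<13} p(e|f)={p:.2f}")
    for e, p in rank_targets(adapted, "الحمد"):
        print(f"adapted  {e:<13} p(e|f)={p:.2f}")
```

With these made-up counts, "thank" is the least likely translation in the general table (p(e|f) = 0.1) but becomes the top candidate after mixing in the in-domain table (0.45). How the mixture weight is chosen, and what defines "in-domain", is exactly the question the next slides raise.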

  3. Dimensions of domains
     ✤ Topic refers to general subject
        ✦ politics, sports, tennis
     ✤ Genre refers to function, style, text type
        ✦ editorials, newswire, user-generated text
        ✦ orthogonal to topic
     ✤ Provenance refers to document’s origin
        ✦ LDC2005T13, Europarl, EMEA

  4. The problem with provenance
     Provenance information has proven useful for adaptation in SMT, but is it the best representation of a domain?
     ✤ It’s not an intrinsic text property
     ✤ We might need manual labeling
        ✦ labor-intensive
        ✦ arbitrary
     ✤ It often combines a particular topic and genre

  5. Disentangling topic and genre in SMT*
     ✤ Experiments on the Gen&Topic controlled test set (genres: News, Comment; topics: Culture, Economy):
        Culture, News:      The 12 contestants competed during a May 3rd Prime.
        Culture, Comment:   You allowed Barwas to represent Iraq while she sings in Kurdish!!!
        Economy, News:      Yemen is mulling the establishment of 13 industrial zones.
        Economy, Comment:   What development in Yemen are you talking about?
     ✤ Genre has a larger impact on SMT than topic
     ✤ We want to adapt to different genres in a test corpus!
     * Van der Wees et al., 2015

  6. What information to use for adaptation?
     ✤ Provenance information
        ✦ manual grouping of sub-corpora
     ✤ Topic information
        ✦ unsupervised LDA-inferred topics
     ✤ Genre information
        ✦ determine and exploit intrinsic genre-revealing text features
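The slides name genre-revealing text features without listing them at this point. As a purely illustrative sketch, assuming a handful of language-independent surface cues (sentence length, punctuation and digit ratios, questions, repeated punctuation such as "!!!"), the hypothetical function `genre_features` below maps a sentence to a small feature dictionary. This is not the feature set used in the paper. Because `\w` in Python 3 regular expressions is Unicode-aware, the same code also tokenizes Arabic text; a few Arabic punctuation marks are added to the punctuation set as an assumption.

```python
import re
import string
from typing import Dict

# ASCII punctuation plus Arabic comma, semicolon and question mark (assumption).
PUNCT = set(string.punctuation) | {"،", "؛", "؟"}
TOKEN_RE = re.compile(r"\w+|[^\w\s]")      # a word, or a single punctuation mark
REPEATED_RE = re.compile(r"([!?؟])\1")     # doubled marks such as "!!" or "??"


def genre_features(sentence: str) -> Dict[str, float]:
    """Hypothetical, language-independent surface cues for one sentence."""
    tokens = TOKEN_RE.findall(sentence)
    n = len(tokens) or 1                   # avoid division by zero on empty input
    return {
        "n_tokens": float(len(tokens)),
        "punct_ratio": sum(t in PUNCT for t in tokens) / n,
        "digit_ratio": sum(t.isdigit() for t in tokens) / n,
        "has_question": float("?" in sentence or "؟" in sentence),
        "repeated_punct": float(REPEATED_RE.search(sentence) is not None),
    }


if __name__ == "__main__":
    # English glosses of two Gen&Topic examples (news vs. user comment).
    examples = {
        "news": "The 12 contestants competed during a May 3rd Prime.",
        "comment": "You allowed Barwas to represent Iraq while she sings in Kurdish!!!",
    }
    for label, text in examples.items():
        print(label, genre_features(text))
```

On these two glosses the comment stands out through its repeated exclamation marks and higher punctuation ratio, while the newswire sentence contains a number. Per-sentence vectors like these could, under further assumptions, be aggregated per document and compared across training and test data; how the paper itself uses its features is not shown in this part of the deck.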

  7. Genre adaptation: the task
     ✤ Arabic-English phrase-based SMT
     ✤ Two multi-genre evaluation sets:
        ✦ Gen&Topic*: • newswire (NW) • comments (UG)
        ✦ NIST: • newswire (NW) • weblogs (UG)
     ✤ Translation model adaptation
     * ilps.science.uva.nl/resources/gen-topic/
