Translation Model Adaptation Using Genre-Revealing Text Features
Marlies van der Wees, Arianna Bisazza, Christof Monz

Domain adaptation for SMT
✤ Prioritize translation candidates that are most relevant to a specific task
[Figure: heterogeneous training data and a specific translation task, each shown with the same example phrase table]

    source      target           p(f|e)   p(e|f)   …
    الحمد &     praise be to     0.1      0.2      …
    الحمد &     praise for       0.2      0.2      …
    الحمد &     thank            0.1      0.2      …
    حبيبتي +    my dear          0.2      0.1      …
    حبيبتي +    my love          0.2      0.1      …
    حبيبتي +    my sweetheart    0.1      0.1      …

✤ What type of domain information to use?
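
One common way to prioritize task-relevant candidates is to interpolate general phrase-table scores with scores estimated on task-specific data. The sketch below only illustrates that idea and is not the method presented in this talk. The general p(e|f) values are taken from this slide's phrase table; the in-domain values, the weight `lam`, and the function name `interpolate` are invented for the example.

```python
# General-domain p(e|f) for one source phrase, as listed in the slide's phrase table.
general = {"my dear": 0.1, "my love": 0.1, "my sweetheart": 0.1}

# Hypothetical in-domain estimates (made up for illustration only).
in_domain = {"my dear": 0.6, "my love": 0.3, "my sweetheart": 0.1}


def interpolate(general, in_domain, lam=0.5):
    """Linear interpolation: lam * in_domain + (1 - lam) * general."""
    candidates = set(general) | set(in_domain)
    return {
        e: lam * in_domain.get(e, 0.0) + (1.0 - lam) * general.get(e, 0.0)
        for e in candidates
    }


adapted = interpolate(general, in_domain, lam=0.5)

# Candidates ordered by adapted score, best first.
ranked = sorted(adapted, key=adapted.get, reverse=True)
```

With these invented numbers the three candidates, which are tied in the general table, become separable after interpolation. The choice of `lam` controls how strongly the task-specific estimates override the general ones.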

Dimensions of domains
✤ Topic refers to general subject
  ✦ politics, sports, tennis
✤ Genre refers to function, style, text type
  ✦ editorials, newswire, user-generated text
  ✦ orthogonal to topic
✤ Provenance refers to document’s origin
  ✦ LDC2005T13, Europarl, EMEA

The problem with provenance
Provenance information has proven useful for adaptation in SMT, but is it the best representation of a domain?
✤ It’s not an intrinsic text property
✤ We might need manual labeling
  ✦ labor-intensive
  ✦ arbitrary
✤ Often combines particular topic and genre

Disentangling topic and genre in SMT*
✤ Experiments on controlled test set: Gen&Topic
    Culture
      News:    The 12 contestants competed during a May 3rd Prime.
      Comment: You allowed Barwas to represent Iraq while she sings in Kurdish!!!
    Economy
      News:    Yemen is mulling the establishment of 13 industrial zones.
      Comment: What development in Yemen are you talking about?
✤ Genre has larger impact on SMT than topic
✤ We want to adapt to different genres in a test corpus!
* Van der Wees et al., 2015

What information to use for adaptation?
✤ Provenance information
  ✦ manual grouping of sub-corpora
✤ Topic information
  ✦ unsupervised LDA-inferred topics
✤ Genre information
  ✦ determine and exploit intrinsic genre-revealing text features
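
What "intrinsic genre-revealing text features" could look like can be sketched with a few surface counts. The features below (exclamation marks, question marks, second-person pronouns, token count) and the function name `surface_features` are assumptions chosen for illustration; the slides up to this point do not specify the actual feature set. The two inputs are the newswire and comment examples from the Gen&Topic slide.

```python
import re


def surface_features(text):
    """Count a few simple surface cues that may differ between genres."""
    tokens = re.findall(r"\w+", text.lower())
    return {
        "exclamations": text.count("!"),
        "questions": text.count("?"),
        "second_person": sum(t in {"you", "your", "yours"} for t in tokens),
        "tokens": len(tokens),
    }


newswire = "Yemen is mulling the establishment of 13 industrial zones."
comment = "You allowed Barwas to represent Iraq while she sings in Kurdish!!!"

nw = surface_features(newswire)
ug = surface_features(comment)
```

Features of this kind are computed from the text itself, so they need neither provenance labels nor manual grouping of sub-corpora.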

Genre adaptation: the task
✤ Arabic-English phrase-based SMT
✤ Two multi-genre evaluation sets:
  ✦ Gen&Topic*:
    • newswire (NW)
    • comments (UG)
  ✦ NIST:
    • newswire (NW)
    • weblogs (UG)
✤ Translation model adaptation
* ilps.science.uva.nl/resources/gen-topic/