Using Log-linear Models for Tuning Machine Translation Output
Michael Carl, IAI
LREC 2008
Overview
• METIS architecture (described in session P28, Friday, 14:40)
• Statistical MT using:
  – shallow linguistic resources (SL analysis, mapping, re-ordering)
  – hand-made dictionaries (assign weights)
  – generation of (partial) translations and filtering
  – a huge TL corpus (n-gram TL models)
• Feature functions
• Evaluation: test sets and results
• Conclusion: best results with lemmatisation, tagging, and lexical weights
Overview of the System
SL Sentence → SL Analysis → Dictionary Look-up → 'Expander' → Search Engine → Token Generation → TL Sentence
(source language model: SL Analysis; translation model: Dictionary Look-up / 'Expander'; target language model: Search Engine)
AND/OR Graph for SL: "Hans kommt nicht"
• {lu=Hans, c=noun, wnr=1, ...}
  – @{c=noun} → {lu=hans, c=NP0}
• {lu=nicht, c=adv, wnr=3, ...}
  – @{c=verb} → {lu=do, c=VDZ}, {lu=not, c=XX0}
  – {c=adv} → {lu=not, c=XX0}
• {lu=kommen, c=verb, wnr=2, ...}
  – @{c=verb} → {lu=come, c=VVB;VVZ}
  – {c=verb} → {lu=come, c=VVB;VVZ}, {lu=along, c=AVP}
  – {c=verb} → {lu=come, c=VVB;VVZ}, {lu=off, c=AVP}
  – {c=verb} → {lu=come, c=VVB;VVZ}, {lu=up, c=AVP}
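As a rough illustration (not the METIS-internal format), the graph above can be read as an OR over translation alternatives per SL node, with each alternative an AND over TL word specifications (lemma plus CLAWS5 tag). The following Python sketch, with assumed data-structure names, enumerates the candidate combinations the search engine would score:

```python
from itertools import product

# Minimal sketch of the AND/OR graph above as nested Python data
# (names and layout are illustrative, not the METIS-internal format).
# Each SL node is an OR over translation alternatives; each alternative
# is an AND over TL word specifications (lemma, CLAWS5 tag).
sl_graph = [
    {"lu": "Hans", "c": "noun", "wnr": 1,
     "alts": [[("hans", "NP0")]]},
    {"lu": "kommen", "c": "verb", "wnr": 2,
     "alts": [[("come", "VVB;VVZ")],
              [("come", "VVB;VVZ"), ("along", "AVP")],
              [("come", "VVB;VVZ"), ("off", "AVP")],
              [("come", "VVB;VVZ"), ("up", "AVP")]]},
    {"lu": "nicht", "c": "adv", "wnr": 3,
     "alts": [[("do", "VDZ"), ("not", "XX0")],
              [("not", "XX0")]]},
]

# Enumerating all AND-combinations yields the candidate (partial)
# translations that the search engine scores.
for combo in product(*(node["alts"] for node in sl_graph)):
    print([word for alt in combo for word in alt])
```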
Types of Feature Functions
• Source features:
  – probabilities of dependencies in SL representations (parse tree / dictionary matching)
• Channel features:
  – SL-to-TL alignment and lexical translation probabilities
  – lexical translation weights
• Target features:
  – probabilities of the TL sentence (n-gram language models)
  – n-gram token, lemma, and tag models
  – lemma-tag co-occurrence weights
Log-linear Feature Functions
• A set of specified feature functions h_m that describe properties of the data
• An associated set of learned weights w_m that determine the contribution of each feature
• ê = argmax_e Σ_m w_m · h_m(e)
• Find weights that allow a search procedure (argmax) to find the target sentence ê with the highest probability
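A minimal sketch of this decision rule, with toy feature functions and weights standing in for the real models:

```python
import math

# Minimal sketch of the log-linear decision rule
#   ê = argmax_e  Σ_m  w_m · h_m(e)
# The feature functions and weights below are illustrative placeholders,
# not the models used in the paper.

def log_linear_score(candidate, features, weights):
    """Weighted sum of feature values for one candidate translation."""
    return sum(w * h(candidate) for h, w in zip(features, weights))

def argmax_translation(candidates, features, weights):
    """Return the candidate with the highest log-linear score."""
    return max(candidates, key=lambda e: log_linear_score(e, features, weights))

# Toy feature functions (stand-ins for LM probability, lexical weight, ...):
features = [
    lambda e: -len(e.split()),                       # brevity preference
    lambda e: math.log(1.0 / (1 + e.count("the"))),  # dummy lexical feature
]
weights = [0.3, 0.7]

print(argmax_translation(["hans does not come", "hans comes not"],
                         features, weights))
```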
Lexical Feature Function
Train L(g ⇒ e) on 10,000 aligned EUROPARL sentences:
  L(g ⇒ e) = h(g ⇔ e) / (Σ_e' h(g ⇔ e') + n(g ⇒ e))
• hit h(g ⇔ e): g occurs in the SL side and e in the TL side
• noise n(g ⇒ e): g occurs in the SL side but e has no realisation in the TL side
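A hedged sketch of how such weights could be estimated from lemmatised sentence pairs, following the hit/noise definitions above; the data structures and the dictionary format are assumptions, not the original implementation:

```python
from collections import Counter

# Illustrative estimation of the lexical weight L(g => e) from lemmatised,
# sentence-aligned pairs. `dictionary` maps an SL lemma g to its candidate
# TL lemmas e; counting follows the hit/noise definitions on the slide.
def lexical_weights(pairs, dictionary):
    hits, noise = Counter(), Counter()
    for sl_lemmas, tl_lemmas in pairs:
        tl_set = set(tl_lemmas)
        for g in set(sl_lemmas):
            for e in dictionary.get(g, []):
                if e in tl_set:
                    hits[(g, e)] += 1    # g and e co-occur: a hit
                else:
                    noise[(g, e)] += 1   # g present, e unrealised: noise
    weights = {}
    for (g, e), h in hits.items():
        total_hits = sum(hits[(g, e2)] for e2 in dictionary[g])
        weights[(g, e)] = h / (total_hits + noise[(g, e)])
    return weights

# Toy example:
pairs = [(["hans", "kommen", "nicht"], ["hans", "do", "not", "come"])]
dictionary = {"kommen": ["come", "arrive"]}
print(lexical_weights(pairs, dictionary))  # {('kommen', 'come'): 1.0}
```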
Lemma-Tag Co-occurrence Weights
T(lem, tag) = (C(lem, tag) + 1) / (N_L + C(lem))
– N_L: number of distinct CLAWS5 tags (~70)
– C(lem): number of occurrences of lem in the BNC
– C(lem, tag): number of co-occurrences of lem and tag
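The formula is add-one smoothing over the tag inventory; a minimal sketch with toy BNC-style counts:

```python
# Sketch of the add-one-smoothed lemma-tag weight defined above;
# the count tables would come from the BNC, here they are toy values.

N_L = 70  # approximate number of distinct CLAWS5 tags

def lemma_tag_weight(c_lem_tag, c_lem, n_tags=N_L):
    """T(lem, tag) = (C(lem, tag) + 1) / (N_L + C(lem))."""
    return (c_lem_tag + 1) / (n_tags + c_lem)

# e.g. a lemma seen 1000 times in the BNC, 700 of them with a given tag:
print(lemma_tag_weight(c_lem_tag=700, c_lem=1000))  # ~0.655
```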
Statistical Language Models
• SRILM toolkit: n-gram language models trained on the BNC
  – 20K, 100K, 1M, and 2M sentences
• Lemma n-gram language models: n = {3, 4, 5}
• Tag m-gram language models: m = {3, 4, 5, 6, 7}
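For illustration only, a toy maximum-likelihood n-gram model showing the kind of log-probability such models supply; SRILM itself applies proper smoothing and back-off, which this sketch omits:

```python
import math
from collections import Counter

# Toy MLE n-gram model (no smoothing); illustrative stand-in for the
# SRILM-trained models used in the paper.
def train_ngram(corpus, n):
    grams, hist = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>"] * (n - 1) + sent + ["</s>"]
        for i in range(len(toks) - n + 1):
            grams[tuple(toks[i:i + n])] += 1
            hist[tuple(toks[i:i + n - 1])] += 1
    return grams, hist

def logprob(sent, grams, hist, n):
    toks = ["<s>"] * (n - 1) + sent + ["</s>"]
    lp = 0.0
    for i in range(len(toks) - n + 1):
        g = tuple(toks[i:i + n])
        if grams[g] == 0:
            return float("-inf")  # unseen n-gram: no smoothing here
        lp += math.log(grams[g] / hist[g[:-1]])
    return lp

corpus = [["hans", "does", "not", "come"], ["hans", "comes", "along"]]
grams, hist = train_ngram(corpus, n=3)
print(logprob(["hans", "does", "not", "come"], grams, hist, n=3))
```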
Two Evaluation Test Sets (German ⇒ English)
• A 200-sentence test corpus covering:
  – lexical translation problems: separable prefixes, fixed verb constructions, degrees of adjectives and adverbs, lexical ambiguities, and others
  – syntactic translation problems: pronominalisation, determination, word order, different complementation, relative clauses, tense/aspect, etc.
• 200 sentences selected from the EUROPARL corpus (extracted from the STAT-MT website)
  – between 2 and 32 words in length (on each language side)
Evaluation
• Start with one feature function (n-gram lemma/token model)
• Incrementally add feature functions:
  – n-gram CLAWS5 tag model
  – m-gram lemma model
  – lemma-tag co-occurrence weights
  – lexical translation weights
• Experimentally assign weights (see the sketch below)
• Evaluate with BLEU
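A hedged sketch of this tuning loop: grid-search the feature weights, decode the test set with each setting, and keep the BLEU-best combination. Here decode() is a hypothetical hook into the search engine, and the sacrebleu package is assumed as a stand-in scorer; the paper's exact tooling isn't stated:

```python
from itertools import product
import sacrebleu  # assumed stand-in for the BLEU scorer used in the paper

# Grid-search over feature weights; keep the setting with the best BLEU.
def tune_weights(test_sents, references, decode, n_features,
                 grid=(0.0, 0.5, 1.0)):
    best_bleu, best_w = -1.0, None
    for w in product(grid, repeat=n_features):
        hyps = [decode(s, w) for s in test_sents]
        bleu = sacrebleu.corpus_bleu(hyps, [references]).score
        if bleu > best_bleu:
            best_bleu, best_w = bleu, w
    return best_w, best_bleu

# Toy decode(): picks the candidate with the highest weighted feature sum.
candidates = {"src1": ["hans not comes", "hans does not come"]}
feats = [lambda e: -len(e.split()),                   # brevity
         lambda e: 1.0 if "does not" in e else 0.0]   # toy fluency cue

def decode(src, w):
    return max(candidates[src],
               key=lambda e: sum(wi * f(e) for wi, f in zip(w, feats)))

refs = ["hans does not come"]
print(tune_weights(["src1"], refs, decode, n_features=2))
```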
[Figure: BLEU evaluation of the 200 test sentences using token, lemma, and tag language models]
[Figure: BLEU evaluation of the 200 EUROPARL sentences using token, lemma, and tag language models]
[Figure: BLEU evaluation of the 200 test sentences with added lexical (Lex) and token-tag co-occurrence (TTF) models]
[Figure: BLEU evaluation of the 200 EUROPARL sentences with added lexical (Lex) and token-tag co-occurrence (TTF) models]
Conclusion
• Lemma-based models are better than token-based models:
  – increasing the size of the training material for lemma models yields better results than increasing the order of the n-gram models
• Adding a tag model improves the output in every case:
  – larger values of n (in our case n = 5) may be an easier way to improve performance than increasing the size of the training set
• The token-tag co-occurrence feature function does not help
• Lexical weights are suitable if the training material is similar to the texts to be translated