STATISTICAL MACHINE TRANSLATION
noisy channel model, word alignment, phrase-based translation

Reading and sources:
• Jurafsky, D. and Martin, J. H. (2009). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition. Second Edition. Pearson: New Jersey. Chapter 25.
• Koehn, P. (2009). Statistical Machine Translation. Cambridge University Press.
• Material from Bonnie Dorr's lecture
• Material from Kevin Knight's lecture at Berkeley, 2004
Rule-based vs. Statistical Machine Translation (MT)

Rule-based MT:
• Hand-written transfer rules
• Rules can be based on lexical or structural transfer
• Pro: firm grip on complex translation phenomena
• Con: often very labor-intensive, lack of robustness

Statistical MT:
• Mainly word- or phrase-based translations
• Translations are learned from actual data
• Pro: translations are learned automatically
• Con: difficult to model complex translation phenomena

Neural MT: the most recent paradigm (the state of the art as of now).
The Machine Translation Pyramid

[Figure: the MT pyramid. Analysis climbs from source words through source syntax to source meaning and, at the top, an interlingua; generation descends through target meaning and target syntax to target words. Transfer at the word level is typical of statistical MT; transfer at the syntax/meaning levels is typical of rule-based MT.]
Parallel Corpus: Training resource for MT

Most popular:
• EuroParl: European Parliament protocols, in 11 languages
• Hansards: Canadian Parliament protocols, in French and English
• Software manuals (KDE, OpenOffice, …)
• Parallel webpages

For the remainder, we assume that we have a sentence-aligned parallel corpus:
• there are methods to get to aligned sentences from aligned documents
• there are methods to extract parallel sentences from comparable corpora

[Image: the Rosetta Stone (196 BC), inscribed in Greek, Egyptian hieroglyphic, and Demotic — an early parallel corpus.]
Fun bits

Early results from translating English into Russian and back to English:
• "The spirit is willing but the flesh is weak" → "The vodka is good but the meat is rotten"
• "Out of sight, out of mind" → "Invisible idiot"
Why is machine translation hard?
• Languages are structurally very different:
  – Word order
  – Syntax (e.g. SVO vs. SOV vs. VSO languages)
  – Lexical level: words and alphabets are different
  – Agglutination, …
Why is machine translation hard?

[Figure: the complex overlap between English leg, foot, etc. and various French translations like patte.]
Why is machine translation hard?
• Complex reorderings may be needed:
  – German often puts adverbs in initial position that English would put later.
  – German tensed verbs often occur in second position, causing the subject and verb to be inverted.
RULE-BASED SYNTACTIC TRANSFER APPROACH

[Figures: example transfer rules for English → Spanish and English → Japanese.]
Interlingua

[Figure: interlingual representation of "Mary did not slap the green witch".]
Statistical machine translation
Computing Translation Probabilities

• Imagine that we want to translate from French (f) into English (e).
• Given a parallel corpus we can estimate P(e|f). The maximum likelihood estimate of P(e|f) is: freq(e,f) / freq(f)
• Way too specific to get any reasonable frequencies when done on the basis of whole sentences; the vast majority of unseen data will have zero counts
• P(e|f) could be re-defined at the word level as:

  P(e|f) = ∏_i max_j P(e_i | f_j)

• Problem: the English words maximizing P(e|f) might not result in a readable sentence
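To make the MLE concrete, here is a minimal sketch in Python. It assumes we already have word-aligned (f, e) pairs extracted from a parallel corpus; the toy pairs, counts, and the `translate` helper are invented for illustration:

```python
from collections import Counter

# Toy word-aligned pairs (French word, English word), invented for illustration.
aligned_pairs = [("maison", "house"), ("maison", "home"),
                 ("maison", "house"), ("chat", "cat"), ("bleu", "blue")]

pair_counts = Counter(aligned_pairs)             # freq(e, f), keyed as (f, e)
f_counts = Counter(f for f, _ in aligned_pairs)  # freq(f)

def p_e_given_f(e, f):
    """MLE: P(e|f) = freq(e, f) / freq(f)."""
    return pair_counts[(f, e)] / f_counts[f]

def translate(french_words):
    """Pick the most likely English word for each French word independently."""
    return [max((e for (f, e) in pair_counts if f == fw),
                key=lambda e: p_e_given_f(e, fw))
            for fw in french_words]

print(p_e_given_f("house", "maison"))  # 2/3 ≈ 0.667
print(translate(["chat", "bleu"]))     # ['cat', 'blue']
```

Picking each word's most likely translation independently is exactly the ∏_i max_j estimate above, and it exhibits the stated problem: nothing forces the output to be a fluent sentence.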
Computing Translation Probabilities

• We can account for adequacy: each foreign word translates into its most likely English word
• We cannot guarantee that this will result in a fluent English sentence
• Solution: transform P(e|f) with Bayes' rule:

  P(e|f) = P(f|e) · P(e) / P(f)

• P(f|e) accounts for adequacy
• P(e) accounts for fluency
Statistical Machine Translation (SMT): The noisy channel model

• SMT as a function, e.g. of French (f) → English (e)
• French is, in fact, English that was garbled by a noisy channel.

  argmax_e P(e|f) = argmax_e P(f|e) · P(e) / P(f) = argmax_e P(f|e) · P(e)

(P(f) is constant for a given input sentence, so it can be dropped from the argmax.)
Three Problems for Statistical MT

• Language model
  – Given a target language string e, assigns P(e)
  – good target language string → high P(e)
  – random word sequence → low P(e)
• Translation model
  – Given a pair of strings <f,e>, assigns P(f|e)
  – <f,e> look like translations → high P(f|e)
  – <f,e> don't look like translations → low P(f|e)
• Decoding algorithm
  – Given a language model, a translation model, and a new sentence f: find the translation e maximizing P(e) · P(f|e) (see the sketch below)
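The interaction of the three components can be illustrated with a toy noisy-channel scorer. The candidate list and all probabilities below are invented; a real decoder searches an enormous space of candidate translations rather than ranking a short given list:

```python
import math

# Toy model tables, invented for illustration.
lm = {"the house is small": 0.01,
      "the home is small": 0.002}                        # language model P(e)
tm = {("das Haus ist klein", "the house is small"): 0.20,
      ("das Haus ist klein", "the home is small"): 0.25} # translation model P(f|e)

def decode(f, candidates):
    # argmax_e P(f|e) * P(e); log space avoids underflow on long sentences
    return max(candidates, key=lambda e: math.log(tm[(f, e)]) + math.log(lm[e]))

print(decode("das Haus ist klein",
             ["the house is small", "the home is small"]))
# -> 'the house is small': the language model prefers it strongly enough
#    to outweigh the slightly better translation-model score of the rival.
```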
Language Modeling: P(e)

• Determine the probability of an English sequence P(e)
• Can use n-gram models, PCFG-based models, etc.: anything that assigns a probability to a sequence
• Standard: n-gram model

  P(e) = P(e_1) P(e_2 | e_1) ∏_{i=3}^{l} P(e_i | e_{i−1} … e_{i−n+1})

• The language model picks the most fluent translation out of many possible translations
• The language model can be estimated from a large monolingual corpus (a minimal sketch follows)
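Below is a minimal bigram (n = 2) instance of this estimation, trained on a two-sentence toy corpus invented for illustration; real language models use far larger corpora and smoothing for unseen n-grams:

```python
from collections import Counter

# Toy monolingual corpus with sentence-boundary markers.
corpus = ["<s> the house is small </s>",
          "<s> the house is big </s>"]

unigrams, bigrams = Counter(), Counter()
for sent in corpus:
    toks = sent.split()
    unigrams.update(toks)
    bigrams.update(zip(toks, toks[1:]))

def p_bigram(w, prev):
    """MLE estimate P(w | prev) = count(prev, w) / count(prev); 0 if unseen (no smoothing)."""
    return bigrams[(prev, w)] / unigrams[prev]

def p_sentence(sentence):
    toks = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, w in zip(toks, toks[1:]):
        p *= p_bigram(w, prev)
    return p

print(p_sentence("the house is small"))  # 0.5: 'is' is followed by 'small' in half the training sentences
```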
Translation Modeling: P(f|e)

• Determines the probability that the foreign word f_j is a translation of the English word e_i
• How to compute P(f_j | e_i) from a parallel corpus? We need to align words with their translations
• Statistical approaches rely on the co-occurrence of e_i and f_j in the parallel data: if e_i and f_j tend to co-occur in parallel sentence pairs, they are likely to be translations of one another
• Commonly, four factors are used:
  – translation: How often do e_i and f_j co-occur?
  – distortion: How likely is a word occurring at position x to translate into a word occurring at position y? For example, German places tensed verbs in second position in main clauses but in final position in subordinate clauses, while English keeps SVO order
  – fertility: How likely is e_i to translate into more than one word? For example, "defeated" can translate into "eine Niederlage erleiden"
  – null translation: How likely is a foreign word to be spuriously generated?
Sentence alignment
Word Alignment
Word Alignment

A = 2, 3, 4, 5, 6, 6, 6

Read as an alignment vector: the j-th foreign word is aligned to the English word at position A_j; here, the English word at position 6 generates three foreign words.
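A small sketch of how such an alignment vector is read; the foreign words are placeholder symbols, since the slide's actual sentence pair is in the (missing) figure:

```python
A = [2, 3, 4, 5, 6, 6, 6]  # A[j-1] = English position aligned to foreign position j
foreign = ["f1", "f2", "f3", "f4", "f5", "f6", "f7"]  # placeholder foreign words

for j, i in enumerate(A, start=1):
    print(f"foreign position {j} ({foreign[j - 1]}) -> English position {i}")

# English position 6 generates three foreign words, i.e. it has fertility 3.
```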
IBM Models 1-5 (Brown et al., 1993)

Brown, P. F., Della Pietra, V. J., Della Pietra, S. A., & Mercer, R. L. (1993). The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2), 263-311.

• Model 1: lexical translation
  – Bag of words
  – Unique local maximum
  – Efficient EM algorithm
• Model 2: adds an absolute alignment model: a(e_pos | f_pos, e_length, f_length)
• Model 3: adds a fertility model: n(k|e)
  – No full EM, count only neighbors (Models 3-5)
  – Deficient, i.e. "leaky": probability mass can go to impossible outputs (Models 3-4)
• Model 4: adds a relative alignment model
  – Relative distortion
  – Word classes
• Model 5: fixes deficiency
  – Extra variables to avoid leakiness
IBM Models

• Given an English sentence e_1 … e_l and a foreign sentence f_1 … f_m
• We want to find the 'best' alignment a, where a is a set of pairs of the form {(i, j), …, (i', j')}, with 0 ≤ i, i' ≤ l and 1 ≤ j, j' ≤ m
• Note that if (i, j) and (i', j) are both in a, then i equals i', i.e. each foreign word is aligned to at most one English word: no many-to-one alignments are allowed
• We add a spurious NULL word to the English sentence at position 0
• In total there are (l+1)^m different alignments A
• Allowing for many-to-many alignments results in (2^l)^m possible alignments A (a worked count follows)
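As a quick sanity check on these counts, with example sentence lengths chosen arbitrarily:

```python
l, m = 6, 7           # English length l, foreign length m (example values)
print((l + 1) ** m)   # 823543: each foreign word picks one of the l+1 English positions (incl. NULL)
print((2 ** l) ** m)  # 4398046511104: each foreign word may link to any subset of the l English words
```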
Translation steps in IBM models: generative view
IBM Model 1

• Simplest of the IBM models
• Does not model one-to-many alignments
• Computationally inexpensive
• Useful for parameter estimates that are passed on to more elaborate models
IBM Model 1: generative story
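The generative story leads directly to EM training of the lexical translation table t(f|e). Below is a compact sketch following the standard IBM Model 1 algorithm (as in Koehn, 2009); the three-sentence corpus and the fixed number of iterations are invented for illustration, and the NULL word is omitted for brevity:

```python
from collections import defaultdict

# Toy sentence-aligned corpus: (foreign sentence, English sentence) pairs.
corpus = [(["das", "Haus"], ["the", "house"]),
          (["das", "Buch"], ["the", "book"]),
          (["ein", "Buch"], ["a", "book"])]

# Initialize t(f|e) uniformly over the English vocabulary.
e_vocab = {e for _, es in corpus for e in es}
t = defaultdict(lambda: 1.0 / len(e_vocab))

for _ in range(10):
    count = defaultdict(float)  # expected counts c(f, e)
    total = defaultdict(float)  # expected counts c(e)
    for fs, es in corpus:
        for f in fs:
            # E-step: distribute f's count over its candidate English
            # generators, proportional to the current t(f|e).
            z = sum(t[(f, e)] for e in es)
            for e in es:
                delta = t[(f, e)] / z
                count[(f, e)] += delta
                total[e] += delta
    # M-step: re-estimate t(f|e) from the expected counts.
    for (f, e), c in count.items():
        t[(f, e)] = c / total[e]

print(round(t[("Haus", "house")], 3))  # -> close to 1.0
```

After a few iterations the probability mass concentrates on the consistent pairs: "das" is pulled towards "the" by appearing with both "house" and "book" sentences, which in turn pushes t(Haus|house) towards 1.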