1. Statistical machine translation in a few slides
Mikel L. Forcada 1,2
1 Departament de Llenguatges i Sistemes Informàtics, Universitat d’Alacant, E-03071 Alacant (Spain)
2 Prompsit Language Engineering, S.L., E-03690 St. Vicent del Raspeig (Spain)
April 14-16, 2009: Free/open-source MT tutorial at the CNGL

2. Translation as probability/1
◮ Instead of saying that
  ◮ a source-language (SL) sentence s in an SL text
  ◮ and a target-language (TL) sentence t as found in an SL–TL bitext
  are or are not a translation of each other,
◮ in SMT one says that they are a translation of each other with a probability p(s, t) = p(t, s) (a joint probability).
◮ We’ll assume we have such a probability model available, or at least a reasonable estimate of it.

3. Translation as probability/2
◮ According to basic probability laws, we can write:
  p(s, t) = p(t, s) = p(s|t) p(t) = p(t|s) p(s)   (1)
  where p(x|y) is the conditional probability of x given y.
◮ We are interested in translating from SL to TL. That is, we want to find the most likely translation given the SL sentence s:
  t⋆ = arg max_t p(t|s)   (2)

4. The “canonical” model
◮ We can rewrite eq. (1) as
  p(t|s) = p(s|t) p(t) / p(s)   (3)
◮ and then combine it with (2) to get
  t⋆ = arg max_t p(s|t) p(t)   (4)
  (p(s) is constant for a given s, so it does not affect the arg max.)
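As a toy illustration of eq. (4), one can score an explicit list of candidate translations with invented model probabilities (a real decoder never enumerates candidates like this; all sentences and values below are made up):

```python
import math

# Toy reverse translation model p(s|t) and target-language model p(t).
# All sentences and probabilities are invented for illustration.
def reverse_tm(s, t):
    table = {
        ("la casa blanca", "the white house"): 0.6,
        ("la casa blanca", "the house white"): 0.3,
        ("la casa blanca", "white house the"): 0.1,
    }
    return table.get((s, t), 1e-9)

def lm(t):
    table = {
        "the white house": 0.5,
        "the house white": 0.05,
        "white house the": 0.001,
    }
    return table.get(t, 1e-9)

def best_translation(s, candidates):
    # Eq. (4): t* = arg max_t p(s|t) p(t), computed in log space for stability.
    return max(candidates, key=lambda t: math.log(reverse_tm(s, t)) + math.log(lm(t)))

print(best_translation("la casa blanca",
                       ["the white house", "the house white", "white house the"]))
# -> "the white house"
```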

5. “Decoding”/1
  t⋆ = arg max_t p(s|t) p(t)
◮ We have a product of two probability models:
  ◮ a reverse translation model p(s|t), which tells us how likely it is that the SL sentence s is a translation of the candidate TL sentence t, and
  ◮ a target-language model p(t), which tells us how likely the sentence t is in the TL side of bitexts.
◮ These may be related (respectively) to the usual notions of
  ◮ [reverse] adequacy: how much of the meaning of t is conveyed by s;
  ◮ fluency: how fluent the candidate TL sentence is.
◮ The arg max strikes a balance between the two.

6. “Decoding”/2
◮ In SMT parlance, the process of finding t⋆ is called decoding.¹
◮ Obviously, the decoder does not explore all possible translations t in the search space: there are infinitely many.
◮ The search space is pruned.
◮ Therefore, one just gets a reasonable approximation to t⋆ instead of the ideal t⋆.
◮ Pruning and search strategies are a very active research topic. Free/open-source software: Moses.
¹ Reading SMT articles usually entails deciphering jargon which may be very obscure to outsiders or newcomers.
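A minimal sketch of the pruned-search idea, assuming a tiny invented phrase table, a stand-in language-model score, and monotone (left-to-right, no reordering) decoding; real decoders such as Moses handle reordering, richer models and much larger search spaces:

```python
from heapq import nlargest

# Toy phrase table: SL phrase -> list of (TL phrase, log p(SL phrase | TL phrase)).
# All entries and scores are invented for illustration.
PHRASE_TABLE = {
    ("la",): [("the", -0.1)],
    ("casa",): [("house", -0.2), ("home", -0.9)],
    ("casa", "blanca"): [("white house", -0.3)],
    ("blanca",): [("white", -0.2)],
}

def lm_logprob(words):
    # Stand-in for a real target-language model score (e.g. from irstlm).
    return -0.5 * len(words)

def decode(src_words, beam_size=3):
    n = len(src_words)
    # stacks[i] holds hypotheses covering exactly the first i source words:
    # (log score, tuple of TL words produced so far).
    stacks = [[] for _ in range(n + 1)]
    stacks[0] = [(0.0, ())]
    for covered in range(n):
        # Pruning: only the `beam_size` best hypotheses in a stack are expanded.
        for score, out in nlargest(beam_size, stacks[covered]):
            for end in range(covered + 1, n + 1):
                src_phrase = tuple(src_words[covered:end])
                for tgt, tm_logp in PHRASE_TABLE.get(src_phrase, []):
                    new_words = tuple(tgt.split())
                    # Add the translation-model score and the LM score of the
                    # newly produced TL words.
                    new_score = score + tm_logp + lm_logprob(new_words)
                    stacks[end].append((new_score, out + new_words))
    if not stacks[n]:
        return None  # untranslatable under this toy phrase table
    return " ".join(max(stacks[n])[1])

print(decode("la casa blanca".split()))  # -> "the white house"
```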

7. Training/1
◮ So where do these probabilities come from?
◮ p(t) may easily be estimated from a large monolingual TL corpus (free/open-source software: irstlm).
◮ The estimation of p(s|t) is more complex. It is usually made of:
  ◮ a lexical model describing the probability that the translation of a certain TL word or sequence of words (“phrase”²) is a certain SL word or sequence of words;
  ◮ an alignment model describing the reordering of words or “phrases”.
² A very unfortunate choice in SMT jargon.
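A minimal sketch of how p(t) can be estimated, assuming a tiny invented TL corpus and a bigram model with add-one smoothing; toolkits such as irstlm use far larger corpora and better smoothing schemes:

```python
import math
from collections import Counter

def train_bigram_lm(sentences):
    # Count unigrams and bigrams over a monolingual TL corpus,
    # with sentence-boundary markers.
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        words = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(words[:-1])
        bigrams.update(zip(words[:-1], words[1:]))
    vocab_size = len({w for s in sentences for w in s.split()}) + 1  # + </s>

    def logprob(sentence):
        words = ["<s>"] + sentence.split() + ["</s>"]
        # Add-one smoothing; real toolkits use better smoothing.
        return sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
                   for a, b in zip(words[:-1], words[1:]))
    return logprob

lm = train_bigram_lm(["the white house", "the house is white", "the house is big"])
print(lm("the white house"), lm("house the white"))  # the fluent order scores higher
```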

8. Training/2
◮ The lexical model and the alignment model are estimated from a large sentence-aligned bilingual corpus through a complex iterative process.
◮ An initial set of lexical probabilities is obtained by assuming, for instance, that any word in the TL sentence aligns with any word in its SL counterpart. And then:
  ◮ alignment probabilities are computed in accordance with the lexical probabilities;
  ◮ lexical probabilities are re-estimated in accordance with the alignment probabilities.
  This process (“expectation maximization”) is repeated a fixed number of times or until some convergence is observed (free/open-source software: Giza++).
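A minimal sketch of this expectation-maximization idea in the style of IBM Model 1, on an invented three-sentence bitext; Giza++ implements much richer alignment models over millions of sentence pairs:

```python
from collections import defaultdict

# Toy sentence-aligned bitext, invented for illustration.
BITEXT = [
    ("la casa".split(), "the house".split()),
    ("la casa blanca".split(), "the white house".split()),
    ("casa blanca".split(), "white house".split()),
]

def train_model1(bitext, iterations=10):
    # Lexical probabilities lex[(s_word, t_word)] ~ p(s_word | t_word),
    # initialised uniformly: any TL word may translate into any SL word.
    sl_vocab = {w for s_sent, _ in bitext for w in s_sent}
    lex = defaultdict(lambda: 1.0 / len(sl_vocab))
    for _ in range(iterations):
        counts = defaultdict(float)   # expected counts c(s_word, t_word)
        totals = defaultdict(float)   # expected counts c(t_word)
        # E step: distribute each SL word's alignment mass over the TL words
        # of the paired sentence, according to the current lexical model.
        for s_sent, t_sent in bitext:
            for s_w in s_sent:
                norm = sum(lex[(s_w, t_w)] for t_w in t_sent)
                for t_w in t_sent:
                    frac = lex[(s_w, t_w)] / norm
                    counts[(s_w, t_w)] += frac
                    totals[t_w] += frac
        # M step: re-estimate the lexical probabilities from the expected counts.
        for (s_w, t_w), c in counts.items():
            lex[(s_w, t_w)] = c / totals[t_w]
    return lex

lex = train_model1(BITEXT)
# "casa" should come out far more probable given "house" than given "white".
print(round(lex[("casa", "house")], 2), round(lex[("casa", "white")], 2))
```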

9. Training/3
◮ In “phrase-based” SMT, alignments may be used to extract
  ◮ (SL-phrase, TL-phrase) pairs of phrases
  ◮ and their corresponding probabilities
  for easier decoding and to avoid “word salad”.
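A minimal sketch of extracting consistent phrase pairs from one word-aligned sentence pair, with an invented alignment; the Moses training scripts implement the full procedure and also attach probabilities to the extracted pairs:

```python
def extract_phrases(src, tgt, alignment, max_len=3):
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            for j1 in range(len(tgt)):
                for j2 in range(j1, min(j1 + max_len, len(tgt))):
                    links_inside = [(i, j) for i, j in alignment
                                    if i1 <= i <= i2 and j1 <= j <= j2]
                    # Consistency: at least one alignment link inside the box,
                    # and no link leaving the box on either side.
                    consistent = links_inside and all(
                        (i1 <= i <= i2) == (j1 <= j <= j2) for i, j in alignment)
                    if consistent:
                        pairs.add((" ".join(src[i1:i2 + 1]),
                                   " ".join(tgt[j1:j2 + 1])))
    return pairs

src = "la casa blanca".split()          # SL sentence (invented example)
tgt = "the white house".split()         # TL sentence
alignment = [(0, 0), (1, 2), (2, 1)]    # (SL index, TL index) word alignment links
for pair in sorted(extract_phrases(src, tgt, alignment)):
    print(pair)
```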

10. “Log-linear”/1
◮ More SMT jargon!
◮ It’s short for a linear combination of logarithms of probabilities.
◮ And, sometimes, even of features that aren’t logarithms or probabilities of any kind.
◮ OK, let’s take a look at the maths.

11. “Log-linear”/2
◮ One can write a more general formula:
  p(t|s) = exp( Σ_{k=1..n_F} λ_k f_k(t, s) ) / Z   (5)
  with n_F feature functions f_k(t, s), which can depend on s, t or both.
◮ Setting n_F = 2, f_1(t, s) = log p(s|t), f_2(t, s) = log p(t), λ_1 = λ_2 = 1, and Z = p(s), one recovers the canonical formula (3).
◮ The best translation is then
  t⋆ = arg max_t Σ_{k=1..n_F} λ_k f_k(t, s)   (6)
  Most of the f_k(t, s) are logarithms, hence “log-linear”.
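A toy illustration of eq. (6), with invented feature values and weights: two log-probability features plus a length penalty that is not the logarithm of anything:

```python
import math

# Toy feature functions; the probability tables and weights are invented.
def features(t, s):
    tm = {"the white house": 0.6, "the house white": 0.3}   # stand-in for p(s|t)
    lm = {"the white house": 0.5, "the house white": 0.05}  # stand-in for p(t)
    return [
        math.log(tm.get(t, 1e-9)),              # f_1 = log p(s|t)
        math.log(lm.get(t, 1e-9)),              # f_2 = log p(t)
        -abs(len(t.split()) - len(s.split())),  # f_3 = length penalty (not a log prob)
    ]

LAMBDAS = [1.0, 0.8, 0.3]  # λ_k, normally tuned on a held-out tuning set

def loglinear_best(s, candidates):
    # Eq. (6): t* = arg max_t Σ_k λ_k f_k(t, s)
    return max(candidates,
               key=lambda t: sum(l * f for l, f in zip(LAMBDAS, features(t, s))))

print(loglinear_best("la casa blanca", ["the white house", "the house white"]))
```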

12. “Log-linear”/3
◮ “Feature selection is a very open problem in SMT” (Lopez 2008).
◮ Other possible feature functions include length penalties (discouraging unreasonably short or long translations), “inverted” versions of p(s|t), etc.
◮ Where do we get the λ_k’s from?
◮ They are usually tuned so as to optimize the results on a tuning set, according to a certain objective function that
  ◮ is taken to be an indicator that correlates with translation quality, and
  ◮ may be automatically computed from the output of the SMT system and the reference translation in the corpus.
  This is sometimes called MERT (minimum error rate training) (free/open-source software: the Moses suite).
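This is not actual MERT, but a hedged sketch of the tuning idea: try a small grid of λ vectors and keep the one whose output scores best on the tuning set. Here decode_with_weights and quality_score are assumed helpers standing in for the decoder and for an automatic metric such as BLEU:

```python
from itertools import product

def tune(tuning_src, tuning_ref, decode_with_weights, quality_score,
         grid=(0.2, 0.5, 1.0), n_features=3):
    # Brute-force stand-in for MERT: evaluate every weight vector on the grid
    # and keep the one whose translations score best against the references.
    best_lambdas, best_score = None, float("-inf")
    for lambdas in product(grid, repeat=n_features):
        outputs = [decode_with_weights(s, lambdas) for s in tuning_src]
        score = quality_score(outputs, tuning_ref)
        if score > best_score:
            best_lambdas, best_score = lambdas, score
    return best_lambdas
```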

13. Ain’t got nothin’ but the BLEUs?
◮ The most famous “quality indicator” is called BLEU, but there are many others.
◮ BLEU counts which fraction of the 1-word, 2-word, ..., n-word sequences in the output match the reference translation.
◮ Correlation with subjective assessments of quality is still an open question.
◮ A lot of SMT research is currently BLEU-driven and makes little contact with real applications of MT.
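A minimal sketch of the counting idea behind BLEU, restricted to the modified n-gram precision of a single sentence; full BLEU combines the precisions for n = 1..4 geometrically and adds a brevity penalty:

```python
from collections import Counter

def ngram_precision(candidate, reference, n):
    cand = candidate.split()
    ref = reference.split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    # Each candidate n-gram is "clipped" to the number of times it appears
    # in the reference.
    matches = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
    return matches / max(sum(cand_ngrams.values()), 1)

print(ngram_precision("the white house", "the white house", 2))  # 1.0
print(ngram_precision("white the house", "the white house", 2))  # 0.0
```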

14. The SMT lifecycle
Development:
  Training: monolingual and sentence-aligned bilingual corpora are used to estimate probability models (features).
  Tuning: a held-out portion of the sentence-aligned bilingual corpus is used to tune the coefficients λ_k.
Decoding: sentences s are fed into the SMT system and “decoded” into their translations t.
Evaluation: the system is evaluated against a reference corpus.

15. License
This work may be distributed under the terms of
◮ the Creative Commons Attribution–Share Alike license: http://creativecommons.org/licenses/by-sa/3.0/
◮ the GNU GPL v. 3.0 license: http://www.gnu.org/licenses/gpl.html
Dual license! E-mail me to get the sources: mlf@ua.es
