Decoding in Statistical Machine Translation
Christian Hardmeier
2016-05-04

Mid-course Evaluation
http://stp.lingfil.uu.se/~sara/kurser/MT16/mid-course-eval.html

Decoding
The decoder is the part of the SMT system that creates the translations.
Given a set of models, how can we translate efficiently and accurately?
Decoding
Find the best translation among all possible translations:

t* = arg max_t f(s, t) = arg max_t Σ_i λ_i h_i(s, t)

Scoring function f(s, t)
Feature functions h_i(s, t)
Feature weights λ_i

Model error vs. search error
Model error: the solution with the highest score under our models is not a good translation.
Search error: the decoder cannot find the solution with the highest model score.
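As a minimal sketch of this scoring function in Python (the function names and the brute-force candidate loop are illustrative assumptions, not part of the lecture material):

    # Sketch: log-linear model score f(s, t) = sum_i lambda_i * h_i(s, t)

    def model_score(source, target, feature_functions, weights):
        """Weighted sum of feature function values h_i(s, t)."""
        return sum(w * h(source, target)
                   for h, w in zip(feature_functions, weights))

    def decode_exhaustively(source, candidate_translations,
                            feature_functions, weights):
        """Toy 'decoder': return the candidate with the highest model score.
        Only feasible when the candidate set is tiny; real decoding must
        search the exponential space of translations cleverly."""
        return max(candidate_translations,
                   key=lambda t: model_score(source, t,
                                             feature_functions, weights))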
Phrase-based SMT: Generative Model
Bakom huset hittade polisen en stor mängd narkotika .
Behind the house found police a large quantity of narcotics .
Behind the house police found a large quantity of narcotics .
1. Phrase segmentation
2. Phrase translation
3. Output ordering
[Figure: the same example with overlapping phrase translation candidates such as "Behind the house", "the house police", "house police found", "police found a", "found a large"]

Translation Options
[Figure: table of translation options for the German input "er geht ja nicht nach hause", with several candidate translations for each source phrase. Illustration by Philipp Koehn]

Decoding by Hypothesis Expansion
[Figure: step-wise expansion of partial hypotheses for "er geht ja nicht nach hause", adding words such as "he", "goes", "does not", "home" one phrase at a time. Illustration by Philipp Koehn]
Is it always possible to translate any sentence in this way?
What would cause the process to break down, so that the decoder can't find a translation that covers the whole input sentence?
How could you make sure that this never happens?

Decoding complexity
Naively, in a sentence of N words with T translation options for each phrase, we can have O(2^N) phrase segmentations, O(T^N) sets of phrase translations, and O(N!) word reordering permutations.

Exploiting Model Locality
Bakom huset hittade polisen en stor mängd narkotika .
Behind the house police found a big
To score a new hypothesis, we need:
the score of the previous hypothesis
the translation model score
the new language model scores
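A minimal sketch of this incremental scoring step (the argument names and the language model interface lm.logprob(word, context) are assumptions for illustration): only the newly appended words need fresh language model scores; everything before the n-gram window is already accounted for.

    def extend_score(prev_score, prev_history, phrase, tm_score, lm, n=3):
        """Score a hypothesis extended by one phrase.

        prev_history: last n-1 target words of the previous hypothesis
        phrase:       list of newly added target words
        tm_score:     log translation model score of the phrase pair
        lm:           object with lm.logprob(word, context) (assumed API)
        """
        score = prev_score + tm_score
        history = list(prev_history)
        for word in phrase:
            # Language model score for the new word given its n-1 predecessors.
            score += lm.logprob(word, tuple(history[-(n - 1):]))
            history.append(word)
        return score, tuple(history[-(n - 1):])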
Hypothesis recombination
The translation model only looks at the current phrase.
The n-gram model only looks at a window of n words.
The choices the decoder makes are independent of everything beyond this window!
The decoder never reconsiders its choices once they've moved out of the n-gram history.

Suppose we have these hypotheses with the same coverage, and we use a trigram language model:
After the house police      Score = -12.5
Behind the house police     Score = -11.2
, the house police          Score = -22.0
All three end in the same two words, so the trigram model scores every continuation identically: we already know the winner! We can discard the competing hypotheses.

Hypothesis recombination combines branches in the search graph: it is a form of dynamic programming.
Recombination reduces the search space substantially...
...it preserves search optimality...
...but decoding is still exponential!
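A minimal sketch of recombination (the hypothesis attributes are assumptions based on standard phrase-based decoders, not spelled out on the slide): hypotheses are grouped by the state that future scoring can still see, and only the best hypothesis per state survives.

    # Two hypotheses can be recombined if future model scores cannot
    # distinguish them: same source coverage, same last n-1 target words,
    # same end position of the last translated source phrase.

    def recombination_key(hyp, n=3):
        return (hyp.coverage,                        # frozenset / bitmask of covered source words
                tuple(hyp.target_words[-(n - 1):]),  # n-gram history visible to the LM
                hyp.last_source_end)                 # needed by the distortion model

    def recombine(hypotheses, n=3):
        """Keep only the best-scoring hypothesis per recombination state."""
        best = {}
        for hyp in hypotheses:
            key = recombination_key(hyp, n)
            if key not in best or hyp.score > best[key].score:
                best[key] = hyp
        return list(best.values())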
Pruning
To make decoding really efficient, we expand only hypotheses that look promising.
Bad hypotheses should be pruned early to avoid wasting time on them.
Pruning compromises search optimality!

Stack decoding
[Figure: hypothesis stacks indexed by the number of source words translated (no word, one word, two words, three words); partial hypotheses such as "he", "it", "goes", "does not", "yes" sit on the stack matching their coverage. Illustration by Philipp Koehn]

Stack decoding algorithm
AddToStack(s_0, h_0)
for i = 0 ... N − 1 do
    for all h ∈ s_i do
        for all t ∈ T do
            if Applicable(h, t) then
                h′ ← Expand(h, t)
                j ← WordsCovered(h) + WordsCovered(t)
                AddToStack(s_j, h′)   ← pruning magic goes here
            end if
        end for
    end for
end for
return best hypothesis on stack s_N
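The same loop as a compact Python sketch (the helper callables are passed in as parameters and are purely illustrative; pruning is deferred to AddToStack on the next slide):

    def stack_decode(empty_hyp, translation_options, n_source_words,
                     applicable, expand, words_covered):
        # Stack stacks[i] holds hypotheses covering i source words.
        stacks = [[] for _ in range(n_source_words + 1)]
        stacks[0].append(empty_hyp)
        for i in range(n_source_words):          # expand shorter hypotheses first
            for hyp in stacks[i]:
                for opt in translation_options:
                    if applicable(hyp, opt):     # coverage / distortion constraints
                        new_hyp = expand(hyp, opt)
                        j = i + words_covered(opt)
                        stacks[j].append(new_hyp)   # "pruning magic goes here"
        return max(stacks[n_source_words], key=lambda h: h.score)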
AddToStack(s, h)
for all h′ ∈ s do
    if Recombinable(h, h′) then
        add the higher-scoring of h, h′ to stack s, discard the other
        return
    end if
end for
add h to stack s
if stack too large then
    prune stack
end if

How to prune
Histogram pruning: keep no more than S hypotheses per stack. Parameter: stack size S.
Threshold pruning: discard hypotheses whose score is very low compared to that of the best hypothesis on the stack, h*: Score(h) < η · Score(h*). Parameter: beam size η.

Beam search: Complexity
For each of the N words in the input sentence, expand S hypotheses by considering T translation options each: O(S · N · T).
The number of translation options is linear in the sentence length, so this is O(S · N²).
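Minimal sketches of the two pruning strategies, assuming each hypothesis carries a log score in a .score attribute (higher is better):

    def histogram_prune(stack, S):
        """Histogram pruning: keep at most the S highest-scoring hypotheses."""
        return sorted(stack, key=lambda h: h.score, reverse=True)[:S]

    def threshold_prune(stack, margin):
        """Threshold (beam) pruning: drop hypotheses far below the best one.
        The slide's multiplicative form Score(h) < eta * Score(h*) applies to
        probabilities; with log scores it becomes an additive margin."""
        if not stack:
            return stack
        best = max(h.score for h in stack)
        return [h for h in stack if h.score >= best - margin]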
Distortion limit
When translating between closely related languages, most reorderings are local...
...and anyhow, we haven't got any reasonable models for long-range reordering!
If we impose a limit on reordering, the number of translation options to consider at each step is bounded by a constant.
[Figure: coverage of "Bakom huset hittade polisen en stor mängd narkotika ." with the partial translation "Behind the house police"]
The number of hypotheses expanded by a beam search decoder with limited reordering is linear in the stack size and the input size: O(S · N).

Incremental scoring and cherry picking
[Figure: two expansion orders over "Bakom huset hittade polisen en stor mängd narkotika .", numbered 0-4, yielding the partial translations "Behind the house police found" and "Behind the house police a big"]
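One possible way to enforce a distortion limit inside the Applicable check (the exact formulation varies between decoders; the fields and the default limit below are assumptions for illustration):

    def within_distortion_limit(last_source_end, option_start, d):
        """Allow only jumps of at most d source positions."""
        return abs(option_start - last_source_end) <= d

    def applicable(hyp, option, d=6):
        """Option must cover only untranslated source words and respect the
        distortion limit (coverage and positions as sets or bitmasks)."""
        no_overlap = not (hyp.coverage & option.positions)
        return no_overlap and within_distortion_limit(hyp.last_source_end,
                                                      option.start, d)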
Incremental scoring and cherry picking
The path that looks cheapest at first may incur a much higher cost later.
Pruning may discard better options before this is recognised.
To make scores more comparable, we should take into account unavoidable future costs.
Compare hypotheses based on current score + future score.

Future cost estimation
Calculating the future cost exactly would amount to full decoding!
Cheaper approximations can be computed by making additional independence assumptions:
assume independence between models; ignore LM history across phrase boundaries.
[Figure: estimated future costs for the spans of "the tourism initiative addresses this for the first time" (values such as -1.0, -2.0, -1.5 per span). Illustration by Philipp Koehn]

Stack Decoding and A* Search
Stack decoding is related to a standard search algorithm called A* search.
In A* search, each partial hypothesis is evaluated with a score and a future cost estimate, called a heuristic.
A heuristic is called admissible if it never overestimates the true future cost (equivalently, it never underestimates the best achievable future score).
A* search with an admissible heuristic is optimal.
The future cost estimate of stack decoding is not admissible.
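The usual approximation can be computed by dynamic programming over source spans, sketched below (span_cost is an assumed callable giving the best history-free log score for translating a span with a single phrase, or -inf if no phrase covers it):

    def future_cost_table(n, span_cost):
        # cost[i][j]: best estimated score for covering source span [i, j)
        cost = [[float("-inf")] * (n + 1) for _ in range(n + 1)]
        for i in range(n + 1):
            cost[i][i] = 0.0
        for length in range(1, n + 1):
            for i in range(n - length + 1):
                j = i + length
                best = span_cost(i, j)                 # translate span as one phrase
                for k in range(i + 1, j):              # or split it into two parts
                    best = max(best, cost[i][k] + cost[k][j])
                cost[i][j] = best
        return cost

During decoding, a hypothesis's future score estimate is then the sum of cost[i][j] over its uncovered source spans.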
DP Beam Search Decoding: Evaluation
DP beam search is by far the most popular search algorithm for phrase-based SMT.
It combines high speed with reasonable accuracy by exploiting the constraints of the standard models.
It works well with very local models.
Sentence-internal long-range dependencies increase search errors by inhibiting recombination.
The standard models allow no cross-sentence dependencies on the target side.
Current state of the art: almost perfect local fluency, but serious problems with long-range reordering and discourse-level phenomena.