4CSLL5 IBM Translation Models
Martin Emms
October 22, 2020

Outline: IBM models · Probabilities and Translation · Alignments · IBM Model 1 definitions

Lexical Translation (IBM models intro)
◮ How to translate a word → look up in dictionary
      Haus — house, building, home, household, shell
◮ Multiple translations
  ◮ some more frequent than others
  ◮ for instance: house and building most common
  ◮ special cases: the Haus of a snail is its shell
Collect Statistics
◮ Suppose a parallel corpus, with German sentences paired with English sentences, and suppose people inspect this, marking how Haus is translated, eg.

      das Haus ist klein
      the house is small

◮ Hypothetical table of frequencies

      Translation of Haus   Count
      house                 8,000
      building              1,600
      home                    200
      household               150
      shell                    50

Estimation of Translation Probabilities
◮ from this could use relative frequencies as estimates of the translation probabilities t(e | Haus)
◮ technically this is a maximum likelihood estimate – there could be others
◮ outcome would be (a code sketch of this estimate follows after these slides)

      t(e | Haus) = 0.8   if e = house,
                    0.16  if e = building,
                    0.02  if e = home,
                    0.015 if e = household,
                    0.005 if e = shell

IBM models
◮ the so-called IBM models seek a probabilistic model of translation, one of whose ingredients is this kind of lexical translation probability
◮ there's a sequence of models of increasing complexity (Models 1–5). The simplest models pretty much just use lexical translation probability
◮ parallel corpora are used (eg. pairing German sentences with English sentences), but crucially there is no human inspection to find how German words are translated to English words, ie. the info is of the form

      das Haus ist klein
      the house is small

◮ though originally developed as models of translation, these models are now used as models of alignment, providing crucial training input for so-called 'phrase-based SMT'

Notation
◮ For reasons that will become apparent, we will use
      O for the language we want to translate from
      S for the language we want to translate to
◮ o is a single sentence from O, and is a sequence (o_1 ... o_j ... o_{ℓ_o}); ℓ_o is the length of o
◮ s is a single sentence from S, and is a sequence (s_1 ... s_i ... s_{ℓ_s}); ℓ_s is the length of s
◮ the set of all possible words of language O is V_o
◮ the set of all possible words of language S is V_s
◮ see the comments on notation in Koehn and in J&M
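To make the relative-frequency estimate concrete, here is a minimal Python sketch (ours, not from the slides): it recomputes t(e | Haus) from the hypothetical counts in the table above. The names counts and t_haus are our own.

```python
from collections import Counter

# hypothetical counts of translations of 'Haus' from the slide's table
counts = Counter({"house": 8000, "building": 1600, "home": 200,
                  "household": 150, "shell": 50})

# maximum likelihood estimate: relative frequency of each translation
total = sum(counts.values())
t_haus = {e: c / total for e, c in counts.items()}

print(t_haus["house"])  # 0.8
print(t_haus["shell"])  # 0.005
```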
The sparsity problem
◮ Suppose for two languages you have a large sentence-aligned corpus d. Say the two languages are O and S.
◮ in principle, for any sentence o ∈ O one could work out the probabilities of its various translations s by relative frequency

      p(s | o) = count(⟨o, s⟩ ∈ d) / Σ_{s′} count(⟨o, s′⟩ ∈ d)

◮ but even in very large corpora the vast majority of possible o and s occur zero times. So this method gives uselessly bad estimates.

The Noisy-Channel formulation
◮ recalling Bayesian classification, finding s from o:

      arg max_s P(s | o) = arg max_s P(s, o) / P(o)     (1)
                         = arg max_s P(s, o)            (2)
                         = arg max_s P(o | s) × P(s)    (3)

◮ can then try to factorise P(o | s) and P(s) into a clever combination of other probability distributions (not sparse, learnable, allowing solution of the arg-max problem). IBM models 1–5 can be used for P(o | s); P(s) is the topic of so-called 'language models'. (A toy code stub for the arg max in (3) follows after the next slide.)
◮ The reason for the notation s and o is that (3) is the defining equation of Shannon's 'noisy-channel' formulation of decoding, where an original 'source' s has to be recovered from a noisy observed signal o, the noisiness defined by P(o | s)

Alignments (informally)
Now we have to start looking at the details of the IBM models of P(o | s), starting with the very simplest. What all the models have in common is that they define P(o | s) as a combination of other probability distributions.
◮ When s and o are translations of each other, usually one can say which pieces of s and o are translations of each other, eg.

      1   2    3   4               1   2    3   4
      das Haus ist klein           das Haus ist klitzeklein
      the house is  small          the house is  very  small
      1   2    3   4               1   2    3   4     5

◮ In SMT such a piece-wise correspondence is called an alignment
◮ warning: there are quite a lot of varying formal definitions of alignment
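As a hedged illustration of the noisy-channel arg max in equation (3) — a sketch of ours, not the slides' definition — the Python stub below picks, from a finite set of candidate translations, the s maximising P(o|s) × P(s). The names noisy_channel_decode, translation_model and language_model are placeholders standing in for an IBM-style translation model and a language model.

```python
def noisy_channel_decode(o, candidates, translation_model, language_model):
    """arg max_s P(o|s) * P(s) over a finite set of candidate sentences s.

    translation_model(o, s) stands in for P(o|s) (eg. an IBM model) and
    language_model(s) for P(s); both are assumed, not defined here.
    """
    return max(candidates, key=lambda s: translation_model(o, s) * language_model(s))

# toy usage with made-up probability functions
tm = lambda o, s: 0.9 if (o, s) == ("das Haus", "the house") else 0.1
lm = lambda s: {"the house": 0.6, "house the": 0.01}.get(s, 0.001)
print(noisy_channel_decode("das Haus", ["the house", "house the"], tm, lm))
# -> 'the house'
```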
Hidden Alignment
◮ a key feature of the IBM models is to assume there is a hidden alignment, a, between o and s
◮ so a pair ⟨o, s⟩ from a sentence-aligned corpus is seen as a partial version of the fully observed case ⟨o, a, s⟩
◮ a model is essentially made of p(o, a | s), and having this allows other things to be defined
◮ best translation:

      arg max_s P(s, o) = arg max_s ([Σ_a p(o, a | s)] × p(s))

◮ best alignment (a toy code sketch of this arg max follows after these slides):

      arg max_a [p(o, a | s)]

IBM Alignments
◮ Define an alignment with a function from posn j in o to posn i in s, so a : j → i
◮ the picture

      1   2    3   4
      das Haus ist klein
      the house is  small
      1   2    3   4

  represents a : {1 → 1, 2 → 2, 3 → 3, 4 → 4}

Some weirdness about directions

      1   2    3   4
      das Haus ist klein          a : 1 → 1, 2 → 2, 3 → 3, 4 → 4
      the house is  small
      1   2    3   4

◮ Note here o is English, and s is German
◮ the alignment goes up the page, English-to-German
◮ they will be used though in a model of P(o | s), so down the page, German-to-English

Comparison to 'edit distance' alignments
in case you have ever studied 'edit distance' alignments . . .
◮ like edit-dist alignments, it's a function: so can't align 1 o word with 2 s words
◮ like edit-dist alignments, some s words can be unmapped-to (cf. insertions)
◮ like edit-dist alignments, some o words can be mapped to nothing (cf. deletions)
◮ unlike edit-dist alignments, order is not preserved: j < j′ does not imply a(j) < a(j′)
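To make the 'best alignment' arg max concrete, here is a toy Python sketch of our own. It assumes, ahead of the formal Model 1 definitions, a purely lexical p(o, a | s) = Π_j t(o_j | s_{a(j)}), enumerates every function from o-positions to s-positions, and scores each; the brute force is exponential and only meant to illustrate the definition. The table t and the floor value are invented, and the NULL position is omitted for brevity.

```python
from itertools import product

def best_alignment(o, s, t):
    """arg max_a p(o,a|s) under a toy, purely lexical p(o,a|s) = prod_j t(o_j | s_a(j))."""
    def score(a):
        p = 1.0
        for j, i in enumerate(a):
            p *= t.get((o[j], s[i]), 1e-9)  # tiny floor for unseen word pairs
        return p
    # enumerate every function from positions of o to positions of s (0-based)
    return max(product(range(len(s)), repeat=len(o)), key=score)

# toy lexical probabilities t(o_word | s_word); as above, o is English, s is German
t = {("the", "das"): 0.9, ("house", "Haus"): 0.9,
     ("is", "ist"): 0.9, ("small", "klein"): 0.9}
o = ["the", "house", "is", "small"]
s = ["das", "Haus", "ist", "klein"]
print(best_alignment(o, s, t))  # (0, 1, 2, 3), ie. 1→1, 2→2, 3→3, 4→4
```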
N-to-1 Alignment (ie. 1-to-N Translation)

      1   2    3   4
      das Haus ist klitzeklein
      the house is  very  small
      1   2    3   4     5

◮ a : {1 → 1, 2 → 2, 3 → 3, 4 → 4, 5 → 4}
◮ N words of o can be aligned to 1 word of s
  (needed when 1 word of s translates into N words of o)

Reordering

      1     2   3   4
      klein ist das Haus
      the   house is  small
      1     2     3   4

◮ a : {1 → 3, 2 → 4, 3 → 2, 4 → 1}
◮ alignment does not preserve o word order
  (needed when s words are reordered during translation)

s words not mapped to (ie. dropped in translation)

      1   2    3   4  5
      das Haus ist ja klein
      the house is  small
      1   2    3    4

◮ a : {1 → 1, 2 → 2, 3 → 3, 4 → 5}
◮ some s words are not mapped-to by the alignment
  (needed when s words are dropped during translation; here the German flavouring particle 'ja' is dropped)

o words mapped to nothing (ie. insertion in translation)

      0    1   2    3     4   5
      NULL ich gehe nicht zum haus
      I  do  not  go  to  the  house
      1  2   3    4   5   6    7

◮ a : {1 → 1, 2 → 0, 3 → 3, 4 → 2, 5 → 4, 6 → 4, 7 → 5}
◮ some o words are mapped to nothing by the alignment, formally represented by alignment to a special NULL token
  (needed when o words have no clear origin during translation; there is no clear origin in German of the English 'do')
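The conventions of these last two slides can be captured in a few lines of Python — a sketch of our own, assuming 1-based positions with s_0 a special NULL token and an alignment stored as a list a where a[j-1] = a(j). From such an a one can read off both the o words with no clear origin (aligned to NULL) and the s words dropped in translation (never mapped-to):

```python
def inserted_and_dropped(o, s, a):
    """o words aligned to NULL (inserted) and s words never mapped-to (dropped)."""
    inserted = [o[j] for j, i in enumerate(a) if i == 0]       # aligned to s_0 = NULL
    dropped = [s[i] for i in range(1, len(s)) if i not in a]   # never mapped-to
    return inserted, dropped

# 'do' is inserted: it aligns to the NULL token
s1 = ["NULL", "ich", "gehe", "nicht", "zum", "haus"]
o1 = ["I", "do", "not", "go", "to", "the", "house"]
print(inserted_and_dropped(o1, s1, [1, 0, 3, 2, 4, 4, 5]))  # (['do'], [])

# 'ja' is dropped: no o word maps to it
s2 = ["NULL", "das", "Haus", "ist", "ja", "klein"]
o2 = ["the", "house", "is", "small"]
print(inserted_and_dropped(o2, s2, [1, 2, 3, 5]))           # ([], ['ja'])
```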