4CSLL5 IBM Translation Models
Martin Emms
October 23, 2020
Brute force EM learning

Outline
  Parameter learning (brute force)
    Introduction
    The brute force EM algorithm defined
    A formula for p(a|o,s)
    Examples: brute force EM in action
Learning Lexical Translation Models

◮ We would like to estimate the lexical translation probabilities t(o|s) from a parallel corpus (o^1, s^1) . . . (o^D, s^D)
◮ this would be easy if we had the alignments, i.e. (o^1, a^1, s^1) . . . (o^D, a^D, s^D) (or just how frequent . . . )
◮ but we don't . . .
◮ if we knew the parameters, it would be (relatively) easy to calculate the 'odds' on alignments, i.e. P(a^1|o^1, s^1) . . . P(a^D|o^D, s^D)
◮ but we don't . . .
◮ something of a 'chicken and egg' situation
◮ but the EM algorithm embraces this exactly
EM Algorithm, roughly

Expectation Maximization (EM) in a nutshell:
1. initialize the model parameters (e.g. uniform)
2. assign probabilities to the missing data (here, the alignments)
3. treat those probabilities like counts in complete data and estimate the model parameters from the pseudo-completed data
4. iterate steps 2–3 until convergence

(A runnable sketch of this loop for the lexical translation model follows the toy example below.)
The EM algorithm keeps re-estimating the parameters. The following shows graphically the evolution of the parameters when the process is applied to the corpus

  s^1: la maison      s^2: la maison bleu      s^3: la fleur
  o^1: the house      o^2: the blue house      o^3: the flower

and with all tr(o|s) values initially equal.
[Figures omitted: five bar-chart slides showing the tr(o|s) values for the three sentence pairs (la maison / the house, la maison bleu / the blue house, la fleur / the flower) — initially, and after one, two, four and ten iterations of EM.]
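Since the bar charts do not survive in this text version, here is a minimal Python sketch of the same experiment: brute-force EM on the three-pair corpus above, printing a couple of tr(o|s) values as they evolve. It is a sketch under assumptions, not the lecture's own code: it enumerates every alignment explicitly, weights each by p(a|o,s) ∝ ∏_j tr(o_j|s_a(j)) (the formula treated later in this section), and omits the NULL source word for brevity.

  import itertools
  from math import prod
  from collections import defaultdict

  # toy corpus from the slides: (observed sentence o, source sentence s)
  corpus = [
      ("the house".split(),      "la maison".split()),
      ("the blue house".split(), "la maison bleu".split()),
      ("the flower".split(),     "la fleur".split()),
  ]

  # step 1: initialise tr(o|s) uniformly over the observed vocabulary
  o_vocab = {o for obs, _ in corpus for o in obs}
  tr = defaultdict(lambda: 1.0 / len(o_vocab))

  for it in range(1, 11):
      counts = defaultdict(float)   # expected counts #(o, s)
      totals = defaultdict(float)   # expected counts #(s)
      for obs, src in corpus:
          # step 2: enumerate all len(src)**len(obs) alignments; each
          # alignment a maps target position j to source position a[j]
          alignments = list(itertools.product(range(len(src)),
                                              repeat=len(obs)))
          # p(a|o,s) is proportional to prod_j tr(o_j | s_a(j))
          weights = [prod(tr[(obs[j], src[i])] for j, i in enumerate(a))
                     for a in alignments]
          z = sum(weights)
          for a, w in zip(alignments, weights):
              for j, i in enumerate(a):
                  counts[(obs[j], src[i])] += w / z
                  totals[src[i]] += w / z
      # step 3: treat expected counts as real counts and re-estimate
      tr = defaultdict(float,
                       {(o, s): c / totals[s]
                        for (o, s), c in counts.items()})
      if it in (1, 2, 4, 10):
          print(f"after {it:2d}: tr(the|la)={tr[('the', 'la')]:.3f}  "
                f"tr(house|maison)={tr[('house', 'maison')]:.3f}")

Run as-is, this should reproduce the qualitative picture of the missing figures: the initially uniform distributions sharpen over the iterations, with tr(the|la) and tr(house|maison) coming to dominate their competitors.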
The brute force EM algorithm defined
◮ to arrive at the EM algorithm for this case, it's a good idea to first spell out explicitly what the counting and parameter estimation would look like if you had the alignments
◮ then migrate that into the EM version, replacing anything which assumes a definite alignment with lines which consider all possible alignments, treating each as having a 'count' of p(a|o,s)
◮ the next 2 slides do exactly this
Estimating translation probs tr(o|s) from complete data

Suppose you have a corpus of D sentence pairs, each with an alignment a. From this we can estimate the values of tr(o|s) for the model in a straightforward way(1):

COUNT
  for each o ∈ V_o
    for each s ∈ V_s ∪ {NULL}
      set #(o, s) = 0
  for each aligned pair (o, a, s)    // just counting freqs of (o,s)
    for each j ∈ 1 : ℓ_o             // word-pairs in the data
      #(o_j, s_a(j)) += 1

(1) If we wanted to be really thorough, we could set up the differential equations which define the parameters maximising the likelihood of the data under the model, and show that solving them for the tr(o|s) parameters amounts to the counting procedure shown.
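A minimal Python rendering of COUNT, with the obvious relative-frequency normalisation tr(o|s) = #(o,s) / Σ_o' #(o',s) added at the end (the slide itself stops at the raw counts). The example triples at the bottom are made up for illustration.

  from collections import defaultdict

  def estimate_tr(aligned_corpus):
      """COUNT from the slide, plus normalisation to tr(o|s).

      aligned_corpus: list of triples (o, a, s) where o and s are
      lists of words and a[j] is the source position that word o[j]
      aligns to (0 = NULL).
      """
      counts = defaultdict(float)         # #(o, s), zero by default
      totals = defaultdict(float)         # sum over o of #(o, s)
      for o, a, s in aligned_corpus:
          s = ["NULL"] + s                # make position 0 the NULL word
          for j, o_word in enumerate(o):  # one aligned pair per o word
              counts[(o_word, s[a[j]])] += 1
              totals[s[a[j]]] += 1
      return {pair: c / totals[pair[1]] for pair, c in counts.items()}

  # hypothetical completed data: 'the'->la, 'house'->maison, etc.
  example = [("the house".split(),  [1, 2], "la maison".split()),
             ("the flower".split(), [1, 2], "la fleur".split())]
  print(estimate_tr(example))  # tr(the|la)=1.0, tr(house|maison)=1.0, ...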