

  1. CS447: Natural Language Processing http://courses.engr.illinois.edu/cs447 Lecture 23: Phrase-based MT (corrected) Julia Hockenmaier juliahmr@illinois.edu 3324 Siebel Center

  2. Recap: IBM models for MT

  3. The IBM models

Use the noisy channel (Bayes rule) to get the best (most likely) target translation e for source sentence f:

   argmax_e P(e | f) = argmax_e P(f | e) P(e)

The translation model P(f | e) requires alignments a; marginalize (= sum) over all alignments a ∈ A(e, f):

   P(f | e) = ∑_{a ∈ A(e,f)} P(f, a | e)

Generate f and the alignment a with P(f, a | e):

   P(f, a | e) = P(m | e) · ∏_{j=1}^{m} P(a_j | a_{1..j-1}, f_{1..j-1}, m, e) · P(f_j | a_{1..j}, f_{1..j-1}, e, m)

where P(m | e) is the length probability (|f| = m, m = number of words in f), P(a_j | ...) is the word alignment probability of alignment a_j, and P(f_j | ...) is the translation probability of word f_j.
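
To make the noisy-channel objective concrete, here is a minimal Python sketch (made-up probabilities and candidate strings; nothing below is from the lecture) that ranks candidate translations e of a fixed source sentence f by P(f | e) · P(e):

```python
# Toy illustration of the noisy-channel objective argmax_e P(f|e) P(e).
# The candidate strings and all probabilities are invented for illustration.
candidates = {
    "mary swam across the lake":      {"tm": 0.0008, "lm": 0.0100},  # P(f|e), P(e)
    "mary crossed the lake swimming": {"tm": 0.0010, "lm": 0.0020},
    "mary the lake swam across":      {"tm": 0.0009, "lm": 0.0001},
}

best_e = max(candidates, key=lambda e: candidates[e]["tm"] * candidates[e]["lm"])
print(best_e)  # the candidate with the highest P(f|e) * P(e)
```

Note how the language model P(e) is what penalizes the scrambled third candidate even though its translation-model score is competitive.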

  4. Representing word alignments

   Target:    0 NULL  1 Mary  2 swam  3 across  4 the  5 lake

   Position:  1      2  3          4   5    6  7   8
   Foreign:   Marie  a  traversé   le  lac  à  la  nage
   Alignment: 1      3  3          4   5    0  0   2

Every source word f[i] is aligned to one target word e[j] (incl. NULL).
We represent alignments as a vector a (of the same length as the source) with a[i] = j.
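
As a data structure, this alignment vector is just a list of target positions, one per source word. A minimal Python sketch (not part of the lecture):

```python
# Alignment vector for "Marie a traversé le lac à la nage" -> "NULL Mary swam across the lake".
# a[i] = j means that source word f[i] is aligned to target word e[j]; j = 0 is NULL.
target = ["NULL", "Mary", "swam", "across", "the", "lake"]           # positions 0..5
source = ["Marie", "a", "traversé", "le", "lac", "à", "la", "nage"]  # positions 1..8
a = [1, 3, 3, 4, 5, 0, 0, 2]                                         # same length as the source

for f_word, j in zip(source, a):
    print(f"{f_word} -> {target[j]}")
```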

  5. IBM model 1: Generative process

For each target sentence e = e_1 .. e_n of length n
(here: 0 NULL, 1 Mary, 2 swam, 3 across, 4 the, 5 lake):

1. Choose a length m for the source sentence (e.g. m = 8)

2. Choose an alignment a = a_1 ... a_m for the source sentence.
   Each a_j corresponds to a word e_i in e: 0 ≤ a_j ≤ n

   Position:  1 2 3 4 5 6 7 8
   Alignment: 1 3 3 4 5 0 0 2

3. Translate each target word e_{a_j} into the source language:

   Position:    1      2  3          4   5    6  7   8
   Alignment:   1      3  3          4   5    0  0   2
   Translation: Marie  a  traversé   le  lac  à  la  nage
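
The generative story above can be written as a short sampling routine. A minimal sketch, with made-up translation distributions and a fixed source length (none of the numbers are from the lecture):

```python
import random

# Target sentence; position 0 is the NULL word.
e = ["NULL", "Mary", "swam", "across", "the", "lake"]

# Hypothetical translation table t[e_word] = {source_word: P(source_word | e_word)}.
t = {
    "NULL":   {"à": 0.5, "la": 0.5},
    "Mary":   {"Marie": 1.0},
    "swam":   {"a": 0.4, "nage": 0.6},
    "across": {"traversé": 1.0},
    "the":    {"le": 1.0},
    "lake":   {"lac": 1.0},
}

def sample(dist):
    return random.choices(list(dist), weights=list(dist.values()))[0]

# 1. Choose a length m for the source sentence (fixed to 8 here, as in the example).
m = 8
# 2. Choose an alignment a_1..a_m; in Model 1 each a_j is uniform over 0..n.
a = [random.randint(0, len(e) - 1) for _ in range(m)]
# 3. Translate each aligned target word e_{a_j} into a source word f_j.
f = [sample(t[e[a_j]]) for a_j in a]

print("Alignment:  ", a)
print("Translation:", f)
```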

  6. Expectation-Maximization (EM)

1. Initialize a first model, M_0
2. Expectation (E) step:
   Go through the training data to gather expected counts ⟨count(lac, lake)⟩
3. Maximization (M) step:
   Use the expected counts to compute a new model M_{i+1}:
   P_{i+1}(lac | lake) = ⟨count(lac, lake)⟩ / ⟨∑_w count(w, lake)⟩
4. Check for convergence:
   Compute the log-likelihood of the training data with M_{i+1}.
   If the difference between the new and the old log-likelihood is smaller than a threshold, stop. Else go to 2.
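
A minimal sketch of this loop for IBM Model 1 (the toy corpus, the uniform initialization, and all names are illustrative, not from the lecture). It uses the fact that under Model 1 the expected counts can be gathered one source position at a time:

```python
import math
from collections import defaultdict

# Tiny toy parallel corpus: (source words, target words incl. NULL).
corpus = [
    (["lac", "vert"], ["NULL", "green", "lake"]),
    (["lac"],         ["NULL", "lake"]),
]

# M_0: uniform initialization of t(f|e).
f_vocab = {f for fs, _ in corpus for f in fs}
t = defaultdict(lambda: 1.0 / len(f_vocab))

old_ll = float("-inf")
for iteration in range(100):
    counts = defaultdict(float)   # expected counts <count(f, e)>
    totals = defaultdict(float)   # expected counts <sum_w count(w, e)>
    log_likelihood = 0.0

    # E-step: go through the training data and gather expected counts.
    for fs, es in corpus:
        for f in fs:
            norm = sum(t[(f, e)] for e in es)
            log_likelihood += math.log(norm)
            for e in es:
                p = t[(f, e)] / norm          # posterior that f is aligned to e
                counts[(f, e)] += p
                totals[e] += p

    # M-step: use the expected counts to compute the new model M_{i+1}.
    for (f, e), c in counts.items():
        t[(f, e)] = c / totals[e]

    # Convergence check on the log-likelihood of the training data.
    if log_likelihood - old_ll < 1e-6:
        break
    old_ll = log_likelihood

print(t[("lac", "lake")])
```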

  7. The E-step

Compute the expected count ⟨c(f, e | f, e)⟩ of how often the source word f and the target word e are aligned in the sentence pair (f, e):

   ⟨c(f, e | f, e)⟩ = ∑_{a ∈ A(f,e)} P(a | f, e) · c(f, e | a, e, f)

P(a | f, e) is the posterior probability of alignment a, and c(f, e | a, e, f) counts how often f and e are aligned in a:

   P(a | f, e) = P(a, f | e) / P(f | e) = P(a, f | e) / ∑_{a'} P(a', f | e)

   P(a, f | e) = ∏_j P(f_j | e_{a_j})

Putting these together:

   ⟨c(f, e | f, e)⟩ = ∑_{a ∈ A(f,e)} [ ∏_j P(f_j | e_{a_j}) / ∑_{a'} ∏_j P(f_j | e_{a'_j}) ] · c(f, e | a, e, f)
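
For Model 1 this sum over all alignments never has to be enumerated explicitly: because P(a, f | e) is a product over source positions, the posterior factorizes, and the expected count for a word pair can be accumulated position by position. A minimal worked sketch with toy probabilities (not the lecture's numbers):

```python
# Expected count <c(lac, lake | f, e)> for one sentence pair under Model 1.
# t[(f_word, e_word)] = P(f_word | e_word); the values are invented.
t = {
    ("la", "NULL"): 0.2,  ("la", "the"): 0.5,  ("la", "lake"): 0.1,
    ("lac", "NULL"): 0.1, ("lac", "the"): 0.1, ("lac", "lake"): 0.8,
}
f = ["la", "lac"]
e = ["NULL", "the", "lake"]

expected = 0.0
for j, f_word in enumerate(f):
    # Posterior that source position j is aligned to "lake":
    #   P(a_j = i | f, e) = P(f_j | e_i) / sum_i' P(f_j | e_i')
    norm = sum(t[(f_word, e_word)] for e_word in e)
    if f_word == "lac":
        expected += t[(f_word, "lake")] / norm

print(expected)  # <c(lac, lake | f, e)>
```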

  8. Phrase-based translation models

  9. Phrase-based translation models

Assumption: the fundamental units of translation are phrases:

   主席:各位議員,早晨。
   President (in Cantonese): Good morning, Honourable Members.

Phrase-based model of P(F | E):
1. Split the target sentence deterministically into phrases ep_1 ... ep_n
2. Translate each target phrase ep_i into a source phrase fp_i
   with translation probability φ(fp_i | ep_i)
3. Reorder the foreign phrases with distortion probability
   d(a_i - b_{i-1}) = c^{|a_i - b_{i-1} - 1|}
   where a_i is the start position of the source phrase generated by ep_i,
   and b_{i-1} is the end position of the source phrase generated by ep_{i-1}.
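
A minimal sketch of the distortion term; the value of the constant c is an arbitrary illustrative choice, not from the lecture:

```python
# Distortion probability d(a_i - b_{i-1}) = c^{|a_i - b_{i-1} - 1|}.
def distortion(a_i, b_prev, c=0.5):
    return c ** abs(a_i - b_prev - 1)

# Monotone step: the next source phrase starts right after the previous one ends.
print(distortion(a_i=4, b_prev=3))  # exponent 0 -> 1.0 (no penalty)
# Jumping three words further than that:
print(distortion(a_i=7, b_prev=3))  # exponent 3 -> 0.125
```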

  10. Phrase-based models of P(f | e)

Split the target sentence e = e_1..n into phrases ep_1 .. ep_N:
   [The green witch] [is] [at home] [this week]

Translate each target phrase ep_i into a source phrase fp_i with translation probability P(fp_i | ep_i):
   [The green witch] = [die grüne Hexe], ...

Arrange the set of source phrases {fp_i} to get the source sentence, with distortion probability P(fp | {fp_i}):
   [Diese Woche] [ist] [die grüne Hexe] [zuhause]

   P(f | e = ⟨ep_1, ..., ep_N⟩) = ∏_i P(fp_i | ep_i) · P(fp | {fp_i})
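
Putting the pieces together, here is a minimal sketch that scores one candidate segmentation and ordering for the example above. The phrase-table values are invented, and the distortion model is the simple c^|...| penalty from the previous slide rather than a learned P(fp | {fp_i}):

```python
# Score one derivation of
#   e = [The green witch] [is] [at home] [this week]
#   f = [Diese Woche] [ist] [die grüne Hexe] [zuhause]
# phrase_table[(ep, fp)] = P(fp | ep); toy values for illustration.
phrase_table = {
    ("the green witch", "die grüne Hexe"): 0.7,
    ("is", "ist"): 0.8,
    ("at home", "zuhause"): 0.5,
    ("this week", "diese Woche"): 0.6,
}

def distortion(a_i, b_prev, c=0.5):
    return c ** abs(a_i - b_prev - 1)

# Target phrases in order, each with the source phrase it generates and that
# source phrase's (start, end) word positions in the source sentence (1-based).
derivation = [
    ("the green witch", "die grüne Hexe", (4, 6)),
    ("is",              "ist",            (3, 3)),
    ("at home",         "zuhause",        (7, 7)),
    ("this week",       "diese Woche",    (1, 2)),
]

p, b_prev = 1.0, 0
for ep, fp, (start, end) in derivation:
    p *= phrase_table[(ep, fp)] * distortion(start, b_prev)
    b_prev = end

print(p)  # P(f | e) for this derivation under the toy model
```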

  11. Translation probability P(fp_i | ep_i)

Phrase translation probabilities can be obtained from a phrase table:

   EP            FP             count
   green witch   grüne Hexe     ...
   at home       zuhause        10534
   at home       daheim         9890
   is            ist            598012
   this week     diese Woche    ...

This requires phrase alignment.
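
The counts in such a table are turned into probabilities by relative frequency. A minimal sketch using the counts shown above (the code itself is illustrative, not from the lecture):

```python
from collections import defaultdict

# (EP, FP, count) rows as in the phrase table on this slide.
rows = [
    ("at home", "zuhause", 10534),
    ("at home", "daheim",   9890),
    ("is",      "ist",    598012),
]

totals = defaultdict(int)
for ep, fp, count in rows:
    totals[ep] += count

# Relative-frequency estimate: P(fp | ep) = count(ep, fp) / sum_fp' count(ep, fp').
phi = {(ep, fp): count / totals[ep] for ep, fp, count in rows}

print(phi[("at home", "zuhause")])  # 10534 / (10534 + 9890) ≈ 0.516
```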

  12. Word alignment

   (Figure: word alignment grid between "Diese Woche ist die grüne Hexe zuhause" and "The green witch is at home this week".)

  13. Phrase alignment

   (Figure: phrase alignment between "Diese Woche ist die grüne Hexe zuhause" and "The green witch is at home this week".)

  14. Obtaining phrase alignments

We'll skip over the details, but here's the basic idea. For a given parallel corpus (F-E):
1. Train two word aligners (F → E and E → F).
2. Take the intersection of these alignments to get a high-precision word alignment.
3. Grow these high-precision alignments until all words in both sentences are included in the alignment: consider any pair of words in the union of the alignments, and incrementally add them to the existing alignments.
4. Consider all phrases that are consistent with this improved word alignment.
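
A minimal sketch of steps 2 and 3 (intersect the two directions, then grow with points from the union). The alignments below are toy data for the running example, and the growing loop is a deliberately simplified stand-in for the heuristics used in practice (e.g. grow-diag-final):

```python
# Word alignments as sets of (source_position, target_position) pairs for
#   F: Diese(1) Woche(2) ist(3) die(4) grüne(5) Hexe(6) zuhause(7)
#   E: The(1) green(2) witch(3) is(4) at(5) home(6) this(7) week(8)
# The two aligners disagree only on how "zuhause" links to "at home".
f_to_e = {(1, 7), (2, 8), (3, 4), (4, 1), (5, 2), (6, 3), (7, 6)}
e_to_f = {(1, 7), (2, 8), (3, 4), (4, 1), (5, 2), (6, 3), (7, 5), (7, 6)}

# Step 2: high-precision alignment = intersection of the two directions.
alignment = f_to_e & e_to_f
union = f_to_e | e_to_f

def neighbours(p, q):
    # Two alignment points are neighbours if they differ by at most one
    # position on each side (this includes diagonal neighbours).
    return p != q and max(abs(p[0] - q[0]), abs(p[1] - q[1])) == 1

# Step 3 (simplified): repeatedly add union points adjacent to an aligned point.
added = True
while added:
    added = False
    for point in sorted(union - alignment):
        if any(neighbours(point, q) for q in alignment):
            alignment.add(point)
            added = True

print(sorted(alignment))  # now also contains (7, 5): zuhause - at
```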

  15. Decoding (for phrase-based MT)

  16. Phrase-based models of P(f | e)

Split the target sentence e = e_1..n into phrases ep_1 .. ep_N:
   [The green witch] [is] [at home] [this week]

Translate each target phrase ep_i into a source phrase fp_i with translation probability P(fp_i | ep_i):
   [The green witch] = [die grüne Hexe], ...

Arrange the set of source phrases {fp_i} to get the source sentence, with distortion probability P(fp | {fp_i}):
   [Diese Woche] [ist] [die grüne Hexe] [zuhause]

   P(f | e = ⟨ep_1, ..., ep_N⟩) = ∏_i P(fp_i | ep_i) · P(fp | {fp_i})

  17. Translating

How do we translate a foreign sentence (e.g. "Diese Woche ist die grüne Hexe zuhause") into English?
- We need to find ê = argmax_e P(f | e) P(e)
- There is an exponential number of candidate translations e
- But we can look up phrase translations ep and P(fp | ep) in the phrase table:

   diese            → this 0.2, these 0.5
   Woche            → week 0.7
   ist              → is 0.8
   die              → the 0.3
   grüne            → green 0.3
   Hexe             → witch 0.5, sorceress 0.6
   zuhause          → at home 0.5
   diese Woche      → this week 0.6
   die grüne        → the green 0.4
   grüne Hexe       → green witch 0.7
   diese Woche ist  → is this week 0.4
   die grüne Hexe   → the green witch 0.7
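
A minimal sketch of how these lookups produce the translation options for every source span (the table entries are the ones listed above; the data structure and names are illustrative):

```python
# phrase_table[fp] = list of (ep, P(fp | ep)) options, taken from this slide.
phrase_table = {
    "diese": [("this", 0.2), ("these", 0.5)],
    "woche": [("week", 0.7)],
    "ist": [("is", 0.8)],
    "die": [("the", 0.3)],
    "grüne": [("green", 0.3)],
    "hexe": [("witch", 0.5), ("sorceress", 0.6)],
    "zuhause": [("at home", 0.5)],
    "diese woche": [("this week", 0.6)],
    "die grüne": [("the green", 0.4)],
    "grüne hexe": [("green witch", 0.7)],
    "diese woche ist": [("is this week", 0.4)],
    "die grüne hexe": [("the green witch", 0.7)],
}

source = "diese woche ist die grüne hexe zuhause".split()

# Enumerate every contiguous source span and collect its translation options.
options = {}
for i in range(len(source)):
    for j in range(i + 1, len(source) + 1):
        span = " ".join(source[i:j])
        if span in phrase_table:
            options[(i, j)] = phrase_table[span]

for (i, j), opts in sorted(options.items()):
    print(source[i:j], opts)
```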

  18. Generating a (random) translation

1. Pick the first target phrase ep_1 from the candidate list.
   P := P_LM(<s> ep_1) · P_Trans(fp_1 | ep_1)
   E = the,   F = <...die...>
2. Pick the next target phrase ep_2 from the candidate list.
   P := P × P_LM(ep_2 | ep_1) · P_Trans(fp_2 | ep_2)
   E = the green witch,   F = <...die grüne Hexe...>
3. Keep going: pick target phrases ep_i until the entire source sentence is translated.
   P := P × P_LM(ep_i | ep_1..i-1) · P_Trans(fp_i | ep_i)
   E = the green witch is,   F = <...ist die grüne Hexe...>

   (Phrase-table lattice as on the previous slide, with the order in which the phrases are picked marked 1-5.)
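
A minimal sketch of this incremental scoring for one possible derivation (the language-model probabilities are invented; a real system would use an n-gram LM over words and the full phrase table):

```python
import math

# One derivation: target phrases in the order they are picked, each with the
# source phrase it covers and P_Trans(fp | ep) from the phrase table above.
derivation = [
    ("the",         "die",         0.3),
    ("green witch", "grüne Hexe",  0.7),
    ("is",          "ist",         0.8),
    ("this week",   "diese Woche", 0.6),
    ("at home",     "zuhause",     0.5),
]

# Hypothetical language-model scores P_LM(ep_i | previous phrase); made-up numbers.
p_lm = {
    ("<s>", "the"): 0.10,
    ("the", "green witch"): 0.05,
    ("green witch", "is"): 0.20,
    ("is", "this week"): 0.10,
    ("this week", "at home"): 0.05,
}

log_p, prev = 0.0, "<s>"
for ep, fp, p_trans in derivation:
    log_p += math.log(p_lm[(prev, ep)]) + math.log(p_trans)
    prev = ep

print(math.exp(log_p))  # probability of this particular derivation
```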

  19. Finding the best translation

How can we find the best translation efficiently? There is an exponential number of possible translations.

We will use a heuristic search algorithm: we cannot guarantee that we find the best (= highest-scoring) translation, but we're likely to get close.
We will use a "stack-based" decoder. (If you've taken Intro to AI: this is A* ("A-star") search.)

We will score partial translations based on how good we expect the corresponding completed translation to be.
Or rather: we will score partial translations based on how bad we expect the corresponding complete translation to be.
That is, our scores will be costs (high = bad, low = good).
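
A minimal sketch of cost-based best-first search over partial translations. This is a heavily simplified stand-in for a real stack decoder: a toy phrase table, no language model or distortion cost, no stack pruning, and a crude (but optimistic) per-word estimate of the future cost:

```python
import heapq, itertools, math

# Toy phrase table: source span (i, j) -> list of (target phrase, P(fp | ep)).
source = "die grüne Hexe".split()
table = {
    (0, 1): [("the", 0.3)],
    (1, 2): [("green", 0.3)],
    (2, 3): [("witch", 0.5), ("sorceress", 0.6)],
    (1, 3): [("green witch", 0.7)],
    (0, 3): [("the green witch", 0.7)],
}

# Optimistic future cost per source word: the cheapest per-word cost of any phrase covering it.
best_word_cost = [min(-math.log(p) / (j - i)
                      for (i, j), opts in table.items() if i <= w < j
                      for _, p in opts)
                  for w in range(len(source))]

def future_cost(covered):
    return sum(c for w, c in enumerate(best_word_cost) if w not in covered)

# Best-first (A*-style) search; a hypothesis is scored by
# (cost so far + estimated future cost), where cost = negative log probability.
counter = itertools.count()   # tie-breaker so the heap never compares sets
frontier = [(future_cost(frozenset()), next(counter), 0.0, frozenset(), ())]
while frontier:
    estimate, _, cost, covered, output = heapq.heappop(frontier)
    if len(covered) == len(source):          # all source words are translated
        print(" ".join(output), math.exp(-cost))
        break
    for (i, j), opts in table.items():
        if covered & set(range(i, j)):
            continue                          # phrase overlaps already-covered words
        for ep, p in opts:
            new_cost = cost - math.log(p)
            new_covered = covered | frozenset(range(i, j))
            heapq.heappush(frontier, (new_cost + future_cost(new_covered),
                                      next(counter), new_cost, new_covered,
                                      output + (ep,)))
```

Because the future-cost estimate never overestimates the true remaining cost, the first complete hypothesis popped off the queue is the best-scoring translation under this toy model.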
