Machine Translation
Prof. Sameer Singh
CS 295: STATISTICAL NLP, WINTER 2017
February 28, 2017
Based on slides from Jacob Eisenstein, Chris Dyer, Alan Ritter, Yejin Choi, and everyone else they copied from.
Upcoming…
• Paper summaries: February 28, March 14 (Summary 1 graded)
• Status report due in 1 week: March 7, 2017
  • Project instructions coming today!
  • Almost a final report, only 5 pages
• Homework 4 is due on March 13
  • Write-up and data will be released soon.
CS 295: STATISTICAL NLP (WINTER 2017) 2
Outline: Machine Translation, Introduction to Statistical MT, IBM Translation Models
Machine Translation
“Yo, que me figuraba el Paraíso / Bajo la especie de una biblioteca.”
“I have always imagined Paradise as a kind of library.”
Challenges: Word Order
SVO vs. SOV:
English: IBM bought Lotus
Japanese: IBM Lotus bought
Even between SVO languages:
English: I will buy it / French: Je vais l’acheter (I will it buy)
English: I bought it / French: Je l’ai acheté (I it have bought)
Challenges: Lexical Ambiguity
English “bill” → Spanish pico (a bird’s beak) or cuenta (an invoice)
Challenges: Pronouns
English possessive pronouns take the gender of the owner: Marie rides her bike.
French possessive pronouns take the gender of the object: Marie monte sur son vélo.
Spanish drops subject pronouns, but they can often be recovered from the verb inflection: Vivimos en Atlanta → We live in Atlanta.
Again, discourse context is often crucial: Vive en Atlanta → She/he/it lives in Atlanta.
Challenges: Tenses
Spanish has two simple past tenses. The preterite is for events with a definite time, e.g. I biked to work this morning. The imperfect is for events with indefinite times, e.g. I biked to work all last summer. To translate English into Spanish, we must pick the right tense.
Challenges: Idioms
As cool as a cucumber; Why in the world; Blue in the face; Lend me your ears; Hold your horses; Dead as a doornail; Kick the bucket; Head in the clouds; Bob’s your uncle; Storm in a teacup
Rules for Machine Translation
Rules for translating much or many into Russian (Panov, 1960):
if preceding word is how: return skol’ko
else if preceding word is as: return stol’ko zhe
else if word is much:
    if preceding word is very: return nil
    else if following word is a noun: return mnogo
else (word is many):
    if preceding word is a preposition and following word is a noun: return mnogii
    else: return mnogo
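The rule cascade above can be made executable as a small Python function. This is a sketch under stated assumptions: the toy `NOUNS` and `PREPOSITIONS` sets are hypothetical stand-ins for the part-of-speech information the rules presuppose, "return nil" is read as producing no Russian word (an empty string), and `None` marks the case where no rule fires (a "much" that is neither after "very" nor before a noun).

```python
from typing import Optional

# Hypothetical toy lexicons standing in for part-of-speech information.
NOUNS = {"money", "time", "books", "people"}
PREPOSITIONS = {"with", "of", "for", "about"}
NIL = ""  # "return nil": no Russian word is produced


def translate_much_many(prev: Optional[str], word: str, nxt: Optional[str]) -> Optional[str]:
    """Rule cascade for English 'much'/'many' -> Russian (transliterated).

    Returns None when no rule fires; the original rules leave that case open.
    """
    if prev == "how":
        return "skol'ko"
    if prev == "as":
        return "stol'ko zhe"
    if word == "much":
        if prev == "very":
            return NIL
        if nxt in NOUNS:
            return "mnogo"
        return None  # e.g. a clause-final "much": no rule applies
    # Otherwise the word is "many".
    if prev in PREPOSITIONS and nxt in NOUNS:
        return "mnogii"
    return "mnogo"


for prev, word, nxt in [("how", "much", "money"), ("with", "many", "people"), ("very", "much", None)]:
    print(prev, word, nxt, "->", repr(translate_much_many(prev, word, nxt)))
```

Even this tiny fragment needs a lexicon, a tagger, and an explicit decision for every context, which is the brittleness that motivates learning translations from data instead.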
The Vauquois Triangle
Outline: Machine Translation, Introduction to Statistical MT, IBM Translation Models
Statistical Machine Translation
Parallel Corpus: Examples
The Rosetta Stone
Warren Weaver (1949)
Parallel Corpus: Examples
Noisy Channel Model (figure: a noisy channel followed by a decoder)
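In the noisy-channel view, the foreign sentence is treated as a corrupted version of an English sentence, and decoding recovers the most probable original. Writing f for the foreign input and e for the English output (our notation), Bayes’ rule splits the problem into the two models, and the denominator P(f) can be dropped because it does not depend on e:

```latex
\hat{e} = \arg\max_{e} P(e \mid f)
        = \arg\max_{e} \frac{P(f \mid e)\, P(e)}{P(f)}
        = \arg\max_{e} \underbrace{P(f \mid e)}_{\text{translation model}} \;
                       \underbrace{P(e)}_{\text{language model}}
```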
Example: Noisy Channel
Components of an MT system: Language Model, Translation Model, Decoding Algorithm
Components of an MT system
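To make the division of labour concrete, here is a toy sketch of the three components. Everything in it is a hypothetical assumption for illustration: the bigram language model and the translation table `T_TABLE` use invented probabilities, the translation model is a simplified IBM-Model-1-style score without a NULL word, and "decoding" is just an argmax over a short explicit candidate list rather than a real search.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy models; every probability below is invented for illustration.
# Bigram language model P(e_i | e_{i-1}), with <s> and </s> as sentence boundaries.
BIGRAM_LM: Dict[Tuple[str, str], float] = {
    ("<s>", "the"): 0.5, ("the", "house"): 0.3, ("the", "dog"): 0.3,
    ("house", "</s>"): 0.4, ("dog", "</s>"): 0.4,
    ("<s>", "house"): 0.05, ("house", "the"): 0.01, ("the", "</s>"): 0.1,
}
# Lexical translation table t(f | e) for Spanish f given English e.
T_TABLE: Dict[Tuple[str, str], float] = {("la", "the"): 0.7, ("casa", "house"): 0.8}
FLOOR = 1e-6  # crude stand-in for smoothing of unseen events


def lm_logprob(english: List[str]) -> float:
    """log P(e) under the bigram language model (rewards fluency)."""
    tokens = ["<s>"] + english + ["</s>"]
    return sum(math.log(BIGRAM_LM.get(bg, FLOOR)) for bg in zip(tokens, tokens[1:]))


def tm_logprob(foreign: List[str], english: List[str]) -> float:
    """Simplified Model-1-style log P(f | e) (rewards adequacy).

    Each foreign word may come from any English word, so word order is ignored here.
    """
    return sum(
        math.log(sum(T_TABLE.get((f, e), FLOOR) for e in english) / len(english))
        for f in foreign
    )


def decode(foreign: List[str], candidates: List[List[str]]) -> List[str]:
    """Noisy-channel decision rule: argmax_e log P(e) + log P(f | e)."""
    return max(candidates, key=lambda e: lm_logprob(e) + tm_logprob(foreign, e))


candidates = [["the", "house"], ["house", "the"], ["the", "dog"]]
print(decode(["la", "casa"], candidates))
```

The translation model scores "the house" and "house the" identically, so the language model decides the word order; the language model likes "the dog" as much as "the house", so the translation model rules it out.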
Evaluating MT
Human Evaluation
Judged on fluency and adequacy, e.g. comparing two system outputs:
A: furious nAgA on wednesday , the tribal minimum pur of ten schools also was burnt
B: furious nAgA on wednesday the tribal pur mini ten schools of them was also burnt
Automated Evaluation: Fluency, Adequacy
BLEU Score
BLEU Score: Example
‘extension of isi in uttar pradesh’
‘isi ’s expansion in uttar pradesh’
‘the spread of isi in uttar pradesh’
‘isi spreading in uttar pradesh’
the spread of isi in uttar pradesh
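A minimal sentence-level BLEU sketch, following the usual definition: clipped (modified) n-gram precisions for n = 1 to 4, a uniformly weighted geometric mean, and a brevity penalty against the reference length closest to the candidate’s. Assumptions: whitespace tokenization, no smoothing (so any n-gram order with zero matches gives a score of 0), the four quoted sentences above are used as references, and the candidate is a hypothetical system output of our own. Standard corpus-level BLEU pools the n-gram counts over a whole test set instead of scoring single sentences.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(cand: List[str], refs: List[List[str]], n: int) -> float:
    """Clipped precision: a candidate n-gram counts at most as often as it
    appears in any single reference."""
    cand_counts = ngrams(cand, n)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    max_ref: Counter = Counter()
    for ref in refs:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / total


def sentence_bleu(candidate: str, references: List[str], max_n: int = 4) -> float:
    cand = candidate.split()
    refs = [r.split() for r in references]
    precisions = [modified_precision(cand, refs, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # unsmoothed: log(0) would be undefined
    c = len(cand)
    # Effective reference length: the one closest to c (ties go to the shorter).
    r = min((len(ref) for ref in refs), key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(math.log(p) for p in precisions) / max_n)


references = [
    "extension of isi in uttar pradesh",
    "isi ’s expansion in uttar pradesh",
    "the spread of isi in uttar pradesh",
    "isi spreading in uttar pradesh",
]
candidate = "the expansion of isi in uttar pradesh"  # hypothetical system output
print(round(sentence_bleu(candidate, references), 3))  # 0.669, i.e. (7/7 * 4/6 * 3/5 * 2/4) ** 0.25
```

Multiple references help: “expansion” is credited through the second reference and “the” through the third, even though no single reference contains both.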
BLEU’s not bad… (G. Doddington, NIST)
Outline: Machine Translation, Introduction to Statistical MT, IBM Translation Models
Statistical Translation Model
English: And the program was implemented
French: La programmation a été mise en application
Word Alignment: Direct
Word Alignment: 1-to-Many
Word Alignment: Reordering
Word Alignment: Inserting
Word Alignment: Dropping
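The alignment phenomena above can all be read off one data structure: for each foreign position j, the index a_j of the English word that generated it, with index 0 reserved for a NULL word. The sketch below uses the sentence pair from the Statistical Translation Model slide; the particular alignment is our own illustrative assumption, not taken from the slides.

```python
from collections import Counter
from typing import List

# Sentence pair from the slides; index 0 is the NULL word that generates "inserted" foreign words.
english = ["NULL", "And", "the", "program", "was", "implemented"]
french = ["La", "programmation", "a", "été", "mise", "en", "application"]
# Illustrative alignment (our assumption): alignment[j] = English position that generated french[j].
alignment = [2, 3, 4, 4, 5, 5, 5]


def fertility(alignment: List[int], n_english: int) -> List[int]:
    """Number of foreign words generated by each English position."""
    counts = Counter(alignment)
    return [counts[i] for i in range(n_english)]


def describe(english: List[str], french: List[str], alignment: List[int]) -> dict:
    """Classify the alignment phenomena present in one aligned sentence pair."""
    fert = fertility(alignment, len(english))
    linked = [i for i in alignment if i != 0]  # ignore NULL-generated words for ordering
    return {
        "dropped": [w for i, w in enumerate(english) if i > 0 and fert[i] == 0],
        "one_to_many": [w for i, w in enumerate(english) if i > 0 and fert[i] > 1],
        "inserted": [f for f, i in zip(french, alignment) if i == 0],
        "reordered": any(a > b for a, b in zip(linked, linked[1:])),
    }


print(describe(english, french, alignment))
```

Here “And” has fertility 0 (dropped), while “was” and “implemented” are 1-to-many. Because every foreign word points to exactly one English position, this representation cannot express a many-to-1 link from several English words to a single foreign word, which is a restriction of the IBM models.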
Translating with Alignments
Example: Translation Prob
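A small worked example of translation probabilities under IBM Model 1, as a sketch: the lexical table `T` is hypothetical, the English sentence carries a NULL word at position 0, and ε is set to 1. It computes P(f, a | e) for one alignment, then P(f | e) by summing over every alignment, and checks that sum against the factorised form that Model 1’s independence assumption allows.

```python
import math
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical lexical translation table t(f | e); unlisted pairs have probability 0.
T: Dict[Tuple[str, str], float] = {
    ("la", "NULL"): 0.1, ("la", "the"): 0.7, ("la", "house"): 0.05,
    ("casa", "NULL"): 0.05, ("casa", "the"): 0.05, ("casa", "house"): 0.8,
}


def p_f_a_given_e(f: List[str], e: List[str], a: List[int], epsilon: float = 1.0) -> float:
    """Model 1: P(f, a | e) = epsilon / (l+1)^m * prod_j t(f_j | e_{a_j}).

    `e` starts with "NULL"; l excludes it and m = len(f).
    """
    l, m = len(e) - 1, len(f)
    return epsilon / (l + 1) ** m * math.prod(T.get((fj, e[aj]), 0.0) for fj, aj in zip(f, a))


def p_f_given_e_bruteforce(f: List[str], e: List[str], epsilon: float = 1.0) -> float:
    """P(f | e) as an explicit sum over all (l+1)^m alignments."""
    return sum(p_f_a_given_e(f, e, list(a), epsilon)
               for a in product(range(len(e)), repeat=len(f)))


def p_f_given_e(f: List[str], e: List[str], epsilon: float = 1.0) -> float:
    """Same quantity, factorised: each a_j is chosen independently in Model 1."""
    l, m = len(e) - 1, len(f)
    return epsilon / (l + 1) ** m * math.prod(
        sum(T.get((fj, ei), 0.0) for ei in e) for fj in f)


e = ["NULL", "the", "house"]
f = ["la", "casa"]
print(p_f_a_given_e(f, e, [1, 2]))  # 0.7 * 0.8 / 3**2
print(p_f_given_e(f, e))            # (0.1 + 0.7 + 0.05) * (0.05 + 0.05 + 0.8) / 3**2
```

The brute-force sum grows as (l+1)^m, while the factorised form costs only O(l·m); that shortcut is what makes Model 1 training tractable.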
IBM Models: Model 1, Model 2, Models 3/4/5
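For reference, the first two models written out, in notation of our choosing: foreign sentence f = f_1 … f_m, English sentence e = e_0 … e_l with e_0 = NULL, and alignment a = a_1 … a_m. Model 1 treats every alignment as equally likely. Model 2 adds an alignment probability q that depends on the positions and sentence lengths, and setting q(a_j | j, l, m) = 1/(l+1) recovers Model 1. Models 3 to 5 add further components such as fertility (how many foreign words each English word generates) and richer distortion (reordering) models.

```latex
\begin{aligned}
P(f \mid e) &= \sum_{a} P(f, a \mid e) \\
\text{Model 1:}\quad P(f, a \mid e) &= \frac{\epsilon}{(l+1)^{m}} \prod_{j=1}^{m} t(f_j \mid e_{a_j}) \\
\text{Model 2:}\quad P(f, a \mid e) &= \epsilon \prod_{j=1}^{m} q(a_j \mid j, l, m)\, t(f_j \mid e_{a_j})
\end{aligned}
```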
Word Alignment Algorithm
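The slide’s own algorithm is not recoverable from the text, so here is a sketch of the standard expectation-maximization (EM) procedure for IBM Model 1, under stated assumptions: t(f | e) is initialised uniformly, a NULL word is prepended to every English sentence, and the three-sentence German-English corpus is hypothetical. The E-step spreads each foreign word over the English words in its sentence in proportion to the current t; the M-step renormalises the expected counts.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Corpus = List[Tuple[List[str], List[str]]]  # (foreign tokens, English tokens)


def train_ibm_model1(corpus: Corpus, iterations: int = 10) -> Dict[Tuple[str, str], float]:
    """EM for IBM Model 1; returns t[(f, e)] = t(f | e), with "NULL" as an English word."""
    f_vocab = {f for fs, _ in corpus for f in fs}
    uniform = 1.0 / len(f_vocab)
    t: Dict[Tuple[str, str], float] = defaultdict(lambda: uniform)  # uniform initialisation
    for _ in range(iterations):
        count: Dict[Tuple[str, str], float] = defaultdict(float)  # expected count c(f, e)
        total: Dict[str, float] = defaultdict(float)              # expected count c(e)
        # E-step: posterior probability that f_j was generated by each e_i.
        for fs, es in corpus:
            es = ["NULL"] + es
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / z
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: re-estimate t(f | e) from the expected counts.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return dict(t)


toy_corpus: Corpus = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = train_ibm_model1(toy_corpus, iterations=20)
for e in ["the", "house", "book", "a"]:
    best = max((f for (f, e2) in t if e2 == e), key=lambda f: t[(f, e)])
    print(e, "->", best, round(t[(best, e)], 3))
```

No sentence is aligned by hand: “das” ends up with “the” because the two co-occur in two sentences, and that in turn leaves “Haus” to be explained by “house”.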