Statistical NLP, Spring 2011
Lecture 7: Phrase-Based MT
Dan Klein – UC Berkeley

Machine Translation: Examples
Levels of Transfer

Word-Level MT: Examples
• la politique de la haine . (Foreign Original)
  politics of hate . (Reference Translation)
  the policy of the hatred . (IBM4+N-grams+Stack)
• nous avons signé le protocole . (Foreign Original)
  we did sign the memorandum of agreement . (Reference Translation)
  we have signed the protocol . (IBM4+N-grams+Stack)
• où était le plan solide ? (Foreign Original)
  but where was the solid plan ? (Reference Translation)
  where was the economic base ? (IBM4+N-grams+Stack)
Phrasal / Syntactic MT: Examples

MT: Evaluation
• Human evaluations: subjective measures, fluency/adequacy
• Automatic measures: n-gram match to references
• NIST measure: n-gram recall (worked poorly)
• BLEU: n-gram precision (no one really likes it, but everyone uses it)
• BLEU:
  • P1 = unigram precision
  • P2, P3, P4 = bi-, tri-, 4-gram precision
  • Weighted geometric mean of P1-P4
  • Brevity penalty (why?)
  • Somewhat hard to game…
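Below is a minimal sketch of single-reference BLEU as characterized above (clipped n-gram precisions, a uniformly weighted geometric mean of P1 through P4, and a brevity penalty); the function and the tiny smoothing constant are illustrative choices, not any official implementation.

    import math
    from collections import Counter

    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i+n]) for i in range(len(tokens) - n + 1))

    def bleu(candidate, reference, max_n=4):
        # candidate, reference: token lists (single-reference case for simplicity)
        precisions = []
        for n in range(1, max_n + 1):
            cand, ref = ngrams(candidate, n), ngrams(reference, n)
            clipped = sum(min(count, ref[g]) for g, count in cand.items())
            precisions.append(max(clipped, 1e-9) / max(sum(cand.values()), 1))
        # uniformly weighted geometric mean of P1..P4
        geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
        # brevity penalty: punish candidates shorter than the reference
        c, r = len(candidate), len(reference)
        bp = 1.0 if c >= r else math.exp(1 - r / max(c, 1))
        return bp * geo_mean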
Automatic Metrics Work (?)

Corpus-Based MT
Modeling correspondences between languages

Sentence-aligned parallel corpus:
  Yo lo haré mañana    I will do it tomorrow
  Hasta pronto         See you soon
  Hasta pronto         See you around

Machine translation system (model of translation):
  Yo lo haré pronto  →  I will do it soon
                        I will do it around
                        See you tomorrow
Phrase-Based Systems

Sentence-aligned corpus  →  Word alignments  →  Phrase table (translation model)

  cat ||| chat ||| 0.9
  the cat ||| le chat ||| 0.8
  dog ||| chien ||| 0.8
  house ||| maison ||| 0.6
  my house ||| ma maison ||| 0.9
  language ||| langue ||| 0.9
  …

Many slides and examples from Philipp Koehn or John DeNero
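As a concrete sketch, a phrase table in the ||| format above can be read into a lookup structure like this; the file name, column order, and single score per entry are assumptions for illustration (real tables typically carry several feature scores per pair).

    from collections import defaultdict

    def load_phrase_table(path):
        # map each source phrase to its candidate translations with scores
        table = defaultdict(list)
        with open(path, encoding="utf-8") as f:
            for line in f:
                fields = [x.strip() for x in line.split("|||")]
                if len(fields) < 3:
                    continue
                src, tgt, score = fields[0], fields[1], float(fields[2])
                table[src].append((tgt, score))
        return table

    # e.g. load_phrase_table("phrase-table.txt")["the cat"] -> [("le chat", 0.8)]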
Phrase-Based Decoding

这 7 人 中包括 来自 法国 和 俄罗斯 的 宇航 员 .
(“These 7 people include astronauts coming from France and Russia.”)

Decoder design is important: [Koehn et al., 2003]

The Pharaoh “Model” [Koehn et al., 2003]
  Segmentation × Translation × Distortion
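Spelled out, the three factors combine into one score per derivation; the following is a standard way to write the Pharaoh-style model (the notation is mine, not copied from the slide):

    e* = argmax_e  P_LM(e) * prod_i [ phi(f_i | e_i) * d(start_i - end_{i-1} - 1) ]

Here the foreign sentence is segmented into phrases f_1 … f_I translated as e_1 … e_I, phi is the phrase translation probability, d penalizes distortion (how far the i-th source phrase jumps from where the (i-1)-th one ended), and P_LM scores the assembled English output.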
The Pharaoh “Model”
  Where do we get these counts?

Phrase Weights
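A sketch of the usual answer, under the assumption that phrase pairs have already been extracted from word-aligned sentence pairs: score each pair by relative frequency.

    from collections import Counter

    def phrase_weights(phrase_pairs):
        # phrase_pairs: (foreign_phrase, english_phrase) tuples harvested from
        # word-aligned sentence pairs
        pairs = list(phrase_pairs)
        pair_counts = Counter(pairs)
        eng_counts = Counter(e for _, e in pairs)
        # relative-frequency estimate: phi(f | e) = count(f, e) / count(e)
        return {(f, e): c / eng_counts[e] for (f, e), c in pair_counts.items()}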
Phrase-Based Decoding

Monotonic Word Translation
• Cost is LM * TM
• It's an HMM?
  • P(e | e-1, e-2)
  • P(f | e)
• State includes
  • Exposed English
  • Position in foreign
• Chart cells hold scored hypotheses, e.g. […, a slap, 5] 0.00001; […, slap to, 6] 0.00000016; […, slap by, 6] 0.00000001
• Dynamic program loop?

  for (fPosition in 1…|f|)
    for (eContext in allEContexts)
      for (eOption in translations[fPosition])
        score = scores[fPosition-1][eContext] * LM(eContext+eOption) * TM(eOption, fWord[fPosition])
        scores[fPosition][eContext[2]+eOption] = max score
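The loop above, made runnable as a small Viterbi-style decoder; the lm, tm, and translations arguments are assumed interfaces for illustration, not part of any particular toolkit.

    def monotone_decode(f_words, translations, tm, lm):
        # translations[f]: candidate English words for foreign word f
        # tm(e, f): P(f | e); lm(e, context): P(e | previous two English words)
        chart = [dict() for _ in range(len(f_words) + 1)]   # context -> best score
        back = [dict() for _ in range(len(f_words) + 1)]    # context -> (prev context, word)
        chart[0][("<s>", "<s>")] = 1.0
        for i, f in enumerate(f_words, start=1):
            for context, prev_score in chart[i - 1].items():
                for e in translations[f]:
                    score = prev_score * lm(e, context) * tm(e, f)
                    new_context = (context[1], e)
                    if score > chart[i].get(new_context, 0.0):
                        chart[i][new_context] = score
                        back[i][new_context] = (context, e)
        # read out the best hypothesis by following backpointers
        context = max(chart[-1], key=chart[-1].get)
        best_score, output = chart[-1][context], []
        for i in range(len(f_words), 0, -1):
            context, e = back[i][context]
            output.append(e)
        return list(reversed(output)), best_score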
Beam Decoding
• For real MT models, this kind of dynamic program is a disaster (why?)
• Standard solution is beam search: for each position, keep track of only the best k hypotheses

  for (fPosition in 1…|f|)
    for (eContext in bestEContexts[fPosition])
      for (eOption in translations[fPosition])
        score = scores[fPosition-1][eContext] * LM(eContext+eOption) * TM(eOption, fWord[fPosition])
        bestEContexts.maybeAdd(eContext[2]+eOption, score)

• Still pretty slow… why?
• Useful trick: cube pruning (Chiang 2005)
  Example from David Chiang

Phrase Translation
• If monotonic, almost an HMM; technically a semi-HMM

  for (fPosition in 1…|f|)
    for (lastPosition < fPosition)
      for (eContext in eContexts)
        for (eOption in translations[fPosition])
          … combine hypothesis for (lastPosition ending in eContext) with eOption

• If distortion… now what?
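The same decoder with a beam: at each foreign position only the best k English contexts survive. This is a sketch reusing the assumed lm/tm/translations interfaces from the example above.

    import heapq

    def beam_decode(f_words, translations, tm, lm, beam_size=10):
        # beam: context -> (score, output so far); start with the empty hypothesis
        beam = {("<s>", "<s>"): (1.0, [])}
        for f in f_words:
            candidates = {}
            for context, (prev_score, prev_out) in beam.items():
                for e in translations[f]:
                    score = prev_score * lm(e, context) * tm(e, f)
                    key = (context[1], e)
                    if score > candidates.get(key, (0.0, None))[0]:
                        candidates[key] = (score, prev_out + [e])
            # prune: keep only the beam_size highest-scoring contexts
            beam = dict(heapq.nlargest(beam_size, candidates.items(),
                                       key=lambda kv: kv[1][0]))
        context, (score, output) = max(beam.items(), key=lambda kv: kv[1][0])
        return output, score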
Non-Monotonic Phrasal MT

Pruning: Beams + Forward Costs
• Problem: easy partial analyses are cheaper
• Solution 1: use beams per foreign subset
• Solution 2: estimate forward costs (A*-like)
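One concrete way to get those forward costs, sketched in the style of Pharaoh/Moses-like future-cost tables; best_option_cost is an assumed helper returning the cheapest single-phrase cost for a span (or infinity if the phrase table has no option for it).

    def future_costs(f_words, best_option_cost):
        # cost[i][j]: estimated cost of the cheapest way to cover f_words[i:j],
        # ignoring distortion and cross-phrase language model context
        n = len(f_words)
        cost = [[float("inf")] * (n + 1) for _ in range(n + 1)]
        for length in range(1, n + 1):
            for i in range(n - length + 1):
                j = i + length
                best = best_option_cost(i, j)
                for k in range(i + 1, j):
                    best = min(best, cost[i][k] + cost[k][j])
                cost[i][j] = best
        return cost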
The Pharaoh Decoder

Hypothesis Lattices
Word Alignment

[word alignment grid figure]
Unsupervised Word Alignment
• Input: a bitext: pairs of translated sentences
    nous acceptons votre opinion .
    we accept your view .
• Output: alignments: pairs of translated words
• When words have unique sources, can represent as a (forward) alignment function a from French to English positions

1-to-Many Alignments
Many-to-Many Alignments

IBM Model 1 (Brown et al., 1993)
• Alignments: a hidden vector called an alignment specifies which English source word is responsible for each French target word.
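For reference, the Model 1 generative story sketched above is usually written as follows (a standard formulation, not copied from the slide; position 0 is a NULL English word, J and I are the French and English lengths):

    P(f, a | e) = 1 / (I + 1)^J  *  prod_{j=1..J} t(f_j | e_{a_j})

Every alignment of a French word to an English position (including NULL) is equally likely a priori, so all of the modeling power sits in the word translation table t.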
IBM Models 1/2

          1        2    3   4     5    6    7     8       9
  E:      Thank    you  ,   I     shall do  so    gladly  .
  A:      1        3    7   6     8    8    8     8       9
  F:      Gracias  ,    lo  haré  de   muy  buen  grado   .

Model Parameters
  Emissions:   P( F1 = Gracias | E_A1 = Thank )
  Transitions: P( A2 = 3 )
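In this notation, the difference between the two models (a standard gloss, not verbatim from the slides): Model 1 leaves the transition term uniform, P( A_j = i ) = 1 / (I + 1), while Model 2 learns a distortion distribution P( A_j = i | j, I, J ), letting alignments prefer English positions near the corresponding French position j.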