Inducing a Discriminative Parser to Optimize Machine Translation Reordering
Graham Neubig, Taro Watanabe, Shinsuke Mori
Preordering
● Long-distance reordering is a weak point of SMT
● Preordering first reorders the source, then translates:
  F  = kare wa gohan o tabeta
  F' = kare wa tabeta gohan o
  E  = he ate rice
● A good preorderer will effectively find F' given F
Syntactic Preordering
● Define reordering rules over a syntactic parse of the source
  [Figure: the parse D of F = "kare wa gohan o tabeta" is reordered into D', yielding F' = "kare wa tabeta gohan o" (E = "he ate rice")]
● What if we don't have a parser in the source language?
Bracketing Transduction Grammars [Wu 97]
● Binary CFGs with only straight (S) and inverted (I) non-terminals, and pre-terminals (T)
  [Figure: a BTG derivation D over F = "kare wa gohan o tabeta"; swapping the children of inverted nodes gives D', whose yield is F' = "kare wa tabeta gohan o"]
● Language independent
● A BTG tree uniquely defines a reordering
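The claim that a BTG tree uniquely defines a reordering can be illustrated with a small recursive projection. This is a minimal sketch, not the paper's implementation; the tuple encoding of trees is an assumption for illustration:

```python
# A BTG node is ('T', [words]), ('S', left, right), or ('I', left, right).
def project(node):
    """Read off the reordered sentence F' defined by a BTG derivation."""
    if node[0] == 'T':                      # pre-terminal: emit its words as-is
        return list(node[1])
    left, right = project(node[1]), project(node[2])
    # straight (S) keeps child order; inverted (I) swaps it
    return left + right if node[0] == 'S' else right + left

# A derivation D over F = "kare wa gohan o tabeta"
tree = ('S', ('T', ['kare', 'wa']),
             ('I', ('S', ('T', ['gohan']), ('T', ['o'])),
                   ('T', ['tabeta'])))
print(' '.join(project(tree)))  # kare wa tabeta gohan o
```

Because S and I each fix the order of their children, a given tree always projects to exactly one output order.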
3-Step BTG Grammar Training for Reordering [DeNero+ 11]
Training (F = "kare wa gohan o tabeta", E = "he ate rice", alignment A):
1) Unsupervised bilingual grammar induction (several hand-tuned features) yields BTG trees over the training sentences
2) Supervised parser training (max label accuracy) yields a parsing model
3) Supervised reorderer training (max tree accuracy) yields a reordering model
3-Step BTG Grammar Induction for Reordering [DeNero+ 11]
Testing (F = "kare wa gohan o tabeta"):
1) The parsing model parses F into an unlabeled (X) bracketing
2) The reordering model labels nodes S or I, producing F' = "kare wa tabeta gohan o"
Our Work: Inducing a Parser to Optimize Reordering
● What if we can reduce three steps to one, and directly maximize ordering accuracy?
● Training: supervised learning (max reordering accuracy) over F = "kare wa gohan o tabeta", E = "he ate rice", and alignment A yields a single parsing/reordering model
● Testing: the model parses F and directly outputs F' = "kare wa tabeta gohan o"
Optimization Framework
Optimization Framework
● Input: source sentence F (F = "kare wa gohan o tabeta")
● Output: reordered source sentence F' (F' = "kare wa tabeta gohan o")
● Latent: bracketing transduction grammar derivation D
  [Figure: the BTG tree D relating F and F']
Scores and Losses
● Define a score over source sentences and derivations:
  S(F, D; w) = Σ_i w_i · φ_i(F, D)
● Optimization finds the weight vector that minimizes loss over the training pairs ⟨F, F'*⟩:
  ŵ = argmin_w Σ_⟨F, F'*⟩ L(F'*, F'(argmax_D S(F, D; w)))
  where F'(·) reads the reordering off the highest-scoring derivation
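The score S(F, D; w) is an ordinary linear model over the derivation's features. A minimal sketch with sparse feature dictionaries (the feature names here are made up for illustration):

```python
def score(phi, w):
    """S(F, D; w) = sum_i w_i * phi_i(F, D), over sparse feature dicts."""
    return sum(w.get(name, 0.0) * value for name, value in phi.items())

# hypothetical features of one derivation and a hypothetical weight vector
phi = {'left=kare': 1.0, 'inverted': 1.0, 'balance': -1.0}
w = {'left=kare': 0.5, 'inverted': 1.5}
print(score(phi, w))  # 2.0
```

Features absent from w contribute zero, which is what makes sparse training with millions of candidate features practical.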
Note: Latent Variable Ambiguity
  [Figure: two different BTG trees over F = "kare wa gohan o tabeta" that both yield F' = "kare wa tabeta gohan o"]
● Out of these, we want easy-to-reproduce trees
● [DeNero+ 11] finds trees with a bilingual parsing model
● Our model discovers trees during training
Training: Latent Online Learning
● Find
  ● the model parse of maximal score:
    D̃ = argmax_D S(F, D)
  ● the oracle parse of maximal score among the BTG trees of minimal loss:
    D̂ = argmax_{D ∈ argmin_D L(F, D)} S(F, D)
● Adjust weights (example: perceptron):
  w ← w + φ(F, D̂) − φ(F, D̃)
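The perceptron update above can be sketched as follows. This is a schematic of the weight adjustment only; it assumes the two argmax searches and the feature extraction have already produced the oracle and model feature vectors:

```python
def perceptron_update(w, phi_oracle, phi_model):
    """w <- w + phi(F, D_oracle) - phi(F, D_model), on sparse dicts (in place)."""
    for name, value in phi_oracle.items():
        w[name] = w.get(name, 0.0) + value
    for name, value in phi_model.items():
        w[name] = w.get(name, 0.0) - value
    return w

# hypothetical feature vectors for one training sentence
w = {'inverted': 1.0}
w = perceptron_update(w, {'inverted': 1.0, 'left=kare': 1.0},  # oracle D-hat
                         {'balance': 1.0})                      # model D-tilde
print(w)  # {'inverted': 2.0, 'left=kare': 1.0, 'balance': -1.0}
```

Features shared by oracle and model parse cancel, so only the features that distinguish a good tree from the current best guess move.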
Considering Loss in Online Learning
● Consider the loss (how bad is the mistake?):
  reference: kare wa tabeta gohan o   (L = 0)
  kare wa gohan tabeta o              (L = 1)
  o gohan tabeta wa kare              (L = 8)
● Make high-loss trees easy to choose during training
  → to avoid them at test time, the model must learn to give them a large penalty:
  D̃ = argmax_D S(F, D) + L(F, D)
Parser
Parsing Setup: Standard Discriminative Parser
● Features are independent with respect to each node
● Parsing and reordering are possible in O(n³) with CKY
● Multi-word pre-terminals are allowed
  [Figure: a BTG tree over "kare wa gohan o tabeta" with the multi-word pre-terminal "kare wa"]
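Because the features factor over nodes, the best-scoring BTG derivation can be found bottom-up with CKY. A minimal sketch, assuming a user-supplied `node_score` function stands in for the learned model's per-node score:

```python
def cky_btg(n, node_score):
    """Find the max-score BTG derivation over a sentence of length n.

    node_score(label, i, k, j) scores a node with label in {'T', 'S', 'I'}
    spanning words [i, j), split at k (k is None for pre-terminals).
    Returns (score, tree) with the ('S', left, right) tuple encoding.
    """
    best = {}
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # multi-word pre-terminals are allowed, so every span can be a T
            best[i, j] = (node_score('T', i, None, j), ('T', i, j))
            for k in range(i + 1, j):
                for label in ('S', 'I'):
                    s = (node_score(label, i, k, j)
                         + best[i, k][0] + best[k, j][0])
                    if s > best[i, j][0]:
                        best[i, j] = (s, (label, best[i, k][1], best[k, j][1]))
    return best[0, n]

# toy scorer: only rewards inverting everything after the first word
score_fn = lambda label, i, k, j: 2.0 if (label, i, k, j) == ('I', 0, 1, 3) else 0.0
best_score, best_tree = cky_btg(3, score_fn)
print(best_score, best_tree[0])  # 2.0 I
```

The three nested loops over (length, i, k) give the O(n³) bound stated on the slide; loss-augmented search only changes what `node_score` returns.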
Language Independent Features
● No linguistic analysis necessary
  [Figure: an inverted node over "kare wa gohan o tabeta", with induced word classes 7 12 39 12 5]
● Lexical: left, right, inside, outside, and boundary words
● Class: same as lexical, but over induced word classes
● Phrase Table: whether the span exists in the phrase table
● Balance: left-branching or right-branching?
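As an illustration of the lexical features, boundary words around a span might be extracted like this. The feature names are hypothetical; the paper's exact templates differ:

```python
def lexical_features(words, i, j, label):
    """Boundary-word features for a node with `label` spanning words[i:j]."""
    left = words[i - 1] if i > 0 else '<s>'          # word just outside, left
    right = words[j] if j < len(words) else '</s>'   # word just outside, right
    return {
        f'{label}_out_left={left}': 1.0,
        f'{label}_out_right={right}': 1.0,
        f'{label}_in_left={words[i]}': 1.0,      # first word inside the span
        f'{label}_in_right={words[j - 1]}': 1.0, # last word inside the span
    }

words = 'kare wa gohan o tabeta'.split()
feats = lexical_features(words, 2, 4, 'I')  # span "gohan o" under an I node
```

Class and POS features follow the same pattern with words replaced by their class or tag, which is why no per-language engineering is needed for the language-independent set.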
Language Dependent Features
● POS Features: same as lexical, but over POS tags
● CFG Features: whether nodes match the supervised parser's spans
  [Figure: the BTG tree over "kare wa gohan o tabeta" annotated with POS tags (PRN wa N o V) and with the supervised parse's waP/oP/VP spans]
Reordering Losses
Reordering Losses [Talbot+ 11]: Chunk Fragmentation
● How many chunks are necessary to reproduce the reference?
  System:    <s> kare wa gohan o tabeta </s>
  Reference: <s> kare wa tabeta gohan o </s>
● Loss: L_chunk(F, D̃) = (number of chunks) − 1
● Accuracy: A_chunk(F, D̃) = 1 − (number of chunks − 1) / (J + 1)
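As a sketch, the chunk count can be computed from each system word's position in the reference: every adjacent pair that is not consecutive in the reference starts a new chunk. This simplification assumes each word occurs exactly once; the metric itself is defined over reorderings of the same sentence:

```python
def chunk_loss(system, reference):
    """L_chunk = number of chunks - 1, counting the <s>/</s> boundaries.

    Assumes every word appears exactly once (simplifying assumption).
    """
    pos = {w: i for i, w in enumerate(reference)}
    # pad with virtual <s> (position -1) and </s> (position J) tokens
    seq = [-1] + [pos[w] for w in system] + [len(reference)]
    # each non-consecutive adjacent pair is one chunk boundary
    return sum(1 for a, b in zip(seq, seq[1:]) if b != a + 1)

ref = 'kare wa tabeta gohan o'.split()
print(chunk_loss('kare wa gohan o tabeta'.split(), ref))  # 3
print(chunk_loss(ref, ref))                               # 0
```

For the slide's example the unreordered system output needs four chunks (kare wa | gohan o | tabeta | </s>), so L_chunk = 3 and, with J = 5, A_chunk = 1 − 3/6 = 0.5.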
Reordering Losses [Talbot+ 11]: Kendall's Tau
● How many pairs of words are in reversed order?
  System:    kare wa gohan o tabeta
  Reference: kare wa tabeta gohan o
● Loss: L_tau(F, D̃) = (number of reversed word pairs)
● Accuracy: A_tau(F, D̃) = 1 − (reversed word pairs) / (potential reversed word pairs)
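Counting reversed pairs directly gives a simple quadratic-time sketch of the tau loss (again assuming unique words, so positions in the reference are well defined):

```python
def tau_loss(system, reference):
    """L_tau = number of word pairs whose order is reversed vs. the reference."""
    pos = {w: i for i, w in enumerate(reference)}  # assumes unique words
    seq = [pos[w] for w in system]
    return sum(1 for i in range(len(seq))
                 for j in range(i + 1, len(seq)) if seq[i] > seq[j])

system = 'kare wa gohan o tabeta'.split()
reference = 'kare wa tabeta gohan o'.split()
print(tau_loss(system, reference))  # 2
```

Here 2 of the C(5,2) = 10 word pairs are reversed (gohan/tabeta and o/tabeta), so A_tau = 1 − 2/10 = 0.8.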
Calculating Loss by Node
● For large-margin training, we must calculate the loss efficiently:
  D̃ = argmax_D S(F, D) + L(F, D)
● The loss can be factored by node as well (details in the paper): each node's contribution decomposes into L_left, L_right, and L_between terms over its span
  [Figure: a tree over "kare wa gohan o tabeta" with the per-node loss terms for chunk fragmentation and Kendall's tau]
Experiments
Experimental Setup
● English-Japanese and Japanese-English translation
● Data from the Kyoto Free Translation Task:

             sent.   word (ja)  word (en)
  RM-train     602   14.5k      14.3k     (manually aligned)
  RM-test      555   11.2k      10.4k     (manually aligned)
  LM/TM       329k   6.08M      5.91M
  tune        1166   26.8k      24.3k
  test        1160   28.5k      26.7k
Experimental Setup
● Reordering model training:
  ● 500 iterations
  ● Pegasos with regularization constant 10⁻³
  ● Default: chunk fragmentation loss, standard features
● Translation: Moses with lexicalized reordering
● Compare: original order, 3-step training, and the proposed method
Result: Proposed Model Improves Reordering
● Results for chunk fragmentation and Kendall's Tau
  [Figure: bar charts of chunk and tau accuracy (50–100) for Orig, 3-Step, and Proposed on en-ja and ja-en]
Result: Proposed Model Improves Translation
● Results for BLEU and RIBES
  [Figure: bar charts of BLEU (15–25) and RIBES (60–75) for Orig, 3-Step, and Proposed on en-ja and ja-en]
Result: Adding Linguistic Info (Generally) Helps
  [Figure: BLEU and RIBES for Orig, Standard, +POS, and +CFG feature sets on en-ja and ja-en]
Result: Training Loss Affects Reordering
● The optimized criterion is higher on the test set as well
  [Figure: chunk and tau accuracy for Orig, Chunk, Tau, and Chunk+Tau training losses on en-ja and ja-en]
Result: Training Loss Affects Translation
● Optimizing chunk fragmentation generally gives the best results
  [Figure: BLEU and RIBES for Orig, Chunk, Tau, and Chunk+Tau on en-ja and ja-en]