 
Inducing a Discriminative Parser to Optimize Machine Translation Reordering
Graham Neubig (1,2,3), Taro Watanabe (2), Shinsuke Mori (1)
Preordering
● Long-distance reordering is a weak point of SMT
● Preordering first reorders, then translates
  F  = kare wa gohan o tabeta
  F' = kare wa tabeta gohan o
  E  = he ate rice
● A good preorderer will effectively find F' given F
Syntactic Preordering
● Define rules over a syntactic parse of the source
  D  = (S (waP PRN wa) (VP (oP N o) V))
  D' = (S (waP PRN wa) (VP V (oP N o)))
  F  = kare wa gohan o tabeta
  F' = kare wa tabeta gohan o
  E  = he ate rice
● What if we don't have a parser in the source language?
Bracketing Transduction Grammars [Wu 97]
● Binary CFGs with only straight (S) and inverted (I) non-terminals, and pre-terminals (T)
  D  = (S (S (T kare) (T wa)) (I (S (T gohan) (T o)) (T tabeta)))
  D' = (S (S (T kare) (T wa)) (I (T tabeta) (S (T gohan) (T o))))
  F  = kare wa gohan o tabeta
  F' = kare wa tabeta gohan o
● Language independent
● BTG tree uniquely defines a reordering
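How a BTG tree defines a reordering can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class names `Term` and `Node` and the function `reorder` are hypothetical, and the tree is the example derivation over "kare wa gohan o tabeta".

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Term:
    """Pre-terminal (T) covering one or more source words."""
    words: List[str]


@dataclass
class Node:
    """Binary node labeled 'S' (straight) or 'I' (inverted)."""
    label: str
    left: "Tree"
    right: "Tree"


Tree = Union[Term, Node]


def reorder(tree: Tree) -> List[str]:
    """Read off the reordered sentence F' defined by a BTG tree."""
    if isinstance(tree, Term):
        return list(tree.words)
    left, right = reorder(tree.left), reorder(tree.right)
    if tree.label == "S":   # straight: keep the children in source order
        return left + right
    if tree.label == "I":   # inverted: swap the two children
        return right + left
    raise ValueError(f"unknown BTG label: {tree.label!r}")


# Example derivation: the I node moves "tabeta" in front of "gohan o".
D = Node("S",
         Node("S", Term(["kare"]), Term(["wa"])),
         Node("I",
              Node("S", Term(["gohan"]), Term(["o"])),
              Term(["tabeta"])))

F_prime = reorder(D)
```

Because every node either keeps or swaps its two children, one tree yields exactly one output order, although several trees may yield the same order.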
3-Step BTG Grammar Training for Reordering [DeNero+ 11]
Training input: F = kare wa gohan o tabeta, E = he ate rice, A = word alignment between F and E
1) Bilingual Grammar Induction: unsupervised induction (several hand-tuned features) of a BTG tree over F
2) Parser Training: supervised training (max tree accuracy) → Parsing Model
3) Reorderer Training: supervised training (max label accuracy) → Reordering Model
3-Step BTG Grammar Induction for Reordering [DeNero+ 11]
Testing input: F = kare wa gohan o tabeta
1) Parsing: the Parsing Model builds an unlabeled tree (X nodes over T pre-terminals) for kare wa gohan o tabeta
2) Reordering: the Reordering Model labels the nodes S or I, giving kare wa tabeta gohan o
Our Work: Inducing a Parser to Optimize Reordering
● What if we can reduce three steps to one, and directly maximize ordering accuracy?
Training: F = kare wa gohan o tabeta, A, E = he ate rice → supervised learning (max reordering accuracy) → Parsing/Reordering Model
Testing: F = kare wa gohan o tabeta → Parsing/Reordering Model → BTG tree (S, I, T nodes) → kare wa tabeta gohan o
Optimization Framework
Optimization Framework
● Input: Source sentence F
  F  = kare wa gohan o tabeta
● Output: Reordered source sentence F'
  F' = kare wa tabeta gohan o
● Latent: Bracketing transduction grammar derivation D
  D  = (S (S (T kare) (T wa)) (I (S (T gohan) (T o)) (T tabeta)))
Scores and Losses
● Define a score over source sentences and derivations
  $S(F, D; w) = \sum_i w_i \, \phi_i(F, D)$
● Optimization finds a weight vector that minimizes loss
  $\operatorname*{argmin}_{w} \sum_{F, F'^{*}} L\bigl(F'^{*}, \operatorname*{argmax}_{F' \leftarrow F, D} S(F, D; w)\bigr)$
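The score is a weighted sum of feature values, which can be sketched as a sparse dot product. The feature names and weights below are made up for illustration and are not taken from the slides; `score` is a hypothetical helper.

```python
from typing import Dict


def score(phi: Dict[str, float], w: Dict[str, float]) -> float:
    """S(F, D; w) = sum_i w_i * phi_i(F, D) over sparse feature dicts."""
    # Features without a weight contribute zero.
    return sum(w.get(name, 0.0) * value for name, value in phi.items())


# Hypothetical weights and feature vector for one derivation D.
w = {"I:left=gohan": 1.5, "S:left=kare": 0.5, "balance:right": -0.25}
phi = {"I:left=gohan": 1.0, "S:left=kare": 1.0, "unseen-feature": 2.0}

s = score(phi, w)  # 1.5 + 0.5 + 0.0
```

Storing features as sparse dictionaries keeps the cost proportional to the number of active features rather than the size of the whole feature set.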
Note: Latent Variable Ambiguity
[Figure: two different BTG trees over kare wa gohan o tabeta that both yield kare wa tabeta gohan o]
● Out of these, we want easy-to-reproduce trees
● [DeNero+ 11] finds trees with a bilingual parsing model
● Our model discovers trees during training
Training: Latent Online Learning
● Find
  ● the model parse $\tilde{D}$ of maximal score, and
  ● the oracle parse $\hat{D}$ of maximal score among parses of minimal loss
  $\tilde{D} = \operatorname*{argmax}_{D} S(F, D; w)$
  $\hat{D} = \operatorname*{argmax}_{D \in \operatorname*{argmin}_{D} L(F, D)} S(F, D; w)$
● Adjust weights (example: perceptron)
  $w \leftarrow w + \phi(F, \hat{D}) - \phi(F, \tilde{D})$
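One perceptron step of this kind can be sketched as follows. This is a simplified illustration under a strong assumption: the candidate derivations are given as an explicit list of (features, loss) pairs, whereas a real system would search the space of BTG trees. All names and numbers here are hypothetical.

```python
from typing import Dict, List, Tuple

Features = Dict[str, float]
Candidate = Tuple[Features, float]  # (phi(F, D), L(F, D))


def score(phi: Features, w: Features) -> float:
    """Linear score: sum of weight * feature value."""
    return sum(w.get(name, 0.0) * value for name, value in phi.items())


def perceptron_step(candidates: List[Candidate], w: Features) -> Features:
    """Return w + phi(oracle) - phi(model) for one training sentence."""
    # Model parse: highest score under the current weights.
    model_phi, _ = max(candidates, key=lambda c: score(c[0], w))
    # Oracle parse: highest score among the minimal-loss candidates.
    min_loss = min(loss for _, loss in candidates)
    oracle_phi, _ = max((c for c in candidates if c[1] == min_loss),
                        key=lambda c: score(c[0], w))
    updated = dict(w)
    for name, value in oracle_phi.items():
        updated[name] = updated.get(name, 0.0) + value
    for name, value in model_phi.items():
        updated[name] = updated.get(name, 0.0) - value
    return updated


# Two zero-loss derivations ("a", "b") and one high-loss derivation ("c").
candidates = [({"a": 1.0}, 0.0), ({"b": 1.0}, 0.0), ({"c": 1.0}, 3.0)]
w = {"a": 0.25, "b": 0.5, "c": 1.0}
w_new = perceptron_step(candidates, w)
```

Here the model prefers "c", while the oracle is "b", the zero-loss derivation that the current weights already score highest. Choosing the oracle this way is how the latent tree gets resolved toward derivations the model finds easy to reproduce.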
Considering Loss in Online Learning
● Consider loss (how bad is the mistake?)
  kare wa tabeta gohan o   reference (L=0)
  kare wa gohan tabeta o   L=1
  o gohan tabeta wa kare   L=8
● Make it easy to choose trees with high loss in training
  → To avoid high-loss trees, the model must give them a large penalty
  $\tilde{D} = \operatorname*{argmax}_{D} \bigl( S(F, D; w) + L(F, D) \bigr)$
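The effect of adding the loss to the score during training can be seen in a small sketch. The three candidates and their losses are the ones listed on the slide; the model scores are invented for illustration, and both function names are hypothetical.

```python
from typing import List, Tuple

# (reordering, model score S(F, D; w), loss L(F, D)); scores are made up.
Candidate = Tuple[str, float, float]


def model_argmax(candidates: List[Candidate]) -> Candidate:
    """Plain decoding: pick the highest-scoring candidate."""
    return max(candidates, key=lambda c: c[1])


def loss_augmented_argmax(candidates: List[Candidate]) -> Candidate:
    """Training-time decoding: pick the candidate maximizing score + loss."""
    return max(candidates, key=lambda c: c[1] + c[2])


candidates = [
    ("kare wa tabeta gohan o", 5.0, 0.0),  # reference
    ("kare wa gohan tabeta o", 4.5, 1.0),
    ("o gohan tabeta wa kare", 1.0, 8.0),
]

plain = model_argmax(candidates)
augmented = loss_augmented_argmax(candidates)
```

With these invented scores, plain decoding already returns the reference, but loss-augmented decoding returns the high-loss candidate (1.0 + 8.0 beats 5.0 + 0.0). Training therefore keeps updating until bad candidates are scored below the reference by a margin that grows with their loss.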
Parser
Parsing Setup: Standard Discriminative Parser
● Features independent with respect to each node
● Parsing, reordering possible in O(n^3) with CKY
● Multi-word pre-terminals allowed
[Figure: BTG tree with nodes S, I, S over four pre-terminals (T) covering kare wa gohan o tabeta, so one pre-terminal spans two words]
Language Independent Features
● No linguistic analysis necessary
[Figure: an inverted (I) node over kare wa gohan o tabeta, with the numbers 7 12 39 12 5 under the five words]
● Lexical: Left, right, inside, outside, boundary words
● Class: Same as lexical, but over induced classes
● Phrase Table: Whether span exists in phrase table
● Balance: Left branching or right branching?
Language Dependent Features
[Figure: two inverted (I) nodes over kare wa gohan o tabeta, one annotated with POS tags and one with the supervised parser's constituents (waP, oP, VP)]
● POS Features: Same as lexical, but over POSs
● CFG Features: Whether nodes match supervised parser's spans
Reordering Losses
Reordering Losses [Talbot+ 11]: Chunk Fragmentation
● How many chunks are necessary to reproduce reference?
  System reordering:    <s> kare wa gohan o tabeta </s>
  Reference reordering: <s> kare wa tabeta gohan o </s>
  Loss:     $L_{chunk}(F, \tilde{D}) = \text{Number of Chunks} - 1$
  Accuracy: $A_{chunk}(F, \tilde{D}) = 1 - (\text{Number of Chunks} - 1)/(J+1)$
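The chunk fragmentation loss for this example can be computed with a short sketch. It assumes that J is the number of words in the sentence (the slide does not define J) and that every word is distinct, which holds for this toy sentence but not in general. The function names are hypothetical.

```python
from typing import List, Sequence


def ranks(system: Sequence[str], reference: Sequence[str]) -> List[int]:
    """Map each system word to its position in the reference order.

    Assumes all words are distinct (true for this toy example only).
    """
    position = {word: i for i, word in enumerate(reference)}
    return [position[word] for word in system]


def chunk_loss(system: Sequence[str], reference: Sequence[str]) -> int:
    """Number of chunks - 1, with <s> and </s> added as sentinels."""
    r = [-1] + ranks(system, reference) + [len(reference)]
    # A new chunk starts wherever the next word does not directly
    # follow the previous one in the reference order.
    return sum(1 for a, b in zip(r, r[1:]) if b != a + 1)


def chunk_accuracy(system: Sequence[str], reference: Sequence[str]) -> float:
    """1 - (Number of Chunks - 1) / (J + 1), with J = sentence length."""
    return 1.0 - chunk_loss(system, reference) / (len(reference) + 1)


system = "kare wa gohan o tabeta".split()
reference = "kare wa tabeta gohan o".split()

loss = chunk_loss(system, reference)          # 3
accuracy = chunk_accuracy(system, reference)  # 1 - 3/6 = 0.5
```

The system order splits into four chunks, "<s> kare wa", "gohan o", "tabeta", and "</s>", so the loss is 3.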
Reordering Losses [Talbot+ 11]: Kendall's Tau
● How many pairs of words are in reversed order?
  System reordering:    kare wa gohan o tabeta
  Reference reordering: kare wa tabeta gohan o
  Loss:     $L_{tau}(F, \tilde{D}) = \text{Number of Reversed Word Pairs}$
  Accuracy: $A_{tau}(F, \tilde{D}) = 1 - \text{Reversed Word Pairs}/\text{Potential Reversed Word Pairs}$
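The Kendall's tau loss for the same example can be sketched by counting inversions. As a simplifying assumption, all words are distinct, and the number of potential reversed pairs is taken to be n(n-1)/2 for n words. The function names are hypothetical.

```python
from typing import List, Sequence


def ranks(system: Sequence[str], reference: Sequence[str]) -> List[int]:
    """Map each system word to its position in the reference order.

    Assumes all words are distinct (true for this toy example only).
    """
    position = {word: i for i, word in enumerate(reference)}
    return [position[word] for word in system]


def tau_loss(system: Sequence[str], reference: Sequence[str]) -> int:
    """Count word pairs whose relative order differs from the reference."""
    r = ranks(system, reference)
    return sum(1
               for i in range(len(r))
               for j in range(i + 1, len(r))
               if r[i] > r[j])


def tau_accuracy(system: Sequence[str], reference: Sequence[str]) -> float:
    """1 - reversed pairs / potential reversed pairs."""
    n = len(reference)
    if n < 2:  # no pairs to reverse
        return 1.0
    return 1.0 - tau_loss(system, reference) / (n * (n - 1) / 2)


system = "kare wa gohan o tabeta".split()
reference = "kare wa tabeta gohan o".split()

loss = tau_loss(system, reference)  # (gohan, tabeta) and (o, tabeta)
accuracy = tau_accuracy(system, reference)
```

Two of the ten possible pairs are reversed here, so the loss is 2. The quadratic loop is fine for a sketch; a merge-sort based inversion count would be the usual choice for long sentences.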
Calculating Loss by Node
● Large-margin training, must calculate loss efficiently
  $\tilde{D} = \operatorname*{argmax}_{D} \bigl( S(F, D; w) + L(F, D) \bigr)$
● Can factor loss by node as well (detail in paper)
[Figure: for both Tau and Chunk, the loss at a node over kare wa gohan o tabeta is split into L_left, L_right, and L_between]
Experiments
Experimental Setup
● English-Japanese and Japanese-English translation
● Data from the Kyoto Free Translation Task

            sent.   word (ja)   word (en)
  RM-train  602     14.5k       14.3k      (manually aligned)
  RM-test   555     11.2k       10.4k      (manually aligned)
  LM/TM     329k    6.08M       5.91M
  tune      1166    26.8k       24.3k
  test      1160    28.5k       26.7k
Experimental Setup
● Reordering Model Training:
  ● 500 iterations
  ● Using Pegasos with regularization constant 10^-3
  ● Default: chunk fragmentation loss, standard features
● Translation: Moses with lexicalized reordering
● Compare: Original order, 3-step training, the proposed method
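For readers unfamiliar with Pegasos, the sketch below shows the textbook stochastic subgradient step with learning rate 1/(λt), applied to the difference between oracle and model feature vectors. How the authors combined Pegasos with the latent parser is not stated on the slides, so this pairing is an assumption. The function name, feature names, and numbers are hypothetical, and the optional projection step of Pegasos is omitted.

```python
from typing import Dict

Features = Dict[str, float]


def pegasos_step(w: Features, oracle_phi: Features, model_phi: Features,
                 t: int, lam: float = 1e-3) -> Features:
    """One Pegasos-style update at iteration t (t starts at 1).

    w <- (1 - eta * lam) * w + eta * (phi(oracle) - phi(model)),
    with learning rate eta = 1 / (lam * t).
    """
    eta = 1.0 / (lam * t)
    # Shrink all existing weights (the L2 regularization part).
    updated = {name: (1.0 - eta * lam) * value for name, value in w.items()}
    # Move toward the oracle features and away from the model features.
    for name, value in oracle_phi.items():
        updated[name] = updated.get(name, 0.0) + eta * value
    for name, value in model_phi.items():
        updated[name] = updated.get(name, 0.0) - eta * value
    return updated


# Toy numbers chosen so the arithmetic is exact: eta = 1 / (0.5 * 4) = 0.5.
w_new = pegasos_step({"a": 4.0}, oracle_phi={"b": 2.0}, model_phi={"a": 2.0},
                     t=4, lam=0.5)
```

Compared with the plain perceptron, the shrinking factor keeps the weights small, and the decaying learning rate makes later updates progressively gentler. In a full implementation the update would be applied only when the loss-augmented model parse differs from the oracle.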
Result: Proposed Model Improves Reordering
● Results for chunk fragmentation/Kendall's Tau
[Figure: two bar charts, Chunk and Tau accuracy (axis 50 to 100), comparing Orig, 3-Step, and Proposed on en-ja and ja-en]
Result: Proposed Model Improves Translation
● Results for BLEU and RIBES
[Figure: two bar charts, BLEU (axis 15 to 25) and RIBES (axis 60 to 75), comparing Orig, 3-Step, and Proposed on en-ja and ja-en]
Result: Adding Linguistic Info (Generally) Helps
[Figure: two bar charts, BLEU (axis 15 to 25) and RIBES (axis 60 to 75), comparing Orig, Standard, +POS, and +CFG on en-ja and ja-en]
Result: Training Loss Affects Reordering
● Optimized criterion is higher on test set as well
[Figure: two bar charts, Chunk and Tau accuracy (axis 50 to 100), comparing Orig and models trained with Chunk, Tau, and Chunk+Tau losses on en-ja and ja-en]
Result: Training Loss Affects Translation
● Optimizing chunk fragmentation generally gives best results
[Figure: two bar charts, BLEU (axis 15 to 25) and RIBES (axis 60 to 75), comparing Orig and models trained with Chunk, Tau, and Chunk+Tau losses on en-ja and ja-en]