An Empirical Comparison of Features and Tuning for Phrase-based Machine Translation Spence Green with Daniel Cer and Chris Manning Stanford University WMT // 27 June 2014
Recap: ACL13 Results
SGD-based, n-best learning
L1 feature selection
BOLT-scale Zh–En on NIST data:

System            BLEU   Δ
MERT              48.4
SGD               48.1
SGD + Features    49.9   +1.5  :-)
Motivation #1: WMT13 Shared Task :-(
[Plot: BLEU on newtest2008-2011 (y-axis, roughly 29 to 32) vs. epoch (x-axis, 1 to 10) for two models: dense and feature-rich]
Motivation #1: WMT13 Shared Task
En–Fr news2012 (dev):

System            BLEU   Δ
Dense             31.1
SGD + Features    31.5   +0.4
Motivation #2: Practical Issues
Q1: Which phrase-based features should I use?
Q2: Why don't my features help?
My Frustrating Summer...
What's wrong with feature-rich MT?
1. Loss function
2. References and scoring functions
3. Representation: features
This paper as a pain reliever...
Loss Function
ACL13: Online PRO
Sensitive to length
Doesn't optimize top-k
Slow to compute (sampling)
This work: Online Expected Error
Expected BLEU:
$$\ell_t(w_{t-1}) = \mathbb{E}_{p_{w_{t-1}}}\left[-\mathrm{BLEU}(d)\right] = -\sum_{d \in H} p_{w_{t-1}}(d) \cdot \mathrm{BLEU}(d)$$
Smooth, non-convex
Fast, less sensitive to length
...but still doesn't prefer top-k
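As a rough illustration of the expected BLEU objective above, the sketch below computes the loss over an n-best list H. It assumes that p is a softmax over the model scores of the hypotheses in H and that a sentence-level BLEU value is already available for each hypothesis; the slide states neither detail, and the name `expected_bleu_loss` is hypothetical.

```python
import math


def expected_bleu_loss(scores, bleus):
    """Negative expected BLEU over an n-best list H.

    scores: model score of each derivation d in H (assumed log-linear)
    bleus:  sentence-level BLEU of each d (assumed precomputed)
    """
    if not scores or len(scores) != len(bleus):
        raise ValueError("scores and bleus must be non-empty and equal in length")
    # Assumption: p(d) is a softmax over the n-best scores.
    # Subtracting the max score keeps exp() numerically stable.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    probs = [e / z for e in exps]
    # Loss = -sum_d p(d) * BLEU(d)
    return -sum(p * b for p, b in zip(probs, bleus))


# Two equally scored hypotheses: the loss is minus their average BLEU.
loss = expected_bleu_loss([0.0, 0.0], [0.2, 0.4])
```

Because the loss is a probability-weighted average, it decreases whenever probability mass moves toward higher-BLEU hypotheses, which is what makes it smooth in the scores. It does not single out the top-k hypotheses, in line with the caveat on the slide.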
References and Scoring
Single vs. Multiple References
Experiment: Compute BLEU+1 for each reference
Baseline MT system
Ar–En NIST MT05 has five (5) references
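The experiment above can be sketched as follows: score one hypothesis against each reference separately, then take the minimum and maximum. This is a minimal sketch, not the authors' implementation. It assumes the common BLEU+1 variant that adds one to the numerator and denominator of the n-gram precisions for n > 1, whitespace-tokenized input, and a 0 to 100 scale; the function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu_plus_one(hyp, ref, max_n=4):
    """Sentence-level BLEU+1 of hyp against a single reference (0 to 100)."""
    if not hyp:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        match = sum(min(c, r[g]) for g, c in h.items())  # clipped matches
        total = sum(h.values())
        if n > 1:  # assumed smoothing: add one for orders above unigram
            match, total = match + 1, total + 1
        if match == 0:  # only possible for unigrams
            return 0.0
        log_prec += math.log(match / total) / max_n
    # Brevity penalty in log space: 0 when the hypothesis is long enough.
    log_bp = min(0.0, 1.0 - len(ref) / len(hyp))
    return 100.0 * math.exp(log_bp + log_prec)


def min_max_bleu(hyp, refs):
    """Minimum and maximum BLEU+1 across the individual references."""
    scores = [bleu_plus_one(hyp, ref) for ref in refs]
    return min(scores), max(scores)


hyp = "the cat sat on the mat".split()
refs = ["the cat sat on the mat".split(), "a dog ran".split()]
lo, hi = min_max_bleu(hyp, refs)
```

A large gap between the two values for the same hypothesis means that the score depends heavily on which reference happens to be used.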
MT05: Max. vs. Min. BLEU+1
[Scatter plot: maximum BLEU+1 (y-axis, 0 to 100) vs. minimum BLEU+1 (x-axis, 0 to 100)]