An Empirical Comparison of Features and Tuning for Phrase-based Machine Translation

Spence Green, with Daniel Cer and Chris Manning
Stanford University
WMT // 27 June 2014

Recap: ACL13 Results
SGD-based, n-best learning
L1 feature selection
BOLT-scale Zh–En on NIST data:

System            BLEU    Δ
MERT              48.4
SGD               48.1
SGD + Features    49.9    +1.5 :-)

Motivation #1: WMT13 Shared Task :-(
[Figure: BLEU on newtest2008-2011 (y-axis, roughly 29 to 32) against training epoch (x-axis, 1 to 10) for the dense and feature-rich models.]

Motivation #1: WMT13 Shared Task
En–Fr news2012 (dev)

System            BLEU
Dense             31.1
SGD + Features    31.5 (+0.4)

Motivation #2: Practical Issues
Q1: Which phrase-based features should I use?
Q2: Why don’t my features help?

My Frustrating Summer...
What’s wrong with feature-rich MT?
1. Loss function
2. References and scoring functions
3. Representation: features
This paper as a pain reliever...

Loss Function

ACL13: Online PRO
Sensitive to length
Doesn’t optimize top-k
Slow to compute (sampling)

This work: Online Expected Error
Expected BLEU:

$$\ell_t(w_{t-1}) = \mathbb{E}_{p_{w_{t-1}}}\left[-\mathrm{BLEU}(d)\right] = -\sum_{d \in H} p_{w_{t-1}}(d) \cdot \mathrm{BLEU}(d)$$

Smooth, non-convex
Fast, less sensitive to length
...but still doesn’t prefer top-k
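The expected BLEU loss above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weight vector is written as w, the distribution p_w(d) is taken to be a softmax over the model scores of the derivations in the n-best list H (the slide does not say how p is normalized), and the function names `expected_bleu_loss` and `expected_bleu_grad` are hypothetical. The gradient is given with respect to the model scores; for a linear model it would be multiplied through by each derivation's feature vector.

```python
import math


def _softmax(scores):
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def expected_bleu_loss(scores, bleus):
    """Negative expected BLEU over an n-best list H.

    scores: model score of each derivation d in H (assumed to be w . f(d))
    bleus:  sentence-level BLEU of each derivation, in the same order
    """
    probs = _softmax(scores)  # assumption: p_w(d) is a softmax over H
    return -sum(p * b for p, b in zip(probs, bleus))


def expected_bleu_grad(scores, bleus):
    """Gradient of the loss with respect to each model score.

    d loss / d score_i = -p_i * (BLEU_i - E_p[BLEU])
    """
    probs = _softmax(scores)
    expected = sum(p * b for p, b in zip(probs, bleus))
    return [-p * (b - expected) for p, b in zip(probs, bleus)]
```

With two equally scored derivations whose BLEU values are 0.2 and 0.4, the loss is the negated average, and the gradient pushes the score of the better derivation up and the worse one down. Nothing in the loss singles out the top-k entries of the list, which matches the caveat on the slide.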

References and Scoring

Single vs. Multiple References
Experiment: Compute BLEU+1 for each reference
Baseline MT system
Ar–En NIST MT05 has five references
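A minimal sketch of this experiment, scoring one hypothesis against each reference separately and taking the maximum and minimum. The slide does not define BLEU+1; the code assumes the common add-one smoothing of the n-gram precisions for n > 1, and the names `bleu_plus_one` and `max_min_bleu` are hypothetical. Scores are on a 0 to 1 scale; multiply by 100 to match the slides.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    # Multiset of n-grams in a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu_plus_one(hyp, ref, max_n=4):
    """Sentence-level BLEU against a single reference.

    Assumption: add-one smoothing is applied to the n-gram precisions
    for n > 1; the unigram precision is left unsmoothed.
    """
    if not hyp:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        match = sum(min(count, r[gram]) for gram, count in h.items())
        total = sum(h.values())
        if n > 1:
            match, total = match + 1, total + 1
        if match == 0:  # only possible for unigrams
            return 0.0
        log_prec += math.log(match / total) / max_n
    # Brevity penalty: 1 if the hypothesis is at least as long as the reference.
    bp = min(1.0, math.exp(1.0 - len(ref) / len(hyp)))
    return bp * math.exp(log_prec)


def max_min_bleu(hyp, refs):
    # Score the hypothesis against each reference on its own.
    scores = [bleu_plus_one(hyp, ref) for ref in refs]
    return max(scores), min(scores)
```

For example, `max_min_bleu("the cat sat".split(), ["the cat sat".split(), "a dog ran".split()])` scores the hypothesis once per reference; a large gap between the two values means the score depends heavily on which reference happens to be used.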

MT05: Max. vs. Min. BLEU+1
[Figure: scatter plot of maximum BLEU+1 (y-axis, 0 to 100) against minimum BLEU+1 (x-axis, 0 to 100).]
