Feature-Rich Translation by Quasi-Synchronous Lattice Parsing
Kevin Gimpel and Noah A. Smith
Introduction

Two trends in machine translation research:
- Many approaches to decoding:
  - Phrase-based
  - Hierarchical phrase-based
  - Tree-to-string
  - String-to-tree
  - Tree-to-tree
- Regardless of the decoding approach, adding richer features can improve translation quality.

Decoding algorithms are strongly tied to the features they permit.
Phrase-Based Decoding

Source: konnten sie es übersetzen ?
Target: could you translate it ?

Phrase Table:
1. konnten → could
2. konnten sie → could you
3. es übersetzen → translate it
4. sie es übersetzen → you translate it
5. es → it
6. ? → ?
...

[Figure: search graph of partial hypotheses; phrase pairs 2, 3, and 6 combine to produce "could you translate it ?"]

Model components:
- Phrase pairs
- N-gram language model
- Phrase distortion/reordering
- Coverage constraints
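To make the pipeline concrete, here is a minimal sketch of monotone phrase-based decoding over this phrase table. The phrase scores are invented, and a real decoder would add the n-gram language model, distortion costs, coverage bookkeeping, and beam search; this only enumerates left-to-right segmentations.

```python
# Hypothetical scores; a toy monotone phrase-based decoder, not a real one.
PHRASES = {  # source phrase -> list of (target phrase, score)
    ("konnten",): [("could", -0.5)],
    ("konnten", "sie"): [("could you", -0.2)],
    ("es", "übersetzen"): [("translate it", -0.3)],
    ("sie", "es", "übersetzen"): [("you translate it", -0.6)],
    ("es",): [("it", -0.4)],
    ("?",): [("?", -0.1)],
}

def decode(src, pos=0):
    """Best (score, target words) for a monotone segmentation of
    src[pos:], or None if the remainder cannot be covered."""
    if pos == len(src):
        return (0.0, [])
    best = None
    for end in range(pos + 1, len(src) + 1):
        for tgt, sc in PHRASES.get(tuple(src[pos:end]), []):
            rest = decode(src, end)
            if rest is not None:
                cand = (sc + rest[0], [tgt] + rest[1])
                if best is None or cand[0] > best[0]:
                    best = cand
    return best

score, words = decode("konnten sie es übersetzen ?".split())
print(" ".join(words), score)  # could you translate it ?  (score ~ -0.6)
```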
Hierarchical Phrase-Based Decoding

Source: konnten sie es übersetzen ? (word positions 0-5)
Target: could you translate it ?

SCFG Rules:
1. X → ⟨es übersetzen, translate it⟩
2. X → ⟨es, it⟩
3. X → ⟨übersetzen, translate⟩
4. X → ⟨konnten sie X ?, could you X ?⟩
5. X → ⟨konnten sie X1 X2 ?, could you X2 X1 ?⟩
...

[Figure: CKY chart items over spans (2,3), (3,4), (2,4), and (0,5), combining rules 1-5 to derive "could you translate it ?"]

Model components:
- SCFG rules
- N-gram language model
- Coverage constraints
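The sketch below shows SCFG rule composition on this example. It is a naive matcher with rules 1-5 hardcoded (no chart, no scores, not a real Hiero decoder): it finds all ways a rule's source side matches a span, translates each gap recursively, and reads the target string off the derivation.

```python
import itertools

RULES = [  # (source side, target side); "X1"/"X2" mark nonterminal gaps
    (("es", "übersetzen"), ("translate", "it")),
    (("es",), ("it",)),
    (("übersetzen",), ("translate",)),
    (("konnten", "sie", "X1", "?"), ("could", "you", "X1", "?")),
    (("konnten", "sie", "X1", "X2", "?"), ("could", "you", "X2", "X1", "?")),
]

def match(pattern, span, binding=None):
    """Yield {gap: subspan} bindings under which pattern matches span."""
    binding = dict(binding or {})
    if not pattern:
        if not span:
            yield binding
        return
    head, rest = pattern[0], pattern[1:]
    if head.startswith("X"):
        for cut in range(1, len(span) + 1):  # gaps must be nonempty
            yield from match(rest, span[cut:], {**binding, head: span[:cut]})
    elif span and span[0] == head:
        yield from match(rest, span[1:], binding)

def translations(span):
    """All target strings derivable for the source tuple `span`."""
    out = set()
    for src, tgt in RULES:
        for b in match(src, span):
            subs = {gap: translations(sub) for gap, sub in b.items()}
            pools = [subs.get(tok, {tok}) for tok in tgt]
            for combo in itertools.product(*pools):
                out.add(" ".join(combo))
    return out

print(translations(tuple("konnten sie es übersetzen ?".split())))
# {'could you translate it ?'}   (via rule 4 + rule 1, or rule 5 + rules 2, 3)
```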
Our goal: an MT framework that allows as many features as possible without committing to any particular decoding approach.
Overview

- An initial step toward a "universal decoder" that can permit any feature of source and target words, trees, and alignments
- An experimental platform for comparing formalisms, feature sets, and training methods
- Building blocks:
  - Quasi-synchronous grammar (Smith & Eisner 2006)
  - Generic approximate inference methods for non-local features (Chiang 2007; Gimpel & Smith 2009)
Outline

- Introduction
- Model
- Quasi-Synchronous Grammar
- Training and Decoding
- Experiments
- Conclusions and Future Work
Parameterization

$$\langle t^{*}, \tau_t^{*}, a^{*} \rangle = \operatorname*{argmax}_{t,\, \tau_t,\, a}\; p(t, \tau_t, a \mid s, \tau_s)$$

where $s$ and $\tau_s$ are the source words and source tree, $t$ and $\tau_t$ are the target words and target tree, and $a$ is an alignment of target tree nodes to source tree nodes.

We use a single globally-normalized log-linear model:

$$p(t, \tau_t, a \mid s, \tau_s) = \frac{\exp\{\theta^{\top} g(s, \tau_s, a, t, \tau_t)\}}{\sum_{t', \tau_t', a'} \exp\{\theta^{\top} g(s, \tau_s, a', t', \tau_t')\}}$$

Features can look at any part of any structure.
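As a concrete illustration: the score used at decoding time is just the dot product in the numerator, since the partition function in the denominator matters for training, not for finding the argmax. The feature names and values below are invented.

```python
def loglinear_score(theta, g):
    """theta^T g for sparse feature vectors stored as dicts."""
    return sum(theta.get(name, 0.0) * value for name, value in g.items())

# made-up feature vector g(...) for one candidate (t, tau_t, a)
g = {"config=parent-child": 3.0, "lex:es->it": 1.0, "lm:logprob": -4.7}
theta = {"config=parent-child": 0.8, "lex:es->it": 1.1, "lm:logprob": 0.5}
print(loglinear_score(theta, g))  # 2.4 + 1.1 - 2.35 = 1.15
```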
Features

- Log-linear models allow "arbitrary" features, but in practice inference algorithms must be developed to support each feature set.
- Many types of features appear in MT:
  - lexical word and phrase mappings
  - N-gram and syntactic language models
  - distortion/reordering
  - hierarchical phrase mappings
  - syntactic transfer rules
- We want to use all of these!
Quasi-Synchronous Grammar (Smith & Eisner 2006)

- A quasi-synchronous grammar (QG) is a model of $p(t, \tau_t, a \mid s, \tau_s)$.
- To model target trees, any monolingual formalism can be used; we use a quasi-synchronous dependency grammar (QDG).
- Each node in the target tree is aligned to zero or more nodes in the source tree (for a QDG, nodes = words).
- Placing hard constraints on the alignments yields a synchronous grammar; in a QG, departures from synchrony are instead penalized softly using features.
[Figure: aligned dependency trees over "$ konnten sie es übersetzen ?" and "$ could you translate it ?"]

For every parent-child pair in the target sentence, what is the relationship of the source words they are linked to?

If every target parent-child pair is in the "parent-child" configuration, we recover a synchronous dependency grammar.
Many other configurations are possible. For example, "same node": in
"$ wo kann ich untergrundbahnkarten kaufen ?" / "$ where can i buy subway tickets ?",
the target words "subway" and "tickets" both align to the single source word "untergrundbahnkarten".

The full inventory of configurations:
- Parent-child
- Child-parent
- Same node
- Sibling
- Grandparent/child
- Grandchild/parent
- C-command
- Parent null
- Child null
- Both null
- Other
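A sketch of how these configurations can be computed, under the simplifying assumption that each target word aligns to at most one source word (the model allows more). `src_head` encodes the source dependency tree; the function names, the c-command test, and the example tree are our illustrative choices, not the paper's code.

```python
def ancestors(src_head, node):
    """Indices on the head chain above `node`; src_head[i] is the
    parent of source word i (None at the root)."""
    while src_head[node] is not None:
        node = src_head[node]
        yield node

def configuration(src_head, a_parent, a_child):
    """Configuration for a target parent-child pair whose words align
    to source indices a_parent / a_child (None = unaligned)."""
    if a_parent is None and a_child is None:
        return "both null"
    if a_parent is None:
        return "parent null"
    if a_child is None:
        return "child null"
    if a_parent == a_child:
        return "same node"
    if src_head[a_child] == a_parent:
        return "parent-child"
    if src_head[a_parent] == a_child:
        return "child-parent"
    if src_head[a_parent] is not None and src_head[a_parent] == src_head[a_child]:
        return "sibling"
    if src_head[a_child] is not None and src_head[src_head[a_child]] == a_parent:
        return "grandparent/child"
    if src_head[a_parent] is not None and src_head[src_head[a_parent]] == a_child:
        return "grandchild/parent"
    if src_head[a_parent] in set(ancestors(src_head, a_child)):
        return "c-command"  # simplified: a_parent's head dominates a_child
    return "other"

# one plausible analysis of "$ konnten sie es übersetzen ?"
src_head = [None, 0, 1, 4, 1, 1]
print(configuration(src_head, 1, 2))  # could->you aligns to konnten->sie: parent-child
print(configuration(src_head, 4, 3))  # translate->it aligns to übersetzen->es: parent-child
```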
Coverage Features

- There are no hard constraints to ensure that all source words get translated.
- While QG has been used for several tasks, it has not previously been used for generation.
- We add coverage features and learn their weights.
Coverage feature                                                Weight
Word never translated                                           -2.21
Word translated that was translated at least N times already:
  N = 0                                                          1.48
  N = 1                                                         -3.04
  N = 2                                                         -0.22
  N = 3                                                         -0.05

[Plot: total coverage score (y-axis, roughly 2 down to -6) as a function of the number of times a word is translated (x-axis, 0 to 5)]
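The plotted curve can be reconstructed from the weights under one assumption (ours, not stated on the slide): when a word that has already been translated n times is translated again, every "at least N" feature with N ≤ n fires.

```python
NEVER_TRANSLATED = -2.21
AT_LEAST = {0: 1.48, 1: -3.04, 2: -0.22, 3: -0.05}  # weights from the table

def coverage_score(times_translated):
    """Total coverage score one source word contributes to a hypothesis."""
    if times_translated == 0:
        return NEVER_TRANSLATED
    total = 0.0
    for n in range(times_translated):  # the (n+1)-th translation of the word
        total += sum(w for N, w in AT_LEAST.items() if n >= N)
    return total

for k in range(6):
    print(k, round(coverage_score(k), 2))
# 0 -2.21 | 1 1.48 | 2 -0.08 | 3 -1.86 | 4 -3.69 | 5 -5.52
# translating each word exactly once is rewarded; everything else is penalized
```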
Decoding

- A QDG induces a monolingual grammar for a source sentence whose language consists of all possible translations.
- Decoding:
  - Build a weighted lattice encoding the language of this grammar.
  - Perform lattice parsing with a dependency grammar, an extension of dependency parsing algorithms for strings (Eisner 1997).
  - Integrate non-local features via cube pruning/decoding (Chiang 2007; Gimpel & Smith 2009).
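For the last step, here is a minimal sketch of the k-best grid merge at the heart of cube pruning (Chiang 2007). The `combine` argument is a stand-in for the true score of joining two chart items plus non-local feature terms; with non-local terms the grid is no longer monotone, so the merge becomes approximate.

```python
import heapq

def cube_prune(left, right, k, combine):
    """Merge two descending k-best score lists, exploring the
    |left| x |right| grid best-first and keeping the top k results."""
    if not left or not right:
        return []
    seen = {(0, 0)}
    heap = [(-combine(left[0], right[0]), 0, 0)]  # max-heap via negation
    out = []
    while heap and len(out) < k:
        neg, i, j = heapq.heappop(heap)
        out.append(-neg)
        for ni, nj in ((i + 1, j), (i, j + 1)):   # push frontier neighbors
            if ni < len(left) and nj < len(right) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-combine(left[ni], right[nj]), ni, nj))
    return out

# toy usage: additive scores (the monotone case, where the merge is exact)
print(cube_prune([3.0, 1.0], [2.0, 0.5], k=3, combine=lambda a, b: a + b))
# [5.0, 3.5, 3.0]
```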
[Figure: lattice construction and parsing for "$ konnten sie es übersetzen ?", yielding "could you translate it ?"]