A Discriminative Model for Semantics-to-String Translation


  1. A Discriminative Model for Semantics-to-String Translation
     Aleš Tamchyna¹, Chris Quirk², and Michel Galley²
     ¹Charles University in Prague   ²Microsoft Research
     July 30, 2015

  2. Introduction
     - State-of-the-art MT models still use a simplistic view of the data:
       ◮ words typically treated as independent, unrelated units
       ◮ relations between words only captured through linear context
     - Unified semantic representations, such as Abstract Meaning Representation (AMR; Banarescu et al., 2013), are (re)gaining popularity
     - Abstraction from surface words, semantic relations made explicit, related words brought together (even when distant in the surface realization)
     - Possible uses:
       ◮ richer models of source context ← our work
       ◮ target-side (or joint) models to capture semantic coherence
       ◮ semantic transfer followed by target-side generation

  3. Semantic Representation
     - Logical Form transformed into an AMR-style representation (Vanderwende et al., 2015)
     - Labeled directed graph, not necessarily acyclic (e.g. coreference)
     - Nodes ∼ content words, edges ∼ semantic relations
     - Function words (mostly) not represented as nodes
     - “Bits” capture various linguistic properties
     Figure 1: Logical Form (computed tree) for the sentence: I would like to give you a sandwich taken from the fridge.
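To make the representation concrete, here is a minimal Python sketch of an LF-style graph as described on this slide: nodes carry a lemma, a POS tag, and "bits", and edges are labeled and directed (cycles allowed). The class and attribute names (LFNode, LFGraph, bits, and the example relations) are illustrative assumptions, not the parser's actual data structures.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class LFNode:
    node_id: int
    lemma: str                      # content word (function words mostly omitted)
    pos: str                        # part of speech
    bits: List[str] = field(default_factory=list)  # linguistic properties, e.g. "Sing", "Pres"

@dataclass
class LFGraph:
    nodes: Dict[int, LFNode] = field(default_factory=dict)
    # labeled directed edges: (source node id, relation label, target node id)
    edges: List[Tuple[int, str, int]] = field(default_factory=list)

    def add_edge(self, src: int, label: str, tgt: int) -> None:
        # The graph need not be acyclic (e.g. coreference can create cycles).
        self.edges.append((src, label, tgt))

# Example: a fragment of "I would like to give you a sandwich ..."
g = LFGraph()
g.nodes[0] = LFNode(0, "like", "VERB", ["Pres"])
g.nodes[1] = LFNode(1, "i", "PRON", ["Sing"])
g.nodes[2] = LFNode(2, "give", "VERB", [])
g.add_edge(0, "Dsub", 1)   # subject relation
g.add_edge(0, "Dobj", 2)   # object/complement relation
```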

  4. Graph-to-String Translation
     - Translation = generation of target-side surface words in order, conditioned on source semantic nodes and previously generated words
     - Start in the (virtual) root
     - At each step, transition to a semantic node and emit a target word
     - A single node can be visited multiple times
     - One transition can move anywhere in the LF
     Source-side semantic graph: $G = (V, E)$, $V = \{n_1, \dots, n_S\}$, $E \subset V \times V$; target string $E = (e_1, \dots, e_T)$; alignment $A = (a_1, \dots, a_T)$, $a_i \in 0 \dots S$.

     $$P(A, E \mid G) = \prod_{i=1}^{T} P(a_i \mid a_1^{i-1}, e_1^{i-1}, G)\, P(e_i \mid a_1^{i}, e_1^{i-1}, G)$$
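The factorization above can be read as a step-by-step scoring loop. The sketch below computes log P(A, E | G) given two caller-supplied probability functions for transitions and emissions; their names and signatures are assumptions for illustration, and the graph G is assumed to be captured inside them.

```python
import math
from typing import Callable, List, Sequence

def log_prob_of_derivation(
    alignment: Sequence[int],          # a_1..a_T, semantic node ids (0 = virtual root)
    target: Sequence[str],             # e_1..e_T, target words
    transition_prob: Callable[[int, List[int], List[str]], float],
    emission_prob: Callable[[str, List[int], List[str]], float],
) -> float:
    total = 0.0
    prev_a: List[int] = []
    prev_e: List[str] = []
    for a_i, e_i in zip(alignment, target):
        # P(a_i | a_1..i-1, e_1..i-1, G)
        total += math.log(transition_prob(a_i, prev_a, prev_e))
        prev_a.append(a_i)
        # P(e_i | a_1..i, e_1..i-1, G)
        total += math.log(emission_prob(e_i, prev_a, prev_e))
        prev_e.append(e_i)
    return total
```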

  5. Translation Example
     Figure 2: An example of the translation process, illustrating the first few steps of translating the sentence into German (“Ich möchte dir einen Sandwich...”). Labels in italics (e.g. "Dsub^-1", "Dobj->Dind", "Dind^-1->Dobj") correspond to the shortest undirected paths between the nodes.
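Assuming a graph representation like the LFGraph sketch above, path labels of the kind shown in the figure (inverse edges marked with ^-1, steps joined by ->) could be computed with a breadth-first search over the graph treated as undirected. This is an illustrative reconstruction, not the paper's implementation.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

def path_label(edges: List[Tuple[int, str, int]], start: int, goal: int) -> Optional[str]:
    # Build an undirected adjacency list; remember whether each step goes
    # against the edge direction (marked with "^-1").
    adj: Dict[int, List[Tuple[int, str]]] = {}
    for src, label, tgt in edges:
        adj.setdefault(src, []).append((tgt, label))           # forward step
        adj.setdefault(tgt, []).append((src, label + "^-1"))   # inverted step
    # Standard BFS yields a shortest (fewest-edges) undirected path.
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, steps = queue.popleft()
        if node == goal:
            return "->".join(steps)        # "" for the empty path (start == goal)
        for nxt, step_label in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + [step_label]))
    return None  # no path found

# Example with the fragment built in the LFGraph sketch above:
# path_label(g.edges, 1, 2) -> "Dsub^-1->Dobj"
```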

  6. Alignment of Graph Nodes
     How do we align source-side semantic nodes to target-side words? Evaluated approaches:
     1. Gibbs sampling
     2. Direct GIZA++
     3. Alignment composition

  7. Alignment of Graph Nodes – Gibbs Sampling
     - Alignment (∼ transition) distribution modeled as a categorical distribution:
       $P(a_i \mid a_{i-1}, G) \propto c(\mathrm{label}(a_{i-1}, a_i))$
     - Translation (∼ emission) distribution modeled as a set of categorical distributions, one for each source semantic node:
       $P(e_i \mid n_{a_i}) \propto c(\mathrm{lemma}(n_{a_i}) \to e_i)$
     - Sample from the following distribution:
       $$P(t \mid n_i) \propto \frac{c(\mathrm{lemma}(n_i) \to t) + \alpha}{c(\mathrm{lemma}(n_i)) + \alpha L} \times \frac{c(\mathrm{label}(n_i, n_{i-1})) + \beta}{T + \beta P} \times \frac{c(\mathrm{label}(n_{i+1}, n_i)) + \beta}{T + \beta P}$$
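A direct transcription of the smoothed sampling weight into Python, assuming counts are kept in Counter objects keyed by (lemma, word) pairs and by path labels; the bookkeeping and parameter names here are illustrative, not the paper's code.

```python
from collections import Counter

def sampling_weight(
    lemma: str, t: str,                 # candidate source lemma and target word
    label_in: str, label_out: str,      # path labels to the previous / from the next position
    trans_counts: Counter,              # c(lemma -> t), keyed by (lemma, t)
    lemma_counts: Counter,              # c(lemma)
    label_counts: Counter,              # c(label)
    alpha: float, beta: float,          # smoothing hyperparameters
    L: int,                             # size of the target vocabulary
    P: int,                             # number of distinct path labels
    T: int,                             # total number of transitions
) -> float:
    # Smoothed translation (emission) term.
    translation = (trans_counts[(lemma, t)] + alpha) / (lemma_counts[lemma] + alpha * L)
    # Smoothed transition terms for the incoming and outgoing path labels.
    incoming = (label_counts[label_in] + beta) / (T + beta * P)
    outgoing = (label_counts[label_out] + beta) / (T + beta * P)
    return translation * incoming * outgoing
```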

  8. Alignment of Graph Nodes – Evaluation
     2. Direct GIZA++
       ◮ Linearize the LF, run GIZA++ (standard word alignment)
       ◮ Heuristic linearization, tries to preserve source surface word order
     3. Alignment composition
       ◮ Source-side nodes to source-side tokens: parser-provided alignment or GIZA++
       ◮ Source–target word alignment: GIZA++
     Manual inspection of alignments:
     - Alignment composition clearly superior
     - Not much difference between GIZA++ and parser alignments
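A minimal sketch of the alignment-composition step: node-to-source-token links (from the parser or GIZA++) are composed with source-to-target word alignments (GIZA++) to obtain node-to-target links. The data-structure choices and names are assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple

def compose_alignments(
    node_to_src: Dict[int, Set[int]],          # LF node id -> source token positions
    src_to_tgt: List[Tuple[int, int]],         # (source position, target position) pairs
) -> Dict[int, Set[int]]:
    # Index the word alignment by source position.
    tgt_for_src: Dict[int, Set[int]] = {}
    for s, t in src_to_tgt:
        tgt_for_src.setdefault(s, set()).add(t)
    # A node is aligned to every target token reachable through its source tokens.
    node_to_tgt: Dict[int, Set[int]] = {}
    for node, src_positions in node_to_src.items():
        node_to_tgt[node] = set()
        for s in src_positions:
            node_to_tgt[node] |= tgt_for_src.get(s, set())
    return node_to_tgt
```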

  9. Discriminative Translation Model
     A maximum-entropy classifier:

     $$P(e_i \mid n_{a_i}, n_{a_{i-1}}, G, e_{i-k+1}^{i-1}) = \frac{\exp\left(\vec{w} \cdot \vec{f}(e_i, n_{a_i}, n_{a_{i-1}}, G, e_{i-k+1}^{i-1})\right)}{Z}, \quad Z = \sum_{e' \in \mathrm{GEN}(n_{a_i})} \exp\left(\vec{w} \cdot \vec{f}(e', n_{a_i}, n_{a_{i-1}}, G, e_{i-k+1}^{i-1})\right)$$

     - Possible classes: top 50 translations observed with the given lemma
     - Online learning with stochastic gradient descent
     - Learning rate 0.05, cumulative L1 regularization with weight 1, batch size 1, 22 hash bits
     - Early stopping when held-out perplexity increases
     - Parallelized (multi-threading) and distributed learning for tractability
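A rough sketch of how the distribution over GEN(n_ai) might be computed with hashed features (22 hash bits, as on the slide). The feature-conjunction scheme and helper names are assumptions, and training (SGD with cumulative L1) is not shown.

```python
import math
from typing import Dict, List

HASH_BITS = 22
NUM_BUCKETS = 1 << HASH_BITS

def hash_feature(name: str) -> int:
    # Built-in hash() is salted per process; a real system would use a stable hash.
    return hash(name) % NUM_BUCKETS

def class_distribution(
    candidates: List[str],                  # GEN(n_ai): top-50 observed translations of the lemma
    features: Dict[str, float],             # context features shared by all candidates
    weights: List[float],                   # hashed weight vector of length NUM_BUCKETS
) -> Dict[str, float]:
    scores = {}
    for cand in candidates:
        # Conjoin each context feature with the candidate class, hash, and dot with weights.
        s = sum(v * weights[hash_feature(f"{name}^{cand}")] for name, v in features.items())
        scores[cand] = s
    z = sum(math.exp(s) for s in scores.values())       # partition function Z
    return {cand: math.exp(s) / z for cand, s in scores.items()}
```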

  10. Feature Set
     (Figure: the translation example from Slide 5, repeated for reference.)
     - Current node, previous node, parent node – lemma, POS, bits
     - Path from previous node – path length, path description
     - Bag of lemmas – capture the overall topic of the sentence
     - Graph context – features from nodes close in the graph (limited by the length of the shortest undirected path)
     - Generated tokens – “fertility”; some nodes should generate a function word first (e.g. an article) and then the content word
     - Previous tokens – target-side context
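A simplified illustration of some of these feature templates. The function takes plain values rather than the paper's internal structures, and the feature name strings are invented for the sketch; the real feature set is richer.

```python
from typing import Dict, List, Optional

def extract_features(
    cur_lemma: str, cur_pos: str, cur_bits: List[str],
    prev_lemma: Optional[str], path: Optional[str],   # path description, e.g. "Dsub^-1->Dobj"
    sentence_lemmas: List[str],          # all lemmas in the LF: rough topic of the sentence
    tokens_from_cur_node: int,           # "fertility": words already emitted from this node
    prev_tokens: List[str],              # previously generated target words
) -> Dict[str, float]:
    feats: Dict[str, float] = {f"cur_lemma={cur_lemma}": 1.0, f"cur_pos={cur_pos}": 1.0}
    for b in cur_bits:
        feats[f"cur_bit={b}"] = 1.0
    if prev_lemma is not None:
        feats[f"prev_lemma={prev_lemma}"] = 1.0
    if path is not None:
        feats[f"path={path}"] = 1.0                              # path description
        feats["path_len"] = 1.0 + float(path.count("->"))        # path length
    for lemma in sentence_lemmas:                                # bag of lemmas
        feats[f"bag_lemma={lemma}"] = 1.0
    feats[f"fertility={tokens_from_cur_node}"] = 1.0
    for k, tok in enumerate(reversed(prev_tokens[-2:])):         # target-side context
        feats[f"prev_tok_{k + 1}={tok}"] = 1.0
    return feats
```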

  11. Experiments
     Evaluated in an n-best re-ranking experiment:
     ◮ Generate 1000-best translations of devset sentences
     ◮ Add scores from our model
     ◮ Re-run MERT on the enriched n-best lists
     Basic phrase-based system, French → English, 1 million parallel training sentences.
     Obtained small but consistent improvements; differences would most likely be larger after integration into decoding.

     Dataset              Baseline   +Semantics
     WMT 2009 (devset)    17.44      17.55
     WMT 2010             17.59      17.64
     WMT 2013             17.41      17.55

     Table 1: BLEU scores of n-best reranking in French → English translation.
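A small sketch of the reranking step: each hypothesis's decoder feature vector is extended with the semantic model's score before MERT re-tunes the weights. MERT itself and the semantic scorer are assumed to exist elsewhere; the names here are illustrative.

```python
from typing import Callable, List, Tuple

def enrich_nbest(
    nbest: List[Tuple[str, List[float]]],        # (hypothesis, decoder feature scores)
    semantic_score: Callable[[str], float],      # score of the hypothesis under our model
) -> List[Tuple[str, List[float]]]:
    # Append the semantic model's score as one more feature per hypothesis.
    return [(hyp, scores + [semantic_score(hyp)]) for hyp, scores in nbest]

# The enriched 1000-best lists are then fed back to MERT to re-optimize the
# feature weights, and the reranked output is scored with BLEU (Table 1).
```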

  12. Conclusion
     - Initial attempt at including semantic features in statistical MT
     - Feature set comprising morphological, syntactic, and semantic properties
     - Small but consistent improvement in BLEU
     Future work:
     - Integrate directly in the decoder
     - Parser accuracy is limited – use multiple analyses
     - Explore other ways of integration:
       ◮ target-side models of semantic plausibility
       ◮ semantic transfer and generation

  13. Thank You! Questions?

  14. References
     Laura Banarescu, Claire Bonial, Shu Cai, Madalina Georgescu, Kira Griffitt, Ulf Hermjakob, Kevin Knight, Philipp Koehn, Martha Palmer, and Nathan Schneider. Abstract Meaning Representation for sembanking. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, pages 178–186, Sofia, Bulgaria, August 2013. Association for Computational Linguistics. URL http://www.aclweb.org/anthology/W13-2322.
     Lucy Vanderwende, Arul Menezes, and Chris Quirk. An AMR parser for English, French, German, Spanish and Japanese and a new AMR-annotated corpus. In Proceedings of the 2015 NAACL HLT Demonstration Session, Denver, Colorado, June 2015. Association for Computational Linguistics.
