

  1. Decoding and Inference with Syntactic Translation Models. Machine Translation, Lecture 15. Instructor: Chris Callison-Burch. TAs: Mitchell Stern, Justin Chiu. Website: mt-class.org/penn

  2. CFGs. A context-free grammar for the source sentence jon-ga ringo-o tabeta: S → NP VP ; VP → NP V ; V → tabeta ; NP → jon-ga ; NP → ringo-o. Output: jon-ga ringo-o tabeta

  3. Synchronous CFGs. Each source rule (S → NP VP ; VP → NP V ; V → tabeta ; NP → jon-ga ; NP → ringo-o) is paired with a target-language right-hand side, filled in on the next slide. Source: jon-ga ringo-o tabeta

  4. Synchronous CFGs: S → NP₁ VP₂ : NP₁ VP₂ (monotonic) ; VP → NP₁ V₂ : V₂ NP₁ (inverted) ; V → tabeta : ate ; NP → jon-ga : John ; NP → ringo-o : an apple. Source: jon-ga ringo-o tabeta
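These rules can be written down compactly in code. The sketch below uses a plain tuple encoding of my own (not any particular toolkit's): each rule pairs a source right-hand side with a target right-hand side, and an integer k on the target side means "the translation of the k-th source nonterminal goes here" (1-based), so [2, 1] encodes an inverted rule.

```python
# Synchronous rules from this slide, in a minimal illustrative encoding.
# Source-side nonterminals are uppercase strings; target-side integers
# index the source nonterminals (1-based).
SCFG_RULES = [
    # (LHS,  source RHS,        target RHS)
    ("S",  ["NP", "VP"],    [1, 2]),          # monotonic
    ("VP", ["NP", "V"],     [2, 1]),          # inverted
    ("V",  ["tabeta"],      ["ate"]),
    ("NP", ["jon-ga"],      ["John"]),
    ("NP", ["ringo-o"],     ["an", "apple"]),
]
```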

  5. Synchronous generation: the paired trees are expanded in parallel, rule by rule, from S. Output: jon-ga ringo-o tabeta : John ate an apple

  6. Translation as parsing: parse the source (jon-ga ringo-o tabeta) with the source sides of the rules, then project the parse to the target side (John ate an apple).

  7. A closer look at parsing
  • Parsing is usually done with dynamic programming
  • Share common computations and structure
  • Represent an exponential number of alternatives in polynomial space
  • With SCFGs there are two kinds of ambiguity: source parse ambiguity and translation ambiguity
  • Parse forests can represent both!

  8. A closer look at parsing
  • Any monolingual parser can be used (most often: CKY or variants on the CKY algorithm)
  • Parsing complexity is O(n³ · |G|³)
  • cubic in the length of the sentence (n³)
  • cubic in the number of non-terminals (|G|³)
  • adding nonterminal types increases parsing complexity substantially!
  • With few NTs, exhaustive parsing is tractable

  9. Parsing as deduction. An inference rule has antecedents, side conditions, and a consequent:
  Antecedents: A : u    B : v
  Side condition: φ
  Consequent: C : w
  "If A and B are true with weights u and v, and φ is also true, then C is true with weight w."

  10. Example: CKY. Inputs: a sentence f = ⟨f₁, f₂, ..., f_ℓ⟩ and a context-free grammar G in Chomsky normal form. Item form: [X, i, j] — a subtree rooted with NT type X spanning i to j has been recognized.

  11. Example: CKY.
  Goal: [S, 0, ℓ]
  Axioms: (X → fᵢ) ∈ G with weight w   ⊢   [X, i−1, i] : w
  Inference rule: [X, i, k] : u   and   [Y, k, j] : v   and   (Z → X Y) ∈ G with weight w   ⊢   [Z, i, j] : u × v × w
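This deduction system translates almost line for line into code. Below is a small sketch (the function name, variable names, and grammar encoding are my own simplifications, not the lecture's notation): the axioms seed the chart with items [X, i−1, i], and the inference rule combines adjacent items under a binary rule, keeping the best (Viterbi) weight per item.

```python
def cky(words, lexical_rules, binary_rules):
    """Weighted CKY following the deduction system above.

    lexical_rules: {terminal: [(X, w), ...]}   for rules  X -> terminal : w
    binary_rules:  {(X, Y): [(Z, w), ...]}     for rules  Z -> X Y : w
    Returns a chart {(X, i, j): best weight} of recognized items [X, i, j].
    """
    n = len(words)
    chart = {}

    def update(item, weight):
        # keep the best (Viterbi) weight per item
        if weight > chart.get(item, 0.0):
            chart[item] = weight

    # Axioms: (X -> f_i) in G with weight w   =>   [X, i-1, i] : w
    for i, word in enumerate(words, start=1):
        for X, w in lexical_rules.get(word, []):
            update((X, i - 1, i), w)

    # Inference rule: [X, i, k] : u,  [Y, k, j] : v,  (Z -> X Y) : w
    #                 =>   [Z, i, j] : u * v * w
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (X, Y), productions in binary_rules.items():
                    u, v = chart.get((X, i, k)), chart.get((Y, k, j))
                    if u is None or v is None:
                        continue
                    for Z, w in productions:
                        update((Z, i, j), u * v * w)
    return chart
```

Run on the "I saw her duck" grammar from the following slides (with uniform weights, another assumption of mine), it produces the same chart items, including NP,2,4, SBAR,2,4, VP,1,4 and the goal item S,0,4:

```python
lexical = {"I": [("PRP", 1.0)], "saw": [("V", 1.0)],
           "her": [("PRP", 1.0)], "duck": [("NN", 1.0), ("V", 1.0)]}
binary = {("PRP", "VP"): [("S", 1.0)],
          ("V", "NP"):   [("VP", 1.0)],
          ("V", "SBAR"): [("VP", 1.0)],
          ("PRP", "V"):  [("SBAR", 1.0)],
          ("PRP", "NN"): [("NP", 1.0)]}
print(sorted(cky("I saw her duck".split(), lexical, binary)))
# -> items such as ('NP', 2, 4), ('SBAR', 2, 4), ('VP', 1, 4), ('S', 0, 4), ...
```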

  12.–17. CKY example on the input "I saw her duck" (positions 0–4), with grammar: S → PRP VP ; VP → V NP ; VP → V SBAR ; SBAR → PRP V ; NP → PRP NN ; V → saw ; V → duck ; NN → duck ; PRP → I ; PRP → her. The axioms add the lexical items PRP,0,1 ; V,1,2 ; PRP,2,3 ; NN,3,4 ; V,3,4 to the chart.

  18. Combining PRP,2,3 and NN,3,4 with NP → PRP NN adds NP,2,4 to the chart.

  19. Combining PRP,2,3 and V,3,4 with SBAR → PRP V adds SBAR,2,4.

  20.–21. Combining V,1,2 with NP,2,4 (VP → V NP) or with SBAR,2,4 (VP → V SBAR) adds VP,1,4; the same item is derived in two ways, reflecting the ambiguity of the sentence.

  22. Combining PRP,0,1 and VP,1,4 with S → PRP VP adds the goal item S,0,4.

  23. What is this object? The completed chart over "I saw her duck" (0–4): S,0,4 ; VP,1,4 ; SBAR,2,4 ; NP,2,4 ; V,3,4 ; PRP,0,1 ; V,1,2 ; PRP,2,3 ; NN,3,4, together with the rule applications that built them.

  24. Semantics of hypergraphs
  • Generalization of directed graphs
  • A special node is designated the "goal"
  • Every edge has a single head and 0 or more tails (the arity of the edge is the number of tails)
  • Node labels correspond to the LHS's of CFG rules
  • A derivation is the generalization of the graph concept of path to hypergraphs
  • Weights multiply along edges in the derivation, and add at nodes (cf. semiring parsing)
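A rough sketch of this structure in code (the class names Hyperedge and Hypergraph and the string node ids are my own illustrative choices): each edge has one head node and a list of tail nodes, its arity is the number of tails, and a designated goal node marks complete derivations.

```python
from dataclasses import dataclass, field

@dataclass
class Hyperedge:
    head: str            # the node this edge derives (e.g. the item "VP,1,4")
    tails: list          # child nodes; len(tails) is the arity of the edge
    weight: float = 1.0
    label: str = ""      # rule label: terminals plus substitution sites

@dataclass
class Hypergraph:
    goal: str                                     # designated goal node
    incoming: dict = field(default_factory=dict)  # node -> list of incoming edges

    def add_edge(self, edge: Hyperedge):
        self.incoming.setdefault(edge.head, []).append(edge)

# A fragment of the "I saw her duck" forest from the previous slides:
hg = Hypergraph(goal="S,0,4")
hg.add_edge(Hyperedge("VP,1,4", ["V,1,2", "NP,2,4"], label="VP -> V NP"))
hg.add_edge(Hyperedge("VP,1,4", ["V,1,2", "SBAR,2,4"], label="VP -> V SBAR"))
hg.add_edge(Hyperedge("S,0,4", ["PRP,0,1", "VP,1,4"], label="S -> PRP VP"))
```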

  25. Edge labels
  • Edge labels may be a mix of terminals and substitution sites (non-terminals)
  • In translation hypergraphs, edges are labeled in both the source and target languages
  • The number of substitution sites must equal the arity of the edge and must be the same in both languages
  • The two languages may order the substitution sites differently
  • There is no restriction on the number of terminal symbols

  26. Edge labels, an example: with la lectura : reading and ayer : yesterday, an edge labeled ⟨ X₁ de X₂ : X₂ 's X₁ ⟩ (inverted) yields "yesterday's reading", while ⟨ X₁ de X₂ : X₁ from X₂ ⟩ (monotonic) yields "reading from yesterday".

  27. Inference algorithms
  • Viterbi, O(|E| + |V|): find the maximum-weighted derivation; requires a partial ordering of weights
  • Inside-outside, O(|E| + |V|): compute the marginal (sum) weight of all derivations passing through each edge/node
  • k-best derivations, O(|E| + |D_max| · k log k): enumerate the k best derivations in the hypergraph; see the IWPT paper by Huang and Chiang (2005)
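The Viterbi case is short enough to sketch directly. The code below (my own minimal dict encoding of a hypergraph, not the lecture's) computes the best-derivation weight of an acyclic hypergraph by memoized traversal: edge weights multiply with the best weights of the edge's tails, and each node takes the max over its incoming edges.

```python
def viterbi(incoming, goal):
    """Best-derivation (Viterbi) weight of an acyclic hypergraph.

    incoming: {node: [(tail_nodes, edge_weight), ...]}; nodes with no
    incoming edges are axioms and get weight 1.0.  Each edge is examined
    once, so the running time is O(|E| + |V|) for bounded arity.
    """
    best = {}

    def weight(node):
        if node in best:
            return best[node]
        edges = incoming.get(node, [])
        if not edges:                    # axiom / leaf node
            best[node] = 1.0
            return 1.0
        w = 0.0
        for tails, edge_w in edges:      # max over incoming edges ...
            cand = edge_w
            for t in tails:              # ... product over the edge's tails
                cand *= weight(t)
            w = max(w, cand)
        best[node] = w
        return w

    return weight(goal)

# Toy forest with two derivations of VP; Viterbi keeps the better one.
forest = {"S":  [(["PRP", "VP"], 0.5)],
          "VP": [(["V", "NP"], 0.9), (["V", "SBAR"], 0.4)]}
print(viterbi(forest, "S"))              # 0.45, via the VP -> V NP edge
```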

  28. Things to keep in mind. Bound on the number of edges: |E| ∈ O(n³ · |G|³). Bound on the number of nodes: |V| ∈ O(n² · |G|).

  29. Decoding again
  • Translation hypergraphs are a "lingua franca" for translation search spaces
  • Note that FST lattices are a special case
  • Decoding problem: how do I build a translation hypergraph?

  30. Representational limits. Consider this very simple SCFG translation model. "Glue" rules: S → S₁ S₂ : S₁ S₂ ; S → S₁ S₂ : S₂ S₁

  31. Representational limits. The same glue rules, plus "lexical" rules: S → tabeta : ate ; S → jon-ga : John ; S → ringo-o : an apple. Input: jon-ga ringo-o tabeta

  32. Representational limits
  • Phrase-based decoding runs in exponential time
  • All permutations of the source are modeled (traveling salesman problem!)
  • Typically distortion limits are used to mitigate this
  • But parsing is polynomial... what's going on?

  33. Representational limits. Binary SCFGs cannot model this reordering (however, ternary SCFGs can): A B C D ↔ B D A C

  34. Representational limits. Binary SCFGs cannot model this reordering (however, ternary SCFGs can): A B C D ↔ B D A C. But can't we binarize any grammar?

  35. Representational limits. Binary SCFGs cannot model this reordering (however, ternary SCFGs can): A B C D ↔ B D A C. But can't we binarize any grammar? No. Synchronous CFGs cannot generally be binarized!
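For the four-symbol case, the claim can be checked by brute force. The sketch below (my own enumeration, not from the lecture) builds every target ordering reachable by repeatedly combining two adjacent source spans with the binary glue rules from slide 30, either in order or swapped; the B D A C pattern, i.e. the permutation (2, 4, 1, 3), never appears.

```python
def glue_reachable(n):
    """Target orderings of source positions 1..n reachable using only the two
    binary glue rules: monotonic (S -> S1 S2 : S1 S2) and
    inverted (S -> S1 S2 : S2 S1)."""
    # reachable[(i, j)] = set of orderings of source positions i+1 .. j
    reachable = {(i, i + 1): {(i + 1,)} for i in range(n)}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            out = set()
            for k in range(i + 1, j):            # split the span at k
                for left in reachable[(i, k)]:
                    for right in reachable[(k, j)]:
                        out.add(left + right)    # monotonic
                        out.add(right + left)    # inverted
            reachable[(i, j)] = out
    return reachable[(0, n)]

perms = glue_reachable(4)
print(len(perms), "of 24 orderings are reachable")   # 22 of 24
print((2, 4, 1, 3) in perms)   # False: B D A C from A B C D is not reachable
print((3, 1, 4, 2) in perms)   # False: nor is its mirror image C A D B
```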

  36. Does this matter?
  • The "forbidden" pattern is observed in real data (Melamed, 2003)
  • Does this matter?
  • Learning: phrasal units and higher-rank grammars can account for the pattern; sentences can be simplified or ignored
  • Translation: the pattern does exist, but how often must it exist (i.e., is there a good translation that doesn't violate the SCFG matching property)?

  37. Tree-to-string
  • How do we generate a hypergraph for a tree-to-string translation model?
  • Simple linear-time (given a fixed translation model) top-down matching algorithm
  • Recursively cover "uncovered" sites in the tree
  • Each node in the input tree becomes a node in the translation forest
  • For details, see Huang et al. (AMTA, 2006) and Huang et al. (EMNLP, 2010); a small sketch follows the grammar on the next slide

  38. Tree-to-string grammar: S(x₁:NP x₂:VP) → x₁ x₂ ; VP(x₁:NP x₂:V) → x₂ x₁ ; tabeta → ate ; ringo-o → an apple ; jon-ga → John
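A compact sketch of the top-down matching idea from slides 37–38 (the nested-tuple tree encoding, the rule tables, and the function name are my own simplifications; a real implementation records every matching rule per node as a hypergraph edge and builds the full forest rather than returning a single string):

```python
# Input tree for "jon-ga ringo-o tabeta" (slide 39), as (label, children) pairs;
# leaves are plain strings.
tree = ("S", [("NP", ["jon-ga"]),
              ("VP", [("NP", ["ringo-o"]), ("V", ["tabeta"])])])

# One-level tree-to-string rules from slide 38.  Integers on the target side
# refer to the matched children (1-based); strings are target terminals.
PATTERN_RULES = {("S", ("NP", "VP")): [1, 2],    # S(x1:NP x2:VP) -> x1 x2
                 ("VP", ("NP", "V")): [2, 1]}    # VP(x1:NP x2:V) -> x2 x1
LEXICAL_RULES = {"tabeta": ["ate"],
                 "ringo-o": ["an", "apple"],
                 "jon-ga": ["John"]}

def translate(node):
    """Top-down matching: cover the root of `node` with a rule, then
    recursively translate the uncovered child subtrees."""
    if isinstance(node, str):                    # leaf word
        return LEXICAL_RULES[node]
    label, children = node
    child_labels = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    target = PATTERN_RULES.get((label, child_labels))
    if target is None:                           # no rule: pass through the child
        assert len(children) == 1
        return translate(children[0])
    out = []
    for sym in target:
        if isinstance(sym, int):                 # substitution site
            out.extend(translate(children[sym - 1]))
        else:                                    # target terminal
            out.append(sym)
    return out

print(" ".join(translate(tree)))                 # John ate an apple
```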

  39.–41. Applying the tree-to-string grammar: the parsed input tree for jon-ga ringo-o tabeta (S over NP and VP; VP over NP and V) is covered top-down by S(x₁:NP x₂:VP) → x₁ x₂ and VP(x₁:NP x₂:V) → x₂ x₁ plus the lexical rules, yielding the translation forest for "John ate an apple".
