

  1. Decoding and Inference with Syntactic Translation Models. Machine Translation, Lecture 15. Instructor: Chris Callison-Burch. TAs: Mitchell Stern, Justin Chiu. Website: mt-class.org/penn

  2. CFGs. A context-free grammar for the source sentence jon-ga ringo-o tabeta: S → NP VP ; VP → NP V ; V → tabeta ; NP → jon-ga ; NP → ringo-o. Output: jon-ga ringo-o tabeta

  3. Synchronous CFGs. Each source rule (S → NP VP ; VP → NP V ; V → tabeta ; NP → jon-ga ; NP → ringo-o) is paired with a target-language right-hand side, filled in on the next slide. Source: jon-ga ringo-o tabeta

  4. Synchronous CFGs: S → NP₁ VP₂ : NP₁ VP₂ (monotonic) ; VP → NP₁ V₂ : V₂ NP₁ (inverted) ; V → tabeta : ate ; NP → jon-ga : John ; NP → ringo-o : an apple. Source: jon-ga ringo-o tabeta
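These rules can be written down compactly in code. The sketch below uses a plain tuple encoding of my own (not any particular toolkit's): each rule pairs a source right-hand side with a target right-hand side, and an integer k on the target side means "the translation of the k-th source nonterminal goes here" (1-based), so [2, 1] encodes an inverted rule.

```python
# Synchronous rules from this slide, in a minimal illustrative encoding.
# Source-side nonterminals are uppercase strings; target-side integers
# index the source nonterminals (1-based).
SCFG_RULES = [
    # (LHS,  source RHS,        target RHS)
    ("S",  ["NP", "VP"],    [1, 2]),          # monotonic
    ("VP", ["NP", "V"],     [2, 1]),          # inverted
    ("V",  ["tabeta"],      ["ate"]),
    ("NP", ["jon-ga"],      ["John"]),
    ("NP", ["ringo-o"],     ["an", "apple"]),
]
```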

  5. Synchronous generation: the paired trees are expanded in parallel, rule by rule, from S. Output: jon-ga ringo-o tabeta : John ate an apple

  6. Translation as parsing: parse the source (jon-ga ringo-o tabeta) with the source sides of the rules, then project the parse to the target side (John ate an apple).

  7. A closer look at parsing
  • Parsing is usually done with dynamic programming
  • Share common computations and structure
  • Represent an exponential number of alternatives in polynomial space
  • With SCFGs there are two kinds of ambiguity: source parse ambiguity and translation ambiguity
  • Parse forests can represent both!

  8. A closer look at parsing
  • Any monolingual parser can be used (most often: CKY or variants on the CKY algorithm)
  • Parsing complexity is O(n³ · |G|³)
  • cubic in the length of the sentence (n³)
  • cubic in the number of non-terminals (|G|³)
  • adding nonterminal types increases parsing complexity substantially!
  • With few NTs, exhaustive parsing is tractable

  9. Parsing as deduction. An inference rule has antecedents, side conditions, and a consequent:
  Antecedents: A : u    B : v
  Side condition: φ
  Consequent: C : w
  "If A and B are true with weights u and v, and φ is also true, then C is true with weight w."

  10. Example: CKY. Inputs: a sentence f = ⟨f₁, f₂, ..., f_ℓ⟩ and a context-free grammar G in Chomsky normal form. Item form: [X, i, j] — a subtree rooted with NT type X spanning i to j has been recognized.

  11. Example: CKY.
  Goal: [S, 0, ℓ]
  Axioms: (X → fᵢ) ∈ G with weight w   ⊢   [X, i−1, i] : w
  Inference rule: [X, i, k] : u   and   [Y, k, j] : v   and   (Z → X Y) ∈ G with weight w   ⊢   [Z, i, j] : u × v × w
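This deduction system translates almost line for line into code. Below is a small sketch (the function name, variable names, and grammar encoding are my own simplifications, not the lecture's notation): the axioms seed the chart with items [X, i−1, i], and the inference rule combines adjacent items under a binary rule, keeping the best (Viterbi) weight per item.

```python
def cky(words, lexical_rules, binary_rules):
    """Weighted CKY following the deduction system above.

    lexical_rules: {terminal: [(X, w), ...]}   for rules  X -> terminal : w
    binary_rules:  {(X, Y): [(Z, w), ...]}     for rules  Z -> X Y : w
    Returns a chart {(X, i, j): best weight} of recognized items [X, i, j].
    """
    n = len(words)
    chart = {}

    def update(item, weight):
        # keep the best (Viterbi) weight per item
        if weight > chart.get(item, 0.0):
            chart[item] = weight

    # Axioms: (X -> f_i) in G with weight w   =>   [X, i-1, i] : w
    for i, word in enumerate(words, start=1):
        for X, w in lexical_rules.get(word, []):
            update((X, i - 1, i), w)

    # Inference rule: [X, i, k] : u,  [Y, k, j] : v,  (Z -> X Y) : w
    #                 =>   [Z, i, j] : u * v * w
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (X, Y), productions in binary_rules.items():
                    u, v = chart.get((X, i, k)), chart.get((Y, k, j))
                    if u is None or v is None:
                        continue
                    for Z, w in productions:
                        update((Z, i, j), u * v * w)
    return chart
```

Run on the "I saw her duck" grammar from the following slides (with uniform weights, another assumption of mine), it produces the same chart items, including NP,2,4, SBAR,2,4, VP,1,4 and the goal item S,0,4:

```python
lexical = {"I": [("PRP", 1.0)], "saw": [("V", 1.0)],
           "her": [("PRP", 1.0)], "duck": [("NN", 1.0), ("V", 1.0)]}
binary = {("PRP", "VP"): [("S", 1.0)],
          ("V", "NP"):   [("VP", 1.0)],
          ("V", "SBAR"): [("VP", 1.0)],
          ("PRP", "V"):  [("SBAR", 1.0)],
          ("PRP", "NN"): [("NP", 1.0)]}
print(sorted(cky("I saw her duck".split(), lexical, binary)))
# -> items such as ('NP', 2, 4), ('SBAR', 2, 4), ('VP', 1, 4), ('S', 0, 4), ...
```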

  12.–17. CKY example on the input "I saw her duck" (positions 0–4), with grammar: S → PRP VP ; VP → V NP ; VP → V SBAR ; SBAR → PRP V ; NP → PRP NN ; V → saw ; V → duck ; NN → duck ; PRP → I ; PRP → her. The axioms add the lexical items PRP,0,1 ; V,1,2 ; PRP,2,3 ; NN,3,4 ; V,3,4 to the chart.

  18. Combining PRP,2,3 and NN,3,4 with NP → PRP NN adds NP,2,4 to the chart.

  19. Combining PRP,2,3 and V,3,4 with SBAR → PRP V adds SBAR,2,4.

  20.–21. Combining V,1,2 with NP,2,4 (VP → V NP) or with SBAR,2,4 (VP → V SBAR) adds VP,1,4; the same item is derived in two ways, reflecting the ambiguity of the sentence.

  22. Combining PRP,0,1 and VP,1,4 with S → PRP VP adds the goal item S,0,4.

  23. What is this object? The completed chart over "I saw her duck" (0–4): S,0,4 ; VP,1,4 ; SBAR,2,4 ; NP,2,4 ; V,3,4 ; PRP,0,1 ; V,1,2 ; PRP,2,3 ; NN,3,4, together with the rule applications that built them.

  24. Semantics of hypergraphs
  • Generalization of directed graphs
  • A special node is designated the "goal"
  • Every edge has a single head and 0 or more tails (the arity of the edge is the number of tails)
  • Node labels correspond to the LHS's of CFG rules
  • A derivation is the generalization of the graph concept of path to hypergraphs
  • Weights multiply along edges in the derivation, and add at nodes (cf. semiring parsing)
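A rough sketch of this structure in code (the class names Hyperedge and Hypergraph and the string node ids are my own illustrative choices): each edge has one head node and a list of tail nodes, its arity is the number of tails, and a designated goal node marks complete derivations.

```python
from dataclasses import dataclass, field

@dataclass
class Hyperedge:
    head: str            # the node this edge derives (e.g. the item "VP,1,4")
    tails: list          # child nodes; len(tails) is the arity of the edge
    weight: float = 1.0
    label: str = ""      # rule label: terminals plus substitution sites

@dataclass
class Hypergraph:
    goal: str                                     # designated goal node
    incoming: dict = field(default_factory=dict)  # node -> list of incoming edges

    def add_edge(self, edge: Hyperedge):
        self.incoming.setdefault(edge.head, []).append(edge)

# A fragment of the "I saw her duck" forest from the previous slides:
hg = Hypergraph(goal="S,0,4")
hg.add_edge(Hyperedge("VP,1,4", ["V,1,2", "NP,2,4"], label="VP -> V NP"))
hg.add_edge(Hyperedge("VP,1,4", ["V,1,2", "SBAR,2,4"], label="VP -> V SBAR"))
hg.add_edge(Hyperedge("S,0,4", ["PRP,0,1", "VP,1,4"], label="S -> PRP VP"))
```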

  25. Edge labels
  • Edge labels may be a mix of terminals and substitution sites (non-terminals)
  • In translation hypergraphs, edges are labeled in both the source and target languages
  • The number of substitution sites must equal the arity of the edge and must be the same in both languages
  • The two languages may order the substitution sites differently
  • There is no restriction on the number of terminal symbols

  26. Edge labels, an example: with la lectura : reading and ayer : yesterday, an edge labeled ⟨ X₁ de X₂ : X₂ 's X₁ ⟩ (inverted) yields "yesterday's reading", while ⟨ X₁ de X₂ : X₁ from X₂ ⟩ (monotonic) yields "reading from yesterday".

  27. Inference algorithms
  • Viterbi, O(|E| + |V|): find the maximum-weighted derivation; requires a partial ordering of weights
  • Inside-outside, O(|E| + |V|): compute the marginal (sum) weight of all derivations passing through each edge/node
  • k-best derivations, O(|E| + |D_max| · k log k): enumerate the k best derivations in the hypergraph; see the IWPT paper by Huang and Chiang (2005)
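The Viterbi case is short enough to sketch directly. The code below (my own minimal dict encoding of a hypergraph, not the lecture's) computes the best-derivation weight of an acyclic hypergraph by memoized traversal: edge weights multiply with the best weights of the edge's tails, and each node takes the max over its incoming edges.

```python
def viterbi(incoming, goal):
    """Best-derivation (Viterbi) weight of an acyclic hypergraph.

    incoming: {node: [(tail_nodes, edge_weight), ...]}; nodes with no
    incoming edges are axioms and get weight 1.0.  Each edge is examined
    once, so the running time is O(|E| + |V|) for bounded arity.
    """
    best = {}

    def weight(node):
        if node in best:
            return best[node]
        edges = incoming.get(node, [])
        if not edges:                    # axiom / leaf node
            best[node] = 1.0
            return 1.0
        w = 0.0
        for tails, edge_w in edges:      # max over incoming edges ...
            cand = edge_w
            for t in tails:              # ... product over the edge's tails
                cand *= weight(t)
            w = max(w, cand)
        best[node] = w
        return w

    return weight(goal)

# Toy forest with two derivations of VP; Viterbi keeps the better one.
forest = {"S":  [(["PRP", "VP"], 0.5)],
          "VP": [(["V", "NP"], 0.9), (["V", "SBAR"], 0.4)]}
print(viterbi(forest, "S"))              # 0.45, via the VP -> V NP edge
```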

  28. Things to keep in mind. Bound on the number of edges: |E| ∈ O(n³ · |G|³). Bound on the number of nodes: |V| ∈ O(n² · |G|).

  29. Decoding again
  • Translation hypergraphs are a "lingua franca" for translation search spaces
  • Note that FST lattices are a special case
  • Decoding problem: how do I build a translation hypergraph?

  30. Representational limits. Consider this very simple SCFG translation model. "Glue" rules: S → S₁ S₂ : S₁ S₂ ; S → S₁ S₂ : S₂ S₁

  31. Representational limits. The same glue rules, plus "lexical" rules: S → tabeta : ate ; S → jon-ga : John ; S → ringo-o : an apple. Input: jon-ga ringo-o tabeta

  32. Representational limits
  • Phrase-based decoding runs in exponential time
  • All permutations of the source are modeled (traveling salesman problem!)
  • Typically distortion limits are used to mitigate this
  • But parsing is polynomial... what's going on?

  33. Representational limits. Binary SCFGs cannot model this reordering (however, ternary SCFGs can): A B C D ↔ B D A C

  34. Representational limits. Binary SCFGs cannot model this reordering (however, ternary SCFGs can): A B C D ↔ B D A C. But can't we binarize any grammar?

  35. Representational limits. Binary SCFGs cannot model this reordering (however, ternary SCFGs can): A B C D ↔ B D A C. But can't we binarize any grammar? No. Synchronous CFGs cannot generally be binarized!
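For the four-symbol case, the claim can be checked by brute force. The sketch below (my own enumeration, not from the lecture) builds every target ordering reachable by repeatedly combining two adjacent source spans with the binary glue rules from slide 30, either in order or swapped; the B D A C pattern, i.e. the permutation (2, 4, 1, 3), never appears.

```python
def glue_reachable(n):
    """Target orderings of source positions 1..n reachable using only the two
    binary glue rules: monotonic (S -> S1 S2 : S1 S2) and
    inverted (S -> S1 S2 : S2 S1)."""
    # reachable[(i, j)] = set of orderings of source positions i+1 .. j
    reachable = {(i, i + 1): {(i + 1,)} for i in range(n)}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            out = set()
            for k in range(i + 1, j):            # split the span at k
                for left in reachable[(i, k)]:
                    for right in reachable[(k, j)]:
                        out.add(left + right)    # monotonic
                        out.add(right + left)    # inverted
            reachable[(i, j)] = out
    return reachable[(0, n)]

perms = glue_reachable(4)
print(len(perms), "of 24 orderings are reachable")   # 22 of 24
print((2, 4, 1, 3) in perms)   # False: B D A C from A B C D is not reachable
print((3, 1, 4, 2) in perms)   # False: nor is its mirror image C A D B
```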

  36. Does this matter?
  • The "forbidden" pattern is observed in real data (Melamed, 2003)
  • Does this matter?
  • Learning: phrasal units and higher-rank grammars can account for the pattern; sentences can be simplified or ignored
  • Translation: the pattern does exist, but how often must it exist (i.e., is there a good translation that doesn't violate the SCFG matching property)?

  37. Tree-to-string
  • How do we generate a hypergraph for a tree-to-string translation model?
  • Simple linear-time (given a fixed translation model) top-down matching algorithm
  • Recursively cover "uncovered" sites in the tree
  • Each node in the input tree becomes a node in the translation forest
  • For details, see Huang et al. (AMTA, 2006) and Huang et al. (EMNLP, 2010); a small sketch follows the grammar on the next slide

  38. Tree-to-string grammar: S(x₁:NP x₂:VP) → x₁ x₂ ; VP(x₁:NP x₂:V) → x₂ x₁ ; tabeta → ate ; ringo-o → an apple ; jon-ga → John
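A compact sketch of the top-down matching idea from slides 37–38 (the nested-tuple tree encoding, the rule tables, and the function name are my own simplifications; a real implementation records every matching rule per node as a hypergraph edge and builds the full forest rather than returning a single string):

```python
# Input tree for "jon-ga ringo-o tabeta" (slide 39), as (label, children) pairs;
# leaves are plain strings.
tree = ("S", [("NP", ["jon-ga"]),
              ("VP", [("NP", ["ringo-o"]), ("V", ["tabeta"])])])

# One-level tree-to-string rules from slide 38.  Integers on the target side
# refer to the matched children (1-based); strings are target terminals.
PATTERN_RULES = {("S", ("NP", "VP")): [1, 2],    # S(x1:NP x2:VP) -> x1 x2
                 ("VP", ("NP", "V")): [2, 1]}    # VP(x1:NP x2:V) -> x2 x1
LEXICAL_RULES = {"tabeta": ["ate"],
                 "ringo-o": ["an", "apple"],
                 "jon-ga": ["John"]}

def translate(node):
    """Top-down matching: cover the root of `node` with a rule, then
    recursively translate the uncovered child subtrees."""
    if isinstance(node, str):                    # leaf word
        return LEXICAL_RULES[node]
    label, children = node
    child_labels = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    target = PATTERN_RULES.get((label, child_labels))
    if target is None:                           # no rule: pass through the child
        assert len(children) == 1
        return translate(children[0])
    out = []
    for sym in target:
        if isinstance(sym, int):                 # substitution site
            out.extend(translate(children[sym - 1]))
        else:                                    # target terminal
            out.append(sym)
    return out

print(" ".join(translate(tree)))                 # John ate an apple
```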

  39.–41. Applying the tree-to-string grammar: the parsed input tree for jon-ga ringo-o tabeta (S over NP and VP; VP over NP and V) is covered top-down by S(x₁:NP x₂:VP) → x₁ x₂ and VP(x₁:NP x₂:V) → x₂ x₁ plus the lexical rules, yielding the translation forest for "John ate an apple".
