
  1. Inversion Transduction Grammars Wilker Aziz 3/5/17

  2. Word-based Translation
Mary did not slap the green witch
Mary no dió una bofetada a la bruja verde
Every French word is generated by an English word (or null)

  3. Generative Story IBM ≥ 3: Given E
Mary did not slap the green witch

  4. Generative Story IBM ≥ 3: Fertility
Mary did not slap the green witch
Mary did not slap slap slap the green witch

  5. Generative Story IBM ≥ 3: NULL insertion
Mary did not slap the green witch
Mary did not slap slap slap the green witch NULL

  6. Generative Story IBM ≥ 3: Translation
Mary did not slap the green witch
Mary did not slap slap slap the green witch NULL
Mary no dió una bofetada a la verde bruja

  7. Generative Story IBM ≥ 3: Distortion
Mary did not slap the green witch
Mary did not slap slap slap the green witch NULL
Mary no dió una bofetada a la verde bruja
Mary no dió una bofetada a la bruja verde
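The four-step generative story above can be walked through in a few lines of Python. This is a toy sketch only: every choice (the fertilities, the translations, the final ordering) is fixed by hand to reproduce the slides' example, whereas the real model draws each choice from learned distributions.

```python
english = "Mary did not slap the green witch".split()

# 1) Fertility: each English word is copied phi(e) times (phi = 0 drops it).
fertility = {"Mary": 1, "did": 0, "not": 1, "slap": 3,
             "the": 1, "green": 1, "witch": 1}
tokens = [e for e in english for _ in range(fertility[e])]

# 2) NULL insertion: spurious French words will hang off a NULL token.
tokens.append("NULL")

# 3) Translation: one French word per token position.
french = ["Mary", "no", "dió", "una", "bofetada", "la", "verde", "bruja", "a"]
assert len(french) == len(tokens)

# 4) Distortion: choose one of the m! orderings of the m French words.
order = [0, 1, 2, 3, 4, 8, 5, 7, 6]
output = [french[i] for i in order]
print(" ".join(output))  # Mary no dió una bofetada a la bruja verde
```

The distortion step is the expensive one: nothing here restricts `order`, which is exactly the point the next slides make.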

  8. Discussion
• IBM models do not constrain divergence with respect to word order
• Distortion step must consider all the m! permutations of m French words

  9. All permutations: sensible or not?
If we do not impose structural constraints (yet they do exist)
• the model will have to learn (rather implicitly) how not to violate them
• which ought to require more data


  12. Practical consequences
Estimation
• modelling outcomes that, even though possible, are not plausible (unlikely to be observed)
Generation
• NP-completeness!


  17. NP-completeness
NP-complete problems:
• Generalised TSP [Knight, 1999; Zaslavskiy et al., 2009]
• Perfect matching [DeNero and Klein, 2008]
• All permutations [Asveld, 2006; 2008]

  18. All permutations
Let Σ_n = {a_1, ..., a_n}
• S ➝ A_{Σ_n}
• A_X ➝ a A_{X − {a}} for X ⊆ Σ_n, |X| ≥ 2, a ∈ X
• A_{a} ➝ a
Regular grammar (there is an equivalent FSA) [Asveld, 2006; 2008]

  19. Complexity
Note that nonterminals are indexed by subsets of Σ_n, i.e. the power set of Σ_n
• 2^n nonterminals (states)
• n × 2^n productions (transitions)
• n! strings (paths)

  20. Example: 3 elements
S ➝ A_123
A_123 ➝ a_1 A_23 | a_2 A_13 | a_3 A_12
A_12 ➝ a_1 A_2 | a_2 A_1
A_13 ➝ a_1 A_3 | a_3 A_1
A_23 ➝ a_2 A_3 | a_3 A_2
A_1 ➝ a_1
A_2 ➝ a_2
A_3 ➝ a_3
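As a sanity check on this construction, a short Python sketch (the function name is mine) can enumerate every string derivable from A_{Σ_n}, where each nonterminal is just the set of symbols still to be emitted:

```python
from math import factorial
from itertools import permutations

def derivations(X):
    """All terminal strings derivable from nonterminal A_X under
    A_X -> a A_{X - {a}} (for a in X) and A_{a} -> a."""
    if len(X) == 1:
        yield tuple(X)
        return
    for a in X:
        for tail in derivations(X - {a}):
            yield (a,) + tail

n = 4
sigma = frozenset(range(1, n + 1))
strings = set(derivations(sigma))

assert len(strings) == factorial(n)             # n! strings (paths)
assert strings == set(permutations(sorted(sigma)))
print(len(strings), 2 ** n - 1)                 # 24 paths, 15 nonterminals A_X
```

The `2 ** n - 1` counts the nonterminals A_X for non-empty X, matching the exponential blow-up noted on the complexity slide.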

  21. "IBM constraint"
Distortion limit in generation but not in estimation
• any reasons why that may be unsatisfactory?

  22. Constraining permutations without a distortion limit
Inversion Transduction Grammars (ITGs) [Wu, 1995; 1997]
• binarizable permutations
• two streams are simultaneously generated
• context-free backbone

  23. [Figure from Wu, 1997]

  24. Number of Permutations [Wu, 1997]
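The figure itself is not recoverable here, but the count it illustrates is standard: a binary ITG generates exactly the separable permutations, whose number for m words is the large Schröder number S_{m−1}, far fewer than m!. A minimal sketch, assuming that correspondence and the usual Schröder recurrence:

```python
from math import factorial

def schroeder(n):
    """Large Schroeder numbers S_0, S_1, ... via the recurrence
    (n + 1) S_n = 3 (2n - 1) S_{n-1} - (n - 2) S_{n-2}, S_0 = 1, S_1 = 2."""
    s = [1, 2]
    for k in range(2, n + 1):
        s.append((3 * (2 * k - 1) * s[-1] - (k - 2) * s[-2]) // (k + 1))
    return s[n]

# ITG-reachable permutations of m words (S_{m-1}) vs. all m! permutations.
for m in (4, 6, 8):
    print(m, schroeder(m - 1), factorial(m))
```

Already at m = 4 the gap appears: 22 ITG-reachable permutations against 24 in total (the two excluded ones are the classic 2413 and 3142 patterns).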


  32. ITG
Synchronous rules (English / French):
S ➝ X / X (copy)
X ➝ X_1 X_2 / X_1 X_2 (copy)
X ➝ X_1 X_2 / X_2 X_1 (invert)
X ➝ e / f (transduce)
X ➝ e / ε (delete)
X ➝ ε / f (insert)
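The simultaneous generation of the two streams can be made concrete by reading off both yields of an ITG derivation tree. The tree encoding below and the word alignment for the example pair are mine (purely illustrative), not from the slides:

```python
# Hypothetical node encoding: ("mono", l, r) and ("inv", l, r) for the two
# concatenation rules, and a terminal pair (e, f) where e or f may be None
# (delete / insert).

def yields(node):
    """Return the (English, French) yields of an ITG derivation tree."""
    if node[0] in ("mono", "inv"):
        _, left, right = node
        el, fl = yields(left)
        er, fr = yields(right)
        if node[0] == "mono":        # X -> X1 X2 / X1 X2
            return el + er, fl + fr
        return el + er, fr + fl      # X -> X1 X2 / X2 X1 (inverted)
    e, f = node                      # transduce / delete / insert
    return [e] if e else [], [f] if f else []

tree = ("mono", ("I", None),                              # 'I' deleted
        ("mono",
         ("inv", ("really", "tanto"), ("miss", "Sinto")),
         ("mono", ("you", "sua"), (None, "falta"))))      # 'falta' inserted

e, f = yields(tree)
print(" ".join(e), "||", " ".join(f))  # I really miss you || Sinto tanto sua falta
```

One inverted node suffices here: it swaps "really miss" into "Sinto tanto" on the French side while the English side stays in order.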

  33. ITG Trees
[Figures: ITG derivation trees pairing "I really miss you" with "Sinto tanto sua falta"]

  35. Model
Joint probability model: P(T) = P(A, B, E, F)
t = ⟨r_1, ..., r_n⟩
e = yield_1(t), f = yield_2(t)
a = alignment(t), b = bracketing(t)
P(T = t) = P(A = a, B = b, E = e, F = f) = ∏_{i=1}^{n} θ_{r_i}
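Scoring a derivation is then just a product over its rules. A minimal sketch with made-up parameter values (a real model estimates them; the rule keys are my own encoding):

```python
from math import prod

# Hypothetical parameters theta_r for a handful of rules.
theta = {
    "[]": 0.6,                   # monotone concatenation
    "<>": 0.4,                   # inverted concatenation
    ("green", "verde"): 0.5,     # transduce
    ("witch", "bruja"): 0.7,     # transduce
}

# Derivation t = <r_1, ..., r_n> for "green witch" / "bruja verde":
# one inverted node over the two word pairs.
t = ["<>", ("green", "verde"), ("witch", "bruja")]

p = prod(theta[r] for r in t)
print(round(p, 6))  # 0.4 * 0.5 * 0.7 = 0.14
```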


  42. Parametrisation
Multinomial: one parameter per rule
• θ_[] : one parameter for monotone
• θ_<> : one parameter for swap
• θ_{e/f} : one parameter per word pair
• θ_{e/ε} : one parameter per deleted English word
• θ_{ε/f} : one parameter per inserted French word

  43. MLE
We do not typically construct treebanks of ITG trees
• potential counts instead of observed counts:
θ_{X➝α} = ⟨n(X➝α)⟩_{P(A,B|F,E)} / Σ_{α′} ⟨n(X➝α′)⟩_{P(A,B|F,E)}
Expectations from parse forests
• Inside-Outside [Baker, 1979; Lari and Young, 1990; Goodman, 1999]
Typically initialised with IBM1
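Given the expected counts ⟨n(X➝α)⟩ (which Inside-Outside would supply from the parse forests), the re-estimation step is a per-nonterminal normalisation. A sketch with made-up expected counts:

```python
from collections import defaultdict

# Expected rule counts <n(X -> alpha)>; the numbers are fabricated here,
# standing in for what Inside-Outside computes on real parallel data.
expected = {
    ("X", "[]"): 3.2,
    ("X", "<>"): 1.1,
    ("X", ("green", "verde")): 0.9,
    ("X", ("witch", "bruja")): 0.8,
}

# Normalise within each left-hand side to get updated parameters.
totals = defaultdict(float)
for (lhs, rhs), count in expected.items():
    totals[lhs] += count

theta = {rule: count / totals[rule[0]] for rule, count in expected.items()}
print(theta[("X", "[]")])   # 3.2 / 6.0
```

This is the M-step of EM; alternating it with the expectation computation drives the likelihood uphill.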

  44. Difficulties
Inference: complexity O(l^3 m^3)
Model: too few reordering parameters
Decisions: ambiguity
• disambiguation problem is NP-complete [Sima'an, 1996]
argmax_A P(A | F, E) = argmax_A Σ_B P(A, B | F, E) ≈ argmax_{A,B} P(A, B | F, E)

  45. Bibliography
• Knight, Kevin. 1999. Decoding complexity in word-replacement translation models. In Computational Linguistics. MIT Press.
• Zaslavskiy, Mikhail, Dymetman, Marc and Cancedda, Nicola. 2009. Phrase-based statistical machine translation as a traveling salesman problem. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1.
• DeNero, John and Klein, Dan. 2008. The Complexity of Phrase Alignment Problems. In Proceedings of ACL-08: HLT.
• Asveld, Peter R. J. 2006. Generating All Permutations by Context-free Grammars in Chomsky Normal Form. In Theoretical Computer Science. Elsevier Science Publishers Ltd.
• Asveld, Peter R. J. 2008. Generating All Permutations by Context-free Grammars in Greibach Normal Form. In Theoretical Computer Science. Elsevier Science Publishers Ltd.
• Wu, D. 1995. An Algorithm for Simultaneously Bracketing Parallel Texts by Aligning Words. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics. ACL.

  46. Bibliography
• Wu, D. 1997. Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora. In Computational Linguistics. MIT Press.
• Baker, James K. 1979. Trainable grammars for speech recognition. In Proceedings of the Spring Conference of the Acoustical Society of America.
• Lari, Karim and Young, Steve J. 1990. The estimation of stochastic context-free grammars using the inside-outside algorithm. In Computer Speech and Language.
• Goodman, Joshua. 1999. Semiring parsing. In Computational Linguistics.
• Sima'an, Khalil. 1996. Computational complexity of probabilistic disambiguation by means of tree-grammars. In Proceedings of the 16th Conference on Computational Linguistics, Volume 2.
