Inversion Transduction Grammars
Wilker Aziz, 3/5/17
Word-based Translation
Mary did not slap the green witch
Mary no dió una bofetada a la bruja verde
Every French word is generated by an English word (or by NULL). (The example is in Spanish; "French" is the conventional name for the target language in the IBM models.)
Generative Story IBM ≥ 3
• Given E: Mary did not slap the green witch
• Fertility: Mary did not slap slap slap the green witch
• NULL insertion: Mary did not slap slap slap the green witch NULL
• Translation: Mary no dió una bofetada a la verde bruja
• Distortion: Mary no dió una bofetada a la bruja verde
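The generative story above can be sketched as a small program. The fertilities, per-token translations, and permutation below are illustrative assumptions (for instance, "did" is dropped here with fertility 0, whereas the slide keeps it through the fertility step), not parameters estimated from data:

```python
def generate(english, fertility, translations, order):
    """Sketch of the IBM >= 3 generative story for one sentence pair."""
    # Fertility: each English word is copied fertility[w] times (default 1).
    expanded = [w for w in english for _ in range(fertility.get(w, 1))]
    # NULL insertion: a NULL token that may generate spurious French words.
    expanded.append("NULL")
    # Translation: each expanded token independently emits one French word;
    # translations[i] is the word emitted by the i-th expanded token.
    assert len(translations) == len(expanded)
    # Distortion: an arbitrary permutation of the French words; unconstrained,
    # this step ranges over all m! orders of the m French words.
    return [translations[i] for i in order]

english = "Mary did not slap the green witch".split()
fertility = {"slap": 3, "did": 0}                      # assumed fertilities
translations = ["Mary", "no", "dió", "una", "bofetada",
                "la", "verde", "bruja", "a"]           # NULL emits "a"
order = [0, 1, 2, 3, 4, 8, 5, 7, 6]                    # one of the 9! permutations
french = generate(english, fertility, translations, order)
```

The distortion step is where the combinatorial explosion lives: nothing in the model forbids any of the m! orders.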
Discussion
• IBM models do not constrain divergence with respect to word order
• The distortion step must consider all m! permutations of the m French words
All permutations: sensible or not?
If we do not impose structural constraints (yet they do exist):
• the model will have to learn (rather implicitly) how not to violate them
• which ought to require more data
Practical consequences
Estimation
• modelling outcomes that, even though possible, are not plausible (unlikely to be observed)
Generation
• NP-completeness!
NP-completeness
NP-complete problems:
• Generalised TSP [Knight, 1999; Zaslavskiy et al., 2009]
• Perfect matching [DeNero and Klein, 2008]
• All permutations [Asveld, 2006; 2008]
All permutations
Let Σ_n = {a_1, ..., a_n}
• S ➝ A_{Σ_n}
• A_X ➝ a A_{X − {a}}   for X ⊆ Σ_n, |X| ≥ 2, a ∈ X
• A_{a} ➝ a
Regular grammar (there is an equivalent FSA). Asveld (2006, 2008)
Complexity
Note that nonterminals are indexed by subsets of Σ_n, i.e. by the power set of Σ_n:
• 2^n nonterminals (states)
• n × 2^n productions (transitions)
• n! strings (paths)
Example: 3 elements
S ➝ A_{123}
A_{123} ➝ a_1 A_{23} | a_2 A_{13} | a_3 A_{12}
A_{12} ➝ a_1 A_{2} | a_2 A_{1}
A_{13} ➝ a_1 A_{3} | a_3 A_{1}
A_{23} ➝ a_2 A_{3} | a_3 A_{2}
A_{1} ➝ a_1
A_{2} ➝ a_2
A_{3} ➝ a_3
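As a sanity check on these counts, the subset-indexed grammar can be built and its language enumerated directly. The function names below are mine, not Asveld's:

```python
from itertools import combinations

def permutation_grammar(n):
    """Productions A_X -> a A_{X - {a}} of the all-permutations grammar
    over Sigma_n = {1, ..., n}, one nonterminal per nonempty subset X."""
    sigma = frozenset(range(1, n + 1))
    productions = {}
    for size in range(1, n + 1):
        for subset in combinations(sigma, size):
            X = frozenset(subset)
            productions[X] = [(a, X - {a}) for a in X]
    return sigma, productions

def strings(X, productions):
    """Enumerate every terminal string derivable from A_X."""
    if not X:                      # the empty subset derives the empty string
        yield ()
        return
    for a, rest in productions[X]:
        for tail in strings(rest, productions):
            yield (a,) + tail

sigma, productions = permutation_grammar(3)
language = set(strings(sigma, productions))
```

For n = 3 this gives 2^3 − 1 = 7 subset nonterminals (the equivalent FSA adds one final state for the empty subset) and exactly 3! = 6 strings, matching the counts above.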
"IBM constraint" Distortion limit in generation but not in estimation • any reasons why that may be unsatisfactory? 15
Constraining permutations without a distortion limit
Inversion Transduction Grammars (ITGs) [Wu, 1995; 1997]
• binarizable permutations
• two streams are simultaneously generated
• context-free backbone
Number of Permutations
[Figure: number of permutations by sentence length, from Wu (1997)]
ITG (English / French)
• S ➝ X / X                 copy
• X ➝ X_1 X_2 / X_1 X_2     copy (monotone)
• X ➝ X_1 X_2 / X_2 X_1     invert
• X ➝ e / f                 transduce
• X ➝ e / ε                 delete
• X ➝ ε / f                 insert
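The effect of the monotone and inverted rules on the two streams can be sketched as a recursive yield computation over a derivation tree; the tuple encoding of trees is my own:

```python
def yields(node):
    """(English, French) yields of an ITG derivation.
    Leaves are (e, f) pairs, with None standing for epsilon (delete/insert);
    internal nodes are ("[]", left, right) or ("<>", left, right)."""
    if len(node) == 2:                       # terminal rule: X -> e/f
        e, f = node
        return ([e] if e is not None else [], [f] if f is not None else [])
    op, left, right = node
    e1, f1 = yields(left)
    e2, f2 = yields(right)
    if op == "[]":                           # monotone: both streams in order
        return e1 + e2, f1 + f2
    return e1 + e2, f2 + f1                  # inverted: French children swapped

# One inversion reorders "green witch" into "bruja verde".
tree = ("[]", ("the", "la"),
        ("<>", ("green", "verde"), ("witch", "bruja")))
english, french = yields(tree)
```

Note that the English stream is never reordered: all reordering of French relative to English comes from nesting inverted rules.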
ITG Trees
[Figure: two ITG trees for the sentence pair "I really miss you" / "Sinto tanto sua falta"]
Model
Joint probability model: P(T) = P(A, B, E, F)

A derivation t = ⟨r_1, ..., r_n⟩ determines
• e = yield_1(t)
• f = yield_2(t)
• a = alignment(t)
• b = bracketing(t)

P(T = t) = P(A = a, B = b, E = e, F = f) = ∏_{i=1}^{n} θ_{r_i}
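Under this model, scoring a derivation is just a product over its rules; a minimal sketch, with made-up rule names and parameter values:

```python
from math import prod

def derivation_prob(rules, theta):
    """P(T = t) = product over i of theta_{r_i}, for t = <r_1, ..., r_n>."""
    return prod(theta[r] for r in rules)

# Illustrative parameters: structural rules plus a few lexical rules.
theta = {"[]": 0.6, "<>": 0.4,
         "the/la": 0.5, "green/verde": 0.2, "witch/bruja": 0.3}
p = derivation_prob(["[]", "the/la", "<>", "green/verde", "witch/bruja"], theta)
```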
Parametrisation
Multinomial: one parameter per rule
• θ_[] : one parameter for monotone
• θ_⟨⟩ : one parameter for swap
• θ_{e/f} : one parameter per word pair
• θ_{e/ε} : one parameter per deleted English word
• θ_{ε/f} : one parameter per inserted French word
MLE
We do not typically construct treebanks of ITG trees
• potential (expected) counts instead of observed counts

θ_{X ➝ α} = ⟨n(X ➝ α)⟩_{P(A,B|F,E)} / Σ_{α′} ⟨n(X ➝ α′)⟩_{P(A,B|F,E)}

Expectations are computed from parse forests
• Inside-Outside [Baker, 1979; Lari and Young, 1990; Goodman, 1999]
Typically initialised with IBM model 1
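The M-step of this EM procedure only renormalises expected counts per left-hand side; a sketch, assuming the expected counts have already been collected from the parse forests by Inside-Outside:

```python
from collections import defaultdict

def m_step(expected_counts):
    """theta_{X -> alpha} = <n(X -> alpha)> / sum_{alpha'} <n(X -> alpha')>.
    expected_counts maps (lhs, rhs) pairs to expected counts."""
    totals = defaultdict(float)
    for (lhs, _), count in expected_counts.items():
        totals[lhs] += count
    return {(lhs, rhs): count / totals[lhs]
            for (lhs, rhs), count in expected_counts.items()}

# Toy expected counts for rules sharing the left-hand side X.
theta = m_step({("X", "[]"): 3.0, ("X", "<>"): 1.0, ("X", "the/la"): 2.0})
```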
Difficulties
Inference: complexity O(l³ m³)
Model: too few reordering parameters
Decisions: ambiguity
• the disambiguation problem is NP-complete [Sima'an, 1996]

argmax_A P(A | F, E) = argmax_A Σ_B P(A, B | F, E)
                     ≈ argmax_{A,B} P(A, B | F, E)
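The gap between the exact decision rule and the Viterbi approximation is easy to see on a toy distribution; this is an illustrative example, not a construction from the NP-completeness proof:

```python
def best_alignment(joint, exact=True):
    """argmax_A sum_B P(A, B | F, E) when exact; otherwise the Viterbi
    approximation argmax_{A,B} P(A, B | F, E). joint maps (a, b) to P(a, b)."""
    if exact:
        marginal = {}
        for (a, _), p in joint.items():
            marginal[a] = marginal.get(a, 0.0) + p
        return max(marginal, key=marginal.get)
    return max(joint, key=joint.get)[0]

# a2 wins after summing over bracketings (0.25 + 0.25 = 0.5 > 0.3), but the
# single best derivation belongs to a1, so the approximation disagrees.
joint = {("a1", "b1"): 0.3, ("a2", "b1"): 0.25, ("a2", "b2"): 0.25}
exact_pick = best_alignment(joint, exact=True)
viterbi_pick = best_alignment(joint, exact=False)
```

Summing over all bracketings of an alignment is what makes the exact rule hard; maximising over (A, B) jointly is tractable but, as above, can pick the wrong alignment.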
Bibliography
• Knight, Kevin. 1999. Decoding complexity in word-replacement translation models. Computational Linguistics. MIT Press.
• Zaslavskiy, Mikhail, Marc Dymetman, and Nicola Cancedda. 2009. Phrase-based statistical machine translation as a traveling salesman problem. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, Volume 1.
• DeNero, John and Dan Klein. 2008. The complexity of phrase alignment problems. In Proceedings of ACL-08: HLT.
• Asveld, Peter R. J. 2006. Generating all permutations by context-free grammars in Chomsky normal form. Theoretical Computer Science. Elsevier.
• Asveld, Peter R. J. 2008. Generating all permutations by context-free grammars in Greibach normal form. Theoretical Computer Science. Elsevier.
• Wu, Dekai. 1995. An algorithm for simultaneously bracketing parallel texts by aligning words. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics. ACL.
• Wu, Dekai. 1997. Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. Computational Linguistics. MIT Press.
• Baker, James K. 1979. Trainable grammars for speech recognition. In Proceedings of the Spring Conference of the Acoustical Society of America.
• Lari, Karim and Steve J. Young. 1990. The estimation of stochastic context-free grammars using the inside-outside algorithm. Computer Speech and Language.
• Goodman, Joshua. 1999. Semiring parsing. Computational Linguistics.
• Sima'an, Khalil. 1996. Computational complexity of probabilistic disambiguation by means of tree-grammars. In Proceedings of the 16th Conference on Computational Linguistics, Volume 2.