Compositionality in Recursive Neural Networks
Martha Lewis, ILLC, University of Amsterdam
SYCO 3, March 2019, Oxford, UK
Outline
- Compositional distributional semantics
- Pregroup grammars and how to map to vector spaces
- Recursive neural networks (TreeRNNs)
- Mapping pregroup grammars to TreeRNNs
- Implications
Compositional Distributional Semantics

Frege's principle of compositionality:
"The meaning of a complex expression is determined by the meanings of its parts and the rules used for combining them."

Distributional hypothesis:
"Words that occur in similar contexts have similar meanings." [Harris, 1958]
Symbolic Structure

A pregroup algebra is a partially ordered monoid in which each element p has a left adjoint p^l and a right adjoint p^r such that:

    p · p^r ≤ 1 ≤ p^r · p        p^l · p ≤ 1 ≤ p · p^l

Elements of the pregroup are basic (atomic) grammatical types, e.g. B = {n, s}.

Atomic grammatical types can be combined to form types of higher order (e.g. n · n^l or n^r · s · n^l).

A sentence w_1 w_2 ... w_n (with word w_i of type t_i) is grammatical whenever:

    t_1 · t_2 · ... · t_n ≤ s
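As a concrete illustration (not part of the talk), here is a minimal Python sketch of grammaticality checking by cancelling adjacent adjoint pairs. The encoding of types as (basic type, adjoint order) pairs is my own choice, and the cancellation is greedy; a full pregroup parser would need to search over reductions.

# Types are lists of (basic_type, adjoint_order) pairs:
# order 0 is the basic type, +1 a right adjoint, -1 a left adjoint.
def reduces_to_sentence(words):
    """Greedily cancel adjacent pairs p . p^r and p^l . p; True if only 's' remains."""
    seq = [pair for word in words for pair in word]
    changed = True
    while changed:
        changed = False
        for i in range(len(seq) - 1):
            (b1, o1), (b2, o2) = seq[i], seq[i + 1]
            if b1 == b2 and o2 == o1 + 1:   # covers both p.p^r and p^l.p
                del seq[i:i + 2]
                changed = True
                break
    return seq == [("s", 0)]

# "trembling shadows play hide-and-seek": n n^l . n . n^r s n^l . n
types = [
    [("n", 0), ("n", -1)],             # trembling     : n n^l
    [("n", 0)],                        # shadows       : n
    [("n", 1), ("s", 0), ("n", -1)],   # play          : n^r s n^l
    [("n", 0)],                        # hide-and-seek : n
]
print(reduces_to_sentence(types))      # True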
Pregroup derivation: example

"trembling shadows play hide-and-seek"
(Parse: S → NP VP, NP → Adj N, VP → V N)

    trembling : n · n^l    shadows : n    play : n^r · s · n^l    hide-and-seek : n

    n · n^l · n · n^r · s · n^l · n  ≤  n · 1 · n^r · s · 1  =  n · n^r · s  ≤  1 · s  =  s
Distributional Semantics

Words are represented as vectors. The entries of a word's vector record how often the target word co-occurs with each context word.

[Figure: example co-occurrence vectors for "iguana" and "Wilbur" over context words such as "cuddly", "smelly", "scaly", "cute", "teeth".]

Similarity is given by the cosine of the angle between vectors:

    sim(v, w) = cos(θ_{v,w}) = ⟨v, w⟩ / (‖v‖ ‖w‖)
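A minimal sketch of the similarity computation, with made-up co-occurrence counts (the context words and numbers below are illustrative, not the figure's actual values):

import numpy as np

contexts = ["cuddly", "smelly", "scaly", "cute"]   # hypothetical context words
iguana = np.array([1.0, 10.0, 15.0, 2.0])          # made-up counts
wilbur = np.array([8.0, 2.0, 0.0, 9.0])

def cosine_sim(v, w):
    return float(v @ w / (np.linalg.norm(v) * np.linalg.norm(w)))

print(cosine_sim(iguana, wilbur))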
The role of compositionality

Compositional distributional models: we can produce a sentence vector by composing the vectors of the words in that sentence,

    s = f(w_1, w_2, ..., w_n)

Three generic classes of CDMs:
- Vector mixture models [Mitchell and Lapata (2010)]
- Tensor-based models [Coecke, Sadrzadeh, Clark (2010); Baroni and Zamparelli (2010)]
- Neural models [Socher et al. (2012); Kalchbrenner et al. (2014)]
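For orientation, a sketch of the simplest class, vector mixture models: the sentence vector is just an elementwise sum or product of the word vectors, ignoring word order and grammar (vectors below are random placeholders):

import numpy as np

clowns, tell, jokes = (np.random.rand(4) for _ in range(3))
additive = clowns + tell + jokes          # Mitchell & Lapata additive model
multiplicative = clowns * tell * jokes    # Mitchell & Lapata multiplicative model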
A multi-linear model

The grammatical type of a word defines the vector space in which the word lives:
- Nouns are vectors in N;
- adjectives are linear maps N → N, i.e. elements of N ⊗ N;
- intransitive verbs are linear maps N → S, i.e. elements of N ⊗ S;
- transitive verbs are bilinear maps N ⊗ N → S, i.e. elements of N ⊗ S ⊗ N.

The composition operation is tensor contraction, i.e. elimination of matching dimensions by application of the inner product.

[Coecke, Sadrzadeh, Clark 2010]
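A minimal sketch of tensor-based composition by contraction; the dimensions and parameters below are made up for illustration:

import numpy as np

dN, dS = 4, 3                         # dimensions of the noun and sentence spaces
shadows = np.random.rand(dN)          # noun: vector in N
trembling = np.random.rand(dN, dN)    # adjective: element of N ⊗ N
play = np.random.rand(dN, dS, dN)     # transitive verb: element of N ⊗ S ⊗ N
hide_and_seek = np.random.rand(dN)    # treated here as a noun vector

# Adjective applied to noun: contract the matching N dimension.
trembling_shadows = trembling @ shadows                    # vector in N

# Transitive verb applied to subject and object: contract both N dimensions.
sentence = np.einsum("isj,i,j->s", play, trembling_shadows, hide_and_seek)
print(sentence.shape)                 # (dS,) -- a vector in the sentence space S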
Diagrammatic calculus: Summary

[Diagrams: boxes denote morphisms (tensors) on wires V, V ⊗ W, V ⊗ W ⊗ Z; "cups" denote ε-maps and "caps" denote η-maps.]

Snake equation: (ε^r_A ⊗ 1_A) ∘ (1_A ⊗ η^r_A) = 1_A
Diagrammatic calculus: example

[Diagram: F applied to the parse of "trembling shadows play hide-and-seek", giving wires N ⊗ N^l, N, N^r ⊗ S ⊗ N^l, N with cups for the reductions.]

Each word w_i is sent to a tensor, and the derivation α is sent to a linear map F(α), so the sentence meaning is

    F(α)(trembling ⊗ shadows ⊗ play ⊗ hide-and-seek)
Recursive Neural Networks

"Clowns tell jokes":
    p_1 = g(tell, jokes)
    p_2 = g(Clowns, p_1)

    g_RNN  : R^n × R^n → R^n :: (v_1, v_2) ↦ f( M · [v_1; v_2] )
    g_RNTN : R^n × R^n → R^n :: (v_1, v_2) ↦ g_RNN(v_1, v_2) + f( v_1^⊤ · T · v_2 )

where f is an elementwise nonlinearity, M ∈ R^{n×2n}, and T is a third-order tensor.
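A minimal sketch of the two composition functions, with random parameters and a made-up dimension (the indexing convention for T is one reasonable choice, not necessarily the slide's):

import numpy as np

n = 4
rng = np.random.default_rng(0)
M = rng.standard_normal((n, 2 * n))   # matrix applied to the concatenated pair
T = rng.standard_normal((n, n, n))    # third-order tensor for the RNTN term
f = np.tanh                           # elementwise nonlinearity

def g_rnn(v1, v2):
    return f(M @ np.concatenate([v1, v2]))

def g_rntn(v1, v2):
    bilinear = np.einsum("i,kij,j->k", v1, T, v2)   # v1^T . T . v2, per output slice
    return g_rnn(v1, v2) + f(bilinear)

clowns, tell, jokes = (rng.standard_normal(n) for _ in range(3))
p1 = g_rntn(tell, jokes)    # compose the verb phrase first
p2 = g_rntn(clowns, p1)     # then the full sentence
print(p2.shape)             # (4,)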
How compositional is this?

- Successful
- Uses some element of grammatical structure
- But the compositionality function has to do everything
- Does that help us understand what's going on?
Information-routing words

[Tree diagrams: "Clowns who tell jokes" and "John introduces himself". How should words like "who" and "himself" be composed?]
Can we map pregroup grammar onto TreeRNNs?

"Clowns tell jokes" in a TreeRNN:
    p_1 = g(tell, jokes)
    p_2 = g(Clowns, p_1)
Can we map pregroup grammar onto TreeRNNs?

The same tree, now with a linear tensor-based composition function g_LinTen:
    p_1 = g_LinTen(tell, jokes)
    p_2 = g_LinTen(Clowns, p_1)
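One way to read g_LinTen (my illustration, not necessarily the talk's exact construction): the composition is plain tensor contraction, so the verb's own tensor does the work and the result agrees with the multi-linear model above.

import numpy as np

dN, dS = 4, 3
rng = np.random.default_rng(1)
clowns = rng.standard_normal(dN)
jokes = rng.standard_normal(dN)
tell = rng.standard_normal((dN, dS, dN))   # subject x sentence x object

def g_lin_ten(functional, argument):
    """Linear composition: contract the last axis of the functional word's
    tensor with the argument vector (no nonlinearity, no shared weights)."""
    return np.tensordot(functional, argument, axes=(functional.ndim - 1, 0))

p1 = g_lin_ten(tell, jokes)     # shape (dN, dS): "tell jokes" as a map N -> S
p2 = g_lin_ten(p1.T, clowns)    # shape (dS,): the sentence vector

# This equals the direct tensor contraction of the multi-linear model:
direct = np.einsum("isj,i,j->s", tell, clowns, jokes)
print(np.allclose(p2, direct))  # True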
Can we map pregroup grammar onto TreeRNNs?

[Tree diagram: the same composition drawn with g_LinTen at each node, applied first to "tell" and "jokes", then to "Clowns" and the result.]
Why?

- Opens up more possibilities to use tools from formal semantics in computational linguistics.
- We can immediately see possibilities for building alternative networks, perhaps different compositionality functions for different parts of speech.
- Decomposing the tensors for functional words into repeated applications of a compositionality function gives options for learning representations.
Why?

    who : n^r · n · s^l · s

[Diagram: diagrammatic analysis of "dragons who breathe fire", showing how "who" routes the meaning of "dragons" into the subject of "breathe fire".]
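As a hedged sketch of one common Frobenius-style reading of the subject relative pronoun (in the spirit of Sadrzadeh, Clark and Coecke, and not necessarily the exact construction on the slide): the noun is pointwise-merged with the verb phrase applied to its object, with the verb simplified here to a map N → N.

import numpy as np

dN = 4
rng = np.random.default_rng(2)
dragons = rng.standard_normal(dN)
fire = rng.standard_normal(dN)
breathe = rng.standard_normal((dN, dN))    # verb simplified to an N -> N map

breathe_fire = breathe @ fire              # the verb phrase as a vector in N
dragons_who_breathe_fire = dragons * breathe_fire   # Frobenius (pointwise) merge
print(dragons_who_breathe_fire.shape)      # (4,) -- still a noun vector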
Why?

    himself : n · s^r · n^rr · n^r · s

[Diagram: diagrammatic analysis of "John loves himself", showing how "himself" copies the subject "John" into the object position of "loves".]
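A hedged sketch of the intended reading (my illustration, not the talk's exact tensor for "himself"): the reflexive copies the subject into the object slot, so "John loves himself" contracts the verb tensor with "John" on both noun dimensions.

import numpy as np

dN, dS = 4, 3
rng = np.random.default_rng(3)
john = rng.standard_normal(dN)
loves = rng.standard_normal((dN, dS, dN))   # subject x sentence x object

john_loves_himself = np.einsum("isj,i,j->s", loves, john, john)
print(john_loves_himself.shape)             # (3,) -- a sentence vector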
Experiments?

Not yet, but there are a number of avenues for exploration:
- Comparing the performance of this kind of model with standard categorical compositional distributional models.
- Different compositionality functions for different word types.
- Testing the performance of TreeRNNs with formally analyzed information-routing words.
- Investigating the effects of switching between word types.
- Investigating meanings of logical words and quantifiers.
- Extending the analysis to other types of recurrent neural network, such as long short-term memory networks or gated recurrent units.
Summary

- We have shown how to interpret a simplification of recursive neural networks within a formal semantics framework.
- We can then analyze 'information-routing' words such as pronouns as specific functions rather than as vectors.
- This also simplifies tensor-based vector composition architectures, reducing the number of higher-order tensors to be learnt and making representations more flexible and reusable.
- Plenty of work to do on both the experimental and the theoretical side!
Thanks!

NWO Veni grant 'Metaphorical Meanings for Artificial Agents'
Category-Theoretic Background

The category of pregroups, Preg, and the category of finite-dimensional vector spaces, FdVect, are both compact closed. This means that they share a structure, namely:
- Both have a tensor product ⊗ with a unit 1.
- Both have adjoints A^r, A^l.
- Both have special morphisms
    ε^r : A ⊗ A^r → 1,   ε^l : A^l ⊗ A → 1
    η^r : 1 → A^r ⊗ A,   η^l : 1 → A ⊗ A^l
- These morphisms interact in a certain way. In Preg:
    p · p^r ≤ 1 ≤ p^r · p        p^l · p ≤ 1 ≤ p · p^l
A functor from syntax to semantics

We define a functor F : Preg → FdVect such that:

    F(p) = P for all p ∈ B
    F(1) = R
    F(p · q) = F(p) ⊗ F(q)
    F(p^r) = F(p^l) = F(p)
    F(p ≤ q) = a linear map F(p) → F(q)
    F(ε^r) = F(ε^l) = inner product in FdVect
    F(η^r) = F(η^l) = identity maps in FdVect

[Kartsaklis, Sadrzadeh, Pulman and Coecke, 2016]
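A minimal sketch (my illustration, with made-up dimensions) of the object part of the functor: basic types go to vector spaces, adjoints are forgotten, and the monoid product becomes the tensor product, i.e. concatenation of shapes. The type encoding matches the earlier reduction sketch.

F_basic = {"n": 4, "s": 3}        # F(n) = N, F(s) = S, with hypothetical dimensions

def F_type(pregroup_type):
    """Map a pregroup type, given as (basic, adjoint_order) pairs, to the
    shape of the tensor space it is sent to; F(p^r) = F(p^l) = F(p)."""
    return tuple(F_basic[b] for b, _ in pregroup_type)

print(F_type([("n", 1), ("s", 0), ("n", -1)]))   # n^r s n^l  ->  (4, 3, 4)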
References I