Adaptive Multi-Compositionality for Recursive Neural Models with Applications to Sentiment Analysis

  1. Adaptive Multi-Compositionality for Recursive Neural Models with Applications to Sentiment Analysis (July 31, 2014)

  2. Semantic Composition ▪ Principle of Compositionality ▪ The meaning of a complex expression is determined by the meanings of its constituent expressions and the rules used to combine them ▪ Compositional nature of natural language ▪ Go beyond words towards sentences ▪ Examples ▪ red car -> red + car ▪ not very good -> not + ( very + good ) ▪ eat food -> eat + food ▪ …

  3. Recursive Neural Models (RNMs) ▪ Utilize the recursive structures of sentences to obtain semantic representations ▪ The vector representations are used as features and fed into a softmax classifier to predict their labels ▪ Learn to recursively perform semantic composition in vector space ▪ One family of popular deep learning models ▪ [Figure: the parse tree of "not very good"; "very" and "good" compose into "very good", which composes with "not"; each node vector feeds a softmax classifier, and the root is predicted Negative.]
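
To make the recursion concrete, here is a minimal NumPy sketch of this forward pass over a binary parse tree. The tuple tree encoding, the tanh nonlinearity, and all dimensions are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def compose(w_l, w_r, W, b):
    """One composition step: merge two child vectors into a parent vector."""
    return np.tanh(W @ np.concatenate([w_l, w_r]) + b)

def forward(node, emb, W, b, U):
    """Recursively compute every node vector in a binary parse tree.
    A leaf is a word string; an internal node is a (left, right) tuple.
    Returns the node vector and its predicted label distribution."""
    if isinstance(node, str):                    # leaf: look up the word vector
        vec = emb[node]
    else:                                        # internal node: compose children
        left, _ = forward(node[0], emb, W, b, U)
        right, _ = forward(node[1], emb, W, b, U)
        vec = compose(left, right, W, b)
    return vec, softmax(U @ vec)                 # classify the phrase vector

# Toy run on "not (very good)" with random 4-d embeddings and 5 classes.
rng = np.random.default_rng(0)
d = 4
emb = {w: rng.normal(size=d) for w in ("not", "very", "good")}
W, b, U = rng.normal(size=(d, 2 * d)), np.zeros(d), rng.normal(size=(5, d))
vec, label_dist = forward(("not", ("very", "good")), emb, W, b, U)
```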

  4. Semantic Composition with Matrix/Tensor ▪ The main difference among recursive neural models (RNMs) lies in the semantic composition method ▪ RNN (Socher et al. 2011): $w = f\left(W \begin{bmatrix} w_l \\ w_r \end{bmatrix} + b\right)$ ▪ RNTN (Yu et al. 2013; Socher et al. 2013): $w = f\left(\begin{bmatrix} w_l \\ w_r \end{bmatrix}^{\top} T^{[1:d]} \begin{bmatrix} w_l \\ w_r \end{bmatrix} + W \begin{bmatrix} w_l \\ w_r \end{bmatrix} + b\right)$ ▪ Problem: RNN and RNTN employ the same global composition function for all pairs of input vectors
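
The two compositions side by side, as a hedged NumPy sketch: the tensor is stored with one slice per output dimension, and the choice of tanh for $f$ is an assumption.

```python
import numpy as np

def rnn_compose(w_l, w_r, W, b):
    """RNN: one global matrix applied to the stacked children [w_l; w_r]."""
    x = np.concatenate([w_l, w_r])
    return np.tanh(W @ x + b)

def rntn_compose(w_l, w_r, T, W, b):
    """RNTN: adds a bilinear tensor term x^T T^[1:d] x, one slice per output dim."""
    x = np.concatenate([w_l, w_r])
    tensor_term = np.einsum("i,kij,j->k", x, T, x)   # out[k] = x^T T[k] x
    return np.tanh(tensor_term + W @ x + b)

rng = np.random.default_rng(1)
d = 4
w_l, w_r = rng.normal(size=d), rng.normal(size=d)
W, b = rng.normal(size=(d, 2 * d)), np.zeros(d)
T = rng.normal(size=(d, 2 * d, 2 * d))               # d slices of shape 2d x 2d
print(rnn_compose(w_l, w_r, W, b))
print(rntn_compose(w_l, w_r, T, W, b))
```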

  5. Motivation of This Work ▪ Use different composition functions for different types of compositions ▪ Negation: not good, not bad ▪ Intensification: very good, pretty bad ▪ Contrast: the movie is good, but I love it ▪ Sentiment word + target/aspect: good movie, low price ▪ … ▪ Model the composition as a distribution over multiple composition functions, and adaptively select them

  6. One Global Composition Function → Adaptive Multi-Compositionality

  7. Adaptive Compositionality ▪ Use more than one composition function and adaptively select them depending on the input vectors:
$$w = f\left(\sum_{h=1}^{C} P(g_h \mid w_l, w_r)\, g_h(w_l, w_r)\right)$$
▪ [Figure: the input vectors $w_l, w_r$ are scored by a softmax classifier, yielding a distribution over the composition functions $g_1, g_2, g_3, g_4$; the weighted combination gives the output vector $w$.]

  8. Adaptive Compositionality ▪ Use more than one composition function and adaptively select them depending on the input vectors:
$$w = f\left(\sum_{h=1}^{C} P(g_h \mid w_l, w_r)\, g_h(w_l, w_r)\right)$$
▪ $g_h$ is the $h$-th composition function (both matrices and tensors can be used)

  9. Adaptive Compositionality ▪ Use more than one composition function and adaptively select them depending on the input vectors:
$$w = f\left(\sum_{h=1}^{C} P(g_h \mid w_l, w_r)\, g_h(w_l, w_r)\right)$$
▪ Avg-AdaMC: $P(g_h \mid w_l, w_r) = \frac{1}{C}$ ▪ Weighted-AdaMC: the Boltzmann distribution is used to adaptively select $g_h$:
$$\begin{bmatrix} P(g_1 \mid w_l, w_r) \\ \vdots \\ P(g_C \mid w_l, w_r) \end{bmatrix} = \mathrm{softmax}\left(\beta\, S \begin{bmatrix} w_l \\ w_r \end{bmatrix}\right)$$
▪ Max-AdaMC: $P(g_h \mid w_l, w_r) = \begin{cases} 1, & g_h \text{ has the maximum score} \\ 0, & \text{otherwise} \end{cases}$ ▪ A sketch of the three variants follows.
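
A compact NumPy sketch of the three selection strategies, assuming linear composition functions $g_h(w_l, w_r) = W_h [w_l; w_r] + b_h$ and a tanh $f$; the bias terms and the tie handling in the max variant are assumptions of this sketch.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def adamc_compose(w_l, w_r, Ws, bs, S, beta=2.0, variant="weighted"):
    """AdaMC with C linear composition functions g_h(x) = W_h x + b_h.
    Ws: (C, d, 2d) matrices, bs: (C, d) biases, S: (C, 2d) selection matrix."""
    x = np.concatenate([w_l, w_r])
    C = Ws.shape[0]
    if variant == "avg":                      # Avg-AdaMC: uniform weights
        P = np.full(C, 1.0 / C)
    else:
        P = softmax(beta * (S @ x))           # Weighted-AdaMC: Boltzmann weights
        if variant == "max":                  # Max-AdaMC: hard selection
            P = (P == P.max()).astype(float)  # ties would pick several; assumed rare
    g = Ws @ x + bs                           # all g_h(w_l, w_r), shape (C, d)
    return np.tanh(P @ g)                     # w = f( sum_h P(g_h|w_l,w_r) g_h )
```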

  10. Objective Function ▪ Minimize the cross-entropy error with $L_2$ regularization:
$$\min_{\Theta} E(\Theta) = -\sum_{j}\sum_{i} t_j^i \log y_j^i + \sum_{\theta \in \Theta} \lambda_\theta \lVert \theta \rVert_2^2$$
▪ Target vector $t_j = [0 \dots 1 \dots 0]$, predicted distribution $y_j = [0.07 \dots 0.69 \dots 0.15]$ ▪ Optimized with AdaGrad (Duchi, Hazan, and Singer 2011):
$$\theta_t = \theta_{t-1} - \frac{\eta}{\sqrt{G_t}} \left.\frac{\partial E}{\partial \theta}\right|_{\theta=\theta_{t-1}}, \qquad G_t = G_{t-1} + \left(\left.\frac{\partial E}{\partial \theta}\right|_{\theta=\theta_{t-1}}\right)^{2}$$
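
A minimal sketch of one AdaGrad step matching the update above; the epsilon guard against division by zero and the learning-rate value are assumptions not shown on the slide.

```python
import numpy as np

def adagrad_step(theta, grad, G, eta=0.01, eps=1e-8):
    """One AdaGrad update (Duchi, Hazan, and Singer 2011): per-parameter
    step sizes shrink as squared gradients accumulate in G."""
    G = G + grad ** 2                              # G_t = G_{t-1} + (dE/dtheta)^2
    theta = theta - eta * grad / (np.sqrt(G) + eps)
    return theta, G

# Toy usage: three identical gradient steps; later steps move theta less.
theta, G = np.zeros(2), np.zeros(2)
for _ in range(3):
    theta, G = adagrad_step(theta, np.array([0.5, -1.0]), G)
```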

  11. Parameter Estimation ▪ Back-propagation over the tree structure; each node $i$ receives error signals from its own classifier and from the classifiers of its ancestors:
$$\delta_m^{i \leftarrow r} = \begin{cases} \sum_k (y_k^i - t_k^i)\, U_{km}\, f'(a_m^i), & r = i \\ \sum_k \delta_k^{\mathrm{par}(i) \leftarrow r}\, \dfrac{\partial a_k^{\mathrm{par}(i)}}{\partial w_m^i}\, f'(a_m^i), & r \in \mathrm{anc}(i) \end{cases}$$
where $\mathrm{par}(i)$ is the parent of node $i$, $\mathrm{anc}(i)$ its ancestors, $\mathrm{bp}(i) = \{i\} \cup \mathrm{anc}(i)$, and $x^i = [w_l^i; w_r^i]$. ▪ Classification: $\dfrac{\partial E}{\partial U_{mn}} = \sum_i w_n^i \left(y_m^i - t_m^i\right)$ ▪ Composition selection: $\dfrac{\partial E}{\partial S_{mn}} = \sum_i \sum_{r \in \mathrm{bp}(i)} \sum_k \sum_h \delta_k^{i \leftarrow r}\, g_h(w_l^i, w_r^i)_k\, \beta\, P(g_h \mid w_l^i, w_r^i) \left(\mathbb{1}[h = m] - P(g_m \mid w_l^i, w_r^i)\right) x_n^i$ ▪ Linear composition: $\dfrac{\partial E}{\partial W_{h,mn}} = \sum_i \sum_{r \in \mathrm{bp}(i)} \delta_m^{i \leftarrow r}\, P(g_h \mid w_l^i, w_r^i)\, x_n^i$ ▪ Tensor composition: $\dfrac{\partial E}{\partial T_{h,mn}^{[d]}} = \sum_i \sum_{r \in \mathrm{bp}(i)} \delta_d^{i \leftarrow r}\, P(g_h \mid w_l^i, w_r^i)\, x_m^i x_n^i$ ▪ Word embeddings: $\dfrac{\partial E}{\partial L_{dw}} = \sum_{i = w} \sum_{r \in \mathrm{bp}(i)} \delta_d^{i \leftarrow r}$
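
Hand-derived gradients like these are easy to get wrong in practice; a standard sanity check (not from the slides) is to compare them against central finite differences. A minimal sketch, assuming a scalar `loss_fn` that closes over all other parameters:

```python
import numpy as np

def numeric_grad(loss_fn, theta, h=1e-6):
    """Central-difference estimate of d loss_fn / d theta, one entry at a time."""
    grad = np.zeros_like(theta)
    it = np.nditer(theta, flags=["multi_index"])
    while not it.finished:
        idx = it.multi_index
        old = theta[idx]
        theta[idx] = old + h
        plus = loss_fn(theta)
        theta[idx] = old - h
        minus = loss_fn(theta)
        theta[idx] = old                      # restore the parameter
        grad[idx] = (plus - minus) / (2 * h)
        it.iternext()
    return grad

# Usage: compare against the analytic gradient, e.g.
# assert np.allclose(numeric_grad(loss, W), analytic_grad_W, atol=1e-4)
```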

  12. Stanford Sentiment Treebank ▪ 10,662 critic reviews from Rotten Tomatoes ▪ 215,154 phrases produced by the Stanford Parser ▪ Workers on Amazon Mechanical Turk annotated polarity levels for all of these phrases ▪ The sentiment scales are merged into five categories (very negative, negative, neutral, positive, very positive), as in the binning sketch below
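
A small illustration of such a merge. The cutoff values and boundary handling below are assumptions of this sketch (the treebank's raw scores lie in [0, 1]), not something stated on the slide.

```python
import numpy as np

# Assumed cutoffs for binning continuous scores in [0, 1] into five categories.
CUTOFFS = np.array([0.2, 0.4, 0.6, 0.8])
LABELS = ["very negative", "negative", "neutral", "positive", "very positive"]

def merge_scale(score: float) -> str:
    """Map a continuous sentiment score to one of the five merged categories."""
    return LABELS[int(np.searchsorted(CUTOFFS, score, side="right"))]

assert merge_scale(0.1) == "very negative"
assert merge_scale(0.5) == "neutral"
assert merge_scale(0.95) == "very positive"
```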

  13. [Table: results of the evaluation on the Sentiment Treebank; the top three methods are shown in bold.] Our methods achieve the best performance when $\beta$ is set to 2.

  14. Recap of the model:
$$w = f\left(\sum_{h=1}^{C} P(g_h \mid w_l, w_r)\, g_h(w_l, w_r)\right)$$
with the three selection strategies: Avg-AdaMC, $P(g_h \mid w_l, w_r) = \frac{1}{C}$; Weighted-AdaMC, $\begin{bmatrix} P(g_1 \mid w_l, w_r) \\ \vdots \\ P(g_C \mid w_l, w_r) \end{bmatrix} = \mathrm{softmax}\left(\beta\, S \begin{bmatrix} w_l \\ w_r \end{bmatrix}\right)$; Max-AdaMC, $P(g_h \mid w_l, w_r) = 1$ for the maximum-scoring function and $0$ otherwise.

  15. Vector Representations ▪ Neighboring words/phrases in the vector space:
good → cool, fantasy, classic, watchable, attractive
boring → dull, bad, disappointing, horrible, annoying
ingenious → extraordinary, inspirational, imaginative, thoughtful, creative
soundtrack → execution, animation, cast, colors, scene
good actors → good ideas, good acting, good looks, good sense, great cast
thought-provoking film → beautiful film, engaging film, lovely film, remarkable film, riveting story
painfully bad → how bad, too bad, really bad, so bad, very bad
not a good movie → isn't much fun, isn't very funny, nothing new, isn't as funny, of clichés

  16. [Figure: t-SNE visualization of the learned word vectors, clustered by sentiment class. Very positive: creative, great, perfect, superb, amazing. Positive: fancy, good, cool, promising, interested. Objective: plot, near, buy, surface, them, version. Negative: problem, slow, sick, mess, poor, wrong. Very negative: failure, worst, disaster, horrible.]

  17. Composition Pairs in the Composition Space ▪ For a composition pair $(w_l, w_r)$, we use its distribution over composition functions $[P(g_1 \mid w_l, w_r), \dots, P(g_C \mid w_l, w_r)]^{\top}$ to query its neighboring pairs (a query sketch follows):
really bad → very bad / only dull / much bad / extremely bad / (all that) bad
(is n't) well-acted → (is n't) (painfully bad) / not mean-spirited / not (too slow) / not (necessarily bad) / (have otherwise) (been bland)
great (Broadway play) → great (cinematic innovation) / great subject / great performance / energetic entertainment / great (comedy filmmaker)
(arty and) jazzy → (Smart and) fun / (verve and) fun / (unique and) entertaining / (gentle and) engrossing / (warmth and) humor
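
A minimal sketch of such a neighbor query, treating each pair's distribution over composition functions as its feature vector. The cosine similarity metric and all names here are assumptions of the sketch; the same query style also covers the word-vector neighbors on slide 15, with embeddings in place of distributions.

```python
import numpy as np

def neighbor_pairs(query_dist, pair_dists, pair_names, k=5):
    """Rank composition pairs by cosine similarity of their distributions
    over composition functions, P(g_1..g_C | w_l, w_r)."""
    M = np.asarray(pair_dists)                         # (n_pairs, C)
    q = np.asarray(query_dist)                         # (C,)
    sims = (M @ q) / (np.linalg.norm(M, axis=1) * np.linalg.norm(q) + 1e-12)
    top = np.argsort(-sims)[:k]                        # k most similar pairs
    return [(pair_names[i], float(sims[i])) for i in top]
```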

  18. Visualization: Composition Pairs ▪ [Figure: t-SNE map of composition pairs, embedded by their distributions $[P(g_1 \mid w_l, w_r), \dots, P(g_C \mid w_l, w_r)]^{\top}$. Regions correspond to composition patterns: these/this/the *; * and for/with *; and *; adj noun; (*); entity; negation; intensification; verb *; * 's; a/an/two *; of *.]

  19. [Same composition-pair map, highlighting examples from the adj noun region] ▪ Best films ▪ Riveting story ▪ Solid cast ▪ Talented director ▪ Gorgeous visuals

  20. [Same composition-pair map, highlighting examples from the intensification region] ▪ Really good ▪ Quite funny ▪ Damn fine ▪ Very good ▪ Particularly funny

  21. [Same composition-pair map, highlighting examples from the negation region] ▪ Is never dull ▪ Not smart ▪ Not a good movie ▪ Is n't much fun ▪ Wo n't be disappointed

  22. [Same composition-pair map, highlighting examples from the entity region] ▪ Roberto Alagna ▪ Pearl Harbor ▪ Elizabeth Hurley ▪ Diane Lane ▪ Pauly Shore
