Compositionality in Semantic Vector Spaces
CS224U: Natural Language Understanding, Feb. 28, 2012
Richard Socher
Joint work with Chris Manning, Andrew Ng, Jeffrey Pennington, Eric Huang, and Cliff Lin
More information and code at www.socher.org
Word Vector Space Models
Each word is associated with an n-dimensional vector.
[Figure: a 2D vector space in which Germany and France lie close together, Monday and Tuesday lie close together, and the phrases "the country of my birth" and "the place where I was born" are also plotted.]
But how can we represent the meaning of longer phrases? By mapping them into the same vector space!
How should we map phrases into a vector space?
Use the principle of compositionality! The meaning (vector) of a sentence is determined by
(1) the meanings of its words and
(2) the rules that combine them.
[Figure: the phrases "the country of my birth" and "the place where I was born" mapped into the same 2D space as Germany, France, Monday, and Tuesday; a binary tree over "the country of my birth" with a vector at every node.]
The algorithm jointly learns compositional vector representations (and the tree structure).
Outline
Goal: Algorithms that recover and learn semantic vector representations based on recursive structure for multiple language tasks.
1. Introduction
2. Word Vectors and Recursive Neural Networks
3. Recursive Autoencoders for Sentiment Analysis
4. Paraphrase Detection
[Diagram: a recursive unit mapping children c1 and c2 to a parent p and a score s.]
Distributional Word Representations
[Figure: one-hot vectors for France and Monday (a single 1 in an otherwise all-zero vector) contrasted with dense 2D vectors in which France lies near Germany and Monday lies near Tuesday.]
Algorithms for Finding Word Vector Representations
There are many well-known algorithms that use co-occurrence statistics to compute a distributional representation for words:
• Brown et al. (1992), Turney et al. (2003), and many others
• LSA (Landauer & Dumais, 1997)
• Latent Dirichlet Allocation (LDA; Blei et al., 2003)
Recent development: neural language models.
• Bengio et al. (2003) introduced a language model that predicts a word given the previous words and also learns vector representations.
• Collobert & Weston (2008) and Maas et al. (2011) from the last lecture
Distributional Word Representations
Recent development: neural language models (Collobert & Weston, 2008; Turian et al., 2010).
Vectorial Sentence Meaning - Step 1: Parsing
[Figure: a parse tree (S, NP, VP, AdjP) over "The movie was not really exciting." with a word vector at every leaf.]
Vectorial Sentence Meaning - Step 2: Vectors at Each Node
[Figure: the same parse tree with a vector at every internal node (NP, AdjP, VP, S) as well as at every word.]
Recursive Neural Networks for Structure Prediction
Basic computational unit: the Recursive Neural Network.
Inputs: two candidate children's representations.
Outputs:
1. The semantic representation if the two nodes are merged.
2. A label that carries some information about this node.
[Figure: a neural network merging the vectors for "not" and "really exciting" into a parent vector with a label.]
Recursive Neural Network Definition
p = sigmoid(W [c1; c2] + b),
where the sigmoid is applied element-wise. A softmax classifier on top of p gives a distribution over a set of labels.
[Figure: the network merging children c1 and c2 into the parent p and its label.]
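A minimal numpy sketch of this unit, under illustrative assumptions: 2-dimensional vectors, a 2-way label set, and random initialization. The names W, b, and W_label mirror the matrices on the slides; everything else is made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_labels = 2, 2                                   # vector size and label count (illustrative)

W = rng.normal(scale=0.1, size=(n, 2 * n))           # composition matrix (slides: W)
b = np.zeros(n)                                      # composition bias (slides: b)
W_label = rng.normal(scale=0.1, size=(n_labels, n))  # classifier matrix (slides: W^(label))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def compose(c1, c2):
    """Merge two children: p = sigmoid(W [c1; c2] + b), plus a label distribution for the node."""
    p = sigmoid(W @ np.concatenate([c1, c2]) + b)
    label_dist = softmax(W_label @ p)
    return p, label_dist

# Example: merge made-up vectors for "not" and "really exciting".
p, label_dist = compose(np.array([9.0, 1.0]), np.array([4.0, 3.0]))
print(p, label_dist)
```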
Recursive Neural Network Definition
Related work:
• Previous RNN work (Goller & Küchler, 1996; Costa et al., 2003) assumed a fixed tree structure, used one-hot vectors, and had no softmax classifiers.
• Jordan Pollack (1990): recursive auto-associative memories (RAAMs).
• Hinton (1990) and Bottou (2011): related ideas about recursive models.
Goal: Predict Pos/Neg Sentiment of the Full Sentence
[Figure: a complete tree over "The movie was not really exciting." with a vector at every node and a sentiment score of 0.3 at the root.]
Predicting Sentiment with RNNs
[Figure: each word vector in "The movie was not really exciting." receives its own sentiment score (0.5, 0.5, 0.5, 0.3, 0.5, 0.7).]
Predicting Sentiment with RNNs
p = sigmoid(W [c1; c2] + b)
[Figure: adjacent word vectors are merged pairwise by the network; each merged node gets a vector and a sentiment score (e.g. 0.5 and 0.9).]
Predicting Sentiment with RNNs
[Figure: the network keeps merging phrase vectors up the tree; the new node gets a vector and a sentiment score of 0.3.]
Predicting Sentiment with RNNs
[Figure: the partially built tree over "The movie was not really exciting." with vectors at the merged nodes.]
Predicting Sentiment with RNNs
[Figure: merging continues up to the root, which receives a sentence-level vector and sentiment score (0.3).]
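Continuing the numpy sketch above (it reuses compose, softmax, W_label, and rng), a bottom-up pass over a fixed binary tree; the nested-tuple tree encoding and the word vectors are made up for illustration.

```python
def encode(tree):
    """Return the vector for a (sub)tree by recursively merging its children with the same W."""
    if isinstance(tree, np.ndarray):                 # leaf: a word vector
        return tree
    left, right = tree
    p, _ = compose(encode(left), encode(right))
    return p

# "The movie was not really exciting." with made-up word vectors and a hand-chosen tree:
# ((The movie) (was (not (really exciting)))) -- illustrative only.
w = {t: rng.normal(size=2) for t in "The movie was not really exciting".split()}
tree = ((w["The"], w["movie"]),
        (w["was"], (w["not"], (w["really"], w["exciting"]))))

root = encode(tree)
sentiment = softmax(W_label @ root)                  # sentence-level sentiment distribution
print(sentiment)
```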
Outline
Goal: Algorithms that recover and learn semantic vector representations based on recursive structure for multiple language tasks.
1. Introduction
2. Word Vectors and Recursive Neural Networks
3. Recursive Autoencoders for Sentiment Analysis [Socher et al., EMNLP 2011]
4. Paraphrase Detection
Sentiment Detection and Bag-of-Words Models
• Sentiment detection is crucial to business intelligence, stock trading, …
Sentiment Detection and Bag-of-Words Models
• Sentiment detection is crucial to business intelligence, stock trading, …
• Most methods start with a bag of words + linguistic features/processing/lexica.
• But such methods (including tf-idf) can't distinguish:
  + white blood cells destroying an infection
  - an infection destroying white blood cells
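A quick illustration of the point: the bag-of-words representations of the two phrases are identical, so any classifier built only on word counts cannot tell them apart (a plain-Python check, not from the slides).

```python
from collections import Counter

positive = "white blood cells destroying an infection"
negative = "an infection destroying white blood cells"

# Identical multisets of words, hence identical bag-of-words (and tf-idf) vectors.
print(Counter(positive.split()) == Counter(negative.split()))  # True
```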
Single-Scale Experiments: Movies
Example reviews:
• "Stealing Harvard doesn't care about cleverness, wit or any other kind of intelligent humor."
• "A film of ideas and wry comic mayhem."
Recursive Autoencoders
• Main idea: a phrase vector is good if it keeps as much information as possible about its children.
[Figure: the same neural-network unit mapping children c1 and c2 to a parent vector and a label.]
Recursive Autoencoders
• Similar to the RNN but with two differences.
(1) A reconstruction error keeps as much information as possible about the children:
p = sigmoid(W^(1) [c1; c2] + b)
[Figure: the encoder W^(1) computes the parent p; a decoder W^(2) reconstructs [c1; c2] from p (reconstruction error), and a softmax classifier W^(label) predicts the label.]
Recursive Autoencoders
• Reconstruction error details: the decoder W^(2) tries to reproduce the children from the parent, [c1'; c2'] = W^(2) p + b^(2), and the reconstruction error measures the distance || [c1; c2] - [c1'; c2'] ||^2 between the children and their reconstruction.
[Figure: encoder W^(1), decoder W^(2), and softmax classifier W^(label).]
Recursive Autoencoders
• Reconstruction error at every node.
• Important detail: normalization of each parent vector (otherwise the model can shrink the hidden vectors to make reconstruction artificially easy).
[Figure: a three-word example x1 x2 x3 with p1 = f(W [x2; x3] + b) and p2 = f(W [x1; p1] + b).]
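A minimal numpy sketch of one autoencoder node with the normalization step. The names W^(1) and W^(2) follow the slides; taking f to be tanh, normalizing the parent to unit length, and using squared Euclidean distance for the reconstruction error are assumptions based on the accompanying paper, not statements on this slide.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2                                           # vector size (illustrative)

W1 = rng.normal(scale=0.1, size=(n, 2 * n))     # encoder  (slides: W^(1))
b1 = np.zeros(n)
W2 = rng.normal(scale=0.1, size=(2 * n, n))     # decoder  (slides: W^(2))
b2 = np.zeros(2 * n)

def rae_node(c1, c2):
    """Encode two children into a normalized parent and measure how well they can be reconstructed."""
    children = np.concatenate([c1, c2])
    p = np.tanh(W1 @ children + b1)             # f = tanh here (assumption)
    p = p / np.linalg.norm(p)                   # normalization: keep the parent at unit length
    reconstruction = W2 @ p + b2                # try to reproduce [c1; c2] from p
    e_rec = np.sum((children - reconstruction) ** 2)  # assumed squared-error reconstruction loss
    return p, e_rec

p1, e1 = rae_node(np.array([1.0, 5.0]), np.array([1.0, 3.0]))
print(p1, e1)
```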
Recursive Autoencoders
• Similar to the RNN but with two differences.
(2) The tree structure is determined by the reconstruction error:
  – does not require a parser
  – gives task-dependent trees
[Figure: every adjacent pair of word vectors in "The movie was not really exciting." is fed through the network; each candidate parent has a reconstruction error, and the lowest-error pair is merged first.]
Recursive Autoencoders
[Figure: after the first merge, the remaining adjacent pairs (now including the new phrase node) are scored again and the lowest-error pair is merged next.]
Recursive Autoencoders
[Figure: the greedy merging continues, building the tree bottom-up from the pairs with the lowest reconstruction error.]
Recursive Autoencoders
[Figure: the final greedily constructed tree over "The movie was not really exciting." with a vector at every node.]
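A sketch of the greedy construction just illustrated, reusing rae_node (and rng, np) from the sketch above: score every adjacent pair, merge the pair with the lowest reconstruction error, and repeat. The list bookkeeping and the random word vectors are illustrative.

```python
def greedy_rae_tree(word_vectors):
    """Greedily build a binary tree by always merging the adjacent pair
    whose reconstruction error is lowest."""
    nodes = [(v, i) for i, v in enumerate(word_vectors)]   # (vector, tree of word indices)
    while len(nodes) > 1:
        errors = [rae_node(nodes[i][0], nodes[i + 1][0])[1] for i in range(len(nodes) - 1)]
        i = int(np.argmin(errors))                          # best adjacent pair
        p, _ = rae_node(nodes[i][0], nodes[i + 1][0])
        nodes[i:i + 2] = [(p, (nodes[i][1], nodes[i + 1][1]))]
    return nodes[0]                                         # (root vector, tree)

sentence = [rng.normal(size=2) for _ in range(6)]           # "The movie was not really exciting."
root_vec, tree = greedy_rae_tree(sentence)
print(tree)
```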
RAE Training
• Minimize the error over each entire sentence x and its label t (plus regularization).
• The error of a sentence is the sum of the errors at all nodes of its tree:
  E(x, t) = Σ_{s in tree(x)} E(s, t)
RAE Training
• The error at each node is a weighted combination of the reconstruction error and the cross-entropy (distribution likelihood) error from the softmax classifier:
  E(s, t) = α · E_rec(s) + (1 − α) · E_cE(s, t)
[Figure: encoder W^(1), decoder W^(2) producing the reconstruction error, and classifier W^(label) producing the cross-entropy error.]
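A sketch of that per-node objective, reusing rae_node, softmax, and W_label from the earlier sketches. The slides only say "weighted combination"; the form alpha * E_rec + (1 - alpha) * E_cE and the value alpha = 0.2 are assumptions for illustration.

```python
def node_error(c1, c2, target, alpha=0.2):
    """Weighted combination of reconstruction error and softmax cross-entropy at one node."""
    p, e_rec = rae_node(c1, c2)                    # reconstruction part
    probs = softmax(W_label @ p)                   # classifier part (slides: W^(label))
    e_ce = -np.sum(target * np.log(probs))         # cross-entropy against the target distribution
    return alpha * e_rec + (1.0 - alpha) * e_ce

# Example: a node in a sentence labelled with the first of two classes (target = [1, 0]).
err = node_error(np.array([1.0, 5.0]), np.array([1.0, 3.0]), target=np.array([1.0, 0.0]))
print(err)
```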
Details for Training RNNs
• Minimize the error by taking gradient steps computed from matrix derivatives.
• A more efficient implementation uses the backpropagation algorithm.
• Since the derivatives are computed over a tree structure, this is called backpropagation through structure (Goller & Küchler, 1996).
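A standalone numpy sketch of backpropagation through structure for the composition part only (no classifier or reconstruction terms): the error arriving at a node is split and passed down to its two children, and every node adds its contribution to the shared dW and db. The tanh nonlinearity and the toy loss at the root are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2
W = rng.normal(scale=0.1, size=(n, 2 * n))
b = np.zeros(n)
dW, db = np.zeros_like(W), np.zeros_like(b)

def forward(tree):
    """Bottom-up pass; returns a cache of the vectors needed for the backward pass."""
    if isinstance(tree, np.ndarray):
        return {"vec": tree, "leaf": True}
    left, right = forward(tree[0]), forward(tree[1])
    h = np.concatenate([left["vec"], right["vec"]])
    p = np.tanh(W @ h + b)
    return {"vec": p, "h": h, "left": left, "right": right, "leaf": False}

def backward(node, delta):
    """Propagate dLoss/dVector down the tree, accumulating dW and db at every node."""
    global dW, db
    if node["leaf"]:
        return                                    # (word-vector gradients omitted here)
    g = delta * (1.0 - node["vec"] ** 2)          # tanh' = 1 - tanh^2
    dW += np.outer(g, node["h"])
    db += g
    dh = W.T @ g                                  # gradient w.r.t. the concatenated children
    backward(node["left"], dh[:n])
    backward(node["right"], dh[n:])

# Toy example: loss = sum of the root vector, so its gradient is a vector of ones.
leaves = [rng.normal(size=n) for _ in range(3)]
cache = forward(((leaves[0], leaves[1]), leaves[2]))
backward(cache, np.ones(n))
print(dW, db)
```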