Jointly Learning Word and Phrase Embeddings Using Neural Networks and Implicit Tensor Factorization


  1. Jointly Learning Word and Phrase Embeddings Using Neural Networks and Implicit Tensor Factorization Kazuma Hashimoto Tsuruoka Laboratory, University of Tokyo 19/06/2015 Talk@UCL Machine Reading Lab.

  2. Self Introduction • Name – Kazuma Hashimoto ( 橋本 和真 in Japanese) – http://www.logos.t.u-tokyo.ac.jp/~hassy/ • Affiliation – Tsuruoka Laboratory, University of Tokyo • April 2015 – present: Ph.D. student • April 2013 – March 2015: Master’s student – National Centre for Text Mining (NaCTeM) • Research Interests – Word/phrase/document embeddings and their applications

  3. Today’s Agenda 1. Background – Word and Phrase Embeddings 2. Jointly Learning Word and Phrase Embeddings – General Idea 3. Our Methods Focusing on Transitive Verb Phrases – Word Prediction (EMNLP 2014) – Implicit Tensor Factorization (CVSC 2015) 4. Experiments and Results 5. Summary

  5. Assigning Vectors to Words • Word: String → Index → Vector • Why vectors? – Word similarities can be measured using distance metrics of the vectors (e.g., the cosine similarity) [Figure: embedding words in a vector space – similar words such as cause/trigger, disease/disorder, and mouse/rat/animal end up near each other]
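
For concreteness, here is a minimal sketch of measuring word similarity with the cosine of the angle between embedding vectors; the 3-dimensional vectors below are made up purely for illustration.

```python
# Minimal cosine-similarity sketch; the embeddings are illustrative, not real.
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

embeddings = {
    "mouse": np.array([0.9, 0.1, 0.0]),
    "rat":   np.array([0.8, 0.2, 0.1]),
    "cause": np.array([0.0, 0.7, 0.6]),
}

print(cosine_similarity(embeddings["mouse"], embeddings["rat"]))    # high: related words
print(cosine_similarity(embeddings["mouse"], embeddings["cause"]))  # low: unrelated words
```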

  6. Approaches to Word Representations • Two approaches using large corpora (a systematic comparison of them is given in Baroni+ (2014)): – Count-based approach • e.g., reducing the dimension of a word co-occurrence matrix using SVD (see the sketch below) – Prediction-based approach • e.g., predicting words from their contexts using neural networks • We focus on the prediction-based approach – Why?
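
As a concrete illustration of the count-based approach, here is a hedged sketch that builds a small word-by-word co-occurrence matrix from a toy corpus and reduces it with SVD; the corpus, window size, and dimensionality are invented for the example.

```python
# Count-based word vectors: co-occurrence counts + truncated SVD (toy example).
import numpy as np

corpus = [["the", "importer", "made", "payment"],
          ["the", "businessman", "made", "payment"]]
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

counts = np.zeros((len(vocab), len(vocab)))
window = 2
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                counts[idx[w], idx[sent[j]]] += 1

# Keep the top-k left singular vectors (scaled) as low-dimensional word embeddings.
U, S, Vt = np.linalg.svd(counts)
k = 2
word_vectors = U[:, :k] * S[:k]
print(word_vectors[idx["importer"]])
```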

  7. Learning Word Embeddings • Prediction-based approaches usually – parameterize the word embeddings – learn them based on co-occurrence statistics • Embeddings of words appearing in similar contexts get close to each other • e.g., the SkipGram model (Mikolov+, 2013) in word2vec: predicting surrounding words using the target word’s embedding [Figure: text data '… the prevalence of drunken driving and accidents caused by drinking …' with 'drinking' as the target word]
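
To make the prediction-based training loop concrete, here is a rough sketch in the spirit of SkipGram with negative sampling; the vocabulary size, dimensionality, learning rate, and number of negative samples are illustrative, and this is not the exact word2vec implementation.

```python
# SkipGram-style update with negative sampling (simplified, illustrative sizes).
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim, lr = 1000, 50, 0.05
W_target = rng.normal(scale=0.1, size=(vocab_size, dim))   # embeddings of target words
W_context = rng.normal(scale=0.1, size=(vocab_size, dim))  # embeddings of context words

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_pair(target, context, num_negatives=5):
    """One SGD step for an observed (target word, context word) pair."""
    samples = [(context, 1.0)] + [(int(rng.integers(vocab_size)), 0.0)
                                  for _ in range(num_negatives)]
    for word, label in samples:
        score = sigmoid(W_target[target] @ W_context[word])
        grad = score - label                       # gradient of logistic loss w.r.t. the score
        g_target = grad * W_context[word].copy()   # save before W_context is updated
        W_context[word] -= lr * grad * W_target[target]
        W_target[target] -= lr * g_target

# Example: some target word index observed next to some context word index.
train_pair(target=42, context=7)
```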

  8. Task-Oriented Word Embeddings • Learning word embeddings for relation classification – To appear at CoNLL 2015 (just advertising)

  9. Beyond Word Embeddings • Treating phrases and sentences as well as words – gaining much attention recently! [Figure: embedding phrases in a vector space – related phrases such as 'make payment' and 'pay money' end up near each other, just like the words make/pay and payment/money]

  10. Approaches to Phrase Embeddings • Element-wise addition/multiplication (Lapata+, 2010) – w(sentence) = Σ_j w(x_j) • Recursive autoencoders (Socher+, 2011; Hermann+, 2013) – Using parse trees – w(parent) = g(w(left child), w(right child)) • Tensor/matrix-based methods – w(adj noun) = N(adj) w(noun) (Baroni+, 2010) – N(verb) = Σ_{j,k} w(subj_j) w(obj_k)ᵀ (Grefenstette+, 2011) • N(subj, verb, obj) = {w(subj) w(obj)ᵀ} ∗ N(verb) • w(subj, verb, obj) = (N(verb) w(obj)) ∗ w(subj) (Kartsaklis+, 2012), where ∗ is the element-wise product
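
The following sketch spells out, in numpy, what a few of these composition operations compute; the vectors and matrices are random stand-ins rather than trained parameters.

```python
# Composition operations sketched with random placeholder parameters.
import numpy as np

rng = np.random.default_rng(0)
d = 4
w_subj, w_verb, w_obj, w_noun = (rng.normal(size=d) for _ in range(4))
N_adj = rng.normal(size=(d, d))     # adjective as a matrix (Baroni+, 2010)
N_verb = rng.normal(size=(d, d))    # transitive verb as a matrix

w_add = w_subj + w_verb + w_obj     # element-wise addition (Lapata+, 2010)
w_mul = w_subj * w_verb * w_obj     # element-wise multiplication
w_adj_noun = N_adj @ w_noun         # matrix-vector composition
N_svo = np.outer(w_subj, w_obj) * N_verb   # {w(subj) w(obj)^T} * N(verb)
w_svo = (N_verb @ w_obj) * w_subj   # copy-object composition (Kartsaklis+, 2012)
```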

  11. Which Word Embeddings are the Best? • Co-occurrence matrix + SVD • C&W (Collobert+, 2011) • RNNLM (Mikolov+, 2013) • SkipGram/CBOW (Mikolov+, 2013) • vLBL/ivLBL (Mnih+, 2013) • Dependency-based SkipGram (Levy+, 2014) • GloVe (Pennington+, 2014) • Which word embeddings should we use for which composition methods? → Joint learning

  13. Co-Occurrence Statistics of Phrases • Word co-occurrence statistics → word embeddings • How about phrase embeddings? – Phrase co-occurrence statistics! [Figure: 'The businessman pays his monthly fee in yen' and 'The importer made payment in his own domestic currency' appear in similar contexts – do the phrases have similar meanings?]

  14. How to Identify Phrase-Word Relations? • Using Predicate-Argument Structures (PAS) – Enju parser (Miyao+, 2008) • Analyzes relations between phrases and words [Figure: parse of 'The importer made payment in his own domestic currency' – the verb 'made' and the preposition 'in' are predicates, and the NPs ('The importer', 'payment', 'his own domestic currency') are their arguments]

  16. Why Transitive Verb Phrases? • Meanings of transitive verbs are affected by their arguments – e.g., run, make, etc. → a good target to test composition models [Figure: 'make money' ≈ earn, 'make payment' ≈ pay, 'make use (of)' ≈ use – the same verb 'make' takes on different meanings depending on its object]

  17. Possible Application: Semantic Search • Embedding subject-verb-object tuples in a vector space – Semantic similarities between SVOs can be used!
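
As a hedged illustration of the idea, the sketch below retrieves the nearest neighbours of a query SVO tuple by cosine similarity; the tuple embeddings are random placeholders standing in for vectors produced by the models described later in the talk.

```python
# Nearest-neighbour search over SVO tuple embeddings (placeholder vectors).
import numpy as np

rng = np.random.default_rng(0)
svo_tuples = [("importer", "make", "payment"),
              ("businessman", "pay", "fee"),
              ("mouse", "eat", "cheese")]
svo_vecs = rng.normal(size=(len(svo_tuples), 50))
svo_vecs /= np.linalg.norm(svo_vecs, axis=1, keepdims=True)   # unit-normalize rows

def search(query_vec, top_k=2):
    """Return the top_k SVO tuples most similar to the query embedding."""
    query_vec = query_vec / np.linalg.norm(query_vec)
    scores = svo_vecs @ query_vec                # cosine similarities
    best = np.argsort(-scores)[:top_k]
    return [(svo_tuples[i], float(scores[i])) for i in best]

print(search(rng.normal(size=50)))
```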

  18. Training Data from Large Corpora • Focusing on the role of prepositional adjuncts – Prepositional adjuncts complement the meanings of verb phrases → should be useful [Figure: sentences from English Wikipedia, the BNC, etc. are parsed and simplified into predicate-argument tuples] • How to model the relationships between predicates and arguments?
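
Purely as an illustration of what a simplified training instance might look like (the exact record format used in the papers may differ), here is a hypothetical representation of a transitive-verb tuple with a prepositional adjunct, using the example sentence from the earlier slides.

```python
# Hypothetical simplified training instance; field names are invented.
from dataclasses import dataclass

@dataclass
class SVOWithAdjunct:
    subject: str        # head noun of the subject NP
    verb: str           # lemmatized transitive verb
    obj: str            # head noun of the object NP
    preposition: str    # prepositional adjunct
    prep_arg: str       # head noun of the preposition's argument

# "The importer made payment in his own domestic currency"
instance = SVOWithAdjunct("importer", "make", "payment", "in", "currency")
print(instance)
```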

  19. Today’s Agenda 1. Background – Word and Phrase Embeddings 2. Jointly Learning Word and Phrase Embeddings – General Idea 3. Our Methods Focusing on Transitive Verb Phrases – Word Prediction (EMNLP 2014) – Implicit Tensor Factorization (CVSC 2015) 4. Experiments and Results 5. Summary 19 / 39 19/06/2015 Talk@UCL Machine Reading Lab.

  20. Word Prediction Model (like word2vec) • PAS-CLBLM: predicting words in predicate-argument tuples – Example: [importer make payment] in ___ , with the observed word 'currency' and a sampled word 'furniture' – Feature vector for the word prediction: q = tanh(w_arg1^prep ⊙ i_arg1 + w_pred^prep ⊙ i_pred), where i_pred is the embedding of the predicate (here the preposition 'in'), i_arg1 is the embedding of its first argument (here [importer make payment]), and the w vectors are element-wise weights – Cost function: max(0, 1 − s(currency) + s(furniture))
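
The following is a hedged sketch of this training step with invented dimensions and random parameters; the scoring function s(.) is simplified to a dot product between the composed vector and a target-word embedding, which may differ in detail from the EMNLP 2014 model.

```python
# Hinge-loss word prediction step in the spirit of PAS-CLBLM (simplified).
import numpy as np

rng = np.random.default_rng(0)
d = 50
i_pred = rng.normal(size=d)        # embedding of the predicate ('in')
i_arg1 = rng.normal(size=d)        # embedding of the first argument ([importer make payment])
w_pred, w_arg1 = np.ones(d), np.ones(d)   # element-wise composition weights
target_emb = {"currency": rng.normal(size=d), "furniture": rng.normal(size=d)}

q = np.tanh(w_arg1 * i_arg1 + w_pred * i_pred)   # composed feature vector

def s(word):
    """Plausibility score of predicting `word` from the composed vector."""
    return q @ target_emb[word]

loss = max(0.0, 1.0 - s("currency") + s("furniture"))   # hinge cost from the slide
print(loss)
```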

  21. How to Compute SVO Embeddings? • Two methods: – (a) assigning a parameterized vector to each SVO tuple, e.g., one vector for [importer make payment] – (b) composing the SVO embedding from the subj/verb/obj word vectors
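
A minimal sketch of the two options with placeholder parameters; the composition in (b) mirrors the element-wise form used above and is illustrative only.

```python
# (a) one trainable vector per SVO tuple vs. (b) a composed vector.
import numpy as np

rng = np.random.default_rng(0)
d = 50

# (a) parameterized: a dedicated vector for every tuple observed in the corpus
svo_table = {("importer", "make", "payment"): rng.normal(size=d)}
vec_a = svo_table[("importer", "make", "payment")]

# (b) composed: built on the fly from the word embeddings
word_emb = {w: rng.normal(size=d) for w in ("importer", "make", "payment")}
weights = {r: np.ones(d) for r in ("subj", "verb", "obj")}
vec_b = np.tanh(weights["subj"] * word_emb["importer"]
                + weights["verb"] * word_emb["make"]
                + weights["obj"] * word_emb["payment"])
```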

  23. Weakness of PAS-CLBLM • Only element-wise vector operations – Pros: fast training – Cons: poor interaction between predicates and arguments • Interactions between predicates and arguments are important for transitive verbs [Figure: 'make money' ≈ earn, 'make payment' ≈ pay, 'make use (of)' ≈ use – the meaning of 'make' depends on its object]

  24. Focusing on Tensor-Based Approaches • Tensor/matrix-based approaches (noun: vector) – Adjective: matrix (Baroni+, 2010) – Transitive verb: matrix (Grefenstette+, 2011; Van de Cruys+, 2013) [Figure: with pre-trained (given) subject and object vectors, the verb matrix is learned so that e_subjectᵀ M_verb e_object ≅ PMI(subject, verb, object), e.g., PMI(importer, make, payment) = 0.31]

  25. Implicit Tensor Factorization (1) • Parameterizing – Predicate matrices and – Argument embeddings [Figure: the same bilinear form as above, now written as e_arg1ᵀ M_predicate e_arg2, where both the predicate matrices and the argument embeddings are learned rather than given]

  26. Implicit Tensor Factorization (2) • Calculating plausibility scores – Using predicate matrices & argument embeddings – The score of a tuple (i, j, k) is the implicit tensor entry: T(i, j, k) = e_arg1(i)ᵀ M_pred(j) e_arg2(k)
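
A hedged numpy sketch of this scoring function, with invented vocabulary sizes and randomly initialized parameters standing in for the learned ones.

```python
# Bilinear plausibility score with one matrix per predicate (illustrative sizes).
import numpy as np

rng = np.random.default_rng(0)
num_args, num_preds, d = 1000, 200, 50
arg_vecs = rng.normal(scale=0.1, size=(num_args, d))       # argument embeddings
pred_mats = rng.normal(scale=0.1, size=(num_preds, d, d))   # predicate matrices

def score(i, j, k):
    """Plausibility of the tuple (argument-1 word i, predicate j, argument-2 word k)."""
    return arg_vecs[i] @ pred_mats[j] @ arg_vecs[k]

print(score(3, 7, 42))
```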

  27. Implicit Tensor Factorization (3) • Learning model parameters – Using a plausibility judgment task • Observed tuple: (i, j, k) • Collapsed tuples: (i', j, k), (i, j', k), (i, j, k') – Negative sampling (Mikolov+, 2013) – Cost function: trained so that observed tuples score higher than collapsed ones
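
Below is a hedged sketch of one plausible instantiation of this objective, using a logistic loss with negative sampling; it is not claimed to be the exact cost function from the CVSC 2015 paper, and the corruption of j and k is indicated only in comments.

```python
# Negative-sampling objective over observed vs. collapsed tuples (illustrative).
import numpy as np

rng = np.random.default_rng(0)
num_args, num_preds, d = 1000, 200, 50
arg_vecs = rng.normal(scale=0.1, size=(num_args, d))
pred_mats = rng.normal(scale=0.1, size=(num_preds, d, d))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def score(i, j, k):
    return arg_vecs[i] @ pred_mats[j] @ arg_vecs[k]

def tuple_loss(i, j, k, num_negatives=5):
    """Loss for one observed tuple (i, j, k) against randomly collapsed tuples."""
    loss = -np.log(sigmoid(score(i, j, k)))              # observed tuple: push score up
    for _ in range(num_negatives):
        i_neg = int(rng.integers(num_args))              # collapse argument 1 -> (i', j, k)
        loss += -np.log(sigmoid(-score(i_neg, j, k)))    # collapsing j and k is analogous
    return loss

print(tuple_loss(3, 7, 42))
```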
