Analogies Explained: Towards Understanding Word Embeddings
Carl Allen, Tim Hospedales. June 13, 2019
School of Informatics, University of Edinburgh


  1. Analogies Explained: Towards Understanding Word Embeddings. Carl Allen, Tim Hospedales. June 13, 2019. School of Informatics, University of Edinburgh.

  2. The Problem: linking semantics to geometry. From “man is to king as woman is to queen”, explain the relationship between the embeddings w_man, w_king, w_woman and w_queen.

  3. The Problem (cont.): or rather, explain why w_king − w_man + w_woman ≈ w_queen.

  4. The Problem (cont.): [Figure: nearest neighbours of w_K − w_M and of w_K − w_M + w_W, which include queen, prince, princess, royal, crown, reign and lord.]
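The vector-offset arithmetic above can be sketched in a few lines. The embedding table below is a hand-made toy assumption (real Word2Vec vectors are learned, not constructed), built so that a shared "royalty" direction makes the analogy hold:

```python
import numpy as np

# Hand-made toy embeddings: a "gender" axis (dims 0-1) plus a shared
# "royalty" direction (dim 2). All values are illustrative assumptions.
emb = {
    "man":    np.array([1.0, 0.0, 0.0]),
    "woman":  np.array([0.0, 1.0, 0.0]),
    "king":   np.array([1.0, 0.0, 1.0]),
    "queen":  np.array([0.0, 1.0, 1.0]),
    "prince": np.array([0.9, 0.1, 1.0]),  # distractor close to "king"
}

def nearest(vec, exclude=()):
    """Vocabulary word with the highest cosine similarity to vec."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in emb if w not in exclude),
               key=lambda w: cos(emb[w], vec))

# "man is to king as woman is to ?"
target = emb["king"] - emb["man"] + emb["woman"]
print(nearest(target, exclude={"king", "man", "woman"}))  # queen
```

With real embeddings the offset only lands near w_queen rather than exactly on it; explaining why it lands close at all is the subject of the talk.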

  5.–6. Word2Vec: SkipGram with Negative Sampling (Mikolov et al., 2013a,b). [Diagram: target words w_1, w_2, w_3, …, w_n (matrix W) paired with context words c_1, c_2, c_3, …, c_n (matrix C).]
  • computing p(c_j | w_i) by softmax over all words (E) is expensive
  • so use a sigmoid with negative sampling (k negative samples)

  7. Levy and Goldberg (2014): SGNS implicitly factorises a shifted PMI matrix:
  w_i^⊤ c_j ≈ log [ p(w_i, c_j) / (p(w_i) p(c_j)) ] − log k = PMI(w_i, c_j) − log k

  8. In matrix form: W^⊤ C ≈ PMI − log k
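The shifted-PMI quantity that SGNS approximates can be computed directly from co-occurrence counts. The corpus, window size and k below are toy assumptions purely for illustration:

```python
import math
from collections import Counter

# Toy corpus, window and number of negative samples (all illustrative).
corpus = "the king rules the land the queen rules the land".split()
window = 1
k = 5

# Symmetric (word, context) co-occurrence counts within the window.
pairs = Counter()
for i, w in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j != i:
            pairs[(w, corpus[j])] += 1

total = sum(pairs.values())
w_counts, c_counts = Counter(), Counter()
for (w, c), n in pairs.items():
    w_counts[w] += n
    c_counts[c] += n

def pmi(w, c):
    """Empirical PMI(w, c) = log p(w, c) / (p(w) p(c))."""
    return math.log(pairs[(w, c)] * total / (w_counts[w] * c_counts[c]))

# Per Levy and Goldberg (2014), trained SGNS embeddings satisfy
# w_i . c_j  ≈  PMI(w_i, c_j) - log k.
print(round(pmi("king", "rules") - math.log(k), 3))
```

On a real corpus these counts would come from a full pass over the text; the factorisation statement W^⊤ C ≈ PMI − log k is about the matrix of exactly these shifted values.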

  9.–16. Routemap, from semantic to geometric:
  “man is to king as woman is to queen”
  ⇕
  woman transforms to queen as man transforms to king
  ⇕
  {man, queen} paraphrases {woman, king}
  ⇓
  PMI_king − PMI_man + PMI_woman ≈ PMI_queen
  ⇓ (using PMI_i ≈ w_i^⊤ C)
  w_king − w_man + w_woman ≈ w_queen

  17.–18. Paraphrase† of W by w∗. Intuition: word w∗ ∈ E paraphrases word set W = {w_1, …, w_m} ⊆ E if w∗ and W are semantically interchangeable: p(E | W) ≈ p(E | w∗).
  Definition (D1): w∗ ∈ E paraphrases W ⊆ E, |W| < l, if the paraphrase error ρ^{W,w∗} ∈ R^n is (element-wise) small:
  ρ_j^{W,w∗} = log [ p(c_j | w∗) / p(c_j | W) ],  c_j ∈ E
  † Inspired by Gittens et al. (2017)
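Definition D1 is easy to evaluate on a toy example. The conditional probability table and the candidate paraphrase word "ruler" below are hypothetical assumptions, not from the paper:

```python
import math

contexts = ["crown", "beard", "castle"]

# Hand-made table of p(c_j | .): one row for the joint event {man, king},
# one for the hypothetical candidate paraphrase "ruler".
p_c_given = {
    "{man,king}": {"crown": 0.5,  "beard": 0.2,  "castle": 0.3},
    "ruler":      {"crown": 0.45, "beard": 0.25, "castle": 0.3},
}

def rho(W, w_star):
    """Paraphrase error (D1): rho_j = log p(c_j | w*) / p(c_j | W)."""
    return [math.log(p_c_given[w_star][c] / p_c_given[W][c]) for c in contexts]

err = rho("{man,king}", "ruler")
print([round(e, 3) for e in err])  # small entries: "ruler" paraphrases {man, king}
```

In practice the conditional distributions would be estimated from corpus co-occurrence statistics rather than written down by hand.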

  19.–24. Summing PMI vectors of a paraphrase: is PMI_1 + PMI_2 ≈ PMI_∗?
  PMI(w∗, c_j) − ( PMI(w_1, c_j) + PMI(w_2, c_j) )
  = log p(w∗ | c_j)/p(w∗) − log [ p(w_1 | c_j) p(w_2 | c_j) / (p(w_1) p(w_2)) ] + log p(W | c_j)/p(W | c_j) + log p(W)/p(W)
  = log [ p(c_j | w∗) / p(c_j | W) ] + log [ p(W | c_j) / (p(w_1 | c_j) p(w_2 | c_j)) ] − log [ p(W) / (p(w_1) p(w_2)) ]
  = ρ_j^{W,w∗} (paraphrase error) + σ_j^W (conditional independence error) − τ_j^W (independence error)
  Lemma 1: For any word w∗ ∈ E and word set W ⊆ E, |W| < l:
  PMI_∗ = Σ_{w_i ∈ W} PMI_i + ρ^{W,w∗} + σ^W − τ^W
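Lemma 1 is an exact identity that holds for any joint distribution, which a numerical check can illustrate. The random distribution over word-presence indicators below is an assumption purely for illustration:

```python
import math
import random

random.seed(0)

# Toy joint distribution over (w1 present, w2 present, w* present, context id).
outcomes = [(a, b, s, c) for a in (0, 1) for b in (0, 1)
            for s in (0, 1) for c in (0, 1)]
weights = [random.random() for _ in outcomes]
Z = sum(weights)
p = {o: w / Z for o, w in zip(outcomes, weights)}

def prob(pred):
    """Probability of the event described by predicate pred."""
    return sum(q for o, q in p.items() if pred(o))

cj = 1  # fix one context word c_j
p_c   = prob(lambda o: o[3] == cj)
p_w1  = prob(lambda o: o[0] == 1)
p_w2  = prob(lambda o: o[1] == 1)
p_ws  = prob(lambda o: o[2] == 1)
p_W   = prob(lambda o: o[0] == 1 and o[1] == 1)  # both words of W present
p_Wc  = prob(lambda o: o[0] == 1 and o[1] == 1 and o[3] == cj)
p_w1c = prob(lambda o: o[0] == 1 and o[3] == cj)
p_w2c = prob(lambda o: o[1] == 1 and o[3] == cj)
p_wsc = prob(lambda o: o[2] == 1 and o[3] == cj)

def pmi(pw, pwc):
    return math.log(pwc / (pw * p_c))

PMI_1, PMI_2, PMI_s = pmi(p_w1, p_w1c), pmi(p_w2, p_w2c), pmi(p_ws, p_wsc)

rho   = math.log((p_wsc / p_ws) / (p_Wc / p_W))  # log p(c|w*) / p(c|W)
sigma = math.log((p_Wc / p_c) / ((p_w1c / p_c) * (p_w2c / p_c)))
tau   = math.log(p_W / (p_w1 * p_w2))

# Lemma 1 with W = {w1, w2}: PMI_* = PMI_1 + PMI_2 + rho + sigma - tau
print(abs(PMI_s - (PMI_1 + PMI_2 + rho + sigma - tau)))  # ~0
```

Because the errors are defined as exactly the residual terms of the regrouping above, the gap is zero up to floating-point noise regardless of the seed.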

  25.–27. Generalised Paraphrase (of W by W∗): replace the word w∗ with a word set W∗ ⊆ E: p(E | W) ≈ p(E | W∗).
  Lemma 2: For any word sets W, W∗ ⊆ E with |W|, |W∗| < l:
  Σ_{w_i ∈ W∗} PMI_i = Σ_{w_i ∈ W} PMI_i + ρ^{W,W∗} + (σ^W − τ^W) − (σ^{W∗} − τ^{W∗})
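Lemma 2 can be checked the same way; the random toy distribution below, with W = {w_1, w_2} and W∗ = {w_3, w_4}, is again an illustrative assumption:

```python
import itertools
import math
import random

random.seed(1)

# Toy joint distribution over presence of w1, w2 (set W) and w3, w4 (set W*),
# plus a binary context-word id in the last slot.
outcomes = [bits + (c,) for bits in itertools.product((0, 1), repeat=4)
            for c in (0, 1)]
weights = [random.random() for _ in outcomes]
Z = sum(weights)
p = {o: w / Z for o, w in zip(outcomes, weights)}

cj = 0  # fix one context word c_j

def prob(pred):
    return sum(q for o, q in p.items() if pred(o))

p_c = prob(lambda o: o[4] == cj)

def joint(idxs):
    """p(all words with these indices present)."""
    return prob(lambda o: all(o[i] for i in idxs))

def joint_c(idxs):
    """p(all words present AND context = c_j)."""
    return prob(lambda o: all(o[i] for i in idxs) and o[4] == cj)

def sum_pmi(idxs):
    return sum(math.log(joint_c([i]) / (joint([i]) * p_c)) for i in idxs)

def sig_minus_tau(idxs):
    """sigma^S - tau^S for the word set S given by idxs."""
    sigma = math.log((joint_c(idxs) / p_c) /
                     math.prod(joint_c([i]) / p_c for i in idxs))
    tau = math.log(joint(idxs) / math.prod(joint([i]) for i in idxs))
    return sigma - tau

W, Ws = [0, 1], [2, 3]
rho = math.log((joint_c(Ws) / joint(Ws)) / (joint_c(W) / joint(W)))

lhs = sum_pmi(Ws)
rhs = sum_pmi(W) + rho + sig_minus_tau(W) - sig_minus_tau(Ws)
print(abs(lhs - rhs))  # ~0
```

Setting W∗ to a single word recovers Lemma 1, since σ and τ vanish for a one-element set.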
