  1. Kai-Wei Chang Joint work with Scott Wen-tau Yih, Chris Meek Microsoft Research

  2. Build an intelligent system that can interact with humans using natural language. Research challenge: a meaning representation of text that supports useful inferential tasks. Semantic word representation is the foundation: language is compositional, and the word is the basic semantic unit.

  3. A lot of popular methods for creating word vectors! Vector Space Model [Salton & McGill 83], Latent Semantic Analysis [Deerwester+ 90], Latent Dirichlet Allocation [Blei+ 01], Deep Neural Networks [Collobert & Weston 08]. They encode term co-occurrence information and measure semantic similarity well.

  4. [Figure: words as points in a vector space, clustered by meaning: weather terms (sunny, rainy, cloudy, windy), vehicle terms (car, cab, wheel), and emotion terms (emotion, sad, joy, feeling).]

  5. Tomorrow will be rainy. Tomorrow will be sunny. similar(rainy, sunny)? antonym(rainy, sunny)?

  6. Can't we just use the existing linguistic resources? Knowledge in these resources is never complete, and they often lack the degree of a relation. Instead, create a continuous semantic representation that leverages existing rich linguistic resources, discovers new relations, and enables us to measure the degree of multiple relations (not just similarity).

  7. Introduction
     Background: Latent Semantic Analysis (LSA); Polarity Inducing LSA (PILSA)
     Multi-Relational Latent Semantic Analysis (MRLSA): encoding multi-relational data in a tensor; tensor decomposition & measuring degree of a relation
     Experiments

  9. Data representation: encode single-relational data in a matrix, e.g., co-occurrence (from a general corpus) or synonyms (from a thesaurus). Factorization: apply SVD to the matrix to find latent components. Measuring degree of relation: cosine of latent vectors.

  10. Input: synonyms from a thesaurus, e.g., Joyfulness: joy, gladden; Sad: sorrow, sadden. Each target word (group) is a row vector; each term is a column vector:

                               joy  gladden  sorrow  sadden  goodwill
      Group 1: "joyfulness"     1      1       0       0        0
      Group 2: "sad"            0      0       1       1        0
      Group 3: "affection"      0      0       0       0        1

      Cosine score between two column vectors measures term similarity.

  11. Apply SVD to the thesaurus matrix W (target words × terms):

      W ≈ U Σ Vᵀ,  with W: d×n, U: d×k, Σ: k×k, Vᵀ: k×n

      SVD generalizes the original data: it uncovers relationships not explicit in the thesaurus, and term vectors are projected to the k-dim latent space. Word similarity: cosine of two column vectors in Σ Vᵀ.
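
A minimal numpy sketch of this LSA pipeline on the toy matrix from slide 10 (the rank k = 2 and the printed examples are my own illustration):

```python
import numpy as np

# Slide-10 thesaurus matrix: rows are target-word groups, columns are terms.
terms = ["joy", "gladden", "sorrow", "sadden", "goodwill"]
W = np.array([[1., 1., 0., 0., 0.],   # Group 1: "joyfulness"
              [0., 0., 1., 1., 0.],   # Group 2: "sad"
              [0., 0., 0., 0., 1.]])  # Group 3: "affection"

# Rank-k SVD: W ~= U @ np.diag(s) @ Vt.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
k = 2
latent = np.diag(s[:k]) @ Vt[:k]       # columns of Sigma V^T = latent term vectors

def similarity(a, b):
    """Word similarity: cosine of two latent column vectors."""
    va, vb = latent[:, terms.index(a)], latent[:, terms.index(b)]
    return va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb))

print(similarity("joy", "gladden"))    # same synonym group -> high
print(similarity("joy", "sorrow"))     # different groups -> low
```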

  12. LSA cannot distinguish antonyms [Landauer 2002]. "Distinguishing synonyms and antonyms is still perceived as a difficult open problem." [Poon & Domingos 09]

  13. Data representation: encode two opposite relations in a matrix using "polarity", e.g., synonyms & antonyms from a thesaurus. Factorization: apply SVD to the matrix to find latent components. Measuring degree of relation: cosine of latent vectors.

  14. Joyfulness: joy, gladden; sorrow, sadden. Sad: sorrow, sadden; joy, gladden. Inducing polarity: entries for antonyms are negated. Target word: row vector:

                               joy  gladden  sorrow  sadden  goodwill
      Group 1: "joyfulness"     1      1      -1      -1        0
      Group 2: "sad"           -1     -1       1       1        0
      Group 3: "affection"      0      0       0       0        1

      Cosine score: positive (synonyms).

  15. Joyfulness: joy, gladden; sorrow, sadden. Sad: sorrow, sadden; joy, gladden. Inducing polarity: entries for antonyms are negated. Target word: row vector:

                               joy  gladden  sorrow  sadden  goodwill
      Group 1: "joyfulness"     1      1      -1      -1        0
      Group 2: "sad"           -1     -1       1       1        0
      Group 3: "affection"      0      0       0       0        1

      Cosine score: negative (antonyms).
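
The same sketch with polarity induced (slide 14's signed matrix); synonyms now score near +1 and antonyms near -1:

```python
import numpy as np

# Polarity-inducing LSA: antonyms of a group get -1 instead of 0.
terms = ["joy", "gladden", "sorrow", "sadden", "goodwill"]
W = np.array([[ 1.,  1., -1., -1., 0.],   # Group 1: "joyfulness"
              [-1., -1.,  1.,  1., 0.],   # Group 2: "sad"
              [ 0.,  0.,  0.,  0., 1.]])  # Group 3: "affection"

U, s, Vt = np.linalg.svd(W, full_matrices=False)
k = 2
latent = np.diag(s[:k]) @ Vt[:k]

def score(a, b):
    """Cosine in the latent space: the sign separates synonyms from antonyms."""
    va, vb = latent[:, terms.index(a)], latent[:, terms.index(b)]
    return va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb))

print(score("joy", "gladden"))   # ~ +1: synonyms
print(score("joy", "sadden"))    # ~ -1: antonyms
```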

  16. Limitation of the matrix representation: each entry captures a particular type of relation, or at most two opposite relations with the polarity trick. To encode multiple relations between two entities, use a 3-way tensor (3-dim array)! Other binary relations worth encoding: Is-A (hyponym): ostrich is a bird; Part-whole: engine is a part of car.

  17. Data representation: encode multiple relations (synonyms, antonyms, hyponyms (is-a), …) in a tensor, e.g., from a linguistic knowledge base. Factorization: apply tensor decomposition to the tensor to find latent components. Measuring degree of relation: cosine of latent vectors after projection.

  22. Represent word relations using a tensor; each slice encodes one relation between terms and target words. Construct a tensor with two slices:

      Antonym layer:
                    joy  gladden  sadden  feeling
      joyfulness     0      0       0       0
      gladden        0      0       1       0
      sad            1      0       0       0
      anger          0      0       0       0

      Synonym layer:
                    joy  gladden  sadden  feeling
      joyfulness     1      1       0       0
      gladden        1      1       0       0
      sad            0      0       1       0
      anger          0      0       0       0

  23. Can encode multiple relations in the tensor by stacking further slices behind the synonym and antonym layers, e.g., a hypernym layer:

      Hypernym layer:
                    joy  gladden  sadden  feeling
      joyfulness     0      0       0       1
      gladden        0      0       0       0
      sad            0      0       0       1
      anger          0      0       0       1
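
A sketch of this tensor as a 3-d numpy array, filled in from the running example (the helper `add` and the relation names are my conventions):

```python
import numpy as np

# Axis 0: target words; axis 1: terms; axis 2: relation slices.
words = ["joyfulness", "gladden", "sad", "anger"]
terms = ["joy", "gladden", "sadden", "feeling"]
relations = ["syn", "ant", "hyper"]

X = np.zeros((len(words), len(terms), len(relations)))

def add(rel, word, term):
    # Record that `term` stands in relation `rel` to the target `word`.
    X[words.index(word), terms.index(term), relations.index(rel)] = 1.

# Synonym slice.
add("syn", "joyfulness", "joy"); add("syn", "joyfulness", "gladden")
add("syn", "gladden", "joy");    add("syn", "gladden", "gladden")
add("syn", "sad", "sadden")
# Antonym slice.
add("ant", "gladden", "sadden"); add("ant", "sad", "joy")
# Hypernym slice: joyfulness, sad, and anger are all kinds of feeling.
add("hyper", "joyfulness", "feeling"); add("hyper", "sad", "feeling")
add("hyper", "anger", "feeling")
```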

  24. Data representation: encode multiple relations (synonyms, antonyms, hyponyms (is-a), …) in a tensor, e.g., from a linguistic knowledge base. Factorization: apply tensor decomposition to the tensor to find latent components. Measuring degree of relation: cosine of latent vectors after projection.

  25. Derive a low-rank approximation to generalize the data and to discover unseen relations. Apply Tucker decomposition and reformulate the results.

      [Figure: the tensor, with target words w1, …, wm along one mode and terms t1, …, tn along another, approximated by a core tensor multiplied by a factor matrix on each of those two modes; the term-mode factors are the latent representation of words.]

  26. Derive a low-rank approximation to generalize the data and to discover unseen relations. Apply Tucker decomposition and reformulate the results.

      [Figure: the same decomposition, now highlighting that each slice of the core tensor is the latent representation of a relation.]
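
The deck names only "Tucker decomposition", not a specific algorithm; below is a minimal HOSVD-style sketch that compresses the word and term modes while leaving the relation mode intact, which yields the per-slice form U S_r Vᵀ used on the later slides:

```python
import numpy as np

def tucker2(X, k1, k2):
    """HOSVD-style Tucker decomposition of a (words x terms x relations)
    tensor, so that each slice factors as X[:, :, r] ~= U @ S[:, :, r] @ V.T."""
    d, n, m = X.shape
    # Mode-1 unfolding (d x n*m): leading left singular vectors give U.
    U = np.linalg.svd(X.reshape(d, n * m), full_matrices=False)[0][:, :k1]
    # Mode-2 unfolding (n x d*m): leading left singular vectors give V.
    X2 = np.transpose(X, (1, 0, 2)).reshape(n, d * m)
    V = np.linalg.svd(X2, full_matrices=False)[0][:, :k2]
    # Core tensor S = X x_1 U^T x_2 V^T, one slice per relation.
    S = np.einsum('di,dnr,nj->ijr', U, X, V)
    return U, S, V

# Low-rank reconstruction generalizes the data to unseen relation entries:
# X_hat = np.einsum('di,ijr,nj->dnr', U, S, V)
```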

  27. Data representation: encode multiple relations (synonyms, antonyms, hyponyms (is-a), …) in a tensor, e.g., from a linguistic knowledge base. Factorization: apply tensor decomposition to the tensor to find latent components. Measuring degree of relation: cosine of latent vectors after projection.

  28. Similarity: cosine of the latent vectors. Other relations (both symmetric and asymmetric): take the latent matrix of the pivot relation (synonym) and the latent matrix of the relation in question; the degree is the cosine of the latent vectors after projection.

  29. ant(joy, sadden) = cos( X[:, joy, syn], X[:, sadden, ant] )

      Antonym layer:
                    joy  gladden  sadden  feeling
      joyfulness     0      0       0       0
      gladden        0      0       1       0
      sad            1      0       0       0
      anger          0      0       0       0

      Synonym layer:
                    joy  gladden  sadden  feeling
      joyfulness     1      1       0       0
      gladden        1      1       0       0
      sad            0      0       1       0
      anger          0      0       0       0

  31. hyper(joy, feeling) = cos( X[:, joy, syn], X[:, feeling, hyper] )

      Hypernym layer:
                    joy  gladden  sadden  feeling
      joyfulness     0      0       0       1
      gladden        0      0       0       0
      sad            0      0       0       1
      anger          0      0       0       1

      Synonym layer:
                    joy  gladden  sadden  feeling
      joyfulness     1      1       0       0
      gladden        1      1       0       0
      sad            0      0       1       0
      anger          0      0       0       0
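
A sketch of this scoring rule against the (low-rank reconstructed) tensor X_hat; the list names and the "syn" pivot string follow the tensor-construction sketch above:

```python
import numpy as np

def relation_score(X_hat, terms, relations, rel, a, b):
    """Degree of relation `rel` between terms a and b: cosine between a's
    column in the pivot (synonym) slice and b's column in the `rel` slice."""
    va = X_hat[:, terms.index(a), relations.index("syn")]
    vb = X_hat[:, terms.index(b), relations.index(rel)]
    return va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb) + 1e-12)

# relation_score(X_hat, terms, relations, "ant", "joy", "sadden")
# relation_score(X_hat, terms, relations, "hyper", "joy", "feeling")
```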

  32. In general, for any relation rel:

      rel(w_i, w_j) = cos( X[:, w_i, syn], X[:, w_j, rel] )

      The synonym layer serves as the pivot; the second argument indexes the slice of the specific relation.

  33. With the decomposition, write Vᵀ = [v_1, …, v_n] for the latent term vectors and R_rel = S[:, :, rel] for the core-tensor slice of a relation. Then:

      rel(w_i, w_j) = cos( (S[:, :, syn] Vᵀ)[:, i], (S[:, :, rel] Vᵀ)[:, j] ) = cos( R_syn v_i, R_rel v_j )
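
The equivalent factored form, assuming S and V come from a decomposition like the tucker2 sketch above (U is orthonormal, so it drops out of the cosine):

```python
import numpy as np

def rel_score(S, V, terms, relations, rel, a, b):
    """Slide-33 form: rel(w_i, w_j) = cos(R_syn v_i, R_rel v_j), where v_i is
    the latent vector of term i (row i of V) and R_* are core-tensor slices."""
    vi, vj = V[terms.index(a)], V[terms.index(b)]
    ra = S[:, :, relations.index("syn")] @ vi
    rb = S[:, :, relations.index(rel)] @ vj
    return ra @ rb / (np.linalg.norm(ra) * np.linalg.norm(rb) + 1e-12)
```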

  34. Introduction
      Background: Latent Semantic Analysis (LSA); Polarity Inducing LSA (PILSA)
      Multi-Relational Latent Semantic Analysis (MRLSA): encoding multi-relational data in a tensor; tensor decomposition & measuring degree of a relation
      Experiments

  35. Encarta Thesaurus: records synonyms and antonyms of target words; vocabulary of 50k terms and 47k target words. WordNet: has synonym, antonym, hyponym, and hypernym relations; vocabulary of 149k terms and 117k target words. Goal: show that MRLSA generalizes LSA to model multiple relations.
