  1. Word Similarity & Distributional Semantics CMSC 723 / LING 723 / INST 725 Marine Carpuat marine@cs.umd.edu

  2. Last week… • Q: what is understanding meaning? • A: knowing the sense of words in context – Requires word sense inventory – Requires a word sense disambiguation algorithm

  3. Last week… WordNet
  Noun
  {pipe, tobacco pipe} (a tube with a small bowl at one end; used for smoking tobacco)
  {pipe, pipage, piping} (a long tube made of metal or plastic that is used to carry water or oil or gas etc.)
  {pipe, tube} (a hollow cylindrical shape)
  {pipe} (a tubular wind instrument)
  {organ pipe, pipe, pipework} (the flues and stops on a pipe organ)
  Verb
  {shriek, shrill, pipe up, pipe} (utter a shrill cry)
  {pipe} (transport by pipeline) “pipe oil, water, and gas into the desert”
  {pipe} (play on a pipe) “pipe a tune”
  {pipe} (trim with piping) “pipe the skirt”

  4. Last week… WordNet [Figure: fragment of the WordNet noun network around {car; auto; automobile; machine; motorcar}: hypernym chain through {motor vehicle; automotive vehicle}, {vehicle}, {conveyance; transport}; hyponyms {cruiser; squad car; patrol car; police car; prowl car} and {cab; taxi; hack; taxicab}; meronyms {bumper}, {car door}, {car window}, {car mirror}, {armrest}, with {car door} having meronyms {hinge; flexible joint} and {door lock}]

  5. Today • Q: what is understanding meaning? • A: knowing when words are similar or not • Topics – Word similarity – Thesaurus-based methods – Distributional word representations – Dimensionality reduction

  6. WORD SIMILARITY

  7. Intuition of Semantic Similarity
  Semantically close: bank – money, apple – fruit, tree – forest, bank – river, pen – paper, run – walk, mistake – error, car – wheel
  Semantically distant: doctor – beer, painting – January, money – river, apple – penguin, nurse – fruit, pen – river, clown – tramway, car – algebra

  8. Why are 2 words similar? • Meaning – The two concepts are close in terms of their meaning • World knowledge – The two concepts have similar properties, often occur together, or occur in similar contexts • Psychology – We often think of the two concepts together

  9. Two Types of Relations • Synonymy: two words are (roughly) interchangeable • Semantic similarity (distance): somehow “related” – Sometimes an explicit lexical semantic relationship; often not

  10. Validity of Semantic Similarity • Is semantic distance a valid linguistic phenomenon? • Experiment (Rubenstein and Goodenough, 1965) – Compiled a list of word pairs – Subjects asked to judge semantic distance (from 0 to 4) for each of the word pairs • Results: – Rank correlation between subjects is ~0.9 – People are consistent!

  11. Why do this? • Task: automatically compute semantic similarity between words • Can be useful for many applications: – Detecting paraphrases (e.g., automatic essay grading, plagiarism detection) – Information retrieval – Machine translation • Why? Because similarity gives us a way to generalize beyond word identities

  12. Evaluation: Correlation with Humans • Ask automatic method to rank word pairs in order of semantic distance • Compare this ranking with human-created ranking • Measure correlation
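
In code, this comparison is typically a rank correlation such as Spearman's rho. Below is a minimal sketch, assuming hypothetical system scores and human ratings for the same word pairs (all pairs and numbers are invented for illustration):

```python
# A minimal sketch of correlation-based evaluation of a similarity method.
from scipy.stats import spearmanr

# Hypothetical ratings/scores for the same three word pairs.
pairs = [("car", "automobile"), ("coast", "shore"), ("noon", "string")]
human = [3.9, 3.6, 0.1]      # 0-4 judgments, as in Rubenstein & Goodenough
system = [0.95, 0.81, 0.05]  # scores from an automatic similarity method

rho, p = spearmanr(human, system)
print(f"Spearman rho over {len(pairs)} pairs: {rho:.2f} (p={p:.3f})")
```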

  13. Evaluation: Word-Choice Problems Identify that alternative which is closest in meaning to the target:
  Target: accidental. Alternatives: wheedle, ferment, inadvertent, abominate
  Target: imprison. Alternatives: incarcerate, writhe, meander, inhibit

  14. Evaluation: Malapropisms Jack withdrew money from the ATM next to the band. band is unrelated to all of the other words in its context… (a real-word error for bank)

  15. Word Similarity: Two Approaches • Thesaurus-based – We’ve invested in all these resources… let’s exploit them! • Distributional – Count words in context

  16. THESAURUS-BASED SIMILARITY MODELS

  17. Path-Length Similarity • Similarity based on length of path between concepts: sim_path(c1, c2) = -log pathlen(c1, c2) • How would you deal with ambiguous words?
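
To make the formula concrete, here is a minimal sketch of path-length similarity over a toy IS-A hierarchy; the tiny graph is invented, and a real system would walk a thesaurus such as WordNet instead:

```python
# A minimal sketch of path-length similarity over a toy IS-A hierarchy.
import math
from collections import deque

is_a = {  # child concept -> parent concept (invented toy taxonomy)
    "car": "motor_vehicle",
    "truck": "motor_vehicle",
    "motor_vehicle": "vehicle",
    "bicycle": "vehicle",
    "vehicle": "conveyance",
}

def pathlen(c1, c2):
    """Number of edges on the shortest path between two concepts (BFS)."""
    neighbors = {}
    for child, parent in is_a.items():
        neighbors.setdefault(child, set()).add(parent)
        neighbors.setdefault(parent, set()).add(child)
    frontier, seen = deque([(c1, 0)]), {c1}
    while frontier:
        node, d = frontier.popleft()
        if node == c2:
            return d
        for n in neighbors.get(node, ()):
            if n not in seen:
                seen.add(n)
                frontier.append((n, d + 1))
    return None

def sim_path(c1, c2):
    # Count nodes rather than edges so identical concepts get length 1,
    # avoiding log(0); this is one common convention.
    return -math.log(pathlen(c1, c2) + 1)

print(sim_path("car", "truck"))    # short path -> higher (less negative) score
print(sim_path("car", "bicycle"))  # longer path -> lower score
```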

  18. Path-Length Similarity Pros and Cons • Advantages – Simple, intuitive – Easy to implement • Major disadvantage: – Assumes each edge has same semantic distance

  19. Resnik Method • Probability that a randomly selected word in a corpus is an instance of concept c: P(c) = ( Σ_{w ∈ words(c)} count(w) ) / N – words(c) is the set of words subsumed by concept c – N is the total number of words in the corpus that are also in the thesaurus • Define “information content”: IC(c) = -log P(c) • Define similarity: sim_Resnik(c1, c2) = -log P(LCS(c1, c2)), where LCS is the lowest common subsumer
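
A minimal sketch of the Resnik computation over a toy taxonomy; the concept tree and corpus counts are invented for illustration, and the LCS is taken to be the first shared ancestor:

```python
# A minimal sketch of Resnik similarity over a toy taxonomy with fake counts.
import math

parent = {  # child concept -> parent concept (IS-A)
    "car": "motor_vehicle", "truck": "motor_vehicle",
    "motor_vehicle": "vehicle", "bicycle": "vehicle",
}
count = {"car": 80, "truck": 15, "bicycle": 5}  # invented word counts
N = sum(count.values())

def ancestors(c):
    chain = [c]
    while c in parent:
        c = parent[c]
        chain.append(c)
    return chain

def p(concept):
    # P(c): probability mass of all words subsumed by the concept.
    return sum(n for w, n in count.items() if concept in ancestors(w)) / N

def ic(concept):
    return -math.log(p(concept))

def sim_resnik(c1, c2):
    # IC of the lowest common subsumer (first shared ancestor).
    a2 = set(ancestors(c2))
    lcs = next(a for a in ancestors(c1) if a in a2)
    return ic(lcs)

print(sim_resnik("car", "truck"))    # LCS = motor_vehicle (specific, higher IC)
print(sim_resnik("car", "bicycle"))  # LCS = vehicle (general, lower IC)
```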

  20. Resnik Method: Example sim_Resnik(c1, c2) = -log P(LCS(c1, c2))

  21. Thesaurus Methods: Limitations • Measure is only as good as the resource • Limited in scope – Assumes IS-A relations – Works mostly for nouns • Role of context not accounted for • Not easily domain-adaptable • Resources not available in many languages

  22. Quick Aside: Thesauri Induction • Building thesauri automatically? • Pattern-based techniques work really well! – Co-training between patterns and relations – Useful for augmenting/adapting existing resources
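
One classic instance of the pattern-based idea is Hearst-style lexico-syntactic patterns such as “X such as Y”. A deliberately naive sketch (invented sentences; real systems match noun-phrase chunks rather than raw word spans):

```python
# A minimal sketch of Hearst-style "X such as Y" pattern extraction.
import re

sentences = [
    "They repair vehicles such as cars, trucks, and buses.",
    "We studied metals such as copper and zinc.",
]

# "X such as Y1, Y2, ... and Yn" -> each Yi IS-A X.
pattern = re.compile(r"(\w+) such as ([\w ,]+)")

for sentence in sentences:
    for match in pattern.finditer(sentence):
        hypernym = match.group(1)
        hyponyms = [h.strip() for h in re.split(r",| and ", match.group(2))]
        for hyponym in filter(None, hyponyms):
            print(f"{hyponym} IS-A {hypernym}")
```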

  23. DISTRIBUTIONAL WORD SIMILARITY MODELS

  24. Distributional Approaches: Intuition “You shall know a word by the company it keeps!” (Firth, 1957) “Differences of meaning correlate with differences of distribution” (Harris, 1970) • Intuition: – If two words appear in the same contexts, then they must be similar • Basic idea: represent a word w as a feature vector w = (f_1, f_2, f_3, …, f_N)

  25. Context Features • Word co-occurrence within a window: • Grammatical relations:
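
A minimal sketch of the window-based co-occurrence counts; the two-sentence corpus and the window size are arbitrary choices for illustration:

```python
# A minimal sketch of window-based co-occurrence counting over a tiny
# invented corpus; each word's row of counts is its context feature vector.
from collections import defaultdict

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
]
WINDOW = 2  # count neighbors up to 2 positions away

cooc = defaultdict(lambda: defaultdict(int))
for sentence in corpus:
    for i, word in enumerate(sentence):
        lo, hi = max(0, i - WINDOW), min(len(sentence), i + WINDOW + 1)
        for j in range(lo, hi):
            if j != i:
                cooc[word][sentence[j]] += 1

print(dict(cooc["cat"]))  # e.g., {'the': ..., 'sat': ..., 'on': ...}
print(dict(cooc["dog"]))  # similar neighbors -> similar vectors
```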

  26. Context Features • Feature values – Boolean – Raw counts – Some other weighting scheme (e.g., idf, tf.idf) – Association values (next slide)

  27. Association Metric • Commonly-used metric: Pointwise Mutual Information association_PMI(w, f) = log2 ( P(w, f) / ( P(w) P(f) ) ) • Can be used as a feature value or by itself
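
A minimal sketch of PMI weighting from raw co-occurrence counts; the counts here are invented, and real ones would come from a matrix like the windowed counts above:

```python
# A minimal sketch of PMI over (word, feature) co-occurrence counts.
import math

cooc = {("cat", "sat"): 4, ("cat", "the"): 10, ("dog", "sat"): 3, ("dog", "the"): 9}
total = sum(cooc.values())

def p_word(w):
    return sum(n for (wi, _), n in cooc.items() if wi == w) / total

def p_feat(f):
    return sum(n for (_, fi), n in cooc.items() if fi == f) / total

def pmi(w, f):
    p_wf = cooc.get((w, f), 0) / total
    return math.log2(p_wf / (p_word(w) * p_feat(f))) if p_wf > 0 else float("-inf")

print(pmi("cat", "sat"))  # informative context word -> higher PMI
print(pmi("cat", "the"))  # frequent function word -> PMI near (or below) zero
```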

  28. Computing Similarity • Semantic similarity boils down to computing some measure on context vectors • Cosine distance: borrowed from information retrieval sim_cosine(v, w) = (v · w) / (|v| |w|) = Σ_{i=1..N} v_i w_i / ( sqrt(Σ_{i=1..N} v_i^2) · sqrt(Σ_{i=1..N} w_i^2) )
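
A minimal sketch of the cosine computation on small invented count vectors:

```python
# A minimal sketch of cosine similarity between two context vectors.
import math

def cosine(v, w):
    dot = sum(vi * wi for vi, wi in zip(v, w))
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return dot / (norm_v * norm_w)

# Invented counts over context features, e.g., (the, sat, on, algebra).
cat = [10, 4, 3, 0]
dog = [9, 3, 3, 0]
car = [8, 0, 1, 5]

print(cosine(cat, dog))  # overlapping contexts -> cosine near 1
print(cosine(cat, car))  # different contexts -> lower cosine
```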

  29. Distributional Approaches: Discussion • No thesauri needed: data driven • Can be applied to any pair of words • Can be adapted to different domains

  30. Distributional Profiles: Example

  31. Distributional Profiles: Example

  32. Problem?

  33. Distributional Profiles of Concepts

  34. Semantic Similarity: “Celebrity” Semantically distant…

  35. Semantic Similarity: “Celestial body” Semantically close!

  36. DIMENSIONALITY REDUCTION Slides based on presentation by Christopher Potts

  37. Why dimensionality reduction? • So far, we’ve defined word representations as rows in F, an m x n matrix – m = vocab size – n = number of context dimensions / features • Problems: n is very large, F is very sparse • Solution: find a low-rank approximation of F – Matrix of size m x d where d << n

  38. Methods • Latent Semantic Analysis • Also: – Principal component analysis – Probabilistic LSA – Latent Dirichlet Allocation – Word2vec – …

  39. Latent Semantic Analysis • Based on Singular Value Decomposition

  40. LSA illustrated: SVD + select top k dimensions
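
A minimal sketch of the SVD-plus-truncation step with NumPy; the matrix F here is a random stand-in for real co-occurrence counts, and the sizes and k are arbitrary:

```python
# A minimal sketch of LSA's truncated SVD on a stand-in co-occurrence matrix.
import numpy as np

m, n, k = 500, 2000, 50  # vocab size, context features, target rank
rng = np.random.default_rng(0)
F = rng.poisson(0.05, size=(m, n)).astype(float)  # sparse-ish fake counts

# Full SVD: F = U @ diag(s) @ Vt, singular values in decreasing order.
U, s, Vt = np.linalg.svd(F, full_matrices=False)

# Keep only the top-k dimensions; each row of W is a dense word vector.
W = U[:, :k] * s[:k]  # shape (m, k): low-rank word representations

print(F.shape, "->", W.shape)  # (500, 2000) -> (500, 50)
```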

  41. Before & After LSA (k=100)

  42. Methods • Latent Semantic Analysis • Also: – Principal component analysis – Probabilistic LSA – Latent Dirichlet Allocation – Word2vec – …

  43. Recap: Today • Q: what is understanding meaning? • A: meaning is knowing when words are similar or not • Topics – Word similarity – Thesaurus-based methods – Distributional word representations – Dimensionality reduction

  44. Bonus… • Let’s try our hand at annotating word similarity
