Word Similarity & Distributional Semantics CMSC 723 / LING 723 / INST 725 Marine Carpuat marine@cs.umd.edu
Last week… • Q: what is understanding meaning? • A: knowing the sense of words in context – Requires word sense inventory – Requires a word sense disambiguation algorithm
Last week… WordNet Noun {pipe, tobacco pipe} (a tube with a small bowl at one end; used for smoking tobacco) {pipe, pipage, piping} (a long tube made of metal or plastic that is used to carry water or oil or gas etc.) {pipe, tube} (a hollow cylindrical shape) {pipe} (a tubular wind instrument) {organ pipe, pipe, pipework} (the flues and stops on a pipe organ) Verb {shriek, shrill, pipe up, pipe} (utter a shrill cry) {pipe} (transport by pipeline) “pipe oil, water, and gas into the desert” {pipe} (play on a pipe) “pipe a tune” {pipe} (trim with piping) “pipe the skirt”
Last week… WordNet (fragment around {car; auto; automobile; machine; motorcar}):
• Hypernym chain: {car; auto; automobile; machine; motorcar} → {motor vehicle; automotive vehicle} → {vehicle} → {conveyance; transport}
• Hyponyms: {cruiser; squad car; patrol car; police car; prowl car}, {cab; taxi; hack; taxicab}
• Meronyms of car: {bumper}, {car door}, {car mirror}
• Meronyms of {car door}: {hinge; flexible joint}, {door lock}, {car window}, {armrest}
Today • Q: what is understanding meaning? • A: knowing when words are similar or not • Topics – Word similarity – Thesaurus-based methods – Distributional word representations – Dimensionality reduction
WORD SIMILARITY
Intuition of Semantic Similarity
• Semantically close: bank – money, apple – fruit, tree – forest, bank – river, pen – paper, run – walk, mistake – error, car – wheel
• Semantically distant: doctor – beer, painting – January, money – river, apple – penguin, nurse – fruit, pen – river, clown – tramway, car – algebra
Why are two words similar? • Meaning – The two concepts are close in terms of their meaning • World knowledge – The two concepts have similar properties, often occur together, or occur in similar contexts • Psychology – We often think of the two concepts together
Two Types of Relations • Synonymy: two words are (roughly) interchangeable • Semantic similarity (distance): somehow “related” – Sometimes an explicit lexical semantic relation, often not
Validity of Semantic Similarity • Is semantic distance a valid linguistic phenomenon? • Experiment (Rubenstein and Goodenough, 1965) – Compiled a list of word pairs – Subjects asked to judge semantic distance (from 0 to 4) for each of the word pairs • Results: – Rank correlation between subjects is ~0.9 – People are consistent!
Why do this? • Task: automatically compute semantic similarity between words • Can be useful for many applications: – Detecting paraphrases (e.g., for automatic essay grading, plagiarism detection) – Information retrieval – Machine translation • Why? Because similarity gives us a way to generalize beyond word identities
Evaluation: Correlation with Humans • Ask automatic method to rank word pairs in order of semantic distance • Compare this ranking with human-created ranking • Measure correlation
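To make this concrete, here is a minimal sketch of the correlation-based evaluation in Python. The `sim` function is a toy stand-in (character-set overlap) for whatever model is being evaluated, and the word pairs with ratings are illustrative, not a real dataset.

```python
# Minimal sketch of correlation-based evaluation against human judgments.
from scipy.stats import spearmanr

def sim(w1, w2):
    # Toy stand-in model: character-set overlap (Jaccard). Replace with a real model.
    a, b = set(w1), set(w2)
    return len(a & b) / len(a | b)

# Hypothetical human judgments: (word1, word2, mean rating on a 0-4 scale).
pairs = [("car", "automobile", 3.9), ("food", "fruit", 3.1),
         ("noon", "string", 0.1), ("coast", "forest", 1.0)]

system = [sim(w1, w2) for w1, w2, _ in pairs]
human = [rating for _, _, rating in pairs]

# Rank correlation between the system ranking and the human ranking.
rho, _ = spearmanr(system, human)
print(f"Spearman's rho: {rho:.3f}")
```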
Evaluation: Word-Choice Problems
Identify the alternative closest in meaning to the target:
• accidental: wheedle, ferment, inadvertent, abominate
• imprison: incarcerate, writhe, meander, inhibit
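Given any similarity function, word-choice problems reduce to an argmax over the alternatives; a sketch, reusing the toy `sim` from the previous example:

```python
# Word-choice: pick the alternative most similar to the target.
def answer(target, alternatives, sim):
    return max(alternatives, key=lambda alt: sim(target, alt))

print(answer("accidental", ["wheedle", "ferment", "inadvertent", "abominate"], sim))
print(answer("imprison", ["incarcerate", "writhe", "meander", "inhibit"], sim))
```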
Evaluation: Malapropisms Jack withdrew money from the ATM next to the band. band is unrelated to all of the other words in its context… which signals a likely malapropism (here, for bank)
Word Similarity: Two Approaches • Thesaurus-based – We’ve invested in all these resources… let’s exploit them! • Distributional – Count words in context
THESAURUS-BASED SIMILARITY MODELS
Path-Length Similarity • Similarity based on length of path between concepts: $\text{sim}_{\text{path}}(c_1, c_2) = -\log \text{pathlen}(c_1, c_2)$ How would you deal with ambiguous words?
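As a sketch, NLTK's WordNet interface exposes a path-based score (its `path_similarity` is $1/(\text{pathlen}+1)$, a variant of the formula above). One standard answer to the ambiguity question is to take the max over all sense pairs, as below; assumes nltk and its wordnet data are installed.

```python
from nltk.corpus import wordnet as wn

def word_path_similarity(w1, w2):
    # Handle ambiguous words by taking the max over all synset (sense) pairs.
    scores = [s1.path_similarity(s2) or 0.0  # None (no path) treated as 0
              for s1 in wn.synsets(w1)
              for s2 in wn.synsets(w2)]
    return max(scores, default=0.0)

print(word_path_similarity("car", "bicycle"))
```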
Path-Length Similarity Pros and Cons • Advantages – Simple, intuitive – Easy to implement • Major disadvantage: – Assumes each edge has same semantic distance
Resnik Method
• Probability that a randomly selected word in a corpus is an instance of concept c: $P(c) = \frac{\sum_{w \in \text{words}(c)} \text{count}(w)}{N}$
– words(c) is the set of words subsumed by concept c
– N is total number of words in corpus also in thesaurus
• Define “information content”: $IC(c) = -\log P(c)$
• Define similarity: $\text{sim}_{\text{Resnik}}(c_1, c_2) = -\log P(\text{LCS}(c_1, c_2))$
Resnik Method: Example $\text{sim}_{\text{Resnik}}(c_1, c_2) = -\log P(\text{LCS}(c_1, c_2))$
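A sketch of the Resnik measure using NLTK, with information content estimated from the Brown corpus (assumes the wordnet and wordnet_ic data packages are installed); the two synsets are illustrative.

```python
from nltk.corpus import wordnet as wn, wordnet_ic

brown_ic = wordnet_ic.ic("ic-brown.dat")  # P(c) estimated from Brown corpus counts

c1 = wn.synset("car.n.01")
c2 = wn.synset("boat.n.01")

# res_similarity returns IC(LCS(c1, c2)) = -log P(LCS(c1, c2)).
print(c1.res_similarity(c2, brown_ic))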
Thesaurus Methods: Limitations • Measure is only as good as the resource • Limited in scope – Assumes IS-A relations – Works mostly for nouns • Role of context not accounted for • Not easily domain-adaptable • Resources not available in many languages
Quick Aside: Thesauri Induction • Building thesauri automatically? • Pattern-based techniques work really well! – Co-training between patterns and relations – Useful for augmenting/adapting existing resources
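A toy sketch of the pattern-based idea, using a single Hearst-style “X such as Y” pattern; real systems use many patterns and bootstrap between patterns and extracted pairs, and the sentence below is made up.

```python
import re

text = "He repairs vehicles such as cars, trucks, and motorcycles."

# "<hypernym> such as <hyponym>(, <hyponym>)*" -- one Hearst-style pattern.
for match in re.finditer(r"(\w+) such as ((?:\w+(?:, )?(?:and )?)+)", text):
    hypernym = match.group(1)
    hyponyms = [h for h in re.findall(r"\w+", match.group(2)) if h != "and"]
    for h in hyponyms:
        print(f"{h} IS-A {hypernym}")
```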
DISTRIBUTIONAL WORD SIMILARITY MODELS
Distributional Approaches: Intuition
“You shall know a word by the company it keeps!” (Firth, 1957)
“Differences of meaning correlate with differences of distribution” (Harris, 1970)
• Intuition: – If two words appear in the same contexts, then they must be similar
• Basic idea: represent a word w as a feature vector $\vec{w} = (f_1, f_2, f_3, \ldots, f_N)$
Context Features • Word co-occurrence within a window: e.g., the words appearing within ±k positions of the target word • Grammatical relations: e.g., the verbs for which the target word appears as subject or object
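A minimal sketch of the first feature type, window-based co-occurrence counts; the window size and toy sentence are arbitrary choices for illustration.

```python
from collections import Counter, defaultdict

def cooccurrence_vectors(tokens, window=2):
    # Represent each word by counts of words within +/- window positions.
    vectors = defaultdict(Counter)
    for i, w in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[w][tokens[j]] += 1
    return vectors

tokens = "the cat sat on the mat while the dog sat on the rug".split()
print(cooccurrence_vectors(tokens)["sat"])  # context counts for "sat"
```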
Context Features • Feature values – Boolean – Raw counts – Some other weighting scheme (e.g., idf, tf.idf ) – Association values (next slide)
Association Metric • Commonly-used metric: Pointwise Mutual Information $\text{association}_{\text{PMI}}(w, f) = \log_2 \frac{P(w, f)}{P(w)\,P(f)}$ • Can be used as a feature value or by itself
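A sketch of PMI over raw co-occurrence counts, e.g., the nested counters built in the window sketch above; probabilities are maximum-likelihood estimates from the counts.

```python
import math

def pmi(counts, w, f):
    # counts[w][f] = number of times feature f co-occurred with word w.
    total = sum(c for ctr in counts.values() for c in ctr.values())
    if counts[w][f] == 0:
        return float("-inf")  # undefined for unseen pairs; often clipped to 0 (PPMI)
    p_wf = counts[w][f] / total
    p_w = sum(counts[w].values()) / total
    p_f = sum(ctr[f] for ctr in counts.values()) / total
    return math.log2(p_wf / (p_w * p_f))
```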
Computing Similarity • Semantic similarity boils down to computing some measure on context vectors • Cosine distance: borrowed from information retrieval $\text{sim}_{\text{cosine}}(\vec{v}, \vec{w}) = \frac{\vec{v} \cdot \vec{w}}{|\vec{v}|\,|\vec{w}|} = \frac{\sum_{i=1}^{N} v_i w_i}{\sqrt{\sum_{i=1}^{N} v_i^2}\,\sqrt{\sum_{i=1}^{N} w_i^2}}$
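A sketch over sparse context vectors stored as dicts (feature → value), matching the co-occurrence counters built above.

```python
import math

def cosine(v, w):
    # Dot product over shared features only; zero entries contribute nothing.
    dot = sum(v[f] * w[f] for f in v if f in w)
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    norm_w = math.sqrt(sum(x * x for x in w.values()))
    return dot / (norm_v * norm_w) if norm_v and norm_w else 0.0
```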
Distributional Approaches: Discussion • No thesauri needed: data driven • Can be applied to any pair of words • Can be adapted to different domains
Distributional Profiles: Example
Problem?
Distributional Profiles of Concepts
Semantic Similarity: “Celebrity” Semantically distant…
Semantic Similarity: “Celestial body” Semantically close!
DIMENSIONALITY REDUCTION Slides based on presentation by Christopher Potts
Why dimensionality reduction? • So far, we’ve defined word representations as rows in F, an m × n matrix – m = vocab size – n = number of context dimensions / features • Problems: n is very large, F is very sparse • Solution: find a low-rank approximation of F – Matrix of size m × d where d ≪ n
Methods • Latent Semantic Analysis • Also: – Principal component analysis – Probabilistic LSA – Latent Dirichlet Allocation – Word2vec – …
Latent Semantic Analysis • Based on Singular Value Decomposition
LSA illustrated: SVD + select top k dimensions
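A sketch of the SVD-plus-truncation step with NumPy; the random matrix is a stand-in for a real word-context count matrix, and the sizes are illustrative.

```python
import numpy as np

F = np.random.rand(1000, 5000)  # stand-in for a real m x n word-context matrix
k = 100                         # number of latent dimensions to keep

# Factor F and keep only the top k singular dimensions.
U, s, Vt = np.linalg.svd(F, full_matrices=False)
word_vectors = U[:, :k] * s[:k]  # m x k low-dimensional word representations
print(word_vectors.shape)        # (1000, 100)
```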
Before & After LSA (k=100)
Recap: Today • Q: what is understanding meaning? • A: meaning is knowing when words are similar or not • Topics – Word similarity – Thesaurus-based methods – Distributional word representations – Dimensionality reduction
Bonus… • Let’s try our hand at annotating word similarity