Word Similarity & Distributional Semantics CMSC 723 / LING 723 / INST 725 Marine Carpuat marine@cs.umd.edu
Last week… • Q: what is understanding meaning? • A: knowing the sense of words in context – Requires word sense inventory – Requires a word sense disambiguation algorithm
Last week… WordNet Noun {pipe, tobacco pipe} (a tube with a small bowl at one end; used for smoking tobacco) {pipe, pipage, piping} (a long tube made of metal or plastic that is used to carry water or oil or gas etc.) {pipe, tube} (a hollow cylindrical shape) {pipe} (a tubular wind instrument) {organ pipe, pipe, pipework} (the flues and stops on a pipe organ) Verb {shriek, shrill, pipe up, pipe} (utter a shrill cry) {pipe} (transport by pipeline) “pipe oil, water, and gas into the desert” {pipe} (play on a pipe) “pipe a tune” {pipe} (trim with piping) “pipe the skirt”
Last week… WordNet (fragment around {car; auto; automobile; machine; motorcar}):
• Hypernym chain: {car; auto; automobile; machine; motorcar} → {motor vehicle; automotive vehicle} → {vehicle} → {conveyance; transport}
• Hyponyms: {cruiser; squad car; patrol car; police car; prowl car}, {cab; taxi; hack; taxicab}
• Meronyms of car: {bumper}, {car door}, {car mirror}
• Meronyms of {car door}: {hinge; flexible joint}, {door lock}, {car window}, {armrest}
Today • Q: what is understanding meaning? • A: knowing when words are similar or not • Topics – Word similarity – Thesaurus-based methods – Distributional word representations – Dimensionality reduction
WORD SIMILARITY
Intuition of Semantic Similarity
• Semantically close: bank – money, apple – fruit, tree – forest, bank – river, pen – paper, run – walk, mistake – error, car – wheel
• Semantically distant: doctor – beer, painting – January, money – river, apple – penguin, nurse – fruit, pen – river, clown – tramway, car – algebra
Why are two words similar? • Meaning – The two concepts are close in terms of their meaning • World knowledge – The two concepts have similar properties, often occur together, or occur in similar contexts • Psychology – We often think of the two concepts together
Two Types of Relations • Synonymy: two words are (roughly) interchangeable • Semantic similarity (distance): somehow “related” – Sometimes an explicit lexical semantic relation, often not
Validity of Semantic Similarity • Is semantic distance a valid linguistic phenomenon? • Experiment (Rubenstein and Goodenough, 1965) – Compiled a list of word pairs – Subjects asked to judge semantic distance (from 0 to 4) for each of the word pairs • Results: – Rank correlation between subjects is ~0.9 – People are consistent!
Why do this? • Task: automatically compute semantic similarity between words • Can be useful for many applications: – Detecting paraphrases (e.g., for automatic essay grading, plagiarism detection) – Information retrieval – Machine translation • Why? Because similarity gives us a way to generalize beyond word identities
Evaluation: Correlation with Humans • Ask automatic method to rank word pairs in order of semantic distance • Compare this ranking with human-created ranking • Measure correlation
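To make this concrete, here is a minimal sketch of the correlation-based evaluation in Python. The `sim` function is a toy stand-in (character-set overlap) for whatever model is being evaluated, and the word pairs with ratings are illustrative, not a real dataset.

```python
# Minimal sketch of correlation-based evaluation against human judgments.
from scipy.stats import spearmanr

def sim(w1, w2):
    # Toy stand-in model: character-set overlap (Jaccard). Replace with a real model.
    a, b = set(w1), set(w2)
    return len(a & b) / len(a | b)

# Hypothetical human judgments: (word1, word2, mean rating on a 0-4 scale).
pairs = [("car", "automobile", 3.9), ("food", "fruit", 3.1),
         ("noon", "string", 0.1), ("coast", "forest", 1.0)]

system = [sim(w1, w2) for w1, w2, _ in pairs]
human = [rating for _, _, rating in pairs]

# Rank correlation between the system ranking and the human ranking.
rho, _ = spearmanr(system, human)
print(f"Spearman's rho: {rho:.3f}")
```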
Evaluation: Word-Choice Problems
Identify the alternative closest in meaning to the target:
• accidental: wheedle, ferment, inadvertent, abominate
• imprison: incarcerate, writhe, meander, inhibit
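Given any similarity function, word-choice problems reduce to an argmax over the alternatives; a sketch, reusing the toy `sim` from the previous example:

```python
# Word-choice: pick the alternative most similar to the target.
def answer(target, alternatives, sim):
    return max(alternatives, key=lambda alt: sim(target, alt))

print(answer("accidental", ["wheedle", "ferment", "inadvertent", "abominate"], sim))
print(answer("imprison", ["incarcerate", "writhe", "meander", "inhibit"], sim))
```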
Evaluation: Malapropisms Jack withdrew money from the ATM next to the band. band is unrelated to all of the other words in its context… which signals a likely malapropism (here, for bank)
Word Similarity: Two Approaches • Thesaurus-based – We’ve invested in all these resources… let’s exploit them! • Distributional – Count words in context
THESAURUS-BASED SIMILARITY MODELS
Path-Length Similarity • Similarity based on length of path between concepts: $\text{sim}_{\text{path}}(c_1, c_2) = -\log \text{pathlen}(c_1, c_2)$ How would you deal with ambiguous words?
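As a sketch, NLTK's WordNet interface exposes a path-based score (its `path_similarity` is $1/(\text{pathlen}+1)$, a variant of the formula above). One standard answer to the ambiguity question is to take the max over all sense pairs, as below; assumes nltk and its wordnet data are installed.

```python
from nltk.corpus import wordnet as wn

def word_path_similarity(w1, w2):
    # Handle ambiguous words by taking the max over all synset (sense) pairs.
    scores = [s1.path_similarity(s2) or 0.0  # None (no path) treated as 0
              for s1 in wn.synsets(w1)
              for s2 in wn.synsets(w2)]
    return max(scores, default=0.0)

print(word_path_similarity("car", "bicycle"))
```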
Path-Length Similarity Pros and Cons • Advantages – Simple, intuitive – Easy to implement • Major disadvantage: – Assumes each edge has same semantic distance
Resnik Method
• Probability that a randomly selected word in a corpus is an instance of concept c: $P(c) = \frac{\sum_{w \in \text{words}(c)} \text{count}(w)}{N}$
– words(c) is the set of words subsumed by concept c
– N is total number of words in corpus also in thesaurus
• Define “information content”: $IC(c) = -\log P(c)$
• Define similarity: $\text{sim}_{\text{Resnik}}(c_1, c_2) = -\log P(\text{LCS}(c_1, c_2))$
Resnik Method: Example $\text{sim}_{\text{Resnik}}(c_1, c_2) = -\log P(\text{LCS}(c_1, c_2))$
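A sketch of the Resnik measure using NLTK, with information content estimated from the Brown corpus (assumes the wordnet and wordnet_ic data packages are installed); the two synsets are illustrative.

```python
from nltk.corpus import wordnet as wn, wordnet_ic

brown_ic = wordnet_ic.ic("ic-brown.dat")  # P(c) estimated from Brown corpus counts

c1 = wn.synset("car.n.01")
c2 = wn.synset("boat.n.01")

# res_similarity returns IC(LCS(c1, c2)) = -log P(LCS(c1, c2)).
print(c1.res_similarity(c2, brown_ic))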
Thesaurus Methods: Limitations • Measure is only as good as the resource • Limited in scope – Assumes IS-A relations – Works mostly for nouns • Role of context not accounted for • Not easily domain-adaptable • Resources not available in many languages
Quick Aside: Thesauri Induction • Building thesauri automatically? • Pattern-based techniques work really well! – Co-training between patterns and relations – Useful for augmenting/adapting existing resources
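A toy sketch of the pattern-based idea, using a single Hearst-style “X such as Y” pattern; real systems use many patterns and bootstrap between patterns and extracted pairs, and the sentence below is made up.

```python
import re

text = "He repairs vehicles such as cars, trucks, and motorcycles."

# "<hypernym> such as <hyponym>(, <hyponym>)*" -- one Hearst-style pattern.
for match in re.finditer(r"(\w+) such as ((?:\w+(?:, )?(?:and )?)+)", text):
    hypernym = match.group(1)
    hyponyms = [h for h in re.findall(r"\w+", match.group(2)) if h != "and"]
    for h in hyponyms:
        print(f"{h} IS-A {hypernym}")
```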
DISTRIBUTIONAL WORD SIMILARITY MODELS
Distributional Approaches: Intuition
“You shall know a word by the company it keeps!” (Firth, 1957)
“Differences of meaning correlate with differences of distribution” (Harris, 1970)
• Intuition: – If two words appear in the same contexts, then they must be similar
• Basic idea: represent a word w as a feature vector $\vec{w} = (f_1, f_2, f_3, \ldots, f_N)$
Context Features • Word co-occurrence within a window: e.g., the words appearing within ±k positions of the target word • Grammatical relations: e.g., the verbs for which the target word appears as subject or object
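A minimal sketch of the first feature type, window-based co-occurrence counts; the window size and toy sentence are arbitrary choices for illustration.

```python
from collections import Counter, defaultdict

def cooccurrence_vectors(tokens, window=2):
    # Represent each word by counts of words within +/- window positions.
    vectors = defaultdict(Counter)
    for i, w in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[w][tokens[j]] += 1
    return vectors

tokens = "the cat sat on the mat while the dog sat on the rug".split()
print(cooccurrence_vectors(tokens)["sat"])  # context counts for "sat"
```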
Context Features • Feature values – Boolean – Raw counts – Some other weighting scheme (e.g., idf, tf.idf ) – Association values (next slide)
Association Metric • Commonly-used metric: Pointwise Mutual Information $\text{association}_{\text{PMI}}(w, f) = \log_2 \frac{P(w, f)}{P(w)\,P(f)}$ • Can be used as a feature value or by itself
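A sketch of PMI over raw co-occurrence counts, e.g., the nested counters built in the window sketch above; probabilities are maximum-likelihood estimates from the counts.

```python
import math

def pmi(counts, w, f):
    # counts[w][f] = number of times feature f co-occurred with word w.
    total = sum(c for ctr in counts.values() for c in ctr.values())
    if counts[w][f] == 0:
        return float("-inf")  # undefined for unseen pairs; often clipped to 0 (PPMI)
    p_wf = counts[w][f] / total
    p_w = sum(counts[w].values()) / total
    p_f = sum(ctr[f] for ctr in counts.values()) / total
    return math.log2(p_wf / (p_w * p_f))
```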
Computing Similarity • Semantic similarity boils down to computing some measure on context vectors • Cosine distance: borrowed from information retrieval $\text{sim}_{\text{cosine}}(\vec{v}, \vec{w}) = \frac{\vec{v} \cdot \vec{w}}{|\vec{v}|\,|\vec{w}|} = \frac{\sum_{i=1}^{N} v_i w_i}{\sqrt{\sum_{i=1}^{N} v_i^2}\,\sqrt{\sum_{i=1}^{N} w_i^2}}$
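A sketch over sparse context vectors stored as dicts (feature → value), matching the co-occurrence counters built above.

```python
import math

def cosine(v, w):
    # Dot product over shared features only; zero entries contribute nothing.
    dot = sum(v[f] * w[f] for f in v if f in w)
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    norm_w = math.sqrt(sum(x * x for x in w.values()))
    return dot / (norm_v * norm_w) if norm_v and norm_w else 0.0
```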
Distributional Approaches: Discussion • No thesauri needed: data driven • Can be applied to any pair of words • Can be adapted to different domains
Distributional Profiles: Example
Problem?
Distributional Profiles of Concepts
Semantic Similarity: “Celebrity” Semantically distant…
Semantic Similarity: “Celestial body” Semantically close!
DIMENSIONALITY REDUCTION Slides based on presentation by Christopher Potts
Why dimensionality reduction? • So far, we’ve defined word representations as rows in F, an m × n matrix – m = vocab size – n = number of context dimensions / features • Problems: n is very large, F is very sparse • Solution: find a low-rank approximation of F – Matrix of size m × d where d ≪ n
Methods • Latent Semantic Analysis • Also: – Principal component analysis – Probabilistic LSA – Latent Dirichlet Allocation – Word2vec – …
Latent Semantic Analysis • Based on Singular Value Decomposition
LSA illustrated: SVD + select top k dimensions
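A sketch of the SVD-plus-truncation step with NumPy; the random matrix is a stand-in for a real word-context count matrix, and the sizes are illustrative.

```python
import numpy as np

F = np.random.rand(1000, 5000)  # stand-in for a real m x n word-context matrix
k = 100                         # number of latent dimensions to keep

# Factor F and keep only the top k singular dimensions.
U, s, Vt = np.linalg.svd(F, full_matrices=False)
word_vectors = U[:, :k] * s[:k]  # m x k low-dimensional word representations
print(word_vectors.shape)        # (1000, 100)
```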
Before & After LSA (k=100)
Recap: Today • Q: what is understanding meaning? • A: meaning is knowing when words are similar or not • Topics – Word similarity – Thesaurus-based methods – Distributional word representations – Dimensionality reduction
Bonus… • Let’s try our hand at annotating word similarity