
SI425: NLP
Set 11: Distributional Similarity
some slides adapted from Dan Jurafsky and Bill MacCartney

Distributional methods
• Firth (1957): “You shall know a word by the company it keeps!”
• Example from Nida (1975), noted by Lin:
    A bottle of tezgüino is on the table
    Everybody likes tezgüino
    Tezgüino makes you tipsy
    We make tezgüino out of corn
• Intuition:
    • Just from context, you can guess the meaning of tezgüino.
    • So we should look at surrounding contexts, and see what other words occur in similar context.

Fill-in-the-blank on Google
You can get a quick and dirty impression of which words show up in a given context by running Google queries with a blank in them.

Context vectors
• Target word $w$
• We have a boolean variable $f_i$ for each word $v_i$ in the vocabulary.
• $f_i$ = “word $v_i$ occurs in the neighborhood of $w$”
$\vec{w} = (f_1, f_2, f_3, \ldots, f_N)$
If $w$ = tezgüino, $v_1$ = bottle, $v_2$ = make, $v_3$ = matrix:
$\vec{w} = (1, 1, 0, \ldots)$
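A minimal sketch of how such a boolean vector could be built from the four tezgüino sentences. It assumes that "neighborhood" means "same sentence" and that tokens are split on whitespace; the function name `boolean_context_vector` is made up for this illustration.

```python
# The four example sentences from the tezgüino slide.
sentences = [
    "A bottle of tezgüino is on the table",
    "Everybody likes tezgüino",
    "Tezgüino makes you tipsy",
    "We make tezgüino out of corn",
]

def boolean_context_vector(target, sentences, vocabulary):
    """Return [f_1, ..., f_N], where f_i = 1 if vocabulary[i] shares a sentence with target.

    Assumption: the neighborhood of a word is the whole sentence it appears in.
    """
    neighbors = set()
    for sentence in sentences:
        tokens = sentence.lower().split()
        if target in tokens:
            neighbors.update(t for t in tokens if t != target)
    return [1 if v in neighbors else 0 for v in vocabulary]

vocabulary = ["bottle", "make", "matrix"]
vector = boolean_context_vector("tezgüino", sentences, vocabulary)
print(vector)  # [1, 1, 0]
```

With the three-word vocabulary from the slide, this reproduces the vector (1, 1, 0).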

Intuition
• Define two words by these sparse vectors
• Apply a vector distance metric
• Call two words similar if their vectors are similar

Distributional similarity
We need to define 3 things:
1. How the co-occurrence terms are defined
    • Vocabulary? N-grams?
2. How terms are weighted
    • Boolean? Frequency? Logs? Mutual information?
3. What vector similarity metric we should use
    • Euclidean distance? Cosine? Jaccard? Dice?

1. Defining co-occurrence vectors
• Windows of neighboring words (n words to the left…)
• Bag-of-words
• We generally remove stop words
• Con: we lose any sense of syntax
• Solution: use the words occurring in particular grammatical relations
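The window-based, bag-of-words option could be sketched as follows. The stop list, the window size `n=2`, and the choice to drop stop words before measuring the window are all assumptions made for illustration, not something the slides specify.

```python
from collections import Counter

# A tiny illustrative stop list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "of", "is", "on", "the", "you", "we", "out"}

def window_counts(tokens, target, n=2):
    """Count content words within n positions left or right of each target occurrence.

    Stop words are removed first, so the window is measured over content words only.
    Word order inside the window is ignored (bag-of-words).
    """
    content = [t for t in tokens if t not in STOP_WORDS]
    counts = Counter()
    for i, token in enumerate(content):
        if token != target:
            continue
        left = content[max(0, i - n):i]
        right = content[i + 1:i + 1 + n]
        counts.update(left + right)
    return counts

tokens = "we make tezgüino out of corn".split()
print(window_counts(tokens, "tezgüino"))
```

Note that the result records only that "make" and "corn" are nearby, not that one is the verb and the other a source material, which is the loss of syntax mentioned above.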

Defining co-occurrence vectors
“The meaning of entities, and the meaning of grammatical relations among them, is related to the restriction of combinations of these entities relative to other entities.” Zellig Harris (1968)
Idea: parse the sentence, extract grammatical dependencies

Vectors with grammatical dependencies
For the word cell: a vector of N × R features (N is the vocabulary size, R is the number of dependency relations)
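A rough sketch of turning parser output into (relation, word) features. The triples below are hypothetical stand-ins for what a dependency parser might emit, and the relation labels (`subj`, `obj`, `nmod`) are assumptions; no real parser is called here.

```python
from collections import Counter

# HYPOTHETICAL (head, relation, dependent) triples, invented for illustration.
triples = [
    ("absorb", "subj", "cell"),
    ("adapt", "subj", "cell"),
    ("attack", "obj", "cell"),
    ("cell", "nmod", "blood"),
    ("absorb", "subj", "cell"),
]

def dependency_features(word, triples):
    """Count (relation, other_word) features for one target word.

    When the target is the dependent, the relation is written "<rel>-of",
    e.g. ("obj-of", "attack") means the target was the object of "attack".
    """
    counts = Counter()
    for head, rel, dep in triples:
        if dep == word:
            counts[(f"{rel}-of", head)] += 1
        if head == word:
            counts[(rel, dep)] += 1
    return counts

features = dependency_features("cell", triples)
for feature, count in sorted(features.items()):
    print(feature, count)
```

Each distinct (relation, word) pair is one dimension, which is where the N × R feature space comes from.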

Group Exercise
• Search “Naval Academy” and create a vector.
• What other school is most similar? Most different?
• Compare vectors

2. Weighting the counts
• We have been using the frequency count of context as its weight/value
• But we could use any function of this frequency
• Instead: compute an association score
• Consider one feature $f = (r, w') = (\text{obj-of}, \textit{attack})$
• $P(f \mid w) = \mathrm{count}(f, w) / \mathrm{count}(w)$
• $\mathrm{assoc}_{\mathrm{prob}}(w, f) = P(f \mid w)$
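The probability-based association score can be sketched in a few lines. The counts are hypothetical, and count(w) is assumed to be the total count of w over all of its features, so that the scores for one word sum to 1.

```python
from collections import Counter

# HYPOTHETICAL (word, feature) counts, invented for illustration.
pair_counts = Counter({
    ("cell", ("obj-of", "attack")): 3,
    ("cell", ("subj-of", "absorb")): 1,
})

def assoc_prob(w, f, pair_counts):
    """P(f | w) = count(f, w) / count(w).

    Assumption: count(w) is the sum of w's counts across all features.
    """
    count_w = sum(c for (word, _), c in pair_counts.items() if word == w)
    if count_w == 0:
        return 0.0
    return pair_counts.get((w, f), 0) / count_w

print(assoc_prob("cell", ("obj-of", "attack"), pair_counts))  # 0.75
```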

Frequency-based problems
Objects of the verb drink:

Object      Count
Water       7
Champagne   4
It          3
Much        3
Anything    3
Liquid      2
Wine        2

• Problem: “drink it” is more common than “drink wine”! (“wine” is a better drinkable thing than “it”)
• Need: we need to control for expected frequency
• Solution: normalize by the expected frequency

Weighting: Mutual Information
• Pointwise mutual information: a measure of how often two events x and y occur together, compared with what we would expect if they were independent:
• PMI between a target word w and a feature f:
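The standard textbook definitions that match the two descriptions above can be written as follows. The base-2 logarithm is a common convention and is assumed here; the subscripted name for the word-feature score is notation chosen for this note.

```latex
\mathrm{PMI}(x, y) = \log_2 \frac{P(x, y)}{P(x)\,P(y)}
\qquad
\mathrm{assoc}_{\mathrm{PMI}}(w, f) = \log_2 \frac{P(w, f)}{P(w)\,P(f)}
```

The denominator is the probability of seeing the pair if the two events were independent, so a score above 0 means the pair occurs more often than chance would predict.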

Mutual information intuition Objects of the verb drink
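A small numeric check of this intuition, using the object counts for drink from the frequency slide. The corpus-wide totals (`word_totals`, `total_pairs`) are hypothetical numbers chosen only to make the point, and PMI is computed with the usual log-ratio definition (base 2 assumed).

```python
import math

# Counts of objects of "drink", taken from the frequency slide.
drink_object_counts = {"water": 7, "champagne": 4, "it": 3, "much": 3,
                       "anything": 3, "liquid": 2, "wine": 2}

# HYPOTHETICAL corpus totals (not from the slides): how often each word occurs
# with any feature, and the total number of (word, feature) pairs observed.
word_totals = {"it": 5000, "wine": 50}
total_pairs = 100_000

def pmi(count_wf, count_w, count_f, total):
    """log2 of P(w, f) / (P(w) * P(f)), with probabilities estimated from counts."""
    p_wf = count_wf / total
    p_w = count_w / total
    p_f = count_f / total
    return math.log2(p_wf / (p_w * p_f))

# Total count of the feature "obj-of drink", summed over the listed objects.
count_f = sum(drink_object_counts.values())

pmi_it = pmi(drink_object_counts["it"], word_totals["it"], count_f, total_pairs)
pmi_wine = pmi(drink_object_counts["wine"], word_totals["wine"], count_f, total_pairs)
print(f"PMI(it, obj-of drink)   = {pmi_it:.2f}")
print(f"PMI(wine, obj-of drink) = {pmi_wine:.2f}")
```

Under these assumed totals, "it" has the higher raw count but "wine" gets the higher PMI, because "it" is frequent everywhere and so is expected to appear with drink by chance.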

Summary: weightings • See Manning and Schuetze (1999) for more

3. Defining vector similarity
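The candidate metrics named earlier (Euclidean distance, cosine, Jaccard, Dice) can be sketched for dense vectors of equal length. The Jaccard and Dice versions below are the min/max generalizations commonly used for weighted vectors and assume non-negative weights; the example vectors are made up.

```python
import math

def euclidean(a, b):
    """Euclidean distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    """Cosine of the angle between a and b (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def jaccard(a, b):
    """Weighted Jaccard: sum of element-wise minima over sum of maxima."""
    denom = sum(max(x, y) for x, y in zip(a, b))
    return sum(min(x, y) for x, y in zip(a, b)) / denom if denom else 0.0

def dice(a, b):
    """Weighted Dice: twice the sum of minima over the sum of all weights."""
    denom = sum(a) + sum(b)
    return 2 * sum(min(x, y) for x, y in zip(a, b)) / denom if denom else 0.0

# Two made-up boolean context vectors.
u = [1, 1, 0, 1]
v = [1, 0, 0, 1]
for name, fn in [("euclidean", euclidean), ("cosine", cosine),
                 ("jaccard", jaccard), ("dice", dice)]:
    print(f"{name:10s} {fn(u, v):.3f}")
```

Euclidean is a distance while the other three are similarities, so they rank word pairs in opposite directions.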

Summary of similarity measures

Evaluating similarity measures
• Intrinsic evaluation
    • Correlation with word similarity ratings from humans
• Extrinsic (task-based, end-to-end) evaluation
    • Malapropism (spelling error) detection
    • WSD
    • Essay grading
    • Plagiarism detection
    • Taking TOEFL multiple-choice vocabulary tests
    • Language modeling in some application
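One way the intrinsic evaluation could be carried out is a rank correlation between model scores and human ratings. The slides do not say which correlation to use; Spearman is assumed here, and both score lists are hypothetical.

```python
import math

def ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; assumes neither list is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# HYPOTHETICAL scores for four word pairs (not from any real dataset).
human_ratings = [9.0, 7.5, 3.0, 1.0]
model_scores = [0.8, 0.9, 0.3, 0.1]
print(round(spearman(human_ratings, model_scores), 3))  # 0.8
```

A rank correlation only asks whether the model orders word pairs the way people do, so the two scales do not need to match.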

An example of detected plagiarism
