CS447: Natural Language Processing (Julia Hockenmaier, juliahmr@illinois.edu, 3324 Siebel Center)
http://courses.engr.illinois.edu/cs447
Lecture 17: Vector-space semantics (distributional similarities)

Where we're at
We have looked at how to obtain the meaning of sentences from the meaning of their words (represented in predicate logic). Now we will look at how to represent the meaning of words (although this won't be in predicate logic).
We will consider different tasks:
- Computing the semantic similarity of words by representing them in a vector space
- Finding groups of similar words by inducing word clusters
- Identifying different meanings of words by word sense disambiguation

What we're going to cover today
Pointwise mutual information (PMI): a very useful metric to identify events that frequently co-occur, and a way to identify words that "go together".
Distributional (vector-space) semantics: measure the semantic similarity of words in terms of the similarity of the contexts in which the words appear
- The distributional hypothesis
- Representing words as (sparse) vectors
- Computing word similarities
Discrete random variables
A discrete random variable X can take on values {x_1, …, x_n} with probability p(X = x_i).
A note on notation: p(X) refers to the distribution, while p(X = x_i) refers to the probability of the specific value x_i; p(X = x_i) is also written as p(x_i).
In language modeling, the random variables correspond to words W or to sequences of words W(1)…W(n).
Another note on notation: we are often sloppy about making the distinction between the i-th word [token] in a sequence/string and the i-th word [type] in the vocabulary clear.

Mutual information I(X;Y)
Two random variables X, Y are independent iff their joint distribution is equal to the product of their individual distributions: p(X,Y) = p(X)p(Y).
That is, for all outcomes x, y: p(X=x, Y=y) = p(X=x) p(Y=y).
I(X;Y), the mutual information of two random variables X and Y, is defined as
I(X;Y) = \sum_{x,y} p(X=x, Y=y) \log \frac{p(X=x, Y=y)}{p(X=x)\, p(Y=y)}

Pointwise mutual information (PMI)
Recall that two events x, y are independent if their joint probability is equal to the product of their individual probabilities:
- x, y are independent iff p(x,y) = p(x)p(y)
- x, y are independent iff p(x,y) / (p(x)p(y)) = 1
In NLP, we often use the pointwise mutual information (PMI) of two outcomes/events (e.g. words):
PMI(x, y) = \log \frac{p(X=x, Y=y)}{p(X=x)\, p(Y=y)}

Using PMI to find related words
Find pairs of words w_i, w_j that have high pointwise mutual information:
PMI(w_i, w_j) = \log \frac{p(w_i, w_j)}{p(w_i)\, p(w_j)}
Different ways of defining p(w_i, w_j) give different answers.
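As a concrete illustration of one such definition (treating w_i and w_j as adjacent, which the next slide calls "sticky pairs"), here is a minimal Python sketch. The tokenized corpus and the maximum-likelihood probability estimates are assumptions made for this example, and a minimum-count threshold is added because rare pairs would otherwise get unreliably high PMI.

```python
import math
from collections import Counter

def high_pmi_pairs(tokens, min_count=5, top_k=20):
    """Rank adjacent word pairs (wi, wj) by PMI(wi, wj) = log p(wi, wj) / (p(wi) p(wj)),
    estimating p(wi, wj) from bigram counts and p(w) from unigram counts."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scored = []
    for (w1, w2), c in bigrams.items():
        if c < min_count:                      # rare pairs yield unreliable estimates
            continue
        p_joint = c / n_bi
        p1, p2 = unigrams[w1] / n_uni, unigrams[w2] / n_uni
        scored.append((math.log(p_joint / (p1 * p2)), w1, w2))
    return sorted(scored, reverse=True)[:top_k]

# Hypothetical usage on a whitespace-tokenized corpus file:
# tokens = open("corpus.txt").read().lower().split()
# for score, w1, w2 in high_pmi_pairs(tokens):
#     print(f"{w1} {w2}\t{score:.2f}")
```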
Using PMI to find "sticky pairs"
Define p(w_i, w_j) = p("w_i w_j"): the probability that w_i, w_j are adjacent.
High-PMI word pairs under this definition: Humpty Dumpty, Klux Klan, Ku Klux, Tse Tung, avant garde, gizzard shad, Bobby Orr, mutatis mutandis, Taj Mahal, Pontius Pilate, ammonium nitrate, jiggery pokery, anciens combattants, fuddle duddle, helter skelter, mumbo jumbo (and a few more)

Back to lexical semantics…

Different approaches to lexical semantics
Lexicographic tradition:
- Use lexicons, thesauri, ontologies
- Assume words have discrete word senses: bank1 = financial institution; bank2 = river bank, etc.
- May capture explicit relations between word (senses): "dog" is a "mammal", etc.
Distributional tradition:
- Map words to (sparse) vectors that capture corpus statistics
- Contemporary variant: use neural nets to learn dense vector "embeddings" from very large corpora (this is a prerequisite for most neural approaches to NLP)
- This line of work often ignores the fact that words have multiple senses or parts-of-speech

Vector representations of words
"Traditional" distributional similarity approaches represent words as sparse vectors [today's lecture]:
- Each dimension represents one specific context
- Vector entries are based on word-context co-occurrence statistics (counts or PMI values)
Alternative, dense vector representations:
- We can use Singular Value Decomposition to turn these sparse vectors into dense vectors (Latent Semantic Analysis)
- We can also use neural models to explicitly learn a dense vector representation (embedding) (word2vec, Glove, etc.)
Sparse vectors = most entries are zero
Dense vectors = most entries are non-zero
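A sketch of how the sparse vector entries just described might be computed. The choice of a ±2-word context window and of positive PMI (PPMI, i.e. negative values clipped to zero) as the weighting are assumptions made for this example; the slides only say that entries are based on counts or PMI values.

```python
import math
from collections import Counter, defaultdict

def ppmi_vectors(sentences, window=2):
    """Build sparse word vectors: one dimension per context word,
    with entries weighted by positive PMI (PPMI) of word and context."""
    word_ctx = defaultdict(Counter)            # word -> Counter of context words
    for sent in sentences:
        for i, w in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    word_ctx[w][sent[j]] += 1

    total = sum(sum(ctx.values()) for ctx in word_ctx.values())
    word_totals = {w: sum(ctx.values()) for w, ctx in word_ctx.items()}
    ctx_totals = Counter()
    for ctx in word_ctx.values():
        ctx_totals.update(ctx)

    vectors = {}
    for w, ctx in word_ctx.items():
        vec = {}
        for c, n in ctx.items():
            pmi = math.log((n / total) /
                           ((word_totals[w] / total) * (ctx_totals[c] / total)))
            if pmi > 0:                        # keep only positive PMI values (PPMI)
                vec[c] = pmi
        vectors[w] = vec
    return vectors
```

In practice these vectors are very high-dimensional and mostly zero, which is exactly the sparse-vector setting described above.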
Distributional Similarities
Measure the semantic similarity of words in terms of the similarity of the contexts in which the words appear.
Represent words as vectors.

Why do we care about word similarity?
Question answering:
Q: "How tall is Mt. Everest?"
Candidate A: "The official height of Mount Everest is 29029 feet"
"tall" is similar to "height"

Why do we care about word similarity?
Plagiarism detection

Why do we care about word contexts?
What is tezgüino?
A bottle of tezgüino is on the table.
Everybody likes tezgüino.
Tezgüino makes you drunk.
We make tezgüino out of corn.
(Lin, 1998; Nida, 1975)
The contexts in which a word appears tell us a lot about what it means.
The Distributional Hypothesis
Zellig Harris (1954):
"oculist and eye-doctor … occur in almost the same environments"
"If A and B have almost identical environments we say that they are synonyms."
John R. Firth (1957):
"You shall know a word by the company it keeps."
The contexts in which a word appears tell us a lot about what it means.
Words that appear in similar contexts have similar meanings.

Exploiting context for semantics
Distributional similarities (vector-space semantics):
Use the set of contexts in which words (= word types) appear to measure their similarity.
Assumption: Words that appear in similar contexts (tea, coffee) have similar meanings.
Word sense disambiguation (future lecture):
Use the context of a particular occurrence of a word (token) to identify which sense it has.
Assumption: If a word has multiple distinct senses (e.g. plant: factory or green plant), each sense will appear in different contexts.

Distributional similarities
Distributional similarities use the set of contexts in which words appear to measure their similarity.
They represent each word w as a vector w = (w_1, …, w_N) ∈ R^N in an N-dimensional vector space.
- Each dimension corresponds to a particular context c_n
- Each element w_n of w captures the degree to which the word w is associated with the context c_n
- w_n depends on the co-occurrence counts of w and c_n
The similarity of words w and u is given by the similarity of their vectors w and u.
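This excerpt does not yet say how the similarity of two vectors is measured; cosine similarity is a common choice and is assumed here. A minimal, self-contained sketch using raw co-occurrence counts as vector entries and a toy corpus invented for the example:

```python
import math
from collections import Counter, defaultdict

def context_count_vectors(sentences, window=2):
    """Represent each word as a sparse vector of context-word co-occurrence counts."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        for i, w in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    vectors[w][sent[j]] += 1   # dimension = context word, entry = count
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {dimension: value} mappings."""
    dot = sum(val * v[dim] for dim, val in u.items() if dim in v)
    norm_u = math.sqrt(sum(val * val for val in u.values()))
    norm_v = math.sqrt(sum(val * val for val in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy corpus: "tea" and "coffee" appear in identical contexts.
sentences = [["we", "drink", "tea", "with", "milk"],
             ["we", "drink", "coffee", "with", "milk"],
             ["everybody", "likes", "tezguino"]]
vecs = context_count_vectors(sentences)
print(cosine(vecs["tea"], vecs["coffee"]))     # 1.0: identical contexts
print(cosine(vecs["tea"], vecs["tezguino"]))   # 0.0: no shared contexts
```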