

An introduction to word embeddings
W4705: Natural Language Processing
Fei-Tzin Lee
September 23, 2019



  1. Today
1. What are these word embedding things, anyway?
2. Distributional semantics
3. word2vec
4. Analogies with word embeddings


  2. Representing knowledge
Humans have rich internal representations of words that let us do all sorts of intuitive operations, including (de)composition into other concepts.
• “parent’s sibling” = ‘aunt’ - ‘woman’ = ‘uncle’ - ‘man’
• The attribute of a banana that is ‘yellow’ is the same attribute of an apple that is ‘red’.
But this is obviously impossible for machines. There’s no numerical representation of words that encodes these sorts of abstract relationships.
...Right?

  3. A bit of magic
Figure: Output from the gensim package using word2vec vectors pretrained on Google News.
• This is not a fancy language model
• No external knowledge base
• Just vector addition and subtraction with cosine similarity
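The kind of query behind that figure can be reproduced in a few lines. A minimal sketch using gensim's pretrained Google News vectors (the model name below comes from gensim-data; the exact neighbors returned are not shown on the slide and depend on the model):

```python
# Sketch: analogy queries over pretrained word2vec vectors with gensim.
# Loading "word2vec-google-news-300" downloads ~1.6 GB the first time.
import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")  # a KeyedVectors object

# "man is to king as woman is to ?"
# Just vector addition/subtraction, ranked by cosine similarity.
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

# Plain nearest neighbors in the embedding space.
print(vectors.most_similar("uncle", topn=3))
```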

  4. A bit of magic? Math.
Where did these magical vectors come from? This trick works in a few different flavors:
• SVD-based vectors
• word2vec, from the example above, and other neural embeddings
• GloVe, something akin to a hybrid method

  5. Word embeddings
The semantic representations that have become the de facto standard in NLP are word embeddings: vector representations that are
• Distributed: information is distributed throughout indices (rather than sparse)
• Distributional: information is derived from a word’s distribution in a corpus (how it occurs in text)
These can be viewed as an embedding from a discrete space of words into a continuous vector space.
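To make "continuous vector space" concrete, here is a toy sketch comparing dense word vectors by cosine similarity; the 4-dimensional vectors are invented purely for illustration (real embeddings typically have hundreds of dimensions and are learned from data):

```python
# Sketch: dense ("distributed") word vectors compared by cosine similarity.
# These vectors are made up for illustration; real ones come from training.
import numpy as np

embeddings = {
    "cat": np.array([0.61, -0.20, 0.33, 0.08]),
    "dog": np.array([0.58, -0.15, 0.40, 0.02]),
    "car": np.array([-0.44, 0.70, 0.01, -0.12]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(embeddings["cat"], embeddings["dog"]))  # high: similar words
print(cosine(embeddings["cat"], embeddings["car"]))  # lower: dissimilar words
```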

  6. Applications
• Language modeling
• Machine translation
• Sentiment analysis
• Summarization
• etc.
Basically, anything that uses neural nets can use word embeddings too, and some other things besides.

  7. Today
1. What are these word embedding things, anyway?
2. Distributional semantics
3. word2vec
4. Analogies with word embeddings

  8. Words.
What is a word?
• A composition of characters or syllables?
• A pair: usage and meaning.
These are independent of representation. So we can choose what representation to use in our models.

  9. So, how?
How do we represent the words in some segment of text in a machine-friendly manner?
• Bag-of-words? (no word order)
• Sequences of numerical indices? (relatively uninformative)
• One-hot vectors? (space-inefficient; curse of dimensionality)
• Scores from lexicons, or hand-engineered features? (expensive and not scalable)
Plus: none of these tell us how the word is used, or what it actually means.
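As a concrete look at the first and third options above, here is a toy sketch of bag-of-words counts and one-hot vectors (the vocabulary and sentence are invented for illustration); both vectors are as long as the vocabulary, and neither captures word order, usage, or meaning:

```python
# Sketch: bag-of-words and one-hot representations over a toy vocabulary.
vocab = ["do", "not", "send", "me", "raw", "meat"]
index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    # A vocabulary-sized vector with a single 1: sparse and similarity-blind.
    vec = [0] * len(vocab)
    vec[index[word]] = 1
    return vec

def bag_of_words(tokens):
    # Per-word counts for a whole text: word order is discarded.
    vec = [0] * len(vocab)
    for t in tokens:
        vec[index[t]] += 1
    return vec

print(one_hot("send"))                                  # [0, 0, 1, 0, 0, 0]
print(bag_of_words("do not send me raw meat".split()))  # [1, 1, 1, 1, 1, 1]
```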

  10. What does a word mean?
Let’s try a hands-on exercise.
Obviously, word meaning is really hard for even humans to quantify. So how can we possibly generate representations of word meaning automatically?
We approach it obliquely, using what is known as the distributional hypothesis.

  11. The distributional hypothesis
Borrowed from linguistics: the meaning of a word can be determined from the contexts it appears in. [1]
• Words with similar contexts have similar meanings (Harris, 1954)
• “You shall know a word by the company it keeps” (Firth, 1957)
Example: “My homework was no archteryx of academic perfection, but it sufficed.”
[1] https://aclweb.org/aclwiki/Distributional_Hypothesis

  12. Context?
Most static word embeddings use a simple notion of context: a word is a “context” for another word when it appears close enough to it in the text.
• But we can also use sentences or entire documents as contexts.
In the most basic case, we fix some number of words as our ‘context window’ and count all pairs of words that are less than that many words away from each other as co-occurrences.

  13. Example time!
Let’s say we have a context window size of 2.
no raw meat pants, please. please do not send me some raw vegetarians.
What are the co-occurrences of ‘send’?
• ‘do’
• ‘not’
• ‘me’
• ‘some’

  14. The co-occurrence matrix
We can collect the context occurrences in our corpus into a co-occurrence matrix M, where each row corresponds to a word and each column to a context word. The entry M_ij represents how many times word i appeared within the context of word j.
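A minimal sketch of this construction over the toy corpus from the example slide (whitespace tokenization and a symmetric window of 2 are simplifying assumptions; a real pipeline would handle punctuation, case, and sentence boundaries more carefully):

```python
# Sketch: building a word-by-context co-occurrence "matrix" (as nested counts)
# with a symmetric context window of size 2.
from collections import Counter, defaultdict

corpus = [
    "no raw meat pants , please .",
    "please do not send me some raw vegetarians .",
]
window = 2

cooccur = defaultdict(Counter)  # cooccur[word][context_word] = count
for sentence in corpus:
    tokens = sentence.split()
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                cooccur[word][tokens[j]] += 1

print(dict(cooccur["send"]))  # {'do': 1, 'not': 1, 'me': 1, 'some': 1}
```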
