  1. Word Embeddings Revisited: Contextual Embeddings CS 6956: Deep Learning for NLP

  2. Overview • Word types and tokens • Training contextual embeddings • Embeddings from Language Models (ELMo)


  4. How many words… How many words are in this sentence below? (Ignoring capitalization and the comma) Ask not what your country can do for you, ask what you can do for your country

  5. How many words… How many words are in this sentence below? (Ignoring capitalization and the comma) Ask not what your country can do for you, ask what you can do for your country. Seventeen words: ask, not, what, your, country, can, do, for, you, ask, what, you, can, do, for, your, country

  6. How many words… How many words are in this sentence below? (Ignoring capitalization and the comma) Ask not what your country can do for you, ask what you can do for your country. Seventeen words: ask, not, what, your, country, can, do, for, you, ask, what, you, can, do, for, your, country. Only nine words: ask, can, country, do, for, not, what, your, you

  7. How many words… How many words are in this sentence below? (Ignoring capitalization and the comma) Ask not what your country can do for you, ask what you can do for your country. When we say “words”, which interpretation do we mean? Seventeen words: ask, not, what, your, country, can, do, for, you, ask, what, you, can, do, for, your, country. Only nine words: ask, can, country, do, for, not, what, your, you

  8. How many words… How many words are in this sentence below? (Ignoring capitalization and the comma) Ask not what your country can do for you, ask what you can do for your country. When we say “words”, which interpretation do we mean? Which of these interpretations did we use when we looked at word embeddings? Seventeen words: ask, not, what, your, country, can, do, for, you, ask, what, you, can, do, for, your, country. Only nine words: ask, can, country, do, for, not, what, your, you

  9. Word types. Types are abstract and unique objects – Sets or concepts – e.g. there is only one thing called laptop – Think entries in a dictionary. Ask not what your country can do for you, ask what you can do for your country. Seventeen words: ask, not, what, your, country, can, do, for, you, ask, what, you, can, do, for, your, country. Only nine words: ask, can, country, do, for, not, what, your, you

  10. Word tokens. Tokens are instances of the types – Usage of a concept – this laptop, my laptop, your laptop. Ask not what your country can do for you, ask what you can do for your country. Seventeen words: ask, not, what, your, country, can, do, for, you, ask, what, you, can, do, for, your, country. Only nine words: ask, can, country, do, for, not, what, your, you
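
A quick sketch of the distinction in code, using the sentence from the slides (plain Python): tokens count every occurrence, types count only unique words.

    # The sentence from the slides, lowercased and with punctuation removed.
    sentence = ("ask not what your country can do for you "
                "ask what you can do for your country")

    tokens = sentence.split()   # word tokens: every occurrence counts
    types = set(tokens)         # word types: unique words only

    print(len(tokens))          # 17
    print(len(types))           # 9
    print(sorted(types))        # ['ask', 'can', 'country', 'do', 'for', 'not', 'what', 'you', 'your']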

  11. The type-token distinction • A larger philosophical discussion – See the Stanford Encyclopedia of Philosophy for a nuanced discussion • The distinction is broadly applicable and we implicitly reason about it • “We got the same gift”: the same gift type vs. the same gift token

  12. Word embeddings revisited • All the word embedding methods we saw so far trained embeddings for word types – Used word occurrences, but the final embeddings are type embeddings – Type embeddings = lookup tables • Can we embed word tokens instead? • What makes a word token different from a word type? – We have the context of the word – The context may inform the embeddings


  14. Word embeddings revisited • All the word embedding methods we saw so far trained embeddings for word types – Used word occurrences, but the final embeddings are type embeddings – Type embeddings = lookup tables • Can we embed word tokens instead? • What makes a word token different from a word type? – We have the context of the word to inform the embedding – We may be able to resolve word sense ambiguity

  15. Overview • Word types and tokens • Training contextual embeddings • Embeddings from Language Models (ELMo)

  16. Word embeddings should… • Unify superficially different words – bunny and rabbit are similar

  17. Word embeddings should… • Unify superficially different words – bunny and rabbit are similar • Capture information about how words can be used – go and went are similar, but slightly different from each other

  18. Word embeddings should… • Unify superficially different words – bunny and rabbit are similar • Capture information about how words can be used – go and went are similar, but slightly different from each other • Separate accidentally similar looking words – Words are polysemous: “The bank was robbed again” vs. “We walked along the river bank” – Sense embeddings

  19. Word embeddings should… • Unify superficially different words – bunny and rabbit are similar • Capture information about how words can be used – go and went are similar, but slightly different from each other • Separate accidentally similar looking words – Words are polysemous: “The bank was robbed again” vs. “We walked along the river bank” – Sense embeddings. Type embeddings can address the first two requirements

  20. Word embeddings should… • Unify superficially different words – bunny and rabbit are similar • Capture information about how words can be used – go and went are similar, but slightly different from each other • Separate accidentally similar looking words – Words are polysemous: “The bank was robbed again” vs. “We walked along the river bank” – Sense embeddings. Type embeddings can address the first two requirements. Word sense can be disambiguated using the context ⇒ contextual embeddings

  21. Type embeddings vs token embeddings • Type embeddings can be thought of as a lookup table – Map words to vectors independent of any context – A big matrix • Token embeddings should be functions – Construct embeddings for a word on the fly – There is no fixed “bank” embedding; the usage decides what the word vector is
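
A minimal PyTorch sketch of this contrast (sizes and names are illustrative, and the BiLSTM below simply stands in for a contextual encoder such as ELMo): a type embedding is a lookup into a fixed matrix, while a token embedding is computed as a function of the whole sentence.

    import torch
    import torch.nn as nn

    vocab_size, emb_dim, hidden_dim = 10000, 100, 128

    # Type embeddings: a lookup table. Word id 42 gets the same vector
    # no matter which sentence it appears in.
    type_embeddings = nn.Embedding(vocab_size, emb_dim)

    # Token embeddings: a function of the full sentence. Here a BiLSTM
    # stands in for a contextual encoder; the vector at position t
    # depends on every word in the input.
    contextual_encoder = nn.LSTM(emb_dim, hidden_dim,
                                 batch_first=True, bidirectional=True)

    word_ids = torch.randint(0, vocab_size, (1, 6))    # one sentence of 6 word ids
    x = type_embeddings(word_ids)                      # (1, 6, emb_dim), context-free
    token_embeddings, _ = contextual_encoder(x)        # (1, 6, 2 * hidden_dim), contextual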

  22. Contextual embeddings. The big new thing in 2017-18. Two popular models: ELMo [Peters et al. 2018] and BERT [Devlin et al. 2018]. Other work in this direction: ULMFiT [Howard and Ruder 2018]

  23. Contextual embeddings. The big new thing in 2017-18: ELMo and BERT. We will look at ELMo now. We will visit BERT later in the semester

  24. Overview • Word types and tokens • Training contextual embeddings • Embeddings from Language Models (ELMo)

  25. Embeddings from Language Models (ELMo). Two key insights. 1. The embedding of a word type should depend on its context – But the size of the context should not be fixed • No Markov assumption • Need arbitrary context – use a bidirectional RNN

  26. Embeddings from Language Models (ELMo). Two key insights. 1. The embedding of a word type should depend on its context – But the size of the context should not be fixed • No Markov assumption • Need arbitrary context – use a bidirectional RNN. 2. Language models are already encoding the contextual meaning of words – Use the internal states of a language model as the word embedding
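
For reference, the training objective in Peters et al. 2018 jointly maximizes the log likelihood of a forward and a backward language model over a sentence t_1, ..., t_N, roughly:

    \sum_{k=1}^{N} \Big( \log p(t_k \mid t_1, \ldots, t_{k-1}) \;+\; \log p(t_k \mid t_{k+1}, \ldots, t_N) \Big)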

  27. The ELMo model • Embed word types into a vector – Can use pre-trained embeddings (GloVe) – Can train a character-based model to get a context-independent embedding • Deep bidirectional LSTM language model over the embeddings – Two layers of BiLSTMs, but could be more • Loss = language model loss – Cross-entropy over probability of seeing the word in a context


  29. The ELMo model • Embed word types into a vector – Can use pre-trained embeddings (GloVe) – Can train a character-based model to get a context-independent embedding • Deep bidirectional LSTM language model over the embeddings – Two layers of BiLSTMs, but could be more • Loss = language model loss – Cross-entropy over probability of seeing the word in a context. Specific training/modeling details in the paper

  30. The ELMo model • Embed word types into a vector – Can use pre-trained embeddings (GloVe) – Can train a character-based model to get a context-independent embedding • Deep bidirectional LSTM language model over the embeddings – Two layers of BiLSTMs, but could be more – Hidden state of each BiLSTM cell = embedding for the word • Loss = language model loss – Cross-entropy over probability of seeing the word in a context
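
A minimal sketch of this recipe (not the actual ELMo implementation, which uses character convolutions, tied softmax parameters, and a learned weighting of layers; sizes and names here are illustrative): embed the words, run forward and backward LSTM language models, train with cross-entropy on next-word and previous-word prediction, and read the per-position hidden states off as token embeddings.

    import torch
    import torch.nn as nn

    class TinyBiLM(nn.Module):
        """An ELMo-style bidirectional language model, heavily simplified."""
        def __init__(self, vocab_size, emb_dim=128, hidden_dim=256, num_layers=2):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)  # stand-in for ELMo's char CNN
            # Separate stacks so the forward LM never sees future words
            # and the backward LM never sees past words.
            self.fwd = nn.LSTM(emb_dim, hidden_dim, num_layers, batch_first=True)
            self.bwd = nn.LSTM(emb_dim, hidden_dim, num_layers, batch_first=True)
            self.out = nn.Linear(hidden_dim, vocab_size)    # softmax layer over the vocabulary

        def forward(self, word_ids):
            x = self.embed(word_ids)                        # (batch, seq, emb_dim)
            h_fwd, _ = self.fwd(x)                          # reads left to right
            h_bwd, _ = self.bwd(torch.flip(x, dims=[1]))    # reads right to left
            h_bwd = torch.flip(h_bwd, dims=[1])             # re-align to the original order
            return h_fwd, h_bwd

    vocab_size = 10000
    model = TinyBiLM(vocab_size)
    loss_fn = nn.CrossEntropyLoss()

    word_ids = torch.randint(0, vocab_size, (4, 12))        # a toy batch of word ids
    h_fwd, h_bwd = model(word_ids)

    # Language model loss: forward states predict the next word,
    # backward states predict the previous word.
    fwd_logits = model.out(h_fwd[:, :-1])                   # positions 0..n-2 predict 1..n-1
    bwd_logits = model.out(h_bwd[:, 1:])                    # positions 1..n-1 predict 0..n-2
    loss = (loss_fn(fwd_logits.reshape(-1, vocab_size), word_ids[:, 1:].reshape(-1)) +
            loss_fn(bwd_logits.reshape(-1, vocab_size), word_ids[:, :-1].reshape(-1)))

    # After training, the hidden states at each position serve as contextual
    # token embeddings, e.g. concatenating the two directions:
    token_embeddings = torch.cat([h_fwd, h_bwd], dim=-1)    # (batch, seq, 2 * hidden_dim)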
