SI425: NLP
Set 9: Word2Vec - Neural Words
Fall 2020: Chambers
Last time: words are vectors of observed counts
• How big are these vectors? Big vectors: the size of your vocabulary.
• How similar are two words? sim( eat, devour ) = cosine( their count vectors ) = 0.72
• Why are these so different? Problem: lots of zeros and huge vectors!
Other problems
• Problem: lots of zeros and huge vectors! We’ll shrink them.
• Other problem: counts still miss a lot of similarity.
• Example: the count vectors for “Apple” and “Peach” over cutting words like “dice” and “slice” can have zero overlap on those “cutting” counts!
Today’s goals
• Shrink these vectors to a reasonable size.
• Optimize the vector values to be “useful” to NLP - word prediction! Rather than just counting with no goal…
• Force synonyms to be similar to each other, don’t just “hope”.
• Similar to our Lab 2 goal of generation: predict your neighbor.
Why do we care?
• Words as vectors let us represent any span of text.
Why do we care?
• Our input is now a vector representation, which we can feed straight into Logistic Regression!
• Example: the vector for “The cat ate mice” is combined with the “Dickens” weights to produce a “Dickens” score.
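To make that concrete, here is a minimal sketch (not from the slides) of the idea: average the word vectors in a span to get one input vector, then score it with a logistic regression weight vector. The embedding values and the “Dickens” weights below are made-up toy numbers.

```python
import numpy as np

# Toy word embeddings (hypothetical 4-dimensional vectors).
embeddings = {
    "the":  np.array([0.1, -0.2, 0.0, 0.3]),
    "cat":  np.array([0.5,  0.1, -0.4, 0.2]),
    "ate":  np.array([-0.3, 0.6, 0.2, -0.1]),
    "mice": np.array([0.4,  0.0, -0.5, 0.1]),
}

# Represent a span of text as the average of its word vectors.
span = ["the", "cat", "ate", "mice"]
x = np.mean([embeddings[w] for w in span], axis=0)

# Logistic regression: dot with the "Dickens" weight vector, squash to a probability.
dickens_weights = np.array([0.7, -0.2, 0.5, 0.1])   # hypothetical learned weights
dickens_bias = 0.0
score = dickens_weights @ x + dickens_bias
prob_dickens = 1.0 / (1.0 + np.exp(-score))
print(f"P(Dickens | span) = {prob_dickens:.3f}")
```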
Word2Vec
• Learn word embeddings (vectors) by predicting neighboring words.
• Step 1: create a random vector for each word.
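A minimal sketch of Step 1, assuming a small toy vocabulary and an arbitrarily chosen embedding size (both are placeholders, not values from the slides):

```python
import numpy as np

vocab = ["alice", "ate", "dinner", "very", "quickly", "table", "idea"]
dim = 50                      # embedding size (a hyperparameter we pick)
rng = np.random.default_rng(0)

# One small random vector per word; training will adjust these values.
embeddings = {w: rng.normal(scale=0.1, size=dim) for w in vocab}
```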
Word2Vec
• Step 2: find a huge corpus of written text.
• Step 3: use each word to “predict” its neighbor.
• Example: in “Alice ate dinner very quickly”, use “ate” to predict its neighbor, i.e. compute P(“Alice”).
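One common way to realize Step 3 is to pair each word with the neighbors inside a small window around it. A sketch of that pairing, where the window size of 2 is an assumption:

```python
def neighbor_pairs(tokens, window=2):
    """Yield (input_word, neighbor_word) pairs from a token list."""
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                yield (center, tokens[j])

sentence = "alice ate dinner very quickly".split()
pairs = list(neighbor_pairs(sentence))
# e.g. ('ate', 'alice'), ('ate', 'dinner'), ('ate', 'very'), ...
```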
Word2Vec
• How to compute probabilities? Score all the words!
• Score every word in the vocabulary, then normalize the scores into probabilities.
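A minimal sketch of that step, assuming each candidate word’s score is the dot product between the input word’s embedding and the candidate’s embedding, normalized with a softmax. (Real Word2Vec keeps separate input and output embedding tables; this sketch reuses one table for simplicity.)

```python
import numpy as np

def predict_distribution(input_word, embeddings):
    """Score every vocabulary word against the input word, then softmax."""
    words = list(embeddings.keys())
    x = embeddings[input_word]
    scores = np.array([embeddings[w] @ x for w in words])   # dot-product scores
    exp = np.exp(scores - scores.max())                      # numerically stable softmax
    probs = exp / exp.sum()
    return dict(zip(words, probs))

# probs = predict_distribution("ate", embeddings)   # probs["alice"], probs["dinner"], ...
```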
Word2Vec
• The loss function is again how far off your prediction probability is from the correct word (“Alice”).
• How do you get high probabilities? High scores!!
• How do you get high scores? When the input word embedding is similar to the target word embedding.
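In code, that loss is just the negative log of the probability assigned to the correct neighbor. This continues the toy predict_distribution sketch above; the function names are mine, not the slides’:

```python
import numpy as np

def loss(input_word, correct_neighbor, embeddings):
    """Cross-entropy: penalize a low probability on the correct neighbor."""
    probs = predict_distribution(input_word, embeddings)
    return -np.log(probs[correct_neighbor])

# loss("ate", "alice", embeddings) is small only when "alice" gets a high score,
# i.e. when the embeddings of "ate" and "alice" point in similar directions.
```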
Why it works
• All the “food words” need to score “eat” highly. They’ll thus adjust weights to be similar to “eat”, which means similar to each other!
• All the “action verbs” need to score adverbs like “quickly” higher. They’ll adjust weights to be similar to it!
• All the “people names” do people things, so they need to score words like “talk”, “walk”, and “think” highly. Their vectors will slowly turn into each other!
An added detail…
• Make sure the training data includes negative examples.
• It helps to push weights away from wrong answers.

Positive Examples    Negative Examples
(Alice, ate)         (Table, ate)
(Puppy, ate)         (Idea, ate)
(Baby, ate)          (The, ate)
(Peacock, ate)       (Paint, ate)
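A minimal sketch of one training step that uses a positive pair plus a few negative pairs, in the style of negative sampling. The learning rate, the choice of negatives, and the helper names are assumptions, not the slides’ exact procedure:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_pair(center, context, negatives, embeddings, lr=0.05):
    """Nudge embeddings so (center, context) scores high and (center, negative) scores low."""
    v = embeddings[center]
    for word, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        u = embeddings[word]
        pred = sigmoid(v @ u)          # probability this pair is a "real" neighbor pair
        grad = pred - label            # gradient of the logistic loss w.r.t. the score
        embeddings[word] = u - lr * grad * v
        v = v - lr * grad * u
    embeddings[center] = v

# train_pair("ate", "alice", negatives=["table", "idea"], embeddings=embeddings)
```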
Examples
• Color-coded numbers: blue is negative, red is positive.
Vector semantics?
Algebra with words?
Demo with Python’s gensim
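A small sketch of the kind of demo this slide points to, using gensim 4.x. The tiny corpus and the queried words are placeholders; a real demo would train on a much larger text collection:

```python
from gensim.models import Word2Vec

# Placeholder corpus: a list of tokenized sentences.
sentences = [
    ["alice", "ate", "dinner", "very", "quickly"],
    ["the", "puppy", "ate", "the", "food"],
    ["the", "baby", "ate", "slowly"],
]

model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=50)

# Nearest neighbors in embedding space.
print(model.wv.most_similar("ate", topn=3))

# "Algebra with words": king - man + woman should land near queen,
# but only if the training corpus actually covers those words.
# print(model.wv.most_similar(positive=["king", "woman"], negative=["man"]))
```

The commented-out query is the “algebra with words” idea from the previous slide; it is left commented because the toy corpus above does not contain those words.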
Other Overviews of Word2Vec
• Blog post by Adrian Colyer
  https://blog.acolyer.org/2016/04/21/the-amazing-power-of-word-vectors/
• The Illustrated Word2vec
  http://jalammar.github.io/illustrated-word2vec/
• The original research paper!
  https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf