OPTIMIZATION OF SKIP-GRAM MODEL Chenxi Wu Final Presentation for STA 790
Word Embedding ■ Map words to vectors of real numbers ■ The earliest word representation is the "one-hot" representation
Word Embedding ■ Distributed representation
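As an illustration that is not part of the original slides, a minimal Python sketch of the difference between the two representations; the vocabulary and the embedding dimension are made up.

import numpy as np

vocab = ["king", "queen", "apple"]

# One-hot representation: a vector as long as the vocabulary with a single 1.
one_hot_king = np.zeros(len(vocab))
one_hot_king[vocab.index("king")] = 1.0

# Distributed representation: a short, dense vector of real numbers, learned so
# that related words end up close to each other (random here, for illustration).
embedding = {w: np.random.randn(5) for w in vocab}

print(one_hot_king)        # [1. 0. 0.]
print(embedding["king"])   # e.g. [ 0.12 -1.03  0.55  0.87 -0.44]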
Word2Vec ■ An unsupervised NLP method developed by Google in 2013 ■ Quantify the relationships between words ■ Two model architectures, Skip-gram and CBOW, each trainable with either Hierarchical Softmax or Negative Sampling
Skip-gram ■ Input: the vector representation of a specific word ■ Output: the vectors of the context words surrounding this word
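A minimal sketch, not taken from the repository linked at the end, of how Skip-gram training pairs could be generated from a tokenized sentence; the window size and function name are illustrative assumptions.

def skipgram_pairs(tokens, window=2):
    # For each center word, pair it with every word inside the context window.
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

print(skipgram_pairs(["the", "quick", "brown", "fox"], window=1))
# [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'), ('brown', 'quick'), ...]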
DNN (Deep Neural Network)
Huffman Tree ■ Leaf nodes denote all words in the vocabulary ■ The leaf nodes act as neurons in the output layer, and the internal nodes act as hidden neurons ■ Input: n weights f1, f2, ..., fn (the frequency of each word in the corpus) ■ Output: the corresponding Huffman tree ■ Benefit: common words have shorter Huffman codes
■ (1) Treat f1, f2, ..., fn as a forest with n trees (each tree has only one node); ■ (2) In the forest, select the two trees with the smallest weights and merge them as the left and right subtrees of a new tree; the weight of the root of this new tree is the sum of the weights of its left and right children; ■ (3) Delete the two selected trees from the forest and add the new tree to the forest; ■ (4) Repeat steps (2) and (3) until only one tree is left in the forest (see the sketch below)
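A minimal Python sketch of the merge procedure above, assuming word frequencies are given as a dict; the nested-tuple tree representation is an illustrative choice, not the one used in the actual implementation.

import heapq
import itertools

def build_huffman(freqs):
    # freqs: dict mapping word -> frequency in the corpus
    tie = itertools.count()  # tie-breaker so heapq never compares subtrees
    forest = [(f, next(tie), w) for w, f in freqs.items()]
    heapq.heapify(forest)
    while len(forest) > 1:
        f1, _, left = heapq.heappop(forest)    # two smallest-weight trees
        f2, _, right = heapq.heappop(forest)
        heapq.heappush(forest, (f1 + f2, next(tie), (left, right)))  # merged tree
    return forest[0][2]  # root of the single remaining tree

print(build_huffman({"the": 50, "model": 10, "softmax": 3, "huffman": 2}))
# ((('huffman', 'softmax'), 'model'), 'the')  -- rarer words sit deeper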
Hierarchical Softmax
HS Details
HS Details ■ Use the sigmoid function at each internal node to decide whether to go left (+) or right (−) ■ In the example above, w is "hierarchical"
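To make the left/right decision concrete, here is a small sketch with an assumed convention (a Huffman code bit of 0 is taken to mean the "+" branch) of how the probability of reaching a target word is accumulated along its Huffman path.

import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def hs_probability(x, path_vectors, code):
    # x: input word vector; path_vectors: vectors theta of the internal nodes on
    # the target word's Huffman path; code: the target word's Huffman code bits.
    # At each node, sigmoid(x . theta) is the probability of the "+" branch and
    # 1 - sigmoid(x . theta) the probability of the "-" branch.
    p = 1.0
    for theta, d in zip(path_vectors, code):
        s = sigmoid(x @ theta)
        p *= s if d == 0 else 1.0 - s
    return p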
HS Target Function
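The target function itself did not survive extraction from this slide. In the usual notation, which is assumed here (x_w the input word vector, l^w the length of the target word's Huffman path, d_j^w its j-th code bit, \theta_{j-1}^w the vector of the j-th internal node on the path, \sigma the sigmoid), it reads

P(w \mid x_w) = \prod_{j=2}^{l^w} \big[\sigma(x_w^\top \theta_{j-1}^w)\big]^{1-d_j^w} \big[1-\sigma(x_w^\top \theta_{j-1}^w)\big]^{d_j^w},

and the quantity maximized over the corpus C is the log-likelihood

\mathcal{L} = \sum_{w \in C} \sum_{j=2}^{l^w} \Big\{(1-d_j^w)\log\sigma(x_w^\top\theta_{j-1}^w) + d_j^w \log\big(1-\sigma(x_w^\top\theta_{j-1}^w)\big)\Big\}.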
HS Gradient
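The gradient formulas are likewise missing from the extracted text. Differentiating one term \mathcal{L}_{w,j} of the log-likelihood above gives, under the same assumed notation,

\frac{\partial \mathcal{L}_{w,j}}{\partial \theta_{j-1}^w} = \big(1 - d_j^w - \sigma(x_w^\top\theta_{j-1}^w)\big)\,x_w, \qquad \frac{\partial \mathcal{L}_{w,j}}{\partial x_w} = \big(1 - d_j^w - \sigma(x_w^\top\theta_{j-1}^w)\big)\,\theta_{j-1}^w,

so each SGD step moves \theta_{j-1}^w and x_w in these directions, scaled by the learning rate \eta.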
Negative Sampling ■ Alternative method for training the Skip-gram model ■ Subsample frequent words to decrease the number of training examples ■ Let each training sample update only a small percentage of the model's weights
Negative sample ■ Randomly select one word u from the words surrounding w, so u and w compose one "positive sample" ■ For a negative sample, keep this same u and randomly choose a word from the dictionary that is not w
Sampling method ■ The unigram distribution is used to select negative words ■ The probability of a word being selected as a negative sample is related to its frequency of occurrence: the more frequent the word, the more likely it is to be chosen as a negative word
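A minimal Python sketch of sampling negatives from the unigram distribution; the 3/4 smoothing power is the value used in the original word2vec paper, and the function names here are illustrative, not taken from the linked repository.

import random

def make_negative_sampler(freqs, power=0.75):
    # Selection probability is proportional to the (smoothed) corpus frequency,
    # so frequent words are drawn as negatives more often.
    words = list(freqs)
    weights = [freqs[w] ** power for w in words]
    return lambda k: random.choices(words, weights=weights, k=k)

sample_negatives = make_negative_sampler({"the": 50, "model": 10, "softmax": 3, "huffman": 2})
print(sample_negatives(5))  # e.g. ['the', 'the', 'model', 'the', 'softmax']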
NS Details ■ Still use the sigmoid function to train the model ■ Suppose that through negative sampling we get neg negative samples (context(w), w_i), i = 1, 2, …, neg, so each training sample is (context(w), w, w_1, …, w_neg) ■ We expect our positive sample to satisfy: ■ Expect negative samples to satisfy:
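The two conditions referred to above are not reproduced in the extracted slide. Writing x for the input vector built from context(w) and \theta^u for the auxiliary vector of a target word u (notation assumed), they are usually stated as

\sigma(x^\top \theta^{w}) \approx 1 \quad \text{for the positive sample}, \qquad \sigma(x^\top \theta^{w_i}) \approx 0, \; i = 1, \dots, neg \quad \text{for the negative samples}.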
NS Details ■ Want to maximize the following log-likelihood: ■ Similarly, compute gradient to update parameters.
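The log-likelihood itself is missing from the extraction; with the same assumed notation it is commonly written

\mathcal{L} = \log\sigma(x^\top\theta^{w}) + \sum_{i=1}^{neg} \log\big(1-\sigma(x^\top\theta^{w_i})\big),

whose gradient with respect to \theta^{u} has the same form as in hierarchical softmax, \big(L^{u} - \sigma(x^\top\theta^{u})\big)\,x, where the label L^{u} is 1 for the positive word w and 0 for each negative word w_i.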
Reference Mikolov et al., 2013, Distributed Representations of Words and Phrases and their Compositionality. http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality The code for this implementation can be found on my GitHub repo: https://github.com/cassie1102