Language Modeling Prof. Sameer Singh CS 295: STATISTICAL NLP WINTER 2017 January 24, 2017 Based on slides from Dan Jurafsky, Noah Smith, and everyone else they copied from.
Outline Wrapup Word Embeddings Introduction to Language Models N-Gram Based Language Models Smoothing Language Models CS 295: STATISTICAL NLP (WINTER 2017) 2
Predict surrounding words
A bottle of tezguino is on the table.
(figure labels: u, v)
Negative Sampling
Neural View of Embeddings
Word embeddings
Variations
• Skip-gram: predict context from word
• CBOW: predict word from context bag of words
• Dependencies: a better description of context
Uses
• Similarity:
• Analogies
• Grammar:
• Gender:
• Facts:
Language Models
Probability of a Sentence
• Is a given sentence something you would expect to see?
• Syntactically (grammar) and Semantically (meaning)
Probability of the Next Word
• Predict what comes next for a given sequence of words.
• Think of it as V-way classification
Task: Speech Recognition “eyes awe of an” OR “I saw a van”
Task: Machine Translation
Task: Handwriting Recognition http://www.cedar.buffalo.edu/handwriting/HRoverview.html
Task: Image Captioning
Task: Spelling Correction
The office is about fifteen minuets from my house
P(about fifteen minutes from) >> P(about fifteen minuets from)
Other Applications
• Summarization
• Question Answering
• Dialog Systems
Evaluating Language Models
• Best choice: Extrinsic
• 2nd choice: Intrinsic
Perplexity
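The formula on this slide did not survive extraction. As a minimal sketch, assuming the standard definition of perplexity (the inverse probability of the test text, normalized by the number of words N, i.e. P(w_1 … w_N)^(-1/N)), it can be computed from per-word model probabilities as follows. The function name `perplexity` and the toy probabilities are illustrative assumptions, not taken from the slides.

```python
import math

def perplexity(word_probs):
    """Perplexity of a word sequence, given the model's probability for each word.

    Assumes the standard definition: exp of the negative average log-probability.
    """
    n = len(word_probs)
    log_prob = sum(math.log(p) for p in word_probs)  # sum in log space
    return math.exp(-log_prob / n)

# A model that assigns probability 0.1 to every word behaves like a
# uniform choice among 10 words, so its perplexity is about 10.
uniform_pp = perplexity([0.1] * 5)
```

Lower perplexity means the model found the text less surprising, which is why it is commonly used as an intrinsic measure.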
Generating Text from an LM
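One common way to generate text is to sample one word at a time from the model's next-word distribution until an end marker is drawn. The sketch below assumes a bigram-style model; the table `BIGRAM_PROBS`, the markers `<s>` and `</s>`, and the function `generate` are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical toy model: P(next word | previous word).
# "<s>" marks the start of a sentence and "</s>" marks its end.
BIGRAM_PROBS = {
    "<s>": {"I": 0.6, "green": 0.4},
    "I": {"like": 0.7, "do": 0.3},
    "do": {"not": 1.0},
    "not": {"like": 1.0},
    "like": {"green": 0.5, "ham": 0.5},
    "green": {"eggs": 1.0},
    "eggs": {"</s>": 0.5, "and": 0.5},
    "and": {"ham": 1.0},
    "ham": {"</s>": 1.0},
}

def generate(probs, rng, max_len=20):
    """Sample words one at a time until "</s>" is drawn or max_len is reached."""
    words = []
    prev = "<s>"
    for _ in range(max_len):
        candidates = list(probs[prev])
        weights = [probs[prev][w] for w in candidates]
        prev = rng.choices(candidates, weights=weights, k=1)[0]
        if prev == "</s>":
            break
        words.append(prev)
    return words

# A seeded generator makes the sample reproducible.
sentence = generate(BIGRAM_PROBS, random.Random(0))
```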
Direct Language Modeling
P(“I do not like green eggs and ham”)
P(w | “I do not like green eggs and ”)
Applying the Chain Rule
Markov Assumption
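The equations on these two slides did not survive extraction. In their standard textbook form (an assumption about what the slides showed), the chain rule factors the sentence probability exactly, and the Markov assumption then truncates each history to the previous k words:

```latex
\begin{align}
P(w_1, w_2, \ldots, w_n) &= \prod_{i=1}^{n} P(w_i \mid w_1, \ldots, w_{i-1}) \\
P(w_i \mid w_1, \ldots, w_{i-1}) &\approx P(w_i \mid w_{i-k}, \ldots, w_{i-1})
\end{align}
```

With k = 0 each word is predicted independently of its history (unigram model), and with k = 1 each word depends only on the previous word (bigram model).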
Unigram Language Model
Bigram Language Model
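As a minimal sketch of a bigram model, assuming the usual maximum-likelihood estimate P(w_i | w_{i-1}) = count(w_{i-1}, w_i) / count(w_{i-1}), the probabilities can be read off from counts. The toy corpus, the boundary markers `<s>` and `</s>`, and the function names are illustrative assumptions.

```python
from collections import Counter

# Toy corpus for illustration only.
corpus = [
    "I am Sam",
    "Sam I am",
    "I do not like green eggs and ham",
]

def train_bigram(sentences):
    """Count how often each word occurs as a context and each bigram occurs."""
    context_counts = Counter()
    bigram_counts = Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts

def bigram_prob(prev, cur, context_counts, bigram_counts):
    """Maximum-likelihood estimate of P(cur | prev); 0.0 for unseen contexts."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, cur)] / context_counts[prev]

context_counts, bigram_counts = train_bigram(corpus)
p_am_given_i = bigram_prob("I", "am", context_counts, bigram_counts)  # 2 of 3 uses of "I"
```

Any bigram that never appears in the corpus gets probability zero under this estimate.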
Berkeley Restaurant Project
Berkeley Restaurant Project
N-Gram Language Models
“The computer which I had just put into the dining room on the fifth floor crashed.”
“The computer which I had just put into the dining room on the fifth floor had lunch.”
Shakespeare
Wall Street Journal
Implementation Tips
Use Logs
• Prevent underflow
• Sums, instead of products
Filter out n-grams
• Rare n-grams are noisy/have low prob
• Use unigrams to filter bigrams…
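The "use logs" tip can be illustrated with a short sketch: multiplying many small probabilities underflows to zero in floating point, while summing their logarithms stays well within range. The probability value and sequence length below are arbitrary choices for illustration.

```python
import math

# 500 words, each with (hypothetical) model probability 0.001.
word_probs = [1e-3] * 500

# Direct product: the true value is 1e-1500, far below the smallest
# positive double, so the running product underflows to 0.0.
product = 1.0
for p in word_probs:
    product *= p

# Log space: a sum of logs replaces the product and does not underflow.
log_prob = sum(math.log(p) for p in word_probs)
```

Sentences can then be compared by their log-probabilities directly, without converting back to probabilities.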
Zero Probability Problem
Training set:
• … denied the allegations
• … denied the reports
• … denied the claims
• … denied the request
Test set:
• … denied the offer
• … denied the loan
P(“offer” | denied the) = 0
Because corpus is finite..
• Rare words/combinations
• Misspellings: “minuets”
• New words: Truthiness, #letalonethehashtags, bigly
Laplace Smoothing
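The formula on this slide did not survive extraction. As a minimal sketch, assuming the standard add-one form P(w | context) = (count(context, w) + 1) / (count(context) + V), where V is the vocabulary size, unseen words receive a small non-zero probability. The counts follow the "denied the" example used in these slides; the six-word vocabulary and the function name `laplace_prob` are assumptions made only to keep the example small.

```python
from collections import Counter

# Counts of words seen after "denied the" in the training set.
counts = Counter({"allegations": 3, "reports": 2, "claims": 1, "request": 1})

# Hypothetical vocabulary: the four seen words plus two unseen ones.
vocab = ["allegations", "reports", "claims", "request", "offer", "loan"]

def laplace_prob(word, counts, vocab):
    """Add-one estimate of P(word | context) from the context's word counts."""
    total = sum(counts.values())
    return (counts[word] + 1) / (total + len(vocab))

p_offer = laplace_prob("offer", counts, vocab)              # (0 + 1) / (7 + 6)
p_allegations = laplace_prob("allegations", counts, vocab)  # (3 + 1) / (7 + 6)
```

The smoothed probabilities still sum to one over the vocabulary, because the V added to the denominator matches the one added to each of the V counts.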
Intuition Behind Smoothing
When we have sparse statistics:
P(w | denied the)
• 3 allegations
• 2 reports
• 1 claims
• 1 request
• 7 total
Steal probability mass to generalize better:
P(w | denied the)
• 2.5 allegations
• 1.5 reports
• 0.5 claims
• 0.5 request
• 2 other
• 7 total
[Figure: bar charts of the two distributions over allegations, reports, claims, request, attack, man, outcome, …]
Berkeley Restaurant Project
Berkeley Restaurant Project
Backoff and Interpolation
Backoff
• Use trigram, unless rare
• Then use bigram, unless rare
• Then use unigram..
Interpolation
• Combine all three!
• Linear function with parameters
• Learn on held out data
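The interpolation bullets can be sketched as a weighted sum of the trigram, bigram, and unigram estimates. The weights and probabilities below are hypothetical placeholders; as the slide notes, the weights would in practice be learned on held-out data, and they are assumed here to be non-negative and to sum to one so that the result remains a probability.

```python
def interpolate(p_trigram, p_bigram, p_unigram, lambdas):
    """Linear interpolation of trigram, bigram, and unigram estimates."""
    l3, l2, l1 = lambdas
    if min(lambdas) < 0 or abs(l3 + l2 + l1 - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return l3 * p_trigram + l2 * p_bigram + l1 * p_unigram

# Hypothetical estimates for one word: the trigram was never seen,
# but the bigram and unigram still carry some evidence.
p = interpolate(p_trigram=0.0, p_bigram=0.2, p_unigram=0.01,
                lambdas=(0.6, 0.3, 0.1))
```

Unlike backoff, every order contributes to every estimate, so an unseen trigram still receives probability from the lower-order models.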
Upcoming…
Homework
• Homework 1 is due: January 26, 2017
• Write-up, data, and code for Homework 2 is up
• Homework 2 is due: February 9, 2017
Project
• Proposal is due: February 7, 2017 (~2 weeks)
• Make things more concrete: approach, metrics, baselines
• Mention progress, and address my concerns, if any
• Only 2 pages