
Language Modeling, Prof. Sameer Singh, CS 295: STATISTICAL NLP



  1. Language Modeling. Prof. Sameer Singh, CS 295: STATISTICAL NLP, WINTER 2017, January 24, 2017. Based on slides from Dan Jurafsky, Noah Smith, and everyone else they copied from.

  2. Outline
     • Wrap-up: Word Embeddings
     • Introduction to Language Models
     • N-Gram Based Language Models
     • Smoothing Language Models

  3. Outline (next section: Wrap-up: Word Embeddings)

  4. Predict surrounding words: “A bottle of tezguino is on the table.” (figure labels: u, v)
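Predicting surrounding words starts from (center, context) pairs drawn from running text. A minimal Python sketch, assuming a symmetric window of 2 tokens and a lowercased, whitespace-tokenized sentence; `skipgram_pairs` is a hypothetical helper, not something from the slides:

```python
def skipgram_pairs(tokens, window=2):
    """Pair every token with each neighbour at most `window` positions away."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


sentence = "a bottle of tezguino is on the table".split()
# Context words the model should learn to predict for the unfamiliar word "tezguino".
print([ctx for center, ctx in skipgram_pairs(sentence) if center == "tezguino"])
# ['bottle', 'of', 'is', 'on']
```

Words that appear in similar contexts generate similar pairs, which is what pulls their vectors together during training.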

  5. Negative Sampling

  6. Neural View of Embeddings

  7. Word Embeddings
     Variations
     • Skip-gram: predict context from word
     • CBOW: predict word from context (bag of words)
     • Dependencies: a better description of context
     Uses
     • Similarity
     • Analogies: grammar, gender, facts
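The similarity and analogy uses can be made concrete with vector arithmetic: a : b :: c : ? is answered by the word whose vector is closest (by cosine similarity) to c - a + b. A minimal sketch, assuming hand-made 2-d vectors that are purely hypothetical; real embeddings are learned and have hundreds of dimensions:

```python
import math

# Hypothetical toy vectors chosen by hand for illustration only.
VECTORS = {
    "man": (1.0, 0.0),
    "woman": (1.0, 1.0),
    "king": (3.0, 0.0),
    "queen": (3.0, 1.0),
    "apple": (0.0, 3.0),
}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def analogy(a, b, c, vectors=VECTORS):
    """Solve a : b :: c : ? by finding the word nearest to c - a + b."""
    target = tuple(vc - va + vb for va, vb, vc in zip(vectors[a], vectors[b], vectors[c]))
    candidates = [w for w in vectors if w not in (a, b, c)]  # exclude the query words
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(analogy("man", "woman", "king"))  # queen
```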

  8. Outline (next section: Introduction to Language Models)

  9. Language Models
     Probability of a Sentence
     • Is a given sentence something you would expect to see?
     • Syntactically (grammar) and semantically (meaning)
     Probability of the Next Word
     • Predict what comes next for a given sequence of words
     • Think of it as V-way classification, where V is the vocabulary size
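Treating next-word prediction as V-way classification means producing one score per vocabulary word and normalizing the scores into a probability distribution. A minimal sketch using a softmax over a hypothetical four-word vocabulary with made-up scores (an n-gram model would instead get its probabilities from counts):

```python
import math


def softmax(scores):
    """Turn one score per vocabulary word into a probability distribution."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {w: math.exp(s - top) for w, s in scores.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}


# Hypothetical scores for the word following "I do not like green eggs and".
scores = {"ham": 4.0, "toast": 2.0, "the": 1.0, "Sam": 0.5}
probs = softmax(scores)
prediction = max(probs, key=probs.get)
print(prediction)  # ham
```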

  10. Task: Speech Recognition: “eyes awe of an” or “I saw a van”

  11. Task: Machine Translation

  12. Task: Handwriting Recognition (http://www.cedar.buffalo.edu/handwriting/HRoverview.html)

  13. Task: Image Captioning

  14. Task: Spelling Correction: “The office is about fifteen minuets from my house”
     P(about fifteen minutes from) >> P(about fifteen minuets from)

  15. Other Applications: summarization, question answering, dialog systems

  16. Evaluating Language Models
     • Best choice: extrinsic (measure the effect on a downstream task)
     • 2nd choice: intrinsic (e.g., perplexity on held-out text)

  17. Perplexity
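Perplexity is the inverse probability of the test text, normalized by its length N: PP(W) = P(w_1 … w_N)^(-1/N), which equals exp of the average negative log-probability per word. A minimal sketch, assuming the per-word probabilities have already been computed by some model; `perplexity` is a hypothetical helper name:

```python
import math


def perplexity(word_probs):
    """Perplexity from the model's probability for each word of the test text."""
    n = len(word_probs)
    avg_neg_log = -sum(math.log(p) for p in word_probs) / n
    return math.exp(avg_neg_log)


# A model that spreads probability uniformly over 10 words behaves
# like a 10-way choice at every step, so its perplexity is about 10.
print(perplexity([0.1] * 5))
```

Lower perplexity means the model found the test text less surprising.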

  18. Generating Text from an LM
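Generating from an n-gram model means repeatedly sampling the next word from P(next | previous words) until an end marker is drawn. A minimal sketch with a hypothetical hand-written bigram table (the probabilities are made up) and Python's `random.Random.choices` for weighted sampling:

```python
import random

# Hypothetical toy bigram model P(next | prev); each row sums to 1.
BIGRAMS = {
    "<s>": {"I": 0.6, "Sam": 0.4},
    "I": {"am": 0.7, "like": 0.3},
    "am": {"Sam": 0.5, "</s>": 0.5},
    "Sam": {"I": 0.3, "</s>": 0.7},
    "like": {"green": 1.0},
    "green": {"eggs": 1.0},
    "eggs": {"</s>": 1.0},
}


def generate(model, rng, max_len=20):
    """Sample words left to right from <s> until </s> (or max_len words)."""
    word, words = "<s>", []
    for _ in range(max_len):
        followers = list(model[word])
        weights = [model[word][w] for w in followers]
        word = rng.choices(followers, weights=weights, k=1)[0]
        if word == "</s>":
            break
        words.append(word)
    return words


print(" ".join(generate(BIGRAMS, random.Random(0))))
```

Passing a seeded `random.Random` keeps runs reproducible; the `max_len` cap guards against cycles such as “I am Sam I am …”.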

  19. Outline (next section: N-Gram Based Language Models)

  20. Direct Language Modeling: P(“I do not like green eggs and ham”) and P(w | “I do not like green eggs and”)

  21. Applying the Chain Rule
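Written out, the chain rule factors a sentence probability into next-word probabilities, each conditioned on the full history; the second line applies it to a prefix of the Dr. Seuss sentence:

```latex
P(w_1, w_2, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})

P(\text{I do not like}) = P(\text{I})\, P(\text{do} \mid \text{I})\, P(\text{not} \mid \text{I do})\, P(\text{like} \mid \text{I do not})
```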

  22. Markov Assumption
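The Markov assumption truncates the history in the chain rule to the last k words (k = 1 gives a bigram model, k = 2 a trigram model). Taking w_0 to be the start symbol <s>, the bigram case becomes:

```latex
P(w_i \mid w_1, \dots, w_{i-1}) \approx P(w_i \mid w_{i-k}, \dots, w_{i-1})

% Bigram case (k = 1):
P(w_1, \dots, w_n) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-1})
```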

  23. Unigram Language Model
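A unigram model drops all history: the maximum-likelihood estimate is P(w) = count(w) / N, and a sentence's probability is the product of its word probabilities. A minimal sketch on a made-up six-word training text, using `fractions.Fraction` so the probabilities are exact; the helper names are hypothetical:

```python
from collections import Counter
from fractions import Fraction


def train_unigram(tokens):
    """Maximum-likelihood unigram model: P(w) = count(w) / N."""
    counts = Counter(tokens)
    total = len(tokens)
    return {w: Fraction(c, total) for w, c in counts.items()}


def sentence_prob(model, words):
    """Product of unigram probabilities; unseen words get probability 0."""
    prob = Fraction(1)
    for w in words:
        prob *= model.get(w, Fraction(0))
    return prob


model = train_unigram("the cat sat on the mat".split())
print(sentence_prob(model, ["the", "cat", "sat"]))  # 1/108
print(sentence_prob(model, ["sat", "the", "cat"]))  # 1/108, word order is ignored
```

The identical scores for both word orders show why a unigram model cannot capture syntax.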

  24. Bigram Language Model
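The bigram maximum-likelihood estimate is P(w_i | w_{i-1}) = count(w_{i-1}, w_i) / count(w_{i-1}), with <s> and </s> added at sentence boundaries. A minimal sketch on a made-up three-sentence corpus; `train_bigram` is a hypothetical helper:

```python
from collections import Counter
from fractions import Fraction


def train_bigram(sentences):
    """MLE bigram model: P(w | prev) = count(prev, w) / count(prev)."""
    pair_counts, context_counts = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            pair_counts[(prev, word)] += 1
            context_counts[prev] += 1  # how often prev starts a bigram
    return {pair: Fraction(c, context_counts[pair[0]]) for pair, c in pair_counts.items()}


corpus = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]
model = train_bigram(corpus)
print(model[("<s>", "I")])     # 2/3
print(model[("I", "am")])      # 2/3
print(model[("Sam", "</s>")])  # 1/2
```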

  25. Berkeley Restaurant Project

  26. Berkeley Restaurant Project

  27. N-Gram Language Models
     • “The computer which I had just put into the dining room on the fifth floor crashed.”
     • “The computer which I had just put into the dining room on the fifth floor had lunch.”
     The word that decides between “crashed” and “had lunch” (“computer”) lies far back in the sentence, beyond the reach of a short n-gram context.

  28. Shakespeare

  29. Wall Street Journal

  30. Implementation Tips
     Use logs
     • Prevent underflow
     • Sums instead of products
     Filter out n-grams
     • Rare n-grams are noisy/have low probability
     • Use unigrams to filter bigrams…
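The underflow problem shows up quickly with long texts: multiplying many small probabilities drops below the smallest representable float, while summing their logs stays well within range. A minimal sketch with hypothetical numbers (400 words, each assigned probability 0.001):

```python
import math

probs = [1e-3] * 400  # hypothetical per-word probabilities

product = 1.0
for p in probs:
    product *= p  # underflows to 0.0 long before the loop ends

log_prob = sum(math.log(p) for p in probs)  # stays representable

print(product)   # 0.0
print(log_prob)  # about -2763.1
```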

  31. Outline (next section: Smoothing Language Models)

  32. Zero Probability Problem
     Training set:
     • … denied the allegations
     • … denied the reports
     • … denied the claims
     • … denied the request
     Test set:
     • … denied the offer
     • … denied the loan
     P(“offer” | denied the) = 0
     Because the corpus is finite, some events never appear in training:
     • Rare words/combinations
     • Misspellings: “minuets”
     • New words: Truthiness, bigly, #letalonethehashtags

  33. Laplace Smoothing
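Laplace (add-one) smoothing pretends every vocabulary word was seen once more than it was: P(w | h) = (count(h, w) + 1) / (count(h) + V). A minimal sketch using the “denied the” counts from the smoothing-intuition slide (3 allegations, 2 reports, 1 claims, 1 request); the nine-word vocabulary is a hypothetical choice:

```python
from fractions import Fraction

# Counts of words seen after "denied the".
counts = {"allegations": 3, "reports": 2, "claims": 1, "request": 1}
# Hypothetical vocabulary: the seen words plus some unseen ones.
vocab = ["allegations", "reports", "claims", "request",
         "offer", "loan", "outcome", "attack", "man"]


def laplace(word, counts, vocab):
    """Add-one estimate: (count + 1) / (total + V)."""
    total = sum(counts.values())
    return Fraction(counts.get(word, 0) + 1, total + len(vocab))


print(laplace("offer", counts, vocab))        # 1/16, no longer zero
print(laplace("allegations", counts, vocab))  # 1/4, down from the MLE 3/7
```

With a realistic vocabulary of tens of thousands of words, add-one moves a large share of the mass to unseen words, which is why gentler methods are usually preferred for n-grams.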

  34. Intuition Behind Smoothing
     When we have sparse statistics, P(w | denied the):
     • 3 allegations, 2 reports, 1 claims, 1 request (7 total)
     • Unseen words such as outcome, attack, man get nothing
     Steal probability mass to generalize better, P(w | denied the):
     • 2.5 allegations, 1.5 reports, 0.5 claims, 0.5 request, 2 other (7 total)
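The numbers on this slide are consistent with subtracting a fixed 0.5 from each seen count and pooling the 2 freed counts for unseen words (absolute discounting). The slide itself does not name a method, so treat this as one possible reading; `discount` is a hypothetical helper:

```python
def discount(counts, d=0.5):
    """Subtract a fixed discount d from every seen count; pool the removed mass."""
    discounted = {w: c - d for w, c in counts.items()}
    other = d * len(counts)  # mass freed up for unseen words
    return discounted, other


counts = {"allegations": 3, "reports": 2, "claims": 1, "request": 1}
discounted, other = discount(counts)
print(discounted)  # {'allegations': 2.5, 'reports': 1.5, 'claims': 0.5, 'request': 0.5}
print(other)       # 2.0
```

How the pooled mass is then split among unseen words (for example, in proportion to a lower-order model) is a separate design choice.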

  35. Berkeley Restaurant Project

  36. Berkeley Restaurant Project

  37. Backoff and Interpolation
     Backoff
     • Use trigram, unless rare
     • Then use bigram, unless rare
     • Then use unigram
     Interpolation
     • Combine all three!
     • Linear function with parameters
     • Learn on held-out data
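Linear interpolation mixes the three estimates with weights that sum to 1: P(w | u, v) = λ3 P(w | u, v) + λ2 P(w | v) + λ1 P(w). A minimal sketch; the λ values and the three input probabilities are hypothetical, and in practice the λs would be tuned on held-out data as the slide says:

```python
import math


def interpolate(p_tri, p_bi, p_uni, lambdas=(0.6, 0.3, 0.1)):
    """Linear interpolation of trigram, bigram and unigram estimates."""
    l3, l2, l1 = lambdas
    assert math.isclose(l3 + l2 + l1, 1.0), "weights must sum to 1"
    return l3 * p_tri + l2 * p_bi + l1 * p_uni


# Hypothetical estimates for P(w | u, v), P(w | v), P(w): the trigram was never seen,
# yet the interpolated probability is still nonzero (about 0.065).
print(interpolate(p_tri=0.0, p_bi=0.2, p_uni=0.05))
```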

  38. Upcoming…
     Homework
     • Homework 1 is due: January 26, 2017
     • Write-up, data, and code for Homework 2 are up
     • Homework 2 is due: February 9, 2017
     Project
     • Proposal is due: February 7, 2017 (~2 weeks)
     • Make things more concrete: approach, metrics, baselines
     • Mention progress and address my concerns, if any
     • Only 2 pages
