
Probability & Language Modeling. CMSC 473/673, UMBC. Some slides adapted from 3SLP, Jason Eisner. [Title-slide figure: levels of linguistic description, covering vision, audio, prosody, intonation, orthography, color, morphology, lexemes, syntax, semantics, pragmatics, discourse.]


  1. Bayes Rule: p(Y | X) = p(X | Y) * p(Y) / p(X), where p(Y | X) is the posterior probability, p(X | Y) the likelihood, p(Y) the prior probability, and p(X) the marginal likelihood.
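
A tiny numeric check of Bayes rule may help; the disease/test numbers below are made up purely for illustration and are not from the slides.

```python
# Toy Bayes-rule check (made-up numbers, illustration only).
p_Y = 0.01             # prior p(Y), e.g. p(disease)
p_X_given_Y = 0.95     # likelihood p(X | Y), e.g. p(positive test | disease)
p_X_given_notY = 0.05  # p(X | not Y)

# Marginal likelihood p(X) via the law of total probability.
p_X = p_X_given_Y * p_Y + p_X_given_notY * (1 - p_Y)

# Posterior: p(Y | X) = p(X | Y) * p(Y) / p(X)
p_Y_given_X = p_X_given_Y * p_Y / p_X
print(round(p_Y_given_X, 3))  # 0.161: a positive test still leaves p(disease) fairly low
```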

  2. Changing the Left: p(A), shown on a probability scale from 1 down to 0

  3. Changing the Left: p(A), p(A, B)

  4. Changing the Left: p(A), p(A, B), p(A, B, C)

  5. Changing the Left: p(A), p(A, B), p(A, B, C), p(A, B, C, D)

  6. Changing the Left: p(A), p(A, B), p(A, B, C), p(A, B, C, D), p(A, B, C, D, E)
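
The point of this build-up is that adding an event to the joint (the left of the conditioning bar) can only keep or shrink the probability. A minimal sketch of that fact, using three fair coin flips as my own toy example:

```python
# Toy illustration: each event added to the joint can only keep or lower the probability.
from itertools import product

outcomes = list(product("HT", repeat=3))  # three fair coin flips, 8 equally likely outcomes

def prob(event):
    """Probability of an event (a predicate over outcomes) under the uniform distribution."""
    return sum(1 for o in outcomes if event(o)) / len(outcomes)

A = lambda o: o[0] == "H"
B = lambda o: o[1] == "H"
C = lambda o: o[2] == "H"

print(prob(A))                                 # p(A)       = 0.5
print(prob(lambda o: A(o) and B(o)))           # p(A, B)    = 0.25
print(prob(lambda o: A(o) and B(o) and C(o)))  # p(A, B, C) = 0.125
```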

  7. Changing the Right: p(A) vs. p(A | B), on the same probability scale from 1 down to 0

  8. Changing the Right: p(A | B) vs. p(A)

  9. Changing the Right: p(A | B) vs. p(A)
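
Changing the right-hand side (conditioning) is different: p(A | B) can land above or below p(A). A small dice example of both directions (my own numbers, not from the slides):

```python
# Toy illustration: conditioning can raise or lower the probability of A.
outcomes = range(1, 7)  # one fair six-sided die

def prob(event, given=None):
    """p(event) or p(event | given) under the uniform distribution over the die."""
    space = [o for o in outcomes if given is None or given(o)]
    return sum(1 for o in space if event(o)) / len(space)

A = lambda o: o == 6       # roll a six
B = lambda o: o >= 4       # roll is high
C = lambda o: o % 2 == 1   # roll is odd

print(prob(A))           # p(A)     = 1/6
print(prob(A, given=B))  # p(A | B) = 1/3, conditioning raised it
print(prob(A, given=C))  # p(A | C) = 0,   conditioning lowered it
```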

  10. Changing the Right: Bias vs. Variance. Lower bias: more specific to what we care about. Higher variance: for fixed observations, estimates become less reliable.

  11. Probability Chain Rule: p(y_1, y_2) = p(y_1) p(y_2 | y_1)   (Bayes rule)

  12. Probability Chain Rule: p(y_1, y_2, …, y_T) = p(y_1) p(y_2 | y_1) p(y_3 | y_1, y_2) ⋯ p(y_T | y_1, …, y_{T-1})

  13. Probability Chain Rule: p(y_1, y_2, …, y_T) = p(y_1) p(y_2 | y_1) p(y_3 | y_1, y_2) ⋯ p(y_T | y_1, …, y_{T-1}) = ∏_{t=1}^{T} p(y_t | y_1, …, y_{t-1})
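
The factorization can be checked mechanically: multiplying conditionals estimated from prefix counts reproduces the joint relative frequency of a full sequence. A minimal sketch over a toy list of "sentences" (my own example, not course material):

```python
# Chain rule sketch: p(y_1,...,y_T) = prod_t p(y_t | y_1,...,y_{t-1}),
# with each conditional estimated by relative frequency over a toy corpus.
corpus = [("a", "b", "c"), ("a", "b", "d"), ("a", "c", "c"), ("b", "b", "c")]

def p_next(history, word):
    """p(word | history) from prefix counts (assumes the history was observed)."""
    hist_count = sum(1 for s in corpus if s[:len(history)] == history)
    next_count = sum(1 for s in corpus if s[:len(history) + 1] == history + (word,))
    return next_count / hist_count

def p_joint(sentence):
    p = 1.0
    for t, word in enumerate(sentence):
        p *= p_next(sentence[:t], word)
    return p

# (3/4) * (2/3) * (1/2) = 0.25, the relative frequency of ("a", "b", "c") in the corpus.
print(p_joint(("a", "b", "c")))
```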

  14. Probability Takeaways: basic probability axioms and definitions; probabilistic independence; definition of joint probability; definition of conditional probability; Bayes rule; probability chain rule

  15. Outline Probability review Words Defining Language Models Breaking & Fixing Language Models Evaluating Language Models

  16. What Are Words? Linguists don't agree. (Human) language-dependent. White-space separation is sometimes okay (for written English longform). Social media? Spoken vs. written? Other languages?

  17. What Are Words? bat http://www.freepngimg.com/download/bat/9-2-bat-png-hd.png

  18. What Are Words? bats http://www.freepngimg.com/download/bat/9-2-bat-png-hd.png

  19. What Are Words? Fledermaus (German for bat, literally "flutter mouse") http://www.freepngimg.com/download/bat/9-2-bat-png-hd.png

  20. What Are Words? pişirdiler: They cooked it. pişmişlermişlerdi: They had it cooked.

  21. What Are Words? ): my leg is hurting nasty ):

  22. Examples of Text Normalization Segmenting or tokenizing words Normalizing word formats Segmenting sentences in running text

  23. What Are Words? Tokens vs. Types. Example sentence: The film got a great opening and the film went on to become a hit . A token is an instance of a type in running text; a type is an element of the vocabulary.
  Tokens: The, film, got, a, great, opening, and, the, film, went, on, to, become, a, hit, .
  Types: The, film, got, a, great, opening, and, the, went, on, to, become, hit, .
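
Counting tokens and types is a one-liner; a small sketch on the example sentence (whitespace tokenization, case-sensitive, my own code):

```python
# Tokens vs. types on the slide's example sentence.
sentence = "The film got a great opening and the film went on to become a hit ."
tokens = sentence.split()   # every instance in running text
types = set(tokens)         # distinct vocabulary items

print(len(tokens))  # 16 tokens
print(len(types))   # 14 types ("The" and "the" count as different types here)
```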

  24. Some Issues with Tokenization: mph, MPH; MD, M.D.; Baltimore 's mayor; I 'm, won't; state-of-the-art; San Francisco

  25. CaSE inSensitive? Replace all letters with their lowercase version. Can be useful for information retrieval (IR), machine translation, language modeling: cat vs. Cat (there are other ways to signify the beginning of a sentence)

  26. CaSE inSensitive? Replace all letters with their lowercase version. Can be useful for information retrieval (IR), machine translation, language modeling: cat vs. Cat (there are other ways to signify the beginning of a sentence). But… case can be useful: sentiment analysis, machine translation, information extraction; US vs. us

  27. cat ≟ cats. Lemma: same stem, part of speech, rough word sense; cat and cats share a lemma. Word form: the fully inflected surface form; cat and cats are different word forms.

  28. Lemmatization: reduce inflections or variant forms to the base form. am, are, is → be; car, cars, car's, cars' → car; the boy's cars are different colors → the boy car be different color
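
Real lemmatizers rely on a dictionary plus part-of-speech information; the toy lookup below is my own stand-in that only mirrors the slide's examples, not an actual lemmatization system:

```python
# Toy lemmatizer: a hand-written lookup table covering just the slide's examples.
# A real system (e.g. a WordNet-based lemmatizer) uses a full dictionary and POS tags.
LEMMAS = {
    "am": "be", "are": "be", "is": "be",
    "cars": "car", "car's": "car", "cars'": "car",
    "boy's": "boy", "colors": "color",
}

def lemmatize(token):
    return LEMMAS.get(token.lower(), token.lower())

print([lemmatize(t) for t in "the boy's cars are different colors".split()])
# ['the', 'boy', 'car', 'be', 'different', 'color']
```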

  29. Morphosyntax. Morphemes: the small meaningful units that make up words. Stems: the core meaning-bearing units. Affixes: bits and pieces that adhere to stems.

  30. Morphosyntax. Morphemes: the small meaningful units that make up words. Stems: the core meaning-bearing units. Affixes: bits and pieces that adhere to stems. Inflectional: (they) look → (they) looked; (they) ran → (they) run. Derivational: (a) run → running (of the Bulls); code → codeable.

  31. Morphosyntax. Morphemes: the small meaningful units that make up words. Stems: the core meaning-bearing units. Affixes: bits and pieces that adhere to stems. Inflectional: (they) look → (they) looked; (they) ran → (they) run. Derivational: (a) run → running (of the Bulls); code → codeable. Syntax: contractions can rewrite and reorder a sentence: Baltimore 's [mayor 's {campaign}] → [{the campaign} of the mayor] of Baltimore

  32. Words vs. Sentences. ! and ? are relatively unambiguous. The period "." is quite ambiguous: sentence boundary, abbreviations like Inc. or Dr., numbers like .02% or 4.3. Solution: write rules, or build a classifier.
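
A rule-based splitter is easy to sketch and just as easy to break; the regex and the abbreviation list below are illustrative choices of mine, not a production segmenter:

```python
import re

# Illustrative rule-based sentence splitter: treat ., !, ? followed by whitespace and
# an uppercase letter as a boundary, unless the preceding word is a known abbreviation.
ABBREVIATIONS = {"Dr.", "Mr.", "Ms.", "Inc.", "U.S."}

def split_sentences(text):
    sentences, start = [], 0
    for m in re.finditer(r"[.!?](?=\s+[A-Z])", text):
        end = m.end()
        last_word = text[start:end].split()[-1]
        if last_word in ABBREVIATIONS:
            continue  # e.g. the period in "Dr." is not a sentence boundary
        sentences.append(text[start:end].strip())
        start = end
    sentences.append(text[start:].strip())
    return sentences

print(split_sentences("Dr. Smith arrived. The rate fell by .02% today. Amazing!"))
# ['Dr. Smith arrived.', 'The rate fell by .02% today.', 'Amazing!']
```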

  33. Outline Probability review Words Defining Language Models Breaking & Fixing Language Models Evaluating Language Models

  34. Goal of Language Modeling: p_θ([…text…]). Learn a probabilistic model of text. Accomplished through observing text and updating model parameters to make the text more likely.

  35. Goal of Language Modeling: p_θ([…text…]). Learn a probabilistic model of text. Accomplished through observing text and updating model parameters to make the text more likely. Constraints: 0 ≤ p_θ([…text…]) ≤ 1, and Σ_{t : t is valid text} p_θ(t) = 1.

  36. "The Unreasonable Effectiveness of Recurrent Neural Networks" http://karpathy.github.io/2015/05/21/rnn-effectiveness/

  37. "The Unreasonable Effectiveness of Recurrent Neural Networks" http://karpathy.github.io/2015/05/21/rnn-effectiveness/ "The Unreasonable Effectiveness of Character-level Language Models" (and why RNNs are still cool) http://nbviewer.jupyter.org/gist/yoavg/d76121dfde2618422139

  38. Simple Count-Based: p(item)

  39. Simple Count-Based: p(item) ∝ count(item), where ∝ means "proportional to"

  40. Simple Count-Based: p(item) ∝ count(item), i.e. p(item) = count(item) / Σ_{any other item y} count(y)

  41. Simple Count-Based: p(item) ∝ count(item), i.e. p(item) = count(item) / Σ_{any other item y} count(y); the denominator is a constant (the normalizer)

  42. Simple Count-Based: p(item) ∝ count(item). Sequence of characters → pseudo-words; sequence of words → pseudo-phrases
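
A count-based model is just normalized counts; a minimal unigram sketch over whitespace-tokenized words (the toy corpus is mine, not from the slides):

```python
# Minimal count-based (unigram) model: p(item) is proportional to count(item),
# normalized by the total count so the probabilities sum to 1.
from collections import Counter

corpus = "the film got a great opening and the film went on to become a hit".split()

counts = Counter(corpus)
total = sum(counts.values())
p = {word: c / total for word, c in counts.items()}

print(p["film"])        # 2/15, "film" occurs twice among 15 tokens
print(sum(p.values()))  # 1.0 (up to floating point), the normalizer did its job
```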

  43. Shakespearian Sequences of Characters

  44. Shakespearian Sequences of Words

  45. Novel Words, Novel Sentences. "Colorless green ideas sleep furiously" – Chomsky (1957). Let's observe and record all sentences with our big, bad supercomputer. Red ideas? Read ideas?

  46. Probability Chain Rule: p(y_1, y_2, …, y_T) = p(y_1) p(y_2 | y_1) p(y_3 | y_1, y_2) ⋯ p(y_T | y_1, …, y_{T-1})

  47. Probability Chain Rule: p(y_1, y_2, …, y_T) = p(y_1) p(y_2 | y_1) p(y_3 | y_1, y_2) ⋯ p(y_T | y_1, …, y_{T-1}) = ∏_{t=1}^{T} p(y_t | y_1, …, y_{t-1})

  48. N-Grams. Maintaining an entire inventory over sentences could be too much to ask. Store "smaller" pieces? p(Colorless green ideas sleep furiously)

  49. N-Grams. Maintaining an entire joint inventory over sentences could be too much to ask. Store "smaller" pieces? p(Colorless green ideas sleep furiously) = p(Colorless) *

  50. N-Grams. Maintaining an entire joint inventory over sentences could be too much to ask. Store "smaller" pieces? p(Colorless green ideas sleep furiously) = p(Colorless) * p(green | Colorless) *

  51. N-Grams. Maintaining an entire joint inventory over sentences could be too much to ask. Store "smaller" pieces? p(Colorless green ideas sleep furiously) = p(Colorless) * p(green | Colorless) * p(ideas | Colorless green) * p(sleep | Colorless green ideas) * p(furiously | Colorless green ideas sleep)

  52. N-Grams. Maintaining an entire joint inventory over sentences could be too much to ask. Store "smaller" pieces? Apply the chain rule: p(Colorless green ideas sleep furiously) = p(Colorless) * p(green | Colorless) * p(ideas | Colorless green) * p(sleep | Colorless green ideas) * p(furiously | Colorless green ideas sleep)

  53. N-Grams. Maintaining an entire joint inventory over sentences could be too much to ask. Store "smaller" pieces? Apply the chain rule: p(Colorless green ideas sleep furiously) = p(Colorless) * p(green | Colorless) * p(ideas | Colorless green) * p(sleep | Colorless green ideas) * p(furiously | Colorless green ideas sleep)

  54. N-Grams: p(furiously | Colorless green ideas sleep). How much does "Colorless" influence the choice of "furiously"?
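
One answer, which the n-gram idea makes precise, is to truncate the history and condition on only the previous n-1 words. A bigram (n = 2) sketch with maximum-likelihood counts; the three-line toy corpus and the <s>/</s> boundary tokens are my own illustrative choices:

```python
# Bigram approximation: p(y_t | y_1,...,y_{t-1}) is replaced by p(y_t | y_{t-1}),
# estimated here with maximum-likelihood (relative-frequency) counts.
from collections import Counter

corpus = ["colorless green ideas sleep furiously",
          "green ideas sleep soundly",
          "colorless ideas sleep"]

unigrams, bigrams = Counter(), Counter()
for line in corpus:
    words = ["<s>"] + line.split() + ["</s>"]  # sentence boundary tokens
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

def p_bigram(word, prev):
    """p(word | prev) = count(prev, word) / count(prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

def p_sentence(sentence):
    words = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, word in zip(words, words[1:]):
        p *= p_bigram(word, prev)
    return p

print(p_sentence("colorless green ideas sleep furiously"))  # (2/3)*(1/2)*1*1*(1/3)*1 = 1/9
```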
