Bayes Rule
p(A | B) = p(B | A) · p(A) / p(B)
p(A | B): posterior probability; p(B | A): likelihood; p(A): prior probability; p(B): marginal likelihood (probability)
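To make the pieces concrete, here is a tiny numeric check in Python; the spam/word numbers are invented for illustration:

```python
# Toy numeric check of Bayes rule: p(A | B) = p(B | A) * p(A) / p(B).
# All numbers below are made up for illustration.
p_spam = 0.2                      # prior p(A): a message is spam
p_word_given_spam = 0.6           # likelihood p(B | A): "free" appears in spam
p_word_given_ham = 0.05           # p(B | not A): "free" appears in non-spam

# Marginal likelihood p(B) via the law of total probability.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Posterior p(A | B): probability the message is spam given it contains "free".
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(f"p(spam | 'free') = {p_spam_given_word:.3f}")  # 0.750
```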
Changing the Left
[Figure: joint probabilities placed on a scale from 1 down to 0, shrinking as events are added:
1 ≥ p(A) ≥ p(A, B) ≥ p(A, B, C) ≥ p(A, B, C, D) ≥ p(A, B, C, D, E) ≥ 0]
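One way to see the ordering: each added event can only shrink the set of outcomes that match. A small simulation sketch (the events here are invented independent coin flips):

```python
import random

random.seed(0)
# Each sample is a tuple of five independent coin flips (A, B, C, D, E).
samples = [tuple(random.random() < 0.7 for _ in range(5)) for _ in range(100_000)]

# Estimate p(A), p(A, B), p(A, B, C), ... by counting.
for k in range(1, 6):
    p = sum(all(s[:k]) for s in samples) / len(samples)
    print(f"p(first {k} events all true) ~= {p:.3f}")
# Each added conjunct can only shrink the matching set, so the
# estimates decrease monotonically from p(A) toward p(A,...,E).
```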
Changing the Right
[Figure: p(A) and p(A | B) on a scale from 1 down to 0, shown in both orders: conditioning can move the probability in either direction, so p(A | B) may be larger or smaller than p(A)]
Changing the Right: Bias vs. Variance
• Lower bias: more specific to what we care about
• Higher variance: for fixed observations, estimates become less reliable
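A quick simulation sketch of the trade-off, with an invented generative process: conditioning on B and C targets exactly the case we care about, but only a fraction of the data matches the condition, so the estimate rests on fewer samples:

```python
import random

random.seed(1)
N = 10_000
# Invented generative process: B and C influence A.
data = []
for _ in range(N):
    b = random.random() < 0.5
    c = random.random() < 0.5
    a = random.random() < (0.8 if (b and c) else 0.3)
    data.append((a, b, c))

# Unconditional estimate: backed by all N samples (low variance, high bias).
print("p(A)        ~", sum(a for a, b, c in data) / N)

# Conditional estimate: only samples with B and C true are usable, so
# roughly N/4 samples back the (less biased, higher variance) estimate.
matched = [a for a, b, c in data if b and c]
print("p(A | B, C) ~", sum(matched) / len(matched), f"({len(matched)} samples)")
```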
Probability Chain Rule
p(y_1, y_2) = p(y_1) p(y_2 | y_1)   (via Bayes rule)
Probability Chain Rule
p(y_1, y_2, …, y_n) = p(y_1) p(y_2 | y_1) p(y_3 | y_1, y_2) ⋯ p(y_n | y_1, …, y_{n-1})
                    = ∏_i p(y_i | y_1, …, y_{i-1})
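The chain rule can be checked mechanically on any small joint distribution; a sketch with an invented three-variable table:

```python
from itertools import product

# An invented joint distribution over three binary variables (y1, y2, y3).
vals = list(product([0, 1], repeat=3))
weights = [1, 2, 3, 4, 4, 3, 2, 1]           # arbitrary positive weights
total = sum(weights)
joint = {v: w / total for v, w in zip(vals, weights)}  # sums to 1

def marginal(prefix):
    """p(y1=prefix[0], ..., yk=prefix[k-1]) by summing out the rest."""
    return sum(p for v, p in joint.items() if v[:len(prefix)] == prefix)

# Chain rule: p(y1, y2, y3) = p(y1) * p(y2 | y1) * p(y3 | y1, y2).
y = (1, 0, 1)
chain = (marginal(y[:1])
         * marginal(y[:2]) / marginal(y[:1])
         * marginal(y[:3]) / marginal(y[:2]))
print(chain, joint[y])   # the two values agree (up to float rounding)
```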
Probability Takeaways
• Basic probability axioms and definitions
• Probabilistic independence
• Definition of joint probability
• Definition of conditional probability
• Bayes rule
• Probability chain rule
Outline
• Probability review
• Words
• Defining Language Models
• Breaking & Fixing Language Models
• Evaluating Language Models
What Are Words?
• Linguists don't agree
• (Human) language-dependent
• White-space separation is sometimes okay (for written English longform)
• Social media? Spoken vs. written? Other languages?
What Are Words?
bat → bats; German: Fledermaus (literally "flutter mouse")
[Image: http://www.freepngimg.com/download/bat/9-2-bat-png-hd.png]
What Are Words?
Turkish: pişirdiler ("They cooked it."), pişmişlermişlerdi ("They had it cooked.")
What Are Words? ): my leg is hurting nasty ):
Examples of Text Normalization
• Segmenting or tokenizing words
• Normalizing word formats
• Segmenting sentences in running text
What Are Words? Tokens vs. Types
"The film got a great opening and the film went on to become a hit ."
Token: an instance of a type in running text.
Type: an element of the vocabulary.
• Tokens (16): The, film, got, a, great, opening, and, the, film, went, on, to, become, a, hit, .
• Types (14): The, film, got, a, great, opening, and, the, went, on, to, become, hit, .
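A minimal sketch of the count, assuming whitespace tokenization and case sensitivity as on the slide:

```python
# Tokens vs. types for the example sentence (case-sensitive, whitespace split).
sentence = "The film got a great opening and the film went on to become a hit ."
tokens = sentence.split()
types = set(tokens)

print(len(tokens), "tokens")   # 16 tokens: every running instance
print(len(types), "types")     # 14 types: distinct vocabulary items
# Note "The" and "the" count as different types here; case folding
# would merge them and reduce the type count further.
```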
Some Issues with Tokenization
• mph, MPH; M.D., MD
• Baltimore 's mayor
• I 'm , wo n't
• state-of-the-art
• San Francisco
CaSE inSensitive?
Replace all letters with their lowercase version.
Can be useful for information retrieval (IR), machine translation, language modeling: cat vs. Cat (there are other ways to signify a sentence beginning).
But… case can be useful: sentiment analysis, machine translation, information extraction (US vs. us).
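A tiny case-folding sketch in Python, illustrating both the gain and the cost:

```python
# Case folding: normalize case before counting or matching.
# str.casefold() is a more aggressive lowercasing (handles e.g. German ß).
words = ["Cat", "cat", "CAT", "US", "us"]
folded = [w.casefold() for w in words]
print(folded)  # ['cat', 'cat', 'cat', 'us', 'us']
# The cost: "US" (the country) and "us" (the pronoun) collapse together,
# which is why case can matter for information extraction.
```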
cat vs. cats
Lemma: same stem, part of speech, rough word sense. cat and cats: same lemma.
Word form: the fully inflected surface form. cat and cats: different word forms.
Lemmatization
Reduce inflections or variant forms to base form:
• am, are, is → be
• car, cars, car's, cars' → car
• the boy's cars are different colors → the boy car be different color
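One common way to do this in practice is NLTK's WordNet lemmatizer; a sketch (requires downloading the WordNet data once):

```python
# Lemmatization sketch with NLTK's WordNetLemmatizer.
# Requires: pip install nltk, then a one-time WordNet download.
import nltk
from nltk.stem import WordNetLemmatizer

nltk.download("wordnet", quiet=True)
lemmatizer = WordNetLemmatizer()

print(lemmatizer.lemmatize("cars"))            # car
print(lemmatizer.lemmatize("are", pos="v"))    # be
print(lemmatizer.lemmatize("colors"))          # color
# The lemmatizer needs a part-of-speech hint ("v" above) to map
# verb inflections like am/are/is back to "be".
```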
Morphosyntax
Morphemes: the small meaningful units that make up words.
Stems: the core meaning-bearing units.
Affixes: bits and pieces that adhere to stems.
• Inflectional: (they) look → (they) looked; (they) ran → (they) run
• Derivational: (a) run → running (of the Bulls); code → codeable
Syntax: contractions can rewrite and reorder a sentence:
Baltimore 's [mayor 's {campaign}] → [{the campaign} of the mayor] of Baltimore
Words vs. Sentences
• ! and ? are relatively unambiguous
• Period "." is quite ambiguous: sentence boundary; abbreviations like Inc. or Dr.; numbers like .02% or 4.3
• Solution: write rules, or build a classifier
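In the spirit of "write rules", a minimal rule-based splitter; the abbreviation list is an invented stand-in, and real systems typically use a trained classifier or a library tokenizer instead:

```python
import re

# A tiny abbreviation list, invented for this sketch. Only trailing-period
# abbreviations are handled; internal periods (M.D.) would need more rules.
ABBREVS = {"Inc.", "Dr.", "Mr.", "Ms."}

def split_sentences(text):
    """Rule-based split: break on . ! ? unless the '.' ends an abbreviation
    or sits inside a number like 4.3 or .02%."""
    sentences, start = [], 0
    for m in re.finditer(r"[.!?]", text):
        i = m.end()
        prev_word = text[:i].split()[-1] if text[:i].split() else ""
        next_char = text[i:i+2].strip()[:1]
        if m.group() == "." and (prev_word in ABBREVS or
                                 (next_char and next_char.isdigit())):
            continue  # abbreviation or number, not a boundary
        sentences.append(text[start:i].strip())
        start = i
    if text[start:].strip():
        sentences.append(text[start:].strip())
    return sentences

print(split_sentences("Dr. Smith paid $4.3 to Acme Inc. yesterday. Really!"))
# ['Dr. Smith paid $4.3 to Acme Inc. yesterday.', 'Really!']
```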
Outline
• Probability review
• Words
• Defining Language Models
• Breaking & Fixing Language Models
• Evaluating Language Models
Goal of Language Modeling
p_θ([…text…])
Learn a probabilistic model of text.
Accomplished through observing text and updating model parameters to make text more likely.
0 ≤ p_θ([…text…]) ≤ 1
Σ_{t : t is valid text} p_θ(t) = 1
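A toy check of the two constraints, over an invented three-string universe of "valid text":

```python
# A toy language model over an invented three-string universe,
# just to make the two axioms on the slide concrete.
p_theta = {
    "the cat sat": 0.5,
    "the cat ran": 0.3,
    "cat the the": 0.2,
}

assert all(0.0 <= p <= 1.0 for p in p_theta.values())  # 0 <= p(text) <= 1
assert abs(sum(p_theta.values()) - 1.0) < 1e-12        # sums to 1 over valid text
```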
"The Unreasonable Effectiveness of Recurrent Neural Networks"
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
"The Unreasonable Effectiveness of Character-level Language Models" (and why RNNs are still cool)
http://nbviewer.jupyter.org/gist/yoavg/d76121dfde2618422139
Simple Count-Based
p(item) ∝ count(item)   ("proportional to")
p(item) = count(item) / Σ_y count(y)
The denominator sums over every other item y and is a constant.
Simple Count-Based
p(item) ∝ count(item)
sequence of characters → pseudo-words
sequence of words → pseudo-phrases
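A minimal sketch of the whole recipe: count characters, normalize by the constant total, then sample to produce pseudo-words (the one-line "corpus" is invented):

```python
import random
from collections import Counter

random.seed(0)
corpus = "colorless green ideas sleep furiously"   # invented toy corpus

# p(char) is proportional to count(char); dividing by the total
# (a constant) turns counts into probabilities.
counts = Counter(corpus.replace(" ", ""))
total = sum(counts.values())
chars = list(counts)
probs = [counts[c] / total for c in chars]

# Sample character sequences: "pseudo-words" under this count-based model.
for _ in range(3):
    word = "".join(random.choices(chars, weights=probs, k=6))
    print(word)
```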
Shakespearian Sequences of Characters
Shakespearian Sequences of Words
Novel Words, Novel Sentences
"Colorless green ideas sleep furiously" (Chomsky, 1957)
Let's observe and record all sentences with our big, bad supercomputer.
Red ideas? Read ideas?
Probability Chain Rule
p(y_1, y_2, …, y_n) = p(y_1) p(y_2 | y_1) p(y_3 | y_1, y_2) ⋯ p(y_n | y_1, …, y_{n-1})
                    = ∏_i p(y_i | y_1, …, y_{i-1})
N-Grams
Maintaining an entire joint inventory over sentences could be too much to ask.
Store "smaller" pieces?
Apply the chain rule:
p(Colorless green ideas sleep furiously) =
    p(Colorless) *
    p(green | Colorless) *
    p(ideas | Colorless green) *
    p(sleep | Colorless green ideas) *
    p(furiously | Colorless green ideas sleep)
N-Grams
p(furiously | Colorless green ideas sleep)
How much does "Colorless" influence the choice of "furiously"?
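The n-gram answer is to assume distant words matter little and truncate the history. A bigram (n = 2) sketch, trained on two invented sentences:

```python
from collections import Counter, defaultdict

# Bigram model: approximate p(w_i | w_1 ... w_{i-1}) by p(w_i | w_{i-1}),
# i.e. assume "Colorless" barely influences "furiously" four words later.
corpus = [
    "colorless green ideas sleep furiously",
    "green ideas sleep soundly",
]  # invented training sentences

bigram_counts = defaultdict(Counter)
for sent in corpus:
    words = ["<s>"] + sent.split() + ["</s>"]
    for prev, cur in zip(words, words[1:]):
        bigram_counts[prev][cur] += 1

def p_bigram(cur, prev):
    """Maximum-likelihood estimate p(cur | prev) = count(prev, cur) / count(prev, *)."""
    ctx = bigram_counts[prev]
    return ctx[cur] / sum(ctx.values()) if ctx else 0.0

print(p_bigram("sleep", "ideas"))       # 1.0: "ideas" is always followed by "sleep"
print(p_bigram("furiously", "sleep"))   # 0.5: "sleep" -> furiously or soundly
```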