CSCI 5832 Natural Language Processing
Jim Martin
Lecture 8, 2/7/08

Today 2/7
• Finish remaining LM issues
  Smoothing
  Backoff and Interpolation
• Parts of Speech
• POS Tagging
• HMMs and Viterbi

Laplace Smoothing
• Also called add-one smoothing
• Just add one to all the counts!
• Very simple
• MLE estimate: P(w_i | w_{i-1}) = c(w_{i-1}, w_i) / c(w_{i-1})
• Laplace estimate: P_Laplace(w_i | w_{i-1}) = (c(w_{i-1}, w_i) + 1) / (c(w_{i-1}) + V)
• Reconstructed counts: c*(w_{i-1}, w_i) = (c(w_{i-1}, w_i) + 1) · c(w_{i-1}) / (c(w_{i-1}) + V)
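A minimal sketch (not part of the original slides) of add-one smoothing for bigrams. The toy corpus and function names are made up for illustration; V is the vocabulary size.

```python
from collections import Counter

# Toy corpus; illustrative only, not the lecture's corpus.
corpus = "i want to eat chinese food i want to eat".split()

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))
V = len(unigram_counts)  # vocabulary size

def p_mle(prev, w):
    """Unsmoothed maximum-likelihood estimate: c(prev, w) / c(prev)."""
    return bigram_counts[(prev, w)] / unigram_counts[prev]

def p_laplace(prev, w):
    """Add-one estimate: (c(prev, w) + 1) / (c(prev) + V)."""
    return (bigram_counts[(prev, w)] + 1) / (unigram_counts[prev] + V)

def c_star(prev, w):
    """Reconstructed count implied by the Laplace estimate."""
    return (bigram_counts[(prev, w)] + 1) * unigram_counts[prev] / (unigram_counts[prev] + V)

print(p_mle("want", "to"), p_laplace("want", "to"), c_star("want", "to"))
```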
Laplace Smoothed Bigram Counts
[table not reproduced here]

Laplace-Smoothed Bigrams
[table not reproduced here]

Reconstituted Counts
[table not reproduced here]
Big Changes to Counts
• C(want to) went from 608 to 238!
• P(to|want) went from .66 to .26!
• Discount d = c*/c
  d for "chinese food" = .10, a 10x reduction
  So in general, Laplace is a blunt instrument
  Could use a more fine-grained method (add-k)
• Despite its flaws, Laplace (add-k) is still used to smooth other probabilistic models in NLP, especially
  for pilot studies and in domains where the number of zeros isn't so huge

Better Discounting Methods
• Intuition used by many smoothing algorithms
  Good-Turing
  Kneser-Ney
  Witten-Bell
• Use the count of things we've seen once to help estimate the count of things we've never seen

Good-Turing
• Imagine you are fishing
  There are 8 species: carp, perch, whitefish, trout, salmon, eel, catfish, bass
• You have caught 10 carp, 3 perch, 2 whitefish, 1 trout, 1 salmon, 1 eel
  = 18 fish (tokens)
  = 6 species (types)
• How likely is it that you'll next see another trout?
Good-Turing
• Now how likely is it that the next species is new (i.e., catfish or bass)?
  There were 18 events...
  3 of those represent singleton species
  3/18

Good-Turing
• But that 3/18 isn't represented in our probability mass. Certainly not the one we used for estimating another trout.

Good-Turing Intuition
• Notation: N_x is the frequency of frequency x
  So N_10 = 1, N_1 = 3, etc.
• To estimate the total probability mass of unseen species
  Use the number of species (words) we've seen once
  c_0* = c_1    p_0 = N_1/N
• All other estimates are adjusted (down) to give probabilities for the unseen
(Slide from Josh Goodman)
Good-Turing Intuition (continued)
• p_0 = N_1/N = 3/18
• All other estimates are adjusted (down) to give probabilities for the unseen
  c*(eel) = c*(1) = (1+1) · N_2/N_1 = 2 · (1/3) = 2/3
  so P_GT(eel) = c*(1)/N = (2/3)/18 = 1/27
(Slide from Josh Goodman)

Bigram Frequencies of Frequencies and GT Re-estimates
[table not reproduced here]

GT Smoothed Bigram Probs
[table not reproduced here]
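A minimal sketch (not from the slides) of Good-Turing re-estimation on the fishing example above. The counts are the ones from the slide; the helper names are my own.

```python
from collections import Counter

catch = {"carp": 10, "perch": 3, "whitefish": 2, "trout": 1, "salmon": 1, "eel": 1}
N = sum(catch.values())        # 18 tokens
Nc = Counter(catch.values())   # frequency of frequencies: N_1=3, N_2=1, N_3=1, N_10=1

def c_star(c):
    """Good-Turing adjusted count c* = (c+1) * N_{c+1} / N_c."""
    if Nc[c + 1] == 0:
        return None  # in practice the N_c curve is smoothed (e.g., Simple Good-Turing)
    return (c + 1) * Nc[c + 1] / Nc[c]

p_unseen = Nc[1] / N      # mass reserved for unseen species: 3/18
p_eel = c_star(1) / N     # (2 * 1/3) / 18 = 1/27
print(p_unseen, c_star(1), p_eel)
```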
Backoff and Interpolation
• Another really useful source of knowledge
• If we are estimating the trigram p(z|x,y) but c(xyz) is zero
• Use info from the bigram p(z|y)
• Or even the unigram p(z)
• How to combine the trigram/bigram/unigram info?

Backoff versus Interpolation
• Backoff: use the trigram if you have it, otherwise the bigram, otherwise the unigram
• Interpolation: mix all three

Interpolation
• Simple interpolation:
  P̂(z|x,y) = λ_1 P(z|x,y) + λ_2 P(z|y) + λ_3 P(z), with λ_1 + λ_2 + λ_3 = 1
• Lambdas conditional on context:
  P̂(z|x,y) = λ_1(x,y) P(z|x,y) + λ_2(x,y) P(z|y) + λ_3(x,y) P(z)
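A minimal sketch (not from the slides) of simple linear interpolation. The probability functions and lambda values are placeholders; in practice the lambdas are tuned on held-out data, as the next slide describes.

```python
def p_interp(z, x, y, p_tri, p_bi, p_uni, lambdas=(0.6, 0.3, 0.1)):
    """P_hat(z|x,y) = l1*P(z|x,y) + l2*P(z|y) + l3*P(z), with l1 + l2 + l3 = 1."""
    l1, l2, l3 = lambdas
    assert abs(l1 + l2 + l3 - 1.0) < 1e-9
    return l1 * p_tri(z, x, y) + l2 * p_bi(z, y) + l3 * p_uni(z)

# Toy stand-in distributions (purely illustrative):
p_tri = lambda z, x, y: 0.0   # trigram unseen
p_bi  = lambda z, y: 0.2
p_uni = lambda z: 0.05
print(p_interp("food", "want", "chinese", p_tri, p_bi, p_uni))  # 0.065
```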
How to Set the Lambdas?
• Use a held-out corpus
• Choose lambdas which maximize the probability of some held-out data
  I.e., fix the N-gram probabilities
  Then search for lambda values
  that, when plugged into the previous equation,
  give the largest probability for the held-out set
  Can use EM to do this search

Practical Issues
• We do everything in log space
  Avoid underflow (also, adding is faster than multiplying)

Language Modeling Toolkits
• SRILM
• CMU-Cambridge LM Toolkit
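Relating to the "Practical Issues" slide above, a minimal sketch (not from the slides) of scoring a sentence in log space. p_bigram is a stand-in for any smoothed estimator; the uniform model below is purely illustrative.

```python
import math

def log_prob(sentence, p_bigram):
    """Sum log probabilities instead of multiplying probabilities,
    which avoids numerical underflow on long sentences."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(math.log(p_bigram(prev, w))
               for prev, w in zip(tokens, tokens[1:]))

uniform = lambda prev, w: 1.0 / 1000   # stand-in: uniform over a 1000-word vocabulary
print(log_prob("i want to eat chinese food", uniform))
```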
Google N-Gram Release
[announcement details not reproduced here]

Google N-Gram Release
• serve as the incoming 92
• serve as the incubator 99
• serve as the independent 794
• serve as the index 223
• serve as the indication 72
• serve as the indicator 120
• serve as the indicators 45
• serve as the indispensable 111
• serve as the indispensible 40
• serve as the individual 234

LM Summary
• Probability
  Basic probability
  Conditional probability
  Bayes Rule
• Language Modeling (N-grams)
  N-gram Intro
  The Chain Rule
  Perplexity
  Smoothing: Add-1, Good-Turing
Break
• Moving quiz to Thursday (2/14)
• Readings
  Chapter 2: All
  Chapter 3: Skip 3.4.1 and 3.12
  Chapter 4: Skip 4.7, 4.9, 4.10 and 4.11
  Chapter 5: Read 5.1 through 5.5

Outline
• Probability
• Part of speech tagging
  Parts of speech
  Tag sets
  Rule-based tagging
  Statistical tagging
    Simple most-frequent-tag baseline
  Important Ideas
    Training sets and test sets
    Unknown words
    Error analysis
  HMM tagging

Part of Speech Tagging
• Part of speech tagging
  Parts of speech
  What's POS tagging good for anyhow?
  Tag sets
  Rule-based tagging
  Statistical tagging
    Simple most-frequent-tag baseline
  Important Ideas
    Training sets and test sets
    Unknown words
  HMM tagging
Parts of Speech
• 8 (ish) traditional parts of speech
  Noun, verb, adjective, preposition, adverb, article, interjection, pronoun, conjunction, etc.
  Called: parts-of-speech, lexical categories, word classes, morphological classes, lexical tags, POS
  Lots of debate in linguistics about the number, nature, and universality of these
  We'll completely ignore this debate.

POS Examples
• N    noun         chair, bandwidth, pacing
• V    verb         study, debate, munch
• ADJ  adjective    purple, tall, ridiculous
• ADV  adverb       unfortunately, slowly
• P    preposition  of, by, to
• PRO  pronoun      I, me, mine
• DET  determiner   the, a, that, those

POS Tagging: Definition
• The process of assigning a part-of-speech or lexical class marker to each word in a corpus:
  WORDS: the koala put the keys on the table
  TAGS:  N, V, P, DET (each word mapped to a tag, as on the next slide)
POS Tagging Example
  WORD    TAG
  the     DET
  koala   N
  put     V
  the     DET
  keys    N
  on      P
  the     DET
  table   N

What Is POS Tagging Good For?
• First step of a vast number of practical tasks
• Speech synthesis
  How to pronounce "lead"?
  INsult vs. inSULT
  OBject vs. obJECT
  OVERflow vs. overFLOW
  DIScount vs. disCOUNT
  CONtent vs. conTENT
• Parsing
  Need to know if a word is an N or V before you can parse
• Information extraction
  Finding names, relations, etc.
• Machine Translation
Open and Closed Classes
• Closed class: a relatively fixed membership
  Prepositions: of, in, by, …
  Auxiliaries: may, can, will, had, been, …
  Pronouns: I, you, she, mine, his, them, …
  Usually function words (short common words which play a role in grammar)
• Open class: new ones can be created all the time
  English has 4: Nouns, Verbs, Adjectives, Adverbs
  Many languages have these 4, but not all!

Open Class Words
• Nouns
  Proper nouns (Boulder, Granby, Eli Manning); English capitalizes these.
  Common nouns (the rest).
  Count nouns and mass nouns
    Count: have plurals, get counted: goat/goats, one goat, two goats
    Mass: don't get counted (snow, salt, communism) (*two snows)
• Adverbs: tend to modify things
  Unfortunately, John walked home extremely slowly yesterday
  Directional/locative adverbs (here, home, downhill)
  Degree adverbs (extremely, very, somewhat)
  Manner adverbs (slowly, slinkily, delicately)
• Verbs: in English, have morphological affixes (eat/eats/eaten)

Closed Class Words
• Idiosyncratic
• Examples:
  prepositions: on, under, over, …
  particles: up, down, on, off, …
  determiners: a, an, the, …
  pronouns: she, who, I, …
  conjunctions: and, but, or, …
  auxiliary verbs: can, may, should, …
  numerals: one, two, three, third, …

Prepositions from CELEX
[frequency table not reproduced here]
English Particles
[table not reproduced here]

Conjunctions
[table not reproduced here]

POS Tagging: Choosing a Tagset
• There are so many parts of speech, potential distinctions we can draw
• To do POS tagging, we need to choose a standard set of tags to work with
• Could pick a very coarse tagset
  N, V, Adj, Adv.
• The more commonly used set is finer grained, the "UPenn TreeBank tagset", with 45 tags
  e.g., PRP$, WRB, WP$, VBG
• Even more fine-grained tagsets exist
Penn TreeBank POS Tag Set
[table of the 45 Penn Treebank tags not reproduced here]

Using the UPenn Tagset
• The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.
• Prepositions and subordinating conjunctions are marked IN ("although/IN I/PRP …")
• Except the preposition/complementizer "to", which is just marked "TO".

POS Tagging
• Words often have more than one POS: back
  The back door = JJ
  On my back = NN
  Win the voters back = RB
  Promised to back the bill = VB
• The POS tagging problem is to determine the POS tag for a particular instance of a word.
(These examples from Dekang Lin)
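Not part of the lecture: one quick way to see Penn Treebank tags in practice is NLTK's built-in tagger, assuming the library and its tokenizer/tagger models are installed. The exact tags produced for "back" depend on the tagger model, so treat the output as illustrative.

```python
import nltk
# One-time setup (assumed done): nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

for sent in ["The back door was open.", "Promised to back the bill."]:
    tokens = nltk.word_tokenize(sent)
    print(nltk.pos_tag(tokens))  # list of (word, Penn Treebank tag) pairs;
                                 # "back" typically receives different tags in the two contexts
```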
How Hard Is POS Tagging? Measuring Ambiguity
[table of tag-ambiguity statistics not reproduced here]

2 Methods for POS Tagging
1. Rule-based tagging (ENGTWOL)
2. Stochastic (= probabilistic) tagging
   HMM (Hidden Markov Model) tagging

Rule-Based Tagging
• Start with a dictionary
• Assign all possible tags to words from the dictionary
• Write rules by hand to selectively remove tags
• Leaving the correct tag for each word
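A minimal, made-up sketch (not from ENGTWOL) of the rule-based idea on the last slide: assign every dictionary tag to each word, then let hand-written constraint rules eliminate tags that are impossible in context. The tiny lexicon and the single rule are purely illustrative.

```python
LEXICON = {
    "the": {"DT"},
    "back": {"JJ", "NN", "RB", "VB"},
    "door": {"NN"},
}

def eliminate(words):
    # Step 1: assign every possible tag from the dictionary (default NN for unknowns).
    candidates = [set(LEXICON.get(w, {"NN"})) for w in words]
    # Step 2: apply constraint rules that remove impossible tags.
    for i in range(1, len(words)):
        # Illustrative rule: discard a verb reading immediately after an unambiguous determiner.
        if candidates[i - 1] == {"DT"}:
            candidates[i].discard("VB")
    return candidates

print(eliminate(["the", "back", "door"]))
# e.g. [{'DT'}, {'JJ', 'NN', 'RB'}, {'NN'}]  (more rules would be needed to resolve the rest)
```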