Noisy Channel Models
CMSC 723 / LING 723 / INST 725
Marine Carpuat
marine@cs.umd.edu
Today
• HW1, Q&A
• Weighted FSAs
• Noisy Channel Models
• Project 1
HW1: your goals for the class based on word frequency

i      58        research       17
to     57        linguistics    15
and    53        language       14
in     45        some           14
the    33        processing     12
,      27        be             11
of     26        computational  11
learn  20        natural        11
nlp    19        want           11
a      18        have           10
my     18        how            10
this   18        is             10
on      9        like           10
HW1: word frequency distribution
HW1: your goals for the class based on word frequency (no stopwords)

learn          20    techniques    6    security           2
nlp            19    projects      6    hci                2
research       17    class         6    visualization      1
linguistics    15    apply         6    social             1
language       14    models        5    search             1
processing     12    interested    5    probabilistic      1
want           11    goal          5    news               1
natural        11    work          4    media              1
computational  11    systems       4    linguistic         1
like           10    study         4    interactive        1
understanding   9    human         4    interaction        1
machine         9    data          4    human-in-the-loop  1
                     computer      4    human-computer     1
                     applications  4
HW1: probability review
Suppose that 1/100,000 of the population has the ability to read other people's minds. You have a test that, if someone can read minds, reads positive with 95% probability; and, if someone cannot read minds, reads negative with 99.5% probability. I take the test and it reads positive. What is the probability that I can read minds? (Express your answer as a real number in [0,1].)
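A quick sanity check of the computation Bayes' rule gives here (a minimal sketch in Python; the probabilities are exactly those stated above):

```python
p_mind = 1 / 100_000                   # prior: P(mind reader)
p_pos_given_mind = 0.95                # P(positive | mind reader)
p_neg_given_not = 0.995                # P(negative | not a mind reader)
p_pos_given_not = 1 - p_neg_given_not  # false positive rate = 0.005

# Bayes' rule: P(mind | pos) = P(pos | mind) P(mind) / P(pos)
p_pos = p_pos_given_mind * p_mind + p_pos_given_not * (1 - p_mind)
p_mind_given_pos = p_pos_given_mind * p_mind / p_pos
print(f"{p_mind_given_pos:.6f}")  # ~0.001896
```

Even with a positive result, the posterior stays tiny: the 1/100,000 prior dominates the test's accuracy.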
Today
• HW1, Q&A
• Weighted FSAs
• Noisy Channel Models
• Project 1
Bigram Language Model

Training corpus:
<s> I am Sam </s>
<s> Sam I am </s>
<s> I do not like green eggs and ham </s>

Bigram probability estimates:
P( I | <s> )    = 2/3 = 0.67
P( Sam | <s> )  = 1/3 = 0.33
P( am | I )     = 2/3 = 0.67
P( do | I )     = 1/3 = 0.33
P( </s> | Sam ) = 1/2 = 0.50
P( Sam | am )   = 1/2 = 0.50
...
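As a sketch, the estimates above can be reproduced by counting bigrams and dividing by the count of the history word (standard maximum-likelihood estimation over the corpus on this slide):

```python
from collections import Counter

corpus = [
    "<s> I am Sam </s>",
    "<s> Sam I am </s>",
    "<s> I do not like green eggs and ham </s>",
]

unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    tokens = sentence.split()
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def p(word, history):
    """Maximum-likelihood estimate of P(word | history)."""
    return bigrams[(history, word)] / unigrams[history]

print(p("I", "<s>"))   # 0.667
print(p("Sam", "am"))  # 0.5
```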
FSA as a language model
[FSA diagram accepting the strings: "he saw me", "he ran home", "she talked"]
How does this FSA language model differ from a bigram model?
Weighted FSAs
• Assigns a probability to each string that it accepts
• Usually the probabilities sum to one
  – but this is not required
• Strings that are not accepted have probability zero
Weighted FSA as a language model
Weighted Finite-State Automata
• We can view n-gram language models as weighted finite-state automata (see the sketch below)
• We can also define weighted finite-state transducers
  – generate pairs of strings and assign a weight to each pair
  – the weight can often be interpreted as a conditional probability P(output-string | input-string)
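To make the connection concrete, here is a minimal sketch of a weighted FSA acting as a language model over the toy sentences from the earlier FSA slide; the arc weights are hypothetical values chosen for illustration, not taken from the slides:

```python
# States are "last word seen"; each arc (state, word) carries a weight.
# A string's probability is the product of the weights along its path,
# exactly as in a bigram language model.
arcs = {
    ("<s>", "he"): 0.6,   ("<s>", "she"): 0.4,
    ("he", "saw"): 0.5,   ("he", "ran"): 0.5,
    ("saw", "me"): 1.0,   ("ran", "home"): 1.0,
    ("she", "talked"): 1.0,
    ("me", "</s>"): 1.0,  ("home", "</s>"): 1.0,
    ("talked", "</s>"): 1.0,
}

def score(sentence):
    """Probability of a sentence; 0.0 if the FSA does not accept it."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for state, word in zip(tokens, tokens[1:]):
        if (state, word) not in arcs:
            return 0.0  # unaccepted strings get probability zero
        prob *= arcs[(state, word)]
    return prob

print(score("he saw me"))  # 0.3
print(score("he talked"))  # 0.0 (no such path)
```

With these toy weights the three accepted strings get probabilities 0.3, 0.3, and 0.4, which sum to one; as noted above, that need not hold in general.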
Today
• HW1, Q&A
• Weighted FSAs
• Noisy Channel Models
• Project 1
Noisy Channel Models
• Divide-and-conquer strategy common in NLP modeling
  – P(X): source model
  – P(Y|X): channel model
• Goal: recover X from Y (decoding)
  X* = argmax_X P(X|Y)
     = argmax_X P(Y|X) P(X) / P(Y)   (Bayes' rule)
     = argmax_X P(Y|X) P(X)          (P(Y) is constant with respect to X)
Noisy Channel Models: Spelling correction
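The idea behind this slide can be sketched as follows: X is the intended word, Y the observed typo, and we decode with argmax_X P(Y|X) P(X). The vocabulary, source probabilities, and channel model below are all assumed toy values, not the models discussed in class:

```python
# Source model: unigram probabilities of intended words (toy values).
source = {"the": 0.05, "they": 0.01, "then": 0.008, "thew": 0.00001}

def channel(typo, word):
    """P(typo | word): a crude edit-based channel model (assumed)."""
    if typo == word:
        return 0.95                 # typed exactly what was intended
    if abs(len(typo) - len(word)) <= 1:
        return 0.01                 # roughly one edit away
    return 0.0001                   # anything else is very unlikely

def correct(typo):
    return max(source, key=lambda w: channel(typo, w) * source[w])

print(correct("thew"))  # "the": the channel prefers "thew", but the source model wins
```

Note how the two models trade off: "thew" matches the observation perfectly, yet its prior is so low that "the" gets the higher product.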
Noisy Channel Models: Tokenization
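One way to make the tokenization instance concrete (a hypothetical sketch, not the formulation from the slide's figure): Y is an unsegmented character string, X a word sequence, and the channel deterministically deletes spaces, so decoding reduces to finding the most probable segmentation under the source model:

```python
import math
from functools import lru_cache

# Assumed toy vocabulary with unigram log-probabilities.
logp = {"the": math.log(0.05), "they": math.log(0.002),
        "young": math.log(0.01), "man": math.log(0.02)}

@lru_cache(maxsize=None)
def best(chars):
    """Return (log-probability, word tuple) of the best segmentation."""
    if not chars:
        return 0.0, ()
    candidates = []
    for i in range(1, len(chars) + 1):
        word = chars[:i]
        if word in logp:
            rest_lp, rest = best(chars[i:])
            if rest is not None:  # skip dead ends
                candidates.append((logp[word] + rest_lp, (word,) + rest))
    return max(candidates) if candidates else (float("-inf"), None)

print(best("theyoungman")[1])  # ('the', 'young', 'man')
```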
Noisy Channel Models: Speech Recognition
Weighted FSTs and the Noisy Channel Model
Exercise:
• Define noisy channel models for
  – Machine translation from French to English
  – Question Answering
Today
• HW1, Q&A
• Weighted FSAs
  – and how they relate to n-gram models
• Noisy Channel Models
  – source model, channel model, decoding
• Project 1
Recall: Complete Morphological Parser
Recall: Practical NLP Applications
• In practice, it is almost never necessary to write FSTs by hand ...
• Typically, one writes rules:
  – Chomsky and Halle notation: a → b / c __ d
    = rewrite a as b when it occurs between c and d
  – E-insertion rule: ε → e / {x, s, z} ^ __ s #
    = insert e after a morpheme boundary (^) following x, s, or z, before a word-final s (e.g., fox^s# → foxes)
• A rule → FST compiler handles the rest ...
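As a toy illustration of what such a compiler does (real rule compilers build FSTs; this sketch just applies the E-insertion rule with a regular-expression substitution, an assumption for illustration only):

```python
import re

def e_insertion(surface):
    """Apply the E-insertion rule: ε -> e / {x, s, z} ^ __ s #."""
    # Insert "e" between a morpheme boundary after x/s/z and a word-final "s#".
    return re.sub(r"([xsz])\^(s#)", r"\1^e\2", surface)

print(e_insertion("fox^s#"))  # fox^es#  ("foxes" once boundaries are removed)
print(e_insertion("cat^s#"))  # cat^s#   (rule does not apply)
```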
P1: practical details
• www.cs.umd.edu/class/fall2015/cmsc723/p1.html
• Teams of 2 or 3
• Due before class on Tue Sep 29
• Submit code/outputs using handin (see details in Piazza post)
Today
• HW1, Q&A
• Weighted FSAs
• Noisy Channel Models
• Project 1
What's next ...
• Supervised classification, neural networks, and neural language modeling
• Project 1 lab