Probabilistic Context-Free Grammars
Informatics 2A: Lecture 18

Bonnie Webber and Frank Keller
School of Informatics, University of Edinburgh
bonnie@inf.ed.ac.uk
5 November 2009

Reading: J&M 2nd edition, ch. 14 (Section 14.2–14.6.1)

Motivation

Four things motivate the use of probabilities in grammars and parsing:
1 Ambiguity – ie, the same thing motivating chart parsing, LL(1) parsing, etc.
2 Coverage – issues in developing a grammar for a language
3 Zipf's Law
4 Empirical evidence from studies of human language processing

Motivation 1: Ambiguity

Language is highly ambiguous: the amount of ambiguity – both lexical and structural – increases with sentence length.

Real sentences, even in newspapers or email, are fairly long (avg. sentence length in the Wall Street Journal is 25 words):

    A second provision passed by the Senate and House would eliminate a rule allowing companies that post losses resulting from LBO debt to receive refunds of taxes paid over the previous three years. [wsj 1822] (33 words)

The amount of (unexpected!) ambiguity increases rapidly with sentence length. This poses a problem for parsers, even chart parsers, that keep track of all possible analyses. We could cut down the amount of work if we could ignore improbable analyses.
Motivation 2: Coverage

It is actually very difficult to write a grammar that covers all the constructions used in ordinary text or speech (e.g., in a newspaper).

Typically hundreds of rules are required in order to capture both all the different linguistic patterns and all the different possible analyses of the same pattern. (Recall from lecture 13 the grammar rules we had to add to cover three different analyses of You made her duck.)

Ideally, one wants to induce (learn) a grammar from a corpus. Grammar induction requires probabilities.

Motivation 3: Zipf's Law (Again)

As with words and parts of speech, the distribution of grammar constructions is also Zipfian, but the likelihood of a particular construction can vary, depending on:
register (formal vs. informal): eg, greenish, alot, subject-drop (Want a beer?) are all more probable in informal than formal register;
genre (newspapers, essays, mystery stories, jokes, ads, etc.): clear from the difference in PoS-taggers trained on different genres in the Brown Corpus;
domain (biology, patent law, football, etc.).

Probabilistic grammars and parsers can reflect these kinds of distributions.

Motivation 4: Human Sentence Processing

While almost every sentence is ambiguous in some way (even this one!), we (as people) rarely notice it.

Instead, we seem to see only one interpretation – although we may see different interpretations in different contexts.

Probabilities in the grammar or parser or both seem a good way to model this.

More about this in Lectures 28–30.

Example: Improbable parse

Let's compare an unexpected (and improbable) but grammatical parse for a 22-word sentence with a more probable parse.

(1) In a general way, such ideas are relevant to suggesting how organisms we know might possibly end up fleeing from household objects.

[Parse tree for the improbable analysis of sentence (1).]

What's odd about this? Why is it improbable?
Example: Probable parse

[Parse tree for the more probable analysis of sentence (1).]

Both parses and many more would be produced by a parser that had to compute all grammatical analyses.

What's the alternative?

Probabilistic Context-Free Grammars

We can try associating the likelihood of an analysis with the likelihood of its grammar rules.

Given a grammar G = (N, Σ, P, S), a PCFG augments each rule in P with a conditional probability p. This p represents the probability that non-terminal A will expand to the sequence β, which we can write as

    A → β [p]
or
    P(A → β | A) = p
or
    P(A → β) = p

Probabilistic Context-Free Grammars

If we consider all the rules for a non-terminal A:

    A → β_1 [p_1]
    ...
    A → β_k [p_k]

then the sum of their probabilities (p_1 + ··· + p_k) must be 1. This ensures the probabilities form a valid probability distribution.

Example

Suppose there's only one rule for the non-terminal S in the grammar:

    S → NP VP

What is P(S → NP VP)?

A PCFG is said to be consistent if the sum of the probabilities of all sentences in the language equals 1.

Note: Recursive rules can cause a grammar to be inconsistent. Let's see why.
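Before the worked example, here is a minimal sketch (my own illustration, not part of the lecture) of how a PCFG and its rule-probability constraint might be represented in code; the dictionary layout, toy grammar, and function name are assumptions.

```python
# A toy PCFG stored as a dictionary: each non-terminal maps to a list of
# (right-hand side, probability) pairs. The grammar below is illustrative only.
toy_pcfg = {
    "S":  [(("NP", "VP"), 1.0)],
    "NP": [(("Det", "N"), 0.7), (("N",), 0.3)],
    "VP": [(("V", "NP"), 0.6), (("V",), 0.4)],
}

def rule_distributions_ok(pcfg, tol=1e-9):
    """Return True if, for every non-terminal A, the probabilities of all
    rules A -> beta sum to 1, i.e. each non-terminal has a valid distribution."""
    for lhs, expansions in pcfg.items():
        total = sum(p for _, p in expansions)
        if abs(total - 1.0) > tol:
            print(f"{lhs}: rule probabilities sum to {total}, not 1")
            return False
    return True

print(rule_distributions_ok(toy_pcfg))  # True: every non-terminal sums to 1
```

Note that this check is necessary but not sufficient for consistency: the grammar in the next example also passes it (1/3 + 2/3 = 1) and is nevertheless inconsistent.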
Example (from nlp.stanford.edu)

Consider the very simple grammar G_rhubarb:

    S → rhubarb [1/3]
    S → S S [2/3]

[Derivation trees for one, two, and three rhubarbs, with each branch labelled by its rule probability.]

P(rhubarb) = 1/3

P(rhubarb rhubarb) = 2/3 × 1/3 × 1/3 = 2/27

P(rhubarb rhubarb rhubarb) = (2/3)^2 × (1/3)^3 × 2 = 8/243

Σ P(L_rhubarb) = 1/3 + 2/27 + 8/243 + ... = 1/2

So the grammar G_rhubarb is inconsistent.

Questions about PCFGs

Four questions are of interest regarding PCFGs:
1 Applications: What can we use PCFGs for?
2 Estimation: Given a corpus and a grammar, how can we induce the rule probabilities?
3 Parsing: Given a string and a PCFG, how can we efficiently compute the most probable parse?
4 Grammar induction: Given a corpus, how can we induce both the grammar and the rule probabilities?

In this lecture, we will deal with question 1. The next lecture will deal with questions 2 and 3. Question 4 is addressed in the 3rd-year course Introduction to Cognitive Science (Semester 1) and the 4th-year courses in Cognitive Modelling and in Machine Translation (both Semester 2).

Application 1: Disambiguation

The probability that a PCFG assigns to a parse tree can be used to disambiguate sentences that have more than one parse.

Assumption: The most probable parse is the intended parse.

The probability of a parse T for a sentence S is defined as the product of the probability of each rule r used to expand a node n in the parse tree:

    P(T, S) = ∏_{n ∈ T} p(r(n))

Since a sentence S corresponds to the yield of the parse tree T, P(S | T) = 1, hence:

    P(T) = P(T, S) / P(S | T) = P(T, S) / 1 = P(T, S) = ∏_{n ∈ T} p(r(n))
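To make the product formula and the rhubarb example concrete, here is a small sketch (my own illustration; the tree encoding and function names are assumptions, not part of the slides). It scores a derivation tree of G_rhubarb by multiplying the probabilities of the rules it uses, and then sums the probability of the sentence with n rhubarbs over increasing n, which approaches 1/2 rather than 1.

```python
from math import comb, isclose

# G_rhubarb:  S -> rhubarb [1/3],  S -> S S [2/3]
P_LEX = 1 / 3   # probability of S -> rhubarb
P_BIN = 2 / 3   # probability of S -> S S

def tree_probability(tree):
    """P(T) = product over nodes n in T of p(r(n)).
    A tree is either the leaf expansion 'rhubarb' or a pair (left, right)
    standing for one application of S -> S S."""
    if tree == "rhubarb":
        return P_LEX
    left, right = tree
    return P_BIN * tree_probability(left) * tree_probability(right)

print(tree_probability("rhubarb"))                 # 1/3
print(tree_probability(("rhubarb", "rhubarb")))    # 2/3 * 1/3 * 1/3 = 2/27

def sentence_probability(n):
    """Total probability of the sentence 'rhubarb' repeated n times:
    there are Catalan(n-1) distinct binary trees, each using n-1 binary
    rules and n lexical rules."""
    catalan = comb(2 * (n - 1), n - 1) // n
    return catalan * P_BIN ** (n - 1) * P_LEX ** n

print(sentence_probability(3))                     # 2 * (2/3)^2 * (1/3)^3 = 8/243

# Summing over sentence lengths: the total mass approaches 1/2, not 1,
# which is why G_rhubarb is inconsistent.
total = sum(sentence_probability(n) for n in range(1, 200))
print(round(total, 6), isclose(total, 0.5, abs_tol=1e-4))   # ~0.5  True
```

The same tree_probability idea is what disambiguation uses: given several parses of one sentence, compute each tree's probability as the product of its rule probabilities and pick the largest.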