Probabilistic Context-Free Grammars


Informatics 2A: Lecture 19
Bonnie Webber (revised by Frank Keller)
School of Informatics, University of Edinburgh
bonnie@inf.ed.ac.uk
4 November 2008
Reading: J&M, 2nd edition, ch. 14 (Introduction → Section 14.6)

Outline:
1. Motivation: Ambiguity, Coverage, Zipf's Law
2. Probabilistic Context-Free Grammars: Conditional Probabilities, Distributions
3. Applications: Disambiguation, Formalization, Language Modeling

Motivation

Three things motivate the use of probabilities in grammars and parsing:
1. Ambiguity, i.e. the same thing that motivates chart parsing, LL(1) parsing, etc.
2. Coverage: issues in developing a grammar for a language.
3. Zipf's Law.

Motivation 1: Ambiguity

Language is highly ambiguous: the amount of ambiguity, both lexical and structural, increases with sentence length. Real sentences, even in newspapers or email, are fairly long (the average sentence length in the Wall Street Journal is 25 words). For example:

  A second provision passed by the Senate and House would eliminate a rule allowing companies that post losses resulting from LBO debt to receive refunds of taxes paid over the previous three years. [wsj 1822] (33 words)

Long sentences with high ambiguity pose a problem, even for chart parsers, if they have to keep track of all possible analyses. The amount of work required would be reduced if we could ignore improbable analyses.

Motivation 2: Coverage

It is actually very difficult to write a grammar that covers all the constructions used in ordinary text or speech (e.g. in a newspaper). Typically hundreds of rules are required in order to capture both all the different linguistic patterns and all the different possible analyses of the same pattern. (Recall the grammar rules we had to add in Lecture 14 to cover three different analyses of "You made her duck".)

Ideally, one wants to induce (learn) a grammar from a corpus. Grammar induction requires probabilities.

Motivation 3: Zipf's Law (Again)

As with words and parts of speech, the distribution of grammar constructions is also Zipfian, but the likelihood of a particular construction can vary depending on:
- register (formal vs. informal): e.g. greenish, alot, and subject-drop ("Want a beer?") are all more probable in informal than in formal register;
- genre (newspapers, essays, mystery stories, jokes, ads, etc.): this is clear from the differences between PoS taggers trained on different genres of the Brown Corpus;
- domain (biology, patent law, football, etc.).

Probabilistic grammars and parsers can reflect these kinds of distributions.

Example: Improbable vs. probable parse

Let's compare an improbable but grammatical parse for a sentence with its probable parse:

(1) In a general way, such ideas are relevant to suggesting how organisms we know might possibly end up fleeing from household objects.

[The slides show two parse trees for (1): an improbable but grammatical analysis and the probable one, built from categories such as S, NP, VP, PP, AP, Comp, RC and Ptcpl.]

What's odd about the improbable parse? Why is it improbable? Both parses, and many more, would be produced by a parser that had to compute all grammatical analyses. What's the alternative?

Probabilistic Context-Free Grammars

We can try associating the likelihood of an analysis with the likelihood of its grammar rules.

Given a grammar G = (N, Σ, P, S), a PCFG augments each rule in P with a conditional probability p. This p represents the probability that the non-terminal A will expand to the sequence β, which we can write as

  A → β  [p]

or P(A → β | A) = p, or P(A → β) = p.

If we consider all the rules for a non-terminal A:

  A → β1  [p1]
  ...
  A → βk  [pk]

then the sum of their probabilities (p1 + ··· + pk) must be 1. This ensures the probabilities form a valid probability distribution.

Example (from nlp.stanford.edu)

Suppose there's only one rule for the non-terminal S in the grammar:

  S → NP VP

What is P(S → NP VP)?

Example

Consider the very simple grammar G_rhubarb:

  S → rhubarb  [1/3]
  S → S S      [2/3]

  P(rhubarb) = 1/3
  P(rhubarb rhubarb) = 2/3 × 1/3 × 1/3 = 2/27
  P(rhubarb rhubarb rhubarb) = (2/3)² × (1/3)³ × 2 = 8/243
  (the factor of 2 counts the two distinct parse trees of three rhubarbs)

A PCFG is said to be consistent if the sum of the probabilities of all sentences in the language equals 1. Note: recursive rules can cause a grammar to be inconsistent. Here:

  Σ P(L_rhubarb) = 1/3 + 2/27 + 8/243 + ··· = 1/2

So the grammar G_rhubarb is inconsistent.
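This calculation can be checked numerically. Below is a minimal Python sketch (not from the lecture; the function and variable names are mine) that verifies the rules for S form a probability distribution and sums the probabilities of all sentences up to a length bound:

```python
from functools import lru_cache

# Rule probabilities of G_rhubarb, as given on the slide.
P_TERM = 1 / 3   # S -> rhubarb
P_SPLIT = 2 / 3  # S -> S S

# The rules for S must sum to 1 (a valid probability distribution).
assert abs((P_TERM + P_SPLIT) - 1.0) < 1e-12

@lru_cache(maxsize=None)
def sentence_prob(n):
    """Total probability, over all parse trees, of a sentence of n rhubarbs.

    A derivation either rewrites S as 'rhubarb' (n == 1), or splits S
    into two subtrees covering k and n - k words; summing over all
    split points k counts every distinct parse tree exactly once.
    """
    if n == 1:
        return P_TERM
    return P_SPLIT * sum(sentence_prob(k) * sentence_prob(n - k)
                         for k in range(1, n))

print(sentence_prob(2))  # 2/27, as on the slide
print(sentence_prob(3))  # 8/243, as on the slide
print(sum(sentence_prob(n) for n in range(1, 60)))  # approaches 1/2, not 1
```

The sum stalls at 1/2 because, with these rule probabilities, a derivation fails to terminate with probability 1/2; that lost probability mass is exactly what makes G_rhubarb inconsistent.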

Questions about PCFGs

Four questions are of interest regarding PCFGs:
1. Applications: what can we use PCFGs for?
2. Estimation: given a corpus and a grammar, how can we induce the rule probabilities?
3. Parsing: given a string and a PCFG, how can we efficiently compute the most probable parse?
4. Grammar induction: given a corpus, how can we induce both the grammar and the rule probabilities?

In this lecture, we will deal with question 1. The next lecture will deal with questions 2 and 3. Question 4 is addressed in the 3rd-year course Introduction to Cognitive Science and the 4th-year courses in Cognitive Modelling and Machine Translation.

Application 1: Disambiguation

The probability that a PCFG assigns to a parse tree can be used to disambiguate sentences that have more than one parse. Assumption: the most probable parse is the intended parse.

The probability of a parse T for a sentence S is defined as the product of the probability of each rule r used to expand a node n in the parse tree:

  P(T, S) = Π_{n ∈ T} p(r(n))

Since a sentence S corresponds to the yield of the parse tree T, we have P(S | T) = 1, hence:

  P(T) = P(T, S) / P(S | T) = P(T, S) = Π_{n ∈ T} p(r(n))

Example grammar:

  R1   S   → NP VP       (0.85)
  R2   S   → Aux NP VP   (0.15)
  R3   NP  → PRO         (0.4)
  R4   NP  → NOM         (0.05)
  R5   NP  → NPR         (0.35)
  R6   NP  → NPR NOM     (0.2)
  R7   NOM → N           (0.75)
  R8   NOM → N PP        (0.25)
  R9   VP  → TV NP NP    (0.05)
  R10  VP  → TV NP       (0.4)
  R11  VP  → IV          (0.55)
  R12  Aux → can         (0.4)
  R13  N   → flights     (0.5)
  R14  PRO → you         (0.4)
  R15  TV  → book        (0.3)
  R16  NPR → TWA         (0.4)

  P(R1) + P(R2) = ?
  P(R3) + P(R4) + P(R5) + P(R6) = ?
  ...

What does this imply about the lexical rules given here?
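To make the product formula concrete, here is a small Python sketch (mine, not from the lecture): it scores one hand-built parse of "you book TWA flights" under the example grammar above. The particular parse, and the rule-id encoding of the tree, are my assumptions for illustration:

```python
from math import prod

# Rule probabilities copied from the example grammar (rule id -> p).
RULE_PROB = {
    "R1": 0.85,   # S   -> NP VP
    "R3": 0.40,   # NP  -> PRO
    "R6": 0.20,   # NP  -> NPR NOM
    "R7": 0.75,   # NOM -> N
    "R10": 0.40,  # VP  -> TV NP
    "R13": 0.50,  # N   -> flights
    "R14": 0.40,  # PRO -> you
    "R15": 0.30,  # TV  -> book
    "R16": 0.40,  # NPR -> TWA
}

# One parse tree T for "you book TWA flights", flattened to the list of
# rules used at its nodes (hypothetical, but licensed by the grammar):
#   S -> NP VP; NP -> PRO -> you; VP -> TV NP; TV -> book;
#   NP -> NPR NOM; NPR -> TWA; NOM -> N -> flights.
parse = ["R1", "R3", "R14", "R10", "R15", "R6", "R16", "R7", "R13"]

# P(T, S) is the product over tree nodes n of p(r(n)).
p_tree = prod(RULE_PROB[r] for r in parse)
print(f"P(T, S) = {p_tree:.6f}")  # about 0.00049
```

A parser would compute such a product for every analysis of the sentence and keep the largest one; that is the "most probable parse is the intended parse" rule in practice.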
