Parameter Estimation and Lexicalization for PCFGs
Informatics 2A: Lecture 20

Mirella Lapata
School of Informatics, University of Edinburgh
04 November 2011

Reading: J&M 2nd edition, ch. 14.2–14.6.1; NLTK Book, chapter 8, final section on Weighted Grammar.

Outline
1 Standard PCFGs: Parameter Estimation; Problem 1: Assuming Independence; Problem 2: Ignoring Lexical Information
2 Lexicalized PCFGs: Lexicalization; Head Lexicalization; The Collins Parser

Parameter Estimation

In a PCFG every rule is associated with a probability. But where do these rule probabilities come from? Use a large parsed corpus such as the Penn Treebank:

1 obtain grammar rules by reading them off the trees;
2 estimate each rule's probability as the number of times LHS → RHS occurs in the corpus over the number of times LHS occurs:

    P(α → β | α) = Count(α → β) / Σγ Count(α → γ) = Count(α → β) / Count(α)

Example tree and some rules read off it:

( (S (NP-SBJ (DT That) (JJ cold) (, ,) (JJ empty) (NN sky))
     (VP (VBD was)
         (ADJP-PRD (JJ full)
                   (PP (IN of)
                       (NP (NN fire) (CC and) (NN light)))))
     (. .) ))

S → NP-SBJ VP
VP → VBD ADJP-PRD
PP → IN NP
NP → NN CC NN
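The relative-frequency estimate above can be sketched in a few lines of Python. The nested-tuple tree format and the function names here are assumptions for illustration (with real treebank data, `nltk.Tree.productions()` does the rule extraction):

```python
from collections import Counter

# Toy stand-in for a treebank tree: (label, children) tuples, with words
# as plain strings. Hypothetical format, not the real Penn Treebank API.
tree = ("S",
        [("NP-SBJ", [("DT", ["That"]), ("JJ", ["cold"]), (",", [","]),
                     ("JJ", ["empty"]), ("NN", ["sky"])]),
         ("VP", [("VBD", ["was"]),
                 ("ADJP-PRD", [("JJ", ["full"]),
                               ("PP", [("IN", ["of"]),
                                       ("NP", [("NN", ["fire"]),
                                               ("CC", ["and"]),
                                               ("NN", ["light"])])])])]),
         (".", ["."])])

def productions(node):
    """Read LHS -> RHS rules off a tree, one rule per internal node."""
    label, children = node
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    rules = [(label, rhs)]
    for c in children:
        if not isinstance(c, str):
            rules.extend(productions(c))
    return rules

rule_counts = Counter(productions(tree))
lhs_counts = Counter(lhs for lhs, _ in rule_counts.elements())

def rule_prob(lhs, rhs):
    """Relative-frequency estimate P(lhs -> rhs | lhs)."""
    return rule_counts[(lhs, rhs)] / lhs_counts[lhs]

# In this one-tree corpus, NN rewrites three ways (sky, fire, light),
# so each gets probability 1/3.
print(rule_prob("NN", ("sky",)))
```

In a real setting the counts would be pooled over tens of thousands of trees rather than one.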
Parameter Estimation

Corpus of parsed sentences:

S1: [S [NP grass] [VP grows]]
S2: [S [NP grass] [VP grows] [AP slowly]]
S3: [S [NP grass] [VP grows] [AP fast]]
S4: [S [NP bananas] [VP grow]]

Compute PCFG probabilities:

r    Rule             α    P(r | α)
r1   S → NP VP        S    2/4
r2   S → NP VP AP     S    2/4
r3   NP → grass       NP   3/4
r4   NP → bananas     NP   1/4
r5   VP → grows       VP   3/4
r6   VP → grow        VP   1/4
r7   AP → fast        AP   1/2
r8   AP → slowly      AP   1/2

With these parameters (rule probabilities), we can now compute the probabilities of the four sentences S1–S4:

P(S1) = P(r1|S) P(r3|NP) P(r5|VP) = 2/4 · 3/4 · 3/4 = 0.28125
P(S2) = P(r2|S) P(r3|NP) P(r5|VP) P(r8|AP) = 2/4 · 3/4 · 3/4 · 1/2 = 0.140625
P(S3) = P(r2|S) P(r3|NP) P(r5|VP) P(r7|AP) = 2/4 · 3/4 · 3/4 · 1/2 = 0.140625
P(S4) = P(r1|S) P(r4|NP) P(r6|VP) = 2/4 · 1/4 · 1/4 = 0.03125

What if we don't have a treebank but we do have a (non-probabilistic) parser?

1 Take a CFG and set all rules to have equal probability.
2 Parse the corpus with the CFG.
3 Adjust the probabilities.
4 Repeat steps two and three until the probabilities converge.

This is the Inside-Outside algorithm (Baker, 1979), a type of Expectation Maximisation algorithm. It can also be used to induce a grammar, but only with limited success.

Problems with Standard PCFGs

While standard PCFGs are useful for a number of applications, they can produce the wrong result when used to choose the correct parse for an ambiguous sentence. How can that be?

1 The rules in a PCFG are assumed to be independent of one another.
2 They ignore lexical information until the very end of the analysis, when word classes are rewritten as word tokens.

How can this lead to the wrong choice among possible parses?
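The S1–S4 computation above is just a product of rule probabilities, which a short Python sketch makes concrete (the dict-of-rules representation is an assumption for illustration, not the slides' notation):

```python
# Toy PCFG read off the grass/bananas corpus: each rule maps to its
# relative-frequency probability P(rule | LHS).
pcfg = {
    ("S",  ("NP", "VP")):       2/4,   # r1
    ("S",  ("NP", "VP", "AP")): 2/4,   # r2
    ("NP", ("grass",)):         3/4,   # r3
    ("NP", ("bananas",)):       1/4,   # r4
    ("VP", ("grows",)):         3/4,   # r5
    ("VP", ("grow",)):          1/4,   # r6
    ("AP", ("fast",)):          1/2,   # r7
    ("AP", ("slowly",)):        1/2,   # r8
}

def tree_prob(rules):
    """Probability of a parse = product of the probabilities of its rules."""
    p = 1.0
    for rule in rules:
        p *= pcfg[rule]
    return p

s1 = [("S", ("NP", "VP")), ("NP", ("grass",)), ("VP", ("grows",))]
s2 = [("S", ("NP", "VP", "AP")), ("NP", ("grass",)),
      ("VP", ("grows",)), ("AP", ("slowly",))]
s4 = [("S", ("NP", "VP")), ("NP", ("bananas",)), ("VP", ("grow",))]

print(tree_prob(s1))  # 0.28125
print(tree_prob(s4))  # 0.03125
```

Note that S2, which ends in "slowly", picks up the factor for AP → slowly; swapping it for AP → fast would give the probability of S3 instead (both happen to be 1/2 here).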
Problem 1: Assuming Independence

S → NP VP       NP → PRO
VP → VBD NP     NP → DT NOM

By definition, a CFG assumes that the expansion of non-terminals is completely independent. It doesn't matter:

- where a non-terminal is in the analysis;
- what else is (or isn't) in the analysis.

The same assumption holds for standard PCFGs: the probability of a rule is the same no matter where it is applied in the analysis, and regardless of what else is (or isn't) in the analysis. But this assumption is too simple!

For a sentence such as "They wrote them", the rules above give NP → PRO the same probability in subject position ("They") and in object position ("them"): both trees use the same rewrite rules, and the probability calculations do not depend on where a rule is used.

But in a speech corpus, 91% of 31021 subject NPs are pronouns:

(1) a. She's able to take her baby to work with her.
    b. My wife worked until we had a family.

while only 34% of 7489 object NPs are pronouns:

(2) a. Some laws absolutely prohibit it.
    b. It wasn't clear how NL and Mr. Simmons would respond if Georgia Gulf spurns them again.

So the probability of NP → PRO should depend on where in the analysis it applies (e.g., subject or object position).

Problem 2: Ignoring Lexical Information

S → NP VP                   N → queen | bin
NP → NNS | NN               NNS → workers | sacks | cars
VP → VBD NP | VBD NP PP     V → dumped | repaired
PP → P NP                   DT → a | the
NP → DT NN                  P → into | of

Consider the sentences:

(3) a. Workers dumped sacks into a bin.
    b. Workers repaired cars of the queen.

Because rules for rewriting non-terminals ignore word tokens until the very end, let's consider these simply as strings of POS tags; both sentences reduce to the same string:

(4) a. NNS V NNS P DT N
    b. NNS V NNS P DT N
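One standard remedy for the subject/object asymmetry, parent annotation (splitting NP by the category of its parent, roughly as in Johnson 1998; an addition here, not something the slides spell out), gives the pronoun rule different probabilities in the two positions. A sketch using the corpus counts quoted above:

```python
# Split NP into NP^S (subject, parent is S) and NP^VP (object, parent
# is VP) and estimate NP -> PRO separately for each, from the counts
# quoted above: 91% of 31021 subject NPs and 34% of 7489 object NPs
# are pronouns. The round() calls just recover whole counts from the
# quoted percentages.
subj_total, subj_pro = 31021, round(0.91 * 31021)
obj_total,  obj_pro  = 7489,  round(0.34 * 7489)

p_pro_given_subj = subj_pro / subj_total   # P(NP^S  -> PRO | NP^S)
p_pro_given_obj  = obj_pro / obj_total     # P(NP^VP -> PRO | NP^VP)

print(round(p_pro_given_subj, 2))  # 0.91
print(round(p_pro_given_obj, 2))   # 0.34
```

A plain PCFG would collapse these into a single P(NP → PRO | NP), losing a nearly three-to-one preference that matters when ranking parses.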
Problem 2: Ignoring Lexical Information

The grammar licenses two parses for this tag string: one attaches the PP to the VP,

[S [NP NNS] [VP VBD [NP NNS] [PP P [NP DT N]]]]

and one attaches the PP inside the object NP:

[S [NP NNS] [VP VBD [NP NNS [PP P [NP DT N]]]]]

Which do we want for "Workers dumped sacks into a bin"? Which for "Workers repaired cars of the queen"? The most appropriate analysis depends, in part, on the actual words.

Lexicalized PCFGs

A PCFG can be lexicalised by associating a word and a part-of-speech tag with every non-terminal in the grammar. It is head-lexicalised if the word is the head of the constituent described by the non-terminal. Each non-terminal has a head that determines the syntactic properties of the phrase (e.g., which other phrases it can combine with).

Example
Noun Phrase (NP): Noun
Adjective Phrase (AP): Adjective
Verb Phrase (VP): Verb
Prepositional Phrase (PP): Preposition

Lexicalization

We can lexicalize a PCFG by annotating each non-terminal with its head word, starting with the terminals. For example, in the tree

[TOP [S [NP [NNS workers]]
        [VP [VBD dumped]
            [NP [NNS sacks]]
            [PP [P into] [NP [DT a] [NN bin]]]]]]

rules such as

VP → VBD NP PP    VP → VBD NP
NP → DT NN        NP → NNS
PP → P NP

are replaced with rules of the form

VP(dumped) → V(dumped) NP(sacks) PP(into)
VP(repaired) → V(repaired) NP(cars) PP(of)
VP(dumped) → V(dumped) NP(sacks)
VP(repaired) → V(repaired) NP(cars)
NP(queen) → DT(the) NN(queen)
PP(into) → P(into) NP(bins)
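Head annotation can be sketched as a recursive percolation of head words up the tree. The head table below is a toy assumption (one head category per phrase, matching the NP/VP/PP examples above), not the full head-finding rules used by real lexicalized parsers such as Collins':

```python
# Minimal head-lexicalization sketch: each non-terminal gets the head
# word percolated up from its (assumed) head child.
HEAD_CHILD = {"NP": {"NN", "NNS"}, "VP": {"VBD"}, "PP": {"P"}, "S": {"VP"}}

def lexicalize(node):
    """For tree = (label, children), with POS nodes given as (pos, word),
    return the same tree with every label rewritten as label(headword)."""
    label, children = node
    if isinstance(children, str):          # POS node: head is its own word
        return ("%s(%s)" % (label, children), children)
    lex_children = [lexicalize(c) for c in children]
    head = None
    for (orig_label, _), (lex_label, _) in zip(children, lex_children):
        if orig_label in HEAD_CHILD.get(label, ()):
            # Pull the head word back out of "CHILD(word)".
            head = lex_label.split("(", 1)[1].rstrip(")")
    return ("%s(%s)" % (label, head), lex_children)

tree = ("S", [("NP", [("NNS", "workers")]),
              ("VP", [("VBD", "dumped"),
                      ("NP", [("NNS", "sacks")]),
                      ("PP", [("P", "into"),
                              ("NP", [("DT", "a"), ("NN", "bin")])])])])

print(lexicalize(tree)[0])  # S(dumped)
```

The verb "dumped" percolates from VBD up through VP to S, which is exactly what lets the lexicalized rule VP(dumped) → V(dumped) NP(sacks) PP(into) be scored on the actual words rather than on bare categories.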