Parameter Estimation and Lexicalization for PCFGs Informatics 2A: - PowerPoint PPT Presentation

Parameter Estimation and Lexicalization for PCFGs
Informatics 2A: Lecture 21
John Longley
4 November 2014

1. Standard PCFGs
   - Parameter Estimation
   - Problem 1: Assuming Independence
   - Problem 2: Ignoring Lexical Information
2. Lexicalized PCFGs
   - Lexicalization
   - Head Lexicalization

Reading: J&M 2nd edition, ch. 14.2–14.6; NLTK Book, Chapter 8, final section on Weighted Grammar.

Clicker Question

S → NP VP (1.0)
NP → DET N (0.7)
NP → NPR (0.3)
VP → V PP (0.7)
VP → V NP (0.3)
PP → Prep NP (1.0)
NPR → John (0.5)
NPR → Mary (0.5)
V → saw (0.4)
V → loves (0.6)
DET → a (1.0)
N → cat (0.6)
N → saw (0.4)

What is the probability of the sentence "John saw a saw"?
1. 0.02
2. 0.00016
3. 0.00504
4. 0.0002
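The question can be checked by hand. No rule rewrites Prep as a word, so VP → V PP can never cover "saw a saw"; the sentence therefore has exactly one parse, and its probability is the product of the eight rules that parse uses: 1.0 · 0.3 · 0.5 · 0.3 · 0.4 · 0.7 · 1.0 · 0.4 = 0.00504 (option 3). A minimal Python sketch of the same computation, assuming trees are written as nested tuples `(label, child, ...)` with words as plain strings (a representation chosen for illustration, not taken from the lecture):

```python
from math import prod

# Rule probabilities of the clicker-question grammar: (lhs, rhs) -> probability
GRAMMAR = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("DET", "N")): 0.7,
    ("NP", ("NPR",)): 0.3,
    ("VP", ("V", "PP")): 0.7,
    ("VP", ("V", "NP")): 0.3,
    ("PP", ("Prep", "NP")): 1.0,
    ("NPR", ("John",)): 0.5,
    ("NPR", ("Mary",)): 0.5,
    ("V", ("saw",)): 0.4,
    ("V", ("loves",)): 0.6,
    ("DET", ("a",)): 1.0,
    ("N", ("cat",)): 0.6,
    ("N", ("saw",)): 0.4,
}

def rules_used(tree):
    """Yield the (lhs, rhs) rule at every node of a tree written as nested
    tuples (label, child, ...), where a child that is a str is a word."""
    label, *children = tree
    yield label, tuple(c if isinstance(c, str) else c[0] for c in children)
    for c in children:
        if not isinstance(c, str):
            yield from rules_used(c)

def tree_prob(tree):
    # A PCFG scores a parse by multiplying the probabilities of all rules it uses
    return prod(GRAMMAR[rule] for rule in rules_used(tree))

# The only parse of "John saw a saw"
parse = ("S",
         ("NP", ("NPR", "John")),
         ("VP", ("V", "saw"),
                ("NP", ("DET", "a"), ("N", "saw"))))
print(round(tree_prob(parse), 5))
```

Because the sentence has a single parse, the sentence probability and the parse probability coincide here; with several parses, the sentence probability would be their sum.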

Parameter Estimation

In a PCFG every rule is associated with a probability. But where do these rule probabilities come from?

Use a large parsed corpus such as the Penn Treebank:

( (S (NP-SBJ (DT That) (JJ cold) (, ,)
             (JJ empty) (NN sky) )
     (VP (VBD was)
         (ADJP-PRD (JJ full)
                   (PP (IN of)
                       (NP (NN fire) (CC and) (NN light) ))))
     (. .) ))

Rules read off this tree include:
S → NP-SBJ VP
VP → VBD ADJP-PRD
PP → IN NP
NP → NN CC NN
etc.
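Reading rules off a tree can be sketched in Python with a small hand-written reader for the bracket format shown above (NLTK's `Tree.fromstring` and `Tree.productions` do the same job; this sketch avoids the dependency). The helper names are illustrative. One difference from the slide's list: this reader keeps punctuation nodes such as `(. .)` as ordinary children, so the top rule comes out as S → NP-SBJ VP . and preterminal rules such as DT → That are listed too.

```python
import re

TREE = """
( (S (NP-SBJ (DT That) (JJ cold) (, ,) (JJ empty) (NN sky) )
     (VP (VBD was)
         (ADJP-PRD (JJ full)
                   (PP (IN of)
                       (NP (NN fire) (CC and) (NN light) ))))
     (. .) ))
"""

def tokenize(text):
    # Brackets are tokens on their own; everything else is a label or a word
    return re.findall(r"\(|\)|[^\s()]+", text)

def read_tree(tokens, i=0):
    """Read the bracketed node starting at tokens[i] == '('.
    Returns ((label, children), next_index); words are plain strings."""
    assert tokens[i] == "("
    i += 1
    label = ""  # the outer "( (S ...) )" wrapper has no label
    if tokens[i] not in ("(", ")"):
        label, i = tokens[i], i + 1
    children = []
    while tokens[i] != ")":
        if tokens[i] == "(":
            child, i = read_tree(tokens, i)
        else:
            child, i = tokens[i], i + 1
        children.append(child)
    return (label, children), i + 1

def productions(node):
    """Yield (lhs, rhs) for every local tree, top-down and left to right."""
    label, children = node
    if label:  # skip the unlabelled wrapper
        yield label, tuple(c if isinstance(c, str) else c[0] for c in children)
    for child in children:
        if not isinstance(child, str):
            yield from productions(child)

tree, _ = read_tree(tokenize(TREE))
rules = list(productions(tree))
for lhs, rhs in rules:
    print(lhs, "->", " ".join(rhs))
```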

Obtain grammar rules by reading them off the trees. To estimate a rule's probability, divide the number of times LHS → RHS occurs by the number of times LHS occurs:

$$P(\alpha \to \beta \mid \alpha) = \frac{\mathrm{Count}(\alpha \to \beta)}{\sum_{\gamma} \mathrm{Count}(\alpha \to \gamma)} = \frac{\mathrm{Count}(\alpha \to \beta)}{\mathrm{Count}(\alpha)}$$

Corpus of parsed sentences:

S1: [S [NP grass] [VP grows]]
S2: [S [NP grass] [VP grows] [AP slowly]]
S3: [S [NP grass] [VP grows] [AP fast]]
S4: [S [NP bananas] [VP grow]]

Compute PCFG probabilities:

r    Rule              α     P(r | α)
r1   S → NP VP         S     2/4
r2   S → NP VP AP      S     2/4
r3   NP → grass        NP    3/4
r4   NP → bananas      NP    1/4
r5   VP → grows        VP    3/4
r6   VP → grow         VP    1/4
r7   AP → fast         AP    1/2
r8   AP → slowly       AP    1/2
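The relative-frequency estimate can be sketched in Python, applied to the four sentences above. As before, trees are written as nested tuples `(label, child, ...)` with words as strings; this representation and the helper names are illustrative assumptions. `Fraction` keeps the results exact, though it prints 2/4 in lowest terms as 1/2.

```python
from collections import Counter
from fractions import Fraction

# The parsed sentences S1-S4 as nested tuples (label, child, ...)
CORPUS = [
    ("S", ("NP", "grass"), ("VP", "grows")),
    ("S", ("NP", "grass"), ("VP", "grows"), ("AP", "slowly")),
    ("S", ("NP", "grass"), ("VP", "grows"), ("AP", "fast")),
    ("S", ("NP", "bananas"), ("VP", "grow")),
]

def rules_used(tree):
    """Yield (lhs, rhs) for every rule used in the tree; words are strings."""
    label, *children = tree
    yield label, tuple(c if isinstance(c, str) else c[0] for c in children)
    for c in children:
        if not isinstance(c, str):
            yield from rules_used(c)

def estimate(treebank):
    """Relative frequency: P(lhs -> rhs | lhs) = Count(lhs -> rhs) / Count(lhs)."""
    rule_counts = Counter(r for tree in treebank for r in rules_used(tree))
    lhs_counts = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n  # Count(lhs) = sum over all rhs of Count(lhs -> rhs)
    return {rule: Fraction(n, lhs_counts[rule[0]]) for rule, n in rule_counts.items()}

P = estimate(CORPUS)
for (lhs, rhs), p in P.items():
    print(f"{lhs} -> {' '.join(rhs)}: {p}")
```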

With these parameters (rule probabilities), we can now compute the probabilities of the four sentences S1–S4:

P(S1) = P(r1 | S) P(r3 | NP) P(r5 | VP) = 2/4 · 3/4 · 3/4 = 0.28125
P(S2) = P(r2 | S) P(r3 | NP) P(r5 | VP) P(r8 | AP) = 2/4 · 3/4 · 3/4 · 1/2 = 0.140625
P(S3) = P(r2 | S) P(r3 | NP) P(r5 | VP) P(r7 | AP) = 2/4 · 3/4 · 3/4 · 1/2 = 0.140625
P(S4) = P(r1 | S) P(r4 | NP) P(r6 | VP) = 2/4 · 1/4 · 1/4 = 0.03125
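The arithmetic can be checked exactly with Python's `fractions` module. Each derivation lists the rule names r1–r8 from the table; since S2 contains [AP slowly], its derivation uses r8, while S3's [AP fast] uses r7. The dictionary layout is just one convenient encoding.

```python
from fractions import Fraction
from math import prod

# Rule probabilities r1..r8 from the estimated table
P = {
    "r1": Fraction(2, 4), "r2": Fraction(2, 4),
    "r3": Fraction(3, 4), "r4": Fraction(1, 4),
    "r5": Fraction(3, 4), "r6": Fraction(1, 4),
    "r7": Fraction(1, 2), "r8": Fraction(1, 2),
}

# The rules used by each sentence's (single) parse
DERIVATIONS = {
    "S1": ["r1", "r3", "r5"],
    "S2": ["r2", "r3", "r5", "r8"],  # [AP slowly] -> r8
    "S3": ["r2", "r3", "r5", "r7"],  # [AP fast] -> r7
    "S4": ["r1", "r4", "r6"],
}

# A sentence's probability is the product of its rule probabilities
probs = {s: prod(P[r] for r in rules) for s, rules in DERIVATIONS.items()}
for s, p in probs.items():
    print(s, p, float(p))
```

The four probabilities sum to 19/32 rather than 1, because the same grammar also generates sentences that are not in the corpus, such as "bananas grow fast".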

What if we don't have a treebank, but we do have an unparsed corpus and a (non-probabilistic) parser?
1. Take a CFG and set all rules to have equal probability (among the rules sharing a left-hand side).
2. Parse the (flat) corpus with the CFG.
3. Adjust the probabilities, re-estimating each rule from how often it is expected to be used across the parses found.
4. Repeat steps 2 and 3 until the probabilities converge.

This is the inside-outside algorithm (Baker, 1979), a type of Expectation Maximisation algorithm. It can also be used to induce a grammar, but only with limited success.
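The loop above can be sketched in Python under a simplifying assumption: each sentence comes with the full list of candidate parses the non-probabilistic parser returned, each parse written as a list of rule names. The E-step weights every parse by its posterior probability under the current rules and accumulates expected rule counts; the M-step re-estimates by relative frequency of those counts. This brute-force enumeration illustrates EM only: the inside-outside algorithm computes the same expected counts by dynamic programming over the parse chart, without listing parses. The rules and the three-sentence corpus are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical rules, mapped to their left-hand side
RULES = {
    "VP -> V NP": "VP",
    "VP -> V NP PP": "VP",
    "NP -> N": "NP",
    "NP -> NP PP": "NP",
}

# Hypothetical corpus: each sentence is the list of its candidate parses,
# each parse the list of rules it uses (rules shared by all parses omitted)
CORPUS = [
    [["VP -> V NP", "NP -> N"]],                              # unambiguous
    [["VP -> V NP", "NP -> N"]],                              # unambiguous
    [["VP -> V NP PP", "NP -> N", "NP -> N"],                 # PP attached to VP
     ["VP -> V NP", "NP -> NP PP", "NP -> N", "NP -> N"]],    # PP attached to NP
]

def em(corpus, rules, max_iter=100, tol=1e-9):
    # Step 1: equal probability for all rules sharing a left-hand side
    n_alternatives = defaultdict(int)
    for lhs in rules.values():
        n_alternatives[lhs] += 1
    prob = {r: 1.0 / n_alternatives[lhs] for r, lhs in rules.items()}
    history = []  # corpus log-likelihood under each successive estimate
    for _ in range(max_iter):
        # E-step: expected rule counts, weighting each parse by its posterior
        expected = defaultdict(float)
        loglik = 0.0
        for parses in corpus:
            scores = [math.prod(prob[r] for r in parse) for parse in parses]
            total = sum(scores)
            loglik += math.log(total)
            for parse, score in zip(parses, scores):
                for r in parse:
                    expected[r] += score / total
        history.append(loglik)
        # M-step: relative frequency of the expected counts, per left-hand side
        lhs_total = defaultdict(float)
        for r, lhs in rules.items():
            lhs_total[lhs] += expected[r]
        new_prob = {r: expected[r] / lhs_total[lhs] if lhs_total[lhs] > 0 else prob[r]
                    for r, lhs in rules.items()}
        change = max(abs(new_prob[r] - prob[r]) for r in rules)
        prob = new_prob
        if change < tol:  # Step 4: stop once the probabilities converge
            break
    return prob, history

prob, history = em(CORPUS, RULES)
for rule, p in prob.items():
    print(f"{rule}: {p:.4f}")
```

On this toy corpus the NP-attachment parse of the third sentence carries one more rule factor than the VP-attachment parse, so each iteration shifts weight toward VP attachment: the probability of NP -> NP PP heads toward 0, and the log-likelihood in `history` never decreases, as EM guarantees.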

Problems with Standard PCFGs

While standard PCFGs are already useful for some purposes, they can produce poor results when used for disambiguation. Why is that?
1. They assume the rule choices are independent of one another.
2. They ignore lexical information until the very end of the analysis, when word classes are rewritten to word tokens.

How can this lead to bad choices among possible parses?

Problem 1: Assuming Independence

By definition, a CFG assumes that the expansion of non-terminals is completely independent. It doesn't matter:
- where a non-terminal is in the analysis;
- what else is (or isn't) in the analysis.

The same assumption holds for standard PCFGs. The probability of a rule is the same, no matter:
- where it is applied in the analysis;
- what else is (or isn't) in the analysis.

But this assumption is too simple!

Rules:
S → NP VP
NP → PRO
VP → VBD NP
NP → DT NOM

The above rules assign the same probability to both of these trees, because they use the same rewrite rules, and probability calculations do not depend on where rules are used:

Tree 1 (object NP rewritten as PRO): [S NP [VP [VBD wrote] [NP [PRO them]]]]
Tree 2 (subject NP rewritten as PRO): [S [NP [PRO They]] [VP [VBD wrote] NP]]
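To see that the calculation ignores position, the Python sketch below collects the non-lexical rules each tree uses (trees as nested tuples, with a bare `("NP",)` standing for an unexpanded NP; this encoding is an assumption made for illustration). Lexical rules such as PRO → them and PRO → They are left out, as in the slide's rule list. The two multisets of rules are identical, so the product of rule probabilities is identical whatever probabilities are chosen; the values used here are hypothetical, since the slide gives none.

```python
from collections import Counter
from fractions import Fraction
from math import prod

def nonlexical_rules(tree):
    """Yield the non-lexical rules (lhs, rhs) in a tree of nested tuples.
    A node is (label, child, ...); a word is a str; a bare ("NP",) is an
    unexpanded non-terminal. Preterminal -> word rules are skipped."""
    label, *children = tree
    if children and all(isinstance(c, tuple) for c in children):
        yield label, tuple(c[0] for c in children)
        for c in children:
            yield from nonlexical_rules(c)

tree1 = ("S", ("NP",), ("VP", ("VBD", "wrote"), ("NP", ("PRO", "them"))))
tree2 = ("S", ("NP", ("PRO", "They")), ("VP", ("VBD", "wrote"), ("NP",)))

# Hypothetical rule probabilities, for illustration only
P = {
    ("S", ("NP", "VP")): Fraction(1),
    ("NP", ("PRO",)): Fraction(3, 10),
    ("NP", ("DT", "NOM")): Fraction(7, 10),
    ("VP", ("VBD", "NP")): Fraction(1),
}

rules1 = Counter(nonlexical_rules(tree1))
rules2 = Counter(nonlexical_rules(tree2))
print(rules1 == rules2)
# Same rules, same counts: the products must agree, subject or object position alike
print(prod(P[r] for r in rules1.elements()), prod(P[r] for r in rules2.elements()))
```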
