Lecture 18: PCFG Parsing


  1. CS447: Natural Language Processing (http://courses.engr.illinois.edu/cs447). Lecture 18: PCFG Parsing. Julia Hockenmaier, juliahmr@illinois.edu, 3324 Siebel Center

  2. Where we’re at
Previous lecture: Standard CKY (for non-probabilistic CFGs)
The standard CKY algorithm finds all possible parse trees τ for a sentence S = w(1)…w(n) under a CFG G in Chomsky Normal Form.
Today’s lecture: Probabilistic Context-Free Grammars (PCFGs)
– CFGs in which each rule is associated with a probability
CKY for PCFGs (Viterbi):
– CKY for PCFGs finds the most likely parse tree τ* = argmax_τ P(τ | S) for the sentence S under a PCFG.

  3. Previous Lecture: CKY for CFGs

  4. CKY: filling the chart
[Figure: the CKY parse chart is filled cell by cell, proceeding bottom-up over increasingly long spans wi…wj of the input w1…wn.]

  5. CKY: filling one cell
[Figure: filling a single cell, chart[2][6], for the input w1 w2 w3 w4 w5 w6 w7: the entries for the span w2…w6 are built by combining the entries of each pair of cells that splits this span.]

  6. CKY for standard CFGs
CKY is a bottom-up chart parsing algorithm that finds all possible parse trees τ for a sentence S = w(1)…w(n) under a CFG G in Chomsky Normal Form (CNF).
– CNF: G has two types of rules, X → Y Z and X → w (X, Y, Z are nonterminals, w is a terminal)
– CKY is a dynamic programming algorithm
– The parse chart is an n × n upper triangular matrix: each cell chart[i][j] (i ≤ j) stores all subtrees for w(i)…w(j)
– Each cell chart[i][j] has at most one entry for each nonterminal X (plus backpointers to each pair of (Y, Z) entries in cells chart[i][k] and chart[k+1][j] from which an X can be formed)
– Time complexity: O(n³ · |G|)
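
To make this recurrence concrete, here is a minimal Python sketch of a CKY recognizer for a grammar in CNF. The dict-based grammar representation (lexical_rules, binary_rules) is an assumption for illustration, not the course's implementation; a full parser would also store the backpointers described above.

```python
from collections import defaultdict

def cky_recognize(words, lexical_rules, binary_rules):
    """Return, for each span (i, j) (0-based, inclusive), the set of nonterminals
    that can derive words[i..j].
    lexical_rules: dict mapping a terminal w to the set of X with X -> w
    binary_rules:  dict mapping a pair (Y, Z) to the set of X with X -> Y Z
    """
    n = len(words)
    chart = defaultdict(set)

    # Diagonal: lexical rules X -> w_i.
    for i, w in enumerate(words):
        chart[(i, i)] |= lexical_rules.get(w, set())

    # Longer spans: combine every pair of adjacent sub-spans.
    for length in range(2, n + 1):              # span length
        for i in range(0, n - length + 1):      # span start
            j = i + length - 1                  # span end
            for k in range(i, j):               # split point: [i, k] + [k+1, j]
                for Y in chart[(i, k)]:
                    for Z in chart[(k + 1, j)]:
                        chart[(i, j)] |= binary_rules.get((Y, Z), set())
    return chart
```

The sentence is in the language of the grammar iff the start symbol appears in chart[(0, n-1)].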

  7. Dealing with ambiguity: Probabilistic Context-Free Grammars (PCFGs)

  8. Grammars are ambiguous
A grammar might generate multiple trees for a sentence.
[Figure: two parses each for “eat sushi with tuna” and “eat sushi with chopsticks”, attaching the PP either to the NP (“sushi with tuna”) or to the VP (“eat … with chopsticks”); for each sentence, one attachment is the correct analysis and the other is incorrect.]
What’s the most likely parse τ for sentence S?
We need a model of P(τ | S)

  9. Computing P(τ | S)
Using Bayes’ Rule:
argmax_τ P(τ | S) = argmax_τ P(τ, S) / P(S)
                  = argmax_τ P(τ, S)
                  = argmax_τ P(τ)        if S = yield(τ)
The yield of a tree is the string of terminal symbols that can be read off the leaf nodes:
yield( [VP [V eat] [NP [NP sushi] [PP [P with] [NP tuna]]]] ) = eat sushi with tuna

  10. Computing P(τ)
T is the (infinite) set of all trees in the language:
L = { s ∈ Σ* | ∃ τ ∈ T : yield(τ) = s }
We need to define P(τ) such that:
∀ τ ∈ T : 0 ≤ P(τ) ≤ 1
∑_{τ ∈ T} P(τ) = 1
The set T is generated by a context-free grammar:
S → NP VP      VP → Verb NP      NP → Det Noun
S → S conj S   VP → VP PP        NP → NP PP
S → …          VP → …            NP → …

  11. Probabilistic Context-Free Grammars
For every nonterminal X, define a probability distribution P(X → α | X) over all rules with the same LHS symbol X:
S  → NP VP         0.8
S  → S conj S      0.2
NP → Noun          0.2
NP → Det Noun      0.4
NP → NP PP         0.2
NP → NP conj NP    0.2
VP → Verb          0.4
VP → Verb NP       0.3
VP → Verb NP NP    0.1
VP → VP PP         0.2
PP → P NP          1.0
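
As a concrete check of this “one distribution per LHS” requirement, here is a small Python sketch; the (LHS, RHS) → probability dict is an assumed representation for illustration only.

```python
from collections import defaultdict

# The toy PCFG above, encoded as (LHS, RHS) -> probability.
pcfg = {
    ("S",  ("NP", "VP")):          0.8,
    ("S",  ("S", "conj", "S")):    0.2,
    ("NP", ("Noun",)):             0.2,
    ("NP", ("Det", "Noun")):       0.4,
    ("NP", ("NP", "PP")):          0.2,
    ("NP", ("NP", "conj", "NP")):  0.2,
    ("VP", ("Verb",)):             0.4,
    ("VP", ("Verb", "NP")):        0.3,
    ("VP", ("Verb", "NP", "NP")):  0.1,
    ("VP", ("VP", "PP")):          0.2,
    ("PP", ("P", "NP")):           1.0,
}

# The rules sharing an LHS must form a probability distribution over that LHS.
totals = defaultdict(float)
for (lhs, rhs), p in pcfg.items():
    totals[lhs] += p
for lhs, total in totals.items():
    assert abs(total - 1.0) < 1e-9, f"rules for {lhs} sum to {total}, not 1"
```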

  12. Computing P(τ) with a PCFG
The probability of a tree τ is the product of the probabilities of all its rules.
Example: for the tree
[S [NP [Noun John]] [VP [VP [Verb eats] [NP [Noun pie]]] [PP [P with] [NP [Noun cream]]]]]
under the grammar above (S → NP VP, VP → Verb NP, VP → VP PP, PP → P NP, and three uses of NP → Noun):
P(τ) = 0.8 × 0.3 × 0.2 × 1.0 × 0.2³ = 0.000384
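
A quick sanity check of this product, as a sketch; the list-of-rules encoding of the tree is an assumption, and a real implementation would walk an actual tree structure.

```python
import math

# Rules used in the example tree for "John eats pie with cream",
# paired with their probabilities from the toy PCFG above.
rules_in_tree = [
    ("S  -> NP VP",   0.8),
    ("NP -> Noun",    0.2),   # John
    ("VP -> VP PP",   0.2),
    ("VP -> Verb NP", 0.3),   # eats pie
    ("NP -> Noun",    0.2),   # pie
    ("PP -> P NP",    1.0),   # with cream
    ("NP -> Noun",    0.2),   # cream
]

p_tree = math.prod(p for _, p in rules_in_tree)
print(p_tree)   # 0.000384 (up to floating-point rounding)
```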

  13. Learning the parameters of a PCFG
If we have a treebank (a corpus in which each sentence is associated with a parse tree), we can just count the number of times each rule appears, e.g.:
S → NP VP      (count = 1000)
S → S conj S   (count = 220)
and then divide the observed frequency of each rule X → Y Z by the sum of the frequencies of all rules with the same LHS X to turn these counts into probabilities:
S → NP VP      (p = 1000/1220)
S → S conj S   (p = 220/1220)
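
A minimal sketch of this relative-frequency (MLE) estimation, assuming the treebank has already been reduced to a list of (LHS, RHS) rule occurrences extracted from its trees (that extraction step is omitted here):

```python
from collections import Counter, defaultdict

def estimate_pcfg(rule_occurrences):
    """rule_occurrences: iterable of (lhs, rhs) pairs, one per rule use in the treebank.
    Returns {(lhs, rhs): P(lhs -> rhs | lhs)} by relative frequency."""
    rule_counts = Counter(rule_occurrences)
    lhs_counts = defaultdict(int)
    for (lhs, _rhs), count in rule_counts.items():
        lhs_counts[lhs] += count
    return {(lhs, rhs): count / lhs_counts[lhs]
            for (lhs, rhs), count in rule_counts.items()}

# With the counts from the slide:
occurrences = [("S", ("NP", "VP"))] * 1000 + [("S", ("S", "conj", "S"))] * 220
probs = estimate_pcfg(occurrences)
print(probs[("S", ("NP", "VP"))])        # 1000/1220 ≈ 0.82
print(probs[("S", ("S", "conj", "S"))])  # 220/1220 ≈ 0.18
```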

  14. More on probabilities
Computing P(s): if P(τ) is the probability of a tree τ, the probability of a sentence s is the sum of the probabilities of all its parse trees:
P(s) = ∑_{τ : yield(τ) = s} P(τ)
How do we know that P(L) = ∑_τ P(τ) = 1?
If we have learned the PCFG from a corpus via MLE, this is guaranteed to be the case.
If we just set the probabilities by hand, we could run into trouble, as in the following example:
S → S S   (0.9)
S → w     (0.1)
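
To see the trouble concretely: under this hand-set grammar, the probability x that S eventually yields a finite tree satisfies x = 0.1 + 0.9·x² (either rewrite to w, or rewrite to S S and have both children terminate). Its smallest non-negative solution is 1/9 ≈ 0.11, so ∑_τ P(τ) < 1. A short sketch that finds this fixed point numerically:

```python
# Probability that S yields a finite tree under S -> S S (0.9), S -> w (0.1):
# the smallest non-negative solution of x = 0.1 + 0.9 * x**2.
x = 0.0
for _ in range(10_000):      # iterating from 0 converges to the smallest fixed point
    x = 0.1 + 0.9 * x * x
print(x)                     # ~0.1111 = 1/9, so the tree probabilities sum to less than 1
```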

  15. PCFG parsing (decoding): Probabilistic CKY

  16. Probabilistic CKY: Viterbi
Like standard CKY, but with probabilities. Finding the most likely tree is similar to Viterbi for HMMs:
Initialization:
– [optional] Every chart entry that corresponds to a terminal (entry w in cell[i][i]) has a Viterbi probability P_VIT(w[i][i]) = 1 (*)
– Every entry for a non-terminal X in cell[i][i] has Viterbi probability P_VIT(X[i][i]) = P(X → w | X) [and a single backpointer to w[i][i] (*)]
Recurrence: for every entry that corresponds to a non-terminal X in cell[i][j], keep only the highest-scoring pair of backpointers to a pair of children (Y in cell[i][k] and Z in cell[k+1][j]):
P_VIT(X[i][j]) = max_{Y,Z,k} P_VIT(Y[i][k]) × P_VIT(Z[k+1][j]) × P(X → Y Z | X)
Final step: return the Viterbi parse for the start symbol S in the top cell[1][n].
(*) This is unnecessary for simple PCFGs, but can be helpful for more complex probability models.
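
A minimal sketch of this Viterbi recurrence, assuming a CNF PCFG given as nested dicts lexical_probs (X → w) and binary_probs (X → Y Z); the representation and the 0-based indexing are illustrative assumptions, not the course's code.

```python
from collections import defaultdict

def viterbi_cky(words, lexical_probs, binary_probs, start="S"):
    """lexical_probs[X][w]    = P(X -> w | X)
    binary_probs[X][(Y, Z)]   = P(X -> Y Z | X)
    Returns (Viterbi probability, Viterbi tree) for `start` over the whole
    sentence, or (0.0, None) if no parse exists."""
    n = len(words)
    prob = defaultdict(dict)   # prob[(i, j)][X] = best probability of X over words[i..j]
    back = defaultdict(dict)   # back[(i, j)][X] = tree (nested tuples) achieving that probability

    # Initialization: lexical rules X -> w_i on the diagonal.
    for i, w in enumerate(words):
        for X, rules in lexical_probs.items():
            if w in rules:
                prob[(i, i)][X] = rules[w]
                back[(i, i)][X] = (X, w)

    # Recurrence: keep only the highest-scoring (Y, Z, k) for each X in each cell.
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length - 1
            for k in range(i, j):
                for X, rules in binary_probs.items():
                    for (Y, Z), p_rule in rules.items():
                        if Y in prob[(i, k)] and Z in prob[(k + 1, j)]:
                            p = prob[(i, k)][Y] * prob[(k + 1, j)][Z] * p_rule
                            if p > prob[(i, j)].get(X, 0.0):
                                prob[(i, j)][X] = p
                                back[(i, j)][X] = (X, back[(i, k)][Y], back[(k + 1, j)][Z])

    # Final step: the Viterbi parse for the start symbol in the top cell.
    top = (0, n - 1)
    if start not in prob[top]:
        return 0.0, None
    return prob[top][start], back[top][start]
```

Note that this sketch covers only lexical and binary rules; unary rules such as NP → Noun or VP → Verb in the toy grammar would have to be folded into the lexical probabilities or handled by an extra unary pass.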

  17. Probabilistic CKY
Input: the POS-tagged sentence John_N eats_V pie_N with_P cream_N
Grammar: the toy PCFG above, extended with the preterminal rules Noun → N 1.0, Verb → V 1.0, Prep → P 1.0.
[Figure: the filled Viterbi chart for this sentence. Each cell stores the best probability for each nonterminal, e.g. P_VIT(VP) = 0.3 × 1.0 × 0.2 = 0.06 for the span “eats pie”, P_VIT(PP) = 1.0 × 1.0 × 0.2 = 0.2 for “with cream”, and P_VIT(NP) = 0.2 × 0.2 × 0.2 = 0.008 for “pie with cream”; the top cell holds the Viterbi probability of S for the whole sentence.]

  18. How do we handle flat rules?
Binarize each flat rule by adding dummy nonterminals (e.g. ConjS), and set the probability of the rule with the dummy nonterminal on the LHS to 1. For example, S → S conj S (0.2) becomes:
S → S ConjS      0.2
ConjS → conj S   1.0
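
A sketch of this binarization for rules with more than two symbols on the RHS, using the (LHS, RHS) → probability dict from earlier; the naming scheme for the dummy nonterminals differs from the slide's ConjS and is just one illustrative choice.

```python
def binarize(pcfg):
    """Binarize rules X -> A B C ... by introducing dummy nonterminals.
    The first new rule keeps the original probability; every rule whose LHS
    is a dummy nonterminal gets probability 1."""
    binarized = {}
    for (lhs, rhs), p in pcfg.items():
        while len(rhs) > 2:
            # X -> A B C ...  becomes  X -> A X_B_C... (p)  and  X_B_C... -> B C ... (1.0)
            dummy = lhs + "_" + "_".join(rhs[1:])
            binarized[(lhs, (rhs[0], dummy))] = p
            lhs, rhs, p = dummy, rhs[1:], 1.0
        binarized[(lhs, rhs)] = p
    return binarized

# Example: S -> S conj S (0.2) becomes S -> S S_conj_S (0.2) and S_conj_S -> conj S (1.0).
print(binarize({("S", ("S", "conj", "S")): 0.2}))
```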

  19. Parser evaluation

  20. Precision and recall
Precision and recall were originally developed as evaluation metrics for information retrieval:
- Precision: What percentage of retrieved documents are relevant to the query?
- Recall: What percentage of relevant documents were retrieved?
In NLP, they are often used in addition to accuracy:
- Precision: What percentage of items that were assigned label X actually have label X in the test data?
- Recall: What percentage of items that have label X in the test data were assigned label X by the system?
They are particularly useful when there are more than two labels.

  21. True vs. false positives, false negatives
[Figure: two overlapping sets, the items labeled X by the system (= TP + FP) and the items labeled X in the gold standard, the ‘truth’ (= TP + FN); their intersection is the true positives (TP), the system-only region the false positives (FP), and the gold-only region the false negatives (FN).]
- True positives: items that were labeled X by the system, and should be labeled X.
- False positives: items that were labeled X by the system, but should not be labeled X.
- False negatives: items that were not labeled X by the system, but should be labeled X.
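
Putting the last two slides together, precision and recall follow directly from these counts. A minimal sketch; the function and variable names are illustrative.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP): of the items the system labeled X, how many are correct.
    Recall    = TP / (TP + FN): of the items that should be labeled X, how many the system found."""
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall

# Example: 80 true positives, 20 false positives, 40 false negatives.
print(precision_recall(80, 20, 40))   # (0.8, 0.666...)
```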
