CS447: Natural Language Processing (http://courses.engr.illinois.edu/cs447)
Lecture 16: PCFG Parsing (updated)
Julia Hockenmaier (juliahmr@illinois.edu), 3324 Siebel Center
Overview
Where we’re at

Previous lecture: Standard CKY (for non-probabilistic CFGs)
– The CKY algorithm finds all possible parse trees τ for a sentence S = w_1 … w_n under a CFG G in Chomsky Normal Form.

Today’s lecture:
– Probabilistic Context-Free Grammars (PCFGs): CFGs in which each rule is associated with a probability
– CKY for PCFGs (Viterbi): finds the most likely parse tree τ* = argmax_τ P(τ | S) for the sentence S under a PCFG
– Shortcomings of PCFGs (and ways to overcome them)
– Penn Treebank parsing
– Evaluating PCFG parsers
CKY: filling the chart

[Figure: successive snapshots of the parse chart for w_1 … w_n, showing how its cells are filled bottom-up.]
CKY: filling one cell

[Figure: snapshots of the chart for w_1 … w_7, showing how cell chart[2][6] is filled by combining pairs of adjacent smaller spans.]
CKY for standard CFGs

CKY is a bottom-up chart parsing algorithm that finds all possible parse trees τ for a sentence S = w_1 … w_n under a CFG G in Chomsky Normal Form (CNF).
– CNF: G has two types of rules, X ⟶ Y Z and X ⟶ w (X, Y, Z are nonterminals, w is a terminal)
– CKY is a dynamic programming algorithm
– The parse chart is an n × n upper triangular matrix: each cell chart[i][j] (i ≤ j) stores all subtrees for w_i … w_j
– Each cell chart[i][j] has at most one entry for each nonterminal X (plus pairs of backpointers to every pair of (Y, Z) entries in cells chart[i][k] and chart[k+1][j] from which an X can be formed)
– Time complexity: O(n³ |G|)
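To make the algorithm above concrete, here is a minimal sketch of a CKY recognizer in Python. The code is not from the lecture; the grammar representation (binary_rules, lexical_rules) and the hard-coded start symbol "S" are illustrative assumptions.

def cky_recognize(words, binary_rules, lexical_rules):
    """CKY recognition for a CFG in Chomsky Normal Form.

    binary_rules:  set of (X, Y, Z) triples for rules X -> Y Z
    lexical_rules: dict mapping a word w to the set of X with X -> w
    Returns True iff the start symbol "S" spans the whole input.
    """
    n = len(words)
    # chart[i][j] holds the nonterminals that derive words[i..j] (0-based, inclusive)
    chart = [[set() for _ in range(n)] for _ in range(n)]

    # Base case: length-1 spans via lexical rules X -> w
    for i, w in enumerate(words):
        chart[i][i] = set(lexical_rules.get(w, ()))

    # Longer spans: combine two adjacent sub-spans with binary rules X -> Y Z
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):                      # split: [i..k] + [k+1..j]
                for (X, Y, Z) in binary_rules:
                    if Y in chart[i][k] and Z in chart[k + 1][j]:
                        chart[i][j].add(X)

    return "S" in chart[0][n - 1]

A full parser would additionally store, for each X entry, backpointers to the (Y, Z) entries it was built from, as described above.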
Probabilistic (Context-Free) Grammars (PCFGs)
Grammars are ambiguous

A grammar might generate multiple trees for a sentence:
[Figure: for "eat sushi with tuna" and "eat sushi with chopsticks", the grammar produces both a VP-attachment and an NP-attachment analysis of the PP; only one analysis of each pair is correct.]

What’s the most likely parse τ for sentence S? We need a model of P(τ | S).
Computing P(τ | S)

Using Bayes’ Rule:
  argmax_τ P(τ | S) = argmax_τ P(τ, S) / P(S)
                    = argmax_τ P(τ, S)
                    = argmax_τ P(τ)        if S = yield(τ)

The yield of a tree is the string of terminal symbols that can be read off the leaf nodes:
  yield( [VP [V eat] [NP [NP sushi] [PP [P with] [NP tuna]]]] ) = eat sushi with tuna
Computing P(τ)

T is the (infinite) set of all trees in the language:
  L = { s ∈ Σ* | ∃ τ ∈ T : yield(τ) = s }

We need to define P(τ) such that:
  ∀ τ ∈ T : 0 ≤ P(τ) ≤ 1
  ∑_{τ ∈ T} P(τ) = 1

The set T is generated by a context-free grammar:
  S  → NP VP       VP → Verb NP     NP → Det Noun
  S  → S conj S    VP → VP PP       NP → NP PP
  S  → …           VP → …           NP → …
Probabilistic Context-Free Grammars

For every nonterminal X, define a probability distribution P(X → α | X) over all rules with the same LHS symbol X:
  S  → NP VP        0.8
  S  → S conj S     0.2
  NP → Noun         0.2
  NP → Det Noun     0.4
  NP → NP PP        0.2
  NP → NP conj NP   0.2
  VP → Verb         0.4
  VP → Verb NP      0.3
  VP → Verb NP NP   0.1
  VP → VP PP        0.2
  PP → P NP         1.0
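As a small illustration (not part of the slides; the triple-based representation is an assumption), the grammar above can be stored as a rule list, and the defining condition that P(X → α | X) is a proper distribution for every LHS X can be checked directly:

from collections import defaultdict

# The PCFG from the slide, as (LHS, RHS, probability) triples.
rules = [
    ("S",  ("NP", "VP"),         0.8),
    ("S",  ("S", "conj", "S"),   0.2),
    ("NP", ("Noun",),            0.2),
    ("NP", ("Det", "Noun"),      0.4),
    ("NP", ("NP", "PP"),         0.2),
    ("NP", ("NP", "conj", "NP"), 0.2),
    ("VP", ("Verb",),            0.4),
    ("VP", ("Verb", "NP"),       0.3),
    ("VP", ("Verb", "NP", "NP"), 0.1),
    ("VP", ("VP", "PP"),         0.2),
    ("PP", ("P", "NP"),          1.0),
]

# P(X -> alpha | X) must sum to 1 over all rules with the same LHS X.
totals = defaultdict(float)
for lhs, rhs, p in rules:
    totals[lhs] += p
for lhs, total in totals.items():
    assert abs(total - 1.0) < 1e-9, f"rules for {lhs} sum to {total}"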
Computing P(τ) with a PCFG

The probability of a tree τ is the product of the probabilities of all its rules. For example, "John eats pie with cream", analysed as
  [S [NP [Noun John]] [VP [VP [Verb eats] [NP [Noun pie]]] [PP [P with] [NP [Noun cream]]]]]
uses the rules S → NP VP (0.8), VP → Verb NP (0.3), VP → VP PP (0.2), PP → P NP (1.0), and NP → Noun three times (0.2³), so
  P(τ) = 0.8 × 0.3 × 0.2 × 1.0 × 0.2³ = 0.000384
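A minimal sketch of this computation (the nested-tuple tree encoding and the name tree_prob are illustrative assumptions; as in the slide’s product, only the phrase-structure rules are scored and lexical rules are treated as probability 1):

def tree_prob(tree, rule_probs):
    """Probability of a tree = product of the probabilities of its rules.

    A tree is a (label, child, child, ...) tuple; a leaf is a plain string.
    Rules not in rule_probs (here: the lexical rules) get probability 1.
    """
    label, *children = tree
    child_labels = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rule_probs.get((label, child_labels), 1.0)
    for c in children:
        if not isinstance(c, str):
            p *= tree_prob(c, rule_probs)
    return p

# Only the rules used by this tree, with the probabilities from the grammar above.
rule_probs = {
    ("S",  ("NP", "VP")):   0.8,
    ("NP", ("Noun",)):      0.2,
    ("VP", ("Verb", "NP")): 0.3,
    ("VP", ("VP", "PP")):   0.2,
    ("PP", ("P", "NP")):    1.0,
}

# "John eats pie with cream", analysed as on the slide.
tau = ("S",
       ("NP", ("Noun", "John")),
       ("VP",
        ("VP", ("Verb", "eats"), ("NP", ("Noun", "pie"))),
        ("PP", ("P", "with"), ("NP", ("Noun", "cream")))))

print(tree_prob(tau, rule_probs))   # 0.8 * 0.3 * 0.2 * 1.0 * 0.2**3 = 0.000384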
Learning the parameters of a PCFG

If we have a treebank (a corpus in which each sentence is associated with a parse tree), we can just count the number of times each rule appears, e.g.:
  S  → NP VP .      (count = 1000)
  S  → S conj S .   (count = 220)
  PP → IN NP        (count = 700)
and then divide the count (observed frequency) of each rule X → Y Z by the sum of the frequencies of all rules with the same LHS X to turn these counts into probabilities:
  S  → NP VP .      (p = 1000/1220)
  S  → S conj S .   (p = 220/1220)
  PP → IN NP        (p = 700/700)
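A minimal sketch of this relative-frequency (MLE) estimation, assuming rule counts like those above have already been extracted from a treebank (the counts and data structures here are illustrative):

from collections import Counter, defaultdict

# Hypothetical rule counts from a treebank, keyed by (LHS, RHS).
rule_counts = Counter({
    ("S",  ("NP", "VP", ".")):       1000,
    ("S",  ("S", "conj", "S", ".")): 220,
    ("PP", ("IN", "NP")):            700,
})

# MLE: P(X -> alpha | X) = count(X -> alpha) / total count of rules with LHS X
lhs_totals = defaultdict(int)
for (lhs, rhs), c in rule_counts.items():
    lhs_totals[lhs] += c

rule_probs = {(lhs, rhs): c / lhs_totals[lhs]
              for (lhs, rhs), c in rule_counts.items()}

print(rule_probs[("S", ("NP", "VP", "."))])   # 1000 / 1220 ≈ 0.82
print(rule_probs[("PP", ("IN", "NP"))])       # 700 / 700 = 1.0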
More on probabilities

Computing P(s): If P(τ) is the probability of a tree τ, the probability of a sentence s is the sum of the probabilities of all its parse trees:
  P(s) = ∑_{τ : yield(τ) = s} P(τ)

How do we know that P(L) = ∑_τ P(τ) = 1?
If we have learned the PCFG from a corpus via MLE, this is guaranteed to be the case. But if we set the probabilities by hand, we could run into trouble. In the following PCFG, the probability mass of all finite trees is less than 1:
  S → S S   (0.9)
  S → w     (0.1)
  P(L) = P("w") + P("ww") + P("w[ww]") + P("[ww]w") + …
       = 0.1 + 0.009 + 0.00081 + 0.00081 + … ≪ 1
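One way to see how far below 1 this mass is (not shown on the slide): the total probability x of all finite trees for this grammar satisfies x = 0.1 + 0.9·x², because a derivation either stops with S → w or applies S → S S and both subderivations must terminate independently. Iterating this equation from 0 converges to its smallest fixed point:

# Total probability mass of finite trees for S -> S S (0.9) | S -> w (0.1).
x = 0.0
for _ in range(1000):
    x = 0.1 + 0.9 * x * x
print(x)   # ~0.1111 = 1/9, i.e. well below 1: this hand-set PCFG is inconsistent

The limit 1/9 ≈ 0.111 matches the partial sums listed above.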
PCFG Decoding: CKY with Viterbi
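For reference, here is a minimal sketch of the Viterbi CKY decoder this section introduces, assuming the PCFG is already in CNF (the next slides show how flat rules are binarized). The code is not from the lecture; the data structures and the hard-coded start symbol "S" are illustrative assumptions.

def viterbi_cky(words, binary_rules, lexical_rules):
    """Most likely parse under a PCFG in CNF (Viterbi CKY).

    binary_rules:  list of (X, Y, Z, p) for rules X -> Y Z with probability p
    lexical_rules: dict mapping word w to a list of (X, p) for rules X -> w
    Returns (probability, tree) of the best S spanning the input, or None.
    """
    n = len(words)
    # best[i][j][X] = (max probability, best subtree) for X over words[i..j]
    best = [[{} for _ in range(n)] for _ in range(n)]

    for i, w in enumerate(words):
        for X, p in lexical_rules.get(w, ()):
            best[i][i][X] = (p, (X, w))

    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):
                for X, Y, Z, p in binary_rules:
                    if Y in best[i][k] and Z in best[k + 1][j]:
                        p_y, t_y = best[i][k][Y]
                        p_z, t_z = best[k + 1][j][Z]
                        cand = p * p_y * p_z
                        if cand > best[i][j].get(X, (0.0, None))[0]:
                            best[i][j][X] = (cand, (X, t_y, t_z))

    return best[0][n - 1].get("S")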
How do we handle flat rules?

Binarize each flat rule by adding a unique dummy nonterminal (e.g. ConjS), and set the probability of the new rule with the dummy nonterminal on the LHS to 1.

Original grammar:
  S    ⟶ NP VP        0.8
  S    ⟶ S conj S     0.2
  NP   ⟶ Noun         0.2
  NP   ⟶ Det Noun     0.4
  NP   ⟶ NP PP        0.2
  NP   ⟶ NP conj NP   0.2
  VP   ⟶ Verb         0.3
  VP   ⟶ Verb NP      0.3
  VP   ⟶ Verb NP NP   0.1
  VP   ⟶ VP PP        0.3
  PP   ⟶ P NP         1.0
  Prep ⟶ P            1.0
  Noun ⟶ N            1.0
  Verb ⟶ V            1.0

Binarized grammar:
  S      ⟶ NP VP        0.8
  S      ⟶ S ConjS      0.2
  NP     ⟶ Noun         0.2
  NP     ⟶ Det Noun     0.4
  NP     ⟶ NP PP        0.2
  NP     ⟶ NP ConjNP    0.2
  VP     ⟶ Verb         0.3
  VP     ⟶ Verb NP      0.3
  VP     ⟶ Verb NPNP    0.1
  VP     ⟶ VP PP        0.3
  PP     ⟶ P NP         1.0
  Prep   ⟶ P            1.0
  Noun   ⟶ N            1.0
  Verb   ⟶ V            1.0
  ConjS  ⟶ conj S       1.0
  ConjNP ⟶ conj NP      1.0
  NPNP   ⟶ NP NP        1.0
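A small sketch of this binarization step (the code is an illustration, not from the lecture; it names each dummy nonterminal after the symbols it replaces, following the slide’s ConjS / NPNP convention):

def binarize(rules):
    """Right-binarize rules with more than two RHS symbols.

    rules is a list of (LHS, RHS-tuple, prob).  For X -> A B C (p) we emit
    X -> A 'BC' (p) and 'BC' -> B C (1.0); unary and binary rules pass through.
    """
    out = []
    for lhs, rhs, p in rules:
        while len(rhs) > 2:
            dummy = "".join(rhs[1:])            # e.g. ("conj", "S") -> "conjS"
            out.append((lhs, (rhs[0], dummy), p))
            lhs, rhs, p = dummy, rhs[1:], 1.0   # the dummy's own rule gets prob 1
        out.append((lhs, rhs, p))
    return out

print(binarize([("S", ("S", "conj", "S"), 0.2),
                ("VP", ("Verb", "NP", "NP"), 0.1)]))
# [('S', ('S', 'conjS'), 0.2), ('conjS', ('conj', 'S'), 1.0),
#  ('VP', ('Verb', 'NPNP'), 0.1), ('NPNP', ('NP', 'NP'), 1.0)]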