SI485i: NLP
Set 8: PCFGs and the CKY Algorithm
PCFGs
• We saw how CFGs can model English (sort of)
• Probabilistic CFGs put weights on the production rules
  • NP -> DET NN with probability 0.34
  • NP -> NN NN with probability 0.16
PCFGs
• We still parse sentences and come up with a syntactic derivation tree
• But now we can talk about how confident we are in the tree: P(tree)!
Buffalo Example
• What is the probability of this tree?
• It's the product of the probabilities of all the rules in the tree, e.g., P(S -> NP VP)
PCFG Formalized
• G = (T, N, S, R, P)
  • T is a set of terminals
  • N is a set of nonterminals
    • For NLP, we usually distinguish a set P ⊂ N of preterminals, which always rewrite as terminals
  • S is the start symbol (one of the nonterminals)
  • R is a set of rules/productions of the form X → γ, where X is a nonterminal and γ is a sequence of terminals and nonterminals
  • P(R) gives the probability of each rule:
    ∀X ∈ N, Σ_{X→γ ∈ R} P(X → γ) = 1
• A grammar G generates a language model L:
  Σ_{γ ∈ T*} P(γ) = 1

Some slides adapted from Chris Manning
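To make the rule-probability constraint concrete, here is a minimal Python sketch; the toy grammar below (beyond the two NP rules from earlier) is invented for illustration, not the grammar used in class.

```python
# A minimal sketch of the G = (T, N, S, R, P) definition in Python.
# The rules and probabilities other than the two NP rules above are
# made up for illustration.
from collections import defaultdict

rules = {
    # (lhs, rhs) -> probability, i.e. P(X -> gamma | X)
    ("S",  ("NP", "VP")): 1.0,
    ("NP", ("DT", "NN")): 0.34,
    ("NP", ("NN", "NN")): 0.16,
    ("NP", ("NNP",)):     0.50,
    ("VP", ("VBD", "NP")): 1.0,
}

# Check the PCFG constraint: for every nonterminal X,
# the probabilities of all rules X -> gamma must sum to 1.
totals = defaultdict(float)
for (lhs, _rhs), prob in rules.items():
    totals[lhs] += prob

for lhs, total in totals.items():
    assert abs(total - 1.0) < 1e-9, f"rules for {lhs} sum to {total}, not 1"
```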
Some notation
• w_1n = w_1 … w_n = the word sequence from 1 to n
• w_ab = the subsequence w_a … w_b
• We'll write P(N^i → ζ^j) to mean P(N^i → ζ^j | N^i)
  • This is a conditional probability: for instance, the probabilities of all rules headed by NP must sum to 1!
• We'll want to calculate the best tree T:
  max_T P(T ⇒* w_ab)
Trees and Probabilities
• P(t): the probability of a tree is the product of the probabilities of the rules used to generate it.
• P(w_1n): the probability of the string is the sum of the probabilities of all possible trees that have the string as their yield
  • P(w_1n) = Σ_j P(w_1n, t_j), where t_j is a parse of w_1n
            = Σ_j P(t_j)
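A small sketch of these two quantities in Python; the tree encoding (a list of the rules used) and the rule probabilities are assumptions for illustration.

```python
from math import prod  # prod is available in Python 3.8+

# Illustrative rule probabilities (invented, not from the slides).
rule_probs = {
    ("S",  ("NP", "VP")): 1.0,
    ("NP", ("DT", "NN")): 0.34,
    ("NP", ("NNP",)):     0.50,
    ("VP", ("VBD", "NP")): 1.0,
}

# A parse tree, encoded simply as the multiset of rules used to derive it.
tree_rules = [
    ("S",  ("NP", "VP")),
    ("NP", ("DT", "NN")),
    ("VP", ("VBD", "NP")),
    ("NP", ("NNP",)),
]

# P(t): the product of the probabilities of the rules used in the tree.
p_t = prod(rule_probs[r] for r in tree_rules)   # 1.0 * 0.34 * 1.0 * 0.50 = 0.17

# P(w_1n): the sum of P(t_j) over every parse t_j that yields w_1n.
# With only one parse here, it equals p_t.
p_w = sum([p_t])
```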
Example PCFG
P(tree) computation
Time to Parse
• Let's parse!!
• Almost ready…
• Trees must be in Chomsky Normal Form first.
Chomsky Normal Form
• All rules are Z -> X Y or Z -> w
• Transforming a grammar to CNF does not change its weak generative capacity.
  • Remove all unary rules and empties
  • Transform n-ary rules: VP -> V NP PP becomes VP -> V @VP-V and @VP-V -> NP PP
• Why do we do this? Parsing is easier now.
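A minimal sketch of the n-ary binarization step described above; the rule representation (lhs, rhs tuple, probability) and the @-symbol naming scheme are assumptions for illustration.

```python
# Binarize one n-ary rule the way the slide describes:
# VP -> V NP PP  becomes  VP -> V @VP-V  and  @VP-V -> NP PP.

def binarize(lhs, rhs, prob):
    """Turn one n-ary rule into a chain of binary rules using @-symbols."""
    new_rules = []
    while len(rhs) > 2:
        new_sym = f"@{lhs}-{rhs[0]}"              # e.g. "@VP-V"
        new_rules.append((lhs, (rhs[0], new_sym), prob))
        # The intermediate rules get probability 1.0, so the tree's
        # overall probability is unchanged by binarization.
        lhs, rhs, prob = new_sym, rhs[1:], 1.0
    new_rules.append((lhs, tuple(rhs), prob))
    return new_rules

print(binarize("VP", ("V", "NP", "PP"), 0.2))
# [('VP', ('V', '@VP-V'), 0.2), ('@VP-V', ('NP', 'PP'), 1.0)]
```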
Converting into CNF
The CKY Algorithm
• Cocke-Kasami-Younger (CKY)
• Dynamic programming is back!
The CKY Algorithm
• NP -> NN NNS 0.13: p = 0.13 x .0023 x .0014; p = 1.87 x 10^-7
• NP -> NNP NNS 0.056: p = 0.056 x .001 x .0014; p = 7.84 x 10^-8
The CKY Algorithm
• What is the runtime? O( ?? )
• Note that each cell must check all pairs of children below it.
• Binarizing the CFG rules is a must; the complexity explodes if you do not.
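A sketch of probabilistic (Viterbi) CKY over a CNF grammar, assuming rules are stored in two dicts keyed by (parent, children); this encoding and the toy grammar at the bottom are assumptions, not the course's format. The three nested loops over spans, start positions, and split points are what make the runtime cubic in the sentence length (times the number of grammar rules).

```python
from collections import defaultdict

def cky(words, lexical, binary):
    """Viterbi CKY: table[i][j][A] = best probability of A deriving words[i:j]."""
    n = len(words)
    table = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]

    # Diagonal: preterminal (lexical) rules A -> word.
    for i, w in enumerate(words):
        for (A, word), p in lexical.items():
            if word == w and p > table[i][i + 1][A]:
                table[i][i + 1][A] = p

    # Longer spans: each cell checks every split point and every binary rule.
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):                      # split point
                for (A, (B, C)), p in binary.items():
                    cand = p * table[i][k][B] * table[k][j][C]
                    if cand > table[i][j][A]:
                        table[i][j][A] = cand

    return table    # table[0][n]["S"] is the probability of the best parse

# Hypothetical toy grammar, just to show the expected input format.
lexical = {("NN", "buffalo"): 0.5, ("VB", "buffalo"): 0.3, ("NNS", "buffalo"): 0.2}
binary  = {("NP", ("NN", "NNS")): 0.13, ("S", ("NP", "VB")): 0.4}
print(cky(["buffalo", "buffalo", "buffalo"], lexical, binary)[0][3]["S"])
```

This version only tracks probabilities; a full parser would also store backpointers in each cell to recover the best tree.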
Evaluating CKY
• How do we know if our parser works?
• Count the number of correct labels in your table: the label and the span it dominates
  • [ label, start, finish ]
• Most trees have an error or two!
• Count how many spans are correct or wrong, and compute precision/recall.
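A quick sketch of that span-based scoring in Python; the gold and predicted spans below are made up for illustration.

```python
# Compare predicted [label, start, finish] spans against the gold tree's spans.

def precision_recall(predicted_spans, gold_spans):
    predicted, gold = set(predicted_spans), set(gold_spans)
    correct = len(predicted & gold)          # spans with the right label and extent
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

gold = {("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)}
pred = {("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)}
print(precision_recall(pred, gold))   # (0.75, 0.75)
```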
Probabilities?
• Where do the probabilities come from?
  • P( NP -> DT NN ) = ???
• Penn Treebank: a bunch of newspaper articles whose sentences have been manually annotated with full parse trees
• P( NP -> DT NN ) = Count( NP -> DT NN ) / Count( NP )
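A sketch of that relative-frequency estimate; the counts below are invented (real ones would come from the Penn Treebank), chosen so that P(NP -> DT NN) comes out to the 0.34 used earlier.

```python
# Estimate rule probabilities from treebank rule counts.
from collections import Counter, defaultdict

rule_counts = Counter({
    ("NP", ("DT", "NN")): 3400,
    ("NP", ("NN", "NN")): 1600,
    ("NP", ("NNP",)):     5000,
})

# Count(NP) = total count of all rules headed by NP.
lhs_counts = defaultdict(int)
for (lhs, _rhs), c in rule_counts.items():
    lhs_counts[lhs] += c

# P(NP -> DT NN) = Count(NP -> DT NN) / Count(NP)
rule_probs = {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}
print(rule_probs[("NP", ("DT", "NN"))])   # 0.34
```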