Natural Language Processing
CSCI 4152/6509
Lecture 26
CFGs and CYK Parsing Algorithm
Instructor: Vlado Keselj
Time and date: 09:35–10:25, 12-Mar-2020
Location: Dunn 135
CSCI 4152/6509, Vlado Keselj Lecture 26 1 / 12
Previous Lecture
HMM POS-tagging with a product-sum algorithm
Part IV: Parsing (Syntactic Processing)
Natural language syntax:
  ◮ phrase structure, clauses, sentences
Reading: [JM] Ch 12
Parsing, parse tree examples
Some Notions about CFGs
CFG, also known as Phrase-Structure Grammar (PSG)
Equivalent to BNF (Backus-Naur form)
Idea from Wundt (1900), formally defined by Chomsky (1956) and Backus (1959)
Typical notation: (V, T, P, S); also (N, Σ, R, S)
Direct derivation, derivation
Example of a direct derivation: S ⇒ NP VP
Example of a derivation (the beginning of one): S ⇒ NP VP ⇒ DT NN VP ⇒ That NN VP ⇒ . . .
Leftmost and rightmost derivations
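To make the 4-tuple notation and the relation ⇒ concrete, here is a minimal Python sketch. The `CFG` class, the `directly_derives` function, and the tiny grammar fragment are hypothetical illustrations, not part of the course material; sentential forms are assumed to be tuples of symbols.

```python
from typing import NamedTuple, Tuple

Symbol = str
Production = Tuple[Symbol, Tuple[Symbol, ...]]

class CFG(NamedTuple):
    """Hypothetical container for a CFG written as the 4-tuple (V, T, P, S)."""
    V: frozenset                # nonterminals
    T: frozenset                # terminals
    P: Tuple[Production, ...]   # productions as (lhs, rhs) pairs
    S: Symbol                   # start symbol

def directly_derives(g: CFG, alpha, beta) -> bool:
    """True if alpha ⇒ beta: beta comes from alpha by replacing one
    occurrence of a nonterminal A with the rhs of a production A → rhs."""
    for i, sym in enumerate(alpha):
        if sym not in g.V:
            continue  # terminals are never rewritten
        for lhs, rhs in g.P:
            if lhs == sym and alpha[:i] + rhs + alpha[i + 1:] == beta:
                return True
    return False

# Hypothetical grammar fragment, just enough for the slide's example.
g = CFG(
    V=frozenset({"S", "NP", "VP", "DT", "NN"}),
    T=frozenset({"That", "man"}),
    P=(
        ("S", ("NP", "VP")),
        ("NP", ("DT", "NN")),
        ("DT", ("That",)),
        ("NN", ("man",)),
    ),
    S="S",
)

print(directly_derives(g, ("S",), ("NP", "VP")))              # True
print(directly_derives(g, ("NP", "VP"), ("DT", "NN", "VP")))  # True
print(directly_derives(g, ("S",), ("DT", "NN", "VP")))        # False (needs two steps)
```

The last call fails because S ⇒ NP VP ⇒ DT NN VP is a derivation of two steps, not a direct derivation.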
Parse Tree Example (revisited)
[Figure: parse tree of "That man caught the butterfly with a net." with preterminals DT NN VBD DT NN IN DT NN, phrase nodes NP, NP, NP, PP, VP, and root S]
Leftmost Derivation Example
S ⇒ NP VP
  ⇒ DT NN VP
  ⇒ That NN VP
  ⇒ That man VP
  ⇒ That man VBD NP PP
  ⇒ That man caught NP PP
  ⇒ That man caught DT NN PP
  ⇒ That man caught the NN PP
  ⇒ That man caught the butterfly PP
  ⇒ That man caught the butterfly IN NP
  ⇒ That man caught the butterfly with NP
  ⇒ That man caught the butterfly with DT NN
  ⇒ That man caught the butterfly with a NN
  ⇒ That man caught the butterfly with a net
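The derivation above can be checked mechanically. The sketch below assumes a grammar read off the example parse tree (the slides show only the derivation, so the rule set is an assumption) and tests that every step rewrites the leftmost nonterminal; the names `leftmost_step` and `is_leftmost_derivation` are hypothetical.

```python
# Assumed grammar, read off the parse tree of the example sentence.
P = {
    "S": [["NP", "VP"]],
    "NP": [["DT", "NN"]],
    "VP": [["VBD", "NP", "PP"]],
    "PP": [["IN", "NP"]],
    "DT": [["That"], ["the"], ["a"]],
    "NN": [["man"], ["butterfly"], ["net"]],
    "VBD": [["caught"]],
    "IN": [["with"]],
}

def leftmost_step(alpha, beta):
    """True if beta is obtained from alpha by rewriting its leftmost nonterminal."""
    for i, sym in enumerate(alpha):
        if sym in P:  # first nonterminal from the left
            return any(alpha[:i] + rhs + alpha[i + 1:] == beta for rhs in P[sym])
    return False  # alpha has no nonterminal left to rewrite

def is_leftmost_derivation(forms):
    """True if every consecutive pair of sentential forms is a leftmost step."""
    return all(leftmost_step(a, b) for a, b in zip(forms, forms[1:]))

derivation = """S
NP VP
DT NN VP
That NN VP
That man VP
That man VBD NP PP
That man caught NP PP
That man caught DT NN PP
That man caught the NN PP
That man caught the butterfly PP
That man caught the butterfly IN NP
That man caught the butterfly with NP
That man caught the butterfly with DT NN
That man caught the butterfly with a NN
That man caught the butterfly with a net"""

forms = [line.split() for line in derivation.splitlines()]
print(is_leftmost_derivation(forms))  # True
```

Rewriting VP first (NP VP ⇒ NP VBD NP PP) is a valid direct derivation, but `leftmost_step` rejects it because NP is further left.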
Some Notions about CFGs (continued)
Language generated by a CFG
Context-free languages
Parsing task
Ambiguous sentences
Ambiguous grammars
Inherently ambiguous languages
Bracket Representation of a Parse Tree
(S (NP (DT That) (NN man))
   (VP (VBD caught)
       (NP (DT the) (NN butterfly))
       (PP (IN with)
           (NP (DT a) (NN net) ) ) ) )
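A bracketed tree like the one above can be read back into a nested structure. The following Python sketch (the function names are hypothetical) tokenizes the string, builds `(label, children)` tuples with words as plain strings, and recovers the words and their POS tags.

```python
import re

def parse_bracketed(s):
    """Parse "(S (NP ...) ...)" into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())      # nested constituent
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return node()

def leaves(t):
    """Words of the tree, left to right."""
    if isinstance(t, str):
        return [t]
    return [w for child in t[1] for w in leaves(child)]

def pos_tags(t):
    """(word, tag) pairs, assuming each word hangs under a preterminal."""
    label, children = t
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    return [p for child in children for p in pos_tags(child)]

TREE = ("(S (NP (DT That) (NN man)) (VP (VBD caught) (NP (DT the) (NN butterfly))"
        " (PP (IN with) (NP (DT a) (NN net) ) ) ) )")
tree = parse_bracketed(TREE)
print(leaves(tree))  # ['That', 'man', 'caught', 'the', 'butterfly', 'with', 'a', 'net']
print(pos_tags(tree))
```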
Some Notes on CFGs
Left-hand side (lhs) and right-hand side (rhs) of a production: in S → NP VP, the lhs is S and the rhs is NP VP
Empty rule (epsilon rule, epsilon production): V → ε
Unit production: A → B, where A and B are nonterminals
Notational variations:
  ◮ use of ‘|’: P → N | A P, instead of P → N, P → A P
  ◮ BNF notation: P ::= N | A P
  ◮ use of word ‘opt’: NP ::= DT NN PP_opt
  ◮ or Kleene star: NP ::= DT NN PP*
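The notational variants can be reduced to plain productions. This sketch assumes rules written as strings with `→` and `|`, the symbol `ε` for an empty alternative, and an `_opt` suffix for optional symbols; these spellings and all function names are assumptions for illustration. The Kleene star is not handled, since expanding it needs an extra nonterminal (for example X → PP X | ε).

```python
def expand_alternatives(rule):
    """'P → N | A P' -> [('P', ('N',)), ('P', ('A', 'P'))].
    The assumed marker 'ε' yields an epsilon production with rhs ()."""
    lhs, rhs = rule.split("→", 1)
    return [(lhs.strip(), tuple(s for s in alt.split() if s != "ε"))
            for alt in rhs.split("|")]

def expand_opt(lhs, rhs):
    """NP ::= DT NN PP_opt -> NP → DT NN PP and NP → DT NN."""
    forms = [()]
    for s in rhs:
        if s.endswith("_opt"):
            base = s[: -len("_opt")]
            forms = [f + (base,) for f in forms] + forms  # with and without
        else:
            forms = [f + (s,) for f in forms]
    return [(lhs, f) for f in forms]

def classify(prod, nonterminals):
    """Label a production as 'epsilon', 'unit', or 'other'."""
    _, rhs = prod
    if not rhs:
        return "epsilon"
    if len(rhs) == 1 and rhs[0] in nonterminals:
        return "unit"
    return "other"

NT = {"P", "N", "A", "V"}
for prod in expand_alternatives("P → N | A P") + expand_alternatives("V → ε"):
    print(prod, classify(prod, NT))
print(expand_opt("NP", ["DT", "NN", "PP_opt"]))
```

Here P → N comes out as a unit production (N being a nonterminal in this example), P → A P as an ordinary rule, and V → ε as an epsilon production.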
CYK Chart Parsing Algorithm
When parsing natural language, there are generally two approaches:
1. Backtracking to find all parse trees
2. Chart parsing
CYK algorithm: a simple chart parsing algorithm
CYK: Cocke-Younger-Kasami algorithm
CYK can be applied only to a grammar in CNF
Chomsky Normal Form
All rules are in one of the forms:
1. A → B C, where A, B, and C are nonterminals, or
2. A → w, where A is a nonterminal and w is a terminal
If a grammar is not in CNF, it can be converted into an equivalent grammar in CNF (one generating the same language, except possibly for the empty string)
Is the following grammar in CNF?
S → NP VP     VP → V NP    N → time     V → like
NP → N        VP → V PP    N → arrow    V → flies
NP → N N      PP → P NP    N → flies    P → like
NP → D N      D → an
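One way to answer the question is to test each rule against the two CNF forms. The sketch below assumes that every symbol appearing on some left-hand side is a nonterminal and every other symbol is a terminal; `non_cnf_rules` is a hypothetical helper.

```python
# Grammar from the slide, one (lhs, rhs) pair per rule.
RULES = [
    ("S", ("NP", "VP")), ("VP", ("V", "NP")), ("N", ("time",)),  ("V", ("like",)),
    ("NP", ("N",)),      ("VP", ("V", "PP")), ("N", ("arrow",)), ("V", ("flies",)),
    ("NP", ("N", "N")),  ("PP", ("P", "NP")), ("N", ("flies",)), ("P", ("like",)),
    ("NP", ("D", "N")),  ("D", ("an",)),
]

def non_cnf_rules(rules):
    """Return the rules that are neither A → B C nor A → w."""
    nonterminals = {lhs for lhs, _ in rules}
    bad = []
    for lhs, rhs in rules:
        binary = len(rhs) == 2 and all(s in nonterminals for s in rhs)
        lexical = len(rhs) == 1 and rhs[0] not in nonterminals
        if not (binary or lexical):
            bad.append((lhs, rhs))
    return bad

print(non_cnf_rules(RULES))  # [('NP', ('N',))]
```

The only offending rule is the unit production NP → N, so this grammar is not in CNF.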
How about this grammar? (Is it in CNF?)
S → NP VP     VP → V NP    N → time     V → like
NP → time     VP → V PP    N → arrow    V → flies
NP → N N      PP → P NP    N → flies    P → like
NP → D N      D → an
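CYK Example
The following grammar in CNF is given:
S → NP VP     VP → V NP    N → time     V → like
NP → time     VP → V PP    N → arrow    V → flies
NP → N N      PP → P NP    N → flies    P → like
NP → D N      D → an
Sentence with word positions: 0 time 1 flies 2 like 3 an 4 arrow 5
Chart (cell [i,j] lists the nonterminals that derive the words between positions i and j; empty cells omitted):
[0,1] NP, N    [1,2] N, V    [2,3] V, P    [3,4] D    [4,5] N
[0,2] NP       [3,5] NP
[2,5] PP, VP
[1,5] VP
[0,5] S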
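The chart above can be reproduced with a short CYK recognizer. This is a minimal sketch, not the course's reference implementation: it fills `chart[(i, j)]` bottom-up, shortest spans first, by trying every split point k and every binary rule. It only recognizes; it keeps no back-pointers, so it does not build the parse trees.

```python
from collections import defaultdict

# CNF grammar from the slide.
BINARY = [  # A → B C, stored as (A, B, C)
    ("S", "NP", "VP"), ("VP", "V", "NP"), ("VP", "V", "PP"),
    ("NP", "N", "N"), ("PP", "P", "NP"), ("NP", "D", "N"),
]
LEXICAL = [  # A → w, stored as (A, w)
    ("NP", "time"), ("N", "time"), ("N", "arrow"), ("N", "flies"),
    ("V", "like"), ("V", "flies"), ("P", "like"), ("D", "an"),
]

def cyk(words):
    """Return chart[(i, j)] = set of nonterminals deriving words[i:j]."""
    n = len(words)
    chart = defaultdict(set)
    for i, w in enumerate(words):  # spans of length 1: lexical rules
        chart[(i, i + 1)] = {a for a, t in LEXICAL if t == w}
    for length in range(2, n + 1):  # longer spans, shortest first
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point: [i,k] + [k,j]
                for a, b, c in BINARY:
                    if b in chart[(i, k)] and c in chart[(k, j)]:
                        chart[(i, j)].add(a)
    return chart

words = "time flies like an arrow".split()
chart = cyk(words)
print("S" in chart[(0, len(words))])  # True
for (i, j), cats in sorted(chart.items()):
    if cats:
        print(f"[{i},{j}]", sorted(cats))
```

Cell [0,5] receives S from two different splits, [0,1]+[1,5] and [0,2]+[2,5], which reflects the two readings of the ambiguous sentence.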