Lecture 17: Statistical Parsing with PCFG
Kai-Wei Chang, CS @ University of Virginia
kw@kwchang.net
Course webpage: http://kwchang.net/teaching/NLP16
Reading list
- Look at Mike Collins' notes on PCFGs and lexicalized PCFGs: http://www.cs.columbia.edu/~mcollins/
Phrase structure (constituency) trees
- Can be modeled by context-free grammars
CKY algorithm
  for J := 1 to n
    add to [J-1, J] all categories for the J-th word
  for width := 2 to n
    for start := 0 to n-width           // this is I
      define end := start + width       // this is J
      for mid := start+1 to end-1       // find all I-to-J phrases
        for every rule X → Y Z in the grammar
          if Y in [start, mid] and Z in [mid, end]
            then add X to [start, end]
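A minimal Python sketch of this recognizer, assuming a dictionary-based grammar encoding of my own choosing (sets of categories per word, and a map from right-hand sides to parent categories):

```python
from collections import defaultdict

def cky_recognize(words, lexicon, binary_rules, root="S"):
    """CKY recognition: True iff the grammar derives the sentence.

    lexicon: dict word -> set of categories X (rules X -> word)
    binary_rules: dict (Y, Z) -> set of parents X (rules X -> Y Z)
    """
    n = len(words)
    chart = defaultdict(set)  # chart[(i, j)] = categories covering words[i:j]
    for j in range(1, n + 1):                  # width-1 spans from the lexicon
        chart[(j - 1, j)] |= lexicon.get(words[j - 1], set())
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            for mid in range(start + 1, end):  # every split point of [start, end]
                for Y in chart[(start, mid)]:
                    for Z in chart[(mid, end)]:
                        chart[(start, end)] |= binary_rules.get((Y, Z), set())
    return root in chart[(0, n)]

# Toy grammar for the running example (rule set is illustrative):
lexicon = {"time": {"NP"}, "flies": {"VP"}, "like": {"P"},
           "an": {"Det"}, "arrow": {"N"}}
binary_rules = {("NP", "VP"): {"S"}, ("Det", "N"): {"NP"},
                ("P", "NP"): {"PP"}, ("VP", "PP"): {"VP"}}
print(cky_recognize("time flies like an arrow".split(),
                    lexicon, binary_rules))  # True
```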
Weighted CKY: Viterbi algorithm
(Assume the weights are log-probabilities of rules.)
  initialize all entries of chart to -∞
  for i := 1 to n
    for each rule R of the form X → word[i]
      chart[X, i-1, i] max= weight(R)
  for width := 2 to n
    for start := 0 to n-width
      define end := start + width
      for mid := start+1 to end-1
        for each rule R of the form X → Y Z
          chart[X, start, end] max= weight(R) + chart[Y, start, mid] + chart[Z, mid, end]
  return chart[ROOT, 0, n]

Slides are modified from Jason Eisner's NLP course.
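A matching sketch of the Viterbi version, with rule weights as log-probabilities; the nested-dictionary encoding is again my own assumption:

```python
import math
from collections import defaultdict

def cky_viterbi(words, lexicon, binary_rules, root="S"):
    """Viterbi CKY: log-probability of the single best parse (-inf if none).

    lexicon: dict word -> {category X: log p(X -> word)}
    binary_rules: dict (Y, Z) -> {parent X: log p(X -> Y Z)}
    """
    n = len(words)
    chart = defaultdict(lambda: -math.inf)  # chart[(X, i, j)] = best log-prob
    for j in range(1, n + 1):
        for X, logp in lexicon.get(words[j - 1], {}).items():
            chart[(X, j - 1, j)] = max(chart[(X, j - 1, j)], logp)
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for (Y, Z), parents in binary_rules.items():
                    pair = chart[(Y, start, mid)] + chart[(Z, mid, end)]
                    if pair == -math.inf:
                        continue  # one of the two children is unbuildable
                    for X, logp in parents.items():
                        chart[(X, start, end)] = max(chart[(X, start, end)],
                                                     logp + pair)
    return chart[(root, 0, n)]
```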
Likelihood of a parse tree: WHY??
Probabilistic Trees
- Just like language models or HMMs for POS tagging
- We make independence assumptions!
[Example tree: (S (NP time) (VP (VP flies) (PP (P like) (NP (Det an) (N arrow)))))]
Chain rule: One word at a time
p(time flies like an arrow)
  = p(time)
  * p(flies | time)
  * p(like | time flies)
  * p(an | time flies like)
  * p(arrow | time flies like an)
Chain rule + independence assumptions (to get a trigram model)
Condition each word on only the two preceding words:
p(time flies like an arrow)
  ≈ p(time)
  * p(flies | time)
  * p(like | time flies)
  * p(an | flies like)
  * p(arrow | like an)
Chain rule, written differently
p(time flies like an arrow)
  = p(time)
  * p(time flies | time)
  * p(time flies like | time flies)
  * p(time flies like an | time flies like)
  * p(time flies like an arrow | time flies like an)
Proof: p(x, y | x) = p(x | x) * p(y | x, x) = 1 * p(y | x)
Chain rule + independence assumptions
In this notation, the trigram assumption drops all but the last two words of each conditioning prefix:
p(time flies like an arrow)
  ≈ p(time)
  * p(time flies | time)
  * p(time flies like | time flies)
  * p(flies like an | flies like)
  * p(like an arrow | like an)
Proof: p(x, y | x) = p(x | x) * p(y | x, x) = 1 * p(y | x)
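A tiny sketch of this trigram factorization; the callback `logp(w, u, v)` (returning log p(w | u, v)) and the `<s>` boundary padding are assumptions of mine, not from the slides:

```python
import math

def trigram_sentence_logprob(words, logp):
    """Chain rule + trigram assumption:
    log p(w_1 ... w_n) = sum_i log p(w_i | w_{i-2}, w_{i-1})."""
    u, v = "<s>", "<s>"  # pad the left context at the sentence start
    total = 0.0
    for w in words:
        total += logp(w, u, v)
        u, v = v, w      # slide the two-word context window
    return total

# Toy usage with a uniform hypothetical model over a 5-word vocabulary:
p = math.exp(trigram_sentence_logprob("time flies like an arrow".split(),
                                      lambda w, u, v: math.log(1 / 5)))
print(p)  # (1/5)**5 = 0.00032
```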
Chain rule: One node at a time
Grow the tree one node at a time; by the chain rule, each expansion is conditioned on the entire partial tree generated so far:
p(tree | S)
  = p(S expands as NP VP | S)
  * p(NP expands as time | the partial tree so far)
  * p(VP expands as VP PP | the partial tree so far)
  * p(VP expands as flies | the partial tree so far)
  * ...
[Figure: the partial trees for "time flies like an arrow", grown one node at a time]
Chain rule + independence assumptions
Now assume each expansion depends only on the nonterminal being expanded, not on the rest of the partial tree:
p(tree | S)
  = p(S expands as NP VP | S)
  * p(NP expands as time | NP)
  * p(VP expands as VP PP | VP)
  * p(VP expands as flies | VP)
  * ...
Simplified notation
p(tree | S) = p(S → NP VP | S)
            * p(NP → time | NP)
            * p(VP → VP PP | VP)
            * p(VP → flies | VP)
            * ...
[Tree: (S (NP time) (VP (VP flies) (PP (P like) (NP (Det an) (N arrow)))))]
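A short sketch that scores this example tree as a product of rule probabilities; the tuple encoding of trees and every probability value below are invented for illustration:

```python
import math

# Hypothetical rule probabilities (illustrative numbers only):
rule_logprob = {
    ("S", ("NP", "VP")):  math.log(0.5),
    ("NP", ("time",)):    math.log(0.01),
    ("VP", ("VP", "PP")): math.log(0.2),
    ("VP", ("flies",)):   math.log(0.05),
    ("PP", ("P", "NP")):  math.log(1.0),
    ("P", ("like",)):     math.log(0.1),
    ("NP", ("Det", "N")): math.log(0.3),
    ("Det", ("an",)):     math.log(0.2),
    ("N", ("arrow",)):    math.log(0.02),
}

def tree_logprob(tree):
    """log p(tree) = sum over internal nodes of log p(X -> rhs | X).
    A tree is (label, child, ...) where each child is a subtree or a word."""
    label, *children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    return rule_logprob[(label, rhs)] + sum(
        tree_logprob(c) for c in children if isinstance(c, tuple))

tree = ("S", ("NP", "time"),
             ("VP", ("VP", "flies"),
                    ("PP", ("P", "like"),
                           ("NP", ("Det", "an"), ("N", "arrow")))))
print(math.exp(tree_logprob(tree)))  # product of the nine rule probabilities
```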
Three basic problems for HMMs
- Likelihood of the input (forward algorithm): how likely is it that the sentence "I love cat" occurs?
- Decoding (tagging) the input (Viterbi algorithm): what are the POS tags of "I love cat"?
- Estimation (learning): how to learn the model? Find the best model parameters.
  - Case 1: supervised (tags are annotated): maximum likelihood estimation (MLE)
  - Case 2: unsupervised (only unannotated text): forward-backward algorithm
Three basic problems for PCFGs (phrase structure trees)
- Likelihood of the input (inside algorithm): how likely is it that the sentence "I love cat" occurs?
- Decoding (parsing) the input (CKY algorithm): what is the parse tree of "I love cat"?
- Estimation (learning): how to learn the model? Find the best model parameters.
  - Case 1: supervised (trees are annotated): maximum likelihood estimation (MLE)
  - Case 2: unsupervised (only unannotated text): inside-outside algorithm
Probabilistic CKY: Inside algorithm
  initialize all entries of chart to 0
  for i := 1 to n
    for each rule R of the form X → word[i]
      chart[X, i-1, i] += prob(R)
  for width := 2 to n
    for start := 0 to n-width
      define end := start + width
      for mid := start+1 to end-1
        for each rule R of the form X → Y Z
          chart[X, start, end] += prob(R) * chart[Y, start, mid] * chart[Z, mid, end]
  return chart[ROOT, 0, n]
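The same loop skeleton as the Viterbi sketch above, with (max, +) on log-probabilities replaced by (+, *) on probabilities; the dictionary-based grammar encoding is again my own assumption:

```python
from collections import defaultdict

def cky_inside(words, lexicon, binary_rules, root="S"):
    """Inside algorithm: total probability of ALL parses of the sentence.

    lexicon: dict word -> {category X: p(X -> word)}
    binary_rules: dict (Y, Z) -> {parent X: p(X -> Y Z)}
    """
    n = len(words)
    chart = defaultdict(float)  # chart[(X, i, j)] = summed prob of the span
    for j in range(1, n + 1):
        for X, p in lexicon.get(words[j - 1], {}).items():
            chart[(X, j - 1, j)] += p
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for (Y, Z), parents in binary_rules.items():
                    pair = chart[(Y, start, mid)] * chart[(Z, mid, end)]
                    if pair == 0.0:
                        continue  # one of the two children is unbuildable
                    for X, p in parents.items():
                        chart[(X, start, end)] += p * pair
    return chart[(root, 0, n)]
```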
How to build a width-6 phrase
Grammar: S → NP VP; NP → Det N; NP → NP PP; VP → V NP; VP → VP PP; PP → P NP
A phrase spanning [1, 7] is built from two adjacent narrower phrases, one combination per split point:
  [1,7] = [1,2]+[2,7] or [1,3]+[3,7] or [1,4]+[4,7] or [1,5]+[5,7] or [1,6]+[6,7]
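The split enumeration, as a one-line Python helper (span indices only, no grammar):

```python
def splits(start, end):
    """All ways to break span [start, end] into two adjacent subspans."""
    return [((start, mid), (mid, end)) for mid in range(start + 1, end)]

print(splits(1, 7))
# [((1, 2), (2, 7)), ((1, 3), (3, 7)), ((1, 4), (4, 7)),
#  ((1, 5), (5, 7)), ((1, 6), (6, 7))]
```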
CKY: Recognition algorithm
  initialize all entries of chart to false
  for i := 1 to n
    for each rule R of the form X → word[i]
      chart[X, i-1, i] |= in_grammar(R)
  for width := 2 to n
    for start := 0 to n-width
      define end := start + width
      for mid := start+1 to end-1
        for each rule R of the form X → Y Z
          chart[X, start, end] |= in_grammar(R) & chart[Y, start, mid] & chart[Z, mid, end]
  return chart[ROOT, 0, n]
(Pay attention to the highlighted code: initialize to false, combine with |= and &.)
Weighted CKY: Viterbi algorithm (min-cost)
  initialize all entries of chart to ∞
  for i := 1 to n
    for each rule R of the form X → word[i]
      chart[X, i-1, i] min= weight(R)
  for width := 2 to n
    for start := 0 to n-width
      define end := start + width
      for mid := start+1 to end-1
        for each rule R of the form X → Y Z
          chart[X, start, end] min= weight(R) + chart[Y, start, mid] + chart[Z, mid, end]
  return chart[ROOT, 0, n]
(Pay attention to the highlighted code: initialize to ∞, combine with min= and +.)
Weighted CKY: Viterbi algorithm (max-prob)
  initialize all entries of chart to 0
  for i := 1 to n
    for each rule R of the form X → word[i]
      chart[X, i-1, i] max= weight(R)
  for width := 2 to n
    for start := 0 to n-width
      define end := start + width
      for mid := start+1 to end-1
        for each rule R of the form X → Y Z
          chart[X, start, end] max= weight(R) * chart[Y, start, mid] * chart[Z, mid, end]
  return chart[ROOT, 0, n]
(Pay attention to the highlighted code: initialize to 0, combine with max= and *.)
Weighted CKY: Viterbi algorithm (max-logprob)
  initialize all entries of chart to -∞
  for i := 1 to n
    for each rule R of the form X → word[i]
      chart[X, i-1, i] max= weight(R)
  for width := 2 to n
    for start := 0 to n-width
      define end := start + width
      for mid := start+1 to end-1
        for each rule R of the form X → Y Z
          chart[X, start, end] max= weight(R) + chart[Y, start, mid] + chart[Z, mid, end]
  return chart[ROOT, 0, n]
(Pay attention to the highlighted code: initialize to -∞, combine with max= and +.)
Probabilistic CKY: Inside algorithm (shown again, to compare with the variants above)
  initialize all entries of chart to 0
  for i := 1 to n
    for each rule R of the form X → word[i]
      chart[X, i-1, i] += prob(R)
  for width := 2 to n
    for start := 0 to n-width
      define end := start + width
      for mid := start+1 to end-1
        for each rule R of the form X → Y Z
          chart[X, start, end] += prob(R) * chart[Y, start, mid] * chart[Z, mid, end]
  return chart[ROOT, 0, n]
(Initialize to 0, combine with += and *.)
Semiring-weighted CKY: General algorithm!
⊕ is like "or"/∃: considers the alternative ways to build the X.
⊗ is like "and"/∀: combines all of several pieces into an X.
  initialize all entries of chart to the semiring zero
  for i := 1 to n
    for each rule R of the form X → word[i]
      chart[X, i-1, i] ⊕= semiring_weight(R)
  for width := 2 to n
    for start := 0 to n-width
      define end := start + width
      for mid := start+1 to end-1
        for each rule R of the form X → Y Z
          chart[X, start, end] ⊕= semiring_weight(R) ⊗ chart[Y, start, mid] ⊗ chart[Z, mid, end]
  return chart[ROOT, 0, n]
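A sketch of the generic version, parameterized by the semiring operations; the function signature is my own design, and each of the earlier variants is recovered by passing the right (zero, ⊕, ⊗):

```python
import math
from collections import defaultdict

def cky_semiring(words, lexicon, binary_rules, zero, plus, times, root="S"):
    """Semiring-weighted CKY. Instantiations (sketch):
      recognition: zero=False,     plus=or-like, times=and-like
      Viterbi:     zero=-math.inf, plus=max,     times=+  (log-probs)
      inside:      zero=0.0,       plus=+,       times=*  (probs)
    lexicon: dict word -> {category X: weight}
    binary_rules: dict (Y, Z) -> {parent X: weight}
    """
    n = len(words)
    chart = defaultdict(lambda: zero)
    for j in range(1, n + 1):
        for X, w in lexicon.get(words[j - 1], {}).items():
            chart[(X, j - 1, j)] = plus(chart[(X, j - 1, j)], w)
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for (Y, Z), parents in binary_rules.items():
                    pair = times(chart[(Y, start, mid)], chart[(Z, mid, end)])
                    for X, w in parents.items():
                        chart[(X, start, end)] = plus(chart[(X, start, end)],
                                                      times(w, pair))
    return chart[(root, 0, n)]
```

For example, with `import operator`, calling `cky_semiring(words, lex, rules, 0.0, operator.add, operator.mul)` computes the inside probability, while swapping in `(-math.inf, max, operator.add)` gives the Viterbi log-probability: only the three semiring arguments change, not the loop structure.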