NLP Programming Tutorial 8 – Phrase Structure Parsing
Graham Neubig
Nara Institute of Science and Technology (NAIST)

Interpreting Language is Hard!
I saw a girl with a telescope
● “Parsing” resolves structural ambiguity in a formal way

Two Types of Parsing
● Dependency: focuses on relations between words
[Figure: dependency arcs over “I saw a girl with a telescope”]
● Phrase structure: focuses on identifying phrases and their recursive structure
[Figure: phrase structure tree for “I saw a girl with a telescope”]
(S (NP (PRP I)) (VP (VBD saw) (NP (DT a) (NN girl)) (PP (IN with) (NP (DT a) (NN telescope)))))

Recursive Structure?
[Figure: phrase structure tree for “I saw a girl with a telescope”, repeated over several slides; the later slides mark part of the tree with “???”]
(S (NP (PRP I)) (VP (VBD saw) (NP (DT a) (NN girl)) (PP (IN with) (NP (DT a) (NN telescope)))))

Different Structure, Different Interpretation
[Figure: alternative phrase structure tree in which the PP attaches inside the object NP, repeated over several slides; part of the tree is marked “???”]
(S (NP (PRP I)) (VP (VBD saw) (NP (NP (DT a) (NN girl)) (PP (IN with) (NP (DT a) (NN telescope))))))

Non-Terminals, Pre-Terminals, Terminals
● Non-Terminal: S, VP, PP, NP
● Pre-Terminal: PRP, VBD, DT, NN, IN
● Terminal: I, saw, a, girl, with, a, telescope

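The three node types can be told apart mechanically. The sketch below is only an illustration: it assumes the tree is stored as nested Python tuples of the form (label, child, ...), with words as plain strings, which is a representation chosen here and not one given in the slides. The function name `classify` is likewise hypothetical.

```python
# Assumed representation: (label, child, ...) tuples; a word is a plain string.
tree = ("S",
        ("NP", ("PRP", "I")),
        ("VP", ("VBD", "saw"),
               ("NP", ("DT", "a"), ("NN", "girl")),
               ("PP", ("IN", "with"),
                      ("NP", ("DT", "a"), ("NN", "telescope")))))


def classify(node, non_terminals, pre_terminals, terminals):
    """Collect labels by node type, visiting the tree top-down, left to right."""
    if isinstance(node, str):
        # A bare string is a word, i.e. a terminal.
        terminals.append(node)
        return
    label, children = node[0], node[1:]
    if len(children) == 1 and isinstance(children[0], str):
        # A node whose only child is a word is a pre-terminal (a POS tag).
        pre_terminals.append(label)
    else:
        non_terminals.append(label)
    for child in children:
        classify(child, non_terminals, pre_terminals, terminals)


non_terminals, pre_terminals, terminals = [], [], []
classify(tree, non_terminals, pre_terminals, terminals)
print("Non-terminals:", non_terminals)
print("Pre-terminals:", pre_terminals)
print("Terminals:    ", terminals)
```
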
Parsing as a Prediction Problem
● Given a sentence X, predict its parse tree Y
[Figure: phrase structure tree for “I saw a girl with a telescope”]
● A type of “structured” prediction (similar to POS tagging, word segmentation, etc.)

Probabilistic Model for Parsing
● Given a sentence X, predict the most probable parse tree Y
[Figure: phrase structure tree for “I saw a girl with a telescope”]
$\operatorname{argmax}_Y P(Y \mid X)$

Probabilistic Generative Model
● We assume some probabilistic model generated the parse tree Y and sentence X jointly: $P(Y, X)$
● The parse tree with highest joint probability given X also has the highest conditional probability, because $P(Y \mid X) = P(Y, X) / P(X)$ and $P(X)$ does not depend on Y
$\operatorname{argmax}_Y P(Y \mid X) = \operatorname{argmax}_Y P(Y, X)$

Probabilistic Context Free Grammar (PCFG)
● How do we define a joint probability for a parse tree?
[Figure: P( parse tree of “I saw a girl with a telescope” )]

Probabilistic Context Free Grammar (PCFG)
● PCFG: Define probability for each node
[Figure: parse tree annotated with rule probabilities such as P(S → NP VP), P(VP → VBD NP PP), P(PP → IN NP), P(NP → DT NN), P(PRP → “I”), P(NN → “telescope”)]
● Parse tree probability is the product of node probabilities
P(S → NP VP) * P(NP → PRP) * P(PRP → “I”)
* P(VP → VBD NP PP) * P(VBD → “saw”)
* P(NP → DT NN) * P(DT → “a”) * P(NN → “girl”)
* P(PP → IN NP) * P(IN → “with”)
* P(NP → DT NN) * P(DT → “a”) * P(NN → “telescope”)

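The product over nodes can be computed with a short recursion. This is a minimal sketch, not the tutorial's reference code: the nested-tuple tree format, the `rule_prob` table, and every probability value in it are made up here purely for illustration.

```python
import math

# Hypothetical rule probabilities; the numbers are invented for this example.
# Keys are (left-hand side, right-hand side tuple).
rule_prob = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("PRP",)): 0.3,
    ("NP", ("DT", "NN")): 0.5,
    ("VP", ("VBD", "NP", "PP")): 0.2,
    ("PP", ("IN", "NP")): 1.0,
    ("PRP", ("I",)): 0.5,
    ("VBD", ("saw",)): 0.4,
    ("DT", ("a",)): 0.6,
    ("NN", ("girl",)): 0.1,
    ("NN", ("telescope",)): 0.05,
    ("IN", ("with",)): 0.3,
}

# Assumed representation: (label, child, ...) tuples; a word is a plain string.
tree = ("S",
        ("NP", ("PRP", "I")),
        ("VP", ("VBD", "saw"),
               ("NP", ("DT", "a"), ("NN", "girl")),
               ("PP", ("IN", "with"),
                      ("NP", ("DT", "a"), ("NN", "telescope")))))


def tree_prob(node):
    """Multiply the probability of the rule at this node by those of its subtrees."""
    label, children = node[0], node[1:]
    # The right-hand side is the sequence of child labels (or the word itself).
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    prob = rule_prob[(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            prob *= tree_prob(child)
    return prob


print(tree_prob(tree))
```

Each of the 13 rules in the product above contributes exactly one factor, so the recursion visits every non-leaf node once.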
Probabilistic Parsing
● Given this model, parsing is the algorithm to find $\operatorname{argmax}_Y P(Y, X)$
● Can we use the Viterbi algorithm as we did before?
● Answer: No!
● Reason: Parse candidates are not graphs, but hypergraphs.

What is a Hypergraph?
● Let's say we have two parse trees
● Most parts are the same!
[Figure: two parse trees for “I saw a girl with a telescope”, each node labeled with its span]
(S[0,7] (NP[0,1] (PRP[0,1] I)) (VP[1,7] (VBD[1,2] saw) (NP[2,7] (NP[2,4] (DT[2,3] a) (NN[3,4] girl)) (PP[4,7] (IN[4,5] with) (NP[5,7] (DT[5,6] a) (NN[6,7] telescope))))))
(S[0,7] (NP[0,1] (PRP[0,1] I)) (VP[1,7] (VBD[1,2] saw) (NP[2,4] (DT[2,3] a) (NN[3,4] girl)) (PP[4,7] (IN[4,5] with) (NP[5,7] (DT[5,6] a) (NN[6,7] telescope)))))

What is a Hypergraph?
● Create a graph with all the nodes + all the edges that are the same in both trees
[Figure: nodes S 0,7; VP 1,7; NP 2,7; PP 4,7; NP 0,1; NP 2,4; NP 5,7; PRP 0,1; VBD 1,2; DT 2,3; NN 3,4; IN 4,5; DT 5,6; NN 6,7 over “I saw a girl with a telescope”]

What is a Hypergraph?
● With the edges in the first tree:
[Figure: the graph with the first tree's edges added]
● With the edges in the second tree:
[Figure: the graph with the second tree's edges added]
● With the edges in the first and second trees:
[Figure: the graph with both trees' edges added, in red and blue]
● Two choices!
  ● Choose red, get the first tree
  ● Choose blue, get the second tree

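Merging the two trees can be sketched in a few lines. The code below is an illustration under assumptions: trees are nested tuples, a node is a (label, start, end) triple, and an edge is a (head, tail) pair. The names `collect_edges`, `tree_vp_attach` and `tree_np_attach` are hypothetical and do not come from the slides.

```python
from collections import Counter

# The two readings of the sentence, as (label, child, ...) tuples.
tree_vp_attach = ("S", ("NP", ("PRP", "I")),
    ("VP", ("VBD", "saw"),
           ("NP", ("DT", "a"), ("NN", "girl")),
           ("PP", ("IN", "with"), ("NP", ("DT", "a"), ("NN", "telescope")))))
tree_np_attach = ("S", ("NP", ("PRP", "I")),
    ("VP", ("VBD", "saw"),
           ("NP", ("NP", ("DT", "a"), ("NN", "girl")),
                  ("PP", ("IN", "with"), ("NP", ("DT", "a"), ("NN", "telescope"))))))


def collect_edges(tree, start, out):
    """Add the tree's edges to the set `out`; return (node, end position)."""
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        # Pre-terminal: covers exactly one word.
        node = (label, start, start + 1)
        out.add((node, (children[0],)))
        return node, start + 1
    tails, pos = [], start
    for child in children:
        child_node, pos = collect_edges(child, pos, out)
        tails.append(child_node)
    node = (label, start, pos)
    out.add((node, tuple(tails)))
    return node, pos


edges_a, edges_b = set(), set()
collect_edges(tree_vp_attach, 0, edges_a)
collect_edges(tree_np_attach, 0, edges_b)

shared = edges_a & edges_b       # edges that are the same in both trees
hypergraph = edges_a | edges_b   # all edges from either tree

# Heads with more than one incoming edge are the points where a choice exists.
incoming = Counter(head for head, tail in hypergraph)
choices = {head: n for head, n in incoming.items() if n > 1}
print(len(shared), len(hypergraph), choices)
```

Because the edges live in sets, identical edges from the two trees collapse automatically, and the only node left with two alternatives is the VP spanning 1 to 7.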
Why a “Hyper”graph?
● The “degree” of an edge is the number of children
  ● Degree 1: PRP 0,1 → I; VBD 1,2 → saw
  ● Degree 2: VP 1,7 → VBD 1,2, NP 2,7
  ● Degree 3: VP 1,7 → VBD 1,2, NP 2,4, PP 4,7
● The degree of a hypergraph is the maximum degree of all its edges
● A graph is a hypergraph of degree 1!
[Figure: example graph with nodes 0, 1, 2, 3 and edge weights 1.4, 2.3, 4.0, 2.5, 2.1]

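The definition of degree translates directly into code. In this sketch an edge is assumed to be a (head, tail) pair whose tail is a tuple of children; that representation, the function names, and the wiring of `graph_edges` are assumptions made here for illustration.

```python
# Three edges taken from the slide, as (head, tail) pairs.
slide_edges = [
    (("PRP", 0, 1), ("I",)),                                       # degree 1
    (("VP", 1, 7), (("VBD", 1, 2), ("NP", 2, 7))),                  # degree 2
    (("VP", 1, 7), (("VBD", 1, 2), ("NP", 2, 4), ("PP", 4, 7))),    # degree 3
]


def edge_degree(edge):
    """The degree of an edge is its number of children (tail nodes)."""
    head, tail = edge
    return len(tail)


def hypergraph_degree(edges):
    """The degree of a hypergraph is the maximum degree over its edges."""
    return max(edge_degree(edge) for edge in edges)


# An ordinary graph over nodes 0..3 (hypothetical wiring): every edge has
# exactly one tail node, so the whole graph has degree 1.
graph_edges = [(1, (0,)), (2, (0,)), (2, (1,)), (3, (1,)), (3, (2,))]

print([edge_degree(edge) for edge in slide_edges])
print(hypergraph_degree(slide_edges))
print(hypergraph_degree(graph_edges))
```
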
Weighted Hypergraphs
● Like graphs:
  ● can add weights to hypergraph edges
  ● use negative log probability of rule
[Figure: the hypergraph with each edge weighted by the negative log probability of its rule, e.g. -log(P(S → NP VP)), -log(P(VP → VBD NP)), -log(P(VP → VBD NP PP)), -log(P(PRP → “I”))]

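Negative log weights turn the product of rule probabilities into a sum, so a smaller total weight means a more probable tree. The sketch below checks this on a few rules; the probability values are hypothetical, and the rule strings are just dictionary keys chosen for this example.

```python
import math

# Hypothetical probabilities for the rules labeled in the figure.
rule_prob = {
    "S -> NP VP": 1.0,
    "VP -> VBD NP": 0.5,
    "VP -> VBD NP PP": 0.25,
    'PRP -> "I"': 0.5,
}

# Edge weight = negative log probability of the rule.
weight = {rule: -math.log(p) for rule, p in rule_prob.items()}

# Summing weights along a set of edges equals -log of the probability product.
rules_used = ["S -> NP VP", "VP -> VBD NP PP", 'PRP -> "I"']
total_weight = sum(weight[rule] for rule in rules_used)
product = math.prod(rule_prob[rule] for rule in rules_used)

print(total_weight, -math.log(product))
# A more probable rule gets a smaller weight.
print(weight["VP -> VBD NP"] < weight["VP -> VBD NP PP"])
```
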
Solving Hypergraphs
● Parsing = finding minimum path through a hypergraph
● We can do this for graphs with the Viterbi algorithm
  ● Forward: Calculate score of best path to each state
  ● Backward: Recover the best path

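For an ordinary graph, the forward and backward steps might look like the sketch below. It covers only the graph case named above, not hypergraphs. The edge list is an assumption: it uses small made-up weights on nodes 0 to 3, and it requires every edge to go from a lower-numbered node to a higher-numbered one so that nodes can be processed in order.

```python
# Hypothetical weighted graph: (previous node, next node, weight).
edges = [(0, 1, 2.5), (0, 2, 1.4), (1, 2, 4.0), (1, 3, 2.1), (2, 3, 2.3)]


def viterbi(edges, num_nodes):
    """Return (best score, best path) from node 0 to node num_nodes - 1."""
    best_score = [float("inf")] * num_nodes
    best_edge = [None] * num_nodes
    best_score[0] = 0.0

    # Forward: calculate the score of the best path to each node.
    # Assumes prev < nxt for every edge, so earlier nodes are final first.
    for node in range(1, num_nodes):
        for prev, nxt, w in edges:
            if nxt == node and best_score[prev] + w < best_score[node]:
                best_score[node] = best_score[prev] + w
                best_edge[node] = (prev, nxt, w)

    # Backward: follow the stored best edges from the last node to node 0.
    path = [num_nodes - 1]
    while best_edge[path[-1]] is not None:
        path.append(best_edge[path[-1]][0])
    path.reverse()
    return best_score[num_nodes - 1], path


score, path = viterbi(edges, 4)
print(score, path)
```

With these made-up weights the best path is 0 → 2 → 3, since 1.4 + 2.3 is smaller than either route through node 1.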