Collins Parsing
Victor, Yùdōng Zhōu
Outline
• Introduction
• Basic Model
• Representation
• Calculation
• Three generative models
• Models
• Practical issues
• Evaluation
Introduction
• Michael Collins, PhD thesis, 1999
• Head-driven (lexical info)
• Statistical, supervised
• Input: tagged sentence
• Output: phrase-structure tree
Basic Model
• Task: given a sentence, its candidate trees and their probabilities, find the most probable parse tree
• In this model: T = (B, D), where
• B = set of baseNPs
• D = set of dependencies
Basic Model: An Example
Basic Model
• Dependency Set
• Step one: find the head child
• E.g. in S → <NP VP>, the head child is VP
• Step two: extract the head-modifier relations
• E.g. NP modifies VP, with rule S → <NP VP>
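Basic Model
• Notation: $AF(j) = (h_j, R_j)$ (word $w_j$ modifies word $w_{h_j}$ with relation $R_j$)
• E.g. $AF(1) = (5, \langle NP, S, VP \rangle)$, where $w_1$ = Smith and $w_5$ = announced
• $D = \{AF(1), AF(2), \ldots, AF(m)\}$
• $P(T \mid S) = P(B, D \mid S) = P(B \mid S) \times P(D \mid S, B)$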
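The two steps (find the head child, then extract head-modifier relations) can be sketched in Python as below. This is only an illustration: the head table `HEAD_TABLE`, the rightmost-child fallback and both function names are hypothetical and far simpler than the head rules used in the actual parser.

```python
# Hypothetical, tiny head table: parent label -> preferred head-child labels.
HEAD_TABLE = {"S": ["VP"], "VP": ["VBD", "VB"], "NP": ["NN", "NNP"]}

def find_head_child(parent, children):
    """Step one: return the index of the head child among the child labels."""
    for label in HEAD_TABLE.get(parent, []):
        if label in children:
            return children.index(label)
    return len(children) - 1  # assumed fallback: rightmost child

def extract_dependencies(parent, children, head_words):
    """Step two: return (modifier word, head word, <modifier, parent, head>)."""
    h = find_head_child(parent, children)
    deps = []
    for i, label in enumerate(children):
        if i != h:  # every non-head child modifies the head child
            relation = (label, parent, children[h])
            deps.append((head_words[i], head_words[h], relation))
    return deps

# Rule S -> <NP VP> with head words Smith (NP) and announced (VP).
deps = extract_dependencies("S", ["NP", "VP"], ["Smith", "announced"])
print(deps)  # [('Smith', 'announced', ('NP', 'S', 'VP'))]
```

The single triple corresponds to the slide's AF(1) = (5, <NP,S,VP>), with the head given here as a word rather than as a position in the sentence.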
Basic Model
• Calculation
• Dependency Probability: estimated from training data
Basic Model
• Calculation
• Dependency Probability: estimated from training data
• Distance Measure
• E.g. the second “of” will reduce the probability: Shaw, based in Dalton, Ga., has annual sales of about $1.18 billion, and has economies of scale and lower raw-material costs that are expected to boost the profitability of Armstrong's brands, sold under the Armstrong and Evans-Black names.
Basic Model
• Calculation
• Dependency Probability: estimated from training data
• Distance Measure
• Sparse Data
• Solved by smoothing
Three Generative Models
• Generative model: joint probability distribution, P(T,S)
• Discriminative model: conditional distribution, P(T|S)
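For a fixed sentence S, ranking trees by the joint probability gives the same best tree as ranking by the conditional probability, because P(S) does not depend on T. This is a standard identity, written out here for reference rather than taken from the slide:

```latex
\[
T_{\text{best}}
  = \arg\max_{T} P(T \mid S)
  = \arg\max_{T} \frac{P(T, S)}{P(S)}
  = \arg\max_{T} P(T, S)
\]
```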
Model 1
• Representation:
Model 1
• Calculation: PCFG-based
• Generate the head, $P_H(H \mid P, h)$
• Generate the right modifiers, $P_R(R_i(r_i) \mid P, h, H)$
• Until the STOP symbol, $R_{m+1}(r_{m+1})$
• Generate the left modifiers, $P_L(L_i(l_i) \mid P, h, H)$
• Example:
• S(bought) → NP(week) NP(Marks) VP(bought)
• $P_H(VP \mid S, bought) \times P_L(NP(Marks) \mid S, VP, bought) \times P_L(NP(week) \mid S, VP, bought) \times P_L(STOP \mid S, VP, bought) \times P_R(STOP \mid S, VP, bought)$
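The product in the example can be sketched as a small Python function. All parameter values below are invented purely for illustration (they are not estimates from any treebank), and the dictionary layout and the name `rule_probability` are assumptions of this sketch.

```python
# Hypothetical parameter tables, keyed by (outcome, P, H, h).
P_H = {("VP", "S", "bought"): 0.7}
P_L = {
    ("NP(Marks)", "S", "VP", "bought"): 0.2,
    ("NP(week)", "S", "VP", "bought"): 0.05,
    ("STOP", "S", "VP", "bought"): 0.6,
}
P_R = {("STOP", "S", "VP", "bought"): 0.8}

def rule_probability(parent, head, head_word, left_mods, right_mods):
    """Multiply the head, left-modifier and right-modifier probabilities.

    Modifiers are listed from the head outward; STOP ends each side.
    """
    ctx = (parent, head, head_word)
    p = P_H[(head, parent, head_word)]
    for mod in left_mods + ["STOP"]:
        p *= P_L[(mod,) + ctx]
    for mod in right_mods + ["STOP"]:
        p *= P_R[(mod,) + ctx]
    return p

# S(bought) -> NP(week) NP(Marks) VP(bought)
p = rule_probability("S", "VP", "bought", ["NP(Marks)", "NP(week)"], [])
print(p)  # 0.7 * 0.2 * 0.05 * 0.6 * 0.8
```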
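Model 1
• Calculation
• Distance Measure
• Previous formula (each modifier generated independently of the earlier ones): $P_R(R_i(r_i) \mid P, h, H, R_1(r_1), \ldots, R_{i-1}(r_{i-1})) = P_R(R_i(r_i) \mid P, h, H)$
• With the distance measure: $P_R(R_i(r_i) \mid P, h, H, R_1(r_1), \ldots, R_{i-1}(r_{i-1})) = P_R(R_i(r_i) \mid P, h, H, \mathit{distance}_r(i-1))$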
Model 2
• Complement/adjunct distinction
• Reasons for doing this while parsing:
• Lexical info/additional knowledge needed
• Helps parsing accuracy
Model 2
• Identifying complements in the Penn Treebank
• Rule-based
• One incorrect example:
• How to get the correct one?
Model 2
• Subcategorisation Frames
• Generate the head, $P_H(H \mid P, h)$
• Generate the left and right subcat frames, LC and RC, $P_{LC}(LC \mid P, H, h)$ and $P_{RC}(RC \mid P, H, h)$
• Generate the right modifiers (and then the left modifiers), $P_R(R_i(r_i) \mid P, h, H, \mathit{distance}_r(i-1), RC)$ ……
Model 2
• Subcat Frames: Example
• S(bought) → NP(week) NP-C(Marks) VP(bought)
• $P_H(VP \mid S, bought) \times P_{LC}(\{NP\text{-}C\} \mid S, VP, bought) \times P_{RC}(\{\} \mid S, VP, bought) \times P_L(NP\text{-}C(Marks) \mid S, VP, bought, \{NP\text{-}C\}) \times P_L(NP(week) \mid S, VP, bought, \{\}) \times P_L(STOP \mid S, VP, bought, \{\}) \times P_R(STOP \mid S, VP, bought, \{\})$
• $P_{LC}(\{NP\text{-}C, NP\text{-}C\} \mid S, VP, bought)$ will be quite small
• Thus the correct parse is preferred
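How the subcat frame shrinks as complements are generated can be sketched as follows. The code only lists the factors and their conditioning contexts for the left side of the head; it computes no probabilities. The function names are hypothetical, and treating a violated frame as an error (rather than as a zero-probability event) is a simplification made for this sketch.

```python
from collections import Counter

def fmt(frame):
    """Render a subcat multiset, e.g. {NP-C} or {}."""
    return "{" + ",".join(sorted(frame.elements())) + "}"

def left_side_factors(parent, head, head_word, left_mods, lc):
    """List the left-side factors; left_mods run from the head outward."""
    remaining = Counter(lc)
    ctx = f"{parent},{head},{head_word}"
    factors = [f"P_LC({fmt(remaining)} | {ctx})"]
    for label, word in left_mods:
        factors.append(f"P_L({label}({word}) | {ctx},{fmt(remaining)})")
        if label.endswith("-C"):
            if remaining[label] == 0:
                raise ValueError(f"complement {label} is not in the subcat frame")
            remaining[label] -= 1  # the requirement has now been satisfied
    if sum(remaining.values()) > 0:
        raise ValueError("STOP is not allowed while complements remain")
    factors.append(f"P_L(STOP | {ctx},{fmt(remaining)})")
    return factors

# S(bought) -> NP(week) NP-C(Marks) VP(bought), left subcat frame {NP-C}
factors = left_side_factors(
    "S", "VP", "bought", [("NP-C", "Marks"), ("NP", "week")], ["NP-C"]
)
for f in factors:
    print(f)
```

The printed contexts match the left-side terms of the example: the frame is {NP-C} when NP-C(Marks) is generated and empty for NP(week) and STOP.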
Model 3
• Traces and Wh-movement
• Example 1: The store (SBAR which TRACE bought Brooks Brothers)
• Example 2: The store (SBAR which Marks bought TRACE)
• Example 3: The store (SBAR which Marks bought Brooks Brothers from TRACE)
Model 3
• +gap feature added
• Introduce parameter $P_G(G \mid P, h, H)$, where G is Head, Left or Right
Practical Issues
• Smoothing
• E.g. $P_H$ estimation: $e_2 = P_H(H \mid P, t)$
• Final estimation:
• Unknown words
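The final-estimate formula did not survive on the slide. A minimal sketch, assuming the estimates at successive back-off levels are combined by nested linear interpolation (an assumption of this example, not something the slide confirms), could look like this. The function name, the estimates and the weights are all hypothetical.

```python
def interpolate(estimates, lambdas):
    """Combine back-off estimates, ordered from most to least specific.

    With three levels this computes
        lam1 * e1 + (1 - lam1) * (lam2 * e2 + (1 - lam2) * e3).
    """
    if len(lambdas) != len(estimates) - 1:
        raise ValueError("need one weight per level except the last")
    result = estimates[-1]  # start from the least specific estimate
    for e, lam in zip(reversed(estimates[:-1]), reversed(lambdas)):
        result = lam * e + (1 - lam) * result
    return result

# Invented values: e1 (most specific), e2 = P_H(H | P, t), e3 (least specific).
smoothed = interpolate([0.9, 0.6, 0.3], [0.5, 0.8])
print(smoothed)
```

The weights would normally depend on how often each context was seen in training, so that sparse contexts lean more on the less specific estimates.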
Evaluation
• Training data:
• Sections 02-21, Wall Street Journal portion of the Penn Treebank
• (Approximately 40,000 sentences)
• Testing data:
• Section 23 (2,416 sentences)
Evaluation
• PARSEVAL measures
• Labeled Precision = (number of correct constituents in proposed parse) / (number of constituents in proposed parse)
• Labeled Recall = (number of correct constituents in proposed parse) / (number of constituents in treebank parse)
• Crossing Brackets = number of constituents that violate constituent boundaries with a constituent in the treebank parse
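The three measures can be sketched for a single sentence as below. Representing a constituent as a `(label, start, end)` tuple with an exclusive end index is an assumption of this example, as are the function name and the toy parses; the sketch also assumes both parses are non-empty.

```python
from collections import Counter

def parseval(proposed, gold):
    """Return (labeled precision, labeled recall, crossing brackets)."""
    # Multiset intersection counts constituents with matching label and span.
    correct = sum((Counter(proposed) & Counter(gold)).values())
    precision = correct / len(proposed)
    recall = correct / len(gold)

    def crosses(a, b):
        # Spans overlap but neither contains the other.
        return a[1] < b[1] < a[2] < b[2] or b[1] < a[1] < b[2] < a[2]

    crossing = sum(1 for c in proposed if any(crosses(c, g) for g in gold))
    return precision, recall, crossing

# Toy parses over a five-word sentence (invented for illustration).
gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
proposed = [("S", 0, 5), ("NP", 1, 3), ("VP", 3, 5)]
precision, recall, crossing = parseval(proposed, gold)
print(precision, recall, crossing)
```

Here only S matches, so precision is 1/3 and recall is 1/4, and the proposed NP over words 1-2 is the one constituent that crosses a treebank bracket.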
Evaluation
• Collins 96 vs. Model 1
• Model 1 is better on unary rules and distance measures
• Model 2 vs. Model 3
• For the 436 trace cases in the testing data, Model 3 has precision/recall of 93.3%/90.1%
Q&A