Learning Greedy Policies for the Easy-First Framework
Jun Xie, Chao Ma, Janardhan Rao Doppa, Prashanth Mannem, Xiaoli Fern, Tom Dietterich, Prasad Tadepalli
Oregon State University
The Easy-First Framework: Example

Doc 1: "A 4.2 magnitude earthquake struck near eastern Sonoma County."
Doc 2: "A tremor struck in Sonoma County."
Mentions: "A 4.2 magnitude earthquake", "eastern Sonoma County", "Sonoma County", "A tremor"

1. Begin with every mention in its own cluster.
2. Evaluate all possible merges with a scoring function and select the highest-scoring merge (the easiest).
3. Repeat until the stopping condition is met.
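The three steps above can be sketched as a greedy loop. This is an illustrative reconstruction, not the authors' implementation: `score_merge` and the stopping `threshold` are assumed placeholders for the learned scoring function and stopping condition.

```python
def easy_first_cluster(mentions, score_merge, threshold=0.5):
    """Greedy easy-first clustering (a sketch).

    `score_merge(c1, c2)` is an assumed scoring function over two
    clusters; `threshold` is an assumed stopping condition.
    """
    # 1. Begin with every mention in its own cluster.
    clusters = [[m] for m in mentions]
    while len(clusters) > 1:
        # 2. Evaluate all possible merges and find the easiest one.
        best_score, best_pair = float("-inf"), None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = score_merge(clusters[i], clusters[j])
                if s > best_score:
                    best_score, best_pair = s, (i, j)
        # 3. Stop when even the easiest merge scores too low.
        if best_score < threshold:
            break
        i, j = best_pair
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters
```

The key design point of easy-first is the merge *order*: confident (easy) merges are committed first, so later, harder decisions can condition on the clusters they produced.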
Easy-First Training

[Figure: starting from an initial state S0, the learner follows the highest-scoring action f(.) through states S1, S2, S3, ..., ST; each state's candidate actions are labeled good or bad, and a weight update is performed whenever a bad action outscores the good ones, changing the action scores.]
Learning the Scoring Function

Possible goal: learn a scoring function such that, in every state, ALL good actions are ranked higher than all bad actions. This goal is over-constrained.

A better goal: learn a scoring function such that, in every state, ONE good action is ranked higher than all bad actions.
Proposed Objective for the Update

Goal: find a linear function that ranks one good action higher than all bad actions. This can be achieved by the set of constraints

$$\max_{g \in G} \; w \cdot x_g \;>\; w \cdot x_b + 1 \quad \text{for all } b \in B,$$

where G and B are the sets of good and bad actions in a state. Our objective uses the hinge loss to capture these constraints, with regularization to avoid overly aggressive updates:

$$w^* = \operatorname*{argmin}_{w} \; \tfrac{1}{2}\lVert w \rVert^2 \;+\; C \sum_{b \in B} \Big[\, 1 - \big( \max_{g \in G} w \cdot x_g - w \cdot x_b \big) \Big]_+$$
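The objective can be evaluated directly in a few lines. A minimal sketch, assuming feature vectors stacked row-wise in `X_good`/`X_bad` and an assumed trade-off constant `C`:

```python
import numpy as np

def bgvb_objective(w, X_good, X_bad, C=1.0):
    """Regularized hinge-loss objective (a sketch; C is assumed).

    For each bad action b we pay a hinge loss unless the *best* good
    action outscores it by a margin of 1:
        0.5*||w||^2 + C * sum_b max(0, 1 - (max_g w.x_g - w.x_b))
    """
    best_good = np.max(X_good @ w)      # max_g  w . x_g
    margins = best_good - X_bad @ w     # margin over each bad action
    hinge = np.maximum(0.0, 1.0 - margins)
    return 0.5 * w @ w + C * hinge.sum()
```

Note that only the single best good action has to beat every bad action, which is exactly the relaxed goal from the previous slide.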
Optimization

We use a Majorization-Minimization (MM) algorithm to find a locally optimal solution. In each MM iteration:

- Let $g^* = \operatorname{argmax}_{g \in G} \; w \cdot x_g$ be the current highest-scoring good action.
- Solve the following convex objective (via subgradient descent):

$$w^* = \operatorname*{argmin}_{w} \; \tfrac{1}{2}\lVert w \rVert^2 \;+\; C \sum_{b \in B} \big[\, 1 - ( w \cdot x_{g^*} - w \cdot x_b ) \big]_+$$
Contrast with Existing Methods

- Average-good vs. average-bad (AGAB)
- Best-good vs. best-bad (BGBB)
- Proposed method: best-good vs. violated-bad (BGVB)
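The three variants differ only in *which* good/bad action pair drives the weight update. A sketch of the selection logic; the names come from the slide, but the exact selection rules here (means, argmax, margin-1 violation test) are an assumed reconstruction:

```python
import numpy as np

def select_update_pair(w, X_good, X_bad, method):
    """Pick the (good, bad) feature vectors each update rule compares.

    A sketch of the three variants; the violation test for BGVB
    mirrors the margin-1 constraint in the proposed objective.
    """
    if method == "AGAB":   # average-good vs. average-bad
        return X_good.mean(axis=0), X_bad.mean(axis=0)
    if method == "BGBB":   # best-good vs. best-bad
        return (X_good[np.argmax(X_good @ w)],
                X_bad[np.argmax(X_bad @ w)])
    if method == "BGVB":   # best-good vs. violated bad actions
        best_good = X_good[np.argmax(X_good @ w)]
        violated = X_bad[X_bad @ w > best_good @ w - 1.0]
        return best_good, (violated.mean(axis=0)
                           if len(violated) else None)
    raise ValueError(f"unknown method: {method}")
```

The practical difference: BGVB only pushes against bad actions that actually violate the margin, so already-well-separated bad actions do not drag the update around.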
Experiment I: Cross-Document Entity and Event Coreference

Results on the EECB corpus (Lee et al., 2012), comparing BGBB, R-BGBB, BGVB, and R-BGVB against Lee et al. on the MUC, B-CUBED, CEAF_e, and CoNLL metrics.

[Figure: bar chart of scores (0-80) per metric and system.]
Experiment II: Within-Document Coreference

Results on OntoNotes, comparing BGBB, R-BGBB, BGVB, and R-BGVB on the MUC, B-CUBED, CEAF_e, and CoNLL metrics.

[Figure: bar chart of scores (0-80) per metric and system.]
Diagnostics

Some training statistics on the ACE 2004 corpus:

Approach | Total Steps | Mistakes | Recoveries | Recovery Rate | Accuracy
R-BGVB   | 50195       | 16228    | 4255       | 0.262         | 0.87
BGBB     | 50195       | 11625    | 4075       | 0.351         | 0.82

BGBB corrects errors more aggressively than R-BGVB, yet achieves lower accuracy. This is strong evidence that overfitting does happen with BGBB.
Contributions

- We precisely formulate the learning goal for easy-first as an optimization problem.
- We develop an efficient Majorization-Minimization algorithm to optimize the proposed objective.
- We achieve highly competitive results against the state of the art for both within- and cross-document coreference.