
Higher-order Coreference Resolution with Coarse-to-fine Inference



  1. Higher-order Coreference Resolution with Coarse-to-fine Inference. Kenton Lee*, Luheng He, Luke Zettlemoyer. University of Washington (*now at Google).

  2. Coreference Resolution. “It’s because of what both of you are doing to have things change. I think that’s what’s… Go ahead Linda. Thanks goes to you and to the media to help us. Absolutely. Obviously we couldn’t seem loud enough to bring the attention, so our hat is off to all of you as well.” Example from Wiseman et al. (2016).



  3. Recent Trends in Coreference Resolution. End-to-end models have achieved large improvements. Advantages: conceptually simple; minimal feature engineering. Disadvantages: computationally expensive; very little “reasoning” involved.

  4. Contributions. Address a modeling challenge: enable higher-order (multi-hop) coreference. Address a computational challenge: coarse-to-fine inference with a factored model.


  5. Existing Approach: Span-ranking Model (Lee et al. 2017, EMNLP). Consider all possible spans i in the document, 1 ≤ i ≤ n. Compute neural span representations h(i). Estimate a probability distribution P(y_i | h) over each span’s possible antecedents, including a dummy antecedent ε.
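A minimal NumPy sketch of this span-ranking decision, assuming toy random span representations and stand-in scorer weights; the real model uses trained FFNNs over LSTM-based span embeddings, and all names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def ffnn(x, W1, W2):
    # Tiny feed-forward scorer: one ReLU hidden layer, scalar output.
    return float(np.maximum(x @ W1, 0.0) @ W2)

# Toy setup: n candidate spans with d-dimensional representations h(i).
n, d, hidden = 4, 8, 16
h = rng.normal(size=(n, d))

# Randomly initialized weights stand in for trained parameters.
W1_m, W2_m = rng.normal(size=(d, hidden)), rng.normal(size=(hidden,))
W1_a, W2_a = rng.normal(size=(3 * d, hidden)), rng.normal(size=(hidden,))

def antecedent_distribution(i):
    # Score each antecedent j < i, plus the dummy antecedent ε with score 0
    # (meaning: span i starts a new cluster or is not a mention at all).
    scores = [0.0]
    for j in range(i):
        pair = np.concatenate([h[i], h[j], h[i] * h[j]])
        scores.append(ffnn(h[i], W1_m, W2_m)      # mention score for i
                      + ffnn(h[j], W1_m, W2_m)    # mention score for j
                      + ffnn(pair, W1_a, W2_a))   # pairwise antecedent score
    scores = np.array(scores)
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()   # P(y_i | h) over {ε, span 0, ..., span i-1}

print(antecedent_distribution(3))
```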

  6. Limitations of a First-order Model. “I think that’s what’s… Go ahead Linda. Thanks goes to you and to the media to help us. Absolutely. Obviously we couldn’t seem loud enough to bring the attention, so our hat is off to all of you as well.” Local information is not sufficient. Example from Wiseman et al. (2016).

  7. Limitations of a First-order Model. “I think that’s what’s… Go ahead Linda. Thanks goes to you and to the media to help us. Absolutely. Obviously we couldn’t seem loud enough to bring the attention, so our hat is off to all of you as well.” Global structure reveals the inconsistency. Example from Wiseman et al. (2016).

  8. Higher-order Model. Let span representations softly condition on previous decisions.

  9. Higher-order Model. Let span representations softly condition on previous decisions. For each iteration: estimate the antecedent distribution; attend over possible antecedents; merge every span representation with its expected antecedent.

  10. Higher-order Model. “I think that’s what’s… Go ahead Linda. Thanks goes to you and to the media to help us. Absolutely. Obviously we couldn’t seem loud enough to bring the attention, so our hat is off to all of you as well.” Antecedent distribution P(y_{all of you} | h) over the candidates {I, Linda, you, ε}.

  11. Higher-order Model. “I think that’s what’s… Go ahead Linda. Thanks goes to you and to the media to help us. Absolutely. Obviously we couldn’t seem loud enough to bring the attention, so our hat is off to all of you as well.” Antecedent distribution P(y_{you} | h) over the candidates {I, Linda, ε}.

  12. Higher-order Model. “I think that’s what’s… Go ahead Linda. Thanks goes to you and to the media to help us. Absolutely. Obviously we couldn’t seem loud enough to bring the attention, so our hat is off to all of you as well.” Antecedent distribution P(y_{you} | h) over the candidates {I, Linda, ε}. Learn a representation of “you” w.r.t. “I”.

  13. Higher-order Model. “I think that’s what’s… Go ahead Linda. Thanks goes to you and to the media to help us. Absolutely. Obviously we couldn’t seem loud enough to bring the attention, so our hat is off to all of you as well.” Antecedent distribution P(y_{all of you} | h) over the candidates {I, Linda, you, ε}.

  14. Higher-order Model. “I think that’s what’s… Go ahead Linda. Thanks goes to you and to the media to help us. Absolutely. Obviously we couldn’t seem loud enough to bring the attention, so our hat is off to all of you as well.” Initial distribution P(y_{all of you} | h) and refined distribution P(y_{all of you} | h′), each over the candidates {I, Linda, you, ε}.

  15. Higher-order Model. Let span representations softly condition on previous decisions. Iterative inference to compute h^n(i).

  16. Higher-order Model. Let span representations softly condition on previous decisions. Iterative inference to compute h^n(i). Base case (from the baseline): h^0(i) = h(i).

  17. Higher-order Model. Let span representations softly condition on previous decisions. Iterative inference to compute h^n(i). Base case (from the baseline): h^0(i) = h(i). Recursive case (attention mechanism): a^n(i) = Σ_{y_i} P(y_i | h^{n−1}) · h^{n−1}(y_i).

  18. Higher-order Model. Let span representations softly condition on previous decisions. Iterative inference to compute h^n(i). Base case (from the baseline): h^0(i) = h(i). Recursive case: a^n(i) = Σ_{y_i} P(y_i | h^{n−1}) · h^{n−1}(y_i) (attention mechanism); f^n(i) = σ(W [a^n(i), h^{n−1}(i)]) (forget gates).

  19. Higher-order Model. Let span representations softly condition on previous decisions. Iterative inference to compute h^n(i). Base case (from the baseline): h^0(i) = h(i). Recursive case: a^n(i) = Σ_{y_i} P(y_i | h^{n−1}) · h^{n−1}(y_i) (attention mechanism); f^n(i) = σ(W [a^n(i), h^{n−1}(i)]) (forget gates); h^n(i) = f^n(i) ∘ a^n(i) + (1 − f^n(i)) ∘ h^{n−1}(i).

  20. Higher-order Model. Let span representations softly condition on previous decisions. Iterative inference to compute h^n(i). Base case (from the baseline): h^0(i) = h(i). Recursive case: a^n(i) = Σ_{y_i} P(y_i | h^{n−1}) · h^{n−1}(y_i) (attention mechanism); f^n(i) = σ(W [a^n(i), h^{n−1}(i)]) (forget gates); h^n(i) = f^n(i) ∘ a^n(i) + (1 − f^n(i)) ∘ h^{n−1}(i). Final result: P(y_i | h^n).

  21. Higher-order Model. Let span representations softly condition on previous decisions. Iterative inference to compute h^n(i). Base case (from the baseline): h^0(i) = h(i). Recursive case: a^n(i) = Σ_{y_i} P(y_i | h^{n−1}) · h^{n−1}(y_i) (attention mechanism); f^n(i) = σ(W [a^n(i), h^{n−1}(i)]) (forget gates); h^n(i) = f^n(i) ∘ a^n(i) + (1 − f^n(i)) ∘ h^{n−1}(i). Final result: P(y_i | h^n). The final coreference decision conditions on clusters of size n + 2.
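Putting the recursion together, a minimal NumPy sketch of the refinement loop; the pairwise scorer, the weights, and the handling of ε are illustrative stand-ins (one simple choice is to let ε reuse the span’s own representation, so unattached spans stay put):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def refine(h, pair_scores, W):
    """One refinement iteration: h^{n-1} -> h^n.

    h:           (n, d) span representations h^{n-1}(i)
    pair_scores: (n, n) antecedent scores; entries with j >= i are masked
                 to a large negative number so only earlier spans compete
    W:           (2d, d) forget-gate weights
    """
    n, d = h.shape
    # Prepend the dummy antecedent ε with score 0; ε reuses the span's
    # own representation here (an illustrative choice, see lead-in).
    scores = np.concatenate([np.zeros((n, 1)), pair_scores], axis=1)
    P = softmax(scores)                               # P(y_i | h^{n-1})
    a = P[:, :1] * h + P[:, 1:] @ h                   # Σ_y P(y_i) h^{n-1}(y_i)
    f = 1.0 / (1.0 + np.exp(-np.concatenate([a, h], axis=-1) @ W))  # f^n(i)
    return f * a + (1.0 - f) * h                      # gated update: h^n(i)

# Toy usage: two refinement iterations with stand-in scores and weights.
rng = np.random.default_rng(0)
n, d = 5, 8
h = rng.normal(size=(n, d))
W = rng.normal(size=(2 * d, d))
mask = np.triu(np.full((n, n), -1e9))   # disallow j >= i as antecedents
for _ in range(2):
    h = refine(h, h @ h.T + mask, W)    # dot products stand in for s(i, j, h)
```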


  22. Recent Trends in Coreference Resolution. End-to-end models have achieved large improvements. Advantages: conceptually simple; minimal feature engineering. Disadvantages: computationally expensive; very little “reasoning” involved. A 2nd-order model already runs out of memory.

  23. Contributions. Address a modeling challenge: enable higher-order (multi-hop) coreference. Address a computational challenge: coarse-to-fine inference with a factored model.

  24. Computational Challenge. “It’s because of what both of you are doing to have things change.” Mention candidates shown just for exposition.

  25. Computational Challenge. “It’s because of what both of you are doing to have things change.” Mention candidates shown just for exposition. O(n^2) spans to consider in practice.

  26. Computational Challenge. “It’s because of what both of you are doing to have things change.” Mention candidates shown just for exposition. O(n^2) spans to consider in practice, and O(n^4) coreference links to consider.
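A quick back-of-the-envelope on why this blows up; the counts below assume every contiguous span is considered (the real system caps span width and prunes):

```python
# Rough scale of the search space for a document of T tokens:
# O(T^2) candidate spans, hence O(T^4) candidate coreference links.
T = 1_000                          # a modest document length
spans = T * (T + 1) // 2           # all contiguous spans: ~5 * 10^5
pairs = spans * (spans - 1) // 2   # all candidate links: ~1.25 * 10^11
print(f"{spans:,} spans, {pairs:,} links")
```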

  27. Coarse-to-fine Inference. P(y_i | h) = softmax(s(i, y_i, h)).

  28. Coarse-to-fine Inference. P(y_i | h) = softmax(s(i, y_i, h)). Existing scoring function: s(i, j, h) = FFNN(h(i)) + FFNN(h(j)) (mention scores) + FFNN(h(i), h(j), h(i) ∘ h(j)) (antecedent scores).

  29. Coarse-to-fine Inference. P(y_i | h) = softmax(s(i, y_i, h)). Coarse-to-fine scoring function: s(i, j, h) = FFNN(h(i)) + FFNN(h(j)) (mention scores) + h(i)^T W_c h(j) (cheap/inaccurate antecedent scores) + FFNN(h(i), h(j), h(i) ∘ h(j)) (antecedent scores).
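A minimal sketch of the resulting two-stage inference, assuming toy representations: the cheap bilinear term ranks every earlier span, only the top K survive, and the expensive FFNN term is computed for those survivors alone. K, the weights, and the stand-in scorers are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, K = 200, 8, 10
h = rng.normal(size=(n, d))           # toy span representations h(i)
W_c = rng.normal(size=(d, d))         # bilinear weights for the coarse score
mention = rng.normal(size=(n,))       # stand-in mention scores FFNN(h(i))

def expensive_antecedent_score(hi, hj):
    # Stand-in for FFNN(h(i), h(j), h(i) ∘ h(j)); anything costly goes here.
    return float(np.tanh(np.concatenate([hi, hj, hi * hj])).sum())

def antecedent_scores(i):
    # Coarse stage: mention scores plus cheap bilinear h(i)^T W_c h(j), j < i.
    coarse = mention[i] + mention[:i] + h[i] @ W_c @ h[:i].T
    survivors = np.argsort(coarse)[-K:]      # only K antecedents survive
    # Fine stage: add the expensive antecedent score for survivors only.
    return {int(j): coarse[j] + expensive_antecedent_score(h[i], h[j])
            for j in survivors}

print(antecedent_scores(150))
```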
