CS388 Natural Language Processing: Coreference Resolution




  1. CS388: Natural Language Processing. Coreference Resolution. Greg Durrett

  2. Road Map. Text → Text Analysis → Annotations → Applications. Annotations: POS tagging, syntactic parsing, NER, coreference resolution. Applications: summarize, extract information, answer questions, identify sentiment, translate ‣ Analysis: syntax, semantics, discourse, pragmatics ‣ Coreference: discourse + pragmatics

  3. Discourse Analysis. President Barack Obama received the Serve America Act after Congress’s vote. He signed the bill last Thursday. The president said it would greatly increase service opportunities for the American people.

  4. Discourse Analysis. President Barack Obama received the Serve America Act after Congress’s vote. He signed the bill last Thursday. The president said it would greatly increase service opportunities for the American people. Slide credit: Aria Haghighi

  5. Discourse Analysis. Layers shown on the slide: Discourse (rhetorical, temporal structure), Events, Entities, Text. Slide credit: Aria Haghighi

  6. Entities. President Barack Obama received the Serve America Act after Congress’s vote. He signed the bill last Thursday. The president said it would greatly increase service opportunities for the American people. ‣ Entities are real-world things that can be resolved to an entry in a knowledge base (Wikipedia); a text can refer to them repeatedly. Cluster 1: en.wikipedia.org/wiki/Barack_Obama Cluster 2: …/wiki/Edward_M.Kennedy_Serve_America_Act Cluster 3: …/wiki/United_States_Congress

  7. Coreference Resolution ‣ Input: text with mentions. President Barack Obama received the Serve America Act after Congress’s vote. He signed the bill last Thursday. The president said it would greatly increase service opportunities for the American people. ‣ Output: a clustering of those mentions (the slide repeats the passage with the mentions grouped by entity)

  8. Coreference Resolution ‣ Input: text with mentions. President Barack Obama received the Serve America Act after Congress’s vote. He signed the bill last Thursday. The president said it would greatly increase service opportunities for the American people. ‣ Alternatively: answer “who is my antecedent?” for each anaphor. Example: anaphor He; possible antecedents President Barack Obama, the Serve America Act, Congress’s; antecedent (coreferent): President Barack Obama
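The clustering view and the antecedent view describe the same output: once every mention has either picked an earlier mention or started a new entity, following the links recovers the clusters. The sketch below illustrates that conversion. The function name `clusters_from_antecedents` and the use of `None` to mean "new entity" are assumptions made for illustration, not something from the lecture.

```python
def clusters_from_antecedents(mentions, antecedents):
    """Group mentions into entity clusters by following antecedent links.

    antecedents[i] is the index of an earlier mention that mention i
    corefers with, or None if mention i starts a new entity.
    """
    cluster_of = {}   # mention index -> cluster id
    clusters = []     # cluster id -> list of mention strings
    for i, mention in enumerate(mentions):
        j = antecedents[i]
        if j is None:
            # No antecedent: open a new cluster for this mention.
            cluster_of[i] = len(clusters)
            clusters.append([mention])
        else:
            if j >= i:
                raise ValueError("an antecedent must precede its anaphor")
            # Join the cluster that the antecedent already belongs to.
            cluster_of[i] = cluster_of[j]
            clusters[cluster_of[i]].append(mention)
    return clusters


mentions = ["President Barack Obama", "the Serve America Act", "Congress's", "He"]
antecedents = [None, None, None, 0]  # "He" points back to mention 0
result = clusters_from_antecedents(mentions, antecedents)
print(result)
```

Because each mention only ever points backwards, a single left-to-right pass is enough; no union-find structure is needed for this encoding.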

  9. Outline ‣ Linguistic phenomena in coreference ‣ Building coreference models ‣ Incorporating world knowledge

  10. Phenomena in Coreference

  11. Pragmatics 101. Fully explicit version: President Barack Obama received the Serve America Act after Congress’s vote.
 President Barack Obama signed the Serve America Act last Thursday.
 President Barack Obama said… Natural version: President Barack Obama received the Serve America Act after Congress’s vote.
 He signed the bill last Thursday.
 The president said… ‣ When we speak or write, we have an idea of what’s clear to the listener, and we communicate more efficiently as a result

  12. Pragmatics 101. Mention types, from most specific to most salience required: Proper Name (President Barack Obama), Nominal (the president), Pronoun (he). President Barack Obama received the Serve America Act after Congress’s vote.
 He signed the bill last Thursday.
 The president said… ‣ Proper, nominal, and pronominal mentions all resolve differently

  13. Proper Mentions ‣ Introduce new entities and give information; identify entities unambiguously (mostly). Examples: President Barack Obama, 44th president of the United States, …; President Obama; Obama ‣ When might there be ambiguity? Dell founded what would become his eponymous company in 1984. Dell was later taken private in a leveraged buyout. ‣ Main cues: lexical overlap, semantic type agreement

  14. Pronouns. President Barack Obama received the Serve America Act after Congress’s vote. He … President Obama met with Chancellor Merkel. He … The policeman ticketed the driver after he noticed a broken taillight / after he ran the stop sign. This is the house where the bomb was built into the boat that carried it. ‣ Main cues: salience, number/gender agreement, event semantics/commonsense knowledge

  15. Nominal Mentions ‣ Basic lexical semantics/hypernymy: President Obama … The president; Serve America Act … The bill ‣ World knowledge: NBC … The network ‣ Combines the two: Barack Obama and Angela Merkel … The leaders (Obama is a president, Merkel is a chancellor, the common type of those is leader) ‣ Main cues: semantic type agreement/world knowledge, salience
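The "common type" reasoning for "The leaders" can be pictured as walking up a type hierarchy from each name until the chains meet. The sketch below uses a hand-written toy table containing only the relations stated on the slide. The table `HYPERNYM_OF` and the helpers `ancestors` and `common_type` are hypothetical names for illustration; a real system would draw on a lexical resource or knowledge base.

```python
def ancestors(term, hypernym_of):
    """Return term followed by its chain of hypernyms, most specific first."""
    chain = [term]
    # Stop at the top of the hierarchy, or if the table ever loops back.
    while chain[-1] in hypernym_of and hypernym_of[chain[-1]] not in chain:
        chain.append(hypernym_of[chain[-1]])
    return chain


def common_type(terms, hypernym_of):
    """Return the most specific type shared by all terms, or None."""
    first, *rest = [ancestors(t, hypernym_of) for t in terms]
    for candidate in first:
        if all(candidate in chain for chain in rest):
            return candidate
    return None


# Toy table: only the relations mentioned on the slide.
HYPERNYM_OF = {
    "Barack Obama": "president",
    "Angela Merkel": "chancellor",
    "president": "leader",
    "chancellor": "leader",
    "NBC": "network",
}

leaders = common_type(["Barack Obama", "Angela Merkel"], HYPERNYM_OF)
print(leaders)
```

With this table, NBC and Barack Obama share no type, so `common_type` returns `None` for that pair.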

  16. Phenomena ‣ Salience: distance features ‣ Semantic compatibility ‣ Gender: he vs. she ‣ Animacy: he/she vs. it ‣ Semantic type: Michael Dell (person) vs. Dell (company) ‣ Hypernymy: an act is a bill ‣ Commonsense knowledge: a bomb can be carried, a boat cannot be ‣ World knowledge: Merkel is a leader ‣ Coreference is a challenging NLP problem! Several different subproblems, lots of sources of information that we need to consider

  17. Building Coreference Models

  18. Rule-based Systems ‣ Filter possible antecedents based on syntactic and semantic information, resolve to the closest one. Example: anaphor He; candidate antecedents President Barack Obama, the Serve America Act (inanimate), Congress’s (inanimate) ‣ Semantic information used: number and gender (automatically scraped), head word / string match, some world knowledge (NBC = network) Haghighi and Klein (2008)
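The filter-then-pick-closest recipe can be sketched in a few lines. This is a minimal illustration, not the cited system: the `Mention` class, the attribute values, and the `compatible` and `resolve` helpers are all assumptions, and the attributes are assigned by hand here rather than scraped.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    text: str
    number: str    # "singular" or "plural"
    gender: str    # "male", "female", "neuter", or "unknown"
    animate: bool


def compatible(anaphor, candidate):
    """Hard agreement filters: number, animacy, and gender when both are known."""
    if anaphor.number != candidate.number:
        return False
    if anaphor.animate != candidate.animate:
        return False
    both_known = "unknown" not in (anaphor.gender, candidate.gender)
    if both_known and anaphor.gender != candidate.gender:
        return False
    return True


def resolve(anaphor, previous_mentions):
    """Return the closest preceding mention that passes the filters, else None."""
    for candidate in reversed(previous_mentions):  # closest mention first
        if compatible(anaphor, candidate):
            return candidate
    return None


previous = [
    Mention("President Barack Obama", "singular", "male", True),
    Mention("the Serve America Act", "singular", "neuter", False),
    Mention("Congress's", "singular", "neuter", False),
]
he = Mention("He", "singular", "male", True)
antecedent = resolve(he, previous)
print(antecedent.text if antecedent else None)
```

Both inanimate candidates are filtered out, so the search falls through to the furthest mention even though it is not the closest one.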

  19. Entity-centric Rule-based Systems. [FEMALE] Michelle Obama promoted her fitness and nutrition program on Thursday. [FEMALE] Obama gave a speech on the “Let’s Move!” program, praising Sam Kass. He… ‣ Coreference depends on identity of Obama, which in turn depends on other coreference links ‣ Need to make decisions globally: entity-centric, “sieve-based” coreference, “easy-first” systems all rely on earlier decisions to do this. Rahman and Ng (2009), Raghunathan et al. (2010), Lee et al. (2011)
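The difference between a pairwise check and an entity-level check can be made concrete with the example on this slide. In the sketch below, the dictionaries, the `"unknown"` gender value, and the helpers `compatible` and `cluster_gender` are illustrative assumptions. It assumes an earlier decision has already linked the bare "Obama" to Michelle Obama.

```python
def compatible(gender_a, gender_b):
    """Genders clash only when both are known and different."""
    return "unknown" in (gender_a, gender_b) or gender_a == gender_b


def cluster_gender(cluster):
    """Pool the known genders of a cluster's mentions into one value."""
    known = {m["gender"] for m in cluster if m["gender"] != "unknown"}
    return known.pop() if len(known) == 1 else "unknown"


michelle = {"text": "Michelle Obama", "gender": "female"}
her = {"text": "her", "gender": "female"}
obama = {"text": "Obama", "gender": "unknown"}  # bare surname: gender not evident
he = {"text": "He", "gender": "male"}

# Assumed earlier decision: "Obama" was placed in Michelle Obama's entity.
entity = [michelle, her, obama]

pairwise_ok = compatible(he["gender"], obama["gender"])       # looks at "Obama" alone
entity_ok = compatible(he["gender"], cluster_gender(entity))  # looks at the whole entity
print(pairwise_ok, entity_ok)
```

Looking at "Obama" in isolation, nothing blocks a link from "He"; looking at the entity built so far, the pooled FEMALE attribute rules it out.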

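  20. Mention-Ranking Systems. Each mention i in the document has a variable a_i that selects an earlier mention or New. For the mentions President Barack Obama (1), the Serve America Act (2), Congress’s (3), He (4): a_1 ∈ {New}, a_2 ∈ {1, New}, a_3 ∈ {1, 2, New}, a_4 ∈ {1, 2, 3, New} ‣ Log-linear model: $p(a_i = j \mid x) \propto \exp(w^\top f(i, j, x))$, where i is the anaphor index, j is the antecedent index, and f(i, j, x) gives features of the mention pair + document. Denis and Baldridge (2008), Fernandes et al. (2012), Durrett and Klein (2013)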
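To make the log-linear form concrete, the sketch below scores each candidate (including New) with a dot product between a weight vector and sparse features, then normalizes with a softmax. The feature names and weight values are invented for illustration and are not learned or taken from any of the cited systems; `antecedent_distribution` is a hypothetical helper name.

```python
import math


def antecedent_distribution(weights, candidate_features):
    """Return p(a_i = j | x) for each candidate j under a log-linear model.

    candidate_features maps each candidate (an earlier mention index or
    "New") to a dict of feature name -> value for the pair (i, j).
    """
    scores = {
        cand: sum(weights.get(name, 0.0) * value for name, value in feats.items())
        for cand, feats in candidate_features.items()
    }
    # Subtract the max score before exponentiating for numerical stability.
    top = max(scores.values())
    exps = {cand: math.exp(score - top) for cand, score in scores.items()}
    total = sum(exps.values())
    return {cand: value / total for cand, value in exps.items()}


# Hypothetical weights; a real model learns these from annotated data.
weights = {"gender_match": 2.0, "animacy_mismatch": -3.0, "new_pronoun": -1.0}

# Candidates for the anaphor "He" (mention 4): mentions 1-3 or New.
candidates = {
    1: {"gender_match": 1.0},        # President Barack Obama
    2: {"animacy_mismatch": 1.0},    # the Serve America Act
    3: {"animacy_mismatch": 1.0},    # Congress's
    "New": {"new_pronoun": 1.0},
}

probs = antecedent_distribution(weights, candidates)
best = max(probs, key=probs.get)
print(best, probs)
```

Treating New as just another candidate is what lets a single softmax decide both whether a mention is anaphoric and which antecedent it takes.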

  21. Features for Learning-based Systems. Example: President Barack Obama received the Serve… . He signed the bill. Antecedent: President Barack Obama (PROPER, MALE, SINGULAR); anaphor: He (PRONOUN, MALE, SINGULAR). Feature groups labeled on the slide: salience, pragmatics, semantic compatibility. Pair features: Ment. distance = 3, Sent. distance = 1, Antecedent length = 3, Anaph length = 1, No string match, No head match, Obama-he, MALE-he, X received-he, PROPER-X signed. [new] features (anaphor only): PRONOUN, he, X signed, . X, Length = 1. Denis and Baldridge (2008), Fernandes et al. (2012), Durrett and Klein (2013)
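Several of the surface features listed on this slide can be computed directly from the two mentions. The sketch below is a hedged illustration: the mention dictionaries, the feature-name strings, and the `pair_features` helper are assumptions, and the head words are supplied by hand rather than found by a parser.

```python
def pair_features(antecedent, anaphor):
    """Return indicator-feature names for an (antecedent, anaphor) pair."""
    feats = [
        f"ment_dist={anaphor['index'] - antecedent['index']}",
        f"sent_dist={anaphor['sent'] - antecedent['sent']}",
        f"ante_len={len(antecedent['tokens'])}",
        f"anaph_len={len(anaphor['tokens'])}",
    ]
    # Case-insensitive comparison of the full token sequences.
    ante_lower = [t.lower() for t in antecedent["tokens"]]
    anaph_lower = [t.lower() for t in anaphor["tokens"]]
    feats.append("string_match" if ante_lower == anaph_lower else "no_string_match")
    # Comparison of the (hand-supplied) head words.
    same_head = antecedent["head"].lower() == anaphor["head"].lower()
    feats.append("head_match" if same_head else "no_head_match")
    # Lexicalized conjunction of the two heads.
    feats.append(f"heads={antecedent['head']}-{anaphor['head'].lower()}")
    return feats


obama = {"index": 0, "sent": 0, "tokens": ["President", "Barack", "Obama"], "head": "Obama"}
he = {"index": 3, "sent": 1, "tokens": ["He"], "head": "He"}
features = pair_features(obama, he)
print(features)
```

Each string would index one weight in a log-linear model, so the distances and lengths are treated as categorical values here rather than as real numbers.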

  22. Neural Network Models. Architecture: antecedent feats, anaphor feats, and pair feats (distance, head match, etc.) → feedforward neural network → score. Example: President Barack Obama received the Serve… . He signed the bill ‣ Similar inputs to log-linear model ‣ Word embeddings + nonlinear layers capture more complex interactions between mention and antecedent. Clark and Manning (2016)
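Structurally, the scorer concatenates the three feature vectors and passes them through nonlinear layers to produce one number per candidate pair. The pure-Python sketch below shows that data flow with a single hidden layer. The dimensions and parameter values are made up and untrained, and the helper names `affine`, `relu`, and `score_pair` are assumptions; it is not the cited architecture.

```python
def affine(matrix, bias, vector):
    """Compute matrix @ vector + bias using plain Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(matrix, bias)]


def relu(vector):
    """Elementwise rectified linear unit."""
    return [max(0.0, x) for x in vector]


def score_pair(antecedent_feats, anaphor_feats, pair_feats, params):
    """Score one (antecedent, anaphor) pair with a one-hidden-layer network."""
    x = antecedent_feats + anaphor_feats + pair_feats  # list concatenation
    hidden = relu(affine(params["W1"], params["b1"], x))
    return affine([params["w2"]], [params["b2"]], hidden)[0]


# Untrained toy parameters: 4 inputs -> 2 hidden units -> 1 score.
params = {
    "W1": [[1.0, -1.0, 0.5, 0.0],
           [0.0, 1.0, 0.0, -0.5]],
    "b1": [0.0, 0.0],
    "w2": [1.0, -1.0],
    "b2": 0.0,
}

score = score_pair([1.0], [1.0], [1.0, 0.0], params)
print(score)
```

In a trained system the input vectors would include word embeddings for both mentions, and the scores for all candidates of an anaphor would be compared to pick an antecedent.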

  23. Performance (bar chart, CoNLL F1): Stanford Rule-based (2010) 55.6; Berkeley Log-linear (2014) 61.7; Stanford Deep Coref (2016) 65.6; Human 78.0

  24. Incorporating World Knowledge

  25. Accuracy Per Mention Class (Berkeley) ‣ Anaphoric pronouns: 72.0 (Obama … he) ‣ Referring, head match: 82.7 (the U.S. president … president) ‣ Referring, no head match: 6.2 (David Cameron … prime minister; the slide also marks this class with “6.2%”)


  28. Phenomena ‣ Salience ‣ Semantic compatibility ‣ Gender ‣ Animacy ‣ Semantic type ‣ Hypernymy ‣ Commonsense knowledge ‣ World knowledge. Annotations: basic features get these (salience, gender, animacy); word embeddings sort of do these (semantic type, hypernymy)

  29. Word Embeddings. Example contexts for Russia: “Russia’s economy has been sluggish…”, “…suspected collusion with Russia. The…”, “…a trip to Russia in the springtime”. Other words shown: China, Iran, nation ‣ Russia is not Iran! Possibly compatible pairs are less similar than many incompatible pairs ‣ Word vectors capture topical similarity, are not trained to capture referential identity

  30. Phenomena ‣ Salience ‣ Semantic compatibility ‣ Gender ‣ Animacy ‣ Semantic type ‣ Hypernymy ‣ Commonsense knowledge ‣ World knowledge. Annotations: basic features get these (salience, gender, animacy); word embeddings sort of do these (semantic type, hypernymy); …but they don’t do these (commonsense knowledge, world knowledge)

  31. Leveraging External Resources ‣ How do we figure out what kind of thing NBC is? ‣ Use an external knowledge base like Wikipedia ‣ A knowledge base can supply the features needed to make difficult coreference decisions

  32. Joint Entity Linking and Coreference ‣ There are many things NBC could mean! ‣ Need to tackle entity linking as well: figuring out what entity a given occurrence of NBC refers to ‣ Joint models resolve entities to Wikipedia and simultaneously place coreference links (Durrett and Klein, 2014) ‣ Improvement from entity linking is small: ~1% on CoNLL metric
