
Design Challenges for Entity Linking
Xiao Ling, Sameer Singh, Daniel S. Weld

  1. Design Challenges for Entity Linking
 Xiao Ling, Sameer Singh, Daniel S. Weld

  2. Entity Linking
 Seattle beat Portland yesterday.

  3. Entity Linking
 Seattle beat Portland yesterday.

  4. Entity Linking
 Seattle beat Portland yesterday.
 • Seattle (city)
 • Seattle Sounders
 • Sea-Tac (airport)

  5. Entity Linking
 Seattle beat Portland yesterday.
 • Seattle (city)
 • Seattle Sounders
 • Sea-Tac (airport)
 ~3-4 M entries

  6. Applications
 • Relation Extraction (e.g. Koch et al. 2014)
 • Coreference Resolution (e.g. Hajishirzi et al. 2013, Durrett & Klein 2014)
 • Question Answering (e.g. Sun et al. 2015)
 • Web Search (e.g. Knowledge Graph)
 • many others… (see Shen et al. 2014; Roth et al. 2014)

  7. Ambiguity
 • Seattle beat Portland yesterday.
 • Seattle scores high in the latest report of startup hubs.
 • The Emerald City Council To Make Decision on Antibiotic Resolution

  8. Ambiguity
 • Seattle beat Portland yesterday. -> Seattle Sounders
 • Seattle scores high in the latest report of startup hubs.
 • The Emerald City Council To Make Decision on Antibiotic Resolution

  9. Ambiguity
 • Seattle beat Portland yesterday. -> Seattle Sounders
 • Seattle scores high in the latest report of startup hubs. -> Seattle (city)
 • The Emerald City Council To Make Decision on Antibiotic Resolution

  10. Variability
 • Seattle scores high in the latest report of startup hubs.
 • The Emerald City Council To Make Decision on Antibiotic Resolution

  11. Variability
 • Seattle scores high in the latest report of startup hubs.
 • The Emerald City Council To Make Decision on Antibiotic Resolution
 Both refer to Seattle (city).

  12. Related Work
 • Cucerzan (2007) • Milne and Witten (2008) • Kulkarni et al. (2009) • Ratinov et al. (2011) • Hoffart et al. (2011) • Han and Sun (2012) • He et al. (2013a) • He et al. (2013b) • Cheng and Roth (2013) • Sil and Yates (2013) • Li et al. (2013) • Cornolti et al. (2013) • … many others

  13. Related Work
 • Cucerzan (2007) • Milne and Witten (2008) • Kulkarni et al. (2009) • Ratinov et al. (2011) • Hoffart et al. (2011) • Han and Sun (2012) • He et al. (2013a) • He et al. (2013b) • Cheng and Roth (2013) • Sil and Yates (2013) • Li et al. (2013) • Cornolti et al. (2013) • … many others
 Approaches: Joint Inference

  14. Related Work
 • Cucerzan (2007) • Milne and Witten (2008) • Kulkarni et al. (2009) • Ratinov et al. (2011) • Hoffart et al. (2011) • Han and Sun (2012) • He et al. (2013a) • He et al. (2013b) • Cheng and Roth (2013) • Sil and Yates (2013) • Li et al. (2013) • Cornolti et al. (2013) • … many others
 Approaches: Joint Inference, Learning to rank

  15. Related Work
 • Cucerzan (2007) • Milne and Witten (2008) • Kulkarni et al. (2009) • Ratinov et al. (2011) • Hoffart et al. (2011) • Han and Sun (2012) • He et al. (2013a) • He et al. (2013b) • Cheng and Roth (2013) • Sil and Yates (2013) • Li et al. (2013) • Cornolti et al. (2013) • … many others
 Approaches: Joint Inference, Learning to rank, Deep Neural Networks

  16. Popular Data Sets
 Dataset (source): # of mentions, knowledge base
 • ACE (UIUC): 244, Wikipedia
 • MSNBC (UIUC): 654, Wikipedia
 • AIDA-D (AIDA, Hoffart et al. 2011): 5917, Yago
 • AIDA-T (AIDA, Hoffart et al. 2011): 5616, Yago
 • TAC09 (TAC KBP): 3904, Wikipedia 2008
 • TAC10 (TAC KBP): 2250, Wikipedia 2008
 • TAC10T (TAC KBP): 1500, Wikipedia 2008
 • TAC11 (TAC KBP): 2250, Wikipedia 2008
 • TAC12 (TAC KBP): 2226, Wikipedia 2008

  17. Unfortunately…
 [Table: systems (rows) against the data sets they were evaluated on (columns); check-mark positions not recoverable]
 Data sets: ACE, MSNBC, AIDA-D, AIDA-T, KBP09, KBP10, KBP10T, KBP11, KBP12
 Systems: Cucerzan (2007), Milne & Witten (2008), Kulkarni et al. (2009), Ratinov et al. (2011), Hoffart et al. (2011), Han & Sun (2012), He et al. (2013a), He et al. (2013b), Cheng & Roth (2013), Sil & Yates (2013), Li et al. (2013), Cornolti et al. (2013), TAC-KBP participants

  18. Unfortunately…
 [Same table of systems against data sets]
 Approaches: Learning to rank, Joint Inference, Deep Neural Networks

  19. Metonymy
 … Moscow’s as yet undisclosed proposals …
 • Moscow (city)
 • Russia (country)
 • Government of Russia

  20. Nested Entities
 … Florida Green Party …
 • Green Party of the US
 • Green Party of Florida

  21. Contributions
 • Vinculum: a simple, deterministic, modular EL system
 • comprehensive evaluation over nine data sets
 • candidate conditional probability can work quite well
 • entity types are important to the final performance
 • comparable results with two state-of-the-art systems

  22. Agenda
 • Introduction
 • Vinculum: Mention Extraction, Candidate Generation, Entity Type, Coreference, Coherence
 • Experiments
 • Conclusion

  23. Vinculum Architecture
 Input: Seattle beat Portland yesterday.
 Stages: Mention Extraction -> Candidate Generation -> Entity Type -> Coreference -> Coherence

  24. Mention Extraction
 Seattle beat Portland yesterday.

  25. Candidate Generation
 Seattle beat Portland yesterday.
 Candidate Entities
 - Seattle (city)
 - Seattle Sounders
 - Seattle-Tacoma (airport)

  26. Conditional probability
 • … capital of the state of Washington.
 • In 1990, Washington starred as Bleek Gilliam …
 • Washington refused to run for a third term …
 • … Washington …
 p(e | m) = #[m -> e] / #m

  27. Conditional probability
 • … capital of the state of Washington.
 • In 1990, Washington starred as Bleek Gilliam …
 • Washington refused to run for a third term …
 • … Washington …
 p(e | “Washington”) = #[“W” -> e] / #“W”
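The ratio on the two slides above, the number of times mention m links to entity e divided by the number of times m occurs as a linked mention, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `conditional_prob`, the entity labels, and the link counts are all made up for the example, and the source of the (mention, entity) pairs is assumed to be some corpus of linked text.

```python
from collections import Counter, defaultdict


def conditional_prob(anchor_links):
    """Estimate p(e | m) = #[m -> e] / #m from (mention, entity) pairs."""
    pair_counts = defaultdict(Counter)
    for mention, entity in anchor_links:
        pair_counts[mention][entity] += 1

    probs = {}
    for mention, counts in pair_counts.items():
        total = sum(counts.values())  # #m: how often the mention is linked at all
        probs[mention] = {e: c / total for e, c in counts.items()}
    return probs


# Hypothetical link counts for the mention "Washington" (illustrative only).
links = (
    [("Washington", "Washington (state)")] * 5
    + [("Washington", "George Washington")] * 3
    + [("Washington", "Denzel Washington")] * 2
)
probs = conditional_prob(links)
print(probs["Washington"])
```

Because each mention's counts are divided by that mention's own total, the estimates for any one mention sum to 1.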

  28. Candidate Generation
 Seattle beat Portland yesterday.
 Candidate Entities
 - Seattle (city)
 - Seattle Sounders
 - Seattle-Tacoma (airport)

  29. Candidate Generation
 Seattle beat Portland yesterday.
 Candidate Entities
 - Seattle (city): 0.6
 - Seattle Sounders: 0.2
 - Seattle-Tacoma (airport): 0.1
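One way to read the scored candidate list above is as a lookup into a table of conditional probabilities followed by a ranking. The sketch below assumes such a table is already available as a nested dictionary; the function name `generate_candidates` and the cutoff parameter `k` are hypothetical and do not come from the slides. The scores are the ones shown for “Seattle”.

```python
def generate_candidates(mention, cond_prob, k=2):
    """Return the k highest-scoring (entity, p(e | m)) pairs for a mention."""
    scores = cond_prob.get(mention, {})  # unseen mentions yield no candidates
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]


# Scores taken from the slide; the dictionary layout is an assumption.
cond_prob = {
    "Seattle": {
        "Seattle (city)": 0.6,
        "Seattle Sounders": 0.2,
        "Seattle-Tacoma (airport)": 0.1,
    }
}
top = generate_candidates("Seattle", cond_prob, k=2)
print(top)
```

Ranked by this score alone, “Seattle (city)” comes first even though the sentence is about a match.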

  30. Entity Types
 Seattle beat Portland yesterday.
 Candidate Entities
 - Seattle (city): 0.6
 - Seattle Sounders: 0.2
 - Seattle-Tacoma (airport): 0.1
 Entity Type Prediction
 • city 0.1
 • sports_team 0.4
 • facility/airport 0.1

  31. Entity Types
 Seattle beat Portland yesterday.
 Candidate Entities
 - Seattle (city): 0.6
 - Seattle Sounders: 0.2
 - Seattle-Tacoma (airport): 0.1
 Entity Type Prediction
 • city 0.1
 • sports_team 0.4
 • facility/airport 0.1
 p(e | m) = ∑_t p(e, t | m)
 p(e | m) = ∑_t p(e | t, m) p(t | m)

  32. Entity Types
 Seattle beat Portland yesterday.
 Candidate Entities
 - Seattle (city): 0.6
 - Seattle Sounders: 0.2
 - Seattle-Tacoma (airport): 0.1
 p(e | m) = ∑_t p(e, t | m)
 p(e | m) = ∑_t p(e | t, m) p(t | m)
 p(e | t, m): re-normalization of the conditional probability

  33. Entity Types
 Seattle beat Portland yesterday.
 Candidate Entities
 - Seattle (city): 0.6
 - Seattle Sounders: 0.2
 - Seattle-Tacoma (airport): 0.1
 p(e | m) = ∑_t p(e, t | m)
 p(e | m) = ∑_t p(e | t, m) p(t | m)
 p(e | t, m): re-normalization of the conditional probability
 e.g. t = LOC
 p(Seattle-city | LOC, “Seattle”) = 0.6 / 0.7
 p(Sea-Tac | LOC, “Seattle”) = 0.1 / 0.7
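The re-normalization step and the sum over types can be sketched as follows. This is an illustration under stated assumptions rather than the Vinculum code: the function names, the mapping from entities to coarse types (in particular ORG for Seattle Sounders), and the type distribution p(t | m) used here are hypothetical. Only the candidate scores 0.6, 0.2 and 0.1 and the LOC example come from the slides.

```python
def renormalize(cond_prob, entity_types, t):
    """p(e | t, m): restrict candidates to type t and rescale to sum to 1."""
    mass = sum(p for e, p in cond_prob.items() if t in entity_types[e])
    if mass == 0:
        return {}
    return {e: p / mass for e, p in cond_prob.items() if t in entity_types[e]}


def type_aware_score(cond_prob, entity_types, type_prob):
    """p(e | m) = sum over t of p(e | t, m) * p(t | m)."""
    scores = {e: 0.0 for e in cond_prob}
    for t, p_t in type_prob.items():
        for e, p in renormalize(cond_prob, entity_types, t).items():
            scores[e] += p * p_t
    return scores


# Candidate scores for the mention "Seattle", as on the slide.
cond_prob = {
    "Seattle (city)": 0.6,
    "Seattle Sounders": 0.2,
    "Seattle-Tacoma (airport)": 0.1,
}
# Assumed coarse types per entity (hypothetical).
entity_types = {
    "Seattle (city)": {"LOC"},
    "Seattle Sounders": {"ORG"},
    "Seattle-Tacoma (airport)": {"LOC"},
}
# Assumed output of a type predictor for this mention in context (hypothetical).
type_prob = {"LOC": 0.3, "ORG": 0.7}

loc = renormalize(cond_prob, entity_types, "LOC")
rescored = type_aware_score(cond_prob, entity_types, type_prob)
print(loc)
print(rescored)
```

With t = LOC the two location candidates are rescaled by their combined mass of 0.7, matching the 0.6 / 0.7 and 0.1 / 0.7 on the slide. Under the assumed type distribution, the sports team overtakes the city once the type evidence is folded in.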
