HITS at TAC 2015 Entity Discovery and Linking • Benjamin Heinzerling (1, 2) and Michael Strube (2) • (1) AIPHES, (2) Heidelberg Institute for Theoretical Studies
Joint Global Disambiguation and NIL Clustering • Our system from previous years (Fahrni et al., 2013; Judea et al., 2014). • Jointly performs global disambiguation and NIL clustering using a Markov Logic Network (MLN). • General, not trained on TAC data. • Performed consistently well in various evaluations. • Gets some easy decisions wrong. • Mention detection is not part of the joint model.
Goals • Better integrate mention detection. • Get the easy decisions right. • See how far the new KB gets us. • English EDL with mention detection.
Architecture: Options • Joint approach: Strong interaction, slow, development difficult. • Pipeline of tasks: Weak interaction, fast, ordering difficult. • Pipeline of decisions: Some interaction, fast, ordering feasible.
Architecture [Figure: three architectures side by side, each starting with segmentation, POS tagging, and NER and ending with post-processing. • Pipeline of tasks (mention detection, then linking / NIL classification, then NIL clustering): Too simple! • Pipeline of decisions (interleaved mention detection and linking / NIL classification steps, followed by joint linking and NIL clustering): Happy middle. • Joint mention detection, linking, NIL classification, and NIL clustering: Too complex!]
Sieves • High-precision linking: • Unambiguous CrossWiki mentions • Entity type checking • Sense label mismatch filter • Person name matching • Salient semantic paths • General and TAC-specific post-processing: • Dominant sense fallback • Country adjectivals mapping • Media organization filter
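A minimal sketch of a pipeline of decisions, assuming each sieve is a function that only touches mentions that are still undecided, so that earlier high-precision decisions are visible to later sieves. The lookup tables, entity ids, and the three toy sieves are hypothetical illustrations, not the system's actual resources or rules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Mention:
    text: str
    entity: Optional[str] = None      # KB id once linked; None means undecided / NIL
    decided_by: Optional[str] = None  # name of the sieve that made the decision


# Hypothetical toy resources (assumptions for illustration only).
UNAMBIGUOUS: Dict[str, str] = {"Angela Merkel": "m.merkel"}
DOMINANT_SENSE: Dict[str, str] = {"Paris": "m.paris"}


def unambiguous_sieve(mentions: List[Mention]) -> None:
    """Link surface forms that have exactly one known sense."""
    for m in mentions:
        if m.entity is None and m.text in UNAMBIGUOUS:
            m.entity, m.decided_by = UNAMBIGUOUS[m.text], "unambiguous"


def person_name_sieve(mentions: List[Mention]) -> None:
    """Link a partial name to an already linked longer name containing it."""
    linked = [m for m in mentions if m.entity is not None]
    for m in mentions:
        if m.entity is not None:
            continue
        for other in linked:
            if m.text != other.text and m.text in other.text.split():
                m.entity, m.decided_by = other.entity, "person_name"
                break


def dominant_sense_sieve(mentions: List[Mention]) -> None:
    """Fallback: assign the most frequent sense to whatever is still open."""
    for m in mentions:
        if m.entity is None and m.text in DOMINANT_SENSE:
            m.entity, m.decided_by = DOMINANT_SENSE[m.text], "dominant_sense"


# Ordering matters: high-precision sieves first, fallbacks last.
SIEVES: List[Callable[[List[Mention]], None]] = [
    unambiguous_sieve,
    person_name_sieve,
    dominant_sense_sieve,
]


def run_sieves(mentions: List[Mention]) -> List[Mention]:
    for sieve in SIEVES:
        sieve(mentions)
    return mentions
```

In this sketch, a mention that no sieve decides keeps `entity=None` and would be handed to NIL clustering.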
Salient Semantic Paths • Previous work maximizes relatedness between candidate senses of two or more given mentions (Hoffart et al., 2011; Moro et al., 2014). • Example: "Nigeria became Africa's largest economy ... town of Daura." • Candidate sense pairs: Nigeria (Africa) - Pierre Daura; Nigeria (Jazz Album) - Pierre Daura; Nigeria (Africa) /location/location/contains Daura (Nigeria); Nigeria (Jazz Album) - Daura (Nigeria). • Slow on Freebase.
Salient Semantic Paths • Our approach: Given one linked mention, follow salient semantic paths through the knowledge base graph. • Faster than looking for paths between candidate senses. • /location/location/contains+: "Nigeria became Africa's largest economy ... town of Daura." • /people/person/children: "Netanyahu's sons, Avner and Yair, were chosen ..." • /common/topic/alias: "Think of it as Oscar Pistorius on steroids. I couldn't help but think of the blade runner."
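The idea of starting from one linked mention and following a few fixed paths can be sketched as below. This is a sketch under stated assumptions: the toy graph, the entity ids, the name table, and the function names `follow` and `link_via_paths` are all hypothetical, and aliases are simplified to a zero-length path that compares against the anchor entity's own name set.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Toy KB fragment: (entity, relation) -> neighbouring entities. Ids are made up.
EDGES: Dict[Tuple[str, str], List[str]] = {
    ("nigeria", "/location/location/contains"): ["katsina_state"],
    ("katsina_state", "/location/location/contains"): ["daura"],
    ("netanyahu", "/people/person/children"): ["avner_netanyahu", "yair_netanyahu"],
}

# Names and aliases per entity (stand-in for /common/topic/alias).
NAMES: Dict[str, Set[str]] = {
    "daura": {"Daura"},
    "katsina_state": {"Katsina"},
    "avner_netanyahu": {"Avner"},
    "yair_netanyahu": {"Yair"},
    "oscar_pistorius": {"Oscar Pistorius", "blade runner"},
}

# Hand-picked paths; () means "the anchor entity itself" (alias match).
CONTAINS = "/location/location/contains"
SALIENT_PATHS: List[Tuple[str, ...]] = [
    (),
    (CONTAINS,),
    (CONTAINS, CONTAINS),
    ("/people/person/children",),
]


def follow(entity: str, path: Tuple[str, ...]) -> Set[str]:
    """Return all entities reached from `entity` by walking `path` edge by edge."""
    frontier = {entity}
    for relation in path:
        frontier = {t for e in frontier for t in EDGES.get((e, relation), [])}
    return frontier


def link_via_paths(anchor: str, unlinked: Iterable[str]) -> Dict[str, str]:
    """Link unlinked mention strings to entities reachable from the anchor."""
    result: Dict[str, str] = {}
    for path in SALIENT_PATHS:
        for target in follow(anchor, path):
            names = {n.lower() for n in NAMES.get(target, set())}
            for text in unlinked:
                if text not in result and text.lower() in names:
                    result[text] = target
    return result
```

The cost depends on the number of paths and the fan-out from a single entity, not on the number of candidate sense pairs, which is one way to read the "faster" claim on the slide.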
Results: Linking [Bar chart, metric: strong all match. Series: Sieves + MLN (HITS2), Sieves only (HITS1), Best. Groups: Precision, Recall, F1. Value labels: 70.9, 70.7, 70.3, 64.0, 64.0, 63.6, 58.8, 58.8, 57.8.]
Results: Clustering [Bar chart, metric: mention ceaf. Series: Sieves + MLN (HITS2), Sieves only (HITS1), Best. Groups: Precision, Recall, F1. Value labels: 76.5, 74.8, 74.7, 68.4, 68.2, 67.1, 62.2, 62.2, 61.0.]
Nominal Coreference? • Examples: "Hilary Clinton's latest book ... the author ..."; "Netanyahu will be busy being a leader, unlike Obama the golfer!" • Can common noun surface forms (NOM) be reached in the KB? • Tested for paths up to length two. • Only 30 percent of non-NIL NOMs reachable.
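One way such a reachability test could look is a breadth-first search with a depth bound of two. The slides do not say how the test was implemented, so everything here is an assumption: the toy graph, the node names, and the helpers `reachable_within` and `nom_reachable` are hypothetical, and the toy data is not meant to reproduce the 30 percent figure.

```python
from typing import Dict, List, Set

# Toy KB fragment: entity -> directly connected nodes (relation labels omitted).
GRAPH: Dict[str, List[str]] = {
    "clinton": ["author", "politician"],
    "obama": ["politician", "usa"],
    "usa": ["country"],
}


def reachable_within(graph: Dict[str, List[str]], start: str, max_len: int = 2) -> Set[str]:
    """Return all nodes reachable from `start` via paths of length 1..max_len."""
    seen = {start}
    frontier = {start}
    for _ in range(max_len):
        # Expand one hop, dropping nodes that were already visited.
        frontier = {n for e in frontier for n in graph.get(e, [])} - seen
        seen |= frontier
    return seen - {start}


def nom_reachable(graph: Dict[str, List[str]], entity: str, noun: str, max_len: int = 2) -> bool:
    """True if the common noun labels a node within `max_len` hops of the entity."""
    return noun in reachable_within(graph, entity, max_len)


def reachable_fraction(graph: Dict[str, List[str]], pairs: List[tuple]) -> float:
    """Share of (entity, noun) pairs for which the noun is reachable."""
    hits = sum(nom_reachable(graph, entity, noun) for entity, noun in pairs)
    return hits / len(pairs)
```

In this toy graph, "author" is one hop from the Clinton node, while "golfer" is not a node at all, which mirrors the two slide examples: the first nominal can be resolved through the KB, the second cannot.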
Limitations • Fails for nominal coreference. • Hand-picked, genre-specific semantic paths. • Only works for non-NILs.
Conclusions • State-of-the-art monolingual linking and clustering. • Simple symbolic approach: large-scale resources, string match, KB queries. • Global disambiguation has only a small effect on linking performance. • Joint disambiguation and NIL clustering improves clustering performance.
Future Work • Ordering and granularity of decisions. • Top three systems’ linking performance similar: All solving the same easy problems? • Hard: Metonymy, cohyperonymal lists, nominal coreference. • Symbolic approaches unlikely to solve these. • Need to combine with distributional methods.
Thank You