Entity Search: Building Bridges between Two Worlds Krisztian Balog, Edgar Meij, and Maarten de Rijke ISLA, University of Amsterdam http://ilps.science.uva.nl
Entity search • Information organized around entities • Instead of finding documents about the entity, find the entity itself • Problem looked at by both the Information Retrieval (IR) and the Semantic Web (SW) communities
Entity search tasks • Entity ranking • List completion • Related entity finding
Motivation • To what extent are IR and SW methods capable of answering information needs related to entity finding?
Where are we now? • Information Retrieval • Identifying and ranking entities in large volumes of data • Mostly based on co-occurrences between terms and entities • Generated models are not always meaningful for human consumption
Where are we now? • Semantic Web • Structured data, naturally organized around entities • Entity retrieval is as simple as running SPARQL queries? • Free-text querying is more appealing to (naive) end users
Related entity finding • Given • Input entity E (name plus homepage) • Type T of the target entity (person, organization, or product) • Narrative R (describes nature of relation) • Return homepages of related entities
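The three inputs of a topic can be pictured as a small record. The sketch below is illustrative only: the `Topic` class, its field names, and the `make_topic` helper are assumptions made for this example and are not part of the TREC topic format.

```python
from dataclasses import dataclass

# Hypothetical container for one related-entity-finding topic.
# Field names are illustrative, not taken from the TREC topic files.
@dataclass(frozen=True)
class Topic:
    entity_name: str   # (E) source entity name
    entity_url: str    # (E) source entity homepage identifier
    target_type: str   # (T) person, organization, or product
    narrative: str     # (R) free-text description of the relation

ALLOWED_TYPES = {"person", "organization", "product"}

def make_topic(name, url, target_type, narrative):
    """Normalise the target type (accepting the British spelling) and build a Topic."""
    t = target_type.lower().replace("organisation", "organization")
    if t not in ALLOWED_TYPES:
        raise ValueError(f"unsupported target type: {target_type}")
    return Topic(name, url, t, narrative)

topic = make_topic(
    "Medimmune, Inc.",
    "clueweb09-en0008-26-39300",
    "Product",
    "Products of Medimmune, Inc.",
)
```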
Example topics
(E) Source entity name: Medimmune, Inc.
(E) Source entity URL: clueweb09-en0008-26-39300
(T) Target type: Product
(R) Narrative: Products of Medimmune, Inc.

(E) Source entity name: Boeing 747
(E) Source entity URL: clueweb09-en0005-75-02292
(T) Target type: Organisation
(R) Narrative: Airlines that currently use Boeing 747 planes.
Aim • Compare IR and SW approaches on the related entity finding task • Focus on finding all relevant entities rather than on ranking them
Related entity finding Our variation • TREC Entity 2009 topics (20) • Map source entity to a Wikipedia page (17) • Map target category to the most specific class within the DBPedia ontology • Ground truth: Wikipedia pages from relevance assessments
Example topic
(E) Source entity name: Boeing 747
(E) Source entity URL: clueweb09-en0005-75-02292
(T) Target type: Organisation
(R) Narrative: Airlines that currently use Boeing 747 planes.

Source entity: Boeing_747
DBPedia-owl: Organisation/Company/Airline
Relation: Airlines that currently use Boeing 747 planes.
IR approaches • Aggregation of approaches employed at the TREC Entity track • Various ways of recognizing and ranking entities • Common to all is a mechanism for capturing the co-occurrence between source and target entities
A typical IR approach Query (input entity, relation) Document/snippet retrieval Answer candidate extraction Answer candidate (type) filtering Answer candidate ranking Output (related entities)
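The pipeline stages can be sketched end to end on a toy corpus. This is a minimal illustration under strong assumptions, not any TREC participant's system: the documents, the entity annotations, and all function names are made up, and candidates are scored by raw co-occurrence counts with the source entity.

```python
from collections import Counter

# Toy corpus (invented): each snippet lists the entities it mentions and their types.
DOCS = [
    {"text": "acme corp sells the rocket skates product",
     "entities": {"Acme Corp": "organization", "Rocket Skates": "product"}},
    {"text": "rocket skates are a product made by acme corp",
     "entities": {"Acme Corp": "organization", "Rocket Skates": "product"}},
    {"text": "acme corp product catalog lists the giant magnet",
     "entities": {"Acme Corp": "organization", "Giant Magnet": "product"}},
    {"text": "jane doe is the chief executive of acme corp",
     "entities": {"Acme Corp": "organization", "Jane Doe": "person"}},
    {"text": "globex announced a new product called the widget",
     "entities": {"Globex": "organization", "Widget": "product"}},
]

def retrieve(docs, source, relation_terms):
    """Document/snippet retrieval: keep snippets mentioning the source and a relation term."""
    return [d for d in docs
            if source in d["entities"] and set(d["text"].split()) & relation_terms]

def extract_candidates(docs, source):
    """Candidate extraction: count co-occurrences of other entities with the source."""
    counts = Counter()
    for d in docs:
        for name in d["entities"]:
            if name != source:
                counts[name] += 1
    return counts

def filter_by_type(counts, docs, target_type):
    """Type filtering: keep only candidates annotated with the target type."""
    types = {}
    for d in docs:
        types.update(d["entities"])
    return {n: c for n, c in counts.items() if types[n] == target_type}

def rank(counts):
    """Ranking: most frequent co-occurrence first, ties broken alphabetically."""
    return [n for n, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]

def related_entities(docs, source, relation_terms, target_type):
    hits = retrieve(docs, source, relation_terms)
    counts = extract_candidates(hits, source)
    return rank(filter_by_type(counts, hits, target_type))

result = related_entities(DOCS, "Acme Corp", {"product"}, "product")
print(result)  # ['Rocket Skates', 'Giant Magnet']
```

Note that the output carries no label for why a candidate is related to the source; the score reflects co-occurrence only.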
Two SW approaches • SPARQL query
SELECT DISTINCT ?m ?r WHERE {
  ?m rdf:type dbpedia-owl:Drug .
  { ?m ?r dbpedia:MedImmune } UNION { dbpedia:MedImmune ?r ?m }
}
• Exhaustive graph search • Find all paths between E and T in a knowledge base • The depth of search is limited
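A depth-limited path search of the kind described above might look as follows. This is a sketch under assumptions rather than the authors' implementation: the triples are invented, edges are followed in both directions (mirroring the UNION in the SPARQL query), and `find_paths` and its helpers are hypothetical names.

```python
# Toy knowledge base as (subject, predicate, object) triples; all data is invented.
TRIPLES = [
    ("ex:AcmeCorp", "ex:product", "ex:RocketSkates"),
    ("ex:GiantMagnet", "ex:developedBy", "ex:AcmeCorp"),
    ("ex:AcmeCorp", "ex:locatedIn", "ex:Springfield"),
    ("ex:Widget", "ex:soldIn", "ex:Springfield"),
    ("ex:RocketSkates", "rdf:type", "ex:Product"),
    ("ex:GiantMagnet", "rdf:type", "ex:Product"),
    ("ex:Widget", "rdf:type", "ex:Product"),
]

def build_index(triples):
    """Index non-type edges in both directions; '^' marks an inverse edge."""
    neighbours = {}
    for s, p, o in triples:
        if p == "rdf:type":
            continue
        neighbours.setdefault(s, []).append((p, o))
        neighbours.setdefault(o, []).append(("^" + p, s))
    return neighbours

def types_of(triples):
    """Map each node to the set of classes asserted via rdf:type."""
    types = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    return types

def find_paths(triples, source, target_type, max_depth):
    """Return (entity, predicate path) pairs for typed entities within max_depth hops."""
    neighbours = build_index(triples)
    types = types_of(triples)
    results = []

    def walk(node, path, visited):
        if path and target_type in types.get(node, ()):
            results.append((node, tuple(path)))
        if len(path) == max_depth:
            return  # depth limit reached
        for pred, nxt in neighbours.get(node, []):
            if nxt not in visited:  # avoid cycles
                walk(nxt, path + [pred], visited | {nxt})

    walk(source, [], {source})
    return results

shallow = find_paths(TRIPLES, "ex:AcmeCorp", "ex:Product", max_depth=1)
deep = find_paths(TRIPLES, "ex:AcmeCorp", "ex:Product", max_depth=2)
```

In this toy graph, raising the depth from 1 to 2 also reaches `ex:Widget`, which is connected to the source only through a shared location. The depth limit therefore trades coverage against how loosely related the returned entities are.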
SPARQL on DBPedia
Query: Products of Medimmune, Inc.
?m                     ?r
dbpedia:Amifostine     dbp-prop:wikilink
dbpedia:Blinatumomab   dbp-prop:wikilink
dbpedia:Motavizumab    dbp-prop:wikilink
dbpedia:Palivizumab    dbp-prop:wikilink
SPARQL on DBPedia
Query: Airlines that Air Canada has code share flights with.
?m                          ?r
dbpedia:Air_Canada          dbp-prop:wikilink
dbpedia:Austrian_Airlines   dbp-prop:wikilink
dbpedia:Japan_Airlines      dbp-prop:wikilink
dbpedia:Lufthansa           dbp-prop:wikilink
dbpedia:Turkish_Airlines    dbp-prop:wikilink
...
dbpedia:Air_Ontario         dbp-ontology:Company/parentCompany
dbpedia:Air_Canada_Tango    dbp-ontology:Company/parentCompany
dbpedia:Canadian_Airlines   dbp-ontology:foundationPerson
SPARQL on DBPedia
Query: Members of the band Jefferson Airplane.
?m                       ?r
dbpedia:Jim_Morrison     dbp-prop:wikilink
dbpedia:Jimi_Hendrix     dbp-prop:wikilink
...
dbpedia:Jack_Casady      dbp-ontology:associatedMusicalArtist
dbpedia:Paul_Kantner     dbp-ontology:associatedMusicalArtist
dbpedia:Joey_Covington   dbp-ontology:associatedMusicalArtist
dbpedia:Marty_Balin      dbp-ontology:associatedMusicalArtist
...
dbpedia:Grace_Slick      dbp-prop:pastMembers
dbpedia:Jorma_Kaukonen   dbp-prop:pastMembers
...
Findings • IR and SW methods find basically the same set of entities • Most relations returned by SW methods are of type wikilink
Next • Extend search to Linked Open Data (LOD) • We use the Linked Data Semantic Repository (LDSR)
SPARQL on LOD
Query: Products of Medimmune, Inc.
?m                     ?r
dbpedia:Amifostine     dbp-prop:wikilink
dbpedia:Blinatumomab   dbp-prop:wikilink
dbpedia:Motavizumab    dbp-prop:wikilink
dbpedia:Palivizumab    dbp-prop:wikilink
dbpedia:Motavizumab    fb:base.bioventurist.product.developed_by
dbpedia:Palivizumab    fb:base.bioventurist.science_or_technology_company.products
dbpedia:Motavizumab    fb:base.bioventurist.product.developed_by
dbpedia:Palivizumab    fb:base.bioventurist.science_or_technology_company.products
Graph search on LOD (figure not recoverable)
Findings • More entities as well as more diverse relations • Having more data does not automatically improve results • Some of the identified entities are now too general
Summarizing findings • Information Retrieval • Excellent ways of finding associations between topics and entities • Tend to perform better for less popular entities (not represented in LOD) • Missing: semantics of the found associations
Summarizing findings • Semantic Web • Has the potential of generating a large number of candidate entities and relations • Could be as simple as instantiating a SPARQL query • For many queries LOD is very sparse w.r.t. semantically meaningful links between entities
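Instantiating a SPARQL query from a topic can be sketched as plain template filling. The template below follows the shape of the query shown earlier in the deck; the `instantiate` helper is a hypothetical name, and the sketch assumes the source entity and target class are already mapped to prefixed names and that the prefixes are declared wherever the query is run.

```python
# Template following the query shape used in the deck; doubled braces are
# literal braces once str.format has been applied.
QUERY_TEMPLATE = """SELECT DISTINCT ?m ?r WHERE {{
  ?m rdf:type {target_class} .
  {{ ?m ?r {source} }} UNION {{ {source} ?r ?m }}
}}"""

def instantiate(source, target_class):
    """Fill the template with a source entity and a target class (prefixed names)."""
    return QUERY_TEMPLATE.format(source=source, target_class=target_class)

query = instantiate("dbpedia:MedImmune", "dbpedia-owl:Drug")
print(query)
```

The template returns every relation `?r` between the source and a typed entity, in either direction, so it does not use the narrative (R) at all; selecting the semantically meaningful relations is left to a later step.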
Zooming out • Enhance text-based models with semantic information from LOD • Use IR models to discover and label links between entities in LOD
TREC Entity 2010 • Main task: Related entity finding • Pilot task: List completion • Given URIs of related entities, complete the list with additional entities from LOD
Questions? Krisztian Balog http://staff.science.uva.nl/~kbalog