Identifying Relevant Sources for Data Linking using a Semantic Web Index

Identifying Relevant Sources for Data Linking using a Semantic Web Index – Andriy Nikolov, Mathieu d’Aquin – Knowledge Media Institute, The Open University, UK

How to link a new dataset? • What other repositories contain relevant data that I should link to? – Select the external repository • How to select the relevant data instances to link? – Select the relevant classes within the chosen repository
[Diagram labels: LinkedMDB, TV programs, movies, DBPedia, pieces of music, Freebase, actors, MusicBrainz, composers, bestbuy]

Selection criteria • Additional information about local instances • Popularity • Degree of overlap
[Diagram labels: DBLP, data.open.ac.uk, rae:RKBExplorer, Publication data, DBPedia]

Available information • Additional information about resources – Schema ontology – Test examples • Popularity – VoiD descriptors • Linking repositories – Catalog of repositories (CKAN) • Degree of overlap – VoiD descriptors (only topic relevance) – Relevant info hard to obtain on the client side

Approach Search for sources with potentially high degree of overlap – Use a subset of entity labels from the original dataset as keywords for entity search
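As a minimal sketch of this step, assuming the subset of labels is drawn by random sampling (the slides do not say how the subset is chosen) and that `search` is a hypothetical stand-in for whatever semantic index client is used; neither function name comes from the presentation:

```python
import random
from typing import Callable, Dict, Iterable, List


def sample_labels(labels: Iterable[str], k: int, seed: int = 0) -> List[str]:
    """Pick up to k distinct, non-empty labels to use as search keywords."""
    unique = sorted({label.strip() for label in labels if label.strip()})
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return rng.sample(unique, min(k, len(unique)))


def search_all(
    keywords: Iterable[str], search: Callable[[str], List[dict]]
) -> Dict[str, List[dict]]:
    """Run one entity search per keyword; `search` is a hypothetical client call."""
    return {kw: search(kw) for kw in keywords}


# Illustrative labels only, not taken from the experiments.
labels = ["Hans Zimmer", "John Williams", "Hans Zimmer", "  ", "Ennio Morricone"]
keywords = sample_labels(labels, k=2)
results = search_all(keywords, search=lambda kw: [])  # dummy index with no hits
```

Sampling a subset rather than sending every label keeps the number of index queries small; how large the subset should be is left open here.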

Approach Aggregate results – Group instances occurring in returned result sets by their source repositories

Approach Rank sources – Sort by number of individuals returned in search results
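The aggregation and ranking steps can be sketched as follows, assuming each search hit already arrives as an (instance URI, source repository) pair; how a hit is attributed to its source is not described in the slides, and `rank_sources` is a name chosen for illustration:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def rank_sources(hits: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Group hits by source and sort sources by number of distinct individuals."""
    by_source: Dict[str, Set[str]] = defaultdict(set)
    for uri, source in hits:
        by_source[source].add(uri)  # a set, so repeated hits count once
    counts = [(source, len(uris)) for source, uris in by_source.items()]
    # Highest count first; ties are broken alphabetically for a stable order.
    return sorted(counts, key=lambda pair: (-pair[1], pair[0]))


# Made-up hits for illustration.
hits = [
    ("ex:a", "dbpedia"),
    ("ex:b", "dbpedia"),
    ("ex:a", "dbpedia"),  # duplicate hit from a second keyword
    ("ex:c", "dblp"),
]
ranking = rank_sources(hits)
```

Counting distinct individuals rather than raw hits is a design choice of this sketch: it prevents one instance that matches many keywords from inflating its source's rank.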

Approach Select “most relevant” class – In each source, select the class that covers the most instances
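A minimal sketch of the class selection, assuming each hit carries the list of classes its instance belongs to (the hit format and the name `most_covering_class` are illustrative, not from the presentation):

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Hit = Tuple[str, str, List[str]]  # (instance URI, source, classes of the instance)


def most_covering_class(hits: Iterable[Hit]) -> Dict[str, str]:
    """For each source, return the class that covers the most distinct instances."""
    members: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for uri, source, classes in hits:
        for cls in classes:
            members[source][cls].add(uri)
    best = {}
    for source, per_class in members.items():
        # Iterating in sorted order makes ties resolve to the alphabetically first class.
        best[source] = max(sorted(per_class), key=lambda c: len(per_class[c]))
    return best


# Made-up hits for illustration.
hits = [
    ("ex:a", "dbpedia", ["dbpedia:Person", "dbpedia:MusicArtist"]),
    ("ex:b", "dbpedia", ["dbpedia:Person"]),
    ("ex:c", "other-source", ["ex:Thing"]),
]
best = most_covering_class(hits)
```

In this toy input the broader class wins for `dbpedia` simply because it covers more instances, which is why pure coverage counting tends to favor generic classes.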

Issues: imprecise results • Main cause: ambiguous instance labels • Inclusion of irrelevant sources – E.g., DBLP for movie score composers • Selection of inappropriate classes within the selected source – Too generic: e.g., dbpedia:Person vs dbpedia:MusicArtist – Irrelevant: e.g., akt:Publication-Reference (journal volume) vs akt:Journal

Filtering results Determine potentially irrelevant classes – Use state-of-the-art schema matching to select relevant classes

Filtering results Filter out irrelevant search results – Only consider search result instances belonging to “approved” classes
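The two filtering steps can be sketched as below. This assumes the schema matcher's output is available as a plain mapping from external class to a similarity score against the local class; the scores, the 0.5 threshold, and both function names are hypothetical, and no actual matcher API is shown:

```python
from typing import Dict, Iterable, List, Set, Tuple

Hit = Tuple[str, str, List[str]]  # (instance URI, source, classes of the instance)


def approved_classes(match_scores: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Keep the external classes whose matcher score reaches the threshold."""
    return {cls for cls, score in match_scores.items() if score >= threshold}


def filter_hits(hits: Iterable[Hit], approved: Set[str]) -> List[Hit]:
    """Keep only the hits whose instance belongs to at least one approved class."""
    return [hit for hit in hits if approved.intersection(hit[2])]


# Made-up matcher scores and hits for illustration.
scores = {"dbpedia:MusicArtist": 0.9, "akt:Publication-Reference": 0.1}
approved = approved_classes(scores)
hits = [
    ("ex:a", "dbpedia", ["dbpedia:Person", "dbpedia:MusicArtist"]),
    ("ex:b", "rkb", ["akt:Publication-Reference"]),
]
kept = filter_hits(hits, approved)
```

The filtered hits can then be aggregated and ranked again, so that sources are ordered only by instances of approved classes.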

Preliminary experiments • Datasets – ORO journals (data.open.ac.uk): 3110 instances – LinkedMDB films: 400 instances – LinkedMDB music contributors: 400 instances • External components – Semantic index: Sig.ma – Ontology matching techniques: CIDER, instance-based schema mappings retrieved from BTC2009 dataset

Preliminary experiments • Performance measure: – Proportion of relevant sources among the top-10 returned results

Before filtering     +/-    After filtering    +/-
rae2001 (RKB)        +      rae2001 (RKB)      +
dotac (RKB)          +      DBPedia            +
DBPedia              +      dblp.l3s.de        +
oai (RKB)            +      Freebase           +
dblp.l3s.de          +      DBLP (RKB)         +
wordnet (RKB)        -      eprints (RKB)      +
bibsonomy            -
eprints (RKB)        +
Freebase             +
www.examiner.com     -
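As a worked example of the performance measure, the sketch below computes the proportion of relevant sources from the +/- marks on this slide. It assumes the flattened table is read as alternating "before" and "after" entries, which yields ten sources before filtering and six after; that reading is an interpretation of the slide, and `precision_at_k` is an illustrative name:

```python
from typing import List, Tuple

Judged = Tuple[str, bool]  # (source name, judged relevant?)


def precision_at_k(judged: List[Judged], k: int = 10) -> float:
    """Proportion of relevant sources among the top-k returned sources."""
    top = judged[:k]
    if not top:
        return 0.0
    return sum(1 for _, relevant in top if relevant) / len(top)


# Marks transcribed from the slide under the alternating-column reading.
before = [
    ("rae2001 (RKB)", True), ("dotac (RKB)", True), ("DBPedia", True),
    ("oai (RKB)", True), ("dblp.l3s.de", True), ("wordnet (RKB)", False),
    ("bibsonomy", False), ("eprints (RKB)", True), ("Freebase", True),
    ("www.examiner.com", False),
]
after = [
    ("rae2001 (RKB)", True), ("DBPedia", True), ("dblp.l3s.de", True),
    ("Freebase", True), ("DBLP (RKB)", True), ("eprints (RKB)", True),
]

p_before = precision_at_k(before)  # 7 relevant out of 10
p_after = precision_at_k(after)    # 6 relevant out of 6

# Relevant sources that were returned before filtering but not after.
after_names = {name for name, _ in after}
lost = {name for name, relevant in before if relevant and name not in after_names}
```

Under this reading, precision rises from 0.7 to 1.0, while two sources marked relevant before filtering (dotac and oai) no longer appear afterwards.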

Preliminary experiments • Summary: – Top-ranked returned repositories are largely relevant from the point of view of linking – Filtering using schema matching techniques greatly improves precision (all remaining sources are relevant) – … but at the expense of some recall

Future work • Improving the quality of results – E.g., estimating the potential loss of precision/recall for different filtering decisions • Integrating with the data linking workflow – Automatically pre-configuring the data linking algorithm • Repository search as a potentially useful semantic search use case (in addition to entity and document search)

Thanks for your attention. Questions?
