Discovering Links for Metadata Enrichment on Computer Science - PowerPoint PPT Presentation

Discovering Links for Metadata Enrichment on Computer Science Papers
At SWIB 2012 - Cologne
Technical Report: http://bit.ly/Tiegi9 http://www.gesis.org/publikationen/gesis-technical-reports/
Johann Schaible and Philipp Mayr
GESIS - Leibniz Institute for the Social Sciences
{johann.schaible, philipp.mayr}@gesis.org

Scenario
• Internal metadata: Title, Authors, Publication Date
• Enriched metadata: Title, Authors, Publication Date, Journal, Publisher, Conference, Abstract, Related Work, etc.

The Main Objectives
1. How to interlink internal data with the external data sources?
2. How to use the interlinking to enrich the metadata of a paper?

How to interlink Data?
• Internal Data resource owl:sameAs External Data Source resource
• Corresponding properties of the linked resources
  – title corresponds to hasTitle
  – author corresponds to hasAuthor
  – publication date corresponds to publishedIn
• Additional Information from the external data source: publisher, journal, subject
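The link on this slide can be written down as a single RDF statement. The following is a minimal sketch, not the authors' tooling: both resource URIs and the `sameas_triple` helper are hypothetical, and only the `owl:sameAs` predicate URI is standard.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def sameas_triple(internal_uri: str, external_uri: str) -> str:
    """Serialize one owl:sameAs link as an N-Triples line."""
    return f"<{internal_uri}> <{OWL_SAME_AS}> <{external_uri}> ."


# Hypothetical URIs, used only for illustration.
internal = "http://example.org/internal/paper/42"
external = "http://example.org/external/publication/abc"

link = sameas_triple(internal, external)
print(link)
```

Once such a statement exists, everything the external source says about its resource (publisher, journal, subject) can be treated as a statement about the internal paper as well.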

The External Data Sources
• DBLP
  – Data: About Computer Science; Proceedings & Journals; Articles; Information and links about and to authors
  – Access 1): RKB Explorer; RKB SPARQL Endpoint; RDF/XML Dump (13 GB File); Semantic Sitemap, RKB split by year
• ACM
  – Data: Publications of the ACM; Details of the authors
  – Access 2): RKB Explorer; RKB SPARQL Endpoint; RDF/XML Dump; Semantic Sitemap split by type
• SW Conference Corpus
  – Data: About Semantic Web Conferences & Workshops; Presented Papers; Authors, Attendants etc.
  – Access 3): RKB SPARQL Endpoint; SNORQL Explorer; RDF/XML Dump split by Conferences & Workshops
1) http://dblp.rkbexplorer.com/
2) http://acm.rkbexplorer.com/
3) http://data.semanticweb.org/documentation/user/faq

Lars’ Internal Dataset
1. http://linkeddatabook.com/editions/1.0/
2. http://wifo5-03.informatik.uni-mannheim.de/bizer/pub/LinkedDataTutorial/
3. http://aims.fao.org/lode/bd

A minimized DBLP & SWCC excerpt

Discovering Links with Silk 1)
• Input
  – Specify data sources as SPARQL endpoint or RDF/XML dump
  – Specify output file, where the links are to be saved
  – Specify linking tasks, e.g. owl:sameAs
• Output
  – SPARQL Update with discovered links
  – Discovered links are added to the specified output file
1) https://www.assembla.com/spaces/silk/wiki/dg7jfup58r4jZseJe5cbLA
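To make the idea of a linking task concrete, here is a rough Python sketch of what a title-based linkage rule does conceptually. It is not Silk's implementation or syntax: the records, the `discover_links` helper, the use of `difflib` as the comparator, and the 0.9 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def normalize(title: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(title.lower().split())


def title_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def discover_links(internal: dict, external: dict, threshold: float = 0.9) -> list:
    """Return (internal_uri, owl:sameAs, external_uri) for title pairs at or above the threshold."""
    links = []
    for int_uri, int_title in internal.items():
        for ext_uri, ext_title in external.items():
            if title_similarity(int_title, ext_title) >= threshold:
                links.append((int_uri, OWL_SAME_AS, ext_uri))
    return links


# Hypothetical records, mapping resource URI to title.
internal = {
    "http://example.org/internal/p1": "Linked Data: Evolving the Web into a Global Data Space",
}
external = {
    "http://example.org/external/a": "Linked data:  evolving the web into a global data space",
    "http://example.org/external/b": "A Completely Different Paper on Query Optimization",
}

links = discover_links(internal, external)
print(links)
```

The nested loop compares every internal record with every external one, which is fine for a small excerpt but would not scale to a full 13 GB dump without some form of blocking or indexing.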

How to use links for enrichment?
1. Add the discovered links to the internal dataset, thus creating hyperlinks to the external data sources
2. Use the links to query the external data sources, thus adding their metadata to the internal dataset
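The second option can be sketched as follows: given the external URI from a discovered link, build a SPARQL query for all of its properties and the corresponding request URL. This is only an illustration under stated assumptions. The endpoint address, the resource URI, and both helper functions are hypothetical, and no request is actually sent; the `query` parameter is the one defined by the SPARQL protocol.

```python
from urllib.parse import urlencode


def build_property_query(resource_uri: str) -> str:
    """SPARQL query selecting every property/value pair of one resource."""
    return f"SELECT ?property ?value WHERE {{ <{resource_uri}> ?property ?value . }}"


def build_request_url(endpoint: str, query: str) -> str:
    """GET URL that passes the query in the SPARQL protocol's 'query' parameter."""
    return f"{endpoint}?{urlencode({'query': query})}"


# Hypothetical values, used only for illustration.
external_uri = "http://example.org/external/publication/abc"
endpoint = "http://example.org/sparql"

query = build_property_query(external_uri)
url = build_request_url(endpoint, query)
print(query)
print(url)
```

The returned property/value pairs would then be filtered for the fields of interest (journal, publisher, conference, and so on) before being stored internally.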

Adding the links
• Advantage
  – Following links leads to all further information provided by other data publishers
  – Minimum of effort needed to include the discovered links
  – Automatically up to date if external data providers change their data
• Disadvantage
  – Reliance on the external data provider (e.g. if URIs are changed)
  – Dereferencing of the link (Web representation, RKB Explorer, XML representation)

Performing a query to retrieve data
• Advantage
  – All information is stored internally
  – No reliance on the external data provider
• Disadvantage
  – More effort needed for designing a query
  – Not up to date if external data providers change their data
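Storing the retrieved metadata internally amounts to merging two records. The sketch below assumes both records are plain dictionaries and that internal values take precedence; the `enrich` helper and the sample values are hypothetical.

```python
def enrich(internal_record: dict, external_record: dict) -> dict:
    """Return a copy of the internal record with missing fields filled from the external one."""
    enriched = dict(internal_record)
    for field, value in external_record.items():
        # Keep the internal value when a field exists in both records.
        enriched.setdefault(field, value)
    return enriched


# Hypothetical records, used only for illustration.
internal_record = {"title": "An Example Paper", "authors": ["A. Author"], "date": "2012"}
external_record = {"title": "An example paper", "journal": "Example Journal", "publisher": "Example Press"}

enriched = enrich(internal_record, external_record)
print(enriched)
```

Because the merged copy is a snapshot, it reflects the external source only as of the time of the query, which is the trade-off named on the slide.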

Silk – lessons learned
• Silk Usability
  – Silk Workbench is very well structured and intuitive to use
  – The drag-and-drop functionality is very user friendly, and connecting two properties with a comparator is straightforward
  – Silk has its own syntax for defining linkage rules
  – Loading big RDF dumps takes a long time, and no progress bar is shown
  – If no links are found, Silk just displays an empty screen without any message
• Silk Results
  – Each dataset was compared with itself: Silk found all matches easily
  – Two datasets with different schemas but the same resources: Silk found all matches, but defining linkage rules was not straightforward
  – Comparing more than two properties often resulted in an error message stating that Silk was not able to execute queries in parallel
  – Silk’s linkage learning function did not work

Conclusion
• Datasets from all involved data sources have to be known (on schema and instance level)
• Know-how in RDF, Linked Data, link discovery tools, and SPARQL is needed for good and effective enrichment
• “Computer Science Papers” is a good demonstration use case, but how does the approach work with data from other domains?

Questions and Discussion
Thank You
