A RESTful JSON-LD Architecture for Unraveling Hidden References to Research Data
Konstantin Baierer, Philipp Zumstein
Mannheim University Library
SWIB15, 2015-11-24
Overview
● Context (data citations), problem description
● Project InFoLiS: Overview
● Technical Architecture
● Demo
InFoLiS project (Integration of research data and literature): 2nd funding phase
Data Citation
● Research data = raw data, intermediate results in the research process
– Your own research data
– Research data from a data provider
– Data from official statistics
– Research data from your colleague
● Citation = formal structured reference to another scholarly work
● Data Citation = formal structured reference to research data
Début of Data Citation
● When was the first structured data citation used in a publication?
– Maybe around the year 2000? (Send your suggestion to @infolis_project)
● Timeline: Printing Revolution (around 1450), WWW (1991), DataCite (2009)
● When was the first unstructured reference to research data used in a publication?
– 1609 or before (proof follows ...)
First Unstructured “Data Citation”
● Kepler (1609): Astronomia nova
● Title: “New Astronomy, Based upon Causes, or Celestial Physics, Treated by Means of Commentaries on the Motions of the Star Mars, from the Observations of Tycho Brahe”
● Author: Johannes Kepler (1571-1630)
● Cites data from: Tycho de Brahe (1546-1601)
Data Citation Principles
● Joint Declaration of Data Citation Principles:
1. Importance
2. Credit and Attribution
3. Evidence
4. Unique Identification
5. Access
6. Persistence
7. Specificity and Verifiability
8. Interoperability and Flexibility
● Currently 100 institutional supporters (39 data centers, 17 publishers, 26 societies and others)
Data Citation Format
● Format suggested by DataCite:
creator (publication year): title. version. publisher. resource type. identifier
● Example:
Rattinger, Hans; Roßteutscher, Sigrid; Schmitt-Beck, Rüdiger; Weßels, Bernhard (2012): Wahlkampf-Panel (GLES 2009). Version: 3.0.0. GESIS Datenarchiv. Dataset. doi:10.4232/1.11131
● Data citation guidelines are included in APA style, NLM*, CMoS*, American Sociological Review, The American Economic Review, …
(*) at handles databases
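The suggested format can be sketched as a small formatter. This is a minimal illustration, not DataCite tooling: the class name `DataCitation` and its field names are assumptions chosen to mirror the elements listed on the slide, and the "Version:" label follows the example citation above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataCitation:
    """Hypothetical container for the elements of the suggested format."""
    creators: List[str]
    year: int
    title: str
    version: str
    publisher: str
    resource_type: str
    identifier: str

    def format(self) -> str:
        # creator (publication year): title. version. publisher. resource type. identifier
        creators = "; ".join(self.creators)
        return (
            f"{creators} ({self.year}): {self.title}. "
            f"Version: {self.version}. {self.publisher}. "
            f"{self.resource_type}. {self.identifier}"
        )


gles = DataCitation(
    creators=[
        "Rattinger, Hans",
        "Roßteutscher, Sigrid",
        "Schmitt-Beck, Rüdiger",
        "Weßels, Bernhard",
    ],
    year=2012,
    title="Wahlkampf-Panel (GLES 2009)",
    version="3.0.0",
    publisher="GESIS Datenarchiv",
    resource_type="Dataset",
    identifier="doi:10.4232/1.11131",
)
print(gles.format())
```

Printing `gles.format()` reproduces the example citation from the slide.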
But in practice...
● Table 1: Population forecast for Germany depending on age cohorts – proportion in percent. Data base: 10th Population Forecast of the Federal Statistical Office.
● It already refers to the IGLU study, according to which ten-year-olds in Germany perform significantly better in an international comparison of reading literacy than fifteen-year-olds.
● For this purpose, data from the Socio-Economic Panel (SOEP) of the years 1990 and 2003 are used, and for both periods the impact factors are estimated using linear regression models.
Processing Steps
● Detect data citations in running (full) text
● Resolve and normalize data citations
– IGLU = Internationale Grundschul-Lese-Untersuchung
– SOEP = Socio-Economic Panel = Sozio-oekonomische Panel = Sozioökonomische Panel
● Uniquely identify data citations
– IGLU 2001, IGLU 2006 or IGLU 2011?
● Find the cited research data
– URL
– location
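The resolve-and-normalize step can be illustrated with a lookup table. This is a toy sketch under stated assumptions, not the InFoLiS resolver: the `ALIASES` table contains only the name variants from the slide, and `resolve` is a hypothetical helper that separates a year from the name so that "IGLU 2006" can be told apart from "IGLU 2011".

```python
import re
from typing import Optional, Tuple

# Assumed alias table, filled only with the variants named on the slide.
ALIASES = {
    "iglu": "IGLU",
    "internationale grundschul-lese-untersuchung": "IGLU",
    "soep": "SOEP",
    "socio-economic panel": "SOEP",
    "sozio-oekonomische panel": "SOEP",
    "sozioökonomische panel": "SOEP",
}

# A four-digit year starting with 19 or 20.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def resolve(mention: str) -> Tuple[Optional[str], Optional[int]]:
    """Return (canonical name, year) for a textual dataset mention.

    Either element is None when it cannot be determined.
    """
    match = YEAR.search(mention)
    year = int(match.group()) if match else None
    # Remove years, collapse whitespace, and compare case-insensitively.
    name = " ".join(YEAR.sub("", mention).split()).lower()
    return ALIASES.get(name), year


print(resolve("IGLU 2006"))
print(resolve("Sozioökonomische Panel"))
```

A mention without a year resolves to a name only, which is exactly the ambiguity the "uniquely identify" step has to settle.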
InFoLiS Project
● Automating these processing steps, i.e. automatically unraveling hidden references (in running text) to research data into structured data citations with URIs
● Flexible and long-term sustainable infrastructure
InFoLiS Project – more in depth
● Data
● Algorithms: Data Mining, Bootstrapping
● Model: Structure and Semantics
● Technical Architecture: LOD + RESTful API
● Integration
Integration
● Q: “How to best incorporate data connections into library catalogs?” (Horizon Report – Discovery System 2014 Library Edition)
● Possible integration points: Search, Journal website, Data Repository
● Q: Where and how is the integration of data citations most useful for our users?
Different Agents want different data
[Architecture diagram]
● Internal API (simple HTTP API, application/json): Text Extraction, Pattern Learning, Reference Extraction, Link Generation, File Storage, Resource Storage
● Public API: REST API, JSON-LD ↔ RDF, OAI/PMH ?, RSS/Atom ?
● Media types: application/json, application/ld+json, application/schema+json, text/turtle, application/rdf+xml, ...
● Agents: Linked Data Agent, API Explorer, RD / OA Repository, Publisher, Browser Plugin, RDF Explorer, Bulk Tool, CLI
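Serving different agents from one API usually comes down to HTTP content negotiation on the Accept header. The sketch below is a simplified illustration, not the InFoLiS server code: `SUPPORTED`, `parse_accept` and `negotiate` are hypothetical names, the default of application/json is an assumption, and partial wildcards such as `application/*` are deliberately ignored.

```python
from typing import List, Optional, Tuple

# Media types named on the slide; which ones the real server offers is an assumption.
SUPPORTED = [
    "application/json",
    "application/ld+json",
    "text/turtle",
    "application/rdf+xml",
]


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media type, q) pairs, best first."""
    entries = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        quality = 1.0  # q defaults to 1 when absent
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0  # treat a malformed q as "not acceptable"
        entries.append((media_type, quality))
    # sorted() is stable, so equal q-values keep their header order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def negotiate(header: str, default: str = "application/json") -> Optional[str]:
    """Pick the serialization to send, or None if nothing acceptable is supported."""
    if not header.strip():
        return default
    for media_type, quality in parse_accept(header):
        if quality <= 0:
            continue
        if media_type == "*/*":
            return default
        if media_type in SUPPORTED:
            return media_type
    return None


print(negotiate("text/html;q=0.9, application/ld+json;q=0.8"))
```

A `None` result would typically be answered with HTTP status 406 (Not Acceptable).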
API Usability over Semantic Depth
● Easy to maintain
● Easy to consume
● JSON
● RESTful (ish)
● Possible to understand
● Native ordered lists
● Protocol-independent
● High performance
● Serialization-independent
● Deterministic structure
● Easy to implement in code
Main Operations in InFoLiS
● Text Extraction (Speed > Semantics)
– Extracting text from PDF
– Reducing noise
● Bootstrapping (Speed > Semantics)
– Learning patterns of data citations in natural languages
– Multiple levels of recursion
● Pattern Application (Speed > Semantics)
– Extracting dataset candidates from text
● Dataset Resolution (Semantics > Speed)
– Identifying textual references with the datasets they represent
– Automating intuition
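Pattern application can be pictured as running a set of patterns over extracted text. The sketch below uses a single hand-written regular expression as a stand-in; in InFoLiS the patterns are learned by bootstrapping, so `PATTERNS` and `extract_candidates` are purely illustrative assumptions.

```python
import re
from typing import List, Optional, Tuple

# One hand-written pattern as a stand-in for learned patterns (assumption).
# It matches "data from the <Capitalized Name>" with an optional "(ABBR)".
PATTERNS = [
    re.compile(
        r"data from the (?P<name>[A-Z][\w-]*(?: [A-Z][\w-]*)*)"
        r"(?: \((?P<abbr>[A-Z]+)\))?"
    ),
]


def extract_candidates(text: str) -> List[Tuple[str, Optional[str]]]:
    """Return (name, abbreviation) pairs for every pattern match in the text."""
    found = []
    for pattern in PATTERNS:
        for match in pattern.finditer(text):
            found.append((match.group("name"), match.group("abbr")))
    return found


sentence = (
    "For this purpose, data from the Socio-Economic Panel (SOEP) of the "
    "years 1990 and 2003 are used."
)
print(extract_candidates(sentence))
```

The candidates found this way are only strings; linking them to actual datasets is the separate Dataset Resolution step.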
Deep modelling has its merit!
● Modelling dataset granularity
– Single issue of annual dataset?
– Single panel of multi-faceted survey?
● Modelling dataset reference vagueness
– “As the results of our study indicate ...”
– “According to page 15 of the DERP panel …”
● Bibliometric analyses
– Spanning a graph of publications, datasets, people …
● Provenance mining
– Which patterns are found in different learning sets?
– Text A sameAs Text B
– PDF A textEquals PDF B
How to get the best out of both worlds?
Deep + KISS Modelling
Frontend architecture
[Architecture diagram]
● Storage: MongoDB, Mongoose, Mongoose Schema
● Schema: TSON, Mongoose-Ontology Mapper
● Handlers: JSON Schema handler, REST API handler, Triple Pattern handler, Ontology handler
● HTTP server: RDF / JSON content negotiation
Extract from TSON-file
[Annotated code screenshot]
● RDF Class: infolis:Execution
● RDF Property: infolis:algorithm
● RDF Property: infolis:log
● Database schema
● for Presentation
TSON = Turtleson = json-ld + json-schema in Turtle + CoffeeScript
One schema to rule them all
● Ontology
● Database schema
● [Linked Data Fragments]
● REST API
● Data model explorer
● REST API documentation
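The idea of deriving several artifacts from one schema can be sketched in a few lines. This is not TSON and not the InFoLiS mapper: `SCHEMA` is a plain Python dict standing in for a TSON file, the namespace URI is a placeholder, and the property types (`string`, array of strings) are assumptions; only the names infolis:Execution, infolis:algorithm and infolis:log come from the slides.

```python
from typing import Any, Dict

# Placeholder namespace; the real InFoLiS ontology URI is not given here.
INFOLIS = "http://example.org/infolis#"

# Hypothetical single source of truth. Property types are assumed.
SCHEMA: Dict[str, Dict[str, Dict[str, Any]]] = {
    "Execution": {
        "algorithm": {"type": "string"},
        "log": {"type": "array", "items": {"type": "string"}},
    }
}


def jsonld_context(schema: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Derive a JSON-LD context mapping every class and property to a compact IRI."""
    context: Dict[str, Any] = {"infolis": INFOLIS}
    for class_name, properties in schema.items():
        context[class_name] = f"infolis:{class_name}"
        for property_name in properties:
            context[property_name] = f"infolis:{property_name}"
    return {"@context": context}


def json_schema(schema: Dict[str, Dict[str, Any]], class_name: str) -> Dict[str, Any]:
    """Derive a JSON Schema fragment for validating documents of one class."""
    return {"type": "object", "properties": dict(schema[class_name])}


print(jsonld_context(SCHEMA))
print(json_schema(SCHEMA, "Execution"))
```

The same dict could in principle also feed a database schema or API documentation, which is the point of keeping one definition.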
Demonstration
● Discover the InFoLiS data model
Demonstration
● API: graphical interface
● API on the command line
Thank you for your attention! Questions?
Keep in touch:
● E-mail: {baierer, zumstein}@bib.uni-mannheim.de
● Twitter: @infolis_project
● Homepage (Info, API, Tools, … it's in rapid development): http://infolis.github.io/
● All InFoLiS software is open source: http://github.com/infolis