Implementation of Open-World, Integrative, Transparent, Collaborative Research Data Platforms: the University of Things (UoT) Prof. Peter Fox (pfox@cs.rpi.edu, @taswegian, #twcrpi, ORCID: 0000-0002-1009-7163) Tetherless World Constellation Chair, Earth and Environmental Science/ Computer Science/ Cognitive Science/ IT and Web Science Rensselaer Polytechnic Institute, Troy, NY USA And the Deep Carbon Observatory Data Science Team CGA, Harvard, April 27, 2018
What to expect… • Inevitable context, history + perspective • Deep Carbon Observatory (Integration and Collaboration) – Data Science Platform for an international science community – Lots of RED, <WHITE> and BLACK • data.rpi.edu V2 (Integration and Transparency) • Where we are headed – Integration, Transparency and Collaboration – University Infrastructure for Data Science
Working premise :== Mission Statement Scientists – actually ANYONE - should be able to access a global, distributed knowledge base of scientific data and information that: • appears to be integrated • appears to be locally available • is in a language (written, programming, or science) that is understandable and can be shared Data intensive – volume, complexity, mode, scale, heterogeneity, … in an OPEN WORLD
Deep Carbon Observatory (DCO) … • “We are dedicated to achieving transformational understanding of carbon’s chemical and biological roles in Earth.” www.deepcarbon.net
Collaboration and Integration needs … • “Enable DCO team leaders to create new groups and associate a number of content types --- documents, discussions, blog posts, tasks, links, and bibliographic entries --- with the group, as well as simple event management (a private event calendar for the group) and embedding of external services (e.g. and esp. Google Calendar)” … more… (data, publications, projects)… a Knowledge Network … and a Virtual Organization (> 1000 people)
Ecosystem metaphor [Figure labels: Producers, Consumers; Experience; Data, Information, Knowledge; Creation, Gathering, Presentation, Organization, Integration, Conversation; Context]
-> DCO Data Science Platform 2012
Science Network of Things (Objects) 2012 deepcarbon.net info.deepcarbon.net data.deepcarbon.net dx.deepcarbon.net
2012
2015
Dataset Browser, People, Field Sites 2015
All information is linked and traceable! 2013
2014
State to date… 2014 • Knowledge network – implements both the collaboration and the integration, reporting implements the transparency – It’s being USED • Many means of population – User generation – Machine generation • Contributing these enhancements back to open-source communities (CKAN, VIVO)
There’s more – Jupyter notebooks on top 2016 And this: https://news.rpi.edu/content/2018/04/23/applying-network-analysis-natural-history
data.rpi.edu (V2) 2013
Insert data.rpi screen shots 2013
2013
2013 Internal transparency and integration
Thus… • Integrative – semantics • Transparent – semantics • Collaborative – semantics • Application integration – Yep – semantics • So… where are we headed?
Research-grade but not “University-grade” • Adoption of RDA outputs/recommendations – Data Type Registry – Permanent ID Types – Dynamic Data Citation* – Scholix* • Improvements to VIVO • Science network of things • CIOs approach – “We only run the applications we know how to run” • Library (not a research library) – Helped to start – Hurt in University adoption (hope^) * underway ^ New Library Director
Progress toward a University of Things • pfox@cs.rpi.edu and the DCO Data 2018 Science Team • @taswegian #twcrpi • http://tw.rpi.edu • http://tw.rpi.edu/web/project/DCO-DS • http://deepcarbon.net
Garden shed
Framework v. systems v. platforms • Rough definitions – Systems have very well-defined entry and exit points. A user tends to know when they are using one. Options for extensions are limited and usually require engineering – Frameworks have many entry and use points. A user often does not know when they are using one. Extension points are part of the design – Platforms ~ arise from frameworks
2013 VIVO Extension: Shibboleth Single-Sign-On
VIVO Extension: Dataset deposit in attached data repository 2012 • Includes multi-level metadata collection • Includes persistent identifier (DCO-ID generation) • Includes interaction with dedicated repository OR accepts URL to third-party deposit [Flowchart, node labels: Begin; Need DCO-ID?; Generate & register DCO-ID (unique suffix, blank URL); Revise metadata; Data deposit?; External data?; Collect CKAN metadata; Revise CKAN metadata; Add URL (to data in external repository); Deposit in CKAN & generate URL to data & CKAN metadata; Review DCO-ID & DCO-ID metadata; Update DCO-ID (map the DCO-ID details to CKAN URL); Update DCO-ID record; End. Outcomes: Object without data URL; Data Deposited (DCO data or URL to external data). Decision branches are labeled YES/NO.]
We identify ‘everything’ = DCO-ID 2012 • Two-part: all objects are issued Handles, and all published objects are also issued DOIs – DCO issues Handles; the registration number (Handle prefix) is 11121 – We obtain DOIs from DataCite – If it is a person, we support ORCID, ResearcherID, ScopusID, eRA Commons, etc. • You may see (note EPIC-style identifier syntax): http://hdl.handle.net/11121/5676-3964-8313-5126-CC and http://dx.deepcarbon.net/11121/5676-3964-8313-5126-CC • E.g., adding a bibliography is easy: just enter the DOIs or paste a BibTeX record, and we do the rest; same for people (ORCID, ResearcherID, etc.) -> open world – linked to other sources
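The two URL forms on this slide share one handle (prefix 11121 plus an EPIC-style suffix). A minimal Python sketch of that relationship follows, assuming only what the slide shows: the two resolver base URLs and the prefix/suffix split. The names `DcoId` and `parse_dco_id` are hypothetical and are not part of the DCO platform; the sketch does not validate the internal structure of the suffix.

```python
from dataclasses import dataclass
from typing import Tuple

# Base URLs taken from the slide; everything else here is illustrative.
HANDLE_PROXY = "http://hdl.handle.net"
DCO_RESOLVER = "http://dx.deepcarbon.net"


@dataclass(frozen=True)
class DcoId:
    """A handle split into its prefix (e.g. 11121) and suffix."""
    prefix: str
    suffix: str

    @property
    def handle(self) -> str:
        return f"{self.prefix}/{self.suffix}"

    def urls(self) -> Tuple[str, str]:
        # Same handle, two resolvers: the global Handle proxy and the DCO one.
        return (f"{HANDLE_PROXY}/{self.handle}", f"{DCO_RESOLVER}/{self.handle}")


def parse_dco_id(text: str) -> DcoId:
    """Accept a bare handle or either resolver URL form (hypothetical helper)."""
    value = text.strip()
    for base in (HANDLE_PROXY, DCO_RESOLVER):
        if value.startswith(base + "/"):
            value = value[len(base) + 1:]
            break
    prefix, sep, suffix = value.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix handle: {text!r}")
    return DcoId(prefix, suffix)


if __name__ == "__main__":
    dco_id = parse_dco_id("http://dx.deepcarbon.net/11121/5676-3964-8313-5126-CC")
    for url in dco_id.urls():
        print(url)
```

Keeping the handle separate from the resolver base means either URL form can be regenerated from the stored identifier.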
2013
2013 VIVO Extension: Retrieval of DOI metadata for publications from CrossRef * Expedites entry of e.g. journal articles by retrieving metadata based on DOI * Preserves author rank
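A rough Python sketch of DOI-based metadata retrieval, for illustration only: the slides do not show how the VIVO extension is implemented, so this assumes the public Crossref REST API (`https://api.crossref.org/works/{doi}`) and its JSON fields `title`, `container-title`, and `author`. The function names, the User-Agent string, and the sample record in the usage section are hypothetical. Author rank is taken from the order of the `author` list in the response.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

CROSSREF_WORKS = "https://api.crossref.org/works/"


def fetch_crossref_record(doi: str, timeout: float = 10.0) -> dict:
    """Fetch the Crossref 'message' object for a DOI (requires network access)."""
    url = CROSSREF_WORKS + quote(doi, safe="/")
    request = Request(url, headers={"User-Agent": "doi-metadata-sketch/0.1"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)["message"]


def summarize(record: dict) -> dict:
    """Reduce a Crossref record to the fields a publication entry form needs."""
    authors = []
    # Rank is the 1-based position in the author list, so order is preserved.
    for rank, author in enumerate(record.get("author", []), start=1):
        parts = (author.get("given"), author.get("family"))
        name = " ".join(p for p in parts if p) or author.get("name", "")
        authors.append({"rank": rank, "name": name})
    titles = record.get("title") or [""]
    journals = record.get("container-title") or [""]
    return {
        "doi": record.get("DOI", ""),
        "title": titles[0],
        "journal": journals[0],
        "authors": authors,
    }


if __name__ == "__main__":
    # Made-up record in the Crossref shape, so the sketch runs offline.
    sample = {
        "DOI": "10.1234/example",
        "title": ["An Example Article"],
        "container-title": ["Journal of Examples"],
        "author": [
            {"given": "A.", "family": "First", "sequence": "first"},
            {"given": "B.", "family": "Second", "sequence": "additional"},
        ],
    }
    print(summarize(sample))
```

Separating the network call from the summarizing step lets the ranking logic be checked against a saved record without contacting Crossref.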
Core and Framework Semantics - Multi-tiered interoperability 2012 Mediation! Mediation! Mediation!
Recommend
More recommend