From Open Data to Open Science Geoffrey Boulton University of Edinburgh & CODATA “Learn” Workshop University College, London January 2016
Knowledge and understanding, the engines of material progress, depend on technologies that enable their accumulation and communication. [Images labelled 1454 and 2002]
Openness – the bedrock of science in the modern era Henry Oldenburg
Scientific self correction
The Challenge: the “Data Storm” is undermining “self correction” THEN AND NOW
A crisis of reproducibility and credibility? Why such low levels of reproducibility? • Misconduct/fraud • Invalid reasoning • Absent or inadequate data and/or metadata
The digital revolution: global information storage capacity, in optimally compressed bytes (1 exabyte = 10^18 bytes). [Figure: analogue versus digital storage and the explosion of the digital revolution; years shown: 1986, 1993, 2000, 2007, 2014; labelled values: 280 exabytes, and 4000 exabytes for 2014.] Based on: http://www.martinhilbert.net/WorldOnfoCapacity.html
Data acquisition: Cost down – Flux up http://www.wired.co.uk/news/archive/2014-01/15/1000-dollar-genome/viewgallery/331679
Information: how much is crystallised into knowledge?
Reinventing reproducibility for the digital age How do we retain an essential principle? The data providing the evidence for a published concept MUST be concurrently published, together with necessary metadata and computer code. To do otherwise is scientific MALPRACTICE
Four key drivers of change for science • Big data • Semantically-linked data • Open data • Cost reduction [Images: micro-satellite; looking at clouds; ozone levels]
Pillars of the Digital Revolution • Big Data: volume, velocity, variety • Linked Data: many databases, semantic relations, deeper meaning • Open Data • Machine analysis & learning; text and data mining • Foundations: Openness
The opportunity: data from “simple” to complex systems, from uncoupled to highly coupled behaviour [Figure panels: uncoupled systems; simulating behaviour of highly coupled systems]
Scientific opportunities • patterns not hitherto seen • unsuspected relationships • complex systems, e.g. complexity: dynamic evolution and system state [Figure panels: simulating system dynamics (emergent behaviour of a specific 6-component coupled system); mapping a complex state (image of brain cells in a rat)]
The opportunity: data-modelling: iterative integration [Diagram labels: satellite observation; surface monitoring; initial conditions; model forecast; model-data iteration (forecast correction)]
System characterisations: from simple to complex • Simple relationships: classical statistics; linear regression • Complex systems: no mathematical pipeline; dynamic/complex behaviour (glucose in type II diabetes); topological analysis; cluster analysis
A barrier to openness? Analytic overload, e.g. the Global Earth Observation System of Systems. A disconnect between machine analysis & human cognition? • What is the human role? • Can we analyse & scrutinise what is in the black box, and who owns the box? • What does it mean to be a researcher in a data-intensive age?
New modes of technology-enabled creativity: Tim Gowers - crowd-sourced mathematics. Mathematics-related discussions, e.g. crowd-sourcing. An unsolved problem posed on his blog. 32 days – 27 people – 800 substantive contributions. Emerging contributions rapidly developed or discarded. Problem solved! “It’s like driving a car whilst normal research is like pushing it.” What inhibits such processes? The criteria for credit and promotion. ALTMETRICS THE ANSWER?
The Open Data Iceberg. Layers: Technology; Processes & Organisation; People. Challenges, from top to bottom: The Technical Challenge; The Consent Challenge; The Ecosystem Challenge; The Funding Challenge; The Support Challenge; The Skills Challenge; The Incentives Challenge; The Mindset Challenge. A National Infrastructure: motivation and ethos. Developed from: Deetjen, U., E. T. Meyer and R. Schroeder (2015). OECD Digital Economy Papers, No. 246, OECD Publishing.
The “Science International” Accord: principles of open data (www.icsu.org/science-international) Responsibilities 1-2. Scientists 3. Research institutions & universities 4. Publishers 5. Funding agencies 6. Scholarly societies and academies 7. Libraries & repositories 8. Boundaries of openness Enabling practices 9. Citation and provenance 10. Interoperability 11. Non-restrictive re-use 12. Linkability
Responsibilities: Scientists. i. Publicly funded scientists have a responsibility to contribute to the public good through the creation and communication of new knowledge, of which associated data are intrinsic parts. They should make such data openly available to others as soon as possible after their production in ways that permit them to be re-used and re-purposed. ii. The data that provide evidence for published scientific claims should be made concurrently and publicly available in an intelligently open form. This should permit the logic of the link between data and claim to be rigorously scrutinised and the validity of the data to be tested by replication of experiments or observations. To the extent possible, data should be deposited in well-managed and trusted repositories with low access barriers.
African Open Data/Open Science Platform. Shared infrastructure investment; shared good practice; capacity building; system development. [Diagram labels: International Standards; Platform Forum; Coordination Programmes; Flagship Co-Designed Data-Intensive Projects; Government; Priority setting; Funders; Infrastructure Funding; Roadmaps; Incentives; CODATA; Capacity Building; Training and Skills]
Disciplinary communities can lead the way, e.g. the Elixir programme in life sciences/bio-informatics. EMBL-EBI services: a collaborative enterprise. Labs around the world send us their data and we … • Archive it • Classify it • Share it with other data providers • Analyse, add value and integrate it • … provide tools to help researchers use it
Regional Platforms for Open Science. Shared investment in infrastructure; harvesting and circulating good ideas; spreading and supporting good practice; capacity building; promoting applications; linking to international programmes and standards. Asian Platform? African Platform? S. American Platform? Australian Platform
Open science: “science as a public enterprise”. Strands: Open data; Doing science openly; Open access. Open data: administrative data (held by public authorities, e.g. prescription data); public sector research data (e.g. Met Office weather data); research data (e.g. CERN, generated in universities). Doing science openly: collecting the data; doing research. Open access: research publications (i.e. papers in journals). Outputs; Inputs (communication/dialogue – joint production of knowledge). Stakeholders: Researchers - Govt & Public sector - Businesses - Citizens - Citizen scientists • Communication/dialogue must be audience-sensitive • Is it – with all stakeholder groups?
[Diagram labels: Mono/Multi, Inter, Transdisciplinary; Open Knowledge; Data / Publications; Open Science; Stakeholders; Researchers; Rigour; Innovation; Policy; Solutions]
A national data-intensive system: EXPLOITING THE DATA REVOLUTION [Diagram labels: knowledge output; scientific inference; open research data; big data analytics; institutional management & support; national policies & e-infrastructure]
International Research Data Collaboration. CODATA • Policies & practice • Frontiers of data science • Capacity Building WDS • Data stewardship • Data standards RDA • Interoperability
Why openness & sharing? 1. Maintaining “self-correction” 2. Open knowledge is creative & productive: “If you have an apple and I have an apple and we exchange these apples, then you and I will still each have one apple. But if you have an idea and I have an idea and we exchange these ideas, then each of us will have two ideas.” George Bernard Shaw 3. Open data enables semantic linking
Citizen Science • Openly collected science is already helping policy makers. • The AshTag app allows users to submit photos and locations of sightings of Chalara (ash dieback) to a team who will refer them on to the Forestry Commission, which is leading efforts to stop the disease's spread with the Department for Environment, Food and Rural Affairs (Defra). [Map: Chalara spread, 1992-2012]