Sustaining the Data Ecosystem – There is no free lunch but you still need to eat …
CCDSC 2016
Dr. Francine Berman
Chair, Research Data Alliance/US
Hamilton Distinguished Professor, RPI
Why does Sustainability Matter?
• Data drives discovery and innovation
• Sustainable data ecosystem necessary to support
  – Public access to research data
  – Use and re-use of data
  – Reproducibility of results
  – Data management plans
• Data stewardship and preservation fundamental: “Homeless” data ceases to exist
Social and Technical Approaches Both Needed for Sustainability
Sustainable development: "development that meets the needs of the present without compromising the ability of future generations to meet their own needs." (Our Common Future, U.N. Brundtland Commission)
• Key components
  – Ecological sustainability
  – Cultural sustainability
  – Economic sustainability
  – Political sustainability
[Diagram: ECONOMICS / Funding; ECOLOGY / Infrastructure; POLITICS / Stakeholder support; CULTURE / Community behavior]
Ecology / Infrastructure -- Making data available isn’t good enough
• Infrastructure needed to support data-driven research and innovation.
  – Data is not an asset if you don’t know what it means.
  – Data is not useful if you can’t find it.
  – Data needs to be in the right form for analysis.
  – Data needs to be preserved for results to be reproducible.
Technical and Social Infrastructure Needed to Support Data-Driven Research
Research questions:
• Who is at risk for asthma?
• How do we increase agricultural productivity?
• What will happen in an earthquake?
• How accurate is the Standard Model of Physics?
Supporting infrastructure:
• Data Discovery Tools
• Interoperability Frameworks
• Curation Practice and Policy
• Digital Object Identifiers
• Sustainable Economics
• Common Metadata Standards
• Data Analytics Algorithms
• Domain and Institutional Repositories
• Data Access and Distribution Policy
• Data Sharing Policy
• Data Citation Standards
• Auditing, Certification and Reporting Practice
Accelerating the building and coordinating better/more/useful data infrastructure – Research Data Alliance (RDA)
Research Data Alliance (RDA), rd-alliance.org:
• Global community-driven organization whose mission is to build and deploy social and technical infrastructure that enables data sharing.
• Membership: 4300+ from 110 countries, all sectors, and a broad spectrum of domains
• Broad community spanning “data consumers” to “data providers” including domain scientists, data scientists, data professionals, information scientists, librarians, computer scientists, technologists, policy makers, educators, etc.
• RDA Interest Groups – identify/explore data infrastructure needed to enable data-driven research
  – Domain Repositories Interest Group
  – Chemistry Research Data Interest Group
  – Legal Interoperability Interest Group
  – Health Data Interest Group
  – <You initiate> Interest Group
• RDA Working Groups – build and deploy infrastructure that addresses specific problems
  – Dynamic Data Citation Working Group
  – Wheat Data Interoperability Working Group etc.
  – <You initiate> Working Group
• Adopters – utilize RDA infrastructure to improve local environment for data sharing and data-driven research.
RDA focus: (70+) RDA Working Groups and Interest Groups fostering better Curation, Management, Stewardship and Use
SOLUTION: Social (Policy, Good Practice, Community Standards, Education, Awareness, etc. …) or Technical (Tools, frameworks, models, registries, portals, etc.)
BENEFICIARY: Data Provider or Data Consumer
Social/organizational solution aimed at data provider
• RDA/CODATA legal interoperability of data Interest Group
• Domain Repositories Interest Group
• Dynamic Data Citation Working Group
• Data Rescue Interest Group
• Ethics and Social Aspects of Data Interest Group
Social/organizational solution aimed at data consumer
• RDA/CODATA Summer Schools in Data Science and Cloud Computing in the Developing World Interest Group
• National Data Services Interest Group
• RDA/CODATA Materials Data, Infrastructure and Interoperability Interest Group
Technical solution aimed at data provider
• Data Type Registries Working Group
• Preservation e-infrastructure Interest Group
• Libraries for Research Data Interest Group
Technical solution aimed at data consumer
• Wheat Data Interoperability Working Group
• Digital practices in History and Ethnography Interest Group
• Marine Data Harmonization Interest Group
• Chemistry Research Data Interest Group
• BioSharing Registry Working Group
TAB Clustering slides adapted from Beth Plale
Economics / Funding – Who should pay the data bill and what do we need to support?
Data infrastructure costs increase with usage, stewardship and access requirements, perceived value
[Diagram labels: “Locally Manageable” Data; Long-lived data; Big data; Coupled data services; Access control, broad access; More mgt., stewardship required; More curation required; Greater costs at the extremes (including “big” data) …]
Data Center Costs include
• Security and failover systems
• Maintenance and upkeep
• Software tools and packages
• Utilities (power, cooling)
• Space
• Networking
• People (expertise, help, infrastructure management, development)
• Training, documentation
• Monitoring, auditing
• Reporting costs
• Costs of compliance with regulation, etc.
Why are Infrastructure Investments such a hard sell?
• Quantifying opportunity cost a challenge
• Hard to “market” compared to more urgent/newsworthy/short-term competing priorities
• Business model must be sustainable and address infrastructure refresh and evolution
Archival Storage Systems vs. Supercomputers
• Metrics of Success
  – Archival Storage Systems: High reliability; Minimal data loss and damage
  – Supercomputers: High Performance; good ranking on the Top500 list; application impact
• Next Generation Systems
  – Archival Storage Systems: Smooth migration for data key: Preservation collections must migrate to new media without loss of data or disruption to users
  – Supercomputers: Growth in capability/capacity key: Compatibility of systems not required although there should be application transition paths
• Funding Model
  – Archival Storage Systems: No gaps. Funding must be available for continuous support of data collections
  – Supercomputers: Serial “one time” funding for each new HPC resource possible
There’s no free lunch but you still have to eat
How can we pay for/sustain research data and infrastructure?
• Academic Sector: Create sustainable university library and domain repository stewardship options
• Private Sector: Facilitate private sector stewardship of public access research data as a public good
• Public Sector: Clarify public sector stewardship commitments: articulate what data will / won’t be supported
• Individuals
  – Evolve research culture to adapt what works in the private sector
  – Charge low-barrier-to-access fees for data / Advertise / Subscribe
[Diagram: sectors arranged from “Not govt. supported” to “Govt. supported”]
Public access version at http://www.cs.rpi.edu/~bermaf/
Culture / Community behavior – How can we minimize risk for valued open data?
• How much public research data is at risk?
• U.S. National Institutes of Health estimates for 2011 PubMed Central publications:
  – 12% of publication data sets deposited in recognized repositories, 88% of the data sets were invisible
  – Estimated approximately 200,000-235,000 invisible data sets generated by NIH work published in 2011
  – 87% of the invisible data sets are new, 13% reflect data re-use
  – More than 50% of the datasets were derived from live human or animal subjects
• Community practice key to sustaining the data ecosystem
[Graphic labels: (Valued) Sponsored Research Data; Sustainable stewardship; At Risk Gap]
Information from PLOS ONE http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0132735
Graphic from http://www.colorado.edu/ibs/cupc/stewardship_gap/
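To give a sense of scale for the NIH figures above, here is a minimal arithmetic sketch in Python. It assumes the 87% / 13% new versus re-use split applies uniformly across the estimated range of 200,000 to 235,000 invisible data sets. The function name `split_invisible` and the constant names are illustrative only and do not come from the source.

```python
# Figures quoted on the slide for NIH work published in 2011.
INVISIBLE_LOW = 200_000   # lower bound of estimated invisible data sets
INVISIBLE_HIGH = 235_000  # upper bound of estimated invisible data sets
NEW_PERCENT = 87          # share of invisible data sets that are new data


def split_invisible(count):
    """Split a count of invisible data sets into (new, re-used).

    Assumption: the slide's 87% / 13% split applies uniformly to `count`.
    Integer arithmetic is used so the two parts always sum to `count`.
    """
    new = count * NEW_PERCENT // 100
    reused = count - new
    return new, reused


if __name__ == "__main__":
    for bound in (INVISIBLE_LOW, INVISIBLE_HIGH):
        new, reused = split_invisible(bound)
        print(f"{bound} invisible data sets: {new} new, {reused} re-used")
```

Under that assumption, roughly 174,000 to 204,000 of the invisible data sets would be newly collected data with no recognized repository behind them.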