VO Sandpit, November 2009 Are you sitting comfortably? VO Sandpit, - PowerPoint PPT Presentation

Datasets: from creation to publication, or “A tale of two datasets”
Sarah Callaghan* [sarah.callaghan@stfc.ac.uk] @sorcha_ni
LCPD13 Workshop, 26 September 2013, Valletta, Malta
* and a lot of others, including, but not limited to: the Chilbolton Group, the NERC data citation and publication project team, the PREPARDE project team and the CEDA team

Are you sitting comfortably?

Creating data: a radio propagation dataset
The problem: rain and cloud mess up your satellite radio signal. How can we fix this?
Italsat F1: Owned and operated by the Italian Space Agency (ASI). Launched January 1991, ended operational life January 2001.

The receive cabin at Sparsholt in Hampshire.
Inside the receive cabin – the instruments my data came from.

Creating/processing data
One day's worth of raw data from one of the receivers.
My job was to take this... ...turn it into this....

Analysing data
… a process which involved 4 major steps, 4 different computer programmes, and 16 intermediate files for each day of measurements. Each month of preprocessed data represented somewhere between a couple of days and a week's worth of effort. It was a job where attention to detail was important, and you really had to know what you were looking at from a scientific perspective.
...with the final result being this.

Example documentation
Note the software filenames in the documentation. I still have the IDL files on disk somewhere, but I'd be very surprised if they're still compatible with the current version of IDL.

I started work on this project in 1999. In 2006 (five years after the dataset was finished) we finally got a journal publication out of it:
Ventouras, S., S. A. Callaghan, and C. L. Wrench (2006), Long-term statistics of tropospheric attenuation from the Ka/U band ITALSAT satellite experiment in the United Kingdom, Radio Sci., 41, RS2007, doi:10.1029/2005RS003252.
It's been cited twice, both times by me.

Publications – grey literature

Publications – journal paper
Where's the data?

Preserving data (the wrong way!)
Part of the Italsat data archive – on CDs on a shelf in my office.

What the processed data set looks like on disk.
What the raw data files looked like. (I do have some Word documents somewhere which describe what all this is…)

What it all came down to:
And I wasn't even preserving my data properly!
(Composite image from Flickr user bnilsen and Matt Stempeck (NOI), shared under Creative Commons license.)

Good news: the data is all on the BADC now

Data creation and management is hard work. But not everyone understands.
("Piled Higher and Deeper" by Jorge Cham, www.phdcomics.com)

Why bother linking the data to the publication?
Surely the important stuff is in the journal paper?
If you can't see/use the data, then you can't test the conclusions or reproduce the results! It's not science!

The Data Publication Pyramid
Layers, from top to bottom: Publications with data; Processed Data and Data Representations; Data Collections and Structured Databases; Raw Data and Data Sets.
(1) Data contained and explained within the article
(2) Further data explanations in any kind of supplementary files to articles
(3) Data referenced from the article and held in data centers and repositories
(4) Data publications, describing available datasets
(5) Data in drawers and on disks at the institute

The Pyramid's likely short term reality:
Layers, from top to bottom: Pubs; Supps; Data Archives; Data on Disks and in Drawers.
(1) Top of the pyramid is stable but small
(2) Risk that supplements to articles turn into Data Dumping places
(3) Too many disciplines lack a community endorsed data archive
(4) Estimates are that at least 75% of research data is never made openly available

The Ideal Pyramid
Layers, from top to bottom: Data In Publications; Article Supps; Data Archives; Data on Disks and in Drawers.
(1) More integration of text and data, viewers and seamless links to interactive datasets
(2) Only if data cannot be integrated in article, and only relevant extra explanations
(3) Seamless links (bi-directional) between publications and data, interactive viewers within the articles
(4) More Data Journals that describe datasets, data mgt plans and data methods

Compare and contrast 2 datasets
Italsat dataset: Collect data → Process data → Analyse data → Publish journal paper → … → Publish dataset on BADC
GBS dataset: Collect data → Process data → Archive data in BADC → Publish dataset in a data journal → Analyse data → … → Publish journal paper

What is a data journal?
The traditional online journal model:
1) Author prepares the paper using word processing software with the journal template.
2) Author submits the paper as a PDF/Word file to the journal (any online journal system).
3) Reviewer reviews the PDF file against the journal's acceptance criteria.
Overlay journal model for publishing data:
1) Author prepares the data paper using word processing software with the journal template, and the dataset using appropriate tools.
2a) Author submits the data paper to the journal (Geoscience Data Journal).
2b) Author submits the dataset to a repository (BODC, BADC).
3) Reviewer reviews the data paper and the dataset it points to against the journal's acceptance criteria.

What is a data article?
A data article describes a dataset, giving details of its collection, processing, software, file formats, etc., without the requirement of novel analyses or ground-breaking conclusions.
• It covers the when, how and why the data was collected, and what the data product is.

Why bother publishing the dataset in a data journal?
Why not just publish a normal journal paper citing the data?
Data Journals:
• Peer-review the data
• Publish negative results
• Make it quicker to publish the data as they don't require analysis or novelty – the dataset is published “as-is”
• Provide attribution and credit for the data collectors who might not be involved with the analysis
• Make it easier to find datasets, understand them and be sure of their quality and provenance.

Live Data Paper in Geoscience Data Journal!
The dataset citation is the first thing in the paper (after the abstract) and is also included in the reference list (to take advantage of citation count systems).
DOI: 10.1002/gdj3.2

Linking between data and publications = Citing Data
• We already have a working method for linking between publications which is:
  • commonly used
  • understood by the research community
  • used to create metrics to show how much of an impact something has (citation counts)
  • applied to digital objects (digital versions of journal articles)
• We can extend citation to other things like:
  • data
  • code
  • multimedia
And the best bit is, researchers don't need to learn a new method of linking – they cite like they normally would!
http://www.naa.gov.au/records-management/capability-development/keep-the-knowledge/index.aspx
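As a rough illustration of citing a dataset the same way as a paper, the sketch below assembles a citation string from a handful of metadata fields. The field names, the citation layout and every example value (including the DOI) are assumptions made up for illustration; they are not taken from the talk, from any repository, or from a particular style guide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetCitation:
    """Minimal metadata for citing a dataset (field names are illustrative)."""
    creators: List[str]
    year: int
    title: str
    publisher: str  # the data centre or repository holding the data
    doi: str        # persistent identifier, without the resolver prefix

    def format(self) -> str:
        """Return a human-readable citation ending in a resolvable DOI link."""
        authors = "; ".join(self.creators)
        return (f"{authors} ({self.year}): {self.title}. "
                f"{self.publisher}. https://doi.org/{self.doi}")


# Hypothetical placeholder values, not a real dataset or DOI.
example = DatasetCitation(
    creators=["Researcher, A.", "Collector, B."],
    year=2013,
    title="Example radio propagation measurements",
    publisher="Example Data Centre",
    doi="10.1234/example-dataset",
)
citation_text = example.format()
print(citation_text)
```

The resulting string can sit in a reference list next to ordinary article citations, which is the point of the slide: nothing new for the researcher to learn.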

Out of Cite, Out of Mind: Report of the CODATA Task Group on Data Citation
The report was published by the CODATA Data Science Journal on 13 September 2013.
https://www.jstage.jst.go.jp/article/dsj/12/0/12_OSOM13-043/_article

First Principles for Data Citation
1. Status of Data: Data citations should be accorded the same importance in the scholarly record as the citation of other objects.
2. Attribution: A citation to data should facilitate giving scholarly credit and legal attribution to all parties responsible for those data.
3. Persistence: Citations should refer to objects that persist.
4. Access: Citations should facilitate access to data by humans and by machines.
5. Discovery: Citations should support the discovery of data and their documentation.
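Some of these principles can be turned into simple checks on a citation record. The sketch below is one possible reading, not part of the CODATA report: the record keys (`creators`, `doi`, `landing_page`, `title`, `documentation`) and the mapping from each principle to a check are assumptions, and principle 1 (status) is a matter of community practice that a script cannot test.

```python
from typing import Dict, List


def check_citation(record: Dict[str, object]) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    # Attribution: someone must be credited for the data.
    if not record.get("creators"):
        problems.append("attribution: no creators listed")
    # Persistence: rely on a persistent identifier rather than a bare URL.
    if not record.get("doi"):
        problems.append("persistence: no persistent identifier")
    # Access: humans and machines need somewhere to resolve the citation.
    if not record.get("landing_page"):
        problems.append("access: no landing page or resolver link")
    # Discovery: a title and a pointer to documentation help others find the data.
    if not record.get("title") or not record.get("documentation"):
        problems.append("discovery: missing title or documentation link")
    return problems


# Hypothetical, deliberately incomplete record.
incomplete = {"creators": ["Researcher, A."], "title": "Example dataset"}
issues = check_citation(incomplete)
for issue in issues:
    print(issue)
```

A check like this only confirms that the fields exist; whether the identifier really persists and the data really stays accessible depends on the repository behind it.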
