provenance of astronomical data



  1. Provenance Data Model: Provenance of astronomical data
     The IVOA Provenance Working Group: Catherine Boisson, François Bonnarel, Johan Bregeon, Pierre Le Sidaner, Julien Lefaucheur, Mireille Louys, Markus Nullmeier, Ana Palacios, Kristin Riebe, Michèle Sanguillon, Mathieu Servillat

  2. What is provenance?
     ● In general: tracking the history or origin of something:
       – art
       – food industry
       – information (data vis) on news webpage
       – scientific data!
     ● In astronomy: explain how data sets were produced:
       – Who created the data?
       – Which algorithm was used to produce it?
       – Which steps were undertaken to process the image?
       – Can I get access to the original, uncalibrated files from the observation?

  3. Goals
     ● For a given dataset, provenance should help to:
       – Discover steps of production: which processing steps have already been done?
       – Give attribution: who was involved in the project? Who can I ask about these data?
       – Aid in reprocessing (but not necessarily allow reprocessing at a keypress)
       – Aid in debugging: find possible error sources, e.g. check the version of the processing software, ambient conditions, telescope configuration, parameter settings, …
       – Assess the quality of the data
       – Search in structured provenance metadata

  4. What is provenance?
     ● From W3C, Prov-Overview: "Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness."

  5. Example in astronomy data
     ● Where is the data coming from?
     [Diagram: release]

  6. Example in astronomy data
     ● Where is the data coming from?
     ● What were the input files for the pipeline?
     [Diagram, top to bottom: release, pipeline]

  7. Example in astronomy data
     ● Where is the data coming from?
     ● What were the input files for the pipeline?
     ● Have calibrated files been used for the pipeline?
     ● How were they calibrated?
     [Diagram, top to bottom: release, pipeline, calibrated files, calibration]

  8. Example in astronomy data
     ● Where is the data coming from?
     ● What were the input files for the pipeline?
     ● Have calibrated files been used for the pipeline?
     ● How were they calibrated?
     ● Can I get the raw images?
     ● Were there perfect conditions during the observation?
     [Diagram, top to bottom: release, pipeline, calibrated files, calibration, raw images, observation]

  9. Example in astronomy data
     ● Where is the data coming from?
     ● What were the input files for the pipeline?
     ● Have calibrated files been used for the pipeline?
     ● How were they calibrated?
     ● Can I get the raw images?
     ● Were there perfect seeing conditions during the observation?
     => Track data back in time
     [Diagram along a time axis, latest first: release, pipeline, calibrated files, calibration, raw images, observation]

  10. Example in astronomy data
      ● identify data entities
      [Diagram along a time axis, latest first: release, pipeline, calibrated files, calibration, raw images, observation]

  11. Example in astronomy data
      ● identify data entities
      ● identify processes (activities)
      [Diagram along a time axis, latest first: release, pipeline, calibrated files, calibration, raw images, observation]

  12. Example in astronomy data
      ● identify data entities
      ● identify processes (activities)
      ● provenance is defined by the relations between data and activities
      ● provenance is about history => points backwards in time
      [Diagram along a time axis, latest first: release, pipeline, calibrated files, calibration, raw images, observation]

  13. Central provenance objects
      ● Datasets: FITS files (images), VOTables, database tables, spectra, log files, parameters, ...
        – DatasetDM: Dataset = "a file or files which are considered to be a single deliverable"
        – Provenance: Dataset = one or more data entities with a common origin
      ● Activities: observations; processing steps like bias subtraction, image stacking, continuum fit, object extraction; simulations, ...
      ● Persons/Organizations: data creator, publisher, contact, ...
      ● ... also see ProvDM of W3C ...
      [Diagram labels: Dataset, Activity]

  14. Provenance DM from W3C
      http://www.w3.org/TR/prov-dm/, published 2013
      ● 3 core classes:
        – Activity
        – Entity
        – Agent
      ● core relations:
        – used
        – wasGeneratedBy
        – wasDerivedFrom
        – wasAttributedTo
        – wasAssociatedWith
      ● + many more classes and relations
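A minimal sketch of how the three core classes and five core relations might be held in Python. The class and relation names follow the W3C terms; the storage as plain triples, the `relate` helper and the example identifiers (`raw_image`, `calibration`, `observer`) are assumptions made for illustration, not an official PROV serialization.

```python
from dataclasses import dataclass


# The three core PROV-DM classes, reduced to an identifier each.
@dataclass(frozen=True)
class Entity:
    id: str


@dataclass(frozen=True)
class Activity:
    id: str


@dataclass(frozen=True)
class Agent:
    id: str


# Expected (subject, object) types for the five core relations.
CORE_RELATIONS = {
    "used": (Activity, Entity),
    "wasGeneratedBy": (Entity, Activity),
    "wasDerivedFrom": (Entity, Entity),
    "wasAttributedTo": (Entity, Agent),
    "wasAssociatedWith": (Activity, Agent),
}


def relate(subject, name, obj):
    """Return a (subject id, relation, object id) triple after a type check."""
    subject_type, object_type = CORE_RELATIONS[name]
    if not isinstance(subject, subject_type) or not isinstance(obj, object_type):
        raise TypeError(
            f"{name} expects ({subject_type.__name__}, {object_type.__name__})"
        )
    return (subject.id, name, obj.id)


# Hypothetical records, named only for this example.
raw_image = Entity("raw_image")
calibrated = Entity("calibrated_file")
calibration = Activity("calibration")
observer = Agent("observer")

relations = [
    relate(calibration, "used", raw_image),
    relate(calibrated, "wasGeneratedBy", calibration),
    relate(calibrated, "wasDerivedFrom", raw_image),
    relate(calibrated, "wasAttributedTo", observer),
    relate(calibration, "wasAssociatedWith", observer),
]
```

Checking the argument types in `relate` keeps mistakes such as an entity "using" an activity from entering the record.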

  15. Example in astronomy data
      ● input: data that is “used” by an activity
      ● output: data that “wasGeneratedBy” an activity
      [Diagram along a time axis, latest first: release –wasGeneratedBy→ pipeline –used→ calibrated files –wasGeneratedBy→ calibration –used→ raw images –wasGeneratedBy→ observation]
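The chain on this slide can be written down as relation triples and followed backwards in time. The sketch below assumes a plain list of `(subject, relation, object)` tuples; the `history` function is a hypothetical helper, not part of any standard.

```python
# Relations from the slide, as (subject, relation, object) triples.
RELATIONS = [
    ("release", "wasGeneratedBy", "pipeline"),
    ("pipeline", "used", "calibrated files"),
    ("calibrated files", "wasGeneratedBy", "calibration"),
    ("calibration", "used", "raw images"),
    ("raw images", "wasGeneratedBy", "observation"),
]


def history(start, relations):
    """Follow used/wasGeneratedBy links backwards in time from `start`."""
    backward = {}
    for subject, _, obj in relations:
        backward.setdefault(subject, []).append(obj)

    seen, order, stack = {start}, [], [start]
    while stack:
        node = stack.pop()
        order.append(node)
        for previous in backward.get(node, []):
            if previous not in seen:
                seen.add(previous)
                stack.append(previous)
    return order


print(history("release", RELATIONS))
# ['release', 'pipeline', 'calibrated files', 'calibration', 'raw images', 'observation']
```

Both relations point from the later item to the earlier one, so a single traversal answers "where is the data coming from?" for any starting point.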

  16. W3C or more?
      ● Is W3C enough?
        – Many implementations already exist, also see:
          ● Southampton Provenance Suite, https://provenance.ecs.soton.ac.uk/ (includes validator, converter, visualisation tools)
          ● Prov Implementation report: http://www.w3.org/TR/prov-implementations/
      ● In astronomy:
        – know most common processes => predefine activities
        – => could predefine input/output of activities (roles), e.g. image stacking needs n FITS images as input, one FITS image as output
        – => could predefine standard entities (FITS files, VOTables, …)
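One way to picture predefined activities with input and output roles is sketched below for the image-stacking example. The `Role` and `ActivityDescription` classes, the `problems` helper and the lower bound of one input image are all assumptions for illustration; the slide only says "n" inputs and one output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Role:
    name: str                     # role an entity plays in the activity
    entity_type: str              # expected kind of entity
    min_count: int = 1
    max_count: Optional[int] = 1  # None means no upper limit


@dataclass(frozen=True)
class ActivityDescription:
    name: str
    inputs: tuple
    outputs: tuple


def problems(roles, entity_types):
    """List the ways `entity_types` (a list of type names) violates `roles`.

    Entities whose type matches no role are ignored in this sketch.
    """
    found = []
    for role in roles:
        count = entity_types.count(role.entity_type)
        if count < role.min_count:
            found.append(f"{role.name}: need at least {role.min_count}, got {count}")
        if role.max_count is not None and count > role.max_count:
            found.append(f"{role.name}: need at most {role.max_count}, got {count}")
    return found


# Image stacking: n FITS images in (n >= 1 assumed here), one FITS image out.
IMAGE_STACKING = ActivityDescription(
    name="image stacking",
    inputs=(Role("image", "fits-image", min_count=1, max_count=None),),
    outputs=(Role("stacked image", "fits-image", min_count=1, max_count=1),),
)

inputs_ok = problems(IMAGE_STACKING.inputs, ["fits-image"] * 5)
outputs_bad = problems(IMAGE_STACKING.outputs, ["fits-image", "fits-image"])
print(inputs_ok)    # []
print(outputs_bad)  # ['stacked image: need at most 1, got 2']
```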

  17. [Diagram of related standards and concepts: job description, workflows, UWS?, W3C PROV, VOTable PARAM, UCDs, unique identifiers, ObscoreDM, DatasetDM, DataLink?]

  18. Pollux use case
      Database of more than 8000 very high resolution synthetic spectra in the optical domain (3000 Å to 12000 Å).
      Scientists: Ana Palacios, Agnès Lèbre
      Software engineer: Michèle Sanguillon

  19. RAVE survey use case
      ● RAdial Velocity Experiment
      ● multi-fibre spectroscopic survey of the southern hemisphere, 2003–2013
      ● different calibration, reduction and analysis steps
      ● radial velocities + other stellar properties for about half a million stars
      ● use provenance to track the history of datasets and where the data is coming from
      @ Kristin Riebe

  20. CTA use case
      ● Next Cherenkov very-high-energy observatory
      ● Open observatory
      ● must ensure that data processing is traceable and reproducible
      ● inform user on processing steps performed
      ● link to progenitor
      [Diagram of the processing chain: Acquisition/Simulations → DL0 → Calibration (per telescope) → DL1 → Reconstruction (shower) → DL2 → Analysis (science preparation) → DL3 → Data product generation → DL4]
      @ Mathieu Servillat

  21. Working group activities
      http://wiki.ivoa.net/twiki/bin/view/IVOA/ObservationProvenanceDataModel
      ● IVOA Sesto splinter meeting, June 2015
      ● Provenance Day in Paris, April 2016
      ● IVOA Cape Town splinter meeting and DM session, May 2016
      ● Provenance Day in Heidelberg, June 2016
      ● Next in Paris, July or August 2016
      Program of the last discussions:
      ● Data Model updates
      ● Structuring a database from the data model
      ● Storing/serializing the Activity/Entity Descriptions (VOTable, JSON, FITS frame, ...)
      ● Access to the Provenance database (TAP, specific access layer)
      ● Structure and content of the IVOA working draft
      ● Roadmap for Trieste (IVOA Interop in October) and beyond

  22. What's your use case?
      ● Would you benefit from a standardized solution to expose your Provenance metadata? => contact us!
      ● What Provenance metadata do you need to expose?
      ● Does it fit in the Provenance Data Model?
      ● How would you store Provenance metadata?
        – Files (FITS header? FITS frame? VOTable? XML? JSON?)
        – Database
      ● How would you query the Provenance metadata?
        – Search for progenitors
        – Detailed search on execution context (nodes, resources), dates
        – Detailed search on activity/entity types
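For the "Database" and "Search for progenitors" options, one possible arrangement is a relational store with one table per relation and a recursive query. The sketch below uses Python's built-in `sqlite3` module; the table names, column names and sample rows are assumptions for illustration and do not come from the working group's data model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE used (activity TEXT, entity TEXT)")
conn.execute("CREATE TABLE was_generated_by (entity TEXT, activity TEXT)")
conn.executemany("INSERT INTO used VALUES (?, ?)", [
    ("pipeline", "calibrated files"),
    ("calibration", "raw images"),
])
conn.executemany("INSERT INTO was_generated_by VALUES (?, ?)", [
    ("release", "pipeline"),
    ("calibrated files", "calibration"),
    ("raw images", "observation"),
])

# An entity's progenitors are the entities used by the activity that
# generated it, plus the progenitors of those entities, and so on.
# The recursion assumes the provenance graph has no cycles.
PROGENITORS = """
WITH RECURSIVE progenitor(entity, depth) AS (
    SELECT u.entity, 1
    FROM was_generated_by AS g
    JOIN used AS u ON u.activity = g.activity
    WHERE g.entity = ?
    UNION
    SELECT u.entity, p.depth + 1
    FROM progenitor AS p
    JOIN was_generated_by AS g ON g.entity = p.entity
    JOIN used AS u ON u.activity = g.activity
)
SELECT entity, depth FROM progenitor ORDER BY depth
"""


def progenitors(connection, entity):
    """Return (progenitor entity, number of steps back) rows for `entity`."""
    return connection.execute(PROGENITORS, (entity,)).fetchall()


print(progenitors(conn, "release"))
# [('calibrated files', 1), ('raw images', 2)]
```

Searches on execution context, dates or activity/entity types would be ordinary filters on additional columns; they are left out here to keep the example short.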
