Data at the Leibniz-Institute for Astrophysics Kristin Riebe, AIP

AIP – Leibniz-Institute for Astrophysics Potsdam
• Research areas:
  – cosmic magnetic fields (solar/stellar physics, magnetohydrodynamics)
  – extragalactic astrophysics (galactic archeology, galaxies and quasars, cosmology)
• Development of Research Technology and Infrastructure
  – Robotic telescopes, (3D) spectroscopy
  – Supercomputing and E-Science
• Participation in many projects
  – e.g. RAVE, ROSAT, XMM-Newton, LOFAR, MUSE, ...

Example data types at AIP
• Observations:
  – RAVE
    • Radial velocity measurements + spectra
  – SDSS
    • Mirror of DR7, catalog server
  – "Minor data sets":
    • Plate archive (historical plates)
    • CALIFA (spectra of galaxies)
    • Cepheids (collection of data for time series), ...
• Simulation data:
  – Magnetohydrodynamics
  – Cosmological simulations: particle data, dark matter halo catalogues, halo merger history, ...

Behind the scenes
• Supercomputers: Leibniz, Babel, for in-house simulations, data processing
• Almagest: Graywulf cluster for archiving, exchanging data, hosting databases, publishing data, 700 TB disk space
• Virtual research environment:
  – Erebos: ~ 250 TB disk space
  – Used by CLUES collaboration to exchange and process data
• Web servers for publishing smaller data sets

Data center task: Extract – Transform – Load
• Extract: from different sources
• Transform: checking, corrections, additions; bring into (standard) format
• Load: publish the data
• Diagram labels: Webserver, Server

Example: MultiDark Database
• Collaboration with Spanish MultiDark project
• Publish data of cosmological simulations in a simulation database
• Similar success to MillenniumDB! :-)
• http://www.multidark.org
• 2 simulations uploaded (12+6 TB)
• > 1 million queries in 2 years, ~ 1500 per day, 4 TB downloaded
• ~ 140 registered users

Example workflow: MultiDark Database
• Extract:
  – Cosmologists produce data, copy them to a server at AIP (VRE)
• Transform:
  – We check data and reading routines, data curation (C/Fortran/Perl/Python)
• Load:
  – Ingest data into database (SQL, bulk copy)
• Check and test:
  – Check the data for completeness, consistency (SQL)
  – Create Peano-Hilbert keys, indexes (C#, Spatial 3D library (T. Budavari, G. Lemson))
• Publish:
  – Using simpledb (Gerard Lemson, Millennium DB, JSP)
  – Write/update documentation; update admin tables of the database
  – Inform users
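
The "Load" and "Check and test" steps of this workflow can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module instead of the production database, and the table layout (halo_id, snapshot, mass, parent_id), the sample rows and the expected snapshot list are all hypothetical.

```python
import sqlite3

# Hypothetical halo rows: (halo_id, snapshot, mass, parent_id).
# parent_id is None for top-level halos.
ROWS = [
    (1, 0, 1.2e12, None),
    (2, 0, 3.4e11, 1),
    (3, 1, 1.3e12, None),
    (4, 1, 2.0e11, 99),  # inconsistent on purpose: parent 99 does not exist
]
EXPECTED_SNAPSHOTS = {0, 1, 2}  # assumption: the simulation wrote three snapshots


def load(conn, rows):
    """Bulk-insert all rows in a single transaction."""
    conn.execute(
        "CREATE TABLE halos (halo_id INTEGER PRIMARY KEY, snapshot INTEGER,"
        " mass REAL, parent_id INTEGER)"
    )
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO halos VALUES (?, ?, ?, ?)", rows)


def missing_snapshots(conn, expected):
    """Completeness: which expected snapshots have no rows at all?"""
    found = {row[0] for row in conn.execute("SELECT DISTINCT snapshot FROM halos")}
    return sorted(expected - found)


def orphan_halos(conn):
    """Consistency: halos whose parent_id points to a non-existent halo."""
    query = (
        "SELECT h.halo_id FROM halos AS h"
        " LEFT JOIN halos AS p ON h.parent_id = p.halo_id"
        " WHERE h.parent_id IS NOT NULL AND p.halo_id IS NULL"
        " ORDER BY h.halo_id"
    )
    return [row[0] for row in conn.execute(query)]


conn = sqlite3.connect(":memory:")
load(conn, ROWS)
print("missing snapshots:", missing_snapshots(conn, EXPECTED_SNAPSHOTS))
print("orphan halos:", orphan_halos(conn))
```

Keeping the checks as plain SQL queries means they can be rerun unchanged after every new upload.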

Transform: Data curation
• Check completeness of data sets
• Create homogeneous data sets, bring into useful (standard) formats
• Add identifiers, grid indexes etc. for faster queries & for representing relations in the database
• Cross-link data with other catalogues
=> usually we applied tailor-made solutions, tuned to each individual data set, custom reading routines required
=> now things are improving ...
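
Curation steps of this kind can be written as small per-row functions. The sketch below rejects rows with NaN or infinite values, adds an identifier, and adds a spatial grid index. The box size, the number of grid cells, the periodic wrapping and the identifier scheme are assumptions chosen for illustration, not the values used for any particular AIP data set.

```python
import math

BOX_SIZE = 1000.0  # assumed box length, in the same units as the positions
CELLS = 10         # assumed number of grid cells per dimension


def is_finite_row(row):
    """Return False if the row contains NaN or infinite values."""
    return all(math.isfinite(v) for v in row)


def grid_index(x, y, z, box=BOX_SIZE, n=CELLS):
    """Map a position to a single integer cell number on an n^3 grid."""
    def cell(coord):
        # Wrap periodic coordinates into [0, box), then clamp against rounding.
        return min(int((coord % box) / box * n), n - 1)

    ix, iy, iz = cell(x), cell(y), cell(z)
    return (ix * n + iy) * n + iz


def curate(rows, snapshot):
    """Yield (row_id, grid_cell, x, y, z) for every valid input row."""
    for line_no, row in enumerate(rows):
        if not is_finite_row(row):
            continue  # a real pipeline would log or abort here
        row_id = snapshot * 10**9 + line_no  # hypothetical identifier scheme
        yield (row_id, grid_index(*row), *row)


raw = [(12.5, 999.9, 0.0), (float("nan"), 1.0, 2.0), (500.0, 500.0, 500.0)]
for record in curate(raw, snapshot=7):
    print(record)
```

With an index on the grid-cell column, a query for a small region only has to touch the rows in the few cells that overlap it.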

DBIngestor and libhilbert
• DBIngestor library + AsciiIngest
  – Adrian Partl, https://github.com/adrpar/DBIngestor, …/AsciiIngest
  – Apply converters (unit conversions, adding identifiers for db indexing, spatial grid indexes)
  – Apply asserters (nan, inf etc.)
  – => transform and load in one go
  – Easy to write own converters & add own reading routines for binary data
• C-library libhilbert
  – For creating indexes of space-filling Peano-Hilbert curve in 20 dimensions
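
To illustrate what a Peano-Hilbert key is, the sketch below computes the key of a grid cell with the standard iterative algorithm for the two-dimensional Hilbert curve. It is restricted to 2D for brevity and does not reproduce the libhilbert API; the function name hilbert_key and its arguments are chosen here for illustration only.

```python
def hilbert_key(order, x, y):
    """Return the position of cell (x, y) along a 2D Hilbert curve.

    The grid has 2**order cells per side; keys run from 0 to 4**order - 1.
    """
    n = 1 << order
    if not (0 <= x < n and 0 <= y < n):
        raise ValueError("cell coordinates outside the grid")
    key = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        key += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect the quadrant so the sub-curve has standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return key


order = 2
cells = sorted(
    ((x, y) for x in range(1 << order) for y in range(1 << order)),
    key=lambda c: hilbert_key(order, *c),
)
print(cells[:4])
```

Cells with consecutive keys are always neighbours on the grid, so sorting or indexing rows by this key tends to keep spatially close objects close together on disk.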

Data publication
• Many possibilities, very often individual solutions for each project
• Now: new webapp Daiquiri, http://escience.aip.de/daiquiri/
• Developed by Jochen Klar and Adrian Partl
• Web application for publishing data
• Modular, highly customizable
• Using PHP, Zend-framework
• Modern interface using bootstrap, jQuery
• Authentication, Query Interface
• Wordpress integration
• One code base to serve most needs, open source, (easily) extendable

Daiquiri examples
• MultiDark2
• Califa
• 4MOST workshop
• Plate Archive
• Jubilee, Curie simulation database in Madrid
http://escience.aip.de/daiquiri/

Screenshot

VO compliance
• Currently working on including VO protocols with Daiquiri
  – Download data as VOTables (MySQL-VOTable-Dump, see github)
  – TAP protocol for accessing data
  – UWS for job queues (MySQL query queue)
• Problems:
  – No public PHP libraries for IVOA protocols available (only in Java)
  – But community rather needs PHP or Python implementations
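
As a rough illustration of two of these building blocks, the sketch below serializes query results as a minimal VOTable (TABLEDATA encoding) and builds the URL of a synchronous TAP query. It uses only the Python standard library and is not related to the MySQL-VOTable-Dump tool or to Daiquiri's code; the column names, the table name and the example.org endpoint are placeholders.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

VOTABLE_NS = "http://www.ivoa.net/xml/VOTable/v1.3"


def to_votable(fields, rows):
    """Serialize rows as a VOTable document using TABLEDATA encoding.

    fields: list of (name, datatype, unit) tuples; unit may be None.
    """
    root = ET.Element("VOTABLE", {"version": "1.3", "xmlns": VOTABLE_NS})
    table = ET.SubElement(ET.SubElement(root, "RESOURCE"), "TABLE")
    for name, datatype, unit in fields:
        attrs = {"name": name, "datatype": datatype}
        if unit:
            attrs["unit"] = unit
        ET.SubElement(table, "FIELD", attrs)
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")
        for value in row:
            ET.SubElement(tr, "TD").text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


def tap_sync_url(base_url, adql):
    """Build the URL of a synchronous TAP query (no request is sent)."""
    params = {"REQUEST": "doQuery", "LANG": "ADQL", "QUERY": adql}
    return base_url.rstrip("/") + "/sync?" + urlencode(params)


# Placeholder columns and values, for illustration only.
fields = [("haloId", "long", None), ("x", "double", "Mpc")]
print(to_votable(fields, [(1, 12.5), (2, 500.0)]))
print(tap_sync_url("http://example.org/tap", "SELECT TOP 10 * FROM halos"))
```

A full service would additionally need the asynchronous job handling that UWS describes, which is outside the scope of this sketch.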

Concluding Remarks
• Common tasks for each data publication: extracting, transforming, uploading the data
• Different tool for each data set?
  – Should rather use only a few, generalized tools, reusable, easier to maintain
  – Takes a lot of time to develop
  – => Collect tools from data centers? Combine efforts?
• Would like to have more implementations/libraries of VO protocols, in different languages
