APP Computing
Volker Beckmann, François Arago Centre / APC, CNRS / IN2P3
Image credits: Auger (ASPERA / G. Toma / A. Saftoiu), KM3NeT (ASPERA / G. Toma / A. Saftoiu), Virgo collaboration, HESS collaboration (F. Acero and H. Gast)
Outline
• Status
• Computing
• Data storage
• Grid vs. Cloud
• Challenges
• CNRS / IN2P3 initiative
Reference: Berghöfer et al. 2015, arXiv:1512.00988
Data vs. computing
• Very different experiments
• Different data types: events, time series, images
• Shared computing resources
• High-Throughput Computing (HTC): large computing centres such as CC-IN2P3, Grid (EGI)
• Local computing clusters
• Minor importance (French view): HPC, GPUs, (academic) cloud systems
Image credits: HESS, Fermi, Cherenkov Telescope Array (CTA)
Space-based experiments
+ Rather low data rates (~10s of Gbyte / day)
- All-sky, all-mission analyses (e.g. Fermi/LAT, INTEGRAL IBIS/SPI, Swift/BAT)
- Complex analysis
+ Low storage needs (Tbyte range)
+ Accessibility and usability: centralised archives, common data formats, common tools
Image credits: Fermi, INTEGRAL, Swift
Ground-based experiments
- High data rates (~Tbytes / day)
+ Event lists (HESS, Auger, Antares)
+ Time series (adv. Virgo / LIGO)
- Computing intensive
- Remote observation sites
- Accessibility, common tools (e.g. ctools/gammalib)
Image credits: Antares (F. Montanet), Auger (ASPERA / G. Toma / A. Saftoiu)
Data rate evolution
• Space-based data are limited by bandwidth: 100 Mbps max (X-band), but
  - INTEGRAL (2002): 1.2 Gbyte/day
  - Hubble Space Telescope (1990): 15 Gbyte/day
  - Gaia (2013), Euclid (2021): ~50 Gbyte/day, i.e. ~10-20 Tbyte of raw data per year
• Ground-based: fast increase through fast read-out systems and multiple charge-coupled devices (CCDs)
  - 1990s: 1 Mbyte / CCD frame
  - LSST (2022): 3 Gbyte / exposure (15 s), i.e. ~10 Pbyte of raw data per year
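A back-of-the-envelope check of the rates above (a minimal sketch in Python; the straight 365-day averaging is an assumption for illustration, not an operational duty cycle):

```python
# Rough sanity check of the quoted data rates (illustrative only;
# straight 365-day averaging is assumed).
GB_PER_TB = 1_000   # decimal units, as commonly used for data volumes
TB_PER_PB = 1_000

# Space-based: ~50 Gbyte/day (Gaia, Euclid)
space_tb_per_year = 50 * 365 / GB_PER_TB
print(f"Space-based: ~{space_tb_per_year:.0f} Tbyte/year")   # ~18 Tbyte/year, within the 10-20 Tbyte range

# Ground-based: LSST's quoted ~10 Pbyte/year expressed as an average daily rate
lsst_tb_per_day = 10 * TB_PER_PB / 365
print(f"LSST average: ~{lsst_tb_per_day:.0f} Tbyte/day")      # ~27 Tbyte/day, roughly 500x the space-based rate
```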
Daily data rates (Europe) [figure]
European computing needs [figure: requirements expressed in units of the CERN LHC Tier-0 centre]
CPU requirements (Europe), in kHS06 (1 kHS06 ≈ 100 CPU cores)

                                                       2016    2020
Event-like data (Auger, HESS, CTA, KM3NeT, Fermi, …)    149     380
Signal-like (VIRGO, LIGO)                               780    1290
Image-like (LSST, Euclid, …)                            117     280
Total in kHS06                                         1047    1951
Total in LHC Tier-0 units (2012)                        1.6     3.0
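For orientation, a minimal sketch converting the totals above into rough core counts, using the rule of thumb from the table header (1 kHS06 ≈ 100 CPU cores; the exact cores-per-HS06 factor is an assumption):

```python
# Convert the 2016/2020 CPU totals from kHS06 into rough core counts
# (assumes ~10 HS06 per core, i.e. 1 kHS06 ≈ 100 cores, as quoted above).
CORES_PER_KHS06 = 100

totals_khs06 = {"2016": 1047, "2020": 1951}
for year, khs06 in totals_khs06.items():
    print(f"{year}: {khs06} kHS06 ≈ {khs06 * CORES_PER_KHS06:,} cores")
# 2016: ~105,000 cores; 2020: ~195,000 cores, roughly a factor 1.9 increase
```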
Disk storage requirements, in Pbyte

                                                       2016    2020
Event-like data (Auger, HESS, CTA, KM3NeT, Fermi, …)      7      39
Signal-like (VIRGO, LIGO)                               5.1      11
Image-like (LSST, Euclid, …)                            2.4      21
Total in Pbyte                                           16      72
Total in LHC Tier-0 units (2012)                       0.52     2.4

The trend is similar for tape storage.
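The growth factors implied by the two tables, computed only from the totals quoted above (a quick sketch):

```python
# Growth factors 2016 -> 2020 implied by the quoted totals.
cpu_2016, cpu_2020 = 1047, 1951    # kHS06
disk_2016, disk_2020 = 16, 72      # Pbyte

print(f"CPU growth:  x{cpu_2020 / cpu_2016:.1f}")    # ~x1.9, the 'factor of 2' quoted below
print(f"Disk growth: x{disk_2020 / disk_2016:.1f}")  # ~x4.5, the 'factor 5' quoted below
```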
Analysis, simulation, modelling
• Air shower experiments require extensive simulations
  - COsmic Ray SImulations for KAscade (CORSIKA)
  - Large CPU + storage requirements → GPUs
• Gravitational waves: small data sets, but a large computing need → HPC, GPUs
• Increase in computing needs by a factor of 2 until the end of this decade
Data storage
• High-energy cosmic ray experiments: raw : reconstructed : simulation ≈ 60 : 10 : 30
• Ground-based: dominated by raw data
• Space-based: dominated by derived data
• Increase by a factor of 5 until the end of this decade!
• Commercial cloud systems?
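To make the 60:10:30 split concrete, a sketch applying it to an arbitrary total volume (the 1 Pbyte figure is purely hypothetical):

```python
# Apply the quoted raw : reconstructed : simulation ratio of 60:10:30
# to a hypothetical 1 Pbyte experiment (the total is invented for illustration).
total_pb = 1.0
shares = {"raw": 0.60, "reconstructed": 0.10, "simulation": 0.30}
for kind, frac in shares.items():
    print(f"{kind:13s}: {total_pb * frac:.2f} Pbyte")
```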
Grid vs. Cloud
• EGI runs 1 million jobs per day
• LHC: largely batch, MC simulations, event reconstruction
• HESS, CTA (but also others; in the future e.g. KM3NeT)
[figure: usage of CPU time on France Grilles per project: ATLAS, CMS, LHCb, ALICE, CTA, others]
Grid vs. Cloud
• EGI runs 1 million jobs per day
• LHC: largely batch, MC simulations, event reconstruction
• HESS, CTA (but also others; in the future e.g. KM3NeT)
Cloud solutions:
• Software as a Service (SaaS): run online tasks (Hera at HEASARC, VO)
• Platform as a Service (PaaS): complete s/w platform (e.g. Euclid CODEEN)
• Data as a Service (DaaS): use data remotely (e.g. CALDB, iRODS)
• Hardware as a Service (HaaS): you provide OS + s/w + data (e.g. StratusLab)
• Infrastructure as a Service (IaaS): Grid on demand
Grid vs. Cloud

                 GRID                        Cloud
Middleware       gLite, UNICORE, ARC, …      SlipStream, Hadoop
Resources        EGI                         Local academic clouds
Availability     ++                          -
Input / output   +                           -
Ease of use      -                           +
Flexibility      --                          ++
Approach
• Try to reduce the amount of data! On-site processing (ground-based: LOFAR, CTA, SKA, …) and on-board satellite processing (Gaia, INTEGRAL) where possible
• Then: centralise if possible (INTEGRAL, Gaia, LOFAR, SKA); use as few sites as possible (LSST, Euclid)
• GPUs: fast, but not good on i/o; Fourier transformation (e.g. LOFAR), template fitting; challenging to use, training needed (see the sketch after this slide)
• GRID: infrastructure, middleware, relatively heavy to use (CTA, KM3NeT)
• HPC / HTC with Hadoop, Hive (again, training needed)
• Cloud: virtualisation, flexibility, lower performance; suited to project development and to the production phase of smaller projects
Image credit: InSiDe, Jülich
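As an illustration of the GPU point above (fast compute, but data movement dominates), a minimal sketch of offloading a Fourier transform to a GPU; CuPy and a CUDA-capable GPU are assumptions here, not part of any experiment's actual toolchain:

```python
# Minimal GPU FFT sketch (assumes CuPy and a CUDA-capable GPU are available;
# illustrates the pattern only, not a real experiment pipeline).
import numpy as np
import cupy as cp

signal = np.random.standard_normal(2**22).astype(np.float32)  # synthetic time series

gpu_signal = cp.asarray(signal)      # host -> device copy: this i/o is often the bottleneck
spectrum = cp.fft.rfft(gpu_signal)   # the FFT itself runs on the GPU
power = cp.abs(spectrum) ** 2
result = cp.asnumpy(power)           # device -> host copy for CPU-side analysis
print(result.shape)                  # (2**21 + 1,) frequency bins
```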
Open Issues
• Need for experts in computing + projects
• Grid evolution
• Academic cloud systems in production (reliability)
• Federated cloud systems
• Commercial cloud systems (peak processing / replacement?)
• Standards?
• Preservation: VM archive?
• Data storage evolution (x5 until 2020)
Computing branch at CNRS / IN2P3
Objectives:
• Support research in applied computing at IN2P3 labs
• Identify the main projects and interested colleagues
• Animate and coordinate initiatives
• Support education (training, master, PhD, HDR)
• Discuss and shape the future of computing in high-energy (astroparticle) physics
We need your expertise!
Open Issues
• Need for experts and training in computing + projects
• Grid evolution (also EGI)
• Academic cloud systems in production (reliability)
• Federated cloud systems
• Commercial cloud systems (peak processing / replacement?)
• Standards?
• Preservation: VM archive?
• Data storage evolution (x5 until 2020)
Additional slides
Computing requirements [figure]
Data storage (disk) [figure]
Challenges
• Astroparticle physics goes Big Data: CTA, KM3NeT, SKA, …
• Pbyte-scale data with a need for Tflop-scale processing
• Solutions depend on the science requirements: space or ground, remote or central, real-time processing or not, …
• Advantage: a community used to working together, file format standards, coding standards (C++, Python)
• Development platforms
Astro is 25% of CC-IN2P3 (credit: Pierre Macchi, CC-IN2P3)
The computing of 7 PNHE projects amounts to 5% of GENCI computing; see the previous presentation by F. Casse.
One file format: FITS
• Used for: images, spectra, light curves, tables, data cubes, …
• Used in: space-based and ground-based astrophysics, across all disciplines
• Standards for keywords, headers, coordinate systems, …
• FITS i/o libraries and tools to visualise and manipulate FITS files (ftools, ds9, pyfits, …); a short reading example follows below
• http://fits.gsfc.nasa.gov/
• Big Data: Volume, Velocity, Variety, Veracity and Value
• Future? HDF5?
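A minimal sketch of inspecting a FITS file in Python, using astropy.io.fits (the successor of the pyfits package mentioned above); the file name events.fits is hypothetical, chosen only for illustration:

```python
# Minimal FITS inspection sketch (assumes astropy is installed;
# "events.fits" is a hypothetical file name used for illustration).
from astropy.io import fits

with fits.open("events.fits") as hdul:
    hdul.info()                                  # list all header/data units (HDUs)
    header = hdul[0].header                      # primary header with standard keywords
    print(header.get("TELESCOP", "unknown"))     # e.g. the mission / telescope name
    if len(hdul) > 1:
        data = hdul[1].data                      # first extension, often a binary event table
        print(getattr(data, "names", type(data)))  # column names for a table, else the data type
```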
CTA Computing: data volume
Raw-data rate:
• CTA South: 5.4 GB/s
• CTA North: 3.2 GB/s
• 1314 hours of observation per year
Raw-data volume:
• ~40 PB/year
• ~4 PB/year after reduction
Total volume:
• ~27 PB/year including calibrations, reduced data and all copies
(L. Arrabito et al. 2015)
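A quick cross-check of the quoted raw-data volume from the rates above (a sketch; it assumes both arrays take data for the full 1314 hours):

```python
# Cross-check of the CTA raw-data volume from the quoted rates
# (assumes both arrays take data for the full 1314 h; illustrative only).
rate_gb_per_s = 5.4 + 3.2                 # CTA South + CTA North
hours_per_year = 1314
raw_pb_per_year = rate_gb_per_s * hours_per_year * 3600 / 1e6   # 1 PB = 1e6 GB
print(f"~{raw_pb_per_year:.0f} PB/year of raw data")            # ~41 PB/year, consistent with the quoted ~40 PB/year
```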
Big-Data tomorrow: Euclid
How to achieve the science goals:
• Images (optical / infrared) and spectra
• External (ground-based) images
• Merging of the data
• Photometric redshifts (distances), spectra and shape measurements
• High-level science products
(Euclid Red Book, 2012)
Big-Data tomorrow: Euclid [figure: Euclid Red Book (2012)]