APP Computing


  1. APP Computing
     Volker Beckmann, François Arago Centre / APC, CNRS / IN2P3
     Image credits: Auger (ASPERA / G. Toma / A. Saftoiu); KM3NeT (ASPERA / G. Toma / A. Saftoiu); Virgo collaboration; HESS collaboration (F. Acero and H. Gast)

  2. Outline
     - Status
     - Computing
     - Data storage
     - Grid vs. Cloud
     - Challenges
     - CNRS / IN2P3 initiative
     Reference: Berghöfer et al. 2015, arXiv:1512.00988

  3. Data vs. computing
     - Very different experiments
     - Different data types: events, time series, images
     - Shared computing resources
     - High-Throughput Computing (HTC): large computing centres such as CC-IN2P3, Grid (EGI)
     - Local computing clusters
     - Of minor importance (French view): HPC, GPUs, (academic) cloud systems
     Images: HESS, Fermi, Cherenkov Telescope Array (CTA)

  4. Space-based experiments
     + Rather low data rates (~tens of GB / day)
     - All-sky, all-mission analyses (e.g. Fermi/LAT, INTEGRAL IBIS/SPI, Swift/BAT)
     - Complex analysis
     + Low storage needs (TB range)
     + Accessibility and usability: centralised archives, common data formats, common tools
     Images: Fermi, INTEGRAL, Swift

  5. Ground-based experiments
     - High data rates (~TB / day)
     + Event lists (HESS, Auger, Antares)
     + Time series (advanced Virgo / LIGO)
     - Computing intensive
     - Remote observation sites
     - Accessibility, common tools (e.g. ctools/gammalib)
     Images: Antares (credit: F. Montanet); Auger (ASPERA / G. Toma / A. Saftoiu)

  6. Data rate evolution
     - Space-based data limited by downlink bandwidth: ~100 Mbps max (X-band), but
       - INTEGRAL (2002): 1.2 GB/day
       - Hubble Space Telescope (1990): 15 GB/day
       - Gaia (2013), Euclid (2021): ~50 GB/day, i.e. ~10-20 TB of raw data per year
     - Ground-based: fast increase through fast read-out systems and multiple charge-coupled devices (CCDs)
       - 1990s: 1 MB / CCD frame
       - LSST (2022): 3 GB / exposure (15 s), i.e. ~10 PB of raw data per year (both yearly figures are sanity-checked below)
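A quick back-of-the-envelope check of these yearly volumes; a minimal sketch, not from the talk. The daily and per-exposure rates come from the bullets above, while the LSST observing duty cycle is an illustrative assumption:

```python
# Back-of-the-envelope conversion of the quoted rates into yearly volumes.
GB_PER_TB = 1_000
GB_PER_PB = 1_000_000

# Space-based: ~50 GB/day (Gaia, Euclid), assuming continuous operation.
print(f"Gaia/Euclid: ~{50 * 365 / GB_PER_TB:.0f} TB/year")  # ~18 TB/year

# Ground-based: LSST, 3 GB per 15 s exposure; ~10 h of observing per
# night is an assumed duty cycle. The slide's ~10 PB/year presumably
# also covers calibration frames and overheads beyond science exposures.
exposures_per_night = 10 * 3600 / 15
print(f"LSST: ~{3 * exposures_per_night * 365 / GB_PER_PB:.1f} PB/year")  # ~2.6 PB/year
```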

  7. Daily data rates (Europe) [chart]

  8. European computing needs [chart]
     Requirements expressed in units of the CERN LHC Tier-0 centre

  9. CPU requirements (Europe)
     Units: kHS06 (1 kHS06 ≈ 100 CPU cores; a worked conversion follows below)

     Data type                                           2016    2020
     Event-like (Auger, HESS, CTA, KM3NeT, Fermi, ...)    149     380
     Signal-like (Virgo, LIGO)                            780    1290
     Image-like (LSST, Euclid, ...)                       117     280
     Total in kHS06                                      1047    1951
     Total in units of the LHC Tier-0 (2012)              1.6     3.0
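To make the HS06 unit concrete, a hedged sketch: the ~100 cores per kHS06 figure is the slide's own rule of thumb (roughly 10 HS06 per core), not an exact benchmark result:

```python
# Turn the 2020 total into an approximate core count using the slide's
# own rule of thumb (1 kHS06 ≈ 100 cores, i.e. ~10 HS06 per core).
CORES_PER_KHS06 = 100          # approximation taken from the slide annotation
total_2020_khs06 = 1951        # total CPU requirement from the table
print(f"~{total_2020_khs06 * CORES_PER_KHS06:,} cores")  # ~195,100 cores
```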

  10. Disk storage requirements
     Units: PB

     Data type                                           2016    2020
     Event-like (Auger, HESS, CTA, KM3NeT, Fermi, ...)      7      39
     Signal-like (Virgo, LIGO)                            5.1      11
     Image-like (LSST, Euclid, ...)                       2.4      21
     Total in PB                                           16      72
     Total in units of the LHC Tier-0 (2012)             0.52     2.4

     The trend is similar for tape storage.

  11. Analysis, simulation, modelling
     - Air-shower experiments require extensive simulations
       - Cosmic Ray Simulations for Kascade (CORSIKA)
       - Large CPU + storage requirements; GPUs
     - Gravitational waves: small data sets → large computing needs; HPC, GPUs
     - Computing needs will increase by a factor of ~2 by the end of this decade

  12. Data storage
     - High-energy cosmic-ray experiments: raw : reconstructed : simulation ≈ 60 : 10 : 30
     - Ground-based: dominated by raw data
     - Space-based: dominated by derived data
     - Storage needs will increase by a factor of 5 by the end of this decade!
     - Commercial cloud systems?

  13. Grid vs. Cloud
     - EGI runs 1 million jobs per day
     - LHC: largely batch processing, MC simulations, event reconstruction
     - Astroparticle users: HESS, CTA (but also others; in future e.g. KM3NeT)
     [Pie chart: usage of CPU time on France Grilles per project: ATLAS, CMS, LHCb, ALICE, CTA, others]

  14. Grid vs. Cloud: cloud solutions
     - Software as a Service (SaaS): run tasks online (e.g. Hera at HEASARC, VO)
     - Platform as a Service (PaaS): complete software platform (e.g. Euclid CODEEN)
     - Data as a Service (DaaS): use data remotely (e.g. CALDB, iRODS)
     - Hardware as a Service (HaaS): you provide OS + software + data (e.g. StratusLab)
     - Infrastructure as a Service (IaaS): Grid on demand

  15. Grid vs. Cloud

                       GRID                        Cloud
     Middleware        gLite, UNICORE, ARC, ...    SlipStream, Hadoop
     Resources         EGI                         local academic clouds
     Availability      ++                          -
     Input / output    +                           -
     Ease of use       -                           +
     Flexibility       --                          ++

  16. Approach
     - Try to reduce the amount of data!
       - On-site processing (ground-based: LOFAR, CTA, SKA, ...) and on-board satellite processing (Gaia, INTEGRAL) where possible
     - Then: centralise if possible (INTEGRAL, Gaia, LOFAR, SKA); use as few centres as possible (LSST, Euclid)
     - GPUs: fast, but weak on I/O; suited to Fourier transforms (e.g. LOFAR) and template fitting; challenging to use, training needed (see the sketch after this slide)
     - Grid: infrastructure, middleware, relatively heavy to use; CTA, KM3NeT
     - HPC / HTC with Hadoop, Hive (again, training needed)
     - Cloud: virtualisation, flexibility, lower performance; project development, production phase of smaller projects
     Image credit: InSiDe Jülich
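As an illustration of the compute pattern that maps well onto GPUs; a minimal sketch, not from the talk. It runs on the CPU with NumPy; the GPU library CuPy exposes a near-identical cupy.fft interface, which is the usual migration path. The sampling rate, tone frequency and noise level are all illustrative:

```python
import numpy as np

# Toy time series: a 50 Hz tone buried in noise, sampled at 1 kHz for 4 s.
fs = 1000.0
t = np.arange(0, 4.0, 1.0 / fs)
signal = np.sin(2 * np.pi * 50.0 * t) + 0.5 * np.random.randn(t.size)

# FFT-based power spectrum: large batched FFTs like this are the kind of
# workload that radio-interferometry pipelines (e.g. LOFAR's correlator
# back end) offload to GPUs.
spectrum = np.abs(np.fft.rfft(signal)) ** 2
freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
print(f"Peak at {freqs[np.argmax(spectrum[1:]) + 1]:.1f} Hz")  # ~50 Hz
```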

  17. Open issues
     - Need for experts in computing + projects
     - Grid evolution
     - Academic cloud systems in production (reliability)
     - Federated cloud systems
     - Commercial cloud systems (peak processing / replacement?)
     - Standards?
     - Preservation: VM archive?
     - Data storage evolution (×5 until 2020)

  18. Computing branch at CNRS / IN2P3
     Objectives:
     - Support research in applied computing at IN2P3 labs
     - Identify the main projects and interested colleagues
     - Animate and coordinate initiatives
     - Support education (training, master, PhD, HDR)
     - Discuss and shape the future of computing in high-energy (astroparticle) physics
     We need your expertise!

  19. Open issues
     - Need for experts and training in computing + projects
     - Grid evolution (also EGI)
     - Academic cloud systems in production (reliability)
     - Federated cloud systems
     - Commercial cloud systems (peak processing / replacement?)
     - Standards?
     - Preservation: VM archive?
     - Data storage evolution (×5 until 2020)

  20. Additional slides

  21. Computing requirements [chart]

  22. Data storage (disk) [chart]

  23. Challenges
     - Astroparticle physics goes big data: CTA, KM3NeT, SKA, ...
     - PB-scale data requiring Tflop-scale processing
     - Solutions depend on the science requirements: space or ground, remote or central, real-time processing or not, ...
     - Advantage: the community is used to working together, with file-format standards and coding standards (C++, Python)
     - Development platforms

  24. Astroparticle physics is 25% of CC-IN2P3 usage
     Credit: Pierre Macchi (CC-IN2P3)
     The computing of 7 PNHE projects amounts to 5% of GENCI computing; see the previous presentation by F. Casse.

  25. One file format: FITS
     - Used for: images, spectra, light curves, tables, data cubes, ...
     - Used in space-based and ground-based astrophysics, across all disciplines
     - Standards for keywords, headers, coordinate systems, ...
     - FITS I/O libraries, plus tools to visualise and manipulate FITS files (ftools, ds9, pyfits, ...); see the example below
     - http://fits.gsfc.nasa.gov/
     - Big data: Volume, Velocity, Variety, Veracity and Value
     - The future? HDF5?
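A minimal example of the FITS I/O the slide refers to, using astropy.io.fits (the maintained successor of the pyfits library named above); the file name, keyword values and data are all illustrative:

```python
from astropy.io import fits
import numpy as np

# Write a small image with standard header keywords.
image = np.random.rand(64, 64).astype(np.float32)
hdu = fits.PrimaryHDU(data=image)
hdu.header["TELESCOP"] = "EXAMPLE"               # illustrative keyword value
hdu.header["OBJECT"] = ("Crab", "target name")   # value with a comment
hdu.writeto("example.fits", overwrite=True)

# Read it back: header keywords and data are both directly accessible.
with fits.open("example.fits") as hdul:
    hdul.info()                          # list the HDUs in the file
    print(hdul[0].header["OBJECT"])      # 'Crab'
    print(hdul[0].data.mean())
```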

  26. CTA computing: data volume
     Raw-data rate:
     - CTA South: 5.4 GB/s
     - CTA North: 3.2 GB/s
     - 1314 hours of observation per year
     Raw-data volume:
     - ~40 PB/year (cross-checked below)
     - ~4 PB/year after reduction
     Total volume:
     - ~27 PB/year including calibrations, reduced data and all copies
     L. Arrabito et al. (2015)
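The quoted ~40 PB/year follows directly from the two rates and the observing time; a quick consistency check, assuming both arrays record simultaneously during the 1314 h:

```python
# Consistency check of the CTA raw-data volume from the quoted rates,
# assuming both arrays take data during the same 1314 h per year.
rate_gb_s = 5.4 + 3.2                    # CTA South + North (GB/s)
obs_s = 1314 * 3600                      # observing time per year (s)
print(f"~{rate_gb_s * obs_s / 1e6:.0f} PB/year")  # ~41 PB/year vs. quoted ~40 PB
```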

  27. Big data tomorrow: Euclid
     How to achieve the science goals:
     [Pipeline diagram: optical/infrared images and spectra, plus external (ground-based) images → merging of the data → photometric redshifts (distances), spectra and shape measurements → high-level science products]
     Euclid Red Book (2012)

  28. Big data tomorrow: Euclid [chart]
     Euclid Red Book (2012)
