XI CPAN DAYS 21-23 October 2019 Plans of the WLCG for Run3 and HL-LHC era Jose F. Salt Cairols Instituto de Física Corpuscular 23/10/2019 XI CPAN Days 1
Overview
1.- The WLCG Global Collaboration
2.- Run 3 and HL-LHC Plan
3.- The Spanish LHC Computing GRID community (LCG-ES)
4.- Usage of additional compute resources
5.- Heterogeneity and Federation
6.- Software Optimization
7.- Spanish Strategy in Computing
8.- Summary and Outlook
1.- The WLCG Global Collaboration
The Worldwide LHC Computing GRID (WLCG): a distributed high-throughput computing infrastructure to store, process and analyze the data produced by the LHC experiments.
The equipment purchased by the centers (T0, T1 and T2) gives service to the whole collaboration (as a detector does).
In numbers:
- 167 sites, 42 countries, 63 MoUs
- ~800 kcores
- ~500 PB disk storage
- ~750 PB tape storage
- Optical private network (LHCOPN) and overlay over NRENs (LHCONE) with 10/100 Gbps links
Participation in WLCG contributes to the scientific and technological progress of the participating center: scientific infrastructure, expert personnel, etc.
WLCG is a worldwide and non-stop infrastructure.
[Image: CERN Computing Center]
2.- Run 3 and HL-LHC Plan
Best guess for Run 3:
- 2021 is a very low data test run; resources stay the same as in 2018 for pp
- A full heavy-ion run is likely and will need some level of additional resources
- 2022 is a full year with a resource level of 1.5 times that of 2018
- 2023-24: moderate (20%) growth rates
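The best guess above can be turned into a rough resource profile relative to 2018. This is a minimal sketch, assuming the 20% growth of 2023-24 compounds yearly on top of the 2022 level; the additional resources for a heavy-ion run are not quantified on the slide and are left out. The function name is illustrative only.

```python
def run3_resource_profile(growth_2023_24=0.20):
    """Resource level per year in units of the 2018 level (pp only).

    Assumption: the 2023-24 growth rate compounds yearly on the 2022 level.
    """
    profile = {2021: 1.0, 2022: 1.5}  # values quoted on the slide
    for year in (2023, 2024):
        profile[year] = profile[year - 1] * (1.0 + growth_2023_24)
    return profile


if __name__ == "__main__":
    for year, level in run3_resource_profile().items():
        print(f"{year}: {level:.2f} x 2018 level")
```

Under these assumptions the level at the end of Run 3 is a bit above twice the 2018 one.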
Resource Evolution
From I. Bird's talk at the 7th Scientific Computing Forum (SCF), 4 October 2019, CERN
- A gap of a factor 4-5 between what a 'flat budget, 20% annual increase' scenario provides and the resource requirements for HL-LHC
- Intense R&D to reduce data and resource requirements
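To get a feeling for the size of this gap, one can ask how many additional years of 20% annual growth a given factor corresponds to. This is only an arithmetic sketch: it assumes that the flat budget buys 20% more capacity each year (one reading of the label on the slide), and the function name is hypothetical.

```python
import math


def years_of_growth_for_factor(factor, annual_growth=0.20):
    """Years of compound growth needed to gain the given factor.

    Solves (1 + annual_growth) ** n == factor for n.
    """
    return math.log(factor) / math.log(1.0 + annual_growth)


if __name__ == "__main__":
    for gap in (4, 5):
        n = years_of_growth_for_factor(gap)
        print(f"factor {gap}: about {n:.1f} years at 20% per year")
```

A factor 4-5 is therefore equivalent to roughly 8-9 extra years of growth at that rate, which is why the slide points to R&D rather than to waiting for hardware evolution.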
- Cost evolution is not well established
- Assumed price reduction: 10% CPU, 15% disk, 20% tape
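The assumed price reductions translate into the capacity that a constant budget can buy. The sketch below assumes the quoted percentages are yearly reductions of the unit price (the slide does not state the period) and that the same amount of money is spent every year; the names are illustrative.

```python
# Price reductions quoted on the slide; treating them as yearly is an assumption.
ASSUMED_PRICE_REDUCTION = {"cpu": 0.10, "disk": 0.15, "tape": 0.20}


def capacity_gain_flat_budget(price_reduction, years=1):
    """Factor by which the capacity bought with a constant yearly budget
    grows after `years`, if the unit price falls by `price_reduction` per year."""
    return (1.0 / (1.0 - price_reduction)) ** years


if __name__ == "__main__":
    for resource, reduction in ASSUMED_PRICE_REDUCTION.items():
        gain = capacity_gain_flat_budget(reduction)
        print(f"{resource}: {100 * (gain - 1):.1f}% more capacity per year")
```

Note that a price reduction of r gives a capacity gain of 1/(1 - r) - 1, which is slightly larger than r itself.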
3.- The Spanish LHC Computing GRID Community (LCG-ES)
Clouds:
● CERN, CA, DE, ES, FR, IT, ND, NL, RU, TW, UK, US
The PIC Cloud (ES), LCG-ES
● Tier1: PIC Barcelona
○ Provides 5% of the Tier1 data processing of CERN's LHC detectors ATLAS, CMS and LHCb
● Tier2s:
○ CMS Spanish Tier2: CIEMAT Madrid, IFCA Santander
○ ATLAS Spanish Tier2: IFIC Valencia, IFAE Barcelona, UAM Madrid
○ LHCb Spanish Tier2: USC Santiago de Compostela, UB (Universitat de Barcelona)
○ LIP Lisbon, Portugal
○ UTFSM Santiago, Chile
○ UNLP La Plata, Argentina (inactive)
Total accounting of resources:
CPU (HS06) = 182K
Disk (PB) = 14.5
Tape (PB) = 19.6
- Integrated in the WLCG project (Worldwide LHC Computing GRID) and following the ATLAS/CMS/LHCb computing models
- We represent 4% of the total Tier-2 resources and 5% of the Tier-1 ones
Spanish Cloud performance in Run II
- More than 22 million finished jobs
- More than 196 million events processed
- On average, 5000 slots occupied by running jobs daily
- More than 46 million files produced
4.- Usage of additional compute resources
• Supercomputers for LHC
– Growing funding in supercomputing (HPC) infrastructures
• Roadmap towards exaflop machines
• Countries/funding agencies pushing the HEP community to use these resources
– EuroHPC: funding at the B€ level; 2 machines of approx. 200 PFlops by 2021, 2 exaflop machines by 2024
– Data-intensive computing with HPC facilities is not easy
• Limited or no network connectivity in compute nodes
• Limited storage for caching I/O event data files
– The 'call for resource allocation' model is not suitable
• We need a guaranteed share of resources
• Agreement with BSC
– LHC applications are NOT really suited for HPC
• No large parallelization (no use of fast node interconnects)
• No essential use of accelerators (GPU, FPGA)
– Substantial integration work to make HPC work for HTC
• Use of BSC (Barcelona Supercomputing Center) resources:
– The recommendation to use the computing resources of BSC comes from the Funding Agency
– CMS:
• CIEMAT/PIC: CMS still cannot use BSC resources because the nodes lack network connectivity, which CMS needs in order to integrate them into the WMS. There is a project with the HTCondor team to address that limitation.
• IFCA: adaptation of ALTAMIRA (node of RES in Cantabria) within the GRID infrastructure (input from Ibán)
– The grid infrastructure of the T2 has been redesigned so that, when the T2 is saturated, it checks the availability of free HPC resources and forwards jobs there. At the moment pilot examples are operating using ALTAMIRA in "parasitic" mode, but this can be easily changed.
– ATLAS: effort devoted to adapting the queues at BSC to run simulation production jobs. In 2018 IFIC and IFAE started to apply for computing time, and several requests have been granted.
• Computing hours have been requested in the Spanish Supercomputing Network (RES) and in Europe (PRACE); 2.8M hours for IFAE and 1.2M hours for IFIC were granted in Mare Nostrum (BSC), and 2M hours in Lusitania (CénitS)
• The ATLAS software and the tools needed to run ATLAS detector simulation have been installed in these HPCs, so we have used resources outside the Spanish Tier centers. We have simulated more than 60 million events.
– LHCb: at the Spanish level the LHCb groups have not started with these activities yet
[Figure: IFIC/IFAE-PIC led ATLAS simulation profiting from opportunistic HPC resources]
- More than 60 million events simulated
- More than 90% of jobs ended successfully
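The overflow behaviour described for the redesigned T2 infrastructure (use the T2 first and forward jobs to the HPC only when the T2 is saturated and the HPC has free resources) can be sketched as follows. This is an illustration of the idea only, not the actual configuration: the `Resource` class, its slot counters and `route_job` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical view of a computing resource in terms of job slots."""
    name: str
    total_slots: int
    busy_slots: int

    @property
    def free_slots(self):
        return max(self.total_slots - self.busy_slots, 0)


def route_job(tier2, hpc):
    """Return the name of the resource that should take the next job,
    or None if the job has to stay queued."""
    if tier2.free_slots > 0:   # the T2 is not saturated: run locally
        return tier2.name
    if hpc.free_slots > 0:     # T2 saturated: use free HPC resources
        return hpc.name
    return None                # nothing free anywhere


if __name__ == "__main__":
    t2 = Resource("T2", total_slots=100, busy_slots=100)
    hpc = Resource("HPC", total_slots=50, busy_slots=10)
    print(route_job(t2, hpc))
```

In this sketch the HPC is only used opportunistically; a different policy would amount to changing the order of the two checks.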
- December 2018: meeting at BSC to explore the possibility of having a dedicated share for LHC computing needs
– Take as an example another special 'project' agreement with BSC
- February-April: preparation of a draft LHC Computing-BSC agreement
– Discussion of technical and policy questions
- July 2019: Sergi Girona (BSC) will prepare the definitive agreement document, to be approved at the November BSC 'Junta de Gobierno' (BSC Executive Board)
- February-March 2020: could be opened for users (hopefully)
[Images: View of Mare Nostrum; Meeting at BSC in December 2018]
Cloud Computing Resources:
Experiments have run large-scale tests using cloud compute nodes
Google Cloud, Amazon AWS, Microsoft Azure -> approx. 50K cores concurrently for a few days
=> Commercial cloud is
• not profitable for either (a) storage or (b) computing,
• but it can be useful to test new architectures without investing
⇒ Currently essentially no commercial cloud use for LHC computing
⇒ Potential future opportunities: European Open Science Cloud (EOSC)
An EU model for the use of cloud computing in the private and public sector
European Science Cluster of Astronomy & Particle Physics ESFRI Research Infrastructures (ESCAPE)
5.- Heterogeneity and resources federation
Federation is the key
• Federation in data storage:
– The idea is to localize bulk data in a cloud service (data lake): minimize replication, assure availability
– Serve data to remote (or local) compute: grid, cloud, HPC, ???
– Simple caching is all that is needed at the compute site (or none, if the network is fast)
– Federated data at national, regional, global scales
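The "simple caching at the compute site" idea can be illustrated with a small read-through cache placed in front of the data lake. This is a sketch under stated assumptions: the least-recently-used eviction policy, the `SiteCache` class and the `fetch_from_lake` callable are all hypothetical and do not describe any particular WLCG service.

```python
from collections import OrderedDict


class SiteCache:
    """Read-through cache at a compute site, with LRU eviction (assumed policy)."""

    def __init__(self, fetch_from_lake, capacity=2):
        self._fetch = fetch_from_lake   # callable: file name -> file content
        self._capacity = capacity       # maximum number of cached files
        self._files = OrderedDict()
        self.remote_reads = 0           # number of reads served by the lake

    def read(self, name):
        if name in self._files:
            self._files.move_to_end(name)    # mark as most recently used
            return self._files[name]
        data = self._fetch(name)             # cache miss: go to the data lake
        self.remote_reads += 1
        self._files[name] = data
        if len(self._files) > self._capacity:
            self._files.popitem(last=False)  # evict the least recently used file
        return data


if __name__ == "__main__":
    lake = {"run1.root": b"events-1", "run2.root": b"events-2"}
    cache = SiteCache(lake.__getitem__)
    cache.read("run1.root")
    cache.read("run1.root")   # second read is served locally
    print("remote reads:", cache.remote_reads)
```

Only the first access to a file goes over the network; a diskless site would correspond to skipping the cache and calling the lake directly.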
• Federation of computing resources
– Main issue: reducing the hardware cost
– Reducing the operational cost
– Co-location of data and processors is not guaranteed: sites can be 'diskless'
– Heterogeneous computing
PIC is contributing actively to the first group with studies on data access and popularity for CMS at PIC and CIEMAT, measuring the effect on applications of accessing real data remotely
6.- Software Optimization
• Solution could come from the software
– 50 million lines of code, mainly C++
– "a project / experiment cannot afford to have bad software" (Graeme's talk in Granada)
• Initiatives:
– HEP Software Foundation
– IRIS-HEP: Institute for Research & Innovation in Software for HEP, 25 M$, 5 years
– Proposal of an EU Scientific Software Institute
– In Spain: COMCHA forum
• New hardware architectures
– High-level parallelism, new instruction sets, …
– Support in software frameworks for heterogeneous hardware
• New/faster algorithms
– Machine Learning/Deep Learning
– Rewrite physics algorithms for new hardware
Improvement in CPU consumption by using faster physics algorithms in FASTSIM/FASTRECO
7.- Spanish Strategy in Computing
• A common theme in many contributions to the EPPS Granada is the desire to collaborate with and benefit from LHC R&D work
• Synergies and 'not to reinvent the wheel'
• Situation in different projects:
– ESCAPE: addresses FAIR data management
– DUNE and CTA will leverage the WLCG for their computing infrastructure
– Nuclear Physics Coll.: the LHC Computing Model has been adapted to the needs and the size of the AGATA collaboration
– Computing @ Future Accelerators Meeting, May 2019: addressing the outstanding questions for CLIC and Future Circular Colliders