

  1. Virtualized HPC infrastructure of the Novosibirsk Scientific Center for HEP data analysis.
     D. Maximov, on behalf of the NSC/SCN consortium.
     International Workshop on Antiproton Physics and Technology at FAIR, Budker INP, Novosibirsk, Russia, 19 November 2015.

  2. Contents
     1. History
     2. Supercomputing Network of the Novosibirsk Scientific Center
     3. Dynamical Virtualized Computing Cluster
     4. Experiments integrated with GCF
     5. Results and conclusion

  3. BINP/GCF: History
     - Started in 2004.
     - Initial goal: participate in the LCG project.
     - Currently: a gateway to the NSC computing resources for BINP experimental groups.

  4. Supercomputing Network of the Novosibirsk Scientific Center
     - An isolated 10 Gbps network connecting the main computing resources of Akademgorodok.
     - Organizations involved:
       - Institute of Computational Technologies (ICT SB RAS)
       - Novosibirsk State University (NSU)
       - Institute of Computational Mathematics and Mathematical Geophysics (ICM&MG SB RAS)
       - Budker Institute of Nuclear Physics (BINP SB RAS)
     - Expansion prospects: other NSC institutes, Tomsk State University.

  5. Supercomputing Network of the NSC
     [Network diagram: the NSC/SCN network core interconnects the BINP network core and broadband providers (BINP/GCF, storage, and the KEDR, SND, CMD-3, Super c-tau and plasma groups), the NUSC computing cluster, SSCC, the ICT storage system, the SB RAS network core, and remote sites (CERN, KEK, TSU SKIF-Cyberia and others) serving experiments such as ATLAS, LHCb, CMS, Belle2, BaBar and Panda.]

  6. Novosibirsk State University (NSU) Supercomputer Center (NUSC)
     - 29 TFlops (2432 physical CPU cores) + 8 TFlops (GPUs)
     - 108 TB of storage
     - http://nsu.ru, http://nusc.ru

  7. Siberian Supercomputer Center (SSCC) at the Institute of Computational Mathematics & Mathematical Geophysics (ICM&MG)
     - SSCC was created in 2001 to provide computing resources for SB RAS research organizations and external users (including ones from industry).
     - 30 + 85 TFlops of combined computing performance (CPU + GPU, NKS-30T) since 2011Q4.
     - 120 TB of storage.
     - 2x 70 sq.m of raised floor space.
     - Up to 140 kVA of power input and 120 kW of heat removal capacity (combined).
     - http://www2.sscc.ru

  8. CPU/Cores: 128/512, 5 TFlops peak.

  9. Key properties of the HEP computing environment
     - Each experiment has a unique computing environment:
       - a wide variety of OS and standard package versions;
       - a lot of specifically developed software.
     - Software can be easily parallelized by data.
     - Mostly non-interactive programs, executed via some batch system.

  10. How to glue HEP and HPC together?
     We want:
     - on the HEP side: keep the specific computing environment and the users' experience;
     - on the supercomputer side: behave like a normal SC user.
     The answer: run HEP tasks inside virtual machines, and run the VMs inside the supercomputer's batch system jobs.
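
As an illustration of this idea, below is a minimal sketch of how a HEP payload could be wrapped into a batch job that simply boots a KVM guest. This is not the actual NSC/SCN tooling: the queue name, image path, resource sizes and the PBS directives are assumptions for illustration only.

#!/usr/bin/env python3
"""Sketch: submit a batch job that boots a KVM guest holding a frozen
HEP software environment. All names and sizes are hypothetical."""
import subprocess
import tempfile

PBS_TEMPLATE = """#!/bin/bash
#PBS -N hep-vm-worker
#PBS -q hpc
#PBS -l select=1:ncpus=8:mem=16gb
#PBS -l walltime=24:00:00

# Boot the experiment-specific VM image on the allocated node.
exec qemu-system-x86_64 \\
    -enable-kvm \\
    -smp 8 -m 16384 \\
    -drive file={image},format=qcow2 \\
    -nographic
"""

def submit_vm_job(image_path: str) -> str:
    """Write a PBS script that runs one VM and submit it with qsub."""
    with tempfile.NamedTemporaryFile("w", suffix=".pbs", delete=False) as f:
        f.write(PBS_TEMPLATE.format(image=image_path))
        script = f.name
    # qsub prints the job identifier on success.
    result = subprocess.run(["qsub", script],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

if __name__ == "__main__":
    print(submit_vm_job("/storage/images/kedr-worker.qcow2"))

From the supercomputer's point of view this is just another batch job; everything HEP-specific lives inside the guest image.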

  11-15. Batch System Integration Mechanisms (one slide shown in four build steps)
     [Diagram: the PBS Pro batch system of NUSC (NSU) is connected through the NSC/SCN network to BINP/GCF, where each user group (KEDR detector user group, SND detector user group, ATLAS data analysis group at BINP) runs its own orchestration services and SGE batch system.]
     - Stage 1: job submission and the automated VM group deployment sequence; a set of computing nodes with KVM, IPoIB and HT support is enabled on demand, and a group of VMs of a particular type is created on demand.
     - Stages 2 and 3: automated VM group discovery and deployment with configuration of the network storage layout, followed by the late stages of the VM life cycle: analysis job submission, and a VM self shutdown when no suitable pending jobs are left.
     - Stage 4: the computing nodes are returned to their original state.
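
As a rough sketch of the self-shutdown behaviour described above (an illustration only, not the actual DVCC middleware), a watchdog inside the VM could poll its group's SGE queue and power the guest off once no suitable pending jobs remain; the qstat parsing and the polling interval are assumptions.

#!/usr/bin/env python3
"""Sketch of a VM-side self-shutdown watchdog for an SGE-fed worker VM."""
import subprocess
import time

POLL_INTERVAL = 300  # seconds between queue checks (assumed value)

def pending_jobs() -> int:
    """Count jobs waiting in the group's SGE queue (state 'qw')."""
    out = subprocess.run(["qstat"], capture_output=True,
                         text=True, check=True).stdout
    return sum(1 for line in out.splitlines() if " qw " in line)

def main() -> None:
    while True:
        if pending_jobs() == 0:
            # Nothing left for this VM group: power off so that the
            # physical node can be returned to its original state.
            subprocess.run(["shutdown", "-h", "now"], check=False)
            return
        time.sleep(POLL_INTERVAL)

if __name__ == "__main__":
    main()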

  16. Virtualized computing infrastructure
     In this way we have a dynamical virtualized computing cluster (DVCC). Physicists use the computing resources in a conventional way.

  17. Experiments at Budker INP integrated with GCF
     High Energy Physics:
     - Local: KEDR, CMD-3, SND, Super c-tau (planned)
     - External: ATLAS, Belle2, BaBar
     Other activities: plasma and accelerator physics, engineering calculations, ...

  18. Virtualized infrastructure: what we've learnt so far
     - HEP data analysis can be successfully performed using the virtualized HPC infrastructure of the Novosibirsk Scientific Center.
     - Long-term VM stability has been achieved (up to a month at NUSC, up to a year at BINP).
     - Most of the underlying implementation details are completely hidden from the users.
     - No changes were required to the experimental groups' software and/or its execution environment.

  19. Virtualized infrastructure: what we've learnt so far (2)
     Main benefits:
     - The ability to use the free capacity of supercomputer sites to run much simpler (from the HPC point of view) single-threaded HEP software.
     - The ability to freeze an experimental group's software and its execution environment and reproduce it exactly when needed.
     This scheme can easily be extended to other experimental groups and computing centers.

  20. Milestones
     - Initial deployment of GCF at BINP in 2004.
     - DVCC middleware development started in 2010.
     - KEDR has been running production jobs at NUSC since 2011Q1.
     - Other BINP groups joined the activity in 2012Q1 (started by ATLAS).
     - SSCC connected in August 2012, in production since 2013Q1.
     - The Belle2 experiment joined in 2014Q2; it works through the DIRAC system.

  21. Usage of the local cluster
     Average utilization over the last year: 49%.

  22. Participation in Belle2
     Produced about 3% of the CPU-hours for simulation.

  23. Conclusion
     Results:
     - NSC/SCN resources are successfully used for HEP data processing at BINP, making analysis cycles 10-100 times faster.
     - Our experience can be applied to other computing centers and HEP experiments.

  24. Plans
     - Install new computing and storage hardware.
     - Extend the number of user groups.
     - Further development of the DVCC middleware.
     - Deploy special storage for the BaBar experiment.
     - Access to other computing centers.
     - Present our resources to LCG.

  25. Thank you for your attention!
