
LHCb, Vac, Vcycle status - Andrew McNab, University of Manchester



  1. LHCb, Vac, Vcycle status Andrew McNab University of Manchester

  2. LHCb status
     ● Running production jobs in VMs at 3 UK Vac sites and on 2 IaaS Cloud sites using Vcycle
       – Manchester, Lancaster, Oxford; Imperial and CERN
       – Both Vac and Vcycle are advertised as GridPP products
       – Vac has been a supported LHCb platform since last year
       – Vcycle now adopted by LHCb too
     ● (LHCb hasn't tried running the HLT as a Cloud service, since it has been a production DIRAC site for several years)
     ● LHCb's VM architecture is done by us, using the Pilot VM model also used to make GridPP DIRAC and ATLAS VMs
     ● DIRAC Pilot 2.0 with improved monitoring, VM support, and modularity will also be joint CERN and Manchester work
     ● More VM slots at sites for LHCb would be welcome!
     LHCb, Vac, Vcycle - Andrew.McNab@cern.ch - GridPP technical meeting

  3. LHCb jobs in VMs
     ● CLOUD jobs in VMs managed by Vcycle on OpenStack
     ● CLOUD.CERN.ch is ~500 VM slots
     ● VAC jobs in VMs managed by Vac of course

  4. Vac
     ● On each physical node, Vac VM factory daemon runs to create and apply contextualization to transient VMs
     ● Multiple VM flavours (“VM types”) are supported, ~1 per experiment
     ● Each site or Vac “space” is composed of autonomous factory nodes
       – All using the same /etc/vac.conf
     ● Factories communicate with each other via UDP
       – Type of VM to start in a free slot based on what else is running and target shares
       – So no headnode central point of failure; robust against losing individual nodes
     ● Aims for reliability and robustness through simplicity
       – VM instantiation failure rate << 1/1000, much better than typical IaaS sites
     ● Running LHCb production jobs since last year; and ATLAS production jobs at Manchester (40K+ jobs), Lancaster, Oxford since early April
     ● Documentation, RPMs, links to GitHub at www.gridpp.ac.uk/vac
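The choice of which VM type to start in a free slot, based on what else is running and the target shares, can be illustrated with a small sketch. This is not Vac's actual algorithm: the function name `choose_vmtype`, the input dictionaries, and the "largest shortfall against target" rule are all assumptions made for illustration. In Vac itself the counts of running VMs would come from the other factories over UDP.

```python
def choose_vmtype(target_shares, running_counts):
    """Pick the VM type whose running fraction is furthest below its target share.

    target_shares:  hypothetical mapping of VM type -> relative share (e.g. from config)
    running_counts: hypothetical mapping of VM type -> VMs currently running in the space
    Returns None if no VM type has a positive share.
    """
    total_share = sum(target_shares.values())
    total_running = sum(running_counts.get(t, 0) for t in target_shares)

    best_type, best_deficit = None, None
    for vmtype in sorted(target_shares):  # sorted so that ties break deterministically
        share = target_shares[vmtype]
        if share <= 0:
            continue  # a type with no share is never started
        target_fraction = share / total_share
        if total_running:
            running_fraction = running_counts.get(vmtype, 0) / total_running
        else:
            running_fraction = 0.0
        deficit = target_fraction - running_fraction
        if best_deficit is None or deficit > best_deficit:
            best_type, best_deficit = vmtype, deficit
    return best_type


if __name__ == "__main__":
    shares = {"lhcb": 2, "atlas": 1}
    print(choose_vmtype(shares, {"lhcb": 2, "atlas": 3}))
```

Because every factory can evaluate a rule like this locally from the same configuration and the counts it hears from its peers, no headnode is needed to make the decision.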

  5. Vcycle on OpenStack etc
     ● Use Vac approach to run VMs on IaaS cloud platforms
     ● Python daemon manages lifecycle of VMs in tenancy
       – (Re)creates VMs using boot image and user_data
     ● Supports multiple tenancies and multiple vmtypes per tenancy
     ● Doesn't need to know about task queues etc
       – VMs are black boxes: created, run, shut down, then deleted
       – Vcycle can be run by the experiment, site, or a third party
     ● In production for LHCb at CERN since early May (~500 VMs)
     ● Running production ATLAS and LHCb VMs in the gridpp-vcycle tenancy at Imperial College
     ● Being evaluated for ATLAS at CERN
     ● Sources in Vac GitHub area
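The black-box lifecycle described on this slide (create, run, shut down, delete, then recreate) can be sketched as a single pass of a management loop. This is a minimal illustration, not Vcycle's code: `FakeTenancy` is a hypothetical in-memory stand-in for an IaaS tenancy, and its `create` and `delete` methods are not OpenStack API calls. The point is only that the manager looks at VM state and never at task queues.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class FakeTenancy:
    """Hypothetical in-memory stand-in for an IaaS tenancy (not a real cloud API)."""
    max_vms: int
    vms: dict = field(default_factory=dict)  # VM name -> "running" or "shutdown"
    _ids: itertools.count = field(default_factory=itertools.count)

    def create(self, vmtype, image, user_data):
        # A real tenancy would boot `image` and pass `user_data` for contextualization.
        name = f"vcycle-{vmtype}-{next(self._ids)}"
        self.vms[name] = "running"
        return name

    def delete(self, name):
        del self.vms[name]


def one_cycle(tenancy, vmtype, image, user_data):
    """Delete VMs that have shut down, then refill the tenancy up to max_vms."""
    for name, state in list(tenancy.vms.items()):
        if state == "shutdown":
            tenancy.delete(name)
    created = []
    while len(tenancy.vms) < tenancy.max_vms:
        created.append(tenancy.create(vmtype, image, user_data))
    return created


if __name__ == "__main__":
    tenancy = FakeTenancy(max_vms=2)
    print(one_cycle(tenancy, "lhcb", "boot-image", "user_data"))
    tenancy.vms["vcycle-lhcb-0"] = "shutdown"  # the VM ran out of work and stopped itself
    print(one_cycle(tenancy, "lhcb", "boot-image", "user_data"))
```

A daemon would call something like `one_cycle` periodically for each tenancy and vmtype it manages.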

  6. Immediate plans
     ● LHCb
       – Begin work on Pilot 2.0
       – New monitoring framework
       – Multiple concurrent single-processor payloads in one pilot job or pilot VM
       – Improve TimeLeft handling, for better elastic MC jobs and multiple payload jobs
     ● Vac
       – CloudInit support
       – Increase robustness of UDP protocol if high (50%?) packet loss
       – Increase scalability from present level of hundreds of VMs
       – Generic condor worker VM based on CernVM condor support
     ● Vcycle
       – Man page, Admin Guide, RPMs
       – (Re-)add EC2 support for non-OpenStack tenancies
