Computing Construction Project: DUNE UK/FNAL planning meeting

  1. IRIS / DUNE-UK: Computing Construction Project
     DUNE UK/FNAL planning meeting
     Pete Clarke (Edinburgh)
     Edinburgh, 8/9 Oct 2018
     IET, Oct 09

  2. Dummies' guide to how computing support works in the UK

  3. Physical Computing Resources (HS06, PBytes)
     First port of call is GridPP
     • GridPP is the UK project which provides computing for HEP
     • The current GridPP incarnation is GridPP5, which runs until March 2020
     • GridPP5 is funded primarily for the LHC (90%)
     • 10% is for “other HEP experiments known at the time”, i.e. not really for DUNE
     • But GridPP always tries to help anyway, and generally succeeds
     • The GridPP6 incarnation will (hopefully) run April 2020 - 2024
     • The GridPP6 “ask” will include DUNE requirements
     • But STFC is still mainly operating under “flat-cash” resources
     • So “ask” != “get”
     → See talk by Dave Britton (GridPP Project Leader)

  4. Physical Computing Resources (HS06-years, PBytes)
     Next port of call is a thing called IRIS
     • IRIS is too complicated to explain fully here: pub or coffee bar later
     • IRIS is the coordinating body for all computing across all of STFC
       • Particle Physics, Astronomy, Astroparticle, Nuclear
       • Photon source (Diamond), Neutrons (ISIS), Laser (CLF)
     • IRIS also has capital resources until 2022 (from the ministry, BEIS)
     • DUNE is in scope of IRIS (fair share along with all others)
     → DUNE will benefit from IRIS resources (until 2022) provided via GridPP

  5. Physical Computing Resources (HS06, PBytes)
     Next ports of call are more nebulous
     • Effectively, IRIS has just made a further bid for additional hardware resource
     • Probability of success <=50%, and it may take years to happen
     • Meanwhile, all Research Councils in the UK have just amalgamated into UKRI
     • UKRI is attempting to “harmonise” eInfrastructure
     • This brings an opening to ask the Ministry (BEIS) for hardware resource at some point
       • Probability of success in the short term <=30%, long term ~80%
     • It’s a hard path, but we have to walk it

  6. Software Infrastructure: Common
     • This means distributed computing middleware common to all experiments
       - WLCG software components: Compute Elements, Storage Elements
       • Tape service
       • CVMFS
       • Databases
       • Information service
       • Security, VOMS
       • Accounting, monitoring
       • Network
       • …..
     • GridPP is funded for the people to deploy and operate WLCG middleware
       → GridPP is the only UK effort for WLCG middleware
       → GridPP cannot really support things which are single-experiment specific
     • It would be “difficult” to get resource in the UK to support a fundamentally different fabric

  7. Software Infrastructure
     [Diagram: layered software stack, top to bottom]
     • Vertical layers, one column each for Activity 1, Activity 2 and Activity 3:
       - Scientists using analysis frameworks and adding analysis specific code
       - Activity specific reconstruction, simulation and analysis framework software
       - Activity specific production computing, data management, trigger and online computing
     • Common layer: common distributed computing software infrastructure, operations, support and development. Global services such as security response, service registries, monitoring and accounting. Software verification and rollout.
     • Base layer: physical infrastructure and operations staff
     • Side label: Research software Engineering (engineering, re-engineering, development, porting, moving code down stack)
     • Annotation: “GridPP is here”

  8. Software Infrastructure: DUNE specific
     • Software Infrastructure (sInfrastructure) support, DUNE specific
     [Diagram: the same layered software stack, with the vertical layer labelled “DUNE”]
     • Vertical layers, one column each for Activity 1, Activity 2 and Activity 3:
       - Scientists using analysis frameworks and adding analysis specific code
       - Activity specific reconstruction, simulation and analysis framework software
       - Activity specific effort for production computing, data management, trigger and online computing
     • Common layer: common distributed computing software infrastructure, operations, support and development. Global services such as security response, service registries, monitoring and accounting. Software verification and rollout.
     • Base layer: physical infrastructure and operations staff
     • Side label: Research software Engineering (engineering, re-engineering, development, porting, moving code down stack)

  9. Software Infrastructure: DUNE specific
     • This means the vertical elements of the diagram
     • This includes
       - DUNE user analysis software
       - DUNE reconstruction and simulation software
       - DUNE distributed production computing software
     • The funding route (now) in the UK is through the PPRP (Projects Peer Review Panel)
     • We have just submitted the DUNE UK Construction Proposal to PPRP: 2019-2026
       → WP1: Physics and Computing
       → WP2: DAQ
       → WP3: APAs
     • WP1
       → WP 1.1, 1.2, 1.4 are Physics
       → WP 1.3 is Production Computing Construction

  10. Software Infrastructure: DUNE specific
      WP 1.3 - Computing Construction (15 FTE-years)
      → WP 1.3.1 Data movement and management
        • Starts with RUCIO, SAM integration as necessary, but flexible as the project evolves
      → WP 1.3.2 Offline Production Management and Monitoring (2.5 FTE-y)
        • Production management
        • Workload management
        • Monitoring
      → WP 1.3.3 Integration with Cloud Platforms (1.75 FTE-y)
        • Integration of HEPCloud, IRIS, and other cloud resources into DUNE production
      → WP 1.3.4 AI Application to Offline Production (3.25 FTE-y)
        • Applying AI to the monitoring of DUNE production work: is it working, data quality
        • Applying AI to data selection (if we can)
      → WP 1.3.5 Computing Production System for SURF (5.25 FTE-y)
        • SURF data centre

  11. Software Infrastructure: People
      WP 1.3: UK DUNE project people
      • Edinburgh: Perry, Nebot, Clarke, Muheim, (Washbrook), (Gambetta)
      • Manchester: McNab
      • RAL: Nandakumar, Brew, Wilson
      DUNE UK compute group: DUNE members + helpful GridPP people
      • Jones, Bauer, Dewhurst, Nowak, Pec, Davda, Moore, Blake, Hartnell, Back, Doige, Long, Fayer

  12. Work areas so far
      UK Resources
      • Work by GridPP people (Manchester, RAL, Imperial, Liverpool)
      • Have enabled GridPP resources
      • Disk: 0.9 PB of ProtoDUNE data
      • CPU:
      Data Management
      • Edinburgh working on RUCIO development
      • RAL, Imperial working on multi-VO RUCIO

  13. Summary
      • Physical compute resources for DUNE/ProtoDUNE in the short term: ✓
      • Physical compute resources for DUNE during exploitation: unknowable but optimistic
      • Computing middleware support via GridPP: to be requested in GridPP6, but optimistic
      • Computing Project construction staff: 15 SY requested; believe it when it happens, given the track record of the process

  14. Leave you with a sobering example of how services can go wrong
      Protocol is
      → send English to an email translation service
      → Welsh translation is emailed back to you

  16. Leave you with a sobering example of how services can go wrong
      Protocol is
      → send English to an email translation service
      → Welsh translation is emailed back to you
      The Welsh text says “I am not in the office at the moment. Send any work to be translated”.
