

  1. Operating the Distributed NDGF Tier-1. Michael Grønager, Technical Coordinator, NDGF. International Symposium on Grid Computing 08, Taipei, April 10th, 2008

  2. Talk Outline
  - What is NDGF?
  - Why a distributed Tier-1?
  - Services: Computing, Storage, Databases, VO Specific
  - Operation
  - Results
  ISGC08, Taipei, April 2008

  3. Nordic DataGrid Facility
  - A co-operative Nordic data and computing grid facility
  - Nordic production grid, leveraging national grid resources
  - Common policy framework for the Nordic production grid
  - Joint Nordic planning and coordination
  - Operates the Nordic storage facility for major projects
  - Co-ordinates and hosts major eScience projects (e.g., the Nordic WLCG Tier-1)
  - Develops grid middleware and services
  - NDGF 2006-2010: funded (2 M€/year) by the National Research Councils of the Nordic countries (NOS-N)

  4. Nordic DataGrid Facility
  - Nordic participation in Big Science:
    - WLCG, the Worldwide LHC Computing Grid
    - Gene databases for the bio-informatics sciences
    - Screening of reservoirs suitable for CO2 sequestration
    - ESS, the European Spallation Source
    - Astronomy projects
    - Other...

  5.-11. Why a Distributed Tier-1?
  - Computer centers are small and distributed
  - Even the biggest adds up to 7
  - Strong Nordic HEP community
  - Technical reasons:
    - Added redundancy
    - Only one 24x7 center needed
    - Fast inter-Nordic network

  12. Organization – Tier-1 related

  13. Tier-1 Services
  - Storage: tape and disk
  - Computing: well connected to storage
  - Network: part of the LHC OPN
  - Databases:
    - 3D for e.g. ATLAS
    - LFC for indexing files
  - File Transfer Service
  - Information systems
  - Monitoring
  - Accounting
  - VO services: ATLAS specific, ALICE specific

  14. Resources at Sites
  - Storage is distributed
  - Computing is distributed
  - Many services are distributed
  - But the sites are heterogeneous...

  15. Resources at Sites (overview)

  16.-19. Computing
  - A distributed compute center uses a grid for the LRMS...
    - Needs to run on all kinds of Linux distributions
    - Must use resources optimally
    - Must be easy to deploy
  - NorduGrid/ARC!
    - Already deployed
    - Runs on all Linux flavors
    - Uses resources optimally
    - Uses the CE for data handling, whereas gLite keeps worker nodes idle during up/download
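The operational point about CE-side data handling can be sketched with a toy model. This is purely illustrative (the numbers and function names are hypothetical, not ARC or gLite code): it contrasts staging data on the compute element before a worker node is allocated with downloading and uploading on the worker node itself, which leaves the node idle during transfers.

```python
# Toy model (illustrative only): how long a worker node stays allocated under
# two data-staging strategies. All durations are hypothetical.

STAGE_IN = 30   # minutes to download input data
COMPUTE = 120   # minutes of actual CPU work
STAGE_OUT = 15  # minutes to upload results

def worker_busy_minutes_ce_staging():
    """ARC-style: the CE stages data in and out, so the worker node is
    only allocated for the compute phase."""
    return COMPUTE

def worker_busy_minutes_node_staging():
    """Node-side staging (the behavior the slide attributes to gLite):
    the worker node is held for the full download-compute-upload cycle."""
    return STAGE_IN + COMPUTE + STAGE_OUT

if __name__ == "__main__":
    ce = worker_busy_minutes_ce_staging()
    wn = worker_busy_minutes_node_staging()
    print(f"CE staging: node busy {ce} min ({COMPUTE / ce:.0%} CPU efficiency)")
    print(f"Node staging: node busy {wn} min ({COMPUTE / wn:.0%} CPU efficiency)")
```

Under these made-up numbers, CE-side staging keeps the node at 100% CPU efficiency while node-side staging drops it to roughly 73%, which is the kind of difference the slide alludes to.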

  20.-23. Storage (architecture diagrams)

  24. Storage
  - dCache
    - Java based, so it runs even on Windows!
    - Separation between resources and services
    - Open source
  - Pools at the sites; doors and admin nodes run centrally
  - NDGF is part of the development:
    - Added GridFTP2 to bypass door nodes in transfers
    - Various improvements and tweaks for distributed use
  - Central services at the GEANT endpoint
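Why the GridFTP2 door bypass matters in a distributed setup can be sketched as follows. This is a hedged illustration, not the dCache API: the file-to-pool mapping and hostnames are invented. The idea is that with plain GridFTP all data flows through the central door node, while with GridFTP2 the door handles only the control channel and redirects the client to the remote pool holding the data.

```python
# Sketch (hypothetical names, not dCache code) of the two transfer patterns.

# Invented catalogue: which pool node holds which file.
POOLS = {"atlas-data.file": "pool7.hpc2n.umu.se"}

def plain_gridftp(path):
    """Plain GridFTP: every byte passes through the central door node,
    so the door becomes a bandwidth bottleneck for distributed pools."""
    pool = POOLS[path]
    return ["client->door", f"door->{pool}", f"{pool}->door", "door->client"]

def gridftp2(path):
    """GridFTP2: the door answers the control channel only and the client
    moves data directly to/from the pool at the remote site."""
    pool = POOLS[path]
    return ["client->door (control)", f"client<->{pool} (data)"]

if __name__ == "__main__":
    print("plain:", plain_gridftp("atlas-data.file"))
    print("gridftp2:", gridftp2("atlas-data.file"))
```

The design consequence matches the slide: pools can sit at the sites while only the lightweight door and admin services need to run centrally at the GEANT endpoint.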

  25. Storage (diagram)

  26. Network
  - Dedicated 10GE to CERN via GEANT (LHC OPN)
  - Dedicated 10GE between participating Tier-1 sites
  (Diagram: the NDGF AS39590 network at Örestaden connects NORDUnet, the national NREN networks and the national sites (HPC2N, PDC, NSC, ...) in SE, FI, DK and NO to the central hosts and to CERN over the LHC OPN)

  27. Other Tier-1 Services
  - Catalogues: RLS & LFC
  - FTS – File Transfer Service
  - 3D – Distributed Database Deployment
  - Accounting: SGAS -> APEL
  - Service Availability Monitoring, via ARC-CE SAM sensors

  28. ATLAS Services
  - So far part of Dulcinea
  - Moving to PanDA
    - The aCT (ARC Control Tower, aka "the fat pilot")
    - PanDA improves gLite performance through better data handling (similar to ARC)
  - Moving from RLS to LFC

  29. ALICE Services
  - Many VO boxes, one per site:
    - Aalborg, Bergen, Copenhagen, Helsinki, Jyväskylä, Linköping, Lund, Oslo, Umeå
  - Central VO box integrating the distributed dCache with xrootd
  - Ongoing efforts to integrate ALICE and ARC

  30. NDGF Facility - 2008Q1

  31. Operations

  32. Operation (diagram)

  33. Operation
  - 1st line support (in operation): NORDUnet NOC, 24x7
  - 2nd line support (in operation): Operator on Duty, 8x365
  - 3rd line support (in operation): NDGF operation staff and sysadmins at the sites
  - Shared tickets with NUNOC

  34. People

  35. Results - Accounting
  - According to the EGEE Accounting Portal for 2007:
    - NDGF contributed 4% of all EGEE computing
    - NDGF was the 5th biggest EGEE site
    - NDGF was the 3rd biggest ATLAS Tier-1 worldwide
    - NDGF was the biggest European ATLAS Tier-1

  36. Results - Reliability
  - NDGF has been running SAM tests since 2007Q3
  - Overall 2007Q4 reliability was 96%
  - This made NDGF the most reliable Tier-1 in the world
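As a minimal sketch of where a figure like the 96% comes from: WLCG-style reliability is commonly computed as uptime divided by uptime plus unscheduled downtime (scheduled downtime is excluded). The hours below are invented for illustration; only the formula and the 96% result track the slide.

```python
# Hedged sketch: the common WLCG reliability definition,
#   reliability = uptime / (uptime + unscheduled downtime).
# The hour figures are made up to illustrate the arithmetic.

def reliability(up_hours, unscheduled_down_hours):
    """Fraction of (non-scheduled-downtime) time the site passed SAM tests."""
    return up_hours / (up_hours + unscheduled_down_hours)

if __name__ == "__main__":
    quarter_hours = 91 * 24          # ~2184 hours in a quarter
    unscheduled = 91                 # hypothetical unscheduled downtime
    r = reliability(quarter_hours - unscheduled, unscheduled)
    print(f"Quarterly reliability: {r:.0%}")
```

With these assumed numbers, roughly 91 hours of unscheduled downtime over a quarter yields the 96% reported for 2007Q4.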

  37. Results - Efficiency
  - The efficiency of the NorduGrid cloud (NDGF plus Tier-2/3s using ARC) was 93%
  - The result was mainly due to high middleware efficiency and high reliability
  - These in turn were due to the distributed setup and a professional operations team

  38. Worries
  - Can reconstruction run on a distributed setup? High data throughput, low CPU consumption
  - NDGF, TRIUMF and BNL reprocessed M5 data in February in CCRC08-1
    - Shown to work
    - The bottleneck was the 3D DB (which runs on only one machine)

  39. Looking ahead...
  - The distributed Tier-1 is a success:
    - High efficiency, high reliability, passed the CCRC08-1 tests
  - Partnering with EGEE on operation (taking part in CIC-on-Duty) and interoperability
  - Tier-2s are being set up
  - CMS will use gLite interoperability to run on ARC

  40. Thanks! Questions?
