Operating the Distributed NDGF Tier-1
Michael Grønager, Technical Coordinator, NDGF
International Symposium on Grid Computing 08, Taipei, April 10th 2008
Talk Outline
- What is NDGF?
- Why a distributed Tier-1?
- Services: Computing, Storage, Databases, VO specific
- Operation
- Results
Nordic DataGrid Facility
A co-operative Nordic data and computing grid facility:
- Nordic production grid, leveraging national grid resources
- Common policy framework for the Nordic production grid
- Joint Nordic planning and coordination
- Operates Nordic storage facilities for major projects
- Co-ordinates and hosts major eScience projects (i.e., the Nordic WLCG Tier-1)
- Develops grid middleware and services
NDGF 2006-2010: funded (2 M€/year) by the national research councils of the Nordic countries (NOS-N: Denmark, Sweden, Finland, Norway)
Nordic DataGrid Facility
Nordic participation in Big Science:
- WLCG – the Worldwide Large Hadron Collider Grid
- Gene databases for the bioinformatics sciences
- Screening of reservoirs suitable for CO2 sequestration
- ESS – the European Spallation Source
- Astronomy projects
- Other...
Why a Distributed Tier-1?
- Computer centers are small and distributed
- Even the biggest adds up to 7
- Strong Nordic HEP community
- Technical reasons:
  - Added redundancy
  - Only one 24x7 center needed
  - Fast inter-Nordic network
Organization – Tier-1 related [figure]
Tier-1 Services
- Storage – tape and disk
- Computing – well connected to storage
- Network – part of the LHC OPN
- Databases: 3D (e.g. for ATLAS), LFC for indexing files
- File Transfer Service
- Information systems
- Monitoring
- Accounting
- VO services: ATLAS specific, ALICE specific
Resources at Sites
- Storage is distributed
- Computing is distributed
- Many services are distributed
- But the sites are heterogeneous...
Resources at Sites [figure]
Computing
A distributed compute center uses a grid as its LRMS...
- Needs to run on all kinds of Linux distributions
- Needs to use resources optimally
- Must be easy to deploy
NorduGrid/ARC!
- Already deployed
- Runs on all Linux flavors
- Uses resources optimally: ARC uses the CE for data handling, whereas gLite keeps worker nodes idle during upload/download (see the sketch below)
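The data-handling point is the key architectural difference: in ARC, input and output files are declared in the job description, so the CE stages them before and after execution and worker nodes never sit idle on transfers. Below is a minimal sketch of building such a job description; the file names and storage URLs are hypothetical, and only commonly documented xRSL attributes (executable, cpuTime, inputFiles, outputFiles) are used.

```python
# Minimal sketch: build an xRSL job description in which data staging is
# declared up front, so the ARC CE (not the worker node) performs the
# transfers. File names and storage URLs below are hypothetical examples.

def build_xrsl(executable, inputs, outputs, cpu_minutes=60):
    """Return an xRSL string; `inputs`/`outputs` map local names to grid URLs."""
    parts = [f'(executable="{executable}")', f'(cpuTime="{cpu_minutes}")']
    in_files = " ".join(f'("{name}" "{url}")' for name, url in inputs.items())
    out_files = " ".join(f'("{name}" "{url}")' for name, url in outputs.items())
    parts.append(f"(inputFiles={in_files})")
    parts.append(f"(outputFiles={out_files})")
    return "&" + "".join(parts)

if __name__ == "__main__":
    xrsl = build_xrsl(
        "reco.sh",
        inputs={"events.raw": "srm://dcache.example.org/atlas/events.raw"},
        outputs={"events.esd": "srm://dcache.example.org/atlas/events.esd"},
    )
    print(xrsl)  # would then be handed to the ARC client for submission
```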
Storage [figure slides]
Storage
dCache:
- Java based – so it runs even on Windows!
- Separation between resources and services
- Open source
- Pools at the sites
- Doors and admin nodes run centrally
Part of the development:
- Added GridFTP2 to bypass door nodes in transfers (see the sketch below)
- Various improvements and tweaks for distributed use
- Central services at the GEANT endpoint
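The GridFTP2 work exists to keep bulk data off the central door nodes: the door only brokers the transfer and redirects the client to the site pool that actually holds the data. The sketch below is a conceptual illustration of that redirect pattern with made-up class, host, and file names, not dCache's real API.

```python
# Conceptual sketch of the door-bypass idea behind GridFTP2 in a distributed
# dCache: the central door only resolves which site pool holds a file and
# redirects the client there, so payload data never flows through the door.
# All names (pools, hosts, files) are hypothetical.

POOL_LOCATIONS = {  # file path -> (pool name, data endpoint at the site)
    "/atlas/raw/run1234.root": ("pool-hpc2n-01", "gsiftp://pool01.hpc2n.example.org:2811"),
    "/alice/esd/run0042.root": ("pool-csc-02", "gsiftp://pool02.csc.example.org:2811"),
}

class Door:
    """Central control-path service: only answers 'where is this file?'."""
    def redirect(self, path):
        pool, endpoint = POOL_LOCATIONS[path]
        return f"{endpoint}{path}"  # client is told to fetch directly from the pool

class Client:
    def fetch(self, door, path):
        direct_url = door.redirect(path)  # small control message via the door
        print(f"transferring {path} directly from {direct_url}")
        # ...the bulk GridFTP transfer would happen here, pool <-> client,
        # without traversing the central door node.

Client().fetch(Door(), "/atlas/raw/run1234.root")
```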
Storage [figure slide]
Network
- Dedicated 10GE to CERN via GEANT (LHC OPN)
- Dedicated 10GE between participating Tier-1 sites
[Network diagram: NDGF AS39590 at the Örestaden NORDUnet endpoint, connecting the national sites (HPC2N, PDC, NSC, ...) in DK/NO/SE/FI via NREN and national networks, the central host(s), and the CERN LHC OPN]
Other Tier-1 Services
- Catalogue: RLS & LFC
- FTS – File Transfer Service
- 3D – Distributed Database Deployment
- SGAS -> APEL accounting (a sketch of the record mapping follows below)
- Service Availability Monitoring – via ARC-CE SAM sensors
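The SGAS -> APEL arrow is a translation step: usage records gathered by SGAS from the ARC CEs have to be republished in the form the EGEE accounting portal expects. The sketch below illustrates such a field mapping; the field names on both sides are placeholders, not the exact SGAS or APEL schemas.

```python
# Illustrative sketch of republishing an SGAS-style usage record as an
# APEL-style record for the EGEE accounting portal. Field names on both
# sides are placeholders, not the exact production schemas.

def sgas_to_apel(record, site_name="NDGF-T1"):
    """Map one SGAS-like usage record (dict) to an APEL-like record (dict)."""
    return {
        "Site": site_name,
        "GlobalUserName": record["user_dn"],  # grid certificate DN
        "LocalJobId": record["local_job_id"],
        "WallDuration": record["wall_seconds"],
        "CpuDuration": record["cpu_seconds"],
        "StartTime": record["start_time"],
        "EndTime": record["end_time"],
    }

example = {
    "user_dn": "/DC=org/DC=example/CN=Some User",
    "local_job_id": "1234567.batch.example.org",
    "wall_seconds": 7200,
    "cpu_seconds": 6900,
    "start_time": "2008-02-01T10:00:00Z",
    "end_time": "2008-02-01T12:00:00Z",
}
print(sgas_to_apel(example))
```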
ATLAS Services
- So far part of Dulcinea
- Moving to PanDA via the aCT (ARC Control Tower, aka "the fat pilot"); a simplified sketch of the control flow follows below
- PanDA improves gLite performance through better data handling (similar to ARC)
- Moving from RLS to LFC
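The "fat pilot" idea is that the pilot's work (fetching a concrete PanDA job and its input list) happens centrally in the aCT before anything is submitted, so the ARC CE can again pre-stage the data and the worker node starts computing immediately. The loop below is a heavily simplified, hypothetical sketch of that control flow; the queue, job fields, and submit call are placeholders, not the real aCT or PanDA interfaces.

```python
# Heavily simplified sketch of the aCT ("fat pilot") control flow: pull a
# concrete job from a PanDA-like task queue first, turn it into an ARC-style
# job description with declared input/output files, and only then submit.
# The queue, job fields and submit callable below are hypothetical.

def fetch_panda_job(task_queue):
    """Pop the next activated job (a dict) from a PanDA-like queue, or None."""
    return task_queue.pop(0) if task_queue else None

def to_arc_description(job):
    """Translate the pulled job into an ARC-style job description (a dict here)."""
    return {
        "executable": job["transformation"],
        "inputFiles": job["input_files"],    # staged by the CE before the job starts
        "outputFiles": job["output_files"],  # uploaded by the CE after it finishes
    }

def control_tower_loop(task_queue, submit):
    # Pull concrete jobs first ("fat pilot"), then hand them to the ARC client;
    # a real control tower would also track and report job states back to PanDA.
    while (job := fetch_panda_job(task_queue)) is not None:
        submit(to_arc_description(job))

control_tower_loop(
    [{"transformation": "reco.py",
      "input_files": {"evts.raw": "srm://dcache.example.org/evts.raw"},
      "output_files": {"evts.esd": "srm://dcache.example.org/evts.esd"}}],
    submit=print,
)
```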
ALICE Services
- Many VO boxes – one per site: Aalborg, Bergen, Copenhagen, Helsinki, Jyväskylä, Linköping, Lund, Oslo, Umeå
- Central VO box integrating the distributed dCache with xrootd
- Ongoing efforts to integrate ALICE and ARC
NDGF Facility - 2008Q1 [figure]
Operations
Operation
- 1st line support (in operation): NORDUnet NOC – 24x7
- 2nd line support (in operation): Operator on Duty – 8x365
- 3rd line support (in operation): NDGF operations staff and sys admins at the sites
- Shared tickets with NUNOC
People [figure]
Results - Accounting
According to the EGEE Accounting Portal for 2007:
- NDGF contributed 4% of all EGEE computing
- NDGF was the 5th biggest EGEE site
- NDGF was the 3rd biggest ATLAS Tier-1 worldwide
- NDGF was the biggest European ATLAS Tier-1
Results - Reliability
- NDGF has been running SAM tests since 2007Q3
- Overall 2007Q4 reliability was 96%
- Which made us the most reliable Tier-1 in the world
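For scale, a quick back-of-the-envelope conversion of that quarterly figure into downtime (a rough illustration only, assuming a 92-day quarter and the simple definition reliability = up time / total time):

```python
# Rough illustration only: what 96% reliability over one quarter allows in
# downtime, assuming a 92-day quarter and reliability = up time / total time.
reliability = 0.96
quarter_hours = 92 * 24  # ~2208 hours in 2007Q4
downtime_hours = (1 - reliability) * quarter_hours
print(f"{downtime_hours:.0f} hours (~{downtime_hours / 24:.1f} days) of downtime")
# -> roughly 88 hours, i.e. about 3.7 days across the whole quarter
```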
Results - Efficiency
- The efficiency of the NorduGrid cloud (NDGF + Tier-2/3s using ARC) was 93%
- The result was mainly due to: high middleware efficiency, high reliability
- This in turn was due to: the distributed setup, a professional operations team
Worries
- Can reconstruction run on a distributed setup? High data throughput, low CPU consumption
- NDGF, TRIUMF and BNL reprocessed M5 data in February during CCRC08-1
- Shown to work
- The bottleneck was the 3D DB (which runs on only one machine)
Looking ahead...
- The distributed Tier-1 is a success: high efficiency, high reliability, passed the CCRC08-1 tests
- Partnering with EGEE on: operation (taking part in CIC on Duty), interoperability
- Tier-2s are being set up
- CMS will use gLite interoperability to run on ARC
Thanks! Questions?