

  1. Operating the Distributed NDGF Tier-1. Michael Grønager, Technical Coordinator, NDGF. International Symposium on Grid Computing 08, Taipei, April 10th, 2008

  2. Talk Outline
  - What is NDGF?
  - Why a distributed Tier-1?
  - Services: Computing, Storage, Databases, VO Specific
  - Operation
  - Results
  ISGC08, Taipei, April 2008

  3. Nordic DataGrid Facility
  - A co-operative Nordic data and computing grid facility
  - Nordic production grid, leveraging national grid resources
  - Common policy framework for the Nordic production grid
  - Joint Nordic planning and coordination
  - Operates the Nordic storage facility for major projects
  - Co-ordinates and hosts major eScience projects (e.g., the Nordic WLCG Tier-1)
  - Develops grid middleware and services
  - NDGF 2006-2010: funded (2 M€/year) by the National Research Councils of the Nordic countries (NOS-N)

  4. Nordic DataGrid Facility
  - Nordic participation in Big Science:
    - WLCG, the Worldwide LHC Computing Grid
    - Gene databases for the bio-informatics sciences
    - Screening of reservoirs suitable for CO2 sequestration
    - ESS, the European Spallation Source
    - Astronomy projects
    - Other...

  5.-11. Why a Distributed Tier-1?
  - Computer centers are small and distributed
  - Even the biggest adds up to 7
  - Strong Nordic HEP community
  - Technical reasons:
    - Added redundancy
    - Only one 24x7 center needed
    - Fast inter-Nordic network

  12. Organization – Tier-1 related

  13. Tier-1 Services
  - Storage: tape and disk
  - Computing: well connected to storage
  - Network: part of the LHC OPN
  - Databases:
    - 3D for e.g. ATLAS
    - LFC for indexing files
  - File Transfer Service
  - Information systems
  - Monitoring
  - Accounting
  - VO services: ATLAS specific, ALICE specific

  14. Resources at Sites
  - Storage is distributed
  - Computing is distributed
  - Many services are distributed
  - But the sites are heterogeneous...

  15. Resources at Sites (overview)

  16.-19. Computing
  - A distributed compute center uses a grid for the LRMS...
    - Needs to run on all kinds of Linux distributions
    - Must use resources optimally
    - Must be easy to deploy
  - NorduGrid/ARC!
    - Already deployed
    - Runs on all Linux flavors
    - Uses resources optimally
    - Uses the CE for data handling, whereas gLite keeps worker nodes idle during up/download
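The operational point about CE-side data handling can be sketched with a toy model. This is purely illustrative (the numbers and function names are hypothetical, not ARC or gLite code): it contrasts staging data on the compute element before a worker node is allocated with downloading and uploading on the worker node itself, which leaves the node idle during transfers.

```python
# Toy model (illustrative only): how long a worker node stays allocated under
# two data-staging strategies. All durations are hypothetical.

STAGE_IN = 30   # minutes to download input data
COMPUTE = 120   # minutes of actual CPU work
STAGE_OUT = 15  # minutes to upload results

def worker_busy_minutes_ce_staging():
    """ARC-style: the CE stages data in and out, so the worker node is
    only allocated for the compute phase."""
    return COMPUTE

def worker_busy_minutes_node_staging():
    """Node-side staging (the behavior the slide attributes to gLite):
    the worker node is held for the full download-compute-upload cycle."""
    return STAGE_IN + COMPUTE + STAGE_OUT

if __name__ == "__main__":
    ce = worker_busy_minutes_ce_staging()
    wn = worker_busy_minutes_node_staging()
    print(f"CE staging: node busy {ce} min ({COMPUTE / ce:.0%} CPU efficiency)")
    print(f"Node staging: node busy {wn} min ({COMPUTE / wn:.0%} CPU efficiency)")
```

Under these made-up numbers, CE-side staging keeps the node at 100% CPU efficiency while node-side staging drops it to roughly 73%, which is the kind of difference the slide alludes to.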

  20.-23. Storage (architecture diagrams)

  24. Storage
  - dCache
    - Java based, so it runs even on Windows!
    - Separation between resources and services
    - Open source
  - Pools at the sites; doors and admin nodes run centrally
  - NDGF is part of the development:
    - Added GridFTP2 to bypass door nodes in transfers
    - Various improvements and tweaks for distributed use
  - Central services at the GEANT endpoint
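Why the GridFTP2 door bypass matters in a distributed setup can be sketched as follows. This is a hedged illustration, not the dCache API: the file-to-pool mapping and hostnames are invented. The idea is that with plain GridFTP all data flows through the central door node, while with GridFTP2 the door handles only the control channel and redirects the client to the remote pool holding the data.

```python
# Sketch (hypothetical names, not dCache code) of the two transfer patterns.

# Invented catalogue: which pool node holds which file.
POOLS = {"atlas-data.file": "pool7.hpc2n.umu.se"}

def plain_gridftp(path):
    """Plain GridFTP: every byte passes through the central door node,
    so the door becomes a bandwidth bottleneck for distributed pools."""
    pool = POOLS[path]
    return ["client->door", f"door->{pool}", f"{pool}->door", "door->client"]

def gridftp2(path):
    """GridFTP2: the door answers the control channel only and the client
    moves data directly to/from the pool at the remote site."""
    pool = POOLS[path]
    return ["client->door (control)", f"client<->{pool} (data)"]

if __name__ == "__main__":
    print("plain:", plain_gridftp("atlas-data.file"))
    print("gridftp2:", gridftp2("atlas-data.file"))
```

The design consequence matches the slide: pools can sit at the sites while only the lightweight door and admin services need to run centrally at the GEANT endpoint.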

  25. Storage (diagram)

  26. Network
  - Dedicated 10GE to CERN via GEANT (LHC OPN)
  - Dedicated 10GE between participating Tier-1 sites
  (Diagram: the NDGF AS39590 network at Örestaden connects NORDUnet, the national NREN networks and the national sites (HPC2N, PDC, NSC, ...) in SE, FI, DK and NO to the central hosts and to CERN over the LHC OPN)

  27. Other Tier-1 Services
  - Catalogues: RLS & LFC
  - FTS – File Transfer Service
  - 3D – Distributed Database Deployment
  - Accounting: SGAS -> APEL
  - Service Availability Monitoring, via ARC-CE SAM sensors

  28. ATLAS Services
  - So far part of Dulcinea
  - Moving to PanDA
    - The aCT (ARC Control Tower, aka "the fat pilot")
    - PanDA improves gLite performance through better data handling (similar to ARC)
  - Moving from RLS to LFC

  29. ALICE Services
  - Many VO boxes, one per site:
    - Aalborg, Bergen, Copenhagen, Helsinki, Jyväskylä, Linköping, Lund, Oslo, Umeå
  - Central VO box integrating the distributed dCache with xrootd
  - Ongoing efforts to integrate ALICE and ARC

  30. NDGF Facility - 2008Q1

  31. Operations

  32. Operation (diagram)

  33. Operation
  - 1st line support (in operation): NORDUnet NOC, 24x7
  - 2nd line support (in operation): Operator on Duty, 8x365
  - 3rd line support (in operation): NDGF operation staff and sysadmins at the sites
  - Shared tickets with NUNOC

  34. People

  35. Results - Accounting
  - According to the EGEE Accounting Portal for 2007:
    - NDGF contributed 4% of all EGEE computing
    - NDGF was the 5th biggest EGEE site
    - NDGF was the 3rd biggest ATLAS Tier-1 worldwide
    - NDGF was the biggest European ATLAS Tier-1

  36. Results - Reliability
  - NDGF has been running SAM tests since 2007Q3
  - Overall 2007Q4 reliability was 96%
  - This made NDGF the most reliable Tier-1 in the world
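As a minimal sketch of where a figure like the 96% comes from: WLCG-style reliability is commonly computed as uptime divided by uptime plus unscheduled downtime (scheduled downtime is excluded). The hours below are invented for illustration; only the formula and the 96% result track the slide.

```python
# Hedged sketch: the common WLCG reliability definition,
#   reliability = uptime / (uptime + unscheduled downtime).
# The hour figures are made up to illustrate the arithmetic.

def reliability(up_hours, unscheduled_down_hours):
    """Fraction of (non-scheduled-downtime) time the site passed SAM tests."""
    return up_hours / (up_hours + unscheduled_down_hours)

if __name__ == "__main__":
    quarter_hours = 91 * 24          # ~2184 hours in a quarter
    unscheduled = 91                 # hypothetical unscheduled downtime
    r = reliability(quarter_hours - unscheduled, unscheduled)
    print(f"Quarterly reliability: {r:.0%}")
```

With these assumed numbers, roughly 91 hours of unscheduled downtime over a quarter yields the 96% reported for 2007Q4.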

  37. Results - Efficiency
  - The efficiency of the NorduGrid cloud (NDGF plus Tier-2/3s using ARC) was 93%
  - The result was mainly due to high middleware efficiency and high reliability
  - These in turn were due to the distributed setup and a professional operations team

  38. Worries
  - Can reconstruction run on a distributed setup? High data throughput, low CPU consumption
  - NDGF, TRIUMF and BNL reprocessed M5 data in February in CCRC08-1
    - Shown to work
    - The bottleneck was the 3D DB (which runs on only one machine)

  39. Looking ahead...
  - The distributed Tier-1 is a success:
    - High efficiency, high reliability, passed the CCRC08-1 tests
  - Partnering with EGEE on operation (taking part in CIC-on-Duty) and interoperability
  - Tier-2s are being set up
  - CMS will use gLite interoperability to run on ARC

  40. Thanks! Questions?
