Grid-3 and the Open Science Grid in the U.S.
Lothar A. T. Bauerdick, Fermilab
International Symposium on Grid Computing ISGC 2004
中央研究院 Academia Sinica, Taipei, Taiwan, July 27, 2004
U.S. Grids: Science Drivers
Science drivers for the U.S. physics Grid projects iVDGL, GriPhyN, and PPDG ("Trillium"):
- ATLAS & CMS experiments @ CERN LHC: 100s of Petabytes, 2007 - ?
- High Energy & Nuclear Physics experiments: ~1 Petabyte (1000 TB), 1997 - present
- LIGO (gravity wave search): 100s of Terabytes, 2002 - present
- Sloan Digital Sky Survey: 10s of Terabytes, 2001 - present
[Chart: projected community growth and data growth, 2001-2009]
Future Grid resources:
- Massive CPU (PetaOps)
- Large distributed datasets (>100 PB)
- Global communities (1000s)
Globally Distributed Science Teams
Sharing and federating vast Grid resources
Gravitational Wave Observatory
Grid-enabled GW pulsar search using the Pegasus system
Goal: implement a production-level blind galactic-plane search for gravitational-wave pulsar signals
- Run 30 days on ~5-10x more resources than LIGO has, using the grid (e.g., 10,000 CPUs for 1 month)
- Millions of individual jobs (see the sketch below)
- Planning by GriPhyN Chimera/Pegasus; execution by Condor DAGMan
- File cataloging by Globus RLS; metadata by Globus MCS
Achieved: access to ~6000 CPUs for 1 week, with ~5% utilization due to bottlenecks
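To make the planning/execution split concrete, here is a minimal sketch of how such a run could be expressed for Condor DAGMan. The JOB/VARS/PARENT-CHILD directives and condor_submit_dag are standard DAGMan; the job names, submit-file names, and frequency-band partitioning are hypothetical, and in the actual search the workflow was generated by Chimera/Pegasus rather than written by hand like this.

```python
# Sketch only: express a large pulsar-search run as a Condor DAGMan workflow.
# Job names, submit files, and the band partitioning are illustrative.
import subprocess

N_BANDS = 1000  # one extract+search pair per frequency band (illustrative)

with open("pulsar_search.dag", "w") as dag:
    for band in range(N_BANDS):
        # For each band: extract the data segment, then run the search on it.
        dag.write(f"JOB extract_{band} extract.sub\n")
        dag.write(f'VARS extract_{band} band="{band}"\n')
        dag.write(f"JOB search_{band} search.sub\n")
        dag.write(f'VARS search_{band} band="{band}"\n')
        dag.write(f"PARENT extract_{band} CHILD search_{band}\n")

# DAGMan then submits and throttles the millions of individual jobs.
subprocess.run(["condor_submit_dag", "pulsar_search.dag"], check=True)
```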
Sloan Digital Sky Survey
Galaxy cluster finding: redshift analysis, weak-lensing effects
- Using the GriPhyN Chimera and Pegasus tools
- Coarse-grained DAGs work fine (batch system)
- Fine-grained DAGs have scaling issues (virtual data system)
Large Hadron Collider
Energy-frontier, high-luminosity p-p collider at CERN: an order-of-magnitude step in energy and luminosity for particle physics
[Chart: constituent center-of-mass energy (GeV) vs. year of first physics for e+e-, e-p, and hadron colliders, from SPEAR at Stanford (1974: J/psi, 1975: tau) and PETRA at DESY (1979: gluon) through SppS (1983: W, Z), LEP 1 (1989: 3 families), the Tevatron at Fermilab (1995: top), HERA, and LEP 2, up to the LHC at CERN]
Emerging LHC Production Grids
- The LHC is the first to put "real", multi-organizational, global Grids to work
- Large resources become available to the experiments "opportunistically"
Grid2003 Project in 2003
U.S. science projects and Grid projects coming together to build a multi-organizational infrastructure: Grid3
[Diagram: contributing projects and applications, including the US LHC projects (testbeds, data challenges), end-to-end HENP applications, virtual data research and the VDT, the virtual data grid laboratory, and experiments and sites such as Tevatron, RHIC, BaBar, BTeV, Korea CMS, and U. Buffalo]
Grid3: Initial Multi-Organizational Grid Infrastructure
- A common Grid operating as a coherent, loosely coupled infrastructure
- Applications running on Grid3 (Trillium, U.S. LHC), benefiting LHC (3), SDSS (2), LIGO (1), Biology (2), Computer Science (3)
- 25 universities, 4 national labs, 2800 CPUs (as of July 26, 2004, 11:35 pm CDT)
Resource Sharing Works
Example: the U.S. CMS Data Challenge
- Simulation production has been running on Grid3 since Nov 2003
- It benefited from non-CMS resources for at least 40% of its computing in the first quarter of 2004
Important Role of Tier2 Centers
Tier2 facilities are logically grouped around their Tier1 regional center
- ~20-40% of a Tier1; "1-2 FTE support": commodity CPU & disk, no hierarchical storage
- Essential university role in the extended computing infrastructure
- Validated by 3 years of experience with proto-Tier2 sites
Specific functions for the science collaborations:
- Physics analysis, simulation, experiment software, support for smaller institutions
Official role in the Grid hierarchy (U.S.):
- Sanctioned by MOU (ATLAS, CMS, LIGO)
- Local P.I. with reporting responsibilities
- Selection by the collaboration via a careful process
Grid3 Infrastructure Built upon the Virtual Data Toolkit
- Grid environment built from core Globus and Condor middleware, as delivered through the Virtual Data Toolkit (VDT): GRAM, GridFTP, MDS, RLS, VDS, VOMS, ...
- VDT is sponsored through GriPhyN and iVDGL, with contributions from LCG
- ...equipped with VO and multi-VO security, monitoring, and operations services
- ...allowing federation with other Grids where possible, e.g. the CERN LHC Computing Grid (LCG):
- U.S. ATLAS: GriPhyN Virtual Data System execution on LCG sites
- U.S. CMS: storage element interoperability (SRM/dCache)
Grid3 Principles
Simple approach: sites consist of
- Computing element (CE)
- Storage element (SE)
- Information and monitoring services
VO-level and multi-VO services:
- VO information services
- Operations (iGOC)
Minimal use of grid-wide systems:
- No centralized resource broker, replica/metadata catalogs, or command-line interface; these are to be provided by the individual VOs
Application driven:
- Adapt the application to work with Grid3 services
- Prove the application on VO testbeds
A "Loosely Coupled" Set of Services
The Grid3 environment consists of a "loosely coupled" set of services:
- Processing service: Globus GRAM, bridged from Condor-G for central submission (see the sketch below); four separate queueing systems are supported
- Data transfer services: GridFTP interfaces on all sites through gateway systems; files are transferred into the processing sites, and results are transferred directly into the MSS GridFTP door; CMS has moved to SRM-based storage element functionality
- VO management services: a central service is needed for authentication (VOMS)
- Monitoring services: system- and application-level monitoring allows status verification and diagnosis
- Software distribution services: lightweight, based on Pacman
- Information services: to help applications and monitoring, based on MDS
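A minimal sketch of the central-submission pattern, assuming a valid grid proxy already exists: a Condor-G job description targets a site's Globus GRAM gatekeeper, and a result file is then pushed to the mass-storage GridFTP door. The host names, jobmanager name, paths, and job parameters are hypothetical; condor_submit, the Condor-G "globus" universe keywords, and globus-url-copy are the standard client tools of that era.

```python
# Sketch only: central Condor-G submission to a Grid3 site plus GridFTP stage-out.
import subprocess
import textwrap

gatekeeper = "cmsgrid.example.edu/jobmanager-condor"   # hypothetical Grid3 site

# Condor-G submit description: route jobs through the site's GRAM gatekeeper.
submit = textwrap.dedent(f"""\
    universe        = globus
    globusscheduler = {gatekeeper}
    executable      = simulate.sh
    arguments       = --events 500
    output          = sim_$(Cluster).$(Process).out
    error           = sim_$(Cluster).$(Process).err
    log             = sim.log
    notification    = never
    queue 10
    """)
with open("sim.sub", "w") as f:
    f.write(submit)

subprocess.run(["condor_submit", "sim.sub"], check=True)

# After the jobs finish, transfer a result directly to the MSS GridFTP door
# (hypothetical endpoint and path).
subprocess.run([
    "globus-url-copy",
    "file:///data/results/sim_001.root",
    "gsiftp://mss-door.example.edu/pnfs/store/sim_001.root",
], check=True)
```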
Site Services and Installation
Goal: install and configure a site with minimal human intervention
- Use the Pacman tool and distributed software "caches": % pacman -get iVDGL:Grid3
- Registers the site with VO and Grid3-level services
- Sets up accounts, application install areas, and working directories
[Diagram: Grid3 site layout built on the VDT: Compute Element, VO service, $app and $tmp areas, GIIS registration, information providers, Grid3 schema, storage, log management]
- About 4 hours to install and validate
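A sketch of the install-and-validate flow driven from a script. The pacman command is the one shown on the slide; the install directory, the setup-script name, and the gatekeeper host are assumptions, and globus-job-run is used here only as a simple end-to-end probe that GRAM authentication and job execution work.

```python
# Sketch only: install a Grid3 site with Pacman, then run a trivial validation job.
# Assumes the installing account already holds a valid grid proxy.
import subprocess

INSTALL_DIR = "/opt/grid3"        # hypothetical install location
GATEKEEPER  = "gate.example.edu"  # hypothetical local gatekeeper

# 1. Pull the Grid3 package set from the distributed Pacman caches.
subprocess.run(["pacman", "-get", "iVDGL:Grid3"], cwd=INSTALL_DIR, check=True)

# 2. Source the generated environment (script name assumed) and run a fork job
#    through the local gatekeeper to confirm the site answers correctly.
validate = (
    f"source {INSTALL_DIR}/setup.sh && "
    f"globus-job-run {GATEKEEPER}/jobmanager-fork /bin/hostname"
)
subprocess.run(["bash", "-c", validate], check=True)
```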
VO-Centric Model
- "What are the services needed to enable application VOs?"
- "What do providers need in order to provide resources to their VOs?"
- Lightweight, at the cost of centrally provided functionality
Examples of this approach:
- Flexible VO security infrastructure: DOEGrids Certificate Authority; PPDG and iVDGL Registration Authorities, with VO or site sponsorship
- Automated multi-VO authorization using the EDG-developed VOMS: each VO (US CMS, US ATLAS, SDSS, LSC, BTeV, iVDGL, ...) manages a VOMS server and its members; each Grid3 site is able to generate and locally adjust its gridmap file with an authenticated query to each VO service (see the sketch below)
- VOs negotiate policies & priorities with the providers directly
- VOs can run their own storage services: U.S. CMS sites run SRM/dCache storage services at the Tier-1 and Tier-2s
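A minimal sketch of the per-site gridmap generation step. In Grid3 this was done by VOMS-aware tooling querying each VO server over an authenticated channel; that protocol is not reproduced here. The sketch assumes the per-VO member lists (one certificate DN per line) have already been fetched, and only shows how a site could merge them, with site-local overrides, into a grid-mapfile using locally chosen account mappings. All paths, VO names, and accounts are hypothetical.

```python
# Sketch only: merge already-fetched VO member lists into a grid-mapfile.
VO_MEMBER_LISTS = {
    # VO name -> (path to fetched DN list, local account the site maps it to)
    "uscms":   ("/etc/grid3/vo-lists/uscms.dns",   "uscms01"),
    "usatlas": ("/etc/grid3/vo-lists/usatlas.dns", "usatlas1"),
    "sdss":    ("/etc/grid3/vo-lists/sdss.dns",    "sdss"),
}

# Site-local overrides take precedence (banned DNs, special accounts);
# each line uses the grid-mapfile format: "<DN>" <account>
LOCAL_OVERRIDES = "/etc/grid3/grid-mapfile.local"

entries = {}
for vo, (dn_file, account) in VO_MEMBER_LISTS.items():
    with open(dn_file) as f:
        for dn in (line.strip() for line in f):
            if dn and not dn.startswith("#"):
                entries.setdefault(dn, account)  # first match wins; policy is site-local

try:
    with open(LOCAL_OVERRIDES) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#"):
                dn, account = line.rsplit(None, 1)
                entries[dn.strip('"')] = account
except FileNotFoundError:
    pass  # no local overrides configured

with open("/etc/grid-security/grid-mapfile", "w") as out:
    for dn, account in sorted(entries.items()):
        out.write(f'"{dn}" {account}\n')
```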