CMS Computing Using the Worldwide LHC Computing Grids




  1. CMS Computing Using the Worldwide LHC Computing Grids
     Lothar A. T. Bauerdick / Fermilab
     Talk at the International Symposium for Grid Computing 2006, Academia Sinica, Taipei, May 2006

  2. Worldwide LHC Computing Grid (WLCG)
     ✦ Oct 2005: founded collaboration of regional centers, the WLCG
     ★ to provide the computing infrastructure and services for LHC computing
     ✦ WLCG MoU
     ★ resource pledges
     ✦ WLCG Service
     ★ storage services
     ★ "batch slots"
     ★ data transfer infrastructure
     ★ middleware
     ★ Grid operations, support
     ★ Service Challenges
     ★ ...
     ✦ 7 Tier-1s, ~14 Tier-2s
     ★ ready to accept CMS datasets and jobs
     [Diagram: tiered grid topology, from the CERN Tier-0 and physics departments, through national Tier-1 centers (e.g. Fermilab, Brookhaven, UK, France, Italy, Germany, NL) and university/lab Tier-2 sites including Open Science Grid resources, down to desktops]

  3. U.S. Open Science Grid: U.S. CMS participating in WLCG
     ✦ Open Science Grid is the common U.S. CMS and U.S. ATLAS strategy
     ★ opened in July '05 with a diverse community using it and providing resources to it, including the U.S. LHC resources at Tier-1, Tier-2, Tier-3
     ★ LHC driving the schedule; close collaboration with EGEE through WLCG
     ✦ U.S. LHC has worked with many other communities
     ★ to seek a solution to the end of funding of Grid middleware projects currently supported by external U.S. Grid programs
     ★ to generate compelling proposals for common projects in grid operations
     ★ to develop a strategy with OSG toward getting these proposals funded
     ✦ OSG governance is in place and very active
     ★ executive director elected, with a strong team and strong LHC participation
     ➡ Ruth Pordes, Miron Livny, Frank Würthwein, Torre Wenaus, Rob Gardner, and others
     ✦ applying for funding to build an infrastructure that can sustain LHC computing support: a "sustained robust distributed facility"
     ★ proposal submitted, expecting to hear soon: OSG Facility and Extension

  4. U.S. CMS Computing Infrastructure Is Part of OSG
     ✦ the Tier-1 and 7 U.S. CMS Tier-2s are online and active
     ✦ plan to support ~500 physics users
     [Map: the U.S. CMS Tier-1 at Fermilab and the Tier-2 sites, including the DISUN Tier-2s, MIT, and U. Nebraska]

  5. Computing Resources on the Grid
     ✦ All of Tier-0, Tier-1, Tier-2 and the CAFs are essential for success
     ★ Tier-1s for organized mass processing and custodial storage
     ★ Tier-2s and CMS analysis facilities for analysis computing
     ★ computing is driven by placing and managing distributed CMS data
     ✦ Large aggregate computing resources required:
     ★ in 2008 CMS requests a total of 45 MSI2k CPU, 14 PB disk, 24 PB tape

  6. LHC Computing Resource Requests
     ✦ Computing models + a set of input parameters —> estimated resource needs; examples for CMS and ATLAS:

     Parameter                           ATLAS   CMS     Unit
     A  Raw event size                   1.6     1.5     MB
     B  Trigger rate                     200     150     Hz
     C  ESD/RECO size                    0.5     0.25    MB
     D  ESD/RECO copies                  2       1
     E  Reconstruction time              15      25      kSI2k-sec
     F  Analysis time                    0.5     0.25    kSI2k-sec
     G  Simulation time                  100     45      kSI2k-sec
     H  Fraction of simulated events     0.2     1.0

     The following table gives a few simple products illustrating their effect on the overall computing requirements. The units in some of these should not be taken literally; the products are intended as "indicators" of the combined effect of some of the input variables above, as sketched in the code below.

     Indicator                                         Product     ATLAS   CMS     Unit
     Raw reconstruction CPU                            B x E       3000    3750    kSI2k
     Raw volume per sec (of data-taking)               A x B       320     225     MB/s
     ESD volume on storage per sec (of data-taking)    B x C x D   200     37.5    MB/s
     CPU for one analysis pass                         B x F       100     37.5    kSI2k
     Simulation CPU                                    B x G x H   4000    6750    kSI2k
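The derived indicators in the second table are just products of the parameters in the first; a minimal sketch of that arithmetic (the variable names and dictionary layout are illustrative, not from the talk):

```python
# Computing-model input parameters A-H from the first table (per experiment)
params = {
    "ATLAS": dict(A=1.6, B=200, C=0.50, D=2, E=15, F=0.50, G=100, H=0.2),
    "CMS":   dict(A=1.5, B=150, C=0.25, D=1, E=25, F=0.25, G=45,  H=1.0),
}

for expt, p in params.items():
    print(expt)
    print(f"  raw reco CPU       B*E   = {p['B'] * p['E']:g} kSI2k")
    print(f"  raw volume         A*B   = {p['A'] * p['B']:g} MB/s")
    print(f"  ESD volume         B*C*D = {p['B'] * p['C'] * p['D']:g} MB/s")
    print(f"  analysis-pass CPU  B*F   = {p['B'] * p['F']:g} kSI2k")
    print(f"  simulation CPU     B*G*H = {p['B'] * p['G'] * p['H']:g} kSI2k")
```

Running it reproduces the indicator rows above, e.g. 3750 kSI2k of raw-reconstruction CPU and 6750 kSI2k of simulation CPU for CMS.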

  7. Large Computing Resource Pledges
     ✦ Seven Tier-1 centers catering to CMS
     ✦ Also 28 Tier-2 sites have come to the table
     ★ most of them listed in the WLCG MoU
     ★ some eleven Tier-2s already working actively
     ✦ There are enormous resources planned for in the system!
     ✦ Still, CMS could not get enough pledges for data storage at Tier-1
     ★ CMS needs to store about 17 PB of data samples at the Tier-1s
     ★ there is not enough tape library space to host the CMS data in 2008!

     Resources (2008)                              CPU [MSI2k]   Disk [PB]   Tape [PB]
     available to CMS and ATLAS in all countries   34.2          18.2        17.4
     pledged to CMS                                11.6          5.5         8.1
     required by CMS                               15.2          7           16.7
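Read against each other, the last two rows of the table quantify the shortfall; a small sketch of that comparison (values copied from the table, structure illustrative):

```python
# CMS 2008 pledges vs. requirements, taken from the table above
pledged  = {"CPU [MSI2k]": 11.6, "Disk [PB]": 5.5, "Tape [PB]": 8.1}
required = {"CPU [MSI2k]": 15.2, "Disk [PB]": 7.0, "Tape [PB]": 16.7}

for res in required:
    gap = required[res] - pledged[res]
    covered = pledged[res] / required[res]
    print(f"{res}: pledged {pledged[res]}, required {required[res]}, "
          f"gap {gap:.1f} ({covered:.0%} covered)")
```

The tape line stands out: only about half of the required 16.7 PB is pledged, which is the storage problem discussed on the next slide.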

  8. Disk vs Tape
     ✦ the resource pledges from Tier-1 centers are very light on tape
     ★ while very strong on disk: a total of 18 PB planned for CMS+ATLAS!
     ✦ with the current planning, many of the CMS Tier-1 centers don't have the requested storage to host a reasonable share of CMS data
     ★ most centers are below the minimal size in terms of tape offered to CMS
     ✦ this is a serious problem that would impact the CMS physics potential

  9. Tapes Are Really A Good Thing!
     [Plots: tape reading and tape writing at FNAL (Jon Bakken, DOE/NSF review, February 7, 2006). Fermilab is already writing 9 TB of data to tape each day; 3.6 PB/year is an average of 10 TB/day, and the experience to do this exists now. Delivering more than 15 TB/day of data from tape to users is common at Fermilab.]
     ✦ CMS requires "library style" data storage at the Tier-1 centers!
     ★ a lot of sequential access to data, in particular to MC data!
     ✦ Tape libraries = cheap storage with very performant access
     ★ Fermilab Tier-1 estimated costs to deploy the 2008 resources:
     ★ disk: $1400/TB (2.0 PB); tape: $400/TB (4.7 PB)
     ★ initial tape library costs ~$700k, incremental media costs $200/TB
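A back-of-the-envelope comparison of the two per-TB figures quoted above (a sketch only; it ignores servers, power, and operations):

```python
# Fermilab Tier-1 2008 deployment, per-TB costs as quoted on this slide
disk_pb, disk_per_tb = 2.0, 1400    # $1400/TB of disk
tape_pb, tape_per_tb = 4.7, 400     # $400/TB of tape, roughly consistent with
                                    # a ~$700k library plus $200/TB media

disk_cost = disk_pb * 1000 * disk_per_tb
tape_cost = tape_pb * 1000 * tape_per_tb
print(f"disk: {disk_pb} PB for ~${disk_cost/1e6:.1f}M")   # ~ $2.8M
print(f"tape: {tape_pb} PB for ~${tape_cost/1e6:.1f}M")   # ~ $1.9M
```

So tape provides more than twice the capacity for roughly two thirds of the disk cost, which is the point of the "library style" storage argument.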

  10. High Throughputs With Tapes + Disks
     ✦ Combination of high-throughput tape libraries and disk caches
     ★ amazing performance is feasible; see Suen Hou's talk with CDF numbers!
     ★ achieving this does require gaining quite a bit of experience
     [Plot: dCache data disk reads, i.e. data delivered to users (Jon Bakken, DOE/NSF review, February 7, 2006). The maximum in the plot is about 85 TB/day; the record maximum day was ~200 TB/day, and likely more, since the monitoring cell ran on too weak a node and could not keep track of all the transfers. The small red histogram below y=0 represents the data read from tape.]
     ➡ but PBs of disk without local tape are a potential commissioning and maintenance nightmare!
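For comparison with the per-second transfer rates used elsewhere in this talk, a quick conversion of the TB/day figures from the dCache plot (illustrative only):

```python
# Convert the TB/day throughput figures quoted above into sustained MB/s
def tb_per_day_to_mb_per_s(tb_per_day: float) -> float:
    return tb_per_day * 1e6 / 86400   # 1 TB = 1e6 MB, 86400 s per day

for label, tb in [("typical maximum", 85), ("record day", 200)]:
    print(f"{label}: {tb} TB/day ~ {tb_per_day_to_mb_per_s(tb):.0f} MB/s")
```

The ~85 TB/day maximum corresponds to roughly 1 GB/s delivered to users, well above the aggregate wide-area rates discussed for the tier-to-tier transfers.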

  11. CMS Using the WLCG Service
     ✦ SC3: integration into a functional computing service
     ★ end-to-end test of the WLCG computing system for the "CMS use case"
     ★ impressive progress, in particular at the Tier-2s
     ➡ various CMS computing components have done rather well
     ➡ well-performing large and complex storage systems at many sites
     ★ reasonable throughputs achieved: in total 0.3 PB, 32,000 jobs, 38M events
     ➡ daily aggregate rate T0 —> T1 peaked at 120 MB/s, T1 —> T2 at ~30 MB/s
     ✦ The WLCG distributed computing system for CMS is a reality!
     ★ it works, has lots of resources, and many good people working hard
     ★ there is a lot of momentum and some convergence
     ★ at this point the WLCG system is scarily fragile
     ➡ still many severe problems; users are the "first to know" :-(
     ✦ Next are Service Challenge 4 and the CMS data challenge CSA06
     ★ SC4 goal for CMS: establish the main data flows and the "analysis" workflow
     ★ CSA2006 (September/October): end-to-end system test of CMS computing, software, and analysis

  12. SC4: Goal Metrics for CMS Computing
     ✦ CMS needs to be at production-scale services in 2008
     ➡ from experience, we cannot more than double the scale each year
     ★ so we should be able to demonstrate 25% of the 2008 scale now, in 2006!
     ✦ The C-TDR computing model defines the scales (see the sketch after this list):
     ★ network transfers between T0 and T1 centers
     ➡ 2008 scale is roughly 600 MB/s
     ★ network transfers between T1 and T2 centers
     ➡ 2008 peak rates from Tier-1 to Tier-2 of 50-500 MB/s
     ★ selection submissions and transfers to Tier-1 centers
     ➡ 2008 submission rate of 50k jobs per day to the integrated Tier-1 centers
     ★ analysis submissions to Tier-2 centers
     ➡ 2008 submission rate of 150k jobs to the integrated Tier-2 centers
     ★ MC production jobs at Tier-2 centers
     ➡ 2008 rate is 1.3 x 10^9 events per year
     ✦ Define the goals for the SC4 performance metrics accordingly
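Since the stated planning rule is to demonstrate 25% of the 2008 scale in 2006, rough SC4 targets can be derived by scaling the numbers above; a sketch (the 2006 values are simply the 2008 figures times 0.25 and are not separately quoted in the talk):

```python
# Derive rough SC4 (2006) targets as 25% of the 2008-scale numbers above
scale_2008 = {
    "T0 -> T1 transfers [MB/s]":         600,
    "T1 -> T2 peak transfers [MB/s]":    500,       # top of the 50-500 MB/s range
    "Tier-1 job submissions [jobs/day]": 50_000,
    "Tier-2 job submissions [jobs/day]": 150_000,   # per day assumed
    "MC production [events/year]":       1.3e9,
}

SC4_FRACTION = 0.25   # "demonstrate 25% of the 2008 scale now in 2006"
for metric, value in scale_2008.items():
    print(f"{metric}: 2008 = {value:g}, SC4 goal ~ {SC4_FRACTION * value:g}")
```

For example, this puts the SC4 goal for aggregate T0 -> T1 transfers at roughly 150 MB/s.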

  13. CMS Data Flows, Workflows, Performance Goals for 2008
     [Diagram: per-tier 2008 data flows and resources, roughly as follows]
     ✦ Tier-0 (AOD/FEVT skimming, data processing etc.): 4.6 MSI2k, 0.4 PB disk, 4.9 PB tape, 5 Gbps WAN
     ★ 225 MB/s RAW in from CMS; 225 MB/s RAW to the worker nodes; 280 MB/s (FEVT, AOD) out to the Tier-1s
     ✦ each Tier-1: 2.5 MSI2k, 1 PB disk, 2.2 PB tape, 10 Gbps WAN
     ★ 280 MB/s (RAW, RECO, AOD) in from Tier-0; 40 MB/s re-processed AOD/FEVT exchanged with Tier-0 and the other Tier-1s; 48 MB/s MC datasets in from the Tier-2s; 240 MB/s analysis datasets (skimmed AOD, FEVT) out to the Tier-2s; 900 MB/s served to the worker nodes
     ✦ each Tier-2: 0.9 MSI2k, 0.2 PB disk, 1 Gbps WAN
     ★ 60 MB/s (skimmed AOD, FEVT) in from the Tier-1s; 12 MB/s MC datasets out to the Tier-1s; 200 MB/s, up to 1 GB/s (AOD analysis, calibration) to the worker nodes
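As a rough cross-check against the aggregate 2008 request on slide 5 (45 MSI2k CPU, 14 PB disk, 24 PB tape), the per-tier figures in this diagram can be scaled by plausible site counts. The 25-site Tier-2 count below is an assumption for illustration (slide 7 mentions 28 Tier-2 sites at the table), and the CERN analysis facility is ignored:

```python
# Cross-check: aggregate the per-tier 2008 figures from the diagram above
# (count, CPU [MSI2k], disk [PB], tape [PB]); the Tier-2 count is an assumption
tiers = {
    "Tier-0": (1,  4.6, 0.4, 4.9),
    "Tier-1": (7,  2.5, 1.0, 2.2),   # 7 Tier-1s, as on slide 2
    "Tier-2": (25, 0.9, 0.2, 0.0),   # assumed ~25 active Tier-2s, no tape
}

cpu  = sum(n * c for n, c, d, t in tiers.values())
disk = sum(n * d for n, c, d, t in tiers.values())
tape = sum(n * t for n, c, d, t in tiers.values())
print(f"CPU ~ {cpu:.0f} MSI2k, disk ~ {disk:.1f} PB, tape ~ {tape:.1f} PB")
# -> ~45 MSI2k, ~12 PB disk, ~20 PB tape: the same ballpark as the slide-5 totals
```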
