CMS Plans and Strategy for Physics Analysis on the Grids
Lothar A. T. Bauerdick / Fermilab
Invited Talk at the International Symposium on Grid Computing 2006
Academia Sinica, Taipei, May 2006
Contents of Talk
✦ Introduction
★ the Worldwide LHC Computing Grid should be ready for physics soon!
✦ CMS Computing and Data Model
★ computing tiers, data tiers, data structures
✦ Data Analysis Strategy
★ analysis process and model
✦ First Experience
★ CMS data analysis on the grid using the CRAB system
★ first attempts and many open questions
✦ Acknowledgment
★ many slides lifted from CMS talks at the recent CHEP06 in Mumbai!
LHC Startup Plan
✦ Stage 1: initial commissioning
★ 43×43 to 156×156 bunches, N = 3×10^10 protons per bunch
★ zero to partial squeeze, L = 3×10^28 – 2×10^31 cm^-2 s^-1
✦ Stage 2: 75 ns operation
★ 936×936 bunches, N = 3–4×10^10
★ partial squeeze, L = 10^32 – 4×10^32 cm^-2 s^-1
✦ Stage 3: 25 ns operation
★ 2808×2808 bunches, N = 3–5×10^10
★ partial to near-full squeeze, L = 7×10^32 – 2×10^33 cm^-2 s^-1
LHC First Physics Run in 2008
✦ Integrated luminosity with the current LHC plans:
★ ~1.9 fb^-1 reaching 10^33 cm^-2 s^-1, or ~1 fb^-1 (optimistic?) at 10^32 cm^-2 s^-1
★ assumes an LHC duty factor of 30% over a ~20-week run (optimistic!)
✦ Physics targets as integrated luminosity grows: top re-discovery, Z' -> muons, SUSY, Higgs (?)
[plot: luminosity (10^30 cm^-2 s^-1), integrated luminosity (pb^-1) and events/crossing vs. week of the run]
Pilot Run
✦ 30 days, maybe less (?); 43×43 bunches, then 156×156 bunches
★ assumes an LHC duty factor of 20% (optimistic!); luminosity ramping from ~10^28 to ~10^31 cm^-2 s^-1
✦ jets and IVB production — 15 pb^-1 ==> 30K W's and 4K Z's into leptons (see the yield sketch below)
✦ measure cross sections, W and Z charge asymmetry (pdfs; IVB+jet production; top!)
[plot: luminosity (10^30 cm^-2 s^-1), integrated luminosity (pb^-1) and events/crossing vs. day of the pilot run]
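As a back-of-the-envelope check of the quoted yields, here is a minimal sketch; the effective cross sections (production cross section × leptonic branching ratio × acceptance) are assumptions chosen to be consistent with the slide's numbers, not official CMS values:

```python
# Yield estimate for the pilot run: N = sigma_eff * L_int.
# sigma_eff is an assumed effective cross section (production cross
# section x branching ratio to leptons x acceptance), picked to match
# the ~30K W / ~4K Z quoted above -- illustrative only.

L_int = 15.0  # integrated luminosity in pb^-1

sigma_eff_pb = {
    "W -> l nu": 2000.0,  # ~2 nb effective (assumed)
    "Z -> l l":   270.0,  # ~0.27 nb effective (assumed)
}

for process, sigma in sigma_eff_pb.items():
    print(f"{process}: ~{sigma * L_int:,.0f} events in {L_int:.0f} pb^-1")
```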
Distributed Computing Had Better Be Delivering on Time!
✦ This is the last year of preparation to make Grid computing for the LHC work
★ computing resources are geographically distributed, interconnected via high-throughput networks and operated by means of Grid software
★ the WLCG system is still very fragile, but it is functional and being used
★ Tier-0, Tier-1, Tier-2 and the CAFs are all essential for success
✦ Large aggregate computing resources required:
★ in 2008 CMS requests a total of 45 MSI2k of CPU, 14 PB of disk and 24 PB of tape
➡ CMS Computing Model document (CERN-LHCC-2004-035)
➡ CMS Computing Technical Design Report (CERN-LHCC-2005-023)
Large Computing Resource Pledges
✦ Seven Tier-1 centers catering to CMS
★ ASGC amongst them!
★ still, CMS did not get sufficient pledges for data storage at Tier-1 centers
➡ not enough tape library space to store all 2008 event and simulated data
✦ Most of the CMS data analysis will happen at Tier-2 centers
★ the Tier-2 resource situation looks good! 50 MSI2k ~ 10,000 nodes!
★ some 30 Tier-2 sites are offering computing resources to CMS
➡ most of them listed in the WLCG MoU
★ some eleven Tier-2s are already working actively

Summary of Tier-2 pledges, 2008 split (the 'Balance' rows are derived as shown below):

                         ALICE    ATLAS      CMS     LHCb      SUM
  CPU (kSI2k)  Offered    5636    20114    18329     4436    48515
               TDR Req.  14400    19940    19300     7650    61290
               Balance    -61%      +1%      -5%     -42%     -21%
  Disk (TB)    Offered    1464     6252     4760      840    13316
               TDR Req.   3479     8748     4900       23    17150
               Balance    -58%     -29%      -3%              -22%
  Tape (TB)    Offered     345      683      500      380     1908
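For clarity, the 'Balance' row is just the fractional difference between what sites offered and what the experiments' TDRs requested; a minimal sketch using the CPU numbers from the table:

```python
# Derivation of the "Balance" row: (offered / required) - 1,
# using the 2008 Tier-2 CPU pledges (kSI2k) from the table above.

offered  = {"ALICE": 5636,  "ATLAS": 20114, "CMS": 18329, "LHCb": 4436}
required = {"ALICE": 14400, "ATLAS": 19940, "CMS": 19300, "LHCb": 7650}

for experiment in offered:
    balance = offered[experiment] / required[experiment] - 1.0
    print(f"{experiment}: {balance:+.0%}")  # e.g. CMS: -5%
```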
Principles of the CMS Computing Model
✦ Emphasize the inherent structure of CMS data and data access
★ structured data and structured Grids
✦ Data granularity and Data Tiers
★ optimize sequential data access to well-defined Data Tiers
➡ eliminates the object-database philosophy from the Event Data Model
★ data always needs to be considered in its trigger context -> trigger paths (see the sketch below)
➡ O(2 PB)/yr of raw data split into O(50) trigger-determined datasets of ~40 TB each
✦ Computing Tiers and a hierarchical Data Grid
★ map data flow and data-handling functions onto a hierarchical structure
➡ event data flows Tier-0 —> Tier-1 —> Tier-2, with data being analyzed at the Tier-2s
★ facilitates well-defined roles and visible responsibilities for centers
✦ Building the ability to prioritize is very important
★ in 2007/8 the computing system's efficiency may not be 100%...
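A minimal sketch of what trigger-determined dataset splitting means in practice; the trigger path names and the path-to-dataset mapping are hypothetical, not the real CMS trigger menu:

```python
# Trigger-determined dataset splitting (illustrative): each event is
# routed to one or more datasets according to which trigger paths
# fired, so the RAW stream ends up partitioned into O(50) datasets.

DATASET_OF_PATH = {  # hypothetical example mapping
    "HLT_Mu11":      "SingleMuon",
    "HLT_DoubleMu3": "DoubleMuon",
    "HLT_Ele15":     "SingleElectron",
    "HLT_Jet180":    "JetMET",
}

def datasets_for(fired_paths):
    """Return the set of datasets an event belongs to."""
    return {DATASET_OF_PATH[p] for p in fired_paths if p in DATASET_OF_PATH}

print(datasets_for(["HLT_Mu11", "HLT_Jet180"]))  # {'SingleMuon', 'JetMET'}
```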
Computing Tiers
✦ Tier-0:
★ accepts data from the DAQ
★ prompt reconstruction
★ data archiving and distribution to the Tier-1s
✦ Tier-1s:
★ make samples accessible for selection and distribution
★ data-intensive analysis
★ re-processing
★ calibration
★ FEVT and MC data archiving
✦ Tier-2s:
★ user data analysis
★ MC production
★ import skimmed datasets from Tier-1 and export MC data
★ calibration/alignment
Computing Tiers
✦ CMS-CAF, the Analysis Facility at CERN:
★ access to the 10% express stream and eventually the full raw dataset
★ focused on latency-critical detector, trigger, calibration and analysis activities
★ provides some CMS central services (e.g. stores conditions and calibrations)
✦ LPC-CAF and other User Analysis Facilities:
★ typically associated with Tier-1/2 centers
★ provide an interactive and batch analysis environment to users outside CERN
★ sizes range from "Tier-3" over "Tier-2" to significantly larger analysis facilities, e.g. at Fermilab and BNL
★ backbone for the analysis infrastructure
CMS Data Analysis at the CMS-CAF
✦ LHC running time is precious
★ requires short-latency feedback and fast turnaround: hours, not days
✦ Fast, efficient monitoring of data quality and trigger quality
★ with ad-hoc studies of detector data (special data streams)
★ with a few critical analyses that verify the physics (masses, cross sections)
✦ Calibration and alignment
★ require fast turnaround for the Tier-0 and (potentially) the online filter farm
✦ Physics assurance and analysis
★ are we seeing something unexpected (background or signal) that calls for a trigger adjustment now? Rapid analysis of 'express-line' physics without initially having to rely on a fully functional and perfectly operating Grid
✦ As the experiment matures, in 2010 and beyond, some CAF responsibilities can be farmed out to Tier-1s or Tier-2s, but not during the startup phase
Data Tiers and Data Volume for 2008
✦ RAW
➡ detector data plus L1 and HLT results, after online formatting
➡ includes factors for poor understanding of the detector, compression, etc.
➡ 1.5 MB/evt at ~150 Hz; ~4.5 PB/year (two copies); see the volume sketch below
✦ RECO
➡ reconstructed objects with their associated hits
➡ 250 kB/evt; ~2.1 PB/year (incl. 3 reprocessing versions)
➡ supports pattern recognition and recalibration; Root-browsable for interactive analysis
✦ FEVT = RAW + RECO
➡ ~1.75 MB/event; keeps RAW and RECO together for data handling
➡ one copy at the Tier-0 and one spread over all Tier-1s
✦ AOD
➡ the main analysis format; a fragment of RECO for analysis: objects plus minimal hit info
➡ 50 kB/evt; ~2.6 PB/year; a whole copy at each Tier-1
➡ shipped out to all Tier-1s and on demand to Tier-2s and laptops
➡ should be inclusive so that all groups can use it
➡ may allow some re-calibration and re-alignment (refit)
✦ Plus MC in a ~1:1 ratio with data
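To see how the per-event size maps onto the quoted yearly RAW volume, a minimal sketch; the ~10^7 seconds of effective running time per year is an assumption (a figure commonly used in LHC computing planning), not stated on the slide:

```python
# Back-of-the-envelope check of the RAW volume quoted above.
# Assumed: ~1e7 s of effective LHC running per year (not on the slide).

event_size_MB = 1.5    # RAW event size
rate_Hz       = 150    # HLT output rate
live_seconds  = 1.0e7  # assumed effective live time per year

events_per_year = rate_Hz * live_seconds        # ~1.5e9 events
raw_PB = event_size_MB * events_per_year / 1e9  # 1 PB = 1e9 MB

print(f"events/year:   {events_per_year:.2e}")
print(f"RAW, one copy: {raw_PB:.2f} PB/year")
print(f"RAW, two copies: {2 * raw_PB:.1f} PB/year")  # matches ~4.5 PB
```

The RECO and AOD totals on the slide additionally fold in reprocessing versions and multiple replicas (e.g. a full AOD copy at each Tier-1), so they are larger than a single-copy estimate.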
Event Data Model
✦ The "Event" holds all data taken during a triggered physics event
✦ The provenance of each Event Data Product is tracked
✦ Persistent objects are designed to provide useful analysis information without needing external information
✦ Event tiers are nested: FEVT contains RAW and RECO, and RECO contains AOD (see the sketch below)

[diagram: nested event tiers, by detector domain]
                for Tracking              for E/Gamma                   for Jets
  AOD           Tracks                    Electrons, Photons            KtJets, ConeJets, ...
  RECO (adds)   TracksExtra, TracksHits   BasicClusters, SuperClusters  Cells, CaloTowers
  RAW (adds)    TrackDigis, TrackRaw      EcalDigis, EcalRaw            HcalDigis, HcalRaw
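The containment relation can be made concrete with a small sketch; the class and member names below are hypothetical stand-ins (the real CMS Event Data Model is C++ and far richer):

```python
# Illustrative nesting of the event tiers (hypothetical names):
# AOD is contained in RECO, and FEVT = RAW + RECO keeps everything
# together for data handling.

from dataclasses import dataclass, field

@dataclass
class AOD:            # main analysis format, ~50 kB/evt
    tracks: list = field(default_factory=list)
    electrons: list = field(default_factory=list)
    jets: list = field(default_factory=list)

@dataclass
class RECO:           # adds hits and clusters, ~250 kB/evt
    aod: AOD = field(default_factory=AOD)  # RECO contains AOD
    track_hits: list = field(default_factory=list)
    calo_towers: list = field(default_factory=list)

@dataclass
class RAW:            # detector + trigger data, ~1.5 MB/evt
    digis: list = field(default_factory=list)

@dataclass
class FEVT:           # full event = RAW + RECO
    raw: RAW = field(default_factory=RAW)
    reco: RECO = field(default_factory=RECO)

event = FEVT()
event.reco.aod.tracks.append("track_0")  # AOD is reachable through RECO
```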