
CDF Data production model, S. Hou (PowerPoint presentation)



  1. CDF Data production model. S. Hou, for the CDF data production team, 02 May 2006

  2. Outline
     - Data streams: trigger, streaming, data logging
     - Computing model: architecture, CAF Linux farms, SAM data handling
     - Production tasks: submission, concatenation
     - Monitoring and bookkeeping: resources, file counting, recovery
     - Scalability: capacity, limits, scaling options

  3. CDF collaboration
     The Collider Detector at Fermilab (CDF) is an experiment at the Fermilab Tevatron collider:
     - it studies proton-antiproton collisions at a center-of-mass energy of ~2 TeV
     - it produces a large data volume and a heavy computing load

  4. Trigger and detector data flow
     The trigger has three levels, with data buffers between them, and there are 52 physics triggers.
     CDF detector data-taking capacity (2005 achieved → 2006 upgrade):
     - Tevatron luminosity: 1.8x10^32 cm^-2 s^-1 → 3x10^32 cm^-2 s^-1
     - Level-1 acceptance: 27 kHz → 40 kHz
     - Level-2 acceptance: 850 Hz → 1 kHz
     - Event Builder (EVB): 850 x 0.2 MB/s (~170 MB/s) → 500 MB/s
     - Level-3 acceptance: 110 Hz → 150 Hz
     - Rate to tape storage: 20 MB/s → 40 MB/s
     The event size is ~140 kByte, and the 2006 data-taking rate is ~5 M events/day. The upgrade aims to improve DAQ efficiency.
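
The rates in the table can be cross-checked with back-of-envelope arithmetic. This sketch makes two assumptions: decimal units (1 MB = 1000 kB), and every event having the nominal ~140 kByte size. The function names are illustrative and are not part of any CDF software.

```python
"""Back-of-envelope DAQ arithmetic for the trigger table (decimal units assumed)."""

SECONDS_PER_DAY = 86_400


def logging_rate_mb_s(level3_rate_hz: float, event_size_kb: float) -> float:
    """Bandwidth to storage if every Level-3 accept is written at the nominal event size."""
    return level3_rate_hz * event_size_kb / 1000.0


def daily_volume_gb(events_per_day: float, event_size_kb: float) -> float:
    """Raw data volume per day in GB (1 GB = 1e6 kB)."""
    return events_per_day * event_size_kb / 1e6


def average_rate_hz(events_per_day: float) -> float:
    """Average Level-3 output rate implied by a daily event count."""
    return events_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    for label, l3_hz in (("2005 achieved", 110), ("2006 upgrade", 150)):
        print(f"{label}: {logging_rate_mb_s(l3_hz, 140):.1f} MB/s at {l3_hz} Hz")
    print(f"5 M events/day -> {daily_volume_gb(5e6, 140):.0f} GB/day, "
          f"{average_rate_hz(5e6):.1f} Hz averaged over 24 h")
```

Under these assumptions, the peak Level-3 rate corresponds to ~15 MB/s in 2005 and ~21 MB/s in 2006. The ~5 M events/day amount to ~700 GB/day, or ~58 Hz averaged over a full day.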

  5. Streams and data logging
     The Consumer Server/Logger (CSL):
     - receives physics events
     - writes them to disk in 8 streams (A, B, C, D, E, G, H, J), which are fed by the 52 triggers
     - distributes events to online consumers
     An event may fire multiple triggers, so the streams overlap. The overlap is ~5% and increases with Tevatron luminosity.
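
Here is a minimal sketch of how multi-trigger events create stream overlap. The slide does not define overlap precisely, so this sketch assumes it is the fraction of extra stream writes relative to unique logged events. The trigger names and the trigger-to-stream mapping are hypothetical.

```python
"""Sketch: multi-trigger events are written once to every stream their triggers feed."""

from typing import Dict, Iterable, List, Set


def streams_for_event(fired: Iterable[str], trigger_to_stream: Dict[str, str]) -> Set[str]:
    """Distinct streams an event goes to; two triggers in the same stream count once."""
    return {trigger_to_stream[t] for t in fired}


def stream_overlap(events: List[Set[str]], trigger_to_stream: Dict[str, str]) -> float:
    """Assumed definition: (total stream writes - unique events) / unique events."""
    if not events:
        return 0.0
    writes = sum(len(streams_for_event(e, trigger_to_stream)) for e in events)
    return (writes - len(events)) / len(events)


if __name__ == "__main__":
    mapping = {"TRIG_1": "A", "TRIG_2": "B", "TRIG_3": "A"}   # hypothetical triggers
    events = [{"TRIG_1"}, {"TRIG_1", "TRIG_2"}, {"TRIG_1", "TRIG_3"}]
    print(f"overlap = {stream_overlap(events, mapping):.2f}")
```

Under this definition, a 5% overlap means the 8 streams together hold about 5% more event copies than were logged.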

  6. Data logging rate
     The data logging rate increases with Tevatron luminosity. Good-run physics data:
     - Feb 2002 - Dec 2004: 1040 M events = 210 k files = 188 TByte
     - Dec 2004 - Feb 2006: 1270 M events = 172 k files = 159 TByte
     Figure: data logging rate. The Tevatron delivered 1.6 fb^-1 up to Nov 2005, of which 1.3 fb^-1 was written to tape.
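
The good-run totals imply average file and event sizes. This sketch assumes decimal units (1 TB = 1000 GB = 10^9 kB) and simply divides the quoted figures.

```python
"""Average file and event sizes implied by the good-run totals (decimal units assumed)."""

PERIODS = {
    # period: (events, files, terabytes), as quoted on the slide
    "Feb 2002 - Dec 2004": (1040e6, 210e3, 188.0),
    "Dec 2004 - Feb 2006": (1270e6, 172e3, 159.0),
}


def mean_file_gb(files: float, terabytes: float) -> float:
    """Average file size in GB."""
    return terabytes * 1000.0 / files


def mean_event_kb(events: float, terabytes: float) -> float:
    """Average event size in kB (1 TB = 1e9 kB)."""
    return terabytes * 1e9 / events


if __name__ == "__main__":
    for period, (events, files, tb) in PERIODS.items():
        print(f"{period}: {mean_file_gb(files, tb):.2f} GB/file, "
              f"{mean_event_kb(events, tb):.0f} kB/event")
```

Both periods average ~0.9 GB per file, consistent with ~1 GByte output files. The average event size is ~181 kB in the first period and ~125 kB in the second.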

  7. CDF computing model, 2006
     Diagram components: CDF DAQ, production farm, raw datasets, production datasets, dCache, Enstore, CDF Analysis Farm (CAF), remote CAFs, user desktops.

  8. Computing network, 2006
     Diagram components: CDF Online DAQ, production farm, analysis farms, remote sites and offline users, Enstore tape library, dCache file servers, Oracle DB servers, and network links of 2 Gbit and 10 Gbit.

  9. Production data flow
     - Detector data pass the Level-1/2 triggers, the DAQ and the Level-3 farm, and are written as 8 raw-data streams.
     - Production splits the data: a run splitter, using the calibration database, turns the 8 raw datasets into 52 physics datasets.
     - Final storage is the Enstore tape library, with these characteristics:
       - STK 9940B drives
       - 200 GB/tape
       - 30 MByte/s read/write
       - steady R/W rate of ~1 TByte/drive/day
     - dCache, the CAF and the fileservers serve the data, and files are registered in a file catalog.
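
The Enstore figures can be related with simple arithmetic. This sketch assumes decimal units and treats the 30 MByte/s read/write rate as a continuous streaming limit. The helper names are hypothetical.

```python
"""Tape-throughput arithmetic for the STK 9940B figures (decimal units assumed)."""

import math

SECONDS_PER_DAY = 86_400


def hours_to_fill_tape(tape_gb: float, rate_mb_s: float) -> float:
    """Hours to write one tape end to end at the streaming rate."""
    return tape_gb * 1000.0 / rate_mb_s / 3600.0


def peak_tb_per_day(rate_mb_s: float) -> float:
    """Volume one drive could move in 24 h if it streamed continuously."""
    return rate_mb_s * SECONDS_PER_DAY / 1e6


def drives_needed(daily_tb: float, steady_tb_per_drive_day: float = 1.0) -> int:
    """Drives required at the observed steady rate (~1 TB/drive/day by default)."""
    return math.ceil(daily_tb / steady_tb_per_drive_day)


if __name__ == "__main__":
    print(f"fill one 200 GB tape: {hours_to_fill_tape(200, 30):.2f} h")
    print(f"streaming limit: {peak_tb_per_day(30):.2f} TB/drive/day "
          f"(steady ~1 TB/day is {1.0 / peak_tb_per_day(30):.0%} of it)")
    print(f"tapes per drive per day at 1 TB/day: {1000 / 200:.0f}")
```

Filling one tape at 30 MByte/s takes just under 2 hours. A drive streaming continuously could move ~2.59 TByte/day, so the observed steady ~1 TByte/drive/day is about 39% of that limit, or about 5 tapes per drive per day.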

  10. Data production, first model (in service 2000-2004)
     Diagram: a MySQL DB, dfarm, Enstore tape, a stager, a run splitter with calibration, workers and a concatenator. Input and outputs are registered at each of steps 1-6.
     Farm Processing System:
     - a custom I/O node for direct I/O to Enstore
     - the FBS batch system
     - dfarm, the collection of all worker IDE disks, used as a buffer for input and output files
     - MySQL for bookkeeping
     - concatenation in rigid event order, with output truncated to 1 GB files
     - performance: a peak rate of 1 TB input/day
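
Concatenation "in rigid event order, truncated to 1 GB files" can be sketched as a sort followed by greedy packing. This is an illustration only. It assumes events are `(run, event, size_bytes)` tuples and that a new file starts whenever the next event would exceed the limit. Only the 1 GB limit comes from the slide; the names are hypothetical.

```python
"""Sketch: concatenate events in (run, event) order, truncating output files at a size limit."""

from typing import List, Tuple

Event = Tuple[int, int, int]          # (run number, event number, size in bytes)
ONE_GB = 1_000_000_000


def concatenate_in_order(events: List[Event], limit: int = ONE_GB) -> List[List[Event]]:
    """Sort by (run, event); start a new output file when the next event would exceed `limit`."""
    files: List[List[Event]] = []
    current: List[Event] = []
    current_size = 0
    for ev in sorted(events, key=lambda e: (e[0], e[1])):
        size = ev[2]
        if current and current_size + size > limit:
            files.append(current)
            current, current_size = [], 0
        current.append(ev)
        current_size += size
    if current:
        files.append(current)
    return files


if __name__ == "__main__":
    evs = [(2, 1, 400), (1, 2, 300), (1, 1, 500), (2, 2, 200)]
    for i, f in enumerate(concatenate_in_order(evs, limit=1000)):
        print(i, [(r, e) for r, e, _ in f], sum(ev[2] for ev in f))
```

If a single event is larger than the limit, it goes into its own file rather than being split.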

  11. SAM-farm upgrade, 2005: move to CAF and SAM data handling
     The goal is a distributed computing infrastructure.
     - CAF (CDF Analysis Farm): a Condor system with a CAF interface for job submission and monitoring. Its advantages are a uniform platform shared with other CDF computing facilities, and compatibility with distributed computing development.
     - SAM data-handling system: SAM (Sequential Access via Metadata) provides file delivery and DB services, and dCache virtualizes disk usage.

  12. SAM production farm
     CAF and SAM run in parallel:
     - SAM project: activates file delivery of an assigned SAM dataset and tracks file consumption status.
     - Condor batch job: runs the run splitter with calibration, consumes the files of the SAM project (input URLs point to dCache), and declares or updates SAM metadata for bookkeeping.
     - Concatenation of output: merges output files, sorted in run sequence, on a fileserver.
     - Storage: merged files are stored to Enstore via SAM.
     - Bookkeeping: metadata is declared and file parentage updated.
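
The SAM project's role is to deliver each file of a dataset and track which files were consumed. That role can be modeled as a small state machine. This is an in-memory illustration with assumed states PENDING, DELIVERED, CONSUMED and FAILED. It is not the SAM API, and every class and method name is hypothetical.

```python
"""In-memory model of a SAM-style project (hypothetical names; not the SAM API)."""

from enum import Enum
from typing import Dict, List, Optional


class Status(Enum):
    PENDING = "pending"
    DELIVERED = "delivered"
    CONSUMED = "consumed"
    FAILED = "failed"


class ProjectSketch:
    def __init__(self, dataset_files: List[str]) -> None:
        # Dicts keep insertion order, so delivery follows the dataset definition.
        self.status: Dict[str, Status] = {f: Status.PENDING for f in dataset_files}

    def next_file(self) -> Optional[str]:
        """Hand out the next undelivered file, or None when the dataset is exhausted."""
        for name, st in self.status.items():
            if st is Status.PENDING:
                self.status[name] = Status.DELIVERED
                return name
        return None

    def release(self, name: str, ok: bool) -> None:
        """A worker reports the outcome for a file it was given."""
        if self.status.get(name) is not Status.DELIVERED:
            raise ValueError(f"{name} was not delivered to a worker")
        self.status[name] = Status.CONSUMED if ok else Status.FAILED

    def to_recover(self) -> List[str]:
        """Files delivered but not successfully consumed (still running or failed)."""
        return [f for f, st in self.status.items() if st in (Status.DELIVERED, Status.FAILED)]


if __name__ == "__main__":
    proj = ProjectSketch(["raw_001", "raw_002", "raw_003"])   # hypothetical file names
    a, b = proj.next_file(), proj.next_file()
    proj.release(a, ok=True)
    proj.release(b, ok=False)
    print(proj.next_file(), proj.to_recover())
```

Because per-file status is kept explicitly, recovery reduces to listing `to_recover()` rather than rescanning the dataset.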

  13. Production challenge
     - Timely processing of every event collected
     - Interfaces to data handling, databases and multiple CAFs
     - Precise bookkeeping over millions of files: zero tolerance for error, and every event is counted
     - Operation:
       - resource monitoring
       - automatic submission and monitoring of (1) binary jobs of SAM projects on CAF farms, (2) concatenation on fileservers, and (3) storage to SAM/Enstore
     - Service interfaces: network, Enstore tape I/O, dCache, SAM data handling, DB service, CDF online, calibration DB, software

  14. Use cases in production
     - Fast beam-line calibration, run as soon as data are available on Enstore: raw data → histograms → concatenation
     - Detector monitoring, for quick detector feedback and good-run definition, run as soon as the beam-line calibration is available: raw data → production/Enstore → histograms
     - Physics calibration, which needs statistics on chosen events: raw data → histograms
     - Physics production:
       - raw data → multiple outputs → concatenation → Enstore
       - production files → single output → concatenation → Enstore, or → multiple outputs → concatenation → Enstore

  15. SAM projects in production
     Cron jobs access the SAM DB:
     1. Check the online DB and make the SAM input datasets
     2. Submit SAM projects to the Condor CAF
     3. Merge output files and samStore them from the fileserver
     Diagram labels: Online DB, good_runs query, SAM input datasets (gphysr_runXX), control metadata, raw and merged physics datasets (gXcrs0, gXjs00, reco.gphysr_.., reco-children gphysr_…, gX…), nextfile, declare, ProExe, merge, samStore.
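
The first cron step can be made idempotent by creating only the input datasets that do not yet exist. In this sketch, both database queries are replaced by plain sets. The dataset-name pattern is a hypothetical placeholder and is not CDF's actual naming convention.

```python
"""Sketch of cron step 1: create SAM input datasets only for good runs that lack one."""

from typing import Iterable, List, Set


def dataset_name(run: int) -> str:
    # Hypothetical naming scheme, for illustration only.
    return f"input_run{run:06d}"


def datasets_to_create(good_runs: Iterable[int], existing: Set[str]) -> List[str]:
    """Names of input datasets still missing, in ascending run order."""
    return [dataset_name(r) for r in sorted(set(good_runs)) if dataset_name(r) not in existing]


if __name__ == "__main__":
    good = {201001, 201003, 201002}        # stand-in for an online-DB good-run query
    already = {dataset_name(201001)}       # stand-in for a SAM DB query
    print(datasets_to_create(good, already))
```

The step is a set difference against what already exists. A crashed or overlapping cron invocation can therefore simply be rerun without creating duplicate datasets.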

  16. Data handling in production
     Independent cron jobs run on the operation node and fileservers:
     1. ProExe: submit a SAM project / CAF job and fetch the files of the input dataset
     2. merge: concatenation on the fileserver
     3. samStore: store to Enstore
     Diagram labels: /pnfs, dCache, /samcache, reco (R/W), merged, Condor CAF.

  17. Binary jobs on Condor workers
     Each CPU takes one job. The CAF headnode dispatches two things: the ProdExe tarball (self-contained with all libraries, 140 MB) and the production script. The worker unpacks the tarball into a scratch area. The production script then:
     1. Fetches one input file of the assigned SAM dataset from dCache
     2. Runs the binary, using the split table and calibration (Calibration DB)
     3. Declares the split outputs (SAM DB)
     4. Copies the outputs to the concatenation area
     5. Updates bookkeeping
     6. Cleans up
     Processing takes ~4 hours per file (1 GByte) on a 1 GHz P3.
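
The per-file timing gives a rough CPU budget. This sketch combines several assumptions that the slides do not state together:
- ~1 GB per input file
- ~4 CPU-hours per file on a 1 GHz P3
- a chosen CPU utilisation
- the ~5 M events/day x ~140 kByte/event daily volume from the trigger table

The function name is hypothetical.

```python
"""Rough CPU budget for raw-data production from the quoted per-file timing."""

import math


def cpus_needed(daily_gb: float, hours_per_gb: float = 4.0, utilisation: float = 1.0) -> int:
    """CPUs required to process one day of input within 24 hours."""
    cpu_hours = daily_gb * hours_per_gb
    return math.ceil(cpu_hours / (24.0 * utilisation))


if __name__ == "__main__":
    daily_gb = 5e6 * 140 / 1e6          # ~700 GB/day of raw data
    print(cpus_needed(daily_gb))        # at 100% utilisation
    print(cpus_needed(daily_gb, utilisation=0.8))
```

At 100% utilisation, about 117 such CPUs keep pace with one day of raw data; at 80% utilisation, about 146. The estimate ignores stream overlap and reprocessing.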

  18. Concatenation / SAMstore
     Concatenation runs locally on the fileservers to reduce IDE and network bandwidth, independently of production submission. The 8 streams yield 52 production datasets.
     1. SAM DB query: make file lists in the order of a dataset. Input file sizes vary from 5 MB to 1 GB; output size is ~1 GByte.
     2. Merge with the rootd binary, at ~3 min per GByte
     3. SAM DB update:
        - declare the merged files
        - SAMstore them directly to Enstore
        - update file parentage in the SAM DB
     The challenge is bookkeeping:
     - multiple SAM DB queries
     - no data loss
     - no duplication
     - 100% exactness in production
     - easy recovery
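
The bookkeeping requirements ("no data loss, no duplication") can be checked mechanically once merge parentage is recorded. This sketch makes three assumptions: input files arrive already in run order as `(name, bytes)` pairs, outputs are capped at 1 GB, and parentage is a plain dict. File names and the `merged_NNNN` pattern are hypothetical.

```python
"""Sketch: plan ~1 GB merges with parentage, then verify no input is lost or duplicated."""

from collections import Counter
from typing import Dict, List, Tuple

ONE_GB = 1_000_000_000


def plan_merges(files: List[Tuple[str, int]], target: int = ONE_GB) -> Dict[str, List[str]]:
    """Map merged-output name -> ordered list of parent files (inputs assumed in run order)."""
    plan: Dict[str, List[str]] = {}
    batch: List[str] = []
    size = 0
    for name, nbytes in files:
        if batch and size + nbytes > target:
            plan[f"merged_{len(plan):04d}"] = batch
            batch, size = [], 0
        batch.append(name)
        size += nbytes
    if batch:
        plan[f"merged_{len(plan):04d}"] = batch
    return plan


def check_bookkeeping(inputs: List[str], plan: Dict[str, List[str]]) -> Tuple[List[str], List[str]]:
    """Return (missing, duplicated) inputs; both empty means the merge is exact."""
    counts = Counter(p for parents in plan.values() for p in parents)
    missing = [f for f in inputs if counts[f] == 0]
    duplicated = sorted(f for f, n in counts.items() if n > 1)
    return missing, duplicated


if __name__ == "__main__":
    MB = 1_000_000
    files = [("f1", 600 * MB), ("f2", 300 * MB), ("f3", 5 * MB), ("f4", 900 * MB)]
    plan = plan_merges(files)
    print(plan)
    print(check_bookkeeping([f for f, _ in files], plan))
```

Parentage is kept per merged file, so recovery after a failed merge or store means reprocessing exactly the parents of the affected output.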

  19. Resource monitoring
     Monitored resources: the CDF DB, SAM DB and data handling; the CAF Condor batch system; fileserver storage. Jobs are prohibited when a required service is missing.
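
"Jobs are prohibited when a required service is missing" amounts to a pre-submission check against the set of services that are up. In this sketch, the task names follow the production steps on these slides, but the dependency table and the service status are hypothetical.

```python
"""Sketch of resource gating: run a task only if every service it needs is up."""

from typing import Dict, List, Set

REQUIRES: Dict[str, Set[str]] = {   # hypothetical dependency table
    "submit_caf_job": {"CDF DB", "SAM DB", "data handling", "CAF condor"},
    "concatenate": {"SAM DB", "fileserver storage"},
    "sam_store": {"SAM DB", "data handling", "fileserver storage"},
}


def runnable(tasks: List[str], services_up: Set[str]) -> Dict[str, Set[str]]:
    """Map each task to its missing services; an empty set means it may run."""
    return {t: REQUIRES[t] - services_up for t in tasks}


if __name__ == "__main__":
    up = {"CDF DB", "SAM DB", "data handling", "fileserver storage"}   # CAF condor down
    for task, missing in runnable(list(REQUIRES), up).items():
        print(task, "OK" if not missing else f"blocked: {sorted(missing)}")
```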

  20. CAF farm monitoring
     Worker CPUs (Ganglia) and input (rcp) waiting are monitored, and traffic to the fileservers (xfs) is tracked.
     Bandwidth limits:
     - input: Enstore loading to dCache
     - output: multiple workers writing to fileservers; a 1 Gbit network port to IDE gives 40 MB/s
     - one output dataset to Enstore: 30 MB/s
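
The quoted limits identify the slowest stage on the output path. This sketch treats the output chain as a simple pipeline whose throughput equals its slowest stage; that simplification is an assumption. It also uses decimal units and omits the Enstore-to-dCache input rate, which the slide does not quantify.

```python
"""Bottleneck arithmetic for the quoted output-path bandwidth limits (decimal units assumed)."""

from typing import Dict, Tuple

LIMITS_MB_S: Dict[str, float] = {
    "worker -> fileserver (1 Gbit port to IDE)": 40.0,
    "one output dataset -> Enstore": 30.0,
}


def bottleneck(limits: Dict[str, float]) -> Tuple[str, float]:
    """(stage, rate) of the slowest stage."""
    stage = min(limits, key=limits.get)
    return stage, limits[stage]


def hours_to_move(volume_gb: float, rate_mb_s: float) -> float:
    """Hours to push `volume_gb` through a stage running at `rate_mb_s`."""
    return volume_gb * 1000.0 / rate_mb_s / 3600.0


if __name__ == "__main__":
    stage, rate = bottleneck(LIMITS_MB_S)
    print(f"slowest stage: {stage} at {rate:.0f} MB/s")
    print(f"1 TB through it: {hours_to_move(1000, rate):.1f} h")
```

At 30 MB/s, storing 1 TB of a single output dataset to Enstore takes about 9.3 hours.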

  21. CAF Condor monitoring
     - The tarball (archived execution binary) is distributed to worker CPUs.
     - Input files are copied via SAM from dCache.
     - At the end of a job, output files are copied to the assigned fileserver.
     CPU engagement is monitored at three levels: the CPUs of a CAF job, the CPU of a section, and the commands executing now.
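
The slide does not define "CPU engagement". One plausible measure is CPU time divided by wall-clock time for each section of a CAF job, with mostly idle sections flagged (for example, those waiting on input copies). The metric, the 0.5 threshold and the record layout are all assumptions of this sketch.

```python
"""Sketch of a per-section CPU-engagement metric (metric and threshold are assumptions)."""

from typing import Dict, List, Tuple


def engagement(sections: Dict[int, Tuple[float, float]]) -> Dict[int, float]:
    """Section id -> cpu_seconds / wall_seconds (0.0 if the section has no wall time yet)."""
    return {s: (cpu / wall if wall > 0 else 0.0) for s, (cpu, wall) in sections.items()}


def idle_sections(sections: Dict[int, Tuple[float, float]], threshold: float = 0.5) -> List[int]:
    """Sections below `threshold`, in section order; not-yet-started sections are included."""
    return sorted(s for s, e in engagement(sections).items() if e < threshold)


if __name__ == "__main__":
    job = {1: (3500.0, 3600.0), 2: (600.0, 3600.0), 3: (0.0, 0.0)}   # hypothetical (cpu, wall)
    print(engagement(job))
    print(idle_sections(job))
```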
