

  1. BaBar Distributed Computing
     Stephen J. Gowdy, SLAC
     Super B-Factory Workshop, 22nd April 2005

  2. Overview
     ● Foundations
     ● Tier-A Sites
     ● Data Distribution

  3. Software Distribution
     ● All source is in a CVS repository in AFS
       – Allows the code to be seen anywhere in the world
     ● SoftRelTools/SiteConfig used to configure each site
       – Location of external software, compilers
       – Server names (Objectivity lock servers, etc.)
     ● UserLogin package to set up the environment
       – More site customisation here (these modifications are not in CVS); see the sketch below
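     As a loose illustration of the kind of per-site settings the SiteConfig/UserLogin layer carries,
     a site's environment setup might look like the sketch below; every path and server name here is
     an assumption for illustration, not the actual BaBar configuration:

       # Hypothetical site-local environment setup, in the spirit of the UserLogin
       # package; all values other than the BFROOT variable name are illustrative.
       export BFROOT=/afs/example.site/babar          # assumed local installation root
       export BFDIST=$BFROOT/dist                     # assumed location of imported releases
       export OBJY_LOCKSERVER=objylock.example.site   # assumed Objectivity lock server name
       export PATH=$BFROOT/bin:$PATH                  # bootstrapping scripts from the "bin" package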

  4. Software Distribution (Cont.)
     ● “bin” package contains bootstrapping scripts
       – Installed at sites as $BFROOT/bin
     ● importrel is used to import a BaBar software release
       – By default importrel imports all architectures; importarch can be used to import only
         selected platforms (and tells importrel not to import any itself)
       – Once local, run “gmake siteinstall” to reconfigure the release for the local site
         (see the sketch below)
     ● Applications should then run as they would at SLAC
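     A minimal sketch of what importing a release at a remote site could look like; the release
     name, architecture label and directory layout are placeholders, and the real importrel and
     importarch options may differ:

       # Hypothetical import sequence at a remote site.
       cd $BFROOT/dist                    # assumed directory for installed releases
       importarch Linux2                  # hypothetical: restrict imports to one platform
       importrel 14.3.1a                  # import the release (all architectures by default)
       cd 14.3.1a
       gmake siteinstall                  # reconfigure the release for the local site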

  5. Eventstore
     ● Collection names are trivially mapped to the first logical file name (LFN)
         /store/PR/R14/AllEvents/0004/02/14.3.1a/AllEvents_00040228_14.3.1aV00
         /store/PR/R14/AllEvents/0004/02/14.3.1a/AllEvents_00040228_14.3.1aV00.01.root
     ● Mapping from LFN to physical file name via a site-specific configuration file
       – $BFROOT/kanga/config/KanAccess.cfg
         [yakut06] ~/reldirs/tstanalysis-24/workdir > KanAccess /store/PR/R14/AllEvents/0004/02/14.3.1a/AllEvents_00040228_14.3.1aV00.01.root
         root://kanolb-a:1094///store/PR/R14/AllEvents/0004/02/14.3.1a/AllEvents_00040228_14.3.1aV00.01.root
       – This one uses xrootd for access (the prefix idea is sketched below)
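     The site-specific mapping effectively prepends an access prefix to the LFN. A minimal sketch
     of that idea, using the prefix visible in the KanAccess output above (the real KanAccess.cfg
     syntax and lookup are not shown):

       # Illustrative LFN-to-PFN mapping; KanAccess itself reads the rule from
       # $BFROOT/kanga/config/KanAccess.cfg.
       LFN=/store/PR/R14/AllEvents/0004/02/14.3.1a/AllEvents_00040228_14.3.1aV00.01.root
       PREFIX=root://kanolb-a:1094//       # site-specific xrootd prefix from the example above
       PFN=${PREFIX}${LFN}
       echo $PFN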

  6. Eventstore (Cont.)
     ● xrootd is used for production data access
       – Resilient against many failure modes
       – Very little overhead on top of disk IO
       – Now part of the ROOT distribution
     ● Latest versions at http://xrootd.slac.stanford.edu (as of 15th April 2005)
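     For instance, a production file can be read through the xrootd server named in the KanAccess
     example rather than from a locally mounted disk; this sketch assumes the xrdcp client from
     the xrootd distribution is available on the path:

       # Copy one event-store file over xrootd (server and path from the earlier example).
       xrdcp root://kanolb-a:1094///store/PR/R14/AllEvents/0004/02/14.3.1a/AllEvents_00040228_14.3.1aV00.01.root \
             /tmp/AllEvents_00040228.root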

  7. Eventstore (Cont.)
     ● Collections are made up of different components
       – Generally two classes of files from production
     ● Micro
       – Header
       – User Data (ntuple-like information associated with particles)
       – (B)Tag information (event-level information)
       – Candidates (physics-level reconstructed objects)
       – Analysis Object Data (AOD, detector-level information)
       – Truth (if MC data)
     ● Mini
       – Event Summary Data (ESD)
     ● (A third class contains RAW and Simulation data)
     ● File names let you know what is in them:
         /store/PR/R14/AllEvents/0004/02/14.3.1a/AllEvents_00040228_14.3.1aV00.02E.root

  8. Eventstore (Cont.)
     ● Production skimming is done on data and Monte Carlo
       – Currently have 189 skims defined
     ● Skims vary a great deal in selection rate (from less than a percent of a percent to ~10%)
       – Each skim can decide to be pointers only, a deep copy of the micro, or a deep copy of the micro and mini
     ● All include the Tag, Candidates and User data
     ● Pointer skims require the underlying production collections (not available at all sites)
     ● Deep-copy skims are expected to be more performant
       – Analysis runs on skims

  9. Bookkeeping
     ● RDBMS-based system
       – Supports Oracle and MySQL
     ● Knows about collections (Data Set Entity)
     ● Groups collections into datasets
       – Analysis is performed on datasets (see the sketch below)
       – Example datasets are:
         ● AllEvents-Run5-OnPeak-R18 (data)
         ● SP-998-Run4 (MC)
     ● Tool to mirror databases to different sites
     ● A key distribution system allows off-site access to the databases
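     As a rough illustration of how an analysis consumes this, one might resolve a dataset into
     its collection list and feed that to an analysis job. The BbkDatasetTcl query mirrors the
     backup slide; the analysis executable name is purely hypothetical:

       # Sketch: turn a dataset into a collection list, then run an analysis over it.
       BbkDatasetTcl 'SP-998-Run4'          # writes SP-998-Run4.tcl listing the collections
       MyAnalysisApp SP-998-Run4.tcl        # hypothetical executable reading that collection list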

  10. Simulation Production

  11. Tier-A Sites
      ● Currently have 6 Tier-A sites
        – SLAC (Prompt Calibration, analysis, simulation, skimming)
        – CC-IN2P3, France (analysis, simulation)
        – RAL (analysis, simulation)
        – Padova (Event Reconstruction, skimming, simulation)
        – GridKa, Germany (analysis, skimming, simulation)
        – CNAF, Italy (analysis)

  12. Tier-A Sites (Cont.)
      ● Tasks at each Tier-A site are based on local expertise and the needed level of resources
      ● Countries receive a Common Fund rebate based on the resources they contribute
        (50% of the cost saving at SLAC; the other 50% is distributed to all other countries)
        – Actual usage is reported every six months to the International Finance Committee (funding agencies)

  13. Data Distribution
      ● Primary method uses the Bookkeeping tools to do distribution
        – Sites can choose to import certain datasets
          ● Perhaps only the AOD, or the full AOD & ESD
        – A site can have a local database to remember which files have been imported
        – The Bookkeeping tools warn users if they do not have all of the data locally
          (a generic sketch of this import loop follows)
      ● All data import and export is via SLAC
        – Could set up other Tier-A sites for export
        – Cluster of servers
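      The import machinery is not spelled out here, but the logic amounts to: pick a dataset,
      fetch its files from the SLAC export servers, and record what arrived so the local mirror
      knows which collections are complete. A generic sketch of that loop; the server name, file
      list and local paths are placeholders, not the actual BaBar import tool:

        # Generic illustration only: fetch the files of a chosen dataset from SLAC
        # and track what has already been imported.
        LOCALROOT=/data/babar                                        # assumed local storage root
        while read LFN; do
          mkdir -p $LOCALROOT$(dirname $LFN)                         # ensure target directory exists
          xrdcp root://slacexport.example:1094/$LFN $LOCALROOT$LFN   # copy one file from SLAC
          echo $LFN >> imported.list                                 # remember files already imported
        done < files.list                                            # files.list: one LFN per line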

  14. Data Distribution (Cont.)
      ● Recently decided to allocate datasets to Tier-A sites based on Analysis Working Groups (AWGs)
        – Each AWG has a set of skims associated with it
        – All the skims for an AWG are put at one site

  15. Summary
      ● BaBar has a very productive Distributed Computing system
      ● For analysis, users face the inconvenience of having to use a specific site
        (one they may not have used before)
        – In the future the “Grid” is forecast to solve this

  16. Backup Slides

  17. BbkDatasetTcl
      [yakut06] ~ > BbkDatasetTcl -l '*BSemiExcl-Run4-*R16a'
      BbkDatasetTcl: 7 datasets found:-
        BSemiExcl-Run4-OffPeak-R16a
        BSemiExcl-Run4-OnPeak-R16a
        SP-1005-BSemiExcl-Run4-R16a
        SP-1235-BSemiExcl-Run4-R16a
        SP-1237-BSemiExcl-Run4-R16a
        SP-3429-BSemiExcl-Run4-R16a
        SP-998-BSemiExcl-Run4-R16a
      [yakut06] ~ > BbkDatasetTcl 'BSemiExcl-Run4-*R16a'
      BbkDatasetTcl: wrote BSemiExcl-Run4-OffPeak-R16a.tcl (7 collections, 1300477/132941301 events, ~9990.6/pb)
      BbkDatasetTcl: wrote BSemiExcl-Run4-OnPeak-R16a.tcl (73 collections, 22851621/1448776065 events, ~99532.6/pb)
      Selected 80 collections, 24152098/1581717366 events, ~109523.1/pb, from 2 datasets

  18. SLAC Usage
      [plots: disk space; batch time + dedicated CPU]
      – Extra disk space originally made available for the CM2 conversion; ~80 TB to be freed of old Kanga+Objy
      – SLAC CPU time is a mix of dedicated and batch use

  19. IN2P3 Usage
      [plots: disk space; batch time]
      – Note: IN2P3 uses a dynamic staging system (HPSS)
      – Batch utilization has come back strong after a decline last summer

  20. RAL Usage
      [plots: disk space; batch time]
      – Actual disk space slightly exceeds the 2004 MOU
      – Batch use peaked in October; the recent drop is the effect of transitioning away from old Kanga

  21. INFN Usage
      [plots: disk space; dedicated CPU + batch time]
      – Disk is already above the 2004 MOU, including CNAF
      – Dedicated CPU reached the 2004 MOU in December 2004; analysis use has started to add to that

  22. GridKa Usage
      [plots: disk space; batch time]
      – Disk space reached the MOU level in mid-2004
      – CPU usage continued the positive trend of the first half of 2004
      – With analysis use, it has peaked above the MOU level
