Belle II computing, Ikuo UEDA (KEK IPNS)

  1. Belle II computing. Ikuo UEDA (KEK IPNS). The 40th Anniversary Symposium of the US-Japan Science and Technology Cooperation Program in High Energy Physics, April 15, 2019, University of Hawaii

  2. JFY 2012 - 2015: Project Completed. Japan: KEK + US: PNNL
     - Establishment of a remote data center for acceleration of Belle II data center (Japanese title: "Establishment of a remote data center to accelerate Belle II data reprocessing")
     JFY 2016 - 2018: Project Completed. Japan: KEK + US: PNNL → BNL (since 2018)
     - Development of a scalable and automatized production system for the Belle II experiment (US-side research title, 2016)
     - Automatized Production System for Belle II (research title as renamed, 2017 or later; funded by the Japan side only)
     - Japanese title: "Development of an automated production system designed for scalability in the Belle II experiment"
     JFY 2019 (- 2020): Project Applied. Japan: KEK + US: PNNL
     - Hiding Data Access Times in HEP Distributed Workflow

  4. Purposes
     JFY 2012 - 2015: Establishment of a remote data center for acceleration of Belle II data center
     Goal: accelerate Belle II data reprocessing by establishing the remote data center in the U.S.A.
     - to trigger the Belle II computing activity in the U.S.A.
     - to let the KEK computing resource concentrate on RAW data processing
     - to reduce the risk of data loss in an unexpected contingency
     - to develop human resources for computing and middleware
     JFY 2016 - 2018: Automatized Production System for Belle II
     Goal: integrate the scalable and automatized production system into the Belle II experiment
     - to reduce the burden on expert time and the chance of human errors
     - to control complicated and different types of jobs smoothly and effectively
     - to deliver physics data to users as soon as data-taking finishes

  5. Belle II Computing Model
     [Diagram; labels recovered from the slide:]
     - Data types: raw data (from the detector), mdst (data), mdst (MC), udst, Ntuple
     - Raw Data Center (KEK Data Center, Data Center in US): raw data storage and (re)process; CPU, disk, tape; mdst storage; storage for original + copy
     - Regional Data Center (GRID sites in Asia, Europe): physics analysis skim, MC production, storage for copy
     - MC production site (GRID sites, cloud sites, computer cluster sites, HPC sites): temporary storage
     - Local resource: user analysis (Ntuple level)
     - Legend: RAW data + produced mDST, Skim uDST, MC mDST; end of year 3

  6. Belle II Distributed Computing Structure
     [Layered diagram; labels recovered from the slide:]
     - Human: Production Manager, Data Manager, End Users
     - Software interface: BelleDIRAC v4r6p0 (Interware extension + analysis user interface); Web Portal, Client Tools, Production Management System, Fabrication System, Distributed Data Management System, Monitoring
     - Interware + management system: v6r20p26; WMS, DMS, RMS, VMDIRAC, DIRAC slave, VCYCLE, Cloud I/F
     - GRID services (cyberinfrastructure for Belle II + services): AMGA, LFC, FTS, CVMFS
     - Sites: GRID sites, clusters, and clouds, with CE and SE; local I/O and remote I/O
     - Platform: GRID middleware + OS + hardware; infrastructure: network
     - CE: grid computing element; SE: grid storage element
     Slide footer: 2017.Dec.13. Computing in HEP - Ueda I.

  7. Automatized Production System
     From manual operation to automatic operation: fast, safe, 24/7 data delivery.
     Challenges:
     - Different types of production: MC production (w/ or w/o BG), skim production, RAW data process
     - Huge variety of modes: BB, udsc, signal, background; many physics skims
     - Complicated data management over world-distributed sites
     - Reduce human error and perform effective operation
     Components:
     - Production manager (human), KEK + PNNL (BNL): defines the project for raw data process, simulation, user analysis
     - Project management system: creates a project for raw data process, simulation, user analysis
     - Fabrication system: controls jobs for raw data process, simulation, user analysis
     - Distributed data management system (DDM): controls the data management (bulk replication, bulk deletion)
     - Data quality system: verifies that outputs can be used in physics analysis; feedback to production manager (human)
     - Verification system: verifies that tasks are correctly finished; feedback to Fabrication & Distributed data management
     - Monitoring system: checks the jobs/network status; feedback to Fabrication & Distributed data management; sends problematic status to GOCDB
     Institution labels on the component boxes: KEK, KEK, PNNL (BNL), Nagoya + Niigata, KEK

  8. Research Highlight: One page summary
     [Plot: normalized CPU power (kHS06; axis ticks 100, 200, 300) with labels 1st through 11th, the 11th ongoing. Annotations:]
     - Manual job submission (no automated production system)
     - Proto-Production system (automatic job submission; automatic issue detection [monitor])
     - Proto-Production system (automatic data distribution)
     - Full-Production system
     - Continuous operation running various types of productions
     - US joined since 2013; US is increasing resources gradually
     - US-Japan project (JFY 2012-2015): establish the data center in US
     - US-Japan project (JFY 2016-2018)

  10. MC production jobs: "Jobs go to Data"
      - Requires storage at each site
      - BG files are distributed before submitting jobs
      - Jobs (MC job w/ BG) access data on local storage at a computing site with storage
      Issue: inefficient use of compute resources without local storage

  11. MC production jobs: "Remote Data Access" in addition to "Jobs go to Data"
      - Enables contribution of compute resources w/o local GRID storage
      - An MC job w/ BG at a storage-server-less computing site reads the necessary BG files from other sites (computing sites with storage)
      Issue: time consumed in remote accesses

  12. Belle II computing sites
      - GRID sites: KEK, BNL, DESY, GridKA, KISTI, CNAF, many European sites. ~30 sites: ~75%
      - Computer cluster sites: many universities in Japan, Korea, India, China, Russia, Mexico. ~25 sites: ~10%
      - Cloud sites: Univ. of Victoria, Univ. of Melbourne. Several sites: ~15%
      Large contribution from compute resources w/o local GRID storage

  13. Remote Data Access
      Existing remote I/O techniques:
      - Download: copies whole files unnecessarily; CPU idle during download
      - Direct I/O (e.g. xrootd): chaotic remote accesses can be inefficient
      Organized Streaming (TAZeR: Transparent Asynchronous Zero-copy Remote I/O):
      - I/O optimization with pre-fetching to memory
      - Intelligent job scheduling
      - Support from DOE as a part of "Integrated End-to-end Performance Prediction and Diagnosis"
      [Bar chart: execution time [min] for Download, Direct I/O, and TAZeR, split into exp time and network read time; values shown: 500 min, 30 min, 200 min]

  14. Applying TAZeR to Belle II
      - TAZeR: hiding network and I/O latencies with I/O optimization and intelligent job scheduling
      - Belle II: efficient use of compute resources without local GRID storage
      Proposed project: "Hiding Data Access Times in HEP Distributed Workflow"
      - To increase throughput of Belle II Monte Carlo simulations
      - To identify the conditions under which TAZeR improves HEP workflow
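
The latency-hiding idea on slides 13 and 14 (pre-fetching data so that network reads overlap with computation) can be illustrated with a small sketch. This is not TAZeR's implementation or API: `prefetching_reader`, `pipeline_time`, and the `depth` parameter are hypothetical names chosen for illustration, and the timing model assumes every chunk has the same constant read and compute time.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def prefetching_reader(
    fetch: Callable[[K], V], keys: Iterable[K], depth: int = 2
) -> Iterator[V]:
    """Yield fetch(key) in order, reading up to `depth` items ahead.

    A background thread performs the (slow, remote) reads while the
    caller processes items that were already fetched.
    Hypothetical helper, not part of TAZeR or any Belle II software.
    """
    buffer: queue.Queue = queue.Queue(maxsize=depth)

    def producer() -> None:
        try:
            for key in keys:
                buffer.put(("item", fetch(key)))
        except Exception as exc:  # forward read errors to the consumer
            buffer.put(("error", exc))
        finally:
            buffer.put(("done", None))

    # Daemon thread: it will not keep the process alive if the
    # consumer stops iterating early.
    threading.Thread(target=producer, daemon=True).start()

    while True:
        kind, payload = buffer.get()
        if kind == "done":
            return
        if kind == "error":
            raise payload
        yield payload


def pipeline_time(n_chunks: int, read_s: float, compute_s: float) -> Tuple[float, float]:
    """Return (serial, overlapped) wall time for n_chunks equal chunks.

    serial:     read a chunk, then process it, one after another.
    overlapped: the next read runs while the current chunk is processed,
                so each steady-state step costs max(read_s, compute_s).
    """
    serial = n_chunks * (read_s + compute_s)
    overlapped = read_s + (n_chunks - 1) * max(read_s, compute_s) + compute_s
    return serial, overlapped


if __name__ == "__main__":
    # Stand-in for a remote read; a real job would read from remote storage.
    squares = list(prefetching_reader(lambda i: i * i, range(5)))
    print(squares)
    print(pipeline_time(10, 3.0, 1.0))
```

In this simplified model, overlapping hides whichever of the two costs is smaller: when reading a chunk takes longer than processing it, the job is still limited by the network, which is one reason the conditions under which such streaming helps need to be identified.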
