HUF 2017 KEK Site Report
  1. HUF 2017 KEK Site Report: Share Our Experience
     Koichi Murakami (KEK/CRC)
     HUF 2017, KEK, Tsukuba, Oct 19, 2017

  2. KEK: Diversity in Accelerator-Based Sciences
     High Energy Accelerator Research Organization
     - Pursuing the fundamental laws of nature: SuperKEKB and Belle II, T2K neutrino experiment, COMET, J-PARC hadron hall
     - Pursuing the origin of function in materials (material science, basic science and its applications): Photon Factory (X-rays as a probe), J-PARC MLF (neutrons and muons as a probe)
     - Technical development (technology spillover) and its applications: superconducting accelerators, energy recovery linac, accelerator-based BNCT

  3. SuperKEKB / Belle II
     SuperKEKB/Belle II is a 40 times more powerful machine compared to the previous B-factory experiment, KEKB/Belle.
     [Plot: projected integrated luminosity (ab^-1) toward the goal; assumptions: 9 months/year, 20 days/month of operation]

  4. KEKCC 2016 System Resources
     CPU:
     - 10,024 cores: Intel Xeon E5-2697v3 (2.6 GHz, 14 cores) x 2, 358 nodes (Lenovo NextScale)
     - 4 GB/core (8,000 cores) / 8 GB/core (2,000 cores) for application use
     - 236 kHS06 / site, 55 TB total memory
     Disk: 10 PB (GPFS) + 3 PB (HSM cache)
     - GPFS disk: IBM ESS x 8, 10 PB
     - HSM cache disk: DDN SFA14K, 3 PB; HPSS mover storage: 600 TB
     - Belle II front-end disk: IBM ESS
     Tape: 70 PB (max capacity)
     - IBM TS3500 library / TS1150 drives
     - HSM data: 11 PB, 220 M files, 5,200 tapes
     Interconnect: IB 4xFDR
     Network: SX6518 (IB), Nexus 7018, 40 GbE / 10 GbE, FW: SRX3400
     Total throughput: 100 GB/s (disk, GPFS), 50 GB/s (HSM, GHI)
     Servers:
     - Work / batch servers: Lenovo NextScale, 358 nodes, 10,024 cores
     - Grid EMI servers: Lenovo x3550 M5, 36 nodes
     - Belle II front-end servers: Lenovo x3550 M5, 5 nodes
     Job scheduler: Platform LSF v10.1
     (Facility tour on Friday)

  5. HSM System
     - HPSS/GHI servers
     - Tape library: IBM TS3500
     - GPFS (GHI) cache: 3 PB, DDN SFA12K
     - Total throughput: > 50 GB/s

  6. Tape System
     Tape library:
     - IBM TS3500 (13 racks)
     - Max. capacity: 70 PB
     Tape drives:
     - TS1150: 54 drives
     - TS1140: 12 drives (for media conversion)
     Tape media:
     - JD: 10 TB, 360 MB/s
     - JC5: 7 TB, 300 MB/s (reformatted)
     - JC4: 4 TB, 250 MB/s
     - Reformatting was done in the background (expected to take 10 months).
     - Users (experiment groups) pay for the tape media they use.

  7. GHI (GPFS + HPSS): The Best of Both Worlds
     HPSS:
     - We have used HPSS as our HSM system for the last 15+ years.
     - 1st layer: GPFS disk (DDN, 3 PB) + 2nd layer: IBM tape
     GHI (GPFS + HPSS):
     - GPFS parallel file system as the staging area
     - Perfect coherence with GPFS access (POSIX I/O)
     - KEKCC is a pioneer among GHI customers (since 2012).
     - Data access with high I/O performance and good usability:
       - Same access speed as GPFS once data is staged
       - No HPSS client API, no changes to user code (see the sketch below)
       - Small-file aggregation helps tape performance for small data
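
     As an illustration of the POSIX-only access model above, here is a minimal sketch of a user job reading a GHI-managed file with ordinary file I/O. The mount point and file name are hypothetical, and the transparent-recall comment describes generic HSM behavior rather than anything specific shown on the slide.

     ```python
     #!/usr/bin/env python3
     """Minimal illustration of the 'no HPSS client API' point: a user job
     reads a GHI-managed file with ordinary POSIX I/O.  The path below is
     hypothetical."""

     path = "/ghi/belle2/rawdata/run012345.root"   # hypothetical GHI-managed path

     # Plain open()/read() is all a user program needs; if the file happens
     # to be purged to tape, the HSM layer recalls it transparently (first
     # access is then tape-speed, later accesses run at GPFS speed).
     with open(path, "rb") as f:
         header = f.read(4096)

     print(len(header), "bytes read")
     ```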

  8. New System Configuration Parameters
     Components:
     - HPSS Core Server: 1
     - HPSS Disk Mover: 4
     - HPSS Tape Mover: 3
     - Mover storage: 600 TB
     - Max. #files: 2 billion
     - GHI IOM: 6
     - GHI Session Server: 3
     Software versions:
     - HPSS 7.4.3 p2 efix1
     - GHI 2.5.0 p1
     - GPFS 4.2.0.1
     - OS: RHEL 6.7 (HPSS nodes), RHEL 7.1 (GHI nodes)

  9. Data Processing Cycle
     Raw data:
     - Experimental data from the detectors, transferred to the storage system in real time
     - 2 GB/s sustained for the Belle II experiment
     - x5 that amount in simulation data
     - Migrated to tape, processed into DST, then purged
     - "Semi-cold" data (tens to hundreds of PB)
     - Reprocessed sometimes
     DST (Data Summary Tapes):
     - "Hot" data (~ tens of PB)
     - Data processing to produce physics data
     - Shared in various ways (GRID access)
     Physics summary data:
     - Handy data sets for reducing physics results (N-tuple data)
     Requirements for the storage system:
     - High availability (considering the electricity cost of operating the accelerators)
     - Scalability up to hundreds of PB
     - Data-intensive processing with high I/O performance
     - Hundreds of MB/s I/O for many concurrent accesses (N x 10k) from jobs
     - Local jobs and GRID jobs (distributed analysis)
     - Data portability to GRID services (POSIX access)

  10. System Improvements (1)
      Separated GPFS clusters:
      - GPFS disk system (10 PB) and GHI GPFS system (3 PB)
      - Using GPFS remote cluster mount
      - For stability and easier system management (maintenance, updates, ...)
      COS supports mixed media types:
      - Different types of tape media (JB/JC/JD) can be mixed in a read/write COS.
      Purge policy changed for small files:
      - The number of small files is huge, but they have little impact on disk space, so small files are not purged.
      - The exemption threshold was raised from < 8 MB to < 40 MB / 100 MB.
      - The right threshold depends on the file-size distribution in each file system (see the sketch below).
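
      Since the exemption threshold depends on the file-size distribution, here is a minimal sketch (not the actual GHI/GPFS purge policy, which is expressed as policy rules) of how one might survey a file system to see what fraction of files and of disk space a candidate threshold would exempt. The path and thresholds are placeholders.

      ```python
      #!/usr/bin/env python3
      """Illustrative file-size survey: for candidate 'do not purge'
      thresholds, report how many files would be exempted and how much
      space they occupy.  ROOT and THRESHOLDS are placeholders."""
      import os

      THRESHOLDS = [8e6, 40e6, 100e6]   # candidate thresholds: 8 MB, 40 MB, 100 MB
      ROOT = "/gpfs/fs01"               # hypothetical file system to sample

      sizes = []
      for dirpath, _dirs, files in os.walk(ROOT):
          for name in files:
              try:
                  sizes.append(os.path.getsize(os.path.join(dirpath, name)))
              except OSError:
                  pass                  # file vanished or is unreadable; skip it

      total_files = len(sizes) or 1
      total_bytes = sum(sizes) or 1

      for t in THRESHOLDS:
          small = [s for s in sizes if s < t]
          print(f"files < {t/1e6:.0f} MB: "
                f"{len(small)/total_files:6.1%} of files, "
                f"{sum(small)/total_bytes:6.1%} of space")
      ```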

  11. System Improvements (2)
      Improved GHI migration:
      - Old: list all files to be migrated, then migrate them in one request.
        - A single migration request for > 100k files overflows the HPSS queues and migration stalls.
      - New: migrate in batches of 10k files with ghi_backup (see the sketch below).
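
      A minimal sketch of the batching idea, assuming a plain text file listing the migration candidates. The exact ghi_backup options are not given on the slide, so the invocation below is only a placeholder.

      ```python
      #!/usr/bin/env python3
      """Illustrative sketch (not KEK's actual script): split a large list
      of files to migrate into fixed-size batches so that no single
      request overflows the HPSS queues.  The ghi_backup command line and
      file paths are placeholders."""
      import subprocess
      from pathlib import Path

      BATCH_SIZE = 10_000          # migrate 10k files per request, as on the slide

      def batches(items, size):
          """Yield successive fixed-size chunks of a list."""
          for i in range(0, len(items), size):
              yield items[i:i + size]

      def migrate(file_list_path: str):
          files = Path(file_list_path).read_text().splitlines()
          for n, chunk in enumerate(batches(files, BATCH_SIZE), start=1):
              batch_file = Path(f"/tmp/migration_batch_{n:05d}.lst")   # hypothetical path
              batch_file.write_text("\n".join(chunk) + "\n")
              # Placeholder invocation: substitute the real ghi_backup options here.
              subprocess.run(["ghi_backup", str(batch_file)], check=True)

      if __name__ == "__main__":
          migrate("/tmp/files_to_migrate.lst")   # hypothetical input list
      ```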

  12. System Migration Work
      HSM service on the old system:
      - 3-day downtime for the system migration (backup of the current system / restore on the new one)
      - The GPFS disk mount was kept (read-only) for 2 weeks before the new system.
      - Only data staged on disk was accessible.
      System migration:
      - 8.5 PB of data, 170 M files, 5,000 tapes
      - 3 days of work, Aug 15-17, 2016
      - Physical tapes moved from the current to the new tape library
      - DB2 migration using QRep
      - GHI backup and restore
      - Staging is necessary in the new system; admin staging for important data
      Checksums taken for tape data:
      - 6 months of work for higher-priority data
      - Taken directly from tapes (tape-ordered; small files as htar'ed HPSS files)
      - 200 MB/s on average, 4,000 volumes
      - Checksum and timestamp stored into GPFS UDA (see the sketch below)
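
      A minimal sketch of the last step: storing a checksum and a timestamp on the GPFS copy as user-defined attributes. The checksum algorithm, attribute names, and file path are assumptions; the tape-ordered retrieval and htar handling mentioned on the slide are not shown here.

      ```python
      #!/usr/bin/env python3
      """Illustrative sketch: compute a file checksum and record it,
      together with a timestamp, as user extended attributes on the GPFS
      copy.  Attribute names and the algorithm are assumptions; the slide
      only says checksum and timestamp go into GPFS UDA."""
      import hashlib
      import os
      import time

      def checksum(path: str, algo: str = "md5", bufsize: int = 8 * 1024 * 1024) -> str:
          h = hashlib.new(algo)
          with open(path, "rb") as f:
              while chunk := f.read(bufsize):
                  h.update(chunk)
          return h.hexdigest()

      def record(path: str) -> None:
          digest = checksum(path)
          stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
          # GPFS user-defined attributes are reachable as "user.*" xattrs;
          # the attribute names below are hypothetical.
          os.setxattr(path, b"user.checksum.md5", digest.encode())
          os.setxattr(path, b"user.checksum.time", stamp.encode())

      if __name__ == "__main__":
          record("/ghi/some/staged/file")   # hypothetical path
      ```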

  13. Operational Issues (HPSS/GHI)
      "Overload requests on staging":
      - All data was in the purged state in the new system because of the system migration.
      - We did not take system downtime for data staging; staging ran during operation, both admin and user staging.
      - GHI staging priority:
        - Initially: user staging > admin staging (ghi_stage, tape-ordered)
        - Admin staging was not processed, so user staging piled up (a bad spiral; see the toy model below).
      - The heavy-load staging process hit bugs: some bad points were identified and patches applied.
      - Thoughts on data migration:
        - Enable disk-to-disk migration for staged data, late binding of disk/tape data?
        - Runtime conversion between GPFS and HPSS could help (3.0.0).
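
      A toy model of the priority problem described above, not GHI internals: with strict "user before admin" ordering, a steady stream of user staging requests keeps the admin request waiting indefinitely. Priorities and request names are made up for illustration.

      ```python
      #!/usr/bin/env python3
      """Toy model of priority starvation: strict priority lets a stream
      of user requests starve admin staging (the 'bad spiral')."""
      import heapq
      import itertools

      USER, ADMIN = 0, 1          # lower number = served first (user > admin initially)
      counter = itertools.count() # tie-breaker to keep FIFO order within a priority

      queue = []

      def submit(priority, name):
          heapq.heappush(queue, (priority, next(counter), name))

      def serve(n):
          served = []
          for _ in range(min(n, len(queue))):
              served.append(heapq.heappop(queue)[2])
          return served

      # Admin submits a bulk staging request, then user requests keep arriving.
      submit(ADMIN, "admin: stage important dataset")
      for i in range(5):
          submit(USER, f"user request {i}")

      print(serve(5))   # only user requests are served; the admin request waits
      ```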

  14. Review of Staging Performance
      What is the bottleneck?
      Staging performance in a long-term view:
      - Sep - Dec 2016
      - Staged files / min (hourly averaged, GHI)
      - We can see:
        - Spikes of admin staging (HPSS cache -> GHI, thousands of files/min)
        - Continuous staging of important data for three months (Sep - Nov)
        - Low staging performance in some periods (< 5 files/min, see next slide)
      [Chart: CPU usage (cores x days) per group (Belle, Belle2, Grid, Had, T2K, CMB, ILC, others), Apr 2016 - Mar 2017]
      [Chart: stage speed (# files / min) for fs01, fs02, and their sum, Sep - Dec 2016]

  15. Library Accessor Performance
      - We have 54 TS1150 drives, but...
      - Tape mounts are limited to about 4 tapes/min.
      - TS3500 library accessor spec: 15 s per (un)mount -> 60 / 15 = 4 tapes/min (see the check below)
      - Well consistent with the observation
      - ~4 files/min staging in the case of continuous requests hitting different tape media
      [Chart: tape mounts per minute (TS1140, TS1150, and their sum), capped at about 4 mounts/min]
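
      A back-of-the-envelope check of the numbers above; only the 15-second accessor figure from the slide goes in, and the "one mount per file" worst case is the scenario the slide describes.

      ```python
      #!/usr/bin/env python3
      """Sanity check of the accessor-limited staging rate: one robot arm,
      ~15 s per mount/unmount operation (TS3500 spec quoted on the slide)."""
      SECONDS_PER_MOUNT_OP = 15

      mounts_per_min = 60 / SECONDS_PER_MOUNT_OP
      print(f"max tape mounts per minute: {mounts_per_min:.0f}")   # -> 4

      # If consecutive staging requests each land on a different tape, every
      # file costs at least one mount, so staging is capped at the same
      # ~4 files/min no matter how many of the 54 drives are idle.
      print(f"worst-case staging rate: ~{mounts_per_min:.0f} files/min")
      ```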
