
High-multiplicity multi-jet merging with HPC technology
Stefan Hoeche, Xiangyang Ju, Jim Kowalkowski, Stephen Mrenna, Tom Peterka, Stefan Prestel, Holger Schulz
Physics Event Generator Computing Workshop, CERN, November 28, 2018



  2. Motivation
     [Figure: ATLAS data, √s = 7 TeV, 4.6 fb⁻¹: number of events vs. N jets (0 to 8), with a Pred/Data ratio panel. Contributions: W → eν (ALPGEN), W → eν (SHERPA), tt̄, Z → ee, multijets, other. Source: arXiv:1409.8639 [hep-ex]]
     ◮ LHC experiments can see 8 jets.
     ◮ High-precision predictions, e.g. for searches, should reflect that.
     ◮ Can we do this on HPC?

  3. Trials in (LO) ME level events
     [Figure: normalised distribution 1/N dN/dN_trials (log scale) vs. N_trials (0 to 300000) for W+0j to W+8j]
     ◮ The distribution of trials gets flatter as the number of jets increases.
     ◮ Huge variation in matrix-element (ME) level compute time.
     ◮ The traditional Sherpa way of doing everything in one go does not scale.
     (See also T. Childers et al., doi:10.1088/1742-6596/898/7/072044)
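The flattening of the trials distribution can be illustrated with a toy model. The sketch below assumes plain hit-or-miss unweighting with a constant acceptance probability `eff`, so that the number of trials per accepted event is geometrically distributed. The function names and the two efficiencies are hypothetical and chosen only to contrast a low and a high multiplicity; this is not how Sherpa is implemented.

```python
import math
import random

def sample_trials(eff, n_events, seed=1):
    """Draw the number of trials needed for each accepted event.

    Assumption: hit-or-miss unweighting with constant acceptance
    probability `eff`, i.e. a geometric distribution on 1, 2, 3, ...
    """
    rng = random.Random(seed)
    log_q = math.log1p(-eff)  # log of the rejection probability
    # Inverse-CDF sampling; 1 - rng.random() lies in (0, 1], so log() is defined.
    return [int(math.log(1.0 - rng.random()) / log_q) + 1 for _ in range(n_events)]

def mean(values):
    return sum(values) / len(values)

# Hypothetical efficiencies, not taken from the slides.
few_jets = sample_trials(eff=1e-2, n_events=20000)
many_jets = sample_trials(eff=1e-4, n_events=20000)

print(f"mean trials (eff=1e-2): {mean(few_jets):.0f}")
print(f"mean trials (eff=1e-4): {mean(many_jets):.0f}")
```

In this toy model the mean number of trials is 1/eff, so a lower unweighting efficiency stretches the distribution over a much wider range of N_trials.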

  4. Our approach to event generation on HPC
     ◮ Use Sherpa to generate ME-level events (Les Houches-like format).
     ◮ XML output is not a good solution for HPC machines.
     ◮ Use HDF5 instead:
         Parallel write and read
         Binary storage of data, built-in compression
     ◮ Particle-level event generation and merging with Pythia8; we use ASCR's DIY technology for MPI parallelisation here.
     [Diagram: ME level (Sherpa, MPI) → LHE events in HDF5 → particle level (DIY: LHAPDF, Pythia8, Rivet)]

  5. HDF5 storage
     Table: Event properties
       Dataset      Data type
       nparticles   int
       scale        double
       aqcd         double
       ...
       npLO         double
       npNLO        double
       weight       double
       trials       double

     Table: Particle properties
       Dataset      Data type
       id           int
       status       int
       mother1      int
       color1       int
       px           double
       ...
       lifetime     double
       spin         double

     Table: Lookup table
       Dataset      Data type
       start        size_t
       end          size_t

     ◮ Trivial (parallel) storage of properties in 1D datasets of basic types
     ◮ Trivial (parallel) access by index; the connection between event and particle properties is made by the lookup table
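The lookup-table access can be sketched as follows. Plain Python lists stand in for the 1D HDF5 datasets; the dataset names follow the tables above, while the sample values and the helper `particles_of` are made up for illustration. The slide does not say whether `end` is inclusive or exclusive; the sketch assumes an exclusive end.

```python
# Event-level datasets: one entry per event (made-up values).
nparticles = [3, 2, 4]
weight = [1.0, 0.5, 2.0]

# Particle-level datasets: one entry per particle, all events concatenated.
pid = [1, -1, 24, 2, -2, 21, 21, 3, -3]  # stands in for the `id` dataset
px = [0.0, 0.0, 5.1, 0.0, 0.0, 1.2, -1.2, 7.7, -7.7]

# Lookup table: particle index range [start, end) of each event
# (exclusive end is an assumption).
start, end = [], []
offset = 0
for n in nparticles:
    start.append(offset)
    offset += n
    end.append(offset)

def particles_of(event):
    """Return (id, px) pairs of one event by slicing the flat datasets."""
    lo, hi = start[event], end[event]
    return list(zip(pid[lo:hi], px[lo:hi]))

print(particles_of(1))
```

Because every event maps to one contiguous index range, different readers can fetch disjoint ranges of events without coordinating with each other.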

  6. Technicalities
     ◮ Requirement: libhdf5 (apt-get / dnf install, standard on HPC)
     ◮ Header-only library HighFive: github.com/BlueBrain/HighFive
     ◮ N.B. the Python library h5py is very nice and works beautifully with numpy (used initially to convert LHE XML files to HDF5, but this is quite cumbersome)
     ◮ Header-only library DIY, used in the particle-level simulation: http://diatomic.github.io/diy
     ◮ Computing model based on "blocks"
     ◮ Does all the low-level MPI communication for you
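One way to picture a block-based computing model is to give each block a contiguous range of events to read by index. The sketch below is written under that assumption; it does not use DIY's actual API, and `block_bounds` and the sizes are hypothetical.

```python
def block_bounds(n_events, n_blocks, block):
    """Half-open event range [first, last) handled by one block.

    The first `n_events % n_blocks` blocks receive one extra event,
    so block sizes differ by at most one.
    """
    base, extra = divmod(n_events, n_blocks)
    first = block * base + min(block, extra)
    last = first + base + (1 if block < extra else 0)
    return first, last

# Made-up sizes for illustration.
n_events, n_blocks = 10, 4
ranges = [block_bounds(n_events, n_blocks, b) for b in range(n_blocks)]
print(ranges)
```

With such a decomposition each block only needs its own `first` and `last` to read its slice of the event and particle datasets.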

  7. W+jets example
     ◮ W+jets simulation at √s = 14 TeV.
     ◮ The merging scale is 20 GeV.
     ◮ The simulation is at leading order; the merging scheme is CKKW-L.
     ◮ ME-level event generation was done on a SLAC cluster of Xeon E5 CPUs.
     ◮ Particle-level event generation was done on NERSC Cori using Haswell nodes.

     N jets          0     1     2     3     4     5     6     7     8
     N events        65M   32M   16M   8M    4M    2M    1M    500k  250k
     HDF5 (9) [GB]   7.1   4.9   3.0   1.8   1     0.6   0.3   0.2   0.1
     HDF5 (0) [GB]   26    16    9.1   5.2   2.9   1.9   1.2   0.62  0.25

     ◮ Number of quarks limited to ≤ 6 for N jets = 6, 7
     ◮ Number of quarks limited to ≤ 4 for N jets = 8
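The file sizes in the table can be turned into size ratios and bytes per event with a few lines of arithmetic. The numbers below are copied from the table; the labels (9) and (0) presumably denote the compression level, although the slide does not say so, and 1 GB = 10⁹ bytes is an assumption.

```python
# Values copied from the table, ordered by N jets = 0 ... 8.
n_events = [65e6, 32e6, 16e6, 8e6, 4e6, 2e6, 1e6, 500e3, 250e3]
size_9_gb = [7.1, 4.9, 3.0, 1.8, 1.0, 0.6, 0.3, 0.2, 0.1]   # row "HDF5 (9) [GB]"
size_0_gb = [26, 16, 9.1, 5.2, 2.9, 1.9, 1.2, 0.62, 0.25]   # row "HDF5 (0) [GB]"

# Size ratio between the two storage variants, per multiplicity.
ratios = [s0 / s9 for s0, s9 in zip(size_0_gb, size_9_gb)]

# Bytes per event in the "HDF5 (9)" files, assuming 1 GB = 1e9 bytes.
bytes_per_event = [s9 * 1e9 / n for s9, n in zip(size_9_gb, n_events)]

for n_jets, (r, b) in enumerate(zip(ratios, bytes_per_event)):
    print(f"W+{n_jets}j: HDF5 (0) / HDF5 (9) = {r:.1f}, {b:.0f} bytes/event")
```

The quoted sizes carry only one or two significant digits for the high multiplicities, so the derived numbers there are rough.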

  8. CPU cost analysis
     ◮ Process ME samples with different jet multiplicities separately.
     ◮ Compare ME-level and particle-level event simulation.
     ◮ Note that the measure is CPUh per 1M events.
     [Figure: "Event generation cost for W+jets at 14 TeV": CPUh/Mevt (log scale) vs. number of additional jets (0 to 8) for ME level and particle level; lower panel: ME/Particle ratio]

  9. Benefits
     ◮ The CPU-expensive part of the simulation is stored in a parton-shower-independent format.
     ◮ Running the particle-level simulation is now cheap in comparison, which allows e.g.:
         PDF re-weighting
         All sorts of variation studies
         Tuning and similar parameter space exploration
     ◮ Can think of a hybrid strategy for event generation:
         Do low multiplicity as per usual
         Generate higher multiplicities with this approach

  10. Jet rates
      [Figure, left panel: "Differential 0 → 1 jet resolution", dσ/d log10(d01/GeV) [pb] vs. log10(d01/GeV); right panel: "Differential 2 → 3 jet resolution", dσ/d log10(d23/GeV) [pb] vs. log10(d23/GeV). Each panel shows the sum (∑) and the 0j to 8j contributions, with a ratio panel below]

  11. Jet rates
      [Figure, left panel: "Differential 4 → 5 jet resolution", dσ/d log10(d56/GeV) [pb] vs. log10(d56/GeV); right panel: "Differential 7 → 8 jet resolution", dσ/d log10(d78/GeV) [pb] vs. log10(d78/GeV). Each panel shows the sum (∑) and the 0j to 8j contributions, with a ratio panel below]

  12. Scaling
      ◮ Scaling of pure particle-level event generation for the total samples
      ◮ Software stack compiled on NERSC Cori (gcc 7.3); measurements done on Haswell nodes
      ◮ N.B. with 16 nodes (512 ranks): 15 minutes; with HepMC+Rivet as in the plots above: 25 minutes
      [Figure: run time t [s] (100 to 700) vs. N nodes (4, 8, 16, 32; 32 ranks per node) for W+0j to W+8j]

  13. Summary and outlook
      ◮ Prototype for a relatively efficient merged LO W+8j event simulation workflow
      ◮ For pragmatic reasons: Sherpa for ME-level event generation and Pythia8 for particle-level simulation
          Store the CPU-expensive part (ME level) on disk
          Particle-level run time up to 4 orders of magnitude faster than ME
          Main technologies used for parallelisation: DIY and HDF5
      ◮ Although we use technology aimed at HPC architectures, the code runs well on laptops, clusters etc.
      ◮ Want to understand the scaling better; investigate with VTune
      ◮ Look at Z+jets, Higgs, ttbar next.
      ◮ Would a hybrid strategy for event generation be a good idea?

  14. Acknowledgement
      This material is based upon work supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research and Office of HEP, Scientific Discovery through Advanced Computing (SciDAC) program.

  15. Timing and memory usage (Sherpa 3.x.y + HDF5)
      LO ME-level event generation only (Comix; γ, Z, h, μ, ν_μ, τ, ν_τ off)

      Process W⁺ +        1j             2j             3j             4j
      RAM usage           21 MB          43 MB          48 MB          85 MB
      Init/startup time   < 1s / < 1s    < 1s / < 1s    2s / < 1s      32s / < 1s
      Integration time    8 × 4m26s      16 × 16m42s    32 × 20m26s    64 × 1h32m
      MC uncertainty      0.22%          0.46%          0.89%          0.97%
      Unweighting eff     6.59 · 10⁻³    7.50 · 10⁻⁴    2.71 · 10⁻⁴    1.47 · 10⁻⁴
      10k evts            1m 2s          15m 5s         1h 3m          5h 56m

      Process W⁺ +        5j             6j ∗           7j ∗           8j †
      RAM usage           189 MB         484 MB         1.32 GB        1.32 GB
      Init/startup time   3m5s / 1s      24m52s / 5s    3h6m / 18s     5h55m / 29s
      Integration time    128 × 4h38m    256 × 13h53m   512 × 19h0m    1024 × 23h8m
      MC uncertainty      1.0%           0.99%          2.38%          4.68%
      Unweighting eff     9.56 · 10⁻⁵    7.66 · 10⁻⁵    7.20 · 10⁻⁵    7.51 · 10⁻⁵
      10k evts            24h 40m        2d 11h         10d 15h        78d 1h

      Numbers generated on dual 8-core Intel® Xeon® E5-2660 @ 2.20GHz
      ∗, † Number of quarks limited to ≤ 6/4
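The "10k evts" row can be converted into the CPUh per 1M events measure used in the CPU cost analysis. The sketch below assumes that each entry is the time one core needs to generate 10k events and that the cost scales linearly with the number of events; neither assumption is stated on the slide, and the helper `to_hours` is hypothetical.

```python
def to_hours(days=0, hours=0, minutes=0, seconds=0):
    """Convert a duration to hours."""
    return days * 24 + hours + minutes / 60 + seconds / 3600

# "10k evts" row of the table.
time_10k = {
    "W+1j": to_hours(minutes=1, seconds=2),
    "W+2j": to_hours(minutes=15, seconds=5),
    "W+3j": to_hours(hours=1, minutes=3),
    "W+4j": to_hours(hours=5, minutes=56),
    "W+5j": to_hours(hours=24, minutes=40),
    "W+6j": to_hours(days=2, hours=11),
    "W+7j": to_hours(days=10, hours=15),
    "W+8j": to_hours(days=78, hours=1),
}

# 1M events = 100 x 10k events (linear scaling is an assumption).
cpuh_per_mevt = {proc: 100 * t for proc, t in time_10k.items()}

for proc, cost in cpuh_per_mevt.items():
    print(f"{proc}: {cost:.3g} CPUh/Mevt")
```

These figures cover event generation only; the integration times in the table would come on top.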

  16. Plans for NLO event generation
      ◮ For a large class of processes, NLO fixed-order and MC@NLO agree well with each other and with MEPS@NLO (see e.g. the plots below).
      ◮ Indicates the best technical option: store MC@NLO simulated events.
      ◮ Pro: parton-shower-independent results
      ◮ Con: restricted possibility for variations
      [Figures: H+jet @ LHC 13 TeV and Z+jet @ LHC 13 TeV: dσ/dp_T,j [pb/GeV] vs. p_T,j [GeV] (200 to 2000 GeV), Sherpa+BlackHat, curves labelled MEPS@NLO, MCNLO and NLO; lower panels: ratio to MEPS@NLO]

  17. [Figure: frequency (log scale) vs. log10 w (0 to 4) for W+0j to W+8j]
