Portable Monte Carlo Transport Performance Evaluation in the PATMOS Prototype
Tao CHANG
DEN-Service d'Études des Réacteurs et de Mathématiques Appliquées (SERMA)
November 27, 2019

Outline
  1. Introduction
       - Monte Carlo Neutron Transport
       - PATMOS
       - Objective
  2. Implementations
  3. Tests
  4. Conclusions

Monte Carlo Neutron Transport
In the nuclear field, Monte Carlo (MC) simulation is widely used to compute physical quantities such as:
  - density of particles
  - reaction rates
  - fission power
  - ...
List of MC codes:
  - TRIPOLI-4 (CEA, France)
  - MCNP-5 (LANL, USA)
  - OpenMC (MIT, USA)
  - SERPENT (VTT, Finland)
  - RMC (Tsinghua, China)
  - ...
(Figure credit: ANS Nuclear Cafe)

Monte Carlo Neutron Transport
Monte Carlo transport codes simulate the life of a particle from birth to death, as a succession of transports and collisions.
Advantages:
  - precision, few approximations
  - complex geometries
Drawbacks:
  - high computational cost

Monte Carlo Neutron Transport: cross section
The cross section addresses the interaction probability of the particle with the different nuclides composing the material. Two methods:
  - Pre-tabulated method: load precalculated total cross sections at (E, T)
  - On-the-fly Doppler broadening method: calculate cross sections at (E, T) before each random flight
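The pre-tabulated method can be pictured as a table lookup. The sketch below is only an illustration under stated assumptions, not PATMOS code: `XsTable` and `lookup` are hypothetical names, the table holds a single fixed temperature, the energy grid is strictly increasing, and linear interpolation between grid points is an assumption that the slides do not specify.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pre-tabulated cross section at one fixed temperature.
// energy is strictly increasing; sigma[i] is the value at energy[i].
struct XsTable {
    std::vector<double> energy;
    std::vector<double> sigma;
};

// Look up sigma(E): binary search on the energy grid, then linear
// interpolation (an assumption made for this sketch).
// Energies outside the grid are clamped to the nearest end point.
double lookup(const XsTable& t, double e) {
    assert(t.energy.size() >= 2 && t.energy.size() == t.sigma.size());
    if (e <= t.energy.front()) return t.sigma.front();
    if (e >= t.energy.back()) return t.sigma.back();
    // upper_bound performs the binary search: it returns the first
    // grid point strictly greater than e.
    auto hi = std::upper_bound(t.energy.begin(), t.energy.end(), e);
    const std::size_t i = static_cast<std::size_t>(hi - t.energy.begin()) - 1;
    const double w = (e - t.energy[i]) / (t.energy[i + 1] - t.energy[i]);
    return (1.0 - w) * t.sigma[i] + w * t.sigma[i + 1];
}
```

With a grid of energies {1, 2, 4} and values {10, 20, 40}, `lookup` at E = 3 falls halfway between the last two points. The on-the-fly method replaces the stored values with a computation at the requested (E, T), which is why it costs more per lookup.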

Monte Carlo Neutron Transport: run time percentage
The total macroscopic cross section calculation is the most time-consuming part.

Processing step              Run time (%)
Total Cross Section              95.4
    exp                          17.6
    erfc                         49.4
    binary search                 2.4
    compute integral             79.2
Partial Cross Section             1.7
    exp                           0.2
    erfc                          0.6
    binary search                 0.1
    compute integral              1.4
Initialization                    1.8
    buildMedium                   1.5
Others                            1.1

PATMOS
A prototype dedicated to the testing of algorithms for high performance computations on modern architectures.
  - Prepares the next generation of TRIPOLI
  - Written in C++
  - Only a subset of neutron physics is implemented, but it is representative for performance analysis
  - Hybrid parallelism: MPI + OpenMP + GPU offload
  - GPU version written in CUDA
  - Only the microscopic cross section calculation is offloaded

Objective
The CUDA version implemented in PATMOS is not "portable", as it only targets Nvidia GPUs.
A variety of architectures to address:
  - Many-core:
      - Intel Xeon Phi
      - Arm
  - Heterogeneous architectures:
      - Intel + Nvidia GPU
      - OpenPower + Nvidia GPU
      - AMD + GPU
      - ...
Goals:
  - Develop portable codes for a large variety of architectures
  - Evaluate the different programming models in terms of the performance of the implemented benchmark

Outline
  1. Introduction
  2. Implementations
       - Programming Model
       - Algorithms
       - Benchmark
  3. Tests
  4. Conclusions

Programming Model
Only intra-node parallelism is considered: OpenMP thread + {X}.
{X} can be any language or library capable of parallel programming on modern architectures, such as:
  - Low-level: CUDA
  - High-level: OpenACC, OpenMP, Kokkos, SYCL

Algorithms
Algorithm 1: History-based algorithm

Each MPI rank:
foreach batch or generation do
    initialize particle state from source;
    OpenMP thread level:
    foreach particle in batch do
        while particle is alive do
            calculation of macroscopic cross section:
              • do microscopic cross section lookups ⇒ offloaded;
              • sum up total cross section;
            sample distance, move particle, do interaction;
        end
    end
end

Algorithms
Algorithm 2: Microscopic cross section lookup

Input: a group of N randomly sampled tuples of materials, energies and temperatures, {(m_i, E_i, T_i)}, i ∈ N
Result: calculated microscopic cross sections for the N materials, {σ_ik}, i ∈ N, k ∈ |m_i|

CUDA thread block level (#pragma acc parallel loop gang, or #pragma omp target teams distribute):
for (n_ik, E_i, T_i) where n_ik ∈ m_i do
    σ_ik = pre_calcul();
    CUDA thread level (#pragma acc loop vector, or #pragma omp parallel for):
    foreach thread in warp do
        σ_ik += compute_integral();
    end
end
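The two-level structure of Algorithm 2 can be sketched on the host side as follows. This is a simplified illustration, not the PATMOS kernel: `Request`, `lookup_all` and the `n_terms` parameter are hypothetical, and the bodies of `pre_calcul` and `compute_integral` are toy placeholders with no physical meaning. The sketch uses host OpenMP directives (`parallel for` and `simd reduction`) instead of the offload directives named on the slide; without OpenMP enabled the pragmas are ignored and the loops run serially with the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One lookup request: the (n_ik, E_i, T_i) tuple of Algorithm 2.
struct Request {
    int nuclide;
    double energy;
    double temperature;
};

// Toy placeholder for the per-request setup term (NOT the PATMOS physics).
double pre_calcul(const Request& r) {
    return static_cast<double>(r.nuclide);
}

// Toy placeholder for one term of the integral (NOT the PATMOS physics).
double compute_integral(const Request& r, int term) {
    return r.energy * r.temperature * static_cast<double>(term + 1);
}

// Outer loop: one request per iteration (thread block / gang level on the slide).
// Inner loop: the terms of one request are accumulated by a reduction
// (thread / vector level on the slide).
std::vector<double> lookup_all(const std::vector<Request>& reqs, int n_terms) {
    std::vector<double> sigma(reqs.size());
    const long n = static_cast<long>(reqs.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        const Request& r = reqs[static_cast<std::size_t>(i)];
        double s = pre_calcul(r);
        #pragma omp simd reduction(+ : s)
        for (int j = 0; j < n_terms; ++j) {
            s += compute_integral(r, j);
        }
        sigma[static_cast<std::size_t>(i)] = s;
    }
    return sigma;
}
```

Each outer iteration writes only its own `sigma[i]`, so the outer loop needs no synchronization; only the inner accumulation requires a reduction.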

Algorithms
History-based (HB) algorithm on GPU:
  - Too many small data transfers
  - Many memcpy calls
  - Small kernels
Tuning solutions:
  - Reduce memcpy calls, enlarge kernel size
  - A new method called the pseudo event-based (PEB) algorithm

Algorithms
Algorithm 3: Pseudo event-based algorithm

Each MPI rank:
foreach batch or generation do
    initialize particle state from source;
    OpenMP thread level:
    foreach bank of N particles in batch do
        while particles remain in bank do
            foreach remaining particle in bank do
                bank required data;
            end
            • do microscopic cross section lookups ⇒ offloaded;
            foreach remaining particle in bank do
                • sum up total cross section;
                sample distance, move particle, do interaction;
            end
        end
    end
end
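A toy cost model can make the difference between the two algorithms concrete. The sketch below is an assumption-laden illustration, not a measurement: `OffloadStats`, `history_based` and `pseudo_event_based` are hypothetical names, each particle is given a fixed number of collisions up front, and every collision is counted as one lookup request. It only counts how many offload calls each scheme would issue.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct OffloadStats {
    std::size_t calls = 0;    // offload calls (kernel launch plus transfers)
    std::size_t lookups = 0;  // lookup requests carried by those calls
};

// History-based: every collision of every particle triggers its own offload.
OffloadStats history_based(const std::vector<int>& collisions) {
    OffloadStats s;
    for (int c : collisions) {
        s.calls += static_cast<std::size_t>(c);
        s.lookups += static_cast<std::size_t>(c);
    }
    return s;
}

// Pseudo event-based: particles are handled in banks; each pass over a bank
// issues one offload covering all particles still alive in that bank.
OffloadStats pseudo_event_based(const std::vector<int>& collisions,
                                std::size_t bank_size) {
    assert(bank_size > 0);
    OffloadStats s;
    for (std::size_t begin = 0; begin < collisions.size(); begin += bank_size) {
        const std::size_t end = std::min(begin + bank_size, collisions.size());
        std::vector<int> remaining(
            collisions.begin() + static_cast<std::ptrdiff_t>(begin),
            collisions.begin() + static_cast<std::ptrdiff_t>(end));
        for (;;) {
            std::size_t alive = 0;
            for (int& r : remaining) {
                if (r > 0) { ++alive; --r; }  // bank data for this particle
            }
            if (alive == 0) break;            // bank is exhausted
            s.calls += 1;                     // one batched offload per pass
            s.lookups += alive;
        }
    }
    return s;
}
```

In this model both schemes perform the same number of lookups, but the banked version needs one call per pass rather than one per collision, so the number of calls per bank is set by its longest-lived particle. That is the sense in which PEB reduces memcpy calls and enlarges the kernel.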

Benchmark: slabAllNuclides
  - Fixed source MC simulation
  - Slab geometry
  - 10,000 volumes, 900 K
  - Each material ⇒ 355 nuclides
  - Main components: H1 and U238
  - Pressurized Water Reactor (PWR) spectrum
  - On-the-fly Doppler broadening method

Outline
  1. Introduction
  2. Implementations
  3. Tests
       - Parameters
       - Results
       - CUDA Profiling
  4. Conclusions

Parameters
Machines:
  - Ouessant: 2 × 10-core IBM Power8, SMT8 + 4 × Nvidia P100 (GENCI IDRIS)
  - Cobalt-hybrid: 2 × 14-core Intel Xeon E5-2680 v4, HT2 + 2 × Nvidia P100 (CEA-CCRT)
  - Cobalt-V100: 2 × 20-core Intel Skylake + 4 × Nvidia V100 (CEA-CCRT)
slabAllNuclides:
  - Inputs: 20,000 particles, 10 cycles, bank size of 100
  - Outputs: particles/sec (higher is better)
Environment:

Machine          GCC      Intel Compiler   PGI      XLC      CUDA
Ouessant         7.3.0    -                18.10    16.1.0   9.2
Cobalt-hybrid    7.1.0    17.0.6           18.7     -        9.0
Cobalt-V100      7.1.0    17.0.6           18.7     -        9.2
