PIC codes in the HPC environment
A. Beck, SMILEI training workshop
Structure
1. HPC environment, trends and prospects
2. The PIC method and its parallelization
3. The load balancing issue
What is a supercomputer?
[Figure: several compute nodes connected by a network]
Distributed computing
What is a supercomputer?
[Figure: each compute node contains memory and a compute unit]
Distributed memory system
What is a supercomputer?
[Figure: each compute node contains several cores sharing the node's memory]
Distributed {shared memory} system
Objective: exaflop/s
Tianhe-2 (China, June 2013): 31 PFLOPS for 17 MW, i.e. 1.85 GFLOPS/W.
Extrapolation: 1000 PFLOPS ==> 540 MW!
The objective is P < 20 MW.
The challenge for vendors is to increase both the total performance and the energy efficiency of compute nodes.
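The extrapolation is a one-line calculation (a worked check using the slide's rounded figures):

```latex
\frac{31\,\mathrm{PFLOPS}}{17\,\mathrm{MW}} \approx 1.85\,\mathrm{GFLOPS/W},
\qquad
\frac{1000\,\mathrm{PFLOPS}}{1.85\,\mathrm{GFLOPS/W}} \approx 540\,\mathrm{MW}.
```

Hitting 1 EFLOPS within 20 MW therefore requires about 50 GFLOPS/W, roughly a 27x gain in energy efficiency.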
Vendor strategy 1: Many-core
Increased performance.
Reasonable energy budget.
Vendor strategy 2: GPGPU
NVIDIA & AMD: General-Purpose Graphics Processing Unit.
Most energy-efficient architecture today.
Difficult to address:
- Libraries: CUDA, OpenCL.
- Directive-based programming: OpenMP 4 or OpenACC.
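As a flavour of the directive-based approach, here is a minimal sketch of a GPU-offloaded loop with OpenACC. The `axpy` kernel is purely illustrative, not SMILEI code; a compiler without OpenACC support simply ignores the pragma and runs the loop on the CPU.

```cpp
#include <vector>

// y = a*x + y, offloaded with a single OpenACC directive.
void axpy(double a, const std::vector<double>& x, std::vector<double>& y) {
    const int n = static_cast<int>(x.size());
    const double* xp = x.data();
    double* yp = y.data();
    // The copyin/copy clauses describe data movement between host and device.
    #pragma acc parallel loop copyin(xp[0:n]) copy(yp[0:n])
    for (int i = 0; i < n; ++i)
        yp[i] = a * xp[i] + yp[i];
}
```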
Vendor strategy 3: Xeon Phi (Intel)
Powers several top HPC systems, e.g. Irene (France) and Aurora (U.S.).
Supposedly accessible through "normal" programming, but relies critically on the SIMD instruction set.
Vendor strategy 4: SunWay (China)
Most powerful system in the world: 93 PFLOPS for 15 MW.
The SunWay architecture mimics the Xeon Phi.
Vendor strategy 5: Vectorization
Excellent potential speed-up, very good power budget.
Heavy constraints on data structures and algorithms.
Difficult to exploit to its full extent in a PIC code.
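To make the data-structure constraint concrete, here is a minimal sketch (not SMILEI's actual layout): a structure-of-arrays particle store whose unit-stride, branch-free loop the compiler can vectorize. The uniform-field momentum push is a hypothetical example.

```cpp
#include <vector>

// Structure of arrays: one contiguous array per component, which gives the
// unit-stride memory accesses SIMD units need.
struct Particles {
    std::vector<double> px, py, pz;  // momenta
};

// Hypothetical momentum push by a uniform field (ex, ey, ez) over q*dt/m:
// no branches, no indirection, so the loop vectorizes cleanly.
void push_momenta(Particles& p, double ex, double ey, double ez, double qdtm) {
    const int n = static_cast<int>(p.px.size());
    #pragma omp simd
    for (int i = 0; i < n; ++i) {
        p.px[i] += qdtm * ex;
        p.py[i] += qdtm * ey;
        p.pz[i] += qdtm * ez;
    }
}
```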
Official announcements for exascale
- U.S.: exascale for 2021. No specifications.
- Japan: "Post-K" supercomputer. 1 EFLOPS for 2020, ARM architecture.
- China: 3 exascale systems for 2020.
- Europe: 2 exascale systems for 2022, at least 1 powered by European technology (probably ARM).
Why am I concerned? What should I do?
As a developer:
1. Expose parallelism. Massive parallelization is key.
2. Focus on algorithms and data structures, not on architectures.
3. Reduce data movement: computation is becoming cheaper, loads and stores not so much.
4. Be aware of the increasing gap between peak and effective performance. The race to exascale is becoming a race to exaflops.
As a scientist:
1. Collaborate with experts: the complexity of HPC systems is increasing a lot!
Structure
1. HPC environment, trends and prospects
2. The PIC method and its parallelization
3. The load balancing issue
Explicit PIC code principle
[Figure: the PIC time loop. Solve Vlasov: interpolator -> pusher -> projector; then solve Maxwell.]
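The loop in the figure maps naturally onto four routines. Below is a minimal skeleton of one explicit PIC time step; all names and signatures are illustrative placeholders, not SMILEI's actual API.

```cpp
#include <vector>

// Stand-ins for the real data structures.
struct Grid      { std::vector<double> Ex, Ey, Ez, Bx, By, Bz, Jx, Jy, Jz; };
struct Particles { std::vector<double> x, px, py, pz; };

void interpolate(const Grid&, Particles&) {}  // gather E and B at particle positions
void push(Particles&, double /*dt*/)      {}  // advance momenta and positions (Vlasov part)
void project(const Particles&, Grid&)     {}  // deposit particle currents J on the grid
void solve_maxwell(Grid&, double /*dt*/)  {}  // advance E and B from J (Maxwell part)

// One iteration of the loop sketched in the figure.
void pic_step(Grid& grid, Particles& particles, double dt) {
    interpolate(grid, particles);
    push(particles, dt);
    project(particles, grid);
    solve_maxwell(grid, dt);
}
```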
Domain decomposition
[Figure: the simulation box is split into subdomains]
Domain decomposition: MPI
[Figure: one subdomain per MPI process]
Domain decomposition: MPI
[Figure: one MPI process per core; each compute node holds several cores and its own memory, and nodes communicate over the network]
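Neighbouring subdomains must exchange their boundary (guard) cells every time step. Below is a minimal 1D sketch of the pattern with `MPI_Sendrecv`; SMILEI's actual synchronization is more elaborate, so treat this as an illustration only.

```cpp
#include <mpi.h>
#include <vector>

// 1D guard-cell exchange: f holds nguard guard cells at each end.
void exchange_guards(std::vector<double>& f, int nguard, MPI_Comm comm) {
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;
    int n = static_cast<int>(f.size());

    // Send rightmost inner cells to the right neighbour,
    // receive our left guard cells from the left neighbour.
    MPI_Sendrecv(&f[n - 2*nguard], nguard, MPI_DOUBLE, right, 0,
                 &f[0],            nguard, MPI_DOUBLE, left,  0,
                 comm, MPI_STATUS_IGNORE);
    // And the symmetric exchange in the other direction.
    MPI_Sendrecv(&f[nguard],      nguard, MPI_DOUBLE, left,  1,
                 &f[n - nguard],  nguard, MPI_DOUBLE, right, 1,
                 comm, MPI_STATUS_IGNORE);
}
```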
Domain decomposition: MPI + OpenMP + patches in SMILEI
[Figure: each MPI subdomain is further divided into small patches handled by OpenMP threads]
Domain synchronization
If processors share memory ==> OpenMP.
If processors have distributed memory ==> MPI.
Same logic for particles.
Message Passing Interface (MPI)
Characteristics:
- Library
- Coarse grain
- Inter-node
- Distributed memory
- Used by almost all HPC codes
Issues:
- Latency
- OS jitter
- Global communication scalability
Open Multi-Threading (OpenMP)
Characteristics:
- Compiler directives
- Medium grain
- Intra-node
- Shared memory
- Used by many HPC codes
Issues:
- Thread creation overhead
- Memory/core affinity
- Interface with MPI (MPI_THREAD_MULTIPLE)
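The last issue refers to the thread-support level a hybrid code must request at startup. A minimal sketch with standard MPI calls (nothing SMILEI-specific):

```cpp
#include <mpi.h>
#include <cstdio>

// Request MPI_THREAD_MULTIPLE so that several OpenMP threads may call MPI
// concurrently; the library reports the level it actually provides.
int main(int argc, char** argv) {
    int provided = 0;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE)
        std::printf("Warning: MPI only provides thread level %d\n", provided);
    // ... hybrid MPI + OpenMP work ...
    MPI_Finalize();
    return 0;
}
```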
Structure
1. HPC environment, trends and prospects
2. The PIC method and its parallelization
3. The load balancing issue
Domain decomposition: MPI + OpenMP in SMILEI
[Figure: patches distributed among MPI processes and OpenMP threads]
OpenMP dynamic scheduler benefits
[Figure: time for 100 iterations [s] versus iteration number for several MPI x OpenMP configurations: 768x1, 384x2, 256x3, 128x6, 64x12]
The OpenMP dynamic scheduler is able to smooth the load, but only at the node level.
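The mechanism behind this smoothing is OpenMP's dynamic loop schedule over patches. A minimal sketch, with a placeholder patch type rather than SMILEI's real one:

```cpp
#include <vector>

struct Patch { std::vector<double> particles; };  // stand-in for a real patch
void advance(Patch&) { /* one PIC step on this patch */ }

// Patches carry very different particle loads, so schedule(dynamic) lets an
// idle thread grab the next pending patch instead of waiting on a static split.
void advance_all_patches(std::vector<Patch>& patches) {
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < static_cast<int>(patches.size()); ++i)
        advance(patches[i]);
}
```

Threads only balance work within their own MPI process, which is why the smoothing stops at the node level.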
Patch-based data structure
[Figure: the domain is decomposed into many small patches]
Hilbert ordering
We need a policy to assign patches to MPI processes. To do so, patches are organized along a one-dimensional space-filling curve (see the sketch below):
1. Continuous curve which goes across all patches.
2. Each patch is visited only once.
3. Two consecutive patches are neighbours.
4. In addition, we want compactness!
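For reference, here is the classic bit-twiddling computation of a patch's index along the 2D Hilbert curve. This is the generic textbook version for a square power-of-two grid, not necessarily SMILEI's implementation.

```cpp
// Map patch coordinates (x, y) on an n x n grid (n a power of two) to the
// patch's index d along the 2D Hilbert curve.
unsigned hilbert_index(unsigned n, unsigned x, unsigned y) {
    unsigned d = 0;
    for (unsigned s = n / 2; s > 0; s /= 2) {
        unsigned rx = (x & s) ? 1u : 0u;   // which half in x at this scale
        unsigned ry = (y & s) ? 1u : 0u;   // which half in y at this scale
        d += s * s * ((3u * rx) ^ ry);     // quadrant offset along the curve
        // Rotate/reflect the quadrant so the curve remains continuous.
        if (ry == 0) {
            if (rx == 1) { x = n - 1 - x; y = n - 1 - y; }
            unsigned t = x; x = y; y = t;
        }
    }
    return d;
}
```

Sorting patches by this index and cutting the sorted list into contiguous chunks of roughly equal load gives each MPI process a compact set of neighbouring patches.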
With dynamic load balancing activated
[Figure: time for 100 iterations [s] versus iteration number for the 128x6 and 64x12 configurations, with and without dynamic load balancing (DLB)]
Yellow and red are copied from the previous figure.
Dynamic evolution of MPI domains
[Figure: MPI domain shapes evolving during the run]
Color represents the local patch computational load imbalance $I_{\mathrm{loc}} = \log_{10}(L_{\mathrm{loc}} / L_{\mathrm{av}})$.
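A minimal sketch of this diagnostic, assuming a patch's load is modeled from its particle and cell counts; the weighting is a hypothetical choice, not SMILEI's exact cost model.

```cpp
#include <cmath>
#include <vector>

// Hypothetical per-patch computational load: particles dominate the cost,
// grid cells add an overhead weighted by cell_weight.
double patch_load(std::size_t nparticles, std::size_t ncells, double cell_weight) {
    return static_cast<double>(nparticles) + cell_weight * static_cast<double>(ncells);
}

// Imbalance diagnostic of the slide: I_loc = log10(L_loc / L_av).
// Positive values mark overloaded patches, negative values underloaded ones.
std::vector<double> imbalance(const std::vector<double>& loads) {
    double av = 0.0;
    for (double l : loads) av += l;
    av /= static_cast<double>(loads.size());
    std::vector<double> I(loads.size());
    for (std::size_t i = 0; i < loads.size(); ++i)
        I[i] = std::log10(loads[i] / av);
    return I;
}
```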