www.bsc.es
Petascale Tools Workshop, Madison, August 4th 2014
Jesús Labarta BSC
Analysis and Parallelization Optimizations of Weather Codes
Earth and Climate
– A complex system
– Multicomponent
– Dynamic
– High impact: Societal, …
– Cooperation with Rich Loft/John Dennis (NCAR); full-scale code; G8 ECS project
– Ocean model kernel; G8 ECS project
– Cooperation with Oriol Jorba, Georgios Markomanolis (BSC); full-scale code; developing chemical and transport modules on top of NMMB by NCEP
– Kernel by George Mozdzynski (ECMWF), mimicking some aspects of the IFS weather forecast code to investigate the issues and potential of hybrid task-based models; some very important restrictions:
  – More load imbalance than the real code
adv2 (gather–fft-scatter)* mono
“… speedup of MPI applications”, ICS 2008.
Convect_shallow_tend, Microp_driver_tend, aer_rad_props_sw, aer_rads_prop_lw, rrtmg_sg, rad_rrtmg_lw
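– This pattern often generates MPI imbalance: only rank 1 (myproc==1) computes and writes the min/max/average statistics
– FIRSTPRIVATE does useful memory management: each MIN_MAX task gets its own copy of znormsg, so the next mpi_gatherv can reuse the buffer while earlier tasks are still running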
do jv=1,nvars2d
  ifld=ifld+1
  do j=1,ngptot
    znorms(j)=zgp(ifld,j)
  enddo
  call mpi_gatherv(znorms(:),ngptot,MPI_REAL8,znormsg(:),…)
  if( myproc==1 )then
    !$OMP TASK PRIVATE (zmin, zmax, zave) INOUT(ZDUM) &
    !$OMP& FIRSTPRIVATE(ngptotg, nstep, jv, znormsg) &
    !$OMP& DEFAULT(NONE) LABEL(MIN_MAX)
    zmin=minval(znormsg(:))
    zmax=maxval(znormsg(:))
    zave=sum(znormsg(:))/real(ngptotg)
    write(*,…) nstep,jv,zmin,zmax,zave
    !$OMP END TASK
  endif
enddo
for (latitudes) physics
for (latitudes) pack
send/recv
unpack/transpose
ffts();
…

ffts() { for (fields) ffts }

for (latitudes) irecv
for (latitudes) { physics; pack; isend }
for (latitudes) wait
for (latitudes) unpack/transpose
ffts();
…

for (latitudes) physics
for (latitudes) pack
for (latitudes) irecv
for (latitudes) isend
for (latitudes) wait
for (latitudes) unpack/transpose
ffts();
…
Figure: model execution time (hh:mm:ss) and speedup (%) vs. number of MPI processes (16, 32, 64, 128); MPI vs. OmpSs+DLB