Technische Universität München
SIAM EX 14 – Workshop on Exascale Applied Mathematics Challenges and Opportunities
Sustained Petascale Performance of Seismic Simulations with SeisSol
M. Bader, A. Breuer, A. Heinecke, S. Rettenberger, C. Pelties, A.-A. Gabriel
Technische Universität München, Ludwig-Maximilians-Universität München
M. Bader et al.: Sustained Petascale Performance of Seismic Simulations with SeisSol, SIAM EX 14, July 7, 2014
HPC Meets Geoscience
Alexander Breuer, Alice-Agnes Gabriel, Alexander Heinecke, Christian Pelties, Sebastian Rettenberger
Overview and Agenda
SeisSol:
• dynamic rupture and seismic wave propagation
• unstructured tetrahedral meshes
• high-order ADER-DG discretisation
Optimisation for Heterogeneous Petascale Platforms:
• code generation to optimise element-local matrix kernels
• hybrid MPI/OpenMP parallelisation
• offload scheme to address multiphysics
Performance on Tianhe-2, Stampede and SuperMUC:
• weak scaling of wave propagation component
• strong scaling for 1992 Landers M7.2 earthquake
Dynamic Rupture and Earthquake Simulation
[Figure: Tohoku subduction zone, CAD model and tetrahedral mesh (C. Pelties)]
Use of Adaptive Tetrahedral Meshes:
• curved subduction zones that meet the surface at shallow angles → high impact on uplift for tsunamigenic earthquakes
• complicated fault systems with multiple branches → non-linear multiphysics dynamic rupture simulation
• goal: automated meshing process (incl. CAD generation)
Dynamic Rupture and Earthquake Simulation (continued)
[Figure: Landers fault system, simulated ground motion and tetrahedral mesh]
Seismic Wave Propagation with SeisSol
Elastic Wave Equations (velocity-stress formulation):
\[
q_t + A q_x + B q_y + C q_z = 0, \qquad
q = (\sigma_{11}, \sigma_{22}, \sigma_{33}, \sigma_{12}, \sigma_{23}, \sigma_{13}, u, v, w)^T
\]
with
\[
A = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 & -\lambda - 2\mu & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & -\lambda & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & -\lambda & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -\mu & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -\mu \\
-\rho^{-1} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & -\rho^{-1} & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & -\rho^{-1} & 0 & 0 & 0
\end{pmatrix},
\qquad
B = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & -\lambda & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -\lambda - 2\mu & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -\lambda & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & -\mu & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -\mu \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & -\rho^{-1} & 0 & 0 & 0 & 0 & 0 \\
0 & -\rho^{-1} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & -\rho^{-1} & 0 & 0 & 0 & 0
\end{pmatrix}
\]
• high-order discontinuous Galerkin discretisation
• ADER-DG: high approximation order in space and time
• additional features: local time stepping, high accuracy of earthquake faulting (full frictional sliding) → Dumbser, Käser et al. [3,5]
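As a quick consistency check on the Jacobian A of the velocity-stress system, the following minimal sketch builds A in plain Python and verifies that a P-wave travelling in x-direction is an eigenvector with eigenvalue −c_p, where c_p = sqrt((λ + 2µ)/ρ). The material values and the helper names `jacobian_a` and `matvec` are illustrative assumptions and are not taken from SeisSol.

```python
import math

def jacobian_a(lam, mu, rho):
    """Build the 9x9 Jacobian A for q = (s11, s22, s33, s12, s23, s13, u, v, w)."""
    a = [[0.0] * 9 for _ in range(9)]
    a[0][6] = -(lam + 2.0 * mu)   # d(s11)/dt couples to du/dx
    a[1][6] = -lam                # d(s22)/dt couples to du/dx
    a[2][6] = -lam                # d(s33)/dt couples to du/dx
    a[3][7] = -mu                 # d(s12)/dt couples to dv/dx
    a[5][8] = -mu                 # d(s13)/dt couples to dw/dx
    a[6][0] = -1.0 / rho          # du/dt couples to d(s11)/dx
    a[7][3] = -1.0 / rho          # dv/dt couples to d(s12)/dx
    a[8][5] = -1.0 / rho          # dw/dt couples to d(s13)/dx
    return a

def matvec(m, x):
    """Plain matrix-vector product."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]

# Hypothetical material parameters (SI units), chosen only for illustration.
lam, mu, rho = 3.0e10, 3.0e10, 2700.0
c_p = math.sqrt((lam + 2.0 * mu) / rho)

# Candidate eigenvector of A for the eigenvalue -c_p (P-wave in x-direction).
r_p = [lam + 2.0 * mu, lam, lam, 0.0, 0.0, 0.0, c_p, 0.0, 0.0]
a_r = matvec(jacobian_a(lam, mu, rho), r_p)

# Largest deviation between A*r and (-c_p)*r; should vanish up to rounding.
residual = max(abs(y - (-c_p) * x) for y, x in zip(a_r, r_p))
print(f"c_p = {c_p:.1f} m/s, residual = {residual:.3e}")
```

The same check can be repeated for B with the roles of u and v exchanged.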
SeisSol in a Nutshell – ADER-DG
Update scheme:
\[
\begin{aligned}
Q_k^{n+1} = Q_k^n
&- \frac{|S_k|}{|J_k|} M^{-1} \Bigg( \sum_{i=1}^{4} F^{-,i}\, I(t^n, t^{n+1}, Q_k^n)\, N_{k,i} A_k^{+} N_{k,i}^{-1} \\
&\qquad\qquad + \sum_{i=1}^{4} F^{+,i,j,h}\, I(t^n, t^{n+1}, Q_{k(i)}^n)\, N_{k,i} A_{k(i)}^{-} N_{k,i}^{-1} \Bigg) \\
&+ M^{-1} K^{\xi} I(t^n, t^{n+1}, Q_k^n) A_k^{*}
 + M^{-1} K^{\eta} I(t^n, t^{n+1}, Q_k^n) B_k^{*}
 + M^{-1} K^{\zeta} I(t^n, t^{n+1}, Q_k^n) C_k^{*}
\end{aligned}
\]
Cauchy-Kovalewski:
\[
I(t^n, t^{n+1}, Q_k^n) = \sum_{j=0}^{J} \frac{(t^{n+1} - t^n)^{j+1}}{(j+1)!} \frac{\partial^j}{\partial t^j} Q_k(t^n)
\]
\[
(Q_k)_t = -M^{-1} \left( (K^{\xi})^T Q_k A_k^{*} + (K^{\eta})^T Q_k B_k^{*} + (K^{\zeta})^T Q_k C_k^{*} \right)
\]
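The time-integrated Taylor expansion of the Cauchy-Kovalewski step can be illustrated on a scalar model problem q_t = a q, for which the j-th time derivative is simply a^j q. This is only a sketch under that simplifying assumption: in the scheme above the derivatives are obtained recursively from the matrix relation for (Q_k)_t, and the function name `ck_time_integral` is hypothetical.

```python
import math

def ck_time_integral(q0, a, dt, order):
    """Time-integrated Taylor series for q_t = a*q over one step of size dt.

    Evaluates sum_{j=0}^{order} dt^(j+1) / (j+1)! * d^j q / dt^j at t^n.
    """
    derivative = q0                     # j = 0: the solution itself
    integral = 0.0
    for j in range(order + 1):
        integral += dt ** (j + 1) / math.factorial(j + 1) * derivative
        derivative = a * derivative     # next time derivative from q_t = a*q
    return integral

q0, a, dt = 1.0, -2.0, 0.1
exact = q0 * (math.exp(a * dt) - 1.0) / a      # analytic integral of q over [0, dt]
approx = ck_time_integral(q0, a, dt, order=5)
print(f"exact = {exact:.12f}, approx = {approx:.12f}")
```

Raising `order` corresponds to raising J in the sum, which is how the scheme reaches a high approximation order in time within a single step.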
Optimisation of Sparse Matrix Operations
Apply sparse matrices to multiple DOF-vectors Q_k
[Figure: sparsity patterns of the matrix operation, axes indexed 0–34 and 0–8]
Code Generator for Sparse Kernels: (Breuer et al. [1])
• avoid the overhead of CSR (or similar) data structures; store only the vector of CSR non-zero values
• full “unrolling” of all element operations using a code generator
• use intrinsics and apply blocking to improve vectorisation
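The idea of generating a fully unrolled kernel for a fixed sparsity pattern can be sketched in a few lines. The toy generator below emits Python source rather than the vectorised, intrinsics-based kernels described in Breuer et al. [1], so it only illustrates the principle: the pattern is baked into the generated code, and only the flat vector of non-zero values is read at run time. The names `generate_kernel` and `sparse_times_dense` are hypothetical.

```python
def generate_kernel(pattern, n_cols, name="sparse_times_dense"):
    """Emit source for C += A*B, where A has a fixed sparsity pattern.

    pattern: list of (row, col) positions of A's non-zeros, in storage order.
    The generated function reads no row-pointer or column-index arrays.
    """
    lines = [f"def {name}(values, b, c):"]
    for k, (i, j) in enumerate(pattern):
        for col in range(n_cols):
            # One fully unrolled multiply-add per non-zero and column of B.
            lines.append(f"    c[{i}][{col}] += values[{k}] * b[{j}][{col}]")
    lines.append("    return c")
    return "\n".join(lines)

# Hypothetical 3x3 pattern with three non-zeros, applied to a 3x2 matrix B.
pattern = [(0, 0), (1, 2), (2, 1)]
source = generate_kernel(pattern, n_cols=2)

namespace = {}
exec(source, namespace)                 # compile the generated kernel
kernel = namespace["sparse_times_dense"]

values = [2.0, 3.0, 4.0]                # non-zero values only, no index arrays
b = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
c = kernel(values, b, [[0.0, 0.0] for _ in range(3)])
print(source)
print(c)
```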
Optimisation of Sparse Matrix Operations (continued)
Apply sparse matrices to multiple DOF-vectors Q_k
[Figure: sparsity patterns of the matrix operation, sparse and dense variants, axes indexed 0–34 and 0–8]
Dense vs. Sparse Kernels: (Breuer et al. [2])
• switch to dense kernels depending on achieved time to solution
• for sparse and dense kernels: exploit zero-blocks generated during recursive CK computation
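Choosing between a dense and a sparse kernel by measured run time can be sketched as a small auto-tuning step. The code below is an illustration under stated assumptions, not the selection procedure of Breuer et al. [2]: the kernels are naive pure-Python loops, the matrix sizes (35×35 and 35×9) and fill pattern are made up, and all function names are hypothetical.

```python
import time

def dense_matmul(a, b):
    """Dense product: touches every entry of A, including zeros."""
    m, p = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(m)) for j in range(p)] for row in a]

def sparse_matmul(entries, n_rows, b):
    """Sparse product: entries is a list of (row, col, value) non-zeros of A."""
    p = len(b[0])
    c = [[0.0] * p for _ in range(n_rows)]
    for i, k, v in entries:
        for j in range(p):
            c[i][j] += v * b[k][j]
    return c

def best_time(fn, repeats=5):
    """Smallest wall-clock time over a few repetitions."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

def choose_kernel(a, b, entries):
    """Return the name of the kernel variant with the shorter measured time."""
    t_dense = best_time(lambda: dense_matmul(a, b))
    t_sparse = best_time(lambda: sparse_matmul(entries, len(a), b))
    return "dense" if t_dense <= t_sparse else "sparse"

# Hypothetical test matrix: diagonal plus first column.
n = 35
a = [[0.0] * n for _ in range(n)]
for i in range(n):
    a[i][i] = 2.0
    a[i][0] = 1.0
b = [[float(k + j) for j in range(9)] for k in range(n)]
entries = [(i, k, v) for i, row in enumerate(a) for k, v in enumerate(row) if v != 0.0]

choice = choose_kernel(a, b, entries)
print("selected kernel:", choice)
```

Which variant wins depends on the fill ratio and on the machine, which is why the decision is made from measurements rather than fixed in advance.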
Mesh Generation and Partitioning
Mesh Generation:
• high-quality meshes required (shallow subduction zones, complicated fault structures)
• with 10^8–10^9 grid cells
• using SimModeler by Simmetrix ( http://simmetrix.com/ )
Two-stage approach to provide parallel mesh partitions:
• graph-based partitioning (ParMETIS)
• create customised parallel format (based on netCDF) for mesh partitions
• highly scalable mesh input via netCDF/MPI-IO in SeisSol
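Graph-based partitioners work on a graph derived from the mesh. A common choice is the dual graph, in which every cell is a graph vertex and two cells are connected when they share a face. The sketch below builds such a graph for a tiny tetrahedral mesh in plain Python. It is an assumption that this matches the graph handed to ParMETIS in the SeisSol workflow; the slides do not say, and no ParMETIS call is shown. The function name `dual_graph` and the three-cell mesh are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations

def dual_graph(tets):
    """Return adjacency sets: two tetrahedra are neighbours if they share a face."""
    face_to_cells = defaultdict(list)
    for cell, verts in enumerate(tets):
        # Each tetrahedron has four triangular faces (all 3-subsets of its vertices).
        for face in combinations(sorted(verts), 3):
            face_to_cells[face].append(cell)

    adjacency = {cell: set() for cell in range(len(tets))}
    for cells in face_to_cells.values():
        for c1, c2 in combinations(cells, 2):
            adjacency[c1].add(c2)
            adjacency[c2].add(c1)
    return adjacency

# Hypothetical mesh: three tetrahedra in a row, given by vertex indices.
tets = [(0, 1, 2, 3), (1, 2, 3, 4), (2, 3, 4, 5)]
graph = dual_graph(tets)
print(graph)
```

A partitioner then assigns graph vertices to ranks so that the number of cut edges, which correspond to faces requiring communication, stays small.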
Optimization for Intel Xeon Phi Platforms
Offload Scheme:
• to address load imbalances of multiphysics simulation
• hides communication with Xeon Phi and between nodes
OpenMP parallelisation:
• to address manycore parallelism with 1–3 coprocessors
• careful parallelisation of all loops
[Figure: offload timeline with columns Host, PCIe and Xeon Phi. Labelled steps: time integration of MPI boundary cells; download cells for receivers, DR, MPI; time integration of non-MPI cells, volume integration; MPI comm., receiver output; upload MPI-received cells; dynamic rupture fluxes, fault output; wave propagation fluxes; upload dynamic rupture updates; apply dynamic rupture updates, pack transfer data; download all data (if required); plot wave field (if required)]
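The communication-hiding idea in the offload scheme, integrating the MPI boundary cells first so that their data can be transferred while the remaining cells are still being computed, can be sketched with a background thread standing in for the transfer. This is a toy model under strong assumptions: `integrate` and `exchange` are placeholders, no MPI or coprocessor offload is involved, and the exact ordering in SeisSol may differ from this simplification.

```python
from concurrent.futures import ThreadPoolExecutor

def integrate(cells):
    """Placeholder for time integration: doubles each cell value."""
    return {cell: 2.0 * value for cell, value in cells.items()}

def exchange(boundary):
    """Placeholder for the transfer and exchange of boundary-cell data."""
    return dict(boundary)   # a real code would send this to neighbouring ranks

def time_step(cells, boundary_ids):
    boundary = {c: v for c, v in cells.items() if c in boundary_ids}
    interior = {c: v for c, v in cells.items() if c not in boundary_ids}

    # 1. Integrate the boundary cells first so their data is ready to send.
    boundary_done = integrate(boundary)

    with ThreadPoolExecutor(max_workers=1) as pool:
        # 2. Start the exchange in the background ...
        pending = pool.submit(exchange, boundary_done)
        # 3. ... and integrate the interior cells in the meantime.
        interior_done = integrate(interior)
        sent = pending.result()

    return {**boundary_done, **interior_done}, sent

# Hypothetical four-cell domain where cells 0 and 3 touch a partition boundary.
cells = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
updated, sent = time_step(cells, boundary_ids={0, 3})
print(updated, sent)
```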
Supercomputing Platforms
SuperMUC @ LRZ, Munich
• 9216 compute nodes (18 “thin node” islands), 147,456 Intel SNB-EP cores (2.7 GHz)
• InfiniBand FDR10 interconnect (fat tree)
• #12 in Top 500: 2.897 PFlop/s
Stampede @ TACC, Austin
• 6400 compute nodes, 522,080 cores; 2 SNB-EP (8c) + 1 Xeon Phi SE10P per node
• Mellanox FDR 56 interconnect (fat tree)
• #7 in Top 500: 5.168 PFlop/s
Tianhe-2 @ NSCC, Guangzhou
• 8000 compute nodes used, 1.6 million cores; 2 SNB-EP (12c) + 3 Xeon Phi 31S1P per node
• TH2-Express custom interconnect
• #1 in Top 500: 33.862 PFlop/s