Sustained Petascale Performance of Seismic Simulations with SeisSol

Technische Universität München. SIAM EX 14: Workshop on Exascale Applied Mathematics Challenges and Opportunities. Sustained Petascale Performance of Seismic Simulations with SeisSol. M. Bader, A. Breuer, A. Heinecke, S. Rettenberger


  1. Technische Universität München. SIAM EX 14: Workshop on Exascale Applied Mathematics Challenges and Opportunities
     Sustained Petascale Performance of Seismic Simulations with SeisSol
     M. Bader, A. Breuer, A. Heinecke, S. Rettenberger, C. Pelties, A.-A. Gabriel
     Technische Universität München, Ludwig-Maximilians-Universität München
     M. Bader et al.: Sustained Petascale Performance of Seismic Simulations with SeisSol, SIAM EX 14, July 7, 2014

  2. HPC Meets Geoscience
     Alexander Breuer, Alice-Agnes Gabriel, Alexander Heinecke, Christian Pelties, Sebastian Rettenberger

  3. Overview and Agenda
     SeisSol:
     • dynamic rupture and seismic wave propagation
     • unstructured tetrahedral meshes
     • high-order ADER-DG discretisation
     Optimisation for Heterogeneous Petascale Platforms:
     • code generation to optimise element-local matrix kernels
     • hybrid MPI/OpenMP parallelisation
     • offload scheme to address multiphysics
     Performance on Tianhe-2, Stampede and SuperMUC:
     • weak scaling of the wave propagation component
     • strong scaling for the 1992 Landers M7.2 earthquake

  4. Dynamic Rupture and Earthquake Simulation
     Figure: Tohoku subduction zone, CAD model and tetrahedral mesh (C. Pelties)
     Use of Adaptive Tetrahedral Meshes:
     • curved subduction zones that meet the surface at shallow angles → high impact on uplift for tsunamigenic earthquakes
     • complicated fault systems with multiple branches → non-linear multiphysics dynamic rupture simulation
     • goal: automated meshing process (incl. CAD generation)

  5. Dynamic Rupture and Earthquake Simulation
     Figure: Landers fault system, simulated ground motion and tetrahedral mesh

  6. Seismic Wave Propagation with SeisSol
     Elastic Wave Equations (velocity-stress formulation):
     $$ q_t + A q_x + B q_y + C q_z = 0, \qquad q = (\sigma_{11}, \sigma_{22}, \sigma_{33}, \sigma_{12}, \sigma_{23}, \sigma_{13}, u, v, w)^T $$
     with
     $$ A = \begin{pmatrix}
     0 & 0 & 0 & 0 & 0 & 0 & -\lambda-2\mu & 0 & 0 \\
     0 & 0 & 0 & 0 & 0 & 0 & -\lambda & 0 & 0 \\
     0 & 0 & 0 & 0 & 0 & 0 & -\lambda & 0 & 0 \\
     0 & 0 & 0 & 0 & 0 & 0 & 0 & -\mu & 0 \\
     0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
     0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -\mu \\
     -\rho^{-1} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
     0 & 0 & 0 & -\rho^{-1} & 0 & 0 & 0 & 0 & 0 \\
     0 & 0 & 0 & 0 & 0 & -\rho^{-1} & 0 & 0 & 0
     \end{pmatrix}, \qquad
     B = \begin{pmatrix}
     0 & 0 & 0 & 0 & 0 & 0 & 0 & -\lambda & 0 \\
     0 & 0 & 0 & 0 & 0 & 0 & 0 & -\lambda-2\mu & 0 \\
     0 & 0 & 0 & 0 & 0 & 0 & 0 & -\lambda & 0 \\
     0 & 0 & 0 & 0 & 0 & 0 & -\mu & 0 & 0 \\
     0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -\mu \\
     0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
     0 & 0 & 0 & -\rho^{-1} & 0 & 0 & 0 & 0 & 0 \\
     0 & -\rho^{-1} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
     0 & 0 & 0 & 0 & -\rho^{-1} & 0 & 0 & 0 & 0
     \end{pmatrix} $$
     • high order discontinuous Galerkin discretisation
     • ADER-DG: high approximation order in space and time
     • additional features: local time stepping, high accuracy of earthquake faulting (full frictional sliding)
     → Dumbser, Käser et al. [3,5]
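The structure of A can be checked numerically. The following is a minimal sketch, not SeisSol code: it builds A for the variable ordering given on the slide and verifies that a P-wave and an S-wave vector are eigenvectors with the speeds c_p = sqrt((λ+2μ)/ρ) and c_s = sqrt(μ/ρ). The function names and the material parameters are assumptions chosen for illustration.

```python
import math


def jacobian_a(lam, mu, rho):
    """Return the 9x9 matrix A for q = (s11, s22, s33, s12, s23, s13, u, v, w)."""
    a = [[0.0] * 9 for _ in range(9)]
    a[0][6] = -(lam + 2.0 * mu)  # sigma_11 is driven by u_x
    a[1][6] = -lam               # sigma_22 is driven by u_x
    a[2][6] = -lam               # sigma_33 is driven by u_x
    a[3][7] = -mu                # sigma_12 is driven by v_x
    a[5][8] = -mu                # sigma_13 is driven by w_x
    a[6][0] = -1.0 / rho         # u is driven by (sigma_11)_x
    a[7][3] = -1.0 / rho         # v is driven by (sigma_12)_x
    a[8][5] = -1.0 / rho         # w is driven by (sigma_13)_x
    return a


def matvec(matrix, vector):
    """Plain dense matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def is_eigenpair(matrix, vector, speed):
    """Check matrix @ vector == speed * vector up to rounding."""
    product = matvec(matrix, vector)
    return all(
        math.isclose(lhs, speed * v, rel_tol=1e-9, abs_tol=1e-6)
        for lhs, v in zip(product, vector)
    )


# Assumed, illustrative material parameters in SI units (not from the slides).
LAM, MU, RHO = 3.2e10, 3.2e10, 2670.0
C_P = math.sqrt((LAM + 2.0 * MU) / RHO)
C_S = math.sqrt(MU / RHO)

A = jacobian_a(LAM, MU, RHO)
P_WAVE = [-(LAM + 2.0 * MU), -LAM, -LAM, 0.0, 0.0, 0.0, C_P, 0.0, 0.0]
S_WAVE = [0.0, 0.0, 0.0, -MU, 0.0, 0.0, 0.0, C_S, 0.0]

print(is_eigenpair(A, P_WAVE, C_P), is_eigenpair(A, S_WAVE, C_S))
```

The check only uses the non-zero pattern of A, which is why the matrix kernels discussed later can treat such operators as sparse.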

  7. SeisSol in a Nutshell – ADER-DG
     Update scheme:
     $$ Q_k^{n+1} = Q_k^n - \frac{|S_k|}{|J_k|} M^{-1} \left( \sum_{i=1}^{4} F^{-,i}\, I(t^n, t^{n+1}, Q_k^n)\, N_{k,i} A_k^{+} N_{k,i}^{-1} + \sum_{i=1}^{4} F^{+,i,j,h}\, I(t^n, t^{n+1}, Q_{k(i)}^n)\, N_{k,i} A_{k(i)}^{-} N_{k,i}^{-1} \right) $$
     $$ \qquad\quad + M^{-1} K^{\xi}\, I(t^n, t^{n+1}, Q_k^n)\, A_k^{*} + M^{-1} K^{\eta}\, I(t^n, t^{n+1}, Q_k^n)\, B_k^{*} + M^{-1} K^{\zeta}\, I(t^n, t^{n+1}, Q_k^n)\, C_k^{*} $$
     Cauchy-Kovalewski:
     $$ I(t^n, t^{n+1}, Q_k^n) = \sum_{j=0}^{J} \frac{(t^{n+1} - t^n)^{j+1}}{(j+1)!} \frac{\partial^j}{\partial t^j} Q_k(t^n) $$
     $$ (Q_k)_t = -M^{-1} \left( (K^{\xi})^T Q_k A_k^{*} + (K^{\eta})^T Q_k B_k^{*} + (K^{\zeta})^T Q_k C_k^{*} \right) $$
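The Cauchy-Kovalewski step can be illustrated on a small linear system. This is a sketch under simplifying assumptions, not the SeisSol kernel: `apply_rhs` is a hypothetical stand-in for the right-hand side of (Q_k)_t, higher time derivatives are obtained by applying it recursively, and the time-integrated Taylor series is accumulated with the weights (Δt)^{j+1}/(j+1)!.

```python
import math


def ck_time_integral(apply_rhs, q0, dt, n_terms):
    """Integrate q over [t, t + dt] with a truncated Taylor series.

    apply_rhs maps a time derivative of q to the next higher one
    (valid because the right-hand side is linear in q).
    n_terms is the number of series terms, i.e. j = 0 .. n_terms - 1.
    """
    derivative = list(q0)            # j = 0: the solution itself
    integral = [0.0] * len(q0)
    for j in range(n_terms):
        weight = dt ** (j + 1) / math.factorial(j + 1)
        integral = [acc + weight * d for acc, d in zip(integral, derivative)]
        derivative = apply_rhs(derivative)   # next time derivative
    return integral


# Toy problem (assumed): two decoupled decay equations q_t = -rate * q.
RATES = [2.0, 3.0]


def decay_rhs(q):
    return [-rate * value for rate, value in zip(RATES, q)]


Q0 = [1.0, 0.5]
DT = 0.1
APPROX = ck_time_integral(decay_rhs, Q0, DT, n_terms=6)
# Closed-form time integral of q0 * exp(-rate * t) over [0, DT].
EXACT = [q * (1.0 - math.exp(-rate * DT)) / rate for q, rate in zip(Q0, RATES)]
print(APPROX, EXACT)
```

With six terms the truncation error of this toy problem is far below the tolerance one would use for a unit test, which mirrors the high approximation order in time that the scheme aims for.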

  8. Optimisation of Sparse Matrix Operations
     Apply sparse matrices to multiple DOF-vectors Q_k
     Figure: matrix sparsity patterns (row/column indices 0 to 34 and 0 to 8)
     Code Generator for Sparse Kernels (Breuer et al. [1]):
     • avoid overhead of CSR (or similar) data structures; store only the CSR elements vector
     • full “unrolling” of all element operations using a code generator
     • use intrinsics and apply blocking to improve vectorisation
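The idea of a fully unrolled sparse kernel can be sketched in a few lines. The generator below is a hypothetical illustration, not the generator of Breuer et al.: it emits Python source rather than C with intrinsics, and the names `generate_sparse_kernel` and `compile_kernel` are assumptions. It bakes every non-zero of a fixed matrix K into one straight-line statement per column of Q, so the generated code needs no index arrays and no loops.

```python
def generate_sparse_kernel(name, nonzeros, n_columns):
    """Emit source for `out += K @ q`, unrolled over the non-zeros of K.

    nonzeros is a list of (row, col, value) triples of the fixed matrix K;
    q and out are lists of rows, each row holding n_columns entries.
    """
    lines = [f"def {name}(q, out):"]
    for row, col, value in nonzeros:
        for c in range(n_columns):
            lines.append(f"    out[{row}][{c}] += {value!r} * q[{col}][{c}]")
    if len(lines) == 1:          # empty matrix: keep the function valid
        lines.append("    pass")
    return "\n".join(lines) + "\n"


def compile_kernel(source, name):
    """Turn generated source into a callable."""
    namespace = {}
    exec(source, namespace)
    return namespace[name]


# Assumed toy operands: a 3x3 sparse matrix applied to two DOF columns.
NONZEROS = [(0, 1, 2.0), (2, 0, -1.5), (2, 2, 0.5)]
SOURCE = generate_sparse_kernel("kernel", NONZEROS, n_columns=2)
KERNEL = compile_kernel(SOURCE, "kernel")

Q = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
OUT = [[0.0, 0.0] for _ in range(3)]
KERNEL(Q, OUT)
print(SOURCE)
print(OUT)
```

A production generator would additionally group the emitted operations into blocks that map onto vector registers, which is what the intrinsics and blocking bullet refers to.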

  9. Optimisation of Sparse Matrix Operations
     Figure: matrix sparsity patterns (row/column indices 0 to 34 and 0 to 8)
     Dense vs. Sparse Kernels (Breuer et al. [2]):
     • switch to dense kernels depending on achieved time to solution
     • for sparse and dense kernels: exploit zero-blocks generated during recursive CK computation
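Choosing between a dense and a sparse kernel by measured run time can be sketched as follows. This is a toy selection loop under assumed names (`best_time`, `pick_kernel`), not the tuning procedure of the cited work; the clock is injectable so that the selection logic itself can be tested deterministically.

```python
import time


def dense_apply(matrix, vector):
    """Dense product: touches every entry, zeros included."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def sparse_apply(nonzeros, n_rows, vector):
    """Sparse product: touches only the stored non-zeros."""
    out = [0.0] * n_rows
    for row, col, value in nonzeros:
        out[row] += value * vector[col]
    return out


def best_time(run, repeats=5, clock=time.perf_counter):
    """Smallest wall-clock time over several runs of a zero-argument callable."""
    best = float("inf")
    for _ in range(repeats):
        start = clock()
        run()
        best = min(best, clock() - start)
    return best


def pick_kernel(candidates, repeats=5, clock=time.perf_counter):
    """Return the name of the fastest candidate and all measured timings."""
    timings = {name: best_time(run, repeats, clock) for name, run in candidates.items()}
    return min(timings, key=timings.get), timings


# Assumed toy operands; real kernels would act on element-local matrices.
MATRIX = [[0.0, 2.0, 0.0], [0.0, 0.0, 0.0], [-1.5, 0.0, 0.5]]
NONZEROS = [(0, 1, 2.0), (2, 0, -1.5), (2, 2, 0.5)]
VECTOR = [1.0, 3.0, 5.0]
CANDIDATES = {
    "dense": lambda: dense_apply(MATRIX, VECTOR),
    "sparse": lambda: sparse_apply(NONZEROS, 3, VECTOR),
}
CHOICE, TIMINGS = pick_kernel(CANDIDATES)
print(CHOICE, TIMINGS)
```

Which variant wins depends on matrix size, fill-in and hardware, so the printed choice is machine-dependent; the point is only that the decision is driven by measured time to solution rather than by operation count.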

  10. Mesh Generation and Partitioning
      Mesh Generation:
      • high-quality meshes required (shallow subduction zones, complicated fault structures)
      • with 10^8 to 10^9 grid cells
      • using SimModeler by Simmetrix ( http://simmetrix.com/ )
      Two-stage approach to provide parallel mesh partitions:
      • graph-based partitioning (ParMETIS)
      • create customised parallel format (based on netCDF) for mesh partitions
      • highly scalable mesh input via netCDF/MPI-IO in SeisSol

  11. Optimization for Intel Xeon Phi Platforms
      Offload Scheme:
      • to address load imbalances of multiphysics simulation
      • hides communication with Xeon Phi and between nodes
      OpenMP parallelisation:
      • to address manycore parallelism with 1–3 coprocessors
      • careful parallelisation of all loops
      Figure: offload timeline across Host, PCIe and Xeon Phi, with the steps in order:
      • time integration of MPI boundary cells
      • download cells for receivers, DR, MPI
      • MPI comm., receiver output, dynamic rupture fluxes, fault output; concurrently: time integration of non-MPI cells, volume integration
      • upload MPI-received cells
      • wave propagation fluxes
      • upload dynamic rupture updates
      • apply dynamic rupture updates, pack transfer data
      • download all data (if required)
      • plot wave field (if required)
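The overlap at the heart of the offload scheme can be mimicked with a background worker. The sketch below is a toy model only: a Python thread stands in for the coprocessor, and `integrate`, `exchange` and `run_time_step` are hypothetical names, not SeisSol or Intel offload APIs. Boundary cells are integrated first so that their data can be handed to the communication step, which then runs while the interior cells are still being integrated.

```python
from concurrent.futures import ThreadPoolExecutor


def run_time_step(boundary_cells, interior_cells, integrate, exchange):
    """One time step with communication overlapped by interior work."""
    with ThreadPoolExecutor(max_workers=1) as coprocessor:
        # 1. Boundary cells first: their results are needed by the neighbours.
        boundary_future = coprocessor.submit(
            lambda: [integrate(cell) for cell in boundary_cells]
        )
        boundary_result = boundary_future.result()
        # 2. Interior cells are integrated in the background ...
        interior_future = coprocessor.submit(
            lambda: [integrate(cell) for cell in interior_cells]
        )
        # 3. ... while the host exchanges the boundary data.
        received = exchange(boundary_result)
        interior_result = interior_future.result()
    return boundary_result, interior_result, received


# Assumed stand-ins for the real work: doubling a value and a mock exchange.
def toy_integrate(cell):
    return 2 * cell


def toy_exchange(boundary_data):
    return list(reversed(boundary_data))


RESULT = run_time_step([1, 2], [3, 4, 5], toy_integrate, toy_exchange)
print(RESULT)
```

In this toy the overlap brings no speed-up, since Python threads share one interpreter; it only shows the ordering of the steps and where the host is free to communicate.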

  12. Supercomputing Platforms
      SuperMUC @ LRZ, Munich
      • 9216 compute nodes (18 “thin node” islands), 147,456 Intel SNB-EP cores (2.7 GHz)
      • Infiniband FDR10 interconnect (fat tree)
      • #12 in Top 500: 2.897 PFlop/s
      Stampede @ TACC, Austin
      • 6400 compute nodes, 522,080 cores; 2 SNB-EP (8c) + 1 Xeon Phi SE10P per node
      • Mellanox FDR 56 interconnect (fat tree)
      • #7 in Top 500: 5.168 PFlop/s
      Tianhe-2 @ NSCC, Guangzhou
      • 8000 compute nodes used, 1.6 million cores; 2 SNB-EP (12c) + 3 Xeon Phi 31S1P per node
      • TH2-Express custom interconnect
      • #1 in Top 500: 33.862 PFlop/s
