Flexible, Scalable Mesh and Data Management using PETSc DMPlex
M. Lange (1), M. Knepley (2), G. Gorman (1)
(1) AMCG, Imperial College London
(2) Computation Institute, University of Chicago
April 23, 2015
M. Lange, M. Knepley, G. Gorman DMPlex Mesh Management

Unstructured Mesh Management
Mesh management
◮ Many tasks are common across applications:
  ◮ Mesh input, partitioning, checkpointing, ...
◮ File I/O can become a severe bottleneck!
Mesh file formats
◮ Range of mesh generators and formats
  ◮ Gmsh, Cubit, Triangle, ExodusII, CGNS, SILO, ...
◮ No universally accepted format
◮ Applications often “roll their own”
◮ No interoperability between codes

Unstructured Mesh Management
Finding the right level of abstraction
◮ Abstract mesh topology interface
  ◮ Provided by a widely used library
  ◮ Extensible support for multiple formats
  ◮ Single point for extension and optimisation
  ◮ Many applications inherit capabilities
◮ Mesh management optimisations
  ◮ Scalable read/write routines
  ◮ Parallel partitioning and load-balancing
  ◮ Mesh renumbering techniques
  ◮ Parallel mesh adaptivity

DMPlex: Mesh topology abstraction
DMPlex - PETSc’s unstructured mesh API [1]
◮ Abstract mesh connectivity
  ◮ Directed Acyclic Graph (DAG) [2]
◮ Dimensionless access
  ◮ Topology separate from discretisation
◮ Pre-allocate data structures
◮ Enables new preconditioners
  ◮ FieldSplit
  ◮ Geometric Multigrid
[Figure: example mesh with numbered points, shown alongside its DAG.]
[1] M. Knepley and D. Karpeev. Mesh Algorithms for PDE with Sieve I: Mesh Distribution. Sci. Program., 17(3):215–230, August 2009.
[2] Anders Logg. Efficient representation of computational meshes. International Journal of Computational Science and Engineering, 4:283–295, 2009.

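A minimal Python sketch of the DAG idea on this slide, assuming a hypothetical mesh of two triangles that share one edge. The numbering (cells 0-1, vertices 2-5, edges 6-10) and the helper names `cone`, `support` and `closure` are illustrative only: they mirror the concepts behind DMPlex's `DMPlexGetCone`, `DMPlexGetSupport` and `DMPlexGetTransitiveClosure`, but this is not PETSc code and the mesh is not the one in the slide's figure.

```python
from collections import defaultdict, deque

# Hypothetical mesh: two triangles sharing an edge. All points (cells,
# edges, vertices) live in one index space; this numbering is an assumption.
#   cells 0-1, vertices 2-5, edges 6-10
CONES = {
    0: [6, 7, 8],    # cell 0 is bounded by edges 6, 7, 8
    1: [8, 9, 10],   # cell 1 shares edge 8 with cell 0
    6: [2, 3],       # each edge is bounded by two vertices
    7: [3, 4],
    8: [4, 2],
    9: [2, 5],
    10: [5, 4],
}


def build_supports(cones):
    """Invert the cone relation: supports[q] lists the points that q bounds."""
    supports = defaultdict(list)
    for point, cone_points in cones.items():
        for q in cone_points:
            supports[q].append(point)
    return dict(supports)


SUPPORTS = build_supports(CONES)


def cone(point):
    """Points one level down the DAG (vertices have an empty cone)."""
    return CONES.get(point, [])


def support(point):
    """Points one level up the DAG (cells have an empty support)."""
    return SUPPORTS.get(point, [])


def closure(point):
    """The point itself plus everything reachable by following cones."""
    seen = {point}
    order = [point]
    queue = deque([point])
    while queue:
        p = queue.popleft()
        for q in cone(p):
            if q not in seen:
                seen.add(q)
                order.append(q)
                queue.append(q)
    return order


if __name__ == "__main__":
    print("closure of cell 0:", closure(0))
    print("cells touching edge 8:", support(8))
    print("points shared by both cells:", sorted(set(closure(0)) & set(closure(1))))
```

None of the three queries refers to a spatial dimension; a 3D mesh would only add a layer of faces to `CONES`, which is one way to read "dimensionless access".
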
DMPlex: Mesh topology abstraction
DMPlex - PETSc’s unstructured mesh API [1]
◮ Input: ExodusII, Gmsh, CGNS, Fluent-Case
◮ Output: VTK, HDF5 + Xdmf
  ◮ Visualizable checkpoints
◮ Parallel distribution
  ◮ Partitioners: Chaco, Metis/ParMetis
  ◮ Automated halo exchange via PetscSF
◮ Mesh renumbering
  ◮ Reverse Cuthill-McKee (RCM)
[Figure: example mesh with numbered points, shown alongside its DAG.]
[1] M. Knepley and D. Karpeev. Mesh Algorithms for PDE with Sieve I: Mesh Distribution. Sci. Program., 17(3):215–230, August 2009.

Fluidity-DMPlex Integration
Fluidity [1]
◮ Unstructured finite element code
◮ Anisotropic mesh adaptivity
◮ Uses PETSc as linear solver engine
◮ Applications:
  ◮ CFD, geophysical flows, ocean modelling, reservoir modelling, mining, nuclear safety, renewable energies, etc.
Bottleneck: Parallel pre-processing
[1] X. Guo, M. Lange, G. Gorman, L. Mitchell, and M. Weiland. Developing a scalable hybrid MPI/OpenMP unstructured finite element model. Computers & Fluids, 110:227–234, 2015. ParCFD 2013.

Fluidity - DMPlex Integration
[Diagram: three workflow variants labelled Original, Current and Goal. Boxes in the diagram: Preprocessor, Fluidity, Mesh, Fields, Zoltan, DMPlex, DMPlexDistribute, Load Balance.]

Fluidity - DMPlex Integration
DMPlexDistribute
◮ Before:
  ◮ One-to-many
  ◮ Single-level overlap
  ◮ Overlap is expensive
◮ After:
  ◮ Generic mesh migration
  ◮ Parallel N-level overlap
  ◮ All-to-all via ParMetis
◮ Available to other codes
  ◮ Firedrake, Moose, ...
[Figure: "DMPlexDistribute on unit cube (2048^3 cells)". Time [sec] against number of processors (2 to 96). Series: Distribute, DistributeOverlap, Distribute::Mesh Partition, Distribute::Mesh Migration.]
[Figure: "Load balancing a unit cube (2048^3 cells)". Time [sec] against number of processors (2 to 96). Series: Redistribute::Mesh Partition, Redistribute::Mesh Migration.]

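To make "N-level overlap" concrete, here is a small serial Python sketch, assuming a hypothetical chain of 8 cells split between 2 ranks. The function name `grow_overlap` and the data layout are invented for illustration; the sketch only shows which remote cells a rank would need for a given number of levels, and it is not the parallel algorithm that DMPlex uses.

```python
def grow_overlap(adjacency, owner, rank, levels):
    """Return the cells not owned by `rank` that lie within `levels`
    adjacency hops of a cell that `rank` owns."""
    local = {cell for cell, r in owner.items() if r == rank}
    frontier = set(local)
    overlap = set()
    for _ in range(levels):
        next_frontier = set()
        for cell in frontier:
            for neighbour in adjacency[cell]:
                if neighbour not in local and neighbour not in overlap:
                    next_frontier.add(neighbour)
        overlap |= next_frontier
        frontier = next_frontier  # grow the next level from the newest layer
    return overlap


# Hypothetical 1D chain of 8 cells: cell i touches cells i-1 and i+1.
N_CELLS = 8
ADJACENCY = {
    i: [j for j in (i - 1, i + 1) if 0 <= j < N_CELLS] for i in range(N_CELLS)
}
# Assumed partition: rank 0 owns cells 0-3, rank 1 owns cells 4-7.
OWNER = {i: 0 if i < 4 else 1 for i in range(N_CELLS)}

if __name__ == "__main__":
    for levels in (1, 2):
        for rank in (0, 1):
            cells = sorted(grow_overlap(ADJACENCY, OWNER, rank, levels))
            print(f"rank {rank}, {levels}-level overlap: {cells}")
```

Each extra level is grown from the layer added in the previous step, so the cost of a deeper overlap depends on the size of the partition boundary rather than on the size of the whole mesh.
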
Fluidity-DMPlex Integration
Mesh reordering
◮ Fluidity Halos
  ◮ Separate L1/L2 regions
  ◮ “Trailing receives”
  ◮ Requires permutation
◮ DMPlex provides RCM
  ◮ Generated locally
  ◮ Fields inherit reordering
  ◮ Better cache coherency
[Figure: panels comparing native ordering with RCM reordering, for serial and parallel cases.]

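The effect of RCM renumbering can be illustrated with a short Python sketch, assuming a hypothetical 7-node path graph whose native numbering is scrambled. It is a textbook Reverse Cuthill-McKee with a simple minimum-degree starting node, not PETSc's implementation; the names `rcm_order` and `bandwidth` and the sample field are invented for the example.

```python
from collections import deque


def rcm_order(adjacency):
    """Reverse Cuthill-McKee: breadth-first search that visits neighbours
    in order of increasing degree, with the final order reversed.
    Returns a list where order[new_index] = old_index."""
    n = len(adjacency)
    degree = [len(neighbours) for neighbours in adjacency]
    visited = [False] * n
    order = []
    # Start each connected component from an unvisited node of minimum degree.
    for start in sorted(range(n), key=lambda v: (degree[v], v)):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in sorted(adjacency[v], key=lambda u: (degree[u], u)):
                if not visited[w]:
                    visited[w] = True
                    queue.append(w)
    order.reverse()
    return order


def bandwidth(adjacency, order):
    """Largest index distance between connected nodes under `order`."""
    new_index = {old: new for new, old in enumerate(order)}
    return max(
        (abs(new_index[v] - new_index[w])
         for v in range(len(adjacency)) for w in adjacency[v]),
        default=0,
    )


# Hypothetical path graph 0-4-1-5-2-6-3 with a scrambled native numbering.
EDGES = [(0, 4), (4, 1), (1, 5), (5, 2), (2, 6), (6, 3)]
ADJ = [[] for _ in range(7)]
for a, b in EDGES:
    ADJ[a].append(b)
    ADJ[b].append(a)

NATIVE = list(range(7))
RCM = rcm_order(ADJ)

# A field stored per node follows the same permutation as the mesh points.
FIELD = [10 * v for v in range(7)]
FIELD_RCM = [FIELD[old] for old in RCM]

if __name__ == "__main__":
    print("native bandwidth:", bandwidth(ADJ, NATIVE))
    print("RCM order:", RCM)
    print("RCM bandwidth:", bandwidth(ADJ, RCM))
```

A smaller bandwidth means that neighbouring mesh entities sit closer together in memory, which is the mechanism behind the cache benefit listed on the slide.
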
Benchmark
Archer
◮ Cray XC30
◮ 4920 nodes (118,080 cores)
◮ 12-core E5-2697 (Ivy Bridge)
Simulation
◮ Flow past a square cylinder
◮ 3D mesh, generated with Gmsh

Results - Simulation Startup
Startup on 4 nodes
◮ Runtime distribution wins
  ◮ Fast topology distribution
◮ No clear I/O gains
  ◮ Gmsh does not scale
[Figure: "Fluidity Startup". Time [sec] against mesh size [elements] (8615, 1183692, 1942842, 2944992). Series: Preprocessor, Fluidity (preprocessed), Fluidity Total, Fluidity-DMPlex.]
[Figure: "Fluidity Startup - Distribute". Time [sec] against mesh size [elements]. Series: Zoltan + Callbacks, DMPlexDistribute.]
[Figure: "Fluidity Startup - File I/O". Time [sec] against mesh size [elements]. Series: Preprocessor-Read, Preprocessor-Write, Fluidity-Read, Fluidity-Total, DMPlex-DAG.]

Results - Simulation Performance
Performance
◮ Mesh with ∼2 million elements
◮ Preprocessor + 10 timesteps
◮ RCM brings improvements
  ◮ Pressure solve
  ◮ Velocity assembly
[Figures: "Full Simulation", "Pressure Solve" and "Velocity Assembly". Each shows time [sec] against number of processes (2 to 96). Series: Fluidity-DMPlex: RCM, Fluidity-DMPlex: native, Fluidity-Preprocessor.]

Discussion and Future Work
DMPlex mesh management for Fluidity
◮ No need to preprocess
◮ Increased interoperability:
  ◮ ExodusII, CGNS, Fluent-Case
◮ Performance benefits
  ◮ Fast runtime mesh distribution
  ◮ Optional RCM renumbering
Future work
◮ DMPlex-based checkpointing in Fluidity
◮ Scalable parallel mesh reads with DMPlex
◮ Anisotropic mesh adaptivity via DMPlex