Uniting Performance and Extensibility in Adaptive Finite Element Computations
Toby Isaac, tisaac@ices.utexas.edu
The University of Chicago at Austin
September 14, 2015
CAAM Colloquium, Rice University
T. Isaac (U. Chicago) Adaptivity: Performance & Extensibility September 14, 2015 1 / 43
1. Adaptivity Through the Lens of p4est
2. Nonconforming Meshes in PETSc
3. The Interactive Portion ...
Why Adaptive Mesh Refinement (AMR)? Why Adaptive Anything?
Non-adaptive (branch-free) calculations are fast. Why bother?
1. Your non-adaptive calculations have reached the end of your resources (or the end of weak scalability), and you want to push back.
2. (Ideally) you have a performance model that predicts it can help.
Example: hp-FEM theory predicts exponential convergence in $N_{\mathrm{dof}}$:
If we want zero error, it's worth it.
If we have a nonzero tolerance, we must consider that hp systems require more resources per dof to solve than uniform, low-order systems. There is always a crossover.
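The crossover argument can be made concrete with a toy performance model. The sketch below is an illustration only: the error models (algebraic decay for uniform low-order, exponential decay for hp), every constant, and the cost-per-dof penalty `hp_cost_per_dof` are assumptions chosen for the example, not values from the talk.

```python
import math

def dofs_uniform(tol, C=1.0, rate=2.0 / 3.0):
    # Assumed algebraic model: err = C * N**(-rate)  =>  N = (C / tol)**(1 / rate)
    return (C / tol) ** (1.0 / rate)

def dofs_hp(tol, C=1.0, b=1.0, root=3.0):
    # Assumed exponential model: err = C * exp(-b * N**(1 / root))
    # => N = (ln(C / tol) / b)**root   (requires tol < C)
    return (math.log(C / tol) / b) ** root

def cheaper_method(tol, hp_cost_per_dof=20.0):
    # Hypothetical cost model: work is proportional to dofs, and each hp dof
    # costs hp_cost_per_dof times as much as a uniform low-order dof.
    cost_uniform = dofs_uniform(tol)
    cost_hp = hp_cost_per_dof * dofs_hp(tol)
    return "hp" if cost_hp < cost_uniform else "uniform"

for tol in (1e-1, 1e-2, 1e-4, 1e-6):
    print(tol, cheaper_method(tol))
```

Under these assumed constants the uniform method wins at loose tolerances and hp wins at tight ones; where the switch happens depends entirely on the constants, which is why a measured performance model matters.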
Example Applications: Mantle Convection
Subduction zone resolution globally ⇒ trillions of dofs. AMR reduces this to between $O(10^8)$ and $O(10^9)$.
[Stadler et al., 2010]: Stabilized $[Q_1]^3 \times Q_1$ elements, black-box algebraic multigrid.
[Rudi et al., 2015]: Stable $[Q_2]^3 \times Q_0^{\mathrm{disc}}$ elements, custom hybrid algebraic/geometric multigrid solver, demonstrated implicit solver weak scalability to 1.5 million BG/Q cores and $O(10^{11})$ dofs.
Example Applications: Ice Sheet Dynamics
[T.I. et al., 2015c]: Stable $[Q_k]^3 \times Q_{k-2}^{\mathrm{disc}}$ finite elements, complex domain with variable resolution demands, Robin-type boundary conditions, domain anisotropy, unusual solver demands.
Example Applications: Uncertainty Quantification
[T.I. et al., 2015b]: Inversion (deterministic and Bayesian) for unknown boundary coefficient fields ($O(10^5)$ parameters) from surface observations in the previous ice sheet model.
[Not in the above work] The tools that drive adaptivity are important for quantifying model error in Bayesian inversion.
Bayesian inversion requires two components: a prior distribution on the parameters, and a likelihood function of the parameters given data ∼ the probability of the data given parameters, $\pi(d \mid p)$.
This should incorporate not only the “noise” of the data, but also the uncertainty due to error in the model-to-parameter map, i.e., the a posteriori error of the finite element solution.
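One simple way to fold discretization error into the likelihood is to treat it as an extra, independent Gaussian error added to the data noise. The sketch below assumes exactly that; the Gaussian form, the independence assumption, and the names `noise_std` and `model_err_std` (for example, taken from an a posteriori error estimate at each observation point) are illustrative choices, not the formulation of the cited work.

```python
import math

def log_likelihood(data, predicted, noise_std, model_err_std):
    """Gaussian log pi(d | p), with data-noise and model-error variances summed.

    data          : observations d_i
    predicted     : model predictions f_i(p) at the observation points
    noise_std     : standard deviation of measurement noise per observation
    model_err_std : assumed standard deviation of the discretization error
                    per observation (hypothetical input)
    """
    total = 0.0
    for d, f, s_n, s_m in zip(data, predicted, noise_std, model_err_std):
        var = s_n ** 2 + s_m ** 2  # independent errors: variances add
        total += -0.5 * ((d - f) ** 2 / var + math.log(2.0 * math.pi * var))
    return total
```

With this model, a misfit that would be heavily penalized under the noise alone is penalized less when the estimated model error at that observation is large, so an under-resolved mesh does not produce overconfident posteriors.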
Approaches to Meshing and Adaptivity: Which Should I Choose?
Structured (Grid/Lattice): Fast. Adaptivity: uniform, (occasionally) tensor.
Unstructured (Adjacency Graph/CW-Complex): Flexible. Adaptivity: arbitrary.
Semi-structured (Explicit Tree/Implicit Tree): Dynamic. Adaptivity: local, (occasionally) anisotropic.
Composition of Meshing Approaches: Libraries/Frameworks for Composite Meshing
Several examples exist:
Patch-based AMR (Chombo, SAMRAI, etc.): Fast stencil-based computations with local refinement & unstructured trees.
Hierarchical Hybrid Grids [Gmeiner et al., 2015]: Fast stencil-based computations on non-Cartesian geometries.
p4est: Forests of Quadtrees/Octrees
Main developers: C. Burstedde, T.I., many other contributors. [Burstedde et al., 2011, T.I. et al., 2012, 2015a], p4est.org
Backend for: deal.II, PETSc (in progress).
An unstructured hexahedral mesh (“the forest”), where each hexahedron contains an arbitrarily refined octree;
a space-filling curve (SFC) orders the elements;
philosophy: an as-simple-as-possible coarse mesh describes the geometry, refinement captures all detail.
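To illustrate how a space-filling curve turns a 2D arrangement of cells into a linear order, here is a minimal sketch of a Morton (z-order) index for same-level cells inside a single quadtree. It is a toy, not p4est's API: the function names are made up for this example, and it ignores mixed refinement levels and the ordering between trees of the forest.

```python
def morton_index(x, y, level):
    """Interleave the low `level` bits of x and y (x in even bit positions)."""
    index = 0
    for bit in range(level):
        index |= ((x >> bit) & 1) << (2 * bit)
        index |= ((y >> bit) & 1) << (2 * bit + 1)
    return index

def sfc_order(cells, level):
    """Sort (x, y) integer cell coordinates along the z-order curve."""
    return sorted(cells, key=lambda c: morton_index(c[0], c[1], level))

# The four children of one quadrant, visited in z-order.
print(sfc_order([(1, 1), (0, 1), (1, 0), (0, 0)], level=1))
```

Because the order is computed from coordinates alone, any contiguous range of the sorted list is a valid partition, which is what makes repartitioning cheap.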
p4est's Refinement Cycle
1. Create
2. Refine
3. 2:1 Balance
4. Repartition (load balance)
Not pictured: construct FE basis and communication patterns.
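The four steps of the cycle can be sketched on a one-dimensional toy: a binary tree over the unit interval whose leaves are `(level, index)` pairs kept in left-to-right order. Everything here is an assumption made for illustration (the function names, the serial list representation, the equal-count partition rule); the real library works on distributed forests of quadtrees/octrees and balances across faces, edges, and corners.

```python
def create(level):
    """Uniform mesh: leaf (level, i) covers [i / 2**level, (i + 1) / 2**level)."""
    return [(level, i) for i in range(2 ** level)]

def split(level, i):
    return [(level + 1, 2 * i), (level + 1, 2 * i + 1)]

def refine(leaves, should_refine):
    """Replace each flagged leaf by its two children (one non-recursive pass)."""
    out = []
    for level, i in leaves:
        out.extend(split(level, i) if should_refine(level, i) else [(level, i)])
    return out

def balance(leaves):
    """Split leaves until adjacent leaves differ by at most one level (2:1)."""
    changed = True
    while changed:
        changed = False
        out = []
        for k, (level, i) in enumerate(leaves):
            neighbors = leaves[max(k - 1, 0):k] + leaves[k + 1:k + 2]
            if any(n_level > level + 1 for n_level, _ in neighbors):
                out.extend(split(level, i))
                changed = True
            else:
                out.append((level, i))
        leaves = out
    return leaves

def partition(leaves, num_ranks):
    """Cut the ordered leaf list into contiguous, near-equal chunks."""
    n = len(leaves)
    return [leaves[(n * r) // num_ranks:(n * (r + 1)) // num_ranks]
            for r in range(num_ranks)]

# Refine twice toward x = 0.5 from the left, then balance and repartition.
touches_midpoint = lambda level, i: (i + 1) * 2 == 2 ** level
mesh = refine(refine(create(1), touches_midpoint), touches_midpoint)
mesh = balance(mesh)
print(partition(mesh, 2))
```

In this example the second refinement leaves a level-3 leaf next to a level-1 leaf; the balance step splits the coarse leaf once to restore the 2:1 condition before the leaves are divided between two ranks.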
p4est's Scalability: Weak scaling of mesh refinement cycle (2:1 balance highlighted)
[Figure: percentage of runtime (0 to 100) spent in Partition, Balance, Ghost, and Nodes versus number of CPU cores (12 to 220320).]
p4est's Scalability: Weak scaling of mesh refinement cycle (2:1 balance highlighted)
[Figure: seconds per (million elements / core), 0 to 6, for Old and New versus number of CPU cores (12 to 112128).]
Example: An Ice Sheet Model Built on p4est
Ice sheet thickness: ∼2 km
Ice sheet extent: $O(10^3)$ km
The Problem with Octrees in Thin Domains
The space-filling curve does not respect column order:
Columns are split between processors when partitioning.
Dofs are not ordered in columns for efficient preconditioning (e.g., ILU).
An Anisotropic Solution: Another Layer of Mesh Composition
[Figure labels: partition 0, partition 1.]
A p4est forest of quadtrees manages the columns, with each column stored as a flat, linear binary tree of layers, which guarantees column integrity.
An extension to p4est: hybrid routines have the prefix “p6est_”, reproduce most of the standard p4est API, and are documented on the website (p4est.github.io/api).
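Column integrity under partitioning can be illustrated with a weighted split of the column list, where each column's weight is its number of layers. The sketch below is a hypothetical stand-in, not the p6est algorithm: the function name and the rule "a column goes to the rank that owns its first layer in the global layer ordering" are assumptions made for the example, and it assumes at least one layer overall.

```python
def partition_columns(layers_per_column, num_ranks):
    """Assign whole columns (in SFC order) to ranks, balancing layer counts.

    layers_per_column : number of layers in each column, in the column
                        forest's space-filling-curve order
    Returns a list with, for each rank, the column indices it owns.
    """
    total = sum(layers_per_column)  # assumed > 0
    owners = [[] for _ in range(num_ranks)]
    seen = 0  # layers in all preceding columns
    for col, n_layers in enumerate(layers_per_column):
        # Rank that would own this column's first layer in an equal split
        # of all layers; the whole column follows it, so no column is cut.
        rank = min((seen * num_ranks) // total, num_ranks - 1)
        owners[rank].append(col)
        seen += n_layers
    return owners

print(partition_columns([3, 3, 2, 4, 2, 2], 2))
```

Each rank receives a contiguous run of columns, so the dofs of a column stay together on one process and can be numbered consecutively for column-wise preconditioning.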
Another Application: NUMA (Nonhydrostatic Unified Model of the Atmosphere)
p6est is well-suited for other climate and earth systems models.
NUMA: Non-hydrostatic Unified Model of the Atmosphere [Giraldo et al., 2013] (faculty.nps.edu/fxgirald/projects/NUMA/Introduction_to_NUMA.html) is using p6est for partitioning (adaptivity in progress).
Scalability to 1M processes on Mira BG/Q [in preparation].
Testing the Limits of p4est's Partitioning
Scalability to 458K BG/Q cores of JUQUEEN from [T.I. et al., 2015a].
[Figure: forest-to-mesh runtime in seconds ($10^{-2}$ to $10^{2}$) versus process count P ($10^{1}$ to $10^{6}$); data labels 5.7k, 28k, 240k, 2M, 16M, 130M, 1B, 8B, 64B, 510B.]
P, 16-way: 16, 128, 1024, 8192, 65536, 458752
P, 32-way: 32, 256, 2048, 16384, 131072, 917504
P, 64-way: 64, 512, 4096, 32768, 262144