Extruded meshes for high aspect ratio simulations in Firedrake and - PowerPoint PPT Presentation

Extruded meshes for high aspect ratio simulations in Firedrake and PyOP2 Gheoghe-Teodor (Doru) Bercea (a) , Andrew TT McRae (b) , Lawrence Mitchell (c) , David A Ham, Paul HJ Kelly (a) gheorghe-teodor.bercea08@imperial.ac.uk (b) a.mcrae12@imperial.ac.uk (c) lawrence.mitchell@imperial.ac.uk 1

• (Very) high aspect ratio domains • Want vertically column-aligned meshes • e.g. important that vertical gradients don't leak into horizontal (and vice-versa) • Possible to achieve with fully unstructured meshes • Often don't need unstructured benefits in short "vertical" direction 4

Exploit column structure • Extrude an unstructured base mesh • New cells gain a dimension (interval → quad, triangle → prism) • Arrange for extruded direction to be innermost iteration index: direct addressing (see, e.g., MacDonald et al. Int. J. HPC App. 25 (4):392 (2010)) • Exploit this in execution 5

Direct addressing in vertical ⊗ 6

Direct addressing in vertical 6

Direct addressing in vertical a + n c + n a + ( n − 1) c + ( n − 1) a + 2 c + 2 a + 1 c + 1 a c 6

Direct addressing in vertical a + n • Connectivity specified by base cell indirection and fixed c + n offset for each dof in extruded a + ( n − 1) cells c + ( n − 1) a + 2 • Walking up column is prefetch-friendly stride-n c + 2 access a + 1 • Also decreases memory c + 1 footprint of indirections a c 6

Does it work? • What to measure? • Bandwidth, flops? Depends what regime we're in. • For bandwidth bound problems, we measure valuable bandwidth • assume there is a perfect ordering that allows us to stream data from main memory, use it and then evict, always using full cache lines • Doesn't count memory footprint of indirections • Easy to measure: data volume / execution time • Lower bound on actual data movement, can compare to STREAM 7

(Very) simple (bandwidth bound) rhs assembly: � Base mesh: ~800000 triangles (Hilbert curve numbering) 75 cell layers 12 MPI processes 29688 ● ● ● ● ● ● ● 25000 ● ● ● ● ● ● ● 20000 ● VBW [GB/s] ● 15000 ● ● ● ● ● ● ● ● ● 10000 ● P1dgxP0 P0xP0 P1xP0 ● ● ● 5000 ● P1dgxP1 P0xP1 P1xP1 P0xP1dg P1xP1dg ● ● ● 916 1 4 12 24 1 8 16 25 50 MPI processes Number of cell layers 12 core Intel Xeon E5-2620 @ 2.0 GHz Intel 14.0.1 (-O3 -xAVX), Intel MPI 3.1.038 8 STREAM Triad 42 GB/s

What about FLOP-limited cases? • We use (modified) versions of the FEniCS toolchain to generate assembly kernels • These are further optimised by an FE-aware loop-nest compiler, Luporini et al. arxiv:1407.0904 [cs.MS] • Performs loop-invariant code motion, vector-alignment and padding, loop unrolling/permutation and manual vectorisation • For P2 Helmholtz operator, we sustain ~20GFlop/s on a single core • Guaranteed not to exceed 30.4 GFlop/s, with simultaneous issue of 1 AVX mul and 1 AVX add per cycle (not Haswell, so no FMA) 9

Building elements • Performance no good if we can't express variational problems • Drive Firedrake using (mostly) DOLFIN-compatible Python interface • Express variational problems on extruded meshes using extensions to UFL 10

m = UnitIntervalMesh (5) � mesh = ExtrudedMesh ( m , layers =5) � U0 = FiniteElement ( "CG" , interval , 1) U1 = FiniteElement ( "DG" , interval , 0) V0 = FiniteElement ( "CG" , interval , 1) V1 = FiniteElement ( "DG" , interval , 0) � W0_elt = OuterProductElement ( U0 , V0 ) W1_a = HDiv ( OuterProductElement ( U1 , V0 )) W1_b = HDiv ( OuterProductElement ( U0 , V1 )) � W1_elt = W1_a + W1_b � W0 = FunctionSpace ( mesh , W0_elt ) W1 = FunctionSpace ( mesh , W1_elt ) � W = W0 * W1 sigma , u = TrialFunctions ( W ) tau , v = TestFunctions ( W ) � L = assemble (( sigma * tau - inner ( rot ( tau ), u ) + inner ( rot ( sigma ), v ) + div ( u )* div ( v ))* dx )

Product complexes • We support elements that are a tensor product of base mesh basis functions, and interval basis functions • Motivating application: mimetic FE for numerical weather prediction, requires a discrete de Rham complex • Construct from product of base mesh and interval complexes 12

Product complexes • We support elements that are a tensor product of base mesh basis functions, and interval basis functions • Motivating application: mimetic FE for numerical weather prediction, requires a discrete de Rham complex • Construct from product of base mesh and interval complexes d d U 0 U 1 U 2 12

Product complexes • We support elements that are a tensor product of base mesh basis functions, and interval basis functions • Motivating application: mimetic FE for numerical weather prediction, requires a discrete de Rham complex • Construct from product of base mesh and interval complexes d d U 0 U 1 U 2 d V 0 V 1 12

Product complexes • We support elements that are a tensor product of base mesh basis functions, and interval basis functions • Motivating application: mimetic FE for numerical weather prediction, requires a discrete de Rham complex • Construct from product of base mesh and interval complexes d d U 0 U 1 U 2 d V 0 V 1 d d d U 0 ⊗ V 0 U 0 ⊗ V 1 ⊕ U 1 ⊗ V 0 U 1 ⊗ V 1 ⊕ U 2 ⊗ V 0 U 2 ⊗ V 1 12

Example: lowest order RT and Nedelec 1st kind U0 = FiniteElement ( "CG" , interval , 1) U1 = FiniteElement ( "DG" , interval , 0) V0 = FiniteElement ( "CG" , interval , 1) V1 = FiniteElement ( "DG" , interval , 0) � W1_a = OuterProductElement ( U0 , V1 ) � W1_b = OuterProductElement ( U1 , V0 ) � W1 = W1_a + W1_b � RT1 = HDiv ( W1 ) � N1 = HCurl ( W1 ) 13

Lowest order Nedelec 2nd kind on prism N2_1 = FiniteElement ( "N2curl" , triangle , 1) CG_2 = FiniteElement ( "CG" , interval , 2) � Ned_horiz = HCurl ( OuterProductElement ( N2_1 , CG_2 )) � P2tri = FiniteElement ( "CG" , triangle , 2) P1dg = FiniteElement ( "DG" , interval , 1) Ned_vert = HCurl ( OuterProductElement ( P2tri , P1dg )) � Ned_wedge = Ned_horiz + Ned_vert

Extruded meshes for high aspect ratio simulations in Firedrake and - PowerPoint PPT Presentation

Extruded meshes for high aspect ratio simulations in Firedrake and PyOP2 Gheoghe-Teodor (Doru) Bercea (a) , Andrew TT McRae (b) , Lawrence Mitchell (c) , David A Ham, Paul HJ Kelly (a) gheorghe-teodor.bercea08@imperial.ac.uk (b)

Challenge toward Lower Aspect Ratio Challenge toward Lower Aspect Ratio JAERI Tokamak Tokamak

Aspect-Oriented Programming and Aspect-J TDDD05 Ola Leifer Most slides courtesy of Jens

Exploiting Performance Benefits of Extruded Meshes in PyOP2 Department of Computing - Software

Best Practices Workshop: Overset Meshing Overview Introduction to Overset Meshes Range of

Progressive Meshes (96) Hugues Hoppe and Efficient Implementation of P-Meshes (98) Hugues Hoppe

THE GOLDEN RATIO AND THE FIBONACCI NUMBERS Common Measures 1 foot 2 feet 3 feet 3 2 Ratio

COMPUTER COMPUTER COMPUTER COMPUTER SIMULATIONS SIMULATIONS SIMULATIONS SIMULATIONS

Meshes/no meshes, radiation and tasks Numerical algorithms for the future of astrophysical

Ope ratio ns Ope ratio ns Wo rksho p Wo rksho p 2005 2005 USCG Auxiliary Ope ratio ns De

15 E ffi cient mesh models Steve Marschner CS5625 Spring 2020 Follows chapter 16 in RTR 4e Basics

Surfaces/Meshes Well stick to triangles Working with Meshes CS 176 Winter 2011 CS 176 Winter

Edgebreaker Connectivity Compression for Triangle Meshes Jarek Rossignac, TVCG 1999 Contribution

Meshes CS418 Computer Graphics John C. Hart Simple Meshes Cylinder ( x , y , z ) = (cos q

Articulated Meshes Eric Landreneau Scott Schaefer Texas A&M University Introduction

On 48 and Quasi 48 Meshes Luiz Velho Jonas Gomes Visgraf Laboratory IMPA Instituto de

Aspect Extraction with Automated Prior Knowledge Learning Zhiyuan (Brett) Chen Arjun Mukherjee

Responsive Web Design Introduction to Web Design Responsive Web Design Introduction to Web

Toward the Optimal Bit Aspect Ratio in Magnetic Recording William E. Ryan Associate Professor

A L T EX template for presentation A Your name here Department of XXXX XXXX University Joint

Origin and future draft-kristensen-avt-rtp-h264-extension-00 split in two based on last IETF

Wing Design I Lecture 8 ME EN 415 Andrew Ning aning@byu.edu Span b Area S Sweep

CS615 - Aspects of System Administration Networking I Department of Computer Science Stevens

SAMSUNG SAAT: Reverse Engineering for Perform ance Analysis 2004.05.25. Seon-Ah Lee, Seung-Mo

Graph-theoretic methods in combinatorial (algebraic) topology Micha l Adamaszek Universit

Explore More Topics

Sambuz

Useful Links

Newsletter

Mail Us

Extruded meshes for high aspect ratio simulations in Firedrake and - PowerPoint PPT Presentation

Extruded meshes for high aspect ratio simulations in Firedrake and PyOP2 Gheoghe-Teodor (Doru) Bercea (a) , Andrew TT McRae (b) , Lawrence Mitchell (c) , David A Ham, Paul HJ Kelly (a) gheorghe-teodor.bercea08@imperial.ac.uk (b)

Challenge toward Lower Aspect Ratio Challenge toward Lower Aspect Ratio JAERI Tokamak Tokamak

Aspect-Oriented Programming and Aspect-J TDDD05 Ola Leifer Most slides courtesy of Jens

Exploiting Performance Benefits of Extruded Meshes in PyOP2 Department of Computing - Software

Best Practices Workshop: Overset Meshing Overview Introduction to Overset Meshes Range of

Progressive Meshes (96) Hugues Hoppe and Efficient Implementation of P-Meshes (98) Hugues Hoppe

THE GOLDEN RATIO AND THE FIBONACCI NUMBERS Common Measures 1 foot 2 feet 3 feet 3 2 Ratio

COMPUTER COMPUTER COMPUTER COMPUTER SIMULATIONS SIMULATIONS SIMULATIONS SIMULATIONS

Meshes/no meshes, radiation and tasks Numerical algorithms for the future of astrophysical

Ope ratio ns Ope ratio ns Wo rksho p Wo rksho p 2005 2005 USCG Auxiliary Ope ratio ns De

15 E ffi cient mesh models Steve Marschner CS5625 Spring 2020 Follows chapter 16 in RTR 4e Basics

Surfaces/Meshes Well stick to triangles Working with Meshes CS 176 Winter 2011 CS 176 Winter

Edgebreaker Connectivity Compression for Triangle Meshes Jarek Rossignac, TVCG 1999 Contribution

Meshes CS418 Computer Graphics John C. Hart Simple Meshes Cylinder ( x , y , z ) = (cos q

Articulated Meshes Eric Landreneau Scott Schaefer Texas A&amp;M University Introduction

On 48 and Quasi 48 Meshes Luiz Velho Jonas Gomes Visgraf Laboratory IMPA Instituto de

Aspect Extraction with Automated Prior Knowledge Learning Zhiyuan (Brett) Chen Arjun Mukherjee

Responsive Web Design Introduction to Web Design Responsive Web Design Introduction to Web

Toward the Optimal Bit Aspect Ratio in Magnetic Recording William E. Ryan Associate Professor

A L T EX template for presentation A Your name here Department of XXXX XXXX University Joint

Origin and future draft-kristensen-avt-rtp-h264-extension-00 split in two based on last IETF

Wing Design I Lecture 8 ME EN 415 Andrew Ning aning@byu.edu Span b Area S Sweep

CS615 - Aspects of System Administration Networking I Department of Computer Science Stevens

SAMSUNG SAAT: Reverse Engineering for Perform ance Analysis 2004.05.25. Seon-Ah Lee, Seung-Mo

Graph-theoretic methods in combinatorial (algebraic) topology Micha l Adamaszek Universit

Explore More Topics

Sambuz

Useful Links

Newsletter

Mail Us

Articulated Meshes Eric Landreneau Scott Schaefer Texas A&M University Introduction