Algebraic multigrid methods for mechanical engineering applications
Mark F. Adams
17th International Conference on Domain Decomposition Methods
St. Wolfgang/Strobl, Austria - 3 July 2006
Outline
• Algebraic multigrid (AMG)
  • Coarse grid spaces
  • Smoothers: additive (Chebyshev) and multiplicative (Gauss-Seidel)
• Industrial applications
• Micro-FE bone modeling
  • Scalability/performance studies
  • Weak and strong (scaled/unscaled) speedup
• Multigrid algorithms for the KKT system
  • New AMG framework for KKT systems
Multigrid: smoothing and coarse grid correction (projection)
[Figure: the multigrid V-cycle - smoothing on the finest grid, restriction (R) of the residual to a smaller first coarse grid, and prolongation (P = R^T) of the correction back up]
Multigrid components
• Smoother S^ν(f, u_0): ν iterations of a simple preconditioner (Schwarz)
  • Multiplicative: great theoretical properties, but problematic in parallel
  • Additive: requires damping (e.g., Chebyshev polynomials)
• Prolongation (interpolation) operator P
• Restriction operator R (R = P^T)
  • Maps residuals from the fine grid to the coarse grid
  • Columns of P: discrete coarse grid functions represented on the fine grid
• Algebraic (Galerkin) coarse grid operator: A_H = R A_h P
• An AMG method is defined by its S and P operators (see the V-cycle sketch below)
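To make the roles of S, P, R, and the Galerkin coarse operator concrete, here is a minimal two-level V-cycle sketch in Python (NumPy/SciPy). The 1D Poisson matrix, the pairwise plain-aggregation prolongator, and the damped Jacobi smoother are illustrative assumptions, not the Prometheus implementation.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def jacobi(A, b, x, nu=2, omega=2.0/3.0):
    """Additive (damped Jacobi) smoother: nu sweeps."""
    Dinv = 1.0 / A.diagonal()
    for _ in range(nu):
        x = x + omega * Dinv * (b - A @ x)
    return x

def two_grid_vcycle(A, b, x, P):
    """One two-level V-cycle: pre-smooth, coarse grid correction, post-smooth."""
    R = P.T                                   # restriction = transpose of prolongation
    A_H = R @ A @ P                           # Galerkin (algebraic) coarse grid operator
    x = jacobi(A, b, x)                       # pre-smoothing
    r_H = R @ (b - A @ x)                     # restrict the fine-grid residual
    e_H = spla.spsolve(A_H.tocsc(), r_H)      # solve the coarse problem directly
    x = x + P @ e_H                           # prolongate and apply the correction
    return jacobi(A, b, x)                    # post-smoothing

# Toy problem: 1D Poisson; plain (piecewise-constant) aggregation of pairs of nodes.
n = 64
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr")
P = sp.csr_matrix((np.ones(n), (np.arange(n), np.arange(n) // 2)), shape=(n, n // 2))
b = np.ones(n)
x = np.zeros(n)
for _ in range(10):
    x = two_grid_vcycle(A, b, x, P)
print("residual norm:", np.linalg.norm(b - A @ x))
```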
Smoothed aggregation
• "Plain" aggregation (P_0): piecewise constant functions
  • Start with the kernel vectors B of the operator (e.g., the 6 rigid body modes in elasticity)
  • Nodal aggregation distributes B into the columns of P_0
• "Smoothed" aggregation: lower the energy of the coarse functions
  • One damped Jacobi iteration: P = (I - ω D^{-1} A) P_0 (a sketch of this construction follows)
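A minimal sketch of the smoothed aggregation prolongator, assuming a scalar problem whose kernel is the constant vector and a given aggregation of nodes; the actual method injects all kernel vectors (e.g., the 6 rigid body modes) per aggregate and chooses ω from a spectral estimate of D^{-1}A.

```python
import numpy as np
import scipy.sparse as sp

def smoothed_aggregation_prolongator(A, agg, omega=2.0/3.0):
    """Build P = (I - omega * D^{-1} A) P0 from an aggregate index per node.

    agg[i] = index of the aggregate (coarse node) containing fine node i.
    P0 injects the kernel vector (here: the constant) into each aggregate.
    """
    n, n_coarse = A.shape[0], agg.max() + 1
    P0 = sp.csr_matrix((np.ones(n), (np.arange(n), agg)), shape=(n, n_coarse))
    Dinv = sp.diags(1.0 / A.diagonal())
    return P0 - omega * (Dinv @ A @ P0)   # one Jacobi step lowers the energy of the basis

# Example: 1D Poisson, aggregates of 3 consecutive nodes.
n = 12
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr")
agg = np.arange(n) // 3
P = smoothed_aggregation_prolongator(A, agg)
print(P.toarray())   # smoothed basis functions overlap neighboring aggregates
```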
Smoothers
• CG/Jacobi: additive
  • Essentially damped by CG (Adams, SC 1999)
  • Requires dot products; non-stationary
• Gauss-Seidel: multiplicative (the optimal MG smoother)
  • Complex communication and computation (Adams, SC 2001)
• Polynomial smoothers: additive
  • Chebyshev is ideal for MG (Adams et al., JCP 2003); a sketch follows
  • Chebyshev chooses p(λ) such that |1 - λ p(λ)| is minimized over the interval [λ*, λ_max]
  • An estimate of λ_max is easy to obtain
  • Use λ* = λ_max / C (no need for the lowest eigenvalue)
  • C is related to the rate of grid coarsening
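A sketch of a Chebyshev smoother in Python, assuming λ_max is estimated with a few power iterations on D^{-1}A and λ* = λ_max / C. The three-term recurrence is the standard Chebyshev iteration for the interval [λ*, λ_max]; the constants and the eigenvalue estimator are illustrative choices, not the exact implementation from the talk.

```python
import numpy as np
import scipy.sparse as sp

def chebyshev_smoother(A, b, x, nu=2, C=4.0):
    """nu steps of Chebyshev smoothing targeting eigenvalues in [lmax/C, lmax]."""
    Dinv = 1.0 / A.diagonal()
    v = np.random.default_rng(0).standard_normal(A.shape[0])
    for _ in range(10):                      # crude power iteration for lambda_max of D^{-1}A
        v = Dinv * (A @ v)
        v /= np.linalg.norm(v)
    lmax = v @ (Dinv * (A @ v))
    lmin = lmax / C                          # C tied to the coarsening rate; low modes
    theta, delta = 0.5 * (lmax + lmin), 0.5 * (lmax - lmin)   # are left to the coarse grid
    sigma = theta / delta
    rho = 1.0 / sigma
    r = Dinv * (b - A @ x)                   # smooth the diagonally scaled system
    d = r / theta
    for _ in range(nu):                      # standard three-term Chebyshev recurrence
        x = x + d
        r = r - Dinv * (A @ d)
        rho_new = 1.0 / (2.0 * sigma - rho)
        d = rho_new * rho * d + (2.0 * rho_new / delta) * r
        rho = rho_new
    return x

n = 100
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr")
b = np.zeros(n)
x = np.random.default_rng(1).standard_normal(n)
x = chebyshev_smoother(A, b, x, nu=3)
print("error norm after smoothing:", np.linalg.norm(x))
```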
Parallel Gauss-Seidel (example: 2D, 4 processors)
• Multiplicative smoothers
  • (+) Powerful
  • (+) Great for MG
  • (-) Difficult to parallelize
• Ideas (a simplified sketch follows this list):
  • Use processor partitions
  • Use 'internal' work to hide communication
  • Keep the sweep symmetric
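The true parallel Gauss-Seidel of Adams (SC 2001) preserves the exact multiplicative ordering while hiding communication behind interior work. As a much simpler stand-in, here is a hybrid block smoother sketch: exact Gauss-Seidel inside each "processor" block and Jacobi-style coupling between blocks. The two-block partition and the toy matrix are illustrative assumptions.

```python
import numpy as np
import scipy.sparse as sp

def hybrid_block_gauss_seidel(A, b, x, blocks, sweeps=1):
    """Gauss-Seidel within each block; off-block unknowns use the values from the
    start of the sweep, as if they were owned by other processors (Jacobi coupling)."""
    A = A.tocsr()
    for _ in range(sweeps):
        x_old = x.copy()                       # off-block data "received" at sweep start
        for blk in blocks:                     # each block = one processor's unknowns
            for i in blk:                      # local forward Gauss-Seidel sweep
                row = A.getrow(i)
                s = 0.0
                for j, a in zip(row.indices, row.data):
                    if j == i:
                        continue
                    s += a * (x[j] if j in blk else x_old[j])
                x[i] = (b[i] - s) / A[i, i]
    return x

n = 16
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)
x = np.zeros(n)
blocks = [range(0, 8), range(8, 16)]           # two "processors"
x = hybrid_block_gauss_seidel(A, b, x, blocks, sweeps=5)
print("residual norm:", np.linalg.norm(b - A @ x))
```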
[Figure: Cray T3E, 24 processors, about 30,000 dof per processor]
[Figure: iteration counts, 80K to 76M equations]
Aircraft carrier
• 315,444 vertices
• Shell and beam elements (6 dof per node)
• Linear dynamics - transient (time domain)
• About 1 minute per solve (rtol = 10^-6)
• 2.4 GHz Pentium 4 Xeon processors
• Matrix-vector product runs at 254 Mflop/s
“BR” tire
Math does matter!
Trabecular bone
[Figure: FE mesh generation from a 5-mm cube of trabecular bone; labeled regions: cortical bone, trabecular bone]
Computational architecture: from µFE mesh and input file to solution
• Athena: parallel FE code; holds the FE input file in memory and partitions it to SMPs via ParMetis
• ParMetis / METIS: parallel mesh partitioner (University of Minnesota)
• FEAP: serial general-purpose FE application (University of California), driven through pFEAP with a material card
• Prometheus: multigrid solver
• PETSc: parallel numerical libraries (Argonne National Labs)
• Olympus: parallel driver coupling pFEAP, Prometheus, and PETSc
• Output written to Silo DB files and visualized with VisIt
Visualization
• Geometric and material nonlinearity
• 2.25% strain
• 8 processors, DataStar (SP4 at UCSD)
Scalability: vertebral body
• Large deformation elasticity
• 6 load steps (3% strain)
• Scaled speedup: ~131K dof per processor
• 7 to 537 million dof
• 4 to 292 nodes of an IBM SP Power3 (14 of 16 processors per node used)
• Double/single Colony switch
• 80 µm mesh, with shell
Scalability
• Inexact Newton with a CG linear solver and variable tolerance (a sketch follows this slide)
• Smoothed aggregation AMG preconditioner
• (Vertex-block) diagonal smoothers:
  • 2nd-order Chebyshev (additive)
  • Gauss-Seidel (multiplicative)
• 80 µm mesh, without shell
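A minimal sketch of an inexact Newton loop with a variable linear solve tolerance. The Eisenstat-Walker-style forcing rule and the toy nonlinear problem are illustrative assumptions; the talk does not state the specific tolerance rule, and the plain CG call stands in for the AMG-preconditioned CG of Prometheus.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def inexact_newton(residual, jacobian, u0, max_it=20, rtol=1e-6):
    """Inexact Newton: solve J du = -F only loosely when far from the solution."""
    u = u0.copy()
    F = residual(u)
    norm0 = np.linalg.norm(F)
    eta = 0.1                                   # initial (loose) linear tolerance
    for _ in range(max_it):
        J = jacobian(u)
        du, _ = spla.cg(J, -F, rtol=eta)        # variable-tolerance CG (scipy >= 1.12 keyword)
        u = u + du
        F_new = residual(u)
        # Eisenstat-Walker-style forcing: tighten the linear tolerance as the
        # nonlinear residual drops (illustrative choice of constants).
        eta = min(0.1, 0.9 * (np.linalg.norm(F_new) / np.linalg.norm(F)) ** 2)
        F = F_new
        if np.linalg.norm(F) < rtol * norm0:
            break
    return u

# Toy nonlinear problem: -u'' + u^3 = 1 on a 1D grid (illustrative only).
n = 50
L = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr") * (n + 1) ** 2
residual = lambda u: L @ u + u ** 3 - 1.0
jacobian = lambda u: L + sp.diags(3.0 * u ** 2)
u = inexact_newton(residual, jacobian, np.zeros(n))
print("final residual norm:", np.linalg.norm(residual(u)))
```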
Computational phases
• Mesh setup (per mesh):
  • Coarse grid construction (aggregation)
  • Graph processing
• Matrix setup (per matrix):
  • Coarse grid operator construction
  • Sparse matrix triple product RAP (expensive for smoothed aggregation; see the comparison below)
  • Subdomain factorizations
• Solve (per RHS):
  • Matrix-vector products (residuals, grid transfer)
  • Smoothers (matrix-vector products)
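The RAP product is more expensive for smoothed aggregation because the smoothed P has a wider stencil than the piecewise-constant P_0, so the Galerkin operator fills in. A small self-contained comparison on a toy 1D problem (aggregates of three nodes are an illustrative choice):

```python
import numpy as np
import scipy.sparse as sp

n = 300
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr")
agg = np.arange(n) // 3                          # aggregates of 3 consecutive nodes
P0 = sp.csr_matrix((np.ones(n), (np.arange(n), agg)), shape=(n, n // 3))
Dinv = sp.diags(1.0 / A.diagonal())
P = P0 - (2.0 / 3.0) * (Dinv @ A @ P0)           # smoothed aggregation prolongator

for name, Pk in [("plain", P0), ("smoothed", P)]:
    A_H = (Pk.T @ A @ Pk).tocsr()                # Galerkin triple product RAP
    print(name, "coarse nnz per row:", A_H.nnz / A_H.shape[0])
```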
[Figure: flop/s per processor at ~131K dof per processor; 0.47 Tflop/s aggregate on 4088 processors]
Sources of inefficiency: linear solver iterations per Newton step

Load step | Small (7.5M dof), Newton iterations 1-5 | Large (537M dof), Newton iterations 1-6
1         | 5 14 20 21 18                           | 5 11 35 25 70 2
2         | 5 14 20 20 20                           | 5 11 36 26 70 2
3         | 5 14 20 22 19                           | 5 11 36 26 70 2
4         | 5 14 20 22 19                           | 5 11 36 26 70 2
5         | 5 14 20 22 19                           | 5 11 36 26 70 2
6         | 5 14 20 22 19                           | 5 11 36 26 70 2
Sources of scale inefficiency in the solve phase

                     | 7.5M dof | 537M dof
#iterations          | 450      | 897
#nnz per row         | 50       | 68
Flop rate            | 76       | 74
#elements per proc   | 19.3K    | 33.0K
Modeled run time     | 1.00     | 2.78
Measured run time    | 1.00     | 2.61
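One plausible reading of the model row (an assumption; the slide does not spell out the formula) is that solve time scales with the iteration count and the nonzeros per row, divided by the achieved flop rate, which reproduces the 2.78 figure:

```latex
\frac{T_{537\mathrm{M}}}{T_{7.5\mathrm{M}}}
  \;\approx\;
  \underbrace{\frac{897}{450}}_{\text{iterations}}
  \times
  \underbrace{\frac{68}{50}}_{\text{nnz/row}}
  \times
  \underbrace{\frac{76}{74}}_{\text{flop rate}}
  \;\approx\; 1.99 \times 1.36 \times 1.03 \;\approx\; 2.78
```

The measured ratio of 2.61 is slightly better than this model predicts.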
Strong speedup: 7.5M dof (1 to 128 nodes)
Nodal performance of IBM SP Power3 and Power4
• IBM Power3, 16 processors per node
  • 375 MHz, 4 flops per cycle
  • 16 GB/s memory bus (~7.9 GB/s measured with the STREAM benchmark)
  • Implies a memory-bandwidth peak of ~1.5 Gflop/s per node for mat-vec (a rough estimate follows)
  • We get ~1.2 Gflop/s (15 processors × 0.08 Gflop/s)
• IBM Power4, 32 processors per node
  • 1.3 GHz, 4 flops per cycle
  • Complex memory architecture
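As a rough check on the memory-bandwidth peak figure, a roofline-style estimate that assumes about 12 bytes of memory traffic per matrix nonzero (an 8-byte value plus a 4-byte column index) and 2 flops per nonzero gives a number of the same order; the exact value depends on what per-nonzero traffic is assumed:

```latex
7.9\ \mathrm{GB/s} \times \frac{2\ \text{flops}}{12\ \text{bytes per nonzero}}
  \;\approx\; 1.3\ \mathrm{Gflop/s\ per\ node}
```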
Speedup