Multiple Right-hand-side Implementation for DD AMG Shuhei Yamamoto - PowerPoint PPT Presentation


  1. Multiple Right-hand-side Implementation for DD α AMG
     Shuhei Yamamoto (s.yamamoto@cyi.ac.cy)
     Simone Bacchio, Jacob Finkenrath
     September 22, 2020

  2. Outline
     ◮ Introduction and Motivation
     ◮ Overview of DD α AMG: SAP, Coarse-grid correction, Performance
     ◮ Implementation of Multiple R.H.S.: Motivation, Implementation details, Scaling results, Tuning
     ◮ Fast Accurate Block Linear krylOv Solver (Fabulous): Basics, Parameters, Tuning plots
     ◮ Outlook

  3. QFT
     In the path-integral formulation, all physical predictions of a QFT follow from the generating functional
         Z = ∫ Dφ e^{iS}.
     In particular, for QCD,
         S_QCD = ∫ d⁴x L_QCD = S_ψ + S_G
     where
         S_ψ = Σ_f ∫ d⁴x ψ̄_f ( iγ^µ D_µ − m_f ) ψ_f   and   S_G = S_G(A^a_µ).
     The expectation value of an observable can be computed as
         ⟨O⟩ = (1/Z) ∫ Dφ O(φ) e^{iS}.

  4. Lattice QCD
     Now the task is to evaluate ⟨O⟩. We do this non-perturbatively. To be more specific,
     ◮ We perform a Wick rotation from Minkowski to Euclidean space-time to obtain
           ⟨O⟩ = (1/Z) ∫ Dφ O(φ) e^{−S_E}
       where S_E is a real-valued Euclidean action.
     ◮ Then, we use e^{−S_E} as a Monte-Carlo weight and have
           ⟨O⟩ = lim_{N→∞} (1/N) Σ_{i=1}^{N} O_i ≈ (1/N) Σ_{i=1}^{N} O_i.
     To make the computation doable on a computer,
     ◮ We discretize space-time into a lattice with spacing a and dimensions L³ × T.
     ◮ Fermion fields ψ live on the lattice sites, and gauge fields are replaced by links U_µ connecting adjacent sites.
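The Monte-Carlo average above can be illustrated with a deliberately tiny toy model (not lattice QCD): a single-site "field" with Euclidean action S_E(φ) = φ²/2, sampled by a Metropolis chain. The action, step size, and sample count here are illustrative choices.

```python
import numpy as np

# Toy illustration of <O> ≈ (1/N) Σ_i O(φ_i): for a single-site field
# with Euclidean action S_E(φ) = φ²/2, the weight e^{−S_E} is a standard
# Gaussian, so <φ²> should come out close to 1.
rng = np.random.default_rng(0)

def metropolis(n_samples, step=1.0):
    """Sample φ with weight e^{-S_E(φ)}, S_E = φ²/2 (toy action)."""
    phi, samples = 0.0, []
    for _ in range(n_samples):
        prop = phi + step * rng.uniform(-1, 1)
        # Accept with probability min(1, e^{-(S_E(prop) - S_E(phi))})
        if rng.uniform() < np.exp(0.5 * (phi**2 - prop**2)):
            phi = prop
        samples.append(phi)
    return np.array(samples)

phis = metropolis(200_000)
print(np.mean(phis**2))  # ≈ 1 up to statistical error
```

The same structure — sample configurations with the Boltzmann weight, then average the observable — carries over to the gauge-field integral, where HMC replaces this naive Metropolis update.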

  5. The Problem
     As the fermion fields in the action are Grassmann numbers, we can use their properties to carry out the integration over the fermion fields explicitly, obtaining
         ⟨O⟩ = (1/Z) ∫ DU O(D⁻¹, U) Det D(U) e^{−S_G}
     where D is the Dirac matrix in S_ψ = Σ_f ∫ d⁴x ψ̄_f ( D̸_E + m_f ) ψ_f.
     Evaluation of the expectation value requires
     ◮ computing the inverse of the Dirac matrix repeatedly, many times
     ◮ generating gauge configurations according to the weight e^{−S_G} Det D(U) via importance sampling, e.g., HMC methods.
     In lattice QCD calculations, the main computational task is to solve the system Dx = b.

  6. The Matrix
     ◮ Dirac matrix: D = D̸_E + m
     ◮ Wilson Dirac operator:
           D_W = (1/2) Σ_{µ=0}^{3} ( γ_µ (∆_µ + ∆*_µ) − a ∆*_µ ∆_µ ) + m
       where
           ∆_µ ψ(x)  = (1/a) ( U_µ(x) ψ(x + µ̂) − ψ(x) )
           ∆*_µ ψ(x) = (1/a) ( ψ(x) − U†_µ(x − µ̂) ψ(x − µ̂) ).
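As a structural sketch only (not the 4-D operator), the Wilson combination above can be built on a 1-D periodic lattice with trivial links U_µ(x) = 1 and a single spin component (γ → 1); lattice size and mass are arbitrary choices here.

```python
import numpy as np

# 1-D toy sketch of the Wilson Dirac operator with trivial gauge links.
L, a, m = 8, 1.0, 0.5
# Forward difference ∆ and backward difference ∆* on a periodic chain
fwd = (np.eye(L, k=1) + np.eye(L, k=1 - L) - np.eye(L)) / a   # ∆
bwd = (np.eye(L) - np.eye(L, k=-1) - np.eye(L, k=L - 1)) / a  # ∆*
# D_W = (1/2) [ (∆ + ∆*) − a ∆* ∆ ] + m   (γ → 1 in one dimension)
D_W = 0.5 * ((fwd + bwd) - a * bwd @ fwd) + m * np.eye(L)
# The naive central derivative (∆ + ∆*)/2 is anti-symmetric; the Wilson
# term −a ∆*∆ / 2 is a (symmetric) second derivative that lifts the doublers.
print(np.allclose((fwd + bwd).T, -(fwd + bwd)))  # True: anti-symmetric
```

On the constant mode both difference operators vanish, so D_W reduces to the bare mass m — a quick sanity check on the stencil.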

  7. The Matrix for Twisted Mass Fermions
     ◮ The clover-improved Wilson Dirac operator:
           D_cW = D_W − (c_sw / 32) Σ_{µ,ν=0}^{3} ( γ_µ γ_ν ) ⊗ ( Q_µν(x) − Q_νµ(x) )
       where Q_µν(x) = Σ_{n=0}^{3} U^{(n)}_µν(x) is the sum of the four plaquettes in the µ-ν plane attached to x, e.g.,
           U^{(0)}_µν(x) = U_µ(x) U_ν(x + µ̂) U†_µ(x + ν̂) U†_ν(x).
     ◮ The degenerate twisted mass operator:
           D_D(µ) = ( D_cW ⊗ I₂ ) + iµ ( Γ₅ ⊗ τ₃ )
                  = diag( D_TM(µ), D_TM(−µ) )
       where D_TM(µ) = D_cW + iµ Γ₅.
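The flavor-block structure of the degenerate twisted mass operator can be checked numerically with small stand-ins: a random matrix in place of D_cW and any matrix squaring to the identity in place of Γ₅ (both are assumptions of this sketch, not the real operators; the Kronecker-product ordering is chosen so that flavor is the outer factor).

```python
import numpy as np

# Sketch of D_D(µ) = (D_cW ⊗ I₂) + iµ (Γ₅ ⊗ τ₃) with toy stand-ins.
rng = np.random.default_rng(1)
n, mu = 4, 0.1
D_cW = rng.standard_normal((n, n))               # stand-in for clover-Wilson
Gamma5 = np.diag([1.0, 1.0, -1.0, -1.0])         # stand-in with Γ₅² = I
tau3 = np.diag([1.0, -1.0])                      # Pauli matrix in flavor space

# kron(flavor, space) makes flavor the outer 2x2 block structure
D_D = np.kron(np.eye(2), D_cW) + 1j * mu * np.kron(tau3, Gamma5)

# Block diagonal in flavor: upper block D_TM(+µ), lower block D_TM(−µ)
D_TM_plus = D_cW + 1j * mu * Gamma5
D_TM_minus = D_cW - 1j * mu * Gamma5
assert np.allclose(D_D[:n, :n], D_TM_plus)
assert np.allclose(D_D[n:, n:], D_TM_minus)
assert np.allclose(D_D[:n, n:], 0) and np.allclose(D_D[n:, :n], 0)
print("block-diagonal in flavor: OK")
```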

  8. Why do we need a linear solver?
     ◮ Typically, for modern simulations, the size of the lattice is V = 128 × 64³ ∼ 3 × 10⁷ sites.
     ◮ Then a typical size of the matrix is n × n with
           n = V × 4 (spins) × 3 (colors) × 2 (complex) ∼ 8 × 10⁸,
       so a single vector takes ≃ 6.4 GB.
     ◮ For general matrix inversion, the computational cost is O(n^2.376) (optimized CW-like algorithms).
     ◮ If we were to store all entries of the inverse, we would need n × n ≃ 6.5 × 10¹⁷ entries, i.e., ≃ 5 × 10⁹ GB = 5 EB of memory.
     ◮ It is impractical in lattice computations to invert D explicitly and store all its entries.
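The back-of-the-envelope numbers above are easy to reproduce (assuming double precision, 8 bytes per real number):

```python
# Memory estimates from the slide, spelled out.
V = 128 * 64**3                     # lattice sites
n = V * 4 * 3 * 2                   # spins x colors x (real, imag)
bytes_per_real = 8                  # double precision

vector_gb = n * bytes_per_real / 1e9
matrix_gb = n * n * bytes_per_real / 1e9

print(f"n = {n:.2e}")                        # ~ 8e8
print(f"one vector:   {vector_gb:.1f} GB")   # ~ 6.4 GB
print(f"full inverse: {matrix_gb:.1e} GB")   # ~ 5e9 GB = 5 EB
```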

  9. Why do we need a good linear solver?
     ◮ We switch to solving Dx = b.
     ◮ b is usually a random vector or a point source, i.e., b = δ_{x,y} δ_{a,b}, where x, y are lattice sites and a, b are internal quantum numbers.
     ◮ In practice, computing point-to-all propagators, i.e., the solutions of Dx = b with a point source, is often sufficient, e.g., for two-point correlation functions.
     ◮ The inversion needs to be done many times. For example, computation of nucleon structure requires
           4 (spins) × 3 (colors) × 2 (flavors) × 5 (proj) × 200 (src pos) × 400 (configs) ∼ 10⁷
       inversions.
     We need a good solver!
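The inversion count is a straight product of the factors on the slide:

```python
# The inversion-count estimate from the slide, spelled out.
spins, colors, flavors = 4, 3, 2
projectors, source_positions, configs = 5, 200, 400
n_inversions = spins * colors * flavors * projectors * source_positions * configs
print(f"{n_inversions:.1e} inversions")  # ~ 1e7
```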

  10. Krylov Solver
     ◮ Due to the sparse nature of the matrix, the computational cost of a matrix-vector multiplication is O(n).
     ◮ We use a Krylov method, which consists of constructing a Krylov subspace
           K_k = span{ b, Db, D²b, D³b, · · · , D^{k−1} b },
       projecting the system onto K_k, and solving the projected system to obtain an approximate solution x_k.
     ◮ As it is a projection-based method, it does not require a large amount of memory.

  11. Krylov Solver - Critical Slowing Down
     ◮ The condition number of D is roughly proportional to the inverse of the mass.
     ◮ The matrix D becomes singular as the mass approaches its critical value m_crit.
     ◮ Convergence to the solution slows down accordingly.
     ◮ Inversion on, and generation of, gauge configurations at physical light quark masses therefore leads to a larger condition number and thus increasing solver time.
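The mass dependence of the condition number can be mimicked with a toy operator D = L + mI, where L is a free 1-D periodic Laplacian (positive semi-definite, smallest eigenvalue 0, largest 4) standing in for the massless Dirac operator:

```python
import numpy as np

# Toy demo of critical slowing down: cond(L + mI) ≈ 4/m as m -> 0.
L_sites = 32
lap = 2 * np.eye(L_sites) - np.eye(L_sites, k=1) - np.eye(L_sites, k=-1)
lap[0, -1] = lap[-1, 0] = -1          # periodic boundary => exact zero mode

for m in (1.0, 0.1, 0.01):
    D = lap + m * np.eye(L_sites)
    print(f"m = {m:5}: cond(D) = {np.linalg.cond(D):.1e}")
```

Since the spectrum of lap is [0, 4], the condition number is exactly (4 + m)/m, which blows up as the shift m goes to zero — the toy analogue of approaching m_crit.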

  12. Outline
     ◮ Introduction and Motivation
     ◮ Overview of DD α AMG: SAP, Coarse-grid correction, Performance
     ◮ Implementation of Multiple R.H.S.: Motivation, Implementation details, Scaling results, Tuning
     ◮ Fast Accurate Block Linear krylOv Solver (Fabulous): Basics, Parameters, Tuning plots
     ◮ Outlook

  13. Multigrid Solvers
     Among various solvers, a class of solvers based on multigrid approaches for preconditioning Krylov subspace solvers has turned out to be very successful (Luscher 2007b; Luscher 2007a; Osborn et al. 2010; R. Babich et al. 2010; Frommer et al. 2013). Openly available implementations for clover Wilson fermions include
     ◮ A two-level multigrid approach based on Lüscher's inexact deflation (Luscher 2007b), available as OpenQCD at http://luscher.web.cern.ch/luscher/openQCD/
     ◮ Multigrid with generalized conjugate residual (MG-GCR) (J. Brannick et al. 2008; Clark et al. 2008; Ronald Babich et al. 2009; R. Babich et al. 2010), available as part of the USQCD package QOPQDP at https://github.com/usqcd-software/qopqdp and QUDA at http://lattice.github.io/quda/
     ◮ An aggregation-based domain decomposition multigrid (DD α AMG) approach (Frommer et al. 2013), available at https://github.com/DDalphaAMG/DDalphaAMG

  14. Extending Multigrid Solvers
     Extension of DD α AMG to twisted mass fermions:
     ◮ C. Alexandrou, S. Bacchio, J. Finkenrath, A. Frommer, K. Kahl, and M. Rottmann, Adaptive Aggregation-based Domain Decomposition Multigrid for Twisted Mass Fermions, Phys. Rev. D 94, 11 (2016)
     ◮ The generalization to non-degenerate twisted mass fermions is discussed in (Alexandrou, Bacchio, and Finkenrath 2019)
     ◮ Publicly available at https://github.com/sbacchio/DDalphaAMG
     ◮ A version with new features at https://github.com/sy3394/DDalphaAMG
     Extension to other fermions:
     ◮ MG-GCR to domain-wall fermions in (Cohen et al. 2011)
     ◮ DD α AMG to overlap fermions in (James Brannick et al. 2016)

  15. DD α AMG - Basics
     ◮ Adaptive Aggregation-based Domain Decomposition Multigrid method
     ◮ The algorithm is designed to invert large sparse matrices efficiently
     ◮ In particular, it is applied to Dirac matrices for Wilson or twisted mass fermions
     ◮ It takes advantage of the sparse structure of the Dirac matrix.

  16. DD α AMG - Preconditioners
     ◮ To circumvent critical slowing down and invert the large sparse matrix efficiently, DD α AMG uses two preconditioners: a smoother and a coarse-grid correction.
     ◮ For the smoother, we use the red-black Schwarz Alternating Procedure (SAP) (Luscher 2007a).
     ◮ For the coarse-grid correction, we use an Algebraic MultiGrid (AMG) method (Wesseling 1995).
     Picture courtesy: Luke Olson, http://lukeo.cs.illinois.edu/cs556

  17. SAP
     ◮ The lattice is divided into red blocks and black blocks in a chessboard manner.
     ◮ Due to the nearest-neighbor interactions, a block of one color couples only to blocks of the other color.
     ◮ After reordering,
           D = [ D_rr  D_rb ]
               [ D_br  D_bb ]
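The reordering can be demonstrated on a toy 1-D chain of blocks with nearest-neighbor coupling (the chain length and coupling matrix are illustrative choices): after permuting even ("red") blocks before odd ("black") ones, the same-color blocks of D are diagonal.

```python
import numpy as np

# Red-black reordering sketch: nearest-neighbor block coupling on a chain.
nb = 8
# self-coupling on the diagonal, nearest-neighbor coupling off the diagonal
D = 2 * np.eye(nb) - np.eye(nb, k=1) - np.eye(nb, k=-1)

red = [i for i in range(nb) if i % 2 == 0]
black = [i for i in range(nb) if i % 2 == 1]
perm = red + black
D_reordered = D[np.ix_(perm, perm)]

h = nb // 2
D_rr = D_reordered[:h, :h]
D_bb = D_reordered[h:, h:]
# Within one color there is no nearest-neighbor coupling: D_rr, D_bb diagonal
assert np.allclose(D_rr, 2 * np.eye(h))
assert np.allclose(D_bb, 2 * np.eye(h))
print("red-red and black-black blocks are diagonal: OK")
```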

  18. SAP
     ◮ In SAP, we neglect the off-diagonal parts D_rb, D_br, i.e., the inter-block interactions.
     ◮ Then, within each group of blocks, we visit the blocks sequentially and invert each block matrix locally.

  19. SAP
     ◮ Depending on when we perform the global residual update, the error propagates differently.
     ◮ In additive SAP, the residual update r ← b − Dx is done once, before the local inversions over the blocks:
           ε ← ( I − Σ_i D_i⁻¹ D ) ε
     ◮ In multiplicative SAP, the residual update is done each time before the local inversion on a block:
           ε ← Π_i ( I − D_i⁻¹ D ) ε
     ◮ Information spreads faster with multiplicative SAP.
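The two error-propagation operators can be compared directly on a small stand-in matrix split into two blocks (the matrix and block split are illustrative choices, with D_i⁻¹ here meaning the zero-padded inverse of block i):

```python
import numpy as np

# Additive vs multiplicative Schwarz error propagators on a toy matrix:
#   E_add = I − Σ_i B_i D,   E_mul = Π_i (I − B_i D),
# where B_i is the zero-padded inverse of the i-th diagonal block.
n = 16
D = 2 * np.eye(n) + 0.5 * np.eye(n, k=1) + 0.5 * np.eye(n, k=-1)

def block_solver(D, idx, n):
    """B_i: zero matrix except the inverse of the block D[idx, idx]."""
    B = np.zeros((n, n))
    B[np.ix_(idx, idx)] = np.linalg.inv(D[np.ix_(idx, idx)])
    return B

blocks = [list(range(0, n // 2)), list(range(n // 2, n))]
Bs = [block_solver(D, idx, n) for idx in blocks]

I = np.eye(n)
E_add = I - (Bs[0] + Bs[1]) @ D
E_mul = (I - Bs[1] @ D) @ (I - Bs[0] @ D)
rho = lambda E: max(abs(np.linalg.eigvals(E)))
print(f"spectral radius, additive:       {rho(E_add):.3f}")
print(f"spectral radius, multiplicative: {rho(E_mul):.3f}")
```

For two non-overlapping blocks these are exactly block Jacobi and block Gauss-Seidel, and the multiplicative variant contracts the error faster — the "information spreads faster" statement in numbers.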

  20. SAP
     ◮ In red-black SAP, additive SAP is used within each group of same-colored blocks, and multiplicative SAP is used between the two groups.
     ◮ Define
           B_rr = [ D_rr⁻¹  0 ]       B_bb = [ 0   0     ]
                  [ 0       0 ]  and         [ 0   D_bb⁻¹ ].
     ◮ Then the error propagation is given by
           ε ← ( I − MD ) ε = ( I − B_rr D )( I − B_bb D ) ε.
     ◮ Since the local block matrices live on a single MPI rank, their inversion does not require global reductions, so SAP is well suited for parallelization.

  21. SAP
     ◮ As it involves the inversion of local block matrices, SAP removes UV modes from the residual.
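The UV-filtering property can be illustrated with a simpler smoother of the same family, damped Jacobi on a 1-D Laplacian (a stand-in for SAP, with illustrative sizes): high-frequency error modes are damped after a few sweeps while smooth modes barely change — which is exactly why a coarse-grid correction is needed for the IR part.

```python
import numpy as np

# Smoothers damp UV error modes fast, IR modes slowly (damped Jacobi toy).
n = 64
D = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # 1-D Dirichlet Laplacian
omega = 2.0 / 3.0
E = np.eye(n) - omega * 0.5 * D     # error propagator of damped Jacobi

x = np.arange(n)
low = np.sin(np.pi * (x + 1) / (n + 1))          # smooth (IR) mode
high = np.sin(np.pi * n * (x + 1) / (n + 1))     # oscillatory (UV) mode

def damped_norm(mode, sweeps=5):
    """Norm of a unit-norm error mode after a few smoothing sweeps."""
    e = mode / np.linalg.norm(mode)
    for _ in range(sweeps):
        e = E @ e
    return np.linalg.norm(e)

print(f"IR mode error after 5 sweeps: {damped_norm(low):.3f}")   # barely reduced
print(f"UV mode error after 5 sweeps: {damped_norm(high):.3f}")  # strongly damped
```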
