Scalable GW software for excited electrons using OpenAtom Kavitha - PowerPoint PPT Presentation

Scalable GW software for excited electrons using OpenAtom Kavitha Chandrasekar, Eric Mikida, Eric Bohm and Laxmikant Kale University of Illinois at Urbana-Champaign Kayahan Saritas, Minjung Kim and Sohrab Ismail-Beigi Yale University Glenn Martyna Pimpernel Science, Software and Information Technology

Electronic structure calculations § Time independent Schrodinger equation for a many-body system 𝑗ℏ 𝜖 Ψ(𝑢) = + ⟩ ⟩ 𝜖𝑢 | 𝐼| Ψ(𝑢) Many R i & r j § Density functional theory (DFT) simplifies this to one-body problem Solve for wavefunctions 𝜔 ! (𝑠) and energies 𝜗 !

Comparison of the methods Exact Schrödinger Equation FCI O(N!) CCSD(T) Chemical Chemical O(N 7 ) Accuracy Accuracy QMC Computational Cost O(N 3-4 ) GW Relative Relative HF, DFT O(N 3 ) Energies Energies Transition Transition Tight binding States? States? O(N 3 ) 1 10 100 1,000 10,000 Number of atoms

DFT problem with excitations DFT: ground state . Conduction band . (empty) . 𝜗 !"# Band gap 𝜗 ) 𝜗 ! . Valence band . (filled) . Janak’s theorem 𝜖𝐹 𝜖𝐹 𝐹 $%& = % − % = 𝜗 !"# − 𝜗 ! 𝜖𝑂 !"' 𝜖𝑂 !('

DFT problem with excitations DFT: ground state . Why band gap/excitations in a material is important? Conduction band . Metallic, semiconducting or insulating? § (empty) . 𝜗 !"# Light-matter interactions in general § Band gap 𝜗 ) A lot of engineering implications: PV, lasers, luminescence … § 𝜗 ! . Valence band Band gaps (eV) . (filled) . Material DFT GW Expt. Diamond 3.9 5.6* 5.48 Si 0.5 1.3* 1.17 𝜖𝐹 𝜖𝐹 𝐹 $%& = % − % = 𝜗 !"# − 𝜗 ! 𝜖𝑂 !"' 𝜖𝑂 !(' LiCl 6.0 9.1* 9.4 SrTiO 3 2.0 3.4-3.8 3.25

GW method Challenges § Memory intensive § Much larger number of conduction bands: Huge number of FFTs § Large and dense matrix multiplications § Unfavorable scaling 𝑃(𝑂 4 ) Goal § Efficient and highly scalable GW software § 𝑃(𝑂 3 ) scaling method

What is expensive in GW? ~ 𝑂 * + 𝑂 + 𝑂 , ln 𝑂 , 𝑄 𝑠, 𝑠 ) = - = 𝑃(𝑂 . ) ~ 𝑂 * 𝑂 + 𝑂 , .1234 𝜔 * (𝑠)𝜔 0 (𝑠)𝜔 * (𝑠 ) )𝜔 0 (𝑠 ) ) +,--./ −2 4 4 𝐹 * − 𝐹 0 * 0 - ln 𝑂 , ~2𝑂 , Lots of FFTs to get 𝜔 ! (𝑠) functions § However, 𝜗 "# can converge using a § small r-grid * Kim et al., (2020), Phys. Rev. B., 101, pp. 035139

O(N 3 ) algorithm (CTSP) for P CTSP: Complex time shredded propagator > $%"## 𝜔 <,* > & > ' > "## ∗ 𝜔 <,0 𝜔 < ! ,0 ∗ 𝜔 < ! ,* 𝐵 <,<) 𝐶 <,<) 𝑌 <,< ! = 4 4 𝑄 <,< ! = −2 4 4 N r2 N unocc N occ ~ N 4 𝑥 + 𝑏 @ − 𝑐 𝐹 0 − 𝐹 * A @ A * 0 ' ' ' 1 𝑔(𝜐)𝑓 ") 𝑒𝜐 (1) Laplace transform: 𝑓 " ( ! "( " ) 𝑒𝜐 = * 𝑓 "( ! ) 𝑓 ( " ) 𝑒𝜐 = * = * 𝐹 $ − 𝐹 % & & & + # ' 𝑔(𝜐)𝑓 ") 𝑒𝜐 ≈ 0 N r2 N q (N unocc +N occ )~ N 3 (2) Gauss-Laguerre quadrature: * 𝜕 * 𝑔 𝜐 * & * 𝑂 " 𝑂 !

O(N 3 ) algorithm (CTSP) for P > ( > "## > $%"## ∗ 𝜔 <,0 𝜔 < ! ,0 ∗ 𝑄 <,< ! = −2 4 4 𝜔 <,* 𝜔 < ! ,* 4 𝜕 B 𝑔 𝜐 B * 0 B > ( > "## > $%"## ∗ ∗ 𝑓 C ) D * ][ 4 𝑓 EC # D * ] = 4 𝜕 B [4 𝜔 <,* 𝜔 < ! ,* 𝜔 <,0 𝜔 < ! ,0 N q (N unocc +N occ ) N r2 ~ N 3 B * 0 ( /0 ( 10 (3) Energy windows: ') 𝑄 $,$& = ( ( 𝑄 $,$& ' ) 𝐹 ! a) , ',- , ',# , ',$ , & , *,- , *,# , *,$ , *,. , *,/ " #$ ($ #$ , &; ( = 0) (&'() b) , #$ - ! !,# (*+) , #$ ,,$

Steps for typical GW calculations Most expensive • Real-space P • O(N 3 ) method Also expensive - O(N 4 )

O(N 3 ) method for self-energy J 𝜔 <I 𝜔 < ! I > & > ' ∗ 𝐶 <,< ! 𝐵 <,<) 𝐶 <,<) & : residues GHI = 4 Σ ± (𝜕) <,< ! 𝐶 $,$ ! 𝑌 <,< ! = 4 4 𝑥 + 𝑏 @ − 𝑐 𝜕 − 𝐹 I ± 𝜕 J 𝜕 & : energies of the poles of 𝑋(𝑠) $,$' A @ A J,I § 𝜕 − 𝜗 I ± 𝜕 J =0 is possible: Gauss-Laguerre quadrature not applicable § New quadrature is needed and was developed: Hermite-Gauss-Laguerre quadrature L 1 𝑒𝜐𝑓 EDED + /N 𝑓 @(OEC % ±O , )D = 𝐽𝑛 L 𝜕 − 𝐹 I ± 𝜕 J K

Results: Energy gap § MgO crystal (16 atoms) § Si crystal (16 atoms) § Number of bands: 433 § Number of bands: 399 § 𝑂 Q* =1, 𝑂 Q0 =4 § 𝑂 Q* =1, 𝑂 Q0 =4 * Kim et al., (2020), Phys. Rev. B., 101, pp. 035139

Performance against other codes § Si crystal (16 atoms) § Number of bands: 399 § 𝑂 JQ =15, 𝑂 IQ =30 http://charm.cs.illinois.edu/OpenAtom/ * Kim et al., (2019), Comput. Phys. Commun., 244, pp. 427-441

OpenAtom GW Parallel Scaling OpenAtom Team

GW-BSE Parallelization Phase Serial Parallel 1 Compute P in Rspace Complete Complete (N 4 and N 3 methods) 2 FFT P to GSpace Complete Complete 3 Invert epsilon Complete Complete 4 Plasmon pole Complete Future Work 5 COHSEX Self-energy Complete Complete 6 Dynamic Self-energy Complete Future Work 14

GW Phase-I P Matrix Computation (N 4 and N 3 method) Ψ Vectors 1D Chare Array L occupied M unoccupied … R P Matrix 2D Tiles 2D Chare Array R R 15

Parallel Decomposition: Input state vectors Duplicate occupied and unoccupied states on each node ψ ψ ψ ψ ψ 16

Computation of Pmatrix using N 3 method • Outer loops are windows of occupied and unoccupied states • Most expensive computation - 𝜍 and 𝜍 ) matrices for l = 1:Nvw for m = 1:Ncw for j = 1:Nquad lm calculate 𝜍 01')! calculate 𝜍 &01')! P[r,r’] += 𝜍 01')! [r,r’] x 𝜍 &01')! [r,r’]

Computation 𝜍 matrix (Using occupied states) • State vectors are represented with ψ ○ Number of occupied states = L, each state has N elements ○ All occupied states can be represented as a matrix ψ V [1: L][1:N]) 𝜍 2345) -> Same as ZGEMM of all ψ V and all ψ VT 𝜍 2345) -> Add elements of outer product of ψ V [1:L] ZGEMM ( ψ VT [1: N][1:L] , ψ V [1: L][1:N]) (i.e matrix multiply ) for l=1:L for r=1:N for r=1:N for r’=1:N for r’=1:N 𝜍 2345) [r,r’] += ψ V [l] T [r] x ψ V [l][r’] for l=1:L 𝜍 2345) [r,r’] += ψ VT [r] [l] x ψ V [l][r’]

Computation 𝜍 ’ matrix (Using unoccupied states) • Number of unoccupied states = M, each state has N elements • All unoccupied states can be represented as a matrix ψ C [1:M ][1:N]) 𝜍 2345) -> Same as ZGEMM of all ψ C and all ψ CT 𝜍 2345) -> Add elements of outer product of ψ C [1:M] ZGEMM ( ψ CT [1: N][1:M] , ψ C [1:M ][1:N]) (i.e matrix multiply ) for m=1:M for r=1:N for r=1:N for r’=1:N for r’=1:N 𝜍′ 2345) [r,r’] += ψ C [m] T [r] x ψ C [m][r’] for m=1:M 𝜍′ 2345) [r,r’] += ψ CT [r] [m] x ψ C [m][r’]

Computation of P-matrix (tiled) (N 3 ) Occupied states ψ V (1:L) L Unoccupied states ψ C (1:M) M N N N N M L (ZGEMM) (ZGEMM) P Matrix 𝜍 matrix 𝜍 ’ matrix N (Element-wise multiply) N N of 𝜍 & 𝜍 ’ matrix N N N

Performance of N 3 method Intel KNL nodes (Stampede2) 10000 N 4 method N 3 method Execution Time • N 3 method is an order faster than 1000 N 4 method for Si108 atoms dataset ○ 20k X 20k output matrix size 100 8 16 32 64 • Scales well on Intel KNL and Node count (128 cores per node) SkyLake nodes Intel Skylake nodes (Stampede2) • Future scaling results for larger 10000 N 4 method N 3 method datasets Execution Time 1000 100 10 8 16 32 64 Node count (48 cores per node)

Questions?

Scalable GW software for excited electrons using OpenAtom Kavitha - PowerPoint PPT Presentation

Scalable GW software for excited electrons using OpenAtom Kavitha Chandrasekar, Eric Mikida, Eric Bohm and Laxmikant Kale University of Illinois at Urbana-Champaign Kayahan Saritas, Minjung Kim and Sohrab Ismail-Beigi Yale University Glenn

EXCITED EXCITED Report from the EXCITED Workshop held in Arlington, VA An NSF An NSF February

Linux and real-time Thomas Petazzoni Free Electrons 1 Free Electrons . Kernel, drivers and

Molecular and Electronic Dynamics Using the OpenAtom Software Sohrab Ismail-Beigi (Yale Applied

OpenAtom: Fast, fine grained parallel electronic structure software for materials science,

Electroluminescence The result of radiative recombination of electrons and holes in a material.

Anatomy of cross-compilation toolchains Thomas Petazzoni thomas.petazzoni@free-electrons.com

ARM support in the Linux kernel Thomas Petazzoni Free Electrons

Quantization of Spatial (Orbital) Angular Momentum of Electrons in the atom Na/Ag atoms: ns1:

Projector Augmented Wave based Kohn Sham Density Functional Theory in OpenAtom with N 2 log

OpenAtom: First Principles GW method for electronic excitation Minjung Kim, Subhasish Mandal, and

Cache Coherence in Scalable Machines Scalable Cache Coherent Systems Scalable, distributed

Using Yocto Project for module manufacturers Alexandre Belloni Free Electrons

Chemistry beyond ground state Excited state phenomena Fluorescence spectra are mirror images

Large Deviations and Slowdown Asymptotics of Excited Random Walks Jonathon Peterson Department

First excited state of a 100*100 box vs number of iterations (N) Graphing: Starting from a

The Scalable Commutativity Rule: Designing Scalable Software for Multicore Processors Austin T.

Working Principle of a Semiconductor Based Solar Cell Band Gap II - Electrons in Molecular Bonds

Lecture 02 Digital Modulation I-Hsiang Wang ihwang@ntu.edu.tw National Taiwan University

Red-Edge Bands for Wetland Classification Gordana KAPLAN, PhD Assoc. Prof. Dr. Ugur AVDAN

Violation of bulk-edge correspondence in a hydrodynamic model Gian Michele Graf ETH Zurich PhD

158 SiO 2 can experience visco-elastic flow at high temperature. Viscosity is the resistance shown

nearly free electron approxima0on Bloch equa0on E ! ( b p + ~ ~ k ) 2

4. Electrons in a solid 3. Crystal vibrations 4.2. Free-electron gas 3. Crystal vibrations 90

The FP-LAPW and APW+lo bandstructure methods as implemented in WIEN2k Peter Blaha Institute of

Sambuz

Useful Links

Newsletter

Mail Us