Scalable GW software for excited electrons using OpenAtom



  1. Scalable GW software for excited electrons using OpenAtom
      Kavitha Chandrasekar, Eric Mikida, Eric Bohm and Laxmikant Kale (University of Illinois at Urbana-Champaign)
      Kayahan Saritas, Minjung Kim and Sohrab Ismail-Beigi (Yale University)
      Glenn Martyna (Pimpernel Science, Software and Information Technology)

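  2. Electronic structure calculations
      § Schrödinger equation for a many-body system of nuclei (positions $\mathbf{R}_i$) and electrons (positions $\mathbf{r}_j$): $i\hbar\,\frac{\partial}{\partial t}|\Psi(t)\rangle = \hat{H}\,|\Psi(t)\rangle$; for stationary states this reduces to the time-independent form $\hat{H}|\Psi\rangle = E|\Psi\rangle$
      § Density functional theory (DFT) simplifies this to a one-body problem: solve for single-particle wavefunctions $\psi_n(\mathbf{r})$ and energies $\varepsilon_n$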

  3. Comparison of the methods
      [Figure: computational cost vs. number of atoms (1 to 10,000) for the exact Schrödinger equation, FCI $O(N!)$, CCSD(T) $O(N^7)$, QMC $O(N^3)$ to $O(N^4)$, GW, HF and DFT $O(N^3)$, and tight binding $O(N^3)$; annotations: chemical accuracy, relative energies, transition states?]

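  4. DFT problem with excitations
      [Figure: DFT ground state; filled valence band with highest occupied level $\varepsilon_N$ and empty conduction band with lowest unoccupied level $\varepsilon_{N+1}$, separated by the band gap]
      Janak's theorem: $E_{\text{gap}} = \left.\frac{\partial E}{\partial N}\right|_{N+\delta} - \left.\frac{\partial E}{\partial N}\right|_{N-\delta} = \varepsilon_{N+1} - \varepsilon_N$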

  5. DFT problem with excitations
      [Figure: same DFT ground-state band diagram and Janak's theorem as slide 4]
      Why are band gaps and excitations in a material important?
      § Metallic, semiconducting or insulating?
      § Light-matter interactions in general
      § Many engineering implications: photovoltaics (PV), lasers, luminescence …
      Band gaps (eV):
        Material   DFT   GW        Expt.
        Diamond    3.9   5.6*      5.48
        Si         0.5   1.3*      1.17
        LiCl       6.0   9.1*      9.4
        SrTiO3     2.0   3.4-3.8   3.25

  6. GW method
      Challenges
      § Memory intensive
      § Much larger number of conduction bands: a huge number of FFTs
      § Large, dense matrix multiplications
      § Unfavorable scaling: $O(N^4)$
      Goal
      § Efficient and highly scalable GW software
      § A method with $O(N^3)$ scaling

  7. What is expensive in GW?
      $P(\mathbf{r},\mathbf{r}') = -2\sum_{v}^{\text{occ}}\sum_{c}^{\text{unocc}} \frac{\psi_v(\mathbf{r})\,\psi_c^*(\mathbf{r})\,\psi_c(\mathbf{r}')\,\psi_v^*(\mathbf{r}')}{E_c - E_v}$, cost $\sim N_v N_c N_r^2 = O(N^4)$
      § Lots of FFTs to get the $\psi_n(\mathbf{r})$ functions: cost $\sim (N_v + N_c)\, N_r \ln N_r$
      § However, $\epsilon^{-1}$ can converge using a small r-grid*
      * Kim et al. (2020), Phys. Rev. B 101, 035139
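The sum-over-states formula can be written as a direct loop, which makes the $N_v N_c N_r^2$ cost visible as four nested loops. The minimal Python sketch below is not OpenAtom code; the function name and the toy states are made up for illustration. It forms the pair density $\psi_v(\mathbf{r})\psi_c^*(\mathbf{r})$ once per $(v, c)$ pair and accumulates its scaled outer product.

```python
def polarizability_direct(psi_v, e_v, psi_c, e_c):
    """Direct O(N^4) static polarizability (illustrative sketch).

    P[r][r'] = -2 * sum_v sum_c psi_v(r) psi_c*(r) psi_c(r') psi_v*(r') / (E_c - E_v)

    psi_v, psi_c: lists of states, each a list of N_r complex grid values.
    e_v, e_c: matching band energies (every E_c must exceed every E_v).
    """
    n_r = len(psi_v[0])
    P = [[0j] * n_r for _ in range(n_r)]
    for v, ev in zip(psi_v, e_v):              # N_v occupied states
        for c, ec in zip(psi_c, e_c):          # N_c unoccupied states
            scale = -2.0 / (ec - ev)
            pair = [v[r] * c[r].conjugate() for r in range(n_r)]  # psi_v(r) psi_c*(r)
            for r in range(n_r):               # N_r x N_r update per (v, c) pair
                for rp in range(n_r):
                    P[r][rp] += scale * pair[r] * pair[rp].conjugate()
    return P


# Toy data on a 3-point grid: one occupied and two unoccupied states.
psi_v = [[1 + 0j, 0.5j, 0.2 + 0j]]
psi_c = [[0.3 + 0j, 1 + 0j, -0.4j], [0.1j, 0.2 + 0j, 1 + 0j]]
P = polarizability_direct(psi_v, [-1.0], psi_c, [0.5, 1.2])
```

Because each term is a negatively scaled outer product of a pair density, the result should be Hermitian with a real, non-positive diagonal, which makes a cheap sanity check for any faster implementation.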

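  8. $O(N^3)$ algorithm (CTSP) for P
      CTSP: complex time shredded propagator
      Direct form, cost $N_r^2 N_\text{unocc} N_\text{occ} \sim N^4$:
        $P_{\mathbf{r},\mathbf{r}'} = -2\sum_{v}^{N_\text{occ}}\sum_{c}^{N_\text{unocc}} \frac{\psi_{\mathbf{r},v}\,\psi^*_{\mathbf{r},c}\,\psi_{\mathbf{r}',c}\,\psi^*_{\mathbf{r}',v}}{E_c - E_v}$, a special case of $Y_{\mathbf{r},\mathbf{r}'} = \sum_{\alpha}\sum_{\beta} \frac{A_{\mathbf{r},\mathbf{r}',\alpha}\,B_{\mathbf{r},\mathbf{r}',\beta}}{w + a_\alpha - b_\beta}$
      (1) Laplace transform (valid because $E_c > E_v$): $\frac{1}{E_c - E_v} = \int_0^\infty d\tau\, e^{-(E_c - E_v)\tau} = \int_0^\infty d\tau\, e^{E_v\tau}\, e^{-E_c\tau}$
      (2) Gauss-Laguerre quadrature: $\int_0^\infty d\tau\, e^{-\tau} g(\tau) \approx \sum_{j=1}^{N_q} \omega_j\, g(\tau_j)$, reducing the cost to $N_r^2 N_q (N_\text{unocc} + N_\text{occ}) \sim N^3$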
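Steps (1) and (2) can be checked numerically on a single energy difference $\Delta = E_c - E_v$. The sketch below is plain Python with made-up helper names; it finds the Gauss-Laguerre nodes as roots of the Laguerre polynomial $L_n$ by a grid scan plus bisection (simple, not fast) and uses the standard weight formula $w_i = 1/(x_i\,[L_n'(x_i)]^2)$. Rescaling $\tau = x/s$, with $s$ at most the smallest $\Delta$ (for example the gap), is an assumption made here to keep the integrand smooth: the rule is exact when $\Delta = s$ and degrades as $\Delta/s$ grows.

```python
import math


def laguerre(n, x):
    """Return (L_n(x), L_{n-1}(x)) via (k+1) L_{k+1} = (2k+1-x) L_k - k L_{k-1}."""
    if n == 0:
        return 1.0, 0.0
    p_prev, p = 1.0, 1.0 - x                  # L_0, L_1
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1 - x) * p - k * p_prev) / (k + 1)
    return p, p_prev


def gauss_laguerre(n, step=1e-3):
    """Nodes x_i and weights w_i with int_0^inf e^{-x} g(x) dx ~= sum_i w_i g(x_i)."""
    nodes = []
    x, f = step, laguerre(n, step)[0]
    upper = 4.0 * n + 6.0                     # generous bound on the largest root of L_n
    while len(nodes) < n and x < upper:
        x_next = x + step
        f_next = laguerre(n, x_next)[0]
        if f * f_next < 0:                    # sign change brackets a root: bisect it
            lo, hi = x, x_next
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if laguerre(n, lo)[0] * laguerre(n, mid)[0] <= 0:
                    hi = mid
                else:
                    lo = mid
            nodes.append(0.5 * (lo + hi))
        x, f = x_next, f_next
    weights = []
    for xi in nodes:
        ln, ln1 = laguerre(n, xi)
        dln = n * (ln - ln1) / xi             # x L_n'(x) = n (L_n(x) - L_{n-1}(x))
        weights.append(1.0 / (xi * dln * dln))
    return nodes, weights


def inverse_gap_quadrature(delta, scale, n=8):
    """Approximate 1/delta = int_0^inf exp(-delta*tau) dtau for delta >= scale > 0.

    With tau = x/scale: 1/delta = (1/scale) int_0^inf e^{-x} exp(-(delta/scale - 1) x) dx.
    """
    nodes, weights = gauss_laguerre(n)
    total = sum(w * math.exp(-(delta / scale - 1.0) * x) for x, w in zip(nodes, weights))
    return total / scale


for delta in (1.0, 2.0, 10.0):
    print(delta, inverse_gap_quadrature(delta, scale=1.0), 1.0 / delta)
```

With 8 points the error is negligible for $\Delta/s = 2$ but clearly visible for $\Delta/s = 10$, which is one way to see why a wide spread of energy differences is troublesome for a single quadrature rule.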

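  9. $O(N^3)$ algorithm (CTSP) for P
      $P_{\mathbf{r},\mathbf{r}'} \approx -2\sum_{j=1}^{N_q} \omega_j \Big[\sum_{v}^{N_\text{occ}} \psi_{\mathbf{r},v}\,\psi^*_{\mathbf{r}',v}\, e^{E_v\tau_j}\Big]\Big[\sum_{c}^{N_\text{unocc}} \psi^*_{\mathbf{r},c}\,\psi_{\mathbf{r}',c}\, e^{-E_c\tau_j}\Big]$, with quadrature nodes $\tau_j$ and weights $\omega_j$
      Each bracket is an $N_r \times N_r$ matrix built once per quadrature point, so the cost is $N_q (N_\text{unocc} + N_\text{occ})\, N_r^2 \sim N^3$
      (3) Energy windows: $P_{\mathbf{r},\mathbf{r}'} = \sum_{l}\sum_{m} P^{(lm)}_{\mathbf{r},\mathbf{r}'}$, where l runs over windows of occupied states and m over windows of unoccupied states
      [Figure: panels a) and b) illustrating how the occupied and unoccupied energies are split into windows]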
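The two cost expressions, and the way energy windows enter them, can be compared with a back-of-envelope model. The sketch below counts complex multiply-adds only; the function names and all inputs are hypothetical, and the per-window quadrature orders `n_quad[l][m]` are free parameters rather than values from the slides. It assumes that both $N_r \times N_r$ factor matrices are rebuilt for every quadrature point of every window pair.

```python
def direct_cost(n_r, n_occ, n_unocc):
    """Multiply-adds for the direct sum over states: N_r^2 * N_occ * N_unocc (O(N^4))."""
    return n_r ** 2 * n_occ * n_unocc


def ctsp_cost(n_r, occ_windows, unocc_windows, n_quad):
    """Multiply-adds for the factorized (Laplace + quadrature) form with energy windows.

    occ_windows, unocc_windows: number of states in each window.
    n_quad[l][m]: quadrature points used for window pair (l, m).
    Per quadrature point: one occupied factor (N_r^2 * n_l), one unoccupied
    factor (N_r^2 * n_m) and one element-wise product (N_r^2).
    """
    total = 0
    for l, n_l in enumerate(occ_windows):
        for m, n_m in enumerate(unocc_windows):
            total += n_quad[l][m] * n_r ** 2 * (n_l + n_m + 1)
    return total


# Hypothetical sizes: 1000 grid points, 100 occupied and 1000 unoccupied states.
print(direct_cost(1000, 100, 1000))              # -> 100000000000
print(ctsp_cost(1000, [100], [1000], [[10]]))    # -> 11010000000
```

With these made-up sizes the factorized form is about 9 times cheaper, and the gap widens with system size because the direct cost grows with the product of the state counts while the factorized cost grows with their sum.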

  10. Steps for typical GW calculations
      [Figure: flowchart of the steps in a typical GW calculation]
      • Most expensive: computing P in real space, addressed by the $O(N^3)$ method
      • Also expensive: an $O(N^4)$ step

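  11. $O(N^3)$ method for self-energy
      $\Sigma^{\pm}_{\mathbf{r},\mathbf{r}'}(\omega) = \sum_{n}\sum_{p} \frac{\psi_{\mathbf{r},n}\,\psi^*_{\mathbf{r}',n}\,B^{p}_{\mathbf{r},\mathbf{r}'}}{\omega - E_n \pm \omega_p}$, where $B^{p}_{\mathbf{r},\mathbf{r}'}$ are the residues and $\omega_p$ the energies of the poles of $W(\mathbf{r})$; this has the same form as $Y_{\mathbf{r},\mathbf{r}'} = \sum_{\alpha}\sum_{\beta} \frac{A_{\mathbf{r},\mathbf{r}',\alpha}\,B_{\mathbf{r},\mathbf{r}',\beta}}{w + a_\alpha - b_\beta}$
      § $\omega - E_n \pm \omega_p = 0$ is possible, and the Laplace representation needs a strictly positive denominator, so Gauss-Laguerre quadrature is not applicable
      § A new quadrature was needed and was developed: Hermite-Gauss-Laguerre quadrature, built on an integral representation of $1/(\omega - E_n \pm \omega_p)$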

  12. Results: Energy gap
      § MgO crystal (16 atoms): 433 bands, $N_{vw} = 1$, $N_{cw} = 4$
      § Si crystal (16 atoms): 399 bands, $N_{vw} = 1$, $N_{cw} = 4$
      * Kim et al. (2020), Phys. Rev. B 101, 035139

  13. Performance against other codes
      § Si crystal (16 atoms)
      § Number of bands: 399
      § 𝑂 JQ =15, 𝑂 IQ =30
      http://charm.cs.illinois.edu/OpenAtom/
      * Kim et al. (2019), Comput. Phys. Commun. 244, 427-441

  14. OpenAtom GW parallel scaling
      OpenAtom Team

  15. GW-BSE parallelization
      Phase                                           Serial     Parallel
      1  Compute P in R-space (N^4 and N^3 methods)   Complete   Complete
      2  FFT P to G-space                             Complete   Complete
      3  Invert epsilon                               Complete   Complete
      4  Plasmon pole                                 Complete   Future work
      5  COHSEX self-energy                           Complete   Complete
      6  Dynamic self-energy                          Complete   Future work

  16. GW Phase I: P-matrix computation ($N^4$ and $N^3$ methods)
      [Figure: Ψ vectors (L occupied, M unoccupied, …) stored in a 1D chare array; the R × R P matrix split into 2D tiles held in a 2D chare array]

  17. Parallel decomposition: input state vectors
      Duplicate the occupied and unoccupied states on each node
      [Figure: copies of the state vectors ψ on each node]

  18. Computation of the P matrix using the $N^3$ method
      • Outer loops run over windows of occupied and unoccupied states
      • The most expensive computation is forming the ρ and ρ' matrices
          for l = 1:Nvw                  # windows of occupied states
            for m = 1:Ncw                # windows of unoccupied states
              for j = 1:Nquad_lm         # quadrature points for window pair (l, m)
                calculate ρ^(l,m,j)
                calculate ρ'^(l,m,j)
                P[r,r'] += ρ^(l,m,j)[r,r'] × ρ'^(l,m,j)[r,r']
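The loop structure above can be sketched end to end in plain Python. This illustrative version is not the OpenAtom implementation, and all names are hypothetical. As an assumption of the sketch, each window pair (l, m) uses $\tau = x/\Delta_{lm}$ with $\Delta_{lm} = \min E_c - \max E_v$ over that pair, and energies are shifted by $\max E_v$ and $\min E_c$ so that every exponential is at most 1. The Gauss-Laguerre nodes are found by bisection on $L_n$, and the result is compared with the direct sum over states on toy data, once with a single window pair and once with the unoccupied states split into two windows.

```python
import math


def laguerre(n, x):
    """(L_n(x), L_{n-1}(x)) from the three-term recurrence, n >= 1."""
    p_prev, p = 1.0, 1.0 - x
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1 - x) * p - k * p_prev) / (k + 1)
    return p, p_prev


def gauss_laguerre(n, step=1e-3):
    """Roots of L_n by grid scan + bisection; weights w_i = 1 / (x_i L_n'(x_i)^2)."""
    nodes, x = [], step
    while len(nodes) < n:
        if laguerre(n, x)[0] * laguerre(n, x + step)[0] < 0:
            lo, hi = x, x + step
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if laguerre(n, lo)[0] * laguerre(n, mid)[0] <= 0:
                    hi = mid
                else:
                    lo = mid
            nodes.append(0.5 * (lo + hi))
        x += step
    weights = []
    for xi in nodes:
        ln, ln1 = laguerre(n, xi)
        d = n * (ln - ln1) / xi                       # L_n'(x_i)
        weights.append(1.0 / (xi * d * d))
    return nodes, weights


def weighted_outer(states, energies, factor):
    """Z[r][r'] = sum_i exp(factor * E_i) psi_i(r) psi_i*(r'); one ZGEMM in a real code."""
    n_r = len(states[0])
    Z = [[0j] * n_r for _ in range(n_r)]
    for psi, e in zip(states, energies):
        s = math.exp(factor * e)
        for r in range(n_r):
            for rp in range(n_r):
                Z[r][rp] += s * psi[r] * psi[rp].conjugate()
    return Z


def ctsp_polarizability(occ_windows, unocc_windows, n_quad=8):
    """P via Laplace transform + Gauss-Laguerre, looping over energy-window pairs.

    occ_windows / unocc_windows: lists of (states, energies) tuples, one per window.
    """
    nodes, weights = gauss_laguerre(n_quad)
    n_r = len(occ_windows[0][0][0])
    P = [[0j] * n_r for _ in range(n_r)]
    for psi_v, e_v in occ_windows:                    # l = 1..Nvw
        for psi_c, e_c in unocc_windows:              # m = 1..Ncw
            ev_max, ec_min = max(e_v), min(e_c)
            gap = ec_min - ev_max                     # smallest E_c - E_v in this pair
            for x, w in zip(nodes, weights):          # j = 1..Nquad
                tau = x / gap
                rho = weighted_outer(psi_v, [e - ev_max for e in e_v], tau)
                # conjugated below to give psi_c*(r) psi_c(r')
                rho_c = weighted_outer(psi_c, [e - ec_min for e in e_c], -tau)
                for r in range(n_r):
                    for rp in range(n_r):
                        P[r][rp] += -2.0 * (w / gap) * rho[r][rp] * rho_c[r][rp].conjugate()
    return P


def direct(psi_v, e_v, psi_c, e_c):
    """Reference O(N^4) sum over states."""
    n_r = len(psi_v[0])
    return [[sum(-2.0 * v[r] * c[r].conjugate() * c[rp] * v[rp].conjugate() / (ec - ev)
                 for v, ev in zip(psi_v, e_v) for c, ec in zip(psi_c, e_c))
             for rp in range(n_r)] for r in range(n_r)]


def max_err(A, B):
    return max(abs(A[i][j] - B[i][j]) for i in range(len(A)) for j in range(len(A)))


# Made-up toy data: 3 grid points, 2 occupied and 3 unoccupied states.
psi_v = [[1 + 0j, 0.5j, 0.2 + 0j], [0.3 + 0j, -0.2 + 0.1j, 0.9 + 0j]]
e_v = [-1.0, -0.8]
psi_c = [[0.3 + 0j, 1 + 0j, -0.4j], [0.1j, 0.2 + 0j, 1 + 0j], [0.5 + 0.5j, -0.3 + 0j, 0.2j]]
e_c = [0.5, 0.7, 1.5]
P_ref = direct(psi_v, e_v, psi_c, e_c)
P_one = ctsp_polarizability([(psi_v, e_v)], [(psi_c, e_c)])
P_win = ctsp_polarizability([(psi_v, e_v)], [(psi_c[:2], e_c[:2]), (psi_c[2:], e_c[2:])])
print(max_err(P_one, P_ref), max_err(P_win, P_ref))
```

Splitting the unoccupied states into two windows narrows the range of $E_c - E_v$ within each pair, and in this toy case the windowed result matches the direct sum far more closely than the single-window one at the same number of quadrature points.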

  19. Computation of the ρ matrix (using occupied states)
      • State vectors are represented by ψ
        ○ Number of occupied states = L; each state has N elements
        ○ All occupied states can be represented as a matrix ψ_V[1:L][1:N]
      • ρ as a sum of outer products of the rows of ψ_V[1:L]:
          for l = 1:L
            for r = 1:N
              for r' = 1:N
                ρ[r,r'] += ψ_V[l][r] × ψ_V[l][r']
      • Equivalently, ρ is a single ZGEMM (matrix multiply) of all of ψ_V^T with all of ψ_V, i.e. ZGEMM(ψ_V^T[1:N][1:L], ψ_V[1:L][1:N]):
          for r = 1:N
            for r' = 1:N
              for l = 1:L
                ρ[r,r'] += ψ_V^T[r][l] × ψ_V[l][r']
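The equivalence of the two loop orders can be checked directly. This sketch (hypothetical helper names, plain Python lists) implements both orders exactly as listed; like the slide, it applies no complex conjugation and no quadrature weight. For the polarizability, where the occupied factor is $\psi_v(\mathbf{r})\psi_v^*(\mathbf{r}')$, one operand would be conjugated, which BLAS ZGEMM supports through its `TRANSA`/`TRANSB` = 'C' (conjugate transpose) option.

```python
def rho_outer_products(psiV):
    """l outermost: accumulate one rank-1 outer product per occupied state."""
    L, N = len(psiV), len(psiV[0])
    rho = [[0j] * N for _ in range(N)]
    for l in range(L):
        for r in range(N):
            for rp in range(N):
                rho[r][rp] += psiV[l][r] * psiV[l][rp]
    return rho


def rho_gemm_order(psiV):
    """l innermost: textbook matrix multiply of psiV^T (N x L) with psiV (L x N)."""
    L, N = len(psiV), len(psiV[0])
    psiVT = [[psiV[l][r] for l in range(L)] for r in range(N)]   # explicit transpose
    rho = [[0j] * N for _ in range(N)]
    for r in range(N):
        for rp in range(N):
            acc = 0j
            for l in range(L):
                acc += psiVT[r][l] * psiV[l][rp]
            rho[r][rp] = acc
    return rho


# Made-up data: L = 2 occupied states on N = 3 grid points.
psiV = [[1 + 1j, 2 + 0j, 0.5j], [0 + 0j, 1 - 1j, 3 + 0j]]
rho_a = rho_outer_products(psiV)
rho_b = rho_gemm_order(psiV)
```

The outer-product order streams through one state at a time, while the multiply order is the one a tuned ZGEMM reorganizes internally with blocking; the arithmetic is the same.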

  20. Computation of the ρ' matrix (using unoccupied states)
      • Number of unoccupied states = M; each state has N elements
      • All unoccupied states can be represented as a matrix ψ_C[1:M][1:N]
      • ρ' as a sum of outer products of the rows of ψ_C[1:M]:
          for m = 1:M
            for r = 1:N
              for r' = 1:N
                ρ'[r,r'] += ψ_C[m][r] × ψ_C[m][r']
      • Equivalently, ρ' is a single ZGEMM (matrix multiply) of all of ψ_C^T with all of ψ_C, i.e. ZGEMM(ψ_C^T[1:N][1:M], ψ_C[1:M][1:N]):
          for r = 1:N
            for r' = 1:N
              for m = 1:M
                ρ'[r,r'] += ψ_C^T[r][m] × ψ_C[m][r']
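Including the quadrature weight for one point $\tau$ keeps this a single matrix product: $\rho'(\tau)[r,r'] = \sum_m e^{-E_m\tau}\,\psi_C[m][r]^*\,\psi_C[m][r']$ can be formed by scaling each row of $\psi_C$ once and then multiplying. The sketch below uses hypothetical names, places the conjugate as in the polarizability formula ($\psi_c^*(\mathbf{r})\psi_c(\mathbf{r}')$), and checks the row-scaled product against a naive triple loop.

```python
import math


def rho_prime_scaled_gemm(psiC, energies, tau):
    """rho'[r][r'] = sum_m exp(-E_m tau) conj(psiC[m][r]) psiC[m][r'].

    The weight is folded into the operand once (O(M N) work), so the N x N
    result is still one product of psiC^H (N x M) with the row-scaled psiC (M x N).
    """
    M, N = len(psiC), len(psiC[0])
    scaled = [[math.exp(-energies[m] * tau) * psiC[m][rp] for rp in range(N)]
              for m in range(M)]
    return [[sum(psiC[m][r].conjugate() * scaled[m][rp] for m in range(M))
             for rp in range(N)] for r in range(N)]


def rho_prime_naive(psiC, energies, tau):
    """Reference: weight applied inside the innermost update."""
    M, N = len(psiC), len(psiC[0])
    Z = [[0j] * N for _ in range(N)]
    for m in range(M):
        w = math.exp(-energies[m] * tau)
        for r in range(N):
            for rp in range(N):
                Z[r][rp] += w * psiC[m][r].conjugate() * psiC[m][rp]
    return Z


# Made-up data: M = 3 unoccupied states on N = 2 grid points.
psiC = [[1 + 0j, 0.5j], [0.2 - 0.1j, 1 + 0j], [0.3j, -0.4 + 0j]]
E = [0.5, 0.9, 1.4]
A = rho_prime_scaled_gemm(psiC, E, 0.7)
B = rho_prime_naive(psiC, E, 0.7)
```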

  21. Computation of the P matrix (tiled, $N^3$)
      [Figure: occupied states ψ_V(1:L) (L × N) → ZGEMM → ρ matrix (N × N); unoccupied states ψ_C(1:M) (M × N) → ZGEMM → ρ' matrix (N × N); element-wise multiply of ρ and ρ' → P matrix (N × N)]
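Because every node holds a full copy of the occupied and unoccupied states, each tile of P can be computed independently from row and column slices of those states. The sketch below assumes a hypothetical tile size and helper names, uses a single quadrature point with no weights, and uses plain sums where a real code would call one ZGEMM per tile. It assembles P tile by tile and checks that the tiling does not change the result.

```python
def tile_ranges(n, t):
    """Split indices 0..n-1 into contiguous [start, stop) blocks of size t (last may be short)."""
    return [(s, min(s + t, n)) for s in range(0, n, t)]


def p_tile(psiV, psiC, rows, cols):
    """One tile of rho * rho' (element-wise), using only the replicated state vectors."""
    (r0, r1), (c0, c1) = rows, cols
    block = []
    for r in range(r0, r1):
        row = []
        for rp in range(c0, c1):
            rho = sum(v[r] * v[rp].conjugate() for v in psiV)       # occupied factor
            rho_c = sum(c[r].conjugate() * c[rp] for c in psiC)     # unoccupied factor
            row.append(rho * rho_c)
        block.append(row)
    return block


def p_tiled(psiV, psiC, t):
    """Assemble P from independent t x t tiles (each could live on its own chare)."""
    n = len(psiV[0])
    P = [[0j] * n for _ in range(n)]
    for rows in tile_ranges(n, t):
        for cols in tile_ranges(n, t):
            block = p_tile(psiV, psiC, rows, cols)
            for i, r in enumerate(range(*rows)):
                for j, rp in enumerate(range(*cols)):
                    P[r][rp] = block[i][j]
    return P


# Made-up states: L = 2 occupied, M = 2 unoccupied, N = 5 grid points.
psiV = [[1, 0.5j, 0.2, -0.1j, 0.3], [0.1, 1j, -0.4, 0.2, 0.5j]]
psiC = [[0.3, 1, -0.4j, 0.1, 0.2j], [0.5j, 0.2, 1, -0.3j, 0.4]]
P_full = p_tiled(psiV, psiC, 5)   # one tile covering the whole matrix
P_2x2 = p_tiled(psiV, psiC, 2)    # 3 x 3 grid of (mostly 2 x 2) tiles
```

No tile reads another tile's output, so with the states replicated per node the tiles need no communication until P itself is gathered.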

  22. Performance of the $N^3$ method
      [Figures: execution time (log scale) vs. node count (8, 16, 32, 64) for the $N^4$ and $N^3$ methods on Intel KNL nodes (128 cores per node) and Intel Skylake nodes (48 cores per node), Stampede2]
      • The $N^3$ method is about an order of magnitude faster than the $N^4$ method for the Si108 (108-atom) dataset
        ○ Output matrix size: 20k × 20k
      • Scales well on Intel KNL and Skylake nodes
      • Future work: scaling results for larger datasets

  23. Questions?
