OpenAtom: First Principles GW method for electronic excitation

Minjung Kim, Subhasish Mandal, and Sohrab Ismail-Beigi (Yale University)
Eric Mikida, Kavitha Chandrasekar, Eric Bohm, Nikhil Jain, and Laxmikant Kale (University of Illinois at Urbana-Champaign)
Qi Li and Glenn Martyna (IBM T.J. Watson Research Center)

Density Functional Theory (DFT)
• Energy functional $E[n]$ of the electron density $n(r)$
• Minimizing over $n(r)$ gives the exact
‣ Ground-state energy $E_0$
‣ Ground-state density $n(r)$
• The minimum condition is equivalent to the Kohn-Sham equations
• LDA/GGA for $E_{xc}$: good geometries and total energies
• Bad band gaps and excitations
Hohenberg & Kohn, Phys. Rev. (1964); Kohn and Sham, Phys. Rev. (1965).

DFT: problems with excitations
Energy gaps (eV)
Material   LDA   Expt. [1]
Diamond    3.9   5.48
Si         0.5   1.17
LiCl       6.0   9.4
SrTiO3     2.0   3.25
[1] Landolt-Börnstein, vol. III; Baldini & Bosacchi, Phys. Stat. Solidi (1970).
[Figure: solar spectrum]

DFT: problems with energy alignment
Interfacial systems:
• Electrons (e-) can transfer across the interface
• Transfer depends on energy level alignment across the interface
• DFT has errors in band energies
• Is any of it real?

One-particle Green's function
[Diagram: propagation from (r', 0) to (r, t)]
[Equations: Dyson equation; DFT counterpart]

Green's function successes
Quasiparticle gaps (eV)
Material   LDA   GW        Expt.
Diamond    3.9   5.6*      5.48
Si         0.5   1.3*      1.17
LiCl       6.0   9.1*      9.4
SrTiO3     2.0   3.4-3.8   3.25
* Hybertsen & Louie, Phys. Rev. B (1986)
[Figure: band structure of Cu. Strokov et al., PRL/PRB (1998/2001)]

What is a big system for GW?
[Figures: P3HT polymer; zinc oxide nanowire]
• Band alignment for this potential photovoltaic system?
• 100s of atoms/unit cell
• Not possible routinely (with current software)

GW is expensive
Scaling with number of atoms N:
• DFT: $N^3$
• GW: $N^4$ (gives better bands)
• BSE: $N^6$ (gives optical excitations)
But in practice GW is the killer. For a nanoscale system with 50-75 atoms (GaN):
• DFT: 1 CPU-hour
• GW: 91 CPU-hours
• BSE: 2 CPU-hours
∴ Focus on GW
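To make the exponents concrete, here is a minimal sketch of how each method's cost would multiply when the atom count grows by some factor. It assumes pure power-law scaling with an unchanged prefactor; the names `cost_growth` and `growth_table` are hypothetical and not part of OpenAtom.

```python
# Exponents quoted on the slide: cost ~ N**exponent.
SCALING = {"DFT": 3, "GW": 4, "BSE": 6}


def cost_growth(factor, exponent):
    """Multiplicative cost increase when N grows by `factor` (pure power law assumed)."""
    return factor ** exponent


def growth_table(factor):
    """Cost multiplier for every method when the system is `factor` times larger."""
    return {method: cost_growth(factor, power) for method, power in SCALING.items()}


for method, growth in growth_table(2).items():
    print(f"{method}: doubling N multiplies the cost by {growth}")
```

The measured GaN timings on the slide show why the exponent alone is not the whole story: the prefactors differ, and in that example GW dominates even though BSE has the higher formal exponent.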

Steps for typical $G_0W_0$ calculation
Stage 1: Run DFT calc. on structure → output: $\varepsilon_i$ and $\psi_i(r)$
Stage 2.1: compute polarizability matrix $P(r,r') = \partial n(r) / \partial V(r')$
Stage 2.2: double FFT rows and columns → $P(G,G')$
Stage 3: compute and invert dielectric screening function $\epsilon = I - \sqrt{V_{coul}} * P * \sqrt{V_{coul}}$ → $\epsilon^{-1}$
Stage 4: "plasmon-pole" method → dynamic screening → $\epsilon^{-1}(\omega)$
Stage 5: put together $\varepsilon_i$, $\psi_i(r)$ and $\epsilon^{-1}(\omega)$ → self-energy $\Sigma(\omega)$
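Stage 3 can be illustrated on a toy problem. The sketch below builds the symmetrized dielectric matrix from a polarizability matrix and inverts it. It assumes the Coulomb interaction is diagonal (as in a plane-wave basis) and uses made-up 3x3 numbers; `symmetrized_epsilon`, `invert`, `V_COUL`, and `P_TOY` are hypothetical names, and production codes would use a parallel dense solver instead of this small Gauss-Jordan routine.

```python
import math


def symmetrized_epsilon(v_coul, p):
    """eps = I - sqrt(V) P sqrt(V) for a diagonal Coulomb interaction v_coul."""
    n = len(v_coul)
    root_v = [math.sqrt(v) for v in v_coul]
    return [[(1.0 if i == j else 0.0) - root_v[i] * p[i][j] * root_v[j]
             for j in range(n)] for i in range(n)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (fine for small toy matrices)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Made-up toy inputs: diagonal Coulomb elements and a symmetric static P.
V_COUL = [4.0, 1.0, 0.25]
P_TOY = [[-0.30, 0.05, 0.00],
         [0.05, -0.20, 0.02],
         [0.00, 0.02, -0.10]]

EPS = symmetrized_epsilon(V_COUL, P_TOY)
EPS_INV = invert(EPS)
print(EPS_INV[0])
```

The symmetrized form keeps the matrix symmetric whenever P is, which is why the square roots of the Coulomb interaction appear on both sides.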

What is so expensive in GW?
One key element: response of electrons to perturbation
$P(r,r')$ = response of the electron density $n(r)$ at position $r$ to a change of the potential $V(r')$ at position $r'$

What is so expensive in GW?
One key element: response of electrons to perturbation
Standard perturbation theory expression
Problems:
1. Must generate "all" empty states (sum over c)
2. Lots of FFTs to get the functions $\psi_i(r)$
3. Enormous outer product to form P
4. Dense r grid: P huge in memory
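The perturbation-theory expression itself was an image and does not survive in this transcript. A form consistent with the r-space Laplace expression later in the deck (obtained by carrying out the integral over x there) would be the sum over occupied (v) and unoccupied (c) states below; the prefactor and conjugation pattern follow that later slide and are an assumption about what was shown here.

```latex
P(r,r') \;=\; -2 \sum_{c} \sum_{v}
  \frac{\psi_c^{*}(r)\,\psi_c(r')\,\psi_v(r)\,\psi_v^{*}(r')}
       {\varepsilon_c - \varepsilon_v}
```

Every (c, v) pair contributes an outer product over the full r grid, which is the origin of problems 1, 3, and 4 in the list above.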

Computing P in Charm++
Basic computation:
for all l, m: $f_{lm} = \psi_l \times \psi_m^*$
for all f: $P$ += $f_{lm} f_{lm}^\dagger$
Parallel decomposition:
• Ψ vectors: 1D chare array (L occupied, M unoccupied, each of length R)
• P matrix: 2D tiles in a 2D chare array (R × R)

Computing P in Charm++
1. Duplicate occupied states on each node
2. Broadcast an unoccupied state to compute f vectors
3. Locally update each matrix tile
4. Repeat step 2 for next unoccupied state
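The four steps can be mimicked serially to show the data flow. The following is a minimal sketch in plain Python, not Charm++ code: tiles stand in for elements of the 2D chare array, the loop over unoccupied states stands in for the broadcast, and the accumulation follows the slide's basic computation (P += f f†, with no energy denominators). All function names and the toy states are hypothetical.

```python
def f_vector(psi_l, psi_m):
    """Pointwise product f_lm(r) = psi_l(r) * conj(psi_m(r))."""
    return [a * b.conjugate() for a, b in zip(psi_l, psi_m)]


def make_tiles(n_r, tile):
    """Zeroed tiles of an n_r x n_r matrix, keyed by (row_start, col_start)."""
    starts = range(0, n_r, tile)
    return {(i, j): [[0j] * min(tile, n_r - j) for _ in range(min(tile, n_r - i))]
            for i in starts for j in starts}


def compute_p_tiled(occupied, unoccupied, tile):
    tiles = make_tiles(len(occupied[0]), tile)
    for psi_m in unoccupied:  # steps 2 and 4: one "broadcast" per unoccupied state
        # Step 1: the occupied states are assumed to be available locally.
        f_vectors = [f_vector(psi_l, psi_m) for psi_l in occupied]
        for (i0, j0), block in tiles.items():  # step 3: tiles update independently
            for f in f_vectors:
                for a, row in enumerate(block):
                    for b in range(len(row)):
                        row[b] += f[i0 + a] * f[j0 + b].conjugate()
    return tiles


def assemble(tiles, n_r):
    """Gather the tiles into one dense matrix (only for checking the result)."""
    full = [[0j] * n_r for _ in range(n_r)]
    for (i0, j0), block in tiles.items():
        for a, row in enumerate(block):
            for b, value in enumerate(row):
                full[i0 + a][j0 + b] = value
    return full


def compute_p_reference(occupied, unoccupied):
    """Untiled accumulation of P = sum over (l, m) of f_lm f_lm^dagger."""
    n_r = len(occupied[0])
    p = [[0j] * n_r for _ in range(n_r)]
    for psi_l in occupied:
        for psi_m in unoccupied:
            f = f_vector(psi_l, psi_m)
            for r in range(n_r):
                for rp in range(n_r):
                    p[r][rp] += f[r] * f[rp].conjugate()
    return p


# Made-up toy states on a 5-point grid: 2 occupied, 3 unoccupied.
OCC = [[1 + 0j, 0.5j, -0.2 + 0j, 0.3 + 0.1j, 0.7 + 0j],
       [0.4 + 0j, -0.6 + 0.2j, 0.1j, 0.9 + 0j, -0.3 + 0j]]
UNOCC = [[0.2 + 0.1j, 0.8 + 0j, -0.5j, 0.1 + 0j, 0.6 + 0j],
         [-0.7 + 0j, 0.3j, 0.4 + 0j, 0.2 - 0.2j, 0.5 + 0j],
         [0.1 + 0j, 0.1 + 0.9j, -0.3 + 0j, 0.6j, -0.8 + 0j]]

P_TILES = compute_p_tiled(OCC, UNOCC, tile=2)
print(len(P_TILES), "tiles")
```

Because each tile only reads the f vectors and writes its own block, the tile updates need no communication with one another, which is the property the 2D decomposition relies on.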

Parallel performance: P calculation
• 108 atom bulk Si
• 216 occupied
• 1832 unoccupied
• 1 k point
• 32 processors per node
• FFT grids (same accuracy): OA 42x42x22, BGW 111x55x55
Supercomputer: Mira (ANL), BlueGene/Q

Parallel performance: P calculation
• 108 atom bulk Si
• 216 occupied
• 1832 unoccupied
• 1 k point
• 32 processors per node
• FFT grids (same accuracy): OA 42x42x22, BGW 111x55x55
[Figure: "Scaling on Blue Waters", 32 cores per node; time (sec) vs. number of nodes on log-log axes; OpenAtom vs. BerkeleyGW 1.2]
Supercomputer: Blue Waters (NCSA), Cray XE6

Reducing the scaling: quartic to cubic
• $O(N^4) = N_r^2 \times N_v \times N_c$
• Sum-over-states (i.e., the sum over unoccupied c bands) is not to blame: removing the unoccupied states is still $O(N^4)$ but with a lower prefactor*
• Working in r-space can reduce to $O(N^3)$ [see also †]
* Bruneval and Gonze, PRB 78 (2008); Berger, Reining, Sottile, PRB 82 (2010)
* Umari, Stenuit, Baroni, PRB 81 (2010)
* Giustino, Cohen, Louie, PRB 81 (2010)
* Wilson, Gygi, Galli, PRB 78 (2008); Govoni, Galli, J. Chem. Th. Comp. 11 (2015)
* Gao, Xia, Gao, Zhang, Sci. Rep. 6 (2016)
† Foerster, Koval, Sanchez-Portal, JCP 135 (2011)
† Liu, Kaltak, Klimes and Kresse, PRB 94 (2016)

What's special about r-space?
Quasi-philosophical: all bases are good in quantum mechanics, so why is r-space special? Observable is diagonal in the best basis.
Practical: P is separable in r-space.
$\frac{1}{\varepsilon_c - \varepsilon_v} = \int_0^\infty dx\, e^{-(\varepsilon_c - \varepsilon_v)x}$
$P(r,r') = -2 \int_0^\infty dx \sum_c \psi_c^*(r)\psi_c(r') e^{-\varepsilon_c x} \sum_v \psi_v(r)\psi_v^*(r') e^{\varepsilon_v x}$ (separable)
Gauss-Laguerre quadrature: $\int_0^\infty f(x) e^{-x} dx \approx \sum_k^{N_{GL}} \omega_k f(x_k)$
$P(r,r') = -2 \sum_k^{N_{GL}} \omega_k e^{x_k} \sum_c \psi_c^*(r)\psi_c(r') e^{-\varepsilon_c x_k} \sum_v \psi_v(r)\psi_v^*(r') e^{\varepsilon_v x_k}$
Cost: $N_{GL} N_r^2 (N_c + N_v) \propto N^3$; $N_{GL}$ is intensive.
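The separability argument can be checked numerically on a toy model. The sketch below compares the direct sum over (c, v) pairs with the Gauss-Laguerre form for a few made-up real orbitals. Several things here are assumptions for illustration only: the function names, the toy orbitals and energies, the bisection-based node finder (chosen because it needs only the standard library), and the `scale` parameter, which rescales the quadrature variable so that the nodes suit the range of energy differences.

```python
import math


def laguerre(n, x):
    """Laguerre polynomial L_n(x) from the three-term recurrence."""
    p_prev, p = 0.0, 1.0
    for j in range(1, n + 1):
        p_prev, p = p, ((2 * j - 1 - x) * p - (j - 1) * p_prev) / j
    return p


def gauss_laguerre(n):
    """Nodes and weights for the weight e^{-x} on [0, inf).

    Roots of L_k interlace those of L_{k-1}, so each root is bracketed and
    found by bisection; 4k + 3 is an upper bound on the largest root.
    """
    roots = []
    for k in range(1, n + 1):
        edges = [0.0] + roots + [4.0 * k + 3.0]
        new_roots = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            f_lo = laguerre(k, lo)
            for _ in range(80):
                mid = 0.5 * (lo + hi)
                f_mid = laguerre(k, mid)
                if f_lo * f_mid > 0.0:
                    lo, f_lo = mid, f_mid
                else:
                    hi = mid
            new_roots.append(0.5 * (lo + hi))
        roots = new_roots
    weights = [x / ((n + 1) ** 2 * laguerre(n + 1, x) ** 2) for x in roots]
    return roots, weights


def p_direct(psi_v, psi_c, e_v, e_c):
    """Sum over states for real orbitals: cost ~ N_r^2 * N_v * N_c."""
    n_r = len(psi_v[0])
    p = [[0.0] * n_r for _ in range(n_r)]
    for pc, ec in zip(psi_c, e_c):
        for pv, ev in zip(psi_v, e_v):
            for r in range(n_r):
                for rp in range(n_r):
                    p[r][rp] += -2.0 * pc[r] * pc[rp] * pv[r] * pv[rp] / (ec - ev)
    return p


def p_laplace(psi_v, psi_c, e_v, e_c, n_gl, scale):
    """Separable form: cost ~ N_GL * N_r^2 * (N_v + N_c)."""
    n_r = len(psi_v[0])
    nodes, weights = gauss_laguerre(n_gl)
    p = [[0.0] * n_r for _ in range(n_r)]
    for t, w in zip(nodes, weights):
        x = t / scale  # assumed rescaling of the integration variable
        for r in range(n_r):
            for rp in range(n_r):
                c_part = sum(pc[r] * pc[rp] * math.exp(-ec * x) for pc, ec in zip(psi_c, e_c))
                v_part = sum(pv[r] * pv[rp] * math.exp(ev * x) for pv, ev in zip(psi_v, e_v))
                p[r][rp] += -2.0 * (w * math.exp(t) / scale) * c_part * v_part
    return p


# Made-up toy model: 2 occupied and 3 unoccupied real orbitals on 4 grid points.
PSI_V = [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]
PSI_C = [[0.5, 0.5, -0.5, -0.5], [0.5, -0.5, -0.5, 0.5], [0.7, 0.1, -0.1, -0.7]]
E_V = [-1.0, -0.6]
E_C = [0.5, 1.0, 1.7]

P_DIRECT = p_direct(PSI_V, PSI_C, E_V, E_C)
P_LAPLACE = p_laplace(PSI_V, PSI_C, E_V, E_C, n_gl=12, scale=1.6)
print(max(abs(a - b) for ra, rb in zip(P_DIRECT, P_LAPLACE) for a, b in zip(ra, rb)))
```

In `p_laplace` the sums over c and over v are evaluated separately at each quadrature point and only then multiplied, so the work per grid-point pair grows with N_c + N_v rather than N_c × N_v. The value `scale=1.6` sits between the smallest (1.1) and largest (2.7) energy differences of the toy model; a poorly matched scale needs more quadrature points for the same accuracy.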

Windowed cubic Laplace method
• $N_{GL}$ depends on $E_{bw}/E_{gap}$, where $E_{bw} = E_{c,max} - E_{v,min}$
• Largest error: $E_c - E_v = E_g$ or $E_{bw}$
[Figure: $N_{GL}$ (0 to 50) vs. $E_{bw}/E_g$ (0 to 500)]
Example: 2 by 2 windows
$P = P_{11} + P_{12} + P_{21} + P_{22}$
[Diagram: energy axis E marked with $E_{v,min}$, $E_{v,max}$, $E_{c,min}$, $E_{c,max}$ and split into windows $\{E_v\}_1$, $\{E_v\}_2$, $\{E_c\}_1$, $\{E_c\}_2$]
$P(r,r') = \sum_l^{N_{wv}} \sum_m^{N_{wc}} P_{lm}(r,r')$
$N_{wv}$: # of windows for $E_v$; $N_{wc}$: # of windows for $E_c$
• Save computation: small $N_{GL}$ for each window pair
• Especially for materials with small band gaps
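Why windowing helps can be seen from the ratio that controls the quadrature size. The sketch below computes $E_{bw}/E_g$ for every (valence window, conduction window) pair of a made-up spectrum. Splitting the energies into windows of equal state count is an assumption made here for simplicity; the slides do not say how the window edges are chosen, and all names are hypothetical.

```python
def pair_ratio(v_window, c_window):
    """E_bw / E_g for one (valence window, conduction window) pair."""
    e_bw = max(c_window) - min(v_window)  # largest energy difference in the pair
    e_g = min(c_window) - max(v_window)   # smallest energy difference in the pair
    return e_bw / e_g


def split(energies, n_windows):
    """Contiguous windows of (nearly) equal state count; an assumed choice."""
    ordered = sorted(energies)
    size = -(-len(ordered) // n_windows)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def window_ratios(e_v, e_c, n_wv, n_wc):
    """Matrix of E_bw / E_g values, one entry per window pair (l, m)."""
    return [[pair_ratio(v, c) for c in split(e_c, n_wc)] for v in split(e_v, n_wv)]


# Made-up energies (arbitrary units): small gap, wide conduction band.
E_V = [-12.0, -8.0, -4.0, -1.0]
E_C = [1.0, 3.0, 10.0, 30.0]

print("1 x 1 window :", window_ratios(E_V, E_C, 1, 1))
print("2 x 2 windows:", window_ratios(E_V, E_C, 2, 2))
```

For this toy spectrum the single-window ratio is 21, while the largest ratio among the four 2 x 2 pairs is 3.5, so each pair can use a much smaller quadrature grid.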

Estimate the computational costs
Computation cost can be estimated with $E_{bw}$ and $E_g$:
[Equation: $C \propto \sum_l^{N_{vw}} \sum_m^{N_{cw}} (\dots)$, a double sum over window pairs built from $E_{bw}^{lm}$, $E_g^{lm}$, $N_v$, $N_c$ and the window edges $E_{v,l}^{min}$, $E_{v,l}^{max}$, $E_{c,m}^{min}$, $E_{c,m}^{max}$]
Example: 2x2 window
$E_{v,ratio} = \frac{E_v^* - E_{v,min}}{E_{v,max} - E_v^*}$, $E_{c,ratio} = \frac{E_c^* - E_{c,min}}{E_{c,max} - E_c^*}$
[Figures: real computational costs and estimated computational costs ($C_{simple}$, $C_{elab}$) vs. Ec ratio and Ev ratio]

Windowed Laplace: example
• Si crystal (16 atoms): number of bands 399; $N_{wv}$=1, $N_{wc}$=4
• MgO crystal (16 atoms): number of bands 433; $N_{wv}$=1, $N_{wc}$=4
Compared to the $O(N^4)$ method, for a bigger system the ratio is (above ratio) / ($N_{at}$/16)

Do I care in practice?
Correct practical comparison:
• Our $N^3$ method vs. available $N^4$ method with acceleration
• Crossover is at very few atoms: the $N^3$ method is already competitive for small systems
• 2 atoms Si, 8 k-points
• Yambo $N^4$ GW software
• BG* acceleration
* Bruneval & Gonze, PRB 78 (2008)

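Windowed Laplace method for self-energy
Dynamic GW self-energy:
$\Sigma(\omega)^{dyn}_{r,r'} = \sum_{p,n} \frac{B^p_{r,r'}\, \psi_{rn} \psi^*_{r'n}}{\omega - \varepsilon_n + \mathrm{sgn}(\mu - \varepsilon_n)\,\omega_p} = \sum_{p,n} B^p_{r,r'}\, \psi_{rn} \psi^*_{r'n}\, F(\omega - \varepsilon_n \pm \omega_p)$, with $F(x) = \frac{1}{x}$
• $B^p_{r,r'}$: residues
• $\omega_p$: energies of the poles of $W(r)_{r,r'}$
Gauss-Laguerre quadrature not appropriate: $\frac{1}{\omega - \varepsilon_n \pm \omega_p} < 0$ OR $\frac{1}{\omega - \varepsilon_n \pm \omega_p} > 0$
$\Sigma(\omega) = \sum_l^{N_{pw}} \sum_m^{N_{nw}} \Sigma(\omega)^{lm}$, with $\Omega_l^{min} \le \pm\omega_p < \Omega_l^{max}$ and $e_m^{min} \le \omega - \varepsilon_n < e_m^{max}$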

New quadrature for overlapping windows
$F(x) = \mathrm{Im} \int_0^\infty w(v)\, e^{ivx}\, dv$, with $w(v) = e^{-v}$ or $w(v) = e^{-v - v^2/2}$
New quadrature: size of quadrature grid $n_q$ for ($e^{-v-v^2/2}$) and ($e^{-v}$)
% error   n_q    n_q
5         6      1
1         24     1
0.1       124    5
0.01      547    15
0.001     2216   36
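A quick numerical check shows why an oscillatory representation suits arguments of either sign. The sketch below evaluates $\mathrm{Im} \int_0^\infty w(v) e^{ivx} dv$ with a plain trapezoid rule, which stands in for the specialized quadrature of the slide and is not the authors' method. For $w(v) = e^{-v}$ the integral has the closed form $x/(1+x^2)$, which is odd in x and approaches $1/x$ for $|x| \gg 1$. The function names, truncation point, and step count are assumptions chosen for illustration.

```python
import cmath
import math


def f_quadrature(x, weight, v_max=40.0, n_steps=40000):
    """Trapezoid estimate of Im of the integral of weight(v) * exp(i v x) over [0, v_max]."""
    h = v_max / n_steps
    total = 0.5 * (weight(0.0) + weight(v_max) * cmath.exp(1j * v_max * x))
    for k in range(1, n_steps):
        v = k * h
        total += weight(v) * cmath.exp(1j * v * x)
    return (total * h).imag


def exp_weight(v):
    """w(v) = e^{-v}."""
    return math.exp(-v)


def damped_weight(v):
    """w(v) = e^{-v - v^2/2}."""
    return math.exp(-v - 0.5 * v * v)


for x in (3.0, -3.0):
    print(x, f_quadrature(x, exp_weight), x / (1.0 + x * x), f_quadrature(x, damped_weight))
```

Unlike the decaying-exponential transform used for P, which requires a positive energy difference, the same integral handles positive and negative x, which is the situation the self-energy denominators create.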

Results - $G_0W_0$ gap
• Si crystal (16 atoms)
• Number of bands: 399
• $N_{pw}$=15, $N_{nw}$=30
[Figure: Si $G_0W_0$ $E_g$ (eV), 1.35 to 1.65, vs. ratio of computation to the $N^4$ method, 0 to 0.5; Laplace+windowing compared with $N^4$]

Where we are with OpenAtom GW
Phase                     Serial     Parallel
1 Compute P in RSpace     Complete   Complete
2 FFT P to GSpace         Complete   Complete
3 Invert epsilon          Complete   Complete
4 Plasmon pole            Complete   In Progress
5 COHSEX self-energy      Complete   Complete
6 Dynamic self-energy     Complete   In Progress
7 Coulomb Truncation      Future     Future
Aim to release parallel COHSEX version late spring 2018
