Scalable Algorithms for Electronic Structure Calculations on Petascale Computers François Gygi University of California, Davis fgygi@ucdavis.edu http://eslab.ucdavis.edu Supported by NSF ITR-HECURA-0749217 and DOE-SciDAC RANMEP2008 Workshop, NCTS, Taiwan, Jan 6, 2008 1
Outline • First-Principles simulations • Eigenvalue problems in electronic structure calculations • Localized representations of solutions and simultaneous diagonalization problem • Data compression through simultaneous diagonalization FG 2
First-Principles Simulations
• Goal: simulate molecules, solids, and liquids from first principles, without input from experiments
• The approach: molecular dynamics, an atomic-scale simulation method
  – Compute the trajectories of all atoms
  – Extract statistical information from the trajectories
• Atoms move according to Newton's law:
  $$ m_i \ddot{R}_i = F_i $$
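As a minimal sketch of what "compute the trajectories" means, the snippet below integrates Newton's law for a single particle in one dimension. The velocity Verlet scheme and the harmonic test force are assumptions chosen for illustration; the slides do not say which integrator is used, and in a first-principles simulation the force would come from the electronic structure calculation.

```python
import math

def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate m * x'' = F(x) for one particle in 1D (illustrative only)."""
    traj = [x]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half kick with the old force
        x = x + dt * v_half                # move the atom
        f = force(x)                       # recompute the force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half kick
        traj.append(x)
    return traj, v

# Hypothetical test force: harmonic spring F = -k x (not a first-principles force)
k = 1.0
traj, v_end = velocity_verlet(1.0, 0.0, lambda x: -k * x, mass=1.0, dt=0.01, n_steps=1000)
energy = 0.5 * v_end ** 2 + 0.5 * k * traj[-1] ** 2
exact_position = math.cos(10.0)  # analytic solution at t = n_steps * dt = 10
```

The total energy stays close to its initial value of 0.5, which is the property that makes this family of integrators popular for long trajectories.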
First-Principles Simulations
• Why "First-Principles"?
  – Avoid empirical models and adjustable parameters
    • Goal: applications to extreme conditions (high pressure, etc.) where no experimental data are available
  – Use fundamental principles: quantum mechanics
  – Must describe ions and electrons consistently and simultaneously
• At each time step:
  1) Compute the electronic structure
  2) Derive interatomic forces
  3) Move atoms
First-Principles Simulations
• Applications
  – Chemistry
  – Nanotechnology
  – Semiconductors
  – Biochemistry
  – High-pressure physics
[Figures: growth of a carbon nanotube on an iron catalyst; biotin on silicon carbide; ice-water interface; silicon quantum dot]
First-Principles Simulations
• The computation of the electronic structure is the most expensive part of the simulation: >99% of CPU time
• At each time step:
  1) Compute the electronic structure
  2) Derive interatomic forces
  3) Move atoms
First-principles simulations require large computing resources
• Cost of one time step scales as $O(n^3)$
  – $n$: number of electrons
• Many time steps required / long simulations
• Requires use of large-scale parallel platforms
  – target: $O(10^4)$ to $O(10^5)$ CPUs
• Focus on scalable algorithms
  – communication cost is the primary concern
Using large computers: BlueGene/L
• 65,536 nodes, 128k CPUs
• 3D torus network
• 512 MB/node
• 367 TFlop peak
[Figure: BlueGene/L packaging hierarchy]
  Chip (2 processors): 2.8/5.6 GF/s, 4 MB
  Compute Card (2 chips, 2x1x1): 5.6/11.2 GF/s, 0.5 GB DDR
  Node Board (32 chips, 4x4x2; 16 compute cards): 90/180 GF/s, 8 GB DDR
  Cabinet (32 node boards, 8x8x16): 2.9/5.7 TF/s, 256 GB DDR
  System (64 cabinets, 64x32x32): 180/360 TF/s, 16 TB DDR
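Computing the electronic structure
• Kohn-Sham equations
  – solutions $\varphi_i \in L^2(\mathbb{R}^3)$ represent electronic wavefunctions (one per electron)
$$
\begin{cases}
H\varphi_i = -\Delta\varphi_i + V(\rho,\mathbf{r})\,\varphi_i = \varepsilon_i\,\varphi_i, \qquad i = 1,\dots,n \\[4pt]
V(\rho,\mathbf{r}) = V_{\mathrm{ion}}(\mathbf{r}) + \displaystyle\int \frac{\rho(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,d\mathbf{r}' + V_{XC}\big(\rho(\mathbf{r}),\nabla\rho(\mathbf{r})\big) \\[4pt]
\rho(\mathbf{r}) = \displaystyle\sum_{i=1}^{n} |\varphi_i(\mathbf{r})|^2 \\[4pt]
\displaystyle\int \varphi_i^*(\mathbf{r})\,\varphi_j(\mathbf{r})\,d\mathbf{r} = \delta_{ij}
\end{cases}
$$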
Computing the electronic structure
• Solutions are represented as Fourier series
  $$ \varphi_j(\mathbf{r}) = \sum_{|\mathbf{q}|^2 < E_{\mathrm{cut}}} c_{\mathbf{q},j}\, e^{i\mathbf{q}\cdot\mathbf{r}} $$
• A set of solutions is represented by an (orthogonal) $m \times n$ matrix of complex Fourier coefficients
  $$ Y_{ij} = c_{\mathbf{q}_i,\,j} $$
• Dimensions of $Y$: $10^6 \times 10^4$
• Note: typically $m/n \sim 100$
Computing the electronic structure
• The energy is invariant under unitary transformations of $Y$
  $$ E(Y) = \operatorname{tr}\big(Y^T H Y\big) + F[\rho], \qquad \rho(\mathbf{r}) = \sum_j |\varphi_j(\mathbf{r})|^2 $$
  $$ E(Y) = E(YQ), \qquad Q \text{ unitary} $$
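The invariance follows from the cyclic property of the trace. The short derivation below uses the slide's $Y^T$ notation (for complex $Y$, read it as the conjugate transpose) and assumes, as in the charge-density step of the self-consistent iteration, that $\rho$ depends on $Y$ only through $YY^T$.

```latex
\begin{aligned}
\operatorname{tr}\big((YQ)^T H\,(YQ)\big)
  &= \operatorname{tr}\big(Q^T Y^T H Y Q\big)
   = \operatorname{tr}\big(Y^T H Y\, Q Q^T\big)
   = \operatorname{tr}\big(Y^T H Y\big), \\
(YQ)(YQ)^T &= Y\,Q Q^T\,Y^T = Y Y^T
  \;\Longrightarrow\; \rho \text{ and } F[\rho] \text{ are unchanged}, \\
E(YQ) &= E(Y).
\end{aligned}
```

Only the subspace spanned by the columns of $Y$ matters for the energy, which is why the later slides are free to choose a convenient rotation $Q$.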
Electronic structure calculation (with fixed potential)
• Invariant subspace computation. Find $Y$ such that:
  $$ HY = Y\Lambda, \qquad H \in \mathbb{C}^{m \times m},\quad Y \in \mathbb{C}^{m \times n},\quad \Lambda \in \mathbb{C}^{n \times n} $$
  – $H$ is sparse
  – Cost of computing $Hx$: $O(m \log m)$ (involves Fast Fourier Transforms)
Electronic structure calculation (with fixed potential)
• Iterative methods for invariant subspace computations
  – Variants of Jacobi-Davidson
  – DIIS (a.k.a. Anderson acceleration)
• Simple, diagonal preconditioning works well
• Robustness of eigensolvers is key
Preconditioned steepest descent
1) correction:
   $$ Y := Y + \beta\, K \big(I - Y Y^T\big) H Y $$
2) orthogonalization
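A minimal NumPy sketch of these two steps, for a small real symmetric test matrix, is given below. Several choices are assumptions made for illustration and are not taken from the slides: the preconditioner is the negative inverse diagonal of H (so that the "+" sign in the correction yields descent), the step length is 1, the orthogonalization is a QR factorization, and the dense test matrix `H` is hypothetical.

```python
import numpy as np

def psd_step(H, Y, k_diag, beta):
    """One correction Y := Y + beta*K*(I - Y Y^T) H Y, then orthogonalization."""
    HY = H @ Y
    residual = HY - Y @ (Y.T @ HY)                  # (I - Y Y^T) H Y
    Y_new = Y + beta * k_diag[:, None] * residual   # diagonal preconditioner K
    Q, _ = np.linalg.qr(Y_new)                      # orthogonalization (QR is an assumption)
    return Q

rng = np.random.default_rng(0)
m, n = 50, 3
E = rng.standard_normal((m, m))
H = np.diag(np.arange(1.0, m + 1)) + 0.01 * (E + E.T)   # hypothetical test Hamiltonian
k_diag = -1.0 / np.diag(H)      # assumed sign convention: K negative definite
Y, _ = np.linalg.qr(rng.standard_normal((m, n)))        # random orthonormal start

for _ in range(300):
    Y = psd_step(H, Y, k_diag, beta=1.0)

energy = np.trace(Y.T @ H @ Y)                 # tr(Y^T H Y) after the iteration
exact = np.linalg.eigvalsh(H)[:n].sum()        # sum of the n lowest eigenvalues
```

In a production code H would never be formed as a dense matrix; only the product `H @ Y` is needed, which is where the FFT-based application of H enters.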
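Preconditioned DIIS
1) descent direction:
   $$ \Delta_k = K \big(I - Y_k Y_k^T\big) H Y_k $$
2) update:
   $$ \theta = -\,\frac{\operatorname{tr}\big(\Delta_k^T (\Delta_k - \Delta_{k-1})\big)}{\|\Delta_k - \Delta_{k-1}\|_F^2} $$
   $$ \bar{Y}_k = Y_k + \theta\,(Y_k - Y_{k-1}) $$
   $$ \bar{\Delta}_k = \Delta_k + \theta\,(\Delta_k - \Delta_{k-1}) $$
   $$ Y_{k+1} = \bar{Y}_k + \beta\,\bar{\Delta}_k $$
3) orthogonalization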
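The mixing coefficient $\theta$ can be understood as the solution of a one-dimensional least-squares problem. The derivation below assumes that $\theta$ is chosen to minimise the Frobenius norm of the extrapolated descent direction $\bar{\Delta}_k = \Delta_k + \theta(\Delta_k - \Delta_{k-1})$, which is the usual DIIS criterion with a two-step history; the slide itself does not state the criterion.

```latex
\begin{aligned}
f(\theta) &= \big\|\Delta_k + \theta\,(\Delta_k - \Delta_{k-1})\big\|_F^2 \\
          &= \|\Delta_k\|_F^2
             + 2\theta\,\operatorname{tr}\big(\Delta_k^T(\Delta_k - \Delta_{k-1})\big)
             + \theta^2\,\|\Delta_k - \Delta_{k-1}\|_F^2, \\
f'(\theta) = 0 \;&\Longrightarrow\;
\theta = -\,\frac{\operatorname{tr}\big(\Delta_k^T(\Delta_k - \Delta_{k-1})\big)}
                 {\|\Delta_k - \Delta_{k-1}\|_F^2}.
\end{aligned}
```

The same $\theta$ is then applied to the iterates $Y_k$ and $Y_{k-1}$, so the extrapolated point and the extrapolated direction stay consistent.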
Self-consistent electronic structure computation
• $H$ depends non-linearly on the solution $Y$ (through $\rho$)
• Fixed point iteration:
    repeat {
      1) compute charge density  $\rho_i = (Y Y^T)_{ii}$
      2) solve  $H(\rho)\,Y = Y \Lambda$
    } until converged (i.e. $\rho$ does not change)
• Convergence can be accelerated using various charge-mixing schemes (e.g. Broyden)
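The fixed-point loop can be sketched on a toy problem as follows. Everything specific here is an assumption made for illustration: the model Hamiltonian `H0 + g * diag(rho)`, the dense eigensolver standing in for the iterative one, and simple linear mixing with parameter `alpha` in place of the Broyden-type schemes mentioned above.

```python
import numpy as np

def density(H, n):
    """Charge density rho_i = (Y Y^T)_ii from the n lowest eigenvectors of H."""
    _, vecs = np.linalg.eigh(H)
    Y = vecs[:, :n]
    return np.sum(Y * Y, axis=1)

def scf(H0, n, g=0.2, alpha=0.3, tol=1e-10, max_iter=500):
    """Fixed-point iteration on rho with linear mixing (toy model, not Kohn-Sham)."""
    m = H0.shape[0]
    rho = np.full(m, n / m)                       # uniform initial guess
    for it in range(max_iter):
        rho_out = density(H0 + g * np.diag(rho), n)   # solve H(rho) Y = Y Lambda
        if np.linalg.norm(rho_out - rho) < tol:       # rho no longer changes
            return rho_out, it
        rho = rho + alpha * (rho_out - rho)           # linear charge mixing
    raise RuntimeError("SCF did not converge")

m, n = 8, 2
H0 = 2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)   # 1D discrete Laplacian
rho, n_iter = scf(H0, n)
```

With `alpha = 1` the loop reduces to the plain iteration on the slide; damping or Broyden-type mixing matters when the plain iteration oscillates or diverges.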
Molecular Dynamics: solve the SCF problem at each time step
• $H$ is time-dependent (depends on positions of atoms)
    for each time step t {
      repeat {
        1) compute charge density  $\rho_i = (Y Y^T)_{ii}$
        2) solve  $H(\rho, t)\,Y(t) = Y(t)\,\Lambda(t)$
      } until converged
      compute forces, move atoms
    }
Molecular Dynamics: using previous solutions optimally
• Computing Y(t)
  – The previous solution Y(t-dt) is "close" to Y(t) and can be used as the initial guess for the iterative calculation of Y(t)
[Figure: successive subspaces $Y_{k-1}$, $Y_k$, $Y_{k+1}$]
Molecular Dynamics: using previous solutions optimally
• Computing Y(t)
  – The previous solution Y(t-dt) is "close" to Y(t) and can be used as the initial guess for the iterative calculation of Y(t)
  – The extrapolated subspace
    $$ \tilde{Y} = 2 Y_k - Y_{k-1} $$
    is a better initial guess
[Figure: successive subspaces $Y_{k-1}$, $Y_k$, $Y_{k+1}$ and the extrapolated guess $\tilde{Y} = 2Y_k - Y_{k-1}$]
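Molecular Dynamics: using previous solutions optimally
• Subspace alignment
  – The eigensolver introduces arbitrary rotations in Y(t)
  – Extrapolation must be preceded by subspace alignment
  – Orthogonal Procrustes problem:
    $$ \min_{Q} \big\| Y_k - Y_{k-1} Q \big\|_F \quad \text{subject to } Q^T Q = I $$
  – Extrapolation with the aligned subspace:
    $$ \tilde{Y} = 2 Y_k - Y_{k-1} Q $$
[Figure: successive subspaces $Y_{k-1}$, $Y_k$, $Y_{k+1}$ and the extrapolated guess $\tilde{Y}$]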
Subspace alignment
• Orthogonal Procrustes problem: minimize $\| Y_k - Y_{k-1} Q \|_F$
  1) Compute the polar decomposition
     $$ A \equiv Y_{k-1}^T Y_k = U H $$
     where $U$ is unitary and $H$ is Hermitian.
  2) Rotation of $Y_{k-1}$:
     $$ Y_{k-1} := Y_{k-1} U $$
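The alignment and the subsequent extrapolation can be sketched in a few lines of NumPy. The sketch assumes real matrices and obtains the unitary polar factor from a singular value decomposition, which is a convenient stand-in on a single node and not the scalable algorithm the talk is looking for; the function names are hypothetical.

```python
import numpy as np

def polar_unitary_factor(A):
    """Unitary factor U of A = U H, computed here via an SVD (an assumption)."""
    W, _, Vt = np.linalg.svd(A)
    return W @ Vt

def align_and_extrapolate(Y_prev, Y_curr):
    """Rotate Y_prev onto Y_curr (Procrustes), then form 2*Y_curr - Y_prev*U."""
    U = polar_unitary_factor(Y_prev.T @ Y_curr)   # A = Y_{k-1}^T Y_k
    Y_prev_aligned = Y_prev @ U                   # rotation of Y_{k-1}
    return Y_prev_aligned, 2.0 * Y_curr - Y_prev_aligned

# Test case: Y_prev spans the same subspace as Y_curr but is arbitrarily rotated.
rng = np.random.default_rng(1)
Y_curr, _ = np.linalg.qr(rng.standard_normal((20, 4)))
R, _ = np.linalg.qr(rng.standard_normal((4, 4)))    # random orthogonal rotation
Y_prev = Y_curr @ R.T
aligned, guess = align_and_extrapolate(Y_prev, Y_curr)
```

In this test the alignment undoes the arbitrary rotation exactly, so the extrapolated guess coincides with `Y_curr`; without alignment, `2 * Y_curr - Y_prev` would be meaningless even though both matrices describe the same subspace.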
Polar decomposition
• Polar decomposition $A = UH$ (Higham '86):
  $$ X_0 = A, \qquad X_{k+1} = \tfrac{1}{2}\big(X_k + X_k^{-*}\big) $$
  converges quadratically to the unitary polar factor $U$
• Need a better, inverse-free, scalable algorithm
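For reference, the Newton iteration above can be written as the following NumPy sketch. The stopping rule, tolerance, and test matrix are assumptions made for illustration. The explicit inverse in each step is the part that the slide flags as an obstacle to scalability.

```python
import numpy as np

def polar_newton(A, tol=1e-13, max_iter=100):
    """Newton iteration X_{k+1} = (X_k + X_k^{-*}) / 2, started from X_0 = A."""
    X = np.array(A, dtype=complex)
    for _ in range(max_iter):
        X_next = 0.5 * (X + np.linalg.inv(X).conj().T)   # X^{-*}: inverse, conj. transpose
        done = np.linalg.norm(X_next - X) <= tol * np.linalg.norm(X_next)
        X = X_next
        if done:
            break
    return X

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4)) + 3.0 * np.eye(4)   # hypothetical nonsingular test matrix
U = polar_newton(A)
H = U.conj().T @ A                                  # Hermitian factor, A = U H

# Reference polar factor from an SVD, for comparison only
W, _, Vt = np.linalg.svd(A)
U_ref = W @ Vt
```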
Localized representations of the invariant subspace
• Linear combinations of electronic wavefunctions that minimize the spatial spread are called "Maximally Localized Wannier Functions" (MLWF)
  $$ \sigma_{\hat{X}}^2 = \langle x^2 \rangle - \langle x \rangle^2 $$
• MLWFs are used to compute the electronic polarization in crystals
• Computing MLWFs during a molecular dynamics simulation yields the infrared absorption spectrum
N. Marzari and D. Vanderbilt, Phys. Rev. B 56, 12847 (1997)
R. Resta, Phys. Rev. Lett. 80, 1800 (1998)
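Spread Functionals
• Spread of a wavefunction associated with an operator $\hat{A}$
  $$ \sigma_{\hat{A}}^2(\phi) = \langle \phi |\, \big(\hat{A} - \langle \phi | \hat{A} | \phi \rangle\big)^2 \,| \phi \rangle = \langle \phi | \hat{A}^2 | \phi \rangle - \langle \phi | \hat{A} | \phi \rangle^2 $$
• Spread of a set of wavefunctions associated with an operator $\hat{A}$
  $$ \sigma_{\hat{A}}^2(\{\phi_i\}) = \sum_i \sigma_{\hat{A}}^2(\phi_i) $$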
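As a concrete instance, the sketch below evaluates the spread for the position operator on a one-dimensional grid. The Gaussian test wavefunction, the grid, and the function name `spread` are assumptions chosen so that the result can be checked analytically: for $\phi(x) \propto e^{-(x-x_0)^2/(4s^2)}$ the density $|\phi|^2$ is a Gaussian of variance $s^2$.

```python
import numpy as np

def spread(phi, x):
    """<phi|x^2|phi> - <phi|x|phi>^2 on a uniform grid, with phi normalized on the grid."""
    w = np.abs(phi) ** 2
    w = w / w.sum()                 # discrete normalization of |phi|^2
    mean = np.sum(w * x)            # <x>
    mean_sq = np.sum(w * x ** 2)    # <x^2>
    return mean_sq - mean ** 2

x = np.linspace(-10.0, 10.0, 2001)
s, x0 = 0.7, 1.5                                    # hypothetical width and center
phi = np.exp(-((x - x0) ** 2) / (4.0 * s ** 2))     # Gaussian test wavefunction
sigma2 = spread(phi, x)                             # analytic value: s**2
```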
Spread Functionals
• The spread is not invariant under orthogonal transformations
  $$ \psi_i = \sum_j x_{ij}\,\phi_j, \qquad X \in \mathbb{R}^{n \times n} \text{ orthogonal} $$
  $$ \sigma_{\hat{A}}^2(\{\psi_i\}) \neq \sigma_{\hat{A}}^2(\{\phi_i\}) $$
• There exists a matrix $X$ that minimizes the spread
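Spread Functionals
• Let $A, B \in \mathbb{R}^{n \times n}$ with
  $$ a_{ij} = \langle i | \hat{A} | j \rangle, \qquad b_{ij} = \langle i | \hat{A}^2 | j \rangle $$
  $$ \sigma_{\hat{A}}^2(\{\psi_i\}) = \operatorname{tr}\big(X^T B X\big) - \sum_{i=1}^{n} \big(X^T A X\big)_{ii}^2 $$
• Minimize the spread = maximize $\sum_{i=1}^{n} \big(X^T A X\big)_{ii}^2$ = diagonalize $A$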
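The step from "minimize the spread" to "diagonalize A" rests on two invariances under an orthogonal $X$. The short derivation below fills in that step for a real symmetric $A$; the symbol $\operatorname{off}(M)^2$ for the sum of squared off-diagonal entries is introduced here for convenience.

```latex
\begin{aligned}
\operatorname{tr}\big(X^T B X\big) &= \operatorname{tr}\big(B\,X X^T\big) = \operatorname{tr}(B), \\
\sum_{i=1}^{n} \big(X^T A X\big)_{ii}^2
  &= \big\|X^T A X\big\|_F^2 - \operatorname{off}\big(X^T A X\big)^2
   = \|A\|_F^2 - \operatorname{off}\big(X^T A X\big)^2, \\
\sigma_{\hat{A}}^2(\{\psi_i\})
  &= \operatorname{tr}(B) - \|A\|_F^2 + \operatorname{off}\big(X^T A X\big)^2 .
\end{aligned}
```

The first two terms do not depend on $X$, so the spread is smallest exactly when the off-diagonal part of $X^T A X$ vanishes, that is, when $X$ diagonalizes $A$.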
Spread Functionals
• Case of multiple operators
  – operators $\hat{A}^{(k)}$, $k = 1, \dots, m$
  – matrices $A^{(k)}$, $k = 1, \dots, m$
  $$ \sigma_{\hat{A}}^2(\{\psi_i\}) = \sum_i \sum_k \sigma_{\hat{A}^{(k)}}^2(\psi_i) $$
• Minimize the spread = maximize $\sum_k \sum_{i=1}^{n} \big(X^T A^{(k)} X\big)_{ii}^2$ = joint approximate diagonalization of the matrices $A^{(k)}$
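One common way to carry out a joint approximate diagonalization is a Jacobi-type sweep of plane rotations, in the style of Cardoso and Souloumiac. The sketch below is a minimal dense version for real symmetric matrices. The slides do not say which algorithm is used, so the method, the function name `joint_diag`, and the test matrices are assumptions made for illustration.

```python
import numpy as np

def joint_diag(mats, sweeps=50, tol=1e-12):
    """Jacobi-type joint approximate diagonalization of real symmetric matrices.

    Returns (X, rotated) with rotated[k] = X.T @ mats[k] @ X.
    """
    A = [np.array(M, dtype=float) for M in mats]
    n = A[0].shape[0]
    X = np.eye(n)
    for _ in range(sweeps):
        rotated_any = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                # h_k = (a_pp - a_qq, 2 a_pq) for each matrix; G = sum_k h_k h_k^T
                h = np.array([[M[p, p] - M[q, q], 2.0 * M[p, q]] for M in A])
                G = h.T @ h
                # Angle maximizing sum_k (a'_pp - a'_qq)^2 for this (p, q) plane
                theta = 0.25 * np.arctan2(2.0 * G[0, 1], G[0, 0] - G[1, 1])
                c, s = np.cos(theta), np.sin(theta)
                if abs(s) > tol:
                    rotated_any = True
                    R = np.eye(n)
                    R[p, p] = c
                    R[q, q] = c
                    R[p, q] = -s
                    R[q, p] = s
                    A = [R.T @ M @ R for M in A]   # rotate every matrix
                    X = X @ R                      # accumulate the transformation
        if not rotated_any:
            break
    return X, A

# Test case: three matrices sharing one eigenbasis, hence exactly jointly diagonalizable.
rng = np.random.default_rng(3)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
mats = [Q @ np.diag(rng.standard_normal(5)) @ Q.T for _ in range(3)]
X, rotated = joint_diag(mats)
```

When the matrices do not commute, no $X$ removes all off-diagonal entries and the sweeps converge to a compromise that maximizes the sum of squared diagonal entries, which is the "approximate" in the name.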
Spread Functionals
• Example of multiple operators
  $$ \hat{A}^{(1)} = \hat{X}: \quad \hat{X}\varphi(x,y,z) \equiv x\,\varphi(x,y,z) $$
  $$ \hat{A}^{(2)} = \hat{Y}: \quad \hat{Y}\varphi(x,y,z) \equiv y\,\varphi(x,y,z) $$
  $$ \hat{A}^{(3)} = \hat{Z}: \quad \hat{Z}\varphi(x,y,z) \equiv z\,\varphi(x,y,z) $$
• The matrices $A^{(k)}$ do not necessarily commute, even if the operators $\hat{A}^{(k)}$ do commute