A Parallel and Scalable Iterative Solver for Sequences of Dense Eigenproblems Arising in FLAPW
M. Berljafa and E. Di Napoli
PPAM 2013, Warsaw, Poland, Sept. 10th
Member of the Helmholtz Association
Motivation and Goals
Electronic Structure: Mathematical Model + Algorithmic Structure
SIMULATIONS: Band energy gap, Conductivity, Forces, etc.
Extracting & Exploiting Information: Performance, More Efficiency, Physics, Scalability
Outline
Sequences of generalized eigenproblems in all-electron computations
The algorithm: Chebyshev Filtered Sub-space Iteration method (ChFSI)
ChFSI parallelization and numerical tests
Density Functional Theory scheme
Self-consistent cycle
Initial guess for charge density $n_{\mathrm{start}}(\mathbf{r})$ → Compute discretized Kohn-Sham equations → Solve a set of eigenproblems $P^{(\ell)}_{k_1} \dots P^{(\ell)}_{k_N}$ → Compute new charge density $n^{(\ell)}(\mathbf{r})$ → Converged? $|n^{(\ell)} - n^{(\ell-1)}| < \eta$
Yes: OUTPUT: Electronic structure, ...
No: the new charge density feeds the next cycle.
1. every $P^{(\ell)}_k$: $A^{(\ell)}_k x = \lambda B^{(\ell)}_k x$ is a generalized eigenvalue problem;
2. $A$ and $B$ are DENSE and Hermitian ($B$ is also positive definite);
3. the $P_k$ with different $k$ index have different sizes and are independent of each other;
4. $k = 1 : 10-100$; $\ell = 1 : 20-50$.
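To make the structure of a single problem $A x = \lambda B x$ concrete, here is a minimal NumPy sketch of the textbook reduction to a standard Hermitian eigenproblem through a Cholesky factorization of $B$. The function name `solve_generalized` and the random test matrices are hypothetical illustrations, not FLAPW data and not the solver used in the talk.

```python
import numpy as np

def solve_generalized(A, B):
    """Solve A x = lambda B x for Hermitian A and Hermitian positive definite B."""
    # Cholesky factorization B = L L^H (this is where positive definiteness is needed).
    L = np.linalg.cholesky(B)
    # Reduce to standard form C = L^{-1} A L^{-H}, so that C y = lambda y.
    Linv_A = np.linalg.solve(L, A)
    C = np.linalg.solve(L, Linv_A.conj().T).conj().T
    # Symmetrize to remove rounding asymmetry before the Hermitian solver.
    C = 0.5 * (C + C.conj().T)
    eigenvalues, Y = np.linalg.eigh(C)
    # Back-transform x = L^{-H} y; the columns of X are then B-orthonormal.
    X = np.linalg.solve(L.conj().T, Y)
    return eigenvalues, X

# Small random dense pair (assumed test data only).
rng = np.random.default_rng(0)
n = 6
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = 0.5 * (M + M.conj().T)                  # Hermitian
N = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = N @ N.conj().T + n * np.eye(n)          # Hermitian positive definite
eigenvalues, X = solve_generalized(A, B)
```

The returned eigenvectors satisfy $X^H B X = I$, which is the normalization a direct library solver would typically also deliver.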
Density Functional Theory scheme
Adjacent cycles
ITERATION ($\ell$)
$P^{(\ell)}_{k_1}$ → direct solver → $(X^{(\ell)}_{k_1}, \Lambda^{(\ell)}_{k_1})$
$P^{(\ell)}_{k_2}$ → direct solver → $(X^{(\ell)}_{k_2}, \Lambda^{(\ell)}_{k_2})$
...
$P^{(\ell)}_{k_N}$ → direct solver → $(X^{(\ell)}_{k_N}, \Lambda^{(\ell)}_{k_N})$
Next cycle
ITERATION ($\ell+1$)
$P^{(\ell+1)}_{k_1}$ → direct solver → $(X^{(\ell+1)}_{k_1}, \Lambda^{(\ell+1)}_{k_1})$
$P^{(\ell+1)}_{k_2}$ → direct solver → $(X^{(\ell+1)}_{k_2}, \Lambda^{(\ell+1)}_{k_2})$
...
$P^{(\ell+1)}_{k_N}$ → direct solver → $(X^{(\ell+1)}_{k_N}, \Lambda^{(\ell+1)}_{k_N})$
$X \equiv \{x_1, \dots, x_n\}$, $\Lambda \equiv \mathrm{diag}(\lambda_1, \dots, \lambda_n)$
Sequences of eigenproblems
Current solving strategy
The set of generalized eigenproblems $P^{(1)} \dots P^{(\ell)} P^{(\ell+1)} \dots P^{(N)}$ is handled as a set of disjoint problems $N \times P$;
Each problem $P^{(\ell)}$ is solved independently using a direct solver as a black box from a standard library (e.g. ScaLAPACK).
Extracting information → searching for a new strategy
Treated the set of eigenproblems $N \times P$ as a sequence $\{P^{(\ell)}\}$;
Investigated the presence of a connection between adjacent problems; collected data on angles between eigenvectors of adjacent eigenproblems:
$$\Theta^{(\ell)}_{k_i} \equiv \{\theta_1, \dots, \theta_n\} = \mathrm{diag}\left(1 - \langle \tilde{X}^{(\ell-1)}_{k_i}, X^{(\ell)}_{k_i} \rangle\right)$$
Uncovered an evolution of the eigenvectors along the sequence, for fixed $k_i$:
$$\theta_j^{(2)} \ge \theta_j^{(3)} \ge \dots \ge \theta_j^{(N)}, \qquad \theta_j^{(2)} \gg \theta_j^{(N)}$$
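The angle data between eigenvectors of adjacent eigenproblems could be collected along the following lines. This is a sketch under stated assumptions: the Euclidean inner product is used, columns are normalized, and an absolute value is taken to remove the arbitrary sign or phase of each eigenvector; the slide does not spell out these details, and the name `eigenvector_deviation` is hypothetical.

```python
import numpy as np

def eigenvector_deviation(X_prev, X_curr):
    """Return theta_j = 1 - |<x_j^(l-1), x_j^(l)>| for each pair of matching columns."""
    # Normalize every column so that only the direction matters.
    Xp = X_prev / np.linalg.norm(X_prev, axis=0)
    Xc = X_curr / np.linalg.norm(X_curr, axis=0)
    # Column-wise inner products, i.e. the diagonal of Xp^H Xc,
    # computed without forming the full matrix product.
    overlaps = np.abs(np.sum(Xp.conj() * Xc, axis=0))
    return 1.0 - overlaps

# Toy check: a sign flip leaves the direction unchanged,
# while swapping two orthogonal columns gives the maximal deviation.
X_prev = np.eye(3)
theta_same = eigenvector_deviation(X_prev, -X_prev)
theta_swapped = eigenvector_deviation(X_prev, X_prev[:, [1, 0, 2]])
```

A value of $\theta_j$ close to 0 means the $j$-th eigenvector barely rotated between two cycles; a value close to 1 means the two vectors are nearly orthogonal.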
Angles evolution, fixed k
Example: a metallic compound at fixed k
[Figure: AuAg. Evolution of subspace angle for eigenvectors of k-point 1 and lowest 75 eigs. Vertical axis: angle between eigenvectors of adjacent iterations (log scale, $10^{0}$ down to $10^{-10}$). Horizontal axis: iterations (2 -> 22).]
Correlation and its exploitation
Correlation
∃ correlation between successive eigenvectors $x_j^{(\ell-1)}$ and $x_j^{(\ell)}$;
angles are small after the first few iterations.
Exploiting information: SIMULATION ⇒ ALGORITHM
The setting favors an iterative eigensolver in which the solution of $P^{(\ell-1)}$ is used to solve $P^{(\ell)}$.
Note: Mathematical model ⇏ Correlation. Correlation ⇐ numerical analysis of the simulation.
Improved Density Functional Theory scheme
Adjacent cycles
ITERATION ($\ell$)
$P^{(\ell)}_{k_1}$ → iterative solver → $(X^{(\ell)}_{k_1}, \Lambda^{(\ell)}_{k_1})$
$P^{(\ell)}_{k_2}$ → iterative solver → $(X^{(\ell)}_{k_2}, \Lambda^{(\ell)}_{k_2})$
...
$P^{(\ell)}_{k_N}$ → iterative solver → $(X^{(\ell)}_{k_N}, \Lambda^{(\ell)}_{k_N})$
Next cycle
ITERATION ($\ell+1$)
$P^{(\ell+1)}_{k_1}$ → iterative solver → $(X^{(\ell+1)}_{k_1}, \Lambda^{(\ell+1)}_{k_1})$
$P^{(\ell+1)}_{k_2}$ → iterative solver → $(X^{(\ell+1)}_{k_2}, \Lambda^{(\ell+1)}_{k_2})$
...
$P^{(\ell+1)}_{k_N}$ → iterative solver → $(X^{(\ell+1)}_{k_N}, \Lambda^{(\ell+1)}_{k_N})$
$X \equiv \{x_1, \dots, x_n\}$, $\Lambda \equiv \mathrm{diag}(\lambda_1, \dots, \lambda_n)$
Algorithmic choice
Direct solvers. Iterative solvers.
Sparse matrices. Dense matrices.
Disadvantage: many mat-vec multiplications.
Advantage: approx. eigenvecs; xGEMM on blocks.
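As an illustration of the "xGEMM on blocks" remark, the snippet below shows that applying a dense matrix to a block of vectors in one matrix-matrix product gives the same result as a loop of separate matrix-vector products. The sizes and random data are assumptions chosen only for the example; in NumPy the `@` operator on two 2-D arrays is typically dispatched to a BLAS GEMM routine.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 16                       # matrix size and block width (arbitrary)
A = rng.standard_normal((n, n))      # dense matrix
Z = rng.standard_normal((n, m))      # block of m vectors

# m separate matrix-vector products, one column at a time.
W_columns = np.column_stack([A @ Z[:, j] for j in range(m)])

# One matrix-matrix product on the whole block.
W_block = A @ Z
```

Both forms perform the same arithmetic; the block form generally makes better use of cache and of optimized level-3 BLAS kernels.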
Selecting an iterative eigensolver
An iterative algorithm has to comply with two main properties:
1. the ability to receive as input a sizable set of approximate eigenvectors $Z_0$ (extracted from $X^{(\ell-1)}_{k_i}$);
2. the capacity to solve simultaneously for a substantial portion of eigenpairs.
ChFSI constitutes the natural choice:
it accepts the full set of multiple starting vectors;
it avoids stalling when facing small clusters of eigenvalues;
when augmented with polynomial accelerators it has a much faster convergence rate;
converged eigenvectors can be easily locked;
the degree of the polynomial can be suitably optimized.
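The properties listed above can be illustrated with a simplified Chebyshev-filtered subspace iteration in NumPy. This is a sketch under several assumptions, not the authors' implementation: the generalized problem is taken to be already reduced to a standard Hermitian problem `H`, the upper spectral bound is the crude 1-norm of `H`, the lower edge of the damped interval is the largest Ritz value of the current block, the polynomial degree is fixed, and locking of converged vectors is omitted. All function names and the test matrix are hypothetical.

```python
import numpy as np

def chebyshev_filter(H, Z, degree, a, b):
    """Apply T_degree((H - c I) / e) to the block Z, damping the interval [a, b]."""
    c = 0.5 * (a + b)    # center of the unwanted interval
    e = 0.5 * (b - a)    # half-width of the unwanted interval
    Y_prev = Z
    Y = (H @ Z - c * Z) / e
    for _ in range(2, degree + 1):
        # Three-term recurrence T_{k+1}(t) = 2 t T_k(t) - T_{k-1}(t), applied to the block.
        Y_next = 2.0 * (H @ Y - c * Y) / e - Y_prev
        Y_prev, Y = Y, Y_next
    return Y

def rayleigh_ritz(H, Z):
    """Orthonormalize Z and return Ritz values (ascending) and Ritz vectors."""
    Q, _ = np.linalg.qr(Z)
    G = Q.conj().T @ (H @ Q)
    ritz, W = np.linalg.eigh(0.5 * (G + G.conj().T))
    return ritz, Q @ W

def chfsi(H, Z0, nev, degree=10, tol=1e-8, max_iter=50):
    """Approximate the nev lowest eigenpairs of Hermitian H, starting from the block Z0."""
    b = np.linalg.norm(H, 1)             # induced 1-norm: an upper bound on the spectrum
    ritz, Z = rayleigh_ritz(H, Z0)
    for iteration in range(1, max_iter + 1):
        # Damp everything above the largest Ritz value of the current block.
        Z = chebyshev_filter(H, Z, degree, ritz[-1], b)
        ritz, Z = rayleigh_ritz(H, Z)
        R = H @ Z[:, :nev] - Z[:, :nev] * ritz[:nev]
        if np.linalg.norm(R, axis=0).max() < tol:
            break
    return ritz[:nev], Z[:, :nev], iteration

# Assumed test problem: dense symmetric matrix with eigenvalues close to 0, 1, ..., n-1.
rng = np.random.default_rng(0)
n, nev, block = 100, 4, 8
E = rng.standard_normal((n, n))
H = np.diag(np.arange(n, dtype=float)) + 0.005 * (E + E.T)
Z0 = rng.standard_normal((n, block))     # random start; a warm start would go here
eigenvalues, X, iterations = chfsi(H, Z0, nev)
```

In the sequence setting, `Z0` would be taken from the eigenvectors of the previous cycle rather than from random numbers, which one would expect to lower the iteration count; that effect is not measured in this sketch. Note also that every application of `H` acts on the whole block, so the filter consists of matrix-matrix products.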