towards highly scalable ab initio molecular dynamics (aimd) on the intel knights landing manycore processor
Mathias Jacquelin (mjacquelin@lbl.gov), Wibe De Jong (wadejong@lbl.gov)
Computational Research Department, Lawrence Berkeley National Laboratory
Eric Bylaska (ebylaska@pnnl.gov)
Pacific Northwest National Laboratory
Scheduling in Knoxville 17, May 25 2017
introduction: plane wave methods
∙ 100-1000 atoms
∙ Uses plane wave basis
∙ Many FFTs
∙ SUMMA like for global operations
∙ Method scale: QM-CC, QM-DFT, AIMD, QM/MM, MM

$$\left(-\tfrac{1}{2}\nabla^2\right)\Psi + V_{ext}\Psi + V_H\Psi + V_{xc}\Psi = E\Psi, \qquad \langle \Psi_i | \Psi_j \rangle = \delta_{ij}$$

Cost of each term:
∙ $\left(-\tfrac{1}{2}\nabla^2\right)\Psi$ :: $N_e N_{pack}$
∙ $V_{ext}\Psi$ :: $(N_a N_{pack} + N_g \log N_{pack} + N_e N_{pack}) + N_a N_e N_{pack}$
∙ $V_H\Psi$ :: $N_e N_g \log N_g + N_e N_g + 2 N_g \log N_g + N_g + N_e N_g$
∙ $V_{xc}\Psi$ :: $N_e N_g \log N_g + N_e N_g$
∙ $\langle \Psi_i | \Psi_j \rangle$ :: $N_e^2 N_{pack} + N_e^3$

$N_a$ - number of atoms, $N_e$ - number of electrons, $N_g$ - size of FFT grid, $N_{pack}$ - size of reciprocal space
introduction: plane wave discretization

$$H\psi_i(r) = \left[-\tfrac{1}{2}\nabla^2 + V_L(r) + V_{NL} + V_H[\rho](r) + (1-\alpha)V_x[\rho](r) + V_c[\rho](r)\right]\psi_i(r) - \alpha \sum_j K_{ij}(r)\,\psi_j(r)$$

∙ Matrix multiplication in reciprocal space
∙ Poisson: $\nabla^2 V_{H,X,C}(r) = -4\pi\rho(r)$ and $\nabla^2 K_{ij}(r) = -4\pi\,\psi_i(r)\,\psi_j(r)$
∙ Density: $\rho(r) = \sum_{i=1}^{N} |\psi_i(r)|^2$
∙ Orthogonality (matrix multiplication): $\int_\Omega \psi_i(r)\,\psi_j(r)\,dr = \delta_{ij}$
∙ 3D-FFT counts: $N_e$ 3D-FFT; $(N_e + 1) N_e$ 3D-FFT
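The Poisson equations above become algebraic in reciprocal space, which is why they are counted in 3D FFTs. A minimal NumPy sketch, assuming a periodic cubic cell with the same number of grid points along each axis; the function name `solve_poisson_periodic`, the `box_length` parameter, and the choice to zero the G = 0 component are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def solve_poisson_periodic(rho, box_length):
    """Solve del^2 V = -4*pi*rho on a periodic cubic grid via 3D FFTs.

    Assumptions (hypothetical, for illustration): rho has shape (n, n, n),
    the cell is a cube of side box_length, and the G = 0 component of V
    is set to zero.
    """
    n = rho.shape[0]
    # Angular wave numbers along one axis.
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=box_length / n)
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2

    rho_k = np.fft.fftn(rho)             # forward 3D FFT
    v_k = np.zeros_like(rho_k)
    nonzero = k2 > 0.0
    # In reciprocal space: -|G|^2 V(G) = -4*pi*rho(G)
    v_k[nonzero] = 4.0 * np.pi * rho_k[nonzero] / k2[nonzero]
    return np.fft.ifftn(v_k).real        # reverse 3D FFT
```

For a single cosine density the result can be checked against the analytic potential $4\pi\rho/|G|^2$.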
introduction: plane wave dft solutions
∙ Avoid direct diagonalization because of large basis sets (much larger than Gaussian basis sets)
∙ Instead evaluate wave function gradient using a conjugate gradient algorithm to solve DFT equation
Expensive parts:
∙ Kinetic and nonlocal pseudopotential = matrix multiplications in reciprocal space ($N_g \times N_e$, $N_g = 96^3$)
∙ Local pseudopot., Coulomb, and exchange-correlation = $N_e$ FFTs
∙ Exact exchange ($(N_e + 1) N_e$ FFTs), nonlocal pseudopotential, and wavefunction orthogonalization: global operations
∙ 20 ps of simulation time ≈ 200,000 steps
  ∙ 1 s/step = 2-3 days
  ∙ 10 s/step = 23 days
  ∙ 13 s/step = 70 days
∙ Mesoscale phenomena at longer time scales
  ∙ Assume 1 s/step
  ∙ 100 ps = 10-15 days
  ∙ 1 ns = 100 - 150 days
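The wall-clock figures follow from multiplying the step count by the time per step. A small sketch of that arithmetic, assuming the step density implied by the slide (200,000 steps per 20 ps); the helper names `wall_days` and `steps_for` are illustrative:

```python
SECONDS_PER_DAY = 86_400
STEPS_PER_PS = 200_000 / 20  # from "20 ps of simulation time ≈ 200,000 steps"

def steps_for(picoseconds):
    """Number of AIMD steps needed for a given simulated time."""
    return picoseconds * STEPS_PER_PS

def wall_days(n_steps, seconds_per_step):
    """Wall-clock days for n_steps at a fixed cost per step."""
    return n_steps * seconds_per_step / SECONDS_PER_DAY

if __name__ == "__main__":
    for sec in (1, 10):
        print(f"20 ps at {sec} s/step: {wall_days(steps_for(20), sec):.1f} days")
    for ps in (100, 1000):
        print(f"{ps} ps at 1 s/step: {wall_days(steps_for(ps), 1):.1f} days")
```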
3d ffts
∙ In reciprocal space, sphere of radius $E_{cut}$ is stored
∙ At every AIMD step, perform:
  ∙ $N_e$ Forward 3D FFTs
  ∙ $N_e$ Reverse 3D FFTs
∙ Each forward FFT performed in 6 steps:
  1. Unpack sphere into a 3D cube (z,x,y)
  2. Perform nx × ny 1D FFTs along the z-dimension
  3. Rotate the cube (z,x,y) → (y,z,x)
  4. Perform nz × nx 1D FFTs along the y-dimension
  5. Rotate the cube (y,z,x) → (x,y,z)
  6. Perform ny × nz 1D FFTs along the x-dimension
∙ Reverse FFTs: steps in reverse order
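Steps 2 to 6 can be sketched on a single node with NumPy. This is an illustration under assumptions, not the distributed implementation: the cube is a dense array whose axes are ordered as in the tuple (z, x, y), each batch of 1D FFTs runs along the first axis, and a rotation is modeled as an axis permutation. Step 1 (unpacking the sphere) is omitted, and the function name `fft3d_by_rotations` is hypothetical.

```python
import numpy as np

def fft3d_by_rotations(cube_zxy):
    """Forward 3D FFT as three batches of 1D FFTs with two rotations.

    cube_zxy is indexed [z, x, y]; the result is indexed [x, y, z].
    """
    # Step 2: nx * ny 1D FFTs along the z-dimension (axis 0).
    a = np.fft.fft(cube_zxy, axis=0)
    # Step 3: rotate (z, x, y) -> (y, z, x).
    a = np.ascontiguousarray(np.transpose(a, (2, 0, 1)))
    # Step 4: nz * nx 1D FFTs along the y-dimension (now axis 0).
    a = np.fft.fft(a, axis=0)
    # Step 5: rotate (y, z, x) -> (x, y, z).
    a = np.ascontiguousarray(np.transpose(a, (2, 0, 1)))
    # Step 6: ny * nz 1D FFTs along the x-dimension (now axis 0).
    return np.fft.fft(a, axis=0)
```

The result should agree with `np.fft.fftn` applied to the same data reordered to (x, y, z), which gives a quick consistency check.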
pipe-lined 3d ffts
∙ At each AIMD step:
  ∙ $N_e$ Forward 3D FFTs
  ∙ $N_e$ Reverse 3D FFTs
∙ 3D FFT steps are pipe-lined
preserving orthogonality: lagrange multipliers
∙ At each AIMD step, wave functions need to be orthogonalized
∙ Lagrange multiplier method:
  ∙ Matrix Riccati equations solved
  ∙ Mainly matrix-matrix multiplications: FFM, MMM, FMF
  ∙ 3 letters to describe a C = A × B matrix product
  ∙ First letter for A, second for B, and third for C
  ∙ M is $N_e$-by-$N_e$, F is $N_{pack}$-by-$N_e$ (or transpose)
∙ Expensive step critical to scalability
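The three-letter naming can be made concrete with shapes. The sketch below assumes dense NumPy arrays, with F stored as $N_{pack}$-by-$N_e$ and transposed (conjugated) where the product requires it; the function names `ffm`, `mmm`, and `fmf` are illustrative and it does not implement the Riccati solve itself.

```python
import numpy as np

def ffm(f_a, f_b):
    """FFM: A and B are F-shaped (N_pack x N_e), C is M-shaped (N_e x N_e)."""
    return f_a.conj().T @ f_b

def mmm(m_a, m_b):
    """MMM: A, B, and C are all M-shaped (N_e x N_e)."""
    return m_a @ m_b

def fmf(f, m):
    """FMF: A is F-shaped, B is M-shaped, C is F-shaped (N_pack x N_e)."""
    return f @ m

if __name__ == "__main__":
    n_pack, n_e = 200, 8  # hypothetical sizes, with N_pack >> N_e
    rng = np.random.default_rng(0)
    # Columns of q are orthonormal, standing in for orthogonalized wave functions.
    q, _ = np.linalg.qr(rng.standard_normal((n_pack, n_e)))
    overlap = ffm(q, q)                   # N_e x N_e overlap matrix
    print(np.allclose(overlap, np.eye(n_e)))
    print(fmf(q, mmm(overlap, overlap)).shape)
```

Because $N_{pack}$ is much larger than $N_e$, the FFM and FMF products dominate the cost, matching the $N_e^2 N_{pack}$ term, while MMM contributes the $N_e^3$ term.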