An FPGA Implementation of Reciprocal Sums for SPME
Sam Lee and Paul Chow
Edward S. Rogers Sr. Department of Electrical and Computer Engineering
University of Toronto
Objectives
• Accelerate part of Molecular Dynamics Simulation
  - Smooth Particle Mesh Ewald
• Implementation
  - FPGA based
  - Try it and learn
• Investigation
  - Acceleration bottleneck
  - Precision requirement
  - Parallelization strategy
Presentation Outline
• Molecular Dynamics
• SPME
• The Reciprocal Sum Compute Engine
• Speedup and Parallelization
• Precision
• Future work
Molecular Dynamics Simulation
Molecular Dynamics
• Combines empirical force calculations with Newton’s equations of motion.
• Predicts the time trajectory of small atomic systems.
• Computationally demanding.

1. Calculate interatomic forces.
2. Calculate the net force.
3. Integrate Newton’s equations of motion.

$$\vec{a} = \vec{F} \cdot m^{-1}$$
$$\vec{r}(t + \delta t) = \vec{r}(t) + \delta t\,\vec{v}(t) + 0.5\,\delta t^{2}\,\vec{a}(t)$$
$$\vec{v}(t + \delta t) = \vec{v}(t) + 0.5\,\delta t\,\left[\vec{a}(t) + \vec{a}(t + \delta t)\right]$$
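
The three update equations on this slide can be sketched in a few lines of Python. This is a minimal one-dimensional illustration, not the simulation code used in this work; the harmonic spring force, the constants `k`, `mass`, `dt`, and the function name `velocity_verlet_step` are all assumptions chosen for the example.

```python
import math


def velocity_verlet_step(r, v, a, dt, force, mass):
    """Advance one particle by one time step (1-D for brevity)."""
    r_new = r + dt * v + 0.5 * dt * dt * a   # position update
    a_new = force(r_new) / mass              # a = F / m at the new position
    v_new = v + 0.5 * dt * (a + a_new)       # velocity update, averaged acceleration
    return r_new, v_new, a_new


# Hypothetical test system: a harmonic spring, F = -k * r.
k, mass, dt = 1.0, 1.0, 0.01


def spring_force(r):
    return -k * r


r, v = 1.0, 0.0
a = spring_force(r) / mass
for _ in range(1000):                        # integrate up to t = 10
    r, v, a = velocity_verlet_step(r, v, a, dt, spring_force, mass)

energy = 0.5 * mass * v * v + 0.5 * k * r * r
exact_r = math.cos(10.0)                     # analytic solution for this spring
print(r, exact_r, energy)
```

For this toy system the total energy should stay close to its initial value of 0.5, which is a quick sanity check on the integrator.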
Molecular Dynamics
$$U = \sum_{\text{All Angles}} k_{\Theta} (\Theta - \Theta_o)^2 + \sum_{\text{All Bonds}} k_b (l - l_o)^2 + \sum_{\text{All Torsions}} A\,[1 + \cos(n\tau + \phi)] + \sum_{\text{All Pairs}} \frac{q_1 q_2}{r} + \sum_{\text{All Pairs}} 4\varepsilon \left[ \left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6} \right]$$
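
Each term of the potential energy function can be written as a small Python function. This is only a sketch of the formula above; the function and parameter names are assumptions, and units and force-field constants are left to the caller.

```python
import math


def angle_energy(k_theta, theta, theta_o):
    return k_theta * (theta - theta_o) ** 2


def bond_energy(k_b, l, l_o):
    return k_b * (l - l_o) ** 2


def torsion_energy(A, n, tau, phi):
    return A * (1.0 + math.cos(n * tau + phi))


def coulomb_energy(q1, q2, r):
    return q1 * q2 / r


def lennard_jones_energy(epsilon, sigma, r):
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


# The Lennard-Jones term is zero at r = sigma and reaches its minimum,
# -epsilon, at r = 2**(1/6) * sigma.
print(lennard_jones_energy(1.0, 1.0, 1.0))
print(lennard_jones_energy(1.0, 1.0, 2.0 ** (1.0 / 6.0)))
```

The last two terms run over all pairs of atoms, which is where the O(N^2) cost of a naive simulation comes from.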
MD Simulation
• Problem scientists are facing:
  - SLOW!
  - O(N^2) complexity.
30 CPU Years
Solutions
• Parallelize to more compute engines
• Accelerate with FPGA
  - Especially: The non-bonded calculations
• To be more specific, this paper addresses:
  - Electrostatic interaction (Reciprocal space)
  - Smooth Particle Mesh Ewald algorithm.
Previous Work
• Software SPME Implementations:
  - Original PME Package written by Toukmaji.
  - Used in NAMD2.
• Hardware Implementations:
  - No previous hardware implementation of reciprocal sums calculation.
  - MD-Grape & MD-Engine use Ewald Summation.
  - Ewald Summation is O(N^2); SPME is O(N log N)!
Smooth Particle Mesh Ewald
Electrostatic Interaction
• Coulombic equation:
$$v_{\text{coulomb}} = \frac{q_1 q_2}{4 \pi \varepsilon_0 r}$$
• Under the Periodic Boundary Condition, the summation to calculate the electrostatic energy is only conditionally convergent:
$$U = \frac{1}{2} \sum_{\mathbf{n}}{}^{'} \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{q_i q_j}{r_{ij,\mathbf{n}}}$$
Periodic Boundary Condition
• To combat the surface effect, the simulation cell is replicated.
[Figure: "Replication". A 3x3 grid of cells labeled A to I, each containing copies of particles 1 to 5.]
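
A periodic boundary condition is usually applied with two small operations: wrapping a coordinate back into the primary cell, and taking the nearest periodic image when measuring a separation. The sketch below assumes a square 2-D cell of side `box`; the helper names are hypothetical and are not taken from NAMD2 or the RSCE.

```python
def wrap(x, box):
    """Map a coordinate back into the primary cell [0, box)."""
    return x % box


def minimum_image(dx, box):
    """Shortest separation along one axis over all periodic images."""
    return dx - box * round(dx / box)


def images_2d(x, y, box):
    """Positions of one particle in the primary cell and its 8 neighbouring cells."""
    return [(x + i * box, y + j * box) for j in (-1, 0, 1) for i in (-1, 0, 1)]


box = 10.0
print(wrap(10.5, box))           # particle that left through the right wall
print(minimum_image(9.0, box))   # the nearest image is one cell to the left
print(len(images_2d(1.0, 2.0, box)))
```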
Ewald Summation Used For PBC
• To calculate the Coulombic interactions
• O(N^2) Direct Sum + O(N^2) Reciprocal Sum
[Figure: plots of q against r illustrating the split into a Direct Sum part and a Reciprocal Sum part.]
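
The split into a direct and a reciprocal part rests on the standard Ewald identity 1/r = erfc(βr)/r + erf(βr)/r. The Python sketch below only checks that identity numerically for a single pair distance; the value of `beta` is an arbitrary assumption, and the code does not perform a full Ewald summation.

```python
import math


def ewald_split(r, beta):
    """Split 1/r into a short-range part and a smooth long-range part."""
    direct = math.erfc(beta * r) / r      # decays quickly, summed in real space
    reciprocal = math.erf(beta * r) / r   # smooth, summed in reciprocal space
    return direct, reciprocal


beta = 0.35  # hypothetical Ewald coefficient
for r in (1.0, 5.0, 10.0):
    direct, reciprocal = ewald_split(r, beta)
    print(r, direct, reciprocal, direct + reciprocal)
```

A larger `beta` makes the direct part die off faster and pushes more of the work into the reciprocal sum.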
Smooth Particle Mesh Ewald
• Shift the workload to the Reciprocal Sum.
• Use Fast Fourier Transform.
• O(N) Real + O(N log N) Reciprocal.
• RSCE calculates the Reciprocal Sums using the SPME algorithm.
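SPME Reciprocal Contribution
Energy:
$$\tilde{E}_{rec} = \frac{1}{2 \pi V} \sum_{\mathbf{m} \neq 0} \frac{\exp(-\pi^2 \mathbf{m}^2 / \beta^2)}{\mathbf{m}^2} \cdot B(m_1, m_2, m_3) \, F(Q)(m_1, m_2, m_3) \, F(Q)(-m_1, -m_2, -m_3)$$
(Each F(Q) factor is computed with an FFT.)
$$\tilde{E}_{rec} = \frac{1}{2} \sum_{m_1=0}^{K_1-1} \sum_{m_2=0}^{K_2-1} \sum_{m_3=0}^{K_3-1} Q(m_1, m_2, m_3) \cdot (\theta_{rec} \ast Q)(m_1, m_2, m_3)$$
$$B(m_1, m_2, m_3) = |b_1(m_1)|^2 \cdot |b_2(m_2)|^2 \cdot |b_3(m_3)|^2$$
$$b_i(m_i) = \exp\!\left(\frac{2 \pi i (n-1) m_i}{K_i}\right) \times \left[ \sum_{k=0}^{n-2} M_n(k+1) \exp\!\left(\frac{2 \pi i \, m_i k}{K_i}\right) \right]^{-1}$$
$$C(m_1, m_2, m_3) = \frac{1}{\pi V} \frac{\exp(-\pi^2 \mathbf{m}^2 / \beta^2)}{\mathbf{m}^2}, \quad \mathbf{m} \neq 0, \quad C(0, 0, 0) = 0$$
Force:
$$\frac{\partial \tilde{E}_{rec}}{\partial r_{\alpha i}} = \sum_{m_1=0}^{K_1-1} \sum_{m_2=0}^{K_2-1} \sum_{m_3=0}^{K_3-1} \frac{\partial Q}{\partial r_{\alpha i}}(m_1, m_2, m_3) \cdot (\theta_{rec} \ast Q)(m_1, m_2, m_3)$$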
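
The B(m1, m2, m3) array depends only on the interpolation order n and the mesh sizes, so it can be tabulated once. The Python sketch below evaluates the b_i(m_i) formula directly, assuming M_n is the cardinal B-spline defined by the usual recursion (M_2(u) = 1 - |u - 1| on [0, 2], zero elsewhere). The function names are hypothetical, and this is an illustration of the formula rather than the RSCE's hardware datapath.

```python
import cmath


def cardinal_bspline(n, u):
    """Cardinal B-spline M_n(u) of order n >= 2, by the standard recursion."""
    if n == 2:
        return 1.0 - abs(u - 1.0) if 0.0 <= u <= 2.0 else 0.0
    return (u * cardinal_bspline(n - 1, u)
            + (n - u) * cardinal_bspline(n - 1, u - 1.0)) / (n - 1)


def b_mod_sq(m, n, K):
    """|b(m)|^2 for interpolation order n on a mesh of K points along one axis.

    Even n is assumed here; for odd n the sum can vanish at m = K / 2.
    """
    denom = sum(cardinal_bspline(n, k + 1) * cmath.exp(2j * cmath.pi * m * k / K)
                for k in range(n - 1))       # k = 0 .. n-2
    # The prefactor exp(2*pi*i*(n-1)*m/K) has unit modulus, so it drops out.
    return 1.0 / abs(denom) ** 2


def B(m1, m2, m3, n, K):
    """B(m1, m2, m3) as the product of the three per-axis moduli; K = (K1, K2, K3)."""
    return b_mod_sq(m1, n, K[0]) * b_mod_sq(m2, n, K[1]) * b_mod_sq(m3, n, K[2])


print(cardinal_bspline(4, 1), cardinal_bspline(4, 2), cardinal_bspline(4, 3))
print(b_mod_sq(0, 4, 8), b_mod_sq(4, 4, 8))
```

Because the spline weights M_n(k + 1) sum to one, |b(0)|^2 is 1 for any order, which makes a convenient check.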
Charge Interpolation
[Figure: charge interpolation diagram with points labeled A to F.]
Reciprocal Sum Compute Engine
RSCE Architecture
RSCE Verification Testbench
RSCE Validation Environment
Speedup Estimate: RSCE vs. Software Implementation
RSCE Speedup
• RSCE @ 100 MHz vs. Intel P4 @ 2.4 GHz.
• Speedup: 3x to 14x
• Why so insignificant?
  - Reciprocal Sums calculations not easily parallelizable.
  - QMM memory bandwidth limitation.
• Improvement:
  - Using more QMM memories can improve the speedup.
  - Slight design modifications are required.
Parallelization Strategy: Multiple RSCE
RSCE Parallelization Strategy
• Assume a 2-D simulation system.
• Assume P = 2, K = 8, N = 6.
• Assume NumP = 4.
[Figure: "An 8x8x8 mesh" divided into "Four 4x4x4 Mini Meshes".]
RSCE Parallelization Strategy
• Mini-mesh composed -> 2D-IFFT
• 2D-IFFT = two passes of 1D-FFT (X and Y).
[Figure: two Kx-by-Ky meshes showing how the mesh is divided among P1 to P4 for the X direction 1D FFT and for the Y direction 1D FFT.]
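
The claim that a 2-D inverse transform is two passes of 1-D transforms can be checked numerically. The Python sketch below uses a naive O(K^2) inverse DFT in place of a real FFT and an arbitrary 8x8 test mesh; both are assumptions made for the illustration, and the code says nothing about how the RSCE hardware schedules the passes.

```python
import cmath


def idft_1d(values):
    """Naive inverse DFT of one row or column (stand-in for a 1-D IFFT)."""
    n = len(values)
    return [sum(values[k] * cmath.exp(2j * cmath.pi * j * k / n) for k in range(n)) / n
            for j in range(n)]


def idft_2d_two_pass(mesh):
    """2-D inverse DFT of mesh[y][x] as an X pass followed by a Y pass."""
    ky, kx = len(mesh), len(mesh[0])
    # Pass 1 (X direction): every row is an independent 1-D transform.
    rows = [idft_1d(row) for row in mesh]
    # Pass 2 (Y direction): every column is an independent 1-D transform.
    cols = [idft_1d([rows[y][x] for y in range(ky)]) for x in range(kx)]
    return [[cols[x][y] for x in range(kx)] for y in range(ky)]


def idft_2d_direct(mesh):
    """2-D inverse DFT straight from the definition, used as a reference."""
    ky, kx = len(mesh), len(mesh[0])
    return [[sum(mesh[b][a] * cmath.exp(2j * cmath.pi * (a * x / kx + b * y / ky))
                 for b in range(ky) for a in range(kx)) / (kx * ky)
             for x in range(kx)] for y in range(ky)]


K = 8
mesh = [[float((3 * x + 5 * y) % 7) for x in range(K)] for y in range(K)]  # arbitrary data
two_pass = idft_2d_two_pass(mesh)
direct = idft_2d_direct(mesh)
max_diff = max(abs(two_pass[y][x] - direct[y][x]) for y in range(K) for x in range(K))
print(max_diff)
```

Within each pass the rows (or columns) do not depend on one another, so they can be handed to different compute engines; data only has to be exchanged between the X pass and the Y pass.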
Parallelization Strategy
• 2D-IFFT -> Energy Calculation -> 2D-FFT
• 2D-FFT -> Force Calculation
$$E_{Total} = \sum_{P=0}^{3} E_P$$
RSCE + NAMD2 MD Simulations
RSCE Precision
• Precision goal: Relative error bound < 10^-5.
• Two major calculation steps:
  - B-Spline Calculation.
  - 3D-FFT/IFFT Calculation.
• Due to the limited logic resources and the limited-precision FFT LogiCore => precision goal cannot be achieved.
RSCE Precision
• To achieve the relative error bound of < 10^-5.
• Minimum calculation precision:
  - FFT {14.30}, B-Spline {1.27}
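
To get a feel for what a fixed-point format costs in accuracy, the Python sketch below rounds a value to a given number of fraction bits and reports the relative error. It assumes that a format such as {1.27} means 1 integer bit and 27 fraction bits; that reading, and the helper names, are assumptions. A single rounding like this is only a lower bound on the error, since errors accumulate through the B-spline and FFT stages.

```python
def quantize(x, frac_bits):
    """Round x to the nearest multiple of 2**-frac_bits (fixed-point grid)."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


def relative_error(x, frac_bits):
    """Relative error introduced by storing x with frac_bits fraction bits."""
    return abs(quantize(x, frac_bits) - x) / abs(x)


value = 1.0 / 3.0                 # arbitrary test value
for frac_bits in (8, 16, 27, 30):
    print(frac_bits, relative_error(value, frac_bits))
```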
MD Simulation with RSCE
• RMS Energy Error Fluctuation:
$$\text{RMS Energy Fluctuation} = \frac{\sqrt{\langle E^2 \rangle - \langle E \rangle^2}}{\langle E \rangle}$$
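
Given the total energy recorded at each time step, the fluctuation measure can be computed as in the Python sketch below. The function name and the sample values are assumptions for illustration. The code divides by the absolute value of the mean so that the ratio stays non-negative when the mean energy is negative, which is an added assumption.

```python
import math


def rms_energy_fluctuation(energies):
    """sqrt(<E^2> - <E>^2) / |<E>| over a list of per-step total energies."""
    n = len(energies)
    mean = sum(energies) / n
    mean_sq = sum(e * e for e in energies) / n
    variance = max(mean_sq - mean * mean, 0.0)   # guard against round-off below zero
    return math.sqrt(variance) / abs(mean)       # abs() is an assumption, see above


print(rms_energy_fluctuation([1.0, 3.0]))
print(rms_energy_fluctuation([-100.0] * 5))      # perfectly conserved energy
```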
FFT Precision Vs. Energy Fluctuation
Summary
• Implementation of FPGA-based Reciprocal Sums Compute Engine and its SystemC model.
• Integration of the RSCE into a widely used Molecular Dynamics program called NAMD2 for verification.
• RSCE Speedup Estimate
  - 3x to 14x
• Precision Requirement
  - B-Spline: {1.27} & FFT: {14.30} => 10^-5 rel. error
• Parallelization Strategy
Future Work
• More in-depth precision analysis.
• Investigation of how to further speed up the SPME algorithm with FPGA.
Questions