  1. An FPGA Implementation of Reciprocal Sums for SPME. Sam Lee and Paul Chow, Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto.

  2. Objectives
     - Accelerate part of a Molecular Dynamics simulation: Smooth Particle Mesh Ewald (SPME).
     - Implementation: FPGA based; try it and learn.
     - Investigation: acceleration bottleneck, precision requirement, parallelization strategy.

  3. Presentation Outline
     - Molecular Dynamics
     - SPME
     - The Reciprocal Sum Compute Engine
     - Speedup and Parallelization
     - Precision
     - Future Work

  4. Molecular Dynamics Simulation

  5. Molecular Dynamics
     - Combines empirical force calculations with Newton's equations of motion.
     - Predicts the time trajectory of small atomic systems.
     - Computationally demanding.
     Each timestep: 1. Calculate the interatomic forces. 2. Calculate the net force on each atom. 3. Integrate Newton's equations of motion:
     $\vec{a}(t) = \vec{F}(t) \cdot m^{-1}$
     $\vec{r}(t+\delta t) = \vec{r}(t) + \delta t\,\vec{v}(t) + 0.5\,\delta t^{2}\,\vec{a}(t)$
     $\vec{v}(t+\delta t) = \vec{v}(t) + 0.5\,\delta t\,[\vec{a}(t) + \vec{a}(t+\delta t)]$
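A minimal Python sketch (not part of the original slides) of the update equations above, assuming a caller-supplied compute_forces(positions) routine:

```python
import numpy as np

def velocity_verlet_step(r, v, f, masses, dt, compute_forces):
    """One velocity Verlet step matching the update equations on slide 5.

    r, v, f : (N, 3) arrays of positions, velocities, forces
    masses  : (N,) array of atomic masses
    compute_forces : callable returning new (N, 3) forces from positions
    """
    a = f / masses[:, None]                  # a(t) = F(t) / m
    r_new = r + dt * v + 0.5 * dt**2 * a     # r(t + dt)
    f_new = compute_forces(r_new)            # forces at the new positions
    a_new = f_new / masses[:, None]
    v_new = v + 0.5 * dt * (a + a_new)       # v(t + dt)
    return r_new, v_new, f_new
```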

  6. Molecular Dynamics
     The potential energy model:
     $U = \sum_{\text{All Angles}} k_{\Theta}(\Theta - \Theta_o)^{2} + \sum_{\text{All Bonds}} k_b(l - l_o)^{2} + \sum_{\text{All Torsions}} A[1 + \cos(n\tau + \phi)] + \sum_{\text{All Pairs}} \frac{q_1 q_2}{r} + \sum_{\text{All Pairs}} 4\epsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right]$
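A hedged sketch of the two per-pair non-bonded terms in the sum above (Coulomb plus Lennard-Jones); the unit-dependent Coulomb prefactor is left as an illustrative parameter and is not taken from the slides:

```python
def nonbonded_pair_energy(r, q1, q2, epsilon, sigma, coulomb_const=1.0):
    """Energy of one non-bonded pair: Coulomb term plus Lennard-Jones term.

    r              : pair separation
    q1, q2         : partial charges
    epsilon, sigma : Lennard-Jones well depth and size parameter
    coulomb_const  : unit-dependent prefactor (set to 1.0 here for illustration)
    """
    coulomb = coulomb_const * q1 * q2 / r
    sr6 = (sigma / r) ** 6
    lennard_jones = 4.0 * epsilon * (sr6**2 - sr6)
    return coulomb + lennard_jones
```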

  7. MD Simulation
     - The problem scientists are facing: SLOW!
     - O(N^2) complexity.
     - 30 CPU Years.

  8. Solutions
     - Parallelize across more compute engines.
     - Accelerate with an FPGA, especially the non-bonded calculations.
     - To be more specific, this paper addresses the electrostatic interaction (reciprocal space) via the Smooth Particle Mesh Ewald algorithm.

  9. Previous Work
     - Software SPME implementations: the original PME package written by Toukmaji; used in NAMD2.
     - Hardware implementations: no previous hardware implementation of the reciprocal sum calculation.
     - MD-Grape and MD-Engine use Ewald Summation.
     - Ewald Summation is O(N^2); SPME is O(N log N)!

  10. Smooth Particle Mesh Ewald

  11. Electrostatic Interaction
     - Coulombic equation: $v_{coulomb} = \frac{q_1 q_2}{4\pi\varepsilon_0 r}$
     - Under the Periodic Boundary Condition, the summation used to calculate the electrostatic energy is only... conditionally convergent:
       $U = \frac{1}{2}\sum_{n}{}'\sum_{i=1}^{N}\sum_{j=1}^{N}\frac{q_i q_j}{r_{ij,n}}$
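To make the "conditionally convergent" point concrete, here is a deliberately naive sketch (not from the slides, and not how Ewald or SPME works) that simply truncates the image sum after a few periodic shells; the hypothetical helper assumes a cubic box and a unit Coulomb constant, and its value depends on how the sum is truncated:

```python
import itertools
import numpy as np

def truncated_lattice_coulomb(positions, charges, box, n_images=2):
    """Naive, truncated version of the periodic Coulomb sum on slide 11.

    positions : (N, 3) coordinates inside a cubic box of side `box`
    charges   : (N,) partial charges
    n_images  : how many periodic images to include in each direction
    """
    energy = 0.0
    shifts = range(-n_images, n_images + 1)
    for nx, ny, nz in itertools.product(shifts, repeat=3):
        n = np.array([nx, ny, nz]) * box
        for i in range(len(charges)):
            for j in range(len(charges)):
                if (nx, ny, nz) == (0, 0, 0) and i == j:
                    continue  # skip self-interaction in the home cell
                r = np.linalg.norm(positions[i] - positions[j] + n)
                energy += charges[i] * charges[j] / r
    return 0.5 * energy
```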

  12. Periodic Boundary Condition
     - To combat the surface effect, the simulation box is replicated in all directions.
     (Figure: a 3x3 grid of replicated cells A-I, each containing the same five numbered particles.)

  13. Ewald Summation Used For PBC
     - Used to calculate the Coulombic interactions under PBC.
     - Splits the work into an O(N^2) Direct Sum plus an O(N^2) Reciprocal Sum.
     (Figure: the point-charge distribution split into a screened direct-sum part and a smooth reciprocal-sum part.)
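For reference, the split rests on a standard identity that the slide does not spell out: 1/r = erfc(beta*r)/r + erf(beta*r)/r, where the rapidly decaying erfc piece is handled by the Direct Sum and the smooth erf piece by the Reciprocal Sum. A one-line numeric check:

```python
import math

def ewald_split(r, beta):
    """Ewald splitting of the Coulomb kernel: 1/r = erfc(beta*r)/r + erf(beta*r)/r."""
    direct_part = math.erfc(beta * r) / r      # short-range: direct sum
    reciprocal_part = math.erf(beta * r) / r   # smooth long-range: reciprocal sum
    return direct_part, reciprocal_part

# The two pieces recombine exactly to 1/r:
d, k = ewald_split(r=1.5, beta=0.35)
assert abs((d + k) - 1.0 / 1.5) < 1e-12
```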

  14. Smooth Particle Mesh Ewald
     - Shifts the workload to the Reciprocal Sum.
     - Uses the Fast Fourier Transform.
     - O(N) real-space part + O(N log N) reciprocal part.
     - The RSCE calculates the Reciprocal Sum using the SPME algorithm.

  15. SPME Reciprocal Contribution
     Energy (the F(Q) factors are computed with FFTs):
     $\tilde{E}_{rec} = \frac{1}{2\pi V}\sum_{m \neq 0}\frac{\exp(-\pi^{2}m^{2}/\beta^{2})}{m^{2}}\,B(m_1,m_2,m_3)\,F(Q)(m_1,m_2,m_3)\,F(Q)(-m_1,-m_2,-m_3)$
     $\tilde{E}_{rec} = \frac{1}{2}\sum_{m_1=0}^{K_1-1}\sum_{m_2=0}^{K_2-1}\sum_{m_3=0}^{K_3-1} Q(m_1,m_2,m_3)\,(\theta_{rec} \ast Q)(m_1,m_2,m_3)$
     $B(m_1,m_2,m_3) = |b_1(m_1)|^{2}\cdot|b_2(m_2)|^{2}\cdot|b_3(m_3)|^{2}$
     $b_i(m_i) = \exp(2\pi i(n-1)m_i/K_i)\times\left[\sum_{k=0}^{n-2} M_n(k+1)\exp(2\pi i m_i k/K_i)\right]^{-1}$
     $C(m_1,m_2,m_3) = \frac{1}{\pi V}\,\frac{\exp(-\pi^{2}m^{2}/\beta^{2})}{m^{2}},\ m \neq 0;\quad C(0,0,0) = 0$
     Force:
     $F_{rec,\alpha_i} = -\frac{\partial\tilde{E}_{rec}}{\partial r_{\alpha_i}},\qquad \frac{\partial\tilde{E}_{rec}}{\partial r_{\alpha_i}} = \sum_{m_1=0}^{K_1-1}\sum_{m_2=0}^{K_2-1}\sum_{m_3=0}^{K_3-1}\frac{\partial Q}{\partial r_{\alpha_i}}(m_1,m_2,m_3)\,(\theta_{rec} \ast Q)(m_1,m_2,m_3)$
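Illustration only, not the RSCE datapath: a minimal numpy sketch of the energy sum above, assuming the charge mesh Q and the precomputed arrays B and C are already available (the B-spline charge interpolation and the b_i/C definitions are omitted). For a real-valued mesh, F(Q)(-m) is the complex conjugate of F(Q)(m), so the product reduces to |F(Q)(m)|^2.

```python
import numpy as np

def spme_reciprocal_energy(Q, B, C):
    """E_rec = 1/2 * sum_m B(m) * C(m) * |F(Q)(m)|^2.

    Q : (K1, K2, K3) real charge mesh (charges spread with B-splines)
    B : (K1, K2, K3) B-spline factors |b1|^2 * |b2|^2 * |b3|^2
    C : (K1, K2, K3) array exp(-pi^2 m^2 / beta^2) / (pi * V * m^2), with C[0,0,0] = 0
    """
    FQ = np.fft.fftn(Q)                       # 3D FFT of the charge mesh
    return 0.5 * np.sum(B * C * np.abs(FQ) ** 2)
```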

  16. Charge Interpolation (figure: panels A-F illustrating B-spline interpolation of point charges onto the mesh)

  17. Reciprocal Sum Compute Engine

  18. RSCE Architecture

  19. RSCE Verification Testbench

  20. RSCE Validation Environment

  21. Speedup Estimate: RSCE vs. Software Implementation

  22. RSCE Speedup
     - RSCE @ 100 MHz vs. Intel P4 @ 2.4 GHz.
     - Speedup: 3x to 14x.
     - Why so insignificant? The Reciprocal Sum calculation is not easily parallelizable, and QMM memory bandwidth is a limitation.
     - Improvement: using more QMM memories can improve the speedup; slight design modifications are required.

  23. Parallelization Strategy: Multiple RSCEs

  24. RSCE Parallelization Strategy
     - Assume a 2-D simulation system.
     - Assume P = 2, K = 8, N = 6.
     - Assume NumP = 4.
     (Figure: an 8x8x8 mesh divided into four 4x4x4 mini meshes.)

  25. RSCE Parallelization Strategy
     - Mini-meshes composed -> 2D-IFFT.
     - The 2D-IFFT is done as two passes of 1D FFTs: X direction, then Y direction.
     (Figure: the Kx-Ky plane split among P1-P4; 1D FFTs along X, then along Y.)
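A small numpy check, added here as illustration, that a 2D FFT decomposes into 1D FFT passes along each axis; this is what lets the mesh rows and then columns be distributed across multiple RSCEs:

```python
import numpy as np

# Toy 8x8 mesh standing in for the Kx-Ky charge grid.
rng = np.random.default_rng(0)
mesh = rng.standard_normal((8, 8))

# Pass 1: 1D FFTs along the X direction (rows can be split across engines).
pass_x = np.fft.fft(mesh, axis=1)
# Pass 2: 1D FFTs along the Y direction (columns redistributed, then transformed).
pass_xy = np.fft.fft(pass_x, axis=0)

# The two-pass result matches the direct 2D FFT.
assert np.allclose(pass_xy, np.fft.fft2(mesh))
```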

  26. Parallelization Strategy
     - 2D-IFFT -> Energy Calculation -> 2D-FFT.
     - 2D-FFT -> Force Calculation.
     - The partial energies are summed: $E_{Total} = \sum_{P=0}^{3} E_P$

  27. RSCE + NAMD2 MD Simulations

  28. RSCE Precision
     - Precision goal: relative error bound < 10^-5.
     - Two major calculation steps: B-Spline calculation and 3D-FFT/IFFT calculation.
     - Due to the limited logic resources and the limited-precision FFT LogiCore, the precision goal cannot be achieved.

  29. RSCE Precision
     - To achieve the relative error bound of < 10^-5, the minimum calculation precision is:
     - FFT: {14.30}, B-Spline: {1.27}
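For illustration, a hedged sketch of what a {14.30}-style fixed-point format implies, assuming the notation means 14 integer bits and 30 fractional bits in a signed representation (a reading not confirmed by the slide itself): quantize a value to that grid and check the relative error.

```python
def quantize_fixed_point(x, int_bits=14, frac_bits=30):
    """Round x to a signed fixed-point grid with `frac_bits` fractional bits,
    saturating to the range representable with `int_bits` integer bits."""
    scale = 1 << frac_bits
    max_val = (1 << (int_bits + frac_bits - 1)) - 1   # signed saturation limits
    min_val = -(1 << (int_bits + frac_bits - 1))
    raw = max(min(round(x * scale), max_val), min_val)
    return raw / scale

x = 3.14159265358979
xq = quantize_fixed_point(x)                          # {14.30}-style value
rel_err = abs(xq - x) / abs(x)
print(f"quantized={xq!r}, relative error={rel_err:.2e}")  # well below 1e-5
```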

  30. MD Simulation with RSCE
     - RMS Energy Error Fluctuation:
       $\text{RMS Energy Fluctuation} = \frac{\sqrt{\langle E^{2}\rangle - \langle E\rangle^{2}}}{\langle E\rangle}$
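A small helper, added as illustration only, that computes the fluctuation metric above from a recorded total-energy trace, assuming the angle brackets denote time averages over the simulation:

```python
import numpy as np

def rms_energy_fluctuation(energies):
    """sqrt(<E^2> - <E>^2) / <E> for a 1D array of total energies over time."""
    e = np.asarray(energies, dtype=float)
    mean_e = e.mean()
    variance = (e**2).mean() - mean_e**2
    return np.sqrt(max(variance, 0.0)) / abs(mean_e)

# Example: a slightly noisy energy trace.
trace = -1000.0 + 0.01 * np.random.default_rng(1).standard_normal(5000)
print(f"RMS energy fluctuation: {rms_energy_fluctuation(trace):.2e}")
```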

  31. FFT Precision vs. Energy Fluctuation

  32. Summary
     - Implemented an FPGA-based Reciprocal Sum Compute Engine (RSCE) and its SystemC model.
     - Integrated the RSCE into NAMD2, a widely used Molecular Dynamics program, for verification.
     - RSCE speedup estimate: 3x to 14x.
     - Precision requirement for 10^-5 relative error: B-Spline {1.27} and FFT {14.30}.
     - Proposed a parallelization strategy.

  33. Future Work
     - More in-depth precision analysis.
     - Investigation of how to further speed up the SPME algorithm with FPGAs.

  34. Questions
