An FPGA Implementation of Reciprocal Sums for SPME Sam Lee and Paul Chow Edward S. Rogers Sr. Department of Electrical and Computer Engineering University of Toronto
Objectives
- Accelerate part of a Molecular Dynamics simulation
  - Smooth Particle Mesh Ewald
- Implementation
  - FPGA-based
  - Try it and learn
- Investigation
  - Acceleration bottleneck
  - Precision requirement
  - Parallelization strategy
Presentation Outline
- Molecular Dynamics
- SPME
- The Reciprocal Sum Compute Engine
- Speedup and Parallelization
- Precision
- Future Work
Molecular Dynamics Simulation
Molecular Dynamics
- Combines empirical force calculations with Newton's equations of motion.
- Predicts the time trajectory of small atomic systems.
- Computationally demanding.

Each timestep:
1. Calculate interatomic forces.
2. Calculate the net force on each atom.
3. Integrate Newton's equations of motion.

$$\vec{a} = \vec{F} \cdot m^{-1}$$
$$\vec{r}(t+\delta t) = \vec{r}(t) + \delta t\,\vec{v}(t) + 0.5\,\delta t^{2}\,\vec{a}(t)$$
$$\vec{v}(t+\delta t) = \vec{v}(t) + 0.5\,\delta t\left[\vec{a}(t) + \vec{a}(t+\delta t)\right]$$
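The integration step above can be sketched as a generic velocity Verlet update. This is a 1-D illustration only; the function names, the single-particle setup, and the harmonic test force are illustrative and not part of the RSCE design:

```python
def velocity_verlet_step(r, v, a, force, mass, dt):
    """One velocity Verlet step for a single particle (1-D sketch).

    r, v, a: current position, velocity, acceleration
    force:   callable giving F(r)
    mass, dt: particle mass and timestep
    """
    # r(t+dt) = r(t) + dt*v(t) + 0.5*dt^2*a(t)
    r_new = r + dt * v + 0.5 * dt * dt * a
    # a(t+dt) = F(r(t+dt)) / m
    a_new = force(r_new) / mass
    # v(t+dt) = v(t) + 0.5*dt*[a(t) + a(t+dt)]
    v_new = v + 0.5 * dt * (a + a_new)
    return r_new, v_new, a_new

# Illustrative check: harmonic oscillator F = -k*r with k = m = 1
k = 1.0
r, v, a = 1.0, 0.0, -1.0
for _ in range(1000):
    r, v, a = velocity_verlet_step(r, v, a, lambda x: -k * x, 1.0, 0.01)
# Verlet integrators conserve energy well: 0.5*v^2 + 0.5*k*r^2 stays near 0.5
energy = 0.5 * v * v + 0.5 * k * r * r
```

The good long-term energy behavior of this integrator is one reason MD codes tolerate it despite its low order.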
Molecular Dynamics

$$U = \sum_{\text{All Angles}} k_{\Theta}(\Theta - \Theta_o)^2 + \sum_{\text{All Bonds}} k_b (l - l_o)^2 + \sum_{\text{All Torsions}} A\left[1 + \cos(n\tau + \phi)\right] + \sum_{\text{All Pairs}} \frac{q_1 q_2}{\delta r} + \sum_{\text{All Pairs}} 4\varepsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right]$$
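The last two sums (the non-bonded electrostatic and Lennard-Jones terms, which dominate the cost) can be evaluated for one pair as below. This is a sketch in reduced units; the function and parameter names are illustrative, not taken from any particular MD package:

```python
def nonbonded_pair_energy(q1, q2, r, epsilon, sigma):
    """Coulomb plus Lennard-Jones energy for one atom pair (reduced units)."""
    coulomb = q1 * q2 / r                  # electrostatic term q1*q2/r
    sr6 = (sigma / r) ** 6
    lj = 4.0 * epsilon * (sr6 * sr6 - sr6)  # 4*eps*[(sigma/r)^12 - (sigma/r)^6]
    return coulomb + lj

# At r = sigma the LJ term vanishes, leaving only the Coulomb part
e = nonbonded_pair_energy(1.0, -1.0, 1.0, epsilon=0.5, sigma=1.0)  # → -1.0
```

Summed over all N(N-1)/2 pairs, this is the O(N^2) work the rest of the talk is about reducing.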
MD Simulation
- Problem scientists are facing: SLOW!
- O(N^2) complexity.

[Chart: simulation cost in CPU years]
Solutions
- Parallelize across more compute engines.
- Accelerate with FPGAs, especially the non-bonded calculations.
- Specifically, this paper addresses:
  - Electrostatic interaction (reciprocal space)
  - The Smooth Particle Mesh Ewald algorithm
Previous Work
- Software SPME implementations:
  - Original PME package written by Toukmaji.
  - Used in NAMD2.
- Hardware implementations:
  - No previous hardware implementation of the reciprocal-sum calculation.
  - MD-Grape and MD-Engine use Ewald Summation.
  - Ewald Summation is O(N^2); SPME is O(N log N)!
Smooth Particle Mesh Ewald
Electrostatic Interaction
- Coulombic equation:

$$v_{coulomb} = \frac{q_1 q_2}{4\pi\varepsilon_0 r}$$

- Under the Periodic Boundary Condition, the summation to calculate the electrostatic energy is only... conditionally convergent.

$$U = \frac{1}{2}\sum_{\vec{n}}{}^{\prime}\sum_{i=1}^{N}\sum_{j=1}^{N}\frac{q_i q_j}{r_{ij,n}}$$
Periodic Boundary Condition
- To combat the surface effect, the simulation box is replicated in every direction.

[Figure: a central simulation box (E) surrounded by replicated periodic images (A-I)]
Ewald Summation Used for PBC
- Calculates the Coulombic interactions under PBC.
- O(N^2) Direct Sum + O(N^2) Reciprocal Sum.

[Figure: point charges split into a short-range Direct Sum term and a smooth Reciprocal Sum term]
Smooth Particle Mesh Ewald
- Shifts the workload to the Reciprocal Sum.
- Uses the Fast Fourier Transform.
- O(N) real-space + O(N log N) reciprocal.
- The RSCE calculates the Reciprocal Sum using the SPME algorithm.
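The FFT-based reciprocal sum can be sketched in one dimension: interpolate charges onto a mesh Q, transform it, and accumulate a weighted sum of |F(Q)(m)|^2. Everything here is a 1-D toy, with a naive O(K^2) DFT standing in for the O(K log K) FFT, and illustrative names throughout:

```python
import cmath

def dft(x):
    """Naive DFT stand-in for the FFT used by SPME (fine for this tiny sketch)."""
    K = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / K) for k in range(K))
            for m in range(K)]

def reciprocal_energy_1d(Q, theta_hat):
    """1-D analogue of E_rec = (1/2) sum Q * (theta_rec (*) Q).

    The convolution is evaluated in the transform domain:
    E = (1/2K) * sum_m theta_hat[m] * |F(Q)(m)|^2,
    where Q is the interpolated charge mesh and theta_hat a precomputed
    influence function (the B(m)*C(m) factors in 3-D SPME).
    """
    K = len(Q)
    FQ = dft(Q)
    return 0.5 / K * sum(theta_hat[m] * abs(FQ[m]) ** 2 for m in range(K))

# Sanity check: with theta_hat ≡ 1 this reduces (by Parseval) to (1/2) * sum Q^2
Q = [1.0, 0.0, -1.0, 0.0, 0.5, 0.0, -0.5, 0.0]
E = reciprocal_energy_1d(Q, [1.0] * 8)  # → 1.25
```

Replacing the pairwise reciprocal sum with this transform-domain product is exactly where the O(N^2) → O(N log N) improvement comes from.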
SPME Reciprocal Contribution

Energy (the F(Q) factors are computed with FFTs):

$$\tilde{E}_{rec} = \frac{1}{2\pi V}\sum_{m\neq 0}\frac{\exp(-\pi^2 m^2/\beta^2)}{m^2}\, B(m_1,m_2,m_3)\, F(Q)(m_1,m_2,m_3)\, F(Q)(-m_1,-m_2,-m_3)$$

$$\tilde{E}_{rec} = \frac{1}{2}\sum_{m_1=0}^{K_1-1}\sum_{m_2=0}^{K_2-1}\sum_{m_3=0}^{K_3-1} Q(m_1,m_2,m_3)\,(\theta_{rec} \ast Q)(m_1,m_2,m_3)$$

$$B(m_1,m_2,m_3) = \left|b_1(m_1)\right|^2 \cdot \left|b_2(m_2)\right|^2 \cdot \left|b_3(m_3)\right|^2$$

$$b_i(m_i) = \exp\!\left(\frac{2\pi i (n-1) m_i}{K_i}\right) \times \left[\sum_{k=0}^{n-2} M_n(k+1)\exp\!\left(\frac{2\pi i\, m_i k}{K_i}\right)\right]^{-1}$$

$$C(m_1,m_2,m_3) = \frac{1}{\pi V}\,\frac{\exp(-\pi^2 m^2/\beta^2)}{m^2},\quad m\neq 0; \qquad C(0,0,0) = 0$$

Force:

$$\vec{F}_{rec,\alpha i} = -\frac{\partial \tilde{E}_{rec}}{\partial r_{\alpha i}} = -\sum_{m_1=0}^{K_1-1}\sum_{m_2=0}^{K_2-1}\sum_{m_3=0}^{K_3-1} \frac{\partial Q}{\partial r_{\alpha i}}(m_1,m_2,m_3)\,(\theta_{rec} \ast Q)(m_1,m_2,m_3)$$
Charge Interpolation

[Figure: point charges interpolated onto surrounding mesh points, panels A-F]
Reciprocal Sum Compute Engine

RSCE Architecture

RSCE Verification Testbench

RSCE Validation Environment

Speedup Estimate: RSCE vs. Software Implementation
RSCE Speedup
- RSCE @ 100 MHz vs. Intel P4 @ 2.4 GHz.
- Speedup: 3x to 14x.
- Why so modest?
  - The reciprocal-sum calculation is not easily parallelizable.
  - QMM memory bandwidth limitation.
- Improvement:
  - Using more QMM memories can improve the speedup.
  - Slight design modifications are required.
Parallelization Strategy: Multiple RSCEs
RSCE Parallelization Strategy
- Assume a 2-D simulation system.
- Assume P = 2, K = 8, N = 6.
- Assume NumP = 4.

[Figure: an 8x8x8 mesh divided into four 4x4x4 mini-meshes]
RSCE Parallelization Strategy
- Mini-meshes are composed, then a 2D-IFFT is performed.
- 2D-IFFT = two passes of 1D-FFTs (X direction, then Y direction).

[Figure: the Kx-Ky mesh split among engines P1-P4; 1D FFTs along X, then along Y]
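The two-pass decomposition can be sketched directly: a 2-D transform is just 1-D transforms over the rows (X direction) followed by 1-D transforms over the columns (Y direction), which is what lets each engine own a slab of rows per pass. A naive DFT stands in for the FFT, and all names are illustrative:

```python
import cmath

def dft1(x):
    """Naive 1-D DFT (FFT stand-in for this small sketch)."""
    K = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / K) for k in range(K))
            for m in range(K)]

def dft2(grid):
    """2-D transform as two passes of 1-D transforms."""
    rows = [dft1(row) for row in grid]                    # X-direction pass
    cols = [dft1([rows[y][x] for y in range(len(rows))])  # Y-direction pass
            for x in range(len(rows[0]))]
    # transpose back so result[y][x] indexes (Ky, Kx)
    return [[cols[x][y] for x in range(len(cols))] for y in range(len(rows))]

# Sanity check: a constant 4x4 grid transforms to a single DC term of K*K
F = dft2([[1.0] * 4 for _ in range(4)])  # F[0][0] → 16, all other bins → 0
```

Between the two passes, a real multi-engine implementation must exchange (transpose) data so each engine holds complete columns, which is the communication step the parallelization strategy has to pay for.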
Parallelization Strategy
- 2D-IFFT -> Energy Calculation -> 2D-FFT
- 2D-FFT -> Force Calculation

$$E_{Total} = \sum_{P=0}^{3} E_P$$
RSCE + NAMD2 MD Simulations
RSCE Precision
- Precision goal: relative error bound < 10^-5.
- Two major calculation steps:
  - B-Spline calculation.
  - 3D-FFT/IFFT calculation.
- Due to the limited logic resources and the limited-precision FFT LogiCore, the precision goal cannot be achieved.
RSCE Precision
- To achieve the relative error bound of < 10^-5, the minimum calculation precision is:
  - FFT: {14.30}, B-Spline: {1.27} (fixed-point {integer bits . fractional bits})
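The effect of a fixed-point format like {14.30} can be modeled by rounding to the nearest multiple of 2^-frac_bits. This is a generic quantization model, not the RSCE's exact rounding mode, and the names are illustrative:

```python
def quantize(x, int_bits, frac_bits):
    """Round x to a signed fixed-point {int_bits.frac_bits} value,
    e.g. {14.30} keeps 30 fractional bits of precision."""
    scale = 1 << frac_bits
    y = round(x * scale) / scale
    limit = 1 << (int_bits - 1)  # signed range; sign bit counted in int_bits
    assert -limit <= y < limit, "fixed-point overflow"
    return y

x = 0.123456789
err = abs(quantize(x, 14, 30) - x)
# round-to-nearest bounds the error by half an LSB, i.e. 2^-31
```

Each truncation like this injects a worst-case error of half an LSB, which is why the FFT stage, with its long chains of multiply-adds, needs so many more fractional bits than the B-Spline stage.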
MD Simulation with RSCE
- RMS energy error fluctuation:

$$\text{RMS Energy Fluctuation} = \frac{\sqrt{\langle E^2\rangle - \langle E\rangle^2}}{\langle E\rangle}$$
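The fluctuation metric above is a direct computation over the trajectory's total-energy samples; the sample data below is made up for illustration:

```python
def rms_energy_fluctuation(energies):
    """RMS fluctuation sqrt(<E^2> - <E>^2) / <E> over energy samples."""
    n = len(energies)
    mean = sum(energies) / n                      # <E>
    mean_sq = sum(e * e for e in energies) / n    # <E^2>
    return (mean_sq - mean * mean) ** 0.5 / mean

# Illustrative trajectory: energies oscillating +/-0.1 around 100
samples = [100.0 + 0.1 * (-1) ** i for i in range(10)]
f = rms_energy_fluctuation(samples)  # → 0.001
```

A drift in this metric over a long run is the usual symptom of accumulated precision loss, which is how the FFT bit-width experiments in the next slide are evaluated.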
FFT Precision vs. Energy Fluctuation
Summary
- Implemented an FPGA-based Reciprocal Sum Compute Engine and its SystemC model.
- Integrated the RSCE into NAMD2, a widely used Molecular Dynamics program, for verification.
- RSCE speedup estimate: 3x to 14x.
- Precision requirement: B-Spline {1.27} and FFT {14.30} => 10^-5 relative error.
- Parallelization strategy.
Future Work
- More in-depth precision analysis.
- Investigate how to further speed up the SPME algorithm with FPGAs.
Questions