NAMD - Scalable Molecular Dynamics
Gengbin Zheng
9/1/01
Molecular dynamics and NAMD
• MD is used to understand the structure and function of biomolecules
  – proteins, DNA, membranes
• NAMD is a production-quality MD program
  – Active use by biophysicists (science publications)
  – 50,000+ lines of C++ code
  – 1000+ registered users
  – Features and “accessories” such as
    • VMD: visualization and analysis
    • BioCoRE: collaboratory
    • Steered and Interactive Molecular Dynamics
Molecular Dynamics
• Collection of [charged] atoms, with bonds
• Like the N-body problem, but much more complicated
• At each time-step
  – Calculate forces on each atom
    • Non-bonded: electrostatic and van der Waals
    • Bonded: bonds (2 atoms), angles (3) and dihedrals (4)
  – Integration: calculate velocities and advance positions
• 1 femtosecond time-step, millions needed!
• Thousands of atoms (1,000 - 100,000)
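The per-time-step loop above (compute forces, then integrate) can be sketched as follows. The slides do not name the integrator, so velocity Verlet is an assumption here, and `velocity_verlet_step` and `force_fn` are hypothetical names used only for illustration; this is not NAMD's code.

```python
def velocity_verlet_step(pos, vel, forces, masses, dt, force_fn):
    """Advance all atoms by one time-step; returns new (pos, vel, forces).

    pos, vel, forces: lists of [x, y, z] per atom; masses: one value per atom.
    force_fn(pos) is a caller-supplied (hypothetical) force routine.
    """
    # Half-kick: update velocities with the forces at the current positions.
    half_vel = [[v + 0.5 * dt * f / m for v, f in zip(vi, fi)]
                for vi, fi, m in zip(vel, forces, masses)]
    # Drift: advance positions with the half-step velocities.
    new_pos = [[x + dt * v for x, v in zip(xi, vi)]
               for xi, vi in zip(pos, half_vel)]
    # Force calculation at the new positions (the expensive part in real MD).
    new_forces = force_fn(new_pos)
    # Second half-kick completes the velocity update.
    new_vel = [[v + 0.5 * dt * f / m for v, f in zip(vi, fi)]
               for vi, fi, m in zip(half_vel, new_forces, masses)]
    return new_pos, new_vel, new_forces
```

In a real simulation `force_fn` would sum the non-bonded and bonded terms listed above, and the step would be repeated millions of times.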
Cut-off radius
• Use of a cut-off radius to reduce work
  – 8 - 14 Å
  – Far-away charges are ignored!
• 80-95% of the work is non-bonded force computation
• Some simulations need far-away contributions
  – Periodic systems: Ewald, Particle-Mesh Ewald (PME)
  – Aperiodic systems: fast multipole algorithm (FMA)
• Even so, cut-off based computations are important:
  – Near-atom calculations are part of the above
  – Cycles: multiple time-stepping is used: k cut-off steps, 1 PME/FMA
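A minimal sketch of the two ideas on this slide: selecting atom pairs inside the cut-off, and a multiple time-stepping cycle. It reads "k cut-off steps, 1 PME/FMA" as one long-range evaluation per cycle of k steps, which is an assumption. The function names are hypothetical, and the brute-force pair search is for illustration only.

```python
import math

def cutoff_pairs(positions, cutoff):
    """Return index pairs (i, j), i < j, whose distance is below `cutoff`."""
    pairs = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if math.dist(positions[i], positions[j]) < cutoff:
                pairs.append((i, j))
    return pairs

def cycle_schedule(n_steps, k):
    """Label each step of a run with the work it performs.

    Every step does cut-off work; the first step of each cycle of k steps
    additionally does the long-range (PME/FMA) evaluation.
    """
    return ["cutoff+long-range" if step % k == 0 else "cutoff"
            for step in range(n_steps)]

# Three atoms on a line (coordinates in Å), 12 Å cut-off:
# only the first two atoms interact.
atoms = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
print(cutoff_pairs(atoms, 12.0))
print(cycle_schedule(8, 4))
```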
Spatial Decomposition
[Figure; label: Patch]
• But the load balancing problems are still severe
[Figure; labels: Patch, Compute, Proxy]
FD + SD (force decomposition + spatial decomposition)
• Now, we have many more objects to load balance:
  – Each diamond can be assigned to any processor
  – Number of diamonds (3D): 14 · Number of Patches
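One way to arrive at the factor 14 is to count, for each patch, one self-interaction object plus one object for half of its 26 neighbours, so that every neighbouring pair is counted once. The sketch below assumes a periodic patch grid with at least 3 patches per dimension; the slide does not spell this out, and the function names are hypothetical.

```python
from itertools import product

def half_shell_offsets():
    """Offsets of the patch itself plus half of its 26 neighbours.

    Keeping only offsets that compare >= (0, 0, 0) picks exactly one of
    each opposite pair (d, -d), plus the self offset.
    """
    return [d for d in product((-1, 0, 1), repeat=3) if d >= (0, 0, 0)]

def compute_objects(nx, ny, nz):
    """List (patch_a, patch_b) pairs for a periodic nx x ny x nz patch grid."""
    offsets = half_shell_offsets()
    computes = []
    for p in product(range(nx), range(ny), range(nz)):
        for dx, dy, dz in offsets:
            q = ((p[0] + dx) % nx, (p[1] + dy) % ny, (p[2] + dz) % nz)
            computes.append((p, q))
    return computes

print(len(half_shell_offsets()))        # offsets per patch
print(len(compute_objects(4, 4, 4)))    # objects for a 4 x 4 x 4 grid
```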
Load Balancing
• Is a major challenge for this application
  – Especially for a large number of processors
• Unpredictable workloads
  – Each diamond (force object) and patch encapsulates a variable amount of work
  – Static estimates are inaccurate
• Measurement-based Load Balancing Framework
  – Robert Brunner’s recent Ph.D. thesis
  – Very slow variations across timesteps
Load Balancing
• Based on migratable objects
• Collect timing data for several cycles
• Run heuristic load balancer
  – Several alternative ones:
    • Alg7 - Greedy
    • Refinement
• Re-map and migrate objects accordingly
  – Registration mechanisms facilitate migration
Load balancing strategy

Greedy variant (simplified):
  Sort compute objects (diamonds)
  Repeat (until all assigned)
    S = set of all processors that:
      -- are not overloaded
      -- generate least new commun.
    P = least loaded {S}
    Assign heaviest compute to P

Refinement:
  Repeat
    - Pick a compute from the most overloaded PE
    - Assign it to a suitable underloaded PE
  Until (No movement)

[Figure; labels: Cell, Compute, Cell]
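The two strategies can be sketched roughly as below. This is a simplified illustration, not the Alg7 code. It ignores the communication criterion of the greedy variant. The overload threshold (`tolerance` times the average load) is an assumption, and all names are hypothetical. `background` holds the non-migratable work per processor.

```python
def greedy_assign(compute_loads, background):
    """Assign computes, heaviest first, to the currently least loaded PE."""
    loads = list(background)
    placement = {}
    for cid in sorted(compute_loads, key=compute_loads.get, reverse=True):
        p = min(range(len(loads)), key=loads.__getitem__)
        placement[cid] = p
        loads[p] += compute_loads[cid]
    return placement, loads

def refine(placement, compute_loads, background, tolerance=1.05):
    """Move computes off the most overloaded PE until no move is possible."""
    placement = dict(placement)
    loads = list(background)
    for cid, p in placement.items():
        loads[p] += compute_loads[cid]
    # Assumed definition: a PE is overloaded above tolerance * average load.
    limit = tolerance * sum(loads) / len(loads)
    moved = True
    while moved:
        moved = False
        src = max(range(len(loads)), key=loads.__getitem__)
        if loads[src] <= limit:
            break  # nothing is overloaded any more
        # Try the heaviest computes on the most overloaded PE first.
        candidates = sorted((c for c, p in placement.items() if p == src),
                            key=compute_loads.get, reverse=True)
        for cid in candidates:
            dst = min(range(len(loads)), key=loads.__getitem__)
            # Only move if the target stays within the limit afterwards.
            if loads[dst] + compute_loads[cid] <= limit:
                loads[src] -= compute_loads[cid]
                loads[dst] += compute_loads[cid]
                placement[cid] = dst
                moved = True
                break
    return placement, loads

# Example with made-up timings for five computes on two processors.
timings = {"a": 5, "b": 4, "c": 3, "d": 3, "e": 1}
print(greedy_assign(timings, [0, 0])[1])
print(refine({c: 0 for c in timings}, timings, [0, 0])[1])
```

Greedy rebuilds the whole mapping from measured loads, while refinement starts from the existing mapping and typically migrates fewer objects.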
[Two charts: Time per processor (Processors 0-14 and Average), split into migratable work and non-migratable work]
Results on Linux Cluster
[Plot: Speedup on Linux Cluster; Speedup (0-80) vs. Processors (0-120)]
Performance of Apo-A1 on ASCI Red
[Plot: Speedup (0-1200) vs. Processors (0-2500)]
Performance of Apo-A1 on O2k and T3E
[Plot: Speedup (0-250) vs. Processors (0-300)]
Future and Planned Work
• Increased speedups on 2k-10k processors
  – Smaller grain sizes
  – New algorithms for reducing communication impact
  – New load balancing strategies
• Further performance improvements for PME/FMA
  – With multiple time-stepping
  – Needs multi-phase load balancing
Steered MD: example picture
[Figure] Image and simulation by the Theoretical Biophysics Group, Beckman Institute, UIUC