HOOMD-blue - Scalable Molecular Dynamics and Monte Carlo
Joshua Anderson and Jens Glaser
Glotzer Group, Chemical Engineering, University of Michigan
Blue Waters Symposium, Sunriver, OR, 05/12/2015
Scaling on OLCF Cray XK7

[Figure: Strong scaling benchmarks on Titan for all three simulation types (MD, MC, DEM) supported by HOOMD-blue. Inset: example of colloidal-scale nucleation and growth of a crystal from a fluid of hard octahedra.]
Applications of HOOMD-blue

• Mahynski, A. et al., Nat. Comm. 2014
• Glaser et al., Macromolecules 2014
• Knorowski, C. and Travesset, A., JACS 2014
• Beltran-Villegas et al., Soft Matter 2014
• Trefz, B. et al., PNAS 2014
• Long, A. W. and Ferguson, A. L., J. Phys. Chem. B 2014
• Marson, R. L. et al., Nano Lett. 2014
• Nguyen et al., Phys. Rev. Lett. 2014

>100 peer-reviewed publications using HOOMD-blue as of May 2015
http://codeblue.umich.edu/hoomd-blue/publications.html
Universality of Block Copolymer Melts

[Figure: order-disorder transition, (χN)_ODT, vs. chain length for an AB diblock copolymer melt; data from several coarse-grained simulation models (S1, S2, S3, ...) at chain lengths N = 16 to 128 collapse onto a universal curve and approach the SCFT prediction.]

Glaser, J., Medapuram, P., Beardsley, T. M., Matsen, M. W., & Morse, D. C., PRL 113, 068302 (2014)
Medapuram, P., Glaser, J., & Morse, D. C., Macromolecules 2015, 48, 819-839
Spatial domain decomposition

• Particles can leave and enter domains under periodic boundary conditions
• Ghost particles within a layer of width r_cut + r_buff around each domain are required for force computation (see the sketch below)
• Positions of ghost particles are updated every time step
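A minimal NumPy sketch of the ghost-layer idea for one axis of the decomposition (the variable names and the single-axis simplification are illustrative, not HOOMD-blue's implementation):

import numpy as np

# Hypothetical parameters: the ghost layer width is the pair cutoff
# plus a buffer, so the neighbor list stays valid between rebuilds.
r_cut, r_buff = 2.5, 0.4
ghost_width = r_cut + r_buff

L = 20.0                   # periodic box edge length
x_lo, x_hi = 0.0, 10.0     # this rank's domain along x
pos = np.random.uniform(0.0, L, size=(1000, 3))

# Particles owned by this domain
local = pos[(pos[:, 0] >= x_lo) & (pos[:, 0] < x_hi)]

# Ghosts: particles just outside either boundary (with periodic wrap).
# They are copied from neighboring ranks, not moved, so that all pair
# forces on local particles can be computed without extra communication.
near_lo = (pos[:, 0] - x_lo) % L > L - ghost_width
near_hi = (x_hi - pos[:, 0]) % L > L - ghost_width
ghosts = pos[near_lo | near_hi]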
Scaling bottlenecks in spatial domain decomposition

[Diagram: on each node, a CPU (4-12 cores) connects to a GPU (1000's of cores) over a ~6 GB/s PCIe link; traffic between GPUs on different nodes must cross these links and the network, making them the scaling bottleneck.]
Compute vs. Communication

[Figure: breakdown of the average time per step (μs) vs. P (= # GPUs, 1-16) for N = 64,000 particles on Tesla K20X. Communication steps (migrate, ghost exchange, ghost update) are shown alongside computation kernels (neighbor list, pair force); the communication share grows with P.]
Optimization of the communication algorithm

[Profile of one MD time step: pair force and NVT integration kernels on the GPU overlap with MPI communication, GPU pack/unpack kernels (~50 μs), and a collective for thermo properties.]

• Device-resident data
• Pack/unpack on the GPU
• Autotuned kernels (see the sketch below)
• Overlap synchronization and communication with computation
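The kernel autotuning idea, as a minimal sketch (the Autotuner class and the timing harness here are hypothetical stand-ins for HOOMD-blue's internal machinery, which times CUDA kernels directly):

import time

class Autotuner:
    """Pick the fastest launch parameter by timing short trial runs."""

    def __init__(self, candidates, nsamples=5):
        self.candidates = candidates   # e.g. CUDA block sizes to try
        self.nsamples = nsamples

    def tune(self, launch):
        """launch(param) must run the kernel once and block until done
        (a real GPU tuner would synchronize the device around timing)."""
        timings = {}
        for p in self.candidates:
            launch(p)                  # warm-up run, excluded from timing
            t0 = time.perf_counter()
            for _ in range(self.nsamples):
                launch(p)
            timings[p] = (time.perf_counter() - t0) / self.nsamples
        return min(timings, key=timings.get)

# Usage with a placeholder "kernel"; in practice, launch would run the
# pair-force or pack/unpack kernel with the given block size.
tuner = Autotuner(candidates=[32, 64, 128, 256, 512])
best = tuner.tune(lambda block_size: sum(range(100000)))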
Weak scaling up to 108,000,000 particles

[Figure: weak scaling at 32,000 particles/GPU; time steps/sec vs. # of GPUs (= # of nodes), 1 to ~1000, for HOOMD-blue 1.0 and LAMMPS-GPU (11Nov13).]

Trung Nguyen
Strong Scaling of a LJ Liquid (N=10,976,000)

[Figure: strong scaling; time steps/sec vs. # of GPUs (= # of nodes), 4 to 1024, for HOOMD-blue 1.0 and LAMMPS-GPU (11Nov13).]

Trung Nguyen
Strong Scaling Efficiency

[Figure: parallel efficiency (%) vs. N/P (particles per GPU) for system sizes N = 256,000; 864,000; 2,048,000; 4,000,000; 6,912,000; and 10,976,000. The curves collapse, reaching 80% efficiency at roughly 250,000 particles/GPU.]
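The efficiency plotted here presumably follows the usual strong-scaling convention, the measured rate on P GPUs relative to perfect speedup of the single-GPU rate; a sketch (the example numbers are made up):

def strong_scaling_efficiency(rate_P, rate_1, P):
    """Percent of ideal speedup: rate on P GPUs vs. P times the
    single-GPU rate, for the same total system size N."""
    return 100.0 * rate_P / (P * rate_1)

# Hypothetical example: 900 steps/s on 8 GPUs vs. 140 steps/s on 1 GPU
print(strong_scaling_efficiency(900.0, 140.0, 8))   # ~80%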
Polymer Brush Scaling

[Figure: time steps/sec vs. # nodes (= # GPUs), 1 to 128, for polymer brush systems of N = 107,520; 430,080; and 1,720,320 particles, with a CPU run at N = 430,080 for comparison.]

Jaime Millan
GPUDirect RDMA on Wilkes

[Diagram: with CUDA 5/6 GPUDirect RDMA, the InfiniBand adapter reads and writes GPU memory directly through the chipset, bypassing the staging copy through CPU/system memory.]

Pak Lui, Filippo Spiga, Rong Shi
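With a CUDA-aware MPI, device buffers are handed to MPI directly and GPUDirect RDMA can move them without host staging. A minimal sketch, assuming mpi4py built against a CUDA-aware MPI and CuPy for device arrays (not HOOMD-blue code):

# Run with: mpirun -n 2 python example.py
from mpi4py import MPI
import cupy as cp

comm = MPI.COMM_WORLD
buf = cp.empty(1024, dtype=cp.float64)   # lives in GPU memory

if comm.Get_rank() == 0:
    buf[:] = cp.arange(1024)
    comm.Send(buf, dest=1, tag=0)        # GPU pointer passed straight to MPI
elif comm.Get_rank() == 1:
    comm.Recv(buf, source=0, tag=0)      # received directly into GPU memory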
Dissipative Particle Dynamics on Blue Waters and Titan
Summary - Molecular Dynamics

• Multi-GPU support in HOOMD 1.0 enables large-scale MD using spatial domain decomposition
• Strong scaling extends to 1000's of GPUs, and to more complex systems
• GPUDirect RDMA is a promising technology, although strong scaling is ultimately limited by PCIe and kernel launch latency

Glaser, J., Nguyen, T. D., Anderson, J. A., et al. Strong scaling of general-purpose molecular dynamics simulations on GPUs. Comput. Phys. Commun. 192, 97-107 (2015). doi:10.1016/j.cpc.2015.02.028
Molecular dynamics and Monte Carlo

• Tethered nanospheres - Langevin dynamics - Marson, R. et al., Nano Lett. 14, 4, 2014
• Truncated tetrahedra - Hard particle MC - Damasceno, P. F. et al., ACS Nano 6, 609 (2012)
• Arbitrary polyhedra - Hard particle MC - Damasceno, P. F. et al., Science 337, 453 (2012)
• Quasicrystal growth - Molecular dynamics - Engel, M. et al., Nature Materials (in press)
• Self-propelled colloids - Non-equilibrium MD - Nguyen, N. et al., Phys. Rev. E 86, 2012
• Surfactant-coated surfaces - Dissipative particle dynamics - Pons-Siepermann, I. C., Soft Matter 6, 3919 (2012)
• Interacting nanoplates - Hard particle MC with interactions - Ye, X. et al., Nature Chemistry (2013, cover article)
• Hard disks (hexatic phase) - Hard particle MC - Engel, M. et al., PRE 87, 042134 (2013)
Hard particle Monte Carlo

• Hard Particle Monte Carlo plugin for HOOMD-blue
• 2D shapes: disk, convex (sphero)polygon, concave polygon, ellipse
• 3D shapes: sphere, ellipsoid, convex (sphero)polyhedron
• NVT and NPT ensembles
• Frenkel-Ladd free energy
• Parallel execution on a single GPU
• Domain decomposition across multiple nodes (CPUs or GPUs)

[Images: self-assembled structures from hard-particle simulations, including a β-Mn-like cP20 (A13) crystal.]

Damasceno, P. F. et al., ACS Nano 6, 609 (2012); Damasceno et al., Science (2012); Engel, M. et al., PRE 87, 042134 (2013)
Easy and flexible to use

from hoomd_script import *
from hoomd_plugins import hpmc

# read the initial configuration
init.read_xml(filename='init.xml')

# hard particle MC integrator for convex polygons:
# d = maximum trial translation, a = maximum trial rotation
mc = hpmc.integrate.convex_polygon(seed=10, d=0.25, a=0.3)

# define particle type 'A' as a unit square
mc.shape_param.set('A', vertices=[(-0.5, -0.5), (0.5, -0.5),
                                  (0.5, 0.5), (-0.5, 0.5)])

run(10e3)
Overlap checks

• Disk/sphere: trivial
• Convex polygons: separating axis
• Concave polygons: brute force
• Spheropolygons: XenoCollide/GJK
• Convex polyhedra: XenoCollide/GJK
• Ellipsoid/ellipse: matrix method
• Compute the separation Δr in double precision, then convert to single for the expensive overlap check: e.g. 1001.842 - 1000.967 = 0.875, where the small difference retains full precision

A sketch of the separating-axis test follows.
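A minimal NumPy sketch of the separating-axis test for two convex polygons (illustrative only; HOOMD-blue's production overlap checks run as optimized CUDA code):

import numpy as np

def edge_normals(poly):
    """Perpendiculars to each edge of a convex polygon (the candidate
    separating axes); normalization is unnecessary for the test."""
    edges = np.roll(poly, -1, axis=0) - poly
    return np.stack([edges[:, 1], -edges[:, 0]], axis=1)

def overlap_sat(poly_a, poly_b):
    """Two convex polygons overlap iff no edge normal of either
    polygon separates their projections."""
    for axis in np.vstack([edge_normals(poly_a), edge_normals(poly_b)]):
        pa, pb = poly_a @ axis, poly_b @ axis
        if pa.max() < pb.min() or pb.max() < pa.min():
            return False    # found a separating axis: no overlap
    return True             # no separating axis exists: overlap

# Unit squares shifted along x: overlapping, then separated.
sq = np.array([(-0.5, -0.5), (0.5, -0.5), (0.5, 0.5), (-0.5, 0.5)])
print(overlap_sat(sq, sq + np.array([0.75, 0.0])))   # True
print(overlap_sat(sq, sq + np.array([1.25, 0.0])))   # False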
Divergence

[Diagram: per-thread timeline of one trial move (0.5 ms total): initialization, trial move, circumsphere check, overlap check. Threads finish the overlap check at different times (overlap divergence) or exit early (early-exit divergence), leaving parts of the warp idle.]
Strong scaling - squares

GPU: Tesla K20X; CPU: Xeon E5-2680 (XSEDE Stampede)

[Figure: trial moves per second vs. P (# GPUs / # CPU cores), 1 to 4096, for N = 4,096; 65,536; and 1,048,576 squares, each on GPU and CPU. One GPU delivers about 29x the throughput of one CPU core; reference lines mark 80% and 50% parallel efficiency.]
Weak scaling - truncated octahedra (3D)

GPU: Tesla K20X on Cray XK7; CPU: AMD Bulldozer on Cray XE6

[Figure: trial moves / N / sec vs. # of nodes (8 to 1000); the GPU (XK7) outperforms the CPU (XE6) by about 1.6x.]
Questions?

HOOMD-blue: http://codeblue.umich.edu/hoomd-blue

Monte Carlo code not yet publicly available:
• It will eventually be released open-source as part of HOOMD-blue
• Paper on hard disks: Anderson, J. A. et al., JCP 254, 27-38 (2013)
• Paper on 3D, anisotropic shapes, multi-GPU: coming soon

Funding / Resources
• National Science Foundation, Division of Materials Research, Award # DMR 1409620
• This work was partially supported by a Simons Investigator award from the Simons Foundation to Sharon Glotzer
• This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number OCI-1053575
• This research is part of the Blue Waters sustained-petascale computing project, which is supported by the National Science Foundation (award number ACI 1238993) and the state of Illinois. Blue Waters is a joint effort of the University of Illinois at Urbana-Champaign and its National Center for Supercomputing Applications
• This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725

email: joaander@umich.edu