Molecular Dynamics (MD) on GPUs March 2019
Accelerating Discoveries
Using a supercomputer powered by the Tesla Platform with over 3,000 Tesla accelerators, University of Illinois scientists performed the first all-atom simulation of the HIV virus and discovered the chemical structure of its capsid, “the perfect target for fighting the infection.” Without GPUs, the supercomputer would need to be 5x larger for similar performance.
Overview of Life & Material Accelerated Apps
Molecular Dynamics (MD)
• All key codes are GPU-accelerated
• Great multi-GPU, multi-node (dense) performance
• GPU-accelerated apps: ACEMD*, AMBER*, BAND, CHARMM, DESMOND, ESPResso, Folding@Home, GPUgrid.net, GROMACS, HALMD, HOOMD-Blue*, LAMMPS, Lattice Microbes*, mdcore, MELD, miniMD, NAMD, OpenMM, PolyFTS, SOP-GPU* & more
Quantum Chemistry (QC)
• All key codes are ported or being optimized
• GPU-accelerated math libraries, OpenACC directives
• GPU-accelerated apps: ABINIT, ACES III, ADF, BigDFT, CP2K, GAMESS, GAMESS-UK, GPAW, LATTE, LSDalton, LSMS, MOLCAS, MOPAC2012, NWChem, OCTOPUS*, PEtot, QUICK, Q-Chem, QMCPack, Quantum Espresso/PWscf, TeraChem*
• Active acceleration projects: CASTEP, GAMESS, Gaussian, ONETEP, Quantum Supercharger Library*, VASP & more
* (green on the slide): >90% of the workload is on GPU
MD vs. QC on GPUs
Calculations
• Molecular Dynamics: Simulates atomic positions over time; chemical-biological or chemical-material
• Quantum Chemistry: Properties (electronic properties, ground state, excitation, spectra); examples: MO, PW, DFT, semi-emp
Forces
• Molecular Dynamics: Simple empirical formulas; no bond rearrangements
• Quantum Chemistry: Electron wave function; bond rearrangements allowed
Atom count
• Molecular Dynamics: Millions
• Quantum Chemistry: Thousands
Solvent
• Molecular Dynamics: Solvent included without difficulty
• Quantum Chemistry: Solvent optional; classical QM/MM or implicit methods
Numeric precision
• Molecular Dynamics: Primarily FP32
• Quantum Chemistry: Primarily FP64
Software acceleration
• Molecular Dynamics: CUDA (cuFFT)
• Quantum Chemistry: CUDA (cuBLAS, cuFFT); solvers (cuTensor, Eigen); OpenACC
NVIDIA GPUs
• Molecular Dynamics: Quadro for workstations, Tesla for data center
• Quantum Chemistry: Tesla for data center
Error correction (ECC)
• Molecular Dynamics: Not required
• Quantum Chemistry: Required
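To make "simulates atomic positions over time" with "simple empirical formulas" concrete, here is a minimal sketch of one common MD integration scheme, velocity Verlet, applied to two atoms joined by a harmonic bond in one dimension. Everything in it is an illustrative assumption rather than something taken from the slides: the function names, the reduced units, and the values of the spring constant `k`, rest length `r0`, masses, and time step. It also runs in Python's double-precision floats, whereas the table above notes that production GPU MD codes work primarily in FP32.

```python
import math


def bond_forces(x, k=100.0, r0=1.0):
    """Forces on two atoms joined by a 1D harmonic bond, U = 0.5*k*(|r| - r0)**2."""
    r = x[1] - x[0]  # signed separation between the two atoms
    pull = k * (abs(r) - r0) * math.copysign(1.0, r)
    return [pull, -pull]  # equal and opposite (Newton's third law)


def total_energy(x, v, m, k=100.0, r0=1.0):
    """Kinetic plus bond potential energy, in the same reduced units."""
    kinetic = sum(0.5 * mi * vi * vi for mi, vi in zip(m, v))
    potential = 0.5 * k * (abs(x[1] - x[0]) - r0) ** 2
    return kinetic + potential


def velocity_verlet(x, v, m, dt, n_steps):
    """Advance positions and velocities by n_steps of size dt."""
    x, v = list(x), list(v)
    f = bond_forces(x)
    for _ in range(n_steps):
        # Half-step velocity update with the current forces.
        v = [vi + 0.5 * dt * fi / mi for vi, fi, mi in zip(v, f, m)]
        # Full-step position update.
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        # Recompute forces at the new positions, then finish the velocity update.
        f = bond_forces(x)
        v = [vi + 0.5 * dt * fi / mi for vi, fi, mi in zip(v, f, m)]
    return x, v


# Hypothetical starting state: bond stretched to 1.2 (rest length 1.0), atoms at rest.
masses = [1.0, 1.0]
x0, v0 = [0.0, 1.2], [0.0, 0.0]
x1, v1 = velocity_verlet(x0, v0, masses, dt=0.001, n_steps=1000)
print("energy before:", total_energy(x0, v0, masses))
print("energy after: ", total_energy(x1, v1, masses))
```

The two printed energies should agree closely, which is the usual sanity check for an MD integrator. Real force fields add angle, torsion, and non-bonded terms, and the non-bonded part is where most of the GPU time goes.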
GPU-Accelerated Molecular Dynamics Apps
Performance Slides Available
• ACEMD
• AMBER/GTI
• Chameleon
• CHARMM
• DESMOND/FEP
• ESPResSO
• Folding@Home
• Genesis
• GPUGrid.net
• GROMACS
• HALMD
• HOOMD-Blue
• HTMD
• LAMMPS
• mdcore
• MELD
• NAMD
• OpenMM
• PolyFTS
GPU performance is compared against dual-socket multi-core x86 CPUs.
MD Applications GPU-Accelerated Computing
Turbocharge your research!
• Speedup of 3X-8X (average) compared to CPU only in all tests
• The majority of compute-intensive classical MD work has been ported to GPUs
• Large performance boost and improved TCO for compute infrastructure
• Tesla GPUs are more energy efficient: <50% of the energy of CPU-only computing
• GPUs scale well within a node and/or over multiple nodes
• Tesla V100 is the highest-performance GPU
Try GPU-accelerated MD apps for free: nvidia.com/GPUTestDrive
AmberMD 18.10-AT_18.12 March 2019
AmberMD 18.10_AT_18.12 - PME-Cellulose (Tesla V100-SXM2-32GB)
[Figure: throughput (ns/day) and speedup over dual CPU node (X) for PME-Cellulose_NPT 2fs and PME-Cellulose_NVE 2fs, with 1X, 2X, and 4X V100 per run.]
• System: Cellulose, 408,609 atoms
• Blue node: Dual Intel Xeon Gold 6140 (Skylake) CPUs (1.0X baseline)
• Green nodes: Dual Intel Xeon Gold 6140 (Skylake) CPUs + Tesla V100 SXM2 (32GB) GPUs
• GPU values (ns/day): 48.13, 57.68, 58.54, 63.19, 71.11, 78.55
• Speedup labels: 15.4X, 18.0X, 18.5X, 20.3X, 21.9X, 24.2X
AmberMD 18.10_AT_18.12 - PME-FactorIX (Tesla V100-SXM2-32GB)
[Figure: throughput (ns/day) and speedup over dual CPU node (X) for PME-FactorIX_NPT 2fs and PME-FactorIX_NVE 2fs, with 1X, 2X, and 4X V100 per run.]
• System: Factor IX, 90,906 atoms
• Blue node: Dual Intel Xeon Gold 6140 (Skylake) CPUs (1.0X baseline)
• Green nodes: Dual Intel Xeon Gold 6140 (Skylake) CPUs + Tesla V100 SXM2 (32GB) GPUs
• GPU values (ns/day): 207.66, 236.39, 262.56, 268.08, 290.8, 326.85
• Speedup labels: 13.8X, 15.7X, 16.9X, 17.8X, 18.7X, 21.0X
AmberMD 18.10_AT_18.12 - PME-JAC (Tesla V100-SXM2-32GB)
[Figure: throughput (ns/day) and speedup over dual CPU node (X) for PME-JAC_NPT 2fs and PME-JAC_NVE 2fs, with 1X, 2X, and 4X V100 per run.]
• System: DHFR, 23,558 atoms
• Blue node: Dual Intel Xeon Gold 6140 (Skylake) CPUs (1.0X baseline)
• Green nodes: Dual Intel Xeon Gold 6140 (Skylake) CPUs + Tesla V100 SXM2 (32GB) GPUs
• GPU values (ns/day): 506.36, 522.88, 571.21, 591.28, 622.91, 687.83
• Speedup labels: 9.3X, 9.6X, 10.4X, 10.6X, 11.1X, 12.3X
AmberMD 18.10_AT_18.12 - PME-STMV_NPT (Tesla V100-SXM2-32GB)
[Figure: throughput (ns/day) and speedup over dual CPU node (X) for PME-STMV_NPT 4fs, with 1X, 2X, and 4X V100.]
• System: Satellite Tobacco Mosaic Virus, 1,067,095 atoms
• Blue node: Dual Intel Xeon Gold 6140 (Skylake) CPUs (1.0X baseline)
• Green nodes: Dual Intel Xeon Gold 6140 (Skylake) CPUs + Tesla V100 SXM2 (32GB) GPUs
• GPU values (ns/day): 17.02, 19.94, 20.83
• Speedup labels: 17.9X, 21.0X, 21.9X
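The ns/day figures can be turned into more tangible quantities with simple unit arithmetic. The sketch below converts the lowest and highest STMV values on this slide (17.02 and 20.83 ns/day, run with the 4 fs time step named in the benchmark label) into integration steps per second and wall-clock days per microsecond of simulated time. The function names are illustrative, and the slide does not state these derived numbers itself.

```python
SECONDS_PER_DAY = 86_400
FS_PER_NS = 1_000_000
NS_PER_MICROSECOND = 1_000


def steps_per_second(ns_per_day, timestep_fs):
    """Integration steps completed per wall-clock second."""
    steps_per_day = ns_per_day * FS_PER_NS / timestep_fs
    return steps_per_day / SECONDS_PER_DAY


def days_per_microsecond(ns_per_day):
    """Wall-clock days needed to simulate one microsecond."""
    return NS_PER_MICROSECOND / ns_per_day


# Lowest and highest GPU throughput values from the STMV slide.
for ns_per_day in (17.02, 20.83):
    print(
        f"{ns_per_day:.2f} ns/day at 4 fs: "
        f"{steps_per_second(ns_per_day, 4.0):.1f} steps/s, "
        f"{days_per_microsecond(ns_per_day):.1f} days per microsecond"
    )
```

For a system of about a million atoms, this puts a microsecond of simulated time at roughly seven to eight weeks of wall-clock time on the GPU nodes.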
AmberMD 18.10_AT_18.12 - P100 vs V100
[Figure: throughput (ns/day) and speedup over dual CPU node (X) for 1X, 2X, and 4X Tesla P100 and Tesla V100 GPUs.]
• All benchmarks compared as set: Cellulose, FactorIX, JAC, STMV
• Blue node: Dual Intel Xeon Gold 6140 (Skylake) CPUs (1.0X baseline)
• Green nodes: Dual Intel Xeon Gold 6140 (Skylake) CPUs + Tesla P100 SXM2 (16GB) GPUs or Tesla V100 SXM2 (32GB) GPUs
• P100 values (ns/day): 30.22, 37.17, 39.99 (speedup labels 9.7X, 11.9X, 12.8X)
• V100 values (ns/day): 48.13, 57.68, 63.19 (speedup labels 15.4X, 18.5X, 20.3X)
• Other value on the chart (shown twice): 21.24
AmberMD 18.10_AT_18.12 - PME-Cellulose (Tesla T4)
[Figure: throughput (ns/day) and speedup over dual CPU node (X) for PME-Cellulose_NPT 2fs and PME-Cellulose_NVE 2fs, with 1X, 2X, and 4X T4 per run.]
• System: Cellulose, 408,609 atoms
• Blue node: Dual Intel Xeon Gold 6140 (Skylake) CPUs (1.0X baseline)
• Green nodes: Dual Intel Xeon Gold 6140 (Skylake) CPUs + Tesla T4 PCIe (16GB) GPUs
• GPU values (ns/day): 16.0, 17.1, 21.8, 22.7, 23.9, 24.9
• Speedup labels: 5.1X, 5.3X, 7.0X, 7.3X, 7.4X, 7.7X
AmberMD 18.10_AT_18.12 - PME-FactorIX (Tesla T4)
[Figure: throughput (ns/day) and speedup over dual CPU node (X) for PME-FactorIX_NPT 2fs and PME-FactorIX_NVE 2fs, with 1X, 2X, and 4X T4 per run.]
• System: Factor IX, 90,906 atoms
• Blue node: Dual Intel Xeon Gold 6140 (Skylake) CPUs (1.0X baseline)
• Green nodes: Dual Intel Xeon Gold 6140 (Skylake) CPUs + Tesla T4 PCIe (16GB) GPUs
• GPU values (ns/day): 79.6, 85.0, 102.3, 112.5, 112.6, 123.8
• Speedup labels: 5.3X, 5.5X, 6.8X, 7.2X, 7.5X, 8.0X
AmberMD 18.10_AT_18.12 - PME-JAC (Tesla T4)
[Figure: throughput (ns/day) and speedup over dual CPU node (X) for PME-JAC_NPT 2fs and PME-JAC_NVE 2fs, with 1X, 2X, and 4X T4 per run.]
• System: DHFR, 23,558 atoms
• Blue node: Dual Intel Xeon Gold 6140 (Skylake) CPUs (1.0X baseline)
• Green nodes: Dual Intel Xeon Gold 6140 (Skylake) CPUs + Tesla T4 PCIe (16GB) GPUs
• GPU values (ns/day): 262.2, 285.4, 301.8, 331.8, 336.3, 372.8
• Speedup labels: 4.8X, 5.1X, 5.5X, 6.0X, 6.1X, 6.7X
AmberMD 18.10_AT_18.12 - PME-STMV_NPT (Tesla T4)
[Figure: throughput (ns/day) and speedup over dual CPU node (X) for PME-STMV_NPT 4fs, with 1X, 2X, and 4X T4.]
• System: Satellite Tobacco Mosaic Virus, 1,067,095 atoms
• Blue node: Dual Intel Xeon Gold 6140 (Skylake) CPUs (1.0X baseline)
• Green nodes: Dual Intel Xeon Gold 6140 (Skylake) CPUs + Tesla T4 PCIe (16GB) GPUs
• GPU values (ns/day): 10.7, 14.3, 15.0
• Speedup labels: 5.9X, 7.8X, 8.2X
AmberMD recommended usage
• Motherboard and CPU: Dual-socket with server x86-64 CPU
• System memory: >=16GB
• GPUs: Tesla V100 SXM2
• GPUs per socket: 1 to 8
• GPUs per task: 1 to 4 (case dependent)
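One way to act on the "GPUs per task" guidance is to run several independent single-GPU simulations side by side on a multi-GPU node. The sketch below only builds the command lines and environment for such a layout; it does not launch anything. It assumes Amber's GPU engine `pmemd.cuda` with its standard `-O`, `-i`, `-p`, `-c`, `-o`, and `-r` options, and pins each job to one device through the CUDA environment variable `CUDA_VISIBLE_DEVICES`. The helper name and all file names are hypothetical placeholders, and none of this comes from the slides themselves.

```python
import shlex


def build_single_gpu_jobs(gpu_ids, mdin="mdin", prmtop="prmtop", inpcrd="inpcrd"):
    """Return (env, argv) pairs, one independent pmemd.cuda run per GPU.

    File names are hypothetical placeholders; adjust them to the real inputs.
    """
    jobs = []
    for gpu in gpu_ids:
        # Restrict this process to a single device.
        env = {"CUDA_VISIBLE_DEVICES": str(gpu)}
        argv = [
            "pmemd.cuda",
            "-O",                        # overwrite existing output files
            "-i", mdin,                  # MD control input
            "-p", prmtop,                # topology / parameters
            "-c", inpcrd,                # starting coordinates
            "-o", f"mdout.gpu{gpu}",     # per-GPU log so runs do not collide
            "-r", f"restrt.gpu{gpu}",    # per-GPU restart file
        ]
        jobs.append((env, argv))
    return jobs


for env, argv in build_single_gpu_jobs([0, 1, 2, 3]):
    print(f"CUDA_VISIBLE_DEVICES={env['CUDA_VISIBLE_DEVICES']} {shlex.join(argv)}")
```

Whether one GPU per task or several is the better choice depends on the system being simulated, which is what the "case dependent" note in the table points to.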
GROMACS 2019.1 March 2019
GROMACS 2019.1 - ADH Dodec (Tesla V100-SXM2-32GB)
[Figure: throughput (ns/day) and speedup over dual CPU node (X) for ADH Dodec (h-bond), with 1X, 2X, and 4X V100.]
• System: ADH, 134,000 atoms
• Blue node: Dual Intel Xeon Gold 6140 (Skylake) CPUs, 53.7 ns/day (1.0X baseline)
• Green nodes: Dual Intel Xeon Gold 6140 (Skylake) CPUs + Tesla V100 SXM2 (32GB) GPUs
• GPU values (ns/day): 160.21, 184.67, 193.52
• Speedup labels: 3.0X, 3.4X, 3.6X
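The speedup labels on these charts are the ratio of GPU-node throughput to dual-CPU-node throughput. As a quick check, the sketch below recomputes them from the ADH Dodec numbers on this slide (53.7 ns/day for the CPU node; 160.21, 184.67, and 193.52 ns/day for the GPU nodes). The function and variable names are illustrative.

```python
def speedup(gpu_ns_per_day, cpu_ns_per_day):
    """Speedup over the dual-CPU node, as a plain ratio of throughputs."""
    return gpu_ns_per_day / cpu_ns_per_day


CPU_BASELINE_NS_PER_DAY = 53.7
GPU_NS_PER_DAY = [160.21, 184.67, 193.52]

for value in GPU_NS_PER_DAY:
    ratio = speedup(value, CPU_BASELINE_NS_PER_DAY)
    print(f"{value:.2f} ns/day -> {ratio:.1f}X")
# Prints 3.0X, 3.4X, and 3.6X, matching the labels on the chart.
```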