Molecular Dynamics (MD) on GPUs, Feb. 2, 2017

Accelerating Discoveries: Using a supercomputer powered by the Tesla Platform with over 3,000 Tesla accelerators, University of Illinois scientists performed the first all-atom simulation of the HIV


  1. PME-Cellulose_NPT on P100s SXM2. Running AMBER version 16.3. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 2.35; 1 node + 1x P100 SXM2, 23.37 (9.9X); 1 node + 2x P100 SXM2, 32.22 (13.7X); 1 node + 4x P100 SXM2, 36.65 (15.6X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs; 1x P100 SXM2 is paired with a single E5-2698 v4.

  2. PME-Cellulose_NVE on K80s. Running AMBER version 16.3. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 2.47; 1 node + 1x K80, 11.85 (4.8X); 1 node + 2x K80, 16.53 (6.7X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: the same CPUs + Tesla K80 (autoboost) GPUs; 1x K80 is paired with a single E5-2699 v4.

  3. PME-Cellulose_NVE on P100s PCIe. Running AMBER version 16.3. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 2.47; 1 node + 1x P100 PCIe (16GB), 23.34 (9.4X); 1 node + 2x P100 PCIe (16GB), 32.55 (13.2X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: the same CPUs + Tesla P100 PCIe (16GB) GPUs; 1x P100 PCIe is paired with a single E5-2699 v4.

  4. PME-Cellulose_NVE on P100s SXM2. Running AMBER version 16.3. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 2.47; 1 node + 1x P100 SXM2, 24.94 (10.1X); 1 node + 2x P100 SXM2, 35.16 (14.2X); 1 node + 4x P100 SXM2, 40.88 (16.6X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs; 1x P100 SXM2 is paired with a single E5-2698 v4.

  5. PME-FactorIX_NPT on K80s. Running AMBER version 16.3. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 11.43; 1 node + 1x K80, 48.54 (4.2X); 1 node + 2x K80, 66.68 (5.8X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: the same CPUs + Tesla K80 (autoboost) GPUs; 1x K80 is paired with a single E5-2699 v4.

  6. PME-FactorIX_NPT on P100s PCIe. Running AMBER version 16.3. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 11.43; 1 node + 1x P100 PCIe (16GB), 98.77 (8.6X); 1 node + 2x P100 PCIe (16GB), 132.86 (11.6X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: the same CPUs + Tesla P100 PCIe (16GB) GPUs; 1x P100 PCIe is paired with a single E5-2699 v4.

  7. PME-FactorIX_NPT on P100s SXM2. Running AMBER version 16.3. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 11.43; 1 node + 1x P100 SXM2, 106.25 (9.3X); 1 node + 2x P100 SXM2, 144.11 (12.6X); 1 node + 4x P100 SXM2, 159.80 (14.0X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs; 1x P100 SXM2 is paired with a single E5-2698 v4.

  8. PME-FactorIX_NVE on K80s. Running AMBER version 16.3. ns/day (speedup label as shown on the slide): 1 Broadwell node, 11.98; 1 node + 1x K80, 51.14 (5.4X); 1 node + 2x K80, 71.49 (6.0X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: the same CPUs + Tesla K80 (autoboost) GPUs; 1x K80 is paired with a single E5-2699 v4.

  9. PME-FactorIX_NVE on P100s PCIe. Running AMBER version 16.3. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 11.98; 1 node + 1x P100 PCIe (16GB), 105.86 (8.8X); 1 node + 2x P100 PCIe (16GB), 145.83 (12.2X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: the same CPUs + Tesla P100 PCIe (16GB) GPUs; 1x P100 PCIe is paired with a single E5-2699 v4.

  10. PME-FactorIX_NVE on P100s SXM2. Running AMBER version 16.3. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 11.98; 1 node + 1x P100 SXM2, 114.88 (9.6X); 1 node + 2x P100 SXM2, 159.24 (13.3X); 1 node + 4x P100 SXM2, 178.02 (14.9X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs; 1x P100 SXM2 is paired with a single E5-2698 v4.

  11. PME-JAC_NPT on K80s. Running AMBER version 16.3. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 45.89; 1 node + 1x K80, 162.09 (3.5X); 1 node + 2x K80, 216.78 (4.7X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: the same CPUs + Tesla K80 (autoboost) GPUs; 1x K80 is paired with a single E5-2699 v4.

  12. PME-JAC_NPT on P100s PCIe. Running AMBER version 16.3. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 45.89; 1 node + 1x P100 PCIe (16GB), 283.60 (6.2X); 1 node + 2x P100 PCIe (16GB), 327.69 (7.1X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: the same CPUs + Tesla P100 PCIe (16GB) GPUs; 1x P100 PCIe is paired with a single E5-2699 v4.

  13. PME-JAC_NPT on P100s SXM2. Running AMBER version 16.3. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 45.89; 1 node + 1x P100 SXM2, 310.52 (6.8X); 1 node + 2x P100 SXM2, 360.64 (7.9X); 1 node + 4x P100 SXM2, 423.09 (9.2X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs; 1x P100 SXM2 is paired with a single E5-2698 v4.

  14. PME-JAC_NVE on K80s. Running AMBER version 16.3. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 47.90; 1 node + 1x K80, 173.20 (3.6X); 1 node + 2x K80, 234.99 (4.9X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: the same CPUs + Tesla K80 (autoboost) GPUs; 1x K80 is paired with a single E5-2699 v4.

  15. PME-JAC_NVE on P100s PCIe. Running AMBER version 16.3. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 47.90; 1 node + 1x P100 PCIe (16GB), 308.46 (6.4X); 1 node + 2x P100 PCIe (16GB), 363.79 (7.6X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: the same CPUs + Tesla P100 PCIe (16GB) GPUs; 1x P100 PCIe is paired with a single E5-2699 v4.

  16. PME-JAC_NVE on P100s SXM2. Running AMBER version 16.3. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 47.90; 1 node + 1x P100 SXM2, 339.81 (7.1X); 1 node + 2x P100 SXM2, 402.18 (8.4X); 1 node + 4x P100 SXM2, 473.10 (9.9X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs; 1x P100 SXM2 is paired with a single E5-2698 v4.

  17. GB-Myoglobin on K80s. Running AMBER version 16.3. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 28.86; 1 node + 1x K80, 288.47 (10.0X); 1 node + 2x K80, 339.45 (11.8X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: the same CPUs + Tesla K80 (autoboost) GPUs; 1x K80 is paired with a single E5-2699 v4.

  18. GB-Myoglobin on P100s PCIe. Running AMBER version 16.3. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 28.86; 1 node + 1x P100 PCIe (16GB), 483.37 (16.7X); 1 node + 4x P100 PCIe (16GB), 561.94 (19.5X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: the same CPUs + Tesla P100 PCIe (16GB) GPUs; 1x P100 PCIe is paired with a single E5-2699 v4.

  19. GB-Myoglobin on P100s SXM2. Running AMBER version 16.3. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 28.86; 1 node + 1x P100 SXM2, 534.28 (18.5X); 1 node + 4x P100 SXM2, 639.37 (22.2X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs; 1x P100 SXM2 is paired with a single E5-2698 v4.

  20. GB-Nucleosome on K80s. Running AMBER version 16.3. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 0.40; 1 node + 1x K80, 5.84 (14.6X); 1 node + 2x K80, 11.31 (28.3X); 1 node + 4x K80, 20.55 (51.4X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: the same CPUs + Tesla K80 (autoboost) GPUs; 1x K80 is paired with a single E5-2699 v4.

  21. GB-Nucleosome on P100s PCIe. Running AMBER version 16.3. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 0.40; 1 node + 1x P100 PCIe (16GB), 11.91 (29.8X); 1 node + 2x P100 PCIe (16GB), 22.77 (56.9X); 1 node + 4x P100 PCIe (16GB), 39.91 (99.8X); 1 node + 8x P100 PCIe (16GB), 45.92 (114.8X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: the same CPUs + Tesla P100 PCIe (16GB) GPUs; 1x P100 PCIe is paired with a single E5-2699 v4.

  22. GB-Nucleosome on P100s SXM2. Running AMBER version 16.3. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 0.40; 1 node + 1x P100 SXM2, 13.36 (33.4X); 1 node + 2x P100 SXM2, 25.53 (63.8X); 1 node + 4x P100 SXM2, 46.29 (115.7X); 1 node + 8x P100 SXM2, 48.29 (120.7X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs; 1x P100 SXM2 is paired with a single E5-2698 v4.

  23. Rubisco-75K on K80s. Running AMBER version 16.3. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 0.01; 1 node + 1x K80, 0.35 (35.0X); 1 node + 2x K80, 0.69 (69.0X); 1 node + 4x K80, 1.34 (134.0X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: the same CPUs + Tesla K80 (autoboost) GPUs; 1x K80 is paired with a single E5-2699 v4.

  24. Rubisco-75K on P100s PCIe. Running AMBER version 16.3. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 0.01; 1 node + 1x P100 PCIe (16GB), 0.71 (71.0X); 1 node + 2x P100 PCIe (16GB), 1.40 (140.0X); 1 node + 4x P100 PCIe (16GB), 2.69 (269.0X); 1 node + 8x P100 PCIe (16GB), 4.20 (420.0X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: the same CPUs + Tesla P100 PCIe (16GB) GPUs; 1x P100 PCIe is paired with a single E5-2699 v4.

  25. Rubisco-75K on P100s SXM2. Running AMBER version 16.3. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 0.01; 1 node + 1x P100 SXM2, 0.80 (80.0X); 1 node + 2x P100 SXM2, 1.57 (157.0X); 1 node + 4x P100 SXM2, 3.06 (306.0X); 1 node + 8x P100 SXM2, 4.46 (446.0X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs; 1x P100 SXM2 is paired with a single E5-2698 v4.

  26. AMBER 14

  27. AMBER 14 vs. AMBER 12. Courtesy of Scott Le Grand, from GTC 2014 presentation.

  28. AMBER 14: large P2P and small Boost Clocks impacts. AMBER 14 (ns/day) on 4x K40, DHFR NVE PME, 2fs benchmark (CUDA 6.0, ECC off). Every configuration uses 2x Xeon E5-2690 v2@3.00GHz + 4x Tesla K40: K40@745MHz (no boost), no P2P, 125.77; K40@875MHz (boost), no P2P, 132.97; K40@745MHz (no boost), P2P, 196.68; K40@875MHz (boost), P2P, 215.18.

  29. AMBER Performance Over Time. Courtesy of Scott Le Grand, from GTC 2014 presentation.

  30. Cellulose on K40s, K80s and M6000s. PME-Cellulose_NVE, running AMBER version 14; simulated time in ns/day. Configurations on the x-axis: 1 Haswell node; 1 CPU node + 1x K40; + 0.5x K80; + 1x K80; + 1x M6000; + 2x K40; + 2x K80; + 2x M6000. Plotted values, lowest to highest (speedup vs. the CPU-only node): 1.93 (CPU-only node); 7.87 (4.1X); 8.96 (4.6X); 10.49 (5.4X); 11.76 (6.1X); 13.67 (7.1X); 14.90 (7.7X); 15.38 (8.0X). CPU-only (blue) node: Dual Intel E5-2698 v3@2.3GHz, 3.6GHz Turbo CPUs. GPU (green) nodes: the same CPUs + either NVIDIA Tesla K40@875MHz, Tesla K80@562MHz (autoboost), or Quadro M6000@987MHz GPUs.

  31. Factor IX on K40s, K80s and M6000s. PME-FactorIX_NVE, running AMBER version 14; simulated time in ns/day. Configurations on the x-axis: 1 Haswell node; 1 CPU node + 1x K40; + 0.5x K80; + 1x K80; + 1x M6000; + 2x K40; + 2x K80; + 2x M6000. Plotted values, lowest to highest (speedup labels as shown on the slide): 9.68 (CPU-only node); 33.59 (3.5X); 40.48 (4.2X); 47.80 (5.0X); 50.70 (5.2X); 60.93 (6.3X); 61.18 (6.4X); 66.89 (7.0X). CPU-only (blue) node: Dual Intel E5-2698 v3@2.3GHz, 3.6GHz Turbo CPUs. GPU (green) nodes: the same CPUs + either NVIDIA Tesla K40@875MHz, Tesla K80@562MHz (autoboost), or Quadro M6000@987MHz GPUs.

  32. JAC on K40s, K80s and M6000s. PME-JAC_NVE, running AMBER version 14; simulated time in ns/day. Configurations on the x-axis: 1 Haswell node; 1 CPU node + 1x K40; + 0.5x K80; + 1x K80; + 1x M6000; + 2x K40; + 2x K80; + 2x M6000. Plotted values, lowest to highest (speedup vs. the CPU-only node): 37.38 (CPU-only node); 121.30 (3.2X); 134.82 (3.6X); 161.53 (4.3X); 174.34 (4.7X); 200.34 (5.4X); 219.83 (5.9X); 225.34 (6.0X). CPU-only (blue) node: Dual Intel E5-2698 v3@2.3GHz, 3.6GHz Turbo CPUs. GPU (green) nodes: the same CPUs + either NVIDIA Tesla K40@875MHz, Tesla K80@562MHz (autoboost), or Quadro M6000@987MHz GPUs.

  33. Cellulose on M40s (PME-Cellulose_NPT). Running AMBER version 14. Simulated time in ns/day (speedup vs. the CPU-only node): 1 node, 1.07; 1 node + 1x M40, 10.12 (9.5X); 1 node + 2x M40, 14.40 (13.5X); 1 node + 4x M40, 15.90 (14.9X). CPU-only (blue) node: Single Intel Xeon E5-2698 v3@2.30GHz (Haswell) CPU. GPU (green) nodes: Single Intel Xeon E5-2697 v2@2.70GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs.

  34. Cellulose on M40s (PME-Cellulose_NVE). Running AMBER version 14. Simulated time in ns/day (speedup vs. the CPU-only node): 1 node, 1.07; 1 node + 1x M40, 10.50 (9.8X); 1 node + 2x M40, 15.41 (14.4X); 1 node + 4x M40, 17.13 (16.0X). CPU-only (blue) node: Single Intel Xeon E5-2698 v3@2.30GHz (Haswell) CPU. GPU (green) nodes: Single Intel Xeon E5-2697 v2@2.70GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs.

  35. FactorIX on M40s (PME-FactorIX_NPT). Running AMBER version 14. Simulated time in ns/day (speedup vs. the CPU-only node): 1 node, 5.38; 1 node + 1x M40, 46.90 (8.7X); 1 node + 2x M40, 67.37 (12.5X); 1 node + 4x M40, 72.96 (13.6X). CPU-only (blue) node: Single Intel Xeon E5-2698 v3@2.30GHz (Haswell) CPU. GPU (green) nodes: Single Intel Xeon E5-2697 v2@2.70GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs.

  36. FactorIX on M40s (PME-FactorIX_NVE). Running AMBER version 14. Simulated time in ns/day (speedup vs. the CPU-only node): 1 node, 5.47; 1 node + 1x M40, 49.33 (9.0X); 1 node + 2x M40, 73.00 (13.3X); 1 node + 4x M40, 80.04 (14.6X). CPU-only (blue) node: Single Intel Xeon E5-2698 v3@2.30GHz (Haswell) CPU. GPU (green) nodes: Single Intel Xeon E5-2697 v2@2.70GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs.

  37. JAC on M40s (PME-JAC_NPT). Running AMBER version 14. Simulated time in ns/day (speedup vs. the CPU-only node): 1 node, 20.88; 1 node + 1x M40, 149.40 (7.2X); 1 node + 2x M40, 211.97 (10.2X); 1 node + 4x M40, 226.63 (10.9X). CPU-only (blue) node: Single Intel Xeon E5-2698 v3@2.30GHz (Haswell) CPU. GPU (green) nodes: Single Intel Xeon E5-2697 v2@2.70GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs.

  38. JAC on M40s (PME-JAC_NVE). Running AMBER version 14. Simulated time in ns/day (speedup vs. the CPU-only node): 1 node, 21.11; 1 node + 1x M40, 157.68 (7.5X); 1 node + 2x M40, 230.18 (10.9X); 1 node + 4x M40, 246.15 (11.7X). CPU-only (blue) node: Single Intel Xeon E5-2698 v3@2.30GHz (Haswell) CPU. GPU (green) nodes: Single Intel Xeon E5-2697 v2@2.70GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs.

  39. Myoglobin on M40s (GB-Myoglobin). Running AMBER version 14. Simulated time in ns/day (speedup vs. the CPU-only node): 1 node, 9.83; 1 node + 1x M40, 232.20 (23.6X); 1 node + 2x M40, 300.86 (30.6X); 1 node + 4x M40, 322.09 (32.8X). CPU-only (blue) node: Single Intel Xeon E5-2698 v3@2.30GHz (Haswell) CPU. GPU (green) nodes: Single Intel Xeon E5-2697 v2@2.70GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs.

  40. Nucleosome on M40s (GB-Nucleosome). Running AMBER version 14. Simulated time in ns/day (speedup vs. the CPU-only node): 1 node, 0.13; 1 node + 1x M40, 4.67 (35.9X); 1 node + 2x M40, 9.05 (69.6X); 1 node + 4x M40, 16.11 (123.9X). CPU-only (blue) node: Single Intel Xeon E5-2698 v3@2.30GHz (Haswell) CPU. GPU (green) nodes: Single Intel Xeon E5-2697 v2@2.70GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs.

  41. TrpCage on M40s (GB-TrpCage). Running AMBER version 14. Simulated time in ns/day. Configurations on the x-axis: 1 node; 1 node + 1x M40; 1 node + 2x M40; 1 node + 4x M40. Plotted values, lowest to highest (speedup vs. the CPU-only node): 408.88 (CPU-only node); 464.63 (1.1X); 551.36 (1.3X); 831.91 (2.03X). CPU-only (blue) node: Single Intel Xeon E5-2698 v3@2.30GHz (Haswell) CPU. GPU (green) nodes: Single Intel Xeon E5-2697 v2@2.70GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs.

  42. Recommended GPU Node Configuration for AMBER Computational Chemistry. Workstation or single node configuration: # of CPU sockets: 2; cores per CPU socket: 6+ (1 CPU core drives 1 GPU); CPU speed (GHz): 2.66+; system memory per node (GB): 16; GPUs: Kepler K20, K40, K80, P100; # of GPUs per CPU socket: 1-4; GPU memory preference (GB): 6; GPU to CPU connection: PCIe 3.0 16x or higher; server storage: 2 TB; network configuration: InfiniBand QDR or better. Scale to multiple nodes with the same single node configuration.

  43. CHARMM DOMDEC-GUI July 2016

  44. CHARMM DOMDEC-GUI 465 K System Benchmark. 465 K System (Her1_HER1_membrane), running CHARMM version c40a1. ns/day, higher is better: 1 Haswell node, 0.36; 1 node + 1x K80, 2.15 (6.0X). CPU-only (blue) node: Dual Intel Xeon E5-2698 v3@2.30 GHz (Haswell) CPUs. GPU (green) nodes: the same CPUs + Tesla K80 (autoboost) GPUs. Benchmarks were done based on the STANDARD CHARMM c40a1 version by the Yang group (FSU), who is responsible for possible benchmarking error.

  45. CHARMM DOMDEC-GUI 534 K System Benchmark. 534 K System (POPC_PSPC_CHL1mixture), running CHARMM version c40a1. ns/day, higher is better: 1 Haswell node, 0.18; 1 node + 1x K80, 1.43 (8.0X). CPU-only (blue) node: Dual Intel Xeon E5-2698 v3@2.30 GHz (Haswell) CPUs. GPU (green) nodes: the same CPUs + Tesla K80 (autoboost) GPUs. Benchmarks were done based on the STANDARD CHARMM c40a1 version by the Yang group (FSU), who is responsible for possible benchmarking error.

  46. CHARMM DOMDEC-GUI 20 K System Benchmark. 20 K System (Crambin), running CHARMM version c40a1. ns/day, higher is better: 1 Haswell node, 16.00; 1 node + 1x M40, 59.68 (3.7X). CPU-only (blue) node: Dual Intel Xeon E5-2698 v3@2.30 GHz (Haswell) CPUs. GPU (green) nodes: the same CPUs + Tesla M40 GPUs. Benchmarks were done based on the STANDARD CHARMM c40a1 version by the Yang group (FSU), who is responsible for possible benchmarking error.

  47. CHARMM DOMDEC-GUI 61 K System Benchmark. 61 K System (GlnBP), running CHARMM version c40a1. ns/day, higher is better: 1 Haswell node, 3.90; 1 node + 1x M40, 25.08 (6.4X). CPU-only (blue) node: Dual Intel Xeon E5-2698 v3@2.30 GHz (Haswell) CPUs. GPU (green) nodes: the same CPUs + Tesla M40 GPUs. Benchmarks were done based on the STANDARD CHARMM c40a1 version by the Yang group (FSU), who is responsible for possible benchmarking error.

  48. CHARMM DOMDEC-GUI 465 K System Benchmark. 465 K System (Her1_HER1_membrane), running CHARMM version c40a1. ns/day, higher is better: 1 Haswell node, 0.36; 1 node + 1x M40, 2.27 (6.3X). CPU-only (blue) node: Dual Intel Xeon E5-2698 v3@2.30 GHz (Haswell) CPUs. GPU (green) nodes: the same CPUs + Tesla M40 GPUs. Benchmarks were done based on the STANDARD CHARMM c40a1 version by the Yang group (FSU), who is responsible for possible benchmarking error.

  49. GROMACS 2016 October 2016

  50. Erik Lindahl (GROMACS developer) video

  51. Water 1.5M on K80s. Running GROMACS version 2016. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 2.79; 1 node + 2x K80, 5.22 (1.9X); 1 node + 4x K80, 6.14 (2.2X). CPU-only (blue) node: Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: the same CPUs + Tesla K80 (autoboost) GPUs. NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.

  52. Water 3M on K80s. Running GROMACS version 2016. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 1.32; 1 node + 2x K80, 2.66 (2.0X); 1 node + 4x K80, 3.05 (2.3X). CPU-only (blue) node: Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: the same CPUs + Tesla K80 (autoboost) GPUs.

  53. Water 1.5M on M40s. Running GROMACS version 2016. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 2.79; 1 node + 2x M40, 6.15 (2.2X); 1 node + 4x M40, 7.60 (2.7X). CPU-only (blue) node: Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: the same CPUs + Tesla M40 (autoboost) GPUs.

  54. Water 3M on M40s. Running GROMACS version 2016. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 1.32; 1 node + 2x M40, 2.97 (2.3X); 1 node + 4x M40, 3.94 (3.0X). CPU-only (blue) node: Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: the same CPUs + Tesla M40 (autoboost) GPUs.

  55. Water 1.5M on P40s. Running GROMACS version 2016. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 2.79; 1 node + 2x P40, 6.60 (2.4X); 1 node + 4x P40, 8.07 (2.9X). CPU-only (blue) node: Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: the same CPUs + Tesla P40 GPUs.

  56. Water 3M on P40s. Running GROMACS version 2016. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 1.32; 1 node + 2x P40, 3.36 (2.5X); 1 node + 4x P40, 4.19 (3.2X). CPU-only (blue) node: Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: the same CPUs + Tesla P40 GPUs.

  57. Water 1.5M on P100 PCIes. Running GROMACS version 2016. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 2.79; 1 node + 2x P100 PCIe (16GB), 6.34 (2.3X); 1 node + 4x P100 PCIe (16GB), 7.11 (2.5X). CPU-only (blue) node: Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: the same CPUs + Tesla P100 PCIe (16GB) GPUs.

  58. Water 3M on P100 PCIes. Running GROMACS version 2016. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 1.32; 1 node + 2x P100 PCIe (16GB), 3.16 (2.4X); 1 node + 4x P100 PCIe (16GB), 3.43 (2.6X). CPU-only (blue) node: Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: the same CPUs + Tesla P100 PCIe (16GB) GPUs.

  59. GROMACS 5.1.2 February 2017

  60. Water 1.5M on K80s. Running GROMACS version 5.1.2. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 3.04; 1 node + 1x K80, 3.49 (1.1X); 1 node + 2x K80, 5.75 (1.9X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: the same CPUs + Tesla K80 (autoboost) GPUs; 1x K80 is paired with a single E5-2699 v4.

  61. Water 1.5M on P100s PCIe. Running GROMACS version 5.1.2. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 3.04; 1 node + 1x P100 PCIe (16GB), 4.39 (1.4X); 1 node + 2x P100 PCIe (16GB), 6.96 (2.3X); 1 node + 4x P100 PCIe (16GB), 7.21 (2.4X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: the same CPUs + Tesla P100 PCIe (16GB) GPUs; 1x P100 PCIe is paired with a single E5-2699 v4.

  62. Water 1.5M on P100s SXM2. Running GROMACS version 5.1.2. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 3.04; 1 node + 1x P100 SXM2, 4.11 (1.4X); 1 node + 2x P100 SXM2, 6.70 (2.2X); 1 node + 4x P100 SXM2, 7.18 (2.4X); 1 node + 8x P100 SXM2, 7.88 (2.6X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs; 1x P100 SXM2 is paired with a single E5-2698 v4.

  63. Water 3M on K80s. Running GROMACS version 5.1.2. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 1.38; 1 node + 1x K80, 1.59 (1.2X); 1 node + 2x K80, 2.98 (2.2X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: the same CPUs + Tesla K80 (autoboost) GPUs; 1x K80 is paired with a single E5-2699 v4.

  64. Water 3M on P100s PCIe. Running GROMACS version 5.1.2. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 1.38; 1 node + 1x P100 PCIe (16GB), 1.96 (1.4X); 1 node + 2x P100 PCIe (16GB), 3.43 (2.5X); 1 node + 4x P100 PCIe (16GB), 3.80 (2.8X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: the same CPUs + Tesla P100 PCIe (16GB) GPUs; 1x P100 PCIe is paired with a single E5-2699 v4.

  65. Water 3M on P100s SXM2. Running GROMACS version 5.1.2. ns/day (speedup vs. the CPU-only node): 1 Broadwell node, 1.38; 1 node + 1x P100 SXM2, 1.84 (1.3X); 1 node + 2x P100 SXM2, 3.50 (2.5X); 1 node + 4x P100 SXM2, 3.82 (2.8X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs; 1x P100 SXM2 is paired with a single E5-2698 v4.

  66. Recommended GPU Node Configuration for GROMACS Computational Chemistry. Workstation or single node configuration: # of CPU sockets: 2; cores per CPU socket: 6+; CPU speed (GHz): 2.66+; system memory per socket (GB): 32; GPUs: Kepler K20, K40, K80; # of GPUs per CPU socket: 1x (Kepler GPUs need fast Sandy Bridge or Ivy Bridge, or high-end AMD Opterons); GPU memory preference (GB): 6; GPU to CPU connection: PCIe 3.0 or higher; server storage: 500 GB or higher; network configuration: Gemini, InfiniBand.

  67. HOOMD-Blue 1.3.3 February 2017

  68. lj-liquid on K80s. Running HOOMD-Blue version 1.3.3. Average timesteps/sec (speedup vs. the CPU-only node): 1 Broadwell node, 326.52; 1 node + 1x K80, 1324.84 (4.1X); 1 node + 2x K80, 1594.37 (4.9X); 1 node + 4x K80, 1942.12 (5.9X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: the same CPUs + Tesla K80 (autoboost) GPUs; 1x K80 is paired with a single E5-2699 v4.

  69. lj-liquid on P100s PCIe. Running HOOMD-Blue version 1.3.3. Average timesteps/sec. Configurations on the x-axis: 1 Broadwell node; 1 node + 1x P100 PCIe (16GB); 1 node + 8x P100 PCIe (16GB). Plotted values, lowest to highest (speedup vs. the CPU-only node): 326.52 (CPU-only node); 2912.66 (8.9X); 3217.68 (9.9X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: the same CPUs + Tesla P100 PCIe (16GB) GPUs; 1x P100 PCIe is paired with a single E5-2699 v4.

  70. lj-liquid on P100s SXM2. Running HOOMD-Blue version 1.3.3. Average timesteps/sec. Configurations on the x-axis: 1 Broadwell node; 1 node + 1x P100 SXM2; 1 node + 8x P100 SXM2. Plotted values, lowest to highest (speedup vs. the CPU-only node): 326.52 (CPU-only node); 3129.11 (9.6X); 3397.74 (10.4X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs; 1x P100 SXM2 is paired with a single E5-2698 v4.

  71. lj_liquid_512k on K80s. Running HOOMD-Blue version 1.3.3. Average timesteps/sec (speedup vs. the CPU-only node): 1 Broadwell node, 43.43; 1 node + 1x K80, 220.10 (5.1X); 1 node + 2x K80, 334.59 (7.7X); 1 node + 4x K80, 526.47 (12.1X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: the same CPUs + Tesla K80 (autoboost) GPUs; 1x K80 is paired with a single E5-2699 v4.

  72. lj_liquid_512k on P100s PCIe. Running HOOMD-Blue version 1.3.3. Average timesteps/sec (speedup vs. the CPU-only node): 1 Broadwell node, 43.43; 1 node + 1x P100 PCIe (16GB), 398.12 (9.2X); 1 node + 2x P100 PCIe (16GB), 534.54 (12.3X); 1 node + 4x P100 PCIe (16GB), 770.18 (17.7X); 1 node + 8x P100 PCIe (16GB), 1045.50 (24.1X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: the same CPUs + Tesla P100 PCIe (16GB) GPUs; 1x P100 PCIe is paired with a single E5-2699 v4.

  73. lj_liquid_512k on P100s SXM2. Running HOOMD-Blue version 1.3.3. Average timesteps/sec (speedup vs. the CPU-only node): 1 Broadwell node, 43.43; 1 node + 1x P100 SXM2, 443.74 (10.2X); 1 node + 2x P100 SXM2, 568.51 (13.1X); 1 node + 4x P100 SXM2, 793.36 (18.3X); 1 node + 8x P100 SXM2, 1119.76 (25.8X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs; 1x P100 SXM2 is paired with a single E5-2698 v4.

  74. lj_liquid_1m on K80s. Running HOOMD-Blue version 1.3.3. Average timesteps/sec (speedup vs. the CPU-only node): 1 Broadwell node, 22.07; 1 node + 1x K80, 109.54 (5.0X); 1 node + 2x K80, 181.42 (8.2X); 1 node + 4x K80, 303.00 (13.7X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: the same CPUs + Tesla K80 (autoboost) GPUs; 1x K80 is paired with a single E5-2699 v4.

  75. lj_liquid_1m on P100s PCIe. Running HOOMD-Blue version 1.3.3. Average timesteps/sec (speedup vs. the CPU-only node): 1 Broadwell node, 22.07; 1 node + 1x P100 PCIe (16GB), 204.67 (9.3X); 1 node + 2x P100 PCIe (16GB), 294.88 (13.4X); 1 node + 4x P100 PCIe (16GB), 465.58 (21.1X); 1 node + 8x P100 PCIe (16GB), 672.46 (30.5X). CPU-only (blue) node: Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs. GPU (green) nodes: the same CPUs + Tesla P100 PCIe (16GB) GPUs; 1x P100 PCIe is paired with a single E5-2699 v4.
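
The "X" labels on the charts above appear to be the ratio of each GPU configuration's throughput to the CPU-only node's throughput. The short Python sketch below reproduces those labels for the lj_liquid_1m on P100s PCIe slide and also derives a multi-GPU scaling efficiency, a figure the slides themselves do not report. The function and variable names are illustrative only, and the pairing of throughput values with GPU counts assumes the bars rise with GPU count.

```python
# Throughput in average timesteps/sec, from the lj_liquid_1m on P100s PCIe slide.
# Assumption: bars rise with GPU count, so values are paired in ascending order.
CPU_ONLY = 22.07
GPU_RUNS = {1: 204.67, 2: 294.88, 4: 465.58, 8: 672.46}  # GPUs per node -> throughput


def speedup(throughput, baseline):
    """Ratio of a configuration's throughput to the CPU-only baseline."""
    return throughput / baseline


def scaling_efficiency(runs):
    """Throughput on n GPUs divided by n times the single-GPU throughput."""
    single = runs[1]
    return {n: value / (n * single) for n, value in runs.items()}


speedups = {n: round(speedup(v, CPU_ONLY), 1) for n, v in GPU_RUNS.items()}
efficiency = scaling_efficiency(GPU_RUNS)

for n in sorted(GPU_RUNS):
    print(f"{n}x P100 PCIe: {speedups[n]}X speedup, "
          f"{efficiency[n]:.0%} scaling efficiency")
```

Under these assumptions the computed speedups match the slide labels (9.3X, 13.4X, 21.1X, 30.5X), while eight P100s reach roughly 41% of ideal linear scaling relative to a single P100.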
