PME-Cellulose_NPT on P100s SXM2
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs.
1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
ns/day (speedup over the blue node):
  1 Broadwell node: 2.35
  1 node + 1x P100 SXM2 per node: 23.37 (9.9X)
  1 node + 2x P100 SXM2 per node: 32.22 (13.7X)
  1 node + 4x P100 SXM2 per node: 36.65 (15.6X)
PME-Cellulose_NVE on K80s
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs.
1x K80 is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
ns/day (speedup over the blue node):
  1 Broadwell node: 2.47
  1 node + 1x K80 per node: 11.85 (4.8X)
  1 node + 2x K80 per node: 16.53 (6.7X)
PME-Cellulose_NVE on P100s PCIe
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs.
1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
ns/day (speedup over the blue node):
  1 Broadwell node: 2.47
  1 node + 1x P100 PCIe (16GB) per node: 23.34 (9.4X)
  1 node + 2x P100 PCIe (16GB) per node: 32.55 (13.2X)
PME-Cellulose_NVE on P100s SXM2
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs.
1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
ns/day (speedup over the blue node):
  1 Broadwell node: 2.47
  1 node + 1x P100 SXM2 per node: 24.94 (10.1X)
  1 node + 2x P100 SXM2 per node: 35.16 (14.2X)
  1 node + 4x P100 SXM2 per node: 40.88 (16.6X)
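The "X" figures on these slides appear to be the ratio of GPU-node throughput to the CPU-only node's throughput. The sketch below reproduces that arithmetic for the PME-Cellulose_NVE on P100s SXM2 slide and adds a multi-GPU scaling efficiency that the slide itself does not report. The function names are illustrative, and the mapping of values to 1x/2x/4x GPUs assumes the bars rise with GPU count.

```python
# Values (ns/day) copied from the PME-Cellulose_NVE on P100s SXM2 slide.
# Assumption: the bars increase with GPU count (1x < 2x < 4x).
CPU_NS_PER_DAY = 2.47
GPU_NS_PER_DAY = {1: 24.94, 2: 35.16, 4: 40.88}  # GPUs per node -> ns/day


def speedup(gpu_rate, cpu_rate):
    """Throughput ratio of a GPU node to the CPU-only node."""
    return gpu_rate / cpu_rate


def scaling_efficiency(rates):
    """Throughput per GPU relative to the single-GPU run (1.0 = ideal scaling)."""
    single = rates[1]
    return {n: rate / (n * single) for n, rate in rates.items()}


if __name__ == "__main__":
    for n, rate in GPU_NS_PER_DAY.items():
        print(f"{n}x P100 SXM2: {speedup(rate, CPU_NS_PER_DAY):.1f}X over CPU-only")
    for n, eff in scaling_efficiency(GPU_NS_PER_DAY).items():
        print(f"{n}x P100 SXM2: {eff:.0%} of ideal multi-GPU scaling")
```

Under these assumptions, the speedup over the CPU node keeps growing with GPU count while the per-GPU efficiency drops, which is worth weighing when choosing between one multi-GPU run and several independent single-GPU runs.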
PME-FactorIX_NPT on K80s
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs.
1x K80 is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
ns/day (speedup over the blue node):
  1 Broadwell node: 11.43
  1 node + 1x K80 per node: 48.54 (4.2X)
  1 node + 2x K80 per node: 66.68 (5.8X)
PME-FactorIX_NPT on P100s PCIe
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs.
1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
ns/day (speedup over the blue node):
  1 Broadwell node: 11.43
  1 node + 1x P100 PCIe (16GB) per node: 98.77 (8.6X)
  1 node + 2x P100 PCIe (16GB) per node: 132.86 (11.6X)
PME-FactorIX_NPT on P100s SXM2
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs.
1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
ns/day (speedup over the blue node):
  1 Broadwell node: 11.43
  1 node + 1x P100 SXM2 per node: 106.25 (9.3X)
  1 node + 2x P100 SXM2 per node: 144.11 (12.6X)
  1 node + 4x P100 SXM2 per node: 159.80 (14.0X)
PME-FactorIX_NVE on K80s
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs.
1x K80 is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
ns/day (speedup over the blue node, as printed on the slide):
  1 Broadwell node: 11.98
  1 node + 1x K80 per node: 51.14 (5.4X)
  1 node + 2x K80 per node: 71.49 (6.0X)
PME-FactorIX_NVE on P100s PCIe
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs.
1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
ns/day (speedup over the blue node):
  1 Broadwell node: 11.98
  1 node + 1x P100 PCIe (16GB) per node: 105.86 (8.8X)
  1 node + 2x P100 PCIe (16GB) per node: 145.83 (12.2X)
PME-FactorIX_NVE on P100s SXM2
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs.
1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
ns/day (speedup over the blue node):
  1 Broadwell node: 11.98
  1 node + 1x P100 SXM2 per node: 114.88 (9.6X)
  1 node + 2x P100 SXM2 per node: 159.24 (13.3X)
  1 node + 4x P100 SXM2 per node: 178.02 (14.9X)
PME-JAC_NPT on K80s
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs.
1x K80 is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
ns/day (speedup over the blue node):
  1 Broadwell node: 45.89
  1 node + 1x K80 per node: 162.09 (3.5X)
  1 node + 2x K80 per node: 216.78 (4.7X)
PME-JAC_NPT on P100s PCIe
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs.
1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
ns/day (speedup over the blue node):
  1 Broadwell node: 45.89
  1 node + 1x P100 PCIe (16GB) per node: 283.60 (6.2X)
  1 node + 2x P100 PCIe (16GB) per node: 327.69 (7.1X)
PME-JAC_NPT on P100s SXM2
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs.
1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
ns/day (speedup over the blue node):
  1 Broadwell node: 45.89
  1 node + 1x P100 SXM2 per node: 310.52 (6.8X)
  1 node + 2x P100 SXM2 per node: 360.64 (7.9X)
  1 node + 4x P100 SXM2 per node: 423.09 (9.2X)
PME-JAC_NVE on K80s
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs.
1x K80 is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
ns/day (speedup over the blue node):
  1 Broadwell node: 47.90
  1 node + 1x K80 per node: 173.20 (3.6X)
  1 node + 2x K80 per node: 234.99 (4.9X)
PME-JAC_NVE on P100s PCIe
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs.
1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
ns/day (speedup over the blue node):
  1 Broadwell node: 47.90
  1 node + 1x P100 PCIe (16GB) per node: 308.46 (6.4X)
  1 node + 2x P100 PCIe (16GB) per node: 363.79 (7.6X)
PME-JAC_NVE on P100s SXM2
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs.
1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
ns/day (speedup over the blue node):
  1 Broadwell node: 47.90
  1 node + 1x P100 SXM2 per node: 339.81 (7.1X)
  1 node + 2x P100 SXM2 per node: 402.18 (8.4X)
  1 node + 4x P100 SXM2 per node: 473.10 (9.9X)
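An ns/day figure can be turned into wall-clock time for a planned run by dividing the target simulated time by the throughput. The sketch below does this for the PME-JAC_NVE on P100s SXM2 slide. The 1000 ns (1 microsecond) target is a hypothetical example that does not come from the deck, and the function name is illustrative.

```python
# Throughput values (ns/day) copied from the PME-JAC_NVE on P100s SXM2 slide.
JAC_NVE_NS_PER_DAY = {
    "1 Broadwell node": 47.90,
    "1 node + 1x P100 SXM2": 339.81,
    "1 node + 4x P100 SXM2": 473.10,
}


def days_to_simulate(target_ns, ns_per_day):
    """Wall-clock days needed to simulate target_ns at a steady ns/day rate."""
    if ns_per_day <= 0:
        raise ValueError("ns_per_day must be positive")
    return target_ns / ns_per_day


if __name__ == "__main__":
    target_ns = 1000.0  # hypothetical 1 microsecond trajectory
    for config, rate in JAC_NVE_NS_PER_DAY.items():
        print(f"{config}: {days_to_simulate(target_ns, rate):.1f} days")
```

The estimate assumes the benchmark rate holds for the whole run, which a real workload with different output frequency or system size may not match.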
GB-Myoglobin on K80s
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs.
1x K80 is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
ns/day (speedup over the blue node):
  1 Broadwell node: 28.86
  1 node + 1x K80 per node: 288.47 (10.0X)
  1 node + 2x K80 per node: 339.45 (11.8X)
GB-Myoglobin on P100s PCIe
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs.
1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
ns/day (speedup over the blue node):
  1 Broadwell node: 28.86
  1 node + 1x P100 PCIe (16GB) per node: 483.37 (16.7X)
  1 node + 4x P100 PCIe (16GB) per node: 561.94 (19.5X)
GB-Myoglobin on P100s SXM2
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs.
1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
ns/day (speedup over the blue node):
  1 Broadwell node: 28.86
  1 node + 1x P100 SXM2 per node: 534.28 (18.5X)
  1 node + 4x P100 SXM2 per node: 639.37 (22.2X)
GB-Nucleosome on K80s
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs.
1x K80 is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
ns/day (speedup over the blue node):
  1 Broadwell node: 0.40
  1 node + 1x K80 per node: 5.84 (14.6X)
  1 node + 2x K80 per node: 11.31 (28.3X)
  1 node + 4x K80 per node: 20.55 (51.4X)
GB-Nucleosome on P100s PCIe
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs.
1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
ns/day (speedup over the blue node):
  1 Broadwell node: 0.40
  1 node + 1x P100 PCIe (16GB) per node: 11.91 (29.8X)
  1 node + 2x P100 PCIe (16GB) per node: 22.77 (56.9X)
  1 node + 4x P100 PCIe (16GB) per node: 39.91 (99.8X)
  1 node + 8x P100 PCIe (16GB) per node: 45.92 (114.8X)
GB-Nucleosome on P100s SXM2
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs.
1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
ns/day (speedup over the blue node):
  1 Broadwell node: 0.40
  1 node + 1x P100 SXM2 per node: 13.36 (33.4X)
  1 node + 2x P100 SXM2 per node: 25.53 (63.8X)
  1 node + 4x P100 SXM2 per node: 46.29 (115.7X)
  1 node + 8x P100 SXM2 per node: 48.29 (120.7X)
Rubisco-75K on K80s
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs.
1x K80 is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
ns/day (speedup over the blue node):
  1 Broadwell node: 0.01
  1 node + 1x K80 per node: 0.35 (35.0X)
  1 node + 2x K80 per node: 0.69 (69.0X)
  1 node + 4x K80 per node: 1.34 (134.0X)
Rubisco-75K on P100s PCIe
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs.
1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
ns/day (speedup over the blue node):
  1 Broadwell node: 0.01
  1 node + 1x P100 PCIe (16GB) per node: 0.71 (71.0X)
  1 node + 2x P100 PCIe (16GB) per node: 1.40 (140.0X)
  1 node + 4x P100 PCIe (16GB) per node: 2.69 (269.0X)
  1 node + 8x P100 PCIe (16GB) per node: 4.20 (420.0X)
Rubisco-75K on P100s SXM2
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs.
1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
ns/day (speedup over the blue node):
  1 Broadwell node: 0.01
  1 node + 1x P100 SXM2 per node: 0.80 (80.0X)
  1 node + 2x P100 SXM2 per node: 1.57 (157.0X)
  1 node + 4x P100 SXM2 per node: 3.06 (306.0X)
  1 node + 8x P100 SXM2 per node: 4.46 (446.0X)
AMBER 14
AMBER 14 vs. AMBER 12
Courtesy of Scott Le Grand, from GTC 2014 presentation
AMBER 14; large P2P and small Boost Clocks impacts
AMBER 14 (ns/day) on 4x K40; P2P and Boost Clocks Impact
DHFR NVE PME, 2fs Benchmark (CUDA 6.0, ECC off)
ns/day:
  2 x Xeon E5-2690 v2@3.00GHz + 4 x Tesla K40@745MHz (no P2P), No Boost: 125.77
  2 x Xeon E5-2690 v2@3.00GHz + 4 x Tesla K40@875MHz (no P2P), Boost: 132.97
  2 x Xeon E5-2690 v2@3.00GHz + 4 x Tesla K40@745MHz (P2P), No Boost: 196.68
  2 x Xeon E5-2690 v2@3.00GHz + 4 x Tesla K40@875MHz (P2P), Boost: 215.18
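The slide's claim that P2P has a large impact and boost clocks a small one can be checked from its four ns/day values. The sketch below computes the percentage gain from each factor while holding the other fixed. The dictionary keys and function name are illustrative, and 745 MHz is treated as the non-boosted clock and 875 MHz as the boosted clock.

```python
# ns/day for DHFR NVE PME on 4x K40, copied from the slide.
# Keys are (clock setting, P2P setting); 745 MHz = no boost, 875 MHz = boost.
NS_PER_DAY = {
    ("no boost", "no P2P"): 125.77,
    ("boost", "no P2P"): 132.97,
    ("no boost", "P2P"): 196.68,
    ("boost", "P2P"): 215.18,
}


def percent_gain(new, old):
    """Relative throughput improvement of new over old, in percent."""
    return (new / old - 1.0) * 100.0


if __name__ == "__main__":
    for clock in ("no boost", "boost"):
        gain = percent_gain(NS_PER_DAY[(clock, "P2P")], NS_PER_DAY[(clock, "no P2P")])
        print(f"Enabling P2P ({clock}): +{gain:.1f}%")
    for p2p in ("no P2P", "P2P"):
        gain = percent_gain(NS_PER_DAY[("boost", p2p)], NS_PER_DAY[("no boost", p2p)])
        print(f"Enabling boost ({p2p}): +{gain:.1f}%")
```

With these numbers, enabling P2P adds roughly 56 to 62 percent, while enabling boost adds roughly 6 to 9 percent.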
AMBER Performance Over Time
Courtesy of Scott Le Grand, from GTC 2014 presentation
Cellulose on K40s, K80s and M6000s
PME-Cellulose_NVE, running AMBER version 14
The blue node contains Dual Intel E5-2698 v3@2.3GHz, 3.6GHz Turbo CPUs.
The green nodes contain Dual Intel E5-2698 v3@2.3GHz, 3.6GHz Turbo CPUs + either NVIDIA Tesla K40@875MHz, Tesla K80@562MHz (autoboost), or Quadro M6000@987MHz GPUs.
Configurations shown: 1 Haswell node; 1 CPU node + 1x K40; + 0.5x K80; + 1x K80; + 1x M6000; + 2x K40; + 2x K80; + 2x M6000.
Simulated Time (ns/day):
  1 Haswell node: 1.93
  GPU nodes, highest to lowest bar (speedup over the blue node): 15.38 (8.0X), 14.90 (7.7X), 13.67 (7.1X), 11.76 (6.1X), 10.49 (5.4X), 8.96 (4.6X), 7.87 (4.1X)
Factor IX on K40s, K80s and M6000s
PME-FactorIX_NVE, running AMBER version 14
The blue node contains Dual Intel E5-2698 v3@2.3GHz, 3.6GHz Turbo CPUs.
The green nodes contain Dual Intel E5-2698 v3@2.3GHz, 3.6GHz Turbo CPUs + either NVIDIA Tesla K40@875MHz, Tesla K80@562MHz (autoboost), or Quadro M6000@987MHz GPUs.
Configurations shown: 1 Haswell node; 1 CPU node + 1x K40; + 0.5x K80; + 1x K80; + 1x M6000; + 2x K40; + 2x K80; + 2x M6000.
Simulated Time (ns/day):
  1 Haswell node: 9.68
  GPU nodes, highest to lowest bar (speedup over the blue node): 66.89 (7.0X), 61.18 (6.4X), 60.93 (6.3X), 50.70 (5.2X), 47.80 (5.0X), 40.48 (4.2X), 33.59 (3.5X)
JAC on K40s, K80s and M6000s
PME-JAC_NVE, running AMBER version 14
The blue node contains Dual Intel E5-2698 v3@2.3GHz, 3.6GHz Turbo CPUs.
The green nodes contain Dual Intel E5-2698 v3@2.3GHz, 3.6GHz Turbo CPUs + either NVIDIA Tesla K40@875MHz, Tesla K80@562MHz (autoboost), or Quadro M6000@987MHz GPUs.
Configurations shown: 1 Haswell node; 1 CPU node + 1x K40; + 0.5x K80; + 1x K80; + 1x M6000; + 2x K40; + 2x K80; + 2x M6000.
Simulated Time (ns/day):
  1 Haswell node: 37.38
  GPU nodes, highest to lowest bar (speedup over the blue node): 225.34 (6.0X), 219.83 (5.9X), 200.34 (5.4X), 174.34 (4.7X), 161.53 (4.3X), 134.82 (3.6X), 121.30 (3.2X)
Cellulose on M40s
PME - Cellulose_NPT, running AMBER version 14
The blue node contains a single Intel Xeon E5-2698 v3@2.30GHz (Haswell) CPU.
The green nodes contain a single Intel Xeon E5-2697 v2@2.70GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs.
Simulated Time (ns/day) (speedup over the blue node):
  1 Node: 1.07
  1 Node + 1x M40 per node: 10.12 (9.5X)
  1 Node + 2x M40 per node: 14.40 (13.5X)
  1 Node + 4x M40 per node: 15.90 (14.9X)
Cellulose on M40s
PME - Cellulose_NVE, running AMBER version 14
The blue node contains a single Intel Xeon E5-2698 v3@2.30GHz (Haswell) CPU.
The green nodes contain a single Intel Xeon E5-2697 v2@2.70GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs.
Simulated Time (ns/day) (speedup over the blue node):
  1 Node: 1.07
  1 Node + 1x M40 per node: 10.50 (9.8X)
  1 Node + 2x M40 per node: 15.41 (14.4X)
  1 Node + 4x M40 per node: 17.13 (16.0X)
FactorIX on M40s
PME - FactorIX_NPT, running AMBER version 14
The blue node contains a single Intel Xeon E5-2698 v3@2.30GHz (Haswell) CPU.
The green nodes contain a single Intel Xeon E5-2697 v2@2.70GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs.
Simulated Time (ns/day) (speedup over the blue node):
  1 Node: 5.38
  1 Node + 1x M40 per node: 46.90 (8.7X)
  1 Node + 2x M40 per node: 67.37 (12.5X)
  1 Node + 4x M40 per node: 72.96 (13.6X)
FactorIX on M40s
PME - FactorIX_NVE, running AMBER version 14
The blue node contains a single Intel Xeon E5-2698 v3@2.30GHz (Haswell) CPU.
The green nodes contain a single Intel Xeon E5-2697 v2@2.70GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs.
Simulated Time (ns/day) (speedup over the blue node):
  1 Node: 5.47
  1 Node + 1x M40 per node: 49.33 (9.0X)
  1 Node + 2x M40 per node: 73.00 (13.3X)
  1 Node + 4x M40 per node: 80.04 (14.6X)
JAC on M40s
PME - JAC_NPT, running AMBER version 14
The blue node contains a single Intel Xeon E5-2698 v3@2.30GHz (Haswell) CPU.
The green nodes contain a single Intel Xeon E5-2697 v2@2.70GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs.
Simulated Time (ns/day) (speedup over the blue node):
  1 Node: 20.88
  1 Node + 1x M40 per node: 149.40 (7.2X)
  1 Node + 2x M40 per node: 211.97 (10.2X)
  1 Node + 4x M40 per node: 226.63 (10.9X)
JAC on M40s
PME - JAC_NVE, running AMBER version 14
The blue node contains a single Intel Xeon E5-2698 v3@2.30GHz (Haswell) CPU.
The green nodes contain a single Intel Xeon E5-2697 v2@2.70GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs.
Simulated Time (ns/day) (speedup over the blue node):
  1 Node: 21.11
  1 Node + 1x M40 per node: 157.68 (7.5X)
  1 Node + 2x M40 per node: 230.18 (10.9X)
  1 Node + 4x M40 per node: 246.15 (11.7X)
Myoglobin on M40s
GB - Myoglobin, running AMBER version 14
The blue node contains a single Intel Xeon E5-2698 v3@2.30GHz (Haswell) CPU.
The green nodes contain a single Intel Xeon E5-2697 v2@2.70GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs.
Simulated Time (ns/day) (speedup over the blue node):
  1 Node: 9.83
  1 Node + 1x M40 per node: 232.20 (23.6X)
  1 Node + 2x M40 per node: 300.86 (30.6X)
  1 Node + 4x M40 per node: 322.09 (32.8X)
Nucleosome on M40s
GB - Nucleosome, running AMBER version 14
The blue node contains a single Intel Xeon E5-2698 v3@2.30GHz (Haswell) CPU.
The green nodes contain a single Intel Xeon E5-2697 v2@2.70GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs.
Simulated Time (ns/day) (speedup over the blue node):
  1 Node: 0.13
  1 Node + 1x M40 per node: 4.67 (35.9X)
  1 Node + 2x M40 per node: 9.05 (69.6X)
  1 Node + 4x M40 per node: 16.11 (123.9X)
TrpCage on M40s
GB - TrpCage, running AMBER version 14
The blue node contains a single Intel Xeon E5-2698 v3@2.30GHz (Haswell) CPU.
The green nodes contain a single Intel Xeon E5-2697 v2@2.70GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs.
Configurations shown: 1 Node; 1 Node + 1x M40 per node; 1 Node + 2x M40 per node; 1 Node + 4x M40 per node.
Simulated Time (ns/day):
  1 Node: 408.88
  GPU nodes, highest to lowest bar (speedup over the blue node): 831.91 (2.03X), 551.36 (1.3X), 464.63 (1.1X)
Recommended GPU Node Configuration for AMBER Computational Chemistry
Workstation or Single Node Configuration
  # of CPU sockets: 2
  Cores per CPU socket: 6+ (1 CPU core drives 1 GPU)
  CPU speed (GHz): 2.66+
  System memory per node (GB): 16
  GPUs: Kepler K20, K40, K80, P100
  # of GPUs per CPU socket: 1-4
  GPU memory preference (GB): 6
  GPU to CPU connection: PCIe 3.0 16x or higher
  Server storage: 2 TB
  Network configuration: Infiniband QDR or better
Scale to multiple nodes with same single node configuration
CHARMM DOMDEC-GUI July 2016
CHARMM DOMDEC-GUI 465 K System Benchmark
465 K System (Her1_HER1_membrane), running CHARMM version c40a1. *Higher is better
The blue node contains Dual Intel Xeon E5-2698 v3@2.30 GHz (Haswell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v3@2.30 GHz (Haswell) CPUs + Tesla K80 (autoboost) GPUs.
ns/day (speedup over the blue node):
  1 Haswell node: 0.36
  1 node + 1x K80 per node: 2.15 (6.0X)
Benchmarks were done based on the STANDARD CHARMM c40a1 version by the Yang group (FSU), who is responsible for possible benchmarking error.
CHARMM DOMDEC-GUI 534 K System Benchmark
534 K System (POPC_PSPC_CHL1mixture), running CHARMM version c40a1. *Higher is better
The blue node contains Dual Intel Xeon E5-2698 v3@2.30 GHz (Haswell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v3@2.30 GHz (Haswell) CPUs + Tesla K80 (autoboost) GPUs.
ns/day (speedup over the blue node):
  1 Haswell node: 0.18
  1 node + 1x K80 per node: 1.43 (8.0X)
Benchmarks were done based on the STANDARD CHARMM c40a1 version by the Yang group (FSU), who is responsible for possible benchmarking error.
CHARMM DOMDEC-GUI 20 K System Benchmark
20 K System (Crambin), running CHARMM version c40a1. *Higher is better
The blue node contains Dual Intel Xeon E5-2698 v3@2.30 GHz (Haswell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v3@2.30 GHz (Haswell) CPUs + Tesla M40 GPUs.
ns/day (speedup over the blue node):
  1 Haswell node: 16.00
  1 node + 1x M40 per node: 59.68 (3.7X)
Benchmarks were done based on the STANDARD CHARMM c40a1 version by the Yang group (FSU), who is responsible for possible benchmarking error.
CHARMM DOMDEC-GUI 61 K System Benchmark
61 K System (GlnBP), running CHARMM version c40a1. *Higher is better
The blue node contains Dual Intel Xeon E5-2698 v3@2.30 GHz (Haswell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v3@2.30 GHz (Haswell) CPUs + Tesla M40 GPUs.
ns/day (speedup over the blue node):
  1 Haswell node: 3.90
  1 node + 1x M40 per node: 25.08 (6.4X)
Benchmarks were done based on the STANDARD CHARMM c40a1 version by the Yang group (FSU), who is responsible for possible benchmarking error.
CHARMM DOMDEC-GUI 465 K System Benchmark
465 K System (Her1_HER1_membrane), running CHARMM version c40a1. *Higher is better
The blue node contains Dual Intel Xeon E5-2698 v3@2.30 GHz (Haswell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v3@2.30 GHz (Haswell) CPUs + Tesla M40 GPUs.
ns/day (speedup over the blue node):
  1 Haswell node: 0.36
  1 node + 1x M40 per node: 2.27 (6.3X)
Benchmarks were done based on the STANDARD CHARMM c40a1 version by the Yang group (FSU), who is responsible for possible benchmarking error.
GROMACS 2016 October 2016
Erik Lindahl (GROMACS developer) video
Water 1.5M on K80s
Running GROMACS version 2016
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs.
ns/day (speedup over the blue node):
  1 Broadwell node: 2.79
  1 node + 2x K80 per node: 5.22 (1.9X)
  1 node + 4x K80 per node: 6.14 (2.2X)
NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.
Water 3M on K80s
Running GROMACS version 2016
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs.
ns/day (speedup over the blue node):
  1 Broadwell node: 1.32
  1 node + 2x K80 per node: 2.66 (2.0X)
  1 node + 4x K80 per node: 3.05 (2.3X)
Water 1.5M on M40s
Running GROMACS version 2016
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla M40 (autoboost) GPUs.
ns/day (speedup over the blue node):
  1 Broadwell node: 2.79
  1 node + 2x M40 per node: 6.15 (2.2X)
  1 node + 4x M40 per node: 7.60 (2.7X)
Water 3M on M40s
Running GROMACS version 2016
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla M40 (autoboost) GPUs.
ns/day (speedup over the blue node):
  1 Broadwell node: 1.32
  1 node + 2x M40 per node: 2.97 (2.3X)
  1 node + 4x M40 per node: 3.94 (3.0X)
Water 1.5M on P40s
Running GROMACS version 2016
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P40 GPUs.
ns/day (speedup over the blue node):
  1 Broadwell node: 2.79
  1 node + 2x P40 per node: 6.60 (2.4X)
  1 node + 4x P40 per node: 8.07 (2.9X)
Water 3M on P40s
Running GROMACS version 2016
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P40 GPUs.
ns/day (speedup over the blue node):
  1 Broadwell node: 1.32
  1 node + 2x P40 per node: 3.36 (2.5X)
  1 node + 4x P40 per node: 4.19 (3.2X)
Water 1.5M on P100 PCIes
Running GROMACS version 2016
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs.
ns/day (speedup over the blue node):
  1 Broadwell node: 2.79
  1 node + 2x P100 PCIe (16GB) per node: 6.34 (2.3X)
  1 node + 4x P100 PCIe (16GB) per node: 7.11 (2.5X)
Water 3M on P100 PCIes
Running GROMACS version 2016
The blue node contains Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs.
ns/day (speedup over the blue node):
  1 Broadwell node: 1.32
  1 node + 2x P100 PCIe (16GB) per node: 3.16 (2.4X)
  1 node + 4x P100 PCIe (16GB) per node: 3.43 (2.6X)
GROMACS 5.1.2 February 2017
Water 1.5M on K80s
Running GROMACS version 5.1.2
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs.
1x K80 is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
ns/day (speedup over the blue node):
  1 Broadwell node: 3.04
  1 node + 1x K80 per node: 3.49 (1.1X)
  1 node + 2x K80 per node: 5.75 (1.9X)
Water 1.5M on P100s PCIe
Running GROMACS version 5.1.2
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs.
1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
ns/day (speedup over the blue node):
  1 Broadwell node: 3.04
  1 node + 1x P100 PCIe (16GB) per node: 4.39 (1.4X)
  1 node + 2x P100 PCIe (16GB) per node: 6.96 (2.3X)
  1 node + 4x P100 PCIe (16GB) per node: 7.21 (2.4X)
Water 1.5M on P100s SXM2
Running GROMACS version 5.1.2
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs.
1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
ns/day (speedup over the blue node):
  1 Broadwell node: 3.04
  1 node + 1x P100 SXM2 per node: 4.11 (1.4X)
  1 node + 2x P100 SXM2 per node: 6.70 (2.2X)
  1 node + 4x P100 SXM2 per node: 7.18 (2.4X)
  1 node + 8x P100 SXM2 per node: 7.88 (2.6X)
Water 3M on K80s
Running GROMACS version 5.1.2
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs.
1x K80 is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
ns/day (speedup over the blue node):
  1 Broadwell node: 1.38
  1 node + 1x K80 per node: 1.59 (1.2X)
  1 node + 2x K80 per node: 2.98 (2.2X)
Water 3M on P100s PCIe
Running GROMACS version 5.1.2
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs.
1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
ns/day (speedup over the blue node):
  1 Broadwell node: 1.38
  1 node + 1x P100 PCIe (16GB) per node: 1.96 (1.4X)
  1 node + 2x P100 PCIe (16GB) per node: 3.43 (2.5X)
  1 node + 4x P100 PCIe (16GB) per node: 3.80 (2.8X)
Water 3M on P100s SXM2
Running GROMACS version 5.1.2
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs.
1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
ns/day (speedup over the blue node):
  1 Broadwell node: 1.38
  1 node + 1x P100 SXM2 per node: 1.84 (1.3X)
  1 node + 2x P100 SXM2 per node: 3.50 (2.5X)
  1 node + 4x P100 SXM2 per node: 3.82 (2.8X)
Recommended GPU Node Configuration for GROMACS Computational Chemistry
Workstation or Single Node Configuration
  # of CPU sockets: 2
  Cores per CPU socket: 6+
  CPU speed (GHz): 2.66+
  System memory per socket (GB): 32
  GPUs: Kepler K20, K40, K80
  # of GPUs per CPU socket: 1x. Kepler GPUs: need fast Sandy Bridge or Ivy Bridge, or high-end AMD Opterons
  GPU memory preference (GB): 6
  GPU to CPU connection: PCIe 3.0 or higher
  Server storage: 500 GB or higher
  Network configuration: Gemini, InfiniBand
HOOMD-Blue 1.3.3 February 2017
lj-liquid on K80s
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs.
1x K80 is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
avg timesteps/sec (speedup over the blue node):
  1 Broadwell node: 326.52
  1 node + 1x K80 per node: 1324.84 (4.1X)
  1 node + 2x K80 per node: 1594.37 (4.9X)
  1 node + 4x K80 per node: 1942.12 (5.9X)
lj-liquid on P100s PCIe
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs.
1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
avg timesteps/sec (speedup over the blue node):
  1 Broadwell node: 326.52
  1 node + 1x P100 PCIe (16GB) per node: 2912.66 (8.9X)
  1 node + 8x P100 PCIe (16GB) per node: 3217.68 (9.9X)
lj-liquid on P100s SXM2
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs.
1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
avg timesteps/sec (speedup over the blue node):
  1 Broadwell node: 326.52
  1 node + 1x P100 SXM2 per node: 3129.11 (9.6X)
  1 node + 8x P100 SXM2 per node: 3397.74 (10.4X)
lj_liquid_512k on K80s
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs.
1x K80 is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
avg timesteps/sec (speedup over the blue node):
  1 Broadwell node: 43.43
  1 node + 1x K80 per node: 220.10 (5.1X)
  1 node + 2x K80 per node: 334.59 (7.7X)
  1 node + 4x K80 per node: 526.47 (12.1X)
lj_liquid_512k on P100s PCIe
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs.
1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
avg timesteps/sec (speedup over the blue node):
  1 Broadwell node: 43.43
  1 node + 1x P100 PCIe (16GB) per node: 398.12 (9.2X)
  1 node + 2x P100 PCIe (16GB) per node: 534.54 (12.3X)
  1 node + 4x P100 PCIe (16GB) per node: 770.18 (17.7X)
  1 node + 8x P100 PCIe (16GB) per node: 1045.50 (24.1X)
lj_liquid_512k on P100s SXM2
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs.
1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
avg timesteps/sec (speedup over the blue node):
  1 Broadwell node: 43.43
  1 node + 1x P100 SXM2 per node: 443.74 (10.2X)
  1 node + 2x P100 SXM2 per node: 568.51 (13.1X)
  1 node + 4x P100 SXM2 per node: 793.36 (18.3X)
  1 node + 8x P100 SXM2 per node: 1119.76 (25.8X)
lj_liquid_1m on K80s
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs.
1x K80 is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
avg timesteps/sec (speedup over the blue node):
  1 Broadwell node: 22.07
  1 node + 1x K80 per node: 109.54 (5.0X)
  1 node + 2x K80 per node: 181.42 (8.2X)
  1 node + 4x K80 per node: 303.00 (13.7X)
lj_liquid_1m on P100s PCIe
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs.
1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4@2.2GHz [3.6GHz Turbo] (Broadwell).
avg timesteps/sec (speedup over the blue node):
  1 Broadwell node: 22.07
  1 node + 1x P100 PCIe (16GB) per node: 204.67 (9.3X)
  1 node + 2x P100 PCIe (16GB) per node: 294.88 (13.4X)
  1 node + 4x P100 PCIe (16GB) per node: 465.58 (21.1X)
  1 node + 8x P100 PCIe (16GB) per node: 672.46 (30.5X)