
HD GP-GPU Systems for HPC Applications: Engines | SAR | RF Amps - PowerPoint PPT Presentation


  1. HD GP-GPU Systems for HPC Applications: Engines | SAR | RF Amps. Sergio Tafur† & Christopher Kung‡. † Center for Computational Science | Section Head (Acting), Code 5594. ‡ Productivity Enhancement, Technology Transfer and Training, On-Site at NRL. DISTRIBUTION A. Approved for public release: distribution unlimited. GPU Technology Conference | April 2016. S. Tafur & C. Kung | DoDI 5230.24: Distribution Statement A. HD GP-GPU HPC. April 1, 2016.

  2. 1 Introduction (Applications: SAR | RF Amps | RDEs; Y Objective Distributed Architecture (Y.O.D.A.)); 2 Implementation (Benchmarked Performance; Challenges).


  4. Applications: Synthetic Aperture Radar | RF Amps | RDEs. Synthetic Aperture Radar: "Investigating the Use of GPU-Accelerated Nodes for SAR Image Formation," IEEE Int. Conf. on Cluster Computing and Workshops, 1-8, 31, 2009.

  5. Applications: SAR | Radio Frequency Amplifiers | RDEs. RF Amplifiers: "Simulation of Klystrons With Slow and Reflected Electrons Using Large-Signal Code TESLA," IEEE Transactions on Electron Devices, 54(6), 1555-1561, 2007.

  6. Applications: SAR | RF Amps | Rotating Detonation Engines. Rotating Detonation Engines: "Thermodynamic Modeling of a Rotating Detonation Engine," AIAA Paper 2011-803, 49th AIAA Aerospace Sciences Meeting, 2011.


  8. Y Objective Distributed Architecture. Configuration | FDR InfiniBand Fat-Tree(ish).


  10. Y Objective Distributed Architecture. Hardware | Exxact Quantum TXR410-768R. Motherboard: 2x Intel E5-2600 v3; 4x PLX PEX 8747 switch. Configuration: 8x Titan Black; 128 GB DDR4 memory. http://www.tyan.com/datasheets/DataSheet_FT77A-B7059.pdf http://tyan.com/manuals/FT77C-B7079_QIG.pdf


  12. Y Objective Distributed Architecture. Hardware | NVIDIA GK110 GTX TITAN Black. GPU Engine Specs: 2880 CUDA cores / 960 DP units; 889 MHz base clock; 980 MHz boost clock. Memory Specs: 7.0 Gbps memory clock; 6144 MB standard memory config; 336 GB/sec memory bandwidth. https://forums.geforce.com/default/topic/531846/geforce-gtx-titan-is-here-/ http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-titan-black/specifications
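The 336 GB/sec bandwidth figure follows from the 7.0 Gbps memory clock if the bus is 384 bits wide, which matches the six 64-bit memory controllers listed for the GK110 die in this deck. A minimal sketch of that arithmetic; the function name is illustrative and not from the slides:

```python
def memory_bandwidth_gb_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s: per-pin data rate times bus width, in bytes."""
    return data_rate_gbps * bus_width_bits / 8

# Assumption: six 64-bit controllers form one 384-bit bus.
bus_width_bits = 6 * 64
bandwidth = memory_bandwidth_gb_s(7.0, bus_width_bits)
print(bandwidth)
```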

  13. Y Objective Distributed Architecture. Hardware | NVIDIA GK110. GK110 die: 15 Streaming Multiprocessor (SMX) architecture units; six 64-bit memory controllers; 3 SP cores / 1 DP unit. http://international.download.nvidia.com/pdf/kepler/NVIDIA-Kepler-GK110-GK210-Architecture-Whitepaper.pdf

  14. Y Objective Distributed Architecture. Hardware | NVIDIA GK110. NVIDIA graphics processing unit: GK110 with 15 SMX @ 250 W. Eight GPUs, each with 2880 SP cores @ 0.98/1.12 GHz: SP perf 8 x 2.8 TFlops; SP eff 11.3 GFlops/W. Eight GPUs, each with 960 DP cores @ 0.98/1.12 GHz: DP perf 8 x 1.11 TFlops; DP eff 4.44 GFlops/W. http://www.nvidia.com/content/pdf/kepler/nvidia-kepler-gk110-architecture-whitepaper.pdf http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-titan-black/specifications

  15. Y Objective Distributed Architecture. Hardware | NVIDIA GK110. GK110 die: 15 Streaming Multiprocessor (SMX) architecture units; 1536 kB L2 cache; six 64-bit memory controllers; 3 SP cores / 1 DP unit. http://international.download.nvidia.com/pdf/kepler/NVIDIA-Kepler-GK110-GK210-Architecture-Whitepaper.pdf

  16. Y Objective Distributed Architecture. Hardware | NVIDIA GK110. GK110 SMX unit: 192 SP cores; 64 DP units; 64 kB on-chip memory, configurable as 48 kB shared / 16 kB L1, 16 kB shared / 48 kB L1, or 32 kB shared / 32 kB L1. http://international.download.nvidia.com/pdf/kepler/NVIDIA-Kepler-GK110-GK210-Architecture-Whitepaper.pdf
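The per-SMX counts on this slide tie back to the whole-GPU figures quoted earlier in the deck (2880 CUDA cores, 960 DP units, 3 SP cores per DP unit). A short check, using only numbers from the slides; the variable names are illustrative:

```python
SMX_UNITS = 15          # SMX units per GK110 die (from the slides)
SP_CORES_PER_SMX = 192  # single-precision cores per SMX
DP_UNITS_PER_SMX = 64   # double-precision units per SMX

sp_cores = SMX_UNITS * SP_CORES_PER_SMX
dp_units = SMX_UNITS * DP_UNITS_PER_SMX
sp_per_dp = SP_CORES_PER_SMX // DP_UNITS_PER_SMX

print(sp_cores, dp_units, sp_per_dp)
```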

  17. Y Objective Distributed Architecture. Expected Performance. GTX Titan Black: single precision 2.8 TFlops; double precision 922 GFlops. Server (8 GPUs): single precision 22.4 TFlops; double precision 7.4 TFlops. Y.O.D.A.: single precision 1.4 PFlops; double precision 477 TFlops.
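The per-server figures are the per-GPU figures multiplied by the eight GPUs in each server. The sketch below reproduces that step, taking the per-GPU double-precision value as 0.922 TFlops, which is what the 7.4 TFlops server total implies. The number of servers behind the Y.O.D.A. totals is not stated on the slide, so it is not modeled here.

```python
GPUS_PER_SERVER = 8      # from the hardware slides: 8x Titan Black per server
SP_TFLOPS_PER_GPU = 2.8
DP_TFLOPS_PER_GPU = 0.922  # assumption: 922 GFlops, consistent with the server total

server_sp_tflops = GPUS_PER_SERVER * SP_TFLOPS_PER_GPU
server_dp_tflops = GPUS_PER_SERVER * DP_TFLOPS_PER_GPU

print(round(server_sp_tflops, 1), round(server_dp_tflops, 1))
```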


  19. Y Objective Distributed Architecture. Benchmarked Performance: 8 GPUs. cuBLAS - S | D | C | Z - GEMM.

  20. Y Objective Distributed Architecture. Benchmarked Performance: 8 GPUs. MAGMA - S | D | C | Z - GEMM.
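GEMM results like these are conventionally reported as GFlop/s from a nominal operation count, not a measured one: 2mnk for the real variants (S, D) and 8mnk for the complex variants (C, Z). The slides do not state which convention was used, so the sketch below is an assumption. The matrix size and timing are hypothetical, not values from the presentation.

```python
def gemm_flops(m: int, n: int, k: int, complex_valued: bool = False) -> int:
    """Nominal flop count for C = A*B with A (m x k) and B (k x n)."""
    # One multiply-add is 2 flops in real arithmetic and 8 in complex
    # (6 for the complex multiply, 2 for the complex add).
    flops_per_multiply_add = 8 if complex_valued else 2
    return flops_per_multiply_add * m * n * k

def gflops_rate(flops: float, seconds: float) -> float:
    """Convert a flop count and a wall time into GFlop/s."""
    return flops / seconds / 1e9

# Hypothetical example: an 8192 x 8192 DGEMM timed at 1.0 s.
rate = gflops_rate(gemm_flops(8192, 8192, 8192), 1.0)
print(rate)
```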

  21. Y Objective Distributed Architecture. Benchmarked Performance: 8 GPUs. MAGMA - S | D | C | Z - GESV.

  22. Y Objective Distributed Architecture. Benchmarked Performance: 2 GPUs. MAGMA - S | D | C | Z - GESV.

  23. Y Objective Distributed Architecture. Benchmarked Performance: 4 GPUs. MAGMA - S | D | C | Z - GESV.

  24. Y Objective Distributed Architecture. Benchmarked Performance: 8 GPUs. MAGMA - S | D | C | Z - GESV.

  25. Y Objective Distributed Architecture. Benchmarked Performance: 2/4/8 GPUs. MAGMA - Z - GESV.
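GESV (LU factorization plus triangular solves) rates are likewise derived from a nominal operation count. The sketch below uses the leading-order estimate of (2/3)n^3 + 2n^2·nrhs flops for real data, scaled by 4 for complex data. This is an approximation for illustration, not MAGMA's exact accounting, and the function name is hypothetical.

```python
def gesv_flops(n: int, nrhs: int = 1, complex_valued: bool = False) -> float:
    """Leading-order flop estimate for solving A X = B with A n x n, B n x nrhs."""
    lu_flops = (2.0 * n ** 3) / 3.0    # LU factorization
    solve_flops = 2.0 * n * n * nrhs   # forward and back substitution
    real_flops = lu_flops + solve_flops
    # Complex arithmetic costs about 4x the real count at leading order.
    return 4.0 * real_flops if complex_valued else real_flops

# For large n the complex (Z) solve costs about 4x the real (D) solve.
ratio = gesv_flops(10000, complex_valued=True) / gesv_flops(10000)
print(ratio)
```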

  26. Y Objective Distributed Architecture. Benchmarked Single GPU Performance: HPL Top 500 Run.

  27. Y Objective Distributed Architecture. Benchmarked Aggregate Performance: HPL Top 500 Run.
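HPL reports its rate from a fixed operation count of (2/3)N^3 + (3/2)N^2 for a problem of order N, and efficiency is usually quoted as the measured rate divided by the theoretical peak. A minimal sketch of both calculations; the problem size, run time, and peak in the example are hypothetical placeholders, not results from this presentation.

```python
def hpl_flops(n: int) -> float:
    """Operation count HPL credits for a dense solve of order n."""
    return (2.0 * n ** 3) / 3.0 + 1.5 * n ** 2

def hpl_gflops(n: int, seconds: float) -> float:
    """HPL rate in GFlop/s for problem size n solved in the given time."""
    return hpl_flops(n) / seconds / 1e9

def efficiency(measured_gflops: float, peak_gflops: float) -> float:
    """Fraction of theoretical peak achieved."""
    return measured_gflops / peak_gflops

# Hypothetical run: N = 40000 in 60 s against a 7400 GFlop/s peak.
rate = hpl_gflops(40000, 60.0)
print(rate, efficiency(rate, 7400.0))
```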
