
Accelerators in Technical Computing: Is it Worth the Pain? A TCO Perspective



  1. Accelerators in Technical Computing: Is it Worth the Pain? A TCO Perspective Sandra Wienke, Dieter an Mey, Matthias S. Müller Center for Computing and Communication JARA – High-Performance Computing RWTH Aachen University Rechen- und Kommunikationszentrum (RZ)

  2. Agenda
     - Introduction
     - Modeling
       - Total Cost of Ownership (TCO)
       - Comparison Metrics
     - Case Study on Accelerators
       - Programming Models & System Types
       - TCO Components @ RWTH
       - Real-World Application
       - Results
     - Conclusion & Outlook
     TCO of Accelerators 2 Sandra Wienke | Center for Computing and Communication

  3. Introduction
     - Today: variety of HPC clusters
       - Usage of accelerators (NVIDIA GPU, Intel Xeon Phi) motivated by promising performance-per-watt ratio
     - System comparison by performance or performance per watt is not sufficient for a purchase decision
     - Total cost of ownership (TCO)
       - Acquisition costs, housing, operation costs, ...
       - Inclusion of manpower costs (administration & programming)
       - Comparison of costs per program run (application-dependent)
     - Investigation of a real-world software package: what is the impact of manpower effort/programming model?
       - OpenMP on Intel Sandy Bridge
       - OpenMP + LEO on Intel Xeon Phi
       - OpenCL, OpenACC on NVIDIA Fermi GPU

  4. Modeling – Total Cost of Ownership (TCO)
     - Basis: single compute node, extrapolated to cluster size
     - Investment $I = \mathrm{TCO}(n, \tau) = C_{ot}(n) + C_{pa}(n) \cdot \tau$
       - $n$: number of nodes, $\tau$: system lifetime
     - One-time costs $C_{ot}$
       - Per node: HW acquisition, building/infrastructure, OS/env. installation
       - Per node type: OS/env. installation, programming effort
     - Annual costs $C_{pa}$
       - Per node: HW maintenance, building/infrastructure, OS/env. maintenance, power consumption
       - Per node type: OS/env. maintenance, compiler/software, application maintenance
     - TCO depends on architecture & application
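The TCO model on this slide can be sketched in a few lines of Python. This is a minimal sketch, not the authors' spreadsheet: it assumes that per-node costs scale linearly with the number of nodes and that per-node-type costs are paid once regardless of cluster size. The class name, field names, and example figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeTypeCosts:
    """Cost parameters for one node type (hypothetical field names)."""
    one_time_per_node: float       # e.g. HW acquisition [EUR per node]
    one_time_per_node_type: float  # e.g. programming effort [EUR]
    annual_per_node: float         # e.g. HW maintenance, power [EUR per node and year]
    annual_per_node_type: float    # e.g. compiler/software [EUR per year]


def one_time_costs(c: NodeTypeCosts, n: float) -> float:
    """C_ot(n), assuming linear scaling of per-node costs."""
    return c.one_time_per_node * n + c.one_time_per_node_type


def annual_costs(c: NodeTypeCosts, n: float) -> float:
    """C_pa(n), assuming linear scaling of per-node costs."""
    return c.annual_per_node * n + c.annual_per_node_type


def tco(c: NodeTypeCosts, n: float, lifetime_years: float) -> float:
    """TCO(n, tau) = C_ot(n) + C_pa(n) * tau."""
    return one_time_costs(c, n) + annual_costs(c, n) * lifetime_years


# Made-up figures for illustration only.
example = NodeTypeCosts(5000.0, 1000.0, 800.0, 0.0)
print(tco(example, n=10, lifetime_years=4))  # 83000.0
```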

  5. Modeling – Comparison Metrics
     - Costs per program run $C_{ppr}$
       - Includes investment/TCO & application performance
       - $C_{ppr}(n, \tau) = \dfrac{\mathrm{TCO}(n, \tau)}{n_{ex}(\tau) \cdot n}$ with $n_{ex}(\tau) = \dfrac{k \cdot \tau}{t_{par}}$
       - $n$: number of nodes, $\tau$: system lifetime, $n_{ex}$: number of application executions, $k$: system usage rate, $t_{par}$: parallel runtime
     - Baseline used for comparison with a system X: Intel Sandy Bridge (SNB) + OpenMP (OMP)
       - $\dfrac{C_{ppr,X}(n_X, \tau) - C_{ppr,OMP}(n_{OMP}, \tau)}{C_{ppr,OMP}(n_{OMP}, \tau)}$ is $< 0$ if X is beneficial and $\geq 0$ if OMP is beneficial
     - Break-even investments
       - Minimum budget needed so that system X is beneficial over OpenMP on SNB
       - Solve for $I$ with a given fixed lifetime $\tau$: $C_{ppr,X}(n_X, \tau) - C_{ppr,OMP}(n_{OMP}, \tau) = 0$ with $\mathrm{TCO}(n, \tau) = I$
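The metrics on this slide can be illustrated as follows. The first three functions follow the slide's formulas directly, with the lifetime converted from years to seconds. The closed-form break-even solution is an assumption of this sketch, not taken from the slides: it holds only if TCO is linear in the node count, $\mathrm{TCO}(n, \tau) = a \cdot n + b$, if $n$ may be fractional, and if both systems share the same usage rate and lifetime. All function and parameter names are hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def executions_per_node(usage_rate, lifetime_years, t_par_seconds):
    """n_ex(tau) = k * tau / t_par, with tau converted to seconds."""
    return usage_rate * lifetime_years * SECONDS_PER_YEAR / t_par_seconds


def costs_per_program_run(tco, n_nodes, usage_rate, lifetime_years, t_par_seconds):
    """C_ppr(n, tau) = TCO(n, tau) / (n_ex(tau) * n)."""
    n_ex = executions_per_node(usage_rate, lifetime_years, t_par_seconds)
    return tco / (n_ex * n_nodes)


def relative_difference(c_ppr_x, c_ppr_omp):
    """Negative: system X is beneficial; zero or positive: OpenMP baseline is."""
    return (c_ppr_x - c_ppr_omp) / c_ppr_omp


def break_even_investment(a_x, b_x, t_par_x, a_omp, b_omp, t_par_omp):
    """Investment I at which C_ppr,X equals C_ppr,OMP.

    Assumes TCO = a * n + b per system (a: lifetime cost per node,
    b: lifetime cost per node type), so n = (I - b) / a.
    Returns None if no investment with positive node counts exists.
    """
    p_x = a_x * t_par_x        # cost-time product per node, system X
    p_omp = a_omp * t_par_omp  # cost-time product per node, baseline
    if p_x == p_omp:
        return None
    investment = (p_x * b_omp - p_omp * b_x) / (p_x - p_omp)
    return investment if investment > max(b_x, b_omp) else None


# Made-up figures: X costs more per node and needs porting effort, but runs 2x faster.
print(break_even_investment(a_x=12000, b_x=1000, t_par_x=50,
                            a_omp=10000, b_omp=0, t_par_omp=100))  # 2500.0
```

Below the returned investment, the fixed porting cost of system X is not yet amortized; above it, X yields lower costs per program run under these assumptions.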

  6. Case Study on Accelerators – Programming Models & System Types
     - Host: 2x Intel Sandy Bridge, 16 cores, 2 GHz
       - Serial; OpenMP (simple, vectorized): compiler Intel 13.0.1
     - Accelerator: Intel Xeon Phi 5110P, 60 cores
       - LEO + OpenMP: compiler Intel 13.0.1
     - Accelerator: NVIDIA Tesla C2050 (Fermi), ECC on; host: 1x Intel Westmere, 4 cores, 2.4 GHz
       - OpenACC: compiler PGI 12.9
       - OpenCL: compiler Intel 13.0.1

  7. Case Study on Accelerators – TCO Components @ RWTH
     - One-time costs
       - HW purchase: list prices from Bull
       - Building/infrastructure: treated as annual costs since it is amortized over 25 years
       - OS/env. installation: -
       - Programming effort: full-time employee costs 285.71 € a day
     - Annual costs
       - HW maintenance: 5% of HW purchase costs
       - Building/infrastructure: 200,000 € per year; costs per node: division by 1.6 MW, multiplication by max. power consumption of each node
       - OS/env. maintenance: 4 admins, 75% maintenance of cluster (~2300 nodes): 180,000 € / 2300 = 78 € per node and year
       - Software/compiler: -
       - Power: PUE 1.5, regional electricity costs 0.15 €/kWh
       - Application maintenance: - (small kernels)
     - Given lifetime of 4 years & investment: $C_{ppr}$, #nodes, #executions (usage rate 80%)
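The cost components listed on this slide can be combined into per-node figures. This is a hedged sketch: the rates are those given on the slide, but the node price and power draws in the example are made-up placeholders (the slide only says list prices from Bull). Charging energy for every hour of the year at an average power draw is also an assumption of this sketch. Function and parameter names are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24

# Rates taken from the slide.
HW_MAINTENANCE_RATE = 0.05               # share of HW purchase costs per year
BUILDING_EUR_PER_YEAR = 200_000.0        # building/infrastructure
BUILDING_CAPACITY_W = 1_600_000.0        # 1.6 MW
ADMIN_EUR_PER_NODE = 180_000.0 / 2300    # slide rounds this to 78 EUR
PUE = 1.5
ELECTRICITY_EUR_PER_KWH = 0.15
DEVELOPER_EUR_PER_DAY = 285.71


def annual_costs_per_node(hw_purchase_eur, max_power_w, avg_power_w):
    """Annual per-node cost components in EUR.

    Assumption: energy is billed for all hours of the year at avg_power_w.
    """
    energy_kwh = avg_power_w / 1000.0 * HOURS_PER_YEAR
    costs = {
        "hw_maintenance": HW_MAINTENANCE_RATE * hw_purchase_eur,
        "building": BUILDING_EUR_PER_YEAR / BUILDING_CAPACITY_W * max_power_w,
        "os_env_maintenance": ADMIN_EUR_PER_NODE,
        "power": energy_kwh * PUE * ELECTRICITY_EUR_PER_KWH,
    }
    costs["total"] = sum(costs.values())
    return costs


def programming_effort_cost(days):
    """One-time programming cost per node type in EUR."""
    return DEVELOPER_EUR_PER_DAY * days


# Placeholder node: 5000 EUR list price, 400 W max, 300 W average draw.
for name, value in annual_costs_per_node(5000.0, 400.0, 300.0).items():
    print(f"{name}: {value:.2f} EUR/year")
print(f"5 days of porting: {programming_effort_cost(5):.2f} EUR")
```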

  8. Case Study on Accelerators – Real-World Application
     - Basis
       - Serial version
       - Small kernel
       - Assumption: homogeneous application landscape
     - KegelSpan [2]
       - 3D simulation of bevel gear cutting process
       - Kernel artificially increased from 25% to 90%
     - Image sources: BMW, ZF, Klingelnberg
     [2] C. Brecher, C. Gorgels, and A. Hardjosuwito. Simulation based Tool Wear Analysis in Bevel Gear Cutting. In International Conference on Gears, volume 2108.2 of VDI-Berichte, pp. 1381–1384, Düsseldorf, VDI Verlag, 2010.

  9. Case Study on Accelerators – TCO Components of Application
     [Figure: bar charts of runtime [s], power consumption [W], and programming effort [days] for OpenCL (GPU), OpenACC (GPU), OpenMP+LEO (Phi), OpenMP-vec (SNB), and OpenMP-simp (SNB). Effort data labels in extraction order: 5.0, 4.5, 3.5, 1.5, 0.5 days.]

  10. Case Study on Accelerators – Results
     [Figure: costs per program run relative to OMP-simp (in %) as a function of investment (0 € to 200K €) for OpenCL (GPU), OpenACC (GPU), OpenMP+LEO (Phi), and OpenMP-vec (SNB). Data labels in extraction order: 3.62%, -12.09%, -16.82%, -17.15%.]
     [Figure: break-even investment in €. Data labels in extraction order: 7,787, 7,231, 1,809.]

  11. Conclusion
     - Are accelerators beneficial? "It depends"
       - TCO spreadsheet [1] for own computations available
     - Our results (w/ 90% kernel portion), relative to SNB-OMP (4 years, 250 K€), show
       - GPU Fermi beneficial over 2-socket Intel SNB server: -17% $C_{ppr}$
       - Intel Xeon Phi results disappointing for now: +4% $C_{ppr}$
         - Mainly due to high acquisition costs
         - NVIDIA Kepler probably similar
     - Programming effort impacts break-even investment (see OpenACC vs. OpenCL)
       - Bigger codes: increase of kernel size ~ increase of break-even investment
       - Projections possible (e.g. hybrid codes)
     [1] Wienke, S., an Mey, D., Müller, M.S.: Accelerators for Technical Computing: Is it Worth the Pain? TCO Spreadsheet. https://sharepoint.campus.rwth-aachen.de/units/rz/HPC/public/Shared%20Documents/WienkeEtAl_Accelerators-TCO-Perspective.xlsx, 2013

  12. Outlook
     - Hybrid code implementation (compare to projections)
     - Model extensions
       - New programming models & architectures (OpenMP 4.0, NVIDIA Kepler)
       - Network communication (MPI)
       - Mixed job execution (heterogeneous application landscape)
       - Assessment of decrease in runtime/gaining more results
     - Comprehensive TCO calculation with predictive powers
       - Performance, power consumption, manpower
     - Towards exascale computing, architectures might get more complex
       - More difficult to manage & program
       - Impact of manpower effort might get stronger
     Thank you for your attention!
