Accurate emulation of CPU performance


  1. Accurate emulation of CPU performance
     Tomasz Buchert (1), Lucas Nussbaum (2), Jens Gustedt (1)
     (1) INRIA Nancy – Grand Est; (2) LORIA / Nancy-Université

  2. Validation of distributed systems
     Approaches:
     - Theoretical approach (paper and pencil)
       For: the most general results and understanding
       Against: very hard (leads to unsolvability results)
     - Experimentation (real application in a real environment)
       For: realistic context, credibility
       Against: difficult to prepare and control, questionable reproducibility
     - Simulation (modeled application inside a modeled environment)
       For: very simple and perfectly reproducible
       Against: experimental bias, possibly unrealistic
     - Emulation (real application inside a modeled environment)
       For: control over the experiment parameters
       Against: difficult
     Tomasz Buchert, Lucas Nussbaum and Jens Gustedt Accurate emulation of CPU performance 2 / 20


  4. Emulation
     The perfect emulated environment should emulate (independently):
     - network bandwidth, latency and topology
     - performance and number of CPUs
     - memory capabilities
     - background noise (network, CPU, faults)
     This is already implemented in Wrekavoc, a tool to define and control the heterogeneity of a cluster (but it is not perfect yet!).
     In this talk, however, we concentrate specifically on emulation of the CPU.


  6. CPU emulation
     Various elements of the CPU architecture could be emulated:
     - speed
     - number of cores
     - sizes and properties of caches (and their topology)
     - memory access speed (especially on NUMA systems)
     In this talk, we focus on degradation of CPU speed.

  7. An example
     [Figure: CPU 1 to CPU 4, each partly used (70 %, 50 %, 50 % and 30 %) and partly unused.]
     Requirement (1): controlling the speed of each CPU/core independently.

  8. An example (continued)
     [Figure: the same four CPUs, partly used (70 %, 50 %, 50 % and 30 %) and partly unused.]
     Requirement (2): being able to create separate scheduling zones.

  9. Dynamic frequency scaling (CPU-Freq)
     Also known as Intel Enhanced SpeedStep or AMD Cool’n’Quiet.
     A hardware solution to reduce heat, noise and power usage.
     For: no emulation overhead; completely unintrusive; meaningful CPU time measurements.
     Against: only a finite set of frequency levels is available.
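Because CPU-Freq can only select one of a finite set of levels, an arbitrary target frequency has to be mapped to an available level. The sketch below is a minimal illustration under stated assumptions: levels are given in kHz as a whitespace-separated list (the format of the Linux cpufreq sysfs file `scaling_available_frequencies`, which only some drivers provide), and the hypothetical helper `closest_level` picks the highest level not above the target. The example frequency list is made up.

```python
def parse_levels(text: str) -> list[int]:
    """Parse a whitespace-separated list of frequencies in kHz, sorted ascending."""
    return sorted(int(tok) for tok in text.split())


def closest_level(levels_khz: list[int], target_khz: int) -> int:
    """Return the highest available level that does not exceed the target.

    Falls back to the lowest level if the target is below every level.
    """
    below = [f for f in levels_khz if f <= target_khz]
    return max(below) if below else min(levels_khz)


def relative_error(levels_khz: list[int], target_khz: int) -> float:
    """Relative difference between the chosen level and the requested frequency."""
    chosen = closest_level(levels_khz, target_khz)
    return (chosen - target_khz) / target_khz


# Hypothetical list of levels; a real machine exposes its own.
levels = parse_levels("2933000 2667000 2400000 2133000 1867000 1600000")
print(closest_level(levels, 2_500_000))             # 2400000
print(round(relative_error(levels, 2_500_000), 3))  # -0.04
```

Rounding down rather than to the nearest level is a design choice: the emulated machine is then never faster than requested, at the cost of an error of up to one frequency step, which is the coarse granularity the slides point out.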

  10. CPU-Lim
      Method available in Wrekavoc. Algorithm:
      - if CPU usage ≥ threshold, send SIGSTOP to the process
      - if CPU usage < threshold, send SIGCONT to the process
      where CPU usage = (CPU time of the process) / (process lifetime).
      For: easy to implement and almost POSIX-compliant.
      Against: intrusive and unscalable; the decision is based on one process instead of the global CPU usage; sleeping is indistinguishable from preemption.
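The control rule above can be sketched as follows. This is a minimal illustration, not Wrekavoc's implementation: `decide` applies the slide's rule to one process, and `simulate` runs a CPU-bound process in discrete time steps (an assumed model in which the decision is re-taken after every step) to show that its measured usage converges to the threshold. A real limiter would read the process's CPU time, for example from `/proc/<pid>/stat` on Linux, and deliver the chosen signal with `os.kill(pid, sig)`.

```python
import signal


def cpu_usage(cpu_time: float, lifetime: float) -> float:
    """CPU usage as defined for CPU-Lim: CPU time of the process / process lifetime."""
    return cpu_time / lifetime if lifetime > 0 else 0.0


def decide(usage: float, threshold: float) -> signal.Signals:
    """Signal CPU-Lim sends: SIGSTOP at or above the threshold, SIGCONT below it."""
    return signal.SIGSTOP if usage >= threshold else signal.SIGCONT


def simulate(threshold: float, steps: int = 10_000) -> float:
    """Toy model of a CPU-bound process controlled by CPU-Lim.

    Time advances in unit steps; the process gains one unit of CPU time
    per step while it is running. Returns the final measured usage.
    """
    cpu_time = 0
    running = True
    for lifetime in range(1, steps + 1):
        if running:
            cpu_time += 1
        # Re-evaluate the process after every step, as the limiter would.
        running = decide(cpu_usage(cpu_time, lifetime), threshold) == signal.SIGCONT
    return cpu_usage(cpu_time, steps)


print(round(simulate(0.5), 2))
print(round(simulate(0.7), 2))
```

Because the rule only looks at one process's own history, nothing in it accounts for other processes competing for the same CPU, which is the "decision based on one process" drawback listed on the slide.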

  11. Fracas
      Based on an idea from KRASH (a load injection tool).
      Uses Linux cgroups and the Completely Fair Scheduler (CFS):
      - a predefined portion of the CPU is given to tasks burning CPU
      - all other processes are given the remaining CPU time
      [Diagram: cores 1 to 3, each running a CPU burner alongside the emulated processes.]

  12. Fracas (continued)
      For: unintrusive; scalable.
      Against: not portable to other systems; sensitive to the configuration of the scheduler.
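The slides state only that Fracas relies on cgroups and CFS. One way to reserve a given portion of a core, sketched here as an assumption rather than Fracas's actual configuration, is to put a CPU burner and the emulated processes in two cgroups whose CPU weights are in the ratio (1 − f) : f, where f is the fraction of the core the emulated processes should get; under CFS group scheduling, two always-runnable groups on one core share it in proportion to their weights. The hypothetical helpers below compute burner weights for the four-CPU example (70 %, 50 %, 50 % and 30 %), assuming the cgroup v2 `cpu.weight` range of 1 to 10000 (default 100).

```python
WEIGHT_MIN, WEIGHT_MAX = 1, 10_000  # valid range of cgroup v2 cpu.weight


def burner_weight(fraction: float, emulated_weight: int = 100) -> int:
    """Weight for the burner group so that the emulated group gets `fraction` of a core.

    With both groups always runnable, CFS gives the emulated group
    emulated_weight / (emulated_weight + burner_weight) of the core.
    Returns 0 when no burner is needed (fraction == 1).
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if fraction == 1.0:
        return 0  # full speed: do not start a burner on this core
    weight = round(emulated_weight * (1.0 - fraction) / fraction)
    if not WEIGHT_MIN <= weight <= WEIGHT_MAX:
        raise ValueError(f"weight {weight} outside cpu.weight range; adjust emulated_weight")
    return weight


def emulated_share(emulated_weight: int, burner: int) -> float:
    """Fraction of the core the emulated group gets under proportional sharing."""
    return emulated_weight / (emulated_weight + burner)


# Per-core targets from the four-CPU example.
for cpu, target in enumerate([0.7, 0.5, 0.5, 0.3], start=1):
    w = burner_weight(target)
    print(f"CPU {cpu}: burner weight {w}, emulated share {emulated_share(100, w):.3f}")
```

On a cgroup v2 system such a weight would be written to the burner group's `cpu.weight` file, and pinning each burner and its emulated processes to one core (for example with the cpuset controller's `cpuset.cpus`) is one way to obtain separate per-core scheduling zones. Integer weights introduce a small rounding error (43 instead of 42.86 for 70 %).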

  13. Fracas and latency of the scheduler
      [Plot: GFLOP/s versus emulated CPU frequency (1.4 to 3.0 GHz) for scheduler latencies of 0.1 ms, 1 ms, 10 ms, 100 ms and 1000 ms.]
      The smaller the latency, the better the emulation.

  14. Evaluation
      Based on different types of work:
      - CPU-intensive (Linpack benchmark)
      - IO-bound
      - multiprocessing
      - multithreading
      - memory speed (STREAM benchmark)
      X-axis: emulated frequency; Y-axis: speed perceived by the benchmark.
      Each test was repeated 10 times; results are averages, with 95 % confidence intervals computed using Student's t-distribution.
      Evaluation performed on the Grid’5000 platform:
      - nodes with two quad-core Intel Xeon X5570 processors
      - nodes with two single-core AMD Opteron 252 processors
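The evaluation reports the mean of 10 runs with a 95 % confidence interval based on Student's t-distribution. A minimal sketch of that computation follows; it hardcodes two-sided 95 % critical values for small samples to avoid depending on SciPy, and the sample measurements are made up for illustration.

```python
import math
import statistics

# Two-sided 95 % critical values t_{0.975, df} of Student's t-distribution.
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def mean_ci95(samples: list[float]) -> tuple[float, float]:
    """Return (mean, half-width) of the 95 % confidence interval for the mean."""
    n = len(samples)
    df = n - 1
    if df not in T_975:
        raise ValueError("this sketch only covers 2 to 11 samples")
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / math.sqrt(n)  # standard error of the mean
    return mean, T_975[df] * sem


# Ten made-up GFLOP/s measurements of one configuration.
runs = [5.91, 5.88, 5.95, 5.90, 5.87, 5.93, 5.89, 5.92, 5.94, 5.90]
m, h = mean_ci95(runs)
print(f"{m:.3f} ± {h:.3f} GFLOP/s")
```

With 10 repetitions there are 9 degrees of freedom, so the half-width is 2.262 times the standard error of the mean.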

  15. Grid’5000
      9 sites, 1600 machines: Lille, Rennes, Orsay, Nancy, Bordeaux, Lyon, Grenoble, Toulouse, Sophia.
      Dedicated to research on distributed systems and HPC.

  16. CPU-intensive work
      [Plot: GFLOP/s versus emulated CPU frequency (1.6 to 3.0 GHz) for CPU-Freq, CPU-Lim and Fracas.]
      CPU-Lim is less predictable (its results have a higher variance).

  17. IO-bound work
      [Plot: loops/s versus emulated CPU frequency (1.6 to 3.0 GHz) for CPU-Freq, CPU-Lim and Fracas.]
      CPU-Lim gives an (unfair) advantage to IO-bound tasks.
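A toy model, consistent with the remark that under CPU-Lim sleeping is indistinguishable from preemption, suggests one plausible mechanism: since usage is CPU time divided by lifetime, time spent waiting on IO lowers the measured usage, so a task that sleeps often is rarely stopped and runs its compute phases at full speed. The model is an assumption: time advances in integer ticks, a compute phase needs `work` ticks of CPU, an IO wait lasts `sleep` ticks of wall-clock time whether or not the task is stopped, and CPU-Lim re-evaluates the task every tick. The reference is an ideal slowed-down CPU on which each compute phase takes `work / fraction` ticks. All names are hypothetical.

```python
def cpulim_loops(work: int, sleep: int, threshold: float, total: int) -> int:
    """Completed compute phases of a task alternating CPU work and IO waits under CPU-Lim."""
    if work < 1:
        raise ValueError("work must be at least one tick")
    cpu = 0            # CPU ticks consumed so far
    loops = 0
    computing = True
    left = work        # ticks left in the current phase
    for t in range(total):
        if computing:
            # CPU-Lim rule: the task may run only while cpu / lifetime < threshold.
            if t == 0 or cpu / t < threshold:
                cpu += 1
                left -= 1
        else:
            left -= 1  # the IO wait elapses in wall-clock time
        if left == 0:
            if computing:
                loops += 1
                if sleep > 0:
                    computing, left = False, sleep
                else:
                    left = work
            else:
                computing, left = True, work
    return loops


def ideal_loops(work: int, sleep: int, fraction: float, total: int) -> float:
    """Loops on a CPU slowed to `fraction` of its speed: compute phases stretch by 1/fraction."""
    return total / (work / fraction + sleep)


for name, sleep in [("CPU-bound", 0), ("IO-bound", 20)]:
    print(name, cpulim_loops(10, sleep, 0.5, 100_000), round(ideal_loops(10, sleep, 0.5, 100_000)))
```

In this model the CPU-bound task matches the ideal slowdown, while the IO-bound task completes noticeably more loops than on a CPU really running at half speed.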

  18. Multiprocessing
      [Plot: loops/s versus emulated CPU frequency (1.6 to 3.0 GHz) for CPU-Freq, CPU-Lim and Fracas.]
      Fracas cannot emulate the CPU for multitask computations.

  19. Multithreading
      [Plot: loops/s versus emulated CPU frequency (1.6 to 3.0 GHz) for CPU-Freq, CPU-Lim and Fracas.]
      CPU-Lim controls processes instead of scheduling entities (threads).

  20. Memory speed
      [Plot: memory speed measured by STREAM versus emulated CPU frequency (1.6 to 3.0 GHz) for CPU-Freq, CPU-Lim and Fracas.]
      Memory speed is affected differently by each method.

  21. Summary of the evaluation
      - CPU-Freq: very good results; coarse granularity
      - CPU-Lim: not scalable due to its implementation; intrusive; higher variance; controls processes, not threads
      - Fracas: good behavior for single-task workloads; scalable; bad behavior for multitask workloads

  22. Future work
      - Explore other approaches
      - Improve Fracas to cover multitasking
      - Emulate memory bandwidth
      - Emulate other aspects of the CPU
      - Integrate Fracas into Wrekavoc
      - Take over the world :)

  23. Conclusions
      - Presented Fracas, a method for CPU performance emulation based on Linux cgroups
      - Compared it with CPU-Freq and CPU-Lim (Wrekavoc)
      - Evaluated it experimentally on Grid’5000
      None of the methods is perfect:
      - CPU-Freq: coarse-grained
      - CPU-Lim: implementation problems, not scalable
      - Fracas: works perfectly in the single-thread/single-process case; needs work in the multithread/multiprocess case
      Questions?
