The Computational Sprinting Game, Songchun Fan, Seyed Majid Zahedi - PowerPoint PPT Presentation

  1. The Computational Sprinting Game • Songchun Fan∗, Seyed Majid Zahedi∗, Benjamin C. Lee • {songchun.fan, seyedmajid.zahedi, benjamin.c.lee}@duke.edu • [∗Co-First Authors]

  2. Computational Sprinting • Supply extra power to enhance performance for short durations • Activate more cores, boost voltage/frequency

  3. Computational Sprinting • Supply extra power to enhance performance for short durations • Activate more cores, boost voltage/frequency [Figure: normalized speedup, normalized power, and average temperature (°C), non-sprinting vs. sprinting, for each benchmark application]

  4. Sprinting Architecture • Power for sprints supplied by shared rack • Heat from sprints absorbed by thermal packages Fig. www.fortlax.se and Raghavan, Arun, et al. “Computational sprinting on a hardware/software testbed.”

  5. Power Emergencies • Sprints may trip breaker • Current ↑ with sprinters • Time ↑ with sprint duration • Risk ↑ with current, time [Figure: breaker trip curve, duration of current draw (sec) vs. current normalized to rated current; regions for long-delay conventional tripping and short circuit, with a non-deterministic tolerance band between P_trip = 0 (not tripped) and P_trip = 1 (tripped)] Fig. Fu, Wang, and Lefurgy. “How much power oversubscription is safe and allowed in data centers.”

  6. Uninterruptible Power Supplies • When sprints trip breaker, draw on batteries • When sprints complete, recharge batteries Fig. www.amper-ecuador.com

  7. Example – Private Clouds • Applications compute on servers that share power • Processors sprint independently • Processors sprint selfishly for performance Fig. Google, www.lasknet.net

  8. Sprinting Management When should processors sprint? • Phases with higher performance from sprints • But sprints prohibited as chip cools Which processors should sprint? • Processors that benefit most from sprints • But sprints prohibited as batteries recover

  9. Management Desiderata Individual Performance • Sprints account for phase behavior • Sprints now constrain future sprints System Stability • Sprints account for others’ sprinting strategies • Sprints risk power emergencies

  10. Sprinting Strategy • Optimize sprints given constraints • Sprint, wait Δ_cooling for chip cooling • Sprint, wait Δ_recovery for rack recovery if breaker trips [Figure: utility from sprint per epoch, epochs 0 to 30]

  11. Sprinting Strategy • Optimize sprints given constraints • Sprint, wait Δ_cooling for chip cooling • Sprint, wait Δ_recovery for rack recovery if breaker trips [Figure: utility from sprint per epoch, epochs 0 to 30, with some points marked ×]
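
As a rough illustration of "optimize sprints given constraints", the sketch below assumes an offline oracle that already knows one processor's per-epoch sprint utility, models only the cooling constraint (Δ_cooling counted in whole epochs; breaker trips and Δ_recovery are ignored), and compares a greedy schedule with the best schedule found by dynamic programming. The function names are hypothetical, not taken from the talk.

```python
from typing import List, Tuple

def greedy_sprints(utility: List[float], cooling: int) -> Tuple[float, List[int]]:
    """Sprint whenever the chip is not cooling (no look-ahead)."""
    total, schedule, t = 0.0, [], 0
    while t < len(utility):
        total += utility[t]
        schedule.append(t)
        t += cooling + 1  # epochs t+1 .. t+cooling are spent cooling
    return total, schedule

def optimal_sprints(utility: List[float], cooling: int) -> Tuple[float, List[int]]:
    """Best total utility when every future utility is known in advance."""
    n = len(utility)
    # best[t] = best achievable utility from epoch t onward (0 past the end).
    best = [0.0] * (n + cooling + 1)
    for t in range(n - 1, -1, -1):
        skip = best[t + 1]
        sprint = utility[t] + best[t + cooling + 1]
        best[t] = max(skip, sprint)
    # Replay the decisions forward to recover which epochs to sprint in.
    schedule, t = [], 0
    while t < n:
        if utility[t] + best[t + cooling + 1] >= best[t + 1]:
            schedule.append(t)
            t += cooling + 1
        else:
            t += 1
    return best[0], schedule
```

With utilities [5, 8, 5, 5, 7, 5] and Δ_cooling = 1, greedy sprints at epochs 0, 2, 4 for a total of 17, while the optimal schedule skips epoch 0 to catch the high-utility epoch 1 and sprints at 1, 3, 5 for 18: a sprint now constrains future sprints.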

  12. Game Theory Study strategic agents • Agents selfishly maximize individual utility Optimize responses • Response maximizes utility, given others’ strategies Find equilibrium • State where all agents play their best responses

  13. Sprinting Game States • Active – can sprint • Cooling – cannot sprint, chip cooling • Recovery – cannot sprint, batteries recharging Actions • Sprint or not, when active Strategies • Agent’s state, app’s phase, history, ... • Others’ strategies, utilities, and states, ...
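
The states and actions above can be read as a per-epoch state machine. The sketch assumes that a breaker trip moves every agent not already recovering into Recovery, and that the end of cooling and the end of recovery arrive as external signals (`cooled` and `recovered` are hypothetical names); the agent itself only chooses whether to sprint while Active.

```python
from enum import Enum

class State(Enum):
    ACTIVE = "active"      # can sprint
    COOLING = "cooling"    # cannot sprint, chip cooling
    RECOVERY = "recovery"  # cannot sprint, batteries recharging

def next_state(state: State, sprint: bool, tripped: bool,
               cooled: bool, recovered: bool) -> State:
    """One epoch of the per-agent state machine.

    sprint    -- the agent's action this epoch (only legal when ACTIVE)
    tripped   -- the shared breaker tripped this epoch (system-wide event)
    cooled    -- hypothetical signal: the chip finished cooling this epoch
    recovered -- hypothetical signal: the batteries finished recharging
    """
    if sprint and state is not State.ACTIVE:
        raise ValueError("only active agents may sprint")
    if state is State.RECOVERY:
        return State.ACTIVE if recovered else State.RECOVERY
    if tripped:
        # Assumption: a trip sends every non-recovering agent to recovery.
        return State.RECOVERY
    if state is State.COOLING:
        return State.ACTIVE if cooled else State.COOLING
    return State.COOLING if sprint else State.ACTIVE
```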

  14. Mean Field Equilibrium (MFE) Challenges • Large system with many agents • Complex strategies and many competitors • Intractable optimization for best response Solution • Abstract many agents with statistical distributions • Optimize agents’ strategies against expectations

  15. Equilibrium Strategy Agents maximize expected value of (not) sprinting • Current state • Utility from sprinting, u • Probability of tripping, P_trip Agents employ threshold strategy • If active and u ≥ u_T, then sprint
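
Why a single threshold suffices can be shown with a minimal sketch. It assumes, beyond what the slide states, that utilities u are drawn i.i.d. each epoch from profiled samples, that cooling and recovery end with fixed per-epoch probabilities p_cool and p_recover, that future utility is discounted by a factor δ, that not sprinting adds no utility, and that a trip sends the agent to Recovery whether or not it sprinted. Under those assumptions, value iteration over Active, Cooling, and Recovery gives values V_A, V_C, V_R, and comparing the two choices in the Active state yields: sprint iff u ≥ u_T = δ(1 − P_trip)(V_A − V_C).

```python
from typing import Sequence

def best_response_threshold(utilities: Sequence[float], p_trip: float,
                            p_cool: float, p_recover: float,
                            discount: float = 0.9, iters: int = 500) -> float:
    """Return u_T for one agent, treating P_trip as fixed (mean-field view).

    utilities -- sampled per-epoch sprint utilities u (empirical distribution)
    p_cool    -- per-epoch probability that cooling ends (hypothetical model)
    p_recover -- per-epoch probability that recovery ends (hypothetical model)
    """
    v_a = v_c = v_r = 0.0  # values of Active, Cooling, Recovery
    for _ in range(iters):
        # Continuation value after sprinting vs. not sprinting while active.
        after_sprint = discount * ((1 - p_trip) * v_c + p_trip * v_r)
        after_idle = discount * ((1 - p_trip) * v_a + p_trip * v_r)
        new_a = sum(max(u + after_sprint, after_idle) for u in utilities) / len(utilities)
        new_c = discount * ((1 - p_trip) * (p_cool * v_a + (1 - p_cool) * v_c)
                            + p_trip * v_r)
        new_r = discount * (p_recover * v_a + (1 - p_recover) * v_r)
        v_a, v_c, v_r = new_a, new_c, new_r
    # Sprint iff u + after_sprint >= after_idle, i.e. u >= discount*(1-P_trip)*(V_A-V_C).
    return discount * (1 - p_trip) * (v_a - v_c)
```

For utilities {5, 6, 7, 8}, P_trip = 0.1, p_cool = 0.5, p_recover = 0.2 and δ = 0.9, the threshold converges to about 3.75 (that is, 0.81 × 6.5 / 1.405), below every sampled utility, so this agent would sprint whenever it is active. With P_trip = 1 the threshold drops to 0: if the breaker trips regardless, there is nothing to lose by sprinting.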

  16. Find Equilibrium – Offline • Initialize probability of breaker trip P_trip • Given P_trip, optimize threshold strategy u_T • Given u_T, estimate number of sprinters N • Given N, update probability P′_trip • Iterate if P′_trip ≠ P_trip
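
The offline loop can be sketched as a damped fixed-point iteration. Everything beyond the four steps on the slide is an assumption for illustration: a value-iteration best response (i.i.d. utilities, geometric cooling and recovery, discount 0.9), a mean-field count N = agents × Pr[active] × Pr[u ≥ u_T] taken from the stationary distribution of each agent's state chain, a hypothetical linear tolerance band between `n_safe` and `n_max` sprinters for P_trip, and a damping factor added to keep the iteration from oscillating.

```python
from typing import Sequence, Tuple

def best_threshold(us: Sequence[float], p_trip: float, p_cool: float,
                   p_rec: float, disc: float = 0.9, iters: int = 300) -> float:
    """Best-response u_T given P_trip, by value iteration on Active/Cooling/Recovery."""
    va = vc = vr = 0.0
    for _ in range(iters):
        after_sprint = disc * ((1 - p_trip) * vc + p_trip * vr)
        after_idle = disc * ((1 - p_trip) * va + p_trip * vr)
        na = sum(max(u + after_sprint, after_idle) for u in us) / len(us)
        nc = disc * ((1 - p_trip) * (p_cool * va + (1 - p_cool) * vc) + p_trip * vr)
        nr = disc * (p_rec * va + (1 - p_rec) * vr)
        va, vc, vr = na, nc, nr
    return disc * (1 - p_trip) * (va - vc)

def expected_sprinters(n_agents: int, s: float, p_trip: float,
                       p_cool: float, p_rec: float, iters: int = 1000) -> float:
    """N = agents * Pr[active] * Pr[u >= u_T], from the chain's stationary distribution."""
    a, c, r = 1.0, 0.0, 0.0  # fractions of agents in Active, Cooling, Recovery
    for _ in range(iters):
        na = (1 - p_trip) * ((1 - s) * a + p_cool * c) + p_rec * r
        nc = (1 - p_trip) * (s * a + (1 - p_cool) * c)
        nr = p_trip * (a + c) + (1 - p_rec) * r
        # Lazy (half-step) update: same stationary distribution, avoids oscillation.
        a, c, r = (a + na) / 2, (c + nc) / 2, (r + nr) / 2
    return n_agents * a * s

def trip_probability(n: float, n_safe: float, n_max: float) -> float:
    """Hypothetical tolerance band: 0 below n_safe sprinters, 1 above n_max (n_max > n_safe)."""
    return min(1.0, max(0.0, (n - n_safe) / (n_max - n_safe)))

def find_equilibrium(us: Sequence[float], n_agents: int, n_safe: float, n_max: float,
                     p_cool: float, p_rec: float, damping: float = 0.5,
                     tol: float = 1e-6, max_rounds: int = 200) -> Tuple[float, float, bool]:
    p_trip = 0.0                                                 # initialize P_trip
    for _ in range(max_rounds):
        u_t = best_threshold(us, p_trip, p_cool, p_rec)          # optimize u_T given P_trip
        s = sum(u >= u_t for u in us) / len(us)                  # Pr[u >= u_T]
        n = expected_sprinters(n_agents, s, p_trip, p_cool, p_rec)  # estimate N given u_T
        p_new = trip_probability(n, n_safe, n_max)               # update P'_trip given N
        if abs(p_new - p_trip) < tol:                            # P'_trip == P_trip: done
            return u_t, p_new, True
        p_trip = damping * p_trip + (1 - damping) * p_new        # damped step (our choice)
    return u_t, p_trip, False
```

`find_equilibrium([5, 6, 7, 8], n_agents=100, n_safe=10, n_max=40, p_cool=0.5, p_rec=0.2)` returns (u_T, P_trip, converged). Because Pr[u ≥ u_T] is a step function of u_T, an exact fixed point may not exist for a small utility sample, which is why the function reports whether it converged instead of assuming so.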

  17. Execute Strategy – Online If active and u ≥ u_T, then sprint

  18. Sprinting Thresholds [Figure: density of utility from sprint for Linear Regression and PageRank] • Thresholds are optimal and diverse • Agents behave strategically to maximize performance

  19. Management Architecture [Figure: a coordinator (Alg. 1) connected by Profile and Strategy flows to many users, each with an agent, a predictor, and an executor engine running tasks] • Offline: coordinator profiles utility, optimizes thresholds • Online: predictors estimate sprint utility • Online: agents apply threshold strategy • Online: executor adapts computation
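
One way to picture the online half of this architecture is an agent object that holds the threshold u_T received from the coordinator, plus a predictor for the next epoch's sprint utility. The exponentially weighted moving average used as the predictor, and its `alpha` weight, are assumptions for illustration; the talk does not say how predictors work.

```python
from dataclasses import dataclass

@dataclass
class Agent:
    threshold: float          # u_T, computed offline by the coordinator
    alpha: float = 0.5        # hypothetical EWMA weight for the predictor
    predicted: float = 0.0    # current estimate of next-epoch sprint utility

    def observe(self, measured: float) -> None:
        """Predictor (assumed EWMA): fold the last epoch's measured utility into the estimate."""
        self.predicted = self.alpha * measured + (1 - self.alpha) * self.predicted

    def decide(self, active: bool) -> bool:
        """Threshold strategy: sprint only if active and predicted u >= u_T."""
        return active and self.predicted >= self.threshold
```

With u_T = 6 and two observed epochs of utility 8, the prediction rises from 4 to 6, so the agent starts sprinting after the second observation, provided it is active.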

  20. Experimental Methodology Sprinting • 3 cores @ 1.2 GHz → 12 cores @ 2.7 GHz Workloads • Apache Spark • Spark engine dynamically schedules tasks on active cores Performance Metric • Tasks completed per second (TPS) Simulation Method • R-based simulator using traces of Spark computation

  21. Management Policies Greedy • Sprint if neither cooling nor recovering Exponential Back-off • Sprint if neither cooling nor recovering • Wait randomly for U[0, 2^k] epochs after k-th trip Cooperative Threshold • Enforce globally optimized threshold Equilibrium Threshold • Announce decentralized, strategic threshold
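
The four policies differ only in the decision rule applied while an agent is active. A minimal sketch, assuming integer epochs, a back-off wait drawn uniformly from the integers 0 to 2^k inclusive, a wait that counts down every epoch (including epochs spent cooling or recovering), and a trip counter k that never resets; the talk does not specify these details. The cooperative and equilibrium policies share the same threshold rule and differ only in where u_T comes from: a globally optimized value that must be enforced, or the announced equilibrium value.

```python
import random
from typing import Optional

def greedy(active: bool) -> bool:
    """Greedy: sprint whenever neither cooling nor recovering."""
    return active

def threshold_policy(active: bool, u: float, u_t: float) -> bool:
    """Cooperative or equilibrium threshold: sprint if active and u >= u_T."""
    return active and u >= u_t

class ExponentialBackoff:
    """Sprint when active, but after the k-th trip wait U[0, 2^k] epochs."""

    def __init__(self, rng: Optional[random.Random] = None) -> None:
        self.trips = 0      # k: trips seen so far (assumed never to reset)
        self.wait = 0       # epochs left to wait before sprinting again
        self.rng = rng if rng is not None else random.Random()

    def on_trip(self) -> None:
        self.trips += 1
        # randint is inclusive at both ends: integer wait in [0, 2^k].
        self.wait = self.rng.randint(0, 2 ** self.trips)

    def decide(self, active: bool) -> bool:
        if self.wait > 0:
            self.wait -= 1  # assumed to count down every epoch, active or not
            return False
        return active
```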

  22. Case for Equilibria [Figure: Equilibrium and Cooperative policies rated + or – on performance and stability] • Cooperative (+): maximize global performance • Equilibrium (+): remove incentives to deviate

  23. Case for Equilibria [Figure: Equilibrium and Cooperative policies rated + or – on performance and stability] • Cooperative (+): maximize global performance • Equilibrium (+): remove incentives to deviate • Cooperative (–): enforce strategies globally

  24. Case for Equilibria [Figure: Equilibrium and Cooperative policies rated + or – on performance and stability] • Cooperative (+): maximize global performance • Equilibrium (+): remove incentives to deviate • Cooperative (–): enforce strategies globally • Equilibrium (+): maximize individual performance

  25. Sprinting Behavior [Figure: number of sprinting users (0 to 600) per epoch index (0 to 1000) under Greedy, Exponential Backoff, Cooperative Threshold, and Equilibrium Threshold]

  26. Sprinting Performance [Figure: performance normalized to Greedy, per benchmark application, for Greedy, Exponential Backoff, Equilibrium Threshold, and Cooperative Threshold] • Greedy – aggressive, incurs emergencies • Exponential – conservative, untimely sprints • Equilibrium – strategic, produces equilibrium • Cooperative – optimal, requires enforcement

  27. Game States [Figure: share of time (0% to 100%) spent active (not sprinting), sprinting, in local cooling, and in global recovery, for Greedy, Exponential, Equilibrium, and Cooperative] • Greedy – time in recovery • Exponential – untimely sprints • Equilibrium – timely sprints • Cooperative – timely sprints

  28. Conclusion Management with game theory • Agents sprint according to threshold – inexpensive • Agents have no incentives to deviate – stable • Agents optimize response – high performance Future directions • Use game theory to manage scarce resources • E.g., big/small processors, accelerators

  29. Thank you Questions?
