Automatic Management of TurboMode (David Lo, Christos Kozyrakis)


  1. Automatic Management of TurboMode
     David Lo, Christos Kozyrakis
     Stanford University
     http://mast.stanford.edu

  2. Executive Summary
     - TurboMode overclocks cores to exhaust thermal budget
       - An important performance feature of multi-core x86 servers
     - Challenge: TurboMode does not always benefit workloads
       - Naively turning TurboMode on often leads to high energy waste
     - Solution: predictive model to manage TurboMode (on/off)
       - Uses machine learning on performance counter data
       - Eliminates negative cases, boosts ED and ED² by 47% and 68%
     HPCA-20, February 19, 2014

  3. What is TurboMode (TM)?
     - Dynamic overclocking of cores to exhaust thermal budget
       - Matches actual power consumption to max design TDP
       - Big performance gains: up to 60% frequency boost
       - Found on all modern x86 multi-cores
     - TurboMode control
       - Black-box HW control decides when and how much to overclock
       - SW has limited control: can only turn TurboMode on/off
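
The slides do not say which software interface performs the on/off switch. As a minimal sketch, assuming a Linux host that uses the intel_pstate driver (an assumption, not something the slides state), the driver's `no_turbo` sysfs file can serve as that switch. The helper names `set_turbo` and `turbo_enabled` are illustrative only.

```python
from pathlib import Path

# Assumption: Linux with the intel_pstate driver, which exposes this file.
# Writing "1" disables turbo and "0" enables it; root privileges are needed.
NO_TURBO = Path("/sys/devices/system/cpu/intel_pstate/no_turbo")


def set_turbo(enabled: bool, control_file: Path = NO_TURBO) -> None:
    """Turn TurboMode on or off by writing the inverted flag."""
    control_file.write_text("0" if enabled else "1")


def turbo_enabled(control_file: Path = NO_TURBO) -> bool:
    """Return True when the control file reports turbo as enabled."""
    return control_file.read_text().strip() == "0"
```

Passing the path as a parameter keeps the helpers usable on other platforms that expose a different file, and lets them be exercised against a scratch file without root access.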

  4. Characterizing TurboMode
     - Evaluate the effects of TM across the board
       - Efficiency metrics: EDP, ED²P, throughput/W, throughput/$, …
       - Many hardware platforms: Intel/AMD, server/notebook
       - Many workloads: SpecCPU, SpecPower, websearch, …
     - Characterization
       - Run with TM on and TM off
       - Compare impact on all efficiency metrics

  5. Efficiency Metrics
     - Guidelines
       - We all care about performance and energy consumption
       - Capture both latency and throughput workloads
     - Metric recap
       - ED: latency & energy
       - ED²: latency & energy, weighted more towards latency (think servers)
       - Throughput/W: throughput & energy
       - Throughput/$: throughput & cost efficiency (think datacenter TCO)
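
The metrics recapped above can be sketched in a few lines, assuming the usual definitions (ED as energy times delay, ED² as energy times delay squared, throughput/W as throughput over average power). The function names and the sample numbers are made up for illustration and are not measurements from the talk. Throughput/$ is left out because it needs a cost model that the slides do not give.

```python
def ed(energy_j: float, delay_s: float) -> float:
    """Energy-delay product; lower is better."""
    return energy_j * delay_s


def ed2(energy_j: float, delay_s: float) -> float:
    """Energy-delay-squared product; weights latency more heavily."""
    return energy_j * delay_s ** 2


def throughput_per_watt(work_items: float, seconds: float,
                        avg_power_w: float) -> float:
    """Throughput divided by average power; higher is better."""
    return (work_items / seconds) / avg_power_w


def improvement_pct(tm_off: float, tm_on: float,
                    lower_is_better: bool = True) -> float:
    """Percent improvement of the TM-on run over the TM-off run."""
    if lower_is_better:
        return 100.0 * (tm_off - tm_on) / tm_off
    return 100.0 * (tm_on - tm_off) / tm_off


# Hypothetical run: TM cuts delay from 10 s to 9 s but raises
# energy from 100 J to 120 J.
ed_gain = improvement_pct(ed(100, 10), ed(120, 9))
ed2_gain = improvement_pct(ed2(100, 10), ed2(120, 9))
print(f"ED: {ed_gain:+.1f}%  ED2: {ed2_gain:+.1f}%")
```

With these made-up numbers the same run looks worse under ED and better under ED², which is why the choice of metric matters when deciding whether TurboMode helps.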

  6. Evaluation Hardware
     - Intel Sandy Bridge server [SBServer]: 19% max boost
     - Intel Sandy Bridge mobile [SBMobile]: 44% max boost
     - AMD Interlagos [ILServer]: 59% max boost
     - Intel Ivy Bridge server [IBServer]: 12% max boost
     - Intel Haswell server [HServer]: 13% max boost

  7. Evaluation Workloads
     - Representative of multiple domains
       - CPU, memory, and IO workloads
     - Single-threaded SpecCPU benchmarks
     - Multi-programmed SpecCPU mixes
     - Multi-threaded PARSEC
     - Enterprise SPECpower_ssj2008
     - Websearch
     (annotation: >100 configs)

  8. Observation: No Optimal On/Off Setting
     [Figure: % improvement over TurboMode off for Mix 1, Mix 2, and Websearch on Sandy Bridge Server, Interlagos Server, and Sandy Bridge Mobile; series: ED, ED², QPS/W, QPS/$; y-axis ticks from -50% to 75%, plus a 127% label]

  9. Observation: TM Leads to High Variance on Efficiency
     [Figure: Sandy Bridge Server ED² % improvement (y-axis, -30% to 30%) across app mixes 1 to 82 (x-axis)]
     - ~50% of mixes benefit from TM
     - ~50% of mixes suffer due to TM

  10. Characterization Analysis
      - TurboMode mostly benefits CPU-bound workloads
        - Boost in performance and efficiency from higher frequency
        - SpecCPU mixes of CPU-intensive workloads, SpecPower, websearch, …
      - TurboMode is ineffective when memory/IO bound
        - Interference on memory/IO really aggravates this
        - Small/no performance gain, high energy waste with higher frequency
        - SpecCPU mixes of memory-intensive workloads, canneal, streamcluster, …
      - Applications have multiple phases
        - CPU bound vs. memory/IO bound
        - SpecCPU mixes

  11. TurboMode Control
      - Naïve TM control
        - Always off: misses the boost on CPU-bound applications
        - Always on: suffers inefficiency on interference-bound applications
      - Need dynamic TM control
        - Understands the applications running and the metric of interest
        - Predicts the optimal setting (on/off), adjusts dynamically to phases
        - No a priori knowledge of applications, no new hardware needed

  12. Predictive Model for TurboMode
      - Idea: use runtime info to dynamically predict TM benefits
        - Focus primarily on detecting memory interference
      - Build predictive model based on performance counters
        - Use performance counters & model to predict interference severity
        - If too severe, turn off TurboMode

  13. Autoturbo: Predictive Control for TurboMode
      [Diagram: autoturbo samples performance counters per core (cores 1 to N, each running an app); a classifier built from training data derives app properties per core; a TM on/off heuristic, given the metric of interest, enables or disables TurboMode]
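
One step of the control flow on this slide can be sketched as follows. This is only an illustration under stated assumptions: the slides do not describe the actual on/off heuristic, so the rule used here (turn TM off when more than a chosen fraction of cores look interference-bound), the counter name `mem_stall_fraction`, and both function names are hypothetical.

```python
from typing import Callable, Dict, List

Counters = Dict[str, float]  # one sample of performance counters for a core


def decide_turbo(per_core_counters: List[Counters],
                 is_interference_bound: Callable[[Counters], bool],
                 max_bound_fraction: float = 0.5) -> bool:
    """Return True to keep TurboMode on, False to turn it off.

    Hypothetical heuristic: classify each core, then disable TM when
    the share of interference-bound cores exceeds max_bound_fraction.
    """
    if not per_core_counters:
        return True  # nothing sampled, so leave TM on
    bound = sum(1 for c in per_core_counters if is_interference_bound(c))
    return bound / len(per_core_counters) <= max_bound_fraction


def toy_classifier(counters: Counters) -> bool:
    """Stand-in for the trained classifier; the 0.6 cutoff is made up."""
    return counters["mem_stall_fraction"] > 0.6


cpu_bound = [{"mem_stall_fraction": 0.1}, {"mem_stall_fraction": 0.2}]
mem_bound = [{"mem_stall_fraction": 0.9}, {"mem_stall_fraction": 0.8},
             {"mem_stall_fraction": 0.1}]
print(decide_turbo(cpu_bound, toy_classifier))  # keep TM on
print(decide_turbo(mem_bound, toy_classifier))  # turn TM off
```

Taking the classifier as a callable keeps the decision rule separate from the model, so a trained predictor can replace `toy_classifier` without touching the heuristic.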

  14. Training the Predictive Model
      [Diagram stages: raw training data, feature selection, model selection]
      - Training data
        - Single SpecCPU, TurboMode on
        - Single SpecCPU, TurboMode off
        - Single SpecCPU + stream, TurboMode on
        - Single SpecCPU + stream, TurboMode off
      - Model selection
        - Naïve Bayes: 85%
        - Logistic Regression: 81%
        - Nearest Neighbors: 73%
        - Decision Tree: 75%
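
To make the model-selection step concrete, here is a small Gaussian Naïve Bayes classifier, one of the model families listed on this slide, written in plain Python. It is a sketch, not the authors' training pipeline: the two features, the labels, and every data value below are synthetic and chosen only to show the mechanics.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(samples, labels):
    """Return {label: (prior, per-feature means, per-feature variances)}."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    model = {}
    for y, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-6
                     for col, m in zip(columns, means)]
        model[y] = (n / len(samples), means, variances)
    return model


def predict(model, x):
    """Pick the label with the highest log posterior for sample x."""
    def log_posterior(params):
        prior, means, variances = params
        total = math.log(prior)
        for v, m, var in zip(x, means, variances):
            total += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return total
    return max(model, key=lambda label: log_posterior(model[label]))


# Synthetic samples: [fraction of cycles stalled on memory, L2 MPKI].
X = [[0.10, 1.0], [0.15, 2.0], [0.20, 1.5],
     [0.70, 20.0], [0.80, 25.0], [0.75, 18.0]]
y = ["tm_on", "tm_on", "tm_on", "tm_off", "tm_off", "tm_off"]

model = fit_gaussian_nb(X, y)
print(predict(model, [0.12, 1.2]))
print(predict(model, [0.78, 22.0]))
```

In practice a library implementation and cross-validation on real counter traces would replace this toy setup.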

  15. Model Validation
      - Model accuracy: ~90% on cross-validation
      - Best counters: those that indicate a memory-bound workload
        - SBServer/SBMobile: % cycles with outstanding memory requests, …
        - ILServer: L2 MPKI, # requests to memory/instruction, …
      - CPU/thermal intensity counters don’t correlate strongly!
        - E.g., floating-point intensity counters

  16. Autoturbo Evaluation
      - Used autoturbo in conjunction with workloads
        - Evaluation workloads are apps other than single-thread SpecCPU
        - Measure efficiency metrics
      - Compare against
        - Baseline: TurboMode is always off
        - Naïve TM: TurboMode is always on
        - Static oracle: TurboMode on if it leads to a benefit for the overall run
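
The three comparison points above can be expressed as a short sketch, assuming each workload has one measured score with TM off and one with TM on, and that higher scores are better (as with QPS/W). The function names and the sample scores are hypothetical.

```python
def policy_scores(runs):
    """runs: list of (score_tm_off, score_tm_on); higher is better.

    Returns per-run scores for the baseline (always off), the naive
    policy (always on), and the static oracle (best single setting
    for each whole run).
    """
    baseline = [off for off, _ in runs]
    naive = [on for _, on in runs]
    oracle = [max(off, on) for off, on in runs]
    return baseline, naive, oracle


def gain_over_baseline_pct(scores, baseline):
    """Percent gain of each score over the matching baseline score."""
    return [100.0 * (s - b) / b for s, b in zip(scores, baseline)]


# Hypothetical scores: TM helps the first run and hurts the second.
runs = [(100.0, 110.0), (100.0, 90.0)]
baseline, naive, oracle = policy_scores(runs)
print(gain_over_baseline_pct(naive, baseline))
print(gain_over_baseline_pct(oracle, baseline))
```

Because the static oracle fixes one setting for an entire run, a controller that switches during a run can in principle do better on workloads with phases.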

  17. Autoturbo Results
      [Figure, two panels: QPS/$ improvement on Sandy Bridge Mobile (y-axis -10% to 10%, app mixes 1 to 35) and ED² improvement on Sandy Bridge Server (y-axis -40% to 40%, app mixes 1 to 82); series: Naïve, Auto, Static Oracle; annotations: "Gains over never using TurboMode", "Gains over always using TurboMode"]

  18. Autoturbo Analysis
      - Autoturbo gets the best of both worlds
        - Reduces cases where TM causes efficiency degradation
        - Keeps cases where TM leads to benefits
      - Autoturbo often disables TM even though it is beneficial
        - Cause: the interference predictor assumes worst-case interference
      - Autoturbo beats the static oracle
        - Cause: autoturbo can take advantage of dynamism during the run

  19. Conclusions
      - TurboMode is useful but must be managed dynamically
      - This work: dynamic TurboMode control
        - Predictive model for memory interference
        - Dynamic control with no hand-tuning needed
        - Eliminates efficiency drops, maintains efficiency gains of TurboMode
      - Future work
        - Apply a similar approach to manage advanced power settings

  20. Autoturbo Dealing with a Phase Change
      [Figure: autoturbo dynamic adjustment on Sandy Bridge Mobile; frequency (GHz, 2.40 to 3.00) vs. time (s, 215 to 295); annotation: "Memory interference occurs mid-workload"]
