Star-Cap: Cluster Power Management Using Software-Only Models
John D. Davis, Suzanne Rivoire (rivoire@sonoma.edu), Moisés Goldszmidt (Microsoft Research)
ICPP Workshop on Power-aware Algorithms, Systems, and Architectures (PASA), Sept. 10, 2014
Power capping motivation
o Reduce waste from overprovisioning
o Provision for actual maximum power instead of the sum of nameplate power
o Have a mechanism to throttle power consumption
o Major server manufacturers offer this feature; Intel offers it at the chip level (RAPL)
  [Femal ICAC '05, Ranganathan ISCA '06, Lefurgy ICAC '07, ...]
The problem with vendor solutions
o Requires additional management hardware at additional cost, or is limited to the chip
o Compare to the trend of customized bare-bones servers...
o ...and "wimpy nodes" for data-intensive workloads
Goal: eliminate the cost of hardware instrumentation
Outline
o Star-Cap overview
o Software-only power models
o Power capping schemes
o Evaluation
Two-level scheme
[Figure: power management policies at the top level; per-machine power model and power control on Machine 1 through Machine N]
o Top level: determine node power budgets
o Node level: enforce and report
Sensors and Actuators
o Sensors: OS-level, architecture-independent performance counters
o Actuators:
  n For this work, DVFS states
  n Nothing prevents other mechanisms from being used
Outline
o Star-Cap overview
o Software-only power models
o Power capping schemes
o Evaluation
OS-level counters
[Figure: OS-level counters -> f(x) -> node AC power]
o Full-system, not a specific component
o OS-level, architecture-independent counters
o Piecewise quadratic model, fit with MARS [Davis et al., IISWC '12]
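A MARS-fit model is a sum of hinge basis functions and their products. The sketch below shows the shape of such a piecewise quadratic predictor; the counter names, knots, and coefficients are invented for illustration, not the model from the paper.

```python
# Sketch of a MARS-style piecewise quadratic power model.
# Feature names, knots, and coefficients are illustrative only.

def hinge(x, knot):
    """MARS basis function: max(0, x - knot)."""
    return max(0.0, x - knot)

def predict_power(counters):
    """Predict full-system AC power (W) from OS-level counters.

    `counters` maps counter names to values; the specific features
    and coefficients here are assumptions for the sketch.
    """
    cpu = counters["cpu_util"]    # percent, 0-100
    disk = counters["disk_time"]  # percent, 0-100
    p = 25.0                      # baseline near idle power
    p += 0.12 * hinge(cpu, 10.0)  # linear hinge term
    p += 0.002 * hinge(cpu, 50.0) * hinge(disk, 5.0)  # degree-2 product term
    return p
```

The product of two hinges is what makes the model piecewise quadratic rather than piecewise linear: it captures interactions such as disk activity costing more power when the CPU is also busy.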
Model training process
1. ETW (Event Tracing for Windows): ~250 architecture counters
   n Processor, physical and logical disk, network, memory, filesystem
2. Remove redundant counters: ~45 remain
   n Correlation matrix (|correlation| > 0.95)
   n Performance counter definitions
3. Select features: ~10 remain
   n R glmpath with L1 regularization
   n Stepwise refinement
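Step 2 above can be sketched as a greedy pass over the correlation matrix, keeping a counter only if it is not highly correlated with one already kept. This is an assumed implementation of the pruning rule; the full pipeline also consults counter definitions and finishes with L1-regularized selection in R glmpath.

```python
import numpy as np

def drop_redundant(X, names, threshold=0.95):
    """Greedily drop counters whose |pairwise correlation| with an
    already-kept counter exceeds the threshold.

    X: (samples, counters) array; names: column labels for X.
    """
    corr = np.corrcoef(X, rowvar=False)
    keep = []
    for j in range(X.shape[1]):
        if all(abs(corr[j, k]) <= threshold for k in keep):
            keep.append(j)
    return [names[j] for j in keep]
```

For example, a counter that is simply a scaled copy of another (correlation 1.0) is dropped, while a weakly correlated counter survives.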
Outline
o Star-Cap overview
o Software-only power models
o Power capping schemes
o Evaluation
Star-Cap Overview
o Inputs to all schemes
  n Target node-level power consumption (set at top level)
  n Current power (modeled or measured)
  n List of available frequency states
o Outputs
  n List of frequency states available to the OS
  n Let the current OS policy select from the available states
Threshold-based
o If P_current < P_lo
  n Make the next highest frequency state available
o If P_current > P_hi
  n Remove the highest frequency state from the available list
o Our thresholds:
  n P_hi = 95% of cap
  n P_lo = 90% of cap
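The threshold rule above amounts to a small state machine over the DVFS list. A minimal sketch, with the function name and the `removed` bookkeeping as assumptions:

```python
def adjust_states(p_current, cap, states, removed):
    """Threshold-based capping step.

    states: all DVFS frequency states, highest first, e.g. [100, 94, 82, 70].
    removed: how many of the top (highest-frequency) states are
             currently disabled.
    Thresholds follow the paper: P_hi = 95% of cap, P_lo = 90% of cap.
    Returns the list of states the OS policy may choose from.
    """
    p_hi, p_lo = 0.95 * cap, 0.90 * cap
    if p_current > p_hi and removed < len(states) - 1:
        removed += 1      # remove the highest remaining frequency
    elif p_current < p_lo and removed > 0:
        removed -= 1      # make the next highest state available again
    return states[removed:]
```

Note the controller only restricts the menu of states; the OS's existing DVFS policy still picks within it, as the previous slide describes.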
Reactive Capping (ReCap)
o Adjust frequency state based on P_current
o After making a change, wait for it to settle before making another (reduces oscillations)
o Three versions:
  n M-ReCap: P_current is measured power
  n L-ReCap: P_current is predicted by a CPU-utilization-based linear model
  n C-ReCap: P_current is predicted by the quadratic power model from the previous section
Proactive Capping (ProCap)
o Use the quadratic power model to predict P_current
o Before changing the available frequencies, predict P_next
  n Using the next allowable frequency state
  n Keeping all other counters constant (an oversimplification!)
o If P_next would violate the threshold, don't bother adjusting the available frequencies
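The proactive check can be sketched as follows. The helper names (`procap_step`, `predict_power`) and the toy stand-in model are assumptions, not the paper's API; `predict_power(counters, freq)` evaluates the power model at frequency `freq` with all other counters held constant, the oversimplification noted above.

```python
def procap_step(counters, cap, states, removed, predict_power):
    """One ProCap decision.

    states: DVFS frequencies, highest first; the top `removed` entries
    are currently disabled. Returns the updated `removed` count.
    """
    p_hi, p_lo = 0.95 * cap, 0.90 * cap
    p_current = predict_power(counters, states[removed])
    if p_current > p_hi and removed < len(states) - 1:
        return removed + 1                 # over threshold: throttle down
    if p_current < p_lo and removed > 0:
        # Predict power at the next allowable (higher) frequency first
        p_next = predict_power(counters, states[removed - 1])
        if p_next <= p_hi:                 # only raise if prediction is safe
            return removed - 1
    return removed

# Toy linear stand-in for the power model, for illustration only
toy_model = lambda c, f: 25.0 + 0.3 * c["cpu_util"] * f / 100.0
```

Compared to ReCap, the extra `p_next` prediction avoids re-enabling a frequency that the model expects would immediately violate the cap again.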
Outline
o Star-Cap overview
o Software-only power models
o Power capping schemes
o Evaluation
Workloads
[Figure: power traces for Primes, Staticrank, Sort, and Wordcount]
o Primes (CPU)
o Staticrank (Net)
o Sort (Disk, Net)
o Wordcount (Disk)
o All run across 5 homogeneous nodes
Hardware Systems

                          Cluster: Intel Core 2 Duo (laptop)   AMD Opteron (server)
  CPU                     Intel Core 2 Duo x2, 2.26 GHz        AMD Opteron 2x4, 2.0 GHz
  Storage                 SSD                                  HDD
  Idle power (W)          25                                   135
  Dyn. power range (W)    20                                   55
  OS                      Windows Server 2008 R2               Windows Server 2008 R2
o 4 frequency states: 100%, 94%, 82%, 70%
Power profiles
[Figure: per-node server power (W) over 3 hours for WordCount, Prime, Sort, and PageRank on nodes 02-05, with no frequency cap and with 94%, 82%, and 70% frequency caps]
If DVFS is the only actuator, some power budgets will be much easier to deal with than others.
Reactive capping: modeled vs. measured power
[Figure: power trace for one node under M-ReCap, C-ReCap, and ProCap]
o Low power cap (38 W)
o Graph shows 1 node
o Blue: ReCap based on measured power
o Gray: ReCap based on modeled power
Reactive vs. proactive capping
[Figure: power traces over WordCount, Prime, Sort, and PageRank]
o Same power cap
o Blue: ReCap based on measured power
o Purple: ProCap
Higher power cap
[Figure 4: 42 W power cap examples for WordCount and Primes, server power (W) vs. time (s) with the power cap and threshold marked: (A) M-ReCap with measured power, (B) L-ReCap with the linear model, (C) ProCap with the cluster model's prediction]
o 42 W cap
o Left: M-ReCap; Center: L-ReCap; Right: ProCap
o Model accuracy matters!
Conclusion
o Demonstrated the potential of high-accuracy, software-only models for server-level power capping
o Suitable for low-power, low-cost "wimpy nodes"
o Extensible to other power management hooks and policies
Backup slides
Dynamic Range Error
o Report error as a percent of the dynamic range; idle power shouldn't count.
[Figure: cluster power split into idle power and the dynamic range up to max power]
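As a formula this is simply the absolute error divided by the span between idle and maximum power. A minimal sketch, using the laptop node's numbers from the hardware table (25 W idle, 20 W dynamic range, so 45 W max) as the example:

```python
def dynamic_range_error(predicted, measured, idle_power, max_power):
    """Absolute error as a fraction of the dynamic range (max minus
    idle), so a model is not flattered by large, easy-to-predict
    idle power."""
    return abs(predicted - measured) / (max_power - idle_power)

# A 1 W error on the laptop node (25 W idle, 45 W max) is 5% DRE,
# even though it is under 3% of absolute power.
```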
Model Accuracy
[Figure: average DRE (0-16%) for cluster-specific and general models, with CPU utilization vs. CPU utilization + MHz as features, across linear, piecewise linear, quadratic, and switching linear modeling techniques]
Model Features
o Automatically selected from over 200 OS counters
o Processor: utilization, frequency
o Memory: cache faults/sec; pool nonpaged allocations
o Disk: total disk time %
o Filesystem and virtual memory: file system pin reads/sec, peak page file bytes