  1. Towards Energy-Efficient Reactive Thermal Management in Instrumented Datacenters. Ivan Rodero (1), Eun Kyung Lee (1), Dario Pompili (1), Manish Parashar (1), Marc Gamell (2), Renato J. Figueiredo (3). (1) NSF Center for Autonomic Computing, Rutgers University, NJ, USA. (2) Open University of Catalonia, Barcelona, Spain. (3) NSF Center for Autonomic Computing, University of Florida, FL, USA. Energy Efficient Grids, Clouds and Clusters, Brussels, October 26, 2010

  2. Agenda • Context and Motivation • Datacenter Thermal Management • Energy Efficiency and Tradeoffs • Evaluation Methodology • Results • Next Steps • Conclusions

  3. Energy-Efficient Autonomic Management for High Performance Computing Workloads [Diagram: cross-infrastructure and cross-layer power management over an instrumented, virtualized infrastructure; each layer (application/workload, virtualization, resources, physical environment) is wrapped by a sensor, actuator, observer, and controller, across private, public, and hybrid clouds] • Goal: autonomic (self-monitored and self-managed) computing systems, optimizing energy efficiency while ensuring the delivered Quality of Service (performance)

  4. Cross-Layer Architecture [Diagram: a global controller coordinates the local controllers of four layers (application/workload, virtualization, resources, physical environment), each instrumented with its own sensor and actuator; per-layer observers feed correlations upward: application requirement profiles, VM efficiency, resource performance, and environment prediction]

  5. Cross-Layer Energy-Efficient Autonomic Management • Abnormal operational state detection • Distributed Online Clustering (e.g., of the workload) • Physical sensing at the physical layer (e.g., thermal hotspots) • Reactive and proactive approaches • Reacting to anomalies to return to the steady state (a minimal detection sketch follows this slide) • Predicting anomalies in order to avoid them [Diagram: different cross-layer action paths lead from an abnormal operational state back to the steady (autonomic) state, each with its own QoS, energy-efficiency, and thermal-efficiency tradeoffs]
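
A minimal sketch of the reactive detection loop described above, assuming a hypothetical read_internal_temp() helper (e.g., backed by a TelosB mote); the 45 °C threshold and 10 s polling period are illustrative values, not figures from the talk.

```python
import time

STEADY_STATE_MAX_C = 45.0   # assumed steady-state ceiling, not a value from the talk
POLL_PERIOD_S = 10          # illustrative polling period

def read_internal_temp() -> float:
    """Placeholder for a sensor query (e.g., a TelosB mote); returns degrees C."""
    raise NotImplementedError  # hypothetical helper, wire up to real sensing

def monitor(react):
    """Poll the internal temperature and fire a reaction callback on a hotspot."""
    while True:
        temp = read_internal_temp()
        if temp > STEADY_STATE_MAX_C:
            react(temp)  # e.g., trigger VM migration, pinning, or DVFS
        time.sleep(POLL_PERIOD_S)
```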

  6. Interactions between Autonomic Components [Diagram: a global controller and an observer (correlations) drive workload characterization of the HPC workload (e.g., DOC), scheduling, provisioning and mapping, pinning, trading with 3rd parties, and VM migration; proactive and reactive configuration cooperate with environment monitoring (temperature, etc.), component-level power management, and resource monitoring (load, power, etc.), which depends on designs for aggressive power management]

  7. Datacenter's Thermal Behavior [Two surface plots of temperature (°C) versus node number and time (min), showing the temporal correlation of the measured temperature under different workload distributions]

  8. Reacting to Thermal Hotspots [Top plot: internal server temperature (°C) over time (s); the server runs in a steady state until the environment induces a hotspot, and the reaction, VM migration (sketched below), brings the temperature back down. Bottom plot: server power (W) over the same interval, showing the correlation between the server's temperature and its power]
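
The reaction shown in the plot, migrating a VM off the hot server, could be triggered with Xen's xm toolstack (the platform used later in the talk). This is only a sketch; the domain name and target host are made-up placeholders.

```python
import subprocess

def migrate_vm(domain: str, target_host: str) -> None:
    """Live-migrate a VM off the hot server with Xen's `xm migrate -l`."""
    subprocess.run(["xm", "migrate", "-l", domain, target_host], check=True)

# Example reaction to a detected hotspot (placeholder names):
# migrate_vm("vm1", "server2")
```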

  9. Thermal Management Approaches • Assumption: the lower the power dissipated, the lower the heat generated: P_cpu ≈ C × α × V² × f (a worked example follows this slide) • Reducing the activity factor (α) • VM migration: move a VM to another server • May reduce CPU activity, but also memory activity, etc. • May also result in a lower CPU frequency if the OS supports it • Overhead (suspend, transfer data, resume, etc.) • Requires available capacity on another server (impact on the target server)
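
A worked example of the slide's dynamic-power model; the capacitance, voltage, frequency, and activity-factor values below are made-up numbers chosen only to show the relative effect of each knob.

```python
def dynamic_power(c: float, alpha: float, v: float, f: float) -> float:
    """Dynamic CPU power from the slide's model: P ~ C * alpha * V^2 * f."""
    return c * alpha * v**2 * f

C = 1e-9  # effective switched capacitance (made-up value)
base = dynamic_power(C, alpha=0.9, v=1.2, f=2.4e9)

# Migrating a VM away mainly lowers the activity factor alpha:
after_migration = dynamic_power(C, alpha=0.5, v=1.2, f=2.4e9)

# DVFS lowers both V and f, so power falls roughly cubically with frequency:
after_dvfs = dynamic_power(C, alpha=0.9, v=1.0, f=1.6e9)

print(after_migration / base)  # ~0.56 of the baseline power
print(after_dvfs / base)       # ~0.46 of the baseline power
```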

  10. Thermal Management Approaches (2) • Reducing the activity factor (α) • Pinning (on the Xen platform): affinity in the VCPU-to-PCPU mapping • CPUs left without VMs running on them • OS power management may then trigger DVFS automatically • Performance penalty (resource sharing) • Reducing the frequency/voltage of the CPUs (V² × f) • Processor DVFS • Performance penalty (in general, higher response time) • Different possibilities • Different frequencies/voltages • Applied to all CPUs/cores or to a subset (command sketches follow this slide)
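
Hedged command sketches for the two knobs on this slide: xm vcpu-pin is the standard Xen 3.x pinning command, while the frequency write assumes a host kernel that exposes the Linux cpufreq userspace governor, which may not hold under every Xen 3.1 setup; the domain name and CPU lists are placeholders.

```python
import subprocess

def pin_vcpus(domain: str, cpus: str) -> None:
    """Restrict all of a domain's VCPUs to the given PCPUs (e.g., "0-1")."""
    subprocess.run(["xm", "vcpu-pin", domain, "all", cpus], check=True)

def set_cpu_khz(cpu: int, khz: int) -> None:
    """Request a fixed frequency through the Linux cpufreq userspace governor."""
    path = f"/sys/devices/system/cpu/cpu{cpu}/cpufreq/scaling_setspeed"
    with open(path, "w") as fh:
        fh.write(str(khz))

# pin_vcpus("vm1", "0-1")      # leave PCPUs 2-3 without VMs so they can idle
# set_cpu_khz(0, 1_600_000)    # 1.6 GHz, the testbed's lowest frequency
```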

  11. Goals and Tradeoffs • Goal: selection of the appropriate technique to mitigate the effects of thermal hotspots • Energy-efficient • Lower energy consumption • Lower maximum/average power dissipation • Driven by optimization requirements, for example: • Reduce the temperature by 5 °C (based on a threshold) • A penalty of up to 10% on response time is acceptable (a toy decision rule follows this slide) • There are well-known tradeoffs between performance and energy efficiency • But other dimensions, such as thermal efficiency (temperature), must also be considered
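
A toy decision rule over the slide's example requirements (cool by 5 °C, tolerate at most a 10% response-time penalty); the per-technique cooling and penalty estimates below are hypothetical placeholders, not measurements from the study.

```python
# (name, estimated cooling in degrees C, estimated response-time penalty)
TECHNIQUES = [
    ("vm_migration", 6.0, 0.03),
    ("pinning",      5.5, 0.08),
    ("dvfs",         4.0, 0.15),
]

def choose(required_cooling_c: float = 5.0, max_penalty: float = 0.10):
    """Pick the lowest-penalty technique that meets the cooling target."""
    feasible = [t for t in TECHNIQUES
                if t[1] >= required_cooling_c and t[2] <= max_penalty]
    return min(feasible, key=lambda t: t[2]) if feasible else None

print(choose())  # ('vm_migration', 6.0, 0.03) under these invented numbers
```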

  12. Goals and Tradeoffs (2) • Example: tradeoff between the temperature and the performance of pinning 4 VMs onto different PCPUs [figure not captured in the transcript]

  13. Evaluation Methodology • Server configuration: • Two servers based on Intel quad-core Xeon processors • Operating at four frequencies ranging from 1.6 GHz to 2.4 GHz (but only 3 available under Xen 3.1) • CentOS Linux with a patched 2.6.18 kernel and Xen version 3.1 • Additional hardware: • A “Watts Up? .NET” power meter to empirically measure “instantaneous” power • Accuracy of ±1.5% of the measured power, with a sampling rate of 1 Hz • TelosB motes to measure both internal (not the sensors embedded in the CPU) and external temperatures • A Sunbeam SFH111 heater (directed at the servers) to emulate a thermal hotspot • Workload: HPL (Linpack)

  14. Energy Consumption “Estimation” • Use case: no VMs running on the target server before the migration (an integration sketch follows this slide)
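
A minimal sketch of how such an "estimation" can be computed from the 1 Hz "Watts Up?" samples described on the methodology slide: at one sample per second, energy in joules is just the sum of the watt readings. The trace below is an invented migration-shaped example, not measured data.

```python
def energy_joules(samples_w, dt_s: float = 1.0) -> float:
    """Integrate instantaneous power (W) sampled every dt_s seconds."""
    return sum(samples_w) * dt_s

# Invented migration-shaped trace: baseline, migration overhead spike,
# then the lower post-migration plateau (all wattages are illustrative).
trace = [240.0] * 60 + [255.0] * 30 + [170.0] * 60
print(energy_joules(trace))  # 32250.0 J over this 150 s window
```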

  15. Results [Three rows of plots, one per technique, each showing internal temperature (°C), external temperature (°C), and power (W) over time (s): VM migration (reference vs. migrating 1, 2, or 3 VMs), DVFS (4 CPUs at 2.40 GHz, 2 CPUs at 1.60 GHz, 4 CPUs at 2.13 GHz, 4 CPUs at 1.60 GHz), and pinning (reference vs. pinning the VMs to 3, 2, or 1 CPUs). Observations: internal and external temperature are correlated; temperature and power are correlated; and DVFS using 2 CPUs at 1.6 GHz presents results similar to using 4 CPUs at 2.13 GHz]

  16. Results (2) [figure not captured in the transcript]

  17. Next Steps • Autonomic VM allocation and reactive-technique decision making • Cross-layer design approach • Examples: component-level power management, workload clustering, etc. • Application-aware (workload characterization into CPU-intensive, I/O-intensive, etc.) • Optimization targets based on self-monitoring • Models are required • VM migration, DVFS (the work presented in this talk) • VM allocation (number of VMs, workload characteristics, combinations, etc.) • Preliminary results based on a brute-force algorithm (a search sketch follows this slide) • Models at the server and datacenter levels
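
A sketch of the brute-force VM-allocation search mentioned above: enumerate every assignment of VMs to servers and keep the cheapest one. The cost function is a hypothetical placeholder, not the talk's model.

```python
from itertools import product

def brute_force_allocate(n_vms: int, servers: list, cost):
    """Return the (assignment, cost) pair minimizing cost over all mappings."""
    best = None
    for assignment in product(servers, repeat=n_vms):
        c = cost(assignment)
        if best is None or c < best[1]:
            best = (assignment, c)
    return best

# Toy cost that penalizes packing all VMs onto a single server:
best = brute_force_allocate(3, ["s1", "s2"],
                            lambda a: max(a.count(s) for s in set(a)))
print(best)  # e.g., (('s1', 's1', 's2'), 2)
```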

  18. Conclusions • Tradeoffs exist between the performance, energy efficiency, and thermal efficiency of reactive thermal management techniques for HPC workloads • Pinning is an effective mechanism to react to thermal anomalies under certain conditions • In addition to VM migration • In contrast to DVFS • The mechanisms behave differently depending on the system characteristics and optimization goals • Autonomic decision making is required • Cross-layer designs should improve datacenter management

  19. Thank you! Energy Efficient High Performance Computing Initiative, Center for Autonomic Computing, Rutgers University, http://nsfcac.rutgers.edu/GreenHPC/
