Virtual Melting Temperature: Managing Server Load to Minimize Cooling Overhead with Phase Change Materials
Matt Skach (1), Manish Arora (2,3), Dean Tullsen (3), Lingjia Tang (1), Jason Mars (1)
(1) University of Michigan, (2) Advanced Micro Devices, Inc., (3) UC San Diego
ISCA '18
Datacenters
● Huge warehouses full of servers that host the internet and the cloud
[Images: Facebook Ireland datacenter; Facebook datacenter]
Datacenter Cooling
● Heat must be removed to prevent:
  ○ Overheating
  ○ Thermal downclocking
  ○ Component failure
[Image: http://www.asetek.com/media/1031/rackcdu_d2c_datacenter.jpg]
Global Energy Consumption (CIA World Factbook)
Rank  Consumer         Electricity Consumption (TWh/year)
1     China            6,100
2     United States    4,100
3     European Union   3,100
4     India            1,300
5     Russia           1,000
6     Japan              980
7     Canada             640
Datacenter Energy Consumption (Avgerinou, 2017)
Rank  Consumer                     Electricity Consumption (TWh/year)
1     China                        6,100
2     United States                4,100
3     European Union               3,100
-     Datacenters (global, est.)   1,600
4     India                        1,300
5     Russia                       1,000
6     Japan                          980
7     Canada                         640
Datacenter Energy Consumption (Avgerinou, 2017)
Rank  Consumer                           Electricity Consumption (TWh/year)
1     China                              6,100
2     United States                      4,100
3     European Union                     3,100
-     Datacenters (global, est.)         1,600
4     India                              1,300
5     Russia                             1,000
6     Japan                                980
-     Datacenter Cooling (global, est.)    650
7     Canada                               640
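A quick arithmetic check on the estimates above: the sketch below computes cooling's share of datacenter electricity and compares it with Canada's consumption. The constants are the slide's figures; the helper name `share` is only illustrative.

```python
# Figures taken from the slide (TWh/year, global estimates).
DATACENTER_TWH = 1600
COOLING_TWH = 650
CANADA_TWH = 640


def share(part: float, whole: float) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


cooling_share = share(COOLING_TWH, DATACENTER_TWH)
print(f"Cooling share of datacenter electricity: {cooling_share:.1f}%")
print(f"Cooling exceeds Canada's consumption: {COOLING_TWH > CANADA_TWH}")
```

On these estimates, cooling accounts for roughly two fifths of datacenter electricity use.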
Datacenter Cooling
● Datacenter cooling is very expensive
  ○ Infrastructure can cost tens of millions of dollars for large DCs (Kontorinis, 2014)
  ○ Generally, more power-efficient systems are more expensive up front
[Image: Open Compute cooling system]
Datacenter Workloads
● Diurnal load is problematic
  ○ Work is uneven
  ○ Work is distributed
  ○ Heat is produced when work is done
[Figure: Google Search: US Load]
Datacenter Cooling
● Build a big cooling system for peak load
  ○ Underutilized most of the time
  ○ Expensive
[Figure label: 100% coverage, low utilization]
Datacenter Cooling ctd.
● Build a big cooling system for peak load
  ○ Underutilized most of the time
  ○ Expensive
[Figure labels: 100% coverage, low utilization; Best 50% coverage, maximum utilization]
Thermal Time Shifting (TTS) [ISCA '15]
[Figure: cooling load vs. time of day (3am, 7am, 7pm, 12am), coupled vs. decoupled. Annotations: "Store heat to flatten peak"; "Release heat during off hours"]
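The store-and-release idea can be sketched as a small simulation. This is a minimal illustration, not the model used in the talk: the heat trace, the cap, the storage capacity, and the function name `time_shift` are all hypothetical, and units are arbitrary.

```python
def time_shift(heat, cap, storage_capacity):
    """Simple store/release policy.

    When generated heat exceeds `cap`, absorb the excess into storage
    (limited by remaining capacity). When heat is below `cap`, release
    stored heat up to the cap. Returns (cooling_load_per_step, leftover).
    """
    stored = 0.0
    cooling_load = []
    for h in heat:
        if h > cap:
            absorbed = min(h - cap, storage_capacity - stored)
            stored += absorbed
            cooling_load.append(h - absorbed)
        else:
            released = min(cap - h, stored)
            stored -= released
            cooling_load.append(h + released)
    return cooling_load, stored


# Hypothetical diurnal heat trace with a peak in the middle of the day.
heat = [40, 40, 60, 100, 100, 60, 40, 40]
shifted, leftover = time_shift(heat, cap=70, storage_capacity=60)
print("peak before:", max(heat), "peak after:", max(shifted))
```

Total heat removed is unchanged; only its timing moves, so the cooling system can be sized for the lower, flattened peak.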
Cooling Load
● Metric of heat that must be removed
● Datacenter is primarily concerned with IT & support equipment
[Image: http://www.slideshare.net/spsu/12-cooling-load-calculations]
A Phase Change Material (PCM)
● Store energy in a solid-to-liquid phase change
● Commercial paraffin wax offers the best properties of currently available PCMs (Skach, 2015)
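To give a feel for the scale involved, the sketch below applies the standard latent-heat relation Q = m * L. The latent heat value (about 200 kJ/kg) is an assumed ballpark for paraffin, not a figure from the slides, and the wax mass and excess power are hypothetical.

```python
# Assumption: ~200 kJ/kg is a ballpark latent heat of fusion for paraffin.
# It is not taken from the slides; substitute the datasheet value for a real wax.
LATENT_HEAT_KJ_PER_KG = 200.0


def stored_energy_kj(mass_kg, melted_fraction=1.0):
    """Energy absorbed by melting `melted_fraction` of `mass_kg` of wax."""
    if not 0.0 <= melted_fraction <= 1.0:
        raise ValueError("melted_fraction must be between 0 and 1")
    return mass_kg * LATENT_HEAT_KJ_PER_KG * melted_fraction


def absorb_time_hours(mass_kg, excess_watts):
    """Hours the wax can absorb a constant excess heat flow before fully melting."""
    joules = stored_energy_kj(mass_kg) * 1000.0
    return joules / excess_watts / 3600.0


# Hypothetical example: 1 kg of wax absorbing 50 W of excess heat.
print(f"{stored_energy_kj(1.0):.0f} kJ, {absorb_time_hours(1.0, 50.0):.2f} h")
```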
The problem with passive TTS
Thermal Time Shifting:
● Paraffin has a limited range of melting temperatures
● Melting temperature cannot be changed
● Power and temperature profiles vary over lifetime of servers
[Image: Wikimedia Commons]
Virtual Melting Temperature
● Datacenters need more flexibility
● Create a “virtual” melting temperature separate from the actual melting temperature
[Images: Microsoft, Wikimedia Commons]
Test Infrastructure
● 2U High Throughput Server
● 2-day Google Workload trace divided between 5 datacenter workloads
Test Methodology
● 5 common datacenter workloads
  1. Web Search
  2. Data Caching
  3. Video Encoding
  4. Virus Scan
  5. Clustering
● Consider datacenter where all are colocated
  ○ Contention mitigation techniques applied (e.g. Bubble-Up (Mars, 2011) and Protean Code (Laurenzano, 2014))
Baseline: Load Balancing Schedulers
● Round Robin and Coolest First
Baseline: Load Balancing Schedulers
● Round Robin and Coolest First
● Problem: Average cluster temperature is too low to melt wax
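For reference, the two baseline policies can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the schedulers used in the evaluation: job and server names are hypothetical, and `heat_per_job` is a made-up stand-in for the temperature rise a job causes.

```python
from itertools import cycle


def round_robin(jobs, servers):
    """Assign jobs to servers in a fixed rotating order."""
    order = cycle(servers)
    return {job: next(order) for job in jobs}


def coolest_first(jobs, temps, heat_per_job=1.0):
    """Assign each job to the currently coolest server.

    `temps` maps server -> temperature. A copy is updated with a
    hypothetical per-job temperature rise so later jobs see the effect.
    """
    temps = dict(temps)
    placement = {}
    for job in jobs:
        server = min(temps, key=temps.get)  # ties go to the first server listed
        placement[job] = server
        temps[server] += heat_per_job
    return placement


rr = round_robin(["a", "b", "c"], ["s0", "s1"])
cf = coolest_first(["a", "b", "c"], {"s0": 30.0, "s1": 28.0})
print(rr)
print(cf)
```

Both policies spread heat evenly across servers, which is why the average temperature stays below the wax's melting point.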
Thermal Aware VMT
● Categorize jobs based upon thermal characteristics
  ○ Binary classification: Would they melt significant wax in isolation?
Thermal Aware VMT
● Grouping Value (GV): Controllable ratio of group size
  ○ Proportional to hot group size
● Locate ‘hot jobs’ together in ‘hot group’ to melt wax
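The grouping step might look like the sketch below. The slides do not say how GV maps to a server count, so the sketch takes a hypothetical `hot_group_size` directly. The job names, the `is_hot` labels, and the round-robin placement inside each group are also assumptions, and server capacity is ignored.

```python
def place_thermal_aware(jobs, is_hot, servers, hot_group_size):
    """Send 'hot' jobs to the first `hot_group_size` servers, the rest elsewhere.

    `is_hot` maps job -> bool (the binary classification: would the job
    melt significant wax in isolation?). Jobs rotate within their group.
    """
    if not 0 < hot_group_size < len(servers):
        raise ValueError("hot group must leave at least one server in each group")
    hot_group = servers[:hot_group_size]
    cold_group = servers[hot_group_size:]
    placement = {}
    hot_count = cold_count = 0
    for job in jobs:
        if is_hot[job]:
            placement[job] = hot_group[hot_count % len(hot_group)]
            hot_count += 1
        else:
            placement[job] = cold_group[cold_count % len(cold_group)]
            cold_count += 1
    return placement


# Hypothetical jobs and labels, for illustration only.
jobs = ["j1", "j2", "j3", "j4"]
is_hot = {"j1": False, "j2": True, "j3": False, "j4": True}
placement = place_thermal_aware(jobs, is_hot, ["s0", "s1", "s2", "s3"], hot_group_size=1)
print(placement)
```

Concentrating hot jobs raises the hot group's temperature above the melting point even when the cluster-wide average is too low.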
Thermal Aware VMT Results
● Hot Group sized to melt wax during peak hours
Thermal Aware VMT Results
● Balance between melting wax too soon and not melting enough wax
  ○ GV=24: Hot group is too big
  ○ GV=22: Hot group is just right
  ○ GV=20: Hot group is too small
Wax Aware VMT
● Begin with same setup as VMT-TA
● When wax in hot group is fully melted, expand hot group
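The expansion rule can be sketched as a single check. How the melt state is measured or estimated is not covered here, so `melted_fractions` is a hypothetical input. The one-server `step`, the `threshold`, and the function name are likewise assumptions.

```python
def expand_hot_group(hot_group_size, melted_fractions, total_servers,
                     step=1, threshold=1.0):
    """Return the new hot group size.

    `melted_fractions` holds the melted fraction (0..1) of the wax in each
    hot-group server. The group grows by `step` servers only when every
    server's wax has reached `threshold`, and never beyond `total_servers`.
    """
    fully_melted = all(f >= threshold for f in melted_fractions)
    if fully_melted and hot_group_size < total_servers:
        return min(hot_group_size + step, total_servers)
    return hot_group_size


print(expand_hot_group(2, [1.0, 1.0], total_servers=10))  # wax exhausted: grow
print(expand_hot_group(2, [1.0, 0.8], total_servers=10))  # wax remaining: hold
```

Calling this periodically lets a hot group that starts slightly too small grow into unmelted wax during peak load.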
Wax Aware VMT Results
● Hot Group slightly too small: automatically expands during peak load
Wax Aware VMT Results
● Wax expansion preserves significant cooling load reduction
  ○ GV=24: Hot group is too big
  ○ GV=22: Hot group is just right
  ○ GV=20: Hot group is too small
VMT-TA vs. VMT-WA
● Both work well at ideal GV
● VMT-WA offers much more flexibility for unpredictable load
[Figure axis: Smaller Hot Group to Bigger Hot Group]
Summary
● VMT stores thermal energy when passive TTS alone cannot
  ○ Reduces maximum cooling load of a diurnal workload
  ○ Configurable for varying datacenter power and load levels
● VMT-enabled thermal energy storage can:
  ○ Reduce cooling system size by 12%
  ○ Or allow up to 14% more servers under the same cooling budget
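One plausible reading of how the two figures relate is sketched below; the slides do not state the derivation. It assumes peak cooling load scales linearly with server count, so a 12% lower peak per server frees budget for 1/(1 - 0.12) times as many servers. The function name is illustrative.

```python
def extra_servers_fraction(peak_reduction):
    """Extra servers supportable under a fixed cooling budget.

    Assumes peak cooling load scales linearly with the number of servers.
    """
    if not 0.0 <= peak_reduction < 1.0:
        raise ValueError("peak_reduction must be in [0, 1)")
    return 1.0 / (1.0 - peak_reduction) - 1.0


print(f"{extra_servers_fraction(0.12):.1%} more servers")
```

Under that assumption, a 12% reduction gives about 13.6% more servers, in line with the "up to 14%" on the slide.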
Thank you!
Questions?