µDPM: Dynamic Power Management for the Microsecond Era
Chih-Hsun Chou, Laxmi N. Bhuyan, Daniel Wong
cchou001@cs.ucr.edu, bhuyan@cs.ucr.edu, danwong@ucr.edu
Computer systems efficiently support...
› ns-scale events: masked by microarchitectural techniques
› ms-scale events: masked by OS-level techniques
› µs-scale events: the killer microsecond
HPCA 2019
The Killer Microseconds
“System designers can no longer ignore efficient support for microsecond-scale I/O ... Novel microsecond-optimized system stacks are needed.” [1]
[1] Luiz Barroso, Mike Marty, David Patterson, and Parthasarathy Ranganathan. Attack of the killer microseconds. Commun. ACM 60, 4 (March 2017), 48-54.
Computer systems cannot efficiently support microsecond-scale service times
› Traditional monolithic services: request → response in ~milliseconds
› Emerging microservices: request → response in microseconds
Microservice Example: ~700 microservices
Source: Adrian Cockcroft, “Monitoring Microservices and Containers: A Challenge”
What are the implications of killer-microsecond service times for dynamic power management?
Opportunity for DPM: Latency Slack
› Slow down request processing (DVFS)
› Delay request processing (Sleep)
[Figure: tail latency (% of SLA) vs. load (% of peak load); the gap between the tail latency and the SLA is the latency slack.]
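Latency slack is the distance between the observed tail latency and the SLA target. A minimal sketch, assuming the tail is the 95th percentile computed with the nearest-rank method and that latencies are in microseconds; the function names and sample latencies are hypothetical, not measurements from the talk.

```python
import math

def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

def latency_slack(samples_us, sla_us, pct=95):
    """Slack = SLA target minus observed tail latency (negative means the SLA is violated)."""
    return sla_us - percentile_nearest_rank(samples_us, pct)

# Hypothetical latencies (µs) at low load: the tail sits far below the SLA.
latencies = [40, 45, 50, 52, 55, 60, 62, 65, 70, 75,
             80, 85, 90, 95, 100, 110, 120, 130, 150, 200]
print(latency_slack(latencies, sla_us=800))  # → 650
```

A DPM policy can spend a positive slack either by running slower (DVFS) or by waiting before it starts (sleep/delaying).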
Dynamic Power Management Overview
› DVFS (Rubik [MICRO’15], Pegasus [ISCA’14])
  › Rubik adjusts the frequency (f) per request
› DVFS + Sleep
  › SleepScale [ISCA’14] finds the optimal frequency and C-state depth for 60 s epochs
[Figure: frequency over time for requests R0–R4, changing per request under Rubik and fixed per epoch (Epoch 0, Epoch 1) under SleepScale.]
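An epoch-level (frequency, C-state) choice like SleepScale's can be pictured as a search over all pairs. This is not SleepScale's algorithm; it is a brute-force illustration under a hypothetical, queueing-free model: service time scales as f_max/f, every request pays the chosen C-state's wake-up latency, and energy per request is busy power × busy time plus sleep power × remaining idle time. All power and latency values are invented.

```python
from itertools import product

# Hypothetical platform tables (illustrative values, not measured).
FREQS_GHZ = {1.2: 12.0, 1.8: 18.0, 2.4: 30.0}                  # frequency -> busy power (W)
CSTATES = {"C1": (2, 5.0), "C3": (50, 3.0), "C6": (100, 1.0)}  # name -> (wake latency µs, sleep power W)
F_MAX = max(FREQS_GHZ)

def best_setting(service_us_at_fmax, interarrival_us, target_us):
    """Return (frequency, C-state, energy µJ) with the lowest energy per request
    whose naive tail estimate (stretched service + wake latency) meets the target."""
    best = None
    for f, c in product(FREQS_GHZ, CSTATES):
        service = service_us_at_fmax * F_MAX / f
        wake_us, sleep_w = CSTATES[c]
        if service + wake_us > target_us or service > interarrival_us:
            continue  # misses the target or cannot keep up with the load
        idle = interarrival_us - service
        energy_uj = FREQS_GHZ[f] * service + sleep_w * idle  # W × µs = µJ
        if best is None or energy_uj < best[2]:
            best = (f, c, energy_uj)
    return best  # None if no pair is feasible

print(best_setting(service_us_at_fmax=78, interarrival_us=400, target_us=800))
```

With a loose target the lowest frequency plus deepest sleep wins; with a tight target only fast, shallow settings remain feasible.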
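Dynamic Power Management Overview
› Sleep states (PowerNap [ASPLOS’09], Dreamweaver [ASPLOS’12], DynSleep [ISLPED’16], CARB [CAL’17])
[Figure (animation): the core sleeps; as requests R1, R2, and R3 arrive, the wake-up point is placed so that each request still finishes before the target tail latency.]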
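The sleep-state slides show the wake-up point moving earlier as R1, R2, and R3 queue up. A minimal sketch of that bookkeeping, assuming FIFO processing, a fixed per-request service estimate, and a known wake-up latency; the function and parameter names are hypothetical, and this is not DynSleep's published implementation.

```python
def latest_wake_time(arrivals_us, service_us, target_us, wake_latency_us):
    """Latest time the core can start waking so that every queued request,
    served FIFO after wake-up, still finishes by its arrival + target."""
    latest = float("inf")  # no queued request means no deadline
    finish_offset = 0.0
    for arrival in sorted(arrivals_us):
        finish_offset += service_us  # the i-th request completes i service times after wake-up
        deadline = arrival + target_us
        latest = min(latest, deadline - finish_offset)
    return latest - wake_latency_us

# Hypothetical timings: 800 µs target, 78 µs service, 100 µs wake-up latency.
print(latest_wake_time([0], 78, 800, 100))            # → 622.0
print(latest_wake_time([0, 100, 150], 78, 800, 100))  # → 616.0
```

Each additional queued request can only move the wake-up point earlier, which is the effect the animation builds up.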
DPM ineffective with microsecond service times
› >250 µs: DVFS is effective at slowing down request processing
› <250 µs: DPM becomes ineffective
› Surprisingly, sleep-based policies outperform DVFS-based policies
[Figure: power (W) vs. average service time (µs, log scale) for Baseline, DVFS (Rubik), DVFS+Sleep (SleepScale), and Deep Sleep (DynSleep); the short-service-time region is marked “DPM ineffective”.]
Fragmented idle periods → lost opportunities
[Figure: idle periods at 50% utilization with longer service times vs. 50% utilization with shorter service times.]
Fragmented idle periods → lost opportunities
› Short service times fragment idle periods
› Sleep states and request delaying can consolidate idle periods
[Figure: DVFS and Sleep policies at 200 µs and 500 µs service times.]
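The fragmentation effect can be made concrete with a standard queueing fact: with Poisson arrivals at rate λ, each idle period of a single-server queue is exponentially distributed with mean 1/λ. At a fixed utilization ρ = λS, the mean idle period is S/ρ, so a 10× shorter service time gives 10× shorter idle periods. The sketch assumes Poisson arrivals (an assumption, not a claim about the talk's workloads); the 300 µs threshold is the C6 residency time used in this talk.

```python
import math

def mean_idle_period_us(utilization, service_us):
    """Mean idle period 1/λ, where λ = utilization / service time (Poisson arrivals)."""
    return service_us / utilization

def frac_idle_longer_than(utilization, service_us, threshold_us):
    """P(idle period > threshold) = exp(-λ·threshold) for exponential idle periods."""
    arrival_rate = utilization / service_us  # requests per µs
    return math.exp(-arrival_rate * threshold_us)

# Same 50% utilization, three service times.
for s in (20, 200, 2000):
    print(s, mean_idle_period_us(0.5, s),
          round(frac_idle_longer_than(0.5, s, threshold_us=300), 4))
```

Under these assumptions, at 50% utilization roughly 47% of idle periods exceed 300 µs when S = 200 µs, but only about 0.06% do when S = 20 µs.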
Significant transition overheads and idle power
› Idleness and transition overheads still account for up to ~25% of energy
[Figure: normalized energy broken into busy, idle, C-state transition, and VFS transition for Baseline, Rubik, SleepScale, DynSleep, and Optimal.]
DPM inefficiencies (SPECjbb timing)
› Tail latency target = 800 µs; tail service time = 78 µs; C6 residency time = 300 µs
› Inefficiencies: DVFS is limited in closing the latency gap, and energy is wasted while idling in C0 and in shallow C3 sleep
› Solutions: aggressive deep sleep, coordinated DVFS, and request delaying
[Figure: timeline of R0 and R1 across C0, C3, and C6.]
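The slide's numbers show why DVFS alone cannot close the latency gap. A worked sketch, assuming (hypothetically) that service time scales inversely with frequency and that the core can slow down by at most 2×, as a 1.2–2.4 GHz range would allow; only the 800 µs target, 78 µs tail service time, and 300 µs C6 residency come from the slide. Queueing and wake-up latency are ignored.

```python
TARGET_US = 800        # tail latency target (SPECjbb, from the slide)
TAIL_SERVICE_US = 78   # tail service time at max frequency (from the slide)
C6_RESIDENCY_US = 300  # C6 residency time (from the slide)
MAX_SLOWDOWN = 2.0     # hypothetical f_max / f_min

gap_us = TARGET_US - TAIL_SERVICE_US           # slack before any DPM
stretched_us = TAIL_SERVICE_US * MAX_SLOWDOWN  # longest DVFS can stretch the request
leftover_us = TARGET_US - stretched_us         # slack DVFS cannot use
needed_slowdown = TARGET_US / TAIL_SERVICE_US  # slowdown needed to use all slack

print(f"gap={gap_us} µs, DVFS-stretched={stretched_us:.0f} µs, "
      f"leftover={leftover_us:.0f} µs, slowdown needed={needed_slowdown:.1f}x")
print("leftover slack exceeds C6 residency:", leftover_us >= C6_RESIDENCY_US)
```

Filling the whole target would need roughly a 10× slowdown; under the assumed 2× range, about 644 µs of slack remains, which is longer than the C6 residency time and could instead be spent in deep sleep.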
Key Insight
Careful coordination of DVFS, sleep states, and request delaying is the key to effective DPM with microsecond service times.
[Figure: power (W) vs. average service time (µs, log scale) for Baseline, DVFS (Rubik), DVFS+Sleep (SleepScale), Deep Sleep (DynSleep), and µDPM.]
µDPM
› Aggressively enter deep sleep
› Delay and slow down request processing so that it finishes just in time, even with microsecond request service times
› Carefully coordinate DVFS, sleep, and request delaying
[Figure: SPECjbb timeline (tail latency target 800 µs, tail service time 78 µs, C6 residency time 300 µs) annotated with the three solutions: aggressive deep sleep, coordinated DVFS, and request delaying.]
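One way to picture the coordination: sleep in the deepest C-state whose residency time fits the budget, then run at the lowest frequency that still finishes before the deadline. This is a hypothetical sketch, not µDPM's actual controller. It assumes service time ∝ 1/f and fixed wake-up latencies, and measures the budget from the moment the core goes idle; the frequency list and C-state parameters are invented, apart from the 300 µs C6 residency.

```python
# Hypothetical platform parameters (only the 300 µs C6 residency is from the talk).
FREQS_GHZ = [1.2, 1.6, 2.0, 2.4]                # available frequencies
F_MAX = max(FREQS_GHZ)
CSTATES = [("C6", 300, 100), ("C3", 50, 20)]    # (name, residency µs, wake latency µs), deepest first

def plan(budget_us, service_us_at_fmax):
    """Return (C-state, sleep µs, frequency): sleep for exactly the residency time
    of the deepest state that fits, then run as slowly as the remaining window allows."""
    for name, residency, wake in CSTATES:
        run_window = budget_us - residency - wake
        if run_window >= service_us_at_fmax:
            break  # this C-state fits even at f_max
    else:
        name, residency, run_window = "C0", 0, budget_us  # no sleep state fits
    for f in sorted(FREQS_GHZ):  # try the lowest frequency first
        if service_us_at_fmax * F_MAX / f <= run_window:
            return name, residency, f
    return name, residency, F_MAX  # even f_max misses the budget; run flat out

print(plan(800, 78))  # generous budget: deep sleep, then slowest frequency
print(plan(150, 78))  # tight budget: shallow sleep, then full speed
```

Sleeping exactly the residency time and then stretching execution mirrors the "wake up after the residency time" rule on the next slides; a real controller would also weigh the power of each state and frequency.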
Can latency-critical workloads utilize deep sleep states?
Workload    Tail service time (95th pct.)    Target tail latency (95th pct.)
Memcached   33 µs                            150 µs
SPECjbb     78 µs                            800 µs
Masstree    250 µs                           1100 µs
Xapian      1200 µs                          2100 µs
› Opportunity: every target tail latency leaves slack beyond the tail service time
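The per-workload slack follows directly from the slide above. A short sketch computing target minus tail service time and comparing it with the 300 µs C6 residency time used elsewhere in this talk; treating that residency as applying to every workload is an assumption.

```python
# (tail service time µs, target tail latency µs), both 95th percentile, from the slide above
WORKLOADS = {
    "Memcached": (33, 150),
    "SPECjbb": (78, 800),
    "Masstree": (250, 1100),
    "Xapian": (1200, 2100),
}
C6_RESIDENCY_US = 300  # assumed to apply to every workload

def slack_report(workloads, residency_us):
    """Return {name: (slack µs, target/service ratio, slack >= residency)}."""
    report = {}
    for name, (service, target) in workloads.items():
        slack = target - service
        report[name] = (slack, round(target / service, 1), slack >= residency_us)
    return report

for name, row in slack_report(WORKLOADS, C6_RESIDENCY_US).items():
    print(name, row)
```

Under this simple per-request comparison, only Memcached has less slack (117 µs) than the 300 µs residency time.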
Aggressive deep sleep and request delaying
› Wake up after the residency time
› Wake up before the residency time if needed to meet the tail latency target
[Figure: R0 arrives while the core is in C6; timeline across C0, C3, and C6 with residency time = 300 µs.]
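The two wake-up rules on this slide combine into one expression: wake when the residency time has elapsed, or earlier if the pending request would otherwise miss its tail latency target. A minimal sketch, assuming a single pending request, a fixed tail service estimate, and a known wake-up latency; all parameter names and the 100 µs wake-up latency in the example are hypothetical.

```python
def wake_time(sleep_start_us, arrival_us, target_us, service_us,
              wake_latency_us, residency_us):
    """Wake at the end of the residency time, or earlier if waiting that long
    would push the pending request past arrival + target."""
    after_residency = sleep_start_us + residency_us
    latest_safe = arrival_us + target_us - service_us - wake_latency_us
    # Clamp at sleep start: if the deadline is already too close, do not sleep at all.
    return max(sleep_start_us, min(after_residency, latest_safe))

# SPECjbb-like timing: the residency time expires well before the deadline forces a wake-up.
print(wake_time(0, 50, 800, 78, 100, 300))  # → 300
# Memcached-like timing: the deadline forces an early wake-up.
print(wake_time(0, 50, 150, 33, 100, 300))  # → 67
```

Any time left between the wake-up and the deadline can then be absorbed by running at a lower frequency.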