
µDPM: Dynamic Power Management for the Microsecond Era



  1. µDPM: Dynamic Power Management for the Microsecond Era
 Chih-Hsun Chou, Laxmi N. Bhuyan, Daniel Wong
 cchou001@cs.ucr.edu, bhuyan@cs.ucr.edu, danwong@ucr.edu

  2. Computer systems efficiently support . . .
 › ns events: masked by microarchitectural techniques
 › ms events: masked by OS-level techniques
 › µs events: the Killer Microsecond
 HPCA 2019

  3. The Killer Microseconds
 “System designers can no longer ignore efficient support for microsecond-scale I/O ... Novel microsecond-optimized system stacks are needed” [1]
 [1] Luiz Barroso, Mike Marty, David Patterson, and Parthasarathy Ranganathan. Attack of the killer microseconds. Commun. ACM 60, 4 (March 2017), 48-54.

  4. Computer systems cannot efficiently support microsecond-scale service time
 › Traditional monolithic services: request-to-response time of ~milliseconds

  5. Computer systems cannot efficiently support microsecond-scale service time
 › Emerging microservices: request-to-response time of microseconds

  6. Microservice Example: ~700 microservices
 Source: Adrian Cockcroft, “Monitoring Microservices and Containers: A Challenge”

  7. What are the implications of Killer Microsecond service times for Dynamic Power Management (DPM)?

  8. Opportunity for DPM: Latency Slack
 › Slow down request processing (DVFS)
 › Delay request processing (Sleep)
 [Figure: tail latency (% of SLA) vs. load (% of peak load), with the tail latency SLA line and the region labeled "Latency Slack"]

  9. Dynamic Power Management Overview
 › DVFS (Rubik [MICRO’15], Pegasus [ISCA’14])
 › Rubik adjusts the frequency f per request
 [Figure: frequency f vs. time t for requests R0 to R4]
 › DVFS + Sleep
 › SleepScale [ISCA’14] finds the optimal frequency and C-state depth for 60 s epochs
 [Figure: frequency f vs. time t for requests R0 to R4 across Epoch 0 and Epoch 1]

  10-22. Dynamic Power Management Overview (animation builds of one slide)
 › Sleep states (PowerNap [ASPLOS’09], Dreamweaver [ASPLOS’12], DynSleep [ISLPED’16], CARB [CAL’17])
 [Figure: sleep-time timeline built up step by step, showing the arrivals of R1, R2, and R3, the Target Tail Latency marked for each, and the Wake point]

  23-27. DPM ineffective w/ microsecond service time
 › >250µs: DVFS effective at slowing down request processing
 › <250µs: DPM becomes ineffective
 › Surprisingly, sleep-based policies outperform DVFS-based policies
 [Figure: power (W) vs. avg service time (µs; axis ticks at 10, 100, 1000) for Baseline, DVFS (Rubik), DVFS+Sleep (SleepScale), and Deep Sleep (DynSleep), with a region annotated "DPM Ineffective"]

  28. Fragmented idle periods → Lost Opportunities
 [Figure: two request timelines at 50% utilization, one with longer service time and one with shorter service time]

  29-30. Fragmented idle periods → Lost Opportunities
 › Short service times fragment idle periods
 › Sleep states / request delaying can consolidate idle periods
 [Figure: curves for DVFS (200 µs), DVFS (500 µs), Sleep (200 µs), and Sleep (500 µs)]
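The fragmentation effect can be illustrated with a small back-of-the-envelope sketch. It is not taken from the slides: it assumes Poisson arrivals (so idle periods are exponentially distributed), reuses the 50% utilization and the 200 µs / 500 µs service times shown above, and compares idle periods against the 300 µs C6 residency time quoted later in the deck. The function names are hypothetical.

```python
import math


def mean_idle_period_us(service_time_us, utilization):
    """Mean idle period under the ASSUMED Poisson-arrival model.

    With arrival rate lambda = utilization / service_time, idle periods
    are exponential with mean 1 / lambda = service_time / utilization.
    """
    return service_time_us / utilization


def fraction_idle_longer_than(threshold_us, service_time_us, utilization):
    """Fraction of idle periods that exceed threshold_us (same assumption)."""
    arrival_rate = utilization / service_time_us
    return math.exp(-arrival_rate * threshold_us)


if __name__ == "__main__":
    for service_us in (200, 500):
        mean_idle = mean_idle_period_us(service_us, 0.5)
        long_enough = fraction_idle_longer_than(300, service_us, 0.5)
        print(f"service={service_us} us: mean idle={mean_idle:.0f} us, "
              f"idle periods > 300 us: {long_enough:.1%}")
```

Under this model, the same utilization yields shorter idle periods when service times are shorter, so fewer of them are long enough to cover a deep sleep state's residency time.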

  31. Significant transition overheads and idle power
 › Idleness and transition overhead still account for up to ~25% of energy
 [Figure: normalized energy (axis from 0.6 to 1) for Baseline, Rubik, SleepScale, DynSleep, and Optimal, broken down into Busy, Idle, C-state transition, and VFS transition]

  32-33. DPM inefficiencies
 Tail Latency Target = 800 µs, Tail Service Time = 78 µs, C6 Residency Time = 300 µs (* SPECjbb timing)
 › Problems: DVFS is limited in closing the latency gap; energy is wasted (marked twice on the timeline)
 › Solutions: aggressive deep sleep, coordinated DVFS, request delaying
 [Figure: timeline t with requests R0 and R1, the arrival of R1, and C-states C0, C3, and C6]

  34. Key Insight
 Careful coordination of DVFS, Sleep state, and request delaying is the key to effective DPM with microsecond service times
 [Figure: power (W) vs. avg service time (µs; axis ticks at 10, 100, 1000) for Baseline, DVFS (Rubik), DVFS+Sleep (SleepScale), Deep Sleep (DynSleep), and µDPM]

  35. µDPM
 › Aggressively deep sleep
 › Delay and slow down request processing to finish just in time, even under microsecond request service times
 › Carefully coordinate DVFS, sleep, and request delaying
 [Figure: timeline t with requests R0 and R1, the arrival of R1, and C-states C0, C3, and C6; Tail Latency Target = 800 µs, Tail Service Time = 78 µs, C6 Residency Time = 300 µs; annotated with the three solutions: aggressive deep sleep, coordinate DVFS, request delaying]

  36-38. Can latency-critical workloads utilize deep sleep states?
 Workload     Tail Service Time (95th percentile)   Target Tail Latency (95th percentile)
 Memcached    33 µs                                 150 µs
 SPECjbb      78 µs                                 800 µs
 Masstree     250 µs                                1100 µs
 Xapian       1200 µs                               2100 µs
 [Figure labels: "Start", "Opportunity"]
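As a worked example of the numbers above, the sketch below computes the gap between each workload's target tail latency and its tail service time, and checks it against the 300 µs C6 residency time quoted on the neighbouring slides. Treating "gap at least as long as the residency time" as the criterion is a simplifying assumption made here, not a rule stated in the deck; wake-up latency and queueing are ignored, and the names are hypothetical.

```python
# (tail service time, target tail latency) in microseconds, from the slide.
WORKLOADS_US = {
    "Memcached": (33, 150),
    "SPECjbb": (78, 800),
    "Masstree": (250, 1100),
    "Xapian": (1200, 2100),
}

# C6 residency time quoted elsewhere in the deck.
C6_RESIDENCY_US = 300


def latency_slack_us(tail_service_us, target_tail_us):
    """Time a request could be delayed and still meet the target."""
    return target_tail_us - tail_service_us


slack = {name: latency_slack_us(s, t) for name, (s, t) in WORKLOADS_US.items()}

# ASSUMPTION: slack >= residency time is used as a rough feasibility check.
covers_residency = {name: v >= C6_RESIDENCY_US for name, v in slack.items()}

if __name__ == "__main__":
    for name in WORKLOADS_US:
        print(f"{name}: slack={slack[name]} us, "
              f"covers C6 residency: {covers_residency[name]}")
```

By this rough measure, only Memcached's slack (117 µs) falls short of the C6 residency time, which is the case where an early wake-up would be needed.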

  39. Aggressive deep sleep and request delaying
 › Wakeup after residency time
 › Wakeup before residency time if needed to meet tail latency
 [Figure: timeline t with the arrival of request R0 and C-states C0, C3, and C6; C6 Residency Time = 300 µs]
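One possible reading of the two wake-up rules on this slide can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it handles a single pending request, ignores wake-up latency and DVFS, and uses the tail service time as the processing-time estimate. The function name and parameters are hypothetical.

```python
def wake_time_us(sleep_start_us, arrival_us, target_tail_us,
                 tail_service_us, residency_us=300):
    """Pick a wake-up time for a core that entered deep sleep at sleep_start_us.

    Rule 1: normally stay asleep until the residency time has elapsed.
    Rule 2: wake earlier if waiting that long would miss the tail target.
    ASSUMPTIONS: one pending request, zero wake-up latency, no DVFS.
    """
    residency_end = sleep_start_us + residency_us
    # Latest moment processing can start and still finish within the target.
    latest_start = arrival_us + target_tail_us - tail_service_us
    # Never wake before the request has actually arrived.
    return max(arrival_us, min(residency_end, latest_start))


if __name__ == "__main__":
    # SPECjbb-like numbers: plenty of slack, so the residency time is honored.
    print(wake_time_us(0, 100, target_tail_us=800, tail_service_us=78))
    # Memcached-like numbers: the deadline forces an early wake-up.
    print(wake_time_us(0, 100, target_tail_us=150, tail_service_us=33))
```

With the SPECjbb figures, the request can wait until the residency time ends; with the Memcached figures, the core has to wake before the residency time to meet the tail latency target.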
