WattWatcher: Fine-Grained Power Estimation for Emerging Workloads

SBAC-PAD 2015. WattWatcher: Fine-Grained Power Estimation for Emerging Workloads. Michael LeBeane, Jee Ho Ryoo, Reena Panda, Lizy K. John. The University of Texas at Austin. jr45842@utexas.edu


  1. SBAC-PAD 2015
     WattWatcher: Fine-Grained Power Estimation for Emerging Workloads
     Michael LeBeane, Jee Ho Ryoo, Reena Panda, Lizy K. John
     The University of Texas at Austin
     jr45842@utexas.edu

  2. Motivation
     ▪ Understanding power at a fine granularity is still a challenge
       – Thermal effects, DVFS policies
     ▪ Not always easy for researchers
     ▪ Simple and detailed power estimation is extremely useful
     ▪ Some methods currently available...
     [Figure: power (Watts) over time for Core 0 with sample points marked; two pie charts of power share by unit (Fetch, ALU, LS, L2, L3, OoO, MC, Other)]
     Michael LeBeane, 10/20/2015

  3. Currently Available Methods
     ▪ Direct Measurements [1]
       – Hardware probes
     ▪ Curve Fitting [2,3,4,5,6]
       – Machine learning models
     ▪ Power PMCs [6,7]
       – E.g. Intel RAPL
     ▪ Simulators [9,10,11,12]
       – E.g. McPAT plugins to simulation environment

  4. Design Space
     ▪ Diverse design space to explore (subjective taxonomy)

     Method               Accuracy  Detail  Frequency  Cost ($)  Speed
     Direct Measurements  ++        -       ~us-ms     -         Fast
     Power PMCs           +         -       ~ms        =         Fast
     Curve Fitting        =         =       ~us-s      +         Fast/Offline Training
     Simulators           +         +       ~ns        +         Slow

  5. Design Space
     ▪ Diverse design space to explore (subjective taxonomy)

     Method               Accuracy  Detail  Frequency  Cost ($)  Speed
     Direct Measurements  ++        -       ~us-ms     -         Fast
     Power PMCs           +         -       ~ms        =         Fast
     Curve Fitting        =         =       ~us-s      +         Fast/Offline Training
     Simulators           +         +       ~ns        +         Slow
     WattWatcher          +         +       ~ms        +         Fast

     ▪ WattWatcher offers functional-unit power breakdowns in real time, on real hardware

  6. WattWatcher Overview
     ▪ Online / Real-Time
     ▪ McPAT-based Power Model
     ▪ Configurable
     ▪ Low Overhead
     [Figure: block diagram. Labels: SUT, Workload, Performance Counters, System Configuration, Access Estimator, and a power model covering Package, ALU, Logic, Registers, OoO, L1$, L2$, L3$, Fetch. Outputs shown: per-core power over time and a pie chart of power share by unit]

  7. WattWatcher Overview
     [Figure only]

  8. WattWatcher Hardware Events
     ▪ Hardware performance counters feed McPAT
     ▪ Some low-level McPAT events are unavailable from counters
     ▪ Unavailable statistics are estimated from available counters

     Category    Hardware Events
     General     Context Switches, Frequency, Voltage, Cycles
     Frontend    Branch Mispredictions, IC Misses, iTLB Misses, uops Issued
     LS/Caches   L1 Misses/Hits, L2 Misses, LLC Misses, dTLB Misses
     Execution   FP Scalar, FP Packed, FP Width
     Retirement  Uops Retired
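
The hardware-events slide says that statistics McPAT needs but counters do not expose are estimated from counters that are available. The slides do not give the formulas, so the following is only a minimal sketch of the idea. The function name `estimate_fp_ops`, its parameters, and the rule itself (scalar operations plus packed operations times vector width) are illustrative assumptions, not WattWatcher's actual derivation.

```python
def estimate_fp_ops(fp_scalar: int, fp_packed: int, fp_width: int) -> int:
    """Approximate floating-point operations in one sampling interval.

    Assumption (not from the slides): each packed instruction performs
    `fp_width` operations and each scalar instruction performs one.
    """
    if fp_scalar < 0 or fp_packed < 0:
        raise ValueError("counter values must be non-negative")
    if fp_width < 1:
        raise ValueError("fp_width must be at least 1")
    return fp_scalar + fp_packed * fp_width


if __name__ == "__main__":
    # Hypothetical counter deltas for one interval.
    print(estimate_fp_ops(fp_scalar=100, fp_packed=50, fp_width=4))
```

A derived value like this would stand in for a McPAT input that has no direct counter; the real tool may combine counters differently.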

  9. WattWatcher Toolkit Overview
     [Figure: block diagram of the toolkit, split between the SUT and the Analyzer and joined by a Network Connection. Labels: Processor, Hardware Counter Events, Operating System, HW Event Interface, Application, WattWatcher Collector, Hardware Descriptor, Events Descriptor, WattWatcher Analyzer, Input Formatter, Customized McPAT, Output Formatter, Power Analysis, Admin Controller, WattWatcher Control]

  13. WattWatcher Calibration
     ▪ McPAT typically underestimates power [10]
     ▪ Small amount of coarse-grained calibration required
     ▪ More sophisticated corrections are available [13]
     [Figure: WattWatcher and RAPL power (Watts) and percentage error, plotted against frequency from 800 to 2200]
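
The calibration slide only states that a small amount of coarse-grained calibration against a reference is needed; it does not say what form the correction takes. As a minimal sketch, assuming the correction is a single multiplicative factor fitted by least squares against reference readings (for example RAPL), it could look like this. `fit_scale` and `apply_scale` are hypothetical names.

```python
from typing import Sequence


def fit_scale(model_watts: Sequence[float], reference_watts: Sequence[float]) -> float:
    """Least-squares scale k minimizing sum((k * model - reference)^2).

    Assumption: one global multiplicative factor is enough. The slides
    do not specify the calibration form.
    """
    if len(model_watts) != len(reference_watts) or not model_watts:
        raise ValueError("inputs must be non-empty and of equal length")
    num = sum(m * r for m, r in zip(model_watts, reference_watts))
    den = sum(m * m for m in model_watts)
    if den == 0:
        raise ValueError("model estimates are all zero")
    return num / den


def apply_scale(model_watts: Sequence[float], k: float) -> list:
    """Scale raw model estimates by the fitted factor."""
    return [k * m for m in model_watts]


if __name__ == "__main__":
    # Hypothetical readings where the model underestimates by a constant ratio.
    k = fit_scale([10.0, 20.0], [12.0, 24.0])
    print(k, apply_scale([15.0], k))
```

A factor above 1 corresponds to the slide's point that McPAT tends to underestimate power.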

  14. Verification
     ▪ Intel Sandy Bridge Laptop
       – Intel i7 2720QM
       – 32nm Process
       – 45W TDP
     ▪ Workloads: SPECFP + SPECINT [11], PARSEC [12]
     ▪ Compared against RAPL counters
     ▪ Other SUTs: Intel Haswell and AMD Piledriver
     ▪ All results use previous coarse-grained calibration

  15. Verification
     [Figure: two panels, "xalancbmk : SPECint" and "bwaves : SPECfp", each plotting Watts and Percentage Error against sample number]
     ▪ Pearson correlation coefficient (0.982, 0.995)
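
The verification slide summarizes agreement between the two power traces with a Pearson correlation coefficient. For reference, a minimal sketch of that statistic in plain Python follows. The function name `pearson` and the sample data are illustrative assumptions, not taken from the slides.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-sample power traces (Watts): estimate vs. reference.
    estimate = [12.1, 13.4, 15.0, 14.2]
    reference = [12.0, 13.6, 15.3, 14.0]
    print(pearson(estimate, reference))
```

A value close to 1 means the estimate tracks the shape of the reference trace; it says nothing about a constant offset or scale error, which is why absolute error metrics are reported separately.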

  16. Verification
     [Figure: two bar charts of MAE and RMSE (Watts) and MAPE (Percentage) per workload]
     ▪ MAPE over all workloads is 2.67%
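
The charts on this slide report MAE, RMSE, and MAPE. A minimal sketch of the standard definitions of these three metrics is given below; the function name `error_metrics` and the sample values are illustrative assumptions, and the slides do not show how the authors computed them.

```python
import math
from typing import Sequence, Tuple


def error_metrics(estimated: Sequence[float],
                  reference: Sequence[float]) -> Tuple[float, float, float]:
    """Return (MAE in Watts, RMSE in Watts, MAPE in percent)."""
    n = len(estimated)
    if n == 0 or n != len(reference):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(r == 0 for r in reference):
        raise ValueError("MAPE is undefined when a reference value is zero")
    errors = [e - r for e, r in zip(estimated, reference)]
    mae = sum(abs(err) for err in errors) / n
    rmse = math.sqrt(sum(err * err for err in errors) / n)
    mape = 100.0 * sum(abs(err) / abs(r) for err, r in zip(errors, reference)) / n
    return mae, rmse, mape


if __name__ == "__main__":
    # Hypothetical samples (Watts): model estimate vs. reference reading.
    mae, rmse, mape = error_metrics([11.0, 18.0], [10.0, 20.0])
    print(mae, rmse, mape)
```

MAE and RMSE are in Watts, so they depend on the absolute power level; MAPE normalizes by the reference, which makes it comparable across workloads.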

  17. Case Studies 1: Per Core Power Measurements
     ▪ canneal workload (PARSEC)
     ▪ Per core and aggregate breakdown
     ▪ RAPL cannot provide core-level breakdown
     [Figure: aggregate power of all cores and per-core power for Core 0 to Core 3 (Watts) against Runtime (s), with Phase 1, Phase 2, and Phase 3 marked]

  18. Case Studies 2: Big Data Workloads
     ▪ Big Data Workloads: Hadoop
     [Figure: Leakage Power and Dynamic Power (Watts) with CPU Utilization (%)]

  19. Case Studies 3: Functional Unit Breakdowns
     ▪ Word count power breakdown
     [Figure: Percentage of Total Watts per unit (Fetch, IC, ALU, LS, L2, DC, OoO) and Total Watts against Runtime (s)]

  20. Conclusion
     ▪ WattWatcher fills an important role in power estimation techniques
       – Real-time results on real hardware
       – Highly configurable models
       – Minimal calibration required
       – Verified over different processors and vendors
         • MAPE = 2.67% averaged over all benchmarks
       – Illustrated over a number of interesting case studies
     Thank you!

  21. References
     ▪ [1] R. Ge et al., “PowerPack: Energy profiling and analysis of high performance systems and applications,” IEEE Transactions on Parallel and Distributed Systems, vol. 21, no. 5, pp. 658–671, May 2010.
     ▪ [2] W. Bircher and L. John, “Complete system power estimation: A trickledown approach based on performance events,” in ISPASS, April 2007, pp. 158–168.
     ▪ [3] G. Contreras and M. Martonosi, “Power prediction for Intel XScale processors using performance monitoring unit events,” in ISLPED ’05, 2005.
     ▪ [4] S. Gurumurthi et al., “Using complete machine simulation for software power estimation: The SoftWatt approach,” in HPCA, 2002.
     ▪ [5] C. Isci and M. Martonosi, “Runtime power monitoring in high-end processors: Methodology and empirical data,” in MICRO, Dec 2003, pp. 93–104.
     ▪ [6] R. Joseph and M. Martonosi, “Run-time power estimation in high performance microprocessors,” in ISLPED, 2001.
     ▪ [7] “AMD BIOS and Kernel Developer’s Guide for AMD Family 15h Models 00h-0Fh Processors,” http://support.amd.com/TechDocs/.
     ▪ [8] J. Dongarra et al., “Energy footprint of advanced dense numerical linear algebra using tile algorithms on multicore architectures,” in CGC, Nov 2012, pp. 274–281.
     ▪ [9] D. Brooks, V. Tiwari, and M. Martonosi, “Wattch: A framework for architectural-level power analysis and optimizations,” in ISCA, June 2000, pp. 83–94.
     ▪ [10] S. Li et al., “McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures,” in MICRO, Dec 2009, pp. 469–480.
     ▪ [11] J. L. Henning, “SPEC CPU2006 benchmark descriptions,” SIGARCH Comput. Archit. News.
     ▪ [12] C. Bienia, “Benchmarking modern multiprocessors,” Ph.D. dissertation, Princeton University, January 2011.
     ▪ [13] W. Lee et al., “PowerTrain: A Learning-based Calibration of McPAT Power Models,” in ISLPED, 2015.
