
The Elusive Metric for Low-Power Architecture Research



  1. The Elusive Metric for Low-Power Architecture Research
     Hsien-Hsin “Sean” Lee, Joshua B. Fryman, A. Utku Diril, Yuvraj S. Dhillon
     Center for Experimental Research in Computer Systems, Georgia Institute of Technology, Atlanta, GA 30332
     Workshop for Complexity-Effective Design, San Diego, CA, 2003

  2. Background Picture
     - Energy-Delay product (EDP) [Gonzalez & Horowitz 96]
       - “Power” is meaningless ($\propto$ frequency)
       - “Energy per instruction” is elusive ($\propto CV^{2}$)
       - “Energy × Delay” (J/SPEC or J × IPC) is better
       - Use Alpha-power model: $ED \propto \dfrac{C V_{dd}^{3}}{(V_{dd} - V_{th})^{\alpha}}$
       - Note that EDP has no “physical” meaning
     - Widespread adoption
       - De facto standard in the community
       - Metric for energy and complexity effectiveness
     - New architectural techniques have arrived
       - New hardware exploiting low-power opportunities
       - Temperature-aware power detectors
       - Voltage & frequency scaling
       - Multi-threshold voltage

  3. Outline of the Talk
     - Potential pitfalls
       - Yeah, we all know, it is obvious... but
       - Which “E” goes in the ED product?
       - Impact of new hardware (more transistors)
       - Methodology matters in deep submicron processes
     - Observations
     - Summary

  4. Calculating ED Product
     - New architecture solutions save energy at the expense of (insensitive) performance loss
     - A number of research results were reported in the following manner:
     - Technique “X” for Data Cache
       - Reduces Data Cache energy by 50%
       - Loses 20% IPC
       - EDP = (1 - 0.5) × (1 + 0.2) = 0.60 ⇒ Very energy efficient
     - Technique “Y” for Branch Predictor
       - Reduces Branch Predictor energy by 10%
       - Loses 20% IPC
       - EDP = (1 - 0.1) × (1 + 0.2) = 1.08 ⇒ Energy inefficient
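The arithmetic on this slide can be reproduced with a minimal Python sketch. The function name `edp_ratio` is hypothetical, and, as on the slide, a 20% IPC loss is treated as a 20% delay increase.

```python
def edp_ratio(energy_saved: float, delay_increase: float) -> float:
    """Relative energy-delay product of a new design versus its baseline.

    Both arguments are fractions (0.5 means 50%). A result below 1.0
    counts as an improvement under this metric.
    """
    return (1.0 - energy_saved) * (1.0 + delay_increase)


# Technique "X": 50% Data Cache energy saved, 20% IPC lost.
edp_x = edp_ratio(energy_saved=0.50, delay_increase=0.20)
# Technique "Y": 10% Branch Predictor energy saved, 20% IPC lost.
edp_y = edp_ratio(energy_saved=0.10, delay_increase=0.20)

print(f"X: {edp_x:.2f}  Y: {edp_y:.2f}")
```

Note that each energy figure here is local to one unit (Data Cache, Branch Predictor), while the delay is a whole-program effect.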

  5. So What is E and What is D in EDP?
     - Hypothetical black box
       - Battery (i.e. E) is shared by CPU, DRAM, chipsets, graphics, TFT, Wi-Fi, HDD, flash disk
       - D typically accounts for some system effect such as DRAM latency
     - Improvement proposed:
       - Remove 5% of E from flash disk
       - No delay incurred
     - Is this a good design decision?
       - Flash disk is 10% of total E in system
       - Improvement amounts to 0.5% system impact
       - “In-the-noise” improvement
       - Is the “complexity” worth the effort?
     - So, is EDP used in the right way? And is EDP so important?
     - [Figure: system block diagram with DDR-DRAM, Gfx card, C.S., flash, HDD, 802.11, TFT display, and battery]
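The flash-disk example amounts to scaling a component-level saving by that component's share of system energy. A minimal sketch follows, with hypothetical function names and the assumption that the listed components partition total system energy.

```python
def system_energy_saving(component_share: float, component_saving: float) -> float:
    """Fraction of total system energy removed by a component-level saving."""
    return component_share * component_saving


def system_edp_ratio(component_share: float, component_saving: float,
                     delay_increase: float = 0.0) -> float:
    """System-level EDP ratio (new / baseline) for a single-component change."""
    system_energy = 1.0 - system_energy_saving(component_share, component_saving)
    return system_energy * (1.0 + delay_increase)


# Flash disk: 10% of system energy, 5% of it removed, no delay incurred.
flash_saving = system_energy_saving(component_share=0.10, component_saving=0.05)
flash_edp = system_edp_ratio(component_share=0.10, component_saving=0.05)

print(f"system energy saved: {flash_saving:.1%}, system EDP ratio: {flash_edp:.3f}")
```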

  6. Energy Efficiency: E versus D
     - [Figure: maximum delay tolerance (log scale, 0.0001 to 100) versus power distribution of a FU w.r.t. target system (0 to 1); one curve each for Esaved = 99%, 90%, 58%, 50%, 30%, 10%, 5%]
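The slides do not print the formula behind these curves. One break-even condition that is consistent with the numbers quoted elsewhere in the deck is sketched below. The symbols are assumptions introduced here, not the authors' notation: $p$ is the fraction of system energy drawn by the functional unit, $s$ is the fraction of that unit's energy saved, and $d$ is the relative delay increase.

$$
(1 - p\,s)\,(1 + d) \le 1
\quad\Longrightarrow\quad
d_{max} = \frac{p\,s}{1 - p\,s}
$$

With $p = 0.43$ and $s = 0.58$ this gives $d_{max} \approx 0.33$, which agrees with the 33% tolerance quoted for CPU=100% on the Filter Cache slide.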

  7. Example: Energy Efficiency: E vs. D
     - [Figure: maximum delay tolerance (log scale, 0.0001 to 100) versus energy distribution w.r.t. target system (0 to 1); one curve each for Esaved = 99%, 90%, 58%, 50%, 30%, 10%, 5%; annotated point: “Tolerate ~25% performance loss”]

  8. Using EDP: Pentium Pro
     - Data source: [Brooks et al. 00]
     - Assume CPU is 100% of the system
     - 40% IFU power reduction can tolerate < 10% performance loss
     - [Figure: maximum delay tolerance (0 to 0.3) versus energy saved for a functional unit u (0 to 1); curves for IFU (22%), IEU (14%), ROB, DCU (11.1%), RS, FPU, Global Clock (7.9%), RAT, MOB (6.3%), BTB (4.7%)]
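The IFU claim can be checked with a short sketch. It assumes a break-even rule that the slides do not state explicitly: a unit drawing a fraction `share` of total energy and saving a fraction `saving` of it tolerates a delay increase of up to `share * saving / (1 - share * saving)` before the EDP gets worse. The function name is hypothetical; the shares are the percentages from the figure legend, with the CPU taken as 100% of the system.

```python
def max_delay_tolerance(share: float, saving: float) -> float:
    """Largest relative delay increase that keeps the EDP ratio at or below 1."""
    removed = share * saving  # fraction of total energy removed
    return removed / (1.0 - removed)


# Per-unit shares of Pentium Pro power, as listed in the figure legend.
PENTIUM_PRO_SHARES = {
    "IFU": 0.22,
    "IEU": 0.14,
    "ROB or DCU": 0.111,
    "RS, FPU, or Global Clock": 0.079,
    "RAT or MOB": 0.063,
    "BTB": 0.047,
}

# Tolerance for a 40% energy reduction applied to each unit in turn.
tolerances = {unit: max_delay_tolerance(share, saving=0.40)
              for unit, share in PENTIUM_PRO_SHARES.items()}

for unit, tol in tolerances.items():
    print(f"{unit:26s} tolerates up to {tol:.1%} delay increase")
```

Under this assumption the IFU tolerance comes out just under 10%, in line with the slide, and every smaller unit tolerates less.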

  9. But CPU is not 100% of a System
     - [Figure: 3-D plot of maximum delay tolerance (0 to 150) versus energy distribution of µ w.r.t. CPU only (0 to 1) and energy saving for a functional unit µ (0 to 1); surfaces for CPU = 100%, 75%, 50%, 25%]

  10. Case Study: Filter Cache [Kin et al. 97, 00]
      - The Filter Cache design as reported
        - 58% energy savings in “L1 Caches”
        - 21% IPC degradation
      - ED product as shown
        - (1 - 0.58)(1 + 0.21) << 1
        - suggests this is a winning design
      - Question is “which E?”

  11. Filter Cache: E Values
      - Esaved = 58% [Kin et al. 00]
      - Use StrongARM 110
        - 43% (◊) of energy consumed by caches
          - 27% in I-cache
          - 16% in D-cache
      - CPU=X% stands for X% of overall power drawn by CPU
      - Delay tolerance
        - 33% : CPU=100%
        - 21% : CPU=70%
        - 14% : CPU=50%
        - 6% : CPU=25%
      - Not energy-efficient if CPU < 70%
      - [Figure: maximum delay tolerance (0 to 1.4) versus energy distribution for a functional unit u w.r.t. CPU only (0 to 1); curves for CPU = 100%, 70%, 50%, 25%; the FilterCache point on SA-110 (I$+D$ = 43%) is marked against the FC slowdown of 21%]
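The delay tolerances listed on this slide can be approximately reproduced with the sketch below. It assumes the system-level energy removed is the product of the cache share of CPU energy, the reported cache saving, and the CPU share of system energy, and that the break-even delay is `removed / (1 - removed)`. The slides do not state this rule; the function and constant names are hypothetical.

```python
def max_delay_tolerance(unit_share_of_cpu: float, energy_saved: float,
                        cpu_share_of_system: float) -> float:
    """Break-even relative delay increase for a saving inside one CPU unit."""
    removed = unit_share_of_cpu * energy_saved * cpu_share_of_system
    return removed / (1.0 - removed)


CACHE_SHARE = 0.43   # I-cache 27% + D-cache 16% of StrongARM 110 energy
ENERGY_SAVED = 0.58  # reported Filter Cache saving in the L1 caches
SLOWDOWN = 0.21      # reported IPC degradation

tolerances = {}
for cpu_share in (1.00, 0.70, 0.50, 0.25):
    tol = max_delay_tolerance(CACHE_SHARE, ENERGY_SAVED, cpu_share)
    tolerances[cpu_share] = tol
    verdict = "EDP improves" if tol >= SLOWDOWN else "EDP gets worse"
    print(f"CPU={cpu_share:.0%}: tolerance {tol:.1%} vs slowdown {SLOWDOWN:.0%} -> {verdict}")
```

Under these assumptions the 21% slowdown is only covered when the CPU draws roughly 70% or more of system power, which is the slide's conclusion.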

  12. Rethinking EDP: Switching Activity vs. New Hardware
      - Ignore leakage and short-circuit power
      - Dynamic switching power is dominant
      - The “E” would be as below
        - T: transistor count
        - f: frequency

      $$P_{dyn} = a \cdot f \cdot C \cdot V_{dd}^{2} = a \cdot f \cdot C_{g_{avg}} \cdot T \cdot V_{dd}^{2}$$

      $$P_{dyn_{ref}} \ge P_{dyn_{new}}$$

      $$a_{ref} \cdot f \cdot T \ge a_{new} \cdot (f + \Delta f) \cdot (T + \Delta T)$$

  13. ED Variables
      - The elegant ratio governing E...

      $$\frac{a_{ref}}{a_{new}} \ge 1 + \frac{\Delta f}{f} + \frac{\Delta T}{T} + \frac{\Delta f\,\Delta T}{f\,T}$$

      - To include the application delay, D (keeping first-order terms in Δf and ΔT)...

      $$\frac{a_{ref}}{a_{new}} \ge \left(1 + \frac{\Delta f}{f} + \frac{\Delta T}{T}\right)\left(1 + \frac{\Delta D}{D}\right)^{2}$$

      - Can be applied to macromodeling to determine the trade-off between transistor count and performance degradation
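As a sketch of how the delay-inclusive bound might be used for macromodeling, the function below solves it for the transistor budget. It assumes the bound has the form a_ref/a_new ≥ (1 + Δf/f + ΔT/T)(1 + ΔD/D)², and that an "x% switching reduction" means a_new = (1 - x)·a_ref. The function name and parameters are hypothetical.

```python
def max_transistor_growth(switching_reduction: float,
                          delay_increase: float = 0.0,
                          freq_change: float = 0.0) -> float:
    """Largest dT/T allowed by a_ref/a_new >= (1 + df/f + dT/T) * (1 + dD/D)**2.

    All arguments are fractions: 0.30 means switching activity drops by 30%.
    A negative result means transistors must be removed to break even.
    """
    activity_ratio = 1.0 / (1.0 - switching_reduction)  # a_ref / a_new
    return activity_ratio / (1.0 + delay_increase) ** 2 - 1.0 - freq_change


# 30% less switching, frequency and delay unchanged.
budget_no_delay = max_transistor_growth(0.30)
# Same switching reduction, but the application runs 10% longer.
budget_with_delay = max_transistor_growth(0.30, delay_increase=0.10)

print(f"no delay: {budget_no_delay:.1%}, with 10% delay: {budget_with_delay:.1%}")
```

The squared delay term makes the transistor budget shrink quickly: under these assumptions, a 10% delay increase uses up more than half of the headroom bought by a 30% switching reduction.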

  14. Impact of Additional Transistor Count
      - Given a new average switching probability of the new architecture
      - LHS: trading transistors with delay given no freq. scaling
      - RHS: delay recovered by freq. scaling
      - [Figure, left: % impact on D versus % impact on T (given freq. unchanged); curves for 30%, 25%, and 10% switching reduced]
      - [Figure, right: % impact on f versus % impact on T (given delay unchanged by frequency scaling); curves for 30%, 25%, and 10% switching reduced]

  15. Role of Leakage Energy
      - As the Deep Sub-Micron (DSM) era is upon us...
        - More than 50% of power from leakage (Source: Intel Corp., Custom Integrated Circuits Conference 2002)
      - Ignoring leakage could reverse the conclusion
      - Early architecture evaluation
        - Leakage cannot be isolated from switching during evaluation
        - Additional HW can be harmful

  16. Evaluate the Leakage when Adding HW in Early Stage of Arch Definition
      - Example: Dual-speed pipeline [Pyreddy and Tyson ’01]
      - Idea appears to be plausible
        - Identify critical instructions [Tune et al. 01] [Seng et al. 01]
        - Two datapaths: fast and slow
        - Critical inst → fast pipe; remainder to slow
        - Slow pipe consumes less E than fast pipe
        - E.g. multi-voltage supply, lower frequency
      - Let’s evaluate and assume:
        - N instructions
        - x → slow datapath
        - (N - x) → fast datapath
      - How does leakage impact efficiency?
      - What x value to achieve energy efficiency?
      - [Figure: instruction stream split into x% non-critical instructions sent to the slow datapath and (1 - x)% critical instructions sent to the fast datapath]

  17. Dual Datapath Leakage Impact
      - “r” is the power ratio of slow vs. fast
      - A small r ⇒ impairs performance
        - Slow path becomes critical path
      - [Figure: minimum instructions to slow datapath (0 to 0.5) versus static-to-total energy ratio (0 to 1); curves for r = 0.9, 0.75, 0.60, 0.5, 0.4, 0.2; markers for “Today” and “Soon to be”]

  18. Dual Datapath Leakage Impact
      - “r” is the power ratio of slow vs. fast
      - A small r ⇒ impairs performance
        - Slow path becomes critical path
      - % of non-critical inst needed for slow datapath
        - Today: ~17%
        - Soon: ~40%
      - [Figure: minimum instructions to slow datapath (0 to 0.5) versus static-to-total energy ratio (0 to 1); curves for r = 0.9, 0.75, 0.60, 0.5, 0.4, 0.2; markers for “Today” and “Soon to be”]
