Lecture 24: Power-efficient Designs

Dynamic and static power, processor power distribution, low power techniques in processor design, examples

Credits: Zhichun Zhu Thesis defense, HPCA'01 Low Power Tutorial, WRL Cacti Model

Importance of Low-power Designs
- Cost factor for high-end systems
  - Cooling and package cost
  - > 40 W: 1 W → $1
  - Air-cooled techniques: reaching limits
  - Electricity bill
  - Reliability
  - Desktop PCs consume around 10% power in US
- Usability of portable systems
  - Battery lifetime
- Restriction factor for high-performance server design
  - Power determines processor density

Dynamic vs. Static Power
- Dynamic:
  - Charge/discharge capacitors when switching between 0 and 1
  - Short-circuit currents on transitions
- Static (Leakage)
  - From sub-threshold currents

Processor Performance vs. Power Trends
[Figure: log-scale plot (0.1 to 10000) of Frequency (MHz), Transistor (M), and Power (W) for the 80386, 80486, Pentium, Pentium Pro, Pentium II, Pentium III, and Pentium 4, with the time axis running from '85 to '03. Source: Intel.com]

Importance of Low-power Architecture Designs
- Low power CMOS and logic designs alone can no longer solve all power problems.
  $P_{dync} = \frac{1}{2} \cdot C \cdot V^2 \cdot A \cdot f$
  $V' = 0.7V, \quad C' = 0.7 \times 2C, \quad f' = 2f \;\Rightarrow\; P'_{dync} = 1.4 P_{dync}$

Sources of Power Consumption
- Dynamic (dominant) [Tutorial:HPCA-7]
  $P_{dync} = \frac{1}{2} \cdot C \cdot V^2 \cdot A \cdot f$
- Static (2~5%) [Butts:MICRO-33]
  $P_{static} = V \cdot N \cdot k_{design} \cdot I_{leak}$
- C: capacitance, V: supply voltage, A: activity factor, f: clock rate
- N: # transistors, $k_{design}$: design parameter, $I_{leak}$: leakage current
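The dynamic power formula P_dync = 1/2 · C · V² · A · f and the scaling example from the "Importance of Low-power Architecture Designs" slide (V' = 0.7V, C' = 0.7 × 2C, f' = 2f) can be checked numerically. The following is a minimal sketch, not part of the original slides: the function names are hypothetical, and the activity factor A is assumed to stay unchanged under scaling.

```python
def dynamic_power(c, v, a, f):
    """Dynamic power: P_dync = 1/2 * C * V^2 * A * f."""
    return 0.5 * c * v ** 2 * a * f


def static_power(v, n, k_design, i_leak):
    """Static (leakage) power: P_static = V * N * k_design * I_leak."""
    return v * n * k_design * i_leak


def scaled_dynamic_ratio(c_scale=0.7 * 2, v_scale=0.7, f_scale=2.0):
    """Return P'_dync / P_dync when C, V, and f are scaled.

    Assumption (not stated on the slide): the activity factor A is the
    same before and after scaling, so it cancels out of the ratio.
    """
    base = dynamic_power(1.0, 1.0, 1.0, 1.0)
    scaled = dynamic_power(c_scale, v_scale, 1.0, f_scale)
    return scaled / base


# 1.4 (capacitance) * 0.49 (voltage squared) * 2 (frequency) = 1.372
print(f"P'_dync / P_dync = {scaled_dynamic_ratio():.3f}")
```

The ratio comes out at about 1.37, which the slide rounds to 1.4: even with the supply voltage reduced to 0.7V, dynamic power still grows when the clock rate and total capacitance both increase.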
Low-power Techniques
- Physical (CMOS) level
- Circuit level
- Logic level
- Architectural level
- OS level
- Compiler level
- Algorithm/application level

Power-aware Architecture Designs
- Utilize low-power circuit techniques
- Exploit application characteristics
- Play an important role in low-power designs
  - Pentium III 800 MHz processor [CoolChip'00]
  - Scaled from Pentium Pro: 90 watts.
  - After architectural design and optimization: 22 watts.

Tradeoff between Performance and Power
- Objective for general-purpose systems
  - Reduce power consumption without degrading performance
- Common solution
  - Access/activate resources only when necessary
- Question
  - When is it necessary?

Metrics for Power-Performance Efficiency
- Performance (CPU time or Delay)
  $D = I \cdot CPI \cdot \frac{1}{f}$
- Power consumption (P)
- Energy consumption (E)
  $E = P \cdot D$

Metrics for Power-Performance Efficiency
- In most cases, low power consumption implies low performance
  - $f \downarrow \;\Rightarrow\; P \downarrow \quad (P \propto f)$
  - $f \downarrow \;\Rightarrow\; D \uparrow \quad (D \propto \frac{1}{f})$
- Energy-efficiency metric
  $EDP = E \cdot D = P D^2$

Processor Power Distribution Example (Alpha 21264)
[Figure: chart of power consumption by component: Clock, Issue, Caches, FP, Int, Mem, I/O, Others. Source: CoolChip Tutorial]
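The metrics from the "Metrics for Power-Performance Efficiency" slides (D = I · CPI / f, E = P · D, EDP = E · D = P · D²) can be illustrated with a short calculation. This is a sketch under stated assumptions, not data from the lecture: the instruction count, CPI, and the 20 W power figure are hypothetical, and power is modeled as strictly proportional to frequency (P ∝ f) at a fixed supply voltage.

```python
# Hypothetical workload and processor parameters (illustration only).
INSTRUCTIONS = 1e9       # I: dynamic instruction count
CPI = 1.5                # average cycles per instruction
POWER_AT_1GHZ_W = 20.0   # assumed power draw at 1 GHz


def delay(instructions, cpi, freq_hz):
    """CPU time: D = I * CPI * (1 / f)."""
    return instructions * cpi / freq_hz


def energy(power_w, delay_s):
    """Energy: E = P * D."""
    return power_w * delay_s


def edp(power_w, delay_s):
    """Energy-delay product: EDP = E * D = P * D^2."""
    return energy(power_w, delay_s) * delay_s


def metrics_at(freq_hz):
    """Return (P, D, E, EDP) at a given clock rate, assuming P is proportional to f."""
    power_w = POWER_AT_1GHZ_W * freq_hz / 1e9
    d = delay(INSTRUCTIONS, CPI, freq_hz)
    return power_w, d, energy(power_w, d), edp(power_w, d)


for freq in (1.0e9, 0.5e9):
    p, d, e, x = metrics_at(freq)
    print(f"f = {freq / 1e9:.1f} GHz: P = {p:.1f} W, D = {d:.1f} s, "
          f"E = {e:.1f} J, EDP = {x:.1f} J*s")
```

Under these assumptions, halving the frequency halves the power but doubles the delay, so energy is unchanged while EDP doubles. This is why EDP, rather than power alone, is used as the energy-efficiency metric.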
Low Power Processor Design
- Reduce power consumption of processor core
  - Voltage/frequency scaling: reduce supply voltage and/or frequency when processor is idle
  - Clock gating: disable clocks to inactive components
  - Pipeline gating: reduce mis-speculated instruction execution
  - Pipeline balancing: adjust effective pipeline ways for available IPC
  - Efficient issue logic: cluster structure, adjust effective issue queue size, no matching for ready entries, reducing tag matching entries

Low Power Memory Design
- Reduce power consumption of memory components
  - Banked or hierarchical register file
  - Sub-banked cache
  - Sequential access or way prediction caches
  - Dynamically adjusting cache size
  - Decay cache for reducing static power
  - Low power DRAM with deep sleeping modes: four modes in Rambus

Pipeline Gating
- Mis-speculated instructions increase energy consumption, typically 16%-105% overhead
- Pipeline gating: stall fetching when confidence is low
- Prevent “bad” instructions from entering the pipeline: may reduce 38% of wrong instructions
[Figure: pipeline stages fetch, decode, issue, exe/wb, commit, with a branch predictor (BP) and a low-confidence counter that has incr and decr (when?) inputs; fetch is stalled when the low-confidence count > threshold]
Pipeline gating: speculative control for energy reduction, ISCA 1998

Set Associative Cache
[Figure: address split into tag, set, offset; four ways (tag0/data0 through tag3/data3), tag comparators (=?), and a 4:1 mux to the CPU]
- Power per access: 4T + 4D

Phased N-way Cache
[Figure: address split into tag, set, offset; four ways (tag0/data0 through tag3/data3), tag comparators (=?), and a 4:1 mux to the CPU]
- Power per access: 4T + 1D
- But access time increases

Way-prediction N-way Cache
[Figure: address split into tag, set, offset; way-prediction logic selecting among four ways (tag0/data0 through tag3/data3), tag comparators (=?), and a 4:1 mux to the CPU]
- Correct prediction: 1T + 1D
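The per-access costs quoted for the three cache organizations (4T + 4D for the conventional set-associative cache, 4T + 1D for the phased cache, 1T + 1D for a correct way prediction) can be compared with a small model. This is a rough sketch and not from the slides: T and D are treated as abstract energy units for one tag-array and one data-array read, the numeric values are hypothetical, and the misprediction cost is an assumption, since the slides give only the correct-prediction case.

```python
def conventional_energy(n_ways, t, d):
    """All tag and data ways are read in parallel: N*T + N*D."""
    return n_ways * t + n_ways * d


def phased_energy(n_ways, t, d):
    """All tags are read first, then only the matching data way: N*T + 1*D.

    This models the hit case quoted on the slide; the extra phase is
    what makes the access time longer.
    """
    return n_ways * t + d


def way_predicted_energy(n_ways, t, d, accuracy):
    """Expected energy per access with way prediction.

    A correct prediction costs 1T + 1D, as on the slide.
    Assumption (not from the slides): after a misprediction the
    remaining N-1 ways are probed in parallel, adding (N-1)*(T + D).
    """
    correct = t + d
    mispredicted = correct + (n_ways - 1) * (t + d)
    return accuracy * correct + (1.0 - accuracy) * mispredicted


# Hypothetical unit costs: a data-array read is assumed to cost 4x a tag read.
N_WAYS, T, D = 4, 1.0, 4.0

print(f"conventional:  {conventional_energy(N_WAYS, T, D):.2f}")
print(f"phased:        {phased_energy(N_WAYS, T, D):.2f}")
print(f"way-predicted: {way_predicted_energy(N_WAYS, T, D, accuracy=0.9):.2f}")
```

With these assumed numbers both alternatives cut the energy per access well below the conventional design. The phased cache pays with a longer access time, while the way-predicted cache depends on prediction accuracy.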
Low Power Server Design
- Low power considerations in supercomputing
  - Is high-performance processor the best choice?
  - IBM Blue Gene: 64K nodes with PowerPC 440 processors designed for low power
- Power management for high-performance servers
  - Meet performance with minimal active nodes

Power Evaluation Tools
- Processor
  - Wattch
    - Analytical
  - SimplePower
    - Analytical (e.g. cache)
    - Transition-sensitive (e.g. FU)
- Cache
  - CACTI
    - Analytical

Low Power Technique Summary
- Power is critical in processor design: cost and dependability
- Power distributions: clock, issue logic, cache, etc.
- Architectural approaches
  - scale voltage, frequency, and/or pipeline width with required performance
  - reduce mis-speculated execution, eliminate unnecessary cache accesses and data
  - Many others
- System approaches: high-performance by low power processors
- Now low power is as important as performance