
Innovative Power Control for Ultra Low-Power and High-Performance System LSIs

Innovative Power Control for Ultra Low-Power and High-Performance System LSIs. Hiroshi Nakamura (Univ. of Tokyo), Hideharu Amano (Keio Univ.), Masaaki Kondo (Univ. of Electro-Communications), Mitaro Namiki (Tokyo


  1. Innovative Power Control for Ultra Low-Power and High-Performance System LSIs
     Hiroshi Nakamura (Univ. of Tokyo), Hideharu Amano (Keio Univ.), Masaaki Kondo (Univ. of Electro-Communications), Mitaro Namiki (Tokyo Univ. of Agriculture and Tech.), Kimiyoshi Usami (Shibaura Inst. of Tech.)
     JST-CREST ULP Workshop (H.Nakamura)

  2. Objective and Strategy
     - Objective: drastic power reduction of high-performance system LSIs
     - Strategy: innovative power control through tight Co-Optimization / Co-Design of system software, architecture, and circuit design
     - Principle:
       - Performance: limited by a bottleneck
       - Power: summation of whole system
       - Low power and slow operation for unhurried / idle parts
     Figure: layers System Software, Compiler, Architecture, and Circuit Technology linked by Co-Optimization / Co-Design.

  3. Role of Design Hierarchy for Low Power
     Figure: design hierarchy (OS, Architecture, Circuit, Device) annotated with When? / Where? / How?; throttle levers of power/performance: Clock Gating, Dual Vth, DVFS, Power Gating, Back-bias, ...
     - Circuit Level: Provide levers to throttle performance / power
     - Architecture, OS Level: Find a chance to set levers, when and where?
       - architecture: Intra-task/process optimization
       - OS: Inter-task/process optimization

  4. Preferable Throttle Lever
     - Effectiveness of Power Reduction
     - Low Overhead in Area, Performance, Power
       - Controlling the throttle lever itself takes time and consumes power
     - Fine Control Granularity in both Space and Time
       - Locations of busy / idle parts are small and change frequently
     Figure: System LSI containing Reconfig Processor, Processor (fp, int, cache), Cache, Memory, and Network, with busy and idle parts changing over time.

  5. Example of Throttle Levers
     - for dynamic power: Clock Gating, DVFS
       - both effective, DVFS in particular (Power ∝ Vdd²)
       - Clock Gating: very fine-grained control with little overhead; easily utilized within circuit level design
       - DVFS: tens of μs to change Vdd through regulator; moderate granularity
     - for leakage power: Power Gating, Body Biasing
       - both effective, but large overhead in power and performance
       - Body biasing: spatial granularity; statically defined regions; not easy for fine-grained control
     Figure: Power Gating. A circuit block sits between Vdd and VGND; a sleep transistor driven by the sleep signal connects VGND to GND.
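The relation Power ∝ Vdd² on this slide can be illustrated with a small sketch. It assumes the standard CMOS switching-power model P = α·C·Vdd²·f, which the slide does not spell out; the function name and the example operating point are hypothetical.

```python
def relative_dynamic_power(vdd_ratio: float, freq_ratio: float = 1.0) -> float:
    """Dynamic power relative to nominal, assuming P = alpha * C * Vdd^2 * f.

    vdd_ratio and freq_ratio are the scaled values divided by the nominal
    ones. Activity factor (alpha) and capacitance (C) are assumed unchanged.
    """
    if vdd_ratio <= 0 or freq_ratio <= 0:
        raise ValueError("ratios must be positive")
    return vdd_ratio ** 2 * freq_ratio


if __name__ == "__main__":
    # Hypothetical DVFS operating point: 80% of nominal Vdd and clock.
    print(relative_dynamic_power(0.8, 0.8))
```

Under this model, lowering Vdd and frequency together to 80% of nominal roughly halves dynamic power, which is why DVFS is singled out as particularly effective.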

  6. Role of Design Hierarchy for Low Power: The Ideal System
     Figure: design hierarchy (OS, Architecture, Circuit, Device) annotated with When? / Where? / How?
     - Spatial and Temporal Granularity is important
     - Co-Design of Circuit, Architecture and OS for Power
     - Co-Optimization of Throttle Lever Control: especially, Co-Optimization of Spatial and Temporal Granularity
       - ex. activity localization by architecture/OS to make full use of throttle levers characteristics

  7. Team Formation of our Research Project
     Sub-theme (leader):
     - Co-operative System Software with Arch. (Prof. Namiki)
     - Ultra Low-Power Reconf. Network Architecture (Prof. Amano)
     - Data Resident Architecture (Prof. Nakamura)
     - Data Resident Compiler (Prof. Kondo)
     - Ultra Low-Power Circuit Design (Prof. Usami)
     Figure: layers System Software, Architecture/Compiler, and Circuit Design, joined by Co-Optimization of System Software and Architecture and by Co-Optimization of Architecture and Circuit Design.

  8. (Project 1) Geyser: Low Power Processor through Fine-grained Runtime Power Gating
     - Target: Leakage Power
     - Background: Leakage reduction techniques so far
       - Standby time: power-gating (Coarse Grain)
       - Runtime: Cache-decay, Drowsy-cache (Coarse Grain in temporal)
     - Leakage for logic parts (ALU, multiplier, etc.) gets serious
       - Fast but Leaky transistors are used
       - Active ratio of those parts is not necessarily high, but active parts change frequently, that is, cycle by cycle
     - Objective: Reduce runtime leakage power of logic parts
     - Challenge: how to optimize the granularity of power gating

  9. Instruction Pipeline with Power-Gating
     - Geyser: MIPS compatible processor with 5-stage pipeline
     - Straightforward PG (power-gating)
       - Turn EX-units into active mode only if necessary
       - EX-unit gets active when an affecting instruction enters the IF stage
       - The activated EX-unit returns to sleep mode after execution
     Figure: MIPS R3000 pipeline (IF, ID, EX, MEM, WB) with EX-units ALU, Shift, Mult, and Div. For a SHIFT instruction, the pipeline detects which unit will be used and sends a wake-up signal.

  10. Challenges for Run-Time Power-Gating: Energy Overhead
     Figure: power over time across sleep and wake-up, relative to normal leakage. Regions 1 and 3: energy overhead; region 2: part of leakage saving; region 4: net energy saving. Break-Even Time (BET): the point where 1 + 3 = 2.
     - Sleep period should be longer than BET
       - Otherwise, total energy consumption increases
     - BET tells the smallest granularity for Power Gating
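The break-even argument on this slide can be sketched numerically. This is a simplified model, not the authors' measurement method: it assumes the leakage saved per sleeping cycle is constant and that the sleep and wake-up overhead is a single lumped energy. All names and numbers are hypothetical.

```python
def break_even_cycles(overhead_energy: float, leak_saved_per_cycle: float) -> float:
    """Sleep length at which saved leakage equals the sleep/wake-up overhead."""
    if leak_saved_per_cycle <= 0:
        raise ValueError("leak_saved_per_cycle must be positive")
    return overhead_energy / leak_saved_per_cycle


def net_energy_saving(sleep_cycles: float, overhead_energy: float,
                      leak_saved_per_cycle: float) -> float:
    """Net energy saved by one sleep period; negative means energy was wasted."""
    return sleep_cycles * leak_saved_per_cycle - overhead_energy


if __name__ == "__main__":
    # Hypothetical unit: overhead of 20 energy units, 2 units saved per cycle.
    bet = break_even_cycles(20.0, 2.0)
    for cycles in (5, bet, 30):
        print(cycles, net_energy_saving(cycles, 20.0, 2.0))
```

A sleep period shorter than the BET yields a negative net saving, which matches the slide's point that total energy consumption increases in that case.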

  11. Break Even Time of Each Functional Unit
     Figure: BET in cycles @200MHz for ALU, Shift, Mult, Div, and CP0 at 25 ℃, 65 ℃, 100 ℃, and 125 ℃, in 90 nm technology (bar labels range from 2 to 114 cycles).
     - BET is shortened when the chip temperature climbs up
       - Leakage current depends on temperature heavily
     - We need Novel PG strategies taking BET into account

  12. Power Gating Strategies
     Requirement: Power off Ex-units longer than BET
     - static strategy
       - straightforward: Ex-units always in sleep after execution
       - ideal compiler (ideal compiler-directed): exact average idle time of Ex-units after each instruction is known (for reference only)
     - dynamic strategy
       - L1 miss: Ex-units fall asleep only if encountering L1 cache misses (L1 miss penalty = 15 cycles)
       - L2 miss: Ex-units fall asleep only if encountering L2 cache misses (L2 miss penalty = 200 cycles)
     - both static and dynamic strategies
       - ideal compiler + L2 cache miss
     - ideal (God): ideal dynamic strategy
       - exact idle time of Ex-units is known at any time; upper limit of PG (for reference only)
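The strategies listed on this slide can be compared with a small sketch. It is an illustration under stated assumptions, not the evaluation setup used by the authors: the energy model is linear (a sleep of `cycles` saves `cycles - bet` in units of per-cycle leakage), an L2 miss is assumed to imply an L1 miss, and every name, label, and number is hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class IdleInterval:
    cycles: int               # actual idle length of the EX-unit
    cause: str                # "none", "l1_miss", or "l2_miss" (hypothetical labels)
    compiler_avg_idle: float  # average idle time an ideal compiler would know


def should_sleep(strategy: str, iv: IdleInterval, bet: int) -> bool:
    """Decide whether the EX-unit is power-gated for this idle interval."""
    compiler_says = iv.compiler_avg_idle > bet
    if strategy == "straightforward":
        return True                                # always sleep after execution
    if strategy == "ideal_compiler":
        return compiler_says
    if strategy == "l1_miss":
        return iv.cause in ("l1_miss", "l2_miss")  # assumes L2 miss implies L1 miss
    if strategy == "l2_miss":
        return iv.cause == "l2_miss"
    if strategy == "ideal_compiler+l2_miss":
        return compiler_says or iv.cause == "l2_miss"
    if strategy == "ideal":
        return iv.cycles > bet                     # exact idle time is known
    raise ValueError(f"unknown strategy: {strategy}")


def net_saving(strategy: str, intervals: Iterable[IdleInterval], bet: int) -> int:
    """Net saving in leakage-cycle units; each sleep pays an overhead worth bet cycles."""
    return sum(iv.cycles - bet for iv in intervals if should_sleep(strategy, iv, bet))


if __name__ == "__main__":
    trace = [IdleInterval(5, "none", 6.0),
             IdleInterval(15, "l1_miss", 6.0),
             IdleInterval(200, "l2_miss", 6.0)]
    for name in ("straightforward", "ideal_compiler", "l1_miss", "l2_miss",
                 "ideal_compiler+l2_miss", "ideal"):
        print(name, net_saving(name, trace, bet=20))
```

With a BET of 20 cycles in this toy trace, the straightforward and L1-miss policies lose energy on the short intervals, while the L2-miss policy only sleeps during the 200-cycle stall.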

  13. Result for Frequently Used Execution Unit: FPADD for MGRID
     Figure: relative energy compared to non-PG versus BET (cycle) for straightforward, ideal compiler, L1, L2, ideal comp. + L2, and ideal (God).
     - straightforward: BET is longer than sleep time; waste of energy
     - ideal compiler: less chance for longer BET
     - L1: resulting sleep time is about 15; ideal for BET<15, but waste of energy for longer BET
     - L2: resulting sleep time is 200; ideal for longer BET
     - for shorter BET, compiler is effective

  14. Collaboration with Compiler / OS: Suggested Power Gating Strategy
     - Co-optimization on Control Granularity of the PG lever
       - compiler direction by assuming short BET, because compiler-directed PG is effective for shorter BET
       - for shorter BET (high temperature), compiler direction is put into use, and take (compiler + L2-miss) strategy
       - for longer BET (low temperature), take L2-miss strategy, but ignore compiler direction
     - OS is expected to switch between strategies by observing changes on BET
     - Power Gating Collaborated with Compiler / OS
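The switching rule suggested on this slide can be sketched as a small selector. The slides do not state at which BET the OS should switch, so the threshold below is purely a hypothetical parameter, as are the function and strategy names.

```python
def select_pg_strategy(bet_cycles: float, short_bet_threshold: float = 15.0) -> str:
    """Pick a power-gating strategy from the currently observed BET.

    short_bet_threshold is a hypothetical tuning knob; the source gives
    no concrete switch point.
    """
    if bet_cycles < 0:
        raise ValueError("bet_cycles must be non-negative")
    if bet_cycles <= short_bet_threshold:
        # Short BET (high temperature): follow the compiler direction
        # and also sleep on L2 misses.
        return "compiler+l2_miss"
    # Long BET (low temperature): ignore the compiler direction.
    return "l2_miss"


if __name__ == "__main__":
    for bet in (8, 15, 44):
        print(bet, select_pg_strategy(bet))
```

An OS-level controller could call such a selector whenever its BET estimate changes, for example after a temperature change.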
