Innovative Power Control for Ultra Low-Power and High-Performance System LSIs
  Hiroshi Nakamura (Univ. of Tokyo)
  Hideharu Amano (Keio Univ.)
  Masaaki Kondo (Univ. of Electro-Communications)
  Mitaro Namiki (Tokyo Univ. of Agriculture and Tech.)
  Kimiyoshi Usami (Shibaura Inst. of Tech.)
  JST-CREST ULP Workshop (H.Nakamura)
Objective and Strategy
  Objective: drastic power reduction of high-performance system LSIs
  Strategy: innovative power control through tight co-optimization / co-design of system software, architecture, and circuit design
  Principle:
    Performance: limited by a bottleneck
    Power: summation of the whole system
    Low power and slow operation for unhurried / idle parts
  [Figure: layer stack of System Software, Compiler, Architecture, and Circuit Technology, spanned by "Co-Optimization / Co-Design"]
Role of Design Hierarchy for Low Power
  [Figure: design hierarchy (OS, Architecture, Circuit, Device) annotated with the questions When?, Where?, and How?; the circuit level is the throttle lever of power/performance. Levers shown: Clock Gating, Dual Vth, DVFS, Power Gating, Back-bias, ...]
  Circuit level: provide levers to throttle performance / power
  Architecture, OS level: find a chance to set levers, when and where?
    Architecture: intra-task/process optimization
    OS: inter-task/process optimization
Preferable Throttle Lever
  Effectiveness of power reduction
  Low overhead in area, performance, and power
    Controlling the throttle lever itself takes time and consumes power
  Fine control granularity in both space and time
    Locations of busy / idle parts are small and change frequently
  [Figure: a system LSI containing a reconfigurable processor, processors (fp, int, cache), memory, and network; the busy parts move over time]
Example of Throttle Levers
  For dynamic power: Clock Gating, DVFS
    Both effective, DVFS in particular (Power ∝ Vdd^2)
    Clock gating: very fine-grained control with little overhead; easily utilized within circuit-level design
    DVFS: tens of μs to change Vdd through the regulator; moderate granularity
  For leakage power: Power Gating, Body Biasing
    Both effective, but large overhead in power and performance
    Body biasing: spatial granularity is limited to statically defined regions
    Not easy for fine-grained control
  [Figure: Power Gating. A circuit block sits between Vdd and a virtual ground (VGND); a sleep transistor driven by the sleep signal connects VGND to GND]
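The Power ∝ Vdd^2 relation on this slide can be illustrated with a few lines of arithmetic. This is a minimal sketch, not taken from the slides: the function name and the optional frequency factor (the usual P ∝ f · Vdd^2 form for switching power) are assumptions added for illustration.

```python
def relative_dynamic_power(vdd_ratio: float, freq_ratio: float = 1.0) -> float:
    """Dynamic power relative to nominal, assuming P is proportional to f * Vdd^2.

    vdd_ratio  -- scaled supply voltage divided by nominal Vdd
    freq_ratio -- scaled clock frequency divided by nominal (assumed parameter)
    """
    return freq_ratio * vdd_ratio ** 2


if __name__ == "__main__":
    for v in (1.0, 0.9, 0.8, 0.7):
        voltage_only = relative_dynamic_power(v)
        with_frequency = relative_dynamic_power(v, v)  # frequency scaled along with Vdd
        print(f"Vdd x{v:.1f}: power x{voltage_only:.2f}, "
              f"with f scaled too: x{with_frequency:.2f}")
```

Under this model, lowering Vdd by 20% cuts dynamic power to about 64% of nominal even before any frequency reduction is counted, which is why DVFS is singled out as the more effective lever.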
Role of Design Hierarchy for Low Power: The Ideal
  [Figure: design hierarchy (System, OS, Architecture, Circuit, Device) annotated with When?, Where?, and How?; "Spatial and Temporal Granularity is important"]
  Co-design of circuit, architecture, and OS for power
  Co-optimization of throttle lever control: especially, co-optimization of spatial and temporal granularity
    ex. activity localization by architecture/OS to make full use of the throttle levers' characteristics
Team Formation of our Research Project
  Sub-theme (leader):
    Co-operative System Software with Arch. (Prof. Namiki)
    Ultra Low-Power Reconf. Network Architecture (Prof. Amano)
    Data Resident Architecture (Prof. Nakamura)
    Data Resident Compiler (Prof. Kondo)
    Ultra Low-Power Circuit Design (Prof. Usami)
  [Figure: layers System Software, Architecture/Compiler, and Circuit Design, bracketed by "Co-Optimization of System Software and Architecture" and "Co-Optimization of Architecture and Circuit Design"; icons show a reconfigurable processor, a processor (fp, int, cache), memory, and logic blocks supplied by VddH / VddL]
(Project 1) Geyser: Low Power Processor through Fine-grained Runtime Power Gating
  Target: leakage power
  Background: leakage reduction techniques so far
    Standby time: power gating (coarse grain)
    Runtime: cache decay, drowsy cache (coarse grain in time)
  Leakage in logic parts (ALU, multiplier, etc.) is becoming serious
    Fast but leaky transistors are used
    The active ratio of those parts is not necessarily high, but the active parts change frequently, that is, cycle by cycle
  Objective: reduce runtime leakage power of logic parts
  Challenge: how to optimize the granularity of power gating
Instruction Pipeline with Power-Gating
  Geyser: MIPS-compatible processor with a 5-stage pipeline
  Straightforward PG (power gating): turn EX-units into active mode only if necessary
    An EX-unit gets active when an affecting instruction enters the IF stage
    The activated EX-unit returns to sleep mode after execution
  [Figure: MIPS R3000 pipeline (IF, ID, EX, MEM, WB) with EX-units ALU, Shift, Mult, and Div. For a SHIFT instruction, the front end detects which unit will be used and sends a wake-up signal to the Shift unit]
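The wake-up decision in the straightforward scheme can be sketched as a lookup from the fetched instruction to the EX-unit it needs. This is only an illustration: the mnemonic table and the function name `wake_signals` are assumptions, and the real Geyser decode logic is not described on the slide.

```python
from typing import List, Optional

# Hypothetical mapping from a few MIPS mnemonics to the EX-unit they occupy.
# The unit names follow the slide (ALU, Shift, Mult, Div); the table itself
# is an assumption made for this sketch.
UNIT_OF = {
    "add": "ALU", "addu": "ALU", "sub": "ALU", "and": "ALU", "or": "ALU",
    "sll": "Shift", "srl": "Shift", "sra": "Shift",
    "mult": "Mult", "multu": "Mult",
    "div": "Div", "divu": "Div",
}


def wake_signals(program: List[str]) -> List[Optional[str]]:
    """For each fetched instruction, return the EX-unit to wake (or None)."""
    return [UNIT_OF.get(mnemonic) for mnemonic in program]


if __name__ == "__main__":
    program = ["addu", "sll", "mult", "j"]
    for mnemonic, unit in zip(program, wake_signals(program)):
        action = f"wake {unit}" if unit else "no EX-unit woken"
        print(f"{mnemonic:5s} -> {action}")
```

In this sketch every unit that is not named by the current instruction is assumed to be asleep, which matches the "return to sleep after execution" rule but ignores wake-up latency.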
Challenges for Run-Time Power-Gating: Energy Overhead
  [Figure: power versus time across Sleep and Wake-Up, relative to normal leakage. Regions (1) and (3): energy overhead. Region (2): part of the leakage saving. Region (4): net energy saving]
  Break-Even Time (BET): the sleep duration at which the energy overhead equals the leakage saving, (1) + (3) = (2)
  The sleep period should be longer than BET; otherwise, total energy consumption increases
  BET tells the smallest granularity for power gating
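The break-even condition can be written down as a small calculation. The sketch below assumes a deliberately simple model that is not stated on the slide: a fixed energy overhead per sleep/wake-up pair and a constant leakage saving per cycle while asleep. The function names and the numbers are illustrative only.

```python
def break_even_cycles(overhead_energy: float, leakage_per_cycle: float) -> float:
    """Sleep length (in cycles) at which leakage saved equals the overhead."""
    if leakage_per_cycle <= 0:
        raise ValueError("leakage_per_cycle must be positive")
    return overhead_energy / leakage_per_cycle


def net_saving(sleep_cycles: float, overhead_energy: float,
               leakage_per_cycle: float) -> float:
    """Energy saved by one sleep period; negative means gating wasted energy."""
    return sleep_cycles * leakage_per_cycle - overhead_energy


if __name__ == "__main__":
    overhead = 20.0  # arbitrary energy units (assumed value)
    leak = 2.0       # energy units per cycle (assumed value)
    print("BET:", break_even_cycles(overhead, leak), "cycles")
    for cycles in (5, 10, 30):
        print(f"sleep {cycles:2d} cycles -> net saving {net_saving(cycles, overhead, leak):+.1f}")
    # Higher leakage (for example at higher temperature) shortens BET.
    print("BET with doubled leakage:", break_even_cycles(overhead, 2 * leak), "cycles")
```

In this model a sleep period shorter than BET has a negative net saving, and doubling the leakage halves BET.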
Break-Even Time of Each Functional Unit
  [Figure: bar chart of BET in cycles @200MHz for ALU, Shift, Mult, Div, and CP0 at 25 ℃, 65 ℃, 100 ℃, and 125 ℃; 90 nm technology. The bar labels range from 2 to 114 cycles]
  BET is shortened when the chip temperature climbs
    Leakage current depends heavily on temperature
  We need novel PG strategies that take BET into account
Power Gating Strategies
  Requirement: power off EX-units for longer than BET
  Static strategies
    straightforward: EX-units always go to sleep after execution
    ideal compiler (ideal compiler-directed): the exact average idle time of EX-units after each instruction is known (for reference only)
  Dynamic strategies
    L1 miss: EX-units fall asleep only when encountering L1 cache misses (L1 miss penalty = 15 cycles)
    L2 miss: EX-units fall asleep only when encountering L2 cache misses (L2 miss penalty = 200 cycles)
  Both static and dynamic strategies
    ideal compiler + L2 cache miss
  ideal (God): ideal dynamic strategy
    The exact idle time of EX-units is known at any time; upper limit of PG (for reference only)
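A toy comparison of some of these strategies can make the role of BET concrete. The sketch below is an assumption-laden model, not the evaluation used in the project: the idle-interval trace is invented, energy is counted in units of one cycle of leakage, a gated interval is charged exactly BET units (the overhead), and an L2 miss is treated as also triggering the L1-miss strategy. The ideal-compiler strategy is left out because it needs per-instruction statistics.

```python
from typing import Callable, Dict, List, Optional, Tuple

# One idle interval of an EX-unit: (idle cycles, cause), where cause is
# "l1" (L1 miss, 15 cycles), "l2" (L2 miss, 200 cycles), or None.
Interval = Tuple[int, Optional[str]]
Policy = Callable[[int, Optional[str]], bool]


def idle_energy(trace: List[Interval], bet: int, should_sleep: Policy) -> int:
    """Idle-time energy in leakage-cycle units under a given gating policy."""
    total = 0
    for cycles, cause in trace:
        # Gated: pay the overhead (equal to BET cycles of leakage by definition).
        # Not gated: leak for the whole interval.
        total += bet if should_sleep(cycles, cause) else cycles
    return total


def compare(trace: List[Interval], bet: int) -> Dict[str, int]:
    """Idle energy of each strategy for one BET value."""
    policies: Dict[str, Policy] = {
        "non-PG": lambda cycles, cause: False,
        "straightforward": lambda cycles, cause: True,
        "L1 miss": lambda cycles, cause: cause in ("l1", "l2"),
        "L2 miss": lambda cycles, cause: cause == "l2",
        "ideal (God)": lambda cycles, cause: cycles > bet,
    }
    return {name: idle_energy(trace, bet, p) for name, p in policies.items()}


# Invented trace: many short gaps, some L1 misses, a few L2 misses.
TRACE: List[Interval] = [(3, None)] * 50 + [(15, "l1")] * 10 + [(200, "l2")] * 2

if __name__ == "__main__":
    for bet in (10, 40):
        print(f"BET = {bet} cycles")
        for name, energy in compare(TRACE, bet).items():
            print(f"  {name:16s} {energy}")
```

With this trace, the L1-miss policy matches the ideal one at BET = 10, while at BET = 40 the L2-miss policy matches it and the straightforward policy costs more than never gating at all.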
Result for Frequently Used Execution Unit (FPADD for MGRID)
  [Figure: relative energy compared to non-PG versus BET (cycles) for the strategies straightforward, ideal compiler, L1, L2, ideal comp. + L2, and ideal (God)]
  straightforward: BET is longer than the sleep time; waste of energy
  ideal compiler: less chance for longer BET
  L1: the resulting sleep time is about 15 cycles; ideal for BET < 15, but a waste of energy for longer BET
  L2: the resulting sleep time is 200 cycles; ideal for longer BET
  For shorter BET, the compiler is effective
Collaboration with Compiler / OS
  Suggested power gating strategy: co-optimization of the control granularity of the PG lever
    The compiler gives directions assuming a short BET, because compiler-directed PG is effective for shorter BET
    For shorter BET (high temperature), the compiler direction is put into use: take the (compiler + L2-miss) strategy
    For longer BET (low temperature), take the L2-miss strategy and ignore the compiler direction
    The OS is expected to switch between strategies by observing changes in BET
  Power gating in collaboration with compiler / OS
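The OS-side switching rule can be sketched as a threshold test on the observed BET. The slide does not say where the boundary between "shorter" and "longer" BET lies, so the threshold below is a hypothetical parameter, and the function and strategy names are placeholders chosen for this illustration.

```python
def select_strategy(bet_cycles: float, short_bet_threshold: float = 15.0) -> str:
    """Pick a PG strategy from the currently observed BET.

    short_bet_threshold is a hypothetical tuning parameter; the source does
    not specify its value.
    """
    if bet_cycles < short_bet_threshold:
        # Short BET (high temperature): follow compiler directions as well.
        return "compiler + L2 miss"
    # Long BET (low temperature): ignore compiler directions.
    return "L2 miss"


if __name__ == "__main__":
    # Invented sequence of BET observations as the chip warms up.
    for bet in (90, 40, 14, 8):
        print(f"observed BET {bet:3d} cycles -> {select_strategy(bet)}")
```

An OS could call such a routine whenever its estimate of BET changes, for example after a temperature reading, and reconfigure the power-gating controller accordingly.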