Individual Voltage Scaling in Logic and Memory Circuits towards Runtime Energy Optimization in Processors
Jun Shiomi, Tohru Ishihara, Hidetoshi Onodera
Graduate School of Informatics, Kyoto University, Japan

Energy Reduction by Dynamic Voltage Scaling
• Supply voltage tuning ($V_{DD}$): DVFS (Dynamic Voltage and Frequency Scaling)
• Threshold voltage tuning ($V_{th}$): ABB (Adaptive Body Biasing)
[Figure: delay and energy plotted against supply voltage $V_{DD}$ and against threshold voltage $V_{th}$; curves labeled dynamic energy and static energy]
$V_{DD}$- and $V_{th}$-tuning technique for energy minimization

Minimum Energy Point Tracking (MEP Tracking)
Energy minimization by voltage scaling under a given frequency
[Figure: MEPT example, Renesas SOTB 65-nm, cell-based memory. Axes: supply voltage [V] (0.4 to 1.2) versus body bias [V] (0 to -2.0, from small $V_{th}$ to large $V_{th}$). A performance contour is drawn, with energy annotations of 140 pJ and 90 pJ.]
Minimum Energy Point (MEP): best combination of $V_{DD}$ and $V_{th}$
Target: MEP tracking technique for processors

Activity Factor Dependency of MEP Curves (Activity 100% → 10%)
Issue: MEPs heavily depend on activity factors (toggle rates)
[Figure: supply voltage [V] (0.4 to 1.2) versus body bias [V] (0 to -2.0, from small $V_{th}$ to large $V_{th}$), with a performance contour. Annotations: "Optimized", "Unoptimized", 12 pJ, 20 pJ.]
Activity factor: important parameter determining MEPs

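A first-order energy model helps explain the dependency stated on this slide. The model below is a common textbook approximation assumed here for illustration; it does not appear in the slides, and the symbols $C$ (switched capacitance), $I_{leak}$ (leakage current), $S$ (subthreshold swing), and $T_{cycle}$ (clock period) are introduced only for this sketch.

```latex
E_{\mathrm{cycle}}
  = \underbrace{\alpha\, C\, V_{DD}^{2}}_{\text{dynamic energy}}
  + \underbrace{I_{\mathrm{leak}}(V_{th})\, V_{DD}\, T_{\mathrm{cycle}}}_{\text{static energy}},
\qquad
I_{\mathrm{leak}}(V_{th}) \propto 10^{-V_{th}/S}
```

Under this assumption only the dynamic term scales with the activity factor $\alpha$. When $\alpha$ drops, static energy takes a larger share, so the minimum along a fixed performance contour shifts toward a larger $V_{th}$, with $V_{DD}$ raised to hold the delay.
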
Overview of This Work
[Figure: supply voltage [V] (0.4 to 1.2) versus body bias [V] (0 to -2.0, from small $V_{th}$ to large $V_{th}$), with a performance contour. Two points are marked: MEP with 10% activity ≅ on-chip memory, and MEP with 100% activity ≅ logic circuits.]
• Individual voltage scaling problem in logic and memory circuits
• Heuristic algorithm for runtime optimization

Outline
• Background
• Individual Voltage Scaling Problem
• Silicon Measurement
• Conclusion

(Existing) Uniform Voltage Scaling Problem
min $E$ s.t. $D \le D_0$, with $V_{DD}, V_{th} \in \mathbb{R}$
($E$: circuit energy, $D$: circuit delay, $D_0$: target performance)
[Figure: MEP curve and performance contour for $D = D_0$ in the $V_{th}$-$V_{DD}$ plane, with the solution marked]
• Existing approach: Runtime MEP tracking [5]
- Tunes $V_{DD}$ and $V_{th}$ iteratively
- Requires only simple circuits
- Tracks MEPs at runtime even if target performance, temperature, and activity change dynamically
[Figure: iterative tuning along the contour $D = D_0$ in the $V_{th}$-$V_{DD}$ plane, from the initial point to the finish, with energy and delay monitoring (MEP check)]

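The iterative tuning described on this slide can be sketched as a hill climb along the contour $D = D_0$. Everything in the block is an assumption made for illustration: the delay and energy formulas, the coefficients, the millivolt step sizes, and the function names `delay`, `energy`, `vdd_on_contour`, and `track_mep`. It is not the tracking circuit of [5], and it treats $V_{th}$ as directly tunable, whereas the hardware adjusts it through body bias.

```python
def delay(vdd, vth, k=1.0):
    """Toy alpha-power-law delay (assumed model); infinite if the gate cannot switch."""
    if vdd <= vth:
        return float("inf")
    return k * vdd / (vdd - vth) ** 1.5


def energy(vdd, vth, t_cycle, activity=1.0, cap=1.0, leak0=1.0):
    """Toy energy per cycle (assumed model): dynamic part plus leakage over one cycle."""
    dynamic = activity * cap * vdd ** 2
    static = leak0 * 10.0 ** (-vth / 0.1) * vdd * t_cycle
    return dynamic + static


def vdd_on_contour(vth_mv, d0, vdd_max_mv=1200, step_mv=5):
    """Lowest V_DD in mV, on a 5 mV grid, that meets D <= D_0 for the given V_th."""
    for vdd_mv in range(vth_mv + step_mv, vdd_max_mv + 1, step_mv):
        if delay(vdd_mv / 1000, vth_mv / 1000) <= d0:
            return vdd_mv
    return None


def track_mep(d0, vth_mv=100, step_mv=10, vth_min_mv=0, vth_max_mv=600):
    """Move V_th one step at a time, re-tune V_DD, and stop when energy stops falling."""
    def point(v_mv):
        vdd_mv = vdd_on_contour(v_mv, d0)
        if vdd_mv is None:
            return None
        return energy(vdd_mv / 1000, v_mv / 1000, d0), vdd_mv

    current = point(vth_mv)
    if current is None:
        raise ValueError("initial point cannot meet the delay target")
    while True:
        moved = False
        for candidate in (vth_mv + step_mv, vth_mv - step_mv):
            if not vth_min_mv <= candidate <= vth_max_mv:
                continue
            trial = point(candidate)
            # MEP check: accept the move only if the monitored energy decreases.
            if trial is not None and trial[0] < current[0]:
                vth_mv, current, moved = candidate, trial, True
                break
        if not moved:
            return current[1] / 1000, vth_mv / 1000, current[0]


vdd, vth, e_min = track_mep(d0=3.0)
print(f"V_DD = {vdd:.3f} V, V_th = {vth:.3f} V, energy = {e_min:.4f} (arbitrary units)")
```

Each accepted move strictly lowers the energy on a finite grid, so the loop terminates. The result is a local minimum of the toy model, not a measured MEP.
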
Individual Voltage Scaling Problem
min $E_L + E_M$ s.t. $D_L + D_M \le D_0$, with $V_{DD,L}, V_{th,L}, V_{DD,M}, V_{th,M} \in \mathbb{R}$
[Figure: memory block (voltages $V_{DD,M}$, $V_{th,M}$, delay $D_M$) and logic block (voltages $V_{DD,L}$, $V_{th,L}$, delay $D_L$) under the delay constraint $D_0$]
No runtime algorithms exist, due to the complex delay assignment between $D_L$ and $D_M$
[Figure: power versus delay for logic and for memory, each with $D_0$ marked. Voltage scaling in logic combined with a voltage boost in memory yields a huge energy saving.]
This work: heuristic algorithm for runtime voltage scaling

Various Strategies in Uniform Voltage Scaling
[Figure: delay contour ($D_L + D_M = D_0$) in the $V_{th}$-$V_{DD}$ plane, with three points marked: memory MEP ($E_M$ min.), processor MEP ($E_L + E_M$ min.), and logic MEP ($E_L$ min.)]
• Logic MEP: $E_L$ optimized, but $E_M$ NOT optimized
• Processor MEP: $E_L$, $E_M$ balanced ⇒ solution in uniform voltage scaling

Concept of the Proposed Heuristic Algorithm
[Figure: delay contour ($D_L + D_M = D_0$) in the $V_{th}$-$V_{DD}$ plane, with memory MEP ($E_M$ min.), processor MEP ($E_L + E_M$ min.), and logic MEP ($E_L$ min.) marked. The logic voltages ($V_{DD,L}$, $V_{th,L}$) and the memory voltages ($V_{DD,M}$, $V_{th,M}$) are placed at separate points on the contour.]
Point: $D_L$ and $D_M$ are constant over the delay contour
Enables local minimum energy point operation

Simple Heuristic Algorithm for Individual Voltage Scaling
Step 1: Logic Energy Optimization
1. Uniform voltage tuning in logic and memory (i.e., $V_{DD,L} = V_{DD,M}$ and $V_{th,L} = V_{th,M}$)
   - Allows existing techniques to be applied
2. Find the logic MEP
Step 2: Memory Energy Optimization
1. Fix the logic voltages and tune only the memory voltages ($V_{DD,M}$ and $V_{th,M}$), so that $V_{DD,L} \ne V_{DD,M}$ and $V_{th,L} \ne V_{th,M}$
2. Find the memory MEP
   - Enables runtime energy optimization
   - Local minimum energy point operation
[Figure: delay contour $D_L + D_M = D_0$ with the initial point, logic MEP, and memory MEP marked, shown once for each step]

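The two steps can be sketched on a toy model as follows. This is one reading of the slide, not the authors' implementation: a brute-force grid search stands in for runtime MEP tracking, and the delay and energy formulas, the coefficients `K_LOGIC` and `K_MEM`, the voltage grid, and the function names are all hypothetical. Leakage is charged over the full cycle $D_0$, which is also an assumption.

```python
def delay(vdd, vth, k):
    """Toy alpha-power-law delay (assumed model); infinite if the gate cannot switch."""
    if vdd <= vth:
        return float("inf")
    return k * vdd / (vdd - vth) ** 1.5


def energy(vdd, vth, activity, t_cycle, cap=1.0, leak0=1.0):
    """Toy energy per cycle (assumed model): activity-scaled dynamic part plus leakage."""
    return activity * cap * vdd ** 2 + leak0 * 10.0 ** (-vth / 0.1) * vdd * t_cycle


# Candidate (V_DD, V_th) pairs in volts, generated from a millivolt grid.
GRID = [(vdd_mv / 1000, vth_mv / 1000)
        for vdd_mv in range(300, 1201, 25)
        for vth_mv in range(100, 601, 25)]

K_LOGIC, K_MEM = 1.0, 1.0            # hypothetical delay coefficients
ALPHA_LOGIC, ALPHA_MEM = 1.0, 0.01   # activity factors of logic and memory


def two_step(d0):
    # Step 1: uniform voltages; find the logic MEP subject to D_L + D_M <= D_0.
    feasible = [p for p in GRID
                if delay(*p, K_LOGIC) + delay(*p, K_MEM) <= d0]
    if not feasible:
        raise ValueError("no uniform operating point meets the delay target")
    logic = min(feasible, key=lambda p: energy(*p, ALPHA_LOGIC, d0))

    # Step 2: fix the logic voltages; tune only the memory voltages
    # within the delay budget that the logic leaves over.
    d_logic = delay(*logic, K_LOGIC)
    mem_feasible = [p for p in GRID if d_logic + delay(*p, K_MEM) <= d0]
    mem = min(mem_feasible, key=lambda p: energy(*p, ALPHA_MEM, d0))
    return logic, mem


def total_energy(logic, mem, d0):
    return energy(*logic, ALPHA_LOGIC, d0) + energy(*mem, ALPHA_MEM, d0)


D0 = 6.0  # delay target in arbitrary units
logic_v, mem_v = two_step(D0)
print("logic  (V_DD, V_th):", logic_v)
print("memory (V_DD, V_th):", mem_v)
print("energy after step 1:", total_energy(logic_v, logic_v, D0))
print("energy after step 2:", total_energy(logic_v, mem_v, D0))
```

The step 1 point is always feasible for step 2, so in this sketch step 2 can only keep or lower the total energy relative to running both blocks at the logic MEP. It says nothing about the gap to the global optimum, which the slides leave as future work.
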
Case Study: 32-bit RISC Processor
Target
• Renesas SOTB 65-nm
• On-chip memory (standard-cell based memory)
- 4 kB I-Cache + TAG
- 8 kB I-SPM
- 16 kB D-SPM
Supply voltage & body bias
• Individual in logic and mem.
- Body bias for nMOSFETs in logic circuits is fixed at GND
• No level converters between logic and memory
[Figure: processor block diagram with logic ($V_{DD,L}$), memory ($V_{DD,M}$), I/O, and main memory (DCT loop); body bias terminals $V_{BN,M}$, $V_{BP,L}$, $V_{BP,M}$]

Activity Factor Dependency of Memory MEPs ($V_{DD,L} = V_{DD,M}$ & $V_{BB,L} = V_{BB,M}$)
[Figure: Fmax contour of the fabricated processor [MHz]. Axes: supply voltage [V] (0.4 to 1.2) versus body bias [V] (0 to -2.0, from small $V_{th}$ to large $V_{th}$). Memory MEP curves are drawn for $\alpha_M = 0.01$, $\alpha_M = 0.1$, and $\alpha_M = 1$, together with the logic MEP.]
$\alpha_M$: memory activity factor
• 1: activate in each clock cycle
• 0.1: activate once in 10 clock cycles
• 0.01: activate once in 100 clock cycles
MEPs move to the upper right as activity $\alpha_M$ decreases

Measurement Results of the Proposed Algorithm ($\alpha_M = 0.01$)
[Figure: Fmax contour of the fabricated processor [MHz]. Axes: supply voltage [V] (0.4 to 1.2) versus body bias [V] (0 to -2.0, from small $V_{th}$ to large $V_{th}$), with the logic MEP and the memory MEP marked.]
Step 1
1. Uniform voltage scaling
2. Find logic MEP
Step 2
1. Fix logic voltages at the logic MEP
2. Tune only mem. voltage & find mem. MEP
Individual voltage tuning achieved by the proposed algorithm

Energy Reduction by Individual Voltage Scaling ($\alpha_M = 0.01$)
[Figure: stacked bar chart of total energy consumption [pJ / cycle] (0 to 100) at Fmax = 4 MHz, 8 MHz, 20 MHz, and 29 MHz, split into memory static, memory dynamic, logic static, and logic dynamic energy. Reductions annotated on the bars: −10%, −13%, −15%, −16%.]
Up to 16% energy reduction by individual voltage scaling

Conclusion & Future Work
Conclusion
• Individual voltage scaling problem in logic and memory presented
- Key: activity factor gap between logic and memory circuits
• A heuristic algorithm proposed for runtime energy optimization
• Case study using a RISC processor in a 65-nm process
- Up to 16% energy reduction compared with uniform voltage scaling
Future work
• Energy overhead compared with the global solution
• Energy overhead introduced by fine-grained voltage tuning, etc.

Energy Reduction by Individual Voltage Scaling ($\alpha_M = 0.1$)
[Figure: stacked bar chart of total energy consumption [pJ / cycle] (0 to 100) at Fmax = 4 MHz, 8 MHz, 20 MHz, and 29 MHz, split into memory static, memory dynamic, logic static, and logic dynamic energy. Reductions annotated on the bars: −5%, −7%, −11%, −9%.]
No energy improvement when $\alpha_M = 1$

Definition of $\alpha_M$
On-chip memory property
• No clock gating circuits
• Dynamic energy consumption at each clock cycle
Implemented on-chip memory has a large activity factor
• Parameter $\alpha_M$ implemented to scale the activity factor
[Figure: the measured memory energy is split into dynamic energy and static (leakage) energy; the evaluated value multiplies the measured dynamic energy by $\alpha_M$]

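Read this way, the evaluated memory energy is the measured dynamic energy scaled by $\alpha_M$, plus the unscaled leakage. The helper below is a minimal sketch of that arithmetic; the function name, the argument names, and the sample numbers are hypothetical and are not measurements from the slides.

```python
def evaluated_memory_energy(measured_dynamic_pj, measured_static_pj, alpha_m):
    """Scale only the dynamic part by alpha_m; the static (leakage) part is kept as measured."""
    if not 0.0 <= alpha_m <= 1.0:
        raise ValueError("alpha_m is expected to lie in [0, 1]")
    return alpha_m * measured_dynamic_pj + measured_static_pj


# Hypothetical inputs: 50 pJ dynamic and 10 pJ static per cycle.
for alpha_m in (1.0, 0.1, 0.01):
    print(alpha_m, evaluated_memory_energy(50.0, 10.0, alpha_m))
```

With $\alpha_M = 1$ the evaluated value equals the measured memory energy, which matches a memory that is activated in every clock cycle.
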
System-Level Optimization Problem
The problem can be abstracted to system-level optimization
[Figure: timeline of two components, each with its own $V_{DD}$ and $V_{th}$. CPU: low activity (≃ memory), CPU execution time (≃ $D_M$). DSP: high activity (≃ logic), DSP execution time (≃ $D_L$). Deadline (≃ $D_0$).]
Future work: applying the heuristic to system-level optimization