SLIDE 1
Individual Voltage Scaling in Logic and Memory Circuits towards Runtime Energy Optimization in Processors
Jun Shiomi, Tohru Ishihara, Hidetoshi Onodera
Graduate School of Informatics, Kyoto University, Japan
SLIDE 2 Energy Reduction by Dynamic Voltage Scaling
V_DD- and V_th-tuning technique for energy minimization
- Supply voltage (V_DD) tuning: DVFS (Dynamic Voltage and Frequency Scaling)
- Threshold voltage (V_th) tuning: ABB (Adaptive Body Biasing)
[Figure: energy and delay versus supply voltage (V_DD) and versus threshold voltage (V_th), with dynamic energy and static energy components]
SLIDE 3 Minimum Energy Point Tracking (MEP Tracking)
Energy minimization by voltage scaling under a given frequency
Minimum Energy Point (MEP): best combination of V_DD and V_th
MEPT example: Renesas SOTB 65-nm cell-based memory
Target: MEP tracking technique for processors
[Figure: performance contours over supply voltage (0.4 to 1.2 V) and body bias [V], from large V_th to small V_th; energy labels 140 pJ and 90 pJ]
SLIDE 4 Activity Factor Dependency of MEP Curves (Activity 100% → 10%)
Activity factor: an important parameter determining MEPs
Issue: MEPs heavily depend on activity factors (toggle rates)
[Figure: performance contours over supply voltage (0.4 to 1.2 V) and body bias [V], from large V_th to small V_th; 12 pJ optimized vs. 20 pJ unoptimized]
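The activity-factor dependency can be illustrated with a toy model. The sketch below does not use the measured SOTB characteristics: the alpha-power-law delay, the exponential leakage, every constant, and the names `delay`, `energy`, and `find_mep` are assumptions made for illustration only. It grid-searches the supply voltage and threshold voltage for the minimum energy per cycle under a fixed delay target, once for 100% activity and once for 10% activity.

```python
import math

def delay(vdd, vth, alpha=1.5):
    """Toy alpha-power-law delay (arbitrary units); infinite if vdd <= vth."""
    if vdd <= vth:
        return math.inf
    return vdd / (vdd - vth) ** alpha

def energy(vdd, vth, activity, period, i0=10.0, swing=0.1):
    """Toy energy per cycle: dynamic part plus leakage over one clock period."""
    dynamic = activity * vdd ** 2                      # activity * C * VDD^2, C = 1
    static = vdd * i0 * 10 ** (-vth / swing) * period  # leakage power * period
    return dynamic + static

def find_mep(activity, d0, steps=81):
    """Grid search for min energy subject to delay <= d0.

    Returns (energy, vdd, vth). The grid search stands in for a real
    tracking loop; it is not the runtime algorithm of the slides.
    """
    best = None
    for i in range(steps):
        vdd = 0.4 + 0.8 * i / (steps - 1)      # 0.4 V .. 1.2 V
        for j in range(steps):
            vth = 0.1 + 0.5 * j / (steps - 1)  # 0.1 V .. 0.6 V (assumed range)
            if delay(vdd, vth) > d0:
                continue                       # misses the target performance
            e = energy(vdd, vth, activity, d0)
            if best is None or e < best[0]:
                best = (e, vdd, vth)
    return best

for activity in (1.0, 0.1):
    e, vdd, vth = find_mep(activity, d0=2.0)
    print(f"activity={activity:4.2f}: VDD={vdd:.3f} V, Vth={vth:.3f} V, E={e:.4f}")
```

In this toy model the lower activity moves the MEP toward a larger threshold voltage, because leakage dominates once the dynamic energy shrinks.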
SLIDE 5 Overview of This Work
- Individual voltage scaling problem in logic and memory circuits
- Heuristic algorithm for runtime optimization
MEP with 10% activity ≅ on-chip memory
MEP with 100% activity ≅ logic circuits
[Figure: performance contours over supply voltage (0.4 to 1.2 V) and body bias [V], from large V_th to small V_th, with the two MEPs marked]
SLIDE 6 Outline
- Background
- Individual Voltage Scaling Problem
- Silicon Measurement
- Conclusion
SLIDE 7 (Existing) Uniform Voltage Scaling Problem
min E  subject to  D ≤ D_0,  with V_DD, V_th ∈ ℝ
(E: circuit energy, D: circuit delay, D_0: target performance)
- Existing approach: Runtime MEP tracking [5]
  - Tunes V_DD and V_th iteratively, with energy & delay monitoring (MEP check)
  - Requires only simple circuits
  - Enables tracking MEPs at runtime even if the target performance, temperature, and activity change dynamically
[Figure: V_DD-V_th plane with the MEP curve, the performance contour for D = D_0, and the tuning path from the initial point to the finish, ending at the solution on D = D_0]
SLIDE 8 Individual Voltage Scaling Problem
This work: Heuristic algorithm for runtime voltage scaling
min E_L + E_M  subject to  D_L + D_M ≤ D_0,  with V_DD,L, V_th,L, V_DD,M, V_th,M ∈ ℝ
No runtime algorithms exist, due to the complex delay assignment between D_L and D_M
Voltage scaling in logic combined with voltage boost in mem. ⇒ huge energy saving
[Figure: logic block (V_DD,L, V_th,L, delay D_L) and memory block (V_DD,M, V_th,M, delay D_M) under the constraint D_0; delay and power bars for logic and memory]
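For reference, the two problems can be written side by side. This is a restatement under the notation read from the slides (E: energy, D: delay, subscripts L and M for logic and memory), with the uniform case expressed as the individual problem plus two equality constraints; the explicit equality constraints are added here for comparison and do not appear on the slides.

```latex
\begin{aligned}
\text{Uniform:}\quad
  & \min_{V_{DD},\,V_{th}} \; E_L + E_M
    \quad \text{s.t.}\quad D_L + D_M \le D_0,\;
    V_{DD,L} = V_{DD,M} = V_{DD},\;
    V_{th,L} = V_{th,M} = V_{th} \\
\text{Individual:}\quad
  & \min_{V_{DD,L},\,V_{th,L},\,V_{DD,M},\,V_{th,M}} \; E_L + E_M
    \quad \text{s.t.}\quad D_L + D_M \le D_0
\end{aligned}
```

Every feasible point of the uniform problem is also feasible for the individual problem, so the optimum of the individual problem can never have higher energy than the uniform optimum.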
SLIDE 9 Various Strategies in Uniform Voltage Scaling
- Logic MEP (E_L min.): E_L optimized, but E_M NOT optimized
- Memory MEP (E_M min.)
- Processor MEP (E_L + E_M min.): E_L, E_M balanced ⇒ solution in uniform voltage scaling
[Figure: V_DD-V_th plane with the delay contour (D_L + D_M = D_0) and the three MEPs]
SLIDE 10 Concept of the Proposed Heuristic Algorithm
Enable local minimum energy point operation
Point: D_L and D_M are constant over the delay contour
[Figure: V_DD-V_th plane with the delay contour (D_L + D_M = D_0), the logic MEP (E_L min.), the memory MEP (E_M min.), and the processor MEP (E_L + E_M min.); markers for the logic voltages (V_DD,L, V_th,L) and the memory voltages (V_DD,M, V_th,M)]
SLIDE 11 Simple Heuristic Algorithm for Individual Voltage Scaling
Step 1: Logic energy optimization
- 1. Uniform voltage tuning in logic & mem. (i.e., V_DD,L = V_DD,M & V_th,L = V_th,M) until the logic MEP is found
  Enables applying existing techniques
Step 2: Memory energy optimization
- 1. Fix logic voltages and tune only mem. voltages (V_DD,M & V_th,M), so that V_DD,L ≠ V_DD,M and V_th,L ≠ V_th,M
  Enables local minimum energy point operation
Enables runtime energy optimization
[Figure: V_DD-V_th plane with the delay contour D_L + D_M = D_0; Step 1 moves both voltage pairs to the logic MEP, Step 2 moves only the memory voltages along the contour]
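The two steps can be sketched in code as follows. This is a minimal sketch under stated assumptions, not the implementation measured on silicon: the delay and energy models, all constants, the split of the critical path between logic and memory (`share_l`), and the names `delay`, `energy`, `grid`, and `two_step` are hypothetical, and an exhaustive grid search replaces the iterative runtime tuning.

```python
import math

def delay(vdd, vth, alpha=1.5):
    """Toy alpha-power-law delay (arbitrary units)."""
    return math.inf if vdd <= vth else vdd / (vdd - vth) ** alpha

def energy(vdd, vth, activity, period, i0=10.0, swing=0.1):
    """Toy energy per cycle: dynamic part plus leakage over one period."""
    return activity * vdd ** 2 + vdd * i0 * 10 ** (-vth / swing) * period

def grid():
    """Candidate (VDD, Vth) pairs: 0.4..1.2 V and 0.1..0.6 V (assumed ranges)."""
    for i in range(81):
        for j in range(81):
            yield 0.4 + 0.01 * i, 0.1 + 0.00625 * j

def two_step(act_l, act_m, d0, share_l=0.5):
    """Return (logic voltages, memory voltages, E after step 1, E after step 2)."""
    # Step 1: uniform voltages. With equal voltages the total delay is
    # share_l * d + (1 - share_l) * d = d, so the constraint is delay <= d0.
    # Pick the point that minimizes the logic energy (logic MEP).
    feasible = [(v, t) for v, t in grid() if delay(v, t) <= d0]
    vdd_l, vth_l = min(feasible, key=lambda p: energy(p[0], p[1], act_l, d0))
    d_l = share_l * delay(vdd_l, vth_l)

    # Step 2: logic voltages stay fixed; only the memory voltages are tuned,
    # keeping the total delay D_L + D_M within d0.
    feasible_m = [(v, t) for v, t in grid()
                  if d_l + (1 - share_l) * delay(v, t) <= d0]
    vdd_m, vth_m = min(feasible_m, key=lambda p: energy(p[0], p[1], act_m, d0))

    e_l = energy(vdd_l, vth_l, act_l, d0)
    e_step1 = e_l + energy(vdd_l, vth_l, act_m, d0)  # memory still at logic MEP
    e_step2 = e_l + energy(vdd_m, vth_m, act_m, d0)  # memory at its own MEP
    return (vdd_l, vth_l), (vdd_m, vth_m), e_step1, e_step2

logic, mem, e1, e2 = two_step(act_l=1.0, act_m=0.01, d0=2.0)
print("logic (VDD, Vth):", logic)
print("memory (VDD, Vth):", mem)
print(f"energy after step 1: {e1:.4f}, after step 2: {e2:.4f}")
```

The step-1 operating point always remains a valid choice for the memory in step 2, so step 2 cannot increase the total energy in this model. Note that the comparison here is against the step-1 point, not against the processor MEP of uniform voltage scaling.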
SLIDE 12 Outline
- Background
- Individual Voltage Scaling Problem
- Silicon Measurement
- Conclusion
SLIDE 13 Case Study: 32-bit RISC Processor
Target
- Renesas SOTB 65-nm
Standard-cell based memory
- On-chip memory
  - 4 kB I-Cache + TAG
  - 8 kB I-SPM
  - 16 kB D-SPM
Supply voltage & body bias
- Individual in logic and mem.
- Body bias for nMOSFETs in logic circuits is fixed at GND
- No level converters between logic and memory
[Figure: chip diagram with I/O, logic (V_DD,L), on-chip memory (V_DD,M), and main memory (DCT loop); body biases V_BP,L, V_BN,M, V_BP,M]
SLIDE 14 Activity Factor Dependency of Memory MEPs (V_DD,L = V_DD,M & V_BB,L = V_BB,M)
MEPs move to the upper right as activity α_M decreases
α_M: Memory activity factor
- α_M = 1: activate in each clock cycle
- α_M = 0.1: activate once in 10 clock cycles
- α_M = 0.01: activate once in 100 clock cycles
[Figure: Fmax contour of the fabricated processor [MHz] over supply voltage (0.4 to 1.2 V) and body bias [V], from small V_th to large V_th, with the logic MEP and the memory MEPs for α_M = 1, 0.1, and 0.01]
SLIDE 15 Measurement Results of the Proposed Algorithm (α_M = 0.01)
Step 1
- 1. Uniform voltage scaling
- 2. Find logic MEP
Step 2
- 1. Fix logic voltages at the logic MEP
- 2. Tune only mem. voltage & find mem. MEP
Individual voltage tuning achieved by the proposed algorithm
[Figure: Fmax contour of the fabricated processor [MHz] over supply voltage (0.4 to 1.2 V) and body bias [V], from small V_th to large V_th, with the logic MEP and mem. MEP marked]
SLIDE 16 Energy Reduction by Individual Voltage Scaling (α_M = 0.01)
Up to 16% energy reduction by individual voltage scaling
Energy reduction at Fmax = 4, 8, 20, 29 MHz: −15%, −16%, −13%, −10%
[Figure: bar chart of total energy consumption [pJ / cycle] (axis 20 to 100) per Fmax, broken down into logic dynamic energy, logic static energy, memory dynamic energy, and memory static energy]
SLIDE 17 Conclusion & Future Work
Conclusion
- Individual voltage scaling problem in logic and memory presented
- A heuristic algorithm proposed for runtime energy optimization
- Case study using RISC processors in 65-nm process
  - Up to 16% energy reduction compared with uniform voltage scaling
  - Key: Activity factor gap between logic and memory circuits
Future work
- Energy overhead compared with the global solution
- Energy overhead introduced by fine-grained voltage tuning, etc.
SLIDE 19 Energy Reduction by Individual Voltage Scaling (α_M = 0.1)
Energy reduction at Fmax = 4, 8, 20, 29 MHz: −11%, −9%, −7%, −5%
No energy improvement when α_M = 1
[Figure: bar chart of total energy consumption [pJ / cycle] (axis 20 to 100) per Fmax, broken down into logic dynamic energy, logic static energy, memory dynamic energy, and memory static energy]
SLIDE 20 Definition of α_M
On-chip memory property
- No clock gating circuits
- Dynamic energy consumption @ each clock cycle
Implemented on-chip memory has large activity factor
- Parameter α_M implemented to scale activity factor
[Figure: the measured value (measured memory energy) is split into dynamic energy and static (leakage) energy; in the evaluated value, the dynamic energy is scaled by α_M]
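One reading of this definition is that the leakage energy is subtracted from the measured memory energy, the remaining dynamic part is multiplied by the memory activity factor (α_M), and the leakage is added back. The sketch below encodes that reading only; the formula is an interpretation of the slide, and the function name `evaluated_memory_energy` and the example numbers are hypothetical.

```python
def evaluated_memory_energy(measured, leakage, alpha_m):
    """Scale only the dynamic part of the measured memory energy by alpha_m.

    measured: measured memory energy per cycle (dynamic + static)
    leakage:  static (leakage) energy per cycle
    alpha_m:  memory activity factor in [0, 1]
    """
    if not 0.0 <= alpha_m <= 1.0:
        raise ValueError("alpha_m must be in [0, 1]")
    dynamic = measured - leakage          # dynamic energy at full activity
    return dynamic * alpha_m + leakage    # static energy is not scaled

# Hypothetical numbers in pJ/cycle, for illustration only.
for alpha_m in (1.0, 0.1, 0.01):
    print(alpha_m, evaluated_memory_energy(10.0, 2.0, alpha_m))
```

With α_M = 1 the evaluated value equals the measured value, and as α_M approaches 0 it approaches the leakage energy.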
SLIDE 21 System-Level Optimization Problem
The problem can be abstracted to system-level optimization
- CPU (≃ memory): low activity; CPU execution time ≃ D_M
- DSP (≃ logic): high activity; DSP execution time ≃ D_L
- Deadline ≃ D_0
Future work: Applying the heuristic to system-level optimization
[Figure: CPU and DSP execution timelines, each with its own V_DD and V_th, against the deadline]