Individual Voltage Scaling in Logic and Memory Circuits towards - - PowerPoint PPT Presentation


SLIDE 1

Individual Voltage Scaling in Logic and Memory Circuits towards Runtime Energy Optimization in Processors

Jun Shiomi, Tohru Ishihara, Hidetoshi Onodera

Graduate School of Informatics, Kyoto University, Japan

SLIDE 2

Energy Reduction by Dynamic Voltage Scaling

VDD- and Vth-tuning techniques for energy minimization
  • Supply voltage (VDD) tuning: DVFS (Dynamic Voltage and Frequency Scaling)
  • Threshold voltage (Vth) tuning: ABB (Adaptive Body Biasing)

[Figure: energy vs. delay under supply voltage (VDD) tuning and under threshold voltage (Vth) tuning, each split into static energy and dynamic energy]

SLIDE 3

Minimum Energy Point Tracking (MEP Tracking)

Energy minimization by voltage scaling under a given frequency
  • Minimum Energy Point (MEP): the best combination of VDD and Vth
  • Target: MEP tracking technique for processors

MEPT example: Renesas SOTB 65-nm cell-based memory

[Figure: energy over supply voltage [V] and body bias [V] with the performance contour; large-Vth and small-Vth regions marked; points annotated 140 pJ and 90 pJ]
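To make "energy minimization under a given frequency" concrete, the sketch below searches for the MEP of a toy circuit: for each candidate Vth it picks the lowest VDD that still meets the cycle time, then keeps the (VDD, Vth) pair with the lowest energy per cycle. The device model (alpha-power-law delay, exponential leakage) and every constant in it (ALPHA, S, I0, C, the cycle time of 2.0) are hypothetical normalized assumptions, not data from the SOTB chip.

```python
import math

# Hypothetical normalized device model (assumption, not measured SOTB data).
ALPHA = 1.3   # alpha-power-law exponent for delay
S = 0.1       # leakage slope: leakage drops by a factor e per S volts of Vth
I0 = 5.0      # leakage current scale
C = 1.0       # switched capacitance


def delay(vdd, vth):
    """Alpha-power-law delay; decreases with VDD when vth >= 0."""
    return vdd / (vdd - vth) ** ALPHA


def vdd_for_delay(vth, target, hi=3.0, iters=60):
    """Lowest VDD in (vth, hi] with delay <= target, found by bisection."""
    if delay(hi, vth) > target:
        return None  # target frequency unreachable at this Vth
    lo = vth + 1e-9  # delay(lo) is huge, so lo is always infeasible
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if delay(mid, vth) > target:
            lo = mid
        else:
            hi = mid
    return hi


def energy(vdd, vth, activity, cycle_time):
    """Energy per cycle = dynamic (activity*C*VDD^2) + leakage over one cycle."""
    dynamic = activity * C * vdd ** 2
    static = vdd * I0 * math.exp(-vth / S) * cycle_time
    return dynamic + static


def find_mep(activity, cycle_time, vth_grid):
    """Return (energy, vdd, vth) of the lowest-energy point on the
    performance contour delay(vdd, vth) == cycle_time."""
    best = None
    for vth in vth_grid:
        vdd = vdd_for_delay(vth, cycle_time)
        if vdd is None:
            continue
        e = energy(vdd, vth, activity, cycle_time)
        if best is None or e < best[0]:
            best = (e, vdd, vth)
    return best


VTH_GRID = [0.10 + 0.01 * i for i in range(51)]  # 0.10 V ... 0.60 V
e, vdd, vth = find_mep(activity=1.0, cycle_time=2.0, vth_grid=VTH_GRID)
print(f"MEP: VDD = {vdd:.2f} V, Vth = {vth:.2f} V, E = {e:.3f} (normalized)")
```

The search only walks the performance contour: both energy terms grow with VDD, so any point that is faster than the cycle time can lower VDD and save energy.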

SLIDE 4

Activity Factor Dependency of MEP Curves (Activity 100% → 10%)

  • Activity factor: an important parameter determining MEPs
  • Issue: MEPs depend heavily on activity factors (toggle rates)

[Figure: energy over supply voltage [V] and body bias [V] with the performance contour; large-Vth and small-Vth regions marked; optimized 12 pJ vs. unoptimized 20 pJ]
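The activity dependency can be reproduced qualitatively with a brute-force grid search on a hypothetical normalized model (ALPHA, S, I0, the grid ranges, and the cycle time of 2.0 are illustrative assumptions). Lowering the activity from 100% to 10% reduces the weight of dynamic energy, so the minimum moves toward a larger Vth (less leakage) and, to keep the same delay, a higher VDD.

```python
import math

# Hypothetical normalized model (assumption): alpha-power-law delay,
# dynamic energy = activity * VDD^2, leakage energy = VDD * I0 * exp(-Vth/S) * T.
ALPHA, S, I0 = 1.3, 0.1, 5.0


def delay(vdd, vth):
    return vdd / (vdd - vth) ** ALPHA


def energy(vdd, vth, activity, cycle_time):
    return activity * vdd ** 2 + vdd * I0 * math.exp(-vth / S) * cycle_time


def grid_mep(activity, cycle_time):
    """Brute-force MEP over a (VDD, Vth) grid, subject to delay <= cycle_time."""
    best = None
    for i in range(101):            # VDD: 0.40 ... 1.40 V
        vdd = 0.40 + 0.01 * i
        for j in range(51):         # Vth: 0.10 ... 0.60 V
            vth = 0.10 + 0.01 * j
            if vdd <= vth or delay(vdd, vth) > cycle_time:
                continue            # below threshold, or too slow for the target frequency
            e = energy(vdd, vth, activity, cycle_time)
            if best is None or e < best[0]:
                best = (e, vdd, vth)
    return best


for act in (1.0, 0.1):              # 100 % vs. 10 % toggle rate
    e, vdd, vth = grid_mep(act, cycle_time=2.0)
    print(f"activity {act:4.0%}: VDD = {vdd:.2f} V, Vth = {vth:.2f} V, E = {e:.3f}")
```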

SLIDE 5

Overview of This Work

  • Individual voltage scaling problem in logic and memory circuits
  • Heuristic algorithm for runtime optimization

[Figure: MEPs over supply voltage [V] and body bias [V] with the performance contour: the MEP with 10% activity ≅ on-chip memory; the MEP with 100% activity ≅ logic circuits]

SLIDE 6

Outline

  • Background
  • Individual Voltage Scaling Problem
  • Silicon Measurement
  • Conclusion


SLIDE 7

(Existing) Uniform Voltage Scaling Problem

$$\min_{V_{\mathrm{DD}},\,V_{\mathrm{th}} \in \mathbb{R}} E \quad \text{s.t.} \quad D \le D_0$$

[Figure: MEP curve and the performance contour for D = D0 in the (VDD, Vth) plane; the solution moves from an initial point along the contour to the finish point]

  • Existing approach: runtime MEP tracking [5]
  • Tunes VDD and Vth iteratively
  • Energy & delay monitoring (MEP check): monitors circuit energy and circuit delay for the target performance
  • Requires only simple circuits
  • Tracks MEPs at runtime even if the target performance, temperature, or activity changes dynamically
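The iterative idea behind runtime tracking can be sketched as a simple hill climb: a delay loop keeps VDD at the lowest value meeting the target, and an energy check accepts a Vth move only if the measured energy drops. This is a hypothetical illustration, not the algorithm of [5]; energy() and delay() stand in for on-chip monitors, and the model constants, start point, and step sizes are assumptions.

```python
import math

# Hypothetical stand-in for the chip: in silicon, energy() and delay() would be
# read from on-chip energy and delay monitors rather than computed.
ALPHA, S, I0 = 1.3, 0.1, 5.0


def delay(vdd, vth):
    return vdd / (vdd - vth) ** ALPHA


def energy(vdd, vth, activity, cycle_time):
    return activity * vdd ** 2 + vdd * I0 * math.exp(-vth / S) * cycle_time


def settle_vdd(vth, target, hi=3.0):
    """Delay-monitor loop: lowest VDD meeting the target (hi assumed fast enough)."""
    lo = vth + 1e-9
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if delay(mid, vth) > target:
            lo = mid
        else:
            hi = mid
    return hi


def track_mep(activity, cycle_time, vth=0.15, step=0.08, min_step=0.005):
    """Hill-climb Vth along the performance contour; VDD follows the delay target."""
    vdd = settle_vdd(vth, cycle_time)
    e = energy(vdd, vth, activity, cycle_time)
    while step >= min_step:
        for cand in (vth + step, vth - step):
            if cand <= 0.0:
                continue
            cand_vdd = settle_vdd(cand, cycle_time)
            cand_e = energy(cand_vdd, cand, activity, cycle_time)
            if cand_e < e:               # MEP check: move only if energy drops
                vdd, vth, e = cand_vdd, cand, cand_e
                break
        else:
            step /= 2                    # no better neighbour: refine the step
    return vdd, vth, e


vdd, vth, e = track_mep(activity=1.0, cycle_time=2.0)
print(f"tracked MEP: VDD = {vdd:.2f} V, Vth = {vth:.2f} V, E = {e:.3f}")
```

Because the loop only compares successive energy readings, it keeps working if the activity or the target changes; rerunning track_mep with new arguments re-converges from the current point.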

SLIDE 8

Individual Voltage Scaling Problem

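This work: a heuristic algorithm for runtime voltage scaling

$$\min_{V_{\mathrm{DD,L}},\,V_{\mathrm{th,L}},\,V_{\mathrm{DD,M}},\,V_{\mathrm{th,M}} \in \mathbb{R}} \; E_{\mathrm{L}} + E_{\mathrm{M}} \quad \text{s.t.} \quad D_{\mathrm{L}} + D_{\mathrm{M}} \le D_0$$

[Figure: logic (VDD,L, Vth,L) and memory (VDD,M, Vth,M) with delays DL and DM under the constraint D0; power vs. delay of logic and memory, where voltage scaling in logic and voltage boost in memory give a huge energy saving]

  • Issue: no runtime algorithm exists, due to the complex delay assignment between DL and DM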
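Because logic and memory share one delay budget, a straightforward offline way to solve the individual problem is to enumerate the delay assignment DL = x·D0, DM = (1 − x)·D0 and find each block's MEP separately for every split. The sketch below does this on a hypothetical normalized model (ALPHA, S, I0, K_L, K_M, the activities 1.0 and 0.01, and D0 = 4.0 are illustrative assumptions). The nested searches it needs hint at why this delay assignment is awkward to perform on-chip at runtime.

```python
import math

# Hypothetical normalized model (assumption): both blocks share an
# alpha-power-law delay shape, scaled by K_L and K_M.
ALPHA, S, I0 = 1.3, 0.1, 5.0
K_L, K_M = 1.0, 1.0
VTH_GRID = [0.05 + 0.01 * i for i in range(66)]   # 0.05 ... 0.70 V


def g(vdd, vth):
    return vdd / (vdd - vth) ** ALPHA


def vdd_for(vth, target, k, hi=5.0):
    """Lowest VDD with k * g(vdd, vth) <= target, or None if unreachable."""
    if k * g(hi, vth) > target:
        return None
    lo = vth + 1e-9
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if k * g(mid, vth) > target:
            lo = mid
        else:
            hi = mid
    return hi


def block_energy(vdd, vth, activity, cycle_time):
    # Leakage flows for the whole clock cycle D0, not just the block's own delay.
    return activity * vdd ** 2 + vdd * I0 * math.exp(-vth / S) * cycle_time


def block_mep(activity, budget, k, cycle_time):
    """Minimum energy of one block that must finish within `budget`."""
    best = math.inf
    for vth in VTH_GRID:
        vdd = vdd_for(vth, budget, k)
        if vdd is not None:
            best = min(best, block_energy(vdd, vth, activity, cycle_time))
    return best


def best_split(act_l, act_m, d0, n=49):
    """Enumerate delay assignments D_L = x*D0, D_M = (1 - x)*D0."""
    best_total, best_x = math.inf, None
    for i in range(1, n + 1):
        x = i / (n + 1)
        total = (block_mep(act_l, x * d0, K_L, d0)
                 + block_mep(act_m, (1 - x) * d0, K_M, d0))
        if total < best_total:
            best_total, best_x = total, x
    return best_total, best_x


total, x = best_split(act_l=1.0, act_m=0.01, d0=4.0)
print(f"best split: D_L = {x:.2f} * D0, E_L + E_M = {total:.3f}")
```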

SLIDE 9

Various Strategies in Uniform Voltage Scaling

[Figure: candidate operating points in the (VDD, Vth) plane on the delay contour (DL + DM = D0)]

  • Logic MEP (EL min.): EL optimized, but EM NOT optimized
  • Memory MEP (EM min.)
  • Processor MEP (EL + EM min.): EL and EM balanced ⇒ the solution in uniform voltage scaling

SLIDE 10

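Concept of the Proposed Heuristic Algorithm

[Figure: logic voltages (VDD,L, Vth,L) and memory voltages (VDD,M, Vth,M) set separately in the (VDD, Vth) plane, relative to the logic MEP (EL min.), the memory MEP (EM min.), the processor MEP (EL + EM min.), and the delay contour (DL + DM = D0)]

  • Enables local minimum energy point operation
  • Point: DL and DM are each constant over the delay contour, so fixing the logic voltages leaves the memory a fixed delay budget DM, and the memory voltages can move without violating DL + DM = D0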

SLIDE 11

Simple Heuristic Algorithm for Individual Voltage Scaling

Step 1: Logic energy optimization
  • 1. Uniform voltage tuning in logic & mem. (i.e., VDD,L = VDD,M & Vth,L = Vth,M), which enables applying existing techniques
  • 2. Find the logic MEP

Step 2: Memory energy optimization
  • 1. Fix the logic voltages and tune only the mem. voltages (VDD,M & Vth,M), so that VDD,L ≠ VDD,M and Vth,L ≠ Vth,M
  • 2. Find the memory MEP

  • Local minimum energy point operation enables runtime energy optimization

[Figure: from the initial point, Step 1 moves along the delay contour DL + DM = D0 with VDD,L = VDD,M and Vth,L = Vth,M to the logic MEP; Step 2 fixes the logic voltages and tunes only the memory voltages toward the mem. MEP]
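The two steps can be sketched on a hypothetical normalized model (all constants, K_L = 1.0, K_M = 0.5, ACT_L = 1.0, ACT_M = 0.01 and D0 = 3.0, are assumptions; on silicon each energy value would come from a monitor, not a formula). Both parts of the path share the same delay shape here, so DL and DM stay constant along the uniform contour, which is what lets Step 2 retune the memory without touching the logic. The script also computes the uniform processor MEP for comparison.

```python
import math

# Hypothetical normalized model (assumption, not the fabricated chip).
ALPHA, S, I0 = 1.3, 0.1, 5.0
K_L, K_M = 1.0, 0.5          # delay weights of the logic and memory parts of the path
ACT_L, ACT_M = 1.0, 0.01     # logic activity vs. memory activity (beta_M)
VTH_GRID = [0.05 + 0.01 * i for i in range(66)]   # 0.05 ... 0.70 V


def g(vdd, vth):
    return vdd / (vdd - vth) ** ALPHA


def vdd_for(vth, target, k, hi=5.0):
    """Lowest VDD with k * g(vdd, vth) <= target (hi assumed fast enough)."""
    lo = vth + 1e-9
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if k * g(mid, vth) > target:
            lo = mid
        else:
            hi = mid
    return hi


def e_block(vdd, vth, act, d0):
    return act * vdd ** 2 + vdd * I0 * math.exp(-vth / S) * d0


def uniform_points(d0):
    """(V, Vth, E_L, E_M) on the uniform contour (K_L + K_M) * g(V, Vth) = D0."""
    for vth in VTH_GRID:
        v = vdd_for(vth, d0, K_L + K_M)
        yield v, vth, e_block(v, vth, ACT_L, d0), e_block(v, vth, ACT_M, d0)


def heuristic(d0):
    # Step 1: uniform voltages in logic & memory; take the logic MEP (min E_L).
    v_l, vth_l, e_l, _ = min(uniform_points(d0), key=lambda p: p[2])
    # Step 2: fix the logic voltages, which fixes D_L and leaves D_M = D0 - D_L;
    # tune only the memory voltages to the memory MEP (min E_M).
    d_m = d0 - K_L * g(v_l, vth_l)

    def mem_point(vth):
        v = vdd_for(vth, d_m, K_M)
        return e_block(v, vth, ACT_M, d0), v, vth

    e_m, v_m, vth_m = min(mem_point(t) for t in VTH_GRID)
    return e_l + e_m, (v_l, vth_l), (v_m, vth_m)


def uniform_processor_mep(d0):
    """Best shared (VDD, Vth) for logic and memory: min E_L + E_M."""
    return min(p[2] + p[3] for p in uniform_points(d0))


D0 = 3.0
e_heur, (v_l, t_l), (v_m, t_m) = heuristic(D0)
print(f"logic  VDD={v_l:.2f} Vth={t_l:.2f} | memory VDD={v_m:.2f} Vth={t_m:.2f}")
print(f"heuristic E={e_heur:.3f}, uniform processor MEP E={uniform_processor_mep(D0):.3f}")
```

In this model the heuristic cannot do worse than the uniform processor MEP: its logic energy is the minimum over the uniform contour, and its memory energy is at most the memory energy at any uniform point. How far it stays from the global optimum of the delay-assignment problem is a separate question, listed as future work in the conclusion.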

SLIDE 12

Outline

  • Background
  • Individual Voltage Scaling Problem
  • Silicon Measurement
  • Conclusion


SLIDE 13

Case Study: 32-bit RISC Processor

  • Renesas SOTB 65-nm
  • On-chip memory (standard-cell based): 4 kB I-Cache + TAG, 8 kB I-SPM, 16 kB D-SPM
  • Target: supply voltage & body bias, set individually in logic and mem.
  • Body bias for nMOSFETs in logic circuits is fixed at GND
  • No level converters between logic and memory

[Figure: chip with I/O, logic (VDD,L), mem. (VDD,M), and main memory (DCT loop); body biases VBP,L, VBN,M, and VBP,M]

SLIDE 14

Activity Factor Dependency of Memory MEPs (VDD,L = VDD,M & VBB,L = VBB,M)

  • MEPs move to the upper right as the memory activity factor 𝛽M decreases
  • 𝛽M = 1: activated in each clock cycle
  • 𝛽M = 0.1: activated once in 10 clock cycles
  • 𝛽M = 0.01: activated once in 100 clock cycles

[Figure: logic MEP and memory MEPs for 𝛽M = 1, 0.1, and 0.01 over supply voltage [V] and body bias [V], with the Fmax contour of the fabricated processor [MHz]; small-Vth and large-Vth regions marked]

SLIDE 15

Measurement Results of the Proposed Algorithm (𝛽M = 0.01)

Step 1
  • 1. Uniform voltage scaling
  • 2. Find the logic MEP

Step 2
  • 1. Fix the logic voltages at the logic MEP
  • 2. Tune only the mem. voltages & find the mem. MEP

  • Individual voltage tuning is achieved by the proposed algorithm

[Figure: measured logic MEP and mem. MEP over supply voltage [V] and body bias [V], with the Fmax contour of the fabricated processor [MHz]; small-Vth and large-Vth regions marked]

SLIDE 16

Energy Reduction by Individual Voltage Scaling (𝛽M = 0.01)

[Figure: total energy consumption [pJ/cycle] at 4, 8, 20, and 29 MHz (up to Fmax), broken down into logic dynamic, logic static, memory dynamic, and memory static energy]

  • Energy reduction by individual voltage scaling: −15% (4 MHz), −16% (8 MHz), −13% (20 MHz), −10% (29 MHz)
  • Up to 16% energy reduction by individual voltage scaling

SLIDE 17

Conclusion & Future Work

Conclusion
  • Individual voltage scaling problem in logic and memory presented
  • A heuristic algorithm proposed for runtime energy optimization
  • Case study using a RISC processor in a 65-nm process
  • Up to 16% energy reduction compared with uniform voltage scaling

Future work
  • Energy overhead compared with the global solution
  • Energy overhead introduced by fine-grained voltage tuning, etc.

Key: activity factor gap between logic and memory circuits
SLIDE 18


SLIDE 19

Energy Reduction by Individual Voltage Scaling (𝛽M = 0.1)

[Figure: total energy consumption [pJ/cycle] at 4, 8, 20, and 29 MHz (up to Fmax), broken down into logic dynamic, logic static, memory dynamic, and memory static energy]

  • Energy reduction by individual voltage scaling: −11% (4 MHz), −9% (8 MHz), −7% (20 MHz), −5% (29 MHz)
  • No energy improvement when 𝛽M = 1

SLIDE 20

Definition of 𝛽M

On-chip memory property
  • No clock gating circuits
  • Dynamic energy is consumed in each clock cycle
  • Hence the implemented on-chip memory has a large activity factor
  • Parameter 𝛽M is introduced to scale the activity factor

[Figure: measured value (dynamic energy + static energy of the memory) vs. evaluated value, derived from the measured memory energy, the leakage energy, and 𝛽M]
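One reading of the βM definition, and it is an assumption, is that only the dynamic part of the measured memory energy (measured memory energy minus leakage energy) is scaled by βM, while leakage per cycle stays unchanged. Under that reading, βM = 1 reproduces the measured value and βM = 0.01 models a memory activated once in 100 cycles. The function name and the numbers used below are hypothetical.

```python
def evaluated_memory_energy(measured_energy, leakage_energy, beta_m):
    """Scale only the dynamic part of the measured memory energy by beta_m.

    Assumption: dynamic energy = measured memory energy - leakage energy,
    and leakage (static) energy per cycle does not depend on activity.
    """
    if not 0.0 <= beta_m <= 1.0:
        raise ValueError("beta_m must be in [0, 1]")
    dynamic = measured_energy - leakage_energy
    return dynamic * beta_m + leakage_energy


# Hypothetical per-cycle values in pJ, for illustration only.
for beta in (1.0, 0.1, 0.01):
    print(beta, evaluated_memory_energy(10.0, 2.0, beta))
```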

SLIDE 21

System-Level Optimization Problem

The problem can be abstracted to system-level optimization:
  • CPU (≃ memory): low activity; CPU execution time ≃ DM
  • DSP (≃ logic): high activity; DSP execution time ≃ DL
  • Deadline ≃ D0
  • Each unit is tuned with its own VDD and Vth

Future work: applying the heuristic to system-level optimization