Chapter 17: Low-Power Design Keshab K. Parhi and Viktor Owall

IC Design Space
Figure: the traditional design space (speed, area) is extended to a new design space that also includes power and complexity.

VLSI Digital Signal Processing Systems
• Technology trends:
– 200-300M-transistor chips by 2010 (0.07 micron CMOS)
• Challenges:
– Low-power DSP algorithms and architectures
– Low-power dedicated / programmable systems
– Multimedia & wireless system-driven architectures
– Convergence of voice, video and data
– LAN, MAN, WAN, PAN
– Telephone lines, cables, fiber, wireless
– Standards and interoperability

Power Consumption in DSP
• Low-performance portable applications:
– Cellular phones, personal digital assistants
– Reasonable battery lifetime, low weight
• High-performance portable systems:
– Laptops, notebook computers
• Non-portable systems:
– Workstations, communication systems
– DEC Alpha: 1 GHz, 120 W
– Packaging costs, system reliability

Power Dissipation
Two measures are important:
• Peak power (sets dimensions): $P_{peak} = V_{DD} \cdot i_{DD,max}$
• Average power (battery and cooling): $P_{av} = \frac{V_{DD}}{T}\int_0^T i_{DD}(t)\,dt$
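As a small illustration of the two measures, the sketch below computes peak and average power from a sampled supply-current waveform. The trapezoidal rule stands in for the integral; the waveform, the 1 ns sample spacing and the function name are hypothetical.

```python
def power_measures(i_samples, dt, vdd):
    """Peak and average power from supply-current samples spaced dt seconds apart.

    Assumes the samples span exactly one period T = (len(i_samples) - 1) * dt.
    """
    period = (len(i_samples) - 1) * dt
    # Trapezoidal approximation of the integral of i_DD(t) over [0, T]
    charge = sum((a + b) / 2 * dt for a, b in zip(i_samples, i_samples[1:]))
    p_peak = vdd * max(i_samples)
    p_av = vdd * charge / period
    return p_peak, p_av


# Hypothetical triangular current pulse (A), sampled every 1 ns, V_DD = 1.0 V
i_dd = [0.0, 2e-3, 4e-3, 2e-3, 0.0, 0.0, 0.0, 0.0, 0.0]
p_peak, p_av = power_measures(i_dd, dt=1e-9, vdd=1.0)
print(f"peak = {p_peak * 1e3:.1f} mW, average = {p_av * 1e3:.1f} mW")
```

For this pulse the peak is four times the average: the supply network has to cope with the peak, while battery lifetime and cooling depend on the average.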

CMOS Power Consumption
$P_{tot} = P_{dyn} + P_{sc} + P_{leakage} = \alpha f C_L V_{DD}^2 + V_{DD} I_{sc} + V_{DD} I_{leakage}$
where $\alpha$ is the probability for switching.
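The three terms can be evaluated directly. The sketch below uses made-up operating-point values (activity factor, clock frequency, switched capacitance, short-circuit and leakage currents) only to show how the terms combine and how the quadratic $V_{DD}$ dependence of the dynamic term makes supply scaling so effective.

```python
def cmos_power(alpha, f, c_load, vdd, i_sc, i_leak):
    """Return (P_dyn, P_sc, P_leakage, P_tot) in watts."""
    p_dyn = alpha * f * c_load * vdd ** 2   # charging/discharging the load capacitance
    p_sc = vdd * i_sc                        # short-circuit current during switching
    p_leak = vdd * i_leak                    # static leakage current
    return p_dyn, p_sc, p_leak, p_dyn + p_sc + p_leak


# Hypothetical operating point: 10% activity, 100 MHz, 1 nF total switched capacitance
params = dict(alpha=0.1, f=100e6, c_load=1e-9, i_sc=1e-3, i_leak=1e-4)
full = cmos_power(vdd=1.2, **params)
half = cmos_power(vdd=0.6, **params)
print("P_dyn at 1.2 V: %.4f W, at 0.6 V: %.4f W" % (full[0], half[0]))
print("P_tot at 1.2 V: %.5f W" % full[3])
```

The sketch keeps $I_{sc}$ and $I_{leakage}$ fixed for simplicity; in a real circuit both also change with $V_{DD}$.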

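Dynamic Power Consumption
Figure: a gate connected to V_DD charges and discharges its load capacitance C_L.
• Energy stored in the capacitor after charging: $E_C = CV^2/2 = C_L V_{DD}^2/2$
• The stored energy $E_C$ is dissipated when the capacitor is discharged, and an equal amount is dissipated in the pull-up network while charging, so one charge/discharge cycle costs $E_{tot} = C_L V_{DD}^2$
• Power consumption at switching frequency f: $P = C_L V_{DD}^2 f$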
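The claim that a charge/discharge cycle costs $C_L V_{DD}^2$, although only half of that is ever stored on the capacitor, can be checked numerically. The sketch below charges a capacitor from $V_{DD}$ through a resistor R, which stands in for the pull-up transistor (R and the component values are assumptions), using forward-Euler integration, and tallies the energy drawn from the supply, the energy dissipated in R and the energy left on the capacitor.

```python
def charge_energies(c=10e-15, vdd=1.0, r=1e3, steps=200_000, span_in_tau=20.0):
    """Forward-Euler simulation of charging C from V_DD through R.

    Returns (energy from supply, energy dissipated in R, energy stored in C).
    """
    dt = span_in_tau * r * c / steps
    v = 0.0                      # capacitor voltage, starts fully discharged
    e_supply = e_res = 0.0
    for _ in range(steps):
        i = (vdd - v) / r        # charging current
        e_supply += vdd * i * dt
        e_res += i * i * r * dt
        v += i * dt / c
    return e_supply, e_res, 0.5 * c * v * v


c, vdd = 10e-15, 1.0
for r in (1e3, 10e3):            # the split should not depend on R
    e_s, e_r, e_c = charge_energies(c=c, vdd=vdd, r=r)
    norm = c * vdd ** 2
    print(f"R={r:.0e}: supply {e_s / norm:.4f} CV^2, "
          f"resistor {e_r / norm:.4f} CV^2, stored {e_c / norm:.4f} CV^2")
```

Half of the supply energy is lost in the resistor whatever R is; discharging then dissipates the stored half, which gives $E_{tot} = C_L V_{DD}^2$ per cycle and hence $P = C_L V_{DD}^2 f$.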

Off-Chip Connections have High Capacitive Load
• Reduced off-chip data transfers by system integration
• Ideally a single-chip solution
• Reduced power consumption

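Switching Activity (α): Example
Figure: x is computed from a and b, y from c and d, and z from x and y.
• Independent inputs: $P_a = P_b = P_c = P_d = 0.5$ give $P_x = P_y = 0.25$ and $P_z = 7/16 = 0.4375$
• Inputs b and c are the same signal ($P_b = P_c = 0.5$): still $P_x = P_y = 0.25$, but $P_z = 3/8 = 0.375$ due to correlation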
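The probabilities in this example can be reproduced by exhaustive enumeration. The gate types are not stated in the text; the sketch assumes x and y are AND gates and z is an OR gate, which is consistent with $P_x = 0.25$ and $P_z = 7/16$. It also shows that treating x and y as independent gives 7/16 in both cases and so misses the correlation.

```python
from fractions import Fraction
from itertools import product


def probability(z_fn, n_inputs):
    """Exact P(z = 1) when every input is independent with P = 0.5."""
    combos = list(product((0, 1), repeat=n_inputs))
    return Fraction(sum(z_fn(*v) for v in combos), len(combos))


# Assumed gate types: x and y are 2-input ANDs, z is a 2-input OR.
independent = probability(lambda a, b, c, d: (a & b) | (c & d), 4)
correlated = probability(lambda a, b, d: (a & b) | (b & d), 3)   # c is the same signal as b
naive = 1 - (1 - Fraction(1, 4)) ** 2   # treats x and y as independent in both cases

print("naive estimate:", naive)
for name, p in (("independent", independent), ("b = c", correlated)):
    alpha = (1 - p) * p   # 0 -> 1 activity if successive cycles are independent
    print(f"{name}: P_z = {p} = {float(p):.4f}, alpha(0->1) = {alpha}")
```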

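Increased Switching Activity due to Glitching
Figure: inputs a, b = 0 and c, internal node x, output z.
• A delay in the gate producing x causes a race between x and c
• The race produces an extra transition on z
• The extra transition dissipates energy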
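A unit-delay simulation makes the glitch mechanism concrete. The exact gates in the figure are not recoverable, so the sketch assumes a hypothetical pair: x = a OR b (with b tied to 0, so x simply follows a one delay later) and z = x XOR c. When a and c rise together, z should stay at 0, but the late arrival of x produces a spurious pulse.

```python
def unit_delay_sim(a, c, b=0):
    """Unit-delay simulation of x = a OR b, z = x XOR c (hypothetical gates).

    a and c are input waveforms, one value per gate delay. Each gate output at
    step t + 1 depends on its inputs at step t. The circuit starts in steady state.
    """
    n = len(a)
    x = [a[0] | b] + [0] * (n - 1)
    z = [x[0] ^ c[0]] + [0] * (n - 1)
    for t in range(n - 1):
        x[t + 1] = a[t] | b
        z[t + 1] = x[t] ^ c[t]   # sees the stale x and the fresh c: a race
    return x, z


def transitions(wave):
    return sum(prev != cur for prev, cur in zip(wave, wave[1:]))


a = [0, 1, 1, 1, 1]   # a and c rise together at t = 1
c = [0, 1, 1, 1, 1]
x, z = unit_delay_sim(a, c)
zero_delay_z = [ai ^ ci for ai, ci in zip(a, c)]
print("z with delays:", z, "transitions:", transitions(z))
print("z zero-delay: ", zero_delay_z, "transitions:", transitions(zero_delay_z))
```

Both transitions on z are wasted: the final value equals the initial one, yet the output capacitance is charged and discharged once.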

Clock Gating and Power Down
Figure: a control block generates enable signals A, B and C that gate the clock (CLK) to modules A, B and C.
• Control circuitry is needed for clock gating and power down
• Powered-down modules need wake-up
• Only active modules should be clocked!

Carry Ripple
Figure: a chain of adders Add_i, Add_{i+1}, Add_{i+2}, Add_{i+3} with carries C_{i+1} ... C_{i+4} and sums S_i ... S_{i+3}.
• Transitions due to carry propagation
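The extra transitions caused by carry propagation can be counted with a unit-delay model of a 4-bit ripple-carry adder. The sketch assumes each full adder takes one time step and carry-in is 0; the operand values are an arbitrary example. Adding 0111 + 0001 after 0000 + 0000 changes only the MSB of the final sum, yet intermediate sum bits toggle while the carry ripples through.

```python
def full_adder(a, b, c):
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def ripple_unit_delay(a_old, b_old, a_new, b_new, width=4):
    """Sum bits (LSB first) at each step of a unit-delay ripple-carry adder.

    Each full adder takes one time step; carry-in is 0. The adder starts in
    steady state for (a_old, b_old) and the new operands arrive at t = 0.
    """
    bit = lambda v, i: (v >> i) & 1
    carry = [0] * (width + 1)            # carry[i] is the carry into stage i
    sums = [0] * width
    for i in range(width):               # settle on the old operands
        sums[i], carry[i + 1] = full_adder(bit(a_old, i), bit(b_old, i), carry[i])
    history = [sums[:]]
    for _ in range(width + 1):           # every stage reacts to last step's carries
        new_carry, new_sums = carry[:], sums[:]
        for i in range(width):
            new_sums[i], new_carry[i + 1] = full_adder(bit(a_new, i), bit(b_new, i), carry[i])
        carry, sums = new_carry, new_sums
        history.append(sums[:])
    return history


def count_transitions(history):
    return sum(p != q for prev, cur in zip(history, history[1:]) for p, q in zip(prev, cur))


hist = ripple_unit_delay(0b0000, 0b0000, 0b0111, 0b0001)
for t, s in enumerate(hist):
    print(t, "".join(map(str, reversed(s))))   # print MSB first
print("sum-bit transitions:", count_transitions(hist))
```

Sum bits S_{i+1} and S_{i+2} each pulse 0 → 1 → 0 before settling, so five transitions occur where one would suffice.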

Balancing Operations
Example: addition S = A + B + C + D + E + F + G + H, computed either as a linear chain of adders or as a balanced tree.
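One way to see why balancing helps is to compare logic depth and operand-arrival mismatch. In the sketch below every adder has unit delay and all inputs arrive at t = 0; the "skew" of an adder is the difference between the arrival times of its two operands, used here as a rough proxy for how prone it is to glitching (real bit-level timing is more detailed than this).

```python
def chain(leaves):
    """((((A + B) + C) + D) ...): one adder after another."""
    tree = leaves[0]
    for leaf in leaves[1:]:
        tree = (tree, leaf)
    return tree


def balanced(leaves):
    """((A + B) + (C + D)) + ...: a balanced binary tree."""
    if len(leaves) == 1:
        return leaves[0]
    mid = len(leaves) // 2
    return balanced(leaves[:mid]), balanced(leaves[mid:])


def analyze(tree):
    """Return (output arrival time, total operand skew) in unit adder delays."""
    if isinstance(tree, str):            # primary input, arrives at t = 0
        return 0, 0
    (t_l, s_l), (t_r, s_r) = analyze(tree[0]), analyze(tree[1])
    return max(t_l, t_r) + 1, s_l + s_r + abs(t_l - t_r)


inputs = list("ABCDEFGH")
print("chain:    depth %d, total skew %d" % analyze(chain(inputs)))
print("balanced: depth %d, total skew %d" % analyze(balanced(inputs)))
```

Both structures use seven adders, but in the balanced tree every adder sees both operands at the same time, while the chain has mismatched arrivals at all but its first adder.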

Delay as Function of Supply
Figure: gate delay as a function of the supply voltage V_DD.

Delay as Function of Threshold
Figure: gate delay as a function of the threshold voltage V_T.
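The two delay plots are not reproduced here. A common first-order model for the trends they show is the alpha-power law, $t_d \propto C_L V_{DD} / (V_{DD} - V_T)^{\alpha}$. The exponent α = 1.3 and the reference point V_DD = 1.2 V, V_T = 0.3 V used below are assumptions, not values from the slides.

```python
def relative_delay(vdd, vt, alpha=1.3, vdd_ref=1.2, vt_ref=0.3):
    """Gate delay normalized to a reference point, alpha-power-law model."""
    if vdd <= vt:
        raise ValueError("the model only applies for V_DD > V_T")
    delay = vdd / (vdd - vt) ** alpha
    ref = vdd_ref / (vdd_ref - vt_ref) ** alpha
    return delay / ref


for vdd in (1.2, 1.0, 0.8, 0.6):
    print(f"V_DD = {vdd:.1f} V, V_T = 0.3 V: delay x{relative_delay(vdd, 0.3):.2f}, "
          f"dynamic energy x{(vdd / 1.2) ** 2:.2f}")
for vt in (0.2, 0.3, 0.4):
    print(f"V_DD = 1.2 V, V_T = {vt:.1f} V: delay x{relative_delay(1.2, vt):.2f}")
```

The model captures the trade-off behind dual-V_DD and dual-V_T design: lowering V_DD saves dynamic energy quadratically but slows gates, and lowering V_T restores speed at the cost of higher subthreshold leakage, which this model does not include.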

Dual V_T Technology
Figure: a gate network in which the gates on the critical path use low V_T.
• Reduced V_DD: increased delay
• Low V_T: faster, but increased leakage
• Use low V_T in the critical path

High-V_T Stand-by
Figure: fast, low-V_T logic (high leakage) is connected to V_DD and ground through high-V_T transistors (low leakage) controlled by a standby signal.
• In standby, the high-V_T transistors are turned off, so leakage is low

Low Power Gate Resizing
• Systematic capture and elimination of slack using fictitious entities called Unit Delay Fictitious Buffers (UDFs).
• Replace unnecessarily fast gates by slower, lower-power gates from an underlying gate library.
• Use a simple relation between a gate's speed and power and the UDFs in its fanout nets. Model the problem as an efficiently solvable ILP similar to retiming.
• In Proceedings of ARVLSI'99, Georgia Tech.
Figure: example circuit annotated with gate delays and displacement variables, with UDFs in boxes; critical path = 8.
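The slack that the UDFs capture is the quantity static timing analysis reports: how much later a gate output could settle without lengthening the critical path. The sketch below computes per-gate slack on a small hypothetical gate DAG (names and delays are made up) using Python's standard `graphlib` module (Python 3.9+) for the topological order. It does not build the UDF/ILP formulation itself.

```python
from graphlib import TopologicalSorter


def slack_analysis(delay, fanin):
    """Static timing analysis of a combinational gate DAG.

    delay: {gate: delay}; fanin: {gate: [gates driving it]} (every gate is a key,
    primary inputs are not modeled). Returns (critical path delay, {gate: slack}).
    """
    order = list(TopologicalSorter(fanin).static_order())
    arrival = {}
    for g in order:                      # latest arrival time at each gate output
        arrival[g] = max((arrival[d] for d in fanin[g]), default=0) + delay[g]
    critical = max(arrival.values())

    fanout = {g: [] for g in order}
    for g in order:
        for d in fanin[g]:
            fanout[d].append(g)
    required = {}
    for g in reversed(order):            # latest time each output may settle
        required[g] = min((required[s] - delay[s] for s in fanout[g]), default=critical)
    return critical, {g: required[g] - arrival[g] for g in order}


# Hypothetical four-gate circuit: g3 is driven by g1 and g2, g4 by g2
delay = {"g1": 3, "g2": 1, "g3": 2, "g4": 3}
fanin = {"g1": [], "g2": [], "g3": ["g1", "g2"], "g4": ["g2"]}
critical, slack = slack_analysis(delay, fanin)
print("critical path:", critical)
print("slack:", slack)
```

Gates with positive slack (g2 and g4 here) are candidates for slower, lower-power versions. Note that g2 and g4 lie on the same path and share a single unit of slack, which is why the assignment has to be solved globally rather than gate by gate.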

Dual Supply Voltages for Low Power
• Components on the critical path exhibit no slack, but components off the critical path exhibit excessive slack.
• Use a high supply voltage VDDH for critical-path components and a low supply voltage VDDL for non-critical-path components.
• Throughput is maintained and power consumption is lowered.
V. Sundararajan and K.K. Parhi, "Synthesis of Low Power CMOS VLSI Circuits using Dual Supply Voltages", Proc. of ACM/IEEE Design Automation Conference, pp. 72-75, New Orleans, June 1999

Dual Supply Voltages for Low Power
• Systematic capture and elimination of slack using fictitious entities called Unit Delay Fictitious Buffers.
• Switch unnecessarily fast gates to the lower supply voltage VDDL, thereby saving power; critical-path gates keep the high supply voltage VDDH.
• Use a simple relation between a gate's speed/power and supply voltage with the UDFs in its fanout nets. Model the problem as an approximately solvable ILP.
Figure: example circuit annotated with gate delays, displacement variables and supply assignments (VDDH/VDDL), with UDFs in boxes; LC = level converter; critical path = 8.

Dual Threshold CMOS VLSI for Low Power
• Systematic capture and elimination of slack using fictitious entities called Unit Delay Fictitious Buffers.
• Gates on the critical path have a low threshold voltage VTL, and unnecessarily fast gates are switched to a high threshold voltage VTH.
• Use a simple relation between a gate's speed/power and threshold voltage with the UDFs in its fanout nets. Model the problem as an efficiently approximable 0-1 ILP.
Figure: example circuit annotated with gate delays, displacement variables and threshold assignments (VTL/VTH), with UDFs in boxes; critical path = 8.
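A much simpler alternative to the 0-1 ILP is a greedy heuristic: start with every gate at low V_T and move gates to high V_T one at a time, keeping a move only if the critical path does not grow. The sketch below is that greedy heuristic, not the method of the cited work; the circuit, the one-unit delay penalty for high V_T and the leakage numbers are all hypothetical.

```python
from graphlib import TopologicalSorter


def critical_path(delay, fanin):
    """Longest path delay through a combinational gate DAG."""
    arrival = {}
    for g in TopologicalSorter(fanin).static_order():
        arrival[g] = max((arrival[d] for d in fanin[g]), default=0) + delay[g]
    return max(arrival.values())


def greedy_dual_vt(fanin, d_low, d_high, leak_low, leak_high):
    """Greedy low/high V_T assignment that never lengthens the critical path."""
    vt = {g: "L" for g in fanin}             # start with all gates fast (low V_T)
    target = critical_path(d_low, fanin)
    # Try the gates with the largest leakage saving first
    for g in sorted(fanin, key=lambda g: leak_low[g] - leak_high[g], reverse=True):
        vt[g] = "H"
        delay = {h: d_high[h] if vt[h] == "H" else d_low[h] for h in fanin}
        if critical_path(delay, fanin) > target:
            vt[g] = "L"                      # too slow: undo the move
    leakage = sum(leak_high[g] if vt[g] == "H" else leak_low[g] for g in fanin)
    return vt, leakage


fanin = {"g1": [], "g2": [], "g3": ["g1", "g2"], "g4": ["g2"]}
d_low = {"g1": 3, "g2": 1, "g3": 2, "g4": 3}          # low-V_T (fast) delays
d_high = {g: d + 1 for g, d in d_low.items()}          # high V_T: one unit slower
leak_low = {g: 10 for g in fanin}                      # arbitrary leakage units
leak_high = {g: 1 for g in fanin}
vt, leakage = greedy_dual_vt(fanin, d_low, d_high, leak_low, leak_high)
print(vt, "total leakage:", leakage)
```

Only g2 moves to high V_T here: g2 and g4 share one unit of slack, so once g2 is slowed, g4 must stay fast. A greedy pass commits to such choices in a fixed order, which is one motivation for a global formulation.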

Experimental Results
Table: ISCAS'85 benchmark circuits

Ckt    #Gates  Resizing (20 sizes)        Dual VDD (5 V, 2.4 V)      Dual Vt
               Power savings  CPU (s)     Power savings  CPU (s)     Power savings
c1908    880   15.27%           87.5      49.5%            739.05    84.92%
c2670   1211   28.91%          164.38     57.6%           1229.37    90.25%
c3540   1705   37.11%          312.51     57.7%           1743.75    83.36%
c5315   2351   41.91%          660.56     62.4%           4243.63    91.56%
c6288   2416    5.57%           69.58     62.7%           7736.05    61.75%
c7552   3624   54.05%         1256.76     59.6%           9475.1     90.90%

V. Sundararajan and K.K. Parhi, "Low Power Synthesis of Dual Threshold Voltage CMOS VLSI Circuits", Proc. of 1999 IEEE Int. Symp. on Low-Power Electronics and Design, pp. 139-144, San Diego, Aug. 1999

HEAT: Hierarchical Energy Analysis Tool
• Salient features:
– Based on stochastic techniques
– Transistor-level analysis
– Effectively models glitching activity
– Reasonably fast due to its hierarchical nature

Theoretical Background
• Signal probability:
$p_{x_i}^{1} = \lim_{N\to\infty} \frac{1}{NS}\sum_{j=1}^{NS} x_i(j), \qquad p_{x_i}^{0} = 1 - p_{x_i}^{1}$
– $S = T_{clk}/T_{gd}$, where $T_{clk}$ is the clock period and $T_{gd}$ is the smallest gate delay
• Transition probability:
$p_{x_i}^{1\to 0} = \lim_{N\to\infty} \frac{1}{NS}\sum_{j=1}^{NS} x_i(j)\,\bar{x}_i(j+1)$
$p_{x_i}^{0\to 0} + p_{x_i}^{0\to 1} + p_{x_i}^{1\to 0} + p_{x_i}^{1\to 1} = 1$
• Conditional probability:
$p_{x_i}^{1/0} = \frac{p_{x_i}^{0\to 1}}{p_{x_i}^{0\to 1} + p_{x_i}^{0\to 0}}$
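For a finite sampled waveform the limits become plain averages. The sketch below estimates the signal, transition and conditional probabilities from a 0/1 sequence with one sample per smallest gate delay; the sequence itself is arbitrary, and normalizing the transition counts by the number of sample pairs (rather than NS) is a small-sample choice that makes the four transition probabilities sum exactly to 1.

```python
from fractions import Fraction


def estimate_probabilities(x):
    """Signal, transition and conditional probabilities of a sampled 0/1 waveform.

    x holds one sample per smallest gate delay (S samples per clock period), so
    these finite averages approximate the limits N -> infinity.
    """
    p1 = Fraction(sum(x), len(x))
    pairs = list(zip(x, x[1:]))
    trans = {(i, j): Fraction(pairs.count((i, j)), len(pairs))
             for i in (0, 1) for j in (0, 1)}
    # P(next sample is 1 | current sample is 0); requires at least one 0 -> * pair
    p_1_given_0 = trans[(0, 1)] / (trans[(0, 1)] + trans[(0, 0)])
    return p1, 1 - p1, trans, p_1_given_0


x = [0, 1, 1, 0, 1, 0, 0, 1]
p1, p0, trans, p_1_given_0 = estimate_probabilities(x)
print("p1 =", p1, " p0 =", p0)
print({f"{i}->{j}": str(p) for (i, j), p in trans.items()})
print("p(1/0) =", p_1_given_0)
```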

State Transition Diagram Modeling
Figure: state transition diagram of a gate with inputs x_1 and x_2 and nodes node_2 and node_3. Each node value at time n+1 is a Boolean function of the inputs x_1(n), x_2(n) and, for node_2, of its previous value node_2(n). For node_3 (with + denoting logical OR):
$node_3(n+1) = (1 - x_1(n)) + (1 - x_2(n))$

The HEAT Algorithm
• Partitioning of the system into smaller sub-units
• State transition diagram modeling
• Edge energy computation (HSPICE)
• Computation of steady-state probabilities (MATLAB)
• Edge activity computation
• Computation of average energy: $Energy = \sum_j EA_j \cdot W_j$, where $EA_j$ is the activity and $W_j$ the energy of edge j
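The last three steps of the flow can be sketched in a few lines: given the transition probabilities of a sub-unit's state transition diagram and the energy of each edge, compute the steady-state probabilities, the edge activities and the average energy. The sketch uses plain power iteration in Python instead of MATLAB; the two-state chain and the edge energies are hypothetical (in HEAT the edge energies come from HSPICE).

```python
def steady_state(P, tol=1e-12, max_iter=100_000):
    """Stationary distribution of a Markov chain by power iteration.

    P[i][j] is the probability of moving from state i to state j in one step;
    the chain is assumed irreducible and aperiodic so the iteration converges.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def average_energy(P, W):
    """Sum over edges of edge activity EA = pi_i * P[i][j] times edge energy W[i][j]."""
    pi = steady_state(P)
    n = len(P)
    return sum(pi[i] * P[i][j] * W[i][j] for i in range(n) for j in range(n))


# Hypothetical two-state node (0, 1) with made-up transition probabilities
P = [[0.75, 0.25],
     [0.50, 0.50]]
# Hypothetical edge energies in fJ; self-loops cost nothing here
W = [[0.0, 100.0],
     [20.0, 0.0]]
print("steady state:", steady_state(P))
print("average energy per step: %.2f fJ" % average_energy(P, W))
```

Here the steady state is (2/3, 1/3), both the 0 → 1 and 1 → 0 edges have activity 1/6, and the average energy per step of the chain is (100 + 20)/6 = 20 fJ.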

Performance Comparison
Figure: power (µW) and run-time (s) obtained with SPICE and with HEAT for the circuits BW4, HY4, BW8 and HY8.
J. Satyanarayana and K.K. Parhi, "Power Estimation of Digital Datapaths using HEAT Tool", IEEE Design and Test Magazine, 17(2), pp. 101-110, April-June 2000

Finite Field Arithmetic: Addition and Multiplication
$A = a_{m-1}\alpha^{m-1} + \dots + a_1\alpha + a_0$
$B = b_{m-1}\alpha^{m-1} + \dots + b_1\alpha + b_0$
$A + B = (a_{m-1} + b_{m-1})\alpha^{m-1} + \dots + (a_1 + b_1)\alpha + (a_0 + b_0)$
$A \cdot B = (a_{m-1}\alpha^{m-1} + \dots + a_1\alpha + a_0)(b_{m-1}\alpha^{m-1} + \dots + b_1\alpha + b_0) \bmod p(x)$
• Polynomial addition over GF(2): coefficient-wise modulo-2 addition, implemented with XOR gates
• Polynomial multiplication followed by a modulo operation (modulo the primitive polynomial p(x))
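Both operations map to simple bit-level logic. The sketch below represents an element of GF(2^m) as an integer whose bit k holds the coefficient of α^k, uses XOR for addition, and uses shift-and-add multiplication with reduction modulo p(x). The field GF(2^4) and p(x) = x^4 + x + 1 are example choices, not taken from the slides.

```python
def gf_add(a, b):
    """Addition in GF(2^m): coefficient-wise mod-2 addition, i.e. XOR."""
    return a ^ b


def gf_mul(a, b, m=4, poly=0b10011):
    """Multiply in GF(2^m) modulo p(x); default p(x) = x^4 + x + 1.

    Bit k of an integer holds the coefficient of alpha^k. Shift-and-add
    multiplication, reducing by p(x) whenever the degree reaches m.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a          # add the current multiple of a
        b >>= 1
        a <<= 1                  # multiply a by alpha
        if a & (1 << m):
            a ^= poly            # reduce: replace alpha^m using p(alpha) = 0
    return result


alpha = 0b0010
powers, x = [], 1
for _ in range(15):
    x = gf_mul(x, alpha)
    powers.append(x)
print("alpha^4 =", format(gf_mul(alpha, 0b1000), "04b"))   # alpha * alpha^3
print("alpha^15 =", powers[-1], "; distinct powers:", len(set(powers)))
```

Because p(x) is primitive, the powers α^1 ... α^15 run through all 15 nonzero field elements before returning to 1.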

Programmable Finite Field Multiplier
Figure: parallel, digit-serial and array-type organizations built from MAC2 and DEGRED2 units (including a configuration with four MAC2 units and an instruction input).
