
WARM SRAM: A Novel Scheme to Reduce Static Leakage Energy in SRAM Arrays

Mahadevan Gomathisankaran, Iowa State University (gmdev@iastate.edu)
Akhilesh Tyagi, Iowa State University (tyagi@iastate.edu)

➀ Introduction
➁ Proposed Circuit Technique
➂ Reducing Static Energy in On-Chip Caches
➃ Model Validity
➄ Conclusion and Future Work

Produced with LaTeX seminar style & PSTricks

INTRODUCTION
Expected increase in static leakage current
➜ Feature size is projected to reach 22 nm in 2016
➜ Leakage current is projected to increase by a factor of 1K-10K in going from 180 nm to 70 nm
Leakage current will play a major role in circuit design
➜ Not only arrays but also high fan-out logic will be affected
New design methodologies have to be invented to avoid the Red Brick Wall
➜ We propose warmup-CMOS, which uses depletion-mode transistors

SUBTHRESHOLD LEAKAGE IN CMOS
Various leakage mechanisms
➜ PN reverse bias, weak inversion, DIBL, GIDL, punchthrough
Leakage current
$$I_{sub} = A \exp\left[\frac{q}{n' kT}\left(V_g - V_s - V_{th0} - \gamma' V_s + \eta V_{ds}\right)\right] B \qquad (1)$$
$$A = \mu_0 C_{ox} \frac{W_{eff}}{L_{eff}} \left(\frac{kT}{q}\right)^2 e^{1.8}, \qquad B = 1 - \exp\left(\frac{-qV_{ds}}{kT}\right)$$
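Equation (1) can be evaluated numerically. The following is a minimal Python sketch, assuming SI units (μ_0 in m²/V·s, C_ox in F/m²) and a default temperature of 300 K; the function name and the device parameters in `params` are hypothetical placeholders, not values from the slides.

```python
import math

BOLTZMANN = 1.380649e-23        # k, J/K
ELEM_CHARGE = 1.602176634e-19   # q, C


def subthreshold_current(vg, vs, vds, vth0, *, mu0, cox, w_eff, l_eff,
                         n_prime, gamma_prime, eta, temp_k=300.0):
    """Subthreshold leakage I_sub from Eq. (1); all inputs in SI units."""
    vt = BOLTZMANN * temp_k / ELEM_CHARGE          # thermal voltage kT/q, V
    # Prefactor A = mu0 * Cox * (W_eff / L_eff) * (kT/q)^2 * e^1.8
    a = mu0 * cox * (w_eff / l_eff) * vt ** 2 * math.exp(1.8)
    # Drain term B = 1 - exp(-q Vds / kT); it vanishes when Vds = 0
    b = 1.0 - math.exp(-vds / vt)
    # Exponent: gate drive minus threshold, body effect (gamma') and DIBL (eta)
    drive = vg - vs - vth0 - gamma_prime * vs + eta * vds
    return a * math.exp(drive / (n_prime * vt)) * b


# Hypothetical device parameters, for illustration only
params = dict(mu0=0.03, cox=0.015, w_eff=140e-9, l_eff=70e-9,
              n_prime=1.5, gamma_prime=0.1, eta=0.05)
i_off = subthreshold_current(0.0, 0.0, 1.0, 0.39, **params)
print(f"I_off = {i_off:.3e} A")
```

Lowering V_th0 by n′·(kT/q)·ln 10, about 89 mV for n′ = 1.5 at 300 K, raises I_sub tenfold; this exponential sensitivity is why leakage grows so quickly with scaling.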


EARLIER RESEARCH
Gated-V_dd
+ Interposes a high-V_t transistor between the circuit and one of the power supply rails
+ Reduces the leakage current of a normal transistor to effectively the leakage current of the high-V_t control transistor
- Contents of the cell are lost
- The control algorithm must be smart
ABB-MTCMOS
+ Dynamically raises V_t by modulating the back-gate bias voltage, i.e., $V_t = V_{t0} + \gamma\left(\sqrt{\phi_{bi} + V_{sb}} - \sqrt{\phi_{bi}}\right)$
- Higher energy/delay per transition and a higher V_dd offset the leakage power savings
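The ABB threshold expression can be tabulated to see how much back-gate bias a given V_t increase costs. A minimal Python sketch, assuming hypothetical values V_t0 = 0.2 V, γ = 0.4 V^0.5, and φ_bi = 0.8 V (none of these come from the slides):

```python
import math


def body_biased_vt(vt0, gamma, phi_bi, vsb):
    """Threshold voltage under source-body bias, as on the slide:
    V_t = V_t0 + gamma * (sqrt(phi_bi + V_sb) - sqrt(phi_bi))."""
    if phi_bi + vsb < 0:
        raise ValueError("phi_bi + V_sb must be non-negative")
    return vt0 + gamma * (math.sqrt(phi_bi + vsb) - math.sqrt(phi_bi))


# Hypothetical device values: V_t0 = 0.2 V, gamma = 0.4 V^0.5, phi_bi = 0.8 V
for vsb in (0.0, 0.25, 0.5, 1.0):
    print(f"V_sb = {vsb:.2f} V -> V_t = {body_biased_vt(0.2, 0.4, 0.8, vsb):.3f} V")
```

Because V_t grows only with the square root of V_sb, each further increment of V_t requires more bias than the previous one.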


DVS
+ In sub-micron processes, leakage current increases exponentially with supply voltage
+ The supply voltage is reduced to an optimum value (the knee point of the curve, 1.5*V_t)
+ Leakage power is reduced on two fronts, since both the supply voltage and the leakage current drop
- A memory cell in standby (drowsy) mode cannot be read or written
What Is Missing?
➜ A comprehensive solution with much lower control overhead that still achieves the maximum possible leakage reduction
➜ The reduction is maximized if the circuit is in standby (low-leakage) mode whenever it is not in use
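Why the reduction acts on two fronts can be sketched with the V_ds dependence of Eq. (1): leakage power is V_dd·I_sub, and I_sub itself falls with V_dd through the DIBL term η·V_ds and the factor B. This Python sketch assumes a single off transistor with V_g = V_s = 0 and V_ds = V_dd, a hypothetical DIBL coefficient η = 0.1 and n′ = 1.5, and borrows V_t = 0.39 V from the SRAM cell used later in the simulations; it is an illustration, not the DVS authors' model.

```python
import math

VT_THERMAL = 0.02585  # kT/q at roughly 300 K, in volts


def relative_leakage_power(vdd, *, vth0=0.39, n_prime=1.5, eta=0.1):
    """Leakage power V_dd * I_sub in arbitrary units for an off device with
    V_g = V_s = 0 and V_ds = V_dd, keeping only the V_dd-dependent factors of
    Eq. (1). n_prime and eta are hypothetical values."""
    i_sub = (math.exp((-vth0 + eta * vdd) / (n_prime * VT_THERMAL))
             * (1.0 - math.exp(-vdd / VT_THERMAL)))
    return vdd * i_sub


full = relative_leakage_power(1.0)
drowsy = relative_leakage_power(1.5 * 0.39)   # knee point, 1.5 * V_t
voltage_gain = 1.0 / (1.5 * 0.39)             # reduction from V_dd alone
power_gain = full / drowsy                    # reduction in V_dd * I_sub
print(f"voltage alone: {voltage_gain:.2f}x, total power: {power_gain:.2f}x")
```

With these assumptions the voltage term alone gives about 1.7X, while the total leakage power drops by roughly 5X, because the leakage current falls as well.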


OUR PROPOSED SOLUTION
Warm Inverter
➜ Our solution uses depletion-mode devices
➜ The circuit is warm, i.e., when it is not accessed, V_PWR is less than V_dd and V_GND is greater than GND
➜ Compared with a normal inverter in the same technology, the warm inverter achieves a 377X leakage current reduction
[Figure: warm inverter, V_dd = 1 V. A depletion NMOS TdepN (V_T = −0.65 V) sits between V_dd and V_PWR and a depletion PMOS TdepP (V_T = 0.65 V) sits between V_GND and ground, both controlled by the access signal ACC; the inverter devices have V_TP = −0.2 V and V_TN = 0.2 V.]
Steady-State Response
IN (V)   OUT (V)   V_PWR (V)   V_GND (V)   I_off (pA)
0.0      0.949     0.949       0.148       10
1.0      0.052     0.852       0.052       1

Limitations:
➜ Performance penalty, since there is an NMOS in the charging path and a PMOS in the discharging path
➜ Energy penalty: extra switching energy ξ = 0.3·C_diff J
➜ Cascading effect: for a cross-coupled inverter pair we get High = 742 mV, Low = 225 mV, I_off = 515 pA (compared with the actual I_off of 6.25 nA)
Performance Impact
       tpLH (ps)   tpHL (ps)   tr (ps)   tf (ps)
Base   16.8        10.54       33.63     17.31
New    25.9        16.32       40.72     30.89
%Inc   54.2        54.80       21.10     78.50
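The %Inc row can be rechecked from the Base and New rows. This short Python sketch recomputes it and also the leakage ratio implied by the cascading-effect bullet; all numbers are copied from this slide.

```python
def percent_increase(base, new):
    """Percentage increase of `new` over `base`."""
    return (new / base - 1.0) * 100.0


# (base, warm) pairs from the Performance Impact table, in ps
timings = {
    "tpLH": (16.8, 25.9),
    "tpHL": (10.54, 16.32),
    "tr": (33.63, 40.72),
    "tf": (17.31, 30.89),
}
for name, (base, warm) in timings.items():
    print(f"{name}: +{percent_increase(base, warm):.1f}%")

# Cascading effect: actual I_off (6.25 nA = 6250 pA) vs warm pair (515 pA)
print(f"I_off reduction: {6250 / 515:.1f}x")
```

The cross-coupled pair thus keeps about a 12X leakage reduction, well below the 377X of an isolated warm inverter.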

APPLICATION TO CACHES
Cache Access Timing for a 32KB, 4-way, 1 RW Port, 1 Sub-bank Cache
Stage          Data Array Delay (ps)   Tag Array Delay (ps)
Decoder        208.572                 99.410
Wordline       115.975                 44.415
Bitline        11.765                  11.898
Senseamp       72.625                  44.625
Compare        -                       112.912
Mux Driver     -                       150.077
Sel Inverter   -                       16.612
Total          408.936                 479.949
➜ L1 cache sizes are typically 32KB-64KB (Athlon has 128KB)
➜ L1 miss rates average 2%
➜ On-chip L2 caches are in the range of 256KB (Centrino has 1MB)
➜ We used CACTI 3.0 to find the cache access timing
[Figure: cache architecture of an n-way set-associative cache: address decoder, word lines and bit lines of the data and tag arrays, column muxes, sense amps, comparator, and mux/output driver producing the Hit? and DATA outputs.]
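Summing the per-stage CACTI delays reproduces the totals (to rounding) and shows which array sets the access time. A Python sketch using only the numbers in the table:

```python
# Stage delays (ps) reported by CACTI 3.0 for the 32KB, 4-way cache above
data_array = {"decoder": 208.572, "wordline": 115.975,
              "bitline": 11.765, "senseamp": 72.625}
tag_array = {"decoder": 99.410, "wordline": 44.415, "bitline": 11.898,
             "senseamp": 44.625, "compare": 112.912,
             "mux_driver": 150.077, "sel_inverter": 16.612}

data_total = sum(data_array.values())
tag_total = sum(tag_array.values())
critical = "tag" if tag_total > data_total else "data"
slack = abs(tag_total - data_total)   # time the faster array can absorb
print(f"data {data_total:.3f} ps, tag {tag_total:.3f} ps, "
      f"critical path: {critical} array, slack {slack:.3f} ps")
```

The tag path is about 71 ps slower, which is the slack that a slower data-array bitline can hide.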

Simulation Setup: Warm SRAM Configuration
➜ A depletion device pair per cell would increase the area and hence offset the energy savings, so one pair is shared by 16 SRAM cells
➜ The wordline access signal is used to control the depletion devices
➜ PMOS_dep is 4W_min; since the cache read is on the critical path, this is justified
➜ An increase of up to 6X in bitline delay (data array) will have no impact on cache access time
➜ Simulation is performed in HSPICE for a subarray of size 128X256
➜ WL is not affected by the addition of 16·C_g
➜ WL̄ is generated from WL, and since it drives only 64·C_g, its delay can be made one tenth of that of WL
[Figure: Warm SRAM configuration. SRAM cells 1 to 16 share a V_PWR node fed from V_dd by a depletion NMOS (W = W_min) and a V_GND node returned to ground by a depletion PMOS (W = 4·W_min), both controlled by the wordline; inset: basic SRAM cell with V_t = 0.39 V, connected to BIT and BIT̄.]
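The 16·C_g and 64·C_g figures can be derived from the sharing arrangement. This Python sketch rests on assumptions that are ours, not the slide's: the 256 in 128X256 is the number of cells per row, each 6T cell loads WL with two minimum-width access-transistor gates, gate capacitance scales with width, WL drives the NMOS_dep gates, and WL̄ drives the PMOS_dep gates. The function `wordline_loads` is a hypothetical helper.

```python
def wordline_loads(cells_per_row=256, cells_per_pair=16,
                   nmos_dep_width=1, pmos_dep_width=4, access_gates_per_cell=2):
    """Gate loads in units of C_g (minimum-width gate capacitance).

    Assumptions (not from the slide): one depletion NMOS/PMOS pair is shared
    by `cells_per_pair` cells, WL drives the NMOS_dep gates, and the
    complementary wordline drives the PMOS_dep gates."""
    pairs = cells_per_row // cells_per_pair
    base_wl = cells_per_row * access_gates_per_cell   # access transistors on WL
    extra_wl = pairs * nmos_dep_width                  # added NMOS_dep gates on WL
    wl_bar = pairs * pmos_dep_width                    # PMOS_dep gates on WL-bar
    return base_wl, extra_wl, wl_bar


base, extra, wl_bar = wordline_loads()
print(f"WL: {base} C_g + {extra} C_g ({100 * extra / base:.1f}% more); "
      f"WL-bar: {wl_bar} C_g")
```

Under these assumptions the added WL load is about 3%, and the numbers match the slide, which suggests (but does not confirm) that they reflect the authors' setup.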

Leakage Reduction:
➜ Leakage power reduction: 23X
➜ V_H has moved closer to |V_T,depN|, because one NMOS_dep is shared by 16 SRAM cells
➜ V_L has moved closer to V_dd − |V_T,depP|, but not as much as V_H, because the width of PMOS_dep has been increased
Steady-State Response of a Warm SRAM Cell
Param          Base    Warm SRAM
I_L (pA)       6250    262
V(BIT) (V)     1.0     0.686
V(BIT̄) (V)     0.0     0.252
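The 23X figure and the two voltage bullets can be checked against the table. A Python sketch, with |V_T| = 0.65 V for both depletion devices taken from the warm inverter slide:

```python
BASE_IL_PA = 6250   # base SRAM cell leakage, pA
WARM_IL_PA = 262    # Warm SRAM cell leakage, pA
VDD = 1.0           # supply voltage, V
VT_DEP = 0.65       # |V_T| of both depletion devices, V

reduction = BASE_IL_PA / WARM_IL_PA
v_h, v_l = 0.686, 0.252   # steady-state high and low storage-node voltages, V
print(f"leakage reduction: {reduction:.1f}x")
print(f"V_H - |V_T,depN| = {v_h - VT_DEP:.3f} V")
print(f"V_L - (V_dd - |V_T,depP|) = {v_l - (VDD - VT_DEP):.3f} V")
```

Because V_dd is unchanged, the current ratio (about 23.9X) is also the leakage power ratio. V_H sits 36 mV above |V_T,depN| while V_L sits 98 mV below V_dd − |V_T,depP|, matching the "not as much" in the third bullet.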

Analysis of Write Operation:
➜ Transition delay values are shown in the table
➜ The write operation is not affected by the presence of depletion-mode devices
➜ There are two reasons:
• The faster WL̄ means V_GND transitions to zero even before the access transistors are turned on
• Since the bits transition from a non-zero initial value to V_H, the peak current required for the transition is smaller and can be supplied by the single NMOS_dep
Transient Analysis Parameters and Response
Param            Value     Param          Value
WL tr and tf     100 ps    Base tr        47.0 ps
WL̄ tr and tf     10 ps     Base tf        22.0 ps
WL pulse width   200 ps    Warm SRAM tr   50.1 ps
Vbitpre          0.5 V     Warm SRAM tf   0.0 ps

Analysis of Write Operation (contd.):
➜ Irrespective of bit state changes, the V_PWR node and one of the output nodes (OUT_H) need to be pulled up
➜ Considering the capacitance of the V_PWR node and the OUT_H node, the extra energy would be 327.9·C_diff
➜ For a 70 nm device this would be 36 fJ, or 0.14 fJ per bit that does not change state
➜ Warm SRAM uses more energy when 70 bits or fewer undergo a state transition
➜ This extra energy (36 fJ) is insignificant compared with the dynamic energy per access (0.3 nJ), hence we ignored its impact
Write Energy Comparison
No. of Bits   Energy Base (fJ)   Energy Warm SRAM (fJ)   Peak Current Base (mA)   Peak Current Warm SRAM (mA)
256           320                144                     5.53                     0.997
192           240                132                     4.14                     0.930
128           160                118                     2.75                     0.840
64            80                 99                      1.36                     0.735
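The per-bit and relative energy figures follow from simple arithmetic. This Python sketch assumes the 36 fJ is spread over the 256 bits of a subarray row, and back-computes the C_diff value implied by 327.9·C_diff = 36 fJ (the slide does not state C_diff directly):

```python
EXTRA_ENERGY_J = 36e-15        # extra write energy for a 70 nm device (slide)
COEFF = 327.9                  # extra energy = COEFF * C_diff (slide)
BITS_PER_ROW = 256             # assumption: energy spread over one 256-bit row
DYNAMIC_ENERGY_J = 0.3e-9      # dynamic energy per access (slide)

c_diff = EXTRA_ENERGY_J / COEFF            # implied diffusion capacitance, F
per_bit = EXTRA_ENERGY_J / BITS_PER_ROW    # J per bit that does not change state
fraction = EXTRA_ENERGY_J / DYNAMIC_ENERGY_J
print(f"implied C_diff = {c_diff * 1e15:.3f} fF")
print(f"per bit = {per_bit * 1e15:.2f} fJ, share of access energy = {fraction:.1e}")
```

The extra energy is about 1.2 × 10⁻⁴ of the dynamic access energy, which supports ignoring it.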

Analysis of Read Operation:
➜ Tag array access forms the critical path, hence Warm SRAM is used only in the data array
➜ Since we use high-V_t access transistors in the SRAM cell, the access time for a precharge voltage of 0.5 V closely matches CACTI's estimated value
➜ Bitline delay increases by 4.5X for Warm SRAM, which increases neither the cache access time nor the wave-pipelined cycle time
➜ The extra energy estimated for the write operation also applies to reads
➜ As the V_PWR node takes a finite amount of time to discharge, the extra energy depends on the inter-access time
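Whether a slower data-array bitline affects the access time can be estimated from the CACTI table. This Python sketch assumes the bitline slowdown adds directly to the data-array total and that every other stage is unchanged; `data_path_with_bitline_factor` is a hypothetical helper:

```python
DATA_TOTAL_PS = 408.936   # CACTI data-array access time
TAG_TOTAL_PS = 479.949    # CACTI tag-array access time (critical path)
DATA_BITLINE_PS = 11.765  # CACTI data-array bitline delay


def data_path_with_bitline_factor(factor):
    """Data-array access time if only the bitline delay is scaled by `factor`."""
    return DATA_TOTAL_PS + (factor - 1.0) * DATA_BITLINE_PS


# Largest bitline slowdown that still fits inside the tag-array time
max_factor = 1.0 + (TAG_TOTAL_PS - DATA_TOTAL_PS) / DATA_BITLINE_PS
for factor in (4.5, 6.0):
    t = data_path_with_bitline_factor(factor)
    print(f"{factor}x bitline: data path {t:.1f} ps (tag path {TAG_TOTAL_PS} ps)")
print(f"largest factor hidden behind the tag path: {max_factor:.2f}x")
```

Under this simple model both 4.5X and the 6X bound from the setup slide remain hidden, and the limit is about 7X. The wave-pipelined cycle time depends on per-stage timing that this model does not capture.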
