Asynchronous Computing in Sense Amplifier-based Pass Transistor Logic
Tsung-Te Liu, Louis P. Alarcón, Matthew D. Pierson, and Jan M. Rabaey
University of California, Berkeley
Outline
• Motivation
• Sense Amplifier-based Pass Transistor Logic (SAPTL)
• Self-timed SAPTL
  – Bundled-data self-timed SAPTL design
  – Dual-rail self-timed SAPTL design
• Simulation results
  – Area
  – Energy-delay
  – Leakage
• Conclusion
Focus: Ultra Low Voltage Design
[Figure: Supply voltage vs. CMOS technology node. y-axis: Supply Voltage VDD (V); x-axis: Channel Length L (µm); curves: High Performance, Low Power; annotation: Goal of Ultra Low Voltage Design, < 300mV. Source: ITRS]
Voltage scaling:
• Most effective technique for power reduction
• But at low supply voltage
  – Leakage dominates
  – Delay is large and variable
Issue-1: Leakage dominates at low supply voltage
[Figure: Energy vs. Supply voltage. y-axis: Energy [fJ], log scale; x-axis: Supply Voltage [V]. Test circuit: 203-stage NAND2 ring oscillator, 90-nm CMOS standard cell. Source: JOLPE]
• Leakage dominates the power and energy at low VDD
• Leakage determines standby power for portable devices
Issue-2: Delay is large & variable at low supply voltage
[Figure: Delay vs. Supply Voltage. y-axis: Delay [s], log scale; x-axis: Supply voltage [V]]
[Figure: Histograms of Count vs. Normalized Delay at Supply = 300mV]
  – Delay spread due to transistor mismatches: -10% ~ +15%
  – Delay spread due to process variations: -60% ~ +100%
• Delay increases exponentially at low VDD
• Large delay spread in actual technology environment
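The exponential delay growth can be illustrated with a first-order subthreshold model. This is a minimal sketch and not the authors' simulation setup: it assumes the on-current follows I_on = i0 · exp((VDD − Vth) / (n · vT)) for VDD below Vth and that gate delay is roughly C · VDD / I_on. The values of `vth`, `n`, `c_load`, and `i0` are hypothetical placeholders.

```python
import math

THERMAL_VOLTAGE = 0.02585  # kT/q near 300 K, in volts


def subthreshold_delay(vdd, vth=0.35, n=1.5, c_load=1e-15, i0=1e-6):
    """First-order gate delay C*VDD/I_on in the subthreshold region.

    All default parameter values are illustrative assumptions.
    """
    i_on = i0 * math.exp((vdd - vth) / (n * THERMAL_VOLTAGE))
    return c_load * vdd / i_on


def delay_shift_from_vth(delta_vth, n=1.5):
    """Multiplicative delay change caused by a threshold shift delta_vth (V)."""
    return math.exp(delta_vth / (n * THERMAL_VOLTAGE))


if __name__ == "__main__":
    for vdd in (0.20, 0.25, 0.30):
        print(f"VDD = {vdd:.2f} V  delay = {subthreshold_delay(vdd):.3e} s")
    # A small threshold shift changes the delay by a large factor, which is
    # why the delay spread widens at low supply voltage in this model.
    print(f"delay factor for +20 mV Vth shift: {delay_shift_from_vth(0.02):.2f}")
```

In this model the same exponential that slows the gate also amplifies threshold variation, which matches the qualitative trend on the slide.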
Solution: Use pass transistor logic!
[Figure: drivers, Pass Transistor Network, receivers. Source: JOLPE]
• No VDD and GND connections in the logic path
  – Nearly zero leakage through the logic path
  – Can keep scaling down the threshold voltage
SAPTL: Sense Amplifier-based Pass Transistor Logic
[Figure: block diagram. Data inputs feed the Stack (pass transistor network), which feeds the Sense Amplifier and then the Data outputs; Timing control (CK) drives both. Source: JOLPE]
• Sense amplifier to recover voltage swing and reduce delay
• Leakage path only exists in the root node driver and sense amplifier
SAPTL Computation: to contain leakage while reducing threshold
[Figure: schematics of the Stack (root input, inputs A and B with complements, stack outputs S to the sense amplifier) and the Sense Amplifier (clocked by CK, outputs Out to the next SAPTL). Source: JOLPE]
• Outputs pre-charge to VDD to reset
• Current steering
• Works for very low Ion to Ioff ratio
• Latch retains the computation results
• Could be programmable
• Low voltage operation down to 300mV
SAPTL Control: to deal with delay variation & uncertainty
[Figure: block diagram. Data inputs, Stack, Sense Amplifier, Data outputs, Timing control]
Fact:
• Computation delay varies in actual technology environment
  – Ion to Ioff current ratio varies and changes the stack delay
  – Offset variations in the sense amplifier reduce the stack output margins
• Energy efficiency is a function of timing accuracy
  – Turn off the sense amplifier when inactive
• Synchronous timing is too conservative
  – Unable to track local delay variations
  – Additional skew and jitter in the clock distribution
Solution: Use self-timed control!
  – Exploit local timing information
  – Low implementation cost in SAPTL
Data Evaluation in Bundled-Data Self-Timed SAPTL
[Figure: block diagram and timing waveforms. Data inputs (Din and its complement) and Reqin enter the Driver and Root; the Stack produces Sout; the Sense Amplifier, gated by Enable from the Delay element, produces Dout; a C-element with Ackin generates Reqout and Ackout]
Signal order during evaluation:
1. Din
2. Reqin
3. Sout
4. Enable
5. Dout
6. Ackout & Reqout
• RTA1: Din < Reqin
• RTA2: Sout < Enable
* RTA: Relative Timing Assumption
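The evaluation sequence and its two relative timing assumptions can be checked against a set of event timestamps. The sketch below is a hypothetical checker written for illustration: the function name, the dictionary layout, and the example times are assumptions, and only the signal names and the orderings RTA1 (Din < Reqin) and RTA2 (Sout < Enable) come from the slide.

```python
# Signal order listed on the slide; Ackout and Reqout are issued together,
# so they share one entry here.
EVALUATION_ORDER = ["Din", "Reqin", "Sout", "Enable", "Dout", "Ackout/Reqout"]


def check_evaluation(times):
    """Check event times (any consistent unit) against the slide's orderings."""
    rta1 = times["Din"] < times["Reqin"]     # data valid before request
    rta2 = times["Sout"] < times["Enable"]   # stack settled before sensing
    stamps = [times[name] for name in EVALUATION_ORDER]
    in_order = all(a < b for a, b in zip(stamps, stamps[1:]))
    return {"RTA1": rta1, "RTA2": rta2, "in_order": in_order}


if __name__ == "__main__":
    # Illustrative timestamps only; they are not measured values.
    good = {"Din": 0, "Reqin": 1, "Sout": 4, "Enable": 5,
            "Dout": 7, "Ackout/Reqout": 8}
    late_stack = dict(good, Sout=6)  # stack slower than the delay line
    print(check_evaluation(good))
    print(check_evaluation(late_stack))
```

The second case models a stack that is slower than the matched delay line, which violates RTA2 and means the sense amplifier would fire on an unsettled input.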
Data Reset in Bundled-Data Self-Timed SAPTL
[Figure: same block diagram as for data evaluation, with timing waveforms. Signal order during reset: 1. Din, 2. Sout, where one stack output that should stay at logical low shows an unwanted glitch]
Issue: Opposite stack output charges up if Reqin is still low
• Degrades the reliability of the self-timed operation
• Increases the energy dissipation
Solution to Avoid Data Reset Glitch
Use different handshake protocols
• Use Din < Reqin for data evaluation
• Use Reqin < Din for data reset
Implementation-1: Late reset
• Increase data delay: delay data input signals
• Easy to implement
• But it retards the data reset operation
Implementation-2: Early reset (our choice!)
• Decrease control delay: trigger driver before data inputs reset
• Maintains the original data reset latency and throughput
• But it requires an additional relative timing assumption, which is easy to meet
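The difference between the two implementations can be sketched with event times. This is a simplified abstraction, not the authors' circuit: it treats reset as finished when the later of the two events (Din, Reqin) has occurred, and the function names and numbers are hypothetical. Both options enforce Reqin < Din for reset, but only late reset moves the data event later.

```python
def ordering_ok(phase, t_din, t_reqin):
    """Evaluation needs Din before Reqin; reset needs Reqin before Din."""
    if phase == "evaluate":
        return t_din < t_reqin
    if phase == "reset":
        return t_reqin < t_din
    raise ValueError(f"unknown phase: {phase}")


def late_reset(t_din, t_reqin, data_delay):
    """Implementation-1: delay the data inputs by data_delay."""
    return t_din + data_delay, t_reqin


def early_reset(t_din, t_reqin, control_advance):
    """Implementation-2: trigger the driver earlier by control_advance."""
    return t_din, t_reqin - control_advance


if __name__ == "__main__":
    t_din, t_reqin = 10.0, 11.0  # hypothetical reset times: wrong order
    print(ordering_ok("reset", t_din, t_reqin))           # glitch-prone
    for name, fix in (("late", late_reset), ("early", early_reset)):
        d, r = fix(t_din, t_reqin, 2.0)
        print(name, ordering_ok("reset", d, r), "done at", max(d, r))
```

With these numbers, late reset finishes later than the unmodified sequence, while early reset finishes at the original data time.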
Glitch-Free Bundled-Data Self-Timed SAPTL
[Figure: block diagram. Data inputs (Din and its complement) and Reqin enter the Driver; the Stack and Sense Amplifier form the data path (T_Data) to Dout; the Delay element and C-element form the control path (T_Control) between Reqin, Ackin, Reqout, and Ackout]
• Replace the global clock signal with local handshake logic
• Delay line controls the local timing
• Still have a delay-matching issue (T_Control > T_Data)
Dual-Rail Self-Timed SAPTL Design
[Figure: block diagram. Data inputs (Din and its complement) and Reqin enter the Driver; Stack; Sense Amplifier producing Dout; Delay element; C-elements with Ackin generating Reqout and Ackout]
• Use stack outputs as trigger signals for sense amplifier
Dual-Rail Self-Timed SAPTL
[Figure: block diagram. Data inputs (Din and its complement) and Reqin enter the Driver and Stack; C-elements with Ackin produce Dout, Reqout, and Ackout]
• No delay matching issue anymore
• C-element combines several functions
  – Protocol control
  – Signal recovery
• Hardware complexity similar to the original synchronous SAPTL design
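For readers unfamiliar with the C-element, the sketch below models the behaviour of a generic two-input Muller C-element: the output follows the inputs when they agree and holds its previous value when they differ. It is a behavioural illustration only and does not represent the transistor-level cell used in the SAPTL design; the class name and the input labels in the example are assumptions.

```python
class CElement:
    """Behavioural model of a generic two-input Muller C-element."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, a, b):
        # Output changes only when both inputs agree; otherwise it holds.
        if a == b:
            self.out = a
        return self.out


if __name__ == "__main__":
    c = CElement()
    # Hypothetical (request, acknowledge) input pairs for illustration.
    for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]:
        print(f"a={a} b={b} -> out={c.update(a, b)}")
```

The hold behaviour is what lets the element wait for both handshake signals before changing state.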
SAPTL5 Area Comparison in 90-nm CMOS
[Figure: cell layouts with labelled blocks: Stack, Driver, Sense Amplifier, Delay Line, C-Element, AND]
• Synchronous SAPTL: 20 µm × 18 µm
• Bundled-data SAPTL: 25 µm × 19 µm (~32% larger area)
• Dual-rail SAPTL: 20 µm × 21 µm (~17% larger than synchronous SAPTL)
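The quoted overheads follow directly from the cell dimensions. A short check, using the synchronous cell (20 µm × 18 µm) as the reference; the function name is an illustrative choice:

```python
def area_overhead_percent(width_um, height_um, ref_width_um=20, ref_height_um=18):
    """Area increase relative to the reference cell, in percent."""
    ref_area = ref_width_um * ref_height_um
    return 100.0 * (width_um * height_um / ref_area - 1.0)


if __name__ == "__main__":
    print(f"bundled-data: {area_overhead_percent(25, 19):.1f}% larger")
    print(f"dual-rail:    {area_overhead_percent(20, 21):.1f}% larger")
```

The areas are 360 µm², 475 µm², and 420 µm², which round to the ~32% and ~17% figures on the slide.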
Energy-Delay Simulations
[Figure: SAPTL5 Energy-Delay Plot, two panels (VDD = 300mV and VDD = 1V). y-axis: Delay [FO4 (@VDD = 1V)], log scale; x-axis: Energy [fJ], log scale; curves: Synchronous, Bundled-Data, Dual-Rail; annotation: 100 FO4]
• Synchronous & bundled-data must slow down to account for process variations
• Early reset of self-timed SAPTL reduces latency and energy
Note: Synchronous SAPTL plot excludes the clock distribution energy
Leakage Simulations
[Figure: SAPTL5 Leakage Current Plot, with labels VDD = 300mV and VDD = 1V. y-axis: Leakage Current [nA], log scale; x-axis: Supply Voltage [V]; curves: Synchronous, Bundled-Data, Dual-Rail]
• Dual-rail design requires larger C-elements and thus suffers higher leakage
• Leakage difference diminishes with lower supply voltages
Note: Synchronous SAPTL plot excludes the clock distribution leakage
Conclusion
• SAPTL is a promising candidate for low energy computation at low supply voltage
• Self-timed operation improves reliability and energy-delay performance without increasing hardware complexity
• Dual-rail self-timed SAPTL achieves better energy and speed performance in technologies with process variations
Acknowledgement
• Marly Roncken and Ivan Sutherland
• Berkeley Wireless Research Center
• NSF Infrastructure Grant
• STMicroelectronics
• Gigascale Systems Research Center
Thank you!