
Statistical Analysis and Optimization of Asynchronous Digital Circuits (PowerPoint presentation)



  1. Statistical Analysis and Optimization of Asynchronous Digital Circuits
     Tsung-Te Liu and Jan M. Rabaey, University of California, Berkeley

  2. Outline
     • Motivation
     • Variability model of CMOS digital circuits
     • Performance model for different timing schemes
     • Performance comparison
     • Conclusion

  3. Variability Continues to Increase as Technology and Voltage Scale Down
     [Figures: device variability vs. technology node; delay spread due to process variations (count vs. normalized delay): -40% to +30% at 1 V, -80% to +110% at 0.3 V] [Cao, ASU]
     • Higher variability with finer design rules and larger wafers
     • Higher variability with lower supply voltages

  4. Circuit Performance Characteristics with Different Timing Schemes
     [Figure: probability vs. computation delay for the original circuit, a self-timed circuit, and a conventional synchronous circuit; A: protocol circuit delay, B: 3σ delay variation]
     • A self-timed circuit is itself a variation-monitoring circuit
     • Self-timing becomes advantageous when the variation is large (B > A)
     • A statistical analysis framework is necessary

  5. Statistical Analysis Framework
     • Circuit variability model: supply voltage, logic depth, width and length, body bias
     • Performance model: computation overhead, communication overhead, delay and energy performance
     • Determine the optimal timing strategy in the presence of variability
     [Figure: energy vs. delay, with regions for processors, communications, and sensors]

  6. Outline
     • Motivation
     • Variability model of CMOS digital circuits
     • Performance model for different timing schemes
     • Performance comparison
     • Conclusion

  7. Delay Model of CMOS Digital Circuit
     [Figure: delay (in FO4 delays at V_DD = 1 V) and model error [%] vs. supply voltage [V] for a 4-stage FO4 INV chain; simulation data vs. model]
     Strong inversion (velocity saturation): $I \propto \dfrac{(V_{DD}-V_{th})^2}{1+\dfrac{V_{DD}-V_{th}}{E_{sat}L}}$
     Subthreshold: $I \propto \exp\!\left(\dfrac{V_{DD}-V_{th}}{S}\right)$
     Unified: $I \propto \dfrac{\left[\ln\!\left(1+\exp\dfrac{V_{DD}-V_{th}}{2S}\right)\right]^2}{1+\dfrac{2S\,\ln\!\left(1+\exp\dfrac{V_{DD}-V_{th}}{2S}\right)}{E_{sat}L}}$
     • One unified current model across different operating regions
     • Model error <2% from 0.3 V to 1 V
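The unified current model can be evaluated numerically to see how it bridges the two regions. This is a minimal sketch under stated assumptions: the delay is taken as T_d ∝ V_DD / I (the slide does not give the delay expression), and the parameter values `VTH`, `S`, and `ESAT_L` are hypothetical, not the authors' fitted values. The checks confirm that the unified form approaches the velocity-saturated square law well above threshold and the exponential law well below it.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable ln(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Hypothetical device parameters (illustrative only, not from the slides).
VTH = 0.35    # threshold voltage [V]
S = 0.04      # subthreshold slope parameter [V]
ESAT_L = 1.0  # velocity-saturation voltage E_sat * L [V]


def unified_current(vdd: float) -> float:
    """Relative drain current from the unified expression (arbitrary units)."""
    u = softplus((vdd - VTH) / (2 * S))
    return u ** 2 / (1 + 2 * S * u / ESAT_L)


def strong_inversion_current(vdd: float) -> float:
    """Velocity-saturated square law, scaled by 1/(2S)^2 to share units with the unified form."""
    ov = vdd - VTH
    return (ov / (2 * S)) ** 2 / (1 + ov / ESAT_L)


def normalized_delay(vdd: float) -> float:
    """Delay relative to V_DD = 1 V, assuming T_d is proportional to V_DD / I."""
    return (vdd / unified_current(vdd)) / (1.0 / unified_current(1.0))


if __name__ == "__main__":
    for v in (0.3, 0.4, 0.6, 0.8, 1.0):
        print(f"V_DD = {v:.1f} V -> delay = {normalized_delay(v):7.2f} x delay at 1 V")
```

The softplus helper avoids overflow of `exp` at large overdrive, which is what lets a single expression cover both subthreshold and strong inversion.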

  8. Delay Variability Model
     Normalized sensitivities to threshold voltage and geometry (K):
     $S^{T_d}_{V_{th}} = \dfrac{\partial T_d}{\partial V_{th}}\cdot\dfrac{V_{th}}{T_d}, \qquad S^{T_d}_{K} = \dfrac{\partial T_d}{\partial K}\cdot\dfrac{K}{T_d}$
     $\left(\dfrac{\sigma_{T_d}}{\mu_{T_d}}\right)^2 = \left(S^{T_d}_{V_{th}}\,\dfrac{\sigma_{V_{th}}}{\mu_{V_{th}}}\right)^2 + \left(S^{T_d}_{K}\,\dfrac{\sigma_{K}}{\mu_{K}}\right)^2$
     [Figures: σ/μ [%] vs. supply voltage [V] for die-to-die variation (DTD, "global variation") and within-die variation (WID, "local mismatch"); simulation data vs. model, with separate threshold-voltage and geometry model curves]
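The first-order propagation rule can be applied to any delay function by estimating the normalized sensitivities numerically. This is a minimal sketch assuming independent V_th and K variations; `toy_delay` is a hypothetical subthreshold-like delay (T_d ∝ exp(V_th/S)/K), not the slides' model, chosen because its sensitivities are known in closed form (S_Vth = V_th/S, S_K = -1).

```python
import math


def sensitivity(f, x, h=1e-6):
    """Normalized sensitivity S = (df/dx) * x / f(x), via a central difference."""
    dfdx = (f(x + h) - f(x - h)) / (2 * h)
    return dfdx * x / f(x)


def delay_rel_sigma(delay, vth, k, rel_sigma_vth, rel_sigma_k):
    """First-order sigma/mu of delay(vth, k) for independent V_th and K variations."""
    s_vth = sensitivity(lambda v: delay(v, k), vth)
    s_k = sensitivity(lambda kk: delay(vth, kk), k)
    return math.hypot(s_vth * rel_sigma_vth, s_k * rel_sigma_k)


def toy_delay(vth, k, s=0.04):
    """Hypothetical subthreshold-like delay: T_d ~ exp(V_th / S) / K."""
    return math.exp(vth / s) / k


if __name__ == "__main__":
    # For the toy model, S_Vth = 0.35 / 0.04 = 8.75 and S_K = -1.
    r = delay_rel_sigma(toy_delay, vth=0.35, k=1.0, rel_sigma_vth=0.10, rel_sigma_k=0.05)
    print(f"sigma/mu of delay = {r:.3f}")
```

The large V_th sensitivity of an exponential delay is one way to see why variability grows at low supply voltages.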

  9. Delay Variability Model
     $\left(\dfrac{\sigma_{T_d,\text{total}}}{\mu_{T_d,\text{total}}}\right)^2 = \left(\dfrac{\sigma_{T_d,\text{DTD}}}{\mu_{T_d,\text{DTD}}}\right)^2 + \left(\dfrac{\sigma_{T_d,\text{WID}}}{\mu_{T_d,\text{WID}}}\right)^2$
     [Figure: σ/μ [%] and model error [%] vs. supply voltage [V]; simulation data vs. model (total, DTD, WID)]
     • Model error <8% from 0.3 V to 1 V
     • Local mismatch dominates at low supply voltages

  10. Delay Variability Model with Different Logic Depths
      $\left(\dfrac{\sigma_{T_d,\text{total},n}}{\mu_{T_d,\text{total},n}}\right)^2 = \left(\dfrac{\sigma_{T_d,\text{DTD},4}}{\mu_{T_d,\text{DTD},4}}\right)^2 + \dfrac{4}{n}\left(\dfrac{\sigma_{T_d,\text{WID},4}}{\mu_{T_d,\text{WID},4}}\right)^2$
      [Figure: σ/μ [%] and model error [%] vs. supply voltage [V] for n = 4, 8, 24; simulation data vs. model]
      • Use the 4-stage inverter chain model as the baseline model
      • Model error <13% for n = 8 and <15% for n = 24
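The 4/n factor follows if global (DTD) variation is fully correlated across the stages of a path while local (WID) mismatch is independent from stage to stage, so that it averages out along a longer chain. The sketch below adopts exactly that assumption, with hypothetical per-stage σ/μ values, and compares the depth-scaling rule against a small Monte Carlo simulation of an n-stage chain.

```python
import math
import random


def rel_sigma_depth(dtd4, wid4, n):
    """Depth rule: (sigma/mu)_n = sqrt(DTD_4^2 + (4/n) * WID_4^2)."""
    return math.sqrt(dtd4 ** 2 + (4.0 / n) * wid4 ** 2)


def monte_carlo_rel_sigma(n, sigma_global, sigma_local, trials=20000, seed=1):
    """Simulate an n-stage chain: one shared global factor plus independent local mismatch."""
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        g = rng.gauss(0.0, sigma_global)  # shared by every stage (die-to-die)
        totals.append(sum(1.0 + g + rng.gauss(0.0, sigma_local) for _ in range(n)))
    mu = sum(totals) / trials
    var = sum((t - mu) ** 2 for t in totals) / (trials - 1)
    return math.sqrt(var) / mu


if __name__ == "__main__":
    sigma_g, sigma_l = 0.05, 0.20        # hypothetical per-stage sigma/mu
    dtd4, wid4 = sigma_g, sigma_l / 2.0  # 4-stage baseline: WID_4 = sigma_l / sqrt(4)
    for n in (4, 8, 24):
        print(n, round(rel_sigma_depth(dtd4, wid4, n), 4),
              round(monte_carlo_rel_sigma(n, sigma_g, sigma_l), 4))
```

Under this model the DTD term sets a floor that no amount of logic depth removes, while the WID term shrinks as 1/sqrt(n) in σ/μ.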

  11. Outline
      • Motivation
      • Variability model of CMOS digital circuits
      • Performance model for different timing schemes
      • Performance comparison
      • Conclusion

  12. Delay Overhead Evaluation
      [Figure: probability vs. computation delay for the original circuit, dual-rail timing, and synchronous timing; A: protocol circuit delay, B: 3σ delay variation]
      • Assumption: process variation follows a Gaussian distribution
      • Dual-rail approach: only protocol overhead, no delay overhead
      • Synchronous approach: only delay overhead; for 99.7% yield, $D_{\text{sync}} = \dfrac{3\sigma_{\text{logic,total}}}{\mu_{\text{logic,total}}}$
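The synchronous margin is a direct function of the total σ/μ. A minimal sketch assuming a Gaussian path delay; the helper names are hypothetical. It also relates the margin to timing yield: a clock period of μ + 3σ is a one-sided constraint that about 99.87% of dies meet, whereas 99.7% is the two-sided ±3σ coverage.

```python
from statistics import NormalDist


def sync_delay_overhead(rel_sigma_total, n_sigma=3.0):
    """D_sync = n_sigma * sigma/mu: clock margin beyond the mean logic delay."""
    return n_sigma * rel_sigma_total


def timing_yield(n_sigma):
    """Fraction of dies whose Gaussian delay is below mu + n_sigma * sigma."""
    return NormalDist().cdf(n_sigma)


def n_sigma_for_yield(target_yield):
    """Margin, in sigmas, needed for a one-sided timing-yield target."""
    return NormalDist().inv_cdf(target_yield)


if __name__ == "__main__":
    print(round(sync_delay_overhead(0.10), 3))  # 10% sigma/mu -> 30% delay overhead
    print(round(timing_yield(3.0), 5))          # one-sided yield at mu + 3 sigma
    print(round(n_sigma_for_yield(0.997), 3))   # sigmas needed for 99.7% one-sided
```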

  13. Bundled-Data Self-Timed Approach
      [Figure: main data path with a replica delay line; probability vs. computation delay for both]
      • Main data path: $f_{\text{logic}}(t) = N(\mu_{\text{logic}}, \sigma_{\text{logic}}^2)$
      • Replica delay line: $f_{\text{delay-line}}(t) = N(\mu_{\text{delay-line}}, \sigma_{\text{delay-line}}^2)$
      • Goal: $P(t_{\text{logic}} \le t_{\text{delay-line}}) \approx 1$
      • Assuming the main data path and the replica delay line exhibit similar statistics, for 99.7% yield:
        $D_{\text{bundled-data}} = D_{\text{variation}}^2\left(0.5 + \sqrt{0.25 + \dfrac{2}{D_{\text{variation}}^2}}\right)$
        where $D_{\text{bundled-data}} = \dfrac{\mu_{\text{delay-line}} - \mu_{\text{logic}}}{\mu_{\text{logic}}}$ and $D_{\text{variation}} = \dfrac{3\sigma_{\text{logic,WID}}}{\mu_{\text{logic,WID}}}$
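One way to arrive at the closed form for D_bundled-data, sketched here as an assumption rather than the authors' stated derivation: treat the replica delay line as built from the same kind of stages as the logic path, so its variance grows in proportion to its mean delay, and require the margin μ_delay-line − μ_logic to equal 3σ of the difference t_delay-line − t_logic. Writing D for D_bundled-data, D_v for D_variation = 3σ/μ, μ for μ_logic, and σ for σ_logic,WID (taking μ_logic,WID = μ_logic):

```latex
\begin{aligned}
\mu_{\text{delay-line}} &= (1+D)\,\mu, \qquad \sigma_{\text{delay-line}}^2 = (1+D)\,\sigma^2 \\
D\,\mu &= 3\sqrt{\sigma^2 + (1+D)\,\sigma^2} = 3\sigma\sqrt{2+D} \\
D &= D_v\sqrt{2+D} \;\Longrightarrow\; D^2 - D_v^2\,D - 2D_v^2 = 0 \\
D &= \frac{D_v^2 + \sqrt{D_v^4 + 8D_v^2}}{2}
   = D_v^2\left(0.5 + \sqrt{0.25 + \frac{2}{D_v^2}}\right)
\end{aligned}
```

The positive root is taken because the overhead must be non-negative; the result matches the slide's expression term by term.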

  14. Bundled-Data Delay Overhead
      $D_{\text{bundled-data}} \approx \begin{cases} \sqrt{2}\,D_{\text{variation}}, & D_{\text{variation}} \to 0 \\ D_{\text{variation}}^2, & D_{\text{variation}} \to \infty \end{cases}$
      [Figure: delay overhead [%] vs. process variability [%]; the overhead grows as O(n) at small variability and as O(n²) at large variability]
      • Delay overhead becomes much larger as process variability increases
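The two asymptotes can be checked against the exact expression. This is a minimal sketch; it evaluates D_bundled-data in the algebraically equivalent form (D_v² + sqrt(D_v⁴ + 8D_v²))/2, which stays finite at D_v = 0, and `margin_in_sigmas` is a hypothetical consistency check built on the assumption that the delay-line variance scales with its mean delay (μ_logic normalized to 1).

```python
import math


def bundled_data_overhead(d_variation):
    """D_bundled-data as (Dv^2 + sqrt(Dv^4 + 8 Dv^2)) / 2, equivalent to the slide form."""
    dv2 = d_variation ** 2
    return (dv2 + math.sqrt(dv2 ** 2 + 8.0 * dv2)) / 2.0


def margin_in_sigmas(d_variation):
    """Delay-line margin divided by sigma of (t_delay_line - t_logic).

    Assumes mu_logic = 1, sigma_logic = D_variation / 3, and a delay-line
    variance that scales with its mean: sigma_dl^2 = (1 + D) * sigma_logic^2.
    """
    d = bundled_data_overhead(d_variation)
    sigma_logic = d_variation / 3.0
    sigma_diff = math.sqrt(sigma_logic ** 2 + (1.0 + d) * sigma_logic ** 2)
    return d / sigma_diff


if __name__ == "__main__":
    for dv in (0.01, 0.1, 0.5, 1.0, 2.0):
        print(f"Dv={dv:4.2f}  D={bundled_data_overhead(dv):7.4f}  "
              f"sqrt2*Dv={math.sqrt(2) * dv:7.4f}  Dv^2={dv ** 2:7.4f}  "
              f"margin={margin_in_sigmas(dv):.3f} sigma")
```

The crossover between the linear and quadratic regimes sits near D_v ≈ 1, which is where the overhead curve on the slide bends upward.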

  15. Performance Model under Variations
      Original delay and energy model:
        $T_{comp} = T_{delay}$
        $E_{dynamic} = \alpha C_{switch} V^2$
        $E_{leakage} = V I_{leakage} T_{delay}$
        $E_{total} = \alpha C_{switch} V^2 + V I_{leakage} T_{delay}$
      Statistical delay and energy model:
        $T_{comp} = T_{delay}(1+P+D)$
        $E_{dynamic} = \alpha C_{switch}(1+P)V^2$
        $E_{leakage} = V I_{leakage}(1+P)\,T_{delay}(1+P+D)$
        $E_{total} = \alpha C_{switch}(1+P)V^2 + V I_{leakage}(1+P)\,T_{delay}(1+P+D)$
      Delay overhead (D) and protocol overhead (P) by timing scheme:
        Synchronous: $D = D_{\text{sync}}$, $P = 0$
        Bundled-data: $D = D_{\text{bundled-data}}$, $P = P_{\text{bundled-data}}$
        Dual-rail: $D = 0$, $P = P_{\text{dual-rail}}$
      • Evaluate computation delay and energy under variations
      • Overhead changes with supply voltage and logic depth
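The statistical model and the per-scheme overhead table map directly onto a small calculator. This is a minimal sketch; the `Overheads` type and the function names are hypothetical, and the overhead values in the usage are illustrative, not taken from the slides. Setting P = D = 0 recovers the original, variation-free model.

```python
from dataclasses import dataclass


@dataclass
class Overheads:
    """Per-scheme overheads, both relative to T_delay (dimensionless)."""
    delay: float     # D
    protocol: float  # P


def comp_time(t_delay, ov):
    """T_comp = T_delay * (1 + P + D)."""
    return t_delay * (1.0 + ov.protocol + ov.delay)


def total_energy(alpha, c_switch, vdd, i_leak, t_delay, ov):
    """E_total = alpha*C_switch*(1+P)*V^2 + V*I_leak*(1+P)*T_delay*(1+P+D)."""
    e_dyn = alpha * c_switch * (1.0 + ov.protocol) * vdd ** 2
    e_leak = vdd * i_leak * (1.0 + ov.protocol) * comp_time(t_delay, ov)
    return e_dyn + e_leak


def schemes(d_sync, d_bundled, p_bundled, p_dual):
    """Delay and protocol overhead assignment for each timing scheme."""
    return {
        "synchronous": Overheads(delay=d_sync, protocol=0.0),
        "bundled-data": Overheads(delay=d_bundled, protocol=p_bundled),
        "dual-rail": Overheads(delay=0.0, protocol=p_dual),
    }


if __name__ == "__main__":
    # Hypothetical overhead values, for illustration only.
    table = schemes(d_sync=0.30, d_bundled=0.25, p_bundled=0.05, p_dual=0.10)
    for name, ov in table.items():
        print(f"{name:12s} T_comp/T_delay = {comp_time(1.0, ov):.2f}")
```

Note that P enters the leakage term twice: once as extra leaking hardware (1+P) and once through the longer computation time.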

  16. Outline
      • Motivation
      • Variability model of CMOS digital circuits
      • Performance model for different timing schemes
      • Performance comparison
      • Conclusion

  17. Delay Overhead Comparison
      [Figures: delay overhead [%] vs. supply voltage [V] for synchronous timing and bundled-data self-timing; 4-stage FO4 INV chain and 24-stage FO4 INV chain]
      • Global variation affects only the synchronous approach
      • Local mismatch dominates at low supply voltages
      • Local mismatch has less impact on a longer critical path
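The three trends follow from combining the depth-scaling rule with the synchronous and bundled-data overhead formulas, restated here so the block is self-contained. This is a sketch only; the σ/μ values for the "high" and "low" supply corners are hypothetical and are not read off the plots.

```python
import math


def rel_sigma(dtd4, wid4, n):
    """Total sigma/mu of an n-stage path from the 4-stage DTD and WID baselines."""
    return math.sqrt(dtd4 ** 2 + (4.0 / n) * wid4 ** 2)


def d_sync(dtd4, wid4, n):
    """Synchronous margin: 3 * total sigma/mu (global and local variation)."""
    return 3.0 * rel_sigma(dtd4, wid4, n)


def d_bundled(wid4, n):
    """Bundled-data margin: depends only on local (WID) mismatch of the path."""
    dv2 = 9.0 * (4.0 / n) * wid4 ** 2  # D_variation^2 = (3 * sigma_WID,n / mu)^2
    return (dv2 + math.sqrt(dv2 ** 2 + 8.0 * dv2)) / 2.0


if __name__ == "__main__":
    # Hypothetical (DTD_4, WID_4) sigma/mu pairs for two supply corners.
    corners = {"high V_DD": (0.03, 0.03), "low V_DD": (0.08, 0.20)}
    for label, (dtd4, wid4) in corners.items():
        for n in (4, 24):
            print(f"{label:9s} n={n:2d}  D_sync={d_sync(dtd4, wid4, n):.3f}  "
                  f"D_bundled={d_bundled(wid4, n):.3f}")
```

Because `d_bundled` takes no DTD argument, global variation cannot affect it: the replica delay line tracks die-level shifts of the data path it mirrors.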

  18. Speed Performance Comparison
      [Figures: normalized delay vs. supply voltage [V] for dual-rail self-timing and bundled-data self-timing; 4-stage FO4 INV chain and 24-stage FO4 INV chain]
      • Assumption: $P_{\text{bundled-data}} = 1\,T_{FO4}$; $P_{\text{dual-rail}} = 2\,T_{FO4}$
      • The synchronous scheme is better for short critical paths at high supply voltages
      • The dual-rail scheme is better for long critical paths at low supply voltages
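The speed trade-off can be sketched by expressing the protocol overheads relative to the path delay. This assumes T_delay = n·T_FO4 for an n-stage FO4 chain (so P_bundled-data = 1/n and P_dual-rail = 2/n) and normalizes each self-timed T_comp to the synchronous one; the slide does not state its normalization, so that choice and the σ/μ inputs are assumptions.

```python
import math


def overheads(dtd4, wid4, n):
    """Return (D_sync, D_bundled) for an n-stage path from 4-stage sigma/mu baselines."""
    wid_n2 = (4.0 / n) * wid4 ** 2
    d_sync = 3.0 * math.sqrt(dtd4 ** 2 + wid_n2)
    dv2 = 9.0 * wid_n2
    d_bundled = (dv2 + math.sqrt(dv2 ** 2 + 8.0 * dv2)) / 2.0
    return d_sync, d_bundled


def normalized_delays(dtd4, wid4, n):
    """Self-timed T_comp divided by synchronous T_comp.

    Protocol overheads relative to T_delay = n * T_FO4: P_bundled = 1/n, P_dual = 2/n.
    """
    d_sync, d_bundled = overheads(dtd4, wid4, n)
    t_sync = 1.0 + d_sync
    return {
        "bundled-data": (1.0 + 1.0 / n + d_bundled) / t_sync,
        "dual-rail": (1.0 + 2.0 / n) / t_sync,
    }


if __name__ == "__main__":
    # Hypothetical (DTD_4, WID_4) values, not read from the plots.
    cases = [("4-stage, high V_DD", 0.02, 0.03, 4), ("24-stage, low V_DD", 0.05, 0.20, 24)]
    for label, dtd4, wid4, n in cases:
        ratios = normalized_delays(dtd4, wid4, n)
        print(label, {k: round(v, 3) for k, v in ratios.items()})
```

A fixed two-FO4 handshake is a large fraction of a 4-stage path but a small one of a 24-stage path, while the synchronous margin grows with variability; that is the mechanism behind the two bullets.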

  19. Energy Performance Comparison
      [Figure: energy-delay plot; energy [fJ] vs. supply [V] for synchronous timing, dual-rail self-timing, and bundled-data self-timing at activity factors α = 0.1 and α = 0.01; 24-stage FO4 INV chain]
      • The synchronous scheme is better for high activity at high supply voltages
      • The dual-rail scheme is better for low activity at low supply voltages
      • Leakage dominates for low activity at low supply voltages
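Why activity decides the winner can be seen from the statistical energy model: at low α the leakage term, which scales with (1+P+D), dominates, so the synchronous delay margin D costs energy while dual-rail pays only its protocol overhead. This is a sketch in normalized units; `e_switch` stands for C_switch·V² and `e_leak` for V·I_leakage·T_delay, and the default values and D_sync = 0.30 are hypothetical.

```python
def energy_parts(alpha, p, d, e_switch=1.0, e_leak=0.05):
    """Return (dynamic, leakage) energy in normalized units.

    dynamic = alpha * E_sw * (1 + P); leakage = E_leak * (1 + P) * (1 + P + D).
    """
    dyn = alpha * e_switch * (1.0 + p)
    leak = e_leak * (1.0 + p) * (1.0 + p + d)
    return dyn, leak


if __name__ == "__main__":
    n = 24
    sync = {"p": 0.0, "d": 0.30}     # hypothetical D_sync at a low supply voltage
    dual = {"p": 2.0 / n, "d": 0.0}  # P_dual-rail = 2 T_FO4 on a 24-stage chain
    for alpha in (0.1, 0.01):
        s_dyn, s_leak = energy_parts(alpha, **sync)
        d_dyn, d_leak = energy_parts(alpha, **dual)
        print(f"alpha={alpha}: sync={s_dyn + s_leak:.4f}  dual-rail={d_dyn + d_leak:.4f}  "
              f"sync leakage share={s_leak / (s_dyn + s_leak):.0%}")
```

With these inputs the synchronous design wins at α = 0.1 and dual-rail wins at α = 0.01; different parameter choices move the crossover.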

  20. Conclusion
      • A statistical analysis framework is proposed to evaluate the performance of CMOS digital circuits in the presence of process variations.
      • Designers can efficiently determine the optimal timing strategy, pipeline depth, and supply voltage based on the proposed variability and statistical performance models.
      • Asynchronous design exhibits better energy and delay characteristics for circuits with low activity and long critical-path delays under process variations.

  21. Acknowledgement
      • Berkeley Wireless Research Center
      • NSF Infrastructure Grant
      • STMicroelectronics
      • Multiscale System Center
      Thank you!
