
Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold Processors



  1. Circuit Reliability: Mechanisms, Monitors, and Effects in Near-Threshold Processors. Chris H. Kim, University of Minnesota, Minneapolis, MN (chriskim@umn.edu, www.umn.edu/~chriskim/)

  2. Scaling Challenges: the power wall, the variability wall, and the reliability wall. [Figure: power (W) vs. year, 2000 to 2020, with the three walls marked.]

  3. Overcoming the Power Wall. One multiplier (Y = A×B) at Freq = 1 and Vdd = 1 gives Throughput = 1, Area = 1, Power = 1, and power density 1. Two multipliers in parallel at Freq = 0.5 and Vdd = 0.5 give the same throughput with Area = 2, Power = 0.25, and power density 0.125, an 87% reduction.
     • Proven solutions: multi-core chips, dynamic voltage and frequency scaling, clock gating, power gating, …
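The numbers on this slide follow from the first-order dynamic power relation P ∝ C·Vdd²·f. A minimal Python check, assuming switched capacitance scales with the number of multipliers (and hence with area) and ignoring leakage; the function names are hypothetical:

```python
def relative_dynamic_power(n_units: int, vdd: float, freq: float) -> float:
    """Dynamic power normalized to one unit at Vdd = 1, f = 1.

    Assumes P = N * C * Vdd^2 * f with C = 1 per unit (leakage ignored).
    """
    return n_units * vdd ** 2 * freq


def compare_designs() -> dict:
    # Baseline: one multiplier at full frequency and full supply voltage.
    base_power = relative_dynamic_power(1, vdd=1.0, freq=1.0)
    base_area = 1.0
    # Parallel design: two multipliers at half frequency and half Vdd,
    # so total throughput (2 x 0.5) matches the baseline.
    par_power = relative_dynamic_power(2, vdd=0.5, freq=0.5)
    par_area = 2.0
    base_density = base_power / base_area
    par_density = par_power / par_area
    return {
        "baseline_power": base_power,
        "parallel_power": par_power,
        "baseline_density": base_density,
        "parallel_density": par_density,
        "density_reduction": 1.0 - par_density / base_density,
    }


if __name__ == "__main__":
    for key, value in compare_designs().items():
        print(f"{key}: {value:.3f}")
```

It reports a parallel power of 0.25 and a power-density reduction of 0.875, which the slide rounds down to 87%.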

  4. Overcoming the Variability Wall [Figure: Intel Foxton Technology block diagram: power supply, package, and die; connector and die voltages digitized by A/D converters; power calculation on a microcontroller compared against a power limit (P Limit); supply voltage ID (VID) control through an IIR filter and a DAC.]
     • Proven solutions: variation-aware design, memory assist/repair, lithography techniques, adaptive systems

  5. Overcoming the Reliability Wall
     • Possible solutions: guardbanding, sensing and compensation, wear-leveling, failure-resistant systems, …

  6. Outline
     • Device Reliability Issues
     • Reliability Monitors and Measurements
     • Reliability Effects in NTV Processors
     • Summary

  7. Aging in CMOS Transistors

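  8. HCI, BTI, and TDDB in Digital Logic [Figure: transistor cross sections showing the inverted channel and drain current I_D under different bias conditions.]
     • Transistors are exposed to different stress conditions during normal digital circuit operation: for example, BTI acts while a device is on with its channel inverted but little drain current flowing, whereas HCI requires drain current and therefore occurs mainly during switching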

  9. Practical Solutions for Preventing Aging-Related Failures
     • BTI and HCI
       – Gradual decline in performance
       – Guardbanding (static or dynamic); adjust Vmax
       – CAD, firmware, and architecture-level support essential
     • TDDB
       – A single incident may lead to outright system failure
       – Can happen anywhere inside a chip
       – Improve fabrication procedure; adjust Vmax
     • Bottom line: precise measurement and understanding of circuit degradation is a key aspect of robust design

  10. Transistor Lifetime Estimation [Figure: stress results extrapolated to the real supply voltage.]
     • Extrapolate stress results with respect to:
       – Operating conditions, based on acceleration models
       – Larger chip areas (e.g., Poisson area scaling for TDDB)
       – Lower failure percentiles, based on the chosen distribution
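The three extrapolation steps can be sketched as follows, assuming a Weibull failure distribution with a fixed shape parameter β (a common choice for TDDB) and, purely for illustration, a power-law voltage acceleration model t ∝ V^(−n). The real acceleration model and its constants are technology-dependent, and all names and numbers below are hypothetical. Under weakest-link (Poisson) statistics, a chip of area A_chip fails when any of its A_chip/A_test test-sized regions fails, which rescales the Weibull scale parameter η by (A_test/A_chip)^(1/β).

```python
import math


def weibull_cdf(t: float, eta: float, beta: float) -> float:
    """Cumulative failure fraction at time t for a Weibull(eta, beta) distribution."""
    return 1.0 - math.exp(-((t / eta) ** beta))


def weibull_time(F: float, eta: float, beta: float) -> float:
    """Time at which the cumulative failure fraction reaches F (lower percentiles)."""
    return eta * (-math.log(1.0 - F)) ** (1.0 / beta)


def area_scaled_eta(eta_test: float, beta: float, area_test: float, area_chip: float) -> float:
    """Poisson (weakest-link) area scaling: eta_chip = eta_test * (A_test / A_chip)^(1/beta)."""
    return eta_test * (area_test / area_chip) ** (1.0 / beta)


def voltage_scaled_time(t_stress: float, v_stress: float, v_use: float, n: float) -> float:
    """Illustrative (assumed) power-law acceleration: t_use = t_stress * (V_stress / V_use)^n."""
    return t_stress * (v_stress / v_use) ** n


if __name__ == "__main__":
    # Hypothetical test structure: eta = 1000 s at the stress voltage, beta = 2.
    eta_test, beta = 1000.0, 2.0
    eta_use = voltage_scaled_time(eta_test, v_stress=2.0, v_use=1.0, n=10.0)   # operating conditions
    eta_chip = area_scaled_eta(eta_use, beta, area_test=1.0, area_chip=100.0)  # larger chip area
    t_ppm = weibull_time(1e-6, eta_chip, beta)                                 # lower percentile
    print(f"eta at use voltage: {eta_use:.3g} s")
    print(f"eta for full chip:  {eta_chip:.3g} s")
    print(f"time to 1 ppm fail: {t_ppm:.3g} s")
```

Scaling η alone is valid only if β does not change with voltage or area, which is itself an assumption that stress data must confirm.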

  11. Benefits of In-Situ Reliability Monitors over Device Probing
     • Information from actual circuits (test circuit must be representative)
     • High timing precision with a short measurement interrupt
     • No expensive equipment
     • Short test time and reduced test area
     • Measurements at use conditions → realistic lifetime projection
     • Complements traditional probing methods

  12. Usage Scenarios and Design Issues of In-Situ Reliability Monitors
     • Usage scenario 1: process characterization and yield improvement
       – Early technology characterization is often performed before the full metallization stack has been fabricated
       – Library cells (flip-flops, scan) may not be available
       – Device probing remains a competitive solution for extracting analog parameters such as I–V or C–V
     • Usage scenario 2: in-field monitoring and data collection
       – Workload unknown
       – Simple circuits are practical, but they have limited capabilities
       – Firmware and architecture support needed

  13. Usage Scenarios and Design Issues of In-Situ Reliability Monitors
     • Usage scenario 3: sensor for real-time aging compensation
       – Effectiveness versus overhead
       – Measurements come from a proxy circuit
       – Practical issues: type of sensor, temporal granularity, spatial granularity, communication with sensors, interface and protocol
       – Personally not a big fan

  14. Outline
     • Device Reliability Issues
     • Monitors and Measurements
     • Effects in NTV Processors
     • Summary

  15. Circuit-Based Reliability Monitors (or Silicon Odometers) [Figure: timeline of silicon odometer test chips from 2007 to 2012 in 130nm, 65nm, and 32nm SOI processes, with die photos. Projects include the original silicon odometer and the statistical, all-in-one, SRAM, RTN, PBTI and duty-cycle, and interconnect odometers. Topics include frequency degradation of logic, NBTI, PBTI in HKMG, HCI, TDDB, RTN, SRAM read/write, duty-cycle impact on BTI, and interconnect impact on aging.]

  16. Beat Frequency Silicon Odometer
     • The beat frequency of two free-running ROSCs is measured by a DFF and an edge detector
     • Benefits of the beat frequency detection system:
       – Achieves ps resolution with a μs measurement interrupt
       – Insensitive to common mode noise such as temperature drifts
       – Fully digital, scan-based interface; easy to implement

  17. Beat Frequency Silicon Odometer [Figure: waveforms of the stressed ROSC (stress), the reference ROSC (ref), and the resulting beat signal.]
     • The stressed ROSC output is sampled with the reference ROSC (N = reference ROSC periods per beat period)
       – 1% frequency difference before stress → N = 100
       – 2% frequency difference after stress → N = 50
       – Δf or ΔT sensing resolution is >0.01%
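A minimal Python model of the DFF sampling and edge detection described for the beat frequency odometer, assuming ideal 50% duty-cycle square waves, no jitter, and a stressed ROSC that runs slower than the reference; all function names are hypothetical. The count N of reference cycles per beat period satisfies N ≈ f_ref/|f_ref − f_stress|, and because only the frequency ratio enters the sampling, a common-mode shift that scales both ROSCs equally leaves N unchanged.

```python
def sample_stressed_rosc(f_ref: float, f_stress: float, n_samples: int, phase0: float = 0.123) -> list[int]:
    """DFF model: the stressed ROSC (50% duty square wave) sampled at each reference ROSC rising edge."""
    samples = []
    for k in range(n_samples):
        t = k / f_ref                             # time of the k-th reference rising edge
        phase = (t * f_stress + phase0) % 1.0     # stressed ROSC phase, in cycles
        samples.append(1 if phase < 0.5 else 0)   # output is high during the first half cycle
    return samples


def beat_counts(samples: list[int]) -> list[int]:
    """Edge detector: reference cycles between successive 0 -> 1 transitions of the DFF output."""
    edges = [k for k in range(1, len(samples)) if samples[k - 1] == 0 and samples[k] == 1]
    return [b - a for a, b in zip(edges, edges[1:])]


def mean_beat_count(f_ref: float, f_stress: float, n_samples: int = 2000) -> float:
    """Average N over all complete beat periods in the sampled window."""
    counts = beat_counts(sample_stressed_rosc(f_ref, f_stress, n_samples))
    return sum(counts) / len(counts)


if __name__ == "__main__":
    f_ref = 1.0 / 3e-9                            # hypothetical 3 ns reference ROSC period
    for label, ratio in (("before stress", 0.99), ("after stress", 0.98)):
        n = mean_beat_count(f_ref, ratio * f_ref)
        print(f"{label}: N = {n:.1f}, frequency difference = {100.0 / n:.2f}%")
```

With a 1% and then a 2% frequency difference, the model reports N = 100 and N = 50, matching the slide; the stressed ROSC's degradation is then recovered as 1/N.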

  18. ROSC-Based Aging Sensor Comparison [Figure: block diagram of each system]
     • Single ROSC
       – Function: count stressed ROSC periods during an externally controlled measurement time
       – Features: simple; compact
       – Issues: voltage and temperature variations; measurement time vs. resolution tradeoff; requires an absolute timing reference (e.g., oscilloscope)
       – Measurement time for 0.01% max. resolution*: 30 μs
       – Measurement error w.r.t. common mode variations**: +10.18% / -8.57%
     • Two ROSCs, simple
       – Function: count stressed ROSC periods during N1 periods of the reference ROSC
       – Features: simple; immune to common mode variations
       – Issues: measurement time vs. resolution tradeoff
       – Measurement time for 0.01% max. resolution*: 30 μs
       – Measurement error w.r.t. common mode variations**: +0.26% / -0.38%
     • Two ROSCs, beat frequency
       – Function: count reference ROSC periods during one period of PC_OUT
       – Features: high resolution with short measurement time; immune to common mode variations
       – Issues: requires extra circuits (e.g., phase comparator, edge detector, etc.)
       – Measurement time for 0.01% max. resolution*: 0.3 μs
       – Measurement error w.r.t. common mode variations**: +0.06% / -0.07%
     *ROSC period = 3 ns. **Simulated with ±4% ΔVCC.
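The measurement-time row follows from counting arithmetic with the 3 ns ROSC period in the footnote. A direct period counter has a quantization of one count, so 0.01% resolution needs 10^4 periods. In the beat scheme, one count out of N = f_ref/Δf resolves Δf/N, so with a 1% initial offset a single 100-cycle beat period already resolves 0.01% of f_ref. A Python sketch of that arithmetic, assuming the 1% offset from the beat frequency slides; the function names are hypothetical:

```python
def direct_count_time(period_s: float, resolution: float) -> float:
    """Counting ROSC periods: a +/-1 count quantization needs 1/resolution periods."""
    return period_s / resolution


def beat_scheme(period_s: float, offset: float) -> tuple[float, float]:
    """Beat frequency scheme with a fractional frequency offset between the two ROSCs.

    Returns (measurement time for one beat period, relative frequency resolution).
    """
    n = 1.0 / offset             # reference periods per beat period
    meas_time = n * period_s     # one beat period
    resolution = offset / n      # one count out of N resolves offset/N of f_ref
    return meas_time, resolution


if __name__ == "__main__":
    T = 3e-9                     # ROSC period from the table footnote
    print(f"direct counting: {direct_count_time(T, 1e-4) * 1e6:.1f} us for 0.01%")
    t_beat, res = beat_scheme(T, 0.01)
    print(f"beat frequency:  {t_beat * 1e6:.1f} us, resolution {res * 100:.3f}%")
```

This reproduces the 30 μs and 0.3 μs entries; it says nothing about the common-mode error row, which comes from circuit simulation.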

  19. Separately Monitoring NBTI and PBTI [Figure: ROSC configurations under NBTI stress, PBTI stress, and combined N/PBTI stress]
     • PBTI is becoming an important concern in high-k metal-gate (HKMG) technology
     • A conventional ring oscillator (ROSC) can only provide overall frequency degradation information due to combined NBTI and PBTI effects
     • A new ROSC structure separates NBTI and PBTI effects
     J. Kim, et al., IBM, IRPS 2011

  20. Separately Monitoring BTI and HCI [Figure: the stressed BTI_ROSC (BTI stress only) and DRIVE_ROSC (BTI & HCI stress) are compared with the unstressed BTI_REF_ROSC and DRIVE_REF_ROSC by Beat Frequency Detection Circuits 1 and 2, each read out through SCAN OUT.]

  21. Separately Monitoring BTI and HCI
     • Backdriving action equalizes BTI in both BTI_ROSC and DRIVE_ROSC
     • HCI in BTI_ROSC is negligible: its switching current is only 3-5% of that in DRIVE_ROSC
     • Fresh power gates are used for frequency measurements
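Because backdriving gives BTI_ROSC and DRIVE_ROSC the same BTI stress while HCI is negligible in BTI_ROSC, the HCI component can be estimated by subtraction. A Python sketch, assuming the two degradations add linearly (reasonable for small shifts) and that each stressed ROSC runs slower than its reference, so a beat count N corresponds to a fractional offset 1/N; the function names and count values are hypothetical:

```python
def frequency_shift(n_fresh: int, n_stressed: int) -> float:
    """Fractional frequency degradation from beat counts before and after stress.

    Assumes f_stress ~= f_ref * (1 - 1/N), i.e. the stressed ROSC is the slower one.
    """
    return 1.0 / n_stressed - 1.0 / n_fresh


def separate_bti_hci(bti_counts: tuple[int, int], drive_counts: tuple[int, int]) -> dict[str, float]:
    """Split DRIVE_ROSC degradation into BTI and HCI parts (linear superposition assumed)."""
    bti = frequency_shift(*bti_counts)        # BTI_ROSC: BTI stress only
    total = frequency_shift(*drive_counts)    # DRIVE_ROSC: BTI + HCI stress
    return {"bti": bti, "hci": total - bti, "total": total}


if __name__ == "__main__":
    # Hypothetical (fresh, stressed) beat counts from detection circuits 1 and 2.
    result = separate_bti_hci(bti_counts=(100, 80), drive_counts=(100, 50))
    for mechanism, shift in result.items():
        print(f"{mechanism}: {shift * 100:.2f}%")
```

With these made-up counts, BTI accounts for 0.25% and HCI for the remaining 0.75% of a 1% total shift.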

  22. Temperature and Voltage Dependencies [Figure: HCI and BTI frequency shift (%) vs. stress time (s) on log-log axes. One panel compares 30 °C and 120 °C; the other compares 1.8 V and 2.4 V stress. Conditions shown include 26 °C, 2.0 V stress, and 250 MHz and 470 MHz stress frequencies.]
     • HCI is slightly reduced at higher temperature
       – Due to reduced drain current
     • Both mechanisms worsen with higher stress voltage
       – The point at which HCI begins to dominate is pushed out in time by more than an order of magnitude at 1.8 V vs. 2.4 V
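BTI and HCI frequency shifts are commonly modeled as power laws in stress time, Δf/f ≈ A·t^n, typically with a smaller exponent for BTI than for HCI, so HCI eventually dominates. A Python sketch, assuming such power-law fits (the coefficients below are synthetic, not the measured data), that fits each curve on log-log axes and solves A_BTI·t^n_BTI = A_HCI·t^n_HCI for the crossover time; the function names are hypothetical:

```python
import math


def fit_power_law(times: list[float], shifts: list[float]) -> tuple[float, float]:
    """Least-squares fit of log(shift) = log(A) + n*log(t); returns (A, n)."""
    xs = [math.log(t) for t in times]
    ys = [math.log(s) for s in shifts]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    n = sxy / sxx
    a = math.exp(y_mean - n * x_mean)
    return a, n


def crossover_time(a_bti: float, n_bti: float, a_hci: float, n_hci: float) -> float:
    """Stress time at which the HCI shift equals the BTI shift (requires n_hci != n_bti)."""
    return (a_bti / a_hci) ** (1.0 / (n_hci - n_bti))


if __name__ == "__main__":
    times = [1e1, 1e2, 1e3, 1e4, 1e5]
    # Synthetic curves for illustration: BTI ~ t^0.15, HCI ~ t^0.45.
    bti = [0.05 * t ** 0.15 for t in times]
    hci = [0.002 * t ** 0.45 for t in times]
    a_b, n_b = fit_power_law(times, bti)
    a_h, n_h = fit_power_law(times, hci)
    print(f"BTI: A={a_b:.3g}, n={n_b:.3f}; HCI: A={a_h:.3g}, n={n_h:.3f}")
    print(f"HCI dominates after ~{crossover_time(a_b, n_b, a_h, n_h):.3g} s")
```

Comparing crossover times from fits at two stress voltages is one way to quantify the shift reported on the slide.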

  23. Aging Issues in Interconnects
     • Interconnect affects the voltage and current waveforms
       – Increased transition time (decreased slew rate)
       – Wider current pulse; lower peak current
     • BTI and HCI have different sensitivities to bias conditions

  24. Interconnect Aging Monitor
     • Serpentine wires for a dense chip implementation
     • Ground shielding on both sides to reduce noise
     X. Wang, et al., IRPS 2012, TVLSI 2014

  25. BTI and HCI Aging: With Interconnect
     • BTI aging decreases with interconnect length
     • HCI degradation peaks at L = 500 µm

  26. BTI Aging vs. Interconnect Length
     • BTI-induced frequency degradation decreases with longer interconnect
     • Longer transition time → shorter PMOS stress duration → less BTI aging

  27. HCI Aging vs. Interconnect Length
     • HCI aging exhibits non-monotonic behavior with respect to interconnect length, because two effects compete as the wire gets longer:
       – Current pulse width increases
       – Current peak decreases

  28. Statistical Behavior of Aging [Figures: spread in ∆Vt increases with scaling; CDF of ∆Vt at different stress times. Sources: S. Pae, et al., TDMR '08; S. Rauch, TDMR, Dec. '07]
     • Finite number and random spatial distribution of discrete charges → NBTI and HCI variation
     • Inversely proportional to A_GATE → worse with scaling
     • A small number of aging measurements is not sufficient to characterize aging
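The area dependence can be illustrated with a simple Poisson charge model, which is an assumption rather than the analysis in the cited papers. Suppose a gate of area A_GATE holds a Poisson-distributed number of charges with mean λ = D·A_GATE, and each charge shifts Vt by η/A_GATE. Then the mean ΔVt is independent of area, while its variance, λ(η/A_GATE)² = Dη²/A_GATE, grows as the gate shrinks. The sketch below gives every charge the same impact, ignoring the random spatial distribution the slide also cites; all names and parameter values are hypothetical.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Knuth's algorithm for a Poisson sample (adequate for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_delta_vt(density: float, area: float, eta_unit: float, n_devices: int, seed: int = 1) -> list[float]:
    """Delta-Vt per device: Poisson number of charges, each shifting Vt by eta_unit / area."""
    rng = random.Random(seed)
    lam = density * area          # expected number of charges in the gate
    step = eta_unit / area        # per-charge Vt shift, inversely proportional to area
    return [poisson(rng, lam) * step for _ in range(n_devices)]


def analytic_stats(density: float, area: float, eta_unit: float) -> tuple[float, float]:
    """Mean and variance of delta-Vt for the same model."""
    lam, step = density * area, eta_unit / area
    return lam * step, lam * step ** 2


if __name__ == "__main__":
    for area in (1.0, 4.0):       # relative gate areas (hypothetical units)
        dvt = simulate_delta_vt(density=10.0, area=area, eta_unit=1.0, n_devices=20000)
        mean = sum(dvt) / len(dvt)
        var = sum((x - mean) ** 2 for x in dvt) / (len(dvt) - 1)
        print(f"area {area}: mean {mean:.3f}, variance {var:.4f}, analytic {analytic_stats(10.0, area, 1.0)}")
```

In this model a 4x smaller gate has the same mean shift but 4x the variance, which is why a few aging measurements cannot capture the distribution.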

  29. Statistical Reliability Monitor [Figure: array of stressed ROSCs with column peripherals, three reference ROSCs (Ref ROSC 1 to 3), an FSM, silicon odometer beat frequency detection systems, and a scan chain to SCANOUT RESULTS]
     • Need stressed and reference ROSC frequencies to be close
     • Difficult and costly to tune each stressed ROSC
     • Use multiple reference ROSCs with different frequencies
     • Cover the frequency distribution of the stressed array
     J. Keane, et al., IEDM 2010, JSSC 2011
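Multiple reference ROSCs let every stressed ROSC in the array find a reference close enough in frequency without tuning each one. The sketch below shows one possible selection policy, which is an assumption and not the chip's documented FSM behavior: for each stressed ROSC, pick the reference with the highest beat count N = f_ref/|f_ref − f_stress| (best resolution) that stays below a hypothetical limit n_max bounding the measurement time. An exactly matched reference produces no beat at all, so it is never chosen. All names and frequencies are hypothetical.

```python
import math
from typing import Optional


def beat_count(f_ref: float, f_stress: float) -> float:
    """Expected reference cycles per beat period, N = f_ref / |f_ref - f_stress|."""
    diff = abs(f_ref - f_stress)
    return math.inf if diff == 0 else f_ref / diff


def select_reference(f_stress: float, ref_freqs: list[float], n_max: float = 1000.0) -> Optional[int]:
    """Index of the reference ROSC with the highest beat count not exceeding n_max, or None."""
    best, best_n = None, 0.0
    for idx, f_ref in enumerate(ref_freqs):
        n = beat_count(f_ref, f_stress)
        if best_n < n <= n_max:
            best, best_n = idx, n
    return best


if __name__ == "__main__":
    refs = [0.97e9, 1.00e9, 1.03e9]                # hypothetical reference ROSC frequencies (Hz)
    stressed = [0.96e9, 0.995e9, 1.02e9, 1.04e9]   # hypothetical stressed ROSC frequencies (Hz)
    for f in stressed:
        idx = select_reference(f, refs)
        if idx is None:
            print(f"stressed {f / 1e9:.3f} GHz: no usable reference")
        else:
            print(f"stressed {f / 1e9:.3f} GHz -> Ref ROSC {idx + 1}, N = {beat_count(refs[idx], f):.1f}")
```

Spacing the references across the expected spread of the array keeps N for every stressed ROSC within a usable range.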
