
A New Approach to System-Level Single Event Survivability Prediction



  1. A New Approach to System-Level Single Event Survivability Prediction
     Melanie Berg¹, Kenneth LaBel², Michael Campola², Michael Xapsos²
     Melanie.D.Berg@NASA.gov
     ¹AS&D in support of NASA/GSFC; ²NASA/GSFC
     Presented by Melanie Berg at the Microelectronics Reliability & Qualification Working Meeting (MRQW), El Segundo, CA, February 6-7, 2018

  2. Acronyms
     • Combinatorial logic (CL)
     • Commercial off the shelf (COTS)
     • Complementary metal-oxide semiconductor (CMOS)
     • Device under test (DUT)
     • Edge-triggered flip-flops (DFFs)
     • Electronic design automation (EDA)
     • Error rate ($\lambda$)
     • Error rate per bit ($\lambda_{bit}$)
     • Error rate per system ($\lambda_{system}$)
     • Field programmable gate array (FPGA)
     • Global triple modular redundancy (GTMR)
     • Hardware description language (HDL)
     • Input-output (I/O)
     • Intellectual property (IP)
     • Linear energy transfer (LET)
     • Mean fluence to failure (MFTF)
     • Mean time to failure (MTTF)
     • Number of used bits ($\#UsedBits$)
     • Operational frequency (fs)
     • Personal computer (PC)
     • Probability of configuration upsets ($P_{configuration}$)
     • Probability of functional logic upsets ($P_{functionalLogic}$)
     • Probability of single event functional interrupt ($P_{SEFI}$)
     • Probability of system failure ($P_{system}$)
     • Processor (PC)
     • Radiation Effects and Analysis Group (REAG)
     • Reliability over fluence ($R(\Phi)$)
     • Reliability over time ($R(t)$)
     • Single event effect (SEE)
     • Single event functional interrupt (SEFI)
     • Single event latch-up (SEL)
     • Single event transient (SET)
     • Single event upset (SEU)
     • Single event upset cross-section ($\sigma_{SEU}$)
     • System on a chip (SoC)
     • Windowed shift register (WSR)
     • Xilinx Virtex 5 field programmable gate array (V5)
     • Xilinx Virtex 5 field programmable gate array, radiation hardened (V5QV)

  3. Problem Statement and Abstract
     • The process of applying single event upset (SEU) data to characterize system performance in radiation environments needs improvement.
     • We are investigating the application of classical reliability performance metrics, combined with standard SEU analysis data, to improve system survivability prediction. This presentation describes a simplified approach to extrapolating SEU data to complex systems; future work will incorporate additional details.

  4. Background (1): FPGA SEU Susceptibility
     SEU cross-section ($\sigma_{SEU}$):
     • $\sigma_{SEU}$s (per category) are calculated from SEU test and analysis.
     • $\sigma_{SEU}$s are calculated per particle linear energy transfer (LET).
     • Most believe the dominant $\sigma_{SEU}$s are per bit (configuration bits or flip-flops (DFFs)). However, global routes are significant contributors (more so than DFFs).
     [Figure: the design $\sigma_{SEU}$ is divided into configuration $\sigma_{SEU}$, SEFI $\sigma_{SEU}$, and functional logic $\sigma_{SEU}$; functional logic covers sequential logic and global routes, plus combinatorial and hidden logic (CL) in the data path. Callouts contrast "$\sigma_{SEU}$s are measured by bit!" with "$\sigma_{SEU}$s are measured by bit???"]
     • For a system, should $\sigma_{SEU}$s be measured by bit?

  5. Windowed Shift Register (WSR) Microsemi $\sigma_{SEU}$s: Design and Stimulus Dependencies of SEUs
     • Adding combinatorial logic increases the cross-section.
     • Increasing frequency may or may not change SEU data.
     • How and what you test make a big difference!
     [Figure: $\sigma_{SEU}$ (cm²/DFF, 0 to 7×10⁻⁹) versus LET (MeV·cm²/mg, 0 to 25) for WSR0, WSR4, WSR8, and WSR16, each tested with checkerboard, all-1's, and all-0's data patterns.]
     $\sigma_{SEU} = \#errors / fluence$; $\lambda_{system} = \#errors / time$. LET: linear energy transfer.
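
A minimal Python sketch of the two definitions on this slide. The per-DFF normalization (dividing by a DFF count) is an assumption inferred from the cm²/DFF axis units; the function names and the test counts are hypothetical, not measured data.

```python
def cross_section(num_errors: int, fluence_cm2: float, num_dffs: int = 1) -> float:
    """sigma_SEU = #errors / fluence, optionally normalized per DFF (cm^2/DFF).

    The per-DFF division is an assumption based on the plot's units.
    """
    if fluence_cm2 <= 0 or num_dffs <= 0:
        raise ValueError("fluence and DFF count must be positive")
    return num_errors / fluence_cm2 / num_dffs


def error_rate(num_errors: int, time_s: float) -> float:
    """lambda_system = #errors / time (errors per second in this sketch)."""
    if time_s <= 0:
        raise ValueError("time must be positive")
    return num_errors / time_s


# Hypothetical run at one LET: 12 errors at a fluence of 1e7 particles/cm^2
# on a design with 1,000 DFFs.
sigma = cross_section(12, 1.0e7, num_dffs=1000)
print(f"sigma_SEU = {sigma:.2e} cm^2/DFF")
```

The same error count yields a cross-section when divided by fluence and an error rate when divided by time, which is the parallel later slides build on.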

  6. Background (2): Conventional Conversion of SEU Cross-Sections to Error Rates for Complex Systems
     Next step: convert $\sigma_{SEU}$s to error rates.
     • Bottom-up approach (transistor level):
       – Given $\sigma_{SEU}$ (per bit), use an error rate calculator (such as CRÈME96) to obtain an error rate per bit ($\lambda_{bit}$).
       – Multiply $\lambda_{bit}$ by the number of used memory bits ($\#UsedBits$: configuration bits and DFFs) in the target design to obtain a system error rate ($\lambda_{system}$).
     • Top-down approach (system level):
       – Given $\sigma_{SEU}$ (per system), use an error rate calculator (such as CRÈME96) to obtain an error rate per system ($\lambda_{system}$).
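
A minimal sketch of the bottom-up multiplication, assuming $\lambda_{bit}$ has already been obtained from an error rate calculator. The function name and all numbers are hypothetical. The top-down approach would skip this step, because the calculator already returns $\lambda_{system}$ when given a per-system $\sigma_{SEU}$.

```python
def bottom_up_system_rate(lambda_bit: float, used_bits: int) -> float:
    """Bottom-up estimate: lambda_system = lambda_bit * #UsedBits."""
    if used_bits < 0:
        raise ValueError("used_bits must be non-negative")
    return lambda_bit * used_bits


# Hypothetical inputs: a per-bit rate from an error rate calculator and the
# count of used configuration bits plus DFFs in the target design.
lambda_bit = 1.0e-10                 # errors/bit/day (hypothetical)
used_bits = 20_000_000 + 50_000      # configuration bits + DFFs (hypothetical)
print(f"lambda_system = {bottom_up_system_rate(lambda_bit, used_bits):.3e} errors/day")
```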

  7. Technical Problems with Current Methods of Error Rate Calculation
     • For submission to CRÈME96, $\sigma_{SEU}$ data (in log-linear form) are fitted to a Weibull curve.
       – A large amount of error can be introduced during the curve-fitting process.
       – Consequently, resultant error rates for the same design can vary by decades (orders of magnitude).
     • Because of the error rate calculation process, $\sigma_{SEU}$ data are blended together, and it is nearly impossible to home in on the problem spots. This can become important for mitigation insertion.
     [Figure: $\sigma_{SEU}$ (cm²/design, log scale 10⁻⁸ to 10⁻¹) versus LET (MeV·cm²/mg, 0 to 60).]
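
For reference, a sketch of the four-parameter Weibull form commonly used to fit $\sigma_{SEU}$ versus LET before error rate calculation: threshold $L_0$, width $W$, shape $s$, and saturation cross-section $\sigma_{sat}$. Both parameter sets below are hypothetical; they only illustrate how fits that nearly agree at high LET can differ by more than an order of magnitude at low LET, which is one way fitting error can propagate into a computed rate.

```python
import math


def weibull_cross_section(let: float, sigma_sat: float, let_threshold: float,
                          width: float, shape: float) -> float:
    """Weibull fit of sigma_SEU versus LET.

    sigma(L) = sigma_sat * (1 - exp(-((L - L0) / W) ** s)) for L > L0, else 0.
    """
    if let <= let_threshold:
        return 0.0
    return sigma_sat * (1.0 - math.exp(-(((let - let_threshold) / width) ** shape)))


# Two hypothetical fits with the same saturation cross-section (cm^2/design)
# but different threshold, width, and shape parameters.
fit_a = dict(sigma_sat=1e-2, let_threshold=0.5, width=20.0, shape=1.5)
fit_b = dict(sigma_sat=1e-2, let_threshold=1.0, width=15.0, shape=2.5)

for let in (2.0, 10.0, 60.0):
    a = weibull_cross_section(let, **fit_a)
    b = weibull_cross_section(let, **fit_b)
    print(f"LET {let:5.1f}: fit A {a:.2e}, fit B {b:.2e}, ratio A/B {a / b:.1f}")
```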

  8. Technical Problems with the Bottom-Up Analysis Method
     • Multiplying $\lambda_{bit}$ by every used bit within a design is not an effective method of system error rate prediction.
       – It works well with memory structures, but complex systems do not operate or respond like memories.
       – If an SEU affects a bit that is inactive, disabled, or masked, a system malfunction might not occur. Hence:
         $\lambda_{system} < \lambda_{bit} \times \#UsedBits$
     • Using the same multiplication factor across DFFs will produce extreme over-estimates.
     Let's not reinvent the wheel: a proven solution can be found in classical reliability system-level analysis.
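
To make the inequality concrete, here is a sketch that weights each used bit by a hypothetical probability that an upset in it propagates to a system malfunction (0 for a bit that is always inactive, disabled, or masked; 1 for a bit whose upset always causes a failure). These propagation probabilities are an illustrative assumption, not part of the presented method; whenever any of them is below 1, the weighted sum falls below $\lambda_{bit} \times \#UsedBits$.

```python
def masked_system_rate(lambda_bit: float, p_propagate: list[float]) -> float:
    """Sum of per-bit rates, each weighted by a hypothetical probability that an
    upset in that bit propagates to a system malfunction."""
    return sum(lambda_bit * p for p in p_propagate)


lambda_bit = 1.0e-10  # hypothetical per-bit error rate
# Hypothetical design with 1,000 used bits: 100 are always active (every upset
# propagates); the other 900 are usually inactive, disabled, or masked (5%).
p_propagate = [1.0] * 100 + [0.05] * 900

naive = lambda_bit * len(p_propagate)          # lambda_bit * #UsedBits
masked = masked_system_rate(lambda_bit, p_propagate)
print(f"naive {naive:.2e}, masked {masked:.2e}")
```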

  9. Mapping Classical Reliability Models from the Time Domain to the Fluence Domain
     • The exponential model that relates reliability to MTTF, $R(t) = e^{-t/MTTF} = e^{-\lambda t}$, assumes that during the useful lifetime:
       – Failures are independent.
       – The error rate is constant (a Weibull slope of 1 reduces to the exponential).
       – $MTTF = 1/\lambda$.
     • For a given LET (across fluence):
       – SEUs are independent.
       – $\sigma_{SEU}$ is constant.
       – $MFTF = 1/\sigma_{SEU}$.
     • Parallel between time and fluence: $\lambda_{system} = \#errors / time$ and $\sigma_{SEU} = \#errors / fluence$. Hence, mapping from the time domain to the fluence domain (per LET) is straightforward:
       – $t \rightarrow \Phi$, $MTTF \rightarrow MFTF$, $\lambda \rightarrow \sigma_{SEU}$
       – $R(t) = e^{-t/MTTF} \rightarrow R(\Phi) = e^{-\Phi/MFTF}$
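
A minimal sketch of the fluence-domain exponential model at a single LET. The function names and the cross-section value are hypothetical. As in the time domain, reliability after one mean fluence to failure is $e^{-1}$.

```python
import math


def mean_fluence_to_failure(sigma_seu: float) -> float:
    """MFTF = 1 / sigma_SEU (particles/cm^2), per LET, under the exponential model."""
    if sigma_seu <= 0:
        raise ValueError("sigma_SEU must be positive")
    return 1.0 / sigma_seu


def reliability_fluence(fluence: float, mftf: float) -> float:
    """R(Phi) = exp(-Phi / MFTF), the fluence analogue of R(t) = exp(-t / MTTF)."""
    return math.exp(-fluence / mftf)


sigma = 1.0e-6                          # hypothetical sigma_SEU, cm^2/design
mftf = mean_fluence_to_failure(sigma)   # about 1e6 particles/cm^2
print(f"MFTF = {mftf:.3e} particles/cm^2")
print(f"R(MFTF) = {reliability_fluence(mftf, mftf):.4f}")
```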

  10. Example of Proposed Methodology Application
     • Mission requirements:
       – Selection shall be made between a Xilinx V5QV (relatively expensive device) and a Xilinx V5 with embedded PowerPC (relatively cheap device).
       – FPGA operation shall have a reliability of 3-nines (99.9%) within a 10-minute window in geosynchronous equatorial orbit (GEO).
     • Proposed methodology:
       – Create a histogram of particle flux versus LET for a 10-minute window of time for the target environment.
       – Calculate MFTF per LET (obtain SEU data).
       – Graph $R(\Phi) = e^{-\Phi/MFTF}$ for a variety of LET values and their associated MFTFs.
       – For selected ranges of LETs, use an upper bound of particle flux (particles/(cm²·10 minutes)) to determine whether the system will meet the mission's reliability requirements.
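
Requiring $R(\Phi) \ge 0.999$ and solving $e^{-\Phi/MFTF} \ge 0.999$ gives $\Phi \le -MFTF \ln(0.999) \approx 1.0005 \times 10^{-3}\,MFTF$. A minimal sketch of this per-LET check follows; the function names and the example MFTF are hypothetical.

```python
import math

REQUIRED_RELIABILITY = 0.999  # "3-nines" within the 10-minute window


def max_allowed_fluence(mftf: float, required: float = REQUIRED_RELIABILITY) -> float:
    """Largest fluence Phi for which R(Phi) = exp(-Phi / MFTF) >= required."""
    return -mftf * math.log(required)


def meets_requirement(fluence_upper_bound: float, mftf: float,
                      required: float = REQUIRED_RELIABILITY) -> bool:
    """True if the upper-bound 10-minute fluence keeps R(Phi) at or above the requirement."""
    return math.exp(-fluence_upper_bound / mftf) >= required


mftf = 1.0e4  # hypothetical MFTF (particles/cm^2) for one LET
print(f"max fluence per 10 minutes: {max_allowed_fluence(mftf):.4f} particles/cm^2")
print(meets_requirement(5.0, mftf), meets_requirement(20.0, mftf))
```

With this hypothetical MFTF of 10⁴ particles/cm², the allowed upper-bound fluence over the 10-minute window is about 10 particles/cm².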

  11. Environment Data: Flux versus LET Histogram for a 10-Minute Window
     Geosynchronous equatorial orbit (GEO), 100-mil shielding.
     • Bins are selected based on $\sigma_{SEU}$ data points.
     • We will analyze system reliability for each bin.
     [Figure: histogram of flux (particles/(cm²·10 minutes), log scale 10⁻¹⁰ to 10³) over LET bins (MeV·cm²/mg): 0 to 0.07, 0.07 to 0.1, 0.1 to 1.8, 1.8 to 3.6, 3.6 to 20, 20 to 40, and 40 and over.]
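
A minimal sketch of the per-bin analysis. Every fluence and cross-section below is a placeholder, not a value read from the histogram or from SEU test data; each bin's reliability uses $R(\Phi) = e^{-\Phi/MFTF} = e^{-\Phi \sigma_{SEU}}$ and is compared against the 0.999 requirement.

```python
import math

# Hypothetical inputs: LET bin (MeV*cm^2/mg) -> (upper-bound fluence in
# particles/cm^2 per 10-minute window, sigma_SEU in cm^2 for that bin).
# Placeholders only; not values taken from the histogram.
bins = {
    "0.1 to 1.8": (1.0e1, 1.0e-6),
    "1.8 to 3.6": (1.0e-1, 1.0e-5),
    "3.6 to 20": (1.0e-2, 1.0e-4),
    "20 to 40": (1.0e-4, 1.0e-3),
}


def bin_reliability(fluence: float, sigma_seu: float) -> float:
    """R(Phi) = exp(-Phi / MFTF) with MFTF = 1 / sigma_SEU, i.e. exp(-Phi * sigma_SEU)."""
    return math.exp(-fluence * sigma_seu)


for name, (fluence, sigma) in bins.items():
    r = bin_reliability(fluence, sigma)
    status = "meets" if r >= 0.999 else "FAILS"
    print(f"LET {name:>10}: R = {r:.6f} ({status} 0.999)")
```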
