The Effects of Race Conditions when Implementing Single-Source Redundant Clock Trees in Triple Modular Redundant Synchronous Architectures
Melanie Berg, AS&D in support of NASA/GSFC, Melanie.D.Berg@NASA.gov
Ken LaBel, NASA/GSFC
Jonathan Pellish, NASA/GSFC
Deliverable to NASA Electronic Parts and Packaging (NEPP) Program to be published on nepp.nasa.gov; originally presented by Melanie D. Berg at Radiation Effects on Components and Systems (RADECS) Conference, Bremen, Germany, September 19-23, 2016
Acronyms
• Clock cycle time (T_clk)
• Combinatorial logic (CL)
• Data-path hold time requirement (T_DataStable)
• Design under analysis (DUA)
• Combinational logic delay (T_comb)
• Delay of data output of DFF (T_clk→q)
• Device under test (DUT)
• DFF setup time (T_setup)
• DFF hold time (T_HOLD)
• Distributed triple modular redundancy (DTMR)
• Edge-triggered flip-flops (DFFs)
• Field programmable gate array (FPGA)
• Global triple modular redundancy (GTMR)
• Hardware description language (HDL)
• Input-output (I/O)
• Linear energy transfer (LET)
• Mean time to failure (MTTF)
• Mitigation window (MW)
• Multiple bit upset (MBU)
• Radiation Effects and Analysis Group (REAG)
• Single Error Correct Double Error Detect (SECDED)
• Single event functional interrupt (SEFI)
• Single event effects (SEEs)
• Single event transient (SET)
• Single event upset (SEU)
• Single event upset cross-section (σ_SEU)
• Static random access memory (SRAM)
• Static timing analysis (STA)
• Triple modular redundancy (TMR)
Problem Statement
• Triple modular redundancy (TMR) can be implemented in a variety of topologies.
• This presentation focuses on the trade-offs between implementing TMR with:
– Multiple clock domains (three clocks, one per TMR domain), i.e., global TMR (GTMR), and
– A single clock shared across the three TMR domains, i.e., distributed TMR (DTMR).
• For many organizations, GTMR is the mitigation strategy of choice because of its redundant clock topology.
• However, as FPGA devices and designs become larger and more complex, clock-skew between separate domains is increasing and becoming impossible to control.
• Unfortunately, mismanaged clock-skew can cause timing violations or circuit race conditions in synchronous designs.
Race conditions from clock-skew weaken mitigation and can cause system malfunction!
Abstract
We present the challenges that arise from clock-skew when using redundant clock domains. Heavy-ion radiation data show that a single clock domain (DTMR) provides an improved TMR methodology for SRAM-based FPGAs over redundant clocks.
Clock-skew
Clock-skew within One Clock Domain
[Figure: DFFs within one clock domain; labels: clock path, data path.]
• CL: combinatorial logic
• T_comb: CL circuit delay
• T_skew: clock-skew
• DFF: flip-flop
The difference between a clock edge's arrival time at one DFF and its arrival time at another DFF is defined as clock-skew (T_skew).
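The definition above can be made concrete with a minimal sketch. The function names, the nanosecond units, and the sign convention (capture arrival minus launch arrival, so a late capture clock gives positive skew) are assumptions chosen for illustration; they are not taken from the presentation.

```python
def clock_skew_ns(launch_arrival_ns: float, capture_arrival_ns: float) -> float:
    """Skew between two DFFs for the same clock edge.

    Assumed sign convention: positive when the edge reaches the
    capture DFF later than it reaches the launch DFF.
    """
    return capture_arrival_ns - launch_arrival_ns


def worst_case_skew_ns(arrivals_ns: dict) -> float:
    """Largest arrival-time difference between any two DFFs on a clock tree."""
    times = list(arrivals_ns.values())
    return max(times) - min(times)


# Hypothetical clock-edge arrival times at three DFFs, in ns.
arrivals = {"DFFa": 1.0, "DFFb": 1.25, "DFFx": 0.75}
skew_a_to_b = clock_skew_ns(arrivals["DFFa"], arrivals["DFFb"])
tree_skew = worst_case_skew_ns(arrivals)
```

Under this convention, swapping the launch and capture DFFs flips the sign of the skew, which is why the same pair of DFFs can see positive skew on one data path and negative skew on the return path.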
Synchronous Data Capture
[Figure: timing of a launch DFF and a capture DFF.]
• DFF: flip-flop
• T_comb: combinational logic delay
• T_clk→q: delay of data output from DFF
• T_DataStable: data-path hold time requirement
• T_setup: DFF setup time
• T_HOLD: DFF hold time
No skew: data is launched from DFFa; usually T_clk→q is long enough to accommodate the DFF T_HOLD requirement.
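For reference, the zero-skew case can be written as the two standard textbook timing constraints. Splitting T_comb into a longest path (T_comb,max) and a shortest path (T_comb,min) is an assumption added here for clarity; the presentation uses a single T_comb symbol.

```latex
% Setup (zero skew): data launched on one edge must be stable
% T_setup before the next edge at the capture DFF.
T_{clk \to q} + T_{comb,\max} + T_{setup} \le T_{clk}

% Hold (zero skew): newly launched data must not reach the
% capture DFF before its hold window for the same edge closes.
T_{clk \to q} + T_{comb,\min} \ge T_{HOLD}
```

When two DFFs are connected directly (T_comb,min = 0), the hold constraint reduces to T_clk→q ≥ T_HOLD, which is the situation the slide describes as usually being satisfied.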
Positive Skew and Data Capture
• Large T_skew: DFFx will capture the wrong data (a cycle ahead).
• Small T_skew: DFFx capture can fall in the DFF T_HOLD window, where data is unstable (metastability).
• Changing the clock cycle time (T_clk) will not fix T_skew.
• Longer data-path delays, which keep incoming data stable at the capture DFF, help to accommodate skew.
• Not shown: Data 1 and Data 2 will be delayed getting to DFFx.
If T_skew > shortest data-path delay, bad data is captured.
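The hold-side race described above can be sketched as a slack calculation. This is a minimal illustration using the standard hold constraint with skew; the function names, the nanosecond units, and the convention that positive T_skew means a late capture clock are assumptions. Note that T_clk does not appear anywhere in the check, which matches the point that changing the clock cycle time does not fix skew.

```python
def hold_slack_ns(t_clk_q: float, t_comb_min: float,
                  t_hold: float, t_skew: float) -> float:
    """Hold slack at the capture DFF (negative means a race).

    The shortest data-path delay (t_clk_q + t_comb_min) must cover
    the capture DFF hold time plus any positive clock-skew.
    """
    return (t_clk_q + t_comb_min) - (t_hold + t_skew)


def hold_ok(t_clk_q: float, t_comb_min: float,
            t_hold: float, t_skew: float) -> bool:
    """True when the capture DFF sees stable data through its hold window."""
    return hold_slack_ns(t_clk_q, t_comb_min, t_hold, t_skew) >= 0.0


# Hypothetical numbers in ns: same data path, small versus large skew.
small_skew_slack = hold_slack_ns(t_clk_q=0.5, t_comb_min=0.25, t_hold=0.25, t_skew=0.25)
large_skew_slack = hold_slack_ns(t_clk_q=0.5, t_comb_min=0.25, t_hold=0.25, t_skew=1.0)
```

In this sketch, the only ways to recover negative slack are to reduce the skew or to lengthen the shortest data path, which is the trade-off listed in the bullets.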
Negative Skew and Data Capture
In a system with negative skew, data can be captured during its computation time:
– T_setup is violated.
– This can cause metastability.
– Data is invalid.
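A matching sketch for the setup side, again using the standard constraint rather than anything specific to the presentation. The function name, units, and sign convention (negative T_skew means the capture clock arrives early) are assumptions.

```python
def setup_slack_ns(t_clk: float, t_clk_q: float, t_comb_max: float,
                   t_setup: float, t_skew: float) -> float:
    """Setup slack at the capture DFF (negative means T_setup is violated).

    Negative skew shortens the time available for the longest data
    path to settle before the capture edge.
    """
    return (t_clk + t_skew) - (t_clk_q + t_comb_max + t_setup)


# Hypothetical numbers in ns: a path that meets timing with no skew
# fails once the capture clock arrives 2 ns early.
no_skew_slack = setup_slack_ns(t_clk=10.0, t_clk_q=0.5, t_comb_max=8.0,
                               t_setup=0.5, t_skew=0.0)
negative_skew_slack = setup_slack_ns(t_clk=10.0, t_clk_q=0.5, t_comb_max=8.0,
                                     t_setup=0.5, t_skew=-2.0)
```

Unlike the hold check, T_clk appears in this expression, so a setup violation caused by negative skew can in principle be relieved by slowing the clock.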
Triple Modular Redundancy (TMR)
Protection against single event upsets (SEUs)
DTMR and GTMR Topologies
• With DTMR and GTMR, all circuits are triplicated, creating three TMR domains.
• Voters are placed after the internal flip-flops (DFFs).
• Internal voters provide masking and correction against SEUs.
• DTMR: one clock shared by all three TMR domains.
• GTMR: three separate clocks, one per TMR domain.
GTMR violates synchronous design protocol because data is shared across clock domains without synchronization.
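The masking performed by a voter can be illustrated with a bitwise two-out-of-three majority function. This is a generic sketch of majority voting, not the voter implementation used in the presented designs; the function name and the 4-bit example values are hypothetical.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of the three TMR domain outputs."""
    return (a & b) | (a & c) | (b & c)


# Hypothetical 4-bit register value held by all three domains.
golden = 0b1010

# One domain is upset (two bits flipped): the voter masks the error.
single_upset = majority_vote(golden, golden, 0b0110)

# Two domains are upset in the same way: the voter is defeated.
double_upset = majority_vote(golden, 0b0110, 0b0110)
```

The second case shows why mitigation depends on at most one domain being wrong at the voter inputs at any time; a skew-induced race that corrupts one domain leaves no margin for an SEU in another.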
TMR Mitigation Window Definition
[Figure: a DFF-to-DFF data-path (DFF, CL, DFF) and its DTMR and GTMR conversions.]
• A mitigation window (MW) spans from one DFF-voter pair to the next DFF-voter pair.
• In the absence of SEUs:
– With GTMR, there is a possibility of having broken MWs because of T_skew.
– There are no broken MWs with DTMR.
• With the occurrence of SEUs:
– The broken GTMR MWs have weakened mitigation (masking and correction cannot be guaranteed).
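One way to picture a broken MW is to apply the earlier rule (bad data is captured if T_skew exceeds the shortest data-path delay) to every launch-domain and capture-domain pair, since GTMR voters pass data between the three clock domains. The sketch below is a simplified model under stated assumptions: it considers only the offset between clock domains, neglects the DFF hold time and skew within a single tree, and uses hypothetical names and arrival times.

```python
from itertools import permutations


def broken_domain_pairs(clock_arrival_ns: dict, shortest_path_ns: float) -> list:
    """Return (launch, capture) domain pairs whose skew exceeds the
    shortest data-path delay, i.e. pairs that can capture bad data."""
    broken = []
    for launch, capture in permutations(clock_arrival_ns, 2):
        skew = clock_arrival_ns[capture] - clock_arrival_ns[launch]
        if skew > shortest_path_ns:
            broken.append((launch, capture))
    return broken


# Hypothetical GTMR case: three clock trees with different arrival times (ns).
gtmr_broken = broken_domain_pairs({"A": 0.0, "B": 0.25, "C": 1.0},
                                  shortest_path_ns=0.5)

# Hypothetical DTMR case: one shared clock, modeled as identical arrival times.
dtmr_broken = broken_domain_pairs({"A": 0.0, "B": 0.0, "C": 0.0},
                                  shortest_path_ns=0.5)
```

In this model, any non-empty result means at least one domain can present stale or early data to a voter even before an SEU occurs, which is the sense in which a broken MW weakens mitigation.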
Challenges of GTMR System Implementation
System Implementation: Sources of Clock-skew
• Board level:
– One board clock source (oscillator): routes from the board clock source to the FPGA clock inputs must be the same length.
– Three board clock sources: Don't!
• Internal to the FPGA:
– Clock pin to clock tree routing differences,
– Skew within a single clock tree, and
– Additional skew in GTMR from the use of different clock trees.