1/21/2014

ECE 753: FAULT-TOLERANT COMPUTING
Kewal K. Saluja
Department of Electrical and Computer Engineering
Fault Modeling
Lectures Set 2

Overview
• Fault Modeling
• References
• Introduction
• Fault models at different levels (HW)
• Error models
• High-level failure models (process or system failure)
• Summary

Recap
• Think about PROJECT
• Terminology and definitions
• Fundamental principles - Redundancy
  – Hardware - low and high level
  – Software
  – Time
  – Information
• FEF Chain and methods to break it (barriers)
  – Attributes of faults and fault types - such as permanent, transient, intermittent (please read)

Fault Modeling References
• [abra:86] Abraham and Fuchs, Fault and error modeling for VLSI, Proc. IEEE, May 1986
• [kala:13] Kalayappan and Sarangi, A survey of checker architectures, ACM Computing Surveys, Aug 2013
• [mull:93] Hadzilacos and Toueg, Fault tolerant broadcast and related problems, in Distributed Systems (book)

Fault Modeling (contd.): Introduction
• Why use a model?
  – tractability of analysis
  – a non-destructive method to study (low cost, alternative to fault injection)
  – manageable study space (can check equivalence and reduce the study space)
• What is a model?
  – An abstraction that captures the behavior of the original system.
    • must be simple
    • must lead to accurate conclusions
Fault Modeling (contd.): Introduction
• Different models at different levels of abstraction:
  – Chip level - manufacturing defects, random faults, transistor faults, gate failures, aging, …
  – System level
    • HW - aging, interconnect failures, chip failures, …
    • SW - bugs, design flaws, incorrect algorithms, ...

Fault Modeling (contd.): Fault models at different levels (HW)
• Process level
• Transistor level
• Gate level
• Function level (often error models)
• Behaviour level (often timing failure models)
• . . .
• System level (usually failure models)

Fault Modeling (contd.): Fault models at different levels (contd.)
• Process level - Defect models
  – cluster defects
  – point and random defects
  – used to predict the process yield
  – tested using optical and parametric tests
  – effect of defect
    • chip fails to perform its function
    • unacceptable parameters - large capacitance, large delay, slow speed, high current

• Transistor level - failure of a transistor
  – fabrication level causes - point defects, mask misalignment, design rule violation
  – physical facts - shorts, opens, line-bridges
  – others
    • size variations -> altered delays
    • coupling/crosstalk
    • degradation of elements - electromigration
    • alpha particle hits
    • power transients
  – missing/extra transistors - PLAs
  – function modification/alteration - FPGA

• Transistor level - erroneous behaviors
  – high current
  – incorrect logic output
  – intermediate voltage
  – different performance (operating speed)
  – state change - alpha particle hit

• Transistor level - prevalent fault models
  – stuck-on and stuck-off faults
  – bridging fault
  – strength of signals
  – intermediate voltage
  – delay fault
  – coupling and crosstalk
• Limitations
  – the very large number of possible faults makes these faults difficult to handle (intractability due to large model space)
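The limitation above can be made concrete with a rough count. The following is a minimal sketch, assuming only stuck-on/stuck-off faults per transistor and two-node bridges; the function names are illustrative and do not come from the lecture.

```python
def single_fault_count(num_transistors: int) -> int:
    """Each transistor can be stuck-on or stuck-off: two single faults apiece."""
    return 2 * num_transistors


def multiple_fault_count(num_transistors: int) -> int:
    """Each transistor is fault-free, stuck-on, or stuck-off.

    That gives 3**n combinations; subtract the one all-fault-free case.
    """
    return 3 ** num_transistors - 1


def bridging_pair_count(num_nodes: int) -> int:
    """Number of unordered node pairs that could in principle be shorted."""
    return num_nodes * (num_nodes - 1) // 2


if __name__ == "__main__":
    for n in (4, 10, 100):
        print(n, single_fault_count(n), multiple_fault_count(n), bridging_pair_count(n))
```

The single-fault list grows linearly, but the multiple-fault space grows exponentially, which is one reason single-fault assumptions and fault-reduction methods are commonly used.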
Fault Modeling (contd.): Fault models at different levels (contd.)
• Transistor level - comments (these are fairly general and are not restricted to the transistor-level model)
  – increasing computing power implies that we can handle a large number of faults and complex models
  – these models are used for test generation and not for fault tolerance per se
  – methods have been proposed to reduce the number of faults that need to be studied - e.g. fault equivalence
  – classical methods and newer methods (such as current testing) are employed in real testing
  – design for testability and built-in self-test are becoming prevalent

• Gate level - causes
  – same as for transistors
  – additional causes at SSI and board level - failed resistor, failed solder joint, failed wire wrap, …
• Gate level - erroneous behaviors
  – similar to those for transistors
  (one of the most commonly used models - why? See next slides)

• Gate level - different models
  – Stuck-at: a line value stays the same irrespective of the signal applied to the line
    • Advantages
      – simplicity
      – accuracy
      – can model most real faults
      – tractable model space - count the possible number of faults
      – easy to use and easy to quantify (for quality metric)
      – substantial empirical evidence of its practical use
    • Disadvantages
      – with increasing device density the model is often questioned and is losing many of its advantages
      – some real defects cannot be modeled by this model
      – more powerful computers are making it possible to handle other models - even at fabrication level

  – Bridging faults - a pair of lines in a circuit (at gate level) are shorted. Many variations, such as intergate, intragate, neighboring lines, …
    • Advantages
      – simple
      – realistic
    • Disadvantages
      – large number of faults
      – difficult to relate to the quality metric

  – Stuck-open/Stuck-on - a transistor-based open fault can be modeled at the logic level. Sometimes extra logic gates are used to model opens in this manner, similar to modeling bridging faults
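The stuck-at and bridging models above can be illustrated on a tiny circuit. This is a minimal sketch, not the lecture's own example: the circuit y = (a AND b) OR c, the line names, and the wired-AND bridge between inputs a and c are all assumptions chosen for illustration. It enumerates the single stuck-at faults (two per line), finds the input vectors that detect each one, and groups faults detected by exactly the same vectors.

```python
from collections import defaultdict
from itertools import product

# Hypothetical example circuit (not from the lecture): y = (a AND b) OR c
LINES = ("a", "b", "c", "n1", "y")  # n1 is the AND-gate output


def simulate(a, b, c, fault=None):
    """Return output y; fault is None or a (line, stuck_value) pair."""
    def line(name, value):
        # A stuck-at line keeps its stuck value whatever signal is applied.
        if fault is not None and fault[0] == name:
            return fault[1]
        return value

    a, b, c = line("a", a), line("b", b), line("c", c)
    n1 = line("n1", a & b)
    return line("y", n1 | c)


def stuck_at_faults():
    """Single stuck-at fault list: two faults (s-a-0, s-a-1) per line."""
    return [(name, value) for name in LINES for value in (0, 1)]


def detecting_vectors(fault):
    """Input vectors (a, b, c) whose output differs from the fault-free one."""
    return tuple(vec for vec in product((0, 1), repeat=3)
                 if simulate(*vec) != simulate(*vec, fault=fault))


def equivalence_classes():
    """Group faults that are detected by exactly the same vectors."""
    classes = defaultdict(list)
    for fault in stuck_at_faults():
        classes[detecting_vectors(fault)].append(fault)
    return list(classes.values())


def simulate_wired_and_bridge(a, b, c):
    """Assumed bridging behaviour: inputs a and c shorted, acting as wired-AND."""
    shorted = a & c
    return simulate(shorted, b, shorted)
```

With five lines the single stuck-at list has ten entries, which shows why the model space is easy to count. Grouping by detecting vectors collapses, for instance, a stuck-at-0, b stuck-at-0 and n1 stuck-at-0 into one class, since all three are detected only by (1, 1, 0). The bridge function covers just one of many possible bridging behaviours.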
Fault Modeling (contd.): Fault models at different levels (contd.)
• Gate level - different models
  – Delay faults - the delay of a gate or a line differs from the nominal or known delay in a perfect process
    • Deals with critical paths - gate delay, path delay, ...
    • Advantages
      – performance-oriented modeling
      – quite general
    • Disadvantages
      – difficult to use and intractable (path delay)

  – Other models
    • coupling between a pair of lines
    • pin or I/O faults in gates (or chips)
    • speedup/slowdown of signals (sub-micron technologies)
    • aging (such as NBTI in sub-micron technologies)

• Function level - when used
  – lower-level description is not available
  – function-level processing (e.g. simulation) is often faster
  – design available only in mixed form (gate and function)

• Function level - where used
  – combinational circuits
    • logic blocks
    • decoders
  – finite state machines
  – large complex circuits
    • microprocessors (often only a mixed format is available, such as ALU at gate level, memory at functional level, etc.)
  – for other building blocks
    • PLAs, RAMs, FPGAs

• System level - when used
  – interconnected systems
    • ad hoc connected systems
    • regular connected systems
  – failure of a system or systems, or interconnects
  – many failure models exist and will be discussed later in the course

Fault Modeling (contd.): Error models
Means of classifying the effect of physical fault(s) in a system - note that, from a modeling point of view, it is not necessary that we deduce it using a fault model
• Goals
  – extent of information corrupted
  – extent of error(s) propagated
  – latency issue
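The goals listed for error models can be sketched as simple measurements on observed outputs, without reference to any particular fault model. The following is a minimal sketch under stated assumptions: outputs are recorded as one integer word per cycle, a fault-free ("golden") trace is available for comparison, and latency is taken to mean the number of cycles between the fault occurring and the first erroneous output. The function and key names are hypothetical.

```python
def bit_errors(golden: int, observed: int) -> int:
    """Number of bit positions in which two output words differ."""
    return bin(golden ^ observed).count("1")


def error_profile(golden_trace, faulty_trace, fault_cycle):
    """Summarize how far an error spread and how long it stayed hidden.

    golden_trace / faulty_trace: output words, one per cycle (assumed format).
    fault_cycle: cycle at which the fault is assumed to have occurred.
    """
    corrupted = [
        (cycle, bit_errors(good, seen))
        for cycle, (good, seen) in enumerate(zip(golden_trace, faulty_trace))
        if good != seen
    ]
    first_error = corrupted[0][0] if corrupted else None
    return {
        # extent of error propagation: how many output words are wrong
        "corrupted_words": len(corrupted),
        # extent of information corrupted: worst-case bits wrong in one word
        "max_bits_in_error": max((bits for _, bits in corrupted), default=0),
        # latency: cycles from fault occurrence to first observable error
        "latency": None if first_error is None else first_error - fault_cycle,
    }


if __name__ == "__main__":
    golden = [0b1010, 0b0110, 0b1111, 0b0001]
    faulty = [0b1010, 0b0110, 0b1011, 0b0111]
    print(error_profile(golden, faulty, fault_cycle=1))
```

In this example the fault is assumed to occur at cycle 1, the first wrong word appears at cycle 2, and at most two bits are wrong in any one word.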