Software Fault-Tolerance Techniques
Hadi Salimi
Distributed Systems Lab, School of Computer Engineering, Iran University of Science and Technology
hsalimi@iust.ac.ir

Why FT-Software?
• Safe and reliable software operation is a significant requirement for many systems:
  - Aircraft, medical devices, nuclear safety, electronic banking and commerce.
• Consequences of these systems failing can range from mildly annoying to catastrophic.
• Software assumes more of the responsibility for providing functionality in these systems.

6/2/2010, Chapter 1

Why Are There Many Errors?
• The current state of the practice is such that fewer errors are introduced, but not all errors are prevented.
• Even if the best people, practices, and tools are used, it would be very risky to assume the software developed is error-free.
• There may also be cases in which an error, found late in the system's life cycle and perhaps prohibitively expensive to repair, is knowingly allowed to remain in the system.

Software-Related Accidents
• Problems in the backup tracking software delayed the launch of Atlantis for three days.
• The AT&T system suffered a nine-hour, United States-wide outage due to a flaw in recovery-recognition software.
• During the Gulf War, the Patriot system missed a missile due to a clock drift caused by the software's use of two different and unequal representations (24-bit and 48-bit) of the value 0.1.
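The 0.1 problem in the Patriot example is ordinary binary arithmetic and can be reproduced in a few lines. This is a minimal Python sketch, not the Patriot's actual code. It assumes the commonly cited model in which 0.1 is chopped (truncated, not rounded) to 23 bits after the binary point in a 24-bit register, that the 48-bit representation keeps 47 fractional bits, and that time is counted in 0.1 s ticks over 100 hours of continuous operation. The helper `chop` and all constants are illustrative assumptions.

```python
import math
from fractions import Fraction


def chop(value: Fraction, frac_bits: int) -> Fraction:
    """Truncate a non-negative value to a binary fixed-point number with
    `frac_bits` bits after the binary point (chopping, not rounding)."""
    scale = 2 ** frac_bits
    return Fraction(math.floor(value * scale), scale)


TENTH = Fraction(1, 10)         # exact value of 0.1
coarse = chop(TENTH, 23)        # ASSUMPTION: 24-bit register, 23 fractional bits
fine = chop(TENTH, 47)          # ASSUMPTION: 48-bit representation, 47 fractional bits
ticks = 100 * 3600 * 10         # ASSUMPTION: 100 hours of uptime, one tick per 0.1 s

per_tick_error = TENTH - coarse                  # time lost on every tick
print(float(per_tick_error))                     # about 9.5e-08 s per tick
print(float(ticks * per_tick_error))             # accumulated error vs. true time, in seconds
print(float(ticks * (fine - coarse)))            # disagreement between the two representations
```

Under these assumptions the truncated value loses about 0.000000095 s per tick, which accumulates to roughly a third of a second after 100 hours. If one part of the software uses the coarse value and another part the fine one, the two disagree by about the same amount, which is the kind of mismatch the slide describes.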

HW-FT vs. SW-FT
• Hardware faults are primarily physical faults, which can be characterized and predicted over time.
• Software has only logical faults, which are difficult to visualize, classify, detect, and correct.
  - Software faults may be traced to incorrect requirements or to the implementation not satisfying the requirements.
  - Changes in operational usage or incorrect modifications may introduce new faults.
• Redundancy alone is not enough to protect against these faults.

Dependability Concept Classification
Dependability
• Impairments: Fault, Error, Failure
• Means
  - Construction: Fault Avoidance, Fault Tolerance
  - Validation: Fault Removal, Fault Forecasting
• Attributes: Availability, Reliability, Safety, Integrity, Maintainability, Confidentiality
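The HW-FT vs. SW-FT point that redundancy is not enough can be illustrated with majority voting over three replicas (triple modular redundancy). In this minimal Python sketch, all function names are hypothetical. Voting masks a fault confined to one replica (modelled here as a flipped bit, standing in for a physical fault), but three copies of the same code share the same logical fault and outvote nothing: they agree on the wrong answer.

```python
from collections import Counter
from typing import Callable, List


def majority_vote(outputs: List[int]) -> int:
    """Return the value produced by a strict majority of the replicas."""
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise RuntimeError("no majority")
    return value


def square(x: int) -> int:
    return x * x  # correct replica


def square_with_design_fault(x: int) -> int:
    # Hypothetical logical fault: wrong sign for negative inputs.
    return x * x if x >= 0 else -(x * x)


def flip_bit(f: Callable[[int], int]) -> Callable[[int], int]:
    """Model a physical fault in ONE replica by flipping bit 0 of its result."""
    return lambda x: f(x) ^ 1


# Physical-style fault in one of three replicas: voting masks it.
hw = [square, flip_bit(square), square]
print(majority_vote([r(-3) for r in hw]))  # 9

# Logical (design) fault shared by all three identical replicas: voting cannot help.
sw = [square_with_design_fault] * 3
print(majority_vote([r(-3) for r in sw]))  # -9
```

The second vote is unanimous and wrong, because identical copies fail identically on the same input.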

Dependability Classification
• Impairments: those things that stand in the way of dependability.
• Means: the various technical means to achieve dependable software.
• Attributes: provide a way to assess achievement of dependability properties.

Impairments
[Figure: dependability classification tree, with the Impairments branch (Fault, Error, Failure) highlighted]

Fault
• A fault is the identified or hypothesized cause of an error and is sometimes called a "bug".
• It can also be viewed simply as the "consequence of a failure."
• An active fault is one that produces an error.

Error
• An error is the part of the system state that is likely to lead to a failure.
• It can be unrecognized as an error (latent) or detected.
• An error may propagate, i.e., produce other errors.
• Faults are known to be present when errors are detected.
• An error is the manifestation of a fault.

Failure
• A failure occurs when the service delivered by the system deviates from the specified service, otherwise termed an incorrect result.
• The expected service is described, typically, by a specification or set of requirements.

Fault-Error-Failure Chain
Fault → Error → Error → Failure
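The fault-error-failure chain can be made concrete with a toy program. In this minimal Python sketch, `mean_impl` is a hypothetical implementation containing a logical fault (a wrong loop bound). The fault stays dormant for short inputs; when it is activated it produces an error (a wrong internal total) that propagates into a failure (a result that deviates from the specified mean).

```python
from typing import List, Tuple


def mean_spec(xs: List[int]) -> float:
    """The specified service: the arithmetic mean of a non-empty list."""
    return sum(xs) / len(xs)


def mean_impl(xs: List[int]) -> Tuple[float, int]:
    """Hypothetical implementation containing a FAULT. Returns the result
    and the internal running total so the state can be inspected."""
    n = len(xs)
    limit = n - 1 if n > 3 else n  # FAULT: the bound should always be n
    total = 0
    for x in xs[:limit]:
        total += x                 # if an element was skipped, `total` is an ERROR
    return total / n, total        # the error propagates into the delivered result


def classify(xs: List[int]) -> str:
    """Label what an observer would see for input `xs`."""
    result, total = mean_impl(xs)
    error = total != sum(xs)           # internal state deviates from the correct state
    failure = result != mean_spec(xs)  # delivered service deviates from the specification
    if not error:
        return "no error"
    return "fault -> error -> failure" if failure else "error, no failure"


print(classify([1, 2, 3]))     # no error (the faulty bound is only used when len > 3)
print(classify([1, 2, 3, 4]))  # fault -> error -> failure
```

The same fault is present in both calls; only the second input activates it.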

Means to Achieve Dependable Software
[Figure: dependability classification tree, with the Means branch highlighted (Construction: Fault Avoidance, Fault Tolerance; Validation: Fault Removal, Fault Forecasting)]

Means to Achieve Dependable Software
• Two major groups:
  - Construction: those that are employed during the software construction process
  - Validation: those that contribute to validation of the software after it is developed

Fault Avoidance or Prevention
• Fault avoidance or prevention techniques are dependability-enhancing techniques employed during software development to reduce the number of faults introduced during construction.
• These techniques may address:
  - System requirements specification
  - Structured design and programming methods
  - Formal methods
  - Software reuse

Software Reusability
• Software reusability implies a savings in development cost.
• It can also increase dependability, because software that has been well exercised is less likely to fail.
• Object-oriented paradigms and techniques encourage and support software reuse.
• Reuse may also decrease reliability. Why?

Fault Avoidance/Prevention
• Advanced software construction techniques are widely accepted and employed, and are generally used to prevent faults in software.
• Despite fault prevention efforts, faults are still created, so fault removal is needed!

Fault Removal
• Fault removal techniques are dependability-enhancing techniques employed during software verification and validation.
• They improve software dependability by:
  - Detecting existing faults, using verification and validation (V&V) methods
  - Eliminating the detected faults
• Techniques:
  - Software testing
  - Formal inspection
  - Formal design proofs

Formal Inspection
• Formal inspection is a rigorous process, accompanied by documentation, that focuses on:
  - Examining source code to find faults
  - Correcting the faults
  - Verifying the corrections
• It is a practical and successful fault removal technique, widely implemented in industry.

Formal Design Proofs
• Formal design proofs: using executable specifications, test cases can be automatically generated to improve the software verification process.
• They attempt to achieve a mathematical proof of correctness of programs and are closely related to formal methods.
• They may be costly and complex, but may give the designer a high degree of confidence.
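One lightweight way to see how an executable specification can generate test cases automatically is property-based testing, sketched below in Python. Note that this is testing, not the mathematical proof of correctness the slide also mentions. `meets_spec` is an executable specification of sorting, `insertion_sort` is a hypothetical implementation with a deliberately seeded fault (it drops duplicates), and `find_counterexample` generates random inputs and checks each one against the specification; all names and generator parameters are illustrative assumptions.

```python
import random
from collections import Counter
from typing import List, Optional


def meets_spec(inp: List[int], out: List[int]) -> bool:
    """Executable specification of sorting: the output is in non-decreasing
    order and is a permutation of the input."""
    ordered = all(a <= b for a, b in zip(out, out[1:]))
    return ordered and Counter(inp) == Counter(out)


def insertion_sort(xs: List[int]) -> List[int]:
    """Hypothetical implementation under test, with a seeded FAULT."""
    result: List[int] = []
    for x in xs:
        if x in result:
            continue  # FAULT: duplicates are silently discarded
        i = 0
        while i < len(result) and result[i] < x:
            i += 1
        result.insert(i, x)
    return result


def find_counterexample(trials: int = 1000, seed: int = 0) -> Optional[List[int]]:
    """Generate random test cases; return the first one that violates the spec."""
    rng = random.Random(seed)
    for _ in range(trials):
        case = [rng.randint(0, 5) for _ in range(rng.randint(0, 8))]
        if not meets_spec(case, insertion_sort(case)):
            return case
    return None


print(find_counterexample())  # some input containing a duplicate value
```

Passing such generated tests only shows agreement with the specification as written; a proof would instead establish `meets_spec` for every possible input.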

Fault Removal
• Fault removal techniques:
  - Determine whether the software matches the specified required behavior
  - Do not determine whether something has been left out of the requirements
• Fault removal is imperfect, so fault forecasting and fault tolerance are needed!

Fault/Failure Forecasting
• Fault/failure forecasting includes dependability-enhancing techniques that are used during the validation of software to estimate the presence of faults and the occurrence and consequences of failures.
• It usually focuses on the reliability measure of dependability.
• It is also known as software reliability measurement.

Fault/Failure Forecasting
Fault/failure forecasting involves:
• The formulation of a fault/error/failure relationship
• An understanding of the operational environment
• The establishment of reliability models
• The collection of failure data
• The application of reliability models by tools
• The selection of appropriate models
• The analysis and interpretation of results
• Guidance for management decisions

Fault Forecasting
• Fault forecasting activities:
  - Reliability estimation
  - Reliability prediction
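As a minimal illustration of the reliability-model activities listed above, the Python sketch below assumes the simplest model: a constant failure rate λ with exponentially distributed times between failures. The slides do not name a particular model, and the inter-failure times are made-up values, not real failure data. λ is estimated as the number of failures divided by total operating time; the mean time to failure is then 1/λ, and the reliability over a mission of t hours is R(t) = exp(-λt).

```python
import math
from typing import List


def estimate_failure_rate(interfailure_times: List[float]) -> float:
    """Maximum-likelihood estimate of a constant failure rate (failures per hour),
    assuming exponentially distributed times between failures."""
    return len(interfailure_times) / sum(interfailure_times)


def reliability(rate: float, mission_hours: float) -> float:
    """Probability of failure-free operation for `mission_hours`: R(t) = exp(-rate * t)."""
    return math.exp(-rate * mission_hours)


# Hypothetical, made-up inter-failure times (hours) observed during testing.
times = [120.0, 250.0, 410.0, 620.0]

lam = estimate_failure_rate(times)  # 4 failures in 1400 hours
print(f"failure rate = {lam:.5f} per hour")
print(f"MTTF         = {1 / lam:.1f} hours")
print(f"R(100 hours) = {reliability(lam, 100.0):.3f}")
```

Estimating λ from observed failure data corresponds loosely to reliability estimation; using it to compute R(t) for future operation is a very naive form of prediction. The lengthening gaps in this hypothetical data suggest reliability growth, which a constant-rate model ignores; reliability growth models are designed for that case.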
