Software Reliability and System Reliability
Steven J Zeil
Old Dominion Univ.
Spring 2012

Outline
  1 Introduction
  2 Dependability
  3 Failure Behavior of an X-ware System
      Atomic Reliability
        Failure Rates and Hazard Functions
        Reliability and the Hazard Rate
        Discrete, Independent Failures
      Systems made up of components
        The Single Interpreter
        Multiple Interpreters

Theme
  "by using deliberately simple mathematics, the classical reliability theory can be extended in order to be interpreted from both hardware and software viewpoints"

Basic Definitions
  Dependability is defined as the trustworthiness of a computer system such that reliance can justifiably be placed on the service it delivers.
  Attributes:
    - availability
    - reliability
    - safety
    - confidentiality
    - integrity
    - maintainability

Impairments and Means
  Impairments:
    - faults
    - failures
    - errors
  Means:
    - fault prevention
    - fault removal
    - fault tolerance
    - fault forecasting

Failure Classification
  Domain
    - Value
    - Timing
  Perception
    - Consistent
    - Inconsistent
  Consequences
    - benign . . . catastrophic

Time to Failure
  The key random variable is the time to failure, $T$.
  Denote the probability that the time to failure $T$ is in some interval $(t, t + \Delta t)$ as
  $$P(t \le T \le t + \Delta t)$$
  Given the cdf $F(t)$ and pdf $f(t)$,
  $$P(t \le T \le t + \Delta t) = F(t + \Delta t) - F(t) \simeq f(t)\,\Delta t$$

Reliability Function
  $$F(t) = P(0 \le T \le t) = \int_0^t f(x)\,dx$$
  The reliability function is the probability of success at time $t$ (i.e., the prob. that the time to failure exceeds $t$):
  $$R(t) = P(T > t) = 1 - F(t) = \int_t^{\infty} f(x)\,dx$$

Failure Rate
  The failure rate is the probability that a failure per unit time occurs in the interval $[t, t + \Delta t]$, given that a failure has not occurred before $t$.
  $$\text{Failure rate} \equiv \frac{P(t \le T < t + \Delta t \mid T > t)}{\Delta t} = \frac{P(t \le T < t + \Delta t)}{\Delta t \, P(T > t)} = \frac{F(t + \Delta t) - F(t)}{\Delta t \, R(t)}$$
  Failure rate:
    - measurable
    - easier to understand than the prob. density function
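
The finite-interval failure rate above can be evaluated directly from a cdf. The following is a minimal sketch, not part of the original slides: the exponential cdf `F_exp`, the rate `lam = 0.5`, and the helper name `failure_rate` are all assumptions chosen for illustration.

```python
import math

def failure_rate(F, t, dt):
    """Finite-interval failure rate (F(t+dt) - F(t)) / (dt * R(t)), with R(t) = 1 - F(t)."""
    R_t = 1.0 - F(t)
    return (F(t + dt) - F(t)) / (dt * R_t)

# Assumed example distribution: exponential time to failure with rate lam.
lam = 0.5

def F_exp(t):
    """cdf of the assumed exponential time to failure."""
    return 1.0 - math.exp(-lam * t)

# Shrinking the interval: the failure rate settles near lam for this cdf.
for dt in (1.0, 0.1, 0.001):
    print(dt, failure_rate(F_exp, 2.0, dt))
```

For this assumed distribution the result does not depend on `t`, only on `dt`; other cdfs will generally give a failure rate that varies with age.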

Hazard Rate
  The hazard rate is defined as the limit of the failure rate as the interval $\Delta t$ approaches zero.
  $$z(t) = \lim_{\Delta t \to 0} \frac{F(t + \Delta t) - F(t)}{\Delta t \, R(t)} = \frac{f(t)}{R(t)}$$
  The hazard rate is an instantaneous rate of failure at time $t$, given that the system survives up to $t$.
  $z(t)\,dt$ represents the probability that a system of age $t$ will fail in the small interval $t$ to $t + dt$.

Converting
  $$z(t) = \frac{f(t)}{R(t)} = \frac{dF(t)}{dt}\,\frac{1}{R(t)}$$
  $$\frac{dF(t)}{dt} = -\frac{dR(t)}{dt}$$
  Combining gives
  $$\frac{dR(t)}{R(t)} = -z(t)\,dt$$
  Integrate both sides w.r.t. $t$:
  $$\ln R(t) = -\int_0^t z(x)\,dx + c$$

Integrating
  Because $R(0) = 1$, $c = 0$.
  Exponentiate both sides:
  $$R(t) = \exp\left[-\int_0^t z(x)\,dx\right]$$

Relating Reliability to Failure Rate
  $$R(t) = \exp\left[-\int_0^t z(x)\,dx\right]$$
  or, differentiating,
  $$f(t) = z(t)\,\exp\left[-\int_0^t z(x)\,dx\right]$$
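
The relation between reliability and the hazard rate can be checked numerically. This is a sketch under stated assumptions, not something from the slides: the function names, the trapezoid-rule integration, and the two example hazard functions (one constant, one growing linearly with age) are hypothetical choices for illustration.

```python
import math

def reliability_from_hazard(z, t, steps=10_000):
    """R(t) = exp(-integral_0^t z(x) dx), with the integral taken by the trapezoid rule."""
    if t == 0:
        return 1.0  # R(0) = 1, matching c = 0 in the derivation
    h = t / steps
    total = 0.5 * (z(0.0) + z(t))
    for i in range(1, steps):
        total += z(i * h)
    return math.exp(-h * total)

def density_from_hazard(z, t, steps=10_000):
    """f(t) = z(t) * R(t)."""
    return z(t) * reliability_from_hazard(z, t, steps)

# Assumed hazard functions for illustration only.
def constant_hazard(x):
    return 0.5        # gives R(t) = exp(-0.5 t)

def linear_hazard(x):
    return 0.2 * x    # gives R(t) = exp(-0.1 t^2)

print(reliability_from_hazard(constant_hazard, 2.0))
print(reliability_from_hazard(linear_hazard, 3.0))
```

A constant hazard reproduces the exponential reliability function, while the linear hazard models a system whose instantaneous failure rate grows with age.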

Single failure
  Suppose we measure time in terms of # of discrete inputs.
  Let $p$ be the prob of failure on a given test input given that no prior failure has occurred on prior inputs.
  If all failure domain inputs are independent,
  $$R(k) = (1 - p)^k$$
  Let $t_e$ be the time required to execute one test case:
  $$t = k\,t_e$$

Execution Duration
  Now, assume that there is a finite limit for $p / t_e$ as $t_e$ becomes vanishingly small:
  $$\lambda = \lim_{t_e \to 0} \frac{p}{t_e}$$
  $$R(t) = \lim_{t_e \to 0} \left(1 - p(t_e)\right)^{t / t_e} = \exp(-\lambda t)$$
  which is the exponential distribution.

Markov Chain Model
  Better known approach is dismissed in one paragraph
    - Markov approach
    - Littlewood TRel 4-1981, Cheung TSE 3/1980
  We should look at one of these later

Hierarchical Structures
  Systems can be decomposed into subsystems forming a hierarchy of
    - pipelines?
    - function calls
        Might not be a tree
        might not form clean layers
    - levels of abstraction
        Here called "interpreters"
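
The limiting argument under "Execution Duration" can be illustrated numerically. The sketch below assumes the simplest case consistent with a finite limit of $p / t_e$, namely $p(t_e) = \lambda t_e$; that proportionality, the function name, and the sample values are assumptions for illustration, not part of the slides.

```python
import math

def discrete_reliability(lam, t, t_e):
    """(1 - p)^(t / t_e) with the assumed per-input failure prob. p = lam * t_e."""
    p = lam * t_e     # assumption: p is proportional to the execution time
    k = t / t_e       # number of inputs executed by time t
    return (1.0 - p) ** k

lam, t = 0.5, 2.0
limit = math.exp(-lam * t)   # the exponential reliability exp(-lambda t)

# As t_e shrinks, the discrete reliability approaches the exponential limit.
for t_e in (0.5, 0.1, 0.001):
    print(t_e, discrete_reliability(lam, t, t_e), limit)
```

With coarse execution times the discrete model is noticeably more pessimistic than the exponential limit; the gap closes as `t_e` decreases.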

Single Interpreter
  Consider an application built on $C$ components (e.g., ADTs)
  Each component $C_i$ has a failure rate $\lambda_i$
  The entire collection of components can be in any of $S$ valid states.
    Presumably each component has some number of discrete states, so the $S$ system states are drawn from the cross product of the component state sets.
  Add an $S + 1$st state to represent a failure state.
    This state is an absorber/terminal state

Component States
  Can components be well modeled by discrete states?
  Can failures be modeled as a state change?
    e.g., Consider a numeric calculation that is supposed to be within $\pm 0.01$ of an ideal solution but that is $\pm 0.1$ for selected input values. Is that a state or simply a function of the inputs?
  In the example above, what are the implications of the failure state being terminal
    - if the interpreter fails because of the error?
    - if the interpreter recovers from the error?

State Transitions
  The collection of components has its own set of transition properties
    - $\gamma_j$ is prob that a component in state $j$ stays in state $j$
    - $1/\gamma_j$ is mean sojourn time in state $j$
    - $q_{jk} \equiv$ prob that system in state $j$ will make a transition to state $k$
  $$\sum_{k=1}^{S} q_{jk} = 1$$

System Failure Rate
  "A system failure is caused by the failure of any of its components. The system failure rate $\xi_j$ in state $j$ is thus the sum of the failure rates of the components under execution in this state, denoted by
  $$\xi_j = \sum_{i=1}^{C} \delta_{ij}\,\lambda_i$$
  where
  $$\delta_{ij} = \begin{cases} 1 & \text{if } C_i \text{ is currently in state } j \\ 0 & \text{otherwise} \end{cases}$$"
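
The sum defining $\xi_j$ can be computed as follows. This is a minimal sketch: the component failure rates, the indicator matrix `delta`, and the function name are made-up values for illustration and do not come from the slides.

```python
def system_failure_rates(lambdas, delta):
    """Return [xi_0, xi_1, ...] where xi_j = sum_i delta[i][j] * lambdas[i].

    lambdas[i] is the failure rate of component i;
    delta[i][j] is 1 if component i is counted in system state j, else 0.
    """
    n_states = len(delta[0])
    return [
        sum(delta[i][j] * lambdas[i] for i in range(len(lambdas)))
        for j in range(n_states)
    ]

# Hypothetical example: C = 3 components, 2 system states.
lambdas = [0.01, 0.02, 0.005]
delta = [
    [1, 0],   # component 0 contributes only in state 0
    [0, 1],   # component 1 contributes only in state 1
    [1, 1],   # component 2 contributes in both states
]

print(system_failure_rates(lambdas, delta))
```

Each entry of the result is the failure rate the system experiences while it occupies the corresponding state, so states that involve more (or less reliable) components carry a higher rate.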