44-47 Fault diagnosis
1. Spectra for N test cases over M components
2. Error detection per test case

      x_11  x_12  ...  x_1M   e_1
      x_21  x_22  ...  x_2M   e_2
      ...   ...   ...  ...    ...
      x_N1  x_N2  ...  x_NM   e_N

   Row i: the blocks that are executed in test case i
   Column j: the test cases in which block j was executed
   e_i = 1: an error was detected in the i-th test; e_i = 0: no error in the i-th test
48 Statistics-based Fault diagnosis
Compare every column vector (block j) with the error vector:

      x_11  x_12  ...  x_1M   e_1
      x_21  x_22  ...  x_2M   e_2
      ...   ...   ...  ...    ...
      x_N1  x_N2  ...  x_NM   e_N

   This yields a similarity s_j for every block j.
49-53 Statistics-based Fault diagnosis
Jaccard similarity coefficient:

      s_j = n_11 / (n_11 + n_10 + n_01)

   where n_pq counts the test cases in which block j was executed (p = 1) or not (p = 0) and an error was detected (q = 1) or not (q = 0).

   Example (block j vs. error vector, one row per test case):

      block j   error
         1        0
         0        1
         1        1
         0        0
         1        1

   Here n_11 = 2, n_10 = 1, n_01 = 1, so s_j = 2 / (2 + 1 + 1) = 0.5
54 Statistics-based Fault diagnosis
For every block, compute the similarity of its column with the error vector:

      x_11  x_12  ...  x_1M   e_1
      x_21  x_22  ...  x_2M   e_2
      ...   ...   ...  ...    ...
      x_N1  x_N2  ...  x_NM   e_N
      ---------------------
      s_1   s_2   ...  s_M

   The component with the highest s_j most likely contains the fault.
55 Statistics-based Fault diagnosis

      s_j = n_11 / (n_11 + n_10 + n_01)

      component   a    b    c    d    e    f    g    fail
      test 1      0    1    1    1    1    0    0     0
      test 2      0    0    0    1    0    1    1     1
      test 3      1    1    1    1    0    0    0     1
      test 4      0    0    0    0    0    0    0     0
      test 5      1    1    0    1    1    0    1     1
      s_j        2/3  1/2  1/4  3/4  1/4  1/3  2/3
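The bottom row can be reproduced mechanically. Below is a small, self-contained C sketch (the spectra are hard-coded from the table above) that recomputes the Jaccard score of every component:

   /* Recompute the Jaccard similarity for components a..g of the example. */
   #include <stdio.h>

   #define N 5   /* test cases */
   #define M 7   /* components */

   int main(void) {
       /* rows: tests 1..5, columns: components a..g (from the table) */
       int x[N][M] = {
           {0,1,1,1,1,0,0},
           {0,0,0,1,0,1,1},
           {1,1,1,1,0,0,0},
           {0,0,0,0,0,0,0},
           {1,1,0,1,1,0,1},
       };
       int e[N] = {0, 1, 1, 0, 1};   /* the "fail" column */

       for (int j = 0; j < M; j++) {
           int n11 = 0, n10 = 0, n01 = 0;
           for (int i = 0; i < N; i++) {
               if (x[i][j] && e[i])   n11++;   /* executed, error     */
               if (x[i][j] && !e[i])  n10++;   /* executed, no error  */
               if (!x[i][j] && e[i])  n01++;   /* not executed, error */
           }
           double s = (n11 + n10 + n01) ? (double)n11 / (n11 + n10 + n01) : 0.0;
           printf("s_%c = %.2f\n", 'a' + j, s);
       }
       return 0;   /* prints 0.67 0.50 0.25 0.75 0.25 0.33 0.67 */
   }

Running it reproduces the ranking on the slide: component d (3/4) comes out on top.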
56 Example: rational bubble sort

   void RationalSort( int n, int *num, int *den )
   {
       int i, j;                                     /* block 1 */
       for ( i = n-1; i >= 0; i-- ) {
           assert( den[i] != 0 );                    /* block 2 */
           for ( j = 0; j < i; j++ ) {
               if ( RationalGT( num[j], den[j],      /* block 3 */
                                num[j+1], den[j+1] ) ) {
                   swap( &num[j], &num[j+1] );       /* block 4 */
                   /* swap( &den[j], &den[j+1] ); */
               }
           }
       }
   }

   Fault:   forgot to swap the denominators
   Error:   the sequence is no longer a permutation of the input sequence
   Failure: the output is not a sorted version of the input
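The slide omits the helper functions RationalGT and swap. One way to complete the example so that it compiles and the fault can be observed is sketched below; the helpers and the test input are reconstructions, not necessarily the originals:

   #include <assert.h>
   #include <stdio.h>

   /* Reconstructed helpers (assumed, not from the slides). */
   static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

   /* Nonzero iff n1/d1 > n2/d2; assumes positive denominators. */
   static int RationalGT(int n1, int d1, int n2, int d2) {
       return n1 * d2 > n2 * d1;
   }

   /* Faulty sort from the slide: the denominators are never swapped. */
   void RationalSort(int n, int *num, int *den) {
       int i, j;                                     /* block 1 */
       for (i = n - 1; i >= 0; i--) {
           assert(den[i] != 0);                      /* block 2 */
           for (j = 0; j < i; j++) {
               if (RationalGT(num[j], den[j],        /* block 3 */
                              num[j + 1], den[j + 1])) {
                   swap(&num[j], &num[j + 1]);       /* block 4 */
                   /* swap(&den[j], &den[j+1]); */
               }
           }
       }
   }

   int main(void) {
       int num[] = {2, 4, 0};
       int den[] = {1, 2, 1};        /* illustrative input: 2/1, 4/2, 0/1 */
       RationalSort(3, num, den);
       for (int i = 0; i < 3; i++)   /* prints 0/1 2/2 4/1: not a */
           printf("%d/%d ", num[i], den[i]);   /* permutation of the input */
       printf("\n");
       return 0;
   }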
57 Example (2)
Re-running the earlier example: the numerators get reordered while the denominators stay <1, 2, 1> in every step, so the resulting sequence of rationals is not a permutation of the input: ERROR!
58 Example (3)
Block 4 has the highest similarity coefficient -> most likely suspect
59 Reasoning-based Fault Diagnosis
• MBD
  – Reasoning approach based on behavioral component models
  – High(er) diagnostic accuracy
  – Prohibitive (modeling and/or diagnosis) cost
• SFL
  – Statistics-based, using execution spectra
  – Lower diagnostic accuracy: cannot reason over multiple faults
  – No modeling (except the test oracle) + low diagnosis cost
60 Idea: Extend SFL with MBD
• Combine the best of both worlds
  – MBD: reasoning over behavioral component models, high(er) diagnostic accuracy, but prohibitive (modeling and/or diagnosis) cost
  – SFL: statistics over execution spectra, lower diagnostic accuracy (cannot reason over multiple faults), but no modeling (except the test oracle) and low diagnosis cost
61 Working Example
62 SFL (Tarantula)
63-66 Reasoning (worked example, shown in figures only)
67 Ranking Candidates
• Probabilities are updated according to Bayes' rule:

      Pr(d_k | e) = Pr(e | d_k) · Pr(d_k) / Pr(e)

  where Pr(d_k) is the prior probability of diagnosis candidate d_k, Pr(e | d_k) is the probability of the observed spectra and error outcomes under candidate d_k, and Pr(e) is a normalizing term.
68 Ranking Candidates
• Many ε-policies exist (ε: the per-observation terms that make up Pr(e | d_k))
  – Ideally, ε is expressed in the component healths h_j, the probability that a faulty component j nevertheless behaves correctly, e.g.

        ε = (1 − h_1) · (1 − h_2) · (1 − h_1) · (1 − h_2) · h_1 · h_2

  – But estimating h_j is far from trivial, hence approximations have been used so far (BAYES-A, [Abreu et al., WODA'08])
69 Barinel
• Barinel's key idea: for each candidate d_k, compute the healths h_j of the candidate's faulty components that maximize the probability Pr(e | d_k) of the observations e occurring, conditioned on candidate d_k
70 Barinel Algorithm

      c1  c2  c3   e
       1   1   0   1 (F)
       0   1   1   1 (F)
       1   0   0   1 (F)
       1   0   1   0 (P)

1. Compute the set of valid diagnosis candidates
   D = { d1 = {1,2}, d2 = {1,3} }
2. Derive Pr(e | d)
   Pr(e | d1) = (1 − h1·h2) · (1 − h2) · (1 − h1) · h1
   Pr(e | d2) = (1 − h1) · (1 − h3) · (1 − h1) · h3 · h1
71 Barinel Algorithm
3. Compute the h_j by maximizing Pr(e | d)
   – Maximum likelihood estimation
   – Gradient ascent procedure
   – Pr(e | d1): h1 = 0.47, h2 = 0.19  →  Pr(d1) = 0.19
   – Pr(e | d2): h1 = 0.41, h3 = 0.50  →  Pr(d2) = 0.04
4. Rank the candidates according to Pr(d)
   – D = <{1,2}, {1,3}>
   – Inspection starts with components 1 and 2
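As a sanity check, the two likelihood expressions can be evaluated at the health values reported on the slide; a minimal C sketch (the gradient-ascent estimation of the h_j themselves is not reproduced here):

   /* Evaluate Pr(e|d) for both candidates at the slide's health values. */
   #include <stdio.h>

   int main(void) {
       /* d1 = {1,2}: Pr(e|d1) = (1 - h1*h2)(1 - h2)(1 - h1) h1 */
       double h1 = 0.47, h2 = 0.19;
       double pr_d1 = (1 - h1 * h2) * (1 - h2) * (1 - h1) * h1;

       /* d2 = {1,3}: Pr(e|d2) = (1 - h1)(1 - h3)(1 - h1) h3 h1 */
       double g1 = 0.41, h3 = 0.50;
       double pr_d2 = (1 - g1) * (1 - h3) * (1 - g1) * h3 * g1;

       printf("Pr(e|d1) = %.2f\n", pr_d1);   /* ~0.18 (0.19 on the slide)  */
       printf("Pr(e|d2) = %.2f\n", pr_d2);   /* ~0.04; small differences   */
       printf("inspect first: %s\n",         /* come from rounding the h_j */
              pr_d1 > pr_d2 ? "{1,2}" : "{1,3}");
       return 0;
   }

Either way, d1 = {1,2} comes out on top, which is why inspection starts with components 1 and 2.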
72 LIVE DEMO
• Requirements:
  – The Zoltar toolset (www.fdir.org/zoltar)
  – LLVM
  – OS: Linux
• Zoltar — coming soon:
  – Available as an Eclipse plugin
  – Support for Java
• T. Janssen, R. Abreu, and A.J.C. van Gemund. Zoltar: A Toolset for Automatic Fault Localization. In Proceedings of the 24th International Conference on Automated Software Engineering (ASE'09), Tools Track, pp. 662–664, Auckland, New Zealand, November 2009. IEEE Computer Society. (Best Demo Award)
73 Model-based vs. Spectrum-based

   Model-based:
   • Model used primarily for reasoning
   • All generated explanations are valid
   • Most likely diagnosis need not be the actual cause
   • Well suited for hardware

   Spectrum-based:
   • Model used primarily for error detection
   • Ranking may contain invalid explanations
   • Invalid explanations may rank high
   • Well suited for software
74 Outline
• Part I
  – Diagnosis principles
  – Model-Based Diagnosis
  – Spectrum-Based Fault Localization
  – Live Demo
• Part II
  – Existing systems
  – Lessons learned
  – Case studies
  – Further applications
  – Related work
75 Existing applications
• PinPoint: large on-line transaction processing systems (search engines, web mail) [Chen02]
• Tarantula: visualizing test information to aid manual debugging [Jones02]
• Ochiai [TAIC PART'07; JSS'09]
• Barinel [ASE'09]
• …
76 Similarity Coefficients
• Jaccard (PinPoint):          s_j = n_11 / (n_11 + n_01 + n_10)
• Tarantula:                   s_j = (n_11/(n_11+n_01)) / (n_11/(n_11+n_01) + n_10/(n_10+n_00))
• Ochiai (molecular biology):  s_j = n_11 / √((n_11+n_01) · (n_11+n_10))
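All three coefficients are simple functions of the four counters n_11, n_10, n_01, n_00; a compact C sketch (the example call uses component a of the earlier table):

   /* The three similarity coefficients as functions of the four counters.
      n11: executed & failed, n10: executed & passed,
      n01: not executed & failed, n00: not executed & passed.
      No guards for blocks that are never executed (sketch only). */
   #include <math.h>
   #include <stdio.h>

   double jaccard(double n11, double n10, double n01, double n00) {
       (void)n00;   /* not used by Jaccard */
       return n11 / (n11 + n10 + n01);
   }

   double tarantula(double n11, double n10, double n01, double n00) {
       double fail_rate = n11 / (n11 + n01);   /* fraction of failed runs hit */
       double pass_rate = n10 / (n10 + n00);   /* fraction of passed runs hit */
       return fail_rate / (fail_rate + pass_rate);
   }

   double ochiai(double n11, double n10, double n01, double n00) {
       (void)n00;   /* not used by Ochiai */
       return n11 / sqrt((n11 + n01) * (n11 + n10));
   }

   int main(void) {
       /* component a of the earlier example: n11=2, n10=0, n01=1, n00=2 */
       printf("Jaccard   = %.2f\n", jaccard(2, 0, 1, 2));    /* 0.67 */
       printf("Tarantula = %.2f\n", tarantula(2, 0, 1, 2));  /* 1.00 */
       printf("Ochiai    = %.2f\n", ochiai(2, 0, 1, 2));     /* 0.82 */
       return 0;
   }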
77-81 Fault diagnosis (recap)
Jaccard similarity coefficient, on the same example as before:

      s_j = n_11 / (n_11 + n_10 + n_01) = 2 / (2 + 1 + 1) = 0.5
82 Diagnostic quality
• Percentage of blocks that need not be inspected when the blocks are examined in decreasing order of similarity until the faulty block is reached.
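A possible way to compute this metric, as a C sketch; how ties and the (M−1) normalization are handled here is an assumption, not taken from the slides:

   /* Fraction (in %) of blocks that need not be inspected when blocks are
    * examined in decreasing order of similarity until the faulty one is hit.
    * Tie handling is assumed: only strictly higher-ranked blocks count. */
   #include <stdio.h>

   double quality(const double *s, int m, int faulty) {
       int higher = 0;
       for (int j = 0; j < m; j++)
           if (s[j] > s[faulty]) higher++;
       return 100.0 * (m - 1 - higher) / (m - 1);
   }

   int main(void) {
       /* Jaccard scores of the a..g example; if component d (index 3) is
        * the faulty one, no other block needs to be inspected. */
       double s[] = { 2.0/3, 0.5, 0.25, 0.75, 0.25, 1.0/3, 2.0/3 };
       printf("quality = %.0f%%\n", quality(s, 7, 3));   /* prints 100% */
       return 0;
   }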
83 Discussion
• Under the specific conditions of our experiment, Ochiai outperforms 8 other coefficients.
• Why?
• To what extent does this depend on the conditions of our experiment?
  – Quality of the passed / failed information
  – Number of runs
  – Artificial bugs in the Siemens set
84 Ochiai outperforms Tarantula

   Tarantula:
      s_j = (n_11/(n_11+n_01)) / (n_11/(n_11+n_01) + n_10/(n_10+n_00))

   For n_11 > 0, dividing numerator and denominator by n_11/(n_11+n_01):

      s_j = 1 / (1 + (n_10/(n_10+n_00)) · ((n_11+n_01)/n_11))
          = 1 / (1 + c · n_10/n_11),   with c = (n_11+n_01)/(n_10+n_00) = NF/NP
            (NF: number of failed runs, NP: number of passed runs)
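A quick numeric check of the rewrite, with counts chosen purely for illustration: take n_11 = 2, n_01 = 1, n_10 = 1, n_00 = 1. The original form gives (2/3) / (2/3 + 1/2) = 0.57; the rewritten form gives c = (2+1)/(1+1) = 1.5 and 1 / (1 + 1.5 · 1/2) = 1/1.75 = 0.57, as expected.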
85-86 Ochiai outperforms Tarantula

   Tarantula:  s_j = 1 / (1 + c · n_10/n_11)
      → only presence in passed runs (n_10) lowers the similarity

   Ochiai:     s_j = n_11 / √((n_11+n_01) · (n_11+n_10))
      → absence in failed runs (n_01) also lowers the similarity
87 Ochiai outperforms Jaccard

   Jaccard:  s_j = n_11 / (n_11 + n_01 + n_10)

   Ochiai:   s_j = n_11 / √((n_11+n_01) · (n_11+n_10))
88 Ochiai outperforms Jaccard

      n_11 / √((n_11+n_01) · (n_11+n_10))

   square (all counts are non-negative):

      n_11² / ((n_11+n_01) · (n_11+n_10))

   rewrite the denominator:

      n_11² / (n_11² + n_11·n_10 + n_11·n_01 + n_01·n_10)

   divide numerator and denominator by n_11:

      n_11 / (n_11 + n_10 + n_01 + n_01·n_10/n_11)

   None of these steps modifies the ranking!
89 Ochiai outperforms Jaccard

   Jaccard:             s_j = n_11 / (n_11 + n_01 + n_10)

   Ochiai (rewritten):  s_j = n_11 / (n_11 + n_10 + n_01 + n_01·n_10/n_11)

   Through the extra term n_01·n_10/n_11, differences are amplified.
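An illustrative comparison (counts chosen here, not from the slides): block A with n_11 = 2, n_10 = 2, n_01 = 0 and block B with n_11 = 2, n_10 = 1, n_01 = 1. Jaccard gives 2/4 = 0.5 for both, so it cannot separate them. The rewritten Ochiai gives 2 / (2 + 2 + 0 + 0) = 0.5 for A but 2 / (2 + 1 + 1 + 0.5) = 0.44 for B, and Ochiai itself agrees: 2/√(2·4) = 0.71 versus 2/√(3·3) = 0.67. The extra term only penalizes blocks that both miss failed runs and appear in passed runs.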
90 Quality of the passed / failed info
• Failure detection is a crude error detection mechanism.
• q_e = n_11 / (n_11 + n_10)
• In the Siemens set, q_e ranges from 1.4% on average for schedule2 to 20.3% on average for tot_info.
• q_e can be increased by excluding a run that contributes to n_10
• q_e can be decreased by excluding a run that contributes to n_11
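As a worked example with hypothetical numbers: if the faulty block is executed in 50 runs and only 5 of those runs actually fail, then q_e = n_11 / (n_11 + n_10) = 5/50 = 10%, i.e. only one in ten activations of the fault is detected as an error.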
91 Quality of the passed / failed info Small fraction of fault activations detected is enough
92 Number of runs
• On average, for the Siemens set:
  – Adding more failed tests is safe
  – 6 failed tests are enough
  – The number of passed tests has no influence
• However:
  – For individual runs, the effect of adding passed tests differs
  – It stabilizes around 20 passed tests
93 Influence of #runs
94 Influence of #runs
• On average, for our benchmark:
  – Adding failed runs is safe
  – 6 failed runs are enough
  – The number of passed runs has no influence
• However:
  – For individual runs, the effect of more passed runs differs
  – It stabilizes around 20 passed runs
95 Dependence on Siemens set faults
• Investigate industrial relevance in the TRADER project: improve the user-perceived reliability of high-volume consumer electronics devices
• Test case: television platform from NXP
• Partners:
  – Universities of Delft, Twente, and Leiden
  – Embedded Systems Institute, Design Technology Institute, IMEC Leuven
  – NXP (former Philips Semiconductors)
96 Embedded systems
• Low overhead
• Little infrastructure needed
• Consumer electronics:
  – No time for exhaustive debugging
  – Helps to identify responsible teams / developers
• Diagnosis can drive a recovery mechanism, e.g., rebooting suspect processes
97 Case study – platform
• Control software of an analog TV
• Decodes remote-control input, displays the on-screen menu and teletext, optimizes parameters for audio / video processing based on signal analysis, etc.
• 450 K lines of C code
• 2 MB of RAM (+ 2 MB in the development version)
• CPU: MIPS, running a small multi-tasking OS
• Work is organized in 315 logical threads
• UART connection to a PC
98 Case study
1. Load problem (scenario: TV → TXT → TV)
99 Diagnosis
• 150 hit spectra over the 315 functions corresponding to the logical threads (one spectrum per second): 60 s TV, 30 s TXT, 60 s TV
• Marked the last 60 spectra as failed
• Result: 2nd in the ranking of 315 functions
100 Case study
2. Teletext lock-up:
   – Existing problem in another product line
   – Copied to our platform, triggered by a remote-control key sequence
   – Inconsistency between two state variables, for which only specific combinations are allowed