
Fault Diagnosis of Software Systems
Rui Abreu, Dept. of Informatics Engineering, Faculty of Engineering, University of Porto
Thanks: Peter Zoeteweij, Tom Janssen, Arjan J.C. van Gemund, Johan de Kleer, Wolfgang Mayer

2 About the speaker


  1. 44 Fault diagnosis. 1. Spectra for N test cases over M components (blocks):

         x_11  x_12  …  x_1M  |  e_1
         x_21  x_22  …  x_2M  |  e_2
          …     …    …   …    |   …
         x_N1  x_N2  …  x_NM  |  e_N

  2. 45 Fault diagnosis. Row i: the blocks that are executed in test case i.

  3. 46 Fault diagnosis. Column j: the test cases in which block j was executed.

  4. 47 Fault diagnosis. 2. Error detection per test case: e_i = 1 if an error was observed in the i-th test, e_i = 0 if not.

  5. 48 Statistics-based fault diagnosis. Compare every column vector (block j) with the error vector, yielding a similarity coefficient s_j per block.

  6. 49 Statistics-based fault diagnosis. Jaccard similarity coefficient:

         s_j = n_11 / (n_11 + n_10 + n_01)

     where, for block j, n_11 counts tests that execute the block and fail, n_10 counts tests that execute the block and pass, and n_01 counts tests that fail without executing the block.

  7.–10. 50–53 Statistics-based fault diagnosis. Filling in the counts for the example block (n_11 = 2, n_10 = 1, n_01 = 1), step by step:

         s_j = 2 / (2 + n_10 + n_01) = 2 / (2 + 1 + n_01) = 2 / (2 + 1 + 1) = 0.5
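The counting scheme above can be sketched as a small function. A minimal illustration (not part of the original slides); the counts (2, 1, 1) are the ones from the worked example:

```python
def jaccard(n11, n10, n01):
    """Jaccard similarity: failing runs covering the block, over all runs
    that either cover the block or fail."""
    return n11 / (n11 + n10 + n01)

# Counts from the slide: the block is executed in 2 failing runs and
# 1 passing run, and 1 failing run does not execute it.
s_j = jaccard(2, 1, 1)  # 2 / (2 + 1 + 1) = 0.5
```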

  11. 54 Statistics-based fault diagnosis. For every block, compute the similarity s_1 … s_M between its column and the error "block" (the error vector). The component with the highest s_j most likely contains the fault.

  12. 55 Statistics-based fault diagnosis. s = n_11 / (n_11 + n_10 + n_01):

         component:  a  b  c  d  e  f  g  | fail
         test 1:     0  1  1  1  1  0  0  |  0
         test 2:     0  0  0  1  0  1  1  |  1
         test 3:     1  1  1  1  0  0  0  |  1
         test 4:     0  0  0  0  0  0  0  |  0
         test 5:     1  1  0  1  1  0  1  |  1
         s:          ⅔  ½  ¼  ¾  ¼  ⅓  ⅔
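The similarity row of this table can be reproduced directly from the spectra. A small sketch (the 0/1 encoding below simply transcribes the table; components a–g become indices 0–6):

```python
from fractions import Fraction

# One spectrum row per test (components a..g), plus the error vector.
spectra = [
    [0, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 0, 1, 1],
    [1, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0, 1],
]
errors = [0, 1, 1, 0, 1]

def jaccard_scores(spectra, errors):
    scores = []
    for j in range(len(spectra[0])):
        n11 = sum(1 for row, e in zip(spectra, errors) if row[j] and e)
        n10 = sum(1 for row, e in zip(spectra, errors) if row[j] and not e)
        n01 = sum(1 for row, e in zip(spectra, errors) if not row[j] and e)
        scores.append(Fraction(n11, n11 + n10 + n01))
    return scores

print([str(s) for s in jaccard_scores(spectra, errors)])
# → ['2/3', '1/2', '1/4', '3/4', '1/4', '1/3', '2/3']
```

Component d (index 3) gets the highest score, ¾, matching the table.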

  13. 56 Example: rational bubble sort

     void RationalSort( int n, int *num, int *den )
     {
         int i, j;                                     /* block 1 */
         for ( i = n-1; i >= 0; i-- ) {
             assert( den[i] != 0 );                    /* block 2 */
             for ( j = 0; j < i; j++ ) {
                 if ( RationalGT( num[j], den[j],      /* block 3 */
                                  num[j+1], den[j+1] ) ) {
                     swap( &num[j], &num[j+1] );       /* block 4 */
                     /* swap( &den[j], &den[j+1] ); */
                 }
             }
         }
     }

     Fault: forgot to swap the denominators.
     Error: the sequence is no longer a permutation of the input sequence.
     Failure: the output is not a sorted version of the input.

  14. 57 Example (2). Running RationalSort on the earlier example: the numerator array passes through the states (4 2 0) → (2 4 0) → (2 0 4) → (0 2 4), while the denominator array stays fixed at (1 2 1). The intermediate sequences are therefore no longer permutations of the input: ERROR!

  15. 58 Example (3). Block 4 has the highest similarity coefficient → most likely suspect.

  16. 59 Reasoning-based fault diagnosis
     • MBD (model-based diagnosis)
       – Reasoning approach based on behavioral component models
       – High(er) diagnostic accuracy
       – Prohibitive (modeling and/or diagnosis) cost
     • SFL (spectrum-based fault localization)
       – Statistical approach based on execution spectra
       – Lower diagnostic accuracy: cannot reason over multiple faults
       – No modeling (except the test oracle) and low diagnosis cost

  17. 60 Idea: extend SFL with MBD — combine the best of both worlds: MBD's reasoning over behavioral component models and higher diagnostic accuracy (but prohibitive modeling/diagnosis cost), with SFL's low cost and freedom from modeling (but lower accuracy and no reasoning over multiple faults).

  18. 61 Working Example

  19. 62 SFL (TARANTULA)

  20. 63 Reasoning

  21. 64 Reasoning

  22. 65 Reasoning

  23. 66

  24. 67 Ranking Candidates
     • Probabilities are updated according to Bayes' rule:

         Pr(d_k | e_i) = Pr(e_i | d_k) · Pr(d_k) / Pr(e_i)

       where Pr(e_i) is a normalizing term and Pr(e_i | d_k) is defined by an ε-policy (next slide).

  25. 68 Ranking Candidates
     • Many ε-policies (definitions of Pr(e_i | d_k)) exist
       – Ideally, ε is expressed in terms of the component goodness values h_j, e.g.
         ε = (1-h_1)·(1-h_2) · (1-h_1)·(1-h_2) · h_1·h_2
       – But estimating h_j is far from trivial, so approximations have been used so far (BAYES-A, [Abreu et al., WODA'08])

  26. 69 Barinel
     • Barinel's key idea: for each candidate d_k, compute the goodness values h_j of the candidate's faulty components that maximize the probability Pr(e | d_k) of the observations e occurring, conditioned on candidate d_k.

  27. 70 Barinel Algorithm

         c_1  c_2  c_3  |  e
          1    1    0   |  1 (F)
          0    1    1   |  1 (F)
          1    0    0   |  1 (F)
          1    0    1   |  0 (P)

     1. Compute the set of valid diagnosis candidates: D = {d_1 = {1,2}, d_2 = {1,3}}
     2. Derive Pr(e|d):
        • Pr(e|d_1) = (1 - h_1·h_2) · (1 - h_2) · (1 - h_1) · h_1
        • Pr(e|d_2) = (1 - h_1) · (1 - h_3) · (1 - h_1) · h_3·h_1
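Step 1 amounts to computing the minimal hitting sets of the failing rows: every failing run must involve at least one faulty component, and supersets of a valid candidate are dropped. A brute-force sketch for this tiny example (not necessarily how the actual tool computes them):

```python
from itertools import combinations

# Components hit in each failing run (1-based, matching the slide).
failing = [{1, 2}, {2, 3}, {1}]
components = {1, 2, 3}

def minimal_hitting_sets(failing, components):
    hitting = []
    for size in range(1, len(components) + 1):
        for cand in combinations(sorted(components), size):
            s = set(cand)
            if all(s & run for run in failing):
                # keep only candidates with no smaller hitting subset
                if not any(h <= s for h in hitting):
                    hitting.append(s)
    return hitting

print(minimal_hitting_sets(failing, components))
# → [{1, 2}, {1, 3}]
```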

  28. 71 Barinel Algorithm
     3. Compute the h_j that maximize Pr(e|d):
        – Maximum likelihood estimation, via a gradient ascent procedure
        – Pr(e|d_1): h_1 = 0.47, h_2 = 0.19 ⇒ Pr(d_1) = 0.19
        – Pr(e|d_2): h_1 = 0.41, h_3 = 0.50 ⇒ Pr(d_2) = 0.04
     4. Rank the candidates according to Pr(d):
        – D = ⟨{1,2}, {1,3}⟩
        – Inspection starts with components 1 and 2
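The likelihood factors from the previous slide can be evaluated generically: a failing run contributes 1 minus the product of the goodness values h_j of the faulty components it executes, and a passing run contributes that product. A sketch that plugs in the estimates reported on the slide (it only evaluates Pr(e|d); the gradient-ascent maximization itself is omitted):

```python
# Spectra rows (c1, c2, c3) and error vector from the working example.
runs = [(1, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1)]
errs = [1, 1, 1, 0]

def pr_e_given_d(d, h):
    """Pr(e | d) for candidate d (0-based component indices) and
    goodness values h (dict: component index -> h_j)."""
    p = 1.0
    for row, e in zip(runs, errs):
        good = 1.0
        for j in d:
            if row[j]:
                good *= h[j]
        p *= (1.0 - good) if e else good
    return p

p1 = pr_e_given_d({0, 1}, {0: 0.47, 1: 0.19})  # d1 = {1,2}, ~0.18
p2 = pr_e_given_d({0, 2}, {0: 0.41, 2: 0.50})  # d2 = {1,3}, ~0.04
assert p1 > p2  # inspection starts with candidate {1,2}
```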

  29. 72 LIVE DEMO
     • Requirements: the Zoltar toolset (www.fdir.org/zoltar), LLVM, OS: Linux
     • Coming soon for Zoltar: an Eclipse plugin and support for Java
     • T. Janssen, R. Abreu, and A.J.C. van Gemund, Zoltar: A Toolset for Automatic Fault Localization. In Proceedings of the 24th International Conference on Automated Software Engineering (ASE'09) - Tools Track, pp. 662-664, Auckland, New Zealand, November 2009. IEEE Computer Society. (Best Demo Award)

  30. 73 Model-based vs. Spectrum-based

     Model-based:
     • Model used primarily for reasoning
     • All generated explanations are valid
     • Most likely diagnosis need not be the actual cause
     • Well suited for hardware

     Spectrum-based:
     • Model used primarily for error detection
     • Ranking may contain invalid explanations
     • Invalid explanations may rank high
     • Well suited for software

  31. 74 Outline • Part I – Diagnosis principles – Model-Based Diagnosis – Spectrum-Based Fault Localization – Live Demo • Part II – Existing systems – Lessons learned – Case studies – Further applications – Related work

  32. 75 Existing applications
     • PinPoint: large on-line transaction processing systems (search engines, web mail) [Chen02]
     • Tarantula: visualizing test information to aid manual debugging [Jones02]
     • Ochiai [TAIC PART'07; JSS'09]
     • Barinel [ASE'09]
     • …

  33. 76 Similarity Coefficients
     • Jaccard (PinPoint): s_j = n_11 / (n_11 + n_01 + n_10)
     • Tarantula: s_j = (n_11/(n_11+n_01)) / ( n_11/(n_11+n_01) + n_10/(n_10+n_00) )
     • Ochiai (from molecular biology): s_j = n_11 / √((n_11+n_01)·(n_11+n_10))

  34.–38. 77–81 Fault diagnosis (recap). Jaccard similarity coefficient, with the counts for the example block filled in step by step:

         s_j = n_11 / (n_11 + n_10 + n_01) = 2 / (2 + 1 + 1) = 0.5

  39. 82 Diagnostic quality
     • Measured as the percentage of blocks that need not be inspected.

  40. 83 Discussion
     • Under the specific conditions of our experiment, Ochiai outperforms 8 other coefficients. Why?
     • To what extent does this depend on the conditions of the experiment?
       – Quality of the passed/failed information
       – Number of runs
       – Artificial bugs in the Siemens set

  41. 84 Ochiai outperforms Tarantula. Tarantula:

         s_j = (n_11/(n_11+n_01)) / ( n_11/(n_11+n_01) + n_10/(n_10+n_00) )

     For n_11 > 0 this can be rewritten as

         s_j = 1 / ( 1 + (n_10/(n_10+n_00)) · ((n_11+n_01)/n_11) )
             = 1 / ( 1 + c · n_10/n_11 ),   with   c = (n_11+n_01)/(n_10+n_00) = N_F/N_P

     where N_F and N_P are the total numbers of failed and passed runs.

  42. 85 Ochiai outperforms Tarantula

     Tarantula:  s_j = 1 / (1 + c · n_10/n_11)
     Ochiai:     s_j = n_11 / √((n_11+n_01)·(n_11+n_10))

  43. 86 Ochiai outperforms Tarantula

     Tarantula:  s_j = 1 / (1 + c · n_10/n_11) — only presence in passed runs (n_10) lowers the similarity.
     Ochiai:     s_j = n_11 / √((n_11+n_01)·(n_11+n_10)) — absence in failed runs (n_01) also lowers the similarity.
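This difference can change rankings. A hypothetical sketch (the counts below are invented for illustration, not taken from the slides): block A is covered by only 2 of 10 failing runs but by no passing run, while block B is covered by all 10 failing runs and 1 of 10 passing runs. Tarantula prefers A; Ochiai, which also penalizes absence from failed runs, prefers B:

```python
import math

N_F, N_P = 10, 10  # hypothetical totals of failed / passed runs

def tarantula(n11, n10):
    f = n11 / N_F  # fraction of failing runs that cover the block
    p = n10 / N_P  # fraction of passing runs that cover the block
    return f / (f + p)

def ochiai(n11, n10):
    n01 = N_F - n11  # failing runs that do not cover the block
    return n11 / math.sqrt((n11 + n01) * (n11 + n10))

# Block A: n11 = 2, n10 = 0;  Block B: n11 = 10, n10 = 1
assert tarantula(2, 0) > tarantula(10, 1)  # 1.0   vs ~0.91: A ranks first
assert ochiai(2, 0) < ochiai(10, 1)        # ~0.45 vs ~0.95: B ranks first
```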

  44. 87 Ochiai outperforms Jaccard

     Jaccard:  s_j = n_11 / (n_11 + n_01 + n_10)
     Ochiai:   s_j = n_11 / √((n_11+n_01)·(n_11+n_10))

  45. 88 Ochiai outperforms Jaccard. None of the following steps modifies the ranking:

         n_11 / √((n_11+n_01)·(n_11+n_10))

     square:

         n_11² / ((n_11+n_01)·(n_11+n_10))

     rewrite the denominator:

         n_11² / (n_11² + n_11·n_10 + n_11·n_01 + n_01·n_10)

     divide numerator and denominator by n_11:

         n_11 / (n_11 + n_10 + n_01 + n_01·n_10/n_11)

  46. 89 Ochiai outperforms Jaccard

     Jaccard:            s_j = n_11 / (n_11 + n_01 + n_10)
     Ochiai (rewritten): s_j = n_11 / (n_11 + n_10 + n_01 + n_01·n_10/n_11)

     The extra term n_01·n_10/n_11 means differences are amplified.
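The claim that the rewriting steps preserve the ranking is easy to check numerically: the rewritten expression is exactly the square of Ochiai, and squaring is monotone on non-negative values. A small sketch over arbitrary example counts (chosen here for illustration):

```python
import math

def ochiai(n11, n10, n01):
    return n11 / math.sqrt((n11 + n01) * (n11 + n10))

def ochiai_rewritten(n11, n10, n01):
    return n11 / (n11 + n10 + n01 + n01 * n10 / n11)

# Arbitrary (n11, n10, n01) counts with n11 > 0.
counts = [(2, 1, 1), (3, 0, 2), (1, 4, 0), (5, 2, 3), (4, 4, 4)]

for c in counts:
    # the rewritten form equals Ochiai squared, term for term
    assert abs(ochiai_rewritten(*c) - ochiai(*c) ** 2) < 1e-12

rank = lambda f: sorted(range(len(counts)),
                        key=lambda i: f(*counts[i]), reverse=True)
assert rank(ochiai) == rank(ochiai_rewritten)  # identical rankings
```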

  47. 90 Quality of the passed/failed information
     • Failure detection is a crude error detection mechanism.
     • q_e = n_11 / (n_11 + n_10)
     • In the Siemens set, q_e ranges from 1.4% on average for schedule2 to 20.3% on average for tot_info.
     • q_e can be increased by excluding a run that contributes to n_10, and decreased by excluding a run that contributes to n_11.

  48. 91 Quality of the passed/failed information. Detecting only a small fraction of the fault activations is enough.

  49. 92 Number of runs
     • On average, for the Siemens set:
       – Adding more failed tests is safe
       – 6 failed tests are enough
       – The number of passed tests has no influence
     • However:
       – For individual runs, the effect of adding passed tests differs
       – It stabilizes around 20 passed tests

  50. 93 Influence of #runs

  51. 94 Influence of #runs
     • On average, for our benchmark:
       – Adding failed runs is safe
       – 6 failed runs are enough
       – The number of passed runs has no influence
     • However:
       – For individual runs, the effect of more passed runs differs
       – It stabilizes around 20

  52. 95 Dependence on Siemens set faults
     • Investigate industrial relevance in the TRADER project: improving the user-perceived reliability of high-volume consumer electronics devices
     • Test case: a television platform from NXP
     • Partners: the universities of Delft, Twente, and Leiden; the Embedded Systems Institute; the Design Technology Institute; IMEC Leuven; NXP (former Philips Semiconductors)

  53. 96 Embedded systems
     • Low overhead, little infrastructure needed
     • Consumer electronics: no time for exhaustive debugging; diagnosis helps to identify the responsible teams/developers
     • Diagnosis can drive a recovery mechanism, e.g., rebooting suspect processes

  54. 97 Case study - platform
     • Control software of an analog TV: decodes remote-control input, displays the on-screen menu and teletext, optimizes parameters for audio/video processing based on signal analysis, etc.
     • 450K lines of C code; 2 MB of RAM, plus 2 MB in the development version
     • CPU: MIPS, running a small multi-tasking OS; work is organized in 315 logical threads
     • UART connection to a PC

  55. 98 Case study 1. Load problem: switching from TV to teletext (TXT) and back to TV.

  56. 99 Diagnosis
     • 150 hit spectra of the 315 functions corresponding to the logical threads (one spectrum per second): 60 s TV, 30 s TXT, 60 s TV
     • The last 60 spectra were marked as failed
     • The faulty function ranked 2nd out of the 315 functions

  57. 100 Case study 2. Teletext lock-up:
     – An existing problem in another product line
     – Copied to our platform, triggered by a remote-control key sequence
     – An inconsistency between two state variables, for which only specific combinations are allowed
