Scent Intensification for Testing & Debugging Rui Abreu
Economic Relevance
[Embedded] Software
• Exponential increase in LOC
• Despite thorough design / testing, constant fault density
• Typically 5-15 bugs / KLOC, 75 min / bug ➤ $4K / KLOC
• Development cost $15-30K / KLOC ➤ 15-25% diagnostic cost
• Residual defects cost US $60B/year [NIST 2002]
• Estimated 20% due to fault diagnosis (downtime, labor)
The birth of debugging: your guess?
Software errors mentioned in Ada Byron’s notes on Charles Babbage’s Analytical Engine [Timeline: 1840]
First actual bug and actual debugging: Admiral Grace Hopper’s associates working on the Mark II computer at Harvard University [Timeline: 1947]
UNIVAC 1100’s FLIT - Fault Localization by Interpretive Testing [Timeline: 1962]
Weiser’s breakthrough paper. Input: source code and program point [Timeline: 1981]
Stallman’s GDB. Input: faulty program and 1 failed test case [Timeline: 1986]
Korel and Laski’s dynamic slicing; Agrawal. Input: source code and failed test case [Timeline: 1988, 1993]
DDD. Input: faulty program and failed test case
Delta Debugging. Input: faulty program, 1 failed and 1 passed test case [Timeline: 1996]
Statistical Debugging. Input: faulty program, test suite [Timeline: 2002]
EzUnit [Timeline: 2007]
VIDA [Timeline: 2009]
[Timeline: 2011/12]
A survey paper is also under review at TSE, citing more than 300 works.
Focus of this talk
• Techniques that take into account spectra
• aka abstractions of program traces
• Spectrum-based Fault Localization (SFL)
• Statistical vs. reasoning
• Lightweight, scalable
SFL: Principle (1)-(7)
Integrates well with testing
[Figure, built up over seven slides: a program with 12 components and a test suite t1-t5. Components are marked as not touched / touched, pass / touched, fail. Two counters per component record how many failing and how many passing tests touched it.]

Slide  Tests run so far                    Counter   1  2  3  4  5  6  7  8  9 10 11 12
(1)    none                                fail      0  0  0  0  0  0  0  0  0  0  0  0
                                           pass      0  0  0  0  0  0  0  0  0  0  0  0
(2)    t1 fail                             fail      1  1  0  0  0  1  1  1  1  0  1  0
                                           pass      0  0  0  0  0  0  0  0  0  0  0  0
(3)    t1 fail, t2 fail                    fail      2  1  1  0  1  2  2  2  2  1  1  1
                                           pass      0  0  0  0  0  0  0  0  0  0  0  0
(4)    t1 fail, t2 fail, t3 pass           fail      2  1  1  0  1  2  2  2  2  1  1  1
                                           pass      1  0  0  1  1  1  1  0  1  0  0  0
(5)    t1-t2 fail, t3 pass, t4 fail        fail      3  1  1  0  1  3  3  2  3  1  3  3
                                           pass      1  0  0  1  1  1  1  0  1  0  0  0
(6)    t1-t2 fail, t3 pass, t4 fail,       fail      3  1  1  0  1  3  3  2  3  1  3  3
       t5 pass                             pass      2  0  0  2  1  2  2  0  2  1  0  0
(7)    as (6)                              fail      3  1  1  0  1  3  3  2  3  1  3  3
                                           pass      2  0  0  2  1  2  2  0  2  1  0  0

Components are ranked according to the likelihood of causing detected errors.
Program, Test Suite, Suspiciousness
[Figure: class Triangle, its hit spectra over the test suite t1-t6, and the per-statement suspiciousness scores]

Line  Statement                                                  Suspiciousness
      class Triangle { …
      static int type(int a, int b, int c) {
1       int type = SCALENE;                                      0.09998
2       if ( (a == b) && (b == c) )                              0.09998
3         type = EQUILATERAL;                                    0.10001
4       else if ( (a*a) == ((b*b) + (c*c)) )                     0.09999
5         type = RIGHT;                                          0.10001
6       else if ( (a == b) || (b == a) ) /* FAULT */             0.10000
7         type = ISOSCELES;                                      0.10001
8       return type; }                                           0.09998
      static double area(int a, int b, int c) {
9       double s = (a+b+c)/2.0;                                  0.10000
10      return Math.sqrt(s*(s-a)*(s-b)*(s-c)); }                 0.10000
      ... }
Suspiciousness score
• Each component (row) is ranked according to its similarity to the error vector
• Many similarity coefficients exist.
• Ochiai similarity is equivalent to the cosine of the angle between two vectors in an n-dimensional space
Abreu, R., Zoeteweij, P., Golsteijn, R., & Van Gemund, A. J. (2009). A practical evaluation of spectrum-based fault localization. Journal of Systems and Software, 82(11), 1780-1792.
Lucia, L., Lo, D., Jiang, L., Thung, F., & Budi, A. (2014). Extended comprehensive study of association measures for fault localization. Journal of Software: Evolution and Process, 26(2).
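The cosine equivalence can be written out as follows. The notation is an assumption made for this illustration: a_j is the binary hit vector of component j over all tests, e is the binary error vector, and n_11(j), n_10(j), n_01(j) count the failing tests that touch j, the passing tests that touch j, and the failing tests that do not touch j.

```latex
s_O(j)
  = \frac{n_{11}(j)}{\sqrt{\bigl(n_{11}(j)+n_{01}(j)\bigr)\,\bigl(n_{11}(j)+n_{10}(j)\bigr)}}
  = \frac{a_j \cdot e}{\lVert a_j \rVert \, \lVert e \rVert}
  = \cos\angle(a_j, e),
\qquad\text{since}\quad
a_j \cdot e = n_{11}(j),\;
\lVert a_j \rVert^2 = n_{11}(j)+n_{10}(j),\;
\lVert e \rVert^2 = n_{11}(j)+n_{01}(j).
```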
Diagnostic Performance

Rank  Suspicious statement                               Line number  Suspiciousness
1     type = EQUILATERAL;                                3            0.10001
2     type = RIGHT;                                      5            0.10001
3     type = ISOSCELES;                                  7            0.10001
4     else if ( (a == b) || (b == a) ) /* FAULT */       6            0.10000
5     double s = (a+b+c)/2.0;                            9            0.10000
6     return Math.sqrt(s*(s-a)*(s-b)*(s-c));             10           0.10000
7     else if ( (a*a) == ((b*b) + (c*c)) )               4            0.09999
8     int type = SCALENE;                                1            0.09998
9     if ( (a == b) && (b == c) )                        2            0.09998
10    return type; }                                     8            0.09998

Position: C_d = 4
R. Abreu, P. Zoeteweij, and A. J. van Gemund, “Spectrum-Based Multiple Fault Localization”, ASE ’09
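A minimal sketch of how the fault's position can be read off such a ranking. The function name `fault_position` and the two tie policies are assumptions made for this illustration, not the metric definition used in the cited paper. The scores are the ones listed on the slide, keyed by line number.

```python
def fault_position(scores, faulty, tie_policy="best"):
    """Return the 1-based rank of `faulty` in the ranking induced by `scores`.

    tie_policy="best":  the fault is inspected first among equally scored lines.
    tie_policy="worst": the fault is inspected last among equally scored lines.
    (Both policies are illustrative assumptions.)
    """
    target = scores[faulty]
    higher = sum(1 for s in scores.values() if s > target)
    tied = sum(1 for s in scores.values() if s == target)  # includes the fault
    if tie_policy == "best":
        return higher + 1
    if tie_policy == "worst":
        return higher + tied
    raise ValueError(f"unknown tie policy: {tie_policy}")


# Suspiciousness per line number, copied from the slide.
scores = {
    1: 0.09998, 2: 0.09998, 3: 0.10001, 4: 0.09999, 5: 0.10001,
    6: 0.10000, 7: 0.10001, 8: 0.09998, 9: 0.10000, 10: 0.10000,
}

best = fault_position(scores, faulty=6, tie_policy="best")
worst = fault_position(scores, faulty=6, tie_policy="worst")
effort_pct = 100.0 * best / len(scores)  # share of ranked lines inspected
print(best, worst, effort_pct)
```

With best-case tie-breaking the faulty line 6 lands at position 4, which matches the position shown on the slide. The worst-case policy pushes it to position 6 because lines 9 and 10 share its score.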
Can we do better?
• Statistics-based SFL does not reason in terms of multiple faults

c1  c2  c3  P/F
1   0   0   1 (F)
0   1   0   1 (F)
1   0   1   1 (F)
0   1   1   1 (F)
1   1   0   0 (P)

Diagnostic report = <c3, c1, c2>
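The report above can be reproduced with a small sketch that scores each component with the Ochiai coefficient. Using Ochiai here is an assumption, since the slide does not say which coefficient produced the report. The names `activity`, `errors`, and `ochiai_scores` are hypothetical.

```python
import math

# Rows are test runs, columns are components c1..c3 (1 = touched).
activity = [
    [1, 0, 0],
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
]
errors = [1, 1, 1, 1, 0]  # 1 = failed run, 0 = passed run
names = ["c1", "c2", "c3"]


def ochiai_scores(activity, errors):
    """Ochiai similarity between each component column and the error vector."""
    total_failed = sum(errors)
    scores = []
    for j in range(len(activity[0])):
        n_ef = sum(1 for row, err in zip(activity, errors) if row[j] and err)
        n_ep = sum(1 for row, err in zip(activity, errors) if row[j] and not err)
        denom = math.sqrt(total_failed * (n_ef + n_ep))
        scores.append(n_ef / denom if denom else 0.0)
    return scores


scores = dict(zip(names, ochiai_scores(activity, errors)))
# Stable sort: equally scored components keep their original order.
ranking = sorted(names, key=lambda c: -scores[c])
print(ranking)
```

c3 comes first because it is touched only by failing runs, while c1 and c2 tie behind it. Each component is scored on its own, so the ranking alone gives no hint that c1 and c2 together explain every failure.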
Reasoning-based Approach
• Barinel is a reasoning-based approach
• Integrates the best of model-based diagnosis with spectra

c1  c2  c3  P/F
1   0   0   1 (F)
0   1   0   1 (F)
1   0   1   1 (F)
0   1   1   1 (F)
1   1   0   0 (P)

From the first failed run (only c1 touched):
• c1 must be faulty
• c2 cannot be single fault
• c3 cannot be single fault
• c2, c3 cannot be double fault
Reasoning-based Approach
• Barinel is a reasoning-based approach
• Integrates the best of model-based diagnosis with spectra

c1  c2  c3  P/F
1   0   0   1 (F)
0   1   0   1 (F)
1   0   1   1 (F)
0   1   1   1 (F)
1   1   0   0 (P)

From the second failed run (only c2 touched):
• c2 must be faulty
• c1 cannot be single fault
• c3 cannot be single fault
• c1, c3 cannot be double fault
Reasoning-based Approach
• Barinel is a spectrum-based reasoning approach
• Integrates the best of model-based diagnosis with spectra

c1  c2  c3  P/F
1   0   0   1 (F)
0   1   0   1 (F)
1   0   1   1 (F)
0   1   1   1 (F)
1   1   0   0 (P)

Summary:
• c1 and c2 are faulty, but neither is a single fault
• {c1, c2} can be a double fault
• Neither {c1, c3} nor {c2, c3} can be a double fault
• So {c1, c2} is the only possible diagnosis (subsuming the triple fault {c1, c2, c3})
Spectrum-based reasoning
1. Generate sets of components that explain the observed erroneous behavior
• Equivalent to computing minimal hitting sets (Staccato/MHS2**)
• Given failed executions
2. Rank candidates according to their probability of being the true fault explanation ➤ Bayes’ rule
• Given both passed and failed executions
R. Abreu, P. Zoeteweij, and A. J. van Gemund, “Spectrum-Based Multiple Fault Localization”, ASE ’09
**https://github.com/npcardoso/MHS2 (citable via https://zenodo.org/record/10037) ➤ contribute to the project; send pull requests; email us!
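Step 1 can be illustrated with a brute-force sketch on the three-component example. This is an exhaustive enumeration written for clarity, not the Staccato or MHS2 algorithm, and it scales exponentially. The names `minimal_hitting_sets`, `activity`, and `errors` are hypothetical.

```python
from itertools import combinations


def minimal_hitting_sets(conflicts):
    """Enumerate all minimal sets that intersect every conflict set.

    Candidates are tried in order of increasing size, so any candidate that
    contains an already accepted set cannot be minimal and is skipped.
    """
    universe = sorted(set().union(*conflicts))
    found = []
    for size in range(1, len(universe) + 1):
        for candidate in combinations(universe, size):
            cand = set(candidate)
            if any(prev <= cand for prev in found):
                continue  # superset of a smaller hitting set
            if all(cand & conflict for conflict in conflicts):
                found.append(cand)
    return found


# Same spectra as in the example: rows are runs, columns are c1..c3.
activity = [
    [1, 0, 0],
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
]
errors = [1, 1, 1, 1, 0]
names = ["c1", "c2", "c3"]

# Each failed run yields a conflict: at least one touched component is faulty.
conflicts = [
    {names[j] for j, hit in enumerate(row) if hit}
    for row, err in zip(activity, errors)
    if err
]
diagnoses = minimal_hitting_sets(conflicts)
print(diagnoses)
```

Only failed runs contribute conflicts, which matches "given failed executions" in step 1. On this example the single minimal candidate is {c1, c2}, the same diagnosis reached by the manual reasoning.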
Diagnostic Performance
[Plot: % of faulty versions (y-axis, 0-100) against effort, the % of program to be examined to find the fault (x-axis, 0-100). Curves: worst technique, ideal technique.]
[Plot: % of faulty versions (y-axis, 0-100) against effort, the % of program to be examined to find the fault (x-axis, 0-100). Techniques compared: Intersection, Union, NN, DD, Tarantula, Ochiai, Sober, CrossTab, PPDG, Barinel.]
No similarity coefficient is statistically significantly better!
How good are we?
• The best-performing techniques still require inspecting 10% of the code…
• 100 LOC ➤ 10 LOC
• 10,000 LOC ➤ 1,000 LOC
• 1,000,000 LOC ➤ 100,000 LOC
Case Studies (NXP)

Case              To Inspect            Out of / Previous
Load Problem      2 logical threads     315
Teletext Lock-Up  2 blocks              60K
NVM corrupt       96 blocks, 10 files   150K, 1.8K
Scrolling Bug     5 blocks              150K
Invisible Pages   12 blocks             150K
Tuner Problem     2 files               1.8K
Zapping Crash     1 run (15 mins)       1 day (develop)
Wrong Audio       1 run (15 mins)       ½ day (expert)
Hmm…
• Are we properly quantifying diagnostic accuracy?
• Comparing techniques based on the rankings
• Assuming perfect bug understanding
• Are we providing an ecosystem that offers these techniques?
Human Studies
Parnin and Orso observed that there is a lack of human studies (ISSTA ’11).
Crowbar (previously known as GZoltar): http://www.crowbar.io
Visualizations