Scent Intensification for Testing & Debugging Rui Abreu

Economic Relevance
• [Embedded] software: exponential increase in LOC
• Despite thorough design/testing, fault density stays constant
• Typically 5-15 bugs/KLOC at 75 min/bug ➤ $4K/KLOC
• Development cost $15-30K/KLOC ➤ 15-25% diagnostic cost
• Residual defects cost the US $60B/year [NIST 2002]
  • an estimated 20% is due to fault diagnosis (downtime, labor)

The birth of debugging: your guess?

• 1840: Software errors mentioned in Ada Byron's notes on Charles Babbage's analytical engine
• 1947: First actual bug and actual debugging: Admiral Grace Hopper's associates working on the Mark II computer at Harvard University
• 1962: UNIVAC 1100's FLIT (Fault Localization by Interpretive Testing)
• 1981: Weiser's breakthrough paper on program slicing. Input: source code and program point
• 1986: Stallman's GDB. Input: faulty program and 1 failed test case
• 1988 and 1993: Korel and Laski's dynamic slicing; Agrawal. Input: source code and failed test case
• 1996: DDD. Input: faulty program and failed test case
• Delta Debugging. Input: faulty program, 1 failed and 1 passed test case
• 2002: Statistical Debugging. Input: faulty program, test suite
• 2007: EZUnit
• 2009: VIDA

A survey paper, citing more than 300 works, is also under review at TSE.

Focus of this talk
• Techniques that take spectra (abstractions of program traces) into account
• Spectrum-based Fault Localization (SFL)
• Statistical vs. reasoning-based
• Lightweight, scalable

SFL: Principle (integrates well with testing)

Components 1-12 of the program are exercised by test suite t1-t5; each cell of the spectrum is shown as not touched, touched by a passing test, or touched by a failing test. After each test, two counters per component are updated: how many failing tests touched it, and how many passing tests touched it.

Step              Counter            1  2  3  4  5  6  7  8  9 10 11 12
(1) no test run   touched, fail      0  0  0  0  0  0  0  0  0  0  0  0
                  touched, pass      0  0  0  0  0  0  0  0  0  0  0  0
(2) t1 fails      touched, fail      1  1  0  0  0  1  1  1  1  0  1  0
(3) t2 fails      touched, fail      2  1  1  0  1  2  2  2  2  1  1  1
(4) t3 passes     touched, pass      1  0  0  1  1  1  1  0  1  0  0  0
(5) t4 fails      touched, fail      3  1  1  0  1  3  3  2  3  1  3  3
(6) t5 passes     touched, pass      2  0  0  2  1  2  2  0  2  1  0  0

(7) Components are ranked according to the likelihood of causing the detected errors.
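The counter updates in the principle slides can be sketched in Java as follows. This is a minimal illustration, not Crowbar/GZoltar code: the class name `SpectrumCounters`, the boolean coverage matrix (one row per test, one column per component), and the 3-test example are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: per-component counters of failing/passing tests that touched it. */
class SpectrumCounters {

    /** counts[0][j] = failing tests touching component j, counts[1][j] = passing tests touching j. */
    static int[][] count(boolean[][] touched, boolean[] failed) {
        int components = touched[0].length;
        int[][] counts = new int[2][components];
        for (int t = 0; t < touched.length; t++) {
            int row = failed[t] ? 0 : 1;   // which counter this test updates
            for (int j = 0; j < components; j++) {
                if (touched[t][j]) {
                    counts[row][j]++;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical spectrum: 3 tests x 4 components (true = component executed).
        boolean[][] touched = {
            {true,  true,  false, false},  // t1
            {true,  false, true,  false},  // t2
            {false, true,  true,  true },  // t3
        };
        boolean[] failed = {true, true, false};  // t1 and t2 fail, t3 passes
        int[][] c = count(touched, failed);
        System.out.println("touched, fail: " + Arrays.toString(c[0]));  // [2, 1, 1, 0]
        System.out.println("touched, pass: " + Arrays.toString(c[1]));  // [0, 1, 1, 1]
    }
}
```

Ranking then only needs these counters (plus the total number of failing tests), which is why SFL adds little overhead on top of an existing test run.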

Example: the Triangle program, with its fault marked, exercised by test suite t1-t6. The comment on each statement is its suspiciousness, computed from the spectra:

```java
class Triangle {
    // …
    static int type(int a, int b, int c) {
        int type = SCALENE;                                  // 0.09998
        if ((a == b) && (b == c))                            // 0.09998
            type = EQUILATERAL;                              // 0.10001
        else if ((a * a) == ((b * b) + (c * c)))             // 0.09999
            type = RIGHT;                                    // 0.10001
        else if ((a == b) || (b == a)) /* FAULT */           // 0.10000
            type = ISOSCELES;                                // 0.10001
        return type;                                         // 0.09998
    }

    static double area(int a, int b, int c) {
        double s = (a + b + c) / 2.0;                        // 0.10000
        return Math.sqrt(s * (s - a) * (s - b) * (s - c));   // 0.10000
    }
    // ...
}
```

Suspiciousness score
• Each component (row) is ranked according to the similarity of its spectrum to the error vector
• Many similarity coefficients exist
• The Ochiai similarity is equivalent to the cosine of the angle between two vectors in an n-dimensional space

Abreu, R., Zoeteweij, P., Golsteijn, R., & Van Gemund, A. J. (2009). A practical evaluation of spectrum-based fault localization. Journal of Systems and Software, 82(11), 1780-1792.
Lucia, L., Lo, D., Jiang, L., Thung, F., & Budi, A. (2014). Extended comprehensive study of association measures for fault localization. Journal of Software: Evolution and Process, 26(2).
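For a component j, let n11 be the number of failing tests that touched j, n10 the number of passing tests that touched j, and n01 the number of failing tests that did not touch j. The Ochiai coefficient is n11 / sqrt((n11 + n01) * (n11 + n10)). Because the component's column and the error vector are both binary, n11 is their dot product, n11 + n01 is the squared norm of the error vector, and n11 + n10 is the squared norm of the column, which is exactly why Ochiai equals the cosine. The sketch below computes it both ways; the class name `Ochiai`, the example vectors, and returning 0 for an undefined (zero) denominator are assumptions.

```java
/** Hypothetical helper: the Ochiai coefficient, computed two ways for binary vectors. */
class Ochiai {

    /** Ochiai from the counters n11, n10, n01 (0 when the denominator is 0). */
    static double fromCounts(int n11, int n10, int n01) {
        double denom = Math.sqrt((double) (n11 + n01) * (n11 + n10));
        return denom == 0 ? 0.0 : n11 / denom;
    }

    /** Cosine of the angle between a component's column and the error vector. */
    static double cosine(int[] column, int[] errors) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < column.length; i++) {
            dot += column[i] * errors[i];
            normA += column[i] * column[i];
            normB += errors[i] * errors[i];
        }
        return (normA == 0 || normB == 0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        int[] column = {0, 0, 1, 1, 0};   // example component column over 5 tests
        int[] errors = {1, 1, 1, 1, 0};   // 1 = test failed
        // Here n11 = 2 (tests 3, 4), n10 = 0, n01 = 2 (tests 1, 2).
        System.out.println(fromCounts(2, 0, 2));    // 2 / sqrt(4 * 2), about 0.7071
        System.out.println(cosine(column, errors)); // same value
    }
}
```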

Diagnostic Performance

Rank  Line  Suspiciousness  Statement
1º    3     0.10001         type = EQUILATERAL;
2º    5     0.10001         type = RIGHT;
3º    7     0.10001         type = ISOSCELES;
4º    6     0.10000         else if ( (a == b) || (b == a) ) /* FAULT */
5º    9     0.10000         double s = (a+b+c)/2.0;
6º    10    0.10000         return Math.sqrt(s*(s-a)*(s-b)*(s-c));
7º    4     0.09999         else if ( (a*a) == ((b*b) + (c*c)) )
8º    1     0.09998         int type = SCALENE;
9º    2     0.09998         if ( (a == b) && (b == c) )
10º   8     0.09998         return type; }

C_d = 4

R. Abreu, P. Zoeteweij, and A. J. van Gemund, "Spectrum-Based Multiple Fault Localization", ASE '09
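A minimal sketch of how the rank-based cost for this example can be computed, assuming C_d counts the statements a developer inspects before reaching the fault and that ties are resolved by averaging (the fault is equally likely to sit anywhere in its tie group). With the scores in the table, three statements score strictly higher than the faulty line 6 and two others tie with it, giving 3 + 2/2 = 4. The class name `DiagnosticCost` is hypothetical.

```java
/** Hypothetical helper: rank-based diagnostic cost with ties averaged. */
class DiagnosticCost {

    /**
     * Number of components inspected before the faulty one, assuming the developer
     * follows the ranking and searches a group of equal scores in random order.
     */
    static double cost(double[] scores, int faultIndex) {
        double faultScore = scores[faultIndex];
        int higher = 0;   // components ranked strictly above the fault
        int tied = 0;     // other components sharing the fault's score
        for (int i = 0; i < scores.length; i++) {
            if (i == faultIndex) continue;
            if (scores[i] > faultScore) higher++;
            else if (scores[i] == faultScore) tied++;
        }
        return higher + tied / 2.0;
    }

    public static void main(String[] args) {
        // Suspiciousness of statements 1..10 from the table.
        double[] scores = {0.09998, 0.09998, 0.10001, 0.09999, 0.10001,
                           0.10000, 0.10001, 0.09998, 0.10000, 0.10000};
        System.out.println(cost(scores, 5));   // faulty statement 6 (index 5): prints 4.0
    }
}
```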

Can we do better?
• Statistics-based SFL does not reason in terms of multiple faults

c1 c2 c3 | P/F
 1  0  0 | 1 (F)
 0  1  0 | 1 (F)
 1  0  1 | 1 (F)
 0  1  1 | 1 (F)
 1  1  0 | 0 (P)

Diagnostic report = <c3, c1, c2>
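The report <c3, c1, c2> can be reproduced with the Ochiai coefficient: c3 is touched only by failing runs (2/sqrt(4·2), about 0.707), whereas c1 and c2 each also appear in the passing run (2/sqrt(4·3), about 0.577). The sketch assumes Ochiai (the slide does not name the coefficient) and breaks ties by component order; `OchiaiRanking` is a hypothetical name. Note that the top-ranked c3 is not part of the explanation that the reasoning-based approach arrives at.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: rank components by Ochiai similarity to the error vector. */
class OchiaiRanking {

    /** Ochiai score per component; spectra[i][j] == 1 if run i touched component j. */
    static double[] scores(int[][] spectra, boolean[] failed) {
        int m = spectra[0].length;
        double[] s = new double[m];
        for (int j = 0; j < m; j++) {
            int n11 = 0, n10 = 0, n01 = 0;
            for (int i = 0; i < spectra.length; i++) {
                boolean touched = spectra[i][j] == 1;
                if (touched && failed[i]) n11++;
                else if (touched) n10++;
                else if (failed[i]) n01++;
            }
            double denom = Math.sqrt((double) (n11 + n01) * (n11 + n10));
            s[j] = denom == 0 ? 0.0 : n11 / denom;
        }
        return s;
    }

    /** Component indices by decreasing score; List.sort is stable, so ties keep index order. */
    static List<Integer> rank(double[] s) {
        List<Integer> order = new ArrayList<>();
        for (int j = 0; j < s.length; j++) order.add(j);
        order.sort(Comparator.comparingDouble((Integer j) -> s[j]).reversed());
        return order;
    }

    public static void main(String[] args) {
        int[][] spectra = {{1, 0, 0}, {0, 1, 0}, {1, 0, 1}, {0, 1, 1}, {1, 1, 0}};
        boolean[] failed = {true, true, true, true, false};
        double[] s = scores(spectra, failed);
        for (int j : rank(s)) {
            System.out.printf(Locale.ROOT, "c%d %.3f%n", j + 1, s[j]);   // c3 0.707, c1 0.577, c2 0.577
        }
    }
}
```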

Reasoning-based Approach
• Barinel is a reasoning-based approach
• Integrates the best of model-based diagnosis with spectra

c1 c2 c3 | P/F
 1  0  0 | 1 (F)
 0  1  0 | 1 (F)
 1  0  1 | 1 (F)
 0  1  1 | 1 (F)
 1  1  0 | 0 (P)

From the first failed run, which involves only c1:
• c1 must be faulty
• c2 cannot be a single fault
• c3 cannot be a single fault
• {c2, c3} cannot be a double fault

Reasoning-based Approach
From the second failed run, which involves only c2 (same spectra as above):
• c2 must be faulty
• c1 cannot be a single fault
• c3 cannot be a single fault
• {c1, c3} cannot be a double fault

Reasoning-based Approach
• Barinel is a spectrum-based reasoning approach

Summary:
• c1 and c2 must both be faulty, so neither alone is a valid single-fault diagnosis
• {c1, c2} can be a double fault
• Neither {c1, c3} nor {c2, c3} can be a double fault
• So {c1, c2} is the only possible diagnosis (it subsumes the triple fault {c1, c2, c3})
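The reasoning above amounts to finding the minimal sets of components that intersect (hit) the component set of every failed run; passed runs play no role in this step. The sketch below enumerates candidates by brute force in order of increasing size, which is exponential in the number of components and only meant for toy examples like this one; it is a stand-in, not Staccato or MHS2. The class name `MinimalHittingSets` and the bitmask encoding (bit j stands for component c(j+1)) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Naive minimal-hitting-set enumeration over failed runs (exponential; toy sizes only). */
class MinimalHittingSets {

    /** Minimal candidates as bitmasks; bit j set means component c(j+1) is in the set. */
    static List<Integer> diagnoses(int[][] spectra, boolean[] failed) {
        int m = spectra[0].length;
        List<Integer> minimal = new ArrayList<>();
        // Visit candidates by increasing size, so any superset of a found diagnosis is skipped.
        for (int size = 1; size <= m; size++) {
            for (int mask = 1; mask < (1 << m); mask++) {
                if (Integer.bitCount(mask) != size || !hitsAll(mask, spectra, failed)) continue;
                boolean subsumed = false;
                for (int d : minimal) {
                    if ((d & mask) == d) { subsumed = true; break; }
                }
                if (!subsumed) minimal.add(mask);
            }
        }
        return minimal;
    }

    /** True if the candidate shares at least one component with every failed run. */
    static boolean hitsAll(int mask, int[][] spectra, boolean[] failed) {
        for (int i = 0; i < spectra.length; i++) {
            if (!failed[i]) continue;
            boolean hit = false;
            for (int j = 0; j < spectra[i].length; j++) {
                if (spectra[i][j] == 1 && (mask & (1 << j)) != 0) { hit = true; break; }
            }
            if (!hit) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int[][] spectra = {{1, 0, 0}, {0, 1, 0}, {1, 0, 1}, {0, 1, 1}, {1, 1, 0}};
        boolean[] failed = {true, true, true, true, false};
        for (int d : diagnoses(spectra, failed)) {
            List<String> names = new ArrayList<>();
            for (int j = 0; j < spectra[0].length; j++) {
                if ((d & (1 << j)) != 0) names.add("c" + (j + 1));
            }
            System.out.println(names);   // prints [c1, c2]
        }
    }
}
```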

Spectrum-based reasoning
1. Generate sets of components that explain the observed erroneous behavior
   • Equivalent to computing minimal hitting sets (Staccato/MHS2**)
   • Given failed executions
2. Rank candidates according to their probability of being the true fault explanation ➤ Bayes' rule
   • Given both passed and failed executions

R. Abreu, P. Zoeteweij, and A. J. van Gemund, "Spectrum-Based Multiple Fault Localization", ASE '09
**https://github.com/npcardoso/MHS2 (citable via https://zenodo.org/record/10037) ➤ contribute to the project; send pull requests; email us!
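Step 2 can be illustrated with a deliberately simplified Bayesian model. Assumptions, not Barinel's exact formulation: each component is faulty a priori with probability p, and a faulty component behaves correctly with one fixed probability g, so a run that executes k components of candidate d passes with probability g^k and fails with probability 1 - g^k. Barinel instead estimates such values per component. For the example spectrum, the hitting-set step yields only {c1, c2}; the superset {c1, c2, c3} is scored here purely to show how the prior and the observations weigh a larger candidate. With the assumed p = 0.01 and g = 0.5, the two unnormalised posteriors stand in a 44:1 ratio. The class name `BayesRanking` is hypothetical.

```java
import java.util.Locale;

/**
 * Hypothetical sketch of Bayesian candidate ranking with a fixed goodness g for
 * every component (a simplification of Barinel's per-component estimates).
 */
class BayesRanking {

    /** Unnormalised posterior Pr(d) * Pr(observations | d); bit j of d stands for c(j+1). */
    static double score(int d, int[][] spectra, boolean[] failed, double p, double g) {
        int m = spectra[0].length;
        int size = Integer.bitCount(d);
        // Prior: every component is independently faulty with probability p.
        double posterior = Math.pow(p, size) * Math.pow(1 - p, m - size);
        for (int i = 0; i < spectra.length; i++) {
            int involved = 0;   // components of d executed by run i
            for (int j = 0; j < m; j++) {
                if (spectra[i][j] == 1 && (d & (1 << j)) != 0) involved++;
            }
            double pass = Math.pow(g, involved);   // all involved faulty components behaved
            posterior *= failed[i] ? 1 - pass : pass;
        }
        return posterior;
    }

    public static void main(String[] args) {
        int[][] spectra = {{1, 0, 0}, {0, 1, 0}, {1, 0, 1}, {0, 1, 1}, {1, 1, 0}};
        boolean[] failed = {true, true, true, true, false};
        int[] candidates = {0b011, 0b111};
        String[] labels = {"{c1, c2}", "{c1, c2, c3}"};
        double[] s = new double[candidates.length];
        double total = 0;
        for (int k = 0; k < candidates.length; k++) {
            s[k] = score(candidates[k], spectra, failed, 0.01, 0.5);
            total += s[k];
        }
        for (int k = 0; k < candidates.length; k++) {
            // Normalised over the listed candidates only: 44/45 vs. 1/45.
            System.out.printf(Locale.ROOT, "%s %.4f%n", labels[k], s[k] / total);
        }
    }
}
```

A candidate that leaves some failed run unexplained gets probability 0, because that run would have to pass with certainty; this is how the consistency reasoning of step 1 reappears inside the Bayesian update.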

Diagnostic Performance
[Figure: % of faulty versions (y-axis, 0-100) vs. effort, the % of the program to be examined to find the fault (x-axis, 0-100), with curves for the worst technique and the ideal technique]
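The curves are most naturally read as cumulative distributions: for an effort level x on the horizontal axis, the vertical value is the percentage of faulty versions whose fault is reached after examining at most x% of the program. An ideal technique therefore climbs to 100% at almost no effort, while the worst one reaches it only at full effort. The sketch below computes such points; the per-version effort values and the class name `EffortCurve` are hypothetical.

```java
import java.util.Locale;

/** Hypothetical sketch: cumulative "% of faulty versions" curve from per-version effort. */
class EffortCurve {

    /** Percentage of versions whose fault is reached within the given effort budget (%). */
    static double percentFound(double[] effortPerVersion, double effortBudget) {
        int found = 0;
        for (double e : effortPerVersion) {
            if (e <= effortBudget) found++;
        }
        return 100.0 * found / effortPerVersion.length;
    }

    public static void main(String[] args) {
        // Hypothetical: % of the program examined before reaching each version's fault.
        double[] effort = {2.0, 5.0, 12.0, 30.0};
        for (int x = 0; x <= 100; x += 20) {
            System.out.printf(Locale.ROOT, "%3d%% effort -> %.1f%% of versions%n",
                    x, percentFound(effort, x));
        }
    }
}
```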

[Figure: % of faulty versions vs. effort (% of program to be examined to find the fault), comparing Intersection, Union, NN, DD, Tarantula, Ochiai, Sober, CrossTab, PPDG, and Barinel]

No similarity coefficient is statistically significantly better!

How good are we?
• The best-performing techniques still require inspecting 10% of the code…
  • 100 LOC ➤ 10 LOC
  • 10,000 LOC ➤ 1,000 LOC
  • 1,000,000 LOC ➤ 100,000 LOC

Case Studies (NXP)

Case               To inspect            Out of / previous
Load Problem       2 logical threads     315
Teletext Lock-Up   2 blocks              60K
NVM corrupt        96 blocks, 10 files   150K, 1.8K
Scrolling Bug      5 blocks              150K
Invisible Pages    12 blocks             150K
Tuner Problem      2 files               1.8K
Zapping Crash      1 run (15 mins)       1 day (develop)
Wrong Audio        1 run (15 mins)       ½ day (expert)

Hmm…
• Are we properly quantifying diagnostic accuracy?
  • Comparing techniques based on the rankings
  • Assuming perfect bug understanding
• Are we providing an ecosystem offering these techniques?

Human Studies
Parnin & Orso (ISSTA'11) observed that there is a lack of human studies!

Crowbar (http://www.crowbar.io), previously known as GZoltar

Visualizations
