Evaluating and Improving Fault Localization Spencer Pearson - PowerPoint PPT Presentation


  1. Evaluating and Improving Fault Localization
     Spencer Pearson, Michael Ernst
     Outline: Motivation, Black-box model, Approaches, Evaluation, Artificial vs. real faults, Failure modes, Design space, New techniques, Summary, Spectrum, Mutant ..., Evaluation, Replication, What matters?

  2. Debugging is expensive
     Your program has a bug. What do you do?
     ● Reproduce it
     ● Locate it (focus of this talk)
     ● Fix it

  3. Fault localization as a black box
     Inputs: passing tests, failing tests, and the program:
       c = foo; u = bar(); while (c < u) c = c.baz(); return c;
     Fault localization tool → line ranking:
       (1) u = bar();
       (2) c = c.baz();
       (3) c = foo;
       (4) while (c < u)
       (5) return c;

  4. Agenda
     ● Spectrum-based and mutant-based fault localization
     ● Evaluating fault localization techniques
     ● Fault provenance: are artificial faults good proxies for real faults? No!
       ➢ Why not?
       ➢ What matters on real faults, then?
       ➢ Doing better

  5. Let’s design a FL technique!
       if (unflushedValues > 0) {
       if (index >= 0 && !this.allowDuplicateXValues) {
           XYDataItem existing = (XYDataItem) this.data.get(index);
           try {
               overwritten = (XYDataItem) existing.clone();
           } catch (CloneNotSupportedException e) {
               throw new SeriesException("Couldn't clone XYDataItem!");
           }
           existing.setY(y);
       }
       ...
     Each O marks a test that executes the statement.
     More failing-test Os ⇒ more suspicious
     More passing-test Os ⇒ less suspicious

  6. Let’s design a FL technique!
     For each statement: test counts (#) and weighting factors → λ → suspiciousness → sort → ranked lines
       Line#  Susp.
       1      0.2
       2      0.5
       3      0.0
       ...    ...
     Sorted Line#: 7, 6, 2, ...

  7. There are many variants on spectrum-based FL:
     ● Ochiai [1]
     ● Tarantula [2]
     ● D* [3]
     [1] R. Abreu, P. Zoeteweij, and A. J. C. van Gemund. An evaluation of similarity coefficients for software fault localization.
     [2] J. Jones, M. J. Harrold, and J. Stasko. Visualization of test information to assist fault localization.
     [3] W. E. Wong, V. Debroy, R. Gao, and Y. Li. The DStar method for effective software fault localization.
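
The three techniques named on this slide each turn per-statement test counts into a suspiciousness value. The following is a minimal sketch using the formulas as they are commonly stated for Ochiai, Tarantula, and D*; it is not code from the talk. The function names, the `(ef, ep)` convention (number of failing and passing tests that execute a statement), and the sample counts are assumptions made for illustration.

```python
import math


def ochiai(ef, ep, total_failed):
    """Ochiai: ef / sqrt(total_failed * (ef + ep))."""
    denom = math.sqrt(total_failed * (ef + ep))
    return ef / denom if denom > 0 else 0.0


def tarantula(ef, ep, total_failed, total_passed):
    """Tarantula: failing ratio divided by (failing ratio + passing ratio)."""
    fail_ratio = ef / total_failed if total_failed else 0.0
    pass_ratio = ep / total_passed if total_passed else 0.0
    denom = fail_ratio + pass_ratio
    return fail_ratio / denom if denom > 0 else 0.0


def dstar(ef, ep, total_failed, star=2):
    """D*: ef**star / (ep + nf), where nf = failing tests that skip the statement."""
    denom = ep + (total_failed - ef)
    if denom == 0:
        # Executed by every failing test and by no passing test.
        return float("inf") if ef > 0 else 0.0
    return ef ** star / denom


def rank_statements(counts, formula):
    """Sort line numbers from most to least suspicious (ties broken by line number)."""
    susp = {line: formula(ef, ep) for line, (ef, ep) in counts.items()}
    return sorted(susp, key=lambda line: (-susp[line], line))


# Hypothetical counts: line -> (failing tests covering it, passing tests covering it)
counts = {1: (2, 3), 2: (2, 0), 3: (0, 2)}
TOTAL_FAILED, TOTAL_PASSED = 2, 3

ranking = rank_statements(counts, lambda ef, ep: ochiai(ef, ep, TOTAL_FAILED))
print(ranking)
```

In this made-up example, line 2 is executed by both failing tests and by no passing test, so all three formulas rank it first.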

  8. Another approach to FL: “mutation-based”
     Original program:
       def f(arg):
           if arg in cache:
               return cache[arg]
           ...
           cache[arg] = (start+stop)/2
           cache.sync()
           return (start+stop+1)/2
     Mutants (each changes one line of the original):
       Mutants of "if arg in cache:":
         if arg not in cache:
         if None in cache:
         if arg in None:
       Mutants of "cache[arg] = (start+stop)/2":
         cache[arg] = (start-stop)/2
         cache[arg] = (start+stop)*2
         cache[arg] = (start+stop)+2
       Mutants of "return (start+stop+1)/2":
         return (start+stop+0)/2
         return (start/stop+1)/2
         return (start-stop+1)/2
     More failing tests affected by a line's mutants ⇒ more suspicious
     More passing tests affected by a line's mutants ⇒ less suspicious

  9. Another approach to FL: “mutation-based”
     For each mutant: test counts (#) and weighting factors → λ → per-mutant suspiciousness → collect → per-line suspiciousness → sort → ranked lines
       Mut#  Susp.      Line#  Susp.
       1     0.1        1      0.2
       2     0.6        2      0.5
       3     0.1        3      0.0
       ...   ...        ...    ...
     Sorted Line#: 7, 6, 2, ...
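
The "collect" step on this slide folds per-mutant suspiciousness into per-line suspiciousness before sorting. A minimal sketch of that step follows, assuming each mutant is tagged with the line it mutates. The helper names, the mutant-to-line mapping, and the choice of `max` or mean as the aggregation are illustrative assumptions, not details given in the talk.

```python
from collections import defaultdict
from statistics import mean


def collect_by_line(mutant_susp, mutant_line, aggregate=max):
    """Group per-mutant scores by mutated line and fold each group into one score."""
    per_line = defaultdict(list)
    for mutant_id, susp in mutant_susp.items():
        per_line[mutant_line[mutant_id]].append(susp)
    return {line: aggregate(scores) for line, scores in per_line.items()}


def rank_lines(line_susp):
    """Sort lines from most to least suspicious (ties broken by line number)."""
    return sorted(line_susp, key=lambda line: (-line_susp[line], line))


# Per-mutant scores as on the slide; the mutant -> line mapping is hypothetical.
mutant_susp = {1: 0.1, 2: 0.6, 3: 0.1}
mutant_line = {1: 1, 2: 2, 3: 2}

by_max = collect_by_line(mutant_susp, mutant_line, aggregate=max)
by_mean = collect_by_line(mutant_susp, mutant_line, aggregate=mean)
print(rank_lines(by_max), by_max)
print(rank_lines(by_mean), by_mean)
```

Swapping the `aggregate` argument is enough to explore different collect strategies without touching the ranking code.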

  10. There are few variants on mutation-based FL:
      ● Metallaxis [1]
      ● MUSE [2]
      Pipeline steps shown: collect, λ
      [1] M. Papadakis and Y. Le Traon. Metallaxis-FL: Mutation-based fault localization.
      [2] S. Moon, Y. Kim, M. Kim, and S. Yoo. Ask the mutants: Mutating faulty programs for fault localization.

  11. How do you tell whether a FL technique is good?
      ● Start from a set of defects, each with: Program + Tests + Defect knowledge
      ● Run each FL technique on the program with its passing and failing tests to get a line ranking, e.g.:
          (1) c = bar();
          (2) c = c.baz();
          (3) u = foo;
          (4) while (c < u)
          ...
      ● Find the defect in the ranking and compute a score (smaller = better), e.g. 4/90 or 3/5
      ● Average the scores for each technique: 0.04, 0.05, 0.01
      The technique with the lowest average (blue on the slide) is the best FL technique
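
The scoring described on this slide can be sketched as follows. This is a minimal sketch that assumes the score is the position of the first defective line in the ranking divided by the number of ranked lines, which matches the slide's 4/90 and 3/5 examples; the function names and the sample rankings are hypothetical.

```python
def localization_score(ranking, defective_lines):
    """Position of the first defective line in the ranking, divided by ranking length."""
    for position, line in enumerate(ranking, start=1):
        if line in defective_lines:
            return position / len(ranking)
    raise ValueError("no defective line appears in the ranking")


def mean_score(results):
    """Average score of one technique over many (ranking, defective_lines) pairs."""
    scores = [localization_score(ranking, defect) for ranking, defect in results]
    return sum(scores) / len(scores)


# Hypothetical results for one technique on two defects.
results = [
    ([3, 1, 4, 2, 5], {4}),      # defect found at position 3 of 5
    (list(range(1, 91)), {4}),   # defect found at position 4 of 90
]
print(mean_score(results))
```

A smaller mean is better, so comparing techniques reduces to comparing their `mean_score` values over the same set of defects.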

  12. How do you get defect information for evaluation?
      ● Artificial faults (mutants); used by previous research
        + Easy to make lots of faults
        + Easy to reason about
        - Not necessarily realistic
      ● Real faults (from issue trackers); provided by the recent project Defects4J [1]
        - Hard to collect; fewer faults
        - Diverse and complicated
        + Reflect real-world use cases
      [Diagram: code fragments: int x; int sum; int iters; sum = xs[0]; ...]
      [1] Just et al. "Defects4J: A database of existing faults to enable controlled testing studies for Java programs." ISSTA 2014 Proceedings. ACM, 2014.

  13. Are artificial faults good substitutes for real faults?
      A FL technique that does well on artificial faults may do badly on real ones!
      We:
      ● generated many artificial faults by mutating fixed statements
      ● repeated previous comparisons (SBFL-SBFL and MBFL-SBFL)
        ○ on artificial faults
        ○ on real faults
      Do the same techniques win on both? No!

  14. Are artificial faults good substitutes for real faults? (No!)
      [Figure: panels "Artificial faults" and "Real faults"; axis annotation: "better"]

  15. Why the difference?
      ● Real faults often involve unmutatable lines (e.g. break, return)
      ● MBFL does very well on “reversible” artificial faults:
        sum = sum + x  → (create fault) →  sum = sum - x  → (mutate) →  sum = sum + x

  16. Common structure
      For each mutant: test counts (#) and weighting factors → λ → suspiciousness → sort → ranked lines
        Line#  Susp.
        1      0.2
        2      0.5
        3      0.0
        ...    ...
      Sorted Line#: 7, 6, 2, ...

  17. Common structure
      For each mutant: test counts (#) and weighting factors → λ → per-mutant suspiciousness → collect → per-line suspiciousness → sort → ranked lines
        Mut#  Susp.      Line#  Susp.
        1     0.1        1      0.2
        2     0.6        2      0.5
        3     0.1        3      0.0
        ...   ...        ...    ...
      Sorted Line#: 7, 6, 2, ...

  18. Common structure
      For each element: test counts (#) and weighting factors → λ → per-element suspiciousness → collect (identity for SBFL) → per-line suspiciousness → sort → ranked lines
        Elem#  Susp.     Line#  Susp.
        1      ...       1      0.2
        2      ...       2      0.5
        3      ...       3      0.0
        ...    ...       ...    ...
      Sorted Line#: 7, 6, 2, ...
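
One way to read this slide is that SBFL and MBFL are two instantiations of a single pipeline: score each element, collect the scores per line, then sort. The sketch below illustrates that reading only. The `localize` function, its parameters, the use of `max` as the collect step, and the mutant-to-line mapping are assumptions, not the authors' implementation.

```python
def localize(elements, score, line_of, collect=max):
    """Score every element, collect the scores per line, and sort lines by suspiciousness."""
    by_line = {}
    for element in elements:
        by_line.setdefault(line_of(element), []).append(score(element))
    susp = {line: collect(scores) for line, scores in by_line.items()}
    return sorted(susp, key=lambda line: (-susp[line], line))


# SBFL: the elements are the lines themselves, so each group holds exactly one
# score and the collect step leaves it unchanged (the "identity" case).
sbfl_scores = {1: 0.2, 2: 0.5, 3: 0.0}
sbfl_ranking = localize(sbfl_scores, sbfl_scores.get, line_of=lambda line: line)

# MBFL: the elements are mutants; several mutants may map to the same line.
# Hypothetical mutants: id -> (mutated line, per-mutant suspiciousness)
mutants = {"m1": (1, 0.1), "m2": (2, 0.6), "m3": (2, 0.1)}
mbfl_ranking = localize(
    mutants,
    score=lambda m: mutants[m][1],
    line_of=lambda m: mutants[m][0],
)

print(sbfl_ranking, mbfl_ranking)
```

Keeping the scoring formula and the collect step as separate parameters makes it possible to vary each one independently.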
