  1. Predicting Fix Locations from Bug Reports. Master Thesis by Markus Thiele (Supervised by Rahul Premraj and Tom Zimmermann). Deadline for thesis: 2009-03-31.

  2. Motivation.

  3. (image-only slide)

  4. A typical Bugzilla bug report.

  5. Questions • Who should fix this bug? (Anvik, Hiew, and Murphy 2006) • How long will it take to fix this bug? (Weiss, Premraj, Zimmermann, and Zeller 2007) • Where should I fix this bug? Previous and related work, and finally the question we focus on.

  6. The Problem • More than 35400 files in Eclipse • More than 1390 packages in Eclipse • Developer time is expensive. An impression of the size of the search space.

  7. The Vision

  8. (image-only slide)

  9. Likely Fix Locations: • FontDialog.java • FontData.java • FontMetrics.java. What we would ultimately like to have: a tool (possibly integrated into a bug database) to automatically predict likely fix locations.

  10. The Tool: SVM (Support Vector Machines). SVMs find the best-fitting boundary (a hyperplane) between two (or possibly more) classes in a feature space.

  11. Our Choice: libsvm • Lightweight and easy to use • Easily parallelizable (with OpenMP) • Supports multiple classes • Supports predicting probabilities. A widely used SVM implementation; a small training/prediction sketch follows below.
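
  A minimal sketch of how such a multi-class SVM with probability estimates might be trained and queried. It uses scikit-learn's SVC (which wraps libsvm) rather than the thesis' actual libsvm/OpenMP setup, and the feature vectors and location labels are made-up placeholders.

      # Sketch only: feature vectors and fix locations below are placeholders.
      import numpy as np
      from sklearn.svm import SVC

      # One row per bug report (e.g. TF weights), one label per row naming the
      # location in which that bug was fixed.
      X_train = np.array([[0.5, 0.0, 0.5], [0.6, 0.0, 0.4], [0.7, 0.1, 0.2],
                          [0.0, 0.9, 0.1], [0.1, 0.8, 0.1], [0.0, 1.0, 0.0],
                          [0.2, 0.2, 0.6], [0.1, 0.3, 0.6], [0.3, 0.1, 0.6]])
      y_train = (["FontDialog.java"] * 3 + ["FontData.java"] * 3
                 + ["FontMetrics.java"] * 3)

      clf = SVC(kernel="linear", probability=True)  # probability=True enables ranking
      clf.fit(X_train, y_train)

      # Rank candidate locations for a new bug report by predicted probability.
      probs = clf.predict_proba(np.array([[0.4, 0.1, 0.5]]))[0]
      print(sorted(zip(clf.classes_, probs), key=lambda pair: -pair[1]))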

  12. Training Data • Data points: Bug reports • Features: Extracted from bug reports • Classes: Locations. Application of SVMs to our problem.

  13. Data Points • Bugs with known fix locations (possibly several data points per bug) • No enhancements (inherently hard to predict). A simplification to improve prediction results; enhancements may involve new locations, etc.

  14. Features • Only unstructured data • Short description • Long description • Keywords • No structured data • No Priority, etc. Structured data is hard to integrate into a single feature vector with unstructured data (how to represent it? how to weight it?). Also, different bug databases provide different types of metadata.

  15. Locations • Files • Packages (for Java projects) • (Sub-)Directories. Finer-grained locations seem unlikely to work well; coarser-grained locations would probably be useless. A small sketch mapping a fixed file to these granularities follows below.
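
  A hypothetical helper (not from the thesis) illustrating the three location granularities for a single fixed file; it assumes the directory layout mirrors the Java package structure, as in typical Eclipse source trees.

      from pathlib import PurePosixPath

      def location_granularities(fixed_path, source_root="src"):
          # Derive file, package, and (sub-)directory locations from one fixed file.
          rel = PurePosixPath(fixed_path).relative_to(source_root)
          file_location = rel.name                 # "FontData.java"
          directory = str(rel.parent)              # "org/eclipse/swt/graphics"
          package = directory.replace("/", ".")    # "org.eclipse.swt.graphics"
          return file_location, package, directory

      print(location_granularities("src/org/eclipse/swt/graphics/FontData.java"))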

  16. The Process: old bug reports with known locations and a new bug report both go through feature extraction; the old reports are used to train a model, which then predicts likely location(s) for the new report. Overview of the general process used by a possible tool.

  17. Feature Extraction: plain text is turned into a bag of words (BOW) {The, Program, Crashes}, stop words are removed {Program, Crashes}, the remaining words are stemmed {Program, Crash}, and the result is scaled into a term-frequency (TF) vector such as [0.5, 0, 0, 0, 0.5, 0, 0, 0, ...] over a fixed vocabulary (Program, GUI, Crash, ...). Plain text needs to be converted into feature vectors for the SVM; a sketch of this pipeline follows below.
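
  A minimal sketch of this pipeline (not the thesis code). The stop-word list and the crude suffix-stripping "stemmer" are placeholders; a real setup would use a proper stemmer (e.g. Porter's) and a vocabulary built from the training data.

      import re
      from collections import Counter

      STOP_WORDS = {"the", "a", "an", "is", "it", "this"}   # placeholder list

      def stem(word):
          # Crude stand-in for a real stemmer: strip a few common suffixes.
          for suffix in ("es", "s", "ing", "ed"):
              if word.endswith(suffix) and len(word) > len(suffix) + 2:
                  return word[:-len(suffix)]
          return word

      def extract_features(text, vocabulary):
          words = re.findall(r"[a-z]+", text.lower())            # bag of words
          words = [w for w in words if w not in STOP_WORDS]      # stop words
          words = [stem(w) for w in words]                       # stemming
          counts = Counter(words)
          total = sum(counts.values()) or 1
          return [counts[term] / total for term in vocabulary]   # TF scaling

      print(extract_features("The program crashes", ["program", "gui", "crash"]))
      # [0.5, 0.0, 0.5]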

  18. Training: Kernels • Linear Kernel: recommended for problems with many data points and many features • Radial Basis Function (RBF) Kernel: at least as good, if optimized (optimization is expensive). Two commonly used SVM “kernels” (functions to map data points into a different, possibly non-linear space, to find different kinds of boundaries).

  19. Kernel Comparison (plots: Linear Kernel, Unoptimized RBF Kernel, Optimized RBF Kernel). Why we chose to use linear kernels: intuitively, the vertical spread of these graphs is related to precision and the horizontal spread to recall. Notice that a linear kernel performs as well as an optimized RBF kernel (and note that the parameter optimization step is very computationally expensive). A sketch of both options follows below.
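
  A sketch (not the thesis code) contrasting the two options: a linear-kernel SVM used as-is versus an RBF-kernel SVM whose C and gamma are tuned by grid search. The grid values and the generated data are only illustrative.

      from sklearn.datasets import make_classification
      from sklearn.model_selection import GridSearchCV
      from sklearn.svm import SVC

      # Stand-in for bug-report feature vectors and their fix locations.
      X, y = make_classification(n_samples=200, n_features=50, n_informative=10,
                                 n_classes=3, random_state=0)

      linear = SVC(kernel="linear").fit(X, y)   # usable without tuning

      # The grid search below is the "expensive optimization" from the slide.
      grid = GridSearchCV(SVC(kernel="rbf"),
                          param_grid={"C": [0.1, 1, 10, 100],
                                      "gamma": [0.001, 0.01, 0.1]},
                          cv=3)
      grid.fit(X, y)
      print(grid.best_params_)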

  20. Evaluation: Data • iBUGS & Co. • Maybe also: JBoss, ALMA, ... Data provided by the SE chair, part of which is available to the public as “iBUGS”. This talk shows results from Eclipse and AspectJ.

  21. Experimental Setup: all bug reports with known locations go through feature extraction and are split into a training set and a testing set; the training set is used to train a model, the model predicts location(s) for the testing set, and evaluator(s) turn the predictions into results. Specific experimental setup (to generate results that can be evaluated).

  22. Splitting • Random splitting: may predict the past from the future, which is unrealistic • Splitting along the time axis: always predicts the future from the past, which is realistic. Why we chose splitting along the time axis (this is usually the better choice).

  23. Splitting • We split into 11 folds • Up to 10 for training • The following one for testing. We use 11 folds simply so that we get 10 test results.

  24. Splitting: Full History. Fold 0: Train, Test; Fold 1: Train, Test; Fold 2: Train, Test; ... One possibility: always include all the previous data (the training window grows with each fold).

  25. Splitting: Partial History. Fold 0: Train, Test; Fold 1: Train, Test; Fold 2: Train, Test; ... Another possibility: only include part of the previous data (here the history length is 1 fold, but it may also be longer). A sketch of both splitting schemes follows below.
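
  A sketch (not the thesis code) of splitting date-sorted bug reports into 11 folds and building the training set from either the full or a partial history.

      def time_folds(bugs, n_folds=11):
          # `bugs` is assumed to be sorted by report date.
          size = len(bugs) // n_folds
          return [bugs[i * size:(i + 1) * size] for i in range(n_folds)]

      def train_test_pairs(folds, history=None):
          # history=None -> full history; history=k -> only the last k folds.
          for i in range(1, len(folds)):
              start = 0 if history is None else max(0, i - history)
              train = [bug for fold in folds[start:i] for bug in fold]
              yield train, folds[i]

      folds = time_folds(list(range(110)))                 # placeholder "bug reports"
      full = list(train_test_pairs(folds))                 # growing training window
      partial = list(train_test_pairs(folds, history=1))   # fixed-length window
      print(len(full), len(full[-1][0]), len(partial[-1][0]))   # 10 100 10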

  26. Evaluators • Precision and Recall: Precision = #correct predictions / #predictions, Recall = #correct predictions / #total correct • Accuracy: a synonym for Recall when we don’t care about Precision • “At Least One Right”: how often do we get an Accuracy greater than 0? Good precision is difficult to achieve, and just getting many of the actually correct locations (good recall) may well be enough for useful results. Control-flow analysis, tools like eROSE (which predicts likely additional locations that need to be changed together with another location), etc. may provide some help. A sketch of these evaluators follows below.
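
  A sketch (not the thesis code) of the three evaluators for a single bug report: `predicted` is the ranked list of locations returned by the model (e.g. the top 10), `actual` is the set of locations the fix really touched; the example values are placeholders.

      def precision(predicted, actual):
          correct = sum(1 for loc in predicted if loc in actual)
          return correct / len(predicted) if predicted else 0.0

      def recall(predicted, actual):   # "accuracy" in the slides' terminology
          correct = sum(1 for loc in predicted if loc in actual)
          return correct / len(actual) if actual else 0.0

      def at_least_one_right(predicted, actual):
          return any(loc in actual for loc in predicted)

      pred = ["FontDialog.java", "FontData.java", "GC.java"]
      act = {"FontData.java", "FontMetrics.java"}
      print(precision(pred, act), recall(pred, act), at_least_one_right(pred, act))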

  27. Benchmark: The Usual Suspects • Simply predict the locations where the most bugs were fixed in the past • Easy to implement • Proven to be useful. Based on a Pareto-type law: 80% of all bugs are in only 20% of all locations (or similar). A sketch of this baseline follows below.
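
  A sketch (not the thesis code) of the Usual Suspects baseline: count how often each location was fixed in the past and always predict the top N. The `past_fixes` pairs are placeholders.

      from collections import Counter

      def usual_suspects(past_fixes, n=10):
          counts = Counter(loc for _, loc in past_fixes)
          return [loc for loc, _ in counts.most_common(n)]

      past_fixes = [(1, "FontData.java"), (2, "FontData.java"), (3, "GC.java"),
                    (4, "FontDialog.java"), (5, "GC.java")]
      print(usual_suspects(past_fixes, n=2))   # ['FontData.java', 'GC.java']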

  28. Missing Results • Files in Eclipse and Mozilla • Very many locations, thus very high memory usage and computation time • Results currently unavailable due to technical difficulties. This data may be recomputed later.

  29. Packages in AspectJ: total locations ~140; bugs per fold ~60; average locations per bug 1.7-2.3. First example: relatively few locations, few bug reports in each training fold. “Average locations per bug” means the average number of locations that had to be touched to fix a bug (i.e. an average bug fix in AspectJ touches about 2 packages).

  30. Packages in AspectJ (precision/recall plots). Just an overview, don’t panic! It looks like we’re doing better than the Usual Suspects. These graphs show the relationship between precision and recall when varying the number of predictions, for each fold (fainter colors mean earlier folds).

  31. Packages in AspectJ, Fold 1 (precision/recall plot). A relatively rare (in this case) example where the Usual Suspects get close.

  32. Packages in AspectJ, Fold 8 (precision/recall plot). The more common case (again, for this example).

  33. Packages in AspectJ, Top 10: Average Accuracy (chart comparing the Usual Suspects and SVM across Folds 0-9). Average accuracy is positively influenced for the Usual Suspects, because there are many bugs in AspectJ (especially early on) with the exact same fix locations; average accuracy is negatively influenced for the SVM, because identical vectors with different locations (i.e. generally having to predict more than one correct location per report) are something SVMs (which are mostly designed to find one correct class for each data point) do not handle very well.

  34. Packages in AspectJ, Top 10: At Least One Right (chart comparing the Usual Suspects, SVM, and Random Chance across Folds 0-9). See the notes from the previous slide; if we just check how often we get at least one right, we do better than the Usual Suspects (because they mostly just perfectly predict the aforementioned special set of bugs, but not much else).

  35. Files in AspectJ: total locations ~2390; bugs per fold ~60; average locations per bug 5.6-9. Second example: many more locations than bug reports in each training fold. We can intuitively predict that this will not work well.

  36. Files in AspectJ (precision/recall plots). This does not bode well either.

  37. Files in AspectJ (precision/recall plots). A closer look.

  38. Files in AspectJ, Fold 6 (precision/recall plot). A relatively common case of the Usual Suspects doing better in this example.

  39. Files in AspectJ, Top 10: Average Accuracy (chart comparing the Usual Suspects and SVM across Folds 0-9). Average performance is poor all over (which is to be expected with so many possible locations).

  40. Files in AspectJ, Top 10: At Least One Right (chart comparing the Usual Suspects, SVM, and Random Chance across Folds 0-9). The Usual Suspects still benefit from the aforementioned special bugs (which are not only fixed in the same packages, but often also in the same files); the SVM suffers from even more conflicting locations.

  41. Files in AspectJ, Top 10: At Least One Right (chart comparing the Usual Suspects, SVM, Random Chance, and SVM with a shorter history across Folds 0-9). We try to counteract the “pollution” of our model by the aforementioned special bugs by reducing the history size, with moderate success.

  42. Packages in Eclipse: total locations ~1390; bugs per fold ~2300; average locations per bug 1.5-2. Third example: many more bug reports in each training set than locations. We intuitively expect this to work well.
