
Comparing Non-adequate Test Suites using Coverage Criteria
Milos Gligoric¹, Alex Groce², Chaoqiang Zhang², Rohan Sharma¹, Amin Alipour², Darko Marinov¹
ISSTA 2013, Lugano, Switzerland, July 18, 2013


  1. Comparing Non-adequate Test Suites using Coverage Criteria
     Milos Gligoric¹, Alex Groce², Chaoqiang Zhang², Rohan Sharma¹, Amin Alipour², Darko Marinov¹
     ISSTA 2013, Lugano, Switzerland, July 18, 2013

  2. Motivation
     • Publications are increasingly using coverage criteria to compare test suites and techniques
     • What coverage criterion should researchers use to compare test suites?

  3. Quiz 1
     • Consider two test suites T1 and T2
       – T1: 50% statement coverage, 75% branch coverage
       – T2: 60% statement coverage, 55% branch coverage
     • Which test suite is better?

  4. Example: BinomialHeap

         public class BinomialHeap {
             ...
             static class Node { int key; Node parent; }
             Node nodes;
             int size;

             void decreaseKey(int oldValue, int newValue) {
                 Node tmp = nodes.findANodeWithKey(oldValue);
                 if (tmp == null) return;
                 tmp.key = newValue;
                 Node tmpParent = tmp.parent;
                 while ((tmpParent != null) && (tmp.key < tmpParent.key)) {
                     int z = tmp.key;
                     tmp.key = tmpParent.key;
                     tmpParent.key = z;
                     tmp = tmpParent;
                     tmpParent = tmpParent.parent;
                 }
             }
         }

     [Figure: example heap drawn in three levels: 3; 7, 5; 9]

  5. Acyclic Intra-Method Path (AIMP)

         void decreaseKey(int oldValue, int newValue) {
             Coverage.beginMethod(0);
             Node tmp = nodes.findANodeWithKey(oldValue);
             if (tmp == null) { Coverage.cover(1); return; }
             Coverage.cover(2);
             tmp.key = newValue;
             Node tmpParent = tmp.parent;
             while ((tmpParent != null) && (tmp.key < tmpParent.key)) {
                 Coverage.cover(3);
                 int z = tmp.key;
                 tmp.key = tmpParent.key;
                 tmpParent.key = z;
                 tmp = tmpParent;
                 tmpParent = tmpParent.parent;
             }
             Coverage.cover(4);
         }

     Example call on the heap 3; 7, 5; 9: decreaseKey(9, 8)
     AIMP: 0, 2, 4

  6. Acyclic Intra-Method Path (AIMP)
     Same instrumented method as on slide 5.
     Example call on the heap 3; 7, 5; 9: decreaseKey(9, 2)
     AIMP: 0, 2, 3
     AIMP: 3, 4
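
The slides do not show the Coverage class itself, so the following is only a minimal sketch of how such a recorder might collect acyclic paths. The class name AimpRecorder, the instance-based API, and the splitting rule are all assumptions: a path is closed whenever a block id that is already on the current path is reached again, and a new path starts at that id. For the event trace 0, 2, 3, 3, 4 (two loop iterations) this rule yields the paths 0, 2, 3 and 3, 4, the same two paths listed on slide 6.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical recorder; not the tool used in the study.
final class AimpRecorder {
    // Distinct acyclic paths seen so far, in first-seen order.
    private final Set<List<Integer>> paths = new LinkedHashSet<>();
    private List<Integer> current = null;

    // Called on method entry; closes any path still open and starts a new one.
    void beginMethod(int id) {
        endMethod();
        current = new ArrayList<>();
        current.add(id);
    }

    // Called when a block is reached. Revisiting an id already on the
    // current path means a loop was taken, so the path is split there
    // (assumed splitting rule).
    void cover(int id) {
        if (current == null) {
            current = new ArrayList<>();
        }
        if (current.contains(id)) {
            paths.add(current);
            current = new ArrayList<>();
        }
        current.add(id);
    }

    // Called on method exit; stores the path that is still open.
    void endMethod() {
        if (current != null && !current.isEmpty()) {
            paths.add(current);
        }
        current = null;
    }

    Set<List<Integer>> coveredPaths() {
        return paths;
    }
}
```

Under this sketch, the AIMP coverage of a suite would be the number of distinct paths in coveredPaths() after running all of its tests.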

  7. Predicate-Complete Test Coverage (PCT)

         void decreaseKey(int oldValue, int newValue) {
             Coverage.beginMethod(0);
             Node tmp = nodes.findANodeWithKey(oldValue);
             if (tmp == null) { Coverage.cover(1); return; }
             Coverage.cover(2);
             tmp.key = newValue;
             Node tmpParent = tmp.parent;
             while ((tmpParent != null) && (tmp.key < tmpParent.key)) {
                 Coverage.cover(3);
                 int z = tmp.key;
                 tmp.key = tmpParent.key;
                 tmpParent.key = z;
                 tmp = tmpParent;
                 tmpParent = tmpParent.parent;
             }
             Coverage.cover(4);
         }

     1. Extract predicates:
        – tmp == null
        – tmpParent != null
        – tmp.key < tmpParent.key

  8. Predicate-Complete Test Coverage (PCT)

         void decreaseKey(int oldValue, int newValue) {
             Coverage.beginMethod(0);
             Node tmp = nodes.findANodeWithKey(oldValue);
             if (tmp == null) { Coverage.cover(1, …); return; }
             Coverage.cover(2, …);
             tmp.key = newValue;
             Node tmpParent = tmp.parent;
             while ((tmpParent != null) && (tmp.key < tmpParent.key)) {
                 Coverage.cover(3, p$49(tmp, tmpParent), …);
                 int z = tmp.key;
                 tmp.key = tmpParent.key;
                 tmpParent.key = z;
                 tmp = tmpParent;
                 tmpParent = tmpParent.parent;
             }
             Coverage.cover(4, p$49(tmp, tmpParent));
         }

     2. Insert evaluation of predicates:

         // tmp.key < tmpParent.key
         boolean p$49(Node tmp, Node tmpParent) {
             if (tmpParent == null) return false;
             if (tmp == null) return false;
             return tmp.key < tmpParent.key;
         }
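
To make the idea of covering predicate valuations concrete, here is a small sketch of a recorder that counts distinct (block, predicate values) combinations. It is an illustration under assumptions, not the instrumentation used in the study: PctRecorder, distinctStates, and keyLessThanParent are hypothetical names, and keyLessThanParent simply mirrors the null-safe helper p$49 from the slide.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical PCT-style recorder; the real Coverage class is not shown in the slides.
final class PctRecorder {
    // Minimal node type matching the fields used on the slides.
    static final class Node {
        int key;
        Node parent;
    }

    // Each entry encodes one observed block id plus the predicate values seen there.
    private final Set<String> observed = new LinkedHashSet<>();

    // Records that blockId was reached with the given predicate valuation.
    void cover(int blockId, boolean... predicateValues) {
        observed.add(blockId + ":" + Arrays.toString(predicateValues));
    }

    // Number of distinct (block, valuation) combinations recorded so far.
    int distinctStates() {
        return observed.size();
    }

    // Null-safe evaluation of "tmp.key < tmpParent.key", like p$49 on the slide.
    static boolean keyLessThanParent(Node tmp, Node tmpParent) {
        if (tmpParent == null) return false;
        if (tmp == null) return false;
        return tmp.key < tmpParent.key;
    }
}
```

Reaching the same block twice with the same predicate values adds nothing, while reaching it with a different valuation counts as new coverage. That is what makes this measure finer-grained than statement or branch coverage.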

  9. Quiz 2
     • Which coverage to use to compare test suites?
       – Statement (SC)
       – Branch (BC)
       – Intra-method path (AIMP)
       – Predicate-Complete Test Coverage (PCT)

  10. Problem Discussion
     • What does it mean that one coverage is good? Or that one coverage is better than another?
     • We want coverage to be a predictor of finding bugs: test suites with higher coverage should find, on average, more (real) bugs
     • Instead of prediction of real bugs, our experiments use prediction of mutation score

  11. Towards the Answer
     • Ability of coverage to predict mutation score

  12. Statistical Evaluation
     • Quantify the degree to which this holds:
       – If a suite A has higher coverage than a suite B, then the suite A has a higher mutation score
     • Statistical tools
       – Kendall τ: measures how well coverage predicts the relative ordering of mutation score
       – R²: correlates coverage values with mutation score using a linear regression model

  13. Steps
     1. Select subjects
     2. Obtain test pools for the subjects
     3. Obtain mutants for the subjects
     4. Create test suites from the test pools
     5. Collect several metrics for the selected suites
     6. Apply statistical tools to measure correlation

  14. Step 1: Experimental Subjects

         Subject            NBNC    Subject          NBNC
         AvlTree             344    Printtokens       479
         BinomialHeap        264    Printtokens2      401
         BinTree             100    Replace           512
         FibHeap             264    Schedule          292
         FibonacciHeap       397    Schedule2         297
         HeapArray            98    SglibRbtree       476
         IntAVLTreeMap       213    Space           6,200
         IntRedBlackTree     296    SQLite         81,934
         JFreeChart       72,490    Totinfo           340
         JodaTime         27,472    Tcas              135
         LinkedList          245    YAFFS2         11,760
         NodeCachLList       234
         SinglyLList          98
         TreeMap             449
         TreeSet             323

  15. Step 2: Tests

         Subject            NBNC    Tests    Subject          NBNC     Tests
         AvlTree             344   11,041    Printtokens       479     4,130
         BinomialHeap        264    8,423    Printtokens2      401     4,115
         BinTree             100   13,825    Replace           512     5,542
         FibHeap             264   12,842    Schedule          292     2,650
         FibonacciHeap       397    4,478    Schedule2         297     2,710
         HeapArray            98    4,064    SglibRbtree       476     5,000
         IntAVLTreeMap       213   17,072    Space           6,200     1,350
         IntRedBlackTree     296   20,419    SQLite         81,934   117,240
         JFreeChart       72,490    2,217    Totinfo           340       917
         JodaTime         27,472    3,828    Tcas              135     1,608
         LinkedList          245    1,307    YAFFS2         11,760     5,000
         NodeCachLList       234    1,776
         SinglyLList          98    1,762
         TreeMap             449   14,076
         TreeSet             323   17,400

     • Automatically generated: Random, Shape Abstraction, Adaptation-based programming
     • Manually written (bigger examples)

  16. Step 3: Mutants

         Subject            NBNC    Tests  Mutants    Subject          NBNC     Tests  Mutants
         AvlTree             344   11,041      335    Printtokens       479     4,130      536
         BinomialHeap        264    8,423      205    Printtokens2      401     4,115      343
         BinTree             100   13,825       55    Replace           512     5,542      613
         FibHeap             264   12,842      186    Schedule          292     2,650      140
         FibonacciHeap       397    4,478      295    Schedule2         297     2,710      300
         HeapArray            98    4,064      122    SglibRbtree       476     5,000      443
         IntAVLTreeMap       213   17,072      199    Space           6,200     1,350    1,142
         IntRedBlackTree     296   20,419      279    SQLite         81,934   117,240   52,367
         JFreeChart       72,490    2,217   45,409    Totinfo           340       917      511
         JodaTime         27,472    3,828   24,956    Tcas              135     1,608      311
         LinkedList          245    1,307      167    YAFFS2         11,760     5,000   10,674
         NodeCachLList       234    1,776      159
         SinglyLList          98    1,762       57
         TreeMap             449   14,076      463
         TreeSet             323   17,400      360

     • Javalanche used to mutate Java programs
     • Proteum used to mutate C programs

  17. Step 4: Creating Test Suites
     • Coverage method
       – 300 test suites
       – Uniformly selecting values for PCT coverage
       – Then randomly choose tests to reach the coverage
     • Size method (used in previous studies)
       – 100 random suites for each size between 1 and 50 (less varied coverage)
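
The two suite-creation methods above can be sketched as follows. This is a simplified model, not the authors' scripts: SuiteBuilder and its methods are hypothetical names, each test in the pool is represented only by the set of coverage items it hits, and a suite is a list of indices into the pool. If the pool cannot reach the requested target, buildToTarget returns the whole pool.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical helper for building suites from a test pool.
final class SuiteBuilder {
    // Fraction of totalItems covered by the union of the selected tests.
    static double coverage(List<Set<Integer>> pool, List<Integer> suite, int totalItems) {
        Set<Integer> covered = new HashSet<>();
        for (int testIndex : suite) {
            covered.addAll(pool.get(testIndex));
        }
        return (double) covered.size() / totalItems;
    }

    // Coverage method: add randomly ordered tests until the target is reached.
    static List<Integer> buildToTarget(List<Set<Integer>> pool, int totalItems,
                                       double target, Random rng) {
        List<Integer> suite = new ArrayList<>();
        Set<Integer> covered = new HashSet<>();
        for (int testIndex : shuffledIndices(pool.size(), rng)) {
            if ((double) covered.size() / totalItems >= target) {
                break;
            }
            suite.add(testIndex);
            covered.addAll(pool.get(testIndex));
        }
        return suite;
    }

    // Size method: a random suite with a fixed number of tests.
    static List<Integer> buildOfSize(int poolSize, int size, Random rng) {
        List<Integer> order = shuffledIndices(poolSize, rng);
        return new ArrayList<>(order.subList(0, Math.min(size, poolSize)));
    }

    private static List<Integer> shuffledIndices(int n, Random rng) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            order.add(i);
        }
        Collections.shuffle(order, rng);
        return order;
    }
}
```

In this model, the uniform selection step would amount to drawing the target first, for example as rng.nextDouble() times the coverage of the full pool, and then calling buildToTarget once per suite.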

  18. Step 5: Collecting Metrics
     • Coverage criteria
       – SC, BC, AIMP, PCT (more in the paper)
     • Mutation score
     • Runtime overhead
     • Example (for one of the subjects)
       – Columns: SC, BC, AIMP, PCT, Mutation score, Overhead
       – Test suite 1: coverage C1, mutation score M1
       – Test suite 2: coverage C2, mutation score M2
       – …
       – Test suite N: coverage CN, mutation score MN
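
The slides do not define mutation score. The sketch below assumes the usual definition: the fraction of mutants killed by at least one test in the suite. The MutationScore class and the kill-matrix layout (kills[test][mutant]) are assumptions made for illustration, and equivalent mutants are not treated specially.

```java
// Hypothetical helper; assumes mutation score = killed mutants / all mutants.
final class MutationScore {
    // kills[t][m] is true when test t kills mutant m; suite lists test indices.
    static double score(boolean[][] kills, int[] suite) {
        if (kills.length == 0 || kills[0].length == 0) {
            throw new IllegalArgumentException("kill matrix must be non-empty");
        }
        int mutants = kills[0].length;
        boolean[] killed = new boolean[mutants];
        for (int testIndex : suite) {
            for (int m = 0; m < mutants; m++) {
                killed[m] |= kills[testIndex][m];
            }
        }
        int count = 0;
        for (boolean k : killed) {
            if (k) {
                count++;
            }
        }
        return (double) count / mutants;
    }
}
```

Each test suite then contributes one pair (Ci, Mi): its coverage under a given criterion and its mutation score.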

  19. Step 6: Statistical analysis
     • Kendall τ rank correlation coefficient
       – Consider two pairs (C1, M1) and (C2, M2)
       – Concordant if the ordering of C1 and C2 matches the ordering of M1 and M2, discordant otherwise
       – τ = (number of concordant pairs − number of discordant pairs) / total number of pairs
     • R² coefficient of determination
       – Linear regression model for each criterion
       – Gives an indication of correlation
       – Intuitively, if one suite has X% higher coverage value than another suite, does it have a c * X% higher mutation score?
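
Both statistics can be computed directly from the (Ci, Mi) pairs. The sketch below is a plain implementation under stated assumptions, not the analysis code used in the study: CorrelationStats is a hypothetical name, kendallTau follows the ratio on the slide and lets tied pairs count as neither concordant nor discordant, and rSquared is the coefficient of determination of a one-variable least-squares fit, which equals the squared Pearson correlation.

```java
// Hypothetical helper for the two statistics named on the slide.
final class CorrelationStats {
    // Kendall tau: (concordant - discordant) / total number of pairs.
    // Assumption: ties in either variable add to neither count.
    static double kendallTau(double[] coverage, double[] mutationScore) {
        check(coverage, mutationScore);
        int n = coverage.length;
        long concordant = 0;
        long discordant = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double s = Math.signum(coverage[i] - coverage[j])
                         * Math.signum(mutationScore[i] - mutationScore[j]);
                if (s > 0) {
                    concordant++;
                } else if (s < 0) {
                    discordant++;
                }
            }
        }
        long pairs = (long) n * (n - 1) / 2;
        return (double) (concordant - discordant) / pairs;
    }

    // R^2 of the least-squares line y = a + b * x.
    // Returns NaN when either variable is constant.
    static double rSquared(double[] x, double[] y) {
        check(x, y);
        int n = x.length;
        double meanX = 0;
        double meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0;
        double sxx = 0;
        double syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return (sxy * sxy) / (sxx * syy);
    }

    private static void check(double[] a, double[] b) {
        if (a.length != b.length || a.length < 2) {
            throw new IllegalArgumentException("need two arrays of equal length >= 2");
        }
    }
}
```

A τ close to 1 means that the suite with higher coverage almost always has the higher mutation score, which is the ordering question from slide 12. R² instead asks how well a straight line maps coverage values to mutation scores.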

  20. Results: Kendall τ for Java Subjects

  21. Kendall τ for C Subjects

  22. R² for Java Subjects

  23. R² for C Subjects

  24. Overhead for Java Subjects

  25. Overhead for C Subjects

  26. Conclusions
     • Publications are increasingly using coverage criteria to compare test suites and techniques
     • Our study compared coverages
     • Take-away messages
       – Due to high effectiveness and low overhead, researchers should use branch coverage to compare suites whenever possible
       – Intra-procedural acyclic path coverage performed best of all non-branch coverage criteria
     http://mir.cs.illinois.edu/coco/
