Comparing User-Provided Tests to Developer-Provided Tests
René Just, Chris Parnin, Ian Drosos, Michael D. Ernst
ISSTA 2018
User-provided tests        Developer-provided tests
Found in bug reports       Committed to repository
One small test             More tests, more LOC
Weak or no assertions      More, stronger assertions
High code coverage         Focused on the defect
Used by programmers        Used in experiments

Fault localization: 5-14% worse
Automated program repair: 54-100% worse

User-provided tests should be used in experiments.
Fault localization: where is the defect?

Defective program:

    double avg(double[] nums) {
      int n = nums.length;
      double sum = 0;
      for (int i = 0; i < n; ++i) {
        sum += nums[i];
      }
      return sum * n;
    }

[Diagram: the defective program and its test suite (passing tests, failing tests) are the inputs to a fault localization technique.]
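To make "passing tests" and "failing tests" concrete, here is a minimal sketch that runs the slide's defective avg on two inputs. It assumes the intended behavior is the arithmetic mean (sum / n); the slide does not show the fix. The class name AvgDefectDemo and the helper passes are hypothetical, introduced only for this illustration.

```java
public class AvgDefectDemo {
    // Defective version from the slide: multiplies by n instead of dividing.
    static double avg(double[] nums) {
        int n = nums.length;
        double sum = 0;
        for (int i = 0; i < n; ++i) {
            sum += nums[i];
        }
        return sum * n;
    }

    // Hypothetical test helper: true if avg(input) matches the expected mean.
    // Assumption: the intended result is the arithmetic mean.
    static boolean passes(double[] input, double expectedMean) {
        return Math.abs(avg(input) - expectedMean) < 1e-9;
    }

    public static void main(String[] args) {
        // One element: sum * 1 equals sum / 1, so the defect stays hidden.
        System.out.println(passes(new double[] {5.0}, 5.0));      // true
        // Two elements: 6 * 2 = 12, but the mean is 3, so the test fails.
        System.out.println(passes(new double[] {2.0, 4.0}, 3.0)); // false
    }
}
```

The one-element input is a passing test even though it executes the defective statement, which is why a test suite needs at least one triggering (failing) test before a fault localization technique has anything to work with.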
Fault localization: where is the defect?

[Diagram: from the defective program avg(double[] nums) and its test suite (passing tests, failing tests), the fault localization technique produces a statement ranking that orders the program's statements from most suspicious to least suspicious.]
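One way to produce such a statement ranking is spectrum-based fault localization. The sketch below uses the Ochiai formula, ef / sqrt(totalFailed * (ef + ep)), where ef and ep count the failing and passing tests that execute a statement. The slide does not name a formula at this point, so Ochiai is an assumed example, and the coverage matrix in main is invented for illustration, not taken from the talk.

```java
import java.util.ArrayList;
import java.util.List;

public class OchiaiSketch {
    // Ochiai suspiciousness: ef / sqrt(totalFailed * (ef + ep)).
    static double ochiai(int ef, int ep, int totalFailed) {
        double denom = Math.sqrt((double) totalFailed * (ef + ep));
        return denom == 0 ? 0.0 : ef / denom;
    }

    // covered[t][s] is true if test t executes statement s;
    // failed[t] is true if test t fails.
    // Returns statement indices ordered from most to least suspicious.
    static List<Integer> rank(boolean[][] covered, boolean[] failed) {
        int numStatements = covered[0].length;
        int totalFailed = 0;
        for (boolean f : failed) {
            if (f) totalFailed++;
        }

        double[] score = new double[numStatements];
        for (int s = 0; s < numStatements; s++) {
            int ef = 0; // failing tests that execute statement s
            int ep = 0; // passing tests that execute statement s
            for (int t = 0; t < covered.length; t++) {
                if (covered[t][s]) {
                    if (failed[t]) ef++; else ep++;
                }
            }
            score[s] = ochiai(ef, ep, totalFailed);
        }

        List<Integer> order = new ArrayList<>();
        for (int s = 0; s < numStatements; s++) order.add(s);
        // Stable sort, descending by score; ties keep statement order.
        order.sort((a, b) -> Double.compare(score[b], score[a]));
        return order;
    }

    public static void main(String[] args) {
        // Invented coverage data: 3 tests (rows) over 4 statements (columns).
        boolean[][] covered = {
            {true, true,  false, false}, // test 0 (passes)
            {true, false, true,  false}, // test 1 (passes)
            {true, false, true,  true }, // test 2 (fails)
        };
        boolean[] failed = {false, false, true};
        System.out.println(rank(covered, failed)); // [3, 2, 0, 1]
    }
}
```

Statement 3 ranks first because only the failing test executes it; statement 1 ranks last because no failing test executes it.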
Evaluating fault localization

[Diagram: the fault localization technique takes the defective program avg(double[] nums) and its test suite (passing tests, failing tests) and produces a statement ranking. The ranking is compared to the known location of the defect.]
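The comparison to the known defect location can be sketched as follows: find the position of the defective statement in the ranking, and normalize it by the ranking's length. This is only one plausible scoring, assumed here for illustration; the slide does not specify the metric. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class RankingEvaluation {
    // 1-based position of the defective statement in the ranking,
    // or -1 if the ranking does not contain it.
    static int rankOf(List<Integer> ranking, int defectiveStatement) {
        int idx = ranking.indexOf(defectiveStatement);
        return idx < 0 ? -1 : idx + 1;
    }

    // Fraction of the ranking read, top down, up to and including the
    // defective statement. Lower is better.
    static double fractionInspected(List<Integer> ranking, int defectiveStatement) {
        int rank = rankOf(ranking, defectiveStatement);
        if (rank < 0) {
            throw new IllegalArgumentException("defective statement not in ranking");
        }
        return (double) rank / ranking.size();
    }

    public static void main(String[] args) {
        // Invented ranking of four statement indices, most suspicious first.
        List<Integer> ranking = Arrays.asList(3, 2, 0, 1);
        int knownDefect = 2; // assumed known location of the defect
        System.out.println(rankOf(ranking, knownDefect));            // 2
        System.out.println(fractionInspected(ranking, knownDefect)); // 0.5
    }
}
```

Two techniques can then be compared on the same defect by computing this score for each technique's ranking.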
Evaluating fault localization

[Diagram: fault localization technique 1 and fault localization technique 2 each take the same defective program avg(double[] nums) and test suite (passing tests, failing tests) and each produce a statement ranking. Each ranking is compared to the known location of the defect.]
Evaluating fault localization

Defective program

Early work
● Artificial defects (“mutants”)
  ○ Easy to create lots of them
  ○ Known fault locations

Pearson et al. [ICSE 2017]
● 310 real defects (Defects4J)
● 2995 artificial defects

Test suite

Early work
● Artificial tests
  ○ Written by researchers
  ○ Unrealistically strong

Pearson et al. [ICSE 2017]
● Real tests (Defects4J)
  ○ Written by developers
  ○ Committed with the fix
Pearson et al. [ICSE 2017]
Comparison of fault localization techniques

[Figure panels: SBFL vs. SBFL; MBFL vs. SBFL]

Results agree with most prior studies on artificial faults, but only 3 effect sizes are not negligible.
Pearson et al. [ICSE 2017]
Comparison of fault localization techniques

[Figure panels: SBFL vs. SBFL; MBFL vs. SBFL]

Results disagree with all prior studies on real faults.
Design decisions don’t matter: techniques indistinguishable.
Evaluating fault localization

[Diagram: fault localization technique 1 and fault localization technique 2 each rank the statements of the defective program avg(double[] nums) using its test suite (passing tests, failing tests). Each ranking is compared to the known location of the defect.]

New standard methodology: use real defects from Defects4J (mined from version control repositories).

Defects4J: real triggering tests
● Written by developers
● Committed with the fix

Written before or after the fix?

In practice, fault localization is run before the fix, using triggering tests from bug reports.
User-provided test
https://issues.apache.org/jira/browse/LANG-857

    public void userTest() {
      assertEquals("\uD83D\uDE30",
                   StringEscapeUtils.escapeCsv("\uD83D\uDE30"));
    }