AUTOMATIC PROGRAM REPAIR USING GENETIC PROGRAMMING
SLIDES BY CLAIRE LE GOUES (MOSTLY), BUT ALSO SOME BY MAHSA VARSHOSAZ & ANDRZEJ WASOWSKI
Westley Weimer, ThanhVu Nguyen, Claire Le Goues, and Stephanie Forrest. 2009. Automatically finding patches using genetic programming. In ICSE '09. IEEE Computer
PROBLEM: BUGGY SOFTWARE
• “Everyday, almost 300 bugs appear […] far too many for only the Mozilla programmers to handle.” (Mozilla Developer, 2005)
• Annual cost of software errors in the US: $59.5 billion (0.6% of GDP)
• Average time to fix a security-critical error: 28 days
• [Figure: pie chart, 90%: maintenance, 10%: everything else]
http://www.clairelegoues.com
HOW DO HUMANS FIX NEW BUGS?
Mike (developer)
??! (Mike’s project)
printf transformer
Input: [Figure: the program's syntax tree, nodes numbered 1 to 12]
Input: [Figure: the same tree, each node marked likely faulty, maybe faulty, or not faulty]
SECRET SAUCES
• Test cases scalably inform about program behavior
• Use test cases to evaluate candidate repairs
• Existing program code contains the seeds of many repairs
• Better to reuse existing developer expertise than to invent new code
APPROACH
Given a program and a set of test cases, conduct a biased, random search for a set of edits to the program that fixes a given bug.
GENETIC PROGRAMMING: the application of evolutionary or genetic algorithms to program source code.
[Figure: the repair loop, with stages INPUT, MUTATE, EVALUATE FITNESS, ACCEPT or DISCARD, and OUTPUT]
GENETIC SEARCH
[Figure: flow chart of the genetic programming approach. Fig. courtesy of Hossam Faris. https://www.researchgate.net/figure/Flow-chart-of-the-genetic-programming-approach_fig2_253458069]
INDIVIDUAL CANDIDATES (INITIAL POPULATION)
An individual is a candidate patch, i.e., a set of changes to the input program. A patch is a series of statement-level edits:
• delete X
• replace X with Y
• insert Y after X
Replace/insert: pick Y from somewhere else in the program. This representation reduces the search space by at least 2–10x.
We are not touching the tests.
MUTATION: HOW
To mutate an individual, we add a new random edit to a given patch.
• (Or we generate a new individual by generating a couple of random edits to make a new patch.)
• We are not touching the tests.
SEARCH SPACE: FAULT LOCALIZATION
Hypothesis: statements executed only by the failing test case(s) should be weighted more heavily than those also executed by the passing test cases.
FAULT LOCALIZATION
• Instrument the program to record the lines visited during tests
• The positive test case gcd(1071,1029) visits lines 2–3 and 6–13
• The negative test case gcd(0,55) visits lines 2–5, 6–7, and 9–10
When selecting portions of the program to modify, we favor those that:
• were visited during the negative test case
• were not also visited during the positive one
In this example, repairs are focused on lines 4–5.
This particular fault-localization heuristic (custom for this paper) turned out not to be very good in the long run. We return to this later.
     1  /* requires: a >= 0, b >= 0 */
     2  void gcd(int a, int b) {
     3      if (a == 0) {
     4          printf("%d", b);
     5      }
     6      while (b > 0) {
     7          if (a > b)
     8              a = a - b;
     9          else
    10              b = b - a;
    11      }
    12      printf("%d", a);
    13      return;
    14  }
Running the tests:
> gcd(4,2)
2
> gcd(1071,1029)
21
> gcd(0,55)
55
(looping forever)
Trace of gcd(0,55), starting with a = 0, b = 55:
• if (a == 0) is true, so printf prints 55
• while (b > 0) is true
• if (a > b) is false, so the else branch runs b = b - a = 55 - 0 = 55
• b never changes, so the loop never exits
Input: [Figure: syntax tree of gcd, with nodes {block}, if (a==0), while (b>0), printf(a), return, printf(b), if (a>b), a = a - b, b = b - a, shaded by change probability. Legend: high change probability, low change probability, not changed]
Input: [Figure: syntax tree of gcd]
An edit is:
• Insert statement X after statement Y
• Replace statement X with statement Y
• Delete statement X
Input: [Figure: syntax tree of gcd after one edit: a return statement has been inserted into the {block} of if (a==0), after printf(b)]
MOTIVATING EXAMPLE (CONT.)
• Consider a program variant gcd_2: gcd_2(1071,1029) produces 1029 instead of 21
• Thus, a variant must pass the negative test case while retaining other core functionality
• This is enforced through the positive test cases
FITNESS FUNCTION
• The fitness function returns a number indicating the acceptability of a program variant
• We first compile the variant's AST to an executable program
• Then we record which test cases that executable passes
• A program variant that does not compile has fitness zero
• 32.19% of variants failed to compile in our experiment
• The weights W_PosT and W_NegT should be positive values
PATCH MINIMIZATION
• exit(0) is inserted correctly
• a = a - b in line 5 is extraneous
• Patch minimization (by search, e.g., delta debugging) removes such extraneous edits
CLAIMS
GenProg can generically fix a variety of bugs in real programs without a priori knowledge.
GenProg is human competitive in both expressive power and actual cost.
Program      Description                 LOC     Bug type                        Time (s)
gcd          example                     22      infinite loop                   153
nullhttpd    webserver                   5575    heap buffer overflow (code)     578
zune         example                     28      infinite loop                   42
uniq         text processing             1146    segmentation fault              34
look-u       dictionary lookup           1169    segmentation fault              45
look-s       dictionary lookup           1363    infinite loop                   55
units        metric conversion           1504    segmentation fault              109
deroff       document processing         2236    segmentation fault              131
indent       code processing             9906    infinite loop                   546
flex         lexical analyzer generator  18774   segmentation fault              230
openldap     directory protocol          292598  non-overflow denial of service  665
ccrypt       encryption utility          7515    segmentation fault              330
lighttpd     webserver                   51895   heap buffer overflow (vars)     394
atris        graphical game              21553   local stack buffer exploit      80
php          scripting language          764489  integer overflow                56
wu-ftpd      FTP server                  67029   format string vulnerability     2256
leukocyte    computational biology       6718    segmentation fault              360
tiff         image processing            84067   segmentation fault              108
imagemagick  image processing            450516  wrong output                    2160
CONCLUSIONS
GenProg: scalable, generic, expressive automatic bug repair
• Genetic programming search for a patch that addresses a given bug.
• Render the search tractable by restricting the search space intelligently.
It works!
• Fixes a variety of bugs in a variety of programs.
• Repaired 60 of 105 bugs for < $8 each, on average.
Benchmarks/results/source code/VM images available:
• http://genprog.cs.virginia.edu
WHAT COULD’VE GONE WRONG?
• What if we write a new test case? What do we do about that?
• Machine learning folks have known for years that minimization does not, by itself, improve quality: model size can be independent of the degree of overfitting.
• How could we evaluate overfitting?
SEMFIX: PROGRAM REPAIR VIA SEMANTIC ANALYSIS
SemFix: program repair via semantic analysis. In Proceedings of the 2013 International Conference on Software Engineering (ICSE '13).
REPAIRING PROGRAMS WITH SEMANTIC CODE SEARCH
Yalin Ke (Iowa State), Kathryn T. Stolee (Iowa State), Claire Le Goues (Carnegie Mellon), Yuriy Brun (UMass Amherst)
Y. Ke, K. T. Stolee, C. Le Goues, and Y. Brun. Repairing Programs with Semantic Code Search. In ASE '15.
OVERFITTING
Does the patch generalize beyond the test cases used to create it?
Edward K. Smith, Earl Barr, Claire Le Goues, and Yuriy Brun. Is the Cure Worse than the Disease? Overfitting in Automated Program Repair. ESEC/FSE 2015.