
AUTOMATIC PROGRAM REPAIR USING GENETIC PROGRAMMING: SLIDES BY CLAIRE LE GOUES (MOSTLY) BUT ALSO SOME BY MAHSA VARSHOSAZ & ANDRZEJ WASOWSKI


  1. AUTOMATIC PROGRAM REPAIR USING GENETIC PROGRAMMING. SLIDES BY CLAIRE LE GOUES (MOSTLY) BUT ALSO SOME BY MAHSA VARSHOSAZ & ANDRZEJ WASOWSKI. Westley Weimer, ThanhVu Nguyen, Claire Le Goues, and Stephanie Forrest. 2009. Automatically finding patches using genetic programming. In ICSE '09. IEEE Computer

  3. PROBLEM: BUGGY SOFTWARE • “Everyday, almost 300 bugs appear […] far too many for only the Mozilla programmers to handle.” – Mozilla Developer, 2005 • Annual cost of software errors in the US: $59.5 billion (0.6% of GDP). • Average time to fix a security-critical error: 28 days. • 90%: Maintenance; 10%: Everything Else. http://www.clairelegoues.com

  4. HOW DO HUMANS FIX NEW BUGS?

  5. Mike (developer)

  6. ??! (Mike’s project)

  7. printf transformer

  8. Input: [figure: the program shown as numbered statements 1 to 12]

  9. Input: [figure: the same numbered statements 1 to 12, marked according to the legend] Legend: Likely faulty; Maybe faulty; Not faulty

  10. SECRET SAUCES • Test cases scalably inform about program behavior • Use test cases to evaluate candidate repairs • Existing program code contains the seeds of many repairs • Better use existing developer expertise than invent new code

  11. APPROACH Given a program and a set of test cases, conduct a biased, random search for a set of edits to a program that fixes a given bug.

  12. GENETIC PROGRAMMING: the application of evolutionary or genetic algorithms to program source code.

  13. [Figure: search loop with stages INPUT, EVALUATE FITNESS, DISCARD, ACCEPT, OUTPUT, MUTATE]

  14. [Figure: search loop with stages INPUT, EVALUATE FITNESS, DISCARD, ACCEPT, OUTPUT, MUTATE]

  15. GENETIC SEARCH Fig. courtesy of Hossam Faris. https://www.researchgate.net/figure/Flow-chart-of-the-genetic-programming-approach_fig2_253458069

  16. INDIVIDUAL CANDIDATES (INITIAL POPULATION) An individual is a candidate patch or set of changes to the input program. A patch is a series of statement-level edits: • delete X • replace X with Y • insert Y after X. (Callout: reduces search space by at least 2-10x.) Replace/insert: pick Y from somewhere else in the program. We are not touching the tests.
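
The patch representation on this slide can be sketched in C roughly as follows. This is an illustration under stated assumptions, not GenProg's actual data structures: the program is modelled as a flat array of statement strings, and the names `Edit`, `EditKind`, `apply_edit`, `apply_patch`, and `MAX_STMTS` are hypothetical. The donor statement Y is always taken from the original program, matching "pick Y from somewhere else in the program".

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STMTS 64   /* assumed capacity of a program variant */

/* The three statement-level edit kinds named on the slide. */
typedef enum { EDIT_DELETE, EDIT_REPLACE, EDIT_INSERT_AFTER } EditKind;

typedef struct {
    EditKind kind;
    size_t target;  /* X: index into the current variant */
    size_t donor;   /* Y: index into the original program (unused by delete) */
} Edit;

/* Apply one edit in place; returns the new statement count. */
size_t apply_edit(const char *const orig[], size_t n_orig,
                  const char *prog[], size_t n, Edit e)
{
    assert(e.target < n);
    switch (e.kind) {
    case EDIT_DELETE:
        for (size_t i = e.target; i + 1 < n; i++)
            prog[i] = prog[i + 1];
        return n - 1;
    case EDIT_REPLACE:
        assert(e.donor < n_orig);
        prog[e.target] = orig[e.donor];
        return n;
    case EDIT_INSERT_AFTER:
        assert(e.donor < n_orig && n < MAX_STMTS);
        for (size_t i = n; i > e.target + 1; i--)
            prog[i] = prog[i - 1];
        prog[e.target + 1] = orig[e.donor];
        return n + 1;
    }
    return n;
}

/* A patch is a series of edits, applied in order. */
size_t apply_patch(const char *const orig[], size_t n_orig,
                   const char *prog[], size_t n,
                   const Edit patch[], size_t n_edits)
{
    for (size_t i = 0; i < n_edits; i++)
        n = apply_edit(orig, n_orig, prog, n, patch[i]);
    return n;
}
```

Because a patch only stores indices, an individual stays small even for a large program, and the test suite is never part of what gets edited.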

  17. MUTATION: HOW To mutate an individual, we add new random edits to a given patch. • (Or we generate a new individual by generating a couple of random edits to make a new patch.) • We are not touching the tests.
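
A minimal sketch of this mutation step, assuming a patch is a fixed-capacity list of edits. The names `Patch`, `Edit`, `mutate`, and `MAX_EDITS` are hypothetical, and the edit target is drawn uniformly here for brevity; the slides describe a biased search in which fault localization weights would steer that choice.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { EDIT_DELETE, EDIT_REPLACE, EDIT_INSERT_AFTER } EditKind;

typedef struct {
    EditKind kind;
    size_t target;  /* statement X to edit */
    size_t donor;   /* statement Y copied from elsewhere in the program */
} Edit;

#define MAX_EDITS 32  /* assumed cap on patch length */

typedef struct {
    Edit edits[MAX_EDITS];
    size_t len;
} Patch;

/* Return a copy of `parent` with one new random edit appended.
   The parent is passed by value, so it is left unchanged. */
Patch mutate(Patch parent, size_t n_stmts)
{
    assert(n_stmts > 0);
    Patch child = parent;
    if (child.len < MAX_EDITS) {
        Edit e;
        e.kind = (EditKind)(rand() % 3);
        e.target = (size_t)rand() % n_stmts;  /* uniform: a simplification */
        e.donor = (size_t)rand() % n_stmts;
        child.edits[child.len++] = e;
    }
    return child;
}
```

Starting from an empty patch and calling `mutate` a couple of times gives the "new individual from a couple of random edits" variant mentioned in the bullet.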

  18. SEARCH SPACE: FAULT LOCALIZATION Hypothesis: statements executed only by the failing test case(s) should be weighted more heavily than those also executed by the passing test cases.

  19. FAULT LOCALIZATION • Instrument the program to record lines visited during tests • The positive test case gcd(1071,1029) visits lines 2–3 and 6–13 • The negative test case gcd(0,55) visits lines 2–5, 6–7, and 9–10 • When selecting portions of the program to modify, we favor those that were visited during the negative test case and were not also visited during the positive one • In this example, repairs are focused on lines 4–5 • This particular fault localization heuristic (custom for this paper) turned out not to be very good in the long run. We return to this later.
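
The weighting rule on this slide can be sketched with the coverage data it quotes. The weight values 1.0 and 0.1 and the helper names `mark_range` and `localize` are assumptions for illustration; the slide only states which statements are favored, not by how much.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed weights: full weight for lines reached only by the negative
   test, a small weight for lines also reached by the positive test. */
#define W_NEG_ONLY 1.0
#define W_SHARED   0.1

/* Mark lines lo..hi (inclusive, 1-based) as visited. */
void mark_range(bool *visited, size_t lo, size_t hi)
{
    assert(lo >= 1 && lo <= hi);
    for (size_t i = lo; i <= hi; i++)
        visited[i] = true;
}

/* Fill weight[1..n_lines] from negative and positive coverage. */
void localize(const bool *neg, const bool *pos, double *weight, size_t n_lines)
{
    for (size_t i = 1; i <= n_lines; i++) {
        if (!neg[i])
            weight[i] = 0.0;         /* never on the failing path */
        else if (pos[i])
            weight[i] = W_SHARED;    /* on both paths */
        else
            weight[i] = W_NEG_ONLY;  /* only on the failing path */
    }
}
```

With the slide's coverage (positive: lines 2–3 and 6–13; negative: lines 2–5, 6–7, and 9–10), only lines 4 and 5 receive the full weight, which agrees with the statement that repairs are focused on lines 4–5.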

  20. >

           1  void gcd(int a, int b) {
           2    if (a == 0) {
           3      printf("%d", b);
           4    }
           5    while (b > 0) {
           6      if (a > b)
           7        a = a - b;
           8      else
           9        b = b - a;
          10    }
          11    printf("%d", a);
          12    return;
          13  }

  21. Console session on the same gcd listing as slide 20: • > gcd(4,2) prints 2 • > gcd(1071,1029) prints 21 • > gcd(0,55) prints 55, then loops forever.

  22. Trace of gcd(0,55) on the same gcd listing as slide 20: • Line 1: a=0, b=55 • Line 2: if (a == 0) is true • Line 3: printf("%d", b) prints 55 • Line 5: with a=0 and b=55, while (b > 0) is true • Line 6: if (a > b) is false • Line 9: b = b - a computes b = 55 - 0, so b never changes and the loop never terminates.
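
To turn "looping forever" into something a test case can observe, the loop can be given an iteration budget. The sketch below keeps the logic of the slide's gcd unchanged, bug included; the function name `gcd_bounded`, the `budget` parameter, and the return-value convention are hypothetical additions for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Same logic as the slide's gcd, plus an iteration budget.
   Returns the value printed at the end, or -1 if the budget
   runs out (treated as "looping forever"). */
int gcd_bounded(int a, int b, int budget)
{
    assert(budget > 0);
    if (a == 0) {
        printf("%d", b);   /* the slide's code falls through after this */
    }
    while (b > 0) {
        if (budget-- == 0)
            return -1;
        if (a > b)
            a = a - b;
        else
            b = b - a;     /* with a == 0 this leaves b unchanged */
    }
    printf("%d", a);
    return a;
}
```

Under this sketch, `gcd_bounded(4, 2, 1000)` and `gcd_bounded(1071, 1029, 1000)` finish normally, while `gcd_bounded(0, 55, 1000)` exhausts the budget, which is how the negative test case would be reported as failing.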

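  23. Input: [figure: abstract syntax tree of gcd] Root {block} with children if(a==0), while(b>0), printf(a), return. The second level holds three {block} nodes under if(a==0) and while(b>0), containing printf(b) and if(a>b). if(a>b) has two {block} children containing a = a - b and b = b - a. Legend: High change probability; Low change probability; Not changed.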

  24. Input: [figure: abstract syntax tree of gcd with nodes {block}, if(a==0), while(b>0), printf(a), return, printf(b), if(a>b), a = a - b, b = b - a] An edit is: • Insert statement X after statement Y • Replace statement X with statement Y • Delete statement X

  25. Input: [figure: abstract syntax tree of gcd with nodes {block}, if(a==0), while(b>0), printf(a), return, printf(b), if(a>b), a = a - b, b = b - a] An edit is: • Insert statement X after statement Y • Replace statement X with statement Y • Delete statement X

  26. Input: [figure: abstract syntax tree of gcd with nodes {block}, if(a==0), while(b>0), printf(a), return, printf(b), if(a>b), a = a - b, b = b - a, plus an additional return node] An edit is: • Insert statement X after statement Y • Replace statement X with statement Y • Delete statement X

  27. [Figure: search loop with stages INPUT, EVALUATE FITNESS, DISCARD, ACCEPT, OUTPUT, MUTATE]

  28. MOTIVATING EXAMPLE (CONT.) • Consider the following program variant: gcd_2(1071,1029) produces 1029 instead of 21 • Thus, the variants must pass the negative test case while retaining other core functionality • This is enforced through positive test cases

  29. [Figure: search loop with stages INPUT, EVALUATE FITNESS, DISCARD, ACCEPT, OUTPUT, MUTATE]

  30. FITNESS FUNCTION • The fitness function returns a number indicating the acceptability of the program • We first compile the variant’s AST to an executable program • Then record which test cases are passed by that executable • A program variant that does not compile has fitness zero • 32.19% of variants failed to compile in our experiment • The weights W_PosT and W_NegT should be positive values
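
One way to read these bullets is as a weighted count of passed test cases. The sketch below assumes that form, which the slide text does not spell out, and assumes the weight values 1.0 and 10.0; the function name `fitness` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed weights; the slide only requires them to be positive. */
#define W_POS_T 1.0
#define W_NEG_T 10.0

/* Fitness of one variant: zero if it does not compile, otherwise a
   weighted count of the positive and negative test cases it passes. */
double fitness(bool compiles,
               const bool *pos_passed, size_t n_pos,
               const bool *neg_passed, size_t n_neg)
{
    assert(W_POS_T > 0.0 && W_NEG_T > 0.0);
    if (!compiles)
        return 0.0;
    size_t pos = 0, neg = 0;
    for (size_t i = 0; i < n_pos; i++)
        pos += pos_passed[i] ? 1 : 0;
    for (size_t i = 0; i < n_neg; i++)
        neg += neg_passed[i] ? 1 : 0;
    return W_POS_T * (double)pos + W_NEG_T * (double)neg;
}
```

With these assumed weights, a variant that passes the negative test but breaks one of two positive tests scores 11, above the unrepaired program's 2 but below the 12 of a variant that passes everything.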

  31. PATCH MINIMIZATION • exit(0) is inserted correctly • a = a - b in line 5 is extraneous • Patch minimization (by search, delta debugging)
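
A simple stand-in for the minimization step is to try dropping each edit in turn and keep the drop whenever all tests still pass. This greedy one-at-a-time search is a simplification, not delta debugging proper. The names `minimize`, `PassesAll`, and `demo_passes_all`, and the integer edit ids, are hypothetical; the demo oracle simply encodes the slide's example, where the inserted exit(0) is required and the inserted a = a - b is not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ids for the two edits in the slide's example patch. */
enum { INSERT_EXIT = 1, INSERT_A_MINUS_B = 2 };

/* Oracle: does the program with these edits applied pass every test? */
typedef bool (*PassesAll)(const int *edit_ids, size_t n);

/* Demo oracle for the slide's example: only the exit(0) edit matters. */
bool demo_passes_all(const int *edit_ids, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (edit_ids[i] == INSERT_EXIT)
            return true;
    return false;
}

/* Drop every edit whose removal keeps all tests passing.
   Edits are kept in order; returns the new patch length. */
size_t minimize(int *edit_ids, size_t n, PassesAll passes_all)
{
    size_t i = 0;
    while (i < n) {
        int removed = edit_ids[i];
        for (size_t j = i; j + 1 < n; j++)      /* remove edit i */
            edit_ids[j] = edit_ids[j + 1];
        if (passes_all(edit_ids, n - 1)) {
            n--;                                 /* edit was extraneous */
        } else {
            for (size_t j = n - 1; j > i; j--)   /* edit is needed: restore */
                edit_ids[j] = edit_ids[j - 1];
            edit_ids[i] = removed;
            i++;
        }
    }
    return n;
}
```

Each candidate removal costs one full test-suite run, so this sketch needs a number of runs linear in the patch length.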

  32. CLAIMS GenProg can generically fix a variety of bugs in real programs without a priori knowledge. GenProg is human competitive in both expressive power and actual cost.

  33. Results by program:

      | Program | Description | LOC | Bug Type | Time (s) |
      |---|---|---|---|---|
      | gcd | example | 22 | infinite loop | 153 |
      | nullhttpd | webserver | 5575 | heap buffer overflow (code) | 578 |
      | zune | example | 28 | infinite loop | 42 |
      | uniq | text processing | 1146 | segmentation fault | 34 |
      | look-u | dictionary lookup | 1169 | segmentation fault | 45 |
      | look-s | dictionary lookup | 1363 | infinite loop | 55 |
      | units | metric conversion | 1504 | segmentation fault | 109 |
      | deroff | document processing | 2236 | segmentation fault | 131 |
      | indent | code processing | 9906 | infinite loop | 546 |
      | flex | lexical analyzer generator | 18774 | segmentation fault | 230 |
      | openldap | directory protocol | 292598 | non-overflow denial of service | 665 |
      | ccrypt | encryption utility | 7515 | segmentation fault | 330 |
      | lighttpd | webserver | 51895 | heap buffer overflow (vars) | 394 |
      | atris | graphical game | 21553 | local stack buffer exploit | 80 |
      | php | scripting language | 764489 | integer overflow | 56 |
      | wu-ftpd | FTP server | 67029 | format string vulnerability | 2256 |
      | leukocyte | computational biology | 6718 | segmentation fault | 360 |
      | tiff | image processing | 84067 | segmentation fault | 108 |
      | imagemagick | image processing | 450516 | wrong output | 2160 |

  34. CONCLUSIONS GenProg: scalable, generic, expressive automatic bug repair • Genetic programming search for a patch that addresses a given bug. • Render the search tractable by restricting the search space intelligently. It works! • Fixes a variety of bugs in a variety of programs. • Repaired 60 of 105 bugs for < $8 each, on average. Benchmarks/results/source code/VM images available: • http://genprog.cs.virginia.edu

  35. WHAT COULD’VE GONE WRONG? • What if we write a new test case? What do we do about that? • Machine learning folks have known for years that minimization does not affect quality positively: model size can be independent of degree of overfitting. How could we evaluate overfitting?

  37. SEMFIX: PROGRAM REPAIR VIA SEMANTIC ANALYSIS SemFix: program repair via semantic analysis. In Proceedings of the 2013 International Conference on Software Engineering (ICSE '13)

  38. REPAIRING PROGRAMS WITH SEMANTIC CODE SEARCH Yalin Ke (Iowa State), Kathryn T. Stolee (Iowa State), Claire Le Goues (Carnegie Mellon), Yuriy Brun (UMass Amherst). Y. Ke, K. T. Stolee, C. Le Goues, and Y. Brun. Repairing Programs with Semantic Code Search. In ASE ’15

  39. OVERFITTING Does the patch generalize beyond the test cases used to create it? Edward K. Smith, Earl Barr, Claire Le Goues, and Yuriy Brun, Is the Cure Worse than the Disease? Overfitting in Automated Program Repair, ESEC/FSE 2015.
