Occam: Automated Software Winnowing

Gregory Malecha (Harvard), Ashish Gehani (SRI), Natarajan Shankar (SRI)

A story of success...

Software engineering has been so successful that it's easy to write incredibly complex pieces of code.

Software engineering makes things deceptively simple:

MiniBlog: "simple" PHP blogging application
    683 lines of PHP code
    Depends on PHP & MySQL
PHP: programming language interpreter
    625,000 lines of C
    Depends on LibC
LibC: C standard runtime library
    366,000 lines of C

Where could there possibly be a bug?

Outline

1. Reduce the "functionality" of a system
   thttpd is a simple webserver. It doesn't need to be able to listen on arbitrary ports.
   Make configuration options static.
2. Overcome static analysis limitations
   MiniBlog should never send email, so that functionality should not be in the system.
   We need to cut it out, since mail is in the PHP standard library (compiled into the interpreter!).
3. Monitor systems and enforce dynamic policies
   Log function calls as the program runs.
   Check security properties.

Winnowing: Reducing Functionality

Program: thttpd
Size: 11,322 lines

Problems
    Uses potentially dangerous functions like listen, connect, etc.
    Reads configuration data from the command line.
Solutions
    Limit the ways that dangerous functions can be called.
    Compile configuration data into the program.

Winnowing a Single Module: Module Winnowing Overview

[Figure: pipeline from main.c to a.out. (0) compile (llvm) produces main.bc; winnowing applies (1) "partial evaluation", (2) specialize, and (3) reduction to main.bc; (4) link produces a.out.]

Winnowing a Single Module: (1) "Partial Evaluation"

Simplify the program as much as possible in order to expose constants.

Before:
    foo(int x, int y) {
      bar(x, 1 + 2);
      bar(2*5, y);
    }
    bar(int a, int b)
    { ...a...b... }

After:
    foo(int x, int y) {
      bar(x, 3);
      bar(10, y);
    }
    bar(int a, int b)
    { ...a...b... }

Use LLVM's -O3.

Winnowing a Single Module: (2) Specialization

Specialize functions when they take constant arguments.

Before:
    foo(int x, int y) {
      bar(x, 3);
      bar(10, y);
    }
    bar(int a, int b)
    { ...a...b... }

After:
    foo(int x, int y) {
      bar'(x);
      bar''(y);
    }
    bar'(int a)
    { ...a...3... }
    bar''(int b)
    { ...10...b... }
    bar(int a, int b)
    { ...a...b... }

Duplicate functions and inline constants using a custom LLVM pass.
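
The effect of specialization can be imitated by hand in plain C. The sketch below is only an illustration, not output of the LLVM pass: the body of bar is a hypothetical stand-in (the slides elide it), and bar_b3 and bar_a10 are hand-written clones playing the roles of bar' and bar''.

```c
#include <assert.h>

/* Hypothetical body: the slides do not show what bar computes. */
static int bar(int a, int b) { return a * b + b; }

/* Clone of bar with b fixed to 3 (the role of bar' on the slide). */
static int bar_b3(int a) { return a * 3 + 3; }

/* Clone of bar with a fixed to 10 (the role of bar'' on the slide). */
static int bar_a10(int b) { return 10 * b + b; }

/* foo before specialization: calls the general bar with constant arguments. */
static int foo_general(int x, int y) { return bar(x, 3) + bar(10, y); }

/* foo after specialization: every call site targets a clone,
 * so the general bar no longer has any caller. */
static int foo_specialized(int x, int y) { return bar_b3(x) + bar_a10(y); }

/* Checks on a small grid of inputs that specialization kept the behavior. */
static void check_specialization(void) {
    for (int x = -5; x <= 5; x++)
        for (int y = -5; y <= 5; y++)
            assert(foo_general(x, y) == foo_specialized(x, y));
}
```

Once foo calls only the clones, the original bar is unreferenced, which is what makes the later reduction step able to drop it.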

Winnowing a Single Module: (3) Reduction

Eliminate unused code.

Before:
    foo(int x, int y) {
      bar'(x);
      bar''(y);
    }
    bar'(int a)
    { ...a...3... }
    bar''(int b)
    { ...10...b... }
    bar(int a, int b)
    { ...a...b... }

After:
    foo(int x, int y) {
      bar'(x);
      bar''(y);
    }
    bar'(int a)
    { ...a...3... }
    bar''(int b)
    { ...10...b... }

LLVM dead-code/global elimination pass.

Making Dynamics into Statics: Specializing PHP

Program: MiniBlog & PHP interpreter
Size: 1,000 lines of PHP & 625,000 lines of C

Problems
    The PHP interpreter provides unnecessary functions (dead code).
    Some of these functions are potentially dangerous, e.g. system and mail.
Solutions
    Remove unnecessary functions from the PHP interpreter binary.
    Force PHP to exit when dangerous functions are called.

Making Dynamics into Statics: Function Transformations

1. "Statically analyze" the PHP code and determine the functions that it will call.
   For relatively static applications this can be done with a grep-like static analysis.
   MiniBlog requires about 46 PHP functions out of the 1028 functions that a minimal PHP install would have.
2. Implement a transformation that replaces the unused functions with a simple exit(1).
   Winnow the result to remove all the unnecessary code.
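
A grep-like scan of this kind could look roughly like the following C sketch. It is not Occam's actual analysis: is_called and mark_used are hypothetical helper names, and the matching rule (an identifier immediately followed by an opening parenthesis) is an assumption that ignores comments, string literals, and dynamic calls.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if `name` occurs in `src` as a whole identifier followed by '('. */
static int is_called(const char *src, const char *name) {
    size_t n = strlen(name);
    assert(n > 0); /* an empty name would match everywhere */
    for (const char *p = strstr(src, name); p != NULL; p = strstr(p + 1, name)) {
        /* Reject matches inside a longer identifier, e.g. "my_mail(". */
        int at_start = (p == src) ||
                       !(isalnum((unsigned char)p[-1]) || p[-1] == '_');
        const char *q = p + n;
        while (*q == ' ' || *q == '\t')
            q++; /* allow "mail (" */
        if (at_start && *q == '(')
            return 1;
    }
    return 0;
}

/* Sets keep[i] to 1 for every builtin the script appears to call.
 * Returns how many builtins were marked; the rest are candidates for stubbing. */
static size_t mark_used(const char *src, const char *const *builtins,
                        size_t count, int *keep) {
    size_t used = 0;
    for (size_t i = 0; i < count; i++) {
        keep[i] = is_called(src, builtins[i]);
        used += (size_t)keep[i];
    }
    return used;
}
```

Every builtin left unmarked by mark_used would then be a candidate for replacement with exit(1) in step 2. A scan this simple only suits "relatively static" applications, as the slide notes.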

Making Dynamics into Statics: Specifying Rewrites

We can specify stubs the same way that we refer to specializations.

Remove system function:
    zif_system(?) => fail

fail is a keyword meaning call exit(1).
Question marks specify wildcard arguments; here we stub all calls to zif_system.
Integer constants are also supported, so we can reject some calls but not others.
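
How a rule with wildcards and integer constants selects calls can be sketched as a small matcher. This is a guess at the semantics described on the slide, not Occam's implementation: the struct layout, rule_matches, and the assumption that call-site arguments are known integers are all hypothetical, as is the set_mode function used in the usage note.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ARGS 4

/* One argument pattern: a wildcard ("?") or a required integer constant. */
struct arg_pattern {
    int is_wildcard;
    long value; /* ignored when is_wildcard is nonzero */
};

/* One rule: calls to `name` whose arguments match `args` are rewritten to fail. */
struct rewrite_rule {
    const char *name;
    size_t nargs;
    struct arg_pattern args[MAX_ARGS];
};

/* Returns 1 when the call name(argv[0], ..., argv[argc-1]) is covered by the rule. */
static int rule_matches(const struct rewrite_rule *rule, const char *name,
                        const long *argv, size_t argc) {
    if (strcmp(rule->name, name) != 0 || rule->nargs != argc)
        return 0;
    for (size_t i = 0; i < argc; i++) {
        if (!rule->args[i].is_wildcard && rule->args[i].value != argv[i])
            return 0; /* a constant position disagrees with the call site */
    }
    return 1;
}
```

Under these assumptions, the slide's rule zif_system(?) => fail corresponds to {"zif_system", 1, {{1, 0}}}, which matches every one-argument call. A rule such as {"set_mode", 2, {{1, 0}, {0, 0}}} would reject only calls whose second argument is the constant 0.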

Making Dynamics into Statics: Rewriting Code

Small transformation pass to replace function bodies.

Before:
    ... zif_system ...
    zif_system(char* cmd)
    { system(cmd); }
    system(char* cmd)
    { libc code }

After:
    ... zif_system ...
    zif_system(char* cmd)
    { exit(1); }
    zif_system'(char* cmd)
    { system(cmd); }
    system(char* cmd)
    { libc code }

Implemented as a custom LLVM transformation pass.

Making Dynamics into Statics: Reusing the Winnowing Hammer

Remove dead code using winnowing.

Before:
    ... zif_system ...
    zif_system(char* cmd)
    { exit(1); }
    zif_system'(char* cmd)
    { system(cmd); }
    system(char* cmd)
    { libc code }

After:
    ... zif_system ...
    zif_system(char* cmd)
    { exit(1); }

Reduce to an already solved problem!

Monitoring

Monitor systems and enforce dynamic policies
    Log function calls as the program runs.
    Check security properties.
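
The slides give no detail on how monitoring is implemented, so the following C sketch only illustrates the two bullets in the simplest possible form: a bounded in-memory call log and one example property checked against it. The names call_log, log_call, and never_called, and the choice of a "this function is never called" property, are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

#define LOG_CAPACITY 64

/* Bounded log of the names of the functions called so far. */
struct call_log {
    const char *names[LOG_CAPACITY];
    size_t len;
};

/* Records one call; returns 0 if the log is full and the event was dropped. */
static int log_call(struct call_log *trace, const char *name) {
    if (trace->len == LOG_CAPACITY)
        return 0;
    trace->names[trace->len++] = name;
    return 1;
}

/* Example security property: `forbidden` never appears in the log. */
static int never_called(const struct call_log *trace, const char *forbidden) {
    for (size_t i = 0; i < trace->len; i++) {
        if (strcmp(trace->names[i], forbidden) == 0)
            return 0;
    }
    return 1;
}
```

In this sketch an instrumented program would call log_call at each monitored call site, and a checker would evaluate properties such as never_called(&trace, "system") over the recorded trace.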