Reassembleable Disassembling
Shuai Wang, Pei Wang, Dinghao Wu
Pennsylvania State University
24th USENIX Security Symposium, August 2015
1 / 14
Motivation
Analysing and retrofitting COTS binaries with...
- software fault isolation
- control-flow integrity
- symbolic taint analysis
- elimination of ROP gadgets
Binary rewriting comes with major drawbacks/limitations:
- runtime overhead from patching due to control-flow transfers
- patching requires PIC if code is relocated
- instrumentation significantly increases binary size
- binary reuse only works for small binaries (coverage)
2 / 14
Goal
Produce reassembleable assembly code from stripped COTS binaries in a fully automated manner.
- Allows binary-based whole program transformations
- Requires relocatable assembly code → symbolization of immediate values
- Complementary to existing work
3 / 14
Symbolization
Given an immediate value in assembly code, is it a constant or a memory address?
- Reassembling transformed program changes binary layout
- Address changes invalidate memory references
x86
- No distinction between code and data
- Variable-length instruction encoding
4 / 14
(Un)Relocatable Assembly Code

Unrelocatable:
  binary        mov 0xc0, %eax
                0xc0: 0xa08
  disassembly   .text
                  mov 0xc0, %eax
                .data
                  .long 0xa08
  assemble      mov 0xc0, %eax
                0xc0: ?

Relocatable:
  binary        mov 0xc0, %eax
                0xc0: 0xa08
  disassembly   .text
                  mov Glob, %eax
                .data
                  Glob:
                  .long 0xa08
  assemble      mov Glob, %eax
                Glob: 0xa08
5 / 14
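The step from the unrelocatable to the relocatable form can be sketched in a few lines of Python. This is a toy illustration only, not Uroboros code: the `symbolize` helper, the `labels` map, and the regular expression over AT&T-syntax operand text are all assumptions made for this example, and the sketch presumes the decision "this immediate is an address" has already been taken.

```python
import re

def symbolize(asm_lines, labels):
    """Replace absolute hex addresses that have a known label with that label.

    asm_lines: list of AT&T-syntax instruction strings (hypothetical input).
    labels:    dict mapping address (int) -> label name (hypothetical).
    """
    hex_imm = re.compile(r"\b0x[0-9a-fA-F]+\b")

    def swap(match):
        addr = int(match.group(0), 16)
        # Leave the operand untouched when no label is known for it.
        return labels.get(addr, match.group(0))

    return [hex_imm.sub(swap, line) for line in asm_lines]

# The example from the slide: 0xc0 is the address of a data word.
print(symbolize(["mov 0xc0, %eax"], {0xC0: "Glob"}))
```

Once the operand is a label, the assembler and linker recompute its address, so the reference stays valid even if the data moves.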
Types of Symbol References

Code Section                            Data Section
  fun1:                                   ptr:
    call fun2                    (c2c)      .long table        (d2d)
  fun2:                                   table:
    mov ptr, %eax                (c2d)      .long handler1     (d2c)
    lea (%eax, %ebx, 4), %ecx               .long handler2     (d2c)
    call *%ecx
  handler1: ...
  handler2: ...
6 / 14
Symbolization of c2c and c2d References
- Valid memory references point into code or data section
- Assume all immediates to be references and filter out invalid ones
7 / 14
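A minimal sketch of this filter, assuming the section boundaries are already known from the ELF headers. The function names, the half-open range representation, and the example addresses are hypothetical and chosen only for illustration; they are not taken from Uroboros.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open address range [start, end)

def in_any(addr: int, ranges: List[Range]) -> bool:
    """True if addr falls inside at least one of the given ranges."""
    return any(start <= addr < end for start, end in ranges)

def candidate_references(immediates, code_ranges, data_ranges):
    """Treat every immediate as a reference, then drop the invalid ones.

    An immediate survives only if it points into a code section (c2c)
    or a data section (c2d); everything else is kept as a constant.
    """
    kept = []
    for imm in immediates:
        if in_any(imm, code_ranges):
            kept.append((imm, "c2c"))
        elif in_any(imm, data_ranges):
            kept.append((imm, "c2d"))
    return kept

# Hypothetical layout of a small 32-bit binary.
code = [(0x8048000, 0x8049000)]
data = [(0x804A000, 0x804B000)]
for addr, kind in candidate_references([0x10, 0x8048123, 0x804A010], code, data):
    print(hex(addr), kind)
```

A constant that happens to fall inside a section range would still pass this filter, which is one way the false positives reported in the evaluation can arise.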
Symbolization of d2c and d2d References
Assumption 1: "All symbol references stored in data sections are n-byte aligned, where n is 4 for 32-bit binaries and 8 for 64-bit binaries."
  → Consider only n-byte values which are n-byte aligned
Assumption 2: "Users do not need to perform transformation on the original binary data."
  → Keep start addresses of data sections during reassembly and ignore d2d references
Assumption 3: "d2c symbol references are only used as function pointers or jump table entries."
  → References need to point to start of a function or form a jump table
8 / 14
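The three assumptions can be combined into a simple scan over a data section, sketched below. This is an illustration under stated assumptions, not the Uroboros implementation: the function name, its parameters, and the example addresses are hypothetical, the data is read as little-endian (as on x86), and only the function-pointer half of Assumption 3 is modelled (jump table detection is left out).

```python
import struct

def scan_data_for_code_refs(data, data_base, code_range, function_starts, n=4):
    """Return (address, value) pairs for data words that look like d2c references.

    data:            raw bytes of one data section
    data_base:       load address of that section
    code_range:      half-open (start, end) range of the code section
    function_starts: set of known function entry addresses
    n:               word size, 4 for 32-bit and 8 for 64-bit binaries
    """
    fmt = "<I" if n == 4 else "<Q"
    lo, hi = code_range
    hits = []
    # Assumption 1: start at the first n-byte-aligned address and step by n.
    first = (-data_base) % n
    for off in range(first, len(data) - n + 1, n):
        (value,) = struct.unpack_from(fmt, data, off)
        # Assumption 2: values pointing into data (d2d) are ignored;
        # only values pointing into the code section are considered.
        if not (lo <= value < hi):
            continue
        # Assumption 3 (function pointers only): must hit a function start.
        if value in function_starts:
            hits.append((data_base + off, value))
    return hits

section = struct.pack("<III", 0x8048100, 0x8048101, 0x12345678)
print(scan_data_for_code_refs(section, 0x804A000, (0x8048000, 0x8049000), {0x8048100}))
```

The second word in the example lies inside the code section but is not a function start, so it is treated as plain data rather than symbolized.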
Evaluation
- Uroboros: 13,209 SLOC in OCaml and Python; works with x86/x64 ELF binaries
- Intel Core i7-3770 @ 3.4GHz with 8GiB RAM running Ubuntu 12.04
- 122 programs compiled for 32- and 64-bit targets
- gcc 4.6.3 with default configuration and optimization of each program
- stripped before testing

Collection   Size   Content
COREUTILS    103    GNU Core Utilities
REAL         7      bc, ctags, gzip, mongoose, nweb, oftpd, thttpd
SPEC         12     C programs in SPEC2006
9 / 14
Architecture of Uroboros
[Figure: pipeline from Binary to Relocatable Assembly Code. Labelled components: Disassembly Module (Linear Disassembler, Disassembly Validator), Analysis Module (Symbol Lifting, Control-Flow Structure Recovery), External Analyses & Transformations, Data, Meta-Data.]
https://openclipart.org/detail/215030/
10 / 14
Correctness
Test input shipped with programs or custom test of major functionality (some of REAL)

Binaries Failing Functionality Tests
Assumption Set   32-bit                        64-bit
{}               h264ref, gcc, gobmk, hmmer    perlbench, gcc, gobmk, hmmer, sjeng, h264ref, lbm, sphinx3
{A1}             h264ref, gcc, gobmk           perlbench, gcc, gobmk
{A1, A2}         h264ref, gcc, gobmk           perlbench, gcc, gobmk
{A1, A3}         gobmk                         gcc, gobmk
{A1, A2, A3}     gobmk
11 / 14
Symbolization Errors

Table 4: Symbolization false positives of 32-bit SPEC, REAL and COREUTILS (others have zero false positives). Each cell gives FP / FP Rate for the assumption set.

Benchmark   # of Ref.   {}               {A1}            {A1, A2}        {A1, A3}       {A1, A2, A3}
perlbench   76538       2 / 0.026‰       0 / 0.000‰      0 / 0.000‰      0 / 0.000‰     0 / 0.000‰
hmmer       13127       12 / 0.914‰      0 / 0.000‰      0 / 0.000‰      0 / 0.000‰     0 / 0.000‰
h264ref     20600       27 / 1.311‰      1 / 0.049‰      1 / 0.049‰      0 / 0.000‰     0 / 0.000‰
gcc         262698      49 / 0.187‰      32 / 0.122‰     32 / 0.122‰     0 / 0.000‰     0 / 0.000‰
gobmk       65244       1348 / 20.661‰   985 / 15.097‰   912 / 13.978‰   78 / 1.196‰    5 / 0.077‰

Table 5: Symbolization false negatives of 32-bit SPEC, REAL and COREUTILS (others have zero false negatives). Each cell gives FN / FN Rate for the assumption set.

Benchmark   # of Ref.   {}               {A1}            {A1, A2}        {A1, A3}       {A1, A2, A3}
perlbench   76538       2 / 0.026‰       0 / 0.000‰      0 / 0.000‰      0 / 0.000‰     0 / 0.000‰
hmmer       13127       12 / 0.914‰      0 / 0.000‰      0 / 0.000‰      0 / 0.000‰     0 / 0.000‰
h264ref     20600       27 / 1.311‰      0 / 0.000‰      0 / 0.000‰      0 / 0.000‰     0 / 0.000‰
gcc         262698      11 / 0.042‰      0 / 0.000‰      0 / 0.000‰      0 / 0.000‰     0 / 0.000‰
gobmk       65244       86 / 1.318‰      0 / 0.000‰      0 / 0.000‰      0 / 0.000‰     0 / 0.000‰
12 / 14
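The rates in Tables 4 and 5 are given in per mille (‰). The reported figures are numerically consistent with dividing the error count by the "# of Ref." column and scaling by 1000; the short sketch below checks this for gobmk. The helper name `per_mille` is hypothetical, and the formula is inferred from the numbers rather than stated on the slides.

```python
def per_mille(count, total_refs):
    """Error rate in per mille (parts per thousand), rounded to 3 decimals."""
    return round(1000.0 * count / total_refs, 3)

# gobmk, 32-bit false positives from Table 4 (65244 references in total).
print(per_mille(1348, 65244))  # no assumptions applied
print(per_mille(5, 65244))     # all three assumptions applied
```

Even in the worst case (gobmk without any assumption), roughly 2% of references are misclassified, yet a single wrong symbol can be enough to break the reassembled binary.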
Overhead for REAL and SPEC
[Figures: Normalized Overhead (%) and Processing Time (Seconds), plotted per benchmark program; plot data not recoverable from the extracted text]
No increase in binary size after first disassemble-assemble cycle
13 / 14
Conclusion
- Heuristic-based symbolization of memory references
- Uroboros [1] provides reassembleable disassembly
- Assumes availability of raw disassembly and function starting addresses
- Tested with gcc and Clang compiled binaries
- Limited support for C++ (need to parse DWARF)
[1] Available at https://github.com/s3team/uroboros
14 / 14