Peng Li (UNC, Chapel Hill, NC, USA), Debin Gao (School of Information Systems, Singapore Management University, Singapore), Mike Reiter (UNC, Chapel Hill, NC, USA)
Outline: Background and Introduction; Overall Structure; Conversion Algorithm; Evaluation; Summary
Two ways to build a system-call-based anomaly detector:
- By statically analyzing the source code or the binary: no false alarms, but less sensitive.
- By training: better tuned to the workload where the detector operates and more sensitive, but may suffer from false alarms.
[Figure: program input and the resulting system-call sequences monitored by training-based detectors, e.g., S. A. Hofmeyr et al. and R. Sekar et al.]
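To make the training-based approach concrete, here is a minimal sketch of a sliding-window detector in the spirit of S. A. Hofmeyr et al. It is not the Execution Graph model used in this work; the window length `k` and the names `train_windows` and `anomalies` are illustrative assumptions. It also shows where false alarms come from: legitimate behavior that never appeared in the training traces is flagged.

```python
from typing import Iterable, List, Set, Tuple

Window = Tuple[int, ...]


def train_windows(traces: Iterable[List[int]], k: int) -> Set[Window]:
    """Collect every length-k window of system-call numbers seen in training."""
    normal: Set[Window] = set()
    for trace in traces:
        for i in range(len(trace) - k + 1):
            normal.add(tuple(trace[i:i + k]))
    return normal


def anomalies(trace: List[int], normal: Set[Window], k: int) -> List[int]:
    """Return the start index of every window never seen during training."""
    return [i for i in range(len(trace) - k + 1)
            if tuple(trace[i:i + k]) not in normal]


# Hypothetical training trace: system call 1 followed by 5, repeated.
normal = train_windows([[1, 5, 1, 5]], k=2)
print(anomalies([1, 5, 1], normal, k=2))  # [] (no alarm)
print(anomalies([1, 3], normal, k=2))     # [0] (alarm on window (1, 3))
```

If the program can legitimately issue 1 then 3 but did not do so during training, the second call is a false alarm: the model is only as complete as its training data.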
When the program is patched, the trained model must be updated. Two options:
1. Rebuild the model by collecting traces of the updated program. Problems: (a) setting up a sanitized environment free of attacks; (b) setting up an environment as similar as possible to the one in which the updated program will run; (c) multiple such environments may be needed.
2. Adapt the old model to the changes induced by the patch (the approach taken here).
The approach combines three ingredients: I. a trained Execution Graph model; II. a white-box (static analysis) technique; III. a binary difference analyzer.
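Ingredient I: Execution Graph (trained model), illustrated on a small program (X.n denotes line n of function X):

    main () {
    1:  int a = 2;
    2:  f(a);
    }

    void f (int x) {
    1:  sys_call (1);
    2:  if (x == 1)
    3:      sys_call (3);
    4:  else if (x == 2)
    5:      sys_call (5);
    }

Call stacks observed during training (a = 2):
Call stack 1: main.2, f.1, sys1
Call stack 2: main.2, f.5, sys5
Resulting execution graph: in main(), node main.2; in f(), nodes f.1 (system call 1) and f.5 (system call 5), with an edge from f.1 to f.5.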
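The sketch below shows, in simplified form, how consecutive call stacks such as the two above can be turned into graph nodes and edges: every frame becomes a node, and two consecutive observations are linked at the first frame where their stacks differ. This is an illustrative assumption, not the exact construction from the paper (which handles calls and returns more carefully); `build_graph` is a hypothetical name, and the system call itself (sys1, sys5) is left out of the stacks.

```python
from typing import List, Set, Tuple

# A call stack lists frames outermost first, e.g. ["main.2", "f.1"]:
# "f.1" is line 1 of f(), the site that issued the system call.
Stack = List[str]


def build_graph(stacks: List[Stack]) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Turn consecutive call-stack observations into nodes and edges (simplified)."""
    nodes: Set[str] = set()
    edges: Set[Tuple[str, str]] = set()
    for stack in stacks:
        nodes.update(stack)
    for prev, cur in zip(stacks, stacks[1:]):
        # Length of the common prefix: the callers both observations share.
        i = 0
        while i < min(len(prev), len(cur)) and prev[i] == cur[i]:
            i += 1
        # Link the first frames that differ; they sit in the same calling context.
        if i < len(prev) and i < len(cur):
            edges.add((prev[i], cur[i]))
    return nodes, edges


nodes, edges = build_graph([["main.2", "f.1"], ["main.2", "f.5"]])
print(sorted(edges))  # [('f.1', 'f.5')]
```

Because both stacks share main.2, the only new edge is inside f(), from f.1 to f.5, matching the graph described above.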
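Ingredient II: White-box technique (static analysis) applied to the same program. Each function gets enter and exit nodes: main() contains main.2; f() contains f.1 (system call 1), from which either branch, f.3 (system call 3) or f.5 (system call 5), may follow. Because static analysis cannot tell that x is always 2, the model also accepts system call 3, which never executes; this is why statically built models are less sensitive.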
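A small sketch can contrast the two models of f(). Both are written by hand as adjacency sets whose nodes are named by system-call number, an assumption that only works because each number occurs once in f(); real Execution Graphs also carry call-stack context. The static model includes the path that skips both branches, which is our reading of the code when x is neither 1 nor 2.

```python
from typing import Dict, List, Set

# Each model maps a node to its successors; nodes other than "enter" and
# "exit" are named by the system call issued there.
Model = Dict[str, Set[str]]

# Static (white-box) model of f(): every path the code allows, including the
# path where x is neither 1 nor 2 and f() returns after system call 1.
STATIC_F: Model = {"enter": {"1"}, "1": {"3", "5", "exit"},
                   "3": {"exit"}, "5": {"exit"}}

# Trained model of f(): only what was observed when f() ran with x = 2.
TRAINED_F: Model = {"enter": {"1"}, "1": {"5"}, "5": {"exit"}}


def accepts(model: Model, syscalls: List[int]) -> bool:
    """True if the model has a path enter -> each syscall in order -> exit."""
    node = "enter"
    for s in syscalls:
        if str(s) not in model.get(node, set()):
            return False
        node = str(s)
    return "exit" in model.get(node, set())


print(accepts(STATIC_F, [1, 3]), accepts(TRAINED_F, [1, 3]))  # True False
print(accepts(STATIC_F, [1, 5]), accepts(TRAINED_F, [1, 5]))  # True True
```

The sequence 1, 3 is accepted statically but rejected by the trained model: the static model cannot raise a false alarm on real behavior, but it also misses an attack that mimics that path.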
Ingredient III: BinHunt [Gao, Reiter, Song, ICICS08], a novel technique for finding semantic differences in binary programs.
- Computes the maximum common induced subgraphs between control flow graphs.
- Maximum match per pair of functions.
- Maximum match between the two programs (patched and unpatched).
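As a rough illustration of "maximum common induced subgraph", the sketch below searches exhaustively for the largest mapping between the nodes of two small control flow graphs such that matched nodes have exactly the same edges among them in both graphs. It is an assumption-laden toy, not BinHunt: it ignores the contents of the basic blocks, which a semantic differ must also compare, and its backtracking is exponential, so it only suits tiny graphs. The graphs and the name `mcis` are hypothetical.

```python
from typing import Dict, Set, Tuple

Edges = Set[Tuple[int, int]]  # directed CFG edges between basic-block ids


def mcis(n1: int, e1: Edges, n2: int, e2: Edges) -> Dict[int, int]:
    """Largest node mapping G1 -> G2 whose matched nodes induce identical edges.

    Exhaustive backtracking: exponential, only for tiny illustrative graphs.
    """
    best: Dict[int, int] = {}

    def consistent(mapping: Dict[int, int], u: int, x: int) -> bool:
        # Induced-subgraph condition: for every already-matched pair (v, y), and
        # for (u, x) itself, an edge exists in G1 exactly when it exists in G2.
        for v, y in list(mapping.items()) + [(u, x)]:
            if ((u, v) in e1) != ((x, y) in e2):
                return False
            if ((v, u) in e1) != ((y, x) in e2):
                return False
        return True

    def search(u: int, mapping: Dict[int, int], used: Set[int]) -> None:
        nonlocal best
        if len(mapping) + (n1 - u) <= len(best):
            return  # even matching every remaining node cannot beat best
        if u == n1:
            best = dict(mapping)
            return
        for x in range(n2):
            if x not in used and consistent(mapping, u, x):
                mapping[u] = x
                used.add(x)
                search(u + 1, mapping, used)
                del mapping[u]
                used.discard(x)
        search(u + 1, mapping, used)  # option: leave node u unmatched

    search(0, {}, set())
    return best


# Unpatched CFG: a diamond. Patched CFG: block 4 inserted on one branch.
OLD = {(0, 1), (0, 2), (1, 3), (2, 3)}
NEW = {(0, 1), (0, 2), (1, 3), (2, 4), (4, 3)}
print(len(mcis(4, OLD, 4, OLD)))  # 4: identical graphs match completely
print(len(mcis(4, OLD, 5, NEW)))  # 3: the patched branch breaks the match
```

The unmatched blocks are exactly where a patch-aware conversion cannot simply reuse the old model.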
Overall Structure
[Figure: overall structure. The BinHunt diff, CFG pieces, and the old Execution Graph are the inputs to the conversion.]
Conversion Algorithm
The conversion proceeds in three steps:
1. For each pair of matched functions, copy the nodes and edges of the old Execution Graph.
2. For functions with no match, and for the unmatched portions of matched functions, resort to static analysis.
3. Convert the edges that connect functions (calls and returns).
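A minimal sketch of step 1 follows, under stated assumptions: old Execution Graph nodes are identified by hypothetical instruction addresses, and `match` is a hypothetical dictionary standing in for the block-level matching reported by BinHunt. The sketch copies only what the match covers and returns the remaining old nodes for step 2; it does not handle step 3 (call and return edges) or the cases where simple copying is insufficient.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def copy_matched(old_nodes: Set[int], old_edges: Set[Edge],
                 match: Dict[int, int]) -> Tuple[Set[int], Set[Edge], List[int]]:
    """Copy old-EG nodes and edges into the new binary through a block match.

    `match` maps an old address to its matched new address. Nodes without a
    match are returned separately, to be handled by static analysis.
    """
    new_nodes = {match[n] for n in old_nodes if n in match}
    # An edge is copied only when both of its endpoints were matched.
    new_edges = {(match[a], match[b]) for a, b in old_edges
                 if a in match and b in match}
    unmatched = sorted(n for n in old_nodes if n not in match)
    return new_nodes, new_edges, unmatched


# Hypothetical addresses: 0x30 sits in code that the patch changed.
nodes, edges, todo = copy_matched({0x10, 0x20, 0x30},
                                  {(0x10, 0x20), (0x20, 0x30)},
                                  {0x10: 0x110, 0x20: 0x120})
print([hex(n) for n in sorted(nodes)], [hex(n) for n in todo])
# ['0x110', '0x120'] ['0x30']
```

The edge into 0x30 is dropped rather than guessed, leaving that region to static analysis.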
When simple copying doesn't work. [Figure: matched parts of two functions, f() and g(), built from the blocks call f'()/call g'(), jz, syscall3, syscall4, a non-call block, and call f''()/call g''(); a "?" marks the edge in question.]
"Extended similarity". [Figure: functions f() and g() with their callees f'(), f''() and g'(), g''(); each callee is drawn from enter to exit with the system calls (3, 4) it makes.] Please see the proof in the paper.
- Are the properties of the Execution Graph model preserved after the conversion?
- Will the converted model raise false alarms on the language accepted by the trained model?
- How can the output of the binary difference analyzer be used?
- …
Evaluation
tar: the old version has an input validation error. The patched part only involves a function call that makes no system calls, so the EG is unchanged.
ncompress: the old version misses a boundary check. The new version adds an error-message printing branch, so the converted EG expands slightly.
ProFTPD: cross-site request forgery is possible. The patch adds input validation checks, and the converted EG expands slightly.
unzip: the old version may pass invalid pointers and potentially execute arbitrary code. The patch changes four functions, so nodes and edges increase more significantly.
            Copied          Not copied
            Nodes  Edges    Nodes        Edges
tar         478    1430     0 (0%)       0 (0%)
ncompress   151    489      3 (1.9%)     23 (4.5%)
ProFTPD     775    1850     6 (0.7%)     28 (1.5%)
unzip       374    1004     50 (11.8%)   195 (16.3%)
Statistics for nodes and edges in the converted execution graph (percentages are relative to the total nodes or edges of the converted EG)
            Old binary          New binary
            Old EG (trained)    New EG (converted)           New EG (trained)
            Nodes  Edges        Nodes  Edges  Time (sec)     Nodes  Edges
tar         478    1430         478    1430   14.5           478    1430
ncompress   151    489          154    512    13.1           151    489
ProFTPD     775    1850         781    1878   17.4           776    1853
unzip       374    1004         424    1199   41.6           377    1017
Statistics for the size comparison and algorithm efficiency
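The two tables can be cross-checked with a few lines of arithmetic: in every row, the copied nodes and edges equal the old (trained) EG, and copied plus not copied equals the converted EG. The values below are taken from the tables as given; `ROWS` and `consistent` are illustrative names.

```python
from typing import Dict, Tuple

Pair = Tuple[int, int]  # (nodes, edges)

# Values from the two tables above.
ROWS: Dict[str, Dict[str, Pair]] = {
    "tar":       {"old": (478, 1430), "copied": (478, 1430),
                  "not_copied": (0, 0), "converted": (478, 1430)},
    "ncompress": {"old": (151, 489), "copied": (151, 489),
                  "not_copied": (3, 23), "converted": (154, 512)},
    "ProFTPD":   {"old": (775, 1850), "copied": (775, 1850),
                  "not_copied": (6, 28), "converted": (781, 1878)},
    "unzip":     {"old": (374, 1004), "copied": (374, 1004),
                  "not_copied": (50, 195), "converted": (424, 1199)},
}


def consistent(row: Dict[str, Pair]) -> bool:
    """Copied part equals the old EG, and copied + not copied equals the converted EG."""
    sums = tuple(c + n for c, n in zip(row["copied"], row["not_copied"]))
    return row["copied"] == row["old"] and sums == row["converted"]


for name, row in ROWS.items():
    print(name, consistent(row))  # True for every row
```

In other words, the whole old model carried over, and the conversion only added nodes and edges.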
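Relationship between the models (please see the proof in the paper):
$L_{\mathrm{trained}} \subseteq L_{\mathrm{converted}} \subseteq L_{\mathrm{CFG}}$
where $L_{\mathrm{trained}}$ is the set of system call sequences accepted by the model trained on the new binary, $L_{\mathrm{converted}}$ is the set accepted by the converted model, and $L_{\mathrm{CFG}}$ is the set of system call sequences obtained by analyzing the CFG. The second inclusion is proved; the first is observed empirically.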
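The relationship between the three sets of system call sequences on this slide can be illustrated by enumerating, up to a length bound, the sequences each model accepts and comparing the sets. The three toy models below are hypothetical (they are not produced by the conversion algorithm), and a bounded enumeration only samples the languages; it is no substitute for the proof.

```python
from typing import Dict, List, Set, Tuple

Model = Dict[str, Set[str]]  # node -> successors; "enter"/"exit" are special


def accepted(model: Model, max_len: int) -> Set[Tuple[str, ...]]:
    """All system-call sequences of length <= max_len leading from enter to exit."""
    out: Set[Tuple[str, ...]] = set()
    stack: List[Tuple[str, Tuple[str, ...]]] = [("enter", ())]
    while stack:
        node, seq = stack.pop()
        for nxt in model.get(node, set()):
            if nxt == "exit":
                out.add(seq)
            elif len(seq) < max_len:
                stack.append((nxt, seq + (nxt,)))
    return out


# Hypothetical toy models; nodes are named by the system call issued.
TRAINED = {"enter": {"1"}, "1": {"5"}, "5": {"exit"}}
CONVERTED = {"enter": {"1"}, "1": {"5", "exit"}, "5": {"exit"}}
CFG = {"enter": {"1"}, "1": {"3", "5", "exit"}, "3": {"exit"}, "5": {"exit"}}

k = 4
print(accepted(TRAINED, k) <= accepted(CONVERTED, k) <= accepted(CFG, k))  # True
```

Python's `<=` on sets tests the subset relation, so the chained comparison checks both inclusions at once.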
Summary
- An approach to adapt a trained anomaly detector to software patches.
- An algorithm for the conversion.
- We show that the algorithm is sound (proof): the behavior accepted by the converted detector is consistent with the static analysis of the binary.
- Empirically, the converted detector did not raise alarms on the behavior accepted by the detector trained on the new binary.
Contact: pengli@cs.unc.edu