


  1. Peng Li (UNC, Chapel Hill, NC, USA); Debin Gao (School of Information Systems, Singapore Management University, Singapore); Mike Reiter (UNC, Chapel Hill, NC, USA)

  2. Background and Introduction; Overall Structure; Conversion Algorithm; Evaluation; Summary

  3. Building an anomaly detector's model of normal behavior:
     - By statically analyzing the source code or the binary
       - No false alarms
       - Less sensitive
     - By training
       - Better tuned to the workload where the detector operates; more sensitive
       - May suffer from false alarms

  4. [Figure: program input leading to sequences of system calls (syscall …), as modeled in prior work] [S. A. Hofmeyr et al.] [R. Sekar et al.]

  5. When the program is patched:
     - Rebuild the model by collecting traces of the updated program. Problems:
       1. Setting up a sanitized environment free of attacks
       2. Setting up an environment as similar as possible to the one in which the updated program will run
       3. Possibly needing multiple such environments
     - Adapt the old model to the changes induced by the patch

  6. [Figure: the approach's three ingredients (Ingredient I, II, and III) and a binary difference ("Bin Diff") analyzer]

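  7. Call stacks observed for an example program:

         main () {
         1:  int a = 2;
         2:  f(a);
         }

         void f (int x) {
         1:  sys_call (1);
         2:  if (x == 1)
         3:      sys_call (3);
         4:  else if (x == 2)
         5:      sys_call (5);
         }

     Call stack 1: main.2 -> f.1 -> sys1
     Call stack 2: main.2 -> f.5 -> sys5
     [Figure: graph with node 1 (reached via main.2, f.1) and node 5 (reached via main.2, f.5)]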
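To make the example concrete, here is a minimal Python sketch of a training-based model under a simplification of our own, not the paper's: each node is a call stack ending in the system call it issued (exactly the stacks listed above), and an edge links two consecutive system calls seen in a training run. The names `train`, `accepts`, and `Node` are hypothetical, and the real Execution Graph model carries more structure than this.

```python
from typing import List, Set, Tuple

# Hypothetical simplified node: the call stack (e.g. "main.2", "f.1")
# ending in the system call that was issued.
Node = Tuple[str, ...]
Model = Tuple[Set[Node], Set[Tuple[Node, Node]]]


def train(traces: List[List[Node]]) -> Model:
    """Collect nodes and consecutive-system-call edges from training traces."""
    nodes: Set[Node] = set()
    edges: Set[Tuple[Node, Node]] = set()
    for trace in traces:
        nodes.update(trace)
        # Connect each observed system call to the one that followed it.
        edges.update(zip(trace, trace[1:]))
    return nodes, edges


def accepts(model: Model, trace: List[Node]) -> bool:
    """Accept a trace only if every node and every transition was seen in training."""
    nodes, edges = model
    return (all(n in nodes for n in trace)
            and all(e in edges for e in zip(trace, trace[1:])))


# The slide's run: a = 2, so f() issues sys_call(1) and then sys_call(5).
stack1 = ("main.2", "f.1", "sys1")
stack2 = ("main.2", "f.5", "sys5")
model = train([[stack1, stack2]])

print(accepts(model, [stack1, stack2]))                     # True
print(accepts(model, [stack1, ("main.2", "f.3", "sys3")]))  # False
```

Because training only ever ran with `a = 2`, the sys_call(3) branch is never observed and a trace reaching it is rejected: this is what makes trained models more sensitive, and also how they can raise false alarms if `f(1)` later becomes a legitimate call.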

  8. White-box technique (same program as in slide 7): [Figure: control flow graphs of main() and f(), each with enter and exit nodes; main() calls f() at main.2, and f() can issue system calls 1, 3, and 5 at f.1, f.3, and f.5]
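For contrast, a white-box model can be sketched by enumerating the system calls along every path through f()'s control flow graph. This Python sketch hand-encodes the graph from the slide's code; the node names, the `CFG` dictionary, and the `syscall_sequences` helper are hypothetical, and it assumes the graph is acyclic. A real static analysis also has to handle loops and calls, for example by building an automaton rather than listing paths.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical encoding of f()'s CFG: node -> (system call issued there, successors).
CFG: Dict[str, Tuple[Optional[int], List[str]]] = {
    "enter": (None, ["f.1"]),
    "f.1": (1, ["f.2"]),             # sys_call (1)
    "f.2": (None, ["f.3", "f.4"]),   # if (x == 1)
    "f.3": (3, ["exit"]),            # sys_call (3)
    "f.4": (None, ["f.5", "exit"]),  # else if (x == 2)
    "f.5": (5, ["exit"]),            # sys_call (5)
    "exit": (None, []),
}


def syscall_sequences(cfg: Dict[str, Tuple[Optional[int], List[str]]],
                      node: str = "enter",
                      prefix: Tuple[int, ...] = ()) -> Set[Tuple[int, ...]]:
    """Collect the system-call sequence of every enter-to-exit path (acyclic CFG only)."""
    call, succs = cfg[node]
    if call is not None:
        prefix = prefix + (call,)
    if not succs:  # reached the exit node
        return {prefix}
    result: Set[Tuple[int, ...]] = set()
    for s in succs:
        result |= syscall_sequences(cfg, s, prefix)
    return result


print(sorted(syscall_sequences(CFG)))  # [(1,), (1, 3), (1, 5)]
```

The static model accepts (1,) and (1, 3) in addition to the (1, 5) seen in training: it never flags a path the program can actually take, but it also cannot flag an attack that mimics one of those paths.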

  9. BinHunt [Gao, Reiter, Song, ICICS 2008]
     - A novel technique for finding semantic differences in binary programs
     - Computes the maximum common induced subgraphs between control flow graphs
       - Maximum match per pair of functions
       - Maximum match between two programs
     [Figure: patched and unpatched binaries]
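To illustrate what a maximum common induced subgraph is, here is a brute-force Python sketch for toy control flow graphs. It tries node subsets of decreasing size, so it is exponential and only usable on a handful of nodes; BinHunt's own matching algorithm, described in its ICICS 2008 paper, is not reproduced here. The function name and the example graphs are hypothetical.

```python
from itertools import combinations, permutations
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def max_common_induced_subgraph(n1: Set[str], e1: Set[Edge],
                                n2: Set[str], e2: Set[Edge]) -> Dict[str, str]:
    """Return a largest mapping m from graph 1 to graph 2 such that, for every
    pair u, v in m's domain, (u, v) is an edge of graph 1 iff (m[u], m[v]) is
    an edge of graph 2, i.e. the two induced subgraphs are isomorphic."""
    for k in range(min(len(n1), len(n2)), 0, -1):  # largest size first
        for sub1 in combinations(sorted(n1), k):
            for sub2 in permutations(sorted(n2), k):
                m = dict(zip(sub1, sub2))
                if all(((u, v) in e1) == ((m[u], m[v]) in e2)
                       for u in sub1 for v in sub1):
                    return m
    return {}


# Unpatched CFG: b0 -> b1 -> b2. The patched CFG adds a new block b3 reached
# from b1 (for example, an error-message branch).
old_nodes, old_edges = {"b0", "b1", "b2"}, {("b0", "b1"), ("b1", "b2")}
new_nodes = {"b0", "b1", "b2", "b3"}
new_edges = {("b0", "b1"), ("b1", "b2"), ("b1", "b3")}

match = max_common_induced_subgraph(old_nodes, old_edges, new_nodes, new_edges)
print(match)  # {'b0': 'b0', 'b1': 'b1', 'b2': 'b2'}
```

Block b3 is left unmatched; in the approach on these slides, unmatched code is where static analysis takes over. Purely structural matching can be ambiguous on larger graphs, which is one reason a practical tool would also compare the contents of basic blocks.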

  10. Background and Introduction; Overall Structure; Conversion Algorithm; Evaluation; Summary

  11. [Figure: overall structure, with the BinHunt diff, CFG pieces, and the old Execution Graph]

  12. Background and Introduction; Overall Structure; Conversion Algorithm; Evaluation; Summary

  13. Conversion steps:
     - We iteratively do the conversion for each pair of matched functions, copying nodes and edges
     - For functions that have no match, or for the unmatched portions of matched functions, we resort to static analysis
     - Then we convert the edges that connect functions (calls and returns)
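The three steps can be sketched in Python as follows. This is a deliberately simplified illustration, not the paper's algorithm: nodes are plain strings, `match` stands in for the node correspondence produced by the binary difference analyzer, and the static-analysis results for unmatched code are passed in as ready-made sets. All names are hypothetical, and cases where a simple copy does not work (slide 14) are not handled.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def convert(old_nodes: Set[str], old_intra: Set[Edge], old_calls: Set[Edge],
            match: Dict[str, str],
            static_nodes: Set[str],
            static_edges: Set[Edge]) -> Tuple[Set[str], Set[Edge]]:
    # Step 1: for matched functions, copy nodes and intra-function edges
    # whose endpoints both have a counterpart in the new binary.
    nodes = {match[n] for n in old_nodes if n in match}
    edges = {(match[a], match[b]) for a, b in old_intra
             if a in match and b in match}

    # Step 2: unmatched functions or unmatched portions come from static analysis.
    nodes |= static_nodes
    edges |= static_edges

    # Step 3: after all functions are converted, carry over the edges that
    # connect functions (calls and returns), again only between matched nodes.
    for a, b in old_calls:
        if a in match and b in match:
            edges.add((match[a], match[b]))
    return nodes, edges


# Toy example: nodes f.1 and f.5 in f(), plus a call edge from f.5 into g().
nodes, edges = convert(
    old_nodes={"f.1", "f.5", "g.1"},
    old_intra={("f.1", "f.5")},
    old_calls={("f.5", "g.1")},
    match={"f.1": "f.1'", "f.5": "f.5'", "g.1": "g.1'"},
    static_nodes={"f.3'"},             # new code found only by static analysis
    static_edges={("f.1'", "f.3'")},
)
print(sorted(nodes))
print(sorted(edges))
```

In this toy version step 3 differs from step 1 only in running after every function has been processed; the real conversion must also resolve call and return edges that touch changed code.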

  14. When a simple copy doesn’t work [Figure: matched parts of two functions, f() and g(), showing calls to f'()/g'() and f''()/g''(), jz branches, system calls syscall3 and syscall4, and a noncall node; a "?" marks where copying fails]

  15. “Extended Similarity” [Figure: f() and g(), each with enter and exit nodes, system calls 3 and 4, and calls to f'()/g'() and f''()/g''()] Please see the proof in the paper.

  16. Open questions:
     - Are the properties of the Execution Graph model preserved after our conversion?
     - Will the converted model raise false alarms on the language accepted by the trained model?
     - How can we make use of the output from the binary difference analyzer?
     - …

  17. Background and Introduction; Overall Structure; Conversion Algorithm; Evaluation; Summary

  18. Case studies (1/2):
     - tar
       - The old version has an input validation error
       - The patched part only involves a function call that does not make any system calls
       - EG unchanged
     - ncompress
       - The old version misses a boundary check
       - In the new version, an error-message printing branch is added
       - Converted EG expanded slightly

  19. Case studies (2/2):
     - ProFTPD
       - Cross-site request forgery is possible
       - Input validation checks added
       - Converted EG expanded slightly
     - unzip
       - May pass invalid pointers and potentially execute arbitrary code
       - The patch involves changes in four functions
       - Nodes and edges increase more significantly

  20. Statistics for nodes and edges in the converted execution graph

                   Copied             Not copied
      Program      Nodes   Edges      Nodes        Edges
      tar          478     1430       0 (0%)       0 (0%)
      ncompress    151     489        3 (1.9%)     23 (4.5%)
      ProFTPD      775     1850       6 (0.7%)     28 (1.5%)
      unzip        374     1004       50 (11.8%)   195 (16.3%)
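The percentages appear to be shares of the whole converted execution graph, that is, not copied / (copied + not copied); the totals (for example 151 + 3 = 154 nodes for ncompress) match the converted-EG sizes on slide 21. This Python snippet, whose names are hypothetical, recomputes the shares under that assumption.

```python
# (copied, not copied) counts transcribed from the table: nodes first, then edges.
stats = {
    "tar": ((478, 0), (1430, 0)),
    "ncompress": ((151, 3), (489, 23)),
    "ProFTPD": ((775, 6), (1850, 28)),
    "unzip": ((374, 50), (1004, 195)),
}


def share_not_copied(copied: int, not_copied: int) -> float:
    """Percentage of the converted graph that was not copied from the old graph."""
    return 100.0 * not_copied / (copied + not_copied)


for name, ((n_cp, n_new), (e_cp, e_new)) in stats.items():
    print(f"{name:10s} nodes {share_not_copied(n_cp, n_new):5.2f}%  "
          f"edges {share_not_copied(e_cp, e_new):5.2f}%")
```

Under this reading every figure matches the table to within 0.1 percentage point; ProFTPD's nodes come out at about 0.77%, which the table gives as 0.7%.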

  21. Statistics for the size comparison and algorithm efficiency

                   Old binary         New binary
                   Old EG (trained)   New EG (converted)         New EG (trained)
      Program      Nodes   Edges      Nodes   Edges   Time (sec) Nodes   Edges
      tar          478     1430       478     1430    14.5       478     1430
      ncompress    151     489        154     512     13.1       151     489
      ProFTPD      775     1850       781     1878    17.4       776     1853
      unzip        374     1004       424     1199    41.6       377     1017
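The size columns can be compared directly. This short Python snippet (the dictionaries and `extra` are a hypothetical transcription, not anything from the paper) computes how much larger the converted execution graph is than the one trained on the new binary. A non-negative difference is consistent with, though by itself no proof of, the finding in slide 24 that the converted detector raised no alarms on behavior accepted by the trained one.

```python
from typing import Tuple

# (nodes, edges) per program, transcribed from the table above.
new_converted = {"tar": (478, 1430), "ncompress": (154, 512),
                 "ProFTPD": (781, 1878), "unzip": (424, 1199)}
new_trained = {"tar": (478, 1430), "ncompress": (151, 489),
               "ProFTPD": (776, 1853), "unzip": (377, 1017)}


def extra(converted: Tuple[int, int], trained: Tuple[int, int]) -> Tuple[int, int]:
    """How many more nodes and edges the converted graph has than the trained one."""
    return converted[0] - trained[0], converted[1] - trained[1]


for name in new_converted:
    dn, de = extra(new_converted[name], new_trained[name])
    print(f"{name:10s} +{dn} nodes, +{de} edges")
```

The differences are 0/0 for tar, 3/23 for ncompress, 5/25 for ProFTPD, and 47/182 for unzip; for ncompress they equal the 3 nodes and 23 edges that slide 20 lists as not copied.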

  22. [Figure: sets of system call sequences, labeled U: sequences obtained by analyzing the CFG, sequences accepted by the converted model, and sequences accepted by the trained model] Please see the proof in the paper.
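Read together with the two soundness claims in the summary (slide 24), the figure's sets can be written as a chain of inclusions. The notation is ours, not the paper's: L_CFG is the set of system call sequences obtained by analyzing the CFG, L_conv the set accepted by the converted model, and L_train the set accepted by the model trained on the new binary.

```latex
L_{\text{train}} \subseteq L_{\text{conv}} \subseteq L_{\text{CFG}}
```

Per slide 24, the right-hand inclusion is the property the paper proves, while the left-hand one was observed empirically rather than proved.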

  23. Background and Introduction; Overall Structure; Conversion Algorithm; Evaluation; Summary

  24. Summary:
     - An approach to adapt a trained anomaly detector to software patches
     - An algorithm for the conversion
     - We show that our algorithm is sound:
       - (Proof) The behavior accepted by the converted detector is consistent with the static analysis of the binary
       - (Empirically) The converted detector did not raise alarms on the behavior accepted by the trained detector of the new binary

  25. pengli@cs.unc.edu
