
DEOBFUSCATION: SEMANTIC ANALYSIS TO THE RESCUE



  1. DEOBFUSCATION: SEMANTIC ANALYSIS TO THE RESCUE
     Sébastien Bardin (CEA LIST), Robin David (CEA LIST, QuarksLab), Jean-Yves Marion (LORIA)
     Sébastien Bardin et al. – Dagstuhl2017

  2. IN A NUTSHELL
     • Challenge: malware deobfuscation
     • Standard techniques (dynamic, syntactic) are not enough
     • Semantic methods can help [obfuscation preserves semantics]
     • Yet they need to be strongly adapted (robustness, precision, efficiency)
     • A tour of how symbolic methods can help
       • Explore and discover
       • Prove infeasibility [S&P 2017], with Robin David
       • Simplify (not covered here), with Jonathan Salwan

  3. CONTEXT: MALWARE COMPREHENSION
     APT: highly sophisticated attacks (example: USA elections, DNC hack)
     • Targeted malware
     • Written by experts
     • Attack: 0-days
     • Defense: stealth, obfuscation
     • Sponsored by states or mafia
     The day after: malware comprehension
     • understand what has been going on
     • mitigate, fix and clean
     • improve defense
     Goal: help malware comprehension
     • Reverse of heavily obfuscated code
     • Identify and simplify protections

  4. CHALLENGE: CORRECT DISASSEMBLY
     Basic reverse-engineering problem
     • aka model recovery
     • aka CFG recovery

  5. CAN BE TRICKY!
     • code vs. data
     • dynamic jumps (jmp eax)

  6. REVERSE CAN BECOME A NIGHTMARE (OBFUSCATION)
     Obfuscation: make code hard to reverse
     • self-modification
     • encryption
     • virtualization
     • code overlapping
     • opaque predicates
     • callstack tampering
     • …
     Goal: help malware comprehension
     • Identify and simplify protections
     • Ideal = revert protections

  7. EXAMPLE: OPAQUE PREDICATE
     Constant-value predicates (always true, always false)
     • the dead branch points to spurious code
     • goal = waste the reverser's time and effort
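
To make the idea on slide 7 concrete, here is a minimal Python sketch of an always-true opaque predicate. The predicate `x * (x + 1) % 2 == 0` and the function names are illustrative assumptions, not taken from the talk; the product of two consecutive integers is always even, so the `else` branch is dead.

```python
def opaque_always_true(x: int) -> bool:
    """Always True: x and x + 1 are consecutive, so their product is even.

    The 32-bit mask mimics machine arithmetic; wrapping modulo 2**32
    preserves parity, so the predicate stays constant.
    """
    return ((x * (x + 1)) & 0xFFFFFFFF) % 2 == 0


def protected(x: int) -> str:
    """Hypothetical obfuscated function guarded by the opaque predicate."""
    if opaque_always_true(x):
        return "real code"
    # Dead branch: only there to distract the reverser.
    return "spurious code"


# Exhaustive check on a small input range (a real proof needs a solver).
always_true = all(opaque_always_true(x) for x in range(2**16))
```

A syntactic disassembler sees two successors for the branch; only semantic reasoning shows that one of them is unreachable.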

  8. EXAMPLE: STACK TAMPERING
     Alter the standard compilation scheme: ret does not go back to the call site
     • hide the real target
     • the return site may be spurious code
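
The stack-tampering trick on slide 8 can be illustrated with a toy interpreter. This is a sketch under stated assumptions: the tiny instruction set (`call`, `tamper`, `ret`, `nop`, `halt`) and the program layout are hypothetical and only model the idea that the callee overwrites its saved return address.

```python
def run(program, entry=0):
    """Execute a toy program and return the list of visited addresses."""
    stack, pc, trace = [], entry, []
    while True:
        op, arg = program[pc]
        trace.append(pc)
        if op == "call":
            stack.append(pc + 1)  # push the normal return site
            pc = arg
        elif op == "tamper":
            stack[-1] = arg       # overwrite the saved return address
            pc += 1
        elif op == "ret":
            pc = stack.pop()      # jumps wherever the stack says
        elif op == "nop":
            pc += 1
        elif op == "halt":
            return trace


PROGRAM = {
    0: ("call", 4),
    1: ("nop", None),   # apparent return site: spurious code, never executed
    2: ("nop", None),   # real continuation
    3: ("halt", None),
    4: ("tamper", 2),   # callee redirects its own return to address 2
    5: ("ret", None),
}

trace = run(PROGRAM)
```

In this run the trace is `[0, 4, 5, 2, 3]`: address 1, which a call/ret-matching disassembler would treat as the return site, is never reached.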

  9. STANDARD DISASSEMBLY TECHNIQUES ARE NOT ENOUGH
     Static analysis
     • too fragile vs obfuscation
     • junk instr., missed instr.
     Dynamic analysis
     • robust vs obfuscation
     • too incomplete

  10. SOLUTION? BINARY-LEVEL SEMANTIC ANALYSIS
      Semantics preserved by obfuscation (?)

  11. ABOUT FORMAL METHODS Success in safety-critical

  12. THE HARD JOURNEY FROM SOURCE TO BINARY
      Wanted
      • robustness
      • precision
      • scale

  13. STATIC SEMANTIC ANALYSIS IS VERY VERY HARD ON BINARY CODE
      Problems
      • jump eax
      • memory
      • bit reasoning

  14. INSTEAD: DYNAMIC SYMBOLIC EXECUTION (DSE, Godefroid 2005)
      Perfect for intensive testing
      • Correct, relatively complete
      • No false alarms
      • Robust
      • Scales in some ways (but incomplete)

  15. DSE: PATH PREDICATE COMPUTATION (DSE, Godefroid 2005)
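
As an illustration of path predicate computation (slide 15), the sketch below runs a toy program concretely, records each branch decision, then negates one decision at a time and solves the resulting predicate to reach a new path. Everything here is an assumption made for the example: the toy `program`, the branch conditions, and the brute-force `solve`, which stands in for the SMT solver a real DSE engine would use.

```python
def program(x, trace):
    """Toy program under test; every branch decision is logged in `trace`."""
    def branch(name, cond):
        taken = cond(x)
        trace.append((name, cond, taken))
        return taken

    if branch("x > 10", lambda v: v > 10):
        if branch("2 * x == 42", lambda v: 2 * v == 42):
            return "target"
        return "big"
    return "small"


def solve(constraints, domain=range(256)):
    """Brute-force stand-in for an SMT solver over a small input domain."""
    for v in domain:
        if all(cond(v) == want for cond, want in constraints):
            return v
    return None


def explore(seed):
    """Enumerate paths by flipping one branch of each observed path predicate."""
    seen, worklist, results = set(), [seed], {}
    while worklist:
        x = worklist.pop()
        trace = []
        results[x] = program(x, trace)
        decisions = [(name, taken) for name, _, taken in trace]
        for i in range(len(decisions)):
            seen.add(tuple(decisions[: i + 1]))  # prefixes already covered
        for i, (name, cond, taken) in enumerate(trace):
            flipped = tuple(decisions[:i]) + ((name, not taken),)
            if flipped in seen:
                continue
            seen.add(flipped)
            # Path predicate: same prefix, last decision negated.
            prefix = [(c, t) for _, c, t in trace[:i]]
            new_input = solve(prefix + [(cond, not taken)])
            if new_input is not None:
                worklist.append(new_input)
    return results


paths = explore(seed=0)
```

Starting from the seed input 0, the exploration discovers inputs 11 and 21, covering all three paths of the toy program.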

  16. ABOUT ROBUSTNESS (imo, the major advantage)
      « concretization »
      • Keep going when symbolic reasoning fails
      • Tune the genericity/cost tradeoff

  17. DYNAMIC SYMBOLIC EXECUTION CAN HELP (Debray, Kruegel, …)
      For deobfuscation
      • find new real paths
      • robust
      • still incomplete
      « dynamic analysis on steroids »

  18. DSE COMPLEMENTS DYNAMIC ANALYSIS

  19. OVERVIEW
      Approach         | Correct | Complete | Efficient | Robust
      Static syntactic | X       | -- / X   | OK        | X
      Dynamic          | OK      | XX       | OK        | OK
      DSE              | OK      | --       | X         | OK
      Static semantic  | X       | OK / X   | X         | X

  20. IN PRACTICE
      Can recover useful semantic information
      • More precise disassembly
      • Exact semantics of instructions
      • Inputs of interest
      • …

  21. YET … WHAT ABOUT INFEASIBILITY QUESTIONS?
      Prove that something is always true (resp. false)
      Many such issues in reverse engineering
      • is a branch dead?
      • does the ret always return to the call?
      • have I found all targets of a dynamic jump?
      And more
      • does this malicious ret always go there?
      • does this expression always evaluate to 15?
      • does this self-modification always write this opcode?
      • does this self-modification always rewrite this instr.?
      • …
      Not addressed by DSE
      • Cannot enumerate all paths

  22. OUR CHALLENGE
      Check infeasibility questions in obfuscated code
      • scale to realistic malware sizes
      • robust to obfuscation such as self-modification
      • precise
      • generic
      Rest of the talk:
      • opaque predicates
      • stack tampering

  23. OUR PROPOSAL: BACKWARD-BOUNDED SYMBOLIC EXECUTION
      Insight 1: symbolic reasoning
      • precision
      • But: need a finite number of paths
      Insight 2: backward-bounded
      • pre_k(c) = ∅ => c is infeasible
      • finite number of paths
      • efficient, depends on k
      • But: backward on jump eax?
      Insight 3: dynamic partial CFG
      • solve (partially) dynamic jumps
      • robustness
      Low FP/FN rates in practice
      • ground-truth experiments
      False negative (FN)
      • can miss infeasibility
      • why: k too small (miss /\-constraints)
      False positive (FP)
      • wrongly assert infeasibility
      • why: CFG too partial (miss \/-constraints)
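
The backward-bounded idea on slide 23 can be sketched on a toy program. This is not the BINSEC implementation: the three-instruction program, the `PREDS` map (standing for a partial CFG recovered from a trace), and the exhaustive enumeration over 8-bit values (standing for an SMT query) are all assumptions made for illustration. The query asks whether the condition `y == 1` can hold at address 4 when reasoning only over the last k instructions, with every variable unconstrained at the start of the window.

```python
from itertools import product

# Toy program: address -> state update. Together they compute y = (x*x + x) & 1.
PROGRAM = {
    1: lambda s: {**s, "y": (s["x"] * s["x"]) & 0xFF},
    2: lambda s: {**s, "y": (s["y"] + s["x"]) & 0xFF},
    3: lambda s: {**s, "y": s["y"] & 1},
}
# Known predecessors of each address (partial CFG, hypothetical).
PREDS = {4: [3], 3: [2], 2: [1], 1: []}


def backward_paths(addr, k):
    """Instruction sequences of length <= k that lead to addr, in execution order."""
    if k == 0 or not PREDS.get(addr):
        return [[]]
    return [path + [p] for p in PREDS[addr] for path in backward_paths(p, k - 1)]


def feasible(addr, cond, k, domain=range(256)):
    """True if some state k steps back can make cond hold at addr.

    False means pre_k(cond) is empty, so cond is proved infeasible
    (relative to the known CFG).
    """
    for path in backward_paths(addr, k):
        for x, y in product(domain, repeat=2):
            state = {"x": x, "y": y}  # free variables at the window start
            for a in path:
                state = PROGRAM[a](state)
            if cond(state):
                return True
    return False


def branch_taken(s):
    """Condition of the (opaque) branch at address 4."""
    return s["y"] == 1


verdicts = {k: feasible(4, branch_taken, k) for k in (1, 2, 3)}
```

With k = 1 or k = 2 the window is too small, `y` or `x` stays unconstrained, and the branch looks feasible (a false negative in the slide's terms). With k = 3 the whole computation of `x*x + x` is in the window and the condition is proved infeasible. A missing entry in `PREDS` would hide paths and could lead to a wrong infeasibility claim, which is the false-positive case.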

  24. FORWARD & BACKWARD SYMBOLIC EXECUTION

  25. EXPERIMENTAL EVALUATION
      • Controlled experiments (ground truth): precision
      • Large-scale experiment on packers: scalability, robustness
      • Case study on the X-Tunnel malware: usefulness

  26. CONTROLLED EXPERIMENTS
      • Goal = assess the precision of the technique (ground truth value)
      • Experiment 1: opaque predicates (o-llvm)
        • 100 core utils, 5x20 obfuscated codes
        • k=16: 3.46% error, no false negative
        • robust to k
        • efficient: 0.02s / query
      • Experiment 2: stack tampering (tigress)
        • 5 obfuscated codes, 5 core utils
        • almost all genuine ret are proved (no false positive)
        • many malicious ret are proved « single-targets »
      • Very precise results
      • Seems efficient

  27. CASE-STUDY: PACKERS
      Packers: legitimate software protection tools (basic malware: the sole protection)

  28. CASE-STUDY: PACKERS (fun facts)

  29. CASE-STUDY: PACKERS (fun facts)

  30. CASE-STUDY: THE XTUNNEL MALWARE (part of DNC hack)
      Two heavily obfuscated samples
      • Many opaque predicates
      Goal: detect & remove protections
      • Identify 50% of code as spurious
      • Fully automatic, < 3h

  31. CASE-STUDY: THE XTUNNEL MALWARE (fun facts)
      • Protection seems to rely only on opaque predicates
      • Only two families of opaque predicates
      • Yet, quite sophisticated original OPs
        • interleaving between payload and OP computation
        • sharing among OP computations
        • possibly long dependency chains (avg 8.7, up to 230)

  32. SECURITY ANALYSIS: COUNTER-MEASURES (and mitigations)
      • Long dependency chains (evading the bound k)
        • The whole chain is not always required to conclude!
        • Can use a more flexible notion of bound (data dependencies, formula size)
      • Hard-to-solve predicates (causing timeouts)
        • A timeout is already valuable information
        • Opportunity to find infeasible patterns (then matching), or signatures
        • Tradeoff between performance penalty and protection focus
        • Note: must be input-dependent, otherwise removed by standard DSE optimizations
      • Anti-dynamic tricks (fool the initial dynamic recovery)
        • Can use the appropriate mitigations
        • Note: some tricks can be circumvented by symbolic reasoning
      • Also
        • « Probabilistic obfuscation »
        • Covert channels
      Current state of the art
      • push the cat-and-mouse game further
      • raise the bar for malware designers

  33. SUMMARY
      Approach         | Feasibility | Infeasibility | Efficient | Robust
      Static syntactic | X           | X             | OK        | X
      Dynamic          | --          | X             | OK        | OK
      DSE              | OK          | X             | X         | OK
      Static semantic  | X           | OK            | X         | X
      BB-DSE           | X           | OK            | OK        | OK

  34. BINSEC

  35. CONCLUSION & TAKE AWAY
      • A tour of the advantages of symbolic methods for deobfuscation
      • Semantic analysis complements existing approaches
        • Explore, prove infeasible, simplify
        • Opens the way to fruitful combinations
      • Formal methods can be useful for malware, but must be adapted
        • Need robustness and scalability!
        • Accept losing both correctness and completeness, in a controlled way
      • Next steps
        • Combine with the user and with learning!
        • Anti-anti-DSE
