DEOBFUSCATION: SEMANTIC ANALYSIS TO THE RESCUE
Sébastien Bardin (CEA LIST)
Robin David (CEA LIST, QuarksLab)
Jean-Yves Marion (LORIA)
Sébastien Bardin et al. – Dagstuhl2017

IN A NUTSHELL
• Challenge: malware deobfuscation
• Standard techniques (dynamic, syntactic) not enough
• Semantic methods can help [obfuscation preserves semantics]
• Yet, need to be strongly adapted (robustness, precision, efficiency)
• A tour on how symbolic methods can help
  • Explore and discover
  • Prove infeasibility [S&P 2017] -- with Robin David
  • Simplify (not covered here) -- with Jonathan Salwan

CONTEXT: MALWARE COMPREHENSION
APT: highly sophisticated attacks
• Targeted malware
• Written by experts
• Attack: 0-days
• Defense: stealth, obfuscation
• Sponsored by states or mafia
• Example: USA elections, DNC Hack
The day after: malware comprehension
• understand what has been going on
• mitigate, fix and clean
• improve defense
Goal: help malware comprehension
• Reverse of heavily obfuscated code
• Identify and simplify protections

CHALLENGE: CORRECT DISASSEMBLY
Basic reverse problem
• aka model recovery
• aka CFG recovery

CAN BE TRICKY!
• code – data
• dynamic jumps (jmp eax)

REVERSE CAN BECOME A NIGHTMARE (OBFUSCATION)
Obfuscation: make a code hard to reverse
• self-modification
• encryption
• virtualization
• code overlapping
• opaque predicates
• callstack tampering
• …
Goal: help malware comprehension
• Identify and simplify protections
• Ideal = revert protections

EXAMPLE: OPAQUE PREDICATE
Constant-value predicates (always true, always false)
• dead branch points to spurious code
• goal = waste the reverser's time and effort
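To make the idea concrete, here is a minimal sketch of an always-true opaque predicate. The predicate, the function names and the 16-bit check are illustrative assumptions, not taken from the slides; the arithmetic fact used (a square is congruent to 0 or 1 modulo 4) is standard.

```python
def opaque_true(x: int) -> bool:
    """Always True: a square is congruent to 0 or 1 modulo 4."""
    return (x * x) % 4 in (0, 1)


def protected(x: int) -> int:
    """Hypothetical protected routine guarded by the opaque predicate."""
    if opaque_true(x):
        return x + 1        # genuine payload
    return x ^ 0xDEAD       # spurious code: the dead branch


# Exhaustive check on a 16-bit domain: the dead branch is never taken.
dead_branch_hits = sum(1 for x in range(1 << 16) if not opaque_true(x))
```

A syntactic disassembler sees two successors for the branch and has to analyse both, which is exactly the wasted effort the obfuscator is after.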

EXAMPLE: STACK TAMPERING
Alter the standard compilation scheme: ret does not go back to the call
• hide the real target
• return site may be spurious code
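The effect can be sketched on a toy machine. Everything below (the instruction names, the `tamper` pseudo-instruction, the program layout) is a hypothetical illustration rather than real x86: the callee overwrites its own return address, so the instruction following the call is never executed.

```python
def run(program, entry=0, max_steps=100):
    """Execute a toy program and return the list of visited addresses."""
    stack, pc, trace = [], entry, []
    for _ in range(max_steps):
        op, arg = program[pc]
        trace.append(pc)
        if op == "call":
            stack.append(pc + 1)   # push the standard return site
            pc = arg
        elif op == "tamper":
            stack[-1] = arg        # overwrite the saved return address
            pc += 1
        elif op == "ret":
            pc = stack.pop()       # jump to whatever is on the stack
        elif op == "nop":
            pc += 1
        elif op == "halt":
            return trace
        else:
            raise ValueError(f"unknown opcode {op!r}")
    raise RuntimeError("step budget exhausted")


PROGRAM = [
    ("call", 4),    # 0: call the function at address 4
    ("nop", None),  # 1: apparent return site (spurious code)
    ("nop", None),  # 2: real continuation
    ("halt", None), # 3
    ("tamper", 2),  # 4: function body rewrites its return address
    ("ret", None),  # 5: returns to 2, not to 1
]

visited = run(PROGRAM)
```

In this run address 1 is never visited, although a disassembler following the call/ret convention would treat it as reachable code.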

STANDARD DISASSEMBLY TECHNIQUES ARE NOT ENOUGH
Static analysis
• too fragile vs obfuscation
• junk instr., missed instr.
Dynamic analysis
• robust vs obfuscation
• too incomplete

SOLUTION? BINARY-LEVEL SEMANTIC ANALYSIS
Semantics preserved by obfuscation (?)

ABOUT FORMAL METHODS
Success in safety-critical

THE HARD JOURNEY FROM SOURCE TO BINARY
Wanted
• robustness
• precision
• scale

STATIC SEMANTIC ANALYSIS IS VERY VERY HARD ON BINARY CODE
Problems
• jmp eax
• memory
• Bit reasoning

INSTEAD: DYNAMIC SYMBOLIC EXECUTION (DSE, Godefroid 2005)
Perfect for intensive testing
• Correct, relatively complete
• No false alarm
• Robust
• Scale in some ways // incomplete

DSE: PATH PREDICATE COMPUTATION (DSE, Godefroid 2005)
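The figure for this slide is lost, so here is a rough sketch of the principle under simplifying assumptions: each concrete run records the branch conditions it meets (its path predicate), and a solver is asked for an input that keeps a prefix of the path but flips one branch. The `Tracer` class, the toy `program` and the brute-force `solve` over 8-bit inputs are all hypothetical stand-ins; a real DSE engine would build symbolic formulas and call an SMT solver.

```python
class Tracer:
    """Records (condition, outcome) pairs along one concrete execution."""
    def __init__(self):
        self.path = []

    def branch(self, pred, x):
        taken = pred(x)
        self.path.append((pred, taken))
        return taken


def program(x, t):
    """Toy program under test with two nested branches."""
    if t.branch(lambda v: v > 10, x):
        if t.branch(lambda v: (v * 3) % 7 == 2, x):
            return "deep"
        return "mid"
    return "shallow"


def solve(constraints, domain=range(256)):
    """Brute-force stand-in for an SMT solver on 8-bit inputs."""
    for v in domain:
        if all(pred(v) == want for pred, want in constraints):
            return v
    return None


def explore(seed):
    """Run, record the path predicate, negate one branch at a time, repeat."""
    worklist, seen, results = [seed], set(), {}
    while worklist:
        x = worklist.pop()
        t = Tracer()
        out = program(x, t)
        key = tuple(taken for _, taken in t.path)
        if key in seen:
            continue
        seen.add(key)
        results[key] = (x, out)
        for i, (pred, taken) in enumerate(t.path):
            # Keep the path prefix, flip the i-th branch, ask for an input.
            new_input = solve(t.path[:i] + [(pred, not taken)])
            if new_input is not None:
                worklist.append(new_input)
    return results


paths = explore(0)
```

Starting from the single seed 0, the loop reaches all three outcomes of the toy program, including the "deep" branch that random inputs would rarely hit.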

ABOUT ROBUSTNESS (imo, the major advantage)
« concretization »
• Keep going when symbolic reasoning fails
• Tune the tradeoff genericity - cost

DYNAMIC SYMBOLIC EXECUTION CAN HELP (Debray, Kruegel, …)
For deobfuscation
• find new real paths
• robust
• still incomplete
« dynamic analysis on steroids »

DSE COMPLEMENTS DYNAMIC ANALYSIS

OVERVIEW
                   Correct   Complete   Efficient   Robust
Static syntactic   X         -- / X     OK          X
Dynamic            OK        XX         OK          OK
DSE                OK        --         X           OK
Static semantic    X         OK / X     X           X

IN PRACTICE
Can recover useful semantic information
• More precise disassembly
• Exact semantics of instructions
• Input of interest
• …

YET … WHAT ABOUT INFEASIBILITY QUESTIONS?
Prove that something is always true (resp. false)
Many such issues in reverse
• is a branch dead?
• does the ret always return to the call?
• have I found all targets of a dynamic jump?
And more
• does this malicious ret always go there?
• does this expression always evaluate to 15?
• does this self-modification always write this opcode?
• does this self-modification always rewrite this instr.?
• …
Not addressed by DSE
• Cannot enumerate all paths

OUR CHALLENGE
Check infeasibility questions in obfuscated codes
• scale to realistic malware sizes
• robust to obfuscation such as self-modification
• precise
• generic
Rest of the talk:
• opaque predicate
• stack tampering

OUR PROPOSAL: BACKWARD-BOUNDED SYMBOLIC EXECUTION
Insight 1: symbolic reasoning
• precision
• But: need finite #paths
Insight 2: backward-bounded
• pre_k(c) = ∅ => c is infeasible
• finite #paths
• efficient, depends on k
• But: backward on jump eax?
Insight 3: dynamic partial CFG
• solve (partially) dyn. jumps
• robustness
Low FP/FN rates in practice
• ground truth experiments
False negative (FN)
• can miss infeasibility
• why: k too small (miss /\-constraints)
False positive (FP)
• wrongly assert infeasibility
• why: CFG too partial (miss \/-constraints)
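The role of the bound k can be illustrated with a small sketch. It is not the BINSEC implementation: the trace format, the function name `branch_feasible` and the brute-force enumeration over 8-bit values (standing in for an SMT query) are assumptions made for the example. Only the last k instructions before the branch are kept, and every value they read from outside that window is left unconstrained.

```python
from itertools import product

MASK = 0xFF  # toy 8-bit machine

# Straight-line trace ending at a branch on d == 1.
# Each entry is (destination, operation, source names).
TRACE = [
    ("a", lambda x: x & MASK, ("x",)),               # a = x
    ("b", lambda a: (a + 1) & MASK, ("a",)),         # b = a + 1
    ("c", lambda a, b: (a * b) & MASK, ("a", "b")),  # c = a * b
    ("d", lambda c: c & 1, ("c",)),                  # d = c & 1
]


def branch_feasible(trace, cond_var, cond_val, k):
    """Can cond_var == cond_val hold, looking only k instructions backward?"""
    if k < 1:
        raise ValueError("k must be at least 1")
    window = trace[-k:]
    # Names read in the window before being defined there are unconstrained.
    defined, free = set(), []
    for dest, _, srcs in window:
        for s in srcs:
            if s not in defined and s not in free:
                free.append(s)
        defined.add(dest)
    for values in product(range(MASK + 1), repeat=len(free)):
        env = dict(zip(free, values))
        for dest, fn, srcs in window:
            env[dest] = fn(*(env[s] for s in srcs))
        if env[cond_var] == cond_val:
            return True   # some predecessor state reaches the branch
    return False          # no predecessor within k steps: infeasible


verdicts = {k: branch_feasible(TRACE, "d", 1, k) for k in (1, 2, 3, 4)}
```

With k = 1 or k = 2 the window misses the constraint b = a + 1, so the branch looks feasible (a false negative for infeasibility). From k = 3 on, the product a * (a + 1) is known to be even and d == 1 is proved infeasible.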

FORWARD & BACKWARD SYMBOLIC EXECUTION

EXPERIMENTAL EVALUATION
• Controlled experiments (ground truth): precision
• Large-scale experiment (packers): scalability, robustness
• Case-study (X-tunnel malware): usefulness

CONTROLLED EXPERIMENTS
• Goal = assess the precision of the technique
  • ground truth value
• Experiment 1: opaque predicates (O-LLVM)
  • 100 core utils, 5x20 obfuscated codes
  • k=16: 3.46% error, no false negative
  • robust to k
  • efficient: 0.02s / query
• Experiment 2: stack tampering (Tigress)
  • 5 obfuscated codes, 5 core utils
  • almost all genuine ret are proved (no false positive)
  • many malicious ret are proved « single-targets »
• Very precise results
• Seems efficient

CASE-STUDY: PACKERS
Packers: legitimate software protection tools (basic malware: the sole protection)

CASE-STUDY: PACKERS (fun facts)

CASE-STUDY: THE XTUNNEL MALWARE (part of DNC hack)
Two heavily obfuscated samples
• Many opaque predicates
Goal: detect & remove protections
• Identify 50% of code as spurious
• Fully automatic, < 3h

CASE-STUDY: THE XTUNNEL MALWARE (fun facts)
• Protection seems to rely only on opaque predicates
• Only two families of opaque predicates
• Yet, quite sophisticated
  • original OPs
  • interleaving between payload and OP computation
  • sharing among OP computations
  • possibly long dependency chains (avg 8.7, up to 230)

SECURITY ANALYSIS: COUNTER-MEASURES (and mitigations)
• Long dependency chains (evading the bound k)
  • The whole chain is not always required to conclude!
  • Can use a more flexible notion of bound (data-dependencies, formula size)
• Hard-to-solve predicates (causing timeouts)
  • A time-out is already valuable information
  • Opportunity to find infeasible patterns (then matching), or signatures
  • Tradeoff between performance penalty vs protection focus
  • Note: must be input-dependent, otherwise removed by standard DSE optimizations
• Anti-dynamic tricks (fool initial dynamic recovery)
  • Can use the appropriate mitigations
  • Note: some tricks can be circumvented by symbolic reasoning
• Also
  • « Probabilistic obfuscation »
  • Covert channels
Current state-of-the-art
• push the cat-and-mouse game further
• raise the bar for malware designers

SUMMARY
                   Feasibility   Infeasibility   Efficient   Robust
Static syntactic   X             X               OK          X
Dynamic            --            X               OK          OK
DSE                OK            X               X           OK
Static semantic    X             OK              X           X
BB-DSE             X             OK              OK          OK

BINSEC

CONCLUSION & TAKE AWAY
• A tour on the advantages of symbolic methods for deobfuscation
• Semantic analysis complements existing approaches
  • Explore, prove infeasible, simplify
  • Open the way to fruitful combinations
• Formal methods can be useful for malware, but must be adapted
  • Need robustness and scalability!
  • Accept to lose both correctness & completeness, in a controlled way
• Next step
  • Combine with user and learning!
  • Anti-anti-DSE
