QSynth - A Program Synthesis approach for Binary Code Deobfuscation
Binary Analysis Workshop - NDSS
Robin David <rdavid@quarkslab.com>
Luigi Coniglio <luigi.coniglio@studenti.unitn.it>
Mariano Ceccato <mariano.ceccato@univr.it>
February 23rd, 2020 - San Diego, California
www.quarkslab.com
Talk Outline
Context:
◮ Need to address highly obfuscated binaries
◮ Few approaches address data obfuscation
Goal: deobfuscating expressions (obfuscated with data transformations)
Takeaway: We provide a synthesis approach that addresses various obfuscations and surpasses the state of the art in both speed and accuracy
Table of Contents
Background
  Software obfuscation
  Deobfuscation techniques
Our Synthesis Approach
  Goal & Contributions
  Approach steps
Experimental Benchmarks
  Experimental Setup
  Benchmarks
Conclusion
Obfuscation types
Control-Flow Obfuscation
Hiding the logic and algorithm of the program
Virtualization, Opaque predicates, CFG-flattening, Split, Merge, Packing, Implicit Flow, MBA, Loop-Unrolling...
Data-Flow Obfuscation
Hiding data, constants, strings, APIs, keys etc.
Data encoding, MBA, Arithmetic Encoding, Whitebox, Array Split, Fold and Merge, Variable Splitting...
Example
((((((a ∧ ¬b) + b) << 1) ∧ ¬((a ∨ b) − (a ∧ b))) << 1) − ((((a ∧ ¬b) + b) << 1) ⊕ ((a ∨ b) − (a ∧ b)))) ⇒ a + b
Problem: Reverting an obfuscating transformation is hard.
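The equivalence in the data-flow example can be checked empirically. The sketch below evaluates the obfuscated expression on random inputs and compares it with a + b. The function names and the 64-bit word size are assumptions made for illustration; the slides do not state a width.

```python
import random

MASK = (1 << 64) - 1  # assumed word size: 64 bits


def obfuscated_add(a: int, b: int) -> int:
    """Evaluate the MBA expression from the example with 64-bit wrap-around."""
    or_ab = ((a & ~b) + b) & MASK        # (a AND NOT b) + b, which equals a | b
    xor_ab = ((a | b) - (a & b)) & MASK  # (a OR b) - (a AND b), which equals a ^ b
    x = (or_ab << 1) & MASK
    left = ((x & ~xor_ab) << 1) & MASK
    right = x ^ xor_ab
    return (left - right) & MASK


def agrees_with_plain_add(samples: int = 1000, seed: int = 0) -> bool:
    """Compare the obfuscated expression with a + b on random 64-bit inputs."""
    rng = random.Random(seed)
    for _ in range(samples):
        a, b = rng.getrandbits(64), rng.getrandbits(64)
        if obfuscated_add(a, b) != (a + b) & MASK:
            return False
    return True
```

Random testing only gives evidence of equivalence, not a proof; here the identity also follows from x − y = 2(x ∧ ¬y) − (x ⊕ y).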
Deobfuscation
Let’s focus on two deobfuscation techniques:
◮ Dynamic Symbolic Execution
◮ Program Synthesis
Symbolic Execution
Definition
A means of executing a program using symbolic values (logical symbols) rather than concrete values (bitvectors), in order to obtain the input-output relationship of a path
Dynamic Symbolic Execution (a.k.a. concolic)
◮ Properties: works on dynamic paths and uses runtime values
◮ Advantages: the path is guaranteed to be feasible, and various obfuscations are thwarted
Symbolic Execution: Example
⇒ In this context, used to extract symbolic expressions (e.g. b)
Symbolic State
φ_b = b
φ_b = b + (a | −1) − 1
φ_b = b + (a | −1) − 1 − ((∼a) & −1)
φ_b = b + (a | −1) − 1 − ((∼a) & −1) − 1 + (((b + (a | −1) − 1 − ((∼a) & −1)) × (b + ...
Question: How to simplify the φ_b expression?
(Keeping in mind that the quality of the result depends on the syntactic complexity of the obfuscated expression)
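The way a symbolic state grows instruction by instruction can be sketched with a toy interpreter. Everything here is hypothetical: the three-address instruction format, the temporary `t`, and the helper names are invented for illustration, and the slide's actual code listing is not available in the text.

```python
def sym_exec(program, inputs):
    """Run straight-line code on symbols; returns variable -> expression tree."""
    state = {name: name for name in inputs}  # each input starts as its own symbol
    for dst, op, lhs, rhs in program:
        # Operands are either known variables (replaced by their current
        # expression) or constants (kept as-is).
        left = state.get(lhs, lhs)
        right = state.get(rhs, rhs)
        state[dst] = (op, left, right)
    return state


def render(expr):
    """Pretty-print a nested (op, left, right) tuple."""
    if isinstance(expr, tuple):
        op, left, right = expr
        return f"({render(left)} {op} {render(right)})"
    return str(expr)


# Hypothetical instruction sequence mirroring the first steps of the slide.
PROGRAM = [
    ("t", "|", "a", -1),   # t = a | -1
    ("b", "+", "b", "t"),  # b = b + t
    ("b", "-", "b", 1),    # b = b - 1
]
STATE = sym_exec(PROGRAM, ["a", "b"])
```

Here `render(STATE["b"])` yields `((b + (a | -1)) - 1)`: the expression keeps every syntactic detail of the executed code, which is why its size follows the obfuscation.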
Program Synthesis
Definition
Program synthesis consists in automatically deriving a program from:
◮ a high-level specification (typically its I/O behaviour)
◮ additional constraints:
  ◮ Compilation: a faster program
  ◮ Deobfuscation: a smaller or more readable program
Example
Obfuscated Program
  Input   Output
  1, 2    3
  2, 2    4
  2, 3    5
⇒ a + b
Problem: Synthesizing programs (expressions) with complex behaviors is hard.
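The I/O table above can drive a naive black-box synthesizer. The following is a minimal brute-force sketch, not the algorithm presented in this talk: it enumerates expressions over a and b by increasing depth and returns the first one consistent with the samples. The operator set and function names are assumptions.

```python
import itertools
import operator

OPS = {"+": operator.add, "-": operator.sub, "&": operator.and_,
       "|": operator.or_, "^": operator.xor}


def synthesize(examples, variables=("a", "b"), max_depth=2):
    """Return the first enumerated expression matching all (inputs, output) pairs."""
    # Pool of (text, function) pairs; functions take a dict of variable values.
    pool = [(v, (lambda env, v=v: env[v])) for v in variables]

    def fits(fn):
        return all(fn(dict(zip(variables, ins))) == out for ins, out in examples)

    for _ in range(max_depth + 1):
        for text, fn in pool:
            if fits(fn):
                return text
        # Grow the pool by combining every pair of known expressions.
        grown = [
            (f"({lt} {sym} {rt})",
             (lambda env, f=f, lf=lf, rf=rf: f(lf(env), rf(env))))
            for sym, f in OPS.items()
            for (lt, lf), (rt, rf) in itertools.product(pool, repeat=2)
        ]
        pool = pool + grown
    return None


EXAMPLES = [((1, 2), 3), ((2, 2), 4), ((2, 3), 5)]
```

With these three samples, `synthesize(EXAMPLES)` returns `(a + b)`. The pool grows very quickly with depth, which illustrates why synthesizing expressions with complex behaviors is hard.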
Key Intuition
Symbolic Execution
+ Captures the full semantics
- Influenced by syntactic complexity
Program Synthesis
+ Only influenced by semantic complexity
- Black-box ⇒ big search space
Idea: Using symbolic execution to reduce the synthesis search space
Contributions
A synthesis approach using an Offline Enumerative Search based on pre-computed lookup tables, combined with an Abstract Syntax Tree simplification algorithm, which outperforms similar state-of-the-art approaches (e.g. Syntia)
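The idea of a pre-computed lookup table can be sketched as follows: expressions are enumerated once, offline, and indexed by the outputs they produce on a fixed set of inputs. Synthesis then reduces to evaluating the obfuscated code on the same inputs and doing a dictionary lookup. The 8-bit width, the sample inputs, the operator set, and all names are assumptions for illustration, not the actual table layout used by QSynth.

```python
import itertools
import operator

MASK = 0xFF  # assumed 8-bit values, to keep the demo small
OPS = {"+": operator.add, "-": operator.sub, "&": operator.and_,
       "|": operator.or_, "^": operator.xor}
INPUTS = [(1, 2), (7, 5), (200, 100), (0, 255), (33, 18)]  # arbitrary (a, b) samples


def build_table(depth=2):
    """Map each output vector to the shallowest expression producing it."""
    known = {"a": tuple(a for a, _ in INPUTS), "b": tuple(b for _, b in INPUTS)}
    table = {outs: text for text, outs in known.items()}
    for _ in range(depth):
        fresh = {}
        for sym, f in OPS.items():
            for (lt, lo), (rt, ro) in itertools.product(known.items(), repeat=2):
                outs = tuple(f(x, y) & MASK for x, y in zip(lo, ro))
                if outs not in table:  # keep one representative per behavior
                    text = f"({lt} {sym} {rt})"
                    table[outs] = text
                    fresh[text] = outs
        known.update(fresh)
    return table


def lookup(table, black_box):
    """Query the table with the outputs of a black-box function of (a, b)."""
    outs = tuple(black_box(a, b) & MASK for a, b in INPUTS)
    return table.get(outs)


TABLE = build_table()
```

For instance, `lookup(TABLE, lambda a, b: 2 * (a | b) - (a ^ b))` returns `(a + b)`. A match only means agreement on the sampled inputs, so a real tool needs enough samples (or a later equivalence check) to make false positives unlikely.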
QSynth: Overview
Obfuscated program → Execution tracing (DBI) → Execution trace → Dynamic Symbolic Execution → Obfuscated expressions → Simplification Strategy (for each sub-expression) → synthesized expressions
Enumerative Synthesis Oracle (generated once for all): inputs / outputs → equivalent expression
Tool: QBDI
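The "for each sub-expression" step can be illustrated with a small AST rewriter. This is one possible traversal under stated assumptions, not necessarily the strategy used by QSynth: the AST is a nested tuple, the oracle is a tiny table of single-operator expressions, and a node is replaced whenever its outputs on the sample inputs match a table entry; otherwise its children are tried.

```python
import operator

MASK = 0xFF  # assumed 8-bit values
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "&": operator.and_, "|": operator.or_, "^": operator.xor}
INPUTS = [(1, 2), (7, 5), (200, 100), (0, 255), (33, 18)]  # arbitrary (a, b) samples


def evaluate(ast, env):
    """Evaluate a nested (op, left, right) tuple; leaves are names or constants."""
    if isinstance(ast, tuple):
        op, left, right = ast
        return OPS[op](evaluate(left, env), evaluate(right, env)) & MASK
    return env[ast] if isinstance(ast, str) else ast & MASK


def outputs(ast):
    return tuple(evaluate(ast, {"a": a, "b": b}) for a, b in INPUTS)


# Tiny stand-in oracle: variables and single-operator expressions only.
TABLE = {}
for leaf in ("a", "b"):
    TABLE.setdefault(outputs(leaf), leaf)
for op in OPS:
    for left in ("a", "b"):
        for right in ("a", "b"):
            TABLE.setdefault(outputs((op, left, right)), (op, left, right))


def simplify(ast):
    """Replace the largest sub-trees whose behavior is known to the oracle."""
    hit = TABLE.get(outputs(ast))
    if hit is not None:
        return hit
    if isinstance(ast, tuple):
        op, left, right = ast
        return (op, simplify(left), simplify(right))
    return ast


# ((a | b) - (a & b)) * ((a & ~b) + b), with ~b written as b ^ 0xFF
OBFUSCATED = ("*",
              ("-", ("|", "a", "b"), ("&", "a", "b")),
              ("+", ("&", "a", ("^", "b", 0xFF)), "b"))
```

`simplify(OBFUSCATED)` gives `("*", ("^", "a", "b"), ("|", "a", "b"))`: the whole expression is not in the small table, but both operands are, so each is synthesized separately. This is how the symbolic expression narrows the synthesis search space.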
Execution Tracing
Dynamic Binary Instrumentation
Using QBDI: QuarkslaB Dynamic binary Instrumentation (similar to Pin, DynamoRIO)
+ multi-architecture & platform
- no (direct) thread support
Qtracer (a qbditool, like Pin "pintools")
◮ gathers the executed instructions with their concrete state (registers and memory)
◮ data is consolidated in a database (SQLite, PostgreSQL etc.)
https://qbdi.quarkslab.com/

Original
  mov qword [0x000232c0], 8
  mov r13, rax
  test rax, rax
  je 0x42a7
  xor r8d, r8d
  xor edx, edx
  xor esi, esi

Instrumented
  mov qword [0x000232c0], 8
  ; Some code ...
  mov r13, rax
  ; Some code ...
  test rax, rax
  ; Some code ...
  je <patched address>
  ; Some code ...
  xor r8d, r8d
  ; Some code ...
  xor edx, edx
  ; Some code ...
  xor esi, esi
  ; Some code ...
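To make the "consolidated in a database" step concrete, here is a minimal sketch that stores a trace in SQLite using Python's standard `sqlite3` module. It does not use QBDI or Qtracer: the schema, the function names, the addresses, and the register values are all hypothetical, and the real trace format may differ.

```python
import json
import sqlite3


def store_trace(conn, trace):
    """Store (address, disassembly, registers) tuples, one row per executed instruction."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trace ("
        "step INTEGER PRIMARY KEY, address INTEGER, disasm TEXT, registers TEXT)"
    )
    conn.executemany(
        "INSERT INTO trace (step, address, disasm, registers) VALUES (?, ?, ?, ?)",
        [(step, addr, asm, json.dumps(regs))
         for step, (addr, asm, regs) in enumerate(trace)],
    )
    conn.commit()


def registers_at(conn, step):
    """Return the concrete register state recorded at a given step, or None."""
    row = conn.execute(
        "SELECT registers FROM trace WHERE step = ?", (step,)
    ).fetchone()
    return json.loads(row[0]) if row else None


# Made-up addresses and register values, for illustration only.
DEMO_TRACE = [
    (0x4290, "mov r13, rax", {"rax": 0x1000, "r13": 0}),
    (0x4293, "test rax, rax", {"rax": 0x1000, "r13": 0x1000}),
]
```

Keeping the concrete state next to each instruction is what lets a later symbolic execution pass replay the exact path that was executed.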