Be a Binary Rockst r An Introduction to Program Analysis with Binary Ninja
Agenda ● Motivation ● Current State of Program Analysis ● Design Goals of Binja Program Analysis ● Building Tools 2
Motivation 3
● Tooling - Concrete -> Symbolic ○ Increase speed & effectiveness of RE / VR ● Make Program Analysis more accessible & useful 4
Foundations • Need to understand code semantics • Could be done directly on the assembly • An Intermediate Language (IL) is needed 5
Why IL? • Architecture Abstraction • Smaller number of instructions 6
Easy to lift • Simple flags calculation • As close to native instructions as possible • Typeless - types inferred later 7
Easy to read • Intuitive to read • Tree-based infix notation • No register abstraction • Flags calculation only when necessary • Avoid excessive temporaries 8
IL Instruction Set Size Easier to analyze Easier to lift Instruction Set Size 9
The Options 10
Existing Options for IL • BAP • VEX • REIL • LLVM • IDA 11
BAP • Tree-tree based :) • Flags are explicit and inhibit readability :( • Written in OCAML :( 12
addr 0x0 @asm ”add %eax,%ebx” t:u32 = REBX:u32 REBX:u32 = REBX:u32 + REAX:u32 RCF:bool = REBX:u32 < t:u32 addr 0x2 @asm ”shl %cl,%ebx” add ebx, eax t1:u32 = REBX:u32 >> 0x20:u32 − (RECX:u32 & shl ebx, cl 0x1f:u32) RCF:bool = ((RECX:u32 & 0x1f:u32) = 0:u32) & RCF:bool | ̃ ((RECX:u32 & 0x1f:u32) = 0:u32) & low:bool(t1:u32) 13
VEX • Register names are abstracted :( • Single assignment :( • Over 1000 instructions! :( • Yet they call it “RISC-like” • Even Angr is planning a move away from it 14
t0 = GET:I32(16) t1 = 0x8:I32 t3 = Sub32(t0,t1) subs R2, R2, #8 PUT(16) = t3 PUT(68) = 0x59FC8:I32 15
REIL • Tiny instruction set • Horrible readability • Makes abstractions nearly impossible • Flags are explicit and inhibit readability :( 16
00000000.00 STR R_EAX:32, , V_00:32 00000000.01 STR 0:1, , R_CF:1 00000000.02 AND V_00:32, ff:8, V_01:8 00000000.03 SHR V_01:8, 7:8, V_02:8 00000000.04 SHR V_01:8, 6:8, V_03:8 00000000.05 XOR V_02:8, V_03:8, V_04:8 00000000.06 SHR V_01:8, 5:8, V_05:8 00000000.07 SHR V_01:8, 4:8, V_06:8 00000000.08 XOR V_05:8, V_06:8, V_07:8 00000000.09 XOR V_04:8, V_07:8, V_08:8 00000000.0a SHR V_01:8, 3:8, V_09:8 00000000.0b SHR V_01:8, 2:8, V_10:8 00000000.0c XOR V_09:8, V_10:8, V_11:8 test eax, eax 00000000.0d SHR V_01:8, 1:8, V_12:8 00000000.0e XOR V_12:8, V_01:8, V_13:8 00000000.0f XOR V_11:8, V_13:8, V_14:8 00000000.10 XOR V_08:8, V_14:8, V_15:8 00000000.11 AND V_15:8, 1:1, V_16:1 00000000.12 NOT V_16:1, , R_PF:1 00000000.13 STR 0:1, , R_AF:1 00000000.14 EQ V_00:32, 0:32, R_ZF:1 00000000.15 SHR V_00:32, 1f:32, V_17:32 00000000.16 AND 1:32, V_17:32, V_18:32 00000000.17 EQ 1:32, V_18:32, R_SF:1 00000000.18 STR 0:1, , R_OF:1 17
LLVM ● Easy to analyze and has great tools already available ● It’s a compiler! ○ Reversers want a decompiler. ○ Cannot be the only goal 18
LLVM Challenges ● Hard to lift well from compiled binaries ○ Designed for compiler output ● Expects type information in the instructions ● SSA form - assembly is not ● Stack in assembly looks like a structure, but structures lose many advantages of SSA 19
IDA ? 20
Binary Ninja’s Answer • Binary Ninja Intermediate Language (BNIL) 21
IL Goals & Design 22
Why Another IL? ● Popular existing ILs for compiled binaries are not very human readable . They are extremely low level and verbose. ● Existing ILs are single stage . Heavyweight analysis must be performed to get anywhere close to decompiled output. ● Writing a lifter for a new architecture is usually very time consuming. 23
Binary Ninja IL ● Create a family of ILs with multiple stages of analysis ● Lowest level is close to assembly ● After analysis and transformations, higher levels are closer to decompiled output and would be much easier to translate to good LLVM code ● Analysis involved in each transformation is easy to understand, fast, and directly aids further analysis 24
IL Design Goals ● Human readable ● Computer understandable (SSA, 3AF, etc.) ● Plugin understandable ● Easy to lift native architectures ● Translation to other ILs such as LLVM 25
Human Readable ● Reads like pseudocode, even in lowest level form ● Flags are resolved into readable expressions 26
Low Level IL Example lea rax, [0x201047] rax = 0x201047 lea rdi, [0x201040] rdi = 0x201040 push rbp push(rbp) sub rax, rdi rax = rax - rdi mov rbp, rsp rbp = rsp cmp rax, 0xe if (rax u> 0xe) then ja 0x68d 6 @ 0x68d else 8 @ 0x68b x86-64 Assembly Low Level IL 27
Low Level IL Example addiu $sp, $sp, -0x18 $sp = $sp - 0x18 sw $ra, 0x14($sp) [$sp + 0x14].d = $ra lw $a0, ($a1) $a0 = [$a1].d jal atoi call(atoi) nop $at = $v0 u< 0x20 ? 1 : 0 sltiu $at, $v0, 0x20 if ($at == 0) then beqz $at, 0x4002d8 7 @ 0x4002d8 else nop 12 @ 0x400290 MIPS Assembly Low Level IL 28
Computer Understandable ● Multiple IL forms ● Pick the right IL for the task at hand 29
IL Forms Lifted IL Low Level IL SSA / 3AF ASM -> IL Flags use resolved High Level IL Medium Level IL Calls in high level form Stack usage resolved SSA / 3AF Expression folding Type propagation Like decompiled output 30
Plugin Understandable ● All IL forms directly accessible from API ● Analysis performed on IL also accessible by API 31
Easy to Lift ● Expression tree ● Designed for quick, modular lifter implementations ● Semantic flags eases the burden of describing flag effects during lifting 32
Semantic Flags ● Architecture plugins define the set of flags and their semantic roles ● Instructions can define a set of flags they write ● Data flow analysis is performed to link flag uses to flag writes 33
Semantic Flags ● In most compiled code, flags are resolved to simple comparison expressions with no effort from the architecture plugin ● Special cases fall back to emitting concrete flag write expressions 34
Semantic Flags Example Folded expression “Writes to all ALU flags” describing use of flags sub.q{*}(rax, 0xe) if (rax u> 0xe) then if (u>) then … else … … else … “Flag state representing unsigned greater than” 35
Translating Upwards ● Semantic flags analysis gives Low Level IL with flag usage fully resolved ● Stack is represented as memory accesses, so data flow can be difficult to compute on stack variables in Low Level IL ● Need to analyze and translate to Medium Level IL 36
Low Level IL to Medium Level IL ● Low Level IL is translated to SSA form ● Use implicit data flow from SSA to resolve stack layout ● Data flow based stack layout resolution avoids problems with nonstandard frame pointer behavior ● Translate loads and stores on stack to stack variable uses and assignments 37
Medium Level IL Example push(ebp) ebp = esp var_4 = ebp esp = esp - 0x18 eax = arg_4 eax = [ebp + 8].d var_1c = eax [esp].d = eax free(var_1c) call(free) ebp = var_4 esp = ebp return ebp = pop <return> jump(pop) Medium Level IL 38
Medium Level IL ● Registers and stack usage are now both treated as variables ● Stack variables no longer use explicit memory access ● Translate to SSA form to obtain implicit data flow on both registers and stack variables ● Type propagation is performed on SSA form 39
Using Medium Level IL - Jump Tables 40
Using Medium Level IL - Jump Tables ● Jump table resolution based on path-sensitive data flow ● SSA conversion process also tracks control flow dependence for every block ● Data flow computations allow disjoint sets of possible values ● Reads from memory are simulated ● At jump site, possible values are the possible jump targets 41
Jump Table Example x8#1 = zx.q(x0#2.d) if (x0#2.d u> 0x1f) then … else … … x8#2 = sx.q([table + (x8#1 << 2)].d) x8#3 = x8#2 + table jump(x8#3) Medium Level IL Solve for this to get jump targets SSA Form 42
Jump Table Example x8#1 = zx.q(x0#2.d) if (x0#2.d u> 0x1f) then … else … … x8#2 = sx.q([table + (x8#1 << 2)].d) x8#3 = x8#2 + table jump(x8#3) Track flow backwards with SSA to find definitions 43
Jump Table Example x8#1 = zx.q(x0#2.d) if (x0#2.d u> 0x1f) then … else … … x8#2 = sx.q([table + (x8#1 << 2)].d) x8#3 = x8#2 + table jump(x8#3) Memory read depends on value of x8#1 44
Jump Table Example x8#1 = zx.q(x0#2.d) Value used in if (x0#2.d u> 0x1f) branch then … else … comparison … x8#2 = sx.q([table + (x8#1 << 2)].d) x8#3 = x8#2 + table jump(x8#3) 45
Jump Table Example Branch condition x8#1 = zx.q(x0#2.d) must be false to if (x0#2.d u> 0x1f) reach jump site then … else … … x8#2 = sx.q([table + (x8#1 << 2)].d) x8#3 = x8#2 + table jump(x8#3) 46
Recommend
More recommend