
Be a Binary Rockstar: An Introduction to Program Analysis with Binary Ninja



  1. Be a Binary Rockstar: An Introduction to Program Analysis with Binary Ninja

  2. Agenda ● Motivation ● Current State of Program Analysis ● Design Goals of Binja Program Analysis ● Building Tools 2

  3. Motivation 3

  4. ● Tooling - Concrete -> Symbolic ○ Increase speed & effectiveness of reverse engineering (RE) / vulnerability research (VR) ● Make Program Analysis more accessible & useful 4

  5. Foundations • Need to understand code semantics • Could be done directly on the assembly • An Intermediate Language (IL) is needed 5

  6. Why IL? • Architecture Abstraction • Smaller number of instructions 6

  7. Easy to lift • Simple flags calculation • As close to native instructions as possible • Typeless - types inferred later 7

  8. Easy to read • Intuitive to read • Tree-based infix notation • No register abstraction • Flags calculation only when necessary • Avoid excessive temporaries 8
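
As a rough illustration of "tree-based infix notation", the sketch below models an IL instruction as a small expression tree and prints it in infix form. The class names (`Reg`, `Const`, `BinOp`, `SetReg`) and the `render` helper are hypothetical names chosen for this example; they are not Binary Ninja's API.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical node types for illustration only (not the BNIL API).
@dataclass(frozen=True)
class Reg:
    name: str


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class SetReg:
    dest: Reg
    src: "Expr"


Expr = Union[Reg, Const, BinOp]


def render(node) -> str:
    """Render an expression tree as infix text."""
    if isinstance(node, Reg):
        return node.name  # registers keep their native names
    if isinstance(node, Const):
        return hex(node.value)
    if isinstance(node, BinOp):
        parts = []
        for child in (node.left, node.right):
            text = render(child)
            # Parenthesize nested operations so the tree shape stays visible.
            parts.append(f"({text})" if isinstance(child, BinOp) else text)
        return f"{parts[0]} {node.op} {parts[1]}"
    if isinstance(node, SetReg):
        return f"{node.dest.name} = {render(node.src)}"
    raise TypeError(f"unknown node: {node!r}")


# One tree per instruction, with no temporaries: `sub rax, rdi`
print(render(SetReg(Reg("rax"), BinOp("-", Reg("rax"), Reg("rdi")))))
```

Because the whole right-hand side is a single tree, a more complex instruction can still be one readable line instead of a chain of temporaries.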

  9. IL Instruction Set Size [Figure: trade-off diagram along an "Instruction Set Size" axis, labeled "Easier to analyze" and "Easier to lift"] 9

  10. The Options 10

  11. Existing Options for IL • BAP • VEX • REIL • LLVM • IDA 11

  12. BAP • Tree based :) • Flags are explicit and inhibit readability :( • Written in OCaml :( 12

  13. BAP output for: add ebx, eax / shl ebx, cl
        addr 0x0 @asm "add %eax,%ebx"
        t:u32 = REBX:u32
        REBX:u32 = REBX:u32 + REAX:u32
        RCF:bool = REBX:u32 < t:u32
        addr 0x2 @asm "shl %cl,%ebx"
        t1:u32 = REBX:u32 >> 0x20:u32 − (RECX:u32 & 0x1f:u32)
        RCF:bool = ((RECX:u32 & 0x1f:u32) = 0:u32) & RCF:bool | ~((RECX:u32 & 0x1f:u32) = 0:u32) & low:bool(t1:u32)
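
The explicit carry flag in the `add` listing relies on a standard identity: a 32-bit unsigned addition carried out exactly when the wrapped result is smaller than the original operand. A minimal Python sketch of that one statement follows; the function name `add32_with_carry` is made up for this example.

```python
MASK32 = 0xFFFFFFFF


def add32_with_carry(ebx: int, eax: int):
    """Mirror the explicit-flag listing for `add ebx, eax`."""
    t = ebx & MASK32                       # t = EBX (saved copy)
    result = (t + (eax & MASK32)) & MASK32  # EBX = EBX + EAX, wrapped to 32 bits
    cf = result < t                        # CF = EBX < t
    return result, cf


print(add32_with_carry(0xFFFFFFFF, 1))
print(add32_with_carry(1, 2))
```

Spelling this out for every instruction is what makes explicit-flag ILs verbose to read.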

  14. VEX • Register names are abstracted :( • Single assignment :( • Over 1000 instructions! :( • Yet they call it “RISC-like” • Even angr is planning a move away from it 14

  15. VEX output for: subs R2, R2, #8
        t0 = GET:I32(16)
        t1 = 0x8:I32
        t3 = Sub32(t0,t1)
        PUT(16) = t3
        PUT(68) = 0x59FC8:I32

  16. REIL • Tiny instruction set • Horrible readability • Makes abstractions nearly impossible • Flags are explicit and inhibit readability :( 16

  17. REIL output for: test eax, eax
        00000000.00 STR R_EAX:32, , V_00:32
        00000000.01 STR 0:1, , R_CF:1
        00000000.02 AND V_00:32, ff:8, V_01:8
        00000000.03 SHR V_01:8, 7:8, V_02:8
        00000000.04 SHR V_01:8, 6:8, V_03:8
        00000000.05 XOR V_02:8, V_03:8, V_04:8
        00000000.06 SHR V_01:8, 5:8, V_05:8
        00000000.07 SHR V_01:8, 4:8, V_06:8
        00000000.08 XOR V_05:8, V_06:8, V_07:8
        00000000.09 XOR V_04:8, V_07:8, V_08:8
        00000000.0a SHR V_01:8, 3:8, V_09:8
        00000000.0b SHR V_01:8, 2:8, V_10:8
        00000000.0c XOR V_09:8, V_10:8, V_11:8
        00000000.0d SHR V_01:8, 1:8, V_12:8
        00000000.0e XOR V_12:8, V_01:8, V_13:8
        00000000.0f XOR V_11:8, V_13:8, V_14:8
        00000000.10 XOR V_08:8, V_14:8, V_15:8
        00000000.11 AND V_15:8, 1:1, V_16:1
        00000000.12 NOT V_16:1, , R_PF:1
        00000000.13 STR 0:1, , R_AF:1
        00000000.14 EQ V_00:32, 0:32, R_ZF:1
        00000000.15 SHR V_00:32, 1f:32, V_17:32
        00000000.16 AND 1:32, V_17:32, V_18:32
        00000000.17 EQ 1:32, V_18:32, R_SF:1
        00000000.18 STR 0:1, , R_OF:1
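
Most of that listing is a shift-and-XOR cascade that computes the parity flag from the low byte of EAX. The Python sketch below condenses the same computation to show how little information the 25 instructions carry; the function name `flags_after_test` is hypothetical, and the flag values simply mirror what the listing stores.

```python
def flags_after_test(eax: int) -> dict:
    """Condensed equivalent of the REIL listing for `test eax, eax`."""
    v00 = eax & 0xFFFFFFFF   # STR R_EAX -> V_00
    low = v00 & 0xFF         # AND V_00, ff -> V_01

    # The SHR/XOR cascade: bit 0 of `folded` is the XOR of all 8 low bits.
    folded = low
    for shift in range(1, 8):
        folded ^= low >> shift

    pf = (folded & 1) ^ 1    # AND ..., 1 then NOT: set when the bit count is even
    zf = int(v00 == 0)       # EQ V_00, 0
    sf = (v00 >> 0x1F) & 1   # SHR by 0x1f, AND 1, EQ 1
    return {"CF": 0, "PF": pf, "AF": 0, "ZF": zf, "SF": sf, "OF": 0}


print(flags_after_test(0x80000003))
```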

  18. LLVM ● Easy to analyze and has great tools already available ● It’s a compiler! ○ Reversers want a decompiler. ○ Cannot be the only goal 18

  19. LLVM Challenges ● Hard to lift well from compiled binaries ○ Designed for compiler output ● Expects type information in the instructions ● SSA form - assembly is not ● Stack in assembly looks like a structure, but structures lose many advantages of SSA 19

  20. IDA ? 20

  21. Binary Ninja’s Answer • Binary Ninja Intermediate Language (BNIL) 21

  22. IL Goals & Design 22

  23. Why Another IL? ● Popular existing ILs for compiled binaries are not very human readable. They are extremely low level and verbose. ● Existing ILs are single stage. Heavyweight analysis must be performed to get anywhere close to decompiled output. ● Writing a lifter for a new architecture is usually very time consuming. 23

  24. Binary Ninja IL ● Create a family of ILs with multiple stages of analysis ● Lowest level is close to assembly ● After analysis and transformations, higher levels are closer to decompiled output and would be much easier to translate to good LLVM code ● Analysis involved in each transformation is easy to understand, fast, and directly aids further analysis 24

  25. IL Design Goals ● Human readable ● Computer understandable (SSA, 3AF, etc.) ● Plugin understandable ● Easy to lift native architectures ● Translation to other ILs such as LLVM 25

  26. Human Readable ● Reads like pseudocode, even in lowest level form ● Flags are resolved into readable expressions 26

  27. Low Level IL Example
        x86-64 Assembly          Low Level IL
        lea rax, [0x201047]      rax = 0x201047
        lea rdi, [0x201040]      rdi = 0x201040
        push rbp                 push(rbp)
        sub rax, rdi             rax = rax - rdi
        mov rbp, rsp             rbp = rsp
        cmp rax, 0xe             if (rax u> 0xe) then
        ja 0x68d                   6 @ 0x68d else 8 @ 0x68b

  28. Low Level IL Example
        MIPS Assembly            Low Level IL
        addiu $sp, $sp, -0x18    $sp = $sp - 0x18
        sw $ra, 0x14($sp)        [$sp + 0x14].d = $ra
        lw $a0, ($a1)            $a0 = [$a1].d
        jal atoi                 call(atoi)
        nop
        sltiu $at, $v0, 0x20     $at = $v0 u< 0x20 ? 1 : 0
        beqz $at, 0x4002d8       if ($at == 0) then
        nop                        7 @ 0x4002d8 else 12 @ 0x400290

  29. Computer Understandable ● Multiple IL forms ● Pick the right IL for the task at hand 29

  30. IL Forms [Figure: pipeline of IL forms]
        Lifted IL: ASM -> IL
        Low Level IL: Flags use resolved, SSA / 3AF
        Medium Level IL: Stack usage resolved, Type propagation, SSA / 3AF
        High Level IL: Like decompiled output
        Other labels in the figure: Calls in high level form, Expression folding

  31. Plugin Understandable ● All IL forms directly accessible from API ● Analysis performed on IL also accessible by API 31

  32. Easy to Lift ● Expression tree ● Designed for quick, modular lifter implementations ● Semantic flags ease the burden of describing flag effects during lifting 32

  33. Semantic Flags ● Architecture plugins define the set of flags and their semantic roles ● Instructions can define a set of flags they write ● Data flow analysis is performed to link flag uses to flag writes 33

  34. Semantic Flags ● In most compiled code, flags are resolved to simple comparison expressions with no effort from the architecture plugin ● Special cases fall back to emitting concrete flag write expressions 34

  35. Semantic Flags Example
        sub.q{*}(rax, 0xe)                    <- {*}: “Writes to all ALU flags”
        if (u>) then … else …                 <- u>: “Flag state representing unsigned greater than”
        if (rax u> 0xe) then … else …         <- Folded expression describing use of flags
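
The folding step can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Binary Ninja's implementation: it handles one basic block, treats only subtraction-style flag writes as foldable, and uses made-up names (`FlagWrite`, `CondBranch`, `Assign`, `resolve_flags`) plus an invented `flag:` notation for the fallback case.

```python
from dataclasses import dataclass


@dataclass
class FlagWrite:      # e.g. sub.q{*}(rax, 0xe), result discarded (a compare)
    op: str
    left: str
    right: str


@dataclass
class CondBranch:     # e.g. if (u>), a use of a semantic flag condition
    condition: str


@dataclass
class Assign:         # any other instruction that writes `dest`
    dest: str
    text: str


def resolve_flags(block):
    """Link each flag use to the flag write that reaches it and fold them."""
    out = []
    pending = None  # most recent flag write whose operands are still intact
    for insn in block:
        if isinstance(insn, FlagWrite):
            pending = insn
        elif isinstance(insn, CondBranch):
            if pending is not None and pending.op == "sub":
                # Common case: compare + branch folds into one comparison.
                out.append(f"if ({pending.left} {insn.condition} {pending.right})")
            else:
                # Special case: fall back to an explicit (hypothetical) flag test.
                out.append(f"if (flag:{insn.condition})")
        else:
            if pending is not None and insn.dest in (pending.left, pending.right):
                pending = None  # operand overwritten, folding would be wrong
            out.append(insn.text)
    return out


block = [FlagWrite("sub", "rax", "0xe"), Assign("rbp", "rbp = rsp"), CondBranch("u>")]
print(resolve_flags(block))
```

The architecture plugin only states which flags an instruction writes; the comparison expression falls out of the data flow.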

  36. Translating Upwards ● Semantic flags analysis gives Low Level IL with flag usage fully resolved ● Stack is represented as memory accesses, so data flow can be difficult to compute on stack variables in Low Level IL ● Need to analyze and translate to Medium Level IL 36

  37. Low Level IL to Medium Level IL ● Low Level IL is translated to SSA form ● Use implicit data flow from SSA to resolve stack layout ● Data flow based stack layout resolution avoids problems with nonstandard frame pointer behavior ● Translate loads and stores on stack to stack variable uses and assignments 37
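
A toy version of the stack translation is sketched below. It swaps the SSA-based data flow described above for a single forward pass over one basic block, tracking which registers hold a known offset from the entry stack pointer. The tuple instruction encoding and the names `resolve_stack` and `var_name` are assumptions made for this example; the `var_`/`arg_` naming follows the Medium Level IL listings in this deck.

```python
def var_name(offset: int) -> str:
    """Name a stack slot by its offset from the stack pointer at function entry."""
    return f"arg_{offset:x}" if offset >= 0 else f"var_{-offset:x}"


def resolve_stack(insns, word=4):
    """Rewrite stack loads/stores as variable uses (toy, single basic block)."""
    offs = {"esp": 0}  # register -> known offset from the entry stack pointer
    out = []
    for insn in insns:
        kind = insn[0]
        if kind == "push":
            offs["esp"] -= word
            out.append(f"{var_name(offs['esp'])} = {insn[1]}")
        elif kind == "mov":
            _, dst, src = insn
            if src in offs:
                offs[dst] = offs[src]      # dst is now a frame pointer; emit nothing
            else:
                offs.pop(dst, None)
                out.append(f"{dst} = {src}")
        elif kind == "sub":
            _, reg, imm = insn
            if reg in offs:
                offs[reg] -= imm           # pure stack adjustment; emit nothing
            else:
                out.append(f"{reg} = {reg} - {imm:#x}")
        elif kind == "load":
            _, dst, base, disp = insn
            src = var_name(offs[base] + disp) if base in offs else f"[{base} + {disp:#x}].d"
            offs.pop(dst, None)
            out.append(f"{dst} = {src}")
        elif kind == "store":
            _, base, disp, src = insn
            dst = var_name(offs[base] + disp) if base in offs else f"[{base} + {disp:#x}].d"
            out.append(f"{dst} = {src}")
    return out


low_level = [
    ("push", "ebp"),           # push(ebp)
    ("mov", "ebp", "esp"),     # ebp = esp
    ("sub", "esp", 0x18),      # esp = esp - 0x18
    ("load", "eax", "ebp", 8), # eax = [ebp + 8].d
    ("store", "esp", 0, "eax"),# [esp].d = eax
]
print(resolve_stack(low_level))
```

Because offsets are derived from how values flow rather than from a fixed frame-pointer convention, the same pass works whether a slot is addressed through `esp` or `ebp`.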

  38. Medium Level IL Example
        Low Level IL             Medium Level IL
        push(ebp)                var_4 = ebp
        ebp = esp
        esp = esp - 0x18
        eax = [ebp + 8].d        eax = arg_4
        [esp].d = eax            var_1c = eax
        call(free)               free(var_1c)
        esp = ebp
        ebp = pop                ebp = var_4
        <return> jump(pop)       return

  39. Medium Level IL ● Registers and stack usage are now both treated as variables ● Stack variables no longer use explicit memory access ● Translate to SSA form to obtain implicit data flow on both registers and stack variables ● Type propagation is performed on SSA form 39

  40. Using Medium Level IL - Jump Tables 40

  41. Using Medium Level IL - Jump Tables ● Jump table resolution based on path-sensitive data flow ● SSA conversion process also tracks control flow dependence for every block ● Data flow computations allow disjoint sets of possible values ● Reads from memory are simulated ● At jump site, possible values are the possible jump targets 41

  42. Jump Table Example (Medium Level IL, SSA Form)
        x8#1 = zx.q(x0#2.d)
        if (x0#2.d u> 0x1f) then … else …
        …
        x8#2 = sx.q([table + (x8#1 << 2)].d)
        x8#3 = x8#2 + table
        jump(x8#3)                              <- Solve for this to get jump targets

  43. Jump Table Example: Track flow backwards with SSA to find definitions
        x8#1 = zx.q(x0#2.d)
        if (x0#2.d u> 0x1f) then … else …
        …
        x8#2 = sx.q([table + (x8#1 << 2)].d)
        x8#3 = x8#2 + table
        jump(x8#3)

  44. Jump Table Example
        x8#1 = zx.q(x0#2.d)
        if (x0#2.d u> 0x1f) then … else …
        …
        x8#2 = sx.q([table + (x8#1 << 2)].d)    <- Memory read depends on value of x8#1
        x8#3 = x8#2 + table
        jump(x8#3)

  45. Jump Table Example
        x8#1 = zx.q(x0#2.d)                     <- Value used in branch comparison
        if (x0#2.d u> 0x1f) then … else …
        …
        x8#2 = sx.q([table + (x8#1 << 2)].d)
        x8#3 = x8#2 + table
        jump(x8#3)

  46. Jump Table Example
        x8#1 = zx.q(x0#2.d)
        if (x0#2.d u> 0x1f) then … else …       <- Branch condition must be false to reach jump site
        …
        x8#2 = sx.q([table + (x8#1 << 2)].d)
        x8#3 = x8#2 + table
        jump(x8#3)
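
Putting the steps together: the false branch bounds `x0#2.d` to `0..0x1f`, so the memory read can be simulated for each possible index and the results collected as the set of jump targets. The Python sketch below does exactly that for the listing above. It assumes a little-endian table of signed 32-bit offsets held in a flat byte buffer; the function name `jump_targets`, the buffer layout, and the sample addresses are all made up for illustration.

```python
import struct


def jump_targets(memory: bytes, mem_base: int, table: int, max_index: int):
    """Enumerate jump(x8#3) targets given that the index is in 0..max_index."""
    targets = set()
    for x8_1 in range(max_index + 1):              # x8#1 = zx.q(x0#2.d)
        addr = table + (x8_1 << 2)                 # table + (x8#1 << 2)
        (x8_2,) = struct.unpack_from("<i", memory, addr - mem_base)  # sx.q([...].d)
        x8_3 = (x8_2 + table) & 0xFFFFFFFFFFFFFFFF  # x8#3 = x8#2 + table
        targets.add(x8_3)
    return sorted(targets)


# Hypothetical image: a 32-entry table at 0x1000 holding table-relative offsets.
TABLE = 0x1000
offsets = [-0x20, 0x40, 0x80, 0x40] * 8
image = struct.pack("<32i", *offsets)

print([hex(t) for t in jump_targets(image, TABLE, TABLE, 0x1F)])
```

Keeping the result as a set of disjoint values, rather than a single range, is what lets the jump site report exact targets.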
