
Implementing an LLVM based Dynamic Binary Instrumentation


  1. Implementing an LLVM based Dynamic Binary Instrumentation framework. Charles Hubain, Cédric Tessier

  2. Introduction to Instrumentation 34c3 - Implementing an LLVM based DBI framework 2

  3. What is Instrumentation?
     • “Transformation of a program into its own measurement tool”
     • Observe any state of a program anytime during runtime
     • Automate the data collection and processing

  4. Use Cases
     • Finding memory bugs:
       • Track memory allocations / deallocations
       • Track memory accesses
     • Fuzzing:
       • Measure code coverage
       • Build symbolic representation of code
     • Recording execution traces
       • Replay them for “timeless” debugging
     • Software side-channel attacks against crypto

  5. “Why not … debuggers?”
     • Debuggers are awesome but slooooooooow
     [Diagram: round trip between Debugger, Kernel and Target, labelled Trap interrupt, Schedule, Signal, Resume, Schedule]

  6. https://asciinema.org/a/17nynlopg5a18e1qps3r9ou7g

  7. “Why not … debuggers?”
     • Debuggers are awesome but slooooooooow
     [Diagram: same Debugger / Kernel / Target round trip as on slide 5]
     • Solution? Get rid of the kernel
     • How? Run the instrumentation inside the target

  8. Instrumentation Techniques
     • From source code:
       • Manually, you know … printf(…) (BORING)
       • At compile time
     • From binary:
       • Static binary patching & hooking (Crude and barbaric)
       • Dynamic Binary Instrumentation (This talk)

  9. Existing Frameworks
     • Valgrind, since 2000
       • Open source, only *nix platforms, very complex
     • DynamoRIO, since 2002
       • Open source, cross-platform, very raw
     • Intel Pin, since 2004
       • Closed source, only Intel platforms, user friendly

  10. “Why we made our own”
      What we wanted from a DBI framework in 2015:
      • Cross-platform and cross-architecture
      • Mobile and embedded targets support
      • Simpler and modular design
      • Focus on “heavy” instrumentation

  11. Introduction to DBI

  12. Dynamic Binary Instrumentation
      • Dynamically insert the instrumentation at runtime
      [Diagram labels: Disassemble, Generate Instrumentation, Insert, Execute; Original Binary Code; “PAC-MAN for scale”]

  13. Disassembling
      • Which parts of the binary are code is unknown
      ➡ Disassembling the whole binary in advance is impossible
      • We need to discover the code as we go

  14. Code Discovery
      • How?
        • Execute a block of code
        • Discover where the execution flows after the block
        • Execute the next block of code
      • This forms a short execution cycle
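
The cycle above can be sketched on a toy machine. This is only an illustration under stated assumptions: the mini instruction set, the `PROGRAM` table and every function name are hypothetical and are not part of the framework presented in the talk. A block is decoded up to its first control-flow instruction, executed, and the branch outcome gives the address of the next block to discover.

```python
# Toy guest program: address -> instruction (hypothetical mini-ISA, for illustration only).
PROGRAM = {
    0: ("set", "n", 3),
    1: ("set", "acc", 0),
    2: ("add", "acc", "n"),   # acc += n
    3: ("dec", "n"),          # n -= 1
    4: ("jnz", "n", 2),       # if n != 0: jump to address 2
    5: ("halt",),
}
BRANCHES = {"jmp", "jnz", "halt"}


def discover_block(pc):
    """Decode from pc up to and including the first control-flow instruction."""
    block = []
    while True:
        ins = PROGRAM[pc]
        block.append((pc, ins))
        if ins[0] in BRANCHES:
            return block
        pc += 1


def execute_block(block, regs):
    """Run one block; return the address where execution continues (None on halt)."""
    for pc, ins in block:
        op = ins[0]
        if op == "set":
            regs[ins[1]] = ins[2]
        elif op == "add":
            regs[ins[1]] += regs[ins[2]]
        elif op == "dec":
            regs[ins[1]] -= 1
        elif op == "jmp":
            return ins[1]
        elif op == "jnz":
            return ins[2] if regs[ins[1]] != 0 else pc + 1
        elif op == "halt":
            return None
    raise RuntimeError("a block must end with a control-flow instruction")


def run(entry=0):
    """Execute / discover / execute loop with a cache of already discovered blocks."""
    regs, cache, trace = {}, {}, []
    pc = entry
    while pc is not None:
        if pc not in cache:          # code is discovered only when execution reaches it
            cache[pc] = discover_block(pc)
        trace.append(pc)
        pc = execute_block(cache[pc], regs)
    return regs, trace, cache
```

Calling `run()` returns the final registers, the list of block entry addresses that were executed, and the block cache. The block starting at address 2 is only discovered because the conditional jump at the end of the first block leads there.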

  15. No Free Space
      • The instrumented code is larger than the original code
      • Binaries are usually tightly packed with little free space
      ➡ The instrumentation cannot be inserted in-place
      ➡ It needs to be “relocated”
      [Diagram: blocks of instructions linked by a conditional jump (TRUE / FALSE branches) and unconditional jumps]

  16. Relocating
      • Code contains relative references to memory addresses
      • These become invalid once we move the code
      • We need to completely rewrite the code to fix those references
      ➡ This is what we call “patching”
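
A small arithmetic sketch of why moving code breaks relative references. It assumes an x86-64 style RIP-relative operand, where the target is the address of the next instruction plus a 32-bit displacement. The addresses 0x410000 and 0x7f10000 and the displacement 0x2600 are borrowed from the Patch slides; the 7-byte instruction length and the function names are assumptions made for the example.

```python
def target_of(addr, length, disp):
    """Absolute address referenced by a RIP-relative operand."""
    return addr + length + disp


def relocate_disp(old_addr, new_addr, length, disp):
    """Displacement that keeps the same absolute target after moving the instruction."""
    target = target_of(old_addr, length, disp)
    new_disp = target - (new_addr + length)
    if not -2**31 <= new_disp < 2**31:
        # The target is out of reach of a 32-bit displacement: adjusting the
        # displacement is not enough and the instruction has to be rewritten.
        raise OverflowError("target out of rel32 range after relocation")
    return new_disp


OLD_ADDR, NEW_ADDR, LENGTH, DISP = 0x410000, 0x7F10000, 7, 0x2600

original_target = target_of(OLD_ADDR, LENGTH, DISP)
stale_target = target_of(NEW_ADDR, LENGTH, DISP)      # same bytes, new location
fixed_disp = relocate_disp(OLD_ADDR, NEW_ADDR, LENGTH, DISP)

print(hex(original_target), hex(stale_target), hex(fixed_disp))
```

Copying the bytes unchanged makes the operand point 0x7b00000 bytes away from the original target. Recomputing the displacement works here, but once the relocated code sits more than 2 GiB away from the target the helper raises, which is the case where a full rewrite of the instruction is needed.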

  17. The “Cycle of Life”

  18. Designing a DBI: 1. Low Level Abstractions

  19. Basic Blocks
      [Diagram: code split into blocks of consecutive instructions]

  20. Control Flow
      [Diagram: blocks of instructions, each ending with a JUMP]

  21. Under Control Flow
      [Diagram: Guest and Host columns, blocks of instructions each ending with a JUMP, and the DBI]

  22. Under Control
      DBI is all about keeping control of the execution

  23. Under Control
      • Keeping control of the execution
        • requires modifying original instructions…
        • …without modifying original behaviour

  24. What We Need
      • A multi-architecture disassembler
      • A multi-architecture assembler
      • A generic intermediate representation to apply modifications on

  25. We Don't Want
      (Actually, we don’t have 10 years and unlimited resources)
      • To implement a multi-architecture disassembler and assembler
      • To abstract the semantics of every single instruction
      • Architecture developer manuals are not that fun…

  26. Here Be Dragons
      (This has nothing to do with 26C3)

  27. To the rescue
      • LLVM already has everything
        • It supports all major architectures
        • It provides a disassembler and an assembler…
        • …and both work on the same intermediate representation
      • LLVM Machine Code (aka MC) to the rescue

  28. LLVM MC
      • Instruction: movq rax, 42
      • Binary: [0x48,0x89,0x04,0x25,0x2a,0x00,0x00,0x00]
      • LLVM MC: <MCInst #1670 MOV64mr <MCOperand Reg:0> <MCOperand Imm:1> <MCOperand Reg:0> <MCOperand Imm:42> <MCOperand Reg:0> <MCOperand Reg:35>>
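
To relate the bytes to the MCInst operands, the encoding can be decoded by hand. The sketch below uses only the standard x86-64 encoding rules (REX prefix, ModRM, SIB, 32-bit displacement) and does not call LLVM. The bytes encode a store of a 64-bit register to the absolute address 42 (opcode 0x89 is the register-to-memory form of MOV), which matches the MOV64mr opcode name. Reading the five leading operands as base, scale, index, displacement and segment follows LLVM's x86 memory operand layout; that Reg:35 denotes RAX in the LLVM version used on the slide is an assumption.

```python
code = bytes([0x48, 0x89, 0x04, 0x25, 0x2A, 0x00, 0x00, 0x00])

rex, opcode, modrm, sib = code[0], code[1], code[2], code[3]
disp32 = int.from_bytes(code[4:8], "little", signed=True)

fields = {
    "rex_w": (rex >> 3) & 1,      # 1: 64-bit operand size
    "opcode": opcode,             # 0x89: MOV r/m, r (store register to memory)
    "mod": modrm >> 6,            # 0: no displacement implied by ModRM alone
    "reg": (modrm >> 3) & 7,      # 0: source register number (rax)
    "rm": modrm & 7,              # 4: a SIB byte follows
    "scale": 1 << (sib >> 6),     # 1, matches <MCOperand Imm:1>
    "index": (sib >> 3) & 7,      # 4: no index register, matches <MCOperand Reg:0>
    "base": sib & 7,              # 5 with mod == 0: no base register, disp32 follows
    "disp": disp32,               # 42, matches <MCOperand Imm:42>
}

for name, value in fields.items():
    print(f"{name:>6} = {value:#x}")
```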

  29. LLVM MC
      • It’s minimalist
      • It’s totally generic
        • still encodes a lot of things about an instruction
      • But very raw
        • genericness means some heavy compromises
        • doesn’t encode everything about an instruction

  30. Creation
      Every instruction is encoded using the same representation … but in a different way
      • Instruction: movq [rip+0x2600], rax
      • LLVM MC: <MCInst #1139 MOV64mr <MCOperand Reg:41> <MCOperand Imm:1> <MCOperand Reg:0> <MCOperand Imm:0x2600> <MCOperand Reg:0> <MCOperand Reg:35>>

  31. Modification
      • jmp 0x41424242: <MCInst #1141 JMP_1 <MCOperand Imm: 0x41424242>>
      • jmp [rip+0x2600]: <MCInst #1139 JMP64m <MCOperand Reg:41> <MCOperand Imm:1> <MCOperand Reg:0> <MCOperand Imm:0x2600> <MCOperand Reg:0>>

  32. Patch
          0x410000: mov r0, [r0+pc] ; Load a value relative to PC

  33. Patch
          mov [pc+0x2600], r1 ; Backup R1
          mov r1, 0x410000 ; Set original instruction address
          0x7f10000: mov r0, [r0+r1] ; Load a value relative to R1
          mov r1, [pc+0x2600] ; Restore R1
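
The equivalence between the original instruction and its patched, relocated form can be checked on a toy model. The sketch below is a simplification under stated assumptions: registers are a dictionary, memory is a sparse dictionary, `pc` reads as the address of the instruction itself (real architectures may add an offset), and the scratch slot at `[pc+0x2600]` is modelled as a local variable. All names are hypothetical.

```python
ORIG_ADDR = 0x410000     # where the instruction lives in the original binary
RELOC_ADDR = 0x7F10000   # where the instrumented copy lives


def load_pc_relative(regs, mem, pc):
    """mov r0, [r0+pc] executed at address pc."""
    out = dict(regs)
    out["r0"] = mem.get(out["r0"] + pc)   # None if nothing is mapped there
    return out


def run_patched(regs, mem):
    """The four-instruction patch executed at RELOC_ADDR."""
    out = dict(regs)
    scratch = out["r1"]                        # mov [pc+0x2600], r1 ; backup R1
    out["r1"] = ORIG_ADDR                      # mov r1, 0x410000   ; original address
    out["r0"] = mem.get(out["r0"] + out["r1"]) # mov r0, [r0+r1]
    out["r1"] = scratch                        # mov r1, [pc+0x2600] ; restore R1
    return out


mem = {ORIG_ADDR + 0x10: 0xCAFE}
regs = {"r0": 0x10, "r1": 7}

original = load_pc_relative(regs, mem, ORIG_ADDR)
naive = load_pc_relative(regs, mem, RELOC_ADDR)   # moved without patching
patched = run_patched(regs, mem)

print(original, naive, patched)
```

The naively moved instruction reads from the wrong address, while the patched sequence loads the same value as the original and leaves R1 untouched.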

  34. Abstractions
      • MCInst encoding makes transformations painful
      • Patches can be really complex
      • Many transformations are composed of generic steps
      ➡ We need abstractions

  35. Patch Engine
      [Diagram: MCInst boxes flowing through the Patch Engine; “Abstractions Inside™”]

  36. 36

  37. Patch DSL
      Abstractions, you said?
      • Identify transformation steps required to patch instructions
      • Regroup and integrate them as a domain-specific language
      • Instructions are architecture specific…
      • …DSL should be generic (as much as possible)

  38. Patch DSL
      [Diagram labels: Program, QBDI, Registry; Reg, Temp; Copy, Load/Save, Get/Set, Write; Memory; Shadows, Context, Metadata]

  39. Patch DSL
          mov [pc+0x2600], r1
          mov r1, 0x410000
          […]
          mov r1, [pc+0x2600]
      Temp(0)

  40. Patch DSL
          mov [pc+0x2600], r1
          mov r1, 0x410000
          mov r0, [r0+r1]
          mov r1, [pc+0x2600]
      SubstituteWithTemp(Reg(REG_PC), Temp(0))

  41. Patch DSL
      • Modifications are defined in rules
      • A rule is composed of
        • one (or several) condition(s)
        • one (or several) action(s)
      • Actions can modify or replace an instruction
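
The condition / action structure of a rule can be sketched in a few lines. This is a toy model, not the framework's actual API: `Instr`, `PatchRule`, `use_reg`, `get_pc`, `substitute` and `patch` are hypothetical names, and instructions are reduced to a mnemonic plus string operands.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Instr:
    mnemonic: str
    operands: Tuple[str, ...]
    address: int = 0


Condition = Callable[[Instr], bool]
Action = Callable[[Instr], List[Instr]]


def use_reg(reg: str) -> Condition:
    """Condition: the instruction has `reg` among its operands."""
    return lambda ins: reg in ins.operands


def get_pc(temp: str) -> Action:
    """Action: emit an instruction loading the original address into `temp`."""
    return lambda ins: [Instr("mov", (temp, hex(ins.address)), ins.address)]


def substitute(reg: str, temp: str) -> Action:
    """Action: emit the original instruction with `reg` replaced by `temp`."""
    return lambda ins: [
        replace(ins, operands=tuple(temp if op == reg else op for op in ins.operands))
    ]


@dataclass
class PatchRule:
    conditions: List[Condition]
    actions: List[Action]

    def matches(self, ins: Instr) -> bool:
        return all(cond(ins) for cond in self.conditions)

    def apply(self, ins: Instr) -> List[Instr]:
        # The patch is the concatenation of what each action emits, in order.
        return [out for action in self.actions for out in action(ins)]


def patch(ins: Instr, rules: List[PatchRule]) -> List[Instr]:
    """Apply the first matching rule; instructions matching no rule are kept as-is."""
    for rule in rules:
        if rule.matches(ins):
            return rule.apply(ins)
    return [ins]


RULES = [PatchRule([use_reg("pc")], [get_pc("tmp0"), substitute("pc", "tmp0")])]

patched = patch(Instr("load", ("r0", "r0", "pc"), 0x410000), RULES)
untouched = patch(Instr("add", ("r0", "r1")), RULES)
```

Keeping conditions and actions as small composable functions is one way to get the genericity the slides ask for: only the helpers that inspect or build instructions would need to know about a given architecture.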

  42. Patch DSL

          /* Rule #3: Generic RIP patching.
           * Target: Any instruction with RIP as operand, e.g. LEA RAX, [RIP + 1]
           * Patch:  Temp(0) := rip
           *         LEA RAX, [RIP + IMM] --> LEA RAX, [Temp(0) + IMM]
           */
          PatchRule(
              UseReg(Reg(REG_PC)),
              {
                  GetPCOffset(Temp(0), Constant(0)),
                  ModifyInstruction({
                      SubstituteWithTemp(Reg(REG_PC), Temp(0))
                  })
              }
          );
