Binary-level program analysis: Assembly basics
Gang Tan
CSE 597, Spring 2019
Penn State University
Source code, Assembly code, Object code, and Executable Code
Source code -> Compiler -> Assembly code -> Assembler -> Object code
• Then a linker links object code of different compilation units (files, libraries) into executable code
• Assembly code
  – Consists of assembly instructions
  – Specific to a particular architecture (x86, x64, ARM, SPARC, etc.)
• Object code
  – Consists of encodings of assembly instructions in bytes
• Executable code
  – AKA machine code
  – In a particular file format (e.g., ELF or PE)
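A minimal sketch of telling the two file formats named above apart by their leading magic bytes. The function name `detect_format` is an assumption made for illustration; it reports only the container format and does not validate the rest of the file.

```python
import struct

def detect_format(data: bytes) -> str:
    """Guess the executable container format from its magic bytes."""
    # ELF files start with the four bytes 0x7f 'E' 'L' 'F'.
    if data[:4] == b"\x7fELF":
        return "ELF"
    # PE files start with an MS-DOS header ("MZ"); the 32-bit field at
    # offset 0x3C holds the file offset of the "PE\0\0" signature.
    if data[:2] == b"MZ" and len(data) >= 0x40:
        (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
        if data[pe_offset:pe_offset + 4] == b"PE\x00\x00":
            return "PE"
    return "unknown"

# Usage on an in-memory header; for a real file, pass open(path, "rb").read().
print(detect_format(b"\x7fELF\x02\x01\x01" + bytes(9)))
```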
Example Source Code: hello.c

    #include <stdio.h>

    int main ()
    {
        printf("Hello, World!");
        return 0;
    }
Example Assembly Code: After "gcc -S -o hello.s hello.c"

        .file   "hello.c"
        .section        .rodata
    .LC0:
        .string "Hello, World!"
        .text
    .globl main
        .type   main, @function
    main:
    .LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        movl    $.LC0, %eax
        movq    %rax, %rdi
        movl    $0, %eax
        call    printf
        movl    $0, %eax
        leave
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
    .LFE0:
        .size   main, .-main
        .ident  "GCC: (GNU) 4.4.7 20120313 (Red Hat 4.4.7-23)"
        .section        .note.GNU-stack,"",@progbits
Example Executable Code: After "gcc -o hello hello.c"
Do "objdump -s ./hello"
Binary Code Analysis
• Refers to analyzing assembly or executable code
• If given executable code
  – Step 1: disassemble it into assembly code
  – Step 2: analyze the assembly code
• The disassembly step may be hard or easy
  – Depending on whether meta information is embedded in the executable code
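Step 1 can be illustrated with a toy linear-sweep disassembler. This is only a sketch: it knows the handful of x86-64 encodings that appear in a frame-pointer-based `main` like the one in hello.s, and it assumes the input starts at an instruction boundary and contains no embedded data. The names `FIXED` and `disassemble` are made up for this example.

```python
import struct

# Fixed-length encodings: opcode bytes -> AT&T-syntax text.
FIXED = {
    b"\x55": "push %rbp",
    b"\x48\x89\xe5": "mov %rsp,%rbp",
    b"\x48\x89\xc7": "mov %rax,%rdi",
    b"\xc9": "leave",
    b"\xc3": "ret",
}

def disassemble(code: bytes, base: int = 0):
    """Linear sweep: decode one instruction after another from offset 0."""
    out = []
    pos = 0
    while pos < len(code):
        addr = base + pos
        op = code[pos]
        if op == 0xB8 and pos + 5 <= len(code):    # mov $imm32,%eax
            (imm,) = struct.unpack_from("<I", code, pos + 1)
            out.append((addr, f"mov ${imm:#x},%eax"))
            pos += 5
            continue
        if op == 0xE8 and pos + 5 <= len(code):    # call rel32
            (rel,) = struct.unpack_from("<i", code, pos + 1)
            # The target is relative to the address of the next instruction.
            out.append((addr, f"call {addr + 5 + rel:#x}"))
            pos += 5
            continue
        for enc, text in FIXED.items():
            if code.startswith(enc, pos):
                out.append((addr, text))
                pos += len(enc)
                break
        else:
            # Unknown byte: emit it as data and resynchronize one byte later.
            out.append((addr, f".byte {op:#04x}"))
            pos += 1
    return out

# push %rbp; mov %rsp,%rbp; mov $0,%eax; leave; ret
for addr, text in disassemble(bytes.fromhex("554889e5b800000000c9c3")):
    print(f"{addr:4x}: {text}")
```

A real disassembler has to handle the full variable-length instruction set, which is why a wrong starting offset or data mixed into the code section can derail the sweep.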
Meta information in Executable Code
• During compilation, meta information can be embedded into executable code
• Meta information: symbol tables
  – Information about symbols (e.g., function and variable names) from source code
  – Each entry contains
    • The symbol name
    • The address the symbol is bound to
    • The type of the symbol
    • Misc. info
  – Symbol tables are consumed by linkers and debuggers
objdump --sym ./hello
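The entry fields listed on the previous slide map onto the 24-byte symbol record of a 64-bit little-endian ELF file. The sketch below decodes one such record; `parse_symbol` and the sample values for `main` are assumptions made for illustration, not output taken from the hello binary.

```python
import struct

# Elf64_Sym layout: st_name, st_info, st_other, st_shndx, st_value, st_size
ELF64_SYM = struct.Struct("<IBBHQQ")
BINDINGS = {0: "LOCAL", 1: "GLOBAL", 2: "WEAK"}
TYPES = {0: "NOTYPE", 1: "OBJECT", 2: "FUNC", 3: "SECTION", 4: "FILE"}

def parse_symbol(entry: bytes, strtab: bytes) -> dict:
    """Decode one symbol-table entry; names live in a separate string table."""
    st_name, st_info, _other, st_shndx, st_value, st_size = ELF64_SYM.unpack(entry)
    end = strtab.index(b"\x00", st_name)       # names are NUL-terminated
    return {
        "name": strtab[st_name:end].decode("ascii"),
        "address": st_value,
        "binding": BINDINGS.get(st_info >> 4, "OTHER"),   # high 4 bits
        "type": TYPES.get(st_info & 0xF, "OTHER"),        # low 4 bits
        "size": st_size,
        "section_index": st_shndx,
    }

# Hypothetical entry: a global function "main" at 0x4004c4, 21 bytes long.
strtab = b"\x00main\x00"
entry = ELF64_SYM.pack(1, (1 << 4) | 2, 0, 13, 0x4004C4, 21)
print(parse_symbol(entry, strtab))
```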
Meta information in Executable Code
• Relocation information
  – Before linking, memory addresses of functions and global data are unknown
  – Compilers generate relocation entries
  – Static/dynamic linkers patch the program during linking
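The patching step can be sketched for one common x86-64 case, a 32-bit PC-relative relocation, where the linker stores S + A - P (symbol address plus addend minus the address of the patched field). The function name `apply_pc32` and every address below are hypothetical.

```python
import struct

def apply_pc32(code: bytearray, offset: int, section_addr: int,
               symbol_addr: int, addend: int) -> None:
    """Patch a 32-bit PC-relative field in place: value = S + A - P."""
    place = section_addr + offset            # P: address of the field itself
    value = symbol_addr + addend - place     # S + A - P
    struct.pack_into("<i", code, offset, value)

# "call <unknown>; ret" as emitted before linking: the displacement is zero.
code = bytearray(b"\xe8\x00\x00\x00\x00\xc3")
# Assumed relocation entry: patch offset 1 (just after the 0xe8 opcode) with
# addend -4, because the CPU adds the displacement to the address of the
# NEXT instruction, which lies 4 bytes past the start of the field.
apply_pc32(code, offset=1, section_addr=0x400500, symbol_addr=0x4003F0, addend=-4)
print(code.hex())
```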
Meta information in Executable Code
• Debugging information
  – Generated by the compiler and consumed by debuggers (e.g., gdb)
  – During debugging, the debugger uses debugging info to relate binary code to source code
    • E.g., this instruction was generated from this source code line
  – Includes
    • Source code info: types and scopes of identifiers
    • Line-number info: to relate binary code to source code
    • Other info such as location descriptions
  – Debugging info formats: DWARF and STABS
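Line-number info can be pictured as a table sorted by address, which a debugger searches to find the source line for an instruction. The sketch below uses a made-up table and a hypothetical helper `line_for_address`; real DWARF line tables are encoded as a compact state-machine program and also record where each address sequence ends.

```python
from bisect import bisect_right

# Hypothetical rows: (address of first instruction, source line number).
LINE_TABLE = [(0x4004C4, 3), (0x4004C8, 4), (0x4004DD, 5), (0x4004E2, 6)]

def line_for_address(table, addr):
    """Return the line of the last row whose address is <= addr, else None."""
    starts = [start for start, _ in table]
    i = bisect_right(starts, addr) - 1
    return table[i][1] if i >= 0 else None

print(line_for_address(LINE_TABLE, 0x4004C9))
```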
Stripped versus unstripped binaries
• Stripped binaries
  – Pure binary code; no meta information
  – Disassembly is hard (it is not even known where functions start)
• Unstripped binaries
  – Binary code plus meta information
  – Disassembly is easy
• Why stripped binaries?
  – Meta information occupies space
  – Stripped binaries are harder to reverse engineer, which helps protect intellectual property
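Without a symbol table, function starts have to be guessed. One simple heuristic, sketched below, scans the code bytes for the frame-pointer prologue `push %rbp; mov %rsp,%rbp`. The name `guess_function_starts` is made up for this example, and the heuristic is unreliable: it misses functions compiled without a frame pointer and can match bytes that are really data or the middle of another instruction.

```python
PROLOGUE = b"\x55\x48\x89\xe5"   # push %rbp; mov %rsp,%rbp

def guess_function_starts(text: bytes, base: int = 0):
    """Return addresses where the prologue byte pattern occurs."""
    starts = []
    pos = text.find(PROLOGUE)
    while pos != -1:
        starts.append(base + pos)
        pos = text.find(PROLOGUE, pos + 1)
    return starts

# Two tiny made-up functions preceded by one padding byte.
blob = bytes.fromhex("90" "554889e5c9c3" "554889e5c3")
print([hex(a) for a in guess_function_starts(blob, base=0x1000)])
```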
Next: IA32 and Reverse Engineering basics
• NSA tutorial on reverse engineering
  – https://codebreaker.ltsnet.net/resources
  – Introduction to x86 Assembly
  – Reverse Engineering Machine Code Pt. 1
  – Reverse Engineering Machine Code Pt. 2