Binary-level program analysis: A discussion of x86-64
Gang Tan
CSE 597 Spring 2019
Penn State University
* These slides follow Sec 3.13 of the book CSAPP, “Computer Systems: A Programmer’s Perspective”; figures and slides are borrowed/adapted from that book.
Intel’s 64-Bit History
• 2001: Intel Attempts Radical Shift from IA32 to IA64
  – Totally different architecture (Itanium)
  – Executes IA32 code only as legacy
  – Performance disappointing
• 2003: AMD Steps in with Evolutionary Solution
  – x86-64 (now called “AMD64”)
• Intel Felt Obligated to Focus on IA64
  – Hard to admit mistake or that AMD is better
• 2004: Intel Announces EM64T extension to IA32
  – Extended Memory 64-bit Technology
  – Almost identical to x86-64!
• All but low-end x86 processors support x86-64
  – But, lots of code still runs in 32-bit mode
Overview of x86-64
• Pointers and long integers are 64 bits long
  – Integer arithmetic operations support 8, 16, 32, and 64 bits
• 16 general-purpose registers, each 64 bits wide
• Calling conventions pass more parameters via registers
  – System V AMD64 ABI: passes the first 6 integer or pointer parameters in registers
  – As a result, some procedures do not need to access the stack at all
• Conditional operations are implemented using conditional move instructions when possible
  – Often better performance than branches, since no branch prediction is involved
• Floating-point operations are implemented using the register-oriented instruction set in SSE version 2
  – Rather than the stack-based approach in IA32
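To make the conditional-move point concrete, here is a minimal sketch in C. The function name `select_max` is hypothetical, and whether a compiler actually emits a `cmov` instruction depends on the compiler, its flags, and the surrounding code; a simple side-effect-free select like this one is a typical candidate.

```c
/* Hypothetical example: a data-dependent select with no side effects.
 * An optimizing x86-64 compiler may translate the ternary below into
 * cmp + cmov instead of a conditional jump; this is not guaranteed. */
long select_max(long a, long b)
{
    return (a > b) ? a : b;
}
```

Compiling this with optimization enabled and inspecting the generated assembly is one way to check which form a given compiler chose.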
x86-64 Data Types
(Fig 3.34 of CSAPP)
16 64-bit GP Registers
(Fig 3.35 of CSAPP)
Instruction Operands
• Similar to IA32
  – Except that the base and index registers must be the 64-bit (r-prefixed) versions of the registers
• In addition, PC-relative addressing
  – “add rax, 0x200ad1[rip]” accesses memory at address rip+0x200ad1, where rip holds the address of the next instruction
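When analyzing a binary, PC-relative operands have to be resolved to absolute addresses. The sketch below shows the arithmetic; the helper name `rip_relative_target` and its parameters are hypothetical, and it assumes the displacement is the usual signed 32-bit field.

```c
#include <stdint.h>

/* Sketch: resolve the target of a RIP-relative operand.
 * next_rip: address of the instruction FOLLOWING the one being decoded
 *           (the value rip holds while the instruction executes).
 * disp32:   signed 32-bit displacement encoded in the instruction. */
uint64_t rip_relative_target(uint64_t next_rip, int32_t disp32)
{
    /* Sign-extend the displacement to 64 bits, then add (wraps mod 2^64). */
    return next_rip + (uint64_t)(int64_t)disp32;
}
```

For instance, with the displacement 0x200ad1 and an assumed next-instruction address of 0x400536, the operand refers to address 0x601007.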
Function Calling: Argument Passing
• The following slides assume the System V AMD64 ABI
• Arguments (up to the first six integer or pointer arguments) are passed to procedures via registers
  – This reduces the overhead of storing and retrieving values on the stack
• callq stores a 64-bit return address on the stack
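The register assignment can be written down as a small lookup table. This is only a sketch: the names `int_arg_regs` and `int_arg_location` are hypothetical, and floating-point and aggregate arguments follow separate rules that are ignored here.

```c
#include <stddef.h>

/* Integer/pointer argument registers, in System V AMD64 order. */
static const char *const int_arg_regs[6] = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"
};

/* Hypothetical helper: name the location of the index-th (0-based)
 * integer or pointer argument. Anything past the sixth goes on the stack. */
const char *int_arg_location(size_t index)
{
    return index < 6 ? int_arg_regs[index] : "stack";
}
```

A binary-analysis tool could use such a table to recover likely argument values at a call site by reading these registers in order.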
Example of Argument Passing
    long myfunc(long a, long b, long c, long d,
                long e, long f, long g, long h)
    {
        long xx = a * b * c * d * e * f * g * h;
        long yy = a + b + c + d + e + f + g + h;
        long zz = utilfunc(xx, yy, xx % yy);
        return zz + 20;
    }
* Example from https://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64/
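In this example the arguments a through f travel in registers, while g and h are passed on the stack. As a rough sketch, assuming the callee uses the traditional prologue (push rbp; mov rbp, rsp) and that every stack argument occupies 8 bytes, the hypothetical helper below computes where a stack-passed argument sits relative to rbp.

```c
/* Hypothetical helper: offset from rbp of the index-th (0-based) 8-byte
 * integer argument, assuming the traditional prologue has run.
 * Layout: 0(%rbp) saved rbp, 8(%rbp) return address,
 *         16(%rbp) first stack-passed argument.
 * Returns -1 for arguments that are passed in registers. */
long stack_arg_rbp_offset(int index)
{
    if (index < 6)
        return -1;                  /* passed in a register, not on the stack */
    return 16 + 8L * (index - 6);
}
```

Under these assumptions, g (index 6) is found at 16(%rbp) and h (index 7) at 24(%rbp).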
Function Calling: Stack Frame
• A function may not require a stack frame, if
  – all local variables can be held in registers, and
  – it has no array or structure local variables, and
  – no address-of operator (&) is applied to a local variable, and
  – it does not call another function that requires argument passing on the stack, and
  – it does not need to save any callee-saved registers
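The contrast can be illustrated with two small functions. This is a sketch with hypothetical names; what the compiler actually emits depends on optimization level and inlining, so the comments describe the typical outcome rather than a guarantee.

```c
/* Typically needs no stack frame: all values fit in registers, no arrays,
 * no address-of, no calls, no callee-saved registers to preserve. */
long add3(long a, long b, long c)
{
    return a + b + c;
}

/* Hypothetical callee that writes its result through a pointer. */
void store_sum(long x, long y, long *dest)
{
    *dest = x + y;
}

/* Typically needs stack storage: the address of local t is taken, so t
 * must live in memory (unless the compiler inlines store_sum away). */
long uses_address_of(long x, long y)
{
    long t;
    store_sum(x, y, &t);
    return t;
}
```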
Function Calling: Red-Zone Optimization
• Red-zone optimization for leaf functions (functions that do not call other functions)
  – The 128 bytes below rsp can be used by a leaf function without adjusting rsp to allocate stack space
  – The red zone will not be asynchronously clobbered by signal or interrupt handlers, so a leaf function can use it for scratch data
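For an analysis tool, the red zone means that accesses slightly below rsp are legitimate in leaf functions. The check below is a minimal sketch; `RED_ZONE_SIZE` and `in_red_zone` are hypothetical names, and the zone is taken to be the half-open range [rsp - 128, rsp).

```c
#include <stdbool.h>
#include <stdint.h>

#define RED_ZONE_SIZE 128u  /* bytes below rsp, per the System V AMD64 ABI */

/* Sketch: does addr fall inside the red zone [rsp - 128, rsp)? */
bool in_red_zone(uint64_t rsp, uint64_t addr)
{
    return addr < rsp && rsp - addr <= RED_ZONE_SIZE;
}
```

An access such as -8(%rsp) passes this check, while -136(%rsp) does not.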
Function Calling: the Base Pointer Optimization
• Two options for functions that need a stack frame
• Option 1: the traditional approach (default for gcc without optimizations)
  – Function prologue: save the base pointer; create the new base pointer
  – Function body: references to stack locations are made relative to the base pointer
  – Function epilogue: restore the base pointer
• Option 2: faster (default for gcc with optimizations)
  – Do not save/restore the base pointer; rbp is used as a GP register
  – References to stack locations are made relative to the stack pointer
  – Stack allocation happens once at the beginning; rsp then stays at a fixed position for the rest of the function's execution
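The two addressing schemes name the same stack slots with different offsets. As a rough sketch, assume the traditional prologue "push rbp; mov rbp, rsp; sub rsp, S", so that rbp = rsp + S inside the function body. The hypothetical helper below converts an rbp-relative offset into the equivalent rsp-relative one. A frame compiled without a base pointer is laid out differently (it has no saved-rbp slot), so this only relates two views of one and the same frame.

```c
/* Hypothetical helper: given an offset relative to rbp and the number of
 * bytes S subtracted from rsp in the prologue, return the offset of the
 * same location relative to rsp.
 * Derivation: address = rbp + rbp_off = (rsp + S) + rbp_off. */
long rbp_offset_to_rsp_offset(long rbp_off, long frame_size)
{
    return rbp_off + frame_size;
}
```

For example, with S = 32 the local at -8(%rbp) is the same slot as 24(%rsp).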
Example
C source code
    long int simple_l(long int *xp, long int y)
    {
        long int t = *xp + y;
        *xp = t;
        return t;
    }
Example
Optimized x86-32 Assembly
    simple_l:
        pushl %ebp              ; Save frame pointer
        movl  %esp, %ebp        ; New frame pointer
        movl  8(%ebp), %edx     ; Retrieve xp
        movl  12(%ebp), %eax    ; Retrieve y
        addl  (%edx), %eax      ; Add *xp to get t
        movl  %eax, (%edx)      ; Store t at xp
        popl  %ebp              ; Restore frame pointer
        ret
Example
Unoptimized x86-64 Assembly
    simple_l:
        pushq %rbp
        movq  %rsp, %rbp
        movq  %rdi, -24(%rbp)
        movq  %rsi, -32(%rbp)
        movq  -24(%rbp), %rax
        movq  (%rax), %rax
        addq  -32(%rbp), %rax
        movq  %rax, -8(%rbp)
        movq  -24(%rbp), %rax
        movq  -8(%rbp), %rdx
        movq  %rdx, (%rax)
        movq  -8(%rbp), %rax
        leave
        ret
Optimized x86-64 Assembly
    simple_l:
        movq  %rsi, %rax        ; Copy y
        addq  (%rdi), %rax      ; Add *xp to get t
        movq  %rax, (%rdi)      ; Store t at xp
        ret
Function Calling: Caller/Callee-Save Registers
• Callee-saved regs: rbx, rbp, and r12 to r15
• Caller-saved regs: r10 and r11, in addition to rax and the six argument registers (rdi, rsi, rdx, rcx, r8, r9)
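A tool that checks whether a function preserves the registers it is required to can start from a simple classification. This is a minimal sketch; `is_callee_saved` is a hypothetical helper and it covers only the general-purpose registers listed on this slide.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: is reg one of the callee-saved general-purpose
 * registers named above (rbx, rbp, r12 to r15)? */
bool is_callee_saved(const char *reg)
{
    static const char *const callee_saved[] = {
        "rbx", "rbp", "r12", "r13", "r14", "r15"
    };
    for (size_t i = 0; i < sizeof callee_saved / sizeof callee_saved[0]; i++) {
        if (strcmp(reg, callee_saved[i]) == 0)
            return true;
    }
    return false;
}
```

If a function writes to a register for which this returns true, it is expected to save the old value first (for example with pushq) and restore it before returning.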
x86-64 Assembly Code Example
C source code
    long plus(long x, long y);

    void sumstore(long x, long y, long *dest)
    {
        long t = plus(x, y);
        *dest = t;
    }
Optimized x86-64 Assembly
    sumstore:
        pushq %rbx
        movq  %rdx, %rbx
        call  plus
        movq  %rax, (%rbx)
        popq  %rbx
        ret