C to Machine Code and x86 Basics

  - ISA context and x86 history
  - Translation tools: C --> assembly <--> machine code
  - x86 Basics:
      - Registers
      - Data movement instructions
      - Memory addressing modes
      - Arithmetic instructions

The CSAPP book is very useful and well aligned with the class for the remainder of the course.

Turning C into Machine Code

C Code (sum.c)

    void sumstore(long x, long y, long *dest) {
        long t = x + y;
        *dest = t;
    }

    |  compiler (CS 301): gcc -Og -S sum.c
    v

Generated x86 Assembly Code (sum.s): human-readable language close to machine code.

    sumstore:
        addq %rdi,%rsi
        movq %rsi,(%rdx)
        retq

    |  assembler
    v

Object Code (sum.o)

    01010101100010011110010110
    00101101000101000011000000
    00110100010100001000100010
    01111011000101110111000011

    |  linker: resolve references between object files and libraries, (re)locate data
    v

Executable: sum

Disassembling Object Code

Disassembled by objdump -d sum (the disassembler turns the object bytes back into assembly):

    0000000000400536 <sumstore>:
      400536:  48 01 fe    add    %rdi,%rsi
      400539:  48 89 32    mov    %rsi,(%rdx)
      40053c:  c3          retq

Disassembled by GDB:

    $ gdb sum
    (gdb) disassemble sumstore       (disassemble function)
       0x0000000000400536 <+0>:  add    %rdi,%rsi
       0x0000000000400539 <+3>:  mov    %rsi,(%rdx)
       0x000000000040053c <+6>:  retq
    (gdb) x/7b sumstore              (examine the 7 bytes starting at sumstore)

Object bytes starting at 0x00400536:

    0x48 0x01 0xfe 0x48 0x89 0x32 0xc3

x86-64 registers

Each register holds 64 bits / 8 bytes. Some have special uses for particular instructions.

    %rax   Return Value                       %r8    Argument 5
    %rbx                                      %r9    Argument 6
    %rcx   Argument 4                         %r10
    %rdx   Argument 3                         %r11
    %rsi   Argument 2                         %r12
    %rdi   Argument 1                         %r13
    %rsp   Special Purpose: Stack Pointer     %r14
    %rbp                                      %r15

Sub-registers (historical artifacts)

    %rax        full 64-bit register
    %eax        low 32 bits of %rax (1985: 32-bit extended register)
    %ax         low 16 bits of %rax (1978: 16-bit register)
    %ah, %al    high and low bytes of %ax

    %rsi        full 64-bit register
    %esi        low 32 bits of %rsi
    %si         low 16 bits of %rsi

    %r8         full 64-bit register
    %r8d        32-bit sub-register to match
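The sub-register layout above can be mimicked in C with truncating casts and a shift. This is only an illustrative sketch: the helper names (`eax_of`, `ax_of`, `al_of`, `ah_of`) are made up for this example, and the code models which bits each sub-register name exposes, not how the hardware implements them.

```c
#include <stdint.h>

/* Each helper returns the bits that the named sub-register would expose
   if the 64-bit value `rax` were held in %rax. Names are hypothetical. */

/* %eax: low 32 bits of %rax */
uint32_t eax_of(uint64_t rax) { return (uint32_t)rax; }

/* %ax: low 16 bits of %rax */
uint16_t ax_of(uint64_t rax) { return (uint16_t)rax; }

/* %al: low byte of %ax */
uint8_t al_of(uint64_t rax) { return (uint8_t)rax; }

/* %ah: high byte of %ax (bits 8..15 of %rax) */
uint8_t ah_of(uint64_t rax) { return (uint8_t)(rax >> 8); }
```

For instance, with `rax = 0x1122334455667788`, the helpers yield `0x55667788`, `0x7788`, `0x88`, and `0x77` respectively.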
x86: Three Basic Kinds of Instructions

1. Data movement between memory and register
     Load data from memory into register:   %reg <- Mem[address]
     Store register data into memory:       Mem[address] <- %reg
   Memory is an array[] of bytes!

2. Arithmetic/logic on register or memory data
     c = a + b;    z = x << y;    i = h & g;

3. Comparisons and control flow to choose next instruction
     Unconditional jumps to/from procedures
     Conditional branches

Data movement instructions

    mov_ Source, Dest

The data size suffix _ is one of { b, w, l, q }:

    movq    move 8-byte "quad word"
    movl    move 4-byte "long word"
    movw    move 2-byte "word"
    movb    move 1-byte "byte"

These are historical terms based on the 16-bit days, not the current machine word size (64 bits).

Source/Dest operand types:

  - Immediate: literal integer data
      Examples: $0x400   $-533
  - Register: one of 16 registers
      Examples: %rax   %rdx
  - Memory: consecutive bytes in memory, at address held by register
      Simple indirect addressing: (%rax)
      With displacement/offset: 8(%rsp)

Memory Addressing Modes

    Indirect        (R)     Mem[Reg[R]]
        Register R specifies memory address.
        Example: movq (%rcx),%rax

    Displacement    D(R)    Mem[Reg[R]+D]
        Register R specifies base memory address (e.g. base of an object).
        Displacement D specifies literal offset (e.g. a field in the object).
        Example: movq %rdx,8(%rsp)

    General Form    D(Rb,Ri,S)    Mem[Reg[Rb] + S*Reg[Ri] + D]
        D:  Literal "displacement" value represented in 1, 2, or 4 bytes
        Rb: Base register: any register
        Ri: Index register: any except %rsp
        S:  Scale: 1, 2, 4, or 8

Pointers and Memory Addressing

    void swap(long* xp, long* yp){        swap:
        long t0 = *xp;                        movq (%rdi),%rax
        long t1 = *yp;                        movq (%rsi),%rdx
        *xp = t1;                             movq %rdx,(%rdi)
        *yp = t0;                             movq %rax,(%rsi)
    }                                         retq

    Register    Variable        Registers           Memory addresses
    %rdi        xp              %rdi   0x120        0x120
    %rsi        yp              %rsi   0x108        0x118
    %rax        t0              %rax                0x110
    %rdx        t1              %rdx                0x108
                                                    0x100
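As a rough sketch of the general form D(Rb,Ri,S), the address arithmetic can be written as an ordinary C function. The function names and the sample values in the usage note are assumptions made for illustration; the code only computes the address and does not model the memory access itself.

```c
#include <stdint.h>

/* Computes Reg[Rb] + S*Reg[Ri] + D for the general form D(Rb,Ri,S).
   `base` and `index` stand in for the register contents; `scale` is
   expected to be 1, 2, 4, or 8; `disp` is a signed displacement that is
   sign-extended to 64 bits before being added. */
uint64_t effective_address(uint64_t base, uint64_t index,
                           uint64_t scale, int32_t disp) {
    return base + scale * index + (uint64_t)(int64_t)disp;
}

/* The C-level counterpart for an array of long: &base[i] is the base
   address plus i times sizeof(long). */
long *element_address(long *base, long i) {
    return &base[i];
}
```

With the hypothetical values base = 0x100, index = 0x4, scale = 8, disp = 16, `effective_address` returns 0x130. `element_address(a, 3)` yields the same address as `effective_address` called with the address of `a`, index 3, scale `sizeof(long)`, and displacement 0.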
General Addressing Modes

    D(Rb,Ri,S)    Mem[Reg[Rb] + S*Reg[Ri] + D]

Special cases:

    Form          Meaning                          Implicitly
    (Rb,Ri)       Mem[Reg[Rb]+Reg[Ri]]             (S=1, D=0)
    D(Rb,Ri)      Mem[Reg[Rb]+Reg[Ri]+D]           (S=1)
    (Rb,Ri,S)     Mem[Reg[Rb]+S*Reg[Ri]]           (D=0)

Address Computation Examples

Register contents:

    %rdx    0xf000
    %rcx    0x100

    Address Expression    Address Computation    Address
    0x8(%rdx)             0xf000 + 0x8           0xf008
    (%rdx,%rcx)           0xf000 + 0x100         0xf100
    (%rdx,%rcx,4)         0xf000 + 4*0x100       0xf400
    0x80(,%rdx,2)         2*0xf000 + 0x80        0x1e080

Load effective address

    leaq Src, Dest

Compute the address given by the addressing mode expression Src and store it in Dest.

!!! DOES NOT ACCESS MEMORY

Uses:

  - "address of":  p = &x[i];
  - "Lovely Efficient Arithmetic":  x + k*i, where k = 1, 2, 4, or 8

leaq vs. movq

    Registers          Memory
    %rax               Address    Value
    %rbx               0x120      0x400
    %rcx    0x4        0x118      0xf
    %rdx    0x100      0x110      0x8
    %rdi               0x108      0x10
    %rsi               0x100      0x1

    Assembly Code                  Result
    leaq (%rdx,%rcx,4), %rax       %rax = 0x100 + 4*0x4 = 0x110
    movq (%rdx,%rcx,4), %rbx       %rbx = Mem[0x110] = 0x8
    leaq (%rdx), %rdi              %rdi = 0x100
    movq (%rdx), %rsi              %rsi = Mem[0x100] = 0x1

Call Stack

Memory region for temporary storage managed with stack discipline.

    Stack "Bottom"      higher addresses
         |
         |  stack grows toward lower addresses
         v
    Stack "Top"         lower addresses    <-- Stack Pointer: %rsp

%rsp holds the lowest stack address (address of the "top" element).

Call Stack: Push, Pop

pushq Src
  1. Fetch value from Src
  2. Decrement %rsp by 8 (why 8?)
  3. Store value at new address given by %rsp

popq Dest
  1. Load value from address %rsp
  2. Write value to Dest
  3. Increment %rsp by 8

After a pop, those bits are still there; we're just not using them.
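The push and pop steps above can be sketched with a small simulated memory. Everything here (the `Machine` struct, `MEM_SIZE`, and the function names) is invented for illustration, and the sketch does no bounds checking; it only mirrors the three listed steps for each instruction, with the value already fetched (push) or returned to the caller (pop).

```c
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 64   /* hypothetical tiny memory, in bytes */

typedef struct {
    uint8_t  mem[MEM_SIZE];  /* memory as an array of bytes */
    uint64_t rsp;            /* stack pointer: address of the "top" element */
} Machine;

/* Start with an empty stack: %rsp sits at the "bottom" (highest address). */
void machine_init(Machine *m) {
    memset(m->mem, 0, sizeof m->mem);
    m->rsp = MEM_SIZE;
}

/* pushq Src: decrement %rsp by 8, then store the 8-byte value there. */
void pushq(Machine *m, uint64_t src) {
    m->rsp -= 8;
    memcpy(&m->mem[m->rsp], &src, 8);
}

/* popq Dest: load the 8-byte value at %rsp, then increment %rsp by 8.
   The popped bytes are left in memory untouched. */
uint64_t popq(Machine *m) {
    uint64_t dest;
    memcpy(&dest, &m->mem[m->rsp], 8);
    m->rsp += 8;
    return dest;
}
```

Pushing two values onto a fresh `Machine` moves `rsp` from 64 down to 48; popping returns them in reverse order and moves `rsp` back up, while the old bytes remain in `mem` below the new top.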
Procedure Preview (more soon)

call, ret, push, pop

Procedure arguments are passed in 6 registers (argument order: %rdi, %rsi, %rdx, %rcx, %r8, %r9):

    %rax   Return Value       %r8    Argument 5
    %rbx                      %r9    Argument 6
    %rcx   Argument 4         %r10
    %rdx   Argument 3         %r11
    %rsi   Argument 2         %r12
    %rdi   Argument 1         %r13
    %rsp   Stack pointer      %r14
    %rbp                      %r15

Return value in %rax.

Allocate/push new stack frame for each procedure call: some local variables, saved register values, extra arguments.

Deallocate/pop frame before return.

Stack layout:

    Caller Frame     ...
                     Extra Arguments to callee
                     Return Address
    Callee Frame     Saved Registers + Local Variables    <-- Stack pointer %rsp

Arithmetic Operations

Two-operand instructions (note the argument order: Src, then Dest):

    Format              Computation
    addq  Src,Dest      Dest = Dest + Src
    subq  Src,Dest      Dest = Dest - Src
    imulq Src,Dest      Dest = Dest * Src
    shlq  Src,Dest      Dest = Dest << Src     a.k.a. salq
    sarq  Src,Dest      Dest = Dest >> Src     Arithmetic
    shrq  Src,Dest      Dest = Dest >> Src     Logical
    xorq  Src,Dest      Dest = Dest ^ Src
    andq  Src,Dest      Dest = Dest & Src
    orq   Src,Dest      Dest = Dest | Src

One-operand (unary) instructions:

    incq Dest           Dest = Dest + 1        increment
    decq Dest           Dest = Dest - 1        decrement
    negq Dest           Dest = -Dest           negate
    notq Dest           Dest = ~Dest           bitwise complement
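The table lists two right shifts that differ only in what fills the vacated high bits. The following is a minimal sketch of that difference in C; the function names `shr64` and `sar64` are made up here, and `sar64` is written so it does not depend on how a particular compiler treats `>>` on negative signed values. Both assume a shift amount from 0 to 63.

```c
#include <stdint.h>

/* Models shrq: logical right shift, vacated high bits are filled with 0. */
uint64_t shr64(uint64_t x, unsigned n) {
    return x >> n;
}

/* Models sarq: arithmetic right shift, vacated high bits are filled with
   copies of the sign bit. For negative x, -1 - x equals the bitwise
   complement of x in two's complement and is non-negative, so it can be
   shifted safely and then complemented back. */
int64_t sar64(int64_t x, unsigned n) {
    if (x >= 0) {
        return x >> n;              /* non-negative: same as logical shift */
    }
    return -1 - ((-1 - x) >> n);    /* negative: complement, shift, complement */
}
```

For example, `sar64(-16, 2)` gives -4, while `shr64` applied to the same 64-bit pattern gives 0x3FFFFFFFFFFFFFFC.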
See CSAPP 3.5.5 for: mulq, cqto, idivq, divq

leaq for arithmetic

    long arith(long x, long y, long z){        arith:
        long t1 = x+y;                             leaq (%rdi,%rsi), %rax
        long t2 = z+t1;                            addq %rdx, %rax
        long t3 = x+4;                             leaq (%rsi,%rsi,2), %rdx
        long t4 = y * 48;                          salq $4, %rdx
        long t5 = t3 + t4;                         leaq 4(%rdi,%rdx), %rcx
        long rval = t2 * t5;                       imulq %rcx, %rax
        return rval;                               ret
    }

    Register    Use(s)
    %rdi        Argument x
    %rsi        Argument y
    %rdx        Argument z, then t4
    %rax        t1, t2, rval
    %rcx        t5

  - Instructions are in a different order from the C code.
  - Some expressions require multiple instructions.
  - Some instructions cover multiple expressions.
  - Same x86 code results from compiling: (x+y+z)*(x+4+48*y)

Another example

    long logical(long x, long y){              logical:
        long t1 = x^y;                             movq %rdi,%rax
        long t2 = t1 >> 17;                        xorq %rsi,%rax
        long mask = (1<<13) - 7;                   sarq $17,%rax
        long rval = t2 & mask;                     andq $8185,%rax
        return rval;                               retq
    }

    Register    Use(s)
    %rdi        Argument x
    %rsi        Argument y
    %rax        t1, t2, rval
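To see how the arith assembly lines up with the C source, one option is to replay the instructions one at a time in C, using a variable per register. This is a sketch for small inputs only (signed overflow is not handled); `arith_asm` and `arith_closed_form` are names invented for this illustration, and `salq $4` is written as a multiplication by 16 so the C stays well defined for negative values.

```c
/* The C function from the slide. */
long arith(long x, long y, long z) {
    long t1 = x + y;
    long t2 = z + t1;
    long t3 = x + 4;
    long t4 = y * 48;
    long t5 = t3 + t4;
    long rval = t2 * t5;
    return rval;
}

/* Instruction-by-instruction replay of the assembly. Parameters are named
   after the registers that hold the arguments on entry. */
long arith_asm(long rdi, long rsi, long rdx) {
    long rax, rcx;
    rax = rdi + rsi;        /* leaq (%rdi,%rsi), %rax     t1 = x + y          */
    rax += rdx;             /* addq %rdx, %rax            t2 = z + t1         */
    rdx = rsi + rsi * 2;    /* leaq (%rsi,%rsi,2), %rdx   3*y                 */
    rdx *= 16;              /* salq $4, %rdx              t4 = 48*y           */
    rcx = 4 + rdi + rdx;    /* leaq 4(%rdi,%rdx), %rcx    t5 = x + 4 + t4     */
    rax *= rcx;             /* imulq %rcx, %rax           rval = t2 * t5      */
    return rax;             /* ret                        result in %rax      */
}

/* The single expression noted on the slide. */
long arith_closed_form(long x, long y, long z) {
    return (x + y + z) * (x + 4 + 48 * y);
}
```

All three functions agree on small inputs; for x = 1, y = 2, z = 3 each returns 606.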