Road map
  Midterm a’comin
    Friday in class
    Exams page on web site has info + practice problems
  Today’s lecture
    A first look at assembly
    Where is our data stored?
    The mov instruction and addressing modes
It’s bits all the way down…
  Data representation so far
    Integer (unsigned, 2’s complement signed)
    Char (ASCII)
    Address (unsigned long)
    Float/double (IEEE floating point)
    Aggregates (arrays, structs)
  The code itself is binary too!
    Instructions (machine encoding)
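To make the "bits all the way down" point concrete, here is a minimal C sketch that looks at the raw bytes behind a few values. The helper names float_bits and all_bytes_equal are made up for this illustration, and the sketch assumes a 4-byte float in IEEE 754 single-precision format, as on x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the 32 raw bits stored in a float.
   Assumes float is 4 bytes (IEEE 754 single precision). */
uint32_t float_bits(float f)
{
    uint32_t bits;
    assert(sizeof f == sizeof bits);
    memcpy(&bits, &f, sizeof bits);   /* copies bytes, no numeric conversion */
    return bits;
}

/* Hypothetical helper: returns 1 if each of the n bytes at p equals value. */
int all_bytes_equal(const void *p, size_t n, unsigned char value)
{
    const unsigned char *bytes = p;
    for (size_t i = 0; i < n; i++)
        if (bytes[i] != value)
            return 0;
    return 1;
}
```

Under these assumptions, float_bits(1.0f) is 0x3f800000, the char 'A' is the byte 0x41, and every byte of the int -1 is 0xFF in 2's complement.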
Compiling code, what happens?
  simple.c: source file (in text form)

    #include <stddef.h>   /* for size_t */

    int find_max(int arr[], size_t n)
    {
        int max = arr[0];
        for (size_t i = 1; i < n; i++)
            if (arr[i] > max) max = arr[i];
        return max;
    }

  Compiler
    parses input ... validates language rules
    generates assembly instructions
    writes object file (in binary form)

    myth> make
    gcc simple.c -o simple

  simple: object file (in binary form)

    ^ELF^B^A^A^@^@^@^@^@^@^@^@^@^B^@
    >^@^A^@^@^@\300^D@^@^@^@^@^@^@^@
    ^@^@^@^@^@^@\370\225^@^@^@^@^@^@
    ^@^@^@^@^@^@8^@^@^@^@&^@#^@^F^@^
    @^@^E^@^@^@^@^@^@^@^@^@^@^@^@^@^
    ...
What’s in an object file?
  objdump -d simple

    00000000004005b6 <find_max>:
      4005b6:  8b 07            mov    (%rdi),%eax
      4005b8:  ba 01 00 00 00   mov    $0x1,%edx
      4005bd:  eb 0d            jmp    4005cc <find_max+0x16>
      4005bf:  8b 0c 97         mov    (%rdi,%rdx,4),%ecx
      4005c2:  39 c8            cmp    %ecx,%eax
      4005c4:  7d 02            jge    4005c8 <find_max+0x12>
      4005c6:  89 c8            mov    %ecx,%eax
      4005c8:  48 83 c2 01      add    $0x1,%rdx
      4005cc:  48 39 f2         cmp    %rsi,%rdx
      4005cf:  72 ee            jb     4005bf <find_max+0x9>
      4005d1:  f3 c3            repz retq

  Reading the listing
    Header line: name of function, memory address of code (function pointer)
    Sequential instructions are at sequential addresses
    Middle column: machine code, each instruction encoded in binary
    Right column: each machine instruction decoded into human-readable assembly
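One way to read the disassembly of find_max is to translate it back into C one instruction at a time. The sketch below does that with goto. The C variables eax, ecx, rdx, rdi and rsi merely stand in for the registers of the same name, and the function name find_max_goto is made up for this illustration. The NULL check is an addition of this sketch and has no counterpart in the disassembly.

```c
#include <assert.h>
#include <stddef.h>

/* Hand translation of the find_max disassembly, one statement per
   instruction. Register names are ordinary C variables here. */
int find_max_goto(const int *rdi, size_t rsi)
{
    int eax, ecx;
    size_t rdx;

    assert(rdi != NULL);     /* sanity check added for this sketch only */

    eax = rdi[0];            /* mov (%rdi),%eax                 */
    rdx = 1;                 /* mov $0x1,%edx                   */
    goto check;              /* jmp 4005cc                      */
loop:
    ecx = rdi[rdx];          /* mov (%rdi,%rdx,4),%ecx          */
    if (eax >= ecx)          /* cmp %ecx,%eax ; jge 4005c8      */
        goto next;
    eax = ecx;               /* mov %ecx,%eax                   */
next:
    rdx += 1;                /* add $0x1,%rdx                   */
check:
    if (rdx < rsi)           /* cmp %rsi,%rdx ; jb 4005bf       */
        goto loop;
    return eax;              /* repz retq (result is in %eax)   */
}
```

Note how the loop test sits at the bottom and the first jmp skips straight to it, and how the unsigned comparison jb matches the size_t loop counter while the signed jge matches the int elements.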
What is an assembly instruction?

    4005c6:  89 c8            mov    %ecx,%eax
    4005c8:  48 83 c2 01      add    $0x1,%rdx
    4005cc:  48 39 f2         cmp    %rsi,%rdx
    4005cf:  72 ee            jb     4005bf <find_max+0x9>

  opcode (instruction name/type): mov, add, cmp, jb
  operands (arguments to instruction): e.g. %ecx,%eax
    %eax is register name (storage location on CPU)
    $0x1 is constant value ("immediate")
    4005bf is direct address
Computer anatomy
  Where is my data??
    memory, accessed by address
    registers, accessed by name
    program stored on disk/server
  (Diagram: B&O Figure 1.4)
Instruction set architecture
  The ISA defines
    Operations that the processor can execute
    Data transfer operations, how to access data
    Control mechanisms like branch, jump (think loops and if-else)
  Contract between programmer/compiler and hardware
    Layer of abstraction
    Above: programmer/compiler emits instructions as allowed in ISA
    Below: hardware implements what is described in ISA
  ISAs have incredible inertia!
    Legacy support is a huge issue for x86-64
  CISC vs RISC
    (CISC, x86) Large set of specialized/expressive instructions, lower clock frequency
    (RISC, ARM) Small set of simple instructions, higher clock frequency
    Pres. Hennessy Turing Award!
  (Diagram, layers top to bottom: Application Program, Compiler / OS, ISA, CPU Design, Circuit Design, Chip Layout)
Assembly characteristics
  Data
    "Integer" data, 1/2/4/8 bytes
      Char, int, long, pointer, signed/unsigned
    Floating point data, 4/8/(10) bytes
      Special-purpose registers and instructions
    No aggregates
      Arrays and structs are just contiguously located bytes in memory
    No names, no types
      Refer to data by where stored (register/memory), size in bytes
  Operations
    Perform arithmetic/logical ops on register or memory data
    Transfer data between memory and register
      Load/store
    Control flow
      Unconditional jump to/from other functions
      Conditional branch
The almighty mov instruction
  Programs manipulate data
    Where is that data stored? Registers, memory (also: disk, server, network, …)
  mov instruction is the assembly equivalent of assignment
    Most common instruction of all
  Key insight: no access to variables by name/type
    High-level language had descriptive names, type information
    Assembly accesses variable by identifying where it is stored (register/memory)
  General form: movx src, dst
    Copy bytes from one place to another
    Source can be memory, registers, constants
    Destination can be memory, registers
Mov operands: imm/reg
  Op     Src      Dst     Comments
  movl   $0,      %eax    src is immediate
  movb   $0x41,   %al     %al is a virtual sub-register (the low byte of %rax)
  mov    %rax,    %rdx    Register to register

  movx: the suffix x is how many bytes to move
    b for byte (1), w for word (2), l for long (4), q for quad (8)
    (suffixes show legacy…)
    Elided if it can be inferred from operands
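The suffix table and the sub-register idea can be modeled in a few lines of C. This is only a sketch: mov_suffix_bytes and write_al are hypothetical names, and write_al models a 64-bit register as a plain uint64_t. A byte-sized write to %al replaces the low 8 bits and leaves the other 56 bits of %rax as they were.

```c
#include <assert.h>   /* for the checks mentioned in the note below */
#include <stdint.h>

/* Number of bytes moved for each mov suffix; 0 for an unknown suffix. */
int mov_suffix_bytes(char suffix)
{
    switch (suffix) {
    case 'b': return 1;   /* byte */
    case 'w': return 2;   /* word */
    case 'l': return 4;   /* long */
    case 'q': return 8;   /* quad */
    default:  return 0;
    }
}

/* Model of "movb $value, %al": only the low byte of rax changes. */
uint64_t write_al(uint64_t rax, uint8_t value)
{
    return (rax & ~(uint64_t)0xFF) | value;
}
```

For instance, assert(mov_suffix_bytes('l') == 4) holds, and write_al(0x1122334455667788, 0x41) yields 0x1122334455667741.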
Mov operands: direct/indirect
  Op     Src         Dst         Comments
  movl   $0,         0x605428    Store, direct address (note: no $ prefix on address literal)
  movl   $0,         (%rsp)      Store, indirect address (address in register, dereference)
  movl   0x605428,   %edx        Load, direct address
  movl   (%rsp),     %edx        Load, indirect address

  Load = read from memory location
  Store = write to memory location
  Direct: data at fixed location
  Indirect: register holds pointer
  No mem-to-mem transfer
    At most one of src and dst is memory, never both
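In C terms, the load/store and direct/indirect distinctions look roughly like the sketch below. The names shared_flag, store_direct, store_indirect, load_indirect and copy_int are invented for this illustration, and the comments describe the kind of mov each line corresponds to, not the exact instructions a particular compiler will emit.

```c
#include <assert.h>
#include <stddef.h>

int shared_flag;   /* hypothetical global: its location is fixed at link time */

/* Store to a fixed location, in the spirit of "movl $0, 0x605428". */
void store_direct(void)
{
    shared_flag = 0;
}

/* Store through a pointer, in the spirit of "movl $0, (%rsp)". */
void store_indirect(int *p)
{
    assert(p != NULL);
    *p = 0;
}

/* Load through a pointer, in the spirit of "movl (%rsp), %edx". */
int load_indirect(const int *p)
{
    assert(p != NULL);
    return *p;
}

/* No mem-to-mem mov: copying one int to another takes a load, then a store. */
void copy_int(int *dst, const int *src)
{
    int tmp = *src;   /* load:  memory -> register */
    *dst = tmp;       /* store: register -> memory */
}
```

The single C assignment *dst = *src hides the two-step copy that copy_int spells out.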
Addressing modes
  Op     Src   Dst         Comments
  movl   $0,   0x605428    Direct address
  movl   $0,   (%rsp)      Indirect address
  movl   $0,   20(%rsp)    Indirect with displacement

  Target address = base + displacement
    Base: register
    Displacement: any constant (negative or positive)
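Indirect-with-displacement is the natural fit for struct fields: the base is the address of the struct and the displacement is the field's byte offset. The sketch below illustrates this in C. The struct point type and the helper load_at_displacement are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct point { int x; int y; int z; };   /* hypothetical example type */

/* Read an int located `displacement` bytes away from `base`,
   i.e. at target address = base + displacement. */
int load_at_displacement(const void *base, ptrdiff_t displacement)
{
    int value;
    assert(base != NULL);
    memcpy(&value, (const char *)base + displacement, sizeof value);
    return value;
}
```

With struct point p = {10, 20, 30}, calling load_at_displacement(&p, offsetof(struct point, y)) reads the same bytes as p.y.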
Addressing modes
  Op     Src   Dst                   Comments
  movl   $0,   0x605428              Direct address
  movl   $0,   (%rsp)                Indirect address
  movl   $0,   20(%rsp)              Indirect with displacement
  movl   $0,   20(%rsp, %rax, 4)     Indirect with scaled-index

  Target address = base + displacement + index*scale
    Base: register
    Index: register; if missing, = 0
    Displacement: any constant (negative or positive); if missing, = 0
    Scale: must be 1, 2, 4, or 8; if missing, = 1
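The target-address formula can be written out as a small C function. This is a sketch only: effective_address is a made-up name, registers are modeled as uint64_t values, and the parameters are ordered like the assembly operand displacement(base, index, scale).

```c
#include <assert.h>
#include <stdint.h>

/* Target address = base + displacement + index*scale.
   Pass index = 0, scale = 1, disp = 0 for the parts that are missing. */
uint64_t effective_address(int64_t disp, uint64_t base,
                           uint64_t index, unsigned scale)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + (uint64_t)disp + index * scale;
}
```

For 20(%rsp, %rax, 4) with %rsp = 0x1000 and %rax = 3, effective_address(20, 0x1000, 3, 4) gives 0x1020. With disp = 0 and scale = sizeof(int), the same formula yields the address of arr[index] for an int array starting at base.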
Load effective address
  lea = "load effective address"
    Basically a mov without the dereference
    Used for address calculation, e.g. &arr[x]
    Also arithmetic expressions of form x + ky, where k = 1, 2, 4, 8 (faster than sequenced mul/add)
  Examples
    leaq (%rax, %rsi, 4), %rax
      Computes base + scaled-index, e.g. address of array elem
    leaq 7(%rdx, %rdx, 4), %rdx
      Computes x = 5x + 7 (assuming x stored in %rdx)
    (q suffix because the destination registers are 64-bit)
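The two uses of lea correspond to C expressions like the ones below. This is an illustrative sketch: the function names are invented, and whether a compiler actually chooses lea for them depends on the compiler and optimization level.

```c
#include <assert.h>
#include <stddef.h>

/* Address calculation only: computes where arr[i] lives without
   reading memory (base + index*4 for 4-byte ints). */
int *element_address(int *arr, size_t i)
{
    assert(arr != NULL);
    return &arr[i];
}

/* Arithmetic of the form x + k*y plus a constant, with x = y and k = 4:
   7 + x + x*4, the same shape as 7(%rdx, %rdx, 4). */
long five_x_plus_7(long x)
{
    return 7 + x + x * 4;
}
```

For example, five_x_plus_7(3) is 22, and element_address(arr, 3) points 3 * sizeof(int) bytes past the start of arr.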