
12/21/2016 Machine-Level Representations: x86 Data Access and Operations


  1. x86 Data Access and Operations

     Machine-Level Representations
       - Prior lectures
           - Data representation
       - This lecture
           - Program representation
           - Encoding is architecture dependent
           - We will focus on the Intel x86-64 (x64) architecture
               - Prior edition used IA32

     Intel x86
       [Figure: 2015 Core i7 Broadwell]
       - Evolutionary design starting in 1978 with the 8086
           - i386 in 1986: first 32-bit Intel CPU (IA32)
           - Pentium 4E in 2004: first 64-bit Intel CPU (x86-64)
               - Adopted from AMD Opteron (2003)
           - Core 2 in 2006: first multi-core Intel CPU
       - New features and instructions added over time
           - Vector operations for multimedia
           - Memory protection for security
           - Conditional data movement instructions for performance
           - Expanded address space for scaling
           - But, many obsolete features
       - Complex Instruction Set Computer (CISC)
           - Many different instructions with many different formats
           - But we’ll only look at a small subset

  2. How do you program it?
       - Initially, no compilers or assemblers
       - Machine code generated by hand!
           - Error-prone
           - Time-consuming
           - Hard to read and write
           - Hard to debug

     Assemblers
       - Assign mnemonics to machine code
           - Assembly language for specifying machine instructions
           - Names for the machine instructions and registers
               - movq %rax, %rcx
           - There is no standard for x86 assemblers
               - Intel assembly language
               - AT&T Unix assembler
               - Microsoft assembler
               - GNU uses Unix style with its assembler gas
       - Even with the advent of compilers, assembly is still used
           - Early compilers made big, slow code
           - Operating systems were written mostly in assembly, into the 1980s
           - Accessing new hardware features before the compiler has a chance to incorporate them

     Assembly Programmer’s View
       [Figure: CPU (RIP/PC, registers, condition codes) connected to memory (object code, program data, OS data, stack) by address, data, and instruction paths]
       - Visible state to the assembly program
           - RIP
               - Instruction Pointer or Program Counter
               - Address of next instruction
           - Register file
               - Heavily used program data
           - Condition codes
               - Store status information about the most recent arithmetic or logical operation
               - Used for conditional branching
           - Memory
               - Byte-addressable array
               - Code, user data, OS data
               - Includes stack used to support procedures

     Then, via C
       C source:

           void sumstore(long x, long y, long *D)
           {
               long t = plus(x, y);
               *D = t;
           }

       Generated assembly:

           sumstore:
               pushq %rbx
               movq  %rdx, %rbx
               call  plus
               movq  %rax, (%rbx)
               popq  %rbx
               ret
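The sumstore slide can be turned into something you can compile and run. The sketch below is a minimal, assumption-laden version: the slides never show the body of plus, so the one here (a plain addition) is a guess, and sumstore_demo is a hypothetical driver added only for illustration.

```c
#include <assert.h>

/* Assumed body: the slides call plus() but do not define it. */
long plus(long x, long y)
{
    return x + y;
}

/* As on the slide: compute plus(x, y) and store the result through D. */
void sumstore(long x, long y, long *D)
{
    long t = plus(x, y);
    *D = t;
}

/* Hypothetical driver: returns the value sumstore wrote to memory. */
long sumstore_demo(long x, long y)
{
    long dest = 0;
    sumstore(x, y, &dest);
    assert(dest == plus(x, y));   /* the store went through the pointer */
    return dest;
}
```

Calling sumstore_demo(2, 3) should return 5. Compiling this file with gcc -S writes the assembly to a .s file, which can be compared with the listing on the slide; the exact instructions depend on the compiler version and optimization level.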

  3. Registers
       - Special memory, not part of main memory
           - Located on CPU
           - Used to store temporary values
           - Typically, data is loaded into registers, manipulated or used, and then written back to memory

     64-bit memory map
       - 48-bit canonical addresses to make page tables smaller
       - Kernel addresses have the high bit set

           0xffffffffffffffff   reserved for kernel (code, data, heap, stack);
           0xffff800000000000   memory invisible to user code
           0x7ffe96110000       user stack (created at runtime)      <- %rsp (stack pointer)
           0x7f81bb0b5000       memory-mapped region for shared libraries
                                run-time heap (managed by malloc)    <- brk
                                read/write segment (.data, .bss)            \ loaded from the
           0x00400000           read-only segment (.init, .text, .rodata)   / executable file
           0                    unused

       - cat /proc/self/maps

     x86-64 Integer Registers

           64-bit   low 32 bits      64-bit   low 32 bits
           %rax     %eax             %r8      %r8d
           %rbx     %ebx             %r9      %r9d
           %rcx     %ecx             %r10     %r10d
           %rdx     %edx             %r11     %r11d
           %rsi     %esi             %r12     %r12d
           %rdi     %edi             %r13     %r13d
           %rsp     %esp             %r14     %r14d
           %rbp     %ebp             %r15     %r15d

       - Format is different for %r8 to %r15 since these registers were added with x86-64

     64-bit registers
       - Multiple access sizes: %rax, %rbx, %rcx, %rdx
           - %ah, %al : low-order bytes (8 bits each; %al is bits 7..0, %ah is bits 15..8)
           - %ax : low word (16 bits)
           - %eax : low “double word” (32 bits)
           - %rax : quad word (64 bits)

           bit   63             31          15        7        0
                 |<------------------- %rax ------------------>|
                                 |<----------- %eax ---------->|
                                             |<----- %ax ----->|
                                             |<- %ah ->|<- %al ->|

       - Similar access for %rdi, %rsi, %rbp, %rsp
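The sub-register names above are just narrower views of the same 64 bits. As a rough illustration, the sketch below models reading each view with casts and shifts. The view_* helpers and the sample value are hypothetical, and the model covers reads only; it does not capture what happens to the remaining bits when a sub-register is written.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: each returns the bits that the named
   sub-register would expose for a given 64-bit %rax value. */
uint32_t view_eax(uint64_t rax) { return (uint32_t)rax; }        /* bits 31..0 */
uint16_t view_ax(uint64_t rax)  { return (uint16_t)rax; }        /* bits 15..0 */
uint8_t  view_al(uint64_t rax)  { return (uint8_t)rax; }         /* bits 7..0  */
uint8_t  view_ah(uint64_t rax)  { return (uint8_t)(rax >> 8); }  /* bits 15..8 */

/* Self-check with an arbitrary example value. */
void check_views(void)
{
    uint64_t rax = 0x1122334455667788ULL;
    assert(view_eax(rax) == 0x55667788u);
    assert(view_ax(rax)  == 0x7788u);
    assert(view_al(rax)  == 0x88u);
    assert(view_ah(rax)  == 0x77u);
}
```

The same masks would describe %r8d, %r8w, and %r8b for the newer registers, which have no counterpart to %ah.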

  4. 64-bit registers
       - Multiple access sizes: %r8, %r9, …, %r15
           - %r8b : low-order byte (8 bits)
           - %r8w : low word (16 bits)
           - %r8d : low “double word” (32 bits)
           - %r8 : quad word (64 bits)

           bit   63             31          15        7        0
                 |<------------------- %r8 ------------------->|
                                 |<----------- %r8d ---------->|
                                             |<----- %r8w ---->|
                                                       |<- %r8b ->|

     Register evolution
       - The x86 architecture was initially “register poor”
           - Few general-purpose registers (8 in IA32)
               - Initially, driven by the fact that transistors were expensive
               - Then, driven by the need for backwards compatibility for certain instructions: pusha (push all) and popa (pop all) from the 80186
       - Other reasons
           - Makes context-switching amongst processes easy (less register state to store)
           - Add fast caches instead of more registers (L1, L2, L3 etc.)

     Instruction types
       - A typical instruction acts on 2 or more operands of a particular width
           - addq %rcx, %rdx adds the contents of rcx to rdx
           - “addq” stands for add “quad word”
           - Size of the operand is denoted in the instruction
           - Why “quad word” for 64-bit registers?
               - Baggage from 16-bit processors
       - Now we have these crazy terms
           - 8 bits = byte = addb
           - 16 bits = word = addw
           - 32 bits = double or long word = addl
           - 64 bits = quad word = addq

     C types and x86-64 instructions

           C data type    Intel x86-64 type     GAS suffix   x86-64 size (bytes)
           char           byte                  b            1
           short          word                  w            2
           int            double word           l            4
           long           quad word             q            8
           float          single precision      s            4
           double         double precision      l            8
           long double    extended precision    t            10/16
           pointer        quad word             q            8
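The integer rows of the table can be checked from C. The sketch below is illustrative only: gas_int_suffix and add_byte are hypothetical helpers, and fixed-width types from stdint.h stand in for char, short, int, and long so the sizes do not depend on the platform's data model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: GAS suffix for an integer operand of the given
   size in bytes, following the table; '?' if no suffix applies. */
char gas_int_suffix(size_t bytes)
{
    switch (bytes) {
    case 1:  return 'b';   /* byte        */
    case 2:  return 'w';   /* word        */
    case 4:  return 'l';   /* double word */
    case 8:  return 'q';   /* quad word   */
    default: return '?';
    }
}

/* Models the width of addb: the sum is truncated to 8 bits. */
uint8_t add_byte(uint8_t dst, uint8_t src)
{
    return (uint8_t)(dst + src);
}

/* Self-check against the integer rows of the table. */
void check_suffixes(void)
{
    assert(gas_int_suffix(sizeof(int8_t))  == 'b');
    assert(gas_int_suffix(sizeof(int16_t)) == 'w');
    assert(gas_int_suffix(sizeof(int32_t)) == 'l');
    assert(gas_int_suffix(sizeof(int64_t)) == 'q');
    assert(add_byte(0xFF, 1) == 0);   /* wraps at 8 bits */
}
```

The floating-point rows are left out on purpose: their suffixes belong to a different group of instructions than add and mov.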

  5. Instruction operands
       [Figure: register file listing %rax, %rcx, %rdx, %rbx, %rsi, %rdi, %rsp, %rbp, %rN]
       - Example instruction: movq Source, Dest
       - Three operand types
           - Immediate
               - Constant integer data
               - Like a C constant, but preceded by $
               - e.g., $0x400, $-533
               - Encoded directly into instructions
           - Register: one of 16 integer registers
               - Example: %rax, %r13
               - Note %rsp is reserved for special use
           - Memory: a memory address
               - There are many modes for addressing memory
               - Simplest example: (%rax)

     Operand examples using movq

           Source   Destination   Example              C analog
           Imm      Reg           movq $0x4,%rax       temp = 0x4;
           Imm      Mem           movq $-147,(%rax)    *p = -147;
           Reg      Reg           movq %rax,%rdx       temp2 = temp1;
           Reg      Mem           movq %rax,(%rdx)     *p = temp;
           Mem      Reg           movq (%rax),%rdx     temp = *p;

       - Memory-memory transfers cannot be done with a single instruction

     Immediate mode
       - Immediate has only one mode
           - Form: $Imm
           - Operand value: Imm
               - movq $0x8000,%rax
               - movq $array,%rax
               - int array[30]; /* array = global var. stored at 0x8000 */
       [Figure: register file (%rax, %rcx, %rdx) beside main memory, with array at 0x8000]

     Register mode
       - Register has only one mode
           - Form: E_a
           - Operand value: R[E_a]
               - movq %rcx,%rax
       [Figure: register file (%rax, %rcx, %rdx) beside main memory]
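The "C analog" column can be collected into a small function. This is a sketch, not compiler output: the function names are made up, and the instruction in each comment is the slide's example for that operand combination, not a promise about which registers a compiler would pick.

```c
#include <assert.h>

/* Walks through the five movq operand combinations from the table. */
long mov_demo(void)
{
    long mem = 0;     /* one quad word of memory                     */
    long *p = &mem;   /* an address, as a register would hold it     */
    long temp, temp2;

    temp  = 0x4;      /* Imm -> Reg   movq $0x4,%rax      */
    *p    = -147;     /* Imm -> Mem   movq $-147,(%rax)   */
    temp2 = temp;     /* Reg -> Reg   movq %rax,%rdx      */
    *p    = temp2;    /* Reg -> Mem   movq %rax,(%rdx)    */
    temp  = *p;       /* Mem -> Reg   movq (%rax),%rdx    */
    return temp;
}

/* Memory-to-memory needs two steps: load into a register-like
   temporary, then store it. */
void copy_quad(long *dst, const long *src)
{
    long tmp = *src;  /* Mem -> Reg */
    *dst = tmp;       /* Reg -> Mem */
}

/* Self-check. */
void check_mov_demo(void)
{
    long a = 7, b = 0;
    assert(mov_demo() == 4);
    copy_quad(&b, &a);
    assert(b == 7);
}
```

mov_demo returns 4: the -147 written by the second step is overwritten by the register-to-memory store before the final load.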

  6. Memory modes
       - Memory has multiple modes
           - Absolute
               - Specify the address of the data
           - Indirect
               - Use a register to calculate the address
           - Base + displacement
               - Use a register plus an absolute address to calculate the address
           - Indexed
               - Indexed: add contents of an index register
               - Scaled index: add contents of an index register scaled by a constant

     Memory modes
       - Memory mode: Absolute
           - Form: Imm
           - Operand value: M[Imm]
               - movq 0x8000,%rax
               - movq array,%rax
               - long array[30]; /* global variable at 0x8000 */
       [Figure: register file (%rax, %rcx, %rdx) beside main memory, with array at 0x8000]

     Memory modes
       - Memory mode: Indirect
           - Form: (E_a)
           - Operand value: M[R[E_a]]
               - Register E_a specifies the memory address
               - movq (%rcx),%rax
       [Figure: %rcx holds 0x8000 and points at address 0x8000 in main memory]

     Memory modes
       - Memory mode: Base + Displacement
           - Form: Imm(E_b)
           - Operand value: M[Imm+R[E_b]]
               - Register E_b specifies the start of the memory region
               - Imm specifies the offset/displacement
               - movq 16(%rcx),%rax
       [Figure: %rcx holds 0x8000; main memory shows quad words at 0x8000, 0x8008, 0x8010, 0x8018]
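The operand-value formulas M[Imm], M[R[E_a]], and M[Imm+R[E_b]] can be tried out on a toy memory. In the sketch below everything except the formulas and the 0x8000 address is an assumption: the four stored values are invented, and mem_read and the operand_* functions are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define ARRAY_ADDR 0x8000u   /* simulated address of array, as on the slides */

/* Four quad words pretending to live at 0x8000, 0x8008, 0x8010, 0x8018.
   The stored values are invented for this example. */
static const uint64_t array[4] = {10, 20, 30, 40};

/* M[addr]: read the quad word at a simulated address. */
uint64_t mem_read(uint64_t addr)
{
    assert(addr >= ARRAY_ADDR);
    assert(addr < ARRAY_ADDR + sizeof array);
    assert((addr - ARRAY_ADDR) % 8 == 0);
    return array[(addr - ARRAY_ADDR) / 8];
}

/* Absolute: M[Imm], e.g. movq 0x8000,%rax */
uint64_t operand_absolute(uint64_t imm)
{
    return mem_read(imm);
}

/* Indirect: M[R[E_a]], e.g. movq (%rcx),%rax */
uint64_t operand_indirect(uint64_t r_ea)
{
    return mem_read(r_ea);
}

/* Base + displacement: M[Imm + R[E_b]], e.g. movq 16(%rcx),%rax */
uint64_t operand_base_disp(uint64_t imm, uint64_t r_eb)
{
    return mem_read(imm + r_eb);
}
```

With %rcx modelled as 0x8000, operand_indirect(0x8000) reads the first element (10) and operand_base_disp(16, 0x8000) reads the quad word at 0x8010 (30).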

  7. Memory modes
       - Memory mode: Scaled indexed
           - Most general format
           - Used for accessing structures and arrays in memory
           - Form: Imm(E_b,E_i,S)
           - Operand value: M[Imm+R[E_b]+S*R[E_i]]
               - Register E_b specifies the start of the memory region
               - E_i holds the index
               - S is an integer scale (1, 2, 4, 8)
               - movq 8(%rdx,%rcx,8),%rax
       [Figure: %rcx holds 0x03, %rdx holds 0x8000; main memory shows quad words at 0x8000 through 0x8028]

     Addressing Mode Examples

           addl 12(%rbp),%ecx         Add the double word at address rbp + 12 to ecx
           movb (%rax,%rcx),%dl       Load the byte at address rax + rcx into dl
           subq %rdx,(%rcx,%rax,8)    Subtract rdx from the quad word at address rcx+(8*rax)
           incw 0xA(,%rcx,8)          Increment the word at address 0xA+(8*rcx)

       - Also note: we do not put ‘$’ in front of constants when they are addressing indexes, only when they are literals

     Address computation examples (Carnegie Mellon)

           Register   Value
           %rdx       0xf000
           %rcx       0x0100

           Expression       Address computation   Address
           0x8(%rdx)        0xf000 + 0x8          0xf008
           (%rdx,%rcx)      0xf000 + 0x100        0xf100
           (%rdx,%rcx,4)    0xf000 + 4*0x100      0xf400
           0x80(,%rdx,2)    2*0xf000 + 0x80       0x1e080

     Practice Problem 3.1

           Register   Value          Address   Value
           %rax       0x100          0x100     0xFF
           %rcx       0x1            0x108     0xAB
           %rdx       0x3            0x110     0x13
                                     0x118     0x11

           Operand             Value
           %rax                0x100
           0x108               0xAB
           $0x108              0x108
           (%rax)              0xFF
           8(%rax)             0xAB
           13(%rax,%rdx)       0x13
           260(%rcx,%rdx)      0xAB
           0xF8(,%rcx,8)       0xFF
           (%rax,%rdx,8)       0x11
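The address arithmetic in both tables follows one formula, Imm + R[E_b] + S*R[E_i]. The sketch below writes it as a function so the rows can be checked mechanically. The name effective_address is hypothetical, and a missing base or index register is modelled by passing 0.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address for the form Imm(E_b,E_i,S).
   Pass base = 0 or index = 0 when that register is omitted. */
uint64_t effective_address(int64_t imm, uint64_t base,
                           uint64_t index, unsigned scale)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return (uint64_t)imm + base + (uint64_t)scale * index;
}

/* Self-check against the address computation table
   (%rdx = 0xf000, %rcx = 0x0100). */
void check_addresses(void)
{
    uint64_t rdx = 0xf000, rcx = 0x0100;
    assert(effective_address(0x8,  rdx, 0,   1) == 0xf008);   /* 0x8(%rdx)      */
    assert(effective_address(0,    rdx, rcx, 1) == 0xf100);   /* (%rdx,%rcx)    */
    assert(effective_address(0,    rdx, rcx, 4) == 0xf400);   /* (%rdx,%rcx,4)  */
    assert(effective_address(0x80, 0,   rdx, 2) == 0x1e080);  /* 0x80(,%rdx,2)  */
}
```

The same function reproduces the practice problem's addresses: with %rax = 0x100, %rcx = 0x1, %rdx = 0x3, the operand 13(%rax,%rdx) resolves to 0x110 and 260(%rcx,%rdx) to 0x108, whose contents in the memory table are 0x13 and 0xAB.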
