Lecture 2: Processor Design, Single-Processor Performance G63.2011.002/G22.2945.001 · September 14, 2010 Intro Basics Assembly Memory Pipelines
Outline
• Intro
• The Basic Subsystems
• Machine Language
• The Memory Hierarchy
• Pipelines
Admin Bits
• Lec. 1 slides posted
• New here? Welcome! Please send in survey info (see lec. 1 slides) via email
• PASI
• Please subscribe to mailing list
• Near end of class: 5-min, 3-question ‘concept check’
Introduction
Goal for Today
High Performance Computing: Discuss the actual computer end of this... and its influence on performance
What’s in a computer?
Processor: Intel Q6600 Core2 Quad, 2.4 GHz
Die: (2×) 143 mm², 2 × 2 cores, 582,000,000 transistors, ∼100 W
What’s in a computer?
Memory
A Basic Processor
[Block diagram, loosely based on Intel 8086: Memory Interface (Address Bus, Data Bus), Address ALU, Register File, Flags, Insn. fetch, PC, Data ALU, and Control Unit, connected by the Internal Bus]
Bonus Question: What’s a bus?
How all of this fits together
Everything synchronizes to the Clock.
Control Unit (“CU”): The brains of the operation. Everything connects to it.
Bus entries/exits are gated and (potentially) buffered.
CU controls gates, tells other units about ‘what’ and ‘how’:
• What operation?
• Which register?
• Which addressing mode?
What is... an ALU?
Arithmetic Logic Unit
One or two operands A, B, one result R
Operation selector (Op):
• (Integer) Addition, Subtraction
• (Logical) And, Or, Not
• (Bitwise) Shifts (a left shift is equivalent to multiplication by a power of two)
• (Integer) Multiplication, Division
Specialized ALUs:
• Floating Point Unit (FPU)
• Address ALU
Operates on binary representations of numbers. Negative numbers represented by two’s complement.
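The operand/selector/result picture above can be sketched in C. This is a toy model, not a description of any real ALU: the 16-bit width, the `alu_op` names, and the `alu` and `negate` functions are all assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical operation selector (Op); the names are illustrative only. */
typedef enum { OP_ADD, OP_SUB, OP_AND, OP_OR, OP_NOT, OP_SHL } alu_op;

/* Toy 16-bit ALU: operands a and b, selector op, result returned.
   Results wrap around modulo 2^16, as in fixed-width hardware. */
uint16_t alu(alu_op op, uint16_t a, uint16_t b)
{
    switch (op) {
    case OP_ADD: return (uint16_t)(a + b);
    case OP_SUB: return (uint16_t)(a - b);
    case OP_AND: return (uint16_t)(a & b);
    case OP_OR:  return (uint16_t)(a | b);
    case OP_NOT: return (uint16_t)~a;              /* one operand only */
    case OP_SHL: return (uint16_t)(a << (b & 15)); /* shift left by b bits */
    }
    return 0;
}

/* Two's complement negation: invert all bits, then add one. */
uint16_t negate(uint16_t a)
{
    return alu(OP_ADD, alu(OP_NOT, a, 0), 1);
}
```

With these definitions, `alu(OP_SHL, 5, 3)` gives 40 (that is, 5 × 2³), `negate(5)` gives the bit pattern `0xFFFB`, and adding 5 to that pattern wraps around to 0, which is why the same adder can serve for both signed and unsigned numbers.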
What is... a Register File?
Registers are On-Chip Memory
• Directly usable as operands in Machine Language
• Often “general-purpose”
• Sometimes special-purpose: Floating point, Indexing, Accumulator
• Small: x86-64: 16 × 64 bit GPRs
• Very fast (near-zero latency)
[Diagram: register file with registers %r0 through %r7]
How does computer memory work?
One (reading) memory transaction (simplified):
[Diagram: Processor and Memory connected by data lines D0..15, address lines A0..15, a R/W̄ line, and CLK]
Observation: Access (and addressing) happens in bus-width-size “chunks”.
What is... a Memory Interface?
Memory Interface gets and stores binary words in off-chip memory.
Smallest granularity: Bus width
Tells outside memory
• “where” through address bus
• “what” through data bus
Computer main memory is “Dynamic RAM” (DRAM): Slow, but small and cheap.
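The read transaction and the “bus-width chunks” observation can be mimicked in a few lines of C. This is only a sketch under stated assumptions: the memory is modeled as word-addressed (one 16-bit word per address), the low word is stored first, and the names `bus_read`, `bus_write`, `read32`, and `transactions` are made up for this example.

```c
#include <stdint.h>

#define WORDS 65536u            /* 16 address lines A0..15: 2^16 addresses */

uint16_t memory[WORDS];         /* toy memory: one 16-bit word per address (assumption) */
unsigned transactions = 0;      /* counts how often the bus is used */

/* One read transaction: address goes out on A0..15, one word comes back on D0..15. */
uint16_t bus_read(uint16_t address)
{
    transactions++;
    return memory[address];
}

/* One write transaction: address on A0..15, data on D0..15. */
void bus_write(uint16_t address, uint16_t data)
{
    transactions++;
    memory[address] = data;
}

/* A 32-bit value is wider than the bus, so fetching it costs two transactions.
   Storing the low word at the lower address is an assumption of this sketch. */
uint32_t read32(uint16_t address)
{
    uint32_t lo = bus_read(address);
    uint32_t hi = bus_read((uint16_t)(address + 1));
    return lo | (hi << 16);
}
```

After writing `0x5678` to address `0x10` and `0x1234` to address `0x11`, `read32(0x10)` returns `0x12345678` and increments `transactions` by two: nothing smaller or larger than one bus word moves per transaction.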
A Very Simple Program

    int a = 5;
    int b = 17;
    int z = a * b;

     4: c7 45 f4 05 00 00 00    movl   $0x5,-0xc(%rbp)
     b: c7 45 f8 11 00 00 00    movl   $0x11,-0x8(%rbp)
    12: 8b 45 f4                mov    -0xc(%rbp),%eax
    15: 0f af 45 f8             imul   -0x8(%rbp),%eax
    19: 89 45 fc                mov    %eax,-0x4(%rbp)
    1c: 8b 45 fc                mov    -0x4(%rbp),%eax

Things to know:
• Addressing modes (Immediate, Register, Base plus Offset)
• 0xHexadecimal
• “AT&T Form”: (we’ll use this) <opcode><size> <source>, <dest>
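To connect the raw bytes with the operands in the listing above, here is a small C sketch that decodes the offset and the immediate of the first instruction, `c7 45 f4 05 00 00 00`. The helper names `decode_disp8` and `decode_imm32` are made up for this example, and the sketch covers only these two fields, not the opcode or the rest of the encoding.

```c
#include <stdint.h>

/* Interpret one byte as a signed 8-bit offset (two's complement). */
int decode_disp8(uint8_t byte)
{
    return byte >= 0x80 ? (int)byte - 256 : (int)byte;
}

/* Assemble a 32-bit immediate from four bytes stored least-significant first. */
uint32_t decode_imm32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

For the first instruction, `decode_disp8(0xf4)` gives -12, which is -0xc, and `decode_imm32` applied to the last four bytes gives 5, matching `movl $0x5,-0xc(%rbp)`. The second instruction works the same way: `0xf8` is -8 and `11 00 00 00` is 0x11, or 17.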
Another Look
[The listing from “A Very Simple Program”, shown next to the processor block diagram: Memory Interface (Address Bus, Data Bus), Address ALU, Register File, Flags, Insn. fetch, PC, Data ALU, Control Unit, Internal Bus]
A Very Simple Program: Intel Form

     4: c7 45 f4 05 00 00 00    mov    DWORD PTR [rbp-0xc],0x5
     b: c7 45 f8 11 00 00 00    mov    DWORD PTR [rbp-0x8],0x11
    12: 8b 45 f4                mov    eax,DWORD PTR [rbp-0xc]
    15: 0f af 45 f8             imul   eax,DWORD PTR [rbp-0x8]
    19: 89 45 fc                mov    DWORD PTR [rbp-0x4],eax
    1c: 8b 45 fc                mov    eax,DWORD PTR [rbp-0x4]

• “Intel Form”: (you might see this on the net) <opcode> <sized dest>, <sized source>
• Goal: Reading comprehension.
• Don’t understand an opcode? Google “<opcode> intel instruction”.
Machine Language Loops

    int main()
    {
      int y = 0, i;
      for (i = 0; y < 10; ++i)
        y += i;
      return y;
    }

     0: 55                      push   %rbp
     1: 48 89 e5                mov    %rsp,%rbp
     4: c7 45 f8 00 00 00 00    movl   $0x0,-0x8(%rbp)
     b: c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
    12: eb 0a                   jmp    1e <main+0x1e>
    14: 8b 45 fc                mov    -0x4(%rbp),%eax
    17: 01 45 f8                add    %eax,-0x8(%rbp)
    1a: 83 45 fc 01             addl   $0x1,-0x4(%rbp)
    1e: 83 7d f8 09             cmpl   $0x9,-0x8(%rbp)
    22: 7e f0                   jle    14 <main+0x14>
    24: 8b 45 f8                mov    -0x8(%rbp),%eax
    27: c9                      leaveq
    28: c3                      retq

Things to know:
• Condition Codes (Flags): Zero, Sign, Carry, etc.
• Call Stack: Stack frame, stack pointer, base pointer
• ABI: Calling conventions

Want to make those yourself? Write myprogram.c.

    $ cc -c myprogram.c
    $ objdump --disassemble myprogram.o
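One way to read the disassembly above is to rewrite the C loop so that its control flow follows the jumps one for one. The sketch below does that with `goto`; the function names `loop_structured` and `loop_like_asm` and the labels `body` and `check` are made up for this example, and the comments show which instructions each line is meant to correspond to.

```c
/* The loop as written in the C source above. */
int loop_structured(void)
{
    int y = 0, i;
    for (i = 0; y < 10; ++i)
        y += i;
    return y;
}

/* The same loop, restructured to follow the jumps in the disassembly. */
int loop_like_asm(void)
{
    int y = 0;          /* movl $0x0,-0x8(%rbp) */
    int i = 0;          /* movl $0x0,-0x4(%rbp) */
    goto check;         /* jmp 1e: test the condition before the first pass */
body:
    y += i;             /* mov -0x4(%rbp),%eax ; add %eax,-0x8(%rbp) */
    i += 1;             /* addl $0x1,-0x4(%rbp) */
check:
    if (y <= 9)         /* cmpl $0x9,-0x8(%rbp) ; jle 14 */
        goto body;
    return y;           /* mov -0x8(%rbp),%eax ; leaveq ; retq */
}
```

Both functions return 10. Note that the compiler turned the source condition `y < 10` into the equivalent comparison `y <= 9`, and that the comparison sets the flags which the conditional jump then reads.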
We know how a computer works!
All of this can be built in about 4000 transistors. (e.g. MOS 6502 in Apple II, Commodore 64, Atari 2600)
So what exactly is Intel doing with the other 581,996,000 transistors?
Answer: Make things go faster!