
The Big Picture: The Performance Perspective Computer System - PowerPoint PPT Presentation

Computer System Architecture, Processor Part I, by Chalermek Intanagonwiwat. The Big Picture: The Performance Perspective. Performance of a machine is determined by instruction count, clock cycle time, and CPI.


  1. Computer System Architecture
     Processor Part I
     Chalermek Intanagonwiwat
     Slides courtesy of John Hennessy and David Patterson

     The Big Picture: The Performance Perspective
     • Performance of a machine is determined by:
       – Instruction count
       – Clock cycle time
       – Clock cycles per instruction
     [Figure: diagram relating CPI, Inst. Count, and Cycle Time]

     The Big Picture (cont.)
     • Processor design (datapath and control) will determine:
       – Clock cycle time
       – Clock cycles per instruction
     • Today:
       – Single cycle processor:
         • Advantage: One clock cycle per instruction
         • Disadvantage: long cycle time

     How to Design a Processor: step-by-step
     1. Analyze instruction set => datapath requirements
       – the meaning of each instruction is given by the Register Transfer Language (RTL)
       – datapath must include storage element for ISA registers
         • possibly more
       – datapath must support each register transfer
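The three performance factors listed above multiply together to give execution time. The following is a minimal sketch of that relationship; the function name `cpu_time` and all of the numbers are assumptions chosen for illustration, not figures from the slides.

```python
def cpu_time(instruction_count, cpi, cycle_time_s):
    """Execution time in seconds: instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_s

# Hypothetical designs running the same one-million-instruction program.
# Design A: one clock cycle per instruction, but a long cycle.
long_cycle_design = cpu_time(1_000_000, cpi=1.0, cycle_time_s=8e-9)
# Design B: a shorter cycle, but more cycles per instruction.
short_cycle_design = cpu_time(1_000_000, cpi=4.0, cycle_time_s=2e-9)

print(f"long cycle, CPI 1:  {long_cycle_design * 1e3:.3f} ms")
print(f"short cycle, CPI 4: {short_cycle_design * 1e3:.3f} ms")
```

With these made-up numbers the two designs tie, which illustrates why a CPI of 1 alone says little about performance: the long cycle time can cancel the gain.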

  2. How to Design a Processor (cont.)
     2. Select set of datapath components and establish clocking methodology
     3. Assemble datapath meeting the requirements
     4. Analyze implementation of each instruction to determine setting of control points that effect the register transfer
     5. Assemble the control logic

     The MIPS Instruction Formats
     • All MIPS instructions are 32 bits long. The three instruction formats:
       – R-type: op (6 bits, 31-26) | rs (5 bits, 25-21) | rt (5 bits, 20-16) | rd (5 bits, 15-11) | shamt (5 bits, 10-6) | funct (6 bits, 5-0)
       – I-type: op (6 bits, 31-26) | rs (5 bits, 25-21) | rt (5 bits, 20-16) | immediate (16 bits, 15-0)
       – J-type: op (6 bits, 31-26) | target address (26 bits, 25-0)

     The MIPS Instruction Formats (cont.)
     • The different fields are:
       – op: operation of the instruction
       – rs, rt, rd: the source and destination register specifiers
       – shamt: shift amount
       – funct: selects the variant of the operation in the “op” field
       – address / immediate: address offset or immediate value
       – target address: target address of the jump instruction

     Step 1a: The MIPS-lite Subset
     • ADD and SUB
       – addU rd, rs, rt
       – subU rd, rs, rt
       Format: op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)
     • OR Immediate:
       – ori rt, rs, imm16
       Format: op (6 bits) | rs (5 bits) | rt (5 bits) | immediate (16 bits)
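The field layout of the three formats can be checked by slicing a 32-bit word with shifts and masks. This is a small sketch; the helper name `decode_fields` is hypothetical, and the two example words use the standard MIPS encodings for `addu` (funct 0x21) and `ori` (op 0x0D), which the slides themselves do not list.

```python
def decode_fields(word):
    """Split a 32-bit MIPS instruction word into every field of every format.

    The caller picks the fields that apply: R-type uses rd/shamt/funct,
    I-type uses imm16, J-type uses target.
    """
    return {
        "op":     (word >> 26) & 0x3F,   # bits 31-26
        "rs":     (word >> 21) & 0x1F,   # bits 25-21
        "rt":     (word >> 16) & 0x1F,   # bits 20-16
        "rd":     (word >> 11) & 0x1F,   # bits 15-11
        "shamt":  (word >> 6) & 0x1F,    # bits 10-6
        "funct":  word & 0x3F,           # bits 5-0
        "imm16":  word & 0xFFFF,         # bits 15-0
        "target": word & 0x3FFFFFF,      # bits 25-0
    }

addu_word = 0x00221821   # addu $3, $1, $2
ori_word = 0x342200FF    # ori  $2, $1, 0xFF

r = decode_fields(addu_word)
i = decode_fields(ori_word)
print(r["op"], r["rs"], r["rt"], r["rd"], hex(r["funct"]))
print(hex(i["op"]), i["rs"], i["rt"], hex(i["imm16"]))
```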

  3. Step 1a: The MIPS-lite Subset (cont.)
     • LOAD and STORE Word
       – lw rt, rs, imm16
       – sw rt, rs, imm16
       Format: op (6 bits) | rs (5 bits) | rt (5 bits) | immediate (16 bits)
     • BRANCH:
       – beq rs, rt, imm16
       Format: op (6 bits) | rs (5 bits) | rt (5 bits) | immediate (16 bits)

     Logical Register Transfers
     • RTL gives the meaning of the instructions
     • All start by fetching the instruction

     Logical Register Transfers (cont.)
       op | rs | rt | rd | shamt | funct = MEM[ PC ]
       op | rs | rt | Imm16 = MEM[ PC ]

       inst     Register Transfers
       ADDU     R[rd] <- R[rs] + R[rt]; PC <- PC + 4
       SUBU     R[rd] <- R[rs] – R[rt]; PC <- PC + 4
       ORi      R[rt] <- R[rs] | zero_ext(Imm16); PC <- PC + 4
       LOAD     R[rt] <- MEM[ R[rs] + sign_ext(Imm16) ]; PC <- PC + 4
       STORE    MEM[ R[rs] + sign_ext(Imm16) ] <- R[rt]; PC <- PC + 4
       BEQ      if ( R[rs] == R[rt] ) then PC <- PC + ( sign_ext(Imm16) || 00 ) else PC <- PC + 4

     Step 1: Requirements of the Instruction Set
     • Memory
       – instruction & data
     • Registers (32 x 32)
       – read RS
       – read RT
       – Write RT or RD
     • PC
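The register transfers above translate almost line for line into a one-instruction interpreter. The sketch below is only an illustration under several assumptions: the opcode and funct values are the standard MIPS encodings (not given in the slides), the helpers `r_type`, `i_type`, and `step` are hypothetical names, memory is a dictionary keyed by byte address holding whole words, and register 0 is not forced to zero. The branch follows the RTL as written on the slide; real MIPS computes the target relative to PC + 4.

```python
MASK32 = 0xFFFFFFFF

def sign_ext(imm16):
    """Interpret a 16-bit field as a signed integer."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def r_type(rs, rt, rd, funct):
    return (rs << 21) | (rt << 16) | (rd << 11) | funct

def i_type(op, rs, rt, imm16):
    return (op << 26) | (rs << 21) | (rt << 16) | (imm16 & 0xFFFF)

def step(pc, regs, mem):
    """Fetch MEM[PC], apply its register transfer, and return the next PC."""
    inst = mem[pc]
    op = inst >> 26
    rs, rt, rd = (inst >> 21) & 31, (inst >> 16) & 31, (inst >> 11) & 31
    funct, imm16 = inst & 0x3F, inst & 0xFFFF
    if op == 0 and funct == 0x21:        # ADDU
        regs[rd] = (regs[rs] + regs[rt]) & MASK32
    elif op == 0 and funct == 0x23:      # SUBU
        regs[rd] = (regs[rs] - regs[rt]) & MASK32
    elif op == 0x0D:                     # ORi (zero-extended immediate)
        regs[rt] = regs[rs] | imm16
    elif op == 0x23:                     # LOAD
        regs[rt] = mem[(regs[rs] + sign_ext(imm16)) & MASK32]
    elif op == 0x2B:                     # STORE
        mem[(regs[rs] + sign_ext(imm16)) & MASK32] = regs[rt]
    elif op == 0x04:                     # BEQ, as written in the slide's RTL
        if regs[rs] == regs[rt]:
            return (pc + (sign_ext(imm16) << 2)) & MASK32
    else:
        raise ValueError(f"instruction {inst:#010x} is outside the MIPS-lite subset")
    return (pc + 4) & MASK32

# Tiny program: $1 = 5; $2 = 7; $3 = $1 + $2; MEM[100] = $3
mem = {
    0: i_type(0x0D, 0, 1, 5),
    4: i_type(0x0D, 0, 2, 7),
    8: r_type(1, 2, 3, 0x21),
    12: i_type(0x2B, 0, 3, 100),
}
regs = [0] * 32
pc = 0
for _ in range(4):
    pc = step(pc, regs, mem)
print(regs[3], mem[100], pc)
```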

  4. Step 1: Requirements of the Instruction Set (cont.)
     • Extender
     • Add and Sub register or extended immediate
     • Add 4 or extended immediate to PC

     Step 2: Components of the Datapath
     • Combinational Elements
     • Storage Elements
       – Clocking methodology

     Combinational Logic Elements
     • Adder: 32-bit inputs A and B plus CarryIn; outputs 32-bit Sum and Carry
     • MUX: 32-bit inputs A and B plus Select; output 32-bit Y
     • ALU: 32-bit inputs A and B plus OP; output 32-bit Result

     Storage Element: Register
     • Similar to the D Flip Flop except
       – N-bit input and output
       – Write Enable input
     • Write Enable:
       – negated (0): Data Out will not change
       – asserted (1): Data Out will become Data In
     [Figure: register with Write Enable, N-bit Data In, N-bit Data Out, and Clk]
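The building blocks above can be modeled behaviorally: combinational elements as pure functions, and the register as an object that changes only on a clock edge. This is a rough sketch with assumed conventions: the mux polarity (Select = 0 picks A), the ALU operation names, and the method name `clock_edge` are all illustrative choices that the slides do not specify.

```python
MASK32 = 0xFFFFFFFF

def adder(a, b, carry_in=0):
    """32-bit adder: returns (Sum, Carry)."""
    total = a + b + carry_in
    return total & MASK32, total >> 32

def mux(select, a, b):
    """2-to-1 multiplexer. Assumed polarity: Select = 0 gives A, 1 gives B."""
    return b if select else a

def alu(op, a, b):
    """ALU with a hypothetical set of operation names."""
    if op == "add":
        return (a + b) & MASK32
    if op == "sub":
        return (a - b) & MASK32
    if op == "or":
        return a | b
    raise ValueError(f"unknown ALU op {op!r}")

class Register:
    """N-bit register with Write Enable; state changes only on clock_edge."""
    def __init__(self, width=32):
        self.mask = (1 << width) - 1
        self.data_out = 0

    def clock_edge(self, data_in, write_enable):
        if write_enable:                 # asserted: Data Out becomes Data In
            self.data_out = data_in & self.mask
        return self.data_out             # negated: Data Out does not change

reg = Register()
reg.clock_edge(0xABCD, write_enable=0)
print(hex(reg.data_out))   # still 0x0
reg.clock_edge(0xABCD, write_enable=1)
print(hex(reg.data_out))   # now 0xabcd
print(adder(0xFFFFFFFF, 1))
```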

  5. Register File
     • Register File consists of 32 registers:
       – Two 32-bit output busses: busA and busB
       – One 32-bit input bus: busW
     [Figure: block of 32 32-bit registers with 5-bit RW, RA, and RB inputs, Write Enable, Clk, 32-bit busW input, and 32-bit busA and busB outputs]

     Register File (cont.)
     • Register is selected by:
       – RA (number) selects the register to put on busA (data)
       – RB (number) selects the register to put on busB (data)
       – RW (number) selects the register to be written via busW (data) when Write Enable is 1

     Register File (cont.)
     • Clock input (CLK)
       – The CLK input is a factor ONLY during write operation
       – During read operation, behaves as a combinational logic block:
         • RA or RB valid => busA or busB valid after “access time.”

     Register File (cont.)
     • Built using D flip-flops
     [Figure: register file with inputs Read register number 1, Read register number 2, Write register, Write data, and Write, and outputs Read data 1 and Read data 2]
     [Figure: read ports, where each read register number drives a Mux over Register 0 through Register n to produce Read data 1 and Read data 2]
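A behavioral model makes the split between combinational reads and clocked writes concrete. The class below is a minimal sketch; the names `RegisterFile`, `read`, and `clock_edge` are hypothetical, and the "access time" delay is not modeled.

```python
class RegisterFile:
    """32 registers of 32 bits with two read ports and one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, ra, rb):
        """Combinational read: RA and RB select what appears on busA and busB."""
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw, bus_w, write_enable):
        """Clocked write: RW selects the register written from busW,
        but only when Write Enable is 1."""
        if write_enable:
            self.regs[rw] = bus_w & 0xFFFFFFFF

rf = RegisterFile()
rf.clock_edge(rw=5, bus_w=123, write_enable=1)
rf.clock_edge(rw=6, bus_w=456, write_enable=0)   # ignored: Write Enable is 0
bus_a, bus_b = rf.read(ra=5, rb=6)
print(bus_a, bus_b)
```

Reads need no clock call at all, which mirrors the slide's point that CLK matters only for the write operation.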

  6. Register File (cont.)
     • Note: we still use the real clock to determine when to write
     [Figure: write port, where Write and an n-to-1 decoder on the Register number drive the C input of Register 0 through Register n, and Register data drives each D input]

     Storage Element: Idealized Memory
     • Memory (idealized)
       – One input bus: Data In
       – One output bus: Data Out
     [Figure: memory with Write Enable, Address, 32-bit Data In, 32-bit DataOut, and Clk]

     Storage Element: Idealized Memory (cont.)
     • Memory word is selected by:
       – Address selects the word to put on Data Out
       – Write Enable = 1: address selects the memory word to be written via the Data In bus

     Storage Element: Idealized Memory (cont.)
     • Clock input (CLK)
       – The CLK input is a factor ONLY during write operation
       – During read operation, behaves as a combinational logic block:
         • Address valid => Data Out valid after “access time.”
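The idealized memory follows the same pattern as the register file: one address selects the word for both reading and writing. This is a brief sketch under assumptions of its own: the class name `IdealMemory` is hypothetical, storage is a dictionary of words, and unwritten locations read as 0, which the slides do not state.

```python
class IdealMemory:
    """Idealized memory with one Data In bus and one Data Out bus."""

    def __init__(self):
        self.words = {}

    def data_out(self, address):
        """Combinational read: Address selects the word put on Data Out.
        Unwritten words read as 0 (an assumption of this sketch)."""
        return self.words.get(address, 0)

    def clock_edge(self, address, data_in, write_enable):
        """Clocked write: with Write Enable = 1, Address selects the word
        written from the Data In bus."""
        if write_enable:
            self.words[address] = data_in & 0xFFFFFFFF

memory = IdealMemory()
memory.clock_edge(address=0x40, data_in=0xDEADBEEF, write_enable=1)
memory.clock_edge(address=0x44, data_in=1, write_enable=0)   # no effect
print(hex(memory.data_out(0x40)), memory.data_out(0x44))
```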

  7. Step 3
     • Register Transfer Requirements –> Datapath Assembly
     • Instruction Fetch
     • Read Operands and Execute Operation

     3a: Overview of the Instruction Fetch Unit
     • The common RTL operations
       – Fetch the Instruction: mem[PC]
       – Update the program counter:
         • Sequential Code: PC <- PC + 4
         • Branch and Jump: PC <- “something else”

     3a: Overview of the Instruction Fetch Unit (cont.)
     [Figure: PC register clocked by Clk, fed by Next Address Logic, supplying Address to the Instruction Memory, which outputs the 32-bit Instruction Word]

     3b: Add & Subtract
     • R[rd] <- R[rs] op R[rt]   Example: addU rd, rs, rt
       – Ra, Rb, and Rw come from instruction’s rs, rt, and rd fields
       – ALUctr and RegWr: control logic after decoding the instruction
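The fetch unit's two jobs, reading mem[PC] and choosing the next PC, can be sketched in a few lines. The names `next_address` and `fetch_cycle`, and the `take_branch`/`target` parameters standing in for "something else", are assumptions made for this illustration; the instruction words are arbitrary placeholders.

```python
def next_address(pc, take_branch=False, target=None):
    """Next Address Logic: PC + 4 for sequential code, otherwise the
    branch or jump target supplied by the rest of the datapath."""
    if take_branch:
        return target
    return (pc + 4) & 0xFFFFFFFF

def fetch_cycle(pc, instruction_memory, take_branch=False, target=None):
    """One fetch: return (Instruction Word, next PC)."""
    instruction = instruction_memory[pc]
    return instruction, next_address(pc, take_branch, target)

# Placeholder instruction words keyed by byte address.
imem = {0: 0x11111111, 4: 0x22222222, 8: 0x33333333}

pc = 0
inst, pc = fetch_cycle(pc, imem)                               # sequential
print(hex(inst), pc)
inst, pc = fetch_cycle(pc, imem, take_branch=True, target=0)   # branch back
print(hex(inst), pc)
```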

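  8. 3b: Add & Subtract (cont.)
     Format: op (6 bits, 31-26) | rs (5 bits, 25-21) | rt (5 bits, 20-16) | rd (5 bits, 15-11) | shamt (5 bits, 10-6) | funct (6 bits, 5-0)
     [Figure: datapath in which Rd, Rs, and Rt drive the 5-bit Rw, Ra, and Rb inputs of the 32 32-bit Registers; busA and busB (32 bits each) feed the ALU, controlled by ALUctr; the ALU Result returns on busW; RegWr and Clk control the write]

     Register-Register Timing
     • Clk: rising edge starts the cycle
     • PC: Old Value -> New Value after Clk-to-Q
     • Rs, Rt, Rd, Op, Func: Old Value -> New Value after Instruction Memory Access Time
     • ALUctr: Old Value -> New Value after Delay through Control Logic
     • RegWr: Old Value -> New Value after Delay through Control Logic
     • busA, B: Old Value -> New Value after Register File Access Time
     • busW: Old Value -> New Value after ALU Delay
     • Register Write Occurs Here (at the next clock edge)

     3c: Logical Operations with Immediate
     • R[rt] <- R[rs] op ZeroExt[imm16]
     Format: op (6 bits, 31-26) | rs (5 bits, 25-21) | rt (5 bits, 20-16) | immediate (16 bits, 15-0)   (rd?)
     Zero-extended immediate: bits 31-16 are sixteen 0s, bits 15-0 are the 16-bit immediate

     3c: Logical Operations with Immediate (cont.)
     [Figure: datapath with a RegDst Mux choosing Rd or Rt as Rw, Rs as Ra, a ZeroExt unit widening imm16 from 16 to 32 bits, an ALUSrc Mux choosing busB or the extended immediate as the second ALU input, and ALUctr, RegWr, and Clk controls]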
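The timing chain above implies a lower bound on the clock cycle for a register-register instruction: each delay adds to the previous one, except that the control logic and the register file both start from the fetched instruction and so overlap. The sketch below assumes exactly that reading of the diagram; the function name `min_cycle_time_ps`, the delay values in picoseconds, and the omission of register setup time and clock skew are all assumptions.

```python
def min_cycle_time_ps(delays):
    """Lower bound on the cycle time for an R-type instruction.

    Chain: Clk-to-Q -> instruction memory -> (control logic in parallel
    with register file access) -> ALU. Setup time and clock skew are
    left out of this sketch.
    """
    after_fetch = delays["clk_to_q"] + delays["imem_access"]
    operands_ready = after_fetch + max(delays["control"], delays["regfile_access"])
    return operands_ready + delays["alu"]

# Hypothetical component delays in picoseconds.
delays = {
    "clk_to_q": 50,
    "imem_access": 400,
    "control": 100,
    "regfile_access": 200,
    "alu": 300,
}

print(min_cycle_time_ps(delays), "ps")
```

With these made-up values the register file, not the control logic, sits on the critical path, so speeding up control alone would not shorten the cycle.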

  9. 3d: Load Operations
     • R[rt] <- Mem[R[rs] + SignExt[imm16]]   Example: lw rt, rs, imm16
     Format: op (6 bits, 31-26) | rs (5 bits, 25-21) | rt (5 bits, 20-16) | immediate (16 bits, 15-0)

     3d: Load Operations (cont.)
     [Figure: datapath with a RegDst Mux choosing Rd or Rt as Rw; the 32 32-bit Registers with RegWr, busA, busB, and busW; an Extender controlled by ExtOp widening imm16 from 16 to 32 bits; an ALUSrc Mux feeding the ALU (ALUctr); a Data Memory with WrEn (MemWr), Adr, and Data In; and a W_Src Mux choosing the ALU output or the memory output for busW]

     3e: Store Operations
     • Mem[ R[rs] + SignExt[imm16] ] <- R[rt]   Example: sw rt, rs, imm16
     Format: op (6 bits, 31-26) | rs (5 bits, 25-21) | rt (5 bits, 20-16) | immediate (16 bits, 15-0)

     3e: Store Operations (cont.)
     [Figure: the same datapath as for load, with busB also wired to the Data In port of the Data Memory; controls RegDst, RegWr, ALUctr, ExtOp, ALUSrc, MemWr, and W_Src]
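The load and store datapaths differ only in their control settings, which a small behavioral model can show. Everything about polarity here is an assumption, since the slides name the signals but not their encodings: RegDst = 1 picks rd, ALUSrc = 1 picks the extended immediate, ExtOp = 1 means sign extension, and W_Src = 1 picks the memory output. The ALU is fixed to addition, and `datapath_cycle`, `LW_CONTROL`, and `SW_CONTROL` are hypothetical names.

```python
MASK32 = 0xFFFFFFFF

def extender(imm16, ext_op):
    """Widen imm16 to 32 bits. Assumed polarity: ExtOp = 1 sign-extends,
    ExtOp = 0 zero-extends."""
    if ext_op and imm16 & 0x8000:
        return imm16 | 0xFFFF0000
    return imm16

def datapath_cycle(inst, regs, dmem, ctrl):
    """One clock cycle of the load/store datapath (ALUctr fixed to add)."""
    rs, rt, rd = (inst >> 21) & 31, (inst >> 16) & 31, (inst >> 11) & 31
    bus_a, bus_b = regs[rs], regs[rt]
    alu_b = extender(inst & 0xFFFF, ctrl["ExtOp"]) if ctrl["ALUSrc"] else bus_b
    address = (bus_a + alu_b) & MASK32
    mem_out = dmem.get(address, 0)           # combinational memory read
    bus_w = mem_out if ctrl["W_Src"] else address
    if ctrl["MemWr"]:                        # store: busB goes to Data In
        dmem[address] = bus_b
    if ctrl["RegWr"]:                        # load: busW goes to Rw
        regs[rd if ctrl["RegDst"] else rt] = bus_w

# Assumed control settings for this sketch.
LW_CONTROL = dict(RegDst=0, ALUSrc=1, ExtOp=1, MemWr=0, W_Src=1, RegWr=1)
SW_CONTROL = dict(RegDst=0, ALUSrc=1, ExtOp=1, MemWr=1, W_Src=0, RegWr=0)

regs = [0] * 32
dmem = {}
regs[1], regs[2] = 0x100, 42
sw_inst = (1 << 21) | (2 << 16) | 0xFFFC     # sw $2, -4($1), op field omitted
lw_inst = (1 << 21) | (3 << 16) | 0xFFFC     # lw $3, -4($1), op field omitted
datapath_cycle(sw_inst, regs, dmem, SW_CONTROL)
datapath_cycle(lw_inst, regs, dmem, LW_CONTROL)
print(hex(0x100 - 4), dmem, regs[3])
```

The op field is left as zero in the example words because this sketch takes the control signals directly rather than decoding them.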
