Extended Functionality: j
[Figure: single-cycle datapath extended for j. The control unit gains a Jump output; PCJump is formed from PCPlus4[31:28] and Instr[25:0] shifted left by 2, and is selected as PC' when Jump is asserted.]
Chapter 7 <30>
Control Unit: Main Decoder
Instruction  Op5:0   RegWrite  RegDst  ALUSrc  Branch  MemWrite  MemtoReg  ALUOp1:0  Jump
R-type       000000  1         1       0       0       0         0         10        0
lw           100011  1         0       1       0       0         1         00        0
sw           101011  0         X       1       0       1         X         00        0
beq          000100  0         X       0       1       0         X         01        0
j            000010  0         X       X       X       0         X         XX        1
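The main decoder is a pure lookup on the opcode, so it can be sketched as a table in software. The snippet below is a minimal illustration, not the hardware description; the names `SIGNALS`, `MAIN_DECODER` and `decode` are made up for this example, and don't-cares are kept as the string "X".

```python
# Column order follows the main decoder table.
SIGNALS = ("RegWrite", "RegDst", "ALUSrc", "Branch",
           "MemWrite", "MemtoReg", "ALUOp", "Jump")

# opcode -> (mnemonic, signal values); "X" marks a don't-care.
# (Hypothetical software model of the table, for illustration only.)
MAIN_DECODER = {
    0b000000: ("R-type", ("1", "1", "0", "0", "0", "0", "10", "0")),
    0b100011: ("lw",     ("1", "0", "1", "0", "0", "1", "00", "0")),
    0b101011: ("sw",     ("0", "X", "1", "0", "1", "X", "00", "0")),
    0b000100: ("beq",    ("0", "X", "0", "1", "0", "X", "01", "0")),
    0b000010: ("j",      ("0", "X", "X", "X", "0", "X", "XX", "1")),
}


def decode(op: int) -> dict:
    """Return the control signals for a 6-bit opcode as a name -> value dict."""
    if op not in MAIN_DECODER:
        raise ValueError(f"unsupported opcode {op:06b}")
    _, values = MAIN_DECODER[op]
    return dict(zip(SIGNALS, values))


if __name__ == "__main__":
    for op, (name, _) in MAIN_DECODER.items():
        print(f"{name:7s} {op:06b} {decode(op)}")
```

Note that RegWrite and MemWrite are never don't-cares: they change architectural state, so they must be 0 for instructions such as j.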
Review: Processor Performance
Program Execution Time = (# instructions)(cycles/instruction)(seconds/cycle)
                       = # instructions × CPI × $T_c$
Single-Cycle Performance
[Figure: single-cycle datapath annotated with control signal values]
• $T_c$ limited by critical path (lw)
Single-Cycle Performance
• Single-cycle critical path:
  $T_c = t_{pcq\_PC} + t_{mem} + \max(t_{RFread},\ t_{sext} + t_{mux}) + t_{ALU} + t_{mem} + t_{mux} + t_{RFsetup}$
• Typically, limiting paths are:
  – memory, ALU, register file
  – $T_c = t_{pcq\_PC} + 2t_{mem} + t_{RFread} + t_{mux} + t_{ALU} + t_{RFsetup}$
Single-Cycle Performance Example
Element               Parameter        Delay (ps)
Register clock-to-Q   $t_{pcq\_PC}$    30
Register setup        $t_{setup}$      20
Multiplexer           $t_{mux}$        25
ALU                   $t_{ALU}$        200
Memory read           $t_{mem}$        250
Register file read    $t_{RFread}$     150
Register file setup   $t_{RFsetup}$    20

$T_c = t_{pcq\_PC} + 2t_{mem} + t_{RFread} + t_{mux} + t_{ALU} + t_{RFsetup}$
     = [30 + 2(250) + 150 + 25 + 200 + 20] ps
     = 925 ps
Single-Cycle Performance Example
Program with 100 billion instructions:
Execution Time = # instructions × CPI × $T_c$
               = $(100 \times 10^{9})(1)(925 \times 10^{-12}\ \text{s})$
               = 92.5 seconds
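The arithmetic in this example can be checked with a few lines of Python. This is only a sketch: the dictionary keys and function names are chosen for this illustration, and the cycle time uses the simplified critical path (memory, ALU and register file limiting).

```python
# Element delays in picoseconds, taken from the example table.
DELAYS_PS = {
    "t_pcq_PC": 30, "t_setup": 20, "t_mux": 25, "t_ALU": 200,
    "t_mem": 250, "t_RFread": 150, "t_RFsetup": 20,
}


def single_cycle_tc_ps(d):
    """Simplified single-cycle critical path (lw), in picoseconds."""
    return (d["t_pcq_PC"] + 2 * d["t_mem"] + d["t_RFread"]
            + d["t_mux"] + d["t_ALU"] + d["t_RFsetup"])


def execution_time_s(instructions, cpi, tc_ps):
    """Execution time = # instructions x CPI x Tc, with Tc given in ps."""
    return instructions * cpi * tc_ps / 1e12


if __name__ == "__main__":
    tc = single_cycle_tc_ps(DELAYS_PS)
    print("Tc =", tc, "ps")
    print("Execution time =", execution_time_s(100e9, 1, tc), "s")
```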
Multicycle MIPS Processor
• Single-cycle:
  + simple
  - cycle time limited by longest instruction (lw)
  - 2 adders/ALUs & 2 memories
• Multicycle:
  + higher clock speed
  + simpler instructions run faster
  + reuse expensive hardware on multiple cycles
  - sequencing overhead paid many times
• Same design steps: datapath & control
Multicycle State Elements
• Replace Instruction and Data memories with a single unified memory
  – more realistic
[Figure: state elements: PC, unified Instr / Data Memory, Register File]
Multicycle Datapath: Instruction Fetch
STEP 1: Fetch instruction
[Figure: PC addresses the Instr / Data Memory; the word read is latched into the Instr register, enabled by IRWrite]

Multicycle Datapath: lw Register Read
STEP 2a: Read source operands from RF
[Figure: Instr[25:21] drives register file port A1; RD1 is latched into register A]

Multicycle Datapath: lw Immediate
STEP 2b: Sign-extend the immediate
[Figure: Instr[15:0] passes through Sign Extend to produce SignImm]

Multicycle Datapath: lw Address
STEP 3: Compute the memory address
[Figure: the ALU (ALUControl2:0) combines SrcA (register A) and SrcB (SignImm); ALUResult is latched into ALUOut]

Multicycle Datapath: lw Memory Read
STEP 4: Read data from memory
[Figure: the IorD mux selects between PC and ALUOut as the memory address Adr; the word read is latched into the Data register]

Multicycle Datapath: lw Write Register
STEP 5: Write data back to register file
[Figure: Data drives WD3, Instr[20:16] drives A3, and RegWrite enables the write]

Multicycle Datapath: Increment PC
STEP 6: Increment PC
[Figure: ALUSrcA selects PC as SrcA and ALUSrcB1:0 = 01 selects the constant 4 as SrcB; ALUResult becomes PC', and PCWrite enables the PC register]
Multicycle Datapath: sw
• Write data in rt to memory
[Figure: Instr[20:16] drives A2; RD2 is latched into register B, which drives the memory WD port; MemWrite is added]

Multicycle Datapath: R-Type
• Read from rs and rt
• Write ALUResult to register file
• Write to rd (instead of rt)
[Figure: adds a RegDst mux (Instr[20:16] or Instr[15:11]) on A3 and a MemtoReg mux (ALUOut or Data) on WD3; ALUSrcB1:0 = 00 selects register B]

Multicycle Datapath: beq
• rs == rt?
• BTA = (sign-extended immediate << 2) + (PC + 4)
[Figure: adds SignImm shifted left by 2 as ALUSrcB1:0 = 11, a PCSrc mux (ALUResult or ALUOut), and the Branch, Zero and PCWrite signals that produce PCEn]
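The branch target address formula can be illustrated with a short sketch. The function names below are chosen for this example only, and results are wrapped to 32 bits to mimic the width of the PC.

```python
def sign_extend16(imm16: int) -> int:
    """Interpret the low 16 bits of imm16 as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_target(pc: int, imm16: int) -> int:
    """BTA = (sign-extended immediate << 2) + (PC + 4), wrapped to 32 bits."""
    return (pc + 4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF


if __name__ == "__main__":
    # A beq at address 0x20 with an immediate of 0x10 words.
    print(hex(branch_target(0x20, 0x0010)))
    # An immediate of -1 branches back to the beq itself.
    print(hex(branch_target(0x20, 0xFFFF)))
```

The immediate counts instructions relative to PC + 4, which is why it is shifted left by 2 before the addition.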
Multicycle Processor
[Figure: complete multicycle datapath with control unit. Control inputs: Op (Instr[31:26]) and Funct (Instr[5:0]). Control outputs: PCWrite, Branch, IorD, PCSrc, ALUControl2:0, MemWrite, ALUSrcB1:0, IRWrite, ALUSrcA, RegWrite, MemtoReg, RegDst.]

Multicycle Control
[Figure: control unit block diagram]
• Main Controller (FSM), input Opcode5:0
  – Multiplexer selects: MemtoReg, RegDst, IorD, PCSrc, ALUSrcB1:0, ALUSrcA
  – Register enables: IRWrite, MemWrite, PCWrite, Branch, RegWrite
  – ALUOp1:0 to the ALU Decoder
• ALU Decoder, inputs ALUOp1:0 and Funct5:0, output ALUControl2:0
Main Controller FSM: Fetch
S0: Fetch (entered on Reset)
  IorD = 0, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 00, PCSrc = 0, IRWrite, PCWrite
[Figure: multicycle datapath annotated with the control values of the current state]

Main Controller FSM: Decode
S1: Decode (entered from S0)

Main Controller FSM: Address
S2: MemAdr (from S1 when Op = LW or Op = SW)
  ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00

Main Controller FSM: lw
S3: MemRead (from S2 when Op = LW)
  IorD = 1
S4: Mem Writeback
  RegDst = 0, MemtoReg = 1, RegWrite

Main Controller FSM: sw
S5: MemWrite (from S2 when Op = SW)
  IorD = 1, MemWrite

Main Controller FSM: R-Type
S6: Execute (from S1 when Op = R-type)
  ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
S7: ALU Writeback
  RegDst = 1, MemtoReg = 0, RegWrite

Main Controller FSM: beq
S1: Decode now also drives ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
S8: Branch (from S1 when Op = BEQ)
  ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCSrc = 1, Branch

Multicycle Controller FSM
[Figure: complete state transition diagram]
  Reset → S0 → S1
  S1 → S2 (Op = LW or Op = SW), S6 (Op = R-type), S8 (Op = BEQ)
  S2 → S3 (Op = LW), S5 (Op = SW)
  S3 → S4, S6 → S7
  S4, S5, S7, S8 → S0
Extended Functionality: addi
Main Controller FSM: addi
S9: ADDI Execute (from S1 when Op = ADDI)
  ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
S10: ADDI Writeback
  RegDst = 0, MemtoReg = 0, RegWrite
States S0 through S8 are unchanged.
Extended Functionality: j
[Figure: multicycle datapath extended for j. PCSrc widens to two bits (PCSrc1:0); mux input 10 is PCJump, formed from PC[31:28] and Instr[25:0] shifted left by 2.]
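How PCJump is assembled from the two bit fields can be sketched as follows. The function name is made up for this illustration, and it assumes the upper four bits come from the PC as it stands when the jump executes (already incremented by the fetch state).

```python
def jump_target(pc: int, instr: int) -> int:
    """PCJump = {PC[31:28], Instr[25:0], 00} as a 32-bit value.

    Assumption for this sketch: pc is the already-incremented PC.
    """
    addr26 = instr & 0x03FFFFFF          # Instr[25:0]
    return (pc & 0xF0000000) | (addr26 << 2)


if __name__ == "__main__":
    # Opcode 000010 (j) with a 26-bit address field of 0x100000.
    instr = (0b000010 << 26) | 0x0100000
    print(hex(jump_target(0x00400004, instr)))
```

Because only 28 bits come from the instruction, a j can reach targets only within the same 256 MB region as the PC.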
Main Controller FSM: j
S11: Jump (from S1 when Op = J)
  PCSrc = 10, PCWrite
With PCSrc widened to two bits, S0 uses PCSrc = 00 and S8 uses PCSrc = 01. The outputs of all other states are unchanged.
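The state transitions of the full controller (S0 through S11) can be modelled as a small lookup, which also makes it easy to count the cycles each instruction takes. This is a sketch of the next-state logic only; the table and function names are invented for this example, and the control outputs of each state are omitted.

```python
# Next state out of S1 (Decode) depends on the opcode.
DECODE_NEXT = {"lw": "S2", "sw": "S2", "R-type": "S6",
               "beq": "S8", "addi": "S9", "j": "S11"}
# Next state out of S2 (MemAdr) depends on the opcode.
MEMADR_NEXT = {"lw": "S3", "sw": "S5"}
# All other states have a single successor.
FIXED_NEXT = {"S0": "S1", "S3": "S4", "S4": "S0", "S5": "S0",
              "S6": "S7", "S7": "S0", "S8": "S0", "S9": "S10",
              "S10": "S0", "S11": "S0"}


def next_state(state: str, op: str) -> str:
    """Return the successor of state for the instruction class op."""
    if state == "S1":
        return DECODE_NEXT[op]
    if state == "S2":
        return MEMADR_NEXT[op]
    return FIXED_NEXT[state]


def state_sequence(op: str) -> list:
    """States visited by one instruction, from S0 until control returns to S0."""
    seq = ["S0"]
    while True:
        nxt = next_state(seq[-1], op)
        if nxt == "S0":
            return seq
        seq.append(nxt)


if __name__ == "__main__":
    for op in DECODE_NEXT:
        seq = state_sequence(op)
        print(f"{op:7s} {len(seq)} cycles: {' -> '.join(seq)}")
```

The length of each sequence is the cycle count of that instruction, since every state takes one clock cycle.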
Multicycle Processor Performance
• Instructions take different numbers of cycles:
  – 3 cycles: beq, j
  – 4 cycles: R-Type, sw, addi
  – 5 cycles: lw
• CPI is weighted average
• SPECINT2000 benchmark:
  – 25% loads
  – 10% stores
  – 11% branches
  – 2% jumps
  – 52% R-type
Average CPI = (0.11 + 0.02)(3) + (0.52 + 0.10)(4) + (0.25)(5) = 4.12
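The weighted average can be reproduced in a few lines. The sketch below uses integer percentages to sidestep floating-point rounding in the intermediate sums; the dictionary and function names are chosen for this example.

```python
# Cycles per instruction class in the multicycle processor.
CYCLES = {"load": 5, "store": 4, "branch": 3, "jump": 3, "r_type": 4}
# Instruction mix in percent (SPECINT2000 figures from the slide).
MIX_PERCENT = {"load": 25, "store": 10, "branch": 11, "jump": 2, "r_type": 52}


def average_cpi(mix_percent, cycles):
    """Weighted average of cycle counts; the mix must sum to 100 percent."""
    if sum(mix_percent.values()) != 100:
        raise ValueError("instruction mix must sum to 100%")
    return sum(mix_percent[k] * cycles[k] for k in mix_percent) / 100


if __name__ == "__main__":
    print("Average CPI =", average_cpi(MIX_PERCENT, CYCLES))
```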
Multicycle Processor Performance
Multicycle critical path:
$T_c = t_{pcq} + t_{mux} + \max(t_{ALU} + t_{mux},\ t_{mem}) + t_{setup}$
[Figure: complete multicycle processor]
Multicycle Performance Example
Element               Parameter        Delay (ps)
Register clock-to-Q   $t_{pcq\_PC}$    30
Register setup        $t_{setup}$      20
Multiplexer           $t_{mux}$        25
ALU                   $t_{ALU}$        200
Memory read           $t_{mem}$        250
Register file read    $t_{RFread}$     150
Register file setup   $t_{RFsetup}$    20

$T_c = t_{pcq\_PC} + t_{mux} + \max(t_{ALU} + t_{mux},\ t_{mem}) + t_{setup}$
     $= t_{pcq\_PC} + t_{mux} + t_{mem} + t_{setup}$
     = [30 + 25 + 250 + 20] ps
     = 325 ps
Multicycle Performance Example
Program with 100 billion instructions
Execution Time = (# instructions) × CPI × $T_c$
               = $(100 \times 10^{9})(4.12)(325 \times 10^{-12})$
               = 133.9 seconds
This is slower than the single-cycle processor (92.5 seconds). Why?
  – Not all steps same length
  – Sequencing overhead for each step ($t_{pcq} + t_{setup}$ = 50 ps)
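The comparison with the single-cycle processor can be recomputed directly from the delay table. This is a sketch with function names chosen for the example; the single-cycle period of 925 ps and the CPI of 4.12 are the values derived in the earlier slides.

```python
def multicycle_tc_ps(t_pcq, t_mux, t_alu, t_mem, t_setup):
    """Multicycle critical path in picoseconds."""
    return t_pcq + t_mux + max(t_alu + t_mux, t_mem) + t_setup


def execution_time_s(instructions, cpi, tc_ps):
    """Execution time = # instructions x CPI x Tc, with Tc given in ps."""
    return instructions * cpi * tc_ps / 1e12


if __name__ == "__main__":
    tc_multi = multicycle_tc_ps(t_pcq=30, t_mux=25, t_alu=200,
                                t_mem=250, t_setup=20)
    t_multi = execution_time_s(100e9, 4.12, tc_multi)
    t_single = execution_time_s(100e9, 1, 925)
    print(f"multicycle:   Tc = {tc_multi} ps, time = {t_multi:.1f} s")
    print(f"single-cycle: Tc = 925 ps, time = {t_single:.1f} s")
    # Each multicycle step pays t_pcq + t_setup of sequencing overhead.
    print("sequencing overhead per step:", 30 + 20, "ps")
```

The clock is about 2.8 times faster, but the average instruction needs 4.12 cycles, so the net result is slower.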
Review: Single-Cycle Processor
[Figure: complete single-cycle processor, including the Jump signal and PCJump path]

Review: Multicycle Processor
[Figure: complete multicycle processor, including the two-bit PCSrc mux with PCJump on input 10]
Pipelined MIPS Processor
• Temporal parallelism
• Divide single-cycle processor into 5 stages:
  – Fetch
  – Decode
  – Execute
  – Memory
  – Writeback
• Add pipeline registers between stages
Single-Cycle vs. Pipelined
[Figure: timing diagram, time axis 0 to 1900 ps. Each instruction passes through Fetch Instruction, Decode / Read Reg, Execute ALU, Memory Read / Write, and Write Reg. Single-cycle: instructions 1 and 2 run back to back. Pipelined: instructions 1, 2 and 3 overlap.]

Pipelined Processor Abstraction
[Figure: pipeline diagram over cycles 1 to 10; each instruction passes through IM, RF, ALU, DM, RF, starting one cycle after the previous instruction]
  lw  $s2, 40($0)
  add $s3, $t1, $t2
  sub $s4, $s1, $s5
  and $s5, $t5, $t6
  sw  $s6, 20($s1)
  or  $s7, $t3, $t4
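The diagonal pattern in the pipeline diagram follows a simple rule, sketched below for an ideal pipeline with no stalls or flushes (an assumption of this example). The function names are illustrative only.

```python
STAGES = ("Fetch", "Decode", "Execute", "Memory", "Writeback")


def stage_in_cycle(instr_index: int, cycle: int):
    """Stage occupied by instruction instr_index (0-based) in a 1-based cycle.

    Returns None if the instruction is not in the pipeline during that cycle.
    """
    s = cycle - 1 - instr_index
    return STAGES[s] if 0 <= s < len(STAGES) else None


def total_cycles(n_instructions: int) -> int:
    """Cycles for n instructions to drain through an ideal 5-stage pipeline."""
    return n_instructions + len(STAGES) - 1 if n_instructions > 0 else 0


if __name__ == "__main__":
    program = ["lw", "add", "sub", "and", "sw", "or"]
    for i, name in enumerate(program):
        row = [(stage_in_cycle(i, c) or "")[:1] or "." 
               for c in range(1, total_cycles(len(program)) + 1)]
        print(f"{name:4s}", " ".join(row))
```

For the six instructions in the diagram this gives 10 cycles, which is the extent of the time axis in the figure.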
Single-Cycle & Pipelined Datapath
[Figure: single-cycle datapath (top) and pipelined datapath (bottom). Pipeline registers divide the datapath into Fetch, Decode, Execute, Memory and Writeback; signal names carry a stage suffix, e.g. PCF, InstrD, SrcAE, ALUOutM, ResultW.]

Corrected Pipelined Datapath
[Figure: pipelined datapath with WriteReg carried along the pipeline as WriteRegE, WriteRegM, WriteRegW]
• WriteReg must arrive at same time as Result

Pipelined Processor Control
[Figure: pipelined datapath with control unit; control signals are registered per stage, e.g. RegWriteD, RegWriteE, RegWriteM, RegWriteW]
• Same control unit as single-cycle processor
• Control delayed to proper pipeline stage
Pipeline Hazards
• Occur when an instruction depends on a result from an instruction that hasn’t completed
• Types:
  – Data hazard: register value not yet written back to register file
  – Control hazard: next instruction not decided yet (caused by branches)
Data Hazard
[Figure: pipeline diagram, cycles 1 to 8; the instructions after add read $s0]
  add $s0, $s2, $s3
  and $t0, $s0, $s1
  or  $t1, $s4, $s0
  sub $t2, $s0, $s5

Handling Data Hazards
• Insert nops in code at compile time
• Rearrange code at compile time
• Forward data at run time
• Stall the processor at run time
Compile-Time Hazard Elimination
• Insert enough nops for result to be ready
• Or move independent useful instructions forward
[Figure: pipeline diagram, cycles 1 to 10]
  add $s0, $s2, $s3
  nop
  nop
  and $t0, $s0, $s1
  or  $t1, $s4, $s0
  sub $t2, $s0, $s5
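The nop-insertion approach can be sketched as a small pass over the instruction list. This is a simplified illustration rather than a real compiler pass: instructions are hand-written tuples of (text, destination, sources), and it assumes a consumer is safe once the producer is at least three instruction slots ahead of it, which matches the two nops shown in the figure.

```python
NOP = ("nop", None, ())


def insert_nops(program, gap=2):
    """Insert nops so no instruction reads a register written by one of the
    `gap` instruction slots immediately before it (assumed hazard window)."""
    out = []
    for instr in program:
        _, _, srcs = instr
        # Look for the nearest earlier writer of one of our sources.
        for back in range(1, min(gap, len(out)) + 1):
            dest = out[-back][1]
            if dest is not None and dest != "$0" and dest in srcs:
                out.extend([NOP] * (gap - back + 1))
                break
        out.append(instr)
    return out


if __name__ == "__main__":
    program = [
        ("add $s0, $s2, $s3", "$s0", ("$s2", "$s3")),
        ("and $t0, $s0, $s1", "$t0", ("$s0", "$s1")),
        ("or  $t1, $s4, $s0", "$t1", ("$s4", "$s0")),
        ("sub $t2, $s0, $s5", "$t2", ("$s0", "$s5")),
    ]
    for text, _, _ in insert_nops(program):
        print(text)
```

The cost is visible immediately: every nop is a wasted slot, which is why forwarding and stalling in hardware are usually preferred.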
Data Forwarding
[Figure: pipeline diagram, cycles 1 to 8; the result of add is forwarded to the later instructions that read $s0]
  add $s0, $s2, $s3
  and $t0, $s0, $s1
  or  $t1, $s4, $s0
  sub $t2, $s0, $s5

Data Forwarding
[Figure: pipelined processor with Hazard Unit. Three-input muxes on SrcAE and SrcBE are selected by ForwardAE and ForwardBE; the Hazard Unit takes RsE, RtE, WriteRegM, WriteRegW, RegWriteM and RegWriteW as inputs.]
Data Forwarding
• Forward to Execute stage from either:
  – Memory stage or
  – Writeback stage
• Forwarding logic for ForwardAE:
    if      ((rsE != 0) AND (rsE == WriteRegM) AND RegWriteM) then ForwardAE = 10
    else if ((rsE != 0) AND (rsE == WriteRegW) AND RegWriteW) then ForwardAE = 01
    else                                                           ForwardAE = 00
• Forwarding logic for ForwardBE same, but replace rsE with rtE
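The forwarding rule translates almost line for line into code. The sketch below models one forwarding mux select; the function and parameter names are chosen for this example, and the same function serves ForwardAE (pass rsE) and ForwardBE (pass rtE).

```python
def forward_select(src_e, write_reg_m, reg_write_m, write_reg_w, reg_write_w):
    """Return the 2-bit mux select for one Execute-stage source register.

    0b10: forward from the Memory stage, 0b01: forward from the Writeback
    stage, 0b00: use the value read from the register file.
    """
    # The Memory stage holds the more recent result, so it takes priority.
    if src_e != 0 and src_e == write_reg_m and reg_write_m:
        return 0b10
    if src_e != 0 and src_e == write_reg_w and reg_write_w:
        return 0b01
    return 0b00


if __name__ == "__main__":
    S0 = 16  # register number of $s0
    # add $s0,... is in Memory while and $t0, $s0, $s1 is in Execute.
    print(format(forward_select(S0, S0, True, 0, False), "02b"))
```

Register 0 is excluded because it is hardwired to zero and must never be replaced by a forwarded value.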
Stalling
[Figure: pipeline diagram, cycles 1 to 8, marked "Trouble!": and needs $s0 before lw has read it from memory]
  lw  $s0, 40($0)
  and $t0, $s0, $s1
  or  $t1, $s4, $s0
  sub $t2, $s0, $s5

Stalling
[Figure: pipeline diagram, cycles 1 to 9, marked "Stall": and repeats its Decode stage and or repeats its Fetch stage for one cycle]
  lw  $s0, 40($0)
  and $t0, $s0, $s1
  or  $t1, $s4, $s0
  sub $t2, $s0, $s5

Stalling Hardware
[Figure: pipelined processor with Hazard Unit. StallF and StallD drive the EN inputs of the PC and the Fetch/Decode register, FlushE drives the CLR input of the Decode/Execute register, and MemtoRegE is added as a Hazard Unit input.]
Stalling Logic
    lwstall = ((rsD == rtE) OR (rtD == rtE)) AND MemtoRegE
    StallF = StallD = FlushE = lwstall
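The load-stall condition can be written as a one-line predicate. This is a sketch with illustrative names; register operands are plain integers.

```python
def lw_stall(rs_d: int, rt_d: int, rt_e: int, mem_to_reg_e: bool) -> bool:
    """True when the instruction in Decode reads the register that a load
    in Execute (MemtoRegE asserted) is about to write."""
    return bool(((rs_d == rt_e) or (rt_d == rt_e)) and mem_to_reg_e)


def stall_outputs(lwstall: bool) -> dict:
    """StallF = StallD = FlushE = lwstall."""
    return {"StallF": lwstall, "StallD": lwstall, "FlushE": lwstall}


if __name__ == "__main__":
    S0, S1 = 16, 17
    # lw $s0, 40($0) is in Execute; and $t0, $s0, $s1 is in Decode.
    print(stall_outputs(lw_stall(rs_d=S0, rt_d=S1, rt_e=S0, mem_to_reg_e=True)))
```

Stalling Fetch and Decode holds the dependent instructions in place, while flushing Execute inserts the bubble that gives the load time to finish.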
Control Hazards
• beq:
  – branch not determined until 4th stage of pipeline
  – Instructions after branch fetched before branch occurs
  – These instructions must be flushed if branch happens
• Branch misprediction penalty
  – number of instructions flushed when branch is taken
  – May be reduced by determining branch earlier
Control Hazards: Original Pipeline
[Figure: pipelined processor with Hazard Unit; the branch decision (PCSrcM, from BranchM and ZeroM) is made in the Memory stage]

Control Hazards
[Figure: pipeline diagram, cycles 1 to 9; the three instructions after the branch are marked "Flush these instructions"]
  20  beq $t1, $t2, 40
  24  and $t0, $s0, $s1
  28  or  $t1, $s4, $s0
  2C  sub $t2, $s0, $s5
  30  ...
  ...
  64  slt $t3, $s2, $s3
Early Branch Resolution
[Figure: pipelined processor with an equality comparator on RD1 and RD2 in the Decode stage producing EqualD; PCSrcD and PCBranchD are now generated in Decode]
• Introduced another data hazard in Decode stage

Early Branch Resolution
[Figure: pipeline diagram, cycles 1 to 9; only the instruction after the branch is marked "Flush this instruction"]
  20  beq $t1, $t2, 40
  24  and $t0, $s0, $s1
  28  or  $t1, $s4, $s0
  2C  sub $t2, $s0, $s5
  30  ...
  ...
  64  slt $t3, $s2, $s3

Handling Data & Control Hazards
[Figure: pipelined processor with Decode-stage forwarding muxes on the comparator inputs, selected by ForwardAD and ForwardBD; BranchD and RegWriteE are added as Hazard Unit inputs]
Control Forwarding & Stalling Logic
• Forwarding logic:
    ForwardAD = (rsD != 0) AND (rsD == WriteRegM) AND RegWriteM
    ForwardBD = (rtD != 0) AND (rtD == WriteRegM) AND RegWriteM
• Stalling logic:
    branchstall = BranchD AND [ (RegWriteE AND ((WriteRegE == rsD) OR (WriteRegE == rtD)))
                                OR (MemtoRegM AND ((WriteRegM == rsD) OR (WriteRegM == rtD))) ]
    StallF = StallD = FlushE = lwstall OR branchstall
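The Decode-stage forwarding and branch-stall equations can be sketched as predicates in the same way. Function and parameter names are chosen for this illustration; the grouping of the branchstall terms follows the bracketed form, with BranchD gating both cases.

```python
def forward_d(src_d, write_reg_m, reg_write_m):
    """ForwardAD / ForwardBD: forward the Memory-stage result to the
    Decode-stage comparator (pass rsD for AD, rtD for BD)."""
    return bool(src_d != 0 and src_d == write_reg_m and reg_write_m)


def branch_stall(branch_d, rs_d, rt_d,
                 reg_write_e, write_reg_e, mem_to_reg_m, write_reg_m):
    """True when a branch in Decode needs a value that is not yet available."""
    # Case 1: the producer is still in Execute (result not computed yet).
    producer_in_execute = reg_write_e and write_reg_e in (rs_d, rt_d)
    # Case 2: the producer is a load in Memory (data not read yet).
    load_in_memory = mem_to_reg_m and write_reg_m in (rs_d, rt_d)
    return bool(branch_d and (producer_in_execute or load_in_memory))


def stall_outputs(lwstall, branchstall):
    """StallF = StallD = FlushE = lwstall OR branchstall."""
    stall = bool(lwstall or branchstall)
    return {"StallF": stall, "StallD": stall, "FlushE": stall}


if __name__ == "__main__":
    T1, T2 = 9, 10
    # beq $t1, $t2, ... in Decode while an instruction writing $t1 is in Execute.
    bs = branch_stall(True, T1, T2, reg_write_e=True, write_reg_e=T1,
                      mem_to_reg_m=False, write_reg_m=0)
    print(stall_outputs(False, bs))
```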