
PIPELINING: 5-STAGE PIPELINE, Mahdi Nazm Bojnordi



  1. PIPELINING: 5-STAGE PIPELINE
     Mahdi Nazm Bojnordi, Assistant Professor, School of Computing, University of Utah
     CS/ECE 6810: Computer Architecture

  2. Overview
     - Announcement
       - Tonight: Homework 1 deadline (11:59PM)
         - Verify your uploaded files before the deadline
     - This lecture
       - Impacts of pipelining on performance
       - The MIPS five-stage pipeline
       - Pipeline hazards
         - Structural hazards
         - Data hazards

  3. Single-cycle RISC Architecture
     - Example: simple MIPS architecture
       - The critical path includes all of the processing steps
     [Diagram: single-cycle datapath with a Controller, PC, Inst. Memory, Register File, ALU, and Data Memory, spanning the Inst. Fetch, Inst. Decode, Execute, Memory, and Write Back steps]

  4. Single-cycle RISC Architecture
     - Example program; CT = 6ns, CPU Time = ?
         AND R1,R2,R3
         XOR R4,R2,R3
         SUB R5,R1,R4
         ADD R6,R1,R4
         MUL R7,R5,R6
     - CPU Time = IC x CPI x CT
     [Timing diagram: the five instructions executing one after another along the time axis]

  5. Single-cycle RISC Architecture
     - Example program; CT = 6ns, CPU Time = 5 x 1 x 6ns = 30ns
         AND R1,R2,R3
         XOR R4,R2,R3
         SUB R5,R1,R4
         ADD R6,R1,R4
         MUL R7,R5,R6
     - CPU Time = IC x CPI x CT
     - How to improve?
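The iron law used on slides 4-5 can be checked with a few lines of Python. This is only an arithmetic sketch; the helper name `cpu_time_ns` is hypothetical and not part of the slides.

```python
def cpu_time_ns(ic: int, cpi: float, ct_ns: float) -> float:
    """Iron law of performance: CPU time = IC x CPI x CT (result in ns)."""
    return ic * cpi * ct_ns

# Single-cycle datapath from slides 4-5: 5 instructions, CPI = 1, CT = 6 ns.
single_cycle_ns = cpu_time_ns(ic=5, cpi=1, ct_ns=6.0)
print(f"single-cycle CPU time = {single_cycle_ns} ns")  # 30.0 ns
```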

  6. Reusing Idle Resources
     - Each processing step finishes in a fraction of a cycle
       - Idle resources can be reused to process the next instructions
     [Diagram: the single-cycle datapath divided into the Inst. Fetch, Inst. Decode, Execute, Memory, and Write Back steps]

  7. Pipelined Architecture
     - Five-stage pipeline
       - The critical path (the slowest stage, 1.5ns) determines the cycle time
     [Diagram: datapath split into Inst. Fetch, Inst. Decode, Execute, Memory, and Write Back stages; stage delays labeled 0.7ns, 1.5ns, 1.05ns, 1.25ns, and 1.5ns, which sum to the 6ns single-cycle time]
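A small sketch of why the pipelined cycle time drops from 6ns to 1.5ns: an unpipelined cycle must cover the sum of the step delays, while a pipelined cycle only has to cover the slowest stage. It uses the five delays labeled on slide 7 without assuming which value belongs to which stage, and, like the slide, it ignores pipeline-register overhead.

```python
import math

# Stage delays (ns) labeled on slide 7; no stage-to-value mapping is assumed.
stage_delays_ns = [0.7, 1.5, 1.05, 1.25, 1.5]

# Unpipelined: one cycle must cover every step back to back.
single_cycle_ct = sum(stage_delays_ns)
# Pipelined: each stage gets its own cycle, so the slowest stage sets CT.
pipelined_ct = max(stage_delays_ns)

print(f"single-cycle CT = {single_cycle_ct:.2f} ns")  # 6.00 ns
print(f"pipelined CT    = {pipelined_ct:.2f} ns")     # 1.50 ns
```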

  8. Pipelined Architecture
     - Example program; CT = 1.5ns, CPU Time = ?
         AND R1,R2,R3
         XOR R4,R2,R3
         SUB R5,R1,R4
         ADD R6,R1,R4
         MUL R7,R5,R6
     - CPU Time = IC x CPI x CT

  9. Pipelined Architecture
     - Example program; CT = 1.5ns
       - If each instruction must pass through all five stages before the next one starts, CPI = 5:
         CPU Time = 5 x 5 x 1.5ns = 37.5ns > 30ns. Worse!
         AND R1,R2,R3
         XOR R4,R2,R3
         SUB R5,R1,R4
         ADD R6,R1,R4
         MUL R7,R5,R6
     - CPU Time = IC x CPI x CT

  10. Pipelined Architecture
     - Example program, now with instructions overlapped in the pipeline; CT = 1.5ns, CPU Time = ?
         AND R1,R2,R3
         XOR R4,R2,R3
         SUB R5,R1,R4
         ADD R6,R1,R4
         MUL R7,R5,R6
     - CPU Time = IC x CPI x CT

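  11. Pipelined Architecture
     - Example program with overlapped execution; CT = 1.5ns
       - The first instruction needs 5 cycles to fill the pipeline, then one instruction completes per cycle: 5 + 4 = 9 cycles
       - CPU Time = 9 cycles x 1.5ns = 13.5ns (effective CPI = 9/5 = 1.8)
         AND R1,R2,R3
         XOR R4,R2,R3
         SUB R5,R1,R4
         ADD R6,R1,R4
         MUL R7,R5,R6
     - What is the cost of pipelining?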
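The two pipelined cases on slides 9-11 differ only in how many cycles the program occupies. The sketch below counts them for a 5-stage pipeline with no stalls; the function names are hypothetical.

```python
def unoverlapped_cycles(ic: int, stages: int) -> int:
    # Each instruction passes through every stage before the next starts: CPI = stages.
    return ic * stages

def overlapped_cycles(ic: int, stages: int) -> int:
    # The first instruction needs `stages` cycles to fill the pipeline;
    # each later instruction completes one cycle after its predecessor.
    return stages + (ic - 1)

CT_NS, IC, STAGES = 1.5, 5, 5
for name, cycles in [("no overlap", unoverlapped_cycles(IC, STAGES)),
                     ("overlapped", overlapped_cycles(IC, STAGES))]:
    print(f"{name}: {cycles} cycles x {CT_NS} ns = {cycles * CT_NS} ns, "
          f"effective CPI = {cycles / IC}")
```

This prints 25 cycles (37.5 ns, CPI 5.0) for the non-overlapped case and 9 cycles (13.5 ns, CPI 1.8) for the overlapped one. For long programs the `stages - 1` fill cycles are amortized, so the effective CPI approaches 1.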

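  12. Pipelining Technique
     - Improving throughput at the expense of latency
       - Delay: D = T + nδ
       - Throughput: IPS = n/(T + nδ)
       - T: total combinational logic delay; n: number of pipeline stages; δ: delay added by each pipeline register
     [Diagram: a single block of combinational logic, critical path delay = 30]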

  13. Pipelining Technique
     - Improving throughput at the expense of latency: D = T + nδ, IPS = n/(T + nδ)
       - 1 stage: combinational logic, critical path delay = 30; D = ?, IPS = ?
       - 2 stages: two logic blocks, critical path delay = 15 each; D = ?, IPS = ?
       - 3 stages: three logic blocks, delay = 10 each; D = ?, IPS = ?

  14. Pipelining Technique
     - Improving throughput at the expense of latency: D = T + nδ, IPS = n/(T + nδ)

       Stages  Logic per stage   D    IPS
       1       30                31   1/31
       2       15 + 15           32   2/32
       3       10 + 10 + 10      33   3/33
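The table on slide 14 follows directly from the two formulas. This sketch reproduces it with exact fractions, assuming T = 30 and δ = 1 (δ is inferred from D = 31 at n = 1); the function names are hypothetical.

```python
from fractions import Fraction

def pipeline_delay(T: int, n: int, delta: int) -> int:
    # Latency D = T + n*delta: the logic delay plus one register overhead per stage.
    return T + n * delta

def pipeline_ips(T: int, n: int, delta: int) -> Fraction:
    # Throughput IPS = n / (T + n*delta), kept as an exact fraction.
    return Fraction(n, pipeline_delay(T, n, delta))

# Values from slides 12-14: T = 30, delta = 1.
for n in (1, 2, 3):
    print(n, pipeline_delay(30, n, 1), pipeline_ips(30, n, 1))
```

`Fraction` reduces automatically, so 2/32 and 3/33 print as 1/16 and 1/11.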

  15. Pipelining Latency vs. Throughput
     - Theoretical delay and throughput models for perfect pipelining
     [Plot: relative performance (0 to 20) vs. number of pipeline stages (0 to 200), showing delay (D)]

  16. Pipelining Latency vs. Throughput
     - Theoretical delay and throughput models for perfect pipelining
     [Plot: relative performance (0 to 20) vs. number of pipeline stages (0 to 200), showing delay (D) and throughput (IPS)]
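The trend in the plot can be explored numerically. This sketch assumes T = 30 and δ = 1, reused from slides 12-14; the plot itself may be drawn with different parameters, so the printed ratios are illustrative only. What holds for any positive T and δ is that delay grows linearly with n while throughput rises toward 1/δ without ever reaching it.

```python
# Assumed parameters, reused from slides 12-14 (not read from the plot).
T, DELTA = 30.0, 1.0

def delay(n: int) -> float:
    return T + n * DELTA          # grows linearly with the number of stages

def throughput(n: int) -> float:
    return n / (T + n * DELTA)    # rises toward 1/DELTA but never reaches it

for n in (1, 10, 50, 100, 200):
    print(f"n={n:3d}  D/D(1)={delay(n) / delay(1):5.2f}  "
          f"IPS/IPS(1)={throughput(n) / throughput(1):5.2f}")

print(f"throughput bound 1/DELTA = {1 / DELTA:.2f}")
```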

  17. Five-Stage MIPS Pipeline

  18. Simple Five-Stage Pipeline
     - A pipelined load-store architecture that processes up to one instruction per cycle
     [Diagram: PC, Inst. Memory, Register File, ALU, and Data Memory divided into the Inst. Fetch, Inst. Decode, Execute, Memory, and Write Back stages]

  19. Instruction Fetch
     - Read an instruction from memory (I-Memory)
       - Use the program counter (PC) to index into the I-Memory
       - Compute NPC by incrementing the current PC
         - What about branches?
     - Update pipeline registers
       - Write the instruction into the pipeline registers

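  20. Instruction Fetch
     [Diagram: the clocked PC indexes the Instruction Memory while an adder computes NPC = PC + 4; the next PC is either NPC or the Branch Target; the fetched instruction and NPC are written into the pipeline register]
     - Why increment by 4?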

  21. Instruction Fetch
     [Diagram: same fetch datapath with three paths marked: P1 through the Instruction Memory, P2 through the PC + 4 adder, P3 through the Branch Target path]
     - Critical Path = Max{P1, P2, P3}
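A toy model of the fetch stage on slides 19-21. MIPS instructions are 32 bits (4 bytes) and memory is byte-addressed, which is why NPC = PC + 4. The sketch assumes a hypothetical instruction memory holding one list entry per 4-byte word (assembly text stands in for encodings), and treats the branch decision as an input, since it is resolved in a later stage.

```python
def fetch(imem, pc, branch_taken=False, branch_target=0):
    """Toy IF stage. imem holds one entry per 4-byte instruction word."""
    instruction = imem[pc // 4]   # byte address -> word index
    npc = pc + 4                  # next sequential instruction
    # Next PC: the branch target if a branch was resolved as taken, else NPC.
    next_pc = branch_target if branch_taken else npc
    if_id = {"instruction": instruction, "npc": npc}  # IF/ID pipeline register
    return next_pc, if_id

def fetch_critical_path(p1, p2, p3):
    # The stage's cycle must cover its slowest path.
    return max(p1, p2, p3)

imem = ["AND R1,R2,R3", "XOR R4,R2,R3", "SUB R5,R1,R4"]
pc = 0
for _ in range(3):
    pc, if_id = fetch(imem, pc)
    print(if_id["instruction"], "-> next PC", pc)
```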

  22. Instruction Decode
     - Generate control signals from the opcode bits
     - Read source operands from the register file (RF)
       - Use the register specifiers to index the RF
         - How many read ports are required?
     - Update pipeline registers
       - Send the operand and immediate values to the next stage
       - Pass control signals and NPC to the next stage

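  23. Instruction Decode
     [Diagram: between two pipeline registers, the Instruction feeds the Register File and the decode logic; NPC, register values (reg), and control signals (ctrl) pass to the next pipeline register; a target signal is also shown]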
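A sketch of decode using the standard MIPS32 field layout: the opcode (bits 31-26) drives control-signal generation, and the two source specifiers rs and rt are read in the same cycle, which answers slide 22's question with two read ports. The example word encodes `AND R1,R2,R3` as an R-type instruction; the helper names are hypothetical.

```python
def sign_extend16(value: int) -> int:
    # Interpret the low 16 bits as a two's-complement number.
    return value - 0x10000 if value & 0x8000 else value

def decode(word: int) -> dict:
    """Split a 32-bit MIPS instruction word into its standard fields."""
    return {
        "opcode": (word >> 26) & 0x3F,          # bits 31-26: drive control signals
        "rs": (word >> 21) & 0x1F,              # bits 25-21: first source register
        "rt": (word >> 16) & 0x1F,              # bits 20-16: second source (I-type destination)
        "rd": (word >> 11) & 0x1F,              # bits 15-11: R-type destination
        "funct": word & 0x3F,                   # bits 5-0: R-type ALU function
        "imm": sign_extend16(word & 0xFFFF),    # bits 15-0: I-type immediate
    }

def read_operands(regfile, fields):
    # Both source specifiers are read in the same cycle -> two read ports.
    return regfile[fields["rs"]], regfile[fields["rt"]]

word = 0x00430824            # AND R1,R2,R3 (R-type: opcode 0, funct 0x24)
fields = decode(word)
regs = [0] * 32
regs[2], regs[3] = 0b1100, 0b1010
print(fields, read_operands(regs, fields))
```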

  24. Execute Stage
     - Perform ALU operation
       - Compute the ALU result
         - Operation type: control signals
         - First operand: contents of a register
         - Second operand: either a register or the immediate value
       - Compute branch target
         - Target = NPC + immediate
     - Update pipeline registers
       - Control signals, branch target, ALU results, and destination

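  25. Execute Stage
     [Diagram: the ALU combines register operands (reg) under control (ctrl) to produce Res; an adder computes Target from NPC; Target, Res, reg, and ctrl are latched into the next pipeline register]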
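A toy execute stage following slide 24: the second ALU operand is selected between a register and the immediate, and the branch target is computed from NPC. One detail goes beyond the slide: in MIPS the branch immediate is a word offset, so it is shifted left by 2 before being added to NPC. The control-signal names (`alu_op`, `alu_src_imm`) are hypothetical.

```python
ALU_OPS = {
    "add": lambda x, y: x + y,
    "sub": lambda x, y: x - y,
    "and": lambda x, y: x & y,
    "xor": lambda x, y: x ^ y,
}

def execute(ctrl, npc, reg_a, reg_b, imm, dest):
    """Toy EX stage; ctrl keys 'alu_op' and 'alu_src_imm' are hypothetical names."""
    operand2 = imm if ctrl["alu_src_imm"] else reg_b   # register or immediate
    alu_result = ALU_OPS[ctrl["alu_op"]](reg_a, operand2)
    # MIPS branch offsets count words, so scale by 4 before adding to NPC.
    target = npc + (imm << 2)
    # EX/MEM pipeline register: control, branch target, ALU result, destination.
    return {"ctrl": ctrl, "target": target, "alu_result": alu_result, "dest": dest}

# SUB R5,R1,R4 with R1 = 12 and R4 = 10 (register operand selected).
print(execute({"alu_op": "sub", "alu_src_imm": False}, 8, 12, 10, 0, 5))
```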

  26. Memory Access
     - Access data memory
       - Load/store address: ALU outcome
       - Control signals determine read or write access
     - Update pipeline registers
       - ALU results from execute
       - Loaded data from D-Memory
       - Destination register

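  27. Memory Access
     [Diagram: Res from the previous pipeline register addresses the Data Memory (addr); data is read or written under ctrl; Res, loaded data, and ctrl pass to the next pipeline register; Target is also shown]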
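A toy memory-access stage for slide 26. It assumes a hypothetical data memory modeled as a dict from byte address to value, hypothetical control flags `mem_read`/`mem_write`, and a hypothetical `store_data` field carrying the value a store writes (the slide does not say how that value reaches this stage).

```python
def memory_access(ex_mem, dmem):
    """Toy MEM stage. dmem maps byte addresses to values; ctrl flags
    'mem_read'/'mem_write' and the 'store_data' field are hypothetical names."""
    ctrl = ex_mem["ctrl"]
    addr = ex_mem["alu_result"]              # load/store address = ALU outcome
    loaded = None
    if ctrl.get("mem_read"):
        loaded = dmem.get(addr, 0)           # load: read D-Memory (unwritten words read as 0)
    elif ctrl.get("mem_write"):
        dmem[addr] = ex_mem["store_data"]    # store: write D-Memory
    # MEM/WB pipeline register: ALU result, loaded data, destination, control.
    return {"ctrl": ctrl, "alu_result": addr, "loaded": loaded, "dest": ex_mem["dest"]}

dmem = {100: 42}
print(memory_access({"ctrl": {"mem_read": True}, "alu_result": 100, "dest": 1}, dmem))
```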

  28. Register Write Back
     - Update the register file
       - Control signals determine whether a register write is needed
       - Only one write port is required
         - Write the ALU result to the destination register, or
         - Write the loaded data into the register file
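A toy write-back stage for slide 28: at most one register is written per cycle, and control selects between the loaded data and the ALU result. The flags `reg_write` and `mem_to_reg` are hypothetical names; the check on register 0 reflects the MIPS convention that $0 always reads as zero.

```python
def write_back(mem_wb, regfile):
    """Toy WB stage with one register-file write port; ctrl flags
    'reg_write'/'mem_to_reg' are hypothetical names."""
    ctrl = mem_wb["ctrl"]
    if not ctrl.get("reg_write"):
        return                                # e.g. stores write no register
    # Loads write the loaded data; ALU instructions write the ALU result.
    value = mem_wb["loaded"] if ctrl.get("mem_to_reg") else mem_wb["alu_result"]
    if mem_wb["dest"] != 0:                   # MIPS $0 is hardwired to zero
        regfile[mem_wb["dest"]] = value

regs = [0] * 32
write_back({"ctrl": {"reg_write": True, "mem_to_reg": True},
            "loaded": 42, "alu_result": 100, "dest": 1}, regs)
print(regs[1])
```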

  29. Five-Stage Pipeline
     - Ideal pipeline: IPC = 1
       - Are there enough resources to keep all pipeline stages busy all the time?
     [Diagram: Inst. Fetch, Decode, Execute, Memory, and Writeback stages drawn with PC and a +4 adder, instruction Mem, Reg. File, ALU, data Mem, and Reg. File again]
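A sketch of the ideal pipeline on slide 29, printing which stage each instruction of the earlier example program occupies in each cycle. It deliberately ignores hazards (for example, SUB depending on AND and XOR), matching the "ideal" assumption; the names are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_chart(program):
    """Map each cycle (1-based) to {instruction: stage} for an ideal pipeline with no stalls."""
    chart = {}
    for i, inst in enumerate(program):
        for s, stage in enumerate(STAGES):
            # Instruction i enters IF in cycle i + 1 and advances one stage per cycle.
            chart.setdefault(i + s + 1, {})[inst] = stage
    return chart

program = ["AND R1,R2,R3", "XOR R4,R2,R3", "SUB R5,R1,R4", "ADD R6,R1,R4", "MUL R7,R5,R6"]
chart = pipeline_chart(program)
for cycle in sorted(chart):
    row = "  ".join(f"{chart[cycle].get(inst, '-'):>3}" for inst in program)
    print(f"cycle {cycle}: {row}")
```

In cycle 5 every stage holds a different instruction, which is exactly when the question on slide 29 matters: all five stages need their hardware at once.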

  30. Pipeline Hazards

  31. Pipeline Hazards
     - Structural hazards: multiple instructions compete for the same resource
     - Data hazards: a dependent instruction cannot proceed because it needs a value that hasn't been produced yet
     - Control hazards: the next instruction cannot be fetched because the outcome of an earlier branch is unknown
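To make the data-hazard definition concrete, this sketch scans the example program from slides 4-11 for read-after-write dependences. It assumes no forwarding and a register file that writes in the first half of a cycle and reads in the second (the fix on slide 35), under which a consumer 1 or 2 instructions after its producer would read a stale value; the window size and helper names are assumptions of the sketch.

```python
def parse(inst):
    """Split 'OP Rd,Rs,Rt' into (op, dest, [sources]); assumes this 3-register format."""
    op, operands = inst.split(maxsplit=1)
    regs = [r.strip() for r in operands.split(",")]
    return op, regs[0], regs[1:]

def raw_hazards(program, window=2):
    """Return (producer, consumer, register) for RAW dependences within `window`."""
    hazards = []
    for j, consumer in enumerate(program):
        _, _, sources = parse(consumer)
        for i in range(max(0, j - window), j):
            _, dest, _ = parse(program[i])
            if dest in sources:
                hazards.append((program[i], consumer, dest))
    return hazards

program = ["AND R1,R2,R3", "XOR R4,R2,R3", "SUB R5,R1,R4", "ADD R6,R1,R4", "MUL R7,R5,R6"]
for producer, consumer, reg in raw_hazards(program):
    print(f"{consumer} needs {reg} from {producer}")
```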

  32. Structural Hazards
     1. Unified memory for instruction and data
         R1 ← Mem[R2]
         R3 ← Mem[R20]
         R6 ← R4-R5
         R7 ← R1+R0

  33. Structural Hazards
     1. Unified memory for instruction and data
         R1 ← Mem[R2]
         R3 ← Mem[R20]
         R6 ← R4-R5
         R7 ← R1+R0
        Solution: separate instruction and data memories.
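With a single unified memory, a load or store in its MEM stage competes with whichever instruction is being fetched in that same cycle. This sketch finds those collisions for the program on slides 32-33, assuming the ideal, stall-free timing of the five-stage pipeline; the helper name is hypothetical.

```python
def unified_memory_conflicts(program, is_memory_op):
    """Cycles where one instruction's MEM stage and another's IF both need a single
    unified memory, assuming an ideal 5-stage pipeline with no stalls."""
    conflicts = []
    for i, inst in enumerate(program):
        if is_memory_op(inst):
            mem_cycle = i + 4            # IF in cycle i+1, so MEM (4th stage) in cycle i+4
            fetching = mem_cycle - 1     # index of the instruction in IF during that cycle
            if fetching < len(program):
                conflicts.append((mem_cycle, inst, program[fetching]))
    return conflicts

program = ["R1 ← Mem[R2]", "R3 ← Mem[R20]", "R6 ← R4-R5", "R7 ← R1+R0"]
print(unified_memory_conflicts(program, lambda inst: "Mem[" in inst))
```

The first load collides with the fetch of `R7 ← R1+R0` in cycle 4; the second load would collide with a fifth instruction if the program continued. Separate instruction and data memories remove the shared port entirely.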

  34. Structural Hazards
     1. Unified memory for instruction and data
     2. Register file with shared read/write access ports
         R1 ← Mem[R2]
         R3 ← Mem[R20]
         R6 ← R4-R5
         R7 ← R1+R0

  35. Structural Hazards
     1. Unified memory for instruction and data
     2. Register file with shared read/write access ports
         R1 ← Mem[R2]
         R3 ← Mem[R20]
         R6 ← R4-R5
         R7 ← R1+R0
        Solution: access the register file in half cycles (write in the first half, read in the second half).
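A sketch of the half-cycle register file. Under the ideal timing of slide 29, the load `R1 ← Mem[R2]` (first instruction) is in WB during cycle 5, the same cycle in which `R7 ← R1+R0` (fourth instruction) is in ID. Performing the write in the first half of the cycle and the reads in the second half lets the reader see the new value. The class name and the `cycle` method are hypothetical.

```python
class SplitCycleRegisterFile:
    """Hypothetical model: writes happen in the first half of a cycle, reads in the second."""

    def __init__(self):
        self.regs = [0] * 32

    def cycle(self, write=None, reads=()):
        # First half: the WB stage writes (dest, value), if any; $0 stays zero.
        if write is not None:
            dest, value = write
            if dest != 0:
                self.regs[dest] = value
        # Second half: the ID stage reads, and sees the value written above.
        return [self.regs[r] for r in reads]

rf = SplitCycleRegisterFile()
# Cycle 5: the load writes R1 = 42 while R7 ← R1+R0 reads R1 and R0.
print(rf.cycle(write=(1, 42), reads=(1, 0)))
```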
