CS425 Computer System Design, Lecture 10: Pipelining Hazards


  1. CS425 – Computer System Design, Lecture 10 – Pipelining Hazards
     Shankar Balachandran, Dept. of Computer Science and Engineering, IIT-Madras
     shankar@cse.iitm.ernet.in
     8/28/2006

  2. Recap

  3. Hennessy and Patterson Reference

  4. It's Not That Easy for Computers
     • Limits to pipelining: hazards prevent the next instruction from executing during its designated clock cycle
       – Structural hazards: the hardware cannot support this combination of instructions (a single person to fold and put clothes away)
       – Data hazards: an instruction depends on the result of a prior instruction still in the pipeline (missing sock)
       – Control hazards: pipelining of branches and other instructions that change the PC
     • A common solution is to stall the pipeline until the hazard is resolved, inserting one or more “bubbles” in the pipeline

  5. Structural Hazard – One Memory Port
     [Figure: pipeline diagram in instruction order (Load, Inst 1, Inst 2, Inst 3), each instruction passing through IF, Reg, ALU, Dm, Reg; a conflict is marked "Structural Hazard".]

  6. Resolving Structural Hazards
     • Definition: an attempt to use the same hardware for two different things at the same time
     • Solution 1: Wait
       ⇒ must detect the hazard
       ⇒ must have a mechanism to stall
     • Solution 2: Throw more hardware at the problem
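
The "wait" solution (detect, then stall) can be sketched in Python. This is only an illustration under stated assumptions: a 5-stage pipeline, a single memory port shared by instruction fetch and data access, and a fetch that is delayed whenever an earlier load or store holds the port. The function name and the "mem"/"alu" encoding are hypothetical, not from the lecture.

```python
def schedule_fetches(kinds):
    """Return the cycle in which each instruction is fetched.

    kinds: list of strings, "mem" for a load/store, "alu" for anything else.
    Assumption: an instruction fetched in cycle c uses the single memory
    port again in cycle c + 3 (its data-memory stage) if it is "mem".
    """
    port_busy = set()       # cycles in which a data access holds the port
    fetch_cycles = []
    cycle = 1
    for kind in kinds:
        while cycle in port_busy:   # structural hazard detected: stall fetch
            cycle += 1
        fetch_cycles.append(cycle)
        if kind == "mem":
            port_busy.add(cycle + 3)
        cycle += 1
    return fetch_cycles


# A load followed by three other instructions.
print(schedule_fetches(["mem", "alu", "alu", "alu"]))  # [1, 2, 3, 5]
```

The fourth instruction is fetched one cycle late because the load's data access occupies the port in cycle 4; that lost cycle is the bubble.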

  7. Detection and Resolution
     [Figure: pipeline diagram in instruction order (Load, Inst 1, Inst 2, a Stall row of five bubbles, then Inst 4), each instruction passing through IF, Reg, ALU, Dm, Reg.]

  8. Instruction Set and Structural Hazard
     • Simple to determine the sequence of resources used by an instruction
       – The opcode tells it all
     • Uniformity in resource usage
     • Compare MIPS to IA32?
     • MIPS approach => all instructions flow through the same 5-stage pipeline

  9. Data Hazards
     [Figure: pipeline diagram, each instruction passing through IF, Reg, ALU, Dm, Reg, for the sequence:]
       add r1, r2, r3
       sub r4, r1, r3
       and r6, r1, r7
       or  r8, r1, r9
       xor r10, r1, r11

  10. Three Generic Data Hazards
      • Read After Write (RAW): Instr J tries to read an operand before Instr I writes it
          I: add r1,r2,r3
          J: sub r4,r1,r3
      • Caused by a “Data Dependence” (in compiler nomenclature). This hazard results from an actual need for communication.

  11. Three Generic Data Hazards
      • Write After Read (WAR): Instr J writes an operand before Instr I reads it
          I: sub r4,r1,r3
          J: add r1,r2,r3
          K: mul r6,r1,r7
      • Called an “anti-dependence” by compiler writers. This results from reuse of the name “r1”.
      • Can’t happen in the MIPS 5-stage pipeline because:
        – All instructions take 5 stages, and
        – Reads are always in stage 2, and
        – Writes are always in stage 5

  12. Three Generic Data Hazards
      • Write After Write (WAW): Instr J writes an operand before Instr I writes it
          I: sub r1,r4,r3
          J: add r1,r2,r3
          K: mul r6,r1,r7
      • Called an “output dependence” by compiler writers. This also results from the reuse of the name “r1”.
      • Can’t happen in the MIPS 5-stage pipeline because:
        – All instructions take 5 stages, and
        – Writes are always in stage 5
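
The three dependence types can be told apart mechanically from the destination and source registers of an earlier instruction I and a later instruction J. The Python sketch below is a minimal illustration; the `(dest, sources)` encoding and the function name are assumptions made for the example. It reports dependences only. Whether a dependence becomes a hazard depends on the pipeline, as the slides note for WAR and WAW on the MIPS 5-stage pipeline.

```python
def classify_dependences(first, second):
    """Name the dependences from instruction `first` to a later `second`.

    Each instruction is a (dest, sources) pair, e.g. ("r1", ("r2", "r3")).
    """
    dest_i, srcs_i = first
    dest_j, srcs_j = second
    found = []
    if dest_i in srcs_j:
        found.append("RAW")   # J reads what I writes: data dependence
    if dest_j in srcs_i:
        found.append("WAR")   # J overwrites what I reads: anti-dependence
    if dest_i == dest_j:
        found.append("WAW")   # both write the same name: output dependence
    return found


add = ("r1", ("r2", "r3"))     # add r1,r2,r3
sub = ("r4", ("r1", "r3"))     # sub r4,r1,r3
sub_r1 = ("r1", ("r4", "r3"))  # sub r1,r4,r3

print(classify_dependences(add, sub))     # ['RAW']
print(classify_dependences(sub, add))     # ['WAR']
print(classify_dependences(sub_r1, add))  # ['WAW']
```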

  13. Forwarding to Avoid Data Hazard
      [Figure: pipeline diagram, each instruction passing through IF, Reg, ALU, Dm, Reg, for the sequence:]
        add r1, r2, r3
        sub r4, r1, r3
        and r6, r1, r7
        or  r8, r1, r9
        xor r10, r1, r11

  14. HW Change for Forwarding
      [Figure: datapath with forwarding multiplexers (mux); labeled elements: NextPC, Registers, Immediate, ID/EX, ALU, EX/MEM, Data Memory, MEM/WR.]
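
The job of the forwarding multiplexers can be sketched as a selection rule for one ALU input. This is a simplified model, not the lecture's circuit: the latch fields ("writes", "dest", "value") and the function name are hypothetical, and the rule assumes the younger result in EX/MEM takes priority over the older one in MEM/WR, with r0 never forwarded because it always reads as zero in MIPS.

```python
def select_operand(src_reg, regfile_value, ex_mem, mem_wr):
    """Pick the value for one ALU input.

    ex_mem and mem_wr are dicts describing the instruction held in each
    pipeline latch (hypothetical fields):
      "writes": True if that instruction writes a register
      "dest":   its destination register name
      "value":  the result held in the latch
    """
    # Check the younger result (EX/MEM) first, then the older one (MEM/WR).
    for latch in (ex_mem, mem_wr):
        if latch["writes"] and latch["dest"] == src_reg and src_reg != "r0":
            return latch["value"]
    # No in-flight producer: use the value read from the register file.
    return regfile_value


# add r1,r2,r3 has just left the ALU; sub r4,r1,r3 needs r1 now.
ex_mem = {"writes": True, "dest": "r1", "value": 7}
mem_wr = {"writes": False, "dest": None, "value": None}
print(select_operand("r1", 0, ex_mem, mem_wr))  # 7 (forwarded)
print(select_operand("r3", 5, ex_mem, mem_wr))  # 5 (from the register file)
```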

  15. Data Hazard Even With Forwarding
      [Figure: pipeline diagram, each instruction passing through IF, Reg, ALU, Dm, Reg, for the sequence:]
        lw  r1, 0(r2)
        sub r4, r1, r6
        and r6, r1, r7
        or  r8, r1, r9

  16. Resolving this Load Hazard
      [Figure: pipeline diagram for the same sequence, with one Bubble inserted in each of the three instructions after the load:]
        lw  r1, 0(r2)
        sub r4,r1,r6
        and r6,r1,r7
        or  r8,r1,r9
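
Why a load needs a bubble while an ALU instruction does not comes down to timing: a load's value exists only after its Dm stage, one stage later than an ALU result. The Python sketch below works this out, assuming full forwarding and a dependent instruction that needs the value on entering its ALU stage. The names are hypothetical and the stage numbering (IF=1, Reg=2, ALU=3, Dm=4, Reg write=5) is a convention chosen for the example.

```python
PRODUCED_IN_STAGE = {"alu": 3, "load": 4}  # stage at whose end the result exists
CONSUMED_IN_STAGE = 3                      # a dependent op needs it entering ALU


def bubbles_needed(producer, distance):
    """Bubbles to insert before a dependent instruction issued `distance`
    slots after `producer` ("alu" or "load"), assuming full forwarding."""
    # With the producer fetched in cycle 1, its stage s occupies cycle s,
    # so the result can first be used in the following cycle.
    first_usable_cycle = PRODUCED_IN_STAGE[producer] + 1
    # Without stalls, the consumer is fetched in cycle 1 + distance and
    # reaches its ALU stage in cycle distance + CONSUMED_IN_STAGE.
    needed_cycle = distance + CONSUMED_IN_STAGE
    return max(0, first_usable_cycle - needed_cycle)


print(bubbles_needed("load", 1))  # lw r1,0(r2) then sub r4,r1,r6 -> 1
print(bubbles_needed("alu", 1))   # add r1,r2,r3 then sub r4,r1,r3 -> 0
print(bubbles_needed("load", 2))  # use two slots after the load -> 0
```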

  17. Software Scheduling
      Try producing fast code for
          a = b + c;
          d = e - f;
      assuming a, b, c, d, e, and f are in memory.

      Slow code:          Fast code:
        LW  Rb,b            LW  Rb,b
        LW  Rc,c            LW  Rc,c
        ADD Ra,Rb,Rc        LW  Re,e
        SW  a,Ra            ADD Ra,Rb,Rc
        LW  Re,e            LW  Rf,f
        LW  Rf,f            SW  a,Ra
        SUB Rd,Re,Rf        SUB Rd,Re,Rf
        SW  d,Rd            SW  d,Rd
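
The benefit of the reordering can be checked by counting load-use stalls in each schedule. The sketch below is a rough model under assumptions not stated on the slide: full forwarding, so the only stall is a one-cycle bubble when an instruction uses a register loaded by the instruction immediately before it, and loads and stores that address memory by name only. The tuple encoding and the function name are hypothetical.

```python
def count_load_use_stalls(program):
    """Count one-cycle stalls where an instruction reads a register
    loaded by the instruction immediately before it.

    program: list of (op, dest, sources) tuples; dest is None for stores.
    """
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, curr_sources = curr
        if prev_op == "LW" and prev_dest in curr_sources:
            stalls += 1
    return stalls


slow = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("ADD", "Ra", ("Rb", "Rc")),
    ("SW", None, ("Ra",)), ("LW", "Re", ()), ("LW", "Rf", ()),
    ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]
fast = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("LW", "Re", ()),
    ("ADD", "Ra", ("Rb", "Rc")), ("LW", "Rf", ()), ("SW", None, ("Ra",)),
    ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]

print(count_load_use_stalls(slow))  # 2
print(count_load_use_stalls(fast))  # 0
```

Under this model the slow schedule stalls once before ADD and once before SUB, while the fast schedule places an independent instruction after each load whose value is needed next.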

  18. Instruction Set Connection
      • What is exposed about this organizational hazard in the instruction set?
      • k cycle delay?
        – Bad: CPI is not part of the ISA
      • k instruction slot delay
        – A load should not be followed by a use of the value in the next k instructions
      • Nothing, but code can reduce run-time delays
      • MIPS did the transformation in the assembler

  19. Historical Perspective: Microprogramming
      [Figure: main memory holds the user program plus data (ADD, SUB, AND, ..., DATA), marked "this can change!"; one of these instructions is mapped into one of the routines in the CPU's control memory, which drives the execution unit.]
      • Supported complex instructions as a sequence of simple micro-instructions (RTs)
      • Pipelined micro-instruction processing, but with a very limited view
      • Could not reorganize macroinstructions to enable pipelining

  20. Control Hazard on Branches => Three Stage Stall
      [Figure: pipeline diagram, each instruction passing through IF, Reg, ALU, Dm, Reg, for the sequence:]
        10: beq r1,r3,36
        14: and r2,r3,r5
        18: or  r6,r1,r7
        22: add r8,r1,r9
        36: xor r10,r1,r11

  21. Example: Branch Stall Impact
      • If 30% of instructions are branches, a 3-cycle stall is significant
      • Two-part solution:
        – Determine whether the branch is taken or not sooner, AND
        – Compute the taken-branch address earlier
      • MIPS branch tests if register = 0 or ≠ 0
      • MIPS solution:
        – Move the zero test to the ID/RF stage
        – Add an adder to calculate the new PC in the ID/RF stage
        – 1 clock cycle penalty for a branch versus 3
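
The impact of the branch penalty can be put in numbers with a simple CPI model. The sketch below assumes an ideal CPI of 1 with no other stalls and that every branch pays the full penalty; neither assumption is stated on the slide, and the function name is hypothetical.

```python
def cpi_with_branch_stalls(branch_fraction, penalty_cycles, base_cpi=1.0):
    """Average CPI when every branch adds `penalty_cycles` stall cycles."""
    return base_cpi + branch_fraction * penalty_cycles


three_cycle = cpi_with_branch_stalls(0.30, 3)  # branch resolved late
one_cycle = cpi_with_branch_stalls(0.30, 1)    # zero test and adder in ID/RF

print(round(three_cycle, 2))               # 1.9
print(round(one_cycle, 2))                 # 1.3
print(round(three_cycle / one_cycle, 2))   # 1.46
```

Under these assumptions, cutting the penalty from 3 cycles to 1 lowers CPI from about 1.9 to about 1.3, a speedup of roughly 1.46.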
