CS425 – Computer System Design
Lecture 10 – Pipelining Hazards
Shankar Balachandran
Dept. of Computer Science and Engineering, IIT-Madras
shankar@cse.iitm.ernet.in
8/28/2006
Recap
Reference
• Hennessy and Patterson
It's Not That Easy for Computers
• Limits to pipelining: hazards prevent the next instruction from executing during its designated clock cycle
  – Structural hazards: the hardware cannot support this combination of instructions (a single person to fold and put clothes away)
  – Data hazards: an instruction depends on the result of a prior instruction still in the pipeline (missing sock)
  – Control hazards: pipelining of branches and other instructions that change the PC
• Common solution: stall the pipeline until the hazard is resolved, inserting one or more “bubbles” in the pipeline
Structural Hazard – One Memory Port
[Figure: pipeline timing diagram in instruction order for Load, Inst 1, Inst 2, Inst 3. Each instruction passes through IF, Reg, ALU, Dm, Reg; a structural hazard is marked where two instructions need the single memory port in the same cycle.]
Resolving Structural Hazards
• Definition: an attempt to use the same hardware for two different things at the same time
• Solution 1: Wait
  ⇒ must detect the hazard
  ⇒ must have a mechanism to stall
• Solution 2: Throw more hardware at the problem
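To make "detect the hazard" and "stall" concrete, here is a minimal sketch under stated assumptions: a 5-stage pipeline with a single memory port shared by instruction fetch (first stage) and data access (fourth stage), and in-order issue. The function names and the cycle numbering (starting at 0) are hypothetical, chosen only for illustration.

```python
DM_OFFSET = 3  # assumption: data memory access is the 4th stage, 3 cycles after fetch


def structural_hazards(issue_cycles, is_load_store):
    """Return the cycles in which more than one instruction needs the memory port."""
    usage = {}
    for start, uses_dm in zip(issue_cycles, is_load_store):
        usage[start] = usage.get(start, 0) + 1            # instruction fetch
        if uses_dm:
            dm_cycle = start + DM_OFFSET
            usage[dm_cycle] = usage.get(dm_cycle, 0) + 1  # load/store data access
    return sorted(cycle for cycle, count in usage.items() if count > 1)


def schedule_with_stalls(is_load_store):
    """Solution 1: issue in order, delaying a fetch while the port is busy."""
    issue, busy, cycle = [], set(), 0
    for uses_dm in is_load_store:
        while cycle in busy:       # port taken by an earlier data access: stall
            cycle += 1
        issue.append(cycle)
        busy.add(cycle)
        if uses_dm:
            busy.add(cycle + DM_OFFSET)
        cycle += 1
    return issue


# Load followed by three instructions that do not access data memory.
program = [True, False, False, False]
print(structural_hazards([0, 1, 2, 3], program))  # conflict cycles without stalling
print(schedule_with_stalls(program))              # fetch cycles once stalls are added
```

Solution 2 corresponds to giving fetch and data access separate ports (for example, separate instruction and data memories), in which case the conflict check never fires.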
Detection and Resolution
[Figure: pipeline timing diagram in instruction order for Load, Inst 1, Inst 2, Stall, Inst 4. The stall appears as a row of bubbles, one per pipeline stage, and the following instruction is fetched one cycle later.]
Instruction Set and Structural Hazard
• Simple to determine the sequence of resources used by an instruction – the opcode tells it all
• Uniformity in resource usage
• Compare MIPS to IA32?
• MIPS approach => all instructions flow through the same 5-stage pipeline
Data Hazards
[Figure: pipeline timing diagram (stages IF, Reg, ALU, Dm, Reg) for the sequence below; every instruction after the add reads r1.]
add r1, r2, r3
sub r4, r1, r3
and r6, r1, r7
or r8, r1, r9
xor r10, r1, r11
Three Generic Data Hazards
• Read After Write (RAW): Instr J tries to read an operand before Instr I writes it
    I: add r1,r2,r3
    J: sub r4,r1,r3
• Caused by a “data dependence” (in compiler nomenclature). This hazard results from an actual need for communication.
Three Generic Data Hazards
• Write After Read (WAR): Instr J writes an operand before Instr I reads it
    I: sub r4,r1,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7
• Called an “anti-dependence” by compiler writers. This results from reuse of the name “r1”.
• Can’t happen in the MIPS 5-stage pipeline because:
  – All instructions take 5 stages, and
  – Reads are always in stage 2, and
  – Writes are always in stage 5
Three Generic Data Hazards
• Write After Write (WAW): Instr J writes an operand before Instr I writes it
    I: sub r1,r4,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7
• Called an “output dependence” by compiler writers. This also results from reuse of the name “r1”.
• Can’t happen in the MIPS 5-stage pipeline because:
  – All instructions take 5 stages, and
  – Writes are always in stage 5
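The three definitions can be checked mechanically from register names alone. The sketch below is illustrative only: the `Instr` tuple and the `dependences` function are hypothetical names, and the result lists the dependences between an earlier instruction I and a later instruction J. Whether a dependence becomes an actual hazard still depends on pipeline timing, as the MIPS bullets note.

```python
from typing import NamedTuple, Set, Tuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]


def dependences(i: Instr, j: Instr) -> Set[str]:
    """Name the dependences from earlier instruction i to later instruction j."""
    found = set()
    if i.dest in j.srcs:
        found.add("RAW")  # j reads what i writes: true data dependence
    if j.dest in i.srcs:
        found.add("WAR")  # j overwrites what i reads: anti-dependence
    if j.dest == i.dest:
        found.add("WAW")  # both write the same name: output dependence
    return found


# The I/J pairs from the three slides.
raw = dependences(Instr("add", "r1", ("r2", "r3")), Instr("sub", "r4", ("r1", "r3")))
war = dependences(Instr("sub", "r4", ("r1", "r3")), Instr("add", "r1", ("r2", "r3")))
waw = dependences(Instr("sub", "r1", ("r4", "r3")), Instr("add", "r1", ("r2", "r3")))
print(raw, war, waw)
```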
Forwarding to Avoid Data Hazard
[Figure: pipeline timing diagram (stages IF, Reg, ALU, Dm, Reg) for the sequence below, with forwarding paths carrying the add result for r1 to the later instructions.]
add r1, r2, r3
sub r4, r1, r3
and r6, r1, r7
or r8, r1, r9
xor r10, r1, r11
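HW Change for Forwarding
[Figure: datapath showing the ID/EX, EX/MEM, and MEM/WR pipeline registers, the register file (Registers), the NextPC and Immediate inputs, the ALU, and Data Memory, with multiplexers (mux) added to select forwarded values.]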
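The select logic for one of the forwarding multiplexers can be sketched as follows. This is a simplified illustration, not the exact control in the figure: `forward_select` and its string labels are hypothetical, each pipeline register is reduced to the destination register it holds (`None` if that instruction writes no register), and r0 is treated as never forwarded because it is hardwired to zero in MIPS.

```python
def forward_select(src_reg, ex_mem_dest, mem_wr_dest):
    """Choose where one ALU input should come from for source register src_reg."""
    if src_reg == "r0":
        return "REGFILE"  # r0 always reads as zero; nothing to forward
    if ex_mem_dest == src_reg:
        return "EX/MEM"   # newest value: produced by the previous instruction
    if mem_wr_dest == src_reg:
        return "MEM/WR"   # value produced two instructions earlier
    return "REGFILE"      # no in-flight writer: use the register file


# add r1,r2,r3 followed by sub r4,r1,r3 and then and r6,r1,r7
print(forward_select("r1", "r1", None))  # sub: add is in EX/MEM
print(forward_select("r1", "r4", "r1"))  # and: add has moved on to MEM/WR
```

Checking EX/MEM before MEM/WR matters when both stages hold a write to the same register: the more recent result must win.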
Data Hazard Even With Forwarding
[Figure: pipeline timing diagram (stages IF, Reg, ALU, Dm, Reg) for the sequence below; the loaded value of r1 is not available until after the lw's Dm stage.]
lw r1, 0(r2)
sub r4, r1, r6
and r6, r1, r7
or r8, r1, r9
Resolving this Load Hazard
[Figure: pipeline timing diagram for the sequence below; a bubble is inserted so that sub, and, and or are each delayed by one cycle after the lw.]
lw r1, 0(r2)
sub r4, r1, r6
and r6, r1, r7
or r8, r1, r9
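A minimal sketch of the interlock behind this figure, assuming full forwarding so that the only remaining stall is a load immediately followed by a use of the loaded register. The tuple layout `(text, op, dest, srcs)` and the function name `insert_load_bubbles` are hypothetical.

```python
def insert_load_bubbles(program):
    """Return the issue order, with a 'bubble' wherever a load-use stall is needed.

    program: list of (text, op, dest, srcs) tuples in program order.
    """
    issued = []
    prev = None
    for instr in program:
        text, _op, _dest, srcs = instr
        # Stall only when the immediately preceding instruction is a load
        # whose destination is one of this instruction's sources.
        if prev is not None and prev[1] == "lw" and prev[2] in srcs:
            issued.append("bubble")
        issued.append(text)
        prev = instr
    return issued


program = [
    ("lw r1, 0(r2)", "lw", "r1", ("r2",)),
    ("sub r4, r1, r6", "sub", "r4", ("r1", "r6")),
    ("and r6, r1, r7", "and", "r6", ("r1", "r7")),
    ("or r8, r1, r9", "or", "r8", ("r1", "r9")),
]
for line in insert_load_bubbles(program):
    print(line)
```

Only the sub needs a bubble of its own; the and and the or are pushed back by the same cycle and by then can obtain r1 through forwarding or the register file.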
Software Scheduling
Try producing fast code for
    a = b + c;
    d = e – f;
assuming a, b, c, d, e, and f are in memory.
Slow code:              Fast code:
    LW  Rb,b                LW  Rb,b
    LW  Rc,c                LW  Rc,c
    ADD Ra,Rb,Rc            LW  Re,e
    SW  a,Ra                ADD Ra,Rb,Rc
    LW  Re,e                LW  Rf,f
    LW  Rf,f                SW  a,Ra
    SUB Rd,Re,Rf            SUB Rd,Re,Rf
    SW  d,Rd                SW  d,Rd
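The benefit of the reordering can be checked by counting load-use stalls in each version. This sketch assumes a pipeline with full forwarding where the only stall is one cycle when an instruction uses a register loaded by the instruction immediately before it; the `(op, dest, srcs)` encoding and the name `count_load_stalls` are illustrative choices, not part of the lecture.

```python
def count_load_stalls(program):
    """Count adjacent load-use pairs; each costs one stall cycle in this model."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_srcs = cur
        if prev_op == "LW" and prev_dest in cur_srcs:
            stalls += 1
    return stalls


# (op, destination register or None, source registers)
LW_B = ("LW", "Rb", ())
LW_C = ("LW", "Rc", ())
LW_E = ("LW", "Re", ())
LW_F = ("LW", "Rf", ())
ADD = ("ADD", "Ra", ("Rb", "Rc"))
SUB = ("SUB", "Rd", ("Re", "Rf"))
SW_A = ("SW", None, ("Ra",))
SW_D = ("SW", None, ("Rd",))

slow = [LW_B, LW_C, ADD, SW_A, LW_E, LW_F, SUB, SW_D]
fast = [LW_B, LW_C, LW_E, ADD, LW_F, SW_A, SUB, SW_D]

print("slow stalls:", count_load_stalls(slow))
print("fast stalls:", count_load_stalls(fast))
```

In the fast version each load is separated from its first use by an independent instruction, so the same eight instructions run without load-use bubbles under this model.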
Instruction Set Connection
• What is exposed about this organizational hazard in the instruction set?
• k cycle delay?
  – Bad: CPI is not part of the ISA
• k instruction slot delay
  – A load should not be followed by a use of its value in the next k instructions
• Nothing, but code can reduce run-time delays
• MIPS did the transformation in the assembler
Historical Perspective: Microprogramming
[Figure: main memory holds the user program plus data (ADD, SUB, AND, ..., DATA; "this can change!"). One of these instructions is mapped into one of the sequences in the CPU's control memory, which drives the execution unit.]
• Supported complex instructions as a sequence of simple micro-instructions (RTs)
• Pipelined micro-instruction processing, but with a very limited view
• Could not reorganize macroinstructions to enable pipelining
Control Hazard on Branches => Three Stage Stall
[Figure: pipeline timing diagram (stages IF, Reg, ALU, Dm, Reg) for the sequence below.]
10: beq r1,r3,36
14: and r2,r3,r5
18: or r6,r1,r7
22: add r8,r1,r9
36: xor r10,r1,r11
Example: Branch Stall Impact
• If 30% of instructions are branches, a 3-cycle stall is significant
• Two-part solution:
  – Determine whether the branch is taken or not sooner, AND
  – Compute the taken-branch address earlier
• MIPS branch tests if register = 0 or ≠ 0
• MIPS solution:
  – Move the zero test to the ID/RF stage
  – Add an adder to calculate the new PC in the ID/RF stage
  – 1 clock cycle penalty for a branch versus 3
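To put a number on "significant", the sketch below computes the effective CPI for the two penalties mentioned above. It assumes an ideal base CPI of 1 and that every branch pays the full penalty; both are simplifying assumptions, and `effective_cpi` is a hypothetical helper name.

```python
def effective_cpi(base_cpi, branch_fraction, branch_penalty):
    """Average cycles per instruction when each branch adds branch_penalty stall cycles."""
    return base_cpi + branch_fraction * branch_penalty


three_cycle = effective_cpi(1.0, 0.30, 3)  # branch resolved late: 3-cycle stall
one_cycle = effective_cpi(1.0, 0.30, 1)    # zero test and target adder moved to ID/RF

print(f"3-cycle penalty: CPI = {three_cycle:.1f}")
print(f"1-cycle penalty: CPI = {one_cycle:.1f}")
```

Under these assumptions the 3-cycle stall nearly doubles the CPI, while the 1-cycle penalty adds 0.3 cycles per instruction on average.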