DLX Pipeline
• 2-stage fully pipelined adder
• 4-stage fully pipelined multiplier
• 5-cycle non-pipelined divider
[Figure: pipeline with stages IF, ID, the functional units EX, A1, A2, M1, M2, M3, M4 and DIV (5-cycle, non-pipelined), then MEM and WB.]

Structural Hazard: WB stage
A: ADD.D F0, F2, F4
B: L.D F18, 100(R4)

Cycle       1    2    3    4    5    6
A (ADD.D)   IF   ID   +    +    MEM  WB
B (L.D)          IF   ID   EX   MEM  WB

Contention for:
• Write ports in register file in WB stage (cycle 6)
• Data paths through MEM stage (cycle 5)

Structural Hazard: WB stage
A: DIV.D F0, F2, F4
B: MUL.D F6, F8, F10
C: ADD.D F12, F14, F16
D: ADD.D F18, F20, F22
E: L.D F24, 100(R4)

Cycle       1    2    3    4    5    6    7    8    9
A (DIV.D)   IF   ID   /    /    /    /    /    MEM  WB
B (MUL.D)        IF   ID   *    *    *    *    MEM  WB
C (ADD.D)             IF   ID   +    +    MEM  WB
D (ADD.D)                  IF   ID   +    +    MEM  WB
E (L.D)                         IF   ID   EX   MEM  WB

Contention for:
• Write ports in register file in WB stage (cycle 9)
• Data paths through MEM stage (cycle 8)

Solutions for WB Structural Hazards
1. Multiple write ports in register file
   • Extra hardware; slowdown
   • Should we design for the peak or the average number of writes per cycle?
2. Buffer requests at WB stage and write one at a time
   • How deep should the buffer queue be?
3. Stall: allow only 1 write to propagate to the WB stage
   • In MEM stage (EX/MEM pipeline register)
     – Easy (+)
     – Prioritize based on heuristics (longest latency) (+)
     – Need to propagate stall backwards (-)
     – Two sources of resource stalls (-)
   • In ID stage: only release an instruction that won't cause a hazard in the WB stage
     – Centralized handling of stalls (+)
     – Occurs earlier than necessary (-)
We will allow an S.D and an FP instruction to go through the MEM stage at the same time.

Stall in MEM stage
[Figure: DLX pipeline (IF, ID, functional units EX, A1, A2, M1, M2, M3, M4 and DIV (5-cycle, non-pipelined), MEM, WB) with a MUX added.]

Stall in ID stage
Check if the instruction currently in ID will use WB in the same cycle as a previously issued instruction. If so, stall; else issue the instruction.
Simple hardware implementation:
• Shift register of length L equal to the length of the longest path from ID to WB
  – Tracks the usage of WB for the next L cycles
  – Bit j of the shift register is True whenever an issued instruction will use WB j cycles from now
  – Every cycle, shift the contents by 1 bit (so bit j becomes bit j-1)
Assume the instruction in ID wants to use the register file in the WB stage:
1. Determine how many cycles later the instruction in ID will use the WB stage (say d); this depends on the FU required by the instruction
2. Check if bit d of the shift register is set. If set, stall the current instruction for 1 cycle; else set bit d of the shift register to 1
3. Shift the register one bit position

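The three steps above can be sketched in a few lines of Python. This is only an illustrative model, not the hardware design from the slides: the class name `WBReservation`, the helper `schedule`, and the ID-to-WB distances in `ID_TO_WB` are assumptions, with the distances read off the DLX pipeline used here (5-cycle divider, 4-stage multiplier, 2-stage adder, each followed by MEM and WB). The model also assumes the next instruction enters ID one cycle after the previous one issues.

```python
class WBReservation:
    """Shift register recording which future cycles already have a WB user."""

    def __init__(self, length):
        # bits[j] is True when an issued instruction will use WB j cycles from now.
        self.bits = [False] * (length + 1)

    def try_issue(self, d):
        """Step 2: reserve WB d cycles from now, or report a stall."""
        if self.bits[d]:
            return False          # slot taken: stall for one cycle
        self.bits[d] = True
        return True

    def tick(self):
        """Step 3: shift by one position, so bit j becomes bit j-1."""
        self.bits = self.bits[1:] + [False]


def schedule(program, id_to_wb):
    """Return the WB cycle of each instruction when conflicts stall in ID."""
    res = WBReservation(max(id_to_wb.values()))
    wb_cycles = []
    cycle = 2                     # first instruction: IF in cycle 1, ID in cycle 2
    for op in program:
        d = id_to_wb[op]          # step 1: distance from ID to WB for this FU
        while not res.try_issue(d):
            res.tick()
            cycle += 1
        wb_cycles.append(cycle + d)
        res.tick()
        cycle += 1
    return wb_cycles


# Assumed ID-to-WB distances for the DLX pipeline in these slides.
ID_TO_WB = {"DIV": 7, "MUL": 6, "ADD": 4, "LD": 3}
PROGRAM = ["DIV", "MUL", "ADD", "ADD", "LD"]
print(schedule(PROGRAM, ID_TO_WB))   # [9, 10, 11, 12, 13]
```

Under these assumptions each instruction gets its own WB cycle, so the single write port is never requested twice in one cycle.
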
Summary of Design Features to Avoid Structural Hazards
[Figure: shift register contents, bit positions 1 to 7, traced for the sequence DIV, MUL, ADD1, ADD2, LOAD, with the annotations "DIV Writes", "MUL Writes", "ADD1 Writes", "ADD2 Writes" and "LD Writes".]

Handling WB Conflict with stalls in ID Stage
A: DIV.D F0, F2, F4
B: MUL.D F6, F8, F10
C: ADD.D F12, F14, F16
D: ADD.D F18, F20, F22
E: L.D F24, 100(R4)

Cycle       1    2    3    4    5    6    7    8    9    10   11   12   13
A (DIV.D)   IF   ID   /    /    /    /    /    M    WB
B (MUL.D)        IF   ID   ID   *    *    *    *    M    WB
C (ADD.D)             IF   IF   ID   ID   ID   +    +    M    WB
D (ADD.D)                       IF   IF   IF   ID   +    +    M    WB
E (L.D)                                        IF   ID   ID   EX   M    WB

A repeated ID or IF entry marks a cycle in which the instruction is held in that stage.

Summary of Design Features to Avoid Structural Hazards
• Contention for data path between ID and EX stages: to allow multiple FUs to be simultaneously active
  – Fully pipelined FUs so that they have an initiation latency of 1 cycle
  – Allowed multiple instructions to be in the ID/EX pipeline register
    • Created separate pipeline registers for each non-pipelined FU
    • Sequence of DIV, MUL, ADD, ADD can all be simultaneously active
• Contention for FUs in EX stage
  – Fully pipeline the units
  – Replicate the units
• Contention for register file in WB stage
• Contention for datapaths from EX to WB
  – Single write port in register file
  – Only 1 completing instruction will reach the WB stage at any cycle
  – Implemented by stalling an instruction at the ID stage if it will want WB at the same cycle as an in-flight instruction

FP Pipeline Model
4-stage fully pipelined adder, non-pipelined multiplier and divider
[Figure: pipeline with stages IF, ID/REG, the functional units EX, A1, A2, A3, A4, MUL (4-cycle, non-pipelined) and DIV (6-cycle, non-pipelined), then MEM and WB.]

RAW Hazards
• Source instruction produces a value (writes a register), e.g. arithmetic, load
• Target instruction consumes the value (reads the register), e.g. arithmetic, store
Example:
A: ADD.D F0, F2, F4
B: ADD.D F6, F8, F0
C: MUL.D F10, F12, F14
Hazard Detection Unit: checks for RAW hazards for the instruction in the ID stage. Stalls the instruction in ID till data is available.
B is stalled in the ID stage till F0 is written by A at cycle 8.

Cycle       1    2    3    4    5    6    7    8    9    10   11   12
A           IF   ID   +    +    +    +    MEM  WB
B                IF   ID   s    s    s    s    s    +    +    +    +
C                     IF   s    s    s    s    s    ID   *    *    *

s = stall cycle (the instruction is held in its current stage: B in ID, C in IF)

Hazard Detection
Simple signaling mechanism indicating that a register is in the process of being changed.
For each register R:
• BUSY[R] flag: indicates that R is the destination of an in-flight instruction
• Set BUSY[R] to TRUE when an instruction that writes to R is issued (leaves ID/REG stage)
• Clear BUSY[R] when the instruction writes to R in the WB stage
Issue stage (ID/REG stage): let the instruction being issued have source registers S1, S2 and destination register D
    while (BUSY[S1] OR BUSY[S2])
        stall instruction in ID/REG stage;
    BUSY[D] = TRUE;
WB stage (first half of cycle):
    Write result to destination register D;
    BUSY[D] = FALSE;

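As a rough software analogue of the BUSY flags, the sketch below keeps the set of busy registers and exposes the issue-stage check and the WB-stage clear. The class and method names (`BusyFlags`, `can_issue`, `issue`, `write_back`) are hypothetical and chosen only for illustration.

```python
class BusyFlags:
    """One BUSY flag per register, held as the set of registers that are busy."""

    def __init__(self):
        self.busy = set()

    def can_issue(self, sources):
        """Issue-stage check: every source register must be non-busy."""
        return not any(s in self.busy for s in sources)

    def issue(self, dest):
        """Instruction leaves ID/REG: mark its destination as busy."""
        self.busy.add(dest)

    def write_back(self, dest):
        """WB stage, first half of the cycle: result written, flag cleared."""
        self.busy.discard(dest)


flags = BusyFlags()
flags.issue("F0")                                 # ADD.D F0, F2, F4 issues
stalled = not flags.can_issue(["F8", "F0"])       # ADD.D F6, F8, F0 must wait
flags.write_back("F0")                            # first instruction reaches WB
released = flags.can_issue(["F8", "F0"])          # the reader may now issue
```

Because the flag is cleared in the first half of the WB cycle, a stalled reader can pass the check in the second half of that same cycle.
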
Hazard Detection
Is the simple signaling mechanism sufficient?
LD F0, 0(R1)
MUL F6, F2, F4
ADD F10, F0, F12

Cycle       1    2    3    4    5    6    7    8    9
LD          IF   ID   EX   MEM  WB
MUL              IF   ID   *    *    *    *    MEM  WB
ADD                   IF   ID   ID   +    +    +    +

• LD sets the BUSY[F0] flag when it issues and clears it in its WB stage
• ADD's first ID cycle is a stall cycle: it waits for BUSY[F0] to be unset

Hazard Detection
Is the simple signaling mechanism sufficient? Is it possible for the reader to read the result of an unintended instruction?
LD F0, 0(R1)
MUL F0, F2, F4
ADD F10, F0, F12

Cycle       1    2    3    4    5    6    7    8    9
LD          IF   ID   EX   MEM  WB
MUL              IF   ID   *    *    *    *    MEM  WB
ADD                   IF   ID   ID   +    +    +    +

• LD sets the BUSY[F0] flag when it issues and clears it in its WB stage
• ADD's first ID cycle is a stall cycle: it waits for BUSY[F0] to be unset
Will be handled by the method we will choose to remove WAW hazards.

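A minimal sketch of why the answer is yes when nothing else intervenes: with one bit per register, the flag cannot tell which in-flight writer it belongs to. The function `single_bit_trace` is a hypothetical helper that replays the LD / MUL / ADD sequence above, assuming MUL is allowed to issue without any WAW check.

```python
def single_bit_trace():
    """Replay the example with one BUSY bit for F0 and no WAW check."""
    busy = {"F0": False}
    busy["F0"] = True     # LD  F0, 0(R1) issues: flag set
    busy["F0"] = True     # MUL F0, F2, F4 issues: flag is already set
    busy["F0"] = False    # LD writes F0 in WB (cycle 5): flag cleared
    # ADD F10, F0, F12 now checks its source register F0.
    return not busy["F0"]


add_may_issue = single_bit_trace()   # True, although MUL has not written F0 yet
```

In this trace ADD would read the value loaded by LD rather than the product from MUL, which is the unintended read the slide asks about.
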
Forwarding for RAW Hazards
Forwarding hardware to directly move the output of an FU to the ID stage
• Hazard Detection Unit: checks for RAW hazards for the instruction in the ID stage. Stalls the instruction in ID till data is available
• Forwarding Unit: moves data directly from production to consumption points
Example:
A: ADD.D F0, F2, F4
B: ADD.D F6, F8, F0
C: MUL.D F10, F12, F14

Cycle       1    2    3    4    5    6    7    8
A           IF   ID   +    +    +    +    MEM  WB
B                IF   ID   ID   ID   ID   +    +
C                     IF   IF   IF   IF   ID   *

Forwarding Hardware
[Figure: FP pipeline (IF, ID, functional units EX, A1, A2, A3, A4, MUL (4-cycle, non-pipelined) and DIV (4-cycle, non-pipelined), MEM, WB) with forwarding hardware added. Labels: Data Select MUX, Value, Destination Register, Source Register, Operand Select MUX.]

Forwarding
Issue stage (ID/REG stage), Operand Select:
• If the destination register of the instruction completing its EX stage this cycle equals S1 or S2 (source registers of the instruction in the ID stage), then forward the value being generated by the EX stage
• At most 1 instruction (besides S.D) can complete the EX stage on any cycle (Why?)
[Figure: Data Select MUX with inputs from FU 1 .. FU n, selected by the FU of the completing instruction; outputs are the destination register and the result of the completing instruction.]

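The operand-select rule can be illustrated with a small function. This is a sketch under simplifying assumptions: `select_operand` is a hypothetical name, the register file is a plain dictionary, and the instruction completing EX this cycle is passed as a `(destination, value)` pair or `None`.

```python
def select_operand(src, regfile, completing):
    """Return the value for source register src of the instruction in ID.

    completing is (dest_register, value) for the one instruction finishing
    its EX stage this cycle, or None if no instruction is completing.
    """
    if completing is not None and completing[0] == src:
        return completing[1]      # forward the value straight from the FU
    return regfile[src]           # otherwise read the register file


regfile = {"F0": 1.0, "F8": 2.0}
completing = ("F0", 6.0)          # e.g. ADD.D F0, F2, F4 leaving its last EX stage
s1 = select_operand("F8", regfile, completing)   # 2.0, from the register file
s2 = select_operand("F0", regfile, completing)   # 6.0, forwarded
```

A single comparison per source register is enough here because at most one instruction completes EX in any cycle.
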
RAW Hazards
Example:
A: L.D F0, 0(R2)
B: ADD.D F6, F8, F0
C: MUL.D F10, F12, F14

Cycle       1    2    3    4    5    6    7    8
A           IF   ID   EX   MEM  WB
B                IF   ID   s    +    +    +    +
C                     IF   s    ID   *    *    *

s = stall cycle

WAR and WAW Hazards
WAR hazards: cannot arise (for the same reasons as in the integer pipeline)
• Instruction B issued implies earlier instruction A is in flight
• Instruction A in flight implies it has read the registers
WAW hazards: possible since path lengths to the write stage differ
A: ADD.D F0, F2, F4
B: L.D F0, 0(R2)

Cycle       1    2    3    4    5    6    7    8
A           IF   ID   +    +    +    +    MEM  WB
B                IF   ID   EX   MEM  WB

Writes completed out of order.
• Detect the WAW hazard in the ID stage. Stall the instruction in the ID stage till safe (in the example: 3 cycles)
• Prevent the write by the first instruction (A) by disabling its write control bit; A will pass through the WB stage at cycle 8 but will not write the register file

WAW Hazards
Detecting a WAW hazard is complicated by many possible cases:
• Stall the instruction in ID if its destination matches that of an in-flight instruction that will write its destination later than the instruction in ID
• If the path length from ID to WB for the current instruction is d, check for a match of destination registers with all instructions that are more than d cycles away from their WB stage (easy in principle!)
• Unnecessarily complicated for a rare event (how many comparisons are needed?)
Compromise: stall the instruction in ID if its destination matches that of any in-flight instruction
• May create unnecessary stalls (-)
• WAW hazards are relatively rare events (+)
• Hardware is simpler (+) (How to implement? The BUSY[D] flag directly provides the information)
Note: we are only stalling for instructions that write to a common register without an intervening read of that register; otherwise it is a RAW stall.

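The two stall rules can be compared side by side in a short sketch. The function names and the `(dest_register, cycles_until_wb)` representation of in-flight instructions are assumptions made for illustration; the conservative rule needs only the destination match, which is what a BUSY[D] flag records.

```python
def waw_stall_precise(dest, d, in_flight):
    """Stall only if a matching in-flight writer is more than d cycles from WB.

    in_flight is a list of (dest_register, cycles_until_wb) pairs;
    d is the ID-to-WB path length of the instruction in ID.
    """
    return any(r == dest and remaining > d for r, remaining in in_flight)


def waw_stall_conservative(dest, in_flight):
    """Compromise: stall on any in-flight writer of the same register."""
    return any(r == dest for r, _ in in_flight)


# L.D F0 (d = 3) sits in ID while ADD.D F0 is still 5 cycles from its WB stage.
early = [("F0", 5)]
# Three cycles later the ADD.D is only 2 cycles from its WB stage.
late = [("F0", 2)]

precise_early = waw_stall_precise("F0", 3, early)     # True: load would write first
precise_late = waw_stall_precise("F0", 3, late)       # False: safe to issue
conservative_late = waw_stall_conservative("F0", late)  # True: an unnecessary stall
```

The last line shows the cost of the compromise: the conservative rule keeps stalling in a case where the precise rule would already let the instruction issue.
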