
EE 457 Unit 6c Control Hazards



  1. EE 457 Unit 6c Control Hazards

  2. Control Hazards
     • Control (branch) hazards are named such because they deal with issues related to program control instructions (branch, jump, subroutine call, etc.).
     • There is some delay in determining the outcome of a branch or jump instruction, and thus incorrect instructions may already be in the pipeline.

         40: BEQ $1,$3,28
         44: AND $12,$2,$5
         48: OR  $13,$6,$2
         52: ADD $14,$2,$2
         …
         72: LW  $4,50($7)
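
As a quick check of the addresses in the listing above, the sketch below computes where the BEQ at address 40 lands when taken. It assumes the 28 in `BEQ $1,$3,28` is a byte displacement added to PC+4, which is inferred from the target at address 72 and not stated on the slide. The function name `branch_target` is hypothetical.

```python
def branch_target(branch_pc: int, byte_disp: int) -> int:
    """Target of a taken branch: address of the next instruction plus the displacement."""
    # Assumption: the displacement is in bytes and is relative to PC+4.
    return (branch_pc + 4) + byte_disp


print(branch_target(40, 28))  # 72, the address of the LW in the listing
```

Under this reading, the instructions at 44, 48 and 52 are the ones that may already be in the pipeline when the branch turns out to be taken.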

  3. An Opening Example
     • The BEQ outcome is known in the MEM stage (CC4).
     • 3 instructions enter the pipeline by CC4.
     • How can we solve this problem?
     [Pipeline diagram, CC1 to CC9, BEQ = true; each instruction passes through IM, Reg, ALU, DM, Reg: 40: BEQ $1,$3,28; 44: AND $12,$2,$5; 48: OR $13,$6,$2; 52: ADD $14,$2,$2; …; 72: LW $4,52($7)]

  4. Option 1: Stalling
     • Option 1: Start stalling the pipeline as soon as you detect that the instruction is a branch, and keep stalling until you know the outcome.
     • Disadvantages:
       – Penalty of 3 clocks for every branch
       – HW is not simplified: still need logic to stall, and still need to flush the following instruction
     [Pipeline diagram, CC1 to CC9, BEQ = true: 40: BEQ $1,$3,28; 44: AND $12,$2,$5; 48: OR $13,$6,$2; 52: ADD $14,$2,$2; …; 72: LW $4,52($7)]

  5. Option 2: Flushing (branch outcome true)
     • Option 2: The pipeline optimistically assumes sequential execution by default.
     • Since the incorrectly fetched instructions are still in stages [IF, ID, EX] that do not alter processor state (write a register or memory), they can be safely flushed. Let us add support for this flushing…
     • Still have a 3 clock penalty when the branch outcome is true.
     [Pipeline diagram, CC1 to CC9, BEQ = true: 40: BEQ $1,$3,28; 44: AND $12,$2,$5; 48: OR $13,$6,$2; 52: ADD $14,$2,$2; …; 72: LW $4,52($7)]

  6. Option 2: Flushing (branch outcome false)
     • Option 2: The pipeline optimistically assumes sequential execution by default.
     • Since the incorrectly fetched instructions are still in stages [IF, ID, EX] that do not alter processor state (write a register or memory), they can be safely flushed. Let us add support for this flushing…
     • No penalty when the branch outcome is false.
     [Pipeline diagram, CC1 to CC9, BEQ = false: 40: BEQ $1,$3,28; 44: AND $12,$2,$5; 48: OR $13,$6,$2; 52: ADD $14,$2,$2; …]
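
The two flushing cases can be sketched with the addresses from the example listing. This is only an illustration under the slides' assumptions: the BEQ is fetched in CC1, resolved in MEM in CC4, and one sequential instruction is fetched per clock in between. The names `SEQUENTIAL`, `RESOLVE_CYCLE` and `wrong_path_fetches` are hypothetical.

```python
# Addresses of the instructions that follow the BEQ at 40 in memory.
SEQUENTIAL = [44, 48, 52]
# The BEQ is fetched in CC1 and its outcome is known in the MEM stage (CC4).
RESOLVE_CYCLE = 4


def wrong_path_fetches(taken: bool) -> list[int]:
    """Addresses fetched after the BEQ that must be flushed."""
    # One sequential instruction is fetched in each of CC2..CC4.
    fetched = SEQUENTIAL[: RESOLVE_CYCLE - 1]
    # If the branch is not taken, the sequential guess was right: nothing to flush.
    return fetched if taken else []


print(wrong_path_fetches(True))   # [44, 48, 52] -> 3 clock penalty
print(wrong_path_fetches(False))  # [] -> no penalty
```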

  7. Flushing Strategy
     • To flush we merely override the pipeline control signals to insert 0’s, similar to the stall logic
       – Stall logic can be re-used and triggered by a successful branch (Branch AND ALUZero = 1)
       – Stalling only dealt with ID and subsequent stages, not the IF stage
       – A successful branch requires that the instruction in IF be discarded, but on the next cycle how will the DECODE stage know that the bits in the IF register are not a real instruction but a flushed/invalid instruction?
     • When a branch outcome is true we will…
       – Zero out the control signals in the ID, EX, MEM stages
       – Set a control bit in the IF/ID stage register that will tell the DECODE stage on the next clock cycle that the instruction is INVALID
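
A minimal behavioral sketch of this flush logic follows. It is not the actual hardware description: the function `flush_step`, the use of dictionaries for per-stage control signals, and the example signal names are assumptions made for illustration, and the ID/EX/MEM grouping simply follows the wording of the slide.

```python
def flush_step(branch, alu_zero, id_ctl, ex_ctl, mem_ctl):
    """Apply the flush logic for one clock.

    branch, alu_zero: 0/1 signals; id_ctl, ex_ctl, mem_ctl: dicts of control bits.
    Returns (flush, id_ctl, ex_ctl, mem_ctl, if_id_invalid).
    """
    flush = branch & alu_zero  # successful branch: Branch AND ALUZero = 1
    if flush:
        # Override the control signals with 0's, as the stall logic does for a bubble.
        id_ctl = {name: 0 for name in id_ctl}
        ex_ctl = {name: 0 for name in ex_ctl}
        mem_ctl = {name: 0 for name in mem_ctl}
    # Bit stored in the IF/ID stage register; DECODE reads it on the next clock.
    if_id_invalid = flush
    return flush, id_ctl, ex_ctl, mem_ctl, if_id_invalid


ctl = {"RegWrite": 1, "MemWrite": 1}
print(flush_step(1, 1, dict(ctl), dict(ctl), dict(ctl)))  # all control bits forced to 0
print(flush_step(1, 0, dict(ctl), dict(ctl), dict(ctl)))  # branch not taken: unchanged
```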

  8. Late Branch Determination
     [Datapath figure: pipelined datapath with PC, I-Cache, register file, ALU, D-Cache, Control, forwarding unit, and HDU (Stall, PCWrite, IRWrite), plus the FLUSH and IF.Flush signals driven by the branch outcome]

  9. Late Branch Determination
     [Datapath figure: same pipelined datapath as the previous slide]
     • What if the HDU declares a STALL at the same time a Branch is taken?
     • When we stall, PCWrite = 0 and won’t update the PC, and we will lose the Branch Target PC (PC=PC+disp).

  10. Late Branch Determination w/ HDU fix
      [Datapath figure: same pipelined datapath, with the HDU’s PCWrite OR’ed with the Flush signal]
      • Fix the HDU’s PCWrite by OR’ing with the Flush signal so that PCWrite will be ‘1’ whenever a branch is taken.
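
The effect of the fix can be checked with a small truth table. This is a sketch only; the function name `pc_write` and the modelling of the HDU's original output as "1 unless stalling" are assumptions.

```python
def pc_write(hdu_stall: int, flush: int) -> int:
    """PCWrite after the fix: the HDU's PCWrite OR'ed with the Flush signal."""
    hdu_pc_write = 0 if hdu_stall else 1  # a stall normally freezes the PC
    return hdu_pc_write | flush           # a taken branch always updates the PC


for stall in (0, 1):
    for flush in (0, 1):
        print(f"stall={stall} flush={flush} -> PCWrite={pc_write(stall, flush)}")
```

The case of interest is stall=1 with flush=1: without the OR, PCWrite would be 0 and the branch target would be lost.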

  11. Early Branch Determination
      • The stage distance between fetch and the stage where the branch outcome and target are determined sets how many instructions are flushed
        – Define this number as the branch penalty (how many instructions/clock cycles are wasted when a branch is taken)
      • If we can determine the branch outcome and compute the target earlier, we can reduce this penalty
      • Observation: All necessary information for both the branch outcome and the target computation is available (late) in the decode stage
        – Move the comparison and PC+disp. operations to the DECODE stage
        – Requires moving the forwarding logic, since branch instructions may need data from later in the pipe.
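
The definition of branch penalty as a stage distance can be written down directly. The sketch below assumes the 5-stage ordering used in these slides; `STAGES` and `branch_penalty` are hypothetical names.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def branch_penalty(resolve_stage: str) -> int:
    """Instructions flushed on a taken branch: stage distance from fetch to resolution."""
    return STAGES.index(resolve_stage) - STAGES.index("IF")


print(branch_penalty("MEM"))  # 3: late determination
print(branch_penalty("ID"))   # 1: early determination in DECODE
```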

  12. Early Branch Determination
      • Add a comparator to the Decode stage and move the forwarding into this stage
      • We now forward from the end of one stage to the end of a previous stage
      [Datapath figure: pipelined datapath with a comparator (=) on the register file outputs in Decode, the branch target adder (Sh. Left 2, +) moved to Decode, FLUSH/IF.Flush, HDU (Stall, PCWrite, IRWrite) taking EX.RegWrite and EX.RegDst, and the forwarding unit]

  13. Early Determination w/ Predict NT

          BEQ $a0,$a1,L1     (NT)
      L2: ADD $s1,$t1,$t2
          SUB $t3,$t0,$s0
          OR  $s0,$t6,$t7
          BNE $s0,$s1,L2     (T)
      L1: AND $t3,$t6,$t7
          SW  $t5,0($s1)
          LW  $s2,0($s5)

            Fetch  Decode  Exec.  Mem.  WB
            (IF)   (ID)    (EX)   (ME)
      C1    BEQ
      C2    ADD    BEQ
      C3    SUB    ADD     BEQ
      C4    OR     SUB     ADD    BEQ
      C5    BNE    OR      SUB    ADD   BEQ
      C6    AND    BNE     OR     SUB   ADD
      C7    ADD    nop     BNE    OR    SUB
      C8    SUB    ADD     nop    BNE   OR
      C9    OR     SUB     ADD    nop   BNE
      C10   BNE    OR      SUB    ADD   nop

      Using early determination & predict NT keeps the pipeline full when we are correct and has a single-instruction penalty for our 5-stage pipeline when we are wrong.
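
The stage occupancy for the code on this slide can be reproduced with a small simulation. This is a sketch under stated assumptions, not the course's reference model: branches are resolved while in ID, the instruction fetched in that same cycle is replaced by a nop when the branch is taken, and the outcomes are fixed (BEQ not taken, BNE taken) as annotated on the slide. All names (`PROGRAM`, `LABELS`, `TAKEN`, `simulate`) are hypothetical.

```python
NOP = ("nop", None)

# (mnemonic, branch target label or None), in program order from the slide.
PROGRAM = [
    ("BEQ", "L1"),  # not taken in this example
    ("ADD", None),  # L2
    ("SUB", None),
    ("OR", None),
    ("BNE", "L2"),  # taken in this example
    ("AND", None),  # L1
    ("SW", None),
    ("LW", None),
]
LABELS = {"L2": 1, "L1": 5}          # label -> index into PROGRAM
TAKEN = {"BEQ": False, "BNE": True}  # assumed fixed branch outcomes


def simulate(n_cycles):
    """Return one row per cycle with the mnemonics in IF, ID, EX, MEM, WB."""
    pipe = [None] * 5
    pc = 0
    rows = []
    for _ in range(n_cycles):
        fetched = PROGRAM[pc] if pc < len(PROGRAM) else None
        pipe = [fetched] + pipe[:4]  # every instruction advances one stage
        rows.append([ins[0] if ins else "-" for ins in pipe])
        pc += 1                      # predict not taken: fetch sequentially
        in_id = pipe[1]
        if in_id and in_id[1] is not None and TAKEN[in_id[0]]:
            pipe[0] = NOP            # squash the wrong-path fetch
            pc = LABELS[in_id[1]]    # redirect fetch to the branch target
    return rows


for n, row in enumerate(simulate(10), start=1):
    print(f"C{n:<3}" + " ".join(f"{m:<4}" for m in row))
```

With these assumptions, the only bubble in the ten cycles is the squashed AND that follows the taken BNE.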

  14. Branch Delay Slots
      • Problem: After a branch we fetch instructions that we are not sure should be executed
      • Idea: Find one or more instructions that should always be executed (independent of whether the branch is T or NT), move them to directly after the branch, and have the HW just let them execute (not flushed) no matter what the branch outcome is
      • Branch delay slot(s) = # of instructions that the HW will execute after a branch and not flush
        – Assuming early branch determination (i.e. in decode), only need 1 delay slot

  15. Branch Delay Slot Example
      • Assume a single instruction delay slot (as with our updated early determination pipeline)
      • Move an ALWAYS executed instruction (the “add” from above) down into the delay slot and let it execute no matter what

      “Before” Code:
          lw  $s3,0($s4)
          add $s0,$s1,$s2
          beq $s3,$t8, NEXT
          … delay slot instruc. …

      “After” Code:
          lw  $s3,0($s4)
          beq $s3,$t8, NEXT
          add $s0,$s1,$s2

      Flowchart perspective of the delay slot: lw $s3,0($s4), then BEQ, then the Delay Slot (add $s0,$s1,$s2), then either the Taken Path Code (T) or the Not Taken Path Code (NT).
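
One way to sanity-check a candidate for the delay slot is a register-dependence test against the instructions it is moved past. The slide only requires that the instruction always execute; the dependence check below is an added assumption about what makes the move safe, and the helper `can_move_below` with its `(dest, sources)` encoding is hypothetical.

```python
def can_move_below(candidate, passed):
    """True if `candidate` can be moved below every instruction in `passed`.

    Each instruction is a (dest, sources) pair; dest is None if nothing is written.
    """
    dest, srcs = candidate
    for p_dest, p_srcs in passed:
        if dest is not None and dest in p_srcs:
            return False  # a passed instruction needs the candidate's result
        if p_dest is not None and (p_dest in srcs or p_dest == dest):
            return False  # a passed instruction overwrites a register the candidate uses
    return True


add = ("$s0", {"$s1", "$s2"})  # add $s0,$s1,$s2
lw = ("$s3", {"$s4"})          # lw  $s3,0($s4)
beq = (None, {"$s3", "$t8"})   # beq $s3,$t8, NEXT

print(can_move_below(add, [beq]))  # True: beq does not read $s0
print(can_move_below(lw, [beq]))   # False: beq reads $s3, which lw produces
```

This matches the example: the add can drop into the delay slot, while the lw cannot because the branch compares the value it loads.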
