Chapter Six 1 2004 Morgan Kaufmann Publishers
Pipelining
• The laundry analogy for pipelining (figure)

Pipelining
• Improve performance by increasing instruction throughput
  – Single cycle: 2400 ps
  – Pipelined: 1400 ps
• Ideal speedup is the number of stages in the pipeline. Do we achieve this?

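
The two totals on this slide can be reproduced with a short calculation. The sketch below is only an illustration: it assumes three instructions, an 800 ps single-cycle clock, and a 200 ps pipeline stage, values chosen because they yield 2400 ps and 1400 ps. The function names are hypothetical.

```python
def single_cycle_time(n_instr: int, cycle_ps: int) -> int:
    """Total time when each instruction takes one long clock cycle."""
    return n_instr * cycle_ps


def pipelined_time(n_instr: int, n_stages: int, stage_ps: int) -> int:
    """Total time when a new instruction starts every stage time.

    The first instruction needs n_stages stage times; each later one
    finishes one stage time after its predecessor.
    """
    return (n_stages + n_instr - 1) * stage_ps


# Assumed values (not stated on the slide).
CYCLE_PS = 800   # single-cycle clock period
STAGE_PS = 200   # pipeline stage time
N_STAGES = 5

for n in (3, 1_000_000):
    t_single = single_cycle_time(n, CYCLE_PS)
    t_pipe = pipelined_time(n, N_STAGES, STAGE_PS)
    print(n, t_single, t_pipe, round(t_single / t_pipe, 2))
```

Under these assumed numbers the speedup for a long instruction stream approaches 800/200 = 4 rather than 5, because the single-cycle period is not five times the stage time; the ideal speedup needs perfectly balanced stages.
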
• Pipelining: key to making processors fast
  – An implementation technique in which multiple instructions are overlapped in execution.
• Execution of MIPS instructions classically takes 5 steps, so the MIPS pipeline uses 5 stages:
  1. IF: Fetch instr from mem.
  2. ID: Read regs while decoding the instr.
  3. EX: Execute the op or calculate an addr.
  4. MEM: Access an operand in data mem.
  5. WB: Write the result into a reg.
• Graphical representation of instr pipeline
  – Memory and register file are written/read in the first/last half of the clock cycle (shaded area: it is in use)
  [Figure: add $s0, $t0, $t1 passing through IF, ID, EX, MEM, WB; time axis 2 to 10]

Pipelining
• What makes it easy
  – all instructions are the same length
  – just a few instruction formats
  – memory operands appear only in loads and stores
  – operands must be aligned in memory
• What makes it hard? Situations in pipelining in which a planned instruction cannot execute in the proper clock cycle are called pipeline hazards.
  – structural hazards: suppose we had only one memory
  – data hazards: an instruction depends on a previous instruction
  – control hazards: need to worry about branch instructions

An Overview of Pipelining – Structural Hazard
• Structural Hazards
  – The situation that a planned instr cannot execute in the proper clock cycle because the hardware cannot support the combination of instructions that we want to execute in the same clock cycle.
• Example
  – The first instr is accessing data from memory while the fourth instr is fetching an instr from that same memory. A structural hazard occurs if there is only one memory.
• Solution: Add more hardware (add another memory)

An Overview of Pipelining – Data Hazard
• Data Hazard:
  – The situation that a planned instr cannot execute in the proper clock cycle because data that is needed to execute the instr is still in the pipeline (not yet available).
• Example 1
    add $s0, $t0, $t1
    sub $t2, $s0, $t3
• Example 2 (load-use data hazard)
    lw  $s0, 20($t1)
    sub $t2, $s0, $t3
• Solutions
  – Data forwarding (bypassing)
    • Retrieving the data early from internal buffers rather than waiting for it to arrive at the registers or memory.
  – Pipeline stall (bubbles)
    • A stall initiated in order to resolve a hazard.
  – Reordering code

An Overview of Pipelining – Data Hazard solution
• Forwarding (Example 1)
  [Figure: add $s0, $t0, $t1 followed by sub $t2, $s0, $t3; the result is forwarded from the output of add's EX stage to the input of sub's EX stage. Program execution order (in instructions) against time, axis 2 to 10.]
• Forwarding + stall (Example 2)
  [Figure: lw $s0, 20($t1) followed by a bubble and then sub $t2, $s0, $t3; the loaded value is forwarded from the output of lw's MEM stage to the input of sub's EX stage. Program execution order (in instructions) against time, axis 2 to 14.]
  – Forwarding alone cannot resolve this hazard.

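
The difference between the two examples can be worked out from the stage in which the result becomes available. The following is a minimal sketch, assuming the classic 5-stage pipeline, a register file written in the first half and read in the second half of a cycle, and a consumer that immediately follows the producer. The function name `bubbles_needed` and the timing model are illustrative, not taken from the slides.

```python
STAGE = {"IF": 0, "ID": 1, "EX": 2, "MEM": 3, "WB": 4}


def bubbles_needed(producer_op: str, forwarding: bool) -> int:
    """Stall cycles for a consumer issued right after the producer.

    Cycles are counted from 0, relative to the producer's IF stage.
    """
    if forwarding:
        # The value exists at the END of EX (ALU op) or MEM (load) and is
        # needed at the START of the consumer's EX stage.
        ready_end = STAGE["MEM"] if producer_op == "lw" else STAGE["EX"]
        earliest_use = ready_end + 1
        consumer_stage = STAGE["EX"]
    else:
        # Without forwarding the value is read from the register file.
        # It is written in the first half of the producer's WB cycle, so the
        # consumer's ID stage can share that cycle at the earliest.
        earliest_use = STAGE["WB"]
        consumer_stage = STAGE["ID"]
    # The consumer starts one cycle after the producer.
    unstalled_cycle = 1 + consumer_stage
    return max(0, earliest_use - unstalled_cycle)


print(bubbles_needed("add", forwarding=True))   # Example 1
print(bubbles_needed("lw", forwarding=True))    # Example 2
print(bubbles_needed("add", forwarding=False))
```

With forwarding, Example 1 needs no bubble while the load in Example 2 still needs one, which is why forwarding alone does not resolve the load-use hazard.
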
An Overview of Pipelining – Data Hazard solution
• Reordering code
  – (Ans.) The hazard occurs on $t2 between the 2nd lw and the 1st sw, so swapping the two sw instructions and using forwarding removes the stall.

An Overview of Pipelining – Control Hazard
• Control Hazard:
  – The situation that a planned instr cannot execute in the proper clock cycle because the instr fetched is not the one that is needed; that is, the flow of instr addresses is not what the pipeline expected.
• Example: Branch instr.
• Solutions
  – Stall
    • Wait until the pipeline determines the outcome of the branch and knows what instr address to fetch from.
  – Prediction
    • Predict the branch to be taken or untaken. When the guess is wrong, restart the pipeline from the proper branch address.
  – Delayed branch
    • Place an instr that is not affected by the branch immediately after the branch instr. A taken branch then changes the address of the instr that follows this safe instr.

An Overview of Pipelining – Control Hazard Solution
• Stall
  – For branch instrs:
    • Assumption: put in enough extra hardware to test regs, calculate the branch addr, and update the PC during the 2nd stage
    • E.g.: pipeline stall, bubble
    • Ex. Estimate the impact on the CPI of stalling on branches. Assume all other instrs have a CPI of 1 and branches are 13% of the instructions. (Ans. CPI = 1.13)

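
The answer follows from charging each branch one extra stall cycle, which is what resolving the branch in the 2nd stage implies. A small sketch of that arithmetic, with a hypothetical helper name and the one-cycle penalty stated as an assumption:

```python
def cpi_with_branch_stalls(branch_fraction: float,
                           stall_cycles: int = 1,
                           base_cpi: float = 1.0) -> float:
    """Average CPI when every branch adds `stall_cycles` bubbles.

    stall_cycles=1 assumes the branch is resolved in the 2nd stage.
    """
    return base_cpi + branch_fraction * stall_cycles


print(round(cpi_with_branch_stalls(0.13), 2))  # 1.13
```

Raising `stall_cycles` shows how quickly the penalty grows if the branch is resolved later in the pipeline.
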
An Overview of Pipelining – Control Hazard Solution
• Prediction
  1. Always predict branches as untaken
  [Figure: pipeline timing for a branch, shown for the branch-untaken case and the branch-taken case]

An Overview of Pipelining – Control Hazard Solution
• Prediction
  2. Have some branches predicted as taken and some as untaken
    • For example, always predict taken for branches that jump to an earlier address.
  3. Dynamic hardware prediction:
    • Make guesses depending on the behavior of each branch; the prediction for a branch may change over the life of a program.
    • E.g.: keep a history for each branch as taken or untaken, and then use the past to predict the future (90% accuracy)
• Misprediction:
  – When the guess is wrong, the pipeline control must ensure that the instrs following the wrongly guessed branch have no effect and must restart the pipeline from the proper branch addr.
  – Longer pipelines exacerbate the problem.

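
The "keep a history for each branch" idea can be sketched in software as a table with one bit per branch. This is only an illustration, not a description of any particular hardware: the class name, the dictionary keyed by branch address, and the example trace are all hypothetical.

```python
class OneBitPredictor:
    """Predicts that a branch will do what it did the last time."""

    def __init__(self, default_taken: bool = False):
        self.history = {}                 # branch address -> last outcome
        self.default_taken = default_taken

    def predict(self, branch_addr: int) -> bool:
        return self.history.get(branch_addr, self.default_taken)

    def update(self, branch_addr: int, taken: bool) -> None:
        self.history[branch_addr] = taken


def accuracy(predictor: OneBitPredictor, trace) -> float:
    """Fraction of correct predictions over (address, taken) pairs."""
    correct = 0
    for addr, taken in trace:
        if predictor.predict(addr) == taken:
            correct += 1
        predictor.update(addr, taken)
    return correct / len(trace)


# Hypothetical trace: a loop branch at 0x40 taken 9 times, then not taken.
trace = [(0x40, True)] * 9 + [(0x40, False)]
print(accuracy(OneBitPredictor(), trace))  # 0.8
```

On this trace the predictor misses only the first and the last iteration of the loop, which is the typical behavior of a single history bit.
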
An Overview of Pipelining – Control Hazard Solution
• Delayed branch: used by MIPS
  – Always executes the next sequential instr, with the branch taking place after that one-instr delay
  [Figure: beq $1, $2, 40, then add $4, $5, $6 in the delayed branch slot, then lw $3, 300($0); each instr passes through Instruction fetch, Reg, ALU, Data access, Reg, and successive instrs start 2 ns apart. Program execution order (in instructions) against time, axis 2 to 14.]

Pipeline Overview Summary
• Pipelining:
  – exploits parallelism among the instrs in a sequential instr stream
  – Substantial advantage: it is fundamentally invisible to the programmer
• Big Picture:
  – Pipelining increases the # of simultaneously executing instrs and the rate at which instrs are started and completed.
  – Pipelining does not reduce the time it takes to complete an individual instr.
  – Pipelining improves instr throughput rather than individual instr execution time.
  – Pipeline designers must cope with structural, control, and data hazards.
  – Branch prediction, forwarding, and stalls help make a computer fast while still getting the right answers.

Basic Idea
• The single-cycle datapath from Ch5
• There are two right-to-left flows
  – WB stage: causes data hazards
  – Selection of the next value of PC: causes control hazards

Pipelined Datapath
• Separate each pipeline stage by inserting a pipeline register
• Assume the register file is written/read in the first/last half of the clock cycle
Can you find a problem even if there are no dependencies? What instructions can we execute to manifest the problem?

Pipeline Examples: Load and Store
STORE
• IF
  – instr -> IF/ID
  – PC+4 -> PC
  – PC+4 -> IF/ID
• ID
  – Reg[IF/ID.rs] -> ID/EX
  – Reg[IF/ID.rt] -> ID/EX
  – IF/ID.Sign-extended 32bits -> ID/EX
  – IF/ID.pc+4 -> ID/EX
• EX
  – mem-addr -> EX/MEM
  – ID/EX.Reg[rt] -> EX/MEM
• MEM
  – Reg[rt] -> MEM[EX/MEM.mem-addr]
• WB
  – Do nothing
LOAD
• IF
  – instr -> IF/ID
  – PC+4 -> PC
  – PC+4 -> IF/ID
• ID
  – Reg[IF/ID.rs] -> ID/EX
  – Reg[IF/ID.rt] -> ID/EX
  – IF/ID.Sign-extended 32bits -> ID/EX
  – IF/ID.pc+4 -> ID/EX
• EX
  – mem-addr -> EX/MEM
• MEM
  – mem-data = MEM[EX/MEM.mem-addr] -> MEM/WB
• WB
  – MEM/WB.mem-data -> Reg[rt]
• Destination register number: IF/ID.rt -> ID/EX -> EX/MEM -> MEM/WB

Corrected Datapath
• ID/EX.Reg[rt] -> EX/MEM for sw
• IF/ID.rt -> ID/EX -> EX/MEM -> MEM/WB for lw

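
Why the lw destination register number has to travel through the pipeline registers can be seen with a toy model. The sketch below assumes a 5-stage pipeline with one instruction entering per cycle and no stalls; the program, the tuple layout, and the function name are hypothetical.

```python
# Back-to-back loads: (mnemonic, rt destination register number).
program = [("lw", 1), ("lw", 2), ("lw", 3), ("lw", 4), ("lw", 5)]


def dest_used_at_wb(program, index: int, carry_rt: bool):
    """Register number that gets written when instruction `index` is in WB."""
    if carry_rt:
        # Corrected datapath: rt moved IF/ID -> ID/EX -> EX/MEM -> MEM/WB
        # together with the instruction, so WB sees its own rt.
        return program[index][1]
    # Uncorrected datapath: the write register is taken from IF/ID.
    # While instruction i is in WB (stage 5), IF/ID holds the instruction
    # that is in ID (stage 2), i.e. the one fetched three cycles later.
    later = index + 3
    return program[later][1] if later < len(program) else None


print(dest_used_at_wb(program, 0, carry_rt=True))   # 1
print(dest_used_at_wb(program, 0, carry_rt=False))  # 4
```

In the uncorrected case the first load would write its data into register 4, even though no instruction depends on another.
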
Graphically Representing Pipelines
[Figure: multiple-clock-cycle pipeline diagram. Time (in clock cycles) runs across, CC 1 to CC 7; program execution order (in instructions) runs down: lw $1, 100($0); lw $2, 200($0); lw $3, 300($0). Each instr passes through IM, Reg, ALU, DM, Reg.]
• Can help with answering questions like:
  – how many cycles does it take to execute this code?
  – what is the ALU doing during cycle 4?
  – use this representation to help understand datapaths

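
The two questions on this slide can be answered mechanically once each instruction is assigned a stage per clock cycle. A minimal sketch, assuming one instruction enters the pipeline per cycle with no stalls; the helper names are hypothetical.

```python
STAGES = ["IM", "Reg", "ALU", "DM", "Reg"]


def stage_in_cycle(instr_index: int, cycle: int):
    """Stage used by instruction instr_index (0-based) in clock cycle
    `cycle` (1-based), or None if it is not in the pipeline."""
    pos = cycle - 1 - instr_index
    return STAGES[pos] if 0 <= pos < len(STAGES) else None


def total_cycles(n_instr: int) -> int:
    """Cycles until the last instruction leaves the pipeline."""
    return n_instr + len(STAGES) - 1


def alu_user(program, cycle: int):
    """Instruction occupying the ALU stage in the given cycle, if any."""
    alu_pos = STAGES.index("ALU")
    for i, instr in enumerate(program):
        if cycle - 1 - i == alu_pos:
            return instr
    return None


program = ["lw $1, 100($0)", "lw $2, 200($0)", "lw $3, 300($0)"]

# Print one row per instruction, one column per clock cycle.
for i, instr in enumerate(program):
    cells = [stage_in_cycle(i, c) or ""
             for c in range(1, total_cycles(len(program)) + 1)]
    print(f"{instr:<16}" + "".join(f"{cell:<5}" for cell in cells))

print(total_cycles(len(program)))  # 7
print(alu_user(program, 4))        # lw $2, 200($0)
```

For the three loads this gives 7 cycles in total, and in cycle 4 the ALU is computing the address for the second load.
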