
EE 457 Unit 9b: In-Order Completion, Speculation



  1. EE 457 Unit 9b: In-Order Completion, Speculation

  2. Credits
     • Some of the material in this presentation is taken from:
       – Computer Architecture: A Quantitative Approach, John Hennessy & David Patterson
     • Some of the material in this presentation is derived from course notes and slides from:
       – Prof. Michel Dubois (USC)
       – Prof. Murali Annavaram (USC)
       – Prof. David Patterson (UC Berkeley)

  3. Tomasulo w/ Speculative Execution
     • In-order Issue
     • Out-of-Order Execution
     • In-order Completion
       – Completion = Commit = Graduation

  4. OoO Execution w/ ROB
     • ROB (Reorder Buffer) allows for OoO execution but in-order completion
     • Consider this sequence (assume mult takes several cycles):
         mult $5,$6,$7
         add  $2,$3,$4
         lw   $8,0($5)
         sub  $9,$0,$2
     • ROB entries after dispatch: 1 mult, 2 add, 3 lw, 4 sub
     • Simplification for EE457: a cache miss can occur for LW only, but SW always hits in cache (without this simplification we need to cover store buffer design and related issues)
     [Figure: datapath with I-Cache, Instruc. Queue, Br. Pred. Buffer, Dispatch, Reg. File, ROB, Mult. Queue, L/S Queue, Int. Queue, Div Queue, Addr. Buffer, Issue Unit, Exec. Units (Integer / Branch, Div, Mul, D-Cache), L/S Buffer, CDB]

  5. OoO Execution w/ ROB
     • ROB allows for OoO execution but in-order completion
     • Same sequence: mult $5,$6,$7; add $2,$3,$4; lw $8,0($5); sub $9,$0,$2
     • ROB state: 1 mult (Current Head), 2 Completed, 3 lw, 4 Completed (Current Tail)
     • ROB entry is allocated on dispatch
     • When an instruction executes, its result is stored in the ROB, then committed to the register file when it reaches the head of the ROB (in-order completion)
     [Figure: same datapath as the previous slide, with lw and mult still in the backend]
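
The in-order commit rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `RobEntry` class, the `commit_in_order` helper, and the result values (7, -7, 42) are hypothetical and not from the slides; only the four-instruction sequence and the "commit from the head" rule are.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    name: str                      # mnemonic, for display only
    rd: str                        # destination register
    completed: bool = False        # result has arrived on the CDB
    result: Optional[int] = None   # buffered result (made-up values below)


def commit_in_order(rob, regfile):
    """Retire entries from the head; stop at the first one not yet completed."""
    committed = []
    while rob and rob[0].completed:
        entry = rob.popleft()
        regfile[entry.rd] = entry.result
        committed.append(entry.name)
    return committed


# State from the slide: add and sub have completed, mult and lw have not.
rob = deque([
    RobEntry("mult", "$5"),
    RobEntry("add", "$2", completed=True, result=7),
    RobEntry("lw", "$8"),
    RobEntry("sub", "$9", completed=True, result=-7),
])
regfile = {}

stalled = commit_in_order(rob, regfile)   # nothing retires: mult is at the head
rob[0].completed, rob[0].result = True, 42
retired = commit_in_order(rob, regfile)   # mult and add retire; lw now blocks sub
```

Note that `sub` stays in the ROB even though it has completed, because `lw` is older and still pending.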

  6. Re-Order Buffer (ROB)
     • ROB is a FIFO (let's say 32 locations)
       – WP = Write pointer = Used by Dispatch Unit
         • Each instruction issues in order and "takes a number"
       – RP = Read pointer = Used for committing the most senior / oldest instruction when it has completed without generating an exception
     • Notes:
       1. WP - RP = number of items in the FIFO (depth)
       2. It is a circular FIFO/buffer
     • Example contents (the Result column is not shown):

         Entry  Valid  Rd    RegWrite
         0      0      0     1
         1      0      $2    1
         2      0      0     0
         3      1      $1    1         <- Top (rp)
         4      1      $2    1
         5      1      $15   1
         6      1      $2    1
         7      1      $12   1
         8      1      $2    0
         9      1      $7    0
         10     0      $13   1         <- Bottom (wp)
         11     0      0     1
         12     0      $4    0
         13     0      $2    1
         14     0      0     1
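
A minimal sketch of the circular FIFO behavior described above, assuming WP names the next free slot and RP the oldest entry. The `CircularRob` class and its method names are hypothetical, and the "leave one slot unused so that WP == RP always means empty" convention is an assumption of this sketch, not something the slides specify.

```python
class CircularRob:
    """Circular buffer where the slot index doubles as the instruction's tag."""

    def __init__(self, size=32):
        self.size = size
        self.slots = [None] * size
        self.wp = 0   # write pointer: next slot handed out at dispatch
        self.rp = 0   # read pointer: oldest (most senior) instruction

    def is_empty(self):
        return self.wp == self.rp

    def is_full(self):
        # Assumption: one slot stays unused so full and empty are distinguishable.
        return (self.wp + 1) % self.size == self.rp

    def dispatch(self, instr):
        """Allocate a slot ("take a number"); returns the tag, or None if full."""
        if self.is_full():
            return None
        tag = self.wp
        self.slots[tag] = instr
        self.wp = (self.wp + 1) % self.size   # wrap back to 0 after the last slot
        return tag

    def commit(self):
        """Release the oldest slot; returns its instruction, or None if empty."""
        if self.is_empty():
            return None
        instr = self.slots[self.rp]
        self.slots[self.rp] = None
        self.rp = (self.rp + 1) % self.size
        return instr


rob = CircularRob(size=4)
tags = [rob.dispatch(i) for i in ("mult", "add", "lw")]   # tags 0, 1, 2
refused = rob.dispatch("sub")      # None: dispatch must stall, ROB is full
oldest = rob.commit()              # "mult" leaves, freeing a slot
last_tag = rob.dispatch("sub")     # takes tag 3
rob.commit()                       # "add" leaves
wrapped_tag = rob.dispatch("or")   # tag wraps around to 0
```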

  7. Dispatch and the ROB
     • No more token FIFO (for tagging instructions) as in OoO execution and completion
       – ROB entry is allocated for an instruction on issue/dispatch
       – When an instruction finishes executing, its result is buffered in the ROB entry until it can be committed safely
     • It does not (and cannot) use the RST (Register Status Table) as before
       – When an instruction is dispatched, the ROB is searched for its source register (Rs and/or Rt) producers
         • If an entry in the ROB is producing Rs/Rt but has not yet executed, the ROB tag/slot of the producer is taken with the dependent instruction
         • If an entry in the ROB is producing Rs/Rt and the result is there waiting to be committed, that value is taken with the dependent instruction
         • If no entry in the ROB is producing Rs/Rt, data in the register file is taken with the dependent instruction
         • Since multiple entries in the ROB may match Rs/Rt, a priority resolver is necessary
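
The three dispatch cases above can be sketched as a single lookup. This is a simplified model, not the hardware: the function name `lookup_source`, the dictionary fields, and the sample values are hypothetical, the ROB is passed as a list already ordered oldest to youngest (so wrapping is ignored here), and the sketch assumes the priority resolver picks the youngest matching producer.

```python
def lookup_source(reg, rob_oldest_first, regfile):
    """Return ("tag", slot) or ("value", data) for one source register."""
    # Scan youngest to oldest so the most recent producer of `reg` wins.
    for entry in reversed(rob_oldest_first):
        if entry["reg_write"] and entry["rd"] == reg:
            if entry["completed"]:
                return ("value", entry["result"])   # result waiting to commit
            return ("tag", entry["tag"])            # producer not yet executed
    return ("value", regfile[reg])                  # no producer in the ROB


# Hypothetical ROB contents, oldest first.
rob = [
    {"tag": 3, "rd": "$2", "reg_write": True, "completed": True, "result": 10},
    {"tag": 4, "rd": "$2", "reg_write": True, "completed": False, "result": None},
    {"tag": 5, "rd": "$7", "reg_write": True, "completed": True, "result": 99},
]
regfile = {"$2": 1, "$7": 2, "$9": 3}

rs_from_tag = lookup_source("$2", rob, regfile)     # ("tag", 4): youngest producer
rs_from_rob = lookup_source("$7", rob, regfile)     # ("value", 99)
rs_from_file = lookup_source("$9", rob, regfile)    # ("value", 3)
```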

  8. Take a Number vs. Take a Token
     • ROB forms a virtual queue!
     • ROB Tag = Paper token taken by the customer
       – Helps to create a virtual queue
       – Recall that we wrap back to 0 after the maximum tag number
     • Contrast: in State Bank of India, the cashier issues a brass token to customers trying to draw money as an ID (and not at all to put them in any virtual queue / ordering). Token numbers are in random order. The cashier verifies the signature in the record rooms, returns with money, calls the token number and issues the money. Tokens are reclaimed & reused.

  9. Example 1 Solutions
     • Assume now serving customer 52
     • Case 1
       – Your number is 55 and mine is 65
       – I am 10 numbers after you (65 - 55 = 10)
     • Case 2
       – Your number is 55 and mine is 45
       – I am 90 numbers after you: 45 is below the "Now Serving" number, so it was pulled after the numbers wrapped, and (45 - 55) mod 100 = 90
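
The two cases can be checked by measuring every ticket relative to the "Now Serving" number. A minimal sketch, assuming tickets run 0 to 99 and wrap (the helper names are made up for illustration):

```python
def position_in_line(ticket, now_serving, modulus=100):
    """How many numbers past "Now Serving" this ticket is, with wraparound."""
    return (ticket - now_serving) % modulus


def numbers_after(mine, yours, now_serving, modulus=100):
    """Positive result: I am served after you. Negative: before you."""
    return (position_in_line(mine, now_serving, modulus)
            - position_in_line(yours, now_serving, modulus))


case_1 = numbers_after(mine=65, yours=55, now_serving=52)   # 13 - 3
case_2 = numbers_after(mine=45, yours=55, now_serving=52)   # 93 - 3
```

Comparing raw ticket numbers would get Case 2 wrong (45 < 55); comparing distances from the head of the line does not.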

  10. Computing Distance
      • To find how many people are waiting, subtract the "Now Serving" number from the last number pulled
      • Assume now serving customer 52
      • Example
        – Last number pulled = 92
        – "Now Serving" = 52
        – # Waiting = 92 - 52 = 40
      • But suppose the last number pulled is 32
        – Last number pulled = 32
        – "Now Serving" = 52
        – # Waiting = (32 - 52) mod 100 = (-20) mod 100 = 80

  11. Computing Distance
      • Depth = (WP - RP) mod 8
      • Four 8-entry circular FIFO examples:
        – FIFO initially empty: WP = 0, RP = 0, D = WP - RP = 0 - 0 = 0
        – FIFO depth = 4: WP = 4, RP = 0, D = WP - RP = 4 - 0 = 4
        – FIFO depth = 1: WP = 4, RP = 3, D = WP - RP = 4 - 3 = 1
        – FIFO depth = 7: WP = 2, RP = 3, D = WP - RP = (2 - 3) mod 8 = 7
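
The depth formula can be checked against the worked numbers from the two "Computing Distance" slides. A small sketch (the function name `fifo_depth` is made up for illustration):

```python
def fifo_depth(wp, rp, size):
    """Number of occupied entries in a circular FIFO of `size` slots."""
    # Python's % returns a non-negative result for a positive modulus,
    # so (2 - 3) % 8 is 7, matching the "mod" used on the slides.
    return (wp - rp) % size


# 8-entry FIFO examples
empty = fifo_depth(wp=0, rp=0, size=8)
four = fifo_depth(wp=4, rp=0, size=8)
one = fifo_depth(wp=4, rp=3, size=8)
wrapped = fifo_depth(wp=2, rp=3, size=8)

# "Take a number" examples with tickets 0..99
waiting = fifo_depth(wp=92, rp=52, size=100)
waiting_wrapped = fifo_depth(wp=32, rp=52, size=100)
```

In languages where the remainder of a negative number is negative (C, for example), the same computation would need `((wp - rp) % size + size) % size` or an unsigned subtraction with a power-of-two size.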

  12. ROB Dispatch for Rs
      • $2 is needed by dispatch
      • Which entry should be selected by you (the ROB)?

                        Scenario 0                Scenario 1
          Entry  Valid  Rd    RegWrite     Valid  Rd    RegWrite
          0      0      0     1            1      0     1
          1      0      $2    1            1      $2    1
          2      0      0     0            1      $10   1
          3      1      $1    1            0      $1    0
          4      1      $2    1            0      $21   1
          5      1      $15   1            0      $12   1
          6      1      $2    1            0      $2    0
          7      1      $12   1            0      $15   1
          8      1      $2    0            0      $22   1
          9      1      $7    0            1      $7    1
          10     0      $13   1            1      $13   0
          11     0      0     1            1      $2    1
          12     0      $4    0            1      $1    1
          13     0      $2    1            1      $2    0
          14     0      0     1            1      $3    1

      • Scenario 0: Top (rp) at entry 3, Bottom (wp) at entry 10
      • Scenario 1: Top (rp) at entry 9, Bottom (wp) at entry 3

  13. Dealing with Wrapping
      [Figure: two 32-entry ROBs (entries 0 to 31), Scenario 0 and Scenario 1, each divided into Set 0 and Set 1 by the positions of the Top Pointer (rp) and the Bottom Pointer (wp)]
      • In each scenario, which set should be given higher priority of selection to forward the value of a particular register?

  14. ROB Dispatch for Rs
      • Similar logic for Rt
      • Each ROB entry (0 to 31) holds: Rd, RdTag, Instruction Valid, Instruction completed, RdData; each entry's Rd is compared (=) against Rs
      • Priority Resolver (pass highest priority active input): resolve highest priority match of Rd to Rs for all valid instructions between Top Pointer and the last ROB entry (i.e. entry 31)
      • Priority Resolver (pass highest priority active input): resolve highest priority match of Rd to Rs for all valid instructions between the top ROB entry (i.e. entry 0) and Bottom Pointer
      • ROB Priority Resolver: selects the appropriate entry based on Top and Bottom Pointer locations
      • Outputs: Rs Data Valid, Rs Data, Rs Tag Valid, Rs Tag
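
A software sketch of the two-resolver arrangement, under several assumptions that are not stated on the slides: WP names the next free slot, each list element is the Rd of a valid register-writing entry (or None otherwise), at most N-1 entries are occupied, and the youngest matching producer should win. The function names and the 8-entry contents are hypothetical.

```python
def highest_match(entries, rs, lo, hi):
    """Priority resolver: highest-index entry in [lo, hi) producing rs, else None."""
    for i in range(hi - 1, lo - 1, -1):
        if entries[i] == rs:
            return i
    return None


def select_producer(entries, rs, rp, wp):
    """Pick the youngest producer of rs given Top (rp) and Bottom (wp) pointers."""
    upper = highest_match(entries, rs, rp, len(entries))   # Top Pointer .. last entry
    lower = highest_match(entries, rs, 0, wp)              # entry 0 .. Bottom Pointer
    if wp < rp:
        # Wrapped: entries 0..wp-1 were dispatched later, so they are younger.
        return lower if lower is not None else upper
    # Not wrapped: valid entries all lie in rp..wp-1, so the upper scan suffices.
    return upper


# Not wrapped: valid entries 2..5 (rp = 2, wp = 6)
flat = [None, None, "$2", "$1", "$2", "$7", None, None]
flat_pick = select_producer(flat, "$2", rp=2, wp=6)

# Wrapped: valid entries 5, 6, 7, 0, 1 (rp = 5, wp = 2)
wrapped = ["$2", "$4", None, None, None, "$2", "$2", "$9"]
wrapped_pick = select_producer(wrapped, "$2", rp=5, wp=2)
older_only = select_producer(wrapped, "$9", rp=5, wp=2)
no_producer = select_producer(wrapped, "$8", rp=5, wp=2)
```

When nothing matches, the sketch returns None, which corresponds to reading the operand from the register file.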

  15. Issue Queues
      • Dispatch Unit always places the instruction in the top register
      • Instruction(s) move forward if there is room at the bottom
      • Any instruction is a candidate for execution provided it is "ready"
      • Choose the senior-most
      [Figure: chain of registers (Reg.), each with a Controller, fed from Dispatch at the top and feeding the Issue Unit]
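
The selection rule (any ready instruction is a candidate, choose the senior-most) can be sketched as follows. The list-based queue, the `ready` flag, and the function name are assumptions made for illustration; index 0 stands for the bottom (oldest) register of the queue.

```python
def select_for_issue(queue):
    """Remove and return the oldest ready instruction, or None if none is ready."""
    for position, instr in enumerate(queue):   # index 0 is the senior-most
        if instr["ready"]:
            return queue.pop(position)         # younger entries move forward
    return None


queue = [
    {"op": "mult", "ready": False},   # oldest, still waiting on an operand
    {"op": "add", "ready": True},
    {"op": "sub", "ready": True},
]
first_issued = select_for_issue(queue)    # add: the oldest instruction that is ready
second_issued = select_for_issue(queue)   # sub
third_issued = select_for_issue(queue)    # None: only mult remains, not ready
```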

  16. SPECULATIVE EXECUTION

  17. Branch Prediction + Speculation
      • To keep the backend fed with enough work we need to predict a branch's outcome and perform "speculative" execution beyond the predicted (unresolved) branch
        – Roll back mechanism (flush) in case of misprediction
      [Figure: tree of basic blocks separated by conditional branches, with NT-path and T-path edges; the Head of ROB is at the top and the speculative execution path runs down through the predicted branches]

  18. Speculation Example
      • Predict branches and execute most likely path
        – Simply flush ROB entries after the mispredicted branch
        – Need good prediction capabilities to make this useful
      [Figure: tree of basic blocks with T / NT edges showing the correct path, the speculative path, and wrong-path execution; the instruction at the ROB head is assumed to stall. Four ROB snapshots, each drained from the Head by the Commit Unit:]
        – Time 1: Red entries = predicted branches
        – Time 2a: Black entry = mispredicted branch
        – Time 2b: Flush ROB/pipeline of instructions behind it
        – Time 3: Pipeline begins to fill w/ correct path

  19. Handling Jumps and Branches
      • IFQ is flushed every time a jump instruction enters the dispatch unit
      • When a branch enters the dispatch unit, branch prediction is performed using the BPB (Branch Prediction Buffer)
        – Last n (e.g. 3) bits of PC are used by the branch predictor
        – Branches are handled aggressively
          • They are handled as soon as they arrive on the CDB, without waiting for the branch to become the head of the ROB, so as to determine whether the prediction was correct and take appropriate action
          • Selective flushing mechanism is used to flush instructions in the backend in case of a mispredicted branch
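
A minimal sketch of a BPB indexed by a few low PC bits. The slides only say that the last n bits of the PC are used; everything else here is an assumption for illustration: the 2-bit saturating counters, the "weakly not taken" initial state, dropping the two byte-offset bits of a word-aligned PC before indexing, and the class and method names.

```python
class BranchPredictionBuffer:
    """Table of 2-bit saturating counters indexed by low-order PC bits."""

    def __init__(self, index_bits=3):
        self.mask = (1 << index_bits) - 1
        # Assumed encoding: 0, 1 predict not taken; 2, 3 predict taken.
        self.counters = [1] * (1 << index_bits)

    def _index(self, pc):
        # Assumption: instructions are word aligned, so skip the two zero bits.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        """True means predict taken."""
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        """Train the counter once the branch outcome is known."""
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


bpb = BranchPredictionBuffer(index_bits=3)
before = bpb.predict(0x40)      # not taken in the initial state
bpb.update(0x40, taken=True)
after = bpb.predict(0x40)       # now predicted taken
aliased = bpb.predict(0x60)     # 0x60 maps to the same entry as 0x40
neighbor = bpb.predict(0x44)    # a different entry, still untrained
```

With only 3 index bits, branches whose addresses differ by a multiple of 32 bytes share an entry, which is why `0x60` inherits the training done for `0x40`.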

  20. Flushing Mechanism
      • In order to flush instructions in the backend, a 'flush' signal is conveyed to the backend along with:
        – Current Top of ROB
        – Depth of the Branch Instruction
      • All instructions in the backend (as well as the ROB) with depth greater than the successful branch need to leave (be flushed)
      • Examples (32-entry ROB):
        – Flush Depth = (4 - 2) = 2
        – Flush Depth = (2 - 5) mod 32 = 29
      [Figure: two 32-entry ROBs, each marking the Top Pointer (rp), the Taken Branch, and WP]
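
The flush test can be sketched by converting tags to depths. This assumes the depth of an entry is (its ROB tag - Top Pointer) mod 32, which reproduces both worked values on the slide when read as branch tag 4 with rp = 2 and branch tag 2 with rp = 5; the function names are hypothetical.

```python
def rob_depth(tag, rp, size=32):
    """Distance of a ROB entry from the Top Pointer (rp), with wraparound."""
    return (tag - rp) % size


def should_flush(instr_tag, branch_tag, rp, size=32):
    """An instruction is flushed if it is deeper (younger) than the branch."""
    return rob_depth(instr_tag, rp, size) > rob_depth(branch_tag, rp, size)


# Slide example 1: rp = 2, branch at tag 4
depth_1 = rob_depth(4, rp=2)
younger_1 = should_flush(5, branch_tag=4, rp=2)   # dispatched after the branch
older_1 = should_flush(3, branch_tag=4, rp=2)     # dispatched before the branch

# Slide example 2: rp = 5, branch at tag 2 (the tags have wrapped)
depth_2 = rob_depth(2, rp=5)
younger_2 = should_flush(3, branch_tag=2, rp=5)   # depth 30, behind the branch
older_2 = should_flush(31, branch_tag=2, rp=5)    # depth 26, ahead of the branch
```

Comparing depths rather than raw tags is what keeps the second example correct: tag 31 is numerically larger than tag 2 but is older than the branch.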
