



  1. Slide 1: Lecture 8: Modern Dynamic Instruction Scheduling
     Topics: Tomasulo weakness, data forwarding, reg mapping table, generic superscalar models, examples.

     Slide 2: Tomasulo Performance
     Observe at the EX stage: how many cycles does it take to execute this code?
         LW  R2,45(R3)
         ADD R6,R2,R4
         SUB R10,R0,R6
         ADD R10,R10,R12
     Assume a load takes 1 cycle and an ALU operation takes 1 cycle.
     [Diagram labels: IM, Fetch Unit, Decode, Rename, Regfile, Reorder Buffer, S-buf, L-buf, RS, RS, DM, FU1, FU2]

     Slide 3: Tomasulo vs MIPS Pipeline
     How many cycles on the 5-stage MIPS pipeline?
     Why does the simple pipeline run faster?
       - Stall check
       - Data forwarding
     [Diagram labels: IF, ID, EX, MEM, WB]

     Slide 4: Tomasulo Complexity and Efficiency
     Modern processors employ deep pipelines.
     => Can the rename stage be finished in one fast cycle?
     => How are register contents stored?
     [Diagram labels: IM, Fetch Unit, Decode, Rename, Regfile, Reorder Buffer, S-buf, L-buf, RS, RS, DM, FU1, FU2]

     Slide 5: Review Tomasulo Inst Scheduling
     Both instructions are in RS; no contention on CDB or FU.
         ADD R2,R2,45   # R2 => tag p, result = A
         SUB R6,R2,R4   # R4 is ready, = B
     Cycle 1: ADD starts at FU, producing A.
     Cycle 2: ADD broadcasts p + A; SUB matches on p and accepts A.
     Cycle 3: SUB starts execution; FU calculates A-B.
     A is produced at cycle 1 but consumed at cycle 3. Is this unavoidable?

     Slide 6: Review Data Forwarding
     MIPS pipeline data forwarding: FU/MEM => FU. Why not in Tomasulo?
     But tag broadcasting has a one-cycle delay!
     When is it known that A will be ready?
     Cycle 1: A is to be ready.
     Cycle 2: A and its tag are broadcast.
     If the tag is broadcast one cycle earlier …
     Cycle 2: forward A from FU output to FU input …
     [Diagram labels: REG/ROB, FU, FU, bypass, ROB]
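The timing argument on slides 5 and 6 can be tried out with a small model. The following is a minimal sketch, not the lecture's own method: the function name `start_cycles` and the `broadcast_delay` parameter are hypothetical, and the model assumes all instructions already sit in reservation stations with no FU or CDB contention. A `broadcast_delay` of 1 models a result that is broadcast in the cycle after it is produced; a delay of 0 models a tag broadcast one cycle earlier with FU-to-FU forwarding.

```python
def start_cycles(deps, latency, broadcast_delay):
    """Return the cycle in which each instruction starts executing.

    deps[i]         -- indices of the earlier instructions that produce i's operands
    latency[i]      -- execution latency of instruction i, in cycles
    broadcast_delay -- cycles between a result being produced and a dependent
                       instruction being able to start (assumed parameter)
    """
    starts = []
    for producers in deps:
        ready = 1  # an instruction with no pending operands may start in cycle 1
        for p in producers:
            ready = max(ready, starts[p] + latency[p] + broadcast_delay)
        starts.append(ready)
    return starts


# Dependence chain of the slide 2 code:
#   LW R2,45(R3) -> ADD R6,R2,R4 -> SUB R10,R0,R6 -> ADD R10,R10,R12
deps = [[], [0], [1], [2]]
latency = [1, 1, 1, 1]  # load 1 cycle, ALU 1 cycle, as the slide assumes

classic = start_cycles(deps, latency, broadcast_delay=1)    # [1, 3, 5, 7]
forwarded = start_cycles(deps, latency, broadcast_delay=0)  # [1, 2, 3, 4]
```

Under these assumptions the dependent chain starts a new instruction every second cycle with delayed broadcast and every cycle with forwarding, which matches the "produced at cycle 1, consumed at cycle 3" observation for a producer/consumer pair.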

  2. Slide 7: Revise Scheduling* (*updated)
         RS1: ADD R6,R2,R4
         RS2: SUB R10,R0,R6
         RS3: ADD R12,R10,R6
     ADD(1) is ready and has been selected.
     Step 1: ADD(1)'s tag is broadcast and its operands are sent to the FU; SUB is woken up and selected.
     Step 2: SUB's tag is broadcast and its operands are sent to the FU; the forwarding logic replaces the 2nd operand with the FU output; ADD(2) is woken up, accepts the FU output, and is selected.
     Step 3: And so on.
     RS can be centralized or distributed.
     How to address CDB contention?
     [Diagram labels: RS 1 to 5, SELECT, REG/ROB Rd, FU, FU, "One cycle earlier"]

     Slide 8: Revise Pipeline Stages
     Two stage sequences are shown:
       - FETCH, RENAME, SCHEDULE, EXE, WB, COMMIT
       - FETCH, ISSUE, EXE, WB, COMMIT
     ISSUE: decode, rename, allocate RS and ROB, and read REG/ROB.
     EX: wake up and select instructions, then execute on the FU.

     Slide 9: Examples: Intel P6
       - 40-entry ROB
       - 20-entry RS station
       - Register Alias Table
     [Diagram labels: Decode, Decode, Rename, ROB Rd, RS, …]

     Slide 10: Rethink RS and ROB design
     Data broadcasting to RS stations:
       - Broadcasting saves the reg-write to reg-read delay.
       - n child instructions can receive data simultaneously.
     However:
       - Data forwarding can be used.
       - Not all n child instructions may fu-execute in the next cycle.
       - RS and ROB may store duplicate values.

     Slide 11: Physical Register
     Rename architectural registers to physical registers.
     NO real architectural registers (they are now virtual registers).
     RS => issue queue.
     Rename stage: allocate issue queue entry, allocate ROB, allocate physical register.
     Physical register: collection of all temporary register contents.
     What is the tag now?

     Slide 12: Register Mapping Approach
     RS entry: op, Qj, Qk, busy, Vj, Vk
     ROB entry: i-type, dest, PC, valid, result
     [Diagram labels: r_a, r_b, r_c; Mapping Table; p_a, p_b, p_c; physical registers p1, p2, p3, …, p_n; free list; alloc; val_a, val_b]
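The rename stage described on slides 11 and 12 (mapping table, free list, physical registers) can be illustrated with a short sketch. Everything named here is an assumption made for illustration: the class `Renamer`, its method `rename`, the register counts, and the initial identity mapping R_i -> p_i are not taken from the slides. In this sketch the physical register name plays the role that the RS tag played in the original Tomasulo scheme.

```python
from collections import deque


class Renamer:
    """Toy rename stage: mapping table plus free list (illustrative only)."""

    def __init__(self, num_arch, num_phys):
        # Assumed initial state: architectural Ri maps to physical pi.
        self.table = {f"R{i}": f"p{i}" for i in range(num_arch)}
        # The remaining physical registers form the free list.
        self.free = deque(f"p{i}" for i in range(num_arch, num_phys))

    def rename(self, dest, srcs):
        # Sources are looked up before the destination mapping changes,
        # so an instruction like ADD R10,R10,R12 reads the old R10 mapping.
        phys_srcs = [self.table[s] for s in srcs]
        if not self.free:
            raise RuntimeError("no free physical register; rename must stall")
        new_dest = self.free.popleft()
        old_dest = self.table[dest]  # released later, when this instruction commits
        self.table[dest] = new_dest
        return new_dest, phys_srcs, old_dest


# The three instructions from slide 7.
r = Renamer(num_arch=16, num_phys=32)
i1 = r.rename("R6", ["R2", "R4"])    # ADD R6,R2,R4   -> ("p16", ["p2", "p4"], "p6")
i2 = r.rename("R10", ["R0", "R6"])   # SUB R10,R0,R6  -> ("p17", ["p0", "p16"], "p10")
i3 = r.rename("R12", ["R10", "R6"])  # ADD R12,R10,R6 -> ("p18", ["p17", "p16"], "p12")
```

After renaming, SUB and the second ADD name p16 as a source, so the dependence on the first ADD is carried by the physical register number rather than by a reservation station tag.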

  3. Slide 13: Mis-speculation Recovery
     RS+ROB: no changes to arch. registers, so just clear the pipeline and re-fetch.
     Fundamental issue: software does not see wrong register contents.
     Recovery for the mapping approach: roll back the mapping table to the mis-speculation point.
     Architectural registers => virtual registers.
     How to implement a mapping table supporting recovery?
     [Diagram labels: Committed mapping, mapping 1, mapping 2, ROB, p1, p2, p3, …, p_n, status]

     Slide 14: Change of pipeline
     Stages: FETCH, RENAME, SCHEDULE, REG, EXE, WB, COMMIT
     [Diagram labels: IM, Fetch Unit, Decode, Rename, ROB, issue queue, phy. regfile, mapping table, S-buf, L-buf, FU1, FU2, DM]

     Slide 15: Example: Intel Pentium 4
     Pipeline stages shown: Alloc, Rename, Rename, Queue, Schd, Schd, Schd, Disp, Disp, Reg, Reg, Ex
     [Figure label: 128 entries]

     Slide 16: Alpha 21264 Pipeline
     [Figure; no text recoverable]

     Slide 17: Summary of Dynamic Scheduling
     CDC6600: introduces scoreboarding.
     Tomasulo: introduces renaming and tag broadcasting.
     Reorder buffer: provides in-order commit.
     Real OOO processors are very complicated (like a vehicle), but all are rooted in those basic designs.
     Pipeline stages:
       - Renaming (in-order)
       - Schedule
       - Commit (in-order)
     Two organizations:
       - Mapping table + phy reg + issue queue + ROB; REN => SCHD => REG
       - Reg alias table + RS + ROB, reg in RS and ROB; REN => REG => SCHD
     Scheduling methods bring impl variants:
       - Tag broadcasting vs. scoreboarding (later)

     Slide 18: Generic Superscalar Processor Models
     Issue queue based. [Figure labels: Fetch, Rename, Schedule (wakeup, select), Regfile, FU, bypass, D-cache, execute, commit]
     Reservation based. [Figure labels: Fetch, Rename, Reg, ROB, Schedule (wakeup, select), FU, bypass, D-cache, execute, commit]
     Source: Palacharla PhD thesis 1998
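Slide 13 asks how a mapping table could support recovery. One possible answer, offered here only as a sketch and not as the approach the lecture settles on, is to snapshot the mapping table and free list at each speculated branch and restore the snapshot on a mis-speculation. The class `CheckpointedMap` and its methods are hypothetical names, and the model ignores registers freed by instructions that commit while a checkpoint is live.

```python
class CheckpointedMap:
    """Toy mapping table with per-branch snapshots (illustrative only)."""

    def __init__(self, num_arch, num_phys):
        # Assumed initial state: architectural Ri maps to physical pi.
        self.table = {f"R{i}": f"p{i}" for i in range(num_arch)}
        self.free = [f"p{i}" for i in range(num_arch, num_phys)]
        self.checkpoints = []  # one snapshot per unresolved branch, oldest first

    def rename_dest(self, dest):
        # Allocate a fresh physical register for the destination.
        new = self.free.pop(0)
        self.table[dest] = new
        return new

    def checkpoint(self):
        # Copy the current state when a branch is renamed; return its id.
        self.checkpoints.append((dict(self.table), list(self.free)))
        return len(self.checkpoints) - 1

    def recover(self, cp_id):
        # Roll back to the mis-speculation point and drop this checkpoint
        # together with all younger ones.
        table, free = self.checkpoints[cp_id]
        self.table, self.free = dict(table), list(free)
        del self.checkpoints[cp_id:]


m = CheckpointedMap(num_arch=4, num_phys=8)
m.rename_dest("R1")   # R1 -> p4, before the branch
cp = m.checkpoint()   # branch is predicted here
m.rename_dest("R1")   # speculative: R1 -> p5
m.rename_dest("R2")   # speculative: R2 -> p6
m.recover(cp)         # branch was mispredicted
# m.table["R1"] is "p4" again, and p5, p6 are back on the free list.
```

Copying the whole table per branch is simple but costs storage proportional to the number of in-flight branches; walking the ROB backwards to undo mappings one at a time is a commonly described alternative that trades storage for recovery time.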
