CS104 Computer Organization and Design: Datapaths

CS104 (Hilton): Datapaths [Slides adapted from A. Roth’s] • Admin • Homework 4 out tonight • Due Monday, March 26th • Download/check your submissions • Reading: Chapter


  1. Micro-architectural factors • Micro-architecture: • The details of how the ISA is implemented • Affects CPI and clock frequency • Often will look at a fixed program and consider MIPS • Million Instructions Per Second • MIPS = IPC * Frequency (in MHz) • IPC = Instructions Per Cycle (1 / CPI) • Gives a “bigger is better” number • (Instructions / Cycle) x (Cycles / Second) = (Instructions / Second), i.e., IPC x Frequency = Throughput
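The MIPS relation on this slide can be checked with a few lines of Python. This is a minimal sketch; the function names and the example processor (CPI of 2 at 500 MHz) are illustrative assumptions, not values from the slides.

```python
def ipc_from_cpi(cpi):
    """IPC is the reciprocal of CPI."""
    return 1.0 / cpi


def mips(ipc, freq_mhz):
    """Throughput in millions of instructions per second: IPC * frequency (MHz)."""
    return ipc * freq_mhz


# Hypothetical processor: CPI = 2 at 500 MHz
print(mips(ipc_from_cpi(2.0), 500.0))  # 250.0
```

Because the frequency is given in MHz (millions of cycles per second), the product is already in millions of instructions per second.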

  2. “Best” IPC • For now, best we can do: IPC = 1 (CPI = 1) • Do 1 instruction every cycle • Later: • Real processors can do multiple instructions at once! • Potentially: IPC > 1! • Best possible IPC depends on design

  3. Performance vs …. • 1990s: Performance at all cost • Actually more “clock frequency” at all cost… • Now: Care about other things • Energy (electric bill, battery life) • Power (cooling, also affects energy) • Area (chip cost) • Reliability (tolerance of transient faults: e.g., charged particle strikes) • … • Important metric these days: “Performance / Watt” • Throughput divided by power consumption • Why?

  4. Performance Modeling and Analysis • Speaking of performance • Making a processor takes time (years) and money (millions) • Want to know it will perform well before you finish • If it’s wrong, doing it all over is painful… • Performance can be simulated in software • Estimate what IPC will be • Guide design • This is my other job by the way…

  5. Single-Cycle Datapath Performance [Figure: single-cycle datapath (PC, insn memory, register file, ALU, data memory) with control as ROM/random logic] • Goes against make common case fast (MCCF) principle + Low Cycles Per Instruction (CPI): 1 – Long clock period: to accommodate slowest insn

  6. Alternative: Multi-Cycle Datapath [Figure: multi-cycle datapath with latches (IR, A, B, O, D) between stages] • Multi-cycle datapath: attacks high clock period • Cut datapath into multiple stages (5 here), isolate using FFs • FSM control “walks” insns thru stages (by staging control signals) + Insns can bypass stages and exit early

  7. Finite State Machine (FSM) • FSM = States + Transitions • Next state: function of current state + inputs • Outputs: function of current state + inputs • Canonical Example: Combination Lock • Must enter 3 8 4 to unlock • P.S. Useful in software too

  8. Finite State Machines: Example • Combination Lock Example: • Need to enter 3 8 4 to unlock • Initial state (Start): no valid piece of combo seen

  9. Finite State Machines: Example • Combination Lock Example: • Need to enter 3 8 4 to unlock • Start: • Input of 3: transition to new state (state 1) • Any other input (0-2, 4-9): stay in same state

  10. Finite State Machines: Example • Combination Lock Example: • Need to enter 3 8 4 to unlock • State 1: • Input = 8? Go to state 2 • Input = 3? Stay in state 1 • Any other input (0-2, 4-7, 9): back to Start

  11. Finite State Machines: Example • Combination Lock Example: • Need to enter 3 8 4 to unlock • State 2: • Input = 4? Go to state 3 • Input = 3? Go to state 1 • Any other input (0-2, 5-9): back to Start

  12. Finite State Machines: Example • Combination Lock Example: • Need to enter 3 8 4 to unlock • State 3: Unlock!
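As the earlier FSM slide notes, state machines are useful in software too. The following is a minimal Python sketch of the 3 8 4 combination lock, with transitions read off the state diagram. The names `next_state` and `unlocks` are illustrative, and the behavior after reaching state 3 (returning to Start) is an assumption, since the diagram does not show transitions out of the unlock state.

```python
START, SAW_3, SAW_38, UNLOCKED = 0, 1, 2, 3


def next_state(state, digit):
    """Next-state function for the 3-8-4 lock."""
    if state == START:
        return SAW_3 if digit == 3 else START
    if state == SAW_3:
        if digit == 8:
            return SAW_38
        return SAW_3 if digit == 3 else START  # a repeated 3 keeps us in state 1
    if state == SAW_38:
        if digit == 4:
            return UNLOCKED
        return SAW_3 if digit == 3 else START  # a 3 restarts the combo at state 1
    return START  # assumption: after unlocking, go back to Start


def unlocks(digits):
    """Return True if the digit sequence ever reaches the unlock state."""
    state = START
    for digit in digits:
        state = next_state(state, digit)
        if state == UNLOCKED:
            return True
    return False


print(unlocks([3, 8, 4]))        # True
print(unlocks([3, 8, 3, 8, 4]))  # True: the second 3 starts a fresh attempt
print(unlocks([3, 8, 5, 4]))     # False
```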

  13. FSM in Hardware • Flip-flop(s) to hold state(s) • Combinational logic to determine next state/output • (Assumes FF enable on input_valid)

  14. FSM Hardware Example

  21. FSM Implementation: ROM [Figure: K-bit input and N-bit state register together index a 2^(N+K)-entry ROM; each entry holds the N-bit next state and the M-bit output] • Just saw: FSM implemented with sum-of-products • Remind us what that is? • Can also be implemented with a ROM

  22. FSM ROM Implementation Example • Combination Lock (3 8 4) Example • 4-bit input • 2-bit state • 64-entry ROM (indexed with S1 S0 I3 I2 I1 I0) • Each entry needs 3 bits (S1 S0 U) • 2 for next state • 1 for unlock signal • Example entries in ROM • 0x00 = 000 • 0x03 = 010 • 0x18 = 100 • 0x13 = 010 • 0x3_ = 001
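The ROM contents can be generated mechanically from the next-state function. The sketch below builds the 64-entry table, indexed by S1 S0 I3 I2 I1 I0, and checks it against the example entries listed on the slide. It assumes that unlock is asserted whenever the current state is 3, that state 3 returns to Start on any input (which is what the 0x3_ = 001 entries imply), and that the unused input codes 10-15 behave like any other wrong digit. The helper names are illustrative.

```python
def next_state(state, digit):
    """Next state of the 3-8-4 lock (states 0..3, 4-bit input)."""
    if state == 3:
        return 0  # assumption implied by the slide's 0x3_ = 001 entries
    if digit == 3:
        return 1  # a 3 always starts (or restarts) the combination
    if state == 1 and digit == 8:
        return 2
    if state == 2 and digit == 4:
        return 3
    return 0


def build_rom():
    """64 entries; address = S1 S0 I3 I2 I1 I0, data = S1' S0' U."""
    rom = []
    for address in range(64):
        state, digit = address >> 4, address & 0xF
        unlock = 1 if state == 3 else 0
        rom.append((next_state(state, digit) << 1) | unlock)
    return rom


rom = build_rom()
for address in (0x00, 0x03, 0x18, 0x13, 0x30):
    print(f"0x{address:02x} = {rom[address]:03b}")
```

One clock step of the FSM is then a single lookup: `rom[(state << 4) | digit]`, with the top two bits of the result latched as the new state.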

  23. Multi-cycle Datapath FSM • First state: Get a New Instruction (Next Insn) • Output signals to fetch (e.g., read enable IMEM) • Next State: Always Decode

  24. Multi-cycle Datapath FSM • Second State: Decode • Output signals to decode instruction (RdEn RegFile) • Go to Next Insn if NOP • Otherwise Execute

  25. Multi-cycle Datapath FSM • Execute State • Execute Insn (varies by insn type) • Next State: Also depends on insn type • Branches: Next Insn

  26. Multi-cycle Datapath FSM • Execute State • Execute Insn (varies by insn type) • Next State: Also depends on insn type • ALU op: write register (Writeback)

  27. Multi-cycle Datapath FSM • Execute State • Execute Insn (varies by insn type) • Next State: Also depends on insn type • Load: Read Memory (Read DMEM)

  28. Multi-cycle Datapath FSM • Execute State • Execute Insn (varies by insn type) • Next State: Also depends on insn type • Store: Write Memory (Write DMEM)

  29. Multi-cycle Datapath FSM • Read DMEM State • Control signals enable DMEM Read • Next state is Writeback

  30. Multi-cycle Datapath FSM • Writeback state • Control signals enable regfile write • Next state: Next Insn

  31. Multi-cycle Datapath FSM • Write DMEM state • Control signals enable memory write • Next state: Next Insn
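The control FSM built up over the last several slides can be sketched as a function that lists the states an instruction visits before control returns to Next Insn. The instruction-kind strings and the function name are illustrative assumptions; the state names follow the slides.

```python
def states_for(kind):
    """States visited by one instruction, in order, one state per cycle."""
    path = ["Next Insn", "Decode"]
    if kind == "nop":
        return path                      # Decode goes straight back to Next Insn
    path.append("Execute")
    if kind == "branch":
        return path                      # Execute -> Next Insn
    if kind == "alu":
        return path + ["Writeback"]
    if kind == "load":
        return path + ["Read DMEM", "Writeback"]
    if kind == "store":
        return path + ["Write DMEM"]
    raise ValueError(f"unknown instruction kind: {kind}")


for kind in ("branch", "alu", "store", "load"):
    print(kind, len(states_for(kind)), states_for(kind))
```

Since each state takes one clock cycle, the path lengths give the per-instruction cycle counts: 3 for branches, 4 for ALU ops and stores, 5 for loads.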

  35. Multi-Cycle Datapath Example: Add [Figure: multi-cycle datapath] • Example: Add • Cycle 1: Read IMEM • Cycle 2: Decode + Read RF • Cycle 3: ALU • Cycle 4: Writeback + Increment PC

  36. Multi-Cycle Datapath Performance [Figure: multi-cycle datapath] • Opposite performance split of single-cycle datapath + Short clock period – High CPI

  37. Multi-cycle Data-path CPI • CPI depends on instructions • Branches / Jumps: 3 cycles • ALU: 4 cycles • Stores: 4 cycles • Loads: 5 cycles • Overall CPI is weighted average • Example: • 20% loads, 15% stores, 20% branches, 45% ALU

  40. Multi-cycle Data-path CPI • CPI depends on instructions • Branches / Jumps: 3 cycles • ALU: 4 cycles • Stores: 4 cycles • Loads: 5 cycles • Overall CPI is weighted average • Example: • 20% loads, 15% stores, 20% branches, 45% ALU • CPI = 0.20 * 5 + 0.15 * 4 + 0.20 * 3 + 0.45 * 4 = 4.0
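The weighted-average CPI can be reproduced with a short Python sketch. The cycle counts and instruction mix are the ones on the slide; the dictionary layout and function name are illustrative.

```python
CYCLES = {"load": 5, "store": 4, "branch": 3, "alu": 4}
MIX = {"load": 0.20, "store": 0.15, "branch": 0.20, "alu": 0.45}


def average_cpi(mix, cycles):
    """Weighted average: sum over instruction classes of fraction * cycles."""
    return sum(fraction * cycles[kind] for kind, fraction in mix.items())


print(round(average_cpi(MIX, CYCLES), 2))  # 4.0
```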

  41. Multi-cycle Datapath Performance • Single-cycle • Clock period = 50ns, CPI = 1 • Performance = 50 ns/insn • Multi-cycle • Clock period = 10ns • CPI = (0.2*3+0.2*5+0.6*4) = 4 • Performance = 40 ns/insn • But wait…

  42. Multi-Cycle Datapath Performance [Figure: multi-cycle datapath] • Did not just cut up existing logic into 5 pieces • Also added logic (flip-flops) • So clock period is not 1/5 of single-cycle, but slightly longer

  43. Multi-cycle Datapath Performance • Single-cycle • Clock period = 50ns, CPI = 1 • Performance = 50 ns/insn • Multi-cycle • Clock period = 12ns • CPI = (0.2*3+0.2*5+0.6*4) = 4 • Performance = 48 ns/insn • Better, but not as exciting… • Can we do better still? • Have our cake (low CPI) and eat it too (high clock frequency)?

  44. Clock Period and CPI • Single-cycle datapath + Low CPI: 1 – Long clock period: to accommodate slowest insn • Timeline: [insn0.fetch, dec, exec] [insn1.fetch, dec, exec] • Multi-cycle datapath + Short clock period – High CPI • Timeline: [insn0.fetch] [insn0.dec] [insn0.exec] [insn1.fetch] [insn1.dec] [insn1.exec] • Can we have both low CPI and short clock period? – No good way to make a single insn go faster + Insn latency doesn’t matter anyway … insn throughput matters • Key: exploit inter-insn parallelism

  45. Pipelining • Pipelining: important performance technique • Improves insn throughput rather than insn latency • Exploits parallelism at insn-stage level to do so • Begin with multi-cycle design • Timeline: [insn0.fetch] [insn0.dec] [insn0.exec] [insn1.fetch] [insn1.dec] [insn1.exec] • When insn advances from stage 1 to 2, next insn enters stage 1 • Timeline: insn0.fetch, insn0.dec, insn0.exec in cycles 1-3; insn1.fetch, insn1.dec, insn1.exec overlapped in cycles 2-4 • Individual insns take same number of stages + But insns enter and leave at a much faster rate • Physically breaks “atomic” VN (von Neumann) loop ... but must maintain illusion • Automotive assembly line analogy

  46. 5 Stage Multi-Cycle Datapath [Figure: 5-stage multi-cycle datapath]

  47. 5 Stage Pipelined Datapath [Figure: 5-stage pipelined datapath with PC, IR, A, B, O, D latched between stages] • Temporary values (PC,IR,A,B,O,D) re-latched every stage • Why? 5 insns may be in pipeline at once, they share a single PC? • Notice, PC not latched after ALU stage (why not?)

  48. Pipeline Terminology [Figure: pipelined datapath with latches labeled F/D, D/X, X/M, M/W] • Stages: Fetch, Decode, eXecute, Memory, Writeback • Latches (pipeline registers): PC, F/D, D/X, X/M, M/W

  49. Some More Terminology • Scalar pipeline: one insn per stage per cycle • Alternative: “superscalar” (next unit) • In-order pipeline: insns enter execute stage in VN order • Alternative: “out-of-order” (not covered in CSE 371) • Pipeline depth: number of pipeline stages • Nothing magical about five • Trend has been to deeper pipelines

  50. Pipeline Example: Cycle 1 • 3 instructions • F: add $3,$2,$1

  51. Pipeline Example: Cycle 2 • F: lw $4,0($5) • D: add $3,$2,$1

  52. Pipeline Example: Cycle 3 • F: sw $6,4($7) • D: lw $4,0($5) • X: add $3,$2,$1

  53. Pipeline Example: Cycle 4 • 3 instructions • D: sw $6,4($7) • X: lw $4,0($5) • M: add $3,$2,$1

  54. Pipeline Example: Cycle 5 • X: sw $6,4($7) • M: lw $4,0($5) • W: add

  55. Pipeline Example: Cycle 6 • M: sw $6,4($7) • W: lw

  56. Pipeline Example: Cycle 7 • W: sw

  57. Pipeline Diagram • Pipeline diagram: shorthand for what we just saw • Across: cycles • Down: insns • Convention: X means lw $4,0($5) finishes execute stage and writes into X/M latch at end of cycle 4
                       1 2 3 4 5 6 7 8 9
       add $3,$2,$1    F D X M W
       lw $4,0($5)       F D X M W
       sw $6,4($7)         F D X M W
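A pipeline diagram for a stall-free scalar pipeline can be generated programmatically: instruction i enters Fetch in cycle i+1 and advances one stage per cycle. The sketch below is illustrative only; the function names and the `.` filler for idle cycles are assumptions, and it does not model stalls.

```python
STAGES = "FDXMW"


def pipeline_diagram(insns):
    """Return (insn, cells) rows; cells[c] is the stage occupied in cycle c+1."""
    total_cycles = len(insns) + len(STAGES) - 1
    rows = []
    for i, insn in enumerate(insns):
        cells = ["."] * total_cycles
        for offset, stage in enumerate(STAGES):
            cells[i + offset] = stage  # insn i is one cycle behind insn i-1
        rows.append((insn, cells))
    return rows


def render(rows):
    """Format the rows as aligned text, one instruction per line."""
    width = max(len(insn) for insn, _ in rows)
    return "\n".join(f"{insn:<{width}}  {' '.join(cells)}" for insn, cells in rows)


rows = pipeline_diagram(["add $3,$2,$1", "lw $4,0($5)", "sw $6,4($7)"])
print(render(rows))
```

For the three-instruction example, the `lw` row has X in the fourth column, matching the convention stated on the slide.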

  58. What About Pipelined Control? • Should it be like single-cycle control? • But individual insn signals must be staged • Should it be like multi-cycle control? • But all stages are simultaneously active • How many different controllers are we going to need? • One for each insn in pipeline? • Solution: use simple single-cycle control, but pipeline it • Single controller

  59. Pipelined Control [Figure: pipelined datapath with a single CTRL block; control signals xC, mC, wC are latched in D/X, then mC, wC in X/M, then wC in M/W]

  60. Pipeline Performance Calculation • Single-cycle • Clock period = 50ns, CPI = 1 • Performance = 50ns/insn • Multi-cycle • Branch: 20% (3 cycles), load: 20% (5 cycles), other: 60% (4 cycles) • Clock period = 12ns, CPI = (0.2*3+0.2*5+0.6*4) = 4 • Remember: latching overhead makes it 12, not 10 • Performance = 48ns/insn • Pipelined • Clock period = 12ns • CPI = 1.5 (on average insn completes every 1.5 cycles) • Performance = 18ns/insn
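The three designs can be compared with a small Python sketch that multiplies clock period by CPI. The clock periods and CPIs are the ones on the slide; the dictionary and function names are illustrative.

```python
def ns_per_insn(clock_ns, cpi):
    """Average time per instruction = clock period * cycles per instruction."""
    return clock_ns * cpi


DESIGNS = {
    "single-cycle": (50, 1.0),
    "multi-cycle": (12, 4.0),
    "pipelined": (12, 1.5),
}

baseline = ns_per_insn(*DESIGNS["single-cycle"])
for name, (clock_ns, cpi) in DESIGNS.items():
    t = ns_per_insn(clock_ns, cpi)
    print(f"{name}: {t} ns/insn, speedup over single-cycle = {baseline / t:.2f}")
```

With these numbers the pipelined design comes out roughly 2.8x faster than single-cycle, even though its clock period is no shorter than the multi-cycle design's.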

  61. Q1: Why Is Pipeline Clock Period … • … > delay thru datapath / number of pipeline stages? • Latches (FFs) add delay • Pipeline stages have different delays, clock period is max delay • Both factors have implications for ideal number of pipeline stages

  62. Q2: Why Is Pipeline CPI… • … > 1? • CPI for scalar in-order pipeline is 1 + stall penalties • Stalls used to resolve hazards • Hazard: condition that jeopardizes VN illusion • Stall: artificial pipeline delay introduced to restore VN illusion • Calculating pipeline CPI • Frequency of stall * stall cycles • Penalties add (stalls generally don’t overlap in in-order pipelines) • CPI = 1 + stall-freq_1 * stall-cyc_1 + stall-freq_2 * stall-cyc_2 + … • Correctness/performance/MCCF • Long penalties OK if they happen rarely, e.g., 1 + 0.01 * 10 = 1.1 • Stalls also have implications for ideal number of pipeline stages
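The stall formula translates directly into code. This is a minimal sketch: the function name is illustrative, the first call uses the slide's example (a 10-cycle penalty on 1% of insns), and the second stall mix is purely hypothetical.

```python
def pipeline_cpi(stalls):
    """CPI = 1 + sum(stall frequency * stall cycles); assumes stalls never overlap."""
    return 1.0 + sum(freq * cycles for freq, cycles in stalls)


# Slide example: a rare but long penalty barely moves CPI
print(round(pipeline_cpi([(0.01, 10)]), 2))  # 1.1

# Hypothetical mix: 20% of insns stall 2 cycles, 10% stall 1 cycle
print(round(pipeline_cpi([(0.20, 2), (0.10, 1)]), 2))  # 1.5
```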

  63. Dependences and Hazards • Dependence: relationship between two insns • Data: two insns use same storage location • Control: one insn affects whether another executes at all • Not a bad thing, programs would be boring without them • Enforced by making older insn go before younger one • Happens naturally in single-/multi-cycle designs • But not in a pipeline • Hazard: dependence & possibility of wrong insn order • Effects of wrong insn order cannot be externally visible • Stall: enforces order by keeping younger insn in same stage • Hazards are a bad thing: stalls reduce performance

  64. Why Does Every Insn Take 5 Cycles? [Figure: pipelined datapath with add $3,$2,$1 one stage behind lw $4,0($5)] • Could/should we allow add to skip M and go to W? No – It wouldn’t help: peak fetch still only 1 insn per cycle – Structural hazards: imagine add follows lw

  65. Structural Hazards • Structural hazards • Two insns trying to use same circuit at same time • E.g., structural hazard on regfile write port • To fix structural hazards: proper ISA/pipeline design • Each insn uses every structure exactly once • For at most one cycle • Always at same stage relative to F

  66. Data Hazards [Figure: pipeline from D to W holding, youngest to oldest: sw $6,0($7), lw $4,0($5), add $3,$2,$1] • Let’s forget about branches and the control for a while • The three insn sequence we saw earlier executed fine… • But it wasn’t a real program • Real programs have data dependences • They pass values via registers and memory

  67. Data Hazards [Figure: pipeline holding sw $3,0($7) in D, addi $6,$3,1 in X, lw $4,0($3) in M, add $3,$2,$1 in W] • Would this “program” execute correctly on this pipeline? • Which insns would execute with correct inputs? • add is writing its result into $3 in current cycle – lw read $3 2 cycles ago → got wrong value – addi read $3 1 cycle ago → got wrong value • sw is reading $3 this cycle → OK (regfile timing: write first half)

  68. Memory Data Hazards [Figure: pipeline with lw $4,0($1) one stage behind sw $5,0($1)] • What about data hazards through memory? No • lw following sw to same address in next cycle gets right value • Why? DMem read/write take place in same stage • Data hazards through registers? Yes (previous slide) • Occur because register write is 3 stages after register read • Can only read a register value 3 cycles after writing it
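The "3 cycles" rule can be turned into a simple register-hazard checker: a reader is safe only if it is at least 3 instructions behind the most recent writer of the register. The sketch below applies this to the four-instruction sequence from the previous slide. The tuple encoding (name, destination register, source registers) and the function name are illustrative assumptions; registers are plain integers.

```python
MIN_DISTANCE = 3  # reader must be at least this many insns behind the writer


def raw_hazards(program):
    """Return (writer, reader, register) triples that violate MIN_DISTANCE."""
    hazards = []
    for i, (writer, dest, _) in enumerate(program):
        if dest is None:
            continue
        for j in range(i + 1, min(i + MIN_DISTANCE, len(program))):
            reader, later_dest, sources = program[j]
            if dest in sources:
                hazards.append((writer, reader, dest))
            if later_dest == dest:
                break  # a younger writer now owns this register
    return hazards


# (name, destination register or None, source registers)
program = [
    ("add",  3,    (2, 1)),   # add writes $3
    ("lw",   4,    (3,)),     # lw reads $3 one insn later
    ("addi", 6,    (3,)),     # addi reads $3 two insns later
    ("sw",   None, (3, 7)),   # sw reads $3 three insns later
]
print(raw_hazards(program))  # [('add', 'lw', 3), ('add', 'addi', 3)]
```

The `sw` does not appear in the result because it sits three instructions behind the `add`, which is the case the slide marks as OK.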

  69. Fixing Register Data Hazards • Can only read register value 3 cycles after writing it • One way to enforce this: make sure programs don’t do it • Compiler puts two independent insns between write/read insn pair • If they aren’t there already • Independent means: “do not interfere with register in question” • Do not write it: otherwise meaning of program changes • Do not read it: otherwise create new data hazard • Code scheduling: compiler moves around existing insns to do this • If none can be found, must use nops • This is called software interlocks • MIPS: Microprocessor w/out Interlocking Pipeline Stages

  70. Software Interlock Example
       Original:
         add $3,$2,$1
         lw $4,0($3)
         sw $7,0($3)
         add $6,$2,$8
         addi $3,$5,4
     • Can any of the last three insns be scheduled between the first two? • sw $7,0($3)? No, creates hazard with add $3,$2,$1 • add $6,$2,$8? OK • addi $3,$5,4? No, lw would read $3 from it • Still need one more insn, use nop
       Scheduled:
         add $3,$2,$1
         add $6,$2,$8
         nop
         lw $4,0($3)
         sw $7,0($3)
         addi $3,$5,4
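The nop-insertion half of software interlocking can be sketched in Python: walk the program in order and pad with nops until every reader is at least 3 slots behind the last writer of each source register. This is an illustrative sketch, not a compiler pass; it does not reorder instructions (the scheduling step is done by hand here), and the tuple encoding and function name are assumptions.

```python
def insert_nops(program, min_distance=3):
    """Pad with nops so each reader is >= min_distance slots after its writer."""
    out = []
    last_write = {}  # register -> position in `out` of its most recent writer
    for text, dest, sources in program:
        position = len(out)
        for reg in sources:
            if reg in last_write:
                position = max(position, last_write[reg] + min_distance)
        out.extend(["nop"] * (position - len(out)))
        if dest is not None:
            last_write[dest] = len(out)
        out.append(text)
    return out


# (text, destination register or None, source registers)
add3 = ("add $3,$2,$1", 3, (2, 1))
lw4 = ("lw $4,0($3)", 4, (3,))
sw7 = ("sw $7,0($3)", None, (7, 3))
add6 = ("add $6,$2,$8", 6, (2, 8))
addi3 = ("addi $3,$5,4", 3, (5,))

print(insert_nops([add3, lw4, sw7, add6, addi3]))  # original order: 2 nops needed
print(insert_nops([add3, add6, lw4, sw7, addi3]))  # after scheduling: 1 nop needed
```

Moving `add $6,$2,$8` between the dependent pair saves one nop, which reproduces the final sequence shown on the slide.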

  71. Software Interlock Performance • Same deal • Branch: 20%, load: 20%, store: 10%, other: 50% • Software interlocks • 20% of insns require insertion of 1 nop • 5% of insns require insertion of 2 nops • CPI is still 1 technically • But now there are more insns • #insns = 1 + 0.20*1 + 0.05*2 = 1.3 – 30% more insns (30% slowdown) due to data hazards
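The instruction-count growth from inserted nops can be computed the same way as a stall penalty, except that the cost shows up as extra instructions rather than extra cycles per instruction. A minimal sketch using the slide's percentages; the function name is illustrative.

```python
def relative_insn_count(nop_profile):
    """Instructions executed per original instruction, counting inserted nops."""
    return 1.0 + sum(fraction * nops for fraction, nops in nop_profile)


# 20% of insns need 1 nop, 5% need 2 nops
growth = relative_insn_count([(0.20, 1), (0.05, 2)])
print(round(growth, 2))  # 1.3
```

With CPI held at 1 and the clock period unchanged, execution time scales with this factor, which is where the slide's 30% slowdown comes from.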
