
Basic Pipelining Wrap-up; Exploiting ILP (from Slide Set 20)



  1. Slide Set #21: Basic Pipelining Wrap-up; Exploiting ILP (from Slide Set 20). Chapter 6 and beyond.

     Big Picture
     • Remember the single-cycle implementation
       – Inefficient because low utilization of hardware resources
       – Each instruction takes one long cycle
     • Two possible ways to improve on this: Multicycle and Pipelined. For each, consider:
       – Clock cycle time (vs. single cycle)
       – Amount of hardware used (vs. single cycle)
       – Split instruction into multiple stages (1 per cycle)?
       – Each stage has its own set of hardware?
       – How many instructions executing at once?

     Pipelining
     • Improve performance by increasing instruction throughput
     [Figure: program execution order (in instructions) vs. time for lw $1, 100($0); lw $2, 200($0); lw $3, 300($0). Single-cycle: each instruction takes 800 ps. Pipelined: stages are Instruction fetch, Reg, ALU, Data access, Reg, with a new instruction starting every 200 ps.]
     • Ideal speedup is number of stages in the pipeline. Do we achieve this?
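
The timing in the figure above can be checked with a little arithmetic. The following is a minimal Python sketch, assuming the numbers read off the figure (an 800 ps single-cycle clock, five stages, a 200 ps pipelined clock); the function names are illustrative and not part of the slides.

```python
def single_cycle_time_ps(n_instructions, cycle_ps):
    """Each instruction occupies one long clock cycle."""
    return n_instructions * cycle_ps


def pipelined_time_ps(n_instructions, n_stages, stage_ps):
    """The first instruction needs n_stages cycles to pass through the
    pipeline; every later instruction finishes one cycle after the previous."""
    return (n_stages + n_instructions - 1) * stage_ps


def speedup(n_instructions, cycle_ps=800, n_stages=5, stage_ps=200):
    """Single-cycle time divided by pipelined time (assumed figure values)."""
    return (single_cycle_time_ps(n_instructions, cycle_ps)
            / pipelined_time_ps(n_instructions, n_stages, stage_ps))


if __name__ == "__main__":
    print(single_cycle_time_ps(3, 800))     # 2400
    print(pipelined_time_ps(3, 5, 200))     # 1400
    print(round(speedup(3), 2))             # 1.71
    print(round(speedup(1_000_000), 2))     # 4.0
```

With these assumed numbers the speedup for a long instruction stream approaches 800/200 = 4 rather than 5, because the pipelined clock is set by the slowest stage and the stages are not perfectly balanced.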

  2. Pipelining and Beyond

     Exploiting More ILP
     • ILP = __________________ _________________ ________________ (parallelism within a single program)
     • How can we exploit more ILP?
       1. ________________________ (Split execution into many stages)
       2. ___________________________ (Start executing more than one instruction each cycle)

     Multiple Issue Processors
     • Key metric: CPI (or IPC = 1/CPI)
     • Key questions:
       1. What set of instructions can be issued together?
       2. Who decides which instructions to issue together?
          – Static multiple issue
          – Dynamic multiple issue

     Example – Multiple Issue
     • How many cycles does it take for this code to execute on a 2-issue CPU?
         add $t0, $t1, $t2
         lw  $s1, 0($s2)
         add $t0, $t0, $t4
         sw  $s1, 0($s3)
       Answer?

  3. Multiple Issue Processors
     • What extra hardware do we need to do Static Multiple Issue?
     • What else for Dynamic Multiple Issue?

     Example – MIPS Static Multiple Issue
     [Figure not preserved.]

     Example – Dynamic Multiple Issue
     [Figure: Instruction fetch and decode unit (in-order issue) → reservation stations → functional units: Integer, Integer, ..., Floating point, Load/Store (out-of-order execute) → Commit unit (in-order commit).]

     Scheduling Exercise #1
     Assume you must execute the following instructions in order. In any one cycle you can issue at most one integer op and one load or store. Show the resultant pipeline diagram. What’s the total number of cycles? If you can’t issue an instruction on a certain cycle, wait for the next cycle.
         lw  $t0, 0($s2)
         sub $s1, $t0, $s3
         lw  $t2, 0($s2)
         add $a0, $a1, $a2
         add $a0, $a0, $a3
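
The issue rules of Scheduling Exercise #1 can be sketched as a small in-order issue model in Python. This is only an illustration under stated assumptions, not the intended worked answer: it tracks issue cycles only (not the full pipeline diagram), it assumes a result cannot be used in the cycle it is issued, and it assumes one extra cycle of delay after a load (the hypothetical `load_use_delay` parameter). The helper names `parse` and `issue_cycles` are made up for this sketch.

```python
import re


def parse(line):
    """Return (opcode, destination or None, sources) for lw/sw/add/sub."""
    op, rest = line.split(None, 1)
    regs = re.findall(r"\$\w+", rest)
    if op == "sw":                  # sw reads both registers, writes none
        return op, None, regs
    return op, regs[0], regs[1:]    # first register is the destination


def issue_cycles(program, load_use_delay=1):
    """In-order issue: at most one integer op and one load/store per cycle."""
    ready = {}        # register -> first cycle in which a consumer may issue
    cycle = 1
    used = set()      # issue slots already taken in the current cycle
    schedule = []
    for line in program:
        op, dest, srcs = parse(line)
        slot = "mem" if op in ("lw", "sw") else "int"
        earliest = max([ready.get(r, 1) for r in srcs] + [cycle])
        if earliest > cycle:        # wait for an operand (assumed stall rule)
            cycle, used = earliest, set()
        elif slot in used:          # slot already taken: wait one cycle
            cycle, used = cycle + 1, set()
        used.add(slot)
        schedule.append((cycle, line))
        if dest is not None:
            ready[dest] = cycle + 1 + (load_use_delay if op == "lw" else 0)
    return schedule


EXERCISE_1 = [
    "lw $t0, 0($s2)",
    "sub $s1, $t0, $s3",
    "lw $t2, 0($s2)",
    "add $a0, $a1, $a2",
    "add $a0, $a0, $a3",
]

if __name__ == "__main__":
    for cycle, line in issue_cycles(EXERCISE_1):
        print(cycle, line)
```

Under these assumptions the five instructions issue in cycles 1, 3, 3, 4 and 5. Setting `load_use_delay=0` models a machine with no extra load delay; the total cycle count for the pipeline diagram also depends on how many stages follow issue.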

  4. Exercise #2
     Use same assumptions as with Exercise #1, but first schedule the code to try and eliminate stalls. Show the new pipeline diagram and total number of cycles.
         lw  $t0, 0($s2)
         sub $s1, $t0, $s3
         lw  $t2, 0($s2)
         add $a0, $a1, $a2
         add $a0, $a0, $a3

     Exercise #3: Static vs. Dynamic Multiple Issue
     • Which do you think has been commercially successful – static or dynamic issue? Why?

     Exercise #4
     • Look ahead at the slide for Idea #4 – loop unrolling. What is the possible bug?

     Ideas for improving Multiple Issue
     1. Non-blocking caches
     2. Speculation
     3. Register renaming
     4. Loop unrolling

  5. Idea #3: Register renaming
         lw  $t0, 0($s0)
         sw  $t0, 4($s0)
         lw  $t0, 0($s2)
         sw  $t0, 4($s2)
     Problem?
     Solution?

     Idea #4: Loop unrolling
     Original loop:
         Loop: lw   $t0, 0($s1)
               sw   $t0, 0($s2)
               addi $s1, $s1, -4
               addi $s2, $s2, -4
               bne  $s1, $zero, Loop
     Unrolled loop:
         Loop: lw   $t0, 0($s1)
               lw   $t1, 4($s1)
               lw   $t2, 8($s1)
               lw   $t3, 12($s1)
               sw   $t0, 0($s2)
               sw   $t1, 4($s2)
               sw   $t2, 8($s2)
               sw   $t3, 12($s2)
               addi $s1, $s1, -16
               addi $s2, $s2, -16
               bne  $s1, $zero, Loop
     Why is this a good idea?

     Chapter 6 Summary
     [Figure: implementations plotted by instructions per clock (IPC = 1/CPI) against clock rate (Slower to Faster). Labels: Single-cycle (Section 5.4), Multicycle (Section 5.5), Pipelined, Deeply pipelined, Multiple-issue pipelined (Section 6.9), Multiple issue with deep pipeline (Section 6.10).]
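
One way to picture Idea #3 is a pass that gives every register write a fresh name, so that reuse of `$t0` no longer ties the second load/store pair to the first. The Python sketch below is a simplified illustration, not how any particular processor implements renaming; the `rename` helper and the choice of `$t8` and `$t9` as free registers are assumptions made for the example.

```python
import re


def rename(program, fresh_regs):
    """Give each register write a fresh name; later reads follow that name.

    Handles only the lw/sw/add/sub forms used on these slides, where every
    opcode except sw writes its first register operand.
    """
    fresh = iter(fresh_regs)   # assumed pool of unused registers
    current = {}               # original name -> name now holding its value
    renamed = []
    for line in program:
        op, rest = line.split(None, 1)
        regs = re.findall(r"\$\w+", rest)
        has_dest = op != "sw"
        srcs = regs[1:] if has_dest else regs
        # Sources are looked up before the destination mapping is updated.
        new_regs = [current.get(r, r) for r in srcs]
        if has_dest:
            new_dest = next(fresh)
            current[regs[0]] = new_dest
            new_regs = [new_dest] + new_regs
        replacements = iter(new_regs)
        new_rest = re.sub(r"\$\w+", lambda m: next(replacements), rest)
        renamed.append(op + " " + new_rest)
    return renamed


EXAMPLE = [
    "lw $t0, 0($s0)",
    "sw $t0, 4($s0)",
    "lw $t0, 0($s2)",
    "sw $t0, 4($s2)",
]

if __name__ == "__main__":
    for line in rename(EXAMPLE, ["$t8", "$t9"]):
        print(line)
```

After renaming, the first pair uses `$t8` and the second uses `$t9`, so the only remaining dependences are the true ones from each load to its own store. The sketch raises `StopIteration` if the pool of fresh registers runs out.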
