
Example Task: Doing a Load of Laundry (Wash, Dry, Fold) - PowerPoint PPT Presentation



  1. Example
  • Task: Doing a load of laundry
    – Wash, Dry, Fold
    – Each laundry load takes T hours
  • Completing n tasks requires nT hours
  [Figure: timeline from T to 9T showing nine WDF loads done one after another]

  2. Parallel Processing
  • M independent machines
    – Wash, Dry, Fold
    – Do M laundry loads concurrently
  • Completing n tasks takes T × ⌈n/M⌉ hours
  • Requires M copies of the hardware to achieve the speedup
  [Figure: timeline from T to 3T with nine WDF loads running side by side on parallel machines]
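The ceiling in T × ⌈n/M⌉ comes from running loads in rounds of at most M at a time: a partially filled last round still costs a full T. A minimal Python sketch, assuming identical machine sets and whole loads; `parallel_hours` and `assign_rounds` are hypothetical helper names, not taken from the slides.

```python
def parallel_hours(n: int, m: int, t: float) -> float:
    """Hours for n loads on m identical washer/dryer/folder sets, t hours per load."""
    # Integer ceiling of n / m: the number of rounds of up to m concurrent loads.
    rounds = (n + m - 1) // m
    return t * rounds


def assign_rounds(n: int, m: int) -> list[list[int]]:
    """Group load numbers 1..n into rounds of at most m concurrent loads (hypothetical helper)."""
    return [list(range(start, min(start + m, n + 1))) for start in range(1, n + 1, m)]


if __name__ == "__main__":
    for rnd, loads in enumerate(assign_rounds(10, 3), start=1):
        print(f"round {rnd}: loads {loads}")
    print(parallel_hours(9, 3, 1.0))
    print(parallel_hours(10, 3, 1.0))
```

With 10 loads and 3 machines, the tenth load sits alone in a fourth round, so the total is 4T rather than 10T/3.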

  3. Pipelining
  • Divide each task into component microtasks
  • Each microtask requires unit time (the same for all microtasks)
    – One microtask is performed per stage
  • With p stages per task, n tasks performed one after another require np time units
  [Figure: time steps 1 to 9 showing W1 D1 F1 W2 D2 F2 W3 D3 F3 in sequence]
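Before any overlap is introduced, microtask s of task i runs at step (i - 1)p + s, which is why the last microtask lands at step np. A small Python sketch of that non-overlapped schedule, assuming the W/D/F stage names from the slide; `sequential_schedule` is a hypothetical name.

```python
def sequential_schedule(n: int, stages: str = "WDF") -> list[tuple[int, str]]:
    """(time_step, microtask) pairs when tasks run strictly one after another."""
    p = len(stages)
    schedule = []
    for i in range(1, n + 1):  # task number
        for s, name in enumerate(stages, start=1):  # stage number within the task
            # Task i cannot start until task i-1 has finished all p stages.
            schedule.append(((i - 1) * p + s, f"{name}{i}"))
    return schedule


steps = sequential_schedule(3)
print(" ".join(label for _, label in steps))
print("last step:", steps[-1][0])  # equals n * p
```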

  4. Pipelining
  • Task i begins immediately after task i-1 completes its first stage
  • p-stage pipeline: task n completes at time step n + p - 1 (vs. n × p sequentially)
  • In steady state, p tasks are concurrently active
  [Figure: time steps 1 to 6 for tasks W1 D1 F1 through W4 D4 F4, each starting one step after the previous; pipeline fill: steps 1-2, steady state: steps 3-4, pipeline flush: steps 5-6]
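Because task i enters stage 1 at step i, it occupies stage s at step i + s - 1; the fill, steady-state, and flush phases then fall out of when that staircase is partial or full. A Python sketch of this ideal pipeline, assuming unit-time stages and n ≥ p (otherwise no steady state exists); `pipeline_schedule` and `phase` are hypothetical names.

```python
def pipeline_schedule(n: int, stages: str = "WDF") -> dict[int, list[str]]:
    """Map each time step to the microtasks active in it, for an ideal p-stage pipeline."""
    active: dict[int, list[str]] = {}
    for i in range(1, n + 1):
        for s, name in enumerate(stages, start=1):
            # Task i is in stage s at step i + s - 1.
            active.setdefault(i + s - 1, []).append(f"{name}{i}")
    return active


def phase(t: int, n: int, p: int) -> str:
    """Classify step t: fill before p tasks are in flight, flush once no new task enters."""
    assert n >= p, "a steady state exists only when n >= p"
    if t < p:
        return "fill"
    if t <= n:
        return "steady"
    return "flush"


n, stages = 4, "WDF"
sched = pipeline_schedule(n, stages)
for t in sorted(sched):
    print(t, phase(t, n, len(stages)), " ".join(sched[t]))
```

For n = 4 and p = 3 the last step is 6 = n + p - 1, and steps 3 and 4 each hold exactly p = 3 active microtasks.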

  5. Pipelining
  • Task i begins immediately after task i-1 completes its first stage
  • For a p-stage pipeline, task n completes at time step n + p - 1
  • In steady state, p tasks are concurrently active
  • Latency of a task = p time steps (not changed by pipelining)
  • By reducing the time for n tasks from np to n + p - 1, pipelining increases task throughput
  [Figure: time steps 1 to 9 showing seven W D F tasks, each starting one step after the previous]
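The claim that pipelining leaves latency unchanged can be checked directly: each task still spans p consecutive steps, and only the overlap between tasks shrinks the total. A short Python sketch for the slide's 7-task, 3-stage case, assuming unit-time stages; `task_spans` is a hypothetical name.

```python
STAGES = "WDF"


def task_spans(n: int, p: int) -> list[tuple[int, int]]:
    """(first_step, last_step) of each task in an ideal p-stage pipeline."""
    # Task i starts at step i and needs p unit-time steps to finish.
    return [(i, i + p - 1) for i in range(1, n + 1)]


n, p = 7, len(STAGES)
spans = task_spans(n, p)
latencies = {last - first + 1 for first, last in spans}

# Draw the staircase: each time step is a two-character column.
for first, _ in spans:
    print("  " * (first - 1) + " ".join(STAGES))

print("latencies:", latencies)  # every task still takes p steps
print("pipelined total:", spans[-1][1], "vs sequential:", n * p)
```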

  6. Latency and Throughput
  • Latency of a task:
    – Time elapsed between the start and finish of the task
    – Assume that W, D, F take 1 hour each; in all designs the latency is the same (3 hours)
  • Throughput: number of tasks completed per unit time
    – Non-pipelined design: T = p time units per task, so throughput = 1/p (1 task completes every p time units)
    – Pipelined design with a p-stage pipeline (unit time per stage): n tasks complete in n + p - 1 time units, so throughput = n/(n + p - 1) = 1/(1 + (p - 1)/n), which approaches 1 (1 task completes per time unit) when n >> p
    – Speedup = T_non-pipelined / T_pipelined = np/(n + p - 1); for n >> p, speedup approaches p
    – Parallel processing design with M machines: M tasks complete every T = p time units, so throughput = M/p and speedup = M
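The throughput and speedup formulas above can be evaluated exactly with rational arithmetic, which makes it easy to see speedup creep toward p as n grows. A Python sketch using the standard-library `fractions.Fraction`; the helper names are hypothetical and unit-time stages are assumed, as on the slide.

```python
from fractions import Fraction


def pipelined_throughput(n: int, p: int) -> Fraction:
    """Tasks per time unit: n tasks finish in n + p - 1 unit-time steps."""
    return Fraction(n, n + p - 1)


def pipeline_speedup(n: int, p: int) -> Fraction:
    """T_non-pipelined / T_pipelined = n*p / (n + p - 1)."""
    return Fraction(n * p, n + p - 1)


def parallel_throughput(m: int, p: int) -> Fraction:
    """M machines each finish one task every p time units."""
    return Fraction(m, p)


p = 3
print("non-pipelined throughput:", Fraction(1, p))
for n in (3, 10, 100, 10_000):
    print(f"n={n}: throughput={float(pipelined_throughput(n, p)):.4f}, "
          f"speedup={float(pipeline_speedup(n, p)):.4f}")
```

The parallel design's speedup is its throughput divided by the non-pipelined throughput, (M/p)/(1/p) = M, independent of n.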

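  7. Multi-Cycle Implementation
  [Figure: multi-cycle datapath with PC, IR, register file, A and B registers, ALU (with the constant 4 as an input), ALUout, memory, and MDR; write enables PCWrite, IRWrite, AWrite, BWrite, ALUWrite, MDRWrite plus MEMRead and ALUop, driven by a state machine and decoder]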

  8. Multi-Cycle Design: State Machine Model
  • State 0, Instruction Fetch: IR = IM[PC]; PC = PC + 4
  • State 1, Instruction Decode: generate control signals; A = REG[$rs]; B = REG[$rt]; ALUout = PC + Shift(SE(offset))
    – ALU operands: R-R: p = A, q = B; lw: p = A, q = SE(d); sw: p = A, q = SE(d); beq: p = A, q = B
  • R-R: State 2: ALUout = p op q; State 3: REG[$rd] = ALUout
  • lw: State 5: ALUout = p op q; State 6: MDR = DM[ALUout]; State 7: REG[$rt] = MDR
  • sw: State 8: ALUout = p op q; State 9: DM[ALUout] = B
  • beq: State 10: Z = (p .eq. q); if (Z == 1) PC = ALUout
  • State sequences: LD (5 cycles): S0, S1, S5, S6, S7; ADD (4 cycles): S0, S1, S2, S3; each then returns to S0
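The controller above is a finite-state machine: states 0 and 1 are shared, state 1 branches on the instruction class, and each path returns to state 0. A Python sketch that encodes the state numbers listed on the slide and counts cycles per class; the table names are hypothetical, and the sw (4) and beq (3) counts are derived from the listed states rather than stated on the slide.

```python
# Successor of state 1 (decode), chosen by the decoded instruction class.
AFTER_DECODE = {"R-R": 2, "lw": 5, "sw": 8, "beq": 10}
# Fixed successor for every later state; 0 means "fetch the next instruction".
NEXT = {2: 3, 3: 0, 5: 6, 6: 7, 7: 0, 8: 9, 9: 0, 10: 0}


def state_trace(kind: str) -> list[int]:
    """States visited from instruction fetch until control returns to state 0."""
    trace = [0, 1]
    state = AFTER_DECODE[kind]
    while state != 0:
        trace.append(state)
        state = NEXT[state]
    return trace


for kind in ("lw", "R-R", "sw", "beq"):
    trace = state_trace(kind)
    print(kind, "->", " ".join(f"S{s}" for s in trace), f"({len(trace)} cycles)")
```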
