
Pipelining, Raul Queiroz Feitosa (PowerPoint presentation)



  1. Pipelining. Raul Queiroz Feitosa. Parts of these slides are from the support material provided by W. Stallings.

  2. Objective: to present the pipelining concept, its limitations, and the techniques for performance optimization.

  3. Outline
     - Instruction Cycle
     - Instruction Pipelining
     - Pipeline Performance
     - Pipeline Hazards
       - Resource
       - Data
       - Control

  4. Instruction Cycle State Diagram
     [Figure: instruction cycle state diagram with the states instruction address calculation, instruction fetch, instruction operation decoding, operand address calculation, operand fetch (with optional indirection; repeated for multiple operands), data operation, result address calculation, result store (repeated for multiple results), interrupt check, and interrupt.]

  5. Outline (next: Instruction Pipelining)

  6. Instruction Pipelining
     - The instruction cycle is split into sequential steps.
     - A specific hardware unit (pipeline stage) is built to perform each step.
     - The pipeline stages are arranged as a chain: stage 1 → stage 2 → ... → stage k.

  7. Instruction Pipelining: Example (FI → DI → CO → FO → EI → WO)
     - FI: fetch instruction
     - DI: decode instruction
     - CO: calculate operand (effective address)
     - FO: fetch operand
     - EI: execute instruction
     - WO: write operand (result)

  8. Instruction Pipeline Operation (nine instructions completed in 14 time units)
     time            1  2  3  4  5  6  7  8  9  10 11 12 13 14
     instruction 1   FI DI CO FO EI WO
     instruction 2      FI DI CO FO EI WO
     instruction 3         FI DI CO FO EI WO
     instruction 4            FI DI CO FO EI WO
     instruction 5               FI DI CO FO EI WO
     instruction 6                  FI DI CO FO EI WO
     instruction 7                     FI DI CO FO EI WO
     instruction 8                        FI DI CO FO EI WO
     instruction 9                           FI DI CO FO EI WO
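The ideal diagram above follows a simple rule: instruction i is in stage s (counting from 0) at time i + s. The Python sketch below regenerates it under the same assumptions as the slide (one time unit per stage, no hazards); the function names `schedule` and `render` are illustrative helpers, not part of the original material.

```python
# Ideal space-time diagram of the six-stage pipeline (slide 7), assuming one
# time unit per stage and no hazards.
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]

def schedule(n_instructions, stages=STAGES):
    """Return {instruction number: {time: stage}}; instruction i is in stage s at time i + s."""
    return {i: {i + s: name for s, name in enumerate(stages)}
            for i in range(1, n_instructions + 1)}

def render(table):
    """Format the schedule as a text table, one row per instruction."""
    last_time = max(max(row) for row in table.values())
    times = range(1, last_time + 1)
    lines = ["time".ljust(16) + " ".join(f"{t:<2}" for t in times)]
    for i, row in table.items():
        # Empty cells are two spaces wide so every column stays aligned.
        cells = " ".join(row.get(t, "  ") for t in times)
        lines.append(f"instruction {i}".ljust(16) + cells.rstrip())
    return "\n".join(lines)

if __name__ == "__main__":
    print(render(schedule(9)))   # 9 instructions finish at time 9 + 6 - 1 = 14
```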

  9. Outline (next: Pipeline Performance)

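  10. Pipeline Performance
     Assuming:
     - $k$ stages with equal delays, $\tau = \tau_1 = \tau_2 = \cdots = \tau_k$ ($\tau_i$ is the time delay of the $i$-th stage)
     - $T_{n,k}$ is the time for a pipeline with $k$ stages to execute $n$ instructions
     Conventional machine: $T_{n,1} = nk\tau$
     Pipeline: the first instruction needs $k\tau$ to traverse all stages, and each of the remaining $n-1$ instructions completes one $\tau$ later, so $T_{n,k} = k\tau + (n-1)\tau = (n+k-1)\tau$
     The speedup is $S_k = \dfrac{T_{n,1}}{T_{n,k}} = \dfrac{nk\tau}{(n+k-1)\tau} = \dfrac{nk}{n+k-1}$
     For large $n$: $\lim_{n\to\infty} S_k = \lim_{n\to\infty} \dfrac{nk}{n+k-1} = k$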

  11. Pipeline Performance
     [Figure: two speedup plots. Left: speedup versus number of instructions (1 to 128, logarithmic axis) for k = 6, 9, and 12 stages. Right: speedup versus number of stages (0 to 20) for n = 10, 20, and 30 instructions.]
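The formulas of slide 10 can be checked numerically. The sketch below evaluates $T_{n,1}$, $T_{n,k}$, and $S_k$ with $\tau = 1$ and prints speedup values of the kind plotted in slide 11; the function names are hypothetical helpers, and the model keeps the slide's assumptions of equal stage delays and no hazards.

```python
# Ideal pipeline timing model (slide 10): equal stage delays tau, no hazards.

def conventional_time(n, k, tau=1.0):
    """T_{n,1} = n*k*tau: each instruction runs all k steps before the next starts."""
    return n * k * tau

def pipeline_time(n, k, tau=1.0):
    """T_{n,k} = k*tau + (n-1)*tau: first result after k*tau, then one per tau."""
    return (n + k - 1) * tau

def speedup(n, k):
    """S_k = T_{n,1} / T_{n,k} = n*k / (n + k - 1); tends to k as n grows."""
    return conventional_time(n, k) / pipeline_time(n, k)

if __name__ == "__main__":
    # Speedup versus number of instructions (left plot of slide 11).
    for k in (6, 9, 12):
        row = [f"{speedup(n, k):5.2f}" for n in (1, 2, 4, 8, 16, 32, 64, 128)]
        print(f"k = {k:2}:", " ".join(row))
    # Speedup versus number of stages (right plot of slide 11).
    for n in (10, 20, 30):
        row = [f"{speedup(n, k):5.2f}" for k in (1, 5, 10, 15, 20)]
        print(f"n = {n:2}:", " ".join(row))
```

With n = 9 and k = 6, `pipeline_time` gives 14 time units, matching the diagram of slide 8, against 54 for the conventional machine.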

  12. Pipeline Performance
     The optimal performance is never reached because:
     - The execution time differs from stage to stage ($\tau_1 \neq \tau_2 \neq \cdots \neq \tau_k$), so the clock period must accommodate the slowest stage.
     - There is still a time delay $d$ to latch the output of each stage, giving the cycle time $\tau = \max_i \tau_i + d$.
     - Pipeline hazards stall the pipeline.
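The cycle-time formula above can be illustrated with made-up numbers. The sketch assumes hypothetical stage delays of 10, 8, 12, 9, 10, and 7 ns and a latch delay d = 1 ns, and compares the pipelined time with a non-pipelined machine that runs the same steps back to back without latches (a simplifying assumption, not something the slides state).

```python
# Non-uniform stage delays and latch delay (slide 12).
# All numbers below are hypothetical, chosen only for illustration.

def cycle_time(stage_delays, latch_delay):
    """tau = max_i tau_i + d: the clock must fit the slowest stage plus the latch."""
    return max(stage_delays) + latch_delay

def pipelined_time(n, stage_delays, latch_delay):
    """(n + k - 1) * tau, with tau set by the slowest stage."""
    k = len(stage_delays)
    return (n + k - 1) * cycle_time(stage_delays, latch_delay)

def unpipelined_time(n, stage_delays):
    """Assumption: without a pipeline, the steps run back to back with no latches."""
    return n * sum(stage_delays)

if __name__ == "__main__":
    delays_ns = [10, 8, 12, 9, 10, 7]   # FI, DI, CO, FO, EI, WO (hypothetical)
    d_ns = 1
    n = 100
    t_pipe = pipelined_time(n, delays_ns, d_ns)
    t_plain = unpipelined_time(n, delays_ns)
    print(f"tau = {cycle_time(delays_ns, d_ns)} ns")
    print(f"pipelined: {t_pipe} ns, unpipelined: {t_plain} ns")
    print(f"speedup: {t_plain / t_pipe:.2f} (ideal would be {len(delays_ns)})")
```

With these numbers τ = 13 ns, and 100 instructions take 1365 ns pipelined versus 5600 ns unpipelined: a speedup of about 4.1 instead of the ideal 6, because every stage is clocked at the pace of the 12 ns stage plus the latch.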

  13. Outline (next: Pipeline Hazards)

  14. Pipeline Hazards
     - In some cases a portion of the pipeline must stall because of so-called hazards.
     - A stall is also called a pipeline bubble.
     - Types of hazards: resource, data, and control.

  15. Outline (next: Resource Hazards)

  16. Resource Hazards
     Also called structural hazards, these occur when multiple instructions need the same resource at the same time, e.g., a single-port memory.
     [Figure: two timing diagrams for instructions 1 to 4 over time units 1 to 14, without and with the conflict; with a single-port memory, later instructions must wait (idle cycles) until the memory is free.]
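To see where a single-port memory creates resource hazards, one can mark every cycle of the ideal schedule (slide 8) that needs the memory. The sketch below assumes that FI always reads memory, FO reads memory only for instructions with a memory source operand, and WO writes memory only when the result goes to memory; the per-instruction flags and instruction names are hypothetical. It only detects conflicts and does not reschedule the stalled instructions.

```python
from collections import defaultdict

STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]

def memory_conflicts(instructions):
    """instructions: list of (name, has_memory_source, writes_memory_result).
    Return {time: [(name, stage), ...]} for cycles with more than one memory access."""
    accesses = defaultdict(list)   # time -> memory accesses requested in that cycle
    for i, (name, mem_src, mem_dst) in enumerate(instructions, start=1):
        for s, stage in enumerate(STAGES):
            t = i + s              # ideal schedule: instruction i is in stage s at time i + s
            needs_memory = (stage == "FI"
                            or (stage == "FO" and mem_src)
                            or (stage == "WO" and mem_dst))
            if needs_memory:
                accesses[t].append((name, stage))
    return {t: who for t, who in sorted(accesses.items()) if len(who) > 1}

if __name__ == "__main__":
    # Hypothetical program: only I1 reads an operand from memory.
    program = [("I1", True, False), ("I2", False, False),
               ("I3", False, False), ("I4", False, False)]
    print(memory_conflicts(program))
```

Here the operand fetch of I1 and the instruction fetch of I4 both fall in time unit 4, so with one memory port one of them must wait.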

  17. Outline (next: Data Hazards)

  18. Data Hazards
     - Conflict in access to an operand location: two instructions executed in sequence both access a particular memory or register operand.
     - Example: ADD EAX,EBX followed by SUB ECX,EAX. SUB needs the new value of EAX, which ADD writes only in its WO stage, so SUB's operand fetch waits (idle cycles), and instructions 3 and 4 are delayed as well.
     [Figure: timing diagram over time units 1 to 14 for ADD EAX,EBX, SUB ECX,EAX, instruction 3, and instruction 4, showing the idle cycles.]
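The example above is a read-after-write (RAW) dependency: SUB reads EAX before ADD has written it. The sketch below flags such dependencies by comparing each instruction's source registers with the destinations of the instructions just before it. The `Instr` record and the `window` parameter are assumptions for illustration, and memory operands are ignored. The default window of 2 comes from the ideal six-stage schedule: instruction j reads in FO at time j + 3 and instruction i writes in WO at time i + 5, so they clash when j - i <= 2 (assuming a read cannot happen in the same cycle as the write).

```python
from typing import NamedTuple

class Instr(NamedTuple):
    text: str
    dest: str              # register written by the instruction
    sources: frozenset     # registers read by the instruction

def raw_hazards(program, window=2):
    """Return (i, j, reg) where instruction j reads reg written by an earlier
    instruction i at most `window` positions before it."""
    hazards = []
    for j, later in enumerate(program):
        for i in range(max(0, j - window), j):
            if program[i].dest in later.sources:
                hazards.append((i, j, program[i].dest))
    return hazards

if __name__ == "__main__":
    program = [
        # x86 two-operand form: ADD EAX,EBX reads EAX and EBX and writes EAX.
        Instr("ADD EAX,EBX", "EAX", frozenset({"EAX", "EBX"})),
        Instr("SUB ECX,EAX", "ECX", frozenset({"ECX", "EAX"})),
    ]
    print(raw_hazards(program))
```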

  19. Outline (next: Control Hazards)

  20. Control Hazards
     Also called branch hazards. From which address should the instruction following a conditional branch be fetched? The address is known only once the branch has been executed.
     [Figure: two timing diagrams over time units 1 to 14 for ADD EAX,EBX, JNZ ADDRESS, instruction 3, and instruction 4. The next address is known only at the execute stage of JNZ (annotated "Known only here"), so the instruction fetched after the branch may have to be fetched again, leaving idle cycles; the second diagram is annotated "no memory conflict!".]

  21. Control Hazards: Dealing with Branches
     - Multiple streams: two pipelines; each branch path is prefetched into a separate pipeline (IBM 370/168 and IBM 3033). One of the pipelines always produces no useful work.
     - Prefetch branch target: the target of the branch is prefetched in addition to the instructions following the branch and kept until the branch is executed (IBM 360/91).
     - Loop buffer: a very fast memory maintained by the fetch stage of the pipeline; the buffer is checked before fetching from memory (CRAY-1).
     - Branch prediction
     - Delayed branching

  22. Branch Prediction
     Concept:
     - Instead of delaying the fetch of the next instruction until the branch is resolved, the next instruction is predicted.
     - Results are stored in temporary registers.
     - If the prediction is correct, the results are made definitive.
     - If the prediction is incorrect, the results are flushed and fetching restarts from the correct address.

  23. Branch Prediction: Static Methods
     - Predict “never taken” or “always taken”.
     - Predict by opcode:
       - There are two codes for each branch instruction; 1 bit indicates “predict taken” or “predict not taken”.
       - The compiler analyses the code, guesses, and generates the appropriate branch code.
       - The processor follows the compiler's suggestion.
       - This makes the code incompatible with previous processors.
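The static methods can be compared on a branch trace. The sketch below scores “always taken”, “never taken”, and a compiler hint bit on a hypothetical trace: a loop-closing branch hinted “taken” (taken 9 times, then not taken once) and an error-check branch hinted “not taken” (never taken in 10 executions). The policies and trace are illustrative assumptions, not measurements.

```python
# Static branch prediction policies. True means "taken".
def always_taken(hint_bit):
    return True

def never_taken(hint_bit):
    return False

def follow_hint(hint_bit):
    """Predict by opcode: use the compiler's 1-bit taken/not-taken hint."""
    return hint_bit

def accuracy(policy, trace):
    """trace: list of (hint_bit, actually_taken) pairs, one per executed branch."""
    correct = sum(policy(hint) == taken for hint, taken in trace)
    return correct / len(trace)

# Hypothetical trace: a loop-closing branch hinted "taken" (taken 9 times,
# then not taken once) and an error check hinted "not taken" (never taken).
TRACE = [(True, True)] * 9 + [(True, False)] + [(False, False)] * 10

if __name__ == "__main__":
    for policy in (always_taken, never_taken, follow_hint):
        print(policy.__name__, accuracy(policy, TRACE))
```

On this trace the hint bit is right 19 times out of 20, because the compiler's guess can differ from branch to branch while a fixed policy cannot.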

  24. Branch Prediction: Dynamic Methods
     Based on recent branch history.
     [Figure: Branch Prediction State Diagram with four states, two “predict taken” and two “predict not taken”, whose transitions are labeled “taken” and “not taken”; and a Branch History Table whose entries hold the branch instruction address, the target address, and a state.]
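A Branch History Table with a 2-bit state per entry can be sketched as follows. The four states follow a common saturating-counter scheme (two “predict taken” and two “predict not taken” states, so a single unusual outcome does not flip the prediction); the exact transitions of the slide's state diagram may differ. The class name, the initial state, and treating branches absent from the table as “not taken” are assumptions.

```python
class BranchHistoryTable:
    """Entries map branch instruction address -> (target address, 2-bit state).
    States 0 and 1 predict "not taken"; states 2 and 3 predict "taken"."""

    def __init__(self, initial_state=2):
        self.entries = {}
        self.initial_state = initial_state   # assumed state for a new entry

    def predict(self, branch_addr):
        """Return the predicted target address, or None to fetch sequentially."""
        entry = self.entries.get(branch_addr)
        if entry is None:
            return None                      # branch not in the table yet
        target, state = entry
        return target if state >= 2 else None

    def update(self, branch_addr, target, taken):
        """Move the 2-bit state one step toward the actual outcome (saturating)."""
        _, state = self.entries.get(branch_addr, (target, self.initial_state))
        state = min(state + 1, 3) if taken else max(state - 1, 0)
        self.entries[branch_addr] = (target, state)

def count_mispredictions(bht, branch_addr, target, outcomes):
    """Predict, compare with the actual outcome, then update, for each execution."""
    misses = 0
    for taken in outcomes:
        predicted_taken = bht.predict(branch_addr) is not None
        misses += (predicted_taken != taken)
        bht.update(branch_addr, target, taken)
    return misses

if __name__ == "__main__":
    # Hypothetical loop branch at 0x40 jumping back to 0x10: taken 9 times,
    # then falls through; the loop is entered twice.
    outcomes = ([True] * 9 + [False]) * 2
    print(count_mispredictions(BranchHistoryTable(), 0x40, 0x10, outcomes))
```

Only the first execution (branch not yet in the table) and the two loop exits are mispredicted, 3 misses in 20 executions. Because the state saturates, the single not-taken outcome at a loop exit leaves the entry in a “predict taken” state, so the next pass through the loop starts correctly.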

  25. Delayed Branch
     Concept: the branch takes effect only after the execution of the following instruction, which reduces the branch penalty.
     [Figure: the same code with a conventional branch and with a delayed branch, each traced for the not-taken and taken cases.]
         conventional branch          delayed branch
             MOV EDX,ECX                  ADD EAX,[EBX]
             ADD EAX,[EBX]                JZ LA
             JZ LA                        MOV EDX,ECX     (always executed)
             instruction                  instruction
             ...                          ...
         LA: instruction              LA: instruction
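The execution order implied by the delayed-branch figure can be traced with a toy interpreter. The sketch assumes a program is a list of (label, text, target) entries, that the branch outcome is given rather than computed from flags, and that a delayed branch has exactly one delay slot; all names are illustrative.

```python
# Toy model of conventional vs delayed branching (one delay slot).
# Each entry: (label, text, branch target label or None).
CONVENTIONAL = [
    ("",   "MOV EDX,ECX",      None),
    ("",   "ADD EAX,[EBX]",    None),
    ("",   "JZ LA",            "LA"),
    ("",   "next instruction", None),
    ("LA", "LA: instruction",  None),
]
DELAYED = [
    ("",   "ADD EAX,[EBX]",    None),
    ("",   "JZ LA",            "LA"),
    ("",   "MOV EDX,ECX",      None),   # delay slot: always executed
    ("",   "next instruction", None),
    ("LA", "LA: instruction",  None),
]

def execution_order(program, taken, delayed, max_steps=20):
    """Return instruction texts in execution order for a given branch outcome."""
    labels = {label: i for i, (label, _, _) in enumerate(program) if label}
    order, pc, pending_target = [], 0, None
    while pc < len(program) and len(order) < max_steps:
        _, text, target = program[pc]
        order.append(text)
        next_pc = pc + 1
        if pending_target is not None:       # we just executed the delay slot
            next_pc, pending_target = pending_target, None
        if target is not None and taken:
            if delayed:
                pending_target = labels[target]  # jump after the next instruction
            else:
                next_pc = labels[target]         # jump immediately
        pc = next_pc
    return order

if __name__ == "__main__":
    for name, prog, delayed in (("conventional", CONVENTIONAL, False),
                                ("delayed", DELAYED, True)):
        for taken in (False, True):
            print(name, "taken" if taken else "not taken",
                  execution_order(prog, taken, delayed))
```

Both versions execute MOV EDX,ECX whether or not the branch is taken. Moving it into the delay slot is safe here because MOV changes neither the flags that JZ tests nor the registers that ADD uses.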

  26. Delayed Branch Example: Conventional Branch (prediction wrong)
     [Figure: timing diagram over time units 1 to 14 for MOV EDX,ECX, ADD EAX,[EBX], JZ LA, and instructions 1 to 4. The instructions fetched after JZ LA on the wrongly predicted path are only partly processed (FI, DI) and are discarded when the branch resolves; the lost cycles are marked as the branch penalty.]

  27. Delayed Branch Example: Delayed Branch (prediction wrong)
     [Figure: timing diagram over time units 1 to 14 for ADD EAX,[EBX], JZ LA, MOV EDX,ECX, and instructions 1 to 4. MOV EDX,ECX sits in the delay slot and is always executed, so it does useful work during cycles that would otherwise belong to the branch penalty.]

  28. Exercises
     Exercise 1: Assume the 6-stage pipeline shown in slide 7. Complete the graphs below, which represent the pipeline operation assuming a single-port memory. Hint: take into consideration the memory accesses for instruction fetch, operand fetch, and result write.
     PROGRAM 1 (time units 1 to 15):
         ADD EAX,[EBX+ESI]
         INC EBX
         DEC [ESI*2+EBP]
         MOV CX,[4768]
     PROGRAM 2 (time units 1 to 15):
         ADD [EBX+ESI],EAX
         INC EBX
         DEC [ESI*2+EBP]
         MOV CX,[4768]
