Pipelining
Raul Queiroz Feitosa
Parts of these slides are from the support material provided by W. Stallings.
Objective
To present the pipelining concept, its limitations, and the techniques for optimizing pipeline performance.
Outline
- Instruction Cycle
- Instruction Pipelining
- Pipeline Performance
- Pipeline Hazards
  - Resource
  - Data
  - Control
Instruction Cycle State Diagram
[Figure: instruction cycle state diagram. States: instruction address calculation, instruction fetch, instruction operation decoding, operand address calculation (with or without indirection), operand fetch (repeated for multiple operands), data operation, operand address calculation, operand store (repeated for multiple results), interrupt check, interrupt.]
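As a rough illustration, the Python sketch below lists the states one instruction passes through in this diagram. The function `instruction_cycle_trace` and its parameters are hypothetical; indirection is not modelled, and the operand and result loops are simply skipped when an instruction has none.

```python
def instruction_cycle_trace(n_operands, n_results, interrupt_pending=False):
    """Return the states visited while executing one instruction.

    Simplified reading of the state diagram: indirection is not modelled,
    and the operand/result loops are skipped when the count is zero.
    """
    states = [
        "instruction address calculation",
        "instruction fetch",
        "instruction operation decoding",
    ]
    for _ in range(n_operands):        # "multiple operands" loop
        states += ["operand address calculation", "operand fetch"]
    states.append("data operation")
    for _ in range(n_results):         # "multiple results" loop
        states += ["operand address calculation", "operand store"]
    states.append("interrupt check")
    if interrupt_pending:              # otherwise the next instruction cycle begins
        states.append("interrupt")
    return states


for state in instruction_cycle_trace(n_operands=1, n_results=1):
    print(state)
```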
Instruction Pipelining
- The instruction cycle is split into sequential steps.
- A specific hardware unit (a pipeline stage) is built to perform each step.
- The pipeline stages are arranged as a chain.
[Figure: pipeline stages 1, 2, ..., k connected in a chain.]
Instruction Pipelining - Example
- FI: fetch instruction
- DI: decode instruction
- CO: calculate operand (effective address)
- FO: fetch operand
- EI: execute instruction
- WO: write operand (result)
[Figure: six-stage chain FI → DI → CO → FO → EI → WO.]
Instruction Pipeline Operation
time           1  2  3  4  5  6  7  8  9  10 11 12 13 14
instruction 1  FI DI CO FO EI WO
instruction 2     FI DI CO FO EI WO
instruction 3        FI DI CO FO EI WO
instruction 4           FI DI CO FO EI WO
instruction 5              FI DI CO FO EI WO
instruction 6                 FI DI CO FO EI WO
instruction 7                    FI DI CO FO EI WO
instruction 8                       FI DI CO FO EI WO
instruction 9                          FI DI CO FO EI WO
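In the ideal case the diagram follows one rule: instruction i is in stage j during time unit i + j - 1, so n instructions need n + k - 1 time units. A minimal Python sketch that regenerates the diagram under that assumption (no hazards; `timing_diagram` and `render` are hypothetical helper names):

```python
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]


def timing_diagram(n_instructions, stages=STAGES):
    """Map (instruction, time unit) -> stage for an ideal pipeline with no hazards.

    Instruction i (1-based) occupies stage j (1-based) during time unit i + j - 1.
    """
    grid = {}
    for i in range(1, n_instructions + 1):
        for j, stage in enumerate(stages, start=1):
            grid[(i, i + j - 1)] = stage
    return grid


def render(n_instructions, stages=STAGES):
    """Text version of the timing diagram, one row per instruction."""
    grid = timing_diagram(n_instructions, stages)
    last = n_instructions + len(stages) - 1   # (n + k - 1) time units in total
    lines = ["time           " + " ".join(f"{t:<2}" for t in range(1, last + 1))]
    for i in range(1, n_instructions + 1):
        cells = " ".join(f"{grid.get((i, t), ''):<2}" for t in range(1, last + 1))
        lines.append(f"instruction {i:<3}" + cells.rstrip())
    return "\n".join(lines)


print(render(9))   # 9 instructions x 6 stages -> 14 time units
```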
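Pipeline Performance
Assume k stages with equal delays, $\tau = \tau_1 = \tau_2 = \dots = \tau_k$ ($\tau_i$ is the time delay of the $i$-th stage).
Let $T_{n,k}$ be the time for a pipeline with $k$ stages to execute $n$ instructions.
- Conventional machine: $T_{n,1} = nk\tau$
- Pipeline: the first instruction needs $k\tau$ to traverse all stages, and each of the remaining $n-1$ instructions completes $\tau$ later, so $T_{n,k} = k\tau + (n-1)\tau = (n+k-1)\tau$
The speedup is
$S_k = \dfrac{T_{n,1}}{T_{n,k}} = \dfrac{nk\tau}{(n+k-1)\tau} = \dfrac{nk}{n+k-1}$
For large $n$:
$\lim_{n\to\infty} S_k = \lim_{n\to\infty} \dfrac{nk}{n+k-1} = k$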
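The formulas can be checked numerically. A minimal Python sketch, assuming equal stage delays and using τ = 1 as the time unit (the function names are illustrative only):

```python
def conventional_time(n, k, tau=1.0):
    """T_{n,1} = n*k*tau: each instruction runs all k steps before the next one starts."""
    return n * k * tau


def pipeline_time(n, k, tau=1.0):
    """T_{n,k} = (n + k - 1)*tau: k*tau to fill the pipeline, then one result every tau."""
    return (n + k - 1) * tau


def speedup(n, k):
    """S_k = T_{n,1} / T_{n,k} = nk / (n + k - 1); tau cancels out."""
    return conventional_time(n, k) / pipeline_time(n, k)


for k in (6, 9, 12):
    print(f"k = {k:2}:", [round(speedup(n, k), 2) for n in (1, 10, 100, 10_000)])
```

The printed speedups start at 1 for a single instruction and approach k as n grows.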
Pipeline Performance: Speedup
[Figure: two panels. Speedup versus number of instructions (1 to 128, log scale) for k = 6, 9 and 12 stages; speedup versus number of stages (0 to 20) for n = 10, 20 and 30 instructions.]
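Pipeline Performance
The optimal performance is never reached because:
- The execution time differs from stage to stage ($\tau_1 \neq \tau_2 \neq \dots \neq \tau_k$), so every stage must be clocked at the pace of the slowest one.
- There is an additional time delay $d$ to latch the output of each stage, so the cycle time is $\tau = \max_i \tau_i + d$.
- Pipeline hazards stall the pipeline.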
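A small Python sketch of the first two effects, under stated assumptions: the stage delays are hypothetical values, and the non-pipelined machine is assumed to need the sum of the stage delays per instruction without any latch delay. Hazards are ignored.

```python
def cycle_time(stage_delays, latch_delay):
    """tau = max_i tau_i + d: the slowest stage sets the clock, plus the latch delay."""
    return max(stage_delays) + latch_delay


def realistic_speedup(n, stage_delays, latch_delay):
    """Speedup over a non-pipelined machine (assumed: sum of stage delays, no latches)."""
    k = len(stage_delays)
    unpipelined = n * sum(stage_delays)
    pipelined = (n + k - 1) * cycle_time(stage_delays, latch_delay)
    return unpipelined / pipelined


delays_ns = [10, 8, 12, 10, 9, 11]      # hypothetical delays for FI, DI, CO, FO, EI, WO
print(cycle_time(delays_ns, latch_delay=1))                 # 13
print(round(realistic_speedup(100, delays_ns, 1), 2))       # below the ideal value k = 6
```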
Pipeline Hazards
- In some cases a portion of the pipeline must stall because of so-called hazards.
- Such a stall is also called a pipeline bubble.
- Types of hazards:
  - Resource
  - Data
  - Control
Resource Hazards
- Resource hazards, also called structural hazards, occur when two or more instructions in the pipeline need the same resource at the same time, e.g., a single-port memory.
[Figure: two timing diagrams for instructions 1 to 4 over time units 1 to 14. With a single-port memory, later instructions are delayed by idle slots while the memory is used by earlier instructions.]
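The idle slots caused by a single-port memory can be reproduced with a small greedy scheduler. This Python sketch is an assumption-laden model, not the slide's method: FI always uses memory, the caller states which other stages (FO, WO) of each instruction use it, older instructions get the port first, and an instruction can enter a stage only after the previous instruction has left it. The function name `schedule` is hypothetical.

```python
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]


def schedule(memory_stages):
    """Assign a time unit (1-based) to every stage of every instruction.

    memory_stages[i] names the stages of instruction i that use the single-port
    memory; FI always does. Older instructions get the memory port first, and an
    instruction may enter a stage only after the previous instruction has left it.
    """
    busy = set()          # time units in which the memory port is already used
    table = []
    prev = None           # schedule of the previous instruction
    for stages_using_memory in memory_stages:
        uses_memory = set(stages_using_memory) | {"FI"}
        row = {}
        t = 0
        for j, stage in enumerate(STAGES):
            t += 1                                    # earliest: right after own previous stage
            if prev is not None:
                # the previous instruction leaves `stage` when it enters the next one
                leaves = prev[STAGES[j + 1]] if j + 1 < len(STAGES) else prev[stage] + 1
                t = max(t, leaves)
            if stage in uses_memory:
                while t in busy:                      # wait (idle) for the memory port
                    t += 1
                busy.add(t)
            row[stage] = t
        table.append(row)
        prev = row
    return table


# Hypothetical program: instruction 1 reads an operand from memory, the rest use registers.
for n, row in enumerate(schedule([{"FO"}, set(), set(), set()]), start=1):
    print(f"instruction {n}:", row)
```

In this example, instruction 4 cannot be fetched in time unit 4 because instruction 1 is fetching its operand from memory then, so it starts one time unit late.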
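Data Hazards
- A data hazard is a conflict in the access of an operand location.
- Two instructions are to be executed in sequence, and both access a particular memory or register operand. Executed strictly in sequence they cause no problem, but in a pipeline the operand could be read before it has been updated.
- Example: SUB ECX,EAX needs the value of EAX produced by ADD EAX,EBX.
[Figure: timing diagram (time units 1 to 14) for ADD EAX,EBX; SUB ECX,EAX; instruction 3; instruction 4. SUB cannot fetch its operand (FO) until ADD has written EAX (WO), so SUB and the instructions behind it are delayed by idle slots.]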
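A minimal Python sketch of read-after-write detection for this example, assuming the six-stage pipeline, no operand forwarding, and that FO may not share a time unit with the producer's WO. The `Instr` record (with the flags register abbreviated as FLAGS) and the helper names are hypothetical.

```python
from dataclasses import dataclass

STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]
FO, WO = STAGES.index("FO"), STAGES.index("WO")   # 3 and 5 (0-based)


@dataclass(frozen=True)
class Instr:
    text: str
    reads: frozenset
    writes: frozenset


def raw_hazard(producer, consumer):
    """Read after write: consumer reads a location that producer writes."""
    return bool(producer.writes & consumer.reads)


def stall_cycles(distance):
    """Idle time units for a consumer issued `distance` instructions after its producer.

    Without forwarding, the consumer's FO must come after the producer's WO:
    producer WO at p + WO, consumer FO at p + distance + FO (0-based stage indices).
    """
    return max(0, WO - FO + 1 - distance)


add = Instr("ADD EAX,EBX", reads=frozenset({"EAX", "EBX"}), writes=frozenset({"EAX", "FLAGS"}))
sub = Instr("SUB ECX,EAX", reads=frozenset({"ECX", "EAX"}), writes=frozenset({"ECX", "FLAGS"}))

if raw_hazard(add, sub):
    print(f"{sub.text} waits {stall_cycles(distance=1)} time units for {add.text}")
```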
Control Hazards
- Control hazards, also called branch hazards, occur because the address of the instruction that follows a conditional branch is known only when the branch is executed, late in the pipeline.
[Figure: timing diagrams (time units 1 to 14) for ADD EAX,EBX; JNZ ADDRESS; instruction 3; instruction 4, with idle slots after the branch until its outcome is known ("Known only here"). Annotation: "no memory conflict!"]
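Without any branch handling, every conditional branch costs a fixed number of time units. A Python sketch, assuming the fetch unit stops after each conditional branch and the outcome is known at the end of EI; the resolving stage is a parameter, and the function names are hypothetical.

```python
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]


def branch_penalty(resolve_stage="EI", stages=STAGES):
    """Time units lost per conditional branch when fetching stops until the branch
    outcome is known at the end of `resolve_stage`."""
    r = stages.index(resolve_stage) + 1   # 1-based position of the resolving stage
    # A branch fetched at time t is resolved at t + r - 1; the next fetch happens at
    # t + r instead of t + 1, so r - 1 fetch slots are lost.
    return r - 1


def total_time_units(n, n_branches, resolve_stage="EI", stages=STAGES):
    """(n + k - 1) time units for the ideal pipeline plus one penalty per branch."""
    return n + len(stages) - 1 + n_branches * branch_penalty(resolve_stage, stages)


print(branch_penalty())            # 4 with the branch resolved in EI
print(total_time_units(9, 1))      # 18 instead of 14
```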
Control Hazards: Dealing with Branches
- Multiple streams: two pipelines; each branch path is prefetched into a separate pipeline (IBM 370/168 and IBM 3033). One of the two pipelines always produces no useful work.
- Prefetch branch target: the target of the branch is prefetched in addition to the instructions following the branch, and is kept until the branch is executed (IBM 360/91).
- Loop buffer: a very fast memory maintained by the fetch stage of the pipeline; the buffer is checked before fetching from memory (CRAY-1).
- Branch prediction
- Delayed branching
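A loop buffer can be modelled as a small table of recently fetched instructions that is consulted before main memory. The Python sketch below is a simplified, hypothetical model (the class name, the FIFO eviction and the dictionary standing in for memory are all assumptions) that counts how many memory fetches a short loop saves.

```python
from collections import OrderedDict


class LoopBuffer:
    """Hypothetical loop buffer: holds the most recently fetched instructions by address."""

    def __init__(self, capacity, memory):
        self.capacity = capacity
        self.memory = memory            # dict address -> instruction, stands in for main memory
        self.buffer = OrderedDict()     # insertion order = fetch order
        self.hits = 0
        self.misses = 0

    def fetch(self, address):
        if address in self.buffer:      # found in the buffer: no memory access
            self.hits += 1
            return self.buffer[address]
        self.misses += 1                # not buffered: fetch from memory and keep a copy
        instruction = self.memory[address]
        self.buffer[address] = instruction
        if len(self.buffer) > self.capacity:
            self.buffer.popitem(last=False)   # drop the oldest entry (FIFO)
        return instruction


memory = {address: f"instruction {address}" for address in range(16)}
buffer = LoopBuffer(capacity=8, memory=memory)
for _ in range(10):                     # a 4-instruction loop executed 10 times
    for address in range(4):
        buffer.fetch(address)
print(buffer.hits, buffer.misses)       # 36 4
```

The buffer only helps when the whole loop fits in it; a loop longer than the capacity keeps evicting its own instructions.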
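Branch Prediction
Concept:
- Instead of delaying the fetch of the next instruction until the branch is resolved, the processor predicts which instruction comes next and fetches it.
- The results of the predicted instructions are stored in temporary registers.
- If the prediction is correct, the results are made definitive.
- If the prediction is incorrect, the results are flushed and fetching restarts from the correct address.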
Branch Prediction
Static methods:
- Predict “never taken” or “always taken”.
- Predict by opcode:
  - There are two codes for each branch instruction: 1 bit indicates “predict taken” or “predict not taken”.
  - The compiler analyses the code, guesses the branch outcome and generates the appropriate branch code; the processor follows the compiler's suggestion.
  - This implies code incompatibility with previous processors.
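A static prediction never changes at run time, so its accuracy is fixed by the branch's behaviour. A Python sketch, using a hypothetical outcome trace for a loop-closing branch, for which a compiler setting the opcode bit would naturally choose “predict taken”:

```python
def static_accuracy(outcomes, predict_taken):
    """Fraction of executions in which a fixed prediction matches the actual outcome."""
    correct = sum(1 for taken in outcomes if taken == predict_taken)
    return correct / len(outcomes)


# Hypothetical trace of a loop-closing branch: taken 9 times, not taken on loop exit.
loop_branch = [True] * 9 + [False]
print("always taken:", static_accuracy(loop_branch, predict_taken=True))    # 0.9
print("never taken: ", static_accuracy(loop_branch, predict_taken=False))   # 0.1
```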
Branch Prediction
Dynamic methods: the prediction is based on the recent history of the branch.
[Figure: Branch Prediction State Diagram with two “predict taken” states and two “predict not taken” states; transitions are labeled taken / not taken.]
[Figure: Branch History Table with one row per branch; columns: branch instruction address, target address, state.]
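A common way to implement a state diagram with two “predict taken” and two “predict not taken” states is a 2-bit saturating counter; the transitions in the slide's diagram may differ in detail. The Python sketch below pairs such a counter with a branch history table keyed by branch instruction address. Class names, method names and the trace are hypothetical.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken.
    From a 'strong' state (0 or 3) one wrong guess does not flip the prediction."""

    def __init__(self, state=0):
        self.state = state

    def predict_taken(self):
        return self.state >= 2

    def update(self, taken):
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)


class BranchHistoryTable:
    """Hypothetical BHT: branch instruction address -> [target address, predictor state]."""

    def __init__(self):
        self.entries = {}

    def next_fetch(self, branch_address, fall_through):
        """Address to fetch after the branch: predicted target or the next instruction."""
        entry = self.entries.get(branch_address)
        if entry is not None and entry[1].predict_taken():
            return entry[0]
        return fall_through

    def update(self, branch_address, taken, target):
        """Record the actual outcome once the branch has executed."""
        entry = self.entries.setdefault(branch_address, [target, TwoBitPredictor()])
        entry[0] = target
        entry[1].update(taken)


def mispredictions(outcomes, predictor):
    """Count wrong guesses while the predictor learns from each actual outcome."""
    misses = 0
    for taken in outcomes:
        if predictor.predict_taken() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# Hypothetical trace: a 10-iteration loop executed three times.
trace = ([True] * 9 + [False]) * 3
print(mispredictions(trace, TwoBitPredictor()))
```

After warming up, the counter misses only once per loop exit, because the single not-taken outcome does not push it out of the “predict taken” half.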
Delayed Branch
Concept:
- The branch takes effect only after the instruction that follows it has executed.
- This reduces the branch penalty.
In the figure, both versions are drawn for the taken and the not-taken path:
Conventional branch:
    MOV EDX,ECX
    ADD EAX,[EBX]
    JZ LA
    instruction
    ...
LA: instruction
    ...
Delayed branch:
    ADD EAX,[EBX]
    JZ LA
    MOV EDX,ECX      ; always executed
    instruction
    ...
LA: instruction
    ...
MOV EDX,ECX can fill the slot after JZ LA because it does not affect the flags tested by JZ, which are set by ADD.
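Filling the delay slot is normally the compiler's job. A Python sketch of one simple strategy, assumed here rather than taken from the slides: move the latest earlier instruction that has no read-after-write, write-after-read or write-after-write dependence with the instructions it moves past (including the branch) into the slot, otherwise insert a NOP. The `Instr` record, the register/flag sets (written by hand, with memory abbreviated as MEM) and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


NOP = Instr("NOP")


def independent(a, b):
    """True if a and b have no RAW, WAR or WAW dependence, so their order can be swapped."""
    return not (a.writes & b.reads or a.reads & b.writes or a.writes & b.writes)


def fill_delay_slot(block, branch):
    """Return the block followed by the branch and one delay-slot instruction."""
    for i in range(len(block) - 1, -1, -1):      # try the latest instruction first
        candidate = block[i]
        crossed = block[i + 1:] + [branch]       # instructions it would move past
        if all(independent(candidate, other) for other in crossed):
            return block[:i] + block[i + 1:] + [branch, candidate]
    return block + [branch, NOP]                 # nothing safe to move


mov = Instr("MOV EDX,ECX", reads=frozenset({"ECX"}), writes=frozenset({"EDX"}))
add = Instr("ADD EAX,[EBX]", reads=frozenset({"EAX", "EBX", "MEM"}), writes=frozenset({"EAX", "FLAGS"}))
jz = Instr("JZ LA", reads=frozenset({"FLAGS"}))

for instr in fill_delay_slot([mov, add], jz):
    print(instr.text)
```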
Delayed Branch Example: conventional branch (prediction wrong)
[Figure: timing diagram (time units 1 to 14) for MOV EDX,ECX; ADD EAX,[EBX]; JZ LA; instructions 1 to 4. The instructions fetched after JZ LA on the wrong prediction are discarded before completing (branch penalty), and execution continues from the correct address.]
Delayed Branch Example: delayed branch (prediction wrong)
[Figure: timing diagram (time units 1 to 14) for ADD EAX,[EBX]; JZ LA; MOV EDX,ECX (in the delay slot); instructions 1 to 4. MOV EDX,ECX completes normally; the wrongly fetched instructions after it are discarded (branch penalty) before execution continues from the correct address.]
Exercises
Exercise 1: Assume the six-stage pipeline (FI, DI, CO, FO, EI, WO) shown in slide 7. Complete the graphs below, which represent the pipeline operation, assuming a single-port memory.
Hint: take into consideration the memory accesses for instruction fetch, operand fetch, and result write.
PROGRAM 1 (time units 1 to 15)
    ADD EAX,[EBX+ESI]
    INC EBX
    DEC [ESI*2+EBP]
    MOV CX,[4768]
PROGRAM 2 (time units 1 to 15)
    ADD [EBX+ESI],EAX
    INC EBX
    DEC [ESI*2+EBP]
    MOV CX,[4768]