CENG 3420 Lecture 06: Pipeline
Bei Yu, byu@cse.cuhk.edu.hk
Spring 2020

Outline
- Pipeline Motivations
- Pipeline Hazards
- Exceptions
- Background: Flip-Flop Control Signals

Review: Instruction Critical Paths
- Calculate cycle time assuming negligible delays (for muxes, control unit, sign extend, PC access, shift left 2, wires) except:
  - Instruction and Data Memory (4 ns)
  - ALU and adders (2 ns)
  - Register File access (reads or writes) (1 ns)

Instr.   I Mem  Reg Rd  ALU Op  D Mem  Reg Wr  Total
R-type   4      1       2       -      1       8
load     4      1       2       4      1       12
store    4      1       2       4      -       11
beq      4      1       2       -      -       7
jump     4      -       -       -      -       4
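The totals in the table can be reproduced by summing the delays of the components each instruction class uses. The following is a minimal sketch; the names `DELAY_NS`, `USES`, and `critical_path_ns` are illustrative assumptions and do not come from the slides.

```python
# Component delays in ns, as listed on the slide.
DELAY_NS = {"I Mem": 4, "Reg Rd": 1, "ALU Op": 2, "D Mem": 4, "Reg Wr": 1}

# Components used by each instruction class, as in the table.
USES = {
    "R-type": ["I Mem", "Reg Rd", "ALU Op", "Reg Wr"],
    "load":   ["I Mem", "Reg Rd", "ALU Op", "D Mem", "Reg Wr"],
    "store":  ["I Mem", "Reg Rd", "ALU Op", "D Mem"],
    "beq":    ["I Mem", "Reg Rd", "ALU Op"],
    "jump":   ["I Mem"],
}

def critical_path_ns(instr):
    """Sum the delays of every component on this instruction's path."""
    return sum(DELAY_NS[component] for component in USES[instr])

totals = {instr: critical_path_ns(instr) for instr in USES}

# A single-cycle design must clock at the slowest instruction's delay.
single_cycle_clock_ns = max(totals.values())

for instr, total in totals.items():
    print(f"{instr:<7} {total:>2} ns")
print("single-cycle clock:", single_cycle_clock_ns, "ns")
```

Under these delays the load sets the clock period, so every other instruction class leaves part of each cycle unused.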

Review: Single Cycle Disadvantages & Advantages
- Uses the clock cycle inefficiently: the clock cycle must be timed to accommodate the slowest instruction
  - especially problematic for more complex instructions like floating point multiply
  [Figure: Clk over Cycle 1 and Cycle 2, showing lw, sw, and an interval marked "Waste"]
- May be wasteful of area, since some functional units (e.g., adders) must be duplicated because they cannot be shared during a clock cycle
but
- It is simple and easy to understand

How Can We Make It Faster?
- Start fetching and executing the next instruction before the current one has completed
  - Pipelining: (all?) modern processors are pipelined for performance
  - Remember the performance equation: CPU time = CPI * CC * IC
- Under ideal conditions and with a large number of instructions, the speedup from pipelining is approximately equal to the number of pipe stages
  - A five-stage pipeline is nearly five times faster because the CC is nearly five times shorter
- Fetch (and execute) more than one instruction at a time
  - Superscalar processing: stay tuned
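The claim that the speedup approaches the number of stages can be checked with a short derivation. This is a sketch under idealized assumptions that the slide does not state explicitly: the stages are perfectly balanced, there are no stalls, and the single-cycle clock equals the sum of the stage times. The symbols n (instruction count), k (number of stages), and t (time per stage) are introduced here for illustration.

```latex
\begin{align*}
T_{\text{single}} &= n \cdot k \cdot t
  && \text{each instruction takes one long cycle of } k\,t \\
T_{\text{pipe}}   &= (k + n - 1) \cdot t
  && k \text{ cycles for the first instruction, then one more per instruction} \\
\text{Speedup}    &= \frac{T_{\text{single}}}{T_{\text{pipe}}}
                   = \frac{n\,k}{k + n - 1}
                   \;\xrightarrow{\;n \to \infty\;}\; k
\end{align*}
```

For k = 5 and n = 1000 this gives 5000 / 1004, or about 4.98, which is why the slide says "nearly" five times.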

The Five Stages of Load Instruction

       Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5
  lw   IFetch   Dec      Exec     Mem      WB

- IFetch: Instruction Fetch and Update PC
- Dec: Registers Fetch and Instruction Decode
- Exec: Execute R-type; calculate memory address
- Mem: Read/write the data from/to the Data Memory
- WB: Write the result data into the register file

A Pipelined MIPS Processor
- Start the next instruction before the current one has completed
  - improves throughput: the total amount of work done in a given time
  - instruction latency (execution time, delay time, response time: the time from the start of an instruction to its completion) is not reduced

           Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7  Cycle 8
  lw       IFetch   Dec      Exec     Mem      WB
  sw                IFetch   Dec      Exec     Mem      WB
  R-type                     IFetch   Dec      Exec     Mem      WB

  - clock cycle (pipeline stage time) is limited by the slowest stage
  - some stages don't need the whole clock cycle (e.g., WB)
  - for some instructions, some stages are wasted cycles (i.e., nothing is done during that cycle for that instruction)

Single Cycle versus Pipeline
- Single Cycle Implementation (CC = 800 ps):
  [Figure: Clk over Cycle 1 and Cycle 2 with lw, sw, and an interval marked "Waste"; the figure also carries a 400 ps label]
- Pipeline Implementation (CC = 200 ps):
  [Figure: lw, sw, and R-type, each passing through IFetch, Dec, Exec, Mem, WB, staggered by one cycle]
- To complete an entire instruction in the pipelined case takes 1000 ps (as compared to 800 ps for the single cycle case). Why?
- How long does each take to complete 1,000,000 adds?
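The second question can be worked out numerically. The sketch below assumes the 1,000,000 adds are independent, so the pipeline never stalls; that is an assumption, not something the slide states. The function names are illustrative.

```python
def single_cycle_time_ps(n_instr, cc_ps=800):
    """Every instruction takes one full 800 ps cycle."""
    return n_instr * cc_ps

def pipelined_time_ps(n_instr, cc_ps=200, stages=5):
    """The first instruction needs `stages` cycles to drain; each later
    instruction completes one cycle after its predecessor (no stalls assumed)."""
    return (stages + n_instr - 1) * cc_ps

n = 1_000_000
t_single = single_cycle_time_ps(n)
t_pipe = pipelined_time_ps(n)
speedup = t_single / t_pipe

print("single cycle:", t_single, "ps")
print("pipelined:   ", t_pipe, "ps")
print("speedup:     ", round(speedup, 4))
```

With these numbers the speedup approaches 4 rather than 5: the pipelined clock is set by the slowest stage (200 ps), so five stages take 1000 ps while the single-cycle clock is only 800 ps.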

Pipelining the MIPS ISA
- What makes it easy
  - all instructions are the same length (32 bits)
    - can fetch in the 1st stage and decode in the 2nd stage
  - few instruction formats (three) with symmetry across formats
    - can begin reading register file in 2nd stage
  - memory operations occur only in loads and stores
    - can use the execute stage to calculate memory addresses
  - each instruction writes at most one result (i.e., changes the machine state) and does it in the last few pipeline stages (MEM or WB)
  - operands must be aligned in memory so a single data transfer takes only one data memory access

MIPS Pipeline Datapath Additions/Mods
- State registers between each pipeline stage to isolate them
[Figure: five-stage datapath (IF: IFetch, ID: Dec, EX: Execute, MEM: MemAccess, WB: WriteBack) with the state registers IF/ID, ID/EX, EX/MEM, and MEM/WB placed between the PC, Instruction Memory, Register File, Sign Extend (16 to 32), ALU, adders, Shift left 2, and Data Memory; all driven by the System Clock]

MIPS Pipeline Control Path Modifications
- All control signals can be determined during Decode
  - and held in the state registers between pipeline stages
[Figure: the pipelined datapath with a Control unit feeding ID/EX, EX/MEM, and MEM/WB; labeled signals include PCSrc, Branch, RegWrite, MemtoReg, ALUSrc, MemRead, ALUOp, RegDst, and the ALU cntrl block]

Pipeline Control
- IF Stage: read Instr Memory (always asserted) and write PC (on System Clock)
- ID Stage: no optional control signals to set

       |            EX Stage            |       MEM Stage        |     WB Stage
       | RegDst  ALUOp1  ALUOp0  ALUSrc | Brch  MemRead MemWrite | RegWrite MemtoReg
  R    | 1       1       0       0      | 0     0       0        | 1        0
  lw   | 0       0       0       1      | 0     1       0        | 1        1
  sw   | X       0       0       1      | 0     0       1        | 0        X
  beq  | X       0       1       0      | 1     0       0        | 0        X
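One way to picture how these signals travel is to treat the table as a lookup performed in Decode, with each stage consuming its own group and passing the rest on to the next pipeline register. The sketch below is only an illustration of that idea; `CONTROL`, `decode`, and `advance` are hypothetical names, and `None` stands in for the don't-care value X.

```python
X = None  # don't-care entry from the table

# Per instruction: the signal groups consumed in EX, MEM, and WB.
#   EX:  (RegDst, ALUOp1, ALUOp0, ALUSrc)
#   MEM: (Brch, MemRead, MemWrite)
#   WB:  (RegWrite, MemtoReg)
CONTROL = {
    "R":   {"EX": (1, 1, 0, 0), "MEM": (0, 0, 0), "WB": (1, 0)},
    "lw":  {"EX": (0, 0, 0, 1), "MEM": (0, 1, 0), "WB": (1, 1)},
    "sw":  {"EX": (X, 0, 0, 1), "MEM": (0, 0, 1), "WB": (0, X)},
    "beq": {"EX": (X, 0, 1, 0), "MEM": (1, 0, 0), "WB": (0, X)},
}

def decode(op):
    """Signals latched into ID/EX at the end of the Decode stage."""
    return dict(CONTROL[op])

def advance(signals, consumed_stage):
    """Drop the group just used; the remainder moves to the next register."""
    return {stage: bits for stage, bits in signals.items()
            if stage != consumed_stage}

id_ex = decode("lw")             # carries EX, MEM, and WB groups
ex_mem = advance(id_ex, "EX")    # carries MEM and WB groups
mem_wb = advance(ex_mem, "MEM")  # carries only the WB group

print(id_ex)
print(ex_mem)
print(mem_wb)
```

Each pipeline register therefore holds fewer control bits than the one before it, which matches the idea that the signals are determined once in Decode and held until their stage.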

Graphically Representing MIPS Pipeline
[Figure: one instruction drawn as the sequence IM, Reg, ALU, DM, Reg]
- Can help with answering questions like:
  - How many cycles does it take to execute this code?
  - What is the ALU doing during cycle 4?
  - Is there a hazard, why does it occur, and how can it be fixed?
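The first two questions can also be answered mechanically once each instruction is assigned a start cycle. The sketch below assumes one instruction is fetched per cycle with no stalls; the helper names and the three-instruction program (`lw`, `sw`, `add`) are hypothetical and chosen only for illustration.

```python
STAGES = ["IM", "Reg", "ALU", "DM", "Reg"]
ALU_INDEX = 2  # position of the ALU stage in STAGES

def stage_in_cycle(issue_cycle, cycle):
    """Stage occupied in `cycle` by an instruction fetched in `issue_cycle`
    (both 1-based), or None if the instruction is not in the pipeline."""
    idx = cycle - issue_cycle
    return STAGES[idx] if 0 <= idx < len(STAGES) else None

def total_cycles(n_instr):
    """Cycles needed to run n_instr instructions with no stalls."""
    return len(STAGES) + n_instr - 1

def alu_user(names, cycle):
    """Name of the instruction in the ALU stage during `cycle`, if any."""
    for i, name in enumerate(names):
        if cycle - (i + 1) == ALU_INDEX:
            return name
    return None

def diagram(names):
    """Text version of the pipeline diagram, one row per instruction."""
    n_cycles = total_cycles(len(names))
    rows = []
    for i, name in enumerate(names):
        cells = [stage_in_cycle(i + 1, c) or "" for c in range(1, n_cycles + 1)]
        rows.append(f"{name:<6}" + "".join(f"{cell:<5}" for cell in cells))
    return "\n".join(rows)

program = ["lw", "sw", "add"]  # hypothetical three-instruction program
print(diagram(program))
print("cycles:", total_cycles(len(program)))
print("ALU in cycle 4:", alu_user(program, 4))
```

The third question, hazard detection, needs register operands as well as timing, so it is not covered by this sketch.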

Other Pipeline Structures Are Possible
- What about the (slow) multiply operation?
  - make the clock twice as slow or ...
  - let it take two cycles (since it doesn't use the DM stage)
  [Figure: pipeline IM, Reg, ALU, DM, Reg with a MUL unit]
- What if the data memory access is twice as slow as the instruction memory?
  - make the clock twice as slow or ...
  - let data memory access take two cycles (and keep the same clock rate)
  [Figure: pipeline IM, Reg, ALU, DM1, DM2, Reg]

Other Sample Pipeline Alternatives
- ARM7 (stages IM, Reg, EX)
  - IM: PC update, IM access
  - Reg: decode, reg access
  - EX: ALU op, DM access, shift/rotate, commit result (write back)
- XScale (stages IM1, IM2, Reg, SHFT, ALU, DM1, DM2)
  - IM1: PC update, BTB access, start IM access
  - IM2: IM access
  - Reg: decode, reg 1 access
  - SHFT: shift/rotate, reg 2 access
  - ALU: ALU op
  - DM1: start DM access, exception
  - DM2: DM write, reg write

Why Pipeline? For Performance!
- Once the pipeline is full, one instruction is completed every cycle, so CPI = 1
[Figure: Inst 0 through Inst 4 in instruction order against time (clock cycles), each passing through IM, Reg, ALU, DM, Reg; the initial cycles are labeled "Time to fill the pipeline"]

Can Pipelining Get Us Into Trouble?
- Yes: Pipeline Hazards
  - structural hazards: a required resource is busy
  - data hazards: attempt to use data before it is ready
  - control hazards: deciding on control action depends on previous instruction
- Can usually resolve hazards by waiting
  - pipeline control must detect the hazard
  - and take action to resolve hazards

Structural Hazards
- Conflict for use of a resource
- In MIPS pipeline with a single memory
  - Load/store requires data access
  - Instruction fetch requires instruction access
- Hence, pipeline datapaths require separate instruction/data memories
  - Or separate instruction/data caches
- Since Register File

Resolve Structural Hazard 1
[Figure: lw and Inst 1 through Inst 4 in instruction order against time (clock cycles), each passing through Mem, Reg, ALU, Mem, Reg with a single shared memory; lw is reading data from memory in the same cycle in which a later instruction is reading its instruction from memory]
- Fix with separate instr and data memories (I$ and D$)
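The conflict in the figure can be located by checking, for every load or store, which instruction is being fetched in the cycle of its data access. The sketch below assumes a single shared memory, one instruction fetched per cycle starting at cycle 1, and no stalls; `memory_conflicts` is an illustrative name, and the `add` entries are hypothetical stand-ins for Inst 1 to Inst 4.

```python
MEM_STAGE_OFFSET = 3  # IF is offset 0, so MEM is three cycles after fetch

def memory_conflicts(program):
    """Return (cycle, fetched_index, accessing_index) for each cycle in which
    an instruction fetch and a load/store data access both need the memory.

    `program` is a list of mnemonics in program order (index 0 first)."""
    conflicts = []
    for i, op in enumerate(program):
        if op not in ("lw", "sw"):
            continue  # only loads and stores touch data memory
        mem_cycle = (i + 1) + MEM_STAGE_OFFSET  # fetch cycle is i + 1
        fetched = mem_cycle - 1                 # index fetched in that cycle
        if fetched < len(program):
            conflicts.append((mem_cycle, fetched, i))
    return conflicts

# lw followed by four filler instructions, as in the figure.
program = ["lw", "add", "add", "add", "add"]
print(memory_conflicts(program))
```

With separate instruction and data memories the two accesses go to different resources, so the check above would have nothing to report.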

Resolve Structural Hazard 2
- Fix register file access hazard by doing reads in the second half of the cycle and writes in the first half
[Figure: add $1, followed by Inst 1, Inst 2, and add $2,$1, in instruction order against time (clock cycles), each passing through IM, Reg, ALU, DM, Reg; one clock edge controls register writing and the other controls loading of the pipeline state registers]

Data Hazards: Register Usage
- Dependencies backward in time cause hazards
    add $1,
    sub $4,$1,$5
    and $6,$1,$7
    or  $8,$1,$9
    xor $4,$1,$5
  [Figure: the five instructions in order, each passing through IM, Reg, ALU, DM, Reg, one cycle after the previous one]
- Read before write data hazard
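Which of these instructions actually read $1 too early can be found by comparing each destination register with the sources of the next few instructions. The sketch below assumes the split-cycle register file (write in the first half, read in the second half), so only the two instructions immediately after a producer are at risk. The source operands of the first `add` are truncated on the slide; `$2` and `$3` are placeholders chosen here. For simplicity the check ignores the case where an intervening instruction overwrites the same register.

```python
def raw_hazards(program, window=2):
    """Return (producer_index, consumer_index, register) for each read that
    happens before the producer has written the register back.

    `program` is a list of (text, dest, sources) tuples in program order.
    `window` is how many following instructions read too early (2 when the
    register file writes in the first half-cycle and reads in the second)."""
    hazards = []
    for i, (_, dest, _) in enumerate(program):
        if dest is None:
            continue
        for j in range(i + 1, min(i + 1 + window, len(program))):
            if dest in program[j][2]:
                hazards.append((i, j, dest))
    return hazards

program = [
    ("add $1,$2,$3", "$1", ("$2", "$3")),  # sources assumed, not on the slide
    ("sub $4,$1,$5", "$4", ("$1", "$5")),
    ("and $6,$1,$7", "$6", ("$1", "$7")),
    ("or  $8,$1,$9", "$8", ("$1", "$9")),
    ("xor $4,$1,$5", "$4", ("$1", "$5")),
]

for producer, consumer, reg in raw_hazards(program):
    print(f"{program[consumer][0]} reads {reg} before "
          f"{program[producer][0]} writes it")
```

Under these assumptions `sub` and `and` are flagged, while `or` reads $1 in the same cycle the `add` writes it and `xor` reads it one cycle later.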

Data Hazards: Load Memory
- Dependencies backward in time cause hazards
    lw  $1,4($2)
    sub $4,$1,$5
    and $6,$1,$7
    or  $8,$1,$9
    xor $4,$1,$5
  [Figure: the five instructions in instruction order, each passing through IM, Reg, ALU, DM, Reg, one cycle after the previous one]
- Load-use data hazard
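If the hazard is resolved purely by waiting, the number of lost cycles can be estimated by delaying each instruction's Decode stage until every register it reads has been written back. The sketch below assumes in-order issue, no other hazard-resolution hardware, and the split-cycle register file, so a read in the same cycle as the write-back is allowed. `decode_cycles` is an illustrative name, not something defined in the lecture.

```python
def decode_cycles(program):
    """Return the cycle in which each instruction performs its Decode
    (register read), stalling until all of its sources are written back.

    `program` is a list of (dest, sources) tuples in program order."""
    id_cycles = []
    wb_cycle = {}  # register -> cycle of its most recent write-back
    prev_id = 1    # the first instruction is fetched in cycle 1
    for dest, sources in program:
        id_cycle = prev_id + 1  # earliest slot after the previous instruction
        for reg in sources:
            # Same-cycle read after write is fine (write first half, read second).
            id_cycle = max(id_cycle, wb_cycle.get(reg, 0))
        id_cycles.append(id_cycle)
        if dest is not None:
            wb_cycle[dest] = id_cycle + 3  # EX, MEM, then WB
        prev_id = id_cycle
    return id_cycles

program = [
    ("$1", ("$2",)),       # lw  $1,4($2)
    ("$4", ("$1", "$5")),  # sub $4,$1,$5
    ("$6", ("$1", "$7")),  # and $6,$1,$7
    ("$8", ("$1", "$9")),  # or  $8,$1,$9
    ("$4", ("$1", "$5")),  # xor $4,$1,$5
]

ids = decode_cycles(program)
stalls = ids[-1] - (len(program) + 1)  # the last Decode would be cycle 6 with no stalls
print("decode cycles:", ids)
print("stall cycles: ", stalls)
```

Here `sub` cannot read $1 until cycle 5, when `lw` writes it back, so two cycles are lost and every later instruction shifts by the same amount.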
