RISC Design: Beyond Pipelining
Virendra Singh, Associate Professor
Computer Architecture and Dependable Systems Lab
Department of Electrical Engineering
Indian Institute of Technology Bombay
http://www.ee.iitb.ac.in/~viren/
E-mail: viren@ee.iitb.ac.in
EE-739: Processor Design
Lecture 16 (14 Feb 2013)
CADSL
Single Lane Traffic 14 Feb 2013 EE-739@IITB 2 CADSL
Summary: Hazards
• Structural hazards
  – Cause: resource conflict
  – Remedies: (i) additional hardware resources, (ii) stall (bubble)
• Data hazards
  – Cause: data unavailability
  – Remedies: (i) forwarding, (ii) stall (bubble), (iii) code reordering
• Control hazards
  – Cause: out-of-sequence execution (branch or jump)
  – Remedies: (i) stall (bubble), (ii) branch prediction/pipeline flush, (iii) delayed branch/pipeline flush
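The data-hazard remedies can be made concrete with a small stall counter. This is a minimal sketch, not the course's pipeline: it assumes a classic 5-stage in-order pipeline (IF, ID, EX, MEM, WB), a register file written in the first half of a cycle and read in the second half, and full forwarding from the end of EX (ALU results) and the end of MEM (loads). The `Instr` record and the `entry_cycles` and `stall_cycles` helpers are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: destination register, source registers, load flag."""
    text: str
    dest: Optional[str]
    srcs: Tuple[str, ...]
    is_load: bool = False


def entry_cycles(program: List[Instr], forwarding: bool) -> List[int]:
    """Cycle in which each instruction enters IF, delaying consumers until operands are ready.

    Assumed 5-stage pipeline: an instruction entering IF in cycle t is in ID at t+1,
    EX at t+2, MEM at t+3 and WB at t+4.
    """
    earliest = {}  # register -> earliest IF cycle for an instruction that reads it
    starts: List[int] = []
    for ins in program:
        t = starts[-1] + 1 if starts else 0  # in order: at most one new instruction per cycle
        for reg in ins.srcs:
            t = max(t, earliest.get(reg, 0))
        starts.append(t)
        if ins.dest is not None:
            if not forwarding:
                # Consumer reads in ID (t'+1), which must not precede the producer's WB (t+4).
                earliest[ins.dest] = t + 3
            elif ins.is_load:
                # Load data exists after MEM (t+3), so the consumer's EX (t'+2) must be at t+4 or later.
                earliest[ins.dest] = t + 2
            else:
                # ALU result exists after EX (t+2), so the consumer's EX (t'+2) can follow directly.
                earliest[ins.dest] = t + 1
    return starts


def stall_cycles(program: List[Instr], forwarding: bool) -> int:
    """Bubbles inserted relative to the ideal n + 4 cycles."""
    starts = entry_cycles(program, forwarding)
    total = starts[-1] + 5  # last instruction leaves WB at its entry cycle + 4
    return total - (len(program) + 4)


original = [
    Instr("lw  r1, 0(r2)", "r1", ("r2",), is_load=True),
    Instr("add r3, r1, r4", "r3", ("r1", "r4")),
    Instr("sub r5, r3, r6", "r5", ("r3", "r6")),
    Instr("and r7, r8, r9", "r7", ("r8", "r9")),
]
# Code reordering: move the independent `and` into the load's shadow.
reordered = [original[0], original[3], original[1], original[2]]

if __name__ == "__main__":
    print("no forwarding:", stall_cycles(original, forwarding=False))
    print("forwarding:", stall_cycles(original, forwarding=True))
    print("forwarding + reordering:", stall_cycles(reordered, forwarding=True))
```

Running it prints 4, 1 and 0 stall cycles: forwarding leaves only the load-use bubble, and reordering the independent instruction hides that one too.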
Limits of Pipelining
• IBM RISC experience
  – Control and data dependences add 15% to the ideal CPI of 1
  – Best-case CPI of 1.15, i.e., IPC of 0.87
  – Deeper pipelines (higher frequency) magnify dependence penalties
• This analysis assumes 100% cache hit rates
  – Hit rates approach 100% for some programs
  – Many important programs have much worse hit rates
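The 15% is extra stall cycles per instruction, so the arithmetic is CPI = 1 + 0.15 and IPC = 1/CPI. A minimal sketch; the cache term (misses per instruction × miss penalty) is the usual textbook extension, and the 0.02 misses per instruction and 50-cycle penalty below are hypothetical values chosen only to show how quickly misses can swamp the dependence penalty.

```python
def effective_cpi(dependence_penalty: float,
                  misses_per_instr: float = 0.0,
                  miss_penalty_cycles: float = 0.0,
                  ideal_cpi: float = 1.0) -> float:
    """Ideal pipelined CPI plus dependence stalls plus memory stalls (all per instruction)."""
    return ideal_cpi + dependence_penalty + misses_per_instr * miss_penalty_cycles


if __name__ == "__main__":
    perfect_cache = effective_cpi(0.15)  # IBM RISC figure, 100% hit rate assumed
    print(f"CPI = {perfect_cache:.2f}, IPC = {1 / perfect_cache:.2f}")
    # Hypothetical memory behaviour: 0.02 misses per instruction, 50-cycle penalty.
    with_misses = effective_cpi(0.15, misses_per_instr=0.02, miss_penalty_cycles=50)
    print(f"CPI = {with_misses:.2f}, IPC = {1 / with_misses:.2f}")
```

Under these assumed miss numbers the memory term (1.0 cycle per instruction) is nearly seven times the dependence penalty, which is why the 100% hit-rate assumption matters.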
Processor Performance
Processor performance is determined by execution time per program (lower is better):
$$\frac{\text{Time}}{\text{Program}} = \underbrace{\frac{\text{Instructions}}{\text{Program}}}_{\text{code size}} \times \underbrace{\frac{\text{Cycles}}{\text{Instruction}}}_{\text{CPI}} \times \underbrace{\frac{\text{Time}}{\text{Cycle}}}_{\text{cycle time}}$$
• In the 1980s (decade of pipelining):
  – CPI: 5.0 => 1.15
• In the 1990s (decade of superscalar):
  – CPI: 1.15 => 0.5 (best case)
• In the 2000s (decade of multicore):
  – Marginal CPI improvement
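The three factors multiply, so a CPI change maps directly onto execution time only if instruction count and cycle time stay fixed. A minimal sketch under that simplifying assumption; the one-billion-instruction program and the 1 ns cycle are hypothetical, and only the CPI values come from the slide.

```python
def execution_time_s(instructions: float, cpi: float, cycle_time_s: float) -> float:
    """Iron law: (instructions/program) x (cycles/instruction) x (seconds/cycle)."""
    return instructions * cpi * cycle_time_s


if __name__ == "__main__":
    n_instr = 1e9  # hypothetical program size
    cycle = 1e-9   # hypothetical 1 ns cycle, held constant across eras
    for era, cpi in [("unpipelined", 5.0), ("pipelined", 1.15), ("superscalar, best case", 0.5)]:
        print(f"{era:>22}: CPI {cpi:4.2f} -> {execution_time_s(n_instr, cpi, cycle):.2f} s")
    print(f"pipelining speedup:  {5.0 / 1.15:.2f}x")
    print(f"superscalar speedup: {1.15 / 0.5:.2f}x")
```

With code size and cycle time frozen, the 1980s CPI drop alone is worth about 4.35x and the 1990s drop about 2.3x; in practice deeper pipelines also changed the cycle time, which this sketch deliberately ignores.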
Pipelined Performance Model
[Figure: pipeline depth (1 to N) versus time, split into fractions g and 1-g]
• g = fraction of time pipeline is filled
• 1-g = fraction of time pipeline is not filled (stalled)
Pipelined Performance Model
• Tyranny of Amdahl’s Law [Bob Colwell]
  – When g is even slightly below 100%, a big performance hit will result
  – Stalled cycles are the key adversary and must be minimized as much as possible
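Reading g in Amdahl's-law form, as the fraction of the work done while the N-deep pipeline is full (the remaining 1-g proceeds as if the depth were 1), the speedup over an unpipelined machine is S = 1 / ((1-g) + g/N). This formula is the standard statement of the model rather than something spelled out on the slide; the sketch assumes it and tabulates S for a hypothetical depth N = 10.

```python
def pipelined_speedup(g: float, depth: int) -> float:
    """Amdahl-style speedup: fraction g overlapped `depth` ways, fraction 1-g not overlapped."""
    if not 0.0 <= g <= 1.0:
        raise ValueError("g must be a fraction in [0, 1]")
    return 1.0 / ((1.0 - g) + g / depth)


if __name__ == "__main__":
    N = 10  # hypothetical pipeline depth
    for g in (1.0, 0.99, 0.95, 0.9, 0.8):
        print(f"g = {g:4.2f}: speedup = {pipelined_speedup(g, N):5.2f} (ideal {N})")
```

With N = 10, losing just 10% of the filled time (g = 0.9) cuts the speedup from 10 to about 5.3, which is the "tyranny" the slide refers to.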
Limits on Instruction-Level Parallelism (ILP)
• Weiss and Smith [1984]: 1.58
• Sohi and Vajapeyam [1987]: 1.81
• Tjaden and Flynn [1970]: 1.86 (Flynn’s bottleneck)
• Tjaden and Flynn [1973]: 1.96
• Uht [1986]: 2.00
• Smith et al. [1989]: 2.00
• Jouppi and Wall [1988]: 2.40
• Johnson [1991]: 2.50
• Acosta et al. [1986]: 2.79
• Wedig [1982]: 3.00
• Butler et al. [1991]: 5.8
• Melvin and Patt [1991]: 6
• Wall [1991]: 7 (Jouppi disagreed)
• Kuck et al. [1972]: 8
• Riseman and Foster [1972]: 51 (no control dependences)
• Nicolau and Fisher [1984]: 90 (Fisher’s optimism)
Superscalar Proposal
• Go beyond a single instruction pipeline; achieve IPC > 1
• Dispatch multiple instructions per cycle
• Provide a more generally applicable form of concurrency (not just vectors)
• Geared for sequential code that is otherwise hard to parallelize
• Exploit fine-grained, or instruction-level, parallelism (ILP)
Motivation for Superscalar [Agerwala and Cocke]
[Figure: speedup curves, with the typical range marked]
• Speedup jumps from 3 to 4.3 for N = 6 and f = 0.8 when s = 2 instead of s = 1 (scalar)
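The two quoted speedups follow from the Agerwala and Cocke model as it is usually written, S = 1 / ((1-f)/s + f/N), where f is the fraction of the program that can be vectorized and run N ways in parallel, and s is the speedup achieved on the remaining, non-vectorizable code (s = 1 for a scalar machine). The formula and this reading of the symbols are assumptions rather than something recovered from the slide's figure, but they reproduce the slide's numbers:

```python
def agerwala_cocke_speedup(f: float, n: int, s: float) -> float:
    """Speedup when fraction f runs n-wide and the rest runs s times faster than scalar."""
    return 1.0 / ((1.0 - f) / s + f / n)


if __name__ == "__main__":
    scalar = agerwala_cocke_speedup(f=0.8, n=6, s=1)
    superscalar = agerwala_cocke_speedup(f=0.8, n=6, s=2)
    print(f"s = 1: {scalar:.1f}")       # the slide's 3
    print(f"s = 2: {superscalar:.1f}")  # the slide's 4.3
```

Doubling the speed of only the non-vectorizable 20% raises the overall speedup by more than 40%, which is the case for executing ordinary sequential code faster rather than relying on vectors alone.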
Classifying ILP Machines [Jouppi, DECWRL 1991]
• Baseline scalar RISC
  – Issue parallelism = IP = 1
  – Operation latency = OP = 1
  – Peak IPC = 1
[Figure: successive instructions 1 to 6, each passing through IF, DE, EX, WB; x-axis: time in cycles (of baseline machine), 0 to 9]
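The baseline diagram can be regenerated from the definition IP = 1, OP = 1: instruction i enters stage s in cycle i + s. A minimal sketch; `timeline` and `render` are hypothetical helpers, and the four stage names are taken from the figure.

```python
from typing import Dict, List, Sequence

STAGES = ("IF", "DE", "EX", "WB")


def timeline(n_instr: int, stages: Sequence[str] = STAGES) -> List[Dict[int, str]]:
    """For each instruction, map cycle -> stage in an ideal scalar pipeline (one issue per cycle)."""
    return [{i + s: name for s, name in enumerate(stages)} for i in range(n_instr)]


def render(rows: List[Dict[int, str]]) -> str:
    """Text space-time diagram: one row per instruction, one column per cycle."""
    n_cycles = max(max(row) for row in rows) + 1
    lines = ["     " + " ".join(f"{c:>2}" for c in range(n_cycles))]
    for i, row in enumerate(rows, start=1):
        lines.append(f"i{i:<3} " + " ".join(f"{row.get(c, ''):>2}" for c in range(n_cycles)))
    return "\n".join(lines)


if __name__ == "__main__":
    rows = timeline(6)
    print(render(rows))
    n_cycles = max(max(row) for row in rows) + 1
    print(f"{len(rows)} instructions in {n_cycles} cycles, IPC = {len(rows) / n_cycles:.2f}")
```

Six instructions take 9 cycles here (IPC 0.67); the fill and drain cycles are amortized as the instruction count grows, so IPC approaches the peak of 1 but never exceeds it.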
Classifying ILP Machines [Jouppi, DECWRL 1991]
• Superpipelined: cycle time = 1/m of baseline
  – Issue parallelism = IP = 1 inst / minor cycle
  – Operation latency = OP = m minor cycles
  – Peak IPC = m instr / major cycle (m x speedup?)
[Figure: successive instructions 1 to 6 passing through IF, DE, EX, WB, staggered by one minor cycle]
Classifying ILP Machines [Jouppi, DECWRL 1991]
• Superscalar:
  – Issue parallelism = IP = n inst / cycle
  – Operation latency = OP = 1 cycle
  – Peak IPC = n instr / cycle (n x speedup?)
[Figure: successive instructions 1 to 9 passing through IF, DE, EX, WB, several per cycle]
Classifying ILP Machines [Jouppi, DECWRL 1991]
• VLIW: Very Long Instruction Word
  – Issue parallelism = IP = n inst / cycle
  – Operation latency = OP = 1 cycle
  – Peak IPC = n instr / cycle = 1 VLIW / cycle
[Figure: VLIW instructions passing through IF, DE, EX, WB]
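The four classes differ in how quickly a stream of independent instructions drains through the same four-stage pipeline. The sketch below measures completion time in baseline cycles under Jouppi's idealized parameters, assuming no dependences, no stalls, and perfect packing of VLIW words by the compiler. The degree m = 3, issue width n = 3 (called `width` in the code), the 12-instruction stream, and all helper names are hypothetical; `Fraction` keeps the superpipelined minor-cycle arithmetic exact.

```python
from fractions import Fraction
from math import ceil

STAGES = 4  # IF, DE, EX, WB, each one baseline cycle deep


def scalar_time(count: int) -> Fraction:
    """Baseline: fill the pipeline (STAGES - 1 cycles), then one completion per cycle."""
    return Fraction(STAGES + count - 1)


def superpipelined_time(count: int, m: int) -> Fraction:
    """Each stage split into m minor cycles, one issue per minor cycle; result in baseline cycles."""
    return Fraction(STAGES * m + count - 1, m)


def superscalar_time(count: int, width: int) -> Fraction:
    """Hardware issues up to `width` independent instructions together every cycle."""
    return Fraction(STAGES + ceil(count / width) - 1)


def vliw_time(count: int, width: int) -> Fraction:
    """Compiler packs `width` operations per long word; each word then flows like one scalar instruction."""
    return scalar_time(ceil(count / width))


if __name__ == "__main__":
    count = 12  # hypothetical stream of independent instructions
    results = {
        "baseline scalar": scalar_time(count),
        "superpipelined, m = 3": superpipelined_time(count, m=3),
        "superscalar, n = 3": superscalar_time(count, width=3),
        "VLIW, 3 ops/word": vliw_time(count, width=3),
    }
    for name, t in results.items():
        print(f"{name:>22}: {float(t):5.2f} baseline cycles, IPC {float(count / t):.2f}")
```

For long streams all three parallel designs approach a peak of 3 instructions per baseline cycle: the superpipelined machine gets there by shortening the cycle, the superscalar and VLIW machines by widening issue, with VLIW moving the packing decision from hardware to the compiler.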