RISC Design: Beyond Pipelining

Virendra Singh, Associate Professor
Computer Architecture and Dependable Systems Lab (CADSL)
Department of Electrical Engineering, Indian Institute of Technology Bombay
http://www.ee.iitb.ac.in/~viren/
E-mail: viren@ee.iitb.ac.in
EE-739: Processor Design, Lecture 16 (14 Feb 2013)

Single Lane Traffic 14 Feb 2013 EE-739@IITB 2 CADSL

Summary: Hazards
• Structural hazards
  – Cause: resource conflict
  – Remedies: (i) additional hardware resources, (ii) stall (bubble)
• Data hazards
  – Cause: data unavailability
  – Remedies: (i) forwarding, (ii) stall (bubble), (iii) code reordering
• Control hazards
  – Cause: out-of-sequence execution (branch or jump)
  – Remedies: (i) stall (bubble), (ii) branch prediction/pipeline flush, (iii) delayed branch/pipeline flush

Limits of Pipelining
• IBM RISC Experience
  – Control and data dependences add 15% to CPI
  – Best case CPI of 1.15, IPC of 0.87
  – Deeper pipelines (higher frequency) magnify dependence penalties
• This analysis assumes 100% cache hit rates
  – Hit rates approach 100% for some programs
  – Many important programs have much worse hit rates

Processor Performance

\[
\text{Processor Performance} = \frac{\text{Time}}{\text{Program}}
= \underbrace{\frac{\text{Instructions}}{\text{Program}}}_{\text{code size}}
\times \underbrace{\frac{\text{Cycles}}{\text{Instruction}}}_{\text{CPI}}
\times \underbrace{\frac{\text{Time}}{\text{Cycle}}}_{\text{cycle time}}
\]

• In the 1980’s (decade of pipelining):
  – CPI: 5.0 => 1.15
• In the 1990’s (decade of superscalar):
  – CPI: 1.15 => 0.5 (best case)
• In the 2000’s (decade of multicore):
  – Marginal CPI improvement
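As a rough illustration of the performance equation, the sketch below multiplies the three factors for the CPI values quoted on this slide. The instruction count and the cycle time are hypothetical placeholders, not figures from the lecture; only the CPI values come from the slide.

```python
def execution_time(instructions: float, cpi: float, cycle_time_s: float) -> float:
    """Time per program = instructions/program x cycles/instruction x time/cycle."""
    return instructions * cpi * cycle_time_s


def ipc(cpi: float) -> float:
    """Instructions per cycle is the reciprocal of cycles per instruction."""
    return 1.0 / cpi


# Hypothetical workload: 1e9 instructions on a 1 ns cycle (assumed values).
INSTRUCTIONS = 1e9
CYCLE_TIME_S = 1e-9

for label, cpi in [("pre-pipelining", 5.0), ("pipelined", 1.15), ("superscalar, best case", 0.5)]:
    t = execution_time(INSTRUCTIONS, cpi, CYCLE_TIME_S)
    print(f"{label}: CPI={cpi}, IPC={ipc(cpi):.2f}, time={t:.2f} s")
```

Holding code size and cycle time fixed, as this sketch does, isolates the effect of CPI; in practice the three factors interact, which is why deeper pipelines can shorten the cycle time while raising CPI.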

Pipelined Performance Model
[Figure: pipeline depth (1 to N) versus fraction of time (g and 1-g)]
• g = fraction of time pipeline is filled
• 1-g = fraction of time pipeline is not filled (stalled)


Pipelined Performance Model
• Tyranny of Amdahl’s Law [Bob Colwell]
  – When g is even slightly below 100%, a big performance hit will result
  – Stalled cycles are the key adversary and must be minimized as much as possible
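The sensitivity to g can be sketched numerically. The formula below is the Amdahl-style form usually associated with this model, speedup = 1 / ((1-g) + g/N); it is not spelled out in the extracted slide text, so treat it as an assumption, and the depth of 10 is a hypothetical value chosen for illustration.

```python
def pipelined_speedup(g: float, depth: int) -> float:
    """Speedup over an unpipelined machine for an N-deep pipeline.

    Assumes the pipeline runs N times faster while filled (fraction g of the
    time) and at unpipelined speed while stalled (fraction 1 - g).
    """
    if not 0.0 <= g <= 1.0:
        raise ValueError("g must lie in [0, 1]")
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return 1.0 / ((1.0 - g) + g / depth)


DEPTH = 10  # hypothetical pipeline depth
for g in (1.0, 0.99, 0.95, 0.90, 0.80):
    print(f"g={g:.2f}: speedup={pipelined_speedup(g, DEPTH):.2f}")
```

Under these assumptions, dropping g from 100% to 90% roughly halves the speedup of a 10-deep pipeline, which is the effect the slide calls the tyranny of Amdahl's Law.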

Limits on Instruction Level Parallelism (ILP)

Study                        ILP    Note
Weiss and Smith [1984]       1.58
Sohi and Vajapeyam [1987]    1.81
Tjaden and Flynn [1970]      1.86   (Flynn’s bottleneck)
Tjaden and Flynn [1973]      1.96
Uht [1986]                   2.00
Smith et al. [1989]          2.00
Jouppi and Wall [1988]       2.40
Johnson [1991]               2.50
Acosta et al. [1986]         2.79
Wedig [1982]                 3.00
Butler et al. [1991]         5.8
Melvin and Patt [1991]       6
Wall [1991]                  7      (Jouppi disagreed)
Kuck et al. [1972]           8
Riseman and Foster [1972]    51     (no control dependences)
Nicolau and Fisher [1984]    90     (Fisher’s optimism)

Superscalar Proposal
• Go beyond single instruction pipeline, achieve IPC > 1
• Dispatch multiple instructions per cycle
• Provide more generally applicable form of concurrency (not just vectors)
• Geared for sequential code that is hard to parallelize otherwise
• Exploit fine-grained or instruction-level parallelism (ILP)

Motivation for Superscalar [Agerwala and Cocke]
• Speedup jumps from 3 to 4.3 for N=6, f=0.8 when s=2 instead of s=1 (scalar)
[Figure: speedup curves with the "Typical Range" marked]
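The two speedup figures quoted above can be reproduced with a short calculation. The formula used here, speedup = 1 / ((1-f)/s + f/N), is the commonly cited form of this modified Amdahl's Law; it does not appear in the extracted slide text, so it is an assumption, as is the reading of f as the parallelizable fraction, N as the parallelism available in that fraction, and s as the parallelism available in the remainder.

```python
def superscalar_speedup(f: float, n: float, s: float) -> float:
    """Amdahl-style speedup with parallelism s in the non-parallel fraction.

    f: fraction of work that runs with parallelism n (assumed interpretation)
    s: parallelism achieved in the remaining (1 - f) fraction; s = 1 is scalar
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if n <= 0 or s <= 0:
        raise ValueError("n and s must be positive")
    return 1.0 / ((1.0 - f) / s + f / n)


# Values taken from the slide: N = 6, f = 0.8, s = 1 versus s = 2.
for s in (1, 2):
    print(f"s={s}: speedup={superscalar_speedup(0.8, 6, s):.1f}")
```

The point of the comparison is that a modest gain in the hard-to-parallelize fraction (s from 1 to 2) lifts overall speedup substantially, because that fraction dominates the denominator.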

Classifying ILP Machines [Jouppi, DECWRL 1991]
• Baseline scalar RISC
  – Issue parallelism = IP = 1
  – Operation latency = OP = 1
  – Peak IPC = 1
[Figure: successive instructions 1 to 6 versus time in cycles of the baseline machine (0 to 9); stages IF, DE, EX, WB]

Classifying ILP Machines [Jouppi, DECWRL 1991]
• Superpipelined: cycle time = 1/m of baseline
  – Issue parallelism = IP = 1 inst / minor cycle
  – Operation latency = OP = m minor cycles
  – Peak IPC = m instr / major cycle (m x speedup?)
[Figure: instructions 1 to 6 through stages IF, DE, EX, WB, issued one per minor cycle]

Classifying ILP Machines [Jouppi, DECWRL 1991]
• Superscalar:
  – Issue parallelism = IP = n inst / cycle
  – Operation latency = OP = 1 cycle
  – Peak IPC = n instr / cycle (n x speedup?)
[Figure: instructions 1 to 9 through stages IF, DE, EX, WB, issued n per cycle]

Classifying ILP Machines [Jouppi, DECWRL 1991]
• VLIW: Very Long Instruction Word
  – Issue parallelism = IP = n inst / cycle
  – Operation latency = OP = 1 cycle
  – Peak IPC = n instr / cycle = 1 VLIW / cycle
[Figure: VLIW instructions through stages IF, DE, EX, WB]
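To compare the machine classes above, the following sketch counts the baseline cycles needed to complete k independent instructions on an ideal four-stage pipeline (IF, DE, EX, WB). It assumes no hazards or stalls, so it shows peak behaviour only; the function names and the choice of m = n = 3 are illustrative assumptions, not part of the lecture.

```python
import math

STAGES = 4  # IF, DE, EX, WB


def baseline_cycles(k: int) -> int:
    """Scalar pipeline: one instruction issued per cycle."""
    return STAGES + k - 1


def superpipelined_cycles(k: int, m: int) -> float:
    """Degree-m superpipelining: one issue per minor cycle (1/m of a cycle).

    Each instruction still spends STAGES major cycles in flight, which is
    STAGES * m minor cycles. The result is converted back to major cycles.
    """
    return (STAGES * m + k - 1) / m


def superscalar_cycles(k: int, n: int) -> int:
    """Degree-n superscalar (or n-wide VLIW): n instructions issued per cycle."""
    return STAGES + math.ceil(k / n) - 1


k = 6
print("baseline:", baseline_cycles(k))
print("superpipelined (m=3):", round(superpipelined_cycles(k, 3), 2))
print("superscalar (n=3):", superscalar_cycles(k, 3))
```

For long instruction streams, both the superpipelined and the superscalar counts approach k/m and k/n cycles respectively, matching the peak IPC of m and n given on the slides; the question marks on "m x speedup?" and "n x speedup?" reflect that real code rarely sustains this peak.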
