Superscalar Design: An Introduction



SLIDE 1

CADSL

Superscalar Design:

An Introduction

Virendra Singh

Associate Professor Computer Architecture and Dependable Systems Lab Department of Electrical Engineering Indian Institute of Technology Bombay

http://www.ee.iitb.ac.in/~viren/ E-mail: viren@ee.iitb.ac.in

EE-739: Processor Design

Lecture 21 (05 March 2013)

SLIDE 2


EE-739@IITB

Cache: Advanced Optimizations

Small and simple first-level caches
Critical timing path:
– addressing tag memory, then
– comparing tags, then
– selecting correct set
Direct-mapped caches can overlap tag compare and transmission of data
Lower associativity reduces power because fewer cache lines are accessed
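The timing path above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the block size, the number of sets, and the names `split_address` and `DirectMappedCache` are assumptions chosen for the example. With one candidate block per index there is no way to select, which is why a direct-mapped cache can forward data while the tag compare is still in flight.

```python
# Hypothetical geometry, chosen only for illustration (not from the lecture).
BLOCK_BYTES = 64   # bytes per cache block
NUM_SETS = 128     # direct-mapped: exactly one block per set

OFFSET_BITS = BLOCK_BYTES.bit_length() - 1   # log2(64)  = 6
INDEX_BITS = NUM_SETS.bit_length() - 1       # log2(128) = 7


def split_address(addr):
    """Split a byte address into (tag, index, offset)."""
    offset = addr & (BLOCK_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class DirectMappedCache:
    """Tag store only; data is omitted because only hit/miss matters here."""

    def __init__(self):
        self.tags = [None] * NUM_SETS

    def access(self, addr):
        """Return True on a hit; on a miss, install the new tag."""
        tag, index, _ = split_address(addr)   # step 1: address the tag memory
        hit = self.tags[index] == tag         # step 2: compare tags
        if not hit:
            self.tags[index] = tag            # miss: replace the only candidate
        return hit                            # no step 3: there is no way to select
```

Two addresses that differ by `BLOCK_BYTES * NUM_SETS` bytes map to the same index with different tags, so they evict each other. That is the conflict-miss cost paid for the shorter hit path.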

05 Mar 2013 2

SLIDE 3


L1 Size and Associativity

Access time vs. size and associativity

SLIDE 4

L1 Size and Associativity

Energy per read vs. size and associativity

SLIDE 5

Way Prediction

To improve hit time, predict the way to pre-set the mux
– Mis-prediction gives longer hit time
Prediction accuracy
– > 90% for two-way
– > 80% for four-way
– I-cache has better accuracy than D-cache
First used on MIPS R10000 in mid-90s
Used on ARM Cortex-A8
Extend to predict block as well
– “Way selection”
– Increases mis-prediction penalty
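The trade-off between prediction accuracy and the longer mis-predicted hit can be put into a one-line expected-value calculation. The sketch below is illustrative only: the function name `avg_hit_time` and the cycle counts (1 cycle when the predicted way is right, 2 cycles when it is wrong) are assumptions, not figures from the lecture. Only the 90% and 80% accuracies come from the slide.

```python
def avg_hit_time(accuracy, predicted_cycles, mispredicted_cycles):
    """Expected hit time when a fraction `accuracy` of hits use the predicted way."""
    return accuracy * predicted_cycles + (1.0 - accuracy) * mispredicted_cycles


# Assumed latencies: 1 cycle on a correct prediction, 2 cycles otherwise.
two_way = avg_hit_time(0.90, 1, 2)    # roughly 1.1 cycles
four_way = avg_hit_time(0.80, 1, 2)   # roughly 1.2 cycles
```

Under these assumed latencies, the average hit stays close to the fast case as long as accuracy is high. A larger mis-prediction penalty, as with way selection, widens the gap.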

SLIDE 6

Pipelining Cache

Pipeline cache access to improve bandwidth
– Examples:
  Pentium: 1 cycle
  Pentium Pro – Pentium III: 2 cycles
  Pentium 4 – Core i7: 4 cycles
Increases branch mis-prediction penalty
Makes it easier to increase associativity

SLIDE 7

Multibanked Caches

Organize cache as independent banks to support simultaneous access
– ARM Cortex-A8 supports 1-4 banks for L2
– Intel i7 supports 4 banks for L1 and 8 banks for L2
Interleave banks according to block address
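Interleaving by block address is commonly done by taking the block address modulo the number of banks (sequential interleaving). The sketch below assumes that scheme. The function name `bank_of` and the 64-byte block size are assumptions for illustration. The default of 4 banks matches the L1 bank count quoted above.

```python
def bank_of(byte_addr, block_bytes=64, num_banks=4):
    """Bank index under sequential interleaving: block address mod number of banks."""
    block_addr = byte_addr // block_bytes
    return block_addr % num_banks


# Eight consecutive blocks spread evenly across the four banks.
banks = [bank_of(a) for a in range(0, 8 * 64, 64)]
```

Consecutive blocks land in different banks, so accesses to neighbouring blocks can proceed at the same time. Two accesses conflict only when their block addresses are congruent modulo the bank count.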

SLIDE 8

Wish list: Highway

SLIDE 9

Single Lane Traffic

SLIDE 10

Limits of Pipelining

IBM RISC Experience
– Control and data dependences add 15% to the ideal CPI of 1
– Best case CPI of 1.15, IPC of 0.87
– Deeper pipelines (higher frequency) magnify dependence penalties

This analysis assumes 100% cache hit rates
– Hit rates approach 100% for some programs
– Many important programs have much worse hit rates
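The CPI and IPC figures above are reciprocals of each other, and cache misses add stall cycles on top of the best case. The sketch below shows both steps. The function names are illustrative, and the miss rate and miss penalty in the second calculation are hypothetical values picked only to show the direction of the effect.

```python
def ipc_from_cpi(cpi):
    """IPC is the reciprocal of CPI."""
    return 1.0 / cpi


def cpi_with_misses(base_cpi, misses_per_instruction, miss_penalty_cycles):
    """Add average memory stall cycles per instruction to a base CPI."""
    return base_cpi + misses_per_instruction * miss_penalty_cycles


best_case_ipc = ipc_from_cpi(1.15)   # about 0.87, as quoted above

# Hypothetical workload: 0.02 misses per instruction, 20-cycle miss penalty.
realistic_cpi = cpi_with_misses(1.15, 0.02, 20)   # about 1.55
realistic_ipc = ipc_from_cpi(realistic_cpi)
```

Even a small miss rate moves the pipeline further from one instruction per cycle, which is why the 100% hit-rate assumption matters.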

SLIDE 11

Limits on Instruction Level Parallelism (ILP)

Weiss and Smith [1984]         1.58
Sohi and Vajapeyam [1987]      1.81
Tjaden and Flynn [1970]        1.86  (Flynn’s bottleneck)
Tjaden and Flynn [1973]        1.96
Uht [1986]                     2.00
Smith et al. [1989]            2.00
Jouppi and Wall [1988]         2.40
Johnson [1991]                 2.50
Acosta et al. [1986]           2.79
Wedig [1982]                   3.00
Butler et al. [1991]           5.8
Melvin and Patt [1991]         6
Wall [1991]                    7     (Jouppi disagreed)
Kuck et al. [1972]             8
Riseman and Foster [1972]      51    (no control dependences)
Nicolau and Fisher [1984]      90    (Fisher’s optimism)

SLIDE 12


Superscalar Proposal

Go beyond single instruction pipeline, achieve IPC > 1
Dispatch multiple instructions per cycle
Provide more generally applicable form of concurrency (not just vectors)
Geared for sequential code that is hard to parallelize otherwise
Exploit fine-grained or instruction-level parallelism (ILP)

SLIDE 13

Motivation for Superscalar [Agerwala and Cocke]

[Figure: speedup plot with a “Typical Range” annotation]

Speedup jumps from 3 to 4.3 for N = 6, f = 0.8 when s = 2 instead of s = 1 (scalar)
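The quoted numbers are consistent with an Amdahl-style model in which a fraction f of the work is sped up by a factor N and the remaining 1 − f is sped up by a factor s. The slide text does not state the formula, so the reading below is an assumption: it is a reconstruction that reproduces the figures 3 and 4.3, and the function name `speedup` is illustrative.

```python
def speedup(n, f, s):
    """Assumed model: fraction f runs n times faster, the rest runs s times faster."""
    return 1.0 / ((1.0 - f) / s + f / n)


scalar = speedup(n=6, f=0.8, s=1)        # about 3.0
superscalar = speedup(n=6, f=0.8, s=2)   # about 4.29, quoted as 4.3
```

Doubling the rate of the remaining 20% of the work lifts overall speedup by more than 40% in this model, because that slower portion dominates execution time once the rest has been accelerated.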

SLIDE 14

Classifying ILP Machines

[Jouppi, DECWRL 1991]

Baseline scalar RISC
– Issue parallelism = IP = 1
– Operation latency = OP = 1
– Peak IPC = 1

[Figure: pipeline diagram of successive instructions 1 to 6 through stages IF, DE, EX, WB; horizontal axis: time in cycles (of baseline machine), 1 to 9]
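The baseline machine's timing can be reproduced with a short sketch, assuming one instruction enters the pipeline per cycle and each stage takes one cycle (IP = 1, OP = 1). The function names `schedule` and `total_cycles` are illustrative and not taken from the lecture.

```python
STAGES = ["IF", "DE", "EX", "WB"]


def schedule(num_instructions, stages=STAGES):
    """For each instruction, map 1-based cycle number to the stage it occupies."""
    return [
        {i + k + 1: stage for k, stage in enumerate(stages)}
        for i in range(num_instructions)
    ]


def total_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles needed to finish all instructions with no stalls."""
    return num_stages + num_instructions - 1


timeline = schedule(6)
cycles = total_cycles(6)   # 4 + 6 - 1 = 9
```

Six instructions through four stages finish in nine cycles, which matches the six instructions and nine cycles in the figure. As the instruction count grows, the fill overhead is amortized and IPC approaches the peak of 1.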
