
Superscalar Design: An Introduction
Virendra Singh, Associate Professor
Computer Architecture and Dependable Systems Lab (CADSL)
Department of Electrical Engineering, Indian Institute of Technology Bombay
http://www.ee.iitb.ac.in/~viren/
E-mail: viren@ee.iitb.ac.in
EE-739: Processor Design, Lecture 23 (11 March 2013)

Superscalar Pipeline Stages
• In program order: Fetch → Instruction Buffer → Decode → Dispatch Buffer → Dispatch
• Out of order: Issuing Buffer → Execute → Completion Buffer
• In program order: Complete → Store Buffer → Retire
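The in-order / out-of-order / in-order structure of these stages can be illustrated with a minimal sketch. It is not taken from the lecture: the function name and the simplifying assumptions (independent instructions, all starting execution at cycle 0, no limit on completions per cycle) are hypothetical and chosen only to show that execution may finish out of order while completion still follows program order.

```python
def execute_and_complete(latencies: list[int]) -> tuple[list[int], list[int]]:
    """Return (order in which instructions finish executing,
    completion cycle of each instruction in program order).

    Assumptions (not from the slides): all instructions are independent,
    all start executing at cycle 0, and any number may complete per cycle.
    """
    # Out-of-order part: shorter-latency instructions finish executing first.
    finish_order = sorted(range(len(latencies)), key=lambda i: (latencies[i], i))

    # In-order part: an instruction may complete only after all older ones have.
    completion_cycle = []
    oldest_ready = 0
    for latency in latencies:
        oldest_ready = max(oldest_ready, latency)
        completion_cycle.append(oldest_ready)
    return finish_order, completion_cycle


# Program order: one long-latency instruction followed by two faster ones.
order, cycles = execute_and_complete([3, 1, 2])
print(order)   # [1, 2, 0]
print(cycles)  # [3, 3, 3]
```

In this toy run the two younger instructions finish executing first, but they wait in the completion buffer until the oldest instruction is done.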

Superscalar Architecture
• Wide pipelines for enhanced throughput
• ILP is not necessarily exploited just by widening the pipelines and adding more resources
• The processor's policies for fetching, decoding, and executing instructions have a significant effect on its ability to discover instructions that can be executed concurrently
• The instruction issue policy limits or enhances performance because it determines the processor's lookahead capability

Highway [figure]

Bad Traffic [figure]

Instruction Flow
• Objective: Fetch multiple instructions per cycle
• Challenges:
– Branches: control dependences
– Branch target misalignment
– Instruction cache misses
[Figure: PC addressing the instruction memory; 3 instructions fetched]

Instruction Fetch
• Fetch s instructions from the I-cache
• The I-cache must be wide enough that each row of the I-cache array can store s instructions and that an entire row can be accessed at once
• Fetch width = row width
• Assume the access latency is 1 cycle
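A small sketch may help show why fetch width equal to row width is sensitive to alignment. The function below is hypothetical and assumes that `pc` is an instruction index (not a byte address) and that one access reads exactly one row of `s` instructions without crossing into the next row.

```python
def fetch_group(pc: int, s: int) -> list[int]:
    """Instruction indices delivered in one cycle when fetching from `pc`.

    Assumption: the fetch unit reads exactly one I-cache row of `s`
    instructions per access and cannot cross a row boundary.
    """
    row_start = (pc // s) * s        # first instruction slot of the row holding pc
    row_end = row_start + s          # one past the last slot of that row
    return list(range(pc, row_end))  # slots before pc in the row go unused


# Aligned fetch address: the whole row is useful.
print(fetch_group(8, 4))  # [8, 9, 10, 11]
# Misaligned branch target: only 3 of the 4 slots are useful.
print(fetch_group(5, 4))  # [5, 6, 7]
```

With a row of four instructions, a branch target in the second slot of a row yields only three useful instructions in that cycle, which is the misalignment problem listed among the instruction-flow challenges.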

I-Cache Organization
[Figure: I-cache array with tags; 1 cache line = 1 physical row]

I-Cache Organization
[Figure: I-cache array with tags; 1 cache line = 2 physical rows]

Instruction Flow
• Objective: Fetch multiple instructions per cycle
• Challenges:
– Branches: control dependences
– Branch target misalignment
– Instruction cache misses
• Solutions:
– Code alignment (static vs. dynamic)
– Prediction/speculation

Fetch Alignment [figure]

Instruction Fetch
• 2-way set-associative I-cache with a line size of 16 instructions (64 bytes)
• Each row of the I-cache stores 4 associative sets (two per set) of instructions
• Each line of the I-cache spans four physical rows
• The physical I-cache array is actually composed of 4 independent sub-arrays
• One instruction can be accessed from each sub-array
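One way four independent sub-arrays can deliver four consecutive instructions from a misaligned address is for each sub-array to compute its own row index. The sketch below is an illustration under stated assumptions, not a description of the actual hardware: it assumes instruction i sits in sub-array i mod 4 at row i div 4, and it ignores associativity and cache-line boundaries. The function names are hypothetical.

```python
NUM_SUBARRAYS = 4  # from the slide: 4 independent sub-arrays


def subarray_rows(fetch_index: int) -> list[tuple[int, int, int]]:
    """For each sub-array, return (sub-array, row read, instruction index).

    Assumption: instruction i lives in sub-array i % 4 at row i // 4.
    """
    start_sub = fetch_index % NUM_SUBARRAYS
    base_row = fetch_index // NUM_SUBARRAYS
    plan = []
    for sub in range(NUM_SUBARRAYS):
        # Sub-arrays to the left of the start position read the next row,
        # so the four accesses together cover consecutive instructions.
        row = base_row + 1 if sub < start_sub else base_row
        plan.append((sub, row, row * NUM_SUBARRAYS + sub))
    return plan


def fetched_instructions(fetch_index: int) -> list[int]:
    """Instruction indices obtained in one access, in program order."""
    return sorted(instr for _, _, instr in subarray_rows(fetch_index))


print(subarray_rows(6))         # [(0, 2, 8), (1, 2, 9), (2, 1, 6), (3, 1, 7)]
print(fetched_instructions(6))  # [6, 7, 8, 9]
```

Starting at instruction 6, sub-arrays 2 and 3 read row 1 while sub-arrays 0 and 1 read row 2, so four instructions are still obtained in a single access.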

RIOS-I Fetch Hardware [figure]

Issues in Decoding
• Primary tasks:
– Identify individual instructions (!)
– Determine instruction types
– Determine dependences between instructions
• Two important factors:
– Instruction set architecture
– Pipeline width
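Why identifying individual instructions depends on the instruction set architecture can be sketched with a toy variable-length encoding. The encoding here is invented for illustration (the low 2 bits of an instruction's first byte give its length minus one) and does not correspond to any real ISA.

```python
def find_boundaries(code: bytes) -> list[int]:
    """Return the byte offset at which each instruction starts.

    Toy ISA assumption (hypothetical): the low 2 bits of the first byte
    of an instruction encode its length minus one, so 1 to 4 bytes.
    """
    starts = []
    pos = 0
    while pos < len(code):
        starts.append(pos)
        length = (code[pos] & 0b11) + 1
        # The start of the next instruction is known only after this
        # instruction's length has been decoded: a serial dependence.
        pos += length
    return starts


stream = bytes([0x00, 0x02, 0xAA, 0xBB, 0x01, 0xCC, 0x03, 0x01, 0x02, 0x03])
print(find_boundaries(stream))  # [0, 1, 4, 6]
```

With fixed-length instructions every boundary is known from the fetch address alone, so a wide decoder can work on all slots in parallel. With a variable-length encoding, each boundary depends on the previous instruction, and that dependence grows more costly as the pipeline gets wider.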

Pentium Pro Fetch/Decode [figure]

Thank You

