CS422 Computer Architecture Spring 2004 Lecture 18, 26 Feb 2004

CS422 Computer Architecture Spring 2004 Lecture 18, 26 Feb 2004 Bhaskaran Raman Department of CSE IIT Kanpur http://web.cse.iitk.ac.in/~cs422/index.html

Memory Hierarchy ● Two principles: – Smaller is faster – Principle of locality ● Processor speed grows much faster than memory speed ● Levels: registers, cache, memory, disk – Upper level vs. lower level ● Cache design

Cache Design Questions ● Cache is arranged in terms of blocks – To take advantage of spatial locality ● Design choices: – Q1: block placement – where to place a block in upper level? – Q2: block identification – how to find a block in upper level? – Q3: block replacement – which block to replace on a miss? – Q4: write strategy – what happens on a write?

Block Placement: Fully Associative ● Block 11 can go anywhere in the cache [Figure: cache blocks and memory blocks, with memory block 11 highlighted; memory axis labelled 0, 8, 16, 24]

Block Placement: Direct ● Block 11 can go only in cache block number 11 mod 8 = 3 [Figure: cache blocks and memory blocks, with memory block 11 highlighted; memory axis labelled 0, 8, 16, 24]

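Block Placement: Set Associative ● Block 11 can go anywhere in set number 11 mod 4 = 3 [Figure: cache blocks grouped into sets and memory blocks, with memory block 11 highlighted; memory axis labelled 0, 8, 16, 24]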

Continuum of Choices ● Memory has n blocks, cache has m blocks ● Fully associative is the same as set associative with one set (m-way set associative) ● Direct placement is the same as 1-way set associative (with m sets) ● Most processors use direct-mapped, 2-way, or 4-way set-associative caches
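The three placement schemes can be seen as one rule with different associativity. The sketch below is illustrative only: the function name `candidate_slots` is hypothetical, and it assumes that set s occupies the contiguous cache blocks s*k through s*k + k - 1, which the slides do not specify.

```python
def candidate_slots(block, m, k):
    """Return the cache block indices where memory block `block` may be placed.

    m is the number of cache blocks, k the associativity (blocks per set).
    Assumption: set s occupies cache blocks s*k .. s*k + k - 1.
    """
    if k <= 0 or m % k != 0:
        raise ValueError("k must be positive and divide m")
    num_sets = m // k
    set_number = block % num_sets  # placement rule: block number mod number of sets
    start = set_number * k
    return list(range(start, start + k))


# Memory block 11 in an 8-block cache, as on the placement slides.
direct = candidate_slots(11, m=8, k=1)   # 8 sets of 1 block: set 11 mod 8
two_way = candidate_slots(11, m=8, k=2)  # 4 sets of 2 blocks: set 11 mod 4
fully = candidate_slots(11, m=8, k=8)    # 1 set of 8 blocks: anywhere
```

Setting k = 1 gives direct placement and k = m gives fully associative placement, which is the continuum described above.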

Block Identification ● How many different blocks of memory can be mapped (at different times) to a cache block? ● Fully associative: n ● Direct: n/m ● k-way set associative: k*n/m ● Each cache block has a tag saying which block of memory is currently present in it – A valid bit is set to 0 if no memory block is in the cache block currently

Block Identification (continued) ● How many bits for the tag? $\log_2(kn/m)$ ● How many sets in cache? $m/k$ ● How many bits to identify the correct set? $\log_2(m/k)$

Block Identification (continued) ● How many blocks in memory? $n$, so $\log_2 n$ bits represent a block number in memory ● Given a memory address, the fields are: – Tag: $\log_2 k + \log_2 n - \log_2 m$ bits – Index: $\log_2 m - \log_2 k$ bits – Block offset: $\log_2(\text{block-size})$ bits ● Select set using index, block from set using tag ● Select location from block using block offset ● tag + index = block address
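The field widths above translate directly into shifts and masks. The following is a minimal sketch, assuming n, m, k, and the block size are all powers of two and addresses are byte addresses; the function names are hypothetical and not from the lecture.

```python
def log2_exact(x):
    """Return log2(x) for a power of two, else raise."""
    if x <= 0 or (x & (x - 1)) != 0:
        raise ValueError(f"{x} is not a power of two")
    return x.bit_length() - 1


def field_widths(n, m, k, block_size):
    """Return (tag_bits, index_bits, offset_bits) for n memory blocks,
    m cache blocks, k-way associativity, and block_size bytes per block."""
    index_bits = log2_exact(m) - log2_exact(k)   # log2(m/k)
    tag_bits = log2_exact(n) - index_bits        # log2(k*n/m)
    offset_bits = log2_exact(block_size)
    return tag_bits, index_bits, offset_bits


def split_address(addr, n, m, k, block_size):
    """Split a byte address into (tag, index, block_offset)."""
    tag_bits, index_bits, offset_bits = field_widths(n, m, k, block_size)
    offset = addr & (block_size - 1)
    block_address = addr >> offset_bits          # tag and index together
    index = block_address & ((1 << index_bits) - 1)
    tag = block_address >> index_bits
    return tag, index, offset


# Byte 5 of memory block 11, with 32 memory blocks, an 8-block 2-way cache,
# and 16-byte blocks (all sizes chosen only for illustration).
tag, index, offset = split_address(11 * 16 + 5, n=32, m=8, k=2, block_size=16)
```

Here the index comes out as 11 mod 4 = 3, matching the set-associative placement rule, and the tag is the remaining upper bits of the block address.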

Block Replacement Policy ● Cache miss ==> bring block into the cache – What if no free block in set? – Need to replace a block ● Possible policies: – Random – Least-Recently Used (LRU) ● LRU gives a lower miss rate, but is harder to implement
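To make the LRU policy concrete, here is a small software sketch of one cache set. It is a model for reasoning about hits and misses, not a description of how hardware tracks recency; the class name `LRUSet` is hypothetical.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set with `ways` blocks and least-recently-used replacement."""

    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()  # tag -> None, least recently used first

    def access(self, tag):
        """Access the block with this tag; return True on a hit."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)      # mark as most recently used
            return True
        if len(self.blocks) >= self.ways:
            self.blocks.popitem(last=False)   # no free block: evict the LRU one
        self.blocks[tag] = None
        return False


# A 2-way set accessed with tags 1, 2, 1, 3, 2.
lru_set = LRUSet(ways=2)
hits = [lru_set.access(t) for t in (1, 2, 1, 3, 2)]
```

In this trace the access to tag 3 evicts tag 2 (tag 1 was used more recently), so the final access to tag 2 misses again.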

Replacement Policy Performance [Figure: cache miss rate (axis 0% to 6%) for 2-way, 4-way, and 8-way set-associative caches under LRU and random replacement, at cache sizes of 16KB, 64KB, and 256KB]

Write Strategy ● Reads are dominant – All instruction accesses are reads – Even for data, loads dominate over stores ● Reads can be fast – Can read the block while performing tag comparison – Cannot do the same with writes ● Should pay attention to write performance too!

When do Writes go to Memory? ● Write through: each write also goes to memory – Easier to implement ● Write back: write to memory only when block is replaced – Faster writes – Some writes do not go to memory at all! – But, read miss may cause more delay ● Block being replaced has to be written back ● Optimize using dirty bit – Also, memory can hold stale data, which is bad for multiprocessors and I/O
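The difference in memory traffic between the two policies can be illustrated by counting the writes that reach memory for a sequence of stores. This is a simplified sketch under stated assumptions: a direct-mapped cache, a block allocated on every store, and dirty blocks still in the cache at the end not counted. The function name is hypothetical.

```python
def memory_writes(store_blocks, num_cache_blocks, write_back):
    """Count writes that reach memory for stores to the given memory blocks.

    Assumptions: direct-mapped cache, block allocated on every store,
    dirty blocks left in the cache at the end are not counted.
    """
    resident = {}  # cache index -> (memory block, dirty bit)
    count = 0
    for block in store_blocks:
        index = block % num_cache_blocks
        if not write_back:
            count += 1                        # write through: every store goes to memory
            resident[index] = (block, False)
            continue
        old = resident.get(index)
        if old is not None and old[0] != block and old[1]:
            count += 1                        # dirty victim is written back on replacement
        resident[index] = (block, True)       # store only sets the dirty bit
    return count


# Three stores to block 0, then blocks 4 and 0, which conflict in a 4-block cache.
stores = [0, 0, 0, 4, 0]
through = memory_writes(stores, num_cache_blocks=4, write_back=False)
back = memory_writes(stores, num_cache_blocks=4, write_back=True)
```

The repeated stores to block 0 are absorbed by the cache under write back; only the two replacements of a dirty block cause memory writes.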

Write Stalls ● In write through, may have to stall waiting for write to complete – Called a write stall – Can employ a write buffer to enable the processor to proceed during the write-through

What to do on a Write Miss? ● Write-allocate (or, fetch on write): load the block into the cache on a write miss ● No-write-allocate (or, write around): just write directly to main memory, without loading the block ● Write-allocate usually goes with write-back, and no-write-allocate goes with write-through
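A short trace shows how the write-miss policy changes the miss count. The sketch below assumes a cache large enough that nothing is ever evicted, and treats each operation as touching a single block; the function name and the operation encoding are hypothetical.

```python
def count_misses(ops, write_allocate):
    """Count misses for a list of ("R" or "W", block) operations.

    Assumption: the cache starts empty and never evicts anything.
    """
    cached = set()
    misses = 0
    for kind, block in ops:
        if block in cached:
            continue                 # hit
        misses += 1
        if kind == "R" or write_allocate:
            cached.add(block)        # read misses always load; write misses only if allocating
    return misses


ops = [("W", 100), ("W", 100), ("R", 200), ("W", 200), ("W", 100)]
misses_no_allocate = count_misses(ops, write_allocate=False)
misses_allocate = count_misses(ops, write_allocate=True)
```

With no-write-allocate, every store to block 100 misses because the block is never brought in; with write-allocate, only the first access to each block misses.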

The Alpha AXP 21064 Cache ● 34-bit physical address – 29 bits for block address – 5 bits for block offset ● 8 KB cache, direct-mapped – 8 bits for index – 29 – 8 = 21 bits for tag
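The bit counts on this slide follow from the cache parameters. As a check, the sketch below recomputes them; the 32-byte block size is inferred from the 5-bit block offset, and the variable names are illustrative.

```python
ADDRESS_BITS = 34          # physical address width
CACHE_BYTES = 8 * 1024     # 8 KB, direct-mapped
BLOCK_BYTES = 32           # inferred from the 5-bit block offset

offset_bits = BLOCK_BYTES.bit_length() - 1        # log2(32)
block_address_bits = ADDRESS_BITS - offset_bits   # bits naming a memory block
num_cache_blocks = CACHE_BYTES // BLOCK_BYTES     # one block per set when direct-mapped
index_bits = num_cache_blocks.bit_length() - 1    # log2(number of sets)
tag_bits = block_address_bits - index_bits
```

The cache holds 8 KB / 32 B = 256 blocks, which is why 8 index bits are needed and 21 bits remain for the tag.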

Steps in Memory Read ● Four steps: – Step-1: CPU puts out the address – Step-2: Index selection – Step-3: Tag comparison, read from data – Step-4: Data returned to CPU (assuming hit) ● This takes two cycles

Steps in Memory Write ● Write-through policy is used ● Write buffer with four entries – Each entry can have up to 4 words from the same block – Write merging: successive writes to the same block use the same write-buffer entry
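Write merging can be sketched as a small buffer keyed by block number. This is a simplified model, not the 21064's actual logic: it merges a store with any buffered entry for the same block, does not model draining entries to memory, and uses hypothetical names throughout.

```python
class WriteBuffer:
    """Write buffer with merging: a few entries, each holding words of one block."""

    def __init__(self, entries=4, words_per_entry=4):
        self.entries = entries
        self.words_per_entry = words_per_entry
        self.slots = []  # list of (block number, {word offset: value}), oldest first

    def write(self, block, word, value):
        """Buffer a store; return False if the buffer is full (a write stall)."""
        if not 0 <= word < self.words_per_entry:
            raise ValueError("word offset outside the entry")
        for buffered_block, words in self.slots:
            if buffered_block == block:
                words[word] = value          # merge into the existing entry
                return True
        if len(self.slots) >= self.entries:
            return False                     # no free entry: processor must wait
        self.slots.append((block, {word: value}))
        return True


wb = WriteBuffer()
accepted = [wb.write(7, w, w * 10) for w in range(4)]  # four words of block 7
entries_used = len(wb.slots)
```

Without merging, the four stores to block 7 would occupy all four entries; with merging they share one entry and leave three free.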

Some More Details ● What happens on a miss? – Cache sends signal to CPU asking it to wait – No replacement policy required (direct mapped) – Write miss ==> write-around ● 8KB separate instruction cache

Separate versus Unified Cache ● Direct-mapped cache, 32-byte blocks, SPEC92, on DECstation 5000 ● Unified cache has twice the size of I-cache or D-cache ● 75% instruction references
Miss rates:
Size    I-Cache  D-Cache  U-Cache
1KB     3.06%    24.61%   13.34%
2KB     2.26%    20.57%   9.78%
4KB     1.78%    15.94%   7.24%
8KB     1.10%    10.19%   4.57%
16KB    0.64%    6.47%    2.87%
32KB    0.39%    4.82%    1.99%
64KB    0.15%    3.77%    1.35%
128KB   0.02%    2.88%    0.95%
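One way to read this table is to weight the I-cache and D-cache miss rates by the reference mix and compare the result with a unified cache of twice the size. The sketch below does that arithmetic; it assumes each row gives miss rates for caches of that row's size and that the remaining 25% of references are data references. The names are illustrative.

```python
# Miss rates in percent, copied from the table: size -> (I-cache, D-cache, unified).
MISS_RATES = {
    "8KB": (1.10, 10.19, 4.57),
    "16KB": (0.64, 6.47, 2.87),
    "32KB": (0.39, 4.82, 1.99),
}


def split_miss_rate(size, instruction_fraction=0.75):
    """Overall miss rate (percent) of separate I- and D-caches of the given size."""
    i_rate, d_rate, _ = MISS_RATES[size]
    return instruction_fraction * i_rate + (1 - instruction_fraction) * d_rate


split_8kb = split_miss_rate("8KB")        # 8KB I-cache + 8KB D-cache
unified_16kb = MISS_RATES["16KB"][2]      # same total capacity, unified
split_16kb = split_miss_rate("16KB")      # 16KB I-cache + 16KB D-cache
unified_32kb = MISS_RATES["32KB"][2]
```

Under these assumptions the unified cache has a somewhat lower miss rate at equal total capacity (about 3.37% versus 2.87%, and about 2.10% versus 1.99%). Miss rate alone does not settle the choice, since separate caches can serve an instruction fetch and a data access at the same time.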
