Chapter 4 Cache Memory

Contents • Computer memory system overview —Characteristics of memory systems —Memory hierarchy • Cache memory principles • Elements of cache design —Cache size —Mapping function —Replacement algorithms —Write policy —Line size —Number of caches • Pentium 4 and PowerPC cache organizations

Key Points • Memory hierarchy —processor registers —cache —main memory —fixed hard disk —ZIP cartridges, optical disks, and tape • Going down the hierarchy —decreasing cost, increasing capacity, and slower access time • Principles of locality —during the execution of a program, memory references tend to cluster

4.1 Computer Memory System Overview • Characteristics of memory systems —Location —Capacity —Unit of transfer —Access method —Performance —Physical type —Physical characteristics – volatile/nonvolatile – erasable/nonerasable —Organization

Location • CPU • Internal —main memory —cache • External (secondary) —peripheral storage devices —disk, tape

Capacity • Word size —natural unit of organization —8, 16, 32, and 64 bits • Number of words —memory capacity

Unit of Transfer • Internal memory —Usually governed by data bus width • External memory —Usually a block which is much larger than a word

Access Methods (1) • Sequential —Start at the beginning and read through in order —Access time depends on location of data and previous location —e.g. tape • Direct —Individual blocks have unique address —Access is by jumping to vicinity plus sequential search —Access time depends on location of data and previous location —e.g. disk

Access Methods (2) • Random —Each location has a unique address —Access time is independent of location or previous access —e.g. RAM • Associative —Data is retrieved based on a portion of its contents rather than its address —Access time is independent of location or previous access —e.g. cache

Performance • Access time (latency) —For random-access memory – time from presenting the address to the memory until valid data is available —For non-random-access memory – time to position the read-write head at the desired location • Memory cycle time (primarily applied to random-access memory) —Extra time may be required for the memory to “recover” before the next access – transients must die out on signal lines – data must be regenerated if they are read destructively —cycle time = access time + recovery time • Transfer rate —For random-access memory, equal to 1/(cycle time)

Performance • For non-random-access memory, the following relationship holds: $T_N = T_A + N/R$ where —$T_N$ = average time to read or write N bits —$T_A$ = average access time —N = number of bits —R = transfer rate, in bits per second (bps)
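As a rough illustration of the relationship on this slide, the sketch below evaluates the average transfer time for a made-up device. The function name `transfer_time` and the device figures (10 ms average access time, 1 Mbit/s transfer rate, a 4096-byte read) are assumptions chosen for the example, not values from the slides.

```python
def transfer_time(access_time_s: float, n_bits: int, rate_bps: float) -> float:
    """Average time to read or write n_bits: access time plus n_bits / rate."""
    if rate_bps <= 0:
        raise ValueError("transfer rate must be positive")
    return access_time_s + n_bits / rate_bps


# Hypothetical device: 10 ms average access time, 1 Mbit/s transfer rate.
# Reading 4096 bytes means moving 8 * 4096 = 32768 bits.
t = transfer_time(0.010, 8 * 4096, 1_000_000)
print(f"{t * 1e3:.3f} ms")  # 42.768 ms
```

For a small transfer the access time dominates; for a large one the N/R term does.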

Physical Types • Semiconductor —RAM, ROM • Magnetic —Disk, Tape • Optical —CD, CD-R, CD-RW, DVD

Physical Characteristics • Volatile/Nonvolatile • Erasable/Nonerasable

Questions on Memory Design • How much? —Capacity • How fast? —Time is money • How expensive?

Hierarchy List • Registers • L1 Cache • L2 Cache • Main memory • Disk cache • Disk • Optical • Tape

Memory Hierarchy - Diagram

Going Down the Hierarchy • Decreasing cost per bit • Increasing capacity • Increasing access time • Decreasing frequency of access of memory by the processor

An Example • Suppose we have two levels of memory —L1 : 1000 words, 0.01 μs access time —L2 : 100,000 words, 0.1 μs access time —H = fraction of all memory accesses found in L1 (the hit ratio) —T1 = access time to L1 —T2 = access time to L2 • A hit costs T1; a miss costs T1 + T2, so the average access time is H × T1 + (1 - H) × (T1 + T2) • Suppose H = 0.95 —(0.95)(0.01 μs) + (0.05)(0.01 μs + 0.1 μs) = 0.0095 + 0.0055 = 0.015 μs —average access time is much closer to 0.01 μs than to 0.1 μs
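The two-level example can be checked with a few lines of code. This is a minimal sketch; the function name `average_access_time` is made up for illustration, and it assumes, as the slide's arithmetic does, that a miss costs the L1 access time plus the L2 access time.

```python
def average_access_time(hit_ratio: float, t1: float, t2: float) -> float:
    """Two-level average: a hit costs t1, a miss costs t1 + t2."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit ratio must be between 0 and 1")
    return hit_ratio * t1 + (1.0 - hit_ratio) * (t1 + t2)


# Values from the slide, in microseconds: H = 0.95, T1 = 0.01, T2 = 0.1.
avg = average_access_time(0.95, 0.01, 0.1)
print(f"{avg:.3f} microseconds")  # 0.015 microseconds
```

Lowering the hit ratio in this sketch moves the average toward T1 + T2, which is why a high hit ratio matters so much.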

Principle of Locality • Going down the hierarchy, the frequency of access by the processor decreases —this is possible because of the principle of locality • During the course of the execution of a program, memory references tend to cluster —programs contain loops and procedures – there are repeated references to a small set of instructions —operations on arrays involve access to a clustered set of data – there are repeated references to a small set of data

4.2 Cache Memory Principles • Cache —Small amount of fast memory local to processor —Sits between main memory and CPU

Cache/Main Memory Structure

Cache Read Operation • CPU requests contents of memory location • Check cache for this data • If present, get from cache (fast) • If not present, read required block from main memory to cache • Then deliver from cache to CPU • Cache includes tags to identify which block of main memory is in each cache slot
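The read steps above can be sketched as a toy model. Everything here is an assumption made for illustration: the class name `ToyCache`, the 4-word block size, and the simplification that the cache never fills up, so no replacement policy is needed. The block number serves as the tag that identifies which block each slot holds.

```python
BLOCK_SIZE = 4  # words per block (assumed for illustration)


class ToyCache:
    """Toy read-only cache with no capacity limit, so nothing is ever evicted."""

    def __init__(self, main_memory):
        self.main_memory = main_memory
        self.lines = {}  # tag (block number) -> list of words in that block
        self.hits = 0
        self.misses = 0

    def read(self, address):
        block_number, offset = divmod(address, BLOCK_SIZE)
        if block_number in self.lines:
            self.hits += 1  # present: serve directly from the cache
        else:
            self.misses += 1  # absent: copy the whole block from main memory
            start = block_number * BLOCK_SIZE
            self.lines[block_number] = self.main_memory[start:start + BLOCK_SIZE]
        return self.lines[block_number][offset]  # always deliver from the cache


memory = list(range(100, 132))  # 32 words of made-up data
cache = ToyCache(memory)
values = [cache.read(a) for a in (0, 1, 2, 3, 4, 0)]
print(values, cache.hits, cache.misses)  # [100, 101, 102, 103, 104, 100] 4 2
```

Addresses 1 to 3 hit because the miss on address 0 brought in the whole block.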

Cache Read Operation

4.3 Elements of Cache Design • Design issues —Size —Mapping Function – direct, associative, set associative —Replacement Algorithm – LRU, FIFO, LFU, Random —Write Policy – Write through, write back —Line Size —Number of Caches – single or two level – unified or split

Size Does Matter • Small enough to make it cost effective • Large enough for performance reasons —but larger caches tend to be slightly slower than small ones

Mapping Function • Fewer cache lines than main memory blocks —mapping is needed —also need to know which memory block is in cache • Techniques —Direct —Associative —Set associative • Example case —Cache size : 64 KBytes —Line size : 4 Bytes – cache is organized as 16 K lines —Main memory size : 16 MBytes – each byte is directly addressable by a 24-bit address

Direct Mapping • Maps each block of main memory into only one possible cache line • Mapping function i = j modulo m where —i = cache line number —j = main memory block number —m = number of lines in the cache • Address is in three parts —Least significant w bits identify a unique word within a block —Most significant s bits specify one memory block – these are split into a cache line field of r bits and a tag of s - r bits (most significant)
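A minimal sketch of the mapping function i = j modulo m, using the 16 K lines of the example case. The function name `direct_mapped_line` is hypothetical.

```python
def direct_mapped_line(block_number: int, num_lines: int) -> int:
    """Return the only cache line that a main memory block can occupy."""
    if block_number < 0 or num_lines <= 0:
        raise ValueError("block number must be >= 0 and num_lines > 0")
    return block_number % num_lines


m = 16 * 1024  # 16 K lines, as in the example case
for j in (0, 1, m - 1, m, m + 1, 2 * m):
    print(j, "->", direct_mapped_line(j, m))
```

Blocks 0, m, and 2m all land on line 0, so only one of them can be cached at a time.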

Direct Mapping - Address Structure • Address length = (s + w) bits • Number of addressable units = $2^{s+w}$ words or bytes • Block size = line size = $2^w$ words or bytes • Number of blocks in main memory = $2^{s+w}/2^w = 2^s$ • Number of lines in cache = m = $2^r$ • Size of tag = (s - r) bits

Direct Mapping - Address Structure
Tag (s - r): 8 bits | Line or Slot (r): 14 bits | Word (w): 2 bits
• 24 bit address (22 + 2) • 2 bit word identifier (4 bytes in a block) • 22 bit block identifier —8 bit tag (= 22 - 14) —14 bit slot or line • No two blocks mapping into the same line have the same tag field
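The 8 / 14 / 2 bit split of the 24-bit address can be sketched with shifts and masks. The function name `split_direct` and the sample addresses are chosen for illustration only.

```python
TAG_BITS, LINE_BITS, WORD_BITS = 8, 14, 2  # field widths from the slide


def split_direct(address: int):
    """Split a 24-bit address into (tag, line, word) for direct mapping."""
    if not 0 <= address < 1 << (TAG_BITS + LINE_BITS + WORD_BITS):
        raise ValueError("address must fit in 24 bits")
    word = address & ((1 << WORD_BITS) - 1)                 # lowest 2 bits
    line = (address >> WORD_BITS) & ((1 << LINE_BITS) - 1)  # next 14 bits
    tag = address >> (WORD_BITS + LINE_BITS)                # top 8 bits
    return tag, line, word


for addr in (0x010004, 0x16339C, 0xFFFFFC):
    tag, line, word = split_direct(addr)
    print(f"{addr:06X}: tag={tag:02X} line={line:04X} word={word}")
# 010004: tag=01 line=0001 word=0
# 16339C: tag=16 line=0CE7 word=0
# FFFFFC: tag=FF line=3FFF word=0
```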

Direct Mapping - Cache Line Mapping
Cache line | Main memory blocks assigned
0 | 0, m, 2m, 3m, …, $2^s - m$
1 | 1, m + 1, 2m + 1, …, $2^s - m + 1$
m - 1 | m - 1, 2m - 1, 3m - 1, …, $2^s - 1$

Direct Mapping - Cache Line Mapping
Cache line | Starting memory address of block
0 | 000000, 010000, …, FF0000
1 | 000004, 010004, …, FF0004
m - 1 | 00FFFC, 01FFFC, …, FFFFFC

Direct Mapping - Cache Organization

Direct Mapping Example

Direct Mapping Pros & Cons • Simple and inexpensive to implement • Fixed cache location for any given block —If a program repeatedly alternates between two blocks that map to the same line, each access evicts the other block and the miss rate is very high
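The drawback above can be seen in a small simulation. This is a sketch under simplifying assumptions: a 4-line direct-mapped cache, a trace given as block numbers, and each line storing the full block number rather than a separate tag. The function name `count_misses` is hypothetical.

```python
def count_misses(block_trace, num_lines):
    """Count misses in a direct-mapped cache for a trace of block numbers."""
    lines = [None] * num_lines  # lines[i] holds the block currently in line i
    misses = 0
    for block in block_trace:
        i = block % num_lines  # the one line this block may use
        if lines[i] != block:
            misses += 1
            lines[i] = block  # evict whatever was there
    return misses


m = 4
conflicting = [0, 4] * 5  # blocks 0 and 4 both map to line 0
friendly = [0, 1] * 5     # blocks 0 and 1 map to different lines
print(count_misses(conflicting, m))  # 10
print(count_misses(friendly, m))     # 2
```

The conflicting trace misses on every access even though three of the four lines stay empty.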

Associative Mapping • A main memory block can be loaded into any line of cache • Memory address is interpreted as a tag and a word field —Tag field uniquely identifies a block of memory • Every line’s tag is simultaneously examined for a match —Cache searching gets complex and expensive

Associative Mapping - Address Structure • Address length = (s + w) bits • Number of addressable units = $2^{s+w}$ words or bytes • Block size = line size = $2^w$ words or bytes • Number of blocks in main memory = $2^{s+w}/2^w = 2^s$ • Number of lines in cache = undetermined (not fixed by the address format) • Size of tag = s bits

Associative Mapping - Address Structure
Tag: 22 bits | Word: 2 bits
• 22 bit tag stored with each 32 bit block of data • Compare the address tag field with every tag entry in the cache to check for a hit • Least significant 2 bits of address identify which byte is required from the 32 bit data block
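A minimal sketch of the 22 / 2 bit split and the tag comparison. The names `split_associative` and `lookup`, and the sample cache contents, are made up for illustration. The loop checks tags one after another, whereas real hardware examines every line's tag simultaneously.

```python
WORD_BITS = 2  # 4 bytes per block, as on the slide


def split_associative(address: int):
    """Split a 24-bit address into (22-bit tag, 2-bit word)."""
    if not 0 <= address < 1 << 24:
        raise ValueError("address must fit in 24 bits")
    return address >> WORD_BITS, address & ((1 << WORD_BITS) - 1)


def lookup(cache_lines, address):
    """Return the addressed byte on a hit, or None on a miss.

    cache_lines is a list of (tag, 4-byte block) pairs in any order.
    """
    tag, word = split_associative(address)
    for stored_tag, data in cache_lines:
        if stored_tag == tag:
            return data[word]
    return None


cache_lines = [(0x058CE7, bytes([0x11, 0x22, 0x33, 0x44]))]  # made-up contents
print(f"{split_associative(0x16339C)[0]:06X}")  # 058CE7
print(lookup(cache_lines, 0x16339D))            # 34 (0x22, byte 1 of the block)
print(lookup(cache_lines, 0x000000))            # None (miss)
```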

Fully Associative Cache Organization

Associative Mapping - Example

Associative Mapping Pros & Cons • Flexible as to which block to replace when a new block is read into the cache —need to select one which is not going to be used in the near future • Complex circuitry is required to examine the tags of all cache lines

Set Associative Mapping • A compromise between the direct and associative methods • Cache is divided into a number of sets (v) • Each set contains a number of lines (k) • The relationships are —m = v × k —i = j modulo v where —i = cache set number —j = main memory block number —m = number of lines in the cache
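The two relationships can be sketched as follows. The function name `set_index` is hypothetical, and the two-way configuration (k = 2) applied to the 16 K-line example cache is an assumption made for illustration.

```python
def set_index(block_number: int, num_lines: int, lines_per_set: int) -> int:
    """Return the set i = j modulo v, where v = m / k is the number of sets."""
    if lines_per_set <= 0 or num_lines % lines_per_set != 0:
        raise ValueError("number of lines must be a multiple of lines per set")
    num_sets = num_lines // lines_per_set  # v, from m = v * k
    return block_number % num_sets


m, k = 16 * 1024, 2  # assumed two-way set associative cache
v = m // k           # 8192 sets
for j in (0, 1, v, v + 1, 2 * v):
    print(j, "-> set", set_index(j, m, k))
```

Blocks 0, v, and 2v share set 0, but the set holds k = 2 lines, so two of them can be cached at once. With k = 1 this reduces to direct mapping, and with k = m to fully associative mapping.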
