Chapter 4 Cache Memory
Contents • Computer memory system overview —Characteristics of memory systems —Memory hierarchy • Cache memory principles • Elements of cache design —Cache size —Mapping function —Replacement algorithms —Write policy —Line size —Number of caches • Pentium 4 and PowerPC cache organizations
Key Points • Memory hierarchy —processor registers —cache —main memory —fixed hard disk —ZIP cartridges, optical disks, and tape • Going down the hierarchy —decreasing cost, increasing capacity, and slower access time • Principles of locality —during the execution of a program, memory references tend to cluster
4.1 Computer Memory System Overview • Characteristics of memory systems —Location —Capacity —Unit of transfer —Access method —Performance —Physical type —Physical characteristics – volatile/nonvolatile – erasable/nonerasable —Organization
Location • CPU • Internal —main memory —cache • External(secondary) —peripheral storage devices —disk, tape
Capacity • Word size —natural unit of organization —8, 16, 32, and 64 bits • Number of words —memory capacity
Unit of Transfer • Internal memory —Usually governed by data bus width • External memory —Usually a block which is much larger than a word
Access Methods (1) • Sequential —Start at the beginning and read through in order —Access time depends on location of data and previous location —e.g. tape • Direct —Individual blocks have unique address —Access is by jumping to vicinity plus sequential search —Access time depends on location of data and previous location —e.g. disk
Access Methods (2) • Random —Each location has a unique address —Access time is independent of location or previous access —e.g. RAM • Associative —Data is retrieved based on a portion of its contents rather than its address —Access time is independent of location or previous access —e.g. cache
Performance • Access time (latency) —For random-access memory – time from presenting the address to the memory until valid data is available —For non-random-access memory – time to position the read-write mechanism at the desired location • Memory cycle time (primarily applied to random-access memory) —Time may be required for the memory to “recover” before the next access – transients must die out on signal lines – data must be regenerated if it is read destructively —cycle time = access time + recovery time • Transfer rate —For random-access memory, equal to 1/(cycle time)
Performance • For non-random-access memory, the following relationship holds: $T_N = T_A + N/R$ where —$T_N$ = average time to read or write N bits —$T_A$ = average access time —$N$ = number of bits —$R$ = transfer rate, in bits per second (bps)
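The relationship above can be checked with a few lines of Python. This is only a sketch: the function name `transfer_time` and the device figures (10 ms access time, 8 Mbit/s transfer rate, a 4096-bit block) are assumptions chosen for illustration and do not come from the slides.

```python
def transfer_time(access_time_s: float, n_bits: int, rate_bps: float) -> float:
    """Average time to read or write n_bits: T_N = T_A + N / R."""
    return access_time_s + n_bits / rate_bps

# Hypothetical device: T_A = 10 ms, R = 8 Mbit/s, N = 4096 bits.
t = transfer_time(10e-3, 4096, 8e6)
print(f"{t * 1e3:.3f} ms")  # 10.512 ms
```

With these assumed figures the positioning time (T_A) dominates: the 4096 bits themselves take only about 0.5 ms to transfer.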
Physical Types • Semiconductor —RAM, ROM • Magnetic —Disk, Tape • Optical —CD, CD-R, CD-RW, DVD
Physical Characteristics • Volatile/Nonvolatile • Erasable/Nonerasable
Questions on Memory Design • How much? —Capacity • How fast? —Time is money • How expensive?
Hierarchy List • Registers • L1 Cache • L2 Cache • Main memory • Disk cache • Disk • Optical • Tape
Memory Hierarchy - Diagram
Going Down the Hierarchy • Decreasing cost per bit • Increasing capacity • Increasing access time • Decreasing frequency of access of the memory by the processor
An Example • Suppose we have two levels of memory —L1 : 1000 words, 0.01 us access time —L2 : 100,000 words, 0.1 us access time —H = fraction of all memory accesses found in L1 —T1 = access time to L1 —T2 = access time to L2 —average access time = H x T1 + (1 - H) x (T1 + T2) • Suppose H = 0.95 —(0.95)(0.01 us) + (0.05)(0.01 us + 0.1 us) = 0.0095 + 0.0055 = 0.015 us —average access time is much closer to 0.01 us than to 0.1 us
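The arithmetic of this example can be reproduced with a short sketch. The function name `average_access_time` is an assumption. The model is the one used on the slide: a hit costs T1, and a miss costs T1 + T2.

```python
def average_access_time(hit_ratio: float, t1: float, t2: float) -> float:
    """Two-level average access time: hits cost T1, misses cost T1 + T2."""
    return hit_ratio * t1 + (1 - hit_ratio) * (t1 + t2)

# Values from the example: H = 0.95, T1 = 0.01 us, T2 = 0.1 us.
avg = average_access_time(0.95, 0.01, 0.1)
print(f"{avg:.4f} us")  # 0.0150 us
```

Trying other hit ratios (for instance 0.5) shows how quickly the average drifts toward T2 as H falls.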
Principle of Locality • Going down the hierarchy, the frequency of access by the processor decreases —this is possible because of the principle of locality • During the course of the execution of a program, memory references tend to cluster —programs contain loops and procedures – there are repeated references to a small set of instructions —operations on arrays involve access to a clustered set of data – there are repeated references to a small set of data
4.2 Cache Memory Principles • Cache —Small amount of fast memory local to processor —Sits between main memory and CPU
Cache/Main Memory Structure
Cache Read Operation • CPU requests contents of memory location • Check cache for this data • If present, get from cache (fast) • If not present, read required block from main memory to cache • Then deliver from cache to CPU • Cache includes tags to identify which block of main memory is in each cache slot
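The read steps above can be sketched as a toy model. Everything here is an assumption made for illustration: the class name `TinyCache`, the 4-byte block size, and the use of a dictionary keyed by block number in place of real tag hardware. Capacity limits and replacement are deliberately left out.

```python
BLOCK_SIZE = 4  # bytes per block (assumed for this sketch)

class TinyCache:
    """Toy cache keyed by block number; capacity and eviction are ignored."""

    def __init__(self, memory: bytes):
        self.memory = memory
        self.lines = {}   # block number (acts as the tag) -> block bytes
        self.hits = 0
        self.misses = 0

    def read(self, address: int) -> int:
        block, offset = divmod(address, BLOCK_SIZE)
        if block in self.lines:
            self.hits += 1            # present: serve from cache (fast)
        else:
            self.misses += 1          # absent: fetch the whole block first
            start = block * BLOCK_SIZE
            self.lines[block] = self.memory[start:start + BLOCK_SIZE]
        return self.lines[block][offset]   # always delivered from the cache

memory = bytes(range(64))
cache = TinyCache(memory)
values = [cache.read(a) for a in (8, 9, 10, 11, 8)]
print(values, cache.hits, cache.misses)  # [8, 9, 10, 11, 8] 4 1
```

The first access misses and loads the whole block. The next four accesses fall in the same block and hit, which is the locality effect the cache relies on.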
Cache Read Operation
4.3 Elements of Cache Design • Design issues —Size —Mapping Function – direct, associative, set associative —Replacement Algorithm – LRU, FIFO, LFU, Random —Write Policy – Write through, write back —Line Size —Number of Caches – single or two level – unified or split
Size Does Matter • Small enough to make it cost effective • Large enough for performance reasons —but larger caches tend to be slightly slower than small ones
Mapping Function • Fewer cache lines than main memory blocks —mapping is needed —also need to know which memory block is in cache • Techniques —Direct —Associative —Set associative • Example case —Cache size : 64 KByte —Line size : 4 Bytes – cache is organized as 16 K lines —Main memory size : 16 Mbytes – each byte is directly addressable by a 24-bit address
Direct Mapping • Maps each block of main memory into only one possible cache line • Mapping function: i = j modulo m, where —i = cache line number —j = main memory block number —m = number of lines in the cache • Address is in three parts —Least significant w bits identify a unique word within a block —Most significant s bits specify one memory block – these are split into a cache line field of r bits and a tag of s - r bits (most significant)
Direct Mapping - Address Structure • Address length = (s + w) bits • Number of addressable units = $2^{s+w}$ words or bytes • Block size = line size = $2^w$ words or bytes • Number of blocks in main memory = $2^{s+w}/2^w = 2^s$ • Number of lines in cache = m = $2^r$ • Size of tag = (s - r) bits
Direct Mapping - Address Structure • Address fields, most significant first —Tag (s - r) : 8 bits —Line or slot (r) : 14 bits —Word (w) : 2 bits • 24 bit address (22 + 2) • 2 bit word identifier (4 bytes in a block) • 22 bit block identifier —8 bit tag (= 22 - 14) —14 bit slot or line • No two blocks mapping into the same line have the same tag field
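A minimal sketch of how a 24-bit address splits into the 8/14/2 fields described above. The function name `split_direct` and the constant names are assumptions, and the three sample addresses were picked for illustration.

```python
WORD_BITS = 2    # w: byte within a 4-byte block
LINE_BITS = 14   # r: cache line (slot) number
TAG_BITS = 8     # s - r: tag stored alongside the line

def split_direct(address: int):
    """Return (tag, line, word) for a 24-bit address."""
    word = address & ((1 << WORD_BITS) - 1)
    line = (address >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = address >> (WORD_BITS + LINE_BITS)
    return tag, line, word

for addr in (0x000004, 0x010004, 0xFFFFFC):
    tag, line, word = split_direct(addr)
    print(f"{addr:06X}: tag={tag:02X} line={line:04X} word={word}")
# 000004: tag=00 line=0001 word=0
# 010004: tag=01 line=0001 word=0
# FFFFFC: tag=FF line=3FFF word=0
```

Addresses 000004 and 010004 land in the same line (1) and differ only in the tag, which is how the cache tells them apart.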
Direct Mapping - Cache Line Mapping (main memory blocks assigned to each cache line) • Cache line 0 —blocks 0, m, 2m, 3m, …, $2^s - m$ • Cache line 1 —blocks 1, m+1, 2m+1, …, $2^s - m + 1$ • … • Cache line m-1 —blocks m-1, 2m-1, 3m-1, …, $2^s - 1$
Direct Mapping - Cache Line Mapping (starting memory address of each block, in hexadecimal) • Cache line 0 —000000, 010000, …, FF0000 • Cache line 1 —000004, 010004, …, FF0004 • … • Cache line m-1 —00FFFC, 01FFFC, …, FFFFFC
Direct Mapping - Cache Organization
Direct Mapping Example
Direct Mapping Pros & Cons • Simple and inexpensive to implement • Fixed cache location for any given block —If a program repeatedly accesses 2 blocks that map to the same line, they keep evicting each other and the miss rate is very high
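The drawback above can be made concrete with a small simulation. This is a sketch under assumptions: it uses the example-case geometry (16K lines of 4 bytes), starts from an empty cache, and the helper name `count_misses` and both address traces are invented for illustration.

```python
NUM_LINES = 1 << 14   # m = 16K lines, as in the example case
BLOCK_SIZE = 4        # bytes per block

def count_misses(addresses):
    """Count misses in a direct-mapped cache that starts empty."""
    tags = {}  # line number -> tag currently stored in that line
    misses = 0
    for addr in addresses:
        block = addr // BLOCK_SIZE
        line = block % NUM_LINES       # i = j modulo m
        tag = block // NUM_LINES
        if tags.get(line) != tag:      # empty line or different block: miss
            misses += 1
            tags[line] = tag           # the new block replaces the old one
    return misses

conflicting = [0x000004, 0x010004] * 100   # both blocks map to line 1
friendly = [0x000004, 0x000008] * 100      # blocks map to lines 1 and 2
print(count_misses(conflicting), count_misses(friendly))  # 200 2
```

The conflicting trace misses on every one of its 200 accesses even though the rest of the cache is empty. The friendly trace misses only on the first touch of each block.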
Associative Mapping • A main memory block can be loaded into any line of cache • Memory address is interpreted as a tag and a word field —Tag field uniquely identifies a block of memory • Every line’s tag is simultaneously examined for a match —Cache searching gets complex and expensive
Associative Mapping - Address Structure • Address length = (s + w) bits • Number of addressable units = $2^{s+w}$ words or bytes • Block size = line size = $2^w$ words or bytes • Number of blocks in main memory = $2^{s+w}/2^w = 2^s$ • Number of lines in cache = not determined by the address format (cannot be expressed using s or w) • Size of tag = s bits
Associative Mapping - Address Structure • Address fields, most significant first —Tag : 22 bits —Word : 2 bits • 22 bit tag stored with each 32 bit block of data • Compare tag field with tag entry in cache to check for hit • Least significant 2 bits of address identify which byte is required from 32 bit data block
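A minimal sketch of the 22/2 split and the tag comparison. The names `split_associative` and `lookup`, the list-of-tuples representation of the cache, and the sample contents are all assumptions. Real hardware compares every stored tag at once, whereas this loop checks them one after another.

```python
WORD_BITS = 2  # low 2 bits select a byte within the 4-byte block

def split_associative(address: int):
    """Return (tag, word): the upper 22 bits and lower 2 bits of a 24-bit address."""
    return address >> WORD_BITS, address & ((1 << WORD_BITS) - 1)

def lookup(lines, address: int):
    """lines is a list of (tag, block_bytes); return the byte on a hit, None on a miss."""
    tag, word = split_associative(address)
    for stored_tag, data in lines:   # hardware does this comparison in parallel
        if stored_tag == tag:
            return data[word]
    return None

# Hypothetical cache contents: one line holding the block that starts at 0x16339C.
lines = [(0x058CE7, bytes([0x11, 0x22, 0x33, 0x44]))]
print(split_associative(0x16339C))   # (363751, 0), i.e. tag 0x058CE7, word 0
print(lookup(lines, 0x16339D))       # 34 (0x22, byte 1 of the block)
print(lookup(lines, 0x000000))       # None (miss)
```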
Fully Associative Cache Organization
Associative Mapping - Example
Associative Mapping Pros & Cons • Flexible as to which block to replace when a new block is read into the cache —need to select one which is not going to be used in the near future • Complex circuitry is required to examine the tags of all cache lines
Set Associative Mapping • A compromise between the direct and associative methods • Cache is divided into v sets • Each set contains k lines • The relationships are m = v x k and i = j modulo v, where —i = cache set number —j = main memory block number —m = number of lines in the cache
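The relationships m = v x k and i = j modulo v can be sketched as follows. The function name `set_index` and the choice of k = 2 (two-way) for the 16K-line example cache are assumptions made for illustration.

```python
def set_index(block: int, num_lines: int, ways: int) -> int:
    """Set number for a block: v = m / k sets, i = j modulo v."""
    num_sets = num_lines // ways
    return block % num_sets

M = 1 << 14  # m = 16K lines, as in the example case

# Assumed two-way cache: v = 8192 sets. Blocks 1 and 8193 share set 1
# and can both be resident, one in each of the set's two lines.
print(set_index(1, M, 2), set_index(8193, M, 2))   # 1 1

# The two extremes recover the other mapping schemes.
print(set_index(8193, M, 1))    # 8193 (k = 1 behaves like direct mapping)
print(set_index(8193, M, M))    # 0 (k = m is one set, fully associative)
```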