ECE232: Hardware Organization and Design
Lecture 22: Introduction to Caches
Adapted from Computer Organization and Design, Patterson & Hennessy, UCB
Overview
Caches hold a subset of the data in main memory
Three types of caches
• Direct mapped
• Set associative
• Fully associative
Today: Direct mapped
• Each memory value can only be in one place in the cache
• Is it there (hit)?
• Or is it not there (miss)?
ECE232: Introduction to Caches 2
Direct Mapped Cache - Textbook
Location is determined by address
Direct mapped: only one choice
(Block address) modulo (#Blocks in cache)
• #Blocks is a power of 2
• Use low-order address bits
Direct mapped cache (assume 1 byte/block)
A 16-byte memory maps onto a 4-block direct mapped cache:
• Cache block 0 can be occupied by data from memory blocks 0, 4, 8, 12 (addresses 0000₂, 0100₂, 1000₂, 1100₂)
• Cache block 1 can be occupied by data from memory blocks 1, 5, 9, 13
• Cache block 2 can be occupied by data from memory blocks 2, 6, 10, 14
• Cache block 3 can be occupied by data from memory blocks 3, 7, 11, 15
Direct Mapped Cache – Index and Tag
A memory block address divides into a tag and an index.
The index determines the block in the cache:
• index = (address) mod (# blocks)
Since the number of cache blocks is a power of 2, the cache index is simply the lower n bits of the memory address.
Direct Mapped w/Tag
The tag determines which memory block occupies a given cache block.
• hit: cache tag field = tag bits of address
• miss: cache tag field ≠ tag bits of address
Direct Mapped Cache
The simplest mapping is a direct mapped cache
Each memory address is associated with exactly one possible block within the cache
• Therefore, we only need to look in a single location in the cache to see if the data is there
Finding Item within Block
In reality, a cache block consists of a number of bytes/words to (1) increase the cache hit rate due to the locality property and (2) reduce the cache miss time
Given the address of an item, the index tells which block of the cache to look in
Then, how do we find the requested item within the cache block?
Or, equivalently, "What is the byte offset of the item within the cache block?"
Selecting part of a block (block size > 1 byte)
If block size > 1, the rightmost bits of the address are really the offset within the indexed block. The address fields are:
TAG | INDEX | OFFSET
• Tag: to check if we have the correct block
• Index: to select a block in the cache
• Offset: byte offset within the block
Example: block size of 8 bytes; select byte 4 (i.e., the 2nd word). The memory address splits as 11 | 01 | 100 (tag = 11, index = 01, offset = 100)
Accessing data in a direct mapped cache
Three types of events:
• cache hit: cache block is valid and contains the proper address, so read the desired word
• cache miss: nothing in the cache at the appropriate block, so fetch from memory
• cache miss, block replacement: wrong data is in the cache at the appropriate block, so discard it and fetch the desired data from memory
Cache access procedure:
• (1) Use the index bits to select a cache block
• (2) If the valid bit is 1, compare the tag bits of the address with the cache block's tag bits
• (3) If they match, use the offset to read out the word/byte
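The three-step access procedure above can be sketched in Python; the `CacheLine` and `lookup` names are illustrative, not from the lecture.

```python
from dataclasses import dataclass, field

@dataclass
class CacheLine:
    valid: bool = False
    tag: int = 0
    data: bytes = field(default=b"")

def lookup(cache, addr, block_size, num_blocks):
    """Return (hit, byte) for a byte address in a direct-mapped cache."""
    offset = addr % block_size           # (3) byte offset within the block
    block_addr = addr // block_size
    index = block_addr % num_blocks      # (1) index bits select a cache block
    tag = block_addr // num_blocks
    line = cache[index]
    if line.valid and line.tag == tag:   # (2) valid bit and tag comparison
        return True, line.data[offset]
    return False, None                   # miss: caller fetches from memory
```

For example, with 4 blocks of 4 bytes each, address 21 has offset 1, index 1, and tag 1; it hits only if cache line 1 is valid and holds tag 1.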
Tags and Valid Bits
How do we know which particular block is stored in a cache location?
• Store the block address as well as the data
• Actually, only the high-order bits are needed, called the tag
What if there is no data in a location?
• Valid bit: 1 = present, 0 = not present
• Initially 0
Cache Example
8 blocks, 1 byte/block, direct mapped. Initial state:
Index  V  Tag  Data
000    N
001    N
010    N
011    N
100    N
101    N
110    N
111    N
Cache Example
Addr  Binary addr  Hit/miss  Cache block
22    10 110       Miss      110
Index  V  Tag  Data
000    N
001    N
010    N
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N
Cache Example
Addr  Binary addr  Hit/miss  Cache block
26    11 010       Miss      010
Index  V  Tag  Data
000    N
001    N
010    Y  11   Mem[11010]
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N
Cache Example
Addr  Binary addr  Hit/miss  Cache block
22    10 110       Hit       110
26    11 010       Hit       010
Index  V  Tag  Data
000    N
001    N
010    Y  11   Mem[11010]
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N
Cache Example
Addr  Binary addr  Hit/miss  Cache block
16    10 000       Miss      000
3     00 011       Miss      011
16    10 000       Hit       000
Index  V  Tag  Data
000    Y  10   Mem[10000]
001    N
010    Y  11   Mem[11010]
011    Y  00   Mem[00011]
100    N
101    N
110    Y  10   Mem[10110]
111    N
Cache Example
Addr  Binary addr  Hit/miss  Cache block
18    10 010       Miss      010
Index  V  Tag  Data
000    Y  10   Mem[10000]
001    N
010    Y  10   Mem[10010]
011    Y  00   Mem[00011]
100    N
101    N
110    Y  10   Mem[10110]
111    N
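The whole example trace (addresses 22, 26, 22, 26, 16, 3, 16, 18 against the 8-block, 1-byte-per-block cache) can be replayed with a small simulator; the code itself is an illustrative sketch, not from the lecture.

```python
def simulate(addresses, num_blocks=8):
    """Return the hit/miss outcome for each address in a direct-mapped cache."""
    valid = [False] * num_blocks
    tags = [None] * num_blocks
    results = []
    for addr in addresses:
        index = addr % num_blocks      # low-order bits select the block
        tag = addr // num_blocks       # high-order bits are the tag
        if valid[index] and tags[index] == tag:
            results.append("hit")
        else:                          # miss: install the new block (replacement)
            results.append("miss")
            valid[index], tags[index] = True, tag
    return results

trace = [22, 26, 22, 26, 16, 3, 16, 18]
print(simulate(trace))
# → ['miss', 'miss', 'hit', 'hit', 'miss', 'miss', 'hit', 'miss']
```

Note the final access to 18 misses even though index 010 is valid: it holds tag 11 (from address 26), so the old block is replaced.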
Example: Larger Block Size
64 blocks, 16 bytes/block. To what block number does address 1200 map?
• Block address = 1200 / 16 = 75
• Block number = 75 modulo 64 = 11
Address fields (32-bit address):
• bits 31–10: Tag (22 bits)
• bits 9–4: Index (6 bits)
• bits 3–0: Offset (4 bits)
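The field split for this example can be checked with a short helper; `split_address` is an illustrative name, not from the lecture.

```python
def split_address(addr, block_size=16, num_blocks=64):
    """Split a byte address into (tag, index, offset) for a direct-mapped cache."""
    offset = addr % block_size          # low 4 bits for 16-byte blocks
    block_addr = addr // block_size     # strip the offset bits
    index = block_addr % num_blocks     # next 6 bits for 64 blocks
    tag = block_addr // num_blocks      # remaining high-order bits
    return tag, index, offset

print(split_address(1200))  # → (1, 11, 0): block address 75 maps to block 11
```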
Block Size Considerations
Larger blocks should reduce miss rate
• Due to spatial locality
But in a fixed-sized cache:
• Larger blocks → fewer of them
• More competition → increased miss rate
• Larger blocks → pollution
Larger blocks also mean a larger miss penalty
• Can override the benefit of reduced miss rate
• Early restart and critical-word-first can help
Cache Misses
On a cache hit, the CPU proceeds normally
On a cache miss:
• Stall the CPU pipeline
• Fetch the block from the next level of the hierarchy
• Instruction cache miss: restart instruction fetch
• Data cache miss: complete the data access
Write-Through
On a data-write hit, we could just update the block in cache
• But then cache and memory would be inconsistent
Write through: also update memory
• But this makes writes take longer
• e.g., if base CPI = 1, 10% of instructions are stores, and a write to memory takes 100 cycles:
• Effective CPI = 1 + 0.1 × 100 = 11
Solution: write buffer
• Holds data waiting to be written to memory
• CPU continues immediately
• Only stalls on a write if the write buffer is already full
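The effective-CPI arithmetic for the write-through example works out as follows (the variable names are just for illustration):

```python
base_cpi = 1.0
store_fraction = 0.10   # 10% of instructions are stores
write_penalty = 100     # cycles for each write to memory (no write buffer)

# Each store adds the full memory-write latency to its CPI contribution.
effective_cpi = base_cpi + store_fraction * write_penalty
print(effective_cpi)  # → 11.0
```

A write buffer removes this penalty from the common case: the store deposits its data in the buffer and the CPU continues, so the 100-cycle cost is paid only when the buffer is full.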
Write-Back
Alternative: on a data-write hit, just update the block in cache
• Keep track of whether each block is dirty
When a dirty block is replaced:
• Write it back to memory
• Can use a write buffer to allow the replacing block to be read first
Measuring Cache Performance
Components of CPU time:
• Program execution cycles (includes cache hit time)
• Memory stall cycles (mainly from cache misses)
With simplifying assumptions:
Memory stall cycles = (Memory accesses / Program) × Miss rate × Miss penalty
                    = (Instructions / Program) × (Misses / Instruction) × Miss penalty
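A quick numeric instance of the stall-cycle formula, using hypothetical numbers that are not from the slide:

```python
# Hypothetical workload parameters (assumed for illustration only).
instructions = 1_000_000
misses_per_instruction = 0.02   # misses / instruction
miss_penalty = 100              # cycles per miss

# Memory stall cycles = Instructions × (Misses / Instruction) × Miss penalty
stall_cycles = instructions * misses_per_instruction * miss_penalty
print(stall_cycles)  # → 2000000.0
```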
Average Access Time
Hit time is also important for performance
Average memory access time (AMAT):
• AMAT = Hit time + Miss rate × Miss penalty
Example: CPU with 1 ns clock, hit time = 1 cycle, miss penalty = 20 cycles, I-cache miss rate = 5%
• AMAT = 1 + 0.05 × 20 = 2 ns
• 2 cycles per instruction
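The AMAT example from the slide, as a direct calculation:

```python
hit_time = 1        # cycles (1 ns with a 1 ns clock)
miss_rate = 0.05    # I-cache miss rate
miss_penalty = 20   # cycles

# AMAT = Hit time + Miss rate × Miss penalty
amat = hit_time + miss_rate * miss_penalty
print(amat)  # → 2.0 cycles, i.e. 2 ns with a 1 ns clock
```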
Summary
Today: direct mapped cache
• Performance is tied to whether values are located in the cache
• Cache miss = bad performance
Need to understand how to numerically determine system performance based on the cache hit rate
Why might direct mapped caches be bad?
• Lots of data map to the same location in the cache
Idea:
• Maybe we should have multiple locations for each data value
Next time: set associative caches