
ECE232: Hardware Organization and Design, Lecture 22: Introduction to Caches



  1. ECE232: Hardware Organization and Design, Lecture 22: Introduction to Caches. Adapted from Computer Organization and Design, Patterson & Hennessy, UCB

  2. Overview
      • Caches hold a subset of data from the main memory
      • Three types of caches:
        • Direct mapped
        • Set associative
        • Fully associative
      • Today: direct mapped
        • Each memory value can only be in one place in the cache
        • Is it there (hit?) or is it not there (miss?)

  3. Direct Mapped Cache - Textbook
      • Location determined by address
      • Direct mapped: only one choice
        • cache index = (block address) modulo (#blocks in cache)
      • #Blocks is a power of 2, so use the low-order address bits
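
Because #Blocks is a power of two, the modulo above is just the low-order bits of the block address. A minimal sketch of that equivalence in C (the cache size NUM_BLOCKS and the helper cache_index are illustrative names, not from the lecture):

    /* With a power-of-two number of blocks, (block address) mod (#blocks)
       reduces to masking off the low-order bits of the block address. */
    #include <assert.h>
    #include <stdint.h>

    #define NUM_BLOCKS 4u   /* assumed cache size: 4 blocks (a power of 2) */

    static uint32_t cache_index(uint32_t block_addr) {
        /* NUM_BLOCKS - 1 = 0b11, a mask for the two low-order bits */
        return block_addr & (NUM_BLOCKS - 1);
    }

    int main(void) {
        /* same result as block_addr % NUM_BLOCKS for every block address */
        for (uint32_t a = 0; a < 16; a++)
            assert(cache_index(a) == a % NUM_BLOCKS);
        return 0;
    }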

  4. Direct Mapped Cache (assume 1 byte/block)
      • Figure: a 16-block memory (blocks 0-15) mapped onto a 4-block direct mapped cache
      • Cache block 0 can be occupied by data from memory blocks 0, 4, 8, 12 (addresses 0000₂, 0100₂, 1000₂, 1100₂)
      • Cache block 1 can be occupied by data from memory blocks 1, 5, 9, 13
      • Cache block 2 can be occupied by data from memory blocks 2, 6, 10, 14
      • Cache block 3 can be occupied by data from memory blocks 3, 7, 11, 15

  5. Direct Mapped Cache – Index and Tag
      • Figure: the same 16-block memory and 4-block cache, with each memory block address split into tag and index fields
      • index determines the block in the cache
        • index = (address) mod (#blocks)
      • The number of cache blocks is a power of 2
        • cache index is the lower n bits of the memory address
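
As a concrete illustration of the split for this 4-block, 1 byte/block example, the sketch below pulls the 2-bit index (low bits) and the remaining tag bits out of each 4-bit address; the field widths are specific to this example:

    /* Minimal sketch: split a 4-bit address into a 2-bit index and a 2-bit tag,
       matching the 4-block, 1 byte/block cache in the figure. */
    #include <stdio.h>

    int main(void) {
        for (unsigned addr = 0; addr < 16; addr++) {
            unsigned index = addr & 0x3;   /* lower 2 bits: which cache block        */
            unsigned tag   = addr >> 2;    /* upper 2 bits: distinguishes the four   */
                                           /* memory blocks that share this slot     */
            printf("address %2u -> index %u, tag %u\n", addr, index, tag);
        }
        return 0;
    }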

  6. Direct Mapped w/Tag
      • Figure: each cache block now stores a tag alongside its data; the memory block address is split into tag and index fields
      • tag determines which memory block occupies the cache block
      • hit: cache tag field = tag bits of the address
      • miss: cache tag field ≠ tag bits of the address

  7. Direct Mapped Cache
      • The simplest mapping is a direct mapped cache
      • Each memory address is associated with one possible block within the cache
        • Therefore, we only need to look in a single location in the cache for the data, if it exists in the cache

  8. Finding an Item within a Block
      • In reality, a cache block consists of a number of bytes/words, to (1) increase the cache hit rate due to the locality property and (2) reduce the cache miss time
      • Given the address of an item, the index tells which block of the cache to look in
      • Then, how do we find the requested item within the cache block?
        • Or, equivalently, "What is the byte offset of the item within the cache block?"

  9. Selecting Part of a Block (block size > 1 byte)
      • If block size > 1, the rightmost bits of the index are really the offset within the indexed block
      • Address fields: TAG | INDEX | OFFSET
        • Tag: to check whether we have the correct block
        • Index: to select a block in the cache
        • Offset: byte offset within the block
      • Example: block size of 8 bytes; select byte 4 (or the 2nd word)
        • Memory address 11 01 100₂ -> tag = 11, index = 01, byte offset = 100
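
A minimal sketch of that three-way split, assuming 8-byte blocks (3 offset bits) and a 4-block cache (2 index bits); the field widths and constant names are illustrative:

    /* Split an address into tag / index / offset for 8-byte blocks
       and a 4-block direct mapped cache. */
    #include <stdio.h>

    #define OFFSET_BITS 3u   /* 8 bytes per block     */
    #define INDEX_BITS  2u   /* 4 blocks in the cache */

    int main(void) {
        unsigned addr   = 0x6C;   /* 1101100 in binary, the slide's example address */
        unsigned offset = addr & ((1u << OFFSET_BITS) - 1);
        unsigned index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
        unsigned tag    = addr >> (OFFSET_BITS + INDEX_BITS);
        printf("tag=%u index=%u offset=%u\n", tag, index, offset);  /* tag=3 index=1 offset=4 */
        return 0;
    }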

  10. Accessing Data in a Direct Mapped Cache
      • Three types of events:
        • cache hit: the cache block is valid and contains the proper address, so read the desired word
        • cache miss: nothing is in the appropriate cache block, so fetch from memory
        • cache miss, block replacement: the wrong data is in the appropriate cache block, so discard it and fetch the desired data from memory
      • Cache access procedure:
        • (1) Use the index bits to select the cache block
        • (2) If the valid bit is 1, compare the tag bits of the address with the cache block's tag bits
        • (3) If they match, use the offset to read out the word/byte
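
The three-step procedure translates almost directly into code. Below is a hedged sketch of a direct mapped lookup; the struct layout, the 8-block/1-byte geometry (chosen to match the example on the following slides), and the function names are assumptions for illustration:

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_BLOCKS 8u

    struct cache_block {
        bool     valid;   /* valid bit: initially 0            */
        uint32_t tag;     /* high-order bits of the address    */
        uint8_t  data;    /* one byte per block in this sketch */
    };

    static struct cache_block cache[NUM_BLOCKS];

    /* Returns true on a hit and places the byte in *out; false on a miss. */
    static bool cache_lookup(uint32_t addr, uint8_t *out) {
        uint32_t index = addr & (NUM_BLOCKS - 1);   /* (1) select the block by index    */
        uint32_t tag   = addr / NUM_BLOCKS;         /* remaining high-order bits        */
        struct cache_block *b = &cache[index];
        if (b->valid && b->tag == tag) {            /* (2) valid bit and tag must match */
            *out = b->data;                         /* (3) read out the byte            */
            return true;
        }
        return false;                               /* miss: fetch from memory instead  */
    }

    int main(void) {
        uint8_t byte;
        /* the cache starts empty, so the very first lookup must miss */
        return cache_lookup(22, &byte) ? 1 : 0;
    }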

  11. Tags and Valid Bits
      • How do we know which particular block is stored in a cache location?
        • Store the block address as well as the data
        • Actually, only the high-order bits are needed; these are called the tag
      • What if there is no data in a location?
        • Valid bit: 1 = present, 0 = not present
        • Initially 0

  12. Cache Example
      • 8 blocks, 1 byte/block, direct mapped
      • Initial state:

        Index  V  Tag  Data
        000    N
        001    N
        010    N
        011    N
        100    N
        101    N
        110    N
        111    N

  13. Cache Example
      • Access:

        Addr  Binary addr  Hit/miss  Cache block
        22    10 110       Miss      110

      • Cache state after the access:

        Index  V  Tag  Data
        000    N
        001    N
        010    N
        011    N
        100    N
        101    N
        110    Y  10   Mem[10110]
        111    N

  14. Cache Example
      • Access:

        Addr  Binary addr  Hit/miss  Cache block
        26    11 010       Miss      010

      • Cache state after the access:

        Index  V  Tag  Data
        000    N
        001    N
        010    Y  11   Mem[11010]
        011    N
        100    N
        101    N
        110    Y  10   Mem[10110]
        111    N

  15. Cache Example
      • Accesses:

        Addr  Binary addr  Hit/miss  Cache block
        22    10 110       Hit       110
        26    11 010       Hit       010

      • Cache state (unchanged):

        Index  V  Tag  Data
        000    N
        001    N
        010    Y  11   Mem[11010]
        011    N
        100    N
        101    N
        110    Y  10   Mem[10110]
        111    N

  16. Cache Example
      • Accesses:

        Addr  Binary addr  Hit/miss  Cache block
        16    10 000       Miss      000
        3     00 011       Miss      011
        16    10 000       Hit       000

      • Cache state after the accesses:

        Index  V  Tag  Data
        000    Y  10   Mem[10000]
        001    N
        010    Y  11   Mem[11010]
        011    Y  00   Mem[00011]
        100    N
        101    N
        110    Y  10   Mem[10110]
        111    N

  17. Cache Example
      • Access:

        Addr  Binary addr  Hit/miss  Cache block
        18    10 010       Miss      010

      • Cache state after the access (block 010 is replaced):

        Index  V  Tag  Data
        000    Y  10   Mem[10000]
        001    N
        010    Y  10   Mem[10010]
        011    Y  00   Mem[00011]
        100    N
        101    N
        110    Y  10   Mem[10110]
        111    N
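
The whole access sequence above (22, 26, 22, 26, 16, 3, 16, 18) can be replayed with a short sketch of an 8-block, 1 byte/block direct mapped cache; the code below is illustrative and reuses the slide 10 lookup logic inline:

    #include <stdbool.h>
    #include <stdio.h>

    #define NUM_BLOCKS 8u

    int main(void) {
        bool     valid[NUM_BLOCKS] = { false };
        unsigned tag[NUM_BLOCKS]   = { 0 };
        unsigned trace[] = { 22, 26, 22, 26, 16, 3, 16, 18 };

        for (unsigned i = 0; i < sizeof trace / sizeof trace[0]; i++) {
            unsigned addr  = trace[i];
            unsigned index = addr & (NUM_BLOCKS - 1);
            unsigned t     = addr / NUM_BLOCKS;
            bool hit = valid[index] && tag[index] == t;
            printf("addr %2u -> block %u: %s\n", addr, index, hit ? "hit" : "miss");
            if (!hit) {              /* on a miss, (re)fill the block */
                valid[index] = true;
                tag[index]   = t;
            }
        }
        return 0;
    }

Running it reproduces the miss, miss, hit, hit, miss, miss, hit, miss pattern shown in the tables, including the replacement of block 010 on the final access.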

  18. Example: Larger Block Size
      • 64 blocks, 16 bytes/block
      • To what block number does address 1200 map?
        • Block address = ⌊1200/16⌋ = 75
        • Block number = 75 modulo 64 = 11
      • Address fields (32-bit address):

        Bits:   31-10    9-4     3-0
        Field:  Tag      Index   Offset
        Width:  22 bits  6 bits  4 bits
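
The same computation as a quick sketch (assuming the 16-byte blocks and 64-block cache from the slide):

    #include <stdio.h>

    int main(void) {
        unsigned addr       = 1200;
        unsigned block_addr = addr / 16;        /* = 75: drop the 4 offset bits */
        unsigned block_num  = block_addr % 64;  /* = 11: keep the 6 index bits  */
        unsigned tag        = addr >> 10;       /* remaining high-order bits    */
        unsigned offset     = addr & 0xF;       /* byte offset within the block */
        printf("block address %u, cache block %u, tag %u, offset %u\n",
               block_addr, block_num, tag, offset);
        return 0;
    }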

  19. Block Size Considerations
      • Larger blocks should reduce the miss rate
        • Due to spatial locality
      • But in a fixed-sized cache:
        • Larger blocks mean fewer of them
          • More competition, hence an increased miss rate
        • Larger blocks mean pollution
      • Larger miss penalty
        • Can override the benefit of the reduced miss rate
        • Early restart and critical-word-first can help

  20. Cache Misses
      • On a cache hit, the CPU proceeds normally
      • On a cache miss:
        • Stall the CPU pipeline
        • Fetch the block from the next level of the hierarchy
        • Instruction cache miss: restart the instruction fetch
        • Data cache miss: complete the data access

  21. Write-Through
      • On a data-write hit, we could just update the block in the cache
        • But then the cache and memory would be inconsistent
      • Write-through: also update memory
      • But this makes writes take longer
        • e.g., if base CPI = 1, 10% of instructions are stores, and a write to memory takes 100 cycles:
        • Effective CPI = 1 + 0.1 × 100 = 11
      • Solution: write buffer
        • Holds data waiting to be written to memory
        • CPU continues immediately
        • Only stalls on a write if the write buffer is already full
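
The effective-CPI arithmetic above, as a tiny sketch with illustrative variable names:

    #include <stdio.h>

    int main(void) {
        double base_cpi       = 1.0;
        double store_fraction = 0.10;   /* 10% of instructions are stores */
        double write_cycles   = 100.0;  /* cycles per write to memory     */
        double effective_cpi  = base_cpi + store_fraction * write_cycles;
        printf("Effective CPI = %.1f\n", effective_cpi);   /* 11.0 */
        return 0;
    }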

  22. Write-Back
      • Alternative: on a data-write hit, just update the block in the cache
        • Keep track of whether each block is dirty
      • When a dirty block is replaced:
        • Write it back to memory
        • Can use a write buffer to allow the replacing block to be read first
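
A hedged sketch of the dirty-bit bookkeeping this implies; the struct, the function names, and the write_back_to_memory placeholder are assumptions for illustration:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct wb_block {
        bool     valid;
        bool     dirty;   /* set when the cached copy differs from memory */
        uint32_t tag;
        uint8_t  data;
    };

    /* Placeholder for the actual write to the next level of the hierarchy. */
    static void write_back_to_memory(uint32_t tag, uint8_t data) {
        printf("write back: tag %u, data %u\n", (unsigned)tag, (unsigned)data);
    }

    /* Write hit: update only the cache and mark the block dirty. */
    static void write_hit(struct wb_block *b, uint8_t value) {
        b->data  = value;
        b->dirty = true;
    }

    /* Replacement: a dirty block must be written back before it is overwritten. */
    static void replace_block(struct wb_block *b, uint32_t new_tag, uint8_t new_data) {
        if (b->valid && b->dirty)
            write_back_to_memory(b->tag, b->data);
        b->tag = new_tag; b->data = new_data;
        b->valid = true;  b->dirty = false;
    }

    int main(void) {
        struct wb_block b = { false, false, 0, 0 };
        replace_block(&b, 2, 0xAA);   /* fill the block (clean)                */
        write_hit(&b, 0xBB);          /* cache-only update; the block is dirty */
        replace_block(&b, 3, 0xCC);   /* eviction triggers the write-back      */
        return 0;
    }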

  23. Measuring Cache Performance
      • Components of CPU time:
        • Program execution cycles
          • Includes cache hit time
        • Memory stall cycles
          • Mainly from cache misses
      • With simplifying assumptions:

        Memory stall cycles = (Memory accesses / Program) × Miss rate × Miss penalty
                            = (Instructions / Program) × (Misses / Instruction) × Miss penalty

  24. Average Access Time
      • Hit time is also important for performance
      • Average memory access time (AMAT)
        • AMAT = Hit time + Miss rate × Miss penalty
      • Example:
        • CPU with a 1 ns clock, hit time = 1 cycle, miss penalty = 20 cycles, I-cache miss rate = 5%
        • AMAT = 1 + 0.05 × 20 = 2 ns
        • 2 cycles per instruction
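
The AMAT example as a short sketch (illustrative names, numbers taken from the example above):

    #include <stdio.h>

    int main(void) {
        double hit_time_ns     = 1.0;    /* 1 cycle at a 1 ns clock */
        double miss_rate       = 0.05;   /* 5% I-cache miss rate    */
        double miss_penalty_ns = 20.0;   /* 20 cycles               */
        double amat = hit_time_ns + miss_rate * miss_penalty_ns;
        printf("AMAT = %.1f ns\n", amat);   /* 2.0 ns */
        return 0;
    }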

  25. Summary
      • Today: direct mapped cache
      • Performance is tied to whether values are located in the cache
        • Cache miss = bad performance
      • Need to understand how to numerically determine system performance based on the cache hit rate
      • Why might direct mapped caches be bad?
        • Lots of data map to the same location in the cache
      • Idea: maybe we should have multiple locations for each data value
        • Next time: set associative caches
