IC220: Caching 1 (Chapter 5)
Memory, Cost, and Performance
• Ideal world: we want memory that is
  – Fast,
  – Big, &
  – Cheap! (Choose any two!)
• Recent "real world" situation:
  – SRAM: access times of 0.5–2.5 ns at a cost of $500–$1000 per GB
  – DRAM: access times of 50–70 ns at a cost of $10–$20 per GB
  – Flash memory: access times of 5,000–50,000 ns at a cost of $0.75–$1 per GB
  – Disk: access times of 5–20 million ns at a cost of $0.05–$0.10 per GB
• Solution: CACHING
Caching Concepts and Terminology
• Locality: Temporal and Spatial
• Each access: Hit or Miss
• Eviction strategies: Random or Least-Recently Used (LRU)
• Reasons for a miss: Compulsory, Capacity, or Conflict
• Measurements: Miss Rate, Hit Time, Miss Penalty
• Cache types: Fully Associative, Direct-Mapped, or Set Associative
• Parameters: N (total size), B (block size), k (associativity)
• Write strategies: Write-through or Write-back
• Implementation details: Stall, Valid bit, Dirty bit, Tag
Principle of Locality
• Basic observation on how memory tends to be accessed in computer programs: if an item is referenced,
  1. it will tend to be referenced again soon (TEMPORAL LOCALITY), and
  2. nearby items will tend to be referenced soon (SPATIAL LOCALITY).
Caching Basics
• Cache consists of N bytes (N = 8 in our examples here)
• To read or write a given address:
  1. First look in the cache.
  2. If it's there, we have a HIT.
  3. Otherwise, it's a MISS and you must fetch from main memory.
     – If the cache is full, you must EVICT a line to insert the new data.
• Which cache line should be evicted?
  – Random
  – Least-Recently Used (LRU) (our examples will always follow LRU)
Example 1 – Fully associative (no blocking)
Processor accesses:
1. Read 42   2. Read 43   3. Read 18   4. Read 43   5. Read 17
6. Read 42   7. Read 19   8. Read 45   9. Read 44  10. Read 46
11. Read 43  12. Read 47  13. Read 18
Memory (Address: Data): 16: 67, 17: 3, 18: 27, 19: 32, ..., 42: 78, 43: 59, 44: 24, 45: 56, 46: 87, 47: 36, 48: 98, 49: 59
Cache (N = 8): eight empty lines, each holding an Address and its Data
Total hits? Total misses?
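The slide leaves the hit/miss count as an exercise; a minimal sketch of this fully associative, LRU cache (one-byte lines, no blocking) can check the answer mechanically. The simulator below is an assumed illustration, not part of the original slides.

```python
# Sketch of Example 1: fully associative cache with N = 8 one-byte
# lines, LRU eviction, no blocking.
from collections import OrderedDict

N = 8
trace = [42, 43, 18, 43, 17, 42, 19, 45, 44, 46, 43, 47, 18]

cache = OrderedDict()          # key order tracks recency of use
hits = misses = 0
for addr in trace:
    if addr in cache:
        hits += 1
        cache.move_to_end(addr)        # mark as most recently used
    else:
        misses += 1
        if len(cache) == N:
            cache.popitem(last=False)  # evict the least recently used
        cache[addr] = True

print(hits, misses)            # -> 3 10
```

Tracing by hand gives the same result: only the repeat reads of 42 and 43 hit before the cache fills; address 18 is evicted at step 12 and misses again at step 13.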
Analysis of Example 1 (FA, no blocking)
• Miss Rate = (# of misses) / (# of accesses) =
• Two kinds of misses:
  – Compulsory miss (first time accessing)
  – Capacity miss (not enough room, got evicted)
• What was good:
• What was bad:
• Measurement concepts:
  – Hit Time: how long to look up something that is in the cache
  – Miss Penalty: how long to fetch something that is not in the cache
How to handle a miss?
• Things we need to do:
  1. Stall the CPU until the miss completes
  2. (If the cache is full) Evict old data (Random or LRU)
  3. Fetch the needed data from memory
  4. Restart the CPU
• The time for this is called the Miss Penalty.
Blocking
• Goal: exploit spatial locality
• Main idea:
  – Group memory into blocks
  – B bytes in each block of memory
  – B bytes in each cache line
  – Always fetch and evict entire blocks (even if not all the data was requested yet)
• Position within a cache line is determined by the Byte Offset: Address mod B
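The block-number and byte-offset rules above can be sketched in two lines of integer arithmetic (this snippet is an assumed illustration using the deck's B = 2 example):

```python
# With B-byte blocks, an address splits into a block number and a
# position within the cache line (the byte offset).
B = 2
addr = 43
block_number = addr // B     # which block the address belongs to
byte_offset  = addr % B      # position within the cache line

print(block_number, byte_offset)   # -> 21 1
```

So addresses 42 and 43 share block 21 and land at offsets 0 and 1 of the same cache line.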
Example 2 – Fully associative with blocking (N = 8, B = 2)
Processor accesses (same as Example 1):
1. Read 42   2. Read 43   3. Read 18   4. Read 43   5. Read 17
6. Read 42   7. Read 19   8. Read 45   9. Read 44  10. Read 46
11. Read 43  12. Read 47  13. Read 18
Memory, grouped into 2-byte blocks (Address: Data): {16: 67, 17: 3}, {18: 27, 19: 32}, ..., {42: 78, 43: 59}, {44: 24, 45: 56}, {46: 87, 47: 36}, {48: 98, 49: 59}
Cache (N = 8, B = 2): four empty lines, each holding a block Address plus Data at Offset 0 and Offset 1
Total hits? Total misses?
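Re-running the same trace with blocking shows the payoff of spatial locality. As before, this simulator is an assumed sketch, not part of the slides:

```python
# Sketch of Example 2: same trace, but the cache now holds 4 blocks
# of B = 2 bytes each (N = 8 total), fully associative with LRU.
from collections import OrderedDict

N, B = 8, 2
trace = [42, 43, 18, 43, 17, 42, 19, 45, 44, 46, 43, 47, 18]

cache = OrderedDict()
hits = misses = 0
for addr in trace:
    block = addr // B          # both bytes of a block share one line
    if block in cache:
        hits += 1
        cache.move_to_end(block)
    else:
        misses += 1
        if len(cache) == N // B:
            cache.popitem(last=False)
        cache[block] = True

print(hits, misses)            # -> 8 5
```

Fetching whole blocks turns neighboring accesses (42/43, 18/19, 44/45, 46/47) into hits: 8 hits and 5 misses versus 3 and 10 without blocking.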
Analysis of Example 2 (FA with blocking)
• Miss Rate = (# of misses) / (# of accesses) =
• Advantages:
• Disadvantages:
• Measurement concepts:
  – Hit Time: how long to look up something that is in the cache
  – Miss Penalty: how long to fetch something that is not in the cache
How big should the blocks be?
• Keeping cache size N fixed,
  – Smaller B means:
  – Larger B means:
• Increasing the block size tends to decrease the miss rate, up to a point:
Improving Hit Time with Direct Mapping
• Problem: how do we determine whether a block is in the cache?
• Fully associative (previous examples):
  – Requested block could be anywhere in the cache
  – Must search through all cache lines, or keep an extra data structure
• Direct-Mapped Cache:
  – Assign index 0 through (N/B - 1) to each cache line
  – Each memory block is assigned one possible index
  – Formula: Index = (Address / B) mod (N / B)
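The placement rule can be written out directly; this helper (a sketch, with the deck's N = 8, B = 2 as defaults) also shows why the next example produces conflicts:

```python
# Direct-mapped placement: a block may live in exactly one line,
# chosen by modulo arithmetic on the block number.
def dm_index(addr, N=8, B=2):
    block_number = addr // B
    return block_number % (N // B)

# Addresses 42 and 18 fall in blocks 21 and 9; both map to index 1,
# so they conflict even when other lines are free.
print(dm_index(42), dm_index(18))   # -> 1 1
```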
Example 3 – Direct-Mapped Cache (N = 8, B = 2)
Processor accesses (same as before):
1. Read 42   2. Read 43   3. Read 18   4. Read 43   5. Read 17
6. Read 42   7. Read 19   8. Read 45   9. Read 44  10. Read 46
11. Read 43  12. Read 47  13. Read 18
Memory (same as Example 2)
Cache: four empty lines with Index 0–3, each holding a block Address plus Data at Offset 0 and Offset 1
Total hits? Total misses?
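A direct-mapped cache needs no recency bookkeeping at all, which is exactly its appeal; a sketch of this example (assumed code, not from the slides) is just an array indexed by the formula:

```python
# Sketch of Example 3: direct-mapped cache, N = 8, B = 2 (4 lines).
N, B = 8, 2
num_lines = N // B
cache = [None] * num_lines             # one block (or None) per line
trace = [42, 43, 18, 43, 17, 42, 19, 45, 44, 46, 43, 47, 18]

hits = misses = 0
for addr in trace:
    block = addr // B
    index = block % num_lines          # the only line this block may use
    if cache[index] == block:
        hits += 1
    else:
        misses += 1
        cache[index] = block           # overwrite whatever was there

print(hits, misses)                    # -> 4 9
```

Blocks 21 and 9 keep evicting each other from index 1, so this trace drops to 4 hits and 9 misses even though the cache never runs out of room overall; those are the conflict misses the next slide names.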
Analysis of Example 3 (Direct mapped)
• Miss Rate = (# of misses) / (# of accesses) =
• THREE kinds of misses:
  – Compulsory miss (first time accessing)
  – Capacity miss (not enough room, got evicted)
  – Conflict miss (same index; FA would have had enough room)
• Advantages:
• Disadvantages:
• Measurement concepts:
  – Hit Time: how long to look up something that is in the cache
  – Miss Penalty: how long to fetch something that is not in the cache
Compromise: Set-Associative
• Goal: combine the low miss rate of FA with the good hit time of DM
• Fully associative (FA):
  – Requested block could be anywhere in the cache
• Direct Mapped (DM):
  – Assign index 0 through (N/B - 1) to each cache line
  – Each memory block is assigned one possible index
• k-way Set-Associative Cache:
  – Group cache lines into "sets" of k lines each
  – Each set has a DM-style index, 0 through N/(kB) - 1
  – Within each set, blocks can go anywhere (associative)
  – Formula: Set index = (Address / B) mod (N / (kB))
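The set-index rule generalizes the direct-mapped formula by dividing out the associativity; a small sketch (assumed code, defaults from Example 4):

```python
# Set-index rule for a k-way set-associative cache.
def set_index(addr, N=8, B=2, k=2):
    block_number = addr // B
    num_sets = N // (k * B)        # k lines share each index
    return block_number % num_sets

# With N=8, B=2, k=2 there are only 2 sets: block 21 (address 42)
# lands in set 1, block 22 (address 44) in set 0.
print(set_index(42), set_index(44))   # -> 1 0
```

Note that k = 1 reduces this to the direct-mapped formula, and k = N/B (one set) gives a fully associative cache.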
Example 4 – 2-way Set-Associative Cache (N = 8, B = 2, k = 2)
Processor accesses (same as before):
1. Read 42   2. Read 43   3. Read 18   4. Read 43   5. Read 17
6. Read 42   7. Read 19   8. Read 45   9. Read 44  10. Read 46
11. Read 43  12. Read 47  13. Read 18
Memory (same as Example 2)
Cache: two sets (Index 0 and 1) of two lines each; each line holds a block Address plus Data at Offset 0 and Offset 1
Total hits? Total misses?
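A set-associative cache is simply one small LRU cache per set; the sketch below (assumed code) combines the two previous simulators:

```python
# Sketch of Example 4: 2-way set-associative, N = 8, B = 2,
# LRU replacement within each set.
from collections import OrderedDict

N, B, k = 8, 2, 2
num_sets = N // (k * B)                # 2 sets of 2 lines each
sets = [OrderedDict() for _ in range(num_sets)]
trace = [42, 43, 18, 43, 17, 42, 19, 45, 44, 46, 43, 47, 18]

hits = misses = 0
for addr in trace:
    block = addr // B
    s = sets[block % num_sets]         # the set this block maps to
    if block in s:
        hits += 1
        s.move_to_end(block)
    else:
        misses += 1
        if len(s) == k:
            s.popitem(last=False)      # evict the set's LRU block
        s[block] = True

print(hits, misses)                    # -> 6 7
```

With 6 hits and 7 misses, this trace lands between direct-mapped (4 hits) and fully associative (8 hits), which is the compromise the previous slide promises.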
Performance Tradeoffs
• Block size
  – Advantages of small B:
  – Advantages of large B:
  – Typical values: 64 bytes (bytes, not bits!)
• Associativity
  – Advantages of small k (DM):
  – Advantages of large k (SA, FA):
  – Typical values: 4, 8, 12
What to do on a write?
Cache (N = 5, B = 1, k = 1): five empty lines with Index 0–4, each holding an Address and its Data
Processor accesses:
1. Read 24
2. Write 5 to 24
3. Read 26
4. Write 8 to 25
5. Write 9 to 21
6. Write 2 to 24
7. Read 29
Memory (Address: Data): 20: 7, 21: 3, 22: 27, 23: 32, 24: 101, 25: 78, 26: 34, 27: 87, 28: 53, 29: 93
Write Strategies
• Write-through: update memory immediately on every write
• Write-back: update memory only when the line is evicted (track modified lines with a dirty bit)
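The two policies differ only in *when* memory sees the new value. A minimal sketch of the contrast, using hypothetical one-line caches (the names and structure here are illustrative assumptions):

```python
# Contrast of write-through vs. write-back for a single cached line.
memory = {24: 101}

# Write-through: every write goes to the cache AND to memory.
cache_wt = {24: 101}
cache_wt[24] = 5
memory[24] = 5                 # memory updated immediately

# Write-back: only the cache changes now; a dirty bit remembers
# that the line differs from memory.
cache_wb = {"addr": 24, "data": 5, "dirty": False}
cache_wb["data"] = 2
cache_wb["dirty"] = True

# ... much later, when the line is evicted:
if cache_wb["dirty"]:
    memory[cache_wb["addr"]] = cache_wb["data"]

print(memory[24])              # -> 2
```

Write-back trades extra bookkeeping (the dirty bit) for fewer memory writes when the same line is written repeatedly, as in accesses 2 and 6 of the example above.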
Write-back example
Same setup as the previous write example: Cache (N = 5, B = 1, k = 1) with Index 0–4, the same memory contents (20: 7, 21: 3, 22: 27, 23: 32, 24: 101, 25: 78, 26: 34, 27: 87, 28: 53, 29: 93), and the same accesses:
1. Read 24
2. Write 5 to 24
3. Read 26
4. Write 8 to 25
5. Write 9 to 21
6. Write 2 to 24
7. Read 29
Efficient Bit Manipulation
Given a 2-way set-associative cache with N = 64 and B = 8, what is the set index for address 153?
NEW formulas (assuming everything is a power of 2):
a. Express the address in binary. (153₁₀ = 99₁₆ = 1001 1001₂)
b. Grab the right bits!
   ByteOffset = low log₂(B) = 3 bits: 001₂ = 1
   Index = next log₂(N/(kB)) = 2 bits: 11₂ = 3
   Tag = remaining bits: 100₂ = 4
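The power-of-2 shortcut replaces division and modulo with shifts and masks; here is a sketch for the slide's parameters (N = 64, B = 8, k = 2, so 4 sets):

```python
# Extract byte offset, set index, and tag from address 153 using
# shifts and masks (valid because B and the set count are powers of 2).
addr = 153                     # 0b1001_1001

offset_bits = 3                # log2(B) = log2(8)
index_bits  = 2                # log2(N // (k*B)) = log2(64 // 16)

offset = addr & ((1 << offset_bits) - 1)
index  = (addr >> offset_bits) & ((1 << index_bits) - 1)
tag    = addr >> (offset_bits + index_bits)

print(offset, index, tag)      # -> 1 3 4
```

Shifting right by `offset_bits` is the same as integer division by B, and masking with `(1 << index_bits) - 1` is the same as taking the result mod the number of sets; the bit version is just cheaper in hardware.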
Real Cache with Efficient Bit Manipulation
Example #1: Bit Manipulation
1. Suppose a direct-mapped cache has:
   – B = 8 byte blocks
   – 2 KiB cache
   Show how to break the following address into the tag, index, and byte offset:
   0000 1000 0101 1100 0001 0001 0111 1001
2. Same cache, but now 4-way associative. How does this change things?
   0000 1000 0101 1100 0001 0001 0111 1001
Example #2: Bit Manipulation
Suppose a direct-mapped cache divides addresses as follows:
   | 21 bits (tag) | 7 bits (index) | 4 bits (byte offset) |
What is the block size? The number of blocks? The total size of the cache? ("Size" usually refers to the size of the data only.)
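Working backwards from the field widths follows the same power-of-2 rules in reverse; a sketch of the arithmetic (this gives away the exercise, so treat it as a worked check):

```python
# Recover the cache geometry from the address field widths of
# Example #2: 4 offset bits, 7 index bits, direct-mapped.
offset_bits, index_bits = 4, 7

block_size = 1 << offset_bits          # 2^4 = 16 bytes per block
num_blocks = 1 << index_bits           # 2^7 = 128 lines (DM: 1 block each)
data_size  = block_size * num_blocks   # 2048 bytes = 2 KiB of data

print(block_size, num_blocks, data_size)   # -> 16 128 2048
```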
Review: Main concepts
• Locality: Temporal and Spatial
• Each access: Hit or Miss
• Eviction strategies: Random or Least-Recently Used (LRU)
• Reasons for a miss: Compulsory, Capacity, or Conflict
• Measurements: Miss Rate, Hit Time, Miss Penalty
• Cache types: Fully Associative, Direct-Mapped, or Set Associative
• Parameters: N (total size), B (block size), k (associativity)
• Write strategies: Write-through or Write-back
• Implementation details: Stall, Valid bit, Dirty bit, Tag