ADMIN
• 12 week exam next Wed
  – Do practice problems before Monday
• Homework due Friday
  – Last late turn-in Monday at 0800

Slide Set #16: Exploiting Memory Hierarchy
• Chapter 7 Reading
  – 7.1-7.3

Memory, Cost, and Performance
• Ideal World: we want a memory that is
  – Fast,
  – Big, &
  – Cheap!
• Real World (2004):
  – SRAM access times are 0.5 to 5 ns at cost of $4000 to $10,000 per GB.
  – DRAM access times are 50 to 70 ns at cost of $100 to $200 per GB.
  – Disk access times are 5 to 20 million ns at cost of $.50 to $2 per GB.
• Solution?

Locality
• A principle that makes caching work
• If an item is referenced,
  1. it will tend to be referenced again soon
     why?
  2. nearby items will tend to be referenced soon.
     why?
Caching Basics
• Definitions
  1. Minimum unit of data: “block” or “cache line”
     For now assume, block is 1 byte
  2. Data requested is in the cache: __________
  3. Data requested is not in the cache: __________
• Cache has a given number of blocks (N)
• Challenge: How to locate an item in the cache?
  – Simplest way:
    Cache index = (Data address) mod N
    e.g., N = 10, Address = 1024, Index = ____
    e.g., N = 16, Address = 33, Index = ____
  – Implications
    For a given data address, there is __________ possible cache index
    But for a given cache index there are __________ possible data items that could go there

Example – (Simplified) Direct Mapped Cache
Memory (Address: Data): 20: 7, 21: 3, 22: 27, 23: 32, 24: 101, 25: 78, 26: 59, 27: 24, 28: 56, 29: 87, 30: 36, 31: 98
Cache (N = 5): indexes 0 to 4
Processor:
  1. Read 24
  2. Read 25
  3. Read 26
  4. Read 24
  5. Read 21
  6. Read 26
  7. Read 24
  8. Read 26
  9. Read 27
Total hits?
Total misses?

Exercise #1 – Direct Mapped Cache
Memory (Address: Data): 20: 7, 21: 3, 22: 27, 23: 32, 24: 101, 25: 78, 26: 59, 27: 24, 28: 56, 29: 87, 30: 36, 31: 98
Cache (N = 5): indexes 0 to 4
Processor:
  1. Read 30
  2. Read 31
  3. Read 30
  4. Read 26
  5. Read 25
  6. Read 28
  7. Read 23
  8. Read 25
  9. Read 28
Total hits?
Total misses?

Exercise #2 – Direct Mapped Cache
Memory (Address: Data): 20: 7, 21: 3, 22: 27, 23: 32, 24: 101, 25: 78, 26: 59, 27: 24, 28: 56, 29: 87, 30: 36, 31: 98
Cache (N = 4): indexes 0 to 3
Processor:
  1. Read 30
  2. Read 31
  3. Read 30
  4. Read 26
  5. Read 25
  6. Read 28
  7. Read 23
  8. Read 25
  9. Read 28
Total hits?
Total misses?
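The direct-mapped lookup rule above (cache index = data address mod N, with 1-byte blocks) can be traced by hand or with a few lines of code. The following is a minimal sketch, not part of the original slides; the function name `simulate_direct_mapped` and the choice to store the address itself in each cache slot are assumptions made for illustration.

```python
def simulate_direct_mapped(addresses, num_blocks):
    """Trace a direct-mapped cache with 1-byte blocks.

    Hypothetical helper: returns (hits, misses, final_cache), where
    final_cache[i] is the address currently held at cache index i
    (None if that index was never filled).
    """
    cache = [None] * num_blocks
    hits = 0
    misses = 0
    for addr in addresses:
        index = addr % num_blocks      # cache index = address mod N
        if cache[index] == addr:
            hits += 1                  # requested data is already cached
        else:
            misses += 1                # fetch from memory, replacing the old entry
            cache[index] = addr
    return hits, misses, cache


# Read sequence from the "Example" slide, with N = 5.
example_reads = [24, 25, 26, 24, 21, 26, 24, 26, 27]
hits, misses, final_cache = simulate_direct_mapped(example_reads, 5)
print("hits:", hits, "misses:", misses, "final cache:", final_cache)
```

Changing the second argument from 5 to 4 shows how the same read sequence behaves with a different number of blocks, which is the difference between the two exercises.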
Exercise #3 – Stretch
• Suppose we want to minimize the total number of bits needed to implement a cache with N blocks. What is inefficient about our current design?
• (Hint – consider bigger addresses)

Exercise #4 – Stretch
• Look back at Exercises 1 and 2 and identify at least two different kinds of reasons for why there might be a cache miss.
• How might you possibly address each type of miss?

Improving our basic cache
• Why did we miss? How can we fix it?

Approach #1 – Increase Block Size
$\text{Index} = \left\lfloor \dfrac{\text{ByteAddress}}{\text{BytesPerBlock}} \right\rfloor \bmod N$
Memory (Address: Data): 20: 7, 21: 3, 22: 27, 23: 32, 24: 101, 25: 78, 26: 59, 27: 24, 28: 56, 29: 87, 30: 36, 31: 98
Cache: indexes 0 to 3
Processor:
  1. Read 24
  2. Read 25
  3. Read 26
  4. Read 24
  5. Read 21
  6. Read 18
  7. Read 24
  8. Read 27
  9. Read 26
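The index formula for larger blocks, Index = floor(ByteAddress / BytesPerBlock) mod N, can be sketched as follows. This is an illustration rather than the slides' own material: the names `block_index` and `simulate_blocks` are hypothetical, and the block size of 2 bytes used in the sample call is an assumption, since the slide does not state it (N = 4 is read off the four cache indexes shown).

```python
def block_index(byte_address, bytes_per_block, num_blocks):
    """Index = floor(ByteAddress / BytesPerBlock) mod N."""
    block_number = byte_address // bytes_per_block
    return block_number % num_blocks


def simulate_blocks(addresses, bytes_per_block, num_blocks):
    """Count (hits, misses) for a direct-mapped cache with multi-byte blocks.

    Each cache slot remembers which block number it holds, so two
    addresses in the same block hit on the same entry.
    """
    cache = [None] * num_blocks
    hits = 0
    misses = 0
    for addr in addresses:
        block_number = addr // bytes_per_block
        index = block_index(addr, bytes_per_block, num_blocks)
        if cache[index] == block_number:
            hits += 1
        else:
            misses += 1
            cache[index] = block_number   # load the whole block
    return hits, misses


# Read sequence from the "Approach #1" slide.
reads = [24, 25, 26, 24, 21, 18, 24, 27, 26]
# ASSUMPTION: 2-byte blocks, N = 4.
print(simulate_blocks(reads, bytes_per_block=2, num_blocks=4))
# For comparison: 1-byte blocks with the same N.
print(simulate_blocks(reads, bytes_per_block=1, num_blocks=4))
```

With 2-byte blocks, a read of address 25 finds its data already loaded by the earlier read of address 24, which is the effect of spatial locality that a larger block size is meant to exploit.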
Approach #2 – Add Associativity
$\text{Index} = \left\lfloor \dfrac{\text{ByteAddress}}{\text{BytesPerBlock}} \right\rfloor \bmod \dfrac{N}{\text{Associativity}}$
Memory (Address: Data): 20: 7, 21: 3, 22: 27, 23: 32, 24: 101, 25: 78, 26: 59, 27: 24, 28: 56, 29: 87, 30: 36, 31: 98
Cache: indexes 0 to 1
Processor:
  1. Read 24
  2. Read 25
  3. Read 26
  4. Read 24
  5. Read 21
  6. Read 18
  7. Read 24
  8. Read 27
  9. Read 26

Performance Impact – Part 1
• To be fair, want to compare cache organizations with same data size
  – E.g., increasing block size must decrease number of blocks (N)
• Overall, increasing block size tends to decrease miss rate:
  [Figure: miss rate (0% to 40%) vs. block size in bytes (4, 16, 64, 256), one curve per cache size: 1 KB, 8 KB, 16 KB, 64 KB, 256 KB]

Performance Impact – Part 2
• Increasing block size…
  – May help by exploiting _____________ locality
  – But, may hurt by increasing _____________ (due to smaller __________ )
  – Lesson – want block size > 1, but not too large
• Increasing associativity
  – Overall N stays the same, but smaller number of sets
  – May help by exploiting _____________ locality (due to fewer ____________ )
  – May hurt because cache gets slower
  – Do we want associativity?

Exercise #1 – Show final cache and total hits
Block size = 2, N = 4
Memory (Address: Data): 20: 7, 21: 3, 22: 27, 23: 32, 24: 101, 25: 78, 26: 59, 27: 24, 28: 56, 29: 87, 30: 36, 31: 98
Cache: indexes 0 to 3
Processor:
  1. Read 16
  2. Read 14
  3. Read 17
  4. Read 13
  5. Read 24
  6. Read 17
  7. Read 15
  8. Read 25
  9. Read 27
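The associative index formula, Index = floor(ByteAddress / BytesPerBlock) mod (N / Associativity), differs from the direct-mapped one only in that the modulus is the number of sets rather than the number of blocks. A minimal sketch follows; the function name `set_index` and the parameter values in the sample calls are assumptions for illustration, not values given on the slides.

```python
def set_index(byte_address, bytes_per_block, num_blocks, associativity):
    """Index = floor(ByteAddress / BytesPerBlock) mod (N / Associativity)."""
    if num_blocks % associativity != 0:
        raise ValueError("N must be a multiple of the associativity")
    num_sets = num_blocks // associativity
    block_number = byte_address // bytes_per_block
    return block_number % num_sets


# ASSUMPTION: N = 4 blocks of 2 bytes each, 2-way associative -> 2 sets.
for addr in (24, 26, 21, 18):
    print(addr, "-> set", set_index(addr, 2, 4, 2))

# Associativity = 1 reduces to the direct-mapped formula (N sets).
print(set_index(26, 2, 4, 1))
# Associativity = N is fully associative: a single set, always index 0.
print(set_index(26, 2, 4, 4))
```

Because several blocks can share a set, two addresses that map to the same index no longer have to evict each other, at the cost of searching every slot in the set on each access.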
Exercise #2
• Show the correct formula for calculating the cache index, given the cache parameters below
  1. N = 10, Block size = 4
  2. N = 8, Block size = 1, Associativity = 4
  3. N = 16, Block size = 8, Associativity = 2

Exercise #3 – Fill in blanks, show final cache & total hits
Block size = _____, N = ______, Assoc = _____
Memory (Address: Data): 20: 7, 21: 3, 22: 27, 23: 32, 24: 101, 25: 78, 26: 59, 27: 24, 28: 56, 29: 87, 30: 36, 31: 98
Cache: indexes 0 to 1
Processor:
  1. Read 24
  2. Read 25
  3. Read 26
  4. Read 24
  5. Read 21
  6. Read 26
  7. Read 24
  8. Read 26
  9. Read 27

Exercise #4
• When the associativity is > 1 and the cache is full, then whenever there is a miss the cache will:
  – Find the set where the new data should go
  – Choose some existing data from that set to “evict”
  – Place the new data in the newly empty slot
• How should the cache decide which data to evict?

Further Issues
• How to deal with writes?
• Bit details – how can we store more efficiently?
• What happens on a miss? Evictions?
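The miss-handling steps in Exercise #4 (find the set, choose a victim, place the new data) can be sketched in code once an eviction policy is chosen. The slides leave that choice open; the sketch below assumes least-recently-used (LRU) eviction, which is one common option and not necessarily the intended answer. The function name `simulate_set_associative` and the sample trace are hypothetical.

```python
from collections import OrderedDict


def simulate_set_associative(addresses, num_blocks, bytes_per_block, associativity):
    """Count (hits, misses) for a set-associative cache.

    ASSUMPTION: LRU eviction. Each set is an OrderedDict whose keys are
    block numbers, ordered from least to most recently used.
    """
    num_sets = num_blocks // associativity
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    misses = 0
    for addr in addresses:
        block_number = addr // bytes_per_block
        current = sets[block_number % num_sets]    # 1. find the set
        if block_number in current:
            hits += 1
            current.move_to_end(block_number)      # mark as most recently used
        else:
            misses += 1
            if len(current) == associativity:      # set is full
                current.popitem(last=False)        # 2. evict the LRU block
            current[block_number] = True           # 3. place the new block
    return hits, misses


# Hypothetical trace: 2 blocks of 1 byte, 2-way associative (a single set).
print(simulate_set_associative([1, 2, 1, 3, 2], num_blocks=2,
                               bytes_per_block=1, associativity=2))
```

In the sample trace, the read of address 3 evicts address 2 rather than address 1, because address 1 was touched more recently; a different policy (for example, evicting the oldest or a random block) could give a different hit count on the same trace.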