ADMIN
• Ethics Discussion & Reading Quiz – Wed April 12
  – Reading posted online
• Reading – finish Chapter 7
  – Sections 7.4 (skip 531-536), 7.5, 7.7, 7.8

SI232 Set #18: Caching Finale and Virtual Reality (Chapter 7)

Down the home stretch…
Week of   Monday                    Wednesday                          Friday
2-Apr     Review                    Exam                               Memory
9-Apr     Memory                    Ethics Discussion. Reading Quiz.   Virtual Memory. I/O.
16-Apr    I/O.                      Pipelining.                        Pipelining, hazards.
23-Apr    ILP and multiple issue.   Improving multiple issue.          Last class. Advanced topics/review.
Also due: HW (Ch 7) in the week of 16-Apr; course paper in the week of 23-Apr.
Final Exam – Monday May 1 (first exam day)

Split Caches
• Instructions and data have different properties
  – May benefit from different cache organizations (block size, assoc…)
[Figure: a unified hierarchy (L1 → L2 Cache → Main memory) next to a split hierarchy (ICache (L1) and DCache (L1) → L2 Cache → Main memory)]
• Why else might we want to do this?
Cache Performance
• Simplified model:

$$\text{execution time} = (\text{execution cycles} + \text{stall cycles}) \times \text{cycle time} = \text{execTime} + \text{stallTime}$$

$$\text{stall cycles} = \frac{\text{MemoryAccesses}}{\text{Program}} \times \text{MissRate} \times \text{MissPenalty}$$

$$\text{(or)} \quad \text{stall cycles} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Misses}}{\text{Instruction}} \times \text{MissPenalty}$$

• Two typical ways of improving performance:
  – decreasing the miss rate
  – decreasing the miss penalty
What happens if we increase block size? Add associativity?

Performance Example #1 – Unified Cache
• Suppose processor has a CPI of 1.5 given a perfect cache. If there are 1.2 memory accesses per instruction, a miss penalty of 20 cycles, and a miss rate of 10%, what is the effective CPI with the real cache?

Performance Example #2 – Split Cache
• Suppose processor has a CPI of 1.0 given a perfect cache. If the instruction cache miss rate is 3% and the data cache miss rate is 10%, what is the effective CPI with the real cache? Assume a miss penalty of 10 cycles and that 40% of instructions access data.

Exercise #1
• Suppose processor has a CPI of 2.0 given a perfect cache. If there are 1.5 memory accesses per instruction, a miss penalty of 40 cycles, and a miss rate of 5%, what is the effective CPI with the real cache?
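The stall-cycle model above can be turned into a small calculation. The following is a minimal sketch, not part of the original slides: the function names and parameter names are hypothetical, and it assumes every instruction makes exactly one instruction fetch in the split-cache case.

```c
#include <assert.h>

/* Unified cache: effective CPI = base CPI
 * + (memory accesses per instruction * miss rate * miss penalty). */
double effective_cpi_unified(double base_cpi, double accesses_per_instr,
                             double miss_rate, double miss_penalty)
{
    assert(miss_rate >= 0.0 && miss_rate <= 1.0);
    return base_cpi + accesses_per_instr * miss_rate * miss_penalty;
}

/* Split caches: assumes one I-cache access per instruction, plus a D-cache
 * access for the fraction data_fraction of instructions that touch data. */
double effective_cpi_split(double base_cpi,
                           double i_miss_rate, double i_miss_penalty,
                           double data_fraction,
                           double d_miss_rate, double d_miss_penalty)
{
    assert(i_miss_rate >= 0.0 && i_miss_rate <= 1.0);
    assert(d_miss_rate >= 0.0 && d_miss_rate <= 1.0);
    assert(data_fraction >= 0.0 && data_fraction <= 1.0);
    return base_cpi
         + i_miss_rate * i_miss_penalty
         + data_fraction * d_miss_rate * d_miss_penalty;
}
```

With the numbers from Performance Example #1, `effective_cpi_unified(1.5, 1.2, 0.10, 20.0)` works out to 1.5 + 2.4 = 3.9. With the numbers from Performance Example #2, `effective_cpi_split(1.0, 0.03, 10.0, 0.40, 0.10, 10.0)` works out to 1.0 + 0.3 + 0.4 = 1.7.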
Exercise #2
• You are given a processor with a 64 KB, direct-mapped instruction cache and a 64 KB, 4-way associative data cache. For a certain program, the instruction cache miss rate is 4% and the data cache miss rate is 5%. The miss penalty is 10 cycles for the I-cache and 20 cycles for the D-cache. If the CPI is 1.5 with a perfect cache, and 30% of instructions access data, what is the effective CPI?

Exercise #3
• Suppose a processor has a base CPI of 1.0 (no cache misses) but currently an effective CPI of 2.0 once misses are considered. There are 1.5 memory accesses per instruction. If the processor has a unified cache with a miss rate of 2%, how low must the miss penalty be in order to improve the effective CPI to 1.3?

Exercise #4 – Stretch
• A certain processor has a CPI of 1.0 with a perfect cache and a CPI of 1.2 when memory stalls (due to misses) are included. We wish to speed up the performance of this processor by 2x, which we will do by increasing the clock rate. This, however, will not improve the memory system, so misses will take just as long in absolute terms. How much faster must the clock rate be in order to meet our goal?

Cache Complexities
• Not always easy to understand implications of caches:
[Figure: two plots of Radix sort vs. Quicksort against Size (K items to sort), from 4 to 4096. Left panel: Theoretical behavior of Radix sort vs. Quicksort. Right panel: Observed behavior of Radix sort vs. Quicksort.]
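Returning to the setup in Exercise #4: because a miss takes the same absolute time, raising the clock rate makes each miss cost proportionally more cycles. The sketch below is one way to model that, under the assumption that the stall component of CPI scales linearly with the clock factor; the function name and parameters are hypothetical, and it tries one clock factor rather than solving the exercise.

```c
#include <assert.h>

/* Speedup from multiplying the clock rate by clock_factor when the miss
 * time is fixed in absolute terms (assumption: stall cycles per
 * instruction grow by the same factor as the clock rate). */
double speedup_from_faster_clock(double base_cpi, double stall_cpi,
                                 double clock_factor)
{
    assert(clock_factor > 0.0);
    /* Time per instruction, measured in units of the OLD cycle time. */
    double old_time = base_cpi + stall_cpi;
    /* New CPI counts stalls in the new, shorter cycles. */
    double new_cpi = base_cpi + stall_cpi * clock_factor;
    /* Convert back to old-cycle units: each new cycle is 1/clock_factor. */
    double new_time = new_cpi / clock_factor;
    return old_time / new_time;
}
```

For a base CPI of 1.0 and 0.2 stall cycles per instruction, doubling the clock gives 1.2 / 0.7, or roughly 1.71x, which falls short of 2x. Under this model the clock must therefore be raised by more than the target speedup.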
Cache Complexities
• Here is why:
[Figure: plot of Radix sort vs. Quicksort against Size (K items to sort), from 4 to 4096]
• Memory system performance is often critical factor
  – multilevel caches, pipelined processors, make it harder to predict outcomes
  – Compiler optimizations to increase locality sometimes hurt ILP
• Difficult to predict best algorithm: need experimental data

Program Design for Caches – Example 1
• Option #1
    for (j = 0; j < 20; j++)
        for (i = 0; i < 200; i++)
            x[i][j] = x[i][j] + 1;
• Option #2
    for (i = 0; i < 200; i++)
        for (j = 0; j < 20; j++)
            x[i][j] = x[i][j] + 1;

Program Design for Caches – Example 2
• Why might this code be problematic?
    int A[1024][1024];
    int B[1024][1024];
    for (i = 0; i < 1024; i++)
        for (j = 0; j < 1024; j++)
            A[i][j] += B[i][j];
• How to fix it?

VIRTUAL MEMORY
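For the two loop orders in Program Design for Caches – Example 1, a rough way to compare spatial locality is to count how often consecutive accesses land in different memory blocks. The sketch below does only that; it is not a cache simulator. The function name, the 4-byte element size, the 32-byte block size used in the note, and the assumption that the array starts on a block boundary are all illustrative choices, not values from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Array shape from Example 1; ELEM_BYTES is an assumed sizeof(int). */
enum { ROWS = 200, COLS = 20, ELEM_BYTES = 4 };

/* Walk x[ROWS][COLS] in one of the two loop orders and count how often an
 * access falls in a different block than the previous access.
 * i_outer != 0 models Option #2 (i outer, j inner);
 * i_outer == 0 models Option #1 (j outer, i inner). */
long block_switches(int i_outer, size_t block_bytes)
{
    long switches = 0;
    size_t prev_block = (size_t)-1;   /* sentinel: no block touched yet */
    int outer_n = i_outer ? ROWS : COLS;
    int inner_n = i_outer ? COLS : ROWS;

    assert(block_bytes > 0);
    for (int outer = 0; outer < outer_n; outer++) {
        for (int inner = 0; inner < inner_n; inner++) {
            int i = i_outer ? outer : inner;
            int j = i_outer ? inner : outer;
            /* C stores x[i][j] row-major: byte offset from the array base. */
            size_t offset = ((size_t)i * COLS + (size_t)j) * ELEM_BYTES;
            size_t block = offset / block_bytes;
            if (block != prev_block) {
                switches++;
                prev_block = block;
            }
        }
    }
    return switches;
}
```

With 32-byte blocks, `block_switches(0, 32)` (Option #1) reports 4000, one per access, because successive accesses are 80 bytes apart. `block_switches(1, 32)` (Option #2) reports 500, because eight consecutive elements share each block. Whether that difference shows up as extra misses depends on the cache size and organization, which this sketch does not model.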
Virtual memory summary (part 1) and (part 2)
• Data access without virtual memory:
[Figure: memory address → cache → memory]
• Data access with virtual memory:
[Figure: virtual address (bits 31–0: virtual page number, page offset) → translation → physical address (bits 29–0: physical page number, page offset) → cache → memory, with disk]
“all problems in Computer Science can be solved by another level of indirection” -- Butler Lampson

Virtual Memory
• Main memory can act as a cache for the secondary storage (disk)
[Figure: virtual addresses → address translation → physical addresses and disk addresses]
• Advantages:
  – Illusion of having more physical memory
  – Program relocation
  – Protection
• Note that main point is caching of disk in main memory but will affect all our memory references!

Address Translation
Terminology:
• Cache block →
• Cache miss →
• Cache tag →
• Byte offset →
[Figure: virtual address (bits 31–0: virtual page number, page offset) → translation → physical address (bits 29–0: physical page number, page offset)]
Pages: virtual memory blocks
• Page faults: the data is not in memory, retrieve it from disk
  – huge miss penalty (slow disk), thus
    • pages should be fairly ____
    • Replacement strategy: ____
  – can handle the faults in software instead of hardware
• Writeback or write-through?

Page Tables
[Figure: the virtual page number indexes the page table; each entry holds a valid bit and a physical page or disk address, pointing into physical memory when valid and into disk storage otherwise]

Example – Address Translation Part 1
• Our virtual memory system has:
  – 32 bit virtual addresses
  – 28 bit physical addresses
  – 4096 byte page sizes
• How to split a virtual address?
    Virtual page # | Page offset
• What will the physical address look like?
    Physical page # | Page offset
• How many entries in the page table?

Example – Address Translation Part 2
Page Table
    Virtual page #   Valid?   Physical Page or Disk Block #
    C0000            1        A204
    C0001            1        A200
    C0002            0        FB00
    C0003            1        8003
    C0004            1        7290
    C0005            0        5600
    C0006            1        F5C0
    …
Translate the following addresses:
1. C0001560
2. C0006123
3. C0002450
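The translation steps in the two-part example can be sketched in code. This is a minimal illustration under the example's stated parameters (32-bit virtual addresses, 28-bit physical addresses, 4096-byte pages, hence a 12-bit page offset). The struct, the function name, and the return codes are hypothetical, and the linear search stands in for direct indexing by virtual page number only because the slide shows just a fragment of the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OFFSET_BITS 12u   /* 4096-byte pages => 12-bit page offset */
#define OFFSET_MASK ((UINT32_C(1) << OFFSET_BITS) - 1u)

struct pte {
    uint32_t vpn;      /* virtual page number (table index on the slide) */
    int      valid;    /* 1: page is in physical memory */
    uint32_t number;   /* physical page # if valid, else disk block # */
};

/* Page table fragment copied from the slide. */
static const struct pte page_table[] = {
    {0xC0000, 1, 0xA204}, {0xC0001, 1, 0xA200}, {0xC0002, 0, 0xFB00},
    {0xC0003, 1, 0x8003}, {0xC0004, 1, 0x7290}, {0xC0005, 0, 0x5600},
    {0xC0006, 1, 0xF5C0},
};

/* Returns 1 and writes *paddr on success, 0 on a page fault (valid bit
 * is 0), and -1 if the page is outside the fragment shown. */
int translate(uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn = vaddr >> OFFSET_BITS;      /* upper 20 bits */
    uint32_t offset = vaddr & OFFSET_MASK;    /* lower 12 bits, unchanged */

    for (size_t k = 0; k < sizeof page_table / sizeof page_table[0]; k++) {
        if (page_table[k].vpn != vpn)
            continue;
        if (!page_table[k].valid)
            return 0;
        *paddr = (page_table[k].number << OFFSET_BITS) | offset;
        return 1;
    }
    return -1;
}
```

Under these assumptions, C0001560 splits into page C0001 and offset 560 and maps to A200560, and C0006123 maps to F5C0123. C0002450 hits an entry whose valid bit is 0, so the sketch reports a page fault; FB00 there is a disk block number, not a physical page.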
Exercise #1
• Given system with
  – 20 bit virtual addresses
  – 16 bit physical addresses
  – 256 byte page sizes
• How to split a virtual address?
    Virtual page # | Page offset
• What will the physical address look like?
    Physical page # | Page offset
• How many entries in the page table?

Exercise #2 (new problem – not related to #1)
Page Table
    Virtual page #   Valid?   Physical Page or Disk Block #
    B000             1        B004
    B001             1        A120
    B002             0        AB00
    B003             0        8003
    B004             1        7590
    B005             1        5800
    B006             1        F4C0
    …
Translate the following addresses:
1. B004890
2. B002123
3. B006001

Exercise #3
• Is it possible to have the physical address be wider (more bits) than the virtual address?
  If so, when would this ever make sense?

Exercise #4
Given the fragment of a page table below, answer the following questions assuming a page size of 1024 bytes.
Page Table
    Virtual page #   Valid?   Physical Page #
    B000             1        B0
    B001             1        A0
    B002             0        AB
    B003             0        80
    B004             1        90
    B005             1        58
    B006             1        F4
    …
1. What is the virtual address size (# bits)
2. What is the physical address size (# bits)
3. Number of entries in page table?
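The address-splitting questions in these exercises all use the same arithmetic: the page size fixes the offset width, and whatever bits remain form the page number. The sketch below captures that arithmetic. The struct and function names are hypothetical, and it assumes a power-of-two page size and a flat, single-level page table with one entry per virtual page. It is shown with the parameters from Example – Address Translation Part 1 rather than with the exercise values.

```c
#include <assert.h>
#include <stdint.h>

struct vm_layout {
    unsigned offset_bits;          /* log2(page size in bytes) */
    unsigned vpn_bits;             /* virtual address bits minus offset bits */
    unsigned ppn_bits;             /* physical address bits minus offset bits */
    uint64_t page_table_entries;   /* one entry per virtual page (assumed) */
};

/* Derive field widths from address widths and page size. */
struct vm_layout vm_layout_for(unsigned vaddr_bits, unsigned paddr_bits,
                               uint64_t page_bytes)
{
    struct vm_layout out = {0, 0, 0, 0};

    /* Page size must be a nonzero power of two. */
    assert(page_bytes != 0 && (page_bytes & (page_bytes - 1)) == 0);
    while ((UINT64_C(1) << out.offset_bits) < page_bytes)
        out.offset_bits++;

    assert(out.offset_bits <= vaddr_bits && out.offset_bits <= paddr_bits);
    out.vpn_bits = vaddr_bits - out.offset_bits;
    out.ppn_bits = paddr_bits - out.offset_bits;

    assert(out.vpn_bits < 64);   /* keep the shift below well defined */
    out.page_table_entries = UINT64_C(1) << out.vpn_bits;
    return out;
}
```

For the earlier example, `vm_layout_for(32, 28, 4096)` yields a 12-bit page offset, a 20-bit virtual page number, a 16-bit physical page number, and 1048576 page table entries.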