
IC220 Set #17: Caching Finale and Virtual Reality (Chapter 5)



  1. ADMIN
     • Reading – finish Chapter 5
       – Sections 5.4 (skip 511-515), 5.5, 5.11, 5.12

     IC220 Set #17: Caching Finale and Virtual Reality (Chapter 5)

     Cache Performance
     • Simplified model:
         execution time = (execution cycles + stall cycles) × cycle time
                        = execTime + stallTime
         stall cycles = (Memory Accesses / Program) × Miss Rate × Miss Penalty
                 (or) = (Instructions / Program) × (Misses / Instruction) × Miss Penalty
     • Two typical ways of improving performance:
       – decreasing the miss rate
       – decreasing the miss penalty
     • What happens if we increase block size? Add associativity?

     Performance Example
     • Suppose the processor has a CPI of 1.5 given a perfect cache. If there are
       1.2 memory accesses per instruction, a miss penalty of 20 cycles, and a
       miss rate of 10%, what is the effective CPI with the real cache?

  2. Split Caches
     • Instructions and data have different properties
       – May benefit from different cache organizations (block size, assoc…)
     [Diagram: separate L1 instruction cache (ICache) and data cache (DCache),
      both backed by a unified L2 cache and main memory]
     • Why else might we want to do this?

     Cache Complexities
     • Not always easy to understand implications of caches:
     [Charts: theoretical vs. observed behavior of Radix sort and Quicksort,
      cost per item vs. size (K items to sort)]
     • Here is why:
     [Chart: cache misses per item, Radix sort vs. Quicksort, vs. size (K items to sort)]
     • Memory system performance is often a critical factor
       – multilevel caches and pipelined processors make it harder to predict outcomes
       – Compiler optimizations to increase locality sometimes hurt ILP
     • Difficult to predict the best algorithm: need experimental data

     Program Design for Caches – Example 1
     • Option #1
         for (j = 0; j < 20; j++)
           for (i = 0; i < 200; i++)
             x[i][j] = x[i][j] + 1;
     • Option #2
         for (i = 0; i < 200; i++)
           for (j = 0; j < 20; j++)
             x[i][j] = x[i][j] + 1;

  3. Program Design for Caches – Example 2
     • Why might this code be problematic?
         int A[1024][1024];
         int B[1024][1024];
         for (i = 0; i < 1024; i++)
           for (j = 0; j < 1024; j++)
             A[i][j] += B[i][j];
     • How to fix it?

     VIRTUAL MEMORY

     Virtual memory summary (parts 1 and 2)
     [Diagrams: data access without virtual memory sends the memory address
      straight to the cache, memory, and disk; with virtual memory, a 32-bit
      virtual address (virtual page number + page offset) is first translated
      into a physical address (physical page number + page offset), which then
      goes to the cache and memory, with disk behind them]

  4. Virtual Memory
     • Main memory can act as a cache for the secondary storage (disk)
     [Diagram: virtual addresses map through address translation to physical
      addresses; some virtual pages map instead to disk addresses]
     • Advantages:
       – Illusion of having more physical memory
       – Program relocation
       – Protection
     • Note that the main point is caching of disk in main memory,
       but it will affect all our memory references!

     Address Translation
     • Terminology (cache term → virtual memory term, to fill in):
       – Cache block  → ________
       – Cache miss   → ________
       – Cache tag    → ________
       – Byte offset  → ________
     [Diagram: 32-bit virtual address (virtual page number + page offset)
      translated to a physical address (physical page number + page offset)]

     Pages: virtual memory blocks
     • Page faults: the data is not in memory, retrieve it from disk
       – huge miss penalty (slow disk), thus
         • pages should be fairly ________
         • Replacement strategy: ________
       – can handle the faults in software instead of hardware
     • Writeback or write-through?

     Page Tables
     [Diagram: page table indexed by virtual page number; each entry has a valid
      bit plus either a physical page number (page in physical memory) or a disk
      address (page in disk storage)]

  5. Example – Address Translation Part 1 (EX 7-31…)
     • Our virtual memory system has:
       – 32-bit virtual addresses
       – 28-bit physical addresses
       – 4096-byte pages
     • How to split a virtual address?
         | Virtual page # | Page offset |
     • What will the physical address look like?
         | Physical page # | Page offset |
     • How many entries in the page table?

     Example – Address Translation Part 2
     • Page Table:
         VPN     Valid?   Physical Page or Disk Block #
         C0000     1        A204
         C0001     1        A200
         C0002     0        FB00
         C0003     1        8003
         C0004     1        7290
         C0005     0        5600
         C0006     1        F5C0
         …
     • Translate the following addresses:
       1. C0001560
       2. C0006123
       3. C0002450

     Making Address Translation Fast
     • A cache for address translations: the translation lookaside buffer (TLB)
     [Diagram: TLB entries (valid, dirty, ref bits; tag; physical page address)
      sit in front of the page table, which maps each virtual page to a physical
      page or a disk address]
     • Typical values: 16-512 entries, miss rate 0.01%-1%, miss penalty 10-100 cycles

     Protection and Address Spaces
     • Every program has its own “address space”
       – Program A’s address 0xc000 0200 is not the same as program B’s
       – OS maps every virtual address to distinct physical addresses
     • How do we make this work?
       – Page tables – ________
       – TLB – ________
     • Can program A access data from program B? Yes, if…
       1. OS can map different virtual page #’s to the same physical page #’s
          • So A’s 0xc000 0200 = B’s 0xb320 0200
       2. Program A has read or write access to the page
       3. OS uses supervisor/kernel protection to prevent user programs from
          modifying the page table/TLB

  6. TLBs and Caches
     • What happens after translation?
     [Diagram: 32-bit virtual address split into virtual page number and page
      offset; translation yields physical page number + page offset, which is
      sent to the cache]

     Integrating Virtual Memory, TLBs, and Caches (Figure 5.25)
     [Flowchart: virtual address → TLB access. On a TLB miss, take a TLB miss
      exception; on a TLB hit, form the physical address. For a read, try to
      read the data from the cache: on a hit, deliver the data to the CPU; on
      a miss, stall while the block is read. For a write, check the write
      access bit: if off, take a write protection exception; if on, try to
      write to the cache (stalling on a cache miss while the block is read),
      then write the data into the cache, update the dirty bit, and put the
      data and the address into the write buffer.]

     Modern Systems

     Concluding Remarks
     • Fast memories are small, large memories are slow
       – We really want fast, large memories
       – Caching gives this illusion
     • Principle of locality
       – Programs use a small part of their memory space frequently
     • Memory hierarchy
       – L1 cache ↔ L2 cache ↔ … ↔ DRAM memory ↔ disk
     • Memory system design is critical for multiprocessors
