IC220 Caching 2: Memory Hierarchy (more from Chapter 5, specifically 5.7 and 5.8)
Cache design overview

ANY cache can be viewed as k-way associative, where N is the cache size and B is the block size. What are the pros and cons of each? (A small index/tag sketch follows the list.)
• Fully associative: k = N/B (one set holding every block)
• 4-way set associative: k = 4
• Direct-mapped: k = 1
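To make the k-way view concrete, here is a minimal sketch of how an address splits into tag, set index, and block offset. The parameters (1KiB cache, 32-byte blocks, k = 4) are illustrative assumptions, not values from the course:

    #include <stdint.h>
    #include <stdio.h>

    #define N 1024u            /* total cache size in bytes (assumed) */
    #define B 32u              /* block size in bytes (assumed)       */
    #define K 4u               /* associativity: 1, 4, or N/B         */

    #define NUM_BLOCKS (N / B)
    #define NUM_SETS   (NUM_BLOCKS / K)  /* k = N/B gives 1 set; k = 1 gives N/B sets */

    int main(void) {
        uint32_t addr   = 0x12345678u;
        uint32_t offset = addr % B;               /* byte within the block */
        uint32_t set    = (addr / B) % NUM_SETS;  /* which set to search   */
        uint32_t tag    = (addr / B) / NUM_SETS;  /* identifies the block  */

        printf("addr 0x%08x -> tag 0x%x, set %u, offset %u (k = %u)\n",
               addr, tag, set, offset, K);
        return 0;
    }

Higher k means fewer sets but more tags to compare on each lookup (hardware cost, hit time); lower k means cheaper lookups but more conflict misses.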
Improving Cache Performance

Remember key metrics: Miss Rate, Hit Time, Miss Penalty
What happens if we:
• Increase the cache size (N)?
• Increase the block size (keeping N the same)?
• Increase associativity (keeping N the same)?
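These three metrics combine into the standard average memory access time (AMAT) measure from Chapter 5, a useful lens for each of the questions above:

    AMAT = Hit Time + Miss Rate × Miss Penalty

For example (illustrative numbers, not from the slides): a 1-cycle hit time, 5% miss rate, and 100-cycle miss penalty give AMAT = 1 + 0.05 × 100 = 6 cycles.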
Cache performance key tradeoff

Inherent conflict: HIT TIME vs. MISS RATE
More hierarchy – L2 cache?
• Problem: CPUs get faster, DRAM gets bigger
  – Must keep hit time small (1 or 2 cycles)
  – But then the cache must be small too (fast SRAM is expensive)
  – So the miss rate gets higher...
• Solution: Add another level of cache (a worked example follows):
  – try to optimize the ____________ on the 1st-level cache
  – try to optimize the ____________ on the 2nd-level cache
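Extending the AMAT formula to two levels shows why this works. The numbers below are my own illustrative assumptions:

    AMAT = L1 hit time + L1 miss rate × (L2 hit time + L2 miss rate × memory penalty)

With a 1-cycle L1 hit, 5% L1 miss rate, 10-cycle L2 hit, 20% L2 (local) miss rate, and 100-cycle memory penalty: AMAT = 1 + 0.05 × (10 + 0.2 × 100) = 2.5 cycles, versus 1 + 0.05 × 100 = 6 cycles with no L2.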
Memory Hierarchy
Questions
• Will the miss rate of an L2 cache be higher or lower than for the L1 cache?
• Claim: “The register file is really the lowest-level cache.” What are the reasons for and against this statement?
Split Caches
• Instructions and data have different properties
  – May benefit from different cache organizations (block size, associativity, …)
[Diagram: CPU → split L1 ICache / L1 DCache → L2 cache → (L3, L4, …?) → main memory]
What does an address refer to?

The old way:
• The address refers to a specific byte in main memory (DRAM).
• This is called a physical address.
Problems with this:
[Diagram: CPU → (physical address) → Cache → Memory]
Virtual memory: Main idea

The CPU works with (fake) virtual addresses. The operating system translates them to physical addresses.
Advantages:
New challenge:
[Diagram: CPU → (virtual address) → OS translation → (physical address) → Cache → Memory]
Pages and virtual address translation
• Virtual AND physical addresses are divided into blocks called pages.
• Typical page size is 4KiB (means 12 bits for the page offset). A sketch of that split follows.
[Diagram: Cache ↔ Memory ↔ Disk]
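A minimal sketch (my addition) of the VPN/offset split for a 32-bit virtual address and 4KiB pages:

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_BITS 12u                        /* 4KiB page => 12 offset bits */

    int main(void) {
        uint32_t vaddr  = 0x00403ABCu;           /* example virtual address */
        uint32_t vpn    = vaddr >> PAGE_BITS;    /* virtual page number     */
        uint32_t offset = vaddr & 0xFFFu;        /* byte within the page    */

        printf("vaddr 0x%08x -> VPN 0x%05x, offset 0x%03x\n", vaddr, vpn, offset);
        return 0;
    }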
Page Tables
• The translation from virtual to physical page numbers is stored in a page table.
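A single-level lookup sketch, continuing the VPN/offset split above. The structure is hypothetical and simplified (real page tables are multi-level and managed by the OS):

    #include <stdint.h>

    #define PAGE_BITS  12u
    #define NUM_VPAGES (1u << 20)       /* 32-bit vaddr, 4KiB pages */

    /* One page table entry: a valid bit plus the physical page number. */
    typedef struct {
        uint32_t valid : 1;
        uint32_t ppn   : 20;
    } pte_t;

    static pte_t page_table[NUM_VPAGES];  /* one entry per virtual page */

    /* Returns 0 on a page fault (the OS would then fetch the page from disk). */
    int translate(uint32_t vaddr, uint32_t *paddr) {
        uint32_t vpn = vaddr >> PAGE_BITS;
        if (!page_table[vpn].valid)
            return 0;                                   /* page fault */
        *paddr = ((uint32_t)page_table[vpn].ppn << PAGE_BITS)
               | (vaddr & ((1u << PAGE_BITS) - 1u));
        return 1;
    }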
Pages: virtual memory blocks
• Page faults: the data is not in memory; retrieve it from disk
  – huge miss penalty (slow disk), thus pages should be fairly large
• Replacement strategy:
  – can handle the faults in software instead of hardware
• Write-back or write-through?
Address Translation

Terminology (each cache concept has a virtual-memory analogue):
• Cache block → page
• Cache miss → page fault
• Cache tag → virtual page number
• Byte offset → page offset
Making Address Translation Fast
• A cache for address translations: the translation lookaside buffer (TLB)
Typical values: 16-512 PTEs (page table entries), miss rate: 0.01%-1%, miss penalty: 10-100 cycles
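A TLB is just a tiny cache keyed by the virtual page number. A direct-mapped sketch (my assumptions: 64 entries, falling back to the translate() sketch above on a miss):

    #include <stdint.h>

    #define PAGE_BITS   12u
    #define TLB_ENTRIES 64u

    typedef struct {
        uint32_t valid : 1;
        uint32_t vpn   : 20;   /* tag: which virtual page this entry maps */
        uint32_t ppn   : 20;
    } tlb_entry_t;

    static tlb_entry_t tlb[TLB_ENTRIES];

    extern int translate(uint32_t vaddr, uint32_t *paddr);  /* page-table walk */

    int tlb_translate(uint32_t vaddr, uint32_t *paddr) {
        uint32_t vpn  = vaddr >> PAGE_BITS;
        uint32_t slot = vpn % TLB_ENTRIES;               /* direct-mapped index */

        if (tlb[slot].valid && tlb[slot].vpn == vpn) {   /* TLB hit */
            *paddr = ((uint32_t)tlb[slot].ppn << PAGE_BITS) | (vaddr & 0xFFFu);
            return 1;
        }
        /* TLB miss: walk the page table (the 10-100 cycle penalty), then
         * cache the translation for next time. */
        if (!translate(vaddr, paddr))
            return 0;                                    /* page fault */
        tlb[slot] = (tlb_entry_t){ .valid = 1, .vpn = vpn,
                                   .ppn = *paddr >> PAGE_BITS };
        return 1;
    }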
Virtual Memory Take-Aways
• CPU/programs deal with virtual addresses (virtual page number + page offset).
• Translated to physical addresses (physical page # + page offset) between CPU and cache.
• Memory is divided into blocks called pages, commonly 4KiB (therefore 12 bits for the page offset).
• Page tables, managed by the operating system for each process, store the virtual->physical page number mapping, as well as that process’s permissions (read/write).
• The TLB is a special CPU cache for page table lookups.
• Physical addresses can reside in DRAM (typical), or be stored on disk (making RAM “look” larger to the CPU), or can even refer to other devices (memory-mapped I/O).
Modern Systems
Program Design: 2D array layout
• Consider this C declaration:

    int A[4][3] = { {10, 11, 12}, {20, 21, 22}, {30, 31, 32}, {40, 41, 42} };

• How is this array stored in memory?
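C stores 2D arrays in row-major order: all of row 0, then all of row 1, and so on, with A[i][j] at flat index i*3 + j. A short runnable check (my addition):

    #include <stdio.h>

    int main(void) {
        int A[4][3] = { {10, 11, 12}, {20, 21, 22},
                        {30, 31, 32}, {40, 41, 42} };

        /* Row-major: &A[i][j] == base + (i*3 + j) * sizeof(int) */
        for (int i = 0; i < 4; i++)
            for (int j = 0; j < 3; j++)
                printf("A[%d][%d] = %2d at %p (flat index %2d)\n",
                       i, j, A[i][j], (void *)&A[i][j], i * 3 + j);
        return 0;
    }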
Program Design for Caches – Example 1
• Option #1

    for (j = 0; j < 20; j++)
        for (i = 0; i < 200; i++)
            x[i][j] = x[i][j] + 1;

• Option #2

    for (i = 0; i < 200; i++)
        for (j = 0; j < 20; j++)
            x[i][j] = x[i][j] + 1;
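Because C arrays are row-major, Option #2 walks memory sequentially while Option #1 jumps a whole row per access. A self-contained harness to see this (my addition; the slide's 200x20 array fits in L1, so this sketch assumes a larger array to make the effect visible):

    #include <stdio.h>
    #include <time.h>

    #define ROWS 2000
    #define COLS 2000
    #define REPS 10
    static int x[ROWS][COLS];       /* ~16 MiB: far larger than the caches */

    int main(void) {
        clock_t t0;

        t0 = clock();
        for (int r = 0; r < REPS; r++)
            for (int j = 0; j < COLS; j++)       /* Option #1: column order, */
                for (int i = 0; i < ROWS; i++)   /* stride of COLS*4 bytes   */
                    x[i][j] = x[i][j] + 1;
        printf("Option #1: %.2f s\n", (double)(clock() - t0) / CLOCKS_PER_SEC);

        t0 = clock();
        for (int r = 0; r < REPS; r++)
            for (int i = 0; i < ROWS; i++)       /* Option #2: row order,    */
                for (int j = 0; j < COLS; j++)   /* sequential access        */
                    x[i][j] = x[i][j] + 1;
        printf("Option #2: %.2f s\n", (double)(clock() - t0) / CLOCKS_PER_SEC);
        return 0;
    }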
Program Design for Caches – Example 2
• Why might this code be problematic?

    int A[1024][1024];
    int B[1024][1024];
    for (i = 0; i < 1024; i++)
        for (j = 0; j < 1024; j++)
            A[i][j] += B[i][j];

• How to fix it?
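One commonly taught answer (a sketch of one possible fix, not necessarily the one intended here): each array is exactly 4 MiB, a power of two, so if A and B sit back-to-back in memory, A[i][j] and B[i][j] lie a power-of-two distance apart, map to the same cache set, and can evict each other on every iteration (conflict misses). Padding the rows breaks the alignment:

    /* Pad each row by a few ints so A[i][j] and B[i][j] no longer land
     * a power-of-two distance apart (and thus in the same cache set). */
    #define PAD 8
    int A[1024][1024 + PAD];
    int B[1024][1024 + PAD];

    void add_arrays(void) {
        for (int i = 0; i < 1024; i++)
            for (int j = 0; j < 1024; j++)   /* touch only the real columns */
                A[i][j] += B[i][j];
    }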
Concluding Remarks
• Fast memories are small, large memories are slow
  – We really want fast, large memories
  – Caching gives this illusion
• Principle of locality
  – Programs use a small part of their memory space frequently
• Memory hierarchy
  – L1 cache ↔ L2 cache ↔ … ↔ DRAM memory ↔ disk
• Memory system design is critical for multiprocessors