Cache design overview



  1. Cache design overview
     ANY cache can be viewed as k-way associative. What are the pros and cons of each?
     • Fully associative: k = N/B
     • 4-way set associative: k = 4
     • Direct-mapped: k = 1
     (IC220 Caching 2: Memory Hierarchy; more from Chapter 5, specifically 5.7 and 5.8)

     Improving Cache Performance
     • Remember the key metrics: Miss Rate, Hit Time, Miss Penalty
     • Cache performance is a key tradeoff; the inherent conflict is HIT TIME vs. MISS RATE
     • What happens if we:
       – Increase the cache size (N)?
       – Increase the block size (keeping N the same)?
       – Increase associativity (keeping N the same)?

  2. More hierarchy: an L2 cache?
     Memory Hierarchy
     • Problem: CPUs get faster, DRAM gets bigger
       – Must keep hit time small (1 or 2 cycles)
       – But then the cache must be small too (fast SRAM is expensive)
       – So the miss rate gets higher...
     • Solution: add another level of cache:
       – try to optimize the ____________ on the 1st-level cache
       – try to optimize the ____________ on the 2nd-level cache

     Split Caches
     • Instructions and data have different properties
       – May benefit from different cache organizations (block size, associativity, ...)
     • Organization: CPU ↔ L1 ICache and L1 DCache ↔ L2 cache ↔ (L3, L4, ...?) ↔ main memory

     Questions
     • Will the miss rate of an L2 cache be higher or lower than that of the L1 cache?
     • Claim: "The register file is really the lowest-level cache." What are reasons in favor of and against this statement?

  3. What does an address refer to?
     The old way:
     • An address refers to a specific byte in main memory (DRAM).
     • This is called a physical address.
     • Organization: CPU → (physical address) → Cache → Memory
     • Problems with this:

     Virtual memory: Main idea
     • The CPU works with (fake) virtual addresses.
     • The operating system translates them to physical addresses.
     • Organization: CPU → (virtual address) → OS translation → (physical address) → Cache → Memory
     • Advantages:
     • New challenge:

     Pages and virtual address translation
     • Virtual AND physical addresses are divided into blocks called pages.
     • Typical page size is 4KiB (means 12 bits for offset)

     Page Tables
     • The translation from virtual to physical pages is stored in a page table.

  4. Pages: virtual memory blocks
     • Page faults: the data is not in memory; retrieve it from disk
       – huge miss penalty (slow disk), thus
       – pages should be fairly ________
       – can handle the faults in software instead of hardware
     • Replacement strategy:
     • Writeback or write-through?

     Address Translation
     Terminology (the cache analogy):
     • Cache block ↔
     • Cache miss ↔
     • Cache tag ↔
     • Byte offset ↔

     Making Address Translation Fast
     • A cache for address translations: the translation lookaside buffer (TLB)
     • Typical values: 16-512 PTEs (page table entries), miss rate 0.01%-1%, miss penalty 10-100 cycles

     Virtual Memory Take-Aways
     • CPU/programs deal with virtual addresses (virtual page number + page offset).
     • These are translated to physical addresses (physical page # + page offset) between the CPU and the cache.
     • Memory is divided into blocks called pages, commonly 4KiB (therefore 12 bits for the page offset).
     • Page tables, managed by the operating system for each process, store the virtual->physical page number mapping, as well as that process's permissions (read/write).
     • The TLB is a special CPU cache for page table lookups.
     • Physical addresses can reside in DRAM (typical), be stored on disk (making RAM "look" larger to the CPU), or even refer to other devices (memory-mapped I/O).

  5. Modern Systems; Program Design

     2D array layout
     • Consider this C declaration:
       int A[4][3] = { {10, 11, 12}, {20, 21, 22}, {30, 31, 32}, {40, 41, 42} };
     • How is this array stored in memory?

     Program Design for Caches – Example 1
     • Option #1:
       for (j = 0; j < 20; j++)
           for (i = 0; i < 200; i++)
               x[i][j] = x[i][j] + 1;
     • Option #2:
       for (i = 0; i < 200; i++)
           for (j = 0; j < 20; j++)
               x[i][j] = x[i][j] + 1;

     Program Design for Caches – Example 2
     • Why might this code be problematic?
       int A[1024][1024];
       int B[1024][1024];
       for (i = 0; i < 1024; i++)
           for (j = 0; j < 1024; j++)
               A[i][j] += B[i][j];
     • How to fix it?

  6. Concluding Remarks
     • Fast memories are small; large memories are slow
       – We really want fast, large memories
       – Caching gives this illusion
     • Principle of locality
       – Programs use a small part of their memory space frequently
     • Memory hierarchy
       – L1 cache ↔ L2 cache ↔ ... ↔ DRAM memory ↔ disk
     • Memory system design is critical for multiprocessors
