Cache Performance

Samira Khan March 28, 2017

Agenda

  • Review from last lecture
  • Cache access
  • Associativity
  • Replacement
  • Cache Performance

Cache Abstraction and Metrics

  • Cache hit rate = (# hits) / (# hits + # misses) = (# hits) / (# accesses)
  • Average memory access time (AMAT)

= ( hit-rate * hit-latency ) + ( miss-rate * miss-latency )

[Diagram: the address is sent to the tag store (is the address in the cache? + bookkeeping) and to the data store (stores memory blocks); the tag store produces hit/miss, the data store produces the data]

Direct-Mapped Cache: Placement and Access

  • Assume byte-addressable memory: 256 bytes, 8-byte blocks → 32 blocks
  • Assume cache: 64 bytes, 8 blocks
  • Direct-mapped: a block can go to only one location
  • Addresses with the same index contend for the same location, causing conflict misses

[Diagram: tag store (valid bit + tag per block) and data store with 8 blocks. The 8-bit address splits into tag (2 bits) | index (3 bits) | byte in block (3 bits); a comparator (=?) checks the stored tag and a MUX selects the byte, producing Hit? and Data. Memory blocks 00|000|xxx (A), 01|000|xxx (B), 10|000|xxx, and 11|000|xxx all map to index 000]


Direct-Mapped Cache: Placement and Access

  • 8-bit address, split as tag (2 bits) | index (3 bits) | byte in block (3 bits)
  • Access pattern: A, B, A, B, A, B, with A = 0b00 000 xxx and B = 0b01 000 xxx
  • A and B share index 000 but differ in tag, so they contend for the same cache block:
  • Access A → MISS: fetch block A, set the stored tag to 00
  • Access B → stored tag does not match: MISS, fetch block B, set the stored tag to 01
  • Access A → stored tag does not match: MISS, fetch block A again, and so on
  • Every access in the pattern misses: these are conflict misses, even though the rest of the cache sits unused

Set Associative Cache

  • Same access pattern A, B, A, B, A, B, but the cache is now organized as sets of 2 blocks (2-way)
  • Address split: tag (3 bits) | index (2 bits) | byte in block (3 bits), with A = 0b000 00 xxx and B = 0b010 00 xxx
  • A and B map to the same set but can occupy different ways: after the two initial misses, both stay resident and every subsequent access is a HIT

[Diagram: two tag comparators (=?) per set feed the hit logic; MUXes select the hit way and the byte within the block]

Associativity (and Tradeoffs)

  • Degree of associativity: how many blocks can map to the same index (or set)?
  • Higher associativity
  + Higher hit rate
  - Slower cache access time (hit latency and data access latency)
  - More expensive hardware (more comparators)
  • Diminishing returns from higher associativity

[Plot: hit rate vs. associativity, flattening out at higher associativity]


Issues in Set-Associative Caches

  • Think of each block in a set as having a “priority” indicating how important it is to keep the block in the cache
  • Key issue: How do you determine/adjust block priorities?
  • There are three key decisions in a set: insertion, promotion, eviction (replacement)
  • Insertion: What happens to priorities on a cache fill? Where to insert the incoming block; whether or not to insert the block at all
  • Promotion: What happens to priorities on a cache hit? Whether and how to change block priority
  • Eviction/replacement: What happens to priorities on a cache miss? Which block to evict and how to adjust priorities

Eviction/Replacement Policy

  • Which block in the set to replace on a cache miss?
  • Any invalid block first
  • If all are valid, consult the replacement policy
  • Random
  • FIFO
  • Least recently used (how to implement?)
  • Not most recently used
  • Least frequently used
  • Hybrid replacement policies


Least Recently Used Replacement Policy

  • 4-way example; the replacement logic tracks a recency position for each way: MRU, MRU-1, MRU-2, LRU
  • Access pattern A, C, B, D fills set 0 with blocks A, B, C, D; A, accessed longest ago, holds the LRU position
  • Access E → MISS: the LRU block (A) is evicted, E is inserted as MRU, and each remaining block moves one position toward LRU
  • Access B → HIT: B is promoted to MRU, and the blocks that were more recent than B each move one position toward LRU

[Animation frames: tag store with four comparators (=?) feeding the hit logic, with the per-way recency state updated one step at a time]

Implementing LRU

  • Idea: Evict the least recently accessed block
  • Problem: Need to keep track of access ordering of blocks
  • Question: 2-way set associative cache:
  • What do you need to implement LRU perfectly?
  • Question: 16-way set associative cache:
  • What do you need to implement LRU perfectly?
  • What is the logic needed to determine the LRU victim?



Approximations of LRU

  • Most modern processors do not implement “true LRU” (also called “perfect LRU”) in highly-associative caches
  • Why?
  • True LRU is complex
  • LRU is an approximation to predict locality anyway (i.e., not the best possible cache management policy)
  • Examples:
  • Not MRU (not most recently used)

Cache Replacement Policy: LRU or Random

  • LRU vs. Random: Which one is better?
  • Example: 4-way cache, cyclic references to A, B, C, D, E
  • 0% hit rate with LRU policy
  • Set thrashing: when the “program working set” in a set is larger than the set associativity
  • Random replacement policy is better when thrashing occurs
  • In practice:
  • Depends on the workload
  • Average hit rates of LRU and Random are similar
  • Best of both worlds: hybrid of LRU and Random
  • How to choose between the two? Set sampling
  • See Qureshi et al., “A Case for MLP-Aware Cache Replacement,” ISCA 2006.

What’s In A Tag Store Entry?

  • Valid bit
  • Tag
  • Replacement policy bits
  • Dirty bit?
  • Write back vs. write through caches


Handling Writes (I)

  • When do we write the modified data in a cache to the next level?
  • Write through: at the time the write happens
  • Write back: when the block is evicted
  • Write-back
  + Can consolidate multiple writes to the same block before eviction
  + Potentially saves bandwidth between cache levels; saves energy
  - Needs a bit in the tag store indicating the block is “dirty/modified”
  • Write-through
  + Simpler
  + All levels are up to date; consistent
  - More bandwidth intensive; no coalescing of writes


Handling Writes (II)

  • Do we allocate a cache block on a write miss?
  • Allocate on write miss
  + Can consolidate writes instead of writing each of them individually to the next level
  + Simpler, because write misses can be treated the same way as read misses
  - Requires (?) transfer of the whole cache block
  • No-allocate
  + Conserves cache space if locality of writes is low (potentially better cache hit rate)

Instruction vs. Data Caches

  • Separate or unified?
  • Unified:
  + Dynamic sharing of cache space: no overprovisioning that might happen with static partitioning (i.e., split I and D caches)
  - Instructions and data can thrash each other (i.e., no guaranteed space for either)
  - I and D are accessed in different places in the pipeline. Where do we place the unified cache for fast access?
  • First-level caches are almost always split, mainly for the last reason above
  • Second and higher levels are almost always unified

Multi-level Caching in a Pipelined Design

  • First-level caches (instruction and data)
  • Decisions very much affected by cycle time
  • Small, lower associativity
  • Tag store and data store accessed in parallel
  • Second-level, third-level caches
  • Decisions need to balance hit rate and access latency
  • Usually large and highly associative; latency less critical
  • Tag store and data store accessed serially
  • Serial vs. Parallel access of levels
  • Serial: Second level cache accessed only if first-level misses
  • Second level does not see the same accesses as the first
  • First level acts as a filter (filters some temporal and spatial locality)
  • Management policies are therefore different


Cache Performance


Cache Parameters vs. Miss/Hit Rate

  • Cache size
  • Block size
  • Associativity
  • Replacement policy
  • Insertion/Placement policy


Cache Size

  • Cache size: total data (not including tag) capacity
  • bigger can exploit temporal locality better
  • not ALWAYS better
  • Too large a cache adversely affects hit and miss latency
  • smaller is faster => bigger is slower
  • access time may degrade critical path
  • Too small a cache
  • doesn’t exploit temporal locality well
  • useful data replaced often
  • Working set: the whole set of data the executing application references within a time interval

[Plot: hit rate vs. cache size, with the knee near the “working set” size]

Block Size

  • Block size is the data that is associated with an address tag
  • Too small blocks
  • don’t exploit spatial locality well
  • have larger tag overhead
  • Too large blocks
  • too few total # of blocks → less temporal locality exploitation
  • waste of cache space and bandwidth/energy if spatial locality is not high
  • Will see more examples later

[Plot: hit rate vs. block size, peaking at an intermediate block size]

Associativity

  • How many blocks can map to the same index (or set)?
  • Larger associativity
  • lower miss rate, less variation among programs
  • diminishing returns, higher hit latency
  • Smaller associativity
  • lower cost
  • lower hit latency
  • Especially important for L1 caches
  • Power of 2 associativity required?

[Plot: hit rate vs. associativity]


Higher Associativity

  • 4-way

[Diagram: 4-way set-associative cache; four tag comparators (=?) operate in parallel, logic combines them into Hit?, and MUXes select the hit way and the byte in the block. The 8-bit address splits into tag, index, and byte-in-block fields]

Higher Associativity

  • 3-way

[Diagram: same organization with 3 ways and three comparators; associativity need not be a power of 2]

Classification of Cache Misses

  • Compulsory miss
  • first reference to an address (block) always results in a miss
  • subsequent references should hit unless the cache block is displaced for the reasons below
  • Capacity miss
  • cache is too small to hold everything needed
  • defined as the misses that would occur even in a fully-associative cache (with optimal replacement) of the same capacity
  • Conflict miss
  • defined as any miss that is neither a compulsory nor a capacity miss

How to Reduce Each Miss Type

  • Compulsory
  • Caching cannot help
  • Prefetching
  • Conflict
  • More associativity
  • Other ways to get more associativity without making the cache associative
  • Victim cache
  • Hashing
  • Software hints?
  • Capacity
  • Utilize cache space better: keep blocks that will be referenced
  • Software management: divide working set such that each “phase” fits in cache


Cache Performance with Code Examples

Matrix Sum

int sum1(int matrix[4][8]) {
    int sum = 0;
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 8; ++j) {
            sum += matrix[i][j];
        }
    }
    return sum;
}

access pattern: matrix[0][0], [0][1], [0][2], …, [1][0], …

Exploiting Spatial Locality

8B cache block, 4 blocks, LRU, 4B integers
Access pattern: matrix[0][0], [0][1], [0][2], …, [1][0], …

Cache blocks: [0][0]-[0][1] | [0][2]-[0][3] | [0][4]-[0][5] | [0][6]-[0][7]
(at [1][0], the block [0][0]-[0][1] is replaced by [1][0]-[1][1])

[0][0] → miss   [0][1] → hit
[0][2] → miss   [0][3] → hit
[0][4] → miss   [0][5] → hit
[0][6] → miss   [0][7] → hit
[1][0] → miss   [1][1] → hit

Exploiting Spatial Locality

  • block size and spatial locality
  • larger blocks exploit spatial locality better
  • … but larger blocks mean fewer blocks for the same cache size
  • so they are less good at exploiting temporal locality

Alternate Matrix Sum

int sum2(int matrix[4][8]) {
    int sum = 0;
    // swapped loop order
    for (int j = 0; j < 8; ++j) {
        for (int i = 0; i < 4; ++i) {
            sum += matrix[i][j];
        }
    }
    return sum;
}

access pattern: matrix[0][0], [1][0], [2][0], [3][0], [0][1], [1][1], [2][1], [3][1], …

Bad at Exploiting Spatial Locality

8B cache block, 4 blocks, 4B integers
Access pattern: matrix[0][0], [1][0], [2][0], [3][0], [0][1], [1][1], [2][1], [3][1], …

Cache blocks: [0][0]-[0][1] | [1][0]-[1][1] | [2][0]-[2][1] | [3][0]-[3][1]
(later replaced: [0][0]-[0][1] → [0][2]-[0][3], then [1][0]-[1][1] → [1][2]-[1][3])

[0][0] → miss   [1][0] → miss   [2][0] → miss   [3][0] → miss
[0][1] → hit    [1][1] → hit    [2][1] → hit    [3][1] → hit
[0][2] → miss   [1][2] → miss

A note on matrix storage

  • A → N × N matrix, represented as a 2D array or as a flat 1D array
  • the flat representation makes dynamic sizes easier:
  • float A_2d_array[N][N];
  • float *A_flat = malloc(N * N * sizeof(float));
  • A_flat[i * N + j] is the same element as A_2d_array[i][j]

Matrix Squaring

C_ij = Σ_{k=1..N} B_ik * B_kj

/* version 1: inner loop is k, middle is j */
for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
        for (int k = 0; k < N; ++k)
            B[i*N+j] += A[i * N + k] * A[k * N + j];


Matrix Squaring (ijk order, 4×4 example)

C_11 = Σ_{k=1..4} B_1k * B_k1
     = (B_11 * B_11) + (B_12 * B_21) + (B_13 * B_31) + (B_14 * B_41)

C_12 = Σ_{k=1..4} B_1k * B_k2
     = (B_11 * B_12) + (B_12 * B_22) + (B_13 * B_32) + (B_14 * B_42)

C_13 = Σ_{k=1..4} B_1k * B_k3
     = (B_11 * B_13) + (B_12 * B_23) + (B_13 * B_33) + (B_14 * B_43)

[Animation frames highlight each product term in turn on 4×4 input and output grids]

Aik has spatial locality (the inner loop walks across a row of the input)

Conclusion

  • Aik has spatial locality
  • Bij has temporal locality

Matrix Squaring

C_ij = Σ_{k=1..N} B_ik * B_kj

/* version 2: outer loop is k, middle is i, inner is j */
for (int k = 0; k < N; ++k)
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            B[i*N+j] += A[i * N + k] * A[k * N + j];

Access pattern, k = 0, i = 0:
B[0][0] += A[0][0] * A[0][0]
B[0][1] += A[0][0] * A[0][1]
B[0][2] += A[0][0] * A[0][2]
B[0][3] += A[0][0] * A[0][3]

Access pattern, k = 0, i = 1:
B[1][0] += A[1][0] * A[0][0]
B[1][1] += A[1][0] * A[0][1]
B[1][2] += A[1][0] * A[0][2]
B[1][3] += A[1][0] * A[0][3]

Matrix Squaring: kij order (4×4 example)

For k = 1, i = 1, the inner j loop computes the first term of each C_1j:
C_11 += B_11 * B_11
C_12 += B_11 * B_12
C_13 += B_11 * B_13
C_14 += B_11 * B_14

For k = 1, i = 2, it computes the first term of each C_2j:
C_21 += B_21 * B_11
C_22 += B_21 * B_12
C_23 += B_21 * B_13
C_24 += B_21 * B_14

Bij, Akj have spatial locality; Aik has temporal locality

Matrix Squaring

  • kij order
  • Bij , Akj have spatial locality
  • Aik has temporal locality
  • ijk order
  • Aik has spatial locality
  • Bij has temporal locality

Which order is better?

Order kij performs much better