cache performance

Cache Performance Associativity Replacement Samira Khan Cache - PDF document



  1. 3/28/17 Cache Performance. Samira Khan, March 28, 2017
     Agenda:
     • Review from last lecture
     • Cache access
     • Associativity
     • Replacement
     • Cache Performance
     Cache Abstraction and Metrics:
     • The address goes to the tag store (is the address in the cache? + bookkeeping), which answers Hit/miss?
     • The address also goes to the data store (stores memory blocks), which returns the data
     • Cache hit rate = (# hits) / (# hits + # misses) = (# hits) / (# accesses)
     • Average memory access time (AMAT) = ( hit-rate * hit-latency ) + ( miss-rate * miss-latency )
     Direct-Mapped Cache: Placement and Access:
     • Assume byte-addressable memory: 256 bytes, 8-byte blocks → 32 blocks
     • Assume cache: 64 bytes, 8 blocks
     • Direct-mapped: A block can go to only one location
     • Address fields: tag (2 bits) | index (3 bits) | byte in block (3 bits)
     • Memory blocks with index 000: 00|000|000 - 00|000|111 (A), 01|000|000 - 01|000|111 (B), 10|000|000 - 10|000|111, 11|000|000 - 11|000|111. The last block in memory is 11|111|000 - 11|111|111
     • The indexed tag store entry (V, tag) is compared (=?) with the address tag to produce Hit?; a MUX uses the byte in block to select the data
     • Addresses with same index contend for the same location
     • Cause conflict misses
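The address fields and the two metric formulas above can be sketched in a few lines of Python. This is a minimal illustration that assumes the slide's 8-bit address layout (2-bit tag, 3-bit index, 3-bit byte in block); the function names are hypothetical and do not come from the lecture, and `amat` follows the formula exactly as the slide writes it.

```python
# Sketch of the slide's 8-bit address split and its metric formulas.
# All names here are illustrative, not from the lecture.

TAG_BITS, INDEX_BITS, OFFSET_BITS = 2, 3, 3  # 8-bit address, 8 blocks of 8 bytes


def split_address(addr):
    """Return (tag, index, byte_in_block) for an 8-bit address."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


def hit_rate(hits, misses):
    """Cache hit rate = hits / (hits + misses)."""
    return hits / (hits + misses)


def amat(hit_rate_value, hit_latency, miss_latency):
    """AMAT as written on the slide: each latency is weighted by its rate."""
    miss_rate = 1.0 - hit_rate_value
    return hit_rate_value * hit_latency + miss_rate * miss_latency


if __name__ == "__main__":
    print(split_address(0b01000111))  # last byte of block B
    print(amat(hit_rate(3, 1), hit_latency=1, miss_latency=100))
```

Blocks A and B differ only in the tag field, so `split_address` returns the same index for both, which is why they contend for one location in the direct-mapped cache.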

  2. Direct-Mapped Cache: Placement and Access
     Access pattern: A, B, A, B, A, B
     A = 0b 00 000 xxx, B = 0b 01 000 xxx (8-bit address: tag 2 bits | index 3 bits | byte in block 3 bits)
     • Initially every tag store entry (indices 0 to 7) is invalid (V = 0).
     • Access A (00 000 XXX): the entry at index 0 is invalid. MISS: Fetch A and update tag. Index 0 now holds V = 1, tag 00, data XXXXXXXXX.
     • Access B (01 000 XXX): index 0 is valid but holds tag 00. Tags do not match: MISS. Fetch block B, update tag. Index 0 now holds V = 1, tag 01, data YYYYYYYYYY.

  3. Direct-Mapped Cache: Placement and Access (continued)
     Access pattern: A, B, A, B, A, B
     A = 0b 00 000 xxx, B = 0b 01 000 xxx (8-bit address: tag 2 bits | index 3 bits | byte in block 3 bits)
     • Access A (00 000 XXX) again: index 0 is valid but holds tag 01 (block B). Tags do not match: MISS. Fetch block A, update tag. Index 0 now holds V = 1, tag 00, data XXXXXXXXX.
     Set Associative Cache:
     Access pattern: A, B, A, B, A, B
     A = 0b 000 00 xxx, B = 0b 010 00 xxx (8-bit address: tag 3 bits | index 2 bits | byte in block 3 bits)
     • With a 2-bit index, the 8 blocks form 4 sets of 2 ways each. Set 0 holds both tag 000 (data XXXXXXXXX) and tag 010 (data YYYYYYYYYY).
     • Access A (000 00 XXX): both tags in set 0 are compared (=?) in parallel, and the matching way is selected through the MUX. HIT.
     Associativity (and Tradeoffs):
     • Degree of associativity: How many blocks can map to the same index (or set)?
     • Higher associativity
       ++ Higher hit rate
       -- Slower cache access time (hit latency and data access latency)
       -- More expensive hardware (more comparators)
     • Diminishing returns from higher associativity
     [Figure: hit rate versus associativity]
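The A, B, A, B, A, B walk-through above can be replayed with a small simulator. This is a sketch under stated assumptions: the 64-byte cache with 8-byte blocks from the slides, LRU replacement inside each set, and a hypothetical helper named `simulate` that is not part of the lecture.

```python
def simulate(addresses, num_sets, ways, block_bytes=8):
    """Count hits for a small cache with LRU replacement inside each set.

    ways=1 models a direct-mapped cache.
    """
    sets = [[] for _ in range(num_sets)]  # each list holds tags, LRU first
    hits = 0
    for addr in addresses:
        block = addr // block_bytes
        index, tag = block % num_sets, block // num_sets
        lines = sets[index]
        if tag in lines:
            hits += 1
            lines.remove(tag)   # re-appended below as most recent
        elif len(lines) == ways:
            lines.pop(0)        # set is full: evict the least recently used tag
        lines.append(tag)
    return hits


if __name__ == "__main__":
    A, B = 0b00000000, 0b01000000
    pattern = [A, B] * 3
    print("direct-mapped hits:", simulate(pattern, num_sets=8, ways=1))
    print("2-way hits:", simulate(pattern, num_sets=4, ways=2))
```

With 8 sets of 1 way, A and B keep evicting each other and every access misses. With 4 sets of 2 ways, only the first access to each block misses.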

  4. Issues in Set-Associative Caches
     • Think of each block in a set having a "priority"
       • Indicating how important it is to keep the block in the cache
     • Key issue: How do you determine/adjust block priorities?
     • There are three key decisions in a set:
       • Insertion, promotion, eviction (replacement)
     • Insertion: What happens to priorities on a cache fill?
       • Where to insert the incoming block, whether or not to insert the block
     • Promotion: What happens to priorities on a cache hit?
       • Whether and how to change block priority
     • Eviction/replacement: What happens to priorities on a cache miss?
       • Which block to evict and how to adjust priorities
     Eviction/Replacement Policy:
     • Which block in the set to replace on a cache miss?
       • Any invalid block first
       • If all are valid, consult the replacement policy
         • Random
         • FIFO
         • Least recently used (how to implement?)
         • Not most recently used
         • Least frequently used
         • Hybrid replacement policies
     Least Recently Used Replacement Policy (4-way):
     • Access pattern ACBD: set 0 holds A, B, C, D with A = LRU, B = MRU-1, C = MRU-2, D = MRU.
     • Access pattern ACBDE: E misses and all four ways are valid, so the LRU block A is the victim.

  5. Least Recently Used Replacement Policy (4-way, continued)
     Access pattern: ACBDE
     • E replaces A, so set 0 now holds E, B, C, D, and E becomes MRU.
     • The remaining blocks are demoted one step each: D goes from MRU to MRU-1, B from MRU-1 to MRU-2, and C from MRU-2 to LRU.

  6. Least Recently Used Replacement Policy (4-way, continued)
     Access pattern: ACBDEB
     • B hits in set 0 (E, B, C, D) and is promoted to MRU.
     • E goes from MRU to MRU-1 and D from MRU-1 to MRU-2; C stays LRU.
     • Final state: E = MRU-1, B = MRU, C = LRU, D = MRU-2.
     Implementing LRU:
     • Idea: Evict the least recently accessed block
     • Problem: Need to keep track of access ordering of blocks
     • Question: 2-way set associative cache:
       • What do you need to implement LRU perfectly?
     • Question: 16-way set associative cache:
       • What do you need to implement LRU perfectly?
       • What is the logic needed to determine the LRU victim?
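The ACBDEB walk-through and the "Implementing LRU" questions can be explored with a short sketch. It assumes a single 4-way set that keeps a full recency ordering; the class `LRUSet` and the helper `lru_state_bits` are hypothetical names for illustration. The bit count is only one way to look at the cost: it is the minimum needed to distinguish every possible ordering, and a real design may choose a different, larger encoding.

```python
import math


class LRUSet:
    """One cache set with true LRU; `order` lists tags from LRU to MRU."""

    def __init__(self, ways):
        self.ways = ways
        self.order = []

    def access(self, tag):
        """Return (hit, evicted_tag_or_None) and update the recency order."""
        evicted = None
        hit = tag in self.order
        if hit:
            self.order.remove(tag)        # promotion: re-inserted as MRU below
        elif len(self.order) == self.ways:
            evicted = self.order.pop(0)   # eviction: drop the LRU block
        self.order.append(tag)            # insertion at the MRU position
        return hit, evicted


def lru_state_bits(ways):
    """Minimum bits to encode any of the ways! recency orderings of one set."""
    return math.ceil(math.log2(math.factorial(ways)))


if __name__ == "__main__":
    cache_set = LRUSet(ways=4)
    for tag in "ACBDEB":
        print(tag, cache_set.access(tag), cache_set.order)
    print("bits per set:", {w: lru_state_bits(w) for w in (2, 4, 16)})
```

Replaying ACBDEB leaves the order (LRU to MRU) as C, D, E, B, matching the final state on the slide. The ordering count grows factorially with associativity, which hints at why tracking the exact order becomes costly at 16 ways.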

  7. Approximations of LRU
     • Most modern processors do not implement "true LRU" (also called "perfect LRU") in highly-associative caches
     • Why?
       • True LRU is complex
       • LRU is an approximation to predict locality anyway (i.e., not the best possible cache management policy)
     • Examples:
       • Not MRU (not most recently used)
     Cache Replacement Policy: LRU or Random
     • LRU vs. Random: Which one is better?
       • Example: 4-way cache, cyclic references to A, B, C, D, E
         • 0% hit rate with LRU policy
     • Set thrashing: When the "program working set" in a set is larger than set associativity
       • Random replacement policy is better when thrashing occurs
     • In practice:
       • Depends on workload
       • Average hit rates of LRU and Random are similar
     • Best of both worlds: Hybrid of LRU and Random
       • How to choose between the two? Set sampling
         • See Qureshi et al., "A Case for MLP-Aware Cache Replacement," ISCA 2006.
     What's In A Tag Store Entry?
     • Valid bit
     • Tag
     • Replacement policy bits
     • Dirty bit?
       • Write back vs. write through caches
     Handling Writes (I)
     • When do we write the modified data in a cache to the next level?
       • Write through: At the time the write happens
       • Write back: When the block is evicted
     • Write-back
       + Can consolidate multiple writes to the same block before eviction
         • Potentially saves bandwidth between cache levels + saves energy
       -- Need a bit in the tag store indicating the block is "dirty/modified"
     • Write-through
       + Simpler
       + All levels are up to date. Consistent
       -- More bandwidth intensive; no coalescing of writes
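Returning to the LRU versus Random comparison on this page, the cyclic A, B, C, D, E example can be checked with a short simulation. This is a sketch only: it models one 4-way set, the function name `hits_on_cycle` and its parameters are hypothetical, and the Random policy uses Python's seeded `random.Random`, so the exact Random hit count depends on the seed.

```python
import random


def hits_on_cycle(policy, ways=4, blocks="ABCDE", rounds=200, seed=0):
    """Count hits in one set for cyclic references under 'lru' or 'random'."""
    rng = random.Random(seed)
    order = []  # resident tags, least recently used first
    hits = 0
    for tag in blocks * rounds:
        if tag in order:
            hits += 1
            order.remove(tag)   # re-appended below as most recent
        elif len(order) == ways:
            # LRU evicts position 0; Random evicts any resident block.
            victim = 0 if policy == "lru" else rng.randrange(ways)
            order.pop(victim)
        order.append(tag)
    return hits


if __name__ == "__main__":
    total = len("ABCDE") * 200
    for policy in ("lru", "random"):
        print(policy, hits_on_cycle(policy), "hits out of", total)
```

Under LRU the block about to be reused is always the one just evicted, so the set thrashes and never hits. Random eviction sometimes spares that block, which is the behavior the slide points to when the working set in a set exceeds the associativity.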
