caching 3

  1. Caching 3

  2. last time
     tag / index / offset lookup in associative caches
     replacement policies: least recently used — best miss rate assuming locality; random — simplest to implement
     write policies: write-through versus write-back; write-allocate versus write-no-allocate
     hit time, miss penalty, miss rate; average memory access time (AMAT)
     cache design tradeoffs

  3. making any cache look bad
     1. access enough blocks to fill the cache
     2. access an additional block, replacing something
     3. access the last block replaced
     4. access the last block replaced
     5. access the last block replaced
     …
     but — typical real programs have locality
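
A minimal sketch of this recipe in C (mine, not from the slides; the 8-way, 64-set, 64-byte geometry is made up). Cycling through one more block than a set can hold means that, under LRU, every access after the first pass evicts exactly the block that will be needed next:

    #include <stddef.h>

    #define BLOCK_BYTES 64
    #define NUM_SETS    64
    #define WAYS        8
    /* blocks that are BLOCK_BYTES * NUM_SETS apart share a set index */
    #define SET_STRIDE  (BLOCK_BYTES * NUM_SETS)

    /* array must be at least (WAYS + 1) * SET_STRIDE bytes */
    long thrash_one_set(volatile char *array, int passes) {
        long sum = 0;
        for (int p = 0; p < passes; p++)
            for (int b = 0; b < WAYS + 1; b++)   /* one more block than ways */
                sum += array[(size_t)b * SET_STRIDE];
        return sum;
    }

A random replacement policy would occasionally keep the right block here, so this pattern is specifically an LRU worst case.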

  4. cache optimizations
     average time = hit time + miss rate × miss penalty

                              hit time   miss rate   miss penalty
     increase cache size      worse      better      —
     increase associativity   worse      better      —
     increase block size      —          depends     worse
     add secondary cache      —          —           better
     write-allocate           —          better      worse?
     writeback                —          —           better?
     LRU replacement          worse?     better      —

  5. cache optimizations by miss type
     (assuming other listed parameters remain constant)

                              compulsory     capacity       conflict
     increase block size      fewer misses   more misses    —
     increase associativity   —              —              fewer misses
     increase cache size      —              fewer misses   fewer misses

  6. prefetching
     seems like we can't really improve cold misses… have to have a miss to bring the value into the cache?
     solution: don't require a miss: 'prefetch' the value before it's accessed
     remaining problem: how do we know what to fetch?

  7. common access patterns
     suppose recently accessed 16B cache blocks are at: 0x48010, 0x48020, 0x48030, 0x48040
     guess what's accessed next
     (a common pattern with instruction fetches and array accesses)

  8. prefetching idea
     look for sequential accesses
     bring in a guess at the next-to-be-accessed value
     if right: no cache miss (even if never accessed before)
     if wrong: possibly evicted something else — could cause more misses
     fortunately, sequential access guesses are almost always right
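
To make the guess concrete, here is a software analogue of sequential prefetching (my sketch; the slides are describing the hardware doing this automatically). __builtin_prefetch is a GCC/Clang builtin, and PREFETCH_AHEAD is a made-up distance:

    #include <stddef.h>

    #define PREFETCH_AHEAD 16   /* assumed distance: one 64B block of ints */

    long sum_with_prefetch(const int *array, size_t n) {
        long sum = 0;
        for (size_t i = 0; i < n; i++) {
            /* ask for a block we expect to need soon; a wrong guess only
             * wastes a fetch (and may evict something), never changes the result */
            if (i + PREFETCH_AHEAD < n)
                __builtin_prefetch(&array[i + PREFETCH_AHEAD]);
            sum += array[i];
        }
        return sum;
    }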

  9. split caches; multiple cores
     [diagram: each core has its own L1 instruction cache and L1 data cache and its own unified L2 cache; one L3 cache holds data shared between the cores]

  10. hierarchy and instruction/data caches
      typically separate data and instruction caches for L1
      (almost) never going to read instructions as data or vice versa
      avoids instructions evicting data and vice versa
      can optimize the instruction cache for a different access pattern
      easier to build fast caches that handle fewer accesses at a time

  11. inclusive versus exclusive
      inclusive policy: L2 inclusive of L1
         everything in the L1 cache is duplicated in L2
         adding to L1 also adds to L2
         no extra work on eviction; easier to explain when an Lk is shared by multiple L(k−1) caches
         but duplicated data
      exclusive policy: L2 exclusive of L1
         L2 contains different data than L1
         adding to L1 must remove from L2
         probably evicting from L1 adds to L2 (sometimes called a victim cache — it contains cache eviction victims)
         avoids duplicated data, but makes less sense with multicore
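
A toy sketch of the bookkeeping each policy implies on an L1 miss (my illustration, not from the slides: the tiny ToyCache, its round-robin replacement, and the helper names are all made up, and L2 evictions that would break inclusion are ignored):

    #include <stdbool.h>
    #include <stdint.h>

    #define TOY_BLOCKS 8   /* hypothetical tiny fully-associative caches */

    typedef struct {
        uint64_t blocks[TOY_BLOCKS];  /* 0 means an empty slot */
        int next_victim;              /* round-robin stands in for real replacement */
    } ToyCache;

    static bool contains(ToyCache *c, uint64_t b) {
        for (int i = 0; i < TOY_BLOCKS; i++)
            if (c->blocks[i] == b) return true;
        return false;
    }

    static void remove_block(ToyCache *c, uint64_t b) {
        for (int i = 0; i < TOY_BLOCKS; i++)
            if (c->blocks[i] == b) c->blocks[i] = 0;
    }

    /* insert b, returning whatever block it displaced (0 if none) */
    static uint64_t insert(ToyCache *c, uint64_t b) {
        int i = c->next_victim;
        c->next_victim = (i + 1) % TOY_BLOCKS;
        uint64_t victim = c->blocks[i];
        c->blocks[i] = b;
        return victim;
    }

    /* inclusive: everything brought into L1 also goes into L2,
     * so an L1 victim needs no extra work (its data is already in L2) */
    void l1_miss_inclusive(ToyCache *l1, ToyCache *l2, uint64_t b) {
        insert(l2, b);
        insert(l1, b);
    }

    /* exclusive: L2 holds only blocks that are not in L1 (a victim cache) */
    void l1_miss_exclusive(ToyCache *l1, ToyCache *l2, uint64_t b) {
        if (contains(l2, b))
            remove_block(l2, b);       /* adding to L1 must remove from L2 */
        uint64_t victim = insert(l1, b);
        if (victim)
            insert(l2, victim);        /* evicting from L1 adds to L2 */
    }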

  12. average memory access time
      the effective speed of memory:
      AMAT = hit time + miss penalty × miss rate

  13. AMAT exercise (1)
      90% cache hit rate; hit time is 2 cycles; 30 cycle miss penalty
      what is the average memory access time?  2 + 10% × 30 = 5 cycles
      suppose we could increase the hit rate by increasing the cache's size, but doing so would increase the hit time to 3 cycles
      how much do we have to increase the hit rate for this to be worthwhile?  to at least 1 − (10% − 1/30) ≈ 94%
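
A quick numeric check of that exercise (my sketch; amat() just encodes the formula from the AMAT slide, and the 2/30 miss rate is the break-even point where the larger, slower cache ties the original 5 cycles):

    #include <stdio.h>

    static double amat(double hit_time, double miss_rate, double miss_penalty) {
        return hit_time + miss_rate * miss_penalty;
    }

    int main(void) {
        printf("original:   %.2f cycles\n", amat(2, 0.10, 30));        /* 5.00 */
        /* with a 3-cycle hit time: 3 + m * 30 = 5  =>  m = 2/30,
         * i.e. a hit rate of 1 - 2/30, about 93.3% */
        printf("break-even: %.2f cycles\n", amat(3, 2.0 / 30.0, 30));  /* 5.00 */
        return 0;
    }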

  14. exercise: AMAT and multi-level caches
      suppose we have an L1 cache with a 3 cycle hit time and a 90% hit rate,
      an L2 cache with a 10 cycle hit time and an 80% hit rate (for accesses that make it this far; assume all accesses come via this L1),
      and main memory with a 100 cycle access time
      what is the average memory access time for the L1 cache?  3 + 0.1 × (10 + 0.2 × 100) = 6 cycles
      (the L1 miss penalty is 10 + 0.2 × 100 = 30 cycles)
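
The same helper, nested, reproduces those numbers (again my sketch): the L1 miss penalty is itself the AMAT of the levels below it.

    #include <stdio.h>

    static double amat(double hit_time, double miss_rate, double miss_penalty) {
        return hit_time + miss_rate * miss_penalty;
    }

    int main(void) {
        double l1_miss_penalty = amat(10, 0.20, 100);        /* 30 cycles */
        double overall = amat(3, 0.10, l1_miss_penalty);     /*  6 cycles */
        printf("L1 miss penalty: %.0f cycles\n", l1_miss_penalty);
        printf("overall AMAT:    %.0f cycles\n", overall);
        return 0;
    }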

  15. exercise (1)
      initial cache: 64-byte blocks, 64 sets, 8 ways/set
      If we leave the other parameters listed above unchanged, which of these will probably reduce the number of capacity misses in a typical program? (Multiple may be correct.)
      A. quadrupling the block size (256-byte blocks, 64 sets, 8 ways/set)
      B. quadrupling the number of sets
      C. quadrupling the number of ways/set

  16. exercise (2)
      initial cache: 64-byte blocks, 8 ways/set, 64KB cache
      If we leave the other parameters listed above unchanged, which of these will probably reduce the number of capacity misses in a typical program? (Multiple may be correct.)
      A. quadrupling the block size (256-byte blocks, 8 ways/set, 64KB cache)
      B. quadrupling the number of ways/set
      C. quadrupling the cache size

  17. exercise (3)
      initial cache: 64-byte blocks, 8 ways/set, 64KB cache
      If we leave the other parameters listed above unchanged, which of these will probably reduce the number of conflict misses in a typical program? (Multiple may be correct.)
      A. quadrupling the block size (256-byte blocks, 8 ways/set, 64KB cache)
      B. quadrupling the number of ways/set
      C. quadrupling the cache size
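
These exercises turn on which derived quantity actually changes. A small helper (my sketch; total size = block size × sets × ways, numbers taken from exercise (1)'s cache) makes that explicit without giving the answers away:

    #include <stdio.h>

    static void describe(const char *label, long block, long sets, long ways) {
        printf("%-24s %4ld B blocks, %4ld sets, %2ld ways -> %4ld KB total\n",
               label, block, sets, ways, block * sets * ways / 1024);
    }

    int main(void) {
        describe("initial (exercise 1)", 64, 64, 8);   /*  32 KB */
        describe("A. 4x block size", 256, 64, 8);      /* 128 KB */
        describe("B. 4x sets", 64, 256, 8);            /* 128 KB */
        describe("C. 4x ways/set", 64, 64, 32);        /* 128 KB */
        return 0;
    }

For exercises (2) and (3) the total size is the parameter held fixed, so the number of sets is the derived one: sets = cache size / (block size × ways).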

  18. cache accesses and C code (1)

      int scaleFactor;
      int scaleByFactor(int value) {
          return value * scaleFactor;
      }

      scaleByFactor:
          movl scaleFactor, %eax
          imull %edi, %eax
          ret

      exercise: what data cache accesses does this function do?
      4-byte read of scaleFactor
      8-byte read of the return address

  19. possible scaleFactor use

      for (int i = 0; i < size; ++i) {
          array[i] = scaleByFactor(array[i]);
      }

  20. misses and code (2)

      scaleByFactor:
          movl scaleFactor, %eax
          imull %edi, %eax
          ret

      suppose that each time this is called in the loop:
      the return address is located at address 0x7ffffffe43b8
      scaleFactor is located at address 0x6bc3a0
      with a direct-mapped 32KB cache with 64B blocks, what are their tag / index / offset?
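
The slide leaves the breakdown as an exercise; the sketch below is my own working, not the slide's answer. With 64-byte blocks the offset is 6 bits, a direct-mapped 32KB cache has 32KB / 64B = 512 sets so the index is 9 bits, and the tag is everything above that:

    #include <stdio.h>
    #include <stdint.h>

    static void split(const char *label, uint64_t addr) {
        uint64_t offset = addr & 0x3F;          /* low 6 bits             */
        uint64_t index  = (addr >> 6) & 0x1FF;  /* next 9 bits (512 sets) */
        uint64_t tag    = addr >> 15;           /* everything above       */
        printf("%-12s tag=0x%llx index=0x%llx offset=0x%llx\n", label,
               (unsigned long long)tag, (unsigned long long)index,
               (unsigned long long)offset);
    }

    int main(void) {
        split("return addr", 0x7ffffffe43b8);   /* index 0x10e, offset 0x38 */
        split("scaleFactor", 0x6bc3a0);         /* index 0x10e, offset 0x20 */
        return 0;
    }

If this arithmetic is right, both addresses fall in set 0x10e, so in this direct-mapped cache the return address and scaleFactor keep evicting each other as the loop on the previous slide calls scaleByFactor.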
