
Caches and Memory

Anne Bracy, CS 3410, Computer Science, Cornell University. Slides by Anne Bracy with 3410 slides by Professors Weatherspoon, Bala, McKee, and Sirer. See P&H Chapter: 5.1-5.4, 5.8, 5.10, 5.13, 5.15, 5.17.


  1-4. Simulation #2: 4-byte, direct-mapped cache
     • Memory (4-bit binary addresses): 0000 A, 0001 B, 0010 C, 0011 D, 0100 E, 0101 F, 0110 G, 0111 H, 1000 J, 1001 K, 1010 L, 1011 M, 1100 N, 1101 O, 1110 P, 1111 Q
     • Address = tag (2 bits) | index (2 bits); each of the 4 lines holds V, tag, and 1 byte of data
     • Lookup: index into $, check tag, check valid bit
     • load 1100 → Miss (cold): index 00 gets V = 1, tag 11, N
     • load 1101 → Miss (cold): index 01 gets V = 1, tag 11, O
     • load 0100 → Miss (cold): index 00 is overwritten with V = 1, tag 01, E
     • load 1100 → Miss: index 00 is overwritten again with V = 1, tag 11, N
     • Result: 0 hits, 4 misses. Disappointed! ☹
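The lookup steps above (index into $, check tag, check valid bit) can be sketched as a tiny direct-mapped simulator. This is an illustrative model, not code from the course; it assumes power-of-two sizes, and the `block_size` parameter is an addition so the same function also covers the 2-byte-block cache of Simulation #3.

```python
def simulate_direct_mapped(addrs, num_lines, block_size):
    """Return 'hit' or 'miss' for each byte address in a direct-mapped cache."""
    offset_bits = block_size.bit_length() - 1   # log2(block_size)
    index_bits = num_lines.bit_length() - 1     # log2(num_lines)
    valid = [False] * num_lines
    tags = [None] * num_lines
    results = []
    for addr in addrs:
        index = (addr >> offset_bits) & (num_lines - 1)   # middle bits pick the one possible line
        tag = addr >> (offset_bits + index_bits)          # remaining high bits
        if valid[index] and tags[index] == tag:
            results.append("hit")
        else:
            results.append("miss")   # fill or overwrite the only line this block may use
            valid[index] = True
            tags[index] = tag
    return results

TRACE = [0b1100, 0b1101, 0b0100, 0b1100]
print(simulate_direct_mapped(TRACE, num_lines=4, block_size=1))  # ['miss', 'miss', 'miss', 'miss']
print(simulate_direct_mapped(TRACE, num_lines=4, block_size=2))  # ['miss', 'hit', 'miss', 'miss']
```

With 1-byte blocks every load misses, as in Simulation #2; with 2-byte blocks the second load hits, matching Simulation #3.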

  5. Reducing Cold Misses by Increasing Block Size: Leveraging Spatial Locality

  6. Increasing Block Size
     • Cache: 4 lines (index 00, 01, 10, 11), each with V, tag, and a 2-byte block; the blocks A | B, C | D, E | F, G | H map to indexes 00, 01, 10, 11 respectively (all lines currently invalid)
     • Block Size: 2 bytes
     • Block Offset: the least significant bits indicate where a byte lives within its block
     • Which bits are the index? Which are the tag?
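One way to answer the slide's question, sketched for the 8-byte cache shown (4 lines of 2-byte blocks): the lowest bit is the offset, the next two bits are the index, and the top bit is the tag. The helper name `split_address` is hypothetical.

```python
def split_address(addr, index_bits, offset_bits):
    """Split an address into (tag, index, offset) fields."""
    offset = addr & ((1 << offset_bits) - 1)                 # lowest bits: byte within the block
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)  # next bits: which line
    tag = addr >> (offset_bits + index_bits)                 # everything above the index
    return tag, index, offset

# 8-byte cache, 2-byte blocks, 4 lines: 1 offset bit, 2 index bits, 1 tag bit
for addr in (0b1100, 0b1101, 0b0100):
    tag, index, offset = split_address(addr, index_bits=2, offset_bits=1)
    print(f"{addr:04b}: tag={tag:b} index={index:02b} offset={offset:b}")
```

All three addresses land on index 10, which is why they compete for the same line in the next simulation.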

  7-13. Simulation #3: 8-byte, direct-mapped cache
     • Memory: 0000 A through 1111 Q, as in Simulation #2
     • 4 lines of 2-byte blocks; address = tag (1 bit) | index (2 bits) | offset (1 bit)
     • Lookup: index into $, check tag, check valid bit
     • load 1100 → Miss (cold): index 10 gets V = 1, tag 1, N | O
     • load 1101 → Hit! (O is in the same block)
     • load 0100 → Miss (cold): index 10 is overwritten with V = 1, tag 0, E | F
     • load 1100 → Miss (conflict): N | O must be fetched again
     • Result: 1 hit, 3 misses. 3 bytes don't fit in an 8-byte cache?

  14. Removing Conflict Misses with Fully-Associative Caches

  15. 8-byte, fully-associative cache
     • Memory: 0000 A through 1111 Q, as in Simulation #2
     • 4 lines, each with V, tag, and a 2-byte block; address = tag | offset
     • What should the offset be? What should the index be? What should the tag be?

  16-19. Simulation #4: 8-byte, fully-associative cache (with an LRU pointer)
     • Address = tag (3 bits) | offset (1 bit); 4 lines of V, tag, and a 2-byte block
     • Lookup: there is no index, so check the tags and valid bits of every line
     • load 1100 → Miss: line 0 gets V = 1, tag 110, N | O
     • load 1101 → Hit!
     • load 0100 → Miss: line 1 gets V = 1, tag 010, E | F
     • load 1100 → Hit!
     • Result: 2 hits, 2 misses
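A minimal sketch of the fully-associative lookup with LRU replacement, assuming the same 4-line, 2-byte-block cache. Hardware compares all tags at once; here that parallel compare is modeled by a dictionary membership test, and `OrderedDict` keeps the recency order that the LRU pointer tracks.

```python
from collections import OrderedDict

def simulate_fully_associative(addrs, num_lines, block_size):
    """Fully-associative cache with LRU replacement; returns 'hit'/'miss' per access."""
    offset_bits = block_size.bit_length() - 1
    lines = OrderedDict()   # tag -> None, ordered from least to most recently used
    results = []
    for addr in addrs:
        tag = addr >> offset_bits        # no index: any line may hold any block
        if tag in lines:
            lines.move_to_end(tag)       # mark as most recently used
            results.append("hit")
        else:
            if len(lines) == num_lines:
                lines.popitem(last=False)   # evict the least recently used line
            lines[tag] = None
            results.append("miss")
    return results

print(simulate_fully_associative([0b1100, 0b1101, 0b0100, 0b1100], num_lines=4, block_size=2))
```

The final load now hits because N | O and E | F can live in different lines.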

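  20. Pros and Cons of Full Associativity
     + No more conflicts!
     + Excellent utilization!
     But either:
     – Parallel reads: lots of reading!
     – Serial reads: lots of waiting
     $t_{\text{avg}} = t_{\text{hit}} + \%_{\text{miss}} \times t_{\text{miss}}$
     • $4 + 5\% \times 100 = 9$ cycles
     • $6 + 3\% \times 100 = 9$ cycles
     Cutting the miss rate from 5% to 3% is cancelled out when the hit time rises from 4 to 6 cycles.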
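The two averages above can be checked with a one-line helper; `amat` is a hypothetical name for the slide's $t_{\text{avg}}$ formula.

```python
def amat(t_hit, miss_rate, t_miss):
    """Average memory access time: t_avg = t_hit + %miss * t_miss."""
    return t_hit + miss_rate * t_miss

print(f"{amat(4, 0.05, 100):.1f} cycles")  # faster hit, higher miss rate
print(f"{amat(6, 0.03, 100):.1f} cycles")  # slower hit, lower miss rate
```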

  21. Pros & Cons
     Property               Direct Mapped   Fully Associative
     Tag size               Smaller         Larger
     SRAM overhead          Less            More
     Controller logic       Less            More
     Speed                  Faster          Slower
     Price                  Less            More
     Scalability            Very            Not very
     # of conflict misses   Lots            Zero
     Hit rate               Low             High
     Pathological cases     Common          ?

  22. Reducing Conflict Misses with Set-Associative Caches: Not too conflict-y. Not too slow. … Just Right!

  23. 8-byte, 2-way set-associative cache
     • Memory: 0000 A through 1111 Q, as in Simulation #2
     • Address = tag | index | offset
     • 2 sets (index 0 and 1), each with two ways of V, tag, and a 2-byte block; for example, E | F and N | O map to set 0, while C | D and P | Q map to set 1 (all lines currently invalid)
     • What should the offset be? What should the index be? What should the tag be?
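The offset/index/tag questions from slides 6, 15, and 23 share one rule: the offset covers the block, the index selects a set, and the tag is whatever is left. A sketch assuming 4-bit addresses and power-of-two sizes (`field_widths` is a hypothetical helper):

```python
def field_widths(cache_bytes, block_bytes, ways, addr_bits=4):
    """Return (tag_bits, index_bits, offset_bits) for a set-associative cache.

    ways=1 is direct-mapped; ways equal to the number of lines is fully associative.
    """
    num_lines = cache_bytes // block_bytes
    num_sets = num_lines // ways
    offset_bits = block_bytes.bit_length() - 1   # log2(block size)
    index_bits = num_sets.bit_length() - 1       # log2(number of sets); 0 when there is one set
    tag_bits = addr_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits

for name, ways in [("direct-mapped", 1), ("2-way", 2), ("fully associative", 4)]:
    print(name, field_widths(8, 2, ways))
```

Each doubling of associativity moves one bit from the index to the tag.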

  24-27. 8-byte, 2-way set-associative cache (with an LRU pointer)
     • Address = tag (2 bits) | index (1 bit) | offset (1 bit)
     • Lookup: index into $, check tags, check valid bits
     • load 1100 → Miss: set 0, way 0 gets V = 1, tag 11, N | O
     • load 1101 → Hit!
     • load 0100 → Miss: set 0, way 1 gets V = 1, tag 01, E | F
     • load 1100 → Hit!
     • Result: 2 hits, 2 misses
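A sketch of an N-way set-associative lookup with per-set LRU, assuming power-of-two sizes. Only the set chosen by the index is searched. Setting `ways=1` reproduces the direct-mapped result, and a single set reproduces the fully-associative one.

```python
def simulate_set_associative(addrs, num_sets, ways, block_size):
    """N-way set-associative cache with per-set LRU; returns 'hit'/'miss' per access."""
    offset_bits = block_size.bit_length() - 1
    index_bits = num_sets.bit_length() - 1
    sets = [[] for _ in range(num_sets)]   # each set: tags ordered LRU first, MRU last
    results = []
    for addr in addrs:
        index = (addr >> offset_bits) & (num_sets - 1)
        tag = addr >> (offset_bits + index_bits)
        s = sets[index]                    # only this set is searched
        if tag in s:
            s.remove(tag)
            s.append(tag)                  # move to the most-recently-used position
            results.append("hit")
        else:
            if len(s) == ways:
                s.pop(0)                   # evict this set's LRU way
            s.append(tag)
            results.append("miss")
    return results

TRACE = [0b1100, 0b1101, 0b0100, 0b1100]
print(simulate_set_associative(TRACE, num_sets=2, ways=2, block_size=2))  # 2-way
print(simulate_set_associative(TRACE, num_sets=4, ways=1, block_size=2))  # direct-mapped
print(simulate_set_associative(TRACE, num_sets=1, ways=4, block_size=2))  # fully associative
```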

  28. Eviction Policies
     Which cache line should be evicted from the cache to make room for a new line?
     • Direct-mapped: no choice; must evict the line selected by the index
     • Associative caches:
       – Random: select one of the lines at random
       – Round-Robin: similar to random
       – FIFO: replace the oldest line
       – LRU: replace the line that has not been used for the longest time
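FIFO and LRU differ only in whether a hit refreshes a line's position in the eviction order. A minimal sketch on a hypothetical block trace, for a 2-line fully-associative cache (Random and Round-Robin are not modeled here):

```python
from collections import deque

def count_hits(blocks, capacity, policy):
    """Count hits for a fully-associative cache using 'LRU' or 'FIFO' eviction."""
    order = deque()   # front = next victim
    hits = 0
    for b in blocks:
        if b in order:
            hits += 1
            if policy == "LRU":
                order.remove(b)
                order.append(b)   # LRU: every use makes the line most recent
            # FIFO: a hit leaves the eviction order unchanged
        else:
            if len(order) == capacity:
                order.popleft()   # evict the line at the front
            order.append(b)
    return hits

trace = ["A", "B", "A", "C", "A"]
print("LRU hits:", count_hits(trace, 2, "LRU"))    # LRU hits: 2
print("FIFO hits:", count_hits(trace, 2, "FIFO"))  # FIFO hits: 1
```

When C arrives, FIFO evicts A because it was loaded first, even though A was just used. LRU keeps it.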

  29. Misses: the Three C's
     • Cold (compulsory) miss: never seen this address before
     • Conflict miss: cache associativity is too low
     • Capacity miss: cache is too small
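The three C's can be assigned mechanically. The sketch below uses the common textbook approach, which is an assumption here because the slide gives only the definitions. A miss is cold if the block was never referenced. It is capacity if a fully-associative LRU cache of the same size would also miss. Otherwise it is conflict. Run on Simulation #3's trace, it reproduces the slide's labels.

```python
from collections import OrderedDict

def classify_misses(addrs, num_lines, block_size, ways):
    """Label each access 'hit', 'cold', 'conflict', or 'capacity' (per-set LRU cache)."""
    offset_bits = block_size.bit_length() - 1
    num_sets = num_lines // ways
    sets = [OrderedDict() for _ in range(num_sets)]   # the real cache, LRU first
    shadow = OrderedDict()                            # same-size fully-associative LRU cache
    seen = set()
    labels = []
    for addr in addrs:
        block = addr >> offset_bits
        s = sets[block % num_sets]                    # index = low bits of the block number
        shadow_hit = block in shadow
        if shadow_hit:
            shadow.move_to_end(block)
        else:
            if len(shadow) == num_lines:
                shadow.popitem(last=False)
            shadow[block] = None
        if block in s:
            s.move_to_end(block)
            labels.append("hit")
        else:
            if block not in seen:
                labels.append("cold")
            elif not shadow_hit:
                labels.append("capacity")             # even full associativity would miss
            else:
                labels.append("conflict")             # only the mapping restriction missed
            if len(s) == ways:
                s.popitem(last=False)
            s[block] = None
        seen.add(block)
    return labels

trace = [0b1100, 0b1101, 0b0100, 0b1100]
print(classify_misses(trace, num_lines=4, block_size=2, ways=1))  # Simulation #3
print(classify_misses(trace, num_lines=4, block_size=2, ways=4))  # Simulation #4
```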

  30. Miss Rate vs. Block Size (graph)

  31. Block Size Tradeoffs
     • For a given total cache size, larger block sizes mean…
       – fewer lines
       – so fewer tags and less overhead
       – and fewer cold misses (within-block "prefetching")
     • But also…
       – fewer blocks available (for scattered accesses!)
       – so more conflicts
       – which can decrease performance if the working set can't fit in $
       – and a larger miss penalty (time to fetch the block)

  32. Miss Rate vs. Associativity (graph)

  33. ABCs of Caches
     $t_{\text{avg}} = t_{\text{hit}} + \%_{\text{miss}} \times t_{\text{miss}}$
     • Associativity: ⬇ conflict misses ☺, ⬆ hit time ☹
     • Block Size: ⬇ cold misses ☺, ⬆ conflict misses ☹
     • Capacity: ⬇ capacity misses ☺, ⬆ hit time ☹

  34. Which caches get what properties?
     $t_{\text{avg}} = t_{\text{hit}} + \%_{\text{miss}} \times t_{\text{miss}}$
     • L1 caches: fast; design with speed in mind
     • L2 cache
     • L3 cache: big; design with miss rate in mind
     • Going from L1 toward L3: more associative, bigger block sizes, larger capacity

  35. Roadmap
     • Things we have covered:
       – The Need for Speed
       – Locality to the Rescue!
       – Calculating average memory access time
       – $ Misses: Cold, Conflict, Capacity
       – $ Characteristics: Associativity, Block Size, Capacity
     • Things we will now cover:
       – Cache Figures
       – Cache Performance Examples
       – Writes

  36. More Slides Coming…

  37. 2-Way Set Associative Cache (Reading)
     [Figure: address fields Tag | Index | Offset; two tag comparators (=); line select (64 bytes); word select (32 bits); outputs hit? and data]

  38. 3-Way Set Associative Cache (Reading)
     [Figure: address fields Tag | Index | Offset; three tag comparators (=); line select (64 bytes); word select (32 bits); outputs hit? and data]

  39. How Big is the Cache?
     Address: Tag | Index | Offset, with an n-bit index, an m-bit offset, N-way set associative
     Question: How big is the cache?
     • Data only? (what we usually mean when we ask "how big" the cache is)
     • Data + overhead?
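Both answers follow from the field widths. The cache has $N \cdot 2^n$ lines of $2^m$ bytes, and each line also carries a tag plus a valid bit. The sketch below borrows the L1 geometry from the performance example that follows (512 lines of 64 bytes, direct mapped) and assumes 32-bit addresses. The optional dirty bit anticipates write-back caches. `cache_size_bits` is a hypothetical helper.

```python
def cache_size_bits(index_bits, offset_bits, ways, addr_bits=32, dirty_bit=False):
    """Return (data_bytes, total_bits) for an N-way set-associative cache.

    Overhead per line: one tag plus a valid bit (and a dirty bit if requested).
    """
    lines = ways * (1 << index_bits)           # N ways x 2^n sets
    data_bytes = lines * (1 << offset_bits)    # each line holds 2^m bytes
    tag_bits = addr_bits - index_bits - offset_bits
    per_line_overhead = tag_bits + 1 + (1 if dirty_bit else 0)
    total_bits = data_bytes * 8 + lines * per_line_overhead
    return data_bytes, total_bits

# Assumed example: 512 sets (n = 9), 64-byte lines (m = 6), direct-mapped, 32-bit addresses
data, total = cache_size_bits(index_bits=9, offset_bits=6, ways=1)
print(data, "bytes of data;", total, "bits including tags and valid bits")
```

Making the same-sized cache 2-way (n = 8) leaves the data size unchanged but lengthens every tag by one bit.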

  40. Cache Performance Example
     $t_{\text{avg}} = t_{\text{hit}} + \%_{\text{miss}} \times t_{\text{miss}}$
     What is $t_{\text{avg}}$ for accessing 16 words?
     Memory parameters (very simplified):
     • Main memory: 4 GB
       – Data cost: 50 cycles for the first word, plus 3 cycles per subsequent word
     • L1: 512 × 64-byte cache lines, direct mapped
       – Data cost: 3 cycles per word access
       – Lookup cost: 2 cycles
     Performance if %hit = 90%? Performance if %hit = 95%?
     Note: here $t_{\text{hit}}$ splits up lookup vs. data cost. Why are there two ways?
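The slide leaves the calculation open, so this sketch picks one reading and labels every assumption. Words are 4 bytes, so a 64-byte line holds 16 words. A hit costs lookup plus one word access, $2 + 3 = 5$ cycles. A miss additionally refills the whole line from memory, $50 + 15 \times 3 = 95$ cycles. Other accounting choices, such as charging the lookup again on a miss, would change the numbers.

```python
# Parameters from the slide (very simplified memory model)
LOOKUP = 2                 # L1 lookup cost, cycles
L1_WORD = 3                # L1 data cost per word access, cycles
MEM_FIRST = 50             # main memory: first word, cycles
MEM_NEXT = 3               # main memory: each subsequent word, cycles
WORDS_PER_LINE = 64 // 4   # assumption: 4-byte words, so 16 words per 64-byte line

t_hit = LOOKUP + L1_WORD                               # assumption: 5 cycles per word
t_miss = MEM_FIRST + (WORDS_PER_LINE - 1) * MEM_NEXT   # assumption: refill the line, 95 cycles

def t_avg(hit_pct):
    """Per-word t_avg = t_hit + %miss * t_miss, with the hit rate given in percent."""
    miss_pct = 100 - hit_pct
    return t_hit + miss_pct * t_miss / 100

for hit_pct in (90, 95):
    per_word = t_avg(hit_pct)
    print(f"{hit_pct}% hits: {per_word} cycles/word, {16 * per_word} cycles for 16 words")
```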

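  41. Performance Calculation with $ Hierarchy
     $t_{\text{avg}} = t_{\text{hit}} + \%_{\text{miss}} \times t_{\text{miss}}$
     • Parameters
       – Reference stream: all loads
       – D$: $t_{\text{hit}}$ = 1 ns, %miss = 5%
       – L2: $t_{\text{hit}}$ = 10 ns, %miss = 20% (local miss rate)
       – Main memory: $t_{\text{hit}}$ = 50 ns
     • What is $t_{\text{avg}}(\text{D\$})$ without an L2?
       – $t_{\text{miss}}(\text{D\$})$ =
       – $t_{\text{avg}}(\text{D\$})$ =
     • What is $t_{\text{avg}}(\text{D\$})$ with an L2?
       – $t_{\text{miss}}(\text{D\$})$ =
       – $t_{\text{avg}}(\text{L2})$ =
       – $t_{\text{avg}}(\text{D\$})$ =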
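The blanks can be filled by applying the same formula level by level. Without an L2, a D$ miss costs a main-memory access. With an L2, a D$ miss costs the L2's own average access time, which uses the L2's local miss rate. The sketch computes these values from the slide's parameters; the helper name `t_avg` is hypothetical.

```python
def t_avg(t_hit, miss_pct, t_miss):
    """t_avg = t_hit + %miss * t_miss, with %miss given in percent."""
    return t_hit + miss_pct * t_miss / 100

MEM = 50.0                         # main memory access time, ns

# Without an L2: every D$ miss goes straight to main memory
t_miss_d = MEM
no_l2 = t_avg(1.0, 5, t_miss_d)

# With an L2: a D$ miss costs the L2's average access time
t_avg_l2 = t_avg(10.0, 20, MEM)    # 20% is the L2's local miss rate
with_l2 = t_avg(1.0, 5, t_avg_l2)

print(no_l2, t_avg_l2, with_l2)    # 3.5 20.0 2.0
```

Even though the L2 itself misses 20% of the time, it cuts the D$ miss cost from 50 ns to 20 ns.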

  42. Performance Summary
     Average memory access time (AMAT) depends on:
     • cache architecture and size
     • hit and miss rates
     • access times and miss penalty
     Cache design is a very complex problem:
     • Cache size, block size (aka line size)
     • Number of ways of set-associativity (1, N, ∞)
     • Eviction policy
     • Number of levels of caching, and the parameters for each
     • Separate I-cache and D-cache, or a unified cache
     • Prefetching policies / instructions
     • Write policy

  43. Takeaway
     • Direct mapped → fast, but low hit rate
     • Fully associative → higher hit cost, higher hit rate
     • Set associative → middle ground
     Line size matters. Larger cache lines can increase performance due to prefetching, BUT they can also decrease performance if the working set cannot fit in the cache.
     Cache performance is measured by the average memory access time (AMAT), which depends on cache architecture and size, but also on the hit time, miss penalty, and hit rate.

  44. What about Stores?
     Where should you write the result of a store?
     • If that memory location is in the cache:
       – Send it to the cache
       – Should we also send it to memory right away? (write-through policy)
       – Or wait until we evict the block? (write-back policy)
     • If it is not in the cache:
       – Allocate the line (put it in the cache)? (write-allocate policy)
       – Write it directly to memory without allocation? (no-write-allocate policy)

  45. Cache Write Policies
     Q: How to write data? (CPU ↔ cache (SRAM) ↔ memory (DRAM), over addr and data)
     If data is already in the cache…
     • No-Write: writes invalidate the cache and go directly to memory
     • Write-Through: writes go to main memory and the cache
     • Write-Back: the CPU writes only to the cache; the cache writes to main memory later (when the block is evicted)

  46. Write Allocation Policies
     Q: How to write data? (CPU ↔ cache (SRAM) ↔ memory (DRAM), over addr and data)
     If data is not in the cache…
     • Write-Allocate: allocate a cache line for the new data (and maybe write through)
     • No-Write-Allocate: ignore the cache and go straight to main memory

  47. Write-Through Stores
     • 16-byte, byte-addressed memory
     • 4-byte, fully-associative cache: 2 lines of 2-byte blocks, write-allocate, with an lru bit per line
     • 4-bit addresses: 3-bit tag, 1-bit offset
     • Memory: M[0] = 78, M[1] = 29, M[2] = 120, M[3] = 123, M[4] = 71, M[5] = 150, M[6] = 162, M[7] = 173, M[8] = 18, M[9] = 21, M[10] = 33, M[11] = 28, M[12] = 19, M[13] = 200, M[14] = 210, M[15] = 225
     • Instructions:
       LB $1 ← M[1]
       LB $2 ← M[7]
       SB $2 → M[0]
       SB $1 → M[5]
       LB $2 ← M[10]
       SB $1 → M[5]
       SB $1 → M[10]
     • Cache lines: lru, V, tag, data; register file $0-$3; Misses: 0, Hits: 0

  48-61. Write-Through (REF 1-7)
     • REF 1: LB $1 ← M[1], addr 0001 → Miss. A line gets tag 000, data [78, 29]; $1 = 29. (Misses 1, Hits 0)
     • REF 2: LB $2 ← M[7], addr 0111 → Miss. The other line gets tag 011, data [162, 173]; $2 = 173. (Misses 2, Hits 0)
     • REF 3: SB $2 → M[0], addr 0000 → Hit. Line 000 becomes [173, 29], and write-through also sets M[0] = 173. (Misses 2, Hits 1)
     • REF 4: SB $1 → M[5], addr 0101 → Miss. The LRU line (tag 011) is replaced by tag 010, data [71, 150]; the store then makes it [71, 29] and sets M[5] = 29. (Misses 3, Hits 1)
     • REF 5: LB $2 ← M[10], addr 1010 → Miss. The LRU line (tag 000) is replaced by tag 101, data [33, 28]; $2 = 33. (Misses 4, Hits 1)
     • REF 6: SB $1 → M[5], addr 0101 → Hit. Line 010 stays [71, 29]; M[5] = 29. (Misses 4, Hits 2)
     • REF 7: SB $1 → M[10], addr 1010 → Hit. Line 101 becomes [29, 28]; M[10] = 29. (Misses 4, Hits 3)

  62. How Many Memory References?
     Write-through performance:
     • How many memory reads?
     • How many memory writes?
     • Overhead? Do we need a dirty bit?

  63-64. Write-Through (REF 8, 9)
     • The sequence continues with two more stores after REF 7.
     • REF 8: SB $1 → M[5] → Hit; the cache and M[5] both hold 29
     • REF 9: SB $1 → M[10] → Hit; the cache and M[10] both hold 29
     • Outcomes: M, M, Hit, M, M, Hit, Hit, Hit, Hit (Misses 4, Hits 5)
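The nine references can be replayed with a small simulator that also answers slide 62's counting questions. This is a sketch, not the course's tool. The instruction tuples, the starting register values, and the names `run_write_through` and `PROGRAM` are assumptions. It models a fully-associative cache with LRU replacement and write-allocate, where every store also writes memory.

```python
BLOCK = 2   # 2-byte blocks: 4-bit address = 3-bit tag | 1-bit offset
MEMORY_INIT = [78, 29, 120, 123, 71, 150, 162, 173, 18, 21, 33, 28, 19, 200, 210, 225]
PROGRAM = [("LB", "$1", 1), ("LB", "$2", 7), ("SB", "$2", 0), ("SB", "$1", 5),
           ("LB", "$2", 10), ("SB", "$1", 5), ("SB", "$1", 10),   # REF 1-7
           ("SB", "$1", 5), ("SB", "$1", 10)]                      # REF 8-9

def run_write_through(program, num_lines=2):
    """Write-through, write-allocate, fully-associative cache with LRU replacement."""
    mem = list(MEMORY_INIT)
    regs = {"$1": 0, "$2": 0}   # assumed initial values; each is loaded before it is stored
    lines = []                  # [tag, data] pairs, least recently used first
    stats = {"hits": 0, "misses": 0, "block_reads": 0, "mem_writes": 0}
    for op, reg, addr in program:
        tag, offset = addr // BLOCK, addr % BLOCK
        line = next((l for l in lines if l[0] == tag), None)
        if line is None:
            stats["misses"] += 1
            if len(lines) == num_lines:
                lines.pop(0)    # evict the LRU line; never dirty under write-through
            line = [tag, mem[tag * BLOCK:(tag + 1) * BLOCK]]   # write-allocate: fetch the block
            stats["block_reads"] += 1
        else:
            stats["hits"] += 1
            lines.remove(line)
        lines.append(line)      # the line just used becomes most recently used
        if op == "LB":
            regs[reg] = line[1][offset]
        else:                   # SB: update the cached byte AND memory right away
            line[1][offset] = regs[reg]
            mem[addr] = regs[reg]
            stats["mem_writes"] += 1
    return regs, mem, stats

regs, mem, stats = run_write_through(PROGRAM)
print(regs, mem[0], mem[5], mem[10], stats)
```

Under these assumptions, the run reads 4 blocks (one per miss) and performs 6 single-byte memory writes (one per store). No line ever needs a dirty bit.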

  65. Summary: Write-Through
     Write-through policy with write-allocate:
     • Cache miss: read the entire block from memory
     • Write: write only the updated item to memory
     • Eviction: no need to write to memory

  66. Next Goal: Write-Through vs. Write-Back
     Can we also design the cache NOT to write all stores immediately to memory?
     – Keep the current copy in the cache, and update memory when the data is evicted (write-back policy)
     – Write back all evicted lines?
       • No, only written-to blocks

  67. Write-Back Meta-Data (Valid, Dirty Bits)
     Line layout: V | D | Tag | Byte 1 | Byte 2 | … | Byte N
     • V = 1 means the line has valid data
     • D = 1 means the bytes are newer than main memory
     • When allocating a line:
       – Set V = 1, D = 0, fill in Tag and Data
     • When writing a line:
       – Set D = 1
     • When evicting a line:
       – If D = 0: just set V = 0
       – If D = 1: write back Data, then set D = 0, V = 0
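The three rules above map directly onto three small operations on a line. This is a sketch with hypothetical names (`Line`, `allocate`, `write`, `evict`). It assumes a fully-associative cache, so a line's tag is its block number and gives the write-back address directly.

```python
from dataclasses import dataclass, field

@dataclass
class Line:
    valid: bool = False
    dirty: bool = False
    tag: int = 0
    data: list = field(default_factory=list)

def allocate(line, tag, block):
    """V = 1, D = 0, fill in Tag and Data."""
    line.valid, line.dirty, line.tag, line.data = True, False, tag, list(block)

def write(line, offset, value):
    """Update one byte and set D = 1: the cached copy is now newer than memory."""
    line.data[offset] = value
    line.dirty = True

def evict(line, memory, block_size):
    """Write the block back only if D = 1, then clear V and D; returns True if it wrote."""
    wrote = False
    if line.valid and line.dirty:
        base = line.tag * block_size   # assumption: tag = block number (fully associative)
        memory[base:base + block_size] = line.data
        wrote = True
    line.valid = line.dirty = False
    return wrote

mem = [78, 29, 120, 123]
ln = Line()
allocate(ln, 0, mem[0:2])
write(ln, 0, 173)
print(evict(ln, mem, 2), mem)  # True [173, 29, 120, 123]
```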

  68. Write-back Example
     • Example: How does a write-back cache work?
     • Assume write-allocate
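The example's own trace is not included here. As a hypothetical illustration, this sketch replays the write-through example's memory and nine references through a write-back, write-allocate cache that has the same geometry and LRU replacement. Stores only update the cache and set the dirty bit. Memory is written only when a dirty block is evicted. The names `run_write_back` and `PROGRAM` are assumptions, and a line's presence in `lines` stands in for V = 1.

```python
BLOCK = 2
MEMORY_INIT = [78, 29, 120, 123, 71, 150, 162, 173, 18, 21, 33, 28, 19, 200, 210, 225]
PROGRAM = [("LB", "$1", 1), ("LB", "$2", 7), ("SB", "$2", 0), ("SB", "$1", 5),
           ("LB", "$2", 10), ("SB", "$1", 5), ("SB", "$1", 10), ("SB", "$1", 5),
           ("SB", "$1", 10)]

def run_write_back(program, num_lines=2):
    """Write-back, write-allocate, fully-associative cache with LRU replacement and dirty bits."""
    mem = list(MEMORY_INIT)
    regs = {"$1": 0, "$2": 0}   # assumed initial values; each is loaded before it is stored
    lines = []                  # valid lines as dicts {tag, dirty, data}, least recently used first
    stats = {"hits": 0, "misses": 0, "block_reads": 0, "block_writebacks": 0}
    for op, reg, addr in program:
        tag, offset = addr // BLOCK, addr % BLOCK
        line = next((l for l in lines if l["tag"] == tag), None)
        if line is None:
            stats["misses"] += 1
            if len(lines) == num_lines:
                victim = lines.pop(0)                  # LRU victim
                if victim["dirty"]:                    # D = 1: write the block back first
                    t = victim["tag"]
                    mem[t * BLOCK:(t + 1) * BLOCK] = victim["data"]
                    stats["block_writebacks"] += 1
            line = {"tag": tag, "dirty": False,        # allocate: V = 1, D = 0
                    "data": mem[tag * BLOCK:(tag + 1) * BLOCK]}
            stats["block_reads"] += 1
        else:
            stats["hits"] += 1
            lines.remove(line)
        lines.append(line)                             # most recently used goes last
        if op == "LB":
            regs[reg] = line["data"][offset]
        else:                                          # SB: cache only; memory is now stale
            line["data"][offset] = regs[reg]
            line["dirty"] = True
    return regs, mem, lines, stats

regs, mem, lines, stats = run_write_back(PROGRAM)
print(stats, "still dirty:", [l["tag"] for l in lines if l["dirty"]])
```

Under these assumptions, the hit and miss counts match write-through (5 and 4). Memory, however, receives a single block write-back, when the dirty block holding M[0] is evicted, instead of six byte writes. Memory still holds the stale values M[5] = 150 and M[10] = 33 until the two dirty lines are evicted.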
