
Prefetching in Hybrid Main Memory Systems (HotStorage 2020)


  1. Prefetching in Hybrid Main Memory Systems. Subisha V†, Varun Gohil†, Nisarg Ujjainkar†, Manu Awasthi*† († IIT Gandhinagar, * Ashoka University). HotStorage 2020

  2. Outline of the Presentation ● Background ● Insights ● Prefetcher Design ● Evaluation ● Future Work

  3. Outline of the Presentation (current section: Background)

  4-5. DRAM Scaling Challenge ● DRAM density scaling is slowing down (2X every 1.5 years, now 2X every 3 years) ● Workloads such as neural nets, genomics, in-memory frameworks and virtual reality require higher memory capacity (Solving the DRAM Scaling Challenge, Samira Khan, ARM Research Summit 2018)

  6-8. Emerging Memory Technologies (figure: example technologies, "and many more ...") ● Advantages: better density, energy efficiency ● Drawbacks: longer access latencies, finite write endurance

  9-11. Hybrid Main Memory ● Use DRAM and NVM synergistically ● Variant 1: Single Address Space ● Variant 2: DRAM as a Cache

  12-14. Alloy Cache ● State-of-the-art DRAM Cache design ● Acts as a direct-mapped cache for NVM ● Fetches data at cacheline granularity ● Each entry stores tag and data together in 72B: a 64B cacheline plus an 8B tag
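To make "direct-mapped, cacheline granularity" concrete, here is a minimal Python sketch of an Alloy-style lookup. It assumes a plain `line address mod number of sets` index and models each 72B entry as a 64B line plus an 8B tag; the class and method names are hypothetical, and the real Alloy Cache row layout and writeback path are not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

LINE_BYTES = 64                     # data payload of one cacheline
TAG_BYTES = 8                       # tag stored next to the data
TAD_BYTES = LINE_BYTES + TAG_BYTES  # 72B tag-and-data entry


@dataclass
class TAD:
    tag: int
    dirty: bool = False


class AlloyCacheSketch:
    """Hypothetical direct-mapped DRAM Cache in front of NVM, filled one cacheline at a time."""

    def __init__(self, capacity_bytes: int) -> None:
        # Direct-mapped: exactly one 72B entry per set.
        self.num_sets = capacity_bytes // TAD_BYTES
        self.sets: List[Optional[TAD]] = [None] * self.num_sets

    def _index_and_tag(self, addr: int) -> Tuple[int, int]:
        line_addr = addr // LINE_BYTES  # cacheline granularity
        return line_addr % self.num_sets, line_addr // self.num_sets

    def access(self, addr: int, is_write: bool = False) -> bool:
        """Return True on a hit; on a miss, install the line (evicting the old occupant)."""
        index, tag = self._index_and_tag(addr)
        entry = self.sets[index]
        if entry is not None and entry.tag == tag:
            entry.dirty = entry.dirty or is_write
            return True
        # Miss: fetch the 64B line from NVM. A real design would first write back
        # a dirty victim; this sketch simply overwrites it.
        self.sets[index] = TAD(tag=tag, dirty=is_write)
        return False


cache = AlloyCacheSketch(capacity_bytes=4 * TAD_BYTES)  # tiny 4-set cache for illustration
print(cache.access(0), cache.access(32), cache.access(256), cache.access(0))  # False True False False
```

Addresses 0 and 256 map to the same set, so they evict each other: the conflict behavior that comes with a direct-mapped organization.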

  15-17. Alloy Cache Page ● A 4KB contiguous memory chunk ● Figure: an Alloy Cache page with empty cachelines

  18. Outline of the Presentation (current section: Insights)

  19-22. Insights (setup: 1 GB Alloy Cache, 64 GB PCM, PARSEC) ● Workloads exhibit page-level spatial locality in NVM
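The slides do not say how page-level spatial locality was measured. One plausible metric, sketched here purely as an assumption, is the fraction of each page's 64 cachelines that an address trace actually touches: values near 1.0 mean whole pages get used, which is what page-granularity prefetching would exploit. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

PAGE_BYTES = 4096
LINE_BYTES = 64
LINES_PER_PAGE = PAGE_BYTES // LINE_BYTES  # 64 cachelines per 4KB page


def unique_lines_per_page(addresses: Iterable[int]) -> Dict[int, int]:
    """Map each touched NVM page number to the number of distinct cachelines accessed in it."""
    touched: Dict[int, Set[int]] = defaultdict(set)
    for addr in addresses:
        page, offset = divmod(addr, PAGE_BYTES)
        touched[page].add(offset // LINE_BYTES)
    return {page: len(lines) for page, lines in touched.items()}


def mean_page_coverage(addresses: Iterable[int]) -> float:
    """Average fraction of each touched page's 64 cachelines that were accessed (1.0 = whole page)."""
    counts = unique_lines_per_page(addresses)
    if not counts:
        return 0.0
    return sum(n / LINES_PER_PAGE for n in counts.values()) / len(counts)


sequential = range(0, PAGE_BYTES, LINE_BYTES)         # streams through every line of one page
print(mean_page_coverage(sequential))                  # 1.0
print(mean_page_coverage([0, 4096 * 9, 4096 * 20]))    # 0.015625 (one line per page)
```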

  23-25. Insights ● 92% of DRAM Cache pages are completely empty! (figure legend: unallocated/empty page, allocated page) ● A large portion of the DRAM Cache is unallocated
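A page counts as "completely empty" when none of its cacheline slots holds valid data. A small sketch of that computation, assuming a flat numbering of cacheline slots and a hypothetical `slots_per_page` parameter (how many Alloy Cache entries fit in one page is not stated on the slide):

```python
from typing import Iterable


def empty_page_fraction(occupied_slots: Iterable[int], num_pages: int, slots_per_page: int) -> float:
    """Fraction of DRAM Cache pages that hold no valid cacheline at all."""
    total_slots = num_pages * slots_per_page
    pages_in_use = set()
    for slot in occupied_slots:
        if not 0 <= slot < total_slots:
            raise ValueError(f"slot {slot} is outside the cache")
        pages_in_use.add(slot // slots_per_page)  # one valid line is enough to mark the page as used
    return 1 - len(pages_in_use) / num_pages


# 4 pages of 8 slots; slots 0 and 3 share page 0, slot 17 is in page 2.
print(empty_page_fraction({0, 3, 17}, num_pages=4, slots_per_page=8))  # 0.5
```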

  26. Outline of the Presentation (current section: Prefetcher Design)

  27-28. Prefetcher Design ● Page-level spatial locality in NVM ⇒ prefetch at page granularity ● DRAM Cache is largely unallocated ⇒ place prefetched pages in the DRAM Cache

  29-33. Prefetcher Design ● When to prefetch? ● Where to place prefetched data in the DRAM Cache? ● How to identify the type of data at a DRAM Cache location? ● How to check whether data is in a prefetched page?

  34-35. When to Prefetch? Prefetch a page if ⇒ #cacheline accesses ≥ Access Threshold (AT) ⇒ #unique cacheline accesses ≥ Unique Access Threshold (UAT)
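The prefetch trigger can be written as a small predicate. This is a sketch under two assumptions not stated on the slide: both conditions must hold, and unique accesses are tracked as a 64-bit bitmap with one bit per cacheline of the 4KB page. The function name is hypothetical.

```python
def should_prefetch(access_count: int, accessed_lines_bitmap: int,
                    access_threshold: int, unique_access_threshold: int) -> bool:
    """Decide whether an NVM page is worth prefetching into the DRAM Cache.

    ASSUMPTION: both conditions must hold; the slide lists them without saying how they combine.
    accessed_lines_bitmap has bit i set if cacheline i of the 4KB page (64 lines) was touched.
    """
    unique_accesses = bin(accessed_lines_bitmap).count("1")
    return access_count >= access_threshold and unique_accesses >= unique_access_threshold


# 12 accesses, all to cacheline 0: hot, but with no spatial spread across the page.
print(should_prefetch(12, 0b1, access_threshold=8, unique_access_threshold=4))      # False
# 12 accesses spread over 5 distinct cachelines.
print(should_prefetch(12, 0b11111, access_threshold=8, unique_access_threshold=4))  # True
```

Under this reading, AT filters out cold pages, while UAT filters out pages whose traffic hits only a few lines, where fetching the whole 4KB page would mostly waste NVM bandwidth.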

  36. When to Prefetch? The NVM Page Classifier (NPC) stores the cacheline access history of recently used pages

  37-42. NVM Page Classifier Entry, where N is the maximum number of pages that can be present in NVM and AT is the Access Threshold
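A sketch of how an NPC might behave, assuming an LRU-managed table of `num_entries` pages where each entry keeps a saturating access counter (capped at AT) and a 64-bit bitmap of touched cachelines. Only the N and AT legend of the entry figure survives, so the bitmap, the LRU policy, and all names here are assumptions. `npc_entry_bits` estimates the per-entry size under that layout, for example with N = 64 GB / 4 KB = 2^24 pages.

```python
import math
from collections import OrderedDict
from dataclasses import dataclass

LINES_PER_PAGE = 64  # 4KB page / 64B cacheline


@dataclass
class NPCEntry:
    access_count: int = 0  # saturating counter, capped at AT
    line_bitmap: int = 0   # bit i set => cacheline i of the page was accessed


class NVMPageClassifier:
    """Hypothetical NPC: access history for a bounded set of recently used NVM pages."""

    def __init__(self, num_entries: int, access_threshold: int, unique_access_threshold: int) -> None:
        self.num_entries = num_entries
        self.at = access_threshold
        self.uat = unique_access_threshold
        self.entries: "OrderedDict[int, NPCEntry]" = OrderedDict()  # least recently used first

    def record_access(self, page: int, line_in_page: int) -> bool:
        """Record one cacheline access; return True if the page should be prefetched now."""
        entry = self.entries.pop(page, None)
        if entry is None:
            entry = NPCEntry()
            if len(self.entries) >= self.num_entries:
                self.entries.popitem(last=False)  # evict the least recently used page
        entry.access_count = min(entry.access_count + 1, self.at)
        entry.line_bitmap |= 1 << line_in_page
        unique = bin(entry.line_bitmap).count("1")
        if entry.access_count >= self.at and unique >= self.uat:
            return True  # page is handed to the prefetcher; stop tracking it
        self.entries[page] = entry  # re-insert as most recently used
        return False


def npc_entry_bits(max_nvm_pages: int, access_threshold: int) -> int:
    """Per-entry bits under the assumed layout: page tag + counter + cacheline bitmap."""
    tag_bits = math.ceil(math.log2(max_nvm_pages))             # names one of N pages
    counter_bits = math.ceil(math.log2(access_threshold + 1))  # counts 0..AT
    return tag_bits + counter_bits + LINES_PER_PAGE


print(npc_entry_bits(2**24, 15))  # 92 (AT = 15 is an arbitrary example value)
```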

  43-45. Where to Place the Prefetched Page? ● In the last unallocated DRAM Cache page ● The Empty Page Classifier (EPC) stores the locations of unallocated DRAM Cache pages

  46-52. Empty Page Classifier (EPC), a three-level structure: Page Number = (4096 ✕ Level 1 index) + (64 ✕ Level 2 index) + Level 3 index
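The page-number formula suggests three levels with 64-way fan-out, which covers 64 ✕ 64 ✕ 64 = 262,144 pages. Below is a sketch of one way such an EPC could work, as a hierarchical bitmap where a set bit at an upper level means "some page below is empty". The bitmap organization, the reading of "last unallocated page" as the highest-numbered one, and all names are assumptions.

```python
from typing import Optional, Tuple

FANOUT = 64                # 6 index bits per level
NUM_PAGES = FANOUT ** 3    # 262,144 pages
FULL = (1 << FANOUT) - 1   # all 64 bits set


class EmptyPageClassifier:
    """Hypothetical 3-level bitmap over DRAM Cache pages; a set bit means "has an empty page".

    level3[i1][i2] bit i3 -> page (4096*i1 + 64*i2 + i3) is empty
    level2[i1]     bit i2 -> level3[i1][i2] is non-zero
    level1         bit i1 -> level2[i1] is non-zero
    """

    def __init__(self) -> None:
        # At start-up the DRAM Cache holds nothing, so every page is empty.
        self.level1 = FULL
        self.level2 = [FULL] * FANOUT
        self.level3 = [[FULL] * FANOUT for _ in range(FANOUT)]

    @staticmethod
    def page_number(i1: int, i2: int, i3: int) -> int:
        return 4096 * i1 + 64 * i2 + i3

    @staticmethod
    def indices(page: int) -> Tuple[int, int, int]:
        return page // 4096, (page // 64) % 64, page % 64

    def mark_allocated(self, page: int) -> None:
        i1, i2, i3 = self.indices(page)
        self.level3[i1][i2] &= ~(1 << i3)
        if self.level3[i1][i2] == 0:  # no empty page left in this level-3 group
            self.level2[i1] &= ~(1 << i2)
            if self.level2[i1] == 0:
                self.level1 &= ~(1 << i1)

    def mark_empty(self, page: int) -> None:
        i1, i2, i3 = self.indices(page)
        self.level3[i1][i2] |= 1 << i3
        self.level2[i1] |= 1 << i2
        self.level1 |= 1 << i1

    def last_empty_page(self) -> Optional[int]:
        """Highest-numbered empty page: one highest-set-bit search per level."""
        if self.level1 == 0:
            return None
        i1 = self.level1.bit_length() - 1
        i2 = self.level2[i1].bit_length() - 1
        i3 = self.level3[i1][i2].bit_length() - 1
        return self.page_number(i1, i2, i3)


epc = EmptyPageClassifier()
print(EmptyPageClassifier.page_number(2, 5, 7))  # 8519 = 4096*2 + 64*5 + 7
print(epc.last_empty_page())                     # 262143
```

A lookup touches only three 64-bit words regardless of how many pages are free, which is the usual reason to build a free-space index this way.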

  53-57. Identifying the Type of Data in the DRAM Cache. A DRAM Cache location might hold ⇒ a prefetched page ⇒ an Alloy Cache page ⇒ nothing (empty). These cases must be distinguished to ensure correctness.

  58-62. Identifying the Type of Data in the DRAM Cache ● State 0: Empty location ● State 1: Clean prefetched page ● State 2: Alloy Cache page ● State 3: Dirty prefetched page
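Clean and dirty prefetched pages are presumably kept apart because a dirty page must be written back to NVM before its location is reused, while a clean one can simply be dropped. A minimal sketch of the prefetched-page transitions over the four states, assuming prefetched pages are only placed in empty locations (as the EPC suggests); the function names are hypothetical.

```python
from enum import IntEnum
from typing import Tuple


class PageState(IntEnum):
    EMPTY = 0
    CLEAN_PREFETCHED = 1
    ALLOY_CACHE = 2
    DIRTY_PREFETCHED = 3


def is_prefetched(state: PageState) -> bool:
    # With this numbering, bit 0 is set exactly for the two prefetched states.
    return bool(state & 1)


def on_prefetch_fill(state: PageState) -> PageState:
    # ASSUMPTION: prefetched pages only go into locations that are currently empty.
    if state is not PageState.EMPTY:
        raise ValueError(f"cannot place a prefetched page over {state.name}")
    return PageState.CLEAN_PREFETCHED


def on_write(state: PageState) -> PageState:
    # A store to a clean prefetched page makes it differ from its copy in NVM.
    return PageState.DIRTY_PREFETCHED if state is PageState.CLEAN_PREFETCHED else state


def on_evict_prefetched(state: PageState) -> Tuple[PageState, bool]:
    """Free a prefetched page; the flag says whether it must be written back to NVM first."""
    if not is_prefetched(state):
        raise ValueError("not a prefetched page")
    return PageState.EMPTY, state is PageState.DIRTY_PREFETCHED


s = on_write(on_prefetch_fill(PageState.EMPTY))
print(s.name, on_evict_prefetched(s))  # DIRTY_PREFETCHED (<PageState.EMPTY: 0>, True)
```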

  63. Identifying the Type of Data in the DRAM Cache. The Type Classifier (TC) stores the state of each DRAM Cache location

  64-66. Type Classifier Entry
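Four states fit in 2 bits, so one plausible TC layout is a packed array with one 2-bit entry per DRAM Cache page. The entry figure is not preserved, so this packing and the names below are assumptions. For a 1 GB DRAM Cache of 4 KB pages (262,144 pages) it comes to 64 KiB.

```python
from typing import Tuple


class TypeClassifier:
    """Hypothetical TC: a 2-bit state per DRAM Cache page, four entries packed per byte.

    States follow the slides: 0 empty, 1 clean prefetched, 2 Alloy Cache, 3 dirty prefetched.
    """

    def __init__(self, num_pages: int) -> None:
        self.num_pages = num_pages
        self.data = bytearray((num_pages + 3) // 4)  # zero-initialized: every page starts empty

    def _locate(self, page: int) -> Tuple[int, int]:
        if not 0 <= page < self.num_pages:
            raise IndexError(f"page {page} out of range")
        byte, slot = divmod(page, 4)
        return byte, 2 * slot  # byte index and bit shift of this page's 2-bit field

    def get_state(self, page: int) -> int:
        byte, shift = self._locate(page)
        return (self.data[byte] >> shift) & 0b11

    def set_state(self, page: int, state: int) -> None:
        if not 0 <= state <= 3:
            raise ValueError("state must fit in 2 bits")
        byte, shift = self._locate(page)
        cleared = self.data[byte] & ~(0b11 << shift) & 0xFF
        self.data[byte] = cleared | (state << shift)


tc = TypeClassifier(num_pages=262_144)
print(len(tc.data))  # 65536 bytes = 64 KiB
```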

  67-68. Checking Whether Data Is in a Prefetched Page. The Page Redirection Table (PRT) is a hash table storing the tags of prefetched data

  69-72. Page Redirection Table Entry, where D is the maximum number of pages that can be present in the DRAM Cache
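A sketch of the PRT lookup, assuming each entry maps an NVM page number (the tag) to the DRAM Cache page holding it, so the location field needs log2(D) bits. A Python dict stands in for the hardware hash table. What happens on a PRT miss is also an assumption here (the request falls through to the normal Alloy Cache path), and all names are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple

PAGE_BYTES = 4096
LINE_BYTES = 64


class PageRedirectionTable:
    """Hypothetical PRT: maps an NVM page number (the tag) to the DRAM Cache page holding it."""

    def __init__(self, max_dram_pages: int) -> None:
        self.max_dram_pages = max_dram_pages  # D in the slides
        self.table: Dict[int, int] = {}       # stand-in for the hardware hash table

    def insert(self, nvm_page: int, dram_page: int) -> None:
        if not 0 <= dram_page < self.max_dram_pages:
            raise ValueError(f"DRAM Cache page {dram_page} out of range")
        self.table[nvm_page] = dram_page

    def remove(self, nvm_page: int) -> None:
        self.table.pop(nvm_page, None)

    def redirect(self, nvm_addr: int) -> Optional[Tuple[int, int]]:
        """(DRAM Cache page, cacheline within it) if nvm_addr lies in a prefetched page, else None."""
        nvm_page, offset = divmod(nvm_addr, PAGE_BYTES)
        dram_page = self.table.get(nvm_page)
        if dram_page is None:
            return None  # not prefetched: fall through to the normal Alloy Cache lookup
        return dram_page, offset // LINE_BYTES


def prt_location_bits(max_dram_pages: int) -> int:
    """Bits needed to name one of D DRAM Cache pages."""
    return math.ceil(math.log2(max_dram_pages))


prt = PageRedirectionTable(max_dram_pages=262_144)
prt.insert(nvm_page=1000, dram_page=17)
print(prt.redirect(1000 * 4096 + 130))  # (17, 2)
print(prt_location_bits(262_144))       # 18
```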

  73. Outline of the Presentation (current section: Evaluation)

  74. Evaluation ⇒ Simulated with ZSim + NVMain ⇒ 1 GB Alloy Cache, 64 GB Phase Change Memory ⇒ 8-core, 2.6 GHz processor ⇒ CACTI used for the access latency of the prefetcher structures ⇒ PARSEC benchmarks
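With this configuration, and treating GB and KB as binary units, the page counts behind the structure parameters D and N work out as follows. The EPC's three 64-way levels cover the 1 GB DRAM Cache exactly, since its largest page number is D - 1:

```latex
D = \frac{1\,\mathrm{GB}}{4\,\mathrm{KB}} = \frac{2^{30}}{2^{12}} = 2^{18} = 262{,}144 = 64^{3},
\qquad
N = \frac{64\,\mathrm{GB}}{4\,\mathrm{KB}} = \frac{2^{36}}{2^{12}} = 2^{24} = 16{,}777{,}216

4096 \cdot 63 + 64 \cdot 63 + 63 = 262{,}143 = D - 1
```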

  75-77. Evaluation ● Sequential access behavior ● 1.5✕ to 4✕ improvement

  78-80. Evaluation ● 7✕ speedup ● 16-40% higher IPC

  81. Outline of the Presentation (current section: Future Work)

  82. Future Work ⇒ Evaluate our prefetcher on memory-intensive SPEC workloads ⇒ Evaluate it on graph workloads with irregular memory access patterns ⇒ Compare with similar recent works

  83. Key Takeaways ● Prefetch at page granularity to exploit page-level spatial locality. ● Place prefetched pages in the DRAM Cache to improve its utilization. ● We observe a 16-40% increase in IPC on PARSEC. ● Link to Paper: ● Contact Us: gohil.varun@iitgn.ac.in, manu.awasthi@ashoka.edu.in
