1 Prefetching in Hybrid Main Memory Systems Subisha V ⤒, Varun Gohil ⤒, Nisarg Ujjainkar ⤒, Manu Awasthi * (⤒ IIT Gandhinagar, * Ashoka University) HotStorage 2020
2 Outline of the Presentation ● Background ● Insights ● Prefetcher Design ● Evaluation ● Future Work
5 DRAM Scaling Challenge ● DRAM density scaling is slowing down: from 2X every 1.5 years to 2X every 3 years ● Workloads require higher memory capacity: neural nets, genomics, in-memory frameworks, virtual reality (Source: Solving the DRAM Scaling Challenge, Samira Khan, ARM Research Summit 2018)
8 Emerging Memory Technologies ● Better density ● Energy efficient ● Longer access latencies ● Finite write endurance
10 Hybrid Main Memory Use DRAM and NVM synergistically Single Address Space Variant
11 Hybrid Main Memory Use DRAM and NVM synergistically DRAM as a Cache Variant
14 Alloy Cache ● State-of-the-art DRAM Cache design ● Acts as a direct-mapped cache to NVM ● Fetches data at cacheline granularity ● Cacheline size is 72B (64B of data plus an 8B tag)
17 Alloy Cache Page ● 4KB contiguous memory chunk (figure label: Empty Cachelines)
18 Outline of the Presentation ● Background ● Insights ● Prefetcher Design ● Evaluation ● Future Work
19 Insights Setup: 1 GB Alloy Cache, 64 GB PCM, PARSEC benchmark
22 Insights Workloads exhibit page-level spatial locality in NVM
24 Insights 92% of DRAM Cache pages are completely empty! (Figure legend: Unallocated/Empty Page, Allocated Page)
25 Insights A large portion of DRAM Cache is unallocated
26 Outline of the Presentation ● Background ● Insights ● Prefetcher Design ● Evaluation ● Future Work
28 Prefetcher Design ● Page-Level Spatial Locality in NVM ⇒ Prefetch at page granularity ● DRAM Cache is largely unallocated ⇒ Place prefetched pages in DRAM Cache
33 Prefetcher Design ● When to prefetch? ● Where to place prefetched data in DRAM Cache? ● How to identify type of data at DRAM Cache location? ● How to check if data is in a prefetched page?
35 When to Prefetch? Prefetch a page if ⇒ number of cacheline accesses ≥ Access Threshold (AT) ⇒ number of unique cachelines accessed ≥ Unique Access Threshold (UAT)
36 When to Prefetch? NVM Page Classifier (NPC) ⇒ Stores cacheline access history of recently used pages
37 NVM Page Classifier Entry N : Max number of pages that can be present in NVM AT : Access Threshold
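The prefetch trigger described on the slides above can be sketched in software as follows. This is only an illustrative model, not the authors' hardware design: the names `NpcEntry` and `NvmPageClassifier`, the threshold values, the 64-cachelines-per-page bitmask, and the choice to require both thresholds at once are assumptions. Eviction of old entries ("recently used pages") is not modeled.

```python
from dataclasses import dataclass

LINES_PER_PAGE = 64  # assumption: 4 KB page / 64 B of data per cacheline


@dataclass
class NpcEntry:
    accesses: int = 0  # total cacheline accesses seen for this page
    touched: int = 0   # bitmask with one bit per cacheline index in the page


class NvmPageClassifier:
    """Hypothetical software model of the NVM Page Classifier (NPC)."""

    def __init__(self, access_threshold, unique_access_threshold):
        self.at = access_threshold
        self.uat = unique_access_threshold
        self.entries = {}  # NVM page number -> NpcEntry

    def record_access(self, page, line):
        """Record one cacheline access; return True if the page should be prefetched."""
        if not 0 <= line < LINES_PER_PAGE:
            raise ValueError("cacheline index out of range")
        entry = self.entries.setdefault(page, NpcEntry())
        entry.accesses += 1
        entry.touched |= 1 << line
        unique = bin(entry.touched).count("1")
        # Assumption: both thresholds must be met before prefetching.
        return entry.accesses >= self.at and unique >= self.uat


npc = NvmPageClassifier(access_threshold=4, unique_access_threshold=2)
decisions = [npc.record_access(page=7, line=i) for i in (0, 0, 0, 0, 1)]
print(decisions)
```

In this run the fourth access reaches AT but only one unique cacheline has been touched, so the page qualifies only on the fifth access, which touches a second cacheline.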
44 Where to place Prefetched Page? ⇒ In the last unallocated DRAM Cache page
45 Where to place Prefetched Page? Empty Page Classifier (EPC) ⇒ Stores the location of unallocated DRAM Cache pages
52 Empty Page Classifier (EPC) Page Number = (4096 ✕ Level 1 index) + (64 ✕ Level 2 index) + Level 3 index
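The page-number formula suggests that each level index spans 64 values (4096 = 64 ✕ 64), which gives 64³ = 262,144 pages, matching a 1 GB DRAM Cache divided into 4KB pages. A minimal sketch of such a three-level lookup is shown below. The bitmap layout, the class name `EmptyPageClassifier`, and the "set bit means an empty page exists below" convention are assumptions inferred from the formula, since the slide figures are not available here.

```python
FANOUT = 64  # assumption inferred from the 4096 = 64 * 64 factor in the formula
FULL_WORD = (1 << FANOUT) - 1


def page_number(l1, l2, l3):
    """Combine the three level indices into a DRAM Cache page number."""
    return 4096 * l1 + 64 * l2 + l3


def level_indices(page):
    """Inverse of page_number, assuming every index lies in [0, 64)."""
    return page // 4096, (page // 64) % 64, page % 64


class EmptyPageClassifier:
    """Hypothetical three-level bitmap; a set bit means 'an empty page exists below'."""

    def __init__(self):
        self.l3 = [[FULL_WORD] * FANOUT for _ in range(FANOUT)]  # one bit per page
        self.l2 = [FULL_WORD] * FANOUT
        self.l1 = FULL_WORD

    def allocate(self, page):
        """Mark a page as allocated and propagate 'no empty page' bits upward."""
        a, b, c = level_indices(page)
        self.l3[a][b] &= ~(1 << c)
        if self.l3[a][b] == 0:
            self.l2[a] &= ~(1 << b)
            if self.l2[a] == 0:
                self.l1 &= ~(1 << a)

    def last_unallocated(self):
        """Return the highest-numbered empty page, or None if no page is empty."""
        if self.l1 == 0:
            return None
        a = self.l1.bit_length() - 1
        b = self.l2[a].bit_length() - 1
        c = self.l3[a][b].bit_length() - 1
        return page_number(a, b, c)


epc = EmptyPageClassifier()
print(epc.last_unallocated())
```

Finding the last unallocated page takes three word-sized lookups, one per level, instead of a scan over all pages.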
57 Identifying type of data in DRAM Cache A DRAM Cache location might be ⇒ a prefetched page ⇒ an Alloy Cache page ⇒ empty. These cases must be distinguished to ensure correctness.
58 Identifying type of data in DRAM Cache State 0: Empty Location State 1: Clean Prefetched Page State 2: Alloy Cache Page State 3: Dirty Prefetched Page
63 Identifying type of data in DRAM Cache Type Classifier (TC) ⇒ Stores the state of the DRAM Cache location
64 Type Classifier Entry
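The four states listed above fit in two bits per DRAM Cache location. The sketch below encodes them with the state numbers from the slides. The helper functions and the single transition shown (a write turns a clean prefetched page into a dirty one) are assumptions for illustration; the full state diagram from the slide figures is not reproduced here.

```python
from enum import IntEnum


class LocationState(IntEnum):
    """State numbers as listed on the slides."""
    EMPTY = 0
    CLEAN_PREFETCHED = 1
    ALLOY_CACHE = 2
    DIRTY_PREFETCHED = 3


def is_prefetched(state):
    """True for both clean and dirty prefetched pages."""
    return state in (LocationState.CLEAN_PREFETCHED, LocationState.DIRTY_PREFETCHED)


def on_write(state):
    """Assumed transition: writing to a clean prefetched page marks it dirty."""
    if state == LocationState.CLEAN_PREFETCHED:
        return LocationState.DIRTY_PREFETCHED
    return state


def needs_page_writeback(state):
    """Assumption: only a dirty prefetched page is written back to NVM as a whole page."""
    return state == LocationState.DIRTY_PREFETCHED


state = on_write(LocationState.CLEAN_PREFETCHED)
print(state.name, needs_page_writeback(state))
```

With this numbering, both prefetched states have the low bit set, so a one-bit check is enough to tell prefetched pages from the other two states.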
68 Checking if data is in a prefetched page Page Redirection Table (PRT) ⇒ Hash Table storing tags of prefetched data
69 Page Redirection Table Entry D : Max number of pages that can be present in DRAM Cache
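A lookup through the Page Redirection Table can be sketched as a hash-table probe keyed by NVM page number. This is a simplified software model under stated assumptions: the class name `PageRedirectionTable`, the use of a Python `dict` as the hash table, and the idea that each entry maps an NVM page to the DRAM Cache page holding its prefetched copy are illustrative and not taken from the slides.

```python
PAGE_SIZE = 4096  # bytes, from the 4KB page size on the slides


class PageRedirectionTable:
    """Hypothetical model: NVM page number -> DRAM Cache page with the prefetched copy."""

    def __init__(self):
        self.table = {}  # a dict is a hash table keyed by the NVM page number (the tag)

    def insert(self, nvm_page, dram_page):
        """Record that nvm_page has been prefetched into dram_page."""
        self.table[nvm_page] = dram_page

    def remove(self, nvm_page):
        """Forget a prefetched page, for example when it is evicted."""
        self.table.pop(nvm_page, None)

    def translate(self, nvm_addr):
        """Return the DRAM Cache byte address on a hit, or None on a miss."""
        nvm_page, offset = divmod(nvm_addr, PAGE_SIZE)
        dram_page = self.table.get(nvm_page)
        if dram_page is None:
            return None  # miss: fall back to the regular Alloy Cache lookup
        return dram_page * PAGE_SIZE + offset


prt = PageRedirectionTable()
prt.insert(nvm_page=5, dram_page=262143)
print(prt.translate(5 * PAGE_SIZE + 100))
print(prt.translate(6 * PAGE_SIZE))
```

A miss in the table does not mean the data is absent from the DRAM Cache; in this sketch it only means the request is not served from a prefetched page.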
73 Outline of the Presentation ● Background ● Insights ● Prefetcher Design ● Evaluation ● Future Work
74 Evaluation ZSim + NVMain ⇒ 1 GB Alloy Cache, 64 GB Phase Change Memory ⇒ 8 core, 2.6 GHz processor ⇒ Use CACTI for access latency of structures ⇒ PARSEC benchmark
76 Evaluation Sequential access behavior
77 Evaluation 1.5✕ to 4✕ improvement
79 Evaluation 7✕ speedup
80 Evaluation 16-40% higher IPC
81 Outline of the Presentation ● Background ● Insights ● Prefetcher Design ● Evaluation ● Future Work
82 Future Work ⇒ Evaluate our prefetcher on memory-intensive SPEC workloads ⇒ Evaluate our prefetcher on graph workloads with irregular memory access patterns ⇒ Compare with similar recent work
83 Key Takeaways ● Prefetch at page granularity to exploit page-level spatial locality. ● Place prefetched pages in the DRAM Cache to improve its utilization. ● We observe a 16-40% increase in IPC on PARSEC. Contact Us: gohil.varun@iitgn.ac.in, manu.awasthi@ashoka.edu.in