Graph Prefetching Using Data Structure Knowledge
Sam Ainsworth and Timothy M. Jones
Computer Laboratory
Graph500 Search Performance
Current Prefetching Techniques
● Stride
● Software
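To illustrate the "Stride" bullet, here is a minimal sketch of a stride predictor, written as a software model rather than hardware. The function name `stride_predict`, the base addresses and the 8-byte element size are assumptions made for the example; the work-list values 5, 4, 1 are borrowed from the look-ahead slide.

```python
def stride_predict(addresses, degree=1):
    """Predict the next `degree` addresses if the last two strides agree.

    Returns an empty list when no constant, non-zero stride is visible.
    """
    if len(addresses) < 3:
        return []
    s1 = addresses[-1] - addresses[-2]
    s2 = addresses[-2] - addresses[-3]
    if s1 != s2 or s1 == 0:
        return []
    return [addresses[-1] + s1 * k for k in range(1, degree + 1)]


# Sequential work-list reads (hypothetical base 0x1000, 8-byte entries)
# have a constant stride, so they are predictable.
work_list_addrs = [0x1000, 0x1008, 0x1010]

# Vertex-list reads are indexed by the work-list *values* (5, 4, 1 here,
# hypothetical base 0x2000), so the stride varies and prediction fails.
vertex_list_addrs = [0x2000 + 8 * v for v in (5, 4, 1)]

print(stride_predict(work_list_addrs, degree=2))
print(stride_predict(vertex_list_addrs))
```

The second call returns nothing, which is the gap that a prefetcher with data structure knowledge is meant to close.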
Exploit Look-ahead!
[Figure: example contents of the Work List, Vertex List, Edge List and Visited arrays]
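The look-ahead idea can be sketched as a small software model: while the core reads one work-list entry, the entries further along already determine which vertex, edge and visited elements will be needed. The sketch below assumes a compressed sparse row layout (vertex list holding offsets into the edge list), which the slides do not state explicitly; the function name and the tiny graph are hypothetical.

```python
def lookahead_targets(work_list, vertex_list, edge_list, i, distance):
    """Indices that could be prefetched while the core reads work_list[i].

    Assumes a CSR layout: the edges of vertex v are
    edge_list[vertex_list[v]:vertex_list[v + 1]].
    Returns None when the look-ahead runs past the end of the work list.
    """
    j = i + distance
    if j >= len(work_list):
        return None
    v = work_list[j]                      # future vertex to expand
    start, end = vertex_list[v], vertex_list[v + 1]
    neighbours = edge_list[start:end]     # vertices whose visited flag is read
    return {
        "work_list": j,
        "vertex_list": [v, v + 1],
        "edge_list": list(range(start, end)),
        "visited": list(neighbours),
    }


# Hypothetical 4-vertex graph: 0 -> {1, 2}, 1 -> {3}, 2 -> {0, 3}, 3 -> {}
vertex_list = [0, 2, 3, 5, 5]
edge_list = [1, 2, 3, 0, 3]
work_list = [0, 1, 2]

print(lookahead_targets(work_list, vertex_list, edge_list, i=0, distance=2))
```

Each step in the chain depends on the data returned by the previous one, which is why the later prefetches can only be issued once the earlier prefetched data arrives.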
Problems
● Need address bounds of data structures
  ● Configure them in software!
● Need to schedule prefetches
  ● Use observation hardware: EWMAs.
● Need to react to variable latency loads
  ● React to arrival of prefetches, not loads!
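As a rough sketch of the scheduling idea, the model below keeps two exponentially weighted moving averages (EWMAs) and derives a look-ahead distance from their ratio. The smoothing factor, the rounding, the clamp and all names are assumptions for illustration; the slides only name a work-list time EWMA, a data time EWMA and a ratio register.

```python
import math


class Ewma:
    """Exponentially weighted moving average of observed times."""

    def __init__(self, alpha=0.125):
        self.alpha = alpha      # assumed smoothing factor
        self.value = None       # no observation yet

    def update(self, sample):
        if self.value is None:
            self.value = float(sample)
        else:
            self.value += self.alpha * (sample - self.value)
        return self.value


def lookahead_distance(work_time, data_time, max_distance=64):
    """Work-list entries to run ahead by: time to fetch the data for one
    entry divided by the time between work-list reads (assumed policy)."""
    if work_time.value is None or data_time.value is None or work_time.value <= 0:
        return 1
    ratio = math.ceil(data_time.value / work_time.value)
    return max(1, min(max_distance, ratio))


work_time = Ewma()
data_time = Ewma()
for cycles in (10, 10, 10):     # hypothetical gaps between work-list reads
    work_time.update(cycles)
for cycles in (200, 200):       # hypothetical prefetch chain latencies
    data_time.update(cycles)

print(lookahead_distance(work_time, data_time))
```

With these made-up timings the prefetcher would aim about 20 work-list entries ahead; slower memory or a faster core would push the distance up.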
Graph Prefetcher
[Figure: block diagram of the core, Dcache, DTLB, L2 cache and main memory, with the graph prefetcher (EWMA calculator, address generator, prefetch request queue) receiving snoops, prefetched data and prefetcher config for the work list, vertex list, edge list and visited list]
[Figure: later step of the Work List, Vertex List, Edge List and Visited example, with more entries filled in]
Graph Prefetcher: Microarchitecture
[Figure: microarchitecture diagram with the following labelled parts]
● Input from L1 cache: snoops & prefetched data
● Address bounds registers: Work List Start/End, Vertex List Start/End, Edge List Start/End, Visited List Start/End
● Address filter
● EWMA unit: Work List Time EWMA, Data Time EWMA, Ratio Register
● Prefetch address generator
● Prefetch request queue, output to DTLB & L1 cache
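The address filter can be pictured as a range check against the bounds registers: each snooped or prefetched address is matched to the data structure it falls in, which then decides the next prefetch. This is a minimal software sketch; the `Bounds` type, the `classify` function and all addresses are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Bounds(NamedTuple):
    """One pair of start/end registers, configured by software."""
    name: str
    start: int
    end: int  # exclusive upper bound (an assumption of this sketch)


def classify(address: int, bounds: List[Bounds]) -> Optional[str]:
    """Return the name of the structure containing `address`, or None."""
    for b in bounds:
        if b.start <= address < b.end:
            return b.name
    return None


# Hypothetical register contents
registers = [
    Bounds("work_list", 0x1000, 0x2000),
    Bounds("vertex_list", 0x2000, 0x3000),
    Bounds("edge_list", 0x3000, 0x5000),
    Bounds("visited_list", 0x5000, 0x5400),
]

print(classify(0x2008, registers))   # falls in the vertex list range
print(classify(0x9000, registers))   # unrelated access, ignored
```

Addresses outside every range are simply dropped, so unrelated loads do not trigger prefetches.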
Results – Graph500
Results – Boost Graph Library
Results – Sequential Iteration
Generalized Prefetching - Databases
[Figure: a key list feeding a bucket hash table; labels include Key, Hash(43) = 3, Bucket (43, 2, ptr) and a null-terminated bucket chain]
● Lookahead by striding in the key list
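The hash table case follows the same pattern as the graph case: stride ahead in the key list, hash the future key, and prefetch its bucket. The sketch below is an illustration only. The hash (XOR, shift, modulo) is a stand-in loosely inspired by the "Hash XOR" and "Shift Amount" register names, not the authors' function, and the key order and parameters are assumptions chosen so that key 43 maps to bucket 3 as on the slide.

```python
def bucket_index(key, xor_mask, shift, num_buckets):
    """Hypothetical hash: XOR with a mask, shift right, reduce modulo."""
    return ((key ^ xor_mask) >> shift) % num_buckets


def lookahead_bucket(key_list, i, distance, xor_mask, shift, num_buckets):
    """Bucket to prefetch while the core is processing key_list[i].

    Returns None when the look-ahead runs past the end of the key list.
    """
    j = i + distance
    if j >= len(key_list):
        return None
    return bucket_index(key_list[j], xor_mask, shift, num_buckets)


# Key values taken from the slide's labels; their order is assumed.
key_list = [12, 43, 62, 13, 87]

# Assumed configuration: no XOR, no shift, 8 buckets.
print(lookahead_bucket(key_list, i=0, distance=1,
                       xor_mask=0, shift=0, num_buckets=8))
```

Once the bucket arrives, a further prefetch could follow its `ptr` field, in the same way the graph prefetcher reacts to arriving data.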
Programmable Prefetcher
[Figure: microarchitecture diagram with the following labelled parts]
● Input from L1 cache: snoops & prefetched data
● Programmable registers: Hash XOR, Shift Amount, Hash Table Start/End, Key List Start/End, Other Data
● Address filter
● EWMA unit: Work List Time EWMA, Data Time EWMA, Ratio Register
● Programmable units (an array of CPUs)
● Prefetch request queue, output to DTLB & L1 cache
Graph Prefetching Using Data Structure Knowledge
Sam Ainsworth and Timothy M. Jones
sam.ainsworth@cl.cam.ac.uk timothy.jones@cl.cam.ac.uk
For more information, see our paper from ICS 2016!