Efficiently Prefetching Complex Address Patterns
Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian
University of Utah
Chris Wilkerson, Zeshan Chishti, Seth Pugsley
*Intel Labs
Variable Length Delta Prefetcher

Prefetchers
Confirmation-Based Prefetchers
• Issue predictions after a few deltas
• High accuracy
• Short streams lose out
Immediate Prefetchers
• Aggressive
• Low accuracy
• Waste DRAM bandwidth and cache capacity
Summary: confirmation-based prefetchers are accurate; immediate prefetchers are fast.

Spatial Correlation
• Learn access (delta) patterns
• Apply patterns when similar conditions re-occur
• E.g., PC, physical address, delta patterns
Delta Patterns
• Regular delta patterns, e.g., (+1, +1, +1)…, (+2, +2, +2, +2)…
• Irregular delta patterns, e.g., (+1, +2, +3)…

Long Repeatable Streams of Irregular Deltas
Delta patterns for milc
Page Num: 479218
Deltas: 1, 9, -8, 1, 8, 1, -8, 1, 1, 7, …

Long Repeatable Streams of Irregular Deltas
Deltas: 1, 9, -8, 1, 8, 1, -8, 1, 1, 7, -1, -5, …
Cache lines: A+1, A+10, A+2, A+3, A+11, A+12, A+4, A+5, A+6, A+13, A+12, A+7, …
• Stream 1: A+1, A+2, A+3, A+4, A+5, A+6, A+7
• Stream 2: A+10, A+11, A+12, A+13
• Stride Prefetcher coverage: 5/11
• SandBox Prefetcher coverage: 9/11
[Figure annotation: Confirmation Prefetches]
Neither is perfectly timely!
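
The relationship between the cache-line sequence and the delta sequence on this slide can be checked with a few lines of Python. This is only an illustrative sketch: the function name `deltas_from_offsets` is hypothetical, and it assumes the first delta is measured from line A itself (offset 0), which is what makes the slide's first delta equal to 1.

```python
def deltas_from_offsets(offsets, start=0):
    """Return the difference between each access and the access before it."""
    deltas = []
    previous = start  # assumption: the stream starts at line A (offset 0)
    for offset in offsets:
        deltas.append(offset - previous)
        previous = offset
    return deltas

# Cache-line offsets relative to A, copied from the slide.
accesses = [1, 10, 2, 3, 11, 12, 4, 5, 6, 13, 12, 7]
deltas = deltas_from_offsets(accesses)
print(deltas)
```

The interleaving of two ascending streams is what turns two simple +1 streams into an irregular delta sequence when they are viewed together within one page.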

Variable Length Delta Prefetcher

Structure of VLDP
[Figure: Core 1 through Core 8 each have their own prefetcher, made up of a per-page Delta History Table, Delta Prediction Tables, and an Offset Prediction Table. Cache accesses feed the tables, which produce a predicted delta/offset. The cores share the last-level cache.]

Delta History Table
• Tracks deltas within a page
• Delta = Current Address - Last Address
for (i = 0; i < BIGNUM; i++) {
    a[i] = b[i] + c[i];
}
• a, b, and c can each belong to different pages
• So deltas between pages are meaningless
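
A small sketch of why deltas are tracked per page. The names `per_page_deltas`, `LINE_BYTES`, and `LINES_PER_PAGE` are hypothetical; the 64-byte line size is an assumption, and 128 lines per page is chosen only to match the 7-bit offsets shown on the Offset Prediction Table slide. The base addresses are made up so that a, b, and c land in three different pages, and each iteration is simplified to touch a new cache line.

```python
LINE_BYTES = 64        # assumed cache-line size
LINES_PER_PAGE = 128   # assumed: a 7-bit line offset within a page

def per_page_deltas(addresses):
    """Return (page, delta) for each access that has a predecessor in the same page."""
    last_offset = {}  # page number -> last line offset seen in that page
    out = []
    for addr in addresses:
        line = addr // LINE_BYTES
        page, offset = divmod(line, LINES_PER_PAGE)
        if page in last_offset:
            out.append((page, offset - last_offset[page]))
        last_offset[page] = offset
    return out

# Hypothetical base addresses placing a, b and c in three different pages.
PAGE_BYTES = LINE_BYTES * LINES_PER_PAGE
base = {"a": 10 * PAGE_BYTES, "b": 20 * PAGE_BYTES, "c": 30 * PAGE_BYTES}

# One access per array per iteration (load b, load c, store a).
trace = []
for i in range(3):
    for name in ("b", "c", "a"):
        trace.append(base[name] + i * LINE_BYTES)

result = per_page_deltas(trace)
print(result)
```

Seen globally, consecutive accesses jump between pages by thousands of lines; seen per page, every stream is a plain +1 pattern.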

Delta History Table
Entry fields: Page Num. | Last Addr. | Last Predictor | Last 4 Deltas | Num. Times Used | Last Four Prefetched Offsets

Delta Prediction Tables
• Highest priority (t=3): Deltas (3) (8b, 8b, 8b) | Pred. (8b) | Accuracy (2b)
• …
• Lowest priority (t=1): Delta (1) (8b) | Pred. (8b) | Accuracy (2b)
• 64 rows per table
• Each table is checked for a match; a MUX selects the predicted delta

Offset Prediction Table
Entry fields: First Page Offset (7b) | Pred. Offset (7b) | Accuracy (2b)
• OPT is used only to predict the second access to a page
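
One way to picture the Offset Prediction Table is as a map from the offset of the first access in a page to a predicted second offset, guarded by a 2-bit counter. The sketch below is an assumption-laden illustration, not the authors' design: the class and method names are hypothetical, and the training policy (insert with counter 1, replace the prediction when the counter reaches 0) is my guess at a reasonable rule.

```python
class OffsetPredictionTable:
    """Maps the offset of the first access to a page to a predicted second offset."""

    MAX_ACCURACY = 3  # 2-bit saturating counter

    def __init__(self):
        self.entries = {}  # first offset -> [predicted offset, accuracy]

    def predict(self, first_offset):
        """Return the predicted second offset, or None if nothing has been learned."""
        entry = self.entries.get(first_offset)
        return entry[0] if entry is not None else None

    def train(self, first_offset, second_offset):
        """Learn from the second access actually observed in a page."""
        entry = self.entries.get(first_offset)
        if entry is None:
            self.entries[first_offset] = [second_offset, 1]
        elif entry[0] == second_offset:
            entry[1] = min(entry[1] + 1, self.MAX_ACCURACY)
        else:
            entry[1] -= 1
            if entry[1] <= 0:
                # Assumed policy: replace a prediction that has lost all credit.
                entry[0], entry[1] = second_offset, 1

opt = OffsetPredictionTable()
opt.train(5, 6)           # some page: first access at offset 5, second at 6
first = opt.predict(5)    # a different page whose first access is also at offset 5
unknown = opt.predict(9)  # nothing learned for offset 9 yet
opt.train(5, 8)           # the earlier prediction turns out wrong and is replaced
second = opt.predict(5)
```

Because the key is only the first offset, a prediction can be issued on the very first access to a page, before any delta exists.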

Need for Multiple Tables
Repeating delta pattern: (1, 2, 3, 5, 2, 4)…
Table 2 (Delta → Pred.)
• 1,2 → 3
• 2,3 → 5
• 3,5 → 2
• 5,2 → 4
Table 1 (Delta → Pred.)
• 1 → 2
• 2 → 3 (50% accuracy)
• 3 → 5
• 5 → 2
Search for a delta pattern match starts from the rightmost table
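
The ambiguity on this slide can be reproduced directly. The sketch below records every successor seen after each history, which is a deliberate simplification to expose the conflict; the hardware tables shown earlier hold a single prediction and an accuracy counter per row. The helper names `learn` and `predict` are hypothetical, and searching the longer history first follows the priority order given on the Delta Prediction Tables slide.

```python
from collections import defaultdict

def learn(deltas, history_len):
    """Map each run of `history_len` deltas to the set of deltas seen right after it."""
    table = defaultdict(set)
    for i in range(history_len, len(deltas)):
        key = tuple(deltas[i - history_len:i])
        table[key].add(deltas[i])
    return table

stream = [1, 2, 3, 5, 2, 4] * 4
table1 = learn(stream, 1)
table2 = learn(stream, 2)

def predict(history):
    """Search the longer-history table first and fall back to the shorter one."""
    for table, n in ((table2, 2), (table1, 1)):
        candidates = table.get(tuple(history[-n:]))
        if candidates and len(candidates) == 1:
            return next(iter(candidates))
    return None

print(sorted(table1[(2,)]))  # a single delta of 2 has two possible successors
print(predict([5, 2]))       # two deltas of history remove the ambiguity
```

With one delta of history, a delta of 2 is followed by 3 half the time and by 4 the other half, which is the 50% accuracy noted on the slide.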

Looking farther than one Delta ahead
Repeating delta pattern: (1, 2, 3), (1, 2, 3)…
Two-delta history (Delta → Pred.)
• 1,2 → 3
• 2,3 → 1
• 3,1 → 2
One-delta history (Delta → Pred.)
• 1 → 2
• 2 → 3
• 3 → 1
[Figure annotations: Current Delta, Degree 1 Prediction]

Looking farther than one Delta ahead
Repeating delta pattern: 1, 2, 3, 1, 2, 3…
Two-delta history (Delta → Pred.)
• 1,2 → 3
• 2,3 → 1
• 3,1 → 2
One-delta history (Delta → Pred.)
• 1 → 2
• 2 → 3
• 3 → 1
[Figure annotations: Current Delta, Degree 1 Prediction, Degree 2 Prediction]
Use recursive lookup to look farther than one delta
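
Recursive lookup can be sketched as feeding each predicted delta back into the history and looking up again. The function name `predict_chain` and the starting offset of 10 are hypothetical; the table contents are the two-delta entries from the slide.

```python
from itertools import accumulate

def predict_chain(table, history, degree):
    """Follow predictions recursively: each predicted delta is appended to the
    history and used as input for the next lookup."""
    history = list(history)
    n = len(next(iter(table)))  # number of deltas in each key of this table
    predicted = []
    for _ in range(degree):
        nxt = table.get(tuple(history[-n:]))
        if nxt is None:
            break  # no matching pattern, so stop looking ahead
        predicted.append(nxt)
        history.append(nxt)
    return predicted

table2 = {(1, 2): 3, (2, 3): 1, (3, 1): 2}

predicted = predict_chain(table2, [1, 2], degree=2)

# Turn predicted deltas into prefetch targets relative to the current offset.
current_offset = 10  # hypothetical
targets = [current_offset + step for step in accumulate(predicted)]
print(predicted, targets)
```

The degree 1 prediction comes from the real history (1, 2); the degree 2 prediction comes from the speculative history (2, 3), so an early misprediction would propagate to the later ones.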

Case Study: Streaming Workloads
Repeating delta pattern: 1, 1, 1, 1, 1…
Table 2 (Delta → Pred.)
• empty
Table 1 (Delta → Pred.)
• 1 → 1
Patterns learned from one page are applied to another

Updating the Delta History Tables
Entry fields: Page Num. | Last Addr. | Last 4 Deltas | Last Predictor | Num. Used | Last 4 Prefetches
On an LLC access:
• If the page is not present, replace: evict a not-recently-used page
• If the page is present, add the delta
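
The update flow on this slide might look roughly like the following. This is a sketch under stated assumptions: the class names are hypothetical, only a subset of the entry fields is modeled (the last predictor, use count, and prefetched offsets are omitted), and the not-recently-used policy shown is one common one-bit variant rather than a confirmed detail of the design.

```python
from dataclasses import dataclass, field

@dataclass
class DHTEntry:
    page: int
    last_offset: int
    deltas: list = field(default_factory=list)  # most recent deltas, oldest first
    recently_used: bool = True

class DeltaHistoryTable:
    def __init__(self, capacity=16, history=4):
        self.capacity = capacity
        self.history = history
        self.entries = {}  # page number -> DHTEntry

    def access(self, page, offset):
        """Record an access; return the new delta, or None for a first access."""
        entry = self.entries.get(page)
        if entry is None:
            if len(self.entries) >= self.capacity:
                self._evict()
            self.entries[page] = DHTEntry(page, offset)
            return None
        delta = offset - entry.last_offset
        entry.deltas = (entry.deltas + [delta])[-self.history:]
        entry.last_offset = offset
        entry.recently_used = True
        return delta

    def _evict(self):
        """Assumed NRU policy: evict an entry whose bit is clear; if every bit
        is set, clear them all first."""
        if all(e.recently_used for e in self.entries.values()):
            for e in self.entries.values():
                e.recently_used = False
        victim = next(p for p, e in self.entries.items() if not e.recently_used)
        del self.entries[victim]

dht = DeltaHistoryTable(capacity=2)
dht.access(7, 1)
dht.access(7, 10)
dht.access(7, 2)
page7_deltas = list(dht.entries[7].deltas)
dht.access(8, 0)
dht.access(9, 0)  # table is full, so one page is evicted
```

The default capacity of 16 and the four-delta history follow the configuration and entry fields given in the slides; the small capacity in the usage lines is only there to trigger an eviction.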

Updating the Prediction Tables
Delta History Table entry: Page Num. | Last Addr. | Last 3 Deltas (B, C, D) | Last Predictor
Latest delta: E
Can the current state predict the latest delta?
• Table 1: D → F?
• Table 2: C,D → E?
• Table 3: B,C,D → E?
Outcomes:
• If the prediction is correct: increment accuracy
• If the prediction is wrong: decrement accuracy; if accuracy == 0, update + promote prediction
• If the prediction is missing: seed T1 with the prediction
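
The three outcomes on this slide (correct, wrong, missing) can be sketched as a single update function. Several details here are my reading rather than something the slide states: new entries start with accuracy 0, the stored prediction is overwritten once the counter reaches 0, and "promote" is interpreted as creating an entry with one more delta of history in the next table on any misprediction. The function name `update_tables` is hypothetical.

```python
MAX_ACC = 3  # 2-bit saturating accuracy counter

def update_tables(tables, history, latest):
    """tables[k] maps a tuple of k+1 deltas to [prediction, accuracy].
    `history` holds the deltas seen before `latest`, oldest first."""
    # Use the longest-history table that has an entry for the current state.
    for k in reversed(range(len(tables))):
        n = k + 1
        if len(history) < n:
            continue
        entry = tables[k].get(tuple(history[-n:]))
        if entry is None:
            continue
        if entry[0] == latest:
            entry[1] = min(entry[1] + 1, MAX_ACC)  # correct: increment
            return "correct"
        entry[1] = max(entry[1] - 1, 0)            # wrong: decrement
        if entry[1] == 0:
            entry[0] = latest                      # assumed: update the prediction
        # Assumed meaning of "promote": learn the pattern with a longer history.
        if k + 1 < len(tables) and len(history) >= n + 1:
            tables[k + 1][tuple(history[-(n + 1):])] = [latest, 0]
        return "wrong"
    # Missing everywhere: seed Table 1 with the latest delta.
    if history:
        tables[0][(history[-1],)] = [latest, 0]
    return "missing"

tables = [{}, {}, {}]
r1 = update_tables(tables, [1], 2)     # nothing known yet
r2 = update_tables(tables, [1], 2)     # Table 1 now predicts 2 after 1
r3 = update_tables(tables, [3, 1], 4)  # Table 1 is wrong here
print(r1, r2, r3)
```

After the third call, Table 2 holds an entry for the history (3, 1), so the next time that longer history appears it takes priority over the one-delta entry.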

Populating the Prediction Tables
• Table 1: 1 → A
• Table 2: 1,1 → B
• Table 3: 1,1,1 → C
• Sequence shown: Table 1 wrong, Table 2 wrong, pattern missing
• Each table uses NRU replacement
If a misprediction occurs, a longer delta history might be needed

Evaluation Methodology
• Simics + USIMM
• 8 RISC cores, UltraSPARC III ISA
• 3.2 GHz, 4-wide OoO, 128-entry RoB
• 32 KB I&D L1 caches, 4 cycles
• 8 MB shared (1 MB per core) L2 cache, 10 cycles
• DRAM specifications
  • 2 channels, 2 ranks per channel, 8 banks per rank
  • 800 MHz DDR3 DRAM
• SPEC 2006, NPB, and CloudSuite
• Mix1: milc, astar, lbm, libq; Mix2: xalancbmk, lbm, zeusmp, milc

VLDP Configuration
• Per-core VLDP
• 1 Offset Prediction Table, 64 entries
• 3 Delta Prediction Tables, 64 entries each
• 16-entry Delta History Table
• Only Delta Prediction Tables 2 and 3 contribute to multi-degree prefetch
Storage per core:
• Offset Prediction Table: 128 B
• Delta History Table: 222 B
• Delta Prediction Table: 648 B
• Total: 998 B/core
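
The storage figures can be partly cross-checked against the field widths shown on the table slides (7b + 7b + 2b per Offset Prediction Table row; t deltas of 8b, an 8b prediction, and a 2b counter per Delta Prediction Table row). The sketch below is only arithmetic on those published numbers; the variable names are hypothetical, and the Delta History Table is not broken down because its field widths are not given.

```python
ROWS = 64  # rows per prediction table, from the configuration slide

# Offset Prediction Table: first offset (7 b) + predicted offset (7 b) + accuracy (2 b).
opt_bytes = ROWS * (7 + 7 + 2) // 8

# Delta Prediction Tables 1..3: t deltas of 8 b + prediction (8 b) + accuracy (2 b).
dpt_field_bytes = sum(ROWS * (8 * t + 8 + 2) for t in (1, 2, 3)) // 8

# Sizes reported on the slide, in bytes.
reported = {"OPT": 128, "DHT": 222, "DPT": 648}
total = sum(reported.values())

# Bytes in the reported DPT size not explained by the listed fields.
dpt_residual = reported["DPT"] - dpt_field_bytes

print(opt_bytes, dpt_field_bytes, dpt_residual, total)
```

The Offset Prediction Table and the total match the slide exactly. The listed Delta Prediction Table fields account for 624 B of the reported 648 B; the remaining 24 B would be consistent with one extra bit per row in each table (for example, replacement state), but that is a guess, not something the slides state.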

Performance Improvement (Vs No PC)
[Figure: speedup of FDP, SBP, AMPM, and VLDP]
VLDP is
• 6% better than AMPM
• 9% better than SBP
• 17% better than FDP

Performance Improvement (Vs PC)
[Figure: speedup of SMS, GHB_PC_DC, and VLDP]
VLDP is
• 7.1% better than GHB
• 7.6% better than SMS

Coverage
[Figure: coverage of FDP, SMS, SBP, GHB_PC_DC, AMPM, and VLDP for NPB, CloudSuite, Spec2006, Spec2006-Mix, and GM]
• FDP: 16%
• GHB: 33%
• SMS: 55%
• AMPM: 49%
• SBP: 40%
• VLDP: 61%

Sensitivity to table size
[Figure: speedup as table size is varied]
• 2% increase in performance when DPT size is increased

Sensitivity to number of Delta Prediction Tables
[Figure: speedup and DRAM accesses for 1DPT_NoOPT, 1DPT+OPT, 2DPT+OPT, 3DPT+OPT, and 4DPT+OPT]
• 3DPT improves efficiency: despite a modest 1% performance improvement, it reduces DRAM requests by 3%

Conclusions
• OPT issues predictions without confirmation
• DPT recognizes irregular delta patterns
• Long delta patterns provide high accuracy
• Less than 1 KB per core overhead
• 6% better performance

Thank You