Load Value Approximation: Approaching the Ideal Memory Access Latency
Joshua San Miguel, Natalie Enright Jerger
Chip Multiprocessor
[Diagram: cores with private caches connected through shared caches and a network-on-chip to main memory; a private cache miss must traverse the hierarchy all the way to main memory.]
Approximate Data
Many applications can tolerate inexact data values. In approximate computing applications, 40% to nearly 100% of the memory data footprint can be approximated [Sampson, MICRO 2013].
Approximate data storage:
• Reducing SRAM power by lowering the supply voltage [Flautner, ISCA 2002].
• Reducing DRAM power by lowering the refresh rate [Liu, ASPLOS 2011].
• Improving PCM performance and lifetime by lowering write precision and reusing failed cells [Sampson, MICRO 2013].
Outline
• Load Value Approximation
• Approximator Design
• Evaluation
• Conclusion
Load Value Approximation
[Diagram: each core has a private cache and an approximator; the private caches connect through shared caches and a network-on-chip to main memory.]
1. A load of address A misses in the private cache.
2. The approximator generates an approximate value, A_approx.
3. A_approx is returned to the core immediately, and execution continues.
4. The actual value, A_actual, is requested from main memory in the background.
5. When A_actual arrives, it is used to train the approximator.
Load value approximation takes the memory access off the critical path.
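To make the sequence concrete, here is a minimal sketch of the load path in Python. The core, private_cache, approximator, and memory_hierarchy objects and their method names (lookup, predict, train, request) are hypothetical stand-ins for illustration, not a real simulator API.

```python
# Sketch of the load path described in the walkthrough above (illustrative only).
def load(core, addr, pc):
    hit, value = core.private_cache.lookup(addr)
    if hit:
        return value                           # normal hit path: nothing changes

    # Miss: return an approximate value immediately so the core keeps executing.
    a_approx = core.approximator.predict(pc)

    # Fetch the real value in the background, off the critical path.
    def on_fill(a_actual):
        core.private_cache.fill(addr, a_actual)
        core.approximator.train(pc, a_actual)  # train with A_actual; no rollback

    core.memory_hierarchy.request(addr, callback=on_fill)
    return a_approx
```

The key point is that the core never waits for A_actual and is never rolled back when it arrives; the real value only updates the cache and trains the approximator.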
Approximator Design
[Diagram: on a load of A, the load instruction's PC is hashed (XORed) with the recent values in a global history buffer (e.g., 1.0, 2.2, 3.1) to index a tagged approximator table; the matching entry's local history buffer holds the last actual values (e.g., 4.1, 3.9, 4.0), which are averaged to produce A_approx = (4.1 + 3.9 + 4.0) / 3 = 4.0.]
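Below is a minimal functional sketch of the structure shown above, written in Python. The table size, history depths, and the quantization used for hashing are illustrative assumptions rather than the paper's parameters, and tag matching is omitted for brevity; the actual design is a hardware table, so this is only a behavioral model.

```python
from collections import deque

class LoadValueApproximator:
    """Functional sketch: GHB + PC indexes a table of local history buffers."""

    def __init__(self, table_size=1024, ghb_depth=2, lhb_depth=3):
        self.ghb = deque(maxlen=ghb_depth)        # global history of recent load values
        self.table = [deque(maxlen=lhb_depth)     # per-entry local history buffers
                      for _ in range(table_size)]
        self.table_size = table_size

    def _index(self, pc):
        # Index = load PC XORed with hashes of the (quantized) recent values.
        idx = pc
        for v in self.ghb:
            idx ^= hash(round(v, 1))              # assumed quantization for hashing
        return idx % self.table_size

    def predict(self, pc):
        lhb = self.table[self._index(pc)]
        if not lhb:
            return 0.0                            # no history yet: default value
        return sum(lhb) / len(lhb)                # A_approx = mean of the local history

    def train(self, pc, actual):
        # Called when A_actual returns from memory, off the critical path.
        self.table[self._index(pc)].append(actual)
        self.ghb.append(actual)
```

On a private cache miss, predict(pc) supplies A_approx; when A_actual returns, train(pc, a_actual) updates both histories. With a local history of 4.1, 3.9, 4.0, predict returns 4.0, matching the example in the diagram.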
Approximator Design
Load value approximators overcome the challenges of traditional value predictors:
• No complexity of tracking speculative values.
• No rollbacks.
• High accuracy and coverage with floating-point values.
• More tolerant of value delay.
Evaluation
EnerJ framework [Sampson, PLDI 2011]:
• Program annotations distinguish approximate data from precise data.
• Evaluate final output error and approximator coverage.

benchmark   GHB size   LHB size   approximator size
fft         0          2          49 kB
lu          3          1          32 kB
raytracer   1          1          32 kB
smm         5          1          32 kB
sor         0          2          49 kB
Evaluation
[Chart: final output error and approximator coverage (0% to 100%) for fft, lu, raytracer, smm, and sor.]
Conclusion
Low-error, high-coverage approximators allow us to approach the ideal memory access latency.
Future work:
• Further explore the approximator design space (dynamic/hybrid schemes, machine learning).
• Measure the speedup of load value approximation using full-system simulations.
• Measure power savings (low-power caches/NoCs/memory for approximate data).
Thank you
[Image: raytracer output rendered with the precise baseline vs. with load value approximation.]