
Load Value Approximation: Approaching the Ideal Memory Access Latency, by Joshua San Miguel and Natalie Enright Jerger



  1. Load Value Approximation: Approaching the Ideal Memory Access Latency
     Joshua San Miguel, Natalie Enright Jerger

  2. Chip Multiprocessor
     [Diagram: three cores, each with a private cache, connected through the shared caches and network-on-chip to main memory. One private cache is marked with a miss.]

  3. Approximate Data
     Many applications can tolerate inexact data values.
     • In approximate computing applications, 40% to nearly 100% of the memory data footprint can be approximated [Sampson, MICRO 2013].
     Approximate data storage:
     • Reducing SRAM power by lowering supply voltage [Flautner, ISCA 2002].
     • Reducing DRAM power by lowering refresh rate [Liu, ASPLOS 2011].
     • Improving PCM performance and lifetime by lowering write precision and reusing failed cells [Sampson, MICRO 2013].

  4. Outline
     • Load Value Approximation
     • Approximator Design
     • Evaluation
     • Conclusion

  5. Load Value Approximation
     [Diagram: baseline chip multiprocessor with three cores, three private caches, shared caches and network-on-chip, and main memory.]

  6. Load Value Approximation
     [Diagram: an approximator is added alongside each private cache.]

  7. Load Value Approximation
     [Diagram: a load of A misses in one private cache.]

  8. Load Value Approximation
     [Diagram: the approximator generates A_approx.]

  9. Load Value Approximation
     [Diagram: A_approx is supplied to the core.]

  10. Load Value Approximation
     [Diagram: A_actual is requested from the shared caches and main memory.]

  11. Load Value Approximation
     [Diagram: the approximator is trained with A_actual.]

  12. Load Value Approximation
     Takes memory access off critical path.
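
     The miss sequence in slides 5 to 12 can be sketched roughly as follows. This is only an illustration: the class and method names are hypothetical, a trivial last-value approximator stands in for the real design, and installing A_actual into the private cache when it arrives is an assumption the slides do not confirm.

```python
from collections import deque


class LastValueApproximator:
    """Stand-in approximator (assumption): replays the last actual value seen."""

    def __init__(self, initial=0.0):
        self.last = initial

    def approximate(self):
        return self.last

    def train(self, actual):
        self.last = actual


class PrivateCache:
    """Hypothetical private cache with an attached approximator."""

    def __init__(self, memory, approximator):
        self.memory = memory          # dict modelling shared caches + main memory
        self.lines = {}               # address -> value held in this cache
        self.approximator = approximator
        self.pending = deque()        # misses whose actual value is still in flight

    def load(self, address):
        if address in self.lines:     # hit: return the precise value
            return self.lines[address]
        # Miss: hand the core an approximate value right away and
        # request the actual value off the critical path.
        self.pending.append(address)
        return self.approximator.approximate()

    def drain(self):
        """Model the actual values arriving later: train, then fill the cache."""
        while self.pending:
            address = self.pending.popleft()
            actual = self.memory[address]
            self.approximator.train(actual)
            self.lines[address] = actual   # assumption: the block is installed


memory = {0x10: 4.0, 0x18: 4.2}
cache = PrivateCache(memory, LastValueApproximator(initial=0.0))
first = cache.load(0x10)    # miss, untrained approximator
cache.drain()               # A_actual arrives and trains the approximator
second = cache.load(0x18)   # miss, approximated from the trained state
third = cache.load(0x10)    # hit on the installed block
```

     In this sketch the core never waits on `drain()`, which is the point of slide 12: the fetch of the actual value only affects later approximations, not the load that missed.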

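  13. Approximator Design
     [Diagram: on a load of A, a hash function ℎ combines the instruction address (PC) with the values in the global history buffer (1.0, 2.2, 3.1) as PC ⊕ 1.0 ⊕ 2.2 ⊕ 3.1 to select a tagged entry in the approximator table. The selected entry holds a local history buffer (4.1, 3.9, 4.0). A function 𝑔 computes (4.1 + 3.9 + 4.0) / 3, giving A_approx = 4.0.]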
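
     A minimal Python sketch of the lookup in slide 13, under several assumptions: ⊕ is taken to be XOR over the 64-bit patterns of the history values, 𝑔 is the plain average shown on the slide, and the table is an unbounded dictionary keyed by the full hash rather than a finite table with tags. The class name, method names, and PC values are hypothetical.

```python
import struct
from collections import deque


def float_bits(value):
    """Reinterpret a float as a 64-bit unsigned integer so it can be XORed."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


class LoadValueApproximator:
    def __init__(self, ghb_size=3, lhb_size=3):
        # Global history buffer: the most recent values loaded by any instruction.
        self.ghb = deque([0.0] * ghb_size, maxlen=ghb_size)
        self.lhb_size = lhb_size
        self.table = {}   # hash -> local history buffer (deque of past values)

    def _hash(self, pc):
        # h: XOR the instruction address with every value in the GHB.
        h = pc
        for value in self.ghb:
            h ^= float_bits(value)
        return h

    def approximate(self, pc):
        """Return the average of the matching LHB, or None if no entry exists."""
        lhb = self.table.get(self._hash(pc))
        if not lhb:
            return None
        return sum(lhb) / len(lhb)   # g: average of the local history

    def observe(self, pc, actual):
        """Train the entry selected by the current context, then update the GHB."""
        lhb = self.table.setdefault(self._hash(pc), deque(maxlen=self.lhb_size))
        lhb.append(actual)
        self.ghb.append(actual)


# Replay a pattern in which three loads return 1.0, 2.2, 3.1 and the load of A
# then returns 4.1, 3.9, and 4.0 on successive iterations.
PC_B, PC_C, PC_D, PC_A = 0x400, 0x404, 0x408, 0x40C
approximator = LoadValueApproximator(ghb_size=3, lhb_size=3)
for a_actual in (4.1, 3.9, 4.0):
    approximator.observe(PC_B, 1.0)
    approximator.observe(PC_C, 2.2)
    approximator.observe(PC_D, 3.1)
    approximator.observe(PC_A, a_actual)

# Recreate the slide's context (GHB = 1.0, 2.2, 3.1) and approximate A.
approximator.observe(PC_B, 1.0)
approximator.observe(PC_C, 2.2)
approximator.observe(PC_D, 3.1)
a_approx = approximator.approximate(PC_A)   # average of 4.1, 3.9, 4.0
unseen = approximator.approximate(0x500)    # no matching entry
```

     With the GHB holding 1.0, 2.2, and 3.1, the entry for A contains the local history 4.1, 3.9, 4.0, so `a_approx` comes out at about 4.0, matching the slide's worked numbers.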

  14. Approximator Design
     Load value approximators overcome the challenges of traditional value predictors:
     • No complexity of tracking speculative values.
     • No rollbacks.
     • High accuracy/coverage with floating-point values.
     • More tolerant to value delay.

  15. Evaluation
     EnerJ framework [Sampson, PLDI 2011]:
     • Program annotations to distinguish approximate data from precise data.
     • Evaluate final output error and approximator coverage.

     benchmark    GHB size    LHB size    approximator size
     fft          0           2           49 kB
     lu           3           1           32 kB
     raytracer    1           1           32 kB
     smm          5           1           32 kB
     sor          0           2           49 kB

  16. Evaluation
     [Chart: output error and approximator coverage, on a 0% to 100% axis, for fft, lu, raytracer, smm, and sor.]

  17. Conclusion
     Future work:
     • Further explore approximator design space (dynamic/hybrid schemes, machine learning).
     • Measure speedup of load value approximation using full-system simulations.
     • Measure power savings (low-power caches/NoCs/memory for approximate data).
     Low-error, high-coverage approximators allow us to approach the ideal memory access latency.

  18. Thank you
     [Images: raytracer output for the baseline (precise) and for load value approximation.]
