load value approximation



  1. Load Value Approximation. Joshua San Miguel, Mario Badr, Natalie Enright Jerger

  2. Accessing Memory. [Diagram: processor core; L1 cache; shared caches, directory, network-on-chip; main memory]

  3. Accessing Memory. [Diagram: same hierarchy, with a miss at the L1 cache]

  4. Accessing Memory. Accessing main memory costs 10x to 100x more latency and energy than accessing the L1 cache! [Diagram: same hierarchy, with a miss at the L1 cache]

  5. Accessing Memory. Accessing main memory costs 10x to 100x more latency and energy than accessing the L1 cache! Higher efficiency via Approximate Computing… [Diagram: same hierarchy, with a miss at the L1 cache]

  6. Approximate Computing. Not all computations need to be precise. Example domains: data mining, computer vision, audio and video processing, gaming, machine learning, dynamical simulation. Image sources: http://www.zentut.com/ http://www.cc.gatech.edu/~cnieto6/ http://themusicparlour.blogspot.ca/ http://www.businessweek.com/ http://www.analyticbridge.com/ http://www.scientific-computing.com/

  7. Approximate Computing. [Figure: execution time, energy]

  8. Approximate Computing. [Figure: execution time, energy, error]

  9. Approximate Computing. [Figure: execution time, energy, error]

  10. Approximate Computing. Many applications can tolerate approximate data. • 40% to nearly 100% of the data footprint is approximate [Sampson, MICRO 2013].

  11. Approximate Computing. Many applications can tolerate approximate data. • 40% to nearly 100% of the data footprint is approximate [Sampson, MICRO 2013]. Approximate value locality: • Many data values are similar to, or can be approximated from, previously seen values.

  12. Outline • Load Value Approximation • Non-Speculative Operation • Approximator Design • Relaxed Confidence Windows • Approximation Degree • Methodology • Evaluation

  13. Load Value Approximation. [Diagram: processor core; L1 cache; shared caches, directory, network-on-chip; main memory]

  14. Load Value Approximation. [Diagram: same hierarchy, with an approximator next to the L1 cache]

  15. Load Value Approximation. [Diagram step: load A misses in the L1 cache; the approximator is asked for A ("A?")]

  16. Load Value Approximation. [Diagram step: the approximator generates A_approx]

  17. Load Value Approximation. [Diagram step: A_approx is returned]

  18. Load Value Approximation. No speculation, no rollbacks. [Diagram step: A_approx is returned]

  19. Load Value Approximation. [Diagram step: A_approx is returned]

  20. Load Value Approximation. [Diagram step: fetch A_actual from main memory]

  21. Load Value Approximation. [Diagram step: train the approximator with A_actual]

  22. Load Value Approximation. Learns past values. Estimates future values. Improves performance and saves energy.

  23. Approximator Design. [Diagram: the instruction address and the global history buffer (GHB) feed a function h that selects an approximator table entry; each entry holds tag, conf, degree, and a local history buffer (LHB); a function g is applied to the LHB]

  24. Approximator Design. [Diagram: as above, with a timeline; global history buffer = 1.0, 2.2, 3.1; local history buffer = 4.1, 3.9, 4.0]

  25. Approximator Design. Timeline: load miss A. [Diagram: global history buffer = 1.0, 2.2, 3.1; local history buffer = 4.1, 3.9, 4.0]

  26. Approximator Design. Timeline: load miss A. h: PC ⊕ 1.0 ⊕ 2.2 ⊕ 3.1. [Diagram: global history buffer = 1.0, 2.2, 3.1; local history buffer = 4.1, 3.9, 4.0]

  27. Approximator Design. Timeline: load miss A. h: PC ⊕ 1.0 ⊕ 2.2 ⊕ 3.1. g: (4.1 + 3.9 + 4.0) / 3, so A_approx = 4.0. [Diagram: global history buffer = 1.0, 2.2, 3.1; local history buffer = 4.1, 3.9, 4.0]

  28. Approximator Design. Timeline: load miss A, do_work(A_approx). [Diagram: global history buffer = 1.0, 2.2, 3.1; local history buffer = 4.1, 3.9, 4.0]

  29. Approximator Design. Timeline: load miss A, do_work(A_approx), request(A_actual). [Diagram: global history buffer = 1.0, 2.2, 3.1; local history buffer = 4.1, 3.9, 4.0]

  30. Approximator Design. Timeline: load miss A, do_work(A_approx), request(A_actual), A_actual = 4.2. [Diagram: global history buffer = 1.0, 2.2, 3.1; local history buffer = 4.1, 3.9, 4.0]

  31. Approximator Design. Timeline: load miss A, do_work(A_approx), request(A_actual), A_actual = 4.2. [Diagram: after the update, global history buffer = 2.2, 3.1, 4.2; local history buffer = 3.9, 4.0, 4.2]
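
The walk-through on slides 23 to 31 can be sketched in Python as follows. This is only an illustrative sketch, not the authors' hardware design: the class name `Approximator`, the helper `float_bits`, the table size, the use of 32-bit float bit patterns in the XOR, and the assumption that no other load trains the approximator between `approximate` and `train` are all choices made for this example. Only the XOR of the PC with the global history values and the average over the local history buffer come from the slides.

```python
import struct
from collections import deque


def float_bits(x: float) -> int:
    """Reinterpret a value as a 32-bit IEEE-754 pattern so it can be XORed (assumed encoding)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


class Approximator:
    """Hypothetical software model of the approximator table (tag/conf/degree omitted)."""

    def __init__(self, table_bits: int = 10, history_len: int = 3):
        self.table_bits = table_bits          # assumed table size: 2**table_bits entries
        self.history_len = history_len        # 3 values, as in the slide example
        self.ghb = deque(maxlen=history_len)  # global history buffer
        self.table = {}                       # index -> local history buffer (LHB)

    def _index(self, pc: int) -> int:
        # h: PC xor each value currently in the global history buffer.
        h = pc
        for value in self.ghb:
            h ^= float_bits(value)
        return h & ((1 << self.table_bits) - 1)

    def approximate(self, pc: int):
        # g: average of the local history buffer; None if the entry has no history yet.
        lhb = self.table.get(self._index(pc))
        if not lhb:
            return None
        return sum(lhb) / len(lhb)

    def train(self, pc: int, actual: float) -> None:
        # Index with the GHB as it was at approximation time, then push A_actual
        # into both the selected LHB and the GHB (oldest values fall out).
        index = self._index(pc)
        self.table.setdefault(index, deque(maxlen=self.history_len)).append(actual)
        self.ghb.append(actual)


# Reproduce the slide example: GHB = 1.0, 2.2, 3.1 and LHB = 4.1, 3.9, 4.0.
approximator = Approximator()
approximator.ghb.extend([1.0, 2.2, 3.1])
pc = 0x400123                                  # arbitrary example instruction address
entry = approximator._index(pc)
approximator.table[entry] = deque([4.1, 3.9, 4.0], maxlen=3)

a_approx = approximator.approximate(pc)        # average of the LHB, about 4.0
approximator.train(pc, 4.2)                    # A_actual arrives later
```

After training, the buffers in this sketch hold the same values as on slide 31: the global history buffer is 2.2, 3.1, 4.2 and the selected local history buffer is 3.9, 4.0, 4.2.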

  32. Approximator Design – Other Considerations • Floating-point precision • History buffer sizes • Stale values. More details in paper.

  33. Approximator Design. Relaxed Confidence Windows: • How do we avoid making bad approximations? • Trade-off between performance and error. Approximation Degree: • Do we need to fetch the actual value from memory every time? • Trade-off between energy and error.

  34. Relaxed Confidence Windows. Timeline: load miss A, do_work(A_approx), request(A_actual). A_approx = 4.0. [Diagram: global history buffer = 1.0, 2.2, 3.1; local history buffer = 4.1, 3.9, 4.0]

  35. Relaxed Confidence Windows. Timeline: load miss A, do_work(A_approx), request(A_actual), A_actual = 9.0! A_approx = 4.0. [Diagram: global history buffer = 1.0, 2.2, 3.1; local history buffer = 4.1, 3.9, 4.0]

  36. Relaxed Confidence Windows. Table entry: tag, conf, degree, LHB. When approximating: if conf >= 0, use A_approx; else, don't use A_approx. When updating: if A_approx and A_actual differ by <= CONF_WINDOW%, conf++; else, conf--.
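
The two rules on slide 36 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the slide does not say what the percentage is measured against, so the difference is taken relative to A_actual here, and the saturating bounds `CONF_MIN` and `CONF_MAX` are hypothetical values chosen for the example. The function names are also illustrative.

```python
import math

CONF_WINDOW = 0.10            # 10%, one of the settings swept on the slides
CONF_MIN, CONF_MAX = -8, 7    # assumed saturating-counter bounds (not given on the slides)


def within_window(approx: float, actual: float, window: float = CONF_WINDOW) -> bool:
    """True if approx differs from actual by at most `window`, as a fraction of |actual| (assumption)."""
    if math.isinf(window):
        return True           # the "infinite" window accepts every approximation
    return abs(approx - actual) <= window * abs(actual)


def should_use(conf: int) -> bool:
    """When approximating: use A_approx only if conf >= 0."""
    return conf >= 0


def update_conf(conf: int, approx: float, actual: float, window: float = CONF_WINDOW) -> int:
    """When updating: conf++ if the approximation was within the window, else conf--."""
    if within_window(approx, actual, window):
        return min(conf + 1, CONF_MAX)
    return max(conf - 1, CONF_MIN)


# Slide 30 case: A_approx = 4.0 vs A_actual = 4.2 is within 10%, so confidence rises.
conf_good = update_conf(0, 4.0, 4.2)
# Slide 35 case: A_approx = 4.0 vs A_actual = 9.0 is far outside 10%, so confidence drops.
conf_bad = update_conf(0, 4.0, 9.0)
```

With a window of 0%, only exact matches raise confidence; with an infinite window, confidence never drops, so the approximation is always used.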

  37. Relaxed Confidence Windows – Output Error. [Figure: output error (0% to 100%) for CONF_WINDOW% = 0%, 5%, 10%, 20%, infinite]

  38. Relaxed Confidence Windows – L1-D MPKI. [Figure: normalized L1-D MPKI (0.0 to 1.0) for CONF_WINDOW% = 0%, 5%, 10%, 20%, infinite]

  39. Approximator Design. Relaxed Confidence Windows: • How do we avoid making bad approximations? • Trade-off between performance and error. Approximation Degree: • Do we need to fetch the actual value from memory every time? • Trade-off between energy and error.

  40. Approximation Degree. Timeline: load miss A, do_work(A_approx), request(A_actual). A_approx = 4.0. [Diagram: global history buffer = 1.0, 2.2, 3.1; local history buffer = 4.1, 3.9, 4.0]

  41. Approximation Degree. Timeline: load miss A, do_work(A_approx), request(A_actual), A_actual = 4.0. A_approx = 4.0. [Diagram: global history buffer = 1.0, 2.2, 3.1; local history buffer = 4.1, 3.9, 4.0]

  42. Approximation Degree. Timeline: load miss A, do_work(A_approx), request(A_actual), A_actual = 4.0. A_approx = 4.0. [Diagram: global history buffer = 1.0, 2.2, 3.1; local history buffer = 4.1, 3.9, 4.0]

  43. Approximation Degree. Table entry: tag, conf, degree, LHB. When approximating: if degree == APPROX_DEGREE, fetch A_actual; else, don't fetch A_actual. When updating: if degree == APPROX_DEGREE, degree = 0; else, degree++.
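
The counter logic on slide 43 can be sketched in Python as follows. This is an illustrative sketch, assuming the degree counter starts at 0 and that the "approximating" and "updating" rules are applied once per load miss to the same table entry; the function name `count_fetches` is hypothetical.

```python
def count_fetches(num_misses: int, approx_degree: int) -> int:
    """Count how many of `num_misses` load misses fetch A_actual for one table entry."""
    degree = 0       # per-entry degree counter (assumed to start at 0)
    fetches = 0
    for _ in range(num_misses):
        if degree == approx_degree:
            fetches += 1     # when approximating: fetch A_actual
            degree = 0       # when updating: reset the counter
        else:
            degree += 1      # skip the fetch and count this approximation
    return fetches


# APPROX_DEGREE = 0 fetches on every miss; larger degrees skip more fetches.
fetches_by_degree = {d: count_fetches(100, d) for d in (0, 1, 2, 4, 8, 16)}
```

Under these assumptions, a degree of d fetches the actual value on one out of every d + 1 misses, which is the energy and error knob the slide describes.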

  44. Approximation Degree – Output Error. [Figure: output error (0% to 100%) for APPROX_DEGREE = 0, 1, 2, 4, 8, 16]

  45. Approximation Degree – L1-D Fetches. [Figure: normalized L1-D fetches (0 to 1) for APPROX_DEGREE = 0, 1, 2, 4, 8, 16]

  46. Methodology. Multi-threaded approximate applications: • PARSEC benchmark suite [Bienia, Princeton 2011] • Programmer annotations and ISA extensions [Esmaeilzadeh, ASPLOS 2012]. Approximator design space exploration: • Pin dynamic binary instrumentation tool [Luk, PLDI 2005]. Full-system simulation: • FeS2 cycle-level x86 simulator [Neelakantam, ASPLOS 2008]. Approximator, cache and memory energy consumption: • CACTI modeling tool [Thoziyoor, HP 2008].

  47. Evaluation. [Figure: application speedup and energy savings (0% to 16%) for APPROX_DEGREE = 0, 4, 16]

  48. Evaluation. Up to 28% speedup. [Figure: application speedup and energy savings (0% to 16%) for APPROX_DEGREE = 0, 4, 16]
