Compilation and Hardware Support for Approximate Acceleration

Thierry Moreau, Adrian Sampson, Andre Baixo, Mark Wyse, Ben Ransford, Jacob Nelson, Hadi Esmaeilzadeh (Georgia Tech), Luis Ceze, and Mark Oskin. University of Washington.


  1. Compilation and Hardware Support for Approximate Acceleration. Thierry Moreau, Adrian Sampson, Andre Baixo, Mark Wyse, Ben Ransford, Jacob Nelson, Hadi Esmaeilzadeh (Georgia Tech), Luis Ceze, and Mark Oskin. University of Washington. moreau@uw.edu. Theme: 2384.004

  2. Approximate Computing: aims to exploit application resilience to trade off quality for efficiency.

  4. Approximate Computing: one option is ✅ accurate but ❌ expensive; the other is ✅ approximate but ✅ cheap.

  8. Neural Networks as Approximate Accelerators [diagram: CPU]. Esmaeilzadeh et al. [MICRO 2012]

  12. Neural Acceleration: a code region such as float foo (float a, float b) { … return val; } goes through approximation and then acceleration on an NPU.
      • Compiler support: ACCEPT (Sampson et al. [UW-TR])
      • Hardware support: SNNAP (Moreau et al. [HPCA2015])
      • 3.8x speedup and 2.8x efficiency, 10% error

  13. Talk Outline: Introduction; Compiler Support with ACCEPT; SNNAP Accelerator Design; Evaluation & Comparison with HLS

  19. Compilation Overview
      1. Region detection & program instrumentation: ACCEPT runs region detection on the code and its annotations.
      2. ANN training & topology search: back-propagation over [training.data].
      3. Code generation: an ACCEPT transformation produces code that executes on the CPU and SNNAP.
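
Step 1 of the overview (program instrumentation feeding [training.data]) can be pictured with a small sketch. This is an illustration under assumptions, not ACCEPT's generated code: the names sample_t, training_data, n_samples and foo_instrumented are hypothetical, and the body of foo is a placeholder because the slides elide it.

```c
#include <stddef.h>

/* Hypothetical names throughout; none of this is ACCEPT's actual output. */
enum { MAX_SAMPLES = 1024 };

typedef struct {
    float in[2];  /* arguments observed at the region boundary */
    float out;    /* precise result the network should learn to reproduce */
} sample_t;

sample_t training_data[MAX_SAMPLES];
size_t n_samples = 0;

/* Placeholder for the candidate region; the slides elide its body. */
float foo(float a, float b) {
    return a * a + b;
}

/* Instrumented entry point: run the precise code, then log one sample. */
float foo_instrumented(float a, float b) {
    float val = foo(a, b);
    if (n_samples < MAX_SAMPLES) {
        training_data[n_samples].in[0] = a;
        training_data[n_samples].in[1] = b;
        training_data[n_samples].out = val;
        n_samples++;
    }
    return val;
}
```

Running the program on representative inputs with calls routed through foo_instrumented would fill training_data with input/output pairs, which is the kind of data a back-propagation trainer needs in step 2.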

  20. Programming Model

          float sobel (float* p);
          . . .
          float** src;
          float** dst;
          while (true) {
            src = read_from_camera();
            for (y=0; y < h; ++y) {
              for (x=0; x < w; ++x) {
                dst[y][x] = sobel(&src[y][x]);
              }
            }
            display(dst);
          }

  22. Programming Model, with APPROX and ENDORSE annotations. Callouts on sobel: ✅ no side effects, ✅ executes often.

          APPROX float sobel (APPROX float* p);
          . . .
          APPROX float** src;
          APPROX float** dst;
          while (true) {
            src = read_from_camera();
            for (y=0; y < h; ++y) {
              for (x=0; x < w; ++x) {
                dst[y][x] = sobel(&src[y][x]);
              }
            }
            display(ENDORSE(dst));
          }
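
The annotated fragment on the slide is not compilable on its own, so here is a minimal self-contained sketch of the same idea. Assumptions are labeled in the comments: APPROX and ENDORSE are defined as no-op stand-ins so the code builds without ACCEPT, the kernel body (|Gx| + |Gy| on a 3x3 window) and the stride parameter are illustrative choices, and sobel_frame and brightest are hypothetical helpers.

```c
/* Stand-ins so the sketch builds without ACCEPT: here the annotations
 * are no-ops and perform no checking. */
#define APPROX
#define ENDORSE(e) (e)

enum { W = 4, H = 4 };

/* Illustrative kernel (an assumption, not the deck's sobel):
 * |Gx| + |Gy| over the 3x3 window centred on p[0].
 * stride is the number of floats per image row. */
APPROX float sobel(APPROX const float *p, int stride) {
    float gx = (p[-stride + 1] + 2.0f * p[1] + p[stride + 1])
             - (p[-stride - 1] + 2.0f * p[-1] + p[stride - 1]);
    float gy = (p[stride - 1] + 2.0f * p[stride] + p[stride + 1])
             - (p[-stride - 1] + 2.0f * p[-stride] + p[-stride + 1]);
    if (gx < 0.0f) gx = -gx;
    if (gy < 0.0f) gy = -gy;
    return gx + gy;
}

/* Pure, frequently executed region: interior pixels only, borders set to 0. */
void sobel_frame(APPROX float src[H][W], APPROX float dst[H][W]) {
    for (int y = 0; y < H; ++y) {
        for (int x = 0; x < W; ++x) {
            int border = (y == 0 || x == 0 || y == H - 1 || x == W - 1);
            dst[y][x] = border ? 0.0f : sobel(&src[y][x], W);
        }
    }
}

/* Precise consumer: ENDORSE marks where approximate data enters precise code. */
float brightest(APPROX float dst[H][W]) {
    float best = 0.0f;
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x)
            if (ENDORSE(dst[y][x]) > best) best = ENDORSE(dst[y][x]);
    return best;
}
```

The two callouts on the slide map onto sobel here: it writes nothing but its return value, and it runs once per interior pixel per frame.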

  27. Checking for Quality. Inputs: the annotated program (sobel.c), a quality metric d(y, y'), and input data split into training and test sets. Reported: performance and output quality.
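
A quality metric d(y, y') compares two outputs of the same program. The sketch below shows two simple candidates, mean squared error and miss rate, both of which appear as error metrics in the evaluation slide. It assumes y is the precise output and y' the approximate one (the slide does not say which is which), and the function names are illustrative.

```c
#include <stddef.h>

/* Mean squared error between two output vectors of length n. */
double mse(const float *y, const float *y_approx, size_t n) {
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double d = (double)y[i] - (double)y_approx[i];
        acc += d * d;
    }
    return n ? acc / (double)n : 0.0;
}

/* Miss rate: fraction of yes/no decisions that differ between the runs. */
double miss_rate(const int *y, const int *y_approx, size_t n) {
    size_t misses = 0;
    for (size_t i = 0; i < n; ++i) {
        if ((y[i] != 0) != (y_approx[i] != 0)) misses++;
    }
    return n ? (double)misses / (double)n : 0.0;
}
```

Either function returns 0 when the two outputs agree exactly, so a quality threshold can be expressed as an upper bound on the returned value.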

  28. Talk Outline: Introduction; Compiler Support with ACCEPT; SNNAP Accelerator Design; Evaluation & Comparison with HLS

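  29. Background: Multi-Layer Perceptrons
      Neural network: input layer (x0 to x3), hidden layer 0 (x4 to x6), hidden layer 1 (x7 to x9), output (y0, y1).
      Computing a single neuron: $x_7 = f\left(\sum_{i=4}^{6} w_{i7} \cdot x_i\right)$, where $f$ is the activation function.
      Computing a single layer: $\begin{bmatrix} x_7 \\ x_8 \\ x_9 \end{bmatrix} = f\left(\begin{bmatrix} w_{67} & w_{57} & w_{47} \\ w_{68} & w_{58} & w_{48} \\ w_{69} & w_{59} & w_{49} \end{bmatrix} \begin{bmatrix} x_6 \\ x_5 \\ x_4 \end{bmatrix}\right)$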
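
The single-layer computation on this slide is a matrix-vector product followed by an activation function. A minimal sketch follows, under two stated assumptions: weights are stored row-major by input index (w[i * n_out + j] connects input i to output j), and the activation is a rational "fast sigmoid" stand-in chosen so the example needs no math library. The slide only names f as the activation function.

```c
/* Stand-in activation (assumption): 0.5 + 0.5 * s / (1 + |s|),
 * an S-shaped curve with range (0, 1) and activation(0) = 0.5. */
float activation(float s) {
    float a = (s < 0.0f) ? -s : s;
    return 0.5f + 0.5f * s / (1.0f + a);
}

/* out[j] = f( sum over i of w[i][j] * in[i] ), weights stored row-major. */
void layer_forward(const float *in, const float *w,
                   int n_in, int n_out, float *out) {
    for (int j = 0; j < n_out; ++j) {
        float sum = 0.0f;
        for (int i = 0; i < n_in; ++i) {
            sum += w[i * n_out + j] * in[i];
        }
        out[j] = activation(sum);
    }
}
```

With in = (x4, x5, x6) and n_in = n_out = 3, out[0] corresponds to x7 in the slide's notation, out[1] to x8, and out[2] to x9.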

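  30. Background: Systolic Arrays
      Computing a single layer: $\begin{bmatrix} x_7 \\ x_8 \\ x_9 \end{bmatrix} = f\left(\begin{bmatrix} w_{67} & w_{57} & w_{47} \\ w_{68} & w_{58} & w_{48} \\ w_{69} & w_{59} & w_{49} \end{bmatrix} \begin{bmatrix} x_6 \\ x_5 \\ x_4 \end{bmatrix}\right)$
      Systolic array diagram: inputs x6, x5, x4; three weight rows (w49 w48 w47), (w59 w58 w57), (w69 w68 w67); activation function f; outputs x7, x8, x9.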
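
One way to read the systolic-array diagram is as a chain of elements, each holding the weights of one input, with partial sums moving one element per cycle. The cycle-level simulation below is a sketch under that assumption; the slide does not spell out the dataflow, and the names systolic_layer, pipe and MAX_PE, as well as the stand-in activation, are illustrative.

```c
enum { MAX_PE = 8 };

/* Stand-in activation (assumption): 0.5 + 0.5 * s / (1 + |s|). */
float activation(float s) {
    float a = (s < 0.0f) ? -s : s;
    return 0.5f + 0.5f * s / (1.0f + a);
}

/* Simulate a 1-D chain of n_in elements. Element k holds input x[k] and the
 * weights w[k * n_out + j] for every output j. The partial sum for output j
 * enters element 0 at cycle j and leaves the last element n_in - 1 cycles
 * later. Returns the number of cycles, or -1 if n_in is out of range. */
int systolic_layer(const float *x, const float *w,
                   int n_in, int n_out, float *out) {
    float pipe[MAX_PE] = {0.0f};  /* pipe[k]: value latched after element k */
    int cycles = 0;
    if (n_in < 1 || n_in > MAX_PE) return -1;
    for (int t = 0; t < n_out + n_in - 1; ++t, ++cycles) {
        /* Walk from the last element backwards so that each element reads
         * its neighbour's value from the previous cycle. */
        for (int k = n_in - 1; k >= 0; --k) {
            int j = t - k;  /* output whose partial sum is at element k */
            if (j < 0 || j >= n_out) continue;
            float psum_in = (k == 0) ? 0.0f : pipe[k - 1];
            pipe[k] = psum_in + w[k * n_out + j] * x[k];
            if (k == n_in - 1) out[j] = activation(pipe[k]);
        }
    }
    return cycles;
}
```

For a 3-input, 3-output layer the simulation takes 3 + 3 - 1 = 5 cycles, and after the pipeline fills it retires one output per cycle.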

  35. PU Micro-Architecture: the systolic array is mapped onto a processing unit (PU).
      1 - processing elements (PEs) in DSP logic
      2 - local storage for synaptic weights
      3 - sigmoid unit implements non-linear activation functions
      4 - vertically micro-coded sequencer
      Diagram: PU control; PEs fed from weight storage (w49 w48 w47), (w59 w58 w57), (w69 w68 w67); inputs x6, x5, x4; sigmoid unit f; outputs x7, x8, x9.
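
A processing element built from DSP logic is essentially a multiply-accumulate unit. The sketch below models one in software under assumptions the slides do not state: a 16-bit signed fixed-point format with 8 fractional bits, a 32-bit accumulator, and saturation on read-out. All names (pe_t, pe_mac, to_fixed, and so on) are hypothetical.

```c
#include <stdint.h>

/* Assumed number format: 16-bit signed, 8 fractional bits. */
enum { FRAC_BITS = 8 };

int16_t to_fixed(float v) { return (int16_t)(v * (float)(1 << FRAC_BITS)); }
float to_float(int16_t v) { return (float)v / (float)(1 << FRAC_BITS); }

/* Accumulator wider than the operands so that products do not overflow. */
typedef struct { int32_t acc; } pe_t;

void pe_reset(pe_t *pe) { pe->acc = 0; }

/* acc += w * x; the product carries 2 * FRAC_BITS fractional bits. */
void pe_mac(pe_t *pe, int16_t w, int16_t x) {
    pe->acc += (int32_t)w * (int32_t)x;
}

/* Rescale to FRAC_BITS fractional bits and saturate to the 16-bit range. */
int16_t pe_result(const pe_t *pe) {
    int32_t r = pe->acc / (1 << FRAC_BITS);
    if (r > INT16_MAX) r = INT16_MAX;
    if (r < INT16_MIN) r = INT16_MIN;
    return (int16_t)r;
}
```

In a full PU, the value returned by pe_result would be handed to the sigmoid unit; that step is left out here because the slides give no detail on how the sigmoid is implemented.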

  36. Multi-Processing Units: a DMA master and a scheduler connect over a bus to four PUs, each with its own control, PEs, storage, and sigmoid unit (f).

  37. CPU-SNNAP Integration
      • Coherent reads & writes with the accelerator coherency port (ACP)
      • Custom mastering interface
      • Low-latency event signaling, sleep & wakeup
      Diagram: CPU with $L1 and $L2, ACP, DMA master, scheduler, bus, SEV/WFE signaling, four PUs.

  38. Talk Outline: Introduction; Programming model; SNNAP design (• efficient neural network evaluation • low-latency communication); Evaluation & Comparison with HLS

  39. Evaluation: neural acceleration on SNNAP (8x8 configuration, clocked at 1/4 of f_CPU) vs. precise CPU execution

      application    domain           error metric
      blackscholes   option pricing   MSE
      fft            DSP              MSE
      inversek2j     robotics         MSE
      jmeint         3D-modeling      miss rate
      jpeg           compression      image diff
      kmeans         ML               image diff
      sobel          vision           image diff

  40. Whole-Application Speedup [bar chart: one bar per benchmark plus GEOMEAN; y-axis "Whole Application Speedup", 0.00 to 4.00; bar labels 10.8, 38.1, 3.8, 2.7, 2.4, 2.3, 1.5, 1.3]

  41. Energy Savings. Energy = Power * Runtime on (DRAM + SoC). [bar chart: one bar per benchmark plus GEOMEAN; y-axis "Energy Savings", 0.00 to 4.00; bar labels 7.8, 28.0, 2.8, 2.2, 1.8, 1.7, 1.1, .9; annotation +36%]
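
The GEOMEAN bars in the speedup and energy charts can be cross-checked from the bar labels. The sketch below assumes that 3.8 and 2.8 are the GEOMEAN bars (they match the headline numbers in the conclusion) and that the remaining seven labels are the per-benchmark values; the transcript does not preserve which label belongs to which benchmark, so the arrays are deliberately unlabeled. The geometric mean is found by bisection only so that the example needs no math library.

```c
#include <stddef.h>

/* Per-benchmark bar labels from the two charts, in transcript order. */
const double speedup[7] = {10.8, 38.1, 2.7, 2.4, 2.3, 1.5, 1.3};
const double energy_savings[7] = {7.8, 28.0, 2.2, 1.8, 1.7, 1.1, 0.9};

/* Geometric mean of n positive values. The result lies between the smallest
 * and largest value, so bisect on g until prod(v[i] / g) reaches 1. */
double geomean(const double *v, size_t n) {
    if (n == 0) return 0.0;
    double lo = v[0], hi = v[0];
    for (size_t i = 1; i < n; ++i) {
        if (v[i] < lo) lo = v[i];
        if (v[i] > hi) hi = v[i];
    }
    for (int it = 0; it < 100; ++it) {
        double mid = 0.5 * (lo + hi);
        double ratio = 1.0;
        for (size_t i = 0; i < n; ++i) ratio *= v[i] / mid;
        if (ratio > 1.0) lo = mid;  /* mid is below the geometric mean */
        else hi = mid;
    }
    return 0.5 * (lo + hi);
}
```

Under these assumptions, geomean(speedup, 7) and geomean(energy_savings, 7) land close to the 3.8x and 2.8x figures quoted in the talk.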

  44. Conclusion: a code region such as float foo (float a, float b) { … return val; } goes through approximation and then acceleration on an NPU.
      • Compiler support: ACCEPT
      • Hardware support: SNNAP
      • 3.8x speedup & 2.8x energy savings

  45. Compilation and Hardware Support for Approximate Acceleration. Thierry Moreau, Adrian Sampson, Andre Baixo, Mark Wyse, Ben Ransford, Jacob Nelson, Luis Ceze, and Mark Oskin. University of Washington. moreau@uw.edu. ACCEPT: http://accept.rocks SNNAP: upon request
