Moving CNN Accelerator Computations Closer to Data (PowerPoint Presentation)

  1. Moving CNN Accelerator Computations Closer to Data. Sumanth Gudaparthi, Surya Narayanan, Rajeev Balasubramonian

  2. Evolution of CNN Accelerators
     • Digital Accelerators (DianNao, DaDianNao, etc.): limited by memory bandwidth; transistor scaling (Moore's Law) is coming to an end
     • Analog in-situ Accelerators (ISAAC, PRIME, etc.): complex analog circuits, lack of flexibility
     • Digital in-situ Accelerators (DRISA): higher-cost DRAM, can't be used as host memory

  3. SRAM based In-Situ Computation Accelerator (SISCA: proposed accelerator; DA: Digital Accelerators; AIA: Analog In-situ Accelerators; DIA: Digital In-situ Accelerators)
     • DA vs SISCA: SISCA performs computations in-situ
     • AIA vs SISCA: SISCA uses SRAM cells to perform in-situ computations
     • DIA vs SISCA: SISCA modifies the LLC, with trivial overhead on baseline cache operations

  4. Logic-In-Memory [Figure: SRAM cell schematic with word-lines WL/WLB and bit-lines BL/BLB; Jeloka et al., 2016]

  5. Logic-In-Memory (Jeloka et al., 2016)
     • Pre-charge the bit-lines
     • Activate both word-lines (WL1 and WL2) simultaneously
     • Depending on the values stored in the two activated cells, the bit-line voltage discharges through Cell1, discharges through both cells, or the bit-line stays pre-charged (a discharge takes the bit-line from 1 to 0)
     • The sensed value on BL is therefore the AND of the two stored bits, while BLB yields their NOR
     [Figure: two SRAM cells (Cell1, Cell2) sharing bit-lines BL/BLB]
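
A minimal software model of the bit-line computing step described above, assuming the standard reading of Jeloka et al., 2016 (activating two word-lines yields AND on BL and NOR on BLB). The function name and the list-of-bits representation are illustrative, not from the presentation.

def sense_bitlines(row1, row2):
    """Model two simultaneously activated SRAM rows (bits given as 0/1 lists).

    BL stays pre-charged (reads 1) only where both cells store 1 -> AND.
    BLB stays pre-charged only where both cells store 0 -> NOR.
    """
    bl = [a & b for a, b in zip(row1, row2)]               # AND per column
    blb = [(1 - a) & (1 - b) for a, b in zip(row1, row2)]  # NOR per column
    return bl, blb

cell_row_1 = [1, 0, 1, 1]
cell_row_2 = [1, 1, 0, 1]
print(sense_bitlines(cell_row_1, cell_row_2))  # ([1, 0, 0, 1], [0, 0, 0, 0])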

  6. Enabling In-Situ Multiplication in Caches
     • A multi-bit weight W0 (bits W0-0, W0-1, W0-2) is multiplied by a multi-bit input I0 (bits I0-0, I0-1, I0-2) by forming the single-bit partial products W0-j · I0-k and arranging them as in long multiplication, so each partial-product bit is one in-situ AND
     • Notation: Ca-b denotes bit number b in the a-th variable of C
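
Assuming the partial-product array on this slide is the usual shift-and-add decomposition, the sketch below reproduces it in software: every partial-product bit is a single AND of one weight bit and one input bit (the in-situ operation from slide 5), and the rows are combined with shifts and adds. The helper name and the LSB-first bit lists are illustrative assumptions, not SISCA's circuit.

def multiply_with_ands(w_bits, i_bits):
    """Multiply two unsigned values given as LSB-first bit lists.

    Partial-product row k pairs every weight bit with input bit k via a
    bitwise AND, then the row is shifted left by k and accumulated.
    """
    result = 0
    for k, i_bit in enumerate(i_bits):                # one row per input bit
        row = sum((w_bit & i_bit) << j for j, w_bit in enumerate(w_bits))
        result += row << k                            # shift-and-add
    return result

# 3-bit example in the spirit of the slide's W0 and I0 operands:
w = [1, 0, 1]   # W0 = 5 (LSB first)
i = [0, 1, 1]   # I0 = 6 (LSB first)
assert multiply_with_ands(w, i) == 30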

  7. SISCA Organization
     [Figure: bank organization with an H-Tree interconnect connecting sub-arrays SA1-SA4, RAT1-RAT4, and a shifter; sub-array rows hold kernel entries, feature-map entries, and unused entries. Ca-b denotes bit number b in the a-th variable of C]
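
Purely as an illustration of the layout this slide sketches, here is a toy data structure grouping sub-arrays (each holding kernel entries, feature-map entries, and unused rows, with a shifter attached) into a bank behind an H-tree; all class and field names are assumptions made for this sketch, not SISCA's actual design.

from dataclasses import dataclass, field

@dataclass
class SubArray:
    kernel_rows: list = field(default_factory=list)       # kernel entries
    feature_map_rows: list = field(default_factory=list)  # feature-map entries
    unused_rows: int = 0                                    # spare capacity
    has_shifter: bool = True                                # shifter per figure

@dataclass
class Bank:
    # SA1..SA4, reached over the H-tree interconnect in the figure
    sub_arrays: list = field(default_factory=list)

bank = Bank(sub_arrays=[SubArray(unused_rows=256) for _ in range(4)])
print(len(bank.sub_arrays))  # 4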

  8. SISCA Dataflow
     • Sub-array 1: input feature map (6x6)
     • Sub-array 2: kernel maps, 2 x (3x3)
     • Sub-array 3: output feature maps, 2 x (4x4)
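
As a quick check of the dimensions listed above, the sketch below convolves a 6x6 input feature map with two 3x3 kernels (stride 1, no padding) and produces two 4x4 output feature maps; the NumPy helper and the example values are illustrative only.

import numpy as np

def conv2d_valid(ifmap, kernel):
    """'Valid' 2-D convolution (really cross-correlation, as CNNs use)."""
    H, W = ifmap.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(ifmap[r:r + kh, c:c + kw] * kernel)
    return out

ifmap = np.arange(36).reshape(6, 6)        # sub-array 1: 6x6 input
kernels = [np.ones((3, 3)), np.eye(3)]     # sub-array 2: 2 x (3x3) kernels
ofmaps = [conv2d_valid(ifmap, k) for k in kernels]
print([o.shape for o in ofmaps])           # sub-array 3: [(4, 4), (4, 4)]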

  9. Energy Improvements: 6.3x more energy-efficient than DaDianNao

  10. Performance Improvements: 2.7x higher throughput than DaDianNao

  11. Conclusions and Future Work
     • SISCA is an SRAM in-situ computation engine for Convolutional Neural Networks
     • It uses the on-chip Last Level Cache (LLC) to perform computations
     • SISCA is 6.3x more energy-efficient and delivers 2.7x higher throughput than DaDianNao
     • Better dataflow and mapping mechanisms can further improve energy and throughput
     • Future work: better scheduling mechanisms to distribute the general-purpose workload and CNN data across the cache

  12. Questions?
