A Scalable Time-based Integrate-and-Fire Neuromorphic Core with Brain-Inspired Leak and Local Lateral Inhibition Capabilities


  1. A Scalable Time-based Integrate-and-Fire Neuromorphic Core with Brain-Inspired Leak and Local Lateral Inhibition Capabilities. Muqing Liu, Luke R. Everson, Chris H. Kim, Dept. of ECE, University of Minnesota, Minneapolis, MN. liux3300@umn.edu

  2. Outline • Background • Time-based Neural Networks • Leaky Neuron and Local Lateral Inhibition • Digit Recognition Application • Measurement Results • Conclusion

  3. Neuromorphic Computing [Figures: biological neuron model vs. artificial neuron model; synaptic weights are excitatory (+) or inhibitory (-). Image source: http://juanribon.com/design/nerve-cell-body-diagram.php] • Biological neuron behavior: weight multiplication (synapse) → weight integration (cell body) → threshold comparison & fire. • Applications: image recognition/classification, natural language processing, speech recognition, etc.
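
To make the artificial neuron model above concrete, here is a minimal behavioral sketch of the weight-multiply → integrate → threshold-and-fire sequence. The function name, threshold value, and signed weights are illustrative assumptions, not the chip's implementation.

```python
# Minimal integrate-and-fire artificial neuron (illustrative only).
# Inputs x_i are multiplied by signed synaptic weights w_i (excitatory > 0,
# inhibitory < 0), the products are integrated, and the neuron fires a
# spike when the integrated value crosses a threshold.
def integrate_and_fire(x, w, threshold=4.0):
    membrane = sum(xi * wi for xi, wi in zip(x, w))  # weight multiplication + integration
    return 1 if membrane >= threshold else 0         # threshold comparison & fire

# Example: three excitatory synapses and one inhibitory synapse.
print(integrate_and_fire([1, 1, 0, 1], [2.0, 1.5, 3.0, -0.5]))  # -> 0 (3.0 < 4.0, no spike)
```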

  4. Prior Art: Deep Learning Processors [Die photos and annotated specs of two prior deep learning processors: Eyeriss, a DCNN accelerator in TSMC 65nm LP 1P9M (4000µm die side, 108KB on-chip SRAM, peak performance 16.8 – 42.0 GOPS where 1 OP = 1 MAC, 278mW @ 1V) [1], and DNPU, a reconfigurable CNN-RNN processor in 65nm 1P8M CMOS (3.9 TOPS/W @ 1.1V) [2].] [1] Y.-H. Chen, et al., ISSCC, 2016. [2] D. Shin, et al., ISSCC, 2017. • Circuit/architecture innovations: − Data reuse in convolutional neural networks. − Utilize sparsity by data gating/zero skipping. − Reduced weight precision → binary neural networks.

  5. Prior Art: Emerging NVM-based Implementations [Figures: memristor-based crossbar array [3]; PCM-based crossbar array [4].] • Comparison with CMOS implementation: − Pros: compact, analog computation. − Cons: susceptible to noise, immature process. [3] K.-H. Kim, et al., Nano Lett., Dec. 2011. [4] D. Kuzum, et al., Nano Lett., Jun. 2011.

  6. Time-based vs. Digital Implementation • Time-based neural network: y = Σᵢ xᵢ·wᵢ = Delay₁ + Delay₂ + ··· + Delayᵢ, i.e., each product xᵢ·wᵢ is encoded as a programmable delay and the products accumulate in the time domain. • Digital neural network: y = Σᵢ xᵢ·wᵢ computed with N-bit × M-bit multipliers, an adder, and an accumulator followed by the activation function. • Comparison: Time-based − core circuits: programmable delay stages; pros: area and power efficient; cons: moderate resolution. Digital − core circuits: multipliers & adders; pros: high resolution; cons: large area and power consumption.
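
A small numerical sketch of the equivalence stated above: the same dot product y = Σᵢ xᵢ·wᵢ can be formed either by multiply-accumulate arithmetic or by summing programmable delays, where each stage contributes a delay proportional to xᵢ·wᵢ. The unit-delay constant below is an arbitrary illustrative value, not a chip parameter.

```python
# Same dot product computed two ways (illustrative model, not the silicon).
x = [1, 0, 1, 1]          # binary input pixels
w = [3, 5, 2, 7]          # weights (e.g., 3-bit values)

# Digital neural network: multipliers + adder/accumulator.
y_digital = sum(xi * wi for xi, wi in zip(x, w))

# Time-based neural network: each stage adds a delay proportional to x_i * w_i,
# so the total accumulated delay encodes the same sum.
T_UNIT = 10e-12           # assumed delay per weight LSB (10 ps, arbitrary)
total_delay = sum(xi * wi * T_UNIT for xi, wi in zip(x, w))
y_time_based = round(total_delay / T_UNIT)   # "measuring" the accumulated delay

assert y_digital == y_time_based == 12
```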

  7. Comparison with Previous Time-based Neural Networks

  8. Proposed Time-based Neural Net [Block diagram: DCO with 128 programmable delay stages enabled by EN_DCO; each stage pair is driven by inputs X and 3-bit weights W<2:0> stored in local SRAM.] • DCO period: T_DCO = Σᵢ Delayᵢ ∝ Σᵢ Xᵢ·wᵢ. • An 8-bit counter (C7…C0) counts DCO oscillations; a threshold compare & fire block generates the SPIKE output. • Neuron control logic provides LEAK and LLI signals for leaky integrate & fire and local lateral inhibition.

  9. Proposed Time-based Neural Net [Schematic of the programmable delay stage and unit cell layout (2 stages, 8.1µm × 5.9µm); three SRAM cells per stage hold Wᵢ<2:0> (BL/BLB and WL details omitted for simplicity), and the weight bits switch binary-weighted capacitors (4C, 2C, C) onto the stage output.] • Input pixel Xᵢ: determines whether a stage is activated or not. • Weight Wᵢ<2:0>: determines how many capacitors are turned on as load in that stage.
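
Combining slides 8 and 9, a behavioral sketch of one DCO neuron: each activated stage (Xᵢ = 1) adds a load set by its 3-bit weight Wᵢ<2:0> via the 4C/2C/C capacitors, the DCO period is the sum of stage delays, and the counter fires a spike each time the oscillation count reaches the threshold. The base delay, per-LSB delay, and evaluation window are made-up numbers for illustration; the real stage delays are set by capacitor sizing and calibration.

```python
# Behavioral sketch of one time-based DCO neuron (not the actual silicon model).
T_BASE = 50e-12   # assumed intrinsic stage delay (s), illustrative
T_LSB  = 10e-12   # assumed added delay per weight LSB of capacitive load (s), illustrative

def dco_period(x, w):
    """Sum of the programmable stage delays: an activated stage (x_i = 1)
    adds a delay set by its 3-bit weight w_i<2:0> through the 4C/2C/C load."""
    return sum(T_BASE + xi * wi * T_LSB for xi, wi in zip(x, w))

def spike_count(x, w, t_eval=10e-6, threshold=16):
    """Count DCO oscillations over an evaluation window; a spike is emitted
    (and the 8-bit counter is reset) each time the count reaches the threshold."""
    oscillations = int(t_eval / dco_period(x, w))
    return oscillations // threshold

# Example with a 121-input pattern mapped onto one core.
x = [1] * 60 + [0] * 61
w = [3] * 121
print(spike_count(x, w))

# Note: in this simple model a larger weighted sum lengthens the period and
# lowers the oscillation (and spike) rate; the actual weight-to-capacitor
# mapping, and hence the sign of that relationship, is a design choice not
# spelled out in the slides.
```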

  10. 64x128 Time-based Neural Network • 8 DCO cores are grouped together to implement local lateral inhibition. • 64 DCO neuromorphic cores in total. • 121 out of 128 DCO stages are used as programmable inputs. • Remaining 7 stages are reserved for calibration.

  11. Frequency Calibration and Linearity Test [Plots: DCO frequency (a.u.) vs. calibration and input codes, showing linearity and DCO-to-DCO matching.] • Frequency variation between 10 DCOs: − Before calibration: 1.17%; after calibration: 0.10%.
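
The slides give only the outcome of calibration (DCO-to-DCO frequency spread drops from 1.17% to 0.10%) and note that 7 of the 128 stages are reserved for it (slide 10). The trimming loop below is a purely hypothetical sketch in which calibration-stage loads are switched in to bring each DCO's period close to a shared target, assuming every calibration stage adds the same delay step.

```python
# Hypothetical calibration sketch: choose which of the 7 reserved stages to
# load so each DCO's period lands near a shared target (illustrative only).
from itertools import product

def calibrate(period_no_cal, cal_step, target, n_cal_stages=7):
    """Try all on/off combinations of the calibration stages and return the
    code whose added delay brings the DCO period closest to the target."""
    best_code, best_err = None, float("inf")
    for code in product((0, 1), repeat=n_cal_stages):
        period = period_no_cal + sum(code) * cal_step
        err = abs(period - target)
        if err < best_err:
            best_code, best_err = code, err
    return best_code

# Example: a DCO that is intrinsically slightly fast gets extra load.
print(calibrate(period_no_cal=10.05e-9, cal_step=0.01e-9, target=10.10e-9))
```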

  12. Bio-Inspired Features: Leaky Neuron and Local Lateral Inhibition (LLI) [Figures: electrical modeling of the cell membrane [3]; lateral inhibition illustrated by the Mach band illusion [4].] • Leaky neuron: ions diffuse through the neuron cell. • Local lateral inhibition: an active neuron strives to suppress the activities of its neighbors. [3] W. Gerstner, et al., Neuronal Dynamics. [4] Wikipedia.

  13. Time-based Leak and LLI [Diagrams: time-based leaky integrate & fire neuron (DCO → 8-bit counter C7…C0 → compare & fire → SPIKE, with LEAK resetting the counter LSB) and time-based local lateral inhibition across eight neurons (SPIKE<0>…SPIKE<7> reset bits in the neighboring counters).] • Leak enabled: the LSB of every counter is reset periodically. • LLI enabled: − Specific bits in the neighboring counters are reset after a DCO spikes. − The fastest DCO resets the other DCOs more often than it is reset by others.
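
A behavioral sketch of the two counter tricks described above: leak periodically clears the LSB of every neuron's counter, and LLI clears a chosen bit in the neighboring counters whenever a neuron in the 8-DCO group spikes. The group size of 8 and the LSB leak follow the slides; the LLI bit position, threshold value, and update-loop structure are assumed parameters for illustration.

```python
# Counter-level sketch of LEAK and LLI for one group of 8 DCO neurons (illustrative).
LEAK_BIT = 0      # leak resets the LSB of every counter (slide 13)
LLI_BIT  = 2      # assumed bit cleared in neighbors' counters on a spike
THRESHOLD = 16    # assumed counter value at which a neuron fires

def step(counters, increments, leak_tick):
    """Advance all 8 counters by their DCO-driven increments, apply leak,
    fire spikes, and apply local lateral inhibition to the neighbors."""
    spikes = [0] * len(counters)
    for i, inc in enumerate(increments):
        counters[i] = (counters[i] + inc) & 0xFF          # 8-bit counter
        if leak_tick:
            counters[i] &= ~(1 << LEAK_BIT)               # periodic LSB reset (leak)
    for i, c in enumerate(counters):
        if c >= THRESHOLD:
            spikes[i] = 1
            counters[i] = 0                               # fire and reset own counter
            for j in range(len(counters)):                # LLI: suppress the neighbors
                if j != i:
                    counters[j] &= ~(1 << LLI_BIT)
    return spikes
```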

  14. Leak and LLI [Plots: spike frequency vs. DCO number for *None vs. LEAK and *None vs. LLI, showing sharper contrast when leak or LLI is enabled. *None: no leak and no LLI, basic DCO operation.] • Leak: uniformly lowers the spiking frequency. • LLI: preferentially lowers the spiking frequency. • Goal: higher contrast between different neuron outputs.

  15. Handwritten Digit Recognition • Input database: MNIST. • Learning method: Supervised learning. • Learning network: Single-layer & multi-layer perceptron network.
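
The slides state only that MNIST digits are classified with supervised learning on single- and multi-layer perceptron networks. Below is a minimal software sketch of how such weights could be trained offline and then quantized to the chip's 3-bit weight range; the learning rate, epoch count, quantization scheme, and the input array names are assumptions for illustration.

```python
import numpy as np

def train_single_layer(images, labels, n_classes=10, lr=0.01, epochs=5):
    """Plain perceptron rule on flattened images (supervised learning sketch)."""
    n_pixels = images.shape[1]
    w = np.zeros((n_classes, n_pixels))
    for _ in range(epochs):
        for x, y in zip(images, labels):
            pred = np.argmax(w @ x)
            if pred != y:                 # update only on mistakes
                w[y] += lr * x
                w[pred] -= lr * x
    return w

def quantize_3bit(w):
    """Map trained weights onto the 0..7 codes of the 3-bit delay stages
    (an assumed mapping; the chip's exact weight coding is not given)."""
    w_min, w_max = w.min(), w.max()
    return np.round((w - w_min) / (w_max - w_min) * 7).astype(int)

# Usage sketch (train_images / train_labels are hypothetical MNIST arrays):
# w = train_single_layer(train_images, train_labels)
# codes = quantize_3bit(w)
```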

  16. Single-layer Digit Recognition • Single-layer architecture: proof-of-concept for the time-based neural network.

  17. Multi-layer Digit Recognition • Multi-layer architecture: Demonstrates the scalability of the core.
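
One way to read the scalability claim: each DCO core integrates up to 121 programmable inputs (slide 10), which matches an 11x11 image or patch, so a multi-layer network can be run layer by layer, with the first layer's spike counts fed back in as the second layer's inputs. The sketch below reuses the behavioral spike_count model defined earlier and adds a simple re-binarization between layers; both are illustrative assumptions, not the measured chip configuration.

```python
# Two-layer inference sketch built on the behavioral spike_count() model above.
def layer(inputs, weight_rows):
    """One perceptron layer: each output neuron is one DCO core whose 121
    stages hold that neuron's weight row."""
    return [spike_count(inputs, w_row) for w_row in weight_rows]

def two_layer_inference(pixels_11x11, w_hidden, w_out, hidden_threshold=50):
    hidden_spikes = layer(pixels_11x11, w_hidden)                # first layer on 11x11 input
    hidden_bits = [1 if s > hidden_threshold else 0 for s in hidden_spikes]
    padded = hidden_bits + [0] * (121 - len(hidden_bits))        # fill unused DCO stages
    scores = layer(padded, w_out)                                # second layer
    return max(range(len(scores)), key=lambda d: scores[d])      # predicted digit
```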

  18. Measurement Results (65nm LP CMOS, 1.2V, 25°C) [Bar chart: recognition accuracy (%) for Measured (*None), Measured (Leaky), and Simulation, for a single-layer network with 11x11 images, a two-layer network with 11x11 images, and a two-layer network with 4-patch 22x22 images (y-axis 82–94%). *None: no leak and no LLI, basic DCO operation.] • Measured recognition accuracy from hardware is comparable to software simulation results.

  19. Measurement Results (65nm LP CMOS, 1.2V, 25°C) [Plot: spike count vs. (target) digit 0–9, comparing *None and LLI operation.] • Spike count difference between digit "2" and "0": − Without LLI: 1.7%; with LLI: 17.7%.

  20. Measurement Results (65nm LP CMOS, 25°C) [Plot: DCO frequency (MHz) and power (µW per DCO) vs. supply voltage from 0.6V to 1.3V.] • Wide operating range: 0.7V ~ 1.2V.

  21. Performance Comparison
  • This work: hand-written digit recognition, multi-layer perceptron network classification; time-based circuits; 65nm; 0.24mm² (64 DCOs); 1.2V; 99MHz (nominal DCO freq.); 320.4µW per DCO; 309G ÷ N spikes/s/W (N = spiking threshold, note a); 37.4 TSOp/s/W (note b); 16.6 GE/PE (note c); 37.4 TOPS/W (note d); 0.43pJ/pixel, logic only (note e).
  • ASSCC '17 [5]: hand-writing recognition, deep neural network; time-based circuits; 65nm; 3.61mm² (32K PEs); 0.45V; 5.7pJ/pixel (memory + logic).
  • ISSCC '16 [6]: object detection + intention prediction, convolutional neural network; analog + digital circuits; 65nm; 16.0mm²; 1.2V; 250MHz; 330mW; 862 GOPS/W (3-bit).
  • VLSI '15 [7]: object recognition, spiking neural network (LCA); digital circuits; 65nm; 1.8mm²; 40MHz (inference); 3.65mW; 48.2 TSOp/s/W; 76.5 GE/PE.
  Notes: a. N = 16 in our measurements. b. SOp/s/W: synaptic operations (SOp) per second per watt; in the DCO-based time-domain neural network, one DCO oscillation is equivalent to 121 SOp. c. 1 GE = 1.44µm² (65nm); PE: processing element. d. One operation is defined as one multiply-and-accumulate (MAC); in the DCO-based time-domain neural network, one DCO oscillation is equivalent to 121 MAC. e. Uses a spiking threshold of 16 and accounts only for the power of the core logic circuits; memory power is not included, since the weights are not updated during inference.
  [5] D. Miyashita, et al., ASSCC, 2017. [6] K. J. Lee, et al., ISSCC, 2016. [7] J. K. Kim, et al., VLSI, 2015.
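
As a sanity check on notes (b) and (d), this work's headline efficiency numbers follow directly from the nominal DCO frequency, the 121 synaptic operations (or MACs) per oscillation, and the per-DCO power quoted in the table; a minimal back-of-the-envelope sketch:

```python
# Back-of-the-envelope check of the efficiency numbers quoted above.
f_dco = 99e6          # nominal DCO frequency (Hz)
p_dco = 320.4e-6      # measured power per DCO (W)
sop_per_osc = 121     # one DCO oscillation = 121 SOp / 121 MAC (notes b and d)
N = 16                # spiking threshold used in measurements (note a)

spikes_per_s_per_w = f_dco / N / p_dco                      # (309G / N) spikes/s/W
tsop_per_s_per_w = f_dco * sop_per_osc / p_dco / 1e12       # ~37.4 TSOp/s/W (= TOPS/W here)
print(spikes_per_s_per_w / 1e9, tsop_per_s_per_w)
```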

  22. Die Photo and Performance Summary

  23. Conclusion • The neural network function is computed in the time domain using standard digital circuits with high area and power efficiency. • Brain-inspired leak and local lateral inhibition features are implemented to enhance the contrast between neuron outputs. • 65nm test chip measurements confirm 91% hand-written digit recognition accuracy.
