1
Neuro-Inspired Processor Design for On-Chip Learning and Classification with CMOS and Resistive Synapses
Jae-sun Seo
School of ECEE, Arizona State University
The 13th Korea-U.S. Forum on Nanotechnology
September 26, 2016
2
ML Literature (DNN) vs. Neuromorphic (SNN)
(Figure credits: Song, PLoS Biol. 2005; courtesy of Nuance)
ML literature (DNN):
- Dense connectivity
- Learning done offline
- Back-propagation (requires labeled data)
- MNIST 99.79%, ImageNet 95%
- What about unlabeled data or customization?
- Full computation on each layer → high power
Neuromorphic (SNN):
- Sparse connectivity
- Online learning
- STDP, SRDP, reward (biological evidence)
- MNIST 99.08%, ImageNet N/A
- Continuous learning & detection
- Adaptable to input changes
- Sparse spiking, attention → low power
3
Neuromorphic Core with On-Chip STDP
[Figure: 2.05mm × 2.05mm die photo with four variants: base design (64K synapse array), slim neuron variant (256K synapse array), 4-b synapse variant (64K synapse array), low leakage variant (64K synapse array)]
- Under STDP learning, when neuron K spikes, all
synapses on row K and column K may update
- Transposable SRAM: single-cycle read & write in
both row and column directions
- Efficient pre- and post-synaptic updates
- Near-threshold operation
- Pattern recognition
[Plot: fully functional vs. retention mode, annotated 20X]
Seo, CICC, 2011
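The row/column update pattern ("when neuron K spikes, all synapses on row K and column K may update") can be sketched as a toy Python model. This is a minimal sketch under assumptions the slide does not state: a square array where row i holds neuron i's outgoing (pre-synaptic) weights and column j holds neuron j's incoming (post-synaptic) weights, 4-bit weights, a fixed 4-step STDP window, and simple ±1 pair-based updates. The class and method names are hypothetical; read_row() and read_col() stand in for the transposable SRAM's single-cycle row and column access.

```python
from typing import List, Optional


class TransposableSynapseArray:
    """Hypothetical toy model of an N x N synapse array.

    w[i][j] is the weight from pre-synaptic neuron i (row i) to
    post-synaptic neuron j (column j).
    """

    def __init__(self, n: int, w_init: int = 8, w_max: int = 15, window: int = 4):
        self.n = n
        self.w_max = w_max          # 4-bit weights: range 0..15 (assumption)
        self.window = window        # STDP window in time steps (assumption)
        self.w = [[w_init] * n for _ in range(n)]
        self.last_spike: List[Optional[int]] = [None] * n

    def read_row(self, k: int) -> List[int]:
        """Row access: all outgoing weights of neuron k."""
        return list(self.w[k])

    def read_col(self, k: int) -> List[int]:
        """Column access: all incoming weights of neuron k."""
        return [self.w[i][k] for i in range(self.n)]

    def _recent(self, i: int, t: int) -> bool:
        last = self.last_spike[i]
        return last is not None and t - last <= self.window

    def on_spike(self, k: int, t: int) -> None:
        """Neuron k spikes at time t: update column k (LTP) and row k (LTD)."""
        for i in range(self.n):
            # Column k: k is post-synaptic; pre neuron i fired shortly before -> LTP.
            if i != k and self._recent(i, t):
                self.w[i][k] = min(self.w_max, self.w[i][k] + 1)
        for j in range(self.n):
            # Row k: k is pre-synaptic; post neuron j fired shortly before -> LTD.
            if j != k and self._recent(j, t):
                self.w[k][j] = max(0, self.w[k][j] - 1)
        self.last_spike[k] = t
```

For example, with three neurons, spiking neuron 0 at t=0 and neuron 1 at t=2 potentiates w[0][1] to 9 (column 1) and depresses w[1][0] to 7 (row 1), so read_col(1) returns [9, 8, 8] and read_row(1) returns [7, 8, 8]. A spike outside the window changes nothing.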
4
Versatile Learning in Neuromorphic Core
- A versatile neurosynaptic core to support various learning rules,
large fan-in/-out, sparse connectivity
- Triplet STDP (Pfister, J. of Neuroscience, 2006; Gjorgjieva, PNAS, 2011)
- post-pre-post: post nrn. spike & pre nrn. timing & post nrn. timing
- pre-post-pre: pre nrn. spike & post nrn. timing & pre nrn. timing
[Figure panels: various STDP learning rules (Feldman, Neuron 2012); multi-factor triplet-STDP]
[Figure: multi-factor triplet-STDP example with neurons N1_0–N1_3, N2, and N3_0–N3_3, synaptic weights wa0–wa3 and wb0–wb3, and per-neuron spike counters (cnt.) that encode Δt between pre- and post-synaptic spikes and drive LTP/LTD weight changes (Δw)]
- LTP: w = w + [pre cnt.] + [post cnt.]
- LTD: w = w − [post cnt.] − [pre cnt.]
(Figure annotation: when N3 spikes, the wb* synapses are subject to L…)
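The counter-based LTP/LTD rules above can be sketched in Python. This is a hedged interpretation, not the chip's logic: each neuron is assumed to keep a saturating down-counter that is loaded with CNT_MAX on a spike and decremented every time step, so a larger count means a more recent spike. A synapse is assumed to update only when the counter of the neuron on the other side is non-zero (the "pre nrn. timing" / "post nrn. timing" factors). CNT_MAX, W_MAX, and all function names are hypothetical.

```python
from typing import List

CNT_MAX = 15  # hypothetical 4-bit timing counter (loaded on a spike)
W_MAX = 255   # hypothetical 8-bit synaptic weight


def tick(counters: List[int]) -> List[int]:
    """Advance one time step: every timing counter decays by 1 (floor at 0)."""
    return [max(0, c - 1) for c in counters]


def ltp_on_post_spike(weights: List[int], pre_cnt: List[int], post_cnt: int) -> List[int]:
    """post-pre-post: the post neuron spikes now.

    weights[i]: synapse from pre neuron i onto the spiking post neuron.
    pre_cnt[i]: recency of pre neuron i's last spike.
    post_cnt:   recency of the post neuron's previous spike (read before reload).
    LTP: w = w + [pre cnt.] + [post cnt.], only where the pre counter is non-zero.
    """
    return [min(W_MAX, w + p + post_cnt) if p > 0 else w
            for w, p in zip(weights, pre_cnt)]


def ltd_on_pre_spike(weights: List[int], post_cnt: List[int], pre_cnt: int) -> List[int]:
    """pre-post-pre: the pre neuron spikes now.

    weights[j]: synapse from the spiking pre neuron onto post neuron j.
    post_cnt[j]: recency of post neuron j's last spike.
    pre_cnt:     recency of the pre neuron's previous spike (read before reload).
    LTD: w = w - [post cnt.] - [pre cnt.], only where the post counter is non-zero.
    """
    return [max(0, w - p - pre_cnt) if p > 0 else w
            for w, p in zip(weights, post_cnt)]
```

For example, ltp_on_post_spike([100, 100, 100], [5, 0, 2], 3) gives [108, 100, 105]: the middle synapse is left alone because its pre-synaptic neuron has not fired recently. Counters make the rule cheap in hardware because only additions and comparisons are needed, with no exponential traces.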
5
Feedforward Excitation & Inhibition
[Figure: core architecture with a decoder, axons carrying spike-timing info, a 1024x256 synapse array, and 256 neurons exchanging spike packets, plus a recurrent connection; additional synapses connect TX neurons to inhibitory neurons and inhibitory neurons to RX neurons, so that layer (i) neurons drive layer (i+1) neurons together with inhibition]
[1] Diehl, Front. of Neuroscience, 2015
- Joint feed-forward excitation and inhibition
- For a small number of inhibitory neurons,
add pre=>inh, inh=>post synapses
- Balance excitatory & inhibitory synaptic inputs
Vogels, Science, 2011
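The joint feed-forward excitation and inhibition scheme can be illustrated for a single time step in Python. This is a hypothetical sketch: the few inhibitory neurons are modeled as threshold units that fire when their summed input from active pre-synaptic neurons reaches inh_threshold, and each post neuron's net input is its excitatory drive minus the inhibition it receives. The function name, argument layout, and integer weights are assumptions for illustration only.

```python
from typing import List

Matrix = List[List[int]]


def feedforward_ei(spikes: List[int], w_exc: Matrix, w_pre_inh: Matrix,
                   w_inh_post: Matrix, inh_threshold: int) -> List[int]:
    """Net synaptic input to each post neuron for one input spike vector.

    w_exc[i][j]:      pre neuron i -> post neuron j (excitatory)
    w_pre_inh[i][h]:  pre neuron i -> inhibitory neuron h (pre=>inh)
    w_inh_post[h][j]: inhibitory neuron h -> post neuron j (inh=>post)
    """
    active = [i for i, s in enumerate(spikes) if s]
    n_post = len(w_exc[0])
    n_inh = len(w_pre_inh[0])

    # Excitatory drive: sum of weights from active pre neurons.
    exc = [sum(w_exc[i][j] for i in active) for j in range(n_post)]

    # Inhibitory neurons see the same input spikes and fire above threshold.
    inh_fired = [sum(w_pre_inh[i][h] for i in active) >= inh_threshold
                 for h in range(n_inh)]

    # Inhibition delivered to each post neuron by the inhibitory neurons that fired.
    inh = [sum(w_inh_post[h][j] for h in range(n_inh) if inh_fired[h])
           for j in range(n_post)]

    return [e - x for e, x in zip(exc, inh)]
```

With spikes [1, 1, 0], excitatory weights [[2, 1], [3, 1], [5, 5]], one inhibitory neuron fed by [[1], [1], [0]], and inh=>post weights [[4, 1]], the excitatory drive is [5, 2]; at inh_threshold=2 the inhibitory neuron fires and the net input drops to [1, 1]. Because inhibition is driven by the same inputs as excitation, it grows with the excitatory drive, which is one way to keep the two balanced.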
6
Neural Spike Sorting Processor (for deep brain sensing & stimulation)
- Signals from invasive electrodes: spikes from multiple neurons
- Online, unsupervised neuromorphic spike-sorting processor
Collaboration with Columbia University (ISLPED 2015)
[Figure: processing flow: raw input signal → detection & alignment → clustering in the neuromorphic sorting processor → output; network with inputs I1–Im (32 samples of 8 bits), encoder neurons H1–HN, and output neurons Z1–ZK]
- Weight update through STDP
- Start with K=2 output neurons; the number of output neurons increases automatically when a spike differs enough from the existing clusters (self-organizing map)
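The self-organizing growth of output neurons can be sketched as online clustering in Python. This swaps in a simpler technique than the processor's STDP learning: each output neuron holds a prototype waveform, an aligned spike is assigned to the nearest prototype, which then moves toward it, and a new output neuron is added when even the nearest prototype is farther than new_threshold. The Euclidean distance, learning rate, threshold, and class name are all assumptions, and the 32-sample inputs are shortened to 2 samples in the example.

```python
import math
from typing import List, Sequence


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two spike waveforms."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class GrowingSpikeClusterer:
    """Hypothetical online clusterer: starts with K prototypes and grows."""

    def __init__(self, init_prototypes: List[List[float]],
                 new_threshold: float, lr: float = 0.1):
        self.prototypes = [list(p) for p in init_prototypes]  # one per output neuron
        self.new_threshold = new_threshold
        self.lr = lr

    def process(self, spike: Sequence[float]) -> int:
        """Return the index of the output neuron that claims this spike."""
        dists = [distance(spike, p) for p in self.prototypes]
        k = min(range(len(dists)), key=dists.__getitem__)
        if dists[k] > self.new_threshold:
            # Spike differs enough from every cluster: add an output neuron.
            self.prototypes.append(list(spike))
            return len(self.prototypes) - 1
        # Otherwise move the winning prototype toward the spike.
        self.prototypes[k] = [p + self.lr * (s - p)
                              for p, s in zip(self.prototypes[k], spike)]
        return k
```

Starting from K=2 prototypes [0, 0] and [10, 0] with new_threshold=3.0 and lr=0.5, the spikes [1, 0], [0, 10], and [9, 0] are assigned to neurons 0, 2 (a newly created third neuron), and 1.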
7
Experimental Results: Clustering Accuracy
[Figure: clustering accuracy (%) on datasets D2, D3, D4, D4*, W-D1, and W-D2; proposed: avg. accuracy 91%; OSort-based: avg. accuracy 69%]
[Figure: receptive fields for a dataset that contains 4 clusters in 3000 spikes]
- Spike sorting accuracy is more reliable than that of other low-complexity algorithms such as OSort
- Avg. accuracy: 91% vs. 69%
[Figure: chip layout with synapse array, input neurons, output neurons, decoder, and others]
[Figure: operating frequency (MHz, log scale) vs. VDD (V), annotated with 9.3µW/ch, 26µW/ch, 70 spikes/s/neuron ([4]), and 2.5 spikes/s/neuron (D2, D3, D4, D4*)]
- 65nm GP, high-Vth, 0.5 × 0.5 mm²
- 9.3µW/ch at 0.3V
- Memory elements dominate both the layout area and the power of the design
8
Neuromorphic Computing w/ NVMs
- Emerging NVMs (e.g., RRAM) could alleviate the power/area bottleneck of conventional memories
- Read rows in parallel:
weighted sum current
- Peripheral CMOS read:
current-to-digital converter
130nm RRAM array + CMOS read circuits (under testing)
[Figure: simulated voltage (V) vs. time (ns) waveforms of RE, Vspike, and Vin]
Simulation results for a 4ns read timing window
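Reading rows in parallel can be sketched numerically in Python. Each column current is the weighted sum I_j = Σ_i V_i · G_ij (Ohm's law per cell, Kirchhoff's current law per column), so one parallel read yields a dot product per column. Here the current-to-digital converter is replaced by a hypothetical uniform quantizer with an assumed LSB and bit width; voltages in mV and conductances in µS are used so that currents come out as exact integers in nA. All names and values are illustrative, not taken from the 130nm test chip.

```python
from typing import List


def column_currents_na(v_rows_mv: List[int], g_us: List[List[int]]) -> List[int]:
    """Column currents (nA) when all selected rows are read in parallel.

    v_rows_mv[i]: read voltage on row i in mV (0 = row not selected)
    g_us[i][j]:   conductance of the cell at row i, column j in uS
    mV * uS = nA, so I_j = sum_i V_i * G_ij is an exact integer here.
    """
    n_cols = len(g_us[0])
    return [sum(v * row[j] for v, row in zip(v_rows_mv, g_us)) for j in range(n_cols)]


def current_to_digital(i_na: List[int], lsb_na: int, bits: int) -> List[int]:
    """Hypothetical uniform converter: floor(I / LSB), saturating at 2**bits - 1."""
    max_code = (1 << bits) - 1
    return [min(max_code, i // lsb_na) for i in i_na]
```

With conductances [[10, 20], [30, 40]] µS and both rows driven at 100 mV, the column currents are [4000, 6000] nA, which a 4-bit converter with a 500 nA LSB maps to codes [8, 12]. Driving only the first row gives [1000, 2000] nA and codes [2, 4].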
9
Summary
- Neuromorphic computing hardware
- 45nm testchip with on-chip STDP learning
- Versatile learning neuromorphic core & architecture
- 65nm spike clustering processor
- Emerging NVM arrays + peripheral read/write circuits
- Future research with circuit-device-architecture co-design and optimization
10
Collaborators
- ASU
- Faculty: Yu Cao, Shimeng Yu, Chaitali Chakrabarti, Sarma Vrudhula, Visar Berisha
- Students: Minkyu Kim, Deepak Kadetotad, Shihui Yin, Abinash Mohanty, Yufei Ma
- Intel: Gregory Chen, Ram Krishnamurthy
- Columbia University: Mingoo Seok, Qi Wang