Neuro-Inspired Processor Design for On-Chip Learning and Classification with CMOS and Resistive Synapses
Jae-sun Seo, School of ECEE, Arizona State University
The 13th Korea-U.S. Forum on Nanotechnology, September 26, 2016
ML Literature (DNN) vs. Neuromorphic (SNN)
[Images courtesy of Nuance; Song, PLoS Biol. 2005]
DNN:
● Dense connectivity
● Learning done offline
● Back-propagation (requires labeled data)
● MNIST 99.79%, ImageNet 95%
● Open questions: unlabeled data, adaptability to input changes or customization
● Full computation on each layer → high power
SNN:
● Sparse connectivity
● Online learning
● STDP, SRDP, reward-based learning (biological evidence)
● MNIST 99.08%, ImageNet N/A
● Continuous learning & detection
● Sparse spiking, attention → low power
Neuromorphic Core with On-Chip STDP (Seo, CICC, 2011)
[Die photos: 2.05mm x 2.05mm; fully functional; 20X retention mode; variants — base design (64K synapse array), slim neuron (64K synapses), 4-b synapse (256K synapses), low leakage (64K synapses)]
● Under STDP learning, when neuron K spikes, all synapses on row K and column K may update
● Transposable SRAM: single-cycle read & write in both row and column directions
● Efficient pre- and post-synaptic update
● Near-threshold operation
● Pattern recognition
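To make the row/column update concrete, below is a minimal Python sketch (not the chip's transposable-SRAM logic) of what happens when neuron K spikes: its incoming synapses (column K) and outgoing synapses (row K) are both touched in a single event. The matrix size, time constant, and learning rates are assumptions for illustration, using a simple exponential timing rule rather than the core's exact circuit behavior.

```python
import numpy as np

# Minimal sketch of the row/column STDP update (not the chip RTL).
# W[i, j] is the synapse from pre-synaptic neuron i to post-synaptic neuron j,
# so row K holds neuron K's outgoing synapses and column K its incoming ones.
# "last_spike" holds the most recent spike time of every neuron.

N = 256                               # illustrative number of neurons
W = np.zeros((N, N))                  # synaptic weight matrix
last_spike = np.full(N, -np.inf)

A_PLUS, A_MINUS, TAU = 0.01, 0.012, 20.0   # assumed STDP constants

def on_spike(k, t):
    """Neuron k spikes at time t: update column k (k as post) and row k (k as pre)."""
    dt = t - last_spike                     # how long ago each other neuron spiked
    # Column k: pre neurons fired before post k -> potentiation (LTP)
    W[:, k] += A_PLUS * np.exp(-dt / TAU)
    # Row k: post neurons fired before pre k -> depression (LTD)
    W[k, :] -= A_MINUS * np.exp(-dt / TAU)
    np.clip(W, 0.0, 1.0, out=W)             # keep weights in range
    last_spike[k] = t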
Versatile Learning in Neuromorphic Core
[Figure: various STDP learning rules (Feldman, Neuron 2012) — LTP/LTD Δw vs. Δt curves; counter-based array of pre-synaptic neurons N1_0–N1_3, post-synaptic neurons N3_0–N3_3, and synapses wa0–wa3, wb0–wb3]
Counter-based weight update when a neuron spikes:
LTP: w = w + [pre cnt.] + [post cnt.]
LTD: w = w − [post cnt.] − [pre cnt.]
● A versatile neurosynaptic core supports various learning rules, large fan-in/fan-out, and sparse connectivity
● Multi-factor triplet STDP (Pfister, J. of Neuroscience, 2006; Gjorgjieva, PNAS, 2011)
  ● post-pre-post: post-neuron spike & pre-neuron timing & post-neuron timing
  ● pre-post-pre: pre-neuron spike & post-neuron timing & pre-neuron timing
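The counter-based LTP/LTD rule above can be written out directly. The sketch below assumes small saturating activity counters and a 4-bit synapse range; these are illustrative choices, not the core's exact parameters.

```python
# Minimal sketch of the counter-based LTP/LTD rule on this slide.
# Each neuron keeps a saturating activity counter that is set on a spike and
# decays every time step; counter width and weight range are assumed values.

CNT_MAX = 7          # assumed 3-bit activity counter
W_MAX = 15           # assumed 4-bit synapse (as in the 4-b synapse variant)

def decay(counters):
    """Decrement every non-zero counter once per time step."""
    return [max(c - 1, 0) for c in counters]

def ltp(w, pre_cnt, post_cnt):
    """Potentiation: w = w + [pre cnt.] + [post cnt.], saturating at W_MAX."""
    return min(w + pre_cnt + post_cnt, W_MAX)

def ltd(w, pre_cnt, post_cnt):
    """Depression: w = w - [post cnt.] - [pre cnt.], floored at 0."""
    return max(w - post_cnt - pre_cnt, 0)
```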
Feedforward Excitation & Inhibition
[Figure: layer (i) neurons → 1024x256 synapse array with decoder → layer (i+1) neurons; axons carry spike packets with timing info; recurrent connections through inhibitory neurons (TX => inhibitory, inhibitory => RX)] [1] Diehl, Front. of Neuroscience, 2015
● Joint feed-forward excitation and inhibition
● For a small number of inhibitory neurons, add pre=>inh and inh=>post synapses
● Balance excitatory & inhibitory synaptic inputs (Vogels, Science, 2011)
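A minimal sketch of the feed-forward excitation/inhibition idea, assuming illustrative layer sizes and random weights: a small inhibitory pool receives the pre-layer spikes (pre=>inh) and projects onto the post layer (inh=>post), so the net synaptic input is excitation minus inhibition.

```python
import numpy as np

# Minimal sketch of joint feed-forward excitation and inhibition.
# All sizes, weight scales, and the inhibitory firing rule are assumptions.

N_PRE, N_POST, N_INH = 1024, 256, 32            # illustrative dimensions
rng = np.random.default_rng(0)
W_exc = rng.random((N_PRE, N_POST)) * 0.05      # pre => post (excitatory)
W_pre_inh = rng.random((N_PRE, N_INH)) * 0.2    # pre => inh
W_inh_post = rng.random((N_INH, N_POST)) * 0.5  # inh => post (inhibitory)

def post_input(pre_spikes):
    """Net synaptic input to the post layer for one binary spike vector."""
    exc = pre_spikes @ W_exc
    inh_act = pre_spikes @ W_pre_inh                  # inhibitory pool activation
    inh = (inh_act > inh_act.mean()) @ W_inh_post     # inhibitory neurons that fire
    return exc - inh
```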
Neural Spike Sorting Processor (for deep brain sensing & stimulation)
[Figure: raw signal → spike detection & alignment → neuromorphic clustering processor → sorting output; encoder maps 32 samples (8 bits each) to input neurons I_1–I_m, hidden neurons H_1–H_N, and output neurons Z_1–Z_K]
● Signals from invasive electrodes contain spikes from multiple neurons
● Online, unsupervised neuromorphic spike-sorting processor (collaboration with Columbia University, ISLPED 2015)
● Weight update through STDP
● Start with K=2; automatically increase the number of output neurons if the spike difference is large enough (self-organizing map)
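The growing-cluster behavior can be sketched in a few lines of Python (the algorithmic idea, not the STDP hardware): start with K=2 prototypes and add a new output neuron whenever an aligned spike is too different from every existing prototype. The distance threshold and learning rate are assumed values.

```python
import numpy as np

# Minimal self-organizing clustering sketch: grow output neurons on demand.
# THRESHOLD and LR are assumptions for illustration.

THRESHOLD = 4.0     # assumed "spike difference" threshold
LR = 0.05           # assumed learning rate

def cluster_spikes(waveforms, k_init=2):
    """waveforms: array of aligned spikes, each with 32 samples."""
    prototypes = [w.copy() for w in waveforms[:k_init]]   # start with K = 2
    labels = []
    for x in waveforms:
        dists = [np.linalg.norm(x - p) for p in prototypes]
        best = int(np.argmin(dists))
        if dists[best] > THRESHOLD:                 # difference large enough
            prototypes.append(x.copy())             # grow a new output neuron
            best = len(prototypes) - 1
        else:
            prototypes[best] += LR * (x - prototypes[best])  # move winner toward x
        labels.append(best)
    return labels, prototypes
```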
Exp. Results: Clustering Accuracy
[Figures: receptive fields for a dataset containing 4 clusters in 3000 spikes; operating frequency (MHz) vs. VDD (V), with 26 µW/ch and 9.3 µW/ch operating points; clustering accuracy (%) per dataset (D2, D3, D4, D4*, W-D1, W-D2) — proposed avg. 91% vs. Osort-based avg. 69%; spike rates of 70 spikes/s/neuron ([4]) vs. 2.5 spikes/s/neuron; chip layout dominated by synapse array, decoder, and input/output neurons]
● 65nm GP, high-Vth, 0.5x0.5mm² chip
● 9.3µW/ch at 0.3V
● Spike sorting accuracy is more reliable than other low-complexity algorithms such as Osort: avg. accuracy 91% vs. 69%
● Layout and power of the design are dominated by memory elements
Neuromorphic Computing w/ NVMs
● Emerging NVMs (e.g. RRAM) could alleviate the power/area bottleneck of conventional memories
● Read rows in parallel: weighted-sum current
● Peripheral CMOS read: current-to-digital converter
[Figures: 130nm RRAM array + CMOS read circuits (under testing); simulated V_in, RE, and V_spike waveforms (0 to 1.5 V) over 8 ns for a 4ns read timing window]
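A minimal Python sketch of the parallel weighted-sum read, with assumed conductance values and converter resolution: applying read voltages to all selected rows at once produces one summed column current per output, which the peripheral circuit converts to a digital code.

```python
import numpy as np

# Minimal sketch of the parallel weighted-sum read in an RRAM crossbar.
# G[i, j] is the conductance of the cell at row i, column j. All values
# (array size, conductance range, LSB current, bit width) are illustrative.

ROWS, COLS = 64, 16
rng = np.random.default_rng(1)
G = rng.uniform(1e-6, 1e-4, size=(ROWS, COLS))    # cell conductances (S)

def weighted_sum_read(v_in, i_lsb=5e-6, bits=4):
    """v_in: per-row read voltages (V). Returns column currents and digital codes."""
    i_col = v_in @ G                               # Kirchhoff sum: I_j = sum_i V_i * G_ij
    codes = np.clip((i_col / i_lsb).astype(int), 0, 2**bits - 1)
    return i_col, codes

v_in = rng.integers(0, 2, size=ROWS) * 0.5         # 0.5 V read pulse on active rows
currents, codes = weighted_sum_read(v_in)
```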
Summary
● Neuromorphic computing hardware
● 45nm testchip with on-chip STDP learning
● Versatile learning neuromorphic core & architecture
● 65nm spike clustering processor
● Emerging NVM arrays + peripheral read/write circuits
● Future research with circuit-device-architecture co-design and optimization
Collaborators
● ASU
  ● Faculty: Yu Cao, Shimeng Yu, Chaitali Chakrabarti, Sarma Vrudhula, Visar Berisha
  ● Students: Minkyu Kim, Deepak Kadetotad, Shihui Yin, Abinash Mohanty, Yufei Ma
● Intel: Gregory Chen, Ram Krishnamurthy
● Columbia University: Mingoo Seok, Qi Wang