Neural Decoding
Mark van Rossum, School of Informatics, University of Edinburgh
January 2012 (version: January 31, 2018)
Acknowledgements: Chris Williams, and slides from Gatsby and Liam Paninski


Why decoding? Understanding the neural code.
- Given spikes, what was the stimulus?
- What aspects of the stimulus does the system encode? (Capacity is limited.)
- What information can be extracted from spike trains:
  - by "downstream" areas? (Homunculus)
  - by the experimenter? (Ideal observer analysis)
- What is the coding quality?
- Design of neural prosthetic devices.
- Related to encoding, but encoding does not answer the above questions explicitly.

Decoding examples
- Hippocampal place cells: how is location encoded?
- Retinal ganglion cells: what information is sent to the brain? What is discarded?
- Motor cortex: how can we extract as much information as possible from a collection of M1 cells?

Overview
1. Stimulus reconstruction (single spiking neuron, dynamic stimuli)
2. Spike train discrimination (spike based)
3. Stimulus discrimination (single neuron, rate based, static stimulus $s = \{s_a, s_b\}$)
4. Population decoding (multiple neurons, rate based, static stimulus $s \in \mathbb{R}$)
5. Dynamic population decoding ($s(t) \in \mathbb{R}$)

1. Spike train decoding
(Dayan and Abbott §3.4; Rieke, Chapter 2 and Appendix)
- Let $r(t) = \sum_i \delta(t - t_i)$.
- Estimate the stimulus from the spike times $t_i$ so as to minimize, e.g., $\langle (s(t) - s_{est}(t))^2 \rangle$.
- First-order reconstruction:
  $s_{est}(t - \tau_0) = \sum_{t_i} K(t - t_i) - \langle r \rangle \int d\tau\, K(\tau)$
  The second term ensures that $\langle s_{est}(t) \rangle = 0$.
- A delay $\tau_0$ can be included to make decoding easier: predict the stimulus at time $t - \tau_0$ based on spikes up to time $t$ (see causal decoding below).

Acausal minimization
- Minimizing the squared error (similar to Wiener kernels) gives an implicit equation for the optimal $K$:
  $\int_{-\infty}^{\infty} d\tau'\, Q_{rr}(\tau - \tau')\, K(\tau') = Q_{rs}(\tau - \tau_0)$
  where
  $Q_{rr}(\tau - \tau') = \frac{1}{T} \int_0^T dt\, \langle (r(t - \tau) - \langle r \rangle)(r(t - \tau') - \langle r \rangle) \rangle$
  $Q_{rs}(\tau - \tau_0) = \langle r \rangle\, C(\tau_0 - \tau)$
  and $C(\tau) = \langle \frac{1}{n} \sum_i s(t_i - \tau) \rangle$ is the spike-triggered average (STA) from the encoding slides.

Or use Fourier space
- $\tilde{K}(\omega) = \frac{\tilde{Q}_{rs}(\omega)}{\tilde{Q}_{rr}(\omega)} \exp(i \omega \tau_0)$  [Gabbiani and Koch, 1998] (non-causal kernel)
- Note: one can design the stimulus (e.g. Gaussian white noise), but one cannot design the response $r(t)$.
- If $Q_{rr}(\tau) \approx \langle r \rangle \delta(\tau)$ (tends to happen at low rates, hence not very relevant), then $K$ is the STA, so the decoder equals the encoder:
  $K(\tau) = \left\langle \frac{1}{n} \sum_{i=1}^{n} s(t_i + \tau - \tau_0) \right\rangle$
- Note: for a constant Poisson process, $Q_{rr}(\tau) \approx \langle r \rangle \delta(\tau)$.

Quality of reconstruction
- Define the reconstruction quality as $\gamma = 1 - \frac{[\langle (s_{est} - s)^2 \rangle]^{1/2}}{\sigma_s}$.
- An integrate-and-fire neuron transmits more information than a Poisson neuron (cf. encoding).
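As an illustration (not part of the original slides), the following Python sketch simulates a toy rate-modulated Poisson encoder, estimates the acausal decoding kernel in Fourier space via $\tilde K(\omega) = \tilde Q_{rs}(\omega)/\tilde Q_{rr}(\omega)$ (taking $\tau_0 = 0$), performs the first-order reconstruction, and reports the quality $\gamma$. All parameters (rates, time constants, segment length) are arbitrary choices for the example.

```python
# Sketch: acausal first-order stimulus reconstruction with a Wiener-style kernel.
import numpy as np

rng = np.random.default_rng(0)
dt, T = 0.001, 200.0                        # time step (s), total duration (s)
n = int(T / dt)

# Correlated Gaussian stimulus: white noise low-pass filtered (~20 ms correlation time)
s = np.convolve(rng.normal(size=n), np.exp(-np.arange(200) * dt / 0.02), mode="same")
s = (s - s.mean()) / s.std()

# Toy encoder (assumption): inhomogeneous Poisson spikes with rate 50 + 30*s Hz
rate = np.clip(50.0 + 30.0 * s, 0.0, None)
r = (rng.random(n) < rate * dt) / dt        # spike train as a density in Hz
dr = r - r.mean()

# Estimate Q_rr and Q_rs by averaging periodograms over segments
nfft = 1024
nseg = n // nfft
DR = np.fft.rfft(dr[: nseg * nfft].reshape(nseg, nfft), axis=1)
S = np.fft.rfft(s[: nseg * nfft].reshape(nseg, nfft), axis=1)
Q_rr = (np.abs(DR) ** 2).mean(axis=0)       # response power spectrum (up to a factor)
Q_rs = (np.conj(DR) * S).mean(axis=0)       # response-stimulus cross-spectrum

# Optimal acausal kernel: inverse transform of Q_rs/Q_rr (common factors cancel in the ratio)
K = np.fft.irfft(Q_rs / (Q_rr + 1e-12), nfft) / dt
L = 300                                     # keep lags from -300 ms to +300 ms
K_centered = np.concatenate([K[-L:], K[: L + 1]])

# s_est(t) = sum_i K(t - t_i) - <r> int K  ==  convolution of K with (r - <r>)
s_est = np.convolve(dr, K_centered, mode="same") * dt

gamma = 1.0 - np.sqrt(np.mean((s_est - s) ** 2)) / s.std()
print(f"reconstruction quality gamma = {gamma:.2f}")
```

In practice one would average spectra over trials rather than segments of a single run, but the structure of the estimate is the same.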

Causal decoding
- The organism faces a causal (on-line) decoding problem.
- Prediction of the current/future stimulus requires temporal correlation of the stimulus. Example: in the head-direction system the neural code correlates best with the future direction.
- Requires $K(t - t_i) = 0$ for $t \le t_i$:
  $s_{est}(t - \tau_0) = \sum_{t_i} K(t - t_i) - \langle r \rangle \int d\tau\, K(\tau)$
- The delay $\tau_0$ buys extra time.

H1 neuron of the fly
- [Figure: solid line is the reconstruction using the acausal filter. Note that the reconstruction quality will depend on the stimulus.] [Dayan and Abbott (2001) after Rieke et al (1997)]

Causal decoding (continued)
- Finding the optimal kernel while imposing causality analytically is harder. Options:
  - hope that $K(\tau) = 0$ for $\tau < 0$ and that $\tau_0$ is sufficiently large
  - Wiener-Hopf method (spectral factorization)
  - expand $K(\tau)$ using a causal basis
  - use a discrete formulation

Causality
- [Figure: delay $\tau_0 = 160$ ms; C shows the full (non-causal) kernel.] [Dayan and Abbott (2001)]
- At time $t$, estimate $s(t - \tau_0)$:
  - spikes 1..4 contribute because the stimulus is correlated (right tail of $K$)
  - spikes 5..7 contribute because of $\tau_0$
  - spikes 8, 9, ... have not occurred yet.
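A small sketch (again not from the slides) of the delayed, causal first-order reconstruction: the kernel is restricted to non-negative lags, so at time $t$ only spikes that have already occurred contribute to the estimate of $s(t - \tau_0)$. The kernel shape and spike times below are made-up toy values; in practice $K$ would come from one of the methods listed above.

```python
# Sketch: causal, delayed reconstruction s_est(t - tau0) = sum_i K(t - t_i) - <r> int K.
import numpy as np

def causal_reconstruction(spike_times, K, dt, T, tau0):
    """Estimate the stimulus at times t - tau0 using only spikes up to time t.

    spike_times : spike times t_i in seconds
    K           : kernel samples K(0), K(dt), K(2*dt), ... (causal by construction)
    """
    n = int(T / dt)
    r = np.zeros(n)                                 # binned spike train as a density (Hz)
    idx = (np.asarray(spike_times) / dt).astype(int)
    r[idx[idx < n]] += 1.0 / dt
    mean_r = r.mean()
    # sum_i K(t - t_i): causal convolution, the kernel only looks backwards in time
    summed = np.convolve(r, K, mode="full")[:n] * dt
    s_est_t = summed - mean_r * K.sum() * dt        # subtract <r> * integral of K
    # s_est_t[j] estimates s at time t_j - tau0; shift back by tau0 to align with
    # the stimulus time axis (the last tau0 of the estimate is not yet available)
    shift = int(round(tau0 / dt))
    s_est = np.full(n, np.nan)
    s_est[: n - shift] = s_est_t[shift:]
    return s_est

# Example with toy values: an exponential causal kernel and a handful of spikes
dt, T, tau0 = 0.001, 1.0, 0.16                      # tau0 = 160 ms, as in the fly example
K = np.exp(-np.arange(0, 0.3, dt) / 0.03)           # K(tau) = 0 for tau < 0 by construction
s_est = causal_reconstruction([0.12, 0.31, 0.33, 0.50], K, dt, T, tau0)
```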

Higher order reconstruction
- Build a library of spike patterns (up to triplets).
- Measure the mean and covariance of $P(s \mid \{t_0, t_1, \ldots\})$.
- Reconstruct with a weighted sum of the means, §A6 [Rieke et al., 1996].

Conclusions on stimulus reconstruction
- Stimulus reconstruction is similar to the encoding problem, but:
  - the response is given and cannot be chosen to be white
  - imposing causality adds realism but reduces quality.
- The reconstruction problem can be ill-posed. It is not always possible to reconstruct the stimulus (cf. dictionary); for instance, a complex cell. Still, the cell provides information about the stimulus.
- One could try to read the code, rather than reconstruct the stimulus (e.g. ideal observer).

2. Spike train discrimination
- Given two spike trains: how similar are they, or how do they compare to a template?
- Problem: very high dimensional space.
- Example: cricket auditory neuron in response to 2 songs, 5 repeats/song [Machens et al., 2003].

Spike distances
- Simpler algorithm: convolve (filter) each train with an exponential, $\tilde f(t) = \sum_{t_i < t} \exp(-(t - t_i)/t_c)$, and calculate the $L_2$ distance
  $D^2 = \frac{1}{t_c} \int_0^T dt\, [\tilde f(t) - \tilde g(t)]^2$
- 'Edit distance' with two processes [Victor and Purpura, 1997]:
  - deleting/inserting a spike costs 1
  - moving a spike costs $\frac{1}{2}[1 - \exp(-|\delta t|/\tau)]$, with parameter $\tau$.
- Similar to the coherence between trains [van Rossum, 2001].
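To make the filter-based distance concrete, here is a minimal numerical sketch of the formula above; the spike times, time step and time constant are arbitrary example values.

```python
# Sketch: exponential-filter spike-train distance D^2 = (1/t_c) * int [f(t) - g(t)]^2 dt.
import numpy as np

def exp_filter_distance(train1, train2, t_c=0.01, dt=0.0005, T=1.0):
    """Squared distance between two spike trains (spike times in seconds)."""
    t = np.arange(0.0, T, dt)

    def filtered(train):
        f = np.zeros_like(t)
        for ti in train:
            f += (t > ti) * np.exp(-(t - ti) / t_c)   # each spike adds a decaying tail
        return f

    f, g = filtered(train1), filtered(train2)
    return np.sum((f - g) ** 2) * dt / t_c

# Identical trains give 0; a single isolated extra spike contributes roughly 1/2
print(exp_filter_distance([0.1, 0.4], [0.1, 0.4]))        # -> 0.0
print(exp_filter_distance([0.1, 0.4], [0.1, 0.4, 0.8]))   # -> approximately 0.5
```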

Spike distances: application
- Application to cricket auditory neurons: play the songs repeatedly and discriminate [Machens et al., 2003].
- Using the spike distance to measure intrinsic noise.
- Optimal discrimination when $\tau$ is similar to the neural integration time.

3. Stimulus discrimination
(Dayan and Abbott §3.2)
- $p(s \mid r)$, where $r$ is the response across neurons and/or time.
- In general $s$ can be continuous, e.g. speed.
- First, discrimination, i.e. distinguishing between two (or more) alternatives (e.g. stimulus or no stimulus).
- For now, no time-dependent problems.

SNR and ROC curves
- Discriminate between the response distributions $P(r_1)$ and $P(r_2)$.
- For Gaussian distributed responses, define a single number:
  $\mathrm{SNR} = \frac{2\,[\langle r_1 \rangle - \langle r_2 \rangle]^2}{\mathrm{var}(r_1) + \mathrm{var}(r_2)}$
  Note: $\mathrm{SNR} = \frac{2\,|\langle r_1 \rangle - \langle r_2 \rangle|}{\mathrm{sd}(r_1) + \mathrm{sd}(r_2)}$ is also used; neither is principled when $\mathrm{var}(r_1) \ne \mathrm{var}(r_2)$.
- ROC: vary the decision threshold and measure error rates.
  - A larger area under the curve means better discriminability.
  - The shape relates to the underlying distributions.
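The following sketch (not from the slides) computes both SNR variants above and an empirical ROC curve for two simulated Gaussian response distributions; the means, standard deviations and sample sizes are arbitrary.

```python
# Sketch: SNR and an empirical ROC curve for two response distributions.
import numpy as np

rng = np.random.default_rng(1)
r1 = rng.normal(10.0, 3.0, size=5000)   # responses to stimulus 1 (e.g. spike counts)
r2 = rng.normal(14.0, 3.0, size=5000)   # responses to stimulus 2

snr_var = 2 * (r1.mean() - r2.mean()) ** 2 / (r1.var() + r2.var())
snr_sd = 2 * abs(r1.mean() - r2.mean()) / (r1.std() + r2.std())

# ROC: for each threshold z, hit rate = P(r2 > z), false-alarm rate = P(r1 > z)
thresholds = np.linspace(min(r1.min(), r2.min()), max(r1.max(), r2.max()), 200)
hits = np.array([(r2 > z).mean() for z in thresholds])[::-1]
fas = np.array([(r1 > z).mean() for z in thresholds])[::-1]

# Approximate area under the ROC curve by the trapezoid rule
auc = np.sum(0.5 * (hits[1:] + hits[:-1]) * np.diff(fas))

print(f"SNR (variance form) = {snr_var:.2f}, SNR (sd form) = {snr_sd:.2f}, AUC = {auc:.3f}")
```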

Readout of a single MT neuron
- [Britten et al., 1992]
- Some single neurons do as well as the animal!
- The possibility for averaging might be limited due to correlation?
- The population might still be faster [Cohen and Newsome, 2009].

Readout of object identity from macaque IT cortex [Hung et al., 2005]
- Recording from ~300 sites in the inferior temporal (IT) cortex.
- Present images of 77 stimuli (of different objects) at various locations and scales in the visual field.
- The task is to categorize the objects into 8 classes, or to identify all 77 objects.
- Predictions are based on one-vs-rest linear SVM classifiers, using data in 50 ms bins from 100 ms to 300 ms after stimulus onset.

What does this tell us?
- The performance of such classifiers provides a lower bound on the information available in the population activity.
- If neurons were measured independently (the paper is unclear), correlations are ignored. Correlation could limit or enhance information...
- Distributed representation.
- A linear classifier can plausibly be implemented in neural hardware. [Hung et al., 2005]
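As a schematic illustration of this kind of population readout (this is not the Hung et al. analysis pipeline, and all numbers are invented), one can simulate noisy population responses to a handful of object classes and train a one-vs-rest linear SVM; scikit-learn is assumed to be available.

```python
# Toy sketch: linear readout of object class from simulated population responses.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_neurons, n_classes, n_trials = 256, 8, 40

# Each class evokes a fixed mean response pattern; trials add Poisson noise
mean_patterns = rng.gamma(shape=2.0, scale=5.0, size=(n_classes, n_neurons))
X = np.vstack([rng.poisson(mean_patterns[c], size=(n_trials, n_neurons))
               for c in range(n_classes)]).astype(float)
y = np.repeat(np.arange(n_classes), n_trials)

# LinearSVC performs one-vs-rest classification by default
clf = LinearSVC(C=1.0, max_iter=10000)
acc = cross_val_score(clf, X, y, cv=5).mean()
print(f"cross-validated accuracy: {acc:.2f} (chance = {1 / n_classes:.2f})")
```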

Visual system decoding: independence
- [Abbott et al., 1996] Face cells, rate integrated over 500 ms, extrapolated to a large number of stimuli.
- Extract face identity from the population response.
- Coding is almost independent! (for these small ensembles)

4. Population encoding
(Dayan and Abbott §3.3)
- Population encoding uses a large number of neurons to represent information.
- Advantage 1: reduction of uncertainty due to neuronal variability (improves reaction time).
- Advantage 2: the ability to represent a number of different stimulus attributes simultaneously (e.g. in V1: location and orientation).

Cricket cercal system
- At low velocities, information about wind direction is encoded by just four interneurons, with tuning curves
  $\frac{\langle f(s) \rangle}{r_{max}} = [\cos(s - s_a)]_+$
  [Dayan and Abbott (2001) after Theunissen and Miller (1991)]
- Let $\vec c_a$ denote a unit vector in the direction of $s_a$, and $\vec v$ a unit vector parallel to the wind velocity; then
  $\frac{\langle f(s) \rangle}{r_{max}} = [\vec v \cdot \vec c_a]_+$
- Crickets are Cartesian: 4 directions, $45^\circ, 135^\circ, -135^\circ, -45^\circ$.
- The population vector is defined as
  $\vec v_{pop} = \sum_{a=1}^{4} \left( \frac{r}{r_{max}} \right)_a \vec c_a$
- Note: rate coding is assumed.
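A minimal sketch of this population-vector decoder for the four cercal interneurons (the $r_{max}$ value and the noiseless setting are idealizations for illustration):

```python
# Sketch: population-vector decoding of wind direction from four cercal interneurons.
import numpy as np

preferred_deg = np.array([45.0, 135.0, -135.0, -45.0])
c = np.stack([np.cos(np.deg2rad(preferred_deg)),
              np.sin(np.deg2rad(preferred_deg))], axis=1)   # unit vectors c_a

def mean_rates(wind_deg, r_max=40.0):
    """Tuning curves from the slide: <f(s)>/r_max = [cos(s - s_a)]_+ ."""
    return r_max * np.maximum(np.cos(np.deg2rad(wind_deg - preferred_deg)), 0.0)

def population_vector(rates, r_max=40.0):
    """v_pop = sum_a (r_a / r_max) c_a; the decoded direction is its angle."""
    v = (rates / r_max) @ c
    return np.rad2deg(np.arctan2(v[1], v[0]))

# For noiseless rates in this Cartesian geometry the decoded angle matches exactly
for wind in (30.0, 70.0, -120.0):
    print(f"wind {wind:7.1f} deg -> decoded {population_vector(mean_rates(wind)):7.1f} deg")
```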

Primary motor cortex (M1)
- Certain neurons in M1 of the monkey can be described by cosine functions of the arm movement direction (Georgopoulos et al, 1982). [Dayan and Abbott (2001) after Kandel et al (1991)]
- Similar to the cricket cercal system, but note:
  - non-zero offset rates $r_0$:
    $\frac{\langle f(s) \rangle - r_0}{r_{max}} = \vec v \cdot \vec c_a$
  - non-orthogonal: there are many thousands of M1 neurons that have arm-movement-related tuning curves.

Vector method of decoding
- [Figure: Dayan and Abbott (2001) after Salinas and Abbott (1994)]

Optimal decoding
- Calculate $p(s \mid r) = \frac{p(r \mid s)\, p(s)}{p(r)}$.
- Maximum likelihood (ML) decoding: $\hat s = \mathrm{argmax}_s\, p(r \mid s)$.
- Maximum a posteriori (MAP): $\hat s = \mathrm{argmax}_s\, p(s)\, p(r \mid s)$.
- Bayes: minimize the loss, $s_B = \mathrm{argmin}_{s^*} \int L(s, s^*)\, p(s \mid r)\, ds$.
- For the squared loss $L(s, s^*) = (s - s^*)^2$, the optimal $s^*$ is the posterior mean, $s_B = \int s\, p(s \mid r)\, ds$.
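To illustrate these three estimators (not from the slides), the sketch below decodes a scalar stimulus from noisy responses of a toy population with Gaussian tuning curves, evaluating the likelihood on a grid; the tuning curves, noise model and prior are arbitrary assumptions.

```python
# Sketch: ML, MAP and Bayes (posterior-mean) decoding of a scalar stimulus on a grid.
import numpy as np

rng = np.random.default_rng(3)
pref = np.linspace(-10, 10, 21)                      # preferred stimuli of 21 toy neurons

def tuning(s):                                       # Gaussian tuning curves (assumption)
    return 20.0 * np.exp(-0.5 * ((s - pref) / 2.0) ** 2) + 2.0

s_true = 3.2
r = rng.poisson(tuning(s_true))                      # observed responses (Poisson noise)

s_grid = np.linspace(-12, 12, 2401)
f = tuning(s_grid[:, None])                          # shape (grid points, neurons)
log_lik = np.sum(r * np.log(f) - f, axis=1)          # Poisson log p(r|s), up to a constant
log_prior = -0.5 * (s_grid / 5.0) ** 2               # Gaussian prior, sd = 5 (assumption)

s_ml = s_grid[np.argmax(log_lik)]                    # argmax_s p(r|s)
s_map = s_grid[np.argmax(log_lik + log_prior)]       # argmax_s p(s) p(r|s)
post = np.exp(log_lik + log_prior - np.max(log_lik + log_prior))
post /= post.sum()
s_bayes = np.sum(s_grid * post)                      # posterior mean (squared loss)

print(f"true {s_true}, ML {s_ml:.2f}, MAP {s_map:.2f}, Bayes {s_bayes:.2f}")
```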
