Neural Decoding
Matthias Hennig, School of Informatics, University of Edinburgh, January 2019
Acknowledgements: Mark van Rossum, Chris Williams and slides from Gatsby (Liam Paninski).
Decoding brain activity
Classification: which one?
Reconstruction: homunculus.
The Homunculus
Flynculus [Rieke et al., 1996]
Overview
1. Stimulus discrimination, signal detection theory
2. Maximum likelihood and MAP decoding
3. Bounds and Fisher information
4. Spike train decoding and GLMs
Why decoding?
Understanding the neural code: given the spikes, what was the stimulus?
What aspects of the stimulus does the system encode? (Capacity is limited.)
What information can be extracted from spike trains:
- by "downstream" areas? (the homunculus)
- by the experimenter? (ideal observer analysis)
What is the coding quality?
Design of neural prosthetic devices.
Decoding is related to encoding, but an encoding model does not answer these questions explicitly.
Decoding examples
Hippocampal place cells: how is location encoded?
Retinal ganglion cells: what information is sent to the brain? What is discarded?
Motor cortex: how can we extract as much information as possible from a collection of M1 cells?
Decoding theory
Prior probability of the stimulus: $P(s)$
Probability of a measured neural response: $P(r)$
Joint probability of stimulus and response: $P(r, s)$
Conditional probabilities: $P(r \mid s)$, $P(s \mid r)$
Marginal: $P(r) = \sum_s P(r \mid s)\, P(s)$
Note: $P(r, s) = P(r \mid s)\, P(s) = P(s \mid r)\, P(r)$
Bayes' theorem: $P(s \mid r) = \dfrac{P(r \mid s)\, P(s)}{P(r)}$
Note that we need to know the stimulus statistics $P(s)$.
Example: discrimination between two stimuli
The test subject reports left or right motion (two-alternative forced choice, 2AFC). See [Dayan and Abbott, 2002], chapter 3.2.
MT neurons in this task [Britten et al., 1992]
Some single neurons do as well as the animal!
Could the benefit of averaging over neurons be limited by correlations between them?
Might the population still be better or faster? [Cohen and Newsome, 2009]
[Britten et al., 1992]
Assuming the rate histograms are Gaussian with equal variance $\sigma^2$, the discriminability is
$d' = \dfrac{\langle r \rangle_+ - \langle r \rangle_-}{\sigma}$
Discriminate between the response distributions $P(r \mid -)$ and $P(r \mid +)$ (directions + and −), using a discrimination threshold $z$ on the firing rate:
Hit rate: $\beta(z) = P(r \ge z \mid +)$
False alarm rate: $\alpha(z) = P(r \ge z \mid -)$

stimulus | correct | false
+ | $\beta$ | $1 - \beta$
− | $1 - \alpha$ | $\alpha$

Probability of a correct answer: $(\beta(z) + 1 - \alpha(z))/2$. This can be used to find the optimal $z$.
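A minimal Python/NumPy sketch of the threshold search. It assumes equal-variance Gaussian rate histograms with illustrative, hypothetical values for `mu_minus`, `mu_plus` and `sigma` (not data from Britten et al.), and equal prior probability of the two directions, which is what the factor 1/2 in $P(\text{correct})$ implies. It scans $z$ on a grid and keeps the value that maximizes $(\beta(z) + 1 - \alpha(z))/2$.

```python
import math

import numpy as np

# Hypothetical equal-variance Gaussian rate histograms (Hz), illustrative only
mu_minus, mu_plus, sigma = 20.0, 30.0, 5.0


def prob_above(z, mu, sd):
    """P(r >= z) for r ~ N(mu, sd^2)."""
    return 0.5 * math.erfc((z - mu) / (sd * math.sqrt(2.0)))


z_grid = np.linspace(0.0, 50.0, 5001)                                # candidate thresholds
beta = np.array([prob_above(z, mu_plus, sigma) for z in z_grid])     # hit rate
alpha = np.array([prob_above(z, mu_minus, sigma) for z in z_grid])   # false alarm rate
p_correct = (beta + 1.0 - alpha) / 2.0                               # equal priors assumed

best = np.argmax(p_correct)
z_opt, p_opt = z_grid[best], p_correct[best]
print(f"optimal threshold z = {z_opt:.2f}, P(correct) = {p_opt:.4f}")
```

For equal variances the optimum lies midway between the two means (25 Hz here), where $P(\text{correct}) = \Phi(d'/2)$ with $\Phi$ the standard normal CDF; with $d' = 2$ this is about 0.841.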
ROC curves
Discriminate between the response distributions $P(r \mid -)$ and $P(r \mid +)$.
The Receiver Operating Characteristic (ROC) gives graphical intuition: vary the decision threshold and plot the hit rate $\beta$ against the false alarm rate $\alpha$.
A larger area under the curve means better discriminability.
The shape of the curve relates to the underlying distributions.
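The threshold sweep can be sketched empirically. This assumes simulated Gaussian trial-by-trial responses with made-up parameters; every observed response is used as a candidate threshold, and the area under the resulting ROC curve is computed with the trapezoidal rule.

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulated responses on "-" and "+" trials (hypothetical Gaussian rates, Hz)
r_minus = rng.normal(20.0, 5.0, size=2000)
r_plus = rng.normal(30.0, 5.0, size=2000)

# Use every observed response as a threshold, swept from high to low,
# so that both alpha and beta increase along the curve.
thresholds = np.sort(np.concatenate([r_minus, r_plus]))[::-1]
alpha = np.array([np.mean(r_minus >= z) for z in thresholds])   # false alarm rate
beta = np.array([np.mean(r_plus >= z) for z in thresholds])     # hit rate

# Start the curve at (0, 0); the lowest threshold already gives (1, 1).
alpha = np.concatenate([[0.0], alpha])
beta = np.concatenate([[0.0], beta])

# Area under the ROC curve (trapezoidal rule)
auc = np.sum(np.diff(alpha) * (beta[1:] + beta[:-1]) / 2.0)
print(f"area under ROC curve: {auc:.3f}")
```

With these parameters ($d' = 2$) the area should come out near 0.92, up to sampling noise; plotting `beta` against `alpha` gives the curve itself.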
[Britten et al., 1992]
$P(\text{correct}) = \int_0^1 \beta \, d\alpha$
When the responses are Gaussian:
$P(\text{correct}) = \frac{1}{2} \operatorname{erfc}\left( \dfrac{\langle r \rangle_- - \langle r \rangle_+}{2\sigma} \right)$
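A quick numerical check that the two expressions agree, assuming hypothetical equal-variance Gaussian response distributions: integrate $\beta\, d\alpha$ along the analytic ROC curve with the trapezoidal rule and compare with the erfc formula (`math.erfc` from the Python standard library).

```python
import math

import numpy as np

mu_minus, mu_plus, sigma = 20.0, 30.0, 5.0   # hypothetical Gaussian parameters (Hz)


def prob_above(z, mu, sd):
    """P(r >= z) for r ~ N(mu, sd^2)."""
    return 0.5 * math.erfc((z - mu) / (sd * math.sqrt(2.0)))


# Thresholds from high to low, reaching well into the tails of both distributions
z = np.linspace(100.0, -50.0, 20001)
alpha = np.array([prob_above(zi, mu_minus, sigma) for zi in z])
beta = np.array([prob_above(zi, mu_plus, sigma) for zi in z])

p_numeric = np.sum(np.diff(alpha) * (beta[1:] + beta[:-1]) / 2.0)   # integral of beta d(alpha)
p_formula = 0.5 * math.erfc((mu_minus - mu_plus) / (2.0 * sigma))
print(f"{p_numeric:.6f} vs {p_formula:.6f}")
```

Both should give about 0.9214 for $d' = 2$.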
Population decoding [Dayan and Abbott (2001) after Theunissen and Miller (1991)]
Cricket cercal system: information about wind direction is encoded by four types of neurons
$\left( \dfrac{\langle f(s) \rangle}{r_{\max}} \right)_a = [\cos(s - s_a)]_+$
Let $\vec{c}_a$ denote a unit vector in the direction of $s_a$, and $\vec{v}$ a unit vector parallel to the wind velocity:
$\left( \dfrac{\langle f(s) \rangle}{r_{\max}} \right)_a = [\vec{v} \cdot \vec{c}_a]_+$
Crickets are Cartesian: the 4 preferred directions are 45°, 135°, −135°, −45°.
The population vector is defined as
$\vec{v}_{\text{pop}} = \sum_{a=1}^{4} \left( \dfrac{r}{r_{\max}} \right)_a \vec{c}_a$
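A minimal sketch of the population vector for the four cercal interneuron types, using the preferred directions above; `r_max` and the additive Gaussian noise level are assumed values. For any wind direction at most two adjacent cells are active, and their $\vec{c}_a$ are orthogonal unit vectors, so the noise-free population vector points exactly along $\vec{v}$.

```python
import numpy as np

# Preferred directions s_a of the four interneuron types (degrees)
pref_deg = np.array([45.0, 135.0, -135.0, -45.0])
c = np.stack([np.cos(np.deg2rad(pref_deg)), np.sin(np.deg2rad(pref_deg))], axis=1)  # rows are c_a

r_max = 40.0  # maximal rate in Hz (assumed)


def mean_rates(s_deg):
    """Mean rates r_max [v . c_a]_+ for wind direction s (degrees)."""
    v = np.array([np.cos(np.deg2rad(s_deg)), np.sin(np.deg2rad(s_deg))])
    return r_max * np.clip(c @ v, 0.0, None)


def population_vector(r):
    """Decode the wind direction (degrees) from v_pop = sum_a (r_a / r_max) c_a."""
    v_pop = (r / r_max) @ c
    return np.rad2deg(np.arctan2(v_pop[1], v_pop[0]))


rng = np.random.default_rng(2)
s_true = 100.0
r_noisy = mean_rates(s_true) + rng.normal(0.0, 2.0, size=4)   # additive noise (assumed)
print(population_vector(mean_rates(s_true)), population_vector(r_noisy))
```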
Vector method of decoding [Dayan and Abbott (2001) after Salinas and Abbott (1994)]
Primary motor cortex (M1)
Certain neurons in monkey M1 can be described by cosine functions of arm movement direction (Georgopoulos et al., 1982).
Similar to the cricket cercal system, but note:
Non-zero offset rates $r_0$: $\left( \dfrac{\langle f(s) \rangle - r_0}{r_{\max}} \right)_a = \vec{v} \cdot \vec{c}_a$
Non-orthogonal: there are many thousands of M1 neurons with arm-movement-related tuning curves, so their preferred directions cannot all be mutually orthogonal.
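The population vector with an offset can be sketched for an M1-like population. The number of cells, the uniformly random preferred directions, the counting window and the values of `r0` and `r_max` are all assumptions for illustration. The offset is subtracted before weighting; because the $\vec{c}_a$ are not orthogonal, the decoded direction is only approximately correct, with an error that shrinks as the preferred directions cover the circle more uniformly.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 1000                                                   # number of cells (assumed)
theta_a = rng.uniform(0.0, 2.0 * np.pi, size=N)            # random preferred directions
c = np.stack([np.cos(theta_a), np.sin(theta_a)], axis=1)   # rows are c_a, not orthogonal
r0, r_max = 30.0, 20.0   # offset and modulation in Hz (assumed; r0 >= r_max keeps rates >= 0)


def mean_rates(move_deg):
    """Cosine tuning with offset: f_a = r0 + r_max (v . c_a)."""
    v = np.array([np.cos(np.deg2rad(move_deg)), np.sin(np.deg2rad(move_deg))])
    return r0 + r_max * (c @ v)


def population_vector(r):
    """Subtract the offset, weight each c_a, and return the decoded direction (degrees)."""
    v_pop = ((r - r0) / r_max) @ c
    return np.rad2deg(np.arctan2(v_pop[1], v_pop[0]))


T = 0.5                                        # counting window in s (assumed)
counts = rng.poisson(mean_rates(60.0) * T)     # Poisson spike counts
print(population_vector(mean_rates(60.0)), population_vector(counts / T))
```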
[Dayan and Abbott (2001) after Kandel et al (1991)]
Optimal decoding
$p(s \mid r) = \dfrac{p(r \mid s)\, p(s)}{p(r)}$
Maximum likelihood decoding (ML): $\hat{s} = \operatorname{argmax}_s\, p(r \mid s)$
Maximum a posteriori decoding (MAP): $\hat{s} = \operatorname{argmax}_s\, p(s)\, p(r \mid s)$
Note that these two are equivalent if $p(s)$ is flat.
Bayes: minimize the expected loss
$s_B = \operatorname{argmin}_{s^*} \int L(s, s^*)\, p(s \mid r)\, ds$
For the squared loss $L(s, s^*) = (s - s^*)^2$, the optimal $s^*$ is the posterior mean, $s_B = \int s\, p(s \mid r)\, ds$.
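A grid-based sketch of the three estimators for a toy problem in which everything is Gaussian: prior $s \sim N(0, 1)$ and response $r = s + \text{noise}$ with noise standard deviation 2. These choices are assumptions made so the answers are known in closed form: ML returns $r$, while MAP and the posterior mean both equal $r/5$ because the posterior is Gaussian. With a skewed prior, MAP and the posterior mean would differ.

```python
import numpy as np

s_grid = np.linspace(-10.0, 10.0, 20001)     # stimulus grid
ds = s_grid[1] - s_grid[0]
sigma_prior, sigma_noise = 1.0, 2.0          # assumed toy model
r = 1.5                                      # observed response (assumed)

log_prior = -0.5 * (s_grid / sigma_prior) ** 2          # log p(s) + const
log_like = -0.5 * ((r - s_grid) / sigma_noise) ** 2     # log p(r|s) + const
log_post = log_like + log_prior                         # log p(s|r) + const

s_ml = s_grid[np.argmax(log_like)]       # maximum likelihood
s_map = s_grid[np.argmax(log_post)]      # maximum a posteriori

post = np.exp(log_post - log_post.max())
post /= np.sum(post) * ds                # normalise (the division by p(r))
s_bayes = np.sum(s_grid * post) * ds     # posterior mean: Bayes estimate for squared loss

print(s_ml, s_map, s_bayes)
```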
Optimal decoding for the cricket
For the cercal system, assuming independent noise,
$p(r \mid s) = \prod_a p(r_a \mid s)$
where each $p(r_a \mid s)$ is modelled as a Gaussian with its own mean and variance.
$p(s)$ is uniform (hence MAP = ML).
ML decoding finds the peak of the likelihood.
The Bayesian method finds the posterior mean.
These methods improve performance over the vector method, but not by much, because the preferred directions are already orthogonal.
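A sketch of ML and Bayesian decoding for the cercal model on a grid of directions. It assumes independent Gaussian noise with the same standard deviation `sigma` for every cell around the rectified-cosine tuning curves, and a flat prior. Because the stimulus is an angle, the Bayesian estimate here is the circular mean of the posterior, a substitution for the plain posterior mean. The loop compares RMS errors with the vector method on simulated trials; the numbers depend on the assumed noise model.

```python
import numpy as np

pref = np.deg2rad(np.array([45.0, 135.0, -135.0, -45.0]))
c = np.stack([np.cos(pref), np.sin(pref)], axis=1)        # rows are c_a
r_max, sigma = 40.0, 4.0                                   # Hz (assumed)

s_grid = np.deg2rad(np.arange(-180.0, 180.0, 0.1))         # candidate directions
V = np.stack([np.cos(s_grid), np.sin(s_grid)], axis=1)
F = r_max * np.clip(V @ c.T, 0.0, None)                    # tuning curves f_a(s) on the grid


def mean_rates(s):
    return r_max * np.clip(c @ np.array([np.cos(s), np.sin(s)]), 0.0, None)


def log_likelihood(r):
    # Independent Gaussian noise: log p(r|s) = -sum_a (r_a - f_a(s))^2 / (2 sigma^2) + const
    return -np.sum((r - F) ** 2, axis=1) / (2.0 * sigma ** 2)


def decode_ml(r):
    return s_grid[np.argmax(log_likelihood(r))]


def decode_bayes(r):
    # Flat prior, so the posterior is proportional to the likelihood; circular mean for an angle
    ll = log_likelihood(r)
    w = np.exp(ll - ll.max())
    return np.arctan2(np.sum(w * np.sin(s_grid)), np.sum(w * np.cos(s_grid)))


def decode_vector(r):
    v = (r / r_max) @ c
    return np.arctan2(v[1], v[0])


def ang_err(a, b):
    return np.angle(np.exp(1j * (a - b)))     # wrapped difference in radians


rng = np.random.default_rng(4)
errs = {"ml": [], "bayes": [], "vector": []}
for _ in range(2000):
    s = rng.uniform(-np.pi, np.pi)
    r = mean_rates(s) + rng.normal(0.0, sigma, size=4)
    errs["ml"].append(ang_err(decode_ml(r), s))
    errs["bayes"].append(ang_err(decode_bayes(r), s))
    errs["vector"].append(ang_err(decode_vector(r), s))

rms_deg = {k: float(np.rad2deg(np.sqrt(np.mean(np.square(e))))) for k, e in errs.items()}
print(rms_deg)
```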
Cricket Cercal System [Dayan and Abbott (2001) after Salinas and Abbott (1994)]
General Consideration of Population Decoding [Dayan and Abbott (2001)] Gaussian tuning curves.
Poisson firing model: over a time $T$, neuron $a$ emits $n_a = r_a T$ spikes.
$p(r \mid s) = \prod_{a=1}^{N} \dfrac{(f_a(s) T)^{n_a}}{n_a!} \exp(-f_a(s) T)$
$\log p(r \mid s) = \sum_{a=1}^{N} n_a \log f_a(s) + \ldots$
The terms in $\ldots$ are independent of $s$, given the assumption that $\sum_a f_a(s)$ is independent of $s$ (the summed average firing rate of all neurons is the same for every stimulus).
ML decoding
$s_{ML}$ is the stimulus that maximizes $\log p(r \mid s)$, determined by
$\sum_{a=1}^{N} r_a \dfrac{f'_a(s_{ML})}{f_a(s_{ML})} = 0$
If all tuning curves are Gaussian, $f_a = A \exp[-(s - s_a)^2 / 2\sigma_w^2]$, then
$s_{ML} = \dfrac{\sum_a r_a s_a}{\sum_a r_a}$
which is simple and intuitive, and is known as the center of mass (cf. the population vector).
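A sketch comparing the center-of-mass formula with a brute-force grid maximization of $\sum_a n_a \log f_a(s)$ for Poisson counts from Gaussian tuning curves. `A`, `sigma_w`, `T` and the evenly spaced preferred stimuli `s_a` are assumptions. As on the previous slide, the $-T \sum_a f_a(s)$ term is dropped, which is reasonable only when the tuning curves tile the stimulus range densely; under that approximation the two answers coincide up to the grid spacing.

```python
import numpy as np

rng = np.random.default_rng(5)
A, sigma_w, T = 50.0, 1.0, 0.5               # peak rate (Hz), tuning width, window (s): assumed
s_a = np.arange(-10.0, 10.0 + 1e-9, 0.5)     # evenly spaced preferred stimuli (assumed)


def tuning(s):
    """Gaussian tuning curves f_a(s) = A exp(-(s - s_a)^2 / (2 sigma_w^2))."""
    return A * np.exp(-(s - s_a) ** 2 / (2.0 * sigma_w ** 2))


s_true = 0.3
n = rng.poisson(tuning(s_true) * T)          # spike counts n_a
r = n / T                                    # rates r_a

# Brute force: maximize sum_a n_a log f_a(s) over a fine grid
s_grid = np.linspace(-5.0, 5.0, 20001)
log_f = np.log(A) - (s_grid[:, None] - s_a[None, :]) ** 2 / (2.0 * sigma_w ** 2)
s_ml = s_grid[np.argmax(log_f @ n)]

# Closed form: center of mass of the population response
s_com = np.sum(r * s_a) / np.sum(r)
print(s_ml, s_com)
```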
Accuracy of the estimator
Bias and variance of an estimator $s_{est}$:
$b_{est}(s) = \langle s_{est} \rangle - s$
$\sigma^2_{est}(s) = \langle (s_{est} - \langle s_{est} \rangle)^2 \rangle$
$\langle (s - s_{est})^2 \rangle = b^2_{est}(s) + \sigma^2_{est}$
Thus for an unbiased estimator, the mean squared error $\langle (s - s_{est})^2 \rangle$ is given by $\sigma^2_{est}$, the variance of the estimator.
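A Monte Carlo check of the decomposition, using a hypothetical, deliberately biased estimator: 0.8 times the mean of five noisy observations of $s$ (noise standard deviation 1, assumed). When the averages are taken over the same simulated trials, MSE = bias² + variance holds exactly up to rounding.

```python
import numpy as np

rng = np.random.default_rng(6)
s = 2.0                                          # true stimulus (assumed)
trials = 100_000

# Hypothetical estimator: shrink the mean of 5 noisy observations towards 0
obs = s + rng.normal(0.0, 1.0, size=(trials, 5))
s_est = 0.8 * obs.mean(axis=1)

bias = s_est.mean() - s                          # b_est(s) = <s_est> - s
variance = s_est.var()                           # sigma_est^2 = <(s_est - <s_est>)^2>
mse = np.mean((s - s_est) ** 2)                  # <(s - s_est)^2>
print(f"bias^2 + variance = {bias**2 + variance:.5f}, MSE = {mse:.5f}")
```

Here the expected bias is $-0.2 s = -0.4$ and the expected variance is $0.8^2/5 = 0.128$.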
Fisher information
Fisher information is a measure of the curvature of the log likelihood near its peak:
$I_F(s) = \left\langle -\dfrac{\partial^2 \log p(r \mid s)}{\partial s^2} \right\rangle_s = -\int dr\, p(r \mid s)\, \dfrac{\partial^2 \log p(r \mid s)}{\partial s^2}$
(the average is over trials measuring $r$ while $s$ is fixed)
The Cramér-Rao bound says that for any estimator [Cover and Thomas, 1991]
$\sigma^2_{est} \ge \dfrac{(1 + b'_{est}(s))^2}{I_F(s)}$
An estimator is efficient if $\sigma^2_{est} = \dfrac{(1 + b'_{est}(s))^2}{I_F(s)}$.
In the bias-free case, an efficient estimator has $\sigma^2_{est} = 1/I_F(s)$.
The ML decoder is typically efficient when $N \to \infty$.
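A small worked case, assuming a single Poisson neuron whose rate $\lambda$ is estimated from one spike count $n$ in a window $T$. Then $\log p(n \mid \lambda) = n \log \lambda - \lambda T + \text{const}$, so $I_F = \langle n \rangle / \lambda^2 = T/\lambda$. The ML estimate $n/T$ is unbiased with variance $\lambda/T = 1/I_F$, so it is efficient. The simulation checks this numerically with assumed values of `lam` and `T`.

```python
import numpy as np

rng = np.random.default_rng(7)
lam, T, trials = 20.0, 0.5, 200_000    # true rate (Hz), window (s), number of trials: assumed

n = rng.poisson(lam * T, size=trials)  # one spike count per trial
lam_ml = n / T                         # ML estimate of the rate on each trial

fisher = T / lam                       # I_F(lam) = <n> / lam^2 = T / lam
print(f"mean {lam_ml.mean():.3f}, variance {lam_ml.var():.3f}, Cramer-Rao bound {1.0 / fisher:.3f}")
```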
Fisher information
In homogeneous systems, $I_F$ is independent of $s$.
More generally, the Fisher matrix is $(I_F)_{ij}(s) = \left\langle -\dfrac{\partial^2 \log p(r \mid s)}{\partial s_i\, \partial s_j} \right\rangle_s$.
Taylor expansion of the Kullback-Leibler divergence: $D_{KL}\big(p(r \mid s) \,\|\, p(r \mid s + \delta s)\big) \approx \frac{1}{2} \sum_{ij} \delta s_i\, \delta s_j\, (I_F)_{ij}$
Fisher information is not a Shannon information measure (it is not in bits), but it is related to Shannon information in special cases, e.g. [Brunel and Nadal, 1998, Yarrow et al., 2012].
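A numerical check of the expansion for a population of independent Poisson neurons with Gaussian tuning curves (all parameters assumed). For Poisson counts with means $m_1, m_2$, $D_{KL} = m_1 \log(m_1/m_2) - m_1 + m_2$, summed over neurons, and the Fisher information of independent Poisson counts is $T \sum_a f'_a(s)^2 / f_a(s)$. The ratio of the KL divergence to $\frac{1}{2} I_F\, \delta s^2$ should approach 1 as $\delta s \to 0$, which is why the factor $1/2$ matters.

```python
import numpy as np

A, sigma_w, T = 50.0, 1.0, 0.5               # assumed tuning parameters
s_a = np.arange(-5.0, 5.0 + 1e-9, 0.5)       # preferred stimuli (assumed)


def tuning(s):
    return A * np.exp(-(s - s_a) ** 2 / (2.0 * sigma_w ** 2))


def tuning_slope(s):
    return -(s - s_a) / sigma_w ** 2 * tuning(s)


def kl_poisson(s1, s2):
    """D_KL(p(n|s1) || p(n|s2)) for independent Poisson counts with means f_a(s) T."""
    m1, m2 = tuning(s1) * T, tuning(s2) * T
    return np.sum(m1 * np.log(m1 / m2) - m1 + m2)


s = 0.2
fisher = T * np.sum(tuning_slope(s) ** 2 / tuning(s))
for delta in (0.1, 0.01, 0.001):
    ratio = kl_poisson(s, s + delta) / (0.5 * fisher * delta ** 2)
    print(f"delta s = {delta}: KL / (I_F delta^2 / 2) = {ratio:.4f}")
```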
Fisher information for a population
For independent Poisson spikers
$I_F(s) = \left\langle -\dfrac{\partial^2 \log p(r \mid s)}{\partial s^2} \right\rangle = T \sum_a \langle r_a \rangle \left( \left( \dfrac{f'_a(s)}{f_a(s)} \right)^2 - \dfrac{f''_a(s)}{f_a(s)} \right)$
For dense, symmetric tuning curves, the second term sums to zero. Using $f_a(s) = \langle r_a \rangle$ we obtain
$I_F(s) = T \sum_a \dfrac{(f'_a(s))^2}{f_a(s)}$
For dense tuning curves $f_a(s) = A\, e^{-(s - s_0 + a \cdot ds)^2 / 2\sigma_w^2}$ with density $\rho = 1/ds$, the sum becomes an integral:
$I_F = \sqrt{2\pi}\, T A \rho / \sigma_w$
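A sketch checking the dense-population result numerically: the sum $T \sum_a f'_a(s)^2 / f_a(s)$ over a wide, evenly spaced bank of Gaussian tuning curves is compared with $\sqrt{2\pi}\, T A \rho / \sigma_w$. The parameter values and the spacing `ds_pref` (so $\rho = 1/$`ds_pref`) are assumptions. Evaluating at two stimuli also illustrates that $I_F$ is essentially independent of $s$ for such a homogeneous population.

```python
import numpy as np

A, sigma_w, T = 50.0, 1.0, 0.5                 # assumed tuning parameters
ds_pref = 0.25                                 # spacing of preferred stimuli (assumed)
rho = 1.0 / ds_pref                            # density of tuning curves
s_a = np.arange(-20.0, 20.0 + 1e-9, ds_pref)   # wide range, so edge effects are negligible


def fisher_sum(s):
    """I_F(s) = T sum_a f_a'(s)^2 / f_a(s) for Gaussian tuning curves."""
    f = A * np.exp(-(s - s_a) ** 2 / (2.0 * sigma_w ** 2))
    f_prime = -(s - s_a) / sigma_w ** 2 * f
    return T * np.sum(f_prime ** 2 / f)


I_integral = np.sqrt(2.0 * np.pi) * T * A * rho / sigma_w
print(fisher_sum(0.0), fisher_sum(0.1), I_integral)
```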
For Gaussian tuning curves [Dayan and Abbott (2001)]
Note that the Fisher information of a single neuron vanishes at the peak of its tuning curve, because $f'_a(s) = 0$ there.
This can be used to design optimal tuning curves, [Dayan and Abbott, 2002] chapter 3.3.
Discriminability: $d' = \Delta s \sqrt{I_F(s)}$ for small $\Delta s$.
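A sketch for one Poisson neuron with a Gaussian tuning curve (assumed parameters). Its Fisher information $T f'(s)^2 / f(s)$ is zero at the preferred stimulus and largest on the flanks, at $|s - s_a| = \sqrt{2}\,\sigma_w$ for this tuning shape. The last lines turn $I_F$ into the discriminability $d' = \Delta s \sqrt{I_F(s)}$ for an assumed small step `delta_s`.

```python
import numpy as np

A, sigma_w, T, s_a = 50.0, 1.0, 0.5, 0.0    # single neuron, assumed parameters
s = np.linspace(-4.0, 4.0, 8001)
f = A * np.exp(-(s - s_a) ** 2 / (2.0 * sigma_w ** 2))
f_prime = -(s - s_a) / sigma_w ** 2 * f
fisher = T * f_prime ** 2 / f                # single-neuron Fisher information

i_peak = np.argmin(np.abs(s - s_a))          # grid point at the tuning-curve peak
s_best = abs(s[np.argmax(fisher)] - s_a)     # distance of the most informative stimulus

delta_s = 0.1                                # small stimulus difference (assumed)
d_prime = delta_s * np.sqrt(fisher)          # d' for discriminating s from s + delta_s
print(fisher[i_peak], s_best, d_prime.max())
```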
FI predicts human performance [Dayan and Abbott (2001)]
Orientation discrimination for stimuli of different size (different $N$).
Solid line: estimated minimum standard deviation at the Cramér-Rao bound.
Triangles: human standard deviation as a function of stimulus size (expressed in $N$).
Slope as strategy
From a paper on bat echolocation [Yovel et al., 2010]
Spike train decoding
Dayan and Abbott §3.4
Estimate the stimulus from the spike times $t_i$ so as to minimize, e.g., $\langle (s(t) - s_{est}(t))^2 \rangle$.
First-order reconstruction:
$s_{est}(t - \tau_0) = \sum_{t_i} K(t - t_i) - \langle r \rangle \int d\tau\, K(\tau)$
The second term ensures that $\langle s_{est}(t) \rangle = 0$.
A delay $\tau_0$ can be included to make decoding easier: predict the stimulus at time $t - \tau_0$ based on spikes up to time $t$.
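A discrete-time sketch of first-order (linear) reconstruction, assuming a simulated neuron: a smooth Gaussian stimulus drives a rectified-linear firing rate, and spikes are drawn independently in 1 ms bins. The kernel $K$ is fitted by least squares over lags $-L \dots L$; subtracting the mean spike count per bin implements the $-\langle r \rangle \int d\tau\, K(\tau)$ term. Using spikes up to $L$ bins after $t$ corresponds to a delay $\tau_0 = L$ bins. All parameter values are assumptions, and `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(8)
dt, n_bins = 0.001, 40_000                       # 1 ms bins, 40 s of data (assumed)

# Zero-mean, temporally correlated stimulus: exponentially filtered white noise
tau_bins = 50
filt = np.exp(-np.arange(5 * tau_bins) / tau_bins)
s = np.convolve(rng.standard_normal(n_bins), filt)[:n_bins]
s = (s - s.mean()) / s.std()

# Hypothetical encoder: rectified-linear rate, at most one spike per bin
rate = np.clip(200.0 * (1.0 + 0.5 * s), 0.0, None)          # Hz
spikes = (rng.random(n_bins) < rate * dt).astype(float)

# Linear decoder s_est(t) = sum_j K_j (n(t + j) - <n>), j = -L..L.
# Row k of X holds the mean-subtracted counts in bins k .. k + 2L (centre bin t = k + L).
L = 60
X = sliding_window_view(spikes - spikes.mean(), 2 * L + 1)
y = s[L:n_bins - L]
K, *_ = np.linalg.lstsq(X, y, rcond=None)                    # least-squares kernel
s_est = X @ K

corr = np.corrcoef(s_est, y)[0, 1]
print(f"correlation between stimulus and reconstruction: {corr:.2f}")
```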
Causal decoding
An organism faces a causal (on-line) decoding problem.
Predicting the current or future stimulus requires temporal correlations in the stimulus.
Example: in the head-direction system, the neural code correlates best with the future direction.
Causality requires $K(t - t_i) = 0$ for $t \le t_i$:
$s_{est}(t - \tau_0) = \sum_{t_i} K(t - t_i) - \langle r \rangle \int d\tau\, K(\tau)$
The delay $\tau_0$ buys extra time.
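A sketch of the causal case, assuming a simulated neuron like the one in a first-order reconstruction but with a hypothetical 10 ms response latency. The kernel may only use spikes up to and including the current bin (a one-sided window of `L` bins), and it is refitted for several delays `tau0`. With `tau0 = 0` the decoder has to predict the stimulus 10 ms ahead, which is possible only through the stimulus autocorrelation; allowing a delay lowers the error. All values are assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(9)
dt, n_bins = 0.001, 40_000               # 1 ms bins, 40 s (assumed)
latency = 10                             # encoder responds 10 bins after the stimulus (assumed)

tau_bins = 20
filt = np.exp(-np.arange(5 * tau_bins) / tau_bins)
s = np.convolve(rng.standard_normal(n_bins), filt)[:n_bins]
s = (s - s.mean()) / s.std()

# Hypothetical encoder: the rate at time t depends on s(t - latency)
drive = np.concatenate([np.zeros(latency), s[:-latency]])
rate = np.clip(200.0 * (1.0 + 0.5 * drive), 0.0, None)
spikes = (rng.random(n_bins) < rate * dt).astype(float)

# Causal window: row k holds bins k .. k + L - 1, so the current time is t = k + L - 1
L = 40
X = sliding_window_view(spikes - spikes.mean(), L)
t_now = np.arange(L - 1, n_bins)


def causal_mse(tau0):
    """Least-squares fit of K to estimate s(t - tau0) from spikes up to time t."""
    y = s[t_now - tau0]
    K, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.mean((X @ K - y) ** 2))


mse = {tau0: causal_mse(tau0) for tau0 in (0, 10, 20)}
print(mse)    # the stimulus has unit variance, so an MSE near 1 means almost no information
```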