Population codes

Peter Latham / Maneesh Sahani
Gatsby Computational Neuroscience Unit
University College London

Term 1, Autumn 2013

Population Coding

• High dimensionality (cells × stimulus × time).
  – Analyses are usually limited to simple rate codes.
  – Even prosthetic work assumes instantaneous (lagged) coding.
• Limited empirical data.
  – We can record 10s–100s of neurons; relevant population sizes are more like $10^4$–$10^6$.
  – So we rely on theoretical inferences, based on single-cell and aggregate (fMRI, LFP, optical) measurements.

Common approach

The most common sorts of questions asked of population codes:

• Given assumed encoding functions, how well can we (or downstream areas) decode the encoded stimulus value?
• What encoding schemes would be optimal, in the sense of allowing decoders to estimate stimulus values as well as possible?

Before considering populations, we need to formulate some ideas about rate coding in the context of single cells.

Rate coding

In the rate-coding context, we imagine that the firing rate of a cell $r$ represents a single (possibly multidimensional) stimulus value $s$ at any one time:

$$r = f(s).$$

Even if $s$ and $r$ are embedded in time-series, we assume:

1. that coding is instantaneous (with a fixed lag),
2. that $r$ (and therefore $s$) is constant over a short time $\Delta$.

The actual number of spikes $n$ produced in $\Delta$ is then taken to be distributed around $r\Delta$, often according to a Poisson distribution.
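As a concrete sketch of this generative model, the Python fragment below draws Poisson spike counts from a Gaussian tuning curve. It is illustrative only: the function name `f` and all parameter values are assumptions, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed Gaussian tuning curve f(s), rates in Hz (illustrative parameters).
r0, r_max, s_pref, sigma = 5.0, 50.0, 0.0, 1.0

def f(s):
    return r0 + r_max * np.exp(-0.5 * ((s - s_pref) / sigma) ** 2)

delta = 0.1                                 # counting window Delta (seconds)
s = 0.5                                     # stimulus, held fixed over the window
n = rng.poisson(f(s) * delta, size=1000)    # spike counts n ~ Poisson(f(s) Delta)

# For Poisson counts, mean and variance both equal f(s) * Delta.
print(f"f(s)Delta = {f(s) * delta:.2f}, mean n = {n.mean():.2f}, var n = {n.var():.2f}")
```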
Tuning curves

The function $f(s)$ is known as a tuning curve. Commonly assumed forms:

• Gaussian: $r_0 + r_{\max} \exp\!\big(-\tfrac{1}{2\sigma^2}(x - x_{\text{pref}})^2\big)$
• Cosine: $r_0 + r_{\max} \cos(\theta - \theta_{\text{pref}})$
• Wrapped Gaussian: $r_0 + r_{\max} \sum_n \exp\!\big(-\tfrac{1}{2\sigma^2}(\theta - \theta_{\text{pref}} - 2\pi n)^2\big)$
• von Mises ("circular Gaussian"): $r_0 + r_{\max} \exp\!\big(\kappa \cos(\theta - \theta_{\text{pref}})\big)$

Measuring the performance of rate codes: discrete choice

Suppose we want to make a binary choice based on firing rate:

• present / absent (signal detection)
• up / down
• horizontal / vertical

Call one potential stimulus $s_0$, the other $s_1$, giving response distributions $P(n|s_0)$ and $P(n|s_1)$.

[Figure: overlapping probability densities $P(n|s_0)$ and $P(n|s_1)$, plotted against the response (spike count $n$).]

ROC curves

Sweeping a decision threshold along the response axis traces out the receiver operating characteristic (ROC): the hit rate as a function of the false alarm rate.

[Figures: two pairs of response densities $P(n|s_0)$, $P(n|s_1)$ with different degrees of overlap, and their ROC curves (hit rate vs. false alarm rate, each axis from 0 to 1); the better-separated pair gives an ROC bowed further from the diagonal.]
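The ROC sweep is easy to reproduce numerically. A minimal sketch, assuming Poisson count distributions for the two stimuli (the means $\mu_0$, $\mu_1$ are arbitrary choices); it also checks the area-under-the-curve identity discussed on the next slide, including the $\tfrac{1}{2}P(n_1 = n_0)$ tie term that arises for discrete counts.

```python
import numpy as np
from scipy.stats import poisson

# Assumed mean counts under the two stimuli (illustrative values).
mu0, mu1 = 4.0, 8.0
thresholds = np.arange(0, 31)

# Decision rule: report "s1" when n >= threshold; sweep the threshold.
fa = poisson.sf(thresholds - 1, mu0)   # false alarm rate, P(n >= thr | s0)
hr = poisson.sf(thresholds - 1, mu1)   # hit rate,         P(n >= thr | s1)

# Area under the ROC curve by the trapezoid rule (fa falls as thr rises).
auc = 0.5 * np.sum((fa[:-1] - fa[1:]) * (hr[:-1] + hr[1:]))

# Monte-Carlo check: AUC = P(n1 > n0) + P(n1 = n0)/2 for discrete counts.
rng = np.random.default_rng(0)
n0 = rng.poisson(mu0, 200_000)
n1 = rng.poisson(mu1, 200_000)
print(f"AUC = {auc:.3f} vs {np.mean(n1 > n0) + 0.5 * np.mean(n1 == n0):.3f}")
```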
Summary measures

• Area under the ROC curve.
  – Given $n_1 \sim P(n|s_1)$ and $n_0 \sim P(n|s_0)$, this equals $P(n_1 > n_0)$.
• Discriminability $d'$.
  – For equal-variance Gaussians, $d' = \frac{\mu_1 - \mu_0}{\sigma}$.
  – For any threshold, $d' = \Phi^{-1}(1 - FA) - \Phi^{-1}(1 - HR)$, where $\Phi$ is the standard normal cdf.
  – The definition is unclear for non-Gaussian distributions.

Continuous estimation

Now consider a (one-dimensional) stimulus that takes on continuous values, for example:

• contrast
• orientation
• motion direction
• movement speed

Suppose a neuron fires $n$ spikes in response to stimulus $s$ according to some distribution $P(n \mid f(s)\Delta)$. Given an observation of $n$, how well can we estimate $s$?

Continuous estimation

It is useful to consider the limit of $N \to \infty$ measurements $n_i$, all generated by the same stimulus $s^*$. The posterior over $s$ is

$$\log P(s \mid \{n_i\}) = \sum_i \log P(n_i \mid s) + \log P(s) - \log Z(\{n_i\}).$$

Taking $N \to \infty$ we have

$$\frac{1}{N}\log P(s \mid \{n_i\}) \to \big\langle \log P(n|s)\big\rangle_{n|s^*} + 0 - \log Z(s^*)$$

and so

$$P(s \mid \{n_i\}) \to e^{N\langle \log P(n|s)\rangle_{n|s^*}}/Z = e^{-N\,\mathrm{KL}\left[P(n|s^*)\,\|\,P(n|s)\right]}/Z.$$

(Note: $Z$ is being redefined as we go, but never depends on $s$.)

Continuous estimation

Now, Taylor expand the KL divergence in $s$ around $s^*$:

$$\begin{aligned}
\mathrm{KL}\big[P(n|s^*)\,\big\|\,P(n|s)\big]
&= \big\langle \log P(n|s^*)\big\rangle_{n|s^*} - \big\langle \log P(n|s)\big\rangle_{n|s^*}\\
&= \big\langle \log P(n|s^*)\big\rangle_{n|s^*} - \big\langle \log P(n|s^*)\big\rangle_{n|s^*}
 - (s - s^*)\Big\langle \tfrac{d \log P(n|s)}{ds}\Big|_{s^*}\Big\rangle_{n|s^*}
 - \tfrac{1}{2}(s - s^*)^2 \Big\langle \tfrac{d^2 \log P(n|s)}{ds^2}\Big|_{s^*}\Big\rangle_{n|s^*} + \dots\\
&= -\tfrac{1}{2}(s - s^*)^2 \Big\langle \tfrac{d^2 \log P(n|s)}{ds^2}\Big|_{s^*}\Big\rangle_{n|s^*} + \dots
 \qquad \text{(the first-order term vanishes: the expected score at $s^*$ is zero)}\\
&= \tfrac{1}{2}(s - s^*)^2 J(s^*) + \dots
\end{aligned}$$

So in asymptopia, the posterior $\to \mathcal{N}\big(s^*,\, 1/(N J(s^*))\big)$. $J(s^*)$ is called the Fisher Information:

$$J(s^*) = -\Big\langle \frac{d^2 \log P(n|s)}{ds^2}\Big|_{s^*}\Big\rangle_{n|s^*}
         = \Big\langle \Big(\frac{d \log P(n|s)}{ds}\Big|_{s^*}\Big)^{\!2}\,\Big\rangle_{n|s^*}.$$

(You will show that these two forms are identical in the homework.)
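The equality of the two forms of $J(s^*)$ is easy to check by simulation. A minimal sketch, assuming an illustrative Poisson model with rate $\lambda(s) = e^{2s}\Delta$ (this model choice, and all parameter values, are assumptions made for the example): the Monte-Carlo variance of the score should match the analytic curvature form $\lambda'(s^*)^2/\lambda(s^*)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative Poisson model: n ~ Poisson(lam(s)), lam(s) = exp(2s) * Delta.
delta, s_star = 0.1, 0.5
lam = lambda s: np.exp(2.0 * s) * delta
dlam = lambda s: 2.0 * lam(s)                 # d lam / ds

# Score: d/ds log P(n|s) = (n/lam - 1) * dlam, from log P = -lam + n log lam - log n!
n = rng.poisson(lam(s_star), size=1_000_000)
score = (n / lam(s_star) - 1.0) * dlam(s_star)

J_mc = np.mean(score ** 2)                    # form 1: E[(d log P/ds)^2]
J_analytic = dlam(s_star) ** 2 / lam(s_star)  # form 2: -E[d^2 log P/ds^2]
print(f"Monte-Carlo J = {J_mc:.3f}, analytic J = {J_analytic:.3f}")
```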
Cramér-Rao bound

The Fisher Information is important even outside the large-data limit, thanks to a deeper result due to Cramér and Rao. This states that for any $N$, any unbiased estimator $\hat{s}(\{n_i\})$ of $s$ will have the property that

$$\Big\langle \big(\hat{s}(\{n_i\}) - s^*\big)^2 \Big\rangle_{n_i|s^*} \ge \frac{1}{J(s^*)}.$$

Thus, the Fisher Information gives a lower bound on the variance of any unbiased estimator. This is called the Cramér-Rao bound.

[For estimators with bias $b(s^*) = \langle \hat{s}(\{n_i\}) \rangle - s^*$, the bound is $\Big\langle \big(\hat{s}(\{n_i\}) - s^*\big)^2 \Big\rangle_{n_i|s^*} \ge \frac{(1 + b'(s^*))^2}{J(s^*)} + b^2(s^*)$.]

The Fisher Information will be our primary tool to quantify the performance of a population code.

Fisher Info and tuning curves

With $n = r\Delta + \text{noise}$ and $r = f(s)$, the chain rule gives

$$J(s^*) = \Big\langle \Big(\frac{d}{ds}\log P(n|s)\Big)^{\!2}\,\Big\rangle_{s^*}
         = \Big\langle \Big(\frac{d}{d(r\Delta)}\log P(n|r\Delta)\; \Delta f'(s^*)\Big)^{\!2}\,\Big\rangle_{f(s^*)}
         = J_{\text{noise}}(r\Delta)\, \Delta^2 f'(s^*)^2.$$

[Figure: a tuning curve $f(s)$ and the resulting Fisher information $J(s)$ plotted against $s$ (firing rate / Fisher info).]

Fisher info for Poisson neurons

For Poisson neurons

$$P(n|r\Delta) = \frac{e^{-r\Delta}(r\Delta)^n}{n!},$$

so

$$\begin{aligned}
J_{\text{noise}}[r\Delta]
&= \Big\langle \Big(\frac{d}{d(r\Delta)}\log P(n|r\Delta)\Big)^{\!2}\,\Big\rangle_{r^*\Delta}\\
&= \Big\langle \Big(\frac{d}{d(r\Delta)}\big(-r\Delta + n\log r\Delta - \log n!\big)\Big)^{\!2}\,\Big\rangle_{r^*\Delta}\\
&= \Big\langle \big(-1 + n/r^*\Delta\big)^2 \Big\rangle\\
&= \frac{\big\langle (n - r^*\Delta)^2\big\rangle}{(r^*\Delta)^2} = \frac{r^*\Delta}{(r^*\Delta)^2} = \frac{1}{r^*\Delta}
\end{aligned}$$

[not surprising! $\langle n \rangle = r^*\Delta$ and $\mathrm{Var}[n] = r^*\Delta$]

and, referred back to the stimulus value:

$$J[s^*] = f'(s^*)^2\, \Delta / f(s^*).$$
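A quick simulation can illustrate the Cramér-Rao bound for Poisson neurons. The sketch below rests on assumptions made for the example: an exponential tuning curve $f(s) = e^s$ (chosen so the ML estimate from $N$ counts has a closed form) and arbitrary parameter values. With $N$ independent counts the relevant Fisher information is $N J(s^*)$, where here $J(s^*) = f'(s^*)^2\Delta/f(s^*) = f(s^*)\Delta$, so the estimator variance should approach $1/(N J(s^*))$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed exponential tuning curve f(s) = exp(s); illustrative parameters.
delta, s_star, N, trials = 0.2, 1.0, 50, 2000
f = lambda s: np.exp(s)

# Per-observation Fisher info for Poisson counts: J = f'^2 Delta / f = f Delta here.
J = f(s_star) * delta

counts = rng.poisson(f(s_star) * delta, size=(trials, N))
# ML estimate: the mean count estimates f(s) Delta, so s_hat = log(mean / Delta).
s_hat = np.log(counts.mean(axis=1) / delta)

print(f"empirical var = {s_hat.var():.5f}, bound 1/(N J) = {1 / (N * J):.5f}")
```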
Coding a continuous variable

[Figures: schematic tuning curves (firing rate vs. $s$) for three schemes — scalar coding, labelled line, and distributed encoding.]

All of these schemes have been found in biological systems. Issues:

1. redundancy and robustness (not scalar)
2. efficiency/resolution (not labelled line)
3. local computation (not scalar or scalar distributed)
4. multiple values (not scalar)

Coding in multiple dimensions

[Figures: tuning-curve centres laid out in the $(s_1, s_2)$ plane for two schemes — Cartesian and multi-D distributed.]

• Cartesian: efficient, but problems with multiple values.
• Multi-D distributed: can represent multiple values, but may require more neurons.

Cricket cercal system

Four interneurons have preferred directions satisfying $c_1^T c_2 = 0$, $c_3 = -c_1$, $c_4 = -c_2$, with rectified cosine tuning to the (unit) stimulus direction $v$:

$$r_a(s) = r_{\max}\,[\cos(\theta - \theta_a)]_+ = r_{\max}\,[c_a^T v]_+.$$

So, writing $\tilde{r}_a = r_a / r_{\max}$, and using $[x]_+ - [-x]_+ = x$ (so that, e.g., $\tilde{r}_1 - \tilde{r}_3 = c_1^T v$):

$$\begin{pmatrix} c_1^T \\ c_2^T \end{pmatrix} v = \begin{pmatrix} \tilde{r}_1 - \tilde{r}_3 \\ \tilde{r}_2 - \tilde{r}_4 \end{pmatrix}
\quad\Rightarrow\quad
v = (c_1\;\; c_2)\begin{pmatrix} \tilde{r}_1 - \tilde{r}_3 \\ \tilde{r}_2 - \tilde{r}_4 \end{pmatrix}
  = \tilde{r}_1 c_1 + \tilde{r}_3 c_3 + \tilde{r}_2 c_2 + \tilde{r}_4 c_4
  = \sum_a \tilde{r}_a c_a.$$

This is called population vector decoding.

Motor cortex (simplified)

Cosine tuning, with randomly distributed preferred directions. In general, population vector decoding works for:

• cosine tuning
• cartesian or dense (tight) directions

But:

• is it optimal?
• does it generalise? (Gaussian tuning curves)
• how accurate is it?
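A toy simulation of population vector decoding in this simplified motor-cortex setting, assuming rectified cosine tuning, random preferred directions, and Poisson spike counts (all names and parameter values here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# Random preferred directions c_a on the unit circle (toy motor-cortex model).
A, r_max, delta = 200, 40.0, 0.5
theta_pref = rng.uniform(0, 2 * np.pi, A)
c = np.stack([np.cos(theta_pref), np.sin(theta_pref)], axis=1)

theta = 1.2                                  # true movement direction
v = np.array([np.cos(theta), np.sin(theta)])

rates = r_max * np.clip(c @ v, 0.0, None)    # rectified cosine tuning [c_a^T v]_+
n = rng.poisson(rates * delta)               # spike counts in a window Delta

# Population vector: spike-count-weighted sum of preferred directions.
pop_vec = (n[:, None] * c).sum(axis=0)
theta_hat = np.arctan2(pop_vec[1], pop_vec[0])
print(f"true direction = {theta:.3f}, population-vector estimate = {theta_hat:.3f}")
```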
Bayesian decoding

Take $n_a \sim \text{Poisson}[f_a(s)\Delta]$, independently for different cells. Then

$$P(\mathbf{n}|s) = \prod_a \frac{e^{-f_a(s)\Delta}\,(f_a(s)\Delta)^{n_a}}{n_a!}$$

and

$$\log P(s|\mathbf{n}) = -\sum_a f_a(s)\Delta + \sum_a n_a \log\big(f_a(s)\Delta\big) - \sum_a \log n_a! + \log P(s) + \text{const}.$$

Assume $\sum_a f_a(s)$ is independent of $s$ (a homogeneous population), and that the prior is flat. Then

$$\frac{d}{ds}\log P(s|\mathbf{n}) = \frac{d}{ds}\sum_a n_a \log\big(f_a(s)\Delta\big) = \sum_a n_a \frac{f_a'(s)\Delta}{f_a(s)\Delta}.$$

Bayesian decoding

Now, consider $f_a(s) = e^{-(s - s_a)^2/2\sigma^2}$, so

$$f_a'(s) = -\frac{(s - s_a)}{\sigma^2}\, e^{-(s - s_a)^2/2\sigma^2},$$

and set the derivative to 0:

$$\sum_a n_a (s - s_a)/\sigma^2 = 0 \quad\Rightarrow\quad \hat{s}_{\text{MAP}} = \frac{\sum_a n_a s_a}{\sum_a n_a}.$$

So the MAP estimate is a population average of preferred stimuli — similar to, but not exactly, a population vector.

Population Fisher Info

Fisher Informations for independent random variates add:

$$J_{\mathbf{n}}(s) = \Big\langle -\frac{d^2}{ds^2}\log P(\mathbf{n}|s)\Big\rangle
 = \sum_a \Big\langle -\frac{d^2}{ds^2}\log P(n_a|s)\Big\rangle
 = \sum_a J_{n_a}(s)
 = \sum_a \frac{f_a'(s)^2}{f_a(s)}\,\Delta \quad \text{[for Poisson cells]}.$$

Optimal tuning properties

A considerable amount of work has been done in recent years on finding optimal properties of tuning curves for rate-based population codes. Here, we reproduce one such argument (from Zhang and Sejnowski, 1999).

Consider a population of cells that codes the value of a $D$-dimensional stimulus, $\mathbf{s}$. Let the $a$th cell emit $r$ spikes in an interval $\tau$ with a probability distribution that is conditionally independent of the other cells (given $\mathbf{s}$) and has the form

$$P_a(r|\mathbf{s}, \tau) = S\big(r, f_a(\mathbf{s}), \tau\big).$$

The tuning curve of the $a$th cell, $f_a(\mathbf{s})$, has the form

$$f_a(\mathbf{s}) = F \cdot \phi\big((\xi^a)^2\big); \qquad (\xi^a)^2 = \sum_{i=1}^{D} (\xi_i^a)^2; \qquad \xi_i^a = \frac{s_i - c_i^a}{\sigma},$$

where $F$ is a maximal rate and the function $\phi$ is monotonically decreasing. The parameters $\mathbf{c}^a$ and $\sigma$ give the centre of the $a$th tuning curve and the (common) width.
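A sketch combining the MAP decoder and the population Fisher information, assuming Gaussian tuning curves with a common maximal rate added for realistic spike counts (the amplitude $F$, the tiling of centres, and all parameter values are assumptions for this example; a common amplitude leaves $\hat{s}_{\text{MAP}} = \sum_a n_a s_a / \sum_a n_a$ unchanged):

```python
import numpy as np

rng = np.random.default_rng(3)

# Homogeneous population of Gaussian tuning curves tiling the stimulus axis.
A, sigma, delta, F = 60, 0.5, 0.2, 30.0
s_pref = np.linspace(-3, 3, A)      # preferred stimuli s_a (assumed layout)
s_true = 0.7

f = F * np.exp(-0.5 * ((s_true - s_pref) / sigma) ** 2)   # rates f_a(s_true)
n = rng.poisson(f * delta)                                # n_a ~ Poisson(f_a Delta)

# MAP estimate (flat prior, homogeneous population): count-weighted mean of centres.
s_map = (n * s_pref).sum() / n.sum()

# Population Fisher information for Poisson cells: J = sum_a f_a'(s)^2 Delta / f_a(s).
fprime = -(s_true - s_pref) / sigma**2 * f
J = (fprime**2 * delta / f).sum()

print(f"s_true = {s_true}, s_MAP = {s_map:.3f}, Cramer-Rao std = {1/np.sqrt(J):.4f}")
```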