

  1. Population Coding
Maneesh Sahani (maneesh@gatsby.ucl.ac.uk)
Gatsby Computational Neuroscience Unit, University College London
Term 1, Autumn 2010

  2. Coding so far ...
• Time-series for both spikes and stimuli
• Empirical: estimate function(s) given measured stimuli or movements and spikes

  3. Population codes
• High dimensionality (cells × stimulus × time)
  – usually limited to simple rate codes
  – even prosthetic work assumes instantaneous (lagged) coding
• Limited empirical data
  – can record 10s - 100s of neurons
  – population size more like 10^4 - 10^6
  – theoretical inferences, based on single-cell and aggregate (fMRI, LFP, optical) measurements

  4. Common approach
The most common sorts of questions asked of population codes:
• given assumed encoding functions, how well can we (or downstream areas) decode the encoded stimulus value?
• what encoding schemes would be optimal, in the sense of allowing decoders to estimate stimulus values as well as possible?
Before considering populations, we need to formulate some ideas about rate coding in the context of single cells.

  5. Rate coding
In the rate coding context, we imagine that the firing rate of a cell r represents a single (possibly multidimensional) stimulus value s at any one time: r = f(s).
Even if s and r are embedded in time-series, we assume:
1. that coding is instantaneous (with a fixed lag),
2. that r (and therefore s) is constant over a short time ∆.
The actual number of spikes n produced in ∆ is then taken to be distributed around r∆, often according to a Poisson distribution.

  6. Tuning curves
The function f(s) is known as a tuning curve. Commonly assumed forms:
• Gaussian: r_0 + r_max exp(−(x − x_pref)² / 2σ²)
• Cosine: r_0 + r_max cos(θ − θ_pref)
• Wrapped Gaussian: r_0 + r_max Σ_n exp(−(θ − θ_pref − 2πn)² / 2σ²)
• von Mises ("circular Gaussian"): r_0 + r_max exp(κ cos(θ − θ_pref))
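
As a concrete illustration, here is a minimal Python sketch of these four forms; all parameter values (r_0, r_max, σ, κ, the truncation of the wrapped sum) are illustrative assumptions, not values from the lecture.

```python
import numpy as np

def gaussian(x, x_pref, r0=2.0, r_max=50.0, sigma=0.5):
    """Gaussian tuning: peaks at x_pref, with baseline rate r0."""
    return r0 + r_max * np.exp(-(x - x_pref)**2 / (2 * sigma**2))

def cosine(theta, theta_pref, r0=10.0, r_max=30.0):
    """Cosine tuning (as in motor-cortex models)."""
    return r0 + r_max * np.cos(theta - theta_pref)

def wrapped_gaussian(theta, theta_pref, r0=2.0, r_max=50.0,
                     sigma=0.5, n_terms=5):
    """Wrapped Gaussian: sum over 2*pi shifts to respect circularity."""
    shifts = 2 * np.pi * np.arange(-n_terms, n_terms + 1)
    d = theta - theta_pref - shifts[:, None]
    return r0 + r_max * np.exp(-d**2 / (2 * sigma**2)).sum(axis=0)

def von_mises(theta, theta_pref, r0=2.0, r_max=50.0, kappa=2.0):
    """von Mises ('circular Gaussian') tuning."""
    return r0 + r_max * np.exp(kappa * np.cos(theta - theta_pref))
```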

  7. Measuring the performance of rate codes: Discrete choice
Suppose we want to make a binary choice based on firing rate:
• present / absent (signal detection)
• up / down
• horizontal / vertical
Call one potential stimulus s_0, the other s_1, and let P(n|s) be the distribution of spike count given the stimulus.
[figure: the two response distributions P(n|s_0) and P(n|s_1) plotted against the response n]

  8.-9. ROC curves
[figure: the overlapping response distributions P(n|s_0) and P(n|s_1), and the resulting ROC curve plotting hit rate against false alarm rate as the decision threshold is swept across the response axis]

  10. Summary measures
• area under the ROC curve
  – given n_1 ∼ P(n|s_1) and n_0 ∼ P(n|s_0), this equals P(n_1 > n_0)
• discriminability d′
  – for equal-variance Gaussians, d′ = (µ_1 − µ_0)/σ
  – for any threshold, d′ = Φ⁻¹(1 − FA) − Φ⁻¹(1 − HR), where Φ is the standard normal cdf
  – definition unclear for non-Gaussian distributions
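
A minimal simulation sketch of these two measures, assuming Poisson responses with arbitrary illustrative rates for s_0 and s_1; half-weighting ties in the AUC is a standard convention for discrete counts, not something stated on the slide.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

N = 2000
n0 = rng.poisson(8.0, size=N)    # counts under s_0 (assumed rate)
n1 = rng.poisson(12.0, size=N)   # counts under s_1 (assumed rate)

# AUC = P(n_1 > n_0), with ties counted as 1/2 for discrete counts
gt = (n1[:, None] > n0[None, :]).mean()
eq = (n1[:, None] == n0[None, :]).mean()
auc = gt + 0.5 * eq

# d' from hit and false-alarm rates at one (arbitrary) threshold
theta = 10
hr = (n1 >= theta).mean()
fa = (n0 >= theta).mean()
d_prime = norm.ppf(1 - fa) - norm.ppf(1 - hr)

print("AUC:", auc, " d':", d_prime)
```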

  11. Continuous estimation
Now consider a (one-dimensional) stimulus that takes any real value (or angle):
• contrast
• orientation
• motion direction
• movement speed
Consider a neuron that fires n spikes in response to a stimulus s, according to P(n | f(s)∆). Given n, we attempt to estimate s. How well can we do?

  12. Continuous estimation
It is useful to consider the limit of N → ∞ measurements n_i, all generated by the same stimulus s*. The posterior over s is

  log P(s | {n_i}) = Σ_i log P(n_i | s) + log P(s) − log Z({n_i})

and so, taking N → ∞,

  (1/N) log P(s | {n_i}) → ⟨log P(n|s)⟩_{n|s*} + 0 − (1/N) log Z({n_i})

and so

  P(s | {n_i}) → e^{N ⟨log P(n|s)⟩_{n|s*}} / Z = e^{−N KL[P(n|s*) ‖ P(n|s)]} / Z

(the entropy term ⟨log P(n|s*)⟩_{n|s*}, which does not depend on s, has been absorbed into Z).

  13. Continuous estimation
Now, Taylor expand the KL divergence in s around s*:

  KL[P(n|s*) ‖ P(n|s)]
    = ⟨log P(n|s*)⟩_{n|s*} − ⟨log P(n|s)⟩_{n|s*}
    = ⟨log P(n|s*)⟩_{n|s*} − ⟨log P(n|s*)⟩_{n|s*}
      − (s − s*) ⟨d log P(n|s)/ds |_{s*}⟩_{n|s*}
      − (1/2)(s − s*)² ⟨d² log P(n|s)/ds² |_{s*}⟩_{n|s*} − ...
    = (1/2)(s − s*)² J(s*) + ...

(the zeroth-order terms cancel, and the first-order term vanishes because the score has zero mean under P(n|s*)). So in asymptopia, the posterior → N(s*, 1/(N J(s*))). J(s*) is called the Fisher Information:

  J(s*) = −⟨d² log P(n|s)/ds² |_{s*}⟩_{n|s*} = ⟨(d log P(n|s)/ds |_{s*})²⟩_{n|s*}

(You will show that these are identical in the homework.)
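
A numerical sketch of this asymptotic result, assuming a single Poisson neuron with Gaussian tuning (all parameter values are illustrative): the grid-based posterior variance should approach 1/(N J(s*)). Restricting the grid to one side of the symmetric tuning curve avoids its mirror-image ambiguity.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed model: n ~ Poisson(f(s) * dt), Gaussian tuning centred at 0
r0, r_max, sigma, dt = 2.0, 50.0, 0.5, 0.1
f = lambda s: r0 + r_max * np.exp(-s**2 / (2 * sigma**2))

s_star, N = 0.3, 500
n = rng.poisson(f(s_star) * dt, size=N)

grid = np.linspace(0.0, 1.0, 2001)   # one flank only: avoids the mirror ambiguity
ds = grid[1] - grid[0]

# Poisson log posterior (flat prior), summed over the N counts
log_post = n.sum() * np.log(f(grid) * dt) - N * f(grid) * dt
post = np.exp(log_post - log_post.max())
post /= post.sum() * ds

mean = (grid * post).sum() * ds
var = ((grid - mean)**2 * post).sum() * ds

# Asymptotic prediction: 1/(N J(s*)), with J = dt * f'(s*)^2 / f(s*) (slide 16)
fp = -r_max * s_star / sigma**2 * np.exp(-s_star**2 / (2 * sigma**2))
J = dt * fp**2 / f(s_star)
print("posterior variance:", var, " 1/(N J):", 1 / (N * J))
```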

  14. Cramér-Rao bound
The Fisher Information is important even outside the large-data limit, due to a deeper result from Cramér and Rao. This states that, for any N, any unbiased estimator ŝ({n_i}) of s will have the property that

  ⟨(ŝ({n_i}) − s*)²⟩_{n_i|s*} ≥ 1 / (N J(s*)).

Thus the reciprocal of the Fisher Information gives a lower bound on the variance of any unbiased estimator. This is called the Cramér-Rao bound. (There is also a version for biased estimators.)
The Fisher Information will be our primary tool to quantify the performance of a population code.
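
A complementary simulation sketch, under the same assumed Poisson neuron with Gaussian tuning as above: the variance of a grid-based maximum-likelihood estimator across many repeated trials is compared with the Cramér-Rao bound.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed Gaussian tuning curve (illustrative parameters)
r0, r_max, s_pref, sigma, dt = 2.0, 50.0, 0.0, 0.5, 0.1
f = lambda s: r0 + r_max * np.exp(-(s - s_pref)**2 / (2 * sigma**2))
fp = lambda s: -r_max * (s - s_pref) / sigma**2 \
               * np.exp(-(s - s_pref)**2 / (2 * sigma**2))

s_star, N = 0.3, 200                       # true stimulus, counts per trial
J = dt * fp(s_star)**2 / f(s_star)         # per-count Fisher info (slide 16)

grid = np.linspace(0.0, 1.5, 1501)         # s >= s_pref: avoids mirror ambiguity
log_rates = np.log(f(grid) * dt)

est = []
for _ in range(2000):
    n = rng.poisson(f(s_star) * dt, size=N)
    # Poisson log likelihood on the grid, summed over the N counts
    ll = n.sum() * log_rates - N * f(grid) * dt
    est.append(grid[np.argmax(ll)])

print("empirical variance of ML estimate:", np.var(est))
print("Cramér-Rao bound 1/(N J)        :", 1 / (N * J))
```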

  15. Fisher Info and tuning curves
n = r∆ + noise; r = f(s) ⇒

  J(s*) = ⟨(d log P(n|s)/ds)²⟩|_{s*}
        = ⟨(d log P(n|r∆)/d(r∆))²⟩|_{f(s*)∆} (∆ f′(s*))²
        = J_noise(f(s*)∆) ∆² f′(s*)²

[figure: a tuning curve f(s) and the corresponding Fisher information J(s) plotted against s; J(s) is largest on the steep flanks of the tuning curve, where f′(s)² is greatest, and vanishes at the peak]

  16. Fisher info for Poisson neurons
For Poisson neurons,

  P(n | f(s)) = e^{−f(s)} f(s)^n / n!

so

  J_noise[f(s*)] = ⟨(d log P(n | f(s))/df)²⟩|_{f(s*)}
                 = ⟨(d[−f(s) + n log f(s) − log n!]/df)²⟩|_{f(s*)}
                 = ⟨(−1 + n/f(s*))²⟩
                 = ⟨(n − f(s*))²⟩ / f(s*)²
                 = f(s*) / f(s*)² = 1/f(s*)    [not surprising: ⟨(n − f(s))²⟩ = f(s) = ⟨n⟩ for Poisson]

and J[s*] = f′(s*)² / f(s*) (or ∆ f′(s*)² / f(s*) for counts collected in a window ∆, as on slides 15 and 24).
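
A quick Monte Carlo check of this result, for an arbitrary assumed Poisson mean: the second moment of the score −1 + n/f should match 1/f.

```python
import numpy as np

rng = np.random.default_rng(3)

f_star = 20.0                                # assumed Poisson mean (illustrative)
n = rng.poisson(f_star, size=1_000_000)

# Score of the Poisson likelihood wrt its mean: d/df log P(n|f) = -1 + n/f
score = -1.0 + n / f_star

print("Monte Carlo <score^2>:", np.mean(score**2))   # should be close to 1/f*
print("analytic 1/f*        :", 1 / f_star)
```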

  17. Cooperative coding
[figure: four panels of firing rate plotted against s, illustrating scalar coding, labelled-line coding, and two forms of distributed encoding]

  18. Cooperative coding
All of these are found in biological systems. Issues:
1. redundancy and robustness (not scalar)
2. efficiency (not labelled line)
3. local computation (not scalar or distributed)
4. multiple values (not scalar)

  19. Coding in multiple dimensions
[figure: Cartesian vs. distributed arrangements of tuning in the (s_1, s_2) plane]
• Cartesian: efficient, but has problems with multiple values
• Distributed: can represent multiple values, but may require more neurons

  20. Cricket cercal system
Four interneurons have preferred directions given by unit vectors c_a, with c_1ᵀ c_2 = 0, c_3 = −c_1, c_4 = −c_2. Each has half-wave rectified cosine tuning:

  r_a(s) = r_max [cos(θ − θ_a)]_+ = r_max [c_aᵀ v]_+

So, writing r̃_a = r_a / r_max:

  (r̃_1 − r̃_3, r̃_2 − r̃_4)ᵀ = (c_1ᵀ v, c_2ᵀ v)ᵀ

  v = (c_1 c_2) (r̃_1 − r̃_3, r̃_2 − r̃_4)ᵀ
    = r̃_1 c_1 − r̃_3 c_1 + r̃_2 c_2 − r̃_4 c_2
    = r̃_1 c_1 + r̃_3 c_3 + r̃_2 c_2 + r̃_4 c_4
    = Σ_a r̃_a c_a

This is called population vector decoding.
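
A minimal sketch of this decoder, assuming Poisson spike counts and illustrative values for r_max and the counting window (the slide specifies neither).

```python
import numpy as np

rng = np.random.default_rng(4)

# Four preferred directions: c1, c2 orthogonal; c3 = -c1, c4 = -c2
c = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
r_max, dt = 40.0, 0.5   # assumed max rate and counting window

def responses(theta):
    """Half-wave rectified cosine tuning with assumed Poisson count noise."""
    v = np.array([np.cos(theta), np.sin(theta)])
    rates = r_max * np.maximum(c @ v, 0.0)
    return rng.poisson(rates * dt)

def population_vector(n):
    """Decode: v_hat = sum_a r_tilde_a c_a, with r_tilde_a = r_a / r_max."""
    r_tilde = n / (dt * r_max)
    v_hat = r_tilde @ c
    return np.arctan2(v_hat[1], v_hat[0])

theta_true = 0.7   # illustrative wind direction (radians)
print("true:", theta_true, " decoded:", population_vector(responses(theta_true)))
```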

  21. Motor cortex (simplified)
Cosine tuning, randomly distributed preferred directions.
In general, population vector decoding works for
• cosine tuning
• Cartesian or dense (tight) sets of preferred directions
But:
• is it optimal?
• does it generalise? (Gaussian tuning curves)
• how accurate is it?

  22. Bayesian decoding
Take n_a ∼ Poisson[f_a(s)∆], independently for different cells. Then

  P(n|s) = Π_a e^{−f_a(s)∆} (f_a(s)∆)^{n_a} / n_a!

and

  log P(s|n) = Σ_a [−f_a(s)∆ + n_a log(f_a(s)∆) − log n_a!] + log P(s) + const.

Assume Σ_a f_a(s) is independent of s for a homogeneous population, and that the prior is flat. Then

  d/ds log P(s|n) = Σ_a n_a d/ds log(f_a(s)∆) = Σ_a n_a f′_a(s) / f_a(s)

  23. Bayesian decoding
Now, consider f_a(s) = e^{−(s − s_a)²/2σ²}, so f′_a(s) = −((s − s_a)/σ²) e^{−(s − s_a)²/2σ²}, and set the derivative to 0:

  Σ_a n_a (s − s_a)/σ² = 0   ⇒   ŝ_MAP = Σ_a n_a s_a / Σ_a n_a

So the MAP estimate is a population average of preferred directions. Not exactly a population vector.
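
A sketch of this MAP decoder for an assumed homogeneous population of Gaussian tuning curves; the preferred values, width, maximal rate, and counting window are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

# Assumed population: Gaussian tuning curves tiling the stimulus axis
s_a = np.linspace(-4.0, 4.0, 41)     # preferred stimuli (illustrative)
sigma, r_max, dt = 0.8, 30.0, 0.2

def rates(s):
    return r_max * np.exp(-(s - s_a)**2 / (2 * sigma**2))

s_true = 1.3
n = rng.poisson(rates(s_true) * dt)   # independent Poisson counts, one per cell

# MAP estimate under a flat prior: spike-count-weighted average of preferred stimuli
s_map = np.sum(n * s_a) / np.sum(n)
print("true:", s_true, " MAP:", s_map)
```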

  24. Population Fisher Info
Fisher Informations for independent random variates add:

  J_n(s) = ⟨−d²/ds² log P(n|s)⟩
         = ⟨−d²/ds² Σ_a log P(n_a|s)⟩
         = Σ_a ⟨−d²/ds² log P(n_a|s)⟩
         = Σ_a J_{n_a}(s)
         = ∆ Σ_a f′_a(s)² / f_a(s)
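
This sum can be evaluated directly; the sketch below reuses the illustrative Gaussian-tuned population from the previous block, and should give a roughly constant J(s) in the interior of the tiled range.

```python
import numpy as np

# Same assumed population as in the previous sketch
s_a = np.linspace(-4.0, 4.0, 41)
sigma, r_max, dt = 0.8, 30.0, 0.2

def rates(s):
    return r_max * np.exp(-(s - s_a)**2 / (2 * sigma**2))

def rates_prime(s):
    return -(s - s_a) / sigma**2 * rates(s)

def population_fisher(s):
    """J(s) = dt * sum_a f'_a(s)^2 / f_a(s) for independent Poisson cells."""
    return dt * np.sum(rates_prime(s)**2 / rates(s))

print([round(population_fisher(s), 2) for s in np.linspace(-2, 2, 9)])
```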

  25. Optimal tuning properties
A considerable amount of work has been done in recent years on finding optimal properties of tuning curves for rate-based population codes. Here, we reproduce one such argument (from Zhang and Sejnowski, 1999).
Consider a population of cells that codes the value of a D-dimensional stimulus, s. Let the a-th cell emit r spikes in an interval τ with a probability distribution that is conditionally independent of the other cells (given s) and has the form

  P_a(r | s, τ) = S(r, f_a(s), τ).

The tuning curve of the a-th cell, f_a(s), has the form

  f_a(s) = F · φ((ξ^a)²);   (ξ^a)² = Σ_{i=1}^D (ξ^a_i)²;   ξ^a_i = (s_i − c^a_i) / σ,

where F is a maximal rate and the function φ is monotonically decreasing. The parameters c^a and σ give the centre of the a-th tuning curve and the (common) width.
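
The argument turns on how the population Fisher information scales with the common width σ in different dimensions D. The sketch below computes it numerically for one stimulus component, assuming Gaussian φ, independent Poisson counts, and centres on a cubic grid; the spacing, maximal rate, and window are all illustrative. For widths comparable to or larger than the centre spacing, the computed values follow the σ^(D−2) scaling that Zhang and Sejnowski derive: narrower is better for D = 1, width is roughly irrelevant for D = 2, and broader is better for D ≥ 3.

```python
import numpy as np
from itertools import product

F, tau, spacing = 50.0, 0.5, 1.0   # illustrative max rate, window, centre spacing

def fisher_at_origin(D, sigma, extent=8):
    """J_11(s=0) = tau * sum_a (df_a/ds_1)^2 / f_a, for Gaussian phi and
    tuning-curve centres c^a on a cubic grid with the given spacing."""
    axis = spacing * np.arange(-extent, extent + 1)
    J = 0.0
    for c in product(axis, repeat=D):
        c = np.array(c)
        f = F * np.exp(-np.sum(c**2) / (2 * sigma**2))   # f_a at s = 0
        df1 = (c[0] / sigma**2) * f                      # df_a/ds_1 at s = 0
        J += tau * df1**2 / f
    return J

for D in (1, 2, 3):
    print(f"D={D}:", [round(fisher_at_origin(D, s), 3) for s in (0.5, 1.0, 2.0)])
```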
