Information Theory Slides Jonathan Pillow
Barlow’s “Efficient Coding Hypothesis”
Barlow 1961, Atick & Redlich 1990: Efficient Coding Hypothesis
• goal of nervous system: maximize information about the environment
  (one of the core "big ideas" in theoretical neuroscience)
• redundancy: R = 1 - I(x;y)/C, where I(x;y) is the mutual information between stimulus x and response y, and C is the channel capacity
• mutual information: I(x;y) = H(y) - H(y|x)
  - avg # of yes/no questions you can answer about x given y ("bits")
  - H(y) = -∑ p(y) log2 p(y): response entropy; H(y|x): "noise" entropy
• channel capacity C:
  - upper bound on mutual information
  - determined by physical properties of the encoder
(a numerical sketch of these quantities follows below)
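To make these definitions concrete, here is a minimal numpy sketch (not from the slides) that computes the response entropy H(y), the noise entropy H(y|x), and the mutual information from a discrete joint distribution; the particular joint table pxy is an assumption made up purely for illustration.

```python
import numpy as np

# assumed toy joint distribution p(x, y): rows = stimuli x, columns = responses y
pxy = np.array([[0.30, 0.10],
                [0.05, 0.55]])

px = pxy.sum(axis=1)      # marginal stimulus distribution p(x)
py = pxy.sum(axis=0)      # marginal response distribution p(y)

def entropy(p):
    """Shannon entropy in bits, ignoring zero-probability outcomes."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

H_resp = entropy(py)                                                    # response entropy H(y)
H_noise = sum(px[i] * entropy(pxy[i] / px[i]) for i in range(len(px)))  # noise entropy H(y|x)
I = H_resp - H_noise                                                    # mutual information, in bits

print(f"H(y) = {H_resp:.3f}, H(y|x) = {H_noise:.3f}, I(x;y) = {I:.3f} bits")
```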
Barlow's original version (Barlow 1961; cf. Atick & Redlich 1990)
• if responses are noiseless, the "noise" entropy H(y|x) = 0, so
  mutual information = response entropy: I(x;y) = H(y)
• ⇒ the brain should maximize response entropy:
  - use the full dynamic range
  - decorrelate ("reduce redundancy")
• mega impact: huge number of theory and experimental papers focused on decorrelation / information-maximizing codes in the brain
basic intuition
• natural images: nearby pixels exhibit strong dependencies (pixel i vs. pixel i+1 are highly correlated)
• desired neural representation: an encoding whose responses are decorrelated (neural response i vs. neural response i+1)
(a toy decorrelation sketch follows below)
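As a toy version of this intuition (an assumption-laden illustration, not material from the slides), the sketch below draws strongly correlated Gaussian "pixel" pairs and applies a whitening transform, showing that the transformed "responses" are decorrelated.

```python
import numpy as np

rng = np.random.default_rng(0)

# assumed toy model of nearby pixels: strongly correlated Gaussian pairs
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])
pixels = rng.multivariate_normal(mean=[0, 0], cov=cov, size=10000)

# whitening (decorrelating) transform from the eigendecomposition of the sample covariance
evals, evecs = np.linalg.eigh(np.cov(pixels.T))
W = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
responses = pixels @ W.T

print("pixel correlation:   ", np.corrcoef(pixels.T)[0, 1])      # ~0.9
print("response correlation:", np.corrcoef(responses.T)[0, 1])   # ~0
```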
Application example: single neuron encoding stimuli from a distribution P(x)
• stimulus prior P(x) (e.g., Gaussian); noiseless, discrete encoding into a fixed set of response levels
• Q: what nonlinearity achieves infomax?
• A: histogram equalization: the infomax nonlinearity is the cdf of P(x), so every response level is used equally often (sketch below)
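A minimal sketch of the histogram-equalization answer, under stated assumptions (a standard-Gaussian stimulus prior, a noiseless encoder with 8 discrete response levels, scipy for the Gaussian cdf): passing stimuli through the prior's cdf makes every response level roughly equally likely, which maximizes the response entropy.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n_levels = 8                                        # assumed number of discrete response levels

x = rng.normal(loc=0.0, scale=1.0, size=100000)     # stimuli drawn from a Gaussian prior P(x)

# infomax encoder: pass the stimulus through the prior cdf, then quantize uniformly
u = norm.cdf(x, loc=0.0, scale=1.0)                 # uniform on [0, 1]
response = np.minimum((u * n_levels).astype(int), n_levels - 1)

# each response level is used ~equally often, so response entropy ~ log2(n_levels)
p = np.bincount(response, minlength=n_levels) / len(response)
print("response histogram:", np.round(p, 3))
print("response entropy:", -np.sum(p * np.log2(p)), "bits (max =", np.log2(n_levels), ")")
```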
Laughlin 1981: blowfly light response
• first major validation of Barlow's theory
• [figure: measured response data plotted against the cdf of the light-level distribution]
Atick & Redlich 1990: extended the theory to noisy responses
• predicts luminance-dependent receptive fields:
  - high SNR: "whitening" / decorrelating weighting
  - middle SNR: partial whitening
  - low SNR: averaging / correlating
• [figure: receptive-field weighting profiles as a function of space]
(a qualitative toy sketch of the SNR dependence follows below)
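The sketch below is only a qualitative toy, not the Atick & Redlich derivation: it assumes a 1/f^2 signal power spectrum and builds a filter as a whitening stage 1/sqrt(S(f)) multiplied by a Wiener-style factor S(f)/(S(f)+N), just to show that the resulting weighting is high-pass (whitening) at high SNR and low-pass (averaging) at low SNR.

```python
import numpy as np

# assumed 1/f^2 signal power spectrum over spatial frequency f
f = np.linspace(0.1, 10, 200)
S = 1.0 / f**2

def toy_filter(S, noise_power):
    """Whitening stage times a Wiener-style low-pass; a qualitative toy only."""
    return (1.0 / np.sqrt(S)) * (S / (S + noise_power))

for noise_power, label in [(1e-4, "high SNR"), (1e-1, "middle SNR"), (10.0, "low SNR")]:
    K = toy_filter(S, noise_power)
    # peak frequency of the filter gain: high for whitening, low for averaging
    print(f"{label:10s}: filter gain peaks at f = {f[np.argmax(K)]:.2f}")
```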
estimating entropy and MI from data
1. the "direct method" (Strong et al 1998)
• record responses to a repeated stimulus (raster)
• fix a bin size Δ and a word length N, so every block of N bins is a binary "word"
  - e.g., Δ = 10 ms, N = 3 → 2^3 = 8 possible words (001, 010, 110, ...)
• noise entropy: estimate p(R|S_j) from the histogram of words at a fixed time j across repeats, compute H = -∑ P log P, and average over times j
• total entropy: estimate p(R) from the histogram of words over all blocks of size N (all words, all times), and compute H = -∑ P log P
• information estimate: I = H_total - H_noise
(a simulation sketch follows below)
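A small simulation sketch of the direct method under simplifying assumptions: a made-up time-varying spike probability generates the repeated-trial raster, words are non-overlapping, and no bias correction or extrapolation in N and Δ is attempted (issues that Strong et al discuss).

```python
import numpy as np

rng = np.random.default_rng(2)

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def word_distribution(words):
    """Histogram-based probability estimate over observed words."""
    _, counts = np.unique(words, axis=0, return_counts=True)
    return counts / counts.sum()

# simulated raster: n_trials repeats of the same stimulus, binary spike count per bin
n_trials, n_bins, N = 200, 300, 3
rate = 0.1 + 0.3 * (rng.random(n_bins) > 0.7)         # assumed time-varying spike probability
raster = (rng.random((n_trials, n_bins)) < rate).astype(int)

# chop each trial into non-overlapping words of length N
words = raster[:, : (n_bins // N) * N].reshape(n_trials, -1, N)

# total entropy: word distribution pooled over all times and trials
H_total = entropy(word_distribution(words.reshape(-1, N)))

# noise entropy: word distribution across trials at each fixed time, averaged over time
H_noise = np.mean([entropy(word_distribution(words[:, t, :]))
                   for t in range(words.shape[1])])

print(f"H_total = {H_total:.3f} bits, H_noise = {H_noise:.3f} bits, "
      f"I = {H_total - H_noise:.3f} bits per word")
```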
2. "single-spike information" (Brenner et al 2000)
• record responses to a repeated stimulus (raster), compute the psth r(t) and the mean rate r̄
• information per spike:
  I_1 = (1/T) ∫_0^T (r(t)/r̄) log2( r(t)/r̄ ) dt
• equal to the information carried by an inhomogeneous Poisson process (with rate r(t))
derivation of single-spike information
• conditioned on the stimulus, a single spike time is distributed according to the normalized psth:
  p(t_sp | stim) = r(t_sp) / (r̄ T), where r̄ = (1/T) ∫_0^T r(t) dt is the mean rate
• unconditioned, a single spike time is uniform: p(t_sp) = 1/T, i.e., Unif([0, T])
• information per spike = entropy of Unif([0, T]) minus entropy of p(t_sp | stim):
  I_1 = log2 T + ∫_0^T ( r(t)/(r̄ T) ) log2( r(t)/(r̄ T) ) dt
      = (1/T) ∫_0^T (r(t)/r̄) log2( r(t)/r̄ ) dt
(a short numerical sketch follows below)
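A minimal numerical sketch of this formula, assuming a psth r(t) has already been estimated from repeated trials (the square-wave rate below is made up for illustration; bins with r(t) = 0 contribute zero to the integral).

```python
import numpy as np

def single_spike_info(psth):
    """Information per spike in bits: (1/T) * integral of (r/rbar) * log2(r/rbar) dt."""
    rbar = psth.mean()
    ratio = psth / rbar
    nz = ratio > 0                       # bins with zero rate contribute zero
    integrand = np.zeros_like(ratio)
    integrand[nz] = ratio[nz] * np.log2(ratio[nz])
    return integrand.mean()              # mean over uniform bins == (1/T) * sum * dt

# assumed example psth: 1 s of 1 ms bins, rate switching between 5 and 50 spikes/s
dt = 0.001
t = np.arange(0, 1.0, dt)
psth = 5 + 45 * (np.sin(2 * np.pi * 4 * t) > 0)

print(f"information per spike: {single_spike_info(psth):.3f} bits")
```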
3. decoding-based methods
• so far we have focused on the formulation: I(S;R) = H(R) - H(R|S)
• decoding-based approaches focus on the alternative (symmetric) version: I(S;R) = H(S) - H(S|R)
3. decoding-based methods (cont.)
• suppose we have a decoder that produces an estimate ŝ of the stimulus from spikes (e.g., the MAP estimate, or the Optimal Linear Estimator):
  stimulus → response → decoder → ŝ
• Bound #1 (Data Processing Inequality): I(S; ŝ) ≤ I(S; R), so the information recovered by any decoder is a lower bound on the information in the response
• Bound #2: the conditional entropy H(S | ŝ) is bounded above by the entropy of a Gaussian with cov = cov(residual errors S - ŝ), since the Gaussian is the maximum-entropy distribution with that covariance; this gives the computable bound
  I(S;R) ≥ H(S) - (1/2) log2 | 2πe · cov(S - ŝ) |
(a simulation sketch follows below)
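A simulation sketch of these bounds under assumptions made up for this example: a 2-d Gaussian stimulus, noisy linear responses from 10 model neurons, an optimal linear estimator fit by least squares, and the Gaussian maximum-entropy bound on the residual entropy.

```python
import numpy as np

rng = np.random.default_rng(3)

def gaussian_entropy(cov):
    """Entropy (bits) of a Gaussian with the given covariance: 0.5 * log2 |2*pi*e*cov|."""
    cov = np.atleast_2d(cov)
    return 0.5 * np.log2(np.linalg.det(2 * np.pi * np.e * cov))

# assumed toy encoding: 2-d Gaussian stimulus, noisy linear responses from 10 "neurons"
n, d_s, d_r = 5000, 2, 10
S = rng.normal(size=(n, d_s))
W = rng.normal(size=(d_s, d_r))
R = S @ W + 0.5 * rng.normal(size=(n, d_r))

# optimal linear estimator (least squares) decoding the stimulus from the responses
D, *_ = np.linalg.lstsq(R, S, rcond=None)
S_hat = R @ D

# bound: I(S;R) >= H(S) - H_gauss(cov of residual errors)
H_S = gaussian_entropy(np.cov(S.T))                 # stimulus entropy (Gaussian here)
H_err = gaussian_entropy(np.cov((S - S_hat).T))     # max-entropy bound on H(S | S_hat)
print(f"decoding-based lower bound on I(S;R): {H_S - H_err:.2f} bits")
```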