  1. Neural Encoding
  Mark van Rossum, School of Informatics, University of Edinburgh, January 2015

  Overview: understanding the neural code
  - Encoding: prediction of the neural response to a given stimulus.
  - Decoding (homunculus): given the response, what was the stimulus? Given the firing pattern, what will be the motor output? (Important for prostheses.)
  - Measuring information rates.
  Books: [Rieke et al., 1996] (a very good book on these issues), [Dayan and Abbott, 2002] (chapters 2 and 3), [Schetzen, 2006], [Schetzen, 1981] (a review paper on the method).

  The neural code
  Understanding the neural code is like building a dictionary: translate from the outside world (sensory stimulus or motor action) to the internal neural representation, and translate from the neural representation back to the outside world. As in real dictionaries, there are both one-to-many and many-to-one entries (think of examples).

  Encoding: stimulus-response relation
  Predict the response r to a stimulus s, treating the neuron as a black box. This is a supervised learning problem, but:
  - The stimulus s can be synaptic input or a sensory stimulus.
  - Responses are noisy and unreliable: use probabilities.
  - There are typically many input (and sometimes output) dimensions.
  - Responses are non-linear (1). Assume the non-linearity is weak and make a series expansion? Or impose a parametric non-linear model with few parameters.
  - We need to assume causality and stationarity (the system remains the same). This excludes adaptation!
  (1) Linear means: r(αs₁ + βs₂) = αr(s₁) + βr(s₂) for all α, β.
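  As a small check of the linearity definition in footnote (1), the sketch below (Python/numpy; the filter and stimuli are hypothetical) tests superposition numerically: a purely linear filter passes, while the same filter followed by rectification does not.

      import numpy as np

      rng = np.random.default_rng(0)
      k = np.exp(-np.arange(20) / 5.0)             # hypothetical causal filter kernel

      def linear_response(s):
          return np.convolve(s, k)[:len(s)]        # r = s convolved with k

      def rectified_response(s):
          return np.maximum(linear_response(s), 0.0)   # rectification breaks linearity

      s1, s2 = rng.normal(size=200), rng.normal(size=200)
      alpha, beta = 2.0, -0.5

      for r in (linear_response, rectified_response):
          lhs = r(alpha * s1 + beta * s2)
          rhs = alpha * r(s1) + beta * r(s2)
          print(r.__name__, "satisfies superposition:", np.allclose(lhs, rhs))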

  2. Response: spikes and rates
  The response consists of spikes, and spikes are (largely) stochastic. Compute rates by trial-to-trial averaging and hope that the system is stationary and that the noise really is noise. Initially we try to predict the rate r rather than the individual spikes. (Note: there are methods to estimate the most accurate histogram from data.)

  Paradigm: early visual pathways
  [Figure: the early visual pathway; Dayan and Abbott, 2001, after Nicholls et al., 1992]

  Retinal/LGN cell response types
  - On-centre off-surround
  - Off-centre on-surround
  - Also colour-opponent cells
  - Other pathways (e.g. auditory)

  V1 cell response types (Hubel & Wiesel)
  - Simple cells, modelled by Gabor functions [Figure: even and odd Gabor filters]
  - Also complex cells, and spatio-temporal receptive fields
  - Higher areas
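  Since simple cells are modelled by Gabor functions, a minimal sketch may help make the even/odd filters concrete (Python/numpy; the parameter values are illustrative assumptions, not taken from the slides).

      import numpy as np

      def gabor(size=41, sigma=6.0, gamma=0.8, wavelength=10.0, theta=0.0, phase=0.0):
          # Gaussian envelope times a sinusoidal carrier.
          # phase = 0 gives an even (cosine) filter, phase = pi/2 an odd (sine) one.
          half = size // 2
          y, x = np.mgrid[-half:half + 1, -half:half + 1]
          xr = x * np.cos(theta) + y * np.sin(theta)      # rotate to preferred orientation
          yr = -x * np.sin(theta) + y * np.cos(theta)
          envelope = np.exp(-(xr**2 + (gamma * yr)**2) / (2.0 * sigma**2))
          carrier = np.cos(2.0 * np.pi * xr / wavelength + phase)
          return envelope * carrier

      even_rf = gabor(phase=0.0)
      odd_rf = gabor(phase=np.pi / 2)
      # Linear response of the model cell to an image patch of the same size:
      # response = np.sum(even_rf * patch)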

  3. Not all cells are so simple...
  The methods work well under limited conditions and for early sensory systems. But intermediate sensory areas (e.g. IT) do things like face recognition: very non-linear, and hard with these methods. In even higher areas the receptive field (RF) is not purely sensory. Example: pre-frontal cells that are task dependent [Wallis et al., 2001].

  Overview
  - Volterra and Wiener expansions
  - Spike-triggered average & covariance
  - Linear-nonlinear-Poisson (LNP) models
  - Integrate & fire and Generalized Linear Models
  - Networks

  Simple example
  A thermometer: temperature T gives a response r = g(T), where r measures cm of mercury. g(T) is monotonic, so g⁻¹(r) probably exists. It could be somewhat non-linear, and could in principle be noisy. It will not indicate the instantaneous T, but its recent history. For example
  r(t) = g(∫ dt′ T(t′) k(t − t′))
  where k is called a (filter) kernel. The argument of g() is a convolution: T ⋆ k ≡ ∫ dt′ T(t′) k(t − t′). Note that if k(t) = δ(t) then r(t) = g(T(t)).
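  A minimal numerical sketch of this kernel idea (Python/numpy; the exponential kernel and the saturating read-out are illustrative assumptions): the reading is a non-linear function of the recent, filtered temperature history.

      import numpy as np

      dt = 0.1                                     # time step in seconds (assumed)
      tau = np.arange(0.0, 5.0, dt)
      k = np.exp(-tau / 1.0) * dt                  # causal filter kernel, ~1 s memory

      def g(x):
          # static, monotonic, mildly saturating read-out (assumed form)
          return np.tanh((x - 20.0) / 5.0)

      t = np.arange(0.0, 60.0, dt)
      T = 20.0 + 5.0 * np.sin(2.0 * np.pi * t / 30.0)   # slowly varying temperature

      filtered = np.convolve(T, k)[:len(T)]        # (T * k)(t): recent history of T
      r = g(filtered)                              # r(t) = g( integral dt' T(t') k(t - t') )
      # With a delta-function kernel (k = [1.0]), r(t) reduces to g(T(t)).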

  4. Volterra kernels
  Inspiration from the Taylor expansion
  r(s) = r(0) + r′(0)s + ½r″(0)s² + … = r₀ + r₁s + ½r₂s² + …
  but including the temporal response (a Taylor series with memory):
  r(t) = h₀ + ∫₀^∞ dτ₁ h₁(τ₁) s(t − τ₁) + ∫₀^∞ ∫₀^∞ dτ₁ dτ₂ h₂(τ₁, τ₂) s(t − τ₁) s(t − τ₂) + … h₃(τ₁, τ₂, τ₃) …
  Note that h₂(τ₁, τ₂) = h₂(τ₂, τ₁). One hopes that lim_{τᵢ→∞} h_k = 0, that the h_k are smooth, and that h_k is small for large k.

  Noise and power spectra
  At each timestep draw an independent sample from a zero-mean Gaussian:
  ⟨s(t₁)⟩ = 0,   ⟨s(t₁) … s(t_{2k+1})⟩ = 0
  ⟨s(t₁)s(t₂)⟩ = C(t₁ − t₂) = σ²δ(t₁ − t₂)
  ⟨s(t₁)s(t₂)s(t₃)s(t₄)⟩ = σ⁴[δ(t₁ − t₂)δ(t₃ − t₄) + δ(t₁ − t₃)δ(t₂ − t₄) + δ(t₁ − t₄)δ(t₂ − t₃)]
  The noise is called white because in the Fourier domain all frequencies are equally strong. The power spectrum of the signal and the autocorrelation are directly related via the Wiener–Khinchin theorem:
  S(f) = 4 ∫₀^∞ C(τ) cos(2πfτ) dτ

  Wiener kernels
  Wiener kernels are a rearrangement of the Volterra expansion, used when s(t) is Gaussian white noise with ⟨s(t₁)s(t₂)⟩ = σ²δ(t₁ − t₂). The 0th- and 1st-order Wiener kernels are identical to the Volterra ones.
  r(t) = g₀ + ∫₀^∞ dτ₁ g₁(τ₁) s(t − τ₁) + [∫₀^∞ ∫₀^∞ dτ₁ dτ₂ g₂(τ₁, τ₂) s(t − τ₁) s(t − τ₂) − σ² ∫₀^∞ dτ₁ g₂(τ₁, τ₁)] + …   (1)
  The predicted rate is given by Eq. (1).

  Estimating Wiener kernels
  To find the kernels, stimulate with Gaussian white noise:
  g₀ = ⟨r⟩
  g₁(τ) = (1/σ²) ⟨r(t) s(t − τ)⟩   (a correlation)
  g₂(τ₁, τ₂) = (1/2σ⁴) ⟨r(t) s(t − τ₁) s(t − τ₂)⟩   (τ₁ ≠ τ₂)
  In the Wiener, but not the Volterra, expansion successive terms are independent: including a quadratic term won't affect the estimation of the linear term, and so on.
  Technical point [Schetzen, 1981]: lower terms do enter higher-order correlations, e.g.
  ⟨r(t) s(t − τ₁) s(t − τ₂)⟩ = 2σ⁴ g₂(τ₁, τ₂) + σ² g₀ δ(τ₁ − τ₂)
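  These estimators are just reverse correlation with the white-noise stimulus. The sketch below (Python/numpy; the exponential kernel and the weak quadratic term of the synthetic neuron are assumptions for the example) recovers g₀ and g₁ from simulated data.

      import numpy as np

      rng = np.random.default_rng(1)
      n, L, sigma = 200_000, 30, 1.0

      h1 = np.exp(-np.arange(L) / 8.0)             # true first-order kernel (assumed)
      s = rng.normal(0.0, sigma, size=n)           # Gaussian white-noise stimulus
      drive = np.convolve(s, h1)[:n]
      r = 5.0 + drive + 0.1 * drive**2             # response with a weak quadratic term

      # Wiener estimates:  g0 = <r>,   g1(tau) = <r(t) s(t - tau)> / sigma^2
      g0 = r.mean()
      g1 = np.array([np.mean(r[tau:] * s[:n - tau]) for tau in range(L)]) / sigma**2
      # g1 should be close to h1: for Gaussian white noise the quadratic term does not
      # bias the first-order estimate (successive Wiener terms are independent).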

  5. Remarks
  - The predicted rate can be < 0.
  - In biology, unlike physics, there is no obvious small parameter that justifies neglecting higher orders. Check the accuracy of the approximation post hoc.
  - Averaging and ergodicity: ⟨x⟩ formally means an average over many realizations of the random variables of the system (both stimuli and internal state). This definition is good to remember when conceptual problems occur. An ergodic system visits all realizations if one waits long enough, so one can measure from a single system repeatedly and still obtain the true average.

  Wiener kernels in discrete time
  Model:
  r(n∆t) = g₀ + Σ_{i=0}^{L−1} g_{1i} s((n − i)∆t) + …
  In discrete time this is just linear/polynomial regression. Solve e.g. by minimizing the squared error E = (r − Sg)ᵀ(r − Sg). For example, with L = 3, g = (g₀, g₁₀, g₁₁, g₁₂)ᵀ and r = (r₁, r₂, …, r_n)ᵀ:
  S = [ 1  s₁   s₀       s₋₁
        1  s₂   s₁       s₀
        ⋮
        1  s_n  s_{n−1}  s_{n−2} ]
  S is an n × (1 + L) matrix (the 'design matrix').

  The least-squares solution ĝ for any stimulus (differentiate E with respect to g):
  ĝ = (SᵀS)⁻¹ Sᵀ r
  Note that on average, for Gaussian white noise, ⟨SᵀS⟩_{ij} = n δ_{ij} (σ² + (1 − σ²) δ_{i1}). After substitution we obtain
  ĝ₀ = (1/n) Σ_{i=1}^{n} r_i = ⟨r⟩
  ĝ_{1j} = (1/σ²) (1/n) Σ_{i=1}^{n} s_{i−j} r_i = (1/σ²) corr(s, r)
  Note the parallel with the continuous-time equations.

  Linear case: Fourier domain
  Convolution becomes simple multiplication in the Fourier domain. Assume the neuron is purely linear (g_j = 0 for j > 1); otherwise the Fourier representation is not helpful.
  r(t) = r₀ + s ∗ g₁
  ⟨s(t) r(t + τ)⟩ = ⟨s r₀⟩ + ⟨s(t) (g₁ ∗ s)⟩
  Now g₁(ω) = ⟨rs⟩(ω) / ⟨ss⟩(ω). For Gaussian white noise ⟨ss⟩(ω) = σ² (note that ⟨s⟩ = 0), so g₁(ω) = (1/σ²) ⟨rs⟩(ω). g₁ can be interpreted as the impedance of the system.
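  The discrete-time estimate is ordinary least squares on the design matrix above. A short sketch (Python/numpy; the example kernel, data length and noise level are assumptions):

      import numpy as np

      rng = np.random.default_rng(2)
      n, L, sigma = 50_000, 3, 1.0

      g_true = np.array([0.5, 1.0, 0.6, 0.2])      # (g0, g10, g11, g12), assumed
      s = rng.normal(0.0, sigma, size=n)

      # Design matrix: row i is (1, s_i, s_{i-1}, ..., s_{i-L+1}); stimulus samples
      # from before the start of the recording are simply left at zero here.
      S = np.zeros((n, 1 + L))
      S[:, 0] = 1.0                                # constant column for g0
      for j in range(L):
          S[j:, 1 + j] = s[:n - j]

      r = S @ g_true + 0.2 * rng.normal(size=n)    # noisy, purely linear response

      g_hat = np.linalg.solve(S.T @ S, S.T @ r)    # least squares: (S^T S)^{-1} S^T r
      # g_hat[0] ~ <r>, and g_hat[1:] ~ the correlation-based estimates above.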

  6. Regularization
  Fits with many parameters typically require regularization to prevent over-fitting.
  Regularization: punish fluctuations (a smoothness prior).
  For a non-white stimulus, in the Fourier domain: g₁(ω) = ⟨rs⟩(ω) / (⟨ss⟩(ω) + λ), which prevents division by zero as ω → ∞.
  In the time domain: ĝ = (SᵀS + λI)⁻¹ Sᵀ r. Set λ by hand.
  [Figure: unregularized vs. regularized estimate of the kernel (STA) as a function of time.]

  Figure: over-fitting. Left: the stars are the data points. Although the dashed line might fit the data better, it is over-fitted and is likely to perform worse on new data; the solid line is a more reasonable model. Right: when you over-fit, the error on the training data decreases, but the error on new (validation) data increases. Ideally both errors are minimal.

  Spatio-temporal kernels
  The kernel can also live in the spatio-temporal domain. This V1 kernel does not respond to a static stimulus, but will respond to a moving grating (see [Dayan and Abbott, 2002], §2.4, for more on motion detectors). [Figure: Dayan and Abbott, 2002]

  Higher-order kernels
  Including higher orders leads to more accurate estimates. Example: the chinchilla auditory system [Temchin et al., 2005].
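  In the time domain the regularized fit is ridge regression, i.e. the (SᵀS + λI)⁻¹Sᵀr expression above. A small sketch (Python/numpy; reusing the design-matrix setup from the previous sketch, with λ set by hand as on the slide):

      import numpy as np

      def ridge_kernel(S, r, lam):
          # Regularized kernel estimate: (S^T S + lambda * I)^{-1} S^T r
          k = S.shape[1]
          return np.linalg.solve(S.T @ S + lam * np.eye(k), S.T @ r)

      # lam = 0 recovers the unregularized least-squares fit; larger lam shrinks the
      # kernel towards zero and damps over-fitting when data are scarce or the
      # stimulus is non-white. In practice one compares a few values of lam on
      # held-out (validation) data, as in the training/validation curves above.
      # Example (S and r as in the previous sketch):  g_hat = ridge_kernel(S, r, 10.0)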
