

  1. Neural Encoding. Matthias Hennig, based on material by Mark van Rossum. School of Informatics, University of Edinburgh, January 2019.

  2. From stimulus to behaviour. [Diagram: sensory input → brain → motor output]

  3. [Figure]

  4. The brain as a computer. [Diagram: sensory input → brain → motor output] Information processing to extract features and generate outputs. Statistical inference. The physical implementation is irrelevant; possible to replicate in silico?

  5. The neural code. [Diagram: sensory input, brain, motor output] Encoding: prediction of the neural response to a given stimulus, P(R | S). Decoding: given the response, what was the stimulus, P(S | R)? Prosthetics: given a firing pattern, what will the motor output be, P(M | R)?

  6. Understanding the neural code is like building a dictionary: translate from the outside world (sensory stimulus or motor action) to the internal neural representation, and from the neural representation back to the outside world. As in real dictionaries, there are both one-to-many and many-to-one entries.

  7. Encoding: stimulus-response relation. Predict the response R to a stimulus S: a black-box approach. This is a supervised learning problem, but: the stimulus S can be synaptic input or a sensory stimulus; responses are noisy and unreliable, so we use probabilities; there are typically many input (and sometimes output) dimensions; and responses are non-linear¹. One option is to assume the non-linearity is weak and make a series expansion; alternatively, impose a parametric non-linear model with few parameters. We need to assume causality and stationarity (the system remains the same), which excludes adaptation! ¹ Linear means r(α s_1 + β s_2) = α r(s_1) + β r(s_2) for all α, β.

  8. Response: spikes and rates. The response consists of spikes, which are (largely) stochastic. Compute rates by averaging across trials, hoping that the system is stationary and that the noise really is noise. Often we try to predict the rate R rather than individual spikes.

  9. Paradigm: Early Visual Pathways [Figure: Dayan and Abbott, 2001, after Nicholls et al., 1992]

  10. Retinal/LGN cell response types: on-centre off-surround, off-centre on-surround.

  11. Mach bands.

  12. V1 cell response types (Hubel & Wiesel). [Figure: odd and even filters] Simple cells, modelled by Gabor functions. Also complex cells, and spatio-temporal receptive fields. Higher areas; other pathways (e.g. auditory).

  13. Not all cells are so simple... Intermediate sensory areas (e.g. IT) have face-selective neurons. In the limbic system, neurons appear even more specialised [Quiroga et al., 2005].

  14. Not all cells are so simple... In higher areas the receptive field (RF) is not purely sensory. Example: pre-frontal cells that are task-dependent [Wallis et al., 2001].

  15. Model complexity. [Diagram: models ordered along a tractability-realism axis, from linear and Gaussian models to Hodgkin-Huxley and biophysical models] To study neural encoding, we need a model. There is an inevitable trade-off between realism and complexity. Simple models: normative theories. Detailed models: how the computation is implemented in the brain.

  16. From stimulus to response. [Plot: response r (spikes/s) against stimulus s] What is the correct P(R | S, θ), where θ is a model parameter? Strategy: maximise the likelihood P(R | S, θ).

  17. General linear model (GLM). [Plot: response r (spikes/s) against stimulus s] We assume a Poisson model. For N trials, we write the likelihood P(R | S, θ) = ∏_{i=1}^{N} P(r_i | s_i, θ) = ∏_{i=1}^{N} (1/r_i!) (θ s_i)^{r_i} e^{−θ s_i}.

  18. Model likelihood. [Plot: likelihood P(R | S, θ) against θ] The likelihood P(R | S, θ) = ∏_{i=1}^{N} P(r_i | s_i, θ) = ∏_{i=1}^{N} (1/r_i!) (θ s_i)^{r_i} e^{−θ s_i} has a maximum close to θ = 2.

  19. Log-likelihood. [Plots: likelihood and log-likelihood against θ] In practice, we use the logarithm: log P(R | S, θ) = Σ_{i=1}^{N} log P(r_i | s_i, θ) = Σ_{i=1}^{N} (r_i log θ − θ s_i) + C. The terms in C do not depend on θ, so they can be ignored.

  20. Log-likelihood. [Plot: log-likelihood log P(R | S, θ) against θ] To find the maximum, differentiate: ∂ log P(R | S, θ) / ∂θ.

  21. Log-likelihood. [Plot: log-likelihood against θ] Find the maximum: log P(R | S, θ) = Σ_i (r_i log θ − θ s_i) + C, so ∂ log P(R | S, θ) / ∂θ = (1/θ) Σ_i r_i − Σ_i s_i.

  22. Log-likelihood. [Plot: log-likelihood against θ] Find the maximum: setting ∂ log P(R | S, θ) / ∂θ = (1/θ) Σ_i r_i − Σ_i s_i = 0 gives θ̂ = Σ_i r_i / Σ_i s_i. In this example I obtain θ̂ = 1.92, close to the true value θ = 2.
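
  A minimal Python sketch of this estimator (the stimulus values, trial count, and random seed are illustrative, not taken from the lecture): simulate counts r_i ∼ Poisson(θ s_i) with θ = 2 and compare the analytic MLE with a grid search over the log-likelihood.

      import numpy as np

      rng = np.random.default_rng(0)

      # Illustrative example: N trials with rates theta * s_i and true theta = 2
      theta_true, N = 2.0, 200
      s = rng.uniform(1.0, 8.0, size=N)        # stimulus values
      r = rng.poisson(theta_true * s)          # observed spike counts

      # Analytic MLE derived above: theta_hat = sum(r_i) / sum(s_i)
      theta_hat = r.sum() / s.sum()

      # Log-likelihood up to the theta-independent constant C
      def log_lik(theta):
          return np.sum(r * np.log(theta) - theta * s)

      # Sanity check: the grid maximum should coincide with theta_hat
      grid = np.linspace(0.5, 4.0, 400)
      theta_grid = grid[np.argmax([log_lik(t) for t in grid])]
      print(theta_hat, theta_grid)             # both close to the true value 2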

  23. Remarks. The predicted rate can be negative. In biology, unlike physics, there is no obvious small parameter that justifies neglecting higher orders; rectification, for instance, requires infinitely many orders. Check the accuracy of the approximation post hoc. Averaging and ergodicity: ⟨r⟩ formally means an average over many realizations of the random variables of the system (both stimuli and internal state). This definition is good to remember when conceptual problems occur. An ergodic system visits all realizations if one waits long enough, so by measuring from a system for long enough, true averages can be obtained. This, however, requires stationarity: internal states are not allowed to change.

  24. A more realistic response. [Plot: response r (spikes/s) against stimulus s]

  25. A more realistic response. [Plot: response r (spikes/s) against stimulus s] This requires a non-linear transformation: r(s) ∼ Poisson(f(θ s)).

  26. Neural responses depend on the stimulus history. Introduce a linear temporal kernel k(t) with r(t) ∼ Poisson(f(∫ dt′ s(t′) k(t − t′))).

  27. Poisson Generalised Linear Model (also GLM!) [Pillow et al., 2005]. r(t) ∼ Poisson(f(∫ dt′ s(t′) k(t − t′))). Linear: spatial and temporal filter kernel k. Non-linear function giving the output spike probability: rectification, saturation. Poisson spikes: p_spike(t) = λ(t) (noisy).
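
  A sketch of how data might be generated from this model; the kernel shape, bin width, and exponential nonlinearity are illustrative assumptions rather than the lecture's values.

      import numpy as np

      rng = np.random.default_rng(1)

      # Illustrative choices (not from the lecture): a 15-bin kernel, white-noise
      # stimulus, exponential nonlinearity f, and 10 ms time bins
      T, K, dt = 1000, 15, 0.01
      t_k = np.arange(K)
      k = np.exp(-t_k / 3.0) - 0.5 * np.exp(-t_k / 6.0)   # assumed kernel shape
      s = rng.normal(size=T)                               # stimulus s(t)

      # Linear stage: discrete analogue of  integral dt' s(t') k(t - t')
      drive = np.convolve(s, k, mode="full")[:T]

      # Non-linear stage: f turns the filtered stimulus into a rate lambda(t) >= 0
      lam = np.exp(drive)                                  # spikes per second

      # Poisson stage: spike counts per bin with mean lambda(t) * dt
      counts = rng.poisson(lam * dt)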

  28. Fitting a linear model. [Plots: stimulus and spike rate against time bin; response = Toeplitz matrix * kernel; estimated kernel k against time bin; measured against predicted rate (linear Gaussian)] r(t) ∼ Gaussian(∫ dt′ s(t′) k(t − t′)). This has the closed-form MLE k̂ = (S^T S)^{-1} S^T R. The data come from a model with an exponential nonlinearity: the linear model recovers the kernel well, but cannot predict the rates.
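
  A sketch of this closed-form fit on data of the same kind (simulated, illustrative values); the Toeplitz stimulus matrix S has s(t), s(t−1), ..., s(t−K+1) in row t.

      import numpy as np
      from scipy.linalg import toeplitz

      rng = np.random.default_rng(2)

      # Illustrative data: white-noise stimulus, spikes from an exponential-
      # nonlinearity Poisson model (as on the slide)
      T, K, dt = 5000, 15, 0.01
      k_true = np.exp(-np.arange(K) / 3.0) - 0.5 * np.exp(-np.arange(K) / 6.0)
      s = rng.normal(size=T)
      r = rng.poisson(np.exp(np.convolve(s, k_true, mode="full")[:T]) * dt)

      # Toeplitz stimulus matrix: row t holds s(t), s(t-1), ..., s(t-K+1)
      S = toeplitz(s, np.r_[s[0], np.zeros(K - 1)])

      # Closed-form MLE of the linear-Gaussian model: k_hat = (S^T S)^{-1} S^T R
      k_hat = np.linalg.solve(S.T @ S, S.T @ r)
      # k_hat is roughly proportional to k_true (the kernel shape is recovered),
      # but the linear model still cannot reproduce the rates of the Poisson data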

  29. Spike-triggered average (STA). For spike times t_i, r(t) = Σ_i δ(t − t_i), and g_1(τ) = (1/σ²) ⟨r(t) s(t − τ)⟩ = (1/σ²) Σ_{t_i} s(t_i − τ).
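
  A possible implementation on binned data, following the formula above; the simulated stimulus and spike counts are again illustrative.

      import numpy as np

      rng = np.random.default_rng(3)

      # Illustrative binned data: white-noise stimulus, spikes from an LNP model
      T, K = 5000, 15
      s = rng.normal(size=T)
      k_true = np.exp(-np.arange(K) / 3.0) - 0.5 * np.exp(-np.arange(K) / 6.0)
      counts = rng.poisson(np.exp(np.convolve(s, k_true, mode="full")[:T]) * 0.01)

      # g1(tau) = <r(t) s(t - tau)> / sigma^2, with a time average over the bins
      sta = np.array([np.mean(counts[K:] * s[K - tau : T - tau]) for tau in range(K)])
      sta /= s.var()
      # For a white-noise stimulus this is proportional to the underlying kernel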

  30. Linear models for spiking neurons. Application to the H1 neuron [Rieke et al., 1996]. Prediction (solid) and actual firing rate (dashed). The prediction captures the slow modulations, but not the faster structure; this is often the case.

  31. Fitting a non-linear model. [Plots: estimated kernels (linear and GLM) against time bin; measured against predicted rate for the exp(Gaussian) and GLM fits] The Poisson GLM log-likelihood has no closed-form MLE: log P(R | S, θ) = Σ_i r_i log f(k ∗ s_i) − Σ_i f(k ∗ s_i). Use numerical minimisation of the negative log-likelihood (scipy.optimize.fmin, or fminsearch in Matlab). This recovers the kernel and the rates correctly.
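
  A sketch of the numerical fit on simulated data with f = exp; the kernel shape, bin width, and the use of scipy.optimize.minimize (rather than the fmin routine named on the slide) are my choices.

      import numpy as np
      from scipy.linalg import toeplitz
      from scipy.optimize import minimize

      rng = np.random.default_rng(4)

      # Simulated Poisson-GLM data with exponential nonlinearity (illustrative values)
      T, K, dt = 5000, 15, 0.01
      k_true = np.exp(-np.arange(K) / 3.0) - 0.5 * np.exp(-np.arange(K) / 6.0)
      s = rng.normal(size=T)
      S = toeplitz(s, np.r_[s[0], np.zeros(K - 1)])     # Toeplitz stimulus matrix
      r = rng.poisson(np.exp(S @ k_true) * dt)

      # Negative Poisson log-likelihood: -(sum_i r_i log f(k*s_i) - sum_i f(k*s_i))
      def neg_log_lik(k):
          rate = np.exp(S @ k) * dt
          return rate.sum() - r @ np.log(rate)

      # No closed form: minimise numerically; for f = exp the log-likelihood is
      # concave (see the next slide), so the optimiser finds the global maximum
      k_hat = minimize(neg_log_lik, x0=np.zeros(K)).x   # should approach k_true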

  32. Fitting non-linear models. Poisson GLM log-likelihood: log P(R | S, θ) = Σ_i r_i log f(k ∗ s_i) − Σ_i f(k ∗ s_i). Bernoulli GLM log-likelihood: log P(R | S, θ) = Σ_i r_i log f(k ∗ s_i) + Σ_i (1 − r_i) log(1 − f(k ∗ s_i)). For f(x) = 1/(1 + exp(−x)), this is logistic regression. When f is convex and log(f) is concave in the parameters, e.g. f(x) = [x]_+ or f(x) = exp(x), then the log-likelihood is concave, hence a global maximum exists.
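
  A minimal Bernoulli-GLM sketch on made-up binary responses (one spike/no-spike value per bin), directly implementing the log-likelihood above with the logistic f.

      import numpy as np
      from scipy.optimize import minimize
      from scipy.special import expit           # logistic function 1 / (1 + exp(-x))

      rng = np.random.default_rng(5)

      # Illustrative data: binary responses r_i generated from a known filter
      N, K = 5000, 10
      S = rng.normal(size=(N, K))               # one stimulus vector per bin
      k_true = 0.5 * rng.normal(size=K)
      r = rng.binomial(1, expit(S @ k_true))    # r_i in {0, 1}

      # Negative Bernoulli log-likelihood (logistic regression when f = expit)
      def neg_log_lik(k):
          p = expit(S @ k)
          return -np.sum(r * np.log(p) + (1 - r) * np.log(1 - p))

      k_hat = minimize(neg_log_lik, x0=np.zeros(K)).x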

  33. Regularization. Figure: Over-fitting. Left: the stars are the data points. Although the dashed line might fit the data better, it is over-fitted and is likely to perform worse on new data; the solid line appears a more reasonable model. Right: when you over-fit, the error on the training data decreases, but the error on new data increases. Ideally both errors are minimal.

  34. Regularization. [Plot: unregularised and regularised STA estimates against time] Fits with many parameters or short data typically require regularization to prevent over-fitting. Regularization punishes fluctuations (smooth prior, ridge regression): k̂ = (S^T S + λ I)^{-1} S^T r. The regulariser λ has to be set by hand.
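
  A sketch of the ridge estimate next to the unregularised one, on an illustrative short, noisy dataset; the value of λ here is just an example of setting it by hand.

      import numpy as np
      from scipy.linalg import toeplitz

      rng = np.random.default_rng(6)

      # Short, noisy dataset where the unregularised estimate tends to over-fit
      T, K, lam = 300, 30, 10.0
      k_true = np.exp(-np.arange(K) / 5.0)
      s = rng.normal(size=T)
      S = toeplitz(s, np.r_[s[0], np.zeros(K - 1)])
      r = S @ k_true + rng.normal(scale=2.0, size=T)   # noisy linear responses

      # Ridge regression: k_hat = (S^T S + lambda I)^{-1} S^T r
      k_unreg = np.linalg.solve(S.T @ S, S.T @ r)
      k_ridge = np.linalg.solve(S.T @ S + lam * np.eye(K), S.T @ r)
      # k_ridge fluctuates less than k_unreg; lambda itself has to be set by hand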

  35. Poisson GLM results [Chichilnisky, 2001]. [Figure: fitted kernels; colors are the kernels for the different RGB channels]
