Overcomplete models & Lateral interactions and Feedback
Teppo Niinimäki
April 22, 2010
Contents

1. Overcomplete models
   - Overcomplete basis
   - Energy based models

2. Lateral interaction and feedback
   - Feedback and Bayesian inference
   - End-stopping
   - Predictive coding
Motivation

So far:
- Sparse coding models: feature detector weights orthogonal
- Generative models: A invertible ⇒ square matrix ⇒ number of features ≤ number of dimensions in data ≤ number of pixels

Why more features?
- Processing is location independent ⇒ same set of features for every location
- Number of simple cells in V1 ≫ number of retinal ganglion cells (≈ 25 times)
Overcomplete basis: Generative model

Generative model with additive noise:

I(x,y) = \sum_{i=1}^{m} A_i(x,y) s_i + N(x,y)

- basis vectors: A_i
- features: s_i
- number of features: m > |I| (i.e., m > dimension of the data)
- N(x,y): Gaussian noise ⇒ simplifies computations
Overcomplete basis: Computation of features

I(x,y) = \sum_{i=1}^{m} A_i(x,y) s_i + N(x,y)

How to compute the coefficients s_i for a given I?
- A is not invertible: more unknowns than equations ⇒ infinitely many different solutions
- Find the sparsest solution (most s_i close to 0):
  - assume a sparse distribution for the s_i
  - find the most probable values for the s_i
Overcomplete basis: Computation of features

Aim: find s which maximizes p(s | I). By Bayes' rule,

p(s | I) = \frac{p(I | s)\, p(s)}{p(I)}

Ignore the constant p(I) and maximize the logarithm instead:

\log p(s | I) = \log p(I | s) + \log p(s) + const.

For the prior distribution p(s), assume sparsity and independence:

\log p(s) = \sum_{i=1}^{m} G(s_i)
Overcomplete basis: Computation of features

Next, compute \log p(I | s) in

\log p(s | I) = \log p(I | s) + \log p(s) + const.

The probability of I(x,y) given s is the Gaussian pdf of the noise

N(x,y) = I(x,y) - \sum_{i=1}^{m} A_i(x,y) s_i.

Inserting this into

p(N(x,y)) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{1}{2\sigma^2} N(x,y)^2 \right)

gives

\log p(I(x,y) \mid s) = -\frac{1}{2\sigma^2} \left( I(x,y) - \sum_{i=1}^{m} A_i(x,y) s_i \right)^2 - \frac{1}{2}\log(2\pi\sigma^2).
Overcomplete basis: Computation of features

Because the noise is independent across pixels, we can sum over x, y to get the log-pdf of the whole image:

\log p(I \mid s) = -\frac{1}{2\sigma^2} \sum_{x,y} \left( I(x,y) - \sum_{i=1}^{m} A_i(x,y) s_i \right)^2 - \frac{n}{2}\log(2\pi\sigma^2).

Combining the above: find the s that maximizes

\log p(s \mid I) = -\frac{1}{2\sigma^2} \sum_{x,y} \left( I(x,y) - \sum_{i=1}^{m} A_i(x,y) s_i \right)^2 + \sum_{i=1}^{m} G(s_i) + const.

⇒ numerical optimization (a sketch follows below) ⇒ non-linear cell activities s_i

How about learning the A_i?
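As a rough sketch of this numerical optimization (not the author's exact procedure), the snippet below maximizes the objective above by plain gradient ascent on s for a fixed basis A. The function name, the choice G(s_i) = -α log cosh(s_i), and all parameter values are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def map_estimate_s(A, image, sigma=0.5, alpha=1.0, n_iter=1000, lr=0.01):
    """Gradient ascent on log p(s | I) for a fixed (possibly overcomplete) basis.

    A      : (n_pixels, m) matrix whose columns are the basis vectors A_i
    image  : (n_pixels,) vectorized image patch I
    The sparse log-prior is taken to be G(s_i) = -alpha * log cosh(s_i).
    """
    s = np.zeros(A.shape[1])                       # start from the all-zero (sparsest) point
    for _ in range(n_iter):
        residual = image - A @ s                   # I - sum_i A_i s_i
        grad_recon = (A.T @ residual) / sigma**2   # gradient of the reconstruction term
        grad_prior = -alpha * np.tanh(s)           # gradient of sum_i G(s_i)
        s += lr * (grad_recon + grad_prior)
    return s

# toy usage: 128 features for a 64-pixel patch (overcomplete by a factor of 2)
rng = np.random.default_rng(0)
A = rng.standard_normal((64, 128))
A /= np.linalg.norm(A, axis=0)                 # unit-norm basis vectors
image = A[:, :5] @ rng.laplace(size=5)         # patch generated from a few active features
s_hat = map_estimate_s(A, image)
```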
Overcomplete basis: Basis estimation

Assume a flat prior for the A_i ⇒ the p(s | I) above is actually p(s, A | I).

Maximize the probability (likelihood) of A over the independent image samples I_1, I_2, ..., I_T:

\sum_{t=1}^{T} \log p(s(t), A \mid I_t) = -\frac{1}{2\sigma^2} \sum_{t=1}^{T} \sum_{x,y} \left( I_t(x,y) - \sum_{i=1}^{m} A_i(x,y) s_i(t) \right)^2 + \sum_{t=1}^{T} \sum_{i=1}^{m} G(s_i(t)) + const.

At the same time we obtain
- the basis vectors A_i
- the cell outputs s_i(t)
(see the alternating-update sketch below)
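One common way to carry out this joint estimation is to alternate between the two: a MAP step for the s_i(t) and a gradient step for the A_i. The sketch below assumes that interpretation (it is not necessarily the exact algorithm behind these slides) and reuses the hypothetical map_estimate_s function from the previous sketch.

```python
import numpy as np

def learn_basis(images, m, sigma=0.5, alpha=1.0, n_epochs=20, lr_A=0.02):
    """Alternating maximization of the joint objective above.

    images : (T, n_pixels) array of vectorized patches I_1, ..., I_T
    m      : number of basis vectors (m > n_pixels for an overcomplete basis)
    Returns the basis A (n_pixels, m) and the feature outputs S (T, m).
    """
    T, n_pixels = images.shape
    rng = np.random.default_rng(1)
    A = rng.standard_normal((n_pixels, m))
    A /= np.linalg.norm(A, axis=0)                 # keep basis vectors at unit norm
    S = np.zeros((T, m))
    for _ in range(n_epochs):
        for t in range(T):
            # step 1: MAP estimate of s(t) for the current basis (previous sketch)
            S[t] = map_estimate_s(A, images[t], sigma=sigma, alpha=alpha)
            # step 2: gradient step on A from the squared reconstruction error
            residual = images[t] - A @ S[t]
            A += lr_A * np.outer(residual, S[t]) / sigma**2
            A /= np.linalg.norm(A, axis=0)         # renormalize to fix the scale ambiguity
    return A, S
```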
Energy based models

Another approach:
- no generative model
- instead, relax ICA to add more linear feature detectors W_i
⇒ not a basis, but an overcomplete representation

In ICA we maximized

\log L(v_1, ..., v_m; z_1, ..., z_T) = T \log |\det(V)| + \sum_{i=1}^{m} \sum_{t=1}^{T} G_i(v_i^T z_t)

Recall: z_t corresponds to I_t, v_i to W_i, m = n, and G_i(u) = \log p_i(u).

If m > n, then \log |\det(V)| is not defined.
Energy based models: Estimation

\log |\det(V)| is actually a normalization constant. Replace it and instead maximize

\log L(v_1, ..., v_m; z_1, ..., z_T) = -T \log Z(V) + \sum_{i=1}^{m} \sum_{t=1}^{T} G_i(v_i^T z_t)

where

Z(V) = \int \prod_{i=1}^{m} \exp\!\left( G_i(v_i^T z) \right) dz.

This integral is extremely difficult to evaluate. However, it can be approximated, or the model can be estimated directly:
- score matching
- contrastive divergence
(a sketch of the score-matching objective follows below)
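As an illustration of the first option, the snippet below writes out the score-matching objective for this model, assuming G_i(u) = -log cosh(u) for every i (the per-feature scaling α_i of the next slide is omitted). The function and variable names are hypothetical, and minimizing this objective with a generic optimizer is only meant as a toy demonstration.

```python
import numpy as np

def score_matching_objective(V_flat, Z, n, m):
    """Score-matching objective for log p(z) = sum_i G(v_i^T z) - log Z(V),
    with G(u) = -log cosh(u).

    V_flat : flattened (m, n) matrix whose rows are the feature detectors v_i
    Z      : (T, n) matrix of (whitened) data vectors z_t
    """
    V = V_flat.reshape(m, n)
    Y = Z @ V.T                         # y_ti = v_i^T z_t
    th = np.tanh(Y)
    g = -th                             # g  = G'
    g_prime = th**2 - 1.0               # g' = G''
    # first term: sum_i ||v_i||^2 * g'(v_i^T z_t)  (divergence of the model score)
    row_norms_sq = np.sum(V**2, axis=1)
    term1 = g_prime @ row_norms_sq
    # second term: 0.5 * || sum_i v_i g(v_i^T z_t) ||^2  (squared model score)
    psi = g @ V
    term2 = 0.5 * np.sum(psi**2, axis=1)
    return np.mean(term1 + term2)

# toy usage with a generic optimizer:
# from scipy.optimize import minimize
# T, n, m = 200, 8, 16
# Z = np.random.default_rng(0).standard_normal((T, n))
# V0 = 0.1 * np.random.default_rng(1).standard_normal(m * n)
# res = minimize(score_matching_objective, V0, args=(Z, n, m), method="L-BFGS-B")
```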
Energy based models: Results

Estimated overcomplete representation with an energy based model:
- G_i(u) = α_i log cosh(u)
- estimation by score matching
- patches of 16 × 16 = 256 pixels
- preprocessing ⇒ n = 128
- m = 512 receptive fields

(Fig. 13.1: random sample of the estimated W_i.)
2. Lateral interaction and feedback
   - Feedback and Bayesian inference
   - End-stopping
   - Predictive coding
Motivation

So far we have considered "bottom-up" or feedforward frameworks. In reality there are also
- "top-down" connections ⇒ feedback
- lateral (horizontal) interactions

How to model these too? ⇒ using Bayesian inference!
Feedback as Bayesian inference: Contour integrator

Why feedback connections?
- to enhance responses consistent with the broader visual context
- to reduce noise (activity inconsistent with the model)
⇒ combine bottom-up sensory information with top-down priors

Example: contour cells and complex cells. Define the generative model

c_k = \sum_{i=1}^{m} a_{ki} s_i + n_k

where n_k is Gaussian noise.
Feedback as Bayesian inference: Contour integrator

c_k = \sum_{i=1}^{m} a_{ki} s_i + n_k

Now we model just the feedback. First calculate s for a given image:
1. compute c normally (feedforward)
2. find s = ŝ that maximizes \log p(s | c)
⇒ ŝ should be non-linear in c (why?)

Then reconstruct the complex cell outputs using the linear generative model, ignoring the noise:

\hat{c}_k = \sum_{i=1}^{m} a_{ki} \hat{s}_i

(for instance by sending a feedback signal u_k = \sum_{i=1}^{m} a_{ki} \hat{s}_i - c_k)
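A compact sketch of this bottom-up / top-down cycle is shown below. It assumes that the MAP estimate reduces to an elementwise nonlinearity f applied to the bottom-up input A^T c, which holds in the orthogonal case derived on the next slides; the function and variable names are illustrative.

```python
import numpy as np

def feedback_step(A, c, f):
    """One bottom-up / top-down cycle of the contour-integration model.

    A : (K, m) weight matrix with entries a_ki (contour unit i -> complex cell k)
    c : (K,) feedforward complex-cell outputs
    f : elementwise nonlinearity giving the MAP estimate from the bottom-up input A^T c
    """
    s_hat = f(A.T @ c)        # higher-order (contour-coding) activities s_hat_i
    c_hat = A @ s_hat         # reconstructed complex-cell outputs c_hat_k
    u = c_hat - c             # feedback signal sent back to the complex cells
    return s_hat, c_hat, u
```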
Feedback as Bayesian inference: Contour integrator example

Example results (Fig. 14.1):
- left: patches with random Gabor functions (three collinear in the upper case)
- middle: c_k
- right: ĉ_k (based on the contour-coding unit activities s_i)

⇒ noise reduction emphasizes collinear activations but suppresses others
Feedback as Bayesian inference: Higher-order activities

How to estimate the higher-order activities ŝ = argmax_s p(s | c)?

As before, using Bayes' rule we get

\log p(s \mid c) = \log p(c \mid s) + \log p(s) + const.

Again assume that \log p(s) is sparse. Analogously to the overcomplete basis case:

\log p(s \mid c) = -\frac{1}{2\sigma^2} \sum_{k=1}^{K} \left( c_k - \sum_{i=1}^{m} a_{ki} s_i \right)^2 + \sum_{i=1}^{m} G(s_i) + const.

Next assume A is invertible and orthogonal. Since A^T A = I, multiplying the residual c - As by A^T does not change its norm: \|c - As\| = \|A^T c - s\|. Therefore

\log p(s \mid c) = -\frac{1}{2\sigma^2} \sum_{i=1}^{m} \left( \sum_{k=1}^{K} a_{ki} c_k - s_i \right)^2 + \sum_{i=1}^{m} G(s_i) + const.
Feedback as Bayesian inference: Higher-order activities

Maximize separately for each i:

\log p(s_i \mid c) = -\frac{1}{2\sigma^2} \left( \sum_{k=1}^{K} a_{ki} c_k - s_i \right)^2 + G(s_i) + const.

The maximum point can be represented as

\hat{s}_i = f\!\left( \sum_{k=1}^{K} a_{ki} c_k \right)

where f depends on G = \log p(s_i).

For example, for a Laplacian distribution, f(y) = \operatorname{sign}(y)\,\max(|y| - \sqrt{2}\,\sigma^2,\, 0).
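The Laplacian case translates directly into a small shrinkage function; the noise level σ below is an arbitrary illustrative value.

```python
import numpy as np

def shrinkage_laplacian(y, sigma=0.3):
    """f(y) = sign(y) * max(|y| - sqrt(2) * sigma^2, 0): the MAP estimate of s_i
    under a Laplacian prior and Gaussian noise of variance sigma^2."""
    threshold = np.sqrt(2) * sigma**2
    return np.sign(y) * np.maximum(np.abs(y) - threshold, 0.0)

# small inputs are set exactly to zero, which is what makes s_hat sparse
print(shrinkage_laplacian(np.array([-0.5, -0.05, 0.0, 0.02, 0.8])))
```

Plugged in as f in the earlier feedback_step sketch, this soft-thresholding gives the non-linear contour-coding activities and, through ĉ = Aŝ, the denoised complex-cell outputs.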