Overcomplete models & Lateral interactions and Feedback
Teppo Niinimäki
April 22, 2010
Contents

1. Overcomplete models
   - Overcomplete basis
   - Energy based models

2. Lateral interaction and feedback
   - Feedback and Bayesian inference
   - End-stopping
   - Predictive coding
Motivation

So far:
- Sparse coding models: feature detector weights orthogonal
- Generative models: A invertible ⇒ square matrix ⇒ number of features ≤ number of dimensions in data ≤ number of pixels

Why more features?
- Processing is location independent ⇒ same set of features for every location
- Number of simple cells in V1 ≫ number of retinal ganglion cells (≈ 25 times)
Overcomplete basis: Generative model

Generative model with additive noise:

I(x,y) = \sum_{i=1}^{m} A_i(x,y) s_i + N(x,y)

- basis vectors: A_i
- features: s_i
- number of features: m > |I| (i.e., m > dimension of the data)
- N(x,y): Gaussian noise ⇒ simplifies computations
Overcomplete basis: Computation of features

I(x,y) = \sum_{i=1}^{m} A_i(x,y) s_i + N(x,y)

How to compute the coefficients s_i for a given I?
- A is not invertible: more unknowns than equations ⇒ infinitely many different solutions
- Find the sparsest solution (most s_i close to 0):
  - assume a sparse distribution for the s_i
  - find the most probable values for the s_i
Overcomplete basis: Computation of features

Aim: find s which maximizes p(s | I). By Bayes' rule,

p(s | I) = \frac{p(I | s)\, p(s)}{p(I)}

Ignore the constant p(I) and maximize the logarithm instead:

\log p(s | I) = \log p(I | s) + \log p(s) + const.

For the prior distribution p(s), assume sparsity and independence:

\log p(s) = \sum_{i=1}^{m} G(s_i)
Overcomplete basis: Computation of features

Next, compute \log p(I | s) in

\log p(s | I) = \log p(I | s) + \log p(s) + const.

The probability of I(x,y) given s is the Gaussian pdf of the noise

N(x,y) = I(x,y) - \sum_{i=1}^{m} A_i(x,y) s_i.

Inserting this into

p(N(x,y)) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{1}{2\sigma^2} N(x,y)^2 \right)

gives

\log p(I(x,y) \mid s) = -\frac{1}{2\sigma^2} \left( I(x,y) - \sum_{i=1}^{m} A_i(x,y) s_i \right)^2 - \frac{1}{2}\log(2\pi\sigma^2).
Overcomplete basis: Computation of features

Because the noise is independent across pixels, we can sum over x, y to get the log-pdf of the whole image:

\log p(I \mid s) = -\frac{1}{2\sigma^2} \sum_{x,y} \left( I(x,y) - \sum_{i=1}^{m} A_i(x,y) s_i \right)^2 - \frac{n}{2}\log(2\pi\sigma^2).

Combining the above: find the s that maximizes

\log p(s \mid I) = -\frac{1}{2\sigma^2} \sum_{x,y} \left( I(x,y) - \sum_{i=1}^{m} A_i(x,y) s_i \right)^2 + \sum_{i=1}^{m} G(s_i) + const.

⇒ numerical optimization (a sketch follows below) ⇒ non-linear cell activities s_i

How about learning the A_i?
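As a rough sketch of this numerical optimization (not the author's exact procedure), the snippet below maximizes the objective above by plain gradient ascent on s for a fixed basis A. The function name, the choice G(s_i) = -α log cosh(s_i), and all parameter values are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def map_estimate_s(A, image, sigma=0.5, alpha=1.0, n_iter=1000, lr=0.01):
    """Gradient ascent on log p(s | I) for a fixed (possibly overcomplete) basis.

    A      : (n_pixels, m) matrix whose columns are the basis vectors A_i
    image  : (n_pixels,) vectorized image patch I
    The sparse log-prior is taken to be G(s_i) = -alpha * log cosh(s_i).
    """
    s = np.zeros(A.shape[1])                       # start from the all-zero (sparsest) point
    for _ in range(n_iter):
        residual = image - A @ s                   # I - sum_i A_i s_i
        grad_recon = (A.T @ residual) / sigma**2   # gradient of the reconstruction term
        grad_prior = -alpha * np.tanh(s)           # gradient of sum_i G(s_i)
        s += lr * (grad_recon + grad_prior)
    return s

# toy usage: 128 features for a 64-pixel patch (overcomplete by a factor of 2)
rng = np.random.default_rng(0)
A = rng.standard_normal((64, 128))
A /= np.linalg.norm(A, axis=0)                 # unit-norm basis vectors
image = A[:, :5] @ rng.laplace(size=5)         # patch generated from a few active features
s_hat = map_estimate_s(A, image)
```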
Overcomplete basis: Basis estimation

Assume a flat prior for the A_i ⇒ the p(s | I) above is actually p(s, A | I).

Maximize the probability (likelihood) of A over the independent image samples I_1, I_2, ..., I_T:

\sum_{t=1}^{T} \log p(s(t), A \mid I_t) = -\frac{1}{2\sigma^2} \sum_{t=1}^{T} \sum_{x,y} \left( I_t(x,y) - \sum_{i=1}^{m} A_i(x,y) s_i(t) \right)^2 + \sum_{t=1}^{T} \sum_{i=1}^{m} G(s_i(t)) + const.

At the same time we obtain
- the basis vectors A_i
- the cell outputs s_i(t)
(see the alternating-update sketch below)
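One common way to carry out this joint estimation is to alternate between the two: a MAP step for the s_i(t) and a gradient step for the A_i. The sketch below assumes that interpretation (it is not necessarily the exact algorithm behind these slides) and reuses the hypothetical map_estimate_s function from the previous sketch.

```python
import numpy as np

def learn_basis(images, m, sigma=0.5, alpha=1.0, n_epochs=20, lr_A=0.02):
    """Alternating maximization of the joint objective above.

    images : (T, n_pixels) array of vectorized patches I_1, ..., I_T
    m      : number of basis vectors (m > n_pixels for an overcomplete basis)
    Returns the basis A (n_pixels, m) and the feature outputs S (T, m).
    """
    T, n_pixels = images.shape
    rng = np.random.default_rng(1)
    A = rng.standard_normal((n_pixels, m))
    A /= np.linalg.norm(A, axis=0)                 # keep basis vectors at unit norm
    S = np.zeros((T, m))
    for _ in range(n_epochs):
        for t in range(T):
            # step 1: MAP estimate of s(t) for the current basis (previous sketch)
            S[t] = map_estimate_s(A, images[t], sigma=sigma, alpha=alpha)
            # step 2: gradient step on A from the squared reconstruction error
            residual = images[t] - A @ S[t]
            A += lr_A * np.outer(residual, S[t]) / sigma**2
            A /= np.linalg.norm(A, axis=0)         # renormalize to fix the scale ambiguity
    return A, S
```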
Energy based models

Another approach:
- no generative model
- instead, relax ICA to add more linear feature detectors W_i
⇒ not a basis, but an overcomplete representation

In ICA we maximized

\log L(v_1, ..., v_m; z_1, ..., z_T) = T \log |\det(V)| + \sum_{i=1}^{m} \sum_{t=1}^{T} G_i(v_i^T z_t)

Recall: z_t corresponds to I_t, v_i to W_i, m = n, and G_i(u) = \log p_i(u).

If m > n, then \log |\det(V)| is not defined.
Energy based models: Estimation

\log |\det(V)| is actually a normalization constant. Replace it and instead maximize

\log L(v_1, ..., v_m; z_1, ..., z_T) = -T \log Z(V) + \sum_{i=1}^{m} \sum_{t=1}^{T} G_i(v_i^T z_t)

where

Z(V) = \int \prod_{i=1}^{m} \exp\!\left( G_i(v_i^T z) \right) dz.

This integral is extremely difficult to evaluate. However, it can be approximated, or the model can be estimated directly:
- score matching
- contrastive divergence
(a sketch of the score-matching objective follows below)
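As an illustration of the first option, the snippet below writes out the score-matching objective for this model, assuming G_i(u) = -log cosh(u) for every i (the per-feature scaling α_i of the next slide is omitted). The function and variable names are hypothetical, and minimizing this objective with a generic optimizer is only meant as a toy demonstration.

```python
import numpy as np

def score_matching_objective(V_flat, Z, n, m):
    """Score-matching objective for log p(z) = sum_i G(v_i^T z) - log Z(V),
    with G(u) = -log cosh(u).

    V_flat : flattened (m, n) matrix whose rows are the feature detectors v_i
    Z      : (T, n) matrix of (whitened) data vectors z_t
    """
    V = V_flat.reshape(m, n)
    Y = Z @ V.T                         # y_ti = v_i^T z_t
    th = np.tanh(Y)
    g = -th                             # g  = G'
    g_prime = th**2 - 1.0               # g' = G''
    # first term: sum_i ||v_i||^2 * g'(v_i^T z_t)  (divergence of the model score)
    row_norms_sq = np.sum(V**2, axis=1)
    term1 = g_prime @ row_norms_sq
    # second term: 0.5 * || sum_i v_i g(v_i^T z_t) ||^2  (squared model score)
    psi = g @ V
    term2 = 0.5 * np.sum(psi**2, axis=1)
    return np.mean(term1 + term2)

# toy usage with a generic optimizer:
# from scipy.optimize import minimize
# T, n, m = 200, 8, 16
# Z = np.random.default_rng(0).standard_normal((T, n))
# V0 = 0.1 * np.random.default_rng(1).standard_normal(m * n)
# res = minimize(score_matching_objective, V0, args=(Z, n, m), method="L-BFGS-B")
```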
Energy based models: Results

Estimated overcomplete representation with an energy based model:
- G_i(u) = α_i log cosh(u)
- estimation by score matching
- patches of 16 × 16 = 256 pixels
- preprocessing ⇒ n = 128
- m = 512 receptive fields

(Fig. 13.1: random sample of the estimated W_i.)
2. Lateral interaction and feedback
   - Feedback and Bayesian inference
   - End-stopping
   - Predictive coding
Motivation

So far we have considered "bottom-up" or feedforward frameworks. In reality there are also
- "top-down" connections ⇒ feedback
- lateral (horizontal) interactions

How to model these too? ⇒ using Bayesian inference!
Feedback as Bayesian inference: Contour integrator

Why feedback connections?
- to enhance responses consistent with the broader visual context
- to reduce noise (activity inconsistent with the model)
⇒ combine bottom-up sensory information with top-down priors

Example: contour cells and complex cells. Define the generative model

c_k = \sum_{i=1}^{m} a_{ki} s_i + n_k

where n_k is Gaussian noise.
Feedback as Bayesian inference: Contour integrator

c_k = \sum_{i=1}^{m} a_{ki} s_i + n_k

Now we model just the feedback. First calculate s for a given image:
1. compute c normally (feedforward)
2. find s = ŝ that maximizes \log p(s | c)
⇒ ŝ should be non-linear in c (why?)

Then reconstruct the complex cell outputs using the linear generative model, ignoring the noise:

\hat{c}_k = \sum_{i=1}^{m} a_{ki} \hat{s}_i

(for instance by sending a feedback signal u_k = \sum_{i=1}^{m} a_{ki} \hat{s}_i - c_k)
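A compact sketch of this bottom-up / top-down cycle is shown below. It assumes that the MAP estimate reduces to an elementwise nonlinearity f applied to the bottom-up input A^T c, which holds in the orthogonal case derived on the next slides; the function and variable names are illustrative.

```python
import numpy as np

def feedback_step(A, c, f):
    """One bottom-up / top-down cycle of the contour-integration model.

    A : (K, m) weight matrix with entries a_ki (contour unit i -> complex cell k)
    c : (K,) feedforward complex-cell outputs
    f : elementwise nonlinearity giving the MAP estimate from the bottom-up input A^T c
    """
    s_hat = f(A.T @ c)        # higher-order (contour-coding) activities s_hat_i
    c_hat = A @ s_hat         # reconstructed complex-cell outputs c_hat_k
    u = c_hat - c             # feedback signal sent back to the complex cells
    return s_hat, c_hat, u
```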
Feedback as Bayesian inference: Contour integrator example

Example results (Fig. 14.1):
- left: patches with random Gabor functions (three collinear in the upper case)
- middle: c_k
- right: ĉ_k (based on the contour-coding unit activities s_i)

⇒ noise reduction emphasizes collinear activations but suppresses others
Feedback as Bayesian inference: Higher-order activities

How to estimate the higher-order activities ŝ = argmax_s p(s | c)?

As before, using Bayes' rule we get

\log p(s \mid c) = \log p(c \mid s) + \log p(s) + const.

Again assume that \log p(s) is sparse. Analogously to the overcomplete basis case:

\log p(s \mid c) = -\frac{1}{2\sigma^2} \sum_{k=1}^{K} \left( c_k - \sum_{i=1}^{m} a_{ki} s_i \right)^2 + \sum_{i=1}^{m} G(s_i) + const.

Next assume A is invertible and orthogonal. Since A^T A = I, multiplying the residual c - As by A^T does not change its norm: \|c - As\| = \|A^T c - s\|. Therefore

\log p(s \mid c) = -\frac{1}{2\sigma^2} \sum_{i=1}^{m} \left( \sum_{k=1}^{K} a_{ki} c_k - s_i \right)^2 + \sum_{i=1}^{m} G(s_i) + const.
Feedback as Bayesian inference: Higher-order activities

Maximize separately for each i:

\log p(s_i \mid c) = -\frac{1}{2\sigma^2} \left( \sum_{k=1}^{K} a_{ki} c_k - s_i \right)^2 + G(s_i) + const.

The maximum point can be represented as

\hat{s}_i = f\!\left( \sum_{k=1}^{K} a_{ki} c_k \right)

where f depends on G = \log p(s_i).

For example, for a Laplacian distribution, f(y) = \operatorname{sign}(y)\,\max(|y| - \sqrt{2}\,\sigma^2,\, 0).
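The Laplacian case translates directly into a small shrinkage function; the noise level σ below is an arbitrary illustrative value.

```python
import numpy as np

def shrinkage_laplacian(y, sigma=0.3):
    """f(y) = sign(y) * max(|y| - sqrt(2) * sigma^2, 0): the MAP estimate of s_i
    under a Laplacian prior and Gaussian noise of variance sigma^2."""
    threshold = np.sqrt(2) * sigma**2
    return np.sign(y) * np.maximum(np.abs(y) - threshold, 0.0)

# small inputs are set exactly to zero, which is what makes s_hat sparse
print(shrinkage_laplacian(np.array([-0.5, -0.05, 0.0, 0.02, 0.8])))
```

Plugged in as f in the earlier feedback_step sketch, this soft-thresholding gives the non-linear contour-coding activities and, through ĉ = Aŝ, the denoised complex-cell outputs.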