The Center for Brains, Minds and Machines I-tutorial Learning of Invariant Representations in Sensory Cortex Tomaso Poggio CBMM McGovern Institute, BCS, LCSL, CSAIL MIT
I-theory Learning of Invariant Representations in Sensory Cortex 1. Intro and background 2. Mathematics of invariance 3. Biophysical mechanisms for tuning and pooling 4. Retina and V1: eccentricity dependent RFs; V2 and V4: pooling, crowding and clutter 5. IT: Class-specific approximate invariance and remarks
Class 24 Mon Dec 1 Learning Invariant Representations: Retina and V1: eccentricity dependent RFs; V2 and V4: pooling, crowding and clutter
Summary of previous class
Unsupervised tuning (during development) and eigenvectors of covariance matrix. Hebb synapses imply that the tuning of the neuron converges to the top eigenvector of the covariance matrix of the “frames” of the movie of objects transforming. The convergence follows the Oja flow. Different cells are exposed (during development) to translations in different directions.
Linking Conjecture • Predicts Gabor-like tuning of simple cells in V1 • Qualitatively predicts tuning in V2/V4 • Predicts/justifies mirror-symmetric tuning of cells in face patch AL
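A minimal sketch of the convergence claim above, assuming the standard discrete Oja update w += lr * y * (x - y * w) with y = w·x. The function name, learning rate, and the synthetic "movie" are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def oja_top_eigenvector(frames, lr=0.005, epochs=5, seed=0):
    """Run Oja's Hebbian rule over the rows of `frames` (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=frames.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(epochs):
        for x in frames[rng.permutation(len(frames))]:
            y = w @ x                      # neuron response to this frame
            w += lr * y * (x - y * w)      # Hebbian term with Oja normalization
    return w / np.linalg.norm(w)

# Synthetic zero-mean "frames" with one dominant direction of variance
# (an assumption standing in for the movie of a transforming object).
rng = np.random.default_rng(1)
direction = np.array([3.0, 1.0, 0.0, 0.0]) / np.sqrt(10.0)
frames = 2.0 * rng.normal(size=(2000, 1)) * direction
frames += 0.1 * rng.normal(size=(2000, 4))
frames -= frames.mean(axis=0)

w = oja_top_eigenvector(frames)

# Compare with the top eigenvector of the empirical covariance matrix.
cov = frames.T @ frames / len(frames)
eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
top = eigvecs[:, -1]
alignment = abs(w @ top)                   # 1.0 means same direction up to sign
```

The sign of the learned vector is arbitrary, which is why the comparison uses the absolute value of the dot product.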
Properties of the Ventral Stream The ventral stream hierarchy: V1, V2, V4, IT A gradual increase in the receptive field size, in the complexity of the preferred stimulus, in tolerance to position and scale changes Kobatake & Tanaka, 1994
End Summary
Note: we focus on the sampling layout of the retinal ganglion cells (RGCs), the outputs of the retina. (Also: focusing on the Parvo pathway, ignoring Magno.)
Receptive field size vs. eccentricity - HW Hubel and Wiesel, 1971
Scatter of receptive field sizes in V1 Schiller, P., Finlay, B., Volman, S. Quantitative Studies of Single Cells Properties in monkey striate cortex, 1976
Retina and V1: eccentricity dependent RFs - Inverted truncated pyramid - Fovea and foveola - Scale and position invariance
To have an invariant representation. Usual recipe: • memorize a set of images/objects called templates and, for each template, memorize its observed transformations • to generate an invariant signature: - compute dot products of the transformed templates with the image - compute a histogram of the resulting values
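The recipe above can be sketched as follows, assuming 1-D signals and circular shifts as the stored transformations (a simplification chosen so that the invariance is exact). The function name, bin count, and random data are illustrative assumptions.

```python
import numpy as np

def signature(image, templates, shifts, bins=8):
    """Histogram of dot products between an image and shifted templates."""
    image = image / np.linalg.norm(image)
    parts = []
    for t in templates:
        t = t / np.linalg.norm(t)
        # Dot products of the image with every stored transformation of t.
        dots = [image @ np.roll(t, s) for s in shifts]
        # Unit vectors give dot products in [-1, 1], so the bin range is fixed.
        hist, _ = np.histogram(dots, bins=bins, range=(-1.0, 1.0))
        parts.append(hist / len(shifts))
    return np.concatenate(parts)

rng = np.random.default_rng(0)
n = 16
templates = rng.normal(size=(3, n))   # hypothetical memorized templates
shifts = range(n)                     # all circular shifts of each template
image = rng.normal(size=n)

sig = signature(image, templates, shifts)
sig_shifted = signature(np.roll(image, 5), templates, shifts)
```

Shifting the image only permutes the set of dot products, so the histogram, and hence the signature, is unchanged.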
s-x space: definitions
Geometry of scaling
The magic window for pooling over scale and x-y shifts
Sampling in the window
Sampling in the window
Magic window in V1: 5 degrees total, 40x40 units; 25’ total, 40x40 units
“Prediction” of Anstis observation Anstis, 1974
V2 and V4: pooling, crowding and clutter - Why multilayer pooling - Decimating the array - Bouma’s law
Hierarchies of magic HW modules: key property is covariance l=4 l=3 l=2 HW module l=1
Why V1, V2, V4, IT? • Compositionality: signatures for wholes and for parts of different size at different locations • Minimizing clutter effects • Invariance for certain non-global affine transformations • Retina to V1 map
V2 and V4: pooling, crowding and clutter - Why multilayer pooling - Decimating the array - Bouma’s law
Top module: ∑ = signature vector. Associative memory
V1 and V2…
Magic theory “predicts” eccentricity dependence of M in V1, V2, V4, IT (from Freeman & Simoncelli 2011)
Predictions • Very small foveola ~25’ • Scale invariance more important than position invariance • Uniform scale invariance at “all” eccentricities • Shift invariance proportional to spatial frequency • Bouma’s law for crowding d= b x • Role of V2 (b=0.5)
Hierarchical network l=4 l=3 l=2 HW module l=1
Predictions • Very small foveola ~25’ • Scale invariance more important than position invariance • Uniform scale invariance at “all” eccentricities • Shift invariance proportional to spatial frequency • Crowding in the fovea: d=2’40” • Bouma’s law for peripheral crowding d = b x (role of V2: b=0.5)
Predictions on crowding The predictions are: a. Consider a small target, such as a 5’ wide letter, placed in the center of the fovea, activating the smallest simple cells at the bottom of the inverted pyramid. The smallest critical distance to avoid interference should be the size of a complex cell at the smallest scale, that is d=1’20” in V1 and d=2’40” in V2. If the letter is made larger, the activation of the simple cells shifts to a larger scale, and so does the critical spacing, which is proportional to the size of the target. It is remarkable that both these predictions match Figure 10 in Levi and Carney, 2011 quite well. b. Usually the target is just large enough to be visible at that eccentricity x (positive, say). The critical separation for avoiding crowding outside the foveola is d ~ b x, since the RF size of the complex cells increases linearly with eccentricity, with b depending on the cortical area responsible for the recognition signal. Thus the theory ``predicts'' Bouma's law (Bouma, 1970) of crowding!
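A small worked example of the two crowding predictions, assuming eccentricity and spacing are both measured in degrees and using b = 0.5, the value the slides associate with V2. The function and constant names are illustrative, not from the source.

```python
def arcmin(minutes, seconds=0.0):
    """Convert minutes and seconds of arc to arcminutes."""
    return minutes + seconds / 60.0

# Foveal critical distances quoted in the slides (size of a complex cell
# at the smallest scale), expressed in arcminutes.
FOVEAL_SPACING_ARCMIN = {
    "V1": arcmin(1, 20),   # d = 1'20"
    "V2": arcmin(2, 40),   # d = 2'40"
}

def critical_spacing_deg(eccentricity_deg, b=0.5):
    """Bouma's law outside the foveola: d ~ b * x."""
    return b * eccentricity_deg

# Critical spacing at a few peripheral eccentricities (degrees).
for x in (2.0, 5.0, 10.0):
    print(f"x = {x:4.1f} deg -> d ~ {critical_spacing_deg(x):.2f} deg")
```

With these numbers, the quoted V2 foveal spacing is twice the V1 value, and the peripheral spacing grows linearly with eccentricity.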
Collaborators (MIT-IIT, LCSL) in recent work F. Anselmi, J. Mutch, J. Leibo, L. Rosasco, A. Tacchetti, Q. Liao + Evangelopoulos, Zhang, Voinea Also: L. Isik, S. Ullman, S. Smale, C. Tan, M. Riesenhuber, T. Serre, G. Kreiman, S. Chikkerur, A. Wibisono, J. Bouvrie, M. Kouh, J. DiCarlo, C. Cadieu, S. Bileschi, L. Wolf, D. Ferster, I. Lampl, N. Logothetis, H. Buelthoff