From old HMAX to present HMAX (a special case of full i-theory)
How the new version of the model evolved from the original one
1. The two key operations: Operations for selectivity and invariance, originally computed in a simplified and idealized form (i.e., a multivariate Gaussian and an exact max, see Section 2), have been replaced by more plausible operations: a normalized dot-product and a softmax.
2. S1 and C1 layers: In [Serre and Riesenhuber, 2004] we found that the S1 and C1 units in the original model were too broadly tuned to orientation and spatial frequency and revised these units accordingly. In particular, at the S1 level we replaced Gaussian derivatives with Gabor filters to better fit parafoveal simple cells' tuning properties. We also modified both S1 and C1 receptive field sizes.
3. S2 layers: They are now learned from natural images. S2 units are more complex than the old ones (simple 2 × 2 combinations of orientations). The introduction of learning, we believe, has been the key factor for the model to achieve a high level of performance on natural images, see [Serre et al., 2002].
4. C2 layers: Their receptive field sizes, as well as their range of invariance to scale and position, have been decreased so that C2 units now better fit V4 data.
5. S3 and C3 layers: They were recently added and constitute the top-most layers of the model along with the S2b and C2b units (see Section 2 and above). The tuning of the S3 units is also learned from natural images.
6. S2b and C2b layers: We added those two layers to account for the bypass route (that projects directly from V1/V2 to PIT, thus bypassing V4 [see Nakamura et al., 1993]).
Serre & Riesenhuber 2004
1. Problem of visual recognition, visual cortex
2. Historical background
3. Neurons and areas in the visual system
4. Feedforward hierarchical models
5. Beyond hierarchical models
Vision: what is where
• Human Brain
  – 10^10-10^11 neurons (~1 million flies)
  – 10^14-10^15 synapses
• Neuron
  – Fundamental space dimensions: fine dendrites: 0.1 µm diameter; lipid bilayer membrane: 5 nm thick; specific proteins: pumps, channels, receptors, enzymes
  – Fundamental time length: 1 msec
• Ventral stream in rhesus monkey
  – ~10^9 neurons in the ventral stream (350 × 10^6 in each hemisphere)
  – ~15 × 10^6 neurons in AIT (Anterior InferoTemporal) cortex
Van Essen & Anderson, 1990
Vision: what is where
Source: Lennie, Maunsell, Movshon
The Ventral Stream
The ventral stream hierarchy: V1, V2, V4, IT
A gradual increase in receptive field size, in the complexity of the preferred stimulus, and in tolerance to position and scale changes
Kobatake & Tanaka, 1994
(Thorpe and Fabre-Thorpe, 2001)
V1: hierarchy of simple and complex cells
[Figure: LGN-type cells, simple cells, complex cells] (Hubel & Wiesel 1959)
Recognition in the Ventral Stream: "classical model"
*Modified from (Gross, 1998)
Riesenhuber & Poggio 1999, 2000; Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005; Serre Oliva Poggio 2007
[software available online with CNS (for GPUs)]
A “feedforward” version of the problem: rapid categorization (RSVP)
Biederman 1972; Potter 1975; Thorpe et al 1996
Two key computations, suggested by physiology
Unit types | Pooling   | Computation                      | Operation
Simple     | [diagram] | Selectivity / template matching  | Gaussian tuning / AND-like
Complex    | [diagram] | Invariance                       | Soft-max / OR-like
Gaussian tuning
• Gaussian tuning in V1 for orientation (Hubel & Wiesel 1958)
• Gaussian tuning in IT around 3D views (Logothetis Pauls & Poggio 1995)
Max-like operation
• Max-like behavior in V4 (Gawne & Martin 2002)
• Max-like behavior in V1 (Lampl Ferster Poggio & Riesenhuber 2004; see also Finn Priebe & Ferster 2007)
Two operations (~OR, ~AND): disjunctions of conjunctions
• Tuning operation (Gaussian-like, AND-like): $y = e^{-|x - w|^2}$ or $y \sim \frac{x \cdot w}{|x|}$
• Simple units
• Max-like operation (OR-like)
• Complex units
[Figure labels: Stage 1, Stage 2, Stage 3]
Each operation ~ microcircuits of ~100 neurons
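The two operations on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the model's actual code: the function names are hypothetical, inputs are plain lists of afferent activities, and the complex unit is assumed to pool over simple units that share one template w.

```python
import math

def gaussian_tuning(x, w):
    """AND-like tuning y = exp(-|x - w|^2): equals 1 only when x matches template w."""
    sq_dist = sum((xi - wi) ** 2 for xi, wi in zip(x, w))
    return math.exp(-sq_dist)

def normalized_dot(x, w):
    """Alternative tuning y = (x . w) / |x|; returns 0 for an all-zero input (assumption)."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    if norm == 0.0:
        return 0.0
    return sum(xi * wi for xi, wi in zip(x, w)) / norm

def complex_unit(inputs, w):
    """OR-like pooling: max over simple units with the same template w,
    each looking at a different input (e.g. a different position)."""
    return max(gaussian_tuning(x, w) for x in inputs)

template = [1.0, 2.0]
print(gaussian_tuning([1.0, 2.0], template))   # perfect match
print(gaussian_tuning([0.0, 2.0], template))   # one afferent off by 1
print(complex_unit([[0.0, 0.0], [1.0, 2.0], [5.0, 5.0]], template))
```

The complex unit responds fully as long as any one of its simple-unit inputs matches the template, which is the sense in which the hierarchy computes a disjunction of conjunctions.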
Plausible biophysical implementations
• Max and Gaussian-like tuning can both be approximated with the same canonical circuit using shunting inhibition. Tuning (e.g. the “center” of the Gaussian) corresponds to synaptic weights.
(Knoblich Koch Poggio in prep; Kouh & Poggio 2007; Knoblich Bouvrie Poggio 2007)
Recognition in Visual Cortex: circuits and biophysics
A canonical microcircuit of spiking neurons?
[Figure labels: Stage 1, Stage 2]
A plausible biophysical implementation for both Gaussian tuning (~AND) + max (~OR): normalization circuits with divisive inhibition (Kouh, Poggio, 2008; also RP, 1999; Heeger, Carandini, Simoncelli, …)
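One way to see how a single divisive-normalization circuit could yield both operations is the sketch below. The functional form, a weighted sum of powers divided by a pooled power term, and the exponents p, q, r and constant k are assumptions chosen for illustration; the slide itself gives no formula. Inputs are assumed to be non-negative firing rates.

```python
def canonical_circuit(x, w, p, q, r, k=1e-9):
    """Divisive normalization: sum_j w_j * x_j**p / (k + (sum_j x_j**q)**r).

    x: non-negative afferent activities; w: synaptic weights.
    p, q, r, k are illustrative parameters (assumptions, not from the slide).
    """
    numerator = sum(wj * xj ** p for xj, wj in zip(x, w))
    pooled = sum(xj ** q for xj in x) ** r
    return numerator / (k + pooled)

# Tuning regime: p=1, q=2, r=0.5 gives a normalized dot product (w . x) / |x|.
tuned = canonical_circuit([3.0, 4.0], [3.0, 4.0], p=1, q=2, r=0.5)

# Max-like regime: uniform weights, p=q+1, r=1 gives a soft-max of the inputs.
soft = canonical_circuit([0.1, 0.9], [1.0, 1.0], p=3, q=2, r=1)
sharper = canonical_circuit([0.1, 0.9], [1.0, 1.0], p=11, q=10, r=1)
print(tuned, soft, sharper)
```

With larger exponents the second regime approaches the exact maximum of the inputs, so the same circuit moves between AND-like tuning and OR-like pooling by changing only its parameters.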
Basic circuit is closely related to other models
• Can be implemented by shunting inhibition (Grossberg 1973, Reichardt et al. 1983, Carandini and Heeger, 1994) and spike threshold variability (Anderson et al. 2000, Miller and Troyer, 2002)
• Adelson and Bergen (see also Hassenstein and Reichardt, 1956)
• Of the same form as model of MT (Rust et al., Nature Neuroscience, 2007)
Simulation with spiking neurons and realistic synapses
Recognition in Visual Cortex: circuits and biophysics
[Figure labels: Stage 1, Stage 2]
A plausible biophysical implementation of a Gaussian-like tuning (Kouh, Poggio, 2008): normalized dot product $\frac{w \cdot x}{|x|}$
S1 units
• Gabor filters
• Parameters fit to V1 data (Serre & Riesenhuber 2004)
• 17 spatial frequencies (= scales)
• 4 orientations
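A Gabor filter of the kind used for S1 units can be generated as follows. This is a sketch only: the size, wavelength, sigma and aspect ratio are illustrative values rather than the fitted parameters, and the four orientations are assumed to be 0, 45, 90 and 135 degrees.

```python
import math

def gabor_filter(size, wavelength, sigma, theta, gamma=0.3):
    """Return a size x size Gabor kernel (size must be odd), zero-mean and unit-norm.

    G(x, y) = exp(-(x0^2 + gamma^2 * y0^2) / (2 * sigma^2)) * cos(2 * pi * x0 / wavelength)
    where (x0, y0) are the coordinates rotated by theta.
    """
    half = size // 2
    kernel = []
    for row in range(-half, half + 1):
        line = []
        for col in range(-half, half + 1):
            x0 = col * math.cos(theta) + row * math.sin(theta)
            y0 = -col * math.sin(theta) + row * math.cos(theta)
            envelope = math.exp(-(x0 ** 2 + (gamma * y0) ** 2) / (2.0 * sigma ** 2))
            line.append(envelope * math.cos(2.0 * math.pi * x0 / wavelength))
        kernel.append(line)
    # Remove the mean and normalize so filters at different scales are comparable.
    mean = sum(sum(line) for line in kernel) / (size * size)
    kernel = [[v - mean for v in line] for line in kernel]
    norm = math.sqrt(sum(v * v for line in kernel for v in line))
    return [[v / norm for v in line] for line in kernel]

# One scale, four orientations (illustrative parameter values).
orientations = [k * math.pi / 4 for k in range(4)]
bank = [gabor_filter(size=7, wavelength=3.5, sigma=2.8, theta=th) for th in orientations]
print(len(bank), len(bank[0]), len(bank[0][0]))
```

A full S1 layer would repeat this for each of the 17 scales, with size, wavelength and sigma growing together.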
C1 units Increase in tolerance to position (and in RF size)
C1 units Increase in tolerance to scale
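The gain in position and scale tolerance at C1 can be sketched as a max over a local neighborhood of S1 responses and over adjacent scales. The function name, pooling size and stride below are hypothetical, and the S1 maps for the scales in a band are assumed to share one orientation and one grid size.

```python
def c1_pool(s1_maps, pool_size, stride):
    """Max-pool S1 responses over adjacent scales, then over local positions.

    s1_maps: list of equally sized 2D lists, one per scale in the band.
    Returns a smaller 2D list of C1 responses.
    """
    height = len(s1_maps[0])
    width = len(s1_maps[0][0])
    # Tolerance to scale: keep the strongest response across scales at each position.
    across_scales = [[max(m[r][c] for m in s1_maps) for c in range(width)]
                     for r in range(height)]
    # Tolerance to position: keep the strongest response in each local window.
    pooled = []
    for r in range(0, height - pool_size + 1, stride):
        row = []
        for c in range(0, width - pool_size + 1, stride):
            row.append(max(across_scales[rr][cc]
                           for rr in range(r, r + pool_size)
                           for cc in range(c, c + pool_size)))
        pooled.append(row)
    return pooled

a = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
b = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]  # same feature, shifted
coarse = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 2]]
print(c1_pool([a], 2, 2), c1_pool([b], 2, 2), c1_pool([a, coarse], 2, 2))
```

The first two calls give the same C1 output even though the S1 peak moved by one position, and the third shows a response at a second scale surviving the pooling.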
Serre & Riesenhuber 2004
S2 units
• Features of moderate complexity (n ~ 1,000 types)
• Combination of V1-like complex units at different orientations
• Synaptic weights w learned from natural images
• 5-10 subunits chosen at random from all possible afferents (~100-1,000)
S2 units
[Figure labels: homogeneous fields; cross-orientation fields; stronger facilitation; stronger suppression]
Nature Neuroscience - 10, 1313 - 1321 (2007) / Published online: 16 September 2007 | doi:10.1038/nn1975 Neurons in monkey visual area V2 encode combinations of orientations Akiyuki Anzai, Xinmiao Peng & David C Van Essen
Recognition in Visual Cortex: learning
(from Serre, 2007)
Recognition in Visual Cortex: learning
• Task-specific circuits (from IT to PFC?)
  - Supervised learning: ~ classifier
• Overcomplete dictionary of “templates” ~ image “patches” ~ “parts” is learned during an unsupervised learning stage (from ~10,000 natural images) by tuning S units.
see also (Foldiak 1991; Perrett et al 1984; Wallis & Rolls, 1997; Lewicki and Olshausen, 1999; Einhauser et al 2002; Wiskott & Sejnowski 2002; Spratling 2005)
Start with S2 layer
Units are organized in n feature maps
Database: ~1,000 natural images
At each iteration:
• Present one image
• Learn k feature maps
Start with S2 layer
• Pick 1 unit from the first map at random
• Store in the unit's synaptic weights the precise pattern of subunit activity, i.e. w = x
• Image “moves” (looming and shifting)
• Weight vector w is copied to all units in feature map 1 (across positions and scales)
[Figure labels: w_1, S2, C1]
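The imprinting step described here (store w = x at one randomly chosen unit, then share w across the whole feature map) can be sketched as below. Several simplifications are assumptions: C1 activity is a single 2D grid of scalars at one scale, weight sharing is shown across positions only, the response is a cosine-style normalized dot product so that the imprinted location gives exactly 1, and all function names are hypothetical. The k retained subunits mirror the random subunit selection described for S2 units.

```python
import math
import random

def patch_at(c1, row, col, n):
    """Flatten the n x n window of C1 activity whose top-left corner is (row, col)."""
    return [c1[r][c] for r in range(row, row + n) for c in range(col, col + n)]

def imprint(c1, n, k, rng):
    """Pick one location at random and store its activity as weights (w = x),
    keeping only k randomly chosen subunits of the n x n afferents."""
    row = rng.randrange(len(c1) - n + 1)
    col = rng.randrange(len(c1[0]) - n + 1)
    subunits = sorted(rng.sample(range(n * n), k))
    x = patch_at(c1, row, col, n)
    return [x[i] for i in subunits], subunits, (row, col)

def feature_map(c1, w, subunits, n):
    """Copy w to every position: each S2 unit computes a normalized dot product."""
    w_norm = math.sqrt(sum(v * v for v in w))
    out = []
    for row in range(len(c1) - n + 1):
        line = []
        for col in range(len(c1[0]) - n + 1):
            patch = patch_at(c1, row, col, n)
            x = [patch[i] for i in subunits]
            x_norm = math.sqrt(sum(v * v for v in x))
            dot = sum(a * b for a, b in zip(x, w))
            line.append(dot / (x_norm * w_norm) if x_norm > 0 and w_norm > 0 else 0.0)
        out.append(line)
    return out

rng = random.Random(0)                      # fixed seed for repeatability
c1 = [[rng.random() for _ in range(8)] for _ in range(8)]   # stand-in C1 activity
w, subunits, (row, col) = imprint(c1, n=4, k=6, rng=rng)
s2 = feature_map(c1, w, subunits, n=4)
print(len(w), len(s2), len(s2[0]))
```

Repeating `imprint` on different images would fill the remaining feature maps, one template per map.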
Recognition in Visual Cortex: learning
S2 units
• Features of moderate complexity (n ~ 1,000 types)
• Combination of V1-like complex units at different orientations
• Synaptic weights w learned from natural images
• 5-10 subunits chosen at random from all possible afferents (~100-1,000)
[Figure labels: stronger facilitation; stronger suppression]
Recognition in Visual Cortex: learning
Sample S2 Units Learned (from Serre, 2007)
Comparison w/ V4
Tuning for curvature and boundary conformations?
Pasupathy & Connor 2001
Recognition in Visual Cortex: learning
C2 units
• Same selectivity as S2 units but increased tolerance to position and size of preferred stimulus
• Local pooling over S2 units with same selectivity but different positions and scales
Cerebral Cortex Advance Access published online on June 19, 2006 A Comparative Study of Shape Representation in Macaque Visual Areas V2 and V4 Jay Hegdé and David C. Van Essen
Recognition in Visual Cortex: learning
Beyond C2 units
• Units increasingly complex and invariant
• S3/C3 units:
  • Combination of V4-like units with different selectivities
  • Dictionary of ~1,000 features = num. columns in IT (Fujita 1992)
A loose hierarchy
• Bypass routes along with main routes:
  • From V2 to TEO (bypassing V4) (Morel & Bullier 1990; Baizer et al 1991; Distler et al 1991; Weller & Steele 1992; Nakamura et al 1993; Buffalo et al 2005)
  • From V4 to TE (bypassing TEO) (Desimone et al 1980; Saleem et al 1992)
• “Replication” of simpler selectivities from lower to higher areas
• Rich dictionary of features, across areas, with various levels of selectivity and invariance
Model: testable at different levels
The most recent version of this straightforward class of models is consistent with many data at different levels, from the computational to the biophysical level.
Being testable across all these levels is a high bar and an important one (it is too easy to develop models that explain one phenomenon or one area or one illusion... these models overfit the data, they are not scientific)
Recognition in Visual Cortex: model accounts for physiology + psychophysics
Hierarchical Feedforward Models: are consistent with or predict neural data
V1:
• Simple and complex cells tuning (Schiller et al 1976; Hubel & Wiesel 1965; Devalois et al 1982)
• MAX-like operation in subset of complex cells (Lampl et al 2004)
V2:
• Subunits and their tuning (Anzai, Peng, Van Essen 2007)
V4:
• Tuning for two-bar stimuli (Reynolds Chelazzi & Desimone 1999)
• MAX-like operation (Gawne et al 2002)
• Two-spot interaction (Freiwald et al 2005)
• Tuning for boundary conformation (Pasupathy & Connor 2001, Cadieu, Kouh, Connor et al., 2007)
• Tuning for Cartesian and non-Cartesian gratings (Gallant et al 1996)
IT:
• Tuning and invariance properties (Logothetis et al 1995, paperclip objects)
• Differential role of IT and PFC in categorization (Freedman et al 2001, 2002, 2003)
• Read out results (Hung Kreiman Poggio & DiCarlo 2005)
• Pseudo-average effect in IT (Zoccolan Cox & DiCarlo 2005; Zoccolan Kouh Poggio & DiCarlo 2007)
Human:
• Rapid categorization (Serre Oliva Poggio 2007)
• Face processing (fMRI + psychophysics) (Riesenhuber et al 2004; Jiang et al 2006)
Recognition in Visual Cortex: model accounts for psychophysics