Where-What Network 3 (WWN-3): Developmental Top-Down Attention for Multiple Foregrounds and Complex Backgrounds Matthew Luciw www.cse.msu.edu/~luciwmat Juyang Weng www.cse.msu.edu/~weng Embodied Intelligence Lab www.cse.msu.edu/ei
Attention with Multiple Contexts • What’s the foreground? • “Find the person in a pink shirt and red hat.” • “Find the person in a brown shirt and gray hat.”
General Purpose Attention and Recognition • Major Issues • Complex backgrounds • Binding problem • Lack of constraints • Chicken-and-egg problem • Recognition requires segmentation • Segmentation requires recognition • Remains an open problem
Problems with Many Attention-Recognition Methods • Not utilizing top-down feedback simultaneously with bottom-up at each layer (as in biological visual systems) to deal with multiple contexts, learn better features, and handle border ownership and transparency • Not developmental: using pre-selected rules to find interest points (e.g., corner detection), detecting and recognizing only pre-selected objects, and pre-selected architectures (e.g., number of layers and neurons)
Some Other Approaches • Feature Integration Theory (Treisman 1980): a master map for location? • Saliency-based (Itti et al. 1998): feature types pre-selected • Bottom-up: traditional; top-down: gain-tuning (Backer et al. 2001) • Shift circuits (Anderson & Van Essen 1987, Olshausen et al. 1993): how were they developed? • SIFT: requires a pre-selected rule for interest points • Top-down in connectionist models: • Visual search and label-based top-down (Deco & Rolls, 2004): no top-down in training • Selective tuning (Tsotsos et al. 1995): inhibitory top-down • ARTSCAN (Fazl & Grossberg, 2007): excitatory top-down, form-fitting “attentional shroud”; potential difficulty with complex backgrounds
Visual System: Rich Bidirectional Connectivity • Cortical area connectivity, e.g., as seen in Felleman and Van Essen’s study (1993)… • But… this seems too complicated to model?
Evidence that Areas are Developed from Statistics • (1): Orientation-selective neurons: internal representation • (2): Blakemore and Cooper: representation is experience dependent • (3): M. Sur: input-driven self-organization and feature development • Suggests functional representation is not hardcoded, but developed
Consider a Single Area with Bottom-Up and Top-Down • V: bottom-up weight matrix • M: top-down weight matrix • For a single neuron: the response combines its bottom-up and top-down matches
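A minimal sketch of the single-neuron pre-response, assuming the normalized inner-product form suggested by the variable definitions on the “Response of a Layer” slide (the exact equation appears only in the slide figure and may differ):

```latex
% Sketch only: x = bottom-up input, a = top-down input,
% v_i, m_i = bottom-up / top-down weight vectors of neuron i,
% rho = relative influence of bottom-up vs. top-down.
z_i = \rho\,\frac{\mathbf{v}_i \cdot \mathbf{x}}{\lVert\mathbf{v}_i\rVert\,\lVert\mathbf{x}\rVert}
    + (1-\rho)\,\frac{\mathbf{m}_i \cdot \mathbf{a}}{\lVert\mathbf{m}_i\rVert\,\lVert\mathbf{a}\rVert}
```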
Multilayer Bidirectional Where-What Networks
WWN-3 • Two-way information flow in both training and testing • Different information-flow parameterizations allow different attention modes (i.e., what-imposed, where-imposed) • No hardcoded rules for interest points or features: each area learns through Lobe Component Analysis
Each Layer: Lobe Component Analysis • LCA incrementally approximates the joint distribution of bottom-up + top-down input, in a dually optimal way • LCA is used for learning the bottom-up and top-down weights in each area • Weng & Zheng, WCCI, 2006 • Weng & Luciw, TAMD, 2009
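As a rough illustration of the LCA idea (winner neurons incrementally pull their weight vectors toward the response-weighted input), here is a minimal NumPy sketch. The function name, the top-k competition, and the simplified learning-rate floor standing in for LCA's amnesic mean are assumptions for illustration, not the authors' implementation (see Weng & Luciw, TAMD, 2009 for the exact algorithm).

```python
import numpy as np

def lca_update(weights, ages, x, k=1, eta_floor=0.02):
    """Sketch of one incremental LCA step (illustrative, not the exact WWN code).

    weights : (n_neurons, dim) lobe component vectors
    ages    : (n_neurons,) firing-age counters
    x       : (dim,) input vector (e.g., concatenated bottom-up + top-down)
    k       : number of winners allowed to fire (top-k lateral inhibition)
    """
    # Pre-responses: cosine-style match between input and each weight vector.
    x_n = x / (np.linalg.norm(x) + 1e-12)
    w_n = weights / (np.linalg.norm(weights, axis=1, keepdims=True) + 1e-12)
    z = w_n @ x_n

    # Top-k competition: only the k best-matching neurons fire and learn.
    winners = np.argsort(z)[-k:]
    for i in winners:
        ages[i] += 1
        # Simplified stand-in for the amnesic-mean learning rate:
        # roughly 1/age early on, with a floor so adaptation never stops.
        eta = max(1.0 / ages[i], eta_floor)
        # Hebbian-like pull toward the response-weighted input.
        weights[i] = (1.0 - eta) * weights[i] + eta * z[i] * x
    return weights, ages
```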
Learned Prototypes • (Above): example training images from 5 classes, with 3 rotation variations in depth • Location and type are imposed at the motors • (Right): response-weighted input of a slice of V4, showing bottom-up sensitivities • The current object representation pathway is limited
Learned Features in IT and PP • (a): IT learned type-specific features (here: duck) while allowing location variation; response-weighted input of 4 single neurons shown • (b): PP learned location-specific features while allowing type variation • (Panel): IT spatial representation • These effects are enabled by top-down connections in training
Response of a Layer • V: bottom-up weights • M: top-down weights • f: lateral inhibition (approximation); k: number of nonzero firing units • ρ (rho): relative influence of bottom-up vs. top-down • g: activation function, e.g., sigmoid, tanh, or linear
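Combining the variables above, a minimal sketch of one area's response might look like the following; the function names, the normalization, and the default parameter values are illustrative assumptions, not the slides' exact formulation.

```python
import numpy as np

def top_k(z, k):
    """Approximate lateral inhibition f: keep the k strongest responses,
    zero out the rest (winner-take-most competition)."""
    out = np.zeros_like(z)
    idx = np.argsort(z)[-k:]
    out[idx] = z[idx]
    return out

def layer_response(V, M, x, a, rho=0.5, k=10, g=np.tanh):
    """Sketch of a single area's response (illustrative).

    V   : (n, d_bu) bottom-up weights        x : (d_bu,) bottom-up input
    M   : (n, d_td) top-down weights         a : (d_td,) top-down input
    rho : relative influence of bottom-up vs. top-down
    k   : number of neurons allowed to fire (nonzero firing units)
    g   : activation function (e.g., sigmoid, tanh, linear)
    """
    def cos_match(W, v):
        # Normalized inner product between each weight vector and the input.
        W_n = W / (np.linalg.norm(W, axis=1, keepdims=True) + 1e-12)
        v_n = v / (np.linalg.norm(v) + 1e-12)
        return W_n @ v_n

    z = rho * cos_match(V, x) + (1.0 - rho) * cos_match(M, a)
    return g(top_k(z, k))
```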
WWN Operates Over Multiple Contexts
Type “cat” imposed at the motor: “Find the Cat” • At V4: top-down from IT (PP has a low weight in search tasks) • (Right): bottom-up response • (Below): top-k (40) integration of bottom-up and top-down • Top-k (4): to IT and PP
Type “pig” imposed at the motor: “Find the Pig” • At V4: top-down from IT • (Right): bottom-up response • (Below): top-k (40) integration of bottom-up and top-down • Top-k (4): to IT and PP
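The two search examples above rely on mode-dependent top-down weighting: in a “what”-imposed search, the type context reaches V4 through IT while PP is given a low weight. Below is a hedged sketch of that combination; the weight values, the linear mixing, and the function name are assumptions for illustration, not the exact scheme on the slides.

```python
import numpy as np

def v4_topdown_context(a_it, a_pp, M_it, M_pp, w_it=1.0, w_pp=0.1):
    """Illustrative sketch: top-down context vector arriving at V4.

    a_it, a_pp : firing vectors of IT (what pathway) and PP (where pathway)
    M_it, M_pp : top-down weight matrices projecting IT / PP activity to V4
                 (shape: V4 dimension x IT / PP dimension)
    w_it, w_pp : pathway weights; in a search task ("find the cat"),
                 PP gets a low weight (w_pp small), as stated on the slides.
    """
    td = w_it * (M_it @ a_it) + w_pp * (M_pp @ a_pp)
    norm = np.linalg.norm(td)
    return td / norm if norm > 0 else td
```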
Attentional Context
Performance Over Learning • Disjoint views used in testing
Performance with Multiple Objects
Future: Multimodal SASE • The SASE (self-aware and self-effecting) architecture describes a highly recurrent, multi-sensor, multi-effector brain model. Multi-sensor and multi-effector integration are achieved through developmental learning.
Conclusions • Novel methods for utilizing top-down excitatory connections in multilayer Hebbian networks • Top-down connections in WWN-3 • Top-down attention and recognition without a master map or internal “canonical views” (combination neurons) • Multilayer synchronization • Top-down context switching based on an internal idea or an external percept • Hopefully contributes to the foundations of online learning based on cortex-inspired methods
Thank You • Questions
Future: Synaptic Neuromodulation • Background has high variation, foreground has low variation • Automatic receptive field learning for larger recognition hierarchies (e.g., V1 <-> V2 <-> V4)