

  1. Where-What Network 3 (WWN-3): Developmental Top-Down Attention for Multiple Foregrounds and Complex Backgrounds
     Matthew Luciw (www.cse.msu.edu/~luciwmat), Juyang Weng (www.cse.msu.edu/~weng)
     Embodied Intelligence Lab, www.cse.msu.edu/ei

  2. Attention with Multiple Contexts
     • What's the foreground?
     • "Find the person in a pink shirt and red hat."
     • "Find the person in a brown shirt and gray hat."

  3. General-Purpose Attention and Recognition
     • Major issues:
       • Complex backgrounds
       • Binding problem
       • Lack of constraints
     • Chicken-and-egg problem:
       • Recognition requires segmentation
       • Segmentation requires recognition
     • Remains an open problem

  4. Problems with Many Attention-Recognition Methods
     • Not utilizing top-down feedback simultaneously with bottom-up, at each layer
       • As in biological visual systems
       • Dealing with multiple contexts
       • Learning better features
       • Border ownership and transparency
     • Not developmental
       • Using pre-selected rules to find interest points (e.g., corner detection)
       • Detecting and recognizing only pre-selected objects
       • Pre-selected architecture (e.g., number of layers and neurons)

  5. Some Other Approaches
     • Feature Integration Theory (Treisman, 1980): a master map for location?
     • Saliency-based (Itti et al., 1998): feature types are pre-selected
     • Bottom-up: traditional; top-down: gain tuning (Backer et al., 2001)
     • Shifter circuits (Anderson & Van Essen, 1987; Olshausen et al., 1993): how were they developed?
     • SIFT: requires a pre-selected rule for interest points
     • Top-down in connectionist models
       • Visual search and label-based top-down (Deco & Rolls, 2004): no top-down in training
       • Selective tuning (Tsotsos et al., 1995), using inhibitory top-down
       • ARTSCAN (Fazl & Grossberg, 2007): excitatory top-down, form-fitting "attentional shroud"; potential difficulty with complex backgrounds

  6. Visual System: Rich Bidirectional Connectivity
     • Cortical area connectivity, e.g., as seen in Felleman and Van Essen's study (1991)
     • But... this seems too complicated to model?

  7. Evidence that Areas Are Developed from Statistics
     • (1) Orientation-selective neurons: internal representation
     • (2) Blakemore and Cooper: representation is experience-dependent
     • (3) M. Sur: input-driven self-organization and feature development
     • Suggests functional representation is not hardcoded, but developed

  8. Consider a Single Area with Bottom-Up and Top-Down Input
     • V: bottom-up weight matrix
     • M: top-down weight matrix
     • For a single neuron:
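The per-neuron equation on this slide did not survive the export. The following is a plausible reconstruction, assuming the variable definitions of slide 14 (V bottom-up weights, M top-down weights, rho the relative influence of bottom-up versus top-down, g the activation function), not the slide's exact formula:

    \hat{z}_i = g\left( \rho \, \frac{\mathbf{v}_i \cdot \mathbf{b}}{\|\mathbf{v}_i\| \, \|\mathbf{b}\|} \;+\; (1 - \rho) \, \frac{\mathbf{m}_i \cdot \mathbf{t}}{\|\mathbf{m}_i\| \, \|\mathbf{t}\|} \right)

where b and t are the bottom-up and top-down input vectors, and v_i, m_i are neuron i's bottom-up and top-down weight vectors.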

  9. Multilayer Bidirectional Where-What Networks

  10. WWN-3
     • Two-way information flow in both training and testing
     • Different information-flow parameterizations allow different attention modes (i.e., what-imposed, where-imposed)
     • No hardcoded rules for interest points or features: each area learns through Lobe Component Analysis

  11. Each Layer: Lobe Component Analysis (LCA)
     • LCA incrementally approximates the joint distribution of the bottom-up and top-down inputs, in a dually optimal way
     • LCA is used to learn both the bottom-up and top-down weights in each area
     • Weng & Zheng, WCCI, 2006; Weng & Luciw, TAMD, 2009
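A minimal sketch of one incremental LCA step, assuming a simplified amnesic-mean learning-rate schedule and top-k winner-take-all; the function and parameter names (lca_update, ages, amnesic) are illustrative, not taken from the cited papers:

import numpy as np

def lca_update(W, ages, x, k=1, amnesic=2.0):
    """One incremental LCA step for one area.
    W: (n_neurons, dim) weights over the concatenated bottom-up + top-down input.
    ages: per-neuron firing ages. x: current input vector."""
    # Pre-responses: normalized inner products (cosine similarity).
    xn = x / (np.linalg.norm(x) + 1e-12)
    Wn = W / (np.linalg.norm(W, axis=1, keepdims=True) + 1e-12)
    z = Wn @ xn
    # Lateral inhibition approximated by top-k winner-take-all.
    for i in np.argsort(z)[-k:]:
        ages[i] += 1
        # Amnesic mean: the rate decays with firing age but stays bounded away
        # from zero, balancing stability (averaging) with plasticity.
        lr = min(1.0, (1.0 + amnesic) / ages[i])
        # Hebbian, response-weighted update of the winner's weights.
        W[i] = (1.0 - lr) * W[i] + lr * z[i] * x
    return W, ages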

  12. Learned Prototypes
     • (Above) Example training images, from 5 classes with 3 rotation variations in depth
     • Location and type are imposed at the motors
     • (Right) Response-weighted input of a slice of V4: shows bottom-up sensitivities
     • The current object-representation pathway is limited

  13. Learned Features in IT and PP
     • (a) IT learned type-specific features (here: duck) that allow location variation; response-weighted input of 4 single neurons is shown
     • (b) PP learned location-specific features that allow type variation
     • These effects are enabled by top-down connections during training

  14. Response of a Layer
     • V: bottom-up weights
     • M: top-down weights
     • f: lateral inhibition (approximation); k: number of nonzero (firing) neurons
     • rho: relative influence of bottom-up versus top-down
     • g: activation function, e.g., sigmoid, tanh, linear
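A minimal sketch of this layer response, assuming cosine-normalized pre-responses and tanh for g; the exact normalization and activation used in WWN-3 may differ:

import numpy as np

def layer_response(V, M, b, t, rho=0.5, k=40, g=np.tanh):
    """V: bottom-up weights, M: top-down weights,
    b: bottom-up input vector, t: top-down input vector."""
    def cos(W, x):
        return (W @ x) / (np.linalg.norm(W, axis=1) * np.linalg.norm(x) + 1e-12)
    # Integrate bottom-up and top-down flows, weighted by rho.
    pre = rho * cos(V, b) + (1.0 - rho) * cos(M, t)
    # f: lateral inhibition approximated by top-k winner-take-all.
    z = np.zeros_like(pre)
    winners = np.argsort(pre)[-k:]
    z[winners] = g(pre[winners])
    return z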

  15. WWN Operates Over Multiple Contexts

  16. "Find the Cat": Type (cat) Imposed Top-Down at V4
     • Top-down input arrives from IT (PP has a low weight in search tasks)
     • Figure panels: bottom-up response; integration of bottom-up and top-down; top-k (k = 40) firing; top-k (k = 4) output to IT and PP

  17. "Find the Pig": Type (pig) Imposed Top-Down at V4
     • Top-down input arrives from IT
     • Figure panels: bottom-up response; integration of bottom-up and top-down; top-k (k = 40) firing; top-k (k = 4) output to IT and PP
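A minimal sketch of this what-imposed search mode, reusing the layer_response sketch from slide 14; names such as find_object, V4_V, V4_M_from_IT and the single-pass update are illustrative assumptions (the actual network iterates bidirectional updates across V4, IT and PP):

import numpy as np

def find_object(V4_V, V4_M_from_IT, image_features, type_index, n_types,
                rho=0.5, k=40):
    """Impose a type (e.g., "cat") at the type motor and attend top-down."""
    # The imposed type becomes a one-hot top-down context via IT; PP is given
    # a low weight in search tasks, so it is omitted in this sketch.
    top_down = np.zeros(n_types)
    top_down[type_index] = 1.0
    # Integrate bottom-up evidence with the imposed top-down context in V4.
    z = layer_response(V4_V, V4_M_from_IT, image_features, top_down,
                       rho=rho, k=k)
    # The top-k firing V4 neurons then drive IT (type confirmation) and PP,
    # which reports where the requested object is.
    return z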

  18. Attentional Context

  19. Performance Over Learning
     • Disjoint views used in testing

  20. Performance with Multiple Objects

  21. Future: Multimodal SASE
     • The SASE (self-aware and self-effecting) architecture describes a highly recurrent, multi-sensor, multi-effector brain; multi-sensory and multi-effector integration are achieved through developmental learning.

  22. Conclusions
     • Novel methods for utilizing top-down excitatory connections in multilayer Hebbian networks
       • Top-down connections in WWN-3
     • Top-down attention and recognition without a master map or internal "canonical views" (combination neurons)
     • Multilayer synchronization
     • Top-down context switching based on an internal idea or an external percept
     • Hopefully contributes to the foundations of online learning with cortex-inspired methods

  23. Thank You
     • Questions

  24. Future: Synaptic Neuromodulation
     • Background has high variation; foreground has low variation
     • Automatic receptive-field learning for larger recognition hierarchies (e.g., V1 <-> V2 <-> V4)
