object recognition and hierarchical computation
play

Object recognition and hierarchical computation Challenges in - PDF document

Object recognition and hierarchical computation Challenges in object recognition. Fukushimas Neocognitron View-based representations of objects Poggios HMAX Forward and Feedback in visual hierarchy


  1. Object recognition and hierarchical computation • Challenges in object recognition. • Fukushima’s Neocognitron • View-based representations of objects • Poggio’s HMAX • Forward and Feedback in visual hierarchy • Hierarchical Bayes 15-883 Computational models of neural systems. Visual system lecture 2. Tai Sing Lee. 15-883 Computational models of neural systems. Visual system lecture 2. Tai Sing Lee. 1

  2. Selfridge’s Pandemonium Model (1959) for object recognition or cognitive demons or cognitive demons Challenges in object recognition Challenges in object recognition • • What are the computational logic and concerns? What are the computational logic and concerns? 2

  3. Fukushima’s Neocognitron (1980) A hierarchical multi-layered neural network for visual pattern recognition. Generalization (invariance) vs discrimination (specificity) Generalizing simple-complex cells: alternating S and C layers: S Generalizing simple-complex cells: alternating S and C layers: S - - feature detectors (e.g. simple cells) detect conjunction feature detectors (e.g. simple cells) detect conjunction of features of features (AND, specificity), C - (AND, specificity), C - invariance pooling (e.g. complex cells) (OR, invariance pooling (e.g. complex cells) (OR, MAX, invariance). MAX, invariance). 3

  4. Gradual specificity and invariance S2 C2 C2 S2 S3 S3 AND AND OR OR Conjunction of the C2 feature detectors C2 feature detectors Conjunction of the Hierarchical Features Local features gradually integrated into more global features. With C stage performing `blurring’, the next S stage detects more global features even if the components are deformed and shifted. Not tolerated. Not tolerated. 4

  5. Character recognition Revolt against 3D model • Psychophysical experiments (Buelthoff 1992; Gauthier 1997) and neurophysiological experiment (Logothetis 1995) provided strong support for the view-based representation of objects. • Tested novel object in a particular view, people and monkeys tend to recognize the novel object only within +/- 40 degrees in rotation. • Neurons also exhibit Gaussian tuning curves with peaks 40-50 degrees apart. 5

  6. 6

  7. Basic facts • IT -- neurons coded for object, input to prefrontal cortex. • IT neurons are relatively scale and position invariance, but view-point and lighting dependence. • View-point interpolation between different object views to achieve view-point invariance object recognition. • Learning specific to an individual novel object is not required to be scale and translation invariant. • Recognition can be very fast. 8/seconds, possibly mediated by a feed-forward model of ventral stream processing. Basic motivation for the HMAX model • Generalizing simple cell to complex cell relationship -- invariance to changes in the position of an optimal stimulus (within a range) is obtained in the model by means of a maximum operation (max) performed on the simple cell inputs to the complex cells, where the strongest input determines the cell's output. This preserves feature specificity. • The model alternates layers of units combining simple filters into more complex ones - to increase pattern selectivity with layers based on the max operation - to build invariance to position and scale while preserving pattern selectivity. • The RBF (Radial Basis Function) -like learning network learns a specific task based on a set of cells tuned to example views. 7

  8. Hierarchical model of object recognition (HMAX) Riesenhuber & Poggio, Nature Neuroscience , 1999 Hierarchical model and X (HMAX) Supervised training with one Supervised training with one hidden layer hidden layer 256 units, each max of all 256 units, each max of all scales all positions scales all positions 4 4 neighboring (non-overlapping) neighboring (non-overlapping) C1 units form a conjunctive S2 form a conjunctive S2 C1 units unit - 256 types in each unit - 256 types in each band. band. Spatial pooling (4-12 depending on Spatial pooling (4-12 depending on input RF size), 4 scales band. input RF size), 4 scales band. 4 orientations, 2 pixel apart, 4 orientations, 2 pixel apart, 7x7 to 29x29 scales, 7x7 to 29x29 scales, normalized. normalized. 128x128 input 128x128 input http://maxlab.neuro.georgetown.edu/hmax.html#c2 8

  9. Logothetis, , Pauls Pauls and and Poggio Poggio Current Biology 1995. Current Biology 1995. Logothetis 9

  10. Why does it work at all? • S2 computes conjunction of oriented features within a neighborhood to create 256 types of higher order features for each scale band and at every position. • C2 pools over all positions and scale bands, thus is a bag of features. • This bag of feature approach, also popular in computer vision at the time, apparently is sufficient for discriminating many objects. WHY? • Because there are a lot of features, some of the features are rather unique and discriminative. It is unlikely two objects tested will share the same set of conjunctive features. Highly nonlinear integration of component parts 10

  11. Nonlinear MAX operation approximates physiology better But what does feedback do? • Attention -- feature and spatial attentional selection. • Interactive activation -McClelland and Rumelhart, Adaptive resonance - Grossberg, bringing in global structures and context to influence the interpretation of low-level interpretation and pattern completion. • A modern view: generative model and hierarchical Bayes. 11

  12. A modern view: generative model and hierarchical Bayesian inference. Analysis by Synthesis Analysis by Synthesis Mumford Mumford (1992) (1992) 12

  13. Multi-scale analysis across visual areas. MT receptive field MT receptive field Increase in RF size and complexity of encoded features along the visual hierarchy. Increase in RF size and complexity of encoded features along the visual hierarchy. Lee and Mumford’s conjecture High-resolution buffer hypothesis of V1 V1 is not simply a filter-bank for extracting features, but is a unique high-resolution buffer that is used by higher visual cortex for performing any visual inference that requires spatial precision and high- resolution feature details. Lee, Mumford, Romero and Lamme (1998), Lee and Mumford (2003) 13

  14. The interplay of priors, context and memory in visual inference. The need to see depth and segregate surfaces is so strong that we can hallucinate surface and contours at locations where there is no visual evidence for them. Marr (1981) 14

  15. Spatial response to the illusory contour is found at precisely the expected location at V1 Lee and Nguyen 2001 PNAS. Temporal Response to the Illusory Contours at 100 msec, 40 msec later than V2 15

  16. Shape from shading with simple priors with factor graph and BBP and higher order priors Potetz CVPR (2007). Texton/token primal sketch Zhu and Mumford Mumford (2997) (2997) Zhu and 16

  17. Dictionary of visual primitives One implementation of Primal Sketch One implementation of Primal Sketch 17

  18. Segmentation as simplifying parsing Input Segmentation Synthesis from model I ~ p ( I | W*) Hidden variables describe segments and their texture, allowing both slow and abrupt intensity and texture changes. Hierarchical Generative models Hierarchical Generative models Zhu and Zhu and Mumford Mumford (2007) (2007) Quest for a stochastic grammar for vision Quest for a stochastic grammar for vision 18

  19. Parsing graph Parsing graph in response to object in an image in response to object in an image Generation of clock images by hallucination Generation of clock images by hallucination 19

  20. Summary • Invariance and specificity in object recognition. • Fukushima’s Neocognitron • View-based representations of objects • Poggio’s HMAX • Forward and Feedback in visual hierarchy • Feedback for generating explanation for images • Hierarchical Bayes and generative models • High-resolution buffer hypothesis of V1 Readings • Riesenhuber, M. & Poggio, T. (1999). Hierarchical Models of Object Recognition in Cortex. Nature Neuroscience 2: 1019- 1025. • Lee, T.S., Mumford, D. (2003) Hierarchical Bayesian inference in the visual cortex. Journal of Optical Society of America, A. . 20(7): 1434-1448. 20

Recommend


More recommend