
NIPS07 tutorial (preliminary) Visual Recognition in Primates and Machines



  1. NIPS’07 tutorial (preliminary) Visual Recognition in Primates and Machines Tomaso Poggio (with Thomas Serre) McGovern Institute for Brain Research Center for Biological and Computational Learning Department of Brain & Cognitive Sciences Massachusetts Institute of Technology Cambridge, MA 02139 USA

  2. Motivation for studying vision: trying to understand how the brain works • Old dream of all philosophers and more recently of AI: – understand how the brain works – make intelligent machines

  3. This tutorial: using a class of models to summarize/interpret experimental results • Models are cartoons of reality, e.g., Bohr’s model of the hydrogen atom • All models are “wrong” • Some models can be useful summaries of data and some can be a good starting point for more complete theories

  4. 1. Problem of visual recognition, visual cortex 2. Historical background 3. Neurons and areas in the visual system 4. Data and feedforward hierarchical models 5. What is next?

  5. The problem: recognition in natural images (e.g., “is there an animal in the image?”)

  6. How does visual cortex solve this problem? How can computers solve this problem? dorsal stream: “where” ventral stream: “what” Desimone & Ungerleider 1989

  7. A “feedforward” version of the problem: rapid categorization SHOW RSVP MOVIE Movie courtesy of Jim DiCarlo Biederman 1972; Potter 1975; Thorpe et al 1996

  8. A model of the ventral stream which is also an algorithm *Modified from (Gross, 1998) Riesenhuber & Poggio 1999, 2000; Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005; [software available online] Serre Oliva Poggio 2007

  9. … solves the problem (if mask forces feedforward processing) … Model: 82% • Human observers (n = 24): 80% • d’ ~ standardized error rate • the higher the d’, the better the performance. Serre Oliva & Poggio 2007
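
The slide does not say how d’ is computed. A common signal-detection definition, assumed here, is d’ = z(hit rate) - z(false-alarm rate), where z is the inverse of the standard normal CDF. The sketch below uses that definition; the hit and false-alarm rates are hypothetical values chosen only for illustration and are not taken from the tutorial.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Both rates must lie strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal
    return z(hit_rate) - z(false_alarm_rate)


# Hypothetical rates for an "animal present / absent" task (assumptions).
chance = d_prime(0.50, 0.50)  # hits no more likely than false alarms
better = d_prime(0.85, 0.20)
best = d_prime(0.95, 0.05)
```

With these illustrative numbers d’ is zero at chance and grows as hits go up and false alarms go down, which is the sense in which a higher d’ means better performance.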

  10. 1. Problem of visual recognition, visual cortex 2. Historical background 3. Neurons and areas in the visual system 4. Data and feedforward hierarchical models 5. What is next?

  11. Object recognition for computer vision: personal historical perspective
      Tasks: Face detection; Face identification; Car detection; Pedestrian detection; Multi-class / multi-objects; Digit recognition
      Kanade 1974
      1990: Turk & Pentland 1991; Brunelli & Poggio 1993; Sung & Poggio 1994; Beymer & Poggio 1995
      1995: Perona and colleagues 1996-now; Osuna & Girosi 1997*; LeCun et al. 1998; Schneiderman & Kanade 1998; Rowley Baluja & Kanade 1998; Mohan Papageorgiou & Poggio 1999; Amit and Geman 1999
      2000: Schneiderman & Kanade 2000; Viola & Jones 2001; Belongie & Malik 2002; Agarwal & Roth 2002; Ullman et al 2002; Fergus et al 2003; Torralba et al 2004 …
      … Many more excellent algorithms in the past few years …
      *Best CVPR’07 paper 10 yrs ago

  12. Examples: Learning Object Detection: Finding Frontal Faces • Training Database • 1000+ Real, 3000+ VIRTUAL • 50,000+ Non-Face Patterns Sung & Poggio 1995

  13. ~10 year old CBCL computer vision work: pedestrian detection system in Mercedes test car now becoming a product (MobilEye)

  14. Object recognition in cortex: Historical perspective
      Areas: V1 cat; V1 monkey; Extrastriate cortex; IT-STS
      Hubel & Wiesel 1959
      1960: Hubel & Wiesel 1962; Hubel & Wiesel 1965; Gross et al 1969
      1970: Zeki 1973; Hubel & Wiesel 1977
      1980: Ungerleider & Mishkin 1982; Perrett Rolls et al 1982; Schwartz et al 1983; Desimone et al 1984
      1990: Schiller & Lee 1991; Kobatake & Tanaka 1994; Logothetis et al 1995
      … Much progress in the past 10 yrs

  15. Some personal history: First step in developing a model: learning to recognize 3D objects in IT cortex Examples of Visual Stimuli Poggio & Edelman 1990

  16. An idea for a module for view-invariant identification • Architecture that accounts for invariances to 3D effects (>1 view needed to learn!) • Regularization Network (GRBF) with Gaussian kernels • Prediction: neurons become view-tuned through learning • Figure labels: VIEW-INVARIANT, OBJECT-SPECIFIC UNIT; View Angle. Poggio & Edelman 1990
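
The module on this slide can be illustrated with a toy sketch: Gaussian units centered on a few stored views of one object (view-tuned units) feed a unit that sums them (view-invariant, object-specific unit). This is a simplification under stated assumptions, not the Poggio & Edelman network itself: the 3D point sets, the training angles, `sigma`, and the equal output weights are all hypothetical, whereas a full regularization network would learn its output weights from examples.

```python
import math


def project(points, angle_deg):
    """Rotate 3D points about the vertical (y) axis, then project
    orthographically onto the image plane, giving a flat feature vector."""
    a = math.radians(angle_deg)
    view = []
    for x, y, z in points:
        view.extend([x * math.cos(a) + z * math.sin(a), y])
    return view


def view_tuned(view, stored_view, sigma=1.0):
    """Gaussian kernel: maximal response when the input equals the stored view."""
    d2 = sum((v - s) ** 2 for v, s in zip(view, stored_view))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def view_invariant(view, stored_views, sigma=1.0):
    """Sum of view-tuned units with equal weights (a simplifying assumption)."""
    return sum(view_tuned(view, s, sigma) for s in stored_views)


# Hypothetical wire-like objects given as 3D vertex lists.
TARGET = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.5), (-0.5, -1.0, 1.0), (0.3, 0.4, -1.0)]
DISTRACTOR = [(-1.0, 1.0, 0.0), (1.0, -1.0, 0.5), (0.5, 1.0, -1.0), (-0.3, -0.4, 1.0)]

# More than one training view is stored, as the slide requires.
TRAIN_ANGLES = [0, 30, 60, 90]
stored = [project(TARGET, a) for a in TRAIN_ANGLES]

# Responses to a view angle that was never stored (45 degrees).
novel_target = view_invariant(project(TARGET, 45), stored)
novel_distractor = view_invariant(project(DISTRACTOR, 45), stored)
```

In this toy setting each Gaussian unit responds maximally to its own stored view and less to other views, while the summed unit responds much more strongly to an unseen view of the target than to the distractor.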

  17. Learning to Recognize 3D Objects in IT Cortex Examples of Visual Stimuli After human psychophysics (Buelthoff, Edelman, Tarr, Sinha, …), which supports models based on view-tuned units... … physiology! Logothetis Pauls & Poggio 1995

  18. Recording Sites in Anterior IT …neurons tuned to faces are intermingled nearby…. (Figure labels: LUN, LAT, STS, IOS, AMTS; Ho=0) Logothetis, Pauls & Poggio 1995

  19. Neurons tuned to object views as predicted by model Logothetis Pauls & Poggio 1995

  20. A “View-Tuned” IT Cell (Figure: responses to target views from -168° to 168° and to distractors; scale bars: 60 spikes/sec, 800 msec) Logothetis Pauls & Poggio 1995

  21. But also view-invariant object-specific neurons (5 of them over 1000 recordings) Logothetis Pauls & Poggio 1995

  22. View-tuned cells: scale invariance (one training view only) motivates present model. Scale Invariant Responses of an IT Neuron (Figure: eight panels for stimulus sizes 1.0 deg (x 0.4), 1.75 deg (x 0.7), 2.5 deg (x 1.0), 3.25 deg (x 1.3), 4.0 deg (x 1.6), 4.75 deg (x 1.9), 5.5 deg (x 2.2), 6.25 deg (x 2.5); axes: Spikes/sec, 0 to 76, against Time (msec), 0 to 3000) Logothetis Pauls & Poggio 1995

  23. From “HMAX” to the model now … Riesenhuber & Poggio 1999, 2000; Serre Kouh Cadieu Knoblich Kreiman & Poggio 2005; Serre Oliva Poggio 2007

  24. 1. Problem of visual recognition, visual cortex 2. Historical background 3. Neurons and areas in the visual system 4. Data and feedforward hierarchical models 5. What is next?

  25. Neural Circuits Source: Modified from Jody Culham’s web slides

  26. Neuron basics • INPUT = pulses or graded potentials • COMPUTATION = analog • OUTPUT = chemical (figure label: spikes)

  27. Some numbers • Human Brain – 10¹¹ - 10¹² neurons (1 million flies ☺) – 10¹⁴ - 10¹⁵ synapses • Neuron – Fundamental space dimension: • fine dendrites: 0.1 µm diameter; lipid bilayer membrane: 5 nm thick; specific proteins: pumps, channels, receptors, enzymes – Fundamental time length: 1 msec

  28. The cerebral cortex
                                         Human                      Macaque
      Thickness                          3 – 4 mm                   1 – 2 mm
      Total surface area (both sides)    ~1600 cm² (~50 cm diam)    ~160 cm² (~15 cm diam)
      Neurons / mm²                      ~10⁵                       ~10⁵
      Total cortical neurons             ~2 x 10¹⁰                  ~2 x 10⁹
      Visual cortex                      300 – 500 cm²              80+ cm²
      Visual neurons                     ~4 x 10⁹                   ~10⁹
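
The neuron counts on this slide follow, to the rounding used there, from surface area times surface density. The short check below makes that arithmetic explicit; the function and variable names are illustrative only, and the inputs are the approximate figures quoted on the slide.

```python
MM2_PER_CM2 = 100        # 1 cm² = 100 mm²
DENSITY_PER_MM2 = 1e5    # ~10^5 neurons per mm² of cortical surface (slide figure)


def neurons(area_cm2, density_per_mm2=DENSITY_PER_MM2):
    """Approximate neuron count = surface area x neurons per unit surface."""
    return area_cm2 * MM2_PER_CM2 * density_per_mm2


human_total = neurons(1600)       # slide rounds this to ~2 x 10^10
macaque_total = neurons(160)      # slide rounds this to ~2 x 10^9
human_visual_low = neurons(300)   # lower end of 300 - 500 cm²
human_visual_high = neurons(500)  # upper end; slide quotes ~4 x 10^9
macaque_visual = neurons(80)      # slide quotes ~10^9 for 80+ cm²
```

The products come out at 1.6 x 10^10 and 1.6 x 10^9 total cortical neurons, 3 to 5 x 10^9 human visual neurons, and 8 x 10^8 macaque visual neurons, all consistent with the rounded values in the table.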

  29. Gross Brain Anatomy A large percentage of the cortex devoted to vision

  30. The Visual System [Van Essen & Anderson, 1990]

  31. V1: hierarchy of simple and complex cells: LGN-type cells → Simple cells → Complex cells (Hubel & Wiesel 1959)

  32. V1: Orientation selectivity Hubel & Wiesel movie

  33. V1: Retinotopy

  34. (Thorpe and Fabre-Thorpe, 2001)

  35. Beyond V1: A gradual increase in RF size Reproduced from [Kobatake & Tanaka, 1994] Reproduced from [Rolls, 2004]

  36. Beyond V1: A gradual increase in the complexity of the preferred stimulus Reproduced from (Kobatake & Tanaka, 1994)

  37. AIT: Face cells Reproduced from (Desimone et al. 1984)

  38. AIT: Immediate recognition (categorization, identification) Hung Kreiman Poggio & DiCarlo 2005 See also Oram & Perrett 1992; Tovee et al 1993; Celebrini et al 1993; Ringach et al 1997; Rolls et al 1999; Keysers et al 2001

  39. 1. Problem of visual recognition, visual cortex 2. Historical background 3. Neurons and areas in the visual system 4. Data and feedforward hierarchical models 5. What is next?

  40. The ventral stream Source: Lennie, Maunsell, Movshon

  41. We consider feedforward architecture only (Thorpe and Fabre-Thorpe, 2001)

  42. Our present model of the ventral stream: feedforward, accounting only for “immediate recognition” • It is in the family of “Hubel-Wiesel” models (Hubel & Wiesel, 1959; Fukushima, 1980; Oram & Perrett, 1993; Wallis & Rolls, 1997; Riesenhuber & Poggio, 1999; Thorpe, 2002; Ullman et al., 2002; Mel, 1997; Wersing and Koerner, 2003; LeCun et al 1998; Amit & Mascaro 2003; Deco & Rolls 2006…) • As a biological model of object recognition in the ventral stream it is perhaps the most quantitative and faithful to known biology (though many details/facts are unknown or still to be incorporated)

  43. Two key computations
      Unit types    Pooling      Computation                        Operation
      Simple        (diagram)    Selectivity / template matching    Gaussian-tuning / and-like
      Complex       (diagram)    Invariance                         Soft-max / or-like

  44. • Gaussian-like tuning operation (and-like) • Simple units • Max-like operation (or-like) • Complex units
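
The two operations named on these slides can be sketched in a few lines. This is a minimal toy, not the released model software: the 1D signals, the template, `sigma`, and the exponent `q` are hypothetical, and the particular soft-max formula (an exponentially weighted average) is an assumption chosen for illustration, since the slides do not specify one.

```python
import math


def simple_unit(inputs, template, sigma=0.5):
    """Gaussian-like tuning (and-like): the response is high only when
    every input matches the stored template."""
    d2 = sum((x - w) ** 2 for x, w in zip(inputs, template))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def complex_unit(afferents):
    """Max-like pooling (or-like): the response is high if any afferent is."""
    return max(afferents)


def softmax_pool(afferents, q=10.0):
    """Soft-max pooling (assumed form): equals the mean at q = 0 and
    approaches the max as q grows."""
    weights = [math.exp(q * x) for x in afferents]
    return sum(w * x for w, x in zip(weights, afferents)) / sum(weights)


TEMPLATE = [0.0, 1.0, 0.0]  # hypothetical preferred pattern


def s_layer(signal, template=TEMPLATE):
    """One simple unit per position, all sharing the same template."""
    n = len(template)
    return [simple_unit(signal[i:i + n], template)
            for i in range(len(signal) - n + 1)]


# The same pattern presented at two different positions.
signal_a = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
signal_b = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0]

s_a, s_b = s_layer(signal_a), s_layer(signal_b)      # position-specific
c_a, c_b = complex_unit(s_a), complex_unit(s_b)      # position-tolerant
```

In this toy the simple-unit responses differ between the two signals because they depend on where the pattern sits, while the complex unit pooling over them gives the same response for both, which is the selectivity versus invariance split in the table.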
