
Going after object recognition performance to discover how the - PowerPoint PPT Presentation



  1. Going after object recognition performance to discover how the ventral stream works. “Invariance” is the crux problem; hierarchical, working system. James DiCarlo MD, PhD. Professor of Neuroscience; Head, Department of Brain and Cognitive Sciences; Investigator, The McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge MA, USA

  2. Systems neuroscience: the non-human primate model. Ventral visual stream.

  3. Systems neuroscience: the non-human primate model. Ventral visual stream. Powerful set of visual features.

  4. Systems neuroscience: the non-human primate model. Ventral visual stream. Powerful set of visual features.

  5. Understanding the brain and discovering game-changing information processing technology are two sides of the same coin. How the brain works

  6. The convergence of three fields. [Diagram: psychophysics, computer science, and neuroscience around “How the brain works”. When biological brains perform better than computers: new ideas, algorithm parameters, new phenomena. When computers perform as well as or better than biological brains: falsifiable hypotheses; attempt to test/falsify those hypotheses.]

  7. Common physical source (object) leads to many images: “identity preserving image variation”. View: position, size, pose, illumination. Clutter, occlusion, illumination. Intraclass. Deformation, articulation. Poggio, Ullman, Grossberg, Edelman, Biederman, etc.; DiCarlo and Cox, TICS (2007); Pinto, Cox, and DiCarlo, PLoS Comp Bio (2008)

  8. The convergence of three fields. [Diagram: psychophysics, computer science, and neuroscience around “How the brain works”. Labels: new ideas, algorithm parameters; new phenomena.]

  9. Brain-inspired computer algorithms. 1. Selectivity (“AND”) 2. Tolerance (“OR”). Examples: • Hubel & Wiesel (1962) • Fukushima (1980) • Perrett & Oram (1993) • Wallis & Rolls (1997) • LeCun et al. (1998) • Riesenhuber & Poggio (1999) • Serre, Kouh, et al. (2005). FROM BIOLOGY: • Hierarchy • Spatially local filters • Convolution • Normalization • Threshold NL • Unsupervised learning • ... Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005
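The selectivity/tolerance pairing on slide 9 can be illustrated with a toy one-dimensional sketch. It is not taken from the talk or from any of the cited models: the function names (`s_unit`, `s_layer`, `c_unit`), the template, and the input signals are all assumptions chosen for illustration. A template-matching unit stands in for the “AND”-like selectivity step, and a max over positions stands in for the “OR”-like tolerance step.

```python
# Toy sketch (hypothetical names and values, not the cited models' code).

def s_unit(patch, template):
    """Selectivity ("AND"-like): dot product with a template, which is
    largest when all parts of the preferred pattern are present together."""
    return sum(p * t for p, t in zip(patch, template))


def s_layer(signal, template):
    """Apply the same template at every position (a spatially local filter)."""
    n = len(template)
    return [s_unit(signal[i:i + n], template) for i in range(len(signal) - n + 1)]


def c_unit(responses):
    """Tolerance ("OR"-like): max over positions, so the output is unchanged
    when the preferred pattern appears at a different location."""
    return max(responses)


template = [1, -1, 1]
pattern_left = [0, 0, 1, -1, 1, 0, 0, 0]
pattern_right = [0, 0, 0, 0, 1, -1, 1, 0]   # same pattern, shifted by two
blank = [0] * 8

resp_left = c_unit(s_layer(pattern_left, template))
resp_right = c_unit(s_layer(pattern_right, template))
resp_blank = c_unit(s_layer(blank, template))

print(resp_left, resp_right, resp_blank)
```

The position-specific responses differ between the two inputs, while the pooled response is the same for both and lower for the blank input.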

  10. The convergence of three fields. [Diagram: psychophysics, computer science, and neuroscience around “How the brain works”. Labels: falsifiable hypotheses (e.g. HMAX); attempt to test/falsify those hypotheses.]

  11. HMAX successes (~2005) Serre, Kouh, Cadieu, Knoblich, Kreiman & Poggio 2005

  12. HMAX successes (~2007) (under limited human viewing conditions). Serre, Oliva & Poggio 2007

  13. Circa 2007. [Figure: performance scale. Labels: Human level; IT population; HMAX; pixels]

  14. ~2008: But HMAX and other models failed to explain neurons. Representational similarity analysis: models of ventral stream (HMAX model) compared with biological ventral stream. Kriegeskorte, Frontiers in Neuroscience (2009)
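Representational similarity analysis, as named on slide 14, compares two systems through their image-by-image dissimilarity structure rather than unit by unit. The following is a minimal sketch under stated assumptions: dissimilarity is taken as 1 minus the Pearson correlation between response vectors, the two representational dissimilarity matrices (RDMs) are compared by Pearson correlation of their upper triangles, and all data are made-up toy numbers. The slides do not specify which dissimilarity measure or comparison statistic was used.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rdm(responses):
    """Upper triangle of the RDM: one dissimilarity (1 - r) per image pair.
    `responses` holds one response vector (across units) per image."""
    n = len(responses)
    return [1 - pearson(responses[i], responses[j])
            for i in range(n) for j in range(i + 1, n)]


def rdm_similarity(responses_a, responses_b):
    """Correlate two RDMs computed over the same images."""
    return pearson(rdm(responses_a), rdm(responses_b))


# Toy data (assumed): 4 images x 3 units.
neural = [[1.0, 2.0, 3.0], [1.0, 2.5, 3.5], [3.0, 1.0, 0.5], [2.5, 1.0, 1.0]]
# A model whose units are an affine rescaling of the neural responses
# has the same similarity structure.
model_matching = [[2 * v + 1 for v in row] for row in neural]
# A model that groups the images differently (images 2 and 3 swapped).
model_mismatched = [neural[0], neural[2], neural[1], neural[3]]

match_score = rdm_similarity(neural, model_matching)
mismatch_score = rdm_similarity(neural, model_mismatched)
print(round(match_score, 3), round(mismatch_score, 3))
```

Because only the pairwise structure is compared, the two systems do not need the same number of units, which is what makes the method usable for comparing a model against a neural population.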

  15. What went wrong? [Diagram: psychophysics, computer science, and neuroscience around “How the brain works”. Labels: new ideas, algorithm parameters; new phenomena; falsifiable hypotheses; attempt to test/falsify those hypotheses.] Stringency of these “Brains vs. Machines” tests was far too weak.

  16. ~2008: Tests of performance were not stringent enough. Caltech 101 benchmark; Animal vs. Non-animal. [Figure: performance (%), axis from 50 to 100, for “V1-like” models, SLF (~HMAX), “HMAX 2.0” (Serre et al. PNAS 2007), and humans; conditions: Head, Close-body, Medium-body, Far-body.] One problem was insufficient variation in the test sets. Pinto, Majaj, Barhomi, Salomon, Cox, DiCarlo COSYNE 2010; Pinto, Cox, and DiCarlo, PLoS Comp Bio (2008)

  17. [Figure: performance scale. Labels: Human level; IT population; V1-like; HMAX; pixels]

  18. 2009: More stringent, but compact tests of “object recognition”. Example object recognition task: “car detection”. Image generation strategy. Pinto, Cox & DiCarlo, PLoS Comp Biol (2008); Pinto, DiCarlo and Cox, ECCV (2008); Pinto, Doukan, DiCarlo & Cox, PLoS Comp Biol (2009)

  19. 2009: Toward more stringent tests of “object recognition”. Example object recognition task: “car detection” (“car” vs. not “car”). Basic car task, variation level: 3. Image generation strategy: - Parametric control of task demand (esp. invariance) - Few images needed to bring computer vision features to their knees. [Figure: example images for each class, from no variation through more variation to lots of variation; n>100, n>700.] Pinto, Cox & DiCarlo, PLoS Comp Biol (2008); Pinto, DiCarlo and Cox, ECCV (2008); Pinto, Doukan, DiCarlo & Cox, PLoS Comp Biol (2009)

  20. 2010: Machines vs. human brains. a) “cars vs. planes” task; b) controls. [Figure: performance (%), axis from chance (50) to 100, and performance relative to Pixels (%), plotted against increasing composite variation for Pixels, V1-like, SLF (~HMAX), SIFT, PHOW, PHOG, and Geometric Blur. Annotations: “Machines beat humans!”; “Machines lose to humans”.] Composite variation steps, in increasing order: position (x-axis): 0%, 10%, 20%, 30%, 40%, 50%, 60%; position (y-axis): 0%, 20%, 40%, 60%, 80%, 100%, 120%; scale: 0%, 10%, 20%, 30%, 40%, 50%, 60%; in-plane rotation: 0°, 15°, 30°, 45°, 60°, 75°, 90°; in-depth rotation: 0°, 15°, 30°, 45°, 60°, 75°, 90°. Data merged here: 48 basic-level tasks (8 labels x 6 levels of variation). Pinto, Barhomi, Cox & DiCarlo, WACV (2010)
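The “parametric control of task demand” idea from slides 19 and 20 can be sketched as a sampler that draws view parameters from ranges that widen with each variation step. The range limits below are the values listed on slide 20. Everything else is an assumption for illustration: the names `VARIATION_TABLE` and `sample_view`, the zero-based step index, and the choice to sample each parameter uniformly and symmetrically around zero. The slides do not say how the original images were sampled or rendered.

```python
import random

# Per-step limits as listed on slide 20 (index 0 = no variation).
VARIATION_TABLE = {
    "position_x_pct": [0, 10, 20, 30, 40, 50, 60],
    "position_y_pct": [0, 20, 40, 60, 80, 100, 120],
    "scale_pct": [0, 10, 20, 30, 40, 50, 60],
    "in_plane_rotation_deg": [0, 15, 30, 45, 60, 75, 90],
    "in_depth_rotation_deg": [0, 15, 30, 45, 60, 75, 90],
}


def sample_view(step, rng):
    """Draw one set of view parameters for a given variation step.
    Assumption: uniform sampling in [-limit, +limit] for every parameter."""
    return {
        name: rng.uniform(-limits[step], limits[step])
        for name, limits in VARIATION_TABLE.items()
    }


rng = random.Random(0)               # fixed seed so the sketch is repeatable
canonical_view = sample_view(0, rng) # no variation: every parameter is zero
hard_view = sample_view(6, rng)      # widest ranges in the table

print(canonical_view)
print(hard_view)
```

A renderer would then place the object according to each sampled parameter set, so the difficulty of the test set is controlled by a single step index rather than by hand-picking images.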

  21. [Figure: performance scale. Labels: Human level; IT population; V1-like; HMAX; pixels]

  22. [Figure: performance scale. Labels: Human level; IT population; V1-like; HMAX; pixels]

  23. [Figure: performance scale. Labels: Human level; HMAX; IT population; V1-like; pixels]

  24. [Figure: performance scale. Labels: Human level; IT population simple decode; HMAX; V1-like; pixels]

  25. [Figure: performance scale. Labels: Human level; IT population simple decode; V4 population; HMAX; V1-like; pixels]

  26. [Figure: performance scale. Labels: ?; Zeiler & Fergus; Human level; Super Vision; HMO; IT population simple decode; V4 population; HMAX; V1-like; pixels]

  27. Neural population similarity of images along the ventral stream. [Figure, panel a: similarity matrices for Pixels, V1-like, V2-like, V4 neuronal units, IT neuronal units, and HMO model, with images grouped as Animals (8), Boats (8), Cars (8), Chairs (8), Faces (8), Fruits (8), Planes (8), Tables (8). Panel b: population similarity to IT (RDM correlation, axis from 0.0 to 0.9) for Pixels, V1-like, V2-like, HMAX, SIFT, other models, V4 units, and HMO, under image generalization and object generalization; IT units split-half marks the current maximum expected explanatory power; explanatory power of HMO model highlighted.] Inspired by N. Kriegeskorte et al. (2008, 2009). Yamins, Hong, Soloman, Seibert and DiCarlo (under review)

  28. Predictions of single site IT responses from current best model. [Figure, panel d: response of neural site and prediction of HMO model across Animals, Boats, Cars, Chairs, Faces, Fruits, Planes, Tables. Unit 1: r² = 0.48. Unit 2: r² = 0.55.] Ability to predict IT responses to new images and new objects is dramatically better than previous models. Yamins, Hong, Soloman, Seibert and DiCarlo (under review)
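The idea of scoring a model by how well it predicts a neural site's responses to new images can be sketched as follows. This is an illustration only: the slide does not define how its r² values were computed, so the sketch assumes r² is the squared Pearson correlation between predicted and measured responses on held-out images, uses a single model feature with an ordinary least-squares line, and runs on made-up numbers. The names `fit_line` and `r_squared` are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


def r_squared(predicted, measured):
    """Squared Pearson correlation between predictions and measurements."""
    n = len(predicted)
    mp, mm = sum(predicted) / n, sum(measured) / n
    cov = sum((p - mp) * (m - mm) for p, m in zip(predicted, measured))
    var_p = sum((p - mp) ** 2 for p in predicted)
    var_m = sum((m - mm) ** 2 for m in measured)
    return cov * cov / (var_p * var_m)


# Toy data (assumed): one model feature and one neural site.
train_feature = [0.0, 1.0, 2.0, 3.0]
train_response = [1.0, 3.1, 4.9, 7.0]
test_feature = [4.0, 5.0, 6.0]       # held-out images
test_response = [9.2, 10.8, 13.1]

# Fit on the training images only, then score on the held-out images.
slope, intercept = fit_line(train_feature, train_response)
predicted = [slope * f + intercept for f in test_feature]
score = r_squared(predicted, test_response)
print(round(slope, 2), round(score, 3))
```

Scoring on images that played no part in the fit is what lets the number speak to generalization rather than to fitting capacity.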

  29. Basic bio-constrained model component inside HMO. Basic operations: filter, thr, sat, pool, norm. [Diagram: Filter, Threshold & Saturate, Pool, Normalize, applied with filters 1, 2, ..., k.] Neural-like basic operations; hierarchical stacking (L1, L2, L3); “output” is thousands of visual features. Pinto, Doukan, DiCarlo & Cox, PLoS Comp Biol (2009); Hubel & Wiesel (1962); Fukushima (1980); Perrett & Oram (1993); Wallis & Rolls (1997); LeCun et al. (1998); Riesenhuber & Poggio (1999); Serre, Kouh, et al. (2005), etc.
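The five basic operations named on slide 29 (filter, threshold, saturate, pool, normalize) and their hierarchical stacking can be sketched in one dimension. This is a toy under stated assumptions, not the HMO implementation: the function names, the clipping bounds, max pooling with width 2, per-filter L2 normalization, the kernels, and the input signal are all choices made for illustration. The cited models operate on images and use their own parameter settings.

```python
import math


def filter_op(signal, kernel):
    """Filter: slide a local linear kernel along the signal."""
    n = len(kernel)
    return [sum(s * k for s, k in zip(signal[i:i + n], kernel))
            for i in range(len(signal) - n + 1)]


def threshold_saturate(values, lo=0.0, hi=1.0):
    """Threshold and saturate: clip responses to [lo, hi] (assumed bounds)."""
    return [min(max(v, lo), hi) for v in values]


def pool(values, width=2):
    """Pool: max over non-overlapping neighborhoods (assumed pooling rule)."""
    return [max(values[i:i + width]) for i in range(0, len(values), width)]


def normalize(values, eps=1e-6):
    """Normalize: divide by the L2 norm of the responses (assumed scheme)."""
    norm = math.sqrt(sum(v * v for v in values)) + eps
    return [v / norm for v in values]


def layer(signal, kernels):
    """One layer: apply all four stages for each of the k filters."""
    features = []
    for kernel in kernels:
        staged = normalize(pool(threshold_saturate(filter_op(signal, kernel))))
        features.extend(staged)
    return features


signal = [0.0, 0.5, 1.0, 0.5, 0.0, 0.5, 1.0, 0.5]
kernels = [[1, 0, -1], [-1, 0, 1]]   # two hypothetical edge-like filters

l1 = layer(signal, kernels)          # first layer
l2 = layer(l1, kernels)              # hierarchical stacking: L1 feeds L2
print(len(l1), len(l2))
```

Each layer is the same small recipe applied to the previous layer's output, which is the sense in which the hierarchy is built from one repeated component.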

  30. The better a model performs, the better it explains IT responses. [Figure (2013): ability of artificial visual features to predict IT responses (% variance explained, axis from 0% to 50%) against performance of artificial visual features (% correct); exploration of basic model class. Annotation: “We are optimizing this way”.]

  31. Today: [Figure: performance scale. Labels: ?; ?; Human level; Zeiler & Fergus; HMO; Super Vision; IT population simple decode; V4 population; HMAX; V1-like; pixels]

  32. Follow the performance trail... [Diagram: psychophysics, computer science, and neuroscience around “How the brain works”. Labels: new ideas, algorithm parameters; new phenomena; falsifiable hypotheses; attempt to test/falsify those hypotheses.] Stringency of these tests is crucial. Must include “invariance”.

  33. The power of stringent tests to elucidate biological brains. 1) • Discover IT neuronal codes that can explain behavior • Demonstrate that other possible codes CANNOT • Demonstrate which computer vision features CANNOT (Dan Yamins, Ha Hong, Ethan Soloman). 2) • Driving discovery (“learning?”) of new CV features • These are becoming more and more capable of explaining what the brain is doing (Dan Yamins, Ha Hong, Charles Cadieu, Dave Cox, Nicolas Pinto).
