9.54, Class 16: Features for recognition: supervised, unsupervised and innate
Shimon Ullman, Tomaso Poggio, Danny Harari, Daniel Zysman, Darren Seibert
Visual recognition
The initial input is just image intensities
Object Categories
• We perceive the world in terms of objects and classes
• Large variability within each class
Individual Recognition
Object parts (car example): window, mirror, door knob, headlight, back wheel, bumper, front wheel
Categorization: dealing with class variability
Class vs. non-class: natural for the brain, difficult computationally
Unsupervised Classification
Features and Classifiers
Image features → Classifier
Generic features: simple (e.g., wavelets) or complex (e.g., geons)
Visual Class: Similar Configurations of Shared Image Components
What are the optimal image building blocks for the class?
Optimal Class Components?
• Large features are too rare
• Small features are found everywhere
Find features that carry the highest amount of information.
Mutual Information I(C,F)
• Definition of MI as the difference between the class entropy and the conditional entropy of the class given a feature:
$$I(F;C) = H(C) - H(C \mid F)$$
• Definition of entropy:
$$H(C) = -\sum_{c \in C} P(c) \log_2 P(c)$$
• Definition of conditional entropy:
$$H(C \mid F) = \sum_{f \in F} p(f)\, H(C \mid F = f) = -\sum_{f \in F} \sum_{c \in C} p(f)\, P(c \mid f) \log_2 P(c \mid f)$$
Mutual Information I(C,F): a toy example
Class:   1 1 0 1 0 1 0 0
Feature: 1 0 0 1 1 1 0 0
$$I(F;C) = H(C) - H(C \mid F)$$
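For concreteness, here is a worked computation of the mutual information for the toy example above (the derivation is ours, not from the slides):

```latex
% MI for the 8-sample example: Class 1 1 0 1 0 1 0 0, Feature 1 0 0 1 1 1 0 0
\begin{align*}
H(C) &= -\tfrac{4}{8}\log_2\tfrac{4}{8} - \tfrac{4}{8}\log_2\tfrac{4}{8} = 1 \text{ bit}\\
P(C{=}1 \mid F{=}1) &= \tfrac{3}{4}, \qquad P(C{=}1 \mid F{=}0) = \tfrac{1}{4}\\
H(C \mid F{=}1) &= H(C \mid F{=}0)
  = -\tfrac{3}{4}\log_2\tfrac{3}{4} - \tfrac{1}{4}\log_2\tfrac{1}{4} \approx 0.811\\
H(C \mid F) &= \tfrac{1}{2}(0.811) + \tfrac{1}{2}(0.811) = 0.811\\
I(F;C) &= H(C) - H(C \mid F) \approx 1 - 0.811 = 0.189 \text{ bits}
\end{align*}
```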
Computing MI from Examples
• Mutual information can be measured from examples: given 100 faces and 100 non-faces, a feature detected in 44 of the faces and 6 of the non-faces gives H(C) = 1 and H(C|F) = 0.8475, hence a mutual information of 0.1525.
• Simple neural-network approximations exist.
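A minimal Python sketch that reproduces the numbers above from raw detection counts (the function names and counts-based interface are our own):

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits; ignores zero-probability outcomes."""
    return -sum(p * log2(p) for p in probs if p > 0)

def mutual_information(n_pos, n_neg, hits_pos, hits_neg):
    """I(C;F) = H(C) - H(C|F) for a binary class C and binary feature F,
    estimated from detection counts on positive and negative examples."""
    n = n_pos + n_neg
    h_class = entropy([n_pos / n, n_neg / n])
    hits, misses = hits_pos + hits_neg, n - hits_pos - hits_neg
    h_c_f1 = entropy([hits_pos / hits, hits_neg / hits])
    h_c_f0 = entropy([(n_pos - hits_pos) / misses, (n_neg - hits_neg) / misses])
    return h_class - (hits / n * h_c_f1 + misses / n * h_c_f0)

# The slide's example: fragment found in 44/100 faces and 6/100 non-faces.
print(mutual_information(100, 100, 44, 6))  # 0.1528, ~0.1525 up to rounding
```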
Optimal classification features
• Theoretically: maximizing the delivered information minimizes the residual uncertainty about the class, H(C|F) = H(C) - I(C;F), and with it the classification error.
• In practice: informative object components can be identified in training images.
Selecting Fragments
[Plot: mutual information vs. detection threshold for candidate face fragments such as forehead, hairline, mouth, eye, nose, nose bridge, long hairline, chin, and two eyes.]
'Imprinting': storing many receptive fields and selecting a subset.
Adding a New Fragment (avoiding redundancy by max-min selection)
Compare each new fragment $F_i$ to all the previously selected ones, and select the fragment that maximizes the additional information:
$$\max_i \min_k \Delta MI(F_i, F_k)$$
Competition between units with similar responses.
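A minimal sketch of this greedy max-min selection, assuming each candidate fragment is summarized by its binary detection vector over the training images (all names below are ours):

```python
import numpy as np

def mi(x, c):
    """Mutual information I(X;C) in bits for discrete arrays x and c."""
    total = 0.0
    for xv in np.unique(x):
        for cv in np.unique(c):
            p_xc = np.mean((x == xv) & (c == cv))
            if p_xc > 0:
                total += p_xc * np.log2(p_xc / (np.mean(x == xv) * np.mean(c == cv)))
    return total

def delta_mi(f_new, f_old, c):
    """Added information of f_new over f_old: I((F_new,F_old);C) - I(F_old;C).
    The pair of binary features is encoded as a single 4-valued feature."""
    return mi(2 * f_new + f_old, c) - mi(f_old, c)

def max_min_select(fragments, labels, k):
    """Greedy max-min selection: seed with the single most informative
    fragment, then repeatedly add the candidate whose worst-case added
    information over the already-selected set is largest."""
    selected = [max(range(len(fragments)), key=lambda i: mi(fragments[i], labels))]
    while len(selected) < k:
        remaining = [i for i in range(len(fragments)) if i not in selected]
        selected.append(max(remaining, key=lambda i: min(
            delta_mi(fragments[i], fragments[j], labels) for j in selected)))
    return selected

# Toy usage: three noisy detectors of the class label (flip rates 0.1, 0.15, 0.4).
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 200)
fragments = [labels ^ (rng.random(200) < p) for p in (0.1, 0.15, 0.4)]
print(max_min_select(fragments, labels, 2))
```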
Highly Informative Face Fragments: optimal receptive fields for faces (Ullman et al., Nature Neuroscience 2002)
Informative class features: horse-class features and car-class features
Informative fragments with positions
Decision rule: $\sum_k w_k F_k > \theta$, summed over all detected fragments within their regions.
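As a minimal sketch of this decision rule (the detections, weights, and threshold below are hypothetical; in practice the weights could be tied to fragment informativeness):

```python
import numpy as np

# Classify as the class iff the weighted sum of detected fragments exceeds
# a threshold: sum_k w_k * F_k > theta, where F_k = 1 if fragment k is
# detected within its allowed region, else 0.
def classify(detections, weights, theta):
    return np.dot(weights, detections) > theta

detections = np.array([1, 0, 1, 1])        # hypothetical detections F_k
weights = np.array([0.8, 0.5, 1.2, 0.6])   # hypothetical weights w_k
print(classify(detections, weights, theta=2.0))  # True: 0.8 + 1.2 + 0.6 = 2.6
```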
Star model
Detected fragments 'vote' for the object's center location; find the location with the maximal vote. In variations, a popular state-of-the-art scheme.
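A minimal Hough-style voting sketch of the star model; the fragment offsets and detections below are hypothetical:

```python
import numpy as np

def star_model_vote(detections, offsets, image_shape):
    """Each detected fragment votes for the object center at its own
    location plus that fragment's learned offset to the center.
    detections: list of (fragment_id, (y, x)) detection locations.
    offsets: dict fragment_id -> (dy, dx) learned offset to the center.
    Returns the location with the maximal accumulated vote."""
    votes = np.zeros(image_shape)
    for frag_id, (y, x) in detections:
        dy, dx = offsets[frag_id]
        cy, cx = y + dy, x + dx
        if 0 <= cy < image_shape[0] and 0 <= cx < image_shape[1]:
            votes[cy, cx] += 1
    return np.unravel_index(np.argmax(votes), votes.shape)

# Hypothetical example: three fragments whose votes agree on center (50, 60).
detections = [(0, (40, 55)), (1, (60, 70)), (2, (50, 45))]
offsets = {0: (10, 5), 1: (-10, -10), 2: (0, 15)}
print(star_model_vote(detections, offsets, (100, 100)))  # (50, 60)
```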
Image parts informative for classification: Ullman & Sali 1999; Agarwal & Roth 2002; Fergus, Perona & Zisserman 2003
Variability of Airplanes Detected
Image representation for recognition: the HoG descriptor
Dalal, N. & Triggs, B., Histograms of Oriented Gradients for Human Detection
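A simplified sketch of the core idea, assuming a grayscale image: per-cell histograms of gradient orientation weighted by gradient magnitude. The full Dalal & Triggs descriptor additionally normalizes over overlapping blocks and interpolates between bins, which is omitted here:

```python
import numpy as np

def hog_cells(image, n_bins=9, cell=8):
    """Simplified HoG: per-cell orientation histograms weighted by
    gradient magnitude (no block normalization or interpolation)."""
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180           # unsigned orientation
    bins = (ang / (180 / n_bins)).astype(int) % n_bins   # orientation bin index
    h, w = image.shape
    hist = np.zeros((h // cell, w // cell, n_bins))
    for i in range(h // cell):
        for j in range(w // cell):
            sl = np.s_[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            hist[i, j] = np.bincount(bins[sl].ravel(),
                                     weights=mag[sl].ravel(),
                                     minlength=n_bins)
    return hist

image = np.random.rand(64, 64)   # stand-in for a grayscale image
print(hog_cells(image).shape)    # (8, 8, 9): 8x8 cells, 9 orientation bins
```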
Object model using HoG
fMRI: Functional Magnetic Resonance Imaging
Looking for Class Features in the Brain: fMRI (Lerner, Epshtein, Ullman & Malach, JOCN 2008)
Class-fragments and Activation (Malach et al. 2008)
EEG
Informative Fragments: ERP Study (Harel, Ullman, Epshtein & Bentin)
ERP: face features
[Plots: ERP responses to face fragments at posterior-temporal sites, left and right hemispheres, grouped by fragment MI level (MI 1-5), 0-600 ms.]
Harel, Ullman, Epshtein & Bentin, Vision Research 2007
Features for object segregation: Innate mechanisms for unsupervised learning
Object Segregation: separating objects (Object 1, Object 2) from the background
Object segregation is learned [Kellman & Spelke 1983; Spelke 1990; Kestenbaum et al. 1987]
Even basic Gestalt cues are initially missing at 5 months [Schmidt et al. 1986]
Object segregation is learned (adults)
It all begins with motion
• Grouping by common motion precedes figural goodness [Spelke 1990, review]
• Motion discontinuities provide an early cue for occlusion boundaries [Granrud et al. 1984]
Our model (Dorfman, Harari & Ullman, CogSci 2013)
Motion-based segregation trains two kinds of static cues:
• Boundary: motion discontinuities → local occlusion cues. General; accurate, but noisy and incomplete boundaries.
• Global: common motion → object form. Object-specific; complete, but inaccurate.
Together these produce static segregation.
Boundary: intensity edges?
Boundary: occlusion cues
• T-junctions
• Convexity
• Extremal edges [Ghose & Palmer 2010]
Global: familiar object
How does it actually work?
Motion: moving object
Motion segregates the scene into figure, ground, and unknown regions; these labels train the Boundary and Global cues.
Boundary: informative boundary features. Needs many examples (1000+) for good results.
Boundary prediction: figure or ground?
Novel object, novel background: 78% success, using 100,000 training examples.
Boundary cues applied to the entire image: figure vs. background
Global: learning an object with a standard object-recognition algorithm that learns local features and their relative locations
Global: detection
Combining information sources (figure vs. background)
• Boundary: accurate, but noisy and incomplete
• Global: complete, but inaccurate
• Combined: both accurate and complete
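The slides say the two sources are combined but not how; the per-pixel blend below is only a plausible illustration under our own assumptions (NaN marks pixels with no boundary evidence):

```python
import numpy as np

def combine_cues(p_boundary, p_global, w=0.5):
    """Per-pixel figure probability from two cue maps.
    p_boundary: accurate but sparse/noisy (NaN where no boundary evidence).
    p_global: dense but inaccurate. Where boundary evidence exists, blend
    the two; elsewhere fall back on the global map. This blending rule is
    our assumption; the slides only state that the cues are combined."""
    combined = p_global.copy()
    has_boundary = ~np.isnan(p_boundary)
    combined[has_boundary] = (w * p_boundary[has_boundary]
                              + (1 - w) * p_global[has_boundary])
    return combined

p_global = np.full((4, 4), 0.6)            # hypothetical dense global map
p_boundary = np.full((4, 4), np.nan)       # boundary evidence only at edges
p_boundary[0, :] = 0.1                     # e.g., top row judged ground
print(combine_cues(p_boundary, p_global))  # 0.35 on top row, 0.6 elsewhere
```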
More complex algorithms: default GrabCut vs. GrabCut with the segregation cue [Rother et al. 2004]
Object segregation: summary
• Static segregation is learned from motion.
• Two simple mechanisms:
  - Boundary: motion discontinuities → occlusion boundaries (needs a rich library, including extremal edges)
  - Global: common motion → object form
• These mechanisms work in synergy.
• This is enough to get started; adult segregation is much more complex.
Summary
• Features are important for many visual tasks, such as object recognition and segregation.
• Features can be learned in a supervised manner, given labeled examples.
• Features can also be learned in an unsupervised manner, using statistical regularities or domain-specific cues.