Recognizing and Learning Object Categories


Based on work and slides by R. Fergus, P. Perona, A. Zisserman, A. Efros, J. Ponce, S. Lazebnik, C. Schmid, F. DiMaio, and others.


  1. Traditional Problem: Single Object Recognition
     Recognizing and Learning Object Categories. Based on work and slides by R. Fergus, P. Perona, A. Zisserman, A. Efros, J. Ponce, S. Lazebnik, C. Schmid, F. DiMaio, and others.
     Most objects exhibit considerable intra-class variability.
     Task: recognition of object categories. Learn from just examples.
     Difficulties:
     § Size variation
     § Background clutter
     § Occlusion
     § Intra-class variation
     § Viewpoint variation
     § Illumination variation
     Some object categories are related by function, not form.

  2. Approach 1: Discriminative Methods
     Chairs: related by function, not form.
     Object detection and recognition is formulated as a classification problem. The image is partitioned into a set of overlapping windows, and a decision is taken at each window about whether it contains a target object or not.
     [Figure: "Where are the screens?" Each window is a bag of image patches; in some feature space, a decision boundary separates "computer screen" from "background".]
     [Figure: HRCT lung image with training examples; legible label: dilated bronchus.]
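The window-by-window decision described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used in the cited work: the image is a plain list of rows, the window size and stride are arbitrary, and `bright_patch` is a hypothetical stand-in for a trained classifier.

```python
from typing import Callable, Iterator, List, Tuple

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def sliding_windows(height: int, width: int, win: int, stride: int) -> Iterator[Tuple[int, int]]:
    """Yield the (top, left) corner of every win x win window that fits in the image."""
    for top in range(0, height - win + 1, stride):
        for left in range(0, width - win + 1, stride):
            yield top, left


def detect(image: Image, win: int, stride: int,
           classify: Callable[[Image], bool]) -> List[Tuple[int, int]]:
    """Run a binary classifier on each window; return corners of positive windows."""
    height, width = len(image), len(image[0])
    hits = []
    for top, left in sliding_windows(height, width, win, stride):
        patch = [row[left:left + win] for row in image[top:top + win]]
        if classify(patch):
            hits.append((top, left))
    return hits


def bright_patch(patch: Image) -> bool:
    """Hypothetical stand-in classifier: 'object' if mean intensity exceeds 0.5."""
    flat = [v for row in patch for v in row]
    return sum(flat) / len(flat) > 0.5


# Toy 6x6 image: dark background with a bright 4x4 square in the lower right.
image = [[0.0] * 6 for _ in range(6)]
for r in range(2, 6):
    for c in range(2, 6):
        image[r][c] = 1.0

hits = detect(image, win=4, stride=2, classify=bright_patch)
print(hits)
```

With a stride smaller than the window size the windows overlap, as on the slide. A real detector would replace `bright_patch` with a classifier learned from labeled patches and would usually scan several window sizes.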

  3. Discriminative Methods
     Families of classifiers:
     § Nearest Neighbor: Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik 2005
     § Neural Networks: LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998
     § Support Vector Machines and Kernels: Guyon, Vapnik; Heisele, Serre, Poggio, 2001
     § Conditional Random Fields: McCallum, Freitag, Pereira 2000; Kumar, Hebert 2003
     Formulation
     § Formulation: binary classification
       Features x = x_1, x_2, x_3, …, x_N (training data) | x_{N+1}, x_{N+2}, …, x_{N+M} (test data); 10^6 examples
       Labels y = -1, +1, -1, -1, … (training data) | ?, ?, ?, … (test data)
       Training data: each image patch is labeled as containing the object or not.
     • Classification function ŷ = F(x), where F(x) belongs to some family of functions
     • Minimize misclassification error
       (Not that simple: we need some guarantees that there will be generalization)
     Object categorization: the statistical viewpoint
     p(zebra | image) vs. p(no zebra | image)
     § Bayes’s rule:
       $$\underbrace{\frac{p(\text{zebra} \mid \text{image})}{p(\text{no zebra} \mid \text{image})}}_{\text{posterior ratio}} = \underbrace{\frac{p(\text{image} \mid \text{zebra})}{p(\text{image} \mid \text{no zebra})}}_{\text{likelihood ratio}} \cdot \underbrace{\frac{p(\text{zebra})}{p(\text{no zebra})}}_{\text{prior ratio}}$$
     § Discriminative methods model the posterior
     § Generative methods model the likelihood and prior
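As a small worked illustration of the ratio form of Bayes's rule above, the sketch below multiplies a likelihood ratio by a prior ratio and thresholds the result at 1. The function names and the numbers are illustrative assumptions, not values from the slides.

```python
def posterior_ratio(lik_zebra: float, lik_no_zebra: float, prior_zebra: float) -> float:
    """Posterior ratio = likelihood ratio * prior ratio.

    lik_zebra    : p(image | zebra)
    lik_no_zebra : p(image | no zebra)
    prior_zebra  : p(zebra), strictly between 0 and 1; p(no zebra) = 1 - p(zebra)
    """
    likelihood_ratio = lik_zebra / lik_no_zebra
    prior_ratio = prior_zebra / (1.0 - prior_zebra)
    return likelihood_ratio * prior_ratio


def decide(lik_zebra: float, lik_no_zebra: float, prior_zebra: float) -> str:
    """Choose the class with the larger posterior, i.e. compare the ratio with 1."""
    ratio = posterior_ratio(lik_zebra, lik_no_zebra, prior_zebra)
    return "zebra" if ratio > 1.0 else "no zebra"


# Made-up numbers: the image is 20 times more likely under the zebra model.
print(decide(0.02, 0.001, prior_zebra=0.5))   # equal priors
print(decide(0.02, 0.001, prior_zebra=0.01))  # zebras assumed rare
```

The same likelihoods give different decisions under the two priors, which is the point of keeping the prior ratio as a separate factor. A generative method would estimate the two likelihoods and the prior; a discriminative method would model the left-hand side directly.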

  4. Discriminative vs. Generative
     § Discriminative: direct modeling of p(zebra | image) / p(no zebra | image). The result is a decision boundary between zebra and non-zebra.
     § Generative: model p(image | zebra) and p(image | no zebra).
     [Figure: example images scored under each model. p(image | zebra): Low, High. p(image | no zebra): Middle, Middle → Low.]
     Three main issues
     § Representation: how to represent an object category
     § Learning: how to form the classifier, given training data
     § Recognition: how the classifier is to be used on novel data
     Constructing models of image content
     Basic components: local features and spatial relations, for textures, objects, and scenes.

  5. Constructing models of image content
     Basic components: local features and spatial relations, for textures, objects, and scenes.
     § Local model
     § Semi-local model

  6. Constructing models of image content
     Basic components: local features and spatial relations, for textures, objects, and scenes.
     § Local model
     § Semi-local model
     § Global model
     Approach 2: Generative Methods using Bag of Words Models (usually appearance)
     § An image is represented by a collection of “visual words” and their corresponding counts given a universal dictionary
     § Object categories are modeled by the distributions of these visual words
     § Although “bag of words” models can use both generative and discriminative approaches, here we will focus on generative models
     Analogy to documents: Object → Bag of ‘words’
     Document 1: Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.
     Bag of ‘words’ for document 1: sensory, brain, visual, perception, retinal, cerebral cortex, eye, cell, optical nerve, image, Hubel, Wiesel
     Document 2: China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with a 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value.
     Bag of ‘words’ for document 2: China, trade, surplus, commerce, exports, imports, US, yuan, bank, domestic, foreign, increase, trade, value
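The document analogy can be made concrete with a short Python sketch: a text is reduced to the counts of the dictionary words it contains, and word order is discarded. The tokenizer and the four-word dictionary below are illustrative assumptions, not part of the slides.

```python
import re
from collections import Counter
from typing import Dict, List


def bag_of_words(text: str, dictionary: List[str]) -> Dict[str, int]:
    """Count how often each dictionary word occurs in the text, ignoring order."""
    tokens = re.findall(r"[a-z]+", text.lower())  # crude tokenizer (assumption)
    counts = Counter(tokens)
    return {word: counts[word] for word in dictionary}


# Tiny illustrative dictionary; a real one would be shared across all documents.
dictionary = ["china", "trade", "yuan", "brain"]

sentence = ("China increased the value of the yuan against the dollar, "
            "but the US wants the yuan to be allowed to trade freely.")

print(bag_of_words(sentence, dictionary))
```

For images, the same counting step applies once local features have been mapped to entries of a visual dictionary; the counts then play the role of the word histogram.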

  7. [Figure: learning and recognition pipeline. Both paths share feature detection & representation → codewords dictionary → image representation. Learning produces category models (and/or) classifiers; recognition uses them to reach a category decision.]
     1. Feature Detection and Representation
     Feature Detection
     § Sliding window
       § Leung et al., 1999
       § Viola et al., 1999
       § Renninger et al., 2002

  8. Feature Detection
     § Sliding window
       § Leung et al., 1999
       § Viola et al., 1999
       § Renninger et al., 2002
     § Regular grid
       § Vogel et al., 2003
       § Fei-Fei et al., 2005
     § Interest point detector
       § Csurka et al., 2004
       § Fei-Fei et al., 2005
       § Sivic et al., 2005
     § Other methods
       § Random sampling (Ullman et al., 2002)
       § Segmentation based patches (Barnard et al., 2003)
     Feature Representation
     Visual words, aka textons, aka keypoints: K-means clustered pieces of the image
     § Various representations:
       § Filter bank responses
       § Image patches
       § SIFT descriptors
     All encode more-or-less the same thing …

  9. Interest Point Features
     § Detect patches [Mikolajczyk and Schmid ’02] [Matas et al. ’02] [Sivic et al. ’03]
     § Normalize patch
     § Compute SIFT descriptor [Lowe ’99]
     Slide credit: Josef Sivic
     Patch Features
     Dictionary Formation
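The slide does not say which normalization the "normalize patch" step refers to. The sketch below assumes one simple reading, a photometric normalization that subtracts the mean intensity and scales to unit length, so that patches differing only by brightness offset or contrast gain map to the same vector. Treat it as an illustration of why normalization precedes description, not as the procedure used in the cited papers.

```python
import math
from typing import List


def normalize_patch(patch: List[float]) -> List[float]:
    """Zero-mean, unit-L2-norm version of a flattened patch (assumed normalization)."""
    n = len(patch)
    mean = sum(patch) / n
    centered = [v - mean for v in patch]
    norm = math.sqrt(sum(v * v for v in centered))
    if norm == 0.0:
        # A perfectly flat patch carries no structure; return all zeros.
        return [0.0] * n
    return [v / norm for v in centered]


patch = [1.0, 2.0, 3.0, 4.0]
brighter = [2.0 * v + 5.0 for v in patch]  # same pattern, different gain and offset

a = normalize_patch(patch)
b = normalize_patch(brighter)
print(max(abs(x - y) for x, y in zip(a, b)))  # close to zero
```

After this step the two patches are numerically almost identical, so a descriptor computed from them would not be affected by the lighting change.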

  10. Clustered Image Patches
      Clustering (usually k-means); vector quantization. Slide credit: Josef Sivic
      Clustered image patches: Fei-Fei et al. 2005
      Image patch examples of codewords: Sivic et al. 2005
      Image Representation
      [Figure: histogram of codeword counts; axes: frequency vs. codewords.]
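The dictionary formation and image representation steps can be sketched as follows, assuming plain k-means on small made-up 2-D descriptors. Real descriptors (for example SIFT) have many more dimensions, and the toy data, the choice of k, the fixed seed, and the helper names are all illustrative assumptions.

```python
import random
from typing import List

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec: Vector, centers: List[Vector]) -> int:
    """Vector quantization: index of the closest codeword."""
    return min(range(len(centers)), key=lambda k: sq_dist(vec, centers[k]))


def kmeans(vectors: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[Vector]:
    """Plain k-means; the returned cluster centers act as the codeword dictionary."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(v, centers)].append(v)
        for j, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(group) for col in zip(*group)]
    return centers


def codeword_histogram(vectors: List[Vector], centers: List[Vector]) -> List[int]:
    """Image representation: how many descriptors fall on each codeword."""
    counts = [0] * len(centers)
    for v in vectors:
        counts[nearest(v, centers)] += 1
    return counts


# Made-up training descriptors forming two well-separated groups.
training = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
            [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
codebook = kmeans(training, k=2)

# Descriptors from a hypothetical new image, quantized against the codebook.
new_image = [[0.2, 0.1], [9.8, 10.1], [10.3, 10.2]]
print(codeword_histogram(new_image, codebook))
```

The order of the codewords depends on the random initialization, so only the multiset of counts is meaningful here. The resulting histogram is the "frequency vs. codewords" representation shown on the slide.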
