Visual Categorization with Bags of Keypoints
G. Csurka, C. Bray, C. Dance, and L. Fan. ECCV, 2004.
Presented by Shilpa Gulati, 2/15/2007

Basic Problem Addressed
- Find a method for Generic Visual Categorization.
- Visual Categorization: identifying whether objects of one or more types are present in an image.
- Generic: the method generalizes to new object types, and is invariant to scale, rotation, affine transformation, lighting changes, occlusion, intra-class variations, etc.

Main Idea
- Apply the bag-of-keywords approach from text categorization to visual categorization.
- Construct a vocabulary of feature vectors from clustered descriptors of images.

The Approach I: Training
- Extract interest points from a dataset of training images and attach descriptors to them.
- Cluster the keypoints and construct a set of vocabularies (Why a set? Next slide).
- Train a multi-class classifier using bags-of-keypoints built around the cluster centers.

Why a set of vocabularies?
- The approach is motivated by text categorization (spam filtering, for example).
- For text, the keywords have a clear meaning (Lottery! Deal! Affine Invariance). Hence finding a vocabulary is easy.
- For images, keypoints don't necessarily have repeatable meanings.
- Hence find a set of vocabularies, then experiment to find the best vocabulary and classifier.

The Approach II: Testing
- Given a new image, get its keypoint descriptors.
- Label each keypoint with its closest cluster center in feature space.
- Categorize the objects using the multi-class classifier learnt earlier (a code sketch of the whole pipeline follows):
  - Naïve Bayes
  - Support Vector Machines (SVMs)
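To make the training and testing steps concrete, here is a minimal Python sketch of the pipeline, assuming each image has already been reduced to a set of descriptor vectors (the paper uses Harris affine interest points with 128-dimensional SIFT descriptors). The use of scikit-learn's KMeans is an illustrative stand-in for whichever k-means implementation the authors used; all function names here are invented for this sketch.

```python
import numpy as np
from sklearn.cluster import KMeans  # illustrative stand-in for the paper's k-means

def build_vocabulary(descriptor_sets, k):
    """Cluster all training descriptors; the k cluster centers form the vocabulary."""
    all_descriptors = np.vstack(descriptor_sets)   # (total_keypoints, 128) for SIFT
    return KMeans(n_clusters=k, n_init=10).fit(all_descriptors)

def bag_of_keypoints(descriptors, vocabulary):
    """Count how often each visual word (cluster center) occurs in one image."""
    words = vocabulary.predict(descriptors)        # nearest cluster center per keypoint
    return np.bincount(words, minlength=vocabulary.n_clusters)
```

Training then reduces every labeled image to such a histogram and fits a multi-class classifier on the histograms; testing reduces a new image the same way and applies the classifier. Repeating the procedure for several values of k yields the "set of vocabularies" from which the best one is picked.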
Feature Extraction and Description
- From a database of images, extract interest points using the Harris affine detector.
  - It was shown in Mikolajczyk and Schmid (2002) that scale-invariant interest point detectors are not sufficient to handle affine transformations.
- Attach SIFT descriptors to the interest points. A SIFT descriptor is a 128-dimensional vector.
  - SIFT descriptors were found to be best for matching in Mikolajczyk and Schmid (2003).

Visual Vocabulary Construction
- Use the k-means clustering algorithm to form a set of clusters V = {V1, V2, ..., Vm} of feature vectors.
- The feature vectors associated with the cluster centers (V1..Vm) form a vocabulary.
- Construct multiple vocabularies: find multiple sets of clusters using different values of k.
(Slide inspired by [3])

Clustering Example
- Extract keypoint descriptors from a set of labeled images.
- Put each descriptor in the cluster, or "bag", whose center is at minimum distance from it (e.g., a feature F at minimum distance from V2 goes into bag V2).
- Count the number of keypoints in each bag.
- If a feature in image I is nearest to cluster center Vj, we say that keypoint j has occurred in image I.
(Slide inspired by [3])

Categorization by Naïve Bayes I: Training
- For each training image of category Ci, count the occurrences of each vocabulary entry: ni1, ni2, ..., nim.
- nij is the total number of times a feature "near" Vj occurs in training images of category i.
(Image taken from [2])

Categorization by Naïve Bayes II: Training
- For each category Ci,
  P(Ci) = number of images of category Ci / total number of images.
- Over all images I of category Ci, for each keypoint Vj,
  P(Vj | Ci) = number of keypoints Vj in I / total number of keypoints in I = nij / ni.
- But use Laplace smoothing to avoid probabilities near zero:
  P(Vj | Ci) = (nij + 1) / (ni + |V|).
(Slide inspired by [3])

Categorization by Naïve Bayes III: Testing
- P(Ci | Image) = β · P(Ci) · P(Image | Ci)
               = β · P(Ci) · P(V0, V1, ..., Vm | Ci)
               = β · P(Ci) · ∏_{j=0..m} P(Vj | Ci)
  (These steps are sketched in code below.)
(Slide inspired by [3])
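A minimal sketch of the Naïve Bayes steps above, assuming each image has already been reduced to a bag-of-keypoints histogram as in the earlier pipeline sketch. Each keypoint occurrence contributes one factor of P(Vj | Ci), matching the slide's product; working in log space avoids the numerical underflow that the plain product would hit in practice. Names are illustrative, not from the paper.

```python
import numpy as np

def train_naive_bayes(histograms, labels, num_classes):
    """histograms: (num_images, m) word counts; labels: (num_images,) array of class ids."""
    m = histograms.shape[1]
    log_prior = np.zeros(num_classes)          # log P(Ci)
    log_cond = np.zeros((num_classes, m))      # log P(Vj | Ci)
    for c in range(num_classes):
        rows = histograms[labels == c]
        log_prior[c] = np.log(len(rows) / len(labels))
        n_ij = rows.sum(axis=0)                # keypoint counts per word for class c
        # Laplace smoothing: P(Vj | Ci) = (n_ij + 1) / (n_i + |V|)
        log_cond[c] = np.log((n_ij + 1) / (n_ij.sum() + m))
    return log_prior, log_cond

def classify(hist, log_prior, log_cond):
    """argmax_i of log P(Ci) + sum_j hist[j] * log P(Vj | Ci)."""
    return int(np.argmax(log_prior + log_cond @ hist))
```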
SVM: Brief Introduction
- An SVM classifier finds a hyperplane that separates two-class data with maximum margin.
- The maximum-margin hyperplane gives the greatest separation between the classes.
- The data instances closest to the hyperplane are called support vectors.
- The function f(x) is the target (classifying) function.
(Figure: a two-class dataset with linearly separable classes, its support vectors, and the maximum-margin hyperplane.)

Categorization by SVM I: Training
- The classifying function is
  f(x) = sign( Σi yi βi K(x, xi) + b )
- xi is a feature vector from the training images; yi is the label for xi (yes, in category Ci, or no, not in Ci); βi and b have to be learnt.
- Data is not always linearly separable (non-linear SVM):
  - A function Φ maps the original data space to a higher-dimensional space.
  - K(x, xi) = Φ(x) · Φ(xi)

Categorization by SVM II: Training
- For an image of category Ci, xi is a vector formed by the number of occurrences of each keypoint V in the image.
- The parameters are sometimes learnt using Sequential Quadratic Programming. The approach used in the paper is not mentioned.
- For the m-class problem, the authors train m SVMs, each to distinguish some category Ci from the other m − 1.

Categorization by SVM III: Testing
- Given a query image, assign it to the category with the highest SVM output (see the sketch below).

Experiments
- Two databases.
- DB1: in-house, 1779 images.
  - 7 object classes: faces, buildings, trees, cars, phones, bikes, books.
  - Some images contain objects from multiple classes, but a large proportion of each image is occupied by the target object.
- DB2: freely available from various sites, about 3500 images.
  - 5 object classes: faces, airplanes, cars (rear), cars (side) and motorbikes (side).

Performance Metrics
- Confusion Matrix, M
  - mij = number of images from category j identified by the classifier as category i.
- Overall Error Rate, R
  - Accuracy = total number of correctly classified test images / total number of test images.
  - R = 1 − Accuracy.
- Mean Rank, MR
  - MR for category j = E[ rank of class j in the classifier's output | true class is j ].
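A minimal sketch of the classifying function and the one-vs-rest test rule described above. It assumes the support vectors xi, labels yi, weights βi, and bias b of each of the m SVMs have already been learnt (the paper does not say which optimizer it used); the linear kernel shown can be swapped for a polynomial one. All names are illustrative.

```python
import numpy as np

def linear_kernel(x, xi):
    return float(np.dot(x, xi))

def svm_output(x, support_vectors, y, beta, b, kernel=linear_kernel):
    """f(x) before taking the sign: sum_i y_i * beta_i * K(x, x_i) + b."""
    return sum(yi * bi * kernel(x, xi)
               for xi, yi, bi in zip(support_vectors, y, beta)) + b

def categorize(x, svms):
    """svms: one (support_vectors, y, beta, b) tuple per category.
    Assign x to the category whose SVM gives the highest raw output."""
    return int(np.argmax([svm_output(x, *params) for params in svms]))
```

Here x is the bag-of-keypoints histogram of the query image, as in the earlier sketches.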
Finding the Value of k
- Error rate decreases with increasing k.
- The decrease is small after k > 1000.
- Choose k = 1000 as the selected operating point: a good tradeoff between accuracy and speed.
(Graph of error rate vs. k for Naïve Bayes on DB1; graph taken from [2].)

Naïve Bayes Results for DB1
- Confusion matrix for Naïve Bayes on DB1 (table taken from [2]).
- Mean rank per true category:
  Faces 1.49, Buildings 1.88, Trees 1.33, Phones 1.33, Cars 1.63, Bikes 1.57, Books 1.57.
- Overall error rate = 28%.

SVM Results
- The linear SVM gives the best results out of linear, quadratic and cubic kernels, except for cars; the quadratic kernel gives the best results on cars.
- How do we know these will work for other categories? What if we have to use higher degrees? Only time and more experiments will tell.

SVM Results for DB1
- Confusion matrix for SVM on DB1 (table taken from [2]).
- Mean rank per true category:
  Faces 1.04, Buildings 1.77, Trees 1.28, Cars 1.30, Phones 1.83, Bikes 1.09, Books 1.39.
- Error rate for faces = 2%, but there is an increased rate of confusion of other categories with faces, due to the larger number of faces in the training set.
- Overall error rate = 15%.

Multiple Object Instances: Correctly Classified
(Images taken from [2])

Partially Visible Objects: Correctly Classified
(Images taken from [2])
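The error rates and mean ranks reported above are instances of the metrics defined on the Performance Metrics slide. Here is a minimal sketch of computing them from per-class classifier scores on a labeled test set; names are illustrative, and it assumes every category appears at least once among the true labels.

```python
import numpy as np

def evaluate(scores, true_labels, num_classes):
    """scores: (num_images, num_classes) classifier outputs, higher is better."""
    confusion = np.zeros((num_classes, num_classes), dtype=int)
    ranks = [[] for _ in range(num_classes)]
    for s, j in zip(scores, true_labels):
        i = int(np.argmax(s))                  # predicted category
        confusion[i, j] += 1                   # m_ij: true class j classified as i
        order = np.argsort(-s)                 # categories from best to worst score
        ranks[j].append(int(np.where(order == j)[0][0]) + 1)
    accuracy = np.trace(confusion) / len(true_labels)
    error_rate = 1.0 - accuracy                # R = 1 - Accuracy
    mean_rank = [float(np.mean(r)) for r in ranks]   # MR per true category
    return confusion, error_rate, mean_rank
```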
Images with Multi-Category Objects
(Images taken from [2])

Conclusions
- Good results for the 7-category database.
- However, timing information (for training and testing) is not provided!
- SVMs are superior to Naïve Bayes.
- The method is robust to background clutter.
- An extension is to test on databases where the target object does NOT form a large fraction of the image.
- It may be necessary to include geometric information.

SVM Results for DB2
Confusion matrix for SVM on DB2 (table taken from [2]); entries are percentages, columns are the true categories.

                     Faces      Airplanes  Cars    Cars    Motorbikes
                     (frontal)  (side)     (rear)  (side)  (side)
  Faces (frontal)    94         0.4        0.7     0       1.4
  Airplanes (side)   1.5        96.3       0.2     0.1     2.7
  Cars (rear)        1.9        0.5        97.7    0       0.9
  Cars (side)        1.7        1.9        0.5     99.6    2.3
  Motorbikes (side)  0.9        1.9        0.9     0.3     92.7
  Mean rank          1.07       1.04       1.03    1.01    1.09

References
1. G. Csurka, C. Bray, C. Dance, and L. Fan. Visual Categorization with Bags of Keypoints. In Workshop on Statistical Learning in Computer Vision, ECCV, 2004.
2. G. Csurka, J. Willamowski, and C. Dance. Xerox Research Centre Europe, Grenoble, France. Weak Geometry for Visual Categorization. Presentation slides.
3. R. Mooney. Computer Science Department, University of Texas at Austin. CS 391L: Machine Learning - Text Categorization. Lecture slides.