Feature Representation – Vision BoWs and Beyond
Praveen Krishnan
Feature Representation in Vision
- Low level
  - Local detectors and descriptors
  - Bag of Words
- Mid level
  - Parts
  - Attributes
- Hierarchical
  - Deep representations
Low Level Vision
- Bag of Visual Words (BoWs)
  - Visual vocabulary
  - Vector quantization
  - Spatial verification
  - Inspirations from IR
- Advanced coding and pooling schemes
  - Soft quantization
  - Higher order representations
Bag of Words (BoWs)
A quick walk-through
- BoWs: an image is represented as a "bag" of its local patches (figure: image -> bag).
A quick walk-through
- Origins in text processing: Salton & McGill (1983).
  (Slide: ICCV 2005 short course, L. Fei-Fei)
A quick walk-through
- Origins in texture recognition: Julesz (1981); a texture is described by a histogram over a universal texton dictionary.
A quick walk-through
- BoWs representation pipeline:
  (i) interest point detection
  (ii) feature extraction
  (iii) vector quantization against a visual vocabulary
  (iv) coding and pooling
  (Figure courtesy: Tsai '12)
Devil is in the details
- Local detectors & descriptors: SIFT, HOG, LBP, ... (assume dense sampling at multiple scales)
- Vocabulary: k-means, approximate k-means, GMM
- Coding and pooling: histograms, kernel codebooks, sparse codes, LLC, Fisher kernels, super vectors, VLAD; average or max pooling
- Spatial verification: spatial pyramids, Min-Hash, LLAH
- Recognition & retrieval: SVMs; weighting schemes, query expansion, re-ranking, etc.
Feature Extraction
- Detection
  - Regular (dense) sampling: Fei-Fei et al. '05, Bosch et al. '06
  - Sparse / interest points: Mikolajczyk et al. '05, Csurka et al. '04
- Description
  - SIFT: Lowe '99
  - MSER: Matas et al. '02
  - HOG: Dalal et al. '05
  - and many more...
Learning the Visual Vocabulary (Visual Words / Codewords)
- Partition the local descriptor space into informative regions.
- Let {x_1, ..., x_N} be N SIFT descriptors sampled from a subset of the corpus.
- k-means clustering: find the codebook B = {b_1, ..., b_K} that minimizes the sum of squared Euclidean distances between points and their nearest cluster centers:
    min_B  sum_{i=1..N}  min_k || x_i - b_k ||^2
  Here B is the codebook (visual vocabulary); each cluster center is a visual word (sketch below).
  (Image patch examples: Sivic et al., ICCV '05)
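A minimal sketch of codebook learning with k-means, assuming scikit-learn and a matrix of precomputed SIFT descriptors (descriptor extraction itself is not shown; the random data is a stand-in):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def build_codebook(descriptors, vocab_size=1000, seed=0):
    """Cluster local descriptors (N x 128 for SIFT) into a visual vocabulary."""
    kmeans = MiniBatchKMeans(n_clusters=vocab_size, random_state=seed, batch_size=10000)
    kmeans.fit(descriptors)
    return kmeans.cluster_centers_   # codebook B: vocab_size x 128

# Hypothetical usage: descriptors pooled from a training subset of images.
descriptors = np.random.rand(50000, 128).astype(np.float32)  # stand-in for real SIFT features
codebook = build_codebook(descriptors, vocab_size=500)
```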
Visual Vocabulary: Issues
- Size of the vocabulary?
  - Too small: visual words are not representative of all patches.
  - Too large: quantization artifacts, overfitting.
- Generative or discriminative learning?
  - Gaussian mixture models (more later).
- Computational efficiency
  - Approximate k-means using randomized kd-trees: Philbin et al., CVPR '07.
  - Hierarchical k-means (vocabulary tree): Nister et al., CVPR '06.
Coding
- Vector quantization (VQ): assigns each local feature to the nearest visual word in the codebook.
- This is hard quantization: each descriptor contributes to exactly one codeword (sketch below).
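A sketch of hard assignment against the codebook learned above, followed by sum pooling into a normalized BoW histogram (function names are illustrative, not from the slides):

```python
import numpy as np

def vector_quantize(descriptors, codebook):
    """Hard-assign each descriptor (N x d) to its nearest codeword (K x d)."""
    # Squared Euclidean distances between every descriptor and every codeword.
    dists = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)        # index of the nearest visual word per descriptor

def bow_histogram(descriptors, codebook):
    """Sum-pooled and L1-normalized bag-of-words histogram for one image."""
    assignments = vector_quantize(descriptors, codebook)
    hist = np.bincount(assignments, minlength=len(codebook)).astype(np.float32)
    return hist / max(hist.sum(), 1.0)
```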
Pooling
- Invariance to changes in position and lighting conditions.
- Robustness to clutter.
- Compactness of representation.
- Types: sum (or average) pooling, max pooling.
- Caveat: pooling over the whole image discards the spatial geometry of the features.
Spatial Pooling
- Pyramid Match Kernel (Grauman et al., ICCV '05)
  - Weighted sum of histogram intersections at multiple resolutions.
  - Matches found at finer levels receive more weight than matches at coarser levels.
  - Used for matching sets of features in high-dimensional spaces.
- Spatial Pyramid Matching (Lazebnik et al., CVPR '06)
  - Partition the image into increasingly fine grids and concatenate the weighted histogram vectors from all pyramid levels, as sketched below.
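A minimal sketch of a spatial pyramid representation over 1x1, 2x2, 4x4 grids. The level weighting here is a simplified approximation of the scheme in Lazebnik et al.; the helper names are illustrative:

```python
import numpy as np

def spatial_pyramid(keypoints_xy, assignments, image_shape, vocab_size, levels=3):
    """Concatenate per-cell BoW histograms over 2^l x 2^l grids.

    keypoints_xy: N x 2 array of (x, y) keypoint locations
    assignments:  N visual-word indices (from hard quantization)
    """
    h, w = image_shape
    feats = []
    for level in range(levels):
        cells = 2 ** level
        # Finer levels receive larger weights (the exact level-0 weight in the
        # original paper differs slightly; this is a simplification).
        weight = 2.0 ** (level - levels + 1)
        cell_x = np.minimum((keypoints_xy[:, 0] / w * cells).astype(int), cells - 1)
        cell_y = np.minimum((keypoints_xy[:, 1] / h * cells).astype(int), cells - 1)
        for cy in range(cells):
            for cx in range(cells):
                mask = (cell_x == cx) & (cell_y == cy)
                hist = np.bincount(assignments[mask], minlength=vocab_size)
                feats.append(weight * hist)
    vec = np.concatenate(feats).astype(np.float32)
    return vec / max(vec.sum(), 1.0)
```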
Recognition & Retrieval
- Recognition
  - Discriminative methods: k-nearest neighbour, SVMs (non-linear kernels).
  - Generative methods: Naive Bayes; Bayesian topic models (pLSA, LDA).
- Ranking & retrieval (the focus of this talk)
  - Nearest neighbour search
  - Indexing
Ranking & Retrieval
- Similarity measures between BoW histograms p and q (sketch below):
  - Cosine similarity: p.q / (||p|| ||q||)
  - L1 distance: sum_i |p_i - q_i|
  - Chi-square distance: sum_i (p_i - q_i)^2 / (p_i + q_i)
  - Hellinger distance: computed on the square roots of the histogram entries, which applies a discount to large values.
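A sketch of these measures on L1-normalized histograms (the small epsilon guarding against division by zero is an assumption, not from the slides):

```python
import numpy as np

EPS = 1e-10

def cosine_similarity(p, q):
    return float(np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q) + EPS))

def l1_distance(p, q):
    return float(np.abs(p - q).sum())

def chi_square_distance(p, q):
    return float(((p - q) ** 2 / (p + q + EPS)).sum())

def hellinger_distance(p, q):
    # Square-rooting the bins discounts the influence of large values.
    return float(np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2))
```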
Ranking & Retrieval
- Earth Mover's Distance (EMD)
  - Computes the dissimilarity between two distributions.
  - Let S = {(s_1, w_1), ..., (s_m, w_m)} be a distribution with m elements and Q = {(q_1, u_1), ..., (q_n, u_n)} a distribution with n elements.
  - Find the flow F = [f_ij] that minimizes the overall cost sum_i sum_j f_ij d_ij, where d_ij is the ground distance between elements s_i and q_j, subject to the flow and capacity constraints.
  - EMD(S, Q) = (sum_i sum_j f_ij d_ij) / (sum_i sum_j f_ij). This is the classical transportation problem (sketch below).
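A didactic sketch that solves the underlying transportation LP with scipy.optimize.linprog for two small weighted signatures; the toy point sets are hypothetical, and this is not an optimized EMD solver:

```python
import numpy as np
from scipy.optimize import linprog

def emd(supply_pts, supply_w, demand_pts, demand_w):
    """Earth Mover's Distance between two weighted point sets (weights sum to 1)."""
    m, n = len(supply_pts), len(demand_pts)
    # Ground distance d_ij between element s_i and q_j (Euclidean here).
    D = np.linalg.norm(supply_pts[:, None, :] - demand_pts[None, :, :], axis=2)
    c = D.ravel()                            # decision variables: flows f_ij, row-major
    A_eq, b_eq = [], []
    for i in range(m):                       # each supply node ships exactly w_i
        row = np.zeros(m * n); row[i * n:(i + 1) * n] = 1
        A_eq.append(row); b_eq.append(supply_w[i])
    for j in range(n):                       # each demand node receives exactly u_j
        row = np.zeros(m * n); row[j::n] = 1
        A_eq.append(row); b_eq.append(demand_w[j])
    res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None), method="highs")
    return res.fun                           # total cost = EMD when weights sum to 1

# Hypothetical toy example with 2-D signature locations.
s = np.array([[0.0, 0.0], [1.0, 0.0]]); sw = np.array([0.5, 0.5])
q = np.array([[0.0, 1.0], [1.0, 1.0]]); qw = np.array([0.5, 0.5])
print(emd(s, sw, q, qw))   # ~1.0: each unit of mass moves a distance of 1
```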
Ranking & Retrieval
- Evaluation measures (illustrated on a cats-and-dogs database; sketch below)
  - Notation: TP = true positives, FP = false positives, TN = true negatives, FN = false negatives.
  - Precision (P) = TP / (TP + FP)
  - Recall (R) = TP / (TP + FN)
  - F-measure = 2PR / (P + R)
  - Mean Average Precision (mAP): mean over queries of the area under the precision-recall curve.
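A sketch of average precision for one ranked list and mAP over several queries (the interpolation-free formulation; variable names are illustrative):

```python
import numpy as np

def average_precision(relevant_ids, ranked_ids):
    """AP for one query: mean of the precision values at each relevant hit."""
    relevant = set(relevant_ids)
    hits, precisions = 0, []
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precisions.append(hits / rank)     # precision at this recall point
    return float(np.mean(precisions)) if precisions else 0.0

def mean_average_precision(queries):
    """queries: list of (relevant_ids, ranked_ids) pairs."""
    return float(np.mean([average_precision(r, ranked) for r, ranked in queries]))
```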
Re-ranking using Geometric Verification
- Use the position and shape of the underlying features to improve retrieval quality.
- Two candidate images may both have many raw matches; which one is geometrically consistent?
- Estimate a geometric transformation and discard matches that are outliers to it.
- Approaches: RANSAC, Hough transform.
  (Slide credit: Cordelia Schmid)
Re-ranking using Geometric Verification
- Fitting an affine transformation: given known correspondences, how do we recover the transformation?
- Each correspondence contributes two linear equations in the six affine parameters, so three or more correspondences allow a least-squares solution.
  (Slide credit: Cordelia Schmid)
Re-ranking using Geometric Verification
- RANSAC (Fischler & Bolles, 1981), illustrated for fitting a line; the same loop fits an affine transformation between images:
  1. Randomly select a minimal seed group of matches.
  2. Compute the transformation hypothesized by the seed group.
  3. Find the inliers consistent with this transformation (small error under the model).
  4. If the number of inliers is sufficiently large, re-compute a least-squares estimate of the transformation on all inliers.
  5. Repeat the hypothesize-and-verify loop and keep the transformation with the largest number of inliers (see the sketch below).
  (Slide credit: Kristen Grauman)
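A compact RANSAC sketch for an affine model between matched keypoints; the threshold and iteration count are placeholders, not values from the slides:

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine transform mapping src (N x 2) to dst (N x 2)."""
    A = np.hstack([src, np.ones((len(src), 1))])      # N x 3: [x y 1]
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)       # solve A @ M = dst, M is 3 x 2
    return M

def ransac_affine(src, dst, iters=500, thresh=3.0, seed=0):
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    homog = np.hstack([src, np.ones((len(src), 1))])
    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)  # minimal sample for affine
        M = fit_affine(src[idx], dst[idx])
        inliers = np.linalg.norm(homog @ M - dst, axis=1) < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Re-estimate on all inliers of the best hypothesis.
    M = fit_affine(src[best_inliers], dst[best_inliers])
    return M, best_inliers
```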
Inspirations from IR
- Making retrieval faster
  - Inverted indexing: a reverse look-up from each visual word to the images that contain it (sketch below).
  - Enables fast search by exploiting the sparsity of the BoW representation (a #visual-words by #images structure).
    (Image courtesy: Jawahar et al., DAS '14)
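A minimal inverted-index sketch: postings map each visual word to the images (and term counts) that contain it, so only images sharing a word with the query are scored; the names are illustrative:

```python
from collections import defaultdict

def build_inverted_index(image_histograms):
    """image_histograms: dict image_id -> {visual_word: count} (sparse BoW)."""
    index = defaultdict(list)                 # visual_word -> [(image_id, count), ...]
    for image_id, hist in image_histograms.items():
        for word, count in hist.items():
            index[word].append((image_id, count))
    return index

def candidate_images(query_hist, index):
    """Score only images that share at least one visual word with the query."""
    scores = defaultdict(float)
    for word, q_count in query_hist.items():
        for image_id, count in index.get(word, []):
            scores[image_id] += q_count * count      # unweighted dot product
    return sorted(scores.items(), key=lambda kv: -kv[1])
```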
Inspirations from IR
- Weighting schemes
  - Zipf's law: the frequency of any word is inversely proportional to its rank in the frequency table.
  - TF-IDF weighting: weight each visual word by its term frequency in the image times log(N / n_w), where N is the number of images and n_w is the number of images containing word w (sketch below).
  - Stop words: discard the top ~5% most frequent visual words.
    (Image courtesy: Wikipedia)
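A sketch of tf-idf weighting for BoW histograms, following the log(N / n_w) form above; the smoothing constant is an assumption to avoid division by zero:

```python
import numpy as np

def tfidf_weights(histograms):
    """histograms: M x K matrix of raw visual-word counts (M images, K words)."""
    n_images = histograms.shape[0]
    doc_freq = (histograms > 0).sum(axis=0)                    # n_w: images containing word w
    idf = np.log(n_images / (doc_freq + 1e-10))                # inverse document frequency
    tf = histograms / np.maximum(histograms.sum(axis=1, keepdims=True), 1)
    return tf * idf                                            # tf-idf weighted histograms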
Inspirations from IR
- Improving recall
  - Query expansion: reformulating the query to increase its expressiveness, e.g. by adding synonyms or jittered versions.
  - Loop: query image -> results -> spatial verification -> form a new query from the verified results -> new results -> repeat.
    (Chum et al., Total Recall, ICCV '07)
Inspirations from IR
- Query expansion variants (Chum et al., Total Recall, ICCV '07)
  - Baseline (no expansion).
  - Transitive closure expansion, using a priority queue.
  - Average query expansion (see the sketch below).
  - Recursive average query expansion.
  - Multiple image resolution expansion:
    - Compute the median image resolution of the verified results.
    - Formulate queries for the resolution bands (0, 4/5), (2/3, 3/2), (5/4, infinity).
    - Run average query expansion for each band.
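A sketch of average query expansion: spatially verified top results are averaged with the original query histogram and the query is re-issued. The search and verification functions are assumed to exist elsewhere; the cut-off parameters are placeholders:

```python
import numpy as np

def average_query_expansion(query_hist, search_fn, verify_fn, top_k=50, max_verified=10):
    """search_fn(hist) -> ranked [(image_id, hist), ...]; verify_fn(image_id) -> bool."""
    ranked = search_fn(query_hist)[:top_k]
    verified = [hist for image_id, hist in ranked if verify_fn(image_id)][:max_verified]
    if not verified:
        return ranked                               # nothing verified: keep original results
    # New query: average of the original query and the verified result histograms.
    expanded = np.mean(np.vstack([query_hist] + verified), axis=0)
    return search_fn(expanded)
```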
Advanced coding schemes
Lost in Quantization
- Hard quantization (VQ) issues:
  - Codeword uncertainty: a descriptor may be almost equidistant from several codewords, yet votes for only one.
  - Codeword plausibility: a descriptor may be far from every codeword, yet is still assigned with full weight.
Modeling Uncertainty
- Kernel codebooks
  - Allow a degree of ambiguity when assigning codewords to image features.
  - Use kernel density estimation: each descriptor contributes to nearby codewords with a kernel weight.
  - The kernel size determines the amount of smoothing; the kernel shape is related to the distance function.
    (van Gemert et al., Kernel Codebooks for Scene Categorization, ECCV 2008)
Modeling Uncertainty
- Codeword uncertainty: distribute each descriptor's vote over the codewords in proportion to its kernel-weighted, per-descriptor-normalized similarity to each codeword (sketch below).
- Codeword plausibility: vote only for the nearest codeword, but weight the vote by the kernel value, so distant (implausible) assignments count less.
  (van Gemert et al., Kernel Codebooks for Scene Categorization, ECCV 2008)
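A sketch of kernel-codebook soft assignment with a Gaussian kernel, implementing the uncertainty-style (normalized) encoding; sigma is a free smoothing parameter, not a value from the paper:

```python
import numpy as np

def kernel_codebook_encode(descriptors, codebook, sigma=100.0):
    """Soft-assign each descriptor to all codewords with a Gaussian kernel,
    normalize per descriptor (codeword uncertainty), then average-pool."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)  # N x K
    k = np.exp(-d2 / (2.0 * sigma ** 2))                 # kernel responses
    k /= np.maximum(k.sum(axis=1, keepdims=True), 1e-10) # normalize over codewords
    return k.mean(axis=0)                                # pooled image representation
```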
Encoding – Sparse Coding (Yang et al., CVPR '09)
- From VQ to SC: VQ solves min_{U,V} sum_i ||x_i - u_i V||^2 with each code u_i constrained to a single non-zero entry equal to 1, which is too restrictive. Relax the cardinality constraint with an L1 penalty:
    min_{U,V} sum_i ||x_i - u_i V||^2 + lambda ||u_i||_1
- Max pooling over the sparse codes: z_j = max_i |u_ij|.
Encoding – Sparse Coding
- Alternating optimization:
  - Fix the dictionary V and solve for the codes U (a LASSO problem per descriptor).
  - Fix U and solve for V (a constrained least-squares problem).
- Linear classification on the max-pooled, spatial-pyramid sparse codes (linear SPM kernel), as sketched below.
  (Yang et al., CVPR '09)
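A sketch of the coding step for a fixed dictionary plus max pooling, assuming scikit-learn's sparse_encode for the LASSO sub-problem; alpha is a placeholder regularization weight and the random data stands in for real descriptors and a learned dictionary:

```python
import numpy as np
from sklearn.decomposition import sparse_encode

def sc_max_pool(descriptors, dictionary, alpha=0.15):
    """Encode descriptors (N x d) over a fixed dictionary (K x d) with LASSO,
    then max-pool the absolute codes into one K-dimensional image vector."""
    codes = sparse_encode(descriptors, dictionary, algorithm="lasso_lars", alpha=alpha)
    return np.abs(codes).max(axis=0)       # z_j = max_i |u_ij|

# Hypothetical usage with a dictionary learned offline (e.g. by alternating with a
# least-squares dictionary update, or with sklearn's DictionaryLearning).
descriptors = np.random.rand(200, 128)
dictionary = np.random.rand(1024, 128)
image_feature = sc_max_pool(descriptors, dictionary)
```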
Encoding – Locality
- Sparse-coding solutions tend to be local anyway; is locality more essential than sparsity?
- Local Coordinate Coding (LCC); Locality-constrained Linear Coding (LLC), Wang et al., CVPR '10.
- LLC drops the sparsity term and enforces locality explicitly:
    min_C sum_i ||x_i - B c_i||^2 + lambda ||d_i ⊙ c_i||^2   s.t. 1^T c_i = 1
  Here ⊙ denotes element-wise multiplication and d_i is the locality adaptor, which gives different weights to different basis vectors according to their similarity to x_i.
Encoding - LLC
- Comparison with VQ and SC:
  - Better reconstruction than VQ.
  - Locally smooth sparsity: similar descriptors get similar codes.
  - Analytical solution, so no iterative optimization is needed; in practice an approximate version codes each descriptor over its k nearest bases, as sketched below.
  (Wang et al., CVPR '10)
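A sketch of the approximated LLC coding step in the spirit of Wang et al. (k nearest bases plus a small constrained least-squares solve); the regularization constant, k, and the final max pooling are placeholder choices:

```python
import numpy as np

def llc_encode(descriptors, codebook, knn=5, beta=1e-4):
    """Approximate LLC: code each descriptor over its k nearest codewords and
    solve the small constrained least-squares system in closed form."""
    N, K = len(descriptors), len(codebook)
    codes = np.zeros((N, K))
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    for i, x in enumerate(descriptors):
        idx = np.argsort(d2[i])[:knn]                 # k nearest bases
        z = codebook[idx] - x                         # shift bases to the origin at x
        C = z @ z.T                                   # local covariance (knn x knn)
        C += np.eye(knn) * beta * np.trace(C)         # regularization for stability
        w = np.linalg.solve(C, np.ones(knn))
        codes[i, idx] = w / w.sum()                   # enforce the sum-to-one constraint
    return np.abs(codes).max(axis=0)                  # max-pooled image representation
```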
Interpretation so far...
- Sparse coding: discover subspaces.
  - Each basis is a "direction".
  - Sparsity: each datum is a linear combination of only a few bases.
  - Related to topic models.
- Locality-constrained coding: geometry of the data manifold.
  - Each basis is an "anchor point".
  - Sparsity is induced by locality: each datum is a linear combination of neighbouring anchors.
  (Slide credit: Kai Yu)