Feature Representation – Vision BoWs and Beyond
Praveen Krishnan
Feature Representation in Vision
- Low level: local detectors and descriptors; Bag of Words
- Mid level: parts; attributes
- Hierarchical: deep representations
Low Level Vision
- Bag of Visual Words (BoWs): visual vocabulary, vector quantization
- Spatial verification
- Inspirations from IR
- Advanced coding and pooling schemes: soft quantization, higher-order representations
Bag of Words (BoWs)
A quick walk-through
BoWs: an image is represented as an unordered bag of its local patches (visual words). [Figure: image decomposed into a bag of patches]
A quick walk-through
Origins in text processing: the bag-of-words document model, Salton & McGill (1983).
Slide: ICCV 2005 short course, L. Fei-Fei
A quick walk-through
Origins in texture recognition (Julesz, 1981): a texture is characterized by a histogram over a universal texton dictionary.
A quick walk-through
BoWs representation:
(i) Interest point detection
(ii) Feature extraction
(iii) Vector quantization against a visual vocabulary
(iv) Coding and pooling
Figure courtesy: Tsai '12
Devil is in the details
- Local detectors & descriptors: SIFT, HOG, LBP, … (assume dense sampling at multiple scales)
- Vocabulary: k-means, approximate k-means, GMM
- Coding and pooling: histograms, kernel codebooks, sparse codes, LLC, Fisher kernels, super vectors, VLAD; average or max pooling
- Spatial verification: spatial pyramids, Min Hash, LLAH
- Recognition & retrieval: SVMs; weighting schemes, query expansion, re-ranking, etc.
Feature Extraction
Detection:
- Regular (dense) grid: Fei-Fei et al. '05, Bosch et al. '06
- Sparse / interest points: Csurka et al. '04, Mikolajczyk et al. '05; MSER, Matas et al. '02
Description:
- SIFT, Lowe '99; HoG, Dalal et al. '05; and many more…
Visual Words / Learning the Visual Vocabulary (Codebook)
Partition the local descriptor space into informative regions (codewords). Cluster N SIFT descriptors sampled from a subset of the entire corpus, typically with k-means: minimize the sum of squared Euclidean distances between points and their nearest cluster centers,
$\min_{B} \sum_{i=1}^{N} \min_{k} \| x_i - b_k \|^2$,
where $B = \{b_1, \dots, b_K\}$ is the codebook.
Image patch examples: Sivic et al. ICCV '05
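A minimal sketch of vocabulary construction with scikit-learn's MiniBatchKMeans; the library choice, function name, and parameter values are illustrative assumptions, not from the slides:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def build_vocabulary(descriptors, num_words=1000, seed=0):
    """Cluster local descriptors (e.g., 128-D SIFT) into a visual vocabulary.

    descriptors: (N, D) array sampled from a subset of the corpus.
    Returns the (num_words, D) codebook B of cluster centers.
    """
    kmeans = MiniBatchKMeans(n_clusters=num_words, random_state=seed)
    kmeans.fit(descriptors)
    return kmeans.cluster_centers_

# Example: random stand-ins for 100k SIFT descriptors.
B = build_vocabulary(np.random.rand(100_000, 128), num_words=1000)
```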
Visual Vocabulary: Issues
- Size of vocabulary?
  - Too small: visual words are not representative of all patches.
  - Too large: quantization artifacts, overfitting.
- Generative or discriminative learning? Gaussian mixture models (more later).
- Computational efficiency:
  - Approximate k-means using randomized kd-trees, Philbin et al. CVPR '07
  - Hierarchical k-means (vocabulary tree), Nistér et al. CVPR '06
Coding
Vector quantization: assign each feature to the nearest visual word in the codebook, i.e. hard quantization:
$u_{ik} = 1$ if $k = \arg\min_j \| x_i - b_j \|^2$, and $u_{ik} = 0$ otherwise.
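A one-function sketch of hard assignment (NumPy/SciPy; the function name is my own):

```python
import numpy as np
from scipy.spatial.distance import cdist

def hard_assign(descriptors, codebook):
    """Index of the nearest codeword for each descriptor (hard VQ)."""
    return cdist(descriptors, codebook).argmin(axis=1)
```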
Pooling
Aggregate the per-feature codes into a single image-level vector. Desiderata:
- Invariance to changes in position and lighting conditions
- Robustness to clutter
- Compactness of representation
Types: sum (or average) pooling; max pooling.
There goes the geometry, too: pooling over the whole image discards all spatial layout. (A coding-and-pooling sketch follows.)
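A self-contained sketch of hard quantization followed by pooling; note that with binary one-hot codes max pooling degenerates to a presence indicator, which is why it only becomes interesting with the soft and sparse codes later:

```python
import numpy as np
from scipy.spatial.distance import cdist

def bow_vector(descriptors, codebook, pooling="sum"):
    """Hard-quantize descriptors, then pool the one-hot codes."""
    K = len(codebook)
    a = cdist(descriptors, codebook).argmin(axis=1)   # nearest codeword ids
    if pooling == "sum":                              # sum/average pooling
        hist = np.bincount(a, minlength=K).astype(float)
        return hist / max(hist.sum(), 1.0)            # L1-normalized histogram
    hist = np.zeros(K)                                # max pooling: one-hot
    hist[np.unique(a)] = 1.0                          # codes make this binary
    return hist
```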
Spatial Pooling
Pyramid Match Kernel (Grauman et al. ICCV '05):
- Weighted sum of histogram intersections at multiple resolutions, with more weight for matches found at fine levels than at coarse levels.
- Used for matching in high-dimensional spaces.
Spatial Pyramid Matching (Lazebnik et al. CVPR '06):
- Apply the pyramid in image space: concatenate the weighted histogram vectors of all cells at all pyramid levels.
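A sketch of spatial-pyramid pooling under the usual 2^l x 2^l grids and level weights; keypoint coordinates, the default level count, and the final normalization are my assumptions:

```python
import numpy as np
from scipy.spatial.distance import cdist

def spatial_pyramid(descriptors, xy, image_wh, codebook, L=2):
    """Concatenate BoW histograms over 2^l x 2^l grids (l = 0..L),
    down-weighting coarser levels as in Lazebnik et al."""
    W, H = image_wh
    K = len(codebook)
    a = cdist(descriptors, codebook).argmin(axis=1)
    feats = []
    for l in range(L + 1):
        n = 2 ** l
        w = 2.0 ** (-L) if l == 0 else 2.0 ** (l - L - 1)  # level weights
        # Cell index of each keypoint at this pyramid level.
        cx = np.minimum((xy[:, 0] * n / W).astype(int), n - 1)
        cy = np.minimum((xy[:, 1] * n / H).astype(int), n - 1)
        cell = cy * n + cx
        for c in range(n * n):
            feats.append(w * np.bincount(a[cell == c], minlength=K))
    v = np.concatenate(feats).astype(float)
    return v / max(v.sum(), 1.0)  # L1 normalize
```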
Recognition & Retrieval
Recognition:
- Discriminative methods: k-nearest neighbor; SVMs with non-linear kernels
- Generative methods: Naïve Bayes; Bayesian topic models (pLSA, LDA)
Ranking & retrieval (agenda for this talk):
- Nearest neighbor search
- Indexing
Ranking & Retrieval: Similarity Measures
For histograms $p$ and $q$:
- Cosine distance: $1 - \frac{p \cdot q}{\|p\|\,\|q\|}$
- L1 distance: $\sum_i |p_i - q_i|$
- Chi-square distance: $\sum_i \frac{(p_i - q_i)^2}{p_i + q_i}$
- Hellinger kernel: $\sum_i \sqrt{p_i q_i}$; the square root applies a discount to large bin values.
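Straightforward NumPy versions of these measures (the small epsilons guarding division by zero are my addition):

```python
import numpy as np

def cosine_dist(p, q):
    return 1.0 - p @ q / (np.linalg.norm(p) * np.linalg.norm(q) + 1e-12)

def l1_dist(p, q):
    return np.abs(p - q).sum()

def chi2_dist(p, q):
    return ((p - q) ** 2 / (p + q + 1e-12)).sum()

def hellinger_kernel(p, q):
    # Square roots discount the influence of large (bursty) bins.
    return np.sqrt(p * q).sum()
```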
Ranking & Retrieval: Earth Mover's Distance (EMD)
Computes the dissimilarity between two distributions. Let $S = \{(s_1, w_{s_1}), \dots, (s_m, w_{s_m})\}$ be a distribution with m elements and $Q = \{(q_1, w_{q_1}), \dots, (q_n, w_{q_n})\}$ one with n elements, and let $d_{ij}$ be the ground distance between elements $s_i$ and $q_j$. Find the flow $F = [f_{ij}]$ that minimizes the overall cost $\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} d_{ij}$ subject to non-negativity and the marginal constraints; then
$\mathrm{EMD}(S, Q) = \frac{\sum_{i,j} f_{ij} d_{ij}}{\sum_{i,j} f_{ij}}$.
This is an instance of the transportation problem.
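A sketch of that transportation LP with scipy.optimize.linprog; for simplicity it assumes equal total weights so the marginal constraints are equalities:

```python
import numpy as np
from scipy.optimize import linprog
from scipy.spatial.distance import cdist

def emd(s, ws, q, wq):
    """EMD between weighted point sets, assuming ws.sum() == wq.sum()."""
    m, n = len(s), len(q)
    d = cdist(s, q)                      # ground distances d_ij
    A_eq, b_eq = [], []
    for i in range(m):                   # row marginals: sum_j f_ij = ws_i
        row = np.zeros(m * n); row[i * n:(i + 1) * n] = 1
        A_eq.append(row); b_eq.append(ws[i])
    for j in range(n):                   # column marginals: sum_i f_ij = wq_j
        col = np.zeros(m * n); col[j::n] = 1
        A_eq.append(col); b_eq.append(wq[j])
    res = linprog(d.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None))
    return res.fun / ws.sum()            # minimal cost / total flow
```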
Ranking & Retrieval: Evaluation Measures
(Example query on the Cats and Dogs database.)
Notation: TP = true positives, FP = false positives, TN = true negatives, FN = false negatives.
- Precision: $P = \frac{TP}{TP + FP}$
- Recall: $R = \frac{TP}{TP + FN}$
- F-measure: $F = \frac{2PR}{P + R}$
- Mean Average Precision (mAP): the area under the precision-recall curve, averaged over all queries.
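A minimal average-precision computation over one ranked result list (binary relevance labels assumed; mAP is the mean of AP over all queries):

```python
import numpy as np

def average_precision(relevant):
    """AP of a ranked list; `relevant` is a 0/1 array in rank order."""
    relevant = np.asarray(relevant, dtype=float)
    hits = np.cumsum(relevant)
    precision_at_k = hits / (np.arange(len(relevant)) + 1)
    # Average precision at the ranks of the relevant items.
    return (precision_at_k * relevant).sum() / max(relevant.sum(), 1.0)

print(average_precision([1, 0, 1, 1, 0]))  # ~0.806
```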
Re-ranking using Geometric Verification
Use the position and shape of the underlying features to improve retrieval quality. Both candidate images may have many matches; which ones are correct? Estimate a geometric transformation to remove the outliers.
Approaches: RANSAC; Hough transform.
Slide credit: Cordelia Schmid
Re-ranking using Geometric Verification
Fitting an affine transformation: assuming we know the correspondences, how do we get the transformation? Each correspondence $(x_i, y_i) \leftrightarrow (x_i', y_i')$ gives two linear equations in the six affine parameters,
$\begin{bmatrix} x_i' \\ y_i' \end{bmatrix} = \begin{bmatrix} m_1 & m_2 \\ m_3 & m_4 \end{bmatrix} \begin{bmatrix} x_i \\ y_i \end{bmatrix} + \begin{bmatrix} t_1 \\ t_2 \end{bmatrix}$,
so three or more correspondences give an over-determined system solvable by least squares (see the RANSAC sketch after the next slide).
Slide credit: Cordelia Schmid
Re-ranking using Geometric Verification
RANSAC (Fischler & Bolles, 1981), e.g. for fitting a line: a hypothesize-and-verify loop.
1. Randomly select a minimal seed group of matches.
2. Compute the transformation from the seed group.
3. Find the inliers to this transformation (points consistent with the model under the error function).
4. If the number of inliers is sufficiently large, re-compute a least-squares estimate of the transformation on all of the inliers.
5. Repeat, and keep the transformation with the largest number of inliers.
Slide credit: Kristen Grauman
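A compact RANSAC sketch for the affine case; the iteration count and inlier threshold are illustrative assumptions:

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine transform mapping src (N,2) onto dst (N,2)."""
    A = np.hstack([src, np.ones((len(src), 1))])   # rows [x y 1]
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)    # (3,2) affine parameters
    return M

def ransac_affine(src, dst, iters=500, thresh=3.0, seed=0):
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)   # minimal seed
        M = fit_affine(src[idx], dst[idx])
        pred = np.hstack([src, np.ones((len(src), 1))]) @ M
        inliers = np.linalg.norm(pred - dst, axis=1) < thresh
        if inliers.sum() > best.sum():                      # verify
            best = inliers
    # Re-estimate on all inliers of the best hypothesis.
    return fit_affine(src[best], dst[best]), best
```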
Inspirations from IR: Making Retrieval Faster
Inverted indexing: a reverse lookup from each visual word to the images containing it. Enables fast search by exploiting the sparsity of the representation (the index is a sparse #images x #visual-words matrix).
Image courtesy: Jawahar et al. DAS '14
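A toy inverted index as a dictionary from word id to a postings list; the plain count-based scoring rule is an assumption for brevity (TF-IDF weighting comes next):

```python
from collections import defaultdict, Counter

class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(list)  # word id -> [(image id, count), ...]

    def add(self, image_id, word_ids):
        for w, c in Counter(word_ids).items():
            self.postings[w].append((image_id, c))

    def query(self, word_ids):
        """Score only images sharing at least one word with the query."""
        scores = Counter()
        for w, qc in Counter(word_ids).items():
            for image_id, c in self.postings[w]:
                scores[image_id] += qc * c
        return scores.most_common()
```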
Inspirations from IR: Weighting Schemes
Zipf's law: the frequency of any word is inversely proportional to its rank.
TF-IDF weighting: weight visual word i in image d by
$t_i = \frac{n_{id}}{n_d} \log \frac{N}{n_i}$,
where $n_{id}$ is the number of occurrences of word i in d, $n_d$ the total number of words in d, $n_i$ the number of images containing word i, and $N$ the total number of images.
Stop words: the top 5% most frequent visual words are discarded.
Image courtesy: Wikipedia
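A minimal TF-IDF weighting of a stack of raw BoW count vectors (NumPy; the guards against empty rows/columns are my addition):

```python
import numpy as np

def tfidf(bow):
    """bow: (num_images, num_words) raw count matrix -> TF-IDF weights."""
    tf = bow / np.maximum(bow.sum(axis=1, keepdims=True), 1)  # n_id / n_d
    df = (bow > 0).sum(axis=0)                                # n_i
    idf = np.log(len(bow) / np.maximum(df, 1))                # log(N / n_i)
    return tf * idf
```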
Inspirations from IR: Improving the Recall
Query expansion: reformulate the query to increase its expressiveness, e.g. by adding synonyms, jittering, etc.
Pipeline: query image → results → spatial verification → new query → new results → repeat.
Chum et al., Total Recall, ICCV '07
Inspirations from IR: Query Expansion Variants
- Baseline
- Transitive closure expansion: uses a priority queue.
- Average query expansion (see the sketch below)
- Recursive average query expansion
- Multiple image-resolution expansion: compute the median image resolution, formulate queries for the other resolution bands, (0, 4/5), (2/3, 3/2), (5/4, infinity), and do average query expansion for each band.
Chum et al., Total Recall, ICCV '07
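A hedged sketch of average query expansion in the spirit of the above: average the vectors of the top spatially verified results into the query and re-query; the scoring rule, cutoffs, and the verification callback are all my assumptions:

```python
import numpy as np

def average_query_expansion(q, index_vectors, verify, top_m=5):
    """q: query BoW vector; index_vectors: (num_images, num_words);
    verify(image_id) -> bool is the spatial-verification check."""
    ranked = np.argsort(-(index_vectors @ q))       # initial results
    verified = [i for i in ranked[:50] if verify(i)][:top_m]
    if not verified:
        return ranked                               # fall back to baseline
    # Average the query with its verified results, then re-query.
    q_new = (q + index_vectors[verified].sum(axis=0)) / (1 + len(verified))
    return np.argsort(-(index_vectors @ q_new))
```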
Advanced coding schemes
Lost in Quantization
Issues with hard quantization (VQ):
- Codeword uncertainty: a feature may be almost equally close to several codewords.
- Codeword plausibility: a feature may be far from every codeword, yet still gets assigned to one.
Modeling Uncertainty: Kernel Codebooks
Allow a degree of ambiguity in assigning codewords to image features, using kernel density estimation:
- The kernel size determines the amount of smoothing.
- The kernel shape is related to the distance function.
Gemert et al., Kernel Codebooks for Scene Categorization, ECCV 2008
Modeling Uncertainty: Kernel Codebooks
With kernel $K_\sigma$ and distance $D$, for codeword $w$ over image features $\{x_1, \dots, x_n\}$:
- Codeword uncertainty: $\mathrm{UNC}(w) = \frac{1}{n} \sum_{i=1}^{n} \frac{K_\sigma(D(w, x_i))}{\sum_{j=1}^{|B|} K_\sigma(D(w_j, x_i))}$, i.e. kernel responses normalized over the whole codebook $B$.
- Codeword plausibility: only the best-matching codeword receives the (possibly small) kernel value, $\mathrm{PLA}(w) = \frac{1}{n} \sum_{i=1}^{n} K_\sigma(D(w, x_i)) \, [\![\, w = \arg\min_{v \in B} D(v, x_i) \,]\!]$.
Gemert et al., Kernel Codebooks for Scene Categorization, ECCV 2008
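A sketch of Gaussian-kernel soft assignment corresponding to the uncertainty formulation; the σ value is an illustrative assumption:

```python
import numpy as np
from scipy.spatial.distance import cdist

def soft_assign(descriptors, codebook, sigma=100.0):
    """Kernel-codebook soft assignment: (N, K) responsibilities that sum
    to 1 per descriptor (the 'codeword uncertainty' formulation)."""
    d2 = cdist(descriptors, codebook, "sqeuclidean")
    k = np.exp(-d2 / (2 * sigma ** 2))               # Gaussian kernel
    return k / np.maximum(k.sum(axis=1, keepdims=True), 1e-12)
```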
Encoding – Sparse Coding
From VQ to SC: the VQ constraint that each feature uses exactly one codeword,
$\min_{U,V} \sum_{i} \| x_i - u_i V \|^2 \;\; \text{s.t.} \;\; \|u_i\|_0 = 1,\; u_i \ge 0,\; \mathbf{1}^\top u_i = 1$,
is too restrictive. Relax the cardinality constraint with an L1 penalty,
$\min_{U,V} \sum_{i} \| x_i - u_i V \|^2 + \lambda \|u_i\|_1$,
and aggregate the sparse codes with max pooling.
Yang et al., CVPR '09
Encoding – Sparse Coding
Solve by alternating optimization:
- Fix V and solve for U: an L1-regularized least-squares problem per feature (LASSO).
- Fix U and solve for V: a constrained least-squares problem.
Then apply linear classification on the max-pooled SPM features.
Yang et al., CVPR '09
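A sketch of the encoding step with scikit-learn's sparse_encode followed by max pooling; the alpha value is an illustrative assumption:

```python
import numpy as np
from sklearn.decomposition import sparse_encode

def sc_max_pool(descriptors, codebook, alpha=0.15):
    """Sparse-code each descriptor over the codebook (rows of `codebook`),
    then max-pool absolute code values into one image-level vector."""
    U = sparse_encode(descriptors, codebook,
                      algorithm="lasso_lars", alpha=alpha)  # (N, K) codes
    return np.abs(U).max(axis=0)                            # (K,) max pooling
```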
Encoding
SC results tend to be local: is locality more essential than sparsity? (Local Coordinate Coding, LCC.)
Locality-constrained Linear Coding (LLC): drop the sparsity term and invoke the locality term explicitly,
$\min_{U} \sum_{i} \| x_i - B u_i \|^2 + \lambda \| d_i \odot u_i \|^2 \;\; \text{s.t.} \;\; \mathbf{1}^\top u_i = 1$,
where $\odot$ denotes element-wise multiplication and $d_i$ is the locality adaptor that gives a different weight to each basis vector according to its similarity to $x_i$.
Wang et al., CVPR '10
Encoding – LLC
Comparison with VQ and SC:
- Better reconstruction
- Local, smooth sparsity
- Analytical solution
Wang et al., CVPR '10
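A sketch in the spirit of the paper's fast approximated LLC: restrict each feature to its k nearest codewords and solve the small sum-to-one-constrained least-squares system in closed form; the regularization constant beta is an assumption:

```python
import numpy as np
from scipy.spatial.distance import cdist

def llc_encode(x, B, k=5, beta=1e-4):
    """Approximated LLC code for one descriptor x over codebook B (K, D)."""
    nn = np.argsort(cdist(x[None], B)[0])[:k]   # k nearest codewords
    z = B[nn] - x                               # shift basis to the origin
    C = z @ z.T + beta * np.eye(k)              # local covariance (regularized)
    w = np.linalg.solve(C, np.ones(k))
    w /= w.sum()                                # enforce sum-to-one constraint
    u = np.zeros(len(B))
    u[nn] = w                                   # codes are zero off the k-NN
    return u
```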
Interpretation so far…
Sparse coding:
- Discovers subspaces.
- Each basis is a "direction".
- Sparsity: each datum is a linear combination of only several bases.
- Related to topic models.
LCC / LLC:
- Captures the geometry of the data manifold.
- Each basis is an "anchor point".
- Sparsity is induced by locality: each datum is a linear combination of neighboring anchors.
Slide Credit: Kai Yu