Feature Representation in Vision: BoWs and Beyond - Praveen Krishnan (PPT presentation)

  1. Feature Representation – Vision BoWs and Beyond Praveen Krishnan

  2. Feature Representation in Vision
     - Low level: local detectors and descriptors, Bag of Words
     - Mid level: parts, attributes
     - Hierarchical: deep representations

  3. Low Level Vision
     - Bag of Visual Words (BoWs): visual vocabulary, vector quantization, spatial verification
     - Inspirations from IR
     - Advanced coding and pooling schemes: soft quantization, higher-order representations

  4. Bag of Words (BoWs)

  5. A quick walk through: BoWs - an image represented as a bag of visual words (figure slide).

  6. A quick walk through
     - Origins in text processing: Salton & McGill (1983).
     Slide: ICCV 2005 short course, L. Fei-Fei

  7. A quick walk through
     - Origins in texture recognition: Julesz (1981) - a histogram over a universal texton dictionary.

  8. A quick walk through: the BoWs representation
     (i) Interest point detection
     (ii) Feature extraction
     (iii) Vector quantization against a visual vocabulary
     (iv) Coding and pooling
     Figure courtesy: Tsai '12

  9. Devil is in the details
     - Local detectors & descriptors: SIFT, HOG, LBP, ...
     - Vocabulary: k-means, approximate k-means, GMM
     - Coding and pooling: histograms, kernel codebooks, sparse codes, LLC, Fisher kernels, super vectors, VLAD; average or max pooling
     - Spatial verification: spatial pyramids, Min-Hash, LLAH
     - Recognition & retrieval: SVMs; weighting schemes, query expansion, re-ranking, etc.

  10. Devil is in the details (contd.)
     - Same pipeline as the previous slide, with the added note that dense sampling at multiple scales is assumed for the local detectors & descriptors.

  11. Feature Extraction
     - Detection
       - Regular / dense grid: Fei-Fei et al. '05, Bosch et al. '06
       - Sparse / interest points: Mikolajczyk et al. '05, Csurka et al. '04
     - Description
       - SIFT: Lowe '99
       - MSER: Matas et al. '02
       - HoG: Dalal et al. '05
       - and many more...
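
To make the detection/description step concrete, here is a minimal sketch using OpenCV's SIFT (an assumed tool choice, not taken from the slides); the image path is a placeholder.

```python
import cv2

# Minimal sketch: detect interest points and compute SIFT descriptors.
# "image.jpg" is a placeholder path, not from the slides.
img = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()                      # available in OpenCV >= 4.4
keypoints, descriptors = sift.detectAndCompute(img, None)

# descriptors is a (num_keypoints x 128) array of local SIFT features;
# these are the inputs to vocabulary learning and quantization below.
print(len(keypoints), descriptors.shape)
```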

  12. Visual Words / Learning a Visual Vocabulary
     - Partition the local descriptor space into informative regions (code words).
     - Cluster N SIFT descriptors sampled from a subset of the corpus, e.g. with k-means: minimize, over the codebook B = {b_1, ..., b_K}, the sum of squared Euclidean distances between each descriptor and its nearest cluster center, i.e. sum_i min_k ||x_i - b_k||^2.
     - Image patch examples: Sivic et al., ICCV '05
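
A minimal sketch of vocabulary learning with mini-batch k-means from scikit-learn; the vocabulary size, file name, and library choice are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# descriptors: (N x 128) SIFT vectors sampled from a subset of the corpus.
descriptors = np.load("sampled_sift.npy")     # placeholder file name

K = 1000                                      # vocabulary size (illustrative)
kmeans = MiniBatchKMeans(n_clusters=K, batch_size=10000, random_state=0)
kmeans.fit(descriptors)

codebook = kmeans.cluster_centers_            # B: (K x 128) visual words
```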

  13. Visual vocabulary - Issues
     - Size of vocabulary?
       - Too small: visual words are not representative of all patches.
       - Too large: quantization artifacts, overfitting.
     - Generative or discriminative learning? Gaussian mixture models (more later).
     - Computational efficiency:
       - Approximate k-means using randomized kd-trees: Philbin et al., CVPR '07
       - Hierarchical k-means: Nister et al., CVPR '06

  14. Coding
     - Vector quantization (VQ): assign each feature to the nearest visual word in the codebook.
     - This is hard quantization.
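
A small sketch of hard vector quantization against the codebook from the previous snippet; plain NumPy, with the helper name invented for illustration.

```python
import numpy as np

def vector_quantize(descriptors, codebook):
    """Hard assignment: index of the nearest visual word for each descriptor."""
    # Squared Euclidean distances between all descriptors and all code words.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)                  # one code word per descriptor

# Example: histogram ("bag") of visual words for one image.
# assignments = vector_quantize(descriptors, codebook)
# bow = np.bincount(assignments, minlength=len(codebook))
```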

  15. Pooling
     - Invariance to changes in position and lighting conditions.
     - Robustness to clutter.
     - Compactness of representation.
     - Types: sum (or average) pooling, max pooling.

  16. Pooling (contd.)
     - Same properties as above, with a caveat: pooling over the whole image discards spatial layout - there goes the geometry too.
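
A sketch of the pooling step contrasting sum, average, and max pooling; `codes` stands for any (num_descriptors x K) matrix of coding coefficients, an assumed input layout.

```python
import numpy as np

def pool(codes, method="avg"):
    """Pool per-descriptor codes (num_descriptors x K) into one image vector."""
    if method == "sum":
        return codes.sum(axis=0)
    if method == "avg":
        return codes.mean(axis=0)
    if method == "max":
        return codes.max(axis=0)              # max pooling, used with sparse codes
    raise ValueError(method)
```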

  17. Spatial Pooling
     - Pyramid Match Kernel (Grauman et al., ICCV '05)
       - Weighted sum of histogram intersections at multiple resolutions.
       - Matches found at fine levels receive more weight than matches at coarse levels.
       - Used for matching in high-dimensional spaces.
     - Spatial Pyramid Matching (Lazebnik et al., CVPR '06)
       - Concatenate the histogram vectors at all pyramid levels.
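
A rough sketch of spatial-pyramid pooling in the spirit of Lazebnik et al.: per-cell BoW histograms at several grid resolutions, concatenated with the usual level weights; the function name and weighting convention are assumptions.

```python
import numpy as np

def spatial_pyramid(points_xy, assignments, img_w, img_h, K, levels=(0, 1, 2)):
    """Concatenate weighted BoW histograms over 1x1, 2x2, 4x4, ... grids."""
    L = max(levels)
    feats = []
    for l in levels:
        cells = 2 ** l
        # usual SPM weighting: coarse levels get smaller weights
        w = 1.0 / 2 ** L if l == 0 else 1.0 / 2 ** (L - l + 1)
        cx = np.minimum((points_xy[:, 0] * cells / img_w).astype(int), cells - 1)
        cy = np.minimum((points_xy[:, 1] * cells / img_h).astype(int), cells - 1)
        for i in range(cells):
            for j in range(cells):
                mask = (cx == i) & (cy == j)
                hist = np.bincount(assignments[mask], minlength=K)
                feats.append(w * hist)
    return np.concatenate(feats)
```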

  18. Recognition & Retrieval
     - Recognition
       - Discriminative methods: k-nearest neighbor, SVMs, non-linear kernels.
       - Generative methods: Naïve Bayes; Bayesian models (pLSA, LDA).
     - Ranking & retrieval (agenda for this talk): nearest neighbor search, indexing.

  19. Ranking & Retrieval - Similarity measures
     - Cosine distance
     - L1 distance
     - Chi-square distance
     - Hellinger distance (applies a discount to large values)
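
Minimal NumPy versions of the listed measures, assuming L1-normalized BoW histograms; the small epsilons are only there to avoid division by zero.

```python
import numpy as np

def l1_dist(p, q):
    return np.abs(p - q).sum()

def cosine_dist(p, q):
    return 1.0 - p @ q / (np.linalg.norm(p) * np.linalg.norm(q) + 1e-12)

def chi2_dist(p, q):
    return 0.5 * ((p - q) ** 2 / (p + q + 1e-12)).sum()

def hellinger_dist(p, q):
    # square-rooting the bins discounts large values
    return np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2)
```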

  20. Ranking & Retrieval - Earth Mover's Distance (EMD)
     - Computes the dissimilarity between two distributions, posed as a transportation problem.
     - Let S = {(s_1, w_s1), ..., (s_m, w_sm)} be a signature with m elements and Q = {(q_1, w_q1), ..., (q_n, w_qn)} one with n elements, with d_ij the distance between elements s_i and q_j.
     - The flow F = [f_ij] minimizes the overall cost sum_ij f_ij d_ij subject to the supply/demand constraints, and EMD(S, Q) = (sum_ij f_ij d_ij) / (sum_ij f_ij).
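
A sketch of EMD as the transportation problem above, solved with scipy.optimize.linprog; the signature layout (positions plus weights) and the helper name are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.spatial.distance import cdist

def emd(s_pos, s_w, q_pos, q_w):
    """EMD between signatures: positions (m x d), (n x d) and 1-D weight arrays."""
    m, n = len(s_w), len(q_w)
    D = cdist(s_pos, q_pos)                   # d_ij: ground distances
    c = D.ravel()                             # cost per unit of flow f_ij

    # sum_j f_ij <= s_w[i]  (supply)  and  sum_i f_ij <= q_w[j]  (demand)
    A_ub = np.zeros((m + n, m * n))
    for i in range(m):
        A_ub[i, i * n:(i + 1) * n] = 1.0
    for j in range(n):
        A_ub[m + j, j::n] = 1.0
    b_ub = np.concatenate([s_w, q_w])

    # total flow equals the smaller of the two total masses
    total = min(s_w.sum(), q_w.sum())
    A_eq = np.ones((1, m * n))
    b_eq = [total]

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.fun / total                    # normalized EMD
```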

  21. Ranking & Retrieval - Evaluation measures (illustrated on a cats-and-dogs database)
     - Notation: TP = true positives, FP = false positives, TN = true negatives, FN = false negatives.
     - Precision: P = TP / (TP + FP)
     - Recall: R = TP / (TP + FN)
     - F-measure: F = 2PR / (P + R)
     - Mean Average Precision (mAP): area under the precision-recall curve, averaged over queries.
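
A small sketch of these measures for a ranked result list; `relevant` is a hypothetical boolean array marking which ranked results are true matches.

```python
import numpy as np

def precision_recall_f(tp, fp, fn):
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f = 2 * p * r / (p + r)
    return p, r, f

def average_precision(relevant):
    """AP for one query: mean of precision@k taken at each relevant rank."""
    relevant = np.asarray(relevant, dtype=bool)
    hits = np.cumsum(relevant)                 # number of true matches so far
    ranks = np.arange(1, len(relevant) + 1)
    prec_at_hits = hits[relevant] / ranks[relevant]
    return prec_at_hits.mean() if relevant.any() else 0.0

# mAP is the mean of average_precision over all queries.
```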

  22. Re-ranking using Geometric Verification
     - Use the position and shape of the underlying features to improve retrieval quality.
     - Both images may have many matches - which ones are correct?
     - Estimate a geometric transformation to remove outliers.
     - Approaches: RANSAC, Hough transform.
     Slide credit: Cordelia Schmid

  23. Re-ranking using Geometric Verification - Fitting an affine transformation
     - Assuming we know the correspondences, how do we recover the transformation?
     Slide credit: Cordelia Schmid

  24. Re-ranking using Geometric Verification - RANSAC (Fischler & Bolles, 1981), illustrated by fitting a line
     - Randomly select a seed group of matches (a minimal subset of points).
     - Compute the transformation from the seed group (hypothesize a model).
     - Find the inliers to this transformation (compute the error function and select points consistent with the model).
     - If the number of inliers is sufficiently large, re-compute a least-squares estimate of the transformation on all of the inliers.
     - Repeat the hypothesize-and-verify loop and keep the transformation with the largest number of inliers.
     Slide credit: Kristen Grauman
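
A compact sketch of RANSAC fitting an affine transformation to putative matches; the threshold, iteration count, and helper names are illustrative, and the least-squares step uses the standard two-equations-per-correspondence linear system.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine transform (2x3) mapping src points to dst points."""
    n = len(src)
    A = np.zeros((2 * n, 6))
    A[0::2, 0:2] = src; A[0::2, 2] = 1        # x' = a*x + b*y + c
    A[1::2, 3:5] = src; A[1::2, 5] = 1        # y' = d*x + e*y + f
    b = dst.reshape(-1)
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    return params.reshape(2, 3)

def ransac_affine(src, dst, n_iter=500, thresh=3.0, min_inliers=10):
    rng = np.random.default_rng(0)
    best_M, best_inliers = None, np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        seed = rng.choice(len(src), 3, replace=False)   # minimal sample
        M = fit_affine(src[seed], dst[seed])            # hypothesize a model
        proj = src @ M[:, :2].T + M[:, 2]               # apply to all points
        inliers = np.linalg.norm(proj - dst, axis=1) < thresh
        if inliers.sum() > best_inliers.sum():
            best_M, best_inliers = M, inliers
    if best_inliers.sum() >= min_inliers:               # refit on all inliers
        best_M = fit_affine(src[best_inliers], dst[best_inliers])
    return best_M, best_inliers
```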

  25. Inspirations from IR - Making it faster
     - Inverted indexing: a reverse look-up from visual words to images.
     - Enables fast search by exploiting the sparsity of the representation (#visual words x #images matrix).
     Image courtesy: Jawahar et al., DAS '14
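
A minimal sketch of an inverted index from visual word to images, with a scoring pass that only touches images sharing words with the query; the data layout is an assumption.

```python
from collections import defaultdict

def build_inverted_index(image_bows):
    """image_bows: dict image_id -> {visual_word: count}."""
    index = defaultdict(list)                 # visual word -> postings list
    for image_id, bow in image_bows.items():
        for word, count in bow.items():
            index[word].append((image_id, count))
    return index

def score_query(query_bow, index):
    """Accumulate similarity only over images that share visual words."""
    scores = defaultdict(float)
    for word, q_count in query_bow.items():
        for image_id, count in index.get(word, []):
            scores[image_id] += q_count * count
    return sorted(scores.items(), key=lambda kv: -kv[1])
```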

  26. Inspirations from IR - Weighting schemes
     - Zipf's law: the frequency of any word is inversely proportional to its rank.
     - TF-IDF weighting.
     - Stop words: the top 5% most frequent visual words.
     Image courtesy: Wikipedia
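
A sketch of tf-idf weighting over the image-by-visual-word count matrix, using the standard tf x log(N/df) form, which may differ in detail from the slide's exact formula.

```python
import numpy as np

def tfidf_weight(counts):
    """counts: (num_images x K) matrix of visual-word counts per image."""
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)   # term frequency
    df = (counts > 0).sum(axis=0)                                     # document frequency
    idf = np.log(counts.shape[0] / np.maximum(df, 1))                 # inverse document frequency
    return tf * idf                                                   # (num_images x K)
```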

  27. Inspirations from IR - Improving the recall
     - Query expansion: reformulate the query to increase its expressiveness, e.g. by adding synonyms, jittering, etc.
     - Loop: query image -> results -> spatial verification -> new query -> new results -> repeat.
     Chum et al., Total Recall, ICCV '07

  28. Inspirations from IR - Query expansion variants
     - Baseline
     - Transitive closure expansion (uses a priority queue)
     - Average query expansion
     - Recursive average query expansion
     - Multiple image resolution expansion:
       - Compute the median image resolution.
       - Formulate queries for the other resolution bands: (0, 4/5), (2/3, 3/2), (5/4, infinity).
       - Do average query expansion for each band.
     Chum et al., Total Recall, ICCV '07
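
A rough sketch of one round of average query expansion in the spirit of Chum et al.: average the query BoW with the spatially verified top results and re-issue the query; `search` and `spatially_verified` are hypothetical helpers.

```python
import numpy as np

def average_query_expansion(query_bow, search, spatially_verified, top=50):
    """One round of average query expansion (hypothetical helper functions).

    search(bow)                    -> ranked list of (image_id, image_bow)
    spatially_verified(q, img_id)  -> True if the result passes geometric verification
    """
    results = search(query_bow)[:top]
    verified = [bow for img_id, bow in results
                if spatially_verified(query_bow, img_id)]
    if not verified:
        return search(query_bow)              # nothing to expand with
    expanded = np.mean([query_bow] + verified, axis=0)
    return search(expanded)                   # re-query with the averaged vector
```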

  29. Advanced coding schemes

  30. Lost in Quantization
     - Hard quantization (VQ) issues: codeword uncertainty and codeword plausibility.

  31. Modeling Uncertainty - Kernel codebooks
     - Allow a degree of ambiguity when assigning code words to image features.
     - Uses kernel density estimation: the kernel size determines the amount of smoothing, and the kernel shape is related to the distance function.
     Gemert et al., Kernel Codebooks for Scene Categorization, ECCV 2008

  32. Modeling Uncertainty (contd.)
     - Codeword uncertainty
     - Codeword plausibility
     Gemert et al., Kernel Codebooks for Scene Categorization, ECCV 2008
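
A sketch of Gaussian-kernel soft assignment in the spirit of the kernel codebook idea (the normalized, uncertainty-style variant); the kernel width and helper name are assumptions.

```python
import numpy as np

def soft_assign(descriptors, codebook, sigma=100.0):
    """Kernel-codebook style soft assignment with a Gaussian kernel.

    Returns an (num_descriptors x K) matrix whose rows sum to 1;
    averaging the rows over an image gives a soft BoW histogram.
    """
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    k = np.exp(-d2 / (2.0 * sigma ** 2))      # kernel value per (feature, word)
    return k / np.maximum(k.sum(axis=1, keepdims=True), 1e-12)
```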

  33. Encoding - Sparse Coding
     - From VQ to SC: the hard single-word assignment of VQ is too restrictive; relax it with an L1-norm sparsity penalty on the codes.
     - Combined with max pooling.
     Yang et al., CVPR '09

  34. Encoding - Sparse Coding (contd.)
     - Alternating optimization: fix the dictionary V and solve for the codes U (a LASSO problem); fix U and solve for V (a least-squares problem).
     - Linear classification using the SPM kernel.
     Yang et al., CVPR '09
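
A sketch of the sparse-coding step using scikit-learn's DictionaryLearning, which alternates the two sub-problems from the slide (LASSO for the codes, least squares for the dictionary); the dictionary size and penalty are illustrative.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# descriptors: (N x 128) local features; the learned dictionary plays the
# role of V and the sparse codes the role of U in the slide's notation.
descriptors = np.load("sampled_sift.npy")     # placeholder file name

dl = DictionaryLearning(n_components=1024, alpha=1.0,   # L1 (LASSO) penalty
                        transform_algorithm="lasso_lars",
                        max_iter=20, random_state=0)
codes = dl.fit_transform(descriptors)          # U: sparse codes (N x 1024)
dictionary = dl.components_                    # V: (1024 x 128)

# Image-level feature: max pooling over the absolute sparse codes.
image_feature = np.abs(codes).max(axis=0)
```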

  35. Encoding - Locality
     - SC results tend to be local: is locality more essential than sparsity?
     - Local Coordinate Coding (LCC) and Locality-constrained Linear Coding (LLC).
     - LLC drops the sparsity term and invokes locality explicitly: min_c ||x - B c||^2 + lambda ||d ⊙ c||^2 subject to 1^T c = 1, where ⊙ denotes element-wise multiplication and d_i is a locality adaptor that gives different weights to different basis vectors according to their similarity to x.
     Wang et al., CVPR '10

  36. Encoding - LLC
     - Comparison with VQ and SC: better reconstruction, locally smooth sparsity, and an analytical solution.
     Wang et al., CVPR '10
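
A sketch of the fast approximated LLC encoding (k nearest bases plus a small constrained least-squares solve), following the analytical-solution idea in Wang et al.; the neighborhood size and regularization constant are illustrative.

```python
import numpy as np

def llc_encode(x, codebook, knn=5, beta=1e-4):
    """Approximated LLC code for one descriptor x (D,) over a codebook (K x D)."""
    K = len(codebook)
    d2 = ((codebook - x) ** 2).sum(axis=1)
    nn = np.argsort(d2)[:knn]                 # locality: keep the k nearest bases
    z = codebook[nn] - x                      # shift the bases to the origin at x
    C = z @ z.T                               # local covariance (knn x knn)
    C = C + beta * np.trace(C) * np.eye(knn)  # regularization
    w = np.linalg.solve(C, np.ones(knn))
    w = w / w.sum()                           # enforce the sum-to-one constraint
    code = np.zeros(K)
    code[nn] = w
    return code
```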

  37. Interpretation so far...
     - Sparse coding: discover subspaces - each basis is a "direction"; sparsity means each datum is a linear combination of only several bases; related to topic models.
     - LCC/LLC: follow the geometry of the data manifold - each basis is an "anchor point"; sparsity is induced by locality: each datum is a linear combination of neighboring anchors.
     Slide credit: Kai Yu
