Introduction to Visual Search and Recognition


  1. Introduction to Visual Search and Recognition (Visual Search Tutorial, Perceptual and Sensory Augmented Computing)
     Global representations: limitations
     • Success may rely on alignment -> sensitive to viewpoint
     • All parts of the image or window impact the description -> sensitive to occlusion, clutter

  2. Local representations
     • Describe component regions or patches separately.
     • Many options for detection & description: Maximally Stable Extremal Regions [Matas 02], SIFT [Lowe 99], Shape context [Belongie 02], Superpixels [Ren et al.], Salient regions [Kadir 01], Harris-Affine [Mikolajczyk 04], Spin images [Johnson 99], Geometric Blur [Berg 05]
     Recall: Invariant local features
     Subset of local feature types designed to be invariant to:
     • Scale
     • Translation
     • Rotation
     • Affine transformations
     • Illumination
     1) Detect interest points
     2) Extract descriptors, e.g. x = (x_1, x_2, …, x_d) and y = (y_1, y_2, …, y_d)
     [Mikolajczyk01, Matas02, Tuytelaars04, Lowe99, Kadir01, …]

  3. Recognition with local feature sets
     • Previously, we saw how to use local invariant features + a global spatial model to recognize specific objects, using a planar object assumption.
     • Now, we’ll use local features for
       - Indexing-based recognition
       - Bags of words representations
       - Correspondence / matching kernels
     Basic flow
     • Detect or sample features -> list of positions, scales, orientations
     • Describe features -> associated list of d-dimensional descriptors
     • Index each one into pool of descriptors from previously seen images

  4. Indexing local features
     • Each patch / region has a descriptor, which is a point in some high-dimensional feature space (e.g., SIFT).
     • When we see close points in feature space, we have similar descriptors, which indicates similar local content.
     Figure credit: A. Zisserman
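
The "close points in feature space" idea can be illustrated with a brute-force nearest-neighbor lookup. This is only a minimal sketch: the function names, the Euclidean metric, and the toy 4-D vectors (standing in for e.g. 128-D SIFT descriptors) are assumptions for illustration, not part of the tutorial.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_descriptor(query, pool):
    """Return (index, distance) of the pool descriptor closest to query.

    Brute force: compares the query against every stored descriptor.
    """
    best = min(range(len(pool)), key=lambda i: euclidean(query, pool[i]))
    return best, euclidean(query, pool[best])


# Hypothetical toy descriptors from previously seen images.
pool = [
    [0.0, 0.0, 1.0, 1.0],
    [5.0, 5.0, 5.0, 5.0],
    [0.1, 0.0, 0.9, 1.1],
]
index, dist = nearest_descriptor([0.0, 0.1, 1.0, 1.0], pool)
```

The linear scan costs one distance computation per stored descriptor, which is the cost the indexing schemes on the following slides try to avoid.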

  5. Indexing local features
     • We saw in the previous section how to use voting and pose clustering to identify objects using local features. (Figure credit: David Lowe)
     • With potentially thousands of features per image, and hundreds to millions of images to search, how to efficiently find those that are relevant to a new image?
       - Low-dimensional descriptors: can use standard efficient data structures for nearest neighbor search
       - High-dimensional descriptors: approximate nearest neighbor search methods more practical
       - Inverted file indexing schemes

  6. Indexing local features: approximate nearest neighbor search
     • Best-Bin First (BBF), a variant of k-d trees that uses a priority queue to examine the most promising branches first [Beis & Lowe, CVPR 1997]
     • Locality-Sensitive Hashing (LSH), a randomized hashing technique using hash functions that map similar points to the same bin, with high probability [Indyk & Motwani, 1998]
     Indexing local features: inverted file index
     • For text documents, an efficient way to find all pages on which a word occurs is to use an index…
     • We want to find all images in which a feature occurs.
     • To use this idea, we’ll need to map our features to “visual words”.
     K. Grauman, B. Leibe
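
To make the hashing idea concrete, here is a small sketch of one common LSH family, random-hyperplane (sign) hashing, in which vectors pointing in similar directions tend to receive the same bit string. This is a swapped-in illustrative scheme and not necessarily the construction in the cited paper; all names and parameters (`num_bits`, `seed`, the toy vectors) are assumptions.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplane normals with Gaussian entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def hash_key(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def build_table(descriptors, hyperplanes):
    """Group descriptor indices by their hash key (the 'bin')."""
    table = defaultdict(list)
    for i, d in enumerate(descriptors):
        table[hash_key(d, hyperplanes)].append(i)
    return table


def candidates(query, table, hyperplanes):
    """Indices of stored descriptors that fall in the query's bin."""
    return table.get(hash_key(query, hyperplanes), [])


# Hypothetical toy data: a descriptor and its negation.
planes = make_hyperplanes(num_bits=8, dim=4)
stored = [[1.0, 2.0, 3.0, 4.0], [-1.0, -2.0, -3.0, -4.0]]
table = build_table(stored, planes)
hits = candidates([2.0, 4.0, 6.0, 8.0], table, planes)
```

Only descriptors in the query's bin are examined, so exact distances need to be computed for a short candidate list rather than the whole pool. Practical systems usually use several tables to reduce the chance of missing a true neighbor.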

  7. Visual words: main idea
     • Extract some local features from a number of images…
     e.g., SIFT descriptor space: each point is 128-dimensional
     Slide credit: D. Nister

  8. Visual words: main idea (figure-only slides)
     Slide credit: D. Nister

  9. [Figure-only slides]
     Slide credit: D. Nister

  10. Visual words: main idea
      Map high-dimensional descriptors to tokens/words by quantizing the feature space.
      • Quantize via clustering, let cluster centers be the prototype “words”.
      • Determine which word to assign to each new image region by finding the closest cluster center.
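
The two steps above (cluster to get prototype words, then assign each new descriptor to its closest center) can be sketched with plain k-means. This is a minimal illustration under stated assumptions: the function names, the fixed iteration count, the random initialization, and the toy 2-D descriptors are all hypothetical, and a real vocabulary would be built from far more data.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_word(descriptor, centers):
    """Visual word id = index of the closest cluster center."""
    return min(
        range(len(centers)),
        key=lambda k: squared_distance(descriptor, centers[k]),
    )


def kmeans(descriptors, k, iterations=20, seed=0):
    """Lloyd's algorithm: alternate assignment and center update."""
    rng = random.Random(seed)
    centers = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[assign_word(d, centers)].append(d)
        for j, members in enumerate(clusters):
            if members:  # keep the old center if a cluster goes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers


# Hypothetical toy descriptors forming two well-separated groups.
descriptors = [
    [0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
    [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],
]
vocabulary = kmeans(descriptors, k=2)
word = assign_word([5.1, 5.0], vocabulary)
```

The word ids themselves are arbitrary labels; what matters is that nearby descriptors receive the same id.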

  11. Visual words
      • Example: each group of patches belongs to the same visual word. (Figure from Sivic & Zisserman, ICCV 2003)
      • First explored for texture and material representations
      • Texton = cluster center of filter responses over collection of images
      • Describe textures and materials based on distribution of prototypical texture elements.
      Leung & Malik 1999; Varma & Zisserman, 2002; Lazebnik, Schmid & Ponce, 2003

  12. Visual words
      • More recently used for describing scenes and objects for the sake of indexing or classification.
      Sivic & Zisserman 2003; Csurka, Bray, Dance, & Fan 2004; many others.
      Inverted file index for images comprised of visual words
      • Word number -> list of image numbers
      Image credit: A. Zisserman
      K. Grauman, B. Leibe
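
The word-number-to-image-list structure can be sketched as a dictionary. This is a minimal illustration: the function names, the vote-counting ranking, and the toy word ids are assumptions, not the tutorial's implementation.

```python
from collections import defaultdict


def build_inverted_index(image_words):
    """Map each visual word id to the sorted list of image ids containing it."""
    index = defaultdict(set)
    for image_id, words in image_words.items():
        for w in words:
            index[w].add(image_id)
    return {w: sorted(ids) for w, ids in index.items()}


def candidate_images(query_words, index):
    """Rank database images by how many distinct query words they share."""
    votes = defaultdict(int)
    for w in set(query_words):
        for image_id in index.get(w, []):
            votes[image_id] += 1
    # Most shared words first; ties broken by image id.
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical database: image id -> visual word ids found in that image.
image_words = {1: [7, 7, 42, 91], 2: [3, 42], 3: [8, 15, 91, 7]}
index = build_inverted_index(image_words)
ranked = candidate_images([7, 91, 100], index)
```

Only images that share at least one word with the query are ever touched, which is what makes the lookup cheap when the vocabulary is large and each image contains a small fraction of all words.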

  13. Bags of visual words
      • Summarize entire image based on its distribution (histogram) of word occurrences.
      • Analogous to bag of words representation commonly used for documents.
      Image credit: Fei-Fei Li
      Video Google System
      1. Collect all words within query region
      2. Inverted file index to find relevant frames
      3. Compare word counts
      4. Spatial verification
      (Figure: query region and retrieved frames.) Sivic & Zisserman, ICCV 2003
      • Demo online at: http://www.robots.ox.ac.uk/~vgg/research/vgoogle/index.html
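
A bag-of-words vector and one possible way to "compare word counts" might look as follows. The slide does not specify the comparison measure, so cosine similarity on raw counts is an assumption made here for illustration, as are the function names and toy word ids.

```python
import math
from collections import Counter


def bag_of_words(word_ids, vocabulary_size):
    """Histogram of word occurrences, one bin per vocabulary word."""
    counts = Counter(word_ids)
    return [counts.get(w, 0) for w in range(vocabulary_size)]


def cosine_similarity(h1, h2):
    """Normalized dot product of two histograms (0.0 if either is empty)."""
    dot = sum(a * b for a, b in zip(h1, h2))
    n1 = math.sqrt(sum(a * a for a in h1))
    n2 = math.sqrt(sum(b * b for b in h2))
    if n1 == 0 or n2 == 0:
        return 0.0
    return dot / (n1 * n2)


# Hypothetical word ids for a query region and a candidate frame,
# using a toy vocabulary of 4 words.
query_hist = bag_of_words([0, 2, 2, 3], vocabulary_size=4)
frame_hist = bag_of_words([2, 2, 3, 3], vocabulary_size=4)
score = cosine_similarity(query_hist, frame_hist)
```

Because the histogram discards feature positions, a spatial verification step (item 4 above) is still needed to confirm that the matched words are arranged consistently.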

  14. Basic flow
      • Detect or sample features -> list of positions, scales, orientations
      • Describe features -> associated list of d-dimensional descriptors
      • Index each one into pool of descriptors from previously seen images, or quantize to form bag of words vector for the image
      Visual vocabulary formation
      Issues:
      • Sampling strategy
      • Clustering / quantization algorithm
      • Unsupervised vs. supervised
      • What corpus provides features (universal vocabulary?)
      • Vocabulary size, number of words

  15. Sampling strategies
      (Figure panels: Sparse, at interest points; Dense, uniformly; Randomly; Multiple interest operators)
      • To find specific, textured objects, sparse sampling from interest points often more reliable.
      • Multiple complementary interest operators offer more image coverage.
      • For object categorization, dense sampling offers better coverage.
      [See Nowak, Jurie & Triggs, ECCV 2006]
      Image credits: F-F. Li, E. Nowak, J. Sivic
      Clustering / quantization methods
      • k-means (typical choice), agglomerative clustering, mean-shift, …
      • Hierarchical clustering: allows faster insertion / word assignment while still allowing large vocabularies
        - Vocabulary tree [Nister & Stewenius, CVPR 2006]
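
The speed-up from hierarchical clustering can be sketched as a small tree built by recursive k-means, where assigning a word means comparing against only `branching` centers per level instead of every leaf. This is a rough sketch in the spirit of a vocabulary tree, not the cited authors' implementation; the names, the dictionary node layout, and the parameters (`branching`, `depth`, iteration count) are assumptions.

```python
import random


def sqdist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda i: sqdist(point, centers[i]))


def kmeans(points, k, rng, iterations=10):
    """Plain k-means; returns the centers and the final grouping of points."""
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old center if a group goes empty
                centers[j] = [sum(c) / len(g) for c in zip(*g)]
    groups = [[] for _ in range(k)]
    for p in points:
        groups[nearest(p, centers)].append(p)
    return centers, groups


def build_tree(points, branching, depth, rng):
    """Recursively cluster points; None marks a leaf."""
    if depth == 0 or len(points) < branching:
        return None
    centers, groups = kmeans(points, branching, rng)
    children = [build_tree(g, branching, depth - 1, rng) for g in groups]
    return {"centers": centers, "children": children}


def lookup(descriptor, node):
    """Descend the tree; the path of branch choices identifies the word."""
    path = []
    while node is not None:
        i = nearest(descriptor, node["centers"])
        path.append(i)
        node = node["children"][i]
    return tuple(path)


# Hypothetical toy descriptors: two well-separated groups of four points.
points = [
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
    [100.0, 100.0], [100.0, 101.0], [101.0, 100.0], [101.0, 101.0],
]
tree = build_tree(points, branching=2, depth=2, rng=random.Random(0))
word_a = lookup([0.5, 0.5], tree)
word_b = lookup([100.5, 100.5], tree)
```

With branching factor k and L levels, a lookup costs about k·L distance computations while the tree can represent up to k^L leaf words.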

  16. Example: Recognition with Vocabulary Tree
      • Tree construction (figure)
      • Training: Filling the tree (figure)
      [Nister & Stewenius, CVPR’06] Slide credit: David Nister

  17. Vocabulary Tree
      • Training: Filling the tree (figures, continued)
      [Nister & Stewenius, CVPR’06] Slide credit: David Nister
