Recent Image Retrieval Techniques


  1. WST665/CS770A: Web-Scale Image Retrieval. Recent Image Retrieval Techniques. Sung-Eui Yoon (윤성의). Course URL: http://sglab.kaist.ac.kr/~sungeui/IR

  2. Today ● Go over some recent image retrieval techniques

  3. Video Google: A Text Retrieval Approach to Object Matching in Videos. Josef Sivic and Andrew Zisserman, Robotics Research Group, Department of Engineering Science, University of Oxford, United Kingdom. ICCV 03. Citations: over 1300 as of 2011

  4. Motivations ● Retrieve key frames and shots of a video containing a particular object ● Investigate whether a text retrieval approach can be successful for object recognition

  5. Viewpoint Invariant Description ● Find viewpoint covariant regions ● Produce elliptical affine-invariant regions (e.g., Shape Adapted (SA) and Maximally Stable (MS)) ● SA regions are centered on corner-like features ● MS regions correspond to high contrast with respect to their surroundings (dark window, gray wall…) ● Compute a SIFT descriptor for each region

  6. MSER (Maximally Stable Extremal Regions) ● Affinely-invariant stable regions in the image ● Can be used to localize regions around keypoints ● We will use only SIFT descriptors that are inside MSER regions

  7. (figure)

  8. Visual Vocabulary ● Quantize descriptor vectors into clusters, which are the visual ‘words’ for text retrieval ● Performed with K-means clustering ● Produce about 6K and 10K clusters for Shape Adapted and Maximally Stable regions, respectively ● Chosen empirically to maximize retrieval results

  9. K-Means Clustering ● Minimize the within-cluster sum of squares (WCSS)
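
The WCSS objective above can be sketched with a plain Lloyd-style k-means. This is a toy version on small 2-D point lists; the paper clusters large sets of 128-D SIFT descriptors:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's k-means: alternate nearest-center assignment and
    mean updates, which reduces the within-cluster sum of squares."""
    rng = random.Random(seed)
    centers = [tuple(c) for c in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:  # assignment step: nearest center
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        for j, cl in enumerate(clusters):  # update step: cluster mean
            if cl:
                centers[j] = tuple(sum(xs) / len(cl) for xs in zip(*cl))
    return centers

def wcss(points, centers):
    """Within-cluster sum of squares for a given set of centers."""
    return sum(min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
               for p in points)
```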

  10. Distance Function ● Use the Mahalanobis distance as the distance function for clustering: d(x, y) = sqrt((x - y)^T S^-1 (x - y)), where S is the covariance matrix ● If S is the identity matrix, it reduces to the Euclidean distance ● Decorrelates the components of SIFT ● Instead, the Euclidean distance may be used
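
A minimal sketch of the (squared) Mahalanobis distance from the slide; here the inverse covariance S^-1 is passed in directly as nested lists:

```python
def mahalanobis_sq(x, y, s_inv):
    """Squared Mahalanobis distance (x - y)^T S^{-1} (x - y),
    where s_inv is the inverse covariance matrix (nested lists)."""
    d = [a - b for a, b in zip(x, y)]
    v = [sum(row[j] * d[j] for j in range(len(d))) for row in s_inv]  # S^{-1} d
    return sum(di * vi for di, vi in zip(d, v))
```

With the identity matrix for `s_inv`, this is exactly the squared Euclidean distance, as the slide notes.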

  11. Visual Indexing ● Each document is represented by a k-vector ● Weighting by tf-idf: t_i = (n_id / n_d) * log(N / n_i), i.e., term frequency * log(inverse document frequency) ● n_id: # of occurrences of word i in document d ● n_d: total # of words in document d ● n_i: # of occurrences of term i in the whole database ● N: # of documents in the whole database ● At the retrieval stage, documents are ranked by the normalized scalar product between the query vector V_q and each V_d in the database
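
The weighting and ranking above can be sketched as follows, using the slide's definition of n_i (total occurrences of the word in the whole database) and cosine similarity as the normalized scalar product:

```python
import math

def tfidf(doc, db):
    """tf-idf weight per the slide: t_i = (n_id / n_d) * log(N / n_i),
    with n_i the number of occurrences of word i in the whole database."""
    N, n_d = len(db), len(doc)
    vec = {}
    for w in set(doc):
        n_id = doc.count(w)
        n_i = sum(d.count(w) for d in db)
        vec[w] = (n_id / n_d) * math.log(N / n_i)
    return vec

def score(vq, vd):
    """Normalized scalar product (cosine) between query and document vectors."""
    dot = sum(x * vd.get(w, 0.0) for w, x in vq.items())
    nq = math.sqrt(sum(x * x for x in vq.values()))
    nd = math.sqrt(sum(x * x for x in vd.values()))
    return dot / (nq * nd) if nq and nd else 0.0
```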

  12. Video Google [Sivic et al. ICCV 2003] ● mAP: mean average precision

  13. Video Google [Sivic et al. ICCV 2003] ● Performance highly depends on the number of k (visual words): not scalable

  14. Scalable Recognition with a Vocabulary Tree. David Nistér et al., CVPR 2006. Citations: over 1000 as of 2011

  15. Vocabulary Tree [Nistér et al. CVPR 06] ● Hierarchical k-means clustering
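
Hierarchical k-means can be sketched like this: cluster the descriptors into `branch` groups, recurse on each group, and use the path of branch indices as the visual word. Branch factor and depth are tiny here; the paper uses, e.g., branch factor 10 with millions of leaves:

```python
import random

def _kmeans(points, k, iters=10, seed=0):
    """Small helper k-means returning (centers, groups)."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(points, k)]
    groups = [points]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            groups[j].append(p)
        for j, g in enumerate(groups):
            if g:
                centers[j] = [sum(xs) / len(g) for xs in zip(*g)]
    return centers, groups

def build_tree(points, branch, depth):
    """Hierarchical k-means: cluster into `branch` groups, then
    recurse on each group for `depth` levels; leaves are the words."""
    if depth == 0 or len(points) < branch:
        return {"centers": [], "children": []}
    centers, groups = _kmeans(points, branch)
    return {"centers": centers,
            "children": [build_tree(g, branch, depth - 1) for g in groups]}

def quantize(tree, p):
    """Descend the tree picking the nearest center at each level;
    the path of branch indices identifies the visual word."""
    path, node = [], tree
    while node["children"]:
        j = min(range(len(node["centers"])),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, node["centers"][c])))
        path.append(j)
        node = node["children"][j]
    return tuple(path)
```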

  16. Vocabulary tree with branch factor 10

  17. Inverted File

  18. Retrieval Algorithm ● Compute a histogram of visual words with SIFTs ● Identify images that contain words of the input query image ● Can be done with the inverted file ● Sort images based on a similarity function
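
The steps above can be sketched with a toy inverted file. Ranking here is a simple shared-word count standing in for the tf-idf scalar product; the point is that only images sharing a word with the query are ever touched:

```python
from collections import defaultdict

def build_inverted_file(db):
    """Inverted file: map each visual word to the set of images
    that contain it."""
    inv = defaultdict(set)
    for img, words in db.items():
        for w in words:
            inv[w].add(img)
    return inv

def retrieve(query_words, inv, db):
    """Collect candidate images through the inverted file, then
    sort them by a simple similarity: the number of shared words."""
    candidates = set()
    for w in set(query_words):
        candidates |= inv.get(w, set())
    return sorted(candidates,
                  key=lambda i: -len(set(db[i]) & set(query_words)))
```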

  19. Vocabulary Tree [Nistér et al. CVPR 06] ● On an 8 GB RAM machine (40,000 images), queries took 1 s; database creation took 2.5 days

  20. Vocabulary Tree ● Benefits: ● Allows faster image retrieval (and pre-computation) ● Scales efficiently to a large number of images ● Problems: ● High memory requirements ● Quantization effects

  21. Object Retrieval with Large Vocabularies and Fast Spatial Matching. Philbin et al., CVPR 2007. Citations: over 350 as of 2011

  22. Approximating K-Means ● Use a forest of 8 randomized k-d trees ● Randomize the splitting dimension among a set of the dimensions with highest variance ● Randomly choose a point close to the median as the split value ● Helps to mitigate quantization effects ● As each tree is descended to a leaf, distances from the split boundaries are recorded in a priority queue ● Similar to best-bin-first search

  23. Approximate K-Means ● Algorithmic complexity of a single k-means iteration ● Reduces from O(NK) to O(N log K), where N is the # of features ● Achieved by multiple random kd-trees ● Find images with kd-trees too ● But retrieval performance with approximate K-means is superior, due to the reduction of quantization effects
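
A heavily simplified sketch of one randomized k-d tree: split on a dimension drawn from the highest-variance dimensions, near the median, and answer nearest-center queries by a greedy descent to one leaf. The paper instead uses a forest of 8 such trees with best-bin-first backtracking (boundary distances kept in a priority queue); this single-tree, no-backtracking version only illustrates the O(log K) lookup:

```python
import random

def build_kdtree(points, rng):
    """One randomized k-d tree: split on a dimension chosen at
    random among the two highest-variance dimensions, at a value
    close to the median."""
    if len(points) <= 2:
        return {"points": points}
    n, dims = len(points), len(points[0])
    var = []
    for d in range(dims):
        vals = [p[d] for p in points]
        m = sum(vals) / n
        var.append(sum((v - m) ** 2 for v in vals))
    top = sorted(range(dims), key=lambda d: -var[d])[:2]
    d = rng.choice(top)                           # randomized split dimension
    split = sorted(p[d] for p in points)[n // 2]  # value near the median
    left = [p for p in points if p[d] < split]
    right = [p for p in points if p[d] >= split]
    if not left or not right:
        return {"points": points}
    return {"dim": d, "split": split,
            "left": build_kdtree(left, rng), "right": build_kdtree(right, rng)}

def approx_nearest(tree, q):
    """Greedy descent to one leaf: ~O(log K) per lookup instead of
    scanning all K centers. No backtracking in this sketch."""
    while "points" not in tree:
        tree = tree["left"] if q[tree["dim"]] < tree["split"] else tree["right"]
    return min(tree["points"], key=lambda p: sum((a - b) ** 2 for a, b in zip(p, q)))
```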

  24. Spatial Re-Ranking with RANSAC ● Generate hypotheses with pairs of corresponding features ● Assume a restricted transformation, since many images on the web are captured in particular ways (axis-aligned) ● Evaluate other pairs and measure errors ● Re-rank images by scoring the # of inliers
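
A sketch of the verification loop, with the restricted transformation taken to be an axis-aligned uniform scale plus translation (one of several restricted classes one could use): hypothesize from two correspondences, count how many other correspondences agree, and use the best inlier count as the re-ranking score:

```python
import random

def fit_scale_translation(p1, p2, q1, q2):
    """Estimate scale s and translation (tx, ty) mapping query
    points p to database points q from one pair of matches."""
    denom = ((p2[0] - p1[0]) ** 2 + (p2[1] - p1[1]) ** 2) ** 0.5
    if denom == 0:
        return None
    s = (((q2[0] - q1[0]) ** 2 + (q2[1] - q1[1]) ** 2) ** 0.5) / denom
    return s, q1[0] - s * p1[0], q1[1] - s * p1[1]

def ransac_inliers(matches, iters=50, tol=1.0, seed=0):
    """RANSAC over correspondences (p, q): hypothesize from two
    matches, count how many others agree within tol, keep the best."""
    rng = random.Random(seed)
    best = 0
    for _ in range(iters):
        (p1, q1), (p2, q2) = rng.sample(matches, 2)
        h = fit_scale_translation(p1, p2, q1, q2)
        if h is None:
            continue
        s, tx, ty = h
        inl = sum(1 for p, q in matches
                  if abs(s * p[0] + tx - q[0]) <= tol
                  and abs(s * p[1] + ty - q[1]) <= tol)
        best = max(best, inl)
    return best
```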

  25. Results

  26. Results

  27. Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval. Chum et al., ICCV 2007. Citations: over 150 as of 2011

  28. Query Expansion ● Improve recall by re-querying with a combination of the original query and results that pass spatial verification

  29. Query Expansion ● Spatial verification ● Similar to the technique used in [Philbin et al. 07]; uses a RANSAC-like algorithm ● Identify a set of images that are very similar to the original query image

  30. BoW Interpreted Probabilistically ● Extract a generative model of an object from the query region ● Compute a response set that is likely to have been generated from the model ● The generative model: spatial configuration of visual words with background clutter

  31. Generative Models ● Query expansion baseline ● Average term frequency vectors from the top 5 queries without verification ● Transitive closure expansion ● A priority queue of verified images is keyed by # of inliers ● Take the top image and query it as a new query ● Average query expansion ● A new query is constructed by averaging the top 50 verified results (d_i is the term frequency vector of the i-th verified image)
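
The average query expansion variant can be sketched directly: the new query is the average of the original term-frequency vector and the vectors d_i of the top verified results (vectors represented here as sparse dicts):

```python
def average_expansion(query_vec, verified_vecs, top=50):
    """Average query expansion: d_avg = (d_q + sum of top d_i) / (m + 1),
    where the d_i are term-frequency vectors of verified results."""
    vecs = [query_vec] + list(verified_vecs)[:top]
    words = set().union(*(v.keys() for v in vecs))
    return {w: sum(v.get(w, 0.0) for v in vecs) / len(vecs) for w in words}
```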

  32. Generative Models ● Multiple image resolution expansion ● Consider images with different resolutions; higher resolutions give more detailed information ● Use resolution bands of (0, 4/5), (2/3, 3/2), and (5/4, infinity) ● Use averaged queries for each resolution band ● Show the best result

  33. mAP Results

  34. Results ● Original query and top 4 retrieved images, plus expanded results that were not identified by the original query

  35. Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases. Philbin et al., CVPR 2008. Citations: over 175 as of 2011

  36. Soft Quantization [Philbin et al. CVPR 08] ● Descriptors 3 and 4 will never be matched under hard assignment ● No way of distinguishing that 2 and 3 are closer than 1 and 2 ● Soft assignment: use a weight vector ● A weight to a cluster is assigned according to the distance between the descriptor and the cluster center (closer clusters receive larger weights)
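
One common form of the weight vector (used in this line of work) assigns each descriptor to its r nearest centers with weights exp(-d^2 / (2 sigma^2)), normalized to sum to 1; `r` and `sigma` below are illustrative values, not the paper's tuned settings:

```python
import math

def soft_assign(desc, centers, r=3, sigma=1.0):
    """Soft assignment: instead of one hard visual word, weight the
    r nearest cluster centers by exp(-d^2 / (2 sigma^2)) (closer
    centers get larger weights) and normalize to sum to 1."""
    d2 = sorted((sum((a - b) ** 2 for a, b in zip(desc, c)), i)
                for i, c in enumerate(centers))
    w = [(i, math.exp(-d / (2 * sigma ** 2))) for d, i in d2[:r]]
    z = sum(x for _, x in w)
    return {i: x / z for i, x in w}
```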

  37. Results
