WST665/CS770A: Web-Scale Image Retrieval
Recent Image Retrieval Techniques
Sung-Eui Yoon (윤성의)
Course URL: http://sglab.kaist.ac.kr/~sungeui/IR
Today
● Go over some recent image retrieval techniques
Video Google: A Text Retrieval Approach to Object Matching in Videos
Josef Sivic and Andrew Zisserman
Robotics Research Group, Department of Engineering Science, University of Oxford, United Kingdom
ICCV 2003
Citations: over 1,300 as of 2011
Motivations
● Retrieve key frames and shots of a video containing a particular object
● Investigate whether a text retrieval approach can be successful for object recognition
Viewpoint Invariant Description
● Find viewpoint covariant regions
● Produce elliptical affine-invariant regions, e.g., Shape Adapted (SA) and Maximally Stable (MS) regions
● SA regions are centered on corner-like features
● MS regions correspond to areas of high contrast with respect to their surroundings (e.g., a dark window on a gray wall)
● Compute a SIFT descriptor for each region (see the sketch below)
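As a rough illustration of the descriptor step (not the authors' exact pipeline, which computes SIFT on affine-adapted elliptical regions), OpenCV's stock SIFT can be used; the file name is hypothetical:

```python
# A minimal sketch of extracting local SIFT descriptors, assuming
# opencv-python >= 4.4 (where SIFT lives in the main module).
# Note: plain SIFT keypoints stand in for the paper's SA/MS
# elliptical regions.
import cv2

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical frame
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
# descriptors: N x 128 float32 array, one row per detected region
```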
MSER (Maximally Stable Extremal Regions)
● Affinely-invariant stable regions in the image
● Can be used to localize regions around keypoints
● We will use only SIFT descriptors that lie inside MSER regions (see the sketch below)
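A possible sketch of the "SIFT inside MSER" filtering, again assuming OpenCV; testing against MSER bounding boxes is a simplification of the true region test:

```python
# Keep only SIFT keypoints whose centers fall inside an MSER bounding
# box (a simplification: the actual MSER regions are arbitrary shapes).
import cv2

img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical frame
regions, bboxes = cv2.MSER_create().detectRegions(img)
keypoints, descriptors = cv2.SIFT_create().detectAndCompute(img, None)

def inside_some_bbox(pt):
    x, y = pt
    return any(bx <= x <= bx + bw and by <= y <= by + bh
               for (bx, by, bw, bh) in bboxes)

kept = [d for kp, d in zip(keypoints, descriptors) if inside_some_bbox(kp.pt)]
```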
Visual Vocabulary
● Quantize descriptor vectors into clusters, which act as visual 'words' for text retrieval
● Performed with k-means clustering (see the sketch below)
● Produces about 6K and 10K clusters for Shape Adapted and Maximally Stable regions, respectively
● These numbers were chosen empirically to maximize retrieval results
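A minimal sketch of vocabulary construction, assuming scikit-learn and synthetic stand-in descriptors; MiniBatchKMeans and the cluster count of 1,000 are illustrative choices (the paper uses plain k-means with roughly 6K/10K clusters):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
descriptors = rng.normal(size=(5000, 128)).astype(np.float32)  # stand-in SIFTs

vocab = MiniBatchKMeans(n_clusters=1000, random_state=0).fit(descriptors)

# Quantization: each descriptor maps to its nearest cluster id,
# i.e., its visual word.
words = vocab.predict(descriptors[:10])
```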
K-Means Clustering
● Minimize the within-cluster sum of squares (WCSS)
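The formula itself did not survive extraction; the standard WCSS objective, reconstructed from the slide's description, is:

```latex
% Choose clusters S = {S_1, ..., S_k} to minimize the WCSS,
% where \mu_i is the mean of the points in cluster S_i:
\arg\min_{S}\;\sum_{i=1}^{k}\;\sum_{\mathbf{x}\in S_i}
  \left\lVert \mathbf{x}-\boldsymbol{\mu}_i \right\rVert^2
```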
Distance Function
● Use the Mahalanobis distance as the distance function for clustering: d(x1, x2) = sqrt((x1 - x2)^T S^{-1} (x1 - x2)), where S is the covariance matrix
● If S is the identity matrix, it reduces to the Euclidean distance
● Decorrelates the components of SIFT descriptors
● Alternatively, the Euclidean distance may be used
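A small sketch of this distance, assuming SciPy and synthetic stand-in descriptors (SciPy's mahalanobis takes the inverse covariance):

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 128))       # stand-in for SIFT descriptors
S = np.cov(X, rowvar=False)           # 128 x 128 covariance matrix
S_inv = np.linalg.inv(S)

d = mahalanobis(X[0], X[1], S_inv)
# With S = I this reduces to np.linalg.norm(X[0] - X[1]).
```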
Visual Indexing
● Each document is represented by a k-vector of weighted word frequencies
● Weighting by tf-idf: term frequency × log(inverse document frequency), i.e., t_i = (n_id / n_d) · log(N / n_i)
● n_id: # of occurrences of word i in document d
● n_d: total # of words in document d
● n_i: # of occurrences of term i in the whole database
● N: # of documents in the whole database
● At the retrieval stage, documents are ranked by the normalized scalar product (cosine) between the query vector V_q and each vector V_d in the database (see the sketch below)
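A hedged sketch of tf-idf weighting and cosine ranking over bag-of-visual-words histograms, following the slide's definitions (random counts stand in for real word histograms):

```python
import numpy as np

rng = np.random.default_rng(0)
counts = rng.integers(0, 4, size=(200, 1000)).astype(float)  # counts[d, i] = n_id

N, k = counts.shape
n_d = counts.sum(axis=1, keepdims=True)   # total words per document
n_i = counts.sum(axis=0)                  # occurrences of each term in database
idf = np.log(N / np.maximum(n_i, 1.0))

V = (counts / np.maximum(n_d, 1.0)) * idf                # tf-idf vectors V_d
V /= np.linalg.norm(V, axis=1, keepdims=True) + 1e-12    # unit-normalize

def rank(query_counts):
    q = (query_counts / max(query_counts.sum(), 1.0)) * idf
    q /= np.linalg.norm(q) + 1e-12
    return np.argsort(-(V @ q))   # normalized scalar products, best first
```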
Video Google [Sivic et al. ICCV 2003]
● mAP: mean average precision
Video Google [Sivic et al. ICCV 2003]
● Performance depends heavily on the number of visual words k: not scalable
Scalable Recognition with a Vocabulary Tree
David Nistér et al.
CVPR 2006
Citations: over 1,000 as of 2011
Vocabulary Tree [Nistér et al. CVPR 06]
● Hierarchical k-means clustering (see the sketch below)
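A compact sketch of building such a tree by recursive k-means, assuming scikit-learn; the branch factor and depth are illustrative (the paper reports, e.g., branch factor 10 with deeper trees):

```python
import numpy as np
from sklearn.cluster import KMeans

def build_tree(X, branch=10, depth=2):
    """Recursively cluster descriptors; each level is one k-means run."""
    if depth == 0 or len(X) < branch:
        return None
    km = KMeans(n_clusters=branch, n_init=4, random_state=0).fit(X)
    children = [build_tree(X[km.labels_ == c], branch, depth - 1)
                for c in range(branch)]
    return {"kmeans": km, "children": children}

def quantize(node, desc, path=()):
    """Descend to a leaf; the path through the tree is the visual word."""
    if node is None:
        return path
    c = int(node["kmeans"].predict(desc.reshape(1, -1))[0])
    return quantize(node["children"][c], desc, path + (c,))

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 128)).astype(np.float32)  # stand-in descriptors
tree = build_tree(X)
word = quantize(tree, X[0])   # e.g. (3, 7): a leaf of the 10x10 tree
```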
[Figure: vocabulary tree with branch factor 10]
Inverted File
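A minimal sketch of an inverted file for visual words: for each word, the list of images that contain it, with occurrence counts (names here are illustrative):

```python
from collections import Counter, defaultdict

inverted = defaultdict(list)   # word id -> [(image id, count), ...]

def index_image(image_id, words):
    """words: the list of visual word ids quantized from one image."""
    for w, c in Counter(words).items():
        inverted[w].append((image_id, c))
```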
Retrieval Algorithm
● Compute a histogram of visual words from the SIFT descriptors
● Identify images that contain words of the input query image
● Can be done with the inverted file (see the sketch below)
● Sort the images based on a similarity function
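A hedged sketch of the scoring step using the inverted file above; only images sharing at least one word with the query are ever touched, which is what makes the lookup fast (the unnormalized dot product stands in for the paper's similarity function):

```python
from collections import Counter, defaultdict

def query(words, inverted):
    """Score database images against the query's visual words."""
    q = Counter(words)
    scores = defaultdict(float)
    for w, qc in q.items():
        for image_id, c in inverted.get(w, []):
            scores[image_id] += qc * c   # unnormalized dot product
    return sorted(scores.items(), key=lambda kv: -kv[1])
```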
Vocabulary Tree [Nistér et al. CVPR 06]
● On a machine with 8 GB RAM and a 40,000-image database, queries took 1 s; database creation took 2.5 days
Vocabulary Tree
● Benefits:
● Allows faster image retrieval (and pre-computation)
● Scales efficiently to a large number of images
● Problems:
● Requires too much memory
● Quantization effects
Object Retrieval with Large Vocabularies and Fast Spatial Matching
Philbin et al.
CVPR 2007
Citations: over 350 as of 2011
Approximate K-Means
● Use a forest of 8 randomized k-d trees
● Randomize the splitting dimension among a set of the dimensions with the highest variance
● Randomly choose a point close to the median as the split value
● Helps to mitigate quantization effects
● As each tree is descended to a leaf, the distances from the splitting boundaries are recorded in a priority queue
● Similar to best-bin-first search
Approximate K-Means
● Algorithmic complexity of a single k-means iteration
● Reduced from O(NK) to O(N log K), where N is the # of features and K the # of clusters
● Achieved with multiple random k-d trees (see the sketch below)
● Images could also be retrieved with k-d trees directly
● But retrieval using approximate k-means performs better
● Due to the reduction of quantization effects
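A simplified sketch of one approximate iteration, assuming SciPy: doing the assignment step through a k-d tree over the centers gives roughly O(N log K) instead of O(NK). A single exact tree is used here for brevity, where the paper descends a forest of 8 randomized trees:

```python
import numpy as np
from scipy.spatial import cKDTree

def akm_iteration(X, centers):
    """One k-means iteration with tree-based nearest-center assignment."""
    assign = cKDTree(centers).query(X)[1]    # nearest center per feature
    new_centers = centers.copy()
    for c in range(len(centers)):
        members = X[assign == c]
        if len(members):
            new_centers[c] = members.mean(axis=0)
    return new_centers, assign
```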
Spatial Re-Ranking with RANSAC
● Generate hypotheses with pairs of corresponding features
● Assume a restricted transformation, since many images on the web are captured in particular (axis-aligned) ways
● Evaluate the other pairs and measure errors
● Re-rank the images by scoring the # of inliers (see the sketch below)
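A hedged sketch of the verification step, assuming OpenCV; a RANSAC-fitted similarity transform (estimateAffinePartial2D) stands in for the paper's restricted transformation, and the inlier count becomes the re-ranking score:

```python
import cv2
import numpy as np

def inlier_count(query_pts, result_pts, thresh=3.0):
    """query_pts, result_pts: N x 2 arrays of matched keypoint coordinates."""
    q = np.asarray(query_pts, np.float32).reshape(-1, 1, 2)
    r = np.asarray(result_pts, np.float32).reshape(-1, 1, 2)
    _, inliers = cv2.estimateAffinePartial2D(
        q, r, method=cv2.RANSAC, ransacReprojThreshold=thresh)
    return 0 if inliers is None else int(inliers.sum())

# Candidates from the bag-of-words ranking are re-ranked by this count.
```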
Results
Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval
Chum et al.
ICCV 2007
Citations: over 150 as of 2011
Query Expansion
● Improve recall by re-querying with a combination of the original query and spatially verified results
[Diagram: query image, database, and retrieved results]
Query Expansion
● Spatial verification
● Similar to the technique used in [Philbin et al. 07]; uses a RANSAC-like algorithm
● Identifies a set of images that are very similar to the original query image
BoW Interpreted Probabilistically
● Extract a generative model of an object from the query region
● Compute a response set of images that are likely to have been generated from the model
● The generative model: a spatial configuration of visual words plus background clutter
Generative Models
● Query expansion baseline
● Average the term frequency vectors of the top 5 results, without verification
● Transitive closure expansion
● A priority queue of verified images is keyed by the # of inliers
● Take the top image and issue it as a new query
● Average query expansion (see the sketch below)
● A new query is constructed by averaging the top 50 verified results (d_i is the term frequency vector of the i-th verified image)
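A minimal sketch of average query expansion under these definitions: the new query averages the original tf vector d_q with the d_i of the top verified results before re-querying (the cap of 50 follows the slide):

```python
import numpy as np

def average_query_expansion(d_q, verified_tf_vectors, m=50):
    """d_q: query tf vector; verified_tf_vectors: d_i, best-ranked first."""
    top = np.asarray(verified_tf_vectors[:m])
    d_avg = (d_q + top.sum(axis=0)) / (len(top) + 1)
    return d_avg   # issue this as the new query
```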
Generative Models
● Multiple image resolution expansion
● Consider images at different resolutions; higher resolutions give more detailed information
● Use resolution bands of (0, 4/5), (2/3, 3/2), and (5/4, ∞)
● Use averaged queries for each resolution band
● Show the best result
mAP Results
Results
[Figure: the original query, the top 4 retrieved images, and expanded results that were not identified by the original query]
Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases
Philbin et al.
CVPR 2008
Citations: over 175 as of 2011
Soft Quantization [Philbin et al. CVPR 08]
● Under hard assignment, descriptors 3 and 4 will never be matched
● There is no way to tell that 2 and 3 are closer to each other than 1 and 2
● Soft assignment: use a weight vector
● A weight is assigned to each nearby cluster, decreasing with the distance between the descriptor and the cluster center (see the sketch below)
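A hedged sketch of soft assignment, assuming SciPy; the paper weights each of the r nearest words by exp(-d^2 / (2*sigma^2)), and r = 3 and the sigma value here are illustrative choices:

```python
import numpy as np
from scipy.spatial import cKDTree

def soft_assign(desc, centers, r=3, sigma=100.0):
    """Return the r nearest visual words and their normalized weights."""
    d, idx = cKDTree(centers).query(desc, k=r)
    w = np.exp(-(d ** 2) / (2 * sigma ** 2))
    return idx, w / w.sum()
```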
Results