Geometric VLAD for Large Scale Image Search
Zixuan Wang, Wei Di, Anurag Bhardwaj, Vignesh Jagadesh, Robinson Piramuthu
ICML 2014 Workshop on New Learning Frameworks and Models for Big Data
Our Goal
1) Robust to various imaging conditions
2) Small memory footprint
3) Speed (< 1 s per query)
Issues with Matching Images (1/2): Photometric Invariance
• Brightness
• Exposure
Issues with Matching Images (2/2): Geometric Invariance
• Rotation
• Translation
• Scale
State-of-the-art: Bag-of-Words (BoW)
[Pipeline diagram: keypoint detection → descriptor computation → codebook construction → BoW encoding → inverted indices (size = 200k) over the image inventory. Slide evolved from Fei-Fei Li.]
Issues with BoW Matching
• Weak matching schema:
  – for a "small" visual dictionary: too many false matches
  – for a "large" visual dictionary: many true matches are missed
• Hard to find vocabulary size trade-offs
• Large inverted index size
Recent Approaches for Very Large Scale Indexing
[Pipeline diagram: keypoint detection → descriptor computation → codebook construction → vector encoding → vector compression → nearest neighbor search on compact codes (size = 128) over the image inventory.]
VLAD: Vector of Locally Aggregated Descriptors
• For a given image, assign each descriptor x to its closest center c_i
• Accumulate (sum) the residuals per cell: v_i := v_i + (x - c_i)
• The residual (x - c_i) adds useful information
• VLAD dimension D = k × d, with typical k = 64
• A 128-dimensional VLAD has better performance than a 65k BoW!
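A minimal NumPy sketch of the VLAD aggregation described above; the inputs `descriptors` (n × d local descriptors, e.g. SURF) and `centroids` (k × d k-means codebook) are assumed, and this is an illustrative implementation rather than the authors' exact code.

```python
import numpy as np

def vlad(descriptors, centroids):
    """Aggregate local descriptors into a VLAD vector.

    descriptors: (n, d) array of local descriptors (e.g. SURF).
    centroids:   (k, d) array of k-means cluster centers.
    Returns a flattened (k * d,) VLAD descriptor.
    """
    k, d = centroids.shape
    # Assign each descriptor to its nearest centroid.
    dists = np.linalg.norm(descriptors[:, None, :] - centroids[None, :, :], axis=2)
    assignments = np.argmin(dists, axis=1)

    v = np.zeros((k, d))
    for i in range(k):
        members = descriptors[assignments == i]
        if len(members) > 0:
            # Accumulate residuals: v_i += (x - c_i) for every assigned x.
            v[i] = (members - centroids[i]).sum(axis=0)
    return v.reshape(-1)
```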
Issue with VLAD
[Diagram: two different keypoint configurations around the same center c_i, each with four descriptors at residual r.]
• VLAD: v_i := v_i + (x - c_i)
• Both configurations aggregate to r + r + r + r = 4r
• VLAD fails to capture geometry information
gVLAD: Incorporating Geometry in VLAD
[Diagram: the same two configurations, now split into angle bins Bin 1 and Bin 2.]
• Take 2 angle bins: [-30°, 120°) and [120°, 330°)
• Accumulate v_i := v_i + (x - c_i) separately per angle bin
• Configuration 1: Bin 1 = r + r + r + r, Bin 2 = 0, so gVLAD = (4r, 0)
• Configuration 2: Bin 1 = r + r, Bin 2 = r + r, so gVLAD = (2r, 2r)
• Angle binning captures the different geometric configurations! (see the sketch below)
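A sketch of the per-angle-bin aggregation, reusing the `vlad` helper from the previous sketch; the inputs `angles` (keypoint orientations in degrees) and the default `bin_edges` are illustrative assumptions.

```python
import numpy as np

def gvlad(descriptors, angles, centroids, bin_edges=(-30.0, 120.0, 330.0)):
    """Geometric VLAD: one VLAD block per keypoint-angle bin, concatenated.

    descriptors: (n, d) local descriptors.
    angles:      (n,) keypoint orientations in degrees.
    centroids:   (k, d) codebook.
    bin_edges:   angle bin boundaries, e.g. [-30, 120) and [120, 330).
    """
    k, d = centroids.shape
    # Map angles into [bin_edges[0], bin_edges[0] + 360) so every bin is covered.
    a = (np.asarray(angles) - bin_edges[0]) % 360.0 + bin_edges[0]
    blocks = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = (a >= lo) & (a < hi)
        if mask.any():
            blocks.append(vlad(descriptors[mask], centroids))
        else:
            blocks.append(np.zeros(k * d))
    return np.concatenate(blocks)  # dimension = (#bins) * k * d
```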
Power of Keypoint Angle Features
Retrieval performance using only the angle histogram:

Feature (dim)      mAP
Angle Bin (8)      0.15
Angle Bin (18)     0.24
Angle Bin (36)     0.26
Angle Bin (72)     0.27
GIST (544)         0.35
BoW (20,000)       0.45

Using only a 72-D angle-bin histogram already performs well!
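A sketch of the angle-histogram baseline behind this table: each image is represented by a normalized histogram of its keypoint orientations and images are ranked by histogram distance. The bin count and the L2 distance here are illustrative assumptions, not necessarily the paper's exact setup.

```python
import numpy as np

def angle_histogram(angles_deg, n_bins=72):
    """Represent an image by the normalized histogram of its keypoint orientations."""
    hist, _ = np.histogram(np.asarray(angles_deg) % 360.0,
                           bins=n_bins, range=(0.0, 360.0))
    hist = hist.astype(np.float64)
    return hist / max(hist.sum(), 1e-12)

def rank_by_angle_hist(query_angles, db_angle_lists, n_bins=72):
    """Rank database images by L2 distance between angle histograms (smallest first)."""
    q = angle_histogram(query_angles, n_bins)
    db = np.stack([angle_histogram(a, n_bins) for a in db_angle_lists])
    return np.argsort(np.linalg.norm(db - q, axis=1))
```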
Datasets & Vocabularies
• Holidays: 1491 images / 500 queries
• Paris 6K: 6412 images / 60 queries
• Oxford 5K: 5062 images / 55 queries
• Rotated Holidays
• Large-scale distractors: Flickr 100K, Flickr 1M
• Vocabulary: k-means clustering on SURF descriptors with k = 256, learned on the Paris dataset
Datasets: Holidays & Oxford
[Example query images from the Holidays and Oxford datasets.]
Example Distractors – Flickr
gVLAD: Keypoint Detection & Descriptor Extraction
[Diagram: each detected keypoint yields a feature descriptor and an orientation angle.]
gVLAD: Learning Angle Membership
[Plots: mAP vs. angle-bin offset (0–100) with 4 bins, on Rotated Holidays (mAP ≈ 0.815–0.855) and Oxford (mAP ≈ 0.55–0.63).]
gVLAD: Learning Angle Membership
• von Mises distribution fit to keypoint angles
[Plot: histogram of Holidays SURF keypoint angles (8,233,763 keypoints).]
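A hedged sketch of how angle membership could be scored with von Mises components; the bin centers `mus_deg` and concentration `kappa` below are placeholder values for illustration, not the parameters learned in the paper.

```python
import numpy as np
from scipy.stats import vonmises

def angle_memberships(angles_deg, mus_deg=(0.0, 90.0, 180.0, 270.0), kappa=4.0):
    """Soft membership of each keypoint angle to each angular bin.

    angles_deg: (n,) keypoint orientations in degrees.
    mus_deg:    bin centers in degrees (placeholder values).
    kappa:      von Mises concentration (placeholder value).
    Returns an (n, #bins) matrix of normalized membership weights.
    """
    theta = np.deg2rad(np.asarray(angles_deg))
    mus = np.deg2rad(np.asarray(mus_deg))
    # Evaluate one von Mises density per bin center, then normalize per keypoint.
    dens = np.stack([vonmises.pdf(theta, kappa, loc=mu) for mu in mus], axis=1)
    return dens / dens.sum(axis=1, keepdims=True)
```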
gVLAD: Vocabulary Adaptation
• Adapt existing codebooks with an incremental dataset
• Alleviates the need for frequent large-scale codebook training
[Diagram: an initial codebook learned on the initial dataset is adapted into a new codebook (centers k1, k2, k3) when a new dataset arrives.]
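One plausible way to realize this adaptation, sketched below, is to warm-start k-means on the new descriptors from the existing centroids so the centers only shift slightly; this is an illustrative assumption, not necessarily the exact update rule used in the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def adapt_codebook(old_centroids, new_descriptors, n_iter=10):
    """Adapt an existing codebook to new data without full retraining.

    old_centroids:   (k, d) previously learned cluster centers.
    new_descriptors: (n, d) descriptors from the incremental dataset.
    """
    k = old_centroids.shape[0]
    # Warm-start k-means at the old centers and run only a few iterations.
    km = KMeans(n_clusters=k, init=old_centroids, n_init=1, max_iter=n_iter)
    km.fit(new_descriptors)
    return km.cluster_centers_
```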
gVLAD: Compute Descriptors
[Chart: effect of descriptor normalization; inter-norm ≈ 17.7% on the Rotated Holidays dataset.]
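Below is a minimal sketch of two standard VLAD normalization schemes, per-block (intra) normalization followed by a global L2 step; exactly which variant the slide's "inter-norm" number refers to is my assumption, so treat this as illustrative.

```python
import numpy as np

def normalize_gvlad(v, d, per_block=True):
    """Normalize a flattened (g)VLAD vector.

    v: flattened descriptor of length (#bins * k * d) or (k * d).
    d: local descriptor dimension (block size per visual word).
    per_block: if True, L2-normalize each length-d block first, then apply global L2.
    """
    v = np.asarray(v, dtype=np.float64).copy()
    if per_block:
        blocks = v.reshape(-1, d)                      # one row per visual word
        norms = np.linalg.norm(blocks, axis=1, keepdims=True)
        blocks /= np.maximum(norms, 1e-12)             # avoid division by zero
        v = blocks.reshape(-1)
    return v / max(np.linalg.norm(v), 1e-12)           # global L2 normalization
```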
gVLAD: PCA Whitening
[Chart: whitened gVLAD at lower dimensionality, ≈ 16.6% on the Rotated Holidays dataset.]
gVLAD: PCA Whitening
• Dimension reduction on the original gVLAD using PCA
• Going from 65,536 → 128 dimensions, the mAP decreases by only about 1%.
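A sketch of that reduction using scikit-learn's whitened PCA; the training matrix `gvlad_train` (one full-size gVLAD descriptor per row) is an assumed input.

```python
import numpy as np
from sklearn.decomposition import PCA

def learn_projection(gvlad_train, out_dim=128):
    """Learn a whitening PCA projection from the full gVLAD dimension to out_dim.

    gvlad_train: (n_images, 65536) matrix of full-size gVLAD descriptors
                 (needs at least out_dim training images).
    """
    pca = PCA(n_components=out_dim, whiten=True)
    pca.fit(gvlad_train)
    return pca

def project(pca, gvlad_vec):
    """Project a single gVLAD descriptor and re-apply L2 normalization."""
    low = pca.transform(gvlad_vec.reshape(1, -1))[0]
    return low / max(np.linalg.norm(low), 1e-12)
```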
Experiment: Full-Size gVLAD on Holidays & Oxford
[Table: mAP compared with state-of-the-art methods (2003, 2010, 2013); improvements of ~16.6% and ~7.1%; best results in bold.]
• Full-size gVLAD descriptors
• Compared with state-of-the-art results
• SURF detector & SURF descriptor are used
Experiment: Low-Dimensional gVLAD on Holidays & Oxford
[Table: mAP compared with state-of-the-art methods (2003, 2010, 2012, 2013); improvements of ~15.4% and ~15.2%; best results in bold.]
• Low-dimensional descriptors, reduced to 128 dimensions
• Comparison with state-of-the-art
Experiment: Large-Scale Datasets
[Table: results with 100K / 1M Flickr distractors vs. state-of-the-art (2008, 2013); average improvements of ~12.5% and ~16.3%; best results in bold.]
• Large-scale data with 100K / 1M distractors
• Comparison with state-of-the-art
Take Home Message
[Chart: mAP (≈ 0.4–0.8) across 11 settings comparing our gVLAD with VLAD, VLAD+SSR, Improved Fisher, BoW, and MultiVoc+VLAD baselines (2003–2013).]
Thank You
BACKUP
Speed and Memory
Speed
• ~750 ms per query
Memory
• 0.5 KB per image for 128-D features
• 0.5 GB for 1M images
• 500 GB for 1B images
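The memory figures follow directly from storing 128 single-precision floats per image; a quick arithmetic check:

```python
# 128-D float32 descriptor per image.
bytes_per_image = 128 * 4
print(bytes_per_image)              # 512 B  ≈ 0.5 KB per image
print(bytes_per_image * 1e6 / 1e9)  # 0.512 GB ≈ 0.5 GB for 1M images
print(bytes_per_image * 1e9 / 1e9)  # 512 GB ≈ 500 GB for 1B images
```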
Comparison with CNN-Based Approaches
• Neural Codes: "Neural Codes for Image Retrieval", A. Babenko, A. Slesarev, A. Chigorin, and V. Lempitsky, arXiv, April 2014.
• MOP-CNN: "Multi-scale Orderless Pooling of Deep Convolutional Activation Features", Yunchao Gong, Liwei Wang, Ruiqi Guo, and Svetlana Lazebnik, arXiv, March 2014.
[Chart: retrieval performance of gVLAD vs. Neural Codes and MOP-CNN.]