Particular Object Retrieval with Integral Max-pooling of CNN Activations Tolias et al. ICLR 2016 Presented by Jaehyeong Cho
Contents • Introduction • Related works • Main approaches • Results • Conclusion
Introduction • How to find similar images? • Convert each image into a single feature (e.g. BoW, VLAD, CNN) • Measure the similarity between features => Feature quality strongly affects the retrieval results • Are all parts of an image equally representative? • No, it is better to focus on the important regions only • Main contribution • Encodes several image regions into a single compact feature • Localizes matching objects
Related works • Retrieval methods considering spatial information • Babenko and Lempitsky, Aggregating Local Deep Features for Image Retrieval, ICCV 2015 • Aggregates multiple convolutional features from various positions in an image • Gives higher weights to the features near the center
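The center-weighted aggregation described on this slide can be illustrated with a minimal sketch. The Gaussian weight, the `sigma` parameter, and the function name `center_weighted_sum_pool` are assumptions chosen for illustration, not details taken from the slide. Activations are stored as K channels, each a list of H rows with W values.

```python
import math

def center_weighted_sum_pool(activations, sigma):
    """Sum-pool each channel, weighting positions by closeness to the center.

    The Gaussian weighting is an assumption made for this sketch.
    """
    height = len(activations[0])
    width = len(activations[0][0])
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    descriptor = []
    for channel in activations:
        total = 0.0
        for y in range(height):
            for x in range(width):
                dist2 = (y - cy) ** 2 + (x - cx) ** 2
                weight = math.exp(-dist2 / (2.0 * sigma ** 2))
                total += weight * channel[y][x]
        descriptor.append(total)
    return descriptor

# One channel, 3 x 3: the same activation placed at the center or in a corner.
center_map = [[[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]]
corner_map = [[[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]]
center_value = center_weighted_sum_pool(center_map, sigma=1.0)[0]
corner_value = center_weighted_sum_pool(corner_map, sigma=1.0)[0]
```

In this toy case the central activation contributes more to the descriptor than the identical activation in the corner.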
Related works • Retrieval methods considering spatial information • Kalantidis et al., Cross-dimensional Weighting for Aggregated Deep Convolutional Features, ECCV Workshops 2016 • Assigns different weights according to channel and location
Related works • Retrieval methods considering spatial information • Xie et al., Image Classification and Retrieval are ONE, ICMR 2015 • Extracts CNN features from object regions • Represents an image with multiple features
Main approaches • Maximum activations of convolutions (MAC) • Proposed by Azizpour et al., 2014 • CNN activations for an image I: a W × H × K tensor • Uses only the maximum activation of each channel, giving a K-dimensional feature • Captures the most representative regions • But lacks location information
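MAC can be sketched in a few lines. This is a minimal illustration on toy data, assuming the activations are stored as K channels, each a list of H rows with W values; the function name `mac` is hypothetical.

```python
def mac(activations):
    """Return one value per channel: the maximum activation over all positions."""
    return [max(max(row) for row in channel) for channel in activations]

# Toy activations: K = 2 channels, each with H = 2 rows and W = 3 columns.
X = [
    [[0.1, 0.9, 0.2],
     [0.0, 0.3, 0.4]],
    [[0.5, 0.1, 0.0],
     [0.2, 0.7, 0.6]],
]
f = mac(X)  # K-dimensional descriptor

# Mirroring every feature map moves the activations but not the descriptor,
# which is the "lacks location information" point on the slide.
flipped = [[row[::-1] for row in channel] for channel in X]
same_descriptor = mac(flipped) == f
```

The mirrored input produces the same descriptor, which shows why a single global maximum per channel cannot tell where in the image a pattern occurred.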
Main approaches • Regional maximum activations of convolutions (R-MAC) • Extract MAC from multiple regions => Encodes the location information • Makes a single feature by summation
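The R-MAC idea above can be sketched as follows. Assumptions for this illustration: regions are hand-picked `(x0, y0, w, h)` rectangles on the feature map rather than a particular sampling scheme, each regional vector is l2-normalized before summation, and any further post-processing of the regional vectors is omitted. All function names are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale v to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0 else list(v)

def region_mac(activations, region):
    """MAC restricted to the rectangle (x0, y0, w, h) of every channel."""
    x0, y0, w, h = region
    return [max(max(row[x0:x0 + w]) for row in channel[y0:y0 + h])
            for channel in activations]

def rmac(activations, regions):
    """Sum the normalized regional MAC vectors, then normalize the sum."""
    total = [0.0] * len(activations)
    for region in regions:
        regional = l2_normalize(region_mac(activations, region))
        total = [t + a for t, a in zip(total, regional)]
    return l2_normalize(total)

# Toy activations: K = 2 channels, H = 2 rows, W = 4 columns.
X = [
    [[0.1, 0.9, 0.2, 0.0],
     [0.0, 0.3, 0.4, 0.8]],
    [[0.5, 0.1, 0.0, 0.2],
     [0.2, 0.7, 0.6, 0.1]],
]
regions = [(0, 0, 2, 2), (2, 0, 2, 2), (0, 0, 4, 2)]  # left half, right half, whole map
left = region_mac(X, regions[0])
right = region_mac(X, regions[1])
descriptor = rmac(X, regions)
```

The result is still a single K-dimensional vector, so it can be compared with other images exactly like a global MAC feature.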
Main approaches • Object localization • q: MAC feature from the query object (blue) • Find the region R that maximizes the cosine similarity f_R^T q / (‖f_R‖ ‖q‖) between the regional MAC feature f_R and q • Fast computation of f_R is required
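The localization step can be illustrated with a brute-force search, taking cosine similarity as the measure. This sketch scores every rectangle of the feature map, which is only practical on toy inputs and is exactly why fast computation of the regional feature matters. The function names and the toy data are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def region_mac(activations, region):
    """MAC restricted to the rectangle (x0, y0, w, h) of every channel."""
    x0, y0, w, h = region
    return [max(max(row[x0:x0 + w]) for row in channel[y0:y0 + h])
            for channel in activations]

def localize(activations, q):
    """Exhaustively score all rectangles; keep the first best-scoring one."""
    height = len(activations[0])
    width = len(activations[0][0])
    best_score, best_region = -1.0, None
    for y0 in range(height):
        for x0 in range(width):
            for h in range(1, height - y0 + 1):
                for w in range(1, width - x0 + 1):
                    region = (x0, y0, w, h)
                    score = cosine(region_mac(activations, region), q)
                    if score > best_score:
                        best_score, best_region = score, region
    return best_region, best_score

# Channel 0 fires on the left of the map, channel 1 on the right.
X = [
    [[0.9, 0.1, 0.0],
     [0.8, 0.2, 0.0]],
    [[0.0, 0.1, 0.7],
     [0.1, 0.0, 0.9]],
]
q = [0.0, 1.0]  # toy query feature that responds only on channel 1
region, score = localize(X, q)
whole_image_score = cosine(region_mac(X, (0, 0, 3, 2)), q)
```

On this toy input the best region lies in the rightmost column, where only channel 1 is active, and it scores higher than the whole-image feature.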
Main approaches • Object localization • Approximation of f_R • Localization result helps re-ranking
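One way to approximate a regional maximum quickly is sketched below: replace the max over a region by a generalized mean, (sum of x^alpha)^(1/alpha), and obtain the sum from an integral image with four lookups per region. This is a sketch under stated assumptions: activations are non-negative, `alpha = 10` is an illustrative exponent, and the function names are hypothetical.

```python
def integral_image(channel, alpha):
    """S[y][x] holds the sum of channel[j][i] ** alpha over all j < y, i < x."""
    height, width = len(channel), len(channel[0])
    S = [[0.0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0.0
        for x in range(width):
            row_sum += channel[y][x] ** alpha
            S[y + 1][x + 1] = S[y][x + 1] + row_sum
    return S

def approx_region_max(S, region, alpha):
    """Approximate the max over (x0, y0, w, h) using four lookups in S."""
    x0, y0, w, h = region
    x1, y1 = x0 + w, y0 + h
    total = S[y1][x1] - S[y0][x1] - S[y1][x0] + S[y0][x0]
    # Clamp tiny negative values that floating-point subtraction can produce.
    return max(total, 0.0) ** (1.0 / alpha)

channel = [[0.1, 0.9, 0.2],
           [0.0, 0.3, 0.4]]
alpha = 10
S = integral_image(channel, alpha)  # built once per channel
approx_whole = approx_region_max(S, (0, 0, 3, 2), alpha)  # exact max is 0.9
approx_right = approx_region_max(S, (2, 0, 1, 2), alpha)  # exact max is 0.4
```

After the one-time pass that builds the integral image, the cost per region no longer depends on the region's size, which is what makes scoring many candidate regions feasible.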
Results • Comparison of retrieval accuracy • without post-processing
Results • Comparison of retrieval accuracy • with post-processing
Results • Re-ranking with object localization
Conclusion • Produced an improved feature vector by encoding location information into it • Approximated the max-pooling process for fast computation • Localized the target object and used the localization effectively for re-ranking
References • Babenko, Artem, and Victor Lempitsky. "Aggregating local deep features for image retrieval." Proceedings of the IEEE International Conference on Computer Vision. 2015. • Kalantidis, Yannis, Clayton Mellina, and Simon Osindero. "Cross-dimensional weighting for aggregated deep convolutional features." arXiv preprint arXiv:1512.04065 (2015). • Xie, Lingxi, et al. "Image classification and retrieval are one." Proceedings of the 5th ACM on International Conference on Multimedia Retrieval. ACM, 2015. • Azizpour, Hossein, et al. "From generic to specific deep representations for visual recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2015.