

  1. Visual Instance Retrieval. Praveen Krishnan, CVIT, IIIT Hyderabad. June 15, 2017

  2. Outline: Image Retrieval; Instance Level Search; Deep Image Retrieval; Neural Codes for Image Retrieval; Local Convolutional Features; Multi-Scale Orderless Pooling; Sum-Pooled Convolutional Features; Integral Max Pooling; Case Study: Gordo et al. ECCV'16

  3. Image Retrieval. The image retrieval problem: given a query object, retrieve all candidate objects from the database which match the query irrespective of viewpoint changes, illumination, scale and location.
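At its core this boils down to nearest-neighbor search over global image descriptors. A minimal sketch, assuming every database image has already been mapped to a fixed-length descriptor (all names and shapes here are illustrative):

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit L2 norm so that dot product equals cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def retrieve(query_desc, db_descs, top_k=10):
    """Rank database images by cosine similarity to the query descriptor."""
    q = l2_normalize(query_desc)
    db = l2_normalize(db_descs)
    scores = db @ q                      # (N,) cosine similarities
    order = np.argsort(-scores)[:top_k]  # indices of the best matches
    return order, scores[order]

# Toy usage: 1000 database images with 512-D descriptors, one query.
db_descs = np.random.randn(1000, 512)
query_desc = np.random.randn(512)
ranks, scores = retrieve(query_desc, db_descs)
```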

  4. Instance Level Search: Visual Search (J. Sivic)

  5. Instance Level Search: Search photos on the web for particular places (J. Sivic)

  6. Instance Level Search: Retrieval Challenges (J. Sivic)

  7. Instance Level Search. Problem: how to learn a class-agnostic, compact and efficient image representation that is robust to the retrieval challenges?

  8. Instance Level Search. Solution: local feature aggregation of learned neural codes. ◮ Inspired by BoVW-based encoding and pooling schemes.

  9. Neural Codes for Image Retrieval. Neural codes: the feature activations from the top layers of a CNN used as high-level descriptors. Babenko et al. ECCV'14

  10. Neural Codes for Image Retrieval. Neural codes: ◮ Using networks pretrained on ILSVRC. ◮ Fine-tuning on a related dataset. Compressed neural codes: ◮ PCA compression. ◮ Discriminative dimensionality reduction. ◮ Metric learning: learning a low-rank projection matrix W. ◮ Training data: build a matching graph using a standard image-matching pipeline such as SIFT + NN matching + RANSAC. Babenko et al. ECCV'14
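A rough sketch of the PCA compression of neural codes mentioned above, assuming the codes (e.g. FC-layer activations) are already extracted into a matrix; the whitened variant additionally divides by the singular values, which reappears later in SPoC:

```python
import numpy as np

def fit_pca(codes, dim=128):
    """Fit PCA on a matrix of neural codes (one row per image) via SVD."""
    mean = codes.mean(axis=0)
    _, sing, vt = np.linalg.svd(codes - mean, full_matrices=False)
    return mean, vt[:dim], sing[:dim]        # mean, projection rows, singular values

def compress(code, mean, proj, sing, whiten=True, eps=1e-12):
    """Project a neural code to a compact descriptor, optionally whiten, then L2-normalize."""
    z = proj @ (code - mean)
    if whiten:
        z = z / (sing + eps)
    return z / (np.linalg.norm(z) + eps)

# Toy usage: 512-D neural codes from 2000 images, compressed to 128-D.
codes = np.random.randn(2000, 512)
mean, proj, sing = fit_pca(codes, dim=128)
desc = compress(codes[0], mean, proj, sing)
```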

  11. Neural Codes for Image Retrieval: Results. Babenko et al. ECCV'14

  12. Local Convolutional Features ◮ Activations from convolutional layers interpreted as local feature codes. ◮ Pooling of local features to produce compact global descriptors, e.g. VLAD, Fisher Vectors. ◮ More discriminative and fewer false positives. We will now see different ways to pool such codes into a global representation.

  13. Multi-Scale Orderless Pooling: MOP-CNN. Building an orderless representation on top of the (globally ordered) CNN activations in a multi-scale manner. Figure 1: Classification of CNN activations of local patches in an image; note the sensitivity of the predictions w.r.t. the patches. Gong et al. ECCV'14

  14. Multi-Scale Orderless Pooling: MOP-CNN. Gong et al. ECCV'14
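A much-simplified sketch of the MOP-CNN aggregation: patch activations are collected at a few scales and each scale is pooled orderlessly with VLAD before concatenation. The `cnn_descriptor` callable, the codebook, and the patch sizes/stride are placeholders, not the paper's exact settings:

```python
import numpy as np

def vlad(descriptors, centroids, eps=1e-12):
    """Aggregate local descriptors into a VLAD vector given a k-means codebook."""
    k, d = centroids.shape
    assign = np.argmin(((descriptors[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
    v = np.zeros((k, d))
    for c in range(k):
        v[c] = (descriptors[assign == c] - centroids[c]).sum(axis=0)  # residuals per cluster
    v = np.sign(v) * np.sqrt(np.abs(v))      # power normalization
    v = v.ravel()
    return v / (np.linalg.norm(v) + eps)

def mop_cnn(image, cnn_descriptor, centroids, patch_sizes=(256, 128, 96), stride=32):
    """Multi-scale orderless pooling: VLAD over patch descriptors at each scale, concatenated."""
    H, W = image.shape[:2]
    per_scale = []
    for ps in patch_sizes:
        descs = []
        for y in range(0, max(H - ps, 0) + 1, stride):
            for x in range(0, max(W - ps, 0) + 1, stride):
                descs.append(cnn_descriptor(image[y:y + ps, x:x + ps]))
        per_scale.append(vlad(np.stack(descs), centroids))
    return np.concatenate(per_scale)

# Toy usage with a dummy "CNN" that averages pixel values per channel.
dummy_cnn = lambda patch: patch.reshape(-1, patch.shape[-1]).mean(axis=0)
codebook = np.random.randn(16, 3)            # 16 centroids over 3-D toy descriptors
image = np.random.rand(384, 512, 3)
descriptor = mop_cnn(image, dummy_cnn, codebook)
```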

  15. Sum-Pooled Convolutional Features: SPoC. SPoC design:
1. Sum pooling with a centering prior: $\psi_1(I) = \sum_{y=1}^{H} \sum_{x=1}^{W} \alpha(x, y)\, f(x, y)$, where the $\alpha(x, y)$ are Gaussian weights depending on the spatial coordinates.
2. Post-processing with PCA and whitening: $\psi_2(I) = \mathrm{diag}(s_1, \ldots, s_N)^{-1} M_{\mathrm{PCA}}\, \psi_1(I)$ and $\psi_{\mathrm{SPoC}}(I) = \dfrac{\psi_2(I)}{\|\psi_2(I)\|_2}$, where $M_{\mathrm{PCA}}$ is the PCA matrix and the $s_i$ are the associated singular values.
Babenko et al. ICCV'15
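A minimal SPoC sketch following the two steps above, assuming the convolutional feature map is given as a K x H x W array; the whitening parameters would be fitted offline on held-out images, and the Gaussian width here is only an illustrative choice:

```python
import numpy as np

def spoc(fmap, proj=None, sing=None, sigma_frac=3.0, eps=1e-12):
    """SPoC: Gaussian-center-weighted sum pooling of a K x H x W feature map,
    optionally followed by PCA whitening, with a final L2 normalization."""
    K, H, W = fmap.shape
    ys, xs = np.mgrid[0:H, 0:W]
    cy, cx = (H - 1) / 2.0, (W - 1) / 2.0
    sigma = min(H, W) / sigma_frac                  # width of the centering prior (illustrative)
    alpha = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
    psi = (fmap * alpha[None]).sum(axis=(1, 2))     # psi_1(I), K-dimensional
    if proj is not None:                            # psi_2(I): PCA projection + whitening
        psi = (proj @ psi) / (sing + eps)
    return psi / (np.linalg.norm(psi) + eps)        # psi_SPoC(I)

# Toy usage: a 512-channel conv map of spatial size 37 x 37.
desc = spoc(np.random.rand(512, 37, 37))
```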

  16. Integral Max Pooling: R-MAC. Revisiting traditional Bag of Visual Words: ◮ Compact image representation derived from multiple image regions by global max-pooling. ◮ Approximating max-pooling via integral images for efficient object localization. ◮ Performing image re-ranking and query expansion. Tolias et al. ICLR'16

  17. Integral Max Pooling: R-MAC. Maximum activations of convolutions (MAC): given a set of 2D convolutional feature channel responses $\mathcal{X} = \{\mathcal{X}_i\}$, $i = 1, \ldots, K$, spatial max-pooling over all locations gives
$f_{\Omega} = [f_{\Omega,1}, \ldots, f_{\Omega,i}, \ldots, f_{\Omega,K}]^{T}, \quad \text{with } f_{\Omega,i} = \max_{p \in \Omega} \mathcal{X}_i(p).$
Here $\Omega$ is the set of valid spatial locations, $\mathcal{X}_i(p)$ is the response at position $p$, and $K$ is the number of feature channels. Tolias et al. ICLR'16
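MAC itself reduces to a per-channel spatial maximum; a tiny sketch on an assumed K x H x W feature map:

```python
import numpy as np

def mac(fmap, eps=1e-12):
    """MAC descriptor: per-channel max over all spatial locations, L2-normalized."""
    f = fmap.max(axis=(1, 2))                 # f_{Omega,i} = max_{p in Omega} X_i(p)
    return f / (np.linalg.norm(f) + eps)

desc = mac(np.random.rand(512, 37, 37))       # toy 512-D descriptor
```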

  18. Integral Max Pooling: R-MAC. Regional maximum activation of convolutions (R-MAC):
1. Regional feature vector: for a rectangular region $\mathcal{R} \subseteq \Omega = [1, W] \times [1, H]$,
$f_{\mathcal{R}} = [f_{\mathcal{R},1}, \ldots, f_{\mathcal{R},i}, \ldots, f_{\mathcal{R},K}]^{T}, \quad \text{with } f_{\mathcal{R},i} = \max_{p \in \mathcal{R}} \mathcal{X}_i(p).$
2. Sampling of regions: uniformly at $l$ different scales.
3. Final descriptor: the individual regional vectors are $\ell_2$-normalized, PCA-whitened, summed across all regions, and $\ell_2$-normalized again.
Tolias et al. ICLR'16
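A simplified R-MAC sketch with square regions on a uniform grid at a few scales, max-pooled per region, L2-normalized, optionally PCA-whitened, summed and re-normalized; the paper's exact sampling (regions overlapping by roughly 40%) is approximated here by a coarse grid:

```python
import numpy as np

def rmac(fmap, scales=3, proj=None, sing=None, eps=1e-12):
    """R-MAC: sum of per-region MAC vectors over a multi-scale grid of square regions."""
    K, H, W = fmap.shape
    agg = np.zeros(K if proj is None else proj.shape[0])
    for l in range(1, scales + 1):
        side = int(np.ceil(2 * min(H, W) / (l + 1)))          # region size at scale l
        ys = np.linspace(0, H - side, l, dtype=int) if H > side else [0]
        xs = np.linspace(0, W - side, l, dtype=int) if W > side else [0]
        for y in ys:
            for x in xs:
                f = fmap[:, y:y + side, x:x + side].max(axis=(1, 2))
                f = f / (np.linalg.norm(f) + eps)              # per-region L2 normalization
                if proj is not None:                           # optional PCA whitening
                    f = (proj @ f) / (sing + eps)
                    f = f / (np.linalg.norm(f) + eps)
                agg += f
    return agg / (np.linalg.norm(agg) + eps)                   # final L2 normalization

desc = rmac(np.random.rand(512, 37, 37))
```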

  19. Integral Max Pooling: R-MAC. Object localization:
◮ Approximate integral max-pooling using the generalized mean [Dollár et al. 2009]: $\tilde{f}_{\mathcal{R},i} = \Big( \sum_{p \in \mathcal{R}} \mathcal{X}_i(p)^{\alpha} \Big)^{1/\alpha}$, where $\alpha > 1$ and $\tilde{f}_{\mathcal{R},i} \to f_{\mathcal{R},i}$ as $\alpha \to +\infty$.
◮ Window detection: $\hat{\mathcal{R}} = \arg\max_{\mathcal{R} \subseteq \Omega} \dfrac{\tilde{f}_{\mathcal{R}}^{T} q}{\|\tilde{f}_{\mathcal{R}}\| \, \|q\|}$.
To reduce the search space of windows: ◮ Efficient subwindow search (ESS) [Lampert et al. 2009]. ◮ Approximate max-pooling localization, which uses heuristics.
Tolias et al. ICLR'16
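A sketch of the generalized-mean approximation together with a brute-force window search; the real method prunes candidate windows with ESS or the approximate max-pooling localization heuristics, whereas this toy version scores every square window on a coarse grid:

```python
import numpy as np

def approx_mac(fmap_region, alpha=10.0, eps=1e-12):
    """Generalized-mean approximation of per-channel max pooling over a region
    (assumes non-negative, i.e. post-ReLU, responses)."""
    f = (np.maximum(fmap_region, 0) ** alpha).sum(axis=(1, 2)) ** (1.0 / alpha)
    return f / (np.linalg.norm(f) + eps)

def localize(fmap, q, win=16, step=4):
    """Score square windows against a query MAC vector q; return the best window."""
    K, H, W = fmap.shape
    q = q / (np.linalg.norm(q) + 1e-12)
    best, best_box = -np.inf, None
    for y in range(0, H - win + 1, step):
        for x in range(0, W - win + 1, step):
            f = approx_mac(fmap[:, y:y + win, x:x + win])
            score = float(f @ q)                 # cosine similarity (both unit-norm)
            if score > best:
                best, best_box = score, (y, x, y + win, x + win)
    return best_box, best

# Toy usage: localize a query MAC vector inside a 512 x 37 x 37 feature map.
fmap = np.random.rand(512, 37, 37)
box, score = localize(fmap, q=fmap[:, 5:21, 5:21].max(axis=(1, 2)))
```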

  20. Integral Max Pooling: R-MAC. End-to-end pipeline: 1. Initial retrieval using R-MAC vectors. 2. Re-ranking by localizing the query object in the top-N ranked images. 3. Query expansion by merging the query vector with the top-5 results. Tolias et al. ICLR'16
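A sketch of the three stages operating on precomputed descriptors; the localization-based re-ranking score is abstracted into a callback because it needs the feature maps of the top-ranked images (all names are illustrative):

```python
import numpy as np

def rank(q, db):
    """Stage 1: rank database descriptors by dot product with the (unit-norm) query."""
    return np.argsort(-(db @ q))

def rerank(q, db, order, localization_score, top_n=100):
    """Stage 2: re-score the top-N images with the localization score and re-sort them."""
    top = order[:top_n]
    scores = np.array([localization_score(q, idx) for idx in top])
    return np.concatenate([top[np.argsort(-scores)], order[top_n:]])

def query_expansion(q, db, order, top_k=5, eps=1e-12):
    """Stage 3: average the query with the top-k descriptors and query again."""
    q_exp = q + db[order[:top_k]].sum(axis=0)
    q_exp = q_exp / (np.linalg.norm(q_exp) + eps)
    return rank(q_exp, db)

# Toy usage with random descriptors and a dummy localization score.
db = np.random.randn(1000, 512); db /= np.linalg.norm(db, axis=1, keepdims=True)
q = db[0] + 0.1 * np.random.randn(512); q /= np.linalg.norm(q)
order = rank(q, db)
order = rerank(q, db, order, localization_score=lambda q, i: float(db[i] @ q))
final = query_expansion(q, db, order)
```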

  21. Takeaways so far. Takeaways: ◮ Global image representations can be built from pre-trained networks. ◮ Aggregating local convolutional activations from multiple regions works better than FC-layer activations. ◮ PCA compression, whitening and normalization play an important role. Further questions: ◮ How to leverage deep architectures for the task of image retrieval? ◮ How to deal with non-uniform regions and how to select and pool from them?

  22. Deep Image Retrieval: Gordo et al. ECCV'16. CNN architecture for instance retrieval: ◮ A triplet network that optimizes the R-MAC [Tolias et al. ICLR'16] representation. ◮ Uses a trained region proposal network to generate valid region proposals.
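A sketch of the triplet ranking loss used for this kind of fine-tuning, on L2-normalized descriptors of a query, a relevant and a non-relevant image; the margin value is illustrative and the backward pass is omitted:

```python
import numpy as np

def triplet_loss(q, pos, neg, margin=0.1):
    """Triplet ranking loss: pull the relevant descriptor closer to the query than the
    non-relevant one by at least the margin (hinge on squared distances)."""
    d_pos = np.sum((q - pos) ** 2)
    d_neg = np.sum((q - neg) ** 2)
    return 0.5 * max(0.0, margin + d_pos - d_neg)

# Toy usage with random unit-norm 512-D descriptors.
rng = np.random.default_rng(0)
q, pos, neg = (x / np.linalg.norm(x) for x in rng.standard_normal((3, 512)))
loss = triplet_loss(q, pos, neg)
```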

  23. Deep Image Retrieval: Gordo et al. ECCV'16. Detour: a quick overview of R-CNN, Fast R-CNN and Faster R-CNN.

  24. Deep Image Retrieval: Gordo et al. ECCV'16. Leveraging large-scale noisy data: ◮ Preparation of a cleaned Landmarks dataset. ◮ Generating pairwise scores between image pairs by building a matching graph. ◮ Pruning noise and extracting non-duplicate connected components (see the sketch below). ◮ Leveraging bounding boxes from the cleaned images.
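A small sketch of the graph-cleaning step referenced above: keep pairs whose match score passes a threshold and extract connected components with union-find; the threshold and the score source are placeholders:

```python
def connected_components(num_images, scored_pairs, threshold=0.5):
    """Union-find over images, linking pairs whose match score passes the threshold."""
    parent = list(range(num_images))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i, j, score in scored_pairs:
        if score >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(num_images):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

# Toy usage: 6 images, edges as (i, j, match_score).
pairs = [(0, 1, 0.9), (1, 2, 0.8), (3, 4, 0.7), (4, 5, 0.2)]
components = connected_components(6, pairs)   # -> [[0, 1, 2], [3, 4], [5]]
```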

  25. Deep Image Retrieval: Gordo et al. ECCV'16. Bounding box estimation: 1. Initialization: for each pair of connected images $(i, j)$ with affine transformation matrix $A_{ij}$, find the geometric median of the matched keypoints. 2. Update: run a diffusion process between the pair of bounding boxes $B_i$ and $B_j$: $B'_j = (1 - \alpha) B_j + \alpha\, A_{ij}(B_i)$.
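A sketch of one diffusion update, with boxes as (x1, y1, x2, y2) corners and A_ij as a 3 x 3 affine matrix mapping image i into image j; the convex-combination form $(1 - \alpha) B_j + \alpha\, A_{ij}(B_i)$ is assumed, and the alpha value is illustrative:

```python
import numpy as np

def transfer_box(box, A):
    """Map a box's corner points through a 3 x 3 affine matrix and re-fit an axis-aligned box."""
    x1, y1, x2, y2 = box
    corners = np.array([[x1, y1, 1], [x2, y1, 1], [x2, y2, 1], [x1, y2, 1]], dtype=float)
    warped = corners @ A.T
    warped = warped[:, :2] / warped[:, 2:3]       # back from homogeneous coordinates
    return np.array([warped[:, 0].min(), warped[:, 1].min(),
                     warped[:, 0].max(), warped[:, 1].max()])

def diffuse(box_j, box_i, A_ij, alpha=0.1):
    """One diffusion update: B_j' = (1 - alpha) * B_j + alpha * A_ij(B_i)."""
    return (1.0 - alpha) * np.asarray(box_j, dtype=float) + alpha * transfer_box(box_i, A_ij)

# Toy usage: identity transform, so the update nudges B_j toward B_i.
new_box_j = diffuse([10, 10, 100, 100], [20, 20, 110, 110], np.eye(3))
```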

  26. Deep Image Retrieval: Gordo et al. ECCV'16. Qualitative results.

  27. Thank you
