

  1. Visual Instance Retrieval. Praveen Krishnan, CVIT, IIIT Hyderabad. June 15, 2017

  2. Outline: Image Retrieval; Instance Level Search; Deep Image Retrieval; Neural Codes for Image Retrieval; Local Convolutional Features; Multi-Scale Orderless Pooling; Sum-Pooled Convolutional Features; Integral Max Pooling; Case Study: Gordo et al. ECCV'16

  3. Image Retrieval. The image retrieval problem: given a query object, retrieve all candidate objects from the database which match the query irrespective of viewpoint changes, illumination, scale and location.
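At its core this boils down to nearest-neighbor search over global image descriptors. A minimal sketch, assuming every database image has already been mapped to a fixed-length descriptor (all names and shapes here are illustrative):

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit L2 norm so that dot product equals cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def retrieve(query_desc, db_descs, top_k=10):
    """Rank database images by cosine similarity to the query descriptor."""
    q = l2_normalize(query_desc)
    db = l2_normalize(db_descs)
    scores = db @ q                      # (N,) cosine similarities
    order = np.argsort(-scores)[:top_k]  # indices of the best matches
    return order, scores[order]

# Toy usage: 1000 database images with 512-D descriptors, one query.
db_descs = np.random.randn(1000, 512)
query_desc = np.random.randn(512)
ranks, scores = retrieve(query_desc, db_descs)
```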

  4. Instance Level Search: Visual Search (J. Sivic)

  5. Instance Level Search: Search photos on the web for particular places (J. Sivic)

  6. Instance Level Search: Retrieval Challenges (J. Sivic)

  7. Instance Level Search. Problem: how to learn a class-agnostic, compact and efficient image representation that is robust to the retrieval challenges?

  8. Instance Level Search. Solution: local feature aggregation of learned neural codes. ◮ Inspired by BoVW-based encoding and pooling schemes.

  9. Neural Codes for Image Retrieval. Neural codes: the feature activations from the top layers of a CNN used as high-level descriptors. Babenko et al. ECCV'14

  10. Neural Codes for Image Retrieval. Neural codes: ◮ Using networks pretrained on ILSVRC. ◮ Fine-tuning on a related dataset. Compressed neural codes: ◮ PCA compression. ◮ Discriminative dimensionality reduction. ◮ Metric learning: learning a low-rank projection matrix W. ◮ Training data: build a matching graph using a standard image-matching pipeline such as SIFT + NN matching + RANSAC. Babenko et al. ECCV'14
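A rough sketch of the PCA compression of neural codes mentioned above, assuming the codes (e.g. FC-layer activations) are already extracted into a matrix; the whitened variant additionally divides by the singular values, which reappears later in SPoC:

```python
import numpy as np

def fit_pca(codes, dim=128):
    """Fit PCA on a matrix of neural codes (one row per image) via SVD."""
    mean = codes.mean(axis=0)
    _, sing, vt = np.linalg.svd(codes - mean, full_matrices=False)
    return mean, vt[:dim], sing[:dim]        # mean, projection rows, singular values

def compress(code, mean, proj, sing, whiten=True, eps=1e-12):
    """Project a neural code to a compact descriptor, optionally whiten, then L2-normalize."""
    z = proj @ (code - mean)
    if whiten:
        z = z / (sing + eps)
    return z / (np.linalg.norm(z) + eps)

# Toy usage: 512-D neural codes from 2000 images, compressed to 128-D.
codes = np.random.randn(2000, 512)
mean, proj, sing = fit_pca(codes, dim=128)
desc = compress(codes[0], mean, proj, sing)
```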

  11. Neural Codes for Image Retrieval: Results. Babenko et al. ECCV'14

  12. Local Convolutional Features ◮ Activations from convolutional layers interpreted as local feature codes. ◮ Pooling of local features to produce compact global descriptors, e.g. VLAD, Fisher Vectors. ◮ More discriminative and fewer false positives. We will now see different ways to pool such codes into a global representation.

  13. Multi-Scale Orderless Pooling: MOP-CNN. Building an orderless representation on top of the (globally ordered) CNN activations in a multi-scale manner. Figure 1: Classification of CNN activations of local patches in an image; note the sensitivity of the predictions w.r.t. the patches. Gong et al. ECCV'14

  14. Multi-Scale Orderless Pooling: MOP-CNN. Gong et al. ECCV'14
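A much-simplified sketch of the MOP-CNN aggregation: patch activations are collected at a few scales and each scale is pooled orderlessly with VLAD before concatenation. The `cnn_descriptor` callable, the codebook, and the patch sizes/stride are placeholders, not the paper's exact settings:

```python
import numpy as np

def vlad(descriptors, centroids, eps=1e-12):
    """Aggregate local descriptors into a VLAD vector given a k-means codebook."""
    k, d = centroids.shape
    assign = np.argmin(((descriptors[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
    v = np.zeros((k, d))
    for c in range(k):
        v[c] = (descriptors[assign == c] - centroids[c]).sum(axis=0)  # residuals per cluster
    v = np.sign(v) * np.sqrt(np.abs(v))      # power normalization
    v = v.ravel()
    return v / (np.linalg.norm(v) + eps)

def mop_cnn(image, cnn_descriptor, centroids, patch_sizes=(256, 128, 96), stride=32):
    """Multi-scale orderless pooling: VLAD over patch descriptors at each scale, concatenated."""
    H, W = image.shape[:2]
    per_scale = []
    for ps in patch_sizes:
        descs = []
        for y in range(0, max(H - ps, 0) + 1, stride):
            for x in range(0, max(W - ps, 0) + 1, stride):
                descs.append(cnn_descriptor(image[y:y + ps, x:x + ps]))
        per_scale.append(vlad(np.stack(descs), centroids))
    return np.concatenate(per_scale)

# Toy usage with a dummy "CNN" that averages pixel values per channel.
dummy_cnn = lambda patch: patch.reshape(-1, patch.shape[-1]).mean(axis=0)
codebook = np.random.randn(16, 3)            # 16 centroids over 3-D toy descriptors
image = np.random.rand(384, 512, 3)
descriptor = mop_cnn(image, dummy_cnn, codebook)
```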

  15. Sum-Pooled Convolutional Features: SPoC. SPoC design:
1. Sum pooling with a centering prior: $\psi_1(I) = \sum_{y=1}^{H} \sum_{x=1}^{W} \alpha(x, y)\, f(x, y)$, where the $\alpha(x, y)$ are Gaussian weights depending on the spatial coordinates.
2. Post-processing with PCA and whitening: $\psi_2(I) = \mathrm{diag}(s_1, \ldots, s_N)^{-1} M_{\mathrm{PCA}}\, \psi_1(I)$ and $\psi_{\mathrm{SPoC}}(I) = \dfrac{\psi_2(I)}{\|\psi_2(I)\|_2}$, where $M_{\mathrm{PCA}}$ is the PCA matrix and the $s_i$ are the associated singular values.
Babenko et al. ICCV'15
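A minimal SPoC sketch following the two steps above, assuming the convolutional feature map is given as a K x H x W array; the whitening parameters would be fitted offline on held-out images, and the Gaussian width here is only an illustrative choice:

```python
import numpy as np

def spoc(fmap, proj=None, sing=None, sigma_frac=3.0, eps=1e-12):
    """SPoC: Gaussian-center-weighted sum pooling of a K x H x W feature map,
    optionally followed by PCA whitening, with a final L2 normalization."""
    K, H, W = fmap.shape
    ys, xs = np.mgrid[0:H, 0:W]
    cy, cx = (H - 1) / 2.0, (W - 1) / 2.0
    sigma = min(H, W) / sigma_frac                  # width of the centering prior (illustrative)
    alpha = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
    psi = (fmap * alpha[None]).sum(axis=(1, 2))     # psi_1(I), K-dimensional
    if proj is not None:                            # psi_2(I): PCA projection + whitening
        psi = (proj @ psi) / (sing + eps)
    return psi / (np.linalg.norm(psi) + eps)        # psi_SPoC(I)

# Toy usage: a 512-channel conv map of spatial size 37 x 37.
desc = spoc(np.random.rand(512, 37, 37))
```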

  16. Integral Max Pooling: R-MAC. Revisiting traditional Bag of Visual Words: ◮ Compact image representation derived from multiple image regions by global max-pooling. ◮ Approximating max-pooling via integral images for efficient object localization. ◮ Performing image re-ranking and query expansion. Tolias et al. ICLR'16

  17. Integral Max Pooling: R-MAC. Maximum activations of convolutions (MAC): given a set of 2D convolutional feature channel responses $\mathcal{X} = \{\mathcal{X}_i\}$, $i = 1, \ldots, K$, spatial max-pooling over all locations gives
$f_{\Omega} = [f_{\Omega,1}, \ldots, f_{\Omega,i}, \ldots, f_{\Omega,K}]^{T}, \quad \text{with } f_{\Omega,i} = \max_{p \in \Omega} \mathcal{X}_i(p).$
Here $\Omega$ is the set of valid spatial locations, $\mathcal{X}_i(p)$ is the response at position $p$, and $K$ is the number of feature channels. Tolias et al. ICLR'16
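MAC itself reduces to a per-channel spatial maximum; a tiny sketch on an assumed K x H x W feature map:

```python
import numpy as np

def mac(fmap, eps=1e-12):
    """MAC descriptor: per-channel max over all spatial locations, L2-normalized."""
    f = fmap.max(axis=(1, 2))                 # f_{Omega,i} = max_{p in Omega} X_i(p)
    return f / (np.linalg.norm(f) + eps)

desc = mac(np.random.rand(512, 37, 37))       # toy 512-D descriptor
```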

  18. Integral Max Pooling: R-MAC. Regional maximum activation of convolutions (R-MAC):
1. Regional feature vector: for a rectangular region $\mathcal{R} \subseteq \Omega = [1, W] \times [1, H]$,
$f_{\mathcal{R}} = [f_{\mathcal{R},1}, \ldots, f_{\mathcal{R},i}, \ldots, f_{\mathcal{R},K}]^{T}, \quad \text{with } f_{\mathcal{R},i} = \max_{p \in \mathcal{R}} \mathcal{X}_i(p).$
2. Sampling of regions: uniformly at $l$ different scales.
3. Final descriptor: the individual regional vectors are $\ell_2$-normalized, PCA-whitened, summed across all regions, and $\ell_2$-normalized again.
Tolias et al. ICLR'16
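A simplified R-MAC sketch with square regions on a uniform grid at a few scales, max-pooled per region, L2-normalized, optionally PCA-whitened, summed and re-normalized; the paper's exact sampling (regions overlapping by roughly 40%) is approximated here by a coarse grid:

```python
import numpy as np

def rmac(fmap, scales=3, proj=None, sing=None, eps=1e-12):
    """R-MAC: sum of per-region MAC vectors over a multi-scale grid of square regions."""
    K, H, W = fmap.shape
    agg = np.zeros(K if proj is None else proj.shape[0])
    for l in range(1, scales + 1):
        side = int(np.ceil(2 * min(H, W) / (l + 1)))          # region size at scale l
        ys = np.linspace(0, H - side, l, dtype=int) if H > side else [0]
        xs = np.linspace(0, W - side, l, dtype=int) if W > side else [0]
        for y in ys:
            for x in xs:
                f = fmap[:, y:y + side, x:x + side].max(axis=(1, 2))
                f = f / (np.linalg.norm(f) + eps)              # per-region L2 normalization
                if proj is not None:                           # optional PCA whitening
                    f = (proj @ f) / (sing + eps)
                    f = f / (np.linalg.norm(f) + eps)
                agg += f
    return agg / (np.linalg.norm(agg) + eps)                   # final L2 normalization

desc = rmac(np.random.rand(512, 37, 37))
```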

  19. Integral Max Pooling: R-MAC. Object localization:
◮ Approximate integral max-pooling using the generalized mean [Dollár et al. 2009]: $\tilde{f}_{\mathcal{R},i} = \Big( \sum_{p \in \mathcal{R}} \mathcal{X}_i(p)^{\alpha} \Big)^{1/\alpha}$, where $\alpha > 1$ and $\tilde{f}_{\mathcal{R},i} \to f_{\mathcal{R},i}$ as $\alpha \to +\infty$.
◮ Window detection: $\hat{\mathcal{R}} = \arg\max_{\mathcal{R} \subseteq \Omega} \dfrac{\tilde{f}_{\mathcal{R}}^{T} q}{\|\tilde{f}_{\mathcal{R}}\| \, \|q\|}$.
To reduce the search space of windows: ◮ Efficient subwindow search (ESS) [Lampert et al. 2009]. ◮ Approximate max-pooling localization, which uses heuristics.
Tolias et al. ICLR'16
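A sketch of the generalized-mean approximation together with a brute-force window search; the real method prunes candidate windows with ESS or the approximate max-pooling localization heuristics, whereas this toy version scores every square window on a coarse grid:

```python
import numpy as np

def approx_mac(fmap_region, alpha=10.0, eps=1e-12):
    """Generalized-mean approximation of per-channel max pooling over a region
    (assumes non-negative, i.e. post-ReLU, responses)."""
    f = (np.maximum(fmap_region, 0) ** alpha).sum(axis=(1, 2)) ** (1.0 / alpha)
    return f / (np.linalg.norm(f) + eps)

def localize(fmap, q, win=16, step=4):
    """Score square windows against a query MAC vector q; return the best window."""
    K, H, W = fmap.shape
    q = q / (np.linalg.norm(q) + 1e-12)
    best, best_box = -np.inf, None
    for y in range(0, H - win + 1, step):
        for x in range(0, W - win + 1, step):
            f = approx_mac(fmap[:, y:y + win, x:x + win])
            score = float(f @ q)                 # cosine similarity (both unit-norm)
            if score > best:
                best, best_box = score, (y, x, y + win, x + win)
    return best_box, best

# Toy usage: localize a query MAC vector inside a 512 x 37 x 37 feature map.
fmap = np.random.rand(512, 37, 37)
box, score = localize(fmap, q=fmap[:, 5:21, 5:21].max(axis=(1, 2)))
```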

  20. Integral Max Pooling: R-MAC. End-to-end pipeline: 1. Initial retrieval using R-MAC vectors. 2. Re-ranking by localizing the query object in the top-N ranked images. 3. Query expansion by merging the query vector with the top-5 results. Tolias et al. ICLR'16
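A sketch of the three stages operating on precomputed descriptors; the localization-based re-ranking score is abstracted into a callback because it needs the feature maps of the top-ranked images (all names are illustrative):

```python
import numpy as np

def rank(q, db):
    """Stage 1: rank database descriptors by dot product with the (unit-norm) query."""
    return np.argsort(-(db @ q))

def rerank(q, db, order, localization_score, top_n=100):
    """Stage 2: re-score the top-N images with the localization score and re-sort them."""
    top = order[:top_n]
    scores = np.array([localization_score(q, idx) for idx in top])
    return np.concatenate([top[np.argsort(-scores)], order[top_n:]])

def query_expansion(q, db, order, top_k=5, eps=1e-12):
    """Stage 3: average the query with the top-k descriptors and query again."""
    q_exp = q + db[order[:top_k]].sum(axis=0)
    q_exp = q_exp / (np.linalg.norm(q_exp) + eps)
    return rank(q_exp, db)

# Toy usage with random descriptors and a dummy localization score.
db = np.random.randn(1000, 512); db /= np.linalg.norm(db, axis=1, keepdims=True)
q = db[0] + 0.1 * np.random.randn(512); q /= np.linalg.norm(q)
order = rank(q, db)
order = rerank(q, db, order, localization_score=lambda q, i: float(db[i] @ q))
final = query_expansion(q, db, order)
```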

  21. Takeaways so far. Takeaways: ◮ Global image representations can be built from pre-trained networks. ◮ Aggregating local convolutional activations from multiple regions works better than FC-layer activations. ◮ PCA compression, whitening and normalization play an important role. Further questions: ◮ How to leverage deep architectures for the task of image retrieval? ◮ How to deal with non-uniform regions and how to select and pool from them?

  22. Deep Image Retrieval: Gordo et al. ECCV'16. CNN architecture for instance retrieval: ◮ A triplet network that optimizes the R-MAC [Tolias et al. ICLR'16] representation. ◮ Uses a trained region proposal network to generate valid region proposals.
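A sketch of the triplet ranking loss used for this kind of fine-tuning, on L2-normalized descriptors of a query, a relevant and a non-relevant image; the margin value is illustrative and the backward pass is omitted:

```python
import numpy as np

def triplet_loss(q, pos, neg, margin=0.1):
    """Triplet ranking loss: pull the relevant descriptor closer to the query than the
    non-relevant one by at least the margin (hinge on squared distances)."""
    d_pos = np.sum((q - pos) ** 2)
    d_neg = np.sum((q - neg) ** 2)
    return 0.5 * max(0.0, margin + d_pos - d_neg)

# Toy usage with random unit-norm 512-D descriptors.
rng = np.random.default_rng(0)
q, pos, neg = (x / np.linalg.norm(x) for x in rng.standard_normal((3, 512)))
loss = triplet_loss(q, pos, neg)
```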

  23. Deep Image Retrieval: Gordo et al. ECCV'16. Detour: a quick overview of R-CNN, Fast R-CNN and Faster R-CNN.

  24. Deep Image Retrieval: Gordo et al. ECCV'16. Leveraging large-scale noisy data: ◮ Preparation of a cleaned Landmarks dataset. ◮ Generating pairwise scores between image pairs by building a matching graph. ◮ Pruning noise and extracting non-duplicate connected components (see the sketch below). ◮ Leveraging bounding boxes from the cleaned images.
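A small sketch of the graph-cleaning step referenced above: keep pairs whose match score passes a threshold and extract connected components with union-find; the threshold and the score source are placeholders:

```python
def connected_components(num_images, scored_pairs, threshold=0.5):
    """Union-find over images, linking pairs whose match score passes the threshold."""
    parent = list(range(num_images))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i, j, score in scored_pairs:
        if score >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(num_images):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

# Toy usage: 6 images, edges as (i, j, match_score).
pairs = [(0, 1, 0.9), (1, 2, 0.8), (3, 4, 0.7), (4, 5, 0.2)]
components = connected_components(6, pairs)   # -> [[0, 1, 2], [3, 4], [5]]
```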

  25. Deep Image Retrieval: Gordo et al. ECCV'16. Bounding box estimation: 1. Initialization: for each pair of connected images $(i, j)$ with affine transformation matrix $A_{ij}$, find the geometric median of the matched keypoints. 2. Update: run a diffusion process between the pair of bounding boxes $B_i$ and $B_j$: $B'_j = (1 - \alpha) B_j + \alpha\, A_{ij}(B_i)$.
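A sketch of one diffusion update, with boxes as (x1, y1, x2, y2) corners and A_ij as a 3 x 3 affine matrix mapping image i into image j; the convex-combination form $(1 - \alpha) B_j + \alpha\, A_{ij}(B_i)$ is assumed, and the alpha value is illustrative:

```python
import numpy as np

def transfer_box(box, A):
    """Map a box's corner points through a 3 x 3 affine matrix and re-fit an axis-aligned box."""
    x1, y1, x2, y2 = box
    corners = np.array([[x1, y1, 1], [x2, y1, 1], [x2, y2, 1], [x1, y2, 1]], dtype=float)
    warped = corners @ A.T
    warped = warped[:, :2] / warped[:, 2:3]       # back from homogeneous coordinates
    return np.array([warped[:, 0].min(), warped[:, 1].min(),
                     warped[:, 0].max(), warped[:, 1].max()])

def diffuse(box_j, box_i, A_ij, alpha=0.1):
    """One diffusion update: B_j' = (1 - alpha) * B_j + alpha * A_ij(B_i)."""
    return (1.0 - alpha) * np.asarray(box_j, dtype=float) + alpha * transfer_box(box_i, A_ij)

# Toy usage: identity transform, so the update nudges B_j toward B_i.
new_box_j = diffuse([10, 10, 100, 100], [20, 20, 110, 110], np.eye(3))
```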

  26. Deep Image Retrieval: Gordo et al. ECCV'16. Qualitative results.

  27. Thank you
