  1. Query-Adaptative Locality Sensitive Hashing
     Hervé Jégou, INRIA/LJK; Laurent Amsaleg, CNRS/IRISA; Cordelia Schmid, INRIA/LJK; Patrick Gros, INRIA/IRISA
     ICASSP'2008, April 4th, 2008

  2. Problem setup
     We want to find the (k-)nearest neighbor(s) of a given query vector, without computing all distances!
     Dataset: n d-dimensional vectors $x_i = (x_{i,1}, \dots, x_{i,d})$, $i = 1..n$; query $q = (q_1, \dots, q_d)$
     Curse of dimensionality: exact search is inefficient → approximate nearest neighbor search

  3. Application: large-scale (≥ 1 million images) image search
     [System diagram: query image → image search system over the image dataset → ranked image list]
     State-of-the-art for image search: local description, ≈ 2000 local descriptors per image
     SIFT descriptors [Lowe 04]: d = 128, unit-norm vectors compared with the Euclidean distance
     → INTENSIVE USE OF NEAREST NEIGHBOR SEARCH

  4. Approximate nearest neighbor (ANN) search
     Many existing approaches; a very popular one: Locality Sensitive Hashing (LSH)
     → provides some guarantees on the search quality for some distributions
     LSH: many variants, e.g.:
     • for the Hamming space [Gionis, Indyk, Motwani 99]
     • Euclidean version [Datar, Indyk, Immorlica, Mirrokni 04] → E2LSH
     • using Leech lattice quantization [Andoni, Indyk 06]
     • spherical LSH [Terasawa, Tanaka 07]
     and applications: computer vision [Shakhnarovich et al. 05], music search [Casey, Slaney 07], etc.

  5. Euclidean Locality Sensitive Hashing (E2LSH)
     1) Projection on m random directions:
        $h_i^r(x) = \frac{\langle x \mid a_i \rangle - b_i}{w}$,  $h_i(x) = \lfloor h_i^r(x) \rfloor$
     2) Construction of l hash functions: concatenate k indexes h_i per hash function:
        $g_j(x) = (h_{j,1}(x), \dots, h_{j,k}(x))$
     3) For each g_j, compute two hash values with universal hash functions u_1(.) and u_2(.), and store the vector id in a hash table
     [Figure: 2-D grid of quantization cells induced by two projections, with random offset b_i and cell width w; cells labeled (0,0), (1,0), (0,1), ...]

  6. Search: algorithm summary and complexity
     1) For all h_i, compute h_i(q): O(m d)
     2) For j = 1..l, compute g_j(q) and the hash values u_1(g_j(q)) and u_2(g_j(q)): O(l k)
     3) For j = 1..l, retrieve the vector ids having the same hash keys: O(l ε n)
        → a proportion ε of the dataset vectors, i.e., ε n vectors, is retrieved
     4) Exact distance computation between the query and the retrieved vectors: O(l ε n d)
     Large dataset ⇒ step 4 is by far the most computationally intensive
     Performance measure: rate of correct nearest neighbors found vs. average short-list size

  7. Geometric hash function: the lattice choice [Andoni, Indyk 06]
     Motivation: instead of using scalar quantizers $h_i : \mathbb{R}^d \to \mathbb{Z}$ and, in turn, hash functions
     $g_j(x) = (h_{j,1}(x), \dots, h_{j,k}(x))$,
     why not directly use a structured vector quantizer?
     • spheres would be the best choice (but no such space partitioning exists)
     Well-known lattice quantizers: hexagonal (d = 2), E8 (d = 8), Leech (d = 24)

  8. LSH using lattices
     Several lattices, or concatenations of lattices, are used for geometric hashing:
     $g_j(x) = \text{lattice-idx}(x_{j,d^*} - b_j)$
     • b_j is now a vectorial random offset
     • $x_{j,d^*}$ is formed of d* components of x (different for each g_j)
     Previous work by Andoni and Indyk makes use of the Leech lattice (d* = 24)
     • very good quantization properties
     • d* = 24, 48, ...
     Here, we use the E8 lattice
     • very fast computation together with excellent quantization properties
     • d* = 8, 16, 24, ...

  9. Hash function selection criterion: motivation
     Let us consider several hash functions and the corresponding space partitionings.
     The position of the query within its cell has a strong impact on the probability that nearby vectors fall in the same cell or not.
     HASH FUNCTION RELEVANCE CRITERION
     λ_j: the distance of the query to the cell center in the projected k-dimensional subspace
     = the square root of the squared Euclidean error in a quantization context

  10. Hash function relevance criterion: E2LSH or lattice-based
      E2LSH: recall that $h_i^r(x) = \frac{\langle x \mid a_i \rangle - b_i}{w}$ and $h_i(x) = \lfloor h_i^r(x) \rfloor$.
      The square of the relevance criterion is the quantization error in the projected space:
      $\lambda_j(x)^2 = \sum_{i=1..k} \left( h_{j,i}^r(x) - h_{j,i}(x) - 0.5 \right)^2$
      For lattice-based LSH: the distance between the query and its lattice point.
      Remark for E8: λ_j requires no extra computation → byproduct of the lattice point calculation.

  11. Relevance criterion: impact on quality (SIFT descriptors)
      [Plot: $P(g_j(NN(x)) = g_j(x) \mid \lambda(g_j(x)))$ (y-axis, 0 to 1) as a function of $\lambda(g_j(x))$ (x-axis, 0 to 1.5), for k = 2]
      → λ closer to 0: much higher confidence in the vectors retrieved

  12. Query-adaptative LSH: exploiting the criterion
      [Figure: pool of l = 3 hash functions, l' = 1 selected]
      Idea:
      • define a larger pool of l hash functions
      • use only the most relevant ones
      Search is modified as follows (see the sketch below):
      • for j = 1..l, compute the criterion λ_j
      • select the l' (<< l) hash functions associated with the lowest values of λ_j
      • perform the final steps as in standard LSH, using the selected hash functions only:
        compute u_1 and u_2 and parse the corresponding buckets;
        compute the exact distances between the query and the vectors retrieved from the buckets

  14. Results: SIFT descriptors
      [Plot: rate of nearest neighbors correctly found (y-axis, 0 to 1) vs. % of the database retrieved (x-axis, log scale from 0.001 to 1), comparing Proj-LSH, Proj-QALSH, E8-LSH and E8-QALSH]

  15. Conclusion
      Using the E8 lattice for LSH provides:
      • excellent quantization properties
      • high flexibility for d*
      QALSH trades memory against accuracy
      → without noticeably increasing search complexity for large datasets
      This is a quite generic approach: it can be used jointly with other versions of LSH
      • e.g., binary or spherical LSH

  16. Thank you for your attention!

  17. Brute force search of optimal parameters
      [Two plots: rate of nearest neighbors correctly found (y-axis, 0 to 1) vs. % of the database retrieved (x-axis, 0.001 to 1), each comparing LSH with QALSH; left panel: E8 LSH, right panel: random projection LSH]

  18. p.d.f. of the relevance criterion

  19. Euclidean Locality Sensitive Hashing (E2LSH) [backup slide: repeat of slide 5]
