Locality-Sensitive Hashing for kNN

LSH: use similarity-preserving hash functions. First proposed by [Datar et al., 2004] and [Charikar, 2002]. Let h be a function that produces a hashcode. Then,

    Hamming-dist(h(q), h(v_i)) ∝ user-dist(q, v_i)

The hashed query h(q) turns the search into lookup operations in a hash table over the hashed DB (h(v_1), h(v_2), ..., h(v_12)) → fast.

Perfect hash functions may not exist, or may be extremely hard to find → approximate.

Note: longer hashcodes make the search slower.
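A minimal sketch of such a similarity-preserving function, using random-hyperplane (sign) hashing in the style of [Charikar, 2002]; the dimensions, bit count, and noise level below are illustrative assumptions, not values from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_lsh(dim, n_bits):
    """Sample n_bits random hyperplanes; one hashcode bit per hyperplane."""
    planes = rng.standard_normal((n_bits, dim))
    def h(v):
        # Bit i records which side of hyperplane i the vector falls on, so the
        # expected Hamming distance grows with the angle between two vectors.
        return tuple((planes @ v > 0).astype(int))
    return h

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

h = make_lsh(dim=8, n_bits=16)
q = rng.standard_normal(8)
near = q + 0.05 * rng.standard_normal(8)  # a slight perturbation of q
far = rng.standard_normal(8)              # an unrelated point

# Close vectors collide on most bits; distant ones disagree on more.
print(hamming(h(q), h(near)), hamming(h(q), h(far)))
```

A lookup then compares short bit strings instead of full vectors, which is what makes the hash-table search fast.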
Hashcodes Generation for LSH

Suppose LSH generates hashcodes of length 4, produced by hash functions h_1, h_2, h_3, h_4. Order the query point q and the other data points v_1, ..., v_8 by their distance from q. The hashcodes serve as a proxy for that distance: moving away from q, the codes progress 0000, 1000, 1100, 1110, 1111, at Hamming distances 0, 1, 2, ... from h(q).

Neighbors can be assigned the same hashcode. In particular, v_3 and v_4 receive the same code, so we can't distinguish these two → for 3-NN, the answer is approximate.

Motivation: a new scheme able to distinguish v_3 and v_4 based on their hashcodes?
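The slide's example can be replayed as a tiny approximate 3-NN search: rank points by the Hamming distance of their codes to the query's code. The exact point-to-code assignment below is our guess at the figure's, chosen so that v3 and v4 collide:

```python
# Hypothetical code assignment reconstructing the slide's figure (our guess).
codes = {
    "q": "0000", "v1": "0000", "v2": "1000", "v3": "1000",
    "v4": "1000", "v5": "1100", "v6": "1100", "v7": "1110", "v8": "1111",
}

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def approx_knn(query, k):
    # Hamming distance to the query's code is the proxy for true distance.
    points = [p for p in codes if p != query]
    return sorted(points, key=lambda p: hamming(codes[p], codes[query]))[:k]

print(approx_knn("q", 3))
```

Because v3 and v4 share the code 1000, the third neighbor is decided by tie-breaking rather than by actual distance — exactly the ambiguity that makes the 3-NN answer only approximate.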
Outline
1. Background and Motivation
2. NSH Intuition
3. NSH Algorithm
4. Experiments
Neighbor-Sensitive Hashing Intuition

We are interested in 3-NN. With hash functions h_1, ..., h_4 generated by LSH, only the separations near the query matter: among the nearest points, we care which one is closer; for the farther points separated by h_3 and h_4, we don't care which one is closer.

Observation: h_3 and h_4 are wasted (for 3-NN).

Our idea: generate hash functions close to the query so that we can better distinguish the close items.
Neighbor-Sensitive Hashing Intuition (cont'd)

Suppose we could (somehow) generate hash functions in this way: the codes 0000, 1000, 1100, 1110, 1111 are spread over the data items near the query, so close items get different hashcodes while far items share the same hashcodes.

- We could distinguish v_3 and v_4 based on their hashcodes.
- Note: we could not distinguish v_6 and v_8 based on their hashcodes — not an issue for 3-NN (thus, able to solve 3-NN accurately).

Difference in NSH's intuition:
- A decade of existing work: small Hamming distance between close data items.
- NSH: larger Hamming distance between close items.

Seemingly counter-intuitive; however, our paper proves that a larger Hamming distance leads to higher accuracy in general.
Important Difference between LSH and NSH

[Plot: Hamming distance (0 to b) vs. original distance (0, to 10-NN, to 100-NN, max distance). LSH rises uniformly; NSH rises steeply near 0 and flattens afterwards.]

A larger slope indicates higher distinguishing power based on hashcodes.
- LSH: uniform distinguishing power over all distance ranges.
- NSH: higher distinguishing power for points that are close to each other.

Key challenge: how to enlarge the Hamming distances selectively for close data items?
Outline
1. Background and Motivation
2. NSH Intuition
3. NSH Algorithm
4. Experiments
Neighbor-Sensitive Hashing Overview

Key questions:
- How can we expand the space around a query?
- Is it easier if we know the query a priori? How can we expand the space around an arbitrary query?

Approach: transform the data points v_1, ..., v_8 (and q) to expand the space around the query, before generating hash functions. We call this new space the transformed space. Then, generate hash functions h_1, ..., h_4 on this transformed space (and thus convert data points to hashcodes accordingly).
Neighbor-Sensitive Transformation

We expand the space around an arbitrary query using our proposed Neighbor-Sensitive Transformation (NST), e.g., f(v_1), f(v_2), ...

[Visual illustration of NST: a query and two pivots, pivot 1 and pivot 2.]

Our formal claim (Theorem 2): using NST with regular hash functions produces higher accuracy than LSH.
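As a toy sketch of a pivot-based transformation in the spirit of NST (the Gaussian kernel and its width eta are our assumptions; the paper defines the actual mapping), distances near a pivot are stretched while distances far away are compressed:

```python
import numpy as np

def nst(points, pivots, eta):
    """Map each point to its kernel similarities to the pivots.

    The Gaussian kernel changes fastest for points near a pivot, so gaps in
    the transformed space are enlarged around pivots and shrunk elsewhere.
    """
    points = np.asarray(points, dtype=float)
    pivots = np.asarray(pivots, dtype=float)
    # Pairwise squared distances, shape (n_points, n_pivots).
    d2 = ((points[:, None, :] - pivots[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / eta**2)

# 1D toy example: one pivot placed at the query's location.
pts = np.array([[0.0], [0.1], [0.2], [2.0], [2.1]])
t = nst(pts, pivots=[[0.0]], eta=0.5)

# Near the pivot, a gap of 0.1 maps to a large gap; far away it collapses.
near_gap = abs(t[0, 0] - t[1, 0])   # between x = 0.0 and x = 0.1
far_gap = abs(t[3, 0] - t[4, 0])    # between x = 2.0 and x = 2.1
print(near_gap > far_gap)
```

Regular hash functions applied after such a transformation therefore spend more of their separating power on the query's neighborhood, which is the effect the slide describes.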
Big Picture: NSH Workflow

Offline processing: original database → NST → transformed DB → hash → hashed DB.
Online processing: original query → NST → transformed query → hash → hashed query → search (against the hashed DB).

The hash and search steps form the standard LSH workflow; the NST steps are our contribution.
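The two pipelines can be sketched end to end. Every concrete choice here (the Gaussian pivot transformation standing in for NST, the sign-hash, the sizes) is an illustrative assumption, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
DIM, N_BITS = 4, 8
pivots = rng.standard_normal((3, DIM))     # assumed pivot set for NST
planes = rng.standard_normal((N_BITS, 3))  # hash hyperplanes in NST space (3 = #pivots)

def nst(v, eta=1.0):
    # Pivot-based transformation (our Gaussian sketch of NST).
    return np.exp(-((pivots - v) ** 2).sum(1) / eta**2)

def hashcode(v):
    # Offline and online share the same NST + hash steps.
    return tuple((planes @ nst(v) > 0).astype(int))

# Offline: transform and hash the database.
db = rng.standard_normal((100, DIM))
hashed_db = [hashcode(v) for v in db]

# Online: transform and hash the query, then search by Hamming distance.
q = rng.standard_normal(DIM)
hq = hashcode(q)
dists = [sum(a != b for a, b in zip(hq, c)) for c in hashed_db]
knn = np.argsort(dists)[:3]  # indices of 3 approximate nearest neighbors
print(knn)
```

Only the two NST boxes differ from plain LSH; the hashing and the Hamming-distance search are unchanged.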
Neighbor-Sensitive Hashing Visualized

We visualized the hash functions in the original space (LSH vs. NSH panels). Dataset: five 2D normal distributions; we generated 4 hash functions. The hash functions for NSH were generated in the transformed space.
Outline
1. Background and Motivation
2. NSH Intuition
3. NSH Algorithm
4. Experiments
Experiment Setup

Quality metric:

    recall(k)@r = (# of true kNN in the r retrieved items) / k × 100.

Note: higher recall means either (i) more accurate searching for the same time budget, or (ii) faster searching for the same target recall.

Compared methods: three well-known and five state-of-the-art:
1. Locality Sensitive Hashing (LSH) [Datar et al., 2004]
2. Spectral Hashing (SH) [Weiss et al., 2009]
3. Spherical Hashing (SpH) [Heo et al., 2012]
4. Data Sensitive Hashing (DSH) [Gao et al., 2014]
5. Anchor Graph Hashing (AGH) [Liu et al., 2011]
6. Compressed Hashing (CH) [Lin et al., 2013]
7. Complementary Projection Hashing (CPH) [Jin et al., 2013]
8. Kernelized Supervised Hashing (KSH) [Kulis and Grauman, 2012]
(Several of these involve data transformations, for purposes different from ours.)
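The metric itself is a one-liner; this sketch (the names are ours) counts how many of the true k nearest neighbors survive into the r retrieved items:

```python
def recall_at(true_knn, retrieved, k):
    """recall(k)@r = (# of true kNN in the r retrieved items) / k * 100."""
    hits = len(set(true_knn[:k]) & set(retrieved))
    return 100.0 * hits / k

true_knn = ["v1", "v2", "v3"]          # ground-truth 3-NN
retrieved = ["v1", "v2", "v4", "v5"]   # r = 4 items returned by hashing
print(recall_at(true_knn, retrieved, k=3))  # two of three true neighbors found
```

Recall of 100 means the retrieved set contains every true neighbor, so at a fixed r it measures accuracy, and at a fixed target recall a smaller sufficient r means a faster search.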
Experimental Claim and Datasets

Our experimental claim:
1. NSH achieved "larger Hamming distances between close data items."
2. NSH showed higher recalls (for fixed hashcode sizes) than compared methods.
3. NSH showed faster search speed (for target recalls) than compared methods.
4. NSH's hash function generation was reasonably fast.

Real-world datasets:
1. MNIST: 69K hand-written digit images
2. TINY: 80 million image (GIST) descriptors
3. SIFT: 50 million image (SIFT) descriptors