Locality-Sensitive Hashing for kNN

LSH: use similarity-preserving hash functions. First proposed by [Datar et al., 2004] and [Charikar, 2002]. Let h be a function that produces a hashcode. Then,

    Hamming-dist(h(q), h(v_i)) ∝ user-dist(q, v_i)

The hashed query h(q) turns the search into lookup operations in a hash table over the hashed DB (h(v_1), h(v_2), ..., h(v_12)) → fast.

Perfect hash functions may not exist, or may be extremely hard to find → approximate.

Note: longer hashcodes make the search slower.
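A minimal sketch of such a similarity-preserving function, using random-hyperplane (sign) hashing in the style of [Charikar, 2002]; the dimensions, bit count, and noise level below are illustrative assumptions, not values from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_lsh(dim, n_bits):
    """Sample n_bits random hyperplanes; one hashcode bit per hyperplane."""
    planes = rng.standard_normal((n_bits, dim))
    def h(v):
        # Bit i records which side of hyperplane i the vector falls on, so the
        # expected Hamming distance grows with the angle between two vectors.
        return tuple((planes @ v > 0).astype(int))
    return h

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

h = make_lsh(dim=8, n_bits=16)
q = rng.standard_normal(8)
near = q + 0.05 * rng.standard_normal(8)  # a slight perturbation of q
far = rng.standard_normal(8)              # an unrelated point

# Close vectors collide on most bits; distant ones disagree on more.
print(hamming(h(q), h(near)), hamming(h(q), h(far)))
```

A lookup then compares short bit strings instead of full vectors, which is what makes the hash-table search fast.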
Hashcodes Generation for LSH

Suppose LSH generates hashcodes of length 4, produced by hash functions h_1, h_2, h_3, h_4. Order the query point q and the other data points v_1, ..., v_8 by their distance from q. The hashcodes serve as a proxy for that distance: moving away from q, the codes progress 0000, 1000, 1100, 1110, 1111, at Hamming distances 0, 1, 2, ... from h(q).

Neighbors can be assigned the same hashcode. In particular, v_3 and v_4 receive the same code, so we can't distinguish these two → for 3-NN, the answer is approximate.

Motivation: a new scheme able to distinguish v_3 and v_4 based on their hashcodes?
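The slide's example can be replayed as a tiny approximate 3-NN search: rank points by the Hamming distance of their codes to the query's code. The exact point-to-code assignment below is our guess at the figure's, chosen so that v3 and v4 collide:

```python
# Hypothetical code assignment reconstructing the slide's figure (our guess).
codes = {
    "q": "0000", "v1": "0000", "v2": "1000", "v3": "1000",
    "v4": "1000", "v5": "1100", "v6": "1100", "v7": "1110", "v8": "1111",
}

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def approx_knn(query, k):
    # Hamming distance to the query's code is the proxy for true distance.
    points = [p for p in codes if p != query]
    return sorted(points, key=lambda p: hamming(codes[p], codes[query]))[:k]

print(approx_knn("q", 3))
```

Because v3 and v4 share the code 1000, the third neighbor is decided by tie-breaking rather than by actual distance — exactly the ambiguity that makes the 3-NN answer only approximate.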
Outline
1. Background and Motivation
2. NSH Intuition
3. NSH Algorithm
4. Experiments
Neighbor-Sensitive Hashing Intuition

We are interested in 3-NN. With hash functions h_1, ..., h_4 generated by LSH, only the separations near the query matter: among the nearest points, we care which one is closer; for the farther points separated by h_3 and h_4, we don't care which one is closer.

Observation: h_3 and h_4 are wasted (for 3-NN).

Our idea: generate hash functions close to the query so that we can better distinguish the close items.
Neighbor-Sensitive Hashing Intuition (cont'd)

Suppose we could (somehow) generate hash functions in this way: the codes 0000, 1000, 1100, 1110, 1111 are spread over the data items near the query, so close items get different hashcodes while far items share the same hashcodes.

- We could distinguish v_3 and v_4 based on their hashcodes.
- Note: we could not distinguish v_6 and v_8 based on their hashcodes — not an issue for 3-NN (thus, able to solve 3-NN accurately).

Difference in NSH's intuition:
- A decade of existing work: small Hamming distance between close data items.
- NSH: larger Hamming distance between close items.

Seemingly counter-intuitive; however, our paper proves that a larger Hamming distance leads to higher accuracy in general.
Important Difference between LSH and NSH

[Plot: Hamming distance (0 to b) vs. original distance (0, to 10-NN, to 100-NN, max distance). LSH rises uniformly; NSH rises steeply near 0 and flattens afterwards.]

A larger slope indicates higher distinguishing power based on hashcodes.
- LSH: uniform distinguishing power over all distance ranges.
- NSH: higher distinguishing power for points that are close to each other.

Key challenge: how to enlarge the Hamming distances selectively for close data items?
Outline
1. Background and Motivation
2. NSH Intuition
3. NSH Algorithm
4. Experiments
Neighbor-Sensitive Hashing Overview

Key questions:
- How can we expand the space around a query?
- Is it easier if we know the query a priori? How can we expand the space around an arbitrary query?

Approach: transform the data points v_1, ..., v_8 (and q) to expand the space around the query, before generating hash functions. We call this new space the transformed space. Then, generate hash functions h_1, ..., h_4 on this transformed space (and thus convert data points to hashcodes accordingly).
Neighbor-Sensitive Transformation

We expand the space around an arbitrary query using our proposed Neighbor-Sensitive Transformation (NST), e.g., f(v_1), f(v_2), ...

[Visual illustration of NST: a query and two pivots, pivot 1 and pivot 2.]

Our formal claim (Theorem 2): using NST with regular hash functions produces higher accuracy than LSH.
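As a toy sketch of a pivot-based transformation in the spirit of NST (the Gaussian kernel and its width eta are our assumptions; the paper defines the actual mapping), distances near a pivot are stretched while distances far away are compressed:

```python
import numpy as np

def nst(points, pivots, eta):
    """Map each point to its kernel similarities to the pivots.

    The Gaussian kernel changes fastest for points near a pivot, so gaps in
    the transformed space are enlarged around pivots and shrunk elsewhere.
    """
    points = np.asarray(points, dtype=float)
    pivots = np.asarray(pivots, dtype=float)
    # Pairwise squared distances, shape (n_points, n_pivots).
    d2 = ((points[:, None, :] - pivots[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / eta**2)

# 1D toy example: one pivot placed at the query's location.
pts = np.array([[0.0], [0.1], [0.2], [2.0], [2.1]])
t = nst(pts, pivots=[[0.0]], eta=0.5)

# Near the pivot, a gap of 0.1 maps to a large gap; far away it collapses.
near_gap = abs(t[0, 0] - t[1, 0])   # between x = 0.0 and x = 0.1
far_gap = abs(t[3, 0] - t[4, 0])    # between x = 2.0 and x = 2.1
print(near_gap > far_gap)
```

Regular hash functions applied after such a transformation therefore spend more of their separating power on the query's neighborhood, which is the effect the slide describes.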
Big Picture: NSH Workflow

Offline processing: original database → NST → transformed DB → hash → hashed DB.
Online processing: original query → NST → transformed query → hash → hashed query → search (against the hashed DB).

The hash and search steps form the standard LSH workflow; the NST steps are our contribution.
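The two pipelines can be sketched end to end. Every concrete choice here (the Gaussian pivot transformation standing in for NST, the sign-hash, the sizes) is an illustrative assumption, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
DIM, N_BITS = 4, 8
pivots = rng.standard_normal((3, DIM))     # assumed pivot set for NST
planes = rng.standard_normal((N_BITS, 3))  # hash hyperplanes in NST space (3 = #pivots)

def nst(v, eta=1.0):
    # Pivot-based transformation (our Gaussian sketch of NST).
    return np.exp(-((pivots - v) ** 2).sum(1) / eta**2)

def hashcode(v):
    # Offline and online share the same NST + hash steps.
    return tuple((planes @ nst(v) > 0).astype(int))

# Offline: transform and hash the database.
db = rng.standard_normal((100, DIM))
hashed_db = [hashcode(v) for v in db]

# Online: transform and hash the query, then search by Hamming distance.
q = rng.standard_normal(DIM)
hq = hashcode(q)
dists = [sum(a != b for a, b in zip(hq, c)) for c in hashed_db]
knn = np.argsort(dists)[:3]  # indices of 3 approximate nearest neighbors
print(knn)
```

Only the two NST boxes differ from plain LSH; the hashing and the Hamming-distance search are unchanged.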
Neighbor-Sensitive Hashing Visualized

We visualized the hash functions in the original space (LSH vs. NSH panels). Dataset: five 2D normal distributions; we generated 4 hash functions. The hash functions for NSH were generated in the transformed space.
Outline
1. Background and Motivation
2. NSH Intuition
3. NSH Algorithm
4. Experiments
Experiment Setup

Quality metric:

    recall(k)@r = (# of true kNN in the r retrieved items) / k × 100.

Note: higher recall means either (i) more accurate searching for the same time budget, or (ii) faster searching for the same target recall.

Compared methods: three well-known and five state-of-the-art:
1. Locality Sensitive Hashing (LSH) [Datar et al., 2004]
2. Spectral Hashing (SH) [Weiss et al., 2009]
3. Spherical Hashing (SpH) [Heo et al., 2012]
4. Data Sensitive Hashing (DSH) [Gao et al., 2014]
5. Anchor Graph Hashing (AGH) [Liu et al., 2011]
6. Compressed Hashing (CH) [Lin et al., 2013]
7. Complementary Projection Hashing (CPH) [Jin et al., 2013]
8. Kernelized Supervised Hashing (KSH) [Kulis and Grauman, 2012]
(Several of these involve data transformations, for purposes different from ours.)
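The metric itself is a one-liner; this sketch (the names are ours) counts how many of the true k nearest neighbors survive into the r retrieved items:

```python
def recall_at(true_knn, retrieved, k):
    """recall(k)@r = (# of true kNN in the r retrieved items) / k * 100."""
    hits = len(set(true_knn[:k]) & set(retrieved))
    return 100.0 * hits / k

true_knn = ["v1", "v2", "v3"]          # ground-truth 3-NN
retrieved = ["v1", "v2", "v4", "v5"]   # r = 4 items returned by hashing
print(recall_at(true_knn, retrieved, k=3))  # two of three true neighbors found
```

Recall of 100 means the retrieved set contains every true neighbor, so at a fixed r it measures accuracy, and at a fixed target recall a smaller sufficient r means a faster search.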
Experimental Claim and Datasets

Our experimental claim:
1. NSH achieved "larger Hamming distances between close data items."
2. NSH showed higher recalls (for fixed hashcode sizes) than compared methods.
3. NSH showed faster search speed (for target recalls) than compared methods.
4. NSH's hash function generation was reasonably fast.

Real-world datasets:
1. MNIST: 69K hand-written digit images
2. TINY: 80 million image (GIST) descriptors
3. SIFT: 50 million image (SIFT) descriptors