Sublinear Time Nearest Neighbor Search over Generalized Weighted Space
Yifan Lei, Qiang Huang, Mohan S. Kankanhalli, Anthony K. H. Tung
School of Computing, National University of Singapore
Source code: https://github.com/1flei/aws_alsh
2019/6/11

Applications
• Nearest Neighbor Search (NNS) is widely used.
• Example: booking a hotel for ICML 2019
  ◦ Consider each hotel's conditions relative to the convention centre: price, distance, and rating.
  ◦ Query $q$: a hotel that the user booked before and found excellent.
  ◦ Weight vector $w$: different users weight the hotel conditions differently, which leads to different choices of hotel.

| Hotel | Price | Distance | Rating |
|---|---|---|---|
| Hotel $q$ (query) | 300 | 7 | 10 |
| Hotel 1 | 400 | 8 | 10 |
| Hotel 2 | 350 | 6 | 8 |
| Hotel 3 | 250 | 9 | 8 |
| Hotel 4 | 200 | 6 | 6 |

| Weight vector $w$ (price, distance, rating) | Nearest hotel |
|---|---|
| (0.001, 1, 1) | Hotel 2 |
| (0, 1, 3) | Hotel 1 |
| (0.001, −1, 1) | Hotel 3 |
| (−0.001, −1, −1) | Hotel 4 |

Problem Definition
• Given
  ◦ A dataset $\mathcal{D}$ of $n$ data objects in $\mathbb{R}^d$
  ◦ A query $q \in \mathbb{R}^d$ with a weight vector $w \in \mathbb{R}^d$
  ◦ Measure: the Generalized Weighted Square Euclidean Distance (GWSED) $d_w(o, q) = \sum_{i=1}^{d} w_i (o_i - q_i)^2$
• Nearest Neighbor Search (NNS) over $d_w$
  ◦ Find $o^* \in \mathcal{D}$ s.t. $o^* = \arg\min_{o \in \mathcal{D}} d_w(o, q)$
• This problem is fundamental
  ◦ Furthest Neighbor Search (FNS) and Maximum Inner Product Search (MIPS) can be reduced to NNS over $d_w$
  ◦ e.g., $w_i = -1, \forall i \implies \arg\min_{o \in \mathcal{D}} d_w(o, q) = \arg\max_{o \in \mathcal{D}} \lVert o - q \rVert$
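
The definition can be checked directly against the hotel example from the Applications slide. The following is a minimal brute-force sketch (a linear scan, not the sublinear method of the talk); the names `gwsed`, `nearest`, `HOTELS`, and `QUERY` are illustrative and do not come from the authors' code.

```python
def gwsed(o, q, w):
    """GWSED: sum_i w_i * (o_i - q_i)^2. Weights may be negative."""
    return sum(wi * (oi - qi) ** 2 for oi, qi, wi in zip(o, q, w))


def nearest(dataset, q, w):
    """Brute-force NNS over d_w: return the key that minimizes gwsed."""
    return min(dataset, key=lambda name: gwsed(dataset[name], q, w))


# (price, distance, rating) values taken from the Applications slide.
HOTELS = {
    "Hotel 1": (400, 8, 10),
    "Hotel 2": (350, 6, 8),
    "Hotel 3": (250, 9, 8),
    "Hotel 4": (200, 6, 6),
}
QUERY = (300, 7, 10)

if __name__ == "__main__":
    for w in [(0.001, 1, 1), (0, 1, 3), (0.001, -1, 1), (-0.001, -1, -1)]:
        print(w, "->", nearest(HOTELS, QUERY, w))
    # All weights equal to -1 turns NNS over d_w into furthest neighbor search.
    print("furthest:", nearest(HOTELS, QUERY, (-1, -1, -1)))
```

With the four weight vectors from the slide, the scan selects Hotel 2, Hotel 1, Hotel 3, and Hotel 4 respectively, matching the slide. With all weights set to −1, it returns the hotel furthest from the query in Euclidean distance.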

Background and Motivations
• Locality-Sensitive Hashing (LSH)
  ◦ Sublinear time for Near Neighbor Search
  ◦ Insight: construct a hash function $h$ s.t. $\Pr[h(o) = h(q)]$ is monotonic in $Dist(o, q)$
  ◦ Hidden condition: $Dist(o, q)$ must be a metric
• LSH schemes cannot solve NNS over $d_w$ directly ($d_w$ is no longer a metric if some $w_i < 0$)
• There is no existing sublinear-time method for this problem
• Motivations
  ◦ Like $d_w$, the inner product (i.e., $o^T q$) is also not a metric
  ◦ However, Shrivastava & Li (2014) introduced a sublinear-time method based on Asymmetric LSH, which constructs $P(o)$ for each data object $o \in \mathcal{D}$ and $Q(q)$ for each query $q$.

Spherical Asymmetric Transformation
• Negative result:
  ◦ There is no Asymmetric LSH family over $\mathbb{R}^d$ for NNS over $d_w$ (Lemma 1 and Theorem 2)
• Spherical Asymmetric Transformation (SphAT): $\mathbb{R}^d \to \mathbb{R}^{2d}$
  ◦ $P(o) = [COS(o); SIN(o)]$
  ◦ $Q(q, w) = [w \odot COS(q); w \odot SIN(q)]$
  ◦ where $w \odot COS(q) = (w_1 \cos q_1, w_2 \cos q_2, \ldots, w_d \cos q_d)$
• Properties of SphAT:
  ◦ $d_w(o, q)$ ~ Euclidean distance (or Angular distance) between $P(o)$ and $Q(q, w)$
  ◦ SphAT is weight-oblivious (because $P(\cdot)$ is independent of $w$) $\implies$ the index can be built before $q$ and $w$ are known
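
To make the "~" relation concrete, the sketch below applies the two transformations and checks two facts. The first is exact: by the angle-difference identity, $P(o)^T Q(q, w) = \sum_i w_i \cos(o_i - q_i)$. The second is an interpretation rather than something stated on the slide: for small coordinate differences, $\cos x \approx 1 - x^2/2$ gives $d_w(o, q) \approx 2(\sum_i w_i - P(o)^T Q(q, w))$. The assumption that data are scaled so that differences are small, along with the function names `sphat_data` and `sphat_query`, is introduced here for illustration only.

```python
import math


def sphat_data(o):
    """P(o) = [COS(o); SIN(o)]. Does not depend on the weight vector."""
    return [math.cos(x) for x in o] + [math.sin(x) for x in o]


def sphat_query(q, w):
    """Q(q, w) = [w * COS(q); w * SIN(q)], with element-wise products."""
    return ([wi * math.cos(x) for x, wi in zip(q, w)]
            + [wi * math.sin(x) for x, wi in zip(q, w)])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gwsed(o, q, w):
    return sum(wi * (oi - qi) ** 2 for oi, qi, wi in zip(o, q, w))


# Illustrative values with small coordinate differences (assumed pre-scaled).
o = [0.10, -0.20, 0.05]
q = [0.12, -0.25, 0.00]
w = [1.0, -0.5, 2.0]

inner = dot(sphat_data(o), sphat_query(q, w))
exact = sum(wi * math.cos(oi - qi) for oi, qi, wi in zip(o, q, w))
approx_dw = 2.0 * (sum(w) - inner)  # small-angle approximation of d_w

if __name__ == "__main__":
    print("inner product      :", inner)
    print("sum w_i cos(o_i-q_i):", exact)
    print("d_w and its approx :", gwsed(o, q, w), approx_dw)
```

Because $\lVert P(o) \rVert^2 = d$ for every data object, a larger inner product with $Q(q, w)$ also means a smaller Euclidean distance and a smaller angle, which is why either Euclidean or angular LSH can be applied after the transformation.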

Two Proposed Methods
• SL-ALSH = SphAT + E2LSH
  ◦ SphAT: $\arg\min_{o \in \mathcal{D}} d_w(o, q) \to \arg\min_{o \in \mathcal{D}} \lVert P(o) - Q(q, w) \rVert$
  ◦ Apply E2LSH on $P(o)$ and $Q(q, w)$ for NNS over Euclidean distance
• S2-ALSH = SphAT + SimHash
  ◦ SphAT: $\arg\min_{o \in \mathcal{D}} d_w(o, q) \to \arg\max_{o \in \mathcal{D}} \frac{P(o)^T Q(q, w)}{\lVert P(o) \rVert \, \lVert Q(q, w) \rVert}$
  ◦ Apply SimHash on $P(o)$ and $Q(q, w)$ for NNS over Angular distance
• Main Results
  ◦ $\Pr[h(P(o)) = h(Q(q, w))]$ is monotonic in $d_w(o, q)$ (Lemmas 3 and 4)
  ◦ SL-ALSH and S2-ALSH solve the problem of NNS over $d_w$ in sublinear time (Theorems 3 and 4)
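
The S2-ALSH pipeline can be sketched in a few lines: hash $P(o)$ for every data object once, then hash $Q(q, w)$ at query time with the same random hyperplanes. This is a simplified illustration, not the authors' implementation (see the source code link on the title slide). It ranks objects by the number of matching signature bits instead of using multiple hash tables, and all function names and parameters (`make_planes`, `simhash`, `build_index`, `rank_candidates`, `num_bits`, `seed`) are hypothetical.

```python
import math
import random


def sphat_data(o):
    """P(o) = [COS(o); SIN(o)]; computed without knowing w."""
    return [math.cos(x) for x in o] + [math.sin(x) for x in o]


def sphat_query(q, w):
    """Q(q, w) = [w * COS(q); w * SIN(q)], element-wise."""
    return ([wi * math.cos(x) for x, wi in zip(q, w)]
            + [wi * math.sin(x) for x, wi in zip(q, w)])


def make_planes(dim, num_bits, seed=0):
    """Random Gaussian hyperplanes; one sign bit per plane (SimHash)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def simhash(v, planes):
    """Sign pattern of v against each hyperplane, as a tuple of 0/1 bits."""
    return tuple(1 if sum(p * x for p, x in zip(plane, v)) >= 0 else 0
                 for plane in planes)


def build_index(dataset, planes):
    """Weight-oblivious indexing: only P(o) is hashed, before any query."""
    return {key: simhash(sphat_data(o), planes) for key, o in dataset.items()}


def rank_candidates(index, q, w, planes):
    """Order keys by how many signature bits agree with the query's."""
    sig = simhash(sphat_query(q, w), planes)
    return sorted(index, key=lambda k: -sum(a == b for a, b in zip(index[k], sig)))


if __name__ == "__main__":
    data = {"a": [0.1, 0.2], "b": [1.5, -1.0], "c": [-0.7, 0.4]}
    planes = make_planes(dim=4, num_bits=32)
    index = build_index(data, planes)
    print(rank_candidates(index, q=[0.1, 0.2], w=[1.0, 1.0], planes=planes))
```

The index is built from `sphat_data` alone, so the same signatures serve every later choice of $w$; only the query side changes. SL-ALSH would follow the same outline with an E2LSH hash in place of `simhash`.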

Datasets and Settings
• Datasets
  ◦ Mnist ($n = 60{,}000$ and $d = 784$)
  ◦ Sift ($n = 1{,}000{,}000$ and $d = 128$)
  ◦ Movielens ($n = 52{,}889$ and $d = 150$)
• Five types of weight vector $w$

| Type | Illustration |
|---|---|
| Identical | All "1" |
| Binary | Uniformly distributed in $\{0,1\}^d$ |
| Normal | $d$-dimensional normal distribution $\mathcal{N}(0, I)$ |
| Uniform | Uniformly distributed in $[0,1]^d$ |
| Negative | All "−1" |
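
For reference, the five weight types could be generated along these lines. This is a sketch under stated assumptions: the slide's brackets were lost in extraction, so reading "Binary" as $\{0,1\}^d$ and "Uniform" as $[0,1]^d$ is an interpretation, and the function name `make_weight` and its string keys are hypothetical.

```python
import random


def make_weight(kind, d, rng):
    """Return a length-d weight vector of the requested type."""
    if kind == "identical":
        return [1.0] * d
    if kind == "binary":
        # Each coordinate is 0 or 1 with equal probability.
        return [float(rng.randint(0, 1)) for _ in range(d)]
    if kind == "normal":
        # Independent standard normal coordinates, i.e. N(0, I).
        return [rng.gauss(0.0, 1.0) for _ in range(d)]
    if kind == "uniform":
        # random() samples from [0, 1), a half-open stand-in for [0, 1].
        return [rng.random() for _ in range(d)]
    if kind == "negative":
        return [-1.0] * d
    raise ValueError(f"unknown weight type: {kind}")


if __name__ == "__main__":
    rng = random.Random(0)
    for kind in ["identical", "binary", "normal", "uniform", "negative"]:
        print(kind, make_weight(kind, 4, rng))
```

Only the "normal" and "negative" types produce negative weights, which is the case where $d_w$ stops being a metric.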

Bucketing Experiments
Figure: The best (smallest) fraction of the dataset that must be scanned to achieve a given level of recall (lower is better).

Conclusions
• Demonstrate that there is no Asymmetric LSH family over $\mathbb{R}^d$ for the problem of NNS over $d_w$
• Introduce a novel SphAT from $\mathbb{R}^d$ to $\mathbb{R}^{2d}$
  ◦ SphAT is weight-oblivious
  ◦ $\Pr[h(P(o)) = h(Q(q, w))]$ is monotonic in $d_w(o, q)$
• Propose the first two sublinear-time methods, SL-ALSH and S2-ALSH, for NNS over $d_w$
• Extensive experiments verify that SL-ALSH and S2-ALSH answer NNS queries in sublinear time and support various types of weight vectors.

Poster Session
[Poster #82: Tue Jun 11th, 06:30–09:00 PM @ Pacific Ballroom]
Thank you for your attention!