
Sublinear Time Nearest Neighbor Search over Generalized Weighted Space



  1. Sublinear Time Nearest Neighbor Search over Generalized Weighted Space
     Yifan Lei, Qiang Huang, Mohan S. Kankanhalli, Anthony K. H. Tung
     School of Computing, National University of Singapore
     Source code: https://github.com/1flei/aws_alsh
     2019/6/11

  2. Applications
     - Nearest Neighbor Search (NNS) is widely used.
     - Example: booking a hotel for ICML 2019
       - Hotels are compared on three conditions: price, distance to the convention centre, and rating.
       - Query $q$: a hotel that the user booked before and found excellent.
       - Weight vector $w$: different users weigh the hotel conditions differently, which leads to different choices of hotel.

     | Hotel     | Price | Distance | Rating |
     |-----------|-------|----------|--------|
     | Hotel $q$ | 300   | 7        | 10     |
     | Hotel 1   | 400   | 8        | 10     |
     | Hotel 2   | 350   | 6        | 8      |
     | Hotel 3   | 250   | 9        | 8      |
     | Hotel 4   | 200   | 6        | 6      |

     | Weight vector $w$ (price, distance, rating) | Nearest hotel |
     |---------------------------------------------|---------------|
     | $w = (0.001, 1, 1)$                         | Hotel 2       |
     | $w = (0, 1, 3)$                             | Hotel 1       |
     | $w = (0.001, -1, 1)$                        | Hotel 3       |
     | $w = (-0.001, -1, -1)$                      | Hotel 4       |
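The hotel example can be checked by hand with a linear scan. The sketch below assumes the weighted squared distance $d_w(o, q) = \sum_i w_i (o_i - q_i)^2$ defined on the next slide; the names `weighted_sq_dist` and `best_hotel` are illustrative and do not come from the slides.

```python
# Hotel example from the slide: each tuple is (price, distance, rating).
QUERY = (300, 7, 10)
HOTELS = {
    "Hotel 1": (400, 8, 10),
    "Hotel 2": (350, 6, 8),
    "Hotel 3": (250, 9, 8),
    "Hotel 4": (200, 6, 6),
}


def weighted_sq_dist(o, q, w):
    """Sum of w_i * (o_i - q_i)^2; the weights may be zero or negative."""
    return sum(wi * (oi - qi) ** 2 for oi, qi, wi in zip(o, q, w))


def best_hotel(w):
    """Linear scan: the hotel with the smallest weighted distance to QUERY."""
    return min(HOTELS, key=lambda name: weighted_sq_dist(HOTELS[name], QUERY, w))


for w in [(0.001, 1, 1), (0, 1, 3), (0.001, -1, 1), (-0.001, -1, -1)]:
    print(w, "->", best_hotel(w))
```

Running the loop reproduces the four choices listed on the slide: Hotel 2, Hotel 1, Hotel 3 and Hotel 4, in that order.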

  3. Problem Definition
     - Given
       - A dataset $\mathcal{D}$ of $n$ data objects in $\mathbb{R}^d$
       - A query $q \in \mathbb{R}^d$ with a weight vector $w \in \mathbb{R}^d$
       - Measure: the Generalized Weighted Square Euclidean Distance (GWSED)
         $$d_w(o, q) = \sum_{i=1}^{d} w_i (o_i - q_i)^2$$
     - Nearest Neighbor Search (NNS) over $d_w$
       - Find $o^* \in \mathcal{D}$ such that $o^* = \arg\min_{o \in \mathcal{D}} d_w(o, q)$
     - This problem is fundamental
       - Furthest Neighbor Search (FNS) and Maximum Inner Product Search (MIPS) can be reduced to NNS over $d_w$
       - For example, for FNS: $w_i = -1, \forall i \implies \arg\min_{o \in \mathcal{D}} d_w(o, q) = \arg\max_{o \in \mathcal{D}} \|o - q\|$
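The reduction from Furthest Neighbor Search can be illustrated with a brute-force scan. This is a minimal sketch on made-up 2-dimensional points, not the sublinear method of the talk; `gwsed`, `nns_linear_scan` and `furthest_neighbor` are hypothetical helper names.

```python
import math


def gwsed(o, q, w):
    """Generalized Weighted Square Euclidean Distance: sum_i w_i * (o_i - q_i)^2."""
    return sum(wi * (oi - qi) ** 2 for oi, qi, wi in zip(o, q, w))


def nns_linear_scan(data, q, w):
    """Index of argmin_o d_w(o, q), found by scanning all n objects."""
    return min(range(len(data)), key=lambda i: gwsed(data[i], q, w))


def furthest_neighbor(data, q):
    """Index of argmax_o ||o - q|| under the ordinary Euclidean distance."""
    return max(range(len(data)), key=lambda i: math.dist(data[i], q))


# Toy data, chosen only for illustration.
data = [(0.0, 0.0), (1.0, 2.0), (5.0, -1.0), (-3.0, 4.0)]
q = (0.5, 0.5)

# With every weight set to -1, minimizing d_w is the same as maximizing ||o - q||.
all_minus_one = (-1.0, -1.0)
print(nns_linear_scan(data, q, all_minus_one) == furthest_neighbor(data, q))
```

With all weights equal to 1 the same scan returns the ordinary Euclidean nearest neighbor, so a single routine covers both cases; only the sign of the weights changes.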

  4. Background and Motivations
     - Locality-Sensitive Hashing (LSH)
       - Sublinear time for Near Neighbor Search
       - Insight: construct a hash function $h$ such that $\Pr[h(o) = h(q)]$ is monotonic in $\mathrm{Dist}(o, q)$
       - Hidden condition: $\mathrm{Dist}(o, q)$ must be a metric
     - LSH schemes cannot solve NNS over $d_w$ directly ($d_w$ is no longer a metric if $w_i < 0$ for some $i$)
     - There is NO sublinear method for this problem
     - Motivations
       - Like $d_w$, the inner product (i.e., $o^T q$) is not a metric
       - However, Shrivastava & Li (2014) introduced a sublinear time method based on Asymmetric LSH, which constructs $P(o)$ for each data object $o \in \mathcal{D}$ and $Q(q)$ for each query $q$.

  5. Spherical Asymmetric Transformation
     - Negative result:
       - There is no Asymmetric LSH family over $\mathbb{R}^d$ for NNS over $d_w$ (Lemma 1 and Theorem 2)
     - Spherical Asymmetric Transformation (SphAT): $\mathbb{R}^d \to \mathbb{R}^{2d}$
       $$P(o) = [\mathrm{COS}(o); \mathrm{SIN}(o)]$$
       $$Q(q, w) = [w \otimes \mathrm{COS}(q); w \otimes \mathrm{SIN}(q)]$$
       - where $w \otimes \mathrm{COS}(q) = (w_1 \cos q_1, w_2 \cos q_2, \ldots, w_d \cos q_d)$
     - Properties of SphAT:
       - $d_w(o, q)$ ~ Euclidean distance (or Angular distance) between $P(o)$ and $Q(q, w)$
       - SphAT is weight-oblivious (because $P(\cdot)$ is independent of $w$) $\implies$ the index can be built before $q$ and $w$ are known
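A short sketch of the two transformations may make the "~" relation more concrete. By the angle-difference identity, the inner product of $P(o)$ and $Q(q, w)$ equals $\sum_i w_i \cos(o_i - q_i)$. One way to read the slide, assuming the coordinates are scaled so that each $o_i - q_i$ is small (an assumption made here, not something the slide states), is that $\cos x \approx 1 - x^2/2$ turns this into $\sum_i w_i - d_w(o, q)/2$. The function names and sample vectors below are illustrative.

```python
import math


def sphat_data(o):
    """P(o) = [COS(o); SIN(o)]: maps R^d to R^(2d) and does not depend on w."""
    return [math.cos(x) for x in o] + [math.sin(x) for x in o]


def sphat_query(q, w):
    """Q(q, w) = [w (x) COS(q); w (x) SIN(q)], with element-wise weighting."""
    return ([wi * math.cos(x) for x, wi in zip(q, w)]
            + [wi * math.sin(x) for x, wi in zip(q, w)])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Illustrative vectors with small coordinate differences and a negative weight.
o = [0.10, -0.20, 0.30]
q = [0.12, -0.25, 0.28]
w = [1.0, -0.5, 2.0]

inner = dot(sphat_data(o), sphat_query(q, w))

# Exact identity: cos(a)cos(b) + sin(a)sin(b) = cos(a - b).
exact = sum(wi * math.cos(oi - qi) for oi, qi, wi in zip(o, q, w))

# Small-angle reading (assumption): inner product ~ sum(w) - d_w(o, q) / 2.
d_w = sum(wi * (oi - qi) ** 2 for oi, qi, wi in zip(o, q, w))
approx = sum(w) - 0.5 * d_w

print(inner, exact, approx)
```

Because `sphat_data` never looks at `w`, the transformed data can be hashed once and reused for every weight vector, which is the weight-oblivious property noted above.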

  6. Two Proposed Methods
     - SL-ALSH = SphAT + E2LSH
       - SphAT: $\arg\min_{o \in \mathcal{D}} d_w(o, q) \Rightarrow \arg\min_{o \in \mathcal{D}} \|P(o) - Q(q, w)\|$
       - Apply E2LSH on $P(o)$ and $Q(q, w)$ for NNS over Euclidean distance
     - S2-ALSH = SphAT + SimHash
       - SphAT: $\arg\min_{o \in \mathcal{D}} d_w(o, q) \Rightarrow \arg\max_{o \in \mathcal{D}} \frac{P(o)^T Q(q, w)}{\|P(o)\| \, \|Q(q, w)\|}$
       - Apply SimHash on $P(o)$ and $Q(q, w)$ for NNS over Angular distance
     - Main Results
       - $\Pr[h(P(o)) = h(Q(q, w))]$ is monotonic in $d_w(o, q)$ (Lemmas 3 and 4)
       - SL-ALSH and S2-ALSH solve the problem of NNS over $d_w$ in sublinear time (Theorems 3 and 4)
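The S2-ALSH recipe (SphAT followed by SimHash) can be sketched as below. This is not the authors' implementation and it omits the hash tables that give sublinear query time; it only shows that, for sign-of-random-projection hashes, an object with small $d_w$ agrees with the query on more bits than a distant one. The helper names, the 512-bit code length, the seed and the test vectors are all assumptions made for illustration.

```python
import math
import random


def sphat_data(o):
    """P(o) = [COS(o); SIN(o)]."""
    return [math.cos(x) for x in o] + [math.sin(x) for x in o]


def sphat_query(q, w):
    """Q(q, w) = [w (x) COS(q); w (x) SIN(q)]."""
    return ([wi * math.cos(x) for x, wi in zip(q, w)]
            + [wi * math.sin(x) for x, wi in zip(q, w)])


def make_simhash(dim, num_bits, seed=0):
    """Build a SimHash function: one sign bit per random Gaussian hyperplane."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

    def simhash(v):
        return tuple(sum(p * x for p, x in zip(plane, v)) >= 0.0 for plane in planes)

    return simhash


def matching_bits(a, b):
    """Number of positions where two hash codes agree."""
    return sum(x == y for x, y in zip(a, b))


d = 8
q = [0.1 * i for i in range(d)]
w = [1.0] * d                      # "Identical" weights, for simplicity
near = [x + 0.01 for x in q]       # small d_w(near, q)
far = [x + 1.5 for x in q]         # large d_w(far, q)

h = make_simhash(2 * d, num_bits=512)
code_q = h(sphat_query(q, w))      # the same h hashes data and query vectors
near_matches = matching_bits(h(sphat_data(near)), code_q)
far_matches = matching_bits(h(sphat_data(far)), code_q)
print(near_matches, far_matches)
```

The data side is hashed without any reference to `w`, so in a full index only the query side would need to be recomputed when the weights change.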

  7. Datasets and Settings
     - Datasets
       - Mnist ($n = 60{,}000$ and $d = 784$)
       - Sift ($n = 1{,}000{,}000$ and $d = 128$)
       - Movielens ($n = 52{,}889$ and $d = 150$)
     - Five types of weight vector $w$

     | Type      | Illustration                                            |
     |-----------|---------------------------------------------------------|
     | Identical | All "1"                                                 |
     | Binary    | Uniformly distributed in $\{0,1\}^d$                    |
     | Normal    | $d$-dimensional normal distribution $\mathcal{N}(0, I)$ |
     | Uniform   | Uniformly distributed in $[0,1]^d$                      |
     | Negative  | All "-1"                                                |
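The five weight-vector types are easy to generate for small experiments. The sketch below is one plausible reading of the table, assuming "Binary" means each coordinate is drawn uniformly from {0, 1} and "Uniform" means each coordinate is drawn uniformly from [0, 1]; the function name `make_weight` and the seed are hypothetical, and the slides do not say how the weights were actually generated.

```python
import random


def make_weight(kind, d, rng):
    """Return a d-dimensional weight vector of the requested type."""
    if kind == "identical":
        return [1.0] * d
    if kind == "binary":
        return [float(rng.randint(0, 1)) for _ in range(d)]
    if kind == "normal":
        # Independent standard normal coordinates, i.e. N(0, I).
        return [rng.gauss(0.0, 1.0) for _ in range(d)]
    if kind == "uniform":
        return [rng.random() for _ in range(d)]
    if kind == "negative":
        return [-1.0] * d
    raise ValueError(f"unknown weight type: {kind}")


rng = random.Random(2019)
d = 128  # dimension of Sift in the table above
weights = {kind: make_weight(kind, d, rng)
           for kind in ("identical", "binary", "normal", "uniform", "negative")}
print({kind: vec[:3] for kind, vec in weights.items()})
```

Only the "Normal" and "Negative" types produce negative coordinates, which are the cases where $d_w$ stops being a metric.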

  8. Bucketing Experiments
     Figure: The best fraction of the dataset to scan to achieve a given level of recall (lower is better).

  9. Conclusions
     - Demonstrate that there is no Asymmetric LSH family over $\mathbb{R}^d$ for the problem of NNS over $d_w$
     - Introduce a novel SphAT from $\mathbb{R}^d$ to $\mathbb{R}^{2d}$
       - SphAT is weight-oblivious
       - $\Pr[h(P(o)) = h(Q(q, w))]$ is monotonic in $d_w(o, q)$
     - Propose the first two sublinear time methods, SL-ALSH and S2-ALSH, for NNS over $d_w$
     - Extensive experiments verify that SL-ALSH and S2-ALSH answer the NNS queries in sublinear time and support various types of weight vectors.

  10. Poster Session
      [Poster #82: Tue Jun 11th, 06:30-09:00 PM @Pacific Ballroom]
      Thank you for your attention!
