  1. Approximate Voronoi Diagrams: Techniques, tools, and applications to kth ANN search Nirman Kumar, University of California, Santa Barbara, January 13th, 2016

  2. Similarity Search? Need similarity search to make sense of the world!

  3. When an appropriate metric is defined, similarity search reduces to NN search

  4. Nearest neighbor search Given a set of points P: for a query q, quickly find the closest point to q in P

  5. Nearest neighbor search Also important in other domains

  6. Approximate nearest neighbor search (ANN) Find any point x with d(q, x) ≤ (1 + ε) d₁(q, P)
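
To make the definition concrete, here is a minimal brute-force sketch (not from the talk) of the exact nearest neighbor and the (1 + ε) acceptance test an ANN answer must satisfy; the function names are illustrative.

```python
# Brute-force reference: the exact nearest neighbor and the (1 + eps)
# acceptance test that any valid ANN answer x must pass.
import math

def nearest(q, P):
    """Exact nearest neighbor of q in the point set P."""
    return min(P, key=lambda p: math.dist(q, p))

def is_ann(x, q, P, eps):
    """True iff d(q, x) <= (1 + eps) * d_1(q, P)."""
    return math.dist(q, x) <= (1 + eps) * math.dist(q, nearest(q, P))

P = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]
q = (0.4, 0.1)
print(nearest(q, P))                       # exact NN: (0.0, 0.0)
print(is_ann((1.0, 0.0), q, P, eps=0.5))   # ~0.608 <= 1.5 * ~0.412 -> True
```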

  7. Space partitioning Most data structures for NN (or ANN) search partition space

  8. Space partitioning In low dimensions this is an explicit partitioning

  9. Space partitioning In high dimensions the partitioning is implicit (via hash functions)

  10. Voronoi diagrams

  11. Voronoi diagrams Very efficient in dimensions d ≤ 2

  12. Voronoi diagrams Performance degrades sharply - bad even for d = 3

  13. This talk ◮ Construction of Approximate Voronoi Diagrams ◮ Tools used - Quadtrees, WSPD ◮ Construction of AVD for kth ANN ◮ Some open problems

  14. Approximate Voronoi Diagrams (AVD) A space partition as before

  15. Approximate Voronoi Diagrams (AVD) Each region has one associated rep (a point of P)

  16. Approximate Voronoi Diagrams (AVD) This rep is a valid ANN for any q in the region

  17. Main ideas behind ANN search and AVDs ◮ If the query point is “far”, any point is a good ANN ◮ A region can be approximated well by cubes ◮ Point location can be done efficiently in a set of cubes

  18. Tool 1: Quadtrees A quadtree - intuitively, on [0, 1] × [0, 1]

  19. Tool 1: Quadtrees A quadtree on points (figure: points a–i and the corresponding quadtree)

  20. Tool 1: Quadtrees The compressed version (figure: the same points a–i in a compressed quadtree)

  21. Tool 1: Quadtrees Point Location ≡ find leaf node containing a point

  22. Tool 1: Quadtrees Height h: O(log h) time - O(log log n) for a balanced tree!

  23. Tool 1: Quadtrees But the height is not bounded as a function of n

  24. Tool 1: Quadtrees Use a compressed quadtree - height bounded by O(n)
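
To make the quadtree tool concrete, the following is a minimal sketch, assuming points in [0, 1]² and omitting compression of long single-child paths; the class and method names are illustrative, not the talk's implementation. It builds a quadtree by recursive subdivision and answers point location by descending to the leaf containing the query.

```python
# Minimal quadtree on [0, 1]^2: build by recursive subdivision, stop at
# single-point (or empty) cells; point location descends to the leaf
# containing the query. Compression of single-child paths is omitted.

class Node:
    def __init__(self, x, y, size, points, depth=0, max_depth=20):
        self.x, self.y, self.size = x, y, size     # lower-left corner, side length
        self.points = points
        self.children = None
        if len(points) > 1 and depth < max_depth:
            half = size / 2
            self.children = []
            for dx in (0, half):                   # children in order: SW, NW, SE, NE
                for dy in (0, half):
                    cell = [p for p in points
                            if x + dx <= p[0] < x + dx + half
                            and y + dy <= p[1] < y + dy + half]
                    self.children.append(
                        Node(x + dx, y + dy, half, cell, depth + 1, max_depth))

    def locate(self, q):
        """Return the leaf cell containing the query point q."""
        if self.children is None:
            return self
        half = self.size / 2
        i = (2 if q[0] >= self.x + half else 0) + (1 if q[1] >= self.y + half else 0)
        return self.children[i].locate(q)

pts = [(0.1, 0.2), (0.15, 0.22), (0.8, 0.7), (0.4, 0.9)]
root = Node(0.0, 0.0, 1.0, pts)
leaf = root.locate((0.12, 0.21))
print(leaf.x, leaf.y, leaf.size, leaf.points)
```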

  25. Tool 2: Well separated pairs decomposition How many distances among points - Ω(n²)

  26. Tool 2: Well separated pairs decomposition What if distances within (1 ± ε) are considered the same?

  27. Tool 2: Well separated pairs decomposition About O(n/εᵈ) distinct distances up to (1 ± ε)

  28. Tool 2: Well separated pairs decomposition ◮ How can we represent them? ◮ Given a pair of points, which bucket does it belong to?

  29. Tool 2: Well separated pairs decomposition The WSPD data structure captures this

  30. Tool 2: Well separated pairs decomposition More formally ◮ A collection of pairs Aᵢ, Bᵢ ⊂ P ◮ Aᵢ ∩ Bᵢ = ∅ ◮ Every pair of points is separated by some Aᵢ, Bᵢ ◮ Each pair Aᵢ, Bᵢ is well separated

  31. Tool 2: Well separated pairs decomposition A well separated pair is a dumbbell (figure: two balls of radii r₁, r₂ at distance ℓ, with ℓ ≥ (1/ε) max{r₁, r₂})

  32. Tool 2: Well separated pairs decomposition WSPD example (figure: points a–f): A₁ = {a, b, c}, B₁ = {e}; A₂ = {a}, B₂ = {b, c}; . . .

  33. Tool 2: Well separated pairs decomposition Main result about WSPDs: There is an ε⁻¹-WSPD of size O(nε⁻ᵈ) - it can be constructed in O(n log n + nε⁻ᵈ) time
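
A minimal sketch of the standard recursive WSPD construction, not the implementation behind the stated bounds: if the two clusters are well separated, output them as a pair; otherwise split the cluster of larger radius and recurse. It assumes distinct points and represents clusters as plain point lists split at their bounding-box midpoint; the O(nε⁻ᵈ) size and O(n log n + nε⁻ᵈ) time guarantees require running this recursion on a compressed quadtree or fair-split tree rather than on raw lists.

```python
# Recursive WSPD sketch: output (A, B) if well separated (separation 1/eps),
# otherwise split the cluster of larger radius and recurse. Clusters are plain
# point lists, split at the bounding-box midpoint of their longest side.
import math

def bbox(pts):
    d = len(pts[0])
    lo = [min(p[i] for p in pts) for i in range(d)]
    hi = [max(p[i] for p in pts) for i in range(d)]
    return lo, hi

def radius(pts):
    lo, hi = bbox(pts)
    return 0.5 * math.dist(lo, hi)           # radius of the enclosing bbox ball

def center(pts):
    lo, hi = bbox(pts)
    return [(a + b) / 2 for a, b in zip(lo, hi)]

def well_separated(A, B, sep):
    gap = math.dist(center(A), center(B)) - radius(A) - radius(B)
    return gap >= sep * max(radius(A), radius(B))

def split(pts):
    lo, hi = bbox(pts)
    axis = max(range(len(lo)), key=lambda i: hi[i] - lo[i])
    mid = (lo[axis] + hi[axis]) / 2
    left = [p for p in pts if p[axis] <= mid]
    right = [p for p in pts if p[axis] > mid]
    return (left, right) if left and right else (pts[:1], pts[1:])

def wspd(A, B, sep, out):
    if len(A) == 1 and len(B) == 1 and A == B:
        return                                # a point need not be separated from itself
    if well_separated(A, B, sep):
        out.append((A, B))
        return
    if radius(A) < radius(B):                 # always split the "fatter" cluster
        A, B = B, A
    A1, A2 = split(A)
    wspd(A1, B, sep, out)
    wspd(A2, B, sep, out)

pts = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.05), (3.0, 0.2)]
pairs = []
wspd(pts, pts, sep=2.0, out=pairs)            # sep = 1/eps with eps = 0.5
for A, B in pairs:
    print(A, "|", B)
```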

  34. AVD results The main result ◮ O(n/εᵈ) cells ◮ Query time - O(log(n/ε))

  35. The AVD algorithm Construct an 8-WSPD for the point set

  36. The AVD algorithm Let (Aᵢ, Bᵢ) for i = 1, . . . , m be the pairs

  37. The AVD algorithm For each pair do some processing - output some cells

  38. The AVD algorithm Preprocess them for point location

  39. The AVD algorithm So what is the processing per pair?

  40. The AVD algorithm Consider a WSPD dumbbell

  41. The AVD algorithm Concentric balls of increasing radii - from r/4 to ≈ r/ε

  42. The AVD algorithm Tile each ball (radius x) by cubes of size ≈ εx

  43. The AVD algorithm Store the (ε/c)-ANN of some point in each cell
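
A minimal sketch of the per-pair processing on slides 40-43, with illustrative constants: around a WSPD dumbbell at distance r, take concentric balls of radius x from r/4 up to about r/ε, tile each by a grid of cubes of side ≈ εx, and record a representative per cube. Here the representative is simply the exact nearest neighbor of the cube center, standing in for the (ε/c)-ANN the construction actually stores, and d = 2 for readability.

```python
# Per-pair processing sketch (d = 2): concentric balls of radius x from r/4 up
# to ~ r/eps around the dumbbell midpoint, each tiled by cubes of side ~ eps*x,
# with one representative stored per cube (here: exact NN of the cube center,
# a stand-in for the (eps/c)-ANN of the real construction).
import math

def nearest(p, points):
    return min(points, key=lambda s: math.dist(p, s))

def cells_for_pair(A, B, points, eps):
    a, b = A[0], B[0]                        # arbitrary points of the WSPD pair
    r = math.dist(a, b)
    mid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
    cells = []                               # (cube corner, side length, representative)
    x = r / 4
    while x <= r / eps:                      # concentric balls, geometrically growing
        side = eps * x
        k = math.ceil(x / side)              # cubes per half-axis covering the ball
        for i in range(-k, k):
            for j in range(-k, k):
                corner = (mid[0] + i * side, mid[1] + j * side)
                ctr = (corner[0] + side / 2, corner[1] + side / 2)
                cells.append((corner, side, nearest(ctr, points)))
        x *= 2
    return cells

pts = [(0.0, 0.0), (1.0, 0.0), (0.2, 0.7), (3.0, 2.0)]
cells = cells_for_pair([pts[0]], [pts[1]], pts, eps=0.5)
print(len(cells), "cells; first:", cells[0])
```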

  44. So why does it work? Every pair of competing points is resolved

  45. So why does it work? p₁, p₂ are resolved by the WSPD pair separating them

  46. So why does it work? (figure: p₁, p₂)

  47. So why does it work? (figure: q, p₁, p₂)

  48. So why does it work? (figure: q, p₁, p₂)

  49. So why does it work? (figure: q, p₁, p₂)

  50. Bounding the AVD complexity The method shown gives O((n/εᵈ) log(1/ε)) cubes

  51. Bounding the AVD complexity This can be improved to O(n/εᵈ)

  52. kth ANN search Given q, output a point u ∈ P such that: (1 − ε) dₖ(q, P) ≤ d(q, u) ≤ (1 + ε) dₖ(q, P)
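
For reference, a brute-force sketch of the quantity being approximated: dₖ(q, P) is the distance from q to its kth closest point, and an answer u is acceptable exactly when d(q, u) falls in the (1 ± ε) window around it. Names are illustrative.

```python
# Brute-force reference for the kth-NN distance and the (1 +/- eps) window.
import math

def dist_k(q, P, k):
    """d_k(q, P): distance from q to its k-th closest point in P."""
    return sorted(math.dist(q, p) for p in P)[k - 1]

def is_kth_ann(u, q, P, k, eps):
    dk = dist_k(q, P, k)
    return (1 - eps) * dk <= math.dist(q, u) <= (1 + eps) * dk

P = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 0.0)]
q = (0.1, 0.0)
print(dist_k(q, P, 2))                        # exact 2nd-NN distance: 0.9
print(is_kth_ann((1.0, 0.0), q, P, 2, 0.1))   # True: this point realizes d_2
```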

  53. Applications of kth ANN search ◮ Density estimation ◮ Functions of the form: F(q) = Σᵢ₌₁ᵏ f(dᵢ(q, P)) ◮ kth ANN on balls

  54. Applications of kth ANN search Density estimation: density ≈ #points / area
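
A small brute-force sketch of evaluating a function of the form F(q) = Σᵢ₌₁ᵏ f(dᵢ(q, P)) from the previous slide; the Gaussian kernel used for f is only an example choice (such sums are what kernel-style density estimates look like).

```python
# Evaluating F(q) = sum_{i=1..k} f(d_i(q, P)) by brute force; the Gaussian
# kernel below is only an example choice of f.
import math

def F(q, P, k, f):
    dists = sorted(math.dist(q, p) for p in P)[:k]   # d_1(q,P), ..., d_k(q,P)
    return sum(f(d) for d in dists)

P = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (4.0, 4.0)]
gaussian = lambda d: math.exp(-d * d)
print(F((0.2, 0.1), P, 3, gaussian))
```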

  55. The result AVD for kth ANN ◮ O((n/k) ε⁻ᵈ log(1/ε)) cells ◮ Query time - O(log(n/(kε)))

  56. Quorum clustering

  57. Quorum clustering Find smallest ball containing k points

  58. Quorum clustering Find smallest ball containing k points

  59. Quorum clustering Remove points and repeat

  60. Quorum clustering Remove points and repeat

  61. Quorum clustering Remove points and repeat

  62. Quorum clustering Remove points and repeat

  63. Quorum clustering Remove points and repeat

  64. Quorum clustering Remove points and repeat

  65. Quorum clustering A way to summarize points

  66. Quorum clustering Has properties favorable for the kth ANN problem

  67. Quorum clustering Quorum clustering is too expensive to compute

  68. Quorum clustering Can compute approximate quorum clustering

  69. Quorum clustering ◮ Computed in O(n logᵈ n) time in ℝᵈ [Carmi, Dolev, Har-Peled, Katz and Segal, 2005] ◮ Computed in O(n log n) time in ℝᵈ [Har-Peled and K., 2012]
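
A brute-force sketch of the greedy quorum-clustering loop from the preceding slides: repeatedly take a small ball containing k of the remaining points, record it, remove its points, and repeat. The "smallest ball containing k points" step is replaced here by the cheapest point-centered candidate (radius = distance from a point to its kth closest remaining point), a simple constant-factor stand-in; the algorithms cited above are what make this fast.

```python
# Greedy quorum-clustering sketch: repeatedly take a small ball containing k
# of the remaining points, record (center, radius, members), remove them, repeat.
# "Smallest ball with k points" is replaced by the cheapest point-centered
# candidate: radius = distance from a point to its k-th closest remaining point.
import math

def quorum_clusters(points, k):
    remaining = list(points)
    clusters = []
    while len(remaining) >= k:                # leftover < k points are ignored here
        best = None
        for c in remaining:
            by_dist = sorted(remaining, key=lambda p: math.dist(c, p))
            members = by_dist[:k]
            radius = math.dist(c, members[-1])
            if best is None or radius < best[1]:
                best = (c, radius, members)
        clusters.append(best)
        remaining = [p for p in remaining if p not in best[2]]
    return clusters

pts = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.2), (9, 0)]
for c, r, members in quorum_clusters(pts, k=3):
    print("center", c, "radius", round(r, 3), "members", members)
```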

  70. Why is quorum clustering useful? (figure: query q and quorum balls with centers c₁, c₂, c₃ and radii r₁, r₂, r₃) ◮ x = dₖ(q, P) ◮ r₁ ≤ x ◮ x + r₁ ≥ d(q, c₁) ⟹ d(q, c₁) ≤ 2x ◮ x ≤ d(q, c₁) + r₁ ≤ 3x

  71. Refining the approximation Just as in AVDs, generate a list of cells

  72. Refining the approximation

  73. Refining the approximation

  74. Refining the approximation For the closest ball, use an ANN data structure in ℝᵈ⁺¹

  75. Refining the approximation b = b(c, r) → (c, r) ∈ ℝᵈ⁺¹
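
A small illustration of the lifting on this slide: each ball b(c, r) becomes the point (c, r) ∈ ℝᵈ⁺¹, and those lifted points are what the ANN structure indexes. The brute-force "closest ball" reference below uses the point-to-ball distance max(0, ‖q − c‖ − r), which is an assumption made for illustration; how the lifted points are actually queried follows the talk's construction and is not reproduced here.

```python
# Lifting b(c, r) -> (c, r) in R^{d+1}; the lifted points are what the ANN
# structure indexes. The point-to-ball distance max(0, ||q - c|| - r) below is
# an assumed reference definition, used only to show what "closest ball" means.
import math

def lift(ball):
    c, r = ball
    return (*c, r)                            # a point in R^{d+1}

def dist_to_ball(q, ball):
    c, r = ball
    return max(0.0, math.dist(q, c) - r)

balls = [((0.0, 0.0), 1.0), ((4.0, 0.0), 0.5), ((0.0, 5.0), 2.0)]
print([lift(b) for b in balls])               # points in R^3 for d = 2
q = (2.5, 0.0)
closest = min(balls, key=lambda b: dist_to_ball(q, b))
print("closest ball to", q, ":", closest, "at distance", dist_to_ball(q, closest))
```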

  76. Refining the approximation Some cells generated by AVD for ball centers

  77. Refining the approximation Store some info with each cell

  78. Refining the approximation A kth ANN, and an approximate closest ball

  79. Open problems ◮ In high dimensions, is there a data structure for kth NN whose space requirement is f(n/k)? ◮ There is an AVD for weighted ANN similar to the AVD shown - is there an extension to weighted kth ANN?
