
Approximate Nearest Neighbor Problem: Improving Query Time (CS468)



  1. Approximate Nearest Neighbor Problem: Improving Query Time CS468, 10/9/2006

  3. Outline
     • Reducing the "constant" in the query time from $O(\epsilon^{-d})$ to $O(\epsilon^{-(d-1)/2})$
     • Need to know $\epsilon$ ahead of time
       – Preprocessing time and storage feature $O(\epsilon^{-d})$, $O(\epsilon^{-(d-1)/2})$, etc.
     • Timothy M. Chan. Approximate Nearest Neighbor Queries Revisited. Discrete and Computational Geometry, 1998.
       – Decomposition of space into cones
       – BBD-tree for range searching in $\mathbb{R}^{d-k}$ + point location in $\mathbb{R}^{k}$
     • Kenneth Clarkson. An Algorithm for Approximate Closest-point Queries. SoCG 1994.
       – Additional $\log(\rho/\epsilon)$ in space complexity
       – Polytope approximation in $\mathbb{R}^{d+1}$

  5. Chan's Algorithm: Motivation
     • $(1+\epsilon)$-ANN among (sorted) points in a narrow cone with apex $q$: $O(\log n)$ by binary search
     • Need a data structure that returns sorted points given $q$ and a cone direction
     • Uses the BBD-tree data structure: given a query point $q \in \mathbb{R}^d$ and a radius $r$, one can find $O(\log n)$ cells of the BBD-tree whose union contains $B(q, r)$ and is contained in $B(q, 2r)$. This takes $O(\log n)$ time.
     • Use for approximate range searching in $\mathbb{R}^{d-1}$

  6. Conic ANN (with a Hint)
     Input: query point $q$ and a 2-approximation $r$ to the NN distance
     Output: a point $s$ such that $\|q - s\| \le (1+\epsilon)\,\|q - p\|$, where $p$ is the NN inside a cone with apex $q$ and angle $\delta = \sqrt{\epsilon/16}$
     [Figure: cone with apex $q$, angle $\delta$ and radius $r$; points $p$ and $s$]
     Note: $s$ need not be in the cone!
     Note: the cone is fixed (not a part of the input, modulo translation to $q$)

  10. Main $(1+\epsilon)$-ANN Algorithm
      Uses the "conic-ANN with a hint" as a subroutine.
      Query (given only $q$)
      • Obtain $r$ by [Arya and Mount 1998]
      • Get one point per data structure, return the one closest to $q$
      Preprocessing
      • "Tile" $\mathbb{R}^d$ with $O(\epsilon^{-(d-1)/2})$ "floating" cones of angle $\delta = \Theta(\sqrt{\epsilon})$
      • Build a "conic-ANN" data structure for each cone
      Correctness
      • The true NN $p$ lies in some cone, and the point $s$ returned from that cone's data structure is a $(1+\epsilon)$-ANN
      Query time
      • $O(\epsilon^{-(d-1)/2} \log n)$ = [# of cones] × [conic query]
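The query loop on this slide can be sketched in the plane, where tiling with cones reduces to choosing evenly spaced directions. This is a minimal illustration under stated assumptions, not the construction from the paper: the names `conic_query` and `ann_query` are hypothetical, the per-cone data structure is replaced by a linear scan, and the hint $r$ is computed by brute force as a stand-in for the [Arya and Mount 1998] step.

```python
import math
import random

def conic_query(points, q, r, delta, theta):
    """Linear-scan stand-in for one cone's data structure (hypothetical helper).

    The cone axis points in direction theta. Among the points whose offset
    perpendicular to the axis is at most 2*delta*r, return the one with the
    smallest offset along the axis.
    """
    ux, uy = math.cos(theta), math.sin(theta)
    best, best_along = None, math.inf
    for p in points:
        dx, dy = p[0] - q[0], p[1] - q[1]
        along = abs(dx * ux + dy * uy)     # plays the role of |s_d - q_d|
        across = abs(-dx * uy + dy * ux)   # plays the role of ||s' - q'||
        if across <= 2 * delta * r and along < best_along:
            best, best_along = p, along
    return best

def ann_query(points, q, eps):
    delta = math.sqrt(eps / 16)
    k = math.ceil(math.pi / delta)         # k cones of half-angle <= delta cover the plane
    # Assumption: a brute-force 2-approximation of the NN distance replaces the real hint.
    r = 1.5 * min(math.dist(p, q) for p in points)
    answers = [conic_query(points, q, r, delta, 2 * math.pi * i / k) for i in range(k)]
    # One candidate per cone; keep the one closest to q.
    return min((s for s in answers if s is not None), key=lambda s: math.dist(s, q))

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(200)]
print(ann_query(pts, (0.5, 0.5), eps=0.2))
```

The sketch spends linear time per cone, so it only illustrates correctness of the "one answer per cone, keep the closest" scheme, not the stated query time.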

  14. Conic-ANN Data Structure
      Preprocessing is given only the direction of the cone (wlog the $d$-axis) and the angle $\delta$.
      Query algorithm (given $q$ and $r$)
      • Approximate range query on the set of projections $\{p' = [p_1\ p_2\ \cdots\ p_{d-1}]^T : p \in P\}$ with $B(q', \delta r)$
        – returns $O(\log n)$ BBD-nodes (cells) in $O(\log n)$ time
      • $O(\log n)$ binary searches along the $d$-axis
      • Return the point $s$ such that $|s_d - q_d|$ is minimal
      Correctness (proof of $\|q - s\| \le (1+\epsilon)\,\|q - p\|$)
      • $|s_d - q_d| \le |p_d - q_d| \le \|p - q\|$
      • $\|s' - q'\| \le 2\delta r \le 4\delta\,\|p - q\|$
      • $\|s - q\| \le \sqrt{1 + 16\delta^2}\,\|p - q\| = \sqrt{1+\epsilon}\,\|p - q\| \le (1+\epsilon)\,\|p - q\|$, since $\delta = \sqrt{\epsilon/16}$
      [Figure: cone along the $d$-axis with apex $q$, angle $\delta$ and radius $r$; points $p$ and $s$; projected radii $\delta r$ and $2\delta r$]
      Data structure
      • BBD-tree on the projection set
      • For every tree node $v$, the associated list of points is sorted by the $d$-th coordinate
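The query rule and its correctness bound can be checked on a tiny instance. The sketch below is an illustration only: `conic_ann_with_hint` is a hypothetical name, and a brute-force scan over all points stands in for the BBD-tree range query and the sorted per-node lists. It accepts every point whose projection lies within $2\delta r$ of $q'$, which is the outer ball the approximate range query is allowed to report.

```python
import math

def conic_ann_with_hint(points, q, r, delta):
    """Brute-force stand-in for the conic-ANN query (cone axis = last coordinate)."""
    best = None
    for p in points:
        # Distance between the projections p' and q' (last coordinate dropped).
        if math.dist(p[:-1], q[:-1]) <= 2 * delta * r:
            if best is None or abs(p[-1] - q[-1]) < abs(best[-1] - q[-1]):
                best = p
    return best

eps = 0.16
delta = math.sqrt(eps / 16)            # 0.1
q = (0.0, 0.0)
p_cone = (0.05, 1.0)                   # NN inside the cone around the vertical axis
points = [p_cone, (0.15, 0.9), (3.0, 0.1)]
r = 1.5                                # assumed hint: a 2-approximation of the NN distance

s = conic_ann_with_hint(points, q, r, delta)
print(s, math.dist(q, s) <= (1 + eps) * math.dist(q, p_cone))
```

In this instance the returned point lies outside the cone, which matches the remark that $s$ need not be in the cone.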

  15. Conic-ANN Analysis
      Construction (preprocessing)
      • BBD-tree $O(n \log n)$ + sorting $O(n \log n)$ = $O(n \log n)$
      Query
      • Approximate range query $O(\log n)$ + binary searches $O(\log^2 n)$ = $O(\log^2 n)$
      Improving query time by exploiting correlation [Lueker and Willard]
      [Figure: tree node $v$ with children left($v$) and right($v$); labels $O(\log n)$ at $v$ and $O(1)$ at each of the other $O(\log n)$ nodes]
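The $O(\log^2 n)$ term comes from running one binary search in each of the $O(\log n)$ sorted lists returned by the range query. A minimal sketch of that step, assuming each node's list holds only the $d$-th coordinates in ascending order (`closest_in_sorted`, `best_over_nodes`, and `node_lists` are hypothetical names, and the speed-up attributed to [Lueker and Willard] is not implemented here):

```python
from bisect import bisect_left

def closest_in_sorted(values, x):
    """Return the element of the sorted, non-empty list `values` nearest to x."""
    i = bisect_left(values, x)
    # Only the elements just below and just above the insertion point can be nearest.
    candidates = values[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda v: abs(v - x))

def best_over_nodes(node_lists, q_d):
    """One binary search per non-empty node list; keep the overall nearest value."""
    per_node = [closest_in_sorted(vals, q_d) for vals in node_lists if vals]
    return min(per_node, key=lambda v: abs(v - q_d))

print(best_over_nodes([[1, 4], [9, 16], []], 10))
```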

  16. Summary and Remarks
      Variant with projecting to $d-2$ dimensions
      • BBD-tree + planar point location
      Rough ($\approx d^{3/2}$) approximation algorithms
      • Polynomial dependence on $d$

  19. Clarkson's Algorithm: Iterative Improvement
      Exact nearest neighbor problem
      Data structure
      • For each site $s$, a (small) list $L_s$ of other sites such that for any query point $q$: if $s$ is not the nearest neighbor of $q$, then $L_s$ contains a site closer to $q$
      Algorithm
          s ← arbitrary site
          while ∃ t ∈ L_s : ||t − q|| < ||s − q|| do
              s ← t
          return s
      [Figure: query points $q$ and $q'$, site $s$]
      Note: the same $L_s$ is valid for all $q$!
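The loop above translates almost directly into code. The sketch below is illustrative only: `iterative_improvement` is a hypothetical name, and the lists are built for sites on a line, where taking the two adjacent sites as $L_s$ satisfies the required property. The lists are an assumption of this example, not Clarkson's construction.

```python
import math

def iterative_improvement(lists, start, q):
    """Walk from `start`, moving to any listed site that is strictly closer to q."""
    s, steps = start, 0
    while True:
        closer = [t for t in lists[s] if math.dist(t, q) < math.dist(s, q)]
        if not closer:
            return s, steps        # no listed site improves on s
        s = closer[0]
        steps += 1

# Ten sites on a line; L_s holds the left and right neighbors of s.
sites = [(float(i),) for i in range(10)]
lists = {s: [t for t in (sites[i - 1] if i > 0 else None,
                         sites[i + 1] if i < len(sites) - 1 else None)
             if t is not None]
         for i, s in enumerate(sites)}

nearest, steps = iterative_improvement(lists, sites[0], (8.7,))
print(nearest, steps)
```

Starting at the far end, the walk passes through every site before stopping, so even valid lists give no guarantee of fast progress in the exact setting.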

  23. Not Useful for Exact NN
      Reason 1: space complexity $\Omega(n^2)$
      • For all $s$, $L_s$ has to include all Delaunay neighbors of $s$
      • For $d > 2$, the Delaunay triangulation may have $\Omega(n^2)$ edges
      • Proof: let $t$ be a Delaunay neighbor of $s$ with $t \notin L_s$; for the query point $q$ shown in the figure, $t$ is the only site closer to $q$ than $s$
      [Figure: sites $s$ and $t$, query point $q$, point $c$]
      Reason 2: query time $\Omega(n)$
      • No "sufficient progress" guarantee, may have to visit all sites
      [Figure: query point $q$ and sites $s_1, s_2, s_3, s_4, s_5$]
      Conclusion
      • No improvement over the trivial algorithm!

  25. Modification for ANN
      Data structure
      • For each site $s$, a (small) list $L_s$ of other sites such that for any query point $q$: if $s$ is not a $(1+\epsilon)$-ANN of $q$, then $L_s$ contains a site $(1+\epsilon/2)$-closer to $q$
      [Figure: query point $q$, points $s$, $t$, $b$, and distances $\|q-s\|/(1+\epsilon)$, $\|q-s\|/(1+\epsilon/2)$, $\|q-s\|$]
      Algorithm (simple version)
          s ← arbitrary site
          while ∃ t ∈ L_s : ||q − t|| ≤ ||q − s|| / (1 + ε/2) do
              s ← t
          return s
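The loop condition forces geometric progress, which is what the exact version lacked. A short derivation makes this explicit. The notation is introduced here and is not from the slides: $s_0$ is the starting site, $s_k$ the site after $k$ iterations, and $d^*$ the distance from $q$ to its true nearest neighbor, assumed positive.

```latex
\begin{align*}
\|q - s_k\| &\le \frac{\|q - s_{k-1}\|}{1+\epsilon/2}
             \le \dots
             \le \frac{\|q - s_0\|}{(1+\epsilon/2)^{k}}, \\
d^* \le \|q - s_k\| &\implies k \le \log_{1+\epsilon/2} \frac{\|q - s_0\|}{d^*}.
\end{align*}
```

So the simple version stops after a number of steps that depends on the ratio between the starting distance and the nearest-neighbor distance, rather than on the number of sites.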

  29. Query Algorithm
      Skip list approach [Arya and Mount 1993]
      [Figure: levels $R_3$, $R_2$, $R_1$, $R_0 = S$]
