Approximate Voronoi Diagrams: Techniques, tools, and applications to k-th ANN search Nirman Kumar University of California, Santa Barbara January 13th, 2016
Similarity Search? Need similarity search to make sense of the world!
When an appropriate metric is defined, similarity search reduces to NN search
Nearest neighbor search Given a set of points P: for a query q, quickly find the closest point to q in P
Nearest neighbor search Also important in other domains
Approximate nearest neighbor search (ANN) Find any point x with d(q, x) ≤ (1 + ε) d_1(q, P)
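To make the guarantee concrete, here is a minimal brute-force sketch (function names made up for illustration) of the condition a (1 + ε)-ANN answer must satisfy; the data structures in this talk exist precisely to avoid this linear scan.

```python
import math

def exact_nn_dist(q, P):
    """Exact nearest-neighbor distance d_1(q, P) by a linear scan."""
    return min(math.dist(q, p) for p in P)

def is_valid_ann(q, P, x, eps):
    """Check that x satisfies the (1 + eps)-ANN guarantee above."""
    return math.dist(q, x) <= (1.0 + eps) * exact_nn_dist(q, P)
```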
Space partitioning Most data structures for NN (or ANN) search partition space
Space partitioning In low dimensions this is an explicit partitioning
Space partitioning In high dimensions the partitioning is implicit (via hash functions)
Voronoi diagrams
Voronoi diagrams Very efficient in dimensions d ≤ 2
Voronoi diagrams Performance degrades sharply - bad even for d = 3
This talk ◮ Construction of Approximate Voronoi Diagrams ◮ Tools used - Quadtrees, WSPD ◮ Construction of AVD for k-th ANN ◮ Some open problems
Approximate Voronoi Diagrams (AVD) A space partition as before
Approximate Voronoi Diagrams (AVD) Each region has one associated representative (a point of P)
Approximate Voronoi Diagrams (AVD) This representative is a valid ANN for any query q in the region
Main ideas behind ANN search and AVDs ◮ If the query point is “far”, any point is a good ANN ◮ A region can be approximated well by cubes ◮ Point location can be done in a set of cubes efficiently
Tool 1: Quadtrees A quadtree, intuitively: a recursive subdivision of [0, 1] × [0, 1] into quadrants
Tool 1: Quadtrees A quadtree on a point set (figure: points labeled a through i and the corresponding quadtree)
Tool 1: Quadtrees The compressed version (figure: the same points with the compressed quadtree)
Tool 1: Quadtrees Point Location ≡ find leaf node containing a point
Tool 1: Quadtrees Height h: point location in O(log h) time; O(log log n) for a balanced tree!
Tool 1: Quadtrees But the height is not bounded as a function of n
Tool 1: Quadtrees Use a compressed quadtree - height bounded by O(n)
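As a rough illustration of the descent, here is a minimal uncompressed quadtree sketch over the unit square, with one point per leaf and all names illustrative (distinct points assumed; build with a root QuadNode(0.0, 0.0, 1.0) and repeated insert calls). A compressed quadtree contracts long chains of lone children, which is what bounds its size and height by O(n).

```python
class QuadNode:
    def __init__(self, x, y, size):
        self.x, self.y, self.size = x, y, size   # lower-left corner, side length
        self.children = None                     # four children, or None at a leaf
        self.point = None                        # at most one point per leaf

def _child(node, p):
    """Return (creating if needed) the child quadrant of `node` containing p."""
    half = node.size / 2.0
    i = (1 if p[0] >= node.x + half else 0) + (2 if p[1] >= node.y + half else 0)
    if node.children is None:
        node.children = [None] * 4
    if node.children[i] is None:
        dx, dy = (i & 1) * half, (i >> 1) * half
        node.children[i] = QuadNode(node.x + dx, node.y + dy, half)
    return node.children[i]

def insert(node, p):
    """Insert p, splitting leaves until points end up in distinct cells."""
    if node.children is None and node.point is None:
        node.point = p
        return
    if node.point is not None:                   # push the old point down first
        old, node.point = node.point, None
        insert(_child(node, old), old)
    insert(_child(node, p), p)

def locate(node, q):
    """Point location: walk down to the leaf cell containing q."""
    while node.children is not None:
        half = node.size / 2.0
        i = (1 if q[0] >= node.x + half else 0) + (2 if q[1] >= node.y + half else 0)
        if node.children[i] is None:
            break                                # empty quadrant: q lands here
        node = node.children[i]
    return node
```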
Tool 2: Well separated pairs decomposition How many pairwise distances among n points? Ω(n^2)
Tool 2: Well separated pairs decomposition What if distances within a factor of (1 ± ε) are considered the same?
Tool 2: Well separated pairs decomposition About O(n/ε^d) distinct distances up to a factor of (1 ± ε)
Tool 2: Well separated pairs decomposition ◮ How can we represent them? ◮ Given a pair of points, which bucket does it belong to?
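A tiny sketch of the bucketing intuition behind these questions (the function name is made up): group distances by powers of (1 + ε), so two distances within a factor of (1 + ε) land in the same or an adjacent bucket.

```python
import math

def distance_bucket(dist, eps):
    """Index of the multiplicative (1 + eps) bucket that a positive distance
    falls into; only meant to illustrate the counting intuition."""
    return math.floor(math.log(dist, 1.0 + eps))
```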
Tool 2: Well separated pairs decomposition The WSPD data structure captures this
Tool 2: Well separated pairs decomposition More formally: ◮ A collection of pairs A_i, B_i ⊂ P ◮ A_i ∩ B_i = ∅ ◮ Every pair of points is separated by some (A_i, B_i) ◮ Each pair (A_i, B_i) is well separated
Tool 2: Well separated pairs decomposition A well separated pair is a dumbbell (figure): two balls of radii r_1 and r_2 whose distance ℓ satisfies ℓ ≥ (1/ε) max{r_1, r_2}
Tool 2: Well separated pairs decomposition WSPD example (figure: points a through f) A_1 = {a, b, c}, B_1 = {e}; A_2 = {a}, B_2 = {b, c}; ...
Tool 2: Well separated pairs decomposition Main result about WSPDs There is an ε^{-1}-WSPD of size O(n ε^{-d}) - It can be constructed in O(n log n + n ε^{-d}) time
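For intuition only, here is a naive brute-force sketch of the standard recursion behind WSPD constructions: split whichever side has the larger enclosing radius until the two sides are 1/ε-separated. The efficient constructions behind the bound above run the same recursion on a compressed quadtree; names and constants here are illustrative, and points are assumed distinct.

```python
import math

def _bbox(S):
    d = len(S[0])
    lo = [min(p[i] for p in S) for i in range(d)]
    hi = [max(p[i] for p in S) for i in range(d)]
    return lo, hi

def _center(S):
    lo, hi = _bbox(S)
    return [(l + h) / 2.0 for l, h in zip(lo, hi)]

def _radius(S):
    lo, hi = _bbox(S)
    return math.dist(lo, hi) / 2.0          # half the bounding-box diagonal

def _well_separated(A, B, eps):
    gap = math.dist(_center(A), _center(B)) - _radius(A) - _radius(B)
    return gap >= max(_radius(A), _radius(B)) / eps

def _split(S):
    """Split S at the median along its longest bounding-box dimension."""
    lo, hi = _bbox(S)
    dim = max(range(len(lo)), key=lambda i: hi[i] - lo[i])
    S = sorted(S, key=lambda p: p[dim])
    return S[: len(S) // 2], S[len(S) // 2:]

def wspd(A, B, eps):
    """All pairs of points between A and B, grouped into well separated pairs.
    Call as wspd(P, P, eps) on a list of distinct points (tuples)."""
    if not A or not B or (A == B and len(A) == 1):
        return []
    if _well_separated(A, B, eps):
        return [(A, B)]
    if _radius(A) < _radius(B):
        A, B = B, A
    A1, A2 = _split(A)
    return wspd(A1, B, eps) + wspd(A2, B, eps)
```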
AVD results The main result ◮ O(n/ε^d) cells ◮ Query time - O(log(n/ε))
The AVD algorithm Construct an 8-WSPD for the point set
The AVD algorithm Let (A_i, B_i), for i = 1, ..., m, be the pairs
The AVD algorithm For each pair, do some processing and output some cells
The AVD algorithm Preprocess all the output cells for point location
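The query side is then just point location plus a table lookup; a minimal sketch, with avd.locate standing in as a placeholder for the point-location structure over the cells:

```python
def avd_query(avd, q):
    """Answer a (1 + eps)-ANN query from an AVD: point-locate q among the
    cells and return the stored representative.  Both attribute names here
    are placeholders, not an actual API."""
    cell = avd.locate(q)          # O(log(n / eps)) point location
    return cell.representative    # a valid (1 + eps)-ANN for any q in the cell
```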
The AVD algorithm So what is the processing per pair?
The AVD algorithm Consider a WSPD dumbbell
The AVD algorithm Concentric balls of increasing radii, from r/4 up to ≈ r/ε
The AVD algorithm Tile each ball (of radius x) by cubes of side length ≈ εx
The AVD algorithm Store an (ε/c)-ANN of some point in each cell
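A rough sketch of this per-pair cell generation, with illustrative constants rather than the exact ones from the analysis; ann_of is assumed to be a routine returning an (ε/c)-ANN among the input points.

```python
import itertools
import math

def cells_for_pair(center, r, eps, ann_of):
    """Yield (cube_corner, side, representative) triples for one WSPD pair.
    `center` is a point of the dumbbell and r its scale; `ann_of(p)` is an
    assumed helper returning an (eps/c)-ANN of p among the input points."""
    d = len(center)
    x = r / 4.0
    while x <= r / eps:                        # concentric balls: r/4, r/2, ..., ~r/eps
        side = eps * x                         # cube side length for this ball
        steps = math.ceil(x / side)            # cubes per axis on each side of center
        for offs in itertools.product(range(-steps, steps), repeat=d):
            corner = tuple(center[i] + offs[i] * side for i in range(d))
            cube_center = tuple(c + side / 2.0 for c in corner)
            yield corner, side, ann_of(cube_center)
        x *= 2.0
```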
So why does it work? Every pair of competing points is resolved
So why does it work? p_1, p_2 are resolved by the WSPD pair separating them
So why does it work? (figures: a query point q in several positions relative to the competing points p_1 and p_2)
Bounding the AVD complexity The method shown gives O((n/ε^d) log(1/ε)) cubes
Bounding the AVD complexity This can be improved to O(n/ε^d)
k-th ANN search Given q, output a point u ∈ P such that: (1 − ε) d_k(q, P) ≤ d(q, u) ≤ (1 + ε) d_k(q, P)
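A brute-force reference sketch of this definition (illustrative names), useful for checking a data structure's answers on small inputs:

```python
import math

def kth_nn_dist(q, P, k):
    """Exact k-th nearest-neighbor distance d_k(q, P) by sorting (brute force)."""
    return sorted(math.dist(q, p) for p in P)[k - 1]

def is_valid_kth_ann(q, P, k, u, eps):
    """Check the k-th ANN guarantee above for a returned point u."""
    dk = kth_nn_dist(q, P, k)
    return (1.0 - eps) * dk <= math.dist(q, u) <= (1.0 + eps) * dk
```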
Applications of k-th ANN search ◮ Density estimation ◮ Functions of the form F(q) = ∑_{i=1}^{k} f(d_i(q, P)) ◮ k-th ANN on balls
Applications of k-th ANN search Density estimation: density ≈ #points / area
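A minimal sketch of the planar k-NN density estimate this suggests (assuming Euclidean distance and a brute-force d_k computation; the function name is made up):

```python
import math

def knn_density_estimate_2d(q, P, k):
    """k-NN density estimate in the plane: about k points lie in the disk of
    radius d_k(q, P) around q, so density ~ #points / area = k / (pi * d_k^2)."""
    dk = sorted(math.dist(q, p) for p in P)[k - 1]   # exact d_k(q, P), brute force
    return k / (math.pi * dk * dk)
```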
The result AVD for k th ANN O (( n/k ) ε − d log 1 /ε ) cells ◮ ◮ Query time - O (log( n/ ( kε )))
Quorum clustering
Quorum clustering Find smallest ball containing k points
Quorum clustering Remove points and repeat
Quorum clustering A way to summarize points
Quorum clustering Has properties favorable for the k-th ANN problem
Quorum clustering Exact quorum clustering is too expensive to compute
Quorum clustering An approximate quorum clustering can be computed instead
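To make the definition concrete, here is a naive quadratic-time sketch of one simple approximate variant: repeatedly center a ball at the remaining point whose k-th nearest remaining neighbor is closest (within a factor 2 of the smallest ball containing k points), record it, and delete the k covered points. This is not the near-linear algorithms cited on the next slide, and it assumes distinct points.

```python
import math

def naive_quorum_clusters(points, k):
    """Greedy approximate quorum clustering by brute force (illustration only)."""
    remaining = list(points)
    clusters = []                                  # list of (center, radius)
    while len(remaining) >= k:
        best = None
        for p in remaining:
            dists = sorted((math.dist(p, q), q) for q in remaining)
            r = dists[k - 1][0]                    # k-th NN distance, counting p itself
            if best is None or r < best[1]:
                best = (p, r, [q for _, q in dists[:k]])
        center, radius, members = best
        clusters.append((center, radius))
        remaining = [q for q in remaining if q not in members]
    return clusters
```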
Quorum clustering ◮ Computed in O(n log^d n) time in ℝ^d [Carmi, Dolev, Har-Peled, Katz and Segal, 2005] ◮ Computed in O(n log n) time in ℝ^d [Har-Peled and K., 2012]
Why is quorum clustering useful (figure: query q and quorum balls with centers c_1, c_2, c_3 and radii r_1, r_2, r_3) ◮ x = d_k(q, P) ◮ r_1 ≤ x ◮ x + r_1 ≥ d(q, c_1) ⇒ d(q, c_1) ≤ 2x ◮ x ≤ d(q, c_1) + r_1 ≤ 3x
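A tiny sketch of the estimate these inequalities justify, assuming the coarse answer is taken as the best quorum ball for q (an assumption for illustration, not necessarily the exact rule used in the construction):

```python
import math

def coarse_kth_nn_estimate(q, quorum_balls):
    """A constant-factor estimate of x = d_k(q, P) from the quorum balls:
    every ball b(c_i, r_i) contains k points, so x <= d(q, c_i) + r_i, and by
    the inequalities above a nearby ball keeps the sum within ~3x."""
    return min(math.dist(q, c) + r for c, r in quorum_balls)
```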
Refining the approximation Just as for AVDs, generate a list of cells
Refining the approximation For the closest ball, use an ANN data structure in ℝ^{d+1}
Refining the approximation A ball b = b(c, r) is mapped to the point (c, r) ∈ ℝ^{d+1}
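A small sketch of the lifting and of the ball distance it is meant to support; taking the distance from q to a ball as the set distance max(0, d(q, c) − r) is an assumption here for illustration, not the paper's exact reduction.

```python
import math

def lift(ball):
    """Map the ball b(c, r) to the point (c, r) in R^{d+1}, as on the slide."""
    c, r = ball
    return tuple(c) + (r,)

def dist_to_ball(q, ball):
    """Distance from q to the ball b(c, r) viewed as a set: max(0, d(q, c) - r)."""
    c, r = ball
    return max(0.0, math.dist(q, c) - r)

def closest_ball_brute_force(q, balls):
    """Brute-force reference answer for the (approximate) closest-ball query
    that the ANN structure over the lifted points is meant to support."""
    return min(balls, key=lambda b: dist_to_ball(q, b))
```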
Refining the approximation Some cells are generated by an AVD construction for the ball centers
Refining the approximation Store some info with each cell
Refining the approximation A k-th ANN, and an approximate closest ball
Open problems ◮ In high dimensions, is there a data structure for k-th NN whose space requirement is f(n/k)? ◮ There is an AVD for weighted ANN, similar to the AVD shown here - is there an extension to weighted k-th ANN?