Sublinear-time approximation of the cost of a metric k-nearest neighbor graph — Artur Czumaj, DIMAP (Centre for Discrete Mathematics and its Applications), Department of Computer Science, University of Warwick. Joint work with Christian Sohler.


SLIDE 1

Sublinear-time approximation of the cost of a metric k-nearest neighbor graph

Artur Czumaj

DIMAP (Centre for Discrete Mathematics and its Applications)
Department of Computer Science
University of Warwick

Joint work with Christian Sohler

SLIDE 2

k-nearest neighbor graph

The k-nearest neighbor (k-NN) graph is a basic data structure with applications in

– spectral clustering (and unsupervised learning)
– non-linear dimensionality reduction
– density estimation
– and many more…

Computing a k-NN graph takes Ω(n²) time in the worst case

– many data structures for approximate nearest neighbor queries are known for specific distance measures
– heuristics

SLIDE 3

k-NN graph

A k-NN graph for a set of objects with a distance measure is a directed graph such that

  • every vertex corresponds to exactly one object
  • every vertex has exactly k outgoing edges
  • the outgoing edges point to its k closest neighbors

Every point u connects to its k nearest neighbors
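For reference, the definition can be turned into a brute-force construction directly. This is the Ξ©(nΒ²)-time baseline the talk is trying to beat; `math.dist` is just a stand-in distance measure:

```python
import math

def knn_graph(points, k, dist=None):
    """Brute-force k-NN graph: for each point, directed edges to its k
    closest other points. Uses O(n^2) distance evaluations, matching the
    worst-case bound mentioned on the slide."""
    if dist is None:
        dist = math.dist  # Euclidean distance by default
    n = len(points)
    graph = {}
    for i in range(n):
        # sort all other points by distance from points[i], keep the k closest
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: dist(points[i], points[j]))
        graph[i] = others[:k]
    return graph

pts = [(0, 0), (1, 0), (0, 1), (5, 5), (5, 6)]
g = knn_graph(pts, k=2)  # two well-separated clusters
```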

SLIDE 4

Access to the input: Distance oracle model

Distance Oracle Model for Metric Spaces:

  • Input: n-point metric space (X, d)

– we will often view the metric space as a complete edge-weighted graph H = (V, E) with V = {1, …, n} and weights satisfying the triangle inequality

Query: return the distance between any u and v
Running time ~ number of queries
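A minimal sketch of the model (class and attribute names are mine): the algorithm may only ask for pairwise distances, and its cost is the number of such queries:

```python
class DistanceOracle:
    """Oracle access to an n-point metric space: the algorithm can only
    ask for d(u, v); its complexity is measured by the query count."""
    def __init__(self, dist_matrix):
        self._d = dist_matrix   # symmetric matrix satisfying the triangle inequality
        self.queries = 0
    def __len__(self):
        return len(self._d)
    def query(self, u, v):
        self.queries += 1
        return self._d[u][v]

# a toy 3-point metric on a path: d(0,1) = d(1,2) = 1, d(0,2) = 2
D = [[0, 1, 2],
     [1, 0, 1],
     [2, 1, 0]]
oracle = DistanceOracle(D)
_ = oracle.query(0, 2)
```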

SLIDE 5

k-NN graph cost approximation

Problem: Given oracle access to an n-point metric space, compute a (1 + Ρ)-approximation to the cost (denoted by cost(X))

  • of the k-NN graph

cost(X) denotes the sum of the edge weights

SLIDE 6

(Simple) lower bounds

Observation: Computing a k-NN graph requires Ω(n²) queries
Proof: Consider a (1,2)-metric with all distances 2 except for a single random pair with distance 1

  • we must find this edge to compute the k-NN graph
  • finding a random edge requires Ω(n²) time

SLIDE 7

(Simple) lower bounds

Observation: Finding a (2 − Ρ)-approx. k-NN graph requires Ω(n²/log n) queries
Proof:

  • Take a random (1,2)-metric where each distance is chosen independently at random to be 1 with probability Θ(k log n / n)
  • Whp. every vertex has β‰₯ k neighbors at distance 1
  • A (2 βˆ’ Ξ΅)-approximate k-NN graph contains Ξ©(nk) of the 1s
SLIDE 8

(Simple) lower bounds

Observation: Approximating the cost of 1-NN requires Ω(n²) queries
Proof:

  • Via the lower bound for perfect matching [Badoiu, Czumaj, Indyk, Sohler, ICALP’05]
  • Pick a random perfect matching and set all matching distances to 0 and all other distances to 1
  • Distinguish this from an instance where one matching edge has value 1
  • Finding this edge requires Ω(n²) queries
SLIDE 9

(Simple) lower bounds

  • Pick a random perfect matching and set all matching distances to 0 and all other distances to 1
  • Distinguish this from an instance where one matching edge has value 1
  • Finding this edge requires Ω(n²) queries

All matching edges except possibly a single one have weight 0; one random edge may have weight 1; all non-matching distances are 1

1-NN cost is either 0 or 1, depending on a single edge

SLIDE 10

(Simple) lower bounds

Observation: Approximating the cost of 1-NN requires Ω(n²) queries
Proof:

  • Via the lower bound for perfect matching [Badoiu, Czumaj, Indyk, Sohler, ICALP’05]
  • Pick a random perfect matching and set all matching distances to 0 and all other distances to 1
  • Distinguish this from an instance where one matching edge has value 1
  • Finding this edge requires Ω(n²) queries
SLIDE 11

Approximating cost(X)

The β€œsimple lower bounds” show that

  • finding a low-cost k-NN graph is hopeless
  • estimating cost(X) is hopeless

– at least for small k

Can we do anything?

SLIDE 12

Approximating cost(X)

The β€œsimple lower bounds” show that

  • finding a low-cost k-NN graph is hopeless
  • estimating cost(X) is hopeless

– at least for small k

A similar situation has been known for some other problems: MST, degree estimation, etc.
Chazelle et al.: MST cost in graphs can be (1 + Ρ)-approximated in poly(dW/Ρ) time (d = max degree, W = max weight)

  • Czumaj, Sohler: MST cost in the metric case can be approximated in Γ•(n) time

SLIDE 13

Approximating cost(X): New results

Theorem 1: A (1 + Ρ)-approximation of cost(X) with Õ(n²/(kΡ²)) queries
Theorem 2: Ω(n²/k) queries are necessary to approximate cost(X) within any constant factor

SLIDE 14

Approximating cost(X): New results

Theorem 1: A (1 + Ρ)-approximation of cost(X) with Õ(n²/(kΡ²)) queries
Theorem 2: Ω(n²/k) queries are necessary to approximate cost(X) within any constant factor

This is very bad for small k, and very good for large k. What can we do for small k?

SLIDE 15

Approximating cost(X): New results

The lower bound for small k holds when the instances are very β€œspread-out” and β€œdisjoint”
Can we get a faster algorithm when we allow the approximation guarantee to depend on the MST cost?

SLIDE 16

Approximating cost(X): New results

Theorem 3: With Õ_Ρ(nk^{3/2}) queries one can approximate cost(X) with error Ρ · (cost(X) + MST(X))
Corollary 4: With Õ_Ρ(min{nk^{3/2}, n²/k}) queries one can approximate cost(X) with error Ρ · (cost(X) + MST(X))
Theorem 5: Any algorithm that approximates cost(X) with error Ρ · (cost(X) + MST(X)) requires Ω(min{nk^{3/2}/Ρ, n²/k}) queries

SLIDE 17

Approximating cost(X): New results

We have tight bounds for the estimation of cost(X)
When we want a (1 + Ρ)-approximation:

  • Θ̃_Ρ(n²/k) queries are sufficient and necessary

When happy with an Ρ(cost(X) + MST(X)) additive error:

  • Θ̃_Ρ(min{nk^{3/2}, n²/k}) queries are sufficient and necessary
  • it’s sublinear for every k, always at most Õ_Ρ(n^{8/5})

SLIDE 18

Approximating cost(X): New results

We have tight bounds for the estimation of cost(X)
When we want a (1 + Ρ)-approximation:

  • Θ̃_Ρ(n²/k) queries are sufficient and necessary

When happy with an Ρ(cost(X) + MST(X)) additive error:

  • Θ̃_Ρ(min{nk^{3/2}, n²/k}) queries are sufficient and necessary

Techniques:

  • Upper bounds: clever random sampling
  • Lower bounds: analysis of some clustering inputs (the more complex part)
SLIDE 19

Approximating cost(X): New results

We have tight bounds for the estimation of cost(X)
When we want a (1 + Ρ)-approximation:

  • Θ̃_Ρ(n²/k) queries are sufficient and necessary

Efficient for large k
Relies on random sampling of close neighbors and far neighbors in the k-NN graph

SLIDE 20

Upper bound for (1 + Ρ)-approximation

Two β€œhard” instances

  • A cluster of n βˆ’ 1 points and a single point far away
  • A cluster of n βˆ’ (k + 1) points, and k + 1 points far away and close to each other

Our algorithm must be able to distinguish between these two instances

SLIDE 21

Upper bound for (1 + Ρ)-approximation

Each point u approximates the distance m_u to its (k/2)-th nearest neighbor

  • Γ•(n/k) queries per point; Γ•(nΒ²/k) queries in total

Short edges in the k-NN graph: (u, v) s.t. d(u, v) ≀ 10·m_u
Long edges in the k-NN graph: all other edges
We separately estimate the total length of the short edges and the total length of the long edges

SLIDE 22

Upper bound for (1 + Ρ)-approximation

Short edges in the k-NN graph: (u, v) s.t. d(u, v) ≀ 10·m_u
Long edges in the k-NN graph: all other edges

  • We separately estimate the total length of the short edges and the total length of the long edges by random sampling methods

Summing the estimates for the short and the long edges gives a (1 + Ρ)-approximation of cost(X)

SLIDE 23

Lower bound for (1 + Ρ)-approximation

Consider two problem instances:

– intra-cluster distance ~ 0; inter-cluster distance ~ 1

  • n/(k+1) clusters of size k + 1 each

– cost(X) ~ 0

  • n/(k+1) βˆ’ 1 clusters of size k + 1 each; one cluster of size k, one cluster of size 1

– cost(Y) ≫ 0

We prove that one requires Ω(n²/k) queries to distinguish between these two problem instances
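For small n and k the two instances are easy to materialize and compare by brute force; a sketch (the inner distance value 0.001 standing in for β€œ~0” and the helper names are mine):

```python
def knn_cost(D, k):
    """Brute-force cost of the k-NN graph of a metric given as a matrix:
    for each vertex, sum the k smallest distances to other vertices."""
    n = len(D)
    return sum(sum(sorted(D[v][w] for w in range(n) if w != v)[:k])
               for v in range(n))

def clustered_metric(sizes, inner=0.001, outer=1.0):
    """Metric with the given cluster sizes: distance `inner` inside a
    cluster, `outer` across clusters (a valid metric when inner <= outer)."""
    labels = [c for c, s in enumerate(sizes) for _ in range(s)]
    n = len(labels)
    return [[0.0 if v == w else (inner if labels[v] == labels[w] else outer)
             for w in range(n)] for v in range(n)]

k = 2
X = clustered_metric([k + 1] * 4)           # n/(k+1) clusters of size k+1
Y = clustered_metric([k + 1] * 3 + [k, 1])  # one cluster split into k and 1
```

In X every vertex finds k neighbors at distance ~0, so cost(X) ~ 0; in Y the split cluster forces ~k edges of weight ~1 from the singleton and one such edge from each point of the size-k cluster, so cost(Y) ≫ 0.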

SLIDE 24

Lower bound for (1 + Ρ)-approximation

clusters of size k + 1 each
cost(X) ~ 0

SLIDE 25

Lower bound for (1 + Ρ)-approximation

clusters of size k + 1 each; cost(X) ~ 0
In the other instance: remove a random point from its cluster; cost(Y) ≫ 0

SLIDE 26

Lower bound for (1 + Ρ)-approximation

clusters of size k + 1 each; cost(X) ~ 0
In the other instance: remove a random point from its cluster; cost(Y) ≫ 0
To find that single point one needs Ξ©(nΒ²/k) queries

  • O(n) samples to hit it first
  • O(n/k) further samples to detect that it has no close neighbors
SLIDE 27

Lower bound for (1 + Ρ)-approximation

Consider two problem instances:

– intra-cluster distance ~ 0; inter-cluster distance ~ 1

  • n/(k+1) clusters of size k + 1 each

– cost(X) ~ 0

  • n/(k+1) βˆ’ 1 clusters of size k + 1 each; one cluster of size k, one cluster of size 1

– cost(Y) ≫ 0

We prove that one requires Ω(n²/k) queries to distinguish between these two problem instances

SLIDE 28

Approximating with error Ρ(cost(X) + MST(X))

Simplifying assumptions:

  • All distances are of the form (1 + Ξ΅)^i
  • All distances are between 1 and poly(n)
SLIDE 29

The cost of a k-NN graph & threshold graphs

Threshold graphs [Chazelle, Rubinfeld, Trevisan, SICOMP 2005; Czumaj, Sohler, SICOMP 2009]

Let G be the k-NN graph of an n-point metric space

  • G^(i) = (V, E^(i)) contains the edges of G of weight ≀ (1 + Ξ΅)^i
  • deg^(i)(u) = outdegree of u in G^(i)

Simple formula for the k-NN cost:

cost(X) = n·k + Ρ · Σ_i (1 + Ρ)^i · Σ_u (k − deg^(i)(u))
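Under the earlier simplifying assumption that every edge weight is an exact power (1 + Ξ΅)^j, the formula can be checked numerically: an edge of weight (1 + Ξ΅)^j is missing from the threshold graphs G^(0), …, G^(jβˆ’1), contributing Ρ·Σ_{i<j} (1 + Ξ΅)^i = (1 + Ξ΅)^j βˆ’ 1, and the nΒ·k term supplies the remaining 1. A sketch:

```python
import random

def cost_via_thresholds(edge_weights, k, eps):
    """Evaluate n*k + eps * sum_i (1+eps)^i * sum_u (k - deg^(i)(u)),
    where deg^(i)(u) counts u's k-NN edges of weight <= (1+eps)^i.
    edge_weights[u] lists the weights of u's k outgoing k-NN edges,
    each assumed to be an exact power of (1+eps)."""
    n = len(edge_weights)
    total = n * k
    i = 0
    while True:
        thr = (1 + eps) ** i
        # total "degree deficit" at threshold level i
        deficit = sum(k - sum(1 for w in ws if w <= thr * (1 + 1e-12))
                      for ws in edge_weights)
        if deficit == 0:   # every edge already within the threshold
            return total
        total += eps * thr * deficit
        i += 1

# random instance: n vertices with k edges each, weights (1+eps)^j, j in [0, 5]
rng = random.Random(7)
eps, k, n = 0.5, 3, 8
weights = [[(1 + eps) ** rng.randrange(6) for _ in range(k)] for _ in range(n)]
direct = sum(sum(ws) for ws in weights)   # cost(X) = sum of k-NN edge weights
```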

SLIDE 30

The cost of a k-NN graph

cost(X) = n·k + Ρ · Σ_i (1 + Ρ)^i · Σ_u (k − deg^(i)(u))

This implies that it suffices to estimate Σ_u (k − deg^(i)(u))

SLIDE 31

The cost of a k-NN graph

cost(X) = n·k + Ρ · Σ_i (1 + Ρ)^i · Σ_u (k − deg^(i)(u))

This implies that it suffices to estimate Σ_u (k − deg^(i)(u))

  • deg^(i)(u) is given only implicitly; we would need to query the distances to all neighbors to compute deg^(i)(u) exactly

SLIDE 32

The cost of a k-NN graph

cost(X) = n·k + Ρ · Σ_i (1 + Ρ)^i · Σ_u (k − deg^(i)(u))

This implies that it suffices to estimate Σ_u (k − deg^(i)(u))
What about simple random sampling?

sample s points and, for each, sample r of their neighbors to estimate k − deg^(i)(u)

Won’t work well:
– Cluster with n βˆ’ 1 points and a single outlier
– We need to find the single outlier
– The sample size must be Ξ©(n)

SLIDE 33

The cost of a k-NN graph

cost(X) = n·k + Ρ · Σ_i (1 + Ρ)^i · Σ_u (k − deg^(i)(u))

This implies that it suffices to estimate Σ_u (k − deg^(i)(u))
Idea (inspired by the MST approximation due to Czumaj & Sohler, 2009):

  • If u has many (≫ k) neighbors at distance ≀ (1 + Ξ΅)^i, then u doesn’t contribute to our cost function

– random sampling can detect this

  • Otherwise, we can afford slower random sampling
SLIDE 34

Approximating the number of vertices of a given degree

For a given point u, sample vertices uniformly at random

  • If the number of sampled vertices within distance (1 + Ξ΅)^i of u is sufficiently large, then u is non-useful
  • Otherwise, double the sample size and repeat until it reaches s = O(n log n / k)

– then return u as useful

  • If u has t points at distance ≀ (1 + Ξ΅)^i, then w.h.p., in expected time O(n log n / (k + t)):

– if t β‰₯ 4k then we mark u as non-useful
– if t ≀ k then we mark u as useful
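A sketch of this doubling procedure (the constants and the useful/non-useful decision thresholds are illustrative guesses, not the paper's exact values):

```python
import math, random

def classify_vertex(oracle, v, n, k, radius, rng=random):
    """Doubling search: repeatedly double the sample size; if the
    sampled fraction of points within `radius` of v suggests many
    (>> k) close neighbors, report 'non-useful'. If the cap of
    O(n log n / k) samples is reached without such evidence, report
    'useful'. Constants are illustrative only."""
    cap = math.ceil(8 * n * math.log(max(n, 2)) / k)
    s = 16
    while True:
        s = min(s, cap)
        close = 0
        for _ in range(s):
            w = rng.randrange(n)
            if w != v and oracle(v, w) <= radius:
                close += 1
        # with t points within `radius`, E[close] ~ s*t/n; a count well
        # above s*k/n (and above a small constant) is evidence that t >> k
        if close >= 2 * s * k / n and close >= 8:
            return "non-useful"
        if s >= cap:
            return "useful"
        s *= 2
```

A vertex with many close neighbors is detected after few samples (the expected O(n log n / (k + t)) behavior from the slide), while an isolated vertex runs through all the doubling rounds up to the cap.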

SLIDE 35

Approximating the number of vertices of a given degree

Using this approach, we can take a sample of size O(n(1 + Ρ)^i / (Ρ^O(1) · MST(X))) to achieve the desired error
The expected running time of an evaluation on a randomly chosen vertex is O(k log n · MST(X) / (1 + Ρ)^i)

Theorem: We can approximate the cost of a k-NN graph in time O(nk²/Ρ^O(1)) with an error of Ρ(cost(X) + MST(X))

  • the kΒ² can be improved to k^{3/2}
SLIDE 36

Lower bound

clusters of size k + 1 each
cost(X) ~ 0, MST(X) ~ n/k

SLIDE 37

Lower bound

clusters of size k + 1 each; cost(X) ~ 0, MST(X) ~ n/k
In the other instance: in some clusters, move k points to some other cluster

SLIDE 38

Lower bound

clusters of size k + 1 each; cost(X) ~ 0, MST(X) ~ n/k
cost(Y) ≫ 0, MST(Y) ~ n/k
In the other instance: in some clusters, move k points to some other cluster

SLIDE 39

Lower bound

clusters of size k + 1 each; cost(X) ~ 0, MST(X) ~ n/k
cost(Y) ≫ 0, MST(Y) ~ n/k
In the other instance: in some clusters, move k points to some other cluster
To distinguish between the instances one needs Ω_Ρ(nk^{3/2}) queries (for small k); for large k it is still Ω(n²/k) queries

SLIDE 40

Approximating cost(X): Summary of results

We have tight bounds for the estimation of cost(X)
When we want a (1 + Ρ)-approximation:

  • Θ̃_Ρ(n²/k) queries are sufficient and necessary

When happy with an Ρ(cost(X) + MST(X)) additive error:

  • Θ̃_Ρ(min{nk^{3/2}, n²/k}) queries are sufficient and necessary

SLIDE 41

Open problems

Sublinear algorithms in metric spaces:

  • more problems approximated in sublinear time
  • if we add some other parameter to the error bound (like the MST cost), can we approximate more problems in sublinear time?

SLIDE 42

THANK YOU!