
Reverse kNN Search in Arbitrary Dimensionality

Seyed Jalal Kazemitabar. Original paper by Y. Tao, D. Papadias, and X. Lian. 9/28/2009.


  1. 9/28/2009
     InfoLab.usc.edu, Geospatial Information Management (Fall 2009)

     Nearest Neighbor Queries
     • What are the two nearest stars to Andromeda?
     • Where is the nearest restaurant?
     • Where is the nearest…

     Algorithms for finding NN
     • Elementary methods: an indexing data structure (R-tree, R*-tree) combined with a search algorithm (BF, DFS) gives the NN solution.
     • More advanced methods: branch & bound methods (Mindist, Maxdist, Minmaxdist) combined with the indexing data structure and the search algorithm give the NN solution.

     Reverse Nearest Neighbors Queries
     • What are the fireplaces I'm nearest to?
     • Which houses am I the closest restaurant to?

     RNN Definition
     • A data point p is the reverse nearest neighbor of query point q if there is no point p' such that dist(p', p) < dist(q, p), i.e. q is the NN of p.
     • [Figure: data points p1..p5 and query q, with vicinity circles.] NN(p2) = NN(p3) = q, so RNN(q) = {p2, p3}.
     • In our example, p2 and p3 are the houses for which q is the nearest restaurant.
     • Is RNN a symmetric relation?

     Related Work
     • RNN algorithms whose main idea is pre-computing: KM, YL.
     • RNN algorithms whose main idea is filter/refinement: SAA, SFT.
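The RNN definition can be checked directly by brute force. The following is a minimal Python sketch, not taken from the slides: the function names and the coordinates are made up for illustration, and it scans all points instead of using an index.

```python
from math import dist

def nearest_neighbor(x, points):
    """Return the point in `points` closest to x (ties go to the earlier point)."""
    return min(points, key=lambda o: dist(x, o))

def reverse_nearest_neighbors(q, points):
    """Brute-force RNN(q): keep p if no other data point p' has dist(p', p) < dist(q, p)."""
    result = []
    for p in points:
        rivals = [o for o in points if o != p]
        if all(dist(o, p) >= dist(q, p) for o in rivals):
            result.append(p)
    return result

# Hypothetical data: q is the query, the rest are data points.
q = (0.0, 0.0)
pts = [(1.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 11.0)]

print(reverse_nearest_neighbors(q, pts))  # [(1.0, 0.0), (0.0, 2.0)]
print(nearest_neighbor(q, pts))           # (1.0, 0.0)
```

In this toy data set (0.0, 2.0) has q as its nearest neighbor, yet the nearest neighbor of q is (1.0, 0.0), so the relation is not symmetric.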

  2. 9/28/2009

     KM
     • Original RNN method. For all p:
       1. Pre-compute NN(p).
       2. Represent p as a vicinity circle.
       3. Index the MBRs of all circles with an R-tree (named RNN-tree).
       4. RNN(q) = all circles that contain q.
     • Drawbacks? Needs two trees: the RNN-tree and an R-tree.

     YL
     • YL merges the two trees into one (RdNN-tree).
     • What happens if we insert p5?
       1. RNN(p5) = ? Find all points that have p5 as their new NN.
       2. Update the vicinity circles of those points in the index.
       3. Compute NN(p5) and insert the corresponding circle in the index.
     • Techniques that rely on pre-processing cannot deal efficiently with updates.

     SAA
     • Filter/refinement methods eliminate the need for pre-computing all NNs.
     • SAA:
       - Divide the space around the query into six equal regions S1..S6.
       - Find NN(q) in each region (candidate keys).
       - For each candidate key p, either (i) p is in RNN(q), or (ii) there is no RNN(q) in S_i.
     • In the example, RNN(q) = {p6}.
     • Drawbacks? The number of regions increases exponentially with the dimensionality.

     SFT
     • Filter:
       1. Find the k NNs of the query q (k candidates).
       2. Eliminate the points that are closer to other candidates than to q.
     • Refine:
       3. Apply Boolean range queries to determine the actual RNNs. A Boolean range query terminates as soon as the first data point is found.
     • Any drawbacks? False misses; choosing a proper k.

     Concluding former methods

       Method   Dynamic data   Arbitrary dimensionality   Exact result
       KM, YL   No             Yes                        Yes
       SAA      Yes            No                         Yes
       SFT      Yes            Yes                        No

     Half-plane pruning
     • Can p' be closer to q than p can be?
     • If p1, p2, …, pn are n data points, then any node whose MBR falls inside $\bigcup_{i=1}^{n} PL_{p_i}(p_i, q)$ cannot contain any RNN result.
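The half-plane test behind this pruning rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's code: PL_p(p, q) is taken to be the set of points strictly closer to p than to q, the function names are invented, and the MBR check is deliberately conservative because it only detects an MBR that lies inside a single half-plane.

```python
from itertools import product
from math import dist

def in_half_plane(x, p, q):
    """True if x lies in PL_p(p, q), i.e. x is strictly closer to p than to q."""
    return dist(x, p) < dist(x, q)

def point_pruned(x, candidates, q):
    """A data point x cannot be an RNN of q if some other candidate is closer to it than q."""
    return any(in_half_plane(x, p, q) for p in candidates if p != x)

def mbr_inside_one_half_plane(lo, hi, candidates, q):
    """Conservative MBR test: a half-plane is convex, so the box [lo, hi] lies
    inside PL_p(p, q) exactly when all of its 2^d corners do."""
    corners = list(product(*zip(lo, hi)))
    return any(all(in_half_plane(c, p, q) for c in corners) for p in candidates)

# Hypothetical data.
q = (0.0, 0.0)
cands = [(5.0, 5.0)]
print(point_pruned((3.0, 3.0), cands, q))                            # True
print(point_pruned((2.0, 2.0), cands, q))                            # False
print(mbr_inside_one_half_plane((4.0, 4.0), (6.0, 6.0), cands, q))   # True
print(mbr_inside_one_half_plane((1.0, 1.0), (6.0, 6.0), cands, q))   # False
```

An MBR that is covered only by the union of several half-planes is not caught by this corner test; it also enumerates 2^d corners, so it is meant for low-dimensional illustration only.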

  3. 9/28/2009

     Pruning an R-tree MBR
     • An MBR can be pruned if its residual region is empty.
     • Drawbacks? […] processing time, in terms of bisector trimming, for computing […]. The computation of intersections does not scale with dimensionality.

     Approximating the residual MBR
     • The approximation is a superset of the real residual region.
     • We can prune an MBR if its approximate residual is empty.
     • Good news: […] processing time for computing […]. No more hyper-polyhedrons to make the intersection computation complex.

     TPL Algorithm
     • The big picture:
       - Uses best-first search.
       - Utilizes one R-tree as the data structure.
       - Includes filtering/refinement phases.
     • Uses candidate points to prune entries.
     • Filters visited entries to obtain the set S_cnd of candidates.
     • Adds pruned entries to the set S_rfn.
     • S_rfn is used in the refinement step to eliminate false hits.
     • Branch & bound methods, the indexing data structure and the search algorithm together give the RNN solution.

     TPL Example: filtering step
     [Figure: data points p1..p8 around q and the R-tree; the root holds N10, N11, N12, the leaves are N1..N6, and the contents of some nodes are omitted. Figures of this example are obtained from [2].]

       Action       Heap               S_cnd   S_rfn
       Visit root   {N10, N11, N12}    {}      {}
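One way to obtain a superset of the residual region is to trim the MBR by each bisector separately, take the bounding box of what is left, and intersect those boxes. The Python sketch below does this in 2D with ordinary polygon clipping. It is an assumption-laden illustration, not the procedure from the paper: the function names are invented, it handles two dimensions only, and it makes no claim about matching the paper's running time.

```python
def clip_closer_to_q(poly, p, q):
    """Clip a convex polygon to the half-plane of points at least as close to q as to p."""
    a = (2 * (p[0] - q[0]), 2 * (p[1] - q[1]))
    b = p[0] ** 2 + p[1] ** 2 - q[0] ** 2 - q[1] ** 2
    f = lambda v: a[0] * v[0] + a[1] * v[1] - b   # f(v) <= 0 means v is kept
    out = []
    for i, cur in enumerate(poly):
        prev = poly[i - 1]
        fc, fp = f(cur), f(prev)
        if (fc <= 0) != (fp <= 0):                # the edge prev -> cur crosses the bisector
            t = fp / (fp - fc)
            out.append((prev[0] + t * (cur[0] - prev[0]),
                        prev[1] + t * (cur[1] - prev[1])))
        if fc <= 0:
            out.append(cur)
    return out

def approx_residual_mbr(lo, hi, candidates, q):
    """Return a box containing the residual region of MBR [lo, hi], or None if the
    approximation is empty (in which case the MBR can be pruned)."""
    rect = [(lo[0], lo[1]), (hi[0], lo[1]), (hi[0], hi[1]), (lo[0], hi[1])]
    res_lo, res_hi = list(lo), list(hi)
    for p in candidates:
        piece = clip_closer_to_q(rect, p, q)
        if not piece:                             # the whole MBR is closer to p than to q
            return None
        xs, ys = zip(*piece)
        res_lo = [max(res_lo[0], min(xs)), max(res_lo[1], min(ys))]
        res_hi = [min(res_hi[0], max(xs)), min(res_hi[1], max(ys))]
        if res_lo[0] > res_hi[0] or res_lo[1] > res_hi[1]:
            return None
    return tuple(res_lo), tuple(res_hi)

# Hypothetical data: neither candidate alone covers the MBR, but together they do.
q = (0.0, 0.0)
lo, hi = (3.0, -2.0), (5.0, 2.0)
print(approx_residual_mbr(lo, hi, [(4.0, 2.0)], q))               # ((3.0, -2.0), (3.5, -1.0))
print(approx_residual_mbr(lo, hi, [(4.0, 2.0), (4.0, -2.0)], q))  # None
```

Because every per-bisector box contains the true trimmed piece, the intersection of the boxes contains the true residual region, so an empty intersection is a safe reason to prune.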

  4. 9/28/2009

     TPL Example: filtering step (continued)
     [Figure repeated on each slide: the data points p1..p8 around q and the R-tree.]

       Action          Heap                      S_cnd           S_rfn
       Visit N10       {N3, N11, N2, N1, N12}    {}              {}
       Visit N3        {N11, N2, N1, N12}        {p1}            {p3}
       Visit N11       {N5, N2, N1, N12}         {p1}            {p3, N4, N6}
       Visit N5        {N2, N1, N12}             {p1, p2}        {p3, N4, N6, p6}
       Visit N1        {N12}                     {p1, p2, p5}    {p3, N4, N6, p6, N2, p7}
       (final state)   {}                        {p1, p2, p5}    {p3, N4, N6, p6, N2, p7, N12}
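The way S_cnd and S_rfn grow during filtering can be mimicked on a flat list of points. The sketch below is a simplification under stated assumptions: there is no R-tree and no node entries, only points kept in a heap ordered by distance to q, and the function name and coordinates are hypothetical.

```python
import heapq
from math import dist

def filter_points(q, points):
    """Best-first pass over points: each popped point either is pruned by an
    existing candidate (it goes to s_rfn) or becomes a new candidate (s_cnd)."""
    heap = [(dist(q, p), p) for p in points]
    heapq.heapify(heap)
    s_cnd, s_rfn = [], []
    while heap:
        _, p = heapq.heappop(heap)                # closest remaining point to q
        if any(dist(p, c) < dist(p, q) for c in s_cnd):
            s_rfn.append(p)                       # closer to some candidate than to q
        else:
            s_cnd.append(p)
    return s_cnd, s_rfn

# Hypothetical data.
q = (0.0, 0.0)
pts = [(10.0, 11.0), (2.0, 1.0), (0.0, 2.0), (10.0, 10.0), (1.0, 0.0)]
s_cnd, s_rfn = filter_points(q, pts)
print(s_cnd)  # [(1.0, 0.0), (0.0, 2.0)]
print(s_rfn)  # [(2.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
```

A candidate is only compared with the candidates found before it, so s_cnd can still contain false hits; that is why the pruned entries are kept for a refinement pass.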

  5. 9/28/2009

     Refinement Heuristics
     • Let P_rfn be the set of points and N_rfn be the set of nodes in S_rfn.
     • A point p from S_cnd can be discarded as a false hit if either of the following holds:
       (i) There is a point p' in P_rfn such that dist(p, p') < dist(p, q).
       (ii) There is a node MBR N in N_rfn such that […].
     • A candidate point can be eliminated if it is closer to another candidate point than to the query.
     • A point p from S_cnd can be reported as an actual result if the following conditions hold:
       (i) There is no point p' in P_rfn such that dist(p, p') < dist(p, q).
       (ii) For every node N in N_rfn, […].
     • If none of the above works, visit all node MBRs where […] and use the mentioned heuristics considering the newly visited entries.

       Action           S_cnd          S_rfn                            Actual results
       (initial)        {p1, p2, p5}   {p3, N4, N6, p6, N2, p7, N12}    {}
       Invalidate p1    {p2, p5}       {N4, N6, N2, N12}                {}
       Validate p5      {p2}           {N4, N6, N2, N12}                {}
       Remove N6, N2    {p2}           {N4, N12}                        {p5}
       Access N4        {p2}           {p4, p8, N12}                    {p5}
       Invalidate p2    {}             {N12}                            {p5}

     RkNN pruning
     • Return all points that have q as one of their k nearest neighbors.
     • Let […] be a subset of […]. Each of the […] subsets prunes the area […].

     kTPL Algorithm
     • Same filtering as TPL.
     • Same refining, with the following exceptions:
       - A point can be pruned if k points are found within distance dist(p, q) from p.
       - A counter is associated with each point (initialized to k) and decreases when such a point is found.
       - A candidate is eliminated if its counter reaches 0.
       - There is no prior knowledge of the number of points in a node, so […] is not applied in pruning.
       - A point p can be pruned if a node N is found such that […] and […].

     Experiments
     • RNN queries on real data.
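The counter rule of the kTPL refinement can be sketched on points alone. The code below is a hedged illustration, not the algorithm from the paper: it ignores node entries entirely, the function name and the data are hypothetical, and the candidate and refinement sets are simply given as inputs.

```python
from math import dist

def refine_with_counters(q, candidates, refinement_points, k):
    """Keep candidate p unless k other points lie strictly closer to p than q does."""
    results = []
    for p in candidates:
        counter = k                               # one counter per candidate, initialized to k
        for o in candidates + refinement_points:
            if o != p and dist(p, o) < dist(p, q):
                counter -= 1
                if counter == 0:                  # k closer points found: p is a false hit
                    break
        if counter > 0:
            results.append(p)
    return results

# Hypothetical data.
q = (0.0, 0.0)
cands = [(1.0, 0.0), (0.0, 2.0), (3.0, 0.5)]
rfn = [(10.0, 10.0), (10.0, 11.0)]
print(refine_with_counters(q, cands, rfn, k=1))  # [(1.0, 0.0), (0.0, 2.0)]
print(refine_with_counters(q, cands, rfn, k=2))  # [(1.0, 0.0), (0.0, 2.0), (3.0, 0.5)]
```

With k = 1 the rule reduces to the point-based check of the RNN case: (3.0, 0.5) is dropped because (1.0, 0.0) is closer to it than q is, while with k = 2 a single closer point is not enough to eliminate it.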
