

9/3/2009

Nearest Neighbor Queries
Nick Roussopoulos, Stephen Kelly and Frédéric Vincent
Ling Hu, lingh@usc.edu

Outline
• Spatial Index – R-Tree
• NN query – Naïve solution
• A better solution – Branch-and-bound
• Can we do better?
• Experiment Results

R-Tree
[Figure: data points 1 to 12 grouped into leaf MBRs E, F, G, H, which are grouped into MBRs B and C under the root A.]
• One entry: (Pointer, MBR)
• Node contents: A: B, C; B: E, F; C: G, H; E: 4, 5, 6; F: 1, 2, 3; G: 10, 11, 12; H: 7, 8, 9
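A minimal sketch of the node layout above, assuming 2-D points and hypothetical coordinates (the slides give none). Each node holds entries that pair a pointer (a child node or a data point) with an MBR, and an internal node's MBR is the smallest rectangle enclosing its children's MBRs. `Node` and `mbr_of_points` are illustrative names, not from the source.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, ...]
Rect = Tuple[Point, Point]  # (lower corner, upper corner)


def mbr_of_points(points: List[Point]) -> Rect:
    """Minimal bounding rectangle of a non-empty list of points."""
    lo = tuple(min(coords) for coords in zip(*points))
    hi = tuple(max(coords) for coords in zip(*points))
    return lo, hi


@dataclass
class Node:
    """R-tree node: entries are child nodes (internal) or data points (leaf)."""
    name: str
    children: List["Node"] = field(default_factory=list)
    points: List[Point] = field(default_factory=list)

    @property
    def mbr(self) -> Rect:
        if self.points:  # leaf: bound the stored points
            return mbr_of_points(self.points)
        # internal: bound the corners of every child MBR
        corners = [corner for child in self.children for corner in child.mbr]
        return mbr_of_points(corners)


# Hypothetical coordinates arranged like the slide's tree (labels in comments).
E = Node("E", points=[(1, 6), (2, 7), (3, 5)])   # points 4, 5, 6
F = Node("F", points=[(1, 1), (2, 2), (3, 1)])   # points 1, 2, 3
G = Node("G", points=[(8, 1), (9, 2), (10, 1)])  # points 10, 11, 12
H = Node("H", points=[(7, 6), (8, 7), (9, 5)])   # points 7, 8, 9
B = Node("B", children=[E, F])
C = Node("C", children=[G, H])
A = Node("A", children=[B, C])

for node in (A, B, C):
    print(node.name, node.mbr)
```

Computing MBRs on demand keeps the sketch short; a disk-based R-tree would instead store each child's MBR alongside its pointer in the parent entry.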

Nearest Neighbor Search
• Retrieve the nearest neighbor of query point Q
• Simple strategy: convert the nearest neighbor search to a range search.
  – Guess a range around Q that contains at least one object, say O.
  – If the current guess does not include any answers, increase the range size until an object is found.
  – Compute the distance d' between Q and O.
  – Re-execute the range query with distance d' around Q.
  – Compute the distance from Q to each retrieved object. The object at minimum distance is the nearest neighbor.

Naïve Approach
[Figure: query point Q among the data points and MBRs of the example R-Tree (A: B, C; B: E, F; C: G, H).]
Issues:
• How to guess the range?
• The retrieval may be sub-optimal if an incorrect range is guessed.
• This would be a problem in high-dimensional spaces.

A Better Strategy for KNN Search
• Keep a priority queue sorted by MINDIST;
• Nodes are traversed in that order;
• Stop when an object is at the top of the queue (1-NN found);
• k-NN can be computed incrementally;
• I/O optimal.

MINDIST Property
• MINDIST is a lower bound of any k-NN distance.
[Figure: query point (p1, p2) in several positions around a rectangle with corners (s1, s2) and (t1, t2).]
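The range-expansion strategy can be sketched as follows, assuming a plain list of 2-D points stands in for the R-tree range search, that the range is a square window, and that the initial guess `r0` and growth factor `grow` are arbitrary (the slides do not specify them). `window_query` and `naive_nn` are hypothetical names.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def window_query(points: List[Point], q: Point, half: float) -> List[Point]:
    """Range query: points inside the square of half-width `half` centred on q."""
    return [p for p in points if all(abs(pi - qi) <= half for pi, qi in zip(p, q))]


def naive_nn(points: List[Point], q: Point, r0: float = 1.0, grow: float = 2.0) -> Optional[Point]:
    """Nearest neighbor of q via repeated range queries (assumes r0 > 0, grow > 1)."""
    if not points:
        return None
    half = r0
    hits = window_query(points, q, half)
    while not hits:  # the guess contained no object: enlarge the range and retry
        half *= grow
        hits = window_query(points, q, half)
    d_prime = math.dist(q, hits[0])  # d' = distance from Q to some found object O
    # Every object closer than O lies within distance d' of Q, so this second
    # query cannot miss the nearest neighbor.
    candidates = window_query(points, q, d_prime)
    return min(candidates, key=lambda p: math.dist(q, p))


print(naive_nn([(0, 0), (5, 5), (2, 1)], (3, 3)))
```

The square window is why the second query is needed: the first object found may sit in a corner, up to sqrt(2) times the half-width away, so closer objects just outside the first window are possible. The number of rounds depends entirely on how good the initial guess is, which is the weakness the slide points out.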

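MINDIST can be computed one dimension at a time. This sketch follows the definition in the original paper, where rectangle R has lower corner S = (s1, ..., sn) and upper corner T = (t1, ..., tn), and MINDIST(P,R) = sum over i of |pi - ri|^2 with ri = si if pi < si, ri = ti if pi > ti, and ri = pi otherwise. The paper works with squared distances; taking the square root here is an assumption made so the value can be compared directly with Euclidean object distances.

```python
import math
from typing import Sequence


def mindist(p: Sequence[float], lo: Sequence[float], hi: Sequence[float]) -> float:
    """Euclidean distance from point p to the closest point of rectangle [lo, hi]."""
    total = 0.0
    for pi, si, ti in zip(p, lo, hi):
        if pi < si:      # p is below the rectangle in this dimension
            ri = si
        elif pi > ti:    # p is above the rectangle in this dimension
            ri = ti
        else:            # p lies within the rectangle's extent in this dimension
            ri = pi
        total += (pi - ri) ** 2
    return math.sqrt(total)
```

A point inside R has MINDIST 0, and no object inside R can be closer to P than MINDIST(P,R), which is why it is a safe lower bound for ordering the priority queue.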
Priority Queue
[Figure: the example R-Tree (A: B, C; B: E, F; C: G, H; E: 4, 5, 6; F: 1, 2, 3; G: 10, 11, 12; H: 7, 8, 9) and query point Q.]
Queue contents, head first, as the search proceeds:
A
B C
E C F
C 5 6 4 F
H 5 G 6 4 F
7 5 8 G 9 6 4 F (object 7 is at the head: 1-NN found)

MBR Face Property – 2D
[Figure: MBR face property in 2-D.]

MBR Face Property
• An MBR is the n-dimensional Minimal Bounding Rectangle used in R-trees: the smallest n-dimensional rectangle that bounds its corresponding objects.
• MBR face property: every face of any MBR contains at least one point of some object in the database.

MBR Face Property – 3D
[Figure: rectangle R illustrating the face property in 3-D.]

Improving the KNN Algorithm
• While the MINDIST-based algorithm is I/O optimal, its performance may be further improved by pruning nodes from the priority queue.
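The priority-queue strategy can be sketched as a best-first traversal: the queue holds nodes keyed by MINDIST and objects keyed by actual distance, and whenever an object reaches the head it is the next nearest neighbor, which is also what makes k-NN incremental. This is a minimal sketch assuming an in-memory tree of 2-D points; `Node`, `leaf`, `internal` and `best_first` are hypothetical names, and the coordinates in the usage lines are invented.

```python
import heapq
import itertools
import math
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    lo: Point                                             # MBR lower corner
    hi: Point                                             # MBR upper corner
    children: List["Node"] = field(default_factory=list)  # internal-node entries
    points: List[Point] = field(default_factory=list)     # leaf-node entries


def leaf(points: List[Point]) -> Node:
    return Node(tuple(map(min, zip(*points))), tuple(map(max, zip(*points))), points=points)


def internal(children: List[Node]) -> Node:
    lo = tuple(map(min, zip(*(c.lo for c in children))))
    hi = tuple(map(max, zip(*(c.hi for c in children))))
    return Node(lo, hi, children=children)


def mindist(p: Point, lo: Point, hi: Point) -> float:
    # Clamp p into the rectangle; the distance to the clamped point is MINDIST.
    return math.dist(p, [min(max(pi, si), ti) for pi, si, ti in zip(p, lo, hi)])


def best_first(root: Node, q: Point) -> Iterator[Tuple[float, Point]]:
    """Yield (distance, point) pairs in non-decreasing distance from q."""
    tie = itertools.count()  # tie-breaker so heapq never compares Nodes and tuples
    queue = [(mindist(q, root.lo, root.hi), next(tie), root)]
    while queue:
        d, _, item = heapq.heappop(queue)
        if isinstance(item, Node):
            # Expand the node: enqueue child MBRs by MINDIST, objects by actual distance.
            for child in item.children:
                heapq.heappush(queue, (mindist(q, child.lo, child.hi), next(tie), child))
            for pt in item.points:
                heapq.heappush(queue, (math.dist(q, pt), next(tie), pt))
        else:
            # An object at the head of the queue is the next nearest neighbor.
            yield d, item


# Usage with hypothetical coordinates shaped like the slide's tree.
E = leaf([(1, 6), (2, 7), (3, 5)])
F = leaf([(1, 1), (2, 2), (3, 1)])
G = leaf([(8, 1), (9, 2), (10, 1)])
H = leaf([(7, 6), (8, 7), (9, 5)])
A = internal([internal([E, F]), internal([G, H])])

first = next(best_first(A, (6, 6)))                        # 1-NN
three = list(itertools.islice(best_first(A, (6, 6)), 3))   # 3-NN, produced incrementally
```

Because MINDIST never exceeds the distance of any object inside an MBR, an object popped from the head is at least as close as everything still queued; asking the generator for more results simply continues the same traversal.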

Properties of MINMAXDIST
• MINMAXDIST(P,R) is the minimum, over all dimensions, of the distance from P to the furthest point of the closest face of R.
• MINMAXDIST is the smallest possible upper bound of distances from the point P to the rectangle R.
• MINMAXDIST guarantees that there is an object within R at a distance to P less than or equal to it.
• MINMAXDIST is an upper bound of the 1-NN distance.

MINDIST & MINMAXDIST
• MINDIST(P,R) <= NN(P) <= MINMAXDIST(P,R), where NN(P) is the distance from P to its nearest object inside R.
[Figure: point P and rectangle R with MINDIST and MINMAXDIST marked.]

MinDist & MinMaxDist – 3D
[Figure: query point Q and rectangle R in 3-D with MinDist(Q,R) and MinMaxDist(Q,R) marked.]

Pruning 1
• Downward pruning: an MBR R is discarded if there exists another R' such that MINDIST(P,R) > MINMAXDIST(P,R').
[Figure: P, R and R' with MINDIST and MINMAXDIST marked.]

Pruning 2
• Downward pruning: an object O is discarded if there exists an R such that Actual_dist(P,O) > MINMAXDIST(P,R).
[Figure: P, object O and MBR R with Actual_dist and MINMAXDIST marked.]

Pruning 3
• Upward pruning: an MBR R is discarded if an object O is found such that MINDIST(P,R) > Actual_dist(P,O).
[Figure: P, MBR R and object O with MINDIST and Actual_dist marked.]
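MINMAXDIST can be computed from the formula in the original paper: MINMAXDIST(P,R) = min over k of ( |pk - rmk|^2 + sum over i != k of |pi - rMi|^2 ), where rmk = sk if pk <= (sk + tk)/2 and tk otherwise (the nearer face in dimension k), and rMi = si if pi >= (si + ti)/2 and ti otherwise (the farther edge in dimension i). As with MINDIST, the paper uses squared distances; the square root below is an assumption for comparability with Euclidean distances.

```python
import math
from typing import Sequence


def minmaxdist(p: Sequence[float], lo: Sequence[float], hi: Sequence[float]) -> float:
    """Upper bound on the distance from p to the nearest object inside MBR [lo, hi].

    For each dimension k, take the face of R nearest to p in dimension k and
    measure the distance to the farthest point of that face; return the minimum.
    """
    n = len(p)
    # rm: coordinate of the nearer face in each dimension
    near = [s if pk <= (s + t) / 2 else t for pk, s, t in zip(p, lo, hi)]
    # rM: coordinate of the farther edge in each dimension
    far = [s if pi >= (s + t) / 2 else t for pi, s, t in zip(p, lo, hi)]
    best = math.inf
    for k in range(n):
        sq = (p[k] - near[k]) ** 2
        sq += sum((p[i] - far[i]) ** 2 for i in range(n) if i != k)
        best = min(best, sq)
    return math.sqrt(best)
```

The guarantee relies on the MBR face property: the nearest face touches at least one object, and that object can be no farther than the face's farthest point. For P = (0, 0) and R = [1, 3] x [1, 2], the nearest face is x = 1 and its farthest point is (1, 2), giving sqrt(5).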

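The three pruning rules can be combined in a depth-first branch-and-bound search. This is a minimal sketch, not the paper's code: it assumes an in-memory tree of 2-D points with exact MBRs (so the face property holds), folds Pruning 1 and the actual distances found so far into one `bound` value, and uses hypothetical helper names (`leaf`, `internal`, `nn_branch_and_bound`). The random points in the usage lines are invented test data.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    lo: Point
    hi: Point
    children: List["Node"] = field(default_factory=list)
    points: List[Point] = field(default_factory=list)


def leaf(points):
    return Node(tuple(map(min, zip(*points))), tuple(map(max, zip(*points))), points=points)


def internal(children):
    lo = tuple(map(min, zip(*(c.lo for c in children))))
    hi = tuple(map(max, zip(*(c.hi for c in children))))
    return Node(lo, hi, children=children)


def mindist(p, lo, hi):
    return math.dist(p, [min(max(pi, s), t) for pi, s, t in zip(p, lo, hi)])


def minmaxdist(p, lo, hi):
    near = [s if pi <= (s + t) / 2 else t for pi, s, t in zip(p, lo, hi)]
    far = [s if pi >= (s + t) / 2 else t for pi, s, t in zip(p, lo, hi)]
    # Farthest point of the face nearest in dimension k, minimised over k.
    return min(math.dist(p, far[:k] + [near[k]] + far[k + 1:]) for k in range(len(p)))


def nn_branch_and_bound(root: Node, q: Point) -> Tuple[Optional[Point], float]:
    """Depth-first 1-NN search using the three pruning rules from the slides."""
    state = {"bound": math.inf, "dist": math.inf, "point": None}

    def visit(node: Node) -> None:
        for pt in node.points:
            d = math.dist(q, pt)
            # Pruning 2: discard O if Actual_dist(Q,O) exceeds a known MINMAXDIST.
            if d <= state["bound"] and d < state["dist"]:
                state["dist"], state["point"] = d, pt
                state["bound"] = min(state["bound"], d)
        branches = sorted(node.children, key=lambda c: mindist(q, c.lo, c.hi))
        if branches:
            # Pruning 1: fold the smallest sibling MINMAXDIST into the bound, so any
            # R with MINDIST(Q,R) > MINMAXDIST(Q,R') is skipped below.
            state["bound"] = min(state["bound"], min(minmaxdist(q, c.lo, c.hi) for c in branches))
        for child in branches:
            # Pruning 3 (upward): skip R once MINDIST(Q,R) exceeds the best bound.
            if mindist(q, child.lo, child.hi) > state["bound"]:
                break  # branches are sorted by MINDIST, so the rest are farther still
            visit(child)

    visit(root)
    return state["point"], state["dist"]


rng = random.Random(42)
pts = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(48)]
leaves = [leaf(pts[i:i + 4]) for i in range(0, 48, 4)]     # 12 leaves
mids = [internal(leaves[i:i + 3]) for i in range(0, 12, 3)]  # 4 internal nodes
root = internal(mids)
q = (50.0, 50.0)
point, dist = nn_branch_and_bound(root, q)
```

Every MBR on the path to the true nearest neighbor has MINDIST no larger than that neighbor's distance, and `bound` never drops below it, so the strict `>` tests cannot prune that path. Grouping the points into leaves at random only affects how much is pruned, not the answer.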
MINDIST vs MINMAXDIST Ordering
• MINDIST: optimistic
• Example: MINDIST ordering finds the 1-NN first

MINDIST vs MINMAXDIST Ordering
• MINMAXDIST: pessimistic
• Example: MINMAXDIST ordering finds the 1-NN first

Generalize to k-NN
• Keep a sorted buffer of at most k current nearest neighbors.
• Pruning is done according to the distance of the furthest nearest neighbor in this buffer.
• Example:
[Figure: query point P, MBR R, MINDIST(P,R), and Actual_dist from P to the k-th object in the buffer.]
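The k-NN generalization can be sketched with a bounded max-heap as the sorted buffer: an MBR is pruned when its MINDIST exceeds the distance of the furthest neighbor in a full buffer. This minimal sketch assumes an in-memory tree of 2-D points and hypothetical helper names (`leaf`, `internal`, `knn`); it prunes only with MINDIST against the buffer, and does not use MINMAXDIST, which bounds only the 1-NN distance.

```python
import heapq
import itertools
import math
import random
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    lo: Point
    hi: Point
    children: List["Node"] = field(default_factory=list)
    points: List[Point] = field(default_factory=list)


def leaf(points):
    return Node(tuple(map(min, zip(*points))), tuple(map(max, zip(*points))), points=points)


def internal(children):
    lo = tuple(map(min, zip(*(c.lo for c in children))))
    hi = tuple(map(max, zip(*(c.hi for c in children))))
    return Node(lo, hi, children=children)


def mindist(p, lo, hi):
    return math.dist(p, [min(max(pi, s), t) for pi, s, t in zip(p, lo, hi)])


def knn(root: Node, q: Point, k: int) -> List[Point]:
    """k nearest neighbors of q, nearest first, by depth-first branch-and-bound."""
    if k <= 0:
        return []
    buffer = []  # max-heap of (-distance, tiebreak, point), at most k entries
    tie = itertools.count()

    def kth_dist() -> float:
        # Distance of the furthest neighbor in a full buffer; infinite until k are found.
        return -buffer[0][0] if len(buffer) == k else math.inf

    def visit(node: Node) -> None:
        for pt in node.points:
            d = math.dist(q, pt)
            if d < kth_dist():
                heapq.heappush(buffer, (-d, next(tie), pt))
                if len(buffer) > k:
                    heapq.heappop(buffer)  # evict the current furthest neighbor
        for child in sorted(node.children, key=lambda c: mindist(q, c.lo, c.hi)):
            if mindist(q, child.lo, child.hi) > kth_dist():
                break  # prune this MBR and every later (farther) one
            visit(child)

    visit(root)
    return [pt for _, _, pt in sorted(buffer, reverse=True)]


rng = random.Random(3)
pts = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(60)]
leaves = [leaf(pts[i:i + 5]) for i in range(0, 60, 5)]     # 12 leaves
mids = [internal(leaves[i:i + 3]) for i in range(0, 12, 3)]  # 4 internal nodes
root = internal(mids)
q = (20.0, 80.0)
result = knn(root, q, 5)
```

Until the buffer holds k objects the pruning distance is infinite, so nothing is pruned; this matches the experimental observation that page accesses grow as k grows.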

Key Insights
• The number of pages accessed grows as k grows.
• The denser the dataset, the more page accesses.
• MinDist vs. MinMaxDist: the curves have the same shape, but MinMaxDist has a higher I/O cost.
• In dense areas, MinMaxDist is bad.

Thanks & Questions?
