CS6100: Topics in Design and Analysis of Algorithms Range Searching John Augustine CS6100 (Even 2012): Range Searching
The Range Searching Problem

Given a set $P$ of $n$ points in $\mathbb{R}^d$, for a fixed integer $d \ge 1$, we want to preprocess $P$ and store it in a data structure so that, given a query range (typically an axis-parallel rectangle), we can quickly report all the points of $P$ that lie in the range.

For 1D range searching, we will study (i) balanced binary search trees and (ii) skip lists. For 2D point sets, we will study (i) kd-trees and (ii) range trees, both of which can be extended to arbitrary $d$-dimensional point sets.

[Figure: a database query viewed geometrically. Each employee is a point with date of birth on the x-axis and salary on the y-axis. The query rectangle spans dates of birth 19,500,000 to 19,559,999 and salaries 3,000 to 4,000, and contains G. Ometer (born Aug 19, 1954, with a salary of 3,500 dollars).]
Balanced Binary Search Trees (BBST)

Given a set $P$ of $n$ points in $\mathbb{R}$ stored in a sorted array $A$, we can construct a tree of depth $O(\log n)$. For simplicity, we begin with the assumption that $n = 2^k$ for some integer $k$. The data points are stored in the leaves. The internal nodes store values that guide the search.

The root node stores the $2^{k-1}$-th element of $A$. While searching for a value $x$ in the query phase, if $x$ is less than or equal to the value stored in the root, the search is guided to the left subtree. Otherwise, it is guided to the right subtree.

The left subtree is constructed recursively over the points stored in $A$ at positions $1$ through $2^{k-1}$. The right subtree is constructed over the points at positions $2^{k-1} + 1$ through $2^k$. When constructing an internal node over a two-element point set, the left child is simply the leaf holding the smaller of the two points and the right child is the leaf holding the larger, which terminates the recursion. The construction is easily adapted to arbitrary $n$.

[Figure: example BBST with root 49 over the leaves 3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 100, 105. A query range $[\mu, \mu']$ is marked on the leaves.]

Lemma 1. If the set of points is sorted, we can construct the BBST in $O(n)$ time. If not, it takes $O(n \log n)$ time, as we first have to sort the point set. The BBST data structure requires $O(n)$ storage space.
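The recursive construction can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the course's reference implementation: the names `Node`, `build_bbst` and `depth` are hypothetical, the input is assumed to be non-empty and already sorted, and for odd sizes the left half is (arbitrarily) given the extra point.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    value: float                      # leaf: the data point; internal: the guiding value
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def build_bbst(a: Sequence[float], lo: int = 0, hi: Optional[int] = None) -> Node:
    """Build a leaf-oriented balanced tree over the sorted slice a[lo:hi]."""
    if hi is None:
        hi = len(a)
    if hi <= lo:
        raise ValueError("need at least one point")
    if hi - lo == 1:                  # a single point becomes a leaf
        return Node(a[lo])
    mid = (lo + hi + 1) // 2          # assumption: left half gets the extra point
    # The guiding value is the largest point in the left half, so a search
    # for x goes left exactly when x <= value.
    return Node(a[mid - 1], build_bbst(a, lo, mid), build_bbst(a, mid, hi))


def depth(node: Node) -> int:
    """Number of edges on the longest root-to-leaf path."""
    return 0 if node.is_leaf else 1 + max(depth(node.left), depth(node.right))


points = [3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 100, 105]
root = build_bbst(points)
```

Because the function passes indices instead of copying slices, each node costs constant work, which is in line with the $O(n)$ bound for sorted input. On the 14 points above, the root stores 49, the 7th point in sorted order.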
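To search for a single value $\mu$, we start at the root node and ask whether $\mu$ is greater than the value stored there. If it is, we move to the right subtree; otherwise, we move to the left. We continue recursively until we reach a leaf, where we can report whether $\mu$ is present.

To query a range $[\mu, \mu']$, we traverse the tree for both $\mu$ and $\mu'$ until we find the node where the two search paths diverge; call it $v_{\text{split}}$.

[Figure: the search paths for $\mu$ and $\mu'$ from root($T$), diverging at $v_{\text{split}}$, with the selected subtrees hanging between the two paths.]

Below $v_{\text{split}}$, the paths for $\mu$ and $\mu'$ are handled separately. As we traverse towards $\mu$, every time the path moves to a left child, we report all the points in the right subtree of the node we are leaving. We deal with $\mu'$ symmetrically: every time its path moves to a right child, we report all the points in the left subtree. Finally, we check the two leaves at which the paths end and report each one that lies in the range.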
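The range query described above can be sketched as follows. This is a hedged illustration: `Node`, `build`, `report_subtree` and `range_query` are hypothetical names, the tree is the leaf-oriented tree in which a search goes left exactly when the query value is at most the stored value, and `build` uses list slicing purely to keep the sketch short (which costs $O(n \log n)$ rather than $O(n)$).

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    value: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.left is None


def build(a: Sequence[float]) -> Node:
    """Leaf-oriented balanced tree over a non-empty sorted sequence."""
    if len(a) == 1:
        return Node(a[0])
    mid = (len(a) + 1) // 2
    return Node(a[mid - 1], build(a[:mid]), build(a[mid:]))


def report_subtree(node: Node, out: List[float]) -> None:
    """Append every point stored in the leaves below `node`."""
    if node.is_leaf:
        out.append(node.value)
    else:
        report_subtree(node.left, out)
        report_subtree(node.right, out)


def range_query(root: Node, lo: float, hi: float) -> List[float]:
    """Report all points in [lo, hi]; the output order is not sorted."""
    out: List[float] = []
    # Walk down while the searches for lo and hi agree; stop at v_split.
    v = root
    while not v.is_leaf and (hi <= v.value or lo > v.value):
        v = v.left if hi <= v.value else v.right
    if v.is_leaf:
        if lo <= v.value <= hi:
            out.append(v.value)
        return out
    # Path towards lo: on each left turn, the right subtree is inside the range.
    u = v.left
    while not u.is_leaf:
        if lo <= u.value:
            report_subtree(u.right, out)
            u = u.left
        else:
            u = u.right
    if lo <= u.value <= hi:
        out.append(u.value)
    # Path towards hi: on each right turn, the left subtree is inside the range.
    u = v.right
    while not u.is_leaf:
        if hi > u.value:
            report_subtree(u.left, out)
            u = u.right
        else:
            u = u.left
    if lo <= u.value <= hi:
        out.append(u.value)
    return out


root = build([3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 100, 105])
print(sorted(range_query(root, 18, 77)))  # [19, 23, 30, 37, 49, 59, 62, 70]
```

The two leaf checks at the end of each path matter: the leaves where the searches for `lo` and `hi` terminate may or may not lie inside the range.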
Lemma 2. The time to report the points in a range $[\mu, \mu']$ is $O(k + \log n)$, where $k$ is the number of points in $[\mu, \mu']$.

Proof. The tree traversal requires $O(\log n)$ time. Reporting the points in a selected subtree requires $O(k')$ time, where $k'$ is the number of points on which that subtree is built. The selected subtrees are disjoint and contain only points in the range, so $O(k)$ time suffices to report all $k$ points.

Table 1: Performance bounds of a BBST containing $n$ points.

  Preprocessing time                  $O(n \log n)$
  Space                               $O(n)$
  Searching for 1 element             $O(\log n)$
  Reporting a range with $k$ items    $O(k + \log n)$
  Insertion                           $O(\log n)$
  Deletion                            $O(\log n)$
Skip List

While the static implementation of a binary search tree is straightforward, making the data structure dynamic (i.e., supporting the addition and deletion of points) is non-trivial. The skip list is a randomized data structure that is easy to implement, including updates (insertions and deletions). In expectation, it has the same performance bounds as BBSTs (see Table 1).

[Figure: a skip list, with the head pointer at the top-left node.]
Construction

Again, we assume that the set $P$ of $n$ points is given to us in sorted order. We denote the $i$-th element of $P$ in sorted order by $p_i$. In our data structure, we use nodes with four pointers: left, right, top and bottom.

We first construct the bottom level (level 0), which is a linked list of the sorted points built from four-pointer nodes. The bottom pointers of these nodes are set to null.

For each $p_i$, we toss a fair coin repeatedly until we get Heads. Let $\ell_i$ be the number of Tails before we obtain the first Heads.

Vertical Pointers. We make $\ell_i$ additional copies of the node containing $p_i$, one for each of levels 1 through $\ell_i$, and chain them to the level-0 node as follows. For $0 \le j < \ell_i$, the top pointer of the level-$j$ node points to the level-$(j+1)$ node, and the bottom pointer of the level-$(j+1)$ node points to the level-$j$ node. The top pointer of the level-$\ell_i$ node is null.
The top level is $\ell = \max_i \ell_i$. On each level, we have two special boundary nodes, one to the left of all nodes on that level and the other to the right. The boundary nodes are also chained up vertically.

Horizontal Pointers. We establish horizontal links on each level $j$, from $j = 1$ up to $j = \ell$. Starting with $\eta$ as the left boundary node of level $j$, we step down to the copy of $\eta$ on level $j - 1$ and traverse to the right until we come to a node on level $j - 1$ that has a copy $\eta'$ on level $j$. We establish bidirectional links between $\eta$ and $\eta'$ and continue this process from $\eta'$ until we reach the right boundary.

The head pointer points to the left boundary node of level $\ell$.
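The construction (coin tosses, vertical pointers, then horizontal pointers) can be sketched as below. This is an illustration under assumptions, not a definitive implementation: the names `Node`, `tower`, `build_skip_list` and `level_values` are hypothetical, the points are assumed to be finite, distinct and sorted, the boundary nodes store $-\infty$ and $+\infty$, and the coin is simulated with a `random.Random` instance passed in by the caller.

```python
import math
import random
from typing import List, Sequence


class Node:
    """Four-pointer skip-list node."""
    __slots__ = ("value", "left", "right", "top", "bottom")

    def __init__(self, value: float) -> None:
        self.value = value
        self.left = self.right = self.top = self.bottom = None


def tower(value: float, height: int) -> Node:
    """Chain copies of `value` on levels 0..height; return the level-0 node."""
    base = node = Node(value)
    for _ in range(height):
        node.top = Node(value)
        node.top.bottom = node
        node = node.top
    return base


def build_skip_list(points: Sequence[float], rng: random.Random) -> Node:
    """Return the head pointer (left boundary node of the top level)."""
    heights = []
    for _ in points:
        h = 0
        while rng.random() < 0.5:     # Tails with probability 1/2
            h += 1
        heights.append(h)             # number of Tails before the first Heads
    top = max(heights, default=0)

    # Level 0: all towers in sorted order between the two boundary towers.
    bases = [tower(-math.inf, top)]
    bases += [tower(p, h) for p, h in zip(points, heights)]
    bases.append(tower(math.inf, top))
    for a, b in zip(bases, bases[1:]):
        a.right, b.left = b, a

    # Levels 1..top: step down, walk right to the next node that has a copy
    # one level up, and link the two upper copies in both directions.
    left_boundary = bases[0]
    for _ in range(top):
        left_boundary = left_boundary.top
        eta = left_boundary
        while eta.value != math.inf:
            below = eta.bottom.right
            while below.top is None:
                below = below.right
            eta.right, below.top.left = below.top, eta
            eta = below.top
    return left_boundary


def level_values(head: Node) -> List[List[float]]:
    """Values on each level, top level first, boundary nodes omitted."""
    levels, row = [], head
    while row is not None:
        vals, node = [], row.right
        while node.right is not None:
            vals.append(node.value)
            node = node.right
        levels.append(vals)
        row = row.bottom
    return levels


head = build_skip_list([3, 10, 19, 23, 30, 37, 49, 59], random.Random(1))
for vals in level_values(head):
    print(vals)
```

The inner walk always terminates because the right boundary tower reaches the top level, so some node to the right always has a copy one level up.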
Searching for a Point p

Here, given $p$, we want to report whether $P$ (stored in the skip list data structure) contains $p$. For simplicity, assume that the left boundary nodes store $-\infty$ and the right boundary nodes store $+\infty$. Start from the head pointer and repeat the following steps:

1. Moving right along the current level, find the last node whose value is at most $p$. If its value is exactly $p$, we have found it and can terminate.
2. Else, if we are on level 0, report that $p$ is not in $P$ and terminate.
3. Else, step directly down one level.
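The three search steps can be sketched as follows. To keep the example deterministic, the skip list is built from explicitly given levels by a hypothetical helper `from_levels`, which stands in for a fixed outcome of the coin tosses. `Node`, `from_levels` and `contains` are illustrative names, only the right and bottom pointers are used, and the stored values and the query are assumed to be finite and distinct.

```python
import math
from typing import Dict, List, Optional


class Node:
    def __init__(self, value: float) -> None:
        self.value = value
        self.right: Optional["Node"] = None
        self.bottom: Optional["Node"] = None


def from_levels(levels: List[List[float]]) -> Node:
    """Build a skip list from explicit levels (level 0 first); return the head.

    Hypothetical demo helper: each level must be sorted and a subset of the
    level below it.
    """
    if not levels:
        raise ValueError("need at least level 0")
    below: Dict[float, Node] = {}
    head = None
    for values in levels:
        row = [Node(v) for v in [-math.inf, *values, math.inf]]
        for a, b in zip(row, row[1:]):
            a.right = b
        for node in row:
            node.bottom = below.get(node.value)   # None on level 0
        below = {node.value: node for node in row}
        head = row[0]
    return head


def contains(head: Node, p: float) -> bool:
    node = head
    while True:
        # Step 1: move right to the last node on this level with value <= p.
        while node.right.value <= p:
            node = node.right
        if node.value == p:
            return True
        # Step 2: on level 0 there is nowhere left to look.
        if node.bottom is None:
            return False
        # Step 3: step directly down one level.
        node = node.bottom


head = from_levels([[3, 10, 19, 23, 30, 37, 49], [10, 23, 37], [23]])
print(contains(head, 30), contains(head, 31))  # True False
```

The $+\infty$ sentinel is what stops the rightward walk, so the loop never needs a separate end-of-list check for finite queries.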
Exercises

1. How do we search for points in a range?
2. How do we insert a new node?
3. How do we delete a node?
4. Suppose you are given a skip list. Can you strategically add and delete points so that the query times become bad (i.e., $\omega(\log n)$)? Note that you will have to play the role of an adaptive adversary that can see the coin tosses (and therefore see the data structure as it evolves).
5. Suppose the coin tosses are hidden from you and you cannot measure the actual query times. Can you still strategically add and delete points so that the query times become bad? (An adversary that cannot see the coin tosses is called an oblivious adversary.)
6. An alternative way to ask the previous question is the following. How do we prove that, under an oblivious adversary, the expected performance bounds of a skip list match those in Table 1?
kd-Trees

Recall that we now want to perform 2D range searches.

[Figure: the 2D query, with date of birth on the x-axis and salary on the y-axis. The query rectangle spans dates of birth 19,500,000 to 19,559,999 and salaries 3,000 to 4,000, and contains G. Ometer (born Aug 19, 1954, with a salary of 3,500 dollars).]

So, we need a data structure that considers both the x AND the y coordinates. A kd-tree achieves this by alternating between x and y.

Let us now recursively construct the kd-tree given a set $P$ of $n$ points in 2D. As in BBSTs, the data is stored in the leaves. The internal nodes serve the purpose of guiding searches to the required leaves. The root node (level 0) of the kd-tree corresponds to the entire data set.
To construct the level 1 nodes, i.e., the left and right children of the root, we split the data along the x median. The subtree rooted at the left child of the root stores all points whose x coordinate is no more than the x median. The rest are stored in the right subtree of the root.

[Figure: a vertical line $\ell$ at the x median splits $P$ into $P_{\text{left}}$ and $P_{\text{right}}$.]

To construct the level 2 nodes, we again split the points stored in the subtree rooted at each level 1 node into two roughly equal halves. However, this time, we split along the y median. We continue recursively, alternating between splitting along x and y medians.
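The alternating median split can be sketched as below. This is a minimal illustration under assumptions: `KdNode`, `build_kd_tree` and `leaves` are hypothetical names, the points are assumed to have distinct x coordinates and distinct y coordinates, and the sketch re-sorts at every level for brevity, which costs more than a construction that presorts the points once.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class KdNode:
    point: Optional[Point] = None     # set only at leaves
    axis: int = 0                     # 0: split on x, 1: split on y
    split: float = 0.0                # median coordinate that guides the search
    left: Optional["KdNode"] = None
    right: Optional["KdNode"] = None


def build_kd_tree(points: Sequence[Point], depth: int = 0) -> KdNode:
    """Build a 2D kd-tree with the data stored in the leaves."""
    if not points:
        raise ValueError("need at least one point")
    if len(points) == 1:
        return KdNode(point=points[0])
    axis = depth % 2                  # even levels split on x, odd levels on y
    pts = sorted(points, key=lambda p: p[axis])
    mid = (len(pts) + 1) // 2         # assumption: left half gets the extra point
    return KdNode(
        axis=axis,
        split=pts[mid - 1][axis],     # left subtree holds coordinates <= split
        left=build_kd_tree(pts[:mid], depth + 1),
        right=build_kd_tree(pts[mid:], depth + 1),
    )


def leaves(node: KdNode) -> List[Point]:
    """All data points below `node`, left to right."""
    if node.point is not None:
        return [node.point]
    return leaves(node.left) + leaves(node.right)


pts = [(1, 5), (2, 1), (3, 8), (4, 3), (5, 7), (6, 2), (7, 6), (8, 4)]
root = build_kd_tree(pts)
print(root.axis, root.split)            # 0 4
print(root.left.axis, root.left.split)  # 1 3
```

On these eight points the root splits at x = 4, and its left child then splits the four points with x at most 4 at y = 3.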