cs525 advanced database organization
play

CS525: Advanced Database Organization Notes 6: Multi-dimensional - PowerPoint PPT Presentation

CS525: Advanced Database Organization Notes 6: Multi-dimensional indexes Yousef M. Elmehdwi Department of Computer Science Illinois Institute of Technology yelmehdwi@iit.edu February 12, 14, 19, 2018 Slides: adapted from a course taught by


  1. Performance Analysis: lookup/insert a search key Assumption: The grid index file can be store in memory Insert performance In addition to the lookup cost 1 more block write operation to update the bucket block If overflow , need to update the overflow link in the bucket and write an overflow block ) 49 / 149

  2. Using a grid index in multi-dimensional queries Performance of Grid index for the commonly used multi-dimensional queries Assumption: The grid index file can be stored entirely in memory 50 / 149

  3. 1) Partial Match queries The query specifies conditions on some dimensions but not on all dimensions Find all jewelry purchases by people with age = 50 You will access m disk blocks ( m is some dimension of the grid) 51 / 149

  4. 2) Range queries Find objects that are located either partial or wholly within a certain range Find all jewelry purchases by people whose 35 ≤ age ≤ 50 , 50 K ≤ salary ≤ 100 K In this example, we must access 4 disk blocks 52 / 149

  5. Announcement Coding assignment 2 due date: Sunday, March 11, 2018 by midnight (Chicago time:) Quiz 1: Post: Friday February 23. Due on Blackboard: Tuesday February 27 by midnight (Chicago time) Midterm: Close notes/book/friends: March 5 in class time 53 / 149

  6. 3) Nearest neighbor queries Find the nearest neighbor of a data point 54 / 149

  7. 3) Nearest neighbor queries Start by finding the nearest neighbor in the bucket that contains the data point 55 / 149

  8. 3) Nearest neighbor queries This distance will limit the block where you need to search to all blocks that intersect with this circle: 56 / 149

  9. 3) Nearest neighbor queries Expand the search region in an adjacent bucket that contained within the circle: 57 / 149

  10. 3) Nearest neighbor queries And so forth 58 / 149

  11. 3) Nearest neighbor queries And so forth 59 / 149

  12. 3) Nearest neighbor queries Note: You may need to expand the search range beyond the adjacent regions The nearest neighbor is outside the adjacent regions You must use the current nearest neighbor and the grid lines to decide whether you need to expend the range of the search 60 / 149

  13. 3) Nearest neighbor queries: Performance The expanding range search will access on average 9 data blocks (in a 2-dimensional grid index) 61 / 149

  14. 4) Where-am-I queries Given a location (i.e., coordinate) Find the object(s ) that contains the location Grid index cannot represent objects (can only present points) ⇒ Grid Index cannot handle Where-am-I type of queries The only kind of index that can handle Where-am-I queries is the R-tree (Region-tree) (Discussed later) 62 / 149

  15. Grid Index: Summary + Good for multiple-key search - Space, management overhead (nothing is free) - Need partitioning ranges that evenly split keys 63 / 149

  16. Grid Index A major problem with Grid Index files is Poor occupancy rate at many grid buckets Especially when you have 3 or more dimensions. You will have many buckets that are empty. 64 / 149

  17. Multi-dimensional index structures Hash like structures Grid files Partitioned Hash functions Tree like structures Multiple key indexes kd-trees Quad trees R-trees 65 / 149

  18. Partitioned Hashing Traditional hashing Problem with traditional hashing If the key is composite and some component of the key is not known we cannot compute a meaningful hash value at all 66 / 149

  19. Partitioned Hashing Partitioned Hashing The key is a composite: Use n hash functions , one function on one component 67 / 149

  20. Partitioned Hashing Partitioned Hashing The hash value is the concatenation of the individual hash function values 68 / 149

  21. Partitioned Hashing: Example 69 / 149

  22. Advantage of Partitioned Hashing Partitioned Hashing can generate a meaningful hash value for incomplete keys 70 / 149

  23. Partitioned Hashing: A complete example Data on people who buy jewelry Given hash functions Some Hash Function values 71 / 149

  24. Partitioned Hashing: A complete example The Partitioned Hash index 72 / 149

  25. Using a Partitioned Hashing The Partitioned Hash index 73 / 149

  26. 1) Partial Match queries Find people with age = 50 Age = 50 will hash to the hash value Hash(age) = 0 × × . Start at bucket 000 and scan to bucket 011 74 / 149

  27. 2) Range queries Find objects that are located either partial or wholly within a certain range Find people such that: 35 ≤ age ≤ 50 , 50 K ≤ salary ≤ 100 K 75 / 149

  28. 2) Range queries a) Hash all values inside the range Note: the block pointers can have duplicates b) Collect all the buckets (eliminate duplicate block pointers) c) Access all (unique) buckets (disk blocks) ⇒ Hashing is not appropriate for range type queries 76 / 149

  29. 3) Nearest neighbor queries Hashing is completely useless for nearest neighbor type queries Because: There is no notion of distance in the hash function Example: find records that with distance ≤ 1 to search key = 1 We hash the search key 1 77 / 149

  30. 3) Nearest neighbor queries However, we cannot use the distance in the hash table to locate “nearby” objects (records) The value 2 is near the value 1, but may get hash very far away 78 / 149

  31. Property of hashing: Closeness of bucket indexes has nothing to do with real distance between data points (because hashing computes a random number) 79 / 149

  32. 4) Where-am-I queries Hashing is also not useful here either Because hashing provide no information on distance 80 / 149

  33. Advantage of Partitioned Hashing Good hash functions will randomize the records ⇒ Partitioned hashing will achieve good occupancy rate per bucket 81 / 149

  34. Multi-dimensional index structures Hash like structures Grid files Partitioned Hash functions Tree like structures Multiple key indexes kd-trees Quad trees R-trees 82 / 149

  35. Multiple-key index special case of a multilevel index using different types of search keys in each level 83 / 149

  36. Multiple-key index: Example Data on people who buy jewelry A multiple-key index on keys (age, salary) 84 / 149

  37. Using a Multiple-key index: 1) Partial Match queries Find all people with age = 25 Use the index on age to find the index block(s) for age = 25 Then, scan all entries in the salary index file (list of blocks) indexed by age= 25 to find the records 85 / 149

  38. 1) Partial Match queries Multiple-key index for partial match query will only be useful when the first dimension is given We cannot use multiple-key index to process the following query efficiently 86 / 149

  39. 1) Partial Match queries Find all people who earn $60,000 who buy jewelry. We will need to scan the first index Result: many disk accesses 87 / 149

  40. 2) Range queries Find objects that are located either partial or wholly within a certain range Find people such that: 35 ≤ age ≤ 50 , 50 K ≤ salary ≤ 100 K 88 / 149

  41. 2) Range queries Use the range of age to find all of the subindexes that might contain answer Only need to search a limited number of lower level index files 89 / 149

  42. 3) Nearest neighbor queries The multiple key index can help in the processing of Nearest neighbor queries BUT: It involves a complicated expanding range search algorithm in “nearby branches” of the index tree 90 / 149

  43. 4) Where-am-I queries Multiple-key index are not used in Where-am-I queries 91 / 149

  44. Multi-dimensional index structures Hash like structures Grid files Partitioned Hash functions Tree like structures Multiple key indexes kd-trees Quad trees R-trees 92 / 149

  45. kd (k-dimensional) tree: The kd-tree as a main memory data structure Adaptation of the kd-tree for disk storage 93 / 149

  46. Review: Binary Search Tree Binary Search Tree (BST) is a binary tree where The values in the nodes in the left subtree of the node x in the tree has a smaller value than x The values in the nodes in the right subtree of the node x in the tree has a greater value than x Notice the above property holds for every node in the binary tree 94 / 149

  47. Review: Binary Search Tree: Example 95 / 149

  48. Review: Binary Search Tree: Example 96 / 149

  49. The kd-tree The kd-tree is a generalization of the classic Binary Search Tree (BST) The search key used at different levels belongs to a different dimension (domain) The dimensions at different levels will wrap around (i.e., circulate) 97 / 149

  50. Example: a 2-dimensional kd-tree 2 dimentions: x and y 98 / 149

  51. Properties Subtrees of x 1 must satisfy this property 99 / 149

  52. Properties Subtrees of y 1 and y 2 must satisfy this property And so on (for every level of the kd-tree ) 100 / 149

Recommend


More recommend