CS 225 Data Structures, October 19: Intro Kd-trees and BTrees


  1. CS 225 Data Structures, October 19 – Intro Kd-trees and BTrees. G Carl Evans

  2. Range-based Searches: Balanced BSTs are useful structures for range-based and nearest-neighbor searches. Q: Consider points in 1D: p = {p_1, p_2, …, p_n}. What points fall in [11, 42]? Ex: 6 11 41 44 3 33 55

  3. Range-based Searches: Q: Consider points in 1D: p = {p_1, p_2, …, p_n}. What points fall in [11, 42]? Tree construction:

  4. Range-based Searches: Q: Consider points in 1D: p = {p_1, p_2, …, p_n}. What points fall in [11, 42]? (Figure: tree over the points, by level: 33 / 6 44 / 3 11 41 55 / 3 6 11 33 41 44.)
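
The 1D question above can be sketched in code. This is a minimal illustration, not the course's implementation: it uses a plain, unbalanced BST that stores a key in every node (the tree in the figure keeps the points in its leaves instead), and the names `BstNode`, `insert`, `rangeSearch`, and `pointsInRange` are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical minimal BST node; not taken from the course code.
struct BstNode {
    int key;
    std::unique_ptr<BstNode> left, right;
    explicit BstNode(int k) : key(k) {}
};

// Plain BST insert; balancing is omitted to keep the sketch short.
// Keys equal to an existing key go to the right subtree.
void insert(std::unique_ptr<BstNode>& root, int key) {
    if (!root) { root = std::make_unique<BstNode>(key); return; }
    if (key < root->key) insert(root->left, key);
    else insert(root->right, key);
}

// In-order walk that appends every key in [lo, hi] to out,
// skipping subtrees that cannot contain keys in the range.
void rangeSearch(const BstNode* node, int lo, int hi, std::vector<int>& out) {
    if (!node) return;
    if (lo < node->key) rangeSearch(node->left.get(), lo, hi, out);
    if (lo <= node->key && node->key <= hi) out.push_back(node->key);
    if (node->key <= hi) rangeSearch(node->right.get(), lo, hi, out);
}

// Convenience wrapper: build a BST from the points, then query it.
std::vector<int> pointsInRange(const std::vector<int>& points, int lo, int hi) {
    assert(lo <= hi);
    std::unique_ptr<BstNode> root;
    for (int p : points) insert(root, p);
    std::vector<int> out;
    rangeSearch(root.get(), lo, hi, out);
    return out;
}
```

Calling `pointsInRange({6, 11, 41, 44, 3, 33, 55}, 11, 42)` on the slide's example points yields 11, 33, 41. With a balanced tree, the walk visits O(log n + k) nodes for k reported points.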

  5. Range-based Searches: Consider points in 2D: p = {p_1, p_2, …, p_n}. Q: What points are in the rectangle [(x_1, y_1), (x_2, y_2)]? Q: What is the nearest point to (x_1, y_1)? (Figure: points p_1 through p_7 in the plane.)

  6. Range-based Searches: Consider points in 2D: p = {p_1, p_2, …, p_n}. Tree construction: (Figure: points p_1 through p_7 in the plane.)

  7. Range-based Searches: (Figure: points p_1 through p_7 in the plane, alongside a tree over p_1 through p_7.)

  8. kD-Trees: (Figure: points p_1 through p_7 in the plane, alongside a tree over p_1 through p_7.)
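
One common way to build a 2D kD-tree is to split on the median, alternating between x and y at each level. The sketch below follows that scheme and answers the rectangle question by counting points. It is an illustration under assumptions: the slides do not show their construction, and `Point`, `KDNode`, `build`, `countInRect`, and `countPointsInRect` are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

using Point = std::array<double, 2>;  // {x, y}

struct KDNode {
    Point pt;
    std::unique_ptr<KDNode> left, right;
};

// Build over pts[lo, hi): split on x at even depth and on y at odd depth,
// placing the median of the current dimension at this node.
std::unique_ptr<KDNode> build(std::vector<Point>& pts, std::size_t lo,
                              std::size_t hi, int depth) {
    if (lo >= hi) return nullptr;
    const int dim = depth % 2;
    const std::size_t mid = lo + (hi - lo) / 2;
    std::nth_element(pts.begin() + lo, pts.begin() + mid, pts.begin() + hi,
                     [dim](const Point& a, const Point& b) { return a[dim] < b[dim]; });
    auto node = std::make_unique<KDNode>();
    node->pt = pts[mid];
    node->left = build(pts, lo, mid, depth + 1);
    node->right = build(pts, mid + 1, hi, depth + 1);
    return node;
}

// Count points inside the axis-aligned rectangle [lo, hi], bounds inclusive.
int countInRect(const KDNode* node, const Point& lo, const Point& hi, int depth) {
    if (!node) return 0;
    const int dim = depth % 2;
    int count = 0;
    if (lo[0] <= node->pt[0] && node->pt[0] <= hi[0] &&
        lo[1] <= node->pt[1] && node->pt[1] <= hi[1]) {
        count = 1;
    }
    // Left subtree holds coordinates <= the split value, right holds >= it,
    // so a side is visited only if the rectangle can reach it.
    if (lo[dim] <= node->pt[dim]) count += countInRect(node->left.get(), lo, hi, depth + 1);
    if (node->pt[dim] <= hi[dim]) count += countInRect(node->right.get(), lo, hi, depth + 1);
    return count;
}

// Convenience wrapper; takes the points by value because build reorders them.
int countPointsInRect(std::vector<Point> pts, const Point& lo, const Point& hi) {
    assert(lo[0] <= hi[0] && lo[1] <= hi[1]);
    auto root = build(pts, 0, pts.size(), 0);
    return countInRect(root.get(), lo, hi, 0);
}
```

A nearest-neighbor query walks the same structure but also tracks the best distance found so far, which is left out here to keep the sketch short.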

  9. B-Trees: Q: Can we always fit our data in main memory? Q: Where else can we keep our data? However, our big-O analysis has assumed uniform time for all operations.

  10. Vast Differences in Time: A 3GHz CPU performs 3m operations in _______.
      Old Argument: “Disk Storage is Slow”
      - Bleeding-edge storage is pretty fast: SSD
      - Large disks (25 TB+) still have slow throughput:
      New Argument: “The Cloud is Slow!”
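
As a rough worked estimate for the CPU question, assume that "3m" means 3 million and that the CPU completes about one operation per clock cycle. Both are simplifying assumptions, not statements from the slide.

```latex
t \approx \frac{3 \times 10^{6}\ \text{operations}}{3 \times 10^{9}\ \text{operations/s}}
  = 10^{-3}\ \text{s} = 1\ \text{ms}
```

Under those assumptions, any storage access that takes on the order of milliseconds costs about as much as millions of CPU operations.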

  11. AVLs on Disk: (Figure: AVL tree with keys 8 5 10 3 6 9 12 4 7 11 1 2.)

  12. Real Application: Imagine storing TikTok profiles for everyone in the US: How many records? How much data in total? How deep is the AVL tree?
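
To get a feel for the depth question, the sketch below computes the height of a perfectly balanced binary tree holding n keys. The record count used in the usage note (about 330 million, roughly the US population) is an assumption for illustration, and `balancedHeight` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Height (counted in edges) of a perfectly balanced binary tree with n >= 1 keys,
// which is floor(log2(n)). Integer shifts avoid floating-point rounding.
int balancedHeight(std::uint64_t n) {
    assert(n >= 1);
    int h = 0;
    while (n > 1) {
        n >>= 1;
        ++h;
    }
    return h;
}
```

For n = 330,000,000 this gives 28. An AVL tree is not perfectly balanced and can be taller, up to roughly 1.44 log2(n), so a lookup could need a few dozen node reads if every node lived in a separate disk block.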

  13. BTree Motivations: Knowing that we have large seek times for data, we want to:

  14. BTree (of order m): Goal: Minimize the number of reads! Build a tree that uses ______________________ / node [1 network packet] [1 disk block]. (Figure: one node, m=9, holding the keys -3 8 23 25 31 42 43 55.)

  15. BTree Insertion: A BTree of order m is an m-way tree:
      - All keys within a node are ordered.
      - All leaves hold no more than m-1 keys.
      (Example: m=5)

  16. BTree Insertion: When a BTree node reaches m keys: (Example: m=5)
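
The slide leaves the overflow case open. In the standard BTree algorithm, a node that reaches m keys is split around its median key, and the median moves up into the parent. The sketch below shows only that split step on a sorted key list. `SplitResult` and `splitFullNode` are hypothetical names, and choosing index `size / 2` as the median is one convention among several.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Result of splitting an overfull node: the median is promoted to the parent,
// and the keys on either side become two separate nodes.
struct SplitResult {
    std::vector<int> left;
    int median;
    std::vector<int> right;
};

// keys must be sorted and non-empty; a node of order m is split when it holds m keys.
SplitResult splitFullNode(const std::vector<int>& keys) {
    assert(!keys.empty());
    const std::size_t mid = keys.size() / 2;
    return SplitResult{
        std::vector<int>(keys.begin(), keys.begin() + mid),
        keys[mid],
        std::vector<int>(keys.begin() + mid + 1, keys.end())};
}
```

With m=5, a node holding the illustrative keys 1 2 3 4 5 splits into 1 2 and 4 5, with 3 promoted. If the promotion makes the parent reach m keys, the same split repeats one level up.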

  17. BTree Recursive Insert: (Figure, m=3: root 23 42; leaves -3 8, 25 31, 43 55.)

  18. BTree Recursive Insert: (Figure, m=3: root 23 42; leaves -3 8, 25 31, 43 55.)

  19. BTree Visualization/Tool: https://www.cs.usfca.edu/~galles/visualization/BTree.html

  20. BTree Properties: A BTree of order m is an m-way tree:
      - All keys within a node are ordered.
      - All leaves hold no more than m-1 keys.
      - All internal nodes have exactly one more child than keys.
      - The root can be a leaf or have [2, m] children.
      - All non-root, internal nodes have [ceil(m/2), m] children.
      - All leaves are on the same level.
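
The per-node properties listed above can be expressed as a small check. This is a sketch under assumptions: `minChildren` and `validNode` are hypothetical helpers, a node is described only by its key and child counts, and the global property (all leaves on the same level) is not checked.

```cpp
#include <cassert>

// ceil(m / 2) in integer arithmetic.
int minChildren(int m) { return (m + 1) / 2; }

// Check the per-node rules for a BTree of order m.
// childCount == 0 marks a leaf.
bool validNode(int m, int keyCount, int childCount, bool isRoot) {
    assert(m >= 3);
    const bool isLeaf = (childCount == 0);
    if (keyCount < 1 || keyCount > m - 1) return false;  // at most m-1 keys
    if (isLeaf) return true;                             // a root may also be a leaf
    if (childCount != keyCount + 1) return false;        // one more child than keys
    const int lowest = isRoot ? 2 : minChildren(m);      // root: [2, m]; others: [ceil(m/2), m]
    return childCount >= lowest && childCount <= m;
}
```

For m=5, a non-root internal node with 1 key and 2 children fails the check because it needs at least ceil(5/2) = 3 children, while the same shape is acceptable for the root.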

  21. BTree: (Figure: BTree with root 17; internal nodes 3 8 and 28 48; leaf keys 1 2 6 7 12 14 16 25 26 29 45 52 53 55 68.)

  22. BTree Search: (Figure: BTree by level: 23 / -3, 42 55 / -11, 8, 25 31, 43, 60.)

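  23. BTree Search:

         bool Btree::_exists(BTreeNode & node, const K & key) {
           // Skip past every key in this node that is smaller than the search key.
           unsigned i;
           for ( i = 0; i < node.keys_ct_ && key > node.keys_[i]; i++) { }

           // The key is stored in this node.
           if ( i < node.keys_ct_ && key == node.keys_[i] ) {
             return true;
           }

           // A leaf has no children left to search.
           if ( node.isLeaf() ) {
             return false;
           } else {
             // Fetch child i (potentially a disk seek) and search it recursively.
             BTreeNode nextChild = node._fetchChild(i);
             return _exists(nextChild, key);
           }
         }

     (Figure: BTree by level: 23 / -3, 42 55 / -11, 8, 25 31, 43, 60.)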
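
The search routine above depends on class members that the slide does not show. A self-contained, in-memory version might look like the following sketch. `MemNode`, `exists`, and `exampleTree` are hypothetical names, children live in a `std::vector` instead of being fetched from disk, and the example tree is one reading of the figure (root 23; children -3 and 42 55; leaves -11, 8, 25 31, 43, 60). A `std::vector` of the enclosing type relies on C++17 incomplete-type support.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// In-memory stand-in for a BTree node.
struct MemNode {
    std::vector<int> keys;          // sorted ascending
    std::vector<MemNode> children;  // empty for a leaf, else keys.size() + 1 entries
    bool isLeaf() const { return children.empty(); }
};

// Same control flow as the slide's _exists, without the disk fetch.
bool exists(const MemNode& node, int key) {
    std::size_t i = 0;
    // Advance past keys smaller than the search key.
    while (i < node.keys.size() && key > node.keys[i]) ++i;
    if (i < node.keys.size() && key == node.keys[i]) return true;
    if (node.isLeaf()) return false;
    // Child i covers the gap between keys[i-1] and keys[i].
    return exists(node.children[i], key);
}

// Builds the example tree as read from the figure (an assumption).
MemNode exampleTree() {
    MemNode left{{-3}, {MemNode{{-11}, {}}, MemNode{{8}, {}}}};
    MemNode right{{42, 55},
                  {MemNode{{25, 31}, {}}, MemNode{{43}, {}}, MemNode{{60}, {}}}};
    return MemNode{{23}, {left, right}};
}
```

Searching for 31 visits three nodes (23, then 42 55, then 25 31), one per level, which is why the tree's height bounds the number of seeks.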

  24. BTree Analysis: The height of the BTree determines the maximum number of ____________ possible when searching the data. …and the height of the structure is: ______________. Therefore: The number of seeks is no more than __________. …suppose we want to prove this!

  25. BTree Analysis: In our AVL analysis, we saw that finding an upper bound on the height (given n) is the same as finding a lower bound on the nodes (given h). We want to find a relationship for BTrees between the number of keys (n) and the height (h).

  26. BTree Analysis: Strategy: We will first count the number of nodes, level by level. Then, we will add the minimum number of keys per node (n). The minimum number of nodes will tell us the largest possible height (h), allowing us to find an upper bound on height.

  27. BTree Analysis: The minimum number of nodes for a BTree of order m at each level: root: level 1: level 2: level 3: … level h:

  28. BTree Analysis: The total number of nodes is the sum of all of the levels:

  29. BTree Analysis: The total number of keys:

  30. BTree Analysis: The smallest total number of keys is: So an inequality about n, the total number of keys: Solving for h, since h is the number of seek operations:
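
One way to carry out the strategy above is sketched here. It writes $t = \lceil m/2 \rceil$ and assumes that the root holds at least one key and that every other node, leaves included, holds at least $t - 1$ keys. The slides leave these steps blank, so this is a standard derivation and not necessarily the one given in lecture.

```latex
\begin{align*}
\text{Minimum nodes per level:}\quad
  & \text{root: } 1,\quad \text{level 1: } 2,\quad \text{level 2: } 2t,\quad
    \text{level 3: } 2t^{2},\ \dots,\ \text{level } h\text{: } 2t^{h-1} \\
\text{Minimum total nodes:}\quad
  & 1 + 2\sum_{k=0}^{h-1} t^{k} = 1 + 2\,\frac{t^{h} - 1}{t - 1} \\
\text{Minimum total keys:}\quad
  & 1 + 2\,\frac{t^{h} - 1}{t - 1}\,(t - 1) = 2t^{h} - 1 \\
\text{Inequality on } n\text{:}\quad
  & n \ge 2t^{h} - 1 \\
\text{Solving for } h\text{:}\quad
  & h \le \log_{t}\!\left(\frac{n + 1}{2}\right)
    = \log_{\lceil m/2 \rceil}\!\left(\frac{n + 1}{2}\right)
\end{align*}
```

So the height, and with it the number of seeks, grows as $O(\log_m n)$.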

  31. BTree Analysis: Given m=101, a tree of height h=4 has: Minimum Keys: Maximum Keys:
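
The two blanks can be worked out numerically. The sketch below assumes that height is counted in edges (a lone root has h=0), that a minimal tree holds 2·ceil(m/2)^h - 1 keys, and that a maximal tree has every node full with m-1 keys, giving m^(h+1) - 1 keys. `ipow`, `minKeys`, and `maxKeys` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Integer power by repeated multiplication; no overflow checking.
std::uint64_t ipow(std::uint64_t base, int exp) {
    std::uint64_t result = 1;
    for (int i = 0; i < exp; ++i) result *= base;
    return result;
}

// Fewest keys in a BTree of order m and height h: 2 * ceil(m/2)^h - 1.
std::uint64_t minKeys(std::uint64_t m, int h) {
    const std::uint64_t t = (m + 1) / 2;  // ceil(m / 2)
    return 2 * ipow(t, h) - 1;
}

// Most keys: (m^(h+1) - 1) / (m - 1) nodes with m - 1 keys each, so m^(h+1) - 1.
std::uint64_t maxKeys(std::uint64_t m, int h) {
    return ipow(m, h + 1) - 1;
}
```

Under these assumptions, `minKeys(101, 4)` is 13,530,401 and `maxKeys(101, 4)` is 10,510,100,500, so four levels of seeks below the root can cover between about 13.5 million and 10.5 billion keys.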
