CSE 373: B-trees. Michael Lee. Wednesday, Jan 31, 2018

  1. CSE 373: B-trees. Michael Lee. Wednesday, Jan 31, 2018

  2. Motivation
     What we've done so far: study different dictionary implementations:
     ◮ ArrayDictionary
     ◮ SortedArrayDictionary
     ◮ Binary search trees
     ◮ AVL trees
     ◮ Hash tables
     They all make one common assumption: all our data is stored in memory, in RAM.

  4. Motivation
     New challenge: what if our data is too large to store entirely in RAM? (For example, if we were trying to implement a database?) How can we do this efficiently? Two techniques:
     ◮ A tree-based technique: excels at range lookups (e.g. “find all users with an age between 20 and 30”, where “age” is the key)
     ◮ A hash-based technique: excels at lookups of a specific key-value pair

  6. A tree-based technique
     Idea 1: Use an AVL tree.
     Suppose the tree has a height of 50. In the best case, how many disk accesses do we need to make? In the worst case?
     In the best case, the nodes we want happen to be stored in RAM, so we need zero accesses. In the worst case, each node is stored on a different page on disk, so we need to make 50 accesses.

  8. M-ary search trees
     Idea 1:
     ◮ Instead of having each node have 2 children, make it have M children. Each node contains a sorted array of children nodes.
     ◮ Pick M so that each node fits into a single page.
     Example: [figure not preserved]

  11. M-ary search trees
      ◮ What is the height of an M-ary search tree in terms of M and n? Assume the tree is balanced.
        The height is approximately log_M(n).
      ◮ What is the worst-case runtime of get(...)?
        We need to examine log_M(n) nodes. For each node, we need to find the child to pick. We can do so using binary search: log_2(M).
        Total runtime: height · workPerNode = log_M(n) · log_2(M).
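
As a rough illustration of the formula above, the sketch below computes the approximate height and comparison count for a balanced M-ary tree. The function names and the sample values of n and M are assumptions chosen for illustration; they do not come from the slides.

```python
def mary_height(n: int, m: int) -> int:
    """Smallest h with m**h >= n, i.e. roughly log_M(n), using integer math."""
    height, capacity = 0, 1
    while capacity < n:
        capacity *= m
        height += 1
    return height


def get_cost(n: int, m: int) -> int:
    """Approximate comparisons for get(...): height * workPerNode."""
    work_per_node = (m - 1).bit_length()  # ceil(log_2(m)) for m >= 1
    return mary_height(n, m) * work_per_node


# Hypothetical sizes: one million keys, binary vs. 100-ary branching.
for m in (2, 100):
    print(m, mary_height(1_000_000, m), get_cost(1_000_000, m))
```

With these sample numbers the total number of comparisons barely changes, but the number of nodes on the search path drops sharply as M grows, which is the quantity that matters once each node lives on its own disk page.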

  13. M-ary trees
      With M-ary trees, how many disk accesses do we make, assuming each node is stored on one page? Is it log_M(n), or log_M(n) · log_2(M)?
      It's log_M(n) · log_2(M)! When doing binary search, we need to check the child to see if its key is the one we should pick, and each such check may mean reading that child's page from disk.

  14. B-Trees
      Idea 2:
      ◮ Rather than visiting each child, what if we stored the info we need (the keys) in the parent?
      ◮ To avoid redundancy, store values only in leaf nodes.
      Internal node: a node that stores only keys and pointers to children nodes.
      Leaf node: a node that stores only keys and values.
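
One way to picture the two node kinds is as two small record types. This is only a sketch: the class and field names are hypothetical, and integer keys are assumed for simplicity.

```python
from dataclasses import dataclass, field
from typing import Any, List, Union


@dataclass
class LeafNode:
    """Stores only keys and their values, kept sorted by key."""
    keys: List[int] = field(default_factory=list)
    values: List[Any] = field(default_factory=list)


@dataclass
class InternalNode:
    """Stores only keys and pointers to children; no values live here."""
    keys: List[int] = field(default_factory=list)
    children: List[Union["InternalNode", LeafNode]] = field(default_factory=list)


# A tiny hypothetical tree: one internal root with a single key and two leaves.
left = LeafNode(keys=[1, 5], values=["a", "b"])
right = LeafNode(keys=[10, 15], values=["k", "a"])
root = InternalNode(keys=[10], children=[left, right])
```

Keeping values out of internal nodes means more keys fit in each internal node, so the branching factor can be larger for the same page size.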

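  16. B-Trees
      An example: [figure: a B-tree whose leaf nodes hold the key-value pairs 1 a, 5 b, 9 f, 10 k, 15 a, 17 c, 18 d, 19 z, 25 m, 26 e, 27 a, 29 a, 31 a, 32 b, 33 f; the internal keys appear to be 10, 20, 30]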

  17. B-Trees
      A larger example (values in leaf nodes omitted): [figure: a multi-level B-tree with keys ranging from 1 to 100; layout not recoverable]

  18. B-tree invariants
      The B-tree invariants:
      1. The B-tree node type invariant
      2. The B-tree order invariant
      3. The B-tree structure invariant

  19. The B-tree node type invariant
      B-tree node type invariant: A B-tree has two types of node: internal nodes and leaf nodes.

  21. The B-tree node type invariant
      B-tree internal node: An internal node contains M pointers to children and M − 1 sorted keys. Note: M > 2 must be true.
      Example of internal node where M = 6: [figure: a row of keys K with child pointers between them]
      B-tree leaf node: A leaf node contains L key-value pairs, sorted by key.
      Example of leaf node where L = 3: [figure: three key-value pairs K V]
      Note: M and L are parameters the creator of the B-tree must pick.
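
Since M and L are usually picked so that a node fits in one page, a back-of-the-envelope calculation can suggest values. The page, key, pointer, and value sizes below are assumptions for illustration only, and the sketch ignores any per-node header a real system would need.

```python
def max_children(page_bytes: int, key_bytes: int, pointer_bytes: int) -> int:
    """Largest M such that M pointers plus M - 1 keys fit in one page."""
    # M * pointer + (M - 1) * key <= page  =>  M <= (page + key) / (pointer + key)
    return (page_bytes + key_bytes) // (pointer_bytes + key_bytes)


def max_leaf_items(page_bytes: int, key_bytes: int, value_bytes: int) -> int:
    """Largest L such that L key-value pairs fit in one page."""
    return page_bytes // (key_bytes + value_bytes)


# Assumed sizes: 4096-byte pages, 8-byte keys and pointers, 56-byte values.
M = max_children(4096, 8, 8)
L = max_leaf_items(4096, 8, 56)
print(M, L)
```

Larger values shrink L but leave M untouched, because internal nodes never store values.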

  22. The B-tree order invariant
      B-tree order invariant: For any given key k, all subtrees to the left may only contain keys x that satisfy x < k. All subtrees to the right may only contain keys x that satisfy x ≥ k.
      This means the subtree between two adjacent keys a and b may only contain keys x that satisfy a ≤ x < b.
      Example: for a node with keys 3, 7, 12, 21, the five subtrees hold, from left to right: x < 3, 3 ≤ x < 7, 7 ≤ x < 12, 12 ≤ x < 21, 21 ≤ x.
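
The order invariant is what lets a lookup choose a child using only the keys stored in the parent. A minimal sketch, assuming the keys are held in a sorted Python list (the helper name `child_index` is hypothetical):

```python
from bisect import bisect_right


def child_index(keys, x):
    """Index of the subtree that may contain x, given a node's sorted keys.

    bisect_right counts the keys that are <= x, which matches the rule
    a <= x < b for the subtree between adjacent keys a and b.
    """
    return bisect_right(keys, x)


keys = [3, 7, 12, 21]
for x in (2, 3, 6, 7, 20, 21):
    print(x, child_index(keys, x))
```

Because `bisect_right` is a binary search, picking the child costs about log_2(M) comparisons and touches no child node.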

  24. The B-tree structure invariant
      B-tree structure when n ≤ L: If n ≤ L, the root node is a leaf.
      B-tree structure when n > L: When n > L, the root node MUST be an internal node containing 2 to M children.
      All other internal nodes must have ⌈M/2⌉ to M children.
      All leaf nodes must have ⌈L/2⌉ to L key-value pairs.
      In other words: all nodes must be at least half-full. The only exception is the root, which can have as few as 2 children.
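
The fill bounds can be written down as two small predicates. This is a sketch with hypothetical function names; treating a root leaf as valid with any count from 0 to L is an assumption made here to cover the n ≤ L case.

```python
def min_fill(capacity: int) -> int:
    """ceil(capacity / 2), computed with integer arithmetic."""
    return (capacity + 1) // 2


def internal_ok(num_children: int, M: int, is_root: bool = False) -> bool:
    """Root: 2 to M children. Any other internal node: ceil(M/2) to M."""
    low = 2 if is_root else min_fill(M)
    return low <= num_children <= M


def leaf_ok(num_items: int, L: int, is_root: bool = False) -> bool:
    """Non-root leaf: ceil(L/2) to L items. Root leaf (n <= L): assumed 0 to L."""
    low = 0 if is_root else min_fill(L)
    return low <= num_items <= L


print(internal_ok(3, 6), internal_ok(2, 6), internal_ok(2, 6, is_root=True))
print(leaf_ok(2, 3), leaf_ok(1, 3), leaf_ok(1, 3, is_root=True))
```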

  26. Why?
      ◮ Why must M > 2?
        Otherwise, we could end up with a linked list.
      ◮ Why do we insist almost all nodes must be at least half-full?
        It lets us ensure the tree stays balanced.
      ◮ Why is the root allowed to have as few as 2 children?
        If n is relatively small compared to M and L, it may not be possible for the root to actually be half-full.

  28. B-tree get
      [figure: an example B-tree over two-digit keys; layout not recoverable]
      Try running get(6), get(39).
      What's the worst-case runtime of get(...)? Number of disk accesses?
      Runtime is roughly the same as for M-ary trees: about log_M(n) · log_2(M) to descend through the internal nodes, plus log_2(L) to search the leaf.
      Number of disk accesses is log_M(n).
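
To make the disk-access count concrete, here is a minimal sketch of get(...) that also reports how many nodes it visited. The class names, the tiny two-level tree, and the choice to count the root as one access are all assumptions for illustration; the tree is not the one from the slide's figure.

```python
from bisect import bisect_left, bisect_right


class Leaf:
    def __init__(self, keys, values):
        self.keys, self.values = keys, values


class Internal:
    def __init__(self, keys, children):
        self.keys, self.children = keys, children


def get(root, key):
    """Return (value, nodes_visited); value is None if the key is absent."""
    node, visited = root, 1
    while isinstance(node, Internal):
        # Pick the child from the parent's keys alone, then load only that child.
        node = node.children[bisect_right(node.keys, key)]
        visited += 1
    i = bisect_left(node.keys, key)  # binary search inside the leaf
    if i < len(node.keys) and node.keys[i] == key:
        return node.values[i], visited
    return None, visited


# Hypothetical tree with M = 3 and L = 3.
root = Internal(
    keys=[10, 20],
    children=[
        Leaf([1, 5, 9], ["a", "b", "c"]),
        Leaf([10, 15, 17], ["d", "e", "f"]),
        Leaf([20, 25, 30], ["g", "h", "i"]),
    ],
)
print(get(root, 15))
print(get(root, 6))
```

If each node sits on its own page, `nodes_visited` equals the number of page reads, one per level, regardless of how many comparisons happen inside each node.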
