External Memory Geometric Data Structures Lars Arge Duke University - PowerPoint PPT Presentation


  1. External Memory Geometric Data Structures
     Lars Arge, Duke University
     June 27, 2002, Summer School on Massive Datasets

  2. External memory data structures: External Memory Geometric Data Structures
     • Many massive dataset applications involve geometric data (or data that can be interpreted geometrically)
       – Points, lines, polygons
     • Data need to be stored in data structures on external storage media such that on-line queries can be answered I/O-efficiently
     • Data often need to be maintained during dynamic updates
     • Examples:
       – Phone: Wireless tracking
       – Consumer: Buying patterns (supermarket checkout)
       – Geography: NASA satellites generate 1.2 TB per day

  3. Example: LIDAR terrain data
     • Massive (irregular) point sets (1-10m resolution)
     • Appalachian Mountains (between 50GB and 5TB)
     • Need to be queried and updated efficiently
     [Figure: Example: Jockey’s Ridge (NC coast)]

  4. Model
     • Model as previously
       – N : Elements in structure
       – B : Elements per block
       – M : Elements in main memory
       – T : Output size in searching problems
     • Focus on
       – Worst-case structures
       – Dynamic structures
       – Fundamental structures
       – Fundamental design techniques
     [Figure: I/O model, with labels D, Block I/O, M, P]

  5. Outline
     • Today: Dimension one
       – External search trees: B-trees
       – Techniques/tools
         * Persistent B-trees (search in the past)
         * Buffer trees (efficient construction)
     • Tomorrow: “Dimension 1.5”
       – Handling intervals/segments (interval stabbing/point location)
       – Techniques/tools: Logarithmic method, weight-balanced B-trees, global rebuilding
     • Saturday: Dimension two
       – Two-dimensional range searching

  6. External Search Trees
     • Binary search tree:
       – Standard method for search among N elements
       – We assume elements in leaves
       – Search traces at least one root-leaf path
       – If nodes stored arbitrarily on disk
         ⇒ Search in $O(\log_2 N)$ I/Os
         ⇒ Rangesearch in $O(\log_2 N + T)$ I/Os
     [Figure: binary search tree of height $O(\log_2 N)$]

  7. External Search Trees
     • BFS blocking:
       – Block height $O(\log_2 N) / O(\log_2 B) = O(\log_B N)$
       – Output elements blocked
         ⇒ Rangesearch in $O(\log_B N + T/B)$ I/Os
     • Optimal: $O(N/B)$ space and $O(\log_B N + T/B)$ query
     [Figure: BFS-blocked tree, with labels $O(\log_2 B)$ and $\Theta(B)$]
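The gap between arbitrary placement and BFS blocking can be made concrete with a small calculation. This is a sketch only: it drops the constants hidden in the O-notation, and the function names and the sample values of N, B and T are assumptions chosen for illustration, not figures from the slides.

```python
import math

def unblocked_search_ios(n: int) -> int:
    """One I/O per node on a root-to-leaf path of a balanced binary tree."""
    return math.ceil(math.log2(n))

def bfs_blocked_search_ios(n: int, b: int) -> int:
    """With BFS blocking, each block covers about log2(b) levels of the path."""
    return math.ceil(math.log2(n) / math.log2(b))

def range_ios(search_ios: int, t: int, per_io: int) -> int:
    """Search cost plus the I/Os needed to report t elements, per_io at a time."""
    return search_ios + math.ceil(t / per_io)

# Hypothetical sizes: 2^30 elements, 2^10 elements per block, 10^6 reported.
N, B, T = 2**30, 2**10, 10**6
print("search:", unblocked_search_ios(N), "vs", bfs_blocked_search_ios(N, B))
print("range: ",
      range_ios(unblocked_search_ios(N), T, 1), "vs",
      range_ios(bfs_blocked_search_ios(N, B), T, B))
```

For these sample values the search drops from 30 to 3 I/Os, and the range search is dominated by the T versus T/B reporting term.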

  8. External Search Trees
     • Maintaining BFS blocking during updates?
       – Balance normally maintained in search trees using rotations
     • Seems very difficult to maintain BFS blocking during rotation
       – Also need to make sure output (leaves) is blocked!
     [Figure: rotation exchanging nodes x and y]

  9. B-trees
     • BFS-blocking naturally corresponds to tree with fan-out $\Theta(B)$
     • B-trees balanced by allowing node degree to vary
       – Rebalancing performed by splitting and merging nodes

  10. (a,b)-tree
      • T is an (a,b)-tree ($a \ge 2$ and $b \ge 2a-1$)
        – All leaves on the same level (contain between a and b elements)
        – Except for the root, all nodes have degree between a and b
        – Root has degree between 2 and b
      • (a,b)-tree uses linear space and has height $O(\log_a N)$
        ⇒ Choosing $a, b = \Theta(B)$, each node/leaf is stored in one disk block
        ⇒ $O(N/B)$ space and $O(\log_B N + T/B)$ query
      [Figure: (2,4)-tree]
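The three structural conditions above can be checked mechanically. The following is a minimal sketch: the `Node` class and `is_ab_tree` function are hypothetical names introduced here, and the rule that a root which is itself a leaf may hold fewer than a elements is an assumption the slide does not state.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    elements: list = field(default_factory=list)  # used by leaves
    children: list = field(default_factory=list)  # used by internal nodes

    @property
    def is_leaf(self) -> bool:
        return not self.children

def is_ab_tree(root: Node, a: int, b: int) -> bool:
    """Check the (a,b)-tree conditions: parameter range, degrees, leaf level."""
    if a < 2 or b < 2 * a - 1:
        return False
    leaf_depths = set()

    def check(node: Node, depth: int, is_root: bool) -> bool:
        if node.is_leaf:
            leaf_depths.add(depth)
            # Assumption: a root that is a leaf may hold fewer than a elements.
            low = 0 if is_root else a
            return low <= len(node.elements) <= b
        low = 2 if is_root else a      # root degree is between 2 and b
        if not (low <= len(node.children) <= b):
            return False
        return all(check(child, depth + 1, False) for child in node.children)

    # All leaves must have been seen at one single depth.
    return check(root, 0, True) and len(leaf_depths) == 1
```

A two-leaf (2,4)-tree such as `Node(children=[Node(elements=[1, 2]), Node(elements=[3, 4, 5])])` passes the check; shrinking the first leaf to one element makes it fail.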

  11. (a,b)-Tree Insert
      • Insert:
          Search and insert element in leaf v
          DO v has b+1 elements
            Split v:
              make nodes v’ and v’’ with $\lfloor (b+1)/2 \rfloor \le b$ and $\lceil (b+1)/2 \rceil \ge a$ elements
              insert element (ref) in parent(v)
              (make new root if necessary)
            v=parent(v)
      • Insert touches $O(\log_a N)$ nodes
      [Figure: node v with b+1 elements split into v’ and v’’ with $\lfloor (b+1)/2 \rfloor$ and $\lceil (b+1)/2 \rceil$ elements]
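The insert loop can be sketched in runnable form as follows. This is an in-memory illustration under stated assumptions, not the author's implementation: the `Node` layout, the routing-key convention (key i is the smallest element under child i+1) and the helper names are all introduced here, and no disk blocks are modelled.

```python
import bisect

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []          # elements (leaf) or routing keys (internal)
        self.children = children or []  # empty for a leaf
        self.parent = None
        for child in self.children:
            child.parent = self

def degree(v):
    """Number of children of an internal node, number of elements of a leaf."""
    return len(v.children) if v.children else len(v.keys)

def insert(root, x, b):
    """Insert x and return the (possibly new) root."""
    v = root
    while v.children:                              # search down to a leaf
        v = v.children[bisect.bisect_right(v.keys, x)]
    bisect.insort(v.keys, x)
    while degree(v) > b:                           # v has b+1 elements/children
        left = (b + 1) // 2                        # v' keeps floor((b+1)/2)
        if v.children:
            sep = v.keys[left - 1]                 # routing key moves up
            right = Node(v.keys[left:], v.children[left:])
            v.keys, v.children = v.keys[:left - 1], v.children[:left]
        else:
            right = Node(v.keys[left:])            # v'' gets ceil((b+1)/2)
            sep = right.keys[0]
            v.keys = v.keys[:left]
        if v.parent is None:                       # make new root
            root = Node([sep], [v, right])
        else:                                      # insert ref in parent(v)
            p = v.parent
            i = p.children.index(v)
            p.keys.insert(i, sep)
            p.children.insert(i + 1, right)
            right.parent = p
        v = v.parent
    return root

def leaves(node):
    """Leaf contents from left to right."""
    if not node.children:
        return [node.keys]
    return [leaf for child in node.children for leaf in leaves(child)]
```

Starting from an empty `Node()` and inserting 1 through 5 with b = 4 triggers exactly one split, leaving the leaves `[1, 2]` and `[3, 4, 5]` under a new root.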

  12. (a,b)-Tree Insert
      [Figure only; no slide text survives]

  13. (a,b)-Tree Delete
      • Delete:
          Search and delete element from leaf v
          DO v has a-1 children
            Fuse v with sibling v’:
              move children of v’ to v
              delete element (ref) from parent(v)
              (delete root if necessary)
            If v has >b (and $\le a+b-1$) children split v
            v=parent(v)
      • Delete touches $O(\log_a N)$ nodes
      [Figure: node v with a-1 children fused with a sibling into a node with $\ge 2a-1$ children]
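The fuse step, with its possible follow-up split, can be isolated as a small function. This is a sketch on plain Python lists standing in for child (or element) lists; the function name is hypothetical, and the update of the parent's references is left out.

```python
def fuse_then_maybe_split(v, sibling, b):
    """Fuse underfull node v with its sibling; split again if the result exceeds b.

    v and sibling are lists of children (or elements). Returns one or two lists,
    i.e. the nodes that replace v and its sibling under the parent.
    """
    merged = v + sibling                 # move children of sibling to v
    if len(merged) <= b:
        return [merged]                  # parent loses one reference
    half = len(merged) // 2              # more than b children: split v
    return [merged[:half], merged[half:]]
```

Since a fused node that needs splitting has more than b ≥ 2a-1 children, both halves end up with at least a children, so the split cannot create a new underflow.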

  14. (a,b)-Tree Delete
      [Figure only; no slide text survives]

  15. (a,b)-Tree
      • (a,b)-tree properties:
        – If $b = 2a-1$, one update can cause many rebalancing operations
        – If $b \ge 2a$, an update only causes $O(1)$ rebalancing operations amortized
        – If $b > 2a$, $O\left(\frac{1}{b/2 - a}\right) = O\left(\frac{1}{a}\right)$ rebalancing operations amortized
          * Both somewhat hard to show
        – If $b = 4a$, easy to show that an update causes $O\left(\frac{1}{a} \log_a N\right)$ rebalance operations amortized
          * After split during insert a leaf contains ≅ 4a/2 = 2a elements
          * After fuse (and possible split) during delete a leaf contains between ≅ 2a and ≅ $\frac{5}{2}a$ elements
      [Figure: (2,3)-tree with insert and delete labels]
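For the b = 4a case, the leaf-level counting behind the last two bullets can be written out as follows. This is a sketch of the standard argument with the ≅ approximations kept; it covers leaves only and is not taken from the slides.

```latex
\begin{align*}
\text{after a split:}\quad & |v| \approx 2a
  && \Rightarrow\ \text{next overflow } (|v| = 4a+1) \text{ needs about } 2a \text{ inserts} \\
 & && \phantom{\Rightarrow}\ \text{next underflow } (|v| = a-1) \text{ needs about } a \text{ deletes} \\
\text{after a fuse:}\quad & 2a \lesssim |v| \lesssim \tfrac{5}{2}a
  && \Rightarrow\ \text{at least about } \tfrac{3}{2}a \text{ inserts or } a \text{ deletes} \\
\text{hence:}\quad & \Omega(a) \text{ updates per leaf rebalance}
  && \Rightarrow\ O\!\left(\tfrac{1}{a}\right) \text{ amortized leaf rebalances per update}
\end{align*}
```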

  16. (a,b)-Tree
      • (a,b)-tree with leaf parameters $a_l$, $b_l$ ($b = 4a$ and $b_l = 4a_l$)
        – Height $O\left(\log_a \frac{N}{a_l}\right)$
        – $O\left(\frac{1}{a_l}\right)$ amortized leaf rebalance operations
        – $O\left(\frac{1}{a \cdot a_l} \log_a N\right)$ amortized internal node rebalance operations
      • B-trees: (a,b)-trees with $a, b = \Theta(B)$
        – B-trees with elements in the leaves sometimes called B+-tree
      • Fan-out k B-tree:
        – (k/4, k)-trees with leaf parameter $\Theta(B)$ and elements in leaves
      • Fan-out $\Theta(B^{1/c})$ B-tree with $c \ge 1$
        – $O(N/B)$ space
        – $O(\log_{B^{1/c}} N + T/B) = O(\log_B N + T/B)$ query
        – $O(\log_B N)$ update
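The equality in the query bound for fan-out $\Theta(B^{1/c})$ is a change of logarithm base. Written out, assuming c is a constant:

```latex
\[
\log_{B^{1/c}} N \;=\; \frac{\log N}{\tfrac{1}{c}\log B} \;=\; c\,\log_B N \;=\; O(\log_B N)
\qquad\text{for constant } c \ge 1 .
\]
```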

  17. Persistent B-tree
      • In some applications we are interested in being able to access previous versions of a data structure
        – Databases
        – Geometric data structures (later)
      • Partial persistence:
        – Update current version (getting new version)
        – Query all versions
      • We would like to have a partially persistent B-tree with
        – $O(N/B)$ space, where N is the number of updates performed
        – $O(\log_B N)$ update
        – $O(\log_B N + T/B)$ query in any version

  18. Persistent B-tree
      • Easy way to make a B-tree partially persistent
        – Copy structure at each operation
        – Maintain “version-access” structure (B-tree)
      • Good $O(\log_B N + T/B)$ query in any version, but
        – $O(N/B)$ I/O update
        – $O(N^2/B)$ space
      [Figure: versions i, i+1, i+2, i+3 produced by successive updates]
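The copy-per-update approach and its quadratic space can be seen in a few lines. This is an illustrative sketch only: a sorted tuple stands in for each B-tree copy, a Python list indexed by version number stands in for the version-access B-tree, and the class and method names are made up for the example.

```python
import bisect

class CopyOnUpdateSet:
    """Naive partial persistence: keep a full copy of the set for every version."""

    def __init__(self):
        self.versions = [()]  # versions[t] is a sorted copy of the set at time t

    def insert(self, x):
        current = list(self.versions[-1])   # copy the whole structure
        bisect.insort(current, x)
        self.versions.append(tuple(current))

    def delete(self, x):
        current = list(self.versions[-1])   # copy the whole structure
        current.remove(x)
        self.versions.append(tuple(current))

    def range_query(self, t, lo, hi):
        """Report elements in [lo, hi] as they were in version t."""
        v = self.versions[t]
        return list(v[bisect.bisect_left(v, lo):bisect.bisect_right(v, hi)])

    def total_stored(self):
        """Total number of stored element copies over all versions."""
        return sum(len(v) for v in self.versions)
```

After N inserts the stored copies add up to 1 + 2 + ... + N, which is the source of the $O(N^2/B)$ space bound once the copies are packed B elements per block.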

  19. Persistent B-tree
      • Idea:
        – Elements augmented with “existence interval”
        – Augmented elements stored in one structure
        – Elements “alive” at “time” t (version t) form B-tree
        – Version access structure (B-tree) to access B-tree root at time t
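The existence-interval idea can be illustrated on a flat list before any tree structure is involved. This sketch shows only the interval semantics: each element is stored once with a half-open interval [born, died), and a version is recovered by filtering. The names are hypothetical, and the linear scan replaces the B-tree organisation the slide describes.

```python
import math
from dataclasses import dataclass

@dataclass
class Versioned:
    key: int
    born: int                # version in which the element was inserted
    died: float = math.inf   # version in which it was deleted (inf = still alive)

class IntervalStore:
    def __init__(self):
        self.time = 0        # current version number
        self.items = []      # one record per inserted element, never copied

    def insert(self, key):
        self.time += 1
        self.items.append(Versioned(key, self.time))

    def delete(self, key):
        for item in self.items:
            if item.key == key and item.died == math.inf:
                self.time += 1
                item.died = self.time   # close the existence interval
                return
        raise KeyError(key)

    def alive_at(self, t):
        """Elements whose existence interval [born, died) contains version t."""
        return sorted(i.key for i in self.items if i.born <= t < i.died)
```

Each update adds or modifies a single record, so storage grows linearly in the number of updates rather than quadratically.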

  20. Persistent B-tree
      • Directed acyclic graph with elements in leaves (sinks)
        – Routing elements in internal nodes
      • Each element (routing element) and node has existence interval
      • Nodes alive at time t make up (B/4, B)-tree on alive elements
      • B-tree on all roots (version access structure)
        ⇒ Answer query at version t in $O(\log_B N + T/B)$ I/Os as in normal B-tree
      • Additional invariant:
        – New node (only) contains between $\frac{3}{8}B$ and $\frac{7}{8}B$ live elements
          ⇒ $O(N/B)$ blocks
      [Figure: block fill scale marked at $\frac{1}{4}B$, $\frac{3}{8}B$, $\frac{7}{8}B$, $B$, with gaps $\frac{1}{8}B$, $\frac{1}{2}B$, $\frac{1}{8}B$]
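The step from the additional invariant to the $O(N/B)$ block count is a charging argument: a new node must absorb at least $\frac{1}{8}B$ updates before it can underflow (fewer than $\frac{1}{4}B$ live elements) or overflow (more than $B$ elements). A sketch of the arithmetic, stated for leaves and assuming each rebalancing event creates $O(1)$ new nodes:

```latex
\begin{align*}
\text{underflow slack:}\quad & \tfrac{3}{8}B - \tfrac{1}{4}B = \tfrac{1}{8}B \\
\text{overflow slack:}\quad  & B - \tfrac{7}{8}B = \tfrac{1}{8}B \\
\text{blocks created:}\quad  & O\!\left(\frac{N}{B/8}\right) = O\!\left(\frac{N}{B}\right)
\end{align*}
```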

  21. Persistent B-tree Insert
      • Search for relevant leaf l and insert new element
      • If l contains more than B elements: Block overflow
        – Version split: Mark l dead and create new node v with the x alive elements
        – If $x > \frac{7}{8}B$: Strong overflow
        – If $x < \frac{3}{8}B$: Strong underflow
        – If $\frac{3}{8}B \le x \le \frac{7}{8}B$ then recursively update parent(l): Delete reference to l and insert reference to v
      [Figure: block fill scales marked at $\frac{1}{4}B$, $\frac{3}{8}B$, $\frac{7}{8}B$, $B$]
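The version split and the three-way case distinction can be sketched as two small functions. The representation of a leaf as a list of `(key, alive)` pairs and both function names are assumptions made for this example; integer arithmetic is used so the thresholds $\frac{3}{8}B$ and $\frac{7}{8}B$ are compared exactly.

```python
def version_split(leaf):
    """Return the live elements of a leaf given as (key, alive) pairs.

    The caller marks the old leaf dead and stores the result in a new node v.
    """
    return [key for key, alive in leaf if alive]

def classify(x, B):
    """Classify a new node holding x live elements against 3/8*B and 7/8*B."""
    if 8 * x > 7 * B:
        return "strong overflow"
    if 8 * x < 3 * B:
        return "strong underflow"
    return "ok"  # 3/8*B <= x <= 7/8*B: only the parent references change
```

For B = 8, for instance, a node with 7 live elements is acceptable, while 8 is a strong overflow and 2 is a strong underflow.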

  22. Persistent B-tree Insert
      • Strong overflow ($x > \frac{7}{8}B$)
        – Split v into v’ and v’’ with $\frac{x}{2}$ elements each ($\frac{3}{8}B < \frac{x}{2} \le \frac{1}{2}B$)
        – Recursively update parent(l): Delete reference to l and insert reference to v’ and v’’
      • Strong underflow ($x < \frac{3}{8}B$)
        – Merge x elements with y live elements obtained by version split on sibling ($x + y \ge \frac{1}{2}B$)
        – If $x + y \ge \frac{7}{8}B$ then (strong overflow) perform split
        – Recursively update parent(l): Delete two references, insert one or two references
      [Figure: block fill scales marked at $\frac{1}{4}B$, $\frac{3}{8}B$, $\frac{7}{8}B$, $B$, before and after the split]
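The two repair steps can be sketched on plain lists of live keys. The function names are hypothetical, the parent update is omitted, and when the element count is odd the two halves differ by one element, which the slide's $\frac{x}{2}$ notation glosses over.

```python
def strong_overflow_split(live):
    """Split the live elements of v into two halves v' and v''."""
    half = len(live) // 2
    return live[:half], live[half:]

def strong_underflow_merge(live, sibling_live, B):
    """Merge v's x live elements with the y live elements of a version-split sibling.

    Returns one node, or two if the merged node is itself a strong overflow
    (x + y >= 7/8 * B, compared in integers).
    """
    merged = sorted(live + sibling_live)
    if 8 * len(merged) >= 7 * B:
        return list(strong_overflow_split(merged))
    return [merged]
```

With B = 16, merging 5 live elements with a sibling's 4 gives a single node of 9, while merging with a sibling's 10 gives 15 elements, which are split into nodes of 7 and 8.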

  23. Persistent B-tree Delete
      • Search for relevant leaf l and mark element dead
      • If l contains $x < \frac{1}{4}B$ alive elements: Block underflow
        – Version split: Mark l dead and create new node v with x alive elements
        – Strong underflow ($x < \frac{3}{8}B$): Merge (version split) and possibly split (strong overflow)
        – Recursively update parent(l): Delete two references, insert one or two references
      [Figure: block fill scale marked at $\frac{1}{4}B$, $\frac{3}{8}B$, $\frac{7}{8}B$, $B$, with gaps $\frac{1}{8}B$, $\frac{1}{2}B$, $\frac{1}{8}B$]
