Storing Trees on Disk Drives
Medha Bhadkamkar, Fernando Farfan, Vagelis Hristidis, Raju Rangaswami
Florida International University
15th December, 2005
FAST 2005 WiP Report
Introduction
• Tree data are becoming commonplace
  • Offer an intuitive, natural way of organizing information
  • Examples: XML, multi-resolution video, natural sciences data (e.g. bioinformatics), even traditional directory-file hierarchies
• Disk drives are ubiquitous and seem irreplaceable
• Current approaches
  • Use relational databases
  • Use flat files
• Our contributions
  • Examine the tree storage problem
  • Propose native data layout strategies for tree data
Tree Structured Placement
Idea: Optimize common accesses
• Parent to child
• Node to sibling
Assumptions:
• Each node occupies an entire disk block
• Semi-sequential access information is available
Optimized Tree-Structured Placement
Problems with basic tree placement:
• Significant fragmentation
• Large random seeks
Solution:
• Use non-free tracks
• Use rotationally-optimal track-regions
Grouping
Sequential
• Add nodes to a ‘supernode’ for as long as its capacity allows
• Use depth-first traversal to get the next node
• Low fragmentation
Tree-preserving
• Groups adjacent nodes
• Avoids cycles in original tree
• Preserves original tree structure in grouping
• Greater fragmentation
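The two grouping strategies can be illustrated with a minimal sketch. The slide states only the outline of each strategy, so the details here are assumptions: the tree is a hypothetical dictionary mapping each node to its children, sequential grouping cuts the depth-first order into fixed-size chunks, and tree-preserving grouping is approximated by filling each supernode breadth-first from a fragment root so that every supernode holds one connected piece of the tree. This is not necessarily the authors' algorithm.

```python
from collections import deque

# Hypothetical example tree: node -> list of children.
TREE = {
    "a": ["b", "c"],
    "b": ["d", "e"],
    "c": ["f", "g"],
    "d": [], "e": [], "f": [], "g": [],
}

def dfs_order(tree, root):
    """Return the nodes in depth-first (pre-order) order."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(tree[node]))
    return order

def sequential_grouping(tree, root, capacity):
    """Fill each supernode to capacity with nodes in depth-first order."""
    order = dfs_order(tree, root)
    return [order[i:i + capacity] for i in range(0, len(order), capacity)]

def tree_preserving_grouping(tree, root, capacity):
    """Assumed variant: each supernode holds one connected tree fragment."""
    supernodes, pending = [], deque([root])
    while pending:
        group, frontier = [], deque([pending.popleft()])
        while frontier and len(group) < capacity:
            node = frontier.popleft()
            group.append(node)
            frontier.extend(tree[node])
        # Nodes that did not fit become roots of later supernodes.
        pending.extend(frontier)
        supernodes.append(group)
    return supernodes

print(sequential_grouping(TREE, "a", 3))
print(tree_preserving_grouping(TREE, "a", 3))
```

With a capacity of 3, the sequential variant packs the seven nodes into three supernodes, while the assumed tree-preserving variant needs five, four of them holding a single node. This matches the fragmentation trade-off listed on the slide.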
Grouping Examples
Assumption: Supernode can fit 5 nodes
[Figure: grouping examples; panels: Sequential, Tree-preserving]
Building Supernode Trees
Sequential Supernode List
• Uses sequential grouping
• Nodes linked in the order they are created
Tree-Preserving Supernode Tree
• Uses tree-preserving grouping
• Edges according to original tree
Sequential Supernode Tree
• Uses sequential grouping
• Several possibilities for edge creation
• Avoid cycles
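As a rough sketch of how supernodes might be linked, the code below builds edges for a supernode list and a supernode tree. The slide leaves the edge-creation policy for the sequential supernode tree open, so the rule used here is an assumption: each supernode is attached to the supernode that holds the original parent of its first node. With depth-first grouping that parent always lies in an earlier supernode, so the result has no cycles. The tree and the groupings are hypothetical hard-coded inputs.

```python
# Hypothetical example tree: node -> list of children.
TREE = {
    "a": ["b", "c"],
    "b": ["d", "e"],
    "c": ["f", "g"],
    "d": [], "e": [], "f": [], "g": [],
}

# Hypothetical groupings of TREE with a supernode capacity of 3.
SEQUENTIAL = [["a", "b", "d"], ["e", "c", "f"], ["g"]]
TREE_PRESERVING = [["a", "b", "c"], ["d"], ["e"], ["f"], ["g"]]

def supernode_list_edges(supernodes):
    """Sequential supernode list: link supernodes in creation order."""
    return [(i, i + 1) for i in range(len(supernodes) - 1)]

def supernode_tree_edges(tree, supernodes):
    """Assumed rule: attach each supernode to the one holding the
    original parent of its first node."""
    owner = {node: i for i, group in enumerate(supernodes) for node in group}
    parent = {c: p for p, children in tree.items() for c in children}
    edges = []
    for i, group in enumerate(supernodes):
        first = group[0]
        if first in parent:  # the supernode holding the root has no parent
            edges.append((owner[parent[first]], i))
    return edges

print(supernode_list_edges(SEQUENTIAL))
print(supernode_tree_edges(TREE, SEQUENTIAL))
print(supernode_tree_edges(TREE, TREE_PRESERVING))
```

Edges are pairs of supernode indices, parent first. Every supernode other than the one holding the root receives exactly one incoming edge from a lower-numbered supernode, which is what keeps the structure a tree under this rule.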
Performance Evaluation
Future Work
• Multiple drives
• Modeling more complex data and access patterns
  • Allows data and application directed layout
  • Requires detailed model for the disk-drive
• Storing graphs on disk drives…
  • More generic than trees!
  • Can use directed and weighted graphs
  • Can model several data-types and access patterns
  • Can model relational data as well!