Tree-Structured Indexing Torsten Grust Chapter 4 Tree-Structured Indexing ISAM and B + -trees Binary Search ISAM Architecture and Implementation of Database Systems Multi-Level ISAM Too Static? Summer 2016 Search Efficiency B + -trees Search Insert Redistribution Delete Duplicates Key Compression Bulk Loading Partitioned B+-trees Torsten Grust Wilhelm-Schickard-Institut für Informatik Universität Tübingen 1
Tree-Structured Ordered Files and Binary Search Indexing Torsten Grust How could we prepare for such queries and evaluate them efficiently? 1 SELECT * 2 FROM CUSTOMERS 3 WHERE ZIPCODE BETWEEN 8880 AND 8999 Binary Search We could ISAM Multi-Level ISAM 1 sort the table on disk (in ZIPCODE -order) Too Static? Search Efficiency 2 To answer queries, use binary search to find the first B + -trees Search qualifying tuple, then scan as long as ZIPCODE < 8999 . Insert Redistribution Delete Here, let k ∗ denote the full record with key k : Duplicates Key Compression Bulk Loading Partitioned B+-trees 4104* 4123* 4222* 4450* 4528* 5012* 6330* 6423* 8050* 8105* 8180* 8245* 8280* 8406* 8570* 8600* 8604* 8700* 8808* 8887* 8910* 8953* 9016* 9200* 9532* scan 2
Tree-Structured Ordered Files and Binary Search Indexing Torsten Grust 4104* 4123* 4222* 4450* 4528* 5012* 6330* 6423* 8050* 8105* 8180* 8245* 8280* 8406* 8570* 8600* 8604* 8700* 8808* 8887* 8910* 8953* 9016* 9200* 9532* Binary Search page 0 page 1 page 2 page 3 page 4 page 5 page 6 page 7 page 8 page 9 page 10 page 11 page 12 ISAM Multi-Level ISAM scan Too Static? Search Efficiency ✧ We get sequential access during the scan phase . B + -trees Search We need to read log 2 (# tuples ) tuples during the search phase . Insert Redistribution Delete Duplicates ✪ We need to read about as many pages for this. Key Compression Bulk Loading Partitioned B+-trees The whole point of binary search is that we make far, unpredictable jumps . This largely defeats page � prefetching. 3
Tree-Structured Tree-Structured Indexing Indexing Torsten Grust • This chapter discusses two index structures which especially Binary Search shine if we need to support range selections (and thus ISAM sorted file scans): ISAM files and B + -trees . Multi-Level ISAM Too Static? • Both indexes are based on the same simple idea which Search Efficiency B + -trees naturally leads to a tree-structured organization of the Search indexes. (Hash indexes are covered in a subsequent chapter.) Insert Redistribution • B + -trees refine the idea underlying the rather static ISAM Delete Duplicates scheme and add efficient support for insertions and Key Compression Bulk Loading deletions . Partitioned B+-trees 4
Tree-Structured Indexed Sequential Access Method (ISAM) Indexing Torsten Grust Remember: range selections on ordered files may use binary Binary Search ISAM search to locate the lower range limit as a starting point for a Multi-Level ISAM sequential scan of the file (until the upper limit is reached). Too Static? Search Efficiency B + -trees ISAM . . . Search Insert • . . . acts as a replacement for the binary search phase, and Redistribution Delete • touches considerably fewer pages than binary search. Duplicates Key Compression Bulk Loading Partitioned B+-trees 5
Tree-Structured Indexed Sequential Access Method (ISAM) Indexing Torsten Grust To support range selections on field A : 1 In addition to the A -sorted data file, maintain an index file with entries (records) of the following form: Binary Search ISAM index entry separator pointer Multi-Level ISAM Too Static? Search Efficiency p 0 k 1 p 1 k 2 p 2 · · · k n p n B + -trees • • • • Search Insert Redistribution 2 ISAM leads to sparse index structures. In an index entry Delete Duplicates Key Compression � k i , pointer to p i � , Bulk Loading Partitioned B+-trees key k i is the first ( i.e. , the minimal) A value on the data file page pointed to by p i ( p i : page no). 6
Tree-Structured Indexed Sequential Access Method (ISAM) Indexing Torsten Grust index entry separator key p 0 k 1 p 1 k 2 p 2 · · · k n p n • • • • Binary Search ISAM • In the index file, the k i serve as separators between the Multi-Level ISAM Too Static? contents of pages p i − 1 and p i . Search Efficiency • It is guaranteed that k i − 1 < k i for i = 2 , . . . , n . B + -trees Search • We obtain a one-level ISAM structure . Insert Redistribution Delete Duplicates One-level ISAM structure of N + 1 pages Key Compression Bulk Loading Partitioned B+-trees index file k k k 2 1 N data file p 0 p p p 1 2 N 7
Tree-Structured Searching ISAM Indexing Torsten Grust SQL range selection on field A 1 SELECT * Binary Search 2 FROM R ISAM 3 WHERE A BETWEEN lower AND upper Multi-Level ISAM Too Static? Search Efficiency B + -trees To support range selections : Search Insert 1 Conduct a binary search on the index file for a key of Redistribution Delete value lower . Duplicates Key Compression 2 Start a sequential scan of the data file from the page Bulk Loading Partitioned B+-trees pointed to by the index entry (scan until field A exceeds upper ). 8
Tree-Structured Indexed Sequential Access Method ISAM Indexing Torsten Grust • The size of the index file is likely to be much smaller than the data file size. Searching the index is far more efficient than searching the data file. Binary Search • For large data files, however, even the index file might be ISAM Multi-Level ISAM too large to allow for fast searches. Too Static? Search Efficiency B + -trees Search Insert Main idea behind ISAM indexing Redistribution Delete Recursively apply the index creation step: treat the topmost Duplicates Key Compression index level like the data file and add an additional index layer on Bulk Loading Partitioned B+-trees top. Repeat, until the the top-most index layer fits on a single page (the root page ). 9
Tree-Structured Multi-Level ISAM Structure Indexing Torsten Grust This recursive index creation scheme leads to a tree-structured hierarchy of index levels: Multi-level ISAM structure Binary Search ISAM Multi-Level ISAM 8180 8910 Too Static? index pages Search Efficiency • • • B + -trees Search Insert 4222 4528 6330 8050 8280 8570 8604 8808 9016 9532 Redistribution Delete Duplicates • • • • • • • • • • • • • Key Compression Bulk Loading Partitioned B+-trees pages 4104* 4123* 4222* 4450* 4528* 5012* 6330* 6423* 8050* 8105* 8180* 8245* 8280* 8406* 8570* 8600* 8604* 8700* 8808* 8887* 8910* 8953* 9016* 9200* 9532* data 10
Tree-Structured Multi-Level ISAM Structure Indexing Torsten Grust Multi-level ISAM structure 8180 8910 index pages • • • 4222 4528 6330 8050 8280 8570 8604 8808 9016 9532 Binary Search • • • • • • • • • • • • • ISAM Multi-Level ISAM pages Too Static? 4104* 4123* 4222* 4450* 4528* 5012* 6330* 6423* 8050* 8105* 8180* 8245* 8280* 8406* 8570* 8600* 8604* 8700* 8808* 8887* 8910* 8953* 9016* 9200* 9532* data Search Efficiency B + -trees Search Insert Redistribution Delete Duplicates • Each ISAM tree node corresponds to one page (disk block). Key Compression Bulk Loading Partitioned B+-trees • To create the ISAM structure for a given data file, proceed bottom-up : 1 Sort the data file on the search key field. 2 Create the index leaf level. 3 If the top-most index level contains more than one page, repeat. 11
Tree-Structured Multi-Level ISAM Structure: Overflow Pages Indexing Torsten Grust • The upper index levels of the ISAM tree remain static : insertions and deletions in the data file do not affect the upper tree layers. • Insertion of record into data file: if space is left on the Binary Search associated leaf page, insert record there. ISAM • Otherwise create and maintain a chain of overflow pages Multi-Level ISAM Too Static? hanging off the full primary leaf page. Note : the records on Search Efficiency B + -trees the overflow pages are not ordered in general. Search ⇒ Over time, search performance in ISAM can degrade . Insert Redistribution Delete Duplicates Multi-level ISAM structure with overflow pages Key Compression Bulk Loading Partitioned B+-trees · · · · · · · · · · · · · · · · · · · · · overflow pages 12
Tree-Structured Multi-Level ISAM Structure: Example Indexing Torsten Grust Eeach page can hold two index entries plus one (the left-most) page pointer: Example (Initial state of ISAM structure) Binary Search ISAM root page Multi-Level ISAM Too Static? 40 Search Efficiency B + -trees Search Insert Redistribution 20 33 51 63 Delete Duplicates Key Compression Bulk Loading Partitioned B+-trees 10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 97* 63* 13
Tree-Structured Multi-Level ISAM Structure: Insertions Indexing Torsten Grust Example (After insertion of data records with keys 23 , 48 , 41 , 42 ) root page 40 Binary Search ISAM Multi-Level ISAM Too Static? 20 33 51 63 Search Efficiency B + -trees Search primary Insert leaf Redistribution 10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 97* pages 63* Delete Duplicates Key Compression Bulk Loading 48* 41* overflow 23* Partitioned B+-trees pages 42* 14
Recommend
More recommend