tree structured indexes
play

Tree-Structured Indexes < k , list of rids of data records with - PDF document

Introduction As for any index, 3 alternatives for data entries k* : Data record with key value k < k , rid of data record with search key value k > Tree-Structured Indexes < k , list of rids of data records with search key


  1. Introduction � As for any index, 3 alternatives for data entries k* : � Data record with key value k � < k , rid of data record with search key value k > Tree-Structured Indexes � < k , list of rids of data records with search key k > � Choice is orthogonal to the indexing technique used to locate data entries k* . � Tree-structured indexing techniques support both range searches and equality searches . � ISAM : static structure; B+ tree : dynamic, adjusts gracefully under inserts and deletes. Database Management Systems, R. Ramakrishnan and J. Gehrke 1 Database Management Systems, R. Ramakrishnan and J. Gehrke 2 Range Searches Range Searches � `` Find all students with gpa > 3.0 ’’ � Simple idea: Create an `index’ file – If data entries are sorted, do binary search to find – What is search cost if each index page has F entries? first such student, then scan to find others. � Problem? Index File k1 k2 kN Page 3 Page N Page 1 Page 2 Data (Entries) File Data File Page N Page 1 Page 2 Page 3 � Can do binary search on (smaller) index file! Database Management Systems, R. Ramakrishnan and J. Gehrke 3 Database Management Systems, R. Ramakrishnan and J. Gehrke 4 index entry ISAM Example ISAM Tree P0 K 1 P 1 K 2 P 2 K m P m � Each node can hold 2 entries � Index file may still be quite large. But we can – What is search cost if each leaf node can hold L apply the idea repeatedly! entries and each index node can hold F entries? Root 40 Non-leaf Pages 20 33 51 63 Leaf Pages Overflow 46* 55* page 10* 15* 20* 27* 33* 37* 40* 51* 63* 97* Primary pages � Leaf pages contain data entries . Database Management Systems, R. Ramakrishnan and J. Gehrke 5 Database Management Systems, R. Ramakrishnan and J. Gehrke 6

  2. After Inserting 23*, 48*, 41*, 42* ... … Then Deleting 42* Root Root 40 40 Index Index Pages Pages 20 33 51 63 20 33 51 63 Primary Primary Leaf Leaf 10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 97* 10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 97* 63* 63* Pages Pages 48* 41* 48* 41* Overflow 23* Overflow 23* Pages Pages 42* 42* Database Management Systems, R. Ramakrishnan and J. Gehrke 7 Database Management Systems, R. Ramakrishnan and J. Gehrke 8 … Then Deleting 51* After Deleting 41* and 51* Root Root 40 40 Index Index Pages Pages 20 33 51 63 20 33 51 63 Primary Primary Leaf Leaf 10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97* 10* 15* 20* 27* 33* 37* 40* 46* 55* 63* 97* Pages Pages 48* 41* 48* 41* Overflow 23* Overflow 23* Pages Pages � Note 51 appears in Index Page but not in Leaf pages! Database Management Systems, R. Ramakrishnan and J. Gehrke 9 Database Management Systems, R. Ramakrishnan and J. Gehrke 10 B+ Tree: The Most Widely Used Index Example B+ Tree � Insert/delete at log F N cost; keep tree height- � Search begins at root, and key comparisons balanced . (F = fanout, N = # leaf pages) direct it to a leaf (as in ISAM). � Minimum 50% occupancy (except for root). Each � Search for 5*, 15*, all data entries >= 24* ... node contains d <= m <= 2 d entries. The parameter d is called the order of the tree. Root � Supports equality and range-searches efficiently. 30 13 17 24 Index Entries (Direct search) 2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* Data Entries � Based on the search for 15*, we know it is not in the tree! ("Sequence set") Database Management Systems, R. Ramakrishnan and J. Gehrke 11 Database Management Systems, R. Ramakrishnan and J. Gehrke 12

  3. B+-tree Search Performance B+ Trees in Practice � Assume leaf pages can hold L data entries � Typical order: 100. Typical fill-factor: 67%. � Assume B+-tree has order d – average fanout = 133 � Typical capacities: � Assume the tree has to index N data entries – Height 4: 133 4 = 312,900,700 records – Height 3: 133 3 = 2,352,637 records � Can often hold top levels in buffer pool: � What is the best-case search performance – Level 1 = 1 page = 8 Kbytes (measured in number of I/Os)? – Level 2 = 133 pages = 1 Mbyte � What is the worst-case search performance – Level 3 = 17,689 pages = 133 MBytes Database Management Systems, R. Ramakrishnan and J. Gehrke 13 Database Management Systems, R. Ramakrishnan and J. Gehrke 14 Inserting 23* Inserting 8* … Root Root 13 17 24 30 13 17 24 30 3* 5* 19* 20* 22* 23* 24* 27* 29* 33* 34* 38* 39* 2* 7* 14* 16* 2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* Database Management Systems, R. Ramakrishnan and J. Gehrke 15 Database Management Systems, R. Ramakrishnan and J. Gehrke 16 Entry to be inserted in parent node Inserting 8* … Inserting 8* … (Note that 17 is pushed up and only appears once in the index. Contrast this with leaf split) 17 Entry to be inserted in parent node (Note that 5 is copied up and continues to appear in the leaf) Root Root 13 17 24 30 5 13 24 30 5 2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* 2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* Database Management Systems, R. Ramakrishnan and J. Gehrke 17 Database Management Systems, R. Ramakrishnan and J. Gehrke 18

  4. After Inserting 8* Inserting 8* … Note how tree grew Root by one level! Root 17 13 17 24 30 5 13 24 30 2* 3* 5* 7* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* � In this example, could have “redistributed” to sibling instead of splitting 3* 7* 8* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* 2* 5* 14* 16* � Not usually done in practice (Why?) Database Management Systems, R. Ramakrishnan and J. Gehrke 19 Database Management Systems, R. Ramakrishnan and J. Gehrke 20 Deleting 19* … Deleting 20* … Root Root 17 17 5 13 24 30 5 13 24 30 2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39* 2* 3* 5* 7* 8* 14* 16* 20* 22* 24* 27* 29* 33* 34* 38* 39* Database Management Systems, R. Ramakrishnan and J. Gehrke 21 Database Management Systems, R. Ramakrishnan and J. Gehrke 22 Deleting 20* … After Deleting 20* Root Root 17 17 Redistribution: note how entry is copied up 5 13 24 30 5 13 27 30 2* 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* 39* 2* 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* 39* Database Management Systems, R. Ramakrishnan and J. Gehrke 23 Database Management Systems, R. Ramakrishnan and J. Gehrke 24

  5. Deleting 24* … Deleting 24* … Root Root 17 17 5 13 27 30 5 13 27 30 3* 7* 8* 22* 24* 27* 29* 33* 34* 38* 39* 3* 7* 8* 22* 27* 29* 33* 34* 38* 39* 2* 5* 14* 16* 2* 5* 14* 16* Database Management Systems, R. Ramakrishnan and J. Gehrke 25 Database Management Systems, R. Ramakrishnan and J. Gehrke 26 Deleting 24* … Deleting 24* … Root Merge: note how entry is pulled down (contrast 17 Merge: note how with merge of leaf node) entry is deleted Root 5 13 17 30 5 13 30 2* 3* 5* 7* 8* 22* 27* 33* 34* 38* 39* 14* 16* 29* 2* 3* 5* 7* 8* 14* 16* 22* 27* 29* 33* 34* 38* 39* Database Management Systems, R. Ramakrishnan and J. Gehrke 27 Database Management Systems, R. Ramakrishnan and J. Gehrke 28 Example of Non-leaf Re-distribution After Re-distribution � Entries are re-distributed by ` pushing through ’ the � During deletion of 24* splitting entry in the parent node. � In contrast to previous example, can re-distribute � Suffices to re-distribute index entry with key 20; entry from left child of root to right child. we’ve re-distributed 17 as well for illustration. Root Root 22 17 30 17 20 5 13 5 13 20 22 30 2* 3* 5* 7* 8* 14* 16* 17* 18* 20* 21* 22* 27* 29* 33* 34* 38* 39* 2* 3* 5* 7* 8* 14* 16* 17* 18* 20* 21* 22* 27* 29* 33* 34* 38* 39* Database Management Systems, R. Ramakrishnan and J. Gehrke 29 Database Management Systems, R. Ramakrishnan and J. Gehrke 30

Recommend


More recommend