Database Systems II Index Structures CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 111 Introduction We have discussed the organization of records in secondary storage blocks. Records have an address, either logical or physical. But SQL queries reference attribute values, not record addresses. SELECT * FROM R WHERE a=10; How to find the records that have certain specified attribute values? CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 112
Introduction value recordID1 value recordID2 value . . . blocks matching value index holding records records CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 113 Index-Structure Basics Storage structures consist of files. Data files store, e.g., the records of a relation. Search key : one or more attributes for which we want to be able to search efficiently. Index file over a data file for some search key associates search key values with pointers to (recordID = rid) data file records that have this value. Sequential file : records sorted according to their primary key. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 114
Index-Structure Basics Sequential File 10 20 30 40 50 60 70 80 90 100 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 115 Index-Structure Basics Three alternatives for data entries k*: - record with key value k - <k, rid of record with search key value k> - <k, list of rids of records with search key k> Choice is orthogonal to the indexing technique used to locate entries k* Two major indexing techniques: - tree-structures - hash tables. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 116
Index-Structure Basics Dense index : one index entry for every record in the data file. Sparse index : index entries only for some of the record in the data file. Typically, one entry per block of the data file. Primary index : determines the location of data file records, i.e. order of index entries same as order of data records. Secondary index does not determine data location. Can only have one primary index, but multiple secondary indexes. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 117 Index-Structure Basics Sequential File Dense Index 10 10 20 20 30 30 40 40 50 50 60 60 70 80 70 80 90 100 90 110 100 120 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 118
Index-Structure Basics Sequential File Sparse Index 10 10 20 30 50 30 70 40 90 50 110 60 130 150 70 80 170 190 90 210 100 230 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 119 Index-Structure Basics Duplicate key values – sparse index – data entry for first new key from block 10 10 10 20 10 30 20 40 20 30 30 30 40 45 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 120
Index-Structure Basics Sparse index: - requires less index space per record, - can keep more of index in memory, - needed for secondary indexes. Dense index: - can tell if any record exists without accessing data file, - better for insertions. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 121 Index-Structure Basics Index file can become very large, e.g. at least one tenth of data file size for records with ten attributes of same length. To speed-up index access, add a second index level on top of the first index level, a third level on top of the second one, . . . First level can be dense, other levels are sparse. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 122
Index-Structure Basics Sequential File Sparse 2nd level 10 10 10 20 90 30 170 50 30 250 70 40 90 330 50 110 410 60 130 490 150 70 570 80 170 190 90 210 100 230 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 123 Index-Structure Basics Index structure needs to support Equality Queries and Range Queries. Equality query : one attribute value specified, e.g. docID = 100, or age = 18. Range query : attribute range specified, e.g. 30 <= age <= 40. Index structures must also support DB modifications, i.e. insertions, deletions and updates. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 124
ISAM ISAM = Index Sequential Access Method Hierarchy of index files (tree structure) Non-leaf blocks Leaf blocks Overflow block Primary blocks CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 125 ISAM Leaf blocks contain data entries. Non-leaf blocks contain pairs (k i ,p i ), where k i is a search key value and p i a pointer to the (first of the) records with that search key value. P0 K 1 P 1 K 2 P 2 K m P m CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 126
ISAM File Creation Leaf (data) blocks allocated sequentially, sorted by search key. Then non-leaf blocks allocated, then space for overflow blocks. Index entries: <search key value, block id>; they „direct‟ search for data entries, which are in leaf blocks. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 127 ISAM Example Root 40 20 33 51 63 10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 97* 63* CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 128
ISAM Index Operations Search Start at root; use key comparisons to go to leaf. Insert Find leaf data entry belongs to, and put it there. Delete Find and remove from leaf; if empty overflow block, de-allocate. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 129 ISAM Example Root 40 Index blocks 20 33 51 63 Primary Leaf 10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 97* 63* blocks 41* 48* 23* Overflow blocks 42* After inserting 23*, 48*, 41*, 42* CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 130
ISAM Example Root 40 20 33 51 63 46* 55* 10* 15* 20* 27* 33* 37* 40* 63* 41* 48* 23* After deleting 42*, 51*, 97*. 51* appears in index level, but not in leaf level. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 131 ISAM Discussion Inserts / deletes affect only leaf pages. static tree structure Tree can degenerate into a linear list of overflow blocks. In this case, ISAM looses all advantages compared to a simple, non-hierarchical index file. Can we maintain a balanced tree structure dynamically under insertions / deletions? CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 132
B-Trees Introduction Tree node corresponds to block. B-trees are balanced , i.e. all leaves at same level. This guarantees efficient access. B-trees guarantee minimum space utilization. n ( order ): maximum number of keys per node, minimum number of keys is roughly n/2. Exception: root may have one key only. m + 1 pointers in node, m actual number of keys. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 133 B-Trees Introduction Index Entries (inner nodes) Data Entries (leaf nodes) leaf nodes are linked in sequential order this B-tree variant is normally referred to as B+-tree CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 134
B-Trees Introduction Node format: (p 1 ,k 1 , . . ., p n ,k n ,p n+1 ) p i : pointer, k i : search key Node with m pointers has m children and corresponding sub-trees. n+1 -th index entry has only pointer. At leaf level, this pointer references the next leaf node. Search key property : i -th subtree contains data entries with search key k < k i , i +1-th subtree contains data entries with search key k >= k i . CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 135 B-Trees Example Root n = 3 100 120 150 180 30 11 3 5 100 101 110 120 130 150 156 179 180 200 30 35 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 136
B-Trees Example Non-leaf 57 81 95 (inner) node to keys to keys to keys to keys < 57 57 k<81 81 k<95 95 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 137 B-Trees Example From non-leaf node to next leaf in sequence Leaf node 57 81 95 with key 57 with key 81 with key 85 To record To record To record CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 138
B-Trees Space utilization n = 3 full node min. node Non-leaf 120 150 180 30 counts even if null Leaf 11 30 35 3 5 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 139 B-Trees Space utilization Number of pointers/keys for B-tree Max Max Min Min ptrs keys ptrs data keys Non-leaf n+1 n (n+1)/ 2 (n+1)/ 2 - 1 (non-root) Leaf n+1 n (n+ 1) / 2 (n+ 1) / 2 (non-root) Root n+1 n 1 1 CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 140
B-Trees Equality Queries To search for key k , start from root. At a given node, find “nearest key” k i and follow left (p i ) or right (p i+1 ) pointer depending on comparison of k and k i . Continue, until leaf node reached. Explores one path from root to leaf node. Height of B-tree is O (log / N ) n 2 where N : number of records indexed runtime complexity O (log N ) CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 141 B-Trees Insertions Always insert in corresponding leaf. Tree grows bottom-up. Four different cases: space available in leaf, leaf overflow, non-leaf overflow, new root. CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 142
Recommend
More recommend