Indexed Files: Outline
- Introduction
- Indexed Files
- Full Index Organization
- Indexed Sequential Files
- Multilevel Indexes
- Overflow Management
- Performance Analysis
rasitjutrakul

Indexed Files

File structure   Ordered sequential processing   Random access
Sequential       fast                            slow
Direct           slow                            fast
Indexed          fast                            fast

Indexed file

Key -> look up index -> block address

Data file:
Block #   Key   Data
0         675   Somchai
1         693   Somwang
2         270   Somnuek
3         105   Somsamorn
4         987   Somroo

Index in data order (key, block #): (675, 0) (693, 1) (270, 2) (105, 3) (987, 4)
Index sorted by key (key, block #): (105, 3) (270, 2) (675, 0) (693, 1) (987, 4)

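The lookup path in the figure (key, then index, then block address) can be sketched in a few lines. This is a minimal illustration, not the course's implementation: the names `DATA_BLOCKS`, `INDEX`, and `lookup` are hypothetical, and the pairing of keys with names follows the figure as read above.

```python
import bisect

# Data blocks as read from the figure: block number -> (key, name).
DATA_BLOCKS = {
    0: (675, "Somchai"),
    1: (693, "Somwang"),
    2: (270, "Somnuek"),
    3: (105, "Somsamorn"),
    4: (987, "Somroo"),
}

# Index sorted by key; each entry is (key, block number).
INDEX = sorted((key, block_no) for block_no, (key, _) in DATA_BLOCKS.items())
INDEX_KEYS = [key for key, _ in INDEX]


def lookup(key):
    """Binary-search the index, then fetch the record from its data block."""
    pos = bisect.bisect_left(INDEX_KEYS, key)
    if pos == len(INDEX_KEYS) or INDEX_KEYS[pos] != key:
        return None  # key is not in the index
    block_no = INDEX[pos][1]
    return block_no, DATA_BLOCKS[block_no][1]
```

For example, `lookup(270)` finds the entry (270, 2) in the sorted index and returns the record stored in block 2, even though the data file itself is not in key order.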
Full Index Organization

Index file (one entry per record, three entries per index block):
Index block   (key, data block #)
#0            (4, 0)   (7, 0)   (10, 1)
#1            (16, 1)  (18, 2)  (19, 2)
#2            (20, 3)  (21, 3)  (22, 4)
#3            (23, 4)  (24, 5)  (25, 5)
#4            (26, 6)  (28, 6)  (33, 7)
#5            (35, 7)  (37, 8)  (39, 8)
#6            (41, 9)  (44, 9)  (48, 10)
#7            (78, 10) (81, 11) (92, 11)

Data file (two records per data block):
Data block   Keys
#0           4, 7
#1           10, 16
#2           18, 19
#3           20, 21
#4           22, 23
#5           24, 25
#6           26, 28
#7           33, 35
#8           37, 39
#9           41, 44
#10          48, 78
#11          81, 92

Indexed File Structure
- Binary search for the target key in the index
- RetrieveOne: SL[BinarySearch] + 1 rba
- RetrieveAll: 1 rba + m2 sba
- DeleteOne: SL[RetrieveOne] + 2 sba
- InsertOne: SL[RetrieveOne] + 2 sba

[Figure: an index file of m1 index blocks pointing into a data file of m2 data blocks]

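The search length SL[BinarySearch] can be made concrete by counting how many index blocks a binary search reads. The sketch below is illustrative only: it assumes each index block is summarized by its lowest and highest key, that one probe costs one random block access (rba), and the function name `find_index_block` is hypothetical.

```python
def find_index_block(block_ranges, key):
    """Binary search over index blocks.

    block_ranges is a list of (low_key, high_key) pairs, one per index
    block, in ascending key order. Returns (block_no, probes), where
    probes is the number of index blocks read; block_no is None when no
    block covers the key.
    """
    lo, hi, probes = 0, len(block_ranges) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1  # reading one index block = one probe
        low_key, high_key = block_ranges[mid]
        if key < low_key:
            hi = mid - 1
        elif key > high_key:
            lo = mid + 1
        else:
            return mid, probes
    return None, probes


# Hypothetical index of m1 = 1000 blocks, each covering 100 consecutive keys.
ranges = [(i * 100, i * 100 + 99) for i in range(1000)]
worst_case = max(find_index_block(ranges, low)[1] for low, _ in ranges)
```

With 1000 index blocks the worst case is 10 probes, so under these assumptions RetrieveOne costs at most 10 index reads plus 1 rba for the data block.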
Indexed Sequential Files
- If the records in the data file are ordered by key:
  – ordered sequential processing is fast
  – the index does not have to be a full index (keep only the maximum key of each data block)
  – fewer index entries means fewer index blocks, a shorter search length, and better performance

[Figure: an index with two entries (18932, 38211) over two data blocks. The first block holds keys 16345, 17324, 17543, 18932 (Siripol, Sirirak, Siri, Siriroj); the second holds keys 19823, 20221, 23847, 38211 (Toy, Tao, Ting, Took).]

Indexed Sequential File

Index file (one entry per data block: highest key, data block #):
Index block   (key, data block #)
#0            (7, 0)   (16, 1)  (19, 2)
#1            (21, 3)  (23, 4)  (25, 5)
#2            (28, 6)  (35, 7)  (39, 8)
#3            (44, 9)  (78, 10) (x, 11)

Data blocks #0 to #11 (two records each):
4 7 | 10 16 | 18 19 | 20 21 | 22 23 | 24 25 | 26 28 | 33 35 | 37 39 | 41 44 | 48 78 | 81 92

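A lookup against this kind of index needs only the highest key of each data block. The following sketch uses the data blocks from the figure; treating the final "x" entry as an open upper bound is an assumption, and the names `build_sparse_index` and `retrieve_one` are hypothetical.

```python
import bisect

# Data blocks from the figure, two ordered records per block.
DATA_BLOCKS = [
    [4, 7], [10, 16], [18, 19], [20, 21], [22, 23], [24, 25],
    [26, 28], [33, 35], [37, 39], [41, 44], [48, 78], [81, 92],
]


def build_sparse_index(blocks):
    """Keep one index key per data block: the block's highest key."""
    keys = [block[-1] for block in blocks]
    keys[-1] = float("inf")  # assumption: "x" means no upper bound
    return keys


def retrieve_one(key, index_keys, blocks):
    """Return the data block number holding key, or None if absent."""
    # The first index key >= the target identifies the only candidate block.
    block_no = bisect.bisect_left(index_keys, key)
    return block_no if key in blocks[block_no] else None


INDEX_KEYS = build_sparse_index(DATA_BLOCKS)
```

Here `retrieve_one(20, INDEX_KEYS, DATA_BLOCKS)` lands on block 3, while a key such as 8 is routed to block 1 and reported as absent after that single block is checked.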
Indexed Sequential Files
- 100,000 records, each of size 500 bytes
- index record size = 20 bytes
- block size = 2000 bytes
  – 1 block = 4 data records
  – 1 block = 100 index records
  – 25,000 data blocks
- Full index:
  – index file: 100,000 records = 1000 index blocks
- Indexed sequential:
  – index file: 25,000 records = 250 index blocks

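The arithmetic on this slide can be reproduced directly. The sketch below assumes records do not span blocks; the function name `sizing` and its dictionary keys are hypothetical.

```python
import math


def sizing(n_records, record_size, index_record_size, block_size):
    """Block counts for a full index versus an indexed sequential index."""
    data_bf = block_size // record_size          # data records per block
    index_bf = block_size // index_record_size   # index records per block
    data_blocks = math.ceil(n_records / data_bf)
    return {
        "data_bf": data_bf,
        "index_bf": index_bf,
        "data_blocks": data_blocks,
        # Full index: one index record per data record.
        "full_index_blocks": math.ceil(n_records / index_bf),
        # Indexed sequential: one index record per data block.
        "indexed_sequential_index_blocks": math.ceil(data_blocks / index_bf),
    }


result = sizing(100_000, 500, 20, 2000)
```

The fourfold drop from 1000 to 250 index blocks comes entirely from the data blocking factor of 4: one index record now covers a whole data block rather than a single record.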
Multilevel Indexed Sequential
- Trimming the search length = better performance
- Modify the logical structure of the index file

[Figure: index structures with one level, two levels, and three levels]

Indexed Sequential File: 2 levels

Level 0 (one index block; one entry per level 1 index block, holding that block's highest key):
19, 25, 39, x

Level 1 (key, data block #):
(7, 0)   (16, 1)  (19, 2)
(21, 3)  (23, 4)  (25, 5)
(28, 6)  (35, 7)  (39, 8)
(44, 9)  (78, 10) (x, 11)

Data blocks #0 to #11 (two records each):
4 7 | 10 16 | 18 19 | 20 21 | 22 23 | 24 25 | 26 28 | 33 35 | 37 39 | 41 44 | 48 78 | 81 92

Indexed Sequential File: 3 levels

Level 0 (one index block):
25, x

Level 1 (two index blocks):
19, 25
39, x

Level 2 (key, data block #):
(7, 0)   (16, 1)  (19, 2)
(21, 3)  (23, 4)  (25, 5)
(28, 6)  (35, 7)  (39, 8)
(44, 9)  (78, 10) (x, 11)

Data blocks #0 to #11 (two records each):
4 7 | 10 16 | 18 19 | 20 21 | 22 23 | 24 25 | 26 28 | 33 35 | 37 39 | 41 44 | 48 78 | 81 92

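A search in the three-level structure reads one index block per level and then one data block. The sketch below encodes the figure as nested lists; treating "x" as an open upper bound is an assumption, and the names `LEVELS` and `find` are hypothetical.

```python
import bisect

INF = float("inf")  # assumption: stands in for the "x" entries

DATA = [
    [4, 7], [10, 16], [18, 19], [20, 21], [22, 23], [24, 25],
    [26, 28], [33, 35], [37, 39], [41, 44], [48, 78], [81, 92],
]

# Each index block is a list of (highest_key, child) pairs. A child is a
# block number on the next level, or a data block number at the last level.
LEVELS = [
    [[(25, 0), (INF, 1)]],                                   # level 0
    [[(19, 0), (25, 1)], [(39, 2), (INF, 3)]],               # level 1
    [[(7, 0), (16, 1), (19, 2)], [(21, 3), (23, 4), (25, 5)],
     [(28, 6), (35, 7), (39, 8)], [(44, 9), (78, 10), (INF, 11)]],  # level 2
]


def find(key):
    """Descend the index; return (data_block_no, found, block_accesses)."""
    block_no, accesses = 0, 0
    for level in LEVELS:
        block = level[block_no]
        accesses += 1  # one index block read per level
        keys = [k for k, _ in block]
        # First entry whose highest key is >= the target.
        block_no = block[bisect.bisect_left(keys, key)][1]
    accesses += 1  # final read of the data block
    return block_no, key in DATA[block_no], accesses
```

Every search costs the same 4 block accesses here (3 index levels plus 1 data block), whether or not the key exists; for example `find(26)` returns `(6, True, 4)`.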
Overflow Records
- Insertion generates overflow records
- Allocate empty slots in each block
- Reorganize the file if needed
- Allocate extra overflow blocks (overflow area)
  – overflow records are linked in a logical, ordered chain to the primary block to which they belong
  – overflow records are not blocked

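Overflow Records

[Figure: primary area and overflow area. Primary blocks (two records each): 2 4 | 10 13 | 18 19 | 20 21 | 22 23 | 24 25 | 26 28 | 33 35 | 37 38 | 41 44 | 48 78 | 81 92. Overflow area records, in stored order: 7, 16, 15, 6, 5, 39. Every primary block and every overflow record carries a link field; x marks an empty link.]
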
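The chaining scheme can be sketched as follows. This is one possible reading, stated as assumptions: each primary block holds two records, an insertion that overfills a block pushes the block's highest key into the overflow area, and each chain is kept in ascending key order. The names `PrimaryBlock`, `insert`, and `chain_keys` are hypothetical.

```python
import bisect


class PrimaryBlock:
    """A primary data block with a fixed capacity and an overflow link."""

    def __init__(self, keys, capacity=2):
        self.keys = sorted(keys)
        self.capacity = capacity
        self.head = None  # slot of the first overflow record; None is "x"


def insert(block, key, overflow_area):
    """Insert key; spill the block's highest key to overflow if it is full."""
    bisect.insort(block.keys, key)
    if len(block.keys) <= block.capacity:
        return
    spilled = block.keys.pop()  # highest key leaves the primary block
    overflow_area.append([spilled, None])  # unblocked record: [key, next]
    new_slot = len(overflow_area) - 1
    # Walk the chain to keep it in ascending key order.
    prev, cur = None, block.head
    while cur is not None and overflow_area[cur][0] < spilled:
        prev, cur = cur, overflow_area[cur][1]
    overflow_area[new_slot][1] = cur
    if prev is None:
        block.head = new_slot
    else:
        overflow_area[prev][1] = new_slot


def chain_keys(block, overflow_area):
    """Follow the links from a primary block and list its overflow keys."""
    keys, cur = [], block.head
    while cur is not None:
        keys.append(overflow_area[cur][0])
        cur = overflow_area[cur][1]
    return keys


overflow_area = []
b0, b1, b8 = PrimaryBlock([4, 7]), PrimaryBlock([10, 16]), PrimaryBlock([37, 39])
for block, key in [(b0, 6), (b1, 13), (b1, 15), (b0, 5), (b0, 2), (b8, 38)]:
    insert(block, key, overflow_area)
```

The insertion order above is one sequence that is consistent with the keys shown in the figure: it leaves the primary blocks as 2 4, 10 13, and 37 38, and stores the overflow records in the order 7, 16, 15, 6, 5, 39, while each chain still reads in key order when followed through its links.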
Performance Analysis
- The number of rba's needed to retrieve a target record depends on the height of the index tree.
- The height depends on the NBLK and BF of the index.
- Let k be the average number of index entries per index block.
- Let the index tree be an h-level tree.

$$NBLK_{index} = 1 + k + k^2 + \cdots + k^{h-1} = \frac{k^h - 1}{k - 1}$$

$$NBLK_{data} = k^h$$

$$h = \log_k(NBLK_{data})$$

Performance: RetrieveOne
- 100,000 records, each of size 500 bytes
- index record size = 20 bytes
- block size = 2000 bytes
- Full index: 1000 blocks: $\log_2 1000 \approx 10$ rba
- Indexed sequential (1 level): 250 blocks
  – $1 + \log_2 250 \approx 9$ rba
- Indexed sequential (multilevel): BF = 2000/20 = 100
  – $h = \lceil \log_{100} 100000 \rceil = 3$
  – 1 + h = 4 rba

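The three figures can be checked numerically. The sketch below assumes the logarithms for the single-level cases are base 2 (binary search) and are rounded up; the function names are hypothetical. `full_index_probes` counts index probes only, which is how the slide reports the full-index case.

```python
import math


def binary_search_probes(n_blocks):
    """Worst-case index blocks read by a binary search over n_blocks."""
    return math.ceil(math.log2(n_blocks))


def multilevel_height(n_entries, fanout):
    """Smallest h such that fanout ** h >= n_entries."""
    height, reach = 0, 1
    while reach < n_entries:
        reach *= fanout
        height += 1
    return height


full_index_probes = binary_search_probes(1000)       # index probes only
one_level_total = 1 + binary_search_probes(250)      # data block + index
multilevel_total = 1 + multilevel_height(100_000, 100)  # data block + h
```

The integer loop in `multilevel_height` avoids floating-point rounding in the logarithm; with a fanout of 100 the height stays at 3 for anything between 10,001 and 1,000,000 entries.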
Initial Loading

[Figure: step-by-step loading of the ordered keys 4, 7, 9, 34, 63, 66, 70, 71. The first steps place 4, 7, 9, and 34 into data blocks and create the index entries 7 and 34.]

Initial Loading

[Figure: loading continues with the keys 4, 7, 9, 34, 63, 66, 70, 71. Final state: data blocks 4 7 | 9 34 | 63 66 | 70 71; lower index blocks 7 34 and 66 x; upper index block 34 x.]

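The final state in the figure can be reproduced by building the index bottom-up from the ordered keys. This is a batch sketch rather than the record-by-record loading the slides step through, and it rests on assumptions: two records per data block, two entries per index block, a non-empty key list, and "x" treated as an open upper bound. The names `chunk` and `initial_load` are hypothetical.

```python
INF = float("inf")  # assumption: stands in for the "x" entries


def chunk(items, size):
    """Split a list into consecutive pieces of at most size items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def initial_load(sorted_keys, data_bf, index_bf):
    """Build data blocks and index levels (top level first).

    Each index entry is (highest_key, child_block_no). Requires a
    non-empty, ascending key list.
    """
    data_blocks = chunk(sorted_keys, data_bf)
    entries = [(block[-1], n) for n, block in enumerate(data_blocks)]
    levels = []
    while True:
        entries[-1] = (INF, entries[-1][1])  # last entry is open-ended
        blocks = chunk(entries, index_bf)
        levels.insert(0, blocks)
        if len(blocks) == 1:
            break  # a single block is the top (level 0) index
        entries = [(block[-1][0], n) for n, block in enumerate(blocks)]
    return data_blocks, levels


data_blocks, levels = initial_load([4, 7, 9, 34, 63, 66, 70, 71], 2, 2)
```

For these eight keys the result has two index levels: a top block with entries 34 and x, above two blocks holding 7, 34 and 66, x.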
Reorganization Point
- Reorganize when performance has deteriorated by 50% from the performance just after initial loading.
- Let n1 be the number of RetrieveOne operations in a unit of time.
- Let n2 be the number of RetrieveAll operations in a unit of time.
- Let L be the average length of the overflow record chains.

$$NBLK_{index} = 1 + k + k^2 + \cdots + k^{h-1} = \frac{k^h - 1}{k - 1}$$

$$NBLK_{data} = k^h$$

$$h = \log_k(NBLK_{data})$$

Physical Structure

[Figure: three-level index hierarchy. The master index points to the cylinder indexes, and each cylinder index points to its track indexes.]

Physical Structure

[Figure: disk layout by cylinder, tracks trk 0 to trk 19. On the first cylinder, trk 0 holds the master index, the cylinder index, a track index, and data blocks. On each later cylinder, trk 0 holds the cylinder index, a track index, and data blocks. Every other track starts with a track index followed by data blocks.]

Physical Structure

[Figure: two block layouts. In the first, the level 0 index is followed by repeated groups of a level 1 index block and its data blocks. In the second, each group also ends with an overflow area: level 1 index, data blocks, overflow, and so on.]

- Faster access
  – Mingling the data and index blocks: locality
  – Keep the master index (level 0 index) in RAM

Example
- 10,000 records, 160 bytes/record; the key is 16 bytes, a pointer is 4 bytes
- HP7925: 256 bytes/sector, 64 sectors/track, 9 tracks/cylinder, 815 cylinders
- Choose BF = 6: utilization = (160 x 6)/(256 x 4) = 93.8%
- 1 block = 1024 bytes: 1024/(16 + 4) = 51 index entries
- 1 track = 64/4 = 16 blocks
- 10,000 records: 10000/6 = 1667 blocks
- Number of cylinders = 1667/(16 x 9 - 10) = 13 cylinders
  – 16 blocks/track, 9 tracks/cylinder, less 10 index blocks per cylinder (9 track index blocks + 1 cylinder index block)

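The sizing steps in this example can be replayed as a short calculation. The sketch below takes the slide's parameters as given and rounds up wherever a partial block or cylinder must still be allocated; the function name `plan_layout` and its dictionary keys are hypothetical.

```python
import math


def plan_layout(n_records=10_000, record_size=160, key_size=16, ptr_size=4,
                sector_size=256, sectors_per_track=64, tracks_per_cyl=9,
                bf=6, sectors_per_block=4):
    """Replay the slide's sizing arithmetic for the HP7925 example."""
    block_size = sector_size * sectors_per_block            # 1024 bytes
    blocks_per_track = sectors_per_track // sectors_per_block
    # Per cylinder: one track index block per track + one cylinder index block.
    index_blocks_per_cyl = tracks_per_cyl + 1
    data_blocks_per_cyl = blocks_per_track * tracks_per_cyl - index_blocks_per_cyl
    data_blocks = math.ceil(n_records / bf)
    return {
        "block_size": block_size,
        "utilization": record_size * bf / block_size,
        "index_entries_per_block": block_size // (key_size + ptr_size),
        "blocks_per_track": blocks_per_track,
        "data_blocks": data_blocks,
        "data_blocks_per_cylinder": data_blocks_per_cyl,
        "cylinders": math.ceil(data_blocks / data_blocks_per_cyl),
    }


layout = plan_layout()
```

Each cylinder offers 16 x 9 = 144 blocks, of which 134 remain for data after the index blocks are set aside, so 1667 data blocks need 13 cylinders.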