Data Storage and Disk Structure
A Simple Implementation of DBMS • One file per table – Students(name, id, dept) in a file Students – A meta symbol “#” to separate attributes Smith#123#CS Johnson#522#EE … • Database schema in a special file Schema Students#name#STR#id#INT#dept#STR Depts#name#STR#office#Str … CMPT 454: Database II -- Storage and Disk Structure 2
Naïve Query Answering SELECT * FROM Students WHERE dept = ‘CS’ | CSStud • Read file Schema to determine the attributes of relation Student and their types • Check that condition dept = ‘CS’ is semantically valid for Students • Create a new file CSStud • Read file Students, for each line – Check condition dept = ‘CS’, if it is true then write the line as a tuple to file CSStud • Add to the file Schema a line about CSStud • Problems – If we change EE to ECON in one tuple in Students, the entire file has to be rewritten – Even if we look for one student, we have to read the whole file – If multiple users read/write file Students simultaneously, what would happen? CMPT 454: Database II -- Storage and Disk Structure 3
Handling Joins SELECT office FROM Students, Depts WHERE Students.name = ‘Smith’ AND Students.dept = Depts.name; • Algorithm FOR each tuple s in Students DO FOR each tuple d in depts DO IF s.name = ‘Smith’ AND s.dept = d.dept THEN write d.office as a tuple to the output • More problems – Why do we need to match a student “Cindy” with all departments? – I/O Complexity: O(n 2 ), costly! – What if the system crashes? CMPT 454: Database II -- Storage and Disk Structure 4
Storage Device Hierarchy • How should we store data on disks so that queries can be answered efficiently? • How can we organize disks effectively so that a database built on top can be more efficient and robust? CMPT 454: Database II -- Storage and Disk Structure 5
Where Is DMBS? Cache Main memory DBMS Disk Virtual memory File system Tertiary storage CMPT 454: Database II -- Storage and Disk Structure 6
Volatile/Nonvolatile, Trustworthy Storage • A volatile device “forgets” what is stored in it when the power goes off – Main memory, cache • A nonvolatile device keeps its contents intact even for long periods when the device is turned off or there is a power failure – Disk, CD, USB flash memory, … • When a customer and a bank have controversy on a transaction … Let us check the transaction record – oops, the records are stored on a disk in the bank … • Trustworthy: once the data is stored, it cannot be changed later – WORM model: write once read many, e.g., CD ROM, ROM memory chips – WORM storage systems CMPT 454: Database II -- Storage and Disk Structure 7
Moving Head Disk Mechanism CMPT 454: Database II -- Storage and Disk Structure 8
Disk Controller • Control the actuator to move the head assembly • Select a surface to read or write • Transfer the bits read/to be written between main memory and the desired sector Disk CPU Disk Disk controller Disk Main memory Disk Bus CMPT 454: Database II -- Storage and Disk Structure 9
Disk Access Characteristics • How is a desired data block read? – The heads are positioned at the cylinder containing the track on which the block is located – The sectors containing the block move under the disk head as the entire disk assembly rotates • Latency of the disk: the time taken between the moment at which the command to read a block is issued and the time that the contents of the block appear in main memory CMPT 454: Database II -- Storage and Disk Structure 10
Breakdown of Disk Latency • The time taken by processor and disk controller to process the request – Typically a fraction of a millisecond, can be neglected • Seek time: the time to position the head assembly at the proper cylinder – Typically tens of milliseconds • Rotation latency: the time for the disk to rotate so the first of the sectors containing the block reaches the head – A typical disk rotates completely once very 10 milliseconds, thus, the expected rotational latency is about 5 milliseconds • Transfer time: the time it takes the sectors of the block and any gaps between them to rotate past the head – If a disk has 250 kb per track and rotates once in 10 milliseconds, the read rate is 25 mb per second CMPT 454: Database II -- Storage and Disk Structure 11
Breakdown of Disk Latency CMPT 454: Database II -- Storage and Disk Structure 12
Organizing Data by Cylinders • To reduce seek time, we can store data that is likely to be accessed together on a single cylinder or several adjacent cylinders • If all bocks to read/write are on a single track or on a cylinder consecutively, only one seek time and one rotational latency are needed • Writing blocks is quite analogous to reading a block – If it is required to verify whether the written block is correct, wait for an additional rotation and read each sector back • Modifying blocks – Read the block into main memory – Make changes to the block in the main memory copy of the block – Write the new contents of block back to the disk – If necessary, verify the write CMPT 454: Database II -- Storage and Disk Structure 13
Stripping and Mirroring • Bit-level stripping – Write bit i of each byte to disk i – Access is sped up for 8 times • Block-level stripping – With an array of n disks, block I of the disk array is written to disk (i mod n) + 1 and use block ⎣ i / n ⎦ of the disk • Mirroring: maintaining multiple copies CMPT 454: Database II -- Storage and Disk Structure 14
Stripping versus Mirroring Block-level stripping CMPT 454: Database II -- Storage and Disk Structure 15
RAID Levels 0, 1, 2, and 3 • Redundant Array of Independent/Inexpensive Drives • RAID 0: striping at the block level, no redundancy • RAID 1 mirroring and block striping • RAID 2: byte level striping + error-correcting codes • RAID 3: bit-interleaved parity organization CMPT 454: Database II -- Storage and Disk Structure 16
RAID Levels 4, 5, and 6 • RAID 4: block-level striping + a parity block • RAID 5: parity distributed in all disks • RAID 6: storing more redundant information to recover from multiple disk failures CMPT 454: Database II -- Storage and Disk Structure 17
RAM Model and I/O Model • RAM Model: when data can be held in main memory, the bottleneck of computation is CPU – Each data item can be accessed using approximate the same amount of time • I/O Model: if a block needs to be moved between disk and main memory, the time taken to perform the read/write is much longer than the time likely to be used to manipulate the data in main memory – Databases often cannot fit into main memory • The number of block access is a good approximation to the time needed by the algorithm and should be minimized CMPT 454: Database II -- Storage and Disk Structure 18
Sorting in Main Memory • Sorting 5,000,000 tuples, each tuple takes 128 bytes, totally 640 Mb data – cannot fit into a computer with 512 Mb main memory • Each disk block has 16 kb and can contain 128 tuples – 39,063 disk blocks • Quicksort? Fastest if all data is in memory – Need to randomly access data items, many accesses to disk blocks if data is on disk CMPT 454: Database II -- Storage and Disk Structure 19
Merge Sort • Basis: if the length of the list is one, return • Induction: if the list is of length more than one, divide the list into two lists that are either of the same length or as close as possible, recursively sort the two sublists and then merge the resulting sorted sublists into one sorted list • The time to merge two lists in main memory is O(n 1 +n 2 ) • Log 2 n phases in total, total cost O(n log n) CMPT 454: Database II -- Storage and Disk Structure 20
Two-Phase, Multiway Merge Sort • Phase 1 – sort main memory sized partitions into sorted sublists – Fill all available main memory with blocks from the original tuples to be sorted – Sort the records that are in main memory – Write the sorted records from main memory onto new blocks of disk, form one sorted list • Phase 2 – merge all the sorted sublists into a single sorted list – The number of sublists is smaller than the number of blocks in the available main memory CMPT 454: Database II -- Storage and Disk Structure 21
How to Merge? • Find the smallest key among the first remaining elements of all the lists – A linear search of the heads of all sublists which are in main memory • Move the smallest element to the first available position of the output block • If the output block is full, write it to disk and reinitialize the same buffer in main memory to hold the next output block • If the block from which the smallest element was just taken is now exhausted of records, read the next block from the same sorted sublist into the same buffer • Cost analysis – Blocks are read in an unpredictable order – Every block holding records from one of the sorted lists is read from disk only once – The number of block written to disk is the same of the blocks holding the sublists – Each record in a sublist will be moved to the output buffer exactly once CMPT 454: Database II -- Storage and Disk Structure 22
Recommend
More recommend