This Lecture • Physical Database Design • RAID Arrays Efficiency and Storage • Parity • Database File Structures • Indexes Database Systems • Further reading Michael Pound • The Manga Guide to Databases, Chapter 5 • Database Systems, Chapters 7, 18 Physical Design Physical Design • Design so far • Hardware • File Structure • E/R modelling helps find Project Description • Correct storage • Often specifying the the requirements of a hardware should be structure of files on disks database selected to avoid loss of has a huge impact on E/R Diagram data performance • Normalisation helps to • Speed can also be • Implementation of these refine a design by removing data affected by careful structures is also specific Table Design redundancy consideration of to a DBMS hardware • Next we need to think • Indexes can be chosen to Physical Database • RAID Arrays further improve speed about how the files will Design actually be arranged File Structure Hardware Structure and stored RAID Arrays RAID Level 0 • RAID - redundant array • RAID Arrays are • Files are split across of independent controller by software several disks (Striping) Data • Each file is split into parts, (inexpensive) disks or hardware one part stored on each • Storing information • At the OS level the RAID disk in the same position across more than one array will appear to be a • Sizes of each part is Data1 Data2 Data3 physical disk single storage device determined by the array Data4 Data5 Data6 • Speed - can access more • The array may actually controller than one disk contain dozens of disks • Vastly improves speed, but • Robustness – disk failure • Different levels (RAID 0, no redundancy Disk 1 Disk 2 Disk 3 doesn’t always mean • If any disk fails, all data is RAID 1,…) data is lost unrecoverable 1
RAID Level 1 Parity Checking • Files are duplicated over • Above RAID 1 involves • For two bits, A and B, all disks (Mirroring) calculating parity bits XOR is true if either A is Data • Each file is copied onto true or B is true, but not • Parity reduces the every disk both number of disks you • Provides improved read need for redundancy Data1 Data1 Data1 ⊕ performance but no A B • Parity is often Data2 Data2 Data2 write performance 0 0 0 calculated using XOR ⊕ • Large amounts of 0 1 1 redundancy, if all but a Disk 1 Disk 2 Disk 3 single disk fail the data 1 0 1 can still be recovered 1 1 0 Parity Checking Recovery With Parity • The parity of n blocks of data is calculated as • If any single disk breaks, including the parity D 1 ⊕ D 2 ⊕ ... ⊕ D n disk, XOR can be used to re-calculate that • For example: value. For example: D 1 0 0 1 1 0 1 1 0 D 1 0 0 1 1 0 1 1 0 D 2 1 0 1 1 0 0 1 0 D 3 1 1 0 0 0 0 0 0 D 3 1 1 0 0 0 0 0 0 D P 0 1 0 0 0 1 0 0 D P 0 1 0 0 0 1 0 0 D 2 1 0 1 1 0 0 1 0 XOR is 0 if there is an even number of ‘1’ bits and 1 if there is an odd number RAID Level 3 RAID Level 5 • Data is striped over • Data is striped over Data disks, and a parity disk disks with a distributed Data for redundancy parity • Data is split into bytes, • Data is split into blocks, Data1 Data2 Data3 P(1-3) bytes are written to parity blocks are Data1 Data 2 Data3 Parity(1-3) Data4 Data5 P(4-6) Data6 separate disks distributed throughout Data4 Data5 Data6 Parity(4-6) all disks • The final disk stores Data7 P(7-9) Data8 Data9 • Extended version of parity information P(11-12) Data10 Data11 Data12 RAID 3, and will even • Extremely fast, and will Disk 1 Disk 2 Disk 3 Disk 4 allow continued use allow for 1 disk failure Disk 1 Disk 2 Disk 3 Disk 4 after 1 disk failure 2
Other RAID Issues File Structure • Other RAID levels • Considerations with • The structure of files on • In general file structure consider RAID systems a disk is separate to the is concerned with: • Allowing more than a • Cost of disks RAID configuration • How files are stored on a single disk failure with • Do you need speed or • File structure is disk the minimum of • In what order files are redundancy? managed by the DBMS redundancy stored • How reliable are the • Often the designer of a • Nested RAID levels. For • The speed at which a file individual disks? database can have example RAID 10 is a • ‘Hot swapping’ can be retrieved control over aspects of mirrored array of striped • The speed at which files arrays this structure can be inserted or deleted Pages and Rows Pages and Rows • Row data is generally • A page will include a • A table will often span • For a student table: not written to disk header and the row multiple pages Page 1 individually. Instead, data: • INSERTs will be added at 11011465 Jack ... 1 rows are grouped into 11011658 Robert ... 2 Page Header the last position in the pages 11044348 Sarah ... 3 Row 1 newest page • Pages are often used as Row 2 11051499 Max ... 1 • Additional pages can be the atomic unit of I/O, Row 3 Row 4 Page 2 any read or write to the added if the previous Row 5 11012234 James ... 2 disks will be a page, one is full 11034868 Mike ... 2 • There are usually many even if only a single row 11048345 Anne ... 1 is affected rows in a page, but not always Unordered Files Ordered Files • Data sorted by one or • Full pages can • Unordered files are often • There is no ordering on more fields is called a propagate along and called Heaps values however, so sequential file require many rewrites • New records are inserted searching must be • Inserting and deleting • One solution is to conducted linearly into the last page of the from an ordered file is temporarily add records • To delete a record, the file difficult to an overflow file, page is retrieved, a row is • If the page is full, a new these are merged at a • If there is sufficient marked as deleted, and page is added at the end later time the page is written back space on the correct of the file • Overflows improve page, a record can be • This space is difficult to • This structure makes inserted and that page inserts, but make reclaim, so performance insertion very efficient re-written searches more difficult will deteriorate over time 3
Binary Search Binary Search Data Rows Page 11010001 ... 1 • A huge benefit of an • Consider the Student Searching for: 11062365 11011123 ... 2 ordered data structure table, ordered by sID: • We begin the search by 11023134 ... 3 is the concept of the retrieving the page half 11025421 ... 4 binary search SELECT * 11031341 ... 5 way through the file • A binary search is only FROM Student 11034342 ... 6 • The ID values in the WHERE sID = 11062365; possible when 11045332 ... 7 page are smaller than searching for specific 11058543 ... 8 • A linear search through values in a field, where the one we're looking 11062365 ... 9 the database could the data is also ordered for so we next look 11072234 ... 10 involve thousands of by that field further down the file 11074122 ... 11 reads 11077898 ... 12 11082232 ... 13 11083239 ... 14 Binary Search Binary Search Data Rows Page Data Rows Page 11010001 ... 1 11010001 ... 1 Searching for: 11062365 Searching for: 11062365 11011123 ... 2 11011123 ... 2 • We retrieve the page • We next retrieve the 11023134 ... 3 11023134 ... 3 half way between the page half way through 11025421 ... 4 11025421 ... 4 two previous pages the remaining half that 11031341 ... 5 11031341 ... 5 • The value we are 11034342 ... 6 11034342 ... 6 contains our value looking for is on this 11045332 ... 7 11045332 ... 7 • The values are larger page. We read the 11058543 ... 8 11058543 ... 8 than the ID we are 11062365 ... 9 remaining row data 11062365 ... 9 looking for, so we must 11072234 ... 10 • Binary searches 11072234 ... 10 travel backwards in the 11074122 ... 11 11074122 ... 11 complete in Log 2 n time, 11077898 ... 12 11077898 ... 12 file which is often better 11082232 ... 13 11082232 ... 13 than a linear search 11083239 ... 14 11083239 ... 14 Indexes Indexes • Strictly speaking, the • An Index is a data • Indexes are not unlike • There are a number of relational model states structure that resides those you find in books types indexes that the ordering of alongside a table, • The aim is to simplify the • Primary indexes refer to tuples does not matter providing faster access search for key words or a sequential file ordered to the rows • In reality this is values by a key (unique) • An Index is associated inefficient, searching • Often much faster than • Clustered indexes refer and ordering are much with one of more fields, looking through the to a sequential file easier using sequential improving searches book linearly ordered by some fields files involving those fields that may not be unique • The index will be • We can obtain further • The underlying data • Secondary indexes exists ordered to improve improvements with may or may not be search efficiency separately to the data indexes ordered ordering 4
Recommend
More recommend