CS 525: Advanced Database Topics for today Organization • How to lay out data on disk 03: Disk Organization • How to move it to/from memory Boris Glavic Slides: adapted from a course taught by Hector Garcia-Molina, Stanford InfoLab CS 525 Notes 3 1 CS 525 Notes 3 2 What are the data items we want to store? What are the data items we want to store? • a salary • a salary • a name • a name • a date • a date • a picture • a picture What we have available: Bytes 8 bits CS 525 Notes 3 3 CS 525 Notes 3 4 To represent: To represent: • Integer (short): 2 bytes • Characters e.g., 35 is → various coding schemes suggested, 00000000 00100011 most popular is ASCII (1 byte encoding) Endian! Could as well be Example: 00100011 00000000 A: 1000001 • Real, floating point a: 1100001 n bits for mantissa, m for exponent…. 5: 0110101 LF: 0001010 CS 525 Notes 3 5 CS 525 Notes 3 6 1
To represent: To represent: • Boolean • Boolean 1111 1111 1111 1111 e.g., TRUE e.g., TRUE FALSE FALSE 0000 0000 0000 0000 • Application specific • Application specific e.g., enumeration e.g., RED → 1 GREEN → 3 RED → 1 GREEN → 3 BLUE → 2 YELLOW → 4 … BLUE → 2 YELLOW → 4 … Can we use less than 1 byte/code? Yes, but only if desperate... CS 525 Notes 3 7 CS 525 Notes 3 8 To represent: To represent: • Dates • String of characters e.g.: - Integer, # days since Jan 1, 1900 – Null terminated - 8 characters, YYYYMMDD e.g., c a t - 7 characters, YYYYDDD (not YYMMDD! Why?) – Length given e.g., • Time 3 c a t e.g. - Integer, seconds since midnight - Fixed length - characters, HHMMSSFF CS 525 Notes 3 9 CS 525 Notes 3 10 To represent: Key Point • Bag of bits Length Bits • Fixed length items • Variable length items - usually length given at beginning CS 525 Notes 3 11 CS 525 Notes 3 12 2
Overview Data Items Also Records • Type of an item: Tells us how to interpret Blocks (plus size if fixed) Files Memory CS 525 Notes 3 13 CS 525 Notes 3 14 Record - Collection of related data Types of records: items (called FIELDS) E.g.: Employee record: • Main choices: name field, – FIXED vs VARIABLE FORMAT – FIXED vs VARIABLE LENGTH salary field, date-of-hire field, ... CS 525 Notes 3 15 CS 525 Notes 3 16 Fixed format Example: fixed format and length A SCHEMA (not record) contains Employee record following information (1) E#, 2 byte integer - # fields (2) E.name, 10 char. Schema - type of each field (3) Dept, 2 byte code - order in record 55 s m i t h 02 - meaning of each field Records 83 j o n e s 01 CS 525 Notes 3 17 CS 525 Notes 3 18 3
Variable format Example: variable format and length • Record itself contains format 2 5 I 46 4 S 4 F O R D Code identifying Code for Ename “ Self Describing ” # Fields Length of str. field as E# Integer type String type Field name codes could also be strings, i.e. TAGS CS 525 Notes 3 19 CS 525 Notes 3 20 Variable format useful for: • EXAMPLE: var format record with repeating fields • “ sparse ” records Employee → one or more → children • repeating fields • evolving formats 3 E_name: Fred Child: Sally Child: Tom But may waste space... CS 525 Notes 3 21 CS 525 Notes 3 22 Note: Repeating fields does not imply Note: Repeating fields does not imply - variable format, nor - variable format, nor - variable size - variable size John Sailing Chess -- John Sailing Chess -- • Key is to allocate maximum number of repeating fields (if not used → null) CS 525 Notes 3 23 CS 525 Notes 3 24 4
Many variants between Record header - data at beginning fixed - variable format: that describes record Example: Include record type in record May contain: - record type 5 27 . . . . - record length record type record length - time stamp tells me what - null-value bitmap to expect - other stuff ... (i.e. points to schema) CS 525 Notes 3 25 CS 525 Notes 3 26 Record Header – null-map Other interesting issues: • Compression • SQL: NULL is special value for every data type – within record - e.g. code selection – Reserve one value for each data type as – collection of records - e.g. find common NULL? patterns • Easier solution • Encryption – Record header has a bitmap to store • Splitting of large records whether field is NULL – E.g., image field, store pointer – Only store non-NULL fields in record CS 525 Notes 3 27 CS 525 Notes 3 28 Separate Storage of Large Encrypting Records Values new • Store fields with large values separately record E(r) r trusted dbms – E.g., image or binary document processor – Records have pointers to large field E(r 1 ) content E(r 2 ) E(r 3 ) • Rationale E(r 4 ) ... – Large fields mostly not used in search conditions – Benefit from smaller records CS 525 Notes 3 29 CS 525 Notes 3 30 5
Encrypting Records Search Key in the Clear search search ?? Q: k=2 A: [2, E(b 2 )] F(r)=x k=2 trusted trusted dbms dbms processor processor E(r 1 ) [1, E(b 1 )] E(r 2 ) [2, E(b 2 )] E(r 3 ) [3, E(b 3 )] E(r 4 ) [4, E(b 4 )] ... ... • each record is [k,b] • store [k, E(b)] • can search for records with k=x CS 525 Notes 3 31 CS 525 Notes 3 32 Issues Encrypt Key • Hard to do range queries search • Encryption not good Q: k ’ =E(2) A: [E(2), E(b 2 )] k=2 trusted dbms processor • Better to use encryption that does not always generate same cyphertext [E(1), E(b 1 )] [E(2), E(b 2 )] [E(3), E(b 3 )] [E(4), E(b 4 )] ... k k E(k, random) • each record is [k,b] E D • store [E(k), E(b)] • can search for records with k=E(x) simplification CS 525 Notes 3 33 CS 525 Notes 3 34 How Do We Search Now? Solution? • Develop new decryption function: ??? D(f(k 1 ), E(k 2 , rand)) is true if k 1 =k 2 search A: [E(2,dhe), E(b 2 )] Q: k ’ =E(2) k=2 trusted [E(2, lkz), E(b 4 )] dbms processor [E(1, abc), E(b 1 )] [E(2, dhe), E(b 2 )] [E(3, nft), E(b 3 )] [E(2, lkz), E(b 4 )] ... • each record is [k,b] • store [E(k, rand), E(b)] • can search for records with k=E(x,???)? CS 525 Notes 3 35 CS 525 Notes 3 36 6
Solution? Issues? • Develop new decryption function: • Cannot do non-equality predicates D(f(k 1 ), E(k 2 , rand)) is true if k 1 =k 2 • Hard to build indexes Q: check if D(f(2),*) true search A: [E(2,dhe), E(b 2 )] k=2 [E(2, lkz), E(b 4 )] trusted dbms processor [E(1, abc), E(b 1 )] [E(2, dhe), E(b 2 )] [E(3, nft), E(b 3 )] [E(2, lkz), E(b 4 )] ... CS 525 Notes 3 37 CS 525 Notes 3 38 Next: placing records into blocks Next: placing records into blocks assume fixed length blocks blocks ... blocks ... a file a file assume a single file (for now) CS 525 Notes 3 39 CS 525 Notes 3 40 (1) Separating records Options for storing records in blocks: Block R1 R2 R3 (1) separating records (2) spanned vs. unspanned (a) no need to separate - fixed size recs. (3) sequencing (b) special marker (4) indirection (c) give record lengths (or offsets) - within each record - in block header CS 525 Notes 3 41 CS 525 Notes 3 42 7
(2) Spanned vs. Unspanned With spanned records: • Unspanned: records must be within one R3 R3 R7 R1 R2 R4 R5 R6 block (a) (b) (a) block 1 block 2 ... need indication R1 R2 R3 R4 R5 need indication of partial record of continuation (+ from where?) • Spanned “ pointer ” to rest block 1 block 2 R3 R3 R7 R1 R2 R4 R5 R6 (a) (b) (a) ... CS 525 Notes 3 43 CS 525 Notes 3 44 (3) Sequencing Spanned vs. unspanned: • Unspanned is much simpler, but may • Ordering records in file (and block) by waste space… some key value • Spanned essential if record size > block size Sequential file ( ⇒ sequenced) CS 525 Notes 3 45 CS 525 Notes 3 46 Why sequencing? Sequencing Options Typically to make it possible to efficiently (a) Next record physically contiguous read records in order (e.g., to do a merge-join — discussed later) ... R1 Next (R1) (b) Linked R1 Next (R1) CS 525 Notes 3 47 CS 525 Notes 3 48 8
Sequencing Options Sequencing Options (c) Overflow area (c) Overflow area header Records Records R1 R1 R2.1 in sequence R2 in sequence R2 R1.3 R3 R3 R4.7 R4 R4 R5 R5 CS 525 Notes 3 49 CS 525 Notes 3 50 (4) Indirection (4) Indirection • How does one refer to records? • How does one refer to records? Rx Rx Many options: Physical Indirect CS 525 Notes 3 51 CS 525 Notes 3 52 Purely Physical Fully Indirect E.g., Record ID is arbitrary bit string Device ID E.g., Record Cylinder # map Block ID Address = Track # rec ID or ID Block # r address Physical Offset in block Rec ID addr. a CS 525 Notes 3 53 CS 525 Notes 3 54 9
Recommend
More recommend