Storing Data: Disks and Files
Garcia-Molina, Ullman, Widom; Ramakrishnan/Gehrke Ch. 9
"Digital information lasts forever - or five years, whichever comes first." -- Jeff Rothenberg, RAND Corp., 1997
340151 Big Data & Cloud Computing (P. Baumann)

Why Not Everything in Main Memory?
Costs too much
• [Rama/Gehrke] $1000 will buy you either 128 MB of RAM or 7.5 GB of disk
• Today: 80 EUR will buy you either 4 GB of RAM or 1 TB of disk
• ...but today we have multi-terabyte databases!
Main memory is volatile
• We want data to be saved between runs (obviously!)
Typical storage hierarchy:
• Main memory (RAM) for currently used data
• Disk for main database (secondary storage)
• Tapes for archiving older versions of data (tertiary storage)

Storage Capacity
Absolute times as of 2003, but the ratios are still roughly the same

Storage Cost
Again, absolute values as of 2003, but the ratios are still roughly the same

Storage Hierarchies
• Primary memory: main memory
• Secondary memory: magnetic disks, RAID systems
• Tertiary memory: magneto-optical media, optical media, magnetic tapes
Going down the hierarchy, storage becomes larger, cheaper, and slower (storage capacity increases)

Numbers
Source: http://carlos.bueno.org/2014/11/cache.html

Nearline (Tertiary) Storage
Usually tape
• Formerly reels, today cartridges
• Capacity: 10 GB ... ~6 TB per tape
Tape robots
• HSM = Hierarchical Storage Management
• Multi-petabyte capacities

Caching & Virtual Memory
Cache: fast memory holding frequently used parts of a slower, larger memory
• A small (L1) cache holds the few kilobytes of memory "most recently used" by the processor
• Most operating systems keep the most recently used "pages" of memory in main memory and put the rest on disk
Virtual memory
• Programs don't know whether they are accessing main memory or a page on secondary memory (most operating systems)
Database systems usually take explicit control over secondary memory access

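The "keep the most recently used pages in fast memory" idea can be sketched as a tiny least-recently-used (LRU) page cache. This is only an illustration: the class name `LRUPageCache`, the `read_page_from_disk` callback, and the capacity of 2 pages are assumptions made for the example, not how any particular operating system or DBMS implements its buffer.

```python
from collections import OrderedDict


class LRUPageCache:
    """Keeps at most `capacity` pages in memory; evicts the least recently used."""

    def __init__(self, capacity, read_page_from_disk):
        self.capacity = capacity
        # Hypothetical slow loader standing in for a real disk read.
        self.read_page_from_disk = read_page_from_disk
        self.pages = OrderedDict()  # page_id -> data, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, page_id):
        if page_id in self.pages:
            self.hits += 1
            self.pages.move_to_end(page_id)  # mark as most recently used
            return self.pages[page_id]
        self.misses += 1
        data = self.read_page_from_disk(page_id)
        self.pages[page_id] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
        return data


cache = LRUPageCache(capacity=2, read_page_from_disk=lambda pid: f"page-{pid}")
for pid in [1, 2, 1, 3, 2]:
    cache.get(pid)
print(cache.hits, cache.misses, list(cache.pages))
```

A DBMS buffer manager typically uses its own replacement policy because it knows more about upcoming accesses than the operating system does, which is one reason it takes explicit control.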
Where Databases Reside
Hard disk is the secondary storage device of choice
• Many flavors: Disk: Floppy (hard, soft); Winchester; RAM disks; Optical, CD-ROM; Arrays
Main advantage over tapes: random access vs. sequential
Data is stored and retrieved in units called disk blocks or pages
Unlike RAM, time to retrieve a disk page varies depending upon location on disk
• Relative placement of pages on disk has major impact on DBMS performance!

The Miracle Called "Hard Disk"
Disk head contains a magnet, hovering over the spinning platter
• Flight height: 10-20 nm (x 5,000 gives one hair!)

Components of a Disk
• Platters spin
• Arm assembly moves in or out to position a head on the desired track
• Tracks under heads = a cylinder (imaginary!)
• Block size = N * sector size (sector size is fixed)
...typical numbers?

Typical Numbers
• Diameter: 1 inch ... 15 inches
• Cylinders: 40 (floppy) ... 20,000
• Surfaces: 1 (old CDs) ... 2 (floppies) ... 30
• Sector size: 512 B ... 50 kB
• Capacity: 360 kB (old floppy) ... 4 TB

Disk Access Time
"I want block X" -> ? -> block X in memory

Disk Access Time
Time = Seek Time + Rotational Delay + Transfer Time + Other

Seek Time
Time = Seek Time + Rotational Delay + Transfer Time + Other

Average Random Seek Time
Time = Seek Time + Rotational Delay + Transfer Time + Other
Typical S: 10 ms ... 40 ms
• Several orders of magnitude slower than a RAM access (roughly 10^5 to 10^6 times)!

Average Rotational Delay
Time = Seek Time + Rotational Delay + Transfer Time + Other
R = 1/2 revolution
• Typical R = 4.17 ms (7,200 RPM)

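As a quick check of the half-revolution rule, the average rotational delay can be computed from the spindle speed. In this small sketch the function name is made up for illustration, and the 5,400 and 15,000 RPM rows are assumed example speeds added for comparison; only 7,200 RPM comes from the slide.

```python
def average_rotational_delay_ms(rpm):
    """Average wait is half a revolution; one revolution takes 60,000 / rpm ms."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2


# 7,200 RPM is the slide's example; the other speeds are illustrative assumptions.
for rpm in (5400, 7200, 15000):
    print(f"{rpm} RPM: {average_rotational_delay_ms(rpm):.2f} ms")
```

At 7,200 RPM one revolution takes about 8.33 ms, so the average wait is roughly 4.2 ms.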
Transfer Rate
Time = Seek Time + Rotational Delay + Transfer Time + Other
Transfer rate: t
• Typical t: 10 ... 50 MB/second
Transfer time: T = block size / t
Ex: block size 32 kB, t = 32 MB/second: transfer time = …?

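The exercise can be worked through with T = block size / t. The sketch below assumes decimal units (1 kB = 1,000 bytes, 1 MB = 1,000,000 bytes); with binary units the result is marginally smaller. The function name is illustrative.

```python
KB = 1_000        # assumption: decimal kilobyte
MB = 1_000_000    # assumption: decimal megabyte


def transfer_time_ms(block_size_bytes, rate_bytes_per_s):
    """T = block size / t, converted from seconds to milliseconds."""
    return block_size_bytes * 1000 / rate_bytes_per_s


t_ms = transfer_time_ms(32 * KB, 32 * MB)
print(f"transfer time: {t_ms:.2f} ms")
```

Under these assumptions a 32 kB block at 32 MB/second takes about 1 ms to transfer, which is small compared to typical seek times and rotational delays.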
Other Delays
Time = Seek Time + Rotational Delay + Transfer Time + Other
• CPU time to issue I/O
• Contention for controller
• Contention for bus, memory
Typical value: 0 (relative to other values)

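Putting the four terms together, a rough estimate for one random block access might look like the following sketch. The inputs are taken from the typical values quoted on the slides (10 ms seek at the low end, 7,200 RPM, a 32 kB block at 32 MB/second, other delays treated as 0); decimal units and the function name are assumptions.

```python
def disk_access_time_ms(seek_ms, rpm, block_size_bytes, rate_bytes_per_s, other_ms=0.0):
    """Time = Seek Time + Rotational Delay + Transfer Time + Other (all in ms)."""
    rotational_delay_ms = 60_000 / rpm / 2                   # half a revolution
    transfer_ms = block_size_bytes * 1000 / rate_bytes_per_s
    return seek_ms + rotational_delay_ms + transfer_ms + other_ms


# Assumed example values: 10 ms seek, 7,200 RPM, 32 kB block, 32 MB/second.
total_ms = disk_access_time_ms(
    seek_ms=10, rpm=7200, block_size_bytes=32_000, rate_bytes_per_s=32_000_000
)
print(f"random access: {total_ms:.2f} ms")
```

With these numbers the seek and the rotational delay account for about 14 of the roughly 15 ms, so positioning rather than data transfer dominates a random access.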
Sequential Read?
So far: random block access
What about: reading the next block?
Disks are optimized for "consecutive" reading!
• Blocks within track
• Tracks within cylinder
• Next cylinder

"Next Block" Costs
"Next" block concept:
• blocks on same track, followed by
• blocks on same cylinder, followed by
• blocks on adjacent cylinder
If we don't need to change cylinder:
Time to get block = block size / t + negligible
• + switch track (i.e., read with the next head)
• + once in a while, move to the next cylinder

Random vs Sequential Read
Rule of thumb:
• Random I/O: expensive
• Sequential I/O: less expensive
Ex: 1 KB block:
• Random I/O: ~20 ms
• Sequential I/O: ~1 ms
• Relative difference is smaller for larger blocks
Whenever possible, arrange file blocks sequentially on disk (by "next") to minimize seek and rotational delay
• For a sequential scan, pre-fetching several pages at a time is a big win! ("burst read")

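To see why the relative difference shrinks for larger blocks, here is a small sketch using a simple cost model. The constants are assumptions chosen so that a 1 KB block reproduces the example figures (about 20 ms random, about 1 ms sequential): 19 ms of positioning overhead per random access and 1 ms of transfer per KB.

```python
OVERHEAD_MS = 19.0  # assumed seek + rotational delay per random access
MS_PER_KB = 1.0     # assumed transfer time per KB


def random_read_ms(block_kb):
    """Random access pays the positioning overhead plus the transfer."""
    return OVERHEAD_MS + block_kb * MS_PER_KB


def sequential_read_ms(block_kb):
    """The next block in sequence pays (almost) only the transfer."""
    return block_kb * MS_PER_KB


for block_kb in (1, 4, 16, 64):
    r = random_read_ms(block_kb)
    s = sequential_read_ms(block_kb)
    print(f"{block_kb:>2} KB block: random {r:.0f} ms, "
          f"sequential {s:.0f} ms, ratio {r / s:.1f}x")
```

The fixed positioning overhead is spread over more data as blocks grow, so the random-to-sequential ratio falls from 20x at 1 KB towards 1x for very large blocks.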
...Writing?
Cost for writing ≈ cost for reading
... unless we want to verify!
• Then we need to add: block size / t + one (full) rotation

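The verify surcharge can be added to the access-time formula as in the sketch below. The example values (10 ms seek, 7,200 RPM, 32 kB block, 32 MB/second, decimal units) and the function name are assumptions for illustration.

```python
def write_time_ms(seek_ms, rpm, block_size_bytes, rate_bytes_per_s, verify=False):
    """Write cost ~ read cost; verifying adds one full rotation plus one transfer."""
    revolution_ms = 60_000 / rpm
    transfer_ms = block_size_bytes * 1000 / rate_bytes_per_s
    time_ms = seek_ms + revolution_ms / 2 + transfer_ms
    if verify:
        # Wait for the block to come around again, then read it back.
        time_ms += revolution_ms + transfer_ms
    return time_ms


args = dict(seek_ms=10, rpm=7200, block_size_bytes=32_000, rate_bytes_per_s=32_000_000)
plain_ms = write_time_ms(**args)
verified_ms = write_time_ms(**args, verify=True)
print(f"write: {plain_ms:.2f} ms, write + verify: {verified_ms:.2f} ms")
```

Under these assumptions verification adds roughly 9.3 ms to a write that otherwise takes about 15.2 ms.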
...To Modify a Block?
(a) Read block
(b) Modify in memory
(c) Write block
[ (d) Verify ]

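The steps (a) to (d) can be sketched against an ordinary file. The 4,096-byte block size, the helper names, and the temporary file are assumptions for this example. The read-back in step (d) will usually be served from the operating system's cache, so here it only illustrates the sequence of steps and is not a true check of the physical medium.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # assumed block size for this sketch


def modify_block(path, block_no, modify, verify=True):
    """Read-modify-write one block of a file, optionally reading it back."""
    offset = block_no * BLOCK_SIZE
    with open(path, "r+b") as f:
        f.seek(offset)
        block = bytearray(f.read(BLOCK_SIZE))  # (a) read block
        modify(block)                          # (b) modify in memory
        f.seek(offset)
        f.write(block)                         # (c) write block
        f.flush()
        os.fsync(f.fileno())                   # ask the OS to push it to the device
        if verify:                             # (d) verify by reading back
            f.seek(offset)
            if f.read(BLOCK_SIZE) != bytes(block):
                raise IOError("verification failed")


def set_first_byte(block):
    block[0] = 0xFF


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    with open(path, "wb") as f:
        f.write(bytes(3 * BLOCK_SIZE))         # three zero-filled blocks
    modify_block(path, 1, set_first_byte)
    with open(path, "rb") as f:
        data = f.read()
print(data[BLOCK_SIZE], len(data))
```

Even a one-byte change costs a full block read and a full block write, which is why modifying a block is roughly twice as expensive as reading it.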
Wrap-Up
Capacities grow, but data hunger grows faster
• Moore's Law vs Greg's Law vs disk growth
Databases are heavily I/O-bound
• Disk space management largely determines performance
Disk access time = Seek Time + Rotational Delay + Transfer Time + Other