Architecture and Implementation of Database Systems
Summer 2016

Chapter 2: Storage
Disks, Buffer Manager, Files...

Torsten Grust
Wilhelm-Schickard-Institut für Informatik
Universität Tübingen

Outline
• Magnetic Disks
  • Access Time
  • Sequential vs. Random Access
• I/O Parallelism
  • RAID Levels 1, 0, and 5
• Alternative Storage Techniques
  • Solid-State Disks
  • Network-Based Storage
• Managing Space
  • Free Space Management
• Buffer Manager
  • Pinning and Unpinning
  • Replacement Policies
  • Databases vs. Operating Systems
• Files and Records
  • Heap Files
  • Free Space Management
  • Inside a Page
  • Alternative Page Layouts
• Recap

Database Architecture

[Figure: layered DBMS architecture. Applications (Web Forms, SQL Interface) send SQL Commands to the DBMS. Inside the DBMS: Parser, Optimizer, Operator Evaluator, and Executor; Files and Access Methods; Buffer Manager; Disk Space Manager; alongside these, the Transaction Manager, Lock Manager, and Recovery Manager. The DBMS operates on the Database (data files, indices, ...).]

The Memory Hierarchy

                        capacity          latency
CPU (with registers)    bytes             < 1 ns
caches                  kilo-/megabytes   < 10 ns
main memory             gigabytes         70–100 ns
hard disks              terabytes         3–10 ms
tape library            petabytes         varies

• Fast (but expensive and small) memory close to the CPU
• Larger, slower memory at the periphery
• DBMSs try to hide latency by using the fast memory as a cache.

Magnetic Disks

[Figure: hard disk with labels rotation, arm, track, block, sector, platter, heads. Photo: http://www.metallurgy.utah.edu/]

• A stepper motor positions an array of disk heads on the requested track
• Platters (disks) steadily rotate
• Disks are managed in blocks: the system reads/writes data one block at a time

Access Time

Data blocks can only be read and written if disk heads and platters are positioned accordingly.

• This design has implications on the access time to read/write a given block:

Definition (Access Time)
1. Move disk arms to desired track (seek time $t_s$)
2. Disk controller waits for desired block to rotate under disk head (rotational delay $t_r$)
3. Read/write data (transfer time $t_{tr}$)

⇒ access time: $t = t_s + t_r + t_{tr}$

Example: Seagate Cheetah 15K.7 (600 GB, server-class drive)

• Seagate Cheetah 15K.7 performance characteristics:
  • 4 disks, 8 heads, avg. 512 kB/track, 600 GB capacity
  • rotational speed: 15 000 rpm (revolutions per minute)
  • average seek time: 3.4 ms
  • transfer rate ≈ 163 MB/s

Question: What is the access time to read an 8 kB data block?
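
The question can be worked through with a short calculation. The following is a minimal sketch, not part of the original slides: the function name `access_time_ms` is hypothetical, the rotational delay is assumed to be half a revolution on average, and 1 MB is taken as 1 000 kB.

```python
def access_time_ms(seek_ms, rpm, transfer_mb_per_s, block_kb):
    """Average access time t = t_s + t_r + t_tr, in milliseconds."""
    revolution_ms = 60_000 / rpm            # duration of one full revolution
    t_r = revolution_ms / 2                 # assumption: wait half a revolution on average
    kb_per_ms = transfer_mb_per_s           # 1 MB/s = 1 000 kB/s = 1 kB/ms
    t_tr = block_kb / kb_per_ms             # time to transfer one block
    return seek_ms + t_r + t_tr


# Seagate Cheetah 15K.7 figures from the slide, 8 kB block
t = access_time_ms(seek_ms=3.4, rpm=15_000, transfer_mb_per_s=163, block_kb=8)
print(f"{t:.2f} ms")  # prints "5.45 ms"
```

Seek time and rotational delay dominate: the transfer itself contributes only about 0.05 ms of the total.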

Sequential vs. Random Access

Example (Read 1 000 blocks of size 8 kB)
• random access: $t_{rnd} = 1\,000 \cdot 5.45\ \text{ms} = 5.45\ \text{s}$
• sequential read of adjacent blocks:
  $t_{seq} = t_s + t_r + 1\,000 \cdot t_{tr} + 16 \cdot t_{s,\text{track-to-track}}$
  $\phantom{t_{seq}} = 3.40\ \text{ms} + 2.00\ \text{ms} + 50\ \text{ms} + 3.2\ \text{ms} \approx 58.6\ \text{ms}$

The Seagate Cheetah 15K.7 stores an average of 512 kB per track, with a 0.2 ms track-to-track seek time; our 8 kB blocks are spread across 16 tracks.

⇒ Sequential I/O is much faster than random I/O
⇒ Avoid random I/O whenever possible
⇒ As soon as we need at least $\frac{58.6\ \text{ms}}{5\,450\ \text{ms}} \approx 1.07\,\%$ of a file, we better read the entire file sequentially
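
The comparison can be reproduced in a few lines. This is a sketch under the slide's own rounded values ($t_s$ = 3.4 ms, $t_r$ = 2.0 ms, $t_{tr}$ = 0.05 ms, 512 kB per track, 0.2 ms track-to-track seek); the function and constant names are hypothetical.

```python
import math

T_S, T_R, T_TR = 3.4, 2.0, 0.05      # ms: seek, rotational delay, transfer per 8 kB block
TRACK_KB, TRACK_SEEK = 512, 0.2      # avg. track capacity (kB), track-to-track seek (ms)


def random_read_ms(n_blocks):
    """Every block pays the full access time t_s + t_r + t_tr."""
    return n_blocks * (T_S + T_R + T_TR)


def sequential_read_ms(n_blocks, block_kb=8):
    """One initial seek and rotational delay, then transfers plus track switches."""
    tracks = math.ceil(n_blocks * block_kb / TRACK_KB)
    return T_S + T_R + n_blocks * T_TR + tracks * TRACK_SEEK


rnd = random_read_ms(1_000)
seq = sequential_read_ms(1_000)
print(f"random: {rnd:.1f} ms, sequential: {seq:.1f} ms")
print(f"break-even fraction of the file: {seq / rnd:.1%}")
```

The break-even fraction comes out at roughly 1 %: once a query touches more than about one block in a hundred at random, a full sequential scan of the file is cheaper.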

Performance Tricks

• Disk manufacturers play a number of tricks to improve performance:

track skewing
  Align sector 0 of each track to avoid rotational delay during longer sequential scans

request scheduling
  If multiple requests have to be served, choose the one that requires the smallest arm movement (SPTF: shortest positioning time first, elevator algorithms)

zoning
  Outer tracks are longer than the inner ones. Therefore, divide outer tracks into more sectors than inner tracks
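
Request scheduling can be illustrated with a small simulation. This is a simplified sketch, not how any particular disk controller works: it measures arm movement in track numbers only and ignores rotational position (which a true SPTF scheduler would also take into account). All function names and the request queue are hypothetical.

```python
def shortest_seek_first(head, requests):
    """Greedy order: always serve the pending track closest to the head."""
    pending = list(requests)
    order = []
    while pending:
        nxt = min(pending, key=lambda track: abs(track - head))
        pending.remove(nxt)
        order.append(nxt)
        head = nxt
    return order


def elevator(head, requests):
    """Sweep upward from the head position, then sweep back downward."""
    up = sorted(t for t in requests if t >= head)
    down = sorted((t for t in requests if t < head), reverse=True)
    return up + down


def arm_travel(head, order):
    """Total number of tracks the arm crosses when serving `order`."""
    total = 0
    for track in order:
        total += abs(track - head)
        head = track
    return total


queue = [98, 183, 37, 122, 14, 124, 65, 67]   # requested tracks, in arrival order
start = 53                                     # current head position
print("arrival order:      ", arm_travel(start, queue))                              # 640
print("shortest seek first:", arm_travel(start, shortest_seek_first(start, queue)))  # 236
print("elevator:           ", arm_travel(start, elevator(start, queue)))             # 299
```

Both reordering strategies cut arm movement to well under half of what serving requests in arrival order would cost. The elevator sweep travels somewhat further than the greedy order here, but it bounds how long any single request can wait.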

Evolution of Hard Disk Technology

Disk seek and rotational latencies have improved only marginally over the last years (≈ 10 % per year)

But:
• Throughput (i.e., transfer rate) improves by ≈ 50 % per year
• Hard disk capacity grows by ≈ 50 % every year

Therefore:
• Random access cost hurts even more as time progresses

Example (5 Years Ago: Seagate Barracuda 7200.7)
Read 1K blocks of 8 kB sequentially/randomly: 397 ms / 12 800 ms
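
To see the trend in numbers, the sequential and random timings quoted for the two drives can be put side by side. This is a small sketch using only figures from these slides (the Cheetah 15K.7 values are those of the earlier 1 000-block example); the function name `random_penalty` is hypothetical.

```python
def random_penalty(sequential_ms, random_ms):
    """How many times slower random I/O is than sequential I/O."""
    return random_ms / sequential_ms


# (sequential ms, random ms) for reading 1 000 blocks of 8 kB, as quoted on the slides
drives = {
    "Seagate Barracuda 7200.7": (397.0, 12_800.0),
    "Seagate Cheetah 15K.7": (58.6, 5_450.0),
}

for name, (seq_ms, rnd_ms) in drives.items():
    print(f"{name}: random is {random_penalty(seq_ms, rnd_ms):.0f}x slower")
```

The older drive pays a factor of about 32 for random access, the newer one a factor of about 93: absolute random-access time improved, but the gap relative to sequential access widened.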

Ways to Improve I/O Performance

The latency penalty is hard to avoid

But:
• Throughput can be increased rather easily by exploiting parallelism
• Idea: Use multiple disks and access them in parallel, try to hide latency

TPC-C: An industry benchmark for OLTP
A recent #1 system (IBM DB2 9.5 on AIX) uses
• 10,992 disk drives (73.4 GB each, 15,000 rpm) (!) plus 8 internal SCSI drives (146.8 GB each),
• connected with 68 fibre channel adapters (4 Gbit each),
• yielding 6 million transactions per minute

Disk Mirroring

• Replicate data onto multiple disks:

[Figure: three disks, each holding an identical copy of blocks 1 2 3 4 5 6 7 8 9 ···]

• Achieves I/O parallelism only for reads
• Improved failure tolerance: can survive one disk failure
• This is also known as RAID 1 (mirroring without parity)
  (RAID: Redundant Array of Inexpensive Disks)
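
The asymmetry between reads and writes under mirroring can be sketched with an in-memory toy model. This is an illustration only, not a real RAID implementation: the class `MirroredVolume`, its round-robin read policy, and the `failed` set are assumptions made for the example.

```python
class MirroredVolume:
    """Toy model of mirroring: each disk is a dict from block number to data."""

    def __init__(self, n_disks=2):
        self.disks = [{} for _ in range(n_disks)]
        self.failed = set()     # indices of disks that have failed
        self._turn = 0          # round-robin counter for reads

    def write(self, block_no, data):
        # Every healthy replica must be updated, so writes gain no parallelism.
        for i, disk in enumerate(self.disks):
            if i not in self.failed:
                disk[block_no] = data

    def read(self, block_no):
        # Any healthy replica can serve a read, so reads can be spread over disks.
        healthy = [i for i in range(len(self.disks)) if i not in self.failed]
        if not healthy:
            raise IOError("all replicas have failed")
        i = healthy[self._turn % len(healthy)]
        self._turn += 1
        return i, self.disks[i][block_no]


vol = MirroredVolume(n_disks=2)
vol.write(1, "block one")
print(vol.read(1))      # served by disk 0
print(vol.read(1))      # served by disk 1
vol.failed.add(0)       # simulate a failure of disk 0
print(vol.read(1))      # still readable, now from disk 1
```

Two consecutive reads land on different disks and could proceed in parallel, while each write touches every replica. After one disk fails, the data is still available from the surviving copy.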