Data Management Systems
• Storage Management
  • Introduction and motivation
  • Memory hierarchy
  • Segments and file storage
  • Database buffer cache
  • Storage techniques in context
Gustavo Alonso
Institute of Computing Platforms
Department of Computer Science
ETH Zürich
Storage - Introduction
Architecture of a database
[Figure: layered architecture, top to bottom; labels listed per layer in the order they appear on the slide]
• Relations, views | Application | Queries, Transactions (SQL)
• Logical data (tables, schemas) | Logical view (logical data) | Record Interface
• Logical records (tuples) | Access Paths | Record Access
• Physical records | Physical data in memory | Page access
• Page structure | Pages in memory | File Access
• Storage allocation | Blocks, files, segments | Physical storage
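The lower layers of this architecture (record access, page access, file access) can be illustrated with a small sketch of how a record number might be resolved to a page and then to a byte offset in a file. This is only an illustration under stated assumptions: the page size, the fixed record size, and the names `RecordId`, `record_id`, and `file_offset` are hypothetical and do not come from the slides.

```python
from typing import NamedTuple

PAGE_SIZE = 4096      # assumed page size in bytes
RECORD_SIZE = 128     # assumed fixed record size in bytes
RECORDS_PER_PAGE = PAGE_SIZE // RECORD_SIZE


class RecordId(NamedTuple):
    """Hypothetical record identifier: which page, and which slot within it."""
    page_no: int
    slot: int


def record_id(n: int) -> RecordId:
    """Record access: map the n-th record (0-based) to a page and a slot."""
    return RecordId(n // RECORDS_PER_PAGE, n % RECORDS_PER_PAGE)


def file_offset(rid: RecordId) -> int:
    """Page/file access: map a record identifier to a byte offset in the file."""
    return rid.page_no * PAGE_SIZE + rid.slot * RECORD_SIZE
```

Each function stands in for one interface between layers: the upper layer only asks for a record, and the layers below decide which page and which bytes of the file hold it.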
Storage Management
[Figure: the same layered architecture diagram as on the previous slide]
Background
• Databases are known for providing, among others, two key guarantees:
  • Persistence = the data is recorded on permanent media and is still there after the database restarts
  • Recovery = the data remains consistent regardless of what failures occur (this might involve a recovery procedure)
• These guarantees make application development significantly easier
  • Often a key element when deciding to use a database
• In this series of lectures, we will explore how databases manage storage
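The persistence guarantee can be made concrete with a minimal sketch, assuming a toy append-only file rather than any real database engine. The function names (`durable_append`, `read_all`, `demo`) and the record format are hypothetical. The point is that a write only counts as persistent once the operating system has been asked to push it to the storage device, here through `os.fsync`.

```python
import os
import tempfile


def durable_append(path: str, record: bytes) -> None:
    """Append one record and ask the OS to flush it to the device before returning."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        view = memoryview(record)
        while view:                      # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                     # without this, data may sit only in OS caches
    finally:
        os.close(fd)


def read_all(path: str) -> bytes:
    """Reopen the file, as a restarted process would, and read everything back."""
    with open(path, "rb") as f:
        return f.read()


def demo() -> bytes:
    """Write two toy records to a temporary file and read them back."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "toy.log")
        durable_append(path, b"insert k1=v1\n")
        durable_append(path, b"insert k2=v2\n")
        return read_all(path)
```

This sketch covers persistence only. Recovery, meaning that the data is consistent after a failure in the middle of an operation, needs additional machinery that a single flushed write does not provide.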
Motivation
• A big part of the performance of databases arises from
  • Proper storage management
  • Adequate data representations
  • Suitable optimizations in how the data is organized in memory
• We will focus on fundamental concepts that are important to know regardless of the type of system used, as they illustrate the many trade-offs involved in managing data
• We will point out the differences between conventional relational engines and more modern approaches as we go along
A bit of history
• Database engines have been around for decades
• Initial designs and concerns centered around disk I/O
  • Main memory was comparatively small
  • Disks were slow and bandwidth scarce
• This concern can still be seen in many systems
• Ideas still apply even if the database engine is not running directly on a hard disk (e.g., cloud computing)
Have I seen this before?
• Many of the concepts we will study resemble ideas from operating systems (virtual memory, page swapping, etc.)
• The fundamental difference:
  • An operating system is a general-purpose component, oblivious to the nature, intentions, and goals of the applications and data it manages
  • It provides generic services and arbitrates among competing demands
  • A database understands very well the nature of the code, what it is trying to do, and the goals to achieve. It can use this knowledge to optimize the system in ways the operating system never could
• Many years ago, databases would work on raw devices to avoid the OS. Today, they use the OS in a controlled manner.
Required background
• What you need to know in advance:
  • SQL and basics of databases
  • Memory management at the operating system level (virtual memory)