Data Management Systems • Storage Management • Basic principles • Memory hierarchy • Blocks instead of pages • Segments and file storage • Tablespaces, segments, extents • Database buffer cache • Updates, free space • Storage techniques in context Gustavo Alonso Institute of Computing Platforms Department of Computer Science ETH Zürich Storage - Segments and File storage 1
Storage in databases • Remember the two key guarantees provided by a database in regard to storage • Data is persistent • Data is recoverable • Even when failures occur!! • Add as well the following property: • Physical data independence • Taken together these three aspects play a big role on how databases store data to maintain these properties while still achieving the necessary performance. Storage - Segments and File storage 2
Storage Management Relations, views Application Queries, Transactions (SQL) Logical data (tables, schemas) Logical view (logical data) Record Interface Logical records (tuples) Access Paths Record Access Physical records Physical data in memory Page access Page structure Pages in memory File Access Storage allocation Blocks, files, segments Physical storage 3 Storage - Segments and File storage
Disclaimers • As usual: • Standard database architectures based on slow, hard drive disks (HDD) with a high latency for seek operations (random access) • Assuming most of the database is not in memory • Today, some things are different: • More main memory available • Different storage media (SSD, NVM, network attached storage) • Principles remain the same and illustrate the underlying problem rather than a particular implementation Storage - Segments and File storage 4
Explanations using a real system • Many of the explanations that follow are based on Oracle’s database • Cover all the basics • Provide a good example of how a real system works • Demonstrate the many tuning knobs available in a database • Prove why some people consider databases too complicated (more about this later in the course) • Ideas and concepts are nevertheless generic and they are somewhat similar across systems • Logical storage: • https://docs.oracle.com/en/database/oracle/oracle-database/19/cncpt/logical-storage- structures.html#GUID-4AF2D61A-8675-4D48-97A4-B20F401ADA16 • https://docs.oracle.com/cd/B19306_01/server.102/b14220/logical.htm • Disk storage: • https://docs.oracle.com/cd/B19306_01/server.102/b14220/physical.htm • https://docs.oracle.com/en/database/oracle/oracle-database/19/cncpt/physical-storage- structures.html#GUID-008A1F08-9C75-4E9F-A70B-41FB942C60B4 Storage - Segments and File storage 5
Problem statement • A database is doing many things at the same time (in the same way an Operating System is managing many different user processes at the same time) • Each “thing” (a query, a system process, a database component) active at any point in time needs its own logical view of the data (and correspondingly, of memory, and disk). This is the same as the OS giving a process the impression it is alone in the machine. • A database engine creates such virtual, logical views of the system using its own mechanisms (and different from the OS) Storage - Segments and File storage 6
Logical and Physical storage in Oracle 19 • Entity-relationship diagram (crow’s foot implies one -to-many) • Logical: • Tablespaces • Segments • Extent • Block • Physical: • Data File https://docs.oracle.com/en/database/oracle/oracle-database/19/cncpt/logical-storage- • OS block structures.html#GUID-13CE5EDA-8C66-4CA0-87B5-4069215A368D Storage - Segments and File storage 7
Tablespaces • A tablespace is a logical data unit in the database: • Schema related: • A table • An index • Several tables (clustered tables) • Engine related: • Data structures for the database engine (result buffers, undo buffers, etc.) • A tablespace provides a logical representation of the principle of spatial locality (keep together what belongs together) • Does not necessarily mean all data is continuous • It means all the information and all the data of a tablespace is under the same umbrella • Space (memory/disk) is allocated to tablespaces Storage - Segments and File storage 8
Segments, Extents, and Blocks (example) https://docs.oracle.com/en/database/oracle/oracle-database/19/cncpt/logical-storage- structures.html#GUID-13CE5EDA-8C66-4CA0-87B5-4069215A368D Storage - Segments and File storage 9
Generalizing • A given logical object in a database (a table, an index) is “stored” in a tablespace • A tablespace is organized into segments (each schema object has a segment) • Segments have space allocated to them in the form of extents. A segment can have several extents • Extents are sets of contiguously allocated data blocks. Extents are mapped to one data file (and typically to a file) • Data blocks are the smallest allocation unit of space (not necessarily a page, databases typically use blocks larger than an OS page). The size of the blocks is a tunable parameter. Storage - Segments and File storage 10
Translating • Tablespace = keep together what needs to be kept together • Segment = an object in the schema (or part of an object if it is partitioned) • Extent = groups of continuously allocated pages • Blocks = glorified pages • Why so complicated?: • Tablespaces = logical locality • Segments = allocate (virtual) space to objects • Extents = allocate contiguous physical space • Blocks = the unit of space allocation Storage - Segments and File storage 11
More explanations and analogies • Tablespaces provide a logical unit to refer to the storage of well defined entities (a table, an index, a buffer, etc.) • Virtual space is allocated to a Tablespace in a segment. A segment acts as a form of virtual memory where everything belonging to the same entity appears continuous and can be treated as a single unit • Actual space is allocated to a segment through extents. Extents are sets of blocks that are physically contiguous on storage and can be allocated as a whole • The space in a segment is divided into blocks which typically correspond to several OS pages. Storage - Segments and File storage 12
Segments I • A segment allocates the equivalent of virtual memory to a tablespace • Common setup: • Tablespace -> table • Tablespace has one segment • Segment has one or more extents • These structures, like all examples that follow, are created to provide physical data independence. Higher layers only need to know in which tablespace a table is. This allows the engine to change everything below by simply changing the pointers to the segment, to the extents, etc. Storage - Segments and File storage 13
Segments II • The mapping of a segment to a tablespace simplifies the manipulation of database entities: • CREATE TABLE T • DROP TABLE T • CREATE INDEX I ON table_name (attributes) • DROP INDEX I • If we need more space for a table or an index, the table is still referred to by its segment, but the segment indexes as many extents as needed. • If we partition a table, each partition has its own segment but the table is still referred to by its tablespace Storage - Segments and File storage 14
Extents The big square is a data file! • Extents provide optimal access by ensuring logically continuous allocation of blocks (blocks can end up non continuous on disk because of RAID, for instance) • Acts as a form of virtual address space: • Allocate and release blocks to the extent Space available in the data file • Easy sequential search • Can be dropped as an unit https://docs.oracle.com/en/database/oracle/oracle-database/19/cncpt/logical-storage- structures.html#GUID-13CE5EDA-8C66-4CA0-87B5-4069215A368D Storage - Segments and File storage 15
Dynamic extents The big square is a data file! • When a segment is created, it is allocated an extent (how big? => tunable parameter) • If more space is needed, another extent is created • The new extent is not contiguous with the previous one and can be on a different data file Storage - Segments and File storage 16
Why extents? • Better than the alternatives: Dynamic block mapping Static file mapping Dynamic extents Segment (starting address, size) Easy to maintain Highly efficient (performance) Difficult to maintain Poor utilization (space) Non contiguous No flexibility Poor performance Maximum flexibility Storage - Segments and File storage 17
Why extents? • Databases must optimize along many dimensions. • A static file mapping is very easy to manage but induces fragmentation and provides no flexibility • Dynamic block mapping is extremely flexible but data is not contiguous and is expensive to maintain • Extents are a compromise: • An extent provides a static mapping to a set of blocks (like a static file mapping) • When more space is needed, extents are dynamically allocated (like with dynamic block mapping) Storage - Segments and File storage 18
Recommend
More recommend