EXODUS Extensible DBMS ● EXtensible Object-oriented Database System ● University of Wisconsin ● Efficient support of non-traditional applications – Engineering applications (CAD/CAM) – Scientific and statistical applications – Image and voice (e.g. satellite images) ● Need new data types and operations to support new application domains efficiently
New vs. Conventional Apps ● Each requires a different set of data modeling tools – E.g. VLSI circuit designs require different entities and relationships from a banking application ● Each requires a special set of operations – E.g. Satellite images can't be joined together – Must be efficiently supported ● This could mean new structures and access methods ● Some might require multiple versions of entities – E.g. PROBE
DBMS for New Applications ● POSTGRES (Berkeley) – Predefined way of supporting complex objects ● Using POSTQUEL and procedures as data types – Make as few changes to the relational model as possible ● PROBE (CCA) – Mechanism for directly representing complex objects – Rule-based approach to optimization ● Optimizers extended to handle new operators – New methods for existing operators ● Both “complete” DBMS systems
DBMS for New Applications - 2 ● GENESIS (U. Texas) – A modular (and modifiable) system – Extended for new applications ● No “complete” support designed in advance ● EXODUS uses the same methods – A collection of kernel DBMS facilities – Software tools for semi-automatic creation of high-performance application-specific DBMSs for new areas.
EXODUS ● A toolbox, not a complete DBMS – Can be easily adapted by new applications – A third group of users: database implementors (DBIs) ● Start with a generic solution applicable anywhere – E.g. Support arbitrary size storage objects ● Provide a generator or a library to aid in generating application-specific portions ● Better to see it from the viewpoint of the application-specific system that is built using it
EXODUS Architecture 1. The Storage Object Manager 2. E programming language and compiler 3. A generalized Type Manager 4. A library of independent access methods 5. A lock manager and recovery protocol stubs 6. A rule-based query optimizer and compiler 7. Tools for constructing user front ends [Figure: An implemented DBMS]
Storage Objects ● Basic unit of data in the Storage Object Manager – A byte sequence of arbitrary size – untyped, uninterpreted, variable-length – Object Identifier (OID): (page #, slot #) ● Two types of storage objects (internally) – Small storage objects ● Single page, OID points to the object ● Automatically converted into large objects – Large storage objects ● Multiple pages, OID points to the large object header
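To make the (page #, slot #) identifier concrete, here is a minimal sketch in Python. It is not EXODUS code: `SlottedPage` and `TinyStore` are hypothetical in-memory stand-ins, and only the two-part shape of the OID comes from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, NamedTuple


class OID(NamedTuple):
    """Object identifier: (page number, slot number), as on the slide."""
    page: int
    slot: int


@dataclass
class SlottedPage:
    """Hypothetical in-memory stand-in for a slotted disk page."""
    slots: Dict[int, bytes] = field(default_factory=dict)

    def insert(self, data: bytes) -> int:
        # Use the next free slot number on this page.
        slot = len(self.slots)
        self.slots[slot] = data
        return slot


class TinyStore:
    """Toy store that resolves an OID to the untyped bytes in that slot."""

    def __init__(self) -> None:
        self.pages: Dict[int, SlottedPage] = {}

    def create(self, page_no: int, data: bytes) -> OID:
        page = self.pages.setdefault(page_no, SlottedPage())
        return OID(page_no, page.insert(data))

    def read(self, oid: OID) -> bytes:
        return self.pages[oid.page].slots[oid.slot]


store = TinyStore()
a = store.create(7, b"first object")
b = store.create(7, b"second object")
```

Both objects share page 7 and differ only in slot number, so `store.read(b)` returns the second byte string without any type interpretation.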
Large Storage Objects ● Representation – Conceptually an uninterpreted byte sequence – Physically a B+ tree like index on byte position within the object and a collection of leaf blocks ● Disk location – Headers can reside on slotted page with other headers / small objects – Other pages are private (but can be shared with other versions – if used) ● Primitive versioning included (Must support different versioning) ● Only pages that differ are copied
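The idea of an index keyed on byte position can be sketched as follows. This is an illustration only: it assumes each internal node records how many bytes sit under each child, and the names `Leaf`, `Internal`, `make_internal` and `read_range` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    data: bytes  # one leaf block of the object


@dataclass
class Internal:
    children: List[Union["Internal", Leaf]]
    counts: List[int]  # assumed layout: bytes stored under each child


def total_bytes(node: Union[Internal, Leaf]) -> int:
    return len(node.data) if isinstance(node, Leaf) else sum(node.counts)


def make_internal(children: List[Union[Internal, Leaf]]) -> Internal:
    return Internal(children, [total_bytes(c) for c in children])


def read_range(node: Union[Internal, Leaf], start: int, length: int) -> bytes:
    """Return `length` bytes beginning at byte position `start`."""
    if length <= 0:
        return b""
    if isinstance(node, Leaf):
        return node.data[start:start + length]
    out = bytearray()
    offset, remaining = start, length
    for child, count in zip(node.children, node.counts):
        if remaining == 0:
            break
        if offset >= count:
            offset -= count  # the range starts to the right of this child
            continue
        take = min(remaining, count - offset)
        out += read_range(child, offset, take)
        remaining -= take
        offset = 0  # later children are read from their first byte
    return bytes(out)


root = make_internal([
    make_internal([Leaf(b"hello "), Leaf(b"large ")]),
    make_internal([Leaf(b"object")]),
])
chunk = read_range(root, 6, 8)
```

The search descends only into subtrees that overlap the requested range, which is what makes reads in the middle of a very large object cheap.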
Storage Object Manager ● Read, write and update storage objects – Built-in search, insert, append, delete algorithms – Automatically converts small storage objects to large objects when they can't fit on a page – Can implement application-specific versioning ● Provides locking, buffering and recovery protocols – E.g. Read non-empty portions of the leaf blocks of the desired byte range into a variable length buffer block
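The automatic small-to-large conversion can be pictured with the toy below. The 16-byte `PAGE_SIZE` and the `StorageObject` class are assumptions made for illustration; the real manager works on disk pages and also handles locking, buffering and recovery, none of which is modeled here.

```python
from typing import List

PAGE_SIZE = 16  # hypothetical, tiny so the conversion is easy to trigger


class StorageObject:
    """Toy object: one leaf while small, several leaves once it outgrows a page."""

    def __init__(self) -> None:
        self.leaves: List[bytearray] = [bytearray()]

    @property
    def is_large(self) -> bool:
        return len(self.leaves) > 1

    def size(self) -> int:
        return sum(len(leaf) for leaf in self.leaves)

    def append(self, data: bytes) -> None:
        # The caller never asks for a conversion; it happens when a page fills.
        for value in data:
            if len(self.leaves[-1]) == PAGE_SIZE:
                self.leaves.append(bytearray())
            self.leaves[-1].append(value)

    def read(self, start: int, length: int) -> bytes:
        """Copy only the requested byte range out of the leaves it touches."""
        end = min(start + length, self.size())
        out = bytearray()
        pos = start
        while pos < end:
            leaf = self.leaves[pos // PAGE_SIZE]  # all leaves but the last are full
            lo = pos % PAGE_SIZE
            hi = min(PAGE_SIZE, lo + (end - pos))
            out += leaf[lo:hi]
            pos += hi - lo
        return bytes(out)


obj = StorageObject()
obj.append(b"0123456789")
was_small = not obj.is_large
obj.append(b"abcdefghij")
```

The interface is identical before and after the object crosses the page boundary, which mirrors the point that the two kinds of storage object differ only internally.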
File Objects ● Collections of storage objects ● Used to group objects together – Read related objects in sequence (in physical order) – Related objects can be co-located on disk ● Have an OID like large storage objects – Objects can be accessed directly as well ● A B+ tree-like index structure – Use disk page number as the key – Leaf pages contain page numbers ● Standard disk allocation for pages themselves
E Programming Language ● Used for all components that a DBI deals with ● Extends C to support “persistent objects” – Correspond to storage objects – References are similar to those of C structures ● DBI can deal with array of key-pointer pairs – The E translator deals with the internal structure of persistent objects (e.g. lock/unlock, log) not the DBI ● DBI can deliberately exercise control – E supports statements to associate locking, buffering, recovery with references to persistent objects
E Programming Language (cont'd) ● Other additions to C – OID data type (for storage object IDs) – Parameterized types – Addition of “type” as a valid parameter data type for E procedures (only!) ● To allow access methods to use multiple data types – Type constructors to define fields of persistent objects – E allows the DBI to manipulate the internal structure of storage objects ● Not a database programming language! – E is to develop internal system software
Access Methods ● Associative access to file of storage objects – Further support for versioning if needed ● A library of type-independent index structures – B+ tree, Grid files, Linear hashing, etc. – Implemented using the “type parameter” property in E ● Use existing access methods with DBI-defined abstract data types without modifications – As long as access method requirements are satisfied ● Can easily implement new access methods – Don't have to deal with main memory data structures
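A rough analogue of a type-independent index, written in Python rather than E: the index is generic over its key type and the only requirement it places on a key is an ordering function. `SortedIndex` and `Box` are hypothetical names, and a sorted list stands in for a real B+ tree.

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class SortedIndex(Generic[K, V]):
    """Sorted (key, value) entries; stands in for a B+ tree in this sketch."""

    def __init__(self, less_than: Callable[[K, K], bool]) -> None:
        self._lt = less_than  # the access method's only requirement on keys
        self._entries: List[Tuple[K, V]] = []

    def _lower_bound(self, key: K) -> int:
        lo, hi = 0, len(self._entries)
        while lo < hi:
            mid = (lo + hi) // 2
            if self._lt(self._entries[mid][0], key):
                lo = mid + 1
            else:
                hi = mid
        return lo

    def insert(self, key: K, value: V) -> None:
        self._entries.insert(self._lower_bound(key), (key, value))

    def lookup(self, key: K) -> List[V]:
        i = self._lower_bound(key)
        found = []
        # Keys are "equal" when neither is less than the other.
        while i < len(self._entries) and not self._lt(key, self._entries[i][0]):
            found.append(self._entries[i][1])
            i += 1
        return found


@dataclass(frozen=True)
class Box:
    """A DBI-defined abstract data type, ordered here by area."""
    width: int
    height: int


by_int: SortedIndex[int, str] = SortedIndex(lambda x, y: x < y)
by_int.insert(5, "oid-e")
by_int.insert(2, "oid-b")

by_area: SortedIndex[Box, str] = SortedIndex(
    lambda x, y: x.width * x.height < y.width * y.height)
by_area.insert(Box(2, 3), "oid-1")
by_area.insert(Box(1, 1), "oid-2")
```

The same index code serves both key types unchanged, which is the property the slide attributes to the "type parameter" feature of E.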
Operator Methods ● A collection of methods and their combinations (as E procedures) to operate on storage objects – Schema-independent (necessary schema information requested at run-time or compiled by the optimizer) ● Contains code by both the DBI and EXODUS – EXODUS provides code for operators that operate on a single type of storage object (e.g. Selection) ● Does not provide application (or data model) specific methods (e.g. Relational join, examining images) – The DBI may implement one or more methods for each operator in the target query language ● Can be schema-dependent (Hire-employee, change-job)
The Type Manager ● Schema support for application-specific systems – Handle a wide range of applications efficiently ● Class hierarchy with multiple inheritance – Base types (integer, char, object ID, etc.) – Constructed types (record, array, set and bag) – DBI can define new base types and operations ● Using abstract data types ● One-to-one mapping between class instances (typed objects) and storage objects – A class of typed objects can include fields with large multidimensional arrays of real numbers
Type Manager Class Hierarchy ● Loose hierarchy – Classes can inherit from one or more classes ● If field names clash, choose one or rename ● Meta-class Class contains inheritance information ● All classes are subclasses of class Object – Including “Class” ● Files can contain objects of only one class – But this can be the Object class!
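The rename rule for clashing field names can be sketched like this. `ClassDef`, its `renames` argument and the example classes are all hypothetical; the sketch only supports renaming (not "choose one") and does not handle a field reaching a class twice through a shared ancestor.

```python
from typing import Dict, List, Optional, Tuple


class ClassDef:
    """Toy schema class: a name, a field-name to type-name map, and parents."""

    def __init__(self, name: str, fields: Dict[str, str],
                 parents: Optional[List["ClassDef"]] = None,
                 renames: Optional[Dict[Tuple[str, str], str]] = None) -> None:
        self.name = name
        # Every class except the root defaults to inheriting from Object.
        self.parents = parents if parents is not None else [OBJECT]
        renames = renames or {}
        self.fields: Dict[str, str] = {}
        for parent in self.parents:
            for fname, ftype in parent.fields.items():
                new_name = renames.get((parent.name, fname), fname)
                if new_name in self.fields:
                    raise ValueError(f"field {new_name!r} clashes; rename one")
                self.fields[new_name] = ftype
        for fname, ftype in fields.items():
            if fname in self.fields:
                raise ValueError(f"field {fname!r} clashes with an inherited field")
            self.fields[fname] = ftype

    def is_subclass_of(self, other: "ClassDef") -> bool:
        return self is other or any(p.is_subclass_of(other) for p in self.parents)


OBJECT = ClassDef("Object", {}, parents=[])

employee = ClassDef("Employee", {"name": "char", "salary": "integer"})
student = ClassDef("Student", {"name": "char", "gpa": "real"})
assistant = ClassDef(
    "TeachingAssistant", {},
    parents=[employee, student],
    renames={("Student", "name"): "student_name"},
)
```

Without the `renames` entry the constructor raises `ValueError`, because both parents contribute a field called `name`.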
Query optimizer and compiler ● Query execution in EXODUS (similar to System R): Parse → Optimize → Compile as executable ● The parser transforms the query into an initial tree – Logical operators as internal nodes, relations as leaves ● The optimizer creates an access plan – A rearranged tree of operator methods (particular instances of operators) ● Methods as internal nodes, files/indices as leaves ● The Type Manager is invoked during parsing and optimization
Query optimizer (the generator) ● A generator that produces an optimizer for an application-specific database system ● The DBI must supply – A description of the operators of the target query language – A list of methods to implement each operator – A cost formula for each operator method – A collection of transformation rules ● The generator transforms these description files into C code for the target query language optimizer
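One way to picture part of what the DBI supplies is as a table of operators, methods and cost formulas. The sketch below is Python data, not the generator's actual description-file format (which is turned into C code), and every method name and cost formula in it is invented for illustration. Transformation rules are left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Method:
    name: str
    cost: Callable[[int], float]  # made-up formula over input cardinality


# Operators of a hypothetical target query language and their methods.
METHODS: Dict[str, List[Method]] = {
    "select": [
        Method("file_scan", lambda n: float(n)),
        Method("index_scan", lambda n: 3.0 + 0.1 * n),
    ],
    "join": [
        Method("nested_loops", lambda n: float(n * n)),
        Method("hash_join", lambda n: 3.0 * n),
    ],
}


def cheapest_method(operator: str, cardinality: int) -> Method:
    """Pick the lowest-cost method for one operator at a given input size."""
    return min(METHODS[operator], key=lambda m: m.cost(cardinality))


small_select = cheapest_method("select", 2)
big_select = cheapest_method("select", 1000)
big_join = cheapest_method("join", 100)
```

With these invented formulas a scan wins on tiny inputs and the index wins on large ones, which shows why a cost formula is needed per method rather than per operator.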
The optimization procedure ● Uses two principal data structures – MESH: A directed graph of alternative operator trees ● Initially the tree of the original query – OPEN: A priority queue of the applicable transformations ordered by the expected cost decrease of transformation ● Select lowest cost method for each node in MESH ● Find possible transformations and insert into OPEN ● While OPEN is not empty, apply the most promising transformation to MESH and repeat the two steps above for the new nodes ● Reuse equal nodes (same operator, argument and inputs)
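The MESH/OPEN loop can be sketched on a three-way join. This is a simplification under stated assumptions: MESH is a set of whole trees rather than a graph with shared nodes, the "expected" cost decrease is just the actual cost difference, and the cardinalities, selectivity and cost model are invented.

```python
import heapq
from itertools import count

CARD = {"A": 1000, "B": 10, "C": 100}  # invented relation sizes
SELECTIVITY = 0.01                     # invented join selectivity


def card(tree):
    if isinstance(tree, str):
        return CARD[tree]
    _, left, right = tree
    return card(left) * card(right) * SELECTIVITY


def cost(tree):
    """Toy nested-loops cost: pairs compared at every join in the tree."""
    if isinstance(tree, str):
        return 0.0
    _, left, right = tree
    return cost(left) + cost(right) + card(left) * card(right)


def transformations(tree):
    """Yield every tree reachable by applying one rule at one node."""
    if isinstance(tree, str):
        return
    _, left, right = tree
    yield ("join", right, left)  # commutativity
    if not isinstance(left, str):  # associativity: (a b) c -> a (b c)
        _, a, b = left
        yield ("join", a, ("join", b, right))
    for new_left in transformations(left):
        yield ("join", new_left, right)
    for new_right in transformations(right):
        yield ("join", left, new_right)


def optimize(query):
    mesh = {query}   # alternatives found so far
    open_heap = []   # OPEN: largest cost decrease is popped first
    tie = count()    # tie-breaker so the heap never compares trees
    best = query

    def enqueue(source):
        for new in transformations(source):
            if new not in mesh:
                gain = cost(source) - cost(new)
                heapq.heappush(open_heap, (-gain, next(tie), new))

    enqueue(query)
    while open_heap:
        _, _, new = heapq.heappop(open_heap)
        if new in mesh:
            continue  # an equal tree is already in MESH, so reuse it
        mesh.add(new)
        if cost(new) < cost(best):
            best = new
        enqueue(new)
    return best, mesh


query = ("join", ("join", "A", "B"), "C")
best, mesh = optimize(query)
```

Here the search moves the join of the two small relations B and C to the bottom of the tree, cutting the toy cost from 20000 to 11000. Because this sketch runs until OPEN is empty it is exhaustive; a practical optimizer would also need a stopping rule.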
CC and Recovery ● Based on Weikum's layered transaction model: – Each layer presents a set of objects and associated operations (aka mini-transactions) to client layers ● Each transaction in a layer is a series of mini-transactions in one or more of its servant layers – Two-phase locking on objects within a given layer ● Objects in the servant layer are locked on behalf of a transaction in the client layer (held until it completes) – Level-specific recovery information is logged ● When the mini-transaction completes, log is replaced with a simpler client-level representation of the entire operation ● To abort: first undo the last incomplete mini-transaction, then run the inverse of each completed mini-transaction in reverse order
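The logging and abort rules can be illustrated with a two-layer toy. `Txn`, `insert_record` and `delete_record` are hypothetical, locking is not modeled, and `None` is reserved to mean "no previous value".

```python
class Txn:
    """Toy client-layer transaction running over a dict of 'pages'."""

    def __init__(self, pages):
        self.pages = pages
        self.current = []    # servant-level undo records of the open mini-transaction
        self.completed = []  # one client-level inverse per finished mini-transaction

    def write(self, key, value):
        # Log the old value before overwriting (level-specific recovery info).
        self.current.append((key, self.pages.get(key)))
        self.pages[key] = value

    def end_mini(self, inverse):
        # Replace the detailed log with a single inverse operation.
        self.current.clear()
        self.completed.append(inverse)

    def abort(self):
        # 1. Physically undo the incomplete mini-transaction.
        for key, old in reversed(self.current):
            if old is None:
                self.pages.pop(key, None)
            else:
                self.pages[key] = old
        self.current.clear()
        # 2. Run the inverse of each completed mini-transaction, newest first.
        for inverse in reversed(self.completed):
            inverse()
        self.completed.clear()


def delete_record(pages, rid, value):
    pages.pop(("data", rid), None)
    pages.pop(("index", value), None)


def insert_record(txn, rid, value):
    """One client-level operation made of two servant-level page writes."""
    txn.write(("data", rid), value)
    txn.write(("index", value), rid)
    txn.end_mini(lambda: delete_record(txn.pages, rid, value))


pages = {}
txn = Txn(pages)
insert_record(txn, 1, "alice")   # completed mini-transaction
txn.write(("data", 2), "bob")    # mini-transaction still in progress
txn.abort()
```

After the abort the page dictionary is empty again: the half-finished write is rolled back from its old values, while the finished insert is compensated by its logical inverse.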