2. Management of large objects



  1. Management of large objects
     - LOB = Large OBject
     - A 'normal' DBMS regards a LOB as one field with no internal structure.
     - Traditional business-oriented relational DBMSs: maximum field length e.g. 255 or 32767 bytes.
     - Media objects are usually considerably larger.
     - Today's relational DBMSs support field lengths of several Gbytes, but:
       - It is wasteful to access the whole object if only a piece is needed.
       - The long object may not fit in main memory.
       - Piecewise processing should be supported.
       - The logical structure is handled by higher-level software.
       - A log file is needed for recovery from errors. Logging a whole object is very inefficient if only a small part of it is affected.
       - Secondary storage management should be more flexible: multiple page sizes or multiple-size clusters of pages would improve I/O for variable-length objects.
     MMDB-2 J. Teuhola 2012

  2. SQL and long fields
     Long data types:
     - Character large object (CLOB), content e.g. HTML, XML
     - Binary large object (BLOB), a sequence of 8-bit octets, content e.g. MP3 or JPEG
     - External, read-only file (BFILE), content e.g. AVI, MPEG
     Operations:
     - Concatenation
     - Substring (from a start position for a given length)
     - Overlay (substring replacement)
     - Trim (remove given leading/trailing characters)
     - Length (function returning the number of characters)
     - Position (start position of a searched substring)
     - But not supported: GROUP BY, ORDER BY, join, set operations, etc.
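The operations listed above can be tried out on a small scale. The following is a minimal sketch that uses SQLite through Python's `sqlite3` module as a stand-in; SQLite's function names (`length`, `substr`, `instr`, `||`) differ from those of the SQL standard and of other DBMSs, and the table `doc` and its contents are made up for illustration.

```python
import sqlite3

# In-memory database; the table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc (id INTEGER PRIMARY KEY, body CLOB)")
conn.execute(
    "INSERT INTO doc VALUES (1, ?)",
    ("<html><body>Hello</body></html>",),
)

# Length, substring, position and concatenation on the long text field.
length, piece, position, extended = conn.execute(
    """
    SELECT length(body),
           substr(body, 7, 6),
           instr(body, 'Hello'),
           body || '<!-- end -->'
    FROM doc WHERE id = 1
    """
).fetchone()

print(length, piece, position)
```

Only the requested piece (`substr`) needs to be returned to the client, which is the point of piecewise access to long fields.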

  3. Tree-structured representation
     - B-tree-type multi-level directory: used e.g. in SQL Server, Oracle, …
     - Example architecture: EXODUS storage system (extensible OODBMS)
     - Very flexible management of large objects that can grow and shrink at arbitrary positions.
     - Not optimized for sequential processing speed (best for long text documents).
     - Each object has a unique OID = <page no, slot no>.
     - Two kinds of objects:
       (1) Small objects fit in one page.
       (2) Large objects occupy multiple pages; the OID points to the header.
     - Two kinds of pages:
       (1) Slotted pages contain small objects and headers of large objects.
       (2) Other pages contain parts of large objects, each page being private to a single object.
     - When a small object grows larger than a page, it is converted automatically into a large object.

  4. Page allocation schematically
     [Figure: slotted pages hold small objects together with the headers of LOB x and LOB y; separate LOB pages hold the pages of LOB x and the pages of LOB y, followed by free space.]

  5. Tree-structured representation (cont.)
     - Physical representation: B+-tree, indexed on byte positions within the object.
     - The root is a header for the large object.
     - Internal nodes: a <count, pointer> pair for each child.
       - Count means the highest relative byte number (= offset within the subtree) in the subtree rooted at that child.
       - Pointer means page id (address).
       The count of the rightmost child is the size of the (sub)tree rooted at the current node. The number of <count, pointer> pairs in a node is between k and 2k+1 (i.e. nodes are at least about half-full), where the degree k is the B+-tree parameter. Each internal node occupies one page.
     - Leaves are blocks of one or more pages (system parameter). Leaf blocks contain nothing but actual data. Leaves, too, can vary from half-full to full.

  6. Tree-structured representation: Example
     [Figure: the OID points to the root, whose counts are 421 and 786.]
     - Left internal node: counts 120, 282, 421; its leaves hold 120, 162 and 139 bytes.
     - Right internal node: counts 192, 365; its leaves hold 192 and 173 bytes.
     Maximal object sizes for 4-Kbyte pages, 4-byte pointers, 4-byte counts and 4-page leaf blocks:
     - 2-level tree: 8 Mbytes
     - 3-level tree: 4 Gbytes
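The two maximal sizes can be checked with a little arithmetic. This sketch assumes that a node page is filled entirely with <count, pointer> pairs (any page header overhead is ignored) and that the number of levels includes the leaf level; the constant and function names are illustrative.

```python
PAGE_SIZE = 4096      # bytes per page
POINTER_SIZE = 4      # bytes per child pointer
COUNT_SIZE = 4        # bytes per byte count
LEAF_PAGES = 4        # pages per leaf block

# Pairs per internal node, ignoring any page header (an assumption).
FANOUT = PAGE_SIZE // (POINTER_SIZE + COUNT_SIZE)
LEAF_BYTES = LEAF_PAGES * PAGE_SIZE


def max_object_size(levels: int) -> int:
    """Largest object, in bytes, for a tree with `levels` levels (leaves included)."""
    return FANOUT ** (levels - 1) * LEAF_BYTES


for levels in (2, 3):
    print(levels, "levels:", max_object_size(levels), "bytes")
```

With these parameters each node has 512 children and each leaf block holds 16 Kbytes, so a 2-level tree reaches 512 × 16 Kbytes = 8 Mbytes and a 3-level tree 512 × 512 × 16 Kbytes = 4 Gbytes.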

  7. Tree-structured representation (cont.)
     Notations:
     - Counts c[i], pointers p[i], 1 ≤ i ≤ 2k+1.
     - For convenience, c[0] = 0.
     Retrieval algorithm: Get a sequence of N bytes, starting at S.
       begin
         Read the root page P. Let start = S.
         while P is a non-leaf node do
           Save P to a stack.
           Find the smallest c[i] such that start ≤ c[i].   // e.g. binary search
           Set start := start − c[i−1].                     // relative start index
           Read p[i] as the new page P.
         The first desired byte is at location start in P.  // being in a leaf
         For the rest of the bytes, walk the tree in depth-first order using the stack.
       end
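The retrieval algorithm can be sketched in memory as follows. This is an illustration under assumptions, not the EXODUS implementation: pages are modelled as Python objects instead of disk pages, byte positions are 1-based as in the pseudocode, and the names `Node`, `Leaf`, `read_bytes` and `build_example` are hypothetical. The example tree uses the counts and leaf sizes of the example slide.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    data: bytes


@dataclass
class Node:
    counts: List[int]                      # cumulative byte counts c[1..m]
    children: List[Union["Node", Leaf]]    # pointers p[1..m]


def read_bytes(root: Node, start: int, n: int) -> bytes:
    """Return up to n bytes beginning at 1-based byte position `start`."""
    if not 1 <= start <= root.counts[-1]:
        raise ValueError("start position outside the object")
    stack = []          # (node, child index) pairs along the search path
    node = root
    while isinstance(node, Node):
        i = bisect_left(node.counts, start)        # smallest i with start <= c[i]
        start -= node.counts[i - 1] if i > 0 else 0  # relative start index
        stack.append((node, i))
        node = node.children[i]
    out = bytearray(node.data[start - 1:start - 1 + n])
    # Collect the remaining bytes from the following leaves, depth-first.
    while len(out) < n and stack:
        parent, i = stack.pop()
        if i + 1 < len(parent.children):
            stack.append((parent, i + 1))
            node = parent.children[i + 1]
            while isinstance(node, Node):          # descend to the leftmost leaf
                stack.append((node, 0))
                node = node.children[0]
            out += node.data[:n - len(out)]
    return bytes(out)


def build_example(data: bytes) -> Node:
    """Build the 786-byte example tree: leaves of 120, 162, 139, 192, 173 bytes."""
    sizes = [120, 162, 139, 192, 173]
    leaves, offset = [], 0
    for size in sizes:
        leaves.append(Leaf(data[offset:offset + size]))
        offset += size
    left = Node([120, 282, 421], leaves[:3])
    right = Node([192, 365], leaves[3:])
    return Node([421, 786], [left, right])


payload = bytes(i % 251 for i in range(786))
tree = build_example(payload)
print(read_bytes(tree, 400, 50) == payload[399:449])
```

The last call starts in the third leaf and continues into the fourth, so it exercises both the descent and the depth-first continuation through the stack.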

  8. Tree-structured representation (cont.)
     Insert algorithm: Add a sequence of N bytes after position S.
       begin
         Search byte position S, as above, but on the path down, update the byte counts
         to reflect the insertion and save the path in a stack.
         Denote the reached leaf by L.
         if N bytes fit in L then
           do the insert within L
         else
           Allocate a sufficient number of new leaves, and distribute L's old bytes
           and the N new bytes evenly among the leaves.
           Propagate the new counts and pointers upwards (use the stack).
           If an internal node overflows, it is handled in a similar way as the leaf overflow.
       end
     Note: Space utilization can be improved by inspecting the left and right neighbours of the found leaf, and using the available free space.
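The leaf-level step of the insert algorithm, distributing the old and new bytes evenly when the leaf overflows, can be sketched in isolation. The function below covers only that step; updating counts and handling internal node overflow are left out, and the name `insert_into_leaf` and its parameters are hypothetical.

```python
import math
from typing import List


def insert_into_leaf(leaf: bytes, pos: int, new: bytes, capacity: int) -> List[bytes]:
    """Insert `new` after the first `pos` bytes of `leaf`.

    Returns the resulting leaf contents: one leaf if everything fits,
    otherwise the minimum number of leaves with the bytes spread evenly.
    """
    merged = leaf[:pos] + new + leaf[pos:]
    if len(merged) <= capacity:
        return [merged]
    leaf_count = math.ceil(len(merged) / capacity)
    base, extra = divmod(len(merged), leaf_count)
    chunks, offset = [], 0
    for j in range(leaf_count):
        size = base + (1 if j < extra else 0)   # sizes differ by at most one byte
        chunks.append(merged[offset:offset + size])
        offset += size
    return chunks


parts = insert_into_leaf(b"a" * 80, 40, b"b" * 150, capacity=100)
print([len(p) for p in parts])
```

Because the minimum number of leaves is used and the bytes are spread evenly, every resulting leaf ends up at least about half-full, which matches the occupancy rule stated for leaves.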

  9. Tree-structured representation (cont.)
     Append algorithm: Add N bytes to the end of an object. (Special case of insert)
       begin
         Walk the rightmost path of the tree, add N to the counts, and save the path in a stack.
         if the rightmost leaf R has N free bytes then
           do the appending there, and stop
         else
           Access R's left neighbour L.
           Allocate as many new leaves as required to accommodate L's and R's bytes plus the N new ones.
           Fill all but the last two leaves completely, and the last two evenly
           (both become at least half-full).
           Propagate the counts and pointers upwards, using the stack.
           Handle internal node overflows as in insert.
       end
     Note: The advantage of this special insert is that it allows large objects to be built in pieces. The next piece fills the last two non-full leaves.
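The leaf layout produced by the append algorithm can be sketched the same way. Only the redistribution of bytes among leaves is shown; the tree bookkeeping is omitted, and the name `append_layout` and its parameters are hypothetical.

```python
import math
from typing import List


def append_layout(left: bytes, right: bytes, new: bytes, capacity: int) -> List[bytes]:
    """Leaf contents after appending `new` behind the rightmost leaf `right`.

    `left` is the left neighbour of `right`. If `new` fits in `right`, only
    `right` changes. Otherwise all but the last two leaves are filled
    completely and the last two share the remainder evenly.
    """
    if len(right) + len(new) <= capacity:
        return [left, right + new]
    data = left + right + new
    leaf_count = math.ceil(len(data) / capacity)
    full = max(leaf_count - 2, 0)                      # completely filled leaves
    chunks = [data[j * capacity:(j + 1) * capacity] for j in range(full)]
    rest = data[full * capacity:]
    half = (len(rest) + 1) // 2
    chunks += [rest[:half], rest[half:]]               # last two, evenly filled
    return chunks


layout = append_layout(b"a" * 60, b"b" * 90, b"c" * 130, capacity=100)
print([len(chunk) for chunk in layout])
```

Here 280 bytes need three leaves: the first is filled with 100 bytes and the last two receive 90 bytes each, so a later append can continue filling them.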

  10. Tree-structured representation: Observations
     The organization is quite effective in practice:
     - Storage utilization is 70% for simple and 80% for advanced insertion.
     - The complexity of locating the correct position is theoretically O(log N), in practice almost constant.
     - Access speed is some tens of milliseconds, depending on disk speed and buffering. Not the best choice for streaming media.
     Extension: Versioning of large objects
     - Common parts of different versions can be shared.
     - Updates must not invalidate old versions: nodes on the update path must be copied before they are changed.
     - Old versions are not updated, but deletion should be allowed: avoid deleting nodes shared by other versions. Expensive way: mark the nodes of all other versions, and then discard the unmarked ones.

  11. Advanced 2-level representation
     - Example architecture: Starburst long field manager. (Experimental DBMS, developed at an IBM research center.)
     - Suggests an elegant and extremely fast 2-level scheme for long fields.
     - Key idea: build the field by allocating variable-size (with size units of exponential scale), physically contiguous disk extents.
     - Not arbitrary sizes, nor arbitrary starting points.
     Buddy system:
     - In a buddy space of 2^n pages, buddy segments can be allocated so that a segment of size 2^k can start at address 0, 2^k, 2 × 2^k, 3 × 2^k, …
     - Two same-sized (2^k) consecutive segments are buddies if their concatenation is a legal buddy segment of size 2^(k+1).
     - The address of a segment XORed with its size gives the address of its buddy.
     - Advantage: shorter pointers, because the repertoire of segment sizes is restricted.
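The address rules of the buddy system are easy to state in code. The following is a minimal sketch with addresses and sizes measured in pages; the function names are illustrative and not taken from Starburst.

```python
from typing import Tuple


def is_valid_segment(addr: int, size: int) -> bool:
    """A segment of size 2^k may only start at a multiple of 2^k."""
    is_power_of_two = size > 0 and (size & (size - 1)) == 0
    return is_power_of_two and addr % size == 0


def buddy_of(addr: int, size: int) -> int:
    """Address of the buddy: the segment address XORed with its size."""
    if not is_valid_segment(addr, size):
        raise ValueError("not a legal buddy segment")
    return addr ^ size


def merge_with_buddy(addr: int, size: int) -> Tuple[int, int]:
    """Start address and size of the segment formed by a segment and its buddy."""
    return min(addr, buddy_of(addr, size)), size * 2


print(buddy_of(8, 4), buddy_of(12, 4), merge_with_buddy(12, 4))
```

The segments of size 4 at addresses 8 and 12 are buddies of each other, and together they form the legal segment of size 8 at address 8. The segments at addresses 4 and 8 are adjacent but not buddies, since a segment of size 8 cannot start at address 4.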

  12. Memory architecture in Starburst
     - The whole external memory is divided into database spaces, which may correspond to e.g. separate disks.
     - Each database space contains an array of buddy spaces.
     - A buddy space consists of
       - an allocation page (specially coded segment index)
       - 2^n data pages (buddies marked)
       [Figure: a buddy space of 2^n pages with segment boundaries marked at page addresses 2^(n-2), 5·2^(n-4), 3·2^(n-3) and 2^(n-1).]
     - Fragmentation (the normal problem of the buddy system) is partially avoided because the long field can be built from several segments.
     - For any long field, less than one disk page is lost due to fragmentation.

  13. Long field descriptor in Starburst
     - The descriptor is a directory to the field components.
     - The descriptor size is at most 255 bytes, and it is stored in the record to which the long field logically belongs.
     - The descriptor components:
       - Database space id
       - Field size
       - Number of buddy segments
       - Sizes of the first and last segment
       - Pointers (= offsets) to the buddy segments
     - The key to keeping the field descriptor small is to have exponentially growing segment sizes.
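Why exponentially growing segments keep the descriptor small can be seen from a short calculation. The sketch below assumes a doubling policy with a trimmed last segment and a cap on the segment size; the cap of 2048 pages and the name `segment_sizes` are assumptions made for illustration, not values given in the slides.

```python
from typing import List


def segment_sizes(total_pages: int, first: int = 1, max_segment: int = 2048) -> List[int]:
    """Segment sizes (in pages) for a long field of `total_pages` pages.

    Each segment is twice as large as the previous one, up to `max_segment`;
    the last segment is trimmed to the pages actually needed.
    """
    sizes = []
    size, remaining = first, total_pages
    while remaining > 0:
        take = min(size, remaining)
        sizes.append(take)
        remaining -= take
        size = min(size * 2, max_segment)
    return sizes


print(segment_sizes(100))
```

A field of 100 pages needs only seven segments (1, 2, 4, 8, 16, 32 and a trimmed last segment of 37 pages), so the descriptor holds seven pointers instead of one per page. Storing the sizes of only the first and last segment is enough under this policy, because the sizes in between follow from the doubling rule.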

  14. Descriptor usage: schematic example
     'Person' table:

       PID    Name   Addr    Photo
       12345  Smith  NYC     LOB descriptor
       23456  Jones  Dallas  LOB descriptor
       11223  Blake  Miami   LOB descriptor
       33211  Brown  LA      LOB descriptor
       54321  Clark  Denver  LOB descriptor

     [Figure: each Photo field holds a LOB descriptor of at most 255 bytes, which points to the segments storing the photo.]
