database system implementation
play

DATABASE SYSTEM IMPLEMENTATION GT 4420/6422 // SPRING 2019 // - PowerPoint PPT Presentation

DATABASE SYSTEM IMPLEMENTATION GT 4420/6422 // SPRING 2019 // @JOY_ARULRAJ LECTURE #3: STORAGE MODELS 2 LAST CLASS Implications of availability of large DRAM chips for database systems Buffer Management Concurrency Control Logging


  1. DATABASE SYSTEM IMPLEMENTATION GT 4420/6422 // SPRING 2019 // @JOY_ARULRAJ LECTURE #3: STORAGE MODELS

  2. 2 LAST CLASS Implications of availability of large DRAM chips for database systems → Buffer Management → Concurrency Control → Logging and Recovery → Query Processing How do these components fit together? How does a SQL query get executed within the system?

  3. 3 ANATOMY OF A DATABASE SYSTEM Process Manager Connection Manager + Admission Control Query Parser Query Processor Query Optimizer Query Executor Query Lock Manager (Concurrency Control) Transactional Access Methods (or Indexes) Storage Manager Buffer Pool Manager Log Manager Shared Utilities Memory Manager + Disk Manager Networking Manager Source: Anatomy of a Database System

  4. 4 ANATOMY OF A DATABASE SYSTEM Process Manager Connection Manager + Admission Control Query Parser Query Processor Query Optimizer Query Executor Query Lock Manager (Concurrency Control) Transactional Access Methods (or Indexes) Storage Manager Buffer Pool Manager Log Manager Shared Utilities Memory Manager + Disk Manager Networking Manager Source: Anatomy of a Database System

  5. 5 ANATOMY OF A DATABASE SYSTEM Process Manager → Manages client connections Query Processor → Parse, plan and execute queries on top of storage manager Transactional Storage Manager → Knits together buffer management, concurrency control, logging and recovery Shared Utilities → Manage hardware resources across threads

  6. 6 TODAY’S AGENDA Field Storage Format (Type Representation) Tuple Storage Format Table Storage Format (Storage Models)

  7. 7 DATA ORGANIZATION Index Fixed-Length Variable-Length Data Blocks Data Blocks Block Id + Offset

  8. 8 DATA ORGANIZATION One can think of an in-memory database as just a large array of bytes. → The schema tells the DBMS how to convert the bytes into the appropriate type (e.g., INTEGER , DATE ). → Each tuple is prefixed with a header that contains meta- data (e.g., last modified time-stamp). Storing tuples with as fixed-length data makes it easy to compute the starting point of any tuple. → No tuple indirection array as in the case of slotted pages with variable-length tuples in disk-oriented systems

  9. 9 MEMORY PAGES OS maps physical pages to virtual memory pages. The CPU's MMU maintains a TLB that contains the physical address of a virtual memory page. → The TLB resides in the CPU caches. → It can't obviously store every possible every possible entry for a large memory machine. When you allocate a block, the memory allocator keeps that it aligned to page boundaries to reduce memory fragmentation (e.g., glibc malloc ).

  10. 10 TRANSPARENT HUGE PAGES (THP) Maintain larger pages automatically (2MB to 1GB) → Each page has to be a contiguous blocks of memory. → Greatly reduces the # of TLB entries (meta-data) With THP, the OS will to reorganize pages in the background to keep things compact. → Split larger pages into smaller pages. → Combine smaller pages into larger pages. → Can cause the DBMS process to stall on memory access. Almost every DBMS says to disable this feature: → Oracle, MemSQL, NuoDB, MongoDB, Sybase IQ Source: Alexandr Nikitin

  11. 11 FIELD STORAGE FORMAT (TYPES) INTEGER / BIGINT / SMALLINT / TINYINT → C/C++ Representation (endianness depends on CPU) FLOAT / REAL/DOUBLE vs. NUMERIC / DECIMAL → Floating-point Decimals / Fixed-point Decimals VARCHAR / VARBINARY / TEXT / BLOB → Pointer to other location if type is ≥ 64-bits → Header with length and address to next location (if segmented), followed by data bytes. TIME / DATE / TIMESTAMP → 32/64-bit integer of (micro)seconds since Unix epoch

  12. 12 VARIABLE PRECISION NUMBERS Inexact, variable-precision floating point type that uses the “native” C/C++ types → Example: FLOAT , REAL , DOUBLE → FLOAT(n) : n is number of bits that are used to store the mantissa of the float number → REAL = FLOAT(24) → DOUBLE = FLOAT(53) Store directly as specified by IEEE-754. Typically faster than fixed precision numbers.

  13. 13 VARIABLE PRECISION NUMBERS Rounding Example #include <stdio.h> int main( int argc, char * argv[]) { float x = 0.1; float y = 0.2; printf("x+y = %.20f\n", x+y); printf("0.3 = %.20f\n", 0.3); }

  14. 14 VARIABLE PRECISION NUMBERS Rounding Example #include <stdio.h> Output int main( int argc, char * argv[]) { float x = 0.1; x+y = 0.30000001192092895508 float y = 0.2; 0.3 = 0.29999999999999998890 printf("x+y = %.20f\n", x+y); printf("0.3 = %.20f\n", 0.3); }

  15. 15 FIXED PRECISION NUMBERS Numeric data types with arbitrary precision and scale. Used when round errors are unacceptable. → Example: NUMERIC , DECIMAL → NUMERIC = DECIMAL Typically stored in a exact, variable-length binary representation with additional meta-data. → Like a VARCHAR but not stored as a string → 2 times slower to sum one million values.

  16. 16 POSTGRES: NUMERIC typedef unsigned char NumericDigit; typedef struct { int ndigits; int weight; int scale; int sign; NumericDigit *digits; } numeric;

  17. 17 POSTGRES: NUMERIC # of Digits typedef unsigned char NumericDigit; typedef struct { int ndigits; int weight; int scale; int sign; NumericDigit *digits; } numeric;

  18. 18 POSTGRES: NUMERIC # of Digits typedef unsigned char NumericDigit; Weight of 1 st Digit typedef struct { int ndigits; int weight; int scale; int sign; NumericDigit *digits; } numeric;

  19. 19 POSTGRES: NUMERIC # of Digits typedef unsigned char NumericDigit; Weight of 1 st Digit typedef struct { int ndigits; int weight; Scale Factor int scale; int sign; NumericDigit *digits; } numeric;

  20. 20 POSTGRES: NUMERIC # of Digits typedef unsigned char NumericDigit; Weight of 1 st Digit typedef struct { int ndigits; int weight; Scale Factor int scale; int sign; Positive/Negative/NaN NumericDigit *digits; } numeric;

  21. 21 POSTGRES: NUMERIC # of Digits typedef unsigned char NumericDigit; Weight of 1 st Digit typedef struct { int ndigits; int weight; Scale Factor int scale; int sign; Positive/Negative/NaN NumericDigit *digits; } numeric; Digit Storage

  22. 22 POSTGRES: NUMERIC # of Digits typedef unsigned char NumericDigit; Weight of 1 st Digit typedef struct { int ndigits; int weight; Scale Factor int scale; int sign; Positive/Negative/NaN NumericDigit *digits; } numeric; Digit Storage

  23. 23 POSTGRES: NUMERIC # of Digits typedef unsigned char NumericDigit; Weight of 1 st Digit typedef struct { int ndigits; int weight; Scale Factor int scale; int sign; Positive/Negative/NaN NumericDigit *digits; } numeric; Digit Storage

  24. 24 TUPLE STORAGE FORMAT char[] CREATE TABLE AndySux ( id INT PRIMARY KEY , value BIGINT );

  25. 25 TUPLE STORAGE FORMAT char[] CREATE TABLE AndySux ( header id value id INT PRIMARY KEY , value BIGINT );

  26. 26 TUPLE STORAGE FORMAT char[] CREATE TABLE AndySux ( header id value id INT PRIMARY KEY , value BIGINT );

  27. 27 TUPLE STORAGE FORMAT char[] CREATE TABLE AndySux ( header id value id INT PRIMARY KEY , value BIGINT );

  28. 28 TUPLE STORAGE FORMAT char[] CREATE TABLE AndySux ( header id value id INT PRIMARY KEY , value BIGINT ); reinterpret_cast< int32_t* > (address)

  29. 29 TUPLE STORAGE FORMAT char[] CREATE TABLE AndySux ( header id value id INT PRIMARY KEY , value BIGINT ); reinterpret_cast< int32_t* > (address) Reinterpret cast does not compile to any instructions. It instructs the compiler to treat the sequence of bits as if it is of <int32_t> type.

  30. 30 VARIABLE-LENGTH FIELDS char[] CREATE TABLE AndySux ( header 64-BIT POINTER value VARCHAR(1024) ); Variable-Length Data Blocks INSERT INTO AndySux VALUES (” His jokes are the worst that I have ever heard. I hate him so much. " );

  31. 31 VARIABLE-LENGTH FIELDS char[] CREATE TABLE AndySux ( header 64-BIT POINTER value VARCHAR(1024) ); Variable-Length Data Blocks INSERT INTO AndySux VALUES (” His jokes are the worst that I have ever heard. I hate him so much. " );

  32. 32 VARIABLE-LENGTH FIELDS char[] CREATE TABLE AndySux ( header 64-BIT POINTER value VARCHAR(1024) ); Variable-Length Data Blocks INSERT INTO AndySux LENGTH NEXT His jokes are the VALUES (” His jokes are the worst that I have ever heard. worst that I have ever heard. I hate I hate him so much. " ); LENGTH NEXT him so much.

  33. 33 VARIABLE-LENGTH FIELDS char[] CREATE TABLE AndySux ( header 64-BIT POINTER His |64-BIT POINTER value VARCHAR(1024) ); Variable-Length Data Blocks INSERT INTO AndySux LENGTH NEXT His jokes are the VALUES (” His jokes are the worst that I have ever heard. worst that I have ever heard. I hate I hate him so much. " ); LENGTH NEXT him so much.

  34. 34 NULL DATA TYPES Choice #1: Special Values → Designate a value to represent NULL for a particular data type (e.g., INT32_MIN ). Choice #2: Null Column Bitmap Header → Store a bitmap in the tuple header that specifies what attributes are null. Limits the number of columns. Choice #3: Per Attribute Null Flag → Store a flag that marks that a value is null (not in header). → Have to use more space than just a single bit because this messes up with word alignment. Increases column size.

Recommend


More recommend