Data Structure and Storage Storage
CS386, Introduction to Database Systems Jay Urbain Credits: Database Management Systems, Raghu Ramkrishnan and Johannes Gehrke Data Management, Richard Watson
Data Structure and Storage Storage CS386, Introduction to Database - - PowerPoint PPT Presentation
Data Structure and Storage Storage CS386, Introduction to Database Systems Jay Urbain Credits: Database Management Systems, Raghu Ramkrishnan and Johannes Gehrke Data Management, Richard Watson Data Structures Disks (Durable storage) are
CS386, Introduction to Database Systems Jay Urbain Credits: Database Management Systems, Raghu Ramkrishnan and Johannes Gehrke Data Management, Richard Watson
Disks (Durable storage) are a bottleneck
The goal is to minimize disk access Disks are slow compared to main memory Appropriate data structures can reduce disk accesses Appropriate data structures can reduce disk accesses
Abstraction of data in a DBMS:
Collection of records or a file.
For MySQL MyISAM =>1 table = file.
Each file consists of one or more pages. Each file consists of one or more pages. Files and Access Layers organizes data to support fast
access to desired records.
Understanding how records are organized is essential to
using DBMS effectively.
File Organization – methods of arranging records in a file when a file is stored on disk.
Each file organization makes certain operations
efficient, but others expensive.
Example: File of employee records(age, name, salary) Query: Retrieve records in-order of increasing age Options for organization??? Options for organization???
Example: File of employee records(age, name, salary) Query: Retrieve records in-order of increasing age Options: Options:
Sorting file by age is a good file organization, but sort
You are often interested in supporting more than one
Retrieve all employees making > $5000 would require
scan of entire file!
DBMS File manager Disk manager Page read Page returned Record returned DBMS manager manager Record request Page request Read page command
Data must persist across program executions. Page – unit of information read/written to disk. Usually
4K to 8K. Configurable. Maps to block.
Cost of (disk) page I/O dominates cost of typical DB Cost of (disk) page I/O dominates cost of typical DB
DBMS are carefully optimized to minimize this cost. Disks most important storage mechanism.
Each record in a file has a unique identifier - record ID. Record ID (rid) uniquely identifies disk address of page
containing record.
Buffer Manager (BM) – reads/writes data between memory
and disk for persistent storage. and disk for persistent storage.
Files and access layer (FL) - asks BM to fetch pages with
Disk Space Manager (DSM) – manages space (pages) on
disk.
When FL needs additional space to hold new records in a
file, it asks the DSM to allocate an additional page.
Also informs DSM when a page is no longer needed. DSM tracks and attempts to reuse discarded pages.
Data stored on concentric tracks on a surface A disk drive can have multiple surfaces Disk access time
Spin up time Rotational delay - Waiting for the physical storage location of the
data to appear under the read/write head data to appear under the read/write head
Seek time - Moving the read/write head to the track on which the
storage location can be found.
Spindle RPM Average latency (ms) 4200 7.14 5400 5.55 7200 4.17 10000 3 15000 2
File of records abstraction implemented by FL. A relation (table) typically stored as a file of records. Also supports scan.
FL stores records in a file in a collection of disk pages,
FL stores records in a file in a collection of disk pages,
keeps track of pages allocated to each file, and space within pages allocated to file (insert/delete).
A database index is a data structure that improves the
speed of data retrieval operations on a database table at the cost of slower writes and increased storage space.
Indexes can be created using one or more columns of a
database table.
The disk space required to store the index is typically
less than that required by the table, yielding the possibility to store indices in memory for a table whose data is too large to store in memory.
In a relational database, an index is a copy of one part of a
table.
Some databases extend the power of indexing by allowing
indices to be created on functions or conditional indices to be created on functions or conditional expressions.
Indices may be defined as unique or non-unique. A unique index acts as a constraint on the table by
preventing duplicate entries in the index and thus the backing table.
Choosing effective table organization and a good collection
An index is a small file that has data for one or
more fields of a file
Indexes reduce disk accesses
Heap File – Simplest file structure is unordered file.
Records in a heap file are stored in random order across the pages of the file. FL must track pages allocated to files.
Index – data structure that organizes data records on
disk to optimize certain kinds of retrieval operations.
Provides efficient retrieval for records satisfying
search conditions on the search key fields of the index.
Can create additional indexes on a given collection of
data records, each with a different search key.
Example: employee records(age, name, salary)
Can retrieve records in a file organized as an index on employee
age versus sorting the file by age.
Can create auxiliary index file based on salary, to speed queries
involving salary. involving salary.
Data entry refers to the records stored in an index file.
A data entry with search key value k contains enough information
to locate (one or more) data records with search key k.
Search an index to find the desired data entries, then use index
data to obtain record data (if needed).
Main alternatives for what to store as a data entry in an index:
1.
A data entry k* is an actual data record (with search key value k). Indexed file organization.
2.
A data entry is a <k, rid> pair - rid is the record id of a data record with search key value k. record with search key value k.
3.
A data entry is a <k, rid-list> pair, where rid-list is a list of record ids of data records with search key value k.
Option 1 avoids storing data records multiple times. Options 2 and 3 contain data entries that point to data records and are
independent of file organization.
Option 3 has more efficient space utilization. Option 2 allows entries of variable length.
Clustered – use search key of a clustered file, the rids in
qualifying data entries point to a contiguous collection of records, need to only retrieve a few data pages.
Un-clustered – each qualifying data entry could contain a
rid that points to a distinct data page, leading to many rid that points to a distinct data page, leading to many data page I/Os.
The data is present in random order, but the logical
Data rows may be randomly spread throughout the table. The non-clustered index tree contains the index keys in
sorted order.
The leaf level of the index containing the pointer to the
page and the row number in the data page.
The physical order of the rows is not the same as the
index order.
Typically created on column used in JOIN, WHERE, and
ORDER BY clauses.
Good for tables whose values (and structure) may be
modified frequently.
There can be more than one non-clustered index on a
database table.
Often the default index (SQL Server).
Clustered - File is organized so that the ordering of data records is the same (or close) to the ordering of data entries in some index. Otherwise unclustered.
Option 1 is clustered by definition. Options 2 or 3 can be a clustered index only if the data
records are sorted on the search key field.
In practice (1) is clustered, (2) and (3) are unclustered.
Clustering alters the data block into a certain distinct
stored in order.
Therefore, only one clustered index can be created on a
given database table. given database table.
Clustered indices can greatly increase overall speed of
retrieval, but usually only where the data is accessed sequentially in the same or reverse order of the clustered index (or when a range of items is selected).
Since the physical records are in this sort order on disk,
the next row item in the sequence is immediately before
required.
They are known as "index organized tables" They are known as "index organized tables"
under Oracle database.
In Oracle database, multiple tables can be joined into
a cluster.
Default PK index for MyISAM. A cluster can be keyed with a B-Tree index or a hash
table.
Primary index – index on a set of fields that includes the
primary key.
Secondary index – other indexes. Common to refer to option (1) as primary index, and (2) Common to refer to option (1) as primary index, and (2)
and (3) as secondary indexes.
Depending on the DBMS this is not necessarily true.
Data entries are duplicates if they have the same value
for the search key field associated with the index.
Primary keys, by definition can not have duplicates.
Secondary keys can.
The order in which columns are listed in the index
definition is important.
It’s possible to retrieve a set of row identifiers using only
the first indexed column.
It’s not possible or efficient (on most databases) to It’s not possible or efficient (on most databases) to
retrieve the set of row identifiers using only the second
Example:
Phone book that is organized by city first, then by last name, and
then by first name.
If you are given the city, you can easily extract the list of all
phone numbers for that city. Tedious to find all the phone numbers for a given last name.
SELECT first_name FROM people WHERE last_name = 'Smith';
To process this statement without an index the database
software must look at the last_name column on every software must look at the last_name column on every row in the table ( full table scan).
With an index the database simply follows the B-
tree data structure until the Smith entry has been found.
Much less computationally expensive than a full table
scan.
SELECT email_address FROM customers WHERE email_address LIKE '%@yahoo.com';
Query would yield an email address for every customer
whose email address ends with "@yahoo.com“.
Even if the email_address column has been indexed the
database must perform a full index scan. database must perform a full index scan.
Why?
SELECT email_address FROM customers WHERE email_address LIKE '%@yahoo.com';
Query would yield an email address for every customer
whose email address ends with "@yahoo.com“.
Even if the email_address column has been indexed the
database must perform a full index scan. database must perform a full index scan.
Why?
Index is built with the assumption that words go from left to right. With a wildcard at the beginning of the search-term, the
database software is unable to use the underlying b-tree index.
Add another index created on reverse(email_address) and a
query like this:
SELECT email_address FROM customers WHERE
reverse(email_address) LIKE reverse('%@yahoo.com');.
Index that stores the bulk of its data as bit
arrays (bitmaps).
Answers most queries by performing bitwise logical
Most commonly used index, such as B+trees, are most
efficient if the values it indexes do not repeat or repeat a efficient if the values it indexes do not repeat or repeat a smaller number of times.
In contrast, the bitmap index is designed for cases where
the values of a variable repeat very frequently.
Example: gender field in a customer database usually
contains two distinct values: male or female. For such variables, the bitmap index can have a significant performance advantage over the commonly used trees.
Dense index
Index in DB is a file with pairs of keys and pointers for
every record in the data file.
Every key in this file is associated with a particular pointer to a
record in the sorted data file.
Sparse index Sparse index
Index in databases is a file with pairs of keys and pointers for
every block in the data file.
Every key in this file is associated with a particular pointer to the
block in the sorted data file.
In clustered indices with duplicate keys, the sparse index
points to the lowest search key in each block. primary key is a sparse index.
Reverse index Reverses the key value before entering it in the index.
Can organize records using a technique called hashing to
quickly find records that have a given search key value.
Example: “hash” employee records on name field.
Records in a file are grouped in “buckets” consisting of
primary page, and (possibly) additional pages linked in a chain.
Determine a page’s bucket from hash function. Given a bucket #, can retrieve primary page for the
bucket in 1 or 2 disk I/Os.
Insert
Record is inserted into the appropriate bucket, with “overflow”
pages allocated as necessary.
Search Search
Apply hashing function to search key field to identify bucket to
which records belong.
May need to search linked pages - O(n). Look at all pages in that bucket. Note: If you do not have the search key value for the record, e.g.,
salary, you have to scan all pages!
hash address = remainder after dividing SSN by 10000
Notes:
Records that are often retrieved together should be
stored together
Intra-file clustering Intra-file clustering
Records within the one file
A sequential file
Inter-file clustering
Records in different files
A nation and its stocks
Data entries are arranged in sorted order by search key
value.
Hierarchical search data structure is maintained that directs
searches to the correct page of data entries.
Each node is typically a physical page, and retrieving a
page is a physical disk I/O.
The lowest level of the tree, the leaf, contains the data
entries.
Allows efficient location of data entries with search key
values in a desired range.
Leaf nodes contained in double-linked list for range queries
that span multiple leaf pages.
Each non-leaf node can accommodate a large number of
pointers.
Sequence set is a single level index with pointers to
records
Index set is a tree-structured index to the sequence set
The combination of index set (the B-tree) and the
sequence set is called a B+ tree
The number of data values and pointers for any given
node are not restricted node are not restricted
Free space is set aside to permit rapid expansion of a
file
Tradeoffs
Fast retrieval when pages are packed with data
values and pointers
Slow updates when pages are packed with data
values and pointers
Scan
Fetch all records in a file.
Search with equality selection
Fetch all records satisfying an equality selection.
Search with range selection Insert a record Insert a record
Identify page in the file the recod must be inserted Fetch that page from disk Modify it to include new record Write it back
Delete a record
Identify page Fetch it from disk Modify it Write it back
B – number of data pages when records packed onto pages
with no wasted space.
R – number of records/page. D – average time to read/write disk page. C – average time to process record, i.e., compare fields. C – average time to process record, i.e., compare fields. H – time to apply hash function. F – fan-out in tree indexes (usually at least 100).
D = 15 msecs (disk I/O dominates). C and H = 100 nsecs.
Scan – B(D+RC)
Retrieve each of B pages taking time D per page. For each page, process R records taking time C per record.
Search with Equality
Assuming selection matches a candidate key, on average scan half
the file: 0.5* B(D+RC). the file: 0.5* B(D+RC).
Otherwise must scan file.
Search with Range
Entire file must be scanned - B(D+RC)
Insert
Records always inserted at end of file Fetch last page, add record, write to disk: 2D+C
Delete
Find record, remove recod from page, write: search cost + D+C
Scan – B(D+RC)
All files must be examined.
Search with Equality
Dlog2B+Clog2R – significant improvement over search heap files!
Search with Range
Cost of search plus cost of retrieving set of records. First record is identified (like equality), remaining records are
scanned.
Insert
Search cost + B(D+RC)
Delete
Search cost + B(D+RC)
Manages physical I/O Sees the disk as a collection of pages Has a directory of each page on a disk Retrieves, replaces, and manages free pages
Manages the storage of files Sees the disk as a collection of stored files Each file has a unique identifier Each record within a file has a unique record identifier
Create a file Delete a file Retrieve a record from a file
Update a record in a file
Update a record in a file Add a new record to a file Delete a record from a file
Consider a file of 10,000 records each occupying 1 page Queries that require processing all records will require
10,000 accesses
e.g., Find all items of type 'E' e.g., Find all items of type 'E'
Many disk accesses are wasted if few records meet the
condition
An index is a small file that has data for one or
more fields of a file
Indexes reduce disk accesses
Read the index into memory Search the index to find records meeting the condition Access only those records containing required data
Or if index already has required data, skip this step Or if index already has required data, skip this step
Disk accesses are substantially reduced when the query
involves few records, or the index has the required data
Adding a record requires at least two disk accesses
Update the file Update the index
Trade-off
Trade-off
Faster queries Slower maintenance
Sequential processing of a portion of a file
Find all items with a type code in the range 'E' to 'K'
Direct processing
Find all items with a type code of 'E' or 'N'
Existence testing
Determining whether a record meeting the criteria
exists without having to retrieve it
Find red items of type 'C'
Both indexes can be searched to identify
Indexes are also called inverted lists
A list of record locations rather than data
Trade-off Trade-off
Faster retrieval Slower maintenance
Taking advantage of the physical sequence of a file Assume 2 records per page Tradeoffs
Fewer disk accesses required to read the index Existence tests not possible
A form of inverted list Default index for most relational databases Frequently used for relational systems Basis of IBM’s VSAM underlying DB2 Basis of IBM’s VSAM underlying DB2 Supports sequential and direct accessing Has two parts
Sequence set Index set
Sequence set is a single level index with pointers to
records
Index set is a tree-structured index to the sequence set
The combination of index set (the B-tree) and the
sequence set is called a B+ tree
The number of data values and pointers for any given
node are not restricted node are not restricted
Free space is set aside to permit rapid expansion of a
file
Tradeoffs
Fast retrieval when pages are packed with data
values and pointers
Slow updates when pages are packed with data
values and pointers
A technique for reducing disk access for direct access Avoids a large index data structure Number of accesses per record can be close to one
The hash field is converted to a hash address by a hash
The hash field is converted to a hash address by a hash
function
Different hash fields convert to the same hash address
Synonyms Store the colliding record in an overflow area with
linked list (chain) with linear time access. linked list (chain) with linear time access.
Long synonym chains degrade performance There can be only one hash field The file can no longer be processed sequentially I.e., does not help list, order by, range queries.
hash address = remainder after dividing SSN by 10000
A structure for inter-file clustering (not always, Oracle
has a hash-cluster index that allows tables(files) to be co-located - very fast for some applications.
An example of a parent/child structure An example of a parent/child structure
There can be two-way pointers, forward and
backward, to speed up deletion
Each child can have a pointer to its parent
Uses a single bit, rather than multiple bytes, to indicate
the specific value of a field
Color can have only three values, so use three bits
Itemcode Color Code Disk address Red Green Blue A N 1001 1 1 d1 1002 1 1 d2 1003 1 1 d3 1004 1 1 d4
A bit map index saves space and time compared to a
standard index
Less space => less disk IO => faster access
Itemcode Color Char(8) Code Char(1) Disk address 1001 Blue N d1 1002 Red A d2 1003 Red A d3 1004 Green A d4
Speed up joins by creating an index for the primary
key and foreign key pair
Oracle uses hash cluster
CREATE INDEX index_name
CREATE UNIQUE INDEX index_name
drop index doc_docnum_idx create index doc_docnum_idx on documents(docnum) // compound index alter table sentences add constraint pk_sentences
primary key(docid, parid, sentid)
create cluster index_hash_cluster (term varchar2(termSize)) hashkeys 1000 size queryClusterSize; create table invertedindex (termid number(8) create table invertedindex (termid number(8) primary key, term varchar2(termSize), idf number(8,3), df number(8), pf number(8), sf number(8), cf number(8), pavgt number(8,7), ptc number(8,7) ) cluster index_hash_cluster(term);
ASCII UNICODE
Each alphabetic, numeric, or special character is
represented by a 7-bit code
128 possible characters ASCII code usually occupies one byte ASCII code usually occupies one byte
A unique binary code for every character, no matter what
the platform, program, or language
Currently contains 34,168 distinct characters derived
from 24 supported language scripts
Covers the principal written languages Two encoding forms
A default 16-bit form A 8-bit form called UTF-8 for ease of use with existing
ASCII-based systems
The default encoding of HTML and XML The basis of global software
What data storage device will be used for
On-line data
Access speed Capacity Capacity
Back-up files
Security against data loss
Archival data
Long-term storage
Data volume Data volatility Access speed
Storage cost
Storage cost Medium reliability Legal standing of stored data
Up to 50% of IS hardware budgets are spent on
magnetic storage
A $50 billion market The major form of data storage The major form of data storage A mature and widely used technology Strong magnetic fields can erase data Magnetization decays with time
Sealed, permanently mounted Highly reliable Access times of 4-10 msec Access times of 4-10 msec Transfer rates as high as 1,300 Mbytes
Capacities of Gbytes to Tbytes
Redundant arrays of inexpensive or
Exploits economies of scale of disk
Can also give greater security Increases a systems fault tolerance Not a replacement for regular backup
Write
Identical copies of a file are written to each drive in an array
Read
Alternate pages are read simultaneously from each drive Alternate pages are read simultaneously from each drive Pages put together in memory Access time is reduced by approximately the number of
disks in the array
Read error
Read required page from another drive
Tradeoffs
Reduced access time Greater security More disk space
Three drive model Write
Half of file to first drive Half of file to second drive Parity bit to third drive Parity bit to third drive
Read
Portions from each drive are put together in memory
Read error
Lost bits are reconstructed from third drive’s parity data
Tradeoffs
Increased data security Less storage capacity than mirroring Not as fast as mirroring
All levels, except 0, have common
The operating system sees a set of The operating system sees a set of
Data are distributed across physical
Parity is used for data recovery
Level 0
Data spread across multiple drives No data recovery when a drive fails
Level 1
Level 1
Mirroring Critical non-stop applications
Level 3
Striping
Level 5
A variation of striping Parity data is spread across drives Less capacity than level 1 Higher I/O rates than level 3
Removable magnetic disk Magnetic tape Magnetic tape cartridge Magnetic tape cartridge Mass storage
Arrays of memory chips Can be 50 times faster than magnetic
More robust More robust 64GB E-Series P2 card suggested list
Magnetic disk is about $0.12 per Gbyte
Stock trading and video-streaming
Small Removable Solid state Solid state USB connector Up to 2 Gbytes capacity Around $2.50 per Gbyte
A more recent development than magnetic Use a laser for reading and writing data High storage densities High storage densities Low cost Direct access Long storage life Not susceptible to head crashes
CD can store data as well as sound Economies of scale because of common
ROM - read only memory Capacity of 650 M bytes Relatively slow device
100 ms access time
High capacity read-write medium 3.5" disk can store up to 256 M bytes Not as fast as fixed disk Not as fast as fixed disk
10 msec access time
Compact Reliable Suitable for data transfer, backup, and
The same physical size as a CD-ROM but up to 28 times the
capacity (i.e., 17 Gbytes)
DVD drives are likely to have transfer rates of around 2.76 M
bytes/sec and access times of 150 msec
DVD-ROM drive will play both audio CDs and CD-ROMs DVD-ROM drive will play both audio CDs and CD-ROMs Read-only versions
DVD-Video (movies) DVD-ROM (software) DVD-Audio (songs)
DVD-R
Recordable (write once, read many)
DVD-RAM
Erasable (write many, read many)
Storage area network Supports dynamic sharing of large amounts of
Communicates via pipelines that consist of an
A high speed data connection between computer
devices
Prices vary from $20-30,000 to 5 million
$# %""
& &&& & && & ' &&& &&& &&& && && & $( &&& &&& &&& && &&& & $# && && &&& && && & ' & & &&& & & & ) & && & &&& && & !" && &&& & &&& && & *" && &&& & &&& && & + &&& &&& &&& && &&& & !,$-* & && & && &&& &&& !,$ & && & && &&& && !,$. & && & && &&& & .-$* & &&& & &&& &&& && *", && &&& && &&& &&& & ,$-* & &&& & &&& &&& &&& ,$ & &&& & &&& &&& && ,$* & &&& && &&& &&& &
Encoding digital data so it requires less
Lossless
File can be restored to original state
Lossy
File cannot be restored to original state Used for graphics, video, and audio files
Disk drives are relatively slow compared to
A variety of techniques are used to
Storage devices vary on several
Select a storage device based on storage