Data Structure and Storage Storage CS386, Introduction to Database - - PowerPoint PPT Presentation

data structure and storage storage
SMART_READER_LITE
LIVE PREVIEW

Data Structure and Storage Storage CS386, Introduction to Database - - PowerPoint PPT Presentation

Data Structure and Storage Storage CS386, Introduction to Database Systems Jay Urbain Credits: Database Management Systems, Raghu Ramkrishnan and Johannes Gehrke Data Management, Richard Watson Data Structures Disks (Durable storage) are


slide-1
SLIDE 1

Data Structure and Storage Storage

CS386, Introduction to Database Systems Jay Urbain Credits: Database Management Systems, Raghu Ramkrishnan and Johannes Gehrke Data Management, Richard Watson

slide-2
SLIDE 2

Data Structures

Disks (Durable storage) are a bottleneck

The goal is to minimize disk access Disks are slow compared to main memory Appropriate data structures can reduce disk accesses Appropriate data structures can reduce disk accesses

slide-3
SLIDE 3

Storage and Indexing

Abstraction of data in a DBMS:

Collection of records or a file.

For MySQL MyISAM =>1 table = file.

Each file consists of one or more pages. Each file consists of one or more pages. Files and Access Layers organizes data to support fast

access to desired records.

Understanding how records are organized is essential to

using DBMS effectively.

slide-4
SLIDE 4

File Organization

File Organization – methods of arranging records in a file when a file is stored on disk.

Each file organization makes certain operations

efficient, but others expensive.

slide-5
SLIDE 5

File Organization

Example: File of employee records(age, name, salary) Query: Retrieve records in-order of increasing age Options for organization??? Options for organization???

slide-6
SLIDE 6

File Organization

Example: File of employee records(age, name, salary) Query: Retrieve records in-order of increasing age Options: Options:

Sorting file by age is a good file organization, but sort

  • rder is expensive to maintain.

You are often interested in supporting more than one

  • peration on a give collection of records.

Retrieve all employees making > $5000 would require

scan of entire file!

slide-7
SLIDE 7

Database access

DBMS File manager Disk manager Page read Page returned Record returned DBMS manager manager Record request Page request Read page command

slide-8
SLIDE 8

Data on External Storage

Data must persist across program executions. Page – unit of information read/written to disk. Usually

4K to 8K. Configurable. Maps to block.

Cost of (disk) page I/O dominates cost of typical DB Cost of (disk) page I/O dominates cost of typical DB

  • perations.

DBMS are carefully optimized to minimize this cost. Disks most important storage mechanism.

slide-9
SLIDE 9

Data on External Storage

Each record in a file has a unique identifier - record ID. Record ID (rid) uniquely identifies disk address of page

containing record.

Buffer Manager (BM) – reads/writes data between memory

and disk for persistent storage. and disk for persistent storage.

Files and access layer (FL) - asks BM to fetch pages with

  • rid. BM fetches from disk if not in memory.

Disk Space Manager (DSM) – manages space (pages) on

disk.

When FL needs additional space to hold new records in a

file, it asks the DSM to allocate an additional page.

Also informs DSM when a page is no longer needed. DSM tracks and attempts to reuse discarded pages.

slide-10
SLIDE 10

Disks

Data stored on concentric tracks on a surface A disk drive can have multiple surfaces Disk access time

Spin up time Rotational delay - Waiting for the physical storage location of the

data to appear under the read/write head data to appear under the read/write head

Seek time - Moving the read/write head to the track on which the

storage location can be found.

Spindle RPM Average latency (ms) 4200 7.14 5400 5.55 7200 4.17 10000 3 15000 2

slide-11
SLIDE 11

File Organizations and Indexing

File of records abstraction implemented by FL. A relation (table) typically stored as a file of records. Also supports scan.

FL stores records in a file in a collection of disk pages,

FL stores records in a file in a collection of disk pages,

keeps track of pages allocated to each file, and space within pages allocated to file (insert/delete).

slide-12
SLIDE 12

Database Index

A database index is a data structure that improves the

speed of data retrieval operations on a database table at the cost of slower writes and increased storage space.

Indexes can be created using one or more columns of a

database table.

The disk space required to store the index is typically

less than that required by the table, yielding the possibility to store indices in memory for a table whose data is too large to store in memory.

slide-13
SLIDE 13

Database Index

In a relational database, an index is a copy of one part of a

table.

Some databases extend the power of indexing by allowing

indices to be created on functions or conditional indices to be created on functions or conditional expressions.

Indices may be defined as unique or non-unique. A unique index acts as a constraint on the table by

preventing duplicate entries in the index and thus the backing table.

Choosing effective table organization and a good collection

  • f indexes is critical for improving performance.
slide-14
SLIDE 14

Indexing

An index is a small file that has data for one or

more fields of a file

Indexes reduce disk accesses

slide-15
SLIDE 15

File Organizations and Indexing

Heap File – Simplest file structure is unordered file.

Records in a heap file are stored in random order across the pages of the file. FL must track pages allocated to files.

Index – data structure that organizes data records on

disk to optimize certain kinds of retrieval operations.

Provides efficient retrieval for records satisfying

search conditions on the search key fields of the index.

Can create additional indexes on a given collection of

data records, each with a different search key.

slide-16
SLIDE 16

File Organizations and Indexing

Example: employee records(age, name, salary)

Can retrieve records in a file organized as an index on employee

age versus sorting the file by age.

Can create auxiliary index file based on salary, to speed queries

involving salary. involving salary.

Data entry refers to the records stored in an index file.

A data entry with search key value k contains enough information

to locate (one or more) data records with search key k.

Search an index to find the desired data entries, then use index

data to obtain record data (if needed).

slide-17
SLIDE 17

File Organizations and Indexing

Main alternatives for what to store as a data entry in an index:

1.

A data entry k* is an actual data record (with search key value k). Indexed file organization.

2.

A data entry is a <k, rid> pair - rid is the record id of a data record with search key value k. record with search key value k.

3.

A data entry is a <k, rid-list> pair, where rid-list is a list of record ids of data records with search key value k.

Option 1 avoids storing data records multiple times. Options 2 and 3 contain data entries that point to data records and are

independent of file organization.

Option 3 has more efficient space utilization. Option 2 allows entries of variable length.

slide-18
SLIDE 18

Types of Indexes

Clustered – use search key of a clustered file, the rids in

qualifying data entries point to a contiguous collection of records, need to only retrieve a few data pages.

Un-clustered – each qualifying data entry could contain a

rid that points to a distinct data page, leading to many rid that points to a distinct data page, leading to many data page I/Os.

slide-19
SLIDE 19

Non-Clustered Indexes

The data is present in random order, but the logical

  • rdering is specified by the index.

Data rows may be randomly spread throughout the table. The non-clustered index tree contains the index keys in

sorted order.

The leaf level of the index containing the pointer to the

page and the row number in the data page.

slide-20
SLIDE 20

Non-Clustered Indexes (cont.)

The physical order of the rows is not the same as the

index order.

Typically created on column used in JOIN, WHERE, and

ORDER BY clauses.

Good for tables whose values (and structure) may be

modified frequently.

There can be more than one non-clustered index on a

database table.

Often the default index (SQL Server).

slide-21
SLIDE 21

Clustered Indexes

Clustered - File is organized so that the ordering of data records is the same (or close) to the ordering of data entries in some index. Otherwise unclustered.

Option 1 is clustered by definition. Options 2 or 3 can be a clustered index only if the data

records are sorted on the search key field.

In practice (1) is clustered, (2) and (3) are unclustered.

slide-22
SLIDE 22

Clustered Indexes

Clustering alters the data block into a certain distinct

  • rder to match the index, resulting in the row data being

stored in order.

Therefore, only one clustered index can be created on a

given database table. given database table.

Clustered indices can greatly increase overall speed of

retrieval, but usually only where the data is accessed sequentially in the same or reverse order of the clustered index (or when a range of items is selected).

slide-23
SLIDE 23

Clustered Indexes

Since the physical records are in this sort order on disk,

the next row item in the sequence is immediately before

  • r after the last one, and so fewer data block reads are

required.

They are known as "index organized tables" They are known as "index organized tables"

under Oracle database.

In Oracle database, multiple tables can be joined into

a cluster.

Default PK index for MyISAM. A cluster can be keyed with a B-Tree index or a hash

table.

slide-24
SLIDE 24

Primary and Secondary Indexes

Primary index – index on a set of fields that includes the

primary key.

Secondary index – other indexes. Common to refer to option (1) as primary index, and (2) Common to refer to option (1) as primary index, and (2)

and (3) as secondary indexes.

Depending on the DBMS this is not necessarily true.

Data entries are duplicates if they have the same value

for the search key field associated with the index.

Primary keys, by definition can not have duplicates.

Secondary keys can.

slide-25
SLIDE 25

Column order

The order in which columns are listed in the index

definition is important.

It’s possible to retrieve a set of row identifiers using only

the first indexed column.

It’s not possible or efficient (on most databases) to It’s not possible or efficient (on most databases) to

retrieve the set of row identifiers using only the second

  • r greater indexed column.

Example:

Phone book that is organized by city first, then by last name, and

then by first name.

If you are given the city, you can easily extract the list of all

phone numbers for that city. Tedious to find all the phone numbers for a given last name.

slide-26
SLIDE 26

Applications and limitations

SELECT first_name FROM people WHERE last_name = 'Smith';

To process this statement without an index the database

software must look at the last_name column on every software must look at the last_name column on every row in the table ( full table scan).

With an index the database simply follows the B-

tree data structure until the Smith entry has been found.

Much less computationally expensive than a full table

scan.

slide-27
SLIDE 27

Applications and limitations

SELECT email_address FROM customers WHERE email_address LIKE '%@yahoo.com';

Query would yield an email address for every customer

whose email address ends with "@yahoo.com“.

Even if the email_address column has been indexed the

database must perform a full index scan. database must perform a full index scan.

Why?

slide-28
SLIDE 28

Applications and limitations

SELECT email_address FROM customers WHERE email_address LIKE '%@yahoo.com';

Query would yield an email address for every customer

whose email address ends with "@yahoo.com“.

Even if the email_address column has been indexed the

database must perform a full index scan. database must perform a full index scan.

Why?

Index is built with the assumption that words go from left to right. With a wildcard at the beginning of the search-term, the

database software is unable to use the underlying b-tree index.

Add another index created on reverse(email_address) and a

query like this:

SELECT email_address FROM customers WHERE

reverse(email_address) LIKE reverse('%@yahoo.com');.

slide-29
SLIDE 29

Index type: Bitmap

Index that stores the bulk of its data as bit

arrays (bitmaps).

Answers most queries by performing bitwise logical

  • perations on these bitmaps.

Most commonly used index, such as B+trees, are most

efficient if the values it indexes do not repeat or repeat a efficient if the values it indexes do not repeat or repeat a smaller number of times.

In contrast, the bitmap index is designed for cases where

the values of a variable repeat very frequently.

Example: gender field in a customer database usually

contains two distinct values: male or female. For such variables, the bitmap index can have a significant performance advantage over the commonly used trees.

slide-30
SLIDE 30

Indexes types (cont.)

Dense index

Index in DB is a file with pairs of keys and pointers for

every record in the data file.

Every key in this file is associated with a particular pointer to a

record in the sorted data file.

Sparse index Sparse index

Index in databases is a file with pairs of keys and pointers for

every block in the data file.

Every key in this file is associated with a particular pointer to the

block in the sorted data file.

In clustered indices with duplicate keys, the sparse index

points to the lowest search key in each block. primary key is a sparse index.

Reverse index Reverses the key value before entering it in the index.

slide-31
SLIDE 31

Hash-based Indexing

Can organize records using a technique called hashing to

quickly find records that have a given search key value.

Example: “hash” employee records on name field.

Records in a file are grouped in “buckets” consisting of

primary page, and (possibly) additional pages linked in a chain.

Determine a page’s bucket from hash function. Given a bucket #, can retrieve primary page for the

bucket in 1 or 2 disk I/Os.

slide-32
SLIDE 32

Hash-based Indexing

Insert

Record is inserted into the appropriate bucket, with “overflow”

pages allocated as necessary.

Search Search

Apply hashing function to search key field to identify bucket to

which records belong.

May need to search linked pages - O(n). Look at all pages in that bucket. Note: If you do not have the search key value for the record, e.g.,

salary, you have to scan all pages!

slide-33
SLIDE 33

Hashing

hash address = remainder after dividing SSN by 10000

slide-34
SLIDE 34

Clustering

Notes:

Records that are often retrieved together should be

stored together

Intra-file clustering Intra-file clustering

Records within the one file

A sequential file

Inter-file clustering

Records in different files

A nation and its stocks

slide-35
SLIDE 35

Tree-based Indexing

Data entries are arranged in sorted order by search key

value.

Hierarchical search data structure is maintained that directs

searches to the correct page of data entries.

Each node is typically a physical page, and retrieving a

page is a physical disk I/O.

The lowest level of the tree, the leaf, contains the data

entries.

Allows efficient location of data entries with search key

values in a desired range.

Leaf nodes contained in double-linked list for range queries

that span multiple leaf pages.

Each non-leaf node can accommodate a large number of

pointers.

slide-36
SLIDE 36

B-tree

Sequence set is a single level index with pointers to

records

Index set is a tree-structured index to the sequence set

slide-37
SLIDE 37

B+ tree

The combination of index set (the B-tree) and the

sequence set is called a B+ tree

The number of data values and pointers for any given

node are not restricted node are not restricted

Free space is set aside to permit rapid expansion of a

file

Tradeoffs

Fast retrieval when pages are packed with data

values and pointers

Slow updates when pages are packed with data

values and pointers

slide-38
SLIDE 38

Comparison Operations

Scan

Fetch all records in a file.

Search with equality selection

Fetch all records satisfying an equality selection.

Search with range selection Insert a record Insert a record

Identify page in the file the recod must be inserted Fetch that page from disk Modify it to include new record Write it back

Delete a record

Identify page Fetch it from disk Modify it Write it back

slide-39
SLIDE 39

Modeling Costs

B – number of data pages when records packed onto pages

with no wasted space.

R – number of records/page. D – average time to read/write disk page. C – average time to process record, i.e., compare fields. C – average time to process record, i.e., compare fields. H – time to apply hash function. F – fan-out in tree indexes (usually at least 100).

slide-40
SLIDE 40

Modeling Costs

D = 15 msecs (disk I/O dominates). C and H = 100 nsecs.

slide-41
SLIDE 41

Modeling Costs – Heap files

Scan – B(D+RC)

Retrieve each of B pages taking time D per page. For each page, process R records taking time C per record.

Search with Equality

Assuming selection matches a candidate key, on average scan half

the file: 0.5* B(D+RC). the file: 0.5* B(D+RC).

Otherwise must scan file.

Search with Range

Entire file must be scanned - B(D+RC)

Insert

Records always inserted at end of file Fetch last page, add record, write to disk: 2D+C

Delete

Find record, remove recod from page, write: search cost + D+C

slide-42
SLIDE 42

Modeling Costs – Sorted files

Scan – B(D+RC)

All files must be examined.

Search with Equality

Dlog2B+Clog2R – significant improvement over search heap files!

Search with Range

Cost of search plus cost of retrieving set of records. First record is identified (like equality), remaining records are

scanned.

Insert

Search cost + B(D+RC)

Delete

Search cost + B(D+RC)

slide-43
SLIDE 43

Disk manager

Manages physical I/O Sees the disk as a collection of pages Has a directory of each page on a disk Retrieves, replaces, and manages free pages

slide-44
SLIDE 44

File manager

Manages the storage of files Sees the disk as a collection of stored files Each file has a unique identifier Each record within a file has a unique record identifier

slide-45
SLIDE 45

File manager's tasks

Create a file Delete a file Retrieve a record from a file

Update a record in a file

Update a record in a file Add a new record to a file Delete a record from a file

slide-46
SLIDE 46

Sequential retrieval

Consider a file of 10,000 records each occupying 1 page Queries that require processing all records will require

10,000 accesses

e.g., Find all items of type 'E' e.g., Find all items of type 'E'

Many disk accesses are wasted if few records meet the

condition

slide-47
SLIDE 47

Indexing

An index is a small file that has data for one or

more fields of a file

Indexes reduce disk accesses

slide-48
SLIDE 48

Querying with an index

Read the index into memory Search the index to find records meeting the condition Access only those records containing required data

Or if index already has required data, skip this step Or if index already has required data, skip this step

Disk accesses are substantially reduced when the query

involves few records, or the index has the required data

slide-49
SLIDE 49

Maintaining an index

Adding a record requires at least two disk accesses

Update the file Update the index

Trade-off

Trade-off

Faster queries Slower maintenance

slide-50
SLIDE 50

Using indexes

Sequential processing of a portion of a file

Find all items with a type code in the range 'E' to 'K'

Direct processing

Find all items with a type code of 'E' or 'N'

Existence testing

Determining whether a record meeting the criteria

exists without having to retrieve it

slide-51
SLIDE 51

Multiple indexes

Find red items of type 'C'

Both indexes can be searched to identify

records to retrieve

slide-52
SLIDE 52

Multiple indexes

Indexes are also called inverted lists

A list of record locations rather than data

Trade-off Trade-off

Faster retrieval Slower maintenance

slide-53
SLIDE 53

Sparse indexes

Taking advantage of the physical sequence of a file Assume 2 records per page Tradeoffs

Fewer disk accesses required to read the index Existence tests not possible

slide-54
SLIDE 54

B-tree

A form of inverted list Default index for most relational databases Frequently used for relational systems Basis of IBM’s VSAM underlying DB2 Basis of IBM’s VSAM underlying DB2 Supports sequential and direct accessing Has two parts

Sequence set Index set

slide-55
SLIDE 55

B-tree

Sequence set is a single level index with pointers to

records

Index set is a tree-structured index to the sequence set

slide-56
SLIDE 56

B+ tree

The combination of index set (the B-tree) and the

sequence set is called a B+ tree

The number of data values and pointers for any given

node are not restricted node are not restricted

Free space is set aside to permit rapid expansion of a

file

Tradeoffs

Fast retrieval when pages are packed with data

values and pointers

Slow updates when pages are packed with data

values and pointers

slide-57
SLIDE 57

Hashing

A technique for reducing disk access for direct access Avoids a large index data structure Number of accesses per record can be close to one

The hash field is converted to a hash address by a hash

The hash field is converted to a hash address by a hash

function

slide-58
SLIDE 58

Shortcomings of hashing

Different hash fields convert to the same hash address

Synonyms Store the colliding record in an overflow area with

linked list (chain) with linear time access. linked list (chain) with linear time access.

Long synonym chains degrade performance There can be only one hash field The file can no longer be processed sequentially I.e., does not help list, order by, range queries.

slide-59
SLIDE 59

Hashing

hash address = remainder after dividing SSN by 10000

slide-60
SLIDE 60

Linked list

A structure for inter-file clustering (not always, Oracle

has a hash-cluster index that allows tables(files) to be co-located - very fast for some applications.

An example of a parent/child structure An example of a parent/child structure

slide-61
SLIDE 61

Linked lists

There can be two-way pointers, forward and

backward, to speed up deletion

Each child can have a pointer to its parent

slide-62
SLIDE 62

Bit map indexes

Uses a single bit, rather than multiple bytes, to indicate

the specific value of a field

Color can have only three values, so use three bits

Itemcode Color Code Disk address Red Green Blue A N 1001 1 1 d1 1002 1 1 d2 1003 1 1 d3 1004 1 1 d4

slide-63
SLIDE 63

Bit map indexes

A bit map index saves space and time compared to a

standard index

Less space => less disk IO => faster access

Itemcode Color Char(8) Code Char(1) Disk address 1001 Blue N d1 1002 Red A d2 1003 Red A d3 1004 Green A d4

slide-64
SLIDE 64

Join indexes

Speed up joins by creating an index for the primary

key and foreign key pair

Oracle uses hash cluster

slide-65
SLIDE 65

Creating indexes

CREATE INDEX index_name

ON table_name (column_name)

  • r
  • r

CREATE UNIQUE INDEX index_name

ON table_name (column_name)

slide-66
SLIDE 66

Sample Indexes

drop index doc_docnum_idx create index doc_docnum_idx on documents(docnum) // compound index alter table sentences add constraint pk_sentences

primary key(docid, parid, sentid)

slide-67
SLIDE 67

Oracle Hash Cluster Index

create cluster index_hash_cluster (term varchar2(termSize)) hashkeys 1000 size queryClusterSize; create table invertedindex (termid number(8) create table invertedindex (termid number(8) primary key, term varchar2(termSize), idf number(8,3), df number(8), pf number(8), sf number(8), cf number(8), pavgt number(8,7), ptc number(8,7) ) cluster index_hash_cluster(term);

slide-68
SLIDE 68

Data coding standards

ASCII UNICODE

slide-69
SLIDE 69

ASCII

Each alphabetic, numeric, or special character is

represented by a 7-bit code

128 possible characters ASCII code usually occupies one byte ASCII code usually occupies one byte

slide-70
SLIDE 70

UNICODE

A unique binary code for every character, no matter what

the platform, program, or language

Currently contains 34,168 distinct characters derived

from 24 supported language scripts

Covers the principal written languages Two encoding forms

A default 16-bit form A 8-bit form called UTF-8 for ease of use with existing

ASCII-based systems

The default encoding of HTML and XML The basis of global software

slide-71
SLIDE 71

Data storage devices

What data storage device will be used for

On-line data

Access speed Capacity Capacity

Back-up files

Security against data loss

Archival data

Long-term storage

slide-72
SLIDE 72

Key variables

Data volume Data volatility Access speed

Storage cost

Storage cost Medium reliability Legal standing of stored data

slide-73
SLIDE 73

Magnetic technology

Up to 50% of IS hardware budgets are spent on

magnetic storage

A $50 billion market The major form of data storage The major form of data storage A mature and widely used technology Strong magnetic fields can erase data Magnetization decays with time

slide-74
SLIDE 74

Fixed disks

Sealed, permanently mounted Highly reliable Access times of 4-10 msec Access times of 4-10 msec Transfer rates as high as 1,300 Mbytes

per second

Capacities of Gbytes to Tbytes

slide-75
SLIDE 75

A disk storage unit

slide-76
SLIDE 76

RAID

Redundant arrays of inexpensive or

independent drives

Exploits economies of scale of disk

Exploits economies of scale of disk manufacturing for the personal computer market

Can also give greater security Increases a systems fault tolerance Not a replacement for regular backup

slide-77
SLIDE 77

Mirroring

slide-78
SLIDE 78

Mirroring

Write

Identical copies of a file are written to each drive in an array

Read

Alternate pages are read simultaneously from each drive Alternate pages are read simultaneously from each drive Pages put together in memory Access time is reduced by approximately the number of

disks in the array

Read error

Read required page from another drive

Tradeoffs

Reduced access time Greater security More disk space

slide-79
SLIDE 79

Striping

slide-80
SLIDE 80

Striping

Three drive model Write

Half of file to first drive Half of file to second drive Parity bit to third drive Parity bit to third drive

Read

Portions from each drive are put together in memory

Read error

Lost bits are reconstructed from third drive’s parity data

Tradeoffs

Increased data security Less storage capacity than mirroring Not as fast as mirroring

slide-81
SLIDE 81

RAID levels

All levels, except 0, have common

features

The operating system sees a set of The operating system sees a set of

physical drives as one logical drive

Data are distributed across physical

drives

Parity is used for data recovery

slide-82
SLIDE 82

RAID levels

Level 0

Data spread across multiple drives No data recovery when a drive fails

Level 1

Level 1

Mirroring Critical non-stop applications

Level 3

Striping

Level 5

A variation of striping Parity data is spread across drives Less capacity than level 1 Higher I/O rates than level 3

slide-83
SLIDE 83

RAID 5

slide-84
SLIDE 84

Magnetic technology

Removable magnetic disk Magnetic tape Magnetic tape cartridge Magnetic tape cartridge Mass storage

slide-85
SLIDE 85

Solid State Drives

Arrays of memory chips Can be 50 times faster than magnetic

storage

More robust More robust 64GB E-Series P2 card suggested list

price of $998 => $15 per Gbyte

Magnetic disk is about $0.12 per Gbyte

Stock trading and video-streaming

applications

slide-86
SLIDE 86

External flash drive

Small Removable Solid state Solid state USB connector Up to 2 Gbytes capacity Around $2.50 per Gbyte

slide-87
SLIDE 87

Optical technology

A more recent development than magnetic Use a laser for reading and writing data High storage densities High storage densities Low cost Direct access Long storage life Not susceptible to head crashes

slide-88
SLIDE 88

Optical technology

slide-89
SLIDE 89

CD-ROM

CD can store data as well as sound Economies of scale because of common

components for CD players and CD-ROM components for CD players and CD-ROM drives

ROM - read only memory Capacity of 650 M bytes Relatively slow device

100 ms access time

slide-90
SLIDE 90

Magneto-optical disk

High capacity read-write medium 3.5" disk can store up to 256 M bytes Not as fast as fixed disk Not as fast as fixed disk

10 msec access time

Compact Reliable Suitable for data transfer, backup, and

archival purposes

slide-91
SLIDE 91

Digital Versatile Disc (DVD)

The same physical size as a CD-ROM but up to 28 times the

capacity (i.e., 17 Gbytes)

DVD drives are likely to have transfer rates of around 2.76 M

bytes/sec and access times of 150 msec

DVD-ROM drive will play both audio CDs and CD-ROMs DVD-ROM drive will play both audio CDs and CD-ROMs Read-only versions

DVD-Video (movies) DVD-ROM (software) DVD-Audio (songs)

DVD-R

Recordable (write once, read many)

DVD-RAM

Erasable (write many, read many)

slide-92
SLIDE 92

SAN

Storage area network Supports dynamic sharing of large amounts of

data, regardless of operating system or application application

Communicates via pipelines that consist of an

interface called Fibre Channel

A high speed data connection between computer

devices

Prices vary from $20-30,000 to 5 million

slide-93
SLIDE 93

Storage life

slide-94
SLIDE 94

Merit of data storage devices

  • !"#

$# %""

  • &&&

& &&& & && & ' &&& &&& &&& && && & $( &&& &&& &&& && &&& & $# && && &&& && && & ' & & &&& & & & ) & && & &&& && & !" && &&& & &&& && & *" && &&& & &&& && & + &&& &&& &&& && &&& & !,$-* & && & && &&& &&& !,$ & && & && &&& && !,$. & && & && &&& & .-$* & &&& & &&& &&& && *", && &&& && &&& &&& & ,$-* & &&& & &&& &&& &&& ,$ & &&& & &&& &&& && ,$* & &&& && &&& &&& &

slide-95
SLIDE 95

Data compression

Encoding digital data so it requires less

storage space and thus less network bandwidth

Lossless

File can be restored to original state

Lossy

File cannot be restored to original state Used for graphics, video, and audio files

slide-96
SLIDE 96

Key points

Disk drives are relatively slow compared to

main memory

A variety of techniques are used to

A variety of techniques are used to

  • vercome the disk access bottleneck

Storage devices vary on several

parameters

Select a storage device based on storage

and retrieval goals