File Organisation - 1 Dr. V. V. Subrahmanyam Associate Professor, SOCIS, IGNOU
Introduction • A file is a collection of records. • A key concern of file management is the way in which the records themselves are organised inside the file. • This organisation heavily affects system performance as far as record retrieval is concerned.
Contd… • The access method for records in a file depends on the physical medium on which the file is stored. • For example, magnetic tape is sequential by nature, so records are read sequentially. • On disks, random access to records is possible.
File Organisation • It is a way of arranging the records in a file when the file is stored on secondary storage devices.
Some important definitions • File: A collection of related data or facts. • Fields: These are the columns, each containing one type of information. • A file is a group of records; records contain fields; fields contain data items; data items consist of characters (alphabets, digits, special characters etc.). • In a single-byte encoding such as ASCII, each character occupies one byte of storage space.
Example • In the context of a traditional library, the author catalogue is a file. Each individual author catalogue card is a record. Each column in the card, such as author, title etc., is a field.
Objectives of File Organisation • Optimal selection of records, i.e. records should be accessed as fast as possible. • Any insert, update or delete transaction on records should be easy and quick, and should not harm other records. • No duplicate records should be introduced as a result of an insert, update or delete. • Records should be stored efficiently so that the cost of storage is minimal.
Data Files and Index Files • A database is a collection of files that together implement a logical data model. • The two types of files in a physical database structure are (i) data files and (ii) index files. • Data Files: These files store the facts that comprise the database. • Index Files: These files support access to the data files but usually do not themselves store facts other than key values.
Example • Consider a simple bibliographical database. It consists of records containing bibliographical details about books. Each record about a book consists of several fields (Author, Title, Imprint etc.). For fast access to the records, an index file (inverted file) is created, in which each record holds an index term (author's name, subject descriptor etc.) and an index number. It is similar to a back-of-the-book index.
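The split between a data file and an index file can be sketched in a few lines of Python. This is only an illustration: the sample records, the field names and the helpers `build_index` and `lookup` are assumptions made for the example, not part of the lecture.

```python
from collections import defaultdict

# Hypothetical data file: record number -> bibliographical record.
# The authors and titles are made-up placeholders.
data_file = {
    1: {"author": "Rao", "title": "Book One", "imprint": "Publisher X"},
    2: {"author": "Sen", "title": "Book Two", "imprint": "Publisher Y"},
    3: {"author": "Rao", "title": "Book Three", "imprint": "Publisher X"},
}


def build_index(records, field):
    """Build an inverted file: index term -> list of record numbers."""
    index = defaultdict(list)
    for number, record in records.items():
        index[record[field]].append(number)
    return dict(index)


def lookup(index, records, term):
    """Use the index to fetch matching records without scanning the data file."""
    return [records[number] for number in index.get(term, [])]


author_index = build_index(data_file, "author")
titles_by_rao = [record["title"] for record in lookup(author_index, data_file, "Rao")]
```

The index stores only key values and record numbers, which matches the statement that index files hold no facts other than keys.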
Structure/Organisation and Access method • The method of organising the records in a file is referred to as its structure or organisation. • The method of searching the file in order to retrieve the data is called the access method. There are two access methods: – Sequential – Random
• For a particular file, the most appropriate organisation is determined on the basis of the operational characteristics of the storage medium and the nature of the operations to be performed on the data. • Magnetic disks are examples of direct access storage devices and magnetic tapes are examples of sequential storage devices.
Types of File Organisation • Sequential File Organisation • Indexed Sequential Access Method • Heap File Organisation • Hash/Direct File Organisation • B+ Tree File Organisation • Cluster File Organisation
Sequential File Organisation • This is the simplest technique. • Records are written in a sequence in one long list. • They are arranged in the same sequence in which they were originally entered/written into the file. • The file is read from the beginning in the sequence in which the records are arranged.
Contd… • To retrieve a record, start at the beginning of the file and read the records one after another in sequence until the required record is found. • This is time consuming for large files. • These files are stored on sequential storage devices such as magnetic tapes. • They are suitable only for archive, backup and transport copies of databases.
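The retrieval procedure above can be sketched as a linear scan. The list `tape` stands in for a sequential file, and the record layout and the function name `sequential_search` are assumptions made for this illustration.

```python
# Hypothetical sequential file: records in the order they were written.
tape = [
    {"key": 611, "title": "Record A"},
    {"key": 612, "title": "Record B"},
    {"key": 614, "title": "Record C"},
    {"key": 618, "title": "Record D"},
]


def sequential_search(records, key):
    """Read records from the beginning until the key is found.

    Returns the matching record (or None) and the number of reads performed.
    """
    reads = 0
    for record in records:
        reads += 1
        if record["key"] == key:
            return record, reads
    return None, reads


found, reads_needed = sequential_search(tape, 614)
missing, reads_for_miss = sequential_search(tape, 999)
```

An unsuccessful search reads every record, which is why the method becomes slow as the file grows.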
Sequential Organisation • [Figure: sequential organisation of records]
Inserting a New Record • Insertion requires the creation of a new file. • To maintain the file sequence, records are copied from the original file into the new file up to the point where the amendment is required. • The change is then made and written to the new file. • Following this, the remaining records in the original file are copied to the new file.
Inserting a New Record • [Figure: inserting a new record into a sequential file]
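The copy-based insertion can be sketched as follows, assuming the file is kept in key order and each record is a (key, data) pair. Lists stand in for the old and new files, and the name `insert_by_copy` is hypothetical.

```python
def insert_by_copy(old_file, new_record):
    """Insert new_record by copying old_file into a new file in key order."""
    new_file = []
    inserted = False
    for record in old_file:
        # Copy records until the point where the new record belongs.
        if not inserted and new_record[0] < record[0]:
            new_file.append(new_record)
            inserted = True
        # Copy the current (and, later, every remaining) original record.
        new_file.append(record)
    if not inserted:
        # The new record has the highest key, so it goes at the end.
        new_file.append(new_record)
    return new_file


old_file = [(611, "A"), (612, "B"), (614, "C"), (618, "D"), (624, "E")]
new_file = insert_by_copy(old_file, (615, "X"))
new_keys = [key for key, _ in new_file]
```

Every insertion rewrites the whole file, so the cost grows with the file size rather than with the size of the change.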
Disadvantages • The sorted file method always involves the effort of sorting the records. • Each time an insert, update or delete transaction is performed, the file has to be sorted again. Identifying the record, inserting, updating or deleting it, and then re-sorting the file therefore always takes time and may slow the system down.
Indexed Sequential Access Method (ISAM) • ISAM is designed to overcome the limitations of the sequential file. • A file is sequenced on a particular field and an index for that file is built on that same field. • This index provides a mechanism for faster search. • This technique allows both sequential and random processing.
ISAM • [Figure: an index file whose entries hold a primary key and a block pointer, pointing into a data file stored in blocks]
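A lookup through such an index can be sketched as below. The sketch assumes a sparse index with one (lowest key, block pointer) entry per data block, which is one common arrangement and not necessarily the one in the figure; the sample blocks and the names `isam_lookup` and `sequential_scan` are hypothetical.

```python
from bisect import bisect_right

# Hypothetical data file: blocks of (primary key, data) records in key order.
data_blocks = [
    [(101, "a"), (105, "b"), (109, "c")],
    [(112, "d"), (118, "e"), (120, "f")],
    [(131, "g"), (140, "h")],
]

# Index file: one (lowest key in block, block pointer) entry per block.
index = [(block[0][0], pointer) for pointer, block in enumerate(data_blocks)]
index_keys = [key for key, _ in index]


def isam_lookup(key):
    """Random access: find the block through the index, then scan only that block."""
    position = bisect_right(index_keys, key) - 1
    if position < 0:
        return None  # key is smaller than every key in the file
    _, pointer = index[position]
    for record_key, data in data_blocks[pointer]:
        if record_key == key:
            return data
    return None


def sequential_scan():
    """Sequential processing: read every block in key order."""
    for block in data_blocks:
        yield from block
```

The same file serves both kinds of processing: `isam_lookup` touches one block, while `sequential_scan` walks all of them in order.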
Inserting a Record • Inserting a record while maintaining the sequence of records may require shifting the subsequent records. • For a large file this is a costly and inefficient process. • Instead, an overflow area is provided: records that overflow their logical area are moved into a designated overflow area, and a pointer to the overflow location is kept in the logical area.
Inserting a Record • [Figure: an original logical block holding keys 611, 612, 614, 618, 624, and the same logical block with its overflow block after key 615 is inserted]
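One possible overflow policy can be sketched as follows. It assumes a fixed block capacity of five keys and assumes that the highest key is the one displaced when the block is full; the exact policy in the figure cannot be recovered, so the class `Block` and its methods are purely illustrative.

```python
from bisect import insort

BLOCK_CAPACITY = 5  # assumed capacity of one logical block


class Block:
    """A logical block plus the overflow area it points to (hypothetical design)."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        self.overflow = []  # stands in for the pointer to the overflow area

    def insert(self, key):
        """Insert in sequence; if the block is over capacity, displace the highest key."""
        insort(self.keys, key)
        if len(self.keys) > BLOCK_CAPACITY:
            displaced = self.keys.pop()
            insort(self.overflow, displaced)

    def search(self, key):
        """Look in the logical block first, then follow the overflow pointer."""
        return key in self.keys or key in self.overflow

    def in_sequence(self):
        """All keys in order: the block keeps the lowest keys, overflow the rest."""
        return self.keys + self.overflow


block = Block([611, 612, 614, 618, 624])
block.insert(615)
```

Only the one block and its overflow area are touched, so no other block in the file has to be shifted.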
Advantages • Since the data block address of each record is held in the index, searching for a record in a large database is easy and quick, given a proper primary key. • This method gives the flexibility of using any column as the key field, with the index generated on that column. In addition to the primary key and its index, indexes can be generated for other fields too.
Contd… • It supports range retrieval and partial retrieval of records. Since the index is based on the key value, the data for a given range of values can be retrieved. In the same way, when a partial key value is provided, say student names starting with ‘JA’, the matching records can also be found easily.
Disadvantages • There is an extra cost for maintaining the index, i.e. extra disk space is needed to store the index values. When there are multiple key-index combinations, the disk space required increases further. • As new records are inserted, these files have to be restructured to maintain the sequence. Similarly, when a record is deleted, the space it used needs to be released. Otherwise, the performance of the database will degrade.
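Range and partial-key retrieval both rely on the index being held in key order. A minimal sketch, assuming the index keys are student names kept in a sorted list; the sample names and the functions `range_query` and `prefix_query` are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical index keys, kept in sorted order as an ordered index would keep them.
names = sorted(["JACOB", "JAMES", "JANE", "JOHN", "MARY", "ADAM"])


def range_query(keys, low, high):
    """Return all keys with low <= key <= high using two binary searches."""
    return keys[bisect_left(keys, low):bisect_right(keys, high)]


def prefix_query(keys, prefix):
    """Return all keys that start with prefix; they are adjacent in sorted order."""
    start = bisect_left(keys, prefix)
    end = start
    while end < len(keys) and keys[end].startswith(prefix):
        end += 1
    return keys[start:end]


in_range = range_query(names, "JAMES", "JOHN")
starting_with_ja = prefix_query(names, "JA")
```

In both cases the search locates the first matching key and then reads forward, so only the matching part of the index is visited.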