
FILE ORGANISATIONS

  Introduction

  Magnetic disk storage is available in many forms, including floppy disks, hard disks, cartridges, exchangeable multi-platter packs, and fixed disks. The following deals with the concepts which are applied, in many different ways, to all of the above. A typical disk pack comprises 6 disks held on a central spindle. As the top and bottom surfaces of the pack are not used for recording information, only 10 surfaces are used, each with 200 concentric recording tracks. A diagrammatic representation is shown in Figure 1. Each of the 200 tracks holds the same amount of information; this is arranged by the special (housekeeping) software that is used in conjunction with the handling of disk files. When the unit is in position in the drive, the tracks are accessed by a comb of 10 read/write heads held rigidly on an arm assembly which moves in and out between the disks as illustrated. In this way 10 tracks may be referenced without further movement of the heads once they are positioned over the first track. For the sake of economy it is therefore usual to record over a 'cylinder' of data (see Figure 1) and avoid unnecessary movement of the heads. The disk pack is loaded onto the drive and revolves at a normal speed of approximately 3600 rpm when being accessed. Each track can hold upwards of 4000 bytes. Sometimes the sectors on a disk are used in a similar manner to inter-block gaps on a tape file, but in other cases they are ignored as far as data handling is concerned.
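  The figures quoted above can be combined into a rough capacity estimate. The short Python sketch below is illustrative only: the constant names are assumptions introduced here, and 4000 bytes per track is treated as a lower bound because the text says "upwards of 4000 bytes".

```python
# Figures taken from the description of the typical disk pack.
DISKS_PER_PACK = 6
SURFACES_PER_DISK = 2
UNUSED_SURFACES = 2        # top and bottom surfaces of the pack
TRACKS_PER_SURFACE = 200   # so the pack contains 200 cylinders
BYTES_PER_TRACK = 4000     # lower bound ("upwards of 4000 bytes")

# One read/write head per recording surface.
recording_surfaces = DISKS_PER_PACK * SURFACES_PER_DISK - UNUSED_SURFACES

# A cylinder is the set of tracks reachable without moving the heads.
bytes_per_cylinder = recording_surfaces * BYTES_PER_TRACK
bytes_per_pack = bytes_per_cylinder * TRACKS_PER_SURFACE

print(recording_surfaces)   # 10
print(bytes_per_cylinder)   # 40000
print(bytes_per_pack)       # 8000000
```

  On these figures, one head movement gives access to at least 40,000 bytes, which is why recording by cylinder is the economical choice.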

  Because this device has the ability to locate an area of data directly it is known as a Direct Access Device and is popular for the versatility that this affords. Direct access devices provide a storage medium which is a satisfactory compromise between main store and magnetic tape.

  Figure 1 - Disk Pack

  There are a large number of ways records can be organised on disk or tape. The main methods of file organisation used for files are:

  • Serial
  • Sequential
  • Indexed Sequential
  • Random (or Direct)

  a) Serial Organisation

  Serial files are stored in chronological order; that is, as each record is received it is stored in the next available storage position. In general this organisation is only used on a serial medium such as magnetic tape. The records are in no particular key order, and therefore to retrieve a single record the whole file needs to be read from beginning to end. Serial organisation is usually the method used for creating Transaction files (unsorted), Work files and Dump files.

  b) Sequential Organisation

  Sequential files are serial files whose records are sorted and stored in ascending or descending order of a particular key field. The physical order of the records on the disk is not necessarily sequential, as most manufacturers support an organisation where certain records (inserted after the file has been set up) are held in logical sequence but are physically placed in an overflow area. They are no longer physically contiguous with the preceding and following logical records, but they can still be retrieved in sequence.

  c) Indexed Sequential Organisation

  Indexed Sequential file organisation is logically the same as sequential organisation, but an index is also maintained. This method combines the advantages of a sequential file with the possibility of direct access using the Primary Key (the Primary Key is the field that is used to control the sequence of the records). Manufacturers providing Indexed Sequential software now allow indexes to be built on fields other than the Primary Key. These additional fields on which indexes are built are called Secondary Keys. There are three major types of index:

  Basic Index: This provides a location for each record (key) that exists in the system.

  Implicit Index: This type of index gives a location for all possible records (keys) whether they exist or not.

  Limit Index: This index groups the records (keys) and only provides the location of the highest key in the group. Limit indexes generally form a hierarchical index.

  Data records are blocked before being written to disk. An index may consist of the highest key in each block, (or on each track).

  Index             Data

  1  A0025          A0012
  2  A0053          A0017    Block 1
  3  A0075          A0025

                    A0037
                    A0038    Block 2
                    A0053

                    A0064
                    A0073    Block 3
                    A0075

  Figure 2 - The Block Index

  In the above example, data records are shown as being 3 to a block. The index, then, holds the key of the highest record in each block. (An essential element of the index, which has been omitted from the diagram for simplicity, is the physical address of the block of data records.) Should we wish to access record 5, whose key is A0038, we can quickly determine from the index that the record is held in block 2, since this key is greater than the highest key in block 1, A0025, and less than the highest key in block 2, A0053. The index therefore leads us directly to the block containing the record we wish to retrieve, hence the term "direct access".
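  The lookup just described can be sketched in Python. This is a minimal illustration using the keys from Figure 2, not any manufacturer's indexed sequential software; the function names are hypothetical, and the physical block addresses are left out, as they are in the figure. Keys are compared as strings, which works here only because they all share the same fixed-width format.

```python
from bisect import bisect_left

# Highest key in each block, as in Figure 2. A real index entry would
# also carry the physical address of the block.
index_high_keys = ["A0025", "A0053", "A0075"]

# Data blocks, three records to a block (keys only, for brevity).
blocks = [
    ["A0012", "A0017", "A0025"],
    ["A0037", "A0038", "A0053"],
    ["A0064", "A0073", "A0075"],
]

def find_block(key):
    """Return the 1-based number of the block that could hold key, or None."""
    # Position of the first index entry whose high key is >= key.
    pos = bisect_left(index_high_keys, key)
    if pos == len(index_high_keys):
        return None  # key is above the highest key on file
    return pos + 1

def find_record(key):
    """Return (block number, position within block), both 1-based, or None."""
    block_no = find_block(key)
    if block_no is None:
        return None
    block = blocks[block_no - 1]
    if key in block:
        return block_no, block.index(key) + 1
    return None  # the right block was searched but the key is not on file

print(find_block("A0038"))   # 2
print(find_record("A0038"))  # (2, 2)
```

  Only the small index and one data block are examined, rather than every record that precedes the one required.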

  d) Random (or Direct)

  A randomly organised file contains records arranged physically without regard to the sequence of the primary key. Records are loaded to disk by establishing a direct relationship between the Key of the record and its address on the file, normally by use of a formula (or algorithm) that converts the primary Key to a physical disk address. This relationship is also used for retrieval. The use of a formula (or algorithm) is known as 'Key Transformation' and there are several techniques that can be used:

  • Division Taking Quotient
  • Division Taking Remainder
  • Truncation
  • Folding
  • Squaring
  • Radix Conversion

  These methods are often mixed to produce a unique address (or location) for each record (key). The production of the same address for two different records is known as a synonym.
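  Three of the techniques named above, and the way synonyms arise, can be illustrated with a short Python sketch. The function names, the sample keys and the choice of 11 addresses are all assumptions made for this illustration; a real system would transform the result further into a cylinder, track and block address.

```python
from collections import defaultdict

def division_remainder(key, n_addresses):
    """Division taking remainder: address = key modulo number of addresses."""
    return key % n_addresses

def folding(key, part_len=2):
    """Folding: split the key's digits into parts and add the parts together."""
    digits = str(key)
    return sum(int(digits[i:i + part_len])
               for i in range(0, len(digits), part_len))

def truncation(key, keep=2):
    """Truncation: keep only the last `keep` digits of the key."""
    return int(str(key)[-keep:])

def find_synonyms(keys, transform):
    """Group keys by transformed address and return only the clashes."""
    by_address = defaultdict(list)
    for key in keys:
        by_address[transform(key)].append(key)
    return {addr: ks for addr, ks in by_address.items() if len(ks) > 1}

keys = [1234, 1245, 2001, 3107]   # hypothetical numeric primary keys

print(division_remainder(1234, 11))   # 2
print(folding(1234))                  # 46
print(truncation(1234))               # 34
print(find_synonyms(keys, lambda k: division_remainder(k, 11)))
# {2: [1234, 1245]}
```

  Keys 1234 and 1245 both transform to address 2, so they are synonyms under this particular formula; one of them would have to be placed elsewhere.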

  Random files show a distinct advantage where:

  • Hit Rate is low
  • Data cannot be batched or sorted
  • Fast response is required.

  Catering for expansion

  A normal tendency of master files is to expand. Records may be increased in size or may be added. Even if the total size or number of records does not increase, there will almost inevitably be changes. Although it is usual to update files on disk by overlay there must be provision for additions and preferably some means of re-utilising storage arising from deletion.

  Overflow arises from: i) A record being assigned to a block that is already full.

  There are a number of methods of catering for expansion:

  i) Specifying less than 100% block packing density on initial load.
  ii) Specifying less than 100% cylinder packing density.
  iii) Specifying extension blocks (usually at the end of the file).

  The first method is only effective and efficient if the expansion is regular. Where localised expansion occurs to any extent, even in one block, this system will fail. The other two methods have the result of allowing space for first and second level overflow, respectively. Extension blocks are used when all other overflow facilities have been exhausted.
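  The effect of block packing density, and the way localised expansion defeats it, can be seen in a small Python sketch. It is a simplified model under stated assumptions: keys are plain integers, a single list stands in for the overflow area, and the names load_blocks and insert are hypothetical.

```python
def load_blocks(sorted_keys, block_capacity, packing_density):
    """Initial load: fill each block only up to the given packing density."""
    per_block = max(1, int(block_capacity * packing_density))
    return [sorted_keys[i:i + per_block]
            for i in range(0, len(sorted_keys), per_block)]

def insert(blocks, overflow, key, block_capacity):
    """Add key to its home block, or to overflow if that block is full.

    The home block is the first block whose highest key is >= key,
    or the last block if the key is higher than every key on file.
    Returns "home" or "overflow" to show where the record went.
    """
    target = blocks[-1]
    for block in blocks:
        if key <= block[-1]:
            target = block
            break
    if len(target) < block_capacity:
        target.append(key)
        target.sort()
        return "home"
    overflow.append(key)
    return "overflow"

BLOCK_CAPACITY = 4
blocks = load_blocks([10, 20, 30, 40, 50, 60, 70, 80], BLOCK_CAPACITY, 0.75)
overflow = []

print(blocks)                                         # [[10, 20, 30], [40, 50, 60], [70, 80]]
print(insert(blocks, overflow, 15, BLOCK_CAPACITY))   # home
print(insert(blocks, overflow, 25, BLOCK_CAPACITY))   # overflow
print(blocks[0], overflow)                            # [10, 15, 20, 30] [25]
```

  With 75% packing each block has one spare position, so the first addition to block 1 fits but the second goes to overflow, even though the other blocks still have free space.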

  Ideally, first level overflow is situated on the same cylinder as the overflowed block, hence there is no penalty incurred in terms of head movement. Second level overflow normally consists of one or more blocks at the end of the file. If a significant number of accesses to second level overflow are made, run-time will increase considerably. There is then a need to reorganise the file to bring records back from overflow.

  Each file organisation can be accessed or processed in different ways, often combining the advantages of one organisation with the advantages of another.

  Summary of file organisation and access methods:

                           ACCESS METHODS
  FILE ORGANISATION     Serial   Sequential   Random

  Serial                  X
  Sequential                         X
  Indexed Sequential                 X           X
  Random                  X                      X

  The transfer time of data from a direct access storage device such as a disk drive can be calculated; however, the formulae needed differ between the types of file organisation. Examples of these formulae are shown below.

SEQUENTIAL FILE TRANSFER TIMINGS:

  The time taken to transfer ALL records will equal the sum of:

  • The transfer time for the file
  • Total SEEK time for the file
  • Total Latency for the file

  In order to calculate the above the following are required:

  1. Transfer time for file = Size of file in characters / Transfer Rate

  2. Number of records per track = Bytes per Track / Characters per record

  3. Number of tracks required = Number of Records / Number of records per track

  4. Number of cylinders required = Number of tracks required / Number of tracks per cylinder

  5. Total SEEK time for file = Number of cylinders x Minimum SEEK Time

  6. Total Latency for file = Number of cylinders x Average Latency

  For Random or Direct file organisations both the SEEK time and the Latency between each record transferred need to be included in the calculation:

  1. Transfer time for file = Size of file in characters / Transfer Rate

  2. Total SEEK time for file = Number of records in file x Average SEEK time

  3. Total Latency for file = Number of records in file x Average Latency

  The sum of these three calculations will give the transfer time for ALL records.

  In order to calculate the Latency (Rotational Delay) the time for one rotation of the disk needs to be expressed in milliseconds.

  Revolutions per second = RPM / 60

  Time for one rotation (in milliseconds) = 1000 / Revolutions per second

  Average Latency is normally taken as half the time for one rotation.

  Access time = SEEK + Average Latency.
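  The timing calculations for both organisations can be brought together in a short Python sketch. It rests on several assumptions that are not in the text: partly used tracks and cylinders are rounded up, average latency is taken as half a rotation, and the file size, transfer rate and seek times in the example are invented purely for illustration. All times are in milliseconds and the function names are hypothetical.

```python
import math

def rotation_ms(rpm):
    """Time for one rotation of the disk, in milliseconds."""
    revs_per_second = rpm / 60
    return 1000 / revs_per_second

def sequential_transfer_ms(n_records, chars_per_record, bytes_per_track,
                           tracks_per_cylinder, chars_per_ms,
                           min_seek_ms, avg_latency_ms):
    """Sequential file: one seek and one latency per cylinder."""
    transfer = n_records * chars_per_record / chars_per_ms
    records_per_track = bytes_per_track // chars_per_record
    tracks = math.ceil(n_records / records_per_track)     # assumption: round up
    cylinders = math.ceil(tracks / tracks_per_cylinder)   # assumption: round up
    return transfer + cylinders * min_seek_ms + cylinders * avg_latency_ms

def random_transfer_ms(n_records, chars_per_record, chars_per_ms,
                       avg_seek_ms, avg_latency_ms):
    """Random file: one seek and one latency per record."""
    transfer = n_records * chars_per_record / chars_per_ms
    return transfer + n_records * (avg_seek_ms + avg_latency_ms)

# Hypothetical example: 10,000 records of 200 characters on the disk pack
# described in the Introduction (4000 bytes per track, 10 tracks per
# cylinder, 3600 rpm), with an assumed transfer rate of 200 characters
# per millisecond, minimum seek 10 ms and average seek 30 ms.
avg_latency = rotation_ms(3600) / 2   # assumption: half a rotation
seq = sequential_transfer_ms(10_000, 200, 4000, 10, 200, 10, avg_latency)
rnd = random_transfer_ms(10_000, 200, 200, 30, avg_latency)

print(round(seq))   # 10917
print(round(rnd))   # 393333
```

  On these invented figures, reading the whole file randomly takes roughly 36 times as long as reading it sequentially, because a seek and a rotational delay are incurred for every record rather than for every cylinder.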