Filesystem Structures Physical Filesystem Layout

• Paging is a regular exchange of data pages between system memory and disk. It is an ordered process based on certain performance-related criteria.
• Swapping is an emergency measure, taken when the system runs so short of memory that there is no time to free it in an ordered way. Swapping is an irregular process and, performance-wise, it should never happen.
• The swap partition is used as a raw partition. Complex filesystem structures would only make swapping slower. The swap partition must be used in the simplest possible way, which is the flat organization provided by the MMS itself. Briefly, the swap partition does not know and does not care about the UNIX filesystem.

A logical question arises: Why does a disk-partitioning scheme have to be defined in advance, and why in such a strict way? Why was the decision about partitioning not left to the system administrator? Supposedly the UNIX designers wanted to make this sensitive and relatively tough administrative task easier to handle; less flexibility makes things simpler. But to fully understand such an approach, a closer look at the very early stages of UNIX systems is needed.

In the early days of UNIX development, a number of disk control functions were determined at the hardware level, so the first disk controllers were quite restricted in the way they managed disk partitions; even the partition sizes were hardwired within the controller hardware. At the time partition schemes were established, there were not a lot of choices. Since then, as the technology has developed, most disk-related issues have shifted into software or, sometimes, firmware. To keep new UNIX systems compatible with the old ones, a slightly modified version of the old partition scheme continued to exist. Partition sizes can be specified arbitrarily, and with them the number of partitions.
This makes the partition scheme sufficiently flexible even by today's standards. By simply setting its size to zero, a partition can be skipped, so any combination of partitions becomes viable. At the same time, the required special device files for the selected partitions already exist, and all needs seem to be met.

The partition scheme presented in Figure 6.1 was, and still is, implemented by Sun Microsystems. It was used by SunOS and is now used by Solaris. Although today we can combine multiple disks or partitions into larger logical volumes, this partition scheme remains useful and in use.

UNIX accesses any disk partition through the corresponding special device file (see Chapter 2). A special device file is a pointer to the disk driver within the kernel (in UNIX, all device drivers are part of the kernel). It is essential that the kernel support the implemented disk interface; otherwise the disk cannot be used at all. You should not worry about that, because UNIX fully supports all the usual disk interfaces, and the kernel is built properly during the UNIX installation.

Most UNIX flavors provide some kind of tool to create disk partitions (the format utility on Solaris and SunOS, SAM on HP-UX, SMIT on AIX, etc.). This tool automatically creates the required special device files in the /dev directory. A special device file can also be created manually with the UNIX mknod command. Its usage is simple: besides the name of the new file and its type (block or character), only two arguments are required, the major and minor device numbers. Sometimes other front-end commands or scripts are also available.
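To make the role of the major and minor device numbers more concrete, the following minimal Python sketch inspects an existing special device file and reports its type and numbers. The function name describe_device is an assumption made for this illustration, not part of any UNIX tool; /dev/null is used only because it is a character device that exists on practically every system. By convention the major number selects the driver in the kernel and the minor number selects the unit the driver manages (for a disk, typically the partition).

```python
import os
import stat


def describe_device(path):
    """Return the type and major/minor numbers of a special device file.

    describe_device is a hypothetical helper written for this sketch.
    """
    st = os.stat(path)
    if stat.S_ISBLK(st.st_mode):
        kind = "block"
    elif stat.S_ISCHR(st.st_mode):
        kind = "character"
    else:
        raise ValueError(f"{path} is not a special device file")
    # st_rdev holds the device identity of the special file; os.major()
    # and os.minor() split it into the two numbers mknod would be given.
    return {
        "type": kind,
        "major": os.major(st.st_rdev),
        "minor": os.minor(st.st_rdev),
    }


if __name__ == "__main__":
    # The numbers printed here differ between UNIX flavors.
    print(describe_device("/dev/null"))
```

Running the same function against a disk partition's device file (the path depends on the UNIX flavor) would report the type "block" together with the numbers that identify the disk driver and the partition.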

6.2.2 Filesystem Structures

Disk partitioning per se will not allow you to start using the specified disk space. UNIX files cannot be stored directly in such raw storage entities; they can only reside within a UNIX filesystem. A UNIX filesystem therefore has to be created in each disk partition before we can start to use it for our UNIX files.

When a filesystem is built in UNIX, certain system data structures are written into the reserved system part of the partition. This system data uniquely defines the physical layout of the filesystem. Its main task is to provide correct allocation of UNIX files within the partition. Filesystems are mutually separated; each filesystem has its own independent system data structures. A single file cannot be shared between two filesystems, i.e., two partitions.

The most important filesystem data structure is the superblock. The superblock is a set of tables that contain important information about the filesystem, such as its label, its size, and a list of index nodes, better known by the shorter name inodes. The superblock determines the filesystem type, and all incompatibilities among different filesystems (including those between different UNIX filesystem types; see Chapter 5) are caused by superblock differences. UNIX can use a specific filesystem only if it knows how to read the filesystem superblock; without this understanding the disk is a collection of senseless and useless data blocks.

A visual depiction of the BSD and System V filesystem layouts is presented in Figure 6.2. The Berkeley filesystem layout included some additional information about filesystems, like the cylinder group block, while System V included certain additional dynamic information about current free space. However, the main difference was that Berkeley filesystems originally spread multiple superblock copies over the available disk space. If a superblock is damaged, the filesystem becomes useless, so it was a good idea to keep several superblock copies in separate places.
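The two ideas above, that a filesystem is usable only if its superblock can be read, and that keeping several copies protects against damage, can be sketched in a few lines of Python. Everything specific in this sketch is an assumption made for illustration: the record layout (magic number, size in blocks, inode count, label), the magic value, the copy offsets, and the function names are invented and do not correspond to the real BSD or System V on-disk formats.

```python
import io
import struct

# Hypothetical on-disk layout invented for this sketch (little-endian):
# magic number, size in blocks, inode count, 16-byte label.
SB_FORMAT = "<III16s"
SB_SIZE = struct.calcsize(SB_FORMAT)
SB_MAGIC = 0x00C0FFEE  # made-up value identifying this made-up filesystem


def pack_superblock(size_blocks, inode_count, label):
    """Build the raw bytes of one superblock copy."""
    return struct.pack(SB_FORMAT, SB_MAGIC, size_blocks, inode_count,
                       label.encode("ascii"))


def read_superblock(dev, offset):
    """Read and validate the superblock copy stored at the given offset."""
    dev.seek(offset)
    raw = dev.read(SB_SIZE)
    if len(raw) != SB_SIZE:
        raise ValueError(f"short read at offset {offset}")
    magic, size_blocks, inode_count, label = struct.unpack(SB_FORMAT, raw)
    if magic != SB_MAGIC:
        # An unknown magic number means we cannot interpret this filesystem.
        raise ValueError(f"unrecognized superblock at offset {offset}")
    return {
        "size_blocks": size_blocks,
        "inode_count": inode_count,
        "label": label.rstrip(b"\0").decode("ascii"),
    }


def read_any_superblock(dev, offsets):
    """Try each superblock copy in turn and return the first valid one."""
    for offset in offsets:
        try:
            return offset, read_superblock(dev, offset)
        except ValueError:
            continue  # this copy is damaged; try the next one
    raise ValueError("no usable superblock copy found")


if __name__ == "__main__":
    # Simulate a small partition holding two superblock copies.
    image = bytearray(8192)
    sb = pack_superblock(1000, 128, "root")
    for off in (512, 4096):
        image[off:off + SB_SIZE] = sb
    image[512:516] = b"\0\0\0\0"  # damage the primary copy
    print(read_any_superblock(io.BytesIO(bytes(image)), [512, 4096]))
```

With the primary copy damaged, the lookup falls through to the copy at offset 4096; if every copy fails validation, the partition is just a run of uninterpretable blocks.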
If one copy is damaged, the Berkeley system automatically switches to another.

Figure 6.2: The filesystem layout.

Through the years, the Berkeley filesystem proved to be faster and more robust. Eventually the traditional System V filesystem, known as the s5 filesystem, became obsolete. System V release 4 discontinued the s5 filesystem and switched to the Berkeley filesystem. Filesystem development continues within the specific UNIX flavors. Today all filesystems have roots in the Berkeley version; the s5 filesystem has disappeared. The filesystems are identified by different names: 4.2, ufs, efs, hfs, ext2, jfs, vxfs; they are mutually incompatible despite the fact that they all belong to the UNIX family of filesystems. The prevailing type in use is ufs, which stands for UNIX filesystem. Even if the filesystem name is the same, some incompatibilities among different UNIX vendors are quite possible. Throughout this text we will steer clear of flavor-specific details and elaborate on common filesystem issues.

Another data structure presented in Figure 6.2 is the single boot-block area reserved at the beginning of the filesystem. This area contained the bootstrap program that brings the UNIX system

6.2.3 Filesystem Creation