
We will use the general term data to refer to both the system and the user data stored on the disk. User data is the actual data kept in files within the filesystem; system data is the data needed to identify and manage the user data. System data represents a necessary overhead, but from the system's standpoint it is crucial for managing the filesystem. The data block is the smallest unit of data. Each UNIX file consumes one or more blocks. Once all of a file's blocks are known, the file itself can be managed easily; what is required is an additional step that identifies the sequence of blocks making up the file. This is exactly why we organize files into a filesystem. We can look at the filesystem as a kind of umbrella that covers files and provides mechanisms for their use; the system data keeps the information needed to identify and allocate files accurately.
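To make the idea of a file as an ordered sequence of blocks concrete, here is a minimal Python sketch. It assumes a fixed 512-byte block and represents a file simply as a list of disk block numbers; the names `BLOCK_SIZE`, `blocks_needed`, and `locate_byte`, as well as the block numbers, are hypothetical and not part of any UNIX interface.

```python
# A minimal sketch, assuming fixed 512-byte blocks (hypothetical value;
# each real filesystem records its own block size in its system data).
BLOCK_SIZE = 512


def blocks_needed(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of whole blocks needed to hold file_size bytes (rounded up)."""
    return (file_size + block_size - 1) // block_size


def locate_byte(block_list: list[int], offset: int,
                block_size: int = BLOCK_SIZE) -> tuple[int, int]:
    """Map a byte offset in a file to (disk block number, offset in block).

    block_list is the ordered sequence of disk blocks that make up the file.
    The blocks need not be contiguous on the disk, which is why their order
    must be recorded somewhere.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    index, within = divmod(offset, block_size)
    if index >= len(block_list):
        raise ValueError("offset lies beyond the file's last block")
    return block_list[index], within


# Hypothetical 1300-byte file scattered over three non-adjacent blocks.
file_blocks = [907, 12, 4410]
print(blocks_needed(1300))             # 3
print(locate_byte(file_blocks, 1000))  # (12, 488)
```

Without the ordered list, blocks 907, 12, and 4410 would be three anonymous pieces of disk space; the recorded order is what turns them into a file.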

6.2 Physical Filesystem Layout

In our attempt to fully understand the filesystem layout, we will follow the traditional path in managing disk space. There are a few good reasons for this approach: it still prevails in practice; it is always easier to start with simpler issues and then move toward more complex ones; and, the strongest argument, behind any logical structure lies a physical layout that can never be bypassed. In the end, each file must be physically stored on the magnetic disk media.

Disks have cylinders: concentric circles on the disk platters that are further divided into tracks (also called segments; we will use the term track). Data is always stored in blocks that are spread over the disk space; a block can be located in any track. Each track contains a well-defined number of blocks, and a block is usually 512 bytes. Each block is uniquely identified by its block number. The disk controller knows how to locate any block, specified by its number, within the whole disk space. Block allocation means mapping the block number onto the disk geometry, that is, to the corresponding cylinder, the track within it, and the block within the track. Once a block is allocated, it can easily be accessed and processed.

Disks cannot be used straight off the shelf; they must first be prepared for data storage. In UNIX terminology, this means the physical filesystem layout must be properly defined and put into operation. In this section we will address the main issues related to the physical filesystem layout. They are grouped around:

• Disk partitions: the way to specify a storage entity for use
• Filesystem structures: mechanisms to manage data on the disk
• File identification and allocation: the way to identify and access files on the disk
• Performance-related issues: how to improve the performance of the filesystem

This section partly refers back to Chapter 2, especially the part about special device files.
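Block allocation, in the sense of mapping a block number onto a cylinder, a track, and a block within the track, can be sketched in Python. This assumes an idealized geometry with the same number of tracks per cylinder and blocks per track everywhere, numbered cylinder by cylinder; the geometry values and function names are hypothetical. Real drives often record more blocks on outer tracks, and modern controllers hide the physical geometry behind plain block numbers, so treat this as a model of the idea rather than of any particular controller.

```python
from typing import NamedTuple


class Geometry(NamedTuple):
    cylinders: int
    tracks_per_cylinder: int  # one track per recording surface
    blocks_per_track: int


class Location(NamedTuple):
    cylinder: int
    track: int  # track within the cylinder
    block: int  # block within the track, counted from 0 here


def block_to_location(block_no: int, g: Geometry) -> Location:
    """Map a linear block number to (cylinder, track, block in track)."""
    blocks_per_cylinder = g.tracks_per_cylinder * g.blocks_per_track
    total = g.cylinders * blocks_per_cylinder
    if not 0 <= block_no < total:
        raise ValueError(f"block {block_no} is outside a {total}-block disk")
    cylinder, rest = divmod(block_no, blocks_per_cylinder)
    track, block = divmod(rest, g.blocks_per_track)
    return Location(cylinder, track, block)


def location_to_block(loc: Location, g: Geometry) -> int:
    """Inverse mapping: (cylinder, track, block in track) to block number."""
    return ((loc.cylinder * g.tracks_per_cylinder + loc.track)
            * g.blocks_per_track + loc.block)


# Hypothetical geometry: 1024 cylinders, 16 tracks each, 63 blocks per track.
disk = Geometry(cylinders=1024, tracks_per_cylinder=16, blocks_per_track=63)
loc = block_to_location(100000, disk)
print(loc)  # Location(cylinder=99, track=3, block=19)
assert location_to_block(loc, disk) == 100000
```

Numbering whole cylinders first keeps consecutive block numbers on the same cylinder for as long as possible, so sequential reads need few head movements.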

6.2.1 Disk Partitions

For a long time the basic UNIX filesystem storage entity was a disk partition. Partitioning simply means dividing the magnetic disk into several smaller pieces suitable for further processing. You can compare this to putting filing cabinets (here, partitions) into a filing closet (the disk) in an office. It is the first step to take, but the cabinets are still not prepared to store files: the drawers and their inventories are not yet in place. We have only decided on and specified the size of the storage space.

Both UNIX platforms, BSD and System V, organized disks around fixed-size partitions, but different partitions had different sizes. UNIX treated disk partitions as independent devices; each of them was accessed as if it were a physically separate disk, so the terms partition and disk could be used interchangeably. One physical disk might be divided into several partitions, or be configured with only one partition. In the past, disk partitions were usually defined in advance by the OS, so only a few division schemes were offered. The number of partitions was fixed, while their sizes could be specified. Imagine that only a predetermined number of filing cabinets could go into the filing closet, but you could decide the size of each cabinet. Typically each disk was divided into multiple partitions, eight for BSD and ten for System V, with some of the partitions overlapping. A simple BSD disk partitioning scheme is presented in Figure 6.1.

Figure 6.1: Simple BSD disk partitioning.

Eight different partitions might be defined for a disk, named by the letters a to h; a partition could be skipped if its size was 0. The c partition comprised the entire disk, including the forbidden (inaccessible) area. The g partition overlapped with the d, e, and f partitions.
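The overlap rules of Figure 6.1 can be expressed as a small partition table. The Python sketch below uses hypothetical starting blocks and sizes for a 1000-block disk, chosen only so that c spans the whole disk and g spans exactly d, e, and f; it ignores the inaccessible area, and on a real system these values come from the disk label. A size of 0 marks a skipped partition, which occupies no space.

```python
from itertools import combinations

# Hypothetical layout of a 1000-block disk: letter -> (first block, size).
# A size of 0 would mark a skipped partition.
DISK_BLOCKS = 1000
PARTITIONS = {
    "a": (0, 100),
    "b": (100, 150),
    "c": (0, DISK_BLOCKS),  # the entire disk
    "d": (250, 200),
    "e": (450, 200),
    "f": (650, 150),
    "g": (250, 550),        # covers the same blocks as d, e, and f
    "h": (800, 200),
}


def overlaps(p: str, q: str, table: dict = PARTITIONS) -> bool:
    """True if partitions p and q share at least one disk block."""
    (start_p, size_p), (start_q, size_q) = table[p], table[q]
    if size_p == 0 or size_q == 0:  # a skipped partition occupies nothing
        return False
    return start_p < start_q + size_q and start_q < start_p + size_p


def usable_together(letters: str, table: dict = PARTITIONS) -> bool:
    """True if the given partitions can all be in use at the same time."""
    return not any(overlaps(p, q, table) for p, q in combinations(letters, 2))


for layout in ("abgh", "abdefh", "c", "adg"):
    print(layout, usable_together(layout))
# prints: abgh True, abdefh True, c True, adg False (one per line)
```

With these hypothetical offsets, the combinations abgh, abdefh, and c each cover the whole disk without overlap, while any set containing g together with d, e, or f is rejected.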
It was not possible to use them all simultaneously, since some of them included the same disk space; for example, either partitions d through f or partition g could be accessed, but not both. In effect, this disk layout offered three different ways of using the disk: divided into four partitions, divided into six partitions, or as a whole. Each partition might hold a filesystem, or it could be used as a swap partition. The OS offered this much flexibility; from today's point of view it was not much, but it was adequate to manage everything in a decent way.

The swap partition plays a special role in every UNIX system. The UNIX memory management system (MMS) requires dedicated disk space for normal paging and swapping. Recall the distinction between the two:

• Paging is a regular exchange of data pages between system memory and the disk. It is an orderly process based on certain performance-related criteria.
• Swapping is an emergency measure, taken when the system runs seriously short of memory and lacks the time to free it in an orderly way. Swapping is an irregular process and, performance-wise, it should never happen.

The swap partition is used as a raw partition. Complex filesystem structures would only make swapping slower; the swap partition must be used in the simplest possible way, which is the flat organization provided by the MMS itself. Briefly, the swap partition neither knows nor cares about the UNIX filesystem.

A logical question arises: Why does a disk-partitioning scheme have to be defined in advance, and in such a strict way? Why was the decision about partitioning not left to the system administrator? Presumably the UNIX designers wanted to make this sensitive and relatively tough administrative task easier to handle; less flexibility makes things simpler. But to fully understand this approach, a closer look at the very early stages of UNIX systems is needed.
In the early days of UNIX development, a number of disk control functions were determined at the hardware level, so the first disk controllers were quite restricted in the way they managed disk partitions; even the partition sizes were hardwired into the controller hardware. So when the partition schemes were established, there were not many choices. Since then, with the development of technology, things have changed, and most disk-related issues have shifted into software or, sometimes, firmware. To keep new UNIX systems compatible with old ones, the old partition scheme, slightly modified, continued to exist. Partition sizes can now be specified arbitrarily, and in that way so can the effective number of partitions, which makes the partition scheme sufficiently flexible even by today's standards. By simply setting its size to zero, a partition can be skipped, and any combination of partitions becomes viable. At the same time, the required special device files for the selected partitions already exist, so all needs seem to be met. The partition scheme presented in Figure 6.1 was, and still is, implemented by Sun Microsystems: it was used by SunOS and is now used by Solaris. Although today we can combine multiple disks or partitions into larger logical volumes, this partition scheme remains useful and in use.

UNIX accesses any disk partition through the corresponding special device file (see Chapter 2). A special device file is a pointer to the disk driver within the kernel (in UNIX, all device drivers are part of the kernel). It is essential that the kernel support the disk interface in use; otherwise the disk cannot be used at all. Normally this is not a concern, because UNIX supports all the usual disk interfaces, and the kernel was built properly during the UNIX installation. Most UNIX flavors provide some kind of tool to create disk partitions (the format utility on Solaris and SunOS, SAM on HP-UX, SMIT on AIX, etc.).
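This tool automatically creates the required special device files in the /dev directory. A special device file can also be created manually with the UNIX mknod command. Its usage is simple: besides the name of the new file and its type (b for a block device, c for a character device), it takes the major and minor device numbers. The major number identifies the device driver in the kernel, and the minor number identifies the particular device, such as a disk partition, handled by that driver. Other front-end commands or scripts may also be available.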
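As a sketch of what mknod does, Python's standard `os` module exposes the same system call (`os.mknod`) together with helpers for packing and unpacking major and minor numbers (`os.makedev`, `os.major`, `os.minor`). The function names `make_block_device` and `device_numbers` and the default permission mode 0o600 are illustrative choices, not part of any UNIX tool. Creating a device node requires superuser privileges, and the major and minor numbers that are valid for a given disk depend on the platform and its drivers.

```python
import os
import stat


def make_block_device(path: str, major: int, minor: int,
                      mode: int = 0o600) -> None:
    """Create a block special file, as `mknod path b major minor` would.

    Needs superuser privileges; otherwise os.mknod raises PermissionError.
    The default mode 0o600 is an illustrative choice.
    """
    os.mknod(path, mode | stat.S_IFBLK, os.makedev(major, minor))


def device_numbers(path: str) -> tuple[int, int]:
    """Return the (major, minor) numbers of an existing special device file."""
    st = os.stat(path)
    if not (stat.S_ISBLK(st.st_mode) or stat.S_ISCHR(st.st_mode)):
        raise ValueError(f"{path} is not a special device file")
    return os.major(st.st_rdev), os.minor(st.st_rdev)


# The major number selects the driver; the minor number selects the device
# (for a disk, typically the partition) that the driver should operate on.
print(device_numbers("/dev/null"))
```

On Linux, for example, `/dev/null` reports (1, 3); other UNIX flavors assign their own numbers, which is why the numbers passed to mknod must match the local kernel's drivers.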

6.2.2 Filesystem Structures