We mentioned earlier that in UNIX everything is a file, or is file-like. Given what we now know about file ownership and file mode, perhaps it is more appropriate to say that in UNIX everything is
dressed like a file. This means everything appears like a file, but there are still differences in the file content and the way the file is managed and processed.
These differences result in different kinds of files, or in UNIX terminology, different file types. The type of a file determines how the file will be handled.
The long listing produced by the ls -l command also displays the file type: a single leading letter, or a hyphen, in the leftmost position of the first column (the column that presents the file mode) identifies the file type.
The file type is identified in the following way:
-   Plain (regular) file
d   Directory
c   Character special file
b   Block special file
l   Symbolic link
s   Socket
p   Named pipe
Here is an example:
ls -l
drwx------ 2 bjl mail 24 Mar 24 18:19 Mail
-rwxrw-rw- 1 bjl users 20 May 2 18:26 file1
lrwxrwxrwx 1 bjl users 20 May 2 18:28 file2 -> /usr/local/bin/file2
Three different file types are displayed: a regular file (-), a directory (d), and a symbolic link (l). A brief summary of file types follows.
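The way the leading character encodes the file type can be tried out with a short sketch. It assumes a POSIX shell and the widely available mktemp command; the function name file_type_letter and the scratch directory are illustrative and not part of the original example.

```shell
#!/bin/sh
# Sketch: report the file type from the first character of the ls -l mode column.
# file_type_letter and the scratch directory are illustrative names.

file_type_letter() {
    # ls -ld lists the entry itself (not a directory's contents);
    # the first character of the line is the file type.
    ls -ld "$1" | cut -c1
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT          # remove the scratch directory on exit

: > "$workdir/file1"                        # plain regular file
mkdir "$workdir/Mail"                       # directory
ln -s "$workdir/file1" "$workdir/file2"     # symbolic link

for entry in Mail file1 file2; do
    printf '%s  %s\n' "$(file_type_letter "$workdir/$entry")" "$entry"
done
```

The loop should print d for Mail, a hyphen for file1, and l for file2, matching the three types in the listing above.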
2.2.4.1 Plain Regular File
A plain file is just a sequence of bytes: a data file, an ASCII file, a binary data file, an executable binary program, etc. In most cases when we talk about files, we are thinking of plain files. They are
identified by the hyphen (-) in the long listing of the directory they reside in.
2.2.4.2 Directory
A directory is a binary file that lists the files within it, including any subdirectories. Its entries are filename-inode pairs. In UNIX each file is identified by an inode (the official name is index node).
For simplicity, we will assume that an inode fully specifies the file, and that by knowing the inode, UNIX knows everything about the file itself (ownership, mode, type, other properties,
contents, location on the disk) except its name. The directory relates the filename to the file itself; the filename-inode pairs that make up the contents of a directory establish this relationship.
Although it might seem odd to a beginner, UNIX can find a filename only in the corresponding directory. If a directory is corrupted, all of its filenames can be easily lost, while the corresponding
files remain unchanged and unnamed.
The special entries . and .. (single and double dots) refer to the directory itself and its parent directory, respectively. A directory is identified in its long listing by the letter d.
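The filename-inode pairs, and the meaning of the . and .. entries, can be inspected with ls -i. The following is a minimal sketch, assuming a POSIX shell with mktemp and awk available; the directory names parent and child and the helper inode_of are made up for illustration.

```shell
#!/bin/sh
# Sketch: a directory is a list of filename-inode pairs.
# parent, child, notes and inode_of are illustrative names.

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

mkdir "$workdir/parent" "$workdir/parent/child"
: > "$workdir/parent/notes"

# -i prints the inode number next to each name; -a includes . and ..
ls -ia "$workdir/parent"

# Print only the inode number of a single entry.
inode_of() {
    ls -id "$1" | awk '{ print $1 }'
}

# Inside child, "." names child itself and ".." names parent.
child_dot=$(inode_of "$workdir/parent/child/.")
child_dotdot=$(inode_of "$workdir/parent/child/..")

printf 'child: %s   child/.:  %s\n' "$(inode_of "$workdir/parent/child")" "$child_dot"
printf 'parent: %s  child/..: %s\n' "$(inode_of "$workdir/parent")" "$child_dotdot"
```

Each printed pair should show the same inode number twice, since the dot entries are just additional names for existing directories.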
2.2.4.3 Special Device File
A special device file is used to describe an attached I/O device. UNIX accesses devices via their special files. In UNIX, the device drivers themselves (software interfaces that control the devices) are
part of the kernel, and can be accessed by using certain system calls (UNIX internals). A special device file is a kind of pointer to the corresponding device driver within the kernel; it is a very simple
file that contains two pointers: the major and minor numbers. The major number points to the device class, while the minor number points to the individual device within the class.
All special device files reside in the directory /dev (and in its subdirectories on System V). There are two groups of special device files: block device files and character device files.
2.2.4.3.1 Block Device File
I/O operations go through a group of buffers; the system maintains a buffer pool for all block devices. A block device is accessed in fixed-size blocks. Physically, the high-speed data
transfer is realized using a DMA (direct memory access) mechanism. The letter b in the long listing of a directory identifies block device files. The following disk-related
files are examples of block device files: /dev/disk0a or /dev/dsk/c1d1s5.
2.2.4.3.2 Character Device File
Nonbuffered I/O operations are provided via a character (or raw) device. Physically, the data transfer is performed through a registered data exchange between the device and its controller. Character
devices include all devices that do not fit the block I/O transfer model. The letter c in the long listing of a directory identifies character device files. The following disk-related raw device files are
examples of character special files: /dev/rdisk0a or /dev/rdsk/c1d1s5.
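To see which of the two groups the entries on a given system belong to, the shell's file tests can be used. This is only a sketch: it assumes a POSIX shell and looks at the top level of /dev only, and the actual device names and counts differ from system to system.

```shell
#!/bin/sh
# Sketch: count block and character special files directly under /dev.
# The -b and -c tests follow symbolic links, so a link to a device counts too.

block_count=0
char_count=0

for entry in /dev/*; do
    if [ -b "$entry" ]; then
        block_count=$((block_count + 1))     # shown as b in ls -l
    elif [ -c "$entry" ]; then
        char_count=$((char_count + 1))       # shown as c in ls -l
    fi
done

printf 'block special files:     %d\n' "$block_count"
printf 'character special files: %d\n' "$char_count"
```

On a system with disks visible in /dev both counts are normally nonzero; in a restricted environment such as a container, only a few character devices like /dev/null may be present.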
2.2.4.4 Link
A link is a mechanism that allows multiple filenames to refer to a single file on a disk, i.e., a single inode. There are two kinds of links: hard links and symbolic links.
2.2.4.4.1 Hard Link
A hard link associates two or more filenames with an inode; each inode keeps a count of the filenames linked to it. Only when all filenames are deleted will the file itself also be deleted, and the
corresponding inode released and returned as free for new file assignments. Strictly speaking, a hard link is not a separate file type; each hard link represents an already existing file with an
additional filename. The only way to identify mutually hard-linked filenames is to list a directory or directories by using the ls -i command and check for identical inode numbers. The -i option
displays, beside the filename, the inode number for each displayed file in the listed directory.
Hard links always remain within the same filesystem; simply, inodes cannot be shared between filesystems, and two hard links are always associated with the same inode. A hard link never
creates a new file; it only attaches a new filename to the existing file. This means that a hard link only presents a new entry in a directory, a new record of a filename-inode pair.
To create a hard link use the ln command:
ln myfile hardlink
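The effect of this command can be checked with ls -i and the link count. The sketch below assumes a POSIX shell with mktemp and awk; the helper functions inode_of and link_count are illustrative names, and the file contents are made up.

```shell
#!/bin/sh
# Sketch: a hard link is a second filename for the same inode.
# inode_of and link_count are illustrative helpers.

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

echo "original contents" > myfile
ln myfile hardlink

inode_of()   { ls -i "$1" | awk '{ print $1 }'; }   # inode number
link_count() { ls -l "$1" | awk '{ print $2 }'; }   # second column of ls -l

inode_myfile=$(inode_of myfile)
inode_hardlink=$(inode_of hardlink)
links_before=$(link_count myfile)

# Both names carry the same inode number; the link count is 2.
ls -li myfile hardlink

# Removing one name only decrements the link count;
# the file stays reachable through the remaining name.
rm myfile
links_after=$(link_count hardlink)
cat hardlink
```

The final cat still prints the original contents, because the inode is released only when its last filename is removed.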
2.2.4.4.2 Symbolic Link
A symbolic link is a pointer file to another file elsewhere in the overall hierarchical directory tree. Creating a symbolic link also creates a new small file; this new file contains the pathname
of the linked file. There is no restriction on the use of symbolic links; they can span filesystem boundaries, independently of where the linked file resides. Symbolic links are very common (this
cannot be said for hard links); they are easy to create, easy to maintain, and easy to see. The letter l in the long listing of a directory identifies them; the linked file is also displayed in a visually
clear way (see the previous example for file types).
To create a symbolic link, also use the ln command, with the option -s:
ln -s myfile symlink
This command creates another file named symlink in the current directory, with a separate inode (since this is a completely new file), that points to the file myfile. Both types of links are presented in
Figure 2.1. Let me explain it in more detail.
Figure 2.1: Hard and symbolic links.
For an existing file named myfile, which is determined by the inode (index node) N1, both links
are created. The hard link hardlink is another name for the file myfile, and it corresponds to the same inode N1. The symbolic link symlink represents another file determined by the inode N2; its
contents point to the file myfile.
What will happen if the name myfile is removed and another file named myfile is then created in the same directory? This is a brand-new file, determined by the new index node N3 and unrelated to the existing file hardlink, which
continues to exist as a different file. However, the file symlink is now linked with the new file myfile: it points to the name myfile, and therefore to the newly created file.
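This scenario can be replayed step by step. The sketch below assumes that the original name myfile is removed before the new file is created (otherwise the new data would simply overwrite the same inode), and it uses a scratch directory from mktemp; the file contents are made up for illustration.

```shell
#!/bin/sh
# Sketch: a hard link follows the inode, a symbolic link follows the name.

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

echo "old data" > myfile      # the file determined by inode N1
ln myfile hardlink            # second name for N1
ln -s myfile symlink          # new file (N2) whose contents are the name "myfile"

rm myfile                     # N1 survives, because hardlink still names it
echo "new data" > myfile      # brand-new file, inode N3

hardlink_text=$(cat hardlink) # read through the hard link
symlink_text=$(cat symlink)   # read through the symbolic link

printf 'hardlink: %s\n' "$hardlink_text"
printf 'symlink:  %s\n' "$symlink_text"
```

The hard link still yields the old data, while the symbolic link yields the new data, because it resolves the name myfile each time it is used.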
2.2.4.5 Socket
A socket is a special type of file used for interprocess communication on a single system or between different systems; sockets enable connections between processes. There are several kinds of sockets, and
most of them are involved in network communications. UNIX domain sockets are local ones, used in local interprocess communication; they are referenced as filesystem objects. Sockets are created
by a special system call, socket, but can then be treated much like other files, using the same system calls. However, a socket can be read or written only by the processes directly
involved in the connection. For example, printing systems, X windowing, and system error logging use sockets. Sockets were originally developed in BSD and later included in System V. The most
probable place to find sockets is the /tmp directory.
2.2.4.6 Named Pipe
A named pipe is another mechanism, originating in System V, that facilitates interprocess communication; the named pipe presents a FIFO (first-in, first-out) element in this communication. The output of one process
becomes the input to another process. Named pipes are very useful when a large amount of data is involved in the interprocess communication; sometimes certain application, and even OS, restrictions
can be bypassed by using a named pipe.
UNIX provides the command mknod pipename p to create a named pipe pipename. The same command is used to create special device files, and we will return to it later. The trailing
character p specifies the named pipe. Pay attention: this is slightly different from the usual UNIX way of specifying a command option. In the long listing of a directory, the leading letter p identifies
named pipes. Again, the most probable place for named pipes is the /tmp directory.
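A small sketch shows a named pipe carrying the output of one process to another. Note one substitution: it uses the POSIX mkfifo command rather than mknod pipename p, since mknod may require privileges on some systems. The scratch directory and the two lines of data are made up for illustration.

```shell
#!/bin/sh
# Sketch: one process writes into a named pipe, another reads from it.
# mkfifo is used here in place of "mknod pipename p".

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
pipe="$workdir/pipename"

mkfifo "$pipe"

# The long listing of the pipe starts with the letter p.
type_letter=$(ls -l "$pipe" | cut -c1)
printf 'file type letter: %s\n' "$type_letter"

# Writer: runs in the background, because opening a FIFO for writing
# blocks until some process opens it for reading.
printf 'first\nsecond\n' > "$pipe" &

# Reader: receives the data in the order it was written (first in, first out).
received=$(cat "$pipe")
wait

printf '%s\n' "$received"
```

The data never lands in a regular file on disk; the pipe only hands bytes from the writer to the reader, which is why it suits large data streams between processes.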
2.2.4.7 Conclusion
Independent of its type, a file can be accessed only after the filesystem that holds it has been mounted. Mounting is a
special UNIX process of bringing online a storage device (primarily a disk) that keeps the files, making the files accessible and their contents readable. Only files on mounted filesystems become visible and can
be searched, found, and processed. We will cover mounting in full detail in Chapters 5 and 6.
All listed file types have different natures. They are created with file-type-specific UNIX commands, but other UNIX commands are mostly applicable to all file types. The output of the same UNIX
command can differ depending on the file type, but the command itself will work. For example, the command:
cat filename
will display the contents of the file filename. But if filename is a symbolic link, the command will display the contents of the linked file.
2.3 Devices and Special Device Files
A device is a dedicated piece of hardware that provides a particular function within the computer system. A device itself can be located internally or externally. Regardless of the location, devices
are treated equally within their classes.
A device driver is a program that manages the system's interaction with a particular device; it provides the interface needed to translate between the hardware commands understood by the
device and the kernel. Such a system structure keeps UNIX reasonably hardware-independent.
Device drivers are parts of the kernel; they are not user processes. However, they can be accessed both from within the kernel and from user space. User-level access is provided through special
device files. The kernel transforms operations on these special files into calls to the driver code.
Special device files are also called device special files. Whatever the name, these files really are special and different from regular files. Their mission is special in the UNIX paradigm. We will
use both names interchangeably, or even simply special files.
Special device files are mapped to devices via two pointers: the major and minor device numbers. These numbers are stored in the inode of a particular special file. The major device number
identifies a device driver for a specific class of devices (a single driver can be used for a number of devices of the same type); the minor device number is a parameter within the specified device
driver.
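The two numbers are visible in a long listing: for a special device file, ls -l prints the major and minor numbers, separated by a comma, where the file size would normally appear. The sketch below assumes that /dev/null and /dev/zero exist, as they do on most UNIX systems, and that owner and group names contain no spaces; the helper name device_numbers is illustrative, and the values printed depend on the system.

```shell
#!/bin/sh
# Sketch: read the major and minor device numbers from a long listing.
# device_numbers is an illustrative helper; the values are system-specific.

device_numbers() {
    # Fields of ls -l for a device: mode, links, owner, group, "major,", minor, ...
    # -L follows a symbolic link in case the name is a link to the device.
    ls -lL "$1" | awk '{ sub(",", "", $5); print $5, $6 }'
}

for dev in /dev/null /dev/zero; do
    if [ -c "$dev" ]; then
        printf '%s  major minor: %s\n' "$dev" "$(device_numbers "$dev")"
    fi
done
```

On many systems the two devices report the same major number and different minor numbers, which illustrates one driver serving several devices of the same class.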
Each device driver has routines for performing the necessary functions in its interaction with the device. These basic functions are: probe, attach, open, close, read, reset, stop, select, strategy, dump,
psize, write, timeout, interrupt processing, and I/O control (ioctl). The addresses of these functions for each driver (independent of whether the device is a character or block device) are stored in a jump table inside
the kernel. The major device number indexes the jump tables; this is provided through another table known as the device switch table. Briefly, the mapping is performed in the following way: the major
device number points to the corresponding entry in the device switch table. The minor device number is passed as a parameter to the relevant function in the device driver. The device driver is
free to interpret the minor number as it sees fit, although in most cases it uses it as a port number (as is the case when a single driver controls multiple devices of the same type). As soon as the
kernel catches a reference to a special device file, it looks up the appropriate function in the driver's jump table and transfers control to it. To perform a device-specific operation that does not have a direct analog in
the filesystem model (for example, ejecting a floppy disk), the ioctl system call is used to pass a request directly to the driver.
This treatment of devices in a file-like way is one of the fundamental design elements that make UNIX so powerful. Just as the proven solutions for files (ownership, mode, access rights, and
protection) have been implemented in the case of devices, the same has been done with user commands as well. Meanwhile, existing differences in command interpretations were maintained.
We will see what this all means in the following example of the copy command:
cp path1/filename1 path2/filename2