14.1.2 Clustering Files

Reading many files sequentially is faster if the files are clustered together on the disk, allowing the disk head to flow from one file to the next. This clustering is best done in conjunction with defragmenting the disks. The overhead of finding the location of a file on the disk (detailed in the previous section) is also minimized for sequential reads if the files are clustered.

If you cannot specify clustering of files at the disk level, you can still provide similar functionality by putting all the files together into one large file (as is done with the ZIP filesystem). This is fine if all the files are read-only, or if there is just one writeable file (which you place at the end). However, when there is more than one writeable file, you need to manage the location of the internal files in your system as one or more of them grow. This becomes a problem and is not usually worth the effort. If the files have a known bounded size, you can pad the files internally, thus regaining the single-file efficiency.
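
The padding approach for bounded-size files can be sketched as follows. This is a minimal illustration, not a definitive implementation: the class name SlotFile, the fixed slot size, and the 4-byte length header are all assumptions made for this example. Each internal file occupies a fixed-size slot in one container file, so any of them can grow (up to the bound) without moving its neighbors.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

/** Hypothetical container: several bounded-size "files" kept as fixed slots in one file. */
class SlotFile implements AutoCloseable {
    private final RandomAccessFile raf;
    private final int slotSize; // assumed upper bound on the size of each internal file

    SlotFile(String path, int slotSize) throws IOException {
        this.raf = new RandomAccessFile(path, "rw");
        this.slotSize = slotSize;
    }

    // Each slot is a 4-byte length header followed by slotSize bytes (content plus padding).
    private long offsetOf(int slot) {
        return (long) slot * (4 + slotSize);
    }

    void write(int slot, byte[] data) throws IOException {
        if (data.length > slotSize) {
            throw new IllegalArgumentException("data exceeds the slot size");
        }
        byte[] padded = new byte[slotSize]; // zero-filled padding after the content
        System.arraycopy(data, 0, padded, 0, data.length);
        raf.seek(offsetOf(slot));
        raf.writeInt(data.length);
        raf.write(padded);
    }

    // Reads back a slot that has previously been written.
    byte[] read(int slot) throws IOException {
        raf.seek(offsetOf(slot));
        byte[] data = new byte[raf.readInt()];
        raf.readFully(data);
        return data;
    }

    @Override
    public void close() throws IOException {
        raf.close();
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("slots", ".bin");
        f.deleteOnExit();
        try (SlotFile store = new SlotFile(f.getPath(), 32)) {
            store.write(0, "first".getBytes(StandardCharsets.UTF_8));
            store.write(1, "second".getBytes(StandardCharsets.UTF_8));
            // Slot 0 grows in place; slot 1 does not have to move.
            store.write(0, "first, now longer".getBytes(StandardCharsets.UTF_8));
            System.out.println(new String(store.read(0), StandardCharsets.UTF_8));
            System.out.println(new String(store.read(1), StandardCharsets.UTF_8));
        }
    }
}
```

The cost of this design is the wasted padding: the container always occupies the full slot size for every internal file, however small its current content.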

14.1.3 Cached Filesystems (RAM Disks, tmpfs, cachefs)

Most operating systems provide the ability to map a filesystem into the system memory. This ability can speed up reads and writes to certain files when you control your target environment. Typically, this technique has been used to speed up the reading and writing of temporary files. For example, some compilers (of languages in general, not specifically Java) generate many temporary files during compilation. If these files are created and written directly to the system memory, the speed of compilation is greatly increased. Similarly, if you have a set of external files that are needed by your application, it is possible to map these directly into the system memory, thus greatly speeding up their reads and writes.

But note that these types of filesystems are not persistent. In the same way that the system memory of the machine is cleared when it is rebooted, these filesystems are removed on reboot. If the system crashes, anything in a memory-mapped filesystem is lost. For this reason, these types of filesystems are usually suitable only for temporary files or for read-only versions of disk-based files (such as mapping a CD-ROM into a memory-resident filesystem).

Remember that you do not have the same degree of fine control over these filesystems that you have over your application. A memory-mapped filesystem does not use memory resources as efficiently as working directly from your application. If you have direct control over the files you are reading and writing, it is usually better to optimize this within your application rather than outside it. A memory-mapped filesystem takes space directly from system memory. You should consider whether it would be better to let your application grow in memory instead of letting the filesystem take up that system memory.
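
The temporary-file use described above might look like the following from the application side. This is a sketch under stated assumptions: it assumes /dev/shm is a memory-backed (tmpfs) mount, which is common on Linux but not guaranteed, and the class and method names are hypothetical. If the assumed directory is not available, the code falls back to the default temporary directory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class MemoryBackedTemp {
    // Assumption: /dev/shm is a tmpfs mount, as on many Linux systems. Adjust for your platform.
    static final Path ASSUMED_RAM_DIR = Paths.get("/dev/shm");

    /** Returns preferred if it is a writable directory, otherwise the default temp directory. */
    static Path chooseTempDir(Path preferred) {
        if (Files.isDirectory(preferred) && Files.isWritable(preferred)) {
            return preferred;
        }
        return Paths.get(System.getProperty("java.io.tmpdir"));
    }

    /** Creates a temporary file in dir and writes text to it. */
    static Path writeScratch(Path dir, String text) throws IOException {
        Path tmp = Files.createTempFile(dir, "scratch", ".tmp");
        Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        Path dir = chooseTempDir(ASSUMED_RAM_DIR);
        Path tmp = writeScratch(dir, "intermediate data");
        try {
            System.out.println("Wrote " + Files.size(tmp) + " bytes under " + dir);
        } finally {
            Files.delete(tmp); // contents would be lost on reboot anyway, so clean up early
        }
    }
}
```

Only data that can be regenerated belongs in such a directory, since nothing written there survives a reboot or crash.
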
For multiuser applications, it is usually more efficient for the system to map shared files directly into memory, as a particular file then takes up just one memory location rather than being duplicated in each process.

The actual creation of memory-mapped filesystems is completely system-dependent, and there is no guarantee that it is available on any particular system (though most modern operating systems do support this feature). On Unix systems, the administrator needs to look at the documentation of the mount command and its subsections on cachefs and tmpfs. Under Windows, you should find details by looking at the documentation on setting up a RAM disk: this is a portion of memory mapped to a logical disk drive.

In a similar way, there are products available that precache shared libraries (DLLs) and even executables in memory. This usually means only that an application starts quicker or loads the shared library quicker, and so may not be much help in speeding up a running system (for example, Norton SpeedStart caches DLLs and device drivers in memory on Windows systems).

But you can apply the technique of memory-mapping filesystems directly and quite usefully for applications in which processes are frequently started. Copy the Java distribution and all class files (all JDK, application, and third-party class files) onto a memory-mapped filesystem and ensure that all executions and classloads take place from that filesystem. Since everything (executables, shared libraries, class files, resources, etc.) is already in memory, the startup time is much faster. Because it is only the startup and classloading time that is affected, this technique gives only a small boost for applications that do not frequently start processes, but can be usefully applied if startup time is a problem.
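
The copying step could be done with any file-copy tool; as one possible sketch, the hypothetical helper below copies a directory tree (for example, a directory of class files or JARs) into a target directory that is assumed to live on a memory-backed filesystem. Creating and mounting that filesystem is system-dependent and is not shown; the class name and command-line arguments are assumptions for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StageOnRamDisk {
    /** Copies the tree rooted at source into target, preserving relative paths. */
    static void copyTree(Path source, Path target) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(source)) {
            // Files.walk visits a directory before its contents,
            // so parent directories are created before the files inside them.
            paths = walk.collect(Collectors.toList());
        }
        for (Path p : paths) {
            Path dest = target.resolve(source.relativize(p).toString());
            if (Files.isDirectory(p)) {
                Files.createDirectories(dest);
            } else {
                Files.copy(p, dest, StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: directory holding class files or JARs (on disk)
        // args[1]: directory assumed to be on a memory-backed filesystem
        copyTree(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

After staging, processes would be started with their classpath pointing at the copy rather than at the on-disk original, and the copy must be repeated after every reboot.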

14.1.4 Disk Fragmentation