
9.8 Error Handling

The previous sections of this chapter were all about how basic operations get done. This one is about what to do when things go wrong.

It is desirable to give the application a chance to recover

The sad fact is that programs often ask the system to do something it cannot do. Sometimes it is the program’s fault. For example, a program should verify that a value is not zero before dividing by this value. But what should the system do when the program does not check, and the hardware flags an exception? Other times it is not the program’s fault. For example, when one process tries to communicate with another, it cannot be held responsible for misbehavior of the other process. After all, if it knew everything about the other process, it wouldn’t have to communicate in the first place.

In either case, the simple way out is to kill the process that cannot proceed. However, this is a rather extreme measure. A better solution would be to punt the problem back to the application, in case it has the capability to recover.

System calls report errors via return values

Handling an error depends on when it happens. The easy case is if it happens when the operating system is working on behalf of the process, that is, during the execution of a system call. This is quite common. For example, the program might pass the system some illegal argument, like an invalid file descriptor (recall that in Unix a file descriptor is an index into the process’s file descriptors table, which points to an entry that was allocated when a file was opened. The indexes of entries that have not been allocated, numbers larger than the table size, and negative numbers are all invalid). Alternatively, the arguments may be fine technically, but still the requested action is impossible. For example, a request to open a file may fail because the named file does not exist, or because the open files table is full and there would be no place to store the required information about the file had it been opened.
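The two kinds of failure described above (an impossible request and an invalid file descriptor) can be observed from a program by checking what the system call returns. The following is a minimal sketch in C, assuming a POSIX system where a failed call returns -1 and records the cause in the standard variable errno. The function names try_open_missing and try_write_bad_fd are hypothetical, chosen only for this illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: try to open a file for reading.
   Returns 0 on success, or the errno value reported by the failed call. */
int try_open_missing(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        int err = errno;  /* save errno before another call can overwrite it */
        fprintf(stderr, "open(%s) failed: %s\n", path, strerror(err));
        return err;
    }
    close(fd);
    return 0;
}

/* Hypothetical helper: write through a descriptor that was never allocated.
   Negative descriptors are always invalid, so the call is expected to fail.
   Returns 0 on success, or the errno value reported by the failed call. */
int try_write_bad_fd(void) {
    char byte = 'x';
    ssize_t n = write(-1, &byte, 1);
    if (n == -1) {
        int err = errno;
        fprintf(stderr, "write(-1) failed: %s\n", strerror(err));
        return err;
    }
    return 0;
}
```

On a typical Unix system, calling try_open_missing with the name of a file that does not exist returns ENOENT, and try_write_bad_fd returns EBADF. A program that ignored the -1 and went on to use the descriptor would run into trouble on its next operation.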
Exercise 142 What are possible reasons for failure of the fork system call? How about write? And close?

When a system call cannot perform the requested action, it is said to fail. This is typically indicated to the calling program by means of the system call’s return value. For example, in Unix most system calls return -1 upon failure, and 0 or some non-negative number upon successful completion. It is up to the program to check the return value and act accordingly. If it does not, its future actions will probably run into trouble, because they are based on the unfounded assumption that the system call did what it was asked to do.

Exceptions require some asynchronous channel

The more difficult case is problems that occur when the application code is running, e.g. division by zero or issuing of an illegal instruction. In this case the operating system is notified of the problem, but there is no obvious mechanism to convey the information about the error condition to the application.

Unix uses signals

In Unix, this channel is signals. A signal is analogous to an interrupt in software. There are a few dozen pre-defined signals, including floating-point exception (e.g. division by zero), illegal instruction, and segmentation violation (an attempt to access an invalid address). An application may register handlers for these signals if it so wishes. When an exception occurs, the operating system sends a signal to the application. This means that before returning to running the application, the handler for the signal will be called. If no handler was registered, some default action will be taken instead. In the really problematic cases, such as those mentioned above, the default action is to kill the process.

Exercise 143 Where should the information regarding the delivery of signals to an application be stored?

Once the signalling mechanism exists, it can also be used for other asynchronous events, not only for hardware exceptions.
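Registering a handler and having it called on delivery of a signal can be sketched in C as follows, assuming a POSIX system with sigaction. The names record_signal, install_handler, and deliver_fpe are hypothetical. Note one deliberate simplification: the sketch uses raise to simulate delivery of the floating-point exception signal, because an actual integer division by zero is undefined behavior in C, and returning normally from a handler after a real hardware fault is not safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Written by the handler. A volatile sig_atomic_t is the type the C
   standard guarantees can be safely assigned from a signal handler. */
static volatile sig_atomic_t last_signal = 0;

/* Hypothetical handler: just record which signal arrived. */
static void record_signal(int signo) {
    last_signal = signo;
}

/* Register record_signal as the handler for signo.
   Returns 0 on success and -1 on failure, like sigaction itself. */
int install_handler(int signo) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = record_signal;
    sigemptyset(&sa.sa_mask);   /* block no additional signals in the handler */
    return sigaction(signo, &sa, NULL);
}

/* Simulate delivery of SIGFPE to this process.
   Returns the signal number seen by the handler, or -1 on error. */
int deliver_fpe(void) {
    last_signal = 0;
    if (install_handler(SIGFPE) == -1)
        return -1;
    if (raise(SIGFPE) != 0)     /* raise returns nonzero on failure */
        return -1;
    return last_signal;         /* the handler ran before raise returned */
}
```

With the handler installed, deliver_fpe returns SIGFPE. Without the call to install_handler, the default action for SIGFPE would apply and the process would be killed.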
Examples of asynchronous events that are delivered as signals include
• The user sends an interrupt from the keyboard.
• A timer goes off.
• A child process terminates.

Processes can also send signals to each other, using the kill system call.

A problem with signals is what to do if the signalled process is blocked in a system call. For example, it may be waiting for input from the terminal, and this may take a very long time. The solution is to abort the system call and deliver the signal.

Mach uses messages to a special port

Another example is provided by the Mach operating system. In this system, processes (called tasks in the Mach terminology) are multithreaded, and communicate by sending messages to ports belonging to other tasks. In addition to the ports created at run time for such communication, each task has a pre-defined port on which it can receive error notifications. A task is supposed to create a thread that blocks trying to receive messages from this port. If and when an error condition occurs, the operating system sends a message with the details to this port, and the waiting thread receives it.

These mechanisms are used to implement language-level constructs, such as try/catch in Java

An example of the use of these mechanisms is the implementation of constructs such as the try/catch of Java, which expose exceptions at the language level. This construct includes two pieces of code: the normal code that should be executed (the “try” part), and the handler that should run if exceptions are caught (the “catch” part). To implement this on a Unix system, the catch part is turned into a handler for the signal representing the exception in question.

Exercise 144 How would you implement a program that has several different instances of catching the same type of exception?

Chapter 10 SMPs and Multicore

Operating systems were developed in a single-processor environment in the 1970s. Now most servers and many desktops are SMPs. The near future is chip multiprocessors, possibly heterogeneous.

10.1 Operating Systems for SMPs