Streams: Unix Pipes, FIFOs, and Sockets

while disregarding communication issues. However, it is only applicable on sets of computers that run the same system or at least the same runtime environment. Exercise 151 Can any C function be used by RPC?

12.2.2 Message Passing

RPC organized communication into structured and predefined pairs: send a set of arguments, and receive a return value. Message passing allows more flexibility by allowing arbitrary interaction patterns. For example, you can send multiple times, without waiting for a reply. Messages are chunks of data On the other hand, message passing retains the partitioning of the data into “chunks” — the messages. Sending and receiving are discrete operations, on one message. This allows for special features, such as dropping a message and proceeding directly to the next one. It is also the basis for collective operations, in which a whole set of processes participate e.g. broadcast, where one process sends information to a whole set of other processes. These features make message passing attractive for communication among the processes of parallel applications. It is not used much in networked settings.

12.2.3 Streams: Unix Pipes, FIFOs, and Sockets

Streams just pass the data piped through them in sequential, FIFO order. The dis- tinction from message passing is that they do not maintain the structure in which the data was created. This idea enjoys widespread popularity, and has been embodied in several mechanisms. Streams are similar to files One of the reasons for the popularity of streams is that their use is so similar to the use of sequential files. You can write to them, and the data gets accumulated in the order you wrote it. You can read from them, and always get the next data element after where you dropped off last time. Exercise 152 Make a list of all the attributes of files. Which would you expect to de- scribe streams as well? 201 Pipes are FIFO files with special semantics Once the objective is defined as inter-process communication, special types of files can be created. A good example is Unix pipes. A pipe is a special file with FIFO semantics: it can be written sequentially by one process and read sequentially by another. There is no option to seek to the middle, as is possible with regular files. In addition, the operating system provides some special related services, such as • If the process writing to the pipe is much faster than the process reading from it, data will accumulate unnecessarily. If this happens, the operating system blocks the writing process to slow it down. • If a process tries to read from an empty pipe, it is blocked rather then getting an EOF. • If a process tries to write to a pipe that no process can read because the read end was closed, the process gets a signal. The same happens to a process that tries to read from a pipe that no process can write. Exercise 153 How can these features be implemented efficiently? The problem with pipes is that they are unnamed: they are created by the pipe system call, and can then be shared with other processes created by a fork. They cannot be shared by unrelated processes. This gap is filled by FIFOs, which are es- sentially named pipes. To read more: See the man page for pipe. 202 Using Pipes to String Processes Together Section 3.1.4 described how Unix processes are created by the fork and exec system calls. The reason for the separation between these two calls is the desire to be able to ma- nipulate the file descriptors, and string the processes to each other with pipes. This is somewhat tricky, so we describe it in some detail. By default, a Unix process has two open files predefined: standard input stdin and stan- dard output stdout. These may be accessed via file descriptors 0 and 1, respectively. keyboard screen process stdin stdout 1 Again by default, both are connected to the user’s terminal. The process can read what the user types by reading from stdin, that is, from file descriptor 0 — just like reading from any other open file. It can display stuff on the screen by writing to stdout, i.e. to file descriptor 1. Using pipes, it is possible to string processes to each other, with the standard output of one connected directly to the standard input of the next. The idea is that the program does not have to know if it is running alone or as part of a pipe. It reads input from stdin, processes it, and writes output to stdout. The connections are handled before the program starts, that is, before the exec system call. To connect two processes by a pipe, the first process calls the pipe system call. This creates a channel with an input side and an output side. Both are available to the calling process, as file descriptors 3 and 4 file descriptor 2 is the predefined standard error, stderr. keyboard screen pipe stdout stdin 4 3 1 The process then forks, resulting in two processes that share their stdin, stdout, and the pipe. 203 screen keyboard pipe parent child 1 1 4 3 3 4 This is because open files are inherited across a fork, as explained on page 143. Exercise 154 What happens if the two processes actually read from their shared stdin, and write to their shared stdout? Hint: recall the three tables used to access open files in Unix. To create the desires connection, the first parent process now closes its original stdout, i.e. its connection to the screen. Using the dup system call, it then duplicates the output side of the pipe from its original file descriptor 3 to the stdout file descriptor 1, and closes the original 3. It also closes the input side of the pipe 4. pipe 3 4 1 3 4 1 3 4 1 The second child process closes its original stdin 0, and replaces it by the input side of the pipe 4. It also closes the output side of the pipe 3. As a result the first process has the keyboard as stdin and the pipe as stdout, and the second has the pipe as stdin and the screen as stdout. This completes the desired connection. If either process now does an exec, the program will not know the difference. screen keyboard pipe parent child 1 1 Exercise 155 What is the consequence if the child process does not close its output side of the pipe file desceriptor 3? 204 In the original Unix system, pipes were implemented using the file system infras- tructure, and in particular, inodes. In modern systems they are implemented as a pair of sockets, which are obtained using the socketpair system call. Sockets support client-server computing A much more general mechanism for creating stream connections among unrelated processes, even on different systems, is provided by sockets. Among other things, sockets can support any of a wide variety of communication protocols. The most com- monly used are the internet protocols, TCPIP which provides a reliable stream of bytes, and UDPIP which provides unreliable datagrams. The way to set up a connection is asymmetric, and operates as follows. The first process, which is called the server, first creates a socket. This means that the operat- ing system allocates a data structure to keep all the information about this communi- cation channel, and gives the process a file descriptor to serve as a handle to it. The server then binds this socket to a port number. In effect, this gives the socket a name that can be used by clients: the machine’s IP internet address together with the port are used to identify this socket you can think of the IP address as a street address, and the port number as a door or suite number at that address. Common services have predefined port numbers that are well-known to all Section 12.4. For other distributed applications, the port number is typically selected by the programmer. To complete the setup, the server then listens to this socket. This notifies the system that communication requests are expected. The other process, called the client, also creates a socket. It then connects this socket to the server’s socket by giving the server’s IP address and port. This means that the servers address and port are listed in the local socket data structure, and that a message regarding the communication request is sent to the server. The system on the server side finds the server’s socket by searching according to the port number. To actually establish a connection, the server has to accept it. This creates a new socket that is accessed via a new file descriptor. This new socket on the server’s side and the client’s socket are now connected, and data can be read and written to them in both directions. The asymmetry of setting up the connection is forgotten, and both processes now have equal standing. However, the original socket created by the server still exists, and it may accept additional connections from other clients. This is motivated below in Section 12.3. Exercise 156 What can go wrong in this process? What happens if it does? To read more: See the man pages for socket, bind, connect, listen, and accept.

12.2.4 Shared Memory