Error Correction Implementation Issues

And there are some useful applications too

Several useful applications have also become part of the TCP/IP protocol suite. These include

• SMTP (Simple Mail Transfer Protocol), with features like mailing lists and mail forwarding. It just provides the transfer service to a local mail service that takes care of things like editing the messages.
• FTP (File Transfer Protocol), used to transfer files across machines. This uses one TCP connection for control messages (such as user information and which files are desired), and another for the actual transfer of each file.
• Telnet, which provides a remote login facility for simple terminals.

To read more: The TCP/IP suite is described in Stallings [7] section 13.2, and very briefly also in Silberschatz and Galvin [5] section 15.6. It is naturally described in more detail in textbooks on computer communications, such as Tanenbaum [9] and Stallings [6]. Finally, extensive coverage is provided in books specific to these protocols, such as Comer [2] or Stevens [8].

13.2 Implementation Issues

So far we have discussed what has to be done for successful communication, and how to organize the different activities. But how do you actually perform these functions? In this section we describe three major issues: error correction, flow control, and routing.

13.2.1 Error Correction

The need for error correction is the result of the well-known maxim “shit happens”. A packet sent from one computer to another may not arrive at all (e.g. because a buffer overflowed or some computer en route crashed), or it may arrive corrupted (e.g. because an alpha particle passed through the memory bank in which it was stored). We would like such occurrences not to affect our important communications, be they love letters or bank transfers.

If we know an error occurred, we can request a re-send

The simplest way to deal with transmission errors is to keep the data around until it is acknowledged. If the data arrives intact, the recipient sends an acknowledgment (ack), and the sender can discard the copy. If the data arrives with an error, the recipient sends a negative acknowledgment (nack), and the sender sends it again. This can be repeated any number of times, until the data finally arrives safely at its destination. This scheme is called automatic repeat request (ARQ).

But how does the destination know if the data is valid? After all, it is just a sequence of 0’s and 1’s. The answer is that the data must be encoded in a way that allows corrupted values to be identified. A simple scheme is to add a parity bit at the end. The parity bit is the binary sum (the sum modulo 2) of all the other bits. Thus if the message includes an odd number of 1’s, the parity bit will be 1, and if it includes an even number of 1’s, it will be 0. After adding the parity bit, it is guaranteed that the total number of 1’s is even. A receiver that receives a message with an odd number of 1’s can therefore be sure that it was corrupted.

Exercise 166 What if two bits were corrupted? Or three bits? In other words, when does parity identify a problem and when does it miss?

Or we can send redundant data to begin with

An alternative approach is to encode the data in such a way that we can not only detect that an error has occurred, but also correct it.
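Going back to the parity bit for a moment, the scheme can be sketched in a few lines of Python. This is only an illustration, assuming a message is represented as a list of 0/1 integers; the names `add_parity`, `parity_ok`, and `flip` are made up for this example.

```python
def add_parity(bits):
    """Append an even-parity bit: the binary sum (XOR) of all the data bits."""
    parity = 0
    for b in bits:
        parity ^= b
    return bits + [parity]


def parity_ok(codeword):
    """A codeword is accepted only if its total number of 1's is even."""
    return sum(codeword) % 2 == 0


def flip(codeword, *positions):
    """Return a copy of the codeword with the bits at the given indices inverted."""
    damaged = list(codeword)
    for p in positions:
        damaged[p] ^= 1
    return damaged


sent = add_parity([1, 0, 1, 1])    # three 1's, so the parity bit is 1
print(sent)                        # [1, 0, 1, 1, 1]
print(parity_ok(sent))             # True
print(parity_ok(flip(sent, 2)))    # False: a single flipped bit is detected
```

Calling `flip` with more than one position is a convenient way to experiment with Exercise 166.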
With an error-correcting code, data does not have to be resent. This scheme is called forward error correction (FEC).

In order to be able to correct corrupted data, we need a higher level of redundancy. For example, we can add 4 parity bits to each sequence of 11 data bits. The parity bits are inserted in locations numbered by powers of two (counting from 2^0 = 1). Each of these parity bits is then computed as the binary sum of the data bits whose position includes the parity bit position in its binary representation. For example, parity bit 4 will be the parity of data bits 5–7 and 12–15.

[Figure: bit positions 1 to 15 with their binary representations 0001 to 1111; positions 1, 2, 4, and 8 hold the parity bits, and the remaining positions hold the 11 data bits.]

If any bit is corrupted, this will show up in a subset of the parity bits. The positions of the affected parity bits then provide the binary representation of the location of the corrupted data bit.

[Figure: the bit at position 1010 is wrong, so parity bits 8 and 2 show wrong parity.]

Note that this also works if one of the parity bits is the one that is corrupted.

Exercise 167 Another way to provide error correction is to arrange n^2 data bits in a square, and compute the parity of each row and column. A corrupted bit then causes two parity bits to be wrong, and their intersection identifies the corrupt bit. How does this compare with the above scheme?

The main reason for using FEC is in situations where ARQ is unwieldy. For example, FEC is better suited for broadcasts and multicasts, because it avoids the need to collect acknowledgments from all the recipients in order to verify that the data has arrived safely at all of them.

Sophisticated codes provide better coverage

The examples above are simple, but can only handle one corrupted bit. For example, if two data bits are corrupted, this may cancel out in one parity calculation but not in another, leading to a pattern of wrong parity bits that misidentifies a corrupted data bit.
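The scheme with 4 parity bits and 11 data bits described above can be tried out with the following sketch. It assumes a codeword is stored as a Python list indexed by bit position 1 to 15 (index 0 is unused); the function names `encode`, `failed_checks`, and `correct` are chosen for this illustration only.

```python
PARITY_POSITIONS = (1, 2, 4, 8)    # the powers of two


def encode(data_bits):
    """Place 11 data bits in the non-parity positions, then fill in the parity bits."""
    assert len(data_bits) == 11
    word = [0] * 16                # index 0 unused; positions 1..15
    data = iter(data_bits)
    for pos in range(1, 16):
        if pos not in PARITY_POSITIONS:
            word[pos] = next(data)
    for p in PARITY_POSITIONS:
        # Parity bit p covers every position whose binary representation includes p.
        for pos in range(1, 16):
            if pos != p and pos & p:
                word[p] ^= word[pos]
    return word


def failed_checks(word):
    """Add up the positions of the parity bits whose check fails (0 if none fail)."""
    total = 0
    for p in PARITY_POSITIONS:
        check = 0
        for pos in range(1, 16):
            if pos & p:
                check ^= word[pos]
        if check:
            total += p
    return total


def correct(word):
    """Flip the bit that the failed parity checks point at, if any."""
    fixed = list(word)
    where = failed_checks(fixed)
    if where:
        fixed[where] ^= 1
    return fixed


word = encode([1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0])

damaged = list(word)
damaged[10] ^= 1                   # corrupt position 10 (binary 1010)
print(failed_checks(damaged))      # 10: parity bits 8 and 2 are wrong
print(correct(damaged) == word)    # True

damaged[5] ^= 1                    # restart from a second copy with two errors
damaged[10] ^= 1                   # undo the first error so only 5 and 6 differ
damaged[6] ^= 1
print(failed_checks(damaged))      # 3: the checks point at an innocent bit
print(correct(damaged) == word)    # False
```

The last two lines show the limitation mentioned above: with positions 5 and 6 both corrupted, parity bit 4 sees the two errors cancel out, and the remaining failed checks point at position 3, which was never damaged.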
[Figure: two corrupted bits (errors) produce a misleading pattern of wrong parity bits.]

The most commonly used error detection code is the cyclic redundancy check (CRC). This can be explained as follows. Consider the data bits that encode your message as a number. Now tack on a few more bits, so that if you divide the resulting number by a predefined divisor, there will be no remainder. The receiver does just that; if there is no remainder, it is assumed that the message is valid, otherwise it was corrupted.

To read more: There are many books on coding theory and the properties of the resulting codes, e.g. Arazi [1]. CRC is described in Stallings [6].

Timeouts are needed to cope with lost data

When using an error detection code without correction capabilities, the sender must retain the sent message until it is acknowledged, because a resend may be needed. But what if the recipient does not receive it at all? In this case it will send neither an ack nor a nack, and the original sender may wait indefinitely.

The solution to this problem is to use timeouts. The sender waits for an ack for a limited time, and if the ack does not arrive, it assumes the packet was lost and retransmits it.

But if the packet was only delayed, two copies may ultimately arrive

The transport protocol must deal with such situations by numbering the packets, and discarding duplicate ones.

Exercise 168 What is a good value for the timeout interval?
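The pieces of this section (ARQ, a CRC for detection, timeouts, and packet numbering) can be put together in a small simulation. The sketch below is an illustration under several assumptions, not a real protocol: the frame layout (a 1-byte sequence number, the payload, then a 4-byte checksum) is invented here, the CRC is the CRC-32 provided by Python's standard `zlib.crc32`, and the network is replaced by a scripted list saying what happens to each transmission attempt. A timeout is modeled simply as "no reply came back", so no real clock is involved.

```python
import zlib


def make_packet(seq, payload):
    """Frame = sequence number + payload, followed by a CRC-32 over both."""
    body = bytes([seq]) + payload
    return body + zlib.crc32(body).to_bytes(4, "big")


def check_packet(frame):
    """Return (seq, payload) if the CRC matches, or None if the frame is corrupted."""
    body, crc = frame[:-4], frame[-4:]
    if zlib.crc32(body).to_bytes(4, "big") != crc:
        return None
    return body[0], body[1:]


class Receiver:
    def __init__(self):
        self.expected = 0          # sequence number of the next new packet
        self.delivered = []

    def receive(self, frame):
        parsed = check_packet(frame)
        if parsed is None:
            return "nack"
        seq, payload = parsed
        if seq == self.expected:
            self.delivered.append(payload)
            self.expected += 1
        # Otherwise it is a duplicate of an earlier packet: discard it,
        # but acknowledge it again so the sender can move on.
        return "ack"


def send_all(messages, receiver, channel_script):
    """Send one packet at a time, retransmitting on nack or timeout."""
    fates = iter(channel_script)   # hypothetical stand-in for the network
    attempts = 0
    for seq, payload in enumerate(messages):
        frame = make_packet(seq, payload)    # retained until acknowledged
        while True:
            attempts += 1
            fate = next(fates, "ok")
            if fate == "lost":
                continue                     # nothing arrives: timeout, resend
            if fate == "corrupt":
                damaged = bytearray(frame)
                damaged[1] ^= 0x01           # flip one bit in transit
                reply = receiver.receive(bytes(damaged))
            else:
                reply = receiver.receive(frame)
            if fate == "ack_lost":
                continue                     # reply never arrives: timeout, resend
            if reply == "ack":
                break                        # the retained copy can be discarded
    return attempts


receiver = Receiver()
script = ["corrupt", "lost", "ack_lost", "ok", "ok"]
attempts = send_all([b"love letter", b"bank transfer"], receiver, script)
print(attempts)                    # 5
print(receiver.delivered)          # [b'love letter', b'bank transfer']
```

The third attempt is the interesting one: the packet arrives intact but its ack is lost, so the sender times out and sends a second copy. The sequence number lets the receiver recognize the duplicate and deliver each message only once. A 1-byte sequence number limits this sketch to 256 messages.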

13.2.2 Buffering and Flow Control