The Role of Authentication

digest algorithm has an internal accumulator that operates on all data fed into the engine. As each byte of data is fed into the engine, it is combined with the data in the accumulator to produce a new value, which is stored in the accumulator to provide input for the next step (see Figure 7-3).

Figure 7-3. The message digest accumulator

As a simple example, consider a message digest algorithm based on the exclusive-or of all the input bytes. The accumulator starts with a value of 0. If the string "O Time, thou must untangle this" is passed to the engine, the engine considers the bytes one at a time. [1] The first byte, O, has a value of 0x4f, which is XORed with the accumulator to give a value of 0x4f. The next byte, a space (0x20), is XORed with the accumulator to produce a value of 0x6f. And so on, such that the final value of the accumulator is 0x67.

[1] Don't be confused by the fact that we're dealing in bytes here when the characters in a Java string are two bytes long. The data passed to the message digest engine is treated as arbitrary binary data; it doesn't matter if the data was originally ASCII (that is, byte-oriented) data, a Java character string, or a binary class file.

There are a few differences between this example and a real message digest algorithm. First, standard algorithms typically operate on 4- or 8-byte quantities, so the bytes that are fed into the engine are first grouped into ints or longs, with padding added if the input data is not a multiple of the desired quantity. Second, they produce a digest that is usually 64 or 128 bits long rather than a single byte; this final digest may be the value left in the accumulator or it may be the value left in the accumulator subjected to additional operations.

The difference in the output size is one of the crucial differences. At best, the example we just walked through could produce 256 different digests.
Any two given inputs have a 1 in 256 chance of producing the same digest, which is clearly not a sufficient guarantee that a digest represents a given set of data. In the example above, the string "O Time, thou must untangle this" produced a digest of 0x67, but so does the string "g". An algorithm that produces a 64-bit digest, on the other hand, can produce over 18 quintillion unique digests, so the odds that two data sequences will produce the same digest are very remote indeed.

This brings us to another of the crucial differences: a successful message digest algorithm must provide an assurance that it is computationally infeasible to find two messages that produce the same digest. This ensures that a new set of data cannot be substituted for the original data so that each produces the same digest.

Note also that a message digest in itself is not a secure entity. A digest is often provided with the data it represents; the recipient of the data then recalculates the digest to make sure that the data was not tampered with. But nothing in this scenario prevents someone from modifying both the original data and the digest, since both are transmitted and since the calculation of the digest is a well-known operation requiring no key. However, secure message digests can be produced when you introduce a key into the mix; these types of digests are called message authentication codes.
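The exclusive-or digest walked through above can be sketched in a few lines of Java. This is only an illustration of the accumulator idea, not a real algorithm; the class and method names are hypothetical. For contrast, the sketch also runs both strings through the standard MessageDigest engine. SHA-256 is an assumption chosen for illustration, since the text does not name an algorithm.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical class name; a toy illustration of the accumulator idea only.
final class XorDigestSketch {

    /** Toy digest: exclusive-or of every input byte, with the accumulator starting at 0. */
    static int xorDigest(byte[] data) {
        int accumulator = 0;
        for (byte b : data) {
            // Mask to treat the byte as an unsigned value in the range 0..255.
            accumulator ^= (b & 0xff);
        }
        return accumulator;
    }

    /** A real digest for contrast. SHA-256 is an assumption, not an algorithm named in the text. */
    static byte[] sha256(byte[] data) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance("SHA-256").digest(data);
    }

    public static void main(String[] args) throws Exception {
        byte[] quote = "O Time, thou must untangle this".getBytes(StandardCharsets.US_ASCII);
        byte[] g = "g".getBytes(StandardCharsets.US_ASCII);

        // Both inputs collapse to the same one-byte digest, 0x67.
        System.out.printf("xor(quote) = 0x%02x%n", xorDigest(quote));
        System.out.printf("xor(g)     = 0x%02x%n", xorDigest(g));

        // The 256-bit digests of the same two inputs are compared here.
        boolean same = MessageDigest.isEqual(sha256(quote), sha256(g));
        System.out.println("SHA-256 digests equal? " + same);
    }
}
```

Note that neither digest involves a key, so anyone who can alter the data can also recompute the digest; that is the gap message authentication codes are meant to close.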

7.3.3 Digital Signatures

The primary engine in the security package (at least as far as authentication goes) is the digital signature engine. Like a real signature, a digital signature is presumed to provide a unique identification of an entity (that is, an individual or an organization). Like a real signature, a digital signature can be forged, although it's much harder to forge a digital signature than a real signature. [2] Forging a digital signature requires access to the private key of the entity whose signature is being forged; this is yet another reason why it is important to keep your private keys private. Like a real signature, a digital signature can be smudged so that it is no longer recognizable. And because they're based on key certificates, digital signatures have other properties, such as the fact that they can expire.

[2] On the other hand, a forged digital signature is undetectable, unlike a forged real signature.

Digital signatures rely on two things: the ability to generate a message digest and the ability to encrypt that digest. The entire process is shown in Figure 7-4.

Figure 7-4. Generating a digital signature

The process is as follows:

1. A message digest is calculated that represents the input data.
2. The digest is then encrypted with the private key.

Note that encryption is performed on the digest and not on the data itself. In order to present this signature to another entity, you must present the original data with it: the signature is only an encrypted message digest, and, as we mentioned earlier, you cannot reconstruct the input data from the message digest.

Verifying a digital signature follows the same path: the message digest of the original data must be calculated. The signed digest is decrypted with the public key, and if the decrypted digest matches the calculated digest, the signature is valid.
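The sign-and-verify path described above can be sketched with the standard java.security.Signature engine. This is a minimal sketch under stated assumptions: the algorithm name SHA256withRSA, the 2048-bit RSA key pair generated on the spot, and the class and method names are all chosen for illustration and are not taken from the text. The engine computes the digest and applies the key internally, so the two steps do not appear as separate calls.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

// Hypothetical class name; algorithm and key size are assumptions for illustration.
final class SignatureSketch {

    /** Digests the data and signs that digest with the private key. */
    static byte[] sign(PrivateKey key, byte[] data) throws GeneralSecurityException {
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(key);
        signer.update(data);
        return signer.sign();
    }

    /** Recomputes the digest of the data and checks it against the signature using the public key. */
    static boolean verify(PublicKey key, byte[] data, byte[] signature) throws GeneralSecurityException {
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(key);
        verifier.update(data);
        return verifier.verify(signature);
    }

    public static void main(String[] args) throws Exception {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        KeyPair pair = generator.generateKeyPair();

        byte[] data = "O Time, thou must untangle this".getBytes(StandardCharsets.UTF_8);
        byte[] signature = sign(pair.getPrivate(), data);

        // The original data must travel with the signature; the verifier needs both.
        System.out.println("original data verifies: " + verify(pair.getPublic(), data, signature));

        // Changing the data changes its digest, so the same signature no longer matches.
        byte[] altered = "O Time, thou must untangle that".getBytes(StandardCharsets.UTF_8);
        System.out.println("altered data verifies:  " + verify(pair.getPublic(), altered, signature));
    }
}
```

The data passed to verify is in the clear, which matches the point that a signature authenticates the data without hiding it.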
Strictly speaking, the operations performed on the digests are not necessarily encryption and decryption; most digital signature algorithms cannot be used for encryption of arbitrary data. The symmetry of the operation is the same, however.

Nothing prevents the signed data from being intercepted, so the data that accompanies the digital signature cannot be sensitive data; the digital signature only verifies that the message came from a particular entity and that the message was not altered in transit, but it does not actually protect that message from being read by anyone with access to it. If the data is altered, it will not produce the same message digest, which in turn will not produce the same digital signature. And it's computationally infeasible to change the data, generate a new digest of that data,