Overview of Hashes and MACs

162

Chapter 7. Hashes and MACs

In the previous chapter, we looked at the most fundamental part of OpenSSLs cryptography library, symmetric ciphers. In this chapter, we look at the API for cryptographic hashing algorithms, also commonly called message digest algorithms or cryptographic one-way hash functions. Additionally, we will examine OpenSSLs interface to message authentication codes MACs, also known as keyed hashes.

7.1 Overview of Hashes and MACs

We introduced the basic concepts behind cryptographic hashes and MACs in Chapter 1 . Here, we describe the fundamental properties of these cryptographic primitives that you should understand before integrating them into your applications. As mentioned in Chapter 6 , we provide only the minimum background information that you need to understand as a developer. If you need more background, or would like to see under the hood of any of the algorithms we discuss, refer to a general-purpose cryptography reference, such as Bruce Schneiers Applied Cryptography. Cryptographic one-way hashes take arbitrary binary data as an input and produce a fixed-size binary string as an output, called the hash value or the message digest. Passing the same message through a single hash function always yields the same result. There are several important properties exhibited by cryptographic message digests. First, the digest value should contain no information that could be used to determine the original input. For that to be true, a one-bit change in the input data should change many bits in the digest value on average, half. Second, it should be extremely difficult to construct a second message that yields the same resulting hash value. Third, it should also be difficult to find any two messages that yield the same hash value. The most conservative characterization of the security afforded by a given hash function is measured by how hard it is to find two arbitrary messages that yield the same hash value. Generally, the security of a well-respected hash function that has a digest size of n bits should be about as secure as a well-respected symmetric cipher with half as many bits. For example, SHA1, which has a 160-bit digest size, is about as resistant to attack as RC5 with 80-bit keys. Some uses of these algorithms give security equal to their bit length thats just a good, conservative metric. People frequently use cryptographic hash functions that they believe provide security, but that dont really provide very much. For example, it is common to release software along with an MD5 digest of the software package MD5 is a common cryptographic hash function. The intention is to use the digest as a checksum. The person downloading software should also obtain the MD5 digest, and then calculate the digest himself on the downloaded software. If the two digests match, it would indicate that the downloaded software is unaltered. Unfortunately, there are easy ways to attack this scheme. Suppose an attacker has maliciously modified a copy of the distribution of software package X, resulting in package Y. If the attacker can break onto the server and replace X with Y, then certainly, the checksum MD5X is also easily replaceable with the checksum MD5Y. When a user validates the downloaded checksum, he will be none the wiser. Even without access to the actual server, attackers could replace X with Y and MD5X with MD5Y as they traverse the network. The fundamental problem is that nothing in this scenario is secret. A much better solution for this kind of scenario is a digital signature, which anyone can verify, but only someone with the correct private key can generate see Chapter 8 . 163 Hash functions by themselves arent often good for security purposes. The major exception is password storage. In such a situation, passwords are not stored, only hashes of passwords are stored, usually combined with a known salt value to avoid dictionary attacks in cases where the password database is stolen. When a user tries to log in, the hash of the entered password is compared against the one stored in the password database. If its the correct password, the hashes will be identical. Even this scenario works only if a trusted data source collects the authentication information through a trusted data path. If a client computes the hash and sends it in the clear over a network, an attacker can capture the hash and replay the information later to log in. Worse, if the server computes the hash, but the client sent the password in the clear over a network, an attacker could capture the transmission of the password. One common use of hashes is as primitives in other cryptographic operations. For example, digital signature schemes generally work by hashing the input, then encrypting the hash with a private key. Doing so is generally far more efficient than performing public key encryption on a large input. Another frequent use is to remove any trace of patterns in data such as cryptographic keys. For example, you should hash your key material to make an RC4 key, instead of using the key material directly. Another use of hashes is to ensure the message integrity of encrypted data, by encrypting the hash of a message along with the message itself. This is a primitive version of a message authentication code MAC. A MAC generally uses a regular hash function as a primitive. The MAC algorithm produces a hash value from the data to protect a secret key. Only people with the correct secret key can forge the hash value, and only people with the secret key can authenticate the hash value. One good thing about MACs is that they can provide integrity, even in the absence of encryption. Another good thing is that the best MACs tend to have provable security properties under reasonable assumptions about the strength of the hash algorithm in use. The algorithm we just described as an example doesnt have either of these advantages. Like other cryptographic primitives, you should avoid creating your own MAC algorithm, even if it seems easy. There are good algorithms with provable properties, such as HMAC, which is currently the only MAC provided by OpenSSL. Why take the risk?

7.2 Hashing with the EVP API