Hashing with the EVP API

Hash functions by themselves aren't often good for security purposes. The major exception is password storage. In such a situation, passwords are not stored; only hashes of passwords are stored, usually combined with a known salt value to avoid dictionary attacks in cases where the password database is stolen. When a user tries to log in, the hash of the entered password is compared against the one stored in the password database. If it's the correct password, the hashes will be identical.

Even this scenario works only if a trusted data source collects the authentication information through a trusted data path. If a client computes the hash and sends it in the clear over a network, an attacker can capture the hash and replay it later to log in. Worse, if the server computes the hash but the client sends the password in the clear over a network, an attacker can capture the password itself.

One common use of hashes is as primitives in other cryptographic operations. For example, digital signature schemes generally work by hashing the input, then encrypting the hash with a private key. Doing so is generally far more efficient than performing public key encryption on a large input. Another frequent use is to remove any trace of patterns in data such as cryptographic keys. For example, you should hash your key material to make an RC4 key, instead of using the key material directly.

Another use of hashes is to ensure the message integrity of encrypted data, by encrypting the hash of a message along with the message itself. This is a primitive version of a message authentication code (MAC). A MAC generally uses a regular hash function as a primitive. The MAC algorithm produces a hash value from the data to protect and a secret key. Only people with the correct secret key can forge the hash value, and only people with the secret key can authenticate the hash value.
One good thing about MACs is that they can provide integrity, even in the absence of encryption. Another good thing is that the best MACs tend to have provable security properties under reasonable assumptions about the strength of the hash algorithm in use. The algorithm we just described as an example doesn't have either of these advantages. As with other cryptographic primitives, you should avoid creating your own MAC algorithm, even if it seems easy. There are good algorithms with provable properties, such as HMAC, which is currently the only MAC provided by OpenSSL. Why take the risk?

7.2 Hashing with the EVP API

Much like with symmetric cryptography, OpenSSL's cryptographic library has an API for each hash algorithm it provides, but the EVP API provides a single, simple interface to these algorithms. Just as with symmetric key encryption, there are three calls: one for initialization, one for updating (adding text to the context), and one for finalization, which yields the message digest.

At initialization time, you must specify the algorithm you wish to use. Currently, OpenSSL provides six different digest algorithms: MDC2, MD2, MD4, MD5, SHA1, and RIPEMD-160. The first four have digest sizes that are only 128 bits. We recommend that you avoid them except to support legacy applications. In addition, there are known attacks on MD4, and it is widely considered to be a broken algorithm. SHA1 is more common than RIPEMD-160 and is faster, but the latter is believed to have a slightly better security margin.

For each digest, at least one function returns an instance of the algorithm. Alternatively, you can look up algorithms by name by first calling OpenSSL_add_all_digests and then calling EVP_get_digestbyname with an appropriate identifier. In both cases, a data structure of type EVP_MD represents the algorithm. Table 7-1 shows all of the message digest algorithms supported by OpenSSL, including the EVP call to get a reference to the algorithm, the digest name for lookup purposes, and the size of the resulting digests.

Table 7-1. Message digests and the EVP interface

Hash algorithm   EVP call for getting EVP_MD   String for lookup   Digest length (in bits)
MD2              EVP_md2                       md2                 128
MD4              EVP_md4                       md4                 128
MD5              EVP_md5                       md5                 128
MDC2             EVP_mdc2                      mdc2                128
SHA1             EVP_sha1, EVP_dss1            sha1, dss1          160
RIPEMD-160       EVP_ripemd160                 ripemd              160

The MDC2 algorithm is a construction for turning a block cipher into a hash function. It is usually used only with DES, and OpenSSL hardcodes this binding.
The SHA1 and DSS1 algorithms are essentially the same; the only difference is that in a digital signature, SHA1 is used with RSA keys and DSS1 is used with DSA keys.

The EVP_DigestInit function initializes a context object, and it must be called before a hash can be computed.

    void EVP_DigestInit(EVP_MD_CTX *ctx, const EVP_MD *type);

ctx
    The context object to be initialized.
type
    The message digest algorithm to use. This value is often obtained using one of the EVP calls listed in Table 7-1.

The OpenSSL engine package and the forthcoming Version 0.9.7 have a preferred version of this call named EVP_DigestInit_ex, which adds a third argument that is a pointer to an engine object. Passing in NULL will get you the default software implementation. Its return value is also different; it is an integer indicating success (nonzero) or failure (zero). Be sure to check the return value from the function, because it can fail.

The EVP_DigestUpdate function is used to include data in the computation of the hash. It may be called repeatedly to pass more data than will fit in a single buffer. For example, if you're computing the hash of a large amount of data, it's reasonable to break the data into smaller pieces so that you needn't load an entire file into memory.

    void EVP_DigestUpdate(EVP_MD_CTX *ctx, const void *buf, unsigned int len);

ctx
    The context object that is being used to compute a hash.
buf
    A buffer containing the data to be included in the computation of the hash.
len
    The number of bytes contained in the buffer.

Once all data to be considered for the hash has been passed to EVP_DigestUpdate, the resulting hash value can be retrieved using EVP_DigestFinal.

    void EVP_DigestFinal(EVP_MD_CTX *ctx, unsigned char *hash, unsigned int *len);

ctx
    The context object that is being used to compute a hash.
hash
    A buffer into which the hash value will be placed. This buffer should always be at least EVP_MAX_MD_SIZE bytes in size.
len
    A pointer to an unsigned integer that will receive the number of bytes placed into the hash value buffer. This argument may be specified as NULL if you don't want or need to know this value.

Be sure to use EVP_DigestFinal_ex with EVP_DigestInit_ex, even though the arguments are no different. Once you've called EVP_DigestFinal or EVP_DigestFinal_ex, the context that you were using is no longer valid and must be re-initialized using EVP_DigestInit or EVP_DigestInit_ex before it can be used again. Also, be aware that the EVP_DigestFinal_ex function can fail.

Example 7-1 shows a function that performs message digests as an all-in-one operation. You pass in the name of an algorithm to use, a buffer of data to hash, an unsigned integer that denotes how much data to take from the buffer, and a pointer to an unsigned integer. The integer pointed to by the final argument gets the length of the resulting digest placed in it, and may be NULL if you're not interested in its value. The digest value is allocated internal to the function and is returned as a result; the caller is responsible for freeing it. If there is any sort of error, such as the specified algorithm not being found, the function returns NULL.

Example 7-1. Computing a hash value using the EVP API

    #include <stdlib.h>
    #include <openssl/evp.h>

    unsigned char *simple_digest(const char *alg, const void *buf,
                                 unsigned int len, unsigned int *olen)
    {
        const EVP_MD  *m;
        EVP_MD_CTX    ctx;
        unsigned char *ret;

        OpenSSL_add_all_digests();
        if (!(m = EVP_get_digestbyname(alg)))
            return NULL;
        if (!(ret = (unsigned char *)malloc(EVP_MAX_MD_SIZE)))
            return NULL;
        EVP_DigestInit(&ctx, m);
        EVP_DigestUpdate(&ctx, buf, len);
        EVP_DigestFinal(&ctx, ret, olen);
        return ret;
    }

Message digests cannot be printed directly because they are binary data. Traditionally, when there's a need to print a message digest, it is printed in hexadecimal. Example 7-2 shows a function that uses printf to print an arbitrary binary string in hexadecimal, one byte at a time. It takes two parameters: the string, and an integer specifying the length of the string.

Example 7-2. Printing the hexadecimal representation of a hash value

    #include <stdio.h>

    void print_hex(const unsigned char *bs, unsigned int n)
    {
        unsigned int i;

        for (i = 0; i < n; i++)
            printf("%02x", bs[i]);
    }

The code in Example 7-3 implements a simple sha1 command that is similar to the md5 command found on many systems. It gives SHA1 digests of files passed in on the command line. If the command is called with no arguments, then the standard input is hashed. Note that you can get the same results by running the command openssl sha1 (see Chapter 2).

Example 7-3. Computing SHA1 hashes of files

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <openssl/evp.h>

    /* simple_digest and print_hex are defined in Examples 7-1 and 7-2 */

    #define READSIZE 1024

    /* Returns NULL on error, file contents on success */
    unsigned char *read_file(FILE *f, unsigned int *len)
    {
        unsigned char *buf = NULL, *tmp;
        unsigned char inbuf[READSIZE];
        size_t        tot = 0, n;

        for (;;)
        {
            n = fread(inbuf, sizeof(unsigned char), READSIZE, f);
            if (n > 0 || !buf)
            {
                /* Grow the buffer. At least one byte is always allocated
                 * so that an empty file is not mistaken for an error. */
                if (!(tmp = (unsigned char *)realloc(buf, tot + n + 1)))
                    break;
                buf = tmp;
                memcpy(&buf[tot], inbuf, n);
                tot += n;
            }
            if (n < READSIZE)
            {
                /* A short read means end of file or a read error */
                if (ferror(f))
                    break;
                *len = (unsigned int)tot;
                return buf;
            }
        }
        free(buf);
        return NULL;
    }

    /* Returns NULL on error, the digest on success */
    unsigned char *process_file(FILE *f, unsigned int *olen)
    {
        unsigned int  filelen;
        unsigned char *ret, *contents = read_file(f, &filelen);

        if (!contents)
            return NULL;
        ret = simple_digest("sha1", contents, filelen, olen);
        free(contents);
        return ret;
    }

    /* Returns 0 on failure, 1 on success */
    int process_stdin(void)
    {
        unsigned int  olen;
        unsigned char *digest = process_file(stdin, &olen);

        if (!digest)
            return 0;
        print_hex(digest, olen);
        printf("\n");
        free(digest);
        return 1;
    }

    /* Returns 0 on failure, 1 on success */
    int process_file_by_name(char *fname)
    {
        FILE          *f = fopen(fname, "rb");
        unsigned int  olen;
        unsigned char *digest;

        if (!f)
        {
            perror(fname);
            return 0;
        }
        digest = process_file(f, &olen);
        if (!digest)
        {
            perror(fname);
            fclose(f);
            return 0;
        }
        fclose(f);
        printf("SHA1(%s)= ", fname);
        print_hex(digest, olen);
        printf("\n");
        free(digest);
        return 1;
    }

    int main(int argc, char *argv[])
    {
        int i;

        if (argc == 1)
        {
            if (!process_stdin())
                perror("stdin");
        }
        else
        {
            for (i = 1; i < argc; i++)
                process_file_by_name(argv[i]);
        }
        return 0;
    }

7.3 Using MACs