Hash functions by themselves aren't often good for security purposes. The major exception is password storage. In such a situation, passwords are not stored; only hashes of passwords are stored, usually combined with a known salt value to avoid dictionary attacks in cases where the password database is stolen. When a user tries to log in, the hash of the entered password is compared against the one stored in the password database. If it's the correct password, the hashes will be identical.
Even this scenario works only if a trusted data source collects the authentication information through a trusted data path. If a client computes the hash and sends it in the clear over a network,
an attacker can capture the hash and replay the information later to log in. Worse, if the server computes the hash, but the client sent the password in the clear over a network, an attacker could
capture the transmission of the password.
One common use of hashes is as primitives in other cryptographic operations. For example, digital signature schemes generally work by hashing the input, then encrypting the hash with a private
key. Doing so is generally far more efficient than performing public key encryption on a large input. Another frequent use is to remove any trace of patterns in data such as cryptographic keys.
For example, you should hash your key material to make an RC4 key, instead of using the key material directly.
Another use of hashes is to ensure the message integrity of encrypted data, by encrypting the hash of a message along with the message itself. This is a primitive version of a message authentication code (MAC). A MAC generally uses a regular hash function as a primitive. The MAC algorithm produces a hash value from the data to protect and a secret key. Only people with the correct secret key can compute a valid hash value, and only people with the secret key can verify the hash value.
One good thing about MACs is that they can provide integrity, even in the absence of encryption. Another good thing is that the best MACs tend to have provable security properties under reasonable assumptions about the strength of the hash algorithm in use. The algorithm we just described as an example doesn't have either of these advantages.
Like other cryptographic primitives, you should avoid creating your own MAC algorithm, even if it seems easy. There are good algorithms with provable properties, such as HMAC, which is
currently the only MAC provided by OpenSSL. Why take the risk?
7.2 Hashing with the EVP API
Much like with symmetric cryptography, OpenSSL's cryptographic library has an API for each hash algorithm it provides, but the EVP API provides a single, simple interface to these algorithms.
Just as with symmetric key encryption, there are three calls: one for initialization, one for updating (adding text to the context), and one for finalization, which yields the message digest.
At initialization time, you must specify the algorithm you wish to use. Currently, OpenSSL provides six different digest algorithms: MDC2, MD2, MD4, MD5, SHA1, and RIPEMD-160.
The first four have digest sizes that are only 128 bits. We recommend that you avoid them except to support legacy applications. In addition, there are known attacks on MD4, and it is widely
considered to be a broken algorithm. SHA1 is more common than RIPEMD-160 and is faster, but the latter is believed to have a slightly better security margin.
For each digest, at least one function returns an instance of the algorithm. To look up an algorithm by name instead, call OpenSSL_add_all_digests and then EVP_get_digestbyname, passing in an appropriate identifier. Either way, a data structure of type EVP_MD represents the algorithm. Table 7-1 shows all of the message digest algorithms supported by OpenSSL, including the EVP call to get a reference to the algorithm, the digest name for lookup purposes, and the size of the resulting digests.
Table 7-1. Message digests and the EVP interface

Hash algorithm   EVP call for getting EVP_MD   String for lookup   Digest length in bits
MD2              EVP_md2                       md2                 128
MD4              EVP_md4                       md4                 128
MD5              EVP_md5                       md5                 128
MDC2             EVP_mdc2                      mdc2                128
SHA1             EVP_sha1, EVP_dss1            sha1, dss1          160
RIPEMD-160       EVP_ripemd160                 ripemd              160

The MDC2 algorithm is a construction for turning a block cipher into a hash function. It is usually used only with DES, and OpenSSL hardcodes this binding. The SHA1 and DSS1 algorithms are essentially the same; the only difference is that in a digital signature, SHA1 is used with RSA keys and DSS1 is used with DSA keys.
The EVP_DigestInit function initializes a context object, and it must be called before a hash can be computed.

void EVP_DigestInit(EVP_MD_CTX *ctx, const EVP_MD *type);
ctx
    The context object to be initialized.
type
    The message digest algorithm to use. This value is often obtained using one of the EVP calls listed in Table 7-1.

The OpenSSL engine package and the forthcoming Version 0.9.7 have a preferred version of this call named EVP_DigestInit_ex, which adds a third argument that is a pointer to an engine object. Passing in NULL will get you the default software implementation. Its return value is also different; it is an integer indicating success (nonzero) or failure (zero). Be sure to check the return value from the function, because it can fail.

The EVP_DigestUpdate function is used to include data in the computation of the hash. It may be called repeatedly to pass more data than will fit in a single buffer. For example, if you're computing the hash of a large amount of data, it's reasonable to break the data into smaller chunks so that you needn't load an entire file into memory.
void EVP_DigestUpdate(EVP_MD_CTX *ctx, const void *buf, unsigned int len);
ctx
    The context object that is being used to compute a hash.
buf
    A buffer containing the data to be included in the computation of the hash.
len
    The number of bytes contained in the buffer.
Once all data to be considered for the hash has been passed to EVP_DigestUpdate, the resulting hash value can be retrieved using EVP_DigestFinal.

void EVP_DigestFinal(EVP_MD_CTX *ctx, unsigned char *hash, unsigned int *len);
ctx
    The context object that is being used to compute a hash.
hash
    A buffer into which the hash value will be placed. This buffer should always be at least EVP_MAX_MD_SIZE bytes in size.
len
    A pointer to an integer that will receive the number of bytes placed into the hash value buffer. This argument may be specified as NULL if you don't want or need to know this value.
Be sure to use EVP_DigestFinal_ex with EVP_DigestInit_ex, even though the arguments are no different. Once you've called EVP_DigestFinal or EVP_DigestFinal_ex, the context that you were using is no longer valid and must be re-initialized using EVP_DigestInit or EVP_DigestInit_ex before it can be used again. Also, be aware that the EVP_DigestFinal_ex function can fail.

Example 7-1 shows a function that performs message digests as an all-in-one operation. You pass in the name of an algorithm to use, a buffer of data to hash, an unsigned integer that denotes how much data to take from the buffer, and a pointer to an integer. The integer pointed to by the final argument receives the length of the resulting digest, and the pointer may be NULL if you're not interested in its value. The digest value is allocated inside the function and returned as its result. If there is any sort of error, such as the specified algorithm not being found, the function returns NULL.
Example 7-1. Computing a hash value using the EVP API
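#include <stdlib.h>
#include <openssl/evp.h>

unsigned char *simple_digest(char *alg, char *buf, unsigned int len,
                             unsigned int *olen)
{
    const EVP_MD *m;
    EVP_MD_CTX ctx;   /* stack allocation works in 0.9.x; the type is opaque since 1.1.0 */
    unsigned char *ret;

    OpenSSL_add_all_digests();
    if (!(m = EVP_get_digestbyname(alg)))
        return NULL;
    if (!(ret = (unsigned char *)malloc(EVP_MAX_MD_SIZE)))
        return NULL;
    EVP_DigestInit(&ctx, m);
    EVP_DigestUpdate(&ctx, buf, len);
    EVP_DigestFinal(&ctx, ret, olen);   /* olen may be NULL */
    return ret;
}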
Message digests cannot be printed directly because they are binary data. Traditionally, when there's a need to print a message digest, it is printed in hexadecimal. Example 7-2 shows a function that uses printf to print an arbitrary binary string in hexadecimal, one byte at a time. It takes two parameters: the string and an integer specifying the length of the string.
Example 7-2. Printing the hexadecimal representation of a hash value
#include <stdio.h>
void print_hex(unsigned char *bs, unsigned int n) {
    unsigned int i;
    for (i = 0; i < n; i++)
        printf("%02x", bs[i]);  /* two lowercase hex digits per byte */
}
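When the hexadecimal form needs to be stored or compared rather than printed, the same conversion can write into a string buffer. This is a sketch with a hypothetical function name, to_hex. It produces the same lowercase digits as print_hex, using a lookup table instead of printf.

```c
#include <stddef.h>

/* Hypothetical helper: writes the lowercase hex form of n bytes from bs
 * into out, followed by a terminating NUL. out must have room for
 * 2 * n + 1 characters. */
void to_hex(const unsigned char *bs, size_t n, char *out)
{
    static const char digits[] = "0123456789abcdef";
    size_t i;

    for (i = 0; i < n; i++) {
        out[2 * i]     = digits[bs[i] >> 4];    /* high nibble */
        out[2 * i + 1] = digits[bs[i] & 0x0f];  /* low nibble */
    }
    out[2 * n] = '\0';
}
```

For the bytes 0xde, 0xad, 0x01 the buffer receives "dead01". A 20-byte SHA1 digest therefore needs a 41-character buffer.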
The code in Example 7-3 implements a simple sha1 command that is similar to the md5 command found on many systems. It prints SHA1 digests of files passed in on the command line. If the command is called with no arguments, then the standard input is hashed. Note that you can get the same results by running the command openssl sha1 (see Chapter 2).
Example 7-3. Computing SHA1 hashes of files
/* Uses simple_digest (Example 7-1) and print_hex (Example 7-2). */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <openssl/evp.h>

#define READSIZE 1024

/* Returns NULL on error (or for an empty file), file contents on success */
unsigned char *read_file(FILE *f, int *len)
{
    unsigned char *buf = NULL, *tmp;
    unsigned char inbuf[READSIZE];
    int tot = 0, n;

    for (;;) {
        n = fread(inbuf, sizeof(unsigned char), READSIZE, f);
        if (n > 0) {
            /* grow the buffer and append the chunk just read */
            if (!(tmp = (unsigned char *)realloc(buf, tot + n))) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            memcpy(buf + tot, inbuf, n);
            tot += n;
        }
        if (n < READSIZE) {
            /* a short read means end of file or a read error */
            if (ferror(f) || !buf) {
                free(buf);
                return NULL;
            }
            *len = tot;
            return buf;
        }
    }
}

/* Returns NULL on error, the digest on success */
unsigned char *process_file(FILE *f, unsigned int *olen)
{
    int filelen;
    unsigned char *ret, *contents = read_file(f, &filelen);

    if (!contents)
        return NULL;
    ret = simple_digest("sha1", (char *)contents, filelen, olen);
    free(contents);
    return ret;
}

/* Returns 0 on failure, 1 on success */
int process_stdin(void)
{
    unsigned int olen;
    unsigned char *digest = process_file(stdin, &olen);

    if (!digest)
        return 0;
    print_hex(digest, olen);
    printf("\n");
    free(digest);
    return 1;
}

/* Returns 0 on failure, 1 on success */
int process_file_by_name(char *fname)
{
    FILE *f = fopen(fname, "rb");
    unsigned int olen;
    unsigned char *digest;

    if (!f) {
        perror(fname);
        return 0;
    }
    digest = process_file(f, &olen);
    fclose(f);
    if (!digest) {
        perror(fname);
        return 0;
    }
    printf("SHA1(%s)= ", fname);
    print_hex(digest, olen);
    printf("\n");
    free(digest);
    return 1;
}

int main(int argc, char *argv[])
{
    int i;

    if (argc == 1) {
        if (!process_stdin())
            perror("stdin");
    } else {
        for (i = 1; i < argc; i++)
            process_file_by_name(argv[i]);
    }
    return 0;
}
7.3 Using MACs