Audio compression

11.6.1 Audio compression

An audio file has very little byte-to-byte redundancy, so general-purpose compression schemes such as ZIP or Huffman coding have little effect on the file size; however, if you open an audio file in a wave editor, such as Goldwave (www.goldwave.com), and look closely at the data, you will notice a definite pattern, as shown in Figure 11.3.

The screenshot is of a recording of a girl’s voice. It shows only a fraction of a second (0.026 sec) of audio, but contains more than 2 KB of data. To achieve CD-quality audio, a computer must output 44,100 samples per second, at 2 bytes per sample, for each channel.
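The figures quoted above can be checked with a little arithmetic. The sketch below assumes 16-bit (2-byte) samples at 44,100 samples per second; the function name `clip_bytes` and its parameters are illustrative only.

```python
def clip_bytes(duration_s, sample_rate=44_100, bytes_per_sample=2, channels=1):
    """Approximate size in bytes of uncompressed PCM audio of the given length."""
    return round(duration_s * sample_rate * bytes_per_sample * channels)


# The 0.026-second mono clip from the screenshot: just over 2 KB.
print(clip_bytes(0.026))            # 2293

# One second of stereo audio at the same rate and sample size.
print(clip_bytes(1.0, channels=2))  # 176400
```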

Figure 11.3 Typical speech waveform.

Audio is made up of waves. Each sample in a wave is usually very similar to the preceding sample. The rate of change constantly increases and decreases in harmonic fashion. Therefore, if the change between samples is recorded instead of the value of each sample, the amount of data is reduced.
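The idea of recording changes rather than absolute values can be sketched in a few lines. This is a minimal, lossless illustration; the function names and the sample values are made up for the example and do not come from any particular codec.

```python
def delta_encode(samples):
    """Return the first sample followed by the differences between neighbors."""
    if not samples:
        return []
    deltas = [samples[0]]
    for previous, current in zip(samples, samples[1:]):
        deltas.append(current - previous)
    return deltas


def delta_decode(deltas):
    """Rebuild the original samples by accumulating the differences."""
    samples = []
    total = 0
    for delta in deltas:
        total += delta
        samples.append(total)
    return samples


samples = [1000, 1012, 1030, 1051, 1069, 1080, 1083, 1079]
deltas = delta_encode(samples)
print(deltas)  # [1000, 12, 18, 21, 18, 11, 3, -4]
assert delta_decode(deltas) == samples
```

After the first value, every difference in this example fits comfortably in a single signed byte, whereas each original sample needs two bytes. The saving comes from storing the small differences in fewer bits.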

In differential pulse code modulation (DPCM), only the change between samples is stored. In its simplest one-bit form, also known as delta modulation, an increase in sample value is represented by the bit 1 and a decrease is represented by the bit 0. During decompression, the sample value is incremented or decremented by 1, depending on the value of the current bit in the bitstream. This causes two detrimental effects: slope overload and granular noise. Slope overload occurs when the input signal changes substantially from sample to sample, so the output cannot keep up, resulting in a muffling effect in the decompressed signal. Granular noise occurs when the input signal does not change at all, in which case the output oscillates around the true value, causing either a hiss or a high-pitched shrill in the audio.
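The one-bit scheme described above, and both of its side effects, can be reproduced with a short sketch. It assumes that the encoder compares each input sample with the value the decoder will have reconstructed so far, which is the usual arrangement; the names `dm_encode` and `dm_decode` and the `start` and `step` parameters are illustrative.

```python
def dm_encode(samples, start=0, step=1):
    """Emit 1 if the sample is above the running estimate, otherwise 0."""
    bits = []
    estimate = start
    for sample in samples:
        bit = 1 if sample > estimate else 0
        bits.append(bit)
        # Track exactly what the decoder will reconstruct.
        estimate += step if bit else -step
    return bits


def dm_decode(bits, start=0, step=1):
    """Step the output up or down by a fixed amount for each bit."""
    output = []
    estimate = start
    for bit in bits:
        estimate += step if bit else -step
        output.append(estimate)
    return output


# Granular noise: a flat input decodes to a value that wobbles around it.
flat = [5, 5, 5, 5, 5, 5]
print(dm_decode(dm_encode(flat, start=5), start=5))  # [4, 5, 4, 5, 4, 5]

# Slope overload: a steep input outruns the fixed step of 1.
steep = [0, 10, 20, 30]
print(dm_decode(dm_encode(steep)))  # [-1, 0, 1, 2]
```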

To counteract the muffling effect, adaptive DPCM, or ADPCM, can be used. Here, during the decompression process, the amount by which the sample value changes doubles with each consecutive sample in the same direction. This process more closely mimics the harmonic action of the sine wave, but can produce a phase undershoot, which is a rasping, sharp noise.

You may never have heard of ADPCM (although it is used heavily in telecommunications and especially on international telephone lines), but MP3 has become almost a household name. There is a good reason for this: MP3 provides excellent compression ratios and acceptable sound quality, and it can be decompressed in real time by any PC and many portable digital music devices. MP3 achieves this quality by recognizing how humans perceive sounds at an acoustic level. Our ears are designed to hear harmonic sounds, and standard lossy compression algorithms cause unnatural attenuations that are not pleasant to listen to. By filtering at the harmonic level, rather than at the byte level, a much more natural sound is produced.
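Returning to the adaptive scheme described for ADPCM, the step-doubling idea can be sketched as follows. This is a deliberately simplified illustration and not a real telephony codec: it assumes one bit per sample, a step that doubles whenever the current bit repeats the previous one, and a step that resets to 1 when the direction changes. The names `adm_encode` and `adm_decode` are hypothetical.

```python
def _next_step(step, bit, previous_bit):
    """Double the step while the direction repeats; otherwise reset to 1."""
    return step * 2 if bit == previous_bit else 1


def adm_encode(samples, start=0):
    """One-bit encoder whose step size adapts to runs of identical bits."""
    bits = []
    estimate, step, previous_bit = start, 1, None
    for sample in samples:
        bit = 1 if sample > estimate else 0
        step = _next_step(step, bit, previous_bit)
        estimate += step if bit else -step
        bits.append(bit)
        previous_bit = bit
    return bits


def adm_decode(bits, start=0):
    """Mirror of the encoder: rebuild the estimate from the bitstream alone."""
    output = []
    estimate, step, previous_bit = start, 1, None
    for bit in bits:
        step = _next_step(step, bit, previous_bit)
        estimate += step if bit else -step
        output.append(estimate)
        previous_bit = bit
    return output


steep = [0, 10, 20, 30, 40]
print(adm_decode(adm_encode(steep)))  # [-1, 0, 2, 6, 14]
```

With a fixed step of 1, the same input would only reach 3 by the fifth sample, so the adaptive step catches up with a steep slope much faster. The cost is that a large step can carry the output past the true value when the input levels off, which is the kind of artifact the text describes.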


Recognizing a pattern of cyclic values in a stream of data, which may be combined with thousands of other cyclic patterns, is not an easy task for a computer; however, a rather gifted mathematician developed a formula to produce a mathematical representation of the harmonics contained in a block of data.

In Figure 11.3, a pattern of waves can be seen in the audio; these are made up primarily of a 300 Hz wave with a 2,400 Hz harmonic. To extract this information from what appears to the computer as a block of ones and zeros, you need to use a mathematical formula known as the Discrete Cosine Transform (DCT):

$$S(u) = \frac{C(u)}{2} \sum_{x=0}^{7} f(x) \cos\left[\frac{(2x+1)\pi u}{16}\right]$$

C(u) is equal to 0.7071 (the reciprocal of root 2) when u is zero; C(u) is one for all u not equal to zero.

When the above formula is applied to an array of eight numbers (i.e., f(0) to f(7)), the resultant array in S is a representation of the data in terms of frequencies. It is possible to represent any sequence of eight integers in terms of the values of the peaks and troughs of a wave composed of up to eight harmonics. When compressing audio data, most of the higher harmonics are zero or near zero and can be discarded; thus the array in S can be compressed more efficiently using traditional lossless encoding.
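The following sketch applies the transform to a block of eight samples. It assumes the usual orthonormal eight-point form, S(u) = C(u)/2 · Σ f(x) cos((2x + 1)πu / 16) for x from 0 to 7, with C(0) = 1/√2 and C(u) = 1 otherwise. The function names `dct8` and `idct8` and the sample block are illustrative.

```python
import math


def _c(u):
    """Scaling factor: 1/sqrt(2) for the zero-frequency term, otherwise 1."""
    return 1 / math.sqrt(2) if u == 0 else 1.0


def dct8(f):
    """Forward eight-point DCT of a block of eight samples."""
    if len(f) != 8:
        raise ValueError("dct8 expects exactly eight samples")
    return [
        _c(u) / 2 * sum(f[x] * math.cos((2 * x + 1) * math.pi * u / 16)
                        for x in range(8))
        for u in range(8)
    ]


def idct8(S):
    """Inverse transform: rebuild eight samples from eight coefficients."""
    if len(S) != 8:
        raise ValueError("idct8 expects exactly eight coefficients")
    return [
        sum(_c(u) / 2 * S[u] * math.cos((2 * x + 1) * math.pi * u / 16)
            for u in range(8))
        for x in range(8)
    ]


block = [10, 12, 14, 16, 18, 20, 22, 24]   # a smooth, slowly rising block
coefficients = dct8(block)

# Lossy step: discard the four highest harmonics before reconstruction.
truncated = coefficients[:4] + [0.0] * 4
approximation = idct8(truncated)
print([round(value, 2) for value in approximation])
```

For a smooth block such as this one, most of the energy sits in the first few coefficients, so zeroing the upper half changes each reconstructed sample only slightly. The run of zeros is what makes the array easier for a lossless encoder to compress.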

The most famous audio compression format that uses DCT is the ubiquitous MP3. This technology is not easy to implement, and although the format is published as an ISO/IEC standard (MPEG-1 Audio Layer III), encoders have been subject to patent licensing. You can use third-party DLLs and applications such as Lame, BladeEnc, and L3enc to perform the compression. Alternatively, you can license the technology from Fraunhofer.