The FFT Algorithm
4.7.1 The FFT Algorithm
Consider the discrete Fourier transform (DFT) of a function whose values $f(x_k)$ are known at the grid points $x_k = 2\pi k/N$, $k = 0:N-1$. According to Theorem 4.6.4 the coefficients are given by
\[ c_j = \frac{1}{N} \sum_{k=0}^{N-1} f(x_k)\, e^{-i j x_k}, \quad j = 0:N-1. \tag{4.7.1} \]
Expressions of the form (4.7.1) also have to be evaluated in discrete approximations to the Fourier transform.
Setting $\omega_N = e^{-2\pi i/N}$, this becomes
\[ c_j = \frac{1}{N} \sum_{k=0}^{N-1} \omega_N^{jk} f(x_k), \quad j = 0:N-1, \tag{4.7.2} \]
where $\omega_N$ is an $N$th root of unity, $(\omega_N)^N = 1$. It seems from (4.7.2) that computing the discrete Fourier coefficients would require $N^2$ complex multiplications and additions. As we shall see, only about $N \log_2 N$ complex multiplications and additions are required using an algorithm called the fast Fourier transform (FFT). The modern usage of the FFT started in 1965 with the publication of [78] by James W. Cooley of IBM Research and John W. Tukey, Princeton University.¹⁶² In many areas of application (digital signal processing, image processing, time-series analysis, to name a few), the FFT has caused a complete change of attitude toward what can be done using discrete Fourier methods. Without the FFT many modern devices such as cell phones, digital cameras, CT scanners, and DVDs would not be possible. Some applications considered in astronomy require FFTs of several gigapoints.
¹⁶² Tukey came up with the basic algorithm at a meeting of President Kennedy's Science Advisory Committee. One problem discussed at this meeting was that the ratification of a US–Soviet nuclear test ban depended on a fast method to detect nuclear tests by analyzing seismological time-series data.
Chapter 4. Interpolation and Approximation

In the following we will use the common convention not to scale the sum in (4.7.2) by $1/N$.
Definition 4.7.1.
The DFT of the vector $f \in \mathbb{C}^N$ is
\[ y = F_N f, \tag{4.7.3} \]
where $F_N \in \mathbb{C}^{N \times N}$ is the DFT matrix with elements
\[ (F_N)_{jk} = \omega_N^{jk}, \quad j, k = 0:N-1, \tag{4.7.4} \]
where $\omega_N = e^{-2\pi i/N}$.¹⁶³

From the definition it follows that the DFT matrix $F_N$ is a complex Vandermonde matrix. Since $\omega_N^{jk} = \omega_N^{kj}$, $F_N$ is symmetric. By Theorem 4.6.4
\[ \frac{1}{N} F_N^H F_N = I, \]
where $F_N^H$ is the complex conjugate transpose of $F_N$. Hence the matrix $\frac{1}{\sqrt{N}} F_N$ is a unitary matrix and the inverse transform can be written as
\[ f = \frac{1}{N} F_N^H y. \]
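As a minimal sketch of Definition 4.7.1, the following Python code forms the DFT matrix explicitly and applies the unscaled transform and its inverse. The function names `dft_matrix`, `dft`, and `inverse_dft` are illustrative choices, not taken from the text, and the approach costs $N^2$ operations, so it is only meant for checking small cases.

```python
import cmath

def dft_matrix(n):
    """Return F_n as nested lists, with entries omega**(j*k), omega = exp(-2*pi*i/n)."""
    omega = cmath.exp(-2j * cmath.pi / n)
    return [[omega ** (j * k) for k in range(n)] for j in range(n)]

def dft(f):
    """Unscaled DFT y = F_N f, computed as a plain matrix-vector product."""
    F = dft_matrix(len(f))
    return [sum(F_jk * f_k for F_jk, f_k in zip(row, f)) for row in F]

def inverse_dft(y):
    """Inverse transform f = (1/N) F_N^H y; entry (j, k) of F_N^H is conj(F[k][j])."""
    n = len(y)
    F = dft_matrix(n)
    return [sum(F[k][j].conjugate() * y[k] for k in range(n)) / n for j in range(n)]
```

Applying `inverse_dft(dft(f))` should reproduce `f` up to roundoff, which is one way to check the relation $F_N^H F_N = N I$ numerically.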
We now describe the central idea of the FFT algorithm, which is based on the divide and conquer strategy (see Sec. 1.2.3). Assume that $N = 2^p$ and set
\[ k = \begin{cases} 2k_1 & \text{if $k$ is even}, \\ 2k_1 + 1 & \text{if $k$ is odd}, \end{cases} \quad 0 \le k_1 \le m-1, \]
where $m = N/2 = 2^{p-1}$. Split the DFT sum into an even and an odd part:
\[ y_j = \sum_{k_1=0}^{m-1} (\omega_N^2)^{j k_1} f_{2k_1} + \omega_N^j \sum_{k_1=0}^{m-1} (\omega_N^2)^{j k_1} f_{2k_1+1}. \]
Let $\beta$ be the quotient and $j_1$ the remainder when $j$ is divided by $m$, i.e., $j = \beta m + j_1$. Then, since $\omega_N^N = 1$,
\[ (\omega_N^2)^{j k_1} = (\omega_N^N)^{\beta k_1} (\omega_N^2)^{j_1 k_1} = \omega_m^{j_1 k_1}. \]
Thus if, for $j_1 = 0:m-1$, we set
\[ \phi_{j_1} = \sum_{k_1=0}^{m-1} f_{2k_1}\, \omega_m^{j_1 k_1}, \qquad \psi_{j_1} = \sum_{k_1=0}^{m-1} f_{2k_1+1}\, \omega_m^{j_1 k_1}, \tag{4.7.5} \]
then $y_j = \phi_{j_1} + \omega_N^j \psi_{j_1}$. The two sums on the right are elements of the DFTs of length $N/2$ applied to the parts of $f$ with even and odd subscripts, respectively. The entire DFT of length $N$ is obtained by combining these two DFTs! Since $\omega_N^m = -1$, we have
\[ y_{j_1} = \phi_{j_1} + \omega_N^{j_1} \psi_{j_1}, \tag{4.7.6} \]
\[ y_{j_1+N/2} = \phi_{j_1} - \omega_N^{j_1} \psi_{j_1}, \quad j_1 = 0:N/2-1. \tag{4.7.7} \]
These expressions, noted already by Danielson and Lanczos [90], are often called butterfly relations because of the data flow pattern. Note that these can be performed in place, i.e., no extra vector storage is needed.

¹⁶³ Some authors set $\omega_N = e^{2\pi i/N}$. Which convention is used does not much affect the development.
The computation of $\phi_{j_1}$ and $\psi_{j_1}$ means that one does two Fourier transforms with $m = N/2$ terms instead of one with $N$ terms. If $N/2$ is even the same idea can be applied to these two Fourier transforms. One then gets four Fourier transforms, each of which has $N/4$ terms. If $N = 2^p$, this reduction can be continued recursively until we get $N$ DFTs with one term. But $F_1 = I$, the identity. A recursive MATLAB implementation of the FFT algorithm is given in Problem 4.7.2.
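The recursion described above can be sketched in a few lines of Python. This is not the MATLAB code of Problem 4.7.2 but an illustrative version, assuming the unscaled convention of Definition 4.7.1 and a length that is a power of 2; the name `fft_recursive` is hypothetical. It applies the butterfly relations (4.7.6) and (4.7.7) directly and, for clarity, recomputes each weight with a call to `cmath.exp` instead of using precomputed values.

```python
import cmath

def fft_recursive(f):
    """Unscaled DFT of a sequence whose length is a power of 2."""
    n = len(f)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    if n == 1:
        return list(f)               # F_1 = I
    m = n // 2
    phi = fft_recursive(f[0::2])     # DFT of the even-indexed part
    psi = fft_recursive(f[1::2])     # DFT of the odd-indexed part
    y = [0j] * n
    for j1 in range(m):
        t = cmath.exp(-2j * cmath.pi * j1 / n) * psi[j1]   # omega_N**j1 * psi_j1
        y[j1] = phi[j1] + t          # (4.7.6)
        y[j1 + m] = phi[j1] - t      # (4.7.7)
    return y
```

The slices `f[0::2]` and `f[1::2]` allocate new lists at every level, so this version trades the in-place property of the butterfly relations for brevity.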
Example 4.7.1.
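For $N = 2^2 = 4$, we have $\omega_4 = e^{-\pi i/2} = -i$, and the DFT matrix is
\[ F_4 = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -i & (-i)^2 & (-i)^3 \\ 1 & (-i)^2 & (-i)^4 & (-i)^6 \\ 1 & (-i)^3 & (-i)^6 & (-i)^9 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -i & -1 & i \\ 1 & -1 & 1 & -1 \\ 1 & i & -1 & -i \end{pmatrix}. \]
It is symmetric and its inverse is
\[ F_4^{-1} = \frac{1}{4} F_4^H = \frac{1}{4} \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & i & -1 & -i \\ 1 & -1 & 1 & -1 \\ 1 & -i & -1 & i \end{pmatrix}. \]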
The number of complex operations (one multiplication and one addition) required to compute $\{y_j\}$ from the butterfly relations when $\{\phi_{j_1}\}$ and $\{\psi_{j_1}\}$ have been computed is $2^p$, assuming that the powers of $\omega$ are precomputed and stored. Thus, if we denote by $q_p$ the total number of operations needed to compute the DFT when $N = 2^p$, we have
\[ q_p \le 2 q_{p-1} + 2^p, \quad p \ge 1. \]
Since $q_0 = 0$, it follows by induction that $q_p \le p \cdot 2^p = N \log_2 N$. Hence, when $N$ is a power of 2, the FFT solves the problem with at most $N \log_2 N$ operations. For example, when $N = 2^{20} = 1{,}048{,}576$ the FFT algorithm is theoretically a factor of 84,000 faster than the "conventional" $O(N^2)$ algorithm. On a 3 GHz laptop, a real FFT of this size takes about 0.1 second using MATLAB 6, whereas more than two hours would be required by the conventional algorithm!

The FFT not only uses fewer operations to evaluate the DFT, it also is more accurate. Whereas when using the conventional method the roundoff error is proportional to $N$, for the FFT algorithm it is proportional to $\log_2 N$.
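The induction behind the bound $q_p \le p \cdot 2^p$ can be checked by iterating the recurrence with equality. The short Python sketch below does this; the function name `fft_op_count` is an illustrative choice, and the count covers only the butterfly operations under the stated assumption that the powers of $\omega$ are precomputed.

```python
def fft_op_count(p):
    """Iterate q_s = 2*q_{s-1} + 2**s from q_0 = 0 (the equality case of the bound)."""
    q = 0
    for s in range(1, p + 1):
        q = 2 * q + 2 ** s
    return q
```

In the equality case the induction step reads $2(p-1)2^{p-1} + 2^p = p \cdot 2^p$, so `fft_op_count(p)` agrees with `p * 2**p` for every `p`.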
Example 4.7.2.
Let $N = 2^4 = 16$. Then the 16-point DFT (0:1:15) can be split into two 8-point DFTs (0:2:14) and (1:2:15), which can each be split into two 4-point DFTs. Repeating these splittings we finally get 16 one-point DFTs, which are the identity $F_1 = 1$. The structure of this FFT is illustrated below.

[Figure: tree of the successive even–odd splittings. Its bottom level lists the one-point DFTs in the following order.]
[0] [8] [4] [12] [2] [10] [6] [14] [1] [9] [5] [13] [3] [11] [7] [15]

In most implementations the explicit recursion is avoided. Instead the FFT algorithm is implemented in two stages:
• a reordering stage in which the data vector $f$ is permuted;
• a second stage in which first $N/2$ FFTs of length 2 are computed on adjacent elements, followed by $N/4$ transforms of length 4, etc., until the final result is obtained by merging two FFTs of length $N/2$.
We now consider each stage in turn. Each step of the recursion involves an even–odd permutation. In the first step the points with last binary digit equal to 0 are ordered first and those with last digit equal to 1 are ordered last. In the next step the two resulting subsequences of length $N/2$ are reordered according to the second binary digit, etc. It is not difficult to see that the combined effect of the reordering in stage 1 is a bit-reversal permutation of the data points. For $i = 0:N-1$, with $N = 2^t$, let the index $i$ have the binary expansion
\[ i = b_0 + b_1 \cdot 2 + \cdots + b_{t-1} \cdot 2^{t-1} \]
and set
\[ r(i) = b_{t-1} + \cdots + b_1 \cdot 2^{t-2} + b_0 \cdot 2^{t-1}. \]
That is, $r(i)$ is the index obtained by reversing the order of the binary digits. If $i < r(i)$, then exchange $f_i$ and $f_{r(i)}$. This reordering is illustrated for $N = 16$ below.
Decimal  Binary        Binary  Decimal
   0      0000    →     0000       0
   1      0001    →     1000       8
   2      0010    →     0100       4
   3      0011    →     1100      12
   4      0100    →     0010       2
   5      0101    →     1010      10
   6      0110    →     0110       6
   7      0111    →     1110      14
   8      1000    →     0001       1
   9      1001    →     1001       9
  10      1010    →     0101       5
  11      1011    →     1101      13
  12      1100    →     0011       3
  13      1101    →     1011      11
  14      1110    →     0111       7
  15      1111    →     1111      15

We denote the permutation matrix performing the bit-reversal ordering by $P_N$. Note that if an index is reversed twice we end up with the original index. This means that
\[ P_N^{-1} = P_N^T = P_N, \]
i.e., $P_N$ is symmetric. The permutation can be carried out "in place" by a sequence of pairwise interchanges or transpositions of the data points. For example, for $N = 16$ the pairs (1,8), (2,4), (3,12), (5,10), (7,14), and (11,13) are interchanged. The bit-reversal permutation can take a substantial fraction of the total time to do the FFT. Which implementation is best depends strongly on the computer architecture.
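The in-place reordering can be sketched as follows in Python. This is one straightforward way to do it, not a tuned implementation, and the names `bit_reverse` and `bit_reversal_permute` are illustrative. The second function returns the list of interchanged pairs so that the $N = 16$ case can be compared with the pairs listed above.

```python
def bit_reverse(i, t):
    """Return r(i): the t-bit index obtained by reversing the binary digits of i."""
    r = 0
    for _ in range(t):
        r = (r << 1) | (i & 1)   # append the lowest remaining bit of i to r
        i >>= 1
    return r

def bit_reversal_permute(f):
    """Permute the list f in place into bit-reversed order; return the swapped pairs."""
    n = len(f)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    t = n.bit_length() - 1       # n = 2**t
    swaps = []
    for i in range(n):
        r = bit_reverse(i, t)
        if i < r:                # swap each pair only once
            f[i], f[r] = f[r], f[i]
            swaps.append((i, r))
    return swaps
```

Computing $r(i)$ bit by bit costs $t$ steps per index; practical codes often use table lookups or incremental updates instead, a choice that depends on the architecture as noted above.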
We now consider the second stage of the FFT. The key observation to develop a matrix-oriented description of this stage is to note that the Fourier matrices $F_N$ after an odd–even permutation of the columns can be expressed as a $2 \times 2$ block matrix, where each block is either $F_{N/2}$ or a diagonal scaling of $F_{N/2}$.
Theorem 4.7.2 (Van Loan [366, Theorem 1.2.1]).
Let $\Pi_N^T$ be the permutation matrix which applied to a vector groups the even-indexed components first and the odd-indexed last.¹⁶⁴ If $N = 2m$, then
\[ F_N \Pi_N = \begin{pmatrix} F_m & \Omega_m F_m \\ F_m & -\Omega_m F_m \end{pmatrix} = \begin{pmatrix} I_m & \Omega_m \\ I_m & -\Omega_m \end{pmatrix} \begin{pmatrix} F_m & 0 \\ 0 & F_m \end{pmatrix}, \tag{4.7.9} \]
\[ \Omega_m = \operatorname{diag}(1, \omega_N, \ldots, \omega_N^{m-1}), \quad \omega_N = e^{-2\pi i/N}. \]

Proof. The proof essentially follows from the derivation of the butterfly relations (4.7.6)–(4.7.7).

¹⁶⁴ Note that $\Pi_N = (\Pi_N^T)^{-1}$ is the so-called perfect shuffle permutation: the permuted vector $\Pi_N f$ is obtained by splitting $f$ in half and then "shuffling" the top and bottom halves.
Example 4.7.3.
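We illustrate Theorem 4.7.2 for $N = 2^2 = 4$. The DFT matrix $F_4$ is given in Example 4.7.1. After a permutation of the columns $F_4$ can be written as a $2 \times 2$ block matrix
\[ F_4 \Pi_4 = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & -i & i \\ 1 & 1 & -1 & -1 \\ 1 & -1 & i & -i \end{pmatrix} = \begin{pmatrix} F_2 & \Omega_2 F_2 \\ F_2 & -\Omega_2 F_2 \end{pmatrix}, \]
where
\[ F_2 = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \quad \Omega_2 = \operatorname{diag}(1, -i). \]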
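The block structure stated in Theorem 4.7.2 can also be checked numerically for other even $N$. The Python sketch below compares each entry of the column-permuted DFT matrix (even-indexed columns first, then odd-indexed) with the corresponding entry of the $2 \times 2$ block form and returns the largest discrepancy. The helper names `dft_matrix` and `splitting_error` are hypothetical, and the matrices are built explicitly only because the sizes are small.

```python
import cmath

def dft_matrix(n):
    """DFT matrix with entries omega**(j*k), omega = exp(-2*pi*i/n)."""
    omega = cmath.exp(-2j * cmath.pi / n)
    return [[omega ** (j * k) for k in range(n)] for j in range(n)]

def splitting_error(n):
    """Largest entrywise difference between the permuted F_n and its block form."""
    m = n // 2
    omega = cmath.exp(-2j * cmath.pi / n)
    F_n, F_m = dft_matrix(n), dft_matrix(m)
    cols = list(range(0, n, 2)) + list(range(1, n, 2))   # even columns, then odd
    err = 0.0
    for j in range(n):
        j1 = j % m
        for c, k in enumerate(cols):
            block = F_m[j1][c % m]
            if c >= m:                   # right half: diagonal scaling by omega**j1
                block *= omega ** j1
                if j >= m:               # lower right block carries a minus sign
                    block = -block
            err = max(err, abs(F_n[j][k] - block))
    return err
```

For example, `splitting_error(4)` tests exactly the matrices of Example 4.7.3, and the returned value should be at the level of roundoff.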
When $N = 2^k$ (i.e., $k = p$ in the notation used above) the FFT algorithm can be interpreted as a sparse factorization of the DFT matrix
\[ F_N = A_k \cdots A_2 A_1 P_N, \tag{4.7.10} \]
where $P_N$ is the bit-reversal permutation matrix and $A_1, \ldots, A_k$ are block-diagonal matrices,
\[ A_q = \operatorname{diag}(\underbrace{B_L, \ldots, B_L}_{r}), \quad L = 2^q, \quad r = N/L. \tag{4.7.11} \]
Here the matrix $B_L \in \mathbb{C}^{L \times L}$ is the radix-2 butterfly matrix defined by
\[ B_L = \begin{pmatrix} I_{L/2} & \Omega_{L/2} \\ I_{L/2} & -\Omega_{L/2} \end{pmatrix}, \tag{4.7.12} \]
\[ \Omega_{L/2} = \operatorname{diag}(1, \omega_L, \ldots, \omega_L^{L/2-1}), \quad \omega_L = e^{-2\pi i/L}. \tag{4.7.13} \]
The FFT algorithm described above is usually referred to as the Cooley–Tukey FFT algorithm. Using the fact that both the bit-reversal matrix $P_N$ and the DFT matrix $F_N$ are symmetric, we obtain by transposing (4.7.10) the factorization
\[ F_N = F_N^T = P_N A_1^T A_2^T \cdots A_k^T. \tag{4.7.14} \]
This gives rise to a "dual" FFT algorithm, referred to as the Gentleman–Sande algorithm [155]. In this the bit-reversal permutation comes after the other computations. In many important applications, such as convolution and the solution of discretized Poisson equations (see Sec. 1.1.4), this permits the design of in-place FFT solutions that avoid bit-reversal altogether; see Van Loan [366, Secs. 4.1, 4.5].
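Read from right to left, the Cooley–Tukey factorization (4.7.10) says: first apply the bit-reversal permutation, then for $L = 2, 4, \ldots, N$ apply the butterfly matrix of order $L$ to each of the $N/L$ consecutive blocks. A minimal in-place Python sketch of this two-stage scheme is given below. The name `fft_in_place` is an illustrative choice, the weights for each stage are recomputed with `cmath.exp` for simplicity, and no attempt is made at the architecture-specific tuning mentioned earlier.

```python
import cmath

def fft_in_place(f):
    """Overwrite the list f (length a power of 2) with its unscaled DFT; return f."""
    n = len(f)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    k = n.bit_length() - 1                  # n = 2**k
    # Stage 1: bit-reversal permutation.
    for i in range(n):
        r, x = 0, i
        for _ in range(k):
            r = (r << 1) | (x & 1)
            x >>= 1
        if i < r:
            f[i], f[r] = f[r], f[i]
    # Stage 2: butterflies of order L = 2, 4, ..., n on consecutive blocks.
    L = 2
    while L <= n:
        half = L // 2
        weights = [cmath.exp(-2j * cmath.pi * j / L) for j in range(half)]
        for start in range(0, n, L):        # one block of length L per pass
            for j in range(half):
                a = f[start + j]
                b = weights[j] * f[start + j + half]
                f[start + j] = a + b
                f[start + j + half] = a - b
        L *= 2
    return f
```

Each pass of the outer loop touches every element once, which matches the count of about $N$ operations per stage and $\log_2 N$ stages.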
In the operation count for the FFT above we assumed that the weights $\omega_L^j$, $j = 1:L-1$, $\omega_L = e^{-2\pi i/L}$, are precomputed. To do this one could use that
\[ \omega_L^j = \cos(j\theta) - i \sin(j\theta), \quad \theta = 2\pi/L, \]
for $L = 2^q$, $q = 2:k$. This is accurate, but expensive, since it involves $L-1$ trigonometric function calls. An alternative is to compute $\omega = \cos(\theta) - i \sin(\theta)$ and use repeated multiplication,
\[ \omega^j = \omega\, \omega^{j-1}, \quad j = 2:L-1. \]
This replaces each sine/cosine call by a single complex multiplication, but has the drawback that accumulation of roundoff errors will give an error in $\omega_L^j$ of order $ju$, where $u$ is the unit roundoff.
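The two ways of generating the weights can be compared with a short Python sketch. The function names are illustrative, the index $j = 0$ (weight 1) is included for convenience, and the measured difference will depend on the platform's floating-point arithmetic and math library, so no particular value is claimed here.

```python
import math

def weights_direct(L):
    """omega_L**j for j = 0..L-1 from one cosine and one sine call per weight."""
    theta = 2 * math.pi / L
    return [complex(math.cos(j * theta), -math.sin(j * theta)) for j in range(L)]

def weights_repeated(L):
    """omega_L**j for j = 0..L-1 by repeated multiplication with omega_L."""
    theta = 2 * math.pi / L
    omega = complex(math.cos(theta), -math.sin(theta))
    w = [1 + 0j]
    for _ in range(1, L):
        w.append(omega * w[-1])   # roundoff accumulates with each product
    return w

def max_difference(L):
    """Largest absolute difference between the two sets of weights."""
    return max(abs(a - b) for a, b in zip(weights_direct(L), weights_repeated(L)))
```

Evaluating `max_difference(L)` for increasing powers of 2 gives a rough picture of how the error of the repeated-multiplication variant grows with the index.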