THE PROCESS OF QUANTIZATION AND ERROR CHARACTERIZATIONS

6.7 THE PROCESS OF QUANTIZATION AND ERROR CHARACTERIZATIONS

From the discussion of number representations in the previous section, it should be clear that a general infinite-precision real number must be assigned to one of the finitely many representable numbers, given a specific structure for the finite-length register (that is, the arithmetic as well as the format). In practice, this assignment to a representable number or level is usually made by one of two operations: the truncation operation and the rounding operation. These operations affect the accuracy as well as the general characteristics of digital filters and DSP operations.

We assume, without loss of generality, that there are B + 1 bits in the fixed-point (fractional) arithmetic or in the mantissa of floating-point arithmetic, including the sign bit. Then the resolution (∆) is given by

$$\Delta = 2^{-B} \quad \begin{cases} \text{absolute in the case of fixed-point arithmetic} \\ \text{relative in the case of floating-point arithmetic} \end{cases} \tag{6.45}$$

Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s).

6.7.1 FIXED-POINT ARITHMETIC

The quantizer block diagram in this case is given by

x (infinite-precision) −→ Quantizer Q[·] −→ Q[x] (finite-precision)

where B, the number of fractional bits, and ∆, the resolution, are the parameters of the quantizer. We will denote the finite word-length number, after quantization, by Q[x] for an input number x. Let the quantization error be given by

$$e \triangleq Q[x] - x \tag{6.46}$$

We will analyze this error for both the truncation and the rounding operations.

Truncation operation In this operation, the number x is truncated beyond B significant bits (that is, the rest of the bits are eliminated) to obtain $Q_T[x]$. In MATLAB, to obtain a B-bit truncation, we first scale the number x upward by $2^B$, then use the fix function on the scaled number, and finally scale the result down by $2^{-B}$. Thus, the MATLAB statement xhat = fix(x*2^B)/2^B; implements the desired operation. We will now consider each of the three formats.
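As a quick illustration of the scale, fix, and rescale steps, the following sketch truncates a few sample values with B = 2. The sample values and the names x, xhat, and eT are chosen here for illustration only. Because fix rounds toward zero, the result corresponds to truncating the magnitude of x.

```matlab
% Illustrative sketch: B-bit truncation of a few sample values (assumed inputs)
B = 2;                       % number of fractional bits
x = [0.7 0.3 -0.3 -0.7];     % sample values chosen for illustration
xhat = fix(x*2^B)/2^B;       % scale up, drop the fraction toward zero, scale down
eT = xhat - x;               % truncation error e = Q[x] - x
disp([x; xhat; eT]);         % rows: input, truncated value, error
```

With these inputs the truncated values fall on multiples of 0.25, and the sign of the error is opposite to the sign of x.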

Sign-magnitude format If the number x is positive, then after truncation $Q_T[x] \le x$ since some value in x is lost. Hence the quantizer error for truncation, denoted by $e_T$, is less than or equal to zero, or $e_T \le 0$. However, since there are B bits in the quantizer, the maximum error in terms of magnitude is

$$|e_T| = 0.\underbrace{00\cdots0}_{B\text{ bits}}111\cdots = 2^{-B} \text{ (decimal)} \tag{6.47}$$

or

$$-2^{-B} \le e_T \le 0, \quad \text{for } x \ge 0 \tag{6.48}$$

Similarly, if x < 0, then after truncation $Q_T[x] \ge x$ since $Q_T[x]$ is less negative, or $e_T \ge 0$. The largest magnitude of this error is again $2^{-B}$, or

$$0 \le e_T \le 2^{-B}, \quad \text{for } x < 0 \tag{6.49}$$

Chapter 6

IMPLEMENTATION OF DISCRETE-TIME FILTERS

FIGURE 6.25 Truncation error characteristics in the sign-magnitude format (curves: x and xhat)

EXAMPLE 6.20 Let −1 < x < 1 and B = 2. Using MATLAB, verify the truncation error characteristics.

Solution The resolution is $\Delta = 2^{-2} = 0.25$. Using the following MATLAB script, we can verify the truncation error $e_T$ relations given in (6.48) and (6.49).

    x = [-1+2^(-10):2^(-10):1-2^(-10)];     % Sign-Mag numbers between -1 and 1
    B = 2;                                  % Number of bits for Truncation
    xhat = fix(x*2^B)/2^B;                  % Truncation
    plot(x,x,'g',x,xhat,'r','linewidth',1); % Plot

The resulting plots of x and $\hat{x}$ are shown in Figure 6.25. Note that the plot of $\hat{x}$ has a staircase shape and that it satisfies (6.48) and (6.49).
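The plot can be complemented by a numerical check. The sketch below reuses the grid of the example and tests the two truncation error bounds directly; the names eT, okPos, and okNeg are introduced here for illustration and are not part of the original script.

```matlab
% Illustrative check of the truncation error bounds for sign-magnitude numbers
B = 2;                                   % number of fractional bits
x = -1+2^(-10):2^(-10):1-2^(-10);        % same grid as in the example
xhat = fix(x*2^B)/2^B;                   % truncation toward zero
eT = xhat - x;                           % truncation error
okPos = all(eT(x >= 0) <= 0 & eT(x >= 0) >= -2^(-B));  % -2^(-B) <= eT <= 0 for x >= 0
okNeg = all(eT(x < 0)  >= 0 & eT(x < 0)  <=  2^(-B));  %  0 <= eT <= 2^(-B) for x < 0
fprintf('bound for x >= 0 holds: %d, bound for x < 0 holds: %d\n', okPos, okNeg);
```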

One's-complement format For x ≥ 0, we have the same characteristics for $e_T$ as in the sign-magnitude format, that is,

$$-2^{-B} \le e_T \le 0, \quad \text{for } x \ge 0 \tag{6.50}$$

For x < 0, the representation is obtained by complementing all bits, including the sign bit. To compute the maximum error, let

$$x = 1.b_1 b_2 \cdots b_B 000\cdots = -\{0.(1-b_1)(1-b_2)\cdots(1-b_B)111\cdots\}$$

After truncation, we obtain

$$Q_T[x] = 1.b_1 b_2 \cdots b_B = -\{0.(1-b_1)(1-b_2)\cdots(1-b_B)\}$$

Clearly, x is more negative than $Q_T[x]$, or $x \le Q_T[x]$, or $e_T \ge 0$. In fact, the maximum truncation error is

$$e_{T\max} = 0.00\cdots0111\cdots = 2^{-B} \text{ (decimal)} \tag{6.51}$$

Together with $e_T \ge 0$, this gives $0 \le e_T \le 2^{-B}$ for x < 0.

FIGURE 6.26 Truncation error characteristics in the one's-complement format (curves: x and xhat)

EXAMPLE 6.21 Again let −1 < x < 1 and B = 2 with the resolution $\Delta = 2^{-2} = 0.25$. Using a MATLAB script, verify the truncation error $e_T$ relations given in (6.50) and (6.51).

Solution The MATLAB script uses the functions sm2oc and oc2sm, which are explored in Problem P6.25.

    x = [-1+2^(-10):2^(-10):1-2^(-10)];     % Sign-Magnitude numbers between -1 and 1
    B = 2;                                  % Select bits for Truncation
    y = sm2oc(x,B);                         % Sign-Mag to One's Complement
    yhat = fix(y*2^B)/2^B;                  % Truncation
    xhat = oc2sm(yhat,B);                   % One's-Complement to Sign-Mag
    plot(x,x,'g',x,xhat,'r','linewidth',1); % Plot

The resulting plots of x and $\hat{x}$ are shown in Figure 6.26. Note that the plot of $\hat{x}$ is identical to the plot in Figure 6.25 and that it satisfies (6.50) and (6.51).
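Because sm2oc and oc2sm are left to Problem P6.25 and are not reproduced here, the following sketch avoids them and works on integer bit patterns instead. It assumes, purely for illustration, that a negative x is given to L = 10 fractional bits; it complements the magnitude bits, drops the low L − B bits, and complements back. The names L, k, c, cB, and m are hypothetical and not taken from the text.

```matlab
% Illustrative bit-level sketch of one's-complement truncation for x < 0.
% Assumption: x has exactly L fractional bits, so |x| = k*2^(-L) for integer k.
L = 10; B = 2;                        % input and output fractional word lengths
x = -(1:2^L-1)*2^(-L);                % all negative values on the L-bit grid
k  = round(abs(x)*2^L);               % magnitude bits as an integer
c  = (2^L - 1) - k;                   % complement the L fractional bits
cB = floor(c/2^(L-B));                % keep the B most significant bits
m  = (2^B - 1) - cB;                  % complement back to a B-bit magnitude
xhat = -m*2^(-B);                     % truncated value as a signed number
eT = xhat - x;                        % truncation error
sameAsSM = isequal(xhat, fix(x*2^B)/2^B);   % compare with fix-based truncation
fprintf('matches fix-based truncation: %d, max error: %g\n', sameAsSM, max(eT));
```

Under these assumptions the bit-level result coincides with the fix-based truncation, which is consistent with the observation that the one's-complement plot is identical to the sign-magnitude plot.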


Two's-complement format Once again, for x ≥ 0, we have

$$-2^{-B} \le e_T \le 0, \quad \text{for } x \ge 0 \tag{6.52}$$

For x < 0, the representation is given by 2 − |x|, where |x| is the magnitude. Hence the magnitude of x is given by

$$|x| = 2 - x \tag{6.53}$$

with $x = 1.b_1 b_2 \cdots b_B b_{B+1} \cdots$. After truncation to B bits, we obtain $Q_T[x] = 1.b_1 b_2 \cdots b_B$, the magnitude of which is

$$|Q_T[x]| = 2 - Q_T[x] \tag{6.54}$$

From (6.53) and (6.54),

$$|Q_T[x]| - |x| = x - Q_T[x] = 1.b_1 b_2 \cdots b_B b_{B+1} \cdots - 1.b_1 b_2 \cdots b_B \tag{6.55}$$

The largest change in magnitude from (6.55) is

$$0.00\cdots0111\cdots = 2^{-B} \text{ (decimal)} \tag{6.56}$$

Since the change in the magnitude is positive, after truncation $Q_T[x]$ becomes more negative, which means that $Q_T[x] \le x$. Hence

$$-2^{-B} \le e_T \le 0, \quad \text{for } x < 0 \tag{6.57}$$

EXAMPLE 6.22 Again consider −1 < x < 1 and B = 2 with the resolution $\Delta = 2^{-2} = 0.25$. Using MATLAB, verify the truncation error $e_T$ relations given in (6.52) and (6.57).

Solution The MATLAB script uses the functions sm2tc and tc2sm, which are explored in Problem P9.4.

    x = [-1+2^(-10):2^(-10):1-2^(-10)];     % Sign-Magnitude numbers between -1 and 1
    B = 2;                                  % Select bits for Truncation
    y = sm2tc(x);                           % Sign-Mag to Two's Complement
    yhat = fix(y*2^B)/2^B;                  % Truncation
    xhat = tc2sm(yhat);                     % Two's-Complement to Sign-Mag
    plot(x,x,'g',x,xhat,'r','linewidth',1); % Plot

The resulting plots of x and $\hat{x}$ are shown in Figure 6.27. Note that the plot of $\hat{x}$ is also a staircase graph but lies below the x graph and that it satisfies (6.52) and (6.57).
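The same bounds can be checked without sm2tc and tc2sm. The sketch below relies on the fact that dropping the low-order bits of a two's-complement number rounds its value toward minus infinity, so floor is used as a stand-in for the bit-level truncation. This substitution is an assumption of the sketch, not the script used in the example, and the name inBounds is introduced here for illustration.

```matlab
% Illustrative check: two's-complement truncation modeled with floor
% (rounds toward minus infinity) instead of sm2tc/tc2sm.
B = 2;                                   % number of fractional bits
x = -1+2^(-10):2^(-10):1-2^(-10);        % same grid as in the example
xhat = floor(x*2^B)/2^B;                 % two's-complement truncation model
eT = xhat - x;                           % truncation error
inBounds = all(eT <= 0 & eT >= -2^(-B)); % same bound for x >= 0 and x < 0
fprintf('bounds hold: %d, min error: %g\n', inBounds, min(eT));
```

Unlike the fix-based truncation, the error here is never positive, which is why the staircase lies on or below the line of x for both signs.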

Collecting results (6.48)–(6.52) and (6.57), along with Figures 6.25–6.27, we conclude that the truncation characteristics for fixed-point arithmetic are the same for the sign-magnitude and the one's-complement formats but are different for the two's-complement format.

FIGURE 6.27 Truncation error characteristics in the two's-complement format (curves: x and xhat)

Rounding operation In this operation, the real number x is rounded to the nearest representable level, which we will refer to as $Q_R[x]$. In MATLAB, to obtain a B-bit rounding approximation, we first scale the number x up by $2^B$, then use the round function on the scaled number, and finally scale the result down by $2^{-B}$. Thus the MATLAB statement xhat = round(x*2^B)/2^B; implements the desired operation.

Since the quantization step or resolution is $\Delta = 2^{-B}$, the magnitude of the maximum error is

$$|e_R|_{\max} = \frac{\Delta}{2} = \frac{1}{2}2^{-B} \tag{6.58}$$

Hence for all three formats, the quantizer error due to rounding, denoted by $e_R$, satisfies

$$-\frac{1}{2}2^{-B} \le e_R \le \frac{1}{2}2^{-B} \tag{6.59}$$

EXAMPLE 6.23 Demonstrate the rounding operations and the corresponding error characteristics on the signal of Examples 6.20–6.22 using the three formats.

Solution Since the rounding operation assigns values that can be larger than the unquantized values, which can create problems for the two's- and one's-complement formats, we will restrict the signal to the interval $[-1, 1 - 2^{-B-1}]$. The following MATLAB script shows the two's-complement format rounding, but the other scripts are similar (readers are encouraged to verify).

FIGURE 6.28 Rounding error characteristics in the fixed-point representation (panels: Sign-Magnitude Format, One's-Complement Format, Two's-Complement Format; curves: x and xhat)

    B = 2;                          % Select bits for Rounding
    x = [-1:2^(-10):1-2^(-B-1)];    % Sign-Magnitude numbers between -1 and 1-2^(-B-1)
    y = sm2tc(x);                   % Sign-Mag to Two's Complement
    yq = round(y*2^B)/2^B;          % Rounding
    xq = tc2sm(yq);                 % Two's-Complement to Sign-Mag

The resulting plots for the sign-magnitude, one's-, and two's-complement formats are shown in Figure 6.28. These plots do satisfy (6.59).
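A numerical check of the rounding error can accompany the plots. Rounding to the nearest level cannot err by more than half a quantization step, and the sketch below tests exactly that. It applies round directly to x without any format conversion, which is a simplifying assumption of the sketch; the names eR and inBounds are introduced here for illustration.

```matlab
% Illustrative check: B-bit rounding error never exceeds half a step
B = 2;                                  % number of fractional bits
x = -1:2^(-10):1-2^(-B-1);              % grid restricted as in the example
xq = round(x*2^B)/2^B;                  % rounding to the nearest level
eR = xq - x;                            % rounding error
inBounds = all(abs(eR) <= 2^(-B)/2);    % |e_R| <= (1/2) 2^(-B)
fprintf('bound holds: %d, max |e_R|: %g\n', inBounds, max(abs(eR)));
```

The bound is attained at the midpoints between levels, where round resolves the tie away from zero.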

Comparing the error characteristics of the truncation and rounding operations given in Figures 6.25 through 6.28, it is clear that the rounding operation is the superior one with respect to quantization error. This is because the error is symmetric with respect to zero (that is, equally distributed between positive and negative values) and because the error is the same across all three formats. Hence we will mostly consider the rounding operation for floating-point arithmetic as well as for further analysis.

6.7.2 FLOATING-POINT ARITHMETIC

In this arithmetic, the quantizer affects only the mantissa M. However, the number x is represented by $M \times 2^E$, where E is the exponent. Hence the quantizer errors are multiplicative and depend on the magnitude of x. Therefore, the more appropriate measure of error is the relative error rather than the absolute error, (Q[x] − x). Let us define the relative error, ε, as

$$\varepsilon \triangleq \frac{Q[x] - x}{x} \tag{6.60}$$

Then the quantized value Q[x] can be written as

$$Q[x] = x + \varepsilon x = x(1 + \varepsilon) \tag{6.61}$$

When Q[x] is due to the rounding operation, the error in the mantissa lies in the interval $[-\frac{1}{2}2^{-B}, \frac{1}{2}2^{-B}]$. In this case we will denote the relative error by $\varepsilon_R$. Then from (6.43), the absolute error, $Q_R[x] - x = \varepsilon_R x$, is between

$$\left(-\frac{1}{2}2^{-B}\right)2^E \le \varepsilon_R x \le \left(\frac{1}{2}2^{-B}\right)2^E \tag{6.62}$$

Now for a given E, and since the mantissa is between $\frac{1}{2} \le M < 1$ (this is not the IEEE-754 model), the number x is between

$$2^{E-1} \le x < 2^E \tag{6.63}$$

Hence from (6.62) and using the smallest value in (6.63), we obtain

$$-2^{-B} \le \varepsilon_R \le 2^{-B} \tag{6.64}$$

This relative error relation, (6.64), will be used in subsequent analysis.
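The derivation can be illustrated numerically. The sketch below assumes positive inputs and uses the two-output form of log2, which returns a mantissa in [1/2, 1) and an integer exponent, matching the mantissa convention used here. Only the mantissa is rounded to B bits. The value B = 4, the sample values, and the names M, E, Mq, xq, epsR, and absErr are chosen for illustration.

```matlab
% Illustrative sketch: round the mantissa to B bits and check the relative error.
% Assumption: x > 0 and x = M*2^E with 1/2 <= M < 1, as returned by log2.
B = 4;                                  % mantissa fractional bits (assumed value)
x = [0.013 0.37 1.9 25.3 4096.7];       % sample positive values for illustration
[M, E] = log2(x);                       % x = M.*2.^E with 0.5 <= M < 1
Mq = round(M*2^B)/2^B;                  % round the mantissa only
xq = Mq.*2.^E;                          % quantized number Q_R[x]
epsR = (xq - x)./x;                     % relative error (Q[x] - x)/x
absErr = xq - x;                        % absolute error, scales with 2^E
inBounds = all(abs(epsR) <= 2^(-B));    % |eps_R| <= 2^(-B)
disp([x; xq; absErr; epsR]);            % rows: input, quantized, abs. error, rel. error
```

The absolute error varies over several orders of magnitude across the samples, while the relative error stays within the same bound, which is why the relative error is the more useful measure for floating-point arithmetic.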
