
7.5 GOODNESS-OF-FIT TESTS FOR DISTRIBUTIONS

The goodness-of-fit of a distribution to a sample is assessed by a statistical test (see Section 3.11), where the null hypothesis states that the candidate distribution is a sufficiently good fit to the data, while the alternative hypothesis states that it is not. Such statistical procedures serve in an advisory capacity: They need not be used as definitive decision rules, but merely provide guidance and suggestive evidence. In many cases, there is no clear-cut best-fit distribution. Nevertheless, goodness-of-fit tests are useful in providing a quantitative measure of goodness-of-fit (see Banks et al. [1999] and Law and Kelton [2000]).

The chi-square test and the Kolmogorov–Smirnov test are the most widely used tests for goodness-of-fit of a distribution to sample data. These tests are used by the Arena Input Analyzer to compute the corresponding test statistic and the associated p-value (see Section 3.11), and will be reviewed next.

7.5.1 CHI-SQUARE TEST

The chi-square test compares the empirical histogram density constructed from sample data to a candidate theoretical density. Formally, assume that the empirical sample {x_1, ..., x_N} is a set of N iid realizations from an underlying random variable, X. This sample is then used to construct an empirical histogram with J cells, where cell j corresponds to the interval [l_j, r_j). Thus, if N_j is the number of observations in cell j (for statistical reliability, it is commonly suggested that N_j > 5), then

\hat{p}_j = \frac{N_j}{N}, \quad j = 1, \ldots, J

is the relative frequency of observations in cell j. Letting F_X(x) be some theoretical candidate distribution whose goodness-of-fit is to be assessed, one computes the theoretical probabilities

p_j = \Pr\{l_j \le X < r_j\}, \quad j = 1, \ldots, J.
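To make these quantities concrete, here is a minimal sketch (the sample here is hypothetical; the cell edges and the Unif(10, 30) candidate mirror the example worked later in this section) that computes the empirical relative frequencies \hat{p}_j and the theoretical probabilities p_j:

```python
import numpy as np
from scipy.stats import uniform

# Hypothetical sample of N = 100 observations, for illustration only.
rng = np.random.default_rng(seed=1)
sample = rng.uniform(10.0, 30.0, size=100)

# J = 10 cells [l_j, r_j): [10,12), [12,14), ..., [28,30].
edges = np.linspace(10.0, 30.0, 11)

N_j, _ = np.histogram(sample, bins=edges)   # number of observations per cell
p_hat = N_j / sample.size                   # empirical relative frequencies

# Theoretical probabilities p_j = Pr{l_j <= X < r_j} for the Unif(10, 30) candidate.
F = uniform(loc=10.0, scale=20.0).cdf
p_j = F(edges[1:]) - F(edges[:-1])          # each equals 0.10 for this candidate
```

Both `p_hat` and `p_j` sum to 1, and for the uniform candidate every cell probability is 0.10, as in the worked example below.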

Note that for continuous data, we have

p_j = F_X(r_j) - F_X(l_j) = \int_{l_j}^{r_j} f_X(x) \, dx,

where f_X(x) is the pdf of X, while for discrete data, we have

p_j = F_X(r_j) - F_X(l_j) = \sum_{x = l_j}^{r_j} p_X(x),

where p_X(x) is the pmf of X. The chi-square test statistic is then given by

\chi^2 = \sum_{j=1}^{J} \frac{(N_j - N p_j)^2}{N p_j}. \quad (7.1)

Note that the quantity N p_j is the (theoretical) expected number of observations in cell j predicted by the candidate distribution F_X(x), while N_j is the actual (empirical) number of observations in that cell. Consequently, the j-th term on the right side of (7.1) measures a relative deviation of the empirical number of observations in cell j from the theoretical number of observations in cell j. Intuitively, the smaller the value of the chi-square statistic, the better the fit. Formally, the chi-square statistic is compared against a critical value c (see Section 3.11), depending on the significance level, α, of the test. If χ² < c, then we accept the null hypothesis (the distribution is an acceptably good fit); otherwise, we reject it (the distribution is an unacceptably poor fit). The chi-square critical values are readily available from tables in statistics books. These values are organized by degrees of freedom, d, and significance level, α. The degrees of freedom parameter is given by d = J - E - 1, where E is the distribution-dependent number of parameters estimated from the sample data. For instance, the gamma distribution Gamm(α, β) requires E = 2 parameters to be estimated from the sample, while the exponential distribution Expo(λ) requires only E = 1 parameter. An examination of chi-square tables reveals that for a given number of degrees of freedom, the critical value c increases as the significance level, α, decreases. Thus, one can trade off test significance (equivalently, confidence) for test stringency.
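The decision rule above can be sketched as follows. This is a minimal sketch, not Arena's implementation; `scipy.stats.chi2.ppf` supplies the critical value c in place of a printed table:

```python
import numpy as np
from scipy.stats import chi2

def chi_square_statistic(N_j, p_j):
    """Chi-square statistic of eq. (7.1): sum_j (N_j - N*p_j)^2 / (N*p_j)."""
    N_j = np.asarray(N_j, dtype=float)
    expected = N_j.sum() * np.asarray(p_j, dtype=float)   # N * p_j per cell
    return float(((N_j - expected) ** 2 / expected).sum())

def chi_square_decision(N_j, p_j, alpha, n_estimated_params):
    """Return (statistic, critical value, accept?) for significance level alpha."""
    d = len(N_j) - n_estimated_params - 1    # degrees of freedom d = J - E - 1
    c = chi2.ppf(1.0 - alpha, d)             # critical value from the chi-square table
    stat = chi_square_statistic(N_j, p_j)
    return stat, c, stat < c                 # True means: accept the null hypothesis
```

With J = 10 equiprobable cells (each p_j = 0.10), α = 0.10, and E = 2 estimated parameters, `chi2.ppf(0.90, 7)` returns approximately 12.0, matching the critical value used in the example below.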

As an example, consider the sample data of size N = 100, given in Table 7.1, for which a histogram with J = 10 cells was constructed by the Input Analyzer, as shown in Figure 7.1. The best-fit uniform distribution, found by the Input Analyzer, is depicted in Figure 7.2.

Table 7.3 Empirical and theoretical statistics for the empirical histogram

Cell number j | Cell interval | Number of observations | Relative frequency \hat{p}_j | Theoretical probability p_j
1 | [10, 12) | 13 | 0.13 | 0.10
⋮ | ⋮ | ⋮ | ⋮ | ⋮

Table 7.3 displays the associated elements of the chi-square test. The table consists of the histogram's cell intervals, the number of observations in each cell, and the corresponding empirical relative frequency for each cell. An examination of Table 7.3 reveals that the histogram ranges from a minimal value of 10 to a maximal value of 30, with the individual cell intervals being [10, 12), [12, 14), [14, 16), and so on. Note that the fitted uniform distribution has p_j = 0.10 for each cell j. Thus, the χ² test statistic is calculated as

\chi^2 = \frac{(13 - 10)^2}{10} + \cdots + \frac{(8 - 10)^2}{10} = 3.6.

A chi-square table shows that for significance level α = 0.10 and d = 10 - 2 - 1 = 7 degrees of freedom, the critical value is c = 12.0; recall that the uniform distribution Unif(a, b) has two parameters, estimated from the sample by \hat{a} = min{x_i} and \hat{b} = max{x_i}. (If a and b are known, then d = 10 - 1 = 9. In fact, the Arena Input Analyzer computes d in this manner.) Since the test statistic computed above is χ² = 3.6 < 12.0, we accept the null hypothesis that the uniform distribution Unif(10, 30) is an acceptably good fit to the sample data of Table 7.1.
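Only the first and last terms of the sum are shown above. The sketch below uses hypothetical cell counts (only the 13 and the 8 come from the text) chosen to sum to N = 100 and to be consistent with the quoted value χ² = 3.6:

```python
# Hypothetical cell counts; only the first (13) and the last (8) appear in the text.
counts = [13, 13, 8, 8, 8, 11, 11, 10, 10, 8]    # sums to N = 100
expected = 100 * 0.10                            # N * p_j = 10 observations per cell

chi_sq = sum((n - expected) ** 2 / expected for n in counts)

assert abs(chi_sq - 3.6) < 1e-9    # matches the statistic quoted in the text
assert chi_sq < 12.0               # below the critical value c, so the fit is accepted
```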

It is instructive to follow the best-fit actions taken by the Input Analyzer. First, it calculates the square error

\sum_{j=1}^{J} (\hat{p}_j - p_j)^2 = (0.13 - 0.10)^2 + \cdots

Next, although the histogram was declared to have 10 cells, the Input Analyzer employed a χ² test statistic with only 6 cells (to increase the number of observations in selected cells). It then proceeded to calculate it as χ² = 2.1, with a corresponding p-value of p > 0.75, clearly indicating that the null hypothesis cannot be rejected even at significance levels as large as 0.75. Thus, we can accept the null hypothesis of a good uniform fit with comfortably high confidence.
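The reported p-value can be reproduced approximately as the right-tail probability of the chi-square distribution. The degrees-of-freedom choice d = 6 - 1 = 5 below assumes Arena uses d = J - 1 without a parameter correction, as the text indicates; this is an assumption about Arena's internals:

```python
from scipy.stats import chi2

stat = 2.1       # chi-square statistic reported by the Input Analyzer (6 cells)
d = 6 - 1        # degrees of freedom, assuming Arena uses d = J - 1

p_value = chi2.sf(stat, d)   # right-tail probability Pr{chi-square(d) > stat}
# p_value comes out a little above 0.8, consistent with the reported p > 0.75
```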


7.5.2 KOLMOGOROV-SMIRNOV (K-S) TEST

While the chi-square test compares the empirical (observed) histogram pdf or pmf to a candidate (theoretical) counterpart, the Kolmogorov-Smirnov (K-S) test compares the empirical cdf to a theoretical counterpart. Consequently, the chi-square test requires a considerable amount of data (to set up a reasonably “smooth” histogram), while the K-S test can get away with smaller samples, since it does not require a histogram.

The K-S test procedure first sorts the sample {x_1, x_2, ..., x_N} in ascending order, x_{(1)} \le x_{(2)} \le \cdots \le x_{(N)}, and then constructs the empirical cdf, \hat{F}_X(x), given by

\hat{F}_X(x) = \frac{\max\{j : x_{(j)} \le x\}}{N}.

Thus, \hat{F}_X(x) is just the relative frequency of sample observations not exceeding x. Since a theoretical fit distribution F_X(x) is specified, a reasonable measure of goodness-of-fit is the largest absolute discrepancy between \hat{F}_X(x) and F_X(x). The K-S test statistic is thus defined by

KS = \max_x \{ |\hat{F}_X(x) - F_X(x)| \}. \quad (7.2)

The smaller the observed value of the K-S statistic, the better the fit.
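The K-S computation can be sketched as follows (the sample is hypothetical). Since \hat{F}_X(x) jumps at each order statistic, the maximum in (7.2) is found by checking the empirical cdf just after each jump (j/N) and just before it ((j-1)/N):

```python
import numpy as np
from scipy.stats import kstest, uniform

# Hypothetical sample of 30 observations, tested against a Unif(10, 30) candidate.
rng = np.random.default_rng(seed=2)
sample = np.sort(rng.uniform(10.0, 30.0, size=30))
F = uniform(loc=10.0, scale=20.0).cdf

N = sample.size
j = np.arange(1, N + 1)
theo = F(sample)                        # F_X at each order statistic x_(j)
d_plus = np.max(j / N - theo)           # discrepancy just after each jump
d_minus = np.max(theo - (j - 1) / N)    # discrepancy just before each jump
ks = max(d_plus, d_minus)               # KS statistic of eq. (7.2)

# Cross-check against scipy's implementation of the same statistic.
assert np.isclose(ks, kstest(sample, F).statistic)
```

`scipy.stats.kstest` also returns a p-value, playing the role of the built-in critical-value tables mentioned below.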

Critical values for K-S tests depend on the candidate theoretical distribution. Tables of critical values computed for various distributions are scattered in the literature, but are omitted from this book. The Arena Input Analyzer has built-in tables of critical values for both the chi-square and K-S tests for all the distributions supported by it (see Table 7.2). Refer to Figures 7.2 through 7.4 for a variety of examples of calculated test statistics.