
5.1.3 The Chi-Square Goodness of Fit Test

The previous binomial test applied to a dichotomised population. When there are more than two categories, one often wishes to assess whether the observed frequencies of occurrence in each category are in accordance with what should be expected. Let us start with the random variable of formula 5.4 and square it:

$$\chi^{*2} = \frac{(X_1 - np)^2}{npq} = \frac{(X_1 - np)^2}{np} + \frac{(X_2 - nq)^2}{nq} \;\sim\; \chi^2_1 , \qquad 5.5$$

where X_1 and X_2 are the random variables associated with the number of “successes” and “failures” in the n-sized sample, respectively. In the above derivation, note that, denoting Q = 1 − P and q = 1 − p, we have (nP − np)² = (nQ − nq)². Formula 5.5 conveniently expresses the fitting of X_1 = nP and X_2 = nQ to the theoretical values np and nq in terms of square deviations. The square deviation is a popular distance measure, given its many useful properties, and will be used extensively in Chapter 7.

Let us now consider k categories of events, each one represented by a random variable X i , and, furthermore, let us denote by p i the probability of occurrence of each category. Note that the joint distribution of the X i is a multinomial distribution, described in B.1.6. The result 5.5 is generalised for this multinomial distribution, as follows (see property 5 of B.2.7):

$$\chi^{*2} = \sum_{i=1}^{k} \frac{(X_i - np_i)^2}{np_i} \;\sim\; \chi^2_{k-1} , \qquad 5.6$$

where the number of degrees of freedom, df = k − 1, is imposed by the restriction:

$$\sum_{i=1}^{k} x_i = n . \qquad 5.7$$

As a matter of fact, the chi-square law is only an approximation for the sampling distribution of χ*², given the dependency expressed by 5.7. In order to test the goodness of fit of the observed counts O_i to the expected counts E_i, that is, to test whether or not the following null hypothesis is rejected:

H_0: The population has absolute frequencies E_i for each of the i = 1, …, k categories,

we then use the test statistic:

$$\chi^{*2} = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} , \qquad 5.8$$

which, according to formula 5.6, has approximately a chi-square distribution with df = k − 1 degrees of freedom. The approximation is considered acceptable if the following conditions are met:

i. For df = 1, no E i must be smaller than 5;

ii. For df > 1, no E i must be smaller than 1 and no more than 20% of the E i must be smaller than 5.
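As a quick illustration, the two conditions above can be encoded in a few lines of Python; the function name chi2_conditions_ok is ours, introduced only for this sketch, not part of any package used in the book:

```python
def chi2_conditions_ok(expected):
    """Rule-of-thumb check for the chi-square approximation (conditions i and ii).

    expected -- list of expected absolute frequencies E_i, one per category.
    """
    df = len(expected) - 1
    if df == 1:
        # Condition i: with one degree of freedom, no E_i may be below 5.
        return min(expected) >= 5
    # Condition ii: no E_i below 1, and at most 20% of the E_i below 5.
    n_small = sum(1 for e in expected if e < 5)
    return min(expected) >= 1 and n_small <= 0.2 * len(expected)


# The die-throwing example below: six categories, each with E_i = 40/6 ≈ 6.7.
print(chi2_conditions_ok([40 / 6] * 6))   # → True (all E_i above 5)
```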

Expected absolute frequencies can sometimes be increased, in order to meet the above conditions, by merging adjacent categories.

When the difference between the observed (O_i) and expected (E_i) counts is large, the value of χ*² will also be large and the respective tail probability small. For a 0.95 confidence level, the critical region is above $\chi^2_{k-1,\,0.95}$.

Example 5.5

Q: A die was thrown 40 times with the observed numbers of occurrences 8, 6, 3, 10, 7, 6, respectively, for the face value running from 1 through 6. Does this sample provide evidence that the die is not honest?

A: Table 5.5 shows the chi-square test results obtained with SPSS. Based on the high value of the observed significance, we do not reject the null hypothesis that the die is honest. Applying the R function chisq.test(c(8,6,3,10,7,6)), one obtains the same results as in Table 5.5b. This function accepts a second argument with a vector of expected probabilities; when it is omitted, as we did here, equal probabilities are assigned to all categories.
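For readers without SPSS or R at hand, the same computation can be sketched in plain Python. The helper chi2_sf is our own closed-form survival function for integer degrees of freedom (based on the standard recurrence for the regularised upper incomplete gamma function), not a library routine:

```python
import math

# Observed die-face counts from Example 5.5 (40 throws, faces 1 through 6).
observed = [8, 6, 3, 10, 7, 6]
n = sum(observed)
k = len(observed)
expected = [n / k] * k          # equal probabilities under H0 (honest die)

# Chi-square goodness of fit statistic: sum of (O_i - E_i)^2 / E_i.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = k - 1

def chi2_sf(x, df):
    """Survival function P(X > x) of a chi-square variable with integer df."""
    if df % 2 == 0:
        # Even df: P(X > x) = exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: start at df = 1 with erfc and apply the recurrence
    # S(x; d) = S(x; d-2) + (x/2)^(d/2 - 1) * exp(-x/2) / Gamma(d/2).
    s = math.erfc(math.sqrt(x / 2))
    for d in range(3, df + 1, 2):
        s += (x / 2) ** (d / 2 - 1) * math.exp(-x / 2) / math.gamma(d / 2)
    return s

p = chi2_sf(chi2, df)
print(f"chi2 = {chi2:.3f}, df = {df}, p = {p:.3f}")
# → chi2 = 4.100, df = 5, p = 0.535
```

The high p-value (about 0.54) agrees with the non-rejection of the null hypothesis reported above.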

Table 5.5. Dataset (a) and results (b), obtained with SPSS, of the chi-square test for the die-throwing experiment (Example 5.5). The residual column represents the differences between observed and expected frequencies.

(a)
FACE    Observed N    Expected N    Residual
1       8             6.7           1.3
2       6             6.7           −0.7
3       3             6.7           −3.7
4       10            6.7           3.3
5       7             6.7           0.3
6       6             6.7           −0.7
Total   40

(b)
Chi-Square     4.100
df             5
Asymp. Sig.    0.535

Example 5.6

Q: It is a common belief that the best academic freshmen students usually participate in freshmen initiation rites only because they feel compelled to do so.


Does the Freshmen dataset confirm that belief for the Porto Engineering College?

A: We use the categories of answers obtained for Question 6, “I felt compelled to participate in the Initiation”, of the freshmen dataset (see Appendix E). The respective EXCEL file contains the computations of the frequencies of occurrence of each category, for each question, assuming a specified threshold for the average examination results. Using, for instance, the threshold = 10, we see that there are 102 “best” students, with average examination score not less than the threshold. From these 102, there are varied counts for the five categories of Question 6, ranging from 16 students who “fully disagree” to 5 students who “fully agree”.

Under the null hypothesis, the answers to Question 6 have no relation to the freshmen performance, and we would expect equal frequencies for all categories. The chi-square test results obtained with SPSS are shown in Table 5.6. Based on these results, we reject the null hypothesis: there is evidence that the answer to Question 6 of the freshmen enquiry bears some relation to the student performance.

Table 5.6. Dataset (a) and results (b), obtained with SPSS, for Question 6 of the freshmen enquiry and 102 students with average score ≥10.

CAT    Observed N    Expected N    Residual
(five answer categories, each with Expected N = 102/5 = 20.4; the “fully agree” row has Observed N = 5 and Residual = 5 − 20.4 = −15.4)

Example 5.7

Q: Consider the variable ART representing the total area of defects of the Cork Stoppers’ dataset, for the class 1 (Super) of corks. Does the sample data provide evidence that this variable can be accepted as being normally distributed in that class?

A: This example illustrates the application of the chi-square test for assessing the goodness of fit to a known distribution. In this case, the chi-square test uses the deviations of the observed absolute frequencies vs. the expected absolute frequencies under the condition of the stated null hypothesis, i.e., that the variable ART is normally distributed.

In order to compute the absolute frequencies, we have to establish a set of intervals based on the percentiles of the normal distribution. Since the number of cases is n = 50, and we want the conditions for using the chi-square distribution to be fulfilled, we use intervals corresponding to 20% of the cases. Table 5.7 shows these intervals, under the “z-Interval” heading, which can be obtained from the tables of the standard normal distribution or using software functions, such as the ones already described for SPSS, STATISTICA, MATLAB and R.

The corresponding interval cutpoints, x cut , for the random variable under analysis, X, can now be easily determined, using:

$$x_{cut} = \bar{x} + z_{cut}\, s_X , \qquad 5.9$$

where we use the sample mean and standard deviation as well as the cutpoints determined for the normal distribution, z cut . In the present case, the mean and standard deviation are 137 and 43, respectively, which leads to the intervals under the “ART-Interval” heading.
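Formula 5.9 is simple enough to reproduce by hand. The sketch below uses the sample mean and standard deviation quoted above, together with the tabulated standard normal quantiles for cumulative probabilities 0.2, 0.4, 0.6 and 0.8 (hard-coded here to keep the example library-free):

```python
# Formula 5.9: x_cut = mean + z_cut * s_X.
# Sample statistics for variable ART (Example 5.7): mean 137, std 43.
mean_x, s_x = 137.0, 43.0

# Standard normal quantiles for cumulative p = 0.2, 0.4, 0.6, 0.8
# (tabulated values, four decimal places).
z_cut = [-0.8416, -0.2533, 0.2533, 0.8416]

# Interval cutpoints for ART, delimiting five classes of ~20% each.
x_cut = [mean_x + z * s_x for z in z_cut]
print([round(x, 1) for x in x_cut])
# → [100.8, 126.1, 147.9, 173.2]
```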

The absolute frequency columns are now easily computed. With SPSS, STATISTICA and R we obtain the value χ*² = 2.2. We must be careful, however, when obtaining the corresponding significance in this application of the chi-square test. The problem is that now we do not have df = k − 1 degrees of freedom, but df = k − 1 − n_p, where n_p is the number of parameters computed from the sample. In our case, we derived the interval boundaries using the sample mean and sample standard deviation, i.e., we lost two degrees of freedom. Therefore, we have to compute the probability using df = 5 − 1 − 2 = 2 degrees of freedom, or equivalently, compute the critical region boundary as:

$$\chi^2_{2,\,0.95} = 5.99 .$$

Since the computed value of χ*² is smaller than this critical region boundary, we do not reject, at the 5% significance level, the null hypothesis of variable ART being normally distributed.

Table 5.7. Observed and expected (under the normality assumption) absolute frequencies, for variable ART of the cork-stopper dataset.

Cat.    z-Interval    Cumulative p    ART-Interval    Expected Frequencies    Observed Frequencies
(five categories delimited by the 20% quantiles of the standard normal distribution and the corresponding ART cutpoints from formula 5.9; each expected frequency is 50/5 = 10)


Commands 5.4. SPSS, STATISTICA, MATLAB and R commands used to perform the chi-square goodness of fit test.

SPSS          Analyze; Nonparametric Tests; Chi-Square

STATISTICA    Statistics; Nonparametrics; Observed versus expected Χ²

MATLAB        [c,df,sig] = chi2test(x)

R             chisq.test(x,p)

MATLAB does not have a specific function for the chi-square goodness of fit test. We provide the chi2test function for that purpose on the book CD.