Hypothesis Test Procedure

4.1 Hypothesis Test Procedure

Any hypothesis test procedure starts with the formulation of an interesting hypothesis concerning the distribution of a certain random variable in the population. As a result of the test we obtain a decision rule, which allows us to either reject or accept the hypothesis with a certain probability of error, referred to as the level of significance of the test.

In order to illustrate the basic steps of the test procedure, let us consider the following example. Two methods of manufacturing a special type of drill, A and B, are characterised by the following average lifetimes (in continuous work without failure): $\mu_A = 1100$ hours and $\mu_B = 1300$ hours. Both methods have the same standard deviation of the lifetime, $\sigma = 270$ hours. A new manufacturer of the same type of drill claims that his brand is of a quality identical to the best one, B, and has lower manufacturing costs. In order to assess this claim, a sample of 12 drills of the new brand was tested and yielded an average lifetime of $\bar{x} = 1260$ hours. The interesting hypothesis to be analysed is that there is no difference between the new brand and the old brand B. We call it the null hypothesis and represent it by $H_0$. Denoting by $\mu$ the average lifetime of the new brand, we then formalise the test as:

$$H_0: \mu = \mu_B = 1300.$$

$$H_1: \mu = \mu_A = 1100.$$

Hypothesis $H_1$ is a so-called alternative hypothesis. There can be many alternative hypotheses, corresponding to $\mu \neq \mu_B$. However, for the time being, we assume that $\mu = \mu_A$ is the only interesting alternative hypothesis. We also assume that the lifetime of the drills, $X$, for all the brands, follows a normal distribution¹ with the same standard deviation $\sigma$. We know, therefore, that the sampling distribution of $\bar{X}$ is also normal with the following standard error (see sections 3.2 and A.8.4):

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{270}{\sqrt{12}} = 77.94.$$

The sampling distributions (pdf's) corresponding to both hypotheses are shown in Figure 4.1. We seek a procedure to decide whether the 12-drill sample provides statistically significant evidence leading to the acceptance of the null hypothesis $H_0$. Given the symmetry of the distributions, a "common sense" approach would lead us to establish a decision threshold, $x_\alpha$, halfway between $\mu_A$ and $\mu_B$, i.e. $x_\alpha = 1200$ hours, and decide $H_0$ if $\bar{x} > 1200$, decide $H_1$ if $\bar{x} < 1200$, and decide arbitrarily if $\bar{x} = 1200$.

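Figure 4.1. Sampling distribution (pdf) of $\bar{X}$ for the null and the alternative hypotheses. The threshold $x_\alpha$ separates the region where $H_1$ is accepted (left) from the region where $H_0$ is accepted (right).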

Let us consider the four possible situations according to the truth of the null hypothesis and the conclusion drawn from the test, as shown in Figure 4.2. For the decision threshold $x_\alpha = 1200$ shown in Figure 4.1, the probabilities of the two kinds of wrong decision, $\alpha$ (rejecting $H_0$ when it is true) and $\beta$ (accepting $H_0$ when $H_1$ is true), are equal by symmetry:

$$\alpha = \beta = P\left(Z \le \frac{1200 - 1300}{77.94}\right) = N_{0,1}(-1.283) = 0.10,$$

where $Z$ is a random variable with standardised normal distribution.

¹ Strictly speaking, the lifetime of the drills cannot follow a normal distribution, since $X > 0$. Also, as discussed in chapter 9, lifetime distributions are usually skewed. We assume, however, in this example, the distribution to be well approximated by the normal law.


Values of a normal random variable, standardised by subtracting the mean and dividing by the standard deviation, are called z-scores. In this case, the test errors α and β are evaluated using the z-score, −1.283.
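The error probabilities for the halfway threshold can be checked numerically. The following is a minimal sketch using only Python's standard library (`statistics.NormalDist`); the variable names are illustrative and do not come from the text.

```python
from math import sqrt
from statistics import NormalDist

mu_a, mu_b = 1100.0, 1300.0    # mean lifetimes under H1 and H0 (hours)
sigma, n = 270.0, 12           # known standard deviation and sample size

se = sigma / sqrt(n)           # standard error of the sample mean
threshold = (mu_a + mu_b) / 2  # "common sense" threshold, halfway between the means

# z-score of the threshold under H0
z = (threshold - mu_b) / se

# Type I error: the sample mean falls below the threshold although H0 is true.
alpha = NormalDist().cdf(z)
# Type II error: the sample mean falls above the threshold although H1 is true.
beta = 1 - NormalDist(mu_a, se).cdf(threshold)

print(f"se = {se:.2f}, z = {z:.3f}, alpha = {alpha:.3f}, beta = {beta:.3f}")
```

Up to rounding, the printed values should agree with the standard error 77.94, the z-score −1.283 and the error probabilities of about 0.10 quoted above.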

In hypothesis tests, one usually wants the probability of wrongly rejecting the null hypothesis to be low; in other words, one wants to set a low value for the following Type I Error:

Type I Error: $\alpha = P(H_0 \text{ is true and, based on the test, we reject } H_0)$.

This is the so-called level of significance of the test. The complement, 1– α, is the confidence level. A popular value for the level of significance that we will use throughout the book is α = 0.05, often given in percentage, α = 5%. Knowing the α percentile of the standard normal distribution, one can easily determine the decision threshold for this level of significance:

$$P(Z \le -1.64) = 0.05 \;\Rightarrow\; x_\alpha = 1300 - 1.64 \times 77.94 = 1172.2.$$
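As a rough check of this threshold, the sketch below obtains the 5% percentile from `statistics.NormalDist.inv_cdf` instead of a table. The variable names are illustrative assumptions. Because the unrounded percentile (about −1.645) is used, the threshold comes out near 1171.8 rather than the 1172.2 obtained with the rounded value −1.64.

```python
from math import sqrt
from statistics import NormalDist

mu_b, sigma, n = 1300.0, 270.0, 12   # H0 mean, known sigma, sample size
alpha = 0.05                         # chosen level of significance

se = sigma / sqrt(n)                   # standard error of the sample mean
z_alpha = NormalDist().inv_cdf(alpha)  # alpha percentile of N(0, 1)
x_alpha = mu_b + z_alpha * se          # left-tail decision threshold

x_bar = 1260.0               # observed sample mean from the example
reject_h0 = x_bar < x_alpha  # True only if x_bar lies in the critical region

print(f"z_alpha = {z_alpha:.3f}, x_alpha = {x_alpha:.1f}, reject H0: {reject_h0}")
```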

                Decision: accept $H_0$     Decision: accept $H_1$
Reality: $H_0$  Correct decision           Type I Error ($\alpha$)
Reality: $H_1$  Type II Error ($\beta$)    Correct decision

Figure 4.2. Types of error in hypothesis testing according to the reality and the decision drawn from the test.

Figure 4.3. The critical region for a significance level of $\alpha = 5\%$.

4 Parametric Tests of Hypotheses

Figure 4.3 shows the situation for this new decision threshold, which delimits the so-called critical region of the test, the region corresponding to a Type I Error. Since the computed sample mean for the new brand of drills, $\bar{x} = 1260$, falls in the non-critical region, we accept the null hypothesis at that level of significance (5%). In adopting this procedure, we expect that, using it in a long run of sample-based tests under identical conditions, we would erroneously reject $H_0$ about 5% of the time. In general, let us denote by $C$ the critical region. If, as happens in Figure 4.1 or 4.3, $\bar{x} \notin C$, we may say that "we accept the null hypothesis at that level of significance"; otherwise, we reject it.
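The long-run reading of the 5% level can be illustrated with a small simulation. This is only a sketch under the assumptions of the example (normal lifetimes, known standard deviation, $H_0$ true); the number of trials and the seed are arbitrary choices, not values from the text.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(0)  # fixed seed so that the run is repeatable

mu_b, sigma, n, alpha = 1300.0, 270.0, 12, 0.05
# Left-tail decision threshold for the chosen level of significance.
x_alpha = mu_b + NormalDist().inv_cdf(alpha) * sigma / sqrt(n)

trials = 20_000
rejections = 0
for _ in range(trials):
    # Draw a 12-drill sample from a population for which H0 is true.
    sample = [random.gauss(mu_b, sigma) for _ in range(n)]
    if fmean(sample) < x_alpha:  # sample mean falls in the critical region
        rejections += 1

rate = rejections / trials       # empirical Type I error rate
print(f"fraction of wrong rejections: {rate:.3f}")
```

The printed fraction should be close to 0.05, with small deviations due to sampling.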

Notice, however, that there is a non-null probability that a value as large as $\bar{x}$ could be obtained by type A drills, as expressed by the non-null $\beta$. Also, when we consider a wider range of alternative hypotheses, for instance $\mu < \mu_B$, there is always a possibility that a brand of drills has a mean lifetime lower than $\mu_B$ yet sufficiently close to it to yield, with high probability, sample means falling in the non-critical region. For these reasons, it is often advisable to adopt a conservative attitude, stating that "there is no evidence to reject the null hypothesis at the $\alpha$ level of significance".

Any test procedure assessing whether or not $H_0$ should be rejected can be summarised as follows:

1. Choose a suitable test statistic $t_n(\mathbf{x})$, dependent on the n-dimensional sample $\mathbf{x} = [x_1, x_2, \ldots, x_n]'$, considered a value of a random variable, $T \equiv t_n(\mathbf{X})$, where $\mathbf{X}$ denotes the n-dimensional random variable associated with the sampling process.

2. Choose a level of significance $\alpha$ and use it together with the sampling distribution of $T$ in order to determine the critical region $C$ for $H_0$.

3. Test decision: If $t_n(\mathbf{x}) \in C$, then reject $H_0$, otherwise do not reject $H_0$. In the first case, the test is said to be significant (at level $\alpha$); in the second case, the test is non-significant.
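The three steps above can be sketched as a small function for the special case of the drill example: a left-tailed test of a mean with known standard deviation. The function name `left_tailed_z_test` and the list `lifetimes` are hypothetical; in particular, the individual lifetimes are made-up values chosen only so that their mean is 1260, since the text reports the sample mean but not the individual measurements.

```python
from math import sqrt
from statistics import NormalDist

def left_tailed_z_test(sample, mu_0, sigma, alpha=0.05):
    """Left-tailed test of a mean with known sigma, following steps 1 to 3."""
    n = len(sample)
    # Step 1: test statistic, here the standardised sample mean.
    t = (sum(sample) / n - mu_0) / (sigma / sqrt(n))
    # Step 2: critical region C = (-inf, z_alpha], from the N(0, 1) distribution of T.
    z_alpha = NormalDist().inv_cdf(alpha)
    # Step 3: the test is significant if the statistic falls in C.
    return t, z_alpha, t <= z_alpha

# Hypothetical data (not from the text): 12 lifetimes with mean 1260 hours.
lifetimes = [1100, 1420, 1260, 1350, 1170, 1260,
             1500, 1020, 1310, 1210, 1260, 1260]

t, z_alpha, significant = left_tailed_z_test(lifetimes, mu_0=1300.0, sigma=270.0)
print(f"t = {t:.3f}, z_alpha = {z_alpha:.3f}, significant: {significant}")
```

Returning the statistic and the boundary of the critical region together with the decision keeps the three steps visible to the caller; other tests would change only the statistic and the sampling distribution used in step 2.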

Frequently, instead of determining the critical region, we may determine the probability of obtaining a deviation from the value of the statistic expected under $H_0$ at least as large as the observed one, i.e., $p = P(T \ge t_n(\mathbf{x}))$ or $p = P(T \le t_n(\mathbf{x}))$. The probability $p$ is the so-called observed level of significance. The value of $p$ is then compared with a pre-set level of significance. This is the procedure used by statistical software products. For the previous example, the test statistic is:

$$t_{12}(\mathbf{x}) = \frac{\operatorname{mean}(\mathbf{x}) - 1300}{\sigma_{\bar{X}}} = \frac{\bar{x} - 1300}{77.94},$$

which, given the normality of $\bar{X}$, has a sampling distribution identical to the standard normal distribution, i.e., $T = Z \sim N_{0,1}$. A deviation at least as large as the observed one in the left tail of the distribution has the observed significance:


$$p = P\left(Z \le \frac{\bar{x} - \mu_B}{\sigma_{\bar{X}}}\right) = P\left(Z \le \frac{1260 - 1300}{77.94}\right) = 0.304.$$

If we are basing our conclusions on a 5% level of significance, then, since $p > 0.05$, we have no evidence to reject the null hypothesis. Note that until now we have assumed that we knew the true value of the standard deviation. This, however, is seldom the case. As already discussed in the previous chapter, when using the sample standard deviation, and maintaining the assumption of normality of the random variable, one must use the Student's t distribution. This is the usual procedure, also followed by statistical software products, where these parametric tests of means are called t tests.
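The observed significance of the drill example can be reproduced as follows. This is a minimal sketch for the known-$\sigma$ case only, using `statistics.NormalDist` from the Python standard library; the variable names are illustrative. The t test mentioned above is not shown, since it would require the sample standard deviation, which the example does not report.

```python
from math import sqrt
from statistics import NormalDist

mu_b, sigma, n = 1300.0, 270.0, 12  # H0 mean, known sigma, sample size
x_bar, alpha = 1260.0, 0.05         # observed sample mean, pre-set level

z = (x_bar - mu_b) / (sigma / sqrt(n))  # observed value of the test statistic
p = NormalDist().cdf(z)                 # left-tail probability P(Z <= z)

reject_h0 = p < alpha                   # compare p with the pre-set level
print(f"z = {z:.3f}, p = {p:.3f}, reject H0: {reject_h0}")
```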