6.2 Type I and Type II errors
Jerzy Neyman and E.S. Pearson developed the bulk of the classic approach to hypothesis testing between 1928 and 1933. Neyman and Pearson emphasized hypothesis testing as a procedure for making decisions rather than as a procedure for falsifying hypotheses. In their own words:
Without hoping to know whether each separate hypothesis is true or false, we may search for rules to govern our behavior with regard to them, in following which we insure that, in the long run of experience, we shall not be too often wrong. [?]
They viewed the task of statistical hypothesis testing as similar to the detection of signals in the presence of noise. Let me illustrate this concept with the following analogy. Suppose you are a fire detector. You are hanging from the ceiling of a house and your task is to decide whether the house is on fire. Once in a while you measure the amount of smoke passing through your sensors. If the amount is beyond a critical value you alert the house residents that the house is on fire. Otherwise you stay quiet.
In this analogy the null hypothesis is the theoretical possibility that the house is not on fire and the alternative hypothesis is that the house is on fire. Measuring how much smoke there is out there is the equivalent of conducting an experiment and summarizing the results with some statistic. The information available to us is imperfect, and thus we never know for sure whether the house is or is not on fire. There is a myriad of intervening variables that may randomly change the amount of smoke. Sometimes there is a lot of smoke in the house but there is no fire; sometimes, due to sensor failure, there may be a fire but our sensors do not activate. Because of this we can make two types of mistakes:
1. We can have false alarms: situations where there is no fire but we announce that there is one. This type of error is known as a Type I error. In scientific research, Type I errors occur when scientists reject null hypotheses which in fact are true.
2. We can also miss the fire. This type of error is known as a Type II error. Type II errors occur when scientists do not reject null hypotheses which in fact are false.
Note that for type I errors to happen two things must occur:
[Figure 6.1 here: two probability-density plots of smoke. Upper panel: pdf when the null hypothesis is true, with the region above the criterion marked Type I errors (false alarms). Lower panel: pdf when the null hypothesis is false, with the region below the criterion marked Type II errors (misses). The smoke axis is divided into "Alarm Off" and "Alarm On" regions.]
Figure 6.1: An illustration of the process of statistical hypothesis testing. The upper figure shows the distribution of smoke when there is no fire. The lower figure shows the distribution when there is fire. It can be seen that on average there is more smoke when there is fire but there is overlap between the two conditions. The task of hypothesis testing is to decide whether there is a fire based only on the amount of smoke measured by the sensor.
(1) The null hypothesis must be true.
(2) We reject it.
and for type II errors to happen two things must occur:
(1) The null hypothesis must be false.
(2) We do not reject it.
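The smoke-detector analogy can be made concrete with a small Monte Carlo sketch. The numbers below (the smoke distributions and the alarm threshold) are illustrative assumptions, not values from the text: smoke is modeled as Gaussian, with a higher mean when there is a fire.

```python
# A minimal simulation of the smoke-detector analogy. All parameters are
# hypothetical: no-fire smoke ~ Normal(2, 1), fire smoke ~ Normal(5, 1).
import random

random.seed(1)
N = 100_000          # trials per condition
THRESHOLD = 3.0      # critical amount of smoke that triggers the alarm

no_fire = [random.gauss(2.0, 1.0) for _ in range(N)]  # null hypothesis true
fire    = [random.gauss(5.0, 1.0) for _ in range(N)]  # null hypothesis false

# Type I error: there is no fire, but the smoke exceeds the threshold
false_alarm_rate = sum(s > THRESHOLD for s in no_fire) / N
# Type II error: there is a fire, but the smoke stays below the threshold
miss_rate = sum(s <= THRESHOLD for s in fire) / N

print(f"estimated Type I error rate (false alarms): {false_alarm_rate:.3f}")
print(f"estimated Type II error rate (misses):      {miss_rate:.3f}")
```

Because the two distributions overlap, no threshold eliminates both kinds of error: sliding the threshold up trades false alarms for misses, and vice versa.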
6.2.1 Specifications of a decision system
The performance of a decision system can be specified in terms of its potential to make errors when the null hypothesis is true and when the null hypothesis is false.
• The Type I error specification is the probability of making errors when the null hypothesis is true. This specification is commonly represented with the symbol α. For example, if we say that a test has α ≤ 0.05 we guarantee that if the null hypothesis is true the test will be wrong at most 1 time in 20.
• The Type II error specification is the probability of making errors when the null hypothesis is false. This specification is commonly represented with the symbol β. For example, if we say that for a test β is unknown, we mean that we cannot guarantee how the test will behave when the null hypothesis is actually false.
• The power specification is the probability of correctly rejecting the null hypothesis when it is false. Thus the power specification is 1.0 − β.

The current standard in the empirical sciences dictates that for a scientific test to be acceptable the Type I error specification has to be smaller than or equal to 1/20 (i.e., α ≤ 0.05). The standard does not dictate what the Type II error specification should be. In the next chapter we will see examples of statistical tests that meet this Type I error specification.
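Under the (assumed) Gaussian smoke model from the analogy, the three specifications follow directly from the decision threshold, and the trade-off between them can be computed with the normal CDF. The distribution parameters and thresholds below are hypothetical choices for illustration:

```python
# A sketch of how a decision threshold fixes alpha, beta, and power,
# assuming Gaussian smoke: no fire ~ Normal(2, 1), fire ~ Normal(5, 1).
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ Normal(mu, sigma)."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

MU_NULL, MU_ALT, SIGMA = 2.0, 5.0, 1.0   # hypothetical parameters

def specifications(threshold):
    alpha = 1.0 - normal_cdf(threshold, MU_NULL, SIGMA)  # Type I spec
    beta = normal_cdf(threshold, MU_ALT, SIGMA)          # Type II spec
    power = 1.0 - beta                                   # power spec
    return alpha, beta, power

# Raising the threshold lowers alpha but raises beta (i.e., lowers power)
for t in (3.0, 3.6449, 4.5):
    a, b, p = specifications(t)
    print(f"threshold={t:6.4f}  alpha={a:.4f}  beta={b:.4f}  power={p:.4f}")
```

With these assumed distributions, a threshold of about 3.6449 (the null mean plus 1.6449 standard deviations) yields α ≈ 0.05, matching the conventional standard, while β and the power fall out of the same calculation.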