Some Statistical Considerations Discrete Random Variables and Probability Distributions

2.6 Some Statistical Considerations

this would be a number of good items from the process, say X, and we would surely use X in some way to estimate p. How precisely can we use X to estimate p? It would appear natural to estimate p by the proportion of good items in the sample, X/n. Since X is a random variable, so is X/n. We can calculate the expected value of this random variable as follows:

$$E\left[\frac{X}{n}\right] = \sum_{x=0}^{n} \frac{x}{n} \cdot P(X = x) = \frac{1}{n} \cdot \sum_{x=0}^{n} x \cdot P(X = x),$$

and since the last sum is E[X] = n·p for a binomial random variable,

$$E\left[\frac{X}{n}\right] = \frac{1}{n} \cdot n \cdot p = p.$$

This indicates that, on average, our estimate for p gives the true value, p. We say that our estimator, X/n, is an unbiased estimator for p.

This gives us a way of estimating p by a single value. This single value depends upon the sample: if we choose another sample, we are likely to find another value of X, and hence arrive at another estimate of p. Could we also find a “likely” range for the value of p?

To answer this, consider a related question. If we have a binomial situation with probability p and sample size n, what is a likely range for the observed values of the random variable X? The answer, of course, depends upon the meaning of the word likely. Suppose that a likely range for the values of a random variable is a range in which the values of the variable occur with probability 0.95. With the considerable aid of our computer algebra system, we can evaluate a number of different binomial distributions. We vary n, the number of observations, and p, the probability of success. In each case we find the proportion of the values of X that lie within two standard deviations of the mean, that is, the proportion of the values of X that lie in the interval

$$\mu \pm 2\sigma = n \cdot p \pm 2\sqrt{n \cdot p \cdot (1 - p)}.$$

We select the constant two because we need a range that includes a large portion (95%) of the values of X, and two appears to be a reasonable multiplier for the standard deviation. Table 2.6.1 shows the results of these calculations.
Here P represents the probability that an observed value of the random variable X lies in the interval μ ± 2σ = n·p ± 2√(n·p·(1 − p)). The values of n and p have been chosen so that the end-points of the intervals are integers. The table suggests that, regardless of the value of p, at least 95% of the values of the variable X lie in the interval μ ± 2σ. Later we will show that, for large values of n and regardless of the value of p, this probability is approximately 0.9545, a result supported by our calculations. So we have

$$P(\mu - 2\sigma \le X \le \mu + 2\sigma) \ge 0.95. \tag{2.5}$$

Solving the inequalities for μ (the inequality μ − 2σ ≤ X is equivalent to μ ≤ X + 2σ, and X ≤ μ + 2σ is equivalent to X − 2σ ≤ μ), we have

$$P(X - 2\sigma \le \mu \le X + 2\sigma) \ge 0.95. \tag{2.6}$$
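The exact calculations in this section can also be reproduced outside a computer algebra system. The following is a minimal Python sketch, assuming only the standard library (`fractions.Fraction`, `math.comb`); the function names are our own. It confirms that E[X/n] = p holds exactly and computes P(μ − 2σ ≤ X ≤ μ + 2σ) for a few (n, p) pairs of our choosing whose end-points are integers. These pairs are illustrative only and are not claimed to be the entries of Table 2.6.1.

```python
from fractions import Fraction
from math import comb


def binom_pmf(x: int, n: int, p: Fraction) -> Fraction:
    """Exact P(X = x) for X ~ Binomial(n, p), using rational arithmetic."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def mean_of_proportion(n: int, p: Fraction) -> Fraction:
    """Exact E[X/n] = sum over x of (x/n) * P(X = x)."""
    return sum(Fraction(x, n) * binom_pmf(x, n, p) for x in range(n + 1))


def two_sigma_probability(n: int, p: Fraction) -> Fraction:
    """Exact P(mu - 2*sigma <= X <= mu + 2*sigma), with mu = n*p, sigma**2 = n*p*(1 - p)."""
    mu = n * p
    variance = n * p * (1 - p)
    # |x - mu| <= 2*sigma is equivalent to (x - mu)**2 <= 4*sigma**2 (sigma >= 0),
    # so the test stays in exact rational arithmetic with no square root.
    return sum(
        binom_pmf(x, n, p) for x in range(n + 1) if (x - mu) ** 2 <= 4 * variance
    )


if __name__ == "__main__":
    # Illustrative (n, p) pairs whose end-points n*p +/- 2*sigma are integers.
    for n, p in [(16, Fraction(1, 2)), (25, Fraction(1, 5)),
                 (100, Fraction(1, 2)), (100, Fraction(1, 5))]:
        print(f"n={n:4d} p={str(p):>4}  E[X/n]={mean_of_proportion(n, p)}  "
              f"P={float(two_sigma_probability(n, p)):.4f}")
```

Exact rationals are used so that E[X/n] = p appears as an equality rather than up to rounding; for much larger n the rationals grow quickly, and a floating-point probability function becomes the more practical choice.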

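Inequality (2.6) is the form that bears on estimation: the random interval from X − 2σ to X + 2σ contains μ with probability at least 0.95. A second minimal Python sketch, again standard library only and with hypothetical function names, sums binomial probabilities over the values x satisfying x − 2σ ≤ μ ≤ x + 2σ and, for large n, compares the result with the normal value 2Φ(2) − 1 ≈ 0.9545 quoted above, computed as `math.erf(2 / math.sqrt(2))`. Like (2.6) itself, the sketch computes σ from the true p, and the normal comparison is only the large-n approximation that the text promises to justify later.

```python
import math


def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p), computed in log space so large n does not overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
               + x * math.log(p) + (n - x) * math.log1p(-p))
    return math.exp(log_pmf)


def mean_covered_probability(n: int, p: float) -> float:
    """P(X - 2*sigma <= mu <= X + 2*sigma), the probability appearing in (2.6)."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    # Keep exactly those outcomes x whose interval [x - 2*sigma, x + 2*sigma] contains mu.
    return sum(binom_pmf(x, n, p) for x in range(n + 1)
               if x - 2 * sigma <= mu <= x + 2 * sigma)


# Normal-approximation value 2*Phi(2) - 1, using Phi(z) = (1 + erf(z / sqrt(2))) / 2.
NORMAL_TWO_SIGMA = math.erf(2 / math.sqrt(2))

if __name__ == "__main__":
    print(f"normal value: {NORMAL_TWO_SIGMA:.4f}")
    for n, p in [(100, 0.5), (1000, 0.3), (10000, 0.3)]:
        print(f"n={n:6d} p={p}  P={mean_covered_probability(n, p):.4f}")
```

Working with log-probabilities via `math.lgamma` avoids overflowing `math.comb(n, x)` when it is converted to a float for large n.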