Normal Approximation to the Binomial

6.5 Normal Approximation to the Binomial

  Chapter 6 Some Continuous Probability Distributions

  Probabilities associated with binomial experiments are readily obtainable from the formula b(x; n, p) of the binomial distribution or from Table A.1 when n is small. In addition, binomial probabilities are readily available in many computer software packages. However, it is instructive to learn the relationship between the binomial and the normal distribution. In Section 5.5, we illustrated how the Poisson distribution can be used to approximate binomial probabilities when n is quite large and p is very close to 0 or 1. Both the binomial and the Poisson distributions are discrete. The first application of a continuous probability distribution to approximate probabilities over a discrete sample space was demonstrated in Example 6.12, where the normal curve was used.

  The normal distribution is often a good approximation to a discrete distribution when the latter takes on a symmetric bell shape. From a theoretical point of view, some distributions converge to the normal as their parameters approach certain limits. The normal distribution is a convenient approximating distribution because the cumulative distribution function is so easily tabulated. The binomial distribution is nicely approximated by the normal in practical problems when one works with the cumulative distribution function. We now state a theorem that allows us to use areas under the normal curve to approximate binomial probabilities when n is sufficiently large.

  Theorem 6.3: If X is a binomial random variable with mean μ = np and variance σ² = npq, then the limiting form of the distribution of

  Z = (X − np) / √(npq),

  as n → ∞, is the standard normal distribution n(z; 0, 1).
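  A numerical sketch can make the limiting statement of Theorem 6.3 concrete. The Python snippet below (standard library only) fixes p = 0.4 and, for increasing n, measures the largest gap between the exact distribution function of Z = (X − np)/√(npq) and the standard normal distribution function at the points where the binomial distribution function jumps. The function names and the particular values of n are illustrative assumptions, not part of the text.

```python
from math import comb, erf, sqrt

def binom_pmf(k, n, p):
    """Exact binomial probability b(k; n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def std_normal_cdf(z):
    """Standard normal distribution function P(Z <= z), via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def max_cdf_gap(n, p):
    """Largest |P(X <= k) - Phi((k - np)/sqrt(npq))| over k = 0, ..., n."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    cumulative, worst = 0.0, 0.0
    for k in range(n + 1):
        cumulative += binom_pmf(k, n, p)
        worst = max(worst, abs(cumulative - std_normal_cdf((k - mu) / sigma)))
    return worst

sizes = (10, 50, 250)  # illustrative sample sizes (an assumption)
gaps = [max_cdf_gap(n, 0.4) for n in sizes]
for n, gap in zip(sizes, gaps):
    print(f"n = {n:3d}: largest gap = {gap:.4f}")
```

  No continuity correction is applied here, since the theorem concerns the standardized variable itself; the gap should shrink as n grows.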

  It turns out that the normal distribution with μ = np and σ² = np(1 − p) not only provides a very accurate approximation to the binomial distribution when n is large and p is not extremely close to 0 or 1 but also provides a fairly good approximation even when n is small and p is reasonably close to 1/2.

  To illustrate the normal approximation to the binomial distribution, we first draw the histogram for b(x; 15, 0.4) and then superimpose the particular normal curve having the same mean and variance as the binomial variable X. Hence, we draw a normal curve with

  μ = np = (15)(0.4) = 6 and σ² = npq = (15)(0.4)(0.6) = 3.6.

  The histogram of b(x; 15, 0.4) and the corresponding superimposed normal curve, which is completely determined by its mean and variance, are illustrated in Figure 6.22.

  Figure 6.22: Normal approximation of b(x; 15, 0.4).


  The exact probability that the binomial random variable X assumes a given value x is equal to the area of the bar whose base is centered at x. For example, the exact probability that X assumes the value 4 is equal to the area of the rectangle with base centered at x = 4. Using Table A.1, we find this area to be

  P (X = 4) = b(4; 15, 0.4) = 0.1268,

  which is approximately equal to the area of the shaded region under the normal curve between the two ordinates x₁ = 3.5 and x₂ = 4.5 in Figure 6.23. Converting to z values with σ = √3.6 = 1.897, we have

  z₁ = (3.5 − 6)/1.897 = −1.32 and z₂ = (4.5 − 6)/1.897 = −0.79.

  Figure 6.23: Normal approximation of b(4; 15, 0.4) and Σ_{x=7}^{9} b(x; 15, 0.4).

  If X is a binomial random variable and Z a standard normal variable, then

  P (X = 4) = b(4; 15, 0.4) ≈ P (−1.32 < Z < −0.79)

  = P (Z < −0.79) − P (Z < −1.32) = 0.2148 − 0.0934 = 0.1214.

  This agrees very closely with the exact value of 0.1268.
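  The comparison above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the helper name std_normal_cdf is an assumption introduced for illustration, and it evaluates the normal distribution function through the error function instead of Table A.3.

```python
from math import comb, erf, sqrt

def std_normal_cdf(z):
    """Standard normal distribution function P(Z <= z), via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

n, p, x = 15, 0.4, 4
mu = n * p                      # np = 6
sigma = sqrt(n * p * (1 - p))   # sqrt(3.6), about 1.897

# Exact binomial probability b(4; 15, 0.4).
exact = comb(n, x) * p**x * (1 - p)**(n - x)

# Normal area between the ordinates x - 0.5 = 3.5 and x + 0.5 = 4.5.
z1 = (x - 0.5 - mu) / sigma
z2 = (x + 0.5 - mu) / sigma
approx = std_normal_cdf(z2) - std_normal_cdf(z1)

print(f"exact  b(4; 15, 0.4) = {exact:.4f}")
print(f"normal approximation = {approx:.4f}")
```

  Because the z values are not rounded to two decimals here, the approximation may differ slightly in the third decimal place from the table-based value 0.1214.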

  The normal approximation is most useful in calculating binomial sums for large values of n. Referring to Figure 6.23, we might be interested in the probability that X assumes a value from 7 to 9 inclusive. The exact probability is given by

  P(7 ≤ X ≤ 9) = Σ_{x=0}^{9} b(x; 15, 0.4) − Σ_{x=0}^{6} b(x; 15, 0.4) = 0.9662 − 0.6098 = 0.3564,

  which is equal to the sum of the areas of the rectangles with bases centered at x = 7, 8, and 9. For the normal approximation, we find the area of the shaded region under the curve between the ordinates x₁ = 6.5 and x₂ = 9.5 in Figure 6.23. The corresponding z values are

  z₁ = (6.5 − 6)/1.897 = 0.26 and z₂ = (9.5 − 6)/1.897 = 1.85.


  Now,

  P (7 ≤ X ≤ 9) ≈ P (0.26 < Z < 1.85) = P (Z < 1.85) − P (Z < 0.26)

  = 0.9678 − 0.6026 = 0.3652.

  Once again, the normal curve approximation provides a value that agrees very closely with the exact value of 0.3564. The degree of accuracy, which depends on how well the curve fits the histogram, will increase as n increases. This is particularly true when p is not very close to 1/2 and the histogram is no longer symmetric. Figures 6.24 and 6.25 show the histograms for b(x; 6, 0.2) and b(x; 15, 0.2), respectively. It is evident that a normal curve would fit the histogram considerably better when n = 15 than when n = 6.
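  Both claims in this paragraph can be checked numerically. The sketch below, in standard-library Python, first recomputes P(7 ≤ X ≤ 9) for b(x; 15, 0.4) exactly and by the normal curve, and then compares the worst error of the normal approximation to the cumulative probabilities of b(x; 6, 0.2) and b(x; 15, 0.2). All function names are hypothetical helpers introduced for illustration, and "worst error over the cumulative probabilities" is one assumed way of quantifying how well the curve fits the histogram.

```python
from math import comb, erf, sqrt

def binom_pmf(k, n, p):
    """Exact binomial probability b(k; n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def std_normal_cdf(z):
    """Standard normal distribution function P(Z <= z), via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def exact_interval(a, b, n, p):
    """Exact P(a <= X <= b) as a sum of binomial probabilities."""
    return sum(binom_pmf(k, n, p) for k in range(a, b + 1))

def approx_interval(a, b, n, p):
    """Normal area between the ordinates a - 0.5 and b + 0.5."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    return (std_normal_cdf((b + 0.5 - mu) / sigma)
            - std_normal_cdf((a - 0.5 - mu) / sigma))

def worst_cdf_error(n, p):
    """Largest |P(X <= k) - normal area left of k + 0.5| over k = 0, ..., n."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    cumulative, worst = 0.0, 0.0
    for k in range(n + 1):
        cumulative += binom_pmf(k, n, p)
        approx = std_normal_cdf((k + 0.5 - mu) / sigma)
        worst = max(worst, abs(cumulative - approx))
    return worst

exact_7_9 = exact_interval(7, 9, 15, 0.4)
approx_7_9 = approx_interval(7, 9, 15, 0.4)
err_n6 = worst_cdf_error(6, 0.2)
err_n15 = worst_cdf_error(15, 0.2)

print(f"P(7 <= X <= 9): exact {exact_7_9:.4f}, normal {approx_7_9:.4f}")
print(f"worst error, b(x; 6, 0.2):  {err_n6:.4f}")
print(f"worst error, b(x; 15, 0.2): {err_n15:.4f}")
```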


  Figure 6.24: Histogram for b(x; 6, 0.2).

  Figure 6.25: Histogram for b(x; 15, 0.2).

  In our illustrations of the normal approximation to the binomial, it became apparent that if we seek the area under the normal curve to the left of, say, x, it is more accurate to use x + 0.5. This is a correction to accommodate the fact that a discrete distribution is being approximated by a continuous distribution. The correction +0.5 is called a continuity correction. The foregoing discussion leads to the following formal normal approximation to the binomial.

  Normal Approximation to the Binomial: Let X be a binomial random variable with parameters n and p. For large n, X has approximately a normal distribution with μ = np and σ² = npq = np(1 − p) and

  P(X ≤ x) = Σ_{k=0}^{x} b(k; n, p) ≈ area under normal curve to the left of x + 0.5 = P(Z ≤ (x + 0.5 − np)/√(npq)),

  and the approximation will be good if np and n(1 − p) are greater than or equal to 5.
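  The rule just stated translates directly into code. The following Python sketch is one possible rendering, not a library routine: the names rule_of_thumb_ok and normal_approx_cdf are assumptions made for illustration, and the normal distribution function is computed from the error function in the standard library.

```python
from math import erf, sqrt

def std_normal_cdf(z):
    """Standard normal distribution function P(Z <= z), via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def rule_of_thumb_ok(n, p):
    """True when both np and n(1 - p) are at least 5."""
    return n * p >= 5 and n * (1 - p) >= 5

def normal_approx_cdf(x, n, p):
    """Approximate P(X <= x) by the normal area to the left of x + 0.5."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return std_normal_cdf((x + 0.5 - mu) / sigma)

# b(x; 15, 0.4) satisfies the rule (np = 6, nq = 9); n = 10, p = 0.05 does not.
print(rule_of_thumb_ok(15, 0.4), round(normal_approx_cdf(9, 15, 0.4), 4))
print(rule_of_thumb_ok(10, 0.05))
```

  Keeping the rule-of-thumb check separate from the approximation lets a caller decide whether to fall back on the exact binomial sum when the check fails.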

  As we indicated earlier, the quality of the approximation is quite good for large n. If p is close to 1/2, a moderate or small sample size will be sufficient for a reasonable approximation. We offer Table 6.1 as an indication of the quality of the approximation. Both the normal approximation and the true binomial cumulative probabilities are given. Notice that at p = 0.05 and p = 0.10, the approximation is fairly crude for n = 10. However, even for n = 10, note the improvement for p = 0.50. On the other hand, when p is fixed at p = 0.05, note the improvement of the approximation as we go from n = 20 to n = 100.
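  A comparison of this kind can be regenerated by computation. The Python sketch below prints, for each r, the exact cumulative binomial probability next to its continuity-corrected normal approximation for the n = 10 cases named above, and then summarizes the worst error for p = 0.05 at n = 20 and n = 100. The function names and the output layout are assumptions for illustration, and the printed values are computed here, not copied from Table 6.1.

```python
from math import comb, erf, sqrt

def std_normal_cdf(z):
    """Standard normal distribution function P(Z <= z), via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def cumulative_rows(n, p):
    """Rows (r, exact P(X <= r), normal approximation) for r = 0, ..., n."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    rows, cumulative = [], 0.0
    for r in range(n + 1):
        cumulative += comb(n, r) * p**r * (1 - p)**(n - r)
        rows.append((r, cumulative, std_normal_cdf((r + 0.5 - mu) / sigma)))
    return rows

def worst_error(n, p):
    """Largest absolute difference between the two columns."""
    return max(abs(exact - approx) for _, exact, approx in cumulative_rows(n, p))

for p in (0.05, 0.10, 0.50):
    print(f"p = {p:.2f}, n = 10")
    print(" r  Binomial  Normal")
    for r, exact, approx in cumulative_rows(10, p):
        print(f"{r:2d}  {exact:8.4f}  {approx:6.4f}")

for n in (20, 100):
    print(f"p = 0.05, n = {n:3d}: worst error = {worst_error(n, 0.05):.4f}")
```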

  Table 6.1: Normal Approximation and True Cumulative Binomial Probabilities

  r | p = 0.05, n = 10: Binomial, Normal | p = 0.10, n = 10: Binomial, Normal | p = 0.50, n = 10: Binomial, Normal

  Example 6.15: The probability that a patient recovers from a rare blood disease is 0.4. If 100 people are known to have contracted this disease, what is the probability that fewer than 30 survive?

  Solution: Let the binomial variable X represent the number of patients who survive. Since n = 100, we should obtain fairly accurate results using the normal-curve approximation with

  μ = np = (100)(0.4) = 40 and σ = √(npq) = √((100)(0.4)(0.6)) = 4.899.

  To obtain the desired probability, we have to find the area to the left of x = 29.5. The z value corresponding to 29.5 is

  z = (29.5 − 40)/4.899 = −2.14,

  and the probability of fewer than 30 of the 100 patients surviving is given by the shaded region in Figure 6.26. Hence,

  P (X < 30) ≈ P (Z < −2.14) = 0.0162.
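  As a check on Example 6.15, the short Python sketch below computes the z value and the normal area to the left of 29.5, together with the exact binomial sum for comparison. It uses only the standard library; the helper std_normal_cdf is an assumed name, and the exact sum is an addition for comparison that the example itself does not compute.

```python
from math import comb, erf, sqrt

def std_normal_cdf(z):
    """Standard normal distribution function P(Z <= z), via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

n, p = 100, 0.4
mu = n * p                      # np = 40
sigma = sqrt(n * p * (1 - p))   # sqrt(24), about 4.899

# "Fewer than 30" means X <= 29, so the corrected boundary is 29.5.
z = (29.5 - mu) / sigma
approx = std_normal_cdf(z)

# Exact P(X <= 29) from the binomial formula, for comparison.
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(30))

print(f"z = {z:.2f}")
print(f"normal approximation = {approx:.4f}")
print(f"exact binomial sum   = {exact:.4f}")
```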

  Figure 6.26: Area for Example 6.15.

  Figure 6.27: Area for Example 6.16.

  Example 6.16: A multiple-choice quiz has 200 questions, each with 4 possible answers of which only 1 is correct. What is the probability that sheer guesswork yields from 25 to 30 correct answers for the 80 of the 200 problems about which the student has no knowledge?

  Solution: The probability of guessing a correct answer for each of the 80 questions is p = 1/4.

  If X represents the number of correct answers resulting from guesswork, then

  P(25 ≤ X ≤ 30) = Σ_{x=25}^{30} b(x; 80, 1/4).

  Using the normal curve approximation with

  μ = np = (80)(1/4) = 20 and σ = √(npq) = √((80)(1/4)(3/4)) = 3.873,

  we need the area between x₁ = 24.5 and x₂ = 30.5. The corresponding z values are

  z₁ = (24.5 − 20)/3.873 = 1.16 and z₂ = (30.5 − 20)/3.873 = 2.71.

  The probability of correctly guessing from 25 to 30 questions is given by the shaded region in Figure 6.27. From Table A.3 we find that

  P(25 ≤ X ≤ 30) = Σ_{x=25}^{30} b(x; 80, 0.25) ≈ P(1.16 < Z < 2.71)

  = P(Z < 2.71) − P(Z < 1.16) = 0.9966 − 0.8770 = 0.1196.
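  The calculation in Example 6.16 can be verified in the same way. The Python sketch below evaluates the normal area between 24.5 and 30.5 and, as an added comparison not carried out in the example, the exact binomial sum from x = 25 to x = 30. The helper name std_normal_cdf is an assumption, and only the standard library is used.

```python
from math import comb, erf, sqrt

def std_normal_cdf(z):
    """Standard normal distribution function P(Z <= z), via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

n, p = 80, 0.25
mu = n * p                      # np = 20
sigma = sqrt(n * p * (1 - p))   # sqrt(15), about 3.873

# Continuity-corrected boundaries for 25 <= X <= 30.
z1 = (24.5 - mu) / sigma
z2 = (30.5 - mu) / sigma
approx = std_normal_cdf(z2) - std_normal_cdf(z1)

# Exact sum of b(x; 80, 0.25) for x = 25, ..., 30.
exact = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(25, 31))

print(f"z1 = {z1:.2f}, z2 = {z2:.2f}")
print(f"normal approximation = {approx:.4f}")
print(f"exact binomial sum   = {exact:.4f}")
```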

  Exercises

  6.24 A coin is tossed 400 times. Use the normal curve approximation to find the probability of obtaining
  (a) between 185 and 210 heads inclusive;
  (b) exactly 205 heads;
  (c) fewer than 176 or more than 227 heads.

  6.25 A process for manufacturing an electronic component yields items of which 1% are defective. A quality control plan is to select 100 items from the process, and if none are defective, the process continues. Use the normal approximation to the binomial to find
  (a) the probability that the process continues given the sampling plan described;
  (b) the probability that the process continues even if the process has gone bad (i.e., if the frequency of defective components has shifted to 5.0% defective).

  6.26 A process yields 10% defective items. If 100 items are randomly selected from the process, what is the probability that the number of defectives
  (a) exceeds 13?
  (b) is less than 8?

  6.27 The probability that a patient recovers from a delicate heart operation is 0.9. Of the next 100 patients having this operation, what is the probability that
  (a) between 84 and 95 inclusive survive?
  (b) fewer than 86 survive?

  6.28 Researchers at George Washington University and the National Institutes of Health claim that approximately 75% of people believe “tranquilizers work very well to make a person more calm and relaxed.” Of the next 80 people interviewed, what is the probability that
  (a) at least 50 are of this opinion?
  (b) at most 56 are of this opinion?

  6.29 If 20% of the residents in a U.S. city prefer a white telephone over any other color available, what is the probability that among the next 1000 telephones installed in that city
  (a) between 170 and 185 inclusive will be white?
  (b) at least 210 but not more than 225 will be white?

  6.30 A drug manufacturer claims that a certain drug cures a blood disease, on the average, 80% of the time. To check the claim, government testers use the drug on a sample of 100 individuals and decide to accept the claim if 75 or more are cured.
  (a) What is the probability that the claim will be rejected when the cure probability is, in fact, 0.8?
  (b) What is the probability that the claim will be accepted by the government when the cure probability is as low as 0.7?

  6.31 One-sixth of the male freshmen entering a large state school are out-of-state students. If the students are assigned at random to dormitories, 180 to a building, what is the probability that in a given dormitory at least one-fifth of the students are from out of state?

  6.32 A pharmaceutical company knows that approximately 5% of its birth-control pills have an ingredient that is below the minimum strength, thus rendering the pill ineffective. What is the probability that fewer than 10 in a sample of 200 pills will be ineffective?

  6.33 Statistics released by the National Highway Traffic Safety Administration and the National Safety Council show that on an average weekend night, 1 out of every 10 drivers on the road is drunk. If 400 drivers are randomly checked next Saturday night, what is the probability that the number of drunk drivers will be
  (a) less than 32?
  (b) more than 49?
  (c) at least 35 but less than 47?

  6.34 A pair of dice is rolled 180 times. What is the probability that a total of 7 occurs
  (a) at least 25 times?
  (b) between 33 and 41 times inclusive?
  (c) exactly 30 times?

  6.35 A company produces component parts for an engine. Parts specifications suggest that 95% of items meet specifications. The parts are shipped to customers in lots of 100.
  (a) What is the probability that more than 2 items in a given lot will be defective?
  (b) What is the probability that more than 10 items in a lot will be defective?

  6.36 A common practice of airline companies is to sell more tickets for a particular flight than there are seats on the plane, because customers who buy tickets do not always show up for the flight. Suppose that the percentage of no-shows at flight time is 2%. For a particular flight with 197 seats, a total of 200 tickets were sold. What is the probability that the airline overbooked this flight?

  6.37 The serum cholesterol level X in 14-year-old boys has approximately a normal distribution with mean 170 and standard deviation 30.
  (a) Find the probability that the serum cholesterol level of a randomly chosen 14-year-old boy exceeds 230.
  (b) In a middle school there are 300 14-year-old boys. Find the probability that at least 8 boys have a serum cholesterol level that exceeds 230.

  6.38 A telemarketing company has a special letter-opening machine that opens and removes the contents of an envelope. If the envelope is fed improperly into the machine, the contents of the envelope may not be removed or may be damaged. In this case, the machine is said to have “failed.”
  (a) If the machine has a probability of failure of 0.01, what is the probability of more than 1 failure occurring in a batch of 20 envelopes?
  (b) If the probability of failure of the machine is 0.01 and a batch of 500 envelopes is to be opened, what is the probability that more than 8 failures will occur?