The mean squared error of an estimator

K minimizes the mean squared error of this estimator when the population distribution is normal? [Hint: It can be shown that

$$E[(S^2)^2] = \frac{(n + 1)\sigma^4}{n - 1}$$

In general, it is difficult to find $\hat{\theta}$ to minimize $\mathrm{MSE}(\hat{\theta})$, which is why we look only at unbiased estimators and minimize $V(\hat{\theta})$.]

35. Let $X_1, \ldots, X_n$ be a random sample from a pdf that is symmetric about $\mu$. An estimator for $\mu$ that has been found to perform well for a variety of underlying distributions is the Hodges–Lehmann estimator. To define it, first compute for each $i \le j$ and each $j = 1, 2, \ldots, n$ the pairwise average $\bar{X}_{i,j} = (X_i + X_j)/2$. Then the estimator is $\hat{\mu}$ = the median of the $\bar{X}_{i,j}$'s. Compute the value of this estimate using the data of Exercise 44 of Chapter 1. [Hint: Construct a square table with the $x_i$'s listed on the left margin and on top. Then compute averages on and above the diagonal.]

36. When the population distribution is normal, the statistic $\text{median}\{|X_1 - \tilde{X}|, \ldots, |X_n - \tilde{X}|\}/.6745$ can be used to estimate $\sigma$. This estimator is more resistant to the effects of outliers (observations far from the bulk of the data) than is the sample standard deviation. Compute both the corresponding point estimate and $s$ for the data of Example 6.2.

37. When the sample standard deviation $S$ is based on a random sample from a normal population distribution, it can be shown that

$$E(S) = \sqrt{\frac{2}{n - 1}}\,\frac{\Gamma(n/2)\,\sigma}{\Gamma((n - 1)/2)}$$

Use this to obtain an unbiased estimator for $\sigma$ of the form $cS$. What is $c$ when $n = 20$?

38. Each of $n$ specimens is to be weighed twice on the same scale. Let $X_i$ and $Y_i$ denote the two observed weights for the $i$th specimen. Suppose $X_i$ and $Y_i$ are independent of one another, each normally distributed with mean value $\mu_i$ (the true weight of specimen $i$) and variance $\sigma^2$.
a. Show that the maximum likelihood estimator of $\sigma^2$ is $\hat{\sigma}^2 = \sum (X_i - Y_i)^2/(4n)$. [Hint: If $\bar{z} = (z_1 + z_2)/2$, then $\sum (z_i - \bar{z})^2 = (z_1 - z_2)^2/2$.]
b. Is the mle $\hat{\sigma}^2$ an unbiased estimator of $\sigma^2$? Find an unbiased estimator of $\sigma^2$. [Hint: For any rv $Z$, $E(Z^2) = V(Z) + [E(Z)]^2$.
Apply this to $Z = X_i - Y_i$.]

Bibliography

DeGroot, Morris, and Mark Schervish, Probability and Statistics (3rd ed.), Addison-Wesley, Boston, MA, 2002. Includes an excellent discussion of both general properties and methods of point estimation; of particular interest are examples showing how general principles and methods can yield unsatisfactory estimators in particular situations.

Devore, Jay, and Kenneth Berk, Modern Mathematical Statistics with Applications, Thomson-Brooks/Cole, Belmont, CA, 2007. The exposition is a bit more comprehensive and sophisticated than that of the current book.

Efron, Bradley, and Robert Tibshirani, An Introduction to the Bootstrap, Chapman and Hall, New York, 1993. The bible of the bootstrap.

Hoaglin, David, Frederick Mosteller, and John Tukey, Understanding Robust and Exploratory Data Analysis, Wiley, New York, 1983. Contains several good chapters on robust point estimation, including one on M-estimation.

Rice, John, Mathematical Statistics and Data Analysis (3rd ed.), Thomson-Brooks/Cole, Belmont, CA, 2007. A nice blending of statistical theory and data.

Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.

7 Statistical Intervals Based on a Single Sample

INTRODUCTION

A point estimate, because it is a single number, by itself provides no information about the precision and reliability of estimation. Consider, for example, using the statistic $\bar{X}$ to calculate a point estimate for the true average breaking strength (g) of paper towels of a certain brand, and suppose that $\bar{x} = 9322.7$.
Because of sampling variability, it is virtually never the case that $\bar{x} = \mu$. The point estimate says nothing about how close it might be to $\mu$. An alternative to reporting a single sensible value for the parameter being estimated is to calculate and report an entire interval of plausible values: an interval estimate or confidence interval (CI). A confidence interval is always calculated by first selecting a confidence level, which is a measure of the degree of reliability of the interval. A confidence interval with a 95% confidence level for the true average breaking strength might have a lower limit of 9162.5 and an upper limit of 9482.9. Then at the 95% confidence level, any value of $\mu$ between 9162.5 and 9482.9 is plausible. A confidence level of 95% implies that 95% of all samples would give an interval that includes $\mu$, or whatever other parameter is being estimated, and only 5% of all samples would yield an erroneous interval. The most frequently used confidence levels are 95%, 99%, and 90%. The higher the confidence level, the more strongly we believe that the value of the parameter being estimated lies within the interval (an interpretation of any particular confidence level will be given shortly).

Information about the precision of an interval estimate is conveyed by the width of the interval. If the confidence level is high and the resulting interval is quite narrow, our knowledge of the value of the parameter is reasonably precise. A very wide confidence interval, however, gives the message that there is a great deal of uncertainty concerning the value of what we are estimating. Figure 7.1 shows 95% confidence intervals for true average breaking strengths of two different brands of paper towels. One of these intervals suggests precise knowledge about $\mu$, whereas the other suggests a very wide range of plausible values.

Figure 7.1 CIs indicating precise (brand 1) and imprecise (brand 2) information about $\mu$

7.1 Basic Properties of Confidence Intervals

The basic concepts and properties of confidence intervals (CIs) are most easily introduced by first focusing on a simple, albeit somewhat unrealistic, problem situation. Suppose that the parameter of interest is a population mean $\mu$ and that

1. The population distribution is normal
2. The value of the population standard deviation $\sigma$ is known

Normality of the population distribution is often a reasonable assumption. However, if the value of $\mu$ is unknown, it is typically implausible that the value of $\sigma$ would be available (knowledge of a population's center typically precedes information concerning spread). We'll develop methods based on less restrictive assumptions in Sections 7.2 and 7.3.

Example 7.1

Industrial engineers who specialize in ergonomics are concerned with designing workspace and worker-operated devices so as to achieve high productivity and comfort. The article “Studies on Ergonomically Designed Alphanumeric Keyboards” (Human Factors, 1985: 175–187) reports on a study of preferred height for an experimental keyboard with large forearm–wrist support. A sample of $n = 31$ trained typists was selected, and the preferred keyboard height was determined for each typist. The resulting sample average preferred height was $\bar{x} = 80.0$ cm.
Assuming that the preferred height is normally distributed with $\sigma = 2.0$ cm (a value suggested by data in the article), obtain a CI for $\mu$, the true average preferred height for the population of all experienced typists. ■

The actual sample observations $x_1, x_2, \ldots, x_n$ are assumed to be the result of a random sample $X_1, \ldots, X_n$ from a normal distribution with mean value $\mu$ and standard deviation $\sigma$. The results described in Chapter 5 then imply that, irrespective of the sample size $n$, the sample mean $\bar{X}$ is normally distributed with expected value $\mu$ and standard deviation $\sigma/\sqrt{n}$. Standardizing $\bar{X}$ by first subtracting its expected value and then dividing by its standard deviation yields the standard normal variable

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \qquad (7.1)$$

Because the area under the standard normal curve between $-1.96$ and $1.96$ is .95,

$$P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = .95 \qquad (7.2)$$

Now let's manipulate the inequalities inside the parentheses in (7.2) so that they appear in the equivalent form $l < \mu < u$, where the endpoints $l$ and $u$ involve $\bar{X}$ and $\sigma/\sqrt{n}$. This is achieved through the following sequence of operations, each yielding inequalities equivalent to the original ones.

1. Multiply through by $\sigma/\sqrt{n}$:
$$-1.96 \cdot \frac{\sigma}{\sqrt{n}} < \bar{X} - \mu < 1.96 \cdot \frac{\sigma}{\sqrt{n}}$$
2. Subtract $\bar{X}$ from each term:
$$-\bar{X} - 1.96 \cdot \frac{\sigma}{\sqrt{n}} < -\mu < -\bar{X} + 1.96 \cdot \frac{\sigma}{\sqrt{n}}$$
3. Multiply through by $-1$ to eliminate the minus sign in front of $\mu$ (which reverses the direction of each inequality):
$$\bar{X} + 1.96 \cdot \frac{\sigma}{\sqrt{n}} > \mu > \bar{X} - 1.96 \cdot \frac{\sigma}{\sqrt{n}}$$
that is,
$$\bar{X} - 1.96 \cdot \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96 \cdot \frac{\sigma}{\sqrt{n}}$$

The equivalence of each set of inequalities to the original set implies that

$$P\left(\bar{X} - 1.96 \cdot \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96 \cdot \frac{\sigma}{\sqrt{n}}\right) = .95 \qquad (7.3)$$

The event inside the parentheses in (7.3) has a somewhat unfamiliar appearance; previously, the random quantity has appeared in the middle with constants on both ends, as in $a \le Y \le b$. In (7.3) the random quantity appears on the two ends, whereas the unknown constant $\mu$ appears in the middle. To interpret (7.3), think of a random interval having left endpoint $\bar{X} - 1.96 \cdot \sigma/\sqrt{n}$ and right endpoint $\bar{X} + 1.96 \cdot \sigma/\sqrt{n}$. In interval notation, this becomes

$$\left(\bar{X} - 1.96 \cdot \frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96 \cdot \frac{\sigma}{\sqrt{n}}\right) \qquad (7.4)$$

The interval (7.4) is random because the two endpoints of the interval involve a random variable. It is centered at the sample mean $\bar{X}$ and extends $1.96\sigma/\sqrt{n}$ to each side of $\bar{X}$. Thus the interval's width is $2 \cdot (1.96) \cdot \sigma/\sqrt{n}$, which is not random; only the location of the interval (its midpoint $\bar{X}$) is random (Figure 7.2). Now (7.3) can be paraphrased as “the probability is .95 that the random interval (7.4) includes or covers the true value of $\mu$.” Before any experiment is performed and any data is gathered, it is quite likely that $\mu$ will lie inside the interval (7.4).

Figure 7.2 The random interval (7.4) centered at $\bar{X}$

DEFINITION

If, after observing $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$, we compute the observed sample mean $\bar{x}$ and then substitute $\bar{x}$ into (7.4) in place of $\bar{X}$, the resulting fixed interval is called a 95% confidence interval for $\mu$. This CI can be expressed either as

$$\left(\bar{x} - 1.96 \cdot \frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96 \cdot \frac{\sigma}{\sqrt{n}}\right) \text{ is a 95\% CI for } \mu$$

or as

$$\bar{x} - 1.96 \cdot \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96 \cdot \frac{\sigma}{\sqrt{n}} \text{ with 95\% confidence}$$

A concise expression for the interval is $\bar{x} \pm 1.96 \cdot \sigma/\sqrt{n}$, where $-$ gives the left endpoint (lower limit) and $+$ gives the right endpoint (upper limit).

Example 7.2 (Example 7.1 continued)

The quantities needed for computation of the 95% CI for true average preferred height are $\sigma = 2.0$, $n = 31$, and $\bar{x} = 80.0$. The resulting interval is

$$\bar{x} \pm 1.96 \cdot \frac{\sigma}{\sqrt{n}} = 80.0 \pm (1.96)\frac{2.0}{\sqrt{31}} = 80.0 \pm .7 = (79.3, 80.7)$$

That is, we can be highly confident, at the 95% confidence level, that $79.3 < \mu < 80.7$. This interval is relatively narrow, indicating that $\mu$ has been rather precisely estimated. ■

Interpreting a Confidence Level

The confidence level 95% for the interval just defined was inherited from the probability .95 for the random interval (7.4). Intervals having other levels of confidence will be introduced shortly. For now, though, consider how 95% confidence can be interpreted.

Because we started with an event whose probability was .95 (that the random interval (7.4) would capture the true value of $\mu$) and then used the data in Example 7.1 to compute the CI (79.3, 80.7), it is tempting to conclude that $\mu$ is within this fixed interval with probability .95. But by substituting $\bar{x} = 80.0$ for $\bar{X}$, all randomness disappears; the interval (79.3, 80.7) is not a random interval, and $\mu$ is a constant (unfortunately unknown to us). It is therefore incorrect to write the statement $P(\mu \text{ lies in } (79.3, 80.7)) = .95$.
A correct interpretation of “95% confidence” relies on the long-run relative frequency interpretation of probability: To say that an event $A$ has probability .95 is to say that if the experiment on which $A$ is defined is performed over and over again, in the long run $A$ will occur 95% of the time. Suppose we obtain another sample of typists' preferred heights and compute another 95% interval. Then we consider repeating this for a third sample, a fourth sample, a fifth sample, and so on. Let $A$ be the event that $\bar{X} - 1.96 \cdot \sigma/\sqrt{n} < \mu < \bar{X} + 1.96 \cdot \sigma/\sqrt{n}$. Since $P(A) = .95$, in the long run 95% of our computed CIs will contain $\mu$. This is illustrated in Figure 7.3, where the vertical line cuts the measurement axis at the true (but unknown) value of $\mu$. Notice that 7 of the 100 intervals shown fail to contain $\mu$. In the long run, only 5% of the intervals so constructed would fail to contain $\mu$.

According to this interpretation, the confidence level 95% is not so much a statement about any particular interval such as (79.3, 80.7). Instead it pertains to what would happen if a very large number of like intervals were to be constructed using the same CI formula. Although this may seem unsatisfactory, the root of the difficulty lies with our interpretation of probability: it applies to a long sequence of replications of an experiment rather than just a single replication.
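The long-run interpretation can be checked with a small simulation. The sketch below is only illustrative: the "true" mean of 80.0 is an arbitrary value assumed for the simulation, $\sigma = 2.0$ and $n = 31$ are borrowed from Example 7.1, and the function name `coverage_fraction` is hypothetical. It draws repeated normal samples, forms the interval $\bar{x} \pm 1.96\sigma/\sqrt{n}$ each time, and reports the fraction of intervals that contain the true mean.

```python
import math
import random


def coverage_fraction(mu, sigma, n, num_intervals, z=1.96, seed=0):
    """Simulate num_intervals samples of size n from a normal population
    and return the fraction of intervals xbar +/- z*sigma/sqrt(n) that
    contain the true mean mu."""
    rng = random.Random(seed)  # fixed seed so a run is repeatable
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(num_intervals):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / num_intervals


if __name__ == "__main__":
    # ASSUMPTION: mu = 80.0 is an arbitrary "true" mean chosen for the
    # simulation; sigma = 2.0 and n = 31 are borrowed from Example 7.1.
    for count in (100, 10_000):
        print(count, coverage_fraction(80.0, 2.0, 31, count))
```

With only 100 intervals the observed fraction can easily differ from .95 by a few percentage points, much as 7 of the 100 intervals in Figure 7.3 miss $\mu$; with many more intervals the fraction should settle close to .95.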
There is another approach to the construction and interpretation of CIs that uses the notion of subjective probability and Bayes' theorem, but the technical details are beyond the scope of this text; the book by DeGroot, et al. (see the Chapter 6 bibliography) is a good source. The interval presented here (as well as each interval presented subsequently) is called a “classical” CI because its interpretation rests on the classical notion of probability.

Figure 7.3 One hundred 95% CIs (asterisks identify intervals that do not include $\mu$)

Other Levels of Confidence

The confidence level of 95% was inherited from the probability .95 for the initial inequalities in (7.2). If a confidence level of 99% is desired, the initial probability of .95 must be replaced by .99, which necessitates changing the $z$ critical value from 1.96 to 2.58. A 99% CI then results from using 2.58 in place of 1.96 in the formula for the 95% CI. In fact, any desired level of confidence can be achieved by replacing 1.96 or 2.58 with the appropriate standard normal critical value. As Figure 7.4 shows, a probability of $1 - \alpha$ is achieved by using $z_{\alpha/2}$ in place of 1.96.

Figure 7.4 $P(-z_{\alpha/2} \le Z < z_{\alpha/2}) = 1 - \alpha$ ($z$ curve with central area $1 - \alpha$ and shaded area $\alpha/2$ in each tail)

DEFINITION

A $100(1 - \alpha)\%$ confidence interval for the mean $\mu$ of a normal population when the value of $\sigma$ is known is given by

$$\left(\bar{x} - z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}\right) \qquad (7.5)$$

or, equivalently, by $\bar{x} \pm z_{\alpha/2} \cdot \sigma/\sqrt{n}$.

The formula (7.5) for the CI can also be expressed in words as

point estimate of $\mu$ ± ($z$ critical value)(standard error of the mean).
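As a rough illustration of formula (7.5), the following sketch computes the $z$ critical value and the interval for any confidence level. The function names `z_critical` and `z_interval` are made up for this example, and the critical value is obtained from `NormalDist` in Python's standard `statistics` module (Python 3.8 or later) rather than from a printed table, so it carries a few more decimal places than the tabled 1.96 or 2.58.

```python
import math
from statistics import NormalDist


def z_critical(confidence):
    """Return z_{alpha/2} for a confidence level given as a proportion
    (for example 0.95), i.e. the point with upper-tail area alpha/2."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def z_interval(xbar, sigma, n, confidence=0.95):
    """Interval (7.5): xbar +/- z_{alpha/2} * sigma / sqrt(n).
    Assumes a normal population with known sigma."""
    half_width = z_critical(confidence) * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width


if __name__ == "__main__":
    # Data of Example 7.2: xbar = 80.0, sigma = 2.0, n = 31.
    print(z_interval(80.0, 2.0, 31, confidence=0.95))
    print(z_interval(80.0, 2.0, 31, confidence=0.99))
```

For the data of Example 7.2 the 95% call agrees with (79.3, 80.7) after rounding to one decimal place, and the 99% call gives a wider interval, as the discussion of $z$ critical values suggests.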
Example 7.3

The production process for engine control housing units of a particular type has recently been modified. Prior to this modification, historical data had suggested that the distribution of hole diameters for bushings on the housings was normal with a standard deviation of .100 mm. It is believed that the modification has not affected the shape of the distribution or the standard deviation, but that the value of the mean diameter may have changed. A sample of 40 housing units is selected and hole diameter is determined for each one, resulting in a sample mean diameter of 5.426 mm. Let's calculate a confidence interval for true average hole diameter using a confidence level of 90%. This requires that $100(1 - \alpha) = 90$, from which $\alpha = .10$ and $z_{\alpha/2} = z_{.05} = 1.645$ (corresponding to a cumulative $z$-curve area of .9500). The desired interval is then

$$5.426 \pm (1.645)\frac{.100}{\sqrt{40}} = 5.426 \pm .026 = (5.400, 5.452)$$

With a reasonably high degree of confidence, we can say that $5.400 < \mu < 5.452$. This interval is rather narrow because of the small amount of variability in hole diameter ($\sigma = .100$). ■

Confidence Level, Precision, and Sample Size

Why settle for a confidence level of 95% when a level of 99% is achievable? Because the price paid for the higher confidence level is a wider interval. Since the 95% interval extends $1.96 \cdot \sigma/\sqrt{n}$ to each side of $\bar{x}$, the width of the interval is $2(1.96) \cdot \sigma/\sqrt{n} = 3.92 \cdot \sigma/\sqrt{n}$. Similarly, the width of the 99% interval is $2(2.58) \cdot \sigma/\sqrt{n} = 5.16 \cdot \sigma/\sqrt{n}$. That is, we have more confidence in the 99% interval precisely because it is wider. The higher the desired degree of confidence, the wider the resulting interval will be.

If we think of the width of the interval as specifying its precision or accuracy, then the confidence level (or reliability) of the interval is inversely related to its precision. A highly reliable interval estimate may be imprecise in that the endpoints of the interval may be far apart, whereas a precise interval may entail relatively low reliability. Thus it cannot be said unequivocally that a 99% interval is to be preferred to a 95% interval; the gain in reliability entails a loss in precision.

An appealing strategy is to specify both the desired confidence level and interval width and then determine the necessary sample size.

Example 7.4

Extensive monitoring of a computer time-sharing system has suggested that response time to a particular editing command is normally distributed with standard deviation 25 millisec. A new operating system has been installed, and we wish to estimate the true average response time $\mu$ for the new environment. Assuming that response times are still normally distributed with $\sigma = 25$, what sample size is necessary to ensure that the resulting 95% CI has a width of (at most) 10? The sample size $n$ must satisfy

$$10 = 2 \cdot (1.96)(25/\sqrt{n})$$

Rearranging this equation gives

$$\sqrt{n} = 2 \cdot (1.96)(25)/10 = 9.80$$

so

$$n = (9.80)^2 = 96.04$$

Since $n$ must be an integer, a sample size of 97 is required. ■

A general formula for the sample size $n$ necessary to ensure an interval width $w$ is obtained from equating $w$ to $2 \cdot z_{\alpha/2} \cdot \sigma/\sqrt{n}$ and solving for $n$.

The sample size necessary for the CI (7.5) to have a width $w$ is

$$n = \left(2 z_{\alpha/2} \cdot \frac{\sigma}{w}\right)^2$$

The smaller the desired width $w$, the larger $n$ must be. In addition, $n$ is an increasing function of $\sigma$ (more population variability necessitates a larger sample size) and of the confidence level $100(1 - \alpha)$ (as $\alpha$ decreases, $z_{\alpha/2}$ increases).
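The sample-size formula in the box can be sketched in a few lines. The function name `sample_size_for_width` is hypothetical, and the $z$ critical value is passed in directly (1.96 for 95% confidence, as in Example 7.4) rather than computed; the result is rounded up because $n$ must be an integer.

```python
import math


def sample_size_for_width(sigma, width, z):
    """Smallest integer n with 2 * z * sigma / sqrt(n) <= width,
    obtained by rounding (2 * z * sigma / width)**2 up."""
    return math.ceil((2.0 * z * sigma / width) ** 2)


if __name__ == "__main__":
    # Example 7.4: sigma = 25, desired width 10, 95% confidence (z = 1.96).
    n = sample_size_for_width(sigma=25, width=10, z=1.96)
    achieved_width = 2 * 1.96 * 25 / math.sqrt(n)
    print(n, achieved_width)
```

For Example 7.4 this reproduces the sample size of 97, and the achieved width is slightly under 10. When the unrounded value is within floating-point error of a whole number, the rounding-up step may need a small tolerance; that case is not handled here.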
The half-width $1.96\sigma/\sqrt{n}$ of the 95% CI is sometimes called the bound on the error of estimation associated with a 95% confidence level. That is, with 95% confidence, the point estimate $\bar{x}$ will be no farther than this from $\mu$. Before obtaining data, an investigator may wish to determine a sample size for which a particular value of the bound is achieved. For example, with $\mu$ representing the average fuel efficiency (mpg) for all cars of a certain type, the objective of an investigation may be to estimate $\mu$ to within 1 mpg with 95% confidence. More generally, if we wish to estimate $\mu$ to within an amount $B$ (the specified bound on the error of estimation) with $100(1 - \alpha)\%$ confidence, the necessary sample size results from replacing $2/w$ by $1/B$ in the formula in the preceding box.

Deriving a Confidence Interval

Let $X_1, X_2, \ldots, X_n$ denote the sample on which the CI for a parameter $\theta$ is to be based. Suppose a random variable satisfying the following two properties can be found:

1. The variable depends functionally on both $X_1, \ldots, X_n$ and $\theta$.
2. The probability distribution of the variable does not depend on $\theta$ or on any other unknown parameters.

Let $h(X_1, X_2, \ldots, X_n; \theta)$ denote this random variable. For example, if the population distribution is normal with known $\sigma$ and $\theta = \mu$, the variable $h(X_1, \ldots, X_n; \mu) = (\bar{X} - \mu)/(\sigma/\sqrt{n})$ satisfies both properties; it clearly depends functionally on $\mu$, yet has the standard normal probability distribution, which does not depend on $\mu$.
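In general, the form of the $h$ function is usually suggested by examining the distribution of an appropriate estimator $\hat{\theta}$.

For any $\alpha$ between 0 and 1, constants $a$ and $b$ can be found to satisfy

$$P(a < h(X_1, \ldots, X_n; \theta) < b) = 1 - \alpha \qquad (7.6)$$

Because of the second property, $a$ and $b$ do not depend on $\theta$. In the normal example, $a = -z_{\alpha/2}$ and $b = z_{\alpha/2}$. Now suppose that the inequalities in (7.6) can be manipulated to isolate $\theta$, giving the equivalent probability statement

$$P(l(X_1, X_2, \ldots, X_n) < \theta < u(X_1, X_2, \ldots, X_n)) = 1 - \alpha$$

Then $l(x_1, x_2, \ldots, x_n)$ and $u(x_1, \ldots, x_n)$ are the lower and upper confidence limits, respectively, for a $100(1 - \alpha)\%$ CI. In the normal example, we saw that $l(X_1, \ldots, X_n) = \bar{X} - z_{\alpha/2} \cdot \sigma/\sqrt{n}$ and $u(X_1, \ldots, X_n) = \bar{X} + z_{\alpha/2} \cdot \sigma/\sqrt{n}$.

Example 7.5

A theoretical model suggests that the time to breakdown of an insulating fluid between electrodes at a particular voltage has an exponential distribution with parameter $\lambda$ (see Section 4.4). A random sample of $n = 10$ breakdown times yields the following sample data (in min): $x_1 = 41.53$, $x_2 = 18.73$, $x_3 = 2.99$, $x_4 = 30.34$, $x_5 = 12.33$, $x_6 = 117.52$, $x_7 = 73.02$, $x_8 = 223.63$, $x_9 = 4.00$, $x_{10} = 26.78$. A 95% CI for $\lambda$ and for the true average breakdown time are desired. Let $h(X_1, X_2, \ldots, X_n; \lambda) = 2\lambda \sum X_i$. It can be shown that this random variable has a probability distribution called a chi-squared distribution with $2n$ degrees of freedom (df) ($\nu = 2n$, where $\nu$ is the parameter of a chi-squared distribution as mentioned in Section 4.4). Appendix Table A.7 pictures a typical chi-squared density curve and tabulates critical values that capture specified tail areas. The relevant number of df here is $2(10) = 20$. The $\nu = 20$ row of the table shows that 34.170 captures upper-tail area .025 and 9.591 captures lower-tail area .025 (upper-tail area .975). Thus for $n = 10$,

$$P\left(9.591 < 2\lambda \sum X_i < 34.170\right) = .95$$

Division by $2\sum X_i$ isolates $\lambda$, yielding

$$P\left(9.591/\left(2\sum X_i\right) < \lambda < 34.170/\left(2\sum X_i\right)\right) = .95$$

The lower limit of the 95% CI for $\lambda$ is $9.591/(2\sum x_i)$, and the upper limit is $34.170/(2\sum x_i)$. For the given data, $\sum x_i = 550.87$, giving the interval (.00871, .03101).

The expected value of an exponential rv is $\mu = 1/\lambda$. Since

$$P\left(2\sum X_i/34.170 < 1/\lambda < 2\sum X_i/9.591\right) = .95$$

the 95% CI for true average breakdown time is $(2\sum x_i/34.170,\ 2\sum x_i/9.591) = (32.24, 114.87)$. This interval is obviously quite wide, reflecting substantial variability in breakdown times and a small sample size. ■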
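The arithmetic of Example 7.5 can be retraced with a short sketch. It takes the two chi-squared critical values for 20 df (9.591 and 34.170) directly from the example rather than computing them, and the function names `exponential_rate_ci` and `exponential_mean_ci` are invented for this illustration.

```python
# Breakdown times (min) from Example 7.5.
BREAKDOWN_TIMES = [41.53, 18.73, 2.99, 30.34, 12.33,
                   117.52, 73.02, 223.63, 4.00, 26.78]

# Chi-squared critical values for 2n = 20 df, as quoted in the example.
CHI2_LOWER = 9.591   # captures lower-tail area .025
CHI2_UPPER = 34.170  # captures upper-tail area .025


def exponential_rate_ci(times, chi2_lower, chi2_upper):
    """CI for lambda from the pivot 2 * lambda * sum(X_i):
    (chi2_lower / (2 * sum), chi2_upper / (2 * sum))."""
    total = sum(times)
    return chi2_lower / (2 * total), chi2_upper / (2 * total)


def exponential_mean_ci(times, chi2_lower, chi2_upper):
    """CI for the mean 1/lambda; taking reciprocals swaps the limits."""
    rate_lo, rate_hi = exponential_rate_ci(times, chi2_lower, chi2_upper)
    return 1 / rate_hi, 1 / rate_lo


if __name__ == "__main__":
    print(exponential_rate_ci(BREAKDOWN_TIMES, CHI2_LOWER, CHI2_UPPER))
    print(exponential_mean_ci(BREAKDOWN_TIMES, CHI2_LOWER, CHI2_UPPER))
```

Because the mean interval comes from inverting the limits for $\lambda$, its endpoints are not symmetric about the sample mean of about 55.09, which is typical of intervals built from a skewed pivot distribution.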
In general, the upper and lower confidence limits result from replacing each $<$ in (7.6) by $=$ and solving for $\theta$. In the insulating fluid example just considered, $2\lambda \sum x_i = 34.170$ gives $\lambda = 34.170/(2\sum x_i)$ as the upper confidence limit, and the lower limit is obtained from the other equation. Notice that the two interval limits are not equidistant from the point estimate, since the interval is not of the form $\hat{\theta} \pm c$.

Bootstrap Confidence Intervals

The bootstrap technique was introduced in Chapter 6 as a way of estimating $\sigma_{\hat{\theta}}$. It can also be applied to obtain a CI for $\theta$. Consider again estimating the mean $\mu$ of a normal distribution when $\sigma$ is known. Let's replace $\mu$ by $\theta$ and use $\hat{\theta} = \bar{X}$ as the point estimator. Notice that $1.96\sigma/\sqrt{n}$ is the 97.5th percentile of the distribution of $\hat{\theta} - \theta$ [that is, $P(\bar{X} - \mu < 1.96\sigma/\sqrt{n}) = P(Z < 1.96) = .9750$]. Similarly, $-1.96\sigma/\sqrt{n}$ is the 2.5th percentile, so

$$.95 = P(\text{2.5th percentile} < \hat{\theta} - \theta < \text{97.5th percentile}) = P(\hat{\theta} - \text{2.5th percentile} > \theta > \hat{\theta} - \text{97.5th percentile})$$

That is, with

$$l = \hat{\theta} - \text{97.5th percentile of } \hat{\theta} - \theta \qquad u = \hat{\theta} - \text{2.5th percentile of } \hat{\theta} - \theta \qquad (7.7)$$

the CI for $\theta$ is $(l, u)$.
In many cases, the percentiles in (7.7) cannot be calculated, but they can be estimated from bootstrap samples. Suppose we obtain $B = 1000$ bootstrap samples and calculate $\hat{\theta}^*_1, \ldots, \hat{\theta}^*_{1000}$, and $\bar{\theta}^*$ followed by the 1000 differences $\hat{\theta}^*_1 - \bar{\theta}^*, \ldots, \hat{\theta}^*_{1000} - \bar{\theta}^*$. The 25th largest and 25th smallest of these differences are estimates of the unknown percentiles in (7.7). Consult the Devore and Berk or Efron books cited in Chapter 6 for more information.

EXERCISES Section 7.1 (1–11)

1. Consider a normal population distribution with the value of $\sigma$ known.
a. What is the confidence level for the interval $\bar{x} \pm 2.81\sigma/\sqrt{n}$?
b. What is the confidence level for the interval $\bar{x} \pm 1.44\sigma/\sqrt{n}$?
c. What value of $z_{\alpha/2}$ in the CI formula (7.5) results in a confidence level of 99.7%?
d. Answer the question posed in part (c) for a confidence level of 75%.

2. Each of the following is a confidence interval for $\mu$ = true average (i.e., population mean) resonance frequency (Hz) for all tennis rackets of a certain type:
(114.4, 115.6) (114.1, 115.9)
a. What is the value of the sample mean resonance frequency?
b. Both intervals were calculated from the same sample data. The confidence level for one of these intervals is 90% and for the other is 99%. Which of the intervals has the 90% confidence level, and why?

3. Suppose that a random sample of 50 bottles of a particular brand of cough syrup is selected and the alcohol content of each bottle is determined. Let $\mu$ denote the average alcohol content for the population of all bottles of the brand under study. Suppose that the resulting 95% confidence interval is (7.8, 9.4).
a. Would a 90% confidence interval calculated from this same sample have been narrower or wider than the given interval? Explain your reasoning.
b. Consider the following statement: There is a 95% chance that $\mu$ is between 7.8 and 9.4. Is this statement correct? Why or why not?
c. Consider the following statement: We can be highly confident that 95% of all bottles of this type of cough syrup have an alcohol content that is between 7.8 and 9.4. Is this statement correct? Why or why not?
d. Consider the following statement: If the process of selecting a sample of size 50 and then computing the corresponding 95% interval is repeated 100 times, 95 of the resulting intervals will include $\mu$. Is this statement correct? Why or why not?

4. A CI is desired for the true average stray-load loss $\mu$ (watts) for a certain type of induction motor when the line current is held at 10 amps for a speed of 1500 rpm. Assume that stray-load loss is normally distributed with $\sigma = 3.0$.
a. Compute a 95% CI for $\mu$ when $n = 25$ and $\bar{x} = 58.3$.
b. Compute a 95% CI for $\mu$ when $n = 100$ and $\bar{x} = 58.3$.
c. Compute a 99% CI for $\mu$ when $n = 100$ and $\bar{x} = 58.3$.
d. Compute an 82% CI for $\mu$ when $n = 100$ and $\bar{x} = 58.3$.
e. How large must $n$ be if the width of the 99% interval for $\mu$ is to be 1.0?

5. Assume that the helium porosity (in percentage) of coal samples taken from any particular seam is normally distributed with true standard deviation .75.
a. Compute a 95% CI for the true average porosity of a certain seam if the average porosity for 20 specimens from the seam was 4.85.
b. Compute a 98% CI for true average porosity of another seam based on 16 specimens with a sample average porosity of 4.56.
c. How large a sample size is necessary if the width of the 95% interval is to be .40?
d. What sample size is necessary to estimate true average porosity to within .2 with 99% confidence?

6. On the basis of extensive tests, the yield point of a particular type of mild steel-reinforcing bar is known to be normally distributed with $\sigma = 100$. The composition of bars has been slightly modified, but the modification is not believed to have affected either the normality or the value of $\sigma$.
a. Assuming this to be the case, if a sample of 25 modified bars resulted in a sample average yield point of 8439 lb, compute a 90% CI for the true average yield point of the modified bar.
b. How would you modify the interval in part (a) to obtain a confidence level of 92%?

7. By how much must the sample size $n$ be increased if the width of the CI (7.5) is to be halved? If the sample size is increased by a factor of 25, what effect will this have on the width of the interval? Justify your assertions.

8. Let $\alpha_1 > 0$, $\alpha_2 > 0$, with $\alpha_1 + \alpha_2 = \alpha$. Then

$$P\left(-z_{\alpha_1} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{\alpha_2}\right) = 1 - \alpha$$

a. Use this equation to derive a more general expression for a $100(1 - \alpha)\%$ CI for $\mu$ of which the interval (7.5) is a special case.
b. Let $\alpha = .05$ and $\alpha_1 = \alpha/4$, $\alpha_2 = 3\alpha/4$. Does this result in a narrower or wider interval than the interval (7.5)?

9. a. Under the same conditions as those leading to the interval (7.5), $P\left[(\bar{X} - \mu)/(\sigma/\sqrt{n}) < 1.645\right] = .95$. Use this to derive a one-sided interval for $\mu$ that has infinite width and provides a lower confidence bound on $\mu$. What is this interval for the data in Exercise 5(a)?
b. Generalize the result of part (a) to obtain a lower bound with confidence level $100(1-\alpha)\%$.
c. What is an analogous interval to that of part (b) that provides an upper bound on $\mu$? Compute this 99% interval for the data of Exercise 4(a).

10. A random sample of $n = 15$ heat pumps of a certain type yielded the following observations on lifetime (in years):

2.0  1.3  6.0  1.9  5.1  .4  1.0  5.3
15.7  .7  4.8  .9  12.2  5.3  .6

a. Assume that the lifetime distribution is exponential and use an argument parallel to that of Example 7.5 to obtain a 95% CI for expected (true average) lifetime.
b. How should the interval of part (a) be altered to achieve a confidence level of 99%?
c. What is a 95% CI for the standard deviation of the lifetime distribution? [Hint: What is the standard deviation of an exponential random variable?]

11. Consider the next 1000 95% CIs for $\mu$ that a statistical consultant will obtain for various clients. Suppose the data sets on which the intervals are based are selected independently of one another. How many of these 1000 intervals do you expect to capture the corresponding value of $\mu$? What is the probability that between 940 and 960 of these intervals contain the corresponding value of $\mu$? [Hint: Let Y = the number among the 1000 intervals that contain $\mu$. What kind of random variable is Y?]

7.2 Large-Sample Confidence Intervals for a Population Mean and Proportion

The CI for $\mu$ given in the previous section assumed that the population distribution is normal with the value of $\sigma$ known. We now present a large-sample CI whose validity does not require these assumptions. After showing how the argument leading to this interval generalizes to yield other large-sample intervals, we focus on an interval for a population proportion p.

Copyright 2010 Cengage Learning. All Rights Reserved. May not be copied, scanned, or duplicated, in whole or in part. Due to electronic rights, some third party content may be suppressed from the eBook and/or eChapter(s). Editorial review has deemed that any suppressed content does not materially affect the overall learning experience. Cengage Learning reserves the right to remove additional content at any time if subsequent rights restrictions require it.

A Large-Sample Interval for $\mu$

Let $X_1, X_2, \ldots, X_n$ be a random sample from a population having a mean $\mu$ and standard deviation $\sigma$. Provided that n is large, the Central Limit Theorem (CLT) implies that $\bar{X}$ has approximately a normal distribution whatever the nature of the population distribution. It then follows that $Z = (\bar{X} - \mu)/(\sigma/\sqrt{n})$ has approximately a standard normal distribution, so that
$$P\left(-z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) \approx 1 - \alpha$$
An argument parallel to that given in Section 7.1 yields $\bar{x} \pm z_{\alpha/2} \cdot \sigma/\sqrt{n}$ as a large-sample CI for $\mu$ with a confidence level of approximately $100(1-\alpha)\%$. That is, when n is large, the CI for $\mu$ given previously remains valid whatever the population distribution, provided that the qualifier "approximately" is inserted in front of the confidence level.

A practical difficulty with this development is that computation of the CI requires the value of $\sigma$, which will rarely be known. Consider the standardized variable $(\bar{X} - \mu)/(S/\sqrt{n})$, in which the sample standard deviation S has replaced $\sigma$. Previously, there was randomness only in the numerator of Z by virtue of $\bar{X}$. In the new standardized variable, both $\bar{X}$ and S vary in value from one sample to another. So it might seem that the distribution of the new variable should be more spread out than the z curve to reflect the extra variation in the denominator. This is indeed true when n is small.
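The extra spread caused by estimating $\sigma$ with S can be explored by simulation. The following is a minimal sketch and not part of the original development. It assumes a standard normal population ($\mu = 0$, $\sigma = 1$); the function name `tail_fraction`, the number of replications, and the seed are illustrative choices. It estimates the proportion of samples for which $|(\bar{X} - \mu)/(S/\sqrt{n})|$ exceeds 1.96. For a variable that is exactly standard normal, this proportion would be .05.

```python
import math
import random


def tail_fraction(n, reps=5000, cutoff=1.96, seed=0):
    """Estimate P(|(Xbar - mu) / (S / sqrt(n))| > cutoff) by simulation.

    Assumption for this sketch: samples come from a standard normal
    population, so mu = 0 and sigma = 1.
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(xs) / n
        # Sample standard deviation S with divisor n - 1.
        s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
        # Standardized variable with S in place of sigma (mu = 0 here).
        t = xbar / (s / math.sqrt(n))
        if abs(t) > cutoff:
            exceed += 1
    return exceed / reps


if __name__ == "__main__":
    for n in (5, 100):
        print(f"n = {n:3d}: estimated tail proportion = {tail_fraction(n):.3f}")
```

An estimate well above .05 for n = 5 would indicate heavier tails than the z curve, whereas the estimate for n = 100 should lie much closer to .05.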
However, for large n the substitution of S for $\sigma$ adds little extra variability, so this variable also has approximately a standard normal distribution. Manipulation of the variable in a probability statement, as in the case of known $\sigma$, gives a general large-sample CI for $\mu$.

PROPOSITION

If n is sufficiently large, the standardized variable
$$Z = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
has approximately a standard normal distribution. This implies that
$$\bar{x} \pm z_{\alpha/2} \cdot \frac{s}{\sqrt{n}} \qquad (7.8)$$
is a large-sample confidence interval for $\mu$ with confidence level approximately $100(1-\alpha)\%$. This formula is valid regardless of the shape of the population distribution.

In words, the CI (7.8) is

point estimate of $\mu$ $\pm$ (z critical value)(estimated standard error of the mean).

Generally speaking, $n > 40$ will be sufficient to justify the use of this interval. This is somewhat more conservative than the rule of thumb for the CLT because of the additional variability introduced by using S in place of $\sigma$.

Example 7.6

Haven't you always wanted to own a Porsche? The author thought maybe he could afford a Boxster, the cheapest model. So he went to www.cars.com on Nov. 18, 2009, and found a total of 1113 such cars listed. Asking prices ranged from 3499