. . . , the $k$th population moment, or $k$th moment of the distribution $f(x)$,

To illustrate, the survival-time data mentioned in Example 4.24 is

152  115  109   94   88  137  152   77  160  165
125   40  128  123  136  101   62  153   83   69

with $\bar{x} = 113.5$ and $\frac{1}{20}\sum x_i^2 = 14{,}087.8$. The estimates are

$$\hat{\alpha} = \frac{\bar{X}^2}{\frac{1}{n}\sum X_i^2 - \bar{X}^2} = \frac{(113.5)^2}{14{,}087.8 - (113.5)^2} = 10.7 \qquad \hat{\beta} = \frac{\frac{1}{n}\sum X_i^2 - \bar{X}^2}{\bar{X}} = \frac{14{,}087.8 - (113.5)^2}{113.5} = 10.6$$

These estimates of $\alpha$ and $\beta$ differ from the values suggested by Gross and Clark because they used a different estimation technique. ■

Example 6.14

Let $X_1, \ldots, X_n$ be a random sample from a generalized negative binomial distribution with parameters $r$ and $p$ (see Section 3.5). Since $E(X) = r(1-p)/p$ and $V(X) = r(1-p)/p^2$, $E(X^2) = V(X) + [E(X)]^2 = r(1-p)(r - rp + 1)/p^2$. Equating $E(X)$ to $\bar{X}$ and $E(X^2)$ to $\frac{1}{n}\sum X_i^2$ eventually gives

$$\hat{p} = \frac{\bar{X}}{\frac{1}{n}\sum X_i^2 - \bar{X}^2} \qquad \hat{r} = \frac{\bar{X}^2}{\frac{1}{n}\sum X_i^2 - \bar{X}^2 - \bar{X}}$$

As an illustration, Reep, Pollard, and Benjamin ("Skill and Chance in Ball Games," J. of Royal Stat. Soc., 1971: 623–629) consider the negative binomial distribution as a model for the number of goals per game scored by National Hockey League teams. The data for 1966–1967 follows (420 games):

Goals       0   1   2   3   4   5   6   7   8   9  10
Frequency  29  71  82  89  65  45  24   7   4   1   3

Then,

$$\bar{x} = \frac{\sum x_i}{420} = \frac{(0)(29) + (1)(71) + \cdots + (10)(3)}{420} = 2.98$$

and

$$\frac{\sum x_i^2}{420} = \frac{(0)^2(29) + (1)^2(71) + \cdots + (10)^2(3)}{420} = 12.40$$

Thus,

$$\hat{p} = \frac{2.98}{12.40 - (2.98)^2} = .85 \qquad \hat{r} = \frac{(2.98)^2}{12.40 - (2.98)^2 - 2.98} = 16.5$$

Although $r$ by definition must be positive, the denominator of $\hat{r}$ could be negative, indicating that the negative binomial distribution is not appropriate (or that the moment estimator is flawed). ■
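The arithmetic in Example 6.14 is easy to reproduce from the frequency table. Below is a minimal Python sketch (ours, not from the text); because it works from the raw table rather than the rounded summaries 2.98 and 12.40, its results differ slightly from .85 and 16.5 in the last digit or two.

```python
# Method-of-moments estimates for the negative binomial model fitted to the
# NHL goals-per-game data (420 games), following Example 6.14.
goals = list(range(11))                       # 0, 1, ..., 10 goals per game
freq  = [29, 71, 82, 89, 65, 45, 24, 7, 4, 1, 3]

n    = sum(freq)                              # 420
xbar = sum(g * f for g, f in zip(goals, freq)) / n       # sample mean, about 2.98
m2   = sum(g * g * f for g, f in zip(goals, freq)) / n   # (1/n) * sum of x_i^2, about 12.40

p_hat = xbar / (m2 - xbar**2)                 # about .84 - .85
r_hat = xbar**2 / (m2 - xbar**2 - xbar)       # about 16 - 16.5
print(round(p_hat, 3), round(r_hat, 2))
```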
Maximum Likelihood Estimation

The method of maximum likelihood was first introduced by R. A. Fisher, a geneticist and statistician, in the 1920s. Most statisticians recommend this method, at least when the sample size is large, since the resulting estimators have certain desirable efficiency properties (see the proposition on page 262).

Example 6.15

A sample of ten new bike helmets manufactured by a certain company is obtained. Upon testing, it is found that the first, third, and tenth helmets are flawed, whereas the others are not. Let $p = P(\text{flawed helmet})$; i.e., $p$ is the proportion of all such helmets that are flawed. Define (Bernoulli) random variables $X_1, X_2, \ldots, X_{10}$ by

$$X_1 = \begin{cases} 1 & \text{if the 1st helmet is flawed} \\ 0 & \text{if the 1st helmet isn't flawed} \end{cases} \quad \cdots \quad X_{10} = \begin{cases} 1 & \text{if the 10th helmet is flawed} \\ 0 & \text{if the 10th helmet isn't flawed} \end{cases}$$

Then for the obtained sample, $X_1 = X_3 = X_{10} = 1$ and the other seven $X_i$'s are all zero. The probability mass function of any particular $X_i$ is $p^{x_i}(1-p)^{1-x_i}$, which becomes $p$ if $x_i = 1$ and $1-p$ when $x_i = 0$. Now suppose that the conditions of the various helmets are independent of one another. This implies that the $X_i$'s are independent, so their joint probability mass function is the product of the individual pmf's. Thus the joint pmf evaluated at the observed $X_i$'s is

$$f(x_1, \ldots, x_{10}; p) = p(1-p)p \cdots p = p^3(1-p)^7 \qquad (6.4)$$

Suppose that $p = .25$. Then the probability of observing the sample that we actually obtained is $(.25)^3(.75)^7 = .002086$. If instead $p = .50$, then this probability is $(.50)^3(.50)^7 = .000977$. For what value of $p$ is the obtained sample most likely to have occurred? That is, for what value of $p$ is the joint pmf (6.4) as large as it can be? What value of $p$ maximizes (6.4)?

Figure 6.5(a) shows a graph of the likelihood (6.4) as a function of $p$. It appears that the graph reaches its peak above $p = .3 =$ the proportion of flawed helmets in the sample. Figure 6.5(b) shows a graph of the natural logarithm of (6.4); since $\ln[g(u)]$ is a strictly increasing function of $g(u)$, finding $u$ to maximize the function $g(u)$ is the same as finding $u$ to maximize $\ln[g(u)]$.

[Figure 6.5  (a) Graph of the likelihood (joint pmf) (6.4) from Example 6.15; (b) graph of the natural logarithm of the likelihood]

We can verify our visual impression by using calculus to find the value of $p$ that maximizes (6.4). Working with the natural log of the joint pmf is often easier than working with the joint pmf itself, since the joint pmf is typically a product, so its logarithm will be a sum. Here

$$\ln[f(x_1, \ldots, x_{10}; p)] = \ln[p^3(1-p)^7] = 3\ln(p) + 7\ln(1-p) \qquad (6.5)$$

Thus

$$\frac{d}{dp}\{\ln[f(x_1, \ldots, x_{10}; p)]\} = \frac{d}{dp}\{3\ln(p) + 7\ln(1-p)\} = \frac{3}{p} + \frac{7}{1-p}(-1) = \frac{3}{p} - \frac{7}{1-p}$$

[the $-1$ comes from the chain rule in calculus]. Equating this derivative to 0 and solving for $p$ gives $3(1-p) = 7p$, from which $3 = 10p$ and so $p = 3/10 = .30$ as conjectured. That is, our point estimate is $\hat{p} = .30$. It is called the maximum likelihood estimate because it is the parameter value that maximizes the likelihood (joint pmf) of the observed sample. In general, the second derivative should be examined to make sure a maximum has been obtained, but here this is obvious from Figure 6.5.

Suppose that rather than being told the condition of every helmet, we had only been informed that three of the ten were flawed. Then we would have the observed value of a binomial random variable $X =$ the number of flawed helmets. The pmf of $X$ is $\binom{10}{x}p^x(1-p)^{10-x}$. For $x = 3$, this becomes $\binom{10}{3}p^3(1-p)^7$. The binomial coefficient $\binom{10}{3}$ is irrelevant to the maximization, so again $\hat{p} = .30$. ■
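The maximization in Example 6.15 can also be checked without calculus by evaluating (6.4) on a fine grid of $p$ values. A minimal sketch (the grid resolution is an arbitrary choice):

```python
# Evaluate the likelihood L(p) = p^3 (1 - p)^7 from (6.4) and locate its peak.
def likelihood(p):
    return p**3 * (1 - p)**7

print(likelihood(0.25))      # 0.002086..., as computed in the text
print(likelihood(0.50))      # 0.000977...

grid  = [i / 10000 for i in range(1, 10000)]
p_hat = max(grid, key=likelihood)
print(p_hat)                 # 0.3, agreeing with the calculus result
```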

DEFINITION

Let $X_1, X_2, \ldots, X_n$ have joint pmf or pdf

$$f(x_1, x_2, \ldots, x_n; \theta_1, \ldots, \theta_m) \qquad (6.6)$$

where the parameters $\theta_1, \ldots, \theta_m$ have unknown values. When $x_1, \ldots, x_n$ are the observed sample values and (6.6) is regarded as a function of $\theta_1, \ldots, \theta_m$, it is called the likelihood function. The maximum likelihood estimates (mle's) $\hat{\theta}_1, \ldots, \hat{\theta}_m$ are those values of the $\theta_i$'s that maximize the likelihood function, so that

$$f(x_1, \ldots, x_n; \hat{\theta}_1, \ldots, \hat{\theta}_m) \ge f(x_1, \ldots, x_n; \theta_1, \ldots, \theta_m) \qquad \text{for all } \theta_1, \ldots, \theta_m$$

When the $X_i$'s are substituted in place of the $x_i$'s, the maximum likelihood estimators result.

The likelihood function tells us how likely the observed sample is as a function of the possible parameter values. Maximizing the likelihood gives the parameter values for which the observed sample is most likely to have been generated—that is, the parameter values that "agree most closely" with the observed data.

Example 6.16

Suppose $X_1, X_2, \ldots, X_n$ is a random sample from an exponential distribution with parameter $\lambda$. Because of independence, the likelihood function is a product of the individual pdf's:

$$f(x_1, \ldots, x_n; \lambda) = (\lambda e^{-\lambda x_1}) \cdots (\lambda e^{-\lambda x_n}) = \lambda^n e^{-\lambda \sum x_i}$$

The natural logarithm of the likelihood function is

$$\ln[f(x_1, \ldots, x_n; \lambda)] = n\ln(\lambda) - \lambda \sum x_i$$

Equating $\frac{d}{d\lambda}[\ln(\text{likelihood})]$ to zero results in $n/\lambda - \sum x_i = 0$, or $\lambda = n/\sum x_i = 1/\bar{x}$. Thus the mle is $\hat{\lambda} = 1/\bar{X}$; it is identical to the method of moments estimator [but it is not an unbiased estimator, since $E(1/\bar{X}) \ne 1/E(\bar{X})$]. ■

Example 6.17

Let $X_1, \ldots, X_n$ be a random sample from a normal distribution. The likelihood function is

$$f(x_1, \ldots, x_n; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-(x_1-\mu)^2/(2\sigma^2)} \cdots \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-(x_n-\mu)^2/(2\sigma^2)} = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} e^{-\sum(x_i-\mu)^2/(2\sigma^2)}$$

so

$$\ln[f(x_1, \ldots, x_n; \mu, \sigma^2)] = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum(x_i - \mu)^2$$

To find the maximizing values of $\mu$ and $\sigma^2$, we must take the partial derivatives of $\ln(f)$ with respect to $\mu$ and $\sigma^2$, equate them to zero, and solve the resulting two equations. Omitting the details, the resulting mle's are

$$\hat{\mu} = \bar{X} \qquad \hat{\sigma}^2 = \frac{\sum(X_i - \bar{X})^2}{n}$$

The mle of $\sigma^2$ is not the unbiased estimator, so two different principles of estimation (unbiasedness and maximum likelihood) yield two different estimators. ■

Example 6.18

In Chapter 3, we mentioned the use of the Poisson distribution for modeling the number of "events" that occur in a two-dimensional region. Assume that when the region $R$ being sampled has area $a(R)$, the number $X$ of events occurring in $R$ has a Poisson distribution with parameter $\lambda a(R)$ (where $\lambda$ is the expected number of events per unit area) and that nonoverlapping regions yield independent $X$'s.

Suppose an ecologist selects $n$ nonoverlapping regions $R_1, \ldots, R_n$ and counts the number of plants of a certain species found in each region. The joint pmf (likelihood) is then

$$p(x_1, \ldots, x_n; \lambda) = \frac{[\lambda a(R_1)]^{x_1}e^{-\lambda a(R_1)}}{x_1!} \cdots \frac{[\lambda a(R_n)]^{x_n}e^{-\lambda a(R_n)}}{x_n!} = \frac{[a(R_1)]^{x_1} \cdots [a(R_n)]^{x_n}\,\lambda^{\sum x_i}\, e^{-\lambda \sum a(R_i)}}{x_1! \cdots x_n!}$$

The ln(likelihood) is

$$\ln[p(x_1, \ldots, x_n; \lambda)] = \sum x_i \cdot \ln[a(R_i)] + \ln(\lambda) \cdot \sum x_i - \lambda \sum a(R_i) - \sum \ln(x_i!)$$

Taking $\frac{d}{d\lambda}[\ln(p)]$ and equating it to zero yields

$$\frac{\sum x_i}{\lambda} - \sum a(R_i) = 0 \qquad \text{so} \qquad \lambda = \frac{\sum x_i}{\sum a(R_i)}$$

The mle is then $\hat{\lambda} = \sum X_i / \sum a(R_i)$. This is intuitively reasonable because $\lambda$ is the true density (plants per unit area), whereas $\hat{\lambda}$ is the sample density, since $\sum a(R_i)$ is just the total area sampled. Because $E(X_i) = \lambda \cdot a(R_i)$, the estimator $\hat{\lambda}$ is unbiased.
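A short simulation makes the area-based estimator concrete. The region areas and the true density below are hypothetical values chosen for illustration (numpy is assumed to be available):

```python
import numpy as np

rng = np.random.default_rng(1)
areas    = np.array([2.0, 3.5, 1.0, 4.5, 2.5])   # hypothetical a(R_i) values
lam_true = 4.0                                    # hypothetical plants per unit area

def lambda_hat(counts, areas):
    # mle: total count divided by total area sampled
    return counts.sum() / areas.sum()

counts = rng.poisson(lam_true * areas)            # one set of observed region counts
print(lambda_hat(counts, areas))

# Averaging the estimator over many simulated samples gives a value close to
# lam_true, consistent with the unbiasedness noted above.
reps = rng.poisson(lam_true * areas, size=(10_000, areas.size))
print(reps.sum(axis=1).mean() / areas.sum())      # approximately 4.0
```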
Sometimes an alternative sampling procedure is used. Instead of fixing regions to be sampled, the ecologist will select $n$ points in the entire region of interest and let $y_i =$ the distance from the $i$th point to the nearest plant. The cumulative distribution function (cdf) of $Y =$ distance to the nearest plant is

$$F_Y(y) = P(Y \le y) = 1 - P(Y > y) = 1 - P(\text{no plants in a circle of radius } y) = 1 - e^{-\lambda\pi y^2}\cdot\frac{(\lambda\pi y^2)^0}{0!} = 1 - e^{-\lambda\pi y^2}$$

Taking the derivative of $F_Y(y)$ with respect to $y$ yields

$$f_Y(y; \lambda) = \begin{cases} 2\pi\lambda y\, e^{-\lambda\pi y^2} & y \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

If we now form the likelihood $f_Y(y_1; \lambda) \cdots f_Y(y_n; \lambda)$, differentiate ln(likelihood), and so on, the resulting mle is

$$\hat{\lambda} = \frac{n}{\pi\sum Y_i^2} = \frac{\text{number of plants observed}}{\text{total area sampled}}$$

which is also a sample density. It can be shown that in a sparse environment (small $\lambda$), the distance method is in a certain sense better, whereas in a dense environment, the first sampling method is better. ■

Example 6.19

Let $X_1, \ldots, X_n$ be a random sample from a Weibull pdf

$$f(x; \alpha, \beta) = \begin{cases} \dfrac{\alpha}{\beta^\alpha}\,x^{\alpha-1}e^{-(x/\beta)^\alpha} & x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

Writing the likelihood and ln(likelihood), then setting both $\partial/\partial\alpha[\ln(f)] = 0$ and $\partial/\partial\beta[\ln(f)] = 0$, yields the equations

$$\alpha = \left[\frac{\sum x_i^\alpha \ln(x_i)}{\sum x_i^\alpha} - \frac{\sum \ln(x_i)}{n}\right]^{-1} \qquad \beta = \left(\frac{\sum x_i^\alpha}{n}\right)^{1/\alpha}$$

These two equations cannot be solved explicitly to give general formulas for the mle's $\hat{\alpha}$ and $\hat{\beta}$. Instead, for each sample $x_1, \ldots, x_n$, the equations must be solved using an iterative numerical procedure. Even moment estimators of $\alpha$ and $\beta$ are somewhat complicated (see Exercise 21). ■
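Because the first Weibull equation involves $\alpha$ only through a strictly decreasing function once it is rearranged as an equation equal to zero, a bracketing root finder gives one simple numerical solution. The sketch below is ours: the sample is made up for illustration (it is not the Example 6.3 data), the bracket [0.01, 200] is an assumption that happens to contain the root here, and scipy is assumed to be available.

```python
import numpy as np
from scipy.optimize import brentq

def weibull_mle(x):
    """Solve the two Weibull likelihood equations numerically for (alpha, beta)."""
    x = np.asarray(x, dtype=float)
    logx = np.log(x)

    # First equation rearranged as g(alpha) = 0; g is strictly decreasing in alpha,
    # so any bracket with a sign change pins down the unique root.
    def g(alpha):
        xa = x ** alpha
        return 1.0 / alpha + logx.mean() - np.sum(xa * logx) / np.sum(xa)

    alpha = brentq(g, 0.01, 200.0)                 # bracket chosen for this sample
    beta = np.mean(x ** alpha) ** (1.0 / alpha)    # second equation
    return alpha, beta

sample = [66.3, 70.1, 74.8, 79.5, 81.2, 68.9, 72.4, 77.7]   # hypothetical data
print(weibull_mle(sample))
# scipy.stats.weibull_min.fit(sample, floc=0) should give essentially the same
# shape and scale estimates.
```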
Estimating Functions of Parameters

In Example 6.17, we obtained the mle of $\sigma^2$ when the underlying distribution is normal. The mle of $\sigma = \sqrt{\sigma^2}$, as well as many other mle's, can be easily derived using the following proposition.

PROPOSITION  (The Invariance Principle)

Let $\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_m$ be the mle's of the parameters $\theta_1, \theta_2, \ldots, \theta_m$. Then the mle of any function $h(\theta_1, \theta_2, \ldots, \theta_m)$ of these parameters is the function $h(\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_m)$ of the mle's.

Example 6.20 (Example 6.17 continued)

In the normal case, the mle's of $\mu$ and $\sigma^2$ are $\hat{\mu} = \bar{X}$ and $\hat{\sigma}^2 = \sum(X_i - \bar{X})^2/n$. To obtain the mle of the function $h(\mu, \sigma^2) = \sqrt{\sigma^2} = \sigma$, substitute the mle's into the function:

$$\hat{\sigma} = \sqrt{\hat{\sigma}^2} = \left[\frac{1}{n}\sum(X_i - \bar{X})^2\right]^{1/2}$$

The mle of $\sigma$ is not the sample standard deviation $S$, though they are close unless $n$ is quite small. ■

Example 6.21 (Example 6.19 continued)

The mean value of an rv $X$ that has a Weibull distribution is

$$\mu = \beta \cdot \Gamma(1 + 1/\alpha)$$

The mle of $\mu$ is therefore $\hat{\mu} = \hat{\beta}\,\Gamma(1 + 1/\hat{\alpha})$, where $\hat{\alpha}$ and $\hat{\beta}$ are the mle's of $\alpha$ and $\beta$. In particular, $\bar{X}$ is not the mle of $\mu$, though it is an unbiased estimator. At least for large $n$, $\hat{\mu}$ is a better estimator than $\bar{X}$. For the data given in Example 6.3, the mle's of the Weibull parameters are $\hat{\alpha} = 11.9731$ and $\hat{\beta} = 77.0153$, from which $\hat{\mu} = 73.80$. This estimate is quite close to the sample mean 73.88. ■
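Using the mle's quoted above, the invariance calculation for the Weibull mean takes one line (math.gamma is the gamma function $\Gamma$):

```python
from math import gamma

alpha_hat, beta_hat = 11.9731, 77.0153        # Weibull mle's quoted in Example 6.21
mu_hat = beta_hat * gamma(1 + 1 / alpha_hat)  # invariance: plug mle's into beta * Gamma(1 + 1/alpha)
print(round(mu_hat, 2))                       # approximately 73.80
```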
Large Sample Behavior of the MLE

Although the principle of maximum likelihood estimation has considerable intuitive appeal, the following proposition provides additional rationale for the use of mle's.

PROPOSITION

Under very general conditions on the joint distribution of the sample, when the sample size $n$ is large, the maximum likelihood estimator of any parameter $\theta$ is approximately unbiased $[E(\hat{\theta}) \approx \theta]$ and has variance that is either as small as or nearly as small as can be achieved by any estimator. Stated another way, the mle $\hat{\theta}$ is approximately the MVUE of $\theta$.

Because of this result and the fact that calculus-based techniques can usually be used to derive the mle's (though often numerical methods, such as Newton's method, are necessary), maximum likelihood estimation is the most widely used estimation technique among statisticians. Many of the estimators used in the remainder of the book are mle's. Obtaining an mle, however, does require that the underlying distribution be specified.

Some Complications

Sometimes calculus cannot be used to obtain mle's.

Example 6.22

Suppose my waiting time for a bus is uniformly distributed on $[0, \theta]$ and the results $x_1, \ldots, x_n$ of a random sample from this distribution have been observed. Since $f(x; \theta) = 1/\theta$ for $0 \le x \le \theta$ and 0 otherwise,

$$f(x_1, \ldots, x_n; \theta) = \begin{cases} \dfrac{1}{\theta^n} & 0 \le x_1 \le \theta, \ldots, 0 \le x_n \le \theta \\ 0 & \text{otherwise} \end{cases}$$

As long as $\max(x_i) \le \theta$, the likelihood is $1/\theta^n$, which is positive, but as soon as $\theta < \max(x_i)$, the likelihood drops to 0. This is illustrated in Figure 6.6. Calculus will not work because the maximum of the likelihood occurs at a point of discontinuity, but the figure shows that $\hat{\theta} = \max(X_i)$. Thus if my waiting times are 2.3, 3.7, 1.5, .4, and 3.2, then the mle is $\hat{\theta} = 3.7$. From Example 6.4, the mle is not unbiased. ■

[Figure 6.6  The likelihood function for Example 6.22]

Example 6.23

A method that is often used to estimate the size of a wildlife population involves performing a capture/recapture experiment. In this experiment, an initial sample of $M$ animals is captured, each of these animals is tagged, and the animals are then returned to the population. After allowing enough time for the tagged individuals to mix into the population, another sample of size $n$ is captured. With $X =$ the number of tagged animals in the second sample, the objective is to use the observed $x$ to estimate the population size $N$.

The parameter of interest is $\theta = N$, which can assume only integer values, so even after determining the likelihood function (pmf of $X$ here), using calculus to obtain $N$ would present difficulties. If we think of a success as a previously tagged animal being recaptured, then sampling is without replacement from a population containing $M$ successes and $N - M$ failures, so that $X$ is a hypergeometric rv and the likelihood function is

$$p(x; N) = h(x; n, M, N) = \frac{\dbinom{M}{x}\dbinom{N-M}{n-x}}{\dbinom{N}{n}}$$

The integer-valued nature of $N$ notwithstanding, it would be difficult to take the derivative of $p(x; N)$. However, if we consider the ratio of $p(x; N)$ to $p(x; N-1)$, we have

$$\frac{p(x; N)}{p(x; N-1)} = \frac{(N-M)\cdot(N-n)}{N\cdot(N-M-n+x)}$$

This ratio is larger than 1 if and only if (iff) $N < Mn/x$. The value of $N$ for which $p(x; N)$ is maximized is therefore the largest integer less than $Mn/x$. If we use standard mathematical notation $[r]$ for the largest integer less than or equal to $r$, the mle of $N$ is $\hat{N} = [Mn/x]$. As an illustration, if $M = 200$ fish are taken from a lake and tagged, and subsequently $n = 100$ fish are recaptured, and among the 100 there are $x = 11$ tagged fish, then $\hat{N} = [(200)(100)/11] = [1818.18] = 1818$. The estimate is actually rather intuitive; $x/n$ is the proportion of the recaptured sample that is tagged, whereas $M/N$ is the proportion of the entire population that is tagged. The estimate is obtained by equating these two proportions (estimating a population proportion by a sample proportion). ■
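The integer maximization in Example 6.23 is easy to do by brute force: evaluate the hypergeometric likelihood over a range of candidate $N$ and keep the maximizer. The sketch below assumes scipy is available; note that scipy's hypergeom orders its arguments as (k, population size, number tagged, sample size), and the upper search limit of 10,000 is an arbitrary bound that comfortably exceeds $Mn/x$.

```python
from scipy.stats import hypergeom

M, n, x = 200, 100, 11            # tagged fish, recaptured fish, tagged among recaptured

# Smallest N consistent with the data is M + n - x (every untagged recapture is new).
candidates = range(M + n - x, 10_000)
N_hat = max(candidates, key=lambda N: hypergeom.pmf(x, N, M, n))
print(N_hat)                      # 1818
print((M * n) // x)               # 1818 again, via the closed form [Mn/x]
```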
Suppose $X_1, X_2, \ldots, X_n$ is a random sample from a pdf $f(x; \theta)$ that is symmetric about $\theta$ but that the investigator is unsure of the form of the $f$ function. It is then desirable to use an estimator $\hat{\theta}$ that is robust—that is, one that performs well for a wide variety of underlying pdf's. One such estimator is a trimmed mean. In recent years, statisticians have proposed another type of estimator, called an M-estimator, based on a generalization of maximum likelihood estimation. Instead of maximizing the log likelihood $\sum \ln[f(x_i; \theta)]$ for a specified $f$, one maximizes $\sum \rho(x_i; \theta)$. The "objective function" $\rho$ is selected to yield an estimator with good robustness properties. The book by David Hoaglin et al. (see the bibliography) contains a good exposition on this subject.

EXERCISES  Section 6.2 (20–30)

20. A diagnostic test for a certain disease is applied to $n$ individuals known not to have the disease. Let $X =$ the number among the $n$ test results that are positive (indicating presence of the disease), so $X$ is the number of false positives, and let $p =$ the probability that a disease-free individual's test result is positive (i.e., $p$ is the true proportion of test results from disease-free individuals that are positive). Assume that only $X$ is available rather than the actual sequence of test results.
a. Derive the maximum likelihood estimator of $p$. If $n = 20$ and $x = 3$, what is the estimate?
b. Is the estimator of part (a) unbiased?
c. If $n = 20$ and $x = 3$, what is the mle of the probability $(1-p)^5$ that none of the next five tests done on disease-free individuals are positive?

21. Let $X$ have a Weibull distribution with parameters $\alpha$ and $\beta$, so
$$E(X) = \beta \cdot \Gamma(1 + 1/\alpha) \qquad V(X) = \beta^2\{\Gamma(1 + 2/\alpha) - [\Gamma(1 + 1/\alpha)]^2\}$$
a. Based on a random sample $X_1, \ldots, X_n$, write equations for the method of moments estimators of $\beta$ and $\alpha$. Show that, once the estimate of $\alpha$ has been obtained, the estimate of $\beta$ can be found from a table of the gamma function and that the estimate of $\alpha$ is the solution to a complicated equation involving the gamma function.
b. If $n = 20$, $\bar{x} = 28.0$, and $\sum x_i^2 = 16{,}500$, compute the estimates. [Hint: $[\Gamma(1.2)]^2/\Gamma(1.4) = .95$.]

22. Let $X$ denote the proportion of allotted time that a randomly selected student spends working on a certain aptitude test. Suppose the pdf of $X$ is
$$f(x; \theta) = \begin{cases} (\theta + 1)x^\theta & 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$
where $-1 < \theta$. A random sample of ten students yields data $x_1 = .92$, $x_2 = .79$, $x_3 = .90$, $x_4 = .65$, $x_5 = .86$, $x_6 = .47$, $x_7 = .73$, $x_8 = .97$, $x_9 = .94$, $x_{10} = .77$.
a. Use the method of moments to obtain an estimator of $\theta$, and then compute the estimate for this data.
b. Obtain the maximum likelihood estimator of $\theta$, and then compute the estimate for the given data.

23. Two different computer systems are monitored for a total of $n$ weeks. Let $X_i$ denote the number of breakdowns of the first system during the $i$th week, and suppose the $X_i$'s are independent and drawn from a Poisson distribution with parameter $\mu_1$. Similarly, let $Y_i$ denote the number of breakdowns of the second system during the $i$th week, and assume independence with each $Y_i$ Poisson with parameter $\mu_2$. Derive the mle's of $\mu_1$, $\mu_2$, and $\mu_1 - \mu_2$. [Hint: Using independence, write the joint pmf (likelihood) of the $X_i$'s and $Y_i$'s together.]

24. A vehicle with a particular defect in its emission control system is taken to a succession of randomly selected mechanics until $r = 3$ of them have correctly diagnosed the problem. Suppose that this requires diagnoses by 20 different mechanics (so there were 17 incorrect diagnoses). Let $p = P(\text{correct diagnosis})$, so $p$ is the proportion of all mechanics who would correctly diagnose the problem. What is the mle of $p$? Is it the same as the mle if a random sample of 20 mechanics results in 3 correct diagnoses? Explain. How does the mle compare to the estimate resulting from the use of the unbiased estimator given in Exercise 17?

25. The shear strength of each of ten test spot welds is determined, yielding the following data (psi):
392  376  401  367  389  362  409  415  358  375
a. Assuming that shear strength is normally distributed, estimate the true average shear strength and standard deviation of shear strength using the method of maximum likelihood.
b. Again assuming a normal distribution, estimate the strength value below which 95% of all welds will have their strengths. [Hint: What is the 95th percentile in terms of $\mu$ and $\sigma$? Now use the invariance principle.]
26. Refer to Exercise 25. Suppose we decide to examine another test spot weld. Let $X =$ shear strength of the weld. Use the given data to obtain the mle of $P(X \le 400)$. [Hint: $P(X \le 400) = \Phi((400 - \mu)/\sigma)$.]

27. Let $X_1, \ldots, X_n$ be a random sample from a gamma distribution with parameters $\alpha$ and $\beta$.
a. Derive the equations whose solutions yield the maximum likelihood estimators of $\alpha$ and $\beta$. Do you think they can be solved explicitly?
b. Show that the mle of $\mu = \alpha\beta$ is $\hat{\mu} = \bar{X}$.

28. Let $X_1, X_2, \ldots, X_n$ represent a random sample from the Rayleigh distribution with density function given in Exercise 15. Determine
a. The maximum likelihood estimator of $\theta$, and then calculate the estimate for the vibratory stress data given in that exercise. Is this estimator the same as the unbiased estimator suggested in Exercise 15?
b. The mle of the median of the vibratory stress distribution. [Hint: First express the median in terms of $\theta$.]

29. Consider a random sample $X_1, X_2, \ldots, X_n$ from the shifted exponential pdf
$$f(x; \lambda, \theta) = \begin{cases} \lambda e^{-\lambda(x-\theta)} & x \ge \theta \\ 0 & \text{otherwise} \end{cases}$$
Taking $\theta = 0$ gives the pdf of the exponential distribution considered previously (with positive density to the right of zero). An example of the shifted exponential distribution appeared in Example 4.5, in which the variable of interest was time headway in traffic flow and $\theta = .5$ was the minimum possible time headway.
a. Obtain the maximum likelihood estimators of $\theta$ and $\lambda$.
b. If $n = 10$ time headway observations are made, resulting in the values 3.11, .64, 2.55, 2.20, 5.44, 3.42, 10.39, 8.93, 17.82, and 1.30, calculate the estimates of $\theta$ and $\lambda$.

30. At time $t = 0$, 20 identical components are tested. The lifetime distribution of each is exponential with parameter $\lambda$. The experimenter then leaves the test facility unmonitored. On his return 24 hours later, the experimenter immediately terminates the test after noticing that $y = 15$ of the 20 components are still in operation (so 5 have failed). Derive the mle of $\lambda$. [Hint: Let $Y =$ the number that survive 24 hours. Then $Y \sim \text{Bin}(n, p)$. What is the mle of $p$? Now notice that $p = P(X_i \ge 24)$, where $X_i$ is exponentially distributed. This relates $\lambda$ to $p$, so the former can be estimated once the latter has been.]

SUPPLEMENTARY EXERCISES  (31–38)

31. An estimator $\hat{\theta}$ is said to be consistent if for any $\epsilon > 0$, $P(|\hat{\theta} - \theta| \ge \epsilon) \to 0$ as $n \to \infty$. That is, $\hat{\theta}$ is consistent if, as the sample size gets larger, it is less and less likely that $\hat{\theta}$ will be further than $\epsilon$ from the true value of $\theta$. Show that $\bar{X}$ is a consistent estimator of $\mu$ when $\sigma^2 < \infty$ by using Chebyshev's inequality from Exercise 44 of Chapter 3. [Hint: The inequality can be rewritten in the form $P(|Y - \mu_Y| \ge \epsilon) \le \sigma_Y^2/\epsilon^2$. Now identify $Y$ with $\bar{X}$.]

32. a. Let $X_1, \ldots, X_n$ be a random sample from a uniform distribution on $[0, \theta]$. Then the mle of $\theta$ is $\hat{\theta} = Y = \max(X_i)$. Use the fact that $Y \le y$ iff each $X_i \le y$ to derive the cdf of $Y$. Then show that the pdf of $Y = \max(X_i)$ is
$$f_Y(y) = \begin{cases} \dfrac{ny^{n-1}}{\theta^n} & 0 \le y \le \theta \\ 0 & \text{otherwise} \end{cases}$$
b. Use the result of part (a) to show that the mle is biased but that $(n+1)\max(X_i)/n$ is unbiased.

33. At time $t = 0$, there is one individual alive in a certain population. A pure birth process then unfolds as follows. The time until the first birth is exponentially distributed with parameter $\lambda$. After the first birth, there are two individuals alive. The time until the first gives birth again is exponential with parameter $\lambda$, and similarly for the second individual. Therefore, the time until the next birth is the minimum of two exponential ($\lambda$) variables, which is exponential with parameter $2\lambda$. Similarly, once the second birth has occurred, there are three individuals alive, so the time until the next birth is an exponential rv with parameter $3\lambda$, and so on (the memoryless property of the exponential distribution is being used here). Suppose the process is observed until the sixth birth has occurred and the successive birth times are 25.2, 41.7, 51.2, 55.5, 59.5, 61.8 (from which you should calculate the times between successive births). Derive the mle of $\lambda$. [Hint: The likelihood is a product of exponential terms.]

34. The mean squared error of an estimator $\hat{\theta}$ is $\text{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2]$. If $\hat{\theta}$ is unbiased, then $\text{MSE}(\hat{\theta}) = V(\hat{\theta})$, but in general $\text{MSE}(\hat{\theta}) = V(\hat{\theta}) + (\text{bias})^2$. Consider the estimator $\hat{\sigma}^2 = KS^2$, where $S^2 =$ sample variance. What value of $K$ minimizes the mean squared error of this estimator when the population distribution is normal? [Hint: It can be shown that $E[(S^2)^2] = (n+1)\sigma^4/(n-1)$. In general, it is difficult to find $\hat{\theta}$ to minimize $\text{MSE}(\hat{\theta})$, which is why we look only at unbiased estimators and minimize $V(\hat{\theta})$.]

35. Let $X_1, \ldots, X_n$ be a random sample from a pdf that is symmetric about $\mu$. An estimator for $\mu$ that has been found to perform well for a variety of underlying distributions is the Hodges–Lehmann estimator. To define it, first compute for each $i \le j$ and each $j = 1, 2, \ldots, n$ the pairwise average $\bar{X}_{i,j} = (X_i + X_j)/2$. Then the estimator is $\hat{\mu} =$ the median of the $\bar{X}_{i,j}$'s. Compute the value of this estimate using the data of Exercise 44 of Chapter 1. [Hint: Construct a square table with the $x_i$'s listed on the left margin and on top. Then compute averages on and above the diagonal.]

36. When the population distribution is normal, the statistic median$\{|X_1 - \tilde{X}|, \ldots, |X_n - \tilde{X}|\}/.6745$ can be used to estimate $\sigma$. This estimator is more resistant to the effects of outliers (observations far from the bulk of the data) than is the sample standard deviation. Compute both the corresponding point estimate and $s$ for the data of Example 6.2.

37. When the sample standard deviation $S$ is based on a random sample from a normal population distribution, it can be shown that
$$E(S) = \sqrt{2/(n-1)}\,\Gamma(n/2)\,\sigma/\Gamma((n-1)/2)$$
Use this to obtain an unbiased estimator for $\sigma$ of the form $cS$. What is $c$ when $n = 20$?

38. Each of $n$ specimens is to be weighed twice on the same scale. Let $X_i$ and $Y_i$ denote the two observed weights for the $i$th specimen. Suppose $X_i$ and $Y_i$ are independent of one another, each normally distributed with mean value $\mu_i$ (the true weight of specimen $i$) and variance $\sigma^2$.
a. Show that the maximum likelihood estimator of $\sigma^2$ is $\hat{\sigma}^2 = \sum(X_i - Y_i)^2/(4n)$. [Hint: If $\bar{z} = (z_1 + z_2)/2$, then $\sum(z_i - \bar{z})^2 = (z_1 - z_2)^2/2$.]
b. Is the mle $\hat{\sigma}^2$ an unbiased estimator of $\sigma^2$? Find an unbiased estimator of $\sigma^2$. [Hint: For any rv $Z$, $E(Z^2) = V(Z) + [E(Z)]^2$. Apply this to $Z = X_i - Y_i$.]

Bibliography

DeGroot, Morris, and Mark Schervish, Probability and Statistics (3rd ed.), Addison-Wesley, Boston, MA, 2002. Includes an excellent discussion of both general properties and methods of point estimation; of particular interest are examples showing how general principles and methods can yield unsatisfactory estimators in particular situations.
Devore, Jay, and Kenneth Berk, Modern Mathematical Statistics with Applications, Thomson-Brooks/Cole, Belmont, CA, 2007.
The exposition is a bit more comprehensive and sophisticated than that of the current book.
Efron, Bradley, and Robert Tibshirani, An Introduction to the Bootstrap, Chapman and Hall, New York, 1993. The bible of the bootstrap.
Hoaglin, David, Frederick Mosteller, and John Tukey, Understanding Robust and Exploratory Data Analysis, Wiley, New York, 1983. Contains several good chapters on robust point estimation, including one on M-estimation.
Rice, John, Mathematical Statistics and Data Analysis (3rd ed.), Thomson-Brooks/Cole, Belmont, CA, 2007. A nice blending of statistical theory and data.