$\sqrt{(.6)(.4)/25} = .098$. Alternatively, since the largest value of $pq$ is attained when $p = q = .5$, an upper bound on the standard error is $\sqrt{1/(4n)} = .10$.
■
When the point estimator $\hat{\theta}$ has approximately a normal distribution, which will often be the case when $n$ is large, then we can be reasonably confident that the true value of $\theta$ lies within approximately 2 standard errors (standard deviations) of $\hat{\theta}$. Thus if a sample of $n = 36$ component lifetimes gives $\hat{\mu} = \bar{x} = 28.50$ and $s = 3.60$, then $s/\sqrt{n} = .60$, so "within 2 estimated standard errors of $\hat{\mu}$" translates to the interval $28.50 \pm (2)(.60) = (27.30, 29.70)$.
If $\hat{\theta}$ is not necessarily approximately normal but is unbiased, then it can be shown that the estimate will deviate from $\theta$ by as much as 4 standard errors at most 6% of the time. We would then expect the true value to lie within 4 standard errors of $\hat{\theta}$ (and this is a very conservative statement, since it applies to any unbiased $\hat{\theta}$). Summarizing, the standard error tells us roughly within what distance of $\hat{\theta}$ we can expect the true value of $\theta$ to lie.
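The arithmetic in this passage can be checked with a few lines of Python. This is only a sketch: the function names are illustrative and not from the text, and the 4-standard-error bound is computed as $1/k^2$ with $k = 4$ (Chebyshev's inequality), which is where the "at most 6%" figure comes from.

```python
import math

def se_proportion(p_hat: float, n: int) -> float:
    """Estimated standard error of a sample proportion: sqrt(p_hat * q_hat / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

def se_mean(s: float, n: int) -> float:
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)

def two_se_interval(estimate: float, se: float) -> tuple:
    """Interval of all values within 2 standard errors of the estimate."""
    return (estimate - 2 * se, estimate + 2 * se)

def chebyshev_bound(k: float) -> float:
    """Upper bound on P(|estimator - theta| >= k standard errors), unbiased estimator."""
    return 1 / k ** 2

se_p = se_proportion(0.6, 25)               # about .098
se_p_max = math.sqrt(1 / (4 * 25))          # upper bound with p = q = .5: .10
se_xbar = se_mean(3.60, 36)                 # .60
interval = two_se_interval(28.50, se_xbar)  # about (27.30, 29.70)
bound = chebyshev_bound(4)                  # 1/16 = .0625, roughly 6%

print(round(se_p, 3), round(se_p_max, 2), round(se_xbar, 2))
print(tuple(round(v, 2) for v in interval), bound)
```

The normal-theory interval uses 2 standard errors, while the distribution-free statement needs 4 standard errors to reach a comparable level of assurance, which is why the text calls it very conservative.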
The form of the estimator $\hat{\theta}$ may be sufficiently complicated so that standard statistical theory cannot be applied to obtain an expression for $\sigma_{\hat{\theta}}$. This is true, for example, in the case $\theta = \sigma$, $\hat{\theta} = S$; the standard deviation of the statistic $S$, $\sigma_S$, cannot in general be determined. In recent years, a new computer-intensive method called the bootstrap has been introduced to address this problem. Suppose that the population pdf is $f(x; \theta)$, a member of a particular parametric family, and that data $x_1, x_2, \ldots, x_n$ gives $\hat{\theta} = 21.7$. We now use the computer to obtain "bootstrap samples" from the pdf $f(x; 21.7)$, and for each sample we calculate a "bootstrap estimate" $\hat{\theta}^*$:
CHAPTER 6 Point Estimation
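First bootstrap sample: $x_1^*, x_2^*, \ldots, x_n^*$; estimate $= \hat{\theta}_1^*$
Second bootstrap sample: $x_1^*, x_2^*, \ldots, x_n^*$; estimate $= \hat{\theta}_2^*$
⋮
$B$th bootstrap sample: $x_1^*, x_2^*, \ldots, x_n^*$; estimate $= \hat{\theta}_B^*$

$B = 100$ or 200 is often used. Now let $\bar{\theta}^* = \sum \hat{\theta}_i^* / B$, the sample mean of the bootstrap estimates. The bootstrap estimate of $\hat{\theta}$'s standard error is now just the sample standard deviation of the $\hat{\theta}_i^*$'s:

$$S_{\hat{\theta}} = \sqrt{\frac{1}{B - 1} \sum_{i=1}^{B} \left(\hat{\theta}_i^* - \bar{\theta}^*\right)^2}$$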
(In the bootstrap literature, B is often used in place of B ⫺ 1; for typical values of B, there is usually little difference between the resulting estimates.)
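The parametric bootstrap procedure just described can be sketched in Python as follows. Everything specific here is an assumption made for illustration: the function name `parametric_bootstrap_se`, the choice of a normal family with known standard deviation 5, the sample size $n = 20$, and the random seed. Only the fitted value 21.7 and $B = 200$ come from the text.

```python
import random
import statistics
from typing import Callable, List, Tuple

def parametric_bootstrap_se(
    draw_sample: Callable[[], List[float]],
    estimator: Callable[[List[float]], float],
    B: int = 200,
) -> Tuple[float, float]:
    """Return (mean of the B bootstrap estimates, bootstrap standard error)."""
    # One bootstrap estimate per sample simulated from the fitted pdf.
    estimates = [estimator(draw_sample()) for _ in range(B)]
    theta_bar_star = statistics.fmean(estimates)
    # statistics.stdev is the sample standard deviation (divisor B - 1).
    se_boot = statistics.stdev(estimates)
    return theta_bar_star, se_boot

# Hypothetical setup: normal family with unknown mean theta and known sd 5,
# fitted value theta_hat = 21.7, sample size n = 20, estimator = sample mean.
rng = random.Random(0)  # arbitrary seed, for reproducibility only
theta_hat, sigma, n = 21.7, 5.0, 20

theta_bar_star, se_boot = parametric_bootstrap_se(
    draw_sample=lambda: [rng.gauss(theta_hat, sigma) for _ in range(n)],
    estimator=statistics.fmean,
    B=200,
)
print(f"mean of bootstrap estimates: {theta_bar_star:.3f}")
print(f"bootstrap standard error:    {se_boot:.3f}")
```

In this assumed setup the exact standard error is known, $\sigma/\sqrt{n} = 5/\sqrt{20} \approx 1.118$, so the bootstrap value should land close to it. The method earns its keep when no such formula is available, as with $\hat{\theta} = S$.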
Example 6.11
A theoretical model suggests that $X$, the time to breakdown of an insulating fluid between electrodes at a particular voltage, has $f(x; \lambda) = \lambda e^{-\lambda x}$, an exponential distribution. A random sample of $n = 10$ breakdown times (min) gives the following data:
Since $E(X) = 1/\lambda$, $E(\bar{X}) = 1/\lambda$, so a reasonable estimate of $\lambda$ is $\hat{\lambda} = 1/\bar{x} = 1/55.087 = .018153$. We then used a statistical computer package to obtain $B = 100$ bootstrap samples, each of size 10, from $f(x; .018153)$. The first such sample was 41.00, 109.70, 16.78, 6.31, 6.76, 5.62, 60.96, 78.81, 192.25, 27.61, from which $\sum x_i^* = 545.8$ and $\hat{\lambda}_1^* = 1/54.58 = .01832$. The average of the 100 bootstrap estimates is $\bar{\lambda}^* = .02153$, and the sample standard deviation of these 100 estimates is $s_{\hat{\lambda}} = .0091$, the bootstrap estimate of $\hat{\lambda}$'s standard error. A histogram of the 100 $\hat{\lambda}_i^*$'s was somewhat positively skewed, suggesting that the sampling distribution of $\hat{\lambda}$ also has this property.
■
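The computations in Example 6.11 can be imitated with Python's standard library. This is a sketch rather than a reproduction of the statistical package used in the example: the seed is arbitrary, so the simulated average and standard error will not match the reported .02153 and .0091 exactly, only in rough magnitude. The sample mean 55.087 and the first bootstrap sample are taken from the example.

```python
import random
import statistics

n = 10
x_bar = 55.087            # sample mean reported in the example
lam_hat = 1 / x_bar       # estimate of lambda, about .018153

# First bootstrap sample quoted in the example, and its estimate 1 / (sample mean).
first_sample = [41.00, 109.70, 16.78, 6.31, 6.76, 5.62, 60.96, 78.81, 192.25, 27.61]
lam_star_1 = len(first_sample) / sum(first_sample)   # about .01832

# Simulate B = 100 bootstrap samples of size 10 from f(x; lam_hat).
rng = random.Random(2024)  # arbitrary seed (an assumption, not from the text)
B = 100
estimates = []
for _ in range(B):
    sample = [rng.expovariate(lam_hat) for _ in range(n)]  # rate parameter lam_hat
    estimates.append(n / sum(sample))                      # 1 / (bootstrap sample mean)

lam_bar_star = statistics.fmean(estimates)  # average of the bootstrap estimates
se_boot = statistics.stdev(estimates)       # bootstrap estimate of the standard error

print(f"lam_hat = {lam_hat:.6f}, first bootstrap estimate = {lam_star_1:.5f}")
print(f"average = {lam_bar_star:.5f}, bootstrap SE = {se_boot:.4f}")
```

Because $1/\bar{X}$ is a nonlinear function of the sample mean, the bootstrap estimates tend to average somewhat above $\hat{\lambda}$ and to be right-skewed, which is consistent with the histogram described in the example.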
Sometimes an investigator wishes to estimate a population characteristic without assuming that the population distribution belongs to a particular parametric family. An instance of this occurred in Example 6.7, where a 10% trimmed mean was proposed for estimating a symmetric population distribution's center $\theta$. The data of Example 6.2 gave $\hat{\theta} = \bar{x}_{tr(10)} = 27.838$, but now there is no assumed $f(x; \theta)$, so how can we obtain a bootstrap sample? The answer is to regard the sample itself as constituting the population (the $n = 20$ observations in Example 6.2) and take $B$ different samples, each of size $n$, with replacement from this population. The book by Bradley Efron and Robert Tibshirani or the one by John Rice listed in the chapter bibliography provides more information.
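A minimal sketch of this nonparametric version, resampling with replacement from the observed data, might look as follows. The data values below are hypothetical placeholders, not the observations of Example 6.2, and the helper names are illustrative. The trimmed mean here rounds the number of trimmed observations to the nearest integer, which is an assumption that matters only when the trimming fraction times $n$ is not a whole number.

```python
import random
import statistics

def trimmed_mean(xs, trim_fraction=0.10):
    """Average after dropping the smallest and largest trim_fraction of the data."""
    xs = sorted(xs)
    k = int(round(trim_fraction * len(xs)))  # number trimmed from each end
    kept = xs[k:len(xs) - k] if k > 0 else xs
    return statistics.fmean(kept)

def nonparametric_bootstrap_se(data, estimator, B=200, seed=None):
    """Bootstrap standard error: resample the data itself, with replacement."""
    rng = random.Random(seed)
    n = len(data)
    # Each bootstrap sample has the same size n as the original sample.
    estimates = [estimator(rng.choices(data, k=n)) for _ in range(B)]
    return statistics.stdev(estimates)

# Hypothetical sample of n = 20 observations (NOT the data of Example 6.2).
data = [23.1, 24.8, 25.2, 25.9, 26.3, 26.7, 27.0, 27.2, 27.5, 27.8,
        28.0, 28.1, 28.4, 28.6, 28.9, 29.2, 29.5, 29.9, 30.6, 31.4]

theta_hat = trimmed_mean(data)
se_boot = nonparametric_bootstrap_se(data, trimmed_mean, B=200, seed=1)
print(f"10% trimmed mean = {theta_hat:.3f}, bootstrap SE = {se_boot:.3f}")
```

The only change from the parametric case is the source of the bootstrap samples: `rng.choices` draws from the observed values rather than from a fitted pdf.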
EXERCISES Section 6.1 (1–19)
1. The accompanying data on flexural strength (MPa) for concrete beams of a certain type was introduced in Example 1.2.

5.9 7.2 7.3 6.3 8.1 6.8 7.0
7.6 6.8 6.5 7.0 6.3 7.9 9.0
8.2 8.7 7.8 9.7 7.4 7.7 9.7

a. Calculate a point estimate of the mean value of strength for the conceptual population of all beams manufactured in this fashion, and state which estimator you used. [Hint: $\sum x_i = 219.8$.]
b. Calculate a point estimate of the strength value that separates the weakest 50% of all such beams from the strongest 50%, and state which estimator you used.
6.1 Some General Concepts of Point Estimation
c. Calculate and interpret a point estimate of the population standard deviation $\sigma$. Which estimator did you use? [Hint: $\sum x_i^2 =$ …]
d. Calculate a point estimate of the proportion of all such beams whose flexural strength exceeds 10 MPa. [Hint: Think of an observation as a "success" if it exceeds 10.]
e. Calculate a point estimate of the population coefficient of variation $\sigma/\mu$, and state which estimator you used.

2. A sample of 20 students who had recently taken elementary statistics yielded the following information on the brand of calculator owned (T = Texas Instruments, H = Hewlett Packard, C = Casio, S = Sharp):

T
H T C T
S C H

a. Estimate the true proportion of all such students who own a Texas Instruments calculator.
b. Of the 10 students who owned a TI calculator, 4 had graphing calculators. Estimate the proportion of students who do not own a TI graphing calculator.

3. Consider the following sample of observations on coating thickness for low-viscosity paint ("Achieving a Target Value for a Manufacturing Process: A Case Study," J. of Quality Technology, 1992: 22–26):

Assume that the distribution of coating thickness is normal (a normal probability plot strongly supports this assumption).
a. Calculate a point estimate of the mean value of coating thickness, and state which estimator you used.
b. Calculate a point estimate of the median of the coating thickness distribution, and state which estimator you used.
c. Calculate a point estimate of the value that separates the largest 10% of all values in the thickness distribution from the remaining 90%, and state which estimator you used. [Hint: Express what you are trying to estimate in terms of $\mu$ and $\sigma$.]
d. Estimate $P(X < 1.5)$, i.e., the proportion of all thickness values less than 1.5. [Hint: If you knew the values of $\mu$ and $\sigma$, you could calculate this probability. These values are not available, but they can be estimated.]
e. What is the estimated standard error of the estimator that you used in part (b)?

4. The article from which the data in Exercise 1 was extracted also gave the accompanying strength observations for cylinders:

6.1 5.8 7.8 7.1 7.2 9.2 6.6 8.3 7.0 8.3

Prior to obtaining data, denote the beam strengths by $X_1, \ldots, X_m$ and the cylinder strengths by $Y_1, \ldots, Y_n$. Suppose that the $X_i$'s constitute a random sample from a distribution with mean $\mu_1$ and standard deviation $\sigma_1$ and that the $Y_i$'s form a random sample (independent of the $X_i$'s) from another distribution with mean $\mu_2$ and standard deviation $\sigma_2$.
a. Use rules of expected value to show that $\bar{X} - \bar{Y}$ is an unbiased estimator of $\mu_1 - \mu_2$. Calculate the estimate for the given data.
b. Use rules of variance from Chapter 5 to obtain an expression for the variance and standard deviation (standard error) of the estimator in part (a), and then compute the estimated standard error.
c. Calculate a point estimate of the ratio $\sigma_1/\sigma_2$ of the two standard deviations.
d. Suppose a single beam and a single cylinder are randomly selected. Calculate a point estimate of the variance of the difference $X - Y$ between beam strength and cylinder strength.

5. As an example of a situation in which several different statistics could reasonably be used to calculate a point estimate, consider a population of $N$ invoices. Associated with each invoice is its "book value," the recorded amount of that invoice. Let $T$ denote the total book value, a known amount. Some of these book values are erroneous. An audit will be carried out by randomly selecting $n$ invoices and determining the audited (correct) value for each one. Suppose that the sample gives the following results (in dollars).

Invoice
Book value
Audited value

Let
$\bar{Y}$ = sample mean book value
$\bar{X}$ = sample mean audited value
$\bar{D}$ = sample mean error

Propose three different statistics for estimating the total audited (i.e., correct) value: one involving just $N$ and $\bar{X}$, another involving $T$, $N$, and $\bar{D}$, and the last involving $T$ and $\bar{X}/\bar{Y}$. If $N = 5000$ and $T = 1{,}761{,}300$, calculate the three corresponding point estimates. (The article "Statistical Models and Analysis in Auditing," Statistical Science, 1989: 2–33 discusses properties of these estimators.)

6. Consider the accompanying observations on stream flow (1000s of acre-feet) recorded at a station in Colorado for the period April 1–August 31 over a 31-year span (from an article in the 1974 volume of Water Resources Research).

127.96 210.07 203.24 108.91 178.21
An appropriate probability plot supports the use of the lognormal distribution (see Section 4.5) as a reasonable model for stream flow.
a. Estimate the parameters of the distribution. [Hint: Remember that $X$ has a lognormal distribution with parameters $\mu$ and $\sigma^2$ if $\ln(X)$ is normally distributed with mean $\mu$ and variance $\sigma^2$.]
b. Use the estimates of part (a) to calculate an estimate of the expected value of stream flow. [Hint: What is $E(X)$?]

7. a. A random sample of 10 houses in a particular area, each of which is heated with natural gas, is selected and the amount of gas (therms) used during the month of January is determined for each house. The resulting observations are 103, 156, 118, 89, 125, 147, 122, 109, 138, 99. Let $\mu$ denote the average gas usage during January by all houses in this area. Compute a point estimate of $\mu$.
b. Suppose there are 10,000 houses in this area that use natural gas for heating. Let $t$ denote the total amount of gas used by all of these houses during January. Estimate $t$ using the data of part (a). What estimator did you use in computing your estimate?
c. Use the data in part (a) to estimate $p$, the proportion of all houses that used at least 100 therms.
d. Give a point estimate of the population median usage (the middle value in the population of all houses) based on the sample of part (a). What estimator did you use?

8. In a random sample of 80 components of a certain type, 12 are found to be defective.
a. Give a point estimate of the proportion of all such components that are not defective.
b. A system is to be constructed by randomly selecting two of these components and connecting them in series, as shown here.
The series connection implies that the system will function if and only if neither component is defective (i.e., both components work properly). Estimate the proportion of all such systems that work properly. [Hint: If $p$ denotes the probability that a component works properly, how can $P$(system works) be expressed in terms of $p$?]

9. Each of 150 newly manufactured items is examined and the number of scratches per item is recorded (the items are supposed to be free of scratches), yielding the following data:

Number of scratches per item:   0   1   2   3   4   5   6   7
Observed frequency:            18  37  42  30  13   7   2   1

Let $X$ = the number of scratches on a randomly chosen item, and assume that $X$ has a Poisson distribution with parameter $\mu$.
a. Find an unbiased estimator of $\mu$ and compute the estimate for the data. [Hint: $E(X) = \mu$ for $X$ Poisson, so $E(\bar{X}) = ?$]
b. What is the standard deviation (standard error) of your estimator? Compute the estimated standard error. [Hint: $\sigma_X^2 = \mu$ for $X$ Poisson.]

10. Using a long rod that has length $\mu$, you are going to lay out a square plot in which the length of each side is $\mu$. Thus the area of the plot will be $\mu^2$. However, you do not know the value of $\mu$, so you decide to make $n$ independent measurements $X_1, X_2, \ldots, X_n$ of the length. Assume that each $X_i$ has mean $\mu$ (unbiased measurements) and variance $\sigma^2$.
a. Show that $\bar{X}^2$ is not an unbiased estimator for $\mu^2$. [Hint: For any rv $Y$, $E(Y^2) = V(Y) + [E(Y)]^2$. Apply this with $Y = \bar{X}$.]
b. For what value of $k$ is the estimator $\bar{X}^2 - kS^2$ unbiased for $\mu^2$? [Hint: Compute $E(\bar{X}^2 - kS^2)$.]

11. Of $n_1$ randomly selected male smokers, $X_1$ smoked filter cigarettes, whereas of $n_2$ randomly selected female smokers, $X_2$ smoked filter cigarettes. Let $p_1$ and $p_2$ denote the probabilities that a randomly selected male and female, respectively, smoke filter cigarettes.
a. Show that $(X_1/n_1) - (X_2/n_2)$ is an unbiased estimator for $p_1 - p_2$. [Hint: $E(X_i) = n_i p_i$ for $i = 1, 2$.]
b. What is the standard error of the estimator in part (a)?
c. How would you use the observed values $x_1$ and $x_2$ to estimate the standard error of your estimator?
d. If $n_1 = n_2 = 200$, $x_1 = 127$, and $x_2 = 176$, use the estimator of part (a) to obtain an estimate of $p_1 - p_2$.
e. Use the result of part (c) and the data of part (d) to estimate the standard error of the estimator.

12. Suppose a certain type of fertilizer has an expected yield per acre of $\mu_1$ with variance $\sigma^2$, whereas the expected yield for a second type of fertilizer is $\mu_2$ with the same variance $\sigma^2$. Let $S_1^2$ and $S_2^2$ denote the sample variances of yields based on sample sizes $n_1$ and $n_2$, respectively, of the two fertilizers. Show that the pooled (combined) estimator

$$\hat{\sigma}^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$

is an unbiased estimator of $\sigma^2$.

13. Consider a random sample $X_1, \ldots, X_n$ from the pdf

$$f(x; \theta) = .5(1 + \theta x) \qquad -1 \le x \le 1$$

where $-1 \le \theta \le 1$ (this distribution arises in particle physics). Show that $\hat{\theta} = 3\bar{X}$ is an unbiased estimator of $\theta$. [Hint: First determine $\mu = E(X) = E(\bar{X})$.]

14. A sample of $n$ captured Pandemonium jet fighters results in serial numbers $x_1, x_2, x_3, \ldots, x_n$. The CIA knows that the aircraft were numbered consecutively at the factory starting with $a$ and ending with $b$, so that the total number of planes manufactured is $b - a + 1$ (e.g., if $a = 17$ and $b = 29$, then $29 - 17 + 1 = 13$ planes having serial numbers 17, 18, 19, . . . , 28, 29 were manufactured). However, the CIA does not know the values of $a$ or $b$. A CIA statistician suggests using the
6.2 Methods of Point Estimation
estimator $\max(X_i) - \min(X_i) + 1$ to estimate the total number of planes manufactured.
a. If $n = 5$, $x_1 = 237$, $x_2 = 375$, $x_3 = 202$, $x_4 = 525$, and $x_5 = 418$, what is the corresponding estimate?
b. Under what conditions on the sample will the value of the estimate be exactly equal to the true total number of planes? Will the estimate ever be larger than the true total? Do you think the estimator is unbiased for estimating $b - a + 1$? Explain in one or two sentences.

15. Let $X_1, X_2, \ldots, X_n$ represent a random sample from a Rayleigh distribution with pdf

$$f(x; \theta) = \frac{x}{\theta} e^{-x^2/(2\theta)} \qquad x > 0$$

a. It can be shown that $E(X^2) = 2\theta$. Use this fact to construct an unbiased estimator of $\theta$ based on $\sum X_i^2$ (and use rules of expected value to show that it is unbiased).
b. Estimate $\theta$ from the following $n = 10$ observations on vibratory stress of a turbine blade under specified conditions:

16.88 10.23 4.59 6.66 13.68
14.23 19.87 9.40 6.51 10.95

16. Suppose the true average growth $\mu$ of one type of plant during a 1-year period is identical to that of a second type, but the variance of growth for the first type is $\sigma^2$, whereas for the second type the variance is $4\sigma^2$. Let $X_1, \ldots, X_m$ be $m$ independent growth observations on the first type [so $E(X_i) = \mu$, $V(X_i) = \sigma^2$], and let $Y_1, \ldots, Y_n$ be $n$ independent growth observations on the second type [$E(Y_i) = \mu$, $V(Y_i) = 4\sigma^2$].
a. Show that for any $d$ between 0 and 1, the estimator $\hat{\mu} = d\bar{X} + (1 - d)\bar{Y}$ is unbiased for $\mu$.
b. For fixed $m$ and $n$, compute $V(\hat{\mu})$, and then find the value of $d$ that minimizes $V(\hat{\mu})$. [Hint: Differentiate $V(\hat{\mu})$ with respect to $d$.]

17. In Chapter 3, we defined a negative binomial rv as the number of failures that occur before the $r$th success in a sequence of independent and identical success/failure trials. The probability mass function (pmf) of $X$ is

$$nb(x; r, p) = \binom{x + r - 1}{x} p^r (1 - p)^x \qquad x = 0, 1, 2, \ldots$$

a. Suppose that $r \ge 2$. Show that

$$\hat{p} = (r - 1)/(X + r - 1)$$

is an unbiased estimator for $p$. [Hint: Write out $E(\hat{p})$ and cancel $x + r - 1$ inside the sum.]
b. A reporter wishing to interview five individuals who support a certain candidate begins asking people whether (S) or not (F) they support the candidate. If the sequence of responses is SFFSFFFSSS, estimate $p$ = the true proportion who support the candidate.

18. Let $X_1, X_2, \ldots, X_n$ be a random sample from a pdf $f(x)$ that is symmetric about $\mu$, so that $\tilde{X}$ is an unbiased estimator of $\mu$. If $n$ is large, it can be shown that $V(\tilde{X}) \approx 1/(4n[f(\mu)]^2)$.
a. Compare $V(\tilde{X})$ to $V(\bar{X})$ when the underlying distribution is normal.
b. When the underlying pdf is Cauchy (see Example 6.7), $V(\bar{X}) = \infty$, so $\bar{X}$ is a terrible estimator. What is $V(\tilde{X})$ in this case when $n$ is large?

19. An investigator wishes to estimate the proportion of students at a certain university who have violated the honor code. Having obtained a random sample of $n$ students, she realizes that asking each, "Have you violated the honor code?" will probably result in some untruthful responses. Consider the following scheme, called a randomized response technique. The investigator makes up a deck of 100 cards, of which 50 are of type I and 50 are of type II.
Type I: Have you violated the honor code (yes or no)?
Type II: Is the last digit of your telephone number a 0, 1, or 2 (yes or no)?
Each student in the random sample is asked to mix the deck, draw a card, and answer the resulting question truthfully. Because of the irrelevant question on type II cards, a yes response no longer stigmatizes the respondent, so we assume that responses are truthful. Let $p$ denote the proportion of honor-code violators (i.e., the probability of a randomly selected student being a violator), and let $\lambda = P$(yes response). Then $\lambda$ and $p$ are related by $\lambda = .5p + (.5)(.3)$.
a. Let $Y$ denote the number of yes responses, so $Y \sim \text{Bin}(n, \lambda)$. Thus $Y/n$ is an unbiased estimator of $\lambda$. Derive an estimator for $p$ based on $Y$. If $n = 80$ and $y = 20$, what is your estimate? [Hint: Solve $\lambda = .5p + .15$ for $p$ and then substitute $Y/n$ for $\lambda$.]
b. Use the fact that $E(Y/n) = \lambda$ to show that your estimator $\hat{p}$ is unbiased.
c. If there were 70 type I and 30 type II cards, what would be your estimator for $p$?