6.1 Some General Concepts of Point Estimation
Statistical inference is almost always directed toward drawing some type of conclusion about one or more parameters (population characteristics). To do so requires that an investigator obtain sample data from each of the populations under study. Conclusions can then be based on the computed values of various sample quantities. For example, let $\mu$ (a parameter) denote the true average breaking strength of wire connections used in bonding semiconductor wafers. A random sample of $n = 10$ connections might be made, and the breaking strength of each one determined, resulting in observed strengths $x_1, x_2, \ldots, x_{10}$. The sample mean breaking strength $\bar{x}$ could then be used to draw a conclusion about the value of $\mu$. Similarly, if $\sigma^2$ is the variance of the breaking strength distribution (population variance, another parameter), the value of the sample variance $s^2$ can be used to infer something about $\sigma^2$.
When discussing general concepts and methods of inference, it is convenient to have a generic symbol for the parameter of interest. We will use the Greek letter $\theta$ for this purpose. In many investigations, $\theta$ will be a population mean $\mu$, a difference $\mu_1 - \mu_2$ between two population means, or a population proportion of "successes" $p$. The objective of point estimation is to select a single number, based on sample data, that represents a sensible value for $\theta$. As an example, the parameter of interest might be $\mu$, the true average lifetime of batteries of a certain type. A random sample of $n = 3$ batteries might yield observed lifetimes (hours) $x_1 = 5.0$, $x_2 = 6.4$, $x_3 = 5.9$. The computed value of the sample mean lifetime is $\bar{x} = 5.77$, and it is reasonable to regard 5.77 as a very plausible value of $\mu$, our "best guess" for the value of $\mu$ based on the available sample information.
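The arithmetic behind this point estimate is simply the sample mean; as a minimal sketch in Python (the three lifetimes are those given above):

```python
# Point estimate of mu: the sample mean of the observed battery lifetimes
lifetimes = [5.0, 6.4, 5.9]            # observed lifetimes (hours)
x_bar = sum(lifetimes) / len(lifetimes)
print(round(x_bar, 2))                 # 5.77
```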
Suppose we want to estimate a parameter of a single population (e.g., $\mu$ or $\sigma$) based on a random sample of size $n$. Recall from the previous chapter that before data is available, the sample observations must be considered random variables (rv's) $X_1, X_2, \ldots, X_n$. It follows that any function of the $X_i$'s, that is, any statistic, such as the sample mean $\bar{X}$ or sample standard deviation $S$, is also a random variable. The same is true if available data consists of more than one sample. For example, we can represent tensile strengths of $m$ type 1 specimens and $n$ type 2 specimens by $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$, respectively. The difference between the two sample mean strengths is $\bar{X} - \bar{Y}$; this is the natural statistic for making inferences about $\mu_1 - \mu_2$, the difference between the population mean strengths.
DEFINITION A point estimate of a parameter $\theta$ is a single number that can be regarded as a sensible value for $\theta$. It is obtained by selecting a suitable statistic and computing its value from the given sample data. The selected statistic is called the point estimator of $\theta$.
In the foregoing battery example, the estimator used to obtain the point estimate of $\mu$ was $\bar{X}$, and the point estimate of $\mu$ was 5.77. If the three observed lifetimes had instead been $x_1 = 5.6$, $x_2 = 4.5$, and $x_3 = 6.1$, use of the estimator $\bar{X}$ would have resulted in the estimate $\bar{x} = (5.6 + 4.5 + 6.1)/3 = 5.40$. The symbol $\hat{\theta}$ ("theta hat") is customarily used to denote both the estimator of $\theta$ and the point estimate resulting from a given sample.* Thus $\hat{\mu} = \bar{X}$ is read as "the point estimator of $\mu$ is the sample mean $\bar{X}$." The statement "the point estimate of $\mu$ is 5.77" can be written concisely as $\hat{\mu} = 5.77$. Notice that in writing $\hat{\theta} = 72.5$, there is no indication of how this point estimate was obtained (what statistic was used). It is recommended that both the estimator and the resulting estimate be reported.

*Following earlier notation, we could use $\hat{\Theta}$ (an uppercase theta) for the estimator, but this is cumbersome to write.
EXAMPLE 6.1
An automobile manufacturer has developed a new type of bumper, which is supposed to absorb impacts with less damage than previous bumpers. The manufacturer has used this bumper in a sequence of 25 controlled crashes against a wall, each at 10 mph, using one of its compact car models. Let $X$ = the number of crashes that result in no visible damage to the automobile. The parameter to be estimated is $p$ = the proportion of all such crashes that result in no damage [alternatively, $p = P(\text{no damage in a single crash})$]. If $X$ is observed to be $x = 15$, the most reasonable estimator and estimate are
$$\text{estimator } \hat{p} = \frac{X}{n}, \qquad \text{estimate} = \frac{x}{n} = \frac{15}{25} = 0.60$$
If for each parameter of interest there were only one reasonable point estimator, there would not be much to point estimation. In most problems, though, there will be more than one reasonable estimator.
EXAMPLE 6.2
Consider the accompanying 20 observations on dielectric breakdown voltage for pieces of epoxy resin, first introduced in Exercise 4.89:

24.46  25.61  26.25  26.42  26.66  27.15  27.31  27.54  27.74  27.94
27.98  28.04  28.28  28.49  28.50  28.87  29.11  29.13  29.50  30.88

The pattern in the normal probability plot given there is quite straight, so we now assume that the distribution of breakdown voltage is normal with mean value $\mu$. Because normal distributions are symmetric, $\mu$ is also the median of the distribution. The given observations are then assumed to be the result of a random sample $X_1, X_2, \ldots, X_{20}$ from this normal distribution. Consider the following estimators and resulting estimates for $\mu$:
a. Estimator $= \bar{X}$, estimate $= \bar{x} = \sum x_i / n = 555.86/20 = 27.793$
b. Estimator $= \tilde{X}$ (the sample median), estimate $= \tilde{x} = (27.94 + 27.98)/2 = 27.960$
c. Estimator $= [\min(X_i) + \max(X_i)]/2$, the average of the two extreme observations, estimate $= [\min(x_i) + \max(x_i)]/2 = (24.46 + 30.88)/2 = 27.670$
d. Estimator $= \bar{X}_{tr(10)}$, the 10% trimmed mean (discard the smallest and largest 10% of the sample and then average), estimate $= \bar{x}_{tr(10)} = (555.86 - 24.46 - 25.61 - 29.50 - 30.88)/16 = 27.838$
Each one of the estimators (a)-(d) uses a different measure of the center of the sample to estimate $\mu$. Which of the estimates is closest to the true value? This question cannot be answered without knowing the true value. A question that can be answered is, "Which estimator, when used on other samples of $X_i$'s, will tend to produce estimates closest to the true value?" We will shortly address this issue.
■
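To make the four estimates concrete, here is a minimal Python sketch (standard library only) that computes each one from the 20 observations listed above; the numbers it prints match (a)-(d):

```python
import statistics

# The 20 dielectric breakdown voltages from Example 6.2
x = [24.46, 25.61, 26.25, 26.42, 26.66, 27.15, 27.31, 27.54, 27.74, 27.94,
     27.98, 28.04, 28.28, 28.49, 28.50, 28.87, 29.11, 29.13, 29.50, 30.88]

mean = sum(x) / len(x)                 # (a) sample mean: 27.793
median = statistics.median(x)          # (b) sample median: 27.960
midrange = (min(x) + max(x)) / 2       # (c) average of the two extremes: 27.670

# (d) 10% trimmed mean: drop the smallest 10% and largest 10% (2 values each)
k = round(0.10 * len(x))
trimmed = sorted(x)[k:-k]
tr_mean = sum(trimmed) / len(trimmed)  # 27.838

print(mean, median, midrange, tr_mean)
```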
EXAMPLE 6.3
The article "Is a Normal Distribution the Most Appropriate Statistical Distribution for Volumetric Properties in Asphalt Mixtures?", first cited in Example 4.26, reported observations on $X$ = voids filled with asphalt (%) for 52 specimens of a certain type of hot-mix asphalt.
Let's estimate the variance $\sigma^2$ of the population distribution. A natural estimator is the sample variance:
$$\hat{\sigma}^2 = S^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$$
Minitab gave the following output from a request to display descriptive statistics:

Variable  Count  Mean    SE Mean  StDev  Variance  Q1      Median  Q3
VFA(B)    52     73.880  0.889    6.413  41.126    67.933  74.855  79.470

Thus the point estimate of the population variance is
$$\hat{\sigma}^2 = s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 41.126$$
[alternatively, the computational formula for the numerator of $s^2$ gives
$$S_{xx} = \sum x_i^2 - \left(\sum x_i\right)^2 / n = 285{,}929.5964 - (3841.78)^2/52 = 2097.4124].$$
A point estimate of the population standard deviation is then $\hat{\sigma} = s = \sqrt{41.126} = 6.413$.
An alternative estimator results from using the divisor $n$ rather than $n - 1$:
$$\hat{\sigma}^2 = \frac{\sum (X_i - \bar{X})^2}{n}, \qquad \text{estimate} = \frac{2097.4124}{52} = 40.335$$
We will shortly indicate why many statisticians prefer $S^2$ to this latter estimator.
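Even without the raw 52 observations, both estimates can be verified from the summary sums quoted above; a quick Python check:

```python
n = 52
sum_x = 3841.78          # sum of the observations (from the text)
sum_x2 = 285929.5964     # sum of the squared observations (from the text)

sxx = sum_x2 - sum_x**2 / n   # numerator of s^2: about 2097.41
print(sxx / (n - 1))          # divisor n - 1: about 41.126
print(sxx / n)                # divisor n: about 40.335
```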
The cited article considered fitting four different distributions to the data: normal, lognormal, two-parameter Weibull, and three-parameter Weibull. Several different techniques were used to conclude that the two-parameter Weibull provided the best fit (a normal probability plot of the data shows some deviation from a linear pattern). From Section 4.5, the variance of a Weibull random variable is
$$\sigma^2 = \beta^2 \left\{ \Gamma(1 + 2/\alpha) - \left[\Gamma(1 + 1/\alpha)\right]^2 \right\}$$
where $\alpha$ and $\beta$ are the shape and scale parameters of the distribution. The authors of the article used the method of maximum likelihood (see Section 6.2) to estimate these parameters. The resulting estimates were $\hat{\alpha} = 11.9731$, $\hat{\beta} = 77.0153$. A sensible estimate of the population variance can now be obtained by substituting the estimates of the two parameters into the expression for $\sigma^2$; the result is $\hat{\sigma}^2 = 56.035$. This latter estimate is obviously quite different from the sample variance. Its validity depends on the population distribution being Weibull, whereas the sample variance is a sensible way to estimate $\sigma^2$ when there is uncertainty as to the specific form of the population distribution.
■
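The plug-in step at the end of the example is easy to reproduce; a short sketch using Python's math.gamma with the maximum likelihood estimates quoted above:

```python
import math

alpha_hat = 11.9731   # estimated Weibull shape parameter
beta_hat = 77.0153    # estimated Weibull scale parameter

# sigma^2 = beta^2 * { Gamma(1 + 2/alpha) - [Gamma(1 + 1/alpha)]^2 }
var_hat = beta_hat**2 * (math.gamma(1 + 2 / alpha_hat)
                         - math.gamma(1 + 1 / alpha_hat)**2)
print(var_hat)        # approximately 56.0, matching the 56.035 reported above
```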
In the best of all possible worlds, we could find an estimator $\hat{\theta}$ for which $\hat{\theta} = \theta$ always. However, $\hat{\theta}$ is a function of the sample $X_i$'s, so it is a random variable. For some samples, $\hat{\theta}$ will yield a value larger than $\theta$, whereas for other samples $\hat{\theta}$ will underestimate $\theta$. If we write
$$\hat{\theta} = \theta + \text{error of estimation}$$
then an accurate estimator would be one resulting in small estimation errors, so that estimated values will be near the true value.
A sensible way to quantify the idea of $\hat{\theta}$ being close to $\theta$ is to consider the squared error $(\hat{\theta} - \theta)^2$. For some samples, $\hat{\theta}$ will be quite close to $\theta$ and the resulting squared error will be near 0. Other samples may give values of $\hat{\theta}$ far from $\theta$, corresponding to very large squared errors. An omnibus measure of accuracy is the expected or mean square error $\mathrm{MSE} = E[(\hat{\theta} - \theta)^2]$. If a first estimator has smaller MSE than does a second, it is natural to say that the first estimator is the better one. However, MSE will generally depend on the value of $\theta$. What often happens is that one estimator will have a smaller MSE for some values of $\theta$ and a larger MSE for other values. Finding an estimator with the smallest MSE is typically not possible.
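The dependence of MSE on the sampling distribution can be explored by simulation. As an illustrative sketch (the normal population, sample size, and seed are choices made here, not part of the text): for normal data the sample mean has smaller MSE than the sample median.

```python
import random
import statistics

random.seed(1)
mu, sigma, n, reps = 10.0, 2.0, 15, 10_000

se_mean = se_median = 0.0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    se_mean += (statistics.mean(sample) - mu) ** 2
    se_median += (statistics.median(sample) - mu) ** 2

print("MSE of sample mean:  ", se_mean / reps)    # near sigma^2/n = 0.267
print("MSE of sample median:", se_median / reps)  # noticeably larger
```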
One way out of this dilemma is to restrict attention just to estimators that have some specified desirable property and then find the best estimator in this restricted group. A popular property of this sort in the statistical community is unbiasedness.
Unbiased Estimators
Suppose we have two measuring instruments; one instrument has been accurately calibrated, but the other systematically gives readings larger than the true value being measured. When each instrument is used repeatedly on the same object, because of measurement error, the observed measurements will not be identical. However, the measurements produced by the first instrument will be distributed about the true value in such a way that on average this instrument measures what it purports to measure, so it is called an unbiased instrument. The second instrument yields observations that have a systematic error component or bias. Figure 6.1 shows 10 measurements from both an unbiased and a biased instrument.
Figure 6.1 Measurements from (a) an unbiased instrument, and (b) a biased instrument
DEFINITION A point estimator $\hat{\theta}$ is said to be an unbiased estimator of $\theta$ if $E(\hat{\theta}) = \theta$ for every possible value of $\theta$. If $\hat{\theta}$ is not unbiased, the difference $E(\hat{\theta}) - \theta$ is called the bias of $\hat{\theta}$.

That is, $\hat{\theta}$ is unbiased if its probability (i.e., sampling) distribution is always "centered" at the true value of the parameter. Suppose $\hat{\theta}$ is an unbiased estimator; then if $\theta = 100$, the $\hat{\theta}$ sampling distribution is centered at 100; if $\theta = 27.5$, then the $\hat{\theta}$ sampling distribution is centered at 27.5, and so on. Figure 6.2 pictures the distributions of several biased and unbiased estimators. Note that "centered" here means that the expected value, not the median, of the distribution of $\hat{\theta}$ is equal to $\theta$.
Figure 6.2 The pdf's of a biased estimator $\hat{\theta}_1$ and an unbiased estimator $\hat{\theta}_2$ for a parameter $\theta$
It may seem as though it is necessary to know the value of $\theta$ (in which case estimation is unnecessary) to see whether $\hat{\theta}$ is unbiased. This is not usually the case, though, because unbiasedness is a general property of the estimator's sampling distribution (where it is centered), which is typically not dependent on any particular parameter value.
In Example 6.1, the sample proportion $X/n$ was used as an estimator of $p$, where $X$, the number of sample successes, had a binomial distribution with parameters $n$ and $p$. Thus
$$E(\hat{p}) = E\left(\frac{X}{n}\right) = \frac{1}{n} E(X) = \frac{1}{n}(np) = p$$
PROPOSITION When $X$ is a binomial rv with parameters $n$ and $p$, the sample proportion $\hat{p} = X/n$ is an unbiased estimator of $p$.
No matter what the true value of $p$ is, the distribution of the estimator $\hat{p}$ will be centered at the true value.
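This centering is easy to see empirically. A brief simulation sketch ($n = 25$ as in Example 6.1; the true $p = 0.7$ and the seed are arbitrary choices here):

```python
import random

random.seed(42)
n, p, reps = 25, 0.7, 100_000

# Average the sample proportion p_hat = X/n over many binomial samples
total = sum(sum(random.random() < p for _ in range(n)) / n
            for _ in range(reps))
print(total / reps)   # close to 0.7, reflecting E(p_hat) = p
```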
EXAMPLE 6.4
Suppose that $X$, the reaction time to a certain stimulus, has a uniform distribution on the interval from 0 to an unknown upper limit $\theta$ (so the density function of $X$ is rectangular in shape with height $1/\theta$ for $0 \le x \le \theta$). It is desired to estimate $\theta$ on the basis of a random sample $X_1, X_2, \ldots, X_n$ of reaction times. Since $\theta$ is the largest possible time in the entire population of reaction times, consider as a first estimator the largest sample reaction time: $\hat{\theta}_1 = \max(X_1, \ldots, X_n)$. If $n = 5$ and $x_1 = 4.2$, $x_2 = 1.7$, $x_3 = 2.4$, $x_4 = 3.9$, and $x_5 = 1.3$, the point estimate of $\theta$ is $\hat{\theta}_1 = \max(4.2, 1.7, 2.4, 3.9, 1.3) = 4.2$.
Unbiasedness implies that some samples will yield estimates that exceed $\theta$ and other samples will yield estimates smaller than $\theta$; otherwise $\theta$ could not possibly be the center (balance point) of $\hat{\theta}_1$'s distribution. However, our proposed estimator will never overestimate $\theta$ (the largest sample value cannot exceed the largest population value) and will underestimate $\theta$ unless the largest sample value equals $\theta$. This intuitive argument shows that $\hat{\theta}_1$ is a biased estimator. More precisely, it can be shown (see Exercise 32) that
$$E(\hat{\theta}_1) = \frac{n}{n + 1} \cdot \theta < \theta$$
The bias of $\hat{\theta}_1$ is given by $n\theta/(n + 1) - \theta = -\theta/(n + 1)$, which approaches 0 as $n$ gets large.
It is easy to modify $\hat{\theta}_1$ to obtain an unbiased estimator of $\theta$. Consider the estimator
$$\hat{\theta}_2 = \frac{n + 1}{n} \cdot \max(X_1, \ldots, X_n)$$
Using this estimator on the data gives the estimate $(6/5)(4.2) = 5.04$. The fact that $(n + 1)/n > 1$ implies that $\hat{\theta}_2$ will overestimate $\theta$ for some samples and underestimate it for others. The mean value of this estimator is
$$E(\hat{\theta}_2) = E\left[\frac{n + 1}{n} \max(X_1, \ldots, X_n)\right] = \frac{n + 1}{n} \cdot E[\max(X_1, \ldots, X_n)] = \frac{n + 1}{n} \cdot \frac{n}{n + 1}\theta = \theta$$
If $\hat{\theta}_2$ is used repeatedly on different samples to estimate $\theta$, some estimates will be too large and others will be too small, but in the long run there will be no systematic tendency to underestimate or overestimate $\theta$.
■
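A simulation makes the bias of $\hat{\theta}_1$ and the unbiasedness of $\hat{\theta}_2$ visible. In this sketch the true $\theta = 5$ is an arbitrary choice, with $n = 5$ as in the example:

```python
import random

random.seed(0)
theta, n, reps = 5.0, 5, 100_000

t1 = t2 = 0.0
for _ in range(reps):
    mx = max(random.uniform(0, theta) for _ in range(n))
    t1 += mx                     # theta_hat_1 = max(X_i)
    t2 += (n + 1) / n * mx       # theta_hat_2 = ((n + 1)/n) * max(X_i)

print(t1 / reps)   # about n*theta/(n + 1) = 4.167: biased low
print(t2 / reps)   # about 5.0: unbiased
```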
Principle of Unbiased Estimation When choosing among several different estimators of $\theta$, select one that is unbiased.
According to this principle, the unbiased estimator $\hat{\theta}_2$ in Example 6.4 should be preferred to the biased estimator $\hat{\theta}_1$. Consider now the problem of estimating $\sigma^2$.
PROPOSITION Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with mean $\mu$ and variance $\sigma^2$. Then the estimator
$$\hat{\sigma}^2 = S^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$$
is unbiased for estimating $\sigma^2$.
Proof For any rv $Y$, $V(Y) = E(Y^2) - [E(Y)]^2$, so $E(Y^2) = V(Y) + [E(Y)]^2$. Applying this to
$$S^2 = \frac{1}{n - 1}\left[\sum X_i^2 - \frac{\left(\sum X_i\right)^2}{n}\right]$$
gives
$$\begin{aligned}
E(S^2) &= \frac{1}{n - 1}\left\{\sum E(X_i^2) - \frac{1}{n} E\left[\left(\sum X_i\right)^2\right]\right\} \\
&= \frac{1}{n - 1}\left\{\sum (\sigma^2 + \mu^2) - \frac{1}{n}\left[V\left(\sum X_i\right) + \left(E\left(\sum X_i\right)\right)^2\right]\right\} \\
&= \frac{1}{n - 1}\left\{n\sigma^2 + n\mu^2 - \frac{1}{n}\, n\sigma^2 - \frac{1}{n}(n\mu)^2\right\} \\
&= \frac{1}{n - 1}\left\{n\sigma^2 - \sigma^2\right\} = \sigma^2 \quad \text{(as desired)} \qquad \blacksquare
\end{aligned}$$
The estimator that uses divisor $n$ can be expressed as $(n - 1)S^2/n$, so
$$E\left[\frac{(n - 1)S^2}{n}\right] = \frac{n - 1}{n} E(S^2) = \frac{n - 1}{n} \sigma^2$$
This estimator is therefore not unbiased. The bias is $(n - 1)\sigma^2/n - \sigma^2 = -\sigma^2/n$. Because the bias is negative, the estimator with divisor $n$ tends to underestimate $\sigma^2$, and this is why the divisor $n - 1$ is preferred by many statisticians (though when $n$ is large, the bias is small and there is little difference between the two).
Unfortunately, the fact that $S^2$ is unbiased for estimating $\sigma^2$ does not imply that $S$ is unbiased for estimating $\sigma$. Taking the square root invalidates the property of unbiasedness (the expected value of the square root is not the square root of the expected value). Fortunately, the bias of $S$ is small unless $n$ is quite small. There are other good reasons to use $S$ as an estimator, especially when the population distribution is normal. These will become more apparent when we discuss confidence intervals and hypothesis testing in the next several chapters.
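Both facts, $E(S^2) = \sigma^2$ and the downward bias of both the divisor-$n$ estimator and $S$ itself, can be checked by simulation; a sketch assuming a normal population with $\sigma = 2$ (the parameter values and seed are arbitrary):

```python
import random
import statistics

random.seed(7)
mu, sigma, n, reps = 0.0, 2.0, 10, 50_000

s2 = s2_n = s = 0.0
for _ in range(reps):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    v = statistics.variance(x)    # sample variance S^2 (divisor n - 1)
    s2 += v
    s2_n += v * (n - 1) / n       # alternative estimator with divisor n
    s += v ** 0.5                 # sample standard deviation S

print(s2 / reps)     # about 4.0 = sigma^2: unbiased
print(s2_n / reps)   # about 3.6 = ((n - 1)/n) * sigma^2: biased low
print(s / reps)      # slightly below 2.0: S is biased low for sigma
```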
In Example 6.2, we proposed several different estimators for the mean $\mu$ of a normal distribution. If there were a unique unbiased estimator for $\mu$, the estimation problem would be resolved by using that estimator. Unfortunately, this is not the case.
PROPOSITION If $X_1, X_2, \ldots, X_n$ is a random sample from a distribution with mean $\mu$, then $\bar{X}$ is an unbiased estimator of $\mu$. If in addition the distribution is continuous and symmetric, then $\tilde{X}$ and any trimmed mean are also unbiased estimators of $\mu$.
The fact that $\bar{X}$ is unbiased is just a restatement of one of our rules of expected value: $E(\bar{X}) = \mu$ for every possible value of $\mu$ (for discrete as well as continuous distributions). The unbiasedness of the other estimators is more difficult to verify.
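A simulation consistent with this proposition, assuming a symmetric continuous population (normal with $\mu = 3$ here; the distribution, sample size, and seed are arbitrary choices): all three estimators average out near $\mu$.

```python
import random
import statistics

random.seed(3)
mu, sigma, n, reps = 3.0, 1.0, 15, 50_000

m = med = tr = 0.0
for _ in range(reps):
    x = sorted(random.gauss(mu, sigma) for _ in range(n))
    m += statistics.mean(x)
    med += statistics.median(x)
    tr += statistics.mean(x[2:-2])   # trimmed mean: drop 2 from each end

print(m / reps, med / reps, tr / reps)   # all close to mu = 3.0
```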
The next example introduces another situation in which there are several unbiased estimators for a particular parameter.
EXAMPLE 6.5 Under certain circumstances organic contaminants adhere readily to wafer surfaces and cause deterioration in semiconductor manufacturing devices. The article