3 The article “Is a Normal Distribution the Most Appropriate Statistical Distribution for

Example 6.3 The article “Is a Normal Distribution the Most Appropriate Statistical Distribution for

  Volumetric Properties in Asphalt Mixtures?” first cited in Example 4.26, reported the following observations on X ⫽ voids filled with asphalt () for 52 specimens of a certain type of hot-mix asphalt:

  Let’s estimate the variance s 2 of the population distribution. A natural estimator is

  the sample variance:

  Minitab gave the following output from a request to display descriptive statistics: Variable Count

  Mean

  SE Mean StDev Variance

  Q1

  Median Q3

  VFA(B)

  Thus the point estimate of the population variance is

  52 2 1 [alternatively, the computational formula for the numerator of s 2 gives

  S 5gx 2 2 (g x ) 2 xx 2 i i n 5 285,929.5964 2 (3841.78) 52 5 2097.4124 ].

  A point estimate of the population standard deviation is then s ˆ 5 s 5141.126 5 6.413. An alternative estimator results from using the divisor n rather than n ⫺ 1:

  We will shortly indicate why many statisticians prefer S 2 to this latter estimator.

  The cited article considered fitting four different distributions to the data: normal, log- normal, two-parameter Weibull, and three-parameter Weibull. Several different tech- niques were used to conclude that the two-parameter Weibull provided the best fit (a normal probability plot of the data shows some deviation from a linear pattern). From Section 4.5, the variance of a Weibull random variable is

  s 2 5b 2 5⌫(1 1 2a) 2 [⌫(1 1 1a)] 2 6

  where a and b are the shape and scale parameters of the distribution. The authors of the article used the method of maximum likelihood (see Section 6.2) to estimate these

  parameters. The resulting estimates were a ˆ 5 11.9731, bˆ 5 77.0153 . A sensible

  estimate of the population variance can now be obtained from substituting the esti-

  mates of the two parameters into the expression for s 2 ; the result is s ˆ 2 5 56.035.

  This latter estimate is obviously quite different from the sample variance. Its validity depends on the population distribution being Weibull, whereas the sample variance is

  a sensible way to estimate s 2 when there is uncertainty as to the specific form of the

  population distribution.

  ■

  6.1 Some General Concepts of Point Estimation

  In the best of all possible worlds, we could find an estimator for which uˆ uˆ 5 u always. However, is a function of the sample X i ’s, so it is a random variable. For some samples, uˆ will yield a value larger than , whereas for other samples will u uˆ underestimate . If we write u

  uˆ 5 u 1 error of estimation

  then an accurate estimator would be one resulting in small estimation errors, so that estimated values will be near the true value.

  A sensible way to quantify the idea of being close to is to consider the uˆ u squared error (uˆ 2 u) 2 . For some samples, uˆ will be quite close to and the resulting u

  squared error will be near 0. Other samples may give values of far from , corre- uˆ u sponding to very large squared errors. An omnibus measure of accuracy is the

  expected or mean square error MSE 5 E[(uˆ 2 u) 2 ] . If a first estimator has smaller

  MSE than does a second, it is natural to say that the first estimator is the better one. However, MSE will generally depend on the value of . What often happens is that one estimator will have a smaller MSE for some values of and a larger MSE for u other values. Finding an estimator with the smallest MSE is typically not possible.

  One way out of this dilemma is to restrict attention just to estimators that have some specified desirable property and then find the best estimator in this restricted group. A popular property of this sort in the statistical community is unbiasedness.