Chapter 6 POINT AND INTERVAL ESTIMATION
The problem of estimation is defined as follows. Assume that some characteristic of the elements in a population can be represented by a random variable X whose density is f_X(·; θ) = f(·; θ), where the form of the density is assumed known except that it contains an unknown parameter θ (if θ were known, the density function would be completely specified, and there would be no need to make inferences about it). Further assume that the values x_1, x_2, ..., x_n of a random sample X_1, X_2, ..., X_n from f(·; θ) can be observed. On the basis of the observed sample values x_1, x_2, ..., x_n it is desired to estimate the value of the unknown parameter θ or the value of some function, say τ(θ), of the unknown parameter. The estimation can be made in two ways. The first, called point estimation, is to let the value of some statistic, say t(X_1, X_2, ..., X_n), represent, or estimate, the unknown τ(θ); such a statistic is called a point estimator. The second, called interval estimation, is to define two statistics, say t_1(X_1, X_2, ..., X_n) and t_2(X_1, X_2, ..., X_n), with t_1(X_1, X_2, ..., X_n) < t_2(X_1, X_2, ..., X_n), so that (t_1(X_1, X_2, ..., X_n), t_2(X_1, X_2, ..., X_n)) constitutes an interval for which the probability that it contains the unknown τ(θ) can be determined.
6.1 Parametric Point Estimation

Point estimation admits two problems. The first is to devise some means of obtaining a statistic to use as an estimator. The second is to select criteria and techniques to define and find a "best" estimator among the many possible estimators.
6.1.1 Methods of Finding Estimators

Any statistic (a known function of observable random variables that is itself a random variable) whose values are used to estimate τ(θ), where τ(·) is some function of the parameter θ, is defined to be an estimator of τ(θ). Notice that for specific values of the realized random sample the estimator takes a specific value, called an estimate.
6.1.2 Method of Moments
Let f(·; θ_1, θ_2, ..., θ_k) be a density of a random variable X which has k parameters θ_1, θ_2, ..., θ_k. As before let μ_r denote the r-th raw moment, i.e. μ_r = E[X^r]. In general μ_r will be a known function of the k parameters θ_1, θ_2, ..., θ_k; denote this by writing μ_r = μ_r(θ_1, θ_2, ..., θ_k). Let X_1, X_2, ..., X_n be a random sample from the density f(·; θ_1, θ_2, ..., θ_k), and, as before, let M_j be the j-th sample moment, i.e.

$$M_j = \frac{1}{n}\sum_{i=1}^{n} X_i^j.$$

Equating sample moments to population moments we get k equations in k unknowns:

$$M_j = \mu_j(\theta_1, \theta_2, \ldots, \theta_k), \qquad j = 1, 2, \ldots, k.$$

Let the solution to these equations be θ̂_1, θ̂_2, ..., θ̂_k. We say that these k estimators are the estimators of θ_1, θ_2, ..., θ_k obtained by the method of moments.
Example: Let X_1, X_2, ..., X_n be a random sample from a normal distribution with mean μ and variance σ². Let (θ_1, θ_2) = (μ, σ²). Estimate the parameters μ and σ by the method of moments. Recall that σ² = μ_2 − (μ_1)² and μ = μ_1. The method-of-moments equations become

$$\frac{1}{n}\sum_{i=1}^{n} X_i = \bar X = M_1 = \mu_1 = \mu_1(\mu, \sigma) = \mu$$

$$\frac{1}{n}\sum_{i=1}^{n} X_i^2 = M_2 = \mu_2(\mu, \sigma) = \sigma^2 + \mu^2.$$

Solving the two equations for μ and σ we get

$$\hat\mu = \bar X, \qquad \hat\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar X)^2},$$

which are the method-of-moments estimators of μ and σ.
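The two moment equations above can be checked numerically. The following Python sketch (the sample values are hypothetical, not from the text) computes the method-of-moments estimates from the first two sample moments:

```python
import math

def mom_normal(xs):
    """Method-of-moments estimates for a N(mu, sigma^2) sample:
    equate M1 to mu and M2 to sigma^2 + mu^2."""
    n = len(xs)
    m1 = sum(xs) / n                     # first sample moment
    m2 = sum(x * x for x in xs) / n      # second sample moment
    mu_hat = m1
    sigma_hat = math.sqrt(m2 - m1 ** 2)  # sigma^2 = mu_2 - (mu_1)^2
    return mu_hat, sigma_hat

xs = [4.1, 5.3, 6.0, 4.8, 5.8]           # hypothetical sample
mu_hat, sigma_hat = mom_normal(xs)
```

Note that sigma_hat² equals the average squared deviation about the sample mean, since M_2 − M_1² = (1/n)Σ(X_i − X̄)², matching the closed form above.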
Example: Let X_1, X_2, ..., X_n be a random sample from a Poisson distribution with parameter λ. There is only one parameter, hence only one equation, which is

$$\frac{1}{n}\sum_{i=1}^{n} X_i = \bar X = M_1 = \mu_1 = \mu_1(\lambda) = \lambda.$$

Hence the method-of-moments estimator of λ is λ̂ = X̄.
6.1.3 Maximum Likelihood

Consider the following estimation problem. Suppose that a box contains a number of black and a number of white balls, and suppose that it is known that the ratio of the numbers is 3:1 but it is not known whether the black or the white balls are more numerous, i.e. the probability of drawing a black ball is either 1/4 or 3/4. If n balls are drawn with replacement from the box, the distribution of X, the number of black balls, is given by the binomial distribution

$$f(x; p) = \binom{n}{x} p^x (1-p)^{n-x}, \qquad x = 0, 1, 2, \ldots, n,$$

where p is the probability of drawing a black ball. Here p = 1/4 or p = 3/4. We shall draw a sample of three balls, i.e. n = 3, with replacement and attempt to estimate the unknown parameter p of the distribution. The estimation is simple in this case, as we have to choose only between the two numbers 1/4 = 0.25 and 3/4 = 0.75. The possible outcomes and their probabilities are given below:
outcome: x       0       1       2       3
f(x; 0.75)     1/64    9/64   27/64   27/64
f(x; 0.25)    27/64   27/64    9/64    1/64
In the present example, if we found x = 0 in a sample of 3, the estimate 0.25 for p would be preferred over 0.75 because the probability 27/64 is greater than 1/64, i.e.

$$f(0; 0.25) > f(0; 0.75).$$
And in general we should estimate p by 0.25 when x = 0 or 1 and by 0.75 when x = 2 or 3. The estimator may be defined as

$$\hat p = \hat p(x) = \begin{cases} 0.25 & \text{for } x = 0, 1 \\ 0.75 & \text{for } x = 2, 3. \end{cases}$$

The estimator thus selects, for every possible x, the value of p, say p̂, such that

$$f(x; \hat p) > f(x; p'),$$

where p′ is the other value of p.
More generally, if several values of p were possible, we might reasonably proceed in the same manner. Thus if we found x = 2 in a sample of 3 from a binomial population, we should substitute all possible values of p in the expression

$$f(2; p) = \binom{3}{2} p^2 (1-p) = 3p^2(1-p), \qquad \text{for } 0 \le p \le 1,$$

and choose as our estimate the value of p which maximizes f(2; p). The position of the maximum of the function above is found by setting the first derivative with respect to p equal to zero, i.e.

$$\frac{d}{dp} f(2; p) = 6p - 9p^2 = 3p(2 - 3p) = 0 \;\Rightarrow\; p = 0 \text{ or } p = \tfrac{2}{3}.$$

The second derivative is

$$\frac{d^2}{dp^2} f(2; p) = 6 - 18p.$$

Hence d²/dp² f(2; 0) = 6, so the value p = 0 represents a minimum, whereas d²/dp² f(2; 2/3) = −6, so p = 2/3 represents the maximum. Hence p̂ = 2/3 is our estimate, which has the property

$$f(2; \hat p) > f(2; p'),$$

where p′ is any other value in the interval 0 ≤ p ≤ 1.
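The maximization above can be verified numerically. A small Python sketch (assumed, not part of the text) evaluates the binomial likelihood both over the two admissible values {0.25, 0.75} and over a fine grid on [0, 1]:

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial probability of x black balls in n draws."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# With x = 0 in a sample of 3, p = 0.25 is preferred (27/64 > 1/64).
best_of_two = max([0.25, 0.75], key=lambda p: binom_pmf(0, 3, p))

# For x = 2 with p free in [0, 1], f(2; p) = 3 p^2 (1 - p) peaks at p = 2/3.
grid = [i / 1000 for i in range(1001)]
p_hat = max(grid, key=lambda p: binom_pmf(2, 3, p))
```

The grid search lands on the value closest to 2/3, agreeing with the calculus argument in the text.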
The likelihood function of n random variables X_1, X_2, ..., X_n is defined to be the joint density of the n random variables, say f_{X_1,...,X_n}(x_1, x_2, ..., x_n; θ), which is considered to be a function of θ. In particular, if X_1, X_2, ..., X_n is a random sample from the density f(x; θ), then the likelihood function is f(x_1; θ) f(x_2; θ) ··· f(x_n; θ). To think of the likelihood function as a function of θ, we shall use the notation L(θ; x_1, x_2, ..., x_n) or L(·; x_1, x_2, ..., x_n) for the likelihood function in general.
The likelihood is a value of a density function; consequently, for discrete random variables it is a probability. Suppose for the moment that θ is known, denoted by θ_0. The particular value of the random variables which is "most likely to occur" is the value x_1, x_2, ..., x_n such that f_{X_1,...,X_n}(x_1, x_2, ..., x_n; θ_0) is a maximum. For example, for simplicity let us assume that n = 1 and X_1 has the normal density with mean 0 and variance 1. Then the value of the random variable which is most likely to occur is X_1 = 0. By "most likely to occur" we mean the value x_1 of X_1 such that φ_{0,1}(x_1) > φ_{0,1}(x_1′) for any other value x_1′.
Now let us suppose that the joint density of n random variables is f_{X_1,...,X_n}(x_1, x_2, ..., x_n; θ), where θ is unknown. Let the particular values which are observed be represented by x_1, x_2, ..., x_n. We want to know from which density this particular set of values is most likely to have come; that is, for which density (which value of θ) is the likelihood that the set x_1, x_2, ..., x_n was obtained largest. In other words, we want to find the value of θ in the admissible set, denoted by θ̂, which maximizes the likelihood function L(θ; x_1, x_2, ..., x_n). The value θ̂ which maximizes the likelihood function is, in general, a function of x_1, x_2, ..., x_n, say θ̂ = θ̂(x_1, x_2, ..., x_n). Hence we have the following definition:
Let L(θ) = L(θ; x_1, x_2, ..., x_n) be the likelihood function for the random variables X_1, X_2, ..., X_n. If θ̂ [where θ̂ = θ̂(x_1, x_2, ..., x_n) is a function of the observations x_1, x_2, ..., x_n] is the value of θ in the admissible range which maximizes L(θ), then Θ̂ = θ̂(X_1, X_2, ..., X_n) is the maximum likelihood estimator of θ, and θ̂ = θ̂(x_1, x_2, ..., x_n) is the maximum likelihood estimate of θ for the sample x_1, x_2, ..., x_n.

The most important cases which we shall consider are those in which X_1, X_2, ..., X_n is a random sample from some density function f(x; θ), so that the likelihood function is

$$L(\theta) = f(x_1; \theta) f(x_2; \theta) \cdots f(x_n; \theta).$$
Many likelihood functions satisfy regularity conditions, so the maximum likelihood estimator is the solution of the equation

$$\frac{dL(\theta)}{d\theta} = 0.$$
Also, L(θ) and log L(θ) have their maxima at the same value of θ, and it is sometimes easier to find the maximum of the logarithm of the likelihood. Notice also that if the likelihood function contains k parameters, then we find the estimators from the solution of the k first-order conditions.
Example: Let a random sample of size n be drawn from the Bernoulli distribution

$$f(x; p) = p^x (1-p)^{1-x},$$

where 0 ≤ p ≤ 1. The sample values x_1, x_2, ..., x_n will be a sequence of 0s and 1s, and the likelihood function is

$$L(p) = \prod_{i=1}^{n} p^{x_i} (1-p)^{1-x_i} = p^{\sum x_i} (1-p)^{n - \sum x_i}.$$

Letting y = Σ x_i we obtain

$$\log L(p) = y \log p + (n - y) \log(1 - p)$$

and

$$\frac{d \log L(p)}{dp} = \frac{y}{p} - \frac{n - y}{1 - p}.$$

Setting this expression equal to zero we get

$$\hat p = \frac{y}{n} = \bar x,$$

which is intuitively what the estimate of this parameter should be.
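The closed form p̂ = y/n can be confirmed with a short Python sketch (the 0/1 sample is hypothetical); a grid search over the log-likelihood lands on the sample mean:

```python
from math import log

def bernoulli_loglik(p, xs):
    """log L(p) = y log p + (n - y) log(1 - p)."""
    y, n = sum(xs), len(xs)
    return y * log(p) + (n - y) * log(1 - p)

xs = [1, 0, 1, 1, 0, 1, 0, 1]    # hypothetical sample: y = 5, n = 8
p_hat = sum(xs) / len(xs)        # closed-form MLE: the sample mean

# Grid search over (0, 1) agrees with the analytical maximizer.
grid = [i / 1000 for i in range(1, 1000)]
p_search = max(grid, key=lambda p: bernoulli_loglik(p, xs))
```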
Example: Let a random sample of size n be drawn from the normal distribution with density

$$f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2 / 2\sigma^2}.$$

The likelihood function is

$$L(\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x_i-\mu)^2 / 2\sigma^2} = (2\pi\sigma^2)^{-n/2} \exp\!\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2\right),$$

and the logarithm of the likelihood function is

$$\log L(\mu, \sigma^2) = -\frac{n}{2} \log 2\pi - \frac{n}{2} \log \sigma^2 - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2.$$

To find the maximum with respect to μ and σ² we compute

$$\frac{\partial \log L}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu), \qquad \frac{\partial \log L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (x_i - \mu)^2,$$

and putting these derivatives equal to 0 and solving the resulting equations we find the estimates

$$\hat\mu = \bar x, \qquad \hat\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar x)^2,$$

which turn out to be the sample moments corresponding to μ and σ².
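A minimal Python check of these closed forms (the data are hypothetical): the estimates are the sample mean and the average squared deviation, and perturbing either parameter only lowers the log-likelihood.

```python
import math

def normal_loglik(mu, var, xs):
    """log L(mu, sigma^2) for an i.i.d. normal sample."""
    n = len(xs)
    return (-n / 2 * math.log(2 * math.pi) - n / 2 * math.log(var)
            - sum((x - mu) ** 2 for x in xs) / (2 * var))

xs = [2.0, 4.0, 6.0, 8.0]                           # hypothetical sample
mu_hat = sum(xs) / len(xs)                          # x-bar
var_hat = sum((x - mu_hat) ** 2 for x in xs) / len(xs)

# Nearby parameter values give a strictly smaller log-likelihood.
worse = max(normal_loglik(mu_hat + 0.1, var_hat, xs),
            normal_loglik(mu_hat, var_hat * 1.1, xs))
```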
6.1.4 Properties of Point Estimators

One needs to define criteria so that various estimators can be compared. One of these is unbiasedness. An estimator T = t(X_1, X_2, ..., X_n) is defined to be an unbiased estimator of τ(θ) if and only if

$$E_\theta[T] = E_\theta[t(X_1, X_2, \ldots, X_n)] = \tau(\theta)$$

for all θ in the admissible space. Other criteria are consistency, mean squared error, etc.
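Unbiasedness can be illustrated by simulation. The sketch below (parameters are illustrative, not from the text) shows that X̄ is unbiased for μ, while the maximum likelihood variance estimator (1/n)Σ(X_i − X̄)² from the previous example has expectation (n−1)σ²/n, i.e. it is biased downward:

```python
import random

random.seed(0)
mu, sigma, n, reps = 10.0, 2.0, 5, 20000
avg_xbar = 0.0       # Monte Carlo estimate of E[X-bar]
avg_mle_var = 0.0    # Monte Carlo estimate of E[(1/n) sum (Xi - X-bar)^2]
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(xs) / n
    avg_xbar += xbar / reps
    avg_mle_var += sum((x - xbar) ** 2 for x in xs) / n / reps
# avg_xbar is close to mu = 10; avg_mle_var is close to (n-1)/n * sigma^2 = 3.2
```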
6.2 Interval Estimation

In practice estimates are often given in the form of the estimate plus or minus a certain amount; e.g. the cost per volume of a book could be 83 ± 4.5, which means that the actual cost will lie somewhere between 78.5 and 87.5 with high probability. Let us consider a particular example. Suppose that a random sample (1.2, 3.4, 0.6, 5.6) of four observations is drawn from a normal population with unknown mean μ and a known variance 9. The maximum likelihood estimate of μ is the sample mean of the observations:

$$\bar x = 2.7.$$

We wish to determine upper and lower limits which are rather certain to contain the true unknown parameter value between them. We know that the sample mean X̄ is distributed as normal with mean μ and variance 9/n, i.e. X̄ ∼ N(μ, σ²/n).
Hence we have

$$Z = \frac{\bar X - \mu}{3/\sqrt{n}} \sim N(0, 1),$$

so Z is standard normal. Consequently we can find the probability that Z will lie between two arbitrary values. For example, we have that

$$P[-1.96 < Z < 1.96] = \int_{-1.96}^{1.96} \phi(z)\, dz = 0.95.$$

Hence we get that μ must be in the interval

$$\bar X + 1.96 \frac{3}{\sqrt{n}} > \mu > \bar X - 1.96 \frac{3}{\sqrt{n}},$$

and for the specific value of the sample mean we have that 5.64 > μ > −0.24, i.e. P[5.64 > μ > −0.24] = 0.95. This leads us to the following definition of the confidence interval.
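The calculation for this sample can be reproduced directly with a short Python sketch:

```python
import math

xs = [1.2, 3.4, 0.6, 5.6]                  # the four observations from the text
sigma, n = 3.0, len(xs)                    # known standard deviation (variance 9)
xbar = sum(xs) / n                         # sample mean: 2.7
half_width = 1.96 * sigma / math.sqrt(n)   # 1.96 * 3/2 = 2.94
lower, upper = xbar - half_width, xbar + half_width   # (-0.24, 5.64)
```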
Let X_1, X_2, ..., X_n be a random sample from the density f(·; θ). Let T_1 = t_1(X_1, X_2, ..., X_n) and T_2 = t_2(X_1, X_2, ..., X_n) be two statistics satisfying T_1 ≤ T_2 for which P_θ[T_1 < τ(θ) < T_2] = γ, where γ does not depend on θ. Then the random interval (T_1, T_2) is called a 100γ percent confidence interval for τ(θ); γ is called the confidence coefficient, and T_1 and T_2 are called the lower and upper confidence limits, respectively. A value (t_1, t_2) of the random interval (T_1, T_2) is also called a 100γ percent confidence interval for τ(θ).
Let X_1, X_2, ..., X_n be a random sample from the density f(·; θ). Let T_1 = t_1(X_1, X_2, ..., X_n) be a statistic for which P_θ[T_1 < τ(θ)] = γ. Then T_1 is called a one-sided lower confidence interval for τ(θ). Similarly, let T_2 = t_2(X_1, X_2, ..., X_n) be a statistic for which P_θ[τ(θ) < T_2] = γ. Then T_2 is called a one-sided upper confidence interval for τ(θ).
Example: Let X_1, X_2, ..., X_n be a random sample from the density f(x; θ) = φ_{θ,9}(x). Set T_1 = t_1(X_1, X_2, ..., X_n) = X̄ − 6/√n and T_2 = t_2(X_1, X_2, ..., X_n) = X̄ + 6/√n. Then (T_1, T_2) constitutes a random interval and is a confidence interval for τ(θ) = θ, with confidence coefficient

$$\gamma = P\!\left[\bar X - \frac{6}{\sqrt{n}} < \theta < \bar X + \frac{6}{\sqrt{n}}\right] = P\!\left[-2 < \frac{\bar X - \theta}{3/\sqrt{n}} < 2\right] = \Phi(2) - \Phi(-2) = 0.9772 - 0.0228 = 0.9544.$$

Hence if a random sample of 25 observations has a sample mean of, say, 17.5, then the interval (17.5 − 6/√25, 17.5 + 6/√25) = (16.3, 18.7) is also called a confidence interval for θ.
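Both the interval endpoints and the confidence coefficient 0.9544 can be checked; the sketch below recomputes the interval and estimates the coverage probability by simulation (the simulation settings are illustrative):

```python
import math
import random

n, xbar = 25, 17.5
lower = xbar - 6 / math.sqrt(n)   # 17.5 - 6/5 = 16.3
upper = xbar + 6 / math.sqrt(n)   # 17.5 + 6/5 = 18.7

# Coverage check: the random interval covers theta iff |X-bar - theta| < 6/sqrt(n).
random.seed(1)
theta, reps = 0.0, 20000
hits = sum(
    1 for _ in range(reps)
    if abs(sum(random.gauss(theta, 3) for _ in range(n)) / n - theta)
    < 6 / math.sqrt(n)
)
coverage = hits / reps            # close to 0.9544
```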
6.2.1 Sampling from the Normal Distribution

Let X_1, X_2, ..., X_n be a random sample from the normal distribution with mean μ and variance σ². If σ² is unknown then θ = (μ, σ²) are the unknown parameters and τ(θ) = μ is the parameter we want to estimate by interval estimation. We know that

$$\frac{\bar X - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$$

However, the problem with this statistic is that it involves two parameters; consequently we cannot construct an interval from it. Hence we look for a statistic that involves only the parameter we want to estimate, i.e. μ. Notice that

$$\frac{\bar X - \mu}{S/\sqrt{n}} \sim t_{n-1},$$

where S² is the sample variance. This statistic involves only the parameter we want to estimate. Hence we have

$$P\!\left[q_1 < \frac{\bar X - \mu}{S/\sqrt{n}} < q_2\right] = \gamma,$$

where q_1, q_2 are quantiles of the t distribution with n − 1 degrees of freedom. Hence the interval

$$\left(\bar X - q_2 \frac{S}{\sqrt{n}},\; \bar X - q_1 \frac{S}{\sqrt{n}}\right)$$

is the 100γ percent confidence interval for μ. It can be proved that if q_1, q_2 are symmetric around 0, then the length of the interval is minimized.

Alternatively, if we want to find a confidence interval for σ² when μ is unknown, then we use the statistic

$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}.$$

Hence we have

$$P\!\left[q_1 < \frac{(n-1)S^2}{\sigma^2} < q_2\right] = \gamma,$$

where q_1, q_2 are quantiles of the χ² distribution with n − 1 degrees of freedom. So the interval

$$\left(\frac{(n-1)S^2}{q_2},\; \frac{(n-1)S^2}{q_1}\right)$$

is a 100γ percent confidence interval for σ². The q_1,
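A worked sketch of both intervals follows; the data are hypothetical and the quantiles for 9 degrees of freedom are standard table values taken as assumptions here (t_{0.975,9} ≈ 2.262, χ²_{0.025,9} ≈ 2.700, χ²_{0.975,9} ≈ 19.023):

```python
import math

xs = [10.2, 9.8, 10.5, 9.9, 10.1, 10.4, 9.7, 10.0, 10.3, 9.6]  # hypothetical, n = 10
n = len(xs)
xbar = sum(xs) / n
s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)   # sample variance S^2
s = math.sqrt(s2)

# Symmetric t quantiles for gamma = 0.95 and n - 1 = 9 df (from tables).
t_crit = 2.262
mu_ci = (xbar - t_crit * s / math.sqrt(n), xbar + t_crit * s / math.sqrt(n))

# Chi-square quantiles q1 = 2.700, q2 = 19.023 for gamma = 0.95, 9 df (from tables).
q1, q2 = 2.700, 19.023
var_ci = ((n - 1) * s2 / q2, (n - 1) * s2 / q1)
```

Note the inversion in var_ci: the larger χ² quantile q_2 gives the lower endpoint, exactly as in the interval formula above.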