
4.1 Estimating latent abilities

Maximum likelihood (ML) estimates

Acceptance of a well-designed and thoroughly investigated measurement model means in practice that the item parameters of the model are replaced by their estimates and subsequently treated as known quantities. The likelihood of an observed response pattern x on k binary items is then

L(θ; x) = ∏_i f_i(θ)^{x_i} × [1 − f_i(θ)]^{1 − x_i},    (14)

which now only depends on a single unknown quantity, θ, and the value of θ that maximizes the right-hand side of (14) is the maximum likelihood estimate of θ. Once the item parameters are fixed, the 2PLM is also an exponential family model, and the sufficient statistic for θ is the weighted score (with the discrimination parameters as weights). Denoting the sufficient statistic for θ generically as s, the ML estimate of θ in the 2PLM is the solution of the likelihood equation

s = ∑_i α_i f_i(θ).    (15)

For the Rasch model it suffices to replace the α-parameters by one.

Some of the models discussed in previous sections, however, do not belong to the exponential family, even when the item parameters are fixed. The 3PLM is such a case, as well as the graded response model and all normal ogive models. For these models there is no sufficient statistic for θ, and to find the ML estimate, the right-hand side of (14) must be maximized.
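To make the computation concrete, the following sketch maximizes the logarithm of the right-hand side of (14) on a grid of θ-values. The item parameters and the response pattern are hypothetical, and the usual parameterization of the 2PLM response function, f_i(θ) = exp[α_i(θ − β_i)] / (1 + exp[α_i(θ − β_i)]), is assumed; the same direct maximization applies to models without a sufficient statistic, such as the 3PLM.

import numpy as np

# Hypothetical 2PLM item parameters (not taken from the text)
alpha = np.array([1.0, 1.5, 0.8, 1.2])   # discrimination parameters
beta = np.array([-1.0, 0.0, 0.5, 1.0])   # difficulty parameters
x = np.array([1, 1, 0, 1])               # an observed response pattern

def f(theta):
    # 2PLM response probabilities f_i(theta)
    return 1.0 / (1.0 + np.exp(-alpha * (theta - beta)))

def log_likelihood(theta):
    # logarithm of the right-hand side of (14)
    p = f(theta)
    return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

grid = np.linspace(-4.0, 4.0, 2001)
theta_ml = grid[int(np.argmax([log_likelihood(t) for t in grid]))]
print(theta_ml)   # grid approximation of the ML estimate of theta

In practice one would typically solve the likelihood equation (15) with a Newton–Raphson iteration rather than a grid search; the grid only keeps the sketch short.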

A disadvantage of ML estimation is that (15) has no solution if s is zero or if s equals the maximal score: the right-hand side of (15) is always positive (all response probabilities are positive) and always smaller than the maximal score (all response probabilities are strictly smaller than one).
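A small numeric check, using the same hypothetical 2PLM items as in the sketch above, illustrates the point: the right-hand side of (15) stays strictly between 0 and the maximal weighted score ∑_i α_i, so the likelihood equation cannot be satisfied by a zero or a maximal score.

import numpy as np

alpha = np.array([1.0, 1.5, 0.8, 1.2])   # hypothetical discrimination parameters
beta = np.array([-1.0, 0.0, 0.5, 1.0])   # hypothetical difficulty parameters

def expected_weighted_score(theta):
    # right-hand side of (15): sum of alpha_i * f_i(theta)
    p = 1.0 / (1.0 + np.exp(-alpha * (theta - beta)))
    return np.sum(alpha * p)

for theta in (-10.0, 0.0, 10.0):
    print(theta, expected_weighted_score(theta))
print("maximal weighted score:", alpha.sum())

# Even at theta = -10 the expected weighted score stays above 0, and even at
# theta = +10 it stays below alpha.sum(), so neither boundary score solves (15).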

One of the attractive features of maximum likelihood estimation is the availability of the standard errors of the estimates, which are derived from a quantity known as Fisher information. The information function is defined as the negative of the expected value of the second derivative of the log-likelihood function:

I(θ) = −E[∂² ln L(θ; X) / ∂θ²].    (16)

The expected value is to be taken over all possible response patterns. The relationship with the standard error of the ML estimate θ̂ is this:

SE(θ̂) ≈ 1/√I(θ) ≈ 1/√I(θ̂).    (17)

Notice that in (17) two approximations are used. The first one is there because the result is only valid asymptotically – that is, when the number of items tends to infinity. The second one is needed because the information function is a function of θ, and since θ is not known, its value is replaced by a proxy, the estimate.

In the 2PLM the information function is given by

I(θ) = ∑_{i=1}^{k} α_i² f_i(θ)[1 − f_i(θ)].    (18)
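As a small illustration, the sketch below (same hypothetical 2PLM items as above) evaluates the information function (18) at an estimate and converts it into an approximate standard error via (17).

import numpy as np

alpha = np.array([1.0, 1.5, 0.8, 1.2])   # hypothetical discrimination parameters
beta = np.array([-1.0, 0.0, 0.5, 1.0])   # hypothetical difficulty parameters

def f(theta):
    return 1.0 / (1.0 + np.exp(-alpha * (theta - beta)))

def information(theta):
    # test information in the 2PLM, equation (18)
    p = f(theta)
    return np.sum(alpha**2 * p * (1.0 - p))

theta_hat = 0.7                                # an ML estimate of theta
se = 1.0 / np.sqrt(information(theta_hat))     # approximation (17)
print(se)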

Conditional unbiasedness

It seems a reasonable requirement that upon (independent) replicated administrations of the same test, the average estimate of the ability equals the true ability. Formally the requirement is that

E(θ̂ | θ) − θ = 0.    (19)

The estimator θ̂ is said to be conditionally unbiased if (19) holds for all values of θ.

The maximum likelihood estimator of θ is conditionally biased (Lord 1983). In Figure 9.8 the bias of the ML estimator (dashed curve), that is, the value of the left-hand side of (19), is displayed graphically for a test of 40 items complying with the Rasch model. The item parameters range from –1.05 to +1.7 with an average value of 0.5. For zero and perfect scores, values of –5 and +5 have been used respectively as estimates of the latent ability.
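A bias curve of this kind can be approximated by simulation. The sketch below is a minimal illustration for a Rasch test: the 40 item parameters are hypothetical (equally spaced between –1.05 and +1.7, so they only mimic the range reported above), zero and perfect scores are mapped to –5 and +5, and the conditional bias (19) is estimated as the Monte Carlo average of θ̂ minus the true θ.

import numpy as np

rng = np.random.default_rng(1)
beta = np.linspace(-1.05, 1.7, 40)   # hypothetical Rasch difficulties

def p(theta):
    # Rasch response probabilities
    return 1.0 / (1.0 + np.exp(-(theta - beta)))

def ml_estimate(x, lo=-5.0, hi=5.0):
    # ML estimate of theta; -5 and +5 are used for zero and perfect scores
    s = x.sum()
    if s == 0:
        return lo
    if s == len(x):
        return hi
    grid = np.linspace(lo, hi, 801)
    loglik = [np.sum(x * np.log(p(t)) + (1 - x) * np.log(1 - p(t))) for t in grid]
    return grid[int(np.argmax(loglik))]

def bias(theta, n_rep=1000):
    # Monte Carlo approximation of E(theta_hat | theta) - theta, equation (19)
    estimates = [ml_estimate((rng.random(beta.size) < p(theta)).astype(int))
                 for _ in range(n_rep)]
    return np.mean(estimates) - theta

for theta in (-2.0, 0.0, 0.5, 2.0):
    print(theta, round(bias(theta), 3))

Evaluating bias(θ) over a fine grid of θ-values traces a curve of the same kind as the dashed curve in Figure 9.8.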

In the figure (dashed curve) it is clearly seen that the bias is zero for only one value of θ, θ₀ say. For values larger than θ₀ the bias is positive, meaning that on average the real value of θ will tend to be overestimated by the ML estimate, while for values smaller than θ₀ it will be underestimated. In general, this means that the ML estimates stretch out the real values of θ. The relation between the item parameters and the value of θ₀ has not yet been determined mathematically, but in a number of simulations it has appeared that θ₀ coincides with the point of maximal information. Independently of the correctness of the previous statement, however, it should be kept in mind that the value of θ₀, the bias function as displayed in the figure and the information function are completely determined by the item parameters and are not in any way related to the distribution of the latent variable θ in any population.

Suppose that one wants to compare the ability of girls and boys, for example by a t-test, and one uses the ML estimates as proxies for θ. If the average estimate of the girls is larger than θ₀ and the average of the boys is smaller, then the difference of the averages will be an overestimation of the true difference between the two groups.

The conditional bias and the fact that the estimate does not exist for zero or perfect scores make the ML estimates unsuitable for statistical applications. The results discussed here apply to models of the logistic family (the Rasch model, OPLM, (G)PCM, 2PLM and 3PLM). Similar theoretical results for the GRM and the normal ogive models are not known.

Figure 9.8 Conditional bias of the ML and the WML estimators in the Rasch model

Weighted maximum likelihood (WML) estimates

Although it is not possible to find an estimator of θ that is conditionally unbiased, Warm (1989) developed an estimator that is almost unbiased. The estimate is the value of θ that maximizes the weighted likelihood – that is, the product of the likelihood function multiplied by a weight function that depends on θ but not on the response pattern x:

w(θ) × L(θ; x).    (20)

Warm's purpose was to find the weight function such that the bias disappeared as much as possible. It turned out that in the Rasch model and in the 2PLM the weight function is given by

w(θ) = √I(θ),

the square root of the information function. He also found the weight function for the 3PLM, but in that case it is more complicated. This estimator, referred to as the weighted maximum likelihood or Warm estimator, is almost unbiased for a wide range of θ-values. Moreover, the estimator exists for all possible values of the score, including zero and perfect scores. For extreme values of θ the bias remains, but it is in the opposite direction from the bias of the ML estimator: for values much higher than the point of maximal information the bias is negative, and for small values it is positive – the estimates tend to shrink compared to the true values. In Figure 9.8 (solid line) the bias is displayed graphically and can be compared directly to the bias of the ML estimator. Notice that the bias is very small (less than 0.01 in absolute value) in the range (–2.5, 3), which is much wider than the range of the difficulty parameters.
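The following sketch illustrates the Warm estimate as the maximizer of the logarithm of (20), with w(θ) = √I(θ) and, in the Rasch model, I(θ) = ∑_i f_i(θ)[1 − f_i(θ)] (equation (18) with α_i = 1). The item parameters are hypothetical, and the response pattern is a perfect score, for which the ML estimate would not exist.

import numpy as np

beta = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])   # hypothetical Rasch difficulties
x = np.array([1, 1, 1, 1, 1])                   # a perfect score

def p(theta):
    return 1.0 / (1.0 + np.exp(-(theta - beta)))

def weighted_log_likelihood(theta):
    # logarithm of w(theta) * L(theta; x) in (20), with w(theta) = sqrt(I(theta))
    pr = p(theta)
    loglik = np.sum(x * np.log(pr) + (1 - x) * np.log(1 - pr))
    info = np.sum(pr * (1.0 - pr))    # information function with alpha_i = 1
    return 0.5 * np.log(info) + loglik

grid = np.linspace(-6.0, 6.0, 4001)
theta_wml = grid[int(np.argmax([weighted_log_likelihood(t) for t in grid]))]
print(theta_wml)   # finite, even though the score is perfect and ML would fail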

The availability of an (almost) unbiased estimator is a great advantage for statistical applications, as will be discussed in the next sections.