
2.2 Parameter estimation in mixed-effects models

In the mixed model approach, the item effects are considered as fixed effects, but the values of the latent variable are treated as random effects. This means that the sample of students is considered as a sample from some population, and that in this population the latent variable follows some distribution. If one assumes that this distribution can be described by a probability density function, say g(θ), then the marginal likelihood of a data vector x_v and the corresponding design vector d_v is given by

L(β_1, …, β_k, η_1, …, η_p; x_v, d_v) = ∫ ∏_{i=1}^{k} [f_i(θ)]^{d_vi x_vi} [1 − f_i(θ)]^{d_vi(1 − x_vi)} g(θ) dθ,   (2)

where f_i(θ) is the IRF of item i and depends on the item parameter β_i, and g(θ) depends on the p ‘population’ parameters η_1, …, η_p. For example, if it is assumed that the latent variable is normally distributed, the two population parameters are the mean and the variance of this distribution, and p = 2.
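As a numerical illustration (ours, not the chapter's), the integral in (2) can be approximated by Gauss-Hermite quadrature when g(θ) is normal. The sketch below assumes the Rasch IRF, f_i(θ) = exp(θ − β_i)/(1 + exp(θ − β_i)); all function and variable names are hypothetical.

```python
import numpy as np

def irf_rasch(theta, beta):
    """Rasch item response function f_i(theta)."""
    return 1.0 / (1.0 + np.exp(-(theta - beta)))

def marginal_prob(x, d, beta, mu=0.0, sigma=1.0, n_nodes=21):
    """Approximate the marginal probability (2) of response vector x with
    design vector d, integrating theta out against N(mu, sigma^2) by
    Gauss-Hermite quadrature (probabilists' version)."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_nodes)
    theta = mu + sigma * nodes            # rescale nodes to N(mu, sigma^2)
    w = weights / np.sqrt(2.0 * np.pi)    # normalized weights, sum(w) == 1
    # conditional likelihood prod_i f^(d*x) (1-f)^(d*(1-x)) at each node
    f = irf_rasch(theta[:, None], beta[None, :])          # (nodes, items)
    cond = np.prod(f ** (d * x) * (1.0 - f) ** (d * (1.0 - x)), axis=1)
    return np.sum(w * cond)               # weighted sum approximates the integral
```

Note that if d_vi = 0 for all items (the student saw no items), every factor equals one and the marginal probability is one, as it should be.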

It is important to read the right-hand side of equation (2) correctly. The ability of student v, θ_v, does not appear in this expression; the only reference to student v is through the design variables d_vi and the observed responses x_vi. The latent variable θ is the integration variable, and it is integrated out. So the right-hand side of (2) must be read as ‘the probability of observing x_v, given d_v, for a student randomly drawn from the population where the probability density function of the latent variable is given by g(θ)’. This likelihood is called the marginal likelihood. The likelihood for a matrix of observed responses x is just the product (over v) of the expression in (2):

L(β_1, …, β_k, η_1, …, η_p; x, d) = ∏_{v=1}^{n} ∫ ∏_{i=1}^{k} [f_i(θ)]^{d_vi x_vi} [1 − f_i(θ)]^{d_vi(1 − x_vi)} g(θ) dθ.   (3)

Maximizing the right-hand side of (3), or its logarithm, jointly with respect to all model parameters (that is, item parameters and population parameters) yields the marginal maximum likelihood (MML) estimates of the parameters. This procedure leads to consistent estimates of all model parameters and provides a good basis for estimating standard errors and building statistical goodness-of-fit tests.
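A minimal sketch of MML estimation along these lines, assuming the Rasch model with g(θ) fixed to N(0, 1), which also fixes the origin and unit of the scale. The data are simulated for illustration; none of the names or numbers come from the chapter.

```python
import numpy as np
from scipy.optimize import minimize

def irf(theta, beta):
    """Rasch IRF."""
    return 1.0 / (1.0 + np.exp(-(theta - beta)))

def neg_log_mml(beta, x, d, n_nodes=21):
    """Negative log of (3) with g(theta) fixed to N(0, 1): the population mean
    and standard deviation serve as origin and unit of the scale."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_nodes)
    w = weights / np.sqrt(2.0 * np.pi)                    # sums to one
    f = irf(nodes[:, None], beta[None, :])                # (nodes, items)
    # log conditional likelihood for each (student, node) pair
    logc = (x * d) @ np.log(f).T + ((1 - x) * d) @ np.log(1 - f).T
    return -np.sum(np.log(np.exp(logc) @ w))              # sum of log marginals

# hypothetical simulated data set: 300 students answering 5 Rasch items
rng = np.random.default_rng(1)
true_beta = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
theta = rng.normal(size=(300, 1))
x = (rng.random((300, 5)) < irf(theta, true_beta[None, :])).astype(float)
d = np.ones_like(x)                                       # complete design

beta_hat = minimize(neg_log_mml, np.zeros(5), args=(x, d), method="BFGS").x
```

With a complete design and a correctly specified population model, the estimates β̂ should be close to the generating values, in line with the consistency claim above.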

Notice that the right-hand side of (3) is a generic expression for all unidimensional IRT models and all population models where the distribution can be described with a probability density function. The technical procedures for actually computing the parameter estimates will, of course, differ from model to model. For a detailed explanation of the Rasch model, see Glas (1989) or Molenaar (1995).

Identifiability

Fixed-effects models are not identified if the design is not linked. This is easy to understand because the unit and the origin of the scale can be freely chosen for each set of test forms that are not linked to any of the other test forms. In the left-hand panel of Figure 9.1, this means that an arbitrary unit and origin can be chosen for each of the two sets of items, and there is no means to bring the two sets onto a common scale.

When using MML, the design restrictions can be relaxed. Referring again to the left-hand panel of Figure 9.1, the model is identified if the two groups of students are considered as equivalent samples from the same population. Although the groups do not share any item, they are tied together by the equivalence of their ability distribution, so that it is possible to choose a common unit and origin, for example, the standard deviation and the mean of the common distribution. Although this situation is comfortable, one should be careful with the assumption of statistical equivalence of the two samples. If this assumption is not fulfilled, all parameter estimates will be biased. In practice, this assumption will in general be satisfied only in an experimental set-up where students are allocated randomly to one of the two groups.

About g(θ)

In principle, any g(θ) can be used in equation (3). The most widely used one is the normal distribution. Sometimes, however, a discrete distribution is used, where it is assumed that the latent variable can assume only m different values, called support points. In such a case, again, two different approaches are possible. In the first approach the support points are considered as known, for example, m Gauss-Hermite quadrature points, and for each quadrature point the probability mass has to be estimated, leaving m − 1 free parameters, since the sum of the probability masses equals one. Details of such an approach are given by Bock and Aitkin (1981). In the other approach, support points and their associated masses have to be estimated jointly. For the Rasch model, this has been investigated by De Leeuw and Verhelst (1986); but see also Formann (1995) and Laird (1978). In this latter approach there are also theoretical results for the value of m. Although such an approach looks attractive, there are two drawbacks: the number of parameters to be estimated is generally larger than when using a parametric distribution, and the estimated distribution is in general not unique.
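The first approach, known support points with estimated masses, can be sketched as follows. The softmax parameterization with m − 1 free parameters is one common way to keep the masses positive and summing to one; it is our choice for the sketch, not prescribed by the chapter, and all names are hypothetical.

```python
import numpy as np

def discrete_marginal_probs(x, d, beta, support, free_params):
    """Marginal probability of each response vector when g is a discrete
    distribution on m known support points. The m masses are parameterized
    by m-1 free numbers through a softmax (the last logit is fixed at zero),
    so they are positive and sum to one."""
    z = np.append(free_params, 0.0)           # m-1 free parameters plus a fixed 0
    mass = np.exp(z) / np.exp(z).sum()        # probability masses, sum to one
    # Rasch IRF evaluated at every support point for every item: (m, items)
    f = 1.0 / (1.0 + np.exp(-(support[:, None] - beta[None, :])))
    # log conditional likelihood for each (student, support point)
    logc = (x * d) @ np.log(f).T + ((1 - x) * d) @ np.log(1 - f).T
    return np.exp(logc) @ mass                # one marginal probability per student
```

In an MML routine, `free_params` would be estimated jointly with the item parameters by maximizing the sum of the logs of these marginal probabilities.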

But there is a more serious practical drawback to the use of the MML procedure. The use of the first multiplication sign in (3) – the multiplication over students – implies that one assumes the latent variable is identically distributed for all students in the sample. This is equivalent to assuming that the students represent a simple random sample from a distribution g(θ), and in practical applications in EER this is seldom the case. If a sample of students is drawn using a two-stage cluster sampling procedure (first schools, then students within schools), and one applies (3) as it stands, then the estimates of all parameters, including the item parameters, may be systematically affected, even if g(θ) is the correct distribution for the whole student population.

Assume, as an example, that one wants to apply a two-level model, where the school effects, u_j, are assumed to be normally distributed with mean μ_0 and variance τ², and the student-within-school effects θ_vj are normally distributed with mean zero and common variance σ². If so, then the likelihood (3) has to be replaced with

L(β_1, …, β_k, η_1, …, η_p; x, d) = ∏_j ∫ ∏_v ∫ ∏_i [f_i(θ + u)]^{d_vi x_vi} [1 − f_i(θ + u)]^{d_vi(1 − x_vi)} g_1(θ) dθ g_2(u) du,   (4)

where g_1(θ) is the probability density function for the normal distribution N(0, σ²) and g_2(u) is the probability density function for N(μ_0, τ²). The first product sign in (4) runs over schools, the second over students within schools, and the right-most one runs over items.
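The sandwich structure of (4) can be made concrete with nested quadrature: an outer integral over the school effect u and, for each school, an inner integral over the student effects. The sketch below assumes the Rasch IRF with student ability u + θ; the function name and the per-school data layout are ours.

```python
import numpy as np

def irf(theta, beta):
    """Rasch IRF."""
    return 1.0 / (1.0 + np.exp(-(theta - beta)))

def log_lik_two_level(beta, mu0, tau, sigma, x_by_school, d_by_school, n_nodes=15):
    """Log of (4): outer integral over the school effect u ~ N(mu0, tau^2),
    inner integral over the within-school effect theta ~ N(0, sigma^2);
    a student's ability is u + theta. Both integrals are approximated by
    Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_nodes)
    w = weights / np.sqrt(2.0 * np.pi)             # normalized weights
    u_nodes = mu0 + tau * nodes                    # outer nodes (schools)
    t_nodes = sigma * nodes                        # inner nodes (students)
    total = 0.0
    for x, d in zip(x_by_school, d_by_school):     # product over schools j
        per_u = np.empty(n_nodes)
        for a, u in enumerate(u_nodes):
            f = irf(u + t_nodes[:, None], beta[None, :])          # (nodes, items)
            logc = (x * d) @ np.log(f).T + ((1 - x) * d) @ np.log(1 - f).T
            per_u[a] = np.prod(np.exp(logc) @ w)   # product over students v
        total += np.log(per_u @ w)                 # integrate out u for school j
    return total
```

The two nested loops mirror the two integral signs in (4); in a real application one would also need to guard against numerical underflow when the products over students become long.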

The previous example makes clear what the general problem is: the measurement model, be it the Rasch model or any other model, is embedded, via the likelihood function, in a structural model for the sake of its item parameter estimation. This causes two kinds of problems:

• The first one is of a technical nature: both models can be quite complicated, and finding the maximum of the likelihood function is not a trivial task, as can be seen from the sandwich structure of multiplication signs and integral signs in (4).

• The second problem is a more principled one: it is true that if the models are known to be the correct ones, maximizing the likelihood will yield estimates that are optimal in a number of respects, but the problem is usually that in any application one cannot be sure that the models are correct. It may be the case, for example, that in the structural model (as in (4)) there is a large gender effect which has been ignored. In such a case, and dependent on the design, such a specification error may lead to biases in the estimates of the parameters of the measurement model, and it is very difficult to find out in general terms how severe these biases will be. To put it a bit differently, using some distributional assumption about the latent variable will affect the estimates of the measurement model. Conversely, a defect in the measurement model will affect the inferences about the structural model, also to an unknown extent. For example, it may happen that one finds a seemingly large difference between the mean ability of boys and girls, but that this difference is caused by the presence of a few items that do not fit the assumptions of the measurement model.

These problems make it clear that it would be worthwhile to have a method to estimate the parameters of the measurement model, and to check the validity of the model, in a way that is not affected by any assumption about the distribution of the latent variable. Such a method is discussed in the next section.