5.1 The nominal response model (NRM)
The discussion of this model serves two functions. First, the model is interesting enough in itself to merit some attention. It shows how very unstructured data, just observations in nominal categories, can be treated in a way that allows one to draw quantitative inferences. Second, the model offers a good opportunity to discuss some identifiability problems that are present in all IRT models but were not mentioned in the previous discussions.
Suppose a test consists of $k$ items, and the responses to each item are classified in a number of categories. This number can differ across items. The number of categories for item $i$ is denoted as $m_i$, and it is assumed that $m_i \geq 2$. Furthermore, it is assumed that all responses can be considered as indicators for the same latent ability $\theta$. If there are more than two categories, then the concept of an IRF does not make much sense; in such cases models specify the probability that a certain category will be chosen as a function of the underlying variable $\theta$. These functions are called 'category response functions'. In the nominal response model, these functions are given by
$$P(X_i = j \mid \theta) \propto \exp[\alpha_{ij}(\theta - \beta_{ij})], \quad (j = 1, \ldots, m_i), \tag{17}$$

where the expression $X_i = j$ means: the answer to item $i$ belongs to category $j$.
To find the exact probabilities, one has to determine the proportionality constant, which is one divided by the sum of the $m_i$ expressions specified in (17). Notice that in this model, unlike in the 2PLM, there are no positivity restrictions on the $\alpha$-parameters.
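The normalization described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `nrm_probabilities` and the three-category item are assumptions made for the example, and subtracting the largest exponent is only a numerical convenience that leaves the probabilities unchanged.

```python
import math


def nrm_probabilities(theta, alphas, betas):
    """Return P(X_i = j | theta) for every category j of a single item.

    Each category gets the unnormalized weight exp[alpha_j * (theta - beta_j)]
    from (17); dividing by the sum of these weights supplies the
    proportionality constant.
    """
    exponents = [a * (theta - b) for a, b in zip(alphas, betas)]
    largest = max(exponents)  # subtracted only for numerical stability
    weights = [math.exp(e - largest) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical three-category item (values invented for illustration).
# Note that a negative weight is allowed in the NRM.
alphas = [-0.5, 0.3, 1.0]
betas = [0.2, -1.0, 0.5]
probs = nrm_probabilities(0.0, alphas, betas)
print(probs, sum(probs))
```

The returned list always sums to one, whatever the signs of the weights, which is the point of the proportionality constant.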
There are three different reasons why this model is not identified as it is specified in (17). One reason applies to each item separately; the other two apply to all items jointly. We discuss them in turn.
If we consider the ratio of the probabilities for two categories of the same item, the normalizing constant cancels, since it is common for all categories. Now consider the ratio of the probability for category j to the probability for some other category r:
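$$\frac{P(X_i = j \mid \theta)}{P(X_i = r \mid \theta)} = \frac{\exp[\alpha_{ij}(\theta - \beta_{ij})]}{\exp[\alpha_{ir}(\theta - \beta_{ir})]} = \exp[(\alpha_{ij} - \alpha_{ir})\theta - \alpha_{ij}\beta_{ij} + \alpha_{ir}\beta_{ir}]. \tag{18}$$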
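Defining

$$\alpha^*_{ij} = \alpha_{ij} - \alpha_{ir}, \quad \text{and} \quad \beta^*_{ij} = \begin{cases} \dfrac{\alpha_{ij}\beta_{ij} - \alpha_{ir}\beta_{ir}}{\alpha_{ij} - \alpha_{ir}}, & \text{if } \alpha_{ij} \neq \alpha_{ir}, \\[1ex] \beta_{ij}, & \text{if } \alpha_{ij} = \alpha_{ir}, \end{cases}$$

one finds that

$$\frac{P(X_i = j \mid \theta)}{P(X_i = r \mid \theta)} = \exp[\alpha^*_{ij}(\theta - \beta^*_{ij})], \quad (j \neq r). \tag{19}$$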
Together with the restriction that the probabilities of all $m_i$ categories sum to one, it follows that (19) defines an NRM equivalent to (17). Moreover, for the reference category $r$, the corresponding term (the implicit denominator of the right-hand side of (19), which equals one) no longer depends on $\theta$, which implies that $\alpha_{ir} = 0$. The reference category can be chosen freely among the categories of the item, and across items this choice is completely arbitrary.
However, if we do this, then we find that $P(X_i = r \mid \theta) \propto 1$, which also means that the location parameter $\beta_{ir}$ has disappeared completely from all expressions; it simply does not exist anymore. So, for each item, there are only $m_i - 1$ location parameters $\beta$ and $m_i - 1$ weight parameters $\alpha$. A similar phenomenon is found in the 2PLM: although there are two response categories per item, correct and incorrect, the model has only one discrimination parameter and one difficulty parameter. The reference category for each item is the incorrect response.
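The equivalence of the two parameterizations can be checked numerically. The sketch below is illustrative only: the helper names and the item parameters are assumptions, and the transformed location parameter is taken to be $(\alpha_{ij}\beta_{ij} - \alpha_{ir}\beta_{ir})/(\alpha_{ij} - \alpha_{ir})$, which is what matching the exponents in the probability ratio requires when $\alpha_{ij} \neq \alpha_{ir}$. The last lines check that a two-category item with the incorrect response as reference reproduces the usual 2PLM logistic form.

```python
import math


def nrm_probabilities(theta, alphas, betas):
    """Category probabilities for one item under the NRM."""
    exponents = [a * (theta - b) for a, b in zip(alphas, betas)]
    largest = max(exponents)  # numerical stability only
    weights = [math.exp(e - largest) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def to_reference_category(alphas, betas, r):
    """Re-express one item's parameters relative to category r (0-based).

    Assumes alpha_j != alpha_r for every j != r, so the division is defined.
    """
    a_r, b_r = alphas[r], betas[r]
    new_alphas, new_betas = [], []
    for j, (a, b) in enumerate(zip(alphas, betas)):
        if j == r:
            new_alphas.append(0.0)
            # The reference location no longer exists; 0.0 is a placeholder
            # that has no effect because the weight is zero.
            new_betas.append(0.0)
        else:
            new_alphas.append(a - a_r)
            new_betas.append((a * b - a_r * b_r) / (a - a_r))
    return new_alphas, new_betas


# Hypothetical item, with the second category (index 1) as reference.
alphas = [-0.5, 0.3, 1.0]
betas = [0.2, -1.0, 0.5]
ref_alphas, ref_betas = to_reference_category(alphas, betas, r=1)

max_diff = 0.0
for theta in (-2.0, 0.0, 1.5):
    original = nrm_probabilities(theta, alphas, betas)
    reparam = nrm_probabilities(theta, ref_alphas, ref_betas)
    max_diff = max(max_diff, max(abs(p - q) for p, q in zip(original, reparam)))
print("largest difference:", max_diff)

# Two categories with the incorrect response as reference: the 2PLM.
a, b, theta = 1.3, 0.4, 0.9  # hypothetical discrimination, difficulty, ability
p_correct = nrm_probabilities(theta, [0.0, a], [0.0, b])[1]
two_pl = 1.0 / (1.0 + math.exp(-a * (theta - b)))
print(p_correct, two_pl)
```

Both parameter sets give the same category probabilities at every ability value tried, even though the second set has one weight and one location parameter fewer.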
The two other indeterminacies have to do with the origin and the unit of the $\theta$-scale. As to the origin of the scale, it will be clear from the right-hand side of (17) that applying the transformations $\theta^* = \theta + d$ and $\beta^*_{ij} = \beta_{ij} + d$, with $d$ being an arbitrary constant, yields $(\theta - \beta_{ij}) = (\theta^* - \beta^*_{ij})$, and thus leaves (17) unchanged. To solve this indeterminacy one can fix a single arbitrary location parameter $\beta_{ij}$ ($j \neq r$) to an arbitrary constant (zero, for example) or require that the sum of all location parameters is zero. Likewise, for the unit of the scale, the transformations $\theta^* = c\theta$, $\beta^*_{ij} = c\beta_{ij}$ and $\alpha^*_{ij} = \alpha_{ij}/c$, where $c$ is an arbitrary positive constant, yield $\alpha_{ij}(\theta - \beta_{ij}) = \alpha^*_{ij}(\theta^* - \beta^*_{ij})$ and will not change the value of the right-hand side of (17). This indeterminacy can be solved, for example, by fixing a single $\alpha$-parameter to one. Notice that if $\alpha_{ij}$ is set to one, the category $j$ must not be the reference category for that item. The specific way in which origin and unit for a model are chosen is called 'normalization' in IRT.
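Both scale indeterminacies are easy to confirm numerically. In the sketch below, the function `shift_and_rescale` and all parameter values are hypothetical. Applying the shift of origin first and the change of unit second is a choice made for the example; the text treats the two transformations separately.

```python
import math


def nrm_probabilities(theta, alphas, betas):
    """Category probabilities for one item under the NRM."""
    exponents = [a * (theta - b) for a, b in zip(alphas, betas)]
    largest = max(exponents)  # numerical stability only
    weights = [math.exp(e - largest) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def shift_and_rescale(theta, alphas, betas, d, c):
    """Shift the origin by d, then multiply the unit by c (c must be > 0)."""
    new_theta = c * (theta + d)
    new_betas = [c * (b + d) for b in betas]
    new_alphas = [a / c for a in alphas]
    return new_theta, new_alphas, new_betas


# Hypothetical item and ability value.
theta = 0.7
alphas = [-0.5, 0.3, 1.0]
betas = [0.2, -1.0, 0.5]

before = nrm_probabilities(theta, alphas, betas)
after = nrm_probabilities(*shift_and_rescale(theta, alphas, betas, d=3.7, c=2.5))
largest_gap = max(abs(p - q) for p, q in zip(before, after))
print("largest difference:", largest_gap)
```

Because the data cannot distinguish the two parameter sets, some normalization has to be imposed before estimation.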
In summary then, the number of free category parameters in the nominal response model is not $2\sum_i m_i$, as may be suggested by (17), but $2\sum_i (m_i - 1) - 2$: each item has $m_i - 1$ weight parameters and $m_i - 1$ location parameters, and two further parameters are fixed by the normalization. In Figure 8.7, a graphical display is given for a four-category item. The second category has been chosen as the reference category. The four weight parameters $\alpha$ are –1, 0, 0.5 and 1.2, respectively. The location parameters are –1.5, –0.5 and 0.65 for the categories 1, 3 and 4, respectively. The location parameter for the reference category does not exist.
A general characteristic of the category response functions for this model is visible in the figure: exactly one of the curves is increasing and one is decreasing; the others are single peaked. The increasing one is associated with the category having the largest $\alpha$-parameter; the decreasing one is associated with the category with the smallest $\alpha$-parameter. The increasing one approaches the upper asymptote (one) as $\theta$ increases without bound, and the decreasing one approaches the same asymptote as $\theta$ decreases without bound. If there is more than one category with the largest weight parameter, then all the associated curves increase; their asymptotes, however, are less than one, but the sum of their asymptotes equals one. The same holds, analogously, for the smallest value.
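The shape of the four curves can be explored with the parameter values quoted for Figure 8.7. The following sketch is only an illustration: the grid, the helper `peak_location` and the use of 0.0 as a stand-in for the non-existent reference location are assumptions of the example.

```python
import math


def nrm_probabilities(theta, alphas, betas):
    """Category probabilities for one item under the NRM."""
    exponents = [a * (theta - b) for a, b in zip(alphas, betas)]
    largest = max(exponents)  # numerical stability only
    weights = [math.exp(e - largest) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


# Values quoted for the item in Figure 8.7. Category 2 is the reference
# category, so its location parameter does not exist; 0.0 is a placeholder
# that has no effect because the corresponding weight is zero.
ALPHAS = [-1.0, 0.0, 0.5, 1.2]
BETAS = [-1.5, 0.0, -0.5, 0.65]

grid = [x / 4 for x in range(-24, 25)]  # theta from -6 to 6 in steps of 0.25
curves = [nrm_probabilities(t, ALPHAS, BETAS) for t in grid]


def peak_location(category):
    """Grid value of theta at which a category's curve is highest (0-based)."""
    values = [row[category] for row in curves]
    return grid[values.index(max(values))]


for index in range(4):
    print("category", index + 1, "is highest near theta =", peak_location(index))

# Far out on either side, one category takes almost all the probability.
low = nrm_probabilities(-20.0, ALPHAS, BETAS)
high = nrm_probabilities(20.0, ALPHAS, BETAS)
print(low[0], high[3])
```

On this grid the curves for categories 1 and 4 are highest at the two ends of the range, as expected for a decreasing and an increasing curve, while categories 2 and 3 peak in the interior.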
Figure 8.7 Category response functions in the NRM (horizontal axis: $\theta$; vertical axis: probability; category 2 is the reference category $r$)

Now assume that the item in Figure 8.7 represents a multiple choice item, and the category labels 1 to 4 represent the alternatives. From the figure it is clear that the ability level where category 3 is the modal one is higher than the level where category 2 is the modal one. For the NRM it holds that, whatever the distribution of the ability is in the population, the average ability of students choosing category 3 is higher than the average for those choosing category 2. So, a valid NRM allows for discriminating between these two categories of students. In working only with binary data (correct or incorrect), the distinction between students choosing category 2 or 3 is lost, meaning that one has removed useful information about the latent ability.

More generally, it looks as if the four curves in Figure 8.7 are ordered, and most importantly, they are ordered in the same way as the $\alpha$-parameters. These parameters appear in (17) as the coefficients of the latent variable $\theta$, and they can be interpreted as the score a student gets when a category is chosen. In the example, a score of –1 is given for category 1, a score of zero for category 2, and so on. The score on the test is then the sum of all item scores obtained, meaning that the test score is a weighted sum of the categories chosen. This is the same rationale as in the 2PLM. Notice that in the NRM and the 2PLM, the estimate of $\theta$ is not this weighted sum, but it is a monotonic transformation of it.
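The weighted test score can be written out as a short computation. In this sketch the function `weighted_score` is a hypothetical helper; the first item uses the weights quoted for Figure 8.7, whereas the other two items and the response pattern are invented purely for illustration.

```python
def weighted_score(weights_per_item, chosen_categories):
    """Sum the weight (alpha) of the category chosen on each item.

    chosen_categories holds 1-based category labels, one per item.
    """
    return sum(
        weights[choice - 1]
        for weights, choice in zip(weights_per_item, chosen_categories)
    )


# Item 1: weights quoted for Figure 8.7 (category 2 is the reference).
# Items 2 and 3: hypothetical items with two and three categories.
weights_per_item = [
    [-1.0, 0.0, 0.5, 1.2],
    [0.0, 0.8],
    [0.0, 0.4, 1.1],
]

# A student choosing category 3, then 2, then 2.
score = weighted_score(weights_per_item, [3, 2, 2])
print(score)
```

The score is a sum of item weights rather than a count of correct answers, and it is not itself the estimate of the latent ability.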