where $E(Y) = b'(\theta)$, $\mathrm{Var}(Y) = a(\phi)\,b''(\theta)$, and $\phi$ is a dispersion parameter (Hardin and Hilbe 2003; Jiang et al. 2001). The GLM modeling process involves four steps: (1) specifying the model in
two parts: the equation linking the response and explanatory variables, and the probability distribution of the response variable; (2) estimating the parameters of
the model; (3) checking how well the model fits the actual data; and (4) making inferences, for example calculating confidence intervals and testing hypotheses
about the parameters (Dobson 2002). A linear model $Y_i \sim N(\mu_i, \sigma^2)$, where $\mu_i$ is the mean and $\sigma^2$ is the variance of the dependent random variable $Y_i$, is the
basis of the analysis of continuous data. Extensions of the GLM that include random effects among the explanatory
variables are called generalized linear mixed models (GLMM). These models are used to accommodate the over-dispersion that often occurs in outcomes
with a binomial (Williams 1982) or Poisson (Breslow 1984) distribution.
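As a quick numerical check of the exponential-family identities $E(Y) = b'(\theta)$ and $\mathrm{Var}(Y) = a(\phi)\,b''(\theta)$, the sketch below uses the Poisson distribution, for which $\theta = \log\mu$, $b(\theta) = e^{\theta}$ and $a(\phi) = 1$. The function names and the count sample are hypothetical and serve only as illustration. The variance-to-mean ratio at the end is one informal indicator of over-dispersion, not the method used in this research.

```python
import math

def poisson_pmf(y, mu):
    """Poisson probability mass, computed on the log scale for stability."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))

def moments_from_pmf(mu, y_max=200):
    """Mean and variance obtained by summing over the pmf directly."""
    probs = [poisson_pmf(y, mu) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

def b(theta):
    """Cumulant function of the Poisson family: b(theta) = exp(theta)."""
    return math.exp(theta)

def derivatives_of_b(theta, h=1e-4):
    """Central finite differences approximating b'(theta) and b''(theta)."""
    d1 = (b(theta + h) - b(theta - h)) / (2 * h)
    d2 = (b(theta + h) - 2 * b(theta) + b(theta - h)) / h ** 2
    return d1, d2

mu = 3.5
theta = math.log(mu)
mean, var = moments_from_pmf(mu)
d1, d2 = derivatives_of_b(theta)
print(f"E(Y)   = {mean:.4f}   b'(theta)  = {d1:.4f}")
print(f"Var(Y) = {var:.4f}   b''(theta) = {d2:.4f}")

# Hypothetical counts (assumption, not data from this research).
counts = [0, 1, 1, 2, 9, 0, 12, 3, 0, 7]
sample_mean = sum(counts) / len(counts)
sample_var = sum((c - sample_mean) ** 2 for c in counts) / (len(counts) - 1)
dispersion_ratio = sample_var / sample_mean
print(f"sample variance / sample mean = {dispersion_ratio:.2f}")
```

Under a Poisson model the variance equals the mean, so a ratio well above one in observed counts suggests over-dispersion of the kind that a GLMM is meant to accommodate.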
4.2.1 Nested GLM for Ordinal Response
This part describes the building of nested generalized linear models for an ordinal response. Before the model is built, two concepts,
nested spatial structure and threshold models, are explained below.
Nested Spatial
In a multilevel spatial survey, the regions (e.g., districts or 'kabupaten') of one area (e.g., a province) are assumed to be similar, but not identical, to the regions of another
area. Such an arrangement is called nested, with the levels of district nested under the levels of province. For example, suppose the government aims to
reduce poverty and wishes to determine a poverty score for each district through spatial modeling. Several districts are available from each province. The
situation is depicted in Figure 2 (page 4): a district from a particular province differs in nature from the districts of another province. Every
province has its own nature and policies, especially a province with a special status such
as 'Daerah Istimewa' (Special Region). This situation affects the correlation
matrix and the parameter estimation. Because of this effect, the nature of the spatial component
should be considered, especially when spatial modeling is needed to analyze the effects of districts and provinces.
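The nesting described above can be sketched as a small data structure. The province and district names below are hypothetical placeholders, and the same-province indicator matrix is only one simple way to record which districts share a province. It is an illustrative assumption, not the correlation structure estimated in this research.

```python
# Hypothetical nested layout: each district belongs to exactly one province.
nested = {
    "Province A": ["District A1", "District A2", "District A3"],
    "Province B": ["District B1", "District B2"],
}

def flatten(nested_layout):
    """Return (province, district) pairs, keeping districts grouped by province."""
    return [(prov, dist) for prov, dists in nested_layout.items() for dist in dists]

def same_province_matrix(pairs):
    """Indicator matrix: entry [r][c] is 1 when districts r and c share a province."""
    return [[1 if pr == pc else 0 for pc, _ in pairs] for pr, _ in pairs]

pairs = flatten(nested)
indicator = same_province_matrix(pairs)

for (prov, dist), row in zip(pairs, indicator):
    print(f"{prov:<11} {dist:<12} {row}")
```

With the districts ordered by province, the indicator is block diagonal, which shows why the nesting has to be taken into account when a correlation matrix for the districts is specified.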
Threshold Model
Thresholds are latent quantities in the model that distinguish linear models for an ordinal response from linear models for non-ordinal responses. The threshold model is explained as follows. Logistic and probit
regression models assume an unobserved latent variable $Y$ that is associated with the actual responses through the concept of a threshold (Hedeker
1994). For a dichotomous model, one threshold value is assumed, while for an ordinal model with $K$ categories (polytomous), $K-1$ threshold
values are assumed, namely $\theta_1, \ldots, \theta_{K-1}$, with $\theta_0 = -\infty$ and $\theta_K = +\infty$. A response occurs in
category $k$ ($Z = k$) if the latent response $Y$ is greater than $\theta_{k-1}$ and smaller than $\theta_k$. Assume that $Y_j$ is unobserved and that the $j$-th observation falls in a category, say category $Z_j$, $j = 1, \ldots, N$. The relationship between $Y_j$ and $Z_j$ is taken to be

$$Z_j = k \iff \theta_{k-1} < Y_j \le \theta_k,$$

where $k \in \{1, \ldots, K\}$, $\theta_0 = -\infty$, $\theta_K = +\infty$, and $\theta_1 < \theta_2 < \cdots < \theta_{K-1}$
are unknown boundary points that define a partitioning of the real line into $K$ intervals. Thus,
when the realized value of $Y_j$ belongs to the $k$-th interval, we observe $z_j = k$. Under these assumptions, the probability-mass function of $Z_1, \ldots, Z_N$ is

$$\Pr\{Z_1 = z_1, \ldots, Z_N = z_N\} = \Pr\{\theta_{z_1 - 1} < Y_1 \le \theta_{z_1}, \ldots, \theta_{z_N - 1} < Y_N \le \theta_{z_N}\}.$$

This model is called the threshold model (Harville 1984).
Cumulative Link Models
All of the models considered in this research arise from focusing on the cumulative
distribution of the response (Rodriguez 2007). Let $\pi_{jk} = \Pr\{Z_j = k\}$
denote the probability that the response of an individual with characteristics $x_j$ falls in the $k$-th category, and let $p_{jk}$ denote the corresponding cumulative probability
that the response falls in the $k$-th category or below, so

$$p_{jk} = \Pr\{Z_j \le k\} = \pi_{j1} + \pi_{j2} + \cdots + \pi_{jk}. \qquad (3)$$

Let $g(\cdot)$ denote a link function mapping probabilities to the real line. The class
of models considered assumes that the transformed cumulative probabilities are a linear function of the predictors, of the form

$$g(p_{jk}) = \theta_k + x_j'\beta. \qquad (4)$$

In this formulation, $\theta_k$ is a constant representing the baseline value of the transformed cumulative probability for category $k$, and $\beta$ represents the effect of
the covariates on the transformed cumulative probabilities. Since the constant is written explicitly, we assume that the predictors do not include a column of
ones. Note that there is just one equation: if $x_{jq}$ increases by one, then all transformed cumulative probabilities increase by $\beta_q$. By focusing on the cumulative probabilities, a single effect can be postulated.
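A minimal sketch of Equations 3 and 4 with the logit link, which is one of several possible choices of $g$. The cut-point values, coefficients and covariate values are hypothetical. The functions only illustrate how the cumulative probabilities follow from $\theta_k + x_j'\beta$ and how a unit change in one predictor shifts every transformed cumulative probability by the same amount.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(t):
    return 1.0 / (1.0 + math.exp(-t))

def cumulative_probs(theta, beta, x):
    """p_jk = g^{-1}(theta_k + x_j' beta) for k = 1..K-1, followed by p_jK = 1."""
    eta = sum(b * v for b, v in zip(beta, x))
    return [inv_logit(t + eta) for t in theta] + [1.0]

def category_probs(cum):
    """Invert Equation 3: pi_jk = p_jk - p_j,k-1, with p_j0 = 0."""
    return [c - prev for prev, c in zip([0.0] + cum[:-1], cum)]

# Hypothetical values, chosen only for illustration.
theta = [-1.0, 0.5, 2.0]   # K - 1 = 3 increasing cut-points, so K = 4
beta = [0.8, -0.3]
x = [1.0, 2.0]
x_plus = [2.0, 2.0]        # first predictor increased by one

cum = cumulative_probs(theta, beta, x)
cum_plus = cumulative_probs(theta, beta, x_plus)
pi = category_probs(cum)

# Change in each transformed cumulative probability after the unit increase.
shifts = [logit(b) - logit(a) for a, b in zip(cum[:-1], cum_plus[:-1])]
print("category probabilities:", [round(p, 4) for p in pi])
print("shift in each logit:", [round(s, 6) for s in shifts])
```

Every entry of `shifts` equals the first coefficient, which is the single effect that the cumulative formulation postulates.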
These models can also be interpreted in terms of a latent variable. Specifically, suppose that the manifest response $Z_j$ results from grouping an underlying continuous variable $Y_j$ using cut-points $\theta_1 < \theta_2 < \cdots < \theta_{K-1}$, so that $Z_j$ takes the value 1 if $Y_j$ is below $\theta_1$, the value 2 if $Y_j$ is between $\theta_1$ and $\theta_2$, and so on, taking the value $K$ if $Y_j$ is above $\theta_{K-1}$. Figure 19 illustrates this idea for the case of five response categories.
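Figure 19 An ordered response and its latent variable
Suppose further that the underlying continuous variable follows a linear model of
the form

$$Y_j = x_j'\beta^* + \varepsilon_j,$$

where the error term $\varepsilon_j$ has c.d.f. $F(\varepsilon_j)$. Then the probability that the response of the $j$-th
individual falls in the $k$-th category or below, given $x_j$, satisfies the equation

$$p_{jk} = \Pr\{Y_j \le \theta_k\} = \Pr\{\varepsilon_j \le \theta_k - x_j'\beta^*\} = F(\theta_k - x_j'\beta^*),$$

and therefore follows the general form in Equation 4 with link given by the inverse of the c.d.f. of the error term, $g = F^{-1}$, and with coefficients $\beta = -\beta^*$ that differ from those of the latent-variable model only in sign.
It is assumed that the predictors $x_j$ do not include a column of ones, because the constant is absorbed in the cut-points (Rodriguez 2007).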
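The latent-variable reading can be checked by simulation. The sketch below assumes standard logistic errors, so that $F$ is the logistic c.d.f.; the cut-points, the value of the linear predictor, the sample size and the seed are all hypothetical. It groups simulated values of $Y_j$ with the cut-points and compares the observed share of responses in category $k$ or below with $F(\theta_k - x_j'\beta^*)$.

```python
import math
import random
from bisect import bisect_left

def logistic_cdf(t):
    return 1.0 / (1.0 + math.exp(-t))

def category(y, cutpoints):
    """Z = k when theta_{k-1} < y <= theta_k, with categories numbered from 1."""
    return bisect_left(cutpoints, y) + 1

def simulate_categories(eta, cutpoints, n, seed=0):
    """Draw Y = eta + eps with standard logistic eps and group it into categories."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:        # avoid log(0) in the inverse-c.d.f. step
            u = rng.random()
        eps = math.log(u / (1.0 - u))
        draws.append(category(eta + eps, cutpoints))
    return draws

cutpoints = [-1.0, 0.5, 2.0]   # hypothetical theta_1 < theta_2 < theta_3
eta = 0.4                      # hypothetical value of x_j' beta*
z = simulate_categories(eta, cutpoints, n=20000)

# Share of simulated responses in category k or below, for k = 1..K-1.
empirical = [sum(1 for v in z if v <= k) / len(z) for k in range(1, len(cutpoints) + 1)]
theoretical = [logistic_cdf(t - eta) for t in cutpoints]

for k, (e, t) in enumerate(zip(empirical, theoretical), start=1):
    print(f"k={k}: simulated {e:.3f}, F(theta_k - eta) {t:.3f}")
```

The simulated and theoretical cumulative shares agree up to sampling error, which is the content of the equation above with $g = F^{-1}$.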
As an illustration, let the threshold model have four ordinal response categories: {bad, ok, good, great}. Figure 20 shows how the probability
of each category changes for x = 0, 1, and 2, and Figure 21 shows the pdf of Y for several values of x. The sloped line is the predicted value of the right-hand side of the latent-variable model,
$x_j'\beta^*$. The bell-shaped curves correspond to the distribution of the unobserved variable $Y_j = x_j'\beta^* + \varepsilon_j$, where $\varepsilon_j$ is a random error (Simpson 2008).
Figure 20 Changes in the value of x that cause changes in the magnitude of the probabilities; a1, a2, a3 are the thresholds.
Figure 21 The effect of a covariate on the transformed cumulative probabilities: pdf of Y for some values of x
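The four-category illustration can be reproduced numerically. The sketch below assumes standard normal errors (a probit-type model), which is an assumption made here and not stated for the figures, and it uses hypothetical thresholds a1, a2, a3 and a hypothetical slope. None of these values are taken from Figures 20 and 21.

```python
import math

CATEGORIES = ["bad", "ok", "good", "great"]

def normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def category_probs(x, slope, thresholds):
    """P(Z = k | x) when Y = slope * x + eps and eps is standard normal."""
    cum = [normal_cdf(a - slope * x) for a in thresholds] + [1.0]
    return [c - prev for prev, c in zip([0.0] + cum[:-1], cum)]

thresholds = [0.5, 1.5, 2.5]   # hypothetical a1 < a2 < a3
slope = 1.0                    # hypothetical effect of x on the latent Y

table = {x: category_probs(x, slope, thresholds) for x in (0, 1, 2)}
for x, probs in table.items():
    row = ", ".join(f"{name}={p:.3f}" for name, p in zip(CATEGORIES, probs))
    print(f"x={x}: {row}")
```

With a positive slope, increasing x moves the bell-shaped curve to the right across fixed thresholds, so the probability of "bad" falls and the probability of "great" rises.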
4.2.1.1 Data Layout
Two types of data structure in GEE modeling are shown in Examples 1 and 2: equal and unequal numbers of observations per subject. The data in this
research correspond to Example 1.
Example 1 Equal number of observations for all three subjects:
Subject   Sub-subject
1         $y_{11}$  $y_{12}$  $y_{13}$
2         $y_{21}$  $y_{22}$  $y_{23}$
3         $y_{31}$  $y_{32}$  $y_{33}$
Example 2 Different number of observations for all three subjects:
Subject   Sub-subject
1         $y_{11}$  $y_{12}$  $y_{13}$  $y_{14}$  $y_{15}$  $y_{16}$  $y_{17}$
2         $y_{21}$  $y_{22}$  $y_{23}$  $y_{24}$  $y_{25}$
3         $y_{31}$  $y_{32}$  $y_{33}$  $y_{34}$  $y_{35}$  $y_{36}$
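The two layouts can be sketched as follows. The container and the placeholder labels are hypothetical. The point is only that a long format, with one row per observation, covers both the balanced case of Example 1 and the unbalanced case of Example 2.

```python
def make_layout(sizes):
    """Build {subject: [labels]}, where the label 'y_i_t' stands for y_it."""
    return {i: [f"y_{i}_{t}" for t in range(1, n + 1)] for i, n in sizes.items()}

def is_balanced(layout):
    """True when every subject has the same number of observations."""
    return len({len(obs) for obs in layout.values()}) == 1

def to_long(layout):
    """One (subject, sub-subject index, label) row per observation."""
    return [(i, t, y) for i, obs in layout.items() for t, y in enumerate(obs, start=1)]

example_1 = make_layout({1: 3, 2: 3, 3: 3})   # sizes as in Example 1
example_2 = make_layout({1: 7, 2: 5, 3: 6})   # sizes as in Example 2

for name, layout in (("Example 1", example_1), ("Example 2", example_2)):
    rows = to_long(layout)
    print(name, "balanced:", is_balanced(layout), "rows:", len(rows))
```

Keeping the subject identifier on every row lets the same structure serve both examples, whatever the number of observations per subject.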
The response variable can be continuous, discrete, or ordered categorical. For example, the response could be calcium content (continuous), presence or absence
of a disease (1 or 0), number of asthma attacks over the last time period (counts) (Valois 1997), or the poverty level of a sub-district (ordinal). The responses for subject
i, Y