The size of matrix Z is , where
. is a vector of random effects which are assumed to be normally
distributed with mean 0 and variance matrix G.
The variance of y of equation 8, conditional on the random effects, is

$$\operatorname{Var}(y \mid u) = A^{1/2} R A^{1/2} \tag{9}$$

$A$ is a diagonal matrix that contains the variance function of the model, which is a function of the mean $\mu$, divided by the corresponding scale weight variable; that is, $A = \operatorname{diag}\left(V(\mu_{si})/\omega_{si}\right)$, $s = 1, \dots, S$ and $i = 1, \dots, n_s$. The variance functions, $V(\mu)$, are different for different distributions. The matrix $R$ is the variance matrix of observations inside clusters. Generalized linear mixed models allow correlation and/or heterogeneity from random effects (G-side) and/or heterogeneity from residual effects (R-side), resulting in four types of models:
1. If a GLMM has no G-side or R-side effects, then it reduces to a GLM; $G = 0$ and $R = \phi I$, where $I$ is the identity matrix and $\phi$ is the scale parameter. For continuous distributions (normal, inverse Gaussian and gamma), $\phi$ is an unknown parameter and is estimated jointly with the regression parameters by the maximum likelihood (ML) method. For discrete distributions (negative binomial, Poisson, binomial and multinomial), $\phi$ is estimated by the Pearson chi-square as follows:

$$\hat{\phi} = \frac{1}{N_{pl}} \sum_{s=1}^{S} \sum_{i=1}^{n_s} \omega_{si} \frac{(y_{si} - \hat{\mu}_{si})^2}{V(\hat{\mu}_{si})}$$

where $N_{pl} = N - p_x$ for the restricted maximum pseudo-likelihood (REPL) method.
2. If a model only has G-side random effects, then the G matrix is user-specified and $R = \phi I$. $\phi$ is estimated jointly with the covariance parameters in G for continuous distributions, and $\phi = 1$ for discrete distributions.
3. If a model only has R-side residual effects, then G = 0 and the R matrix is user-specified. All covariance parameters in R are estimated using the REPL
method.
4. If a model has both G-side and R-side effects, all covariance parameters in G and R are jointly estimated using the REPL method.
Type 2 is appropriate to the model in this study. For the ordinal multinomial distribution, $R = \phi I$ in equation 9, which means that R-side effects are not supported for the multinomial distribution, and $\phi$ is set to 1.
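As an illustration of the conditional variance structure, the following minimal sketch builds $A^{1/2} R A^{1/2}$ for a small cluster. It assumes that equation 9 has this usual form, with $A$ holding $V(\mu_i)/\omega_i$ on its diagonal; the function name and the Poisson variance function used in the example are illustrative choices, not part of the study.

```python
import math

def conditional_variance(mu, scale_weights, R, variance_fn):
    """Return A^(1/2) R A^(1/2) as a nested list.

    Assumption: A is diagonal with entries V(mu_i) / omega_i, where V is
    the variance function and omega_i the scale weight of observation i.
    """
    a = [variance_fn(m) / w for m, w in zip(mu, scale_weights)]
    root = [math.sqrt(x) for x in a]
    n = len(a)
    return [[root[i] * R[i][j] * root[j] for j in range(n)] for i in range(n)]

# Example with a Poisson variance function V(mu) = mu and unit scale weights.
mu = [1.0, 4.0]
identity = [[1.0, 0.0], [0.0, 1.0]]          # R = phi * I with phi = 1
correlated = [[1.0, 0.5], [0.5, 1.0]]        # a hypothetical R-side structure
var_independent = conditional_variance(mu, [1.0, 1.0], identity, lambda m: m)
var_correlated = conditional_variance(mu, [1.0, 1.0], correlated, lambda m: m)
```

With $R = I$ the result is simply the diagonal matrix of variance-function values, which is the situation of the Type 2 model used here.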
4.2.2.2 Logistic Response Function
The probability that , conditional on random effects, under logit
formulation with the mixed-effects regression model for the underlying latent variable
, as shown by equation 3 in section 4.2.1, is given by
where
represents the random effects; and
· represents the logistic cumulative distribution function (cdf). In the following
model development, the logit response function and the expansion of the formulas are based on Liu and Hedeker (1993).
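The cumulative logit formulation can be illustrated numerically. The sketch below assumes the common convention $P(Y \le k \mid u) = \Psi(\gamma_k - \eta)$, where $\eta$ is the linear predictor including the random effects and $\gamma_k$ are the thresholds; the sign convention, the symbol names and the function names are assumptions made for this example and may differ from those of equation 3.

```python
import math

def logistic_cdf(x):
    """Logistic cumulative distribution function."""
    return 1.0 / (1.0 + math.exp(-x))

def category_probabilities(eta, thresholds):
    """Probabilities of the K ordered categories given the linear predictor.

    thresholds holds the K-1 increasing threshold values; the cumulative
    probability of the last category is 1 by definition.
    """
    cumulative = [logistic_cdf(g - eta) for g in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)   # P(Y = k) = P(Y <= k) - P(Y <= k-1)
        previous = c
    return probs

probs_low = category_probabilities(0.0, [-1.0, 1.0])
probs_high = category_probabilities(2.0, [-1.0, 1.0])
```

Under this convention a larger linear predictor moves probability mass toward the higher categories.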
Maximum Marginal Likelihood estimation
Let $Y_{sij}$ be the vector of ordinal responses from area $s$ and subject $i$ for all the occasions, with $n_{si}$ items at each occasion. Assuming independence of the responses conditional on the random effect, the conditional likelihood of any pattern $Y_{sij}$, given $u_i$, is
where
Then the marginal likelihood of $Y_s$ in the population is expressed as the following integral of the conditional likelihood, $L(\cdot)$, weighted by the prior density
where the prior density represents the distribution of random effects in the population (the joint distribution of the random effects, a standard normal density). Under the assumption that, conditional on the level-2 effect, the responses from the $n_i$ occasions in subject $i$ are independent, the marginal probability can be rewritten as
where
For estimation of the $p$ covariate coefficients, the $r$ item discrimination parameters $u$, and the $K - 1$ threshold values, $k = 1, \dots, K-1$, the marginal log likelihood for the patterns from the $n_s$ level-2 subjects is differentiated. Letting $\theta$ be an arbitrary parameter vector, we obtain
This is tractable for the probit formulation as long as the number of level-2 random effects is no greater than three or four, a condition which is typically satisfied for longitudinal or clustered studies (Liu and Hedeker 2006). In this study, the cumulative logit is used, which is not tractable (Vasdekis et al. 2010) or has no closed-form solution (Hardin and Hilbe 2003). To handle this problem, Wolfinger and O’Connell gave a solution using a linear mixed pseudo-model with a first-order Taylor series approximation, which is discussed in the following subsection.
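To make the integral in the marginal likelihood concrete, the following sketch approximates it by Gauss-Hermite quadrature for the simplest case of a single normally distributed random intercept. This is not the pseudo-likelihood approach adopted in this study; it only illustrates what "integrating the conditional likelihood over the random effect" means. The cumulative logit convention $P(Y \le k \mid u) = \Psi(\gamma_k - \eta)$, the parameter `sigma_u` and all function names are assumptions of the example.

```python
import math
from numpy.polynomial.hermite import hermgauss

def category_probabilities(eta, thresholds):
    """Cumulative logit category probabilities for linear predictor eta."""
    cumulative = [1.0 / (1.0 + math.exp(-(g - eta))) for g in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs

def marginal_likelihood(y, fixed_part, thresholds, sigma_u, n_nodes=20):
    """Marginal likelihood of one subject's response pattern.

    y           -- observed categories coded 0..K-1, one per occasion
    fixed_part  -- fixed-effects part of the linear predictor per occasion
    sigma_u     -- standard deviation of the random intercept (assumed)
    The integral over a standard normal z is approximated by
    (1/sqrt(pi)) * sum_q w_q * L(sqrt(2) * x_q).
    """
    nodes, weights = hermgauss(n_nodes)
    total = 0.0
    for x_q, w_q in zip(nodes, weights):
        z = math.sqrt(2.0) * x_q
        conditional = 1.0
        for y_j, xb_j in zip(y, fixed_part):
            conditional *= category_probabilities(xb_j + sigma_u * z, thresholds)[y_j]
        total += w_q * conditional
    return total / math.sqrt(math.pi)

thresholds = [-1.0, 1.0]
no_random_effect = marginal_likelihood([0, 2], [0.0, 0.0], thresholds, sigma_u=0.0)
total_mass = sum(marginal_likelihood([k], [0.3], thresholds, sigma_u=1.5) for k in range(3))
```

The cost of such quadrature grows quickly with the number of random effects, which is one motivation for the linearization approach described next.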
4.2.2.3 Wolfinger and O’Connell Approach
The pseudo-likelihood estimation procedure of Wolfinger and O’Connell is described as follows. For the generalized linear mixed model (GLMM), consider a data vector $y$ of length $n$ satisfying

$$y = \mu + e$$

and a differentiable monotonic link function $g(\cdot)$ such that

$$g(\mu) = X\beta + Zu$$

where $\beta$ is a vector of unknown fixed effects with known model matrix $X$ of rank $p$, and $u$ is a vector of unknown random effects with known model matrix $Z$. Assume $E(u) = 0$ and $\operatorname{cov}(u) = D$, where $D$ is unknown. Also, $e$ is a vector of unobserved errors with

$$E(e \mid \mu) = 0$$

and

$$\operatorname{cov}(e \mid \mu) = R_{\mu}^{1/2} R R_{\mu}^{1/2}$$

Here, $R_{\mu}$ is a diagonal matrix containing evaluations at $\mu$ of a known variance function for the generalized linear model under consideration, and $R$ is unknown.
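The next step is to build the PL and REPL methods for fitting the GLMM by using three approximations: two analytic and one probabilistic.
For the first analytic approximation, let $\hat{\beta}$ and $\hat{u}$ be known estimates of $\beta$ and $u$, and define

$$\hat{\mu} = g^{-1}(X\hat{\beta} + Z\hat{u})$$

which is a vector consisting of evaluations of $g^{-1}$ at each component of $X\hat{\beta} + Z\hat{u}$. Now let

$$\tilde{e} = y - \hat{\mu} - (g^{-1})'(X\hat{\beta} + Z\hat{u})\,(X\beta - X\hat{\beta} + Zu - Z\hat{u}) \tag{10}$$

where $(g^{-1})'(X\hat{\beta} + Z\hat{u})$ is a diagonal matrix with elements consisting of evaluations of the first derivative of $g^{-1}$. Note that $\tilde{e}$ in equation 10 is a Taylor series approximation to $e$, expanding about $\hat{\beta}$ and $\hat{u}$. Next, for the probabilistic approximation, the conditional distribution of $\tilde{e}$ given $\beta$ and $u$ is approximated with a Gaussian distribution having the same first two moments as $e \mid \beta, u$, which we assume corresponds to $e \mid \mu$. In particular, it is assumed that $\tilde{e} \mid \beta, u$ is Gaussian with mean $0$ and variance $R_{\mu}^{1/2} R R_{\mu}^{1/2}$. The second and final analytic approximation is substituting $\hat{\mu}$ for $\mu$ in the variance matrix. Then, since the derivative of $g^{-1}$ evaluated at the $i$th component of $X\hat{\beta} + Z\hat{u}$ equals $[g'(\hat{\mu}_i)]^{-1}$ for each component $i$,

$$g'(\hat{\mu})(y - \hat{\mu}) \mid \beta, u \sim N\left(X\beta - X\hat{\beta} + Zu - Z\hat{u},\; g'(\hat{\mu}) R_{\hat{\mu}}^{1/2} R R_{\hat{\mu}}^{1/2} g'(\hat{\mu})\right)$$

where $g'(\hat{\mu})$ is a diagonal matrix with elements constructed as above. Defining

$$v = g(\hat{\mu}) + g'(\hat{\mu})(y - \hat{\mu})$$

then equivalently it can be specified

$$v \mid \beta, u \sim N\left(X\beta + Zu,\; g'(\hat{\mu}) R_{\hat{\mu}}^{1/2} R R_{\hat{\mu}}^{1/2} g'(\hat{\mu})\right)$$

For ordinal multinomial response,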
and error terms with
and
and block diagonal weight matrix is
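The Gaussian log pseudo-likelihood (PL) and restricted log pseudo-likelihood (REPL), which are expressed as functions of the covariance parameters in $\theta$, corresponding to the linear mixed model for $v$ are the following:

$$\ell(\theta; v) = -\frac{1}{2}\ln|V(\theta)| - \frac{1}{2} r(\theta)^{T} V(\theta)^{-1} r(\theta) - \frac{N}{2}\ln(2\pi)$$

$$\ell_R(\theta; v) = -\frac{1}{2}\ln|V(\theta)| - \frac{1}{2} r(\theta)^{T} V(\theta)^{-1} r(\theta) - \frac{1}{2}\ln\left|X^{T} V(\theta)^{-1} X\right| - \frac{N - p_x}{2}\ln(2\pi)$$

where $V(\theta) = Z G(\theta) Z^{T} + \tilde{W}^{-1/2} R(\theta) \tilde{W}^{-1/2}$, $r(\theta) = v - X\left(X^{T} V(\theta)^{-1} X\right)^{-1} X^{T} V(\theta)^{-1} v$, $\tilde{W}$ is the weight matrix, $N$ denotes the effective sample size, and $p_x$ denotes the total number of non-redundant parameters for $\beta$.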
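The two objective functions can be evaluated directly once the marginal covariance matrix of the pseudo-response is available. The sketch below assumes the usual Gaussian forms, $-2\ell = \ln|V| + r^{T}V^{-1}r + N\ln(2\pi)$ and, for the restricted version, an additional $\ln|X^{T}V^{-1}X|$ term with $N - p_x$ in place of $N$. It further assumes no frequency weights (so $N$ is the length of $v$) and a full-rank $X$; the function name is hypothetical.

```python
import numpy as np

def neg2_log_pseudo_likelihood(V, X, v, restricted=False):
    """Return -2*l(theta; v), or -2*l_R(theta; v) when restricted=True.

    V is the marginal covariance matrix of the pseudo-response v, already
    evaluated at a trial value of the covariance parameters theta.
    """
    n, p = X.shape
    xtvx = X.T @ np.linalg.solve(V, X)
    beta = np.linalg.solve(xtvx, X.T @ np.linalg.solve(V, v))  # GLS estimate
    r = v - X @ beta                                            # residual r(theta)
    _, logdet_v = np.linalg.slogdet(V)
    value = logdet_v + r @ np.linalg.solve(V, r)
    if restricted:
        _, logdet_xtvx = np.linalg.slogdet(xtvx)
        return value + logdet_xtvx + (n - p) * np.log(2.0 * np.pi)
    return value + n * np.log(2.0 * np.pi)

# Small check case: V = I and an intercept-only model.
v = np.array([1.0, 2.0, 3.0])
X = np.ones((3, 1))
pl_value = neg2_log_pseudo_likelihood(np.eye(3), X, v)
repl_value = neg2_log_pseudo_likelihood(np.eye(3), X, v, restricted=True)
```

In an actual fit, a numerical optimizer would minimize this function over $\theta$ with $V$ rebuilt at every trial value.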
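The parameter $\theta$ can be estimated by the linear mixed model using the objective function $-2\ell(\theta; v)$ or $-2\ell_R(\theta; v)$. $\beta$ and $u$ are obtained by best linear unbiased prediction (BLUP; Robinson 1991) and computed as

$$\hat{\beta} = \left(X^{T} V(\hat{\theta})^{-1} X\right)^{-1} X^{T} V(\hat{\theta})^{-1} v$$

$$\hat{u} = \hat{G} Z^{T} V(\hat{\theta})^{-1} \hat{r}$$

where $\hat{r} = v - X\hat{\beta}$.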
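The linearization and the estimates of $\beta$ and $u$ can be sketched for a deliberately simplified case. The code below assumes a Bernoulli response with logit link rather than the ordinal multinomial response of this study, sets $R = I$, and treats $G$ as known, so that the covariance parameters are not re-estimated between passes. All function names are hypothetical.

```python
import numpy as np

def linearize(y, eta):
    """Pseudo-response v and weights w for a Bernoulli response, logit link."""
    mu = 1.0 / (1.0 + np.exp(-eta))
    d = mu * (1.0 - mu)        # d mu / d eta, which equals V(mu) for this model
    v = eta + (y - mu) / d     # v = g(mu) + g'(mu) (y - mu)
    w = d                      # w = (d mu / d eta)^2 / V(mu)
    return v, w

def gls_and_blup(X, Z, v, w, G):
    """Estimate beta by generalized least squares and predict u by BLUP."""
    V = Z @ G @ Z.T + np.diag(1.0 / w)
    xtvx = X.T @ np.linalg.solve(V, X)
    beta = np.linalg.solve(xtvx, X.T @ np.linalg.solve(V, v))
    u = G @ Z.T @ np.linalg.solve(V, v - X @ beta)
    return beta, u

def fit_with_known_G(y, X, Z, G, max_outer=50, tol=1e-10):
    """Repeat linearization and mixed-model solve until the estimates settle."""
    beta, u = np.zeros(X.shape[1]), np.zeros(Z.shape[1])
    for _ in range(max_outer):
        v, w = linearize(y, X @ beta + Z @ u)
        new_beta, new_u = gls_and_blup(X, Z, v, w, G)
        change = max(np.max(np.abs(new_beta - beta)), np.max(np.abs(new_u - u)))
        beta, u = new_beta, new_u
        if change < tol:
            break
    return beta, u

y = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0])
X = np.ones((8, 1))
Z = np.kron(np.eye(2), np.ones((4, 1)))     # two clusters of four observations
beta_glm, u_glm = fit_with_known_G(y, X, Z, np.zeros((2, 2)))
beta_mixed, u_mixed = fit_with_known_G(y, X, Z, 0.5 * np.eye(2))
```

With $G = 0$ the procedure reduces to ordinary iteratively reweighted least squares for a logistic regression, which gives a convenient sanity check.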
Iterative process
The estimation of $\theta$ uses the doubly iterative process of Wolfinger and O’Connell and the SPSS algorithm. The steps are as follows:
1. Obtain an initial estimate of $\pi$, $\pi^{(0)}$. Also set the outer iteration index $m = 0$ and $M$ = maximum iterations.
2. Based on $\pi^{(m)}$, compute the pseudo target $v$ and the weight matrix. Fit a weighted linear mixed model with pseudo target $v$, fixed effects design matrix $X$, random effects design matrix $Z$, and diagonal weight matrix. The fitting procedure, which is called the inner iteration, yields the estimates of $\theta$, denoted as $\theta^{(m)}$. If $m = 0$, go to step 4; otherwise go to the next step.
3. Check whether the convergence criterion with the given tolerance level is satisfied. If it is met or the maximum number of outer iterations is reached, stop. Otherwise, go to the next step.
4. Compute the fixed effect estimates $\hat{\beta}$ by setting $\theta = \theta^{(m)}$. Depending on the choice of random effect estimates, set $\hat{u}$ accordingly.
5. Compute the new estimate of $\pi$, set $m = m + 1$ and
go to step 2.
Wald confidence intervals for covariance parameter estimates
It is assumed that the estimated parameters of $G$ and $R$ are obtained through the above doubly iterative process. Then their asymptotic covariance matrix can be approximated by $2H^{-1}$, where $H$ is the Hessian matrix of the objective function $-2\ell(\theta; v)$ or $-2\ell_R(\theta; v)$, evaluated at $\hat{\theta}$. The standard error for the $i$th covariance parameter estimate in the vector, say $\hat{\theta}_i$, is the square root of the $i$th diagonal element of $2H^{-1}$. Thus, a simple Wald-type confidence interval or test statistic for any covariance parameter can be obtained by using the asymptotic normality. However, these can be unreliable in small samples, especially for variance and correlation parameters, which have ranges of $[0, \infty)$ and $[-1, 1]$ respectively. Therefore, following the same method used in linear mixed models, these parameters are transformed to parameters that have range $(-\infty, \infty)$. Using the delta method, these transformed estimates still have asymptotic normal distributions.
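For variance-type parameters in $G$ and $R$, such as those in the autoregressive, autoregressive moving average, compound symmetry, diagonal, Toeplitz, and variance components structures, and in the unstructured type, the $100(1 - \alpha)\%$ Wald confidence interval is given, assuming the variance parameter estimate is $\hat{\sigma}^2$ and its standard error is $se(\hat{\sigma}^2)$ from the corresponding diagonal element of $2H^{-1}$, by

$$\left(\exp\left(\ln\hat{\sigma}^2 - z_{1-\alpha/2}\,\frac{se(\hat{\sigma}^2)}{\hat{\sigma}^2}\right),\; \exp\left(\ln\hat{\sigma}^2 + z_{1-\alpha/2}\,\frac{se(\hat{\sigma}^2)}{\hat{\sigma}^2}\right)\right)$$

where $z_{1-\alpha/2}$ is the corresponding quantile of the standard normal distribution.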
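The transformed interval for a variance parameter can be computed in a few lines. The sketch below assumes the log transformation, for which the delta method gives $se(\ln\hat{\sigma}^2) \approx se(\hat{\sigma}^2)/\hat{\sigma}^2$; the interval is formed on the log scale and mapped back by exponentiation. The function name and the numerical inputs are illustrative only.

```python
import math
from statistics import NormalDist

def wald_ci_variance(estimate, std_error, alpha=0.05):
    """Wald interval for a variance parameter via the log transformation.

    estimate   -- variance parameter estimate (must be positive)
    std_error  -- its standard error on the original scale
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * std_error / estimate    # delta-method s.e. on the log scale
    log_estimate = math.log(estimate)
    return math.exp(log_estimate - half_width), math.exp(log_estimate + half_width)

# Hypothetical inputs, not results from this study.
lower, upper = wald_ci_variance(2.0, 0.5)
```

Unlike an untransformed Wald interval, both limits are guaranteed to be positive, and the interval is asymmetric around the estimate.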
4.3 Methodology
4.3.1 Model Building
Model building in this study is based on the spatial concept that the closer the observations, the larger the correlation (Cressie 1993). Based on this concept, the idea was extended to nested locations or areas. Furthermore, as the observed data are not always continuous or normally distributed, the model should
be in the general form,
and for nested generalized linear models,