The matrix $Z$ is the design matrix of the random effects, with one row per observation, and $u$ is a vector of random effects, which are assumed to be normally distributed with mean 0 and variance matrix $G$. The variance of $y$ in equation 8, conditional on the random effects, is

$$\operatorname{Var}(y \mid u) = A^{1/2} R A^{1/2} \tag{9}$$

$A$ is a diagonal matrix that contains the variance function of the model, which is a function of the mean $\mu$, divided by the corresponding scale weight variable; that is, $A = \operatorname{diag}\big(V(\mu_{si})/\omega_{si}\big)$, $s = 1, \dots, S$ and $i = 1, \dots, n_s$, where $\omega_{si}$ denotes the scale weight. The variance functions, $V(\mu)$, are different for different distributions. The matrix $R$ is the variance matrix of observations inside clusters.

Generalized linear mixed models allow correlation and/or heterogeneity from random effects (G-side) and/or heterogeneity from residual effects (R-side), resulting in 4 types of models:

1. If a GLMM has no G-side or R-side effects, then it reduces to a GLM: $G = 0$ and $R = \phi I$, where $I$ is the identity matrix and $\phi$ is the scale parameter. For continuous distributions (normal, inverse Gaussian and gamma), $\phi$ is an unknown parameter and is estimated jointly with the regression parameters by the maximum likelihood (ML) method. For discrete distributions (negative binomial, Poisson, binomial and multinomial), $\phi$ is estimated by the Pearson chi-square statistic as follows:

$$\hat\phi = \frac{1}{N} \sum_{s=1}^{S} \sum_{i=1}^{n_s} \frac{\omega_{si}\,(y_{si} - \hat\mu_{si})^2}{V(\hat\mu_{si})}$$

where $\omega_{si}$ is the scale weight, and $N$ is replaced by $N - p_x$ for the restricted maximum pseudo-likelihood (REPL) method, with $p_x$ the number of non-redundant fixed-effect parameters.

2. If a model only has G-side random effects, then the $G$ matrix is user-specified and $R = \phi I$. $\phi$ is estimated jointly with the covariance parameters in $G$ for continuous distributions, and $\phi = 1$ for discrete distributions.

3. If a model only has R-side residual effects, then $G = 0$ and the $R$ matrix is user-specified. All covariance parameters in $R$ are estimated using the REPL method.

4. If a model has both G-side and R-side effects, all covariance parameters in $G$ and $R$ are jointly estimated using the REPL method.

Type 2 is appropriate to the model in this study. For the ordinal multinomial distribution, equation 9 is applied with $R = \phi I$, which means that R-side effects are not supported for the multinomial distribution; $\phi$ is set to 1.

4.2.2.2 Logistic Response Function

The response probability, conditional on the random effects, under the logit formulation with the mixed-effects regression model for the underlying latent variable (equation 3 in Section 4.2.1), is expressed through the logistic cumulative distribution function (cdf), $\Psi(z) = 1/(1 + e^{-z})$, with the random effects held fixed. In the following model development, the logit response function and the expansion of the formula are based on Liu and Hedeker 1993.

Maximum Marginal Likelihood estimation

Let $Y_{sij}$ be the vector of ordinal responses from area $s$ and subject $i$ for all occasions, with $n_{si}$ items at each occasion. Assuming independence of the responses conditional on the random effect, the conditional likelihood of any pattern $Y_{sij}$, given $u_i$, is the product of the conditional response probabilities over the responses in the pattern. The marginal likelihood of $Y_s$ in the population is then expressed as the following integral of the conditional likelihood, $L(\cdot)$, weighted by the prior density:

$$h(Y_s) = \int L(Y_s \mid u)\, f(u)\, du$$

where $f(u)$ represents the distribution of the random effects in the population, a standard normal density. With the assumption that, conditional on the level-2 effect, the responses from the $n_i$ occasions in subject $i$ are independent, the marginal probability can be rewritten with the conditional likelihood factored over occasions.

For estimation of the $p$ covariate coefficients, the $r$ item discrimination parameters, and the $K - 1$ threshold values ($k = 1, \dots, K - 1$), the marginal log likelihood for the patterns from the $n_s$ level-2 subjects is differentiated. Letting $\theta$ be an arbitrary parameter vector, we obtain

$$\frac{\partial \log L}{\partial \theta} = \sum_{i=1}^{n_s} \frac{1}{h(Y_i)}\, \frac{\partial h(Y_i)}{\partial \theta}$$

This computation is tractable for the probit formulation as long as the number of level-2 random effects is no greater than three or four, a condition which is typically satisfied for longitudinal or clustered studies (Liu and Hedeker 2006). In this study, the cumulative logit is used, which is not tractable (Vasdekis et al. 2010) or has no closed-form solution (Hardin and Hilbe 2003).

To handle this problem, Wolfinger and O'Connell gave a solution using a linear mixed pseudo-model with a first-order Taylor series approximation, which is discussed in the following subsection.
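To make the marginal likelihood integral concrete, the sketch below approximates it by Gauss-Hermite quadrature for a deliberately simplified case: a single cluster, one standard normal random intercept scaled by `sigma_u`, and a cumulative logit model. The function names, the parameterisation (thresholds minus linear predictor), and the toy values are assumptions for illustration, not the model fitted in this study.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def logistic_cdf(z):
    return 1.0 / (1.0 + np.exp(-z))

def category_probs(eta, thresholds):
    """P(Y = k | eta), k = 0..K-1, for increasing thresholds (cumulative logit)."""
    eta = np.asarray(eta, dtype=float)
    cum = logistic_cdf(np.asarray(thresholds)[None, :] - eta[:, None])
    n = cum.shape[0]
    cum = np.hstack([np.zeros((n, 1)), cum, np.ones((n, 1))])
    return np.diff(cum, axis=1)  # differences of cumulative probabilities

def marginal_likelihood(y, x, beta, thresholds, sigma_u, n_nodes=30):
    """Integrate the conditional likelihood over u ~ N(0, 1) by quadrature."""
    y = np.asarray(y, dtype=int)
    nodes, weights = hermegauss(n_nodes)      # weight function exp(-u^2 / 2)
    weights = weights / np.sqrt(2.0 * np.pi)  # rescale to the N(0, 1) density
    total = 0.0
    for u, w in zip(nodes, weights):
        eta = x @ beta + sigma_u * u          # linear predictor given u
        p = category_probs(eta, thresholds)
        # conditional independence: product over the responses in the pattern
        total += w * np.prod(p[np.arange(y.size), y])
    return total

# Hypothetical cluster with two responses on a 3-category scale
x = np.array([[0.5], [-1.0]])
beta = np.array([0.8])
thresholds = np.array([-0.5, 1.0])
lik = marginal_likelihood([1, 2], x, beta, thresholds, sigma_u=1.2)
```

With more than a few random effects the number of quadrature nodes grows multiplicatively, which is one reason approximate methods such as pseudo-likelihood are attractive.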

4.2.2.3 Wolfinger and O’Connell Approach

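The pseudo-likelihood estimation procedure of Wolfinger and O'Connell is described as follows. For the generalized linear mixed model (GLMM), consider a data vector $y$ of length $n$ satisfying

$$y = \mu + e$$

and a differentiable monotonic link function $g(\cdot)$ such that

$$g(\mu) = X\beta + Zu$$

where $\beta$ is a vector of unknown fixed effects with known model matrix $X$ of rank $p$, and $u$ is a vector of unknown random effects with known model matrix $Z$. Assume $E(u) = 0$ and $\operatorname{cov}(u) = D$, where $D$ is unknown. Also, $e$ is a vector of unobserved errors with

$$E(e \mid \mu) = 0 \quad \text{and} \quad \operatorname{cov}(e \mid \mu) = R_{\mu}^{1/2} R R_{\mu}^{1/2}$$

Here $R_{\mu}$ is a diagonal matrix containing evaluations at $\mu$ of a known variance function for the generalized linear model under consideration, and $R$ is unknown.

The next step is to build the PL and REPL methods for fitting the GLMM by using three approximations: two analytic and one probabilistic.

For the first analytic approximation, let $\hat\beta$ and $\hat u$ be known estimates of $\beta$ and $u$, and define

$$\hat\mu = g^{-1}(X\hat\beta + Z\hat u)$$

which is a vector consisting of evaluations of $g^{-1}$ at each component of $X\hat\beta + Z\hat u$. Now let

$$\tilde e = (y - \hat\mu) - (g^{-1})'(X\hat\beta + Z\hat u)\,(X\beta - X\hat\beta + Zu - Z\hat u) \tag{10}$$

where $(g^{-1})'(X\hat\beta + Z\hat u)$ is a diagonal matrix with elements consisting of evaluations of the first derivative of $g^{-1}$. Note that $\tilde e$ in equation 10 is a Taylor series approximation to $e = y - \mu$, expanding about $\hat\beta$ and $\hat u$.

Next, for the probabilistic approximation, the conditional distribution of $\tilde e$ given $\beta$ and $u$ is approximated by a Gaussian distribution having the same first two moments as $e \mid \beta, u$, which is assumed to correspond to $e \mid \mu$. In particular, it is assumed that $\tilde e \mid \beta, u$ is Gaussian with mean $0$ and variance $R_{\mu}^{1/2} R R_{\mu}^{1/2}$.

The second and final analytic approximation is substituting $\hat\mu$ for $\mu$ in the variance matrix. Then, since $(g^{-1})'(\hat\eta_i) = 1 / g'(\hat\mu_i)$ for each component $i$ of $\hat\eta = X\hat\beta + Z\hat u$,

$$g'(\hat\mu)(y - \hat\mu) \mid \beta, u \sim N\!\left(X\beta - X\hat\beta + Zu - Z\hat u,\; g'(\hat\mu)\, R_{\hat\mu}^{1/2} R R_{\hat\mu}^{1/2}\, g'(\hat\mu)\right)$$

where $g'(\hat\mu)$ is a diagonal matrix with elements constructed as above.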
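The linearisation step can be illustrated numerically. The sketch below assumes a logit link with Bernoulli variance function and takes $R = I$; the function name `linearize` and the example values are hypothetical. It computes the scaled residual $g'(\hat\mu)(y - \hat\mu)$ and the diagonal of its approximate variance.

```python
import numpy as np

def inv_logit(eta):
    return 1.0 / (1.0 + np.exp(-eta))

def linearize(y, eta_hat):
    """Linearise a logit-link model about eta_hat = X beta_hat + Z u_hat.

    Assumes the Bernoulli variance function V(mu) = mu (1 - mu) and R = I.
    """
    y = np.asarray(y, dtype=float)
    mu_hat = inv_logit(np.asarray(eta_hat, dtype=float))
    g_prime = 1.0 / (mu_hat * (1.0 - mu_hat))   # g'(mu_hat) for g = logit
    scaled_resid = g_prime * (y - mu_hat)       # g'(mu_hat) (y - mu_hat)
    var_fn = mu_hat * (1.0 - mu_hat)            # diagonal of R_mu_hat
    approx_var = g_prime * var_fn * g_prime     # diagonal of the approximate variance
    return mu_hat, g_prime, scaled_resid, approx_var

eta_hat = np.array([0.0, np.log(3.0)])          # gives mu_hat = 0.5 and 0.75
y = np.array([1.0, 0.0])
mu_hat, g_prime, scaled_resid, approx_var = linearize(y, eta_hat)

# First-order Taylor check: g^{-1}(eta_hat + d) is close to mu_hat + d / g'(mu_hat)
d = 1e-3
taylor_error = np.abs(inv_logit(eta_hat + d) - (mu_hat + d / g_prime))
```

The Taylor error shrinks quadratically in `d`, which is the sense in which the approximation is first order.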
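Defining $v = g(\hat\mu) + g'(\hat\mu)(y - \hat\mu)$, it can equivalently be specified that

$$v \mid \beta, u \sim N\!\left(X\beta + Zu,\; g'(\hat\mu)\, R_{\hat\mu}^{1/2} R R_{\hat\mu}^{1/2}\, g'(\hat\mu)\right)$$

For an ordinal multinomial response, the pseudo-response and the error terms are defined analogously, and the weight matrix is block diagonal.

The Gaussian log pseudo-likelihood (PL) and restricted log pseudo-likelihood (REPL), which are expressed as functions of the covariance parameters in $\theta$, corresponding to the linear mixed model for $v$ are the following:

$$l(\theta; v) = -\tfrac{1}{2} \log |V(\theta)| - \tfrac{1}{2}\, r(\theta)^{\top} V(\theta)^{-1} r(\theta) - \tfrac{N}{2} \log(2\pi)$$

$$l_R(\theta; v) = -\tfrac{1}{2} \log |V(\theta)| - \tfrac{1}{2}\, r(\theta)^{\top} V(\theta)^{-1} r(\theta) - \tfrac{1}{2} \log |X^{\top} V(\theta)^{-1} X| - \tfrac{N - p_x}{2} \log(2\pi)$$

where $V(\theta) = Z G Z^{\top} + g'(\hat\mu)\, R_{\hat\mu}^{1/2} R R_{\hat\mu}^{1/2}\, g'(\hat\mu)$ with $G = \operatorname{cov}(u)$, $r(\theta) = v - X (X^{\top} V(\theta)^{-1} X)^{-} X^{\top} V(\theta)^{-1} v$, $N$ denotes the effective sample size, and $p_x$ denotes the total number of non-redundant parameters for $\beta$.

The parameter $\theta$ can be estimated by a linear mixed model using the objective function $-2\,l(\theta; v)$ or $-2\,l_R(\theta; v)$. $\beta$ and $u$ are then obtained by best linear unbiased prediction (BLUP; Robinson 1991) and computed as

$$\hat\beta = (X^{\top} V(\hat\theta)^{-1} X)^{-} X^{\top} V(\hat\theta)^{-1} v, \qquad \hat u = \hat G Z^{\top} V(\hat\theta)^{-1} (v - X\hat\beta)$$

Iterative process

The estimation of $\theta$ uses the doubly iterative process according to Wolfinger and O'Connell and the SPSS algorithm. The steps are as follows:

1. Obtain an initial estimate of $\pi$, $\pi^{(0)}$. Set the outer iteration index $m = 0$ and let $M$ be the maximum number of outer iterations.

2. Based on $\pi^{(m)}$, compute the pseudo target $v$ and the weight matrix. Fit a weighted linear mixed model with pseudo target $v$, fixed effects design matrix $X$, random effects design matrix $Z$, and diagonal weight matrix. The fitting procedure, which is called the inner iteration, yields the estimate of $\theta$, denoted $\theta^{(m)}$. If $m = 0$, go to step 4; otherwise go to the next step.

3. Check whether the convergence criterion with tolerance level $\varepsilon$ is satisfied. If it is met or the maximum number of outer iterations is reached, stop. Otherwise, go to the next step.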

4. Compute $\hat\beta$ by setting $\hat\theta = \theta^{(m)}$, then set $\beta^{(m)} = \hat\beta$. Depending on the choice of random effect estimates, set $u^{(m)}$ accordingly.

5. Compute the new estimate of $\pi$ from $\beta^{(m)}$ and $u^{(m)}$, set $m = m + 1$ and go to step 2.

Wald confidence intervals for covariance parameter estimates

It is assumed that the estimated parameters of $G$ and $R$ are obtained through the above doubly iterative process. Their asymptotic covariance matrix can then be approximated by $2H^{-1}$, where $H$ is the Hessian matrix of the objective function $-2\,l(\theta; v)$ or $-2\,l_R(\theta; v)$, evaluated at $\hat\theta$. The standard error for the $i$th covariance parameter estimate in the vector, say $\hat\theta_i$, is the square root of the $i$th diagonal element of $2H^{-1}$.

Thus, a simple Wald-type confidence interval or test statistic for any covariance parameter can be obtained by using the asymptotic normality. However, these can be unreliable in small samples, especially for variance and correlation parameters, which have ranges of $[0, \infty)$ and $[-1, 1]$ respectively. Therefore, following the same method used in linear mixed models, these parameters are transformed to parameters that have range $(-\infty, \infty)$. By the delta method, the transformed estimates still have asymptotic normal distributions.

For variance-type parameters in $G$ and $R$, such as the variance in the autoregressive, autoregressive moving average, compound symmetry, diagonal, Toeplitz, and variance components structures, and the diagonal elements in the unstructured type, the $100(1 - \alpha)\%$ Wald confidence interval is given, assuming the variance parameter estimate is $\hat\theta_i$ and its standard error $\operatorname{se}(\hat\theta_i)$ is from the corresponding diagonal element of $2H^{-1}$, by

$$\left( \hat\theta_i \exp\!\left(-z_{1-\alpha/2}\, \frac{\operatorname{se}(\hat\theta_i)}{\hat\theta_i}\right),\; \hat\theta_i \exp\!\left(z_{1-\alpha/2}\, \frac{\operatorname{se}(\hat\theta_i)}{\hat\theta_i}\right) \right)$$
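A minimal sketch of a transformed Wald interval for a variance parameter follows. It assumes the logarithm is the transformation used to map $[0, \infty)$ to $(-\infty, \infty)$, so that the delta method gives a standard error of $\operatorname{se}(\hat\theta)/\hat\theta$ on the log scale; the function name and the numeric inputs are hypothetical.

```python
import math
from statistics import NormalDist

def wald_ci_variance(theta_hat, se, alpha=0.05):
    """Wald interval for a variance parameter, built on the log scale.

    Assumes a log transformation; se / theta_hat is the delta-method
    standard error of log(theta_hat).
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # standard normal quantile
    half_width = z * se / theta_hat
    lower = theta_hat * math.exp(-half_width)    # back-transform the lower limit
    upper = theta_hat * math.exp(half_width)     # back-transform the upper limit
    return lower, upper

# Hypothetical estimate and standard error
lower, upper = wald_ci_variance(theta_hat=2.0, se=0.5)
```

Unlike the untransformed interval $\hat\theta \pm z\,\operatorname{se}(\hat\theta)$, both limits stay positive, and the interval is symmetric around the estimate on the multiplicative scale.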

4.3 Methodology

4.3.1 Model Building

Model building in this study is based on the spatial concept that the closer the observations, the larger the correlation (Cressie 1993). Based on this concept, the idea was extended to nested locations or areas. Furthermore, as the observed data are not always continuous or normally distributed, the model should be in the general form, and for nested generalized linear models,