Nested GLM for Ordinal Response

where $E(Y) = b'(\theta)$, $\mathrm{Var}(Y) = a(\phi)\, b''(\theta)$, and $\phi$ is a dispersion parameter (Hardin and Hilbe 2003; Jiang et al. 2001). The GLM modeling process involves four steps: (1) specifying the model in two parts, namely an equation linking the response and explanatory variables and the probability distribution of the response variable; (2) estimating the parameters of the model; (3) checking how well the model fits the actual data; and (4) making inferences, for example calculating confidence intervals and testing hypotheses about the parameters (Dobson 2002). The linear model, in which $Y_i$ is a dependent random variable with mean $\mu_i$ and variance $\sigma^2$, is the basis for the analysis of continuous data. Extensions of the GLM that include random effects in addition to the explanatory variables are called generalized linear mixed models (GLMMs). These models are used to accommodate the overdispersion that often occurs in outcomes with a binomial (Williams 1982) or Poisson (Breslow 1984) distribution.
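As a quick numerical check of the identities $E(Y) = b'(\theta)$ and $\mathrm{Var}(Y) = a(\phi)\, b''(\theta)$, the sketch below uses the Bernoulli distribution in canonical form, chosen here only as an illustrative assumption: $\theta = \log\{p/(1-p)\}$, $b(\theta) = \log(1 + e^{\theta})$ and $a(\phi) = 1$. Finite differences of $b$ should reproduce the familiar mean $p$ and variance $p(1-p)$.

```python
import math


def b(theta: float) -> float:
    """Cumulant function of the Bernoulli family in canonical form (illustrative example)."""
    return math.log1p(math.exp(theta))


def first_derivative(f, x: float, h: float = 1e-5) -> float:
    # Central-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)


def second_derivative(f, x: float, h: float = 1e-4) -> float:
    # Central-difference approximation of f''(x)
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)


def bernoulli_moments(p: float) -> tuple[float, float]:
    """Return (E(Y), Var(Y)) computed as b'(theta) and a(phi) * b''(theta)."""
    theta = math.log(p / (1 - p))  # canonical parameter: logit of p
    a_phi = 1.0                    # dispersion term is fixed at 1 for the Bernoulli family
    mean = first_derivative(b, theta)
    var = a_phi * second_derivative(b, theta)
    return mean, var


if __name__ == "__main__":
    mean, var = bernoulli_moments(0.3)
    print(f"E(Y) ~ {mean:.4f}, Var(Y) ~ {var:.4f}")  # prints E(Y) ~ 0.3000, Var(Y) ~ 0.2100
```

The same check works for any family whose $b(\theta)$ is known, for example $b(\theta) = e^{\theta}$ for the Poisson, where mean and variance coincide.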

4.2.1 Nested GLM for Ordinal Response

This part describes the construction of nested generalized linear models for an ordinal response. Before building the model, two concepts, nested spatial structure and threshold models, are explained below.

Nested Spatial

In certain multilevel spatial surveys, the regions (e.g., districts or 'kabupaten') within one area (e.g., a province) are assumed to be similar to one another but not identical to the regions of another area. Such an arrangement is called nested, with the levels of district nested under the levels of province. For example, suppose the government aims to reduce poverty and wishes to determine the poverty score of each district through spatial modeling, with several districts available from each province. This situation is depicted in Figure 2 (page 4): a district from a particular province has a different nature from the districts of another province, because every province has its own characteristics and policies, especially a province with special status such as a 'Daerah Istimewa'. This structure affects the correlation matrix and the parameter estimates. The nested nature of the spatial component should therefore be taken into account, especially when a spatial model is used to analyze the effects of districts and provinces.

Threshold Model

The thresholds, defined on the scale of a latent variable, are what distinguish linear models with an ordinal response from linear models with a non-ordinal response. In logistic and probit regression models, an unobserved latent variable $Y$ is assumed to be linked to the actual response through the concept of a threshold (Hedeker 1994). A dichotomous model assumes a single threshold, while an ordinal (polytomous) model with $K$ categories assumes $K-1$ thresholds, namely $\gamma_1 < \gamma_2 < \cdots < \gamma_{K-1}$, with $\gamma_0 = -\infty$ and $\gamma_K = +\infty$. The response occurs in category $k$ ($Z = k$) if the latent response $Y$ is greater than $\gamma_{k-1}$ and smaller than $\gamma_k$, that is, $\gamma_{k-1} < Y < \gamma_k$.
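The threshold rule can be sketched as a small function that maps a realized latent value to its ordinal category, given finite cut-points $\gamma_1 < \cdots < \gamma_{K-1}$ (with $\gamma_0 = -\infty$ and $\gamma_K = +\infty$ left implicit). The cut-point values in the usage line are hypothetical. A latent value exactly equal to a cut-point has probability zero for a continuous $Y$; the sketch assigns it to the lower category, which is an assumed convention.

```python
import bisect


def category(y: float, thresholds: list[float]) -> int:
    """Return k such that gamma_{k-1} < y <= gamma_k, with categories numbered 1..K.

    `thresholds` holds the K-1 finite cut-points gamma_1 < ... < gamma_{K-1};
    gamma_0 = -inf and gamma_K = +inf are implicit.
    """
    if any(lo >= hi for lo, hi in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # bisect_left counts the cut-points strictly below y, so a value equal
    # to gamma_k falls in category k (upper bound inclusive).
    return bisect.bisect_left(thresholds, y) + 1


if __name__ == "__main__":
    cuts = [-1.0, 0.5, 2.0]  # hypothetical thresholds for K = 4 categories
    print([category(y, cuts) for y in (-3.0, 0.0, 1.0, 3.0)])  # prints [1, 2, 3, 4]
```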
Assume that $Y_j$ is unobserved and that the $j$-th observation falls in a category, say category $Z_j$, $j = 1, \ldots, N$. The relationship between $Y_j$ and $Z_j$ is taken to be

$Z_j = k \iff \gamma_{k-1} < Y_j \le \gamma_k,$

where $k \in \{1, \ldots, K\}$, $\gamma_0 = -\infty$, $\gamma_K = +\infty$, and $\gamma_1 < \cdots < \gamma_{K-1}$ are unknown boundary points that partition the real line into $K$ intervals. Thus, when the realized value of $Y_j$ belongs to the $k$-th interval, we observe $z_j = k$. Under these assumptions, the probability mass function of $Z_1, \ldots, Z_N$ is

$\Pr(Z_1 = k_1, \ldots, Z_N = k_N) = \int_{\gamma_{k_1-1}}^{\gamma_{k_1}} \cdots \int_{\gamma_{k_N-1}}^{\gamma_{k_N}} f(y_1, \ldots, y_N)\, dy_N \cdots dy_1,$

where $f$ is the joint density of $Y_1, \ldots, Y_N$. This model is called the threshold model (Harville 1984).

Cumulative Link Models

All of the models considered in this research arise from focusing on the cumulative distribution of the response (Rodriguez 2007). Let $\pi_{jk} = \Pr\{Z_j = k\}$ denote the probability that the response of an individual with characteristics $\mathbf{x}_j$ falls in the $k$-th category, and let $p_{jk}$ denote the corresponding cumulative probability that the response falls in the $k$-th category or below, so

$p_{jk} = \Pr\{Z_j \le k\} = \sum_{l=1}^{k} \pi_{jl}. \quad (3)$

Let $g(\cdot)$ denote a link function mapping probabilities to the real line. The class of models considered here assumes that the transformed cumulative probabilities are a linear function of the predictors, of the form

$g(p_{jk}) = \gamma_k + \mathbf{x}_j^{\top}\boldsymbol{\beta}, \quad k = 1, \ldots, K-1. \quad (4)$

In this formulation, $\gamma_k$ is a constant representing the baseline value of the transformed cumulative probability for category $k$, and $\boldsymbol{\beta}$ represents the effect of the covariates on the transformed cumulative probabilities. Since the constant is written explicitly, we assume that the predictors do not include a column of ones. Note that there is just one equation: if $x_{jq}$ increases by one, then all transformed cumulative probabilities increase by $\beta_q$. Thus, by focusing on the cumulative probabilities, a single effect can be postulated for all categories. These models can also be interpreted in terms of a latent variable.
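To make Equations (3) and (4) concrete, the sketch below assumes a logit link, $g(p) = \log\{p/(1-p)\}$, together with hypothetical cut-points and coefficients. It computes the cumulative probabilities $p_{jk}$ from the linear predictor and then recovers the category probabilities $\pi_{jk}$ by differencing, which is Equation (3) read backwards.

```python
import math


def expit(eta: float) -> float:
    """Inverse of the logit link g(p) = log(p / (1 - p))."""
    return 1.0 / (1.0 + math.exp(-eta))


def category_probs(x: list[float], cutpoints: list[float], beta: list[float]) -> list[float]:
    """Category probabilities pi_j1..pi_jK under g(p_jk) = gamma_k + x_j' beta.

    `cutpoints` are gamma_1 < ... < gamma_{K-1}; x has no column of ones
    because the constants are carried by the cut-points.
    """
    lin = sum(xq * bq for xq, bq in zip(x, beta))
    # Cumulative probabilities p_j1..p_j,K-1 from Equation (4), plus p_jK = 1
    cum = [expit(g + lin) for g in cutpoints] + [1.0]
    # Equation (3) inverted: pi_jk = p_jk - p_j,k-1, with p_j0 = 0
    return [c - prev for prev, c in zip([0.0] + cum[:-1], cum)]


if __name__ == "__main__":
    cuts = [-1.0, 0.5, 2.0]  # hypothetical gamma_1, gamma_2, gamma_3 (K = 4)
    for x1 in (0.0, 1.0, 2.0):
        print(x1, [round(p, 3) for p in category_probs([x1], cuts, [0.8])])
```

For example, `category_probs([1.0], [-1.0, 0.5, 2.0], [0.8])` returns four probabilities that sum to one. Because of the sign convention in Equation (4), a positive $\beta_q$ raises every cumulative probability as $x_q$ increases, which moves probability toward the lower categories.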
Specifically, suppose that the manifest response $Z_j$ results from grouping an underlying continuous variable $Y_j$ using cut-points $\gamma_1 < \gamma_2 < \cdots < \gamma_{K-1}$, so that $Z_j$ takes the value 1 if $Y_j$ is below $\gamma_1$, the value 2 if $Y_j$ is between $\gamma_1$ and $\gamma_2$, and so on, taking the value $K$ if $Y_j$ is above $\gamma_{K-1}$. Figure 19 illustrates this idea for the case of five response categories.

Figure 19 An ordered response and its latent variable

Suppose further that the underlying continuous variable follows a linear model of the form

$Y_j = \mathbf{x}_j^{\top}\boldsymbol{\beta}^{*} + \varepsilon_j,$

where the error term $\varepsilon_j$ has c.d.f. $F(\varepsilon_j)$. Then the probability that the response of the $j$-th individual falls in the $k$-th category or below, given $\mathbf{x}_j$, satisfies

$p_{jk} = \Pr\{Y_j \le \gamma_k\} = \Pr\{\varepsilon_j \le \gamma_k - \mathbf{x}_j^{\top}\boldsymbol{\beta}^{*}\} = F(\gamma_k - \mathbf{x}_j^{\top}\boldsymbol{\beta}^{*}),$

and therefore follows the general form in Equation (4) with link given by the inverse of the c.d.f. of the error term, $g = F^{-1}$, and coefficients $\boldsymbol{\beta} = -\boldsymbol{\beta}^{*}$ that differ only in sign. The predictors $\mathbf{x}_j$ are assumed not to include a column of ones, because the constant is absorbed into the cut-points (Rodriguez 2007).

As an illustration, let the threshold model have four categories of ordinal response: {bad, ok, good, great}. Figure 20 shows how the probability of each category changes for $x = 0$, 1, and 2, and Figure 21 shows the pdf of $Y$ for several values of $x$. The sloped line is the predicted value of the right-hand side, $\mathbf{x}_j^{\top}\boldsymbol{\beta}^{*}$. The bell-shaped curves correspond to the distribution of the unobserved variable $Y_j = \mathbf{x}_j^{\top}\boldsymbol{\beta}^{*} + \varepsilon_j$, where $\varepsilon_j$ is a random variable (Simpson 2008).

Figure 20 Changes in the value of x that cause changes in the category probabilities; a1, a2, a3 are the thresholds.

Figure 21 The effect of a covariate on the transformed cumulative probabilities: pdf of Y for several values of x
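The latent-variable derivation can be checked by simulation. The sketch below assumes standard logistic errors (so $F$ is the logistic c.d.f. and the implied link is the logit), a single covariate, and hypothetical values for the cut-points and $\beta^{*}$; only the four category labels come from the illustration above. For each $x$, the empirical proportion of simulated $Y$ at or below each $\gamma_k$ should be close to $F(\gamma_k - x\beta^{*})$.

```python
import math
import random

LABELS = ["bad", "ok", "good", "great"]  # four ordered categories (K = 4)
CUTS = [-1.0, 0.5, 2.0]                  # hypothetical cut-points gamma_1 < gamma_2 < gamma_3
BETA_STAR = 1.0                          # hypothetical latent-scale slope beta*


def logistic_cdf(e: float) -> float:
    """c.d.f. F of the standard logistic error term."""
    return 1.0 / (1.0 + math.exp(-e))


def draw_logistic(rng: random.Random) -> float:
    """Draw one standard logistic error by inverse-c.d.f. sampling."""
    u = rng.random()
    while u == 0.0:  # random() lies in [0, 1); skip 0 to avoid log(0)
        u = rng.random()
    return math.log(u / (1.0 - u))


def simulated_cumulative(x: float, n: int = 100_000, seed: int = 1) -> list[float]:
    """Empirical Pr{Y <= gamma_k}, k = 1..K-1, for Y = x * beta* + eps."""
    rng = random.Random(seed)
    ys = [x * BETA_STAR + draw_logistic(rng) for _ in range(n)]
    return [sum(y <= g for y in ys) / n for g in CUTS]


def theoretical_cumulative(x: float) -> list[float]:
    """p_k = F(gamma_k - x * beta*), i.e. Equation (4) with beta = -beta*."""
    return [logistic_cdf(g - x * BETA_STAR) for g in CUTS]


if __name__ == "__main__":
    for x in (0, 1, 2):
        theo = theoretical_cumulative(x)
        sim = simulated_cumulative(x)
        # Category probabilities by differencing cumulative ones (p_0 = 0, p_K = 1)
        probs = [hi - lo for lo, hi in zip([0.0] + theo, theo + [1.0])]
        print(f"x={x}", [round(s, 3) for s in sim], [round(t, 3) for t in theo])
        print("   ", dict(zip(LABELS, (round(p, 3) for p in probs))))
```

As $x$ grows, the latent distribution shifts right past the fixed cut-points, so probability moves from "bad" toward "great", which is the pattern the figures describe.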

4.2.1.1 Data Layout

Two types of data structure in GEE modeling are shown in Examples 1 and 2: an equal and a different number of observations per subject, respectively. The data in this research correspond to Example 1.

Example 1 Equal number of observations for all three subjects:

Subject   Sub-subject observations
1         y11  y12  y13
2         y21  y22  y23
3         y31  y32  y33

Example 2 Different number of observations for all three subjects:

Subject   Sub-subject observations
1         y11  y12  y13  y14  y15  y16  y17
2         y21  y22  y23  y24  y25
3         y31  y32  y33  y34  y35  y36

The response variable can be continuous, discrete, or ordered categorical. For example, the response could be calcium content (continuous), presence or absence of a disease (1 or 0), the number of asthma attacks over the last time period (counts) (Valois 1997), or the poverty level of a sub-district (ordinal). The responses for subject i, Y