
SELECTING A LINEAR MIXED MODEL 1

Running head: SELECTING A LINEAR MIXED MODEL

Selecting a linear mixed model for longitudinal data: Repeated measures
ANOVA, covariance pattern model, and growth curve approaches
Siwei Liu
Michael J. Rovine
Peter C. M. Molenaar
The Pennsylvania State University

Author Note
Siwei Liu, Human Development and Family Studies, The Pennsylvania State
University; Michael J. Rovine, Human Development and Family Studies, The Pennsylvania
State University; Peter C. M. Molenaar, Human Development and Family Studies, The
Pennsylvania State University.
This study was supported by a grant from the National Science Foundation (NSF
0527449) to the second author. These data have not been published anywhere and have not
been submitted for publication anywhere else.
Correspondence should be addressed to Siwei Liu, Human Development and Family
Studies, 110 S-Henderson, The Pennsylvania State University, University Park, PA 16802.

Email: szl143@psu.edu.


Abstract
With increasing popularity, growth curve modeling is more and more often considered the
first choice for analyzing longitudinal data. While the growth curve approach is often a
good choice, other modeling strategies may more directly answer questions of interest. It is
common to see researchers fit growth curve models without considering alternative modeling
strategies. In this paper we compare three approaches for analyzing longitudinal data:
repeated measures ANOVA, covariance pattern models, and growth curve models. As all
are members of the general linear mixed model family, they represent somewhat different
assumptions about the way individuals change. These assumptions result in different
patterns of covariation among the residuals around the fixed effects. We first indicate the
kinds of data that are appropriately modeled by each approach, and use real data examples
to demonstrate possible problems associated with the blanket selection of the growth curve
model. We then present a simulation that indicates the utility of AIC and BIC in the
selection of a proper residual covariance structure. The results cast doubt on the popular
practice of automatically using growth curve modeling for longitudinal data without
comparing the fit of different models. Finally, we provide some practical advice for
assessing mean changes in the presence of correlated data.
Keywords: repeated measures ANOVA, covariance pattern model, growth curve
model, longitudinal data, AIC, BIC


Selecting a linear mixed model for longitudinal data: Repeated measures ANOVA,
covariance pattern model, and growth curve approaches

Introduction
Developmental researchers often conduct longitudinal studies to examine stability and
change, in which individuals are measured on multiple occasions. Repeated measures of
individuals create challenges for data analysis because the observations are not independent.
A variety of models have been developed for analyzing longitudinal data (Hedeker &
Gibbons, 2006; McArdle, 2009; Singer & Willett, 2003). However, it is often not clear to
substantive researchers which model to use or how to choose among different models. This
confusion sometimes leads to a naïve approach of fitting one particular model without
consideration of other alternatives. In particular, when the research question involves the
assessment of change over time, fitting a growth curve model seems to have become the
standard in developmental research. While the growth curve approach is often a good
choice, other modeling strategies may more directly answer questions of interest. In this
paper, we argue for a model-based selection procedure to determine the proper model for
assessing mean changes in the presence of correlated data. We use the linear mixed model
(Laird & Ware, 1982) as a general framework and concentrate on three models that are
subsumed under the mixed model family: repeated measures analysis of variance (ANOVA;
G. E. P. Box, 1954; Myers, 1979; Scheffe, 1959), the covariance pattern model (Hedeker &
Gibbons, 2006), and the multilevel growth curve model (Bryk & Raudenbush, 1992;
Goldstein, 1995).
Before going into the details of the three models, it is necessary to clarify the scope of
this paper. Longitudinal data analysis can be used to answer a variety of research questions.


Here, we focus on the most basic question in developmental research: how to model change
over time. When the data are assumed to come from a multivariate normal distribution, the
consideration of a proper analytic strategy to answer this question consists of two parts: the
modeling of the means and the modeling of the residuals around the means. In the linear
mixed model framework, the means are modeled by fixed effects, which are identical for all
individuals, whereas the residuals are modeled by random effects, which vary by individual.
Typically, we are primarily interested in estimating and testing hypotheses about the fixed
effects, and the covariance matrix of the random effects is of secondary interest. Repeated
measures ANOVA, the covariance pattern model, and the multilevel growth curve model
represent different ways to model the fixed and random effects. Both repeated measures
ANOVA and the covariance pattern model treat time as a categorical variable and have a
saturated means model; thus, the means are modeled perfectly. They account for the
correlation of the residuals around the fixed effects model by allowing the covariance matrix
of residuals to show a particular pattern: compound symmetry (or the less restrictive
sphericity) for repeated measures ANOVA; one of a number of alternative patterns (e.g.,
autoregressive) for the covariance pattern model. The multilevel growth curve model treats
time as a continuous predictor and assumes that the means across time follow a particular
shape. Individuals are assumed to follow the same curve shape but are allowed to vary in
the parameters that describe this curve (random effects). Variability in these parameters
and the individual deviations around this curve result in a residual covariance pattern
different from the ANOVA or covariance pattern models.
Although the fixed effects model is usually the central concern of substantive
researchers, in this paper we focus on the selection of an appropriate model to account for the
correlations of the residuals around the fixed effects model. We use the term error structure
to refer to the covariance pattern of these residuals. For repeated measures ANOVA and the
covariance pattern model, error structure simply refers to the covariance pattern of the errors.
For the growth curve model, it refers to the covariance pattern that results from combining
the random effects and the errors around individual curves. The error structure is important
because it is included as a probability model in the maximum (or restricted maximum)
likelihood estimation of parameters. Identifying the best fitting error structure is often
recommended to obtain a proper inference related to tests of the fixed effects (Jennrich &
Schluchter, 1986; Milliken & Johnson, 2009). Since the "true" error structure is usually
unknown, some goodness-of-fit criterion is necessary to select the best error structure
(Jennrich & Schluchter, 1986). In the mixed model approach, we can use AIC (Akaike
information criterion; Akaike, 1974) and BIC (Bayesian information criterion; McQuarrie &
Tsai, 1998; Schwarz, 1978) to assess goodness of fit.
With the increasing popularity of growth curve modeling, it is common to assume a
simple shape (e.g., linear) for the means. This assumption also implies a specific error
structure. Even if the means fall on a straight line, however, some other error structure may
represent a better fit to the data. This is often not tested.
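Both criteria trade model fit against complexity, but they penalize extra covariance parameters differently. As a minimal sketch (the log-likelihoods, parameter counts, and sample size below are invented for illustration, not values from any model in this paper), the criteria can be computed directly from a model's maximized log-likelihood:

```python
import numpy as np

def aic_bic(log_lik, n_params, n_obs):
    """AIC = 2k - 2 lnL; BIC = k ln(n) - 2 lnL. Smaller values indicate
    better fit. (Software packages differ in what they count as n and k
    for mixed models, so compare criteria only within one program.)"""
    aic = 2 * n_params - 2 * log_lik
    bic = n_params * np.log(n_obs) - 2 * log_lik
    return aic, bic

# Hypothetical fits of two error structures to the same data: compound
# symmetry (2 covariance parameters) vs. unstructured (15 parameters
# for 5 occasions).
aic_cs, bic_cs = aic_bic(log_lik=-512.3, n_params=2, n_obs=100)
aic_un, bic_un = aic_bic(log_lik=-498.1, n_params=15, n_obs=100)
# Here AIC favors the unstructured model, while BIC's heavier penalty
# on the 13 extra parameters favors compound symmetry.
```

This kind of disagreement between the criteria is possible because BIC's penalty grows with the sample size while AIC's does not.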
In the following, we will first describe the presumed error structures of the repeated
measures ANOVA model, the covariance pattern model, and the growth curve model. Next,
we will apply these models to both real and simulated data to highlight the problems for
inference that arise from fitting an inappropriate error structure, and demonstrate the utility
of using AIC and BIC to choose among different error structures when working with real data.
In this paper we will concentrate on complete data examples. Linear mixed model
programs such as SAS PROC MIXED handle missing data in the dependent variables
through the use of full-information maximum likelihood (FIML) estimation based on the raw
data likelihood (Littell, Milliken, Stroup, Wolfinger, & Schabenberger, 2006). More general
approaches such as multiple imputation (Schafer, 1997) can also be implemented. Linear
mixed models are well suited to unbalanced designs (i.e., designs in which each individual is
observed at potentially different time points). For the models discussed here, solutions for a
number of unbalanced designs are described in Milliken and Johnson (2009).
The General Linear Model (GLM) and the Linear Mixed Model (LMM)
Most researchers are familiar with the general linear model (GLM), which is usually
used to represent regression models. If y is an n×1 vector of scores on the dependent
variable, and X is an n×k design matrix with one column representing a constant and k−1
columns representing the k−1 independent variables, then the general linear model of
regression is:

y = Xβ + ε    (1)

where β is a k×1 vector of regression coefficients, and ε is an n×1 vector of errors with a
distribution of N(0, σε²I). The design matrix can include dummy-coded variables to
represent groups, which then turn the model into an ANOVA model. The regression
estimator is:

β̂ = (XᵀX)⁻¹(XᵀY)    (2)

and the regression coefficients can be directly estimated given the raw data.

For the simple regression equation,

yi = β0 + β1xi + εi = (y | xi) + εi,    (3)

the observed value of the dependent variable, yi, is equal to the conditional mean of y given
the value of xi plus the individual's residual. Under the independence assumptions we
estimate the values of the regression weights and the variance of the errors, σε². Under
normal theory these estimates can be used to provide a proper inference. When the
independence assumptions are violated, the inference becomes biased. The linear mixed
model produces a proper inference by allowing residuals to be correlated.
The linear mixed model (LMM) proposed by Laird and Ware (1982), based on the
work of Harville (1977), is expressed as:

yi = Xiβ + Ziγi + εi    (4)

where the subscript i represents an individual or other unit of analysis (e.g., family) on which
observations are repeated. yi is an ni×1 vector of response values for the ith individual, Xi is
an ni×b design matrix of independent variable values, β is the corresponding b×1 vector of
fixed effect parameters, Zi is an ni×g design matrix for the random effects, and γi is a g×1
vector of random effect scores. The fixed effects coefficients β = (β1, ..., βb)ᵀ are common
among all individuals. The random effects γi = (γi1, ..., γig)ᵀ can vary by individual. They
are assumed to be normally distributed with means zero and a covariance matrix G. The εi
are within-subject errors that are assumed to be normally distributed with means zero and a
covariance matrix σε²Wi. While the number of observations and design matrix values can
differ by unit of analysis, we will consider here the situation where each unit of analysis
is measured on the same occasions (Xi = X; Zi = Z; Wi = W).
We can break down Equation 4 into a conditional means model and a covariance
structure for the residuals, where

ŷi = (yi | Xi) = Xiβ

is the fixed effects means model and

Ziγi + εi

is the random effects model. The covariance matrix of the random components Ziγi + εi (i.e.,
the error structure) is given by:

V = ZGZᵀ + σε²W.    (5)

This form is very general. We can model it using only the errors, εi. In this case, Z and G
would both be zero and we specify a pattern for σε²W in terms of a set of variance and/or
covariance parameters; this is done in repeated measures ANOVA and the covariance pattern
model. As part of the estimation process we estimate the values of those parameters.
Alternatively, it may be more convenient to allow random regression weights for modeling
the residuals. This especially holds true in the case of growth curve modeling, where the
residuals can be modeled by an individual curve that deviates from the group curve and a set
of individual errors that deviate from the individual curve.
Once V is specified, the Henderson (1990) mixed model estimator for the regression
weights is:

β̂ = (XᵀV⁻¹X)⁻¹(XᵀV⁻¹Y).    (6)

In contrast to the GLM, the regression coefficients in the LMM are dependent on the
covariance matrix of the residuals. Different residual structures can, thus, result in different
regression estimates (along with different standard errors). For that reason, it is especially
important to determine the best error structure when modeling any data set.
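To see concretely how Equation 6 differs from the ordinary least squares estimator of Equation 2, the following sketch (in Python with NumPy; the response values and the AR(1) parameters are invented for illustration) computes both estimators for a small design with an intercept and a linear time trend:

```python
import numpy as np

n = 5
X = np.column_stack([np.ones(n), np.arange(n)])   # intercept + time
y = np.array([10.0, 11.5, 12.0, 14.5, 15.0])      # illustrative scores

# GLM estimator (Equation 2): implicitly assumes V = I
b_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Mixed model estimator (Equation 6) under an AR(1) error structure
rho, sigma_e2 = 0.6, 1.0
lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
V = sigma_e2 * rho ** lags
Vinv = np.linalg.inv(V)
b_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)

# Because V enters the estimator, b_gls differs from b_ols whenever
# the presumed error structure is not proportional to the identity.
```

The same raw data thus yield different fixed effects estimates (and standard errors) under different presumed error structures, which is the point made in the text.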
The Repeated Measures ANOVA Model
The repeated measures ANOVA model was first developed by Fisher (Scheffe, 1959)
to model mean differences based on an experimental design. In the original formulation, the
repeated measures factor represented a randomized ordering of a repeatedly administered
treatment factor.
A one-way repeated measures ANOVA can be presented as a LMM where Ziγi is zero:

yi = Xβ + εi    (7)

(Rovine & Molenaar, 2000). For example, for a repeated measures design with 5 occasions,
the model can be written as:

$$
\begin{bmatrix} y_{i1}\\ y_{i2}\\ y_{i3}\\ y_{i4}\\ y_{i5} \end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 0 & 0 & 0\\
1 & 0 & 1 & 0 & 0\\
1 & 0 & 0 & 1 & 0\\
1 & 0 & 0 & 0 & 1\\
1 & 0 & 0 & 0 & 0
\end{bmatrix}
\begin{bmatrix} \beta_1\\ \beta_2\\ \beta_3\\ \beta_4\\ \beta_5 \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{i1}\\ \varepsilon_{i2}\\ \varepsilon_{i3}\\ \varepsilon_{i4}\\ \varepsilon_{i5} \end{bmatrix}
\qquad (8)
$$

where yi1 to yi5 are the individual’s scores at the 5 occasions. The fixed-effects coefficients
ß1, ß2, ß3, ß4, and ß5 yield the expected values for yi5, yi1 - yi5, yi2 - yi5, yi3 - yi5, yi4 - yi5,
respectively. The within-person residuals εi form a 5 × 5 covariance matrix, σε2Wi, which is
assumed to be identical across subjects. The error structure is described by compound
symmetry:

$$
\begin{bmatrix}
\sigma_\varepsilon^2 + \sigma & \sigma & \sigma & \sigma & \sigma\\
\sigma & \sigma_\varepsilon^2 + \sigma & \sigma & \sigma & \sigma\\
\sigma & \sigma & \sigma_\varepsilon^2 + \sigma & \sigma & \sigma\\
\sigma & \sigma & \sigma & \sigma_\varepsilon^2 + \sigma & \sigma\\
\sigma & \sigma & \sigma & \sigma & \sigma_\varepsilon^2 + \sigma
\end{bmatrix}.
$$

The covariance between occasions is assumed to be σ regardless of the distance between
occasions, and the variance at each occasion is assumed to be σε² + σ. We estimate σ and σε²
along with the fixed effects parameters.
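The compound symmetry pattern is simple to construct directly. The following sketch (in Python with NumPy; the parameter values are arbitrary, chosen only for illustration) builds the 5×5 matrix shown above:

```python
import numpy as np

def compound_symmetry(n_occ, sigma_e2, sigma):
    """Compound symmetry: sigma_e2 + sigma on the diagonal and a
    constant covariance sigma everywhere off the diagonal."""
    return sigma * np.ones((n_occ, n_occ)) + sigma_e2 * np.eye(n_occ)

W = compound_symmetry(5, sigma_e2=2.0, sigma=1.0)
# Every variance is 3.0 and every covariance is 1.0, regardless of
# how far apart the two occasions are in time.
```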
The Covariance Pattern Model
The covariance pattern model (Hedeker & Gibbons, 2006; Jennrich & Schluchter, 1986;
Laird & Ware, 1982) can be thought of as an extension of the repeated measures ANOVA.
Its primary purpose is identical, namely, to model mean differences as expressed in the
conditional means model; but this model allows structures other than compound symmetry to
describe the error structure.
Jennrich and Schluchter (1986) introduced methods for modeling alternative error
structures related to their work on the development of a general procedure for implementing
the Laird-Ware mixed model (SAS PROC MIXED). Currently, SAS PROC MIXED
includes roughly 36 alternative structures. This greatly enhances the ability to model mean
differences, especially in the case of longitudinal data, where time, not treatment, is the
repeated measures factor. To describe some of the options of the covariance pattern model,
we will concentrate on two patterns that represent qualitatively different structures than the
compound symmetry pattern described above: the first-order autoregressive (AR(1)) pattern,
and the first-order moving average (MA(1)) pattern. Rovine and Molenaar (2005) have
shown that based on the addition rules of Granger and Morris (1976), any other covariance
pattern can be constructed as a sum of autoregressive and moving average components.
The AR(1) pattern. When the levels of the repeated measures factor are ordered
according to time, it is reasonable that the time lag between occasions would affect the size of
the residual correlations. One pattern that allows this structure is the first-order autoregressive
or AR(1) structure. For 5 occasions of measurement this structure is:

$$
\sigma_\varepsilon^2
\begin{bmatrix}
1 & \rho & \rho^2 & \rho^3 & \rho^4\\
\rho & 1 & \rho & \rho^2 & \rho^3\\
\rho^2 & \rho & 1 & \rho & \rho^2\\
\rho^3 & \rho^2 & \rho & 1 & \rho\\
\rho^4 & \rho^3 & \rho^2 & \rho & 1
\end{bmatrix}
$$

where σε² is the variance of errors and ρ is the autocorrelation. Conceptually, this pattern
results from the first-order autoregressive process:

εit = ρεi(t−1) + υit    (9)

where the innovation υit is normally distributed with mean zero. It assumes that the
correlation of errors between any two consecutive occasions is identical; thus, the correlation
decreases at a constant rate as two measurements get farther away in time. This structure
differs from compound symmetry, which assumes that the covariance of residuals between
any two occasions is identical. For longitudinal data it may be unreasonable to assume that
the residual covariance between distant occasions is the same as the covariance between
adjacent occasions.
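In code, the entry for occasions j and k is the error variance times ρ raised to the lag |j − k|, which the following sketch exploits (Python with NumPy; parameter values arbitrary):

```python
import numpy as np

def ar1_cov(n_occ, sigma_e2, rho):
    """AR(1) structure: covariance sigma_e2 * rho**|j - k| between
    occasions j and k, so correlation decays geometrically with lag."""
    lags = np.abs(np.subtract.outer(np.arange(n_occ), np.arange(n_occ)))
    return sigma_e2 * rho ** lags

V = ar1_cov(5, sigma_e2=1.0, rho=0.5)
# First row of correlations: 1, 0.5, 0.25, 0.125, 0.0625 -- each extra
# lag multiplies the correlation by rho once more.
```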
The MA(1) pattern. When a process cuts off after a certain number of occasions,
a good model for that structure is the moving average pattern. The first-order moving
average (MA(1)) pattern is:

$$
\sigma_\varepsilon^2
\begin{bmatrix}
1 & \gamma & 0 & 0 & 0\\
\gamma & 1 & \gamma & 0 & 0\\
0 & \gamma & 1 & \gamma & 0\\
0 & 0 & \gamma & 1 & \gamma\\
0 & 0 & 0 & \gamma & 1
\end{bmatrix}
$$

where the correlation between any two adjacent occasions is identical (γ), but the correlation
between more distant occasions is zero. As we can see, the larger the value of γ, the more
this pattern diverges from compound symmetry.
Unlike ordinary regression, the covariance of the residuals is part of the linear mixed
model estimator. As a result, the fixed effects estimates, β̂, are affected by the presumed
error structure. The probability of jointly observing the data depends on properly modeling
the fixed effects and properly describing the distribution of the errors.
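The banded MA(1) matrix can be sketched in the same way as the patterns above (Python with NumPy; parameter values arbitrary):

```python
import numpy as np

def ma1_cov(n_occ, sigma_e2, gamma):
    """MA(1) structure: correlation gamma at lag 1, zero at larger lags."""
    V = np.eye(n_occ)
    idx = np.arange(n_occ - 1)
    V[idx, idx + 1] = gamma   # superdiagonal
    V[idx + 1, idx] = gamma   # subdiagonal
    return sigma_e2 * V

V = ma1_cov(5, sigma_e2=1.0, gamma=0.4)
# Banded: adjacent occasions correlate at 0.4, while lag-2 and more
# distant occasions are uncorrelated.
```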
The Multilevel Growth Curve Model
The multilevel growth curve model combines two complementary traditions: growth
models first described independently by Tucker (1958) and Rao (1958), and linear mixed
models described by Henderson (1953) and Hartley and Rao (1967). Equivalent statistical
methods for analyzing multilevel data have been developed and described. Here, we
concentrate on two methods: the linear mixed model approach (Laird & Ware, 1982; Littell et
al., 2006; Rosenberg, 1973), and the multilevel approach (Bryk & Raudenbush, 1992;
Goldstein, 1995). Growth curve models can also be implemented as structural equation
models (SEM; Bauer, 2003; Curran & Bauer, 2007; Rovine & Molenaar, 2000). Alternative
estimation approaches for these methods include empirical Bayes estimation (Bryk &
Raudenbush, 1992) and the estimator resulting from the Henderson mixed model equations
(Henderson, 1990). Estimators from these two general methods have been shown to be
equivalent (Littell et al., 2006; Robinson, 1991).
In the form of a multilevel model, a simple linear growth curve model is:

Level 1: yit = π0i + π1i·time + εit    (10)
Level 2: π0i = β00 + υ0i
         π1i = β10 + υ1i

where π0i and π1i are individual intercepts and slopes; εit are errors around individual lines that
are typically assumed to be independent and normally distributed with constant variances
over time (although this assumption can be relaxed in SEM); β00 and β10 are the average
intercept and slope; υ0i are differences between the individual and average intercepts; and υ1i
are differences between the individual and average slopes. We can transform this into the
linear mixed model form by combining the level 1 and level 2 equations. We then get:

yit = [β00 + β10·time] + [υ0i + υ1i·time] + εit.    (11)

The error structure for this model is based on having a set of random effects that reflect the
fixed effects model. More generally, methods for properly selecting the set of random
effects to include in growth curve models have been described (Gelman & Hill, 2007;
Maxwell & Delaney, 2004).
Suppose the data come from a repeated measures design with 5 equally spaced
occasions. A linear growth curve model with random intercept and slope can be written as:

$$
\begin{bmatrix} y_{i1}\\ y_{i2}\\ y_{i3}\\ y_{i4}\\ y_{i5} \end{bmatrix}
=
\begin{bmatrix} 1 & 0\\ 1 & 1\\ 1 & 2\\ 1 & 3\\ 1 & 4 \end{bmatrix}
\begin{bmatrix} \beta_{00}\\ \beta_{10} \end{bmatrix}
+
\begin{bmatrix} 1 & 0\\ 1 & 1\\ 1 & 2\\ 1 & 3\\ 1 & 4 \end{bmatrix}
\begin{bmatrix} \upsilon_{0i}\\ \upsilon_{1i} \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{i1}\\ \varepsilon_{i2}\\ \varepsilon_{i3}\\ \varepsilon_{i4}\\ \varepsilon_{i5} \end{bmatrix}
\qquad (12)
$$

which is in the form of:

yi = Xiβ + Ziγi + εi    (4)

where X = Z, γi ~ N(0, G), and εi ~ N(0, σε²I).
The covariance matrix of the random effects is:

$$
\mathbf{G} = \begin{bmatrix} \sigma_i^2 & \sigma_{is}\\ \sigma_{is} & \sigma_s^2 \end{bmatrix}
$$

where σi² is the variance of the random intercepts, σs² is the variance of the random slopes,
and σis is the covariance between the random intercepts and slopes. The error structure of
this model can be called the random-coefficients (RC) structure (Wolfinger, 1996).
As demonstrated by Rovine and Molenaar (1998) and Biesanz, Deeb-Sossa,
Papadakis, Bollen, and Curran (2004), the magnitude of σis, the covariance between random
intercepts and slopes, depends on the placement of the intercept, which is often an arbitrary
decision made by researchers. For simplicity, we show here the random-coefficients
structure for a five-occasion linear growth model with σis = 0:

$$
\begin{bmatrix}
\sigma_i^2 + \sigma_\varepsilon^2 & \sigma_i^2 & \sigma_i^2 & \sigma_i^2 & \sigma_i^2\\
\sigma_i^2 & \sigma_i^2 + \sigma_s^2 + \sigma_\varepsilon^2 & \sigma_i^2 + 2\sigma_s^2 & \sigma_i^2 + 3\sigma_s^2 & \sigma_i^2 + 4\sigma_s^2\\
\sigma_i^2 & \sigma_i^2 + 2\sigma_s^2 & \sigma_i^2 + 4\sigma_s^2 + \sigma_\varepsilon^2 & \sigma_i^2 + 6\sigma_s^2 & \sigma_i^2 + 8\sigma_s^2\\
\sigma_i^2 & \sigma_i^2 + 3\sigma_s^2 & \sigma_i^2 + 6\sigma_s^2 & \sigma_i^2 + 9\sigma_s^2 + \sigma_\varepsilon^2 & \sigma_i^2 + 12\sigma_s^2\\
\sigma_i^2 & \sigma_i^2 + 4\sigma_s^2 & \sigma_i^2 + 8\sigma_s^2 & \sigma_i^2 + 12\sigma_s^2 & \sigma_i^2 + 16\sigma_s^2 + \sigma_\varepsilon^2
\end{bmatrix}
$$

This pattern assumes a functional relationship among the variances along the diagonal over
time and a functional relationship among the covariances over time which depends on the
spacing between occasions.¹ In the case that all individual trajectories are parallel to the
group trajectory, the terms involving σs² are all 0 and the pattern reduces to compound
symmetry. With small variation in the individual slopes, the linear growth curve error
structure and compound symmetry are almost indistinguishable. As the variability in the
slopes increases, the two patterns diverge.
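The random-coefficients structure follows directly from Equation 5 with Z holding a column of ones and the time scores. The following sketch (Python with NumPy; the variance values are arbitrary) reproduces both the general pattern above and its reduction to compound symmetry when the slope variance is zero:

```python
import numpy as np

t = np.arange(5)
Z = np.column_stack([np.ones(5), t])       # random intercept and slope
sigma_i2, sigma_s2, sigma_e2 = 1.0, 0.5, 1.0
G = np.diag([sigma_i2, sigma_s2])          # sigma_is = 0, as in the text

# Equation 5 with W = I: V = Z G Z' + sigma_e2 * I
V = Z @ G @ Z.T + sigma_e2 * np.eye(5)
# Entry (j, k) equals sigma_i2 + sigma_s2 * t[j] * t[k], plus sigma_e2
# on the diagonal, so the variances grow with time.

# With zero slope variance, the pattern reduces to compound symmetry.
V0 = Z @ np.diag([sigma_i2, 0.0]) @ Z.T + sigma_e2 * np.eye(5)
```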
¹ The pattern shown here results from the specific way in which we specified the random
effects. In general, the straight line growth curve model assumes that the variance is a
convex function of time, not necessarily showing an increasing or decreasing pattern.

Writing the Repeated Measures ANOVA and Covariance Pattern Models as a Multilevel
Model
To help compare the repeated measures ANOVA and the covariance pattern model to
the growth curve model, we can alternatively express them as multilevel models. With five
repeated occasions they become:
Level 1: yit = π0i + π1i·υ1 + π2i·υ2 + π3i·υ3 + π4i·υ4 + εit    (13)
Level 2: π0i = β00
         π1i = β10
         π2i = β20
         π3i = β30
         π4i = β40

where υ1, υ2, υ3, and υ4 represent dummy-coded time. Notice that there are no random
effects at level 2. This is because the covariance structure is completely modeled by
patterning the level-1 εit's. If we choose compound symmetry, we have the standard
repeated measures ANOVA. By selecting another structure, we would have a covariance
pattern model.
An alternative way to model the compound symmetry structure would be to include a
random intercept only, along with a diagonal covariance matrix of the εit. For this model the
first equation of level 2 becomes:

π0i = β00 + υ0i

Everything else remains the same. This alternative would be analogous to a growth curve
model in which all of the individual curves are parallel, differing only in level, not in slope.
In this case, all of the occasion residual variances would be modeled as identical and the
residual covariances would all be the same.
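This equivalence can be checked numerically. In the sketch below (Python with NumPy; the variance values are arbitrary), a random intercept with variance σ0² plus independent level-1 errors reproduces the compound symmetry matrix exactly:

```python
import numpy as np

sigma_02, sigma_e2 = 1.5, 2.0

# Random-intercept-only model: Z is a single column of ones
Z = np.ones((5, 1))
V_ri = Z @ np.array([[sigma_02]]) @ Z.T + sigma_e2 * np.eye(5)

# Compound symmetry written directly
V_cs = sigma_02 * np.ones((5, 5)) + sigma_e2 * np.eye(5)

assert np.allclose(V_ri, V_cs)  # the two error structures are identical
```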
Repeated Measures ANOVA, Covariance Pattern Model or Growth Curve Model?

A comparison among repeated measures ANOVA, the covariance pattern model, and
the growth curve model shows two of the modeling decisions that the researcher must make.
First, what is the proper fixed effects model for the data? When the fixed effects
model cannot be hypothesized a priori, it must be determined empirically. One of the
decisions we have to make when modeling the means is whether to treat time as categorical
or continuous. In repeated measures ANOVA and the covariance pattern model, time is
discrete and treated as a categorical variable, whereas in the growth curve model, time is
continuous. Hence, growth curve models are more convenient when dealing with data
where individuals are measured at varying occasions (unbalanced design). In many
developmental studies, researchers are only interested in changes in a construct across several
time points, and each participant is measured on the same occasions (balanced design). In
this case, treating time as a categorical variable may be more appropriate. When we have a
priori hypotheses about the means, such as a straight line, a linear growth curve model can be
fitted. Alternatively, in ANOVA or the covariance pattern model, we can use a linear
polynomial as a planned contrast "predictor variable". The variability around this straight
line could then be modeled using, for example, a compound symmetry error pattern.
Second, what type of error structure best fits the data? When certain parameter
values are small, the differences between structures can be indistinguishable. When the
variability in the random slopes is small, the random-coefficients structure assumed by the
linear growth curve model is very similar to compound symmetry. When the correlation
between adjacent occasions is very small, compound symmetry, AR(1), and MA(1) may be
very similar. When the adjacent correlation is small enough that the square of that coefficient
approaches zero, AR(1) and MA(1) may be indistinguishable. However, using different
structures in the model estimation may lead to differences in the inferences. Even when
different error structures all lead to significant results, they can have different effect sizes,
which can change the interpretation of the results.
With the increasing popularity of growth curve modeling, the above decision-making
process is often ignored. In particular, the linear growth curve seems to be the model of
choice these days. With the linear growth curve model as the only model considered,
researchers fail to consider 1) whether the data follow the pattern assumed by the model (i.e.,
do all individuals follow a straight line?), and 2) whether an alternative model may better fit
the data. In the following, we use two real data examples to illustrate the potential problems
with this approach.
Two Real Data Examples
As part of a study by Belsky and Rovine (1990), a measure of job satisfaction was
collected on four occasions for husbands from families expecting a child. The data were
collected at roughly 3-month intervals, with the first occasion occurring about 3 months
before the baby was due. A plot of the means along with the fitted intercept and slope model
appears in Figure 1a and shows a generally decreasing mean pattern.²
[FIGURE 1A ABOUT HERE]
We first analyzed the data using a straight line growth curve model.³ The estimated
intercept and slope, along with their standard errors, appear in Table 1. To assess the relative
fit of the model, we included AIC and BIC. A smaller value of AIC or BIC indicates a
better fit (more details about AIC and BIC are given in the methods section). We next
analyzed the data using a straight line fixed effects model with some alternative error
structures that have been shown to represent plausible structures for longitudinal data.
Looking at the table, we see that the best fitting error structure according to AIC was
UNSTRUCTURED (UN), and according to BIC was TOEPLITZ with HETEROGENEOUS
variances (TOEPH). According to these models, the time effect is not significant, whereas
according to the standard growth curve model it is significant. If we assume for now that
the best fitting model according to AIC and BIC is closer to the "truth" (which will be tested
in the simulation study described later), the standard growth curve model is likely to have
resulted in a Type I error.

² The original data had a quadratic trend (Belsky & Rovine, 1990). To concentrate on the
linear growth curve, we rescaled the data by shifting the third data point by a constant for all
individuals. After rescaling, there was no significant higher order polynomial trend in the
data, and the original covariance structure of the residuals remained unchanged. The original
data also showed idiosyncratic patterns, and the growth curve error structure was a poor
fitting model for the original data.
[TABLE 1 ABOUT HERE]
This result indicates that when individuals do not follow a common trajectory, the
standard growth curve model may not be appropriate. We can see evidence of these
different trajectory shapes by looking at a spaghetti plot of the observed data (Figure 1b),³
which seems to indicate that no typical trajectory exists for these individuals. We can also
see this by calculating a set of orthonormalized polynomial trend scores. Looking at the
variances (Table 2), we see that the individual linear trend had a relatively large variance,
with somewhat smaller but still sizable variances for the individual quadratic and cubic
trends. This suggests idiosyncratic change rather than a common trajectory shape. For this
scenario, the growth curve error structure, which is based on a common trajectory shape for
each individual, may improperly account for the true covariance structure of the residuals.

³ No golden rule exists for determining the number and type of error structures to include in
the comparison when dealing with real data. The error structures we included in this example
are typical of those one might see in longitudinal studies. CS is the structure assumed under
traditional analysis of variance and often represents the structure when the repeated measures
effect is a randomized ordering. AR(1) is often appropriate when adjacent errors are more
correlated than more separated errors. Toeplitz is a generalized autoregressive model and
allows errors at different spacings to have different correlations. MA(1) indicates a process in
which the adjacent errors are correlated, but more distal errors are uncorrelated. Since the
homogeneity of error variances across occasions may not hold, we also included
heterogeneous versions of these structures. Researchers should feel free to include additional
structures or test fewer structures if they have a good reason for doing so. One could, for
example, begin with an unstructured covariance pattern as a baseline, look at the pattern of
variances and covariances in this matrix, and compare this empirically to a set of more
parsimonious alternatives. We will revisit this issue in the discussion section.

[FIGURE 1B AND TABLE 2 ABOUT HERE]
Littell et al. (2006) presented data from a study of strength resulting from different
programs of weight training. In this study, the amount of weight lifted was measured on a
number of occasions. We look at the first four occasions. From the spaghetti plot (Figure
2) and the distributions of the orthonormalized polynomial components (last column in Table
2), we see that, unlike the previous example, the variances of the quadratic and cubic trends
are very small. The predominant individual trajectory is a straight line. Given this, we may
expect the standard growth curve model to have a good fit to the data. However, the
variances of the errors around the fixed effect straight line regression model do not follow the
strict functional relationship we would expect under this model. Testing a number of
different error structures, we see that the AR(1) error structure around the straight line means
model is optimal for these data (Table 3). Comparing the results of the standard growth
curve model to AR(1), we see that although both models give significant results, the estimates
are different. In this case, the standard growth curve model underestimated the time effect.
So, although the pattern of growth is linear and similar for all subjects, the
random-coefficients error structure assumed by the linear growth curve model may still not
be optimal.
[FIGURE 2 AND TABLE 3 ABOUT HERE]
Other examples could be selected to show that the linear growth curve model can be the preferred model over the others tested. We present these counterexamples primarily to indicate that the blanket selection of the growth curve model may result in a less than optimal inference. Hence, selecting an error structure is an important part of the inferential process. Given that, we next consider how feasible it is to select the “correct” error structure. We can only demonstrate this when we know what the true error model is, so we will investigate whether our model comparison approach can recover the true model. This is an important consideration in evaluating any statistical technique, because if the method cannot select the correct model when the true model is known, we have less confidence in its ability to select among a set of competing models. Conversely, if the method can select the true model under a variety of conditions, we have more confidence in its general utility for selecting a best model.
To investigate the degree to which comparative fit criteria can select a model when
the “true” model is known, we simulated data which have a straight line means pattern and
one of four error structures: compound symmetry (CS), AR(1), MA(1), and
random-coefficients (RC).

These error structures represent, in order, the repeated-measures ANOVA model, two covariance pattern models, and the linear growth curve model with random intercept and slope, and are typically thought to represent qualitatively different processes (Box & Jenkins, 1970; Jennrich & Schluchter, 1986). While this is not an exhaustive set, we feel that these four patterns are different enough to allow us, in this initial investigation, to examine whether AIC and BIC are able to identify the correct error structure under various conditions.
Since AIC and BIC will not always select the proper model, we examine the degree to
which the inference of the time effect is affected.

We discuss the extent to which our results can be generalized to guide real data analysis in the discussion section.
Method
Simulation
We simulated data based on a linear means model4. Specifically, all data were assumed to come from a repeated measures study with five equally spaced occasions. The means from the first to the last occasion were set to 5, 10, 15, 20, and 25, respectively. The covariance matrix of residuals was simulated to show one of the following patterns.
Compound symmetry. A covariance matrix showing the compound symmetry pattern was shown on page 9. To simulate data with this error structure, we used a three-factorial design. The first factor was effect size, which contained two levels: medium (.5) and large (.8) (Cohen, 1988). The second factor was intraclass correlation (ICC), which had three levels: small (ρ = .2), medium (ρ = .5), and large (ρ = .8). Finally, we also varied the sample size to be small (20), medium (100), or large (200). We chose these numbers because they were representative of the spectrum of values in typical developmental research.
To construct the covariance matrix, we used the formula

d = (M1 − M2) / σ′    (14)

(Cohen, 1988), where d is the effect size, M1 − M2 is the difference between two means, and σ′ is the standard deviation. In our simulation, M1 − M2 was the mean difference between two adjacent time points, and σ′ equaled the square root of the elements on the main diagonal of the covariance matrix. Hence, for a compound symmetry structure,

σ′ = √(σε² + σ)    (15)

Combining this with the formula for the intraclass correlation,

ICC = σ / (σε² + σ)    (16)

we solved for σε² and σ. All together, the three-factorial design yielded 18 (2×3×3) combinations of simulation values. These values were in line with other similar studies (Ferron, Dailey, & Yi, 2002; Keselman, Algina, Kowalchuk, & Wolfinger, 1999; Kwok, West, & Green, 2007).

4 We limited ourselves to a straight line means model to make the model selection a choice among the different error covariance structures given a particular means model.
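Inverting Equations (14)–(16) pins down the two variance components from the chosen effect size and ICC. The following is a minimal sketch of that step (our code, with hypothetical helper names; numpy assumed), together with one draw of compound-symmetric data:

```python
import numpy as np

def cs_parameters(delta_m, d, icc):
    # invert Eqs. (14)-(16): total variance from the effect size,
    # then split it according to the intraclass correlation
    total_var = (delta_m / d) ** 2   # sigma'^2, from d = delta_m / sigma'
    sigma = icc * total_var          # common covariance (between-person part)
    sigma_eps2 = total_var - sigma   # residual variance
    return sigma_eps2, sigma

def simulate_cs(n_subjects, means, sigma_eps2, sigma, seed=0):
    # one sample of repeated measures with a CS residual covariance
    t = len(means)
    cov = sigma * np.ones((t, t)) + sigma_eps2 * np.eye(t)
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(means, cov, size=n_subjects)
```

With adjacent means 5 apart, d = 0.5 and ICC = .5 give a total variance of 100, split evenly into σε² = σ = 50.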
AR(1). To simulate data with an AR(1) structure (page 11), we again used a three-factorial design, with the same values for effect size and sample size. Intraclass correlation was replaced by the autoregressive coefficient, ρ, which again could be small (0.2), medium (0.5), or large (0.8). Therefore, our simulation yielded 18 combinations of simulation values for the AR(1) structure.
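Under the same scheme, the AR(1) covariance replaces the constant off-diagonal of CS with a geometric decay in the lag, while every occasion keeps the same marginal variance. A sketch (our illustration; numpy assumed):

```python
import numpy as np

def simulate_ar1(n_subjects, means, total_var, rho, seed=0):
    # AR(1) residual covariance: total_var * rho**|i-j|, so the
    # marginal variance at each occasion matches the CS case
    t = len(means)
    lag = np.abs(np.subtract.outer(np.arange(t), np.arange(t)))
    cov = total_var * rho ** lag
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(means, cov, size=n_subjects)
```

In a large sample the empirical lag-1 correlation of the residuals recovers ρ, and correlations fall off geometrically at longer lags.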
MA(1). The MA(1) pattern was shown on page 11. We used the same three-factorial design, where the factors were effect size, sample size, and the moving average coefficient, γ. Because γ has to be less than or equal to 0.5 for the covariance matrix to be positive definite, we used two values, 0.2 and 0.5. This gave us 12 combinations of simulation values.
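The γ ≤ 0.5 restriction can be checked numerically. This sketch (ours, not the paper's code) builds the banded MA(1) covariance and lets one inspect its eigenvalues:

```python
import numpy as np

def ma1_cov(n_occasions, total_var, gamma):
    # banded covariance: correlation gamma at lag 1, zero beyond
    m = (np.eye(n_occasions)
         + gamma * (np.eye(n_occasions, k=1) + np.eye(n_occasions, k=-1)))
    return total_var * m
```

With equal variances, the smallest eigenvalue stays non-negative only while γ ≤ 0.5, which is why the simulation used γ in {0.2, 0.5}; a value such as γ = 0.7 already produces a negative eigenvalue at five occasions.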
Random-coefficients structure (RC). Because Z was constant, the RC structure was determined by σε² and

G = [ σi²   σis ]
    [ σis   σs² ]

To make the RC covariance matrix comparable to the other structures, we set the values of σε² and σi² equivalent to σε² and σ in compound symmetry. That is, the variance at the first occasion in RC was the same as the variance in the CS, AR(1), and MA(1) structures. Moreover, we defined a ratio

r = σs² / σi²    (17)

which could be either small (0.1) or medium (0.25). This parameter controlled the rate of change in the variances and covariances over time. Because σis is arbitrary and subject to scaling, it was set to 0. The simulation of the RC structure thus followed a four-factorial design, with r as the added factor. We used a medium effect size (0.5), and small (20) and medium (100) sample sizes. In total, there were 12 (1×2×3×2) combinations of simulation values.

For each combination of simulation values, we simulated 100 sets of data using different seeds.
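The implied RC residual covariance is Z G Z′ + σε²I. A sketch of its construction under the design above, with σis fixed at 0 and σs² = r·σi² (helper name is ours; numpy assumed):

```python
import numpy as np

def rc_cov(times, sigma_eps2, sigma_i2, r):
    # Z holds the intercept and slope regressors; G is the 2x2
    # covariance of the random coefficients with sigma_is = 0
    t = np.asarray(times, dtype=float)
    Z = np.column_stack([np.ones_like(t), t])
    G = np.diag([sigma_i2, r * sigma_i2])   # sigma_s2 = r * sigma_i2
    return Z @ G @ Z.T + sigma_eps2 * np.eye(len(t))
```

With occasions coded 0 through 4, the first-occasion variance is σε² + σi², matching the CS total variance, while later variances grow with t²·σs²; that growth is the signature the RC structure adds over CS, AR(1), and MA(1).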
Analysis
We analyzed the data in SAS using PROC MIXED with the four error structures
which we used in our simulation.

In other words, each set of data was fitted with its true

model and three alternative models. We compared the AIC and BIC of these four models
and examined the power of these fit indices to identify the correct error structure. These
indices both penalize the -2*loglikelihood (-2l) for the number of parameters estimated in the
model. When comparing different models that differ in both fixed and random effects, we

SELECTING A LINEAR MIXED MODEL 24

use a maximum-likelihood (ML) estimator. Here, we are interested only in comparing the
covariance structures under identical means models, so we consider the restricted (or residual)
maximum-likelihood (REML) estimator.5
The penalties for AIC and BIC are different:

AIC = −2l + 2d    (Akaike, 1974)    (18)

and

BIC = −2l + d log(n)    (Schwarz, 1978)    (19)

where in REML d equals the effective number of estimated covariance parameters and n equals (number of observations − rank(X)). Given these penalties, AIC tends in the direction of selecting the more complex model, whereas BIC tends to select the more parsimonious model (Wolfinger, 1996).
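In code, the comparison reduces to applying the two penalties of Equations (18) and (19) to each fitted structure's −2l. The sketch below is ours, and the fit values in the usage note are made up purely for illustration:

```python
import math

def aic(neg2l, d):
    # Eq. (18): d = number of estimated covariance parameters (REML)
    return neg2l + 2 * d

def bic(neg2l, d, n):
    # Eq. (19): n = number of observations minus rank(X) under REML
    return neg2l + d * math.log(n)

def best_structure(fits, n):
    # fits maps structure name -> (-2l, d); smaller criterion wins
    by_aic = min(fits, key=lambda k: aic(*fits[k]))
    by_bic = min(fits, key=lambda k: bic(*fits[k], n))
    return by_aic, by_bic
```

For hypothetical fits {CS: (1000, 2), AR(1): (995, 2), Toeplitz: (990, 4)} with n = 95, AIC picks the Toeplitz structure while BIC's heavier log(n) penalty picks AR(1), illustrating AIC's lean toward the more complex model.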
Since AIC and BIC will not always select the true model, particularly when the
sample size is small, we compare the tail probabilities (p-values) of the linear time effect
produced by the best ANOVA or covariance pattern model and the growth curve model.
These analyses indicate the extent to which the fixed effects inferences are affected by the
error structure selected by AIC and BIC. This information is important when considering the consequences of fitting a particular model when the data do not follow its presumed error structure.
5 We are interested in comparing covariance structures under a common means model (here, the linear trend assumed under the linear growth curve model). This is equivalent to using a planned linear contrast for the repeated measures ANOVA and the covariance pattern model. In the case in which the means model is not known, the growth curve model could be a higher order polynomial; the analogous ANOVA or covariance pattern model could include a similar set of planned polynomial contrasts.

Results

Using AIC and BIC to Recover the True Model

We first looked at the average AIC and BIC of the four models for each of the 100
data sets generated by different combinations of simulation values.

We found that repeated measures ANOVA always had the lowest average AIC and BIC when it was the true model, regardless of effect size, ICC, and sample size (results not shown). In other words, modeling with a CS error structure, on average, always yielded a better fit than modeling with AR(1), MA(1), or RC when the true structure was indeed compound symmetric. Similarly, the covariance pattern models with AR(1) and MA(1) structures always had the lowest AIC and BIC when they represented the true error structure (results not shown). In contrast, the growth curve model sometimes did not have the lowest average AIC and BIC when it was the true model. Table 4 shows the average AIC of the four models when fitted to data simulated with an RC structure and an effect size of 0.5. The lowest AIC values among the four models are italicized. We did not include BIC in the table because the two fit indices showed the same pattern. As we can see from Table 4, the growth curve model, on average, was not necessarily the best fitting model when the sample was small and when sample size = 100, r = 0.1, and ρ = 0.2.
[TABLE 4 ABOUT HERE]
We then looked at the number of times the true model was selected by AIC and BIC
out of 100 comparisons.

Figure 3(a) shows the success rates of AIC in selecting the ANOVA model with different simulation values. We do not present the success rates of BIC because they were almost identical to those of AIC (the same applies hereinafter). As indicated in Figure 3(a), AIC and BIC performed very well when the true error structure was CS. They had the lowest success rates selecting the true model when ICC and sample size were both small. Yet, even in the worst scenario, they successfully picked the ANOVA model approximately 80% of the time. As ICC and sample size increased, the success rates quickly improved to more than 90%.
[FIGURE 3 ABOUT HERE]
For data simulated to follow an AR(1) or MA(1) error structure, the success rates of AIC and BIC were less satisfactory. As shown in Figure 3(b), when γ was small, AIC and BIC correctly picked the MA(1) model 58% to 78% of the time, with higher percentages corresponding to larger sample sizes. These numbers increased to more than 90% when γ increased to 0.5. The same pattern was found for the AR(1) structure. As shown in Figure 3(c), AIC and BIC had the lowest success rates in selecting the AR(1) model when both ρ and the sample size were small (only about 40%). When ρ was small but the sample size increased to 100, the numbers rose to about 65%. When we increased both the value of ρ and the sample size, AIC and BIC had success rates close to 100%.
For data with an RC error structure (Figure 3(d)), AIC and BIC were less successful when the sample size was small, with the smallest success rates occurring when either ρ (the ratio of σi² to the sum of σε² and σi²) or r (the ratio of σs² to σi²) was small. The success rate increased slightly as ρ and r took larger values, but it remained very low in most cases (8% to 29%). The only exception was when ρ = 0.8 and r = 0.25, where the success rates reached approximately 70%. A sample size of 100 improved the performance of AIC and BIC substantially; in most cases, the growth curve model was then correctly chosen.

The performance of AIC and BIC seemed to be influenced by the similarity between the true error structure and the competing error structures. This can be shown by looking at which alternative model AIC and BIC picked over the true model. For example, an AR(1) structure with a small ρ is very close to CS and MA(1); thus, when it represented the true model, AIC and BIC tended to wrongly pick the ANOVA model or the MA(1) model. If the alternative structure was more parsimonious than the true structure, it became even more difficult for AIC and BIC to identify the true structure. For instance, when the true structure was RC and ρ and r were small, σi² and σs² were both small. In this case, the variances and covariances changed slowly over time, showing a pattern that resembled CS and AR(1). Moreover, CS and AR(1) were more parsimonious than RC because they required fewer parameter estimates. As a result, AIC and BIC tended to choose the ANOVA model or the AR(1) model instead of the growth curve model. The problem was exacerbated when the sample size was small, that is, when few data were available for estimating the parameters. In this case, AIC and BIC could hardly distinguish between these structures.
Comparing the Best Fitting ANOVA or Covariance Pattern Model to the Blanket
Selection of the Linear Growth Curve Model
In this section we are interested in considering the cost of selecting the wrong model. How does that affect the inference? Does the model selected by AIC and BIC yield statistical inferences that are similar to the true model? Does selecting the best fitting model enhance our ability to make valid statistical inferences (in particular compared to the blanket selection of the linear growth curve model)? To answer these questions, we compared the tail probabilities (p-values) of the linear time effect of the best fitting ANOVA or covariance pattern model to those of the growth curve model. For ANOVA and the covariance pattern models, we used the linear polynomial contrast to capture the linear time effect; for the growth curve model, we used the linear slope. To compare them with the true model, we extracted the minimum (MIN), maximum (MAX), and the three quartiles (25%, 50%, 75%) from the distribution of p-values of the linear time effect produced by the true model. We then looked at the distribution of tail probabilities for the best fitting model and the growth curve model by counting the frequencies of the p-values they produced in each of the ranges defined by those numbers (< MIN, 0–25%, 25–50%, 50–75%, 75–100%, and > MAX).
Because we were most concerned about the situations where AIC and BIC failed to identify
the true model, we plotted these distributions for scenarios in which the success rates of AIC
and BIC were the lowest.
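This binning step can be sketched as follows (a hypothetical helper of ours; the handling of points exactly on a boundary is our choice, not taken from the paper):

```python
import numpy as np

def pvalue_bin_counts(true_p, other_p):
    # edges are MIN, the three quartiles, and MAX of the true
    # model's p-values; bins are < MIN, the four quartile ranges,
    # and >= MAX (so a point equal to MAX lands in the last bin)
    edges = np.quantile(true_p, [0.0, 0.25, 0.5, 0.75, 1.0])
    p = np.asarray(other_p)
    counts = [int(np.sum(p < edges[0]))]
    for lo, hi in zip(edges[:-1], edges[1:]):
        counts.append(int(np.sum((p >= lo) & (p < hi))))
    counts.append(int(np.sum(p >= edges[-1])))
    return counts
```

A competing model whose p-values pile up in the upper bins relative to the true model is losing power for the linear time effect.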

As shown in Figure 4, the distribution of tail probabilities produced by the model selected by AIC (as well as BIC, which gave identical results) was much closer to that of the true model6 than the growth curve model's, for which the tail probabilities concentrated at the higher end. This indicates that the blanket selection of the growth curve model tends to yield higher p-values for the linear time effect and thus has lower power. In this case, using AIC and BIC to select the error structure for the data lowers the risk of making a Type II error.
[FIGURE 4 ABOUT HERE]
Next, we looked at the distribution of tail probabilities of the linear time effect for the best fitting model when the growth curve model was the true model. We plotted four scenarios in Figure 5. We see that when the success rate was low, the model with the lowest AIC tended to underestimate the p-value, regardless of the sample size. The results were similar for BIC. This suggests that when the fit indices fail to identify the growth curve model as the true model, th

6 Note that this figure shows the distribution when the success rates of AIC and BIC were the lowest. With higher success rates, the distributions of tail probabilities for the model selected by AIC and BIC were nearly identical to those of the true model.