An Analysis of Variance Test for Random Attrition
David J. Weiss
California State University, Los Angeles

Author note: David J. Weiss, Department of Psychology, California State University, Los Angeles, 5151 State University Drive, Los Angeles, CA 90032. Email: dweiss@calstatela.edu. I thank Stanley P. Azen for a critical review of the manuscript.

Journal of Social Behavior and Personality, 1999, Vol. 14, No. 3, 433-438. ©2000 Select Press, Novato, CA, 415/435-4461.
The proper way to treat missing scores in a fixed effect analysis of
variance remains a matter of some controversy. There is agreement,
however, on a key assumption underlying all of the proposed techniques.
Inequality in cell sizes should not be attributable to the treatments.
Missing scores should occur haphazardly. An objective procedure for
evaluating the assumption of random attrition is proposed. Rather than
attempting to determine why scores are absent, the researcher should
simply try to assess whether systematic effects of the treatments are
associated with the cell sizes. This can be accomplished with ordinary
analysis of variance, using the same factorial structure as for the
experimental design. The scores that are present are assigned a "1,"
while those that are absent are assigned a "0." Any significant F ratio
militates against the assumption of random attrition.
The seemingly trivial disappearance of a few scores can be severely
annoying for a researcher executing that most ordinary of experimental
plans, a factorial design with independent groups (fixed effect model).
Unequal cell sizes present an analytic challenge. Expert opinion on the
proper procedures for handling these nonorthogonal designs has shifted
over the years (Appelbaum & Cramer, 1974; Cramer & Appelbaum,
1980; Herr & Gaebelein, 1978), and there is still a lack of consensus
among the analysis of variance textbooks. For example, Winer, Brown,
and Michels (1991, p. 386) recommend the method of unweighted
means, while Maxwell and Delaney (1990, p. 290) suggest this method
be avoided, preferring instead a more complex least squares solution.
Keppel (1991, p. 291) also recommends unweighted means despite a
"slight bias."
One aspect of the discussion has remained constant. The algorithmic
advice for analysis of experimental factors presupposes that the reasons
for the inequality are unrelated to the treatments. Underlying all of the
proposed techniques is the assumption that, as Maxwell and Delaney
phrase the requirement, "...the treatments are not differentially responsible for subjects failing to complete the study" (Maxwell & Delaney,
1990, p. 273). The idea is that accidental attrition produces subsamples
that are effectively random samples of the original groups, and thus one
can use the data from those who remain to predict the data of those who
disappeared. Classical examples of acceptable reasons for data loss
include scheduling mishaps or equipment failure. On the other hand,
suppose volunteers in an obnoxious experimental condition are less
likely to complete their task than those experiencing a benign treatment.
One might worry that participants who do face up to the harshness are
different from those who avoid it, in ways that may be important to the
experiment. Such bias obviously threatens the validity of any conclusions based on the extant data.

From the researcher's perspective, it is not always apparent when
attrition has been caused by treatment. Since the advent of institutional
review boards, blatantly aversive experimental conditions are rare. When
the nature of the treatments is less clear-cut, the researcher must make a
judgment about the reasons behind the varying numbers. The basis for
that judgment may be unacceptably subjective.
Those who do know the impact of the treatment, namely the volunteers who either didn't show up or stopped before the end, are poor
informants. Usually they vanish without a trace. Even when interrogation is possible, demand characteristics govern the responses. Without
mind-reading skills, it is impossible to evaluate the suddenly recalled
dental appointments and ailing relatives. The genteel participants with
whom I am familiar would never offend the researcher by decrying the
treatment. One might explore less direct ways of inducing no-shows and
dropouts to reveal their reasons, such as anonymous mailers returned to
a third party, but such attempts are problematic. Blaming the treatment
may still be seen as an insult to the researcher.
The decision that missing scores have occurred systematically is an
important one that should not be made on the basis of casual inspection.
While the reasons underlying particular missing scores cannot generally
be known, variation in cell sizes suggests that biased selection has
occurred. This in itself is a substantive conclusion, one that may have
sufficient import to justify reporting it as a component of the data
analysis.
Consider a weight-loss study in which participants are randomly
assigned to combinations of exercise and diet regimens. Program efficacy may be assessed using the difference between pre- and post-
program weight [an arguable procedure (Weiss, Walker, & Hill, 1988),
but bear with the example]. If there are disparities among the numbers of
people finishing the program and reporting for the final weigh-in for the
various treatment combinations, that information may be valuable in
planning future programs.
Similarly informative are disproportions that arise when classificatory variables such as gender influence completion of the experimental
task. If an equal number of women and men is assigned to each of several
treatments, but the men are more likely to produce data, then the
researcher may wish to explore possible sexism in the procedure or
instruments.
Even when disparities do not have obvious meaning, they threaten
the validity of any subsequent analysis. Without a guiding procedure, it
is difficult for the researcher to know when to make the painful decision
to reject the data. Therefore I propose that a statistical analysis of cell
size inequality routinely precede nonorthogonal analysis of variance.
The decision about attrition may be made using formal hypothesis-testing machinery, with the emphasis shifted from assessment of reasons
to assessment of disparity. The null hypothesis is that the numbers of lost
scores for all design cells are equal. The results of the test of this
hypothesis should be reported along with the primary nonorthogonal
analysis.
The proposed test is simply an analysis of variance that considers
the presence or absence of a score, rather than its magnitude. Attending
to the factorial structure, the analysis calls for replacing each actual score
with a "1" and each planned but missing score with a "0." This ANOVA
will therefore have equal cell n, and the planned cell size will be the
number of replicates (scores per cell). The values "1" and "0" are
arbitrary, as any other pair of distinct values would yield the same F ratio; but one and
zero constitute a natural code for "present" and "absent."
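To make the procedure concrete, here is a minimal sketch in Python,
using pandas and statsmodels; the diet and exercise factor labels and the
completion counts are hypothetical, echoing the weight-loss example
above:

    # Attrition test: replace each planned score with 1 (present) or 0
    # (absent), then run the ordinary factorial ANOVA on those codes.
    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    PLANNED_N = 15  # planned scores per cell = number of replicates
    completed = {   # hypothetical counts of participants who finished
        ("diet_A", "exercise"): 14,
        ("diet_A", "no_exercise"): 13,
        ("diet_B", "exercise"): 9,
        ("diet_B", "no_exercise"): 8,
    }

    rows = []
    for (diet, exercise), n_done in completed.items():
        # Every cell contributes exactly PLANNED_N observations.
        for i in range(PLANNED_N):
            rows.append({"diet": diet, "exercise": exercise,
                         "present": 1 if i < n_done else 0})
    df = pd.DataFrame(rows)

    # Same factorial structure as the experimental design itself.
    model = smf.ols("present ~ C(diet) * C(exercise)", data=df).fit()
    print(anova_lm(model, typ=2))

Because every cell holds exactly PLANNED_N codes, this ANOVA is
orthogonal, and the choice of sums-of-squares type is immaterial.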
A significant F ratio for any source, either main effect or interaction,
marks concentrated inequality of attrition. Specific comparisons may
also be used when the researcher anticipates that particular treatment
combinations may prove troublesome. The present-absent coding may
also be used to test for systematic trend in the pattern of missing scores.
Researchers also carry out nonorthogonal analyses when data have
not been lost in a literal sense. In some investigations, personal characteristics such as gender, ethnicity, or health status may be among the
factors of interest. Participants are recruited to fill particular design cells.
In such cases, cell size inequalities reflect differences in ease of recruitment, an issue that may have substantive importance. Here the null
hypothesis is that the numbers of scores present for all design cells are

436

JOURNAL OF SOCIAL BEHAVIOR AND PERSONALITY

equal. The slight difference in wording leads to a difference in the way
the test is carried out. It seems appropriate to consider as the number of
replicates the largest number of participants obtained for any cell. The
adjustment ensures that the researcher will not be "rewarded" for an
overly optimistic projection of cell sizes, which might have the consequence that all cells would fall far short of the goal and the analysis
would have little power to detect disparities.
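The only change from the attrition sketch above is the choice of replicate
count; a hypothetical illustration:

    # For recruited rather than attrited samples, take the number of
    # replicates to be the largest cell count actually obtained.
    recruited = {("men", "healthy"): 20, ("men", "ill"): 12,
                 ("women", "healthy"): 17, ("women", "ill"): 9}
    PLANNED_N = max(recruited.values())  # 20, not a projected target

The remainder of the analysis proceeds exactly as before, with 1s for the
scores obtained and 0s for the shortfall in each cell.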

EMPIRICAL EXAMPLE
Researchers seldom report attrition data, since dropouts are seen
merely as a minor inconvenience to be handled by the computer. In
contrast, Rudy, Estok, Kerr, and Menzel (1994) carried out an investigation in which they explicitly looked at participant retention. Their report
provided numerical values that can illustrate the value of the proposed
analysis. Participants were either runners or nonrunners recruited by
mail for a longitudinal study of exercise. The type of incentive provided,
money or gifts of equal value, was the manipulated variable.
The numbers of participants who began and completed the study are
shown in Table 1. Initial disparities presumably reflect differential rates
of volunteering as well as a difference in the proportion of runners and
nonrunners. The analysis of variance on cell size inequality for the 2 x 2
design, using 54 (the largest cell size) as the number of replicates,
yielded significant F ratios for Running, F(1, 212) = 16.08, p < .05, and
for the Running x Incentive interaction, F(1, 212) = 5.25, p < .05. I
would conclude that it was easier to recruit and retain runners, for whom
money appeared to be a greater lure than gifts. Substantive conclusions
regarding the exercise outcome measure would therefore have to be
tempered by this disparity. Rudy et al. (1994), who were interested
purely in retention and not in recruitment effects, similarly reported an
advantage for money over gifts.

TABLE 1
Number of participants (data from Rudy et al., 1994)

                Beginning of Study        End of Study
                Gifts      Money          Gifts      Money
Runners           48         54             30         44
Nonrunners        34         30             24         22
Total             82         84             54         66
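For readers who wish to check the arithmetic, a sketch feeding the end-of-study counts from Table 1 into the same machinery (pandas and
statsmodels again); run as written, it reproduces the F ratio for Running
reported above:

    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    PLANNED_N = 54  # the largest cell size in Table 1
    completed = {("runner", "gifts"): 30, ("runner", "money"): 44,
                 ("nonrunner", "gifts"): 24, ("nonrunner", "money"): 22}

    rows = [{"running": grp, "incentive": inc,
             "present": 1 if i < n else 0}
            for (grp, inc), n in completed.items()
            for i in range(PLANNED_N)]
    df = pd.DataFrame(rows)

    model = smf.ols("present ~ C(running) * C(incentive)",
                    data=df).fit()
    print(anova_lm(model, typ=2))  # Running: F(1, 212) = 16.08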
STATISTICAL ISSUES
The use of ones and zeros as scores raises concern that analysis of
variance is inappropriate because the standard assumptions are violated.
A dichotomous variable is not normally distributed. The variance in a
given group is directly linked to the dropout rate for that group: for a
cell whose completion proportion is p, the variance of the 0/1 scores is
p(1 - p), which rises with the dropout rate in the range below .5. Thus if
the hypothesis of unequal attrition is true, heterogeneity of variance is
assured. However, Lunney (1970) empirically examined the robustness
of analysis of variance using dichotomous
scores with fixed effect models and equal cell sizes. With regard to both
Type I error rate and power, his simulation results supported the use of

analysis of variance in situations in which (a) the response proportions
are less extreme than 80%-20% and there are at least 20 df for error, or
(b) the response proportions are more extreme and there are at least 40
df for error.
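Lunney's conditions are easy to encode as a screening rule; a minimal
sketch (the function name and exact thresholds are a paraphrase of his
simulation results, not a published API):

    # Rule of thumb from Lunney (1970): is ANOVA on a 0/1 variable safe?
    def lunney_ok(prop_present: float, error_df: int) -> bool:
        smaller = min(prop_present, 1.0 - prop_present)
        if smaller >= 0.20:        # no more extreme than 80%-20%
            return error_df >= 20
        return error_df >= 40      # extreme splits need more error df

    print(lunney_ok(0.75, 212))    # True: the Table 1 analysis qualifies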
Since current practice does not employ any test of random attrition,
it seems unlikely that researchers will be overly concerned about Type II
errors when the proposed test is used. Type I errors, which inappropriately call into question the validity of the subsequent nonorthogonal
analysis, are likely to be a barrier against acceptance of the proposed
procedure. For small total numbers of planned scores, Lunney's results
show that the observed Type I error rate will be lower than expected; the
F test is conservative for small n when a dichotomous variable is used.
Chi-square tests are often used with dichotomous responses. Cochran
(1950), arguing without the benefit of simulation, championed use of the
F statistic as though the responses were normally distributed. A major
advantage of the proposed technique is that the analysis of cell sizes has
the same factorial structure as the experimental design itself, allowing
attrition effects to be localized just as response effects are.
CONCLUSION
When unequal cell sizes occur, the alternative to testing for equality
of attrition is for the researcher to make the strong behavioral assumption
that scores have been lost haphazardly. The computerized statistics
package, inexorably imposing a default option, allows the researcher to
avoid thinking about the potential problem (Orme & Reis, 1991). If an
experimental variable is selectively inducing participants to withdraw,
conclusions based on those who remain may be inaccurate. One can, of
course, blithely hope that those who remain in each treatment are a
random subsample of those assigned, but it seems more likely that as the
song says, "only the strong survive."


For dropout rates in the moderate range likely to be of concern, the
test of random attrition identifies patterns effectively. For small dropout
rates (the range below .2), or for extremely large rates (above .8, which
would seem not to be a pragmatic concern), the test is relatively weak
unless the number of planned scores is large. The positive way to regard
this lack of power is to note that the test will not signal alarm in a small-scale study unless systematic attrition is pronounced.
REFERENCES
Appelbaum, M.I., & Cramer, E.M. (1974). Some problems in the nonorthogonal
analysis of variance. Psychological Bulletin, 81, 335-343.
Cochran, W.G. (1950). The comparison of percentages in matched samples.
Biometrika, 37, 256-266.
Cramer, E.M., & Appelbaum, M.I. (1980). Nonorthogonal analysis of variance—Once again. Psychological Bulletin, 87, 51-57.
Herr, D.G., & Gaebelein, J. (1978). Nonorthogonal two-way analysis of variance. Psychological Bulletin, 85, 207-216.
Keppel, G. (1991). Design and analysis: A researcher's handbook (3rd ed.).
Upper Saddle River, NJ: Prentice Hall.
Lunney, G.H. (1970). Using analysis of variance with a dichotomous dependent
variable: An empirical study. Journal of Educational Measurement, 7, 263-269.
Maxwell, S.E., & Delaney, H.D. (1990). Designing experiments and analyzing
data. Pacific Grove, CA: Brooks/Cole.
Orme, J.G., & Reis, J. (1991). Multiple regression with missing data. Journal of
Social Service Research, 15, 61-91.
Rudy, E.B., Estok, P.J., Kerr, M.E., & Menzel, L. (1994). Research incentives:
Money versus gifts. Nursing Research, 43, 253-255.
Weiss, D.J., Walker, D.L., & Hill, D. (1988). The choice of a measure in a health-promotion study. Health Education Research: Theory and Practice, 3, 381-386.
Winer, B.J., Brown, D.R., & Michels, K.M. (1991). Statistical principles in
experimental design (3rd ed.). New York: McGraw-Hill.