264 M.J. Hilmer Economics of Education Review 20 2001 263–278
of Lee (1983)-type two-stage corrections for self-selection bias, we extend the analysis to consider such selectivity correction models. The estimated results for the years of college completed by 2- and 4-year attendees suggest that the choice of specification of the first-stage college attendance equation may have a significant impact on the second-stage selectivity-corrected coefficient estimates. Namely, the effects of several key variables are estimated to be statistically significant under some specifications but not others. Prominent among these are test scores, which are only estimated to have large and significant effects among 4-year attendees for the ordered probit, and a series of family background and high school performance measures, which are only estimated to have large and significant effects among 2-year attendees for the multinomial logit. In addition to the estimated coefficients differing across specifications, predicted outcomes for students of different genders and ethnicities possessing average sample characteristics appear to differ across specifications. Hence, the results suggest the importance of considering specification issues before estimating the college attendance equation, especially when it is used as the first stage of selection correction models.
2. Econometric issues
The econometric specifications examined in this study are well known and are all examples of discrete choice models.[1] In the context of college attendance, the models all assume that a student makes his or her attendance decision on the basis of a latent variable: either the expected utility of an attendance option, the probability of college graduation, or more generally the underlying propensity to attend college. Unfortunately, the researcher does not directly observe the latent variable. Instead, he or she only observes the student’s actual attendance decision: for purposes of this study, 4-year college attendance [$E_i = 2$], 2-year college attendance [$E_i = 1$], or non-attendance [$E_i = 0$].[2] The discrete choice models discussed below are all methods of “backtracking” from the observed attendance decision to the underlying relationships between certain explanatory variables and the attendance path decision. While the basic goals of the three models are the same, they differ according to the assumptions made about the relationship between the different attendance options.
[1] Descriptions of the models analyzed in this study are available in most econometric texts. For a nice intuitive discussion of these types of models see Kennedy (1998). For a more rigorous treatment, the classic references are Maddala (1983) and Greene (1997).
[2] There may be some question as to the definition of 2-year colleges. In this study, a student is defined as attending a 2-year college if they are taking academic courses at a 2-year college. Students taking only vocational courses are defined as being non-attendees. All three models in this study could also have been estimated with vocational school defined as a separate attendance path. As with Weiler (1989), doing so does not significantly alter the results.
2.1. Multinomial logit
Estimation of the multinomial logit follows directly from expected utility maximization. As with other random utility models, the multinomial logit assumes that a student chooses which attendance path to follow by comparing the indirect utility provided by each path and choosing the one that provides the highest. For the current application, the student’s attendance path choice can be defined as:

$$
E_i = \begin{cases}
2 & \text{if } B_2'X_i + \varepsilon_i > \max(B_1'X_i + \varepsilon_i,\; B_0'X_i + \varepsilon_i) \\
1 & \text{if } B_1'X_i + \varepsilon_i > \max(B_2'X_i + \varepsilon_i,\; B_0'X_i + \varepsilon_i) \\
0 & \text{if } B_0'X_i + \varepsilon_i > \max(B_2'X_i + \varepsilon_i,\; B_1'X_i + \varepsilon_i)
\end{cases} \tag{1}
$$

where $X_i$ is a vector of observed individual characteristics and state-level relative net attendance costs that affect the student’s expected utility from each attendance option and $\varepsilon_i$ is an i.i.d. log Weibull distributed error term.[3] Parameters to be estimated by maximum likelihood are $B_0$, $B_1$, and $B_2$.

The multinomial logit has gained favor in estimating
discrete choice models due to its computational ease. Namely, the probability of choosing each potential outcome can be easily expressed and the resulting log-likelihood function can be maximized in a straightforward fashion. A potential shortcoming of the multinomial logit is its reliance on the independence of irrelevant alternatives (IIA). The IIA property assumes that the relative probability of two existing outcomes is unaffected by the addition of a third outcome. For example, suppose that an individual’s choice is initially between two different outcomes and that he or she is evenly split between the two. Now, suppose we add a third alternative that is nearly identical to the second. We would then expect the probability of choosing the second outcome to be split in half and the probability of choosing the first outcome to be unaffected. Unfortunately, the IIA property does not account for this, but rather splits the probabilities equally among all three alternatives in order to keep the relative probabilities of the first two options equal.[4] Hence, in cases where two alternatives are close substitutes, the multinomial logit may be inappropriate as it relies on the IIA property. Hausman and McFadden (1984) suggest a specification test, based on dropping a category from the estimation and observing whether the estimated coefficients change, that can be used to assess the validity of the IIA property in the multinomial logit model. This test provides a means to test whether the multinomial logit is an appropriate specification for this exercise.
[3] The Log Weibull (Type I extreme-value) distribution is assumed due to its convenient property that the cumulative density of the difference between any two random variables distributed Log Weibull is given by the logistic function (Kennedy, 1998, p. 244).
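
The IIA example above can be checked numerically. The following Python sketch is an illustration only and is not part of the original analysis. It assumes the standard multinomial logit probability formula, in which the probability of alternative j is exp(v_j) divided by the sum of exp(v_k) over all alternatives; the function name `mnl_probabilities` and the utility values are hypothetical.

```python
import math


def mnl_probabilities(utilities):
    """Multinomial logit choice probabilities: exp(v_j) / sum_k exp(v_k)."""
    shift = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(v - shift) for v in utilities]
    total = sum(weights)
    return [w / total for w in weights]


# Two alternatives with equal systematic utility: the individual is evenly split.
two_way = mnl_probabilities([0.0, 0.0])

# Add a third alternative that is a perfect substitute for the second
# (hypothetical utilities, chosen only to mirror the example in the text).
three_way = mnl_probabilities([0.0, 0.0, 0.0])

# Under IIA, the ratio of the first two probabilities is unchanged.
ratio_before = two_way[0] / two_way[1]
ratio_after = three_way[0] / three_way[1]

print(two_way)
print(three_way)
print(ratio_before, ratio_after)
```

The three-way probabilities come out equal rather than leaving the first alternative at one half, which is the behavior the text describes as a potential shortcoming when two alternatives are close substitutes.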
2.2. Ordered probit
The ordered probit assumes that the variable of interest follows a strict ordering based on the value of the latent variable. Hilmer (1998) suggests that the latent variable is the student’s subjective probability of graduation and that his or her decision follows the natural ordering of students with the highest probabilities attending 4-year colleges, students with midrange probabilities attending 2-year colleges, and students with the lowest probabilities attending neither institution.[5] Accordingly, the student’s attendance path can be defined as:

$$
E_i = \begin{cases}
2 & \text{if } \alpha_2 < \delta'X_i + \mu_i < \infty \\
1 & \text{if } \alpha_1 < \delta'X_i + \mu_i < \alpha_2 \\
0 & \text{if } -\infty < \delta'X_i + \mu_i < \alpha_1
\end{cases} \tag{2}
$$

where $X_i$ is a vector of factors affecting the student’s subjective probability of graduation and $\mu_i$ is a normally distributed error term. $\alpha_1$ and $\alpha_2$ partition the student’s attendance path choice into the decision to attend a 4-year college, attend a 2-year college, or attend no postsecondary institution and therefore represent the minimum probability levels at which a student chooses to attend a 2-year college and a 4-year college, respectively. Parameters to be estimated by maximum likelihood are $\delta$, $\alpha_1$, and $\alpha_2$.
[4] As a simplified example, suppose that in the absence of 2-year colleges a student is equally likely to choose to attend a 4-year college (1/2) as to not attend college (1/2). Now, suppose the student is given the choice between a 2-year college and a 4-year college and assume that he or she views the two as perfect substitutes. We would then expect the probabilities of non-attendance, 2-year attendance, and 4-year attendance to be 1/2, 1/4, 1/4. This is not how the multinomial logit treats the probabilities, however. Due to the IIA property, the multinomial logit treats the probabilities as 1/3, 1/3, 1/3 in order to keep the relative probabilities of non- and 4-year attendance constant.
[5] Hilmer (1998) explains the intuition as follows: “To avoid the time cost associated with transferring, a student who thinks he or she is likely to graduate will start at a university. A student who is uncertain about his or her ability will start at a community college since the foregone cost of the first 2 years will be much lower should he or she be forced to drop out. A student who is not likely to graduate will choose to work since doing so will make him or her better off than attending a community college for 2 years and dropping out.”

A primary difference between the multinomial logit
and the ordered probit is that, due to the assumed natural ordering, the latter does not require the IIA property. However, for the model to be appropriate, the assumed natural ordering must be realistic. For example, the natural ordering of 4-year > 2-year > non-attendance seems reasonable, at least for students expecting to receive a Bachelor’s degree, due to the lower attendance cost at 2-year colleges and the transfer cost associated with transferring from a 2-year college to a 4-year college.[6] On the other hand, if one were examining the decision between public and private 4-year colleges, assuming a natural ordering of private > public may not be reasonable, as it has been demonstrated that many students choose to attend public institutions that are potentially lower in quality than the private colleges they would have chosen in order to take advantage of the in-kind subsidy afforded by public higher education (Ganderton, 1992). This observation suggests that the estimated thresholds in the ordered probit model should always be significant. If not, then we might conclude that the assumed natural ordering, and consequently the ordered probit, is an inappropriate specification for this exercise. While this observation is potentially valuable in determining whether the ordered probit is inappropriate, it would be of limited value in assessing whether it is superior to the alternative models we are discussing.
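
To make the role of the thresholds concrete, the following Python sketch computes the three attendance-path probabilities implied by an ordered probit for a given value of the latent index. It is an illustration under stated assumptions, not the estimation code used in the study: the error term is assumed to be standard normal (the usual unit-variance normalization), and the function name, the index value, and the threshold values are all hypothetical.

```python
from statistics import NormalDist


def ordered_probit_probabilities(index, alpha_1, alpha_2):
    """Return (P(E=0), P(E=1), P(E=2)) for a latent index with an N(0, 1) error.

    `index` plays the role of delta'X_i; alpha_1 < alpha_2 are the thresholds.
    """
    if alpha_1 >= alpha_2:
        raise ValueError("thresholds must satisfy alpha_1 < alpha_2")
    cdf = NormalDist().cdf
    p_none = cdf(alpha_1 - index)               # latent value falls below alpha_1
    p_two_year = cdf(alpha_2 - index) - p_none  # latent value between the thresholds
    p_four_year = 1.0 - cdf(alpha_2 - index)    # latent value exceeds alpha_2
    return p_none, p_two_year, p_four_year


# Hypothetical values chosen only for illustration.
probs = ordered_probit_probabilities(index=0.0, alpha_1=-0.5, alpha_2=0.5)
print(probs)
```

Because a single index drives all three probabilities, raising the index can only shift probability mass toward the 4-year path, which is why the assumed natural ordering has to be realistic for the model to be appropriate.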
2.3. Bivariate probit with sample selection
The bivariate probit with sample selection (Greene, 1998) assumes that the potential student makes two sequential decisions: (1) whether to attend a postsecondary institution and (2) if so, which type of institution to attend. The model can thus be defined as:[7]

$$
\begin{aligned}
Z_{1i} &= \phi_1'X_{1i} + \varepsilon_{1i}, &\quad& E_2 = 1 \text{ if } Z_{1i} > 0,\; E_1 = 1 \text{ otherwise} \\
Z_{2i} &= \phi_2'X_{2i} + \varepsilon_{2i}, &\quad& Z_{1i} \text{ observed if } Z_{2i} > 0,\; E_0 = 1 \text{ otherwise} \\
(\varepsilon_{1i}, \varepsilon_{2i}) &\sim \mathrm{BVN}(0, 0, 1, 1, \rho)
\end{aligned} \tag{3}
$$

where $Z_{1i}$ and $Z_{2i}$ are the latent variables determining the attendance/non-attendance and 2-year/4-year attendance decisions, $X_{1i}$ and $X_{2i}$ are vectors of individual- and state-specific characteristics affecting those decisions, and the error terms $\varepsilon_{1i}$ and $\varepsilon_{2i}$ are distributed bivariate normal (BVN) with $\rho$ representing the correlation coefficient between the two. Parameters to be estimated by maximum likelihood are $\phi_1$, $\phi_2$, and $\rho$.
[6] Community college transfer students may be forced to take longer to graduate for a variety of reasons. For example, community college students often take smaller class loads than university students, and as a result, are required to either spend longer taking classes at the community college before transferring or at the university after transferring. Either way, such students will be required to spend longer in school before receiving their degree.
[7] This nesting structure is much like the nested logit model proposed by Weiler (1989). Because the two nesting structures are similar and the bivariate probit is computationally simpler, especially in two-stage selectivity-correction models, the bivariate probit is much more popular.
As with the ordered probit, a potential benefit of the bivariate probit with sample selection is that, by assuming the two attendance decisions are made sequentially, the model does not rely on the IIA property. A potential drawback is the requirement that the error terms from the two equations be jointly normally distributed. Due to this requirement, it should be possible to determine whether the model is inappropriate by testing whether the assumed joint normality of the two error terms holds. Again, while such a test is valuable in determining whether the bivariate probit with sample selection is inappropriate, it would be of limited value in assessing whether a model for which joint normality is not rejected is superior to the models discussed above.
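
The sequential structure described in this section can be illustrated by simulation. The Python sketch below is a toy data-generating process, not the estimator: it draws two standard normal errors with correlation rho, lets a first latent equation decide whether the student attends at all, and lets a second latent equation decide the institution type only for those who attend. The function name, the index values, and the value of rho are hypothetical.

```python
import math
import random


def simulate_paths(n, index_attend, index_four_year, rho, seed=0):
    """Simulate shares of paths 0 (none), 1 (2-year), 2 (4-year)."""
    rng = random.Random(seed)
    counts = {0: 0, 1: 0, 2: 0}
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        v = rng.gauss(0.0, 1.0)
        e_attend = u
        # Construct a second N(0, 1) error whose correlation with e_attend is rho.
        e_type = rho * u + math.sqrt(1.0 - rho ** 2) * v
        if index_attend + e_attend <= 0:
            counts[0] += 1  # non-attendee: the type decision is never observed
        elif index_four_year + e_type > 0:
            counts[2] += 1  # attends, and chooses a 4-year college
        else:
            counts[1] += 1  # attends, and chooses a 2-year college
    return {path: count / n for path, count in counts.items()}


# Hypothetical parameter values chosen only for illustration.
shares = simulate_paths(20000, index_attend=0.0, index_four_year=0.0, rho=0.5)
print(shares)
```

With both indices at zero and rho = 0.5, the simulated 4-year share should be close to 1/3, since for standard bivariate normal errors the probability that both are positive is 1/4 + arcsin(rho)/(2π). A nonzero rho is what ties the unobservables of the two decisions together.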
2.4. Correcting for self-selection bias
An application of the college attendance equation that has recently become popular is as the first stage in two-stage econometric models that correct OLS estimates for the presence of self-selection bias. For example, Brewer, Eide and Ehrenberg (1999) estimate a multinomial logit selection model to correct for potential selectivity bias in the return to elite private colleges, while Ganderton (1992) estimates a bivariate probit selection model to correct for potential selection bias in student quality choices at public and private universities. The problem inherent in such studies is that the observed outcomes are the result of non-random decisions. Namely, the return to quality for students attending elite, private institutions and the quality choices of students attending public and private universities are only observed for students making the non-random decisions to attend elite, private institutions and public or private universities, and not for the entire population of college-age students. As Heckman (1979) and others demonstrate, this non-randomness, or self-selection of college attendance choices, violates the familiar Gauss–Markov assumptions. Consequently, estimating the desired outcomes by OLS yields potentially biased results.
To correct for the potential of self-selection bias, most studies employ the two-stage methodology of Lee (1983). According to this methodology, it is possible to correct for the non-random assignment to different attendance paths by: (1) estimating the student’s self-selected college attendance choice and (2) using those results to calculate selectivity-correction terms that, when included as regressors in the second-stage functions, correct for the potential self-selection bias. The intuition behind this procedure is that the selectivity-correction terms are derived from the attendance path estimates and therefore include the important effect that a student’s unobservable characteristics have on his or her attendance path decision. Including those terms in the second-stage functions then corrects for the bias that may be induced by students with identical observable characteristics non-randomly self-selecting different attendance paths and subsequently making persistence decisions that differ due strictly to differences in their unobservable characteristics.
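
As a rough illustration of step (2), the Python sketch below computes a selectivity-correction term from a first-stage predicted probability. It assumes one common form of the Lee (1983) term, φ(Φ⁻¹(P)) / P, where P is the predicted probability of the attendance path the student actually chose; sign conventions differ across presentations, and the function name and probability values are hypothetical. It is not taken from the original study.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def lee_correction_term(prob_chosen):
    """Return phi(Phi^{-1}(P)) / P for the predicted probability P of the chosen path.

    Assumption: this is one common way of writing the Lee-type term; the sign
    attached to it in the second stage varies across presentations.
    """
    if not 0.0 < prob_chosen < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    z = _STD_NORMAL.inv_cdf(prob_chosen)  # map the probability to a normal score
    return _STD_NORMAL.pdf(z) / prob_chosen


# Hypothetical first-stage probabilities of the chosen path for three students.
lambdas = [lee_correction_term(p) for p in (0.2, 0.5, 0.8)]
print(lambdas)
```

When the first stage is a simple probit, so that P = Φ(index), this expression reduces to the familiar inverse Mills ratio φ(index) / Φ(index). The term is larger for students whose observed choice was less likely given their observable characteristics.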
In the work below, we examine the effect that the choice of specification for the college attendance equation has on the two-stage selectivity-corrected results by estimating the years of college that a student completes. The econometric model to be estimated is a system of reduced form equations that can be specified as:

$$
E_i \text{ estimated as a multinomial logit, ordered probit, or bivariate probit} \tag{4}
$$

$$
Y_i = \eta_1 W_i + \eta_2 \lambda_i + \nu_i \tag{5}
$$

where $Y_i$ represents the number of years of college that the student completes, $W_i$ is a vector of observed individual characteristics affecting the student’s college persistence decision, $\lambda_i$ is the selectivity-correction term derived from the first-stage Eq. (4), and $\nu_i$ is a stochastic error term. Parameters to be estimated are $\eta_1$ and $\eta_2$, with $\eta_1$ representing the selectivity-corrected results.
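
The second stage in Eq. (5) is an ordinary least-squares regression that includes the correction term as an extra regressor. The Python sketch below shows that step on made-up data, solving the normal equations directly. It is a toy example under stated assumptions: $W_i$ is taken to consist of a constant and a single hypothetical covariate, the values of the correction term are invented rather than derived from a first stage, and the outcome is generated without noise so that the known coefficients are recovered.

```python
def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(col + 1, k):
            factor = a[row][col] / a[col][col]
            for c in range(col, k + 1):
                a[row][c] -= factor * a[col][c]
    # Back substitution.
    b = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(a[i][j] * b[j] for j in range(i + 1, k))
        b[i] = (a[i][k] - tail) / a[i][i]
    return b


# Hypothetical data: one covariate w and an invented correction term lam.
w = [12.0, 14.0, 10.0, 16.0, 13.0, 11.0]
lam = [0.3, 0.9, 1.4, 0.2, 0.7, 1.1]
years = [1.0 + 0.5 * wi - 0.8 * li for wi, li in zip(w, lam)]  # noise-free outcome

design = [[1.0, wi, li] for wi, li in zip(w, lam)]
constant, coef_w, coef_lam = ols(design, years)
print(constant, coef_w, coef_lam)
```

In applied work the correction term is itself estimated, so the usual OLS standard errors from this second stage are generally not valid without adjustment; the sketch covers only the point estimates.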
3. Data