
379 S.L. DesJardins et al. / Economics of Education Review 18 (1999) 375–390

ture research, one drawback of model 1 is that it assumes that all the determinants of the event are accounted for by the explanatory variables $z_k$. Model 1 also assumes that the effects of the explanatory variables are constant over time. Violations of either assumption, which are common in social science research, may bias the estimates. The model outlined below generalizes model 1 by allowing for time-varying effects and by including an unobserved heterogeneity variable. The new model is therefore a substantial improvement over the proportional hazards model presented above (see McCall, 1994, for details). To account for unobserved heterogeneity, it is assumed that the event of interest is influenced by a random variable $u$, where $u$ is unobserved and distributed independently of $z_k$. Let $G$ denote the cumulative distribution function (c.d.f.) of $u$. For identification purposes the mean of $u$ is fixed at 1 (the unit mean assumption is simply a normalization; the mean could be fixed at any finite value). Let $P(K = k \mid K > k-1, z_1, \ldots, z_k, u)$ represent the conditional probability that the event occurs in period $k$ given that it has not occurred in the first $k-1$ periods of enrollment. The values of the time-varying regressors in periods 1 through $k$, $z_1, \ldots, z_k$, are observable, and $u$ is the unobserved variable. It is assumed that

$$P(K = k \mid K > k-1, z_1, \ldots, z_k, u) = 1 - \exp\{-\exp(\alpha_k + \beta_k z_k)\, u\} \qquad (3)$$

where $\beta_k$ measures the (possibly time-varying) effect of $z_k$ in period $k$ and $\alpha_k$ is again a time-varying constant term, $k = 1, 2, 3, \ldots$. Model 1 is the special case of (3) in which $\beta_k = \beta$ for all $k$ and $u$ equals 1 with probability one. Model (3) is estimated by maximum likelihood and non-parametric maximum likelihood techniques (see Heckman & Singer, 1984). McCall (1994) has shown that $G$ is non-parametrically identified, so the latter method is feasible.
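To make equation (3) concrete, the sketch below computes the per-term probability and one student's likelihood contribution, then averages that contribution over a discrete distribution for $u$ with a few mass points, the form a nonparametric maximum likelihood estimate of $G$ takes in the Heckman and Singer approach. It is a minimal illustration under stated assumptions, not the FORTRAN routine the authors used: the function names, the list-based data layout, and the treatment of censoring (a censored student contributes only survival terms) are our own choices, and the estimation step itself (maximizing over $\alpha_k$, $\beta_k$ and the mass points) is not shown.

```python
import math
from typing import List, Sequence


def hazard(alpha_k: float, beta_k: Sequence[float], z_k: Sequence[float], u: float) -> float:
    """Equation (3): P(K = k | K > k-1, z_1..z_k, u) = 1 - exp(-exp(alpha_k + beta_k . z_k) * u)."""
    index = alpha_k + sum(b * z for b, z in zip(beta_k, z_k))
    return 1.0 - math.exp(-math.exp(index) * u)


def likelihood_given_u(alphas: List[float], betas: List[List[float]], zs: List[List[float]],
                       duration: int, event: bool, u: float) -> float:
    """Probability of one student's observed history for a fixed value of u.

    duration: number of terms observed (1-based); event: True if the event
    occurred in the last observed term, False if the spell is censored there.
    """
    prob = 1.0
    for k in range(duration):
        h = hazard(alphas[k], betas[k], zs[k], u)
        # The event term contributes h; every earlier (or censored) term contributes 1 - h.
        prob *= h if (event and k == duration - 1) else 1.0 - h
    return prob


def mixed_likelihood(alphas, betas, zs, duration, event,
                     support: Sequence[float], weights: Sequence[float]) -> float:
    """Average over a discrete distribution G for u (mass points `support`, probabilities `weights`)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # Mirrors the unit-mean normalization used for identification.
    if abs(sum(p * v for p, v in zip(weights, support)) - 1.0) > 1e-9:
        raise ValueError("unit-mean normalization violated: E[u] must equal 1")
    return sum(p * likelihood_given_u(alphas, betas, zs, duration, event, v)
               for p, v in zip(weights, support))
```

With a single mass point at $u = 1$ and the same $\beta$ in every term, `mixed_likelihood` collapses to the model 1 likelihood, which is the special case noted above. An estimator in this spirit would maximize the sum over students of the log of `mixed_likelihood`.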
The competing risks models (jointly estimating first stopout and graduation), which were run as robustness checks on our single-risk results, specify a functional form similar to (3) for each risk. Each risk has a separate unobserved heterogeneity variable, although the two may be correlated.
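The text does not spell out how the two risks share a term, so the sketch below makes one common assumption explicit: each risk has an integrated hazard of the form used in (3), $\theta_r = \exp(\alpha_{rk} + \beta_{rk} z_k)\,u_r$, and hazards are constant within a term, so the probability of exiting is split between stopout and graduation in proportion to each risk's share. The function name and arguments are hypothetical, and any correlation between $u_{stop}$ and $u_{grad}$ would enter through their joint distribution, which is not modeled here.

```python
import math
from typing import Tuple


def term_outcome_probs(index_stop: float, index_grad: float,
                       u_stop: float, u_grad: float) -> Tuple[float, float, float]:
    """Return (P(no exit), P(first stopout), P(graduation)) for one term.

    index_r stands for alpha_rk + beta_rk . z_k for risk r. Each risk's integrated
    hazard is theta_r = exp(index_r) * u_r, mirroring (3). Assumes hazards are
    constant within the term and u_stop, u_grad > 0.
    """
    theta_stop = math.exp(index_stop) * u_stop
    theta_grad = math.exp(index_grad) * u_grad
    total = theta_stop + theta_grad
    p_none = math.exp(-total)   # survive both risks through the term
    p_exit = 1.0 - p_none       # exit by either route
    # Split the exit probability in proportion to each risk's share of the hazard.
    return p_none, p_exit * theta_stop / total, p_exit * theta_grad / total
```

If only one risk were present, the exit probability would reduce to $1 - \exp(-\theta_r)$, the single-risk form in (3).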

4. Specifying the empirical model

4.1. The sample and statistical routine

Table 1 provides a detailed description of the composition of the sample. The original sample consisted of 4100 students who entered the University of Minnesota (Minneapolis campus only) as New High School students in the fall term of 1986.² After deleting records with missing information, the effective sample size used in the event history procedure was 3975, or roughly 97% of the original sample. Twenty-two terms of data were collected on these individuals from a variety of institutional sources. It should be noted that this dataset includes only one record per person. In contrast, when using logistic regression to do event history modeling, one must construct a "person-period" dataset that includes a record for each time period in which the individual is at risk of the event (see Allison, 1984; Singer & Willett, 1991; Yamaguchi, 1991; DesJardins, 1993). After construction, the dataset was moved to a Cray X-MP-EA supercomputer housed at the Minnesota Supercomputer Institute. The single risks models were estimated with a FORTRAN program initially developed by Bruce Meyer of Northwestern University and modified for our purposes by co-author Brian McCall. The competing risks specifications were estimated using a statistical model designed and programmed by McCall. Because the maximum likelihood technique used to estimate the models is iterative, and because the models include a large number of parameters, the amount of memory needed for estimation is substantial.

Table 1
Descriptive statistics of the sample

Variable                   Term one range   Term one mean   Term one SE
Asians                     0–1              0.05            —
Blacks                     0–1              0.02            —
Whites                     0–1              0.91            —
Hispanics                  0–1              0.009           —
Females                    0–1              0.47            —
Disabled                   0–1              0.02            —
ACT score                  3–36             22.8            5.02
HS rank                    1–99             70.9            23.24
From metro area            0–1              0.65            —
From out of state MN       0–1              0.16            —
From reciprocity state     0–1              0.14            —
From other US state        0–1              0.06            —
Enrollment age             15–39            18.3            1.2
Institute of Technology    0–1              0.21            —
General College            0–1              0.14            —
College of Liberal Arts    0–1              0.65            —
Cum GPA (a)                0–4.00           2.58            0.87
Athlete (a)                0–1              0.03            —
Transfer credits           1–39             13.5            9.7
Loan (a)                   40–3011          774             316
Earnings (a)               11–2734          603             466
Scholarship (a)            8–2430           458             528
Grants (a)                 18–1908          609             373
Workstudy (a)              3–4049           1955            1392

(a) Indicates possibly time-varying regressors.

² New High School students are students entering the University with fewer than 39 transfer credits. Some of these students may have taken college course work while in high school through the Postsecondary Educational Opportunity Program funded by the state of Minnesota.

As indicated in Table 1, the values of some of the independent variables may change from term to term. Also, the averages cited for the financial aid variables are conditional on receipt of an aid offer, and the transfer credit average includes only students with previous college credits (which is why the transfer credit and financial aid variables are italicized in Table 1). Another note about the financial aid variables is necessary. For the following analyses, the amount of aid by type offered to an individual is the relevant measure. Aid offered is used in an attempt to mitigate the self-selection (endogeneity) bias that results if financial aid paid is used.

4.2. The empirical model

As mentioned above, there are four different model specifications with respect to the outcome of interest: time to first stopout, time to dropout, time to "censored" dropout, and a competing risks model of the duration to first stopout and graduation. For each of the models estimated, the dependent variable is the duration until the relevant event (events, in the competing risks case). Thus, conditional on the event occurring, for each individual in the sample we know the time (the number of terms) to the relevant event(s).
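The contrast between the one-record-per-student layout used by the maximum likelihood programs and the person-period layout needed for logistic regression can be sketched as a simple expansion. This is an illustration under assumptions: the record fields (`student_id`, `duration`, `event`) are hypothetical, and a spell is taken to end either in the event or in censoring at its last observed term.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Spell:
    """One record per student: the layout the maximum likelihood programs used."""
    student_id: int
    duration: int   # number of terms observed until the event or censoring
    event: bool     # True if the event occurred in the last observed term


@dataclass
class PersonPeriod:
    """One record per student per term at risk: the layout logistic regression needs."""
    student_id: int
    term: int
    event: int      # 1 in the term the event occurs, 0 otherwise


def to_person_period(spells: List[Spell]) -> List[PersonPeriod]:
    """Expand each spell into one row per term in which the student is at risk."""
    rows: List[PersonPeriod] = []
    for s in spells:
        for term in range(1, s.duration + 1):
            rows.append(PersonPeriod(s.student_id, term, int(s.event and term == s.duration)))
    return rows
```

For example, a student whose event occurs in term 3 contributes three rows with event codes 0, 0, 1, while a student censored after two terms contributes two rows, both coded 0.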
Time to events, therefore, is the fundamental outcome of interest in each of the models estimated. As is the case for many discrete dependent variable models, an unobserved continuous variable representing the individual's utility level actually generates the discrete outcome of interest. It is assumed that each student makes a decision about continued enrollment by weighing the future costs and benefits of going to college. If the net benefits are negative (positive), the student exits (remains enrolled in) college. Thus, the assumption that students make rational utility calculations allows us to implicitly include student intentions, a factor found to be important in the student-departure literature (Tinto, 1975; Bean, 1978). That students base the choice of whether to stay in school on internal optimality conditions is an often overlooked but important point. "It is this choice component that distinguishes the econometric analysis of transition data from standard applied statistical analysis of survival and transition data and gives a richness but also an added complexity to econometric work" (Lancaster, 1990, p. 6). The vectors of regressors $z_1, \ldots, z_k$ specified in (3) include individual background, organizational, and environmental variables. The independent variables included in the models were chosen based on theoretical considerations and previous research on student departure. Individual background variables include race, gender, age, initial home location, and whether the student has a disability. Race is entered into the models through three dummy variables (Asian-American, African-American, and Chicano/Hispanic). The reference group is white students. American-Indian and international students were omitted because there were few of these students in the sample.
Race was included in the empirical specification because many studies of student departure have found that minority students tend to have higher probabilities of dropout and stopout, and lower probabilities of graduation, than majority students. Genuine racial differences may well exist, but these findings may also reflect insufficient control variables in the models used to date. Race was also included because little information is available about the time profile of college departure for minority students. Gender is specified by a dummy variable indicating whether the student is female. Over the years, conflicting results have been found with regard to the relationship between gender and college departure; this variable is therefore included to examine whether there are longitudinal differentials by gender. Age at the time of initial enrollment is also included as a regressor, and it is hypothesized that older students are more likely to stop out and drop out than traditional college students, given the likelihood of significant time constraints (jobs, family). A variable indicating whether the student is disabled is also included. No prior empirical research was found indicating whether disabled students were at higher risk of leaving college before graduation. During the 1980s, however, disabled students at the study institution had voiced concerns about physical access to buildings, classrooms, and special programs, so the inclusion of this variable seemed appropriate. Home location is included as a control and operationalized by a series of dummy variables indicating whether a student is from the Twin Cities metropolitan area, from greater Minnesota, or from a tuition reciprocity agreement state. The reference category consists of students who are non-Minnesotans and not from a reciprocity state.
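The reference-category coding described above for race and home location can be sketched as follows. The category labels are hypothetical stand-ins for the institutional codes; the reference category (white students; non-Minnesotans not from a reciprocity state) receives all-zero dummies, and excluded groups return `None` to mirror their omission from the sample.

```python
from typing import AbstractSet, Dict, List, Optional

# Hypothetical category labels; the reference category gets all-zero dummies.
RACE_DUMMIES = ["asian_american", "african_american", "chicano_hispanic"]
RACE_REFERENCE = "white"
RACE_EXCLUDED = {"american_indian", "international"}  # omitted from the sample

HOME_DUMMIES = ["twin_cities_metro", "greater_minnesota", "reciprocity_state"]
HOME_REFERENCE = "other_us_state"  # non-Minnesotan, not from a reciprocity state


def encode(value: str, dummies: List[str], reference: str,
           excluded: AbstractSet[str] = frozenset()) -> Optional[Dict[str, int]]:
    """Return 0/1 dummies for `value`, or None if the category is excluded."""
    if value in excluded:
        return None
    if value != reference and value not in dummies:
        raise ValueError(f"unknown category: {value}")
    # At most one dummy is 1; the reference category has none set.
    return {d: int(d == value) for d in dummies}
```

Leaving one category out avoids perfect collinearity with the constant term, so each dummy coefficient is read relative to the reference group.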
Research has shown that distance from campus to a student's home is associated with persistence (Ramist, 1981). Also, since students from tuition reciprocity agreement states receive discounted tuition relative to the normal non-resident tuition, we included the reciprocity controls to examine the effects that these agreements have on student departure. Precollege variables are also considered appropriate background variables and include a student's overall score on the ACT entrance exam and the student's high school rank percentile. Students with high scholastic aptitude are expected to have relatively more academic potential and to be less likely to exit before graduation (see Spady, 1970). It is also possible, however, that the ACT exam score and high school rank percentile (both typically used as ability proxies) measure different things. ACT scores measure ability within the pool of all entrance exam test takers, and if students with high ACT scores have more schooling options, they may be prone to leave an institution they perceive to be a bad academic fit. High school rank percentile reflects variation within one's high school and, after controlling for other ability measures, may also be thought of as a proxy for student effort. Also included is a variable indicating the number of transfer credits of University matriculants. Students who have had some prior college experience (they could have taken college course work while in high school) should be better able to adjust to college life, be more likely to become academically and socially integrated (Tinto, 1975), and therefore be more likely to persist and graduate than students entering college for the first time.
There is at least one alternative hypothesis, though: students who enter with previous college course work may be "movers" who are searching for the right institutional fit and are therefore more likely to leave. Institution-related variables are included to examine the effects of student interactions with the institution. The student's initial collegiate unit of enrollment is included to examine whether there are college-specific environmental factors that help to explain student departure. Cross-sectional designs have found that students in the Institute of Technology (IT) are less likely to drop out and more likely to graduate than College of Liberal Arts (CLA) students (Matross & DesJardins, 1994). General College (GC) students,³ on the other hand, appear to have lower chances of graduating and higher chances of dropping out than students enrolled in other collegiate units. It is not clear, however, that these subgroup results, garnered from aggregate graduation and retention rate data, will hold after accounting for other factors usually found to be related to student departure. Also, even though students may change collegiate units during their academic careers, we were only interested in their initial college of enrollment because most student departure at this institution takes place from these units. In future studies we will examine how transferring among colleges (allowing collegiate unit to vary by term) affects student departure decisions, especially at the upper division.

³ General College enrolls underprepared and other special needs students and prepares them for transfer to schools and colleges of the University and other higher education institutions. General College does not grant degrees.

A student's grade point average for each term of enrollment is calculated and included to control for variations in academic performance. Grade point average is also hypothesized to be the reward for successful academic achievement.
Financial aid offered is included for each term and disaggregated into its component parts: loans, scholarships, grants, workstudy earnings, and earnings from on-campus student employment other than workstudy. Typically, financial circumstances are considered environmental variables because they are outside the control of the institution (Bean, 1981). Since state and institutional policymakers do have some direct control over the way aid is distributed, aid variables are considered organizational in this model. For instance, grants are included separately from scholarships in order to examine whether these sources of aid affect student departure differently (St. John and Starkey, 1995, found that financial subsidies have differential effects). To examine these effects in more detail, we plan to disaggregate grants into federal and state components so that we can evaluate whether these aid packages have differential effects on student departure. Finally, a dummy variable is included indicating whether the student is an athlete during each term of enrollment. Athletes' rates of dropout and graduation have been a source of much discussion nationally and at the study institution. A variable distinguishing athletes from the general student population is therefore included in an effort to better understand the longitudinal nature of athletes' academic progress (Naughton, 1996).
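One way to picture how time-invariant background variables and the term-specific GPA, athlete status and aid components combine into the regressor vector $z_k$ for a given term is the sketch below. It uses a shortened, hypothetical list of variable names (the race, home location and collegiate unit dummies are left out for brevity) and assumes, as a labeled assumption, that an aid type not offered in a term is coded as 0; Table 1's aid averages are conditional on receipt, so the paper's own coding of non-recipients is not shown here.

```python
from typing import Dict, List

# Hypothetical names for a subset of the time-invariant background regressors.
STATIC = ["female", "disabled", "enroll_age", "act", "hs_rank", "transfer_credits"]
# Aid components offered in a term, entered separately rather than summed.
AID = ["loan", "scholarship", "grant", "workstudy", "earnings"]


def regressors_for_term(static: Dict[str, float],
                        by_term: Dict[int, Dict[str, float]], term: int) -> List[float]:
    """Build z_k for one student and term: background variables plus that term's GPA,
    athlete status, and each aid component offered (assumed 0 if none was offered)."""
    t = by_term[term]                   # KeyError if the student has no record for this term
    z = [static[name] for name in STATIC]
    z.append(t["gpa"])                  # term-specific grade point average
    z.append(t.get("athlete", 0.0))     # 1.0 if an athlete in this term
    z.extend(t.get(name, 0.0) for name in AID)
    return z
```

Because the aid components stay in separate positions of $z_k$, each receives its own coefficient in $\beta_k$, which is what allows grants and scholarships to have different estimated effects.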

5. The empirical results