C. Löfgren, H. Ohlsson / Economics of Education Review 18 (1999) 79–88
Table 2
Arithmetic means

                                            Uppsala                                         Umeå                                            Both
                                            Not         Pass        Pass with   Total       Not         Pass        Pass with   Total       Total
                                            completed               distinction             completed               distinction
Background
  Women, %                                  48          41          43          42          29          43          39          38          41
  Age, years                                26.5        25.6        24.7        25.5        27.9        27.0        26.4        26.8        25.8
  Science program, secondary school, %      35          33          41          35          57          17          30          33          35
  Grade point average, secondary school     3.7         3.7         4.1         3.8         3.5         3.7         3.7         3.7         3.8
  Study time, economics, years              3.9         2.5         2.2         2.7         2.2         2.0         2.4         2.2         2.6
  High grades, economics, %                 17          21          31          23          40          29          58          49          28
Study programs
  Public administration, %                  26          36          19          30          14          14          22          19          28
  Business economics, %                     35          25          30          28          29          43          44          40          30
  Social science, %                         9           6           14          8           14                      22          16          10
  International economics, %                9           6           11          8           –           –           –           –           6
  Other programs, %                         9           6           14          8                                                           7
  Single subject courses, %                 13          21          14          18          43          43          13          24          19
Thesis
  Coauthored, %                             17          44          60          44          29          57                      41          43
  D-level thesis, %                         35          13          14          17          29                      22          19          17
  Applied econometrics, %                               13          24          14          –           –           –           –           11
  Spring 1993, %                            30          32          24          30          29          29          9           16          27
Number of students                          23          84          37          144         7           7           23          37          181
Number of students, grade point average     19          72          36          127         6           4           22          32          159
bility of using the summer vacation for thesis work, which would shorten the time for spring students compared with the fall students.

Table 2 displays means of the variables at the two universities. The numbers are similar. According to the table, higher thesis grades seem to be associated with younger students, higher previous grades, coauthoring, and C-theses. One variable stands out more than the others: coauthors constitute a far smaller share of the students not completing their theses than of the total group of students. In Uppsala only 17% of noncompleters were coauthors, while 44% of students who passed and 60% of those who passed with distinction came from this group. The pattern in Umeå is similar.³ The students in the applied econometrics course in Uppsala also seem to come out better than other groups of students, as far as the averages in Table 2 can lead us. The data in the table must be interpreted with great caution: specific cells may represent a small number of students, particularly for Umeå.
4. Econometric specification
Our data have two drawbacks. First, we do not have information on the study intensity of students. We would
³ In Sweden the grade given to a thesis can be pass with distinction, pass, or fail. A fail is equivalent to an uncompleted thesis.
Table 3
Semester when the thesis is completed

Semester    Number of students that                   Hazard rate, %
            complete    could potentially complete
1           85          181                           47
2           38          96                            40
3           14          58                            24
4           6           44                            14
5           5           38                            13
6           1           33                            3
7           2           32                            6
Total       151         482                           31
>7          ?           30                            ?
have preferred to have information about the total number of days and the total number of hours spent on the thesis work. Bearing this in mind, we assume that study intensity is constant across students, or at least does not vary systematically with regard to the explanatory variables.

Second, our data are discrete. We know during which semester the thesis work was completed, but we do not know the exact date of completion. Grouped data regression analysis (e.g., Brännäs, 1987) and discrete time hazard analysis (e.g., Allison, 1984) are two empirical methods that can be used to study data of this type. These methods provide a measurement of the probability of both the occurrence and the timing of an event. The empirical analysis presented here is based on event history analysis.⁴
A time profile of thesis completion is given in Table 3. A data set consisting of 482 cases has been constructed in the following way: The first 181 cases consist of the total number of observed students. They were all potential completers in their first semester of thesis work. Less than half of them actually completed during the first semester (the hazard rate is 47%), leaving 96 students to potentially complete during the second semester. Of these, 38 did complete, leaving 58 potential completers for the third semester. Over the seven semesters the sum of potential completers in each semester is 482, which constitutes our data set.
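The arithmetic behind this construction can be checked with a short script. The following Python sketch is only an illustration, not the authors' procedure: it starts from the completers per semester reported in Table 3 and recomputes the number of potential completers, the hazard rates, and the total number of cases. The function and variable names are hypothetical.

```python
# Completers per semester (semesters 1 to 7), as reported in Table 3.
completers = [85, 38, 14, 6, 5, 1, 2]
n_students = 181  # all students are potential completers in semester 1


def risk_sets_and_hazards(completers, n_students):
    """Return the potential completers and the hazard rate for each semester."""
    at_risk, hazards = [], []
    remaining = n_students
    for done in completers:
        at_risk.append(remaining)          # students who could still complete
        hazards.append(done / remaining)   # share of them who do complete
        remaining -= done                  # the rest move on to the next semester
    return at_risk, hazards


at_risk, hazards = risk_sets_and_hazards(completers, n_students)

for t, (r, d, h) in enumerate(zip(at_risk, completers, hazards), start=1):
    print(f"semester {t}: potential completers {r}, completers {d}, hazard {h:.0%}")

print("total cases:", sum(at_risk))
print("not completed after semester 7:", at_risk[-1] - completers[-1])
```

The total of the potential completers is the size of the data set, and the students left after the seventh semester are those whose completion is not observed.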
Among the first 181 cases in the data set, the dependent variable is coded 1 for the 85 completers and 0 for the 96 noncompleters. These noncompleters make up the next 96 cases, for which the dependent variable is coded 1 for the 38 completers and 0 for the 58 noncompleters in the second semester. Working this way up to the seventh semester gives a total of 151 cases coded 1 for the dependent variable, which equals the number of students completing their thesis during the observed period. In this way the data set gives information on whether a student has completed his/her thesis, and on the time it has taken to complete. A student who completed during the first semester is represented only among the first 181 cases, thereby contributing 1 “person-semester” to the data set. A student completing during the third semester is represented among the first 181 and the second 96 cases, with a 0 for the dependent variable in both, as well as among the third 58 cases, with a 1 for the dependent variable, thereby contributing 3 “person-semesters” to the data set.

⁴ We have also estimated grouped data regression models. The results are similar to those presented in this paper and are available on request.
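The expansion into person-semesters described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the individual student records are not reproduced in the paper, so the sketch builds synthetic students from the counts in Table 3, and the names `students` and `expand_to_person_semesters` are hypothetical.

```python
# Completers per semester from Table 3; 30 students had not completed after 7 semesters.
completers = [85, 38, 14, 6, 5, 1, 2]
n_not_completed = 30
max_semester = len(completers)

# Synthetic records (an assumption): one entry per student holding the semester of
# completion, or None if the thesis was not completed during the observed period.
students = []
for semester, count in enumerate(completers, start=1):
    students.extend([semester] * count)
students.extend([None] * n_not_completed)


def expand_to_person_semesters(students, max_semester):
    """Return one (student_id, semester, y) row per semester a student is at risk."""
    rows = []
    for student_id, done_in in enumerate(students):
        last = max_semester if done_in is None else done_in
        for t in range(1, last + 1):
            # y is 1 only in the semester of completion, 0 otherwise.
            rows.append((student_id, t, int(t == done_in)))
    return rows


rows = expand_to_person_semesters(students, max_semester)
print(len(rows))                    # 482 person-semesters
print(sum(y for _, _, y in rows))   # 151 cases coded 1
```

A student who completes in the third semester appears in three rows, coded 0, 0, and 1, while a student who never completes appears in seven rows, all coded 0.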
The last column of Table 3 shows that the hazard rate decreases over time. Less than 10% of the remaining students complete during the 6th and the 7th semesters. We do not know if, and if so when, the remaining 30 students will complete.

The following model has been estimated:
For the dependent variable, $y_{it}$, we have

\[
y_{it} =
\begin{cases}
1 & \text{if student } i \text{ has completed during semester } t, \\
0 & \text{if student } i \text{ has not completed during semester } t.
\end{cases}
\]

The hazard rate, $P_{it}$, is the probability that student $i$ completes his/her thesis in semester $t$ given that the thesis is not completed in the preceding semester, or

\[
P_{it} = \Pr(y_{it} = 1 \mid y_{i,t-1} = 0).
\]

Assume that there is a continuous index $Z_{it}$ that represents the possibilities to complete the thesis for each of the students in semester $t$ who did not complete before. These possibilities are determined both by preferences and ability. Suppose also that $Z_{it}$ is a linear function of the explanatory variables $x_{it}$. We have

\[
Z_{it} = a + b x_{it}.
\]

Assume also that the larger $Z_{it}$, the larger the student’s possibilities to complete in semester $t$. Say more specifically that if $Z > 0$ the thesis will be completed. This means that $P_{it} = F(Z_{it})$. Let $F$ be a cumulative logistic distribution function. Then

\[
P_{it} = \frac{1}{1 + e^{-(a + b x_{it})}}.
\]

This results in the following likelihood function:

\[
L = \prod_{i=1}^{n} \left( \frac{1}{1 + e^{-(a + b x_{it})}} \right)^{y_{it}} \left( \frac{1}{1 + e^{a + b x_{it}}} \right)^{1 - y_{it}},
\]

which has been maximized with respect to the parameters $a$ and $b$ using the LIMDEP package (Greene, 1995). The estimation results are presented in Section 5.
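To illustrate how a likelihood of this form can be maximized, the following Python sketch fits the logistic hazard by Newton-Raphson. It is not the LIMDEP estimation reported in the paper. It assumes, purely for illustration, a single explanatory variable equal to the semester number, and it uses the grouped counts of Table 3 instead of the individual records, so the resulting values of a and b are not the paper's estimates. All function names are hypothetical.

```python
import math

# Grouped person-semester data from Table 3.
semesters = [1, 2, 3, 4, 5, 6, 7]          # assumed explanatory variable x
at_risk = [181, 96, 58, 44, 38, 33, 32]    # potential completers per semester
completers = [85, 38, 14, 6, 5, 1, 2]      # cases coded y = 1 per semester


def hazard(a, b, x):
    """Logistic hazard P = 1 / (1 + exp(-(a + b*x)))."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))


def log_likelihood(a, b):
    """Log of the likelihood L, summed over all person-semesters."""
    total = 0.0
    for x, n, d in zip(semesters, at_risk, completers):
        p = hazard(a, b, x)
        total += d * math.log(p) + (n - d) * math.log(1.0 - p)
    return total


def fit_logit(max_iter=50, tol=1e-10):
    """Maximize the log-likelihood over (a, b) by Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, n, d in zip(semesters, at_risk, completers):
            p = hazard(a, b, x)
            resid = d - n * p           # observed minus expected completers
            w = n * p * (1.0 - p)       # weight in the negative Hessian
            g0 += resid
            g1 += resid * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Solve the 2x2 system (negative Hessian) * step = gradient.
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    return a, b


a_hat, b_hat = fit_logit()
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}")
print(f"log-likelihood = {log_likelihood(a_hat, b_hat):.2f}")
```

With the semester number as the only regressor, a negative b corresponds to the declining hazard rate in Table 3. The paper's own specification uses the student-level explanatory variables, which would enter the same likelihood as additional terms in the index.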
5. Evidence