
Socio-Economic Planning Sciences 34 (2000) 271–284
www.elsevier.com/locate/dsw

Predicting criminal recidivism using neural networks
Susan W. Palocsay*, Ping Wang, Robert G. Brookshire
Computer Information Systems/Operations Management Program, MSC 0202, James Madison University,
Harrisonburg, VA 22807, USA

Abstract
Prediction of criminal recidivism has been extensively studied in criminology with a variety of
statistical models. This article proposes the use of neural network (NN) models to address the problem
of splitting the population into two groups, non-recidivists and eventual recidivists, based on a set
of predictor variables. The results from an empirical study of the classification capabilities of NN on a
well-known recidivism data set are presented and discussed in comparison with logistic regression.
Analysis indicates that NN models are competitive with, and may offer some advantages over,
traditional statistical models in this domain. © 2000 Elsevier Science Ltd. All rights reserved.

1. Introduction
The development of effective methods for predicting whether an individual released from
prison eventually returns or not is a major concern in criminology. A simple model for
predicting parole outcomes was proposed as early as 1928 [4], and was followed by the
introduction of a variety of statistical models for classifying recidivists (see [6] for an historical
overview). In general, the overall performance of these models has been considered weak due
to their high error rates (false-negative and false-positive rates above 50% [11]) and limited
explanatory power [2,29]. Researchers have thus continued to look for new and/or improved
predictive models in this area.
Of particular interest in the recidivism literature are the `split population' survival-time
models recently developed by Schmidt and Witte [30,31]. These models estimate both the
probability that an individual eventually returns to prison and the probability distribution of
* Corresponding author.
0038-0121/00/$ - see front matter © 2000 Elsevier Science Ltd. All rights reserved.
PII: S0038-0121(00)00003-3

the time until return for those who are expected to return. Schmidt and Witte were able to
obtain higher predictive accuracy with these models, in terms of lower false-positive and
false-negative rates, than those previously reported in the literature.
Schmidt and Witte have also been outspoken about the need for the development of
improved statistical models for criminal-justice prediction [29]. They have submitted their work
as evidence of the benefits from continuing the search for more sophisticated models for
recidivism prediction, in spite of their limited explanatory ability. While they acknowledge that
attention should also be given to better determination of the individual characteristics of
recidivists as described in [1,5,8,18,21], Schmidt and Witte stress that there is still a need for
research that leads to improvements in predictive ability using available explanatory variables.
In a discussion of realistic goals for this research, they make the statement that `The ultimate
test of prediction research is not variance explained, but rather ability to predict' [29, p. 267].
In this paper, we approach the problem of predicting recidivism using artificial neural
networks (NNs) [26,40,41]. Numerous studies have shown NNs to be a viable alternative to
conventional statistical models for classification problems (see references in [33]). In certain
applications, NNs have performed at least as well as, and often better than, some traditional
statistical methods, including logistic regression (LR), when compared on the degree of
prediction accuracy (e.g. [9,14,17,28,36–38]). NNs are appealing because they are able,
theoretically, to approximate any nonlinear form, yet do not require specification of a
nonlinear model prior to analyzing the data. They also demonstrate a robust ability to make
reasonable predictions for previously unseen inputs after `learning' from example data, even in
the presence of significant noise [27].
Although multivariate statistical procedures have been more commonly used by social
scientists, the use of NN models for the analysis of social-science data is not new [13]. Two
studies reported in the literature specifically addressed the problem of criminal recidivism
prediction with NNs. In the more recent of these studies, Caulkins et al. [6] showed that, on a
certain data set, NNs do not offer any improvements over multiple regression for predicting
criminal recidivism. Additional analysis of their data set indicated that there was a lack of
information on the predictor variables that appeared to limit the performance of both models.
Their work also provides a good introduction to the application and appropriateness of NNs
in this domain, including a general overview of NNs for readers who are unfamiliar with their
underlying process. In the other study, Brodzinski et al. [3] compared NNs to discriminant
analysis using 778 probation cases (390 in the construction sample and 388 in the validation
sample). They obtained very impressive results on the validation data (99% classification
accuracy) using an NN. Importantly, they invested a great deal of effort in developing the
study data set by using local court administrators and probation officers to select the risk
factors prior to carefully coding the data from case files. Their work provides a good example
of the benefits of collecting extensive behavioral data on release cohorts prior to model
estimation.
Based on the recognized need for a robust model to aid in criminal-justice decision-making,
more research is needed to evaluate the potential contribution of NNs. This article presents the
results from a study of the classification ability of NNs, in comparison with LR, for criminal
recidivism prediction. An NN model is developed employing the same data sets used by
Schmidt and Witte [30,31] and Chung et al. [7] in the development of survival-time models.



These data were selected because they are well known in this domain and have been extensively
analyzed using a variety of statistical methods [10]. They thus provide a benchmark data set
for validating new methods. Comparative statistical analyses reported in this study suggest that
NNs can successfully compete with LR and may be better able to identify recidivists.

2. Data
The data used by Schmidt and Witte [30,31] and Chung et al. [7] for survival-time analysis
were obtained from the Inter-university Consortium for Political and Social Research [32]. The
criminal recidivism data originally contained information on two sets of releasees from North
Carolina prisons: 9457 individuals released from 1 July, 1977 to 30 June, 1978 (referred to as
the 1978 data set), and 9679 individuals released from 1 July, 1979 to 30 June, 1980 (referred
to as the 1980 data set). For comparative purposes, we used the analysis and validation data
sets in [30–32], where defective (130 in each data set) and incomplete (4709 in the 1978 data
and 3810 in the 1980 data) records were removed, for NN training and testing, respectively. A
subset of the analysis data was randomly selected for monitoring the network training, as
discussed in the next section. The total number of releasee records in each of these data sets is
provided in Table 1.
For recidivism prediction, the output or dependent variable is equal to 1 if the individual
returned to a North Carolina prison, and 0 if not. The input data consist of nine
explanatory variables as identified and defined in [30–32], where `sample sentence' refers to the
prison sentence from which individuals were released. The six binary-coded variables are:
whether the individual was African-American or not; if the individual had a past serious
alcohol problem; if the individual had a history of using hard drugs; whether the sample
sentence was for a felony or misdemeanor; whether the sample sentence was for a crime
against property or not; and the individual's gender. There are three non-binary input
variables: the number of previous incarcerations, not including the sample sentence; the age (in
months) at the time of release; and the time served (in months) for the sample sentence.
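To make the coding concrete, the nine inputs above can be sketched as follows. This is an illustrative reconstruction, not the study's actual preprocessing code; the field names and scaling ranges are invented, and the linear scaling of the three non-binary variables anticipates the treatment described in Section 3.

```python
# Hypothetical sketch of the nine-variable input coding described above.
# Field names and (min, max) ranges are invented for illustration; the
# ICPSR data set uses its own variable labels.

def encode_releasee(record, num_ranges):
    """Map one releasee record to the nine model inputs: six 0/1 flags
    passed through unchanged, plus three numeric variables linearly
    scaled to [0, 1] using ranges observed in the training data."""
    def scale(value, lo, hi):
        return (value - lo) / (hi - lo)

    return [
        record["black"],       # African-American or not
        record["alcohol"],     # past serious alcohol problem
        record["drugs"],       # history of hard-drug use
        record["felony"],      # sample sentence a felony (vs. misdemeanor)
        record["property"],    # crime against property or not
        record["male"],        # gender
        scale(record["priors"], *num_ranges["priors"]),            # prior incarcerations
        scale(record["age"], *num_ranges["age"]),                  # age (months) at release
        scale(record["time_served"], *num_ranges["time_served"]),  # months served
    ]

# One record with plausible (invented) values.
ranges = {"priors": (0, 30), "age": (192, 900), "time_served": (0, 240)}
releasee = {"black": 1, "alcohol": 0, "drugs": 1, "felony": 1,
            "property": 1, "male": 1, "priors": 3, "age": 300, "time_served": 24}
inputs = encode_releasee(releasee, ranges)
```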

Table 1
Composition of data sets

Data set          Total records   Recidivists/non-recidivists
1978 Training     1357            505/852
1978 Monitoring   183             65/118
1978 Test         3078            1151/1927
1980 Training     1263            463/800
1980 Monitoring   172             67/105
1980 Test         4304            1615/2689


3. Neural network model development
The NN model we selected for this study was the widely used multi-layer, feedforward
backpropagation network. Nine input nodes corresponding to the nine explanatory variables in
the data were connected to a hidden layer of nodes. These, in turn, were connected to a single
output node, whose value was used to classify the releasee as either an individual who returns
to prison or one who does not. Logistic activation functions were used for all hidden and
output nodes, and a linear scaling function was applied to the values of the non-binary input
variables. All NN models were built using NeuroShell 2 from Ward Systems Group, Inc.

Training an NN involves repeatedly presenting the same set of training data to the network
and iteratively adjusting weights associated with the network's inter-node connections. The
objective is to find a set of weights that minimizes a total error function based on a
comparison of the network output values to the desired outputs for the training data. The
most popular algorithm for updating the weights is backpropagation, which is based on the
principle of gradient descent with the goal of minimizing the sum of the squared errors for the
training data [15,26,40,41]. The response of the net to errors during training is regulated by
two parameters, learning rate and momentum. We used an alternative backpropagation
algorithm in NeuroShell 2, similar to RPROP [24]. It is a method for updating network
weights that dynamically adjusts the size of each weight change during training using local
gradient information.
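The sign-based, per-weight step adaptation behind RPROP-style updating can be illustrated on a one-dimensional error surface. This is a generic sketch using the usual RPROP default constants, not NeuroShell 2's proprietary algorithm:

```python
# Minimal RPROP-style minimization of the toy error E(w) = (w - 2)^2.
# Only the sign of the gradient is used to move w; the per-weight step
# grows when the gradient keeps its sign and shrinks when it flips.
# Constants (1.2, 0.5, step bounds) are standard RPROP defaults, not
# values taken from the paper.

def rprop_minimize(grad, w0, step0=0.1, up=1.2, down=0.5,
                   step_min=1e-6, step_max=50.0, iters=100):
    w, step, prev_g = w0, step0, 0.0
    for _ in range(iters):
        g = grad(w)
        if g * prev_g > 0:        # same sign: accelerate
            step = min(step * up, step_max)
        elif g * prev_g < 0:      # sign flip: overshot, so shrink the step
            step = max(step * down, step_min)
        if g > 0:                 # move opposite to the gradient sign
            w -= step
        elif g < 0:
            w += step
        prev_g = g
    return w

w = rprop_minimize(lambda w: 2 * (w - 2), w0=10.0)
```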
An issue encountered in NN development is when to stop training; i.e. determining when a
network has been sufficiently trained on a set of examples to be able to generalize to new,
never-before-seen cases. The learning algorithm seeks to minimize the sum of the squared errors
generated on the training data, but it is not guaranteed to find a global minimum or even a
local minimum. And, while an NN with at least one hidden layer and enough hidden nodes
can be trained until it correctly classifies all the example cases in almost any training-data set
[39], this can result in overfitting and, thus, development of a net that does not perform well
when presented with cases not included in the training set.
To address this issue, NeuroShell 2 implements an option that creates an entirely separate
set of `monitoring' data and uses it during training to evaluate how well the network is
predicting. NeuroShell 2 automatically computes the optimum point to save the network based
on its performance on this monitoring data set. Several studies have provided strong support
for this approach [27] to developing an NN model with good generalization capabilities (see
e.g. [19,20]). In the current case, we used a monitoring data set that is approximately 12% of
the size of the training set.
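The save-at-optimum logic can be sketched as follows. The monitoring-error curve here is synthetic, and the function illustrates only the bookkeeping, not NeuroShell 2's actual implementation:

```python
# Sketch of monitoring-set early stopping: keep training, but record the
# epoch at which error on the held-out monitoring set was lowest; that
# is the network one would save. The error curve below is an invented,
# typical U-shape: improvement followed by overfitting.

def train_with_monitoring(monitor_errors):
    """Return (best_epoch, best_error), the point at which a
    monitoring-based scheme would save the network."""
    best_epoch, best_error = 0, float("inf")
    for epoch, err in enumerate(monitor_errors):
        if err < best_error:          # monitoring error improved: save here
            best_epoch, best_error = epoch, err
    return best_epoch, best_error

curve = [0.40, 0.33, 0.29, 0.27, 0.26, 0.265, 0.28, 0.31]
epoch, err = train_with_monitoring(curve)
```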

4. Experimental results
4.1. Model selection and classification accuracy
In order to identify the best configuration for the NN model, we varied the number of nodes
in the hidden layer from 5 to 50 and analyzed the training and test results for each network.
For evaluation, individuals with NN output values of 0.5 or greater were predicted to be
recidivists while those with values less than 0.5 were predicted to be non-recidivists, as in [16].
The results for all experiments were recorded in terms of the percentage of recidivists correctly
classified as recidivists, the percentage of non-recidivists correctly classified as non-recidivists,
and the total percentage of correct classifications. Table 2 shows the results from training and
testing with the 1978 data for the 10 best network configurations, rank-ordered by overall
accuracy on the test data. Although the 39-hidden-node network had the highest percentage of
test-set correct classifications (69.20%), the 26-node network performed almost as well (69.17%
overall) with a considerably smaller network configuration. We thus selected this network for
further experimentation.
Since the initial values for the weights on NN connections are randomized, we trained
26-hidden-node networks on the 1978 and 1980 training sets using 50 different random number
seeds. The overall performance varied from 66.44 to 69.23% with an average of 68.39% total
correct classifications on the 1978 test set, and from 64.22 to 66.98% with an average of
65.90% on the 1980 test set. The complete training and test results for the NN with the highest
percentage of correct classifications on the test set are reported in Tables 3 and 4. Schmidt and
Witte [30,31] found that the best split population model for this data was a logit lognormal
model, where the probability of recidivism is assumed to follow a logit model and the timing of
return is lognormally distributed. For comparison purposes, Tables 3 and 4 also provide the
results from fitting an LR model to the 1978 and 1980 training-data sets and using the
regression coefficients to predict recidivism for the test data. We considered the effects of
stepwise introduction of interaction terms among the variables but found that this provided no
substantial improvement in the LR model classification results for either training data set and
actually reduced classification accuracy on the test sets.
4.2. Further analysis of predictive capabilities
To further evaluate the predictive accuracy of our models, we applied the NN that was
trained on the 1978 data to the 1980 test data. These classification results are reported in
Table 4, with the corresponding LR model results included for purposes of comparison.
As the results in Tables 3 and 4 show, the NN models achieved a higher total percentage of
correct classifications and were also more successful in predicting recidivism for both the
training and test cases. Tables 5 and 6 present measures of association that compare the fit of
the 26-node NNs and LR models to the training and test data, respectively. The odds ratios
[22] are the ratios of the odds of being a recidivist for those who were predicted to be
recidivists to the odds of being a recidivist for those who were not predicted to be recidivists.
The odds ratio ranges from zero to infinity, with values close to one indicating no relationship,
i.e. equivalent odds. Values higher than 1 indicate more successful prediction. Yule's Q [22] is a
measure of association based on the odds ratio and ranges between -1.00 and 1.00, with zero
indicating no relationship and values closer to 1.00 indicating a more successful prediction.
Relative improvement over chance (RIOC) [18], a measure frequently used in recidivism
research, indicates `the percentage of persons correctly predicted in relation to the maximum
percentage of persons who could possibly have been correctly predicted' [12, p. 202].
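For a 2 × 2 prediction table, the three measures can be computed as sketched below. The RIOC computation follows Farrington and Loeber's marginal-based definition [12]; this is our reconstruction for illustration, not the code used in the study, and the cell counts are invented:

```python
# Association measures for a 2x2 prediction table with cells:
# tp = predicted recidivist & was, fp = predicted & wasn't,
# fn = not predicted & was,     tn = not predicted & wasn't.

def association_measures(tp, fp, fn, tn):
    n = tp + fp + fn + tn
    # odds(recidivist | predicted recidivist) / odds(recidivist | predicted non)
    odds_ratio = (tp * tn) / (fp * fn)
    yules_q = (tp * tn - fp * fn) / (tp * tn + fp * fn)
    # RIOC = (actual correct - chance correct) / (max correct - chance correct),
    # with chance and maximum correct taken from the table marginals
    # (a reconstruction of Farrington & Loeber's definition).
    pred_pos, pred_neg = tp + fp, fn + tn
    act_pos, act_neg = tp + fn, fp + tn
    chance = (pred_pos * act_pos + pred_neg * act_neg) / n
    maximum = min(pred_pos, act_pos) + min(pred_neg, act_neg)
    actual = tp + tn
    rioc = (actual - chance) / (maximum - chance)
    return odds_ratio, yules_q, rioc

or_, q, rioc = association_measures(tp=40, fp=20, fn=10, tn=30)
```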
The measures of association in Table 5 indicate that the NN fit the 1980 training cases better
than did LR. Also, although the measures for LR are slightly higher on the 1978 training data,

Table 2
Results for different neural network configurations on 1978 data

                 1978 Training results                          1978 Test results
Hidden   Recidivist    Non-recidivist   Total         Recidivist    Non-recidivist   Total
nodes    correct (%)   correct (%)      correct (%)   correct (%)   correct (%)      correct (%)
39       35.79         87.42            68.31         37.97         87.86            69.20
26       36.49         88.04            68.96         38.84         87.29            69.17
20       35.26         89.28            69.29         37.97         87.65            69.07
30       38.95         86.60            68.96         40.66         85.78            68.91
33       37.02         87.01            68.51         39.88         86.25            68.91
13       36.14         88.97            69.42         38.58         86.92            68.84
29       34.56         89.90            69.42         35.10         88.89            68.78
38       38.07         87.01            68.90         40.05         85.78            68.68
28       39.47         85.57            68.51         41.96         84.59            68.65
44       37.72         86.91            68.70         40.40         85.52            68.65

Table 3
Classification accuracy for training data

Model                       Recidivist correct (%)   Non-recidivist correct (%)   Total correct (%)
1978 Neural network         38.60                    86.08                        68.51
1978 Logistic regression    31.23                    89.69                        68.05
1980 Neural network         47.93                    85.30                        71.50
1980 Logistic regression    35.09                    87.51                        68.15

Table 4
Classification accuracy for test data

Model                            Recidivist correct (%)   Non-recidivist correct (%)   Total correct (%)
1978 Neural network              41.36                    85.89                        69.23
1978 Logistic regression         30.41                    88.43                        66.73
1980 Neural network              40.93                    82.63                        66.98
1980 Logistic regression         30.53                    86.84                        65.71
1978/1980 Neural network         39.01                    82.15                        65.96
1978/1980 Logistic regression    36.35                    81.07                        64.29

Table 5
Measures of association for training results

Model                       Odds ratio   Yule's Q   RIOC
1978 Neural network         3.887        0.591      0.396
1978 Logistic regression    3.951        0.596      0.429
1980 Neural network         5.342        0.684      0.455
1980 Logistic regression    3.790        0.582      0.401

Table 6
Measures of association for test results

Model                            Odds ratio   Yule's Q   RIOC
1978 Neural network              4.291        0.622      0.419
1978 Logistic regression         3.325        0.537      0.377
1980 Neural network              3.297        0.534      0.337
1980 Logistic regression         2.898        0.486      0.331
1978/1980 Neural network         2.944        0.492      0.308
1978/1980 Logistic regression    2.446        0.419      0.257


the total percentages of correct classifications from the two models are virtually identical on
these data. All the measures in Table 6 are consistent, indicating that the NNs provided a
better fit to the test data than did LR.
McNemar's test [34] was used to compare the predictive accuracy of the NN and LR models
for the test data. For all subjects in the 1978 and 1980 test sets, the NNs developed with the
corresponding training set predicted significantly more outcomes successfully than did LR
(χ² = 14.623, P < 0.001 for the 1978 data and χ² = 4.078, P = 0.043 for the 1980 data). When
both models were trained using the 1978 data and asked to predict the 1980 data, the NN also
correctly predicted a higher percentage of all cases. McNemar's test was again significant
(χ² = 7.732, P = 0.005). For the recidivists alone, all of the NN models (1978, 1980, and
1978/1980) likewise successfully predicted more outcomes than did the related LR models
(χ² = 69.754, P < 0.001; χ² = 78.782, P < 0.001; χ² = 5.784, P = 0.016). The 1978 and 1980
LR models were, however, significantly better at predicting the non-recidivists in the test data
(χ² = 13.474, P < 0.001 and χ² = 34.748, P < 0.001, respectively). On the other hand, the NN
developed with the 1978 training data and applied to the 1980 test data was not significantly
worse with the non-recidivists (χ² = 2.259, P = 0.133).
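McNemar's test operates on the discordant pairs, i.e. the cases that exactly one of the two models classified correctly. A sketch using the continuity-corrected statistic as given by Siegel [34]; the counts below are invented, and we make no claim that this reproduces the exact values reported above:

```python
# McNemar's test on paired predictions:
#   b = cases only model A classified correctly,
#   c = cases only model B classified correctly.
# Uses Yates' continuity correction; the p-value is the survival
# function of a chi-square distribution with 1 df, evaluated via erfc.
import math

def mcnemar(b, c):
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    p = math.erfc(math.sqrt(chi2 / 2))   # P(chi2_1df > chi2)
    return chi2, p

chi2, p = mcnemar(b=60, c=30)   # invented discordant-pair counts
```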
Researchers such as Tam and Kiang [37] have pointed out that a disadvantage of NNs is
their lack of explanatory capability in comparison to traditional statistical models. While there
is no formal method for interpreting NN weights, Garson [13] has proposed a simple heuristic
for assessing the relative contribution of the input variables in determining network
predictions. The underlying idea in Garson's method is to find the relative percentage of the
network's output associated with each input node by `partitioning' the weights from the input
layer to the hidden layer and from the hidden layer to the output layer. Table 7 displays the
relative input-node share percentages for our NNs. The standardized estimates of the LR
coefficients are also shown. For comparison, the relative ranking of each input variable is
indicated in parentheses, with the ranking for LR based on the absolute value of the
coefficient. All models emphasize the number of prior incarcerations (not including the sample

Table 7
Neural network input node shares (%) and standardized regression coefficients

                         1978 Training                            1980 Training
Variable                 Neural network   Logistic regression^a   Neural network   Logistic regression^b
Race                     4.00 (5)         -0.1573 (5)             1.64 (7)         -0.1009 (6)
Alcohol problems         3.75 (8)         0.1146 (6)              6.19 (4)         0.1130 (5)
Drug user                3.92 (6/7)       0.0935 (9)              3.16 (5)         0.0839 (7)
Felony or misdemeanor    3.92 (6/7)       -0.1904 (4)             2.90 (6)         -0.0611 (9)
Property crime           1.03 (9)         0.1093 (8)              1.23 (8)         0.1367 (4)
Gender                   4.31 (4)         0.1095 (7)              0.80 (9)         0.0772 (8)
Prior incarcerations     28.63 (2)        0.2004 (3)              45.88 (1)        0.3233 (2)
Age                      31.08 (1)        -0.2660 (1)             14.32 (2)        -0.3747 (1)
Time served              19.36 (3)        0.2191 (2)              13.81 (3)        0.2262 (3)

a Significant at 0.01 level.
b Significant at 0.05 level.


sentence), age at the time of release, and the time served for the sample sentence in determining
the classification of an individual. In the 1978 training data, the binary variable that indicates
whether the sample sentence was for a crime against property or not was considered relatively
unimportant by both the NN and LR. Similarly, both models ranked the binary variable for
an individual's gender as the least or the next-to-least important input in training on the 1980
data.
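Garson's partitioning heuristic can be sketched directly from the two weight matrices. The toy network below has three inputs and two hidden nodes; its weights are invented, not those of our 26-hidden-node networks:

```python
# Sketch of Garson's weight-partitioning heuristic: relative share (%)
# of each input in the network's output, from the input->hidden weight
# matrix W (one row per input) and hidden->output weight vector v.

def garson_shares(W, v):
    n_in, n_hid = len(W), len(v)
    # absolute contribution of input i through hidden node j
    contrib = [[abs(W[i][j]) * abs(v[j]) for j in range(n_hid)]
               for i in range(n_in)]
    # partition each hidden node's total contribution among the inputs
    shares = [0.0] * n_in
    for j in range(n_hid):
        col_sum = sum(contrib[i][j] for i in range(n_in))
        for i in range(n_in):
            shares[i] += contrib[i][j] / col_sum
    total = sum(shares)
    return [100 * s / total for s in shares]   # percentages summing to 100

W = [[0.5, -1.0],    # weights from input 0 to the two hidden nodes
     [2.0,  0.5],    # input 1
     [-0.5, 0.5]]    # input 2
v = [1.0, -2.0]      # hidden-to-output weights
pct = garson_shares(W, v)
```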
4.3. Effect of cut-off value on classification results
As a final step in our study, we examined the effect of varying the 0.5 cut-off value on
classification accuracy. An output computed from the LR model corresponds to the posterior
probability, the conditional probability of recidivism given the individual's independent
variable values [16]. The outputs produced by the NN (with one output node using a logistic
activation function) are also numerical values between 0 and 1 that can thus be interpreted as
posterior probabilities [23,25]. However, the classification of individuals as either
non-recidivists or recidivists using both models depends on the cut-off value that is used, as
discussed in [35].

Fig. 1. Effect of cut-off value on 1978 test results (using 1978-trained models).


Fig. 1 compares the total percentage of correct classifications made on the 1978 test data
with the 1978-trained NN and LR models, using cut-off values between 0.3 and 0.7. The NN
maintains its superior performance with approximately the same margin over the entire range.
A similar pattern occurs for classification results on the 1980 test data, using both the
1980-trained and the 1978-trained models, as shown in Figs. 2 and 3, respectively.
The graphs in Figs. 1–3 show that a cut-off of 0.5 provides the highest, or very close to the
highest, overall classification accuracy on both data sets. However, the choice of a particular
cut-off value directly affects the percentage of recidivists that are correctly classified as
recidivists and the percentage of non-recidivists that are correctly classified as non-recidivists.
As expected, higher cut-off values reduced the classification accuracy on non-recidivists but
increased the accuracy on recidivists for both models. It is noteworthy that the NN offers the
same flexibility as LR in allowing adjustment of the cut-off value to reflect the preferences of
the model's users.
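The sweep underlying Figs. 1–3 reduces to re-thresholding a fixed set of posterior-probability outputs. A sketch with invented outputs and labels:

```python
# Overall classification accuracy of fixed probability outputs against
# true labels, swept over cut-off values between 0.3 and 0.7.
# The outputs and labels below are invented for illustration.

def accuracy_at_cutoff(probs, labels, cutoff):
    correct = sum((p >= cutoff) == bool(y) for p, y in zip(probs, labels))
    return correct / len(labels)

probs  = [0.9, 0.8, 0.65, 0.55, 0.45, 0.4, 0.35, 0.2, 0.6, 0.3]
labels = [1,   1,   1,    0,    1,    0,   0,    0,   1,   0]
sweep = {c / 100: accuracy_at_cutoff(probs, labels, c / 100)
         for c in range(30, 71, 5)}     # cut-offs 0.30, 0.35, ..., 0.70
```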
As an alternative to setting a specific cut-off value, Schmidt and Witte [30,31] used the base
rate of recidivism in the training-data set and predicted recidivism for the percentage of
test-data individuals who had the highest probabilities of recidivism. Using their split population
models, they reported correct predictions of 52.80% on the recidivists and 72.23% on the
non-recidivists, giving an overall classification accuracy of 65.17% for the 1978 test set. Following
Fig. 2. Effect of cut-off value on 1980 test results (using 1980-trained models).


this approach, the LR model correctly classified 52.82% of the recidivists, 73.38% of the
non-recidivists, and 65.69% overall. In comparison, the 1978-trained NN outperformed both the
split population and LR models by correctly predicting 53.95% of the recidivists, 74.00% of
the non-recidivists, and 66.50% overall.
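The base-rate rule can be sketched as follows: rank the test cases by predicted probability and flag as recidivists the top fraction equal to the training-set recidivism rate. The probabilities and rate below are invented:

```python
# Base-rate prediction: instead of a fixed cut-off, flag the k test
# cases with the highest predicted probabilities, where k is the
# training-set recidivism rate times the number of test cases.

def predict_by_base_rate(probs, base_rate):
    k = round(base_rate * len(probs))
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    flagged = set(ranked[:k])
    return [1 if i in flagged else 0 for i in range(len(probs))]

probs = [0.9, 0.1, 0.6, 0.4, 0.8, 0.3, 0.7, 0.2, 0.5, 0.35]
preds = predict_by_base_rate(probs, base_rate=0.4)
```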

5. Conclusions
We have presented NNs as an alternative to traditional statistical models for generating
case-by-case results in criminal recidivism prediction. In doing so, we have demonstrated that
they offer a viable modeling approach for this problem. Our findings indicate that NNs may be
able to obtain significantly higher classification accuracy for criminal recidivism outcomes
relative to LR and should thus be considered when choosing a technique for estimating causal
relationships in criminology. However, their approximation and generalization capabilities are
known to depend heavily on the choice of the network topology, including the number of hidden
layers, the number of nodes in each hidden layer, and the node activation functions, as well as the
training methodologies used. Fortunately, research has provided some general guidelines for
NN development which appear to work well in practice, as demonstrated in this paper. And,

Fig. 3. Effect of cut-off value on 1980 test results (using 1978-trained models).


since evidence to date indicates that the flexibility and adaptability of NNs can provide
superior performance, we believe that the benefits of using these types of models for
criminal-recidivism prediction outweigh the difficulties that might be encountered during their
development.
While the models employed in our study performed fairly well, there is still a need for
further research to identify more predictive variables and models. Furthermore, while the
current study addressed the use of NNs to classify individuals as either non-recidivists or
recidivists, it did not apply survival-time models to predict the timing of recidivism. The
success of split-population models, as reported by Schmidt and Witte [30,31] and Chung et al.
[7], indicates that they merit further investigation. In our future research, we thus plan to
develop split models using NNs to provide initial predictions of which individuals (in terms of
specific characteristics) will eventually return to prison. These models will then be evaluated
relative to the performance of logit lognormal models.

Acknowledgements
We thank Pam Lattimore of the National Institute of Justice, and Joanna Baker, Director of
the School of Information Technology at University of North Carolina-Charlotte, for their
encouragement and interest in this project as well as for their valuable comments on the
current paper. We also thank two anonymous reviewers and the Editor-in-Chief for their
suggestions for improving the paper.

References
[1] Ashford JB, LeCroy CW. Juvenile recidivism: a comparison of three prediction instruments. Adolescence
1990;25:441–50.
[2] Blumstein A, Cohen J, Roth JA, Visher CA. In: Criminal careers and career criminals, Vols. I and II.
Washington, DC: National Academy Press, 1986.
[3] Brodzinski JD, Crable EA, Scherer RF. Using artificial intelligence to model juvenile recidivism patterns.
Computers in Human Services 1994;10:1–18.
[4] Burgess EW. Factors determining success or failure on parole. In: The workings of the indeterminate sentence
law and the parole system in Illinois. Springfield, IL: Illinois State Board of Parole, 1928.
[5] Byrd KR, O'Connor K, Thackrey M, Sacks JM. The utility of self-concept as a predictor of recidivism among
juvenile offenders. The Journal of Psychology 1993;127:195–201.
[6] Caulkins J, Cohen J, Gorr W, Wei J. Predicting criminal recidivism: a comparison of neural network
models with statistical models. Journal of Criminal Justice 1996;24:227–40.
[7] Chung C, Schmidt P, Witte AD. Survival analysis: a survey. Journal of Quantitative Criminology 1991;7:59–97.
[8] Craig RJ, Dres D. Predicting DUI recidivism with the MMPI. Alcoholism Treatment Quarterly 1989;6:97–103.
[9] DeSilets L, Golden B, Wang Q, Kumar R. Predicting salinity in the Chesapeake Bay using backpropagation.
Computers and Operations Research 1992;19:277–85.
[10] Ellerman R, Pasquale S, Tien JM. An alternative approach to modeling recidivism using quantile residual life
functions. Operations Research 1992;40:485–504.
[11] Farrington DP. Predicting individual crime rates. In: Gottfredson DM, Tonry J, editors. Crime and justice: an
annual review of research, Vol. 9. Chicago: University of Chicago Press, 1987.


[12] Farrington DP, Loeber R. Relative improvement over chance (RIOC) and phi as measures of predictive
efficiency and strength of association in 2 × 2 tables. Journal of Quantitative Criminology 1989;5:201–13.
[13] Garson GD. A comparison of neural network and expert systems algorithms with common multivariate
procedures for analysis of social science data. Social Science Computer Review 1991;9:399–434.
[14] Goss EP, Ramchandani H. Survival prediction in the intensive care unit: a comparison of neural networks and
binary logit regression. Socio-Economic Planning Sciences 1998;32:189–98.
[15] Hinton G. How neural networks learn from experience. Scientific American, Special Issue: Mind and Brain
1992;September:145–51.
[16] Hosmer Jr DW, Lemeshow S. Applied logistic regression. New York: Wiley, 1989.
[17] Liang T, Chandler JS, Han I, Roan J. An empirical investigation of some data effects on the classification
accuracy of probit, ID3, and neural networks. Contemporary Accounting Research 1992;9:306–28.
[18] Loeber R, Dishion T. Early predictors of male delinquency: a review. Psychological Bulletin 1983;94:68–99.
[19] Palocsay S, Stevens S, Brookshire R, et al. Using neural networks for trauma outcome evaluation. European
Journal of Operational Research 1996;93:369–86.
[20] Philipoom PR, Rees LP, Wiegmann L. Using neural networks to determine internally-set due date assignments
for shop scheduling. Decision Science 1994;25:825–51.
[21] Polk-Walker GC, Chan W, Meltzer AA, Goldapp G, Williams B. Psychiatric recidivism prediction factors.
Western Journal of Nursing Research 1993;15(2):163–76.
[22] Reynolds HT. Analysis of nominal data. Beverly Hills: Sage, 1977.
[23] Richard MD, Lippmann RP. Neural network classifiers estimate Bayesian a posteriori probabilities. Neural
Computation 1991;3:461–83.
[24] Riedmiller M, Braun H. A direct adaptive method for faster backpropagation learning: the RPROP algorithm.
In: IEEE International Conference on Neural Networks, San Francisco, CA, 1993. p. 586–91.
[25] Ruck DW, Rogers SK, Kabrisky M, Oxley ME, Suter BW. The multilayer perceptron as an approximation to
a Bayes optimal discriminant function. IEEE Transactions on Neural Networks 1990;1:296–8.
[26] Rumelhart DE, Hinton GE, Williams RJ. Learning internal representations by error propagation. In: Parallel
distributed processing, Vol. 1. Cambridge, MA: MIT Press, 1986. p. 318–459.
[27] Rumelhart DE, Widrow B, Lehr MA. The basic ideas in neural networks. Communications of the ACM
1994;37:87–91.
[28] Salchenberger LM, Cinar EM, Lash NA. Neural networks: a new tool for prediction of thrift failures. Decision
Sciences 1992;23:899–916.
[29] Schmidt P, Witte AD. Some thoughts on how and when to predict in criminal justice settings. In: New
directions in the study of justice, law, and social control. Chapter 11. New York: Plenum Press, 1990. Prepared
by the School of Justice Studies, Arizona State University, Tempe, Arizona.
[30] Schmidt P, Witte AD. Predicting criminal recidivism using `split population' survival time models. Journal of
Econometrics 1989;40:141–59.
[31] Schmidt P, Witte AD. Predicting recidivism using survival models. New York: Springer-Verlag, 1988.
[32] Schmidt P, Witte AD. Predicting recidivism in North Carolina, 1978 and 1980. ICPSR edition. Ann Arbor,
MI: Inter-university Consortium for Political and Social Research, 1984.
[33] Sharda R. Neural networks for the MS/OR analyst: an application bibliography. Interfaces 1994;24:116–30.
[34] Siegel S. Nonparametric statistics for the behavioral sciences. New York: McGraw-Hill, 1956.
[35] Smith WR. The effects of base rate and cutoff point choice on commonly used measures of association and
accuracy in recidivism research. Journal of Quantitative Criminology 1996;12:83–111.
[36] Subramanian V, Hung MS, Hu MY. An experimental evaluation of neural networks for classification.
Computers and Operations Research 1993;20:769–82.
[37] Tam KY, Kiang MY. Managerial applications of neural networks: the case of bank failure predictions.
Management Science 1992;38:926–47.
[38] Wang Q, Sun X, Golden BL, et al. A neural network model for the wire bonding process. Computers and
Operations Research 1993;20:879–88.
[39] Weiss SM, Kulikowski CA. Computer systems that learn. San Mateo, CA: Morgan Kaufmann, 1991.
[40] Werbos PJ. Generalization of backpropagation with application to a recurrent gas market model. Neural
Networks 1988;1:339–56.


[41] Werbos PJ. Beyond regression: new tools for prediction and analysis in the behavioral sciences. PhD Thesis,
Harvard University, Cambridge, MA, 1974.