
Economics of Education Review 20 (2001) 81–92
www.elsevier.com/locate/econedurev

Can flexible non-linear modeling tell us anything new about educational productivity?

Bruce D. Baker
Department of Teaching and Leadership, The University of Kansas, Lawrence, KS 66045, USA

Received 28 November 1998; accepted 24 March 1999

Abstract

The objective of this study is to test, under relatively simple circumstances, whether flexible non-linear models, including neural networks and genetic algorithms, can reveal otherwise unexpected patterns of relationship in typical school productivity data. Further, it is my objective to identify useful methods by which “questions raised” by flexible modeling can be explored with respect to our theoretical understandings of educational productivity. This study applies three types of algorithm: Backpropagation, Generalized Regression Neural Networks (GRNN) and Group Method of Data Handling (GMDH), alongside linear regression modeling, to school-level data on 183 elementary schools. The study finds that flexible modeling does raise unique questions in the form of identifiable non-linear relationships that go otherwise unnoticed when applying conventional methods. © 2001 Elsevier Science Ltd. All rights reserved.

JEL classification: I21

Keywords: Neural networks; Functional form

1. Introduction

Production function applications to educational research have gradually, but steadily, evolved since the Coleman Report of 1966. Among the current trends are increased emphasis on student-level data (Goldhaber & Brewer, 1996), school-level data (Harter, 1999; Murnane & Levy, 1996), greater understanding of the hierarchical design of our system of schooling (Kaplan & Elliott, 1997),[1] and the relevance of the structural nature of direct and indirect relationships within that system (Kaplan & Elliott, 1997). In addition, substantial emphasis has been placed on identifying more useful measures of the outcome, “educational productivity”. Progress in outcome measurement has, however, led to a divergence rather than a convergence of philosophies, with current preferences ranging from economic impacts on the labor market and earnings (Betts, 1996; Card & Krueger, 1996) to more basal school achievement measures such as minimal concept mastery (Harter, 1999).

[1] Although not formally a production function study, Kaplan and Elliott’s hierarchical structural model for validating policy indicators explores the sensitivity of schooling performance outcomes to changes in sets of policy indicators in a way that parallels the production function theoretical framework.
E-mail address: bdbaker@ukans.edu (B.D. Baker).
PII: S0272-7757(99)00051-5

Despite the apparent substantive progress made with respect to conceptual and methodological concerns, a few basic rules continue to govern production function methodologies. First, production function studies are typically performed within the narrow confines of formal deductive hypothesis testing.
That is, the researcher begins with a question along the lines of: “Are school-level instructional expenditures per pupil related to student achievement outcomes?” Next, the researcher establishes his/her hypothesis, based on prior research and theoretical assumptions regarding the expected outcomes, and constructs a statistical model for testing the hypothesis.

Although a well-understood and generally accepted paradigm, this purely deductive approach presents certain potential difficulties to the researcher. For one, this approach requires that the researcher has or finds some prior knowledge as to how the system in question works. This knowledge may ultimately be rooted in anything from valid theoretical constructs to personal or political biases, the latter of these problems becoming more prevalent when dealing with politically heated issues like educational productivity.

Problems associated with a priori understanding of the system are confounded when applied to the development and application of a statistical model for hypothesis testing. Typically, the production function is expressed as follows:

$$f(Q, X \mid S) = 0, \tag{1}$$

such that outcomes, $Q$, are a function of schooling inputs, $X$, and non-school inputs, $S$. This function is most often analyzed in the form of a linear regression equation:

$$Q_{ij} = \beta X_{ij} + \gamma S_{j} + \varepsilon_{ij}, \tag{2}$$

where $Q_{ij}$ is the outcome of student $i$ in school $j$, $X_{ij}$ are the schooling inputs to that student, $S_{j}$ is a vector of non-schooling inputs and $\varepsilon_{ij}$ is a stochastic error term.

Linear regression applications to production function modeling and related estimation procedures are limited in a variety of ways. These limitations include, but are not limited to, difficulties with the selection of model parameters. The usefulness of linear regression models lies in our ability to interpret individual regression coefficients, their statistical significance and respective magnitudes.
More meaningful models tend to be those that are parsimonious, addressing necessarily narrow questions as exemplified by Goldhaber and Brewer (1996) in their study of the effects of teacher characteristics on student performance outcomes. While such studies provide valuable insights with respect to the question at hand, the necessity to repeatedly narrow research questions to this degree increases the probability that education researchers and economists may miss potentially important questions.

Other studies have relied on data dumping[2] of massive numbers of potential inputs, neglecting the effects of multicollinearity on both the magnitude and significance of the regression coefficients of interest. Harter (1999), for example, separately includes salaries and benefits for each category of personnel in Texas schools, eventually concluding none of them to be significant, but finding more obscure measures such as salary supplements to be positively related to performance and substitute pay to be negatively related to performance.[3] While data dumping may in some ways serve as a useful preliminary step in such complex analyses, it is unlikely to yield clear, definitive or even useful results in linear regression modeling. In addition, tools such as step-wise regression for selecting more parsimonious linear regression models from among the various predictors are generally inadequate.

[2] I choose the term dumping here where others might use data mining. The intent is not to choose a more derogatory term, but to use a term that emphasizes the distinct difference between this method and data “mining” methods discussed later. Data dumping, as it is used herein, refers to attempting to include all possible variables in a single model whereas data mining is used to describe a process of sifting through all possible variables to find potential relationships that may be modeled.

Selection of functional form is similarly problematic in regression modeling.
Linear regression modeling, by definition, seeks to identify linear relationships between specified input and outcome measures.[4] That is, relationships are assessed on the extent to which unit increases in X are constantly related to unit increases in Y. Hanushek (1996, p. 55), for example, discusses 90 studies which collectively generate 377 attempts to estimate a linear, or some highly restricted variant,[5] relationship between teacher–pupil ratios and/or teacher education and student performance. Hanushek concludes that no systematic linear relationship exists.

We would perhaps be wise to consider the possibility, if not the probability, that within matrices of data on schooling productivity, there are actually some non-linear relationships that are “tighter”[6] than some linear relationships. These relationships, where their curvilinear nature substantially violates assumptions of linearity, may go unrecognized or their magnitude underestimated when using linear methods (Cohn & Geske, 1990, p. 166). The only way to identify these relationships via conventional methods is to know or at least expect in advance that they exist and integrate them into econometric models as higher-order terms or alternative functional forms.

[3] Harter (1999, p. 294).
[4] Similarly, in structural models such as those applied by Kaplan and Elliott (1997), each direct effect in the structural equation model is represented as a linear relationship. Although combinations of direct and indirect effects may yield non-linearities, structural equation models do not explicitly allow for sets of non-linear direct effects. It is presumed that these relationships could be accommodated by either data re-scaling (log–log relationships) or the inclusion of higher-order terms (squared, cubed, etc.).
[5] For example, linear relationship between logged (ln) terms and other functional forms to be discussed later.
[6] In terms of R-squared if fitted with a curve.

A common a priori assumption of non-linearity rooted in economic theory is that of diminishing returns. As noted by Betts (1996, p. 163), “the education production function, like all well-behaved production functions [emphasis added] is subject to diminishing returns. This behavior is generally well captured by applying a log–log specification of wages relative to per pupil spending”. Others have replaced the log of spending with a quadratic function, achieving a similar interpretation (Johnson & Stafford, 1973). More recently, Figlio (1999) questioned the effectiveness of highly restrictive specifications of functional form for estimating education production functions, noting in particular the usefulness of more flexible estimation procedures.[7]

Betts’ choice of the phrase “well-behaved” is indicative of the standard mindset with which we approach production function modeling. This common econometric phrase suggests that our primary objective as researchers is to determine the extent to which reality, or data generated by the underlying processes of our reality, “behaves” according to the mathematical specification of our mental model of that process.[8] An underlying presumption is that if the data fail to conform to our model, there is either some flaw in the data or the system, rather than a flaw in our mental model. In light of this perspective, an appropriate re-framing of Hanushek’s (1996) conclusion might be: “We have yet to generate statistical findings to support that the relationship between teacher–pupil ratios and/or teacher education and student performance conforms to our mental model for that relationship”.[9]

[7] In particular, Figlio finds that a more flexible transcendental logarithmic (translog) functional form (Christensen, Jorgensen & Lau, 1971) provides more sensitive estimates of the spending-to-achievement relationship than a more restrictive Cobb–Douglas specification. See also Douglas and Sulock (1995).
[8] The intent of this criticism is not to make a particular example of Betts, but to exemplify how deeply rooted and broadly accepted this mindset has become. The phrasing chosen by Betts is indeed standard. It just happens that he was the author of the passage that I chose to cite.
[9] And even more specifically, that the relationship fails to conform to our “restricted mathematical representation of our existing mental model”.

Complementary inductive methods and analytical tools do exist, some of which can specifically provide support in the areas of parameter selection and identification of potential non-linear forms. The methods demonstrated in this study fall under the broad analytic umbrella of Data Mining. Data mining is the process of exploring the available data for patterns and relationships (Lemke, 1997). Data mining activities range from the visual exploration of bivariate scatterplots, often done as a preliminary to formal econometric modeling, to the use of iterative pattern learning algorithms or neural networks to search for potential relationships in data sets.

While it is presumed that the development of most econometric models involves a great deal of inductive tinkering by the researcher, it is also presumed that the researcher cannot efficiently explore all possibilities or conceptualize the plethora of non-linear relationships that may exist. In addition, the human researcher brings with him/her the baggage of personal and political predisposition as to what the data should say.
Thus, this study explores the use of flexible non-linear modeling, including neural networks, as a supplement to the typical preliminary activities of induction, and as a complement to conventional deductive production function analysis.

Broadly speaking, neural networks are iterative pattern learning algorithms modeled after the physiology of human cognitive processes. Unfortunately, the term “neural network” is also frequently misused as an overarching classification encompassing other types of algorithms, including genetic algorithms, that achieve similar ends, but by different means. This study applies both neural and genetic algorithms and refers to them collectively as flexible non-linear models.

Applied to econometrics, flexible non-linear models are free of a priori assumptions of functional form, deriving deterministic equations from available data and selecting predictors that best serve the modeling objective: prediction accuracy. Cross-sectional predictive and time-series forecasting accuracy of flexible non-linear models has been validated in the fields of medicine (Buchman, Kubos, Seidler & Siegforth, 1994), real estate valuation (Worzala, Lenk & Silva, 1995), bankruptcy assessment (Odom & Sharda, 1994) and forecasting education spending (Baker & Richards, 2000). Others have noted the potential usefulness of neural networks for exploring data in social science research (Liao, 1992).

The objective of this study is to test whether flexible non-linear models can reveal otherwise unexpected patterns of relationship in typical school productivity data. This study builds on the work of Figlio (1999) by combining flexible functional form with inductive estimation algorithms to provide a more sensitive estimation of potential relationships in the given data set.
The ultimate goal of this exercise is to identify useful methods and develop a framework by which “questions raised” by flexible modeling can be explored with respect to our theoretical understandings of educational productivity. This study applies three types of flexible estimation procedure, alongside linear regression modeling, to school-level data on 183 elementary schools.
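
The concern raised in this introduction, that a curvilinear relationship may be “tighter” than its linear approximation, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the data are synthetic (183 simulated schools with diminishing returns to spending), the names `spending` and `score` and all coefficients are hypothetical, and ordinary least squares via NumPy stands in for whatever regression software one might use. It does not reproduce the data or the algorithms of this study.

```python
import numpy as np


def fit_ols(x, y):
    """Fit y = a + b*x by ordinary least squares and return fitted values."""
    design = np.column_stack([np.ones_like(x), x])  # intercept + one predictor
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design @ coef


def r_squared(y, y_hat):
    """Share of the variance in y explained by the fitted values."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic data (hypothetical): outcomes rise with the log of spending,
# i.e. each additional unit of spending buys a smaller gain.
rng = np.random.default_rng(0)
spending = rng.uniform(1.0, 20.0, size=183)
score = 50.0 + 10.0 * np.log(spending) + rng.normal(0.0, 1.0, size=183)

# Linear specification: score regressed on spending.
r2_linear = r_squared(score, fit_ols(spending, score))

# Log specification: score regressed on ln(spending).
r2_log = r_squared(score, fit_ols(np.log(spending), score))

print(f"R-squared, linear in spending: {r2_linear:.3f}")
print(f"R-squared, linear in ln(spending): {r2_log:.3f}")
```

In this constructed case the log specification fits more tightly because the analyst already knows the curve's shape. The flexible methods examined in this study are aimed at the situation where that shape is not known in advance.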

2. Methods