
407 B.D. Baker, C.E. Richards / Economics of Education Review 18 (1999) 405–415

same applies for all Xs. While for inferential purposes this replication results in irresolvable collinearities, in the three-layer backpropagation network it allows for alternate weighting schemes to be applied to the same inputs, creating the possibility of different sensitivities of the outcome measure at different levels of each input and resulting in heightened prediction accuracy. The rescaling procedure, activation function or "hidden layer transfer function" (McMenamin, 1997), sometimes referred to as squashing (Rao & Rao, 1993), typically involves rescaling all inputs to a sigmoid distribution using either a logistic or hyperbolic tangent function. Backpropagation has proven an effective tool for both time-series prediction (Hansen & Nelson, 1997; Lachtermacher & Fuller, 1995) and cross-sectional prediction (Buchman et al., 1994; Odom & Sharda, 1994; Worzala et al., 1995).

Two alternatives used in addition to backpropagation in this study are (1) Generalized Regression neural networks (GRNN) (Specht, 1991) and (2) Group Method of Data Handling (GMDH) polynomial neural networks (Farlow, 1984). Both involve identifying best-predicting non-linear regression models. An advantage of Specht's GRNN is the removal of the necessity to specify a functional form, achieved by using the observed probability density function (pdf) of the data (Caudill, 1995, p. 47). GRNN interpolates the relationships among inputs, and between inputs and outcomes, by applying smoothing parameters to moderate the degree of non-linearity in the relationships and to serve as a sensitivity measure of the non-linear response of the outcome to changes in the inputs. Smoothing parameters typically vary among model inputs, with the optimal combination of smoothing parameters being selected by (1) a holdout method [1] or (2) a genetic adaptive method. [2]
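The logistic and hyperbolic tangent squashing functions mentioned above can be illustrated directly. This is a generic sketch of the two transfer functions, not the Neuroshell 2 implementation:

```python
import math

def logistic(x):
    """Logistic squashing: maps any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh_squash(x):
    """Hyperbolic tangent squashing: maps any real input into (-1, 1)."""
    return math.tanh(x)

# Large-magnitude inputs are compressed toward the bounds,
# which is why the transformation is called "squashing".
inputs = [-5.0, -1.0, 0.0, 1.0, 5.0]
logistic_out = [logistic(x) for x in inputs]
tanh_out = [tanh_squash(x) for x in inputs]
```

Because both functions compress extreme values into a bounded range, inputs measured on very different scales become comparable before the network's weighting schemes are applied.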
GRNN has been shown effective for cross-sectional prediction of binary outcomes (Buchman et al., 1994) and recommended for time-series prediction, particularly for use with sparse data and data widely varying in scale (Caudill, 1995, p. 47). A.G. Ivakhnenko (1966, in Farlow, 1984) proposed GMDH for identifying a best prediction polynomial via a Kolmogorov–Gabor specification. [3] GMDH polynomial fitting differs from backpropagation and GRNN in that no training set is specified. Rather, a measure referred to as FCPSE (Full Complexity Prediction Squared Error) is used. FCPSE consists of Training Squared Error [4] combined with an overfitting penalty similar to that used for the PSE (Prediction Squared Error) [5] but including additional penalty measures for model complexity. [6] Also unlike backpropagation, GMDH generally applies linear scaling to inputs. [7]

[1] Described by Specht (1991) but not used in this study; due to space constraints we opt not to discuss this method further.
[2] Recommended for identifying best-predicting models where "input variables are of different types and some may have more of an impact on predicting the output than others" using Neuroshell 2 (WSG, 1995, p. 138).
[3] y = a + Σ_{i=1}^{M} a_i x_i + Σ_{i=1}^{M} Σ_{j=1}^{M} a_ij x_i x_j + Σ_{i=1}^{M} Σ_{j=1}^{M} Σ_{k=1}^{M} a_ijk x_i x_j x_k, where X = (x_1, x_2, …, x_m) is the vector of inputs and A = (a_1, a_2, …, a_m) is the vector of coefficients or weights (Liao, 1992).
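To make the overfitting penalty and input scaling concrete, the following sketch implements the PSE formula (PSE = Norm.MSE + 2 × var_p × k/N) and the linear scaling (X′ = 2 × (X − Min)/(Max − Min) − 1) as reconstructed from the footnotes; the function names are ours:

```python
def prediction_squared_error(norm_mse, var_p, k, n):
    """PSE = Norm.MSE + 2 * var_p * (k / N): training error plus a
    penalty that grows with the number of coefficients k relative
    to the number of training patterns N."""
    return norm_mse + 2.0 * var_p * (k / n)

def linear_scale(x, lo, hi):
    """Linear scaling of an input onto [-1, 1]:
    X' = 2 * (X - Min) / (Max - Min) - 1."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0

# With the same training error, a model with more coefficients pays
# a larger penalty, discouraging needless polynomial complexity.
simple = prediction_squared_error(0.10, 0.05, k=3, n=30)
complex_ = prediction_squared_error(0.10, 0.05, k=12, n=30)
```

The penalty term is what lets GMDH select among candidate polynomials without a held-out training set: added terms must reduce error by more than the complexity cost they incur.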

3. Methods

3.1. Data

All data used for this study were provided by the National Center for Education Statistics (NCES; see Appendix A). Complete annual time series for all variables in the models were available from 1959 through 1995. Variables used in the analyses include:

CUREXP: Current expenditures per pupil in average daily attendance
PCI: Per capita income
ADAPOP: Ratio of average daily attendance to the population
SGRNT: Local governments' education receipts from state sources per capita
BUSTAX: Business taxes and non-tax receipts to state and local governments per capita
PERTAX: Personal taxes and non-tax receipts to state and local governments per capita
RCPIANN: Inflation rate measured by the consumer price index

All variables are measured in 1982–84 constant dollars. [8]

While actual data were available for use as predictors for 1991 through 1995, the intent of this study was to mimic true forecasting circumstances, where such values would not be available, thus necessitating univariate forecasts of predictors.

[4] Referred to as Norm.MSE; discussed in more detail in WSG (1995, pp. 149–151).
[5] PSE = Norm.MSE + 2 × var_p × (k/N), where N is the number of patterns in the pattern file and k is the number of coefficients in the model, which are determined in order to minimize Norm.MSE (WSG, 1995, p. 149).
[6] WSG retains proprietary rights to the design of FCPSE and therefore does not disclose the formula for its determination (WSG, 1995, p. 150).
[7] X′ = 2 × (X − Min)/(Max − Min) − 1, further described in WSG (1995, p. 158).
[8] Historical data are available in Appendix A. Data are accumulated from a variety of sources and compiled into the format provided for this study by the National Center for Education Statistics. For a detailed discussion of data sources and relevant adjustments see Gerald and Hussar (1996–1998, pp. 151–152).
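The constant-dollar adjustment works by deflating each nominal series with the consumer price index relative to the 1982–84 base period. The figures in this sketch are invented for illustration only and are not the study's data:

```python
def to_constant_dollars(nominal, cpi, base_cpi=100.0):
    """Deflate a nominal dollar figure to constant dollars using the
    consumer price index (base-period index = base_cpi)."""
    return nominal * (base_cpi / cpi)

# Hypothetical example: $5,000 of nominal per-pupil spending in a
# year when the CPI stands at 150 (1982-84 base = 100) is worth
# roughly $3,333 in constant 1982-84 dollars.
real_value = to_constant_dollars(5000.0, 150.0)
```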
For all univariate analyses, models were estimated to data from 1959 through 1990 and forecasts generated from 1991 through 1995. For all multivariate models, equations were estimated, or neural networks trained, on time series from 1960 [9] through 1990 and forecasts generated from 1991 through 1995. Forecasts (1991–1995) were compared for accuracy with actual values for Current Expenditures per Pupil reported by NCES.

3.2. Univariate models and forecasts

Univariate models were estimated for all series using the SAS System (v6.12) selection procedure, which applies a variety of trend analyses, exponential smoothing methods and ARIMA models to untreated, logged and differenced forms of each series. Forecasting models were selected for each series on the basis of Root Mean Square Error (RMSE).

3.3. The multivariate AR(1) regression model

The multivariate regression model used for forecasting educational spending is based on the National Center for Education Statistics models annually published in the Projections of Education Statistics series (Gerald & Hussar, 1996–1998). The model consists of two multiple regression equations that have taken various forms over the years. It is based on a median voter model, which presumes that spending for public goods reflects the preferences of the median voter; that is, the voter in the community with the median income and/or median property value. In recent years, both equations in the model have assumed a multiplicative functional form, thus requiring the use of the natural log form of all series. Estimation of the model equations has typically been performed using AR(1) methods in order to correct for autocorrelation of residuals. [10]
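The AR(1) correction can be illustrated with a Cochrane–Orcutt-style two-step estimate: fit by OLS, estimate the residual autocorrelation, quasi-difference every series, and re-fit. The sketch below runs on synthetic data and is a generic illustration of the technique, not the RATS AR(1) routine used in the study; all names and numbers are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the logged predictors (e.g. ln PCI,
# ln SGRNT, ln ADAPOP); the outcome is built with AR(1) errors.
n = 60
X = np.column_stack([np.ones(n),
                     rng.normal(size=n),
                     rng.normal(size=n),
                     rng.normal(size=n)])
true_b = np.array([1.0, 0.5, 0.3, -0.2])
e = np.zeros(n)
for t in range(1, n):                 # first-order autocorrelated errors
    e[t] = 0.6 * e[t - 1] + 0.1 * rng.normal()
y = X @ true_b + e

# Step 1: ordinary least squares on the untransformed equation.
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b_ols

# Step 2: estimate rho, the first-order autocorrelation of residuals.
rho = (resid[1:] @ resid[:-1]) / (resid[:-1] @ resid[:-1])

# Step 3: quasi-difference every series by rho and re-estimate;
# this removes the AR(1) structure from the error term.
y_star = y[1:] - rho * y[:-1]
X_star = X[1:] - rho * X[:-1]
b_ar1, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
```

Iterating steps 2 and 3 to convergence yields the full Cochrane–Orcutt procedure; a single pass is shown here for clarity.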
The primary equation for forecasting Current Expenditures per Pupil can be expressed:

ln CUREXP = b0 + b1 ln PCI + b2 ln SGRNT + b3 ln ADAPOP + u

where CUREXP, PCI, SGRNT and ADAPOP are as previously defined, ln refers to the natural logarithm and "u" is the error term, expected to display first-order autocorrelation.

The model requires a secondary equation for generating forecast predictors of SGRNT, the measure of state contributions. The equation takes the following form:

ln SGRNT = b0 + b1 ln BUSTAX{1} + b2 ln PERTAX{1} + b3 ln ADAPOP + b4 ln (RCPIANN/RCPIANN{1}) + u

where BUSTAX, [11] PERTAX, ADAPOP and RCPIANN are as previously defined, {1} refers to a lag of one period and "u" is the error term, expected to display first-order autocorrelation.

AR(1) estimation of the model was conducted using RATS 4.0 (Regression Analysis of Time Series, 32-bit version). [12] Forecasts of ln SGRNT and ln CUREXP were also prepared in RATS by applying the respective equations. Updating of coefficient estimates was not used.

[9] While data were available back to 1959, accommodation of lagged variables reduced the estimable series by one period.
[10] For a more in-depth discussion of the theoretical basis and mathematical specification of the model, consult Gerald and Hussar (1996–1998, pp. 151–154).

3.4. Neural network methods

All neural networks were estimated (trained) and forecasts prepared using Neuroshell 2 (Release 3.0, 32-bit; WSG, 1995). For backpropagation, a Jordan–Elman recurrent backpropagation architecture, in which the network includes a connection from the output layer to the input layer, was selected. [13] Caudill (1995, pp. 19–24) recommends recurrent backpropagation for time-series prediction. Default learning parameters were applied. [14] For the Generalized Regression neural networks (GRNN), the genetic adaptive algorithm was used for smoothing parameter selection.
Most default parameters were applied. [15] For Group Method of Data Handling (GMDH), default parameters were also applied. [16] For both the recurrent backpropagation models and Generalized Regression neural networks, the training sets consisted of data from 1960 to 1980 and test sets consisted of data from 1981 through 1990. [17] Alternate models were trained with log (ln) transformed data. Predictors and lags were the same as those specified for the linear regression equation. Only the main equation for the prediction of CUREXP was modeled with the neural networks.

[11] BUSTAX is not included in Gerald and Hussar's most recent (1997) models, but was included in earlier iterations (1996). We have chosen to include this variable for two reasons: (1) in our re-estimates of the NCES models, the effects of including this variable on forecast accuracy are negligible; and (2) we wished to include this variable in the training procedures of the neural network algorithms in order to allow the neural network to determine, for itself, the value of the additional predictor. Therefore, inclusion in the conventional model was for the primary purpose of retaining comparability.
[12] For more detail on the RATS AR(1) procedure see the RATS user's manual (Doan, 1996, pp. 5-6, 14-6).
[13] The particular importance of the output–input connection is discussed in WSG (1995, pp. 106–107).
[14] These include a stop criterion of 20,000 iterations without improvement of test set prediction, and saving as the trained network the network that best predicts the test set. Logistic activation functions (middle layer) were also used, along with the Neuroshell 2 modified "Vanilla" algorithm for updating weights (learning rate = 0.1 and default momentum = 0.1). For more detail see WSG (1995, pp. 118–125).
Forecasts were generated for the period from 1991 through 1995 by providing the trained neural networks with the same set of forecast predictors (univariate results) used for the linear regression model, including regression predictions of SGRNT from the secondary multiple linear regression equation.
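At its core, the GRNN used in these forecasts is a kernel-weighted average of the training outcomes: each training case contributes in proportion to a Gaussian kernel of its distance from the new input, scaled per dimension by a smoothing parameter. The sketch below is a minimal generic GRNN predictor, not the Neuroshell 2 genetic adaptive implementation; all names are ours:

```python
import math

def grnn_predict(x_new, train_x, train_y, sigmas):
    """Generalized Regression NN prediction: a Gaussian-kernel
    weighted average of training outcomes. Each input dimension has
    its own smoothing parameter, so the network can be more or less
    sensitive to different predictors."""
    weights = []
    for xi in train_x:
        d2 = sum(((a - b) / s) ** 2 for a, b, s in zip(x_new, xi, sigmas))
        weights.append(math.exp(-d2 / 2.0))
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, train_y)) / total

# Toy training set: the outcome tracks the first input.
train_x = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (3.0, 1.0)]
train_y = [0.0, 1.0, 2.0, 3.0]

# Small smoothing parameters -> highly local, nearly nearest-neighbour;
# large smoothing parameters -> heavily smoothed toward the mean.
local = grnn_predict((1.0, 1.0), train_x, train_y, sigmas=(0.1, 0.1))
smoothed = grnn_predict((1.0, 1.0), train_x, train_y, sigmas=(10.0, 10.0))
```

Searching over the smoothing parameters for the combination that best predicts a held-out set corresponds to the holdout and genetic adaptive selection methods described earlier.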

4. Results and discussion