Literature Review: Estimating Goldfeld's Conventional Money Demand Function Using Ridge Regression

I. Introduction

The problem of multicollinearity, arising from the nature of economic models and data, often results in highly unstable and unrealistic estimates of structural parameters, especially if one is restricted to unbiased estimators such as OLS. If this problem is resolved by deleting important variables from the model, one risks a serious omitted-variables specification bias (Brown and Beattie, 1975). Ridge regression has been developed as an alternative prediction method to ordinary least squares (OLS) regression in cases where there is a high degree of multicollinearity among the predictor variables. If the mean square error (MSE) criterion is used as a measure of accuracy, there always exists a more accurate "ridge regression" estimate than the unbiased OLS estimate, as shown by Hoerl and Kennard (1970a, pp. 61-63). Goldfeld's conventional money demand function is an area where there is a high degree of overlap among the predictor variables and should thus be an excellent area in which to apply ridge regression instead of the usual OLS regression (Tracey and Sedlacek, 1983). It has long been recognized that multicollinearity does not hinder, and sometimes aids, forecasting as long as the multicollinearity persists into the forecast period. However, in a demand-for-money equation with several interest rates, a changing term structure may alter the multicollinearity pattern. In such circumstances, ridge estimates, which are less affected by multicollinearity, may provide better forecasts than OLS estimates (Watson and White, 1976). Obtaining valid prediction equations in this area is often difficult because the high degree of multicollinearity tends to create very different prediction equations from year to year, so the process of validating these equations typically involves collecting very large samples over many years. Ridge regression has been found to be most effective in exactly these cases.
Ridge regression should result in more stable equations with highly multicollinear data and thus should remain valid with smaller samples than the least squares method typically requires. The purpose of this study was to examine the efficacy of ridge regression over ordinary least squares regression in Goldfeld's conventional money demand function. The study has two aims: (1) to propose a test of the forecasting ability of ridge regression, and (2) to show that the ridge regression coefficients are not statistically different from the OLS coefficients.

II. Literature Review

Research on the use of ridge regression has been conducted over the years. According to Darlington (1978) and Faden (1978), ridge regression was developed for exactly these situations of high multicollinearity. The OLS and ridge regression procedures are identical except that in ridge regression a small constant is added to the main diagonal of the variance-covariance matrix prior to the determination of the regression equation. Adding this constant creates a "ridge" on the main diagonal, hence the name, and is an artificial means of decreasing the relative amount of collinearity in the data. The specific constant (delta) added to the matrix is determined by an iterative approach: selecting the delta value that results in the lowest total mean square error for the prediction equation. Consider the normal linear regression model:

y = Xβ + ε ....................(1)

where y is an n x 1 vector of the dependent variable, X is an n x p matrix of explanatory variables, and ε is an n x 1 vector of disturbances, distributed as ε ~ N(0, σ²Iₙ). The p x 1 parameter vector β is assumed unknown and is to be estimated from the data, y and X. It is well known that the ordinary least squares (OLS) estimator of β is given by:

b = (X'X)⁻¹X'y ....................(2)

Under the model assumptions, the OLS estimator is the best linear unbiased estimator by the Gauss-Markov theorem. However, compared with nonlinear or biased estimators, the OLS estimator may perform worse in particular situations. One of these is the case of near multicollinearity, where the matrix X'X is nearly singular. In that situation, the variance of b, given by σ²(X'X)⁻¹, can be very large. A biased estimator with less dispersion may in that case be more efficient in terms of the mean squared error criterion. This is the basic idea of the ridge and shrinkage estimators, which were introduced by Hoerl and Kennard (1970) for the above regression model.
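The contrast between estimators (2) and (3) can be illustrated with a small numerical sketch. The snippet below uses simulated, hypothetical data (two nearly collinear regressors), not Goldfeld's actual money-demand series, and a ridge constant k chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a design matrix with two nearly collinear predictors
# (hypothetical data; not Goldfeld's money-demand variables).
n = 50
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # almost a copy of x1
X = np.column_stack([x1, x2])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# OLS estimator, equation (2): b = (X'X)^-1 X'y
b_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge estimator: add the constant k to the main diagonal of X'X
k = 0.1
p = X.shape[1]
b_ridge = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

print("OLS:  ", b_ols)
print("Ridge:", b_ridge)
```

With near-singular X'X, the OLS coefficients tend to be large and erratic in sign, while the ridge coefficients are shrunk toward more stable values, which is the behavior the text describes.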
Grob (2003) gives an excellent survey of alternatives to least squares estimation, such as ridge estimators and shrinkage estimators. He offers the justification that adding the matrix kIₚ, where k is a scalar, to X'X yields a more stable matrix X'X + kIₚ, and that the ridge estimator of β,

β̂ = (X'X + kIₚ)⁻¹X'y ....................(3)

with Var-Cov(β̂) = σ²(X'X + kIₚ)⁻¹ X'X (X'X + kIₚ)⁻¹,

should have a smaller dispersion or variance than the OLS estimator. To discuss the properties of the ridge estimator, one usually transforms the above linear regression model to a canonical form in which the X'X matrix is diagonal. Let P denote the orthogonal matrix whose columns are the eigenvectors of X'X and let Λ be the diagonal matrix containing the eigenvalues. Consider the spectral decomposition X'X = PΛP', and define α = P'β, X* = XP, and c = X*'y. Then model (1) can be written as y = X*α + ε, and the OLS estimator of α is

α̂ = (X*'X*)⁻¹X*'y = (P'X'XP)⁻¹c = Λ⁻¹c.

In scalar notation we have:

α̂ᵢ = cᵢ/λᵢ , i = 1, 2, ..., p ....................(4)

It follows easily from (3) that the principle of ridge regression is to add a constant k to the denominator of (4) to obtain:

α̂ᵢʳ = cᵢ/(λᵢ + k) ....................(5)

Ridge regression can yield a lower mean square error (MSE) than OLS because, for small k, the reduction in variance outweighs the squared bias introduced, the squared bias growing only with the square of the true parameter value. Because the level of k has a marked influence on the MSE of the estimated parameters, the selection of k is an important consideration. Although there always exists a k > 0 such that a smaller MSE can be obtained from ridge regression than from ordinary least squares, the best method for selecting a particular value of k is not obvious. Hoerl and Kennard seem to place the most reliance on the ridge trace. They claimed that, based on experience, the best method for achieving a better estimate of β is to use kᵢ = k for all i and to use the ridge trace to select a single value of k and a unique β̂.
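The equivalence between the canonical-form shrinkage (5) and the direct ridge formula (3) can be verified numerically. The sketch below again uses simulated, hypothetical data and an arbitrary k; it computes the ridge estimate both ways and confirms they agree:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical collinear design (illustrative only).
n, p = 40, 3
Z = rng.normal(size=(n, p))
X = np.column_stack([Z[:, 0], Z[:, 0] + 0.05 * Z[:, 1], Z[:, 2]])
y = X @ np.array([1.0, 1.0, -0.5]) + rng.normal(scale=0.3, size=n)

# Spectral decomposition X'X = P Lambda P' (canonical form)
lam, P = np.linalg.eigh(X.T @ X)
c = (X @ P).T @ y                  # c = X*'y with X* = XP

alpha_ols = c / lam                # equation (4): alpha_i = c_i / lambda_i
k = 0.5
alpha_ridge = c / (lam + k)        # equation (5): add k to the denominator

# Back-transform to the original parameterization: beta = P alpha
b_ols = P @ alpha_ols
b_ridge = P @ alpha_ridge

# Agreement with the direct formulas (2) and (3)
assert np.allclose(b_ols, np.linalg.solve(X.T @ X, X.T @ y))
assert np.allclose(b_ridge, np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y))
```

Because each canonical coefficient is multiplied by λᵢ/(λᵢ + k) < 1, the ridge solution is a componentwise shrinkage of OLS along the eigenvector directions, with the heaviest shrinkage on the small-eigenvalue (most collinear) directions.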
Hoerl and Kennard then indicate several considerations that can be used as a guide to the choice of a particular k value: stability of the system as k is increased, reasonable absolute values and signs of the estimated coefficients, and a reasonable variance of the regression as indicated by the residual sum of squares. Several researchers have applied ridge regression to economic models. Brown and Beattie (1975) improved the estimation of the parameters of a Cobb-Douglas function by use of ridge regression. They estimated Ruttan's Cobb-Douglas function to measure the effect of irrigation on the output of irrigated cropland in 1956. Overall, their findings indicate that ridge estimation appears to have promise for estimating Cobb-Douglas production functions (p. 31). Watson and White (1976) applied ridge regression to forecasting the demand for money under a changing term structure of interest rates in the USA. The results indicated that ridge regression can be a better predictor than OLS in the presence of multicollinearity, and that it is useful not only in estimating the term structure of interest rates but in many applications in larger-scale econometric models. Sedlacek and Brooks (1976) postulated non-cognitive variables that are predictive of minority student academic success. Tracey and Sedlacek (in press) developed a brief questionnaire, the Non-Cognitive Questionnaire (NCQ), to assess these variables and found eight non-cognitive factors to be highly predictive of grades and enrollment status for both whites and blacks, above and beyond SAT scores alone. However, these variables were also found to share a high degree of variance with the SAT scores, so there was a fairly high degree of multicollinearity. It was felt that these ten variables (SATV, SATM, and the eight non-cognitive factors) would be an ideal application of ridge regression.
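The ridge-trace criteria just listed (coefficient stability as k grows, reasonable signs, acceptable residual sum of squares) can be inspected with a simple grid of k values. The sketch below uses simulated, hypothetical ill-conditioned data, not the data from any of the cited studies:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical ill-conditioned data (illustrative only).
n = 30
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + 0.02 * rng.normal(size=n)])
y = X @ np.array([2.0, 1.0]) + rng.normal(scale=0.4, size=n)

p = X.shape[1]
ks = [0.0, 0.01, 0.05, 0.1, 0.5, 1.0]
coefs, rss = [], []
for k in ks:
    b = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)
    coefs.append(b)
    rss.append(np.sum((y - X @ b) ** 2))

# Print the trace: choose the smallest k at which the coefficients
# stabilize while the residual sum of squares remains reasonable.
for k, b, r in zip(ks, coefs, rss):
    print(f"k={k:4.2f}  b={np.round(b, 3)}  RSS={r:8.3f}")
```

The trace makes the trade-off visible: the residual sum of squares can only grow as k increases (k = 0, i.e. OLS, minimizes it by construction), while the coefficient vector shrinks and settles toward values with stable magnitudes and signs.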
Tracey, Sedlacek, and Miars (1983) compared ridge regression and least squares regression in predicting freshman-year cumulative grade point average (GPA) from SAT scores and high school GPA. They found that ridge regression resulted in cross-validated correlations similar to those found with ordinary least squares regression. The failure of ridge regression to yield less shrinkage than OLS regression was postulated to be due to the relatively low ratio of the number of predictors (p) to sample size (n) used in the study. Faden (1978) found that the key dimension on which ridge regression proved superior to OLS regression was a high p/n ratio.

III. Design of the Experiment