
23.5 Multicollinearity, Normality and Heteroskedasticity

The first set of diagnostics provided in the regression output window consists of three types of traditional measures: the multicollinearity condition number, a test for non-normality (Jarque-Bera), and three diagnostics for heteroskedasticity (Breusch-Pagan, Koenker-Bassett, and White).²

The results for the linear and quadratic trend surface models are listed in Figure 23.20 and Figure 23.21, respectively. First, consider the multicollinearity condition number. This is not a test statistic per se, but a diagnostic that suggests problems with the stability of the regression results due to multicollinearity (the explanatory variables are too correlated and provide insufficient separate information). Typically, a condition number over 30 is suggestive of problems. In trend surface regressions, this is very common, since the explanatory variables are simply powers and cross-products of each other. In our example, the linear model has a value of 90.8, but the …

Figure 23.20: Regression diagnostics – linear trend surface model.

Figure 23.21: Regression diagnostics – quadratic trend surface model.

1 This is because OLS residuals are already correlated by construction and the permutation approach ignores this fact.

2 For methodological details on these diagnostics, see most intermediate econometrics texts.

statistic with 2 degrees of freedom. In both cases, there is a strong suggestion of non-normality of the errors. In and of itself, this may not be too serious a problem, since many properties in regression analysis hold asymptotically even without assuming normality. However, for finite-sample (or exact) inference, normality is essential, and the current models clearly violate that assumption.
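To make the two diagnostics discussed so far concrete, here is a minimal NumPy sketch of the condition number and the Jarque-Bera statistic. It assumes a Belsley-Kuh-Welsch style condition number (each column of the design matrix, including the constant, scaled to unit length, then the ratio of the largest to the smallest singular value); GeoDa's exact computation may differ. The function names, coordinates and data are hypothetical. Because the Jarque-Bera statistic is χ² with 2 degrees of freedom under normality, its p-value has the closed form exp(-JB/2).

```python
import numpy as np

def condition_number(X):
    """Condition number of the design matrix X (n x k, constant included).

    Assumption: columns are scaled to unit length, then the largest singular
    value is divided by the smallest (Belsley-Kuh-Welsch style).
    """
    Xs = X / np.linalg.norm(X, axis=0)          # scale each column to unit length
    s = np.linalg.svd(Xs, compute_uv=False)     # singular values, largest first
    return s[0] / s[-1]

def jarque_bera(resid):
    """Jarque-Bera statistic on OLS residuals and its chi-square(2) p-value."""
    e = resid - resid.mean()
    n = e.size
    m2 = np.mean(e**2)
    skew = np.mean(e**3) / m2**1.5
    kurt = np.mean(e**4) / m2**2
    jb = n / 6.0 * (skew**2 + (kurt - 3.0)**2 / 4.0)
    p_value = np.exp(-jb / 2.0)                 # exact upper tail of chi-square(2)
    return jb, p_value

# Hypothetical trend surface setup: coordinates x, y and a variable z.
rng = np.random.default_rng(0)
n = 200
x, y = rng.uniform(0, 100, n), rng.uniform(0, 100, n)
X_lin = np.column_stack([np.ones(n), x, y])                 # linear trend surface
X_quad = np.column_stack([X_lin, x**2, x * y, y**2])        # quadratic trend surface
z = 1 + 0.02 * x - 0.01 * y + rng.standard_t(3, n)          # heavy-tailed errors

beta, *_ = np.linalg.lstsq(X_lin, z, rcond=None)
resid = z - X_lin @ beta
print(condition_number(X_lin), condition_number(X_quad))
print(jarque_bera(resid))
```

Since the linear design is a column subset of the quadratic one, the quadratic condition number can never be smaller, which mirrors why trend surfaces with higher powers quickly exceed the rule-of-thumb threshold of 30.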

The next three diagnostics are common statistics to detect heteroskedasticity, i.e., a non-constant error variance. Both the Breusch-Pagan and Koenker-Bassett tests are implemented as tests on random coefficients, which assume a specific functional form for the heteroskedasticity.³ The Koenker-Bassett test is essentially the same as the Breusch-Pagan test, except that the residuals are studentized, which makes the test robust to non-normality. Both test statistics indicate serious problems with heteroskedasticity in each of the trend surface specifications.
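The difference between the two tests can be seen in a short sketch. Both regress the squared residuals on a set of variables Z; Breusch-Pagan scales the explained sum of squares in a way that is only valid under normal errors, while the studentized Koenker-Bassett version divides by the sample variance of the squared residuals, which amounts to n·R² of that auxiliary regression. Following footnote 3, the example uses the squares of the explanatory variables as Z. The function names and the simulated data are hypothetical, and this is not GeoDa's code.

```python
import numpy as np

def _explained_ss(u, Z):
    """Explained sum of squares from an OLS regression of u on Z (Z has a constant)."""
    g, *_ = np.linalg.lstsq(Z, u, rcond=None)
    return np.sum((Z @ g - u.mean())**2)

def breusch_pagan_koenker(resid, Z):
    """Breusch-Pagan and Koenker-Bassett statistics.

    Z: n x (p+1) matrix of variables assumed to drive the error variance,
    including a constant column. Both statistics are chi-square with p
    degrees of freedom under homoskedasticity.
    """
    e2 = resid**2
    sigma2 = e2.mean()
    bp = 0.5 * _explained_ss(e2 / sigma2, Z)     # relies on normal errors
    v = np.mean((e2 - sigma2)**2)                # studentizing factor
    kb = _explained_ss(e2, Z) / v                # equals n * R^2 of e2 on Z
    return bp, kb, Z.shape[1] - 1

# Hypothetical linear trend surface whose error spread grows with x.
rng = np.random.default_rng(0)
n = 200
x, y = rng.uniform(0, 100, n), rng.uniform(0, 100, n)
X = np.column_stack([np.ones(n), x, y])
z = 1 + 0.02 * x + rng.normal(0, 0.1 + 0.02 * x, n)
beta, *_ = np.linalg.lstsq(X, z, rcond=None)
resid = z - X @ beta

Z = np.column_stack([np.ones(n), x**2, y**2])    # squares of the explanatory variables
print(breusch_pagan_koenker(resid, Z))
```

Under normal errors the two statistics are close, because the studentizing factor v then estimates 2σ⁴; with non-normal errors, as in the trend surface models above, the Koenker-Bassett version is the safer one to read.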

The White test is a so-called specification-robust test for heteroskedasticity, in that it does not assume a specific functional form for the heteroskedasticity. Instead, it approximates a large range of possibilities by using all squares and cross-products of the explanatory variables in the model. In some instances, this creates a problem when a square or cross-product is already included as a term in the model. This is the case for the quadratic trend surface, which already includes the squares and cross-product of the x and y variables. In such an instance, the auxiliary regression on which the test is based suffers from perfect multicollinearity. Currently, GeoDa is not able to correct for this and reports N/A instead, as in Figure 23.21. In the linear trend model, the White statistic is 35.4, which supports the evidence of heteroskedasticity provided by the other two tests. This result does not always hold, since the random coefficient assumption implemented in the Breusch-Pagan and Koenker-Bassett tests may not be appropriate. In such an instance, the White test may be significant while the other two are not. It is important to keep in mind that the White test is against a more general form of heteroskedasticity.
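The N/A case can be reproduced with a small sketch of White's auxiliary regression: the squared residuals are regressed on a constant, the non-constant explanatory variables, and all their squares and cross-products, and the statistic is n·R². The rank check that returns None, standing in for GeoDa's N/A, is our own choice, as are the function name and the simulated data.

```python
import itertools
import numpy as np

def white_test(resid, X):
    """White's heteroskedasticity test: n * R^2 from regressing e^2 on a constant,
    the non-constant regressors, and all their squares and cross-products.

    X: n x k design matrix whose first column is the constant.
    Returns (statistic, df), or None when the auxiliary regressors are
    perfectly collinear (the situation GeoDa reports as N/A).
    """
    n = resid.size
    V = X[:, 1:]                                          # drop the constant
    k = V.shape[1]
    cols = [np.ones(n)] + [V[:, j] for j in range(k)]
    for i, j in itertools.combinations_with_replacement(range(k), 2):
        cols.append(V[:, i] * V[:, j])                    # squares (i == j) and cross-products
    A = np.column_stack(cols)
    if np.linalg.matrix_rank(A) < A.shape[1]:
        return None                                       # perfect multicollinearity
    e2 = resid**2
    g, *_ = np.linalg.lstsq(A, e2, rcond=None)
    r2 = 1.0 - np.sum((e2 - A @ g)**2) / np.sum((e2 - e2.mean())**2)
    return n * r2, A.shape[1] - 1

# Hypothetical coordinates and data for both trend surface specifications.
rng = np.random.default_rng(0)
n = 200
x, y = rng.uniform(0, 100, n), rng.uniform(0, 100, n)
X_lin = np.column_stack([np.ones(n), x, y])
X_quad = np.column_stack([X_lin, x**2, x * y, y**2])
z = 1 + 0.02 * x + rng.normal(0, 0.1 + 0.02 * x, n)
for X in (X_lin, X_quad):
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    print(white_test(z - X @ beta, X))   # the quadratic design fails the rank check
```

For the quadratic surface, the square of x generated for the auxiliary regression duplicates the x² term already in the model (and likewise for the cross-product), so the auxiliary design cannot have full column rank.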

3 Specifically, the heteroskedasticity is a function of the squares of the explanatory variables. This is the form implemented in GeoDa. In some other econometric software, the x-values themselves are used, and not the squares, which may give slightly different results.

Figure 23.22: Spatial autocorrelation diagnostics – linear trend surface model.

Figure 23.23: Spatial autocorrelation diagnostics – quadratic trend surface model.