
is the sum of an integrated component with drift and an occasional large transitory component will generate data in finite samples that are difficult to distinguish from a trend stationary process with a trend break. They report that purely transitory disturbances can cause rejection of the unit root null hypothesis and spurious results indicating that there has been a permanent shift in the level. Third, it is also the case that outliers (Franses and Haldrup, 1994) or structural breaks (Leybourne, Mills and Newbold, 1998) can induce spurious rejections by Dickey-Fuller tests. This issue was raised by Nelson and Murray (2000) in discussing tests for unit roots in RGNP. Lengthening the span of data increases the possibility of major structural changes. Failure to account for such structural changes will bias unit root tests. For example, Leybourne, Mills and Newbold (1998) found results converse to those of Perron (1989). They show that when a series is generated by a process that is I(1), with an early but not late break, application of the Dickey-Fuller test will lead to spurious rejection of the unit root null hypothesis. These results suggest that Dickey-Fuller tests will yield misleading results when applied to data that have a structural break early in the series. One important implication of these findings is that studies that justify using longer spans of data on the grounds of increased power and more precise estimates can reach erroneous findings if the early data contain a structural break. In our view, statistical tests of unit roots will almost inevitably be based explicitly or implicitly on models that are drastic simplifications of the underlying generating process.
It is simply impossible, given available data, to simultaneously contemplate models that are more highly parameterized than the typical parsimonious forms, including possible moving average terms, and allow for the possibility of both outliers and structural breaks of unknown form and type, occurring at unknown points in time. Whether the simplified structures on which the tests are necessarily based are helpful or misleading is difficult to anticipate in any application.

3. US real GNP, 1875–1993

In the remainder of this paper we present statistical evidence against the TS hypothesis, while accepting that this does not imply DS as a likely alternative. To allow comparison with earlier studies, we employ the same series as Diebold and Senhadji (1996) in our empirical analyses. These authors used four series, GNP-BG, GNP-R, GNP-BGPC and GNP-RPC, distinguished by whether the measures of Balke and Gordon (1989) or Romer (1989) were used and by whether RGNP was expressed in per capita form. In all cases the logarithmic transformation was applied. Diebold and Senhadji (1996) showed that Dickey-Fuller tests applied to the four RGNP series over the whole period 1875–1993 yielded very strong rejections of the null hypothesis of difference-stationarity.

P. Newbold et al. / Journal of Economics and Business 53 (2001) 85–102

As mentioned above, Schwert (1989) documented severe size distortions of the Dickey-Fuller test for ARIMA(0,1,1) processes with large MA coefficients. [note 8] Could the rejection of the null of DS for RGNP by Diebold and Senhadji (1996) and others have been spuriously generated by a substantial moving average component? Despite the fact that RGNP most likely does not have a large MA root, we explore the possibility by considering autoregressive-moving average, ARMA(p,q), models as possible generators of the series of first differences. If $y_t$ denotes the logarithm of RGNP, these models are of the form

$$(1 - \phi_1 L - \cdots - \phi_p L^p)(\Delta y_t - \beta) = (1 - \theta_1 L - \cdots - \theta_q L^q)\,\varepsilon_t \tag{1}$$

where $L$ is the lag operator and $\varepsilon_t$ is zero-mean white noise. To ensure replicability of our results, the orders (p,q) were chosen through the Schwarz Bayesian Criterion (SBC), rather than by judgmental methods. This criterion is known to yield consistent estimators of (p,q) when the true model is in the contemplated set (Hannan, 1982). The parameters of (1) were estimated through full maximum likelihood, using a GAUSS subroutine, and all combinations of orders with $p + q \le 7$ were estimated.
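The role of the moving average polynomial in (1) can be illustrated with a short worked case. It is a sketch under an assumption that is not part of the text above: the series is taken to be a linear trend plus white noise, $y_t = a + \beta t + u_t$, where $a$ and $u_t$ are symbols introduced here for illustration only. Differencing gives

$$\Delta y_t = \beta + u_t - u_{t-1} \quad\Longrightarrow\quad \Delta y_t - \beta = (1 - L)\,u_t .$$

This is of the form (1) with $p = 0$, $q = 1$ and $\theta_1 = 1$, so the moving average polynomial $1 - \theta_1 L$ has a root at $L = 1$. In other words, differencing a series that is already stationary around a linear trend places a unit root in the moving average part of (1).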
For all four time series, SBC selected a model whose estimated parameters implied a unit moving average root, indicating overdifferencing in the model (1). [note 9] These results reinforce the Diebold-Senhadji evidence against difference-stationarity, and moreover do so with no prior assumption of a pure autoregressive generating model. However, our evidence is not entirely conclusive, as it is well known (Cryer and Ledolter, 1981; Shephard and Harvey, 1990; Davis and Dunsmuir, 1996) that maximum likelihood estimates can fall on the boundary of the invertibility region in the absence of a unit moving average root. Nevertheless, it is difficult to see how an investigator can proceed with a DS model of the form (1) in these circumstances. If that investigator is committed to linear models with fixed parameters, the next step would be to consider stationarity around a linear trend. We applied the test of Leybourne and McCabe (1994, 1996), in which the null hypothesis is TS and the alternative is DS. In no case was TS rejected at the usual significance levels (see Cheung and Chinn, 1997, for a similar finding over almost the same period).

The apparently strong evidence of TS over a period of 119 years is somewhat perplexing, as many analyses using post-World War II quarterly RGNP have supported a DS model. Moreover, Dickey-Fuller tests applied to the annual series over the period 1950–1993 failed to yield strong evidence against DS. [note 10] The test statistics are the t-ratios associated with the least squares estimate of $\gamma$ in the regression

$$\Delta y_t = \alpha + \beta t + \gamma y_{t-1} + \sum_{j=1}^{k} \delta_j \Delta y_{t-j} + \varepsilon_t \tag{2}$$

In line with the recommendation of Ng and Perron (1995), we chose $k$ in (2) through general-to-specific testing at the 10% level, with a maximum possible value of 5. The resulting test statistics are −1.993 for RGNP and −2.516 for RGNP per capita. Neither is significant at the 10% level.
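The t-ratio on $\gamma$ in the regression above can be computed by ordinary least squares. The following is a minimal sketch in plain Python, not the authors' implementation: the function names `ols` and `adf_t_stat` are hypothetical, the lag order `k` is fixed by the caller rather than chosen by general-to-specific testing, and the two demonstration series are simulated with arbitrary parameters rather than taken from the RGNP data.

```python
import math
import random


def ols(X, y):
    """Least squares via the normal equations; returns (coefficients, std. errors)."""
    n, m = len(X), len(X[0])
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(m)] for i in range(m)]
    xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(m)]
    # Invert X'X by Gauss-Jordan elimination with partial pivoting.
    aug = [xtx[i] + [1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        d = aug[c][c]
        aug[c] = [v / d for v in aug[c]]
        for r in range(m):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    inv = [row[m:] for row in aug]
    beta = [sum(inv[i][j] * xty[j] for j in range(m)) for i in range(m)]
    resid = [y[r] - sum(X[r][j] * beta[j] for j in range(m)) for r in range(n)]
    s2 = sum(e * e for e in resid) / (n - m)  # residual variance
    se = [math.sqrt(s2 * inv[i][i]) for i in range(m)]
    return beta, se


def adf_t_stat(y, k):
    """t-ratio on gamma in: dy_t = a + b*t + gamma*y_{t-1} + sum_j d_j*dy_{t-j} + e_t."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]  # dy[t - 1] holds y_t - y_{t-1}
    X, z = [], []
    for t in range(k + 1, len(y)):
        lags = [dy[t - 1 - j] for j in range(1, k + 1)]  # dy_{t-1}, ..., dy_{t-k}
        X.append([1.0, float(t), y[t - 1]] + lags)
        z.append(dy[t - 1])
    beta, se = ols(X, z)
    return beta[2] / se[2]


# Demonstration on simulated data (arbitrary, assumed parameters).
rng = random.Random(0)
n = 119
ts_series = [0.03 * t + rng.gauss(0.0, 0.05) for t in range(n)]  # trend + white noise
rw_series = [0.0]
for _ in range(n - 1):
    rw_series.append(rw_series[-1] + 0.03 + rng.gauss(0.0, 0.05))  # random walk with drift

print("trend-stationary series:", adf_t_stat(ts_series, k=1))
print("random walk with drift: ", adf_t_stat(rw_series, k=1))
```

The statistic has a nonstandard distribution under the unit root null, so it should be compared with Dickey-Fuller critical values rather than with the usual t tables.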
Nelson and Murray (1997) present further tests which, on balance, failed to provide strong evidence of trend-stationarity in the postwar period. One possible reaction to these results is that tests based on the subperiod will have considerably less power than tests applied to the full sample. That is, a longer span of data may allow one to detect trend reversion more readily. [note 11] A second possibility is that there could have been a structural change over time in the process generating the data over the longer period. [note 12] The true DGP may have changed between the prewar and postwar periods. A third possibility is that the existence of structural breaks leads to spurious results.

To explore these possibilities, and to assist in sorting out some of the complexities mentioned above when attempting to test DS against TS specifications, we develop a new diagnostic methodology. First, we applied the Dickey-Fuller test (lag = 5), implemented precisely as described in the previous paragraph, to all blocks of 44 consecutive observations, beginning with 1875–1918 and ending with 1950–1993, giving 76 Dickey-Fuller statistics. The results for the four RGNP series are graphed in Fig. 1. Two notable features of these graphs are (i) two periods of extreme volatility and (ii) a very wide range for the test statistics. Intuition would suggest that patterns of this kind are very unlikely to be found for trend stationary generating processes. Trend stationary processes would yield Dickey-Fuller values that are much larger in magnitude and more stable. Indeed, thoughtful graphical inspection can often alert one to at least the most extreme problems.

Fig. 1. Moving Dickey–Fuller statistics.

To check this intuition, we carried out a simulation experiment, in which series of 119 observations were generated from the second order autoregressive TS models given in Table 1 of Diebold and Senhadji (1996). [note 13] Moving Dickey-Fuller statistics, based on 44 observations, precisely as in Fig. 1, were calculated. For each of the 2000 replications, the standard deviations and ranges of the 76 Dickey-Fuller statistics were compared with the corresponding values from the actual time series. In the terminology of Tsay (1992), this can be viewed as a parametric bootstrap specification test of the TS model. The p-values for these tests are shown in Table 1. [note 14] For example, in only 1.6% of replications was the standard deviation of the moving Dickey-Fuller statistics greater than that for the actual GNP-R data. Put differently, in 98.4% of the 2000 replications the standard deviation of the 76 Dickey-Fuller statistics was less than that of the statistics in Fig. 1. Stated another way, if the actual data were TS, then one would obtain the actual moving Dickey-Fuller statistic pattern, or one more extreme, only 1.6% of the time for GNP-R. This means that one would reject the null of TS at the 1.6% level.

The moving Dickey-Fuller parametric bootstrap tests which we offer here, based on standard deviations, suggest that the adequacy of the TS specifications can be rejected at significance levels between 1.6% for GNP-R and 3.5% for GNP-BGPC, while the tests based on ranges generate rejections at levels of practically zero. It is for this reason that, though we concur with the conclusion of Diebold and Senhadji (1996) and others that DS over the whole 119-year period is unlikely, we are even more skeptical about the hypothesis of TS over this period. Our test procedure indicates that it is very unlikely that the Dickey-Fuller statistics of Fig. 1 could have come from a TS AR(2) model. The implausibility of Fig. 1 under TS can be seen graphically by comparison with Fig. 2, generated from a typical series simulated from the Diebold-Senhadji TS model for GNP-R. We chose a realization giving a standard deviation at the median over all replications.
The contrast with Fig. 1 is quite stark.

Table 1
p-values of parametric bootstrap tests of trend-stationary models, based on moving Dickey-Fuller tests

                 GNP-R    GNP-RPC    GNP-BG    GNP-BGPC
Std. deviation   0.016    0.028      0.025     0.035
Range            0.000    0.000      0.004     0.000

Fig. 2. Simulated moving Dickey–Fuller statistics.

This bootstrap method is not without limitations. One might argue that in order to test for power we should also reverse the null to be DS and redo the test. But issues of power are more of a concern when one is not able to reject the null of TS when DS is true. In the case where β (i.e., the likelihood of not rejecting TS, given that DS is true) is large, the power of the test will be low. [note 15] Reversing things and generating moving Dickey-Fuller tests from a DS series would yield p-values that are very large. That is, the likelihood of getting the observed actual series would be very high in this case. This means that the likelihood of not rejecting TS, given that DS is true (β), would be small. With a small value of β the power of the test would be high. [note 16] Regardless of the above, since our objective is to investigate the degree of evidence against TS, repeating our simulation with the current null of TS replaced by DS would yield little, as Diebold and Senhadji (1996) have already shown there is evidence against the DS model. Although the earlier Dickey-Fuller tests and ARMA models clearly cast doubt on the DS specification over the 119 years of data, from the moving Dickey-Fuller tests it appears equally unlikely that the series over 119 years was generated from a TS model. [note 17]

The progression over time of the graphs of Fig. 1 is interesting and suggestive. As observations from the early 1930s enter towards the end of a block of data, we find Dickey-Fuller statistics close to zero, suggesting, on the surface, virtually no evidence against DS.
If the true process were TS, we could view this as reflecting the finding of Perron (1989) of very low power of Dickey-Fuller tests in the presence of a structural break. By contrast, when this possible break is early in the block, the Dickey-Fuller statistics are very far from zero. Leybourne, Mills and Newbold (1998) have shown that, if the true generating process is DS with an early but not late break, spurious rejections by Dickey-Fuller tests will frequently occur.

Our own view is that the series of RGNP over the whole period 1875–1993 are neither trend-stationary nor difference-stationary. Indeed, a casual inspection of a graph of the data would make this point transparent (see Figs. 5 and 6, discussed in the next section). The aberrant behavior of the time series over the period 1930–1949 is a strong factor in arriving at this conclusion. The behavior of the series over these approximately twenty years is quite different from anything previously or subsequently. It is simply unreasonable to believe that the same generating regime operated over this period as elsewhere. As we have noted, it is precisely this period that generates the peculiar graphs of Fig. 1.

One possibility might be to attempt to model the entire 119-year series making special allowance for the 1930–1949 period. This might be achieved through allowing for outliers, structural breaks, increased volatility, and so on. The possibilities are virtually endless, and given that the offending period covers just twenty observations, a thorough analysis would quickly exhaust available degrees of freedom. Moreover, consideration of the history of the period, with depression ushered in by the Great Crash, the ensuing New Deal, the onset of World War II, and the following recovery period, suggests no simple structure would be adequate.
An extensive search might well reveal a model that reasonably fits this period, but to adopt such a model would be unscientific, as there is no comparable subsequent period over which to verify it. Our preference, followed in the next section, is to ignore the years 1930–1949 in assessing trend-stationarity.
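The bookkeeping behind the moving Dickey-Fuller bootstrap test of this section can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the sample standard deviation is assumed (the text does not say which divisor was used), and the numbers in the demonstration are made up. The sketch does not compute Dickey-Fuller statistics or simulate the AR(2) models, whose coefficients come from Table 1 of Diebold and Senhadji (1996) and are not reproduced here.

```python
import statistics


def moving_blocks(first_year, last_year, width):
    """Return (start_year, end_year) for every block of `width` consecutive years."""
    n_obs = last_year - first_year + 1
    n_blocks = n_obs - width + 1
    return [(first_year + i, first_year + i + width - 1) for i in range(n_blocks)]


def spread_measures(stats):
    """Sample standard deviation and range of a sequence of moving test statistics."""
    return statistics.stdev(stats), max(stats) - min(stats)


def bootstrap_p_value(observed, simulated):
    """Share of simulated values strictly greater than the observed value."""
    return sum(1 for s in simulated if s > observed) / len(simulated)


# Blocks of 44 annual observations over 1875-1993.
blocks = moving_blocks(1875, 1993, 44)
print(len(blocks), blocks[0], blocks[-1])

# Made-up illustration: 32 of 2000 simulated standard deviations exceed the observed one.
simulated_sd = [1.0] * 1968 + [3.0] * 32
print(bootstrap_p_value(2.0, simulated_sd))
```

In a full replication, each simulated series would be passed through the same 76 blocks as the actual data, and `spread_measures` would be applied to the resulting statistics before calling `bootstrap_p_value` once for the standard deviation and once for the range.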

4. US real GNP, 1875–1929 and 1950–1993