Manajemen | Fakultas Ekonomi Universitas Maritim Raja Ali Haji 073500106000000279

Journal of Business & Economic Statistics

ISSN: 0735-0015 (Print) 1537-2707 (Online) Journal homepage: http://www.tandfonline.com/loi/ubes20

A More Timely and Useful Index of Leading
Indicators
Robert H McGuckin, Ataman Ozyildirim & Victor Zarnowitz
To cite this article: Robert H McGuckin, Ataman Ozyildirim & Victor Zarnowitz (2007) A More
Timely and Useful Index of Leading Indicators, Journal of Business & Economic Statistics, 25:1,
110-120, DOI: 10.1198/073500106000000279
To link to this article: http://dx.doi.org/10.1198/073500106000000279

Published online: 01 Jan 2012.

Submit your article to this journal

Article views: 91

View related articles

Full Terms & Conditions of access and use can be found at

http://www.tandfonline.com/action/journalInformation?journalCode=ubes20
Download by: [Universitas Maritim Raja Ali Haji]

Date: 12 January 2016, At: 22:59

A More Timely and Useful Index of
Leading Indicators
Robert H. M C G UCKIN, Ataman O ZYILDIRIM, and Victor Z ARNOWITZ
The Conference Board Economics Program, New York, NY 10022 (a.ozyildirim@conference-board.org )
Effectively predicting cyclical movements in the economy is a major challenge. The U.S. leading index (LI) has long been used to analyze and predict economic fluctuations. We describe and test a new
procedure for making the LI more timely. The new LI significantly outperforms its older counterpart. It
offers substantial gains in real-time, out-of-sample forecasts of changes in aggregate economic activity
(real GDP, the index of coincident indicators, and industrial production) and provides timely and accurate
ex ante information for predicting not only business cycle turning points, but also monthly changes in the
economy.

Downloaded by [Universitas Maritim Raja Ali Haji] at 22:59 12 January 2016

KEY WORDS: Business cycle; Forecasting; Indicators; Leading index; Times series.


1. INTRODUCTION
Most of the macroeconomic data for the United States require considerable time to collect, process, and release. Lags
of 1 month are common for principal monthly indicators. For
the quarterly data, the lags are longer. Moreover, many of these
indicators are subject to sizeable and time-consuming revisions
that presumably reduce measurement errors but add to uncertainty and forecasting errors.
The actual recognition lags are considerably longer than
these publication lags, because it is difficult to recognize signals in noisy data. As a result, private and public decision and
policy makers are faced with long delays and much uncertainty
about the current (let alone the future) conditions of the economy. Although government is the main source of macroeconomic statistics, its access to the data is not much more timely
than that of the public.
At the same time, the publication lags and revision schedules vary greatly, and some indicators are available promptly.
The U.S. leading index (LI) includes stock prices and interest rate spreads that have no significant data lags and relatively
few, if any, revisions. The financial market indicators convey
a great deal of information with predictive value, yet until recently, these indicators were represented in the LI, not by their
most recent monthly values, but by their values in the preceding
month for which data for other indicators were also available.
This article evaluates a new procedure for calculating the
U.S. LI that combines its seven current financial and nonfinancial components with simple forecasts of the three remaining components that are available only with publication lags.
Real-time, out-of-sample tests demonstrate that the new, much

timelier LI is superior to the old LI with the same components.
Moreover, in contrast to some recent studies, here we show that
the LI provides significant ex ante information for predicting
changes in total output and other measures of economic activity. Some nonparametric rules and probability model applications of the composite LI performed well in forecasting recent
cyclical movements even using real-time data, as reported by
Filardo (2004). Other studies question the value of the LI and
emphasize selected financial indicators as better predictors of
business cycle turning points (Diebold and Rudebusch 1991;
Estrella and Mishkin 1998).
Section 2 describes the traditional method of constructing an
LI, sets out its rationale, and critiques it. It then discusses the

availability and currency of the components of the old index
and provides an outline of the procedures for making the new
more timely and useful LI.
Section 3 analyzes the structure of the underlying data and
the role of data revisions. It compares the historical LI calculated with the latest available data with successive vintages of
the old and new LIs. Using real-time data for each of their components and holding their composition constant, we make the
new and old LIs consistent and comparable, so that their relative predictive capabilities can be strictly assessed. However,
the new LI typically involves only 1-month ahead estimates that

closely approximate the missing data, which the old LI included
in the next month; hence the main difference between the old
and the new LIs is the better timeliness of the latter.
Section 4 introduces the current conditions index (CCI),
which the LI is designed to predict. It asks how well the LI predicts the historical and the real time estimates for CCI and real
gross domestic product (RGDP). The former are values taken
after all revisions of the data, whereas the latter are the published values on the date to which the original forecast applied.
To establish benchmarks for evaluating the value of the timelier LI, we specify a broad class of linear models to test the LI’s
performance in forecasts of growth rates of CCI and RGDP.
Using growth rates of the CCI and RGDP is in the spirit of the
indicator approach and, as a practical matter, is essential because there is very little difference in the ability of the new and
old LIs to distinguish turning points. We find that both the old
LI and the new LI provide improved real-time, out-of-sample
forecasts compared with an autoregressive benchmark model.
After evaluating the predictive accuracy of the old and new LIs,
we modify the model to assess the gains from the more timely
LI. Concentrating on the issue of timeliness, we find significant
gains to the new procedure.
We conclude in Section 5 by drawing together our findings
and placing them in the context of some other related findings in

the literature. Because of the large scope of our empirical work
and the need to economize on journal space, most of the details
of the analysis and measurement is relegated to an appendix,
available on The Conference Board website (www.conferenceboard.org/economics/workingpapers.cfm).

110

© 2007 American Statistical Association
Journal of Business & Economic Statistics
January 2007, Vol. 25, No. 1
DOI 10.1198/073500106000000279

McGuckin, Ozyildirim, and Zarnowitz: Index of Leading Indicators

2. CONSTRUCTION OF THE COMPOSITE
LEADING INDEX

Downloaded by [Universitas Maritim Raja Ali Haji] at 22:59 12 January 2016

2.1 Logic and Consequences of the Traditional Method

The indicators used to construct the LI tend to move ahead
of the business cycle as represented by the monthly coincident
indicators and the quarterly RGDP; for example, businesses
adjust hours before changing employment by hiring or firing,
and new orders for machinery and equipment are placed before
completing investment plans. Thus, by design, the composite
index of leading indicators should help predict changes in real
economic activity.
Business cycles vary in duration, magnitude, and consequences largely because their specific sources differ over time.
The multicausal, multifactor nature of the economic movements helps explain why the LI works better over time than
does any of its individual components: the average workweek,
initial claims for unemployment insurance, new investment
commitments (orders, contracts, housing permits), real money
supply, yield spread, stock prices, and consumer expectations.
The leading series themselves vary in terms of timing, smoothness, currency, and other features. The index gains from this diversification. However, many technical problems arise from this
diversity, perhaps none more vexing than those stemming from
the fact that some indicators are available promptly, whereas
others are available only with substantial lags.
The indicator approach, one of several techniques of business cycle analysis, has been a major component of the National Bureau of Economic Research (NBER) program since
the work of Burns and Mitchell (1946). The NBER, the Bureau

of Economic Analysis (BEA) of the U.S. Department of Commerce, and, until recently, The Conference Board successively
used the traditional method. It incorporated two rules. First, all
components of the LI refer to the same month. Second, only
actual data—no forecasts—are used. This long-followed procedure had its logic. The dataset used was time-consistent, because it covered the same period, as is usual in LI construction.
In addition, because it consisted of actual data, the LI avoided
errors inevitably associated with forecasting.
However, this methodology suffered from failing to use the
latest available financial and other data with presumably more
relevance and more predictive value than the data actually used,
which were 1 month older. Although forecasting the missing
variables introduces error, it also improves the timeliness of the
composite index.
The old procedure had no good way to cope with the serious
problem of missing data. The prevailing practice was to calculate the LI with a partial set of components, for example,
a minimum of 40–60% depending on the country according to the Organization for Economic Cooperation and Development (OECD) (see Nilsson 1987; also see http://www.
oecd.org/std/li1.htm). An equally arbitrary rule of at least 50%
of components was used in the United States until recently. Although any such rule allowed the LI to be more up to date,
all raised serious problems. First, there was a very undesirable
trade-off between the coverage and timing of the LI: the more
complete the coverage, the less timely the LI. Second, without

a full set of components, the volatility adjustment (standardiza-

111

tion) factors used to calculate the contributions of the components often changed dramatically depending on which series,
and how many of them, were missing.
The only effective way to avoid these problems while adhering to the rules of the traditional method was to delay issuing
the LI until preliminary data became available for all of its components. For the monthly indicators used in the most recent version of the traditional LI, this meant a production lag of almost
2 months.
2.2 The Gain in Timeliness From the New Method
In the old procedure, the LI released during the current
month (t) referred to the month (t − 2). In the new procedure,
implemented by The Conference Board since January 2001, the
LI released in the same month (t) refers to the month (t − 1).
For example, the old LI would be calculated in the first week of
March (t) for January (t − 2), the month with a complete set of
components, whereas the new LI is calculated in the third week
of March for February (t − 1), the month for which 70% of the
components are available and 30% are forecast. Before the introduction of the new method, users of the indicator approach
had to wait for 2 more weeks until April for the February LI.

This is a major gain in timeliness in a world where business and
government analysts revise and update their predictions nearly
every week.
Let Y be the vector of indicator series with data lags such that
they are not available in the current publication period. Variables in Y are generally data on real macroeconomic activity
and price indexes. Specifically, for the present U.S. LI, these include new orders for consumer goods and materials, new orders
for nondefense capital goods, and real money supply. (Nominal
money supply is available, but the personal consumption expenditure deflator used to adjust it is not.)
Let X be the vector of the indicator series available for the
most recent complete month. These include the promptest financial indicators such as stock prices, bond prices, interest
rates, and yield spreads. They also include many other, less
prompt but frequently reported series. Seven of the 10 components of the U.S. LI fall into this category.
A simple formalization of the old and new indexing procedures may be given as
LI old
t,t−2 = I(Xt−2 , Yt−2 )

(1)

ˆ new
LI

t,t−1 = I(Xt−1 , Ŷt−1 ).

(2)

and

Here I(·) denotes the indexing procedure used; LI t,t−i = I(·)
represents the value of the LI for the month t − i published in
the month t, where i denotes the publication lag. The first subscript in the LI gives the month of release; the second subscript,
the month of the target of the forecast or the reference data.
The symbol ˆ refers to a magnitude based at least in part on
some kind of forecasting. Herein we take I(·) to be fixed and
identical to The Conference Board’s indexing procedure. The
Conference Board follows long-standing practice of using standardization factors to equalize the volatility of the LI components so that relatively more volatile series do not exert undue
influence on the LI (for details of indexing, see The Conference
Board 2001.)

112

Downloaded by [Universitas Maritim Raja Ali Haji] at 22:59 12 January 2016


2.3 The Costs and Benefits of the New Procedure
The old LI performed its forecasting function with errors due
largely to missing data and other measurement problems. In the
new LI, some of these problems are reduced, but new errors are
introduced by the forecasts of Ŷt−1 .
Using Xt−1 instead of Xt−2 results in a substantial advantage
of greater timeliness, because the new LI is available more than
half a month earlier than the corresponding old LI. Thus, for the
new procedure to be preferred, it is only necessary that the erˆ new not be so large as to negate that advantage. In other
rors of LI
ˆ new must do a better job of forecasting the economy
words, LI
old
than LI . Although not assured, this is likely for the following
reasons. First, the Ŷt−1 forecasts are typically short, and hence
they should produce relatively small errors. Second, the individual errors of the components of the vector Ŷt−1 may offset one
another when combined to form the composite index. Third, for
any forecast horizon, the new LI will be 2 weeks closer to it.
The Ŷt−1 forecasts for the United States are restricted to
1-month-ahead forecasts, but for other countries, multistep
forecasts for some Y variables are necessary. In most foreign
countries there are fewer weekly and monthly and more quarterly and annual series, and for this and other reasons the lags
tend to be longer, up to 3–5 months. Hence, the need for the new
procedure is even greater outside the United States than for the
United States, although the potential errors from forecasting the
Y variables are greater as well.
Of course, there are numerous ways to forecast Yt . However,
the advantages of simplicity, stability, and low cost argue for
concentrating on easily implemented autoregressive models. Of
these, the simple second-order model excelled and was adopted
after passing some fairly strict tests. For practical reasons associated with production of the LIs on a monthly basis, frequent
changes in the forecast model are avoided, and The Conference
Board uses the same model for fixed periods of at least a year or
two, but reestimates it every month (see McGuckin, Ozyildirim,
and Zarnowitz 2001).
3. COMPARING THE LEADING INDEXES AND THEIR
PREDICTIVE TARGETS
3.1 Calculating the New and Old Indexes
in a Consistent Manner
To compare the new and old LIs on a consistent basis, we
calculated the LIs using real-time data for each of the 10 series
that compose the current LI published monthly by The Conference Board. This puts the new and old LIs on strictly equal
footing, eliminating all changes in composition or methodology
(e.g., base years, standardization factors) and hence all possible
discontinuities or differences due to these factors.
The important issue of the consequences of changes in the
composition of the LI has been treated in a separate article.
McGuckin and Ozyildirim (2004) found that revisions to the
components of the LI are typically undertaken because of data
problems (e.g., discontinued series or quality deterioration)
and to account for changes in economic structure. These revisions generally resolve the problems with the data or structural
changes. However, our present task of evaluating the new LI

Journal of Business & Economic Statistics, January 2007

relative to the old, less timely LI calls for tests of composite indexes with identical, constant composition; the past changes in
index composition are not material to this task.
The real-time data used in this study were first electronically
archived in 1989 by the former Statistical Indicators Division
of the U.S. Department of Commerce and, since 1995, by The
Conference Board. The data available in January 1989, called
the “January 1989 vintage,” consist of a monthly sample covering the period January 1959–November 1988. Each consecutive
monthly vintage adds the next month’s observation. The dataset
used in this article covers vintages from January 1989 through
September 2002. Hence 165 vintages and 165 corresponding
sets of data are used to create an equal number of versions of
the LI, each starting in January 1959.
3.2 Comparing the Old and New Indexes
ˆ new for the period December 1988–August
Figure 1 plots LI
old
2002 and LI for November 1988–July 2002. Each point on
the graph, for both the new LI and the old LI, represents the
end value of one of the 165 vintages of data. In addition, Figure 1(a) shows the “historical LI,” that is, the last vintage that incorporates all revisions to date. Note that the chart provides the
data for the old and new LIs for the same month, even though
the new LI would be published somewhat earlier. Section 4.5
amends the analysis to assess the improvement from timeliness.
By construction, any difference between the old and new LIs
must be due to the effects of data revisions and forecasting the
missing data. But at any time, historical postrevision values account for the great bulk of all observations and are generally
identical in the different vintages. Forecasts for missing data
apply only to the most recent values (end months for each vintage) of three of the components of LI. This explains why the
differences between the vintages tend to be so limited.
The historical index deviates from both the old LI and the
new LI by much more than these LIs deviate from each other,
but mainly in the level, not in the pattern of cyclical change,
which is shared by all three series. This is well illustrated in
Figure 1 for the period since 1989, which covers two recessions
and the long expansion between them. But it applies to the entire period 1959–2002 with its seven business cycles, as shown
in Figure 2. The patterns of cyclical change are virtually identical in the different vintages of the new LI; moreover, even the
short irregular fluctuations appear very similar. The successive
vintages show an upward tilt. For example, the historical index,
which is the last vintage (September 2002), shows the strongest
upward trend. Underestimation of growth and inflation appears
to be a frequent characteristic of preliminary data in long expansions like the 1960s and 1990s (cf. Zarnowitz 1992, chap. 13).
Figure 1(b) shows the percent differences between the
two real-time level series plotted in Figure 1(a). It is evident that these differences are very small most of the time;
ˆ new and LI old practically overlap. Further, the time series of
LI
the differences is essentially random and has little if any bias;
the positive and negative differences balance out each other.
The randomness of the differences between the old LI and
the new LI is a welcome feature, because it means that forecasting the missing components of the index introduces no net
systematic error; so is the fact that the discrepancies are generally small (fractions of 1%). This suggests that any errors

McGuckin, Ozyildirim, and Zarnowitz: Index of Leading Indicators

113
(a)

Downloaded by [Universitas Maritim Raja Ali Haji] at 22:59 12 January 2016

(b)

(

Figure 1. Old and New Composite LIs, November 1988–August 2002. (a) Index (1987 = 100) ( , new LI; , old LI;
, ratio, old to new). Mean, .027244; standard deviation, .227264.

caused by the new procedures are likely to be more than offset
by improved timeliness. Nonetheless, the differences in timing
between the two indexes occasionally cause them to differ significantly because of large benchmark revisions in some comˆ new uses prebenchmark
ponents; in a few scattered months, LI
old
data, whereas LI uses postbenchmark data.
Figure 1 also shows that the old and new real-time leading
composites available since 1989 can hardly be distinguished
by their cyclical timing. This, too, is good news, because it
confirms the robustness and validity of our dating procedure.
Because their cyclical timing is essentially identical, the new
and old LIs cannot be distinguished by how well they anticipate the onset of business cycle recessions and recoveries. With
the same composition, they face the same data and produce the
same leads. But the new procedure reduces the delays involved
in publication and results in a timelier index, which should
prove helpful in actual recognition of turning points. Because
differences in forecasting the turning points are not an issue, all
tests are based on how well the new LI versus the old LI does
in forecasting times series that represent total economic activity. These tests are both more general and more demanding than
turning point comparisons; they use a regression framework and
distinguish quantitatively between historical and real-time data.
4. HOW WELL DOES THE LEADING
INDEX PREDICT?
4.1 Data, Testing Standards, and Procedures
The LI is widely used as a tool to forecast changes in the direction of aggregate economic activity, particularly the business

, historical LI). (b) Percent

cycle turning points, whose dates are determined historically
by the NBER. In this task, the NBER relies to a large extent on
the monthly coincident indicators, which together constitute the
CCI: nonfarm establishment employment, real personal income
less transfers, real manufacturing and trade sales, and industrial
production. Hence business cycle peaks and troughs are well
approximated by the dates of peaks and troughs in the CCI (for
details, see Zarnowitz 2001). This is documented in Figure 3,
which also shows that the LI leads the CCI at all business cycle
peaks and troughs, and that the CCI and RGDP, which is the
most comprehensive measure of U.S. output, are very closely
associated.
Because our principal purpose is to compare the accuracy of
monthly forecasts, a true monthly index such as the CCI is a
particularly appropriate target measure of economic activity to
be predicted. The LI was developed to predict the CCI, and the
relationship between the two composite indexes is of major analytical interest. The CCI is composed of several variables (not
just output), all of which cover important economic activities.
The RGDP is subject to long strings of revisions, which are often large; the CCI is revised less, partly because the revisions of
its components frequently offset each other. Researchers often
use the LIs to predict changes in industrial production (IP), particularly in Europe. These forecasts perform well. But because
IP covers a relatively small and declining part of the economy
(manufacturing, mining, and utilities), we report tests based on
IP only in an appendix available on The Conference Board’s
website.
Although these points argue in favor of using the CCI rather
than the RGDP, because of the interest in and importance of

Journal of Business & Economic Statistics, January 2007

Downloaded by [Universitas Maritim Raja Ali Haji] at 22:59 12 January 2016

114

Figure 2. Historical LI (as of September 2002) and Three Selected Vintages of the New LI, January 1959–August 2002.

the RGDP as a comprehensive measure of economic activity,
we undertake tests with both variables. Because no reliable
monthly RGDP data are available (estimates of monthly RGDP
by Macroeconomic Advisers, a private forecasting firm, are
limited to the 1990s and are not widely accepted), we work with
quarterly LI, even though this transformation causes a consid-

erable loss of information. Alternatively, interpolations of quarterly to monthly RGDP to take advantage of the fact that the
leading indicators are monthly can adversely affect the results.
This is because the interpolations arbitrarily smooth RGDP,
which is the series used both as the dependent variable and,
lagged, as one of the explanatory variables. The results estab-

Downloaded by [Universitas Maritim Raja Ali Haji] at 22:59 12 January 2016

McGuckin, Ozyildirim, and Zarnowitz: Index of Leading Indicators

115

Figure 3. U.S. Current Conditions Index, U.S. Leading Index and Real GDP January 1959–August 2002. The shaded areas represent U.S.
busines cycle recessions as dated by the National Bureau of Economic Research. The latest shading relates to the recession of 2001 and is dated
according to the cyclical contraction of the CGI (the U.S. current conditions or coincident index). P denotes the specific-cycle peaks and T the
troughs in the Leading and Current Conditions Indexes. The numbers at the P and T markings denote the leads or lags in months at the business
cycle peaks and troughs respectively.

lish the usefulness of the LI as a forecasting tool for both the
CCI and the RGDP; the LI adds important information to forecasts of growth in either measure of economic activity. It is particularly important to know that the LI improves forecasts of the
CCI, because the procedure for making the LI timelier is best
evaluated with high-frequency monthly data.
In determining whether the LI (old or new) adds useful information to forecasts of basic measures of aggregate economy,
we use the following standard: LI should improve on simple autoregressive forecasts for the monthly and quarterly measures
of aggregate activity such as IP, CCI, and RGDP. We first confirm the well-known and much-tested finding that the historical
LI improves on the autoregression of the target variable. This
has been done in an earlier version of this study (McGuckin et
al. 2001, pp. 15–19) and is not restated here. We then compare
the old and new leading LIs using out-of-sample, real-time tests
that ignore the potential gains from the greater timeliness of the
new LI. This provides a good benchmark for isolating the gains
to timeliness in Section 4.5.
The regressions have the same structure for each vintage and
are used to create a sequence of forecast errors based on differences between the predicted and the historical values of the
corresponding actual growth rates in the target economywide
aggregate. The mean squared error (MSE) is our preferred measure of accuracy. The same procedure is applied in a series
of exercises that vary the forecast horizons (1, 3, or 6 months
ahead), spans over which growth is measured in the estimating
equation (1, 3, 6, or 9 months) the number of lags in the dependent variables (1, 3, 6, or 9) for CCI forecasts. In RGDP forecasts, the horizons vary from 1 to 2 and 3 quarters, the growth

rate spans vary from 1 to 2, 3, and 4 quarters, and the number
of lagged terms varies from 1 to 2, 3, and 4.
The forecast regression models are specified in changes in
natural logarithms for all variables: RGDP, the coincident index, and the LIs. This conforms to the way in which analysts
generally use the recent growth rates in the LI to detect signals
regarding the direction of economic activity. In addition, it also
avoids spuriously high correlations due to common trends that
obtain in the levels of the indexes. As noted by Camacho and
Perez-Quiros (2002, pp. 62–63), the augmented Dickey–Fuller
test cannot reject the null hypothesis of a unit root in the levels
of the LI series, but is consistent with stationarity of log differences of LI.
All of our tests use as predictors exclusively real-time data
that would have been available to the contemporary forecaster
at the time. [Real-time data were obtained from the website
of the Federal Reserve Bank of Philadelphia for RGDP (see
Croushore and Stark 2001) and for the CCI and LI from The
Conference Board.] This has the advantage of reproducing with
reasonable fairness the actual forecasting situation.
For both RGDPt and CCI t , two alternative versions of the
variable are forecast. One version uses the historical (i.e., revised data available at present), and the other uses the realtime data (i.e., the preliminary estimates available soon after
the forecast). The latter approach is more realistic and tends to
produce smaller errors because it absolves the forecaster from
predicting the cumulative effects of future revisions of the data
(which may or may not be predictable). However, using the historical target data follows a rather common practice of requiring
the forecaster to use preliminary estimates to predict data in-

116

Journal of Business & Economic Statistics, January 2007

corporating future revisions in the target variable. Thus the first
approach allows a comparison of our results to those from other
studies that pursue the same strategy (see in particular Diebold
and Rudebusch 1991). The revised data are believed to be closer
to the truth. The use of revised data in lagged values of the dependent variable gives the autoregressive element an advantage
compared with the contribution of the LI term, which is based
on preliminary data. Assessments of the forecasts thus mix forecasting and measurement errors, which makes it more difficult
for the LI to improve the forecast (see also Amato and Swanson 2001; Croushore and Stark 2002; Robertson and Tallman
1998).

No effort was made to optimize the predictive regression
specifications (this belongs in another article). Rather, we tried
to get a sufficiently comprehensive and diverse picture of what
the alternative LIs—historical and real time, old and new—can
contribute, even under relatively unfavorable conditions. This
approach—looking at a broad and symmetric set of models—
was modified in one way; only results for models for which the
span of growth in the variables in the model is greater than or
equal to the forecast horizon are reported. Longer forecasts are
not well served by short growth rates, and using short spans in
the longer forecasts provided unreliable results.
4.3 Out-of-Sample Forecasts of Changes in RGDP

4.2 Testing Models Formalized

Downloaded by [Universitas Maritim Raja Ali Haji] at 22:59 12 January 2016

q

To give the models a formal representation, let j xtm (j xt )
denote the growth rate of CCI (RGDP) over the past j months
(quarters) ending in month (quarter) t. To provide a standard
for evaluating the forecasting power of the LI, a simple autoreq
gressive equation is used as a benchmark in which j xtm (j xt )
q
m
is related to its own lags, j xt−k (j xt−k ), with the number of
lagged terms, k, varying from 1 to 3, 6, and 9 months or from 1
to 2, 3, and 4 quarters.
Consider, for example, the benchmark equation for the
monthly CCI, which takes the form
j CCI t = α1 +

k


β1,i j CCI t−i−p+1 + ε1,t ,

i=1



k, j = 1, 3, 6, 9
p = 1, 4, 7.

(3)

There follow tests of whether adding lags of the old or new LI
to this equation reduces out-of-sample forecast errors. Equation (4) adds lags of the old LI to the benchmark equation (3),
j CCI t = α2 +

k


β2,i j CCI t−i−p+1

i=1

+

k


δ2,i j LI old
t−i−p+1 + ε2,t ,

i=1



k, j = 1, 3, 6, 9
p = 1, 4, 7.
(4)



k, j = 1, 3, 6, 9
p = 1, 4, 7.
(5)

Equation (5) adds lags of the new LI instead,
j CCI t = α3 +

k


β3,i j CCI t−i−p+1

i=1

+

k

i=1

δ3,i j LI new
t−i−p+1 + ε3,t ,

This gives 16 different combinations of the spans of growth
rates ( j) and number of lags (k) for each of the foregoing
three models. We repeat the same exercise for forecasts 3 and 6
months ahead ( p = 4 and 7). This provides us with 48 forecast exercises classified by three factors: the length of forecast
horizon, the number of the lagged explanatory terms, and transformation of the data (span of the growth rates). Using analogous equations in the quarterly frequency, we undertake also 48
forecast exercises for the RGDP.

4.3.1 Models With LI versus Autoregressive Models. Table 1 gives the MSEs averaged across all corresponding equations used to compute the forecasts of the growth rates of
RGDP. Columns 1 and 2 refer to the 1-, 2-, and 3-quarter-ahead
forecasts of the benchmark autoregressive model. Next, in the
same order, are the results for the models that add the old LI
(columns 3–5) and the new LI (columns 6–8). Columns 3 and 6
show the average levels of the MSEs across the models in each
horizon. Columns 4 and 7 show the ratios of these summary error measures to the corresponding number for the autoregressive model, which is set equal to 1 in column 2. Entries in
columns 5 and 8 are the number of models in each category
in which adding the LI to a forecast equation has significantly
reduced the out-of-sample errors at the 5% level.
The main finding of Table 1 is that the autoregressive forecast
errors are reduced significantly by the addition of the LI terms
in each line (i.e., on the average across all models for each horizon), regardless whether the target is the historical or the realtime RGDP growth rate. Using new LI data reduces the errors
slightly more than using the old LI in five of the six cases (compare columns 4 and 7, remembering that the lower the ratio, the
greater the error reduction). The only exception is the 2-quarterahead forecast of real time RGDP growth rate (line 5), where
the addition of a new LI team produces a minutely smaller improvement than the addition of an old LI term.
Real-time data predict real-time RGDP growth rates much
more accurately than they predict historical RGDP growth
rates. The average MSEs in lines 4–6 are 22–50% lower than
the corresponding entries in lines 1–3. This is what one would
expect, because, as hinted earlier, forecasting the measurement
errors to be eliminated by the future RGDP revision is probably
a difficult and possibly impossible task. (These errors should
be largely random or else they could have been predicted by the
compilers of the national income and product accounts themselves.) The contributions of the LI terms are larger in lines
1 and 2 than in lines 4 and 5, but larger in line 6 than in line 3.
The leading indicators are more helpful in the intermediate
forecasts than in the shortest forecasts, and the noisy measurement errors (as estimated by the data revision) hurt the shortest
forecasts the most.
4.3.2 Significance Tests. Our tests involve comparing the
predictive accuracy in nested models (i.e., an autoregressive
model vs. an alternative obtained by adding the LI). Therefore, a standard test of predictive accuracy, such as the popular Diebold–Mariano (DM) test (Diebold and Mariano 1995),

McGuckin, Ozyildirim, and Zarnowitz: Index of Leading Indicators

117

Table 1. Out-of-Sample Forecasts of Log Changes in U.S. RGDP: Forecast MSEs for Autoregressive Benchmark Models and
for Models With the LI, 1989 Q1–2002 Q3

Line

Forecast horizon,
quarters (k)

RGDP alone
Level
Ratio
(1)

Downloaded by [Universitas Maritim Raja Ali Haji] at 22:59 12 January 2016

Historical RGDP predicted with real-time data
1
1
6.747
(2.875)
2
2
11.726
(3.446)
3
3
18.488
(3.113)
Real-time RGDP predicted with real-time data
4
1
3.387
(.963)
5
2
7.983
(1.653)
6
3
14.464
(2.027)

Average MSEs of regression equations that use:
RGDP and old LI
Level
Ratio
Significance
Level

RGDP and new LI
Ratio
Significance

(2)

(3)

(4)

(5)

(6)

(7)

(8)

1.00

5.339
(1.986)
9.269
(1.945)
15.099
(1.993)

.79

4/16

.77

6/16

.79

11/12

.76

11/12

.82

8/8

5.179
(1.948)
8.863
(1.853)
14.664
(1.970)

.79

8/8

1.00

3/16

.98

3/16

.83

9/12

.84

10/12

.80

8/8

3.324
(.748)
6.679
(.809)
11.361
(1.037)

.79

8/8

1.00
1.00

1.00
1.00
1.00

3.379
(.747)
6.662
(.842)
11.621
(1.056)

NOTE: The entries in columns 1, 3, and 5 are averages of the MSEs in each category (16 for k = 1, 12 for k = 2, and 8 for k = 3, where k is the forecast horizon in quarters); those in
parentheses are the corresponding standard deviations of the MSEs in each category. The entries in columns 2, 4, and 6 are ratios, the average MSE in each class is divided by its counterpart for
the autoregressive model. Real-time data on RGDP was obtained from the website of the Federal Reserve Bank of Philadelphia (see Croushore and Stark 2001). Entries in columns 5 and 8 are
the number of models in each category where the LI has significant forecast ability at the 5% level. Significance results are based on the Mp test statistic (CCS) developed by Chao et. al. (2001).
An alternative test statistic proposed by Clark and McCracken (2001) gave similar results. The detailed results for both test statistics in each forecast model are reported in the appendix on The
Conference Board website’s appendix (see endnote 2).

is not appropriate, because under the null of equal predictive
ability, the limiting distribution of the DM test is not normal
(see McCracken 2000; Clark and McCracken 2001).
Fortunately, there is a growing literature on significance tests
for choosing between nested models (see Corradi and Swanson
2003 for a review.) We used the out-of-sample errors from our
forecast models to calculate a test statistic, CCS, developed by
Chao, Corradi, and Swanson (2001), which is easy to construct
and has a chi-squared distribution with degrees of freedom
equal to the number of extra explanatory variables, assuming
that the parameter estimation error vanishes. (See also Corradi
and Swanson 2002 for a more general version of this test.)
An alternative encompassing test statistic proposed by Clark
and McCracken (2001), which is also easy to construct but has
a nonstandard distribution, gave similar results. In the summary
tables, we report results based on the CCS test, because, unlike
the Clark and McCracken test, it is applicable to all forecast
horizons. Moreover, the CCS test is robust to dynamic misspecification. The detailed results for both test statistics in each
forecast model and horizon combination are reported in the appendix available on The Conference Board website, as mentioned in Section 1.
For 1-quarter-ahead forecasts, the addition of old LI lowers the forecast errors significantly at the 5% level for only 4
of the 16 models (see line 1, column 5). The addition of the
new LI does only a little better, reducing the errors significantly
for 6 of the 16 models (line 1, column 8). This is so when historical RGDP is predicted; for real-time RGDP, the results are
worse, with 3 of the 16 models gaining from the addition of
either the old LI or the new LI terms. However, significant improvements are obtained for as many as 11 of the 12 models,
for the 2-quarter-ahead and for all 8 models for the 3-quarterahead forecasts of historical RGDP with both the old and new
LIs (see lines 2 and 3, columns 5 and 8). The results for predicting real-time RGDP are almost as positive for the 2-quarterahead forecasts and equally positive for the 3-quarter-ahead

forecasts (lines 5 and 6). This provides additional evidence that
the leading indicators are particularly valuable for predictions
over longer horizons.
4.4 Out-of-Sample Forecasts of Changes in
the Current Conditions Index
4.4.1 Models With LI versus Autoregressive Models. Table 2, which has the same format as Table 1, demonstrates
that the U.S. LI systematically improves on the autoregressive benchmark model for the CCI. The overall MSEs (averages for 1-month-ahead, 3-month-ahead, and 6-month-ahead
predictions) are substantially lower for the regressions with
lagged CCI and LI terms than for the regressions with lagged
CCI terms only. This is true not only for the real-time old LI
(columns 3 and 4), but also for the new LI, even when the timeliness advantage of the latter is ignored (columns 6 and 7).
Again, in each case the entries in odd columns of part B
are lower than the corresponding entries in part A: real-time
data predict real-time CCI more accurately than they predict revised historical CCI. This parallels our finding for RGDP in Table 1. Also, adding LI lowers the forecasting errors compared
with those of the autoregressive benchmark models more for
the comparisons with the historical target series than for the
comparisons with the real-time target series (cf. the ratios in
columns 4 and 7). Unlike Table 1, Table 2 shows no exceptions
from the latter rule. The appendix on The Conference Board
website reports all of the underlying details and lists the MSEs
for all of the forecast exercises.
4.4.2 Significance Tests. The tests of significance
summed in Table 2, columns 5 and 8, also parallel those in
Table 1 in terms of format and results. Again, they are the
weakest for the shortest forecasts, especially so when real-time
CCI growth for the next month is predicted (cf. lines 1 and 4).
For 3-month-ahead forecasts, LI decreases the errors from the

118

Journal of Business & Economic Statistics, January 2007

Table 2. Out-of-Sample Forecasts of Log Changes in the CCI: Forecast MSE, for Autoregressive Benchmark Models and
for Models With the LI, January 1989–October 2002

Line

Forecast horizon,
months (k)

CGI alone
Level
Ratio
(1)

Downloaded by [Universitas Maritim Raja Ali Haji] at 22:59 12 January 2016

Historical CCI predicted with real-time data
1
1
2.182
(1.184)
2
3
5.112
(2.071)
3
6
11.191
(2.414)
Real-time CCI predicted with real-time data
4
1
.997
(.448)
5
3
3.252
(1.030)
6
6
9.089
(1.673)

Average MSEs of regression equations that use:
CCI and old LI
Level
Ratio
Significance
Level

CCI and new LI
Ratio
Significance

(2)

(3)

(4)

(5)

(6)

(7)

(8)

1.00

1.945
(.989)
3.958
(1.251)
8.516
(1.145)

.89

9/16

.90

8/16

.77

12/12

.79

12/12

.76

8/8

1.966
(1.000)
4.040
(1.317)
8.751
(1.255)

.78

8/8

.92

3/16

.93

3/16

.86

11/12

.88

11/12

.90

8/8

.930
(.396)
2.850
(.723)
8.286
(1.061)

.91

8/8

1.00
1.00

1.00
1.00
1.00

.914
(.388)
2.813
(.685)
8.155
(.970)

NOTE: The entries in columns 1, 3, and 5 are averages of the MSEs in each category (16 for k = 1, 12 for k = 3, and 8 for k = 6, where k is the forecast horizon in months); those in parentheses
are the corresponding standard deviations of the MSEs in each category. The entries in columns 2, 4, and 6 are ratios, the average MSE in each class is divided by its counterpart for autoregressive
model. Real-time data on the CCI are obtained from The Conference Board archives. Note that because of methodological changes in 1993 and 1997, we have reconstructed the real-time CCI using
real-time data for 1989–1996 vintages but with the current methodology to match the vintages from 1997–2002. (See “Business Cycle Indicators: Upcoming Revision of the Composite Indexes,”
Green and Beckman 1993, pp. 44–51, and The Conference Board 2001, p. 57.) Entries in columns 5 and 8 are the number of models in each category where the LI has significant forecast ability
at the 5% level. Significance results are based on the test statistic (CCS) developed by Chao et. al. (2001). An alternative test statistic proposed by Clark and McCracken (2001) also gave similar
results. The detailed results for both test statistics in each forecast model are reported in the appendix on The Conference Board website’s appendix (see endnote 2).

autoregressive models in all but 1 of the 24 cases; for 6-monthahead forecasts, LI is successful in all 16 cases.
We conclude that the leading indicators display greater net
predictive power over longer horizons. They also improve on
the autoregressive benchmark forecasts more when historical
targets are used than when real time targets are used. All of this
applies to the CCI forecasts, as well as to the RGDP forecasts.
4.5 Improved Timeliness Generates Gains in
Forecasting Accuracy
The tests so far have neglected the fact that the target period
has a different date for the old LI than for the new LI. For example, “1-month ahead” means something different for the two
indexes; thus to forecast growth in the CCI in March, the old
LI requires a 2-month lead (from January), whereas the new LI
reaches the same target month in 1 month (from February). The
new LI shortens the appropriate forecast horizon, but the shorter
the forecast, generally the smaller the forecast errors. We now
extend the analysis to evaluate the gains from the earlier release
of the LI. To do this, we need to measure explicitly how the two
LI forecast the same target period.
As an example, consider a 1-month-ahead forecast for
j CCI t . There are two ways to use j LI old
t−2 to forecast j CCI t .
The first is a direct (one-step) prediction in the regression
framework: j LI old
t−2 and j CCI t−2 are used to jointly predict
j CCI t . The second is a two-step prediction; a second-order
autoregressive model is used to estimate j LI old
t−1 , and then the
latter value, along with j CCI t−1 , is used to predict j CCI t .
Table 3 reports the results where the difference in timeliness
is accounted for with the old LI and replicates the results for the
new LI from Table 2 (columns 6 and 7). The second approach
yields somewhat more accurate results, as can be seen in Table 3
(compare columns 3, 4, and 5 with columns 6, 7, and 8).
When the dates of the target months are made identical for
the two indexes, the old LI is in effect made to forecast 1 month

further ahead. Therefore, the MSEs of models using the old
LI increase substantially (cf. Table 2, column 3 and Table 3,
columns 3 and 6). Now the out-of-sample errors of forecasts
with the new LI are smaller than the errors of the corresponding forecasts with the old LI in 11 of the 12 available comparisons (cf. Table 3, column 1 with columns 3 and 6; the only
exception is in line 4, where the entry in column 6 is slightly
less than the entry in column 1). However, the DM tests comparing the errors of the models using the old LI with the errors
of the models using the new LI (columns 5 and 8) indicate that
for most of the short forecasts, the differences are not significant at the 5% level. (Here the alternative models, one with the
new LI and the other with the old LI, are nonnested models.
The null hypothesis is that the errors from the latter are equal
to those from the former. The DM statistic is calculated using
the Newey–West heteroscedasticity and autocorrelation consistent standard errors with lag truncation equal to 4.) That is, the
null hypothesis that the two sets of errors are equal cannot be
rejected. Nonetheless, at longer horizons of 3 and 6 months, the
performance of the new LI improves, especially when predicting real-time CCI; that is, in about two-thirds of the cases, the
new LI leads to significantly lower out-of-sample errors. When
the difference in timeliness is ignored, both the old and the new
LIs lead to similar reductions in mean MSEs (6–10%) compared with the benchmark autoregressive model. Based on the
DM statistic the null hypothesis that forecast errors are equal
cannot be rejected. Compared with the similarity in the forecast
accuracy between the old and new LIs when their timeliness is
ignored, the results demonstrate that when the same month is
targeted by both LIs, the old LI suffers from being less timely
than the new LI.
5. CONCLUDING THOUGHTS
The main objective of this article is to evaluate a new procedure designed to make more efficient and complete use of the

McGuckin, Ozyildirim, and Zarnowitz: Index of Leading Indicators

119

Table 3. Out-of-Sample Forecasts of Log Changes in the CCI: Forecast MSE, for Autoregressive Benchmark Models and
for Models With the LI, January 1989–October 2002

Line

Forecast horizon,
months (k)

CCI and new LI
Level
Ratio

Downloaded by [Universitas Maritim Raja Ali Haji] at 22:59 12 January 2016

(1)
Historical CCI predicted with real-time data
1
1
1.966
(1.000)
2
3
4.040
(1.317)
3
6
8.751
(1.255)
Real-time CCI predicted with real-tune data
4
1
.930
(.396)
5
3
2.850
(.723)
6
6
8.286
(1.061)

Average MSEs of regression equations that use:
Direct forecast,
CCI and old LI
Level
Ratio
Significance
Level

Two-step forecast,
CCI and old LI
Ratio
Significance

(2)

(3)

(4)

(5)

(6)

(7)

(8)

.90

2.370
(1.251)
4.685
(1.642)
9.893
(1.872)

1.09

6/16

.91

5/16

.92

4/12

.83

6/12

.88

5/8

1.983
(1.001)
4.222
(1.374)
9.343
(1.380)

.83

6/8

1.445
(.694)
3.928
(1.217)
9.695
(1.710)

1.45

13/16

.93

0/16

1.21

12/12

.93

10/12

1.07

5/8

.98

5/8

.79
.78

.93
.88
.91

.927
(.388)
3.039
(.721)
8.888
(1.169)

NOTE: See the notes for Table 2. The entries in columns 1, 3, and 5 are averages of the MSEs in each category; those in parentheses are the corresponding standard deviations of the MSEs in
each category. The entries in columns 2, 4, and 6 are ratios: the average MSE in each class is divided by its counterpart for the autoregressive model from Table 2, column 1. Entries in columns
5 and 8 are the number of models in each category where the predictive ability of the new LI (column 1) is significantly greater than that of the old LI (colum