
Journal of Business & Economic Statistics

ISSN: 0735-0015 (Print) 1537-2707 (Online) Journal homepage: http://www.tandfonline.com/loi/ubes20

The Henderson Smoother in Reproducing Kernel
Hilbert Space
Estela Bee Dagum & Silvia Bianconcini
To cite this article: Estela Bee Dagum & Silvia Bianconcini (2008) The Henderson Smoother in
Reproducing Kernel Hilbert Space, Journal of Business & Economic Statistics, 26:4, 536-545,
DOI: 10.1198/073500107000000322
To link to this article: http://dx.doi.org/10.1198/073500107000000322

Published online: 01 Jan 2012.


The Henderson Smoother in Reproducing
Kernel Hilbert Space
Estela Bee DAGUM and Silvia BIANCONCINI
Department of Statistics, University of Bologna, Via Belle Arti 41, 40126 Bologna, Italy
(estela.beedagum@unibo.it; silvia.bianconcini@unibo.it)
The Henderson smoother has been traditionally applied for trend-cycle estimation in the context of nonparametric seasonal adjustment software officially adopted by statistical agencies. This study introduces
a Henderson third-order kernel representation by means of the reproducing kernel Hilbert space (RKHS)
methodology. Two density functions and corresponding orthonormal polynomials have been calculated.
Both are shown to give excellent representations for short- and medium-length filters. Theoretical and
empirical comparisons of the Henderson third-order kernel asymmetric filters are made with the classical
ones. The former are shown to be superior in terms of signal passing, noise suppression, and revision size.



KEY WORDS: Biweight density function; Higher order kernels; Local weighted least squares; Spectral
properties; Symmetric and asymmetric weighting systems.

1. INTRODUCTION
The linear smoother developed by Henderson (1916) is the most widely applied smoother for estimating the trend-cycle latent component in nonparametric seasonal adjustment software, such as the U.S. Bureau of the Census X11 method (Shiskin, Young, and Musgrave 1967) and its variants, the X11ARIMA (Dagum 1988) and X12ARIMA (Findley, Monsell, Bell, Otto, and Chen 1998). The properties and limitations of the Henderson filter have been studied in different contexts and have attracted the attention of a large number of authors, among them Cholette (1981), Kenny and Durbin (1982), Castles (1987), Dagum and Laniel (1987), Cleveland, Cleveland, McRae, and Terpenning (1990), Dagum (1996), Gray and Thomson (1996), Loader (1999), Dalton and Keogh (2000), Quenneville, Ladiray, and Lefrançois (2003), and Dagum and Luati (2004). However, none of these studies has approached the Henderson smoother from a reproducing kernel Hilbert space (RKHS) perspective.
An RKHS is a Hilbert space characterized by a kernel that reproduces, via an inner product, every function of the space or, equivalently, a Hilbert space of real-valued functions with the property that every point evaluation functional is a bounded linear functional.
Parzen (1959) was the first to introduce an RKHS approach to time series, applying the famous Loève (1948) theorem, by which there is an isometric isomorphism between the closed linear span of a second-order stationary stochastic process and the RKHS determined by its covariance function. Parzen demonstrated that the RKHS approach provides a unified framework for three fundamental problems concerning (1) least squares estimation, (2) minimum variance unbiased estimation of regression parameters, and (3) identification of unknown signals perturbed by noise. Parzen's approach is parametric and basically consists of estimating the unknown signal by generalized least squares in terms of the inner product between the observations and the covariance function. A nonparametric approach to the RKHS methodology was developed by DeBoor and Lynch (1966) in the context of cubic spline approximation. Later, Kimeldorf and Wahba (1970) exploited both developments and treated the general spline smoothing problem from an RKHS stochastic equivalence perspective. These authors proved that minimum norm interpolation and smoothing problems with quadratic constraints imply an equivalent Gaussian stochastic process.
The main purpose of this study is to introduce a nonparametric RKHS representation of the Henderson smoother. This Henderson kernel representation enables the construction of a hierarchy of kernels with varying smoothing properties. Furthermore, for each kernel order, the asymmetric filters can be derived coherently with the corresponding symmetric weights or, if more appropriate, from a lower or higher order kernel within the hierarchy. In the particular case of the currently applied asymmetric Henderson filters, those obtained by means of the RKHS, coherent with the symmetric smoother, are shown to have superior properties from the viewpoint of signal passing, noise suppression, and revisions. We compare the performance of the kernel representations with that of the classical filters using real-life series.
Section 2 briefly deals with the basic properties of Hilbert spaces and reproducing kernels. Section 3 discusses the classical Henderson symmetric smoother and derives two density functions that generate the corresponding third-order kernel representations, illustrating the approximations for spans of 9, 13, and 23 terms. Section 4 presents the asymmetric Henderson kernel filters and compares them, by means of spectral analysis, with those currently in use. Section 5 illustrates the new asymmetric Henderson kernel smoothers with applications to real data. Finally, Section 6 gives the conclusions.
2. LINEAR FILTERS IN REPRODUCING KERNEL HILBERT SPACES

© 2008 American Statistical Association
Journal of Business & Economic Statistics
October 2008, Vol. 26, No. 4
DOI 10.1198/073500107000000322

Let $\{y_t, t = 1, 2, \dots, N\}$ denote the input series. In this study we work under the following (basic) specification for the input time series.

Assumption 1. The input series $\{y_t, t = 1, 2, \dots, N\}$ can be decomposed into the sum of a systematic component, called the signal (or nonstationary mean) $g_t$, plus an erratic component $u_t$, called the noise, such that

$$ y_t = g_t + u_t. \qquad (1) $$

The noise $u_t$ is assumed either to be a white noise, $\mathrm{WN}(0, \sigma_u^2)$, or, more generally, to follow a stationary and invertible autoregressive moving average (ARMA) process.

If the input series $\{y_t, t = 1, 2, \dots, N\}$ is seasonally adjusted or without seasonality, the signal $g_t$ represents the trend and cyclical components, usually referred to as the trend-cycle because they are estimated jointly. The trend-cycle can be deterministic or stochastic, and can have a global or a local representation. Locally, it can be represented by a polynomial of degree $p$ in a variable $j$ that measures the distance between $y_t$ and the neighboring observations $y_{t+j}$.

Assumption 2. Given $u_t$ for some time point $t$, it is possible to find a local polynomial trend estimator

$$ g_t(j) = a_0 + a_1 j + \cdots + a_p j^p + \varepsilon_t(j), \qquad (2) $$

where $a_0, a_1, \dots, a_p \in \mathbb{R}$ and $\varepsilon_t$ is assumed to be purely random and mutually uncorrelated with $u_t$.

The coefficients $a_0, a_1, \dots, a_p$ can be estimated by ordinary or weighted least squares or by summation formulas. The solution for $\hat a_0$ provides the trend-cycle estimate $\hat g_t(0)$, which is equivalent to a weighted average applied in a moving manner (Kendall, Stuart, and Ord 1983), such that

$$ \hat g_t(0) = \hat g_t = \sum_{j=-m}^{m} w_j y_{t+j}, \qquad (3) $$

where $w_j$, $j = -m, \dots, m$, denotes the weights applied to the observations $y_{t+j}$ to obtain the estimate $\hat g_t$ for each point in time $t = 1, 2, \dots, N$.

The weights depend on (1) the degree of the fitted polynomial, (2) the amplitude of the neighborhood, and (3) the shape of the function used to average the observations in each neighborhood.

Once a (symmetric) span $2m + 1$ of the neighborhood has been selected, the $w_j$'s for the observations falling outside the neighborhood of any target point are null or approximately null, so that the estimates of the $N - 2m$ central observations are obtained by applying $2m + 1$ symmetric weights to the observations neighboring the target point. The missing estimates for the first and last $m$ observations can be obtained by applying asymmetric moving averages of variable length to the first and last $m$ observations, respectively. The length of the moving average or time-invariant symmetric linear filter is $2m + 1$, whereas the length of the asymmetric linear filters is time varying.

Using the backshift operator $B$, such that $B^n y_t = y_{t-n}$, (3) can be written as

$$ \hat g_t = \sum_{j=-m}^{m} w_j B^{-j} y_t = W(B) y_t, \qquad t = 1, 2, \dots, N, \qquad (4) $$

where $W(B)$ is a linear nonparametric estimator.

Definition 1. Given $p \ge 2$, $W(B)$ is a $p$th-order kernel if

$$ \sum_{j=-m}^{m} w_j = 1 \qquad (5) $$

and

$$ \sum_{j=-m}^{m} j^i w_j = 0 \qquad (6) $$

for all $i = 1, 2, \dots, p - 1$. In other words, it reproduces a polynomial trend of degree $p - 1$ without distortion.

A different characterization of a $p$th-order nonparametric estimator can be provided by means of the RKHS methodology. A Hilbert space is a complete linear space with a norm given by an inner product. The space of square integrable functions $L^2$ and the finite $p$-dimensional space $\mathbb{R}^p$ are those used in this study.

Assumption 3. The time series $\{y_t, t = 1, 2, \dots, N\}$ is a finite realization of a family of square Lebesgue integrable random variables, that is, $\int_{\mathbb{R}} |Y_t|^2 < \infty$. Hence, the random process $\{Y_t\}_{t \in \mathbb{R}}$ belongs to the space $L^2(\mathbb{R})$.

The space $L^2(\mathbb{R})$ is a Hilbert space endowed with the inner product defined by

$$ \langle U(t), V(t) \rangle = E(U(t)V(t)) = \int_{\mathbb{R}} U(t)\, V(t)\, f_0(t)\, dt, \qquad (7) $$

where $U(t), V(t) \in L^2(\mathbb{R})$ and $f_0$ is a probability density function weighting each observation to take into account its position in time. In the following, $L^2(\mathbb{R})$ will be denoted by $L^2(f_0)$.

Under Assumption 2, the local trend $g_t(\cdot)$ belongs to the space $\mathbb{P}_p$ of polynomials of degree at most $p$, $p$ being a nonnegative integer. $\mathbb{P}_p$ is a Hilbert subspace of $L^2(f_0)$ and hence inherits its inner product, given by

$$ \langle P(t), Q(t) \rangle = \int_{\mathbb{R}} P(t)\, Q(t)\, f_0(t)\, dt, \qquad (8) $$

where $P(t), Q(t) \in \mathbb{P}_p$.

Corollary 1. The space $\mathbb{P}_p$ is a reproducing kernel Hilbert space of polynomials on some domain $T \subseteq \mathbb{R}$, that is, for all $t \in T$ and for all $P \in \mathbb{P}_p$, there exists an element $R_p(t, \cdot) \in \mathbb{P}_p$ such that $P(t) = \langle P(\cdot), R_p(t, \cdot) \rangle$.

The proof follows easily from the fact that any finite-dimensional Hilbert space has a reproducing kernel (see, for details, Berlinet and Thomas-Agnan 2003).

$R_p(t, \cdot)$ is called a reproducing kernel because $\langle R(t, \cdot), R(\cdot, s) \rangle = R(s, t)$. Formally, for a Hilbert space $H$ of functions on $T$, $R$ is a function

$$ R: T \times T \to \mathbb{R}, \qquad (s, t) \mapsto R(s, t), $$

satisfying the following properties:
1. $R(t, \cdot) \in H$, $\forall t \in T$;
2. $\langle g(\cdot), R(t, \cdot) \rangle = g(t)$, $\forall t \in T$ and $g \in H$. (9)

This last condition is called the “reproducing property”: the
value of the function g at the point t is reproduced by the inner product of g with R(t, ·).
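The reproducing property can be checked numerically. The Python sketch below is an illustration under stated assumptions, not the authors' code: it takes the biweight density (introduced in Section 3) as $f_0$, evaluates the inner product (8) by Gauss–Legendre quadrature, builds an orthonormal basis of $\mathbb{P}_p$ by Gram–Schmidt, forms $R_p(t, \cdot) = \sum_i P_i(t) P_i(\cdot)$ (the Christoffel–Darboux form given below in this section), and compares $\langle P, R_p(t, \cdot) \rangle$ with $P(t)$ for a cubic $P$. The helper names `inner`, `orthonormal_basis`, and `reproducing_kernel` are hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial
from numpy.polynomial.legendre import leggauss

# Gauss-Legendre rule on [-1, 1]; 40 nodes integrate polynomials of degree
# up to 79 exactly, which covers every integrand used below.
NODES, QW = leggauss(40)


def biweight(t):
    """Assumed weighting density f0(t) = 15/16 (1 - t^2)^2 on [-1, 1]."""
    return 15.0 / 16.0 * (1.0 - t**2) ** 2


def inner(p, q, f0=biweight):
    """Inner product (8): integral over [-1, 1] of P(t) Q(t) f0(t) dt."""
    return float(np.sum(QW * p(NODES) * q(NODES) * f0(NODES)))


def orthonormal_basis(p, f0=biweight):
    """Gram-Schmidt on the monomials 1, t, ..., t^p in L2(f0)."""
    basis = []
    for k in range(p + 1):
        v = Polynomial([0.0] * k + [1.0])  # the monomial t^k
        for b in basis:
            v = v - inner(v, b, f0) * b
        basis.append(v * (1.0 / np.sqrt(inner(v, v, f0))))
    return basis


def reproducing_kernel(t, basis):
    """R_p(t, .) = sum_i P_i(t) P_i(.), returned as a polynomial in the second argument."""
    return sum((float(b(t)) * b for b in basis), Polynomial([0.0]))


basis = orthonormal_basis(3)
P = Polynomial([0.5, -1.0, 2.0, 0.3])  # an arbitrary cubic in P_3
for t0 in (-0.7, 0.0, 0.4):
    # The last two printed values should agree up to rounding error.
    print(t0, P(t0), inner(P, reproducing_kernel(t0, basis)))
```

Any other density with finite moments could be passed as `f0`; the biweight is used only because its moments are simple.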
Suppose that $\{y_t, t = 1, 2, \dots, N\} \in L^2(f_0)$ can be decomposed as in (1); the estimate $\hat g_t$ of (2) can be obtained by minimizing the distance between $y_{t+j}$ and $g_t(j)$, that is,

$$ \| y_{t+j} - g_t(j) \|^2 = \int_{-m}^{m} \big(y_{t+j} - g_t(j)\big)^2 f_0(j)\, dj, \qquad (10) $$

where the positive real number $m$ determines the neighborhood of $t$ over which the deviation between $y_{t+j}$ and $g_t(j)$ is taken into account in the $L^2$ sense. For this reason, $2m + 1$ is called the bandwidth. The weighting function $f_0$ depends on the distance between the target point $t$ and each observation in the $(2m + 1)$-point neighborhood (for $m + 1 \le t \le N - m$).

Theorem 2. Under Assumptions 1, 2, and 3, the minimization problem (10) has a unique and explicit solution.

Proof. By the projection theorem (see, e.g., Priestley 1981), each element $y_{t+j}$ of the Hilbert space $L^2(f_0)$ can be decomposed into the sum of its projection on a Hilbert subspace of $L^2(f_0)$, such as the space $\mathbb{P}_p$, plus its orthogonal complement, as follows:

$$ y_{t+j} = P_p[y_{t+j}] + \{ y_{t+j} - P_p[y_{t+j}] \}, \qquad (11) $$

where $P_p[y_{t+j}]$ denotes the projection of the observations $y_{t+j}$, $j = -m, \dots, m$, on $\mathbb{P}_p$. By orthogonality, for every $j \in T$,

$$ \langle y_{t+j}, R_p(j, 0) \rangle = \langle P_p[y_{t+j}], R_p(j, 0) \rangle = P_p[y_t] = \hat g_t(0) = \hat g_t. $$
Thus, $\hat g_t(0)$ is given by

$$ \hat g_t(0) = \int_{-m}^{m} y_{t+j}\, R_p(j, 0)\, f_0(j)\, dj \qquad (12) $$

$$ \hat g_t(0) = \int_{-m}^{m} P_p[y_{t+j}]\, R_p(j, 0)\, f_0(j)\, dj. \qquad (13) $$

Hence, the estimate $\hat g_t$ can be equivalently seen as the projection of $y_t$ on $\mathbb{P}_p$ and as a local weighted average of the observations for the discrete version of the filter given in (4), where the weights $w_j$ are derived from a kernel function $K$ of order $p + 1$,

$$ K_{p+1}(t) = R_p(t, 0)\, f_0(t), \qquad (14) $$

where $R_p$ is the reproducing kernel of the space $\mathbb{P}_p$.

The following result, proved by Berlinet (1993), is fundamental.

Corollary 3. Kernels of order $(p + 1)$, $p \ge 1$, can be written as products of the reproducing kernel $R_p(t, \cdot)$ of the space $\mathbb{P}_p \subseteq L^2(f_0)$ and a density function $f_0$ with finite moments up to order $2p$. That is,

$$ K_{p+1}(t) = R_p(t, 0)\, f_0(t), \qquad (15) $$

where $p$ is the degree of the fitted polynomial.

Remark 1 (Christoffel–Darboux formula). For any sequence $(P_i)_{0 \le i \le p}$ of $(p + 1)$ orthonormal polynomials in $L^2(f_0)$,

$$ R_p(t, 0) = \sum_{i=0}^{p} P_i(t)\, P_i(0). \qquad (16) $$

Therefore, (15) becomes

$$ K_{p+1}(t) = \sum_{i=0}^{p} P_i(t)\, P_i(0)\, f_0(t). \qquad (17) $$

An important outcome of the RKHS theory is that linear filters can be grouped into hierarchies $\{K_p, p = 2, 3, 4, \dots\}$ with the following property: each hierarchy is identified by a density $f_0$ and contains kernels of order 2, 3, 4, ..., which are products of orthonormal polynomials and $f_0$.

The weight system of a hierarchy is completely determined by specifying (1) the bandwidth or smoothing parameter, (2) the maximum order of the estimator in the family, and (3) the density $f_0$.

There are several procedures to determine the bandwidth or smoothing parameter (for a detailed discussion see, e.g., Berlinet and Devroye 1994). In this study, however, the smoothing parameter is not derived by data-dependent optimization criteria but is fixed so as to obtain a kernel representation of the most often applied Henderson smoothers. Kernels of any length, including infinite ones, can be obtained with the above approach. Consequently, the results discussed can easily be extended to any filter length, as long as the density function and its orthonormal polynomials are specified.

The identification and specification of the density is one of the most crucial tasks for smoothers based on local polynomial fitting by weighted least squares, such as Loess and the Henderson smoother. The density is related to the weighting penalty function of the minimization problem.

We remark that the RKHS approach can be applied to any linear filter characterized by varying degrees of fidelity and smoothness, as described by Gray and Thomson (1996). In particular, applications to local polynomial smoothers are treated in Dagum and Bianconcini (2006), and to smoothing splines in Wahba (1990).

3. THE SYMMETRIC HENDERSON SMOOTHER AND ITS KERNEL REPRESENTATION

Recognition of the fact that the smoothness of the estimated
trend-cycle curve depends directly on the smoothness of the
weight diagram led Henderson (1916) to develop a formula
which makes the sum of squares of the third differences of the
smoothed series a minimum for any number of terms. Henderson’s starting point was the requirement that the filter should
reproduce a cubic polynomial trend without distortion. Henderson proved that three alternative smoothing criteria give the
same formula, as shown explicitly by Kenny and Durbin (1982)
and Gray and Thomson (1996): (1) minimization of the variance of the third differences of the series defined by the application of the moving average; (2) minimization of the sum of
squares of the third differences of the coefficients of the moving average formula; (3) fitting a cubic polynomial by weighted
least squares, where the weights are chosen to minimize the
sum of squares of their third differences.
The problem is one of fitting a cubic trend by weighted least squares to the observations $y_{t+j}$, $j = -m, \dots, m$, the value of the fitted function at $j = 0$ being taken as the smoothed observation $\hat g_t$. Representing the weight assigned to the residuals from the local polynomial regression by $W_j$, $j = -m, \dots, m$, where $W_j = W_{-j}$, the problem is the minimization of

$$ \sum_{j=-m}^{m} W_j \left[ y_{t+j} - a_0 - a_1 j - a_2 j^2 - a_3 j^3 \right]^2, \qquad (18) $$

where the solution for the constant term $\hat a_0$ is the smoothed observation $\hat g_t$. Henderson (1916) showed that $\hat g_t$ is given by

$$ \hat g_t = \sum_{j=-m}^{m} \phi(j)\, W_j\, y_{t+j} = \sum_{j=-m}^{m} w_j\, y_{t+j}, \qquad (19) $$
where $\phi(j)$ is a cubic polynomial whose coefficients have the property that the smoother reproduces the data if they follow a cubic. Henderson also proved the converse: if the coefficients of a cubic-reproducing summation formula $\{w_j, j = -m, \dots, m\}$ do not change their sign more than three times within the filter span, then the formula can be represented as a local cubic smoother with weights $W_j > 0$ and a cubic polynomial $\phi(j)$, such that $\phi(j) W_j = w_j$. Henderson (1916) measured the amount of smoothing of the input series by $\sum (\Delta^3 y_t)^2$ or, equivalently, by the sum of squares of the third differences of the weight diagram, $\sum (\Delta^3 w_j)^2$. The solution is obtained by fitting a cubic polynomial by weighted least squares with

$$ W_j \propto \{(m+1)^2 - j^2\}\{(m+2)^2 - j^2\}\{(m+3)^2 - j^2\} \qquad (20) $$

as the weighting penalty function of criterion (3) above. Following Henderson (1916), the weight diagram $\{w_j, j = -m, \dots, m\}$ corresponding to (20), known as Henderson's ideal formula, is obtained, for a filter length equal to $2m' - 3$ (that is, $m' = m + 2$), by

$$ w_j = \frac{315\,[(m'-1)^2 - j^2]\,(m'^2 - j^2)\,[(m'+1)^2 - j^2]\,(3m'^2 - 16 - 11j^2)}{8m'(m'^2 - 1)(4m'^2 - 1)(4m'^2 - 9)(4m'^2 - 25)}. \qquad (21) $$
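Henderson's equivalence between the closed form (21) and a local cubic weighted least-squares fit (18) with penalty (20) can be verified numerically. The sketch below (Python with NumPy; the function names are hypothetical) computes the weights both ways. In the least-squares route, $j$ is rescaled to $j/(m+1)$ and $W_j$ is divided by its maximum only to improve the conditioning of the normal equations; neither change affects the fitted value at $j = 0$.

```python
import numpy as np


def henderson_ideal(m):
    """Classical (2m+1)-term Henderson weights from the closed form (21), with m' = m + 2."""
    n = m + 2  # m' in (21)
    j = np.arange(-m, m + 1, dtype=float)
    num = (315.0 * ((n - 1) ** 2 - j**2) * (n**2 - j**2)
           * ((n + 1) ** 2 - j**2) * (3 * n**2 - 16 - 11 * j**2))
    den = 8.0 * n * (n**2 - 1) * (4 * n**2 - 1) * (4 * n**2 - 9) * (4 * n**2 - 25)
    return num / den


def henderson_wls(m):
    """Same weights from the local cubic weighted least-squares fit (18) with penalty (20)."""
    j = np.arange(-m, m + 1, dtype=float)
    W = ((m + 1) ** 2 - j**2) * ((m + 2) ** 2 - j**2) * ((m + 3) ** 2 - j**2)
    W = W / W.max()                                  # a constant factor does not change the fit
    X = np.vander(j / (m + 1), 4, increasing=True)   # cubic basis in the rescaled variable
    XtW = X.T * W                                    # X' diag(W)
    # The smoothed value is a0_hat, i.e. the first row of (X'WX)^{-1} X'W applied to y.
    return np.linalg.solve(XtW @ X, XtW)[0]


for m in (4, 6, 11):
    w = henderson_ideal(m)
    print(2 * m + 1, round(w[m], 3), np.allclose(w, henderson_wls(m)))
```

For $m = 6$, the first seven entries of `henderson_ideal(6)` round to the classical 13-term row of Table 2 ($-.019, -.028, .000, .065, .147, .214, .240$); the weights at $j = \pm 4$ vanish exactly because $3m'^2 - 16 - 11j^2 = 0$ there.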

This optimality result has been rediscovered several times in the modern literature, usually for asymptotic variants. Loader (1999) showed that Henderson's ideal formula (21) is a finite-sample variant of a kernel with second-order vanishing moments which minimizes the third derivative of the function given by Müller (1984). In particular, Loader showed that, for large $m$, the weights of Henderson's ideal penalty function $W_j$ are approximately $m^6 W(j/m)$, where $W(j/m)$ is the triweight function. He concluded that, for very large $m$, the weight diagram is approximately $(315/512)\, W(j/m)\,(3 - 11(j/m)^2)$, equivalent to the kernel given by Müller (1984).
To derive the Henderson kernel hierarchy by means of the
RKHS methodology, the density corresponding to Wj and its
orthonormal polynomials have to be determined.
The triweight density function gives very poor results when
the Henderson smoother spans are of short or medium lengths,
as in most application cases, ranging from 5 to 23 terms. Hence,
we derive the exact density function corresponding to Wj .
Theorem 4. The exact probability density corresponding to Henderson's ideal weighting penalty function (20) is given by

$$ f_{0H}(t) = \frac{m+1}{k}\, W\big((m+1)t\big), \qquad t \in [-1, 1], \qquad (22) $$

where $k = \int_{-m-1}^{m+1} W(j)\, dj$ and $j = (m+1)t$.

Proof. Interpolating the weighting function (20), we note that $W(j)$ is nonnegative in the intervals $(-m-1, m+1)$, $(-m-3, -m-2)$, and $(m+2, m+3)$ and negative otherwise. $W(j)$ is also equal to zero if $j = \pm(m+1), \pm(m+2), \pm(m+3)$, and on $[-m-1, m+1]$ it is increasing on $[-m-1, 0)$, decreasing on $(0, m+1]$, and reaches its maximum at $j = 0$. Therefore, we choose the support $[-m-1, m+1]$ to satisfy the positive definite condition. The integral $k = \int_{-m-1}^{m+1} W(j)\, dj$ is different from 1 and represents the normalizing constant on this support. It follows that the density corresponding to $W(j)$ on the interval $[-m-1, m+1]$ is given by $f_0(j) = W(j)/k$.

To eliminate the dependence of the support on the bandwidth parameter $m$, a new variable ranging on $[-1, 1]$, $t = j/(m+1)$, is considered. Applying the change-of-variables method,

$$ f_{0H}(t) = f_0\big(t^{-1}(t)\big) \left| \frac{\partial t^{-1}(t)}{\partial t} \right|, $$

where $t(j) = j/(m+1)$ and $t^{-1}(t) = (m+1)t$, so that $\partial t^{-1}(t)/\partial t = m + 1$. The result follows by substitution.

The density $f_{0H}(t)$ is symmetric, that is, $f_{0H}(-t) = f_{0H}(t)$, nonnegative on $[-1, 1]$, and equal to zero when $t = -1$ or $t = 1$. Furthermore, $f_{0H}(t)$ is increasing on $[-1, 0)$, decreasing on $(0, 1]$, and reaches its maximum at $t = 0$. For $m = 6$, the filter is the classical 13-term Henderson and the corresponding probability density function is

$$ f_{0H}(t) = \frac{15}{79{,}376}\left(5{,}184 - 12{,}289\,t^2 + 9{,}506\,t^4 - 2{,}401\,t^6\right), \qquad t \in [-1, 1]. \qquad (23) $$
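Theorem 4 and (23) can be checked with exact rational arithmetic. The sketch below uses only the Python standard library, and its function names are hypothetical. It expands $W((m+1)t)$ from (20) as a polynomial in $t$ and divides by its integral over $[-1, 1]$, which equals $k/(m+1)$, so the result is exactly the density (22).

```python
from fractions import Fraction


def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for k, bk in enumerate(b):
            out[i + k] += ai * bk
    return out


def exact_henderson_density(m):
    """Coefficients of f_0H(t) in (22), in increasing powers of t, as exact fractions."""
    c2 = (m + 1) ** 2
    # W((m+1)t) = prod_{r=1,2,3} [(m+r)^2 - (m+1)^2 t^2], from the penalty (20)
    poly = [Fraction(1)]
    for r in (1, 2, 3):
        poly = poly_mul(poly, [Fraction((m + r) ** 2), Fraction(0), Fraction(-c2)])
    # Integral over [-1, 1]: odd powers vanish and t^k contributes 2/(k+1).
    total = sum(2 * a / (k + 1) for k, a in enumerate(poly) if k % 2 == 0)
    return [a / total for a in poly]


coeffs = exact_henderson_density(6)
print([str(a / Fraction(15, 79376)) for a in coeffs])
```

For $m = 6$ the printed list recovers the bracketed coefficients of (23), $5{,}184, 0, -12{,}289, 0, 9{,}506, 0, -2{,}401$; other values of $m$ give the exact densities for other Henderson lengths.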

Following Corollary 3, to obtain higher order kernels the corresponding orthonormal polynomials have to be computed for the density (22). The polynomials can be derived by the Gram–Schmidt orthonormalization procedure or by solving the Hankel system based on the moments of the density $f_{0H}$ (Brezinski 1980); the latter is the approach followed in this study. The hierarchy corresponding to the 13-term Henderson kernel is shown in Table 1, where for $p = 3$ it provides a representation of the classical Henderson filter.
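The moment-based route can be sketched as follows. This is an illustration, not the authors' implementation: it computes the moments $\mu_k$ of the density (23), forms the Hankel matrix $H = [\mu_{i+k}]$, and uses its Cholesky factor $H = LL^\top$ to obtain orthonormal polynomials, whose Christoffel–Darboux sum (16) gives $R_p(t, 0)$; the kernel of order $p + 1$ is then $R_p(t, 0) f_{0H}(t)$, as in (17). The names `F0H`, `moments`, and `reproducing_kernel_at_zero` are hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Exact 13-term Henderson density (23) on [-1, 1].
F0H = Polynomial([5184.0, 0.0, -12289.0, 0.0, 9506.0, 0.0, -2401.0]) * (15.0 / 79376.0)


def moments(density, kmax):
    """mu_k = integral over [-1, 1] of t^k f0(t) dt, for k = 0, ..., kmax."""
    mu = []
    for k in range(kmax + 1):
        antiderivative = (density * Polynomial([0.0] * k + [1.0])).integ()
        mu.append(antiderivative(1.0) - antiderivative(-1.0))
    return np.array(mu)


def reproducing_kernel_at_zero(density, p):
    """Monomial coefficients of R_p(t, 0), built from the Hankel moment matrix.

    H[i, k] = mu_{i+k}; with the Cholesky factor H = L L^T, the rows of
    C = L^{-1} are the coefficients of orthonormal polynomials P_0, ..., P_p,
    so the Christoffel-Darboux sum (16) has coefficient vector C^T C[:, 0].
    """
    mu = moments(density, 2 * p)
    H = np.array([[mu[i + k] for k in range(p + 1)] for i in range(p + 1)])
    C = np.linalg.inv(np.linalg.cholesky(H))
    return C.T @ C[:, 0]


r = reproducing_kernel_at_zero(F0H, 2)
kernel3 = Polynomial(r) * F0H  # third-order kernel (17): R_2(t, 0) f_0H(t)
print(r, 2175 / 1274, -1372 / 265)
```

With `p = 2` (a quadratic reproducing kernel, hence a third-order kernel by Corollary 3), the returned coefficients are approximately $(2{,}175/1{,}274,\ 0,\ -1{,}372/265)$, the factor multiplying the density in the third-order row of Table 1. Because $f_{0H}$ is symmetric, odd moments vanish and `p = 3` yields the same kernel.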
Because the triweight density function gives a poor approximation of the Henderson weights for small $m$ (5 to 23 terms), we searched for another density function with well-known theoretical properties. The main reason is that the exact density (22) is a function of the bandwidth and needs to be recalculated, together with its orthonormal polynomials, every time $m$ changes. We found that the biweight gives almost the same results as the exact density function, without having to be recalculated whenever the length of the Henderson smoother changes.
Another important advantage is that the biweight density function belongs to the well-known Beta distribution family, that is,

$$ f(t) = \frac{r}{2B\left(s+1, \frac{1}{r}\right)} \left(1 - |t|^r\right)^s I_{[-1,1]}(t), \qquad (24) $$

where $B(a, b) = \int_0^1 t^{a-1}(1-t)^{b-1}\, dt$, with $a, b > 0$, is the Beta function. The orthonormal polynomials needed for the reproducing kernel associated with the biweight function are the Jacobi polynomials, for which explicit expressions are available and whose properties have been widely studied in the literature.

Table 1. 13-term Henderson kernel hierarchy

Kernel order   Henderson kernel
p = 2          $\frac{15}{79{,}376}(5{,}184 - 12{,}289t^2 + 9{,}506t^4 - 2{,}401t^6)$
p = 3          $\frac{15}{79{,}376}(5{,}184 - 12{,}289t^2 + 9{,}506t^4 - 2{,}401t^6)\left(\frac{2{,}175}{1{,}274} - \frac{1{,}372}{265}t^2\right)$
Therefore, we obtain another Henderson kernel hierarchy using the biweight density

$$ f_{0B}(t) = \frac{15}{16}\left(1 - t^2\right)^2 I_{[-1,1]}(t) \qquad (25) $$

combined with the Jacobi orthonormal polynomials. The latter are obtained, after normalization in $L^2(f_{0B})$, from the following explicit expression (Abramowitz and Stegun 1972):

$$ P_n^{(\alpha,\beta)}(t) = \frac{1}{2^n} \sum_{m=0}^{n} \binom{n+\alpha}{m} \binom{n+\beta}{n-m} (t-1)^{n-m} (t+1)^m, \qquad (26) $$

where $\alpha = 2$ and $\beta = 2$.
The Henderson second-order kernel is given by the density function $f_{0B}$ itself, because the reproducing kernel $R_1(j, 0)$ of the space of polynomials of degree at most 1 is always equal to 1 (by the symmetry of $f_{0B}$, $P_1(0) = 0$). On the other hand, the third-order kernel is given by

$$ K_3(t) = \frac{15}{16}\left(1 - t^2\right)^2 \times \left(\frac{7}{4} - \frac{21}{4}\, t^2\right). \qquad (27) $$
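The explicit expression (26) leads to the biweight kernel (27) in a few lines. The sketch below (Python with NumPy; the function names are hypothetical) implements (26) with $\alpha = \beta = 2$, normalizes each polynomial in $L^2(f_{0B})$ by Gauss–Legendre quadrature, and forms the Christoffel–Darboux sum $\sum_{n \le 2} P_n(t) P_n(0)$, whose coefficients should be those of the factor $7/4 - (21/4)t^2$ in (27).

```python
from math import comb

import numpy as np
from numpy.polynomial import Polynomial
from numpy.polynomial.legendre import leggauss

NODES, QW = leggauss(40)
F0B = 15.0 / 16.0 * (1.0 - NODES**2) ** 2  # biweight density (25) at the quadrature nodes


def jacobi(n, alpha=2, beta=2):
    """Jacobi polynomial P_n^(alpha, beta)(t) from the explicit sum (26)."""
    total = Polynomial([0.0])
    for k in range(n + 1):  # k plays the role of the summation index in (26)
        c = comb(n + alpha, k) * comb(n + beta, n - k)
        total = total + c * Polynomial([-1.0, 1.0]) ** (n - k) * Polynomial([1.0, 1.0]) ** k
    return total * (1.0 / 2**n)


def orthonormal_jacobi(n):
    """P_n^(2,2) rescaled to unit norm in L2(f0B)."""
    p = jacobi(n)
    return p * (1.0 / np.sqrt(np.sum(QW * F0B * p(NODES) ** 2)))


def biweight_R(p):
    """Christoffel-Darboux sum R_p(t, 0) = sum_{n <= p} P_n(t) P_n(0), cf. (16)."""
    terms = (orthonormal_jacobi(n) * float(orthonormal_jacobi(n)(0.0)) for n in range(p + 1))
    return sum(terms, Polynomial([0.0]))


print(biweight_R(2).coef)  # compare with the factor 7/4 - (21/4) t^2 in (27)
```

Multiplying `biweight_R(2)` by $f_{0B}(t)$ gives the third-order kernel (27); raising `p` adds further terms of the hierarchy.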
In Table 2 we give the classical and the two kernel Henderson symmetric weights for spans of 9, 13, and 23 terms. Because the weights are symmetric, only $w_{-m}, \dots, w_0$ are reported, and the last entry of each row is the central weight $w_0$. Figure 1 illustrates the 13-term Henderson smoothers.
The small discrepancies of the two kernel functions relative to the classical Henderson smoother are due to the fact that the exact density is obtained by interpolation from a small, finite number of points of the weighting penalty function $W_j$, whereas the biweight is already a density function, which is made discrete by choosing selected points to produce the weights. We also calculated the smoothing measure $\sum (\Delta^3 w_j)^2$ for each filter span, as shown in Table 3. The smoothing powers of the filters are very close, except for the exact 9-term Henderson kernel, which gives the smoothest curve.
Given the near equivalence of the symmetric weights, we use the RKHS methodology to generate the corresponding asymmetric filters for the first and last $m$ points.

4. ASYMMETRIC HENDERSON SMOOTHERS AND THEIR KERNEL REPRESENTATIONS
The asymmetric Henderson smoothers currently in use were developed by Musgrave (1964a,b). They are based on the minimization of the mean squared revision between the final estimates (obtained by the application of the symmetric filter) and the preliminary ones (obtained by the application of an asymmetric filter), subject to the constraint that the sum of the weights is equal to 1 (see, e.g., Doherty 2001). The assumption made is that, at the end of the series, the seasonally adjusted values follow a linear trend-cycle plus a purely random irregular $\varepsilon_t$, such that $\varepsilon_t \sim \mathrm{IID}(0, \sigma^2)$.
The asymmetric weights of the Henderson kernels are derived by adapting the third-order kernel functions to the length of the last $m$ asymmetric filters, such that

$$ w_j = \frac{K(j/b)}{\sum_{i=-m}^{q} K(i/b)}, \qquad j = -m, \dots, q, \qquad (28) $$

where $K(\cdot)$ denotes the third-order Henderson kernel, $j$ the distance to the target point $t$, $b$ the bandwidth parameter equal to $m + 1$, and $m + q + 1$ the asymmetric filter length. For example, the asymmetric weights of the 13-term Henderson kernel for the last point are given by

$$ w_j = \frac{K(j/7)}{\sum_{i=-6}^{0} K(i/7)}, \qquad j = -6, \dots, 0. \qquad (29) $$
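Equations (28) and (29) amount to evaluating the kernel on the grid $j/b$ and renormalizing over the points that are available. A minimal Python sketch follows; the two kernel functions transcribe (27) and the third-order row of Table 1, and all function names are hypothetical. With $q = m$ it returns symmetric weights, which can be compared with Table 2; with $q = 0$ it returns the last-point weights of (29).

```python
import numpy as np


def biweight_kernel3(t):
    """Third-order biweight Henderson kernel (27), zero outside [-1, 1]."""
    t = np.asarray(t, dtype=float)
    k = 15 / 16 * (1 - t**2) ** 2 * (7 / 4 - 21 / 4 * t**2)
    return np.where(np.abs(t) <= 1, k, 0.0)


def exact13_kernel3(t):
    """Third-order exact 13-term Henderson kernel (third-order row of Table 1)."""
    t = np.asarray(t, dtype=float)
    f0 = 15 / 79376 * (5184 - 12289 * t**2 + 9506 * t**4 - 2401 * t**6)
    return np.where(np.abs(t) <= 1, f0 * (2175 / 1274 - 1372 / 265 * t**2), 0.0)


def kernel_weights(kernel, m, q):
    """Weights (28) for j = -m, ..., q with bandwidth b = m + 1 (q = m gives the symmetric filter)."""
    j = np.arange(-m, q + 1)
    k = kernel(j / (m + 1))
    return k / k.sum()


symmetric = kernel_weights(biweight_kernel3, 6, 6)   # 13-term symmetric filter
last_point = kernel_weights(biweight_kernel3, 6, 0)  # last-point filter (29)
print(np.round(symmetric[:7], 3))
print(np.round(last_point, 3))
```

The first seven symmetric biweight weights agree with the biweight row of Table 2 to three decimals, and `kernel_weights(exact13_kernel3, 6, 6)` reproduces the exact-kernel row in the same way.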

Table 2. Weight systems for 9-, 13-, and 23-term Henderson smoothers (weights $w_{-m}, \dots, w_0$; by symmetry $w_j = w_{-j}$, and the last value in each row is the central weight)

Length  Henderson filter   Weights
9       Classical          −.041  −.010   .118   .267   .331
        Exact kernel       −.038  −.005   .119   .262   .325
        Biweight kernel    −.039  −.011   .120   .266   .328
13      Classical          −.019  −.028   .000   .065   .147   .214   .240
        Exact kernel       −.019  −.027   .001   .066   .147   .213   .238
        Biweight kernel    −.020  −.030   .002   .070   .149   .211   .234
23      Classical          −.004  −.011  −.016  −.015  −.005   .013   .039   .068   .097   .122   .138   .144
        Exact kernel       −.004  −.011  −.016  −.014  −.005   .014   .039   .068   .097   .122   .138   .144
        Biweight kernel    −.005  −.014  −.018  −.014  −.001   .019   .045   .072   .098   .118   .132   .137

Figure 1. Classical Henderson smoothers and third-order Henderson kernels of 13 terms (—– classical Henderson; · · · biweight kernel;
–+– exact kernel).

Figure 2 shows the gain functions of the symmetric Henderson smoother, together with those of the classical and kernel representations for the last point, superposed on the gain of the X11/X12ARIMA seasonal adjustment (SA) symmetric filter. The latter results from the convolution of (1) a 12-term centered moving average (MA), (2) a 3 × 3 MA, (3) a 3 × 5 MA, and (4) a 13-term Henderson MA. Its properties have been widely discussed in Dagum, Chaab, and Chiu (1996). The gain function of the SA filter should be interpreted as relating the spectrum of the original series to the spectrum of the estimated seasonally adjusted series.
It is apparent that the asymmetric Henderson kernel filter does not amplify the signal, as the classical asymmetric filter does, and converges faster to the final symmetric filter. Furthermore, the asymmetric kernel suppresses more noise than the Musgrave filter. Because the weights of the biweight and exact Henderson kernels are equal up to the third digit, no visible differences are seen in the corresponding gain and phase-shift functions. Hence, we only show those corresponding to the exact Henderson kernel in Figures 2 and 3.
As exhibited in Figure 3, the phase shift of the kernel filter at low frequencies is larger than that of the classical 13-term Henderson filter, but both are less than one month.
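The gain and phase-shift comparisons rest on the frequency response of each weight system. For $\hat g_t = \sum_j w_j y_{t+j}$, one common convention is $\Gamma(\omega) = \sum_j w_j e^{i\omega j}$, with gain $|\Gamma(\omega)|$ and phase shift, in time units, $\arg\Gamma(\omega)/\omega$. The sketch below is illustrative only: it uses the biweight third-order kernel (27) rather than the exact kernel plotted in Figures 2 and 3, and the function names are hypothetical.

```python
import numpy as np


def biweight_kernel3(t):
    """Third-order biweight Henderson kernel (27) on [-1, 1]."""
    return 15 / 16 * (1 - t**2) ** 2 * (7 / 4 - 21 / 4 * t**2)


def kernel_weights(m, q):
    """Weights (28) of the biweight kernel for j = -m, ..., q, with bandwidth b = m + 1."""
    j = np.arange(-m, q + 1)
    k = biweight_kernel3(j / (m + 1))
    return j, k / k.sum()


def frequency_response(j, w, omega):
    """Gamma(omega) = sum_j w_j exp(i omega j) for the filter g_t = sum_j w_j y_{t+j}."""
    return np.exp(1j * np.outer(omega, j)) @ w


omega = np.linspace(1e-3, np.pi, 500)  # radians per month
j_sym, w_sym = kernel_weights(6, 6)    # symmetric 13-term kernel filter
j_end, w_end = kernel_weights(6, 0)    # last-point kernel filter (29)
gain_end = np.abs(frequency_response(j_end, w_end, omega))
# Phase shift in months; negative values mean the estimate lags the input.
shift_end = np.angle(frequency_response(j_end, w_end, omega)) / omega
```

The symmetric filter has a real response (zero phase shift), so only the last-point filter contributes a phase shift; at low frequencies that shift is close to $\sum_j j\, w_j$, which for these weights is less than one month in absolute value.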
5. EMPIRICAL ANALYSIS
We apply the classical and kernel Henderson smoothers for the last point to a set of 30 series selected from the Federal Reserve Bank of St. Louis. These series are among the most important socioeconomic indicators and were taken in seasonally adjusted form. The periods chosen vary so as to sufficiently cover the various lengths published for these series.

Table 3. Sum of squares of the third differences of the weights for the classical and kernel Henderson smoothers

Filters                          9-term   13-term   23-term
Classical Henderson smoother     .053     .01       .00
Exact Henderson kernel           .048     .01       .00
Biweight Henderson kernel        .052     .01       .00
For each series we calculate the differences between the corresponding absolute percentage revisions (APR), defined by

$$ \text{Difference in the APR} = \left( \left| \frac{\hat X^{C}_t - \hat X^{CF}_t}{\hat X^{CF}_t} \right| - \left| \frac{\hat X^{K}_t - \hat X^{KF}_t}{\hat X^{KF}_t} \right| \right) \times 100, \qquad \forall t = 1, 2, \dots, N, \qquad (30) $$

where $\hat X^{C}_t$ and $\hat X^{K}_t$ denote the trend-cycle estimates at time $t$ obtained by the classical Henderson and kernel asymmetric filters, respectively, $\hat X^{CF}_t$ and $\hat X^{KF}_t$ are their final trend-cycle estimates, and $N$ is the number of observations. The results shown in Table 4 indicate that the mean of the APR differences is always positive; hence, the Henderson kernel last-point predictor has a smaller APR than the classical one. We also report the minimum and maximum differences for each series; positive values indicate that the Henderson kernel is to be preferred. The values show that the maximum is systematically much greater in absolute value than the corresponding minimum. In view of these excellent results, we would recommend replacing the current Musgrave filters with the kernel ones in seasonal adjustment software.
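A sketch of the revision statistic (30) and of the summaries reported in Table 4, assuming the four estimate series are aligned arrays of positive values. The function names are hypothetical, and using the sample standard deviation (`ddof=1`) is an assumption, since the text does not state which estimator was used.

```python
import numpy as np


def apr_difference(classical, classical_final, kernel, kernel_final):
    """Difference in absolute percentage revisions, eq. (30), for each t."""
    classical, classical_final = np.asarray(classical, float), np.asarray(classical_final, float)
    kernel, kernel_final = np.asarray(kernel, float), np.asarray(kernel_final, float)
    apr_classical = np.abs((classical - classical_final) / classical_final)
    apr_kernel = np.abs((kernel - kernel_final) / kernel_final)
    return (apr_classical - apr_kernel) * 100.0  # positive values favor the kernel filter


def table4_summary(diff):
    """Mean, standard deviation (ddof=1 is an assumption), minimum, and maximum."""
    diff = np.asarray(diff, float)
    return diff.mean(), diff.std(ddof=1), diff.min(), diff.max()


# Illustrative numbers only; these are not data from the paper.
d = apr_difference([101.0, 99.0], [100.0, 100.0], [100.4, 99.5], [100.0, 100.0])
print(d, table4_summary(d))
```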
We now illustrate how the “reproducing” Henderson kernel estimator responds to the variability of the data by means of two series characterized by different noise-to-signal ratios.

The first series, House Spending Index (HSI), is one of the Canadian leading indicators discussed in Dagum (1996). This series covers the period January 1981–December 1993 and is published by Statistics Canada in both original and seasonally adjusted forms. The I/C ratio (noise over signal) for this series, calculated by the X11/X12ARIMA software, is equal to 1.77 and hence falls within the range for which a 13-term Henderson filter is chosen for trend-cycle estimation. The HSI signal is dominated by relatively short cycles, for which, according to the gain functions shown in Figure 2, the kernel asymmetric filter should have an advantage. We produce the trend-cycle estimates of the series by applying the asymmetric last-point filters of both the classical and kernel Henderson representations, together with the symmetric one.

Figure 2. Gain functions of the (last point) asymmetric 13-term filter (—) and kernel ( ) with the corresponding symmetric (– – –) and SA cascade filter (· · ·).
To compare the performances of the two asymmetric
smoothers, in Figure 4 we superposed the temporal pattern of
the APR differences together with the original data. The temporal pattern of the APR differences clearly favors the kernel
asymmetric filter for the period September 1981–July 1983 and
from May 1990 up to the end of the series. These two periods
are characterized by the presence of short cycles and several
points of maxima and minima. The revisions differences oscillate between −.85 and 2.45, reaching the highest values at

turning points. From August 1983 to April 1990, the series is
characterized by an increasing steady trend with small fluctuations; hence the performances of the two filters are quite similar
with the difference ranging around .50.
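The APR differences plotted in Figure 4 measure, date by date, how much more the classical last-point estimate is revised than the kernel one once the symmetric filter can be applied. The sketch below assumes the usual definition APR_t = 100 |S_t − A_t| / |S_t|, where S_t is the symmetric-filter estimate and A_t is the last-point estimate. It also assumes that differences are taken as classical minus kernel, so positive values favor the kernel filter. The function names are hypothetical, and the filter weights are left as inputs.

```python
import numpy as np


def symmetric_estimates(x: np.ndarray, w_sym: np.ndarray) -> np.ndarray:
    """Final estimates from a centered (2m+1)-term filter; NaN where the window does not fit."""
    m = len(w_sym) // 2
    out = np.full(len(x), np.nan)
    for t in range(m, len(x) - m):
        out[t] = np.dot(w_sym, x[t - m:t + m + 1])
    return out


def last_point_estimates(x: np.ndarray, w_last: np.ndarray) -> np.ndarray:
    """Estimates computed as if t were the last available observation.

    w_last[i] multiplies x[t - m + i] for i = 0..m, so no future values are used.
    """
    m = len(w_last) - 1
    out = np.full(len(x), np.nan)
    for t in range(m, len(x)):
        out[t] = np.dot(w_last, x[t - m:t + 1])
    return out


def apr(final: np.ndarray, preliminary: np.ndarray) -> np.ndarray:
    """Absolute percentage revision of a preliminary estimate (assumed definition)."""
    return 100.0 * np.abs((final - preliminary) / final)


def apr_differences(x, w_sym, w_last_classical, w_last_kernel) -> np.ndarray:
    """APR(classical) - APR(kernel) at every t where both revisions are defined (assumed sign)."""
    final = symmetric_estimates(x, w_sym)
    d = (apr(final, last_point_estimates(x, w_last_classical))
         - apr(final, last_point_estimates(x, w_last_kernel)))
    return d[~np.isnan(d)]


def summary(d: np.ndarray) -> tuple:
    """Mean, standard deviation (ddof=1 assumed), minimum and maximum of the differences."""
    return d.mean(), d.std(ddof=1), d.min(), d.max()
```

Called with the 13-term symmetric weights and two 7-term last-point filters, `summary(apr_differences(x, ...))` produces statistics of the kind reported per series in Table 4. The standard-deviation convention is a further assumption.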
The other series, U.S. Unemployed Women, 16–19 years old,
covers the period July 1981–June 1994. The original input for
both trend-cycle estimators is the seasonally adjusted series,
in which extreme values have been given zero weight, as suggested by
Dagum (1996). The I/C ratio for this series is 3.04, so
again the 13-term Henderson filter is chosen, but, as Figure 5 shows, the trend is dominated by long-term periodic components.
The high value of the I/C ratio is mainly due to the irregular component.
In this case, the kernel filter still performs better than the classical Henderson filter, owing to its stronger noise suppression in the high-frequency
band, but the APR differences stay within a narrower range, oscillating between −.60 and 1.31 throughout
the series.
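Both series end up with the 13-term filter because of where their I/C ratios fall. The sketch below illustrates the monthly selection rule. It assumes the cut-offs usually documented for X-11: 9 terms below 1.0, 13 terms from 1.0 up to 3.5, and 23 terms from 3.5 on. These thresholds should be checked against the X11ARIMA or X12ARIMA documentation, and the function name is hypothetical.

```python
def henderson_length(ic_ratio: float) -> int:
    """Monthly Henderson filter length selected from the I/C (noise-to-signal) ratio.

    The cut-offs 1.0 and 3.5 are the X-11-style thresholds assumed here.
    """
    if ic_ratio < 0:
        raise ValueError("the I/C ratio cannot be negative")
    if ic_ratio < 1.0:
        return 9
    if ic_ratio < 3.5:
        return 13
    return 23


# The two series discussed in the text: HSI (1.77) and U.S. unemployed women (3.04).
for name, ic in [("HSI", 1.77), ("Unemployed women", 3.04)]:
    print(name, henderson_length(ic))  # prints "HSI 13", then "Unemployed women 13"
```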

Figure 3. Phase shifts of the asymmetric (end point) weights of the Henderson kernel (· · ·) and of the classical H13 filter (—).
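Figures 2 and 3 compare the filters through their gain and phase-shift functions. The sketch below computes both for a filter y_t = Σ w_k x_{t+k}, with the phase shift expressed in time units (months) so that positive values mean the output lags the input. It uses the standard closed form for the symmetric Henderson weights. The one-sided filter shown is a hypothetical truncate-and-renormalize construction chosen only to produce a nonzero phase shift: it is neither the classical (Musgrave) end filter nor the kernel end filter.

```python
import numpy as np


def henderson_weights(length: int) -> np.ndarray:
    """Classical symmetric Henderson weights from the standard closed-form expression."""
    n = (length + 3) // 2
    j = np.arange(-(length // 2), length // 2 + 1, dtype=float)
    num = (315 * ((n - 1) ** 2 - j ** 2) * (n ** 2 - j ** 2)
           * ((n + 1) ** 2 - j ** 2) * (3 * n ** 2 - 16 - 11 * j ** 2))
    den = 8 * n * (n ** 2 - 1) * (4 * n ** 2 - 1) * (4 * n ** 2 - 9) * (4 * n ** 2 - 25)
    return num / den


def frequency_response(weights, lags, omega):
    """Gain and phase shift (time units) of y_t = sum_k w_k x_{t+k}, for omega > 0."""
    omega = np.atleast_1d(np.asarray(omega, dtype=float))
    gamma = np.exp(1j * np.outer(omega, lags)) @ np.asarray(weights, dtype=float)
    gain = np.abs(gamma)
    shift = -np.angle(gamma) / omega  # positive: the filter output lags the input
    return gain, shift


h13 = henderson_weights(13)
# Hypothetical one-sided filter: keep lags -6..0 of H13 and renormalize (illustration only).
naive_last = h13[:7] / h13[:7].sum()

omega = 2 * np.pi / np.array([36.0, 24.0, 12.0])  # 3-year, 2-year and 1-year cycles
print(frequency_response(h13, np.arange(-6, 7), omega))
print(frequency_response(naive_last, np.arange(-6, 1), omega))
```

A symmetric filter has zero phase shift wherever its frequency response is positive. Any one-sided filter introduces a delay, and that delay is what the end-point weights compared in Figure 3 try to keep small.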

Table 4. Summary statistics of the APR differences between the classical and kernel Henderson last-point filters

Series                                              Period        Mean  St. Dev.     Min    Max

U.S. TRADE AND INTERNATIONAL TRANSACTIONS
Goods and services                                  07/92–02/06    .01      1.23   −2.72   5.55
Goods                                               07/92–02/06    .03       .65   −1.57   2.42
Services                                            07/92–02/06    .09       .64   −1.92   2.13
Exports of goods                                    01/93–12/02    .05       .26    −.59    .81
Imports of goods                                    01/93–12/02    .03       .29    −.46    .82
Exports of services                                 01/93–12/02    .07       .21    −.28    .79
Imports of services                                 01/95–12/04    .02       .24    −.41    .73

EXCHANGE RATES
Canada/U.S.                                         01/90–12/99    .02       .15    −.32    .39
Japan/U.S.                                          01/90–12/99    .04       .54   −1.13   1.74
U.S./U.K.                                           01/95–12/04    .04       .54   −1.72   2.23

U.S. BANKING
1-year treasury constant maturity rate              01/93–12/03    .01       .98   −2.43   3.36
Bank prime loan rate                                07/92–07/02    .06      1.18   −3.00   3.50
Travelers checks outstanding                        01/91–12/00    .01       .17    −.29    .34
Other securities at all commercial banks            01/90–12/99    .03       .42    −.94   1.62
Demand deposits at commercial banks                 01/95–12/04    .08       .26    −.39    .87
Savings deposits—total                              01/90–12/99    .08       .21    −.29    .55

U.S. EMPLOYMENT AND POPULATION
Civilian unemployed—less than 5 weeks               01/96–12/05    .06       .34    −.83    .87
Median duration of unemployment                     01/96–12/05    .02       .50   −1.00   1.30
Unemployed: 16 years and over                       01/96–12/05    .01       .24    −.38    .55
U.S. unemployed women (ages 16–21)                  07/81–06/94    .07       .35    −.60   1.31

CANADIAN LEADING INDICATORS
Household spending index                            07/81–06/93    .07       .58    −.85   2.45
New orders for durable goods                        07/80–06/92    .03       .36    −.68   1.07
Average weekly hours in manufacturing               07/80–06/92    .03       .09    −.12    .19

U.S. ECONOMIC DATA
Lightweight vehicle sales: autos and light trucks   07/76–03/06    .14       .47   −1.17   2.61
Housing starts: total                               01/94–12/03    .08       .39    −.89   1.16
Manufacturers’ new orders                           01/92–12/00    .01       .23    −.47    .53
Retail and food services sales                      01/94–12/03    .02       .11    −.18    .78
Capacity utilization: manufacturing (NAICS)         01/90–12/99    .03       .13    −.28    .41
ISM manufacturing: PMI composite index              07/91–07/00    .14      1.23   −1.98   3.86
Capacity utilization: total industry                01/91–12/00    .01       .21    −.63    .73
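Table 4 can be checked against the claim in the conclusions that the kernel last-point filter revises less. The sketch below transcribes the table and tests two properties. It assumes, as the table title suggests, that each difference is APR(classical) minus APR(kernel). Under that assumption a positive mean favors the kernel filter, while a negative minimum means the classical filter still revised less at some dates. The dictionary name is hypothetical.

```python
# (mean, st. dev., min, max) of the APR differences, transcribed from Table 4.
TABLE4 = {
    "Goods and services": (0.01, 1.23, -2.72, 5.55),
    "Goods": (0.03, 0.65, -1.57, 2.42),
    "Services": (0.09, 0.64, -1.92, 2.13),
    "Exports of goods": (0.05, 0.26, -0.59, 0.81),
    "Imports of goods": (0.03, 0.29, -0.46, 0.82),
    "Exports of services": (0.07, 0.21, -0.28, 0.79),
    "Imports of services": (0.02, 0.24, -0.41, 0.73),
    "Canada/U.S.": (0.02, 0.15, -0.32, 0.39),
    "Japan/U.S.": (0.04, 0.54, -1.13, 1.74),
    "U.S./U.K.": (0.04, 0.54, -1.72, 2.23),
    "1-year treasury constant maturity rate": (0.01, 0.98, -2.43, 3.36),
    "Bank prime loan rate": (0.06, 1.18, -3.00, 3.50),
    "Travelers checks outstanding": (0.01, 0.17, -0.29, 0.34),
    "Other securities at all commercial banks": (0.03, 0.42, -0.94, 1.62),
    "Demand deposits at commercial banks": (0.08, 0.26, -0.39, 0.87),
    "Savings deposits, total": (0.08, 0.21, -0.29, 0.55),
    "Civilian unemployed, less than 5 weeks": (0.06, 0.34, -0.83, 0.87),
    "Median duration of unemployment": (0.02, 0.50, -1.00, 1.30),
    "Unemployed: 16 years and over": (0.01, 0.24, -0.38, 0.55),
    "U.S. unemployed women": (0.07, 0.35, -0.60, 1.31),
    "Household spending index": (0.07, 0.58, -0.85, 2.45),
    "New orders for durable goods": (0.03, 0.36, -0.68, 1.07),
    "Average weekly hours in manufacturing": (0.03, 0.09, -0.12, 0.19),
    "Lightweight vehicle sales": (0.14, 0.47, -1.17, 2.61),
    "Housing starts: total": (0.08, 0.39, -0.89, 1.16),
    "Manufacturers' new orders": (0.01, 0.23, -0.47, 0.53),
    "Retail and food services sales": (0.02, 0.11, -0.18, 0.78),
    "Capacity utilization: manufacturing": (0.03, 0.13, -0.28, 0.41),
    "PMI composite index": (0.14, 1.23, -1.98, 3.86),
    "Capacity utilization: total industry": (0.01, 0.21, -0.63, 0.73),
}

assert len(TABLE4) == 30  # the 30 series referred to in the conclusions

# Assumed sign convention: APR(classical) - APR(kernel).
print(all(mean > 0 for mean, _, _, _ in TABLE4.values()))  # True: kernel better on average
print(all(lo < 0 for _, _, lo, _ in TABLE4.values()))      # True: but not at every date
```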

Figure 4. House Spending Index (HSI) original series ( ) and APR differences (—).

Figure 5. U.S. unemployed women (ages 16–19) original series ( ) and APR differences (—–).

6. CONCLUSIONS

We introduced a new representation of the Henderson
smoother by means of the reproducing kernel Hilbert space
(RKHS) methodology. The linear estimator is first transformed
into a second-order kernel, from which a hierarchy of higher order
kernels is constructed. We showed that the biweight
density function is very close to the exact density calculated on
the basis of the Henderson weighting penalty function. The biweight has a computational advantage over the exact density:
it does not need to be recalculated each time the length of the Henderson smoother changes, as the exact
density does. Another advantage is that the biweight density function
belongs to the well-known Beta distribution family, whose
associated orthonormal polynomials are the Jacobi polynomials. We
used both densities and their associated orthonormal polynomials
to generate Henderson kernels of order 2 and 3. The symmetric
Henderson kernel weights obtained from the two generating density functions
are very close for short spans, and we illustrated the spans most often applied to monthly data.
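The hierarchy described above can be illustrated numerically for the biweight density f(t) = (15/16)(1 − t²)² on [−1, 1]. The sketch orthonormalizes 1, t, t² with respect to f. Because f is proportional to (1 − t)²(1 + t)², these polynomials are, up to normalization, the Jacobi polynomials P^(2,2). The sketch then forms the third-order kernel K3(t) = f(t)[P0(t)P0(0) + P1(t)P1(0) + P2(t)P2(0)]. With the biweight moments μ2 = 1/7 and μ4 = 1/21, this kernel equals f(t)(μ4 − μ2 t²)/(μ4 − μ2²) = (7/4)(1 − 3t²) f(t). Finally, it samples K3 at j/b with b = m + 1 and renormalizes to obtain filter weights. That bandwidth and that discretization are assumptions made here for illustration, not necessarily the paper's choices, and all function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial

# 20-point Gauss-Legendre rule on [-1, 1]: exact for polynomials up to degree 39.
NODES, QWEIGHTS = np.polynomial.legendre.leggauss(20)


def biweight(t):
    """Biweight density on [-1, 1] (zero outside)."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= 1.0, 15.0 / 16.0 * (1.0 - t ** 2) ** 2, 0.0)


def inner(p: Polynomial, q: Polynomial) -> float:
    """Inner product <p, q> = integral of p(t) q(t) f(t) dt, computed by quadrature."""
    return float(np.sum(QWEIGHTS * p(NODES) * q(NODES) * biweight(NODES)))


def orthonormal_basis(degree: int) -> list:
    """Gram-Schmidt on 1, t, ..., t^degree with respect to the biweight density."""
    basis = []
    for k in range(degree + 1):
        p = Polynomial.basis(k)
        for b in basis:
            p = p - inner(p, b) * b
        basis.append(p / np.sqrt(inner(p, p)))
    return basis


def third_order_kernel(t):
    """K3(t) = f(t) * sum_{i=0}^{2} P_i(t) P_i(0)."""
    t = np.asarray(t, dtype=float)
    return biweight(t) * sum(p(t) * p(0.0) for p in orthonormal_basis(2))


def kernel_weights(length: int) -> np.ndarray:
    """Symmetric weights: K3 sampled at j / b with b = m + 1 (assumed), renormalized to sum to 1."""
    m = length // 2
    k = third_order_kernel(np.arange(-m, m + 1) / (m + 1))
    return k / k.sum()


def henderson_weights(length: int) -> np.ndarray:
    """Classical symmetric Henderson weights (standard closed form), for comparison."""
    n = (length + 3) // 2
    j = np.arange(-(length // 2), length // 2 + 1, dtype=float)
    num = (315 * ((n - 1) ** 2 - j ** 2) * (n ** 2 - j ** 2)
           * ((n + 1) ** 2 - j ** 2) * (3 * n ** 2 - 16 - 11 * j ** 2))
    den = 8 * n * (n ** 2 - 1) * (4 * n ** 2 - 1) * (4 * n ** 2 - 9) * (4 * n ** 2 - 25)
    return num / den


t = np.linspace(-1.0, 1.0, 9)
print(np.allclose(third_order_kernel(t), 1.75 * (1 - 3 * t ** 2) * biweight(t)))  # True
print(np.max(np.abs(kernel_weights(13) - henderson_weights(13))))
```

Under these assumptions, the 13-term kernel weights stay within 0.01 of the classical H13 weights at every lag.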
The asymmetric weights of the Henderson kernels were
derived by adapting the third-order kernel functions to the
lengths of the last six asymmetric filters. When applied to a set of 30
real series, the last-point kernel filter yields absolute percentage revisions (APR) that are systematically smaller
than those of the classical Henderson filter. These results agree with the respective gain functions of the two filters.
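One way to read "adapting the third-order kernel functions" to a filter that has only q of its m future points is as follows: recompute the moments of the density over the truncated support, then solve for the kernel that still reproduces quadratics there. The sketch below does this for the biweight with K(t) = f(t) [1, t, t²] H⁻¹ e₁, where H is the 3 × 3 Hankel matrix of moments over [−1, q/b] and b = m + 1. The truncation point, the bandwidth, and the discretization are all assumptions made here, and the function names are hypothetical rather than the paper's.

```python
import numpy as np

# 20-point Gauss-Legendre rule on [-1, 1], remapped below to the truncated support.
NODES, QWEIGHTS = np.polynomial.legendre.leggauss(20)


def biweight(t):
    """Biweight density on [-1, 1] (zero outside)."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= 1.0, 15.0 / 16.0 * (1.0 - t ** 2) ** 2, 0.0)


def truncated_third_order_kernel(t, upper: float):
    """Third-order kernel for the biweight restricted to [-1, upper].

    K(t) = f(t) * [1, t, t^2] H^{-1} e_1, where H[i, j] is the (i + j)-th moment of f
    over [-1, upper]. With upper = 1 this reduces to the symmetric kernel.
    """
    half = (upper + 1.0) / 2.0
    s = half * NODES + (upper - 1.0) / 2.0          # quadrature nodes mapped to [-1, upper]
    w = half * QWEIGHTS * biweight(s)
    moments = [np.sum(w * s ** k) for k in range(5)]
    hankel = np.array([[moments[i + j] for j in range(3)] for i in range(3)])
    a = np.linalg.solve(hankel, np.array([1.0, 0.0, 0.0]))
    t = np.asarray(t, dtype=float)
    return np.where(t <= upper, biweight(t) * (a[0] + a[1] * t + a[2] * t ** 2), 0.0)


def asymmetric_weights(length: int, q: int) -> np.ndarray:
    """Weights on x[t-m], ..., x[t+q] for a filter with q future points (0 <= q < m)."""
    m = length // 2
    b = m + 1                                         # assumed bandwidth
    j = np.arange(-m, q + 1)
    k = truncated_third_order_kernel(j / b, upper=q / b)  # truncation at q / b is assumed
    return k / k.sum()


last_point = asymmetric_weights(13, q=0)              # last-point (concurrent) filter
print(np.round(last_point, 3))
```

In this construction, the continuous kernel integrates to one and has zero first and second moments on the truncated support. The discrete weights are only renormalized to sum to one.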
ACKNOWLEDGMENTS
We are very grateful to the joint editor, Arthur Lewbel, an
anonymous associate editor, and two anonymous referees for
their thorough reading of an earlier version of the paper and
their insightful comments.
[Received August 2006. Revised March 2007.]

REFERENCES
Abramowitz, M., and Stegun, I. (1972), Handbook of Mathematical Functions
With Formulas, Graphs and Mathematical Tables, Washington, DC: U.S.
Government Printing Office.
Berlinet, A. (1993), “Hierarchies of Higher Order Kernels,” Probability Theory
and Related Fields, 94, 489–504.
Berlinet, A., and Devroye, L. (1994), “A Comparison of Kernel Density Estimates,” Publications de l’Institut de Statistique de l’Université de Paris, 38,
3–59.
Berlinet, A., and Thomas-Agnan, C. (2003), Reproducing Kernel Hilbert
Spaces in Probability and Statistics, Amsterdam: Kluwer Academic.
Brezinski, C. (1980), Padé Approximation and General Orthogonal Polynomials, Basel: Birkhäuser.
Castles, I. (1987), “A Guide to Smoothing Time Series Estimates of Trend,”
Catalogue No. 1316, Australian Bureau of Statistics.
Cholette, P. A. (1981), “A Comparison of Various Trend-Cycle Estimators,” in
Time Series Analysis, eds. O. D. Anderson and M. R. Perryman, Amsterdam:
North-Holland, pp. 77–87.
Cleveland, R., Cleveland, W., McRae, J., and Terpenning, I. (1990), “STL:
A Seasonal Trend Decomposition Procedure Based on LOESS,” Journal of
Official Statistics, 6, 3–33.
Dagum, E. B. (1988), “The X11ARIMA/88 Seasonal Adjustment Method—
Foundation and User’s Manual,” research paper, Time Series Research and
Analysis Division, Ottawa: Statistics Canada.
Dagum, E. B. (1996), “A New Method to Reduce Unwanted Ripples and Revisions in
Trend-Cycle Estimates From X11ARIMA,” Survey Methodology, 22, 77–83.
Dagum, E. B., and Bianconcini, S. (2006), “Local Polynomial Trend-Cycle Predictors in Reproducing Kernel Hilbert Spaces for Current Economic Analysis,” Anales de Economia Aplicada, XX, 1–22.
Dagum, E. B., and Laniel, N. (1987), “Revisions of Trend-Cycle Estimators
of Moving Average Seasonal Adjustment Methods,” Journal of Business &
Economic Statistics, 5, 177–189.
Dagum, E. B., and Luati, A. (2004), “Relationship Between Local and Global
Nonparametric Estimators Measures of Fitting and Smoothing,” Studies in
Nonlinear Dynamics and Econometrics, 8, article 17.
Dagum, E. B., Chaab, N., and Chiu, K. (1996), “Derivation and Properties of the
X11ARIMA and Census X11 Linear Filters,” Journal of Official Statistics,
12, 329–348.
Dalton, P., and Keogh, G. (2000), “An Experimental Indicator to Forecast Turning Points in the Irish Business Cycle,” Journal of the Statistical and Social
Inquiry Society of Ireland, 29, 117–176.
DeBoor, C., and Lynch, R. (1966), “On Splines and Their Minimum Properties,” Journal of Mathematics and Mechanics, 15, 953–969.
Doherty, M. (2001), “The Surrogate Henderson Filters in X11,” Australian and
New Zealand Journal of Statistics, 43, 385–392.
Findley, D., Monsell, B., Bell, W., Otto, M., and Chen, B. (1998), “New Capabilities and Methods of the X12ARIMA Seasonal Adjustment Program,”
Journal of Business & Economic Statistics, 16, 127–152.

Gray, A., and Thomson, P. (1996), “Design of Moving-Average Trend Filters
Using Fidelity and Smoothness Criteria,” in Time Series Analysis, Vol. 2 (in
memory of E. J. Hannan), eds. P. M. Robinson and M. Rosenblatt, Springer
Lecture Notes in Statistics, Vol. 15, New York: Springer, pp. 205–219.
Henderson, R. (1916), “Note on Graduation by Adjusted Average,” Transactions
of the Actuarial Society of America, 17, 43–48.
Kendall, M. G., Stuart, A., and Ord, J. (1983), The Advanced Theory of Statistics, Vol. 3, London: C. Griffin, eds.
Kenny, P., and Durbin, J. (1982), “Local Trend Estimation and Seasonal Adjustment of Economic and Social Time Series,” Journal of the Royal Statistical
Society, Ser. A, 145, 1–41.
Kimeldorf, G., and Wahba, G. (1970), “Spline Functions and Stochastic
Processes,” Sankhyā, Ser. A, 32, 173–180.
Loader, C. (1999), Local Regression and Likelihood, New York: Springer.
Loève, M. (1948), “Fonctions aléatoires du second ordre,” Appendix to Lévy, P., Processus Stochastiques et Mouvement Brownien, Paris: Gauthier-Villars.
Müller, H. G. (1984), “Smooth Optimum Kernel Estimators of Regression
Curves, Densities and Modes,” The Annals of Statistics, 12, 766–774.

Musgrave, J. (1964a), “A Set of End Weights to End All End Weights,” working
paper, Washington, DC: U.S. Bureau of the Census.
Musgrave, J. (1964b), “Alternative Sets of Weights for Proposed X-11 Seasonal Factor Curve Moving Averages,” working paper, Washington, DC: U.S. Bureau
of the Census.
Parzen, E. (1959), “Statistical Inference on Time Series by Hilbert Space Methods,” Technical Report 53, Stanford University, Statistics Dept.
Priestley, M. B. (1981), Spectral Analysis and Time Series, New York: Academic Press.
Quenneville, B., Ladiray, D., and Lefrançois, B. (2003), “A Note on Musgrave
Asymmetrical Trend-Cycle Filters,” International Journal of Forecasting,
19, 727–734.
Shiskin, J., Young, A., and Musgrave, J. (1967), “The X-11 Variant of the Census Method II Seasonal Adjustment Program,” Technical Paper 15, Washington, DC: U.S. Department of Commerce, Bureau of the Census.
Wahba, G. (1990), Spline Models for Observational Data, Philadelphia: Society
for Industrial and Applied Mathematics.