1. Introduction

The aim of nonparametric regression is to estimate regression functions without assuming a priori knowledge of their functional forms. The price for this flexibility is that appreciably larger sample sizes are required to obtain reliable nonparametric estimators than parametric estimators. In this paper, we consider a system of regression equations that can seem unrelated, but actually are related because their errors are correlated. Such a system of equations is called a set of seemingly unrelated regressions, or a SUR model (Zellner, 1962). This paper provides a Bayesian framework for reliably estimating the regression functions in a nonparametric manner, even for moderate sample sizes, by taking advantage of the correlation structure in the errors. The most important consequence of this work is to show that if the errors are correlated, better nonparametric estimators are obtained by exploiting this correlation structure than by ignoring the correlation and estimating the equations one at a time.
Specifically, we consider the system of $m$ regression equations

$$y^{(i)} = f_i(x^{(i)}) + e^{(i)}, \quad i = 1, 2, \ldots, m. \qquad (1.1)$$

Here, the superscript denotes that this is the $i$th of $m$ possible regressions, $y^{(i)}$ is the dependent variable, $x^{(i)}$ is a vector of independent variables and $f_1, \ldots, f_m$ are functions that require estimating in a nonparametric manner. As in the linear Gaussian SUR model, the regressions are related through the correlation structure of the Gaussian errors $e^{(i)}$. That is,

$$e \sim N(0, R \otimes I_n), \qquad (1.2)$$

where $e = (e^{(1)\prime}, e^{(2)\prime}, \ldots, e^{(m)\prime})^\prime$, $e^{(i)}$ is the vector of errors for the $n$ observations of the $i$th regression and $R$ is a positive-definite $m \times m$ matrix that also requires estimation. This paper provides a data-driven procedure for estimating the unknown functions $f_i$, $i = 1, \ldots, m$, and the covariance matrix $R$ in this model.
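As a concrete illustration of the model at (1.1) and (1.2), the following sketch simulates a two-equation system with correlated Gaussian errors. The particular mean functions, sample size and covariance matrix are hypothetical choices for illustration only, not those used in the paper.

```python
import numpy as np

# Hypothetical instance of (1.1)-(1.2): m = 2 regressions with nonlinear
# mean functions and errors whose rows are drawn from N(0, R).
rng = np.random.default_rng(0)
n, m = 100, 2

# Example mean functions f_1, f_2 (unknown to the analyst in practice).
f = [lambda x: np.sin(2 * np.pi * x), lambda x: x ** 2]

# Positive-definite 2x2 error covariance R with strong cross-equation correlation.
R = np.array([[1.0, 0.8],
              [0.8, 1.0]])

x = [rng.uniform(0, 1, n) for _ in range(m)]          # independent variables
E = rng.multivariate_normal(np.zeros(m), R, size=n)   # each row ~ N(0, R)
y = [f[i](x[i]) + E[:, i] for i in range(m)]          # the m dependent variables

# The cross-equation error correlation is what a SUR analysis exploits.
print(np.corrcoef(E[:, 0], E[:, 1])[0, 1])
```

Drawing each row of the error matrix from $N(0, R)$ is equivalent to drawing the stacked vector $e$ from $N(0, R \otimes I_n)$.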
Such systems of regressions are frequently used in econometric, financial and sociological modeling because taking into account the correlation structure in the errors results in more efficient estimates than ignoring the correlation and estimating the equations one at a time. Most of the literature on estimating a system of equations assumes that the $f_i$ are linear functions. For recent examples, see Bartels et al. (1996), Min and Zellner (1993) and Mandy and Martins-Filho (1993). However, in practice the functional forms of the $f_i$ in many regression applications are unknown a priori, so an approach that estimates their form is preferable. We examine two such cases here. The first concerns print advertisements in a women's magazine and estimates the relationship between three measures of advertising exposure and the physical positioning of advertisements in the magazine. The second involves estimating an intra-day model for average electricity load in two adjacent Australian states. In this example, we estimate the daily and weekly periodic components of load,
M. Smith, R. Kohn / Journal of Econometrics 98 (2000) 257-281
along with a temperature effect. In both examples, significant nonlinear relationships are identified that are difficult to discern using a parametric SUR approach. In addition, substantial correlation is estimated between the regressions, and the function estimates differ substantially from those obtained by estimating each of the nonparametric regressions separately and ignoring the correlation between the equations.
Our approach for estimating the system of equations defined at (1.1) and (1.2) models each of the functions $f_i$ as a linear combination of basis terms. We develop a Bayesian hierarchical model to explicitly parameterize the possibility that these terms may be superfluous and have corresponding coefficient values that are exactly zero. A wide variety of bases can be used, including many with a desired structure, such as periodicity or additivity, a point which is demonstrated in the empirical examples. The unknown regression functions are estimated by their posterior means, which attach the proper posterior probability to each subset of the basis elements, providing a nonparametric estimate that is both flexible and smooth. We develop a Markov chain Monte Carlo (MCMC) sampling scheme to calculate the posterior means because direct evaluation is intractable. This sampling scheme is a correction of the focused sampler discussed in Wong et al. (1997), and our empirical work shows it to be reliable and much more efficient than the Gibbs sampling alternative. We prove that the iterates of the focused sampler converge to the correct posterior distribution.

The performance of the new estimator is investigated empirically with a set of simulation experiments that cover a range of potential regression curves. These demonstrate the improvement that can be obtained by exploiting the correlation structure in a system of regressions. We note that the solution to the nonparametric SUR model discussed in this paper is easily extended to other nonparametric multivariate or vector regression models.
Zellner (1962, 1963) provides the seminal analysis of a system of regressions when the unknown functions $f_i$ are assumed linear in the coefficients. Srivastava and Giles (1987) summarize much of the literature dealing with this linear SUR model. However, recent advances in Markov chain Monte Carlo methods enable Bayesian analyses of more complex variations of the SUR model. For example, Chib and Greenberg (1995a) develop sampling schemes that estimate a hierarchical linear SUR model with first-order vector autoregressive or vector moving average errors, and extend the analysis to a time-varying parameter model. Markov chain Monte Carlo methods also provide a solution to reliably estimating nonparametric regressions in a variety of hitherto difficult situations. For example, Smith and Kohn (1996) develop nonparametric regression estimators for regression models where a data transformation may be required and/or outliers may exist in the data. Yee and Wild (1996) use smoothing splines to estimate a system of equations in a nonparametric manner, but they do not have data-driven estimators for the smoothing parameters. In the example in their paper they use values of the smoothing parameters based on the
independent variables, but not the dependent variable. Such an approach is an unsatisfactory way of estimating the smoothing parameters because it does not take into account the curvature exhibited by the dependent variable. Nor is it fully automatic, because the effective degrees of freedom is required as an input from the user.
The paper is organized as follows. Section 2.1 discusses how to model the unknown functions and why they are estimated using a hierarchical model. The rest of Section 2 introduces the Bayesian hierarchical SUR model and develops an efficient MCMC sampling scheme to enable its estimation. Section 3 uses the methodology to fit the print advertising and electricity load datasets. Section 4 contains simulation examples which investigate the improvements that can be made using this estimation procedure over a series of separate nonparametric regressions. Appendix A provides the conditional posterior distributions employed in the sampling scheme, while Appendix B proves that the focused sampling step provides an iterate from the correct invariant distribution.
2. Methodology

2.1. Basis representation of functions
Each regression function is modeled as a linear combination of basis functions, so that for a function $f$,

$$f(x) = \sum_{i=1}^{p} b_i B_i(x). \qquad (2.1)$$

Here, $B = \{B_1, \ldots, B_p\}$ is a basis of $p$ functions, while the $b_i$ are regression parameters.
A large number of authors use such linear decompositions with a variety of univariate and higher-dimensional bases in the single-equation case. For example, Friedman (1991), Smith and Kohn (1996) and Denison et al. (1998) use regression splines, Luo and Wahba (1997) use several reproducing kernel bases and Wahba (1990) uses natural splines. In particular, orthonormal bases, such as wavelet (Donoho and Johnstone, 1994) or Fourier bases, have been used. However, the computational advantage provided by such orthonormal bases does not easily extend to the case where the errors are correlated, as in the SUR model. In the case of multiple regressors in an equation, additive models of univariate bases or radial bases (Powell, 1987; Holmes and Mallick, 1998) can be used.
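As a concrete illustration of the decomposition at (2.1), the following sketch evaluates a cubic truncated-power regression spline basis, one member of the family of bases cited above, at the observed design points. The knot placement and basis size are illustrative assumptions, not choices made in this paper.

```python
import numpy as np

def spline_basis(x, knots):
    """Cubic truncated-power basis: 1, x, x^2, x^3 and (x - k)_+^3 per knot k.

    Returns the n x p matrix whose columns are the basis functions
    B_1, ..., B_p of (2.1) evaluated at the n observations in x.
    """
    cols = [np.ones_like(x), x, x ** 2, x ** 3]
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

x = np.linspace(0, 1, 50)
knots = np.linspace(0.1, 0.9, 5)   # illustrative interior knots
X = spline_basis(x, knots)         # p = 4 + 5 = 9 basis terms
print(X.shape)                     # (50, 9)
```

A fitted function is then the linear combination `X @ b`, matching (2.1) column by column.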
Given a choice of a particular basis for the approximation at (2.1), the $i$th regression at (1.1) can be written as the linear model

$$y^{(i)} = X^{(i)} b^{(i)} + e^{(i)}. \qquad (2.2)$$
Here, $y^{(i)}$ is the vector of the $n$ observations of the dependent variable, the design matrix $X^{(i)} = [b_1 \,|\, b_2 \,|\, \cdots \,|\, b_{p_i}]$, $b_j$ is a vector of the values of the basis function $B_j$ evaluated at the $n$ observations, and $b^{(i)}$ are the regression coefficients. The errors $e^{(i)}$ are correlated with those from the other regressions, as specified at (1.2), and we denote the number of basis terms in the $i$th equation as $p_i$. Note that many basis expansions employ $p_i \approx n$ basis terms, and it is inappropriate to estimate the regression coefficients using existing SUR methodology because the function estimates $\hat{f}_i$, $i = 1, \ldots, m$, would interpolate the data rather than produce smooth estimates that account for the existence of noise in the regression. Therefore, we estimate the regression parameters using a Bayesian hierarchical SUR model, described below, that explicitly accounts for the possibility that many of these terms may be redundant.
2.2. A Bayesian hierarchical SUR model

Consider the $i$th regression of a linear SUR model given at Eq. (2.2), where the design matrix $X^{(i)}$ is $n \times p_i$ and the coefficient vector $b^{(i)}$ is of length $p_i$. To account explicitly for the notion that variables in this regression can be redundant, we introduce a vector of binary indicator variables $c^{(i)} = (c^{(i)}_1, c^{(i)}_2, \ldots, c^{(i)}_{p_i})$. Here, $c^{(i)}_k$ corresponds to the $k$th element of the coefficient vector of the $i$th regression, say $b^{(i)}_k$, with $c^{(i)}_k = 0$ if $b^{(i)}_k = 0$ and $c^{(i)}_k = 1$ if $b^{(i)}_k \neq 0$. By dropping the redundant terms with zero coefficients, the $i$th regression can be rewritten, conditional on $c^{(i)}$, as

$$y^{(i)} = X^{(i)}_{c^{(i)}} b^{(i)}_{c^{(i)}} + e^{(i)}. \qquad (2.3)$$
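The role of the binary indicators in (2.3) can be sketched as follows; the dimensions and the particular indicator values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p_i = 30, 8
X_i = rng.normal(size=(n, p_i))   # full n x p_i design matrix of (2.2)

# Binary indicators c^(i): entry k is 1 if basis term k is retained
# (b_k nonzero) and 0 if it is dropped (b_k exactly zero).
c_i = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=bool)

# Conditional on c^(i), the reduced design matrix of (2.3) keeps only
# the q_c = sum(c) columns whose coefficients are nonzero.
X_c = X_i[:, c_i]
q_c = int(c_i.sum())
print(X_c.shape, q_c)             # (30, 4) 4
```

In the sampler, these indicators are random and are drawn from their posterior distribution, so the retained subset of basis terms is determined by the data rather than fixed in advance.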
If $q^{(i)}_c = \sum_{j=1}^{p_i} c^{(i)}_j$, then the design matrix $X^{(i)}_{c^{(i)}}$ is of size $n \times q^{(i)}_c$ and $b^{(i)}_{c^{(i)}}$ is a vector of $q^{(i)}_c$ elements. By stacking the linear models for the $m$ regressions, the SUR model can be written, conditional on $c = (c^{(1)\prime}, c^{(2)\prime}, \ldots, c^{(m)\prime})^\prime$, as

$$y = X_c b_c + e. \qquad (2.4)$$

Here, $y = (y^{(1)\prime}, y^{(2)\prime}, \ldots, y^{(m)\prime})^\prime$ and $X_c = \mathrm{diag}(X^{(1)}_{c^{(1)}}, \ldots, X^{(m)}_{c^{(m)}})$