1. Introduction

The aim of nonparametric regression is to estimate regression functions without assuming a priori knowledge of their functional forms. The price for this flexibility is that appreciably larger sample sizes are required to obtain reliable nonparametric estimators than parametric estimators. In this paper, we consider a system of regression equations that can seem unrelated, but actually are related because their errors are correlated. Such a system of equations is called a set of seemingly unrelated regressions, or a SUR model (Zellner, 1962). This paper provides a Bayesian framework for reliably estimating the regression functions in a nonparametric manner, even for moderate sample sizes, by taking advantage of the correlation structure in the errors. The most important consequence of this work is to show that if the errors are correlated, better nonparametric estimators are obtained by exploiting this correlation structure than by ignoring the correlation and estimating the equations one at a time.
Specifically, we consider the system of $m$ regression equations

$$y^{(i)} = f_i(x^{(i)}) + e^{(i)}, \quad i = 1, 2, \ldots, m. \qquad (1.1)$$

Here, the superscript denotes that this is the $i$th of $m$ possible regressions, $y^{(i)}$ is the dependent variable, $x^{(i)}$ is a vector of independent variables and $f_1, \ldots, f_m$ are functions that require estimating in a nonparametric manner. As in the linear Gaussian SUR model, the regressions are related through the correlation structure of the Gaussian errors $e^{(i)}$. That is,

$$e \sim N(0, R \otimes I_n), \qquad (1.2)$$

where $e = (e^{(1)\prime}, e^{(2)\prime}, \ldots, e^{(m)\prime})^\prime$, $e^{(i)}$ is the vector of errors for the $n$ observations of the $i$th regression and $R$ is a positive-definite $m \times m$ matrix that also requires estimation. This paper provides a data-driven procedure for estimating the unknown functions $f_i$, $i = 1, \ldots, m$, and the covariance matrix $R$ in this model.
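As a concrete illustration of the model at (1.1) and (1.2), the following sketch simulates a two-equation system with correlated Gaussian errors. The particular mean functions, sample size and covariance matrix are hypothetical choices for illustration only, not those used in the paper.

```python
import numpy as np

# Hypothetical instance of (1.1)-(1.2): m = 2 regressions with nonlinear
# mean functions and errors whose rows are drawn from N(0, R).
rng = np.random.default_rng(0)
n, m = 100, 2

# Example mean functions f_1, f_2 (unknown to the analyst in practice).
f = [lambda x: np.sin(2 * np.pi * x), lambda x: x ** 2]

# Positive-definite 2x2 error covariance R with strong cross-equation correlation.
R = np.array([[1.0, 0.8],
              [0.8, 1.0]])

x = [rng.uniform(0, 1, n) for _ in range(m)]          # independent variables
E = rng.multivariate_normal(np.zeros(m), R, size=n)   # each row ~ N(0, R)
y = [f[i](x[i]) + E[:, i] for i in range(m)]          # the m dependent variables

# The cross-equation error correlation is what a SUR analysis exploits.
print(np.corrcoef(E[:, 0], E[:, 1])[0, 1])
```

Drawing each row of the error matrix from $N(0, R)$ is equivalent to drawing the stacked vector $e$ from $N(0, R \otimes I_n)$.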
Such systems of regressions are frequently used in econometric, financial and sociological modeling because taking into account the correlation structure in the errors results in more efficient estimates than ignoring the correlation and estimating the equations one at a time. Most of the literature on estimating a system of equations assumes that the $f_i$ are linear functions. For recent examples, see Bartels et al. (1996), Min and Zellner (1993) and Mandy and Martins-Filho (1993). However, in practice the functional forms of the $f_i$ in many regression applications are unknown a priori, so an approach that estimates their form is preferable. We examine two such cases here. The first concerns print advertisements in a women's magazine and estimates the relationship between three measures of advertising exposure and the physical positioning of advertisements in the magazine. The second involves estimating an intra-day model for average electricity load in two adjacent Australian states. In this example, we estimate the daily and weekly periodic components of load,
M. Smith, R. Kohn / Journal of Econometrics 98 (2000) 257-281
along with a temperature effect. In both examples, significant nonlinear relationships are identified that are difficult to discern using a parametric SUR approach. In addition, substantial correlation is estimated between the regressions, and the function estimates differ substantially from those obtained by estimating each of the nonparametric regressions separately and ignoring the correlation between the equations.
Our approach for estimating the system of equations defined at (1.1) and (1.2) models each of the functions $f_i$ as a linear combination of basis terms. We develop a Bayesian hierarchical model to explicitly parameterize the possibility that these terms may be superfluous and have corresponding coefficient values that are exactly zero. A wide variety of bases can be used, including many with a desired structure, such as periodicity or additivity, a point which is demonstrated in the empirical examples. The unknown regression functions are estimated by their posterior means, which attach the proper posterior probability to each subset of the basis elements, providing a nonparametric estimate that is both flexible and smooth. We develop a Markov chain Monte Carlo (MCMC) sampling scheme to calculate the posterior means because direct evaluation is intractable. This sampling scheme is a correction of the focused sampler discussed in Wong et al. (1997), and our empirical work shows it to be reliable and much more efficient than the Gibbs sampling alternative. We prove that the iterates of the focused sampler converge to the correct posterior distribution.

The performance of the new estimator is investigated empirically with a set of simulation experiments that cover a range of potential regression curves. These demonstrate the improvement that can be obtained by exploiting the correlation structure in a system of regressions. We note that the solution to the nonparametric SUR model discussed in this paper is easily extended to other nonparametric multivariate or vector regression models.
Zellner (1962, 1963) provides the seminal analysis of a system of regressions when the unknown functions $f_i$ are assumed linear in the coefficients. Srivastava and Giles (1987) summarize much of the literature dealing with this linear SUR model. However, recent advances in Markov chain Monte Carlo methods enable Bayesian analyses of more complex variations of the SUR model. For example, Chib and Greenberg (1995a) develop sampling schemes that estimate a hierarchical linear SUR model with first-order vector autoregressive or vector moving average errors, and extend the analysis to a time-varying parameter model. Markov chain Monte Carlo methods also provide a solution to reliably estimating nonparametric regressions in a variety of hitherto difficult situations. For example, Smith and Kohn (1996) develop nonparametric regression estimators for regression models where a data transformation may be required and/or outliers may exist in the data. Yee and Wild (1996) use smoothing splines to estimate a system of equations in a nonparametric manner, but they do not have data-driven estimators for the smoothing parameters. In the example in their paper they use values of the smoothing parameters based on the
independent variables, but not the dependent variable. Such an approach is an unsatisfactory way of estimating the smoothing parameters because it does not take into account the curvature exhibited by the dependent variable. Nor is it fully automatic, because the effective degrees of freedom is required as an input from the user.
The paper is organized as follows. Section 2.1 discusses how to model the unknown functions and why they are estimated using a hierarchical model. The rest of Section 2 introduces the Bayesian hierarchical SUR model and develops an efficient MCMC sampling scheme to enable its estimation. Section 3 uses the methodology to fit the print advertising and electricity load datasets. Section 4 contains simulation examples which investigate the improvements that can be made using this estimation procedure over a series of separate nonparametric regressions. Appendix A provides the conditional posterior distributions employed in the sampling scheme, while Appendix B proves that the focused sampling step provides an iterate from the correct invariant distribution.
2. Methodology

2.1. Basis representation of functions
Each regression function is modeled as a linear combination of basis functions, so that for a function $f$,

$$f(x) = \sum_{i=1}^{p} b_i B_i(x). \qquad (2.1)$$

Here, $B = \{B_1, \ldots, B_p\}$ is a basis of $p$ functions, while the $b_i$ are regression parameters.
A large number of authors use such linear decompositions with a variety of univariate and higher-dimensional bases in the single-equation case. For example, Friedman (1991), Smith and Kohn (1996) and Denison et al. (1998) use regression splines, Luo and Wahba (1997) use several reproducing kernel bases and Wahba (1990) uses natural splines. In particular, orthonormal bases, such as wavelet (Donoho and Johnstone, 1994) or Fourier bases, have been used. However, the computational advantage provided by such orthonormal bases does not easily extend to the case where the errors are correlated, as in the SUR model. In the case of multiple regressors in an equation, additive models of univariate bases or radial bases (Powell, 1987; Holmes and Mallick, 1998) can be used.
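As a concrete illustration of the decomposition at (2.1), the following sketch evaluates a cubic truncated-power regression spline basis, one member of the family of bases cited above, at the observed design points. The knot placement and basis size are illustrative assumptions, not choices made in this paper.

```python
import numpy as np

def spline_basis(x, knots):
    """Cubic truncated-power basis: 1, x, x^2, x^3 and (x - k)_+^3 per knot k.

    Returns the n x p matrix whose columns are the basis functions
    B_1, ..., B_p of (2.1) evaluated at the n observations in x.
    """
    cols = [np.ones_like(x), x, x ** 2, x ** 3]
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

x = np.linspace(0, 1, 50)
knots = np.linspace(0.1, 0.9, 5)   # illustrative interior knots
X = spline_basis(x, knots)         # p = 4 + 5 = 9 basis terms
print(X.shape)                     # (50, 9)
```

A fitted function is then the linear combination `X @ b`, matching (2.1) column by column.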
Given a choice of a particular basis for the approximation at (2.1), the $i$th regression at (1.1) can be written as the linear model

$$y^{(i)} = X^{(i)} b^{(i)} + e^{(i)}. \qquad (2.2)$$
Here, $y^{(i)}$ is the vector of the $n$ observations of the dependent variable, the design matrix $X^{(i)} = [b_1 \,|\, b_2 \,|\, \cdots \,|\, b_{p_i}]$, $b_j$ is a vector of the values of the basis function $B_j$ evaluated at the $n$ observations, and $b^{(i)}$ are the regression coefficients. The errors $e^{(i)}$ are correlated with those from the other regressions, as specified at (1.2), and we denote the number of basis terms in the $i$th equation as $p_i$. Note that many basis expansions employ $p_i \approx n$ basis terms, and it is inappropriate to estimate the regression coefficients using existing SUR methodology because the function estimates $\hat{f}_i$, $i = 1, \ldots, m$, would interpolate the data rather than produce smooth estimates that account for the existence of noise in the regression. Therefore, we estimate the regression parameters using a Bayesian hierarchical SUR model, described below, that explicitly accounts for the possibility that many of these terms may be redundant.
2.2. A Bayesian hierarchical SUR model

Consider the $i$th regression of a linear SUR model given at Eq. (2.2), where the design matrix $X^{(i)}$ is $n \times p_i$ and the coefficient vector $b^{(i)}$ is of length $p_i$. To account explicitly for the notion that variables in this regression can be redundant, we introduce a vector of binary indicator variables $c^{(i)} = (c^{(i)}_1, c^{(i)}_2, \ldots, c^{(i)}_{p_i})$. Here, $c^{(i)}_k$ corresponds to the $k$th element of the coefficient vector of the $i$th regression, say $b^{(i)}_k$, with $c^{(i)}_k = 0$ if $b^{(i)}_k = 0$ and $c^{(i)}_k = 1$ if $b^{(i)}_k \neq 0$. By dropping the redundant terms with zero coefficients, the $i$th regression can be rewritten, conditional on $c^{(i)}$, as

$$y^{(i)} = X^{(i)}_{c^{(i)}} b^{(i)}_{c^{(i)}} + e^{(i)}. \qquad (2.3)$$
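The role of the binary indicators in (2.3) can be sketched as follows; the dimensions and the particular indicator values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p_i = 30, 8
X_i = rng.normal(size=(n, p_i))   # full n x p_i design matrix of (2.2)

# Binary indicators c^(i): entry k is 1 if basis term k is retained
# (b_k nonzero) and 0 if it is dropped (b_k exactly zero).
c_i = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=bool)

# Conditional on c^(i), the reduced design matrix of (2.3) keeps only
# the q_c = sum(c) columns whose coefficients are nonzero.
X_c = X_i[:, c_i]
q_c = int(c_i.sum())
print(X_c.shape, q_c)             # (30, 4) 4
```

In the sampler, these indicators are random and are drawn from their posterior distribution, so the retained subset of basis terms is determined by the data rather than fixed in advance.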
If $q^{(i)}_c = \sum_{j=1}^{p_i} c^{(i)}_j$, then the design matrix $X^{(i)}_{c^{(i)}}$ is of size $n \times q^{(i)}_c$ and $b^{(i)}_{c^{(i)}}$ is a vector of $q^{(i)}_c$ elements. By stacking the linear models for the $m$ regressions, the SUR model can be written, conditional on $c = (c^{(1)\prime}, c^{(2)\prime}, \ldots, c^{(m)\prime})^\prime$, as

$$y = X_c b_c + e. \qquad (2.4)$$

Here, $y = (y^{(1)\prime}, y^{(2)\prime}, \ldots, y^{(m)\prime})^\prime$ and $X_c = \mathrm{diag}(X^{(1)}_{c^{(1)}}, \ldots, X^{(m)}_{c^{(m)}})$