
Journal of Business & Economic Statistics

ISSN: 0735-0015 (Print) 1537-2707 (Online) Journal homepage: http://www.tandfonline.com/loi/ubes20

Iterative and Recursive Estimation in Structural Nonadaptive Models

Sergio Pastorello, Valentin Patilea & Eric Renault

To cite this article: Sergio Pastorello, Valentin Patilea & Eric Renault (2003) Iterative and Recursive Estimation in Structural Nonadaptive Models, Journal of Business & Economic Statistics, 21:4, 449-509, DOI: 10.1198/073500103288619124

To link to this article: http://dx.doi.org/10.1198/073500103288619124

Published online: 01 Jan 2012.




Iterative and Recursive Estimation in Structural Nonadaptive Models

Sergio PASTORELLO
Dipartimento di Scienze Economiche, Università di Bologna, 40126 Bologna, Italy

Valentin PATILEA
LEO, Université d'Orléans, 3100 Tours, France

Eric RENAULT
CIRANO and CIREQ, Université de Montréal, Montréal, QC, H3C 3J7, Canada (Eric.Renault@UMontreal.CA)

An inference method, called latent backfitting, is proposed. This method appears well suited for econometric models where the structural relationships of interest define the observed endogenous variables as a known function of unobserved state variables and unknown parameters. This nonlinear state-space specification paves the way for iterative or recursive EM-like strategies. In the E steps, the state variables are forecasted given the observations and a value of the parameters. In the M steps, these forecasts are used to deduce estimators of the unknown parameters from the statistical model of latent variables. The proposed iterative/recursive estimation is particularly useful for latent regression models and for dynamic equilibrium models involving latent state variables. Practical implementation issues are discussed through the example of term structure models of interest rates.

1. INTRODUCTION

The goal of this article is to provide a unified theory for a variety of estimation algorithms that can be used as part of a likelihood analysis of a nonlinear state-space model or as part of a conditional moment-based inference in a structural econometric model. Structural econometric models are challenging because the quantities of interest are not directly related to sample moments of the observations. Typically, these quantities may be not only some unknown parameters, but also latent Markov states that enter the observation in a nonanalytical form, as the solution to differential equations.

Although this econometric setting arises in many empirical applications, we consider as a leading example for this article the econometrics of financial market models. Beginning with the generalized method of moments (GMM) as applied by Hansen and Singleton (1982), econometric analysis of asset pricing models is dedicated mainly to an inversion problem: recovering the pricing kernel or stochastic discount factor (SDF) from the observed asset prices.

Empirical asset pricing theory takes as given a set of $J$ discretely observed asset prices $Y_t$ ($Y_t \in \mathbb{R}^J$) and tries to recover the SDF $m_{t+1}$, which relates them to the vector $G_{t+1}$ of their payoffs by the asset pricing model,
$$Y_t = E[m_{t+1} \cdot G_{t+1} \mid I_t], \qquad (1.1)$$
where $I_t$ is the relevant conditioning information at time $t$. Note that (1.1) implicitly assumes that prices and payoffs are observed at equally spaced intervals, although this assumption is not important.
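A small numerical illustration may help fix ideas for (1.1): with a finite conditional state space, the price is just the probability-weighted sum of discounted payoffs. All numbers below are hypothetical.

```python
import numpy as np

# Two-state illustration of the pricing equation (1.1):
# the price Y_t is the conditional expectation of SDF times payoff.
prob = np.array([0.5, 0.5])        # conditional state probabilities
m_next = np.array([0.97, 0.99])    # stochastic discount factor m_{t+1}
g_next = np.array([105.0, 95.0])   # payoff G_{t+1} in each state

y_t = float(prob @ (m_next * g_next))   # Y_t = E[m_{t+1} * G_{t+1} | I_t]
print(round(y_t, 2))                    # 97.95
```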

In the simplest cases, the asset pricing model (1.1) is fully specified by the definition of the joint conditional probability distribution of $(m_{t+1}, G_{t+1})$ given the observed values at time $t$ of some state variables (which summarize the relevant conditioning information $I_t$) and the value of a vector $\lambda$ of $q$ unknown parameters. The two most famous examples of such asset pricing models are the Sharpe–Lintner CAPM (Sharpe 1964; Lintner 1965) and the Black–Scholes–Merton option pricing model (Black and Scholes 1973).

It has since been widely documented that both the standard CAPM and Black–Scholes (BS) models generate more often than not empirically untenable results. Typically, there is no hope that a fixed volatility parameter in the BS option pricing formula or some fixed shares of the wealth invested in the CAPM market portfolio characterize the SDF in a satisfactory way. This has been acknowledged since the mid-1970s through both the Roll critique and the practitioners' rather bizarre uses of the BS formula. First, the Roll critique pointed out that the wealth portfolio needed for CAPM pricing might not be observable. Second, practitioners have considered the BS pricing formula as a measurement equation to recover an implied volatility parameter. In complete contradiction to the original BS model, this BS implied volatility is seen as a stochastic and time-varying summary of the relevant information conveyed by option prices. Similarly, the Roll critique, revisited as the Hansen–Richard critique (see Cochrane 2001), stresses that the conditioning information of agents may not be observable, and that one cannot omit it for inference regarding asset pricing models.
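The practitioners' use of the BS formula as a measurement equation can be sketched in a few lines: price a call under hypothetical inputs, then invert the formula for the volatility that reproduces the observed price. Bisection suffices because the BS call price is strictly increasing in volatility.

```python
import math

# Black-Scholes call price as a function of volatility (no dividends).
def bs_call(s, k, t, r, sigma):
    d1 = (math.log(s / k) + (r + 0.5 * sigma**2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    n = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))  # standard normal CDF
    return s * n(d1) - k * math.exp(-r * t) * n(d2)

# Treat the formula as a measurement equation: given an observed option
# price, recover the implied volatility by bisection.
def implied_vol(price, s, k, t, r, lo=1e-6, hi=5.0, tol=1e-10):
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bs_call(s, k, t, r, mid) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

p_obs = bs_call(100.0, 100.0, 0.5, 0.02, 0.25)   # "observed" price (hypothetical inputs)
print(round(implied_vol(p_obs, 100.0, 100.0, 0.5, 0.02), 4))  # recovers 0.25
```

Repeating this inversion day by day yields exactly the stochastic, time-varying implied volatility series that contradicts the constant-volatility assumption of the original model.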

This observation does not invalidate the methodology of statistical inference in the context of structural models of financial markets equilibrium. On the contrary, when one imagines that agents on the market observe some state variables that are not those that an econometrician can measure directly but the values of which are precisely incorporated in the observed asset prices, the addition of state variables is an elegant way of reconciling the equilibrium paradigm and the statistical data on market prices. Although the use of the BS option pricing formula will surely lead to a logical contradiction (i.e., different values of a volatility parameter are used, whereas its constancy is a maintained assumption), the introduction of a convenient number of state variables will avoid this logical contradiction. This strategy of structural econometric modeling of option pricing errors (see Renault 1997 for a survey) maintains the no-arbitrage principle, whereas even very small pricing errors would open the door (up to transaction costs) for huge arbitrage opportunities.

© 2003 American Statistical Association, Journal of Business & Economic Statistics, October 2003, Vol. 21, No. 4, DOI 10.1198/073500103288619124

The resulting statistical specification of the economic model (1.1) appears as a relationship,
$$Y_t = g[Y_t^*, \lambda], \qquad (1.2)$$
where $g$ is a given function completely defined by the economic model, possibly in a nonanalytic form, and $Y_t^*$ is a vector of possibly latent state variables that summarize the conditioning information, $I_t$, needed to compute the conditional expectation (1.1) when the relevant distributional characteristics are specified up to a vector, $\lambda$, of unknown parameters. Typically, the stochastic process $(m, G) = \{m_{t+1}, G_{t+1}\}_{t=1}^{T}$ will be seen as a function of the path over the lifetime of the asset of the possibly continuous-time stochastic process $Y^*$ of state variables, the probability distribution of which is described by a vector $\theta$ of $p$ unknown parameters. Then, the joint definition of the SDF and the vector of payoffs will characterize the vector $\lambda$ of "pricing parameters" as a given function of the vector $\theta$ of statistical parameters,
$$\lambda = \lambda(\theta), \qquad \theta \in \Theta \subset \mathbb{R}^p. \qquad (1.3)$$
This "data augmentation" modeling strategy is often considered a difficult challenge for empirical finance and has generated a variety of new simulation-based minimum chi-squared methods (surveyed in Tauchen 1997). However, these methods often use a fully parametric model as a simulator, whereas neither the parametric efficiency nor the semiparametric robustness is guaranteed (see Dridi and Renault 2001 for a discussion). Nonetheless, the Bayesian literature, in particular the explosion of papers in the area of Markov chain Monte Carlo (MCMC) over the past 10 years, told us that data augmentation should not amount to "difficulty augmentation."

As nicely summarized by Tanner (1996, chap. 4), "all these data augmentation algorithms share a common approach to problems: rather than performing a complicated maximization or simulation, one augments the observed data with 'stuff' (latent data) that simplifies the calculation and subsequently performs a series of simple maximizations or simulations. This 'stuff' can be the 'missing' data or parameter values." Typically, in our case, this will include both the data augmentation from $Y$ to $Y^*$ and the parameter augmentation from $\lambda$ to $\theta$. And, instead of increasing the difficulty, it will allow for simpler maximization-based estimation procedures (M estimation or minimum distance) as applied to latent data.

The basic idea of this article can be described by analogy with the MCMC methodology. Consider the following algorithm: Given an initial estimator $\theta^{(1)}$ of the parameters of interest, compute a proxy $Y_t^{*(1)}$ of the latent variables $Y_t^*$ from the structural relationships: $Y_t = g[Y_t^{*(1)}, \lambda(\theta^{(1)})]$, $t = 1, \ldots, T$. This is the analog of the MCMC step of drawing latent variables given the observables and an initial draw of the parameters. Then using the simplicity of the latent model enables one to compute an estimator $\theta^{(2)}$ of the parameters from the "draw" $Y_t^{*(1)}$, $t = 1, \ldots, T$, of the latent data. This is the analog of the posterior mean (or a random draw) of the augmented posterior distribution that has been precisely used to obtain simplicity. Typically, the latent statistical model that characterizes the stochastic latent data generator is much simpler than the statistical model that characterizes the law of motion of the observed process $Y$. Continuing in this fashion, the algorithm generates a sequence of random variables $[Y_t^{*(p)}, \theta^{(p)}]$, which are consistent with both the observed data and the structural model,
$$Y_t = g[Y_t^{*(p)}, \lambda(\theta^{(p)})], \qquad t = 1, \ldots, T. \qquad (1.4)$$
Moreover, in the same way that a fixed-point argument implies the convergence of the Markov chain produced by an MCMC algorithm, a fixed-point argument is going to ensure that the limit $[Y_t^{*(\infty)}, \theta^{(\infty)}] = \lim_{p \to \infty} [Y_t^{*(p)}, \theta^{(p)}]$ of the foregoing algorithm will exist. Then $\theta^{(\infty)}$ will be a consistent estimator (for a sample size $T$ going to infinity) of the true value $\theta^0$ of the parameters, whereas $Y_t^{*(\infty)}$ consistently "estimates" the best proxy of $Y_t^*$ that one could deduce from the exact pricing relationship,
$$Y_t = g[Y_t^*, \lambda(\theta^0)]. \qquad (1.5)$$
We will show that it is even not necessary to iterate the foregoing algorithm to infinity. A number $p(T)$ of iterations going to infinity with $T$ at a sufficient rate will ensure the consistency and root-$T$ normality of the resulting estimators.
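The alternation just described can be sketched on a deliberately simple toy model (ours, not one from the article): latent $Y_t^* \sim N(\theta^0, 1)$ iid, measurement $Y_t = Y_t^* + \lambda(\theta^0)$ with $\lambda(\theta) = \theta^2$. Each E-like step inverts the measurement equation at the current parameter value; each M-like step re-estimates $\theta$ from the implied states by the latent-model estimator (here the sample mean).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy nonadaptive model (hypothetical, for illustration only):
#   latent:      Y*_t ~ N(theta0, 1), iid
#   measurement: Y_t = Y*_t + lambda(theta0),  with lambda(theta) = theta**2
theta0 = 0.3
T = 100_000
y_star = theta0 + rng.standard_normal(T)
y = y_star + theta0**2            # observed data

theta = 1.0                        # crude initial guess theta^(1)
for p in range(50):
    y_star_implied = y - theta**2  # E-like step: implied states
    theta = y_star_implied.mean()  # M-like step: latent-model estimator

print(round(theta, 2))             # close to theta0 = 0.3
```

The fixed-point map here is $\theta \mapsto \bar{Y} - \theta^2$, which is locally contracting around $\theta^0 = 0.3$ (derivative $-2\theta^0$), so the iterates settle quickly.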

Therefore, although in a recent survey of MCMC methods for financial econometrics, Johannes and Polson (2001) claimed that "in contrast to all other estimation methods, MCMC is a unified estimation procedure, estimating both parameters and state variables simultaneously," the iterative methodology put forward in this article can be seen as the frequentist analog of MCMC and shares most of its advantages. Among the three main advantages of MCMC methods highlighted by Johannes and Polson (2001), our iterative procedure shares two. Besides the aforementioned simultaneous estimation of parameters and state variables, it is also true that it "exclusively uses conditional simulation, and it therefore avoids numerical maximization and long unconditional state variable simulation. Because of this, ... estimation is typically extremely fast in terms of computing time." Of course, when one reads "conditional simulation," one should understand in the context of classic statistics "estimation" of $Y_t^*$ and $\theta^0$, irrespective of the fact that this estimation is simulation-based or not. The important point, particularly for computational time, is that it is "conditional"; that is, it takes advantage of a previous estimator of the parameters or of the state variables to work directly with the "stuff" (latent data) that simplifies the calculation.

Of course, the third advantage of MCMC methods, namely to "deliver exact finite sample inference," is not shared by our frequentist methodology, which replaces the Bayesian paradigm by an asymptotic one. However, the alleged exact finite-sample performance of the Bayesian approach rests on the validity of the specification of a prior, whereas our asymptotic approach may be much less demanding in terms of parametric specification. The likelihood-based inference closest to our methodology is the EM algorithm (Dempster, Laird, and Rubin 1977). In some particular cases, our approach can be interpreted as an EM algorithm. However, it is more general for at least two reasons. First, it is not limited to a likelihood framework and can be useful in a number of conditional moment-based inference procedures that are popular in modern semiparametric structural econometrics. The analog of the M step of the EM algorithm (i.e., computation of the "estimator" $\theta^{(p+1)}$ from the "data" $Y_t^{*(p)}$, $t = 1, \ldots, T$) will then be performed by any extremum or minimum-distance principle.

Second, even in its maximum likelihood (ML) type applications, our approach does not resort to EM for the main examples of this article. In these examples, EM theory cannot be applied, because the support of the conditional probability distribution of the latent variables given the observable ones depends on the unknown parameters via the "inversion" of the measurement equation (1.5). The analog of the E step of the EM algorithm (i.e., computation of the "data" $Y_t^{*(p+1)}$ from the new estimator $\theta^{(p+1)}$) will resort more to an "implied-state" approach as popularized in finance by the practitioners' use of BS implied volatility parameters.

Typically, as explained earlier, it is often the case that a number of state variables have been introduced only to get rid of observed pricing errors with respect to the theoretical asset pricing model. In this case, the measurement equation (1.5) will provide, for a given value of the parameters, a one-to-one relationship between observed prices or returns $Y_t$ and latent state variables $Y_t^*$. Then implied states $Y_t^*$ can simply be recovered from observations by

$$Y_t = g[Y_t^*, \lambda] \iff Y_t^* = g^{-1}[Y_t, \lambda]. \qquad (1.6)$$
Of course, this does not make the inference issue as trivial as it may appear at first sight, because implied states can be recovered only for a given value of the unknown parameters. Even in the simplest case of a nonlinear state-space model defined by the measurement equation (1.5) [conformable to (1.6) with $\lambda = \lambda(\theta)$] jointly with a transition equation that specifies the dynamics of the latent process $Y^*$ as a parametric model indexed by the vector $\theta$ of unknown parameters, it turns out that the identification and efficiency issues for asymptotic statistical inference about $\theta$ have been largely ignored by the literature until now.

In the context of multifactor equilibrium models of the term structure of interest rates, Chen and Scott (1993), Pearson and Sun (1994), and Duan (1994) were the first to propose maximum likelihood estimation using price data of the derivative contract. These articles stressed in particular that because observed data are the transformed values of some underlying state variables, and "since the transformations in finance generally involve unknown parameters and the inverse transformations may not have analytical solutions, the likelihood functions can be difficult to derive" (Duan 1994), and in particular "it will be better to avoid the direct computation of the Jacobian for the inverse transformation" (Duan 1994).

Affine term structure models (Duffie and Kan 1996; Dai and Singleton 2000) are very popular nowadays, precisely because the bond prices can be calculated quickly as a function of state variables solving a system of ordinary differential equations. It is usually argued that, given the closed-form expressions for bond prices, one can invert any $n$ bond prices into the $n$ state variables and use the implied state variables in the estimation as if they were directly observable.
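As a concrete sketch of this inversion, consider a one-factor Vasicek model, whose zero-coupon bond price is exponential-affine in the short rate, so one observed price maps one-to-one to the state for known parameters. The parameter values below are hypothetical.

```python
import numpy as np

# One-factor Vasicek model: dr = kappa*(mu - r)dt + sigma*dW.
# Zero-coupon bond price is exponential-affine in the short rate r:
#   P(tau, r) = exp(a(tau) - b(tau) * r),
# so one observed price maps one-to-one to r when parameters are known.
kappa, mu, sigma = 0.5, 0.04, 0.01   # hypothetical parameter values

def ab(tau):
    b = (1.0 - np.exp(-kappa * tau)) / kappa
    a = (mu - sigma**2 / (2 * kappa**2)) * (b - tau) - sigma**2 * b**2 / (4 * kappa)
    return a, b

def price(tau, r):
    a, b = ab(tau)
    return np.exp(a - b * r)

def implied_state(tau, p):
    # invert the pricing map: r = (a(tau) - log P) / b(tau)
    a, b = ab(tau)
    return (a - np.log(p)) / b

r_true = 0.03
p_obs = price(5.0, r_true)          # "observed" 5-year bond price
print(implied_state(5.0, p_obs))    # recovers r_true = 0.03
```

The point of the next paragraph is precisely that this inversion, innocuous as it looks, makes the unknown parameters enter the likelihood a second time, through `implied_state` and its Jacobian.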

This common belief is actually wrong on theoretical grounds. The key point is that, because transformations in finance generally involve unknown parameters, these unknown parameters will appear in the likelihood function, not only through the parametric model of the latent state variables, but also as inputs of the inverse transformation and its Jacobian. Besides the aforementioned computational difficulties, the main consequence of that is a modification of the statistical information, as measured by the Fisher information matrix in the likelihood setting.

The first contribution of this article is a theoretical study of the difference between the statistical information conveyed by the state variables and that conveyed by observed data. This study will be carried out not only in the fully parametric setting (implied-states maximum likelihood and Fisher information matrix), but also in more general semiparametric contexts defined either by a set of conditional moment restrictions (implied-states GMM and semiparametric efficiency bound) or by some quasi-maximum likelihood or any extremum estimation (robustified Fisher information matrix).

Besides its theoretical drawbacks, the aforementioned confusion between infeasible efficient estimation with hypothetically observed state variables (hereafter "latent" estimation) and actual "implied-states" estimation is misleading for the definition of well-suited estimation algorithms. The other contribution of this article is various algorithmic estimation procedures, first with an iterative viewpoint, and second with a recursive approach. A general asymptotic theory of the proposed estimators is developed.

As far as iterative estimation is concerned, the asymptotic properties of our EM-type estimator cannot be properly assessed without a clear characterization of its two main ingredients: the asymptotic probability distribution of the infeasible latent estimator on the one hand, and the asymptotic contracting feature of the mapping $\bar\theta_T$ that generates the sequences of random variables $\theta^{(p)}$ on the other hand:
$$\theta^{(p+1)} = \bar\theta_T(\theta^{(p)}), \qquad p = 1, \ldots, p(T). \qquad (1.7)$$
The general asymptotic theory of the estimator $\theta^{(p(T))}$ proposed in this article nests several estimators previously proposed in the literature. Renault and Touzi (1996) considered, for the purpose of option pricing with stochastic volatility, the case of ML with $p(T) = \infty$. Renault (1997) sketched the case of more general extremum estimators and other fields of applications. Pan (2002) focused on an implied-states method of moments that is implicitly considered with $p(T) = \infty$.

Moreover, the structural econometric model of interest, (1.5), may be not only any asset pricing model, but also other equilibrium models, as produced in particular by game theory. Florens, Protopopescu, and Richard (2001) considered the case of auction markets. Our estimator also nests the iterative least squares (ILS) estimator as proposed by Gouriéroux, Monfort, Renault, and Trognon (1987) and studied extensively by Dominitz and Sherman (2001) in the context of semiparametric binary choice models. We show that also in this latter context, it is worth disentangling the semiparametric efficiency concepts associated with latent estimation and implied-states estimation.



It is often the case that the filtering of the latent variables (i.e., computation of the implied states from a given value of the parameters) involves computationally intensive calculations and/or simulations, as in the E step of simulated EM (SEM) algorithms (see, e.g., Ruud 1991). Therefore, we also propose some recursive procedures that, contrary to the iterative ones, involve not the batch processing of the full block of data at each step of the algorithm, but rather only a simple updating of the estimator each time that new data arrive. Moreover, the updating scheme does not imply an optimization procedure (e.g., Young 1985; Kuan and White 1994a, b; Kushner and Yin 1997). Generally speaking, the advantages of a recursive approach are twofold. First, as long as the guess on the parameters is far from the true value, the updating of the guess may be performed using only a small part of the sample, which allows much faster nonlinear filtering procedures. Once the sequence of recursive guesses stabilizes, one may use these values as starting values for an iterative algorithm. Second, the quick and simple updating formula provided by the recursive approach is well suited for online estimation.

The price to pay for time savings through recursive procedures that are directly focused on the implied-states latent moment conditions is, in general, some efficiency loss with respect to iterative procedures. We also characterize this efficiency loss in terms of the asymptotic contracting feature of the mapping $\bar\theta_T$. This efficiency loss should be much smaller than the one produced by less-specific recursive procedures, like the Kalman filter, which do not take advantage of the exact nonlinear structure of the model and introduce excessive error. For the simplest examples of affine term structure models, we put forward some Monte Carlo results that demonstrate the better computational and statistical performance of our approach in comparison with traditional Kalman filter ones (Duan 1994; De Jong 2000). The superior performance of the exact implied-states recursive procedures proposed here should be even more striking in more nonlinear asset pricing models where the Kalman filter no longer admits any theoretical justification.
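The flavor of such a recursive update can be sketched on a toy measurement equation of our own making (not one from the article): $Y_t = Y_t^* + \theta_0^2$ with latent $Y_t^* \sim N(\theta_0, 1)$ iid. Each new observation is inverted with the current parameter guess, and the guess is updated with a decreasing Robbins–Monro-type gain, so no batch of data is ever reprocessed. The projection interval is an ad hoc stabilization device.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model (hypothetical): Y_t = Y*_t + theta0**2, Y*_t ~ N(theta0, 1) iid.
# Recursive scheme: invert each new datum at the CURRENT guess, then
# update the guess with gain 1/t (projected on a compact set for stability).
theta0 = 0.3
theta = 1.0                                    # initial guess
for t in range(1, 200_001):
    y_t = theta0 + rng.standard_normal() + theta0**2
    y_star_implied = y_t - theta**2            # implied state, one datum
    theta = np.clip(theta + (y_star_implied - theta) / t, -0.5, 1.5)

print(round(theta, 2))                         # drifts toward theta0 = 0.3
```

Each pass costs one inversion and one addition, which is the computational appeal; the cost, as discussed above, is some asymptotic efficiency loss relative to iterating on the full sample.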

The article is organized as follows. In Section 2 we study the general identification problem for our implied-states approach and compare it with some classical issues of nonadaptivity in econometrics. The regression example allows us to interpret our iterative approach as an extension of classical backfitting. The identification condition is further interpreted in the framework of latent regression models and ILS through generalized residuals. In Section 3 we specialize the implied-states approach to the case (1.6) where the observed variables are one-to-one functions of the latent state variables. We show how our approach can apply to GMM and likelihood-type criteria. We devote Section 4 to asymptotic theory of the iterative estimators and extend this theory in Section 5 to a recursive version of the same algorithms. The efficiency loss of the recursive implied-states estimator with respect to the iterative one is also analyzed. The practical issues associated with the implementation of the various competing approaches are sketched in Section 6 through a short empirical illustration in the context of a model of the term structure of interest rates. Miscellaneous proposals for further empirical research and concluding remarks appear in Section 7. Technical proofs are gathered in the Appendix.

2. IDENTIFICATION AND NONADAPTIVITY

In this section we motivate our iterative approach as a natural way to address an issue of nonadaptivity, that is, the occurrence of some nuisance parameters that prevent one from directly obtaining a consistent estimator of the parameters of interest. The price to pay for the proposed iterative strategy is that the required identification condition may be stronger than the usual one, although we argue that this strengthening is rather weak.

To support this claim, we consider some benchmark examples of nonadaptivity in econometrics and show that our iterative strategy nests some already well-documented procedures. The first example is the partially parametric regression model, in which only some parameters of the parametric part are of interest. Our iterative estimation procedure is then tantamount to standard backfitting, and the required identification condition appears to be, at least locally, no stronger than the identification condition of the regression model.

The second example is ILS through generalized residuals of a latent regression model. In this case, the iterative procedure can be interpreted as an extension of standard backfitting, which we term "latent backfitting." The relevant identification conditions have already been extensively characterized by Dominitz and Sherman (2001), and we just revisit their argument to illustrate that the necessary identification condition is not much stronger than the usual one.

2.1 The General Framework

The focus of interest is a class of inference problems that can be described via the more general issue of nonadaptivity. We consider a semiparametric statistical model specified by a family, $\mathcal{P}$, of probability measures, $P$, on the sampling space and two mappings, $\theta(\cdot): \mathcal{P} \to \Theta \subset \mathbb{R}^p$ and $\lambda(\cdot): \mathcal{P} \to \Lambda \subset \mathbb{R}^q$. The vector $\theta = \theta(P)$ contains the $p$ parameters representing the features of interest of the probability $P$, which has hypothetically governed the sampling of the observations. The vector $\lambda = \lambda(P)$ contains $q$ nuisance parameters. The model is assumed to be well specified in the sense that there exists a true unknown probability, $P^0 \in \mathcal{P}$, which defines the data-generating process (DGP), and $\theta^0 = \theta(P^0)$ and $\lambda^0 = \lambda(P^0)$ are the corresponding true unknown values of the parameters of interest and of the nuisance parameters.

We are interested in an extremum estimator of $\theta$ deduced from a sample-based criterion (objective function)
$$Q_T[\theta, \lambda] = Q_T\big[\theta, \lambda; \{Y_t\}_{1 \le t \le T}\big],$$
which depends on vectors $\theta \in \Theta$ and $\lambda \in \Lambda$. We consider a first set of assumptions similar to those usually imposed for extremum estimation in the presence of nuisance parameters (see, e.g., Wooldridge 1994, sec. 4).

Assumption 1. a. For any $T \ge 1$, $Q_T[\cdot,\cdot]$ satisfies the standard measurability and continuity conditions; that is, it is measurable as a function of the observations and continuous as a function of the parameters $(\theta, \lambda)$.

b. There exists a limit criterion $Q_\infty[\cdot,\cdot] = Q_{\infty,P^0}[\cdot,\cdot]$ such that
$$\operatorname{plim}_{T \to \infty} Q_T[\theta, \lambda] = Q_\infty[\theta, \lambda], \qquad \forall (\theta, \lambda) \in \Theta \times \Lambda.$$



The specificity of the general framework considered in this article is the mode of occurrence of the nuisance parameters $\lambda$ in the objective function $Q_T$. As noted in Section 1, $\theta$ is associated with the probability distribution of some state variables process $Y^*$, whereas $\lambda$ appears in the measurement equation
$$Y_t = g(Y_t^*, \lambda). \qquad (2.1)$$
In other words, $Q_T[\theta, \lambda^0]$ will typically be an objective function provided by some standard latent extremum estimation principle, whereas $\lambda$ appears due to the need to recover implied states from the observations $\{Y_t\}_{1 \le t \le T}$ and the measurement equation (2.1).

Because this measurement equation is defined by an equilibrium model stemming from, for example, option pricing or game theory, it is itself tightly related to the DGP of the state variables. Therefore, the true unknown value $\lambda^0$ of the nuisance parameters is a function of the true unknown value $\theta^0$ of the parameters of interest. To highlight this point, we write $\lambda(P) = \lambda(\theta(P))$. Note that $\lambda(P)$ may also depend on some other nuisance parameters insofar as we get a consistent estimator of them. The important point is that there generally is no hope of obtaining a preliminary consistent estimator of $\lambda^0$ to be able to deduce a consistent estimator of $\theta^0$, because the former is a function of the latter.

In other words, we consider that there is no reason to assume that an extremum estimator $\bar\theta_T(\lambda)$ built as
$$\bar\theta_T(\lambda) = \arg\max_{\theta} Q_T[\theta, \lambda] \qquad (2.2)$$
from an arbitrarily fixed value $\lambda$ of the nuisance parameters will provide a consistent estimation of the true value $\theta^0$ of the parameters of interest. Typically, only $\bar\theta_T(\lambda^*)$ computed from some particular value $\lambda^*$ of the nuisance parameters will provide a consistent estimator. This is the general issue of nonadaptivity.

Assumption 2. a. For any $\lambda \in \Lambda$, the function $\theta \mapsto Q_\infty[\theta, \lambda]$ admits a unique maximizer $\bar\theta(P^0, \lambda)$, where $P^0$ is the probability measure governing the observations.

b. There exists some $\lambda^* \in \Lambda$ such that $\bar\theta(P^0, \lambda^*) = \theta^0$.

Perhaps the best-known example of nonadaptivity in econometrics is the linear regression model with two sets of explanatory variables, which we now consider for illustration purposes. With a slight change of notation to introduce exogenous variables, let $(Y_t, X_t)$, $t = 1, \ldots, T$, be iid random vectors such that $Y_t \in \mathbb{R}$, $X_t = (X_{1t}', X_{2t}')' \in \mathbb{R}^p \times \mathbb{R}^q$ and
$$\begin{cases} E_P(X_t X_t') \text{ is a nonsingular matrix,} \\ E_P[Y_t - X_{1t}'\theta(P) - X_{2t}'\lambda(P) \mid X_{1t}, X_{2t}] = 0, \\ \theta(P) \in \Theta = \mathbb{R}^p \text{ and } \lambda(P) \in \Lambda = \mathbb{R}^q, \end{cases} \qquad (2.3)$$
where $E_P$ denotes the expectation with respect to the probability $P$; in particular, $E_{P^0} = E^0$ stands for the expectation with respect to the DGP.

The ordinary least squares principle leads to the maximization criterion
$$Q_\infty[\theta, \lambda] = -E^0\big(Y_t - X_{1t}'\theta - X_{2t}'\lambda\big)^2. \qquad (2.4)$$

Therefore, the set of values $\lambda^*$ of the nuisance parameters $\lambda$ such that
$$\bar\theta(P^0, \lambda^*) = \theta^0$$
is characterized by a translation of the kernel space of the matrix $E^0(X_{1t}X_{2t}')$,
$$\lambda^* - \lambda^0 \in \operatorname{Ker} E^0(X_{1t}X_{2t}').$$
Of course, we are faced with the nonadaptivity problem when the kernel subspace does not coincide with the whole space $\mathbb{R}^q$. Therefore, except for specific choices of the function $\lambda(\cdot)$, we have

$$\theta^0 \ne \arg\max_{\theta} Q_\infty[\theta, \lambda(\theta)]. \qquad (2.5)$$
In the simple linear model (2.3), the Frisch–Waugh theorem (see Frisch and Waugh 1933; Davidson and MacKinnon 1993, p. 19) provides a particular function, $\lambda^0(\cdot)$, to avoid the "problem" (2.5). Indeed, if we define $\lambda^0(\cdot)$ by
$$E^0[X_{2t}X_{2t}'\lambda^0(\theta)] = E^0[X_{2t}(Y_t - X_{1t}'\theta)], \qquad (2.6)$$
then
$$\theta^0 = \arg\max_{\theta} Q_\infty[\theta, \lambda^0(\theta)]. \qquad (2.7)$$

In practice, it may happen that the function $\lambda^0(\cdot)$ is unknown but can be consistently estimated by, say, $\lambda_T(\cdot)$. In the Frisch–Waugh example,
$$\lambda_T(\theta) = (X_2'X_2)^{-1}X_2'[Y - X_1\theta]$$
is the empirical counterpart of (2.6); $Y$, $X_1$, and $X_2$ are the matrices corresponding to the first $T$ observations of the variables $Y$, $X_1$, and $X_2$. The finite-sample criterion associated with (2.4), where $\lambda$ is replaced by the function $\lambda_T$, equals
$$Q_T[\theta, \lambda_T(\theta_1)] = -T^{-1}\sum_{t=1}^{T}\big[Y_t - X_{1t}'\theta - X_{2t}'\lambda_T(\theta_1)\big]^2 = -T^{-1}\big\| (I_d - P_{X_2})Y - X_1\theta + P_{X_2}X_1\theta_1 \big\|^2,$$
where $P_{X_2} = X_2(X_2'X_2)^{-1}X_2'$ is the orthogonal projection matrix onto the subspace of $\mathbb{R}^T$ spanned by the columns of $X_2$.
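In this Frisch–Waugh setting, iterating between the nuisance step $\lambda_T(\theta^{(p)})$ and the maximization of $Q_T$ over $\theta$ reduces to alternating least squares regressions, i.e., standard backfitting, and the iterates converge to the joint OLS coefficients. A minimal simulated sketch (the design and coefficient values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)

# Two blocks of regressors: theta is of interest, lambda is a nuisance.
T, p, q = 500, 2, 2
X1 = rng.standard_normal((T, p))
# X2 correlated with X1, so dropping it would bias theta (nonadaptivity)
X2 = X1 @ np.array([[0.5, 0.2], [0.1, 0.4]]) + rng.standard_normal((T, q))
theta_true = np.array([1.0, -2.0])
lam_true = np.array([0.5, 0.3])
Y = X1 @ theta_true + X2 @ lam_true + 0.1 * rng.standard_normal(T)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

theta = np.zeros(p)
for _ in range(200):
    lam = ols(X2, Y - X1 @ theta)      # nuisance step: lambda_T(theta^(p))
    theta = ols(X1, Y - X2 @ lam)      # update theta given the nuisance fit

theta_joint = ols(np.hstack([X1, X2]), Y)[:p]   # joint OLS for comparison
print(np.allclose(theta, theta_joint, atol=1e-6))
```

The iteration contracts at a rate governed by the canonical correlations between the two regressor blocks, which is the finite-sample face of the contraction condition discussed later in the section.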

The focus of interest of this article is a class of structural econometric models where only a function $\lambda(\cdot)$ such that
$$\theta^0 = \arg\max_{\theta} Q_\infty[\theta, \lambda(\theta^0)] \qquad (2.8)$$
is known or is available by estimation, but (2.5) may happen. Therefore, by analogy with standard backfitting, natural estimation strategies based on the criterion $Q_T$ should distinguish the two occurrences of $\theta$. The criterion is then written as $Q_T[\theta, \lambda_T(\theta_1)]$, and the estimations are defined through iterative steps,
$$\theta^{(p+1)} = \arg\max_{\theta} Q_T\big[\theta, \lambda(\theta^{(p)})\big], \qquad p = 1, 2, \ldots. \qquad (2.9)$$
Note that for the sake of notational simplicity, we always let $Q_T[\theta, \lambda(\theta_1)]$ denote the finite-sample criterion, whereas in some cases, $\lambda(\theta_1)$ should be replaced by a consistent estimator $\lambda_T(\theta_1)$. In other words, by a slight abuse, all of the data dependence is summarized by the notation $Q_T$.



It is important to keep in mind in this respect that all of the iterative/recursive estimation strategies considered in this article do not change when the criterion $Q_T$ is replaced by $\tilde Q_T$ conformable to
$$\tilde Q_T\big[\theta, \lambda(\theta^{(p)})\big] = Q_T\big[\theta, \lambda(\theta^{(p)})\big] + \varphi_T(\theta^{(p)}),$$
for an arbitrary numerical function $\varphi_T(\cdot)$ defined on $\Theta$. Thus we are definitely unable to check a condition like (2.7).

Moreover, this "limited information" use of the criterion Q_T requires that we strengthen the standard identification condition. Because by (2.8) we have in mind an infeasible extremum estimator that would be defined from the criterion Q_T[θ; λ(θ⁰)], a basic identification condition to be imposed should be that θ⁰ is the unique maximizer of the limit criterion θ → Q_∞[θ; λ(θ⁰)]; that is,

θ̄[P⁰; λ(θ⁰)] = θ⁰.   (2.10)

This condition would be quite similar to the usual identification condition that would have been imposed if the limit criterion were considered as a function (θ, λ) → Q_∞[θ; λ] and if λ were nuisance parameters for which a consistent estimator is available (see, e.g., Wooldridge 1994, thm. 4.3). But, given the nonadaptivity problem and the need to build an estimation procedure from the steps in (2.9), we need to maintain a stronger identification condition imposing that the only fixed point of the function θ̄[P⁰; λ(·)] is θ⁰. Otherwise, it may happen that the statistician will not be able to reject a "bad guess" θ^(p) ≠ θ⁰ used to build the criterion Q_T[·; λ(θ^(p))]. This remark leads to the following identification condition required by our estimation strategy.
Assumption 3. θ⁰ = θ(P⁰) is the unique fixed point of the map θ̄[P⁰; λ(·)].

Note that this assumption implies that θ⁰ is well identified from the observations as the unique fixed point of the function θ̄[P⁰; λ(·)], where θ̄[P⁰; λ] is the unique maximizer of plim Q_T[·; λ].
We now analyze some conditions ensuring Assumption 3. Consider that the function θ̄[P⁰; λ(·)] is differentiable with continuous partial derivatives. If θ̄[P⁰; λ(θ⁰)] = θ⁰, then by a Taylor expansion argument we can deduce that (at least locally) θ⁰ is the unique fixed point of θ_1 → θ̄[P⁰; λ(θ_1)] provided that the matrix

I_p − (∂θ̄/∂θ'_1)[P⁰; λ(θ⁰)]

is invertible. In terms of the first derivatives of the function θ̄[P⁰; λ(·)], this appears to be the minimal condition for ensuring the unique fixed point property locally. However, θ⁰ will be estimated by iterations, which requires a stronger condition for ensuring the convergence of the estimator.
Suppose that the function θ̄[P⁰; λ(·)] is known. Then its unique fixed point could be approximated through the iterations θ^(k) = θ̄[P⁰; λ(θ^(k−1))], k = 1, 2, . . . . Note that, in fact, the iterative estimation methodology proposes to replace θ̄[P⁰; λ(·)] with its sample counterpart (2.2) and to perform the corresponding iterations. A suitable way to ensure the convergence of the iterations θ^(k) is to assume that θ̄[P⁰; λ(·)] is a contracting mapping; that is, there exists c ∈ [0, 1) such that

‖θ̄[P⁰; λ(θ′)] − θ̄[P⁰; λ(θ″)]‖ ≤ c ‖θ′ − θ″‖,   θ′, θ″ ∈ Θ.

When θ̄[P⁰; λ(·)] admits continuous first-order derivatives and θ⁰ is an interior point of Θ, the contracting property is tantamount to the condition

‖(∂θ̄/∂θ'_1)[P⁰; λ(θ⁰)]‖ < 1,   (2.11)

at least after reducing Θ. Here and throughout the rest of the article, ‖A‖ denotes the spectral norm of the matrix A and is defined by ‖A‖² = ρ(A'A), where ρ(B) stands for the spectral radius of the square matrix B, that is, its largest eigenvalue in absolute value. Clearly, if (2.11) holds, then I_p − (∂θ̄/∂θ'_1)[P⁰; λ(θ⁰)] is invertible, and thus the unique fixed point property is guaranteed locally.
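As a toy numerical check of the contraction condition (2.11), consider the invented affine map below (it stands in for θ̄[P⁰; λ(·)] and is not taken from the article): the spectral norm of its Jacobian is below 1, so successive approximations converge to the unique fixed point.

```python
import numpy as np

# Invented affine fixed-point map on R^2: thetabar(theta) = A @ theta + b
A = np.array([[0.4, 0.1],
              [0.0, 0.3]])
b = np.array([1.0, -0.5])

def thetabar(theta):
    return A @ theta + b

# spectral norm ||A|| = sqrt(rho(A'A)) = largest singular value; (2.11) requires < 1
spec_norm = np.linalg.norm(A, 2)

# method of successive approximations, theta^(k) = thetabar(theta^(k-1))
theta = np.zeros(2)
for k in range(200):
    theta = thetabar(theta)

# for an affine map, the unique fixed point solves (I_2 - A) theta = b
theta_star = np.linalg.solve(np.eye(2) - A, b)
print(spec_norm, theta, theta_star)
```

For this linear map the Jacobian is exactly A, so (2.11) and the invertibility of I_2 − A can both be read off directly.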

In conclusion, the contracting condition, or its local version (2.11), will represent a key assumption for proving convergence results for our estimators in a general framework. However, as far as identification is concerned, the relevant condition is the weaker Assumption 3.
We now illustrate the hierarchy of the identification hypotheses (the basic condition θ̄[P⁰; λ(θ⁰)] = θ⁰, the unique fixed-point property, and the model identification condition) in the context of the standard backfitting algorithm.

2.2 Classical Backfitting Revisited

Consider the backfitting algorithm (see, e.g., Hastie and Tibshirani 1990) in the context of a general partially parametric regression (PPR) model. Such a model is characterized by a regression function split into two parts: a parametric part, possibly nonlinear, and a nonparametric part (see Andrews 1994a, p. 52). In general, one writes

Y_t = h(X_{1t}, θ) + λ(X_{2t}) + u_t,   θ ∈ Θ ⊂ R^p, λ ∈ G,
E(u_t | X_t) = 0,   X_t = (X'_{1t}, X'_{2t})',   t = 1, . . . , T,

with h(·, ·) known and G an infinite-dimensional set of real-valued functions. For the sake of simplicity, in this example we assume that the process {u_t, X'_t} is stationary and ergodic. The quantities of interest, namely the true unknown values θ⁰ and λ⁰ of the Euclidean parameter θ and the function λ, are defined as the unique minimizers of the mean squared error, that is, the unique maximizers of

Q_∞[θ; λ] = −E⁰[Y_t − h(X_{1t}, θ) − λ(X_{2t})]².
The identification assumption expressed by this condition can be made even more precise by using the following decomposition:

E[Y_t − h(X_{1t}, θ) − λ(X_{2t})]²
  = E[Y_t − h(X_{1t}, θ) − E(Y_t − h(X_{1t}, θ) | X_{2t})]²
  + E[E(Y_t − h(X_{1t}, θ) | X_{2t}) − λ(X_{2t})]².   (2.12)

Indeed, because for any value of θ one can define a function λ(·) that renders the second term of the right side of (2.12) equal to 0, the identification condition for the Euclidean parameter amounts to

h(X_{1t}, θ_1) − E⁰(h(X_{1t}, θ_1) | X_{2t}) = h(X_{1t}, θ⁰) − E⁰(h(X_{1t}, θ⁰) | X_{2t})  ⇒  θ_1 = θ⁰.

Given a guess λ_T^(1), a natural way to estimate θ is to consider the empirical counterpart θ_T^(2) of

θ̄(P⁰, λ) = arg max_{θ∈Θ} −E⁰[Y_t − h(X_{1t}, θ) − λ(X_{2t})]²,

with λ replaced by λ_T^(1). With this estimate θ_T^(2), one can get an improved guess, λ_T^(2), of the function λ by smoothing Y_t − h(X_{1t}, θ_T^(2)) on X_{2t}. We can continue this iterative smoothing process, thus obtaining an example of classical backfitting. If it exists, the limit of the backfitting algorithm can be explicitly characterized as the solution of the following system of equations:

λ̂_T = K[Y − h(X_1, θ̂_T)],
θ̂_T = arg min_{θ∈Θ} Σ_{t=1}^T [Y_t − h(X_{1t}, θ) − λ̂_T(X_{2t})]²,   (2.13)

where h(X_1, θ) = (h(X_{11}, θ), . . . , h(X_{1T}, θ))' and K is the smoothing matrix used to estimate the conditional expectation of Y − h(X_1, θ) given X_2. Such a fixed point may not exist in finite samples, but may exist only asymptotically when T goes to infinity. We address this issue later in our general theory (see Sec. 4); it is omitted here for the sake of expositional simplicity. Hence θ̂_T is obtained by solving the first-order conditions
Σ_{t=1}^T (∂h/∂θ)(X_{1t}, θ̂_T) [Y_t − h(X_{1t}, θ̂_T) − λ̂_T(X_{2t})] = 0,

which corresponds to the limit first-order conditions

E⁰[(∂h/∂θ)(X_{1t}, θ^(p+1)) (Y_t − h(X_{1t}, θ^(p+1)) − λ^(p)(X_{2t}))] = 0,   (2.14)

where the function λ^(p)(·) is defined by

λ^(p)(X_{2t}) = λ(θ^(p))(X_{2t}) = E⁰[Y_t − h(X_{1t}, θ^(p)) | X_{2t}].
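A minimal numerical sketch of this classical backfitting loop, assuming a Gaussian Nadaraya–Watson smoothing matrix for K and a made-up parametric part h(x, θ) = exp(θx) (neither choice is specified in the article; all numbers below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 400
X1 = rng.uniform(-2, 2, size=T)
X2 = rng.uniform(-2, 2, size=T)

# toy PPR model: h(x, theta) = exp(theta * x), lambda0(x) = sin(x)
theta_true = 0.7
Y = np.exp(theta_true * X1) + np.sin(X2) + 0.3 * rng.normal(size=T)

# Nadaraya-Watson smoothing matrix K on X2 (Gaussian kernel, fixed bandwidth)
h_bw = 0.3
W = np.exp(-0.5 * ((X2[:, None] - X2[None, :]) / h_bw) ** 2)
K = W / W.sum(axis=1, keepdims=True)

grid = np.linspace(-2.0, 2.0, 2001)           # crude grid search for the NLS step
theta = 0.0                                    # starting guess
for p in range(50):
    lam_hat = K @ (Y - np.exp(theta * X1))     # smooth the partial residuals on X2
    sse = [((Y - lam_hat - np.exp(g * X1)) ** 2).sum() for g in grid]
    theta_new = grid[int(np.argmin(sse))]      # NLS of Y - lam_hat on h(X1, .)
    if theta_new == theta:                     # fixed point on the grid reached
        break
    theta = theta_new

print(theta)
```

The two updates inside the loop are exactly the two equations of (2.13): the smoother step plays the role of λ̂_T = K[Y − h(X_1, θ̂_T)], and the grid minimization is the nonlinear least squares step.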

Therefore, the backfitting algorithm will possibly provide a consistent estimator of θ⁰ insofar as the following identification condition is fulfilled:

E⁰[(∂h/∂θ)(X_{1t}, θ_1) (Y_t − h(X_{1t}, θ_1) − E⁰[Y_t − h(X_{1t}, θ_1) | X_{2t}])] = 0  ⇒  θ_1 = θ⁰.

When maintaining the assumption that the first-order conditions characterize a unique maximizer, the backfitting identification condition coincides with the unique fixed-point condition of Assumption 3, where the function θ̄ is defined by

θ̄[P⁰; λ(θ_1)] = arg max_θ −E⁰[Y_t − h(X_{1t}, θ) − λ(θ_1)(X_{2t})]².

Using the model equations

Y_t = h(X_{1t}, θ⁰) + λ⁰(X_{2t}) + u_t  and  E⁰(u_t | X_t) = 0,

we rewrite this backfitting identification condition as:
Condition C1 (backfitting identification):

E⁰[(∂h/∂θ)(X_{1t}, θ_1) (h(X_{1t}, θ_1) − E⁰[h(X_{1t}, θ_1) | X_{2t}])]
  = E⁰[(∂h/∂θ)(X_{1t}, θ_1) (h(X_{1t}, θ⁰) − E⁰[h(X_{1t}, θ⁰) | X_{2t}])]  ⇒  θ_1 = θ⁰.
As already noted, the PPR model identification condition can be written in the following way:

Condition C2 (identification in the PPR model):

h(X_{1t}, θ_1) − E⁰(h(X_{1t}, θ_1) | X_{2t}) = h(X_{1t}, θ⁰) − E⁰(h(X_{1t}, θ⁰) | X_{2t})  ⇒  θ_1 = θ⁰.

Finally, we note that the basic identification condition

θ̄[P⁰; λ(θ⁰)] = θ⁰

is tantamount to the following standard identification condition in the "latent" regression model:

Condition C3 (identification in the latent regression model):

h(X_{1t}, θ_1) = h(X_{1t}, θ⁰)  ⇒  θ_1 = θ⁰.

It is clear that

C1 ⇒ C2 ⇒ C3.
As previously mentioned, the difficulty in PPR models appears when considering nonlinear regression functions h. If h is linear (the case of the partially linear regression model), then the conditions C1 and C2 are equivalent and mean that the residual variance Var⁰(X_{1t} − E[X_{1t} | X_{2t}]) is positive definite. C3 means that E⁰[X_{1t}X'_{1t}] is positive definite.
Let us now study to what extent, in the general PPR model, the backfitting identification condition C1 is more restrictive than the standard identification condition C2. Define

φ(X_t, Y_t, θ, λ(θ_1)) = (∂h/∂θ)(X_{1t}, θ) (Y_t − h(X_{1t}, θ) − E⁰[Y_t − h(X_{1t}, θ_1) | X_{2t}])

and assume that the function θ̄[P⁰; λ(·)] can also be defined as the implicit solution of

E⁰[φ(X_t, Y_t, θ̄[P⁰; λ(θ_1)], λ(θ_1))] = 0,   θ_1 ∈ Θ.

Assuming the needed regularity conditions, differentiate this identity with respect to θ_1, take θ_1 = θ⁰ (recall that θ̄[P⁰; λ(θ⁰)] = θ⁰), and deduce that

(∂θ̄/∂θ'_1)[P⁰; λ(θ⁰)] = M⁻¹N,   (2.15)

where

M = −E⁰[(∂φ/∂θ')(X_t, Y_t, θ, λ(θ⁰))]|_{θ=θ⁰} = E⁰[(∂h/∂θ)(X_{1t}, θ⁰)(∂h/∂θ')(X_{1t}, θ⁰)]   (2.16)

The sample counterpart of λ(θ) provides a root-T consistent estimator,

λ_T(θ) = (X'_2X_2)⁻¹X'_2(Y − X_1θ),

whereas the sample counterpart of Q_∞[θ; λ(θ_1)] is

Q_T[θ; λ(θ_1)] = −(1/T) Σ_{t=1}^T [Y_t − X'_{1t}θ − X'_{2t}λ(θ_1)]².

It is mentioned just after (2.9): "Note that for the sake of notational simplicity we always denote by Q_T[θ; λ(θ_1)] the finite sample criterion while, in some cases, λ(θ_1) should be replaced by a consistent estimator λ_T(θ_1). In other words, by a slight abuse, all the data dependence is summarized by the notation Q_T." This abuse of notation is innocuous insofar as it does not conceal the restrictive content of the assumptions maintained about the asymptotic behavior of the function Q_T. These assumptions are:

Assumption 1 (for consistency of the backfitting estimator). Uniform convergence for θ, θ_1 ∈ Θ of Q_T[θ; λ(θ_1)] toward Q_∞[θ; λ(θ_1)].

This basically means two things: (1) uniform convergence for all θ, λ of a sample counterpart of Q_∞[θ; λ] and (2) uniform convergence for all θ_1 of λ_T(θ_1) toward λ(θ_1).

Assumption 5 (for deriving the root-T asymptotic probability distribution of the backfitting estimator). Root-T asymptotic normality of ∂Q_T/∂θ[θ; λ(θ⁰)]|_{θ=θ⁰}.

This comes from two primitive assumptions: (1) root-T asymptotic normality of a sample counterpart of ∂Q_∞/∂θ[θ; λ(θ⁰)]|_{θ=θ⁰} and (2) root-T asymptotic normality of the estimator λ_T(θ⁰) of λ(θ⁰).

Of course, the expression of the matrix B(θ⁰) must take into account these two sources of randomness whenever λ(θ⁰) must be estimated. This point can be easily checked in the Frisch–Waugh example.
b. In the classical backfitting example, the nonparametric part of the model will introduce asymptotic normality with nonparametric rates that are beyond the scope of the article. This is the reason that Section 4.1, about consistency of the backfitting estimator, states "for a general parameter space" (see Prop. 4.2), whereas Section 4.2, about asymptotic probability distributions, focuses on a parameter space "assumed to be a subset of some Euclidean space R^p." In other words, the classical backfitting example must be considered for identification and consistency issues, but not for asymptotic probability distributions. It plays a crucial role in the article, however, because it has motivated the backfitting terminology.

In this example, when the nuisance part of the regression model is really nonparametric, λ(θ) is a functional defined by (2.14):

λ(θ)(X_{2t}) = E[Y_t − h(X_{1t}, θ) | X_{2t}].

Of course, this functional should be estimated with a nonparametric rate of convergence from a smoother K as defined in (2.13):

λ_T = K[Y − h(X_1, θ_T)].

X. Chen's words of caution regarding the additional issues that should be addressed for extending the article to functional estimation are appreciated. The idea of recursive nonparametric learning, following Chen and White (1998), is also appealing.
REPLY TO M. CHERNOV

M. Chernov's discussion interestingly focuses on the concept of nuisance parameters and gives us the opportunity to clarify some notational issues that seem to be responsible for the trouble of some discussants (see also the replies to Q. Dai and J. Pan). Just to give a funny example of the difficulty of remaining true to constant notation when several different issues are addressed in the same article, we note that M. Chernov starts his discussion with a notation that contradicts that adopted in the article: "Pastorello, Patilea, and Renault (PPR henceforth)," whereas we write in Section 2.2, "a general partially parametric regression model (PPR hereafter)"; X. Chen's discussion remains true to our notation.
More seriously, M. Chernov's introductory reference to what he calls the "fundamental theorem of asset pricing" [his formula (1)] allows him to classify the issues of interest but, unfortunately, with some misleading notation. We actually write in the Introduction that "the joint conditional probability distribution of (m_{t+1}, g_{t+1}) given ... the relevant conditioning information I_t" involves the "value of a vector λ of q unknown parameters" in such a way that the fundamental theorem of asset pricing,

Y_t = E[m_{t+1} · g_{t+1} | I_t],

is encapsulated in a pricing relationship,

Y_t = g(Y*_t, λ),

where Y_t is a vector of "J discretely observed asset prices," whereas "Y*_t is a vector of possibly latent state variables which summarize the conditioning information." Moreover, the probability distribution of the variables Y*_t is assumed to be described by a vector θ of p unknown parameters in such a way that λ appears to be a given function of θ:

λ = λ(θ).

It is important to realize at this stage that:
1. The variables Y*_t are "possibly latent," but nothing precludes that some of them are observed (as is the case in Sec. 6) to ensure identification.

2. Because the variables of interest include a pricing kernel, nothing precludes that some components of the vector θ of statistical parameters are related to preferences, in particular, to the risk premium. Such preference parameters are actually relevant statistical characteristics of the pricing kernel dynamics.

Indeed, this convention is followed in the application of Section 6, where it is announced, from the beginning, that although "the pricing kernel is not explicitly included in the list of state variables ... the dynamics of the latent risk factors X_t only identify a set θ_1 of unknown statistical parameters while the risk premium parameters θ_2 must be added to define the complete

vector θ of structural parameters of interest for asset pricing: θ = [θ'_1, θ'_2]'."
Thus, preference (risk premium) parameters are definitely not considered as nuisance parameters, but as part of the vector θ of structural parameters of interest. The unfortunate confusion of some readers seems to come from the conflict between two common but contradictory notational uses:

1. λ is a standard notation in statistical theory for nuisance parameters. This is the reason that, in the spirit of backfitting, we write Y_t = g(Y*_t, λ) and, as a consequence, Q_T[θ; λ]. Nonadaptivity in statistical estimation is produced by some perverse interactions between structural parameters θ and nuisance parameters λ, which precludes consistent estimation of the former when no consistent estimator of the latter is available. Backfitting, as advocated in the article, appears to be well suited when nuisance parameters are defined as a known function of structural parameters. In the financial applications we have in mind, we actually have λ(θ) = θ or part of θ. Therefore, the distinction between parameters of interest and nuisance parameters does not come from their actual value but from their role inside the criterion function used for M estimation. In this respect, it is true that "PPR simply merge λ_p and θ," but it is not true that "this clearly removes the nonadaptivity problem." We hoped that the introductory examples would help the reader to realize that the same unknown parameter value could be simultaneously a nuisance θ_1 [e.g., to recover Y*_t(θ_1) in the latent regression example] and an object of interest [e.g., estimated by nonlinear least squares of Y*_t(θ_1) on X_t].
2. Unfortunately, and we apologize for this, we have also followed, for the specific term structure example of Section 6, another notational use. This use is also common, but more in the finance literature: The risk premium parameter is denoted by λ. This is the reason that we complete the definitions (6.15) and (6.16) of the models of interest with the notation

θ_1 = (k, c, σ)'  and  θ_2 = λ.

We did not realize that this would introduce some confusion with the general notation λ(·) for the function defining nuisance parameters, because throughout Section 6 this function is nothing but the identity function,

λ(θ) = θ = [θ'_1, θ'_2]',

as is apparent from the beginning [see (6.1) and (6.2)]. Note that the measurement equation actually depends only on a not one-to-one function of θ defined by the risk-neutral parameters, but there is no need to introduce a specific notation for this; it is implicitly encapsulated in the definition of the function h in (6.1).
We hope that this reply will now make clear that it is not fair to say that "at the implementation stage, PPR are confronted with two problems, which were not addressed in their theoretical developments."

First, about blurring "the separation between statistical and risk-premium parameters raising the question of whether such a separation was necessary in the first place," the answer is that this separation was simply not made in the first place.
Second, it is right to say that "the nuisance parameter λ_n has no explicit dependence on θ" or, more precisely, that there is no adaptivity problem with respect to this parameter, called η in Section 6 of the article. But the case of such a nuisance parameter without an adaptivity problem can easily be accommodated within the general framework. When η is replaced by a sample counterpart η_T, this defines a criterion Q_T that converges toward some criterion Q_∞. Because there is no adaptivity problem, the resulting estimator θ_T of the structural parameters θ will be consistent, irrespective of the value of η_∞ = plim η_T. Of course Q_∞, and in turn the asymptotic variance of θ_T, will depend on η_∞. This is the reason that, insofar as there is some hope that the nuisance part of the model is correctly specified with a true unknown value η⁰, it may make sense to resort to what we call "extended backfitting." The idea is to introduce within the backfitting algorithm a flavor of quasi-generalized least squares to ensure η_∞ = η⁰. But, contrary to what the discussant suggests, we are not worried by the fact that the "convergence criterion has poor properties in λ_n." Even if these properties were good, we would likely prefer extended backfitting, because there is no reason to include in a backfitting algorithm some parameters that do not raise any adaptivity issue.
Let us just briey answer two additional interesting questions of M. Chernov’s discussion. “The backŽtting procedure effec-tively allows to operate on the likelihood of the state variables rather than observables, which allows to avoid computation of the Jacobian” andalso to avoid maximizationwith respect to the value of the implied states. The latter point should not be for-gotten because, as mentioned in the article, this allows us, for instance, to use Aït-Sahalia’s likelihood expansions, whereas they diverge when one considers that implied states depend on unknown parameter values. We acknowledge that the Žrst point may be a matter of taste [the discussant considers that “once one hasg.¢/, computing Jacobian is not difŽcult”] but we ob-serve that, for instance, Pan (2003) has also renounced comput-ing some Jacobians needed to deŽne the optimal instruments. It might be worth remembering that these Jacobians should be involved in a sophisticated optimization procedure (and not just computed once) and that a proxy of the derivative of the func-tiongis much less stable than the proxy of the functiongitself. We do like the discussant’s remark about the continuously updated GMM of Hansen, Heaton, and Yaron (1996). Because it is an excellent illustration of cases without an adaptivity prob-lem, it was actually an introductory example in some trans-parencies used to present this article in various seminars. Even though we Žnally chose not to include it in the article for length reasons, we are glad to briey summarize it in this reply. Let us consider the following asymptotic criterion function in the context of GMM with standard assumptions:

Q1[µ ; ¸.µ1/]D ¡E[Ã .Yt; µ /0]W[¸.µ1/]E[Ã .Yt; µ /];

where W[¸.µ1/] is deŽned as the optimal weighting matrix whenµ1is the true unknown value of the parameters. Then sev-eral questions arise and should not be confused:

Question 1. Is there an adaptivity problem? In other words, is it the case that the use of a wrong value of θ_1 may prevent us from getting a consistent estimator of θ? The answer is "no." One way to see this is to check that the cross-derivative ∂²Q_∞/∂θ∂θ_1 is 0 when θ = θ⁰, irrespective of the value of θ_1.
Question 2. However, θ_1 is "a nuisance parameter because it determines the distribution of the estimates." This is the reason that people usually do iterated GMM (in this respect, a particular case of backfitting) to be able to replace θ_1 by a consistent estimator of θ⁰, which in turn ensures getting an asymptotically efficient estimator of θ. But note that, because there is no adaptivity problem, there is no need to iterate a large number of times as for common backfitting; a simple two-step GMM estimator is asymptotically efficient and asymptotically equivalent to the oracle estimator that would be obtained with θ_1 = θ⁰.
Question 3. Is it legitimate to maximize directly with respect to θ the objective function Q_∞[θ; λ(θ)]? The answer is "yes," and this actually corresponds to continuously updated GMM as proposed by Hansen et al. (1996). The reason it works is a property even stronger than adaptivity: The derivative ∂Q_∞/∂θ_1 is 0 when θ = θ⁰, irrespective of the value of θ_1.
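The distinction among these questions can be checked in a small simulation. The overidentified moment model below (two moments for a scalar mean parameter) is made up for illustration and does not come from the article:

```python
import numpy as np

rng = np.random.default_rng(2)
theta_true = 1.5
Y = rng.normal(theta_true, 1.0, size=2000)   # toy data: mean theta, known variance 1

def psi(theta):
    # two moment conditions overidentifying the scalar theta:
    # E[Y - theta] = 0 and E[Y^2 - theta^2 - 1] = 0
    return np.column_stack([Y - theta, Y ** 2 - theta ** 2 - 1.0])

def gbar(theta):
    return psi(theta).mean(axis=0)

def S(theta):
    p = psi(theta)
    return (p.T @ p) / len(Y)                # moment covariance evaluated at theta

def Q(theta, W):
    g = gbar(theta)
    return g @ W @ g                         # (minus the) GMM criterion

grid = np.linspace(0.5, 2.5, 4001)           # crude grid search for the minimizers

# two-step GMM: identity weight first, then the optimal weight held fixed
step1 = grid[int(np.argmin([Q(t, np.eye(2)) for t in grid]))]
W_opt = np.linalg.inv(S(step1))
step2 = grid[int(np.argmin([Q(t, W_opt) for t in grid]))]

# continuously updated GMM: the weighting matrix moves with theta
cue = grid[int(np.argmin([Q(t, np.linalg.inv(S(t))) for t in grid]))]

print(step1, step2, cue)
```

Consistent with Questions 1-3, the first-step estimator is already consistent despite the suboptimal weight, and the two-step and continuously updated estimators are close to each other.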

REPLY TO Q. DAI

Q. Dai presents an interesting decomposition of the likelihood, which is relevant for understanding the informational content of the different blocks. However, as already explained in our reply to M. Chernov, if Λ is a "vector of parameters typically associated with the market prices of risk," it is definitely not to "take a short cut" to consider, with our notation, that "there exists a deterministic relationship between Λ and θ."

Once more, we consider for all financial applications that our λ(θ), corresponding to (1.2)/(1.3) and not only to "market prices of risk," is such that λ(θ) = θ = (θ_1, θ_2), where θ_2, the vector of preference parameters, corresponds to what Q. Dai denotes by Λ. With such notation, and without any "short cut," it is true that Λ is a known deterministic function of θ. We apologize if, by saying in the Introduction that λ is the vector of "pricing parameters," we have been responsible for some confusion. We used this terminology to stress that the knowledge of λ is needed to compute the current asset prices from the current value of the state variables. If you think, for instance, of a Heston kind of option pricing formula with stochastic volatility, the option price is a known function of the underlying spot volatility, but this function depends on all the risk-neutral parameters and not only on the market price of risk.

In the application of Section 6, the identification restriction for θ_2 is that some yields are observed without measurement errors. With respect to Q. Dai's comment on these identification restrictions (see, in particular, his footnote 2), several remarks are in order. We thank him for stressing that such restrictions are crucial in identifying some parameters. However, we would not say "zero-mean restriction" but rather "no measurement error" assumption, because the measurement errors are in any case zero mean. If a measurement error might have a nonzero mean, nothing could be done without additional structure. Moreover, it is worth remembering that one can imagine some identifying restrictions more general than the "no measurement error" one. First, one can consider a full vector of measurement errors but with a singular covariance matrix. Second, even without singularity, the mere fact that the measurement errors are assumed to be i.i.d. paves the way for dynamic identification, as a Kalman filter does. Finally, one can even imagine some specific dynamic structure on the measurement errors. This should be an even more favorable situation to enhance the advantages of latent backfitting.
REPLY TO G. DURHAM AND J. GEWEKE

We gladly welcome Durham and Geweke's comments, which, through some neat geometric interpretations, have improved our own understanding of the issue of nonadaptivity and of the backfitting kind of solution to this issue as well.
Let us first focus on the specific issue of implied states GMM, corresponding to a criterion function

Q_T[θ; λ] = −[(1/T) Σ_{t=1}^T φ(Y_t, θ, λ)]' W_T [(1/T) Σ_{t=1}^T φ(Y_t, θ, λ)].

When Pan (2002) considered implied states GMM, the unconditional moment restrictions φ(Y_t, θ, λ) were built with "optimal" instruments and were just identified by construction. Hence, for all practical purposes, we can assume that, for any given λ, the maximum of Q_T[θ; λ] is the value 0; θ_T(λ) is defined as a solution of

(1/T) Σ_{t=1}^T φ(Y_t, θ_T(λ), λ) = 0.
Whereas the IS-GMM of Pan (2002) estimates θ by the solution θ_T^IS of

(1/T) Σ_{t=1}^T φ(Y_t, θ_T^IS, λ(θ_T^IS)) = 0,

implied states backfitting estimates θ by the limit θ_T of the iteration

(1/T) Σ_{t=1}^T φ(Y_t, θ^(p+1), λ(θ^(p))) = 0.

Therefore, we maintain that "typically, if the function θ_T(λ(·)) admits a unique fixed point, the IS-GMM estimator coincides with the limit of iterations." We do not understand the discussants' point "if that were the case then these estimators would have the same asymptotic variance, but the article demonstrates that these variances are different." This point may actually be well taken, not for the specific case of IS-GMM but for the general issue of comparison between implied states backfitting and the one-step estimator (see the second part of this reply): arg max_θ Q_T[θ; λ(θ)].
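In the just-identified case, the coincidence of the IS-GMM solution and the backfitting limit can be verified on a deliberately simple invented model (none of it is from the article): latent states Y*_t ~ N(θ, 1), a pricing map Y_t = Y*_t + λ(θ) with a made-up link λ(θ) = θ/2, and the single latent moment E[Y*_t − θ] = 0.

```python
import numpy as np

rng = np.random.default_rng(3)
theta_true = 2.0
lam = lambda theta: theta / 2.0              # invented link lambda(theta)

Ystar = rng.normal(theta_true, 1.0, size=1000)
Y = Ystar + lam(theta_true)                  # observed "prices"

def theta_T(lam_val):
    # for a given lambda, invert the pricing map to get implied states,
    # then solve the just-identified latent moment mean(Ystar) - theta = 0
    implied = Y - lam_val
    return implied.mean()

# implied-states backfitting: theta^(p+1) = theta_T(lambda(theta^(p)))
theta = 0.0
for p in range(100):
    theta = theta_T(lam(theta))

# IS-GMM solves theta = theta_T(lambda(theta)) directly; here the equation
# theta = mean(Y) - theta/2 is linear, so theta_IS = (2/3) * mean(Y)
theta_is = (2.0 / 3.0) * Y.mean()
print(theta, theta_is)
```

The iteration map θ → mean(Y) − θ/2 is a contraction with constant 1/2, so the backfitting limit and the IS-GMM solution are numerically identical, as claimed for the just-identified case.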

In the particular case of (just identified) IS-GMM, this distinction is irrelevant because, with respect to the discussants' geometric interpretation, the fact that

∀λ,  max_θ Q_T[θ; λ] = Q_T[θ(λ); λ] = 0

implies that, for any horizontal line, tangency is obtained with the same level curve. This is not consistent with the case portrayed in the discussants' Figure 1.
Although this case has not been studied in Pan's (2002) account of IS-GMM, the article also sheds some light on the case with overidentification. It is explicitly acknowledged, however, that this issue is only addressed under the restrictive Assumption 4. Owing to this assumption, it is actually possible to address the discussants' worries about the comparison of variances between IS-GMM and backfitting, respectively. Whereas the former will solve

max_θ −[(1/T) Σ_{t=1}^T φ(Y_t, θ, λ(θ))]' W_T [(1/T) Σ_{t=1}^T φ(Y_t, θ, λ(θ))],

the latter will solve

max_θ −[(1/T) Σ_{t=1}^T φ(Y_t, θ, λ(θ^(p(T))))]' W_T [(1/T) Σ_{t=1}^T φ(Y_t, θ, λ(θ^(p(T))))].

Therefore, it is clear that, even for large T and p(T), these two estimators correspond to two different selections of the estimating equations and are not numerically equal in general (in contrast to the just identified case). However, it is worth noticing that the consequence (3.5) of Assumption 4 in the article precisely states that, asymptotically, the two procedures will select (see the first-order conditions) the same linear combinations of estimating equations. In other words, the two estimators, IS-GMM and IS-backfitting, remain asymptotically equivalent in the overidentified case. We thank the discussants for giving us the opportunity to make this point more explicit than it was in the article.

One way to understand the discussants' point about the possible difference, even asymptotically, between what they denote θ_T^IS and the backfitting estimator θ_T, is to imagine that the discussants use the notation θ_T^IS, as well as the terminology implied states GMM, for the general estimation principle

θ_T^IS = arg max_θ Q_T[θ; λ(θ)].

By contrast, we have reserved in the article the terminology "implied states GMM" for Pan's GMM context, and this is the reason we have claimed that it typically coincides with the backfitting estimator for an infinite number of iterations. But the discussants are certainly right to claim that θ_T^IS = arg max_θ Q_T[θ; λ(θ)] and the backfitting estimator θ_T = arg max_θ Q_T[θ; λ(θ^(p(T)))] are, in general, different. This is actually the main focus of interest of this article. For instance, in the likelihood context, depending on the way the function Q is defined, θ_T^IS may be either the maximum likelihood estimator or, if the Jacobian part is not taken into account, a wrong estimator that does not converge in general toward the true unknown value of θ. This is the reason that, if we agree to say that (with the discussants' notation) there exist some situations where "differences between θ_T^IS and θ_T could be extremely great," we do not think that it is necessarily θ_T that "could be misleading."

The discussants' geometric interpretation illustrates the difference. It also shows, in a way familiar to economists, that the maintained contraction mapping assumption is akin to a comparison of the slopes of the functions λ(θ) and θ(λ). To say that the slope of the latter must be larger than the slope of the former is a neat way to explain what we pointed out in the article as a comment on (4.6): "the occurrence of θ in the latent model (the transition equation of the state variables) carries more information about the unknown parameters of interest than their occurrence in the measurement equation."
Of course, when the functions λ(θ) and θ(λ) are nonlinear, the contraction mapping property is only local, and this may create some pitfalls for backfitting. It is actually a general issue for the method of successive approximations: Its reliability highly depends on the choice of the starting point. As usual, it may be relevant to hedge against this risk by choosing as a starting point a value of θ suggested by some simpler estimation procedure. For instance, in the spirit of Pastorello, Renault, and Touzi (2000), Black–Scholes implied volatilities and the associated estimators of the parameters of the volatility process may provide a useful starting point for backfitting estimation of option pricing models with stochastic volatility.
REPLY TO M. JOHANNES AND N. POLSON

We are grateful to M. Johannes and N. Polson for having carefully read our article and made an insightful account of the inference issue, which is our main focus of interest. It is actually quite comforting to see that our distinction between two subsets of parameters, cleverly denoted by Θ_P and Θ_Q in their discussion (whereas we use in the article the less transparent notation θ_1 and θ_2), has been well understood. It may be that reading this discussion is the best way to clarify some discussants' confusion between "pricing parameters" λ(θ) = θ (or part of it), with θ = (θ_1, θ_2), and the market prices of risk θ_2.
Indeed, we agree with most of Johannes and Polson's discussion. We do think that our "approach is similar in spirit to MCMC algorithms," not only because "it is an iterative algorithm" but also, more deeply, because the mathematical foundations of the convergence argument for the iterative procedures are often very similar. To see this more clearly, it may be better to consider a class of models where, by contrast with most asset pricing applications, the conditional probability distribution of Y* given Y and θ is not degenerate. Let us consider the latent regression model example of Section 2.3 of the article. Then we claim that the iterative algorithm suggested by the "generalized residuals" of Gourieroux, Monfort, Renault, and Trognon (1987) (see also the "simulated residuals" article in the same issue of Journal of Econometrics) is quite similar in spirit to the MCMC approach to PROBIT and TOBIT kinds of models.
It is also a well-taken point to stress that the two approaches are "different along a number of dimensions." First, it is perfectly right to say that, when updating the structural parameters, MCMC uses the information in both the vector of states and all of the observed asset prices, whereas implied states backfitting focuses on the transition equation of the state variables; basically, in the M stage, neither option prices nor yields are directly used. The stochastic volatility model considered by the discussants is illuminating in this respect. However, we think that, at least in some circumstances, this loss in efficiency may be acceptable for several reasons.
First, as usual in statistics, we may imagine a trade-off in efficiency versus robustness to misspecification. Second, we have performed in Pastorello et al. (2000) some Monte Carlo comparisons between the informational content about volatility structural parameters of Black–Scholes implied volatilities and underlying asset price series, respectively. Because, on the one hand, the estimation procedure based on Black–Scholes implied volatilities was entirely based on the volatility transition equation and, on the other hand, the estimators based on returns time series were 10 times less accurate than the volatility-based ones, we wonder whether there is so much information to lose in focusing on the transition equation of the volatility process. The term structure example in this article leads to the same type of conclusion. Finally, if we do think, as the discussants do, that "the real challenge … is to provide computationally attractive tools that solves problems of interest for empirical asset pricers," we hope that the backfitting approach is in some respect closer to the practitioner's approach of computing implied parameters and, for this reason, a bit more user friendly in terms of interpretation than MCMC. This may justify paying a cost in terms of relative efficiency.

It is also well taken to notice that the backfitting approach "does not appear to be as general as MCMC in terms of estimating latent states." Indeed, the discussants raise an interesting issue about jumps that are just "integrated out of the option price." It appears at first sight that this issue is even more general and is about any factor that appears as a sequence of independent shocks rather than through a continuous-time Markov structure. We have no complete answer at this stage for this intriguing question.

REPLY TO J. PAN

J. Pan introduces her interesting set of comments as revolving "around one very basic question: as a practicing econometrician, under what situation will I choose the proposed latent backfitting method over existing approaches?" We do think that this is a nice way to address the issue, but under one condition: One should start from a clear definition of what the proposed backfitting is and what the existing approaches were before such a backfitting strategy had been proposed, first in the likelihood context of Renault and Touzi (1996) and second in the very general context of extremum estimators of Patilea and Renault (1997). In this respect, we will argue in this reply that, rather than saying "the idea of IS-GMM comes from the observation that for any given set of θ and λ, we can always try to invert X_t from Y_t using the pricing relation," it would be more appropriate to say that IS-GMM interestingly specializes to a GMM context the general idea of latent backfitting, which is precisely "the observation that for any given set of θ1 and θ2, we can always try to invert X_t from Y_t using the pricing relation."

Note, as a preliminary remark, that we have changed the discussant's notation of the unknown parameters, to avoid the confusion, already mentioned in the replies to Chernov and Dai, between nuisance parameters and market prices of risk. We found it quite puzzling in this respect that J. Pan considers that the fact that the set of parameters λ is a function of θ "excludes most of the motivating examples discussed in the paper." As already explained in the previous replies, most of the motivating examples in asset pricing, including Pan's (2002) option pricing model, are such that the set of parameters λ, as defined in the main part of the article starting with formula (1.2) in the Introduction, coincides with θ, or at least with the function λ(θ) defining the risk-neutral parameters. In this respect, the discussant is right "to give the authors the benefit of doubt and assume that their θ is big enough to include both" what she calls "the θ and λ" and what we prefer to call "the θ1 and θ2." Indeed, we thought it was clear that in most asset pricing models, the relation between observed prices and state variables depends not only on the market prices of risk but also on the dynamics of the state variables.

The specialization of latent backfitting, as put forward by Pan (2002), to a GMM context is actually illuminating for two reasons. First, as explained in the reply to Durham and Geweke, GMM is a specific case where, by contrast with the general case, it is not wrong to perform estimation in one step as

    θ_T^IS = Arg Max_θ Q_T[θ, λ(θ)]

instead of the backfitting estimator

    θ_T = Arg Max_θ Q_T[θ, λ(θ^(p(T)))].

This is the reason that Pan (2002) proposes to solve

    (1/T) Σ_{t=1}^{T} φ(Y_t, θ_T^IS, λ(θ_T^IS)) = 0,

whereas implied states backfitting in this specific context proposes more generally to estimate θ by the solution θ_T of

    (1/T) Σ_{t=1}^{T} φ(Y_t, θ_T, λ(θ^(p(T)))) = 0

for a sufficiently large number p(T) of iterations. In this respect, a constructive discussion should not oppose the two strategies but, on the contrary, acknowledge that the former is a particular case of the latter.
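The contrast between the one-step and the iterated estimators can be sketched numerically. The following toy example is hypothetical (the moment function phi, the specification λ(θ) = θ, and all names are chosen only so that the fixed point is available in closed form): it iterates the M step with λ held at the previous parameter value and checks that, when the backfitting map is contracting, the limit of the iterations coincides with the one-step IS-GMM solution.

```python
import numpy as np

# Hypothetical toy sketch of the two estimators above, with
#   phi(y, theta, lam) = y - theta - 0.5 * lam   and   lambda(theta) = theta,
# so that both solutions are available in closed form.

rng = np.random.default_rng(0)
Y = 3.0 + rng.normal(size=500)          # observed sample

def m_step(Y, lam):
    # Solve (1/T) sum_t phi(Y_t, theta, lam) = 0 in theta, with lam held fixed.
    return Y.mean() - 0.5 * lam

# One-step IS-GMM: solve (1/T) sum_t phi(Y_t, theta, lambda(theta)) = 0,
# i.e. mean(Y) - 1.5 * theta = 0, directly in theta.
theta_is = Y.mean() / 1.5

# Implied states backfitting: theta^(p+1) solves the moment condition with
# lambda evaluated at the previous iterate theta^(p).
theta = 0.0                             # arbitrary starting value
for p in range(100):
    theta_new = m_step(Y, theta)        # lambda(theta) = theta here
    if abs(theta_new - theta) < 1e-12:  # fixed point reached
        break
    theta = theta_new

# The backfitting map theta -> mean(Y) - 0.5 * theta is a contraction
# (slope -0.5), so the iterates converge to the unique fixed point,
# which is the IS-GMM solution.
print(abs(theta - theta_is) < 1e-9)     # True
```

Here the map contracts at rate 0.5, so a few dozen iterations suffice for machine precision; with a non-contracting map the fixed point may exist without the iterations finding it.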

The second reason the specialization to GMM is illuminating is that there is, however, one specific situation where "the latent backfitting method does not converge to IS-GMM, which does not suffer from the identification problem of θ2," precisely because it follows a one-step approach. More precisely, when the latent moment conditions

    E[ψ(Y_t*, θ1)] = 0

are only able to identify θ1, it may be the case that, after computation of implied states, the resulting moment conditions

    E[ψ(g^{-1}(Y_t, θ), θ1)] = 0

are able to identify not only θ1 but also θ2, such that θ = (θ1, θ2).

Pan has to be congratulated for pointing out this special case in her discussion. In such a setting, θ2 typically corresponds to market prices of risk, and implied states backfitting will not work because θ_T(θ) will not depend on θ2. This is the reason we have written that it is only when "the function θ_T(λ(·)) admits a unique fixed point" that we are sure that "the IS-GMM estimator coincides with the limit of iterations" of backfitting. Note, however, that the special case where θ_T(θ) does not depend on θ2 and, therefore, does not admit a unique fixed point should not be overstated. The term structure application, which was performed in Section 6, shows that there are natural circumstances where even the market prices of risk are identified by the latent moment conditions, because some observed prices are included in the list of state variables.
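The degenerate case just described, where the backfitting map leaves θ2 untouched, can be illustrated with a deliberately simple sketch (the location model, the moment function psi, and all names below are hypothetical): the latent moment identifies only θ1, so the M step passes θ2 through unchanged and the iteration stalls at whatever starting value was chosen.

```python
import numpy as np

# Hypothetical sketch of the degenerate case above: a location model in which
# the latent moment psi(y, theta1) = y - theta1 identifies only theta1, while
# the pricing map g(y*, theta) = y* + theta2 makes the implied state
# g^{-1}(Y_t, theta) = Y_t - theta2 depend on theta2.

rng = np.random.default_rng(1)
theta1_true, theta2_true = 2.0, 1.5
Y = theta1_true + rng.normal(size=400) + theta2_true   # Y_t = Y_t* + theta2

def backfit_step(theta1, theta2):
    # M step: solve (1/T) sum_t psi(Y_t - theta2, theta1) = 0 for theta1.
    # Nothing in the latent moment updates theta2: it is passed through.
    return Y.mean() - theta2, theta2

t1, t2 = 0.0, 0.0                       # arbitrary starting values
for _ in range(20):
    t1, t2 = backfit_step(t1, t2)

# theta2 never moved, so every starting value yields its own "fixed point"
# and theta = (theta1, theta2) is not identified by the iteration; t1 absorbs
# the neglected theta2 and is biased (close to 3.5 rather than 2.0).
print(t2)                               # 0.0
```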

To conclude, we do not consider that it is appropriate to oppose IS-GMM and IS-backfitting. The former is basically a neat specialization of the latter in a GMM context. It is always possible to interpret IS-GMM as looking for the fixed point that IS-backfitting is tracking by successive approximations. Of course, one can always imagine situations where the fixed point exists but backfitting will fail because the function is not contracting. But we would surely not say that "the desired fixed point is ever so eluding when not much pressure is enforced." As explained in the article [see formula (3.6) and related comments], when Pan (2002) says "the efficiency loss of this optimal-instrument scheme is limited in that …," the only limit to this efficiency loss is actually characterized by the allegedly eluding contracting feature of the backfitting function.

REPLY TO R. P. SHERMAN

We are glad to welcome R. Sherman's comment. In an independent and simultaneous work, Dominitz and Sherman considered a closely related framework and provided some deep insights for the case of semiparametric single-index models for binary dependent variables. Their contribution is closer to the original EM paradigm. At each iteration they perform a genuine E step; that is, an expectation of the sample criterion over unobserved variables given observed variables and current parameter values is computed. In this respect, our framework is more general because it does not specify a particular way to recover a guess about the latent variables for a given value of the parameters. In particular, in many applications, such as empirical finance and applied game theory, the relationship between the latent and observable worlds is one to one. This case has some specific features (identification, information, …), which are the main focus of interest of our article.

Imposing conditions for asymptotic results on the sample criterion Q_T and its first- and second-order derivatives, rather than on the function θ_T (in general, implicit) and its first derivative, is a matter of convenience given the application at hand. In some cases the former is easier, whereas in other cases the latter is more appealing.

Moreover, writing our conditions in terms of Q_T entails that we distinguish between what is needed for identification (unique fixed point), for consistency of the proposed estimator (contraction mapping), and for the asymptotic distributional theory [‖Σ(θ⁰, θ⁰)^{-1} H(θ⁰)‖ < 1]. The latter inequality allows us to draw a direct relationship with the statistical literature on nonadaptivity. It allows us to compare directly the asymptotic properties of various competing estimators (see, e.g., the previous discussion about continuously updated GMM or our discussion in the article about implied states GMM), which are not, in general, defined through the function θ_T.
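The inequality ‖Σ(θ⁰, θ⁰)^{-1} H(θ⁰)‖ < 1 is easy to check numerically once the two derivative blocks have been estimated. A minimal sketch, with made-up matrices standing in for the article's Σ and H:

```python
import numpy as np

# Numerical check of the contraction condition discussed above. The matrices
# are made-up stand-ins: in practice, Sigma and H would be estimated
# second-derivative blocks of the sample criterion Q_T at theta0.
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.5]])          # stand-in for Sigma(theta0, theta0)
H = np.array([[0.4, 0.1],
              [0.0, 0.5]])              # stand-in for H(theta0)

M = np.linalg.solve(Sigma, H)           # Sigma^{-1} H, without explicit inverse
rho = np.linalg.norm(M, 2)              # spectral norm (largest singular value)
print(rho < 1.0)                        # True: contraction holds here
```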

REPLY TO C. A. SIMS

We are grateful to C. Sims for the set of comments on our work that he proposes under the title "the paper's ideas are not inherently non-Bayesian and not an alternative to Bayesian approaches." It was quite exciting for us to see that our approach could be reconsidered from a Bayesian viewpoint and seems to deal with a type of modeling situation in which standard Gibbs sampling might be inefficient. The tone of Sims' comments is more critical as regards the numerical issues and the term structure application.

As already acknowledged in our reply to Chernov's comments, it is always partially a matter of taste to decide whether some amount of computational work is a daunting task or not, and to what extent one is ready to accept some efficiency loss to look for more user-friendly procedures. Let us just stress in this respect that when resorting to the latent likelihood, owing to backfitting, allows one to work with computations available in closed form, there is a benefit, at least in terms of interpretation. Moreover, as is manifest with the example of Aït-Sahalia's likelihood expansions, there is a cost to maximizing a likelihood that involves "unnatural" occurrences of the unknown parameters.

About the term structure application, we actually share most of the discussant's sources of discomfort. But we do think that this "applies to all factor pricing model of this sort, not just to this paper." For instance, we acknowledge in the article that the results are sensitive to which K variables or linear combinations of variables are assumed to be measured without error. However, we do know that this problem would be at least as detrimental with standard maximum likelihood. Moreover, the idea of backfitting could accommodate an iterative search for the "least principal components." Last but not least, the partially semiparametric feature of what we called "extended backfitting" should make it robust, by contrast with standard maximum likelihood, with respect to some specification error in the disturbance process. Unless we are missing something important, we do think that this argument makes sense, even without Monte Carlo evidence.

Finally, we fully agree that more work should be done to say "how is it expected that a financial decision maker should use these models," if the "error" has "to be interpreted as reflecting failure of no-arbitrage conditions." Renault (1997) devoted a complete invited lecture at the 1995 World Congress of the Econometric Society to "Econometric Models of Option Pricing Errors," precisely because we felt uncomfortable with the way arbitrage-based option pricing models were, at the time of this lecture, poorly accommodated to statistical inference. Of course, financial econometrics has made some significant progress within the last 10 years, but the way estimation errors and model uncertainty should be taken into account for financial decision making is still a wide-open issue.