
Journal of Business & Economic Statistics

ISSN: 0735-0015 (Print) 1537-2707 (Online) Journal homepage: http://www.tandfonline.com/loi/ubes20

A Penalty Function Approach to Bias Reduction in Nonlinear Panel Models with Fixed Effects

C. Alan Bester & Christian Hansen

To cite this article: C. Alan Bester & Christian Hansen (2009) A Penalty Function Approach to Bias Reduction in Nonlinear Panel Models with Fixed Effects, Journal of Business & Economic Statistics, 27:2, 131-148, DOI: 10.1198/jbes.2009.0012

To link to this article: http://dx.doi.org/10.1198/jbes.2009.0012

Published online: 01 Jan 2012.


A Penalty Function Approach to Bias Reduction in Nonlinear Panel Models with Fixed Effects

C. Alan BESTER

University of Chicago, Graduate School of Business, Chicago, IL 60637 (cbester@chicagosb.edu)

Christian HANSEN

University of Chicago, Graduate School of Business, Chicago, IL 60637 (chansen1@chicagosb.edu)

We consider estimation of nonlinear panel data models with individual specific fixed effects. Estimation of these models is complicated since estimation of the fixed effects when the time dimension is short generally results in inconsistent estimates of all model parameters. We present a penalized objective function that reduces the bias in the resulting point estimates. The penalty function is simple to construct and requires no modification for models with multiple individual specific parameters. We illustrate the approach through a series of simulations that suggest the approach is effective in reducing bias and in an empirical study of insider trading activity.

KEY WORDS: Bias; Fixed effects; Incidental parameters; Panel data.

1. INTRODUCTION

One of the most appealing features of panel data is the flexibility that it allows in modeling time-invariant individual specific effects. In the linear model, the most common approach to dealing with individual specific heterogeneity in economics is to allow for individual specific intercepts that are treated as parameters to be estimated. This approach is appealing as it allows the researcher to estimate the common slope parameters of the model without needing to specify a mixture distribution for the individual specific effects. However, as noted by Neyman and Scott (1948), leaving the individual heterogeneity unrestricted in a nonlinear or dynamic model leads to the incidental parameters problem; that is, noise in the estimation of the fixed effects when the time dimension is short results in inconsistent estimates of the common parameters because of the nonlinearity of the problem.

A number of approaches to removing the incidental parameters bias from estimates of common model parameters have been developed. In some special cases, estimators of the common parameters are available that are consistent when the number of observations per individual is held fixed (for example, see Arellano and Honoré 2001; Chamberlain 1984 for reviews and Anderson 1970; Chamberlain 1985; Honoré 1992; Honoré and Kyriazidou 2000a,b; Horowitz and Lee 2004; Manski 1987). Unfortunately, such estimators generally apply only to specific models, and their existence seems to be quite rare. In addition, the consistency of these estimators typically comes from finding clever ways to avoid estimating the fixed effects, so they generally do not provide any guidance for estimating average marginal effects of covariates.

Recently, a number of additional approaches have been proposed that use asymptotic approximations derived as both the number of individuals, $n$, and the number of observations per individual, $T$, go to infinity jointly (for example, see Arellano and Hahn 2005 for an excellent survey and Alvarez and Arellano 2003; Arellano 2003; Carro 2006; Fernández-Val 2004, 2005; Hahn and Kuersteiner 2004; Hahn and Newey 2004; Woutersen 2005 for specific approaches). These estimators are designed to remove the $O(1/T)$ bias from the fixed effects estimators of the common model parameters. In small to moderate-$T$ panels, these corrections have been found to offer substantial improvements in root mean squared error (RMSE) relative to uncorrected fixed effects estimates. The approaches pursued in these articles are appealing in that they are generally applicable. Furthermore, estimation of the fixed effects is explicitly considered and so bias reductions for functions of interest averaged over the individual effects may be pursued in a straightforward manner.

We present an approach to bias reduction for fixed effects panel models that falls within the second category. Specifically, we consider estimation using a penalized objective function where the penalty function is designed to remove the $O(1/T)$ bias of the resulting estimator. The penalty function can be constructed using the Hessian and scores from the unpenalized objective function, which are often used for performing asymptotic inference. Hence, forming the penalty requires only calculation of quantities that are readily available to the researcher. This form of the penalty function is also quite intuitive; for example, when the objective function is a likelihood, the objective function is penalized for finite sample deviations from the information equality.

The bias correction is valid for the parameters of general M-estimators in both static and dynamic models and is immediately applicable to models with multiple individual-specific parameters. In addition to offering corrections for model parameters, we provide results that can be used to bias-correct the marginal impact of a regressor averaged over the individual specific parameters. Such average effects provide a useful summary of an effect over the population and in many cases are more meaningful than the individual parameters of the model. We consider the finite sample performance of the procedure through a series of simulation studies. In the simulations, we see that the bias reduction removes a large portion of the bias and does not substantially increase the variance relative to the uncorrected estimates. Thus, the bias-corrected estimators tend to perform substantially better than the uncorrected estimators in terms of mean squared error and inference. In some of the simulations, we find that the simplest form of our penalized estimator has inferior finite sample properties relative to other bias corrections, most of which leverage the functional form of the likelihood. This suggests that the simplicity and generality of the proposed approach may not be free, and that other approaches that exploit the structure of the problem may perform better in finite samples when they are available and feasible.

2009 American Statistical Association, Journal of Business & Economic Statistics, April 2009, Vol. 27, No. 2, DOI 10.1198/jbes.2009.0012

We illustrate our approach in an empirical study of insider trading activity. We estimate a dynamic ordered probit model where the outcome is selling, no activity, or buying of the firm’s equity shares by corporate officers and directors of a given firm in a given quarter. We include multiple firm-specific parameters in this model to allow unobserved firm-specific factors, such as litigation risk, the structure of executive compensation, and concerns about regulatory attention, to affect insiders’ decision to buy or sell in different ways. Bias correction is potentially useful in this setting as the model is dynamic, nonlinear, and estimated using a relatively short sample period. Our penalty function approach is attractive here because it allows for multiple individual specific parameters in a simple fashion. We find that our bias correction has an impact on estimates of several coefficients and marginal effects that is both substantial relative to standard errors and relevant in the economic interpretation of the results. After bias correction, we find modest evidence that insiders use information in future returns when making trading decisions.

The remainder of the article is organized as follows. In Section 2 we present the penalty function and provide key properties of the resulting estimators. Section 3 illustrates the penalty function in a number of examples. We present simulation results in Section 4 and our insider trading application in Section 5. Section 6 concludes.

2. A PENALTY FUNCTION FOR NONLINEAR FIXED EFFECTS MODELS

In this section, we characterize our proposed penalty function and illustrate how it removes the $O(1/T)$ bias from the estimates of the parameters of nonlinear fixed effects models. As with other recent articles regarding bias reduction for fixed effects panel models, our formal results make use of an asymptotic sequence in which $n$ and $T$ go to infinity at the same rate. Under these asymptotics, we find that the estimator resulting from the penalized optimization problem is asymptotically normal and correctly centered.

2.1 The Incidental Parameters Problem

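Suppose that we are interested in estimating a panel data model defined by an objective function

$$Q(\theta_0, \alpha_{10}, \ldots, \alpha_{n0}) = \sum_{i=1}^{n} \sum_{t=1}^{T} \varphi(x_{it}; \theta_0, \alpha_{i0})$$

with a common parameter of interest $\theta_0$ and individual specific parameters $\alpha_{i0}$, $i = 1, \ldots, n$, where $\varphi(\cdot)$ does not depend on $T$. The extremum estimator of $\theta_0$ and the $\alpha_{i0}$ may then be defined as

$$(\hat\theta, \hat\alpha_1, \ldots, \hat\alpha_n) = \arg\max_{\theta, \alpha_1, \ldots, \alpha_n} Q(\theta, \alpha_1, \ldots, \alpha_n).$$

We assume that the problem is posed such that in time series asymptotics where $T \to \infty$ with $n$ fixed, $(\hat\theta, \hat\alpha_1, \ldots, \hat\alpha_n) \to_p (\theta_0, \alpha_{10}, \ldots, \alpha_{n0})$.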

The incidental parameters problem arises when $n$ is large and $T$ is small because of the estimation error in the $\hat\alpha_i$. Intuitively, the bias can be seen by considering the optimization problem where the $\alpha_i$ are first concentrated out of the problem. In other words, suppose we find $\hat\theta$ by first solving

$$\hat\alpha_i(\theta) = \arg\max_{\alpha_i} \sum_{t=1}^{T} \varphi(x_{it}; \theta, \alpha_i)$$

and then

$$\hat\theta = \arg\max_{\theta} \sum_{i=1}^{n} \sum_{t=1}^{T} \varphi(x_{it}; \theta, \hat\alpha_i(\theta)). \tag{1}$$

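It follows from standard results for extremum estimators (e.g., Amemiya 1985, chap. 4) that

$$\hat\theta \to_p \theta_T = \arg\max_{\theta} \bar{E}\left[\sum_{t=1}^{T} \varphi(x_{it}; \theta, \hat\alpha_i(\theta))\right],$$

where $\bar{E}\left[\sum_{t=1}^{T} \varphi(x_{it}; \theta, \hat\alpha_i(\theta))\right] = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} E\left[\sum_{t=1}^{T} \varphi(x_{it}; \theta, \hat\alpha_i(\theta))\right]$ when $n \to \infty$ with $T$ fixed, and we assume all expectations exist for simplicity and clarity. It then follows that $\theta_T \neq \theta_0$ in general since $\bar{E}\left[\sum_{t=1}^{T} \varphi(x_{it}; \theta, \hat\alpha_i(\theta))\right]$ will typically not be equal to $\bar{E}\left[\sum_{t=1}^{T} \varphi(x_{it}; \theta, \alpha_i(\theta))\right]$, where $\alpha_i(\theta) = \arg\max_{\alpha_i} E\left[\sum_{t=1}^{T} \varphi(x_{it}; \theta, \alpha_i)\right]$. In other words, the randomness in the $\hat\alpha_i$ when $T$ is small results in an estimator, $\hat\theta$, that is the solution to a "misspecified" problem: Even as $n \to \infty$, the optimization problem one solves for $\hat\theta$ using the estimated $\hat\alpha_i(\theta)$ differs from the one that would be solved if the individual specific coefficients $\alpha_i$ were known.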

While $\theta_T$ usually differs from $\theta_0$, it will generally be true that $\theta_T \to \theta_0$ as $T \to \infty$. In addition, for smooth functions $\varphi(\cdot)$, we will have $\theta_T = \theta_0 + B/T + O(1/T^2)$, where $B/T$ is the $O(1/T)$ bias of the estimator. Under standard regularity conditions, it will also generally be true that $\sqrt{nT}(\hat\theta - \theta_T) \to_d N(0, I_\theta^{-1} V I_\theta^{-1})$ for $I_\theta$ and $V$ defined below (see, for example, White 1982). Using these results, we can then see that even when $n$ and $T$ grow at the same rate such that $n/T \to \rho$,

$$\sqrt{nT}(\hat\theta - \theta_0) = \sqrt{nT}(\hat\theta - \theta_T) + \sqrt{\frac{n}{T}}\, B + O\left(\sqrt{\frac{n}{T^3}}\right) \to_d N\left(\sqrt{\rho}\, B,\; I_\theta^{-1} V I_\theta^{-1}\right). \tag{2}$$

That is, the uncorrected extremum estimator of $\theta_0$ is incorrectly centered asymptotically even in cases where $n$ and $T$ increase at the same rate. Hahn and Newey (2004), in static models, and Hahn and Kuersteiner (2004), in dynamic models, provide additional discussion of the incidental parameters problem using asymptotic expansions and also formally justify Equation (2).
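The middle term of Equation (2) can be traced to the expansion of $\theta_T$ stated above. The display below only restates that substitution step under the same smoothness assumptions; it is not an additional result.

$$\sqrt{nT}\,(\theta_T - \theta_0) = \sqrt{nT}\left(\frac{B}{T} + O\left(\frac{1}{T^2}\right)\right) = \sqrt{\frac{n}{T}}\, B + O\left(\sqrt{\frac{n}{T^3}}\right) \longrightarrow \sqrt{\rho}\, B \quad \text{as } n/T \to \rho.$$

The bias term is therefore of the same order as the sampling noise in $\sqrt{nT}(\hat\theta - \theta_T)$ and vanishes from the limit only if $n/T \to 0$.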



2.2 Bias in the Scores

In the preceding section, we intuitively presented the incidental parameters problem. For uncorrected extremum estimators, the estimation of the individual specific effects results in asymptotically incorrect inference even when $T$ grows as fast as $n$ because of the bias in the limiting distribution. Here, we present further intuition for the incidental parameters bias by presenting a heuristic derivation of the bias in the scores of the optimization problem solved to estimate $\theta$.

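Before proceeding, it will be useful to define some notation. Let

$$v_{it}(\theta, \alpha_i) = \frac{\partial \varphi(x_{it}; \theta, \alpha_i)}{\partial \alpha_i}, \qquad u_{it}(\theta, \alpha_i) = \frac{\partial \varphi(x_{it}; \theta, \alpha_i)}{\partial \theta},$$

$$U_{it} = u_{it}(\theta, \alpha_i) - E\left[v_{it}^{\theta}\right] E\left[v_{it}^{\alpha}\right]^{-1} v_{it}(\theta, \alpha_i),$$

and let superscripts denote partial derivatives; e.g., $v_{it}^{\alpha}(\theta, \alpha_i) = \partial v_{it}(\theta, \alpha_i) / \partial \alpha_i'$. Also, define

$$I_\theta = -E\left[U_{it}^{\theta}\right]. \tag{3}$$

In a likelihood setting, $u_{it}$ and $v_{it}$ are the scores for $\theta$ and $\alpha_i$, respectively, $U_{it}$ is the score for $\theta$ after the individual specific parameters $\alpha_i$ are concentrated out, and $I_\theta$ is the Fisher information for $\theta$. Also, arguments of the functions will be suppressed when evaluated at the true parameter values; e.g., $v_{it} = v_{it}(\theta_0, \alpha_{i0})$.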

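By standard arguments, the optimization problem in (1) implies that

$$\hat\theta - \theta_0 = \hat{I}_\theta(\bar\theta)^{-1}\, \hat{U}(\theta_0),$$

where $\bar\theta$ is an intermediate value between $\hat\theta$ and $\theta_0$, $\hat{I}_\theta(\theta) = -\frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} U_{it}^{\theta}(\theta, \hat\alpha_i(\theta))$, and $\hat{U}(\theta) = \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} U_{it}(\theta, \hat\alpha_i(\theta))$. Intuitively, bias in $\hat\theta$ will then result since $E[\hat{U}(\theta_0)] \neq 0$ in general. Below, we show that $E[\hat{U}(\theta_0)] \approx b/T$, where $b/T$ is the $O(1/T)$ bias in the score for $\theta$.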

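For simplicity, we focus on the $\dim(\alpha) = 1$ case. However, the approach to bias reduction we present below will also work for models with multiple individual-specific coefficients; for example, a dynamic probit model with individual specific intercepts and individual specific time trends. Following Hahn and Newey (2004) and Hahn and Kuersteiner (2004), we can approximate $\hat\alpha_i(\theta_0)$ as

$$\hat\alpha_i(\theta_0) \approx \alpha_i - \frac{1}{T} E\left[v_{it}^{\alpha}\right]^{-1} B_i + \frac{1}{T} \sum_{t=1}^{T} \psi_{it}, \tag{4}$$

where

$$\psi_{it} = -E\left[v_{it}^{\alpha}\right]^{-1} v_{it} \tag{5}$$

is a mean zero random variable, and

$$B_i = \lim_{T \to \infty} E\left[\frac{1}{T} \sum_{t=1}^{T} v_{it}^{\alpha} \sum_{t=1}^{T} \psi_{it}\right] + \frac{1}{2} E\left[v_{it}^{\alpha\alpha}\right] \lim_{T \to \infty} E\left[\frac{1}{T} \sum_{t=1}^{T} \psi_{it} \sum_{t=1}^{T} \psi_{it}\right].$$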

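Then, making use of $\sum_{t=1}^{T} v_{it}(\theta_0, \hat\alpha_i(\theta_0)) = 0$ and expanding $\hat{U}(\theta_0) = \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} u_{it}(\theta_0, \hat\alpha_i(\theta_0))$ about $\hat\alpha_i(\theta_0) = \alpha_i$ yields

$$\hat{U}(\theta_0) \approx \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} u_{it} + \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} u_{it}^{\alpha} \left(\hat\alpha_i(\theta_0) - \alpha_i\right) + \frac{1}{2} \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} u_{it}^{\alpha\alpha} \left(\hat\alpha_i(\theta_0) - \alpha_i\right)^2.$$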

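The first term is the usual score evaluated at the truth and will have a zero expectation and, appropriately normalized, follow a central limit theorem. For the second and third terms, we can plug in the expression for $\hat\alpha_i(\theta_0) - \alpha_i$ given in (4) to obtain

$$\hat{U}(\theta_0) \approx \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} u_{it} + \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} u_{it}^{\alpha} \left(-\frac{1}{T} E\left[v_{it}^{\alpha}\right]^{-1} B_i + \frac{1}{T} \sum_{s=1}^{T} \psi_{is}\right) + \frac{1}{2} \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} u_{it}^{\alpha\alpha} \left(\frac{1}{T} \sum_{s=1}^{T} \psi_{is}\right)^2,$$

from which we obtain

$$E\left[\hat{U}(\theta_0)\right] \approx 0 + \frac{1}{T} \frac{1}{n} \sum_{i=1}^{n} \left( -E\left[u_{it}^{\alpha}\right] E\left[v_{it}^{\alpha}\right]^{-1} B_i + E\left[\frac{1}{T} \sum_{t=1}^{T} u_{it}^{\alpha} \sum_{t=1}^{T} \psi_{it}\right] + \frac{1}{2} E\left[u_{it}^{\alpha\alpha}\right] E\left[\frac{1}{T} \sum_{t=1}^{T} \psi_{it} \sum_{t=1}^{T} \psi_{it}\right] \right) = \frac{b}{T}, \tag{6}$$

where $b/T$ is the $O(1/T)$ bias in the score for $\theta$. It follows that the bias in $\hat\theta$, defined in (2), is given by $B = I_\theta^{-1} b$.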
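As a worked check of (6), consider a simple illustration chosen here: $y_{it} \sim N(\alpha_{i0}, \sigma_0^2)$, independent over $t$, with $\theta = \sigma^2$ and per-observation objective $\varphi = -\frac{1}{2} \log \sigma^2 - (y_{it} - \alpha_i)^2 / (2\sigma^2)$. The display is a sketch under these assumptions, with all quantities evaluated at the true parameter values.

$$v_{it} = \frac{y_{it} - \alpha_i}{\sigma^2}, \quad v_{it}^{\alpha} = -\frac{1}{\sigma^2}, \quad v_{it}^{\alpha\alpha} = 0, \quad \psi_{it} = y_{it} - \alpha_i, \quad B_i = 0,$$

$$u_{it} = -\frac{1}{2\sigma^2} + \frac{(y_{it} - \alpha_i)^2}{2\sigma^4}, \quad u_{it}^{\alpha} = -\frac{y_{it} - \alpha_i}{\sigma^4}, \quad u_{it}^{\alpha\alpha} = \frac{1}{\sigma^4},$$

$$b = \underbrace{E\left[\frac{1}{T} \sum_{t} u_{it}^{\alpha} \sum_{t} \psi_{it}\right]}_{-1/\sigma^2} + \underbrace{\frac{1}{2} E\left[u_{it}^{\alpha\alpha}\right] E\left[\frac{1}{T} \sum_{t} \psi_{it} \sum_{t} \psi_{it}\right]}_{1/(2\sigma^2)} = -\frac{1}{2\sigma^2}.$$

Because $E[v_{it}^{\theta}] = 0$ here, $U_{it} = u_{it}$ and $I_\theta = -E[U_{it}^{\theta}] = 1/(2\sigma^4)$, so $B = I_\theta^{-1} b = -\sigma^2$. This agrees with the exact finite sample result $E[\hat\sigma^2_{ml}] = (1 - 1/T)\sigma_0^2$ for this model.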

2.3 Bias Reduction via Penalized Optimization

As illustrated previously, bias reduction for estimates of the common parameter, $\theta$, may be performed by first computing the maximum likelihood (ML) estimate, $\hat\theta$, then using the information in the sample to form an estimate of the bias. In this section, we present an alternative approach to bias correction based on penalizing the objective function. There are several advantages to this approach. First, the penalty function we employ makes use of only the score and the Hessian of the original problem, both of which are often used in performing inference and so are readily available to the researcher. The approach also applies simply to problems with individual specific effects, and working directly with the objective function may offer computational advantages relative to the score correction in some cases.

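Define the penalized objective function as

$$Q_p(\theta, \alpha_1, \ldots, \alpha_n) = \sum_{i=1}^{n} \sum_{t=1}^{T} \varphi(x_{it}; \theta, \alpha_i) - \sum_{i=1}^{n} \pi_i(x_{it}; \theta, \alpha_i),$$

where $\sum_{i=1}^{n} \pi_i(x_{it}; \theta, \alpha_i)$ is the penalty function, and let

$$\tilde\alpha_i(\theta) = \arg\max_{\alpha_i} Q_p(\theta, \alpha_1, \ldots, \alpha_n)$$

and

$$\tilde\theta = \arg\max_{\theta} Q_p(\theta, \tilde\alpha_1(\theta), \ldots, \tilde\alpha_n(\theta)).$$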



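The first order condition for $\tilde\theta$ is given by $0 = \hat{U}_p(\tilde\theta)$, where $\hat{U}_p(\theta) = \frac{1}{nT} \left[ \sum_{i=1}^{n} \sum_{t=1}^{T} u_{it}(\theta, \tilde\alpha_i(\theta)) - \sum_{i=1}^{n} \frac{\partial}{\partial \theta} \pi_i(x_{it}; \theta, \tilde\alpha_i(\theta)) \right]$. Then following the same informal argument from above, we can expand this score to obtain

$$E\left[\hat{U}_p(\theta_0)\right] \approx 0 + \frac{1}{T} \frac{1}{n} \sum_{i=1}^{n} \left( -E\left[u_{it}^{\alpha}\right] E\left[v_{it}^{\alpha}\right]^{-1} \left(B_i - \pi_i^{\alpha}\right) + E\left[\frac{1}{T} \sum_{t=1}^{T} u_{it}^{\alpha} \sum_{t=1}^{T} \psi_{it}\right] + \frac{1}{2} E\left[u_{it}^{\alpha\alpha}\right] E\left[\frac{1}{T} \sum_{t=1}^{T} \psi_{it} \sum_{t=1}^{T} \psi_{it}\right] - \pi_i^{\theta} \right), \tag{7}$$

where $\pi_i^{\alpha} = \operatorname{plim}_{T \to \infty} \partial \pi_i / \partial \alpha_i$ and $\pi_i^{\theta} = \operatorname{plim}_{T \to \infty} \partial \pi_i / \partial \theta$. From this expression, we can see that any $\pi_i$ that satisfy

$$\frac{\partial \pi_i}{\partial \alpha_i} \to_p \lim_{T \to \infty} E\left[\frac{1}{T} \sum_{t=1}^{T} v_{it}^{\alpha} \sum_{t=1}^{T} \psi_{it}\right] + \frac{1}{2} E\left[v_{it}^{\alpha\alpha}\right] \lim_{T \to \infty} E\left[\frac{1}{T} \sum_{t=1}^{T} \psi_{it} \sum_{t=1}^{T} \psi_{it}\right] \tag{8}$$

and

$$\frac{\partial \pi_i}{\partial \theta} \to_p \lim_{T \to \infty} E\left[\frac{1}{T} \sum_{t=1}^{T} u_{it}^{\alpha} \sum_{t=1}^{T} \psi_{it}\right] + \frac{1}{2} E\left[u_{it}^{\alpha\alpha}\right] \lim_{T \to \infty} E\left[\frac{1}{T} \sum_{t=1}^{T} \psi_{it} \sum_{t=1}^{T} \psi_{it}\right] \tag{9}$$

will remove the $O(1/T)$ bias from the scores for $\tilde\alpha_i$ and $\tilde\theta$. It is interesting to note that the right side of (8) corresponds to the $O(1/T)$ bias in the scores for $\alpha_i$. In other words, a penalty function that satisfies (8) and (9) works by first removing the $O(1/T)$ bias from the estimator of $\alpha_i$ and then removing the remaining bias from the estimator of $\theta$. Note that this differs from the score correction mentioned in the previous section, which corrects the score for $\theta$ of the concentrated problem but offers no correction for the $\alpha_i$.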

We consider two ways to construct a penalty function. One approach is to compute the expectations on the right side of (8) and (9) analytically and then solve the implied set of differential equations for $\pi_i$. We show below that this approach may perform extremely well in specific models. For the Neyman and Scott (1948) example and the linear dynamic panel data model, pursuing this approach produces $\sqrt{n}$-consistent estimates of the common parameters. Unfortunately, explicit expressions for the expectations are typically not available except in very special cases. Also, even when the right side of (8) and (9) can be computed analytically, the resulting differential equations may not be soluble. One could also compute some of the right side terms in (8) and (9) and use the resulting expressions to conjecture a function whose derivatives have the required limiting behavior, although this approach is naturally very model specific. Instead, we propose a simple penalty function that satisfies conditions (8) and (9) quite generally. In particular, we consider a penalty defined by

$$\pi_i(\theta, \alpha_i) = \frac{1}{2} \operatorname{trace}\left(\hat{I}_{\alpha_i}^{-1} \hat{V}_{\alpha_i}\right) - \frac{k}{2}, \tag{10}$$

where $\dim(\alpha_i) = k$ and $\hat{I}_{\alpha_i}$ and $\hat{V}_{\alpha_i}$ are given by

$$\hat{I}_{\alpha_i} = -\frac{1}{T} \sum_{t=1}^{T} v_{it}^{\alpha}(\theta, \alpha_i)$$

and

$$\hat{V}_{\alpha_i} = \frac{1}{T} \sum_{l=-m}^{m} \; \sum_{t=\max(1,\, 1+l)}^{\min(T,\, T+l)} v_{it}(\theta, \alpha_i)\, v_{i,t-l}(\theta, \alpha_i)'.$$

It is straightforward to verify that (10) satisfies (8) and (9) by differentiating (10) with respect to $\theta$ and $\alpha_i$ and taking limits as $T \to \infty$.
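To make this verification concrete, the following sketch works through one simple case chosen for illustration: $y_{it} \sim N(\alpha_{i0}, \sigma_0^2)$, independent over $t$, with $\theta = \sigma^2$, bandwidth $m = 0$, and scores $v_{it} = (y_{it} - \alpha_i)/\sigma^2$. In that case $\hat{I}_{\alpha_i} = 1/\sigma^2$ and $\hat{V}_{\alpha_i} = \frac{1}{T\sigma^4} \sum_t (y_{it} - \alpha_i)^2$, so that

$$\pi_i = \frac{1}{2T\sigma^2} \sum_{t=1}^{T} (y_{it} - \alpha_i)^2 - \frac{1}{2},$$

$$\frac{\partial \pi_i}{\partial \alpha_i} = -\frac{1}{T\sigma^2} \sum_{t=1}^{T} (y_{it} - \alpha_i) \to_p 0, \qquad \frac{\partial \pi_i}{\partial \sigma^2} = -\frac{1}{2T\sigma^4} \sum_{t=1}^{T} (y_{it} - \alpha_i)^2 \to_p -\frac{1}{2\sigma^2},$$

with both limits taken at the true parameter values. These match the right sides of (8) and (9) for this model: the right side of (8) is zero because $v_{it}^{\alpha} = -1/\sigma^2$ is nonrandom, $\psi_{it}$ has mean zero, and $v_{it}^{\alpha\alpha} = 0$; the right side of (9) is $-1/\sigma^2 + 1/(2\sigma^2) = -1/(2\sigma^2)$.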

The form of the penalty given in (10) is also quite intuitive, especially for likelihood models. When evaluated at $\theta_0$ and $\alpha_{i0}$, $\hat{I}_{\alpha_i}$ is simply the sample information matrix for $\alpha_i$, and $\hat{V}_{\alpha_i}$ is a conventional estimator of $\operatorname{Var}\left(\frac{1}{\sqrt{T}} \sum_t v_{it}\right)$ that is robust to heteroskedasticity and autocorrelation (HAC), with $m$ a bandwidth parameter that needs to be chosen such that $m/T^{1/2} \to 0$ as $T \to \infty$. For iid static models, $m = 0$ may be chosen; and since $T$ is relatively small in most panel data applications, $m = 1$ is a natural choice for the bandwidth in most dynamic applications.
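The penalty in (10) can be sketched in a few lines for the scalar case $k = 1$. The function names `hac_variance` and `hs_penalty`, and the convention of passing one individual's scores and score derivatives as plain lists, are assumptions made for this illustration and are not taken from the article.

```python
def hac_variance(scores, m):
    """Truncated-kernel HAC estimate of Var(T^{-1/2} * sum_t v_t), scalar scores."""
    T = len(scores)
    total = 0.0
    for lag in range(-m, m + 1):
        # Zero-based indices: both t and t - lag must lie in 0..T-1.
        for t in range(max(0, lag), min(T, T + lag)):
            total += scores[t] * scores[t - lag]
    return total / T


def hs_penalty(scores, score_derivs, m=0):
    """Penalty (10) for one individual with a scalar effect (k = 1).

    scores[t]       : derivative of the objective w.r.t. alpha_i at observation t
    score_derivs[t] : second derivative w.r.t. alpha_i at observation t
    """
    T = len(scores)
    info_hat = -sum(score_derivs) / T      # sample information: minus the mean Hessian
    var_hat = hac_variance(scores, m)      # outer product of scores (HAC form)
    return 0.5 * var_hat / info_hat - 0.5  # 0.5 * trace(I^{-1} V) - k / 2 with k = 1


# Illustrative numbers only: normal model with alpha = 2 and sigma^2 = 2.
y, alpha, sigma2 = [1.0, 2.0, 4.0], 2.0, 2.0
scores = [(v - alpha) / sigma2 for v in y]
score_derivs = [-1.0 / sigma2 for _ in y]
penalty = hs_penalty(scores, score_derivs, m=0)
```

With $m = 0$ the double sum collapses to the average squared score, which is the iid static case described above; a positive $m$ adds the cross products at lags $1, \ldots, m$ in both directions.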

It is worth noting that using the HAC form for $\hat{V}_{\alpha_i}$ is important for good performance in reducing bias in dynamic models even in cases where the scores would be uncorrelated if one had the true parameter values. The intuition for this is that even in cases where the scores are uncorrelated at the true parameter values, there will be correlation in the scores when they are evaluated at points away from the true parameter values. Because of the potential for substantial finite sample bias, this will remain true even when the scores are evaluated at the estimated parameter values, and failing to account for this correlation may result in poor finite sample performance of the bias reductions.

We make two additional comments regarding the HAC estimator $\hat{V}_{\alpha_i}$ in the present context. First, while $m = 1$ is a natural bandwidth choice in short panels, one may wish to consider optimal bandwidth selection in longer panels and would certainly want to consider larger $m$ in cases when more than one lag of the dependent variable is included. There are a number of approaches available for selecting bandwidths for estimating the spectral density at 0, and in principle these methods could be employed in the present context when longer panels are available. For example, see the early work of Parzen (1957) as well as Newey and West (1987, 1994), Andrews (1991), and Andrews and Monahan (1992) for approaches that could readily be applied within each cross-sectional unit to generate an optimal bandwidth for each time series. One could also potentially employ cross-validation (for example, see Velasco 2000 for recent work). We note that formally adapting these procedures and verifying their properties in the present context would require substantial work that is beyond the scope of the present article. Furthermore, optimality for estimating the spectrum at zero does not imply optimality in terms of the properties of the bias reduction. As such, it seems that pursuing optimal bandwidth selection in the present context would be an interesting direction for future research. Second, our formulation makes use of the truncated kernel, which may lead to an estimate $\hat{V}_{\alpha_i}$ that is not positive definite. We maintain this formulation for notational convenience, but note that one could use a kernel that would guarantee positivity of the resulting estimator. While lack of positive definiteness has not been a problem in our simulation or empirical results, it could arise, especially as the time dimension increases and one worries more about bandwidth selection. In addition, the automatic bandwidth selection procedures mentioned previously generally rely on kernels that produce positive estimates of the spectrum at 0, which may also argue for the use of other kernels in some situations.
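One kernel with the positivity property mentioned here is the Bartlett kernel of Newey and West (1987), which weights the lag-$l$ cross products by $1 - |l|/(m+1)$. The sketch below contrasts it with the truncated kernel for scalar scores; the function name `hac_variance_weighted` and the alternating score series are hypothetical and chosen only to make the sign problem visible.

```python
def hac_variance_weighted(scores, m, bartlett=True):
    """HAC estimate of Var(T^{-1/2} * sum_t v_t) for a scalar score series.

    bartlett=True uses weights 1 - |lag| / (m + 1); bartlett=False gives the
    truncated kernel, in which every included lag has weight one.
    """
    T = len(scores)
    total = 0.0
    for lag in range(-m, m + 1):
        weight = 1.0 - abs(lag) / (m + 1) if bartlett else 1.0
        # Zero-based indices: both t and t - lag must lie in 0..T-1.
        for t in range(max(0, lag), min(T, T + lag)):
            total += weight * scores[t] * scores[t - lag]
    return total / T


alternating = [1.0, -1.0, 1.0, -1.0]  # strongly negatively autocorrelated scores
v_truncated = hac_variance_weighted(alternating, m=1, bartlett=False)
v_bartlett = hac_variance_weighted(alternating, m=1, bartlett=True)
```

For this series the truncated-kernel estimate is negative (-0.5), which would make the trace term in the penalty meaningless, while the Bartlett-weighted estimate is positive (0.25). Whether the Bartlett weights preserve the bias-reduction properties established for the truncated form is not addressed in this sketch.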

The penalty function may also be developed using the following intuition. As noted in Section 2.1, we may think of the incidental parameters problem as a form of misspecification resulting from the estimation error in the $\hat\alpha_i$ when $T$ is small. Considering likelihood models with iid data for the moment, this misspecification should result in failure of the information equality for small $T$; that is,

$$-E\left[v_{it}^{\alpha}(\theta_0, \hat\alpha_i(\theta_0))\right] \neq E\left[v_{it}(\theta_0, \hat\alpha_i(\theta_0))\, v_{it}(\theta_0, \hat\alpha_i(\theta_0))'\right].$$

However, at the true parameter values

$$-E\left[v_{it}^{\alpha}\right] = E\left[v_{it} v_{it}'\right].$$

In other words, the information equality would be satisfied if we knew the $\alpha_i$; but with small $T$, the estimation error in $\hat\alpha_i$ results in the failure of the information equality, as with a misspecified model (e.g., White 1982). This difference suggests that a potential way to remove bias from the estimator would be to penalize the estimator for within sample deviations from the information equality, which is exactly what the penalty defined in (10) does. In likelihood models, as $T$ gets large, $\hat{I}_{\alpha_i}^{-1} \hat{V}_{\alpha_i}$ converges in probability to a $k$-dimensional identity matrix, so $\pi(\cdot) \to_p 0$. However, for small $T$, $\hat{I}_{\alpha_i}$ and $\hat{V}_{\alpha_i}$ will generally differ, and $\pi(\cdot)$ penalizes the optimization problem for deviations from the information equality in finite samples.

In the preceding, we have heuristically presented stochastic expansions for $\tilde\alpha_i$ and $\tilde\theta$ and argued that the use of the penalized objective function eliminates the $O(1/T)$ bias from the estimators. These arguments are formalized in the following result, which states that the asymptotic distribution of the common parameter is correctly centered as $n$ and $T$ grow large at the same rate when the penalized objective function is used.

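Theorem 1. As $n \to \infty$ and $T \to \infty$ such that $n/T \to \rho$ and under regularity conditions given in Assumption A.1 in the Appendix, $\tilde\alpha_i$ for $i = 1, \ldots, n$ and $\tilde\theta$ are consistent and

$$\sqrt{nT}(\tilde\theta - \theta_0) \to_d N\left(0,\; I_\theta^{-1} V I_\theta^{-1}\right),$$

where $V = \operatorname{var}\left(\frac{1}{\sqrt{T}} \sum_t U_{it}\right)$, and $U_{it}$ and $I_\theta$ are defined in (3).

2.4 Bias Correcting Fixed Effects Averages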

In the previous section, we demonstrated how solving the penalized optimization problem results in a bias reduction to estimates of the common parameters relative to estimators obtained from the unpenalized objective function. However, in many nonlinear models, one may be more interested in estimating effects averaged over the unobserved effects distribution than in the common parameters themselves. For example, in a panel discrete choice setting, one is likely more interested in estimating the effect of a covariate on the choice probabilities than in the index coefficients. In this section, we illustrate bias reduction for these fixed effects averages.

Suppose we are interested in estimating $\mu = E[m(w; \theta_0, \alpha_i)]$, where $w$ are values of the covariates. We consider estimation of $\mu$ by

$$\tilde\mu = \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} m(w_{it}; \tilde\theta, \tilde\alpha_i).$$

For example, in a probit, one might be interested in the effect of a continuous variable on the probability that the dependent variable equals one:

$$\mu = E\left[\beta_j\, \phi(x'\beta + \alpha)\right]$$

and

$$\tilde\mu = \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} \tilde\beta_j\, \phi(x_{it}'\tilde\beta + \tilde\alpha_i),$$

where $\beta_j$ is the coefficient on the variable of interest and $\phi$ is the standard normal density function. We also note that the preceding formulation and the following results apply immediately to estimating an effect at a particular covariate value averaged over the unobserved effect distribution by simply replacing $w$ and $w_{it}$ in the previous expressions with the value of interest, say $w^*$. The same basic ideas might also be applied fruitfully to bias reduce other functionals of the estimated common parameters and unobserved effects, but we leave this extension to future work.
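The probit average effect above is a plain sample average and can be computed as in the following sketch. The function name `probit_average_effect` and the nested-list data layout (`x[i][t]` holding the covariate vector for individual `i` in period `t`) are assumptions made for illustration; the estimates passed in would come from whatever estimator is being used, penalized or not.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def probit_average_effect(x, beta, alpha, j):
    """Average over all (i, t) of beta[j] * phi(x_it' beta + alpha_i).

    x     : x[i][t] is the list of covariates for individual i at time t
    beta  : list of common index coefficients
    alpha : alpha[i] is the estimated effect for individual i
    j     : position in beta of the covariate of interest
    """
    total, count = 0.0, 0
    for i, periods in enumerate(x):
        for x_it in periods:
            index = sum(b * v for b, v in zip(beta, x_it)) + alpha[i]
            total += beta[j] * normal_pdf(index)
            count += 1
    return total / count


# Hypothetical two-individual, two-period panel with a single covariate.
x = [[[0.0], [0.0]], [[0.0], [0.0]]]
effect = probit_average_effect(x, beta=[2.0], alpha=[0.0, 0.0], j=0)
```

Replacing every `x_it` by a fixed vector gives the effect at a particular covariate value averaged over the estimated individual effects, as described in the text.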

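While both $\tilde\beta$ and $\tilde\alpha_i$ have the $O(1/T)$ bias removed, $\tilde\mu$ still needs additional corrections because of the randomness in $\tilde\alpha_i$. Let $\hat\sigma_i^2$ be an estimate of the variance of $\tilde\alpha_i$; then a bias-reduced estimate of $\mu$ is given by

$$\tilde\mu_{bc} = \tilde\mu - \frac{1}{T} \frac{1}{nT} \sum_{i=1}^{n} \sum_{l=-m}^{m} \; \sum_{t=\max(1,\, 1+l)}^{\min(T,\, T+l)} \frac{\partial m(w_{it}; \tilde\theta, \tilde\alpha_i)}{\partial \alpha_i}\, \hat\psi_{i,t-l}' - \frac{1}{2} \frac{1}{T} \frac{1}{nT} \sum_{i=1}^{n} \sum_{t=1}^{T} \operatorname{trace}\left( \frac{\partial^2 m(w_{it}; \tilde\theta, \tilde\alpha_i)}{\partial \alpha_i\, \partial \alpha_i'}\, \hat\sigma_i^2 \right),$$

where $\hat\psi_{it}$ is an estimate of $\psi_{it}$ defined in (5). We note that the randomness in $\hat\alpha_i$ shows up as $1/T$ bias resulting from two sources: the variance of $\hat\alpha_i$ itself, which is $O(1/T)$, and the covariance between the score for $\alpha$ and the derivative of the functional of interest with respect to $\alpha$. The HAC form in the covariance allows for possibly dependent data and may be simplified in the case of time-independent data.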

We note that the correction itself follows from a stochastic expansion of $\tilde\mu$, and its validity can be verified using arguments similar to those used for bias-reductions for the estimator of $\theta$. Similar corrections for estimates of fixed effect averages are considered in Arellano and Hahn (2005), Hahn and Newey (2004), and Fernández-Val (2005). Our correction differs by allowing for dependence. It is also slightly simpler because the penalty function cancels the first-order bias of $\tilde\alpha_i$ as well as $\tilde\theta$, so this bias does not need to be accounted for in our correction.
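For scalar individual effects, the bias-reduced average amounts to subtracting two sample averages from the uncorrected estimate. The sketch below transcribes that arithmetic under stated assumptions: the function name and argument layout are hypothetical, all inputs (the values of $m(\cdot)$, its first and second derivatives in $\alpha_i$, $\hat\psi_{it}$, and $\hat\sigma_i^2$) are taken as already computed, and each lag pair is kept only when both time indices fall inside the sample.

```python
def bias_corrected_average(m_vals, dm, d2m, psi_hat, sigma2_hat, bandwidth=0):
    """Bias-reduced fixed effects average for scalar alpha_i.

    m_vals[i][t]  : m(w_it; theta, alpha_i) at the estimates
    dm[i][t]      : first derivative of m with respect to alpha_i
    d2m[i][t]     : second derivative of m with respect to alpha_i
    psi_hat[i][t] : estimate of psi_it
    sigma2_hat[i] : variance estimate for individual i, scaled as in the text
    bandwidth     : number of lags included in the covariance term
    """
    n, T = len(m_vals), len(m_vals[0])
    mu_tilde = sum(sum(row) for row in m_vals) / (n * T)
    cov_term, var_term = 0.0, 0.0
    for i in range(n):
        for lag in range(-bandwidth, bandwidth + 1):
            # Zero-based indices: both t and t - lag must lie in 0..T-1.
            for t in range(max(0, lag), min(T, T + lag)):
                cov_term += dm[i][t] * psi_hat[i][t - lag]
        for t in range(T):
            var_term += d2m[i][t] * sigma2_hat[i]
    cov_term /= n * T
    var_term /= n * T
    return mu_tilde - cov_term / T - 0.5 * var_term / T


# Tiny made-up inputs (n = 1, T = 2) to show the arithmetic only.
corrected = bias_corrected_average(
    m_vals=[[1.0, 3.0]], dm=[[1.0, 1.0]], d2m=[[2.0, 2.0]],
    psi_hat=[[0.5, 0.5]], sigma2_hat=[1.0], bandwidth=0,
)
```

Here the uncorrected average is 2, the covariance term contributes 0.25, and the variance term contributes 0.5, so `corrected` equals 1.25. With time-independent data, `bandwidth=0` is the simplification mentioned in the text.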



2.5 Relation to Other Work

The penalized optimization approach to bias reduction proposed previously is closely related to a number of approaches to bias correction for panel data models that have been proposed in the literature. Arellano and Hahn (2005) provide a detailed review of the recent literature in this area. Intuitively, once one is equipped with an expression for the asymptotic bias $b$ in the estimating equation for $\theta$, such as (6), one may obtain a bias-corrected estimator in a number of ways. Our approach works by defining a penalty function whose derivatives converge in probability to the first-order asymptotic bias in the estimating equation. One may also construct a bias-corrected estimator as the solution to a recentered estimating equation obtained by subtracting an estimate of the bias given in (6) from the original estimating equation. Finally, one may recenter the parameter estimate directly by subtracting an estimate of $I_\theta^{-1} b$. Below we briefly discuss some other bias-corrected estimators for panel models and their relation to our method.

For static models, our penalized optimization approach produces estimating equations similar to those of the score correction suggested by Hahn and Newey (2004) and Fernández-Val (2004). These corrections build on the work of Firth (1993), who considered score corrections to remove higher order bias from models with a fixed number of parameters. In the dynamic case, our approach uses the same expression for the bias in the scores as used by Hahn and Kuersteiner (2004), who correct the common parameters directly. In contemporaneous research, Arellano and Hahn (2005) propose a correction to the concentrated likelihood function for $\theta$ (with incidental parameters concentrated out) that is similar to our method.

A number of articles have also considered special cases of these bias corrections applied to commonly used models and noted relationships between these bias corrections and other estimation approaches. Carro (2006) and Arellano (2003) developed score corrections specialized for use in discrete choice panel models. Hahn and Newey (2004) show their score correction is asymptotically equivalent to the integrated likelihood estimator of Woutersen (2005). Woutersen (2005) also demonstrates that his integrated likelihood approach is asymptotically equivalent to the modified profile likelihood estimator of Cox and Reid (1987). In the more general statistical literature, Severini (1998, 2000) and Sartori (2003) also consider adjustments to concentrated likelihoods to alleviate bias problems, and Severini (2002) has extended these approaches to other estimating equations contexts.

Our approach is distinct from those previously considered in the literature in several ways. First, it works directly with the unconcentrated problem, simultaneously correcting the $O(1/T)$ bias in estimates of both the common parameter $\theta$ and the $\alpha_i$. Using the penalty function given in (10) and modern optimization software, our approach is very simple to implement in practice. Our penalty function requires only the sample outer product of scores and Hessian, which are likely already available to a researcher, although one may wish to compute higher order derivatives to obtain the scores and Hessian of the penalized problem in some situations. Score corrections and ex post corrections of the estimator require third derivatives and, in many cases, analytic computation of expectations involving the scores and their derivatives, which may be cumbersome or impractical and is likely infeasible for general M-estimators. Finally, our approach is quite general and applies to M-estimators in static and dynamic nonlinear models and requires no modification for models with multiple individual-specific parameters.

Most of the bias correction approaches discussed previously, including our procedure, are asymptotically equivalent in the sense that they all remove the $O(1/T)$ bias from estimates of the common parameter, $\theta$. However, the finite sample properties of these procedures can differ dramatically. For example, Fernández-Val (2005) develops a correction specific to panel binary choice models, taking advantage of the structure of the likelihood to construct an estimate of the bias in the common parameter, and finds this correction performs better in simulations than the more general score correction given in Hahn and Kuersteiner (2004). Similarly, the score correction proposed in Carro (2006) exploits the structure of the scores in panel discrete choice models and is found to have very good finite sample properties. Overall, there is growing evidence that exploiting the specific structure of the objective function or estimating equation in constructing a bias-corrected estimator may result in finite sample improvements. In Section 4, we find that the simplest form of our penalization procedure produces an estimator with inferior finite sample properties relative to other bias corrections that exploit the structure of the problem. Thus, there may be a price to be paid for the generality and simplicity of our approach, although, of course, finite sample performance can vary considerably across different simulation designs.

3. EXAMPLES

We present four examples that highlight different aspects of our approach. We refer to the penalty function (10) as the HS penalty, as it involves the sample Hessian and outer products of the scores. When a solution to (8)–(9) is available, we refer to it as the IE penalty, as it involves integrating expectations of the scores and their derivatives. Our first two examples are linear models: the classic example considered by Neyman and Scott (1948) and a dynamic linear model. Our last two examples, the logit and probit, are nonlinear. In the probit example, the IE penalty requires numerical solution of a set of differential equations, but our penalization approach remains easy to implement using sample information.

3.1 Neyman-Scott

We begin with a classic example in which observations $\{y_{it}\}$ are normally distributed with individual-specific means, $\alpha_{i0}$, and common variance, $\sigma_0^2$. This example allows us to analyze our estimator in closed form and explicitly compare it with other approaches. Ignoring constants for notational convenience, the log-likelihood can be written

$$L = -\frac{nT}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_i u_i'u_i, \tag{11}$$



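where $y_i$ is a $T$-vector and $u_i = y_i - \iota_T\alpha_i$. The ML estimates are $\hat\alpha_i = \bar y_i$ and

$$\hat\sigma^2_{ml} = \frac{1}{nT}\sum_i (y_i - \bar y_i)'(y_i - \bar y_i) = \sigma_0^2\,\frac{1}{n}\sum_i \frac{z_i}{T}, \qquad \text{where } z_i \sim \chi^2_{T-1}.$$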

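Note that, because $E[z_i] = T-1$, we have $E[\hat\sigma^2_{ml}] = (1-1/T)\sigma_0^2$, so $\hat\sigma^2_{ml}$ is not consistent when $T$ is fixed. In this example the Fisher information for $\alpha_i$ and the outer product of scores for $\alpha_i$ are given by $\hat I_{\alpha_i} = 1/\sigma^2$ and $\hat V_{\alpha_i} = (1/(T\sigma^4))\,u_i'u_i$, and the corresponding penalized objective function is

$$Q_{HS} = -\frac{nT}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\left(1+\frac{1}{T}\right)\sum_i u_i'u_i.$$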

In this case the penalty function leaves $\hat\alpha_i$ unchanged, while the estimate of $\sigma^2$ is multiplied by $(1+1/T)$. We therefore have $E[\hat\sigma^2_{HS}] = (1+1/T)\,E[\hat\sigma^2_{ml}] = (1-1/T^2)\,\sigma_0^2$. That is, the penalty function reduces the bias from order $T^{-1}$ to order $T^{-2}$, as required by Theorem 1.
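The bias reduction from order $T^{-1}$ to order $T^{-2}$ can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function names, the sample sizes, the seed, and the distribution used to draw the individual means are all illustrative assumptions. It simulates the Neyman-Scott design and computes the ML estimate and the HS-penalized estimate of $\sigma^2$.

```python
import random


def neyman_scott_estimates(y):
    """Return (s2_ml, s2_hs) for a panel y given as n lists of T observations."""
    n, T = len(y), len(y[0])
    # The ML estimate of each individual mean is the within-individual average.
    alpha_hat = [sum(row) / T for row in y]
    # Sum over i of u_i'u_i evaluated at the estimated means.
    ssr = sum((v - a) ** 2 for row, a in zip(y, alpha_hat) for v in row)
    s2_ml = ssr / (n * T)
    # Maximizing Q_HS rescales the ML variance estimate by (1 + 1/T).
    s2_hs = (1 + 1 / T) * s2_ml
    return s2_ml, s2_hs


def simulate(n=2000, T=4, sigma2=1.0, seed=0):
    """Draw one panel (illustrative design) and return both estimates."""
    rng = random.Random(seed)
    sd = sigma2 ** 0.5
    y = []
    for _ in range(n):
        alpha = rng.gauss(0.0, 1.0)  # hypothetical distribution of the means
        y.append([alpha + rng.gauss(0.0, sd) for _ in range(T)])
    return neyman_scott_estimates(y)


if __name__ == "__main__":
    s2_ml, s2_hs = simulate()
    print("ML estimate:", s2_ml)
    print("HS estimate:", s2_hs)
```

With $T = 4$ and $\sigma_0^2 = 1$, the ML estimate should concentrate near $1 - 1/T = 0.75$ and the HS estimate near $1 - 1/T^2 = 0.9375$ as $n$ grows.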

In this simple example, we can also construct a penalty function by computing expectations involving the scores explicitly and integrating. Equation (8) implies $\partial\pi_i^{IE}/\partial\alpha_i = 0$, and after some algebra, (9) becomes $\partial\pi_i^{IE}/\partial\sigma^2 = -1/(2\sigma^2)$. The solution to these ODEs is $\pi_i^{IE} = -(1/2)\log\sigma^2$, and the penalized objective function becomes

$$Q_{IE} = L + \frac{n}{2}\log\sigma^2 = -\frac{n(T-1)}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_i u_i'u_i.$$

The resulting estimator is then $\hat\sigma^2_{IE} = (T/(T-1))\,\hat\sigma^2_{ml}$, which is $\sqrt{n}$-consistent for $\sigma^2$ for any $T$. This fixed-$T$ consistency is a special property resulting from the structure of the likelihood in the linear model. In terms of the theory developed in Section 2, the two estimators $\hat\sigma^2_{HS}$ and $\hat\sigma^2_{IE}$ are equivalent in the sense that they are both free from $O(1/T)$ bias.

As mentioned in Section 2.2, once equipped with an expression such as (6) for the bias in the estimating equation for $\theta$, one may obtain a bias corrected estimator in several ways. Our approach works by finding a penalty whose derivatives have a probability limit equal to the asymptotic bias in the estimating equation and subtracting it from the objective function. Another approach is to bias-correct the estimating equation itself, or to correct (recenter) the estimator directly by subtracting an estimate of the first-order bias. Using the latter approach, the asymptotic bias of the MLE is $E[\hat\sigma^2_{ml}] - \sigma_0^2 = -(1/T)\,\sigma_0^2$. A natural bias-corrected estimator is then $\hat\sigma^2_{BC,1} = \hat\sigma^2_{ml} + \hat\sigma^2_{ml}/T$, which is identical to $\hat\sigma^2_{HS}$.

Arellano and Hahn (2005) and Fernández-Val (2005) note that this and other similar bias corrections that work by recentering the estimator may be iterated, in this case using the recursion formula $\hat\sigma^2_{BC,k} = \hat\sigma^2_{ml} + \hat\sigma^2_{BC,k-1}/T$. They then recognize that

$$\hat\sigma^2_{BC,k} = \left(1 + \frac{1}{T} + \frac{1}{T^2} + \cdots + \frac{1}{T^k}\right)\hat\sigma^2_{ml} \;\xrightarrow{\;k\to\infty\;}\; \frac{T}{T-1}\,\hat\sigma^2_{ml},$$

so here the limiting "infinitely iterated" estimator $\hat\sigma^2_{BC,\infty}$ is fixed-$T$ consistent and is identical to $\hat\sigma^2_{IE}$. Our HS penalty function may also be iterated in this fashion. Defining $Q_{HS,1} \equiv Q_{HS}$ as given above, define a sequence of penalty functions $\pi_{HS,k}$ and penalized objective functions $Q_{HS,k}$ recursively as

$$Q_{HS,k} = L - \pi_{HS,k}, \qquad \text{where } \pi_{HS,k} = -\frac{1}{2}\,\frac{\sum_i \left(\partial Q_{HS,k-1}/\partial\alpha_i\right)^2}{\sum_i \partial^2 Q_{HS,k-1}/\partial\alpha_i^2},$$

where the $k$th penalty function is constructed using the Hessian and outer product of scores from the previous penalized objective function $Q_{HS,k-1}$. For this example, it is straightforward to show that the penalized objective functions have the form

$$Q_{HS,k} = -\frac{nT}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\left(1 + \frac{1}{T} + \cdots + \frac{1}{T^k}\right)\sum_i u_i'u_i \;\xrightarrow{\;k\to\infty\;}\; -\frac{nT}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\,\frac{T}{T-1}\sum_i u_i'u_i,$$

and that maximizing the limiting objective function results in the same fixed-T consistent estimator. It is worth noting that this is a special example and iterating a bias-corrected estimator does not result in improved asymptotic properties in general.
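The iteration can be traced numerically. The sketch below is illustrative only: the function names and the convention of starting the recursion from the ML estimate are assumptions. It applies the recursion $\hat\sigma^2_{BC,k} = \hat\sigma^2_{ml} + \hat\sigma^2_{BC,k-1}/T$ and compares it with the geometric-sum form and with the limit $T/(T-1)\,\hat\sigma^2_{ml}$.

```python
def iterate_bias_correction(s2_ml, T, k):
    """Apply s2_k = s2_ml + s2_{k-1} / T for k steps, starting from s2_0 = s2_ml."""
    s2 = s2_ml
    for _ in range(k):
        s2 = s2_ml + s2 / T
    return s2


def geometric_form(s2_ml, T, k):
    """Closed form (1 + 1/T + ... + 1/T^k) * s2_ml of the k-times iterated estimator."""
    return s2_ml * sum(T ** (-j) for j in range(k + 1))


if __name__ == "__main__":
    s2_ml, T = 0.75, 4  # illustrative values: the ML estimate sits at its mean
    for k in (1, 2, 5, 20):
        print(k, iterate_bias_correction(s2_ml, T, k), geometric_form(s2_ml, T, k))
    print("limit:", T / (T - 1) * s2_ml)
```

One step of the recursion reproduces $\hat\sigma^2_{HS}$, and repeated steps approach the fixed-$T$ consistent estimator geometrically at rate $1/T$.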

Even in this simple example, we see that different bias corrections that remove bias to the same order may have very different finite sample properties. In more general models, the various approaches to bias correction, along with different ways in which the expectations of the scores and their derivatives are estimated using the data, will all typically produce distinct bias-corrected estimators, each of which may perform very differently in finite samples. In the remainder of this section, we concentrate on our penalty function approach and show that it may be easily applied to commonly used panel data models. The finite sample properties of our procedure are explored via Monte Carlo study in Section 4.

3.2 Dynamic Linear Model

Our second example generalizes the first by adding regressors, $x_i$, and a lagged dependent variable. The likelihood has the same form (11) with

$$u_i = y_i - \iota_T\alpha_i - x_i\beta - \rho\,(Ly_i),$$

where $L$ is the lag operator defined such that $L^{j}z_i = (0_{1\times j}, z_{i1}, \ldots, z_{i,T-j})'$ and $L^{-j}z_i = (z_{ij}, \ldots, z_{iT}, 0_{1\times j})'$. Also note that we assume $\{x_i, y_{i0}\}$ is strictly exogenous and without loss of generality set $y_{i0} = 0$. The penalty function may be

constructed as in Section 2.3 with

$$\hat I_{\alpha_i} = \frac{1}{\sigma^2}, \qquad \hat V_{\alpha_i} = \frac{1}{T\sigma^4}\left(u_i'u_i + 2\sum_{j=1}^{m}(L^{j}u_i)'(L^{-j}u_i)\right), \tag{12}$$

where we have used the HAC form of $\hat V$ because this is a dynamic model. The resulting estimator maximizes the objective function $Q^{\pi}_{HS} = L - (1/2T)\sum_i \hat I_{\alpha_i}^{-1}\hat V_{\alpha_i}$ and is consistent under the conditions given in Theorem 1.



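For an alternate construction of the penalty function, we note that $\partial\pi_i^{IE}/\partial\alpha_i$ and $\partial\pi_i^{IE}/\partial\sigma^2$ are as in the previous example, and that $\partial^3 L/\partial\alpha_i^2\partial\rho = \partial^3 L/\partial\alpha_i^2\partial\beta = 0$. We therefore have

$$\frac{\partial\pi_i^{IE}}{\partial\rho} = -E\left[E\left[\frac{\partial^2 L}{\partial\alpha_i^2}\right]^{-1}\frac{\partial^2 L}{\partial\alpha_i\,\partial\rho}\,\frac{\partial L}{\partial\alpha_i}\right] = -T^{-1}\sigma^{-2}\,E\big[(\iota_T'(Ly_i))(\iota_T'u_i)\big],$$

$$\frac{\partial\pi_i^{IE}}{\partial\beta} = -E\left[E\left[\frac{\partial^2 L}{\partial\alpha_i^2}\right]^{-1}\frac{\partial^2 L}{\partial\alpha_i\,\partial\beta}\,\frac{\partial L}{\partial\alpha_i}\right] = -T^{-1}\sigma^{-2}\,E\big[(x_i'\iota_T)(\iota_T'u_i)\big].$$

Note that while strict exogeneity of $x_i$ implies that $\partial\pi_i^{IE}/\partial\beta = 0$, the expectation of the product of summed $u_i$ and lagged $y_i$ does depend on $\rho$. A solution to these differential equations is

$$\pi^{IE} = \sum_i \pi_i^{IE}, \qquad \pi_i^{IE} = -\left(\frac{1}{2}\log\sigma^2 + b(\rho)\right), \qquad b(\rho) = \frac{1}{T}\sum_{t=1}^{T-1}\frac{T-t}{t}\,\rho^{t}.$$

This expression for $\pi^{IE}$ is equivalent to the correction proposed by Lancaster (2002), who also shows that maximizing the objective function $Q^{\pi}_{IE} = L + (n/2)\log\sigma^2 + n\,b(\rho)$ results in $\sqrt{n}$-consistent estimates of $(\rho, \sigma^2)$ with $T$ fixed.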
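The function $b(\rho)$ is straightforward to evaluate. The sketch below is illustrative: the function names are hypothetical, and the cross-moment calculation assumes $y_{i0} = 0$, no regressors, and iid errors with unit variance. It computes $b(\rho)$ and its derivative, and checks that the derivative matches $T^{-1}$ times the expected product of summed errors and summed lagged outcomes, which is the quantity driving $\partial\pi_i^{IE}/\partial\rho$.

```python
def b(rho, T):
    """b(rho) = (1/T) * sum_{t=1}^{T-1} ((T - t) / t) * rho**t."""
    return sum((T - t) / t * rho ** t for t in range(1, T)) / T


def b_prime(rho, T):
    """Derivative of b with respect to rho."""
    return sum((T - t) * rho ** (t - 1) for t in range(1, T)) / T


def cross_moment(rho, T):
    """E[(sum_s y_{s-1}) * (sum_r u_r)] with unit error variance and y_0 = 0.

    Under these assumptions y_{s-1} = sum_{j>=0} rho**j * u_{s-1-j} plus terms
    uncorrelated with the errors, so E[y_{s-1} * u_r] = rho**(s-1-r) for r <= s-1.
    """
    total = 0.0
    for s in range(2, T + 1):
        for r in range(1, s):
            total += rho ** (s - 1 - r)
    return total


if __name__ == "__main__":
    T, rho = 5, 0.5
    print("b(rho):", b(rho, T))
    print("b'(rho):", b_prime(rho, T))
    print("cross moment / T:", cross_moment(rho, T) / T)
```

The agreement between `b_prime` and `cross_moment / T` illustrates why the bias term depends on $\rho$ but not on $\beta$ under strict exogeneity.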

3.3 Logit

In this example, the observations for each individual are $y_i$, a $T$-vector containing zeros and ones, and a $T \times k$ array of regressors, $x_i$. The likelihood has the form

$$L = \sum_i y_i'\log\Lambda_i + (\iota_T - y_i)'\log(1-\Lambda_i), \tag{13}$$

where $\Lambda_i$ is a $T$-vector whose $t$th entry is $\Lambda(x_{it}'\beta + \alpha_i)$, and $\Lambda(v) = e^{v}/(1+e^{v})$ is the logistic cumulative distribution function. The penalty function $\pi_{HS}$ is easily constructed using

$$\hat I_{\alpha_i} = \frac{1}{T}\,\Lambda_i'(\iota_T - \Lambda_i), \qquad \hat V_{\alpha_i} = \frac{1}{T}\,(y_i - \Lambda_i)'(y_i - \Lambda_i).$$

Note that if the model is dynamic (e.g., $x_i$ includes past values of $y$), the HAC form of $\hat V_{\alpha_i}$ should be used.
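For a static logit, the two ingredients of the HS penalty can be computed directly from the fitted probabilities. The sketch below is illustrative only: the function names are hypothetical, and it assumes that for a scalar $\alpha_i$ the penalty contribution of individual $i$ is $\frac{1}{2}\hat I_{\alpha_i}^{-1}\hat V_{\alpha_i}$, the form that reproduces $Q_{HS}$ in the Neyman-Scott example.

```python
import math


def logistic_cdf(z):
    """Logistic cumulative distribution function."""
    return 1.0 / (1.0 + math.exp(-z))


def logit_hs_penalty(y, x, beta, alpha):
    """Return (penalty, info, outer) for one individual in a static logit.

    y is a list of T zeros and ones, x is a list of T regressor lists,
    beta is the common coefficient list and alpha the individual effect.
    """
    T = len(y)
    lam = [logistic_cdf(sum(b * v for b, v in zip(beta, row)) + alpha) for row in x]
    # Sample information for alpha_i: (1/T) * Lambda_i'(iota - Lambda_i).
    info = sum(p * (1.0 - p) for p in lam) / T
    # Sample outer product of scores: (1/T) * (y_i - Lambda_i)'(y_i - Lambda_i).
    outer = sum((yt - p) ** 2 for yt, p in zip(y, lam)) / T
    # Assumed scalar form of the HS penalty contribution.
    return 0.5 * outer / info, info, outer


if __name__ == "__main__":
    y = [1, 0, 1, 0]
    x = [[0.0], [0.0], [0.0], [0.0]]
    print(logit_hs_penalty(y, x, beta=[0.3], alpha=0.0))
```

In practice the penalty contributions would be summed over individuals and subtracted from the log-likelihood (13) before handing the objective to a numerical optimizer.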

The logit is a nonlinear example where (8) and (9) can be solved in closed form as in the linear examples. After some manipulation, these equations reduce to

$$\frac{\partial\pi_i^{IE}}{\partial\alpha_i} = \frac{1}{2}\,(\iota_T'\lambda_i)^{-1}(\iota_T'd\lambda_i), \qquad \frac{\partial\pi_i^{IE}}{\partial\beta} = \frac{1}{2}\,(\iota_T'\lambda_i)^{-1}(x_i'd\lambda_i),$$

where $\lambda_i = \Lambda_i \circ (1-\Lambda_i)$ is the logistic density, $d\lambda_i = \lambda_i \circ (1-2\Lambda_i)$ is its derivative, and $\circ$ denotes element-by-element multiplication. In both cases the numerator is the derivative of the denominator, and the solution $\pi_i^{IE} = (1/2)\log(\iota_T'\lambda_i)$ follows easily. Note, however, that the above expressions for $\partial\pi_i/\partial\alpha_i$ and $\partial\pi_i/\partial\beta$ were derived under the assumption that the scores for $\alpha_i$ and their derivatives are iid. Using this form of the penalty function for a dynamic logit would require us to solve (8) and (9) numerically.
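The closed-form IE penalty for the static logit can be verified by finite differences. The sketch below is illustrative: the function names and the sample values are hypothetical, and it assumes the solution $\pi_i^{IE} = \frac{1}{2}\log(\iota_T'\lambda_i)$, so that each partial derivative is one half of the ratio of a summed density derivative to the summed density.

```python
import math


def logistic_cdf(z):
    """Logistic cumulative distribution function."""
    return 1.0 / (1.0 + math.exp(-z))


def _densities(x, beta, alpha):
    """Return (lambda_i, dlambda_i) as lists over t for one individual."""
    cdf = [logistic_cdf(sum(b * v for b, v in zip(beta, row)) + alpha) for row in x]
    lam = [p * (1.0 - p) for p in cdf]
    dlam = [l * (1.0 - 2.0 * p) for l, p in zip(lam, cdf)]
    return lam, dlam


def pi_ie(x, beta, alpha):
    """Assumed IE penalty: one half of the log of the summed logistic density."""
    lam, _ = _densities(x, beta, alpha)
    return 0.5 * math.log(sum(lam))


def dpi_dalpha(x, beta, alpha):
    """Analytic derivative of the IE penalty with respect to alpha_i."""
    lam, dlam = _densities(x, beta, alpha)
    return 0.5 * sum(dlam) / sum(lam)


def dpi_dbeta(x, beta, alpha):
    """Analytic gradient of the IE penalty with respect to beta."""
    lam, dlam = _densities(x, beta, alpha)
    k = len(beta)
    return [0.5 * sum(row[j] * d for row, d in zip(x, dlam)) / sum(lam) for j in range(k)]


if __name__ == "__main__":
    x, beta, alpha, h = [[0.2], [-0.5], [1.0]], [0.7], 0.3, 1e-6
    numeric = (pi_ie(x, beta, alpha + h) - pi_ie(x, beta, alpha - h)) / (2 * h)
    print("numeric:", numeric, "analytic:", dpi_dalpha(x, beta, alpha))
```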

3.4 Probit and Ordered Probit

The probit model has much in common with the logit: it is nonlinear, uses the same observables $y_i$ and $x_i$, and has seen wide use in applied work. Using $\Phi$ to denote the normal cdf, the likelihood has the same form (13) with $\Lambda_i$ replaced by $\Phi_i$, whose $t$th entry is $\Phi(x_{it}'\beta + \alpha_i)$. The expressions for $\hat I_{\alpha_i}$ and $\hat V_{\alpha_i}$, which are similar to the logit, are readily obtained by differentiating the log-likelihood and are omitted for brevity. A key feature of the probit model is that, unlike for the logit, the differential equations (8) and (9) do not have solutions in closed form even in the static case.

The ordered probit model is a simple extension of the probit for which a natural parameterization includes multiple individual-specific effects. We consider a dynamic version of this model, which we will use in our Monte Carlo study and empirical application later. We suppose that the outcomes are given by $y_{it} \in \{-1, 0, 1\}$ and write the log-likelihood as

$$L = \sum_i\sum_t \Big[1\{y_{it}=-1\}\log(1-\bar\Phi_{it}) + 1\{y_{it}=0\}\log(\bar\Phi_{it}-\Phi_{it}) + 1\{y_{it}=1\}\log\Phi_{it}\Big], \tag{14}$$

where $\Phi_{it} = \Phi(\mu_{it})$, $\bar\Phi_{it} = \Phi(\mu_{it} + c_i)$, and $\mu_{it} = \alpha_i + \rho_{1}1\{y_{it-1}=1\} + \rho_{-1}1\{y_{it-1}=-1\} + x_{it}'\beta$. In addition to $\alpha_i$, this model features a second individual-specific parameter, $c_i \geq 0$. Without this second effect, a change in unobserved heterogeneity would always affect the probabilities of the highest and lowest outcomes, $P(y=1)$ and $P(y=-1)$, in opposite directions. This is an economically undesirable restriction in many applications. For example, in our insider trading application, unobserved firm-specific restrictions on insider activity may decrease the probability of both buying and selling. The HS form of the penalty function may readily be obtained by differentiating (14). The form of the derivatives is somewhat cumbersome but quite standard, so the expressions are omitted for brevity. Note that as for the simple probit, the differential equations (8) and (9) are not soluble in closed form.
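The role of the second individual-specific effect can be seen by computing the three outcome probabilities. The sketch below is an illustration under stated assumptions, not the specification used in the empirical application: the function names are hypothetical, and it assumes that buying ($y = 1$) has probability $\Phi(\mu_{it})$ and selling ($y = -1$) has probability $1 - \Phi(\mu_{it} + c_i)$, so that $c_i \geq 0$ widens the no-trade region.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_probs(mu, c):
    """Return (P(y = -1), P(y = 0), P(y = 1)) for index mu and cut shift c >= 0.

    Assumed mapping: buying when the latent index is high, selling when it
    falls below a lower threshold that moves with c.
    """
    p_buy = normal_cdf(mu)
    p_sell = 1.0 - normal_cdf(mu + c)
    p_none = normal_cdf(mu + c) - normal_cdf(mu)
    return p_sell, p_none, p_buy


if __name__ == "__main__":
    base = ordered_probit_probs(0.2, 0.8)
    # A lower individual effect together with a larger c lowers both tails.
    restricted = ordered_probit_probs(-0.1, 1.5)
    print("baseline  :", base)
    print("restricted:", restricted)
```

With a single effect (fixed $c$), lowering $\mu$ would reduce the buying probability but raise the selling probability; allowing $c_i$ to vary is what lets both fall together.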

4. MONTE CARLO STUDY

We present a brief Monte Carlo study that compares the finite sample properties of the ML estimator and our bias corrected penalized likelihood estimators. We consider three example models. The first is the static logit, for which we can explicitly compute both forms of the penalty function discussed in Section 2, and compare the resulting penalized likelihood estimators with several alternative estimators that have been proposed in the literature. The second is the dynamic logit, for which we compare the simple form of our penalty function (10) to several other bias corrected estimators, and the estimator proposed by Honoré and Kyriazidou (2000b), which is consistent and asymptotically normal as $n \to \infty$ with $T$ fixed, although the rate of convergence is slower than $\sqrt{n}$. Finally, we consider an ordered probit with multiple individual-specific effects similar to the model used in the empirical application in Section 5. Because this model includes multiple individual-specific effects, we consider only our corrections in this case. As in Section 3, we refer to the penalty function (10),



(1998), insider trading activity may depend on past returns and differ across value and growth stocks. Firm size is also potentially important as larger firms are more likely to employ stock-based compensation and the impact of earnings on share price can differ across large and small firms. We therefore include log(MVE), the natural log of the firm’s market value of equity 10 days before the announcement, BM, the book to market ratio, and EAPRE6, the return on the firm’s stock from six months to two days before the announcement minus the market return over the same period, as controls. We include EAPOST6, the excess return on the firm’s stock over the period two days to six months after the announcement, to control for nonearning related future returns. Models of informed trade (e.g., Kyle 1985) predict that insider activity should respond to market volume, prompting us to include TURNOVER, the total trading volume during the quarter divided by shares outstanding.

Finally, the information content of earnings announcements may be influenced by institutional ownership and analyst coverage, prompting us to include INST, EEPS, and NUMEST: respectively, the percentage of institutional ownership, the absolute value of announced earnings per share minus analysts’ consensus forecast, and the number of forecasts in the last I/B/E/S consensus forecast released before the earnings announcement date. Because a sizable fraction of the firms in our sample have no analyst coverage, we also include a dummy variable dNUM0, which equals one if no consensus forecast was released.

Our analysis complements Roulstone (2006) in several ways. First, to control for changing market conditions and regulatory reform, we focus on a five-year period from 1996 to 2000. This period falls approximately between two important events in insider trading regulation: a 1997 U.S. Supreme Court decision that upheld the SEC’s ability to prosecute trading on the basis of misappropriated nonpublic information, and the release of a detailed set of disclosure regulations by the SEC in late 2000, which were later expanded in the Sarbanes-Oxley Act of 2002. Roulstone (2006) considers two main specifications, a linear model with firm-specific effects and a Tobit model without firm effects, and estimates both separately for buys and sells. In this application, the ordered probit specification allows us to combine information from both types of transactions while accommodating unobserved firm-level heterogeneity in a very flexible way. We also include indicators for buying and selling in the previous quarter and allow each to have asymmetric effects on the probability of insider activity in the current period. Despite the shorter sample period and different model specification, our results are qualitatively very similar and the unanticipated earnings announcement return, CAR, remains statistically significant both before and after bias correction.

5.3 Results and Discussion

We present estimation results for several versions of the ordered probit model (15) using the data discussed previously. We estimate all models considered by ML. For models with individual-specific effects, we also present penalized likelihood estimates using the penalty function given in Equation (10), which are free from $O(1/T)$ bias, as discussed in Section 2.

We first consider a standard ordered probit model, as given in (15) with the additional restrictions $\rho^c = \beta^c = \gamma^c = 0$. The left and center columns of Table 5 present ML estimates and standard errors for this model, respectively, without and with firm-specific fixed effects. Bias corrected estimates for the model with fixed effects are presented in the right columns of the table. These estimates highlight two important features of our data and estimation procedure. First, unobserved firm-level heterogeneity plays a prominent role. Controlling for firm-specific heterogeneity has a large impact on the coefficients on the lagged outcomes, $1\{y_{it-1}=1\}$ and $1\{y_{it-1}=-1\}$. Book to market and analyst coverage are no longer statistically significant after fixed effects are added. Also, for the fixed effects specification, our bias correction has a substantial impact on several of the coefficient estimates. This is particularly true for the coefficients on lagged outcomes (as previously noted in the Monte Carlo study) but also for CAR, which is significant at the 5% level before bias correction but not afterward.

Table 5. Ordered probit estimates for insider trading data

Firm-specific effects    No               Yes              Yes
Time dummies             Yes              Yes              Yes
Bias corrected                                             *

Index coefficient estimates, $\hat\rho_{-1}, \hat\rho_{1}, \hat\beta, \hat\gamma$
1(y_{t-1} = -1)          0.555 (0.016)    0.121 (0.021)    0.193 (0.021)
1(y_{t-1} = 1)           0.417 (0.017)    0.120 (0.022)    0.190 (0.022)
CAR3                     0.170 (0.081)    0.236 (0.103)    0.187 (0.102)
EAPRE6                   0.460 (0.020)    0.373 (0.027)    0.376 (0.026)
EAPOST6                  0.024 (0.018)    0.017 (0.024)    0.025 (0.024)
TURNOVER                 0.143 (0.041)    0.191 (0.077)    0.200 (0.076)
INST                     0.132 (0.032)    0.167 (0.078)    0.125 (0.076)
BM                       0.113 (0.013)    0.054 (0.031)    0.048 (0.031)
dNUM0                    0.035 (0.018)    0.008 (0.040)    0.010 (0.040)
NUMEST                   0.010 (0.001)    0.007 (0.004)    0.006 (0.004)
EEPS                     0.074 (0.060)    0.065 (0.084)    0.069 (0.083)
log(MVE)                 0.049 (0.005)    0.592 (0.030)    0.539 (0.029)

Log likelihood           29,990           19,726

NOTE: Ordered probit parameter estimates for our insider trading application. The model is as in Equation (15) with the restrictions $\rho^c_{-1} = \rho^c_{1} = \beta^c = \gamma^c = 0$ imposed. Standard errors are shown in parentheses. The left columns show estimates with no firm-specific effects. The center and right columns show estimates with two firm-specific effects before and after bias correction. Variable definitions are given in Table 3.

Coefficient estimates for the asymmetric ordered probit model are presented in Table 6. Estimates of the index coefficients, $\rho$, $\beta$, and $\gamma$, appear in the top panel of the table. In this specification, $\rho^c$, $\beta^c$, and $\gamma^c$ are free parameters, with coefficient estimates displayed in the bottom panel of the table. The columns of the table correspond to ML and our penalized likelihood estimator, for the model (15) with and without time period effects. Several right side variables have asymmetric effects on insider buying and selling, most notably the lagged outcomes, turnover, book to market, and market value. For selling in the previous quarter, the estimated coefficient $\hat\rho_{-1}$ is insignificant at the 5% level while $\hat\rho^c_{-1}$ is highly significant and negative. Market volume displays perhaps the most pronounced asymmetric effect. The coefficient on turnover is insignificant at the 5% level when symmetry is imposed (see Table 5) but is highly significant in both the upper and lower panels of Table 6. Our bias correction again substantially impacts the coefficients on lagged outcomes, and behaves nearly identically when period specific effects are excluded. Regarding the variable of chief interest, CAR, we note that the bias correction substantially reduces the coefficient estimate though it does remain significant at the 5% level. This finding is economically interesting as it provides some evidence that insiders do base current trading decisions on future market reactions to future news releases.

As with any nonlinear model, directly interpreting the coefficient estimates can be difficult. This is particularly true here. For example, for a given right side variable $x^{(j)}$ in (15), we know that if $\gamma_j > 0$, an increase in the value of $x^{(j)}$ will increase the probability of buying by insiders in the given quarter. However, if in addition $\gamma^c_j < 0$ (as is the case with turnover and book to market), an increase in $x^{(j)}$ will raise the level of the index $y^*$, but also push the lower truncation point $c_{it}$ toward zero, leaving the net impact on the probability of insider selling ambiguous. We therefore report average marginal effects, $E[\partial P(y_{it}=1)/\partial x_{it}]$ and $E[\partial P(y_{it}=-1)/\partial x_{it}]$, and bias-correct our estimates of these effects as described in Section 2.4. When $x^{(j)}$ is an indicator, as with our lagged outcomes, we present the discrete analogue, $E[P(y_{it}=\pm 1 \mid x^{(j)}=1, x^{(-j)}) - P(y_{it}=\pm 1 \mid x^{(j)}=0, x^{(-j)})]$. Estimates of these effects for the asymmetric probit model with time period effects are presented for selected variables in Table 7. Note for several right side variables, including turnover, market value, and both lagged outcomes, the bias corrected estimate of one or both partial effects differs from the ML estimate by at least one standard error.

Our estimates of average marginal effects are economically interesting for several reasons. The earnings announcement return CAR has a statistically significant effect on the probability of insider buying (at the 5% level before bias correction and the 10% level afterward) but not on selling. This seems to agree with several results in Roulstone (2006), who suggests that the impact of earnings announcements on sales may be more difficult to estimate because of the presence of liquidity trades (e.g., selling by insiders for portfolio rebalancing purposes). This hypothesis may also be reflected in two other estimated partial effects. For market value, we find that insiders at larger firms (which are more likely to employ stock-based compensation) are less likely to be buyers and more likely to be sellers, with both effects statistically significant at the 1% level. We also find statistically significant effects for previous returns (EAPRE6), while future nonearning announcement returns fail to be statistically significant. In particular, high previous returns decrease the probability of buying and increase the probability of selling, both of which could be explained by insiders’ portfolio rebalancing.

Table 6. Asymmetric ordered probit estimates for insider trading

Firm-specific effects    Yes              Yes              Yes              Yes
Time dummies             Yes              Yes              No               No
Bias corrected                            *                                 *

Index coefficient estimates, $\hat\rho_{-1}, \hat\rho_{1}, \hat\beta, \hat\gamma$
1(y_{t-1} = -1)          0.044 (0.027)    0.047 (0.027)    0.061 (0.026)    0.064 (0.026)
1(y_{t-1} = 1)           0.168 (0.026)    0.305 (0.025)    0.193 (0.025)    0.327 (0.025)
CAR3                     0.322 (0.123)    0.256 (0.122)    0.365 (0.122)    0.303 (0.121)
EAPRE6                   0.385 (0.035)    0.390 (0.034)    0.475 (0.034)    0.475 (0.034)
EAPOST6                  0.046 (0.029)    0.053 (0.028)    0.019 (0.028)    0.028 (0.027)
TURNOVER                 0.648 (0.094)    0.639 (0.093)    0.739 (0.093)    0.718 (0.092)
INST                     0.125 (0.099)    0.111 (0.097)    0.089 (0.094)    0.089 (0.093)
BM                       0.027 (0.036)    0.027 (0.036)    0.156 (0.033)    0.146 (0.033)
dNUM0                    0.006 (0.049)    0.001 (0.048)    0.017 (0.048)    0.007 (0.047)
NUMEST                   0.016 (0.005)    0.015 (0.005)    0.018 (0.005)    0.016 (0.005)
EEPS                     0.030 (0.095)    0.035 (0.093)    0.098 (0.094)    0.100 (0.093)
log(MVE)                 0.462 (0.036)    0.417 (0.035)    0.410 (0.033)    0.368 (0.033)

Cut point coefficient estimates, $\hat\rho^c_{-1}, \hat\rho^c_{1}, \hat\beta^c, \hat\gamma^c$
1(y_{t-1} = -1)          0.100 (0.023)    0.204 (0.024)    0.105 (0.023)    0.209 (0.023)
1(y_{t-1} = 1)           0.060 (0.024)    0.173 (0.024)    0.065 (0.024)    0.176 (0.024)
CAR3                     0.166 (0.110)    0.147 (0.110)    0.199 (0.111)    0.188 (0.111)
EAPRE6                   0.040 (0.032)    0.042 (0.032)    0.097 (0.030)    0.100 (0.029)
EAPOST6                  0.051 (0.025)    0.052 (0.026)    0.004 (0.024)    0.006 (0.023)
TURNOVER                 0.687 (0.093)    0.677 (0.093)    0.684 (0.090)    0.677 (0.090)
INST                     0.052 (0.088)    0.057 (0.087)    0.015 (0.082)    0.012 (0.082)
BM                       0.120 (0.031)    0.115 (0.031)    0.043 (0.029)    0.043 (0.029)
dNUM0                    0.015 (0.043)    0.005 (0.043)    0.006 (0.043)    0.016 (0.043)
NUMEST                   0.017 (0.005)    0.017 (0.520)    0.019 (0.005)    0.019 (0.094)
EEPS                     0.054 (0.090)    0.054 (0.091)    0.094 (0.094)    0.088 (0.094)
log(MVE)                 0.208 (0.033)    0.200 (0.033)    0.166 (0.031)    0.164 (0.031)

Log likelihood           19,580           -                19,892           -

NOTE: Parameter estimates for the asymmetric ordered probit model (15) in our insider trading application, with and without time dummies. Standard errors appear in parentheses. Variable definitions are given in Table 3.

Of our control variables, turnover has the most pronounced asymmetric effect: larger market volume substantially increases the probability of both types of insider activity. This result is of interest, because while the Kyle (1985) model implies that informed agents should trade in larger quantities when market volume is higher, it does not necessarily specify how market volume impacts the decision to trade. The effects of activity in the previous quarter are also asymmetric. With all other observables held fixed, insider selling in the previous quarter increases the probability of selling by insiders in the current quarter by 6.3%, but does not significantly change the probability of buying. Similarly, insider buying last quarter raises the probability of buying in the current quarter by 6.9% and decreases the probability of selling by 1.2%. Book to market has a statistically significant effect on the probability of selling but not on buying. Note that this conclusion is not obvious from inspection of the coefficient estimates in Table 6.

Overall, we find that future earnings announcement returns have a small but quite robust effect on insiders’ decision to trade. Our results are qualitatively similar to Roulstone (2006) who uses the same data but over a much longer time period and with a very different specification. Our penalized likelihood estimator is simple to implement, and the bias correction has an impact on a number of coefficients and average partial effects that is large relative to standard errors.

6. CONCLUSION

In this article, we propose a penalty function approach to estimation of structural parameters in panel data models with individual-specific coefficients. Our penalty function is simple to compute, even for nonlinear dynamic models: it involves only the sample information matrix and outer products of scores, which are already widely used by practitioners for inference. We also consider an alternate construction of our penalty function as the solution to a set of differential equations involving expectations of products of the scores and their derivatives. Interestingly, in two linear example models, this alternate construction of the penalty function results in $\sqrt{n}$-consistent inference for the common structural parameters. For general nonlinear models, however, this form of the penalty function will typically not be available.

The penalty function reduces bias in the resulting extremum estimator from order $1/T$ to order $1/T^2$ asymptotically. We prove this result under asymptotics where $n$ and $T$ go to infinity jointly. We also present a brief Monte Carlo study. The results suggest that the resulting bias corrected estimates will be very useful in settings where $T$ is moderate, as is the case in many micro-econometric applications. Both forms of our penalty function result in bias corrected estimators that perform comparably to other bias corrections proposed in the literature. There is some evidence that other bias corrected estimators that exploit the particular structure of the objective function may have better finite sample properties, and extensions of our approach along these lines may be an interesting avenue for future research.

The Monte Carlo evidence also suggests that the improvement is most dramatic in nonlinear dynamic models, where the simplicity of our approach is of particular advantage. We consider one such example in an empirical study of insider trading activity, where we estimate an ordered probit model with multiple firm-specific parameters. We find fairly robust evidence that insiders base trading decisions on future market returns.

APPENDIX

The conclusion of Theorem 1 will be valid under the following assumption adapted from Conditions 1–7 in Hahn and Kuersteiner (2004).

Assumption A.1. Assume $m/T^{1/2} \to 0$ and that the following hold jointly.

Table 7. Average partial effects for insider trading application

Partial effect           ∂P(y_it = 1)/∂x_it                ∂P(y_it = -1)/∂x_it
Bias corrected                            *                                 *

1(y_{t-1} = -1)          0.008 (0.007)    0.009 (0.007)    0.035 (0.006)    0.063 (0.006)
1(y_{t-1} = 1)           0.032 (0.006)    0.069 (0.007)    0.020 (0.006)    0.012 (0.006)
CAR3                     0.061 (0.031)    0.054 (0.031)    0.027 (0.026)    0.016 (0.026)
EAPRE6                   0.073 (0.012)    0.083 (0.011)    0.070 (0.009)    0.080 (0.008)
EAPOST6                  0.009 (0.008)    0.011 (0.007)    0.002 (0.006)    0.003 (0.007)
TURNOVER                 0.123 (0.023)    0.134 (0.023)    0.032 (0.021)    0.059 (0.021)
BM                       0.005 (0.011)    0.006 (0.011)    0.023 (0.011)    0.029 (0.010)
log(MVE)                 0.088 (0.011)    0.088 (0.011)    0.145 (0.010)    0.163 (0.009)

NOTE: Estimates of average partial effects, $E[\partial P(y_{it}=1)/\partial x_{it}]$ and $E[\partial P(y_{it}=-1)/\partial x_{it}]$, for the asymmetric ordered probit model. Parameter estimates for this model are shown in the left columns of Table 6. Standard errors are given in parentheses. Variables for which neither partial effect is significant at the 10% level are omitted for brevity. Variable definitions are given in Table 3.



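A1. For each $\eta > 0$, $\inf_i\big[G_{(i)}(\theta_0,\alpha_{i0}) - \sup_{\{(\theta,\alpha):\,|(\theta,\alpha)-(\theta_0,\alpha_{i0})|>\eta\}} G_{(i)}(\theta,\alpha)\big] > 0$, where $\hat G_{(i)}(\theta,\alpha_i) \equiv T^{-1}\sum_{t=1}^{T}\varphi(x_{it};\theta,\alpha_i)$ and $G_{(i)}(\theta,\alpha_i) \equiv E[\varphi(x_{it};\theta,\alpha_i)]$.

A2. $n \to \infty$ and $T \to \infty$ such that $n/T \to \rho$ where $0 < \rho < \infty$.

A3. For each $i$, $\{x_{it}, t = 1, 2, \ldots\}$ is a stationary mixing sequence that is independent across $i$. Let $\mathcal{A}^i_t = \sigma(x_{it}, x_{it-1}, x_{it-2}, \ldots)$, $\mathcal{B}^i_t = \sigma(x_{it}, x_{it+1}, x_{it+2}, \ldots)$, and $\kappa_i(m) = \sup_t \sup_{A\in\mathcal{A}^i_t,\,B\in\mathcal{B}^i_{t+m}} |P(A\cap B) - P(A)P(B)|$. Then $\sup_i|\kappa_i(m)| \leq C a^{m}$ for some $a$ such that $0 < a < 1$ and some $C > 0$.

A4. For $\psi = (\theta,\alpha)$, the function $\varphi(\cdot\,;\psi)$ is continuous in $\psi \in \Psi$ where $\Psi$ is a compact, convex subset of $\mathbb{R}^{\dim(\psi)}$.

A5. Let $\nu = (\nu_1, \ldots, \nu_k)$ be a vector of nonnegative integers, $|\nu| = \sum_{j=1}^{k}\nu_j$, and $D^{\nu}\varphi(x_{it};\psi) = \partial^{|\nu|}\varphi(x_{it};\psi)/\partial\psi_1^{\nu_1}\cdots\partial\psi_k^{\nu_k}$. There exists a function $M(x_{it})$ such that $|D^{\nu}\varphi(x_{it};\psi_1) - D^{\nu}\varphi(x_{it};\psi_2)| \leq M(x_{it})\,\|\psi_1 - \psi_2\|$ for all $\psi_1, \psi_2 \in \Psi$ and $|\nu| < 5$, and $M(x_{it})$ satisfies $\sup_{\psi\in\Psi}\|D^{\nu}\varphi(x_{it};\psi)\| \leq M(x_{it})$ and $\sup_i E\big[|M(x_{it})|^{10q+12+\delta}\big] < \infty$ for some integer $q \geq \dim(\psi)/2 + 2$ and some $\delta > 0$.

A6. $\inf_i\inf_T \lambda_{iT} > 0$ where $\lambda_{iT}$ is the smallest eigenvalue of $\mathrm{Var}\big(T^{-1/2}\sum_{t=1}^{T}\big[u_{it}(\theta,\alpha_i) - E[u^{\alpha}_{it}]\,E[v^{\alpha}_{it}]^{-1}v_{it}(\theta,\alpha_i)\big]\big)$.

A7. $\inf_i |E[v^{\alpha}_{it}]| > 0$.

A8. Let $I_i = -E\big[\partial/\partial\theta'\,\big(u_{it}(\theta,\alpha_i) - E[u^{\alpha}_{it}]\,E[v^{\alpha}_{it}]^{-1}v_{it}(\theta,\alpha_i)\big)\big]\big|_{\theta_0,\alpha_{i0}}$, and let $\mu_{il}$ and $\mu_{iu}$ be the minimum and maximum eigenvalues of $I_i$. Then $0 < \inf_i\mu_{il} \leq \sup_i\mu_{iu} < \infty$, and $I \equiv \lim_{n\to\infty} n^{-1}\sum_{i=1}^{n} I_i$ exists and is positive definite.

A9. For $\psi = (\theta,\alpha)$, $\sup_{\psi\in\Psi}\big\|(1/nT)\sum_{i=1}^{n}\pi_i(\theta,\alpha_i)\big\| \xrightarrow{p} 0$.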

Conditions A1–A8 are equivalent to those used in Hahn and Kuersteiner (2004), who provide additional discussion. The restrictions are fairly standard, although A3 does impose stationarity. The initial observations in the simulation study were generated in a way that violated the stationarity assumption, so the Monte Carlo evidence suggests that the correction may be robust to mild violations of this assumption.

Condition A9 is a sufficient condition for the penalty function to vanish asymptotically, and it guarantees that a uniform law of large numbers applies to the penalized objective function. Under A1–A8, a sufficient condition for A9 to be satisfied is that the unpenalized objective function is concave over the parameter space $\Psi$. Although this condition is stronger than necessary, it is simple and will be satisfied in many models of interest. A practically relevant alternative approach is to note that the unpenalized estimates of $\theta$ and the $\alpha_i$ are consistent under A1–A8 using the arguments of Hahn and Kuersteiner (2004). One could then define the parameter space for optimizing the penalized objective function as $\Psi_T$, a neighborhood of the unpenalized estimates whose size decreases to zero as $T \to \infty$. As $T \to \infty$, one would then only need the objective function to be locally concave in a neighborhood of the true parameter value, which seems to be a very mild condition. A different way to guarantee that A9 is satisfied is to trim the denominator when the minimum eigenvalue of $(1/T)\sum_{t=1}^{T} v^{\alpha}_{it}(\theta,\alpha_i)$ is near zero, with the amount of trimming going to zero as $T \to \infty$; proofs using this approach for a different but related problem are provided in Bester and Hansen (2006). In the present article, we adopt the high-level condition for simplicity and brevity.

Given Assumption A.1, we sketch an argument for the proof of Theorem 1 below. For simplicity of notation, we consider the case where $\dim(\alpha_i) = 1$.

A.1 Proof of Theorem 1

Under conditions A1–A8, a uniform law of large numbers applies to the unpenalized objective function following arguments as in Hahn and Kuersteiner (2004); it then follows using the triangle inequality and A9 that the penalized objective function will converge uniformly to the same limit as the unpenalized objective function, which is optimized at the true parameter values. Consistency is then immediate following the usual arguments.

Using the form of the penalty function given in (10), the objective function we seek to optimize is given by

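$$
Q^p(\theta,\alpha_1,\ldots,\alpha_n)=\sum_{i=1}^{n}\sum_{t=1}^{T}\varphi(x_{it};\theta,\alpha_i)
+\sum_{i=1}^{n}\frac{1}{2}\left(\sum_{l=-m}^{m}\;\sum_{t=\max(1,l)}^{\max(T,T+l)} v_{it}(\theta,\alpha_i)\,v_{it-l}(\theta,\alpha_i)\right)\Bigg/\left(\sum_{t=1}^{T} v^{\alpha}_{it}(\theta,\alpha_i)\right).
$$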
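To make the structure of the penalized objective concrete, the following is a minimal Python sketch for the scalar case $\dim(\alpha_i)=1$. The function names `penalty_term` and `penalized_objective` and the array layout are hypothetical and are not taken from the article. The sketch also assumes one particular reading of the summation limits: for each lag $l$ it keeps every $t$ for which both $t$ and $t-l$ lie in the sample.

```python
import numpy as np


def penalty_term(v, v_alpha, m):
    """Penalty contribution of one individual (scalar fixed effect).

    v       : length-T sequence of alpha-scores v_it at a trial (theta, alpha_i)
    v_alpha : length-T sequence of derivatives of v_it with respect to alpha_i
    m       : bandwidth, the largest lag entering the double sum (0 <= m < T)
    """
    v = np.asarray(v, dtype=float)
    v_alpha = np.asarray(v_alpha, dtype=float)
    T = v.shape[0]
    if not 0 <= m < T:
        raise ValueError("bandwidth m must satisfy 0 <= m < T")
    num = 0.0
    for l in range(-m, m + 1):
        # Assumed reading of the limits: keep t with both t and t - l in the sample.
        lo, hi = max(0, l), min(T, T + l)  # zero-based, half-open range for t
        num += np.dot(v[lo:hi], v[lo - l:hi - l])
    # One half of the lag-weighted score products over the summed second derivative.
    return 0.5 * num / v_alpha.sum()


def penalized_objective(phi, v, v_alpha, m):
    """Unpenalized objective plus the summed penalty terms.

    phi, v, v_alpha : (n, T) arrays of objective contributions, alpha-scores,
    and alpha-score derivatives, all evaluated at the same trial parameters.
    """
    phi = np.asarray(phi, dtype=float)
    v = np.asarray(v, dtype=float)
    v_alpha = np.asarray(v_alpha, dtype=float)
    pen = sum(penalty_term(v[i], v_alpha[i], m) for i in range(phi.shape[0]))
    return phi.sum() + pen
```

In a concave problem the summed second derivative is negative, so the added term is negative whenever the lag-weighted sum of score products is positive. In practice the three arrays would be recomputed at every trial value of $(\theta,\alpha_1,\ldots,\alpha_n)$ inside a numerical optimizer.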

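With $\theta$ fixed, we can solve for $\tilde\alpha_i(\theta)=\arg\max_{\alpha_i} Q^p(\theta,\alpha_1,\ldots,\alpha_n)$, which satisfies the first order condition

$$
\begin{aligned}
0={}&\sum_{t=1}^{T} v_{it}(\theta,\tilde\alpha_i(\theta))
+\left(\sum_{l=-m}^{m}\;\sum_{t=\max(1,l)}^{\max(T,T+l)} v_{it}(\theta,\tilde\alpha_i(\theta))\,v^{\alpha}_{i(t-l)}(\theta,\tilde\alpha_i(\theta))\right)\Bigg/\left(\sum_{t=1}^{T} v^{\alpha}_{it}(\theta,\tilde\alpha_i(\theta))\right)\\
&-\frac{1}{2}\left(\sum_{l=-m}^{m}\;\sum_{t=\max(1,l)}^{\max(T,T+l)} v_{it}(\theta,\tilde\alpha_i(\theta))\,v_{it-l}(\theta,\tilde\alpha_i(\theta))\right)\times\left(\sum_{t=1}^{T} v^{\alpha\alpha}_{it}(\theta,\tilde\alpha_i(\theta))\right)\Bigg/\left(\sum_{t=1}^{T} v^{\alpha}_{it}(\theta,\tilde\alpha_i(\theta))\right)^{2}.
\end{aligned}
\tag{16}
$$

The score for $\theta$, $(\partial/\partial\theta)\,Q^p(\theta,\tilde\alpha_1(\theta),\ldots,\tilde\alpha_n(\theta))$, is also given by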

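$$
\begin{aligned}
\sum_{i=1}^{n}\Bigg[&\sum_{t=1}^{T} u_{it}(\theta,\tilde\alpha_i(\theta))
+\left(\sum_{l=-m}^{m}\;\sum_{t=\max(1,l)}^{\max(T,T+l)} v_{it}(\theta,\tilde\alpha_i(\theta))\,v^{\theta}_{i(t-l)}(\theta,\tilde\alpha_i(\theta))\right)\Bigg/\left(\sum_{t=1}^{T} v^{\alpha}_{it}(\theta,\tilde\alpha_i(\theta))\right)\\
&-\frac{1}{2}\left(\sum_{l=-m}^{m}\;\sum_{t=\max(1,l)}^{\max(T,T+l)} v_{it}(\theta,\tilde\alpha_i(\theta))\,v_{it-l}(\theta,\tilde\alpha_i(\theta))\right)\times\left(\sum_{t=1}^{T} v^{\alpha\theta}_{it}(\theta,\tilde\alpha_i(\theta))\right)\Bigg/\left(\sum_{t=1}^{T} v^{\alpha}_{it}(\theta,\tilde\alpha_i(\theta))\right)^{2}\Bigg].
\end{aligned}
$$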
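The differentiation behind the first order condition (16) and the score for $\theta$ can be written out explicitly. The following is a sketch under two conventions introduced here for illustration only: the shorthand $N_i$ and $D_i$, which is not part of the article's notation, and a double sum $\sum_{l,t}$ that is read as running over all pairs $(t,t-l)$ inside the sample with $|l|\le m$. Function arguments are suppressed.

```latex
% Shorthand (hypothetical, for this sketch only)
N_i=\sum_{l,t} v_{it}\,v_{it-l},\qquad D_i=\sum_{t=1}^{T} v^{\alpha}_{it}.

% Quotient rule applied to the penalty term
\frac{\partial}{\partial\alpha_i}\left(\frac{1}{2}\frac{N_i}{D_i}\right)
 =\frac{1}{2}\frac{\partial N_i/\partial\alpha_i}{D_i}
 -\frac{1}{2}\frac{N_i\sum_{t=1}^{T} v^{\alpha\alpha}_{it}}{D_i^{2}}.

% The set of lags is symmetric, so relabeling (t,l) -> (t-l,-l) shows that
% the two halves of the product rule coincide
\frac{\partial N_i}{\partial\alpha_i}
 =\sum_{l,t}\left(v^{\alpha}_{it}\,v_{it-l}+v_{it}\,v^{\alpha}_{i(t-l)}\right)
 =2\sum_{l,t} v_{it}\,v^{\alpha}_{i(t-l)}.

% The same steps with \theta in place of \alpha_i
\frac{\partial N_i}{\partial\theta}=2\sum_{l,t} v_{it}\,v^{\theta}_{i(t-l)},
\qquad
\frac{\partial D_i}{\partial\theta}=\sum_{t=1}^{T} v^{\alpha\theta}_{it}.
```

The factor of 2 from the symmetry step cancels the leading $1/2$, which is why the second term of (16) carries no $1/2$ while the third term does. Because $\tilde\alpha_i(\theta)$ sets the derivative of $Q^p$ with respect to $\alpha_i$ to zero, the total derivative of the concentrated objective with respect to $\theta$ reduces to the partial derivative with $\alpha_i$ held at $\tilde\alpha_i(\theta)$.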


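Using $v^{\theta}_{it}=u^{\alpha}_{it}$, $v^{\alpha\theta}_{it}=u^{\alpha\alpha}_{it}$, and (16), and defining $U_{it}=u_{it}-\gamma_i v_{it}$, where $\gamma_i=E[u^{\alpha}_{it}]/E[v^{\alpha}_{it}]$ and additional superscripts denote partial derivatives, we can write the score for $\theta$ as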

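$$
\begin{aligned}
\sum_{i=1}^{n}\Bigg[&\sum_{t=1}^{T} U_{it}(\theta,\tilde\alpha_i(\theta))
+\left(\sum_{l=-m}^{m}\;\sum_{t=\max(1,l)}^{\max(T,T+l)} v_{it}(\theta,\tilde\alpha_i(\theta))\,U^{\alpha}_{i(t-l)}(\theta,\tilde\alpha_i(\theta))\right)\Bigg/\left(\sum_{t=1}^{T} v^{\alpha}_{it}(\theta,\tilde\alpha_i(\theta))\right)\\
&-\frac{1}{2}\left(\sum_{l=-m}^{m}\;\sum_{t=\max(1,l)}^{\max(T,T+l)} v_{it}(\theta,\tilde\alpha_i(\theta))\,v_{it-l}(\theta,\tilde\alpha_i(\theta))\right)\times\left(\sum_{t=1}^{T} U^{\alpha\alpha}_{it}(\theta,\tilde\alpha_i(\theta))\right)\Bigg/\left(\sum_{t=1}^{T} v^{\alpha}_{it}(\theta,\tilde\alpha_i(\theta))\right)^{2}\Bigg],
\end{aligned}
$$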

where $\sum_{i=1}^{n}\sum_{t=1}^{T}U_{it}(\theta,\tilde\alpha_i(\theta))$ is the score considered in Hahn and Kuersteiner (2004) evaluated at $\tilde\alpha_i(\theta)$ instead of $\hat\alpha_i(\theta)=\arg\max_{\alpha_i}\sum_{t=1}^{T}\varphi(x_{it};\theta,\alpha_i)$ and the remaining terms correspond to the bias in the scores. The conclusion then follows by the usual approach of expanding the modified score about $\theta=\theta_0$, solving for $\tilde\theta-\theta_0$, and then expanding the first term in the solution, which corresponds to the usual score, about $\tilde\alpha_i(\theta_0)=\alpha_{i0}$, using (16) to provide the expression for $\tilde\alpha_i(\theta_0)-\alpha_{i0}$. Under the conditions of Assumption 1 and using similar arguments to Hahn and Kuersteiner (2004), we can then show that

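$$
\begin{aligned}
\tilde\theta-\theta_0=\mathcal{I}_{\theta}^{-1}\Bigg\{&\frac{1}{nT}\sum_{i=1}^{n}\sum_{t=1}^{T}u_{it}
+T^{-1}\frac{1}{n}\sum_{i=1}^{n}\Bigg(E[u^{\alpha}_{it}]\,E[v^{\alpha}_{it}]^{-1}(B_i-\pi_i)
+E\Bigg[\frac{1}{T}\sum_{t=1}^{T}u^{\alpha}_{it}\sum_{t=1}^{T}\psi_{it}\Bigg]\\
&+\frac{1}{2}E[u^{\alpha\alpha}_{it}]\,E\Bigg[\frac{1}{T}\sum_{t=1}^{T}\psi_{it}\sum_{t=1}^{T}\psi_{it}\Bigg]-\pi^{\theta}_{i}\Bigg)\Bigg\}+o_p(1/T).
\end{aligned}
$$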

Under Assumption A.1 and by construction of the $\pi_i$, we then have the conclusion. ∎
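The substitution that rewrites the score for $\theta$ in terms of $U_{it}$ in the proof of Theorem 1 can be checked term by term. The following is a sketch of that algebra, treating $\gamma_i$ as a constant and suppressing the arguments $(\theta,\tilde\alpha_i(\theta))$; the abbreviation $\sum_{l,t}$ for the double sum over lags and time periods is shorthand introduced here. Because the right-hand side of (16) equals zero, $\gamma_i$ times that expression can be subtracted from the $i$th summand of the score without changing its value.

```latex
% Leading term
\sum_{t=1}^{T}u_{it}-\gamma_i\sum_{t=1}^{T}v_{it}=\sum_{t=1}^{T}U_{it}.

% Numerator of the second term, using v^{\theta}_{it}=u^{\alpha}_{it}
\sum_{l,t}v_{it}\left(v^{\theta}_{i(t-l)}-\gamma_i v^{\alpha}_{i(t-l)}\right)
 =\sum_{l,t}v_{it}\left(u^{\alpha}_{i(t-l)}-\gamma_i v^{\alpha}_{i(t-l)}\right)
 =\sum_{l,t}v_{it}\,U^{\alpha}_{i(t-l)}.

% Factor in the third term, using v^{\alpha\theta}_{it}=u^{\alpha\alpha}_{it}
\sum_{t=1}^{T}\left(v^{\alpha\theta}_{it}-\gamma_i v^{\alpha\alpha}_{it}\right)
 =\sum_{t=1}^{T}\left(u^{\alpha\alpha}_{it}-\gamma_i v^{\alpha\alpha}_{it}\right)
 =\sum_{t=1}^{T}U^{\alpha\alpha}_{it}.
```

The denominators $\sum_{t}v^{\alpha}_{it}$ and $(\sum_{t}v^{\alpha}_{it})^{2}$ and the factor $\sum_{l,t}v_{it}v_{it-l}$ are common to both expressions and carry through unchanged.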

ACKNOWLEDGMENTS

The authors thank Victor Chernozhukov, Whitney Newey, and Tim Conley for useful discussion and advice regarding this article. We are also grateful to two anonymous referees and an associate editor for their extremely useful comments and suggestions. We are also indebted to Darren Roulstone for sharing his insider trading data with us. Of course, all remaining errors are ours. This work has been supported by funding from the William S. Fishman Faculty Research Fund and the IBM Corporation Faculty Research Fund at the Graduate School of Business, the University of Chicago.

[Received August 2007. Revised October 2007.]

REFERENCES

Alvarez, J., and Arellano, M. (2003), “The Time Series and Cross-Section Asymptotics of Dynamic Panel Data Estimators,” Econometrica, 71, 1121–1159.

Amemiya, T. (1985): Advanced Econometrics, Cambridge, MA: Harvard University Press.

Anderson, E. (1970), “Asymptotic Properties of Conditional Maximum Likelihood Estimators,” Journal of the Royal Statistical Society: Series B, 32 (2), 283–301.

Andrews, D. W. K. (1991), “Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimation,” Econometrica, 59, 817–858.

Andrews, D. W. K., and Monahan, J. C. (1992), “An Improved Heteroskedasticity and Autocorrelation Consistent Covariance Matrix Estimator,” Econometrica, 60, 953–966.

Arellano, M. (2003), “Discrete Choice with Panel Data,” Investigacion Economica, 27, 423–458.

Arellano, M., and Hahn, J. (2005): “Understanding Bias in Nonlinear Panel Models: Some Recent Developments,” Invited Lecture, London: Econometric Society World Congress.

Arellano, M., and Honoré, B. (2001): “Panel Data Models: Some Recent Developments,” in Handbook of Econometrics, Vol. 5, eds. J.J. Heckman and E. Leamer, Amsterdam: North–Holland.

Bester, C. A., and Hansen, C. B. (2006), “Bias Reduction for Bayesian and Frequentist Estimators,” SSRN Working Paper, available at www.ssrn.com.

Carro, J. M. (2006), “Estimating Dynamic Panel Data Discrete Choice Models,” Journal of Econometrics, forthcoming.

Chamberlain, G. (1984): “Panel Data,” in Handbook of Econometrics, Vol. 2, eds. Z. Griliches and M. Intriligator, Amsterdam: North–Holland.

——— (1985): “Longitudinal Analysis of Labor Market Data,” in Heterogeneity, Omitted Variable Bias, and Duration Dependence, eds. J.J. Heckman and B. Singer. New York; Cambridge, UK: Cambridge University Press.

Cox, D. R., and Reid, N. (1987), “Parameter Orthogonality and Approximate Conditional Inference (with Discussion),” Journal of the Royal Statistical Society: Series B, 49, 1–39.

Fernández-Val, I. (2004): “Bias Correction in Panel Data Models with Individual Specific Parameters,” Mimeo.

——— (2005): “Estimation of Structural Parameters and Marginal Effects in Binary Choice Panel Data Models with Fixed Effects,” Mimeo.

Firth, D. (1993), “Bias Reduction of Maximum Likelihood Estimates,” Biometrika, 80, 27–38.

Hahn, J., and Kuersteiner, G. (2004): “Bias Reduction for Dynamic Nonlinear Panel Models with Fixed Effects,” Mimeo.

Hahn, J., and Newey, W. K. (2004), “Jackknife and Analytical Bias Reduction for Nonlinear Panel Models,” Econometrica, 72, 1295–1319.

Honoré, B. E. (1992), “Trimmed LAD and Least Squares Estimation of Truncated and Censored Models with Fixed Effects,” Econometrica, 60, 533–565.

Honoré, B. E., and Kyriazidou, E. (2000a), “Estimation of Tobit-Type Models with Individual Specific Effects,” Econometric Reviews, 19, 341–366.

——— (2000b), “Panel Data Discrete Choice Models with Lagged Dependent Variables,” Econometrica, 68, 839–874.

Horowitz, J. L., and Lee, S. (2004), “Semiparametric Estimation of a Panel Data Proportional Hazard Model with Fixed Effects,” Journal of Econometrics, 119, 155–198.

Kyle, A. S. (1985), “Continuous Auctions and Insider Trading,” Econometrica, 53, 1315–1335.

Lancaster, T. (2002), “Orthogonal Parameters and Panel Data,” The Review of Economic Studies, 69, 647–666.

Manski, C. (1987), “Semiparametric Analysis of Random Effects Linear Models from Binary Panel Data,” Econometrica, 55, 357–362.

Meulbroek, L. K. (1992), “An Empirical Analysis of Illegal Insider Trading,” The Journal of Finance, 47, 1661–1699.

Newey, W. K., and West, K. D. (1987), “A Simple, Positive Semi-Definite Heteroskedasticity and Autocorrelation Consistent Covariance Matrix,” Econometrica, 55, 703–708.

——— (1994), “Automatic Lag Selection in Covariance Matrix Estimation,” The Review of Economic Studies, 61, 631–654.

Neyman, J., and Scott, E. L. (1948), “Consistent Estimates Based on Partially Consistent Observations,” Econometrica, 16, 1–32.

Parzen, E. (1957), “On Consistent Estimates of the Spectrum of a Stationary Time Series,” Annals of Mathematical Statistics, 41, 44–58.

Roulstone, D. T. (2006): “Insider Trading and the Information Content of Earnings Announcements,” Chicago GSB Working Paper.


Rozeff, M., and Zaman, M. (1998), “Overreaction and Insider Trading: Evidence from Growth and Value Portfolios,” The Journal of Finance, 53, 701–716.

Sartori, N. (2003), “Modified Profile Likelihoods in Models with Stratum Nuisance Parameters,” Biometrika, 90, 533–549.

Severini, T. A. (1998), “An Approximation to the Modified Profile Likelihood Function,” Biometrika, 85, 403–411.

Severini, T. A. (2000): Likelihood Methods in Statistics, Oxford: Oxford University Press.

——— (2002), “Modified Estimating Functions,” Biometrika, 89, 333–343.

Velasco, C. (2000), “Local Cross-Validation for Spectrum Bandwidth Choice,” Journal of Time Series Analysis, 21, 329–361.

White, H. (1982), “Maximum Likelihood Estimation of Misspecified Models,” Econometrica, 50 (1), 1–25.

Woutersen, T. (2005): “Robustness against Incidental Parameters and Mixing Distributions,” Mimeo.