Directory UMM :Data Elmu:jurnal:J-a:Journal of Econometrics:Vol95.Issue2.2000:

Journal of Econometrics 95 (2000) 375}389

Empirically relevant critical values for
hypothesis tests: A bootstrap approach
Joel L. Horowitz*, N.E. Savin
Department of Economics, University of Iowa, 108 Pappajohn Business Bldg., Iowa City IA 52242, USA

Abstract
Tests of statistical hypotheses can be based on either of two critical values: the Type
I critical value or the size-corrected critical value. The former usually depends on
unknown population parameters and cannot be evaluated exactly in applications, but it
can often be estimated very accurately by using the bootstrap. The latter does not depend
on unknown population parameters but is likely to yield a test with low power. The
critical values used in most Monte Carlo studies of the powers of tests are neither Type
I nor size-corrected. They are irrelevant to empirical research. ( 2000 Elsevier Science
S.A. All rights reserved.
JEL classixcation: C12; C15
Keywords: Hypothesis test; Critical value; Size; Type I error; Bootstrap

1. Introduction
A key objective in classical testing of statistical hypotheses is achieving good

power while controlling the probability that the test makes a Type I error, that
is, the probability that the test rejects a correct null hypothesis, H . If H is
0
0

* Corresponding author. Tel.: #1-319-335-0844; fax: #1-319-335-1956.
E-mail address: joel-horowitz@uiowa.edu (J.L. Horowitz)

0304-4076/00/$ - see front matter ( 2000 Elsevier Science S.A. All rights reserved.
PII: S 0 3 0 4 - 4 0 7 6 ( 9 9 ) 0 0 0 4 2 - 1

376

J.L. Horowitz, N.E. Savin / Journal of Econometrics 95 (2000) 375}389

simple, then controlling the probability that the test makes a Type I error is not
di$cult. A simple H completely speci"es the data generation process (DGP).
0
Therefore, the "nite-sample distribution of a test statistic under a simple H can
0

be calculated, as can the probability that the test makes a Type I error for any
critical value. Similarly, it is possible to calculate the critical value corresponding to any probability of a Type I error.
Most hypotheses in econometrics are composite, meaning that H does not
0
completely specify the DGP. It speci"es only that the DGP belongs to a given
set. As a consequence, the sampling distribution of the test statistic under H is
0
unknown except in special cases because it depends on where the true DGP is in
the set speci"ed by H . One way of solving this problem is through the concept
0
of the size of the test. The size is the supremum of the test's rejection probability
over all DGPs contained in H . The a-level size-corrected critical value is the
0
critical value that makes the size equal to a. In principle, the exact size-corrected
critical value can be calculated if it exists, but this is rarely done in applications,
possibly because doing so typically entails very di$cult computations.
A second approach to dealing with a composite H is to base the test on
0
an estimator of the Type I critical value. This is the critical value that would
be obtained if the exact "nite-sample distribution of the test statistic under the

true DGP were known. In general, the true Type I critical value is unknown
because the exact "nite-sample distribution of the test statistic depends on
population parameters that are not speci"ed by H . However, an approxima0
tion to the Type I critical value often can be obtained by using the ("rst-order)
asymptotic distribution of the test statistic under H to approximate its "nite0
sample distribution. This approximation is useful because most test statistics in
econometrics are asymptotically pivotal: their asymptotic distributions do not
depend on unknown population parameters when the hypothesis being tested is
true. Thus, an approximate Type I critical value can be obtained from asymptotic distribution theory without knowledge of where the true DGP is in the set
speci"ed by H .
0
The main disadvantage of this approach is that "rst-order asymptotic approximations can be very inaccurate with samples of the sizes encountered in
applications. As a result, the true and nominal probabilities that a test rejects
a correct H can be very di!erent when the critical value is obtained from the
0
asymptotic distribution of the test statistic. We argue in Section 4, however, that
the bootstrap can often be used to obtain a good estimator of the true Type
I critical value. When the bootstrap does this, the resulting test based on an
estimated Type I critical value has greater power than a test based on an exact
size-corrected critical value (if one exists). We argue later in this paper that in

most testing situations, the best way to achieve high power while maintaining
control over the probability of a Type I error is to base the test on a good
estimate of the Type I critical value. Such an estimate is often provided by the
bootstrap.

J.L. Horowitz, N.E. Savin / Journal of Econometrics 95 (2000) 375}389

377

In econometrics, Monte Carlo studies of the "nite-sample powers of tests
usually take a di!erent approach to obtaining critical values. Most studies
report powers based on critical values that are called size-corrected but, as is
explained later in this paper, are really critical values for essentially arbitrary
simple null hypotheses. They have no empirical analogs and cannot be used in
applications. In other words, the size-corrected critical values usually obtained
in Monte Carlo studies of power are both misnamed and irrelevant to empirical
research. We propose an alternative method for obtaining critical values in
power studies. This method can be implemented in applications by using the
bootstrap. Thus it provides a way to obtain empirically relevant critical values
in Monte Carlo studies of power and in applications.

Possibly because of the empirical irrelevance of most size-corrected critical
values reported in the econometrics literature, applied research relies heavily on
asymptotic distribution theory to obtain critical values for tests. Reliance on
asymptotic distribution theory can blur the distinction between size-corrected
and Type I critical values. We emphasize the distinction here so as to avoid
possible confusion later in this paper. The asymptotic null distribution of an
asymptotically pivotal test statistic does not depend on unknown population
parameters, so it may appear that the asymptotic critical value approximates the
size-corrected critical value. This appearance is misleading except in special
cases. Except in special cases, the asymptotic distribution of a test statistic is not
a good approximation to its "nite sample distribution uniformly over the
null-hypothesis set (Beran, 1988). Regardless of how large the sample is, there
may be points in the null hypothesis for which the asymptotic distribution is
very far from the exact "nite-sample distribution. Such points are typically
crucial for determining the size-corrected critical value. When they are, the
critical value obtained from the asymptotic distribution need not be close to the
size-corrected critical value even if the sample is very large. Except in special
cases, the asymptotic critical value approximates the Type I critical value at the
parameter point corresponding to the true DGP. It does not approximate the
size-corrected critical value. This distinction is discussed further in Section 3.

The remainder of this paper is organized as follows. The Type I critical value
is de"ned in Section 2. The size-corrected critical value is reviewed in Section 3,
and the bootstrap critical value is developed in Section 4. These critical values
are all based on sampling under the assumption that the null hypothesis is true.
In Section 5, we discuss the properties of the bootstrap critical value when the
null hypothesis is false. This section also describes our method for obtaining
critical values for use in Monte Carlo studies of power. Section 6 concludes the
paper.
The statistical tests that we consider in this paper consist of a test statistic and
a critical value. The paper is concerned with the choice of critical value given the
test statistic. The paper does not address the important problem of the choice of
a test statistic. In addition, we restrict attention to ®ular' hypotheses whose

378

J.L. Horowitz, N.E. Savin / Journal of Econometrics 95 (2000) 375}389

test statistics are asymptotically normally or chi-square distributed under the null
hypothesis. Thus, for example, we do not consider hypotheses that are de"ned
through one-sided inequality constraints such as H : b*c for some population

0
parameter b and constant c. We do consider hypotheses of the form H : b"c.
0
2. Type I critical values when the null is true
Let the data be a random sample of size n from a probability distribution
whose cumulative distribution function (CDF) is F. Denote the data by
MX : i"1,2, nN. F is assumed to belong to a family of CDFs that is indexed by
i
the "nite or in"nite-dimensional parameter h whose population value hH. We
write F (x, hH) for P (X)x) and F ( ) , h) for a general member of the parametric
family. The unknown parameter h is restricted to a parameter set H. The null
hypothesis H restricts h to a subset H of H. If H is composite, then H
0
0
0
0
contains two or more points.
Let ¹ "¹ (X , 2, X ) be a statistic for testing H . Let
n
n 1

n
0
G [q, F ( ) , h)],P(¹ )qDh) be the exact "nite sample CDF of ¹ when the
n
n
n
CDF of the sampled distribution is F ( ) , h). Consider a symmetrical, two-tailed
test of H . H is rejected by such a test if D¹ D exceeds a suitable critical value and
0 0
n
accepted otherwise. The exact, a-level Type I critical value of D¹ D, z , is de"ned
n na
as the solution to the equation G [z , F( ) , hH)]!G [!z , F( ) , hH)]"1!a
n na
n
na
for hH3H . A test based on this critical value rejects H if D¹ D'z . Such a test
0
0
n

na
makes a Type I error with probability a. However, z can be calculated in
na
applications only in special cases. If H is simple so that H contains only one
0
0
point, then hH is speci"ed by H , and z can be calculated or estimated with
0
na
arbitrary accuracy by Monte Carlo simulation. If, as usually happens in econometrics, H is composite, then hH is unknown and z cannot be evaluated unless
0
na
G [q, F( ) , h)] does not depend on the location of h3H when H is true. In this
n
0
0
special case, ¹ is said to be pivotal. The Student t statistic for testing a hypothen
sis about the mean of a normal population or a slope coe$cient in a normal
linear regression model is pivotal. However, pivotal test statistics are not
available in most econometric applications unless strong distributional assumptions are made. When ¹ is not pivotal, its Type I critical value z can be very

n
na
di!erent at di!erent points in H . This is illustrated by Example 1 at the end of
0
this section.
When H is composite and ¹ is not pivotal, it is necessary to replace the true
0
n
Type I critical value with an approximation or estimator. First-order asymptotic distribution theory provides one approximation. Most test statistics in
econometrics are asymptotically pivotal. Indeed, the asymptotic distributions of
most commonly used test statistics are standard normal or chi-square under H ,
0
regardless of the details of the DGP. If n is su$ciently large and ¹ is asympn
totically pivotal, then G [ ) , F( ) , hH)] can be approximated accurately by the
n

J.L. Horowitz, N.E. Savin / Journal of Econometrics 95 (2000) 375}389

379


asymptotic distribution of ¹ . The asymptotic distribution does not depend on
n
the location of hH3H when ¹ is asymptotically pivotal and H is true, so
0
n
0
approximate critical values for ¹ can be obtained from the asymptotic distribun
tion without having to know hH.
Critical values obtained from asymptotic distribution theory are widely used
in applications. However, Monte Carlo experiments have shown that "rst-order
asymptotic theory often gives a poor approximation to the distributions of test
statistics with the sample sizes available in applications. As a result, the true and
nominal probabilities that a test makes a Type I error can be very di!erent when
an asymptotic critical value is used. Under conditions explained later, the
bootstrap provides an approximation that is more accurate than the approximation of "rst-order asymptotic theory.
We conclude this section with an example that illustrates the extent to which
the Type I critical value of a test can vary among DGPs in the null-hypothesis
parameter set H .
0
Example 1. Sensitivity of the Type I critical value to the location of hH in H .
0
This example uses Monte Carlo simulation to compute exact Type I critical
values for the Chesher (1983) and Lancaster (1984) version of White's (1982)
information-matrix (IM) test of a binary probit model. The IM test is a speci"cation test for parametric models that are estimated by maximum likelihood. It
tests the null hypothesis that the Hessian and outer-product forms of the
information matrix are equal. Rejection implies that the model is misspeci"ed.
In this example, the model being tested is
P (>"1D X"x)"U(b@x),

(2.1)

where > is a dependent variable whose only possible values are 0 and 1,
X"(X , X , X )@ is a vector of explanatory variables, b"(b , b , b )@ is a vec1 2 3
1 2 3
tor of parameters, and U is the standard normal CDF. As in Horowitz (1994),
X consists of constant, X , and two covariates, X X , whose values are
1
2 3
sampled independently from the N(0, 1) distribution and are the same in
all Monte Carlo replications. Each Monte Carlo replication consists of generating an estimation data set of size n"100 by random sampling from the model
(2.1) with a given value of b, estimating b by maximum likelihood, and
computing the IM test statistic using the full vector of indicators. In other
words, all distinct elements of the Hessian and outer product information
matrices are tested simultaneously. The 0.05-level Type I critical value is the 0.95
quantile of the empirical distribution of the IM statistic obtained in 1000 such
replications.
Type I critical values were calculated for the grid of b values obtained by
allowing each component of b to range over the interval [0, 2] in units of 0.2.
The 0.05 Type I critical values varied from 11.95 to 83.56, depending on the

380

J.L. Horowitz, N.E. Savin / Journal of Econometrics 95 (2000) 375}389

point in the grid. For example, "xing b and b at 0 and letting b vary yields the
2
3
1
following critical values:
b
0
0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0
1
Crit. Val. 11.95 12.45 14.30 17.08 20.96 26.35 34.92 47.98 67.02 77.60 83.56
The asymptotic critical value is 11.07. Thus, "rst-order asymptotic theory
provides a very poor approximation to the Type I critical value at most points in
H . The main reason is that Studentization of the test statistic entails estimation
0
of the covariance matrix of the covariance matrix of a parameter estimator
(Chesher and Spady, 1991). Estimators of such higher-order moments of a distribution are often very imprecise.

3. Size-corrected critical values
When ¹ is not pivotal, the size of a test based on ¹ can be calculated and
n
n
controlled if H is a su$ciently small set. Suppose that H is rejected if
0
0
D ¹ D'c for some c and accepted otherwise. The size of this test is a and the
n
na
na
a-level, size-corrected critical value is c if
na
sup P(D¹ D'c )"a.
n
na
h|H0
The size can be interpreted as the supremum of the power function when H is
0
true.
The a-level Type I critical value cannot exceed the a-level size-corrected
critical value. That is, z )c . This follows from the de"nition of the size of
na
na
a test. Moreover, the probability of a Type I error is less than or equal to a when
the test uses the a-level size-corrected critical value, whereas it is exactly a with
the Type I critical value:
P (D¹ D'c ))P (D¹ D'z )"a
n
na
n
na
for hH3H . If ¹ is pivotal, the Type I and size-corrected critical values are
0
n
equal. Otherwise, they are unequal except in special cases. The result that
z )c is illustrated in Example 1. If H contains the grid used in that example,
na
na
0
the 0.05-level size-corrected critical value must be at least 83.56. This value
exceeds all of the other Type I critical values in the grid. A further important
implication of z )c is that the power of a test based on c cannot exceed and
na
na
na
may be much smaller than the power of a test based on z . This point is
na
discussed further in Section 5.
The advantage of the size of a test in comparison to the probability of a Type
I error is that the size and the size-corrected critical value do not depend on

J.L. Horowitz, N.E. Savin / Journal of Econometrics 95 (2000) 375}389

381

h3H . Therefore, a size-corrected test can be implemented without knowledge
0
of h. The a-level size-corrected critical value is the solution to a nonlinear
optimization problem and usually cannot be calculated analytically. In principle, it can be estimated with arbitrary accuracy by numerical methods. The
feasibility of doing this in practice depends on the dimension of H . Numerical
0
estimation of the size-corrected critical value is feasible if the dimension is low
and almost certainly infeasible if it is in"nite.
In addition to the di$culty of computing size-corrected critical values, sizecorrected tests (that is, tests based on size-corrected critical values) have two
fundamental problems. First, the size-corrected critical value may be in"nite, in
which case a test based on the size concept has no power. Second, even if the
size-corrected critical value is "nite, the power of the test may be the same as its
size. Dufour (1997) gives several examples of the "rst problem. Bahadur and
Savage (1956) demonstrate the second for the case of testing a hypothesis about
a population mean. Both problems are consequences of having a null-hypothesis set H that is too large. Dufour (1997) has shown that this happens in a wide
0
variety of settings that are important in applied econometrics. Thus, even if
computation of a size-corrected critical value is not an issue, size-corrected tests
are available only when H is a su$ciently small set. Usually, this is accomp0
lished by restricting F( ) , h) to a suitably small, "nite-dimensional family of
functions.
Size-corrected critical values are rarely computed in practice, possibly because of the foregoing computational and conceptual problems. However, many
papers in econometrics claim to report size-corrected critical values, usually as
the results of Monte Carlo experiments designed to compare the powers of
competing tests. In most of these papers, what is really computed is the exact
Type I critical value at a speci"c and essentially arbitrary h in H .1 The
0
empirical relevance of such a critical value is questionable at best. Because
the chosen h value is arbitrary, the critical value has no empirical analog
and cannot be implemented in an application. Except, perhaps, in special cases,
the chosen h does not correspond to the true DGP in an application if H is true
0

1 A search of Econometrica, the Journal of Econometrics, and Econometric Theory yielded the
following examples of papers that do this: Abadir (1995), Baltagi et al. (1992), Bierens (1997),
Burridge and Guerre (1996), Cavanagh et al. (1995), Cheung and Lai (1997), Choi and Ahn (1995),
Choi et al. (1997), Davidson and MacKinnon (1984), Donald (1997), Durlauf (1991), Elliot et al.
(1996), Gri$ths and Judge (1992), Evans and King (1985), Hansen (1996), Hassan and Koenker
(1997), Haug (1996), Herce (1996), Hidalgo and Robinson (1996), Hong and White (1995), Hong
(1996), King (1985), Lee (1992), Lee et al. (1993), Lieberman (1997), Lo et al. (1989), LuK tkepohl and
Poskitt (1996), MacKinnon and White (1985), Nankervis and Savin (1985), Ng and Perron (1997),
Perron (1997), Rahman and King (1996), Saikkonen and Luukkonen (1997), Staiger and Stock
(1997), Toda (1995), Tyoda and Ohtani (1986), Tsurumi and She#in (1985), and Whang (1998).

382

J.L. Horowitz, N.E. Savin / Journal of Econometrics 95 (2000) 375}389

and is not relevant to testing H under whatever DGP has generated the data if
0
H is false.
0
We close this section by re-emphasizing that critical values obtained from the
asymptotic distribution of a test statistic do not approximate size-corrected
critical values in general, regardless of how large n is. For many important tests,
the size-corrected critical value is in"nite for all "nite values of n (Dufour, 1997;
De Jong et al., 1992). The asymptotic critical values of these tests are "nite,
however. Thus, there is no sense in which the asymptotic critical value approximates the size-corrected critical value. Instead, when n is su$ciently large, the
asymptotic critical value approximates the Type I critical value at the point
h corresponding to the true DGP. This point has also been made by Beran
(1988).

4. The bootstrap critical value when the null is true
The bootstrap provides a way to obtain approximations to the Type I critical
value of a test and the probability of a Type I error that are more accurate than
the approximations of "rst-order asymptotic distribution theory. The bootstrap
does this by using the information in the sample to estimate h and, thereby,
G [ ) , F( ) , hH)]. The estimator of G [ ) , F( ) , hH)] is G ( ) , F ) where
n
n
n
n
F (x)"F(x, h ) and h is an n1@2-consistent estimator of h3H . The idea is that
n
n
n
0
h has a high probability of being close to hH when H is true. Therefore, F is
n
0
n
close to F( ) , hH). The bootstrap estimator of the a-level Type I critical value for
D¹ D, zH , solves G (zH , F ) ! G (!zH , F )"1!a. Thus, the bootstrap esna n
n
n na n
n na
timator of the Type I critical value is, in fact, the exact Type I critical value under
sampling from the distribution whose CDF is F( ) , h ).
n
Usually, G ( ) , F ) and zH cannot be evaluated analytically. They can, howna
n
n
ever, be estimated with arbitrary accuracy by carrying out a Monte Carlo
experiment in which random samples are drawn from F . Although the bootn
strap is usually implemented by Monte Carlo simulation, its essential characteristic is the use of F to approximate F in G [ ) , F( ) , hH)], not the method that is
n
n
used to evaluate G ( ) , F ). From this perspective, the bootstrap is an analog
n
n
estimator in the sense of Manski (1988); it simply replaces the unknown F with
a sample analog.
The bootstrap usually provides a good approximation to G [ ) , F( ) , hH)] and
n
z if n is su$ciently large. This is because under mild regularity conditions,
na
sup D F (x)!F(x, hH) D and sup D G (q, F ) ! G [q, F( ) , hH)] D converge to zero
x n
q n
n
n
in probability or almost surely. Of course, "rst-order asymptotic distribution
theory also provides a good approximation if n is su$ciently large. It turns out,
however, that if ¹ is asymptotically pivotal and certain technical conditions are
n
satis"ed, the bootstrap approximations are more accurate than those of "rstorder asymptotic theory. See Hall (1992) for the details.

J.L. Horowitz, N.E. Savin / Journal of Econometrics 95 (2000) 375}389

383

In particular, the bootstrap is more accurate than "rst-order asymptotic
theory for estimating distribution of a &smooth' asymptotically pivotal statistic.
It can be shown that
P(D¹ D'zH )"a#O(n~2)
na
n
when H is true and regularity conditions are satis"ed. See, for example,
0
Hall (1992). Thus, with the bootstrap critical value and under regularity
conditions, the di!erence between the true and nominal probabilities that
a symmetrical test makes a Type I error is O(n~2) if the test statistic is
asymptotically pivotal. In contrast, when a critical value based on "rst-order
asymptotic theory is used, the di!erence is O(n~1). This ability to improve upon
"rst-order asymptotic approximations makes the bootstrap an attractive
method for estimating Type I critical values. Horowitz (1997) presents implementation details and the results of Monte Carlo experiments showing that the
use of bootstrap critical values can dramatically reduce the di!erence between
the true and nominal probability that a test makes a Type I error.2

5. Critical values when the null hypothesis is false
The discussion of Type I critical values up to this point has assumed that H is
0
true so that there is a hH3H corresponding to the true DGP. The exact, a-level
0
Type I critical value of a symmetrical test based on the statistic ¹ is the 1!a
n
quantile of the distribution of D¹ D that is induced by the DGP corresponding to
n
h"hH3H . When H is false, there is no hH3H that corresponds to the true
0
0
0
DGP, so it is not immediately clear what h value should be used to obtain the
exact Type I critical value. In this section, we propose using a speci"c h called
the pseudo-true value. The bootstrap estimates the exact Type I critical value at
the pseudo-true value of h. Therefore, when H is false, the bootstrap provides
0
an empirical analog of a test based on the exact Type I critical value evaluated at
the pseudo-true value of h.
The problem discussed in this section does not arise with size-corrected
critical values or asymptotic critical values for asymptotically pivotal test
statistics. This is because size-corrected critical values and asymptotic critical

2 Regularity conditions under which the bootstrap provides O(n~2) accuracy are satis"ed in many
but not all settings of interest in econometrics. The conditions are satis"ed, for example, by ®ular'
maximum-likelihood and method-of-moments estimators. When the regularity conditions are not
satis"ed, the accuracy provided by the bootstrap may be worse than O(n~2) or the bootstrap may be
inconsistent. Satisfaction of the conditions for achieving O(n~2) accuracy does not guarantee good
numerical performance of the bootstrap in an application, but the results of many Monte Carlo
experiments suggest that the bootstrap performs very well in many settings that are important in
applied econometrics.

384

J.L. Horowitz, N.E. Savin / Journal of Econometrics 95 (2000) 375}389

values for asymptotically pivotal statistics do not depend on h. Thus, the
problem discussed in this section arises only in connection with higher-order
approximations to the Type I critical value such as that provided by the
bootstrap.
Section 5.1 de"nes the pseudo-true parameter value and describes the behavior of the bootstrap when H is false. Section 5.2 explains how critical values
0
based on the pseudo-true parameter value can be obtained in Monte Carlo
studies of the powers of tests. Section 5.3 provides a numerical illustration of the
relative powers of tests based on size-corrected critical values and Type I critical
values evaluated at the pseudo-true h.
5.1. The Type I critical value when H0 is false
When H is false the true parameter value hH is in the complement of H in H.
0
0
There is no h3H that corresponds to the true DGP. What value of h should
0
then be used to de"ne the Type I critical value? If H is simple, H consists of
0
0
a single point, and this point can be used to de"ne the Type I critical value
whether H is true or false. If H is composite, however, there are many points
0
0
in H, and it is not clear which of them, if any, should be used to de"ne the Type
I critical value. Our solution to this problem is guided by two considerations.
First, in an application it is not known whether H is true or false. Second, it
0
must be possible to estimate the chosen h value consistently, regardless of
whether H is true or false.
0
As has already been explained, we recommend using the bootstrap to estimate
the Type I critical value. Therefore, it is useful to consider how the bootstrap
operates when H is false. Let the test statistic be ¹ , and let its CDF under the
0
n
true DGP be G [ ) , F( ) , hH)]. As has already been explained, the bootstrap
n
estimator of the Type I critical value is obtained from the distribution whose
CDF is G [ ) , F( ) , h )], where h is an estimator of h. Under regularity conditions
n
n
n
(see, e.g. White, 1982; Amemiya, 1985) h converges in probability or almost
n
surely as nPR to a nonstochastic limit h0 and n1@2(h !h0)"O (1). If H is
n
1
0
true, then h0"hH. Otherwise, h0OhH in general and will be called the pseudotrue value of h. Regardless of whether H is true, the bootstrap computes the
0
distribution of ¹ at a point h that is a consistent estimator of h0. Therefore, it is
n
n
natural to let h0 be the parameter point used to calculate the Type I critical value
when H is false. Doing this insures that h0 and the Type I critical value are
0
consistently estimable (e.g. by the bootstrap) regardless of whether H is true.
0
Indeed, it can be shown that when H is false, the bootstrap provides a higher0
order approximation to the Type I critical value based on h0 (Horowitz, 1994).
The Type I critical value based on h0 is related in a straightforward way to the
critical value obtained from "rst-order asymptotic approximations. If ¹ is
n
asymptotically pivotal with asymptotic CDF G ( ) ) and G satis"es mild
=
n
smoothness conditions, then as nPR, G [q, F( ) , h0)]PG (q) almost surely or
n
=

J.L. Horowitz, N.E. Savin / Journal of Econometrics 95 (2000) 375}389

385

in probability uniformly over q. Therefore, the critical value based on h0 converges almost surely or in probability to the asymptotic critical value as nPR.
The pseudo-true value may or may not be in H . It is in H if the restrictions
0
0
of H are imposed on the estimator h . Otherwise, h0 is not in H . It is of little
0
n
0
importance whether h0 is in H when H is false. The important considerations
0
0
are twofold: (a) the h value used to obtain the Type I critical value coincides with
hH when H is true and has an empirical analog that can be implemented in
0
applications regardless of whether H is true, and (b) the resulting test has good
0
power in comparison to alternatives when H is false. We have already ex0
plained that h0 satis"es the "rst of these conditions. Moreover, convergence of
the Type I critical value based on h0 to the asymptotic critical value insures that
a test based on this Type I critical value inherits any asymptotic optimality
properties of a test based on the asymptotic critical value. Section 5.3 provides
numerical illustrations of the "nite sample power of tests based on Type I critical
values with h"h0.
5.2. Obtaining Type I critical values in Monte Carlo studies of power
Because the Type I critical value based on h0 has an empirical analog that can
be implemented in applications, it is more attractive for use in Monte Carlo
studies of power than the so-called but misnamed &size-corrected' critical values
that are often used in such studies. As was explained in Section 3, the critical
values obtained in most Monte Carlo studies of power have no empirical analog
and, therefore, are irrelevant to applications. The bootstrap method for obtaining Type I critical values that has been proposed in this paper can be implemented in Monte Carlo studies by incorporating bootstrap estimation of the
critical value into the Monte Carlo procedure. That is, at each Monte Carlo
replication, the bootstrap can be used to estimate the critical value conditional
on the data set generated at that replication. This critical value is then used to
decide whether H is accepted or rejected in the replication. The bootstrap
0
method can also be implemented by using numerical techniques to compute
h0 for the DGP under consideration and then using the methods of conventional
Monte Carlo studies to estimate the exact, "nite-sample critical value at h0.
Horowitz (1994) provides illustrations of both of these ways of obtaining critical
values in a Monte Carlo study of power.
5.3. A numerical illustration of power
This section presents a numerical example that illustrates:
(1) The relation between the power of an exact test based on the Type I critical
value at h"h0 and the same test based on the bootstrap critical value, and

386

J.L. Horowitz, N.E. Savin / Journal of Econometrics 95 (2000) 375}389

(2) the large loss of power that can result from using the size-corrected critical
value instead of the Type I critical value based at h"h0.
Example 2. The powers of tests based on Type I and size-corrected critical values.
This example uses Monte Carlo simulation to compute the power of the
Chesher (1983) and Lancaster (1984) version of the IM test of a binary probit
model. Power is computed using the Type I critical value at the pseudo-true
parameter value, the bootstrap critical value, and a version of the size-corrected
critical value.
The model tested in this example is (2.1). Data are generated by simulation
from two di!erent models. These are
P(>"1DX"x)"U(b@xe~0.75b{x)

(5.1)

and
P(>"1DX"x)"U(b@x#1.5x x ).
(5.2)
2 3
In each model, X is as described in Example 1, and b"(0.5, 1, 1)@, In (5.2),
x and x are the second and third components of x. The sample size is n"100.
2
3
Model (5.1) is a heteroskedastic binary probit model. It is equivalent to the
model >"I(b@X#;'0), where I is the indicator function and ; is an
unobserved random variable whose distribution is N(0, eb{X). When data are
generated from (5.1), model (2.1) is misspeci"ed because it does not take account
of heteroskedasticity. When data are generated from (5.2), model (2.1) is misspeci"ed because it does not take account of the interaction term X X .
2 3
The powers of the Chesher}Lancaster IM test based on Type I critical values
at h0 and on bootstrap critical values are reported in Horowitz (1994), who also
gives further details of the simulation procedure. The results are:
Power with
Model
(5.1)
(5.2)

Type I critical
value
0.715
0.925

Bootstrap critical
value
0.667
0.875

Size-corrected
critical value
0.000
0.000

As expected, the powers based on Type I and bootstrap critical values are
close.
Now consider the power based on the size-corrected critical value. The results
of Example 1 show that this critical value is at least 83.56 if the parameter set for
b is at least as large as [0, 2]3. The table above shows that with this critical value,
the power of the test is zero against both alternatives. This result can easily be
understood. The Type I critical values at h0 are 21.7 and 23.4 with models (5.1)
and (5.2), respectively. The low power of the size-corrected test results from its
use of a critical value that is much larger than the Type I critical value.

J.L. Horowitz, N.E. Savin / Journal of Econometrics 95 (2000) 375}389

387

6. Conclusions
This paper has been concerned with the choice of critical values for tests of
statistical hypotheses. Two types of critical values have been de"ned, the Type
I critical value and the size-corrected critical value. These critical values are
conceptually distinct and their numerical values are unequal except in special
cases that usually entail strong distributional assumptions. The Type I critical
value usually depends on details of the data generation process that are unknown in applications, so it cannot be evaluated exactly in applications. However, it can often be estimated very accurately by using the bootstrap. It can also
be estimated by using "rst-order asymptotic distribution theory, but the resulting approximation can be highly inaccurate in samples of practical size. The
size-corrected critical value does not depend on unknown population parameters, so it does not require making asymptotic approximations that may be
inaccurate. It can, however, be di$cult to compute, possibly intractably so.
More importantly, the size-corrected critical value is not necessarily "nite. Even
when it is "nite, the power of a size-corrected test is likely to be lower than the
power of the same test with the Type I critical value. Often, the power of
a size-corrected test is equal to or less than its size. We conclude, therefore, that
tests based on Type I critical values are preferable to size-corrected tests. Such
tests can often be implemented empirically with the bootstrap or, when the
sample size is su$ciently large, by use of "rst-order asymptotic approximations.
This paper has also pointed out that the methods used to obtain the so-called
size-corrected critical values in most Monte Carlo studies of the "nite-sample
powers of tests cannot be implemented in applications. The critical values
obtained in these studies are irrelevant to empirical research. Because the
methods proposed in this paper can be implemented in applications, it seems to
us that they should replace existing methods for obtaining critical values in
Monte Carlo studies of the powers of tests.

Acknowledgements
We thank Art Goldberger, Beth Ingram, Tom Rothenberg, and Paul Weller
for comments on an earlier draft of this paper. The research of Joel L. Horowitz
was supported in part by NSF grant SBR-9617925.

References
Abadir, K.M., 1995. A new test for nonstationarity against the stable alternative. Econometric
Theory 11, 81}104.
Amemiya, T., 1985. In: Advanced Econometrics. Harvard University Press, Cambridge.

388

J.L. Horowitz, N.E. Savin / Journal of Econometrics 95 (2000) 375}389

Bahadur, R.R., Savage, L.J., 1956. The nonexistence of certain statistical procedures in nonparametric problems. Annals of Mathematical Statistics 27, 1115}1122.
Baltagi, B.H., Chang, Y.-J., Li, Q., 1992. Monte carlo results on several new and existing test for the
error component model. Journal of Econometrics 54, 95}120.
Beran, R., 1988. Prepivoting test statistics: a bootstrap view of asymptotic re"nements. Journal of
the American Statistical Association 83, 687}697.
Bierens, H.J., 1997. Testing the unit root with drift hypothesis against nonlinear trend stationarity, with an application to the U.S. price level and interest rate. Journal of Econometrics 81,
29}64.
Burridge, P., Guerre, E., 1996. The limit distribution of level crossings of a random walk, and
a simple unit root test. Econometric Theory 12, 705}723.
Cavanagh, C.L., Elliott, G., Stock, J.H., 1995. Inference in models with nearly integrated regressors.
Econometric Theory 11, 1131}1147.
Chesher, A., 1983. The information matrix test. Economics Letters 13, 45}48.
Chesher, A., Spady, R., 1991. Asymptotic expansions of the information matrix test statistic.
Econometrica 59, 787}815.
Choi, I., Ahn, B.C., 1995. Testing for cointegration in a system of equations. Econometric Theory 11,
952}983.
Choi, I., Park, J.Y., Yu, B.C., 1997. Cannonical cointegration regression and testing in the presence
of I(1) and I(2) variables. Econometric Theory 13, 850}876.
Chueng, Y.-W., Lai, K.S., 1997. Bandwidth selection, prewhitening, and the power of the Phillips}Perron test. Econometric Theory 13, 679}691.
Davidson, R., MacKinnon, J.G., 1984. Convenient speci"cation tests for logit and probit models.
Journal of Econometrics 25, 241}262.
De Jong, D.N., Nankervis, J.C., Savin, N.E., Whiteman, C.H., 1992. The power problems of unit root
tests in time series with autoregressive errors. Journal of Econometrics 53, 323}343.
Donald, S.G., 1997. Inference concerning the number of factors in a multivariate nonparametric
relationship. Econometrica 65, 103}132.
Dufour, J.-M., 1997. Some impossibility theorems in econometrics with applications to structural
and dynamic models. Econometrica 65, 1365}1387.
Durlauf, S.N., 1991. Spectral based testing of the martingale hypothesis. Journal of Econometrics 50,
355}376.
Elliott, G., Rothenberg, T.J., Stock, J.H., 1996. E$cient test for an autoregressive unit root.
Econometrica 64, 813}836.
Evans, M.A., King, M.L., 1985. A point optimal test for heteroskedastic disturbances. Journal of
Econometrics 27, 163}178.
Gri$ths, W., Judge, G., 1992. Testing and estimating location vectors when the error covariance
matrix is unknown. Journal of Econometrics 54, 96}120.
Hall, P., 1992. The Bootstrap and Edgeworth Expansion. Springer, New York.
Hansen, B.E., 1996. Inference when a nuisance parameter is not identi"ed under the null hypothesis.
Econometrica 64, 413}430.
Hasan, N.N., Koenker, R.W., 1997. Robust rank tests of the unit root hypothesis. Econometrica 65,
1033}1162.
Haug, A.A., 1996. Tests for cointegration: a Monte Carlo comparison. Journal of Econometrics 71,
89}116.
Herce, M.A., 1996. Asymptotic theory of LAD estimation in a unit root process with "nite variance
errors. Econometric Theory 12, 129}153.
Hidalgo, J., Robinson, P., 1996. Testing for structural change in long-memory environment. Journal
of Economics 70, 159}174.
Hong, Y., 1996. Consistent testing for serial correlation of unknown form. Econometrica 64,
837}864.

J.L. Horowitz, N.E. Savin / Journal of Econometrics 95 (2000) 375}389

389

Hong, Y., White, H., 1995. Consistent speci"cation testing via nonparametric series regression.
Econometrica 63, 1133}1160.
Horowitz, J.L., 1994. Bootstrap-based critical values for the information matrix test. Journal of
Econometrics 61, 395}411.
Horowitz, J.L., 1997. Bootstrap methods in econometrics: theory and numerical performance. In:
Kreps, D.M., Wallis, K.F. (Eds.) Advances in Economics and Econometrics: Theory and
Applications, Vol. 3. Cambridge University Press, New York, pp. 188}222.
King, M.L., 1985. A point optimal test for autoregressive disturbances. Journal of Econometrics 27,
21}38.
Lancaster, T., 1984. The covariance matrix of the information matrix test. Econometrica 52,
1051}1053.
Lee, H.S., 1992. Maximum likelihood inference on cointegration and seasonal cointegration. Journal
of Econometrics 54, 1}48.
Lee, T.H., White, H., Granger, C.W.J., 1993. Testing for neglected nonlinearity in time series models.
Journal of Econometrics 56, 269}290.
Lieberman, O., 1997. The e!ect of nonnormality. Econometric Theory 13, 79}118.
Lo, A.W., MacKinlay, A.C., 1989. The size and power of the variance ratio test in "nite samples:
a Monte Carlo investigation. Journal of Econometrics 40, 203}238.
LuK tkepohl, H., Poskitt, D.S., 1996. Testing for causation using in"nite order vector autoregressive
processes. Econometric Theory 12, 61}89.
MacKinnon, J.G., White, H., 1985. Some heteroskedasticity-consistent covariance matrix estimators
with improved "nite sample properties. Journal of Econometrics 29, 305}326.
Manski, C.F., 1988. Analog Estimation Methods in Econometrics Chapman & Hall, New York.
Nankervis, J.C., Savin, N.E., 1985. Testing the autoregressive parameter with the-statistic. Journal of
Econometrics 27, 143}162.
Ng. S., Perron, P., 1997. Estimation and inference in nearly balanced nearly cointegrated systems.
Journal of Econometrics 79, 53}82.
Perron, P., 1997. Further evidence on breaking trend functions in macroeconomic variables. Journal
of Econometrics 80, 355}388.
Rahman, S., King, M.L., 1996. Marginal likelihood score-based tests of regression disturbances in
the presence of nuisance parameters. Journal of Econometrics 82, 81}106.
Saikkonen, P., Luukkonen, R., 1997. Testing cointegration in in"nite order vector autoregression
processes. Journal of Econometrics 81, 93}126.
Staiger, D., Stock, J.H., 1997. Instrumental variables regression with weak instruments. Econometrica 65, 557}586.
Toda, H., 1995. Finite sample performance of likelihood ratio tests for cointegrating ranks in vector
autoregressions. Econometric Theory 11, 1015}1032.
Toyoda, T., Ohtani, K., 1986. Testing equality between sets of coe$cients after a preliminary test for
equality of disturbance variances in two linear regressions. Journal of Econometrics 31, 67}80.
Tsurumi, H., She#in, N., 1985. Some tests for constancy of regressions under heterskedasticity.
Journal of Econometrics 27, 221}234.
Whang, Y.-J., 1998. A test of autocorrelation in the presence of heteroskedasticity of unknown form.
Econometric Theory 14, 87}122.
White, H., 1982. Maximum likelihood estimation of misspeci"ed models. Econometrica 50, 1}26.