3. What if you have multiway clustering and few clusters? Sometimes we are worried about multiway clustering but one or both of the ways has few clusters. Currently, we are not aware of an ideal approach to deal with this problem. One potential solution is to add sufficient control variables to minimize concerns about clustering in one of the ways, and then use a one-way few-clusters cluster-robust approach on the other way. Another potential solution is to model one of the ways of clustering parametrically, such as with a common shock or an autoregressive error model. You can then construct a variance estimator that is a hybrid of the parametric model and the cluster-robust approach in the remaining dimension.
VII. Extensions
The preceding material has focused on the OLS and FGLS estimators and on tests of a single coefficient. The basic results generalize to multiple hypothesis tests, instrumental variables (IV) estimation, nonlinear estimators, and generalized method of moments (GMM).
These extensions are incorporated in Stata, though Stata generally computes test p-values and confidence intervals using standard normal and chi-squared distributions rather than T and F distributions. And for nonlinear models, stronger assumptions are needed to ensure that the estimator of $\beta$ retains its consistency in the presence of clustering. We provide a brief overview.
A. Cluster-Robust F-tests
Consider Wald joint tests of several restrictions on the regression parameters. Except in the special case of linear restrictions and OLS with iid normal errors, asymptotic theory yields only a chi-squared distributed statistic, such as $W$, that is $\chi^2_h$ distributed, where $h$ is the number of linearly independent restrictions. Alternatively, we can use the related $F$ statistic, $F = W/h$. This yields the same p-value as the chi-squared test if we treat $F$ as being $F_{h,\infty}$ distributed. In the cluster case, a finite-sample adjustment instead treats $F$ as being $F_{h,G-1}$ distributed. This is analogous to using the $T_{G-1}$ distribution rather than $N[0,1]$ for a test on a single coefficient.
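To illustrate how much the choice of reference distribution can matter, the following is a minimal sketch in Python, assuming SciPy is available. The function name `wald_p_values` and the numbers passed to it are hypothetical and chosen only for illustration; they do not come from any data set discussed here.

```python
from scipy import stats

def wald_p_values(W, h, G):
    """Return F = W/h with the asymptotic and finite-cluster p-values.

    W : Wald statistic (treated as chi-squared with h df asymptotically)
    h : number of linearly independent restrictions
    G : number of clusters
    """
    F = W / h
    p_chi2 = stats.chi2.sf(W, df=h)            # chi-squared(h); same as F(h, infinity)
    p_F = stats.f.sf(F, dfn=h, dfd=G - 1)      # finite-cluster adjustment F(h, G-1)
    return F, p_chi2, p_F

# Hypothetical inputs: a Wald statistic of 9 for 3 restrictions with 10 clusters.
F, p_chi2, p_F = wald_p_values(W=9.0, h=3, G=10)
print(f"F = {F:.2f}, chi-squared p = {p_chi2:.4f}, F(h, G-1) p = {p_F:.4f}")
```

With few clusters, the $F_{h,G-1}$ p-value is larger than the chi-squared p-value for a statistic in the upper tail, so the adjustment makes rejection less likely. As $G$ grows, the two p-values converge.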
In Stata, the finite-sample adjustment of using the $T_{G-1}$ distribution for a t-test on a single coefficient, and the $F_{h,G-1}$ distribution for an F-test, is only done after OLS regression with command regress. Otherwise, Stata reports critical values and p-values based on the $N[0,1]$ and $\chi^2_h$ distributions.
Thus, Stata does no finite-cluster correction for tests and confidence intervals following instrumental variables estimation commands, nonlinear model estimation commands, or even after command regress in the case of tests and confidence intervals using commands testnl and nlcom. The discussion in Section VI was limited to inference after OLS regression, but it seems reasonable to believe that for other estimators one should also base inference on the $T_{G-1}$ and $F_{h,G-1}$ distributions, and even then tests may overreject when there are few clusters.
Some of the few-cluster methods of Section VI can be extended to tests of more than one restriction following OLS regression. The Wald test can be based on the bias-adjusted variance matrices CR2VE or CR3VE, rather than CRVE. For a bootstrap with asymptotic refinement of a Wald test of $H_0: R\beta = r$, in the $b$th resample we compute
$$W_b = (R\hat{\beta}_b - R\hat{\beta})' \left[ R \hat{V}_{clu}[\hat{\beta}_b] R' \right]^{-1} (R\hat{\beta}_b - R\hat{\beta}).$$
Extension of the data-determined degrees of freedom method of Section VID to tests of more than one restriction requires, at a minimum, extension of Theorem 4 of Bell and McCaffrey (2002) from the case that covers a single component of $\beta$ to $R\beta$. An alternative ad hoc approach would be to use the $F_{h,v}$ distribution, where $v$ is an average (possibly weighted by estimator precision) of the $v$ defined in Equation 26, computed separately for each exclusion restriction.
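The bootstrap Wald statistic described above can be sketched as follows. This is an illustrative sketch only, under several assumptions that are not taken from the text: it uses a pairs cluster bootstrap (resampling whole clusters with replacement), the basic CRVE without any small-sample scaling factor, and simulated data. The function names `ols_crve`, `wald_stat`, and `pairs_cluster_bootstrap_wald` are hypothetical.

```python
import numpy as np

def ols_crve(X, y, cluster):
    """OLS coefficients and the basic cluster-robust variance (no scaling factor)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(cluster):
        rows = cluster == g
        score = X[rows].T @ resid[rows]        # K-vector X_g' u_g for cluster g
        meat += np.outer(score, score)
    return beta, XtX_inv @ meat @ XtX_inv

def wald_stat(diff, R, V):
    """Quadratic form diff' [R V R']^{-1} diff."""
    return float(diff @ np.linalg.solve(R @ V @ R.T, diff))

def pairs_cluster_bootstrap_wald(X, y, cluster, R, r, B=199, seed=0):
    rng = np.random.default_rng(seed)
    beta_hat, V_hat = ols_crve(X, y, cluster)
    W = wald_stat(R @ beta_hat - r, R, V_hat)  # original-sample Wald statistic
    row_sets = [np.flatnonzero(cluster == g) for g in np.unique(cluster)]
    G = len(row_sets)
    W_boot = np.empty(B)
    for b in range(B):
        draw = rng.integers(0, G, size=G)      # resample clusters with replacement
        take = np.concatenate([row_sets[j] for j in draw])
        # Relabel so that a cluster drawn twice is treated as two clusters.
        new_id = np.concatenate(
            [np.full(row_sets[j].size, pos) for pos, j in enumerate(draw)]
        )
        beta_b, V_b = ols_crve(X[take], y[take], new_id)
        # Center at R beta_hat, as in the resample statistic W_b.
        W_boot[b] = wald_stat(R @ beta_b - R @ beta_hat, R, V_b)
    p_value = float(np.mean(W_boot >= W))      # share of resample statistics >= W
    return W, W_boot, p_value

# Simulated data (hypothetical): 20 clusters, cluster-level random effect,
# and a test that both slope coefficients are zero (true in the simulation).
rng = np.random.default_rng(1)
G, n_per = 20, 15
cluster = np.repeat(np.arange(G), n_per)
X = np.column_stack([np.ones(G * n_per), rng.normal(size=(G * n_per, 2))])
y = 1.0 + rng.normal(size=G)[cluster] + rng.normal(size=G * n_per)
R = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
r = np.zeros(2)
W, W_boot, p_value = pairs_cluster_bootstrap_wald(X, y, cluster, R, r, B=199)
print(f"W = {W:.3f}, bootstrap p-value = {p_value:.3f}")
```

The resample statistic is centered at the original estimate rather than at $r$, so the bootstrap distribution approximates the distribution of the Wald statistic under the null. With very few clusters, some resamples may yield a singular variance matrix; this sketch does not handle that case.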
For the estimators discussed in the remainder of Section VII, the rank of $\hat{V}_{clu}[\hat{\beta}]$ is again the minimum of $G-1$ and the number of parameters $K$. This means that at most $G-1$ restrictions can be tested using a Wald test, in addition to the usual requirement that $h \le K$.
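For intuition on the rank bound, the following minimal sketch checks it numerically in the OLS case, using simulated data and the basic CRVE without a small-sample scaling factor (both are assumptions made for illustration). The cluster scores $X_g'\hat{u}_g$ sum to zero by the OLS normal equations, so the $G$ scores span at most $G-1$ dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)
G, n_per, K = 4, 25, 6                         # fewer clusters than parameters
N = G * n_per
cluster = np.repeat(np.arange(G), n_per)

# Simulated regressors (with intercept) and outcome with a cluster-level effect.
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = rng.normal(size=G)[cluster] + rng.normal(size=N)

beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta

# One row per cluster: the K-vector X_g' u_g.
scores = np.vstack([X[cluster == g].T @ resid[cluster == g] for g in range(G)])

XtX_inv = np.linalg.inv(X.T @ X)
V_clu = XtX_inv @ scores.T @ scores @ XtX_inv  # basic CRVE

# Numerical rank: count singular values above a relative tolerance.
sing = np.linalg.svd(V_clu, compute_uv=False)
rank = int(np.sum(sing > 1e-8 * sing[0]))
print("rank of CRVE:", rank, " min(G - 1, K):", min(G - 1, K))
```

In this setup a Wald test of all $K = 6$ coefficients could not be computed, because $R\hat{V}_{clu}R'$ would be singular for any $R$ with more than $G - 1 = 3$ rows.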
B. Instrumental Variables Estimators