Empirical Example Cluster-Randomized Experiments

to account for the clustered nature of the data. The design of these experiments should therefore also account for clustering. Fitzsimons et al. (2012) use a wild cluster bootstrap in an experiment with 12 treated and 12 control clusters. Traditional guidance for computing power analyses and minimum detectable effects (see Duflo, Glennerster, and Kremer 2007, pp. 3918–22; Hemming and Marsh 2013) is based on assumptions of either independent errors or, in a clustered setting, a random effects common-shock model. Ideally, one would account for more general forms of clustering in these calculations (the types of clustering that motivate cluster-robust variance estimation), but this can be difficult to do ex ante. If you have a data set that is similar to the one you will be analyzing later, then you can assign a placebo treatment and compute the ratio of cluster-robust standard errors to default standard errors. This ratio can provide a sense of how to adjust the traditional measures used in the design of experiments.
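The placebo exercise can be sketched in a few lines. The Python code below is an illustrative sketch under stated assumptions, not a prescribed procedure: the data are simulated (24 clusters of 50 observations with a cluster-level common shock) as a stand-in for a real pilot data set, the placebo is assigned to half of the clusters at random, and the function names `slope_standard_errors` and `placebo_se_ratio` are hypothetical. The cluster-robust variance uses a finite-sample correction of the form G/(G-1) × (N-1)/(N-K), which is one common choice.

```python
import math
import random


def slope_standard_errors(y, x, cluster):
    """OLS of y on an intercept and x; return (beta, default_se, cluster_se)."""
    n = len(y)
    k = 2  # intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    x_dm = [xi - x_bar for xi in x]  # x with the intercept partialled out
    sxx = sum(v * v for v in x_dm)
    beta = sum(v * yi for v, yi in zip(x_dm, y)) / sxx
    resid = [yi - y_bar - beta * v for yi, v in zip(y, x_dm)]

    # Default standard error, which assumes i.i.d. errors.
    s2 = sum(u * u for u in resid) / (n - k)
    default_se = math.sqrt(s2 / sxx)

    # Cluster-robust standard error: sum the scores x*u within each
    # cluster, then square and add up across clusters.
    scores = {}
    for v, u, g in zip(x_dm, resid, cluster):
        scores[g] = scores.get(g, 0.0) + v * u
    n_clusters = len(scores)
    # Assumed finite-sample correction: G/(G-1) * (N-1)/(N-K).
    correction = n_clusters / (n_clusters - 1) * (n - 1) / (n - k)
    cluster_var = correction * sum(s * s for s in scores.values()) / sxx ** 2
    return beta, default_se, math.sqrt(cluster_var)


def placebo_se_ratio(y, cluster, rng):
    """Assign a placebo to half of the clusters; return cluster/default SE."""
    ids = sorted(set(cluster))
    treated = set(rng.sample(ids, len(ids) // 2))
    placebo = [1.0 if g in treated else 0.0 for g in cluster]
    _, default_se, cluster_se = slope_standard_errors(y, placebo, cluster)
    return cluster_se / default_se


# Simulated stand-in for a pilot data set (an assumption, not real data):
# outcome = cluster-level common shock + idiosyncratic noise.
rng = random.Random(12345)
cluster, y = [], []
for g in range(24):
    shock = rng.gauss(0.0, 1.0)
    for _ in range(50):
        cluster.append(g)
        y.append(shock + rng.gauss(0.0, 1.0))

# Repeat the placebo assignment to average out the luck of a single draw.
ratios = [placebo_se_ratio(y, cluster, rng) for _ in range(200)]
mean_ratio = sum(ratios) / len(ratios)
print(f"mean cluster-robust / default SE ratio: {mean_ratio:.2f}")
```

A ratio well above one suggests that minimum detectable effects computed under independent errors would be too optimistic by roughly that factor. With real pilot data, `y` and `cluster` would come from the data set rather than from the simulation.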

VIII. Empirical Example

In this section, we illustrate the most common applications of cluster-robust inference. There are two examples. The first is a Moulton-type setting that uses individual-level cross-section data with clustering on state. The second is the Bertrand et al. (2004) example of DiD in a state-year panel with clustering on state and potentially with state fixed effects.

The microdata are from the March CPS, downloaded from IPUMS-CPS (King et al. 2010). We use data covering individuals who worked 40 or more weeks during the prior year and whose usual hours per week in that year were 30 or more. The hourly wage is constructed as annual earnings divided by annual hours (usual hours per week times number of weeks worked), deflated to real 1999 dollars, and observations with real wage in the range (2, 100) are kept.

The cross-section example uses individual-level data for 2012. The panel example uses data aggregated to the state-year level for 1977 to 2012. In both cases, we estimate log-wage regressions and perform inference on a generated regressor that has zero coefficient. Specifically, we test $H_0\colon \beta = 0$ using $w = \hat{\beta} / s_{\hat{\beta}}$. For each example, we present results for a single data set before presenting a Monte Carlo experiment that focuses on inference when there are few clusters.

We contrast various ways to compute standard errors and perform Wald tests. Even within a single statistical package, different ways of estimating the same model may lead to different empirical results, because of differences in the degrees of freedom used, especially in models with fixed effects, and because of the use of different distributions in computing p-values and critical values. To make this dependence clear, we provide the particular Stata command used to obtain the results given below; similar issues arise if alternative statistical packages are employed. The data and accompanying Stata code (Version 13) are available at our websites.
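The sample restrictions and wage construction described in this section amount to simple arithmetic, sketched below in Python. This is an illustration only, not the Stata code referred to in the text. The function names, the argument `price_ratio_to_1999` (the 1999 price level divided by the price level in the earnings year), the treatment of the wage range as an open interval, and the example worker are all assumptions introduced here.

```python
import math


def real_hourly_wage(annual_earnings, usual_hours_per_week, weeks_worked,
                     price_ratio_to_1999):
    """Annual earnings over annual hours, expressed in 1999 dollars."""
    annual_hours = usual_hours_per_week * weeks_worked
    return annual_earnings / annual_hours * price_ratio_to_1999


def keep_observation(usual_hours_per_week, weeks_worked, real_wage):
    """Apply the weeks, hours, and real-wage restrictions."""
    return (weeks_worked >= 40
            and usual_hours_per_week >= 30
            and 2.0 < real_wage < 100.0)  # open interval is an assumption


# Hypothetical worker: 52,000 in annual earnings, 40 hours per week,
# 50 weeks worked, and an assumed price ratio of 0.75.
wage = real_hourly_wage(52000.0, 40, 50, 0.75)
if keep_observation(40, 50, wage):
    log_wage = math.log(wage)  # dependent variable in the log-wage regressions
    print(f"real hourly wage: {wage:.2f}, log wage: {log_wage:.3f}")
```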

A. Individual-Level Cross-Section Data: One Sample