

Average treatment effects in the presence of unknown interference∗
Fredrik Sävje†          Peter M. Aronow‡          Michael G. Hudgens§
fredrik.savje@yale.edu  peter.aronow@yale.edu     mhudgens@bios.unc.edu

November 20, 2017
Abstract
We investigate large-sample properties of treatment effect estimators under unknown interference in
randomized experiments. The inferential target is a generalization of the average treatment effect estimand
that marginalizes over potential spillover effects. We show that estimators commonly used to estimate treatment effects under no-interference are consistent with respect to the generalized estimand for most
experimental designs under limited but otherwise arbitrary and unknown interference. The rates of
convergence are shown to depend on the rate at which the amount of interference grows and the degree
to which it aligns with dependencies in treatment assignment. Importantly for practitioners, our results
show that if one erroneously assumes that units do not interfere in a setting with limited, or even moderate,
interference, standard estimators are nevertheless likely to be close to an average treatment effect if the
sample is sufficiently large.
Keywords: Causal effects, causal inference, experiments, interference, sutva.

1 Introduction
Investigators of causality routinely assume that subjects under study do not interfere with each other. The
no-interference assumption has been so ingrained in the practice of causal inference that its application is often
left implicit. Yet, interference appears to be at the heart of most of the social and medical sciences. Humans
interact, and that is precisely the motivation for much of the research in these fields. Just like ideas spread
from person to person, so do viruses. The prevalence of the assumption is the result of perceived necessity;
it is assumed that our methods require isolated observations. The opening chapter of a recent and much-anticipated textbook on causal inference captures the sentiment well. The authors state that “causal inference
is generally impossible without no-interference assumptions” and leave the issue largely uncommented
thereafter (Imbens & Rubin, 2015, p. 10). This sentiment provides the motivation for the current study.
We investigate to what degree one can weaken the assumption of no-interference and still draw credible inferences about causal relationships. We find that, indeed, causal inferences are impossible without no-interference assumptions, but the prevailing view severely exaggerates the issue. One must ensure that the interference is not all-encompassing, but the subjects can otherwise be allowed to interfere in unknown and largely arbitrary ways.

∗ We thank Alex D’Amour, Matt Blackwell, Peng Ding, Naoki Egami, Avi Feller, Lihua Lei and Jas Sekhon for helpful suggestions
and discussions. This work was partially supported by NIH grant R01 AI085073.
† Department of Political Science, Yale University.
‡ Department of Political Science & Department of Biostatistics, Yale University.
§ Department of Biostatistics, University of North Carolina, Chapel Hill.


We focus specifically on estimation of average treatment effects in randomized experiments. A random
subset of a sample of units is exposed to some treatment, and we are interested in the average effect of the
units’ treatments on their own outcomes. The no-interference assumption in this context is the restriction
that no unit’s treatment affects other units’ outcomes. We consider the setting where such spillover effects
exist, and in particular, when the form they may take is left unspecified.
The contribution is two-fold. We first introduce an estimand—the expected average treatment effect or
eate—that generalizes the conventional average treatment effect (ate) to a setting with interference. The
conventional inferential target is not well-defined when units interfere since a unit’s outcome may, in that
case, be affected by more than one treatment. We resolve the issue by marginalizing the effects of interest
over the assignment distribution of the incidental treatments. That is, for a given assignment, we may ask how a particular unit’s outcome is affected when we change only its own treatment. We may ask the same for each unit in the experiment, and we may average these effects. This defines an average treatment effect that remains unambiguous also under interference. However, if we were to repeat the exercise for another assignment, the results may be different; average effects under this definition depend on which assignment
we use as a reference. We marginalize the effects over all possible reference assignments in order to capture
the typical effect in the experiment, and this gives us eate. The estimand is a generalization of ate in the
sense that they coincide whenever the latter is well-defined.
The second contribution is to demonstrate that eate can be estimated consistently under mild restrictions
on the interference and without structural knowledge thereof. Our focus is on the standard Horvitz-Thompson
and Hájek estimators. The results also pertain to the widely used difference-in-means and ordinary least
squares estimators, as they are special cases of the Hájek estimator. We initially investigate the Bernoulli and
complete randomization experimental designs, and show that the estimators are consistent for eate as long as
the average amount of interference grows at a sufficiently slow rate (according to a measure we define shortly).
Root-n consistency is achieved whenever the average amount of interference is bounded. We continue with
an investigation of the paired randomization design. The design introduces perfectly correlated treatment
assignments, and we show that this can make the estimators unstable even when interference is sparse. We
must restrict the degree to which the dependencies introduced by the experimental design align with the
interference structure in order to achieve consistency. Information about the interference structure beyond
these aggregated restrictions is, however, still not needed. The insights from the paired design extend to a
more general setting, and we show that similar restrictions must be enforced for consistency under arbitrary experimental designs.
Our findings are of inherent theoretical interest as they move the limits for causal inference under
interference. They are also of practical interest. The results pertain to standard estimators under standard
experimental designs. As such, they apply to many previous studies where interference might have been
present, but where it was assumed not to be. In particular, our results show that studies that mistakenly
assume that units are isolated might not necessarily be invalidated. The reported estimates are likely to be
close to the expected average treatment effects if the samples are sufficiently large.

2 Related work
Our investigation builds on a young, but quickly growing, literature on causal inference under interference
(see Halloran & Hudgens, 2016, for a recent review). The no-interference assumption itself is due to Cox

2

(1958). The iteration that is most commonly used today was, however, formulated by Rubin (1980) as a part
of the stable unit treatment variation assumption, or sutva. Early departures from this assumption are modes
of analysis inspired by Fisher’s exact randomization test (Fisher, 1935). This approach employs “sharp” null
hypotheses that stipulate the outcome of all units under all assignments. The most common such hypothesis is simply that treatment is inconsequential, so the observed outcomes are constant over all assignments. As this implies that both primary and spillover effects are non-existent, the approach tests for the existence of both types of effects simultaneously. The test has recently been adapted and extended to study interference
specifically (see, e.g., Rosenbaum, 2007; Luo et al., 2012; Aronow, 2012; Bowers et al., 2013, 2016; Athey
et al., 2017; Basse et al., 2017; Choi, 2017).
We are, however, interested in estimation, and Fisher’s framework is not easily adapted to accommodate
that. Early treatments of estimation restricted the interference process through parametric or semi-parametric
models and thereby presumed that interactions took a particular form (Manski, 1993). The structural approach has been extended so as to capture effects under weaker assumptions in a larger class of interference processes (Lee, 2007; Graham, 2008; Bramoullé et al., 2009). It has still been criticized for being too restrictive
(Goldsmith-Pinkham & Imbens, 2013; Angrist, 2014).
A strand of the literature closer to the current exposition relaxes the parametric assumptions. Interference
is allowed to take arbitrary forms as long as it is contained within known and disjoint groups of units (Sobel,
2006). The assumption is called partial interference. Hudgens & Halloran (2008) was the first in-depth
treatment of the approach, and it has received considerable attention since (see, e.g., VanderWeele & Tchetgen
Tchetgen, 2011; Tchetgen Tchetgen & VanderWeele, 2012; Liu & Hudgens, 2014; Rigdon & Hudgens, 2015;
Kang & Imbens, 2016; Liu et al., 2016; Basse & Feller, 2017). While some progress can be made simply
from the fact that interference is isolated to disjoint groups, the assumption is usually coupled with stratified
interference. This additional assumption stipulates that the only relevant aspect of the interference is the
proportion of treated units in one’s group (i.e., the identities of the treated units are inconsequential). Much
like the structural approach, stratified interference severely restricts the form the interference can take.
Recent contributions have focused on relaxing the partial interference assumption. Interference is no longer restricted to disjoint groups, and units are allowed to interfere along more general structures such
as social networks (see, e.g., Manski, 2013; Toulis & Kao, 2013; Ugander et al., 2013; Basse & Airoldi,
2016; Eckles et al., 2016; Forastiere et al., 2016; Aronow & Samii, 2017; Ogburn & VanderWeele, 2017;
Sussman & Airoldi, 2017). This allows for quite general forms of interference, but the methods generally
require detailed qualitative knowledge about the interference (e.g., that some pairs of units are known not to
interfere).
Two recent studies are particularly relevant to the current investigation. Basse & Airoldi (2017) and
Egami (2017) both consider arbitrary and unknown interference much in the same way as we do. The first
study differs from the current in that the authors focus on a markedly different estimand (as discussed in
the next section). This leads them to conclude that causal effects cannot be investigated under arbitrary and
unknown interference. In the second study, Egami (2017) assumes that the interference can be described
by a set of networks such that the stratified interference assumption holds in each network. This imposes
stronger restrictions than the current approach, but Egami’s framework admits considerably more general
forms of interference than previous methods since the networks are allowed to be overlapping and partially
unobserved.


3 Treatment effects under interference
Suppose we have a sample of n units indexed by the set U = {1, 2, · · · , n}. We intervene on the world in ways that potentially affect the units. Our intervention is described by an n-dimensional binary vector z = (z1, z2, · · · , zn) ∈ {0, 1}^n. A particular value of z could, for example, denote that we give some drug to a certain subset of the units in U and withhold it from the others. We are particularly interested in how unit i is affected when we alter the ith dimension of z. For short, we say that unit i’s treatment is zi.
The effects of different interventions are defined as comparisons between the outcomes they produce.
Each unit has a function yi : {0, 1} n → R denoting the observed outcome for the unit under a specific
(potentially counterfactual) intervention (Neyman, 1923; Holland, 1986). In particular, yi (z) is the response
of i that we would have observed if we intervened on the world according to z. We refer to the elements of
the image of this function as potential outcomes. It will prove convenient to write the potential outcomes in a
slightly different form. Let z−i = (z1, · · · , zi−1, zi+1, · · · , zn ) denote the (n − 1)-element vector constructed by
deleting the ith element from z. The potential outcome yi (z) can then equivalently be written as yi (zi ; z−i ).
We will assume that the potential outcomes are well-defined throughout the paper. The assumption
implies that the way we manipulate z is inconsequential; no matter how we intervene on the world to set z to
a particular value, the outcome is the same. Well-defined potential outcomes also imply that no physical law or other circumstance prohibits us from intervening on the world so as to set z to any value in {0, 1}^n. This
ensures that all potential outcomes are, indeed, potential in the sense that there exists an intervention that
can realize them. The assumption does not restrict the way we choose to intervene on the world, and we may
design our experiments so that some interventions are never realized.
We conduct a randomized experiment, so we intervene on the world by setting z according to some
random vector Z = (Z1, · · · , Zn). The probability distribution of Z is the design of the experiment. The experimental design is the sole source of randomness we will consider. Let Yi denote the observed outcome
of unit i. Yi is a random variable connected to the experimental design through the potential outcomes,
Yi = yi (Z). Similar to above, Z−i denotes the random vector we get when deleting the ith element from Z.
The observed outcome can, thus, be written as Yi = yi (Zi ; Z−i ).
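For concreteness, the setup can be expressed in a few lines of code. In the sketch below, the potential outcome function is an arbitrary toy choice in which each unit responds to its own treatment and to the treatment of one neighbouring unit; the only features meant to mirror the framework are that outcomes are functions of the full assignment vector and that randomness enters solely through Z.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

def y(i, z):
    """Potential outcome y_i(z) for unit i under the full assignment vector z.
    Here unit i responds to its own treatment and, as a spillover, to the
    treatment of unit i+1 (an arbitrary illustrative choice)."""
    z = np.asarray(z)
    spill = z[(i + 1) % n]          # interference from one other unit
    return 1.0 + 2.0 * z[i] + 0.5 * spill

# The design: independent fair coin flips (a Bernoulli design with p_i = 1/2).
Z = rng.binomial(1, 0.5, size=n)

# Observed outcomes are the potential outcomes evaluated at the realized Z.
Y = np.array([y(i, Z) for i in range(n)])
print("Z =", Z, " Y =", Y)
```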

3.1 Expected average treatment effects
It is conventional to assume that the potential outcomes are restricted so a unit’s outcome is only affected
by its own treatment. That is, for any two assignments z and z′ , if a unit’s treatment is the same for both
assignments, then its outcome is the same; zi = zi′ is assumed to imply yi (z) = yi (z′ ). We say that a sample
satisfies no-interference when the potential outcomes are restricted in this way. The assumption allows us
to define the treatment effect for unit i as the contrast between its potential outcomes when we change its
treatment:
τi = yi (1; z−i ) − yi (0; z−i ),
where z−i is any value in {0, 1} n−1 . No-interference implies that the choice of z−i is inconsequential for the
values of yi (1; z−i ) and yi (0; z−i ). The average of the unit-level effects is the average treatment effect that is
commonly used by experimenters to summarize treatment effects.
Definition 1. Under no-interference, the average treatment effect (ate) is the average unit-level treatment
effect:
\[ \tau_{\text{ate}} = \frac{1}{n} \sum_{i=1}^{n} \tau_i. \]

The definition requires that no-interference holds. Potential outcomes will depend on more than one
treatment when units interfere, and τi may then vary under permutations of z−i in ways we cannot easily
characterize. As a consequence, we can no longer unambiguously talk about the effect of a unit’s treatment.
The ambiguity is contagious; our average treatment effect similarly becomes ill-defined.
To resolve the issue, we redefine the unit-level treatment effect for unit i as the contrast between its potential
outcomes when we change its treatment while holding all other treatments fixed at a given assignment z−i .
We might call this quantity the assignment conditional unit-level treatment effect:
τi (z−i ) = yi (1; z−i ) − yi (0; z−i ).
Our redefined effects still vary with z−i , but we have made the connection explicit. Averaging these effects,
we similarly get a version of the average treatment effect that is well-defined under interference.
Definition 2. An assignment conditional average treatment effect (acate) is the average of the assignment conditional unit-level treatment effects under a given assignment:
\[ \tau_{\text{ate}}(z) = \frac{1}{n} \sum_{i=1}^{n} \tau_i(z_{-i}). \]

The acates are unambiguous under interference, but they are unwieldy. An average effect exists for
each assignment, so the number of effects grows exponentially in the sample size. We do not imagine that
experimenters would find it useful to study individual acates. Similar to how we aggregate the unit-level
effects to an average effect, a summary of the acates will often be more informative.
Definition 3. The expected average treatment effect (eate) is the expected acate:
τeate = E[τate (Z)].
The expectation in the definition of eate is taken over the distribution of Z induced by the experimental
design. Under no-interference, τate (z) does not depend on z, and the marginalization is inconsequential.
eate is, thus, a generalization of ate in the sense that the two estimands coincide whenever no-interference
holds. When units interfere, eate marginalizes the effects over all possible treatment assignments; it asks
what the typical average treatment effect is under the experimental design we actually implement.
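When the sample is small, eate can be computed exactly by enumerating the design. The sketch below evaluates τate(z) for every assignment under a Bernoulli(1/2) design and averages the results; the potential outcomes are the same toy choice as in the earlier sketch.

```python
import itertools
import numpy as np

n = 4

def y(i, z):
    """Illustrative potential outcome with a spillover from unit i+1 (our toy example)."""
    z = np.asarray(z)
    return 1.0 + 2.0 * z[i] + 0.5 * z[(i + 1) % n]

def tau_ate(z):
    """Assignment conditional average treatment effect tau_ate(z): flip each
    unit's own treatment while holding all other treatments fixed at z."""
    effects = []
    for i in range(n):
        z1, z0 = np.array(z), np.array(z)
        z1[i], z0[i] = 1, 0
        effects.append(y(i, z1) - y(i, z0))
    return np.mean(effects)

# eate = E[tau_ate(Z)] under the design; for a Bernoulli(1/2) design we can
# enumerate all 2^n assignments, each with probability 2^{-n}.
eate = np.mean([tau_ate(z) for z in itertools.product([0, 1], repeat=n)])
print("eate =", eate)   # equals 2.0 here since the own-treatment effect is constant
```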

3.2 Previous definitions under interference
The eate estimand builds on ideas previously proposed in the literature. An estimand introduced by Hudgens
& Halloran (2008) resolves the ambiguity of treatment effects under interference in a similar fashion. The
authors refer to the quantity as the average direct causal effect, but we opt for another name in an effort to highlight how it differs from ate and eate.
Definition 4. The average distributional shift effect (adse) is the average difference between the conditional
expected outcomes for the two treatment conditions:
\[ \tau_{\text{adse}} = \frac{1}{n} \sum_{i=1}^{n} \Bigl( E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0] \Bigr). \]

Similar to eate, adse marginalizes the potential outcomes over the experimental design. The estimands
differ in which distributions they use for the marginalization. The expectation in eate is over the unconditional assignment distribution, while adse marginalizes each potential outcome separately over different conditional
distributions. The difference becomes salient when we express the estimands in similar forms:
\[ \tau_{\text{eate}} = \frac{1}{n} \sum_{i=1}^{n} E\bigl[ y_i(1; Z_{-i}) - y_i(0; Z_{-i}) \bigr], \]
\[ \tau_{\text{adse}} = \frac{1}{n} \sum_{i=1}^{n} \Bigl( E[y_i(1; Z_{-i}) \mid Z_i = 1] - E[y_i(0; Z_{-i}) \mid Z_i = 0] \Bigr). \]

As Hudgens & Halloran (2008) note, similar effects are investigated elsewhere in the causal inference
literature; eate is related to adse in a similar way as the median treatment effect is related to the difference
between median potential outcomes (Lee, 2000).
The two estimands answer different causal questions. eate captures the expected effect of changing a
random unit’s treatment in the current experiment. It is the expected average unit-level treatment effect. adse
is the expected effect of changing from an experimental design where we hold the average unit’s treatment
fixed at Zi = 1 to another design where its treatment is fixed at Zi = 0. That is, the estimand captures the
compound effect of changing a unit’s treatment and simultaneously changing the experimental design (see,
e.g., the discussion in Sobel, 2006). As a result, the estimand may be non-zero even when all unit-level
effects are exactly zero. Expressed in symbols, τadse may be non-zero even when τi (z−i ) = 0 for all i and z−i .
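A minimal numerical illustration of this distinction: with two units, complete randomization of exactly one treatment, and outcomes that respond only to the other unit's treatment, every unit-level effect τi(z−i) is zero while adse is not. The example below is constructed for illustration only.

```python
import numpy as np

# Two units, complete randomization with exactly one treated unit:
# the only possible assignments are (1, 0) and (0, 1), each with probability 1/2.
assignments = [np.array([1, 0]), np.array([0, 1])]

def y(i, z):
    """Each unit's outcome equals the OTHER unit's treatment (pure spillover),
    so every unit-level effect tau_i(z_{-i}) is exactly zero."""
    return float(z[1 - i])

def tau_ate(z):
    """Flip unit i's own treatment, holding the other treatment fixed."""
    effects = []
    for i in range(2):
        z1, z0 = z.copy(), z.copy()
        z1[i], z0[i] = 1, 0
        effects.append(y(i, z1) - y(i, z0))
    return np.mean(effects)

eate = np.mean([tau_ate(z) for z in assignments])

# adse: contrast the conditional means E[Y_i | Z_i = 1] - E[Y_i | Z_i = 0].
adse_terms = []
for i in range(2):
    y_treated = np.mean([y(i, z) for z in assignments if z[i] == 1])
    y_control = np.mean([y(i, z) for z in assignments if z[i] == 0])
    adse_terms.append(y_treated - y_control)
adse = np.mean(adse_terms)

print("eate =", eate)   # 0.0
print("adse =", adse)   # -1.0: non-zero although all unit-level effects are zero
```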
It is possible to define average treatment effects as the contrast between the average outcome when all
units are treated and the average outcome when no unit is treated: \( n^{-1} \sum_{i=1}^{n} [y_i(\mathbf{1}) - y_i(\mathbf{0})] \), where \(\mathbf{1}\) and \(\mathbf{0}\) refer
to the unit and zero vectors. ate under this definition coincides exactly with the estimand in Definition 1 (and
thus eate) whenever no-interference holds. The definitions are, for this reason, often used interchangeably in
the literature. However, average treatment effects under the alternative definition rarely coincide with eate
under interference, and we are forced to choose our inferential target in that case. The estimands provide
qualitatively different perspectives of the causal setting. eate captures the typical treatment effect in the
experiment actually implemented, while the alternative estimand captures the effect of completely scaling up
or down treatment. Both effects could be of interest, but, as Basse & Airoldi (2017) have established, average
treatment effects under the alternative definition can only rarely be investigated under arbitrary interference.

4 Quantifying interference
Our results do not require detailed structural knowledge on how the units interfere. No progress can, however,
be made if we allow for completely unrestricted interference. We show this formally below; the intuition is
simply that the change of a single unit’s treatment could lead to non-negligible changes in all units’ outcomes
when the interference is unrestricted. We use the following definitions to quantify the amount of interference.
We say that unit i interferes with unit j if changing i’s treatment changes j’s outcome under at least one
treatment assignment. We use Iij to indicate such interference. That is, we define our interference indicator
for any two units i and j as:


\[ I_{ij} = \begin{cases} 1 & \text{if } y_j(z) \neq y_j(z') \text{ for some } z, z' \in \{0, 1\}^n \text{ such that } z_{-i} = z'_{-i}, \\ 1 & \text{if } i = j, \\ 0 & \text{otherwise.} \end{cases} \]


Our concept of interference is directed. Unit i can interfere with unit j without the converse being true. We
have defined the indicator so that Iii = 1 even when a unit’s treatment does not affect its outcome since this
simplifies the subsequent exposition.

The interference indicator is defined purely on the potential outcomes. Thus, the definition itself does
not impose any restrictions on how the units may interfere; it is simply a description of the interference
structure in a sample. The indicators may, for this reason, not necessarily align with social networks or other
structures through which units are thought to interact. In fact, we rarely know enough about the interference
to deduce or estimate the value of the indicators. Their role is to allow us to define an aggregated summary
of the interference.
Definition 5 (Average interference dependence).
\[ d_{\text{avg}} = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} d_{ij}, \qquad \text{where} \qquad d_{ij} = \begin{cases} 1 & \text{if } I_{\ell i} I_{\ell j} = 1 \text{ for some } \ell \in U, \\ 0 & \text{otherwise.} \end{cases} \]

The interference dependence indicator, dij , tells us whether units i and j are affected by a common
treatment. That is to say, i and j are interference dependent if they interfere directly with each other or if some third unit interferes with both i and j. The sum \( \sum_{j=1}^{n} d_{ij} \) gives the number of interference dependencies for unit i, so davg is the unit-average number of interference dependencies. The quantity acts as a measure
of how close an experiment is to no-interference. In particular, no-interference is equivalent to davg = 1,
which indicates that units are only interfering with themselves. At the other extreme, davg = n indicates that
interference is complete in the sense that all pairs of units are affected by a common treatment. If sufficiently
many units are interference dependent—in the sense that davg takes on a large value—small perturbations
of the treatment assignments may be amplified by the interference and induce large changes in many units’
outcomes.
The interference dependence concept might appear overly complex. It can, however, easily be related to
simpler quantities. Consider the following definitions:
\[ I_i = \sum_{j=1}^{n} I_{ij}, \qquad I_{\text{avg}} = \frac{1}{n} \sum_{i=1}^{n} I_i, \qquad I_{\text{msq}} = \frac{1}{n} \sum_{i=1}^{n} I_i^2, \qquad \text{and} \qquad I_{\max} = \max_{i \in U} I_i. \]

Ii indicates how many units i interferes with. That is, if changing unit i’s treatment would change the
outcome of five other units, i is said to be interfering with six units and Ii = 6 (i.e., the five units and itself).
Information about these quantities is often useful (see, e.g., Aronow et al., 2016), but such knowledge is
beyond our grasp in most experiments. The subsequent three quantities provide a more aggregated description
of the interference. In particular, they are the average (Iavg), mean square (Imsq ) and maximum (Imax ) of the
unit-level variables. The quantities can be shown to bound davg from below and above.
Lemma 1. \( I_{\text{avg}} \le d_{\text{avg}} \le I_{\text{msq}} \le I_{\max}^2 \).

The proof of Lemma 1 and all other proofs are given in appendices.
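The quantities above are straightforward to compute once the interference indicators are known. The following sketch does so for a small, arbitrary interference pattern and checks the upper bounds \( d_{\text{avg}} \le I_{\text{msq}} \le I_{\max}^2 \) numerically; it is an illustration of the definitions, not part of the formal development.

```python
import numpy as np

# I[l, j] = 1 means that changing unit l's treatment can change unit j's
# outcome (with I[l, l] = 1 by convention). The pattern below is a toy choice.
n = 5
I = np.eye(n, dtype=int)
I[0, 1] = I[0, 2] = 1          # unit 0 interferes with units 1 and 2
I[3, 4] = I[4, 3] = 1          # units 3 and 4 interfere with each other

I_i = I.sum(axis=1)                       # how many units each unit interferes with
I_avg, I_msq, I_max = I_i.mean(), (I_i ** 2).mean(), I_i.max()

# d[i, j] = 1 if some unit l interferes with both i and j, i.e. columns i and j
# of the interference matrix share a non-zero row.
d = (I.T @ I > 0).astype(int)
d_avg = d.sum() / n

print("d_avg =", d_avg, " I_avg =", I_avg, " I_msq =", I_msq, " I_max^2 =", I_max ** 2)
assert d_avg <= I_msq <= I_max ** 2       # the upper bounds discussed above
```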

5 Large sample properties
We investigate an asymptotic regime inspired by Isaki & Fuller (1982). Consider a sequence of samples (Un )
indexed by their sample size. We imagine that an experiment is conducted for each sample in the sequence.
All variables related to the experiment have their own sequence also indexed by n. For example, we have a
sequence of potential outcomes and experimental designs. While nearly all variables are indexed by n under
this asymptotic regime, we leave the index implicit as no confusion ensues. Subject to conditions on the
sequences, we will investigate how two common estimators of average treatment effects behave as we let
the sample size approach infinity.

Definition 6 (Horvitz-Thompson, ht, and Hájek, há, estimators).
\[ \hat{\tau}_{\text{ht}} = \frac{1}{n} \sum_{i=1}^{n} \frac{Z_i Y_i}{p_i} - \frac{1}{n} \sum_{i=1}^{n} \frac{(1 - Z_i) Y_i}{1 - p_i}, \]
\[ \hat{\tau}_{\text{há}} = \sum_{i=1}^{n} \frac{Z_i Y_i}{p_i} \bigg/ \sum_{i=1}^{n} \frac{Z_i}{p_i} \; - \; \sum_{i=1}^{n} \frac{(1 - Z_i) Y_i}{1 - p_i} \bigg/ \sum_{i=1}^{n} \frac{1 - Z_i}{1 - p_i}, \]
where pi = Pr(Zi = 1) is the marginal probability that unit i is assigned to treatment condition Zi = 1.

The estimators were first introduced in the sampling literature to estimate population means under unequal
inclusion probabilities (Horvitz & Thompson, 1952; Hájek, 1971). They have since received wide-spread
attention from the causal inference and policy evaluation literatures where they are often referred to as
inverse probability weighting estimators (Hahn, 1998; Hirano & Imbens, 2001; Hirano et al., 2003; Hernán
& Robins, 2006). It can be shown that other estimators commonly used to analyze experiments—such as the
difference-in-means and ordinary least squares estimators—are special cases of the Hájek estimator. As a
consequence, our results apply directly to those estimators as well.
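For reference, the estimators in Definition 6 translate directly into code. The sketch below is a plain transcription of the definition; the example data at the end are arbitrary.

```python
import numpy as np

def horvitz_thompson(Y, Z, p):
    """HT estimator of Definition 6: difference of inverse-probability-weighted
    sums for the two treatment arms, each normalised by the sample size n."""
    Y, Z, p = map(np.asarray, (Y, Z, p))
    n = len(Y)
    return np.sum(Z * Y / p) / n - np.sum((1 - Z) * Y / (1 - p)) / n

def hajek(Y, Z, p):
    """Hájek estimator: as above, but each arm is normalised by the sum of its
    own weights rather than by n."""
    Y, Z, p = map(np.asarray, (Y, Z, p))
    treated = np.sum(Z * Y / p) / np.sum(Z / p)
    control = np.sum((1 - Z) * Y / (1 - p)) / np.sum((1 - Z) / (1 - p))
    return treated - control

# Example with unequal assignment probabilities p_i = Pr(Z_i = 1):
p = np.array([0.3, 0.5, 0.5, 0.7])
Z = np.array([1, 0, 1, 0])
Y = np.array([2.0, 1.0, 2.5, 1.5])
print(horvitz_thompson(Y, Z, p), hajek(Y, Z, p))
```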
We assume throughout the paper that the experimental design and potential outcomes are well-behaved
as formalized in the following assumption.
Assumption 1 (Regularity conditions). There exist constants k < ∞, q ≥ 2 and s ≥ 1 such that for all i ∈ U:
1a (Probabilistic assignment). \( k^{-1} \le \Pr(Z_i = 1) \le 1 - k^{-1} \),

1b (Bounded outcome moments). \( E\bigl[\, |Y_i|^q \,\bigr] \le k^q \),

1c (Bounded potential outcome moments). \( E\bigl[\, |y_i(z; Z_{-i})|^s \,\bigr] \le k^s \) for both z ∈ {0, 1}.

Remark 1. The exact values of q and s are inconsequential for our results in Section 5.2. The regularity
conditions can, in that case, be written simply with q = 2 and s = 1. However, as we show in the subsequent
section, the rate of convergence for an arbitrary (i.e., worst-case) experimental design depends on which
moments can be bounded. In other words, the rate depends on the values of q and s. The ideal case is when
the potential outcomes themselves are bounded, in which case the assumptions hold as q → ∞ and s → ∞.
Remark 2. Assumption 1b does not imply 1c since there may exist some z such that Pr(Zi = zi | Z−i = z−i) = 0 while Pr(Z−i = z−i) > 0. The opposite implication does not hold since s may be smaller than q.

5.1 Limited interference
Under our asymptotic regime, a sequence of davg exists, and it captures the amount of interference in each
sample in our sequence. Our concept of limited interference is formalized as restrictions on this sequence.
Assumption 2 (Restricted interference). davg = o(n).
Assumption 3 (Bounded interference). davg = O(1).
Assumption 2 stipulates that the units, on average, are interference dependent with an asymptotically
negligible fraction of the sample. The assumption still allows for substantial amounts of interference. The
unit-average number of interference dependencies may even grow with the sample size. It is only assumed that
the average does not grow proportionally to the sample size. The total amount of interference dependencies
may, thus, grow at a faster rate than n. Assumption 3 is more restrictive and limits the average number of
interference dependencies to some constant. It allows for interference, but the unit-average cannot grow with
the sample size. The total amount of interference dependencies may still grow proportionally to the sample
size under bounded interference.

In addition to restricting the amount of interference, the assumptions impose weak restrictions on the
structure of the interference. In particular, they rule out that the interference is so unevenly distributed that
a few units are interfering with most other units. If the interference is concentrated—in the sense that a few
treatments are affecting many units—small perturbations of the treatment assignments could be amplified
through those treatments. At the extreme, a single unit interferes with all other units, so all units’ outcomes would change if we were to change its treatment. The estimators would not stabilize if the interference is structured in this way, even if the interference is otherwise sparse.
Lemma 1 tells us that we can define our interference restrictions using Imsq or Imax rather than davg . For
example, our results follow if we assume Imsq = o(n) or Imax = o(n^{0.5}). These assumptions are, however,
stronger than necessary. There are sequences for which the alternative restrictions do not hold but davg = o(n)
does. The connection is useful as it may be more intuitive to make assumptions about Imsq and Imax rather
than on interference dependencies. We opt to use davg as it affords the most generality.
Assumption 2 is not sufficient for consistency. There exist sequences of experiments for which the
assumption holds but the estimators are not converging to eate with high probability. Restricted interference
is, however, necessary for consistency of the ht and há estimators in the following sense.
Proposition 1 (Necessity of restricted interference). For any sequence of experimental designs, if Assumption
2 does not hold, there exists a sequence of potential outcomes satisfying Assumptions 1b and 1c such that the
ht and há estimators do not converge in probability to eate.
The proposition implies that it is not possible to design an experiment so as to avoid issues introduced by interference if we are completely agnostic about its properties; we must somehow limit the interference in
order to make progress. The proposition also implies that the weakest possible restriction we can impose
on davg for consistency is Assumption 2. If any weaker restriction is imposed, e.g., davg = O(n), potential
outcomes exist (for any experimental design) so that the relaxed interference restriction is satisfied but the
ht and há estimators do not converge. Naturally, it might be possible to achieve consistency if one imposes
stronger regularity conditions than we do or if one restricts the interference in some other way.

5.2 Common experimental designs
We start our investigation with three specific experimental designs. They are among the designs that are most
commonly used by experimenters (Athey & Imbens, 2017). They are, thus, of interest in their own right.
The designs also provide a good illustration of the issues that arise under unknown interference and act as a
backdrop to our investigation of arbitrary designs in the subsequent section.
5.2.1 Bernoulli and complete randomization
The simplest experimental design assigns treatment independently; we flip a coin for each unit and administer
treatment accordingly. We call this a Bernoulli randomization design, and it satisfies:
\[ \Pr(Z = z) = \prod_{i=1}^{n} p_i^{z_i} (1 - p_i)^{1 - z_i} \]

for some set of assignment probabilities p1, p2, · · · , pn bounded away from zero and one.
The outcomes of any pair of units are independent under no-interference with a Bernoulli design. This
is not the case when units interfere since a single treatment can affect two or more units in that setting. Our
definitions allow us to characterize the dependencies introduced by the interference. Under the Bernoulli

design, two outcomes are dependent only when the corresponding two units are affected by a common
treatment. That is, when they are interference dependent according to Definition 5. Limiting this dependence
grants consistency.
Proposition 2 (Consistency under Bernoulli randomization). With a Bernoulli randomization design under
our regularity conditions and restricted interference (Assumptions 1 and 2), the ht and há estimators are
consistent for eate and converge at the following rates:
\[ \hat{\tau}_{\text{ht}} - \tau_{\text{eate}} = O_p\bigl( n^{-0.5} d_{\text{avg}}^{0.5} \bigr), \qquad \text{and} \qquad \hat{\tau}_{\text{há}} - \tau_{\text{eate}} = O_p\bigl( n^{-0.5} d_{\text{avg}}^{0.5} \bigr). \]

The Bernoulli design tends to be inefficient in small samples as the size of the treatment groups vary over
assignments. Experimenters often opt to use designs that stabilize the treatment groups. A common such
design introduces dependencies in assignments so as to keep the number of treated units fixed but otherwise
assigns treatment with equal probability:
\[ \Pr(Z = z) = \begin{cases} \binom{n}{m}^{-1} & \text{if } \sum_{i=1}^{n} z_i = m, \\ 0 & \text{otherwise,} \end{cases} \]
where m = ⌊pn⌋ for some fixed p strictly between zero and one. We refer to this design as complete
randomization.
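Both designs are simple to implement; the sketch below draws a single assignment vector from each (the assignment probabilities and sample sizes are arbitrary examples).

```python
import numpy as np

rng = np.random.default_rng(1)

def bernoulli_design(p):
    """Bernoulli randomization: independent coin flips with unit-specific
    assignment probabilities p_i."""
    return rng.binomial(1, np.asarray(p))

def complete_randomization(n, p):
    """Complete randomization: exactly m = floor(p * n) units are treated,
    with every such assignment equally likely."""
    m = int(np.floor(p * n))
    z = np.zeros(n, dtype=int)
    z[rng.choice(n, size=m, replace=False)] = 1
    return z

print(bernoulli_design([0.5, 0.5, 0.3, 0.7]))
print(complete_randomization(6, 0.5))
```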
The dependencies introduced by complete randomization are not of concern under no-interference. The
outcomes are, then, only affected by a single treatment, and the dependence between any two treatments is
asymptotically negligible. This need not be the case when units interfere, and that complicates the asymptotic
analysis. There are two issues to consider. First, the interference could interact with the experimental design
so that two units’ outcomes are strongly dependent asymptotically even when they are not affected by a
common treatment (i.e., when dij = 0). Consider, as an example, when one unit is affected by the first half
of the sample and another unit is affected by the second half. Complete randomization introduces a strong
dependence between the two halves of the sample. In particular, the number of treated units in the first half
is perfectly correlated with the number of treated in the second half. The outcomes of the two units may
therefore be (perfectly) correlated even when no treatment affects them both. Our assumptions do not allow
us to rule out that such dependencies exist. We can, however, show that they are rare whenever Assumption
2 holds.
The second issue is that the dependencies introduced by the design will distort our view of the potential
outcomes. Complete randomization introduces a negative correlation between assignments. Whenever we
observe a unit assigned to a certain treatment condition, units that interfere with the unit tend to be assigned
to the other condition. eate weights the potential outcomes for a given z−i equally, so the dependence tends
to bias our estimators since it induces a weighting that differs from the estimand’s. To illustrate the issue, consider when the potential outcomes are equal to the total number of treated units, \( y_i(z) = \sum_{i=1}^{n} z_i \), so eate is equal to one. Under complete randomization, the number of treated units is fixed at m, so all revealed potential outcomes are m. Our estimators will, thus, be constant at zero and biased.
Another way to state this is that we cannot separate the effect of a unit’s own treatment from the spillover
effects of other units’ treatments. In general under complete randomization, if the number of units interfering
with a unit is of the same order as the sample size, our view of the unit’s potential outcomes will be distorted
also asymptotically. We cannot rule out that such distortions exist, but we can show that restricted interference
implies that they are sufficiently rare. Taken together, this allows us to prove consistency under complete
randomization with the same rate of convergence as under Bernoulli randomization.

Proposition 3 (Consistency under complete randomization). With a complete randomization design under
our regularity conditions and restricted interference (Assumptions 1 and 2), the ht and há estimators are
consistent for eate and converge at the following rates:
\[ \hat{\tau}_{\text{ht}} - \tau_{\text{eate}} = O_p\bigl( n^{-0.5} d_{\text{avg}}^{0.5} \bigr), \qquad \text{and} \qquad \hat{\tau}_{\text{há}} - \tau_{\text{eate}} = O_p\bigl( n^{-0.5} d_{\text{avg}}^{0.5} \bigr). \]

Recall that no-interference is equivalent to davg = 1, and Propositions 2 and 3 reassuringly give us root-n
consistency in that case. The propositions, however, make clear that no-interference is not necessary for root-n consistency. As captured in the following corollary, we can allow for non-trivial amounts of interference
and still achieve efficient rates of convergence.
Corollary 1. With a Bernoulli or complete randomization design under our regularity conditions and
bounded interference (Assumptions 1 and 3), the ht and há estimators are root-n consistent for eate.
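A small simulation can illustrate this behaviour. In the sketch below, each unit's outcome depends on a fixed baseline, its own treatment (with a constant effect of one, so eate equals one) and one neighbour's treatment, which keeps davg bounded; the data-generating choices are arbitrary. The bias of the ht estimator stays near zero while its standard deviation shrinks at roughly the root-n rate.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate(n, replications=2000):
    """HT estimates under a Bernoulli(1/2) design when each unit's outcome has
    a fixed baseline, an own-treatment effect of 1, and a spillover of 0.5 from
    its right-hand neighbour (so d_avg is bounded and eate = 1)."""
    a = rng.normal(0, 1, size=n)            # fixed baselines, held over replications
    estimates = []
    for _ in range(replications):
        Z = rng.binomial(1, 0.5, size=n)    # Bernoulli(1/2) design
        Y = a + Z + 0.5 * np.roll(Z, -1)    # observed outcomes y_i(Z)
        ht = np.mean(Z * Y / 0.5) - np.mean((1 - Z) * Y / 0.5)
        estimates.append(ht)
    return np.array(estimates)

for n in (100, 1000, 10000):
    est = simulate(n)
    print(n, "bias:", round(est.mean() - 1.0, 4), "sd:", round(est.std(), 4))
```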
5.2.2 Paired randomization
Complete randomization restricted the assignment to ensure fixed-sized treatment groups. The paired
randomization design imposes even greater restrictions. In particular, units are divided into pairs, and
assignment is restricted so that exactly one unit in each pair is assigned to treatment. It is implicit that the
sample size is even so that all units are paired. Paired randomization could be forced on the experimenter by
external constraints or used to improve precision (see, e.g., Fogarty, 2017 and the references therein).
In symbols, let ρ : U → U describe a pairing so that ρ(i) = j indicates that units i and j are paired. Since
the pairing is symmetric, the self-composition of ρ is the identity function. The paired randomization design
then satisfies:
\[ \Pr(Z = z) = \begin{cases} 2^{-n/2} & \text{if } z_i \neq z_{\rho(i)} \text{ for all } i \in U, \\ 0 & \text{otherwise.} \end{cases} \]
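Drawing from this design requires only one fair coin per pair, as in the following sketch, where the pairing ρ is represented by an array with rho[i] = j for paired units (an arbitrary example pairing).

```python
import numpy as np

rng = np.random.default_rng(3)

def paired_randomization(rho):
    """Paired randomization: for each pair {i, rho(i)} flip one fair coin and
    give the two units opposite treatments, so z_i != z_rho(i) always holds."""
    rho = np.asarray(rho)
    n = len(rho)
    z = np.full(n, -1, dtype=int)
    for i in range(n):
        if z[i] == -1:                    # pair not yet assigned
            z[i] = rng.integers(0, 2)
            z[rho[i]] = 1 - z[i]
    return z

# A pairing of six units: (0,1), (2,3), (4,5); rho is its own inverse.
rho = np.array([1, 0, 3, 2, 5, 4])
print(paired_randomization(rho))
```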
The paired design accentuates the issues we faced under complete randomization. The dependence
within any finite set of treatments is asymptotically negligible under complete randomization. As
a result, issues only arose when the number of treatments affecting a unit was of the same order as the sample
size. Under paired randomization, Zi and Z j are perfectly correlated also asymptotically whenever ρ(i) = j.
Dependencies may therefore remain in large samples even if the number of treatments affecting a unit is of
a lower order than the sample size. Assumption 2 is no longer sufficient for consistency. We must consider
to what degree the dependencies between treatments introduced by the design align with the structure of the
interference. The following two definitions quantify the alignment.
Definition 7 (Average pair-induced interference dependence).
\[ b_{\text{avg}} = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} b_{ij}, \qquad \text{where} \qquad b_{ij} = \begin{cases} 1 & \text{if } (1 - d_{ij}) I_{\ell i} I_{\rho(\ell) j} = 1 \text{ for some } \ell \in U, \\ 0 & \text{otherwise.} \end{cases} \]
Definition 8 (Within-pair interference). \( R = \sum_{i=1}^{n} I_{\rho(i) i} \).

Under complete randomization, the dependence between the outcomes of two units i and j not affected by
a common treatment remained asymptotically non-negligible only when the number of treatments affecting
the units grew sufficiently fast. This is not necessarily the case under paired randomization. Yi and Yj could
be (perfectly) correlated even when dij = 0 if there exist two other units k and ℓ such that Iki = 1 and
Iℓ j = 1, and k and ℓ are paired, i.e., ρ(ℓ) = k. The purpose of Definition 7 is to capture such dependencies.

The definition is similar in structure to Definition 5. Indeed, the upper bounds from Lemma 1 apply so that \( b_{\text{avg}} \le I_{\text{msq}} \le I_{\max}^2 \).
The second issue we discussed under complete randomization is affected in a similar fashion. No matter
the number of units that are interfering with unit i, if one of those units is the unit paired with i, we cannot
separate the effects of Zi and Zρ(i) . The design imposes Zi = 1− Zρ(i) , so any effect of Zi on i’s outcome could
just as well be attributed to Zρ(i) . Such dependencies will introduce bias—just as they did under complete
randomization. However, unlike the previous design, restricted interference does not imply that the bias will
vanish asymptotically. We must separately ensure that this type of alignment between the design and the
interference is sufficiently rare. Definition 8 captures how common interference is between paired units. The
two definitions allow us to formulate our restrictions on the degree to which the interference aligns with the
pairs in the design.
Assumption 4 (Restricted pair-induced interference). bavg = o(n).
Assumption 5 (Pair separation). R = o(n).
Experimenters may find that the first assumption is quite tenable under restricted interference. As both
davg and bavg are bounded by Imsq , restricted pair-induced interference tends to hold in cases where restricted
interference can be assumed. It is, however, possible that the latter assumption holds even when the former
does not if paired units are interfering with sufficiently disjoint sets of units.
Whether pair separation holds largely depends on how the pairs were formed. We often expect that units
interfere exactly along the pairing. It is not uncommon that the pairs reflect some social structure (e.g., paired
units may live in the same household). The interference will, in such cases, align with the pairing almost
perfectly, and Assumption 5 is unlikely to hold. Pair separation is more reasonable when pairs are formed
based on generic background characteristics. This is often the case when the experimenter uses the design to
increase precision. The assumption could, however, still be violated if the background characteristics include
detailed geographical data or other information likely to be associated with interference.
Proposition 4 (Consistency under paired randomization). With a paired randomization design under our regularity conditions, restricted interference, restricted pair-induced interference and pair separation (Assumptions 1, 2, 4 and 5), the ht and há estimators are consistent for eate and converge at the following rates:
\[ \hat{\tau}_{\text{ht}} - \tau_{\text{eate}} = O_p\bigl( n^{-1} R + n^{-0.5} d_{\text{avg}}^{0.5} + n^{-0.5} b_{\text{avg}}^{0.5} \bigr), \qquad \text{and} \qquad \hat{\tau}_{\text{há}} - \tau_{\text{eate}} = O_p\bigl( n^{-1} R + n^{-0.5} d_{\text{avg}}^{0.5} + n^{-0.5} b_{\text{avg}}^{0.5} \bigr). \]

Remark 3. Similar to the Bernoulli and complete designs, root-n consistency follows if we assume bounded
interference. We must, however, now bound the newly defined interference measures as well. That is, we
must assume bavg = O(1) and R = O(n^{0.5}) in addition to Assumption 3.
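Both alignment measures can be computed directly from the interference indicators and the pairing. The sketch below does so for a small constructed example; the interference pattern and the pairing are illustrative choices.

```python
import numpy as np

def alignment_measures(I, rho):
    """b_avg (Definition 7) and R (Definition 8) for an interference matrix I,
    where I[l, j] = 1 means unit l interferes with unit j, and a pairing rho."""
    n = I.shape[0]
    d = (I.T @ I > 0).astype(int)            # interference dependence d_ij
    # b_ij = 1 when d_ij = 0 but some unit l interferes with i while its
    # partner rho(l) interferes with j.
    cross = (I.T @ I[rho] > 0).astype(int)
    b = (1 - d) * cross
    b_avg = b.sum() / n
    R = sum(I[rho[i], i] for i in range(n))  # within-pair interference
    return b_avg, R

# Toy example: four units paired as (0,1) and (2,3); unit 0 interferes with
# unit 2 and unit 1 interferes with unit 3 (besides every unit with itself).
I = np.eye(4, dtype=int)
I[0, 2] = 1
I[1, 3] = 1
rho = np.array([1, 0, 3, 2])
print(alignment_measures(I, rho))
```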

5.3 Arbitrary experimental designs
We conclude our investigations by considering an arbitrary sequence of experiments. We saw in the
previous section that the estimators may not be consistent for eate if the experimental design aligns with the
interference structure to a sufficient degree. Unlike the designs we considered so far, assignment dependencies are not easily characterized when we allow for arbitrary experimental designs. Our final investigation, thus, starts by introducing a set of definitions that allow us to characterize the alignment between the design and
the interference in a general setting.

It will prove useful to collect all treatments affecting a particular unit i into a vector:
\( Z_{I_i} = (I_{1i} Z_1, I_{2i} Z_2, \cdots, I_{ni} Z_n). \)
The vector is defined so that its jth element is Z j if j is interfering with i, and zero otherwise. Similar to our
previous definitions, let ZI−i be the (n − 1)-dimensional vector constructed by deleting the ith element from
ZIi . These definitions have the following convenient properties:

\[ Y_i = y_i(Z) = y_i\bigl(Z_{I_i}\bigr), \qquad y_i(1; Z_{-i}) = y_i\bigl(1; Z_{I_{-i}}\bigr), \qquad \text{and} \qquad y_i(0; Z_{-i}) = y_i\bigl(0; Z_{I_{-i}}\bigr). \]
These properties allow us to redefine our interference conditions using \( Z_{I_i} \). We have \( Y_i = y_i(Z_{I_i}) \), so the
outcomes of two units i and j will be independent whenever ZIi and ZIj are independent. We should, thus,
be able to characterize the worst-case outcome dependence introduced by the experimental design by the
dependence between ZIi and ZIj . Similarly, the dependence between Zi and ZI−i will govern how distorted our
view will be of the potential outcomes. If the dependence between Zi and ZI−i is asymptotically non-negligible,
we will not be able to separate the effect of i’s own treatment from potential spillover effects.
We use the alpha-mixing coefficient introduced by Rosenblatt (1956) to measure the dependence between
the assignment vectors. Specifically, for any two random variables X and Y, let:



\[ \alpha(X, Y) = \sup_{x \in \sigma(X),\; y \in \sigma(Y)} \bigl| \Pr(x \cap y) - \Pr(x) \Pr(y) \bigr|, \]

where σ(X) and σ(Y ) denote the sub-sigma-algebras generated by the respective random variable. The
coefficient α(X, Y ) is zero if and only if X and Y are independent, and increasing values indicate increasing
dependence. The maximum is α(X, Y ) = 1/4. In this sense, the coefficient is similar to the familiar correlation
coefficient. The coefficients differ in that the alpha-mixing coefficient is not restricted to linear associations
between two scalar random variables but can capture any type of dependence between any two sets of random
variables. The alpha-mixing coefficient allows us to capture the average amount of dependence between \( Z_{I_i} \) and \( Z_{I_j} \) and between \( Z_i \) and \( Z_{I_{-i}} \).
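For random vectors with small, discrete supports, the coefficient can be computed by brute force, since every event in the generated sigma-algebra is a union of atoms. The sketch below does this; as a check, it recovers the maximal value 1/4 for the two perfectly correlated treatments of a pair under the paired randomization design of the previous section, and zero for independent coin flips.

```python
import itertools
import numpy as np

def subsets(k):
    """All subsets of {0, ..., k-1}, returned as tuples."""
    return itertools.chain.from_iterable(
        itertools.combinations(range(k), r) for r in range(k + 1))

def alpha_mixing(joint):
    """Brute-force alpha-mixing coefficient of two discrete random variables.
    joint[a, b] = Pr(X = a, Y = b); events are unions of atoms, so we
    enumerate all subsets of the two (small) supports."""
    joint = np.asarray(joint, dtype=float)
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    best = 0.0
    for A in subsets(len(px)):
        for B in subsets(len(py)):
            p_ab = joint[np.ix_(A, B)].sum() if A and B else 0.0
            best = max(best, abs(p_ab - px[list(A)].sum() * py[list(B)].sum()))
    return best

# One pair under paired randomization: Z_i is a fair coin and Z_rho(i) = 1 - Z_i,
# so the joint distribution puts mass 1/2 on (0, 1) and 1/2 on (1, 0).
print(alpha_mixing([[0.0, 0.5], [0.5, 0.0]]))       # 0.25, the maximal value
print(alpha_mixing([[0.25, 0.25], [0.25, 0.25]]))   # 0.0 for independent coins
```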
Definition 9 (External and internal mixing coefficients). Let q and s be the maximum values such that
Assumptions 1b and 1c hold:
\[ \alpha_{\text{ext}} = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} (1 - d_{ij}) \bigl[ \alpha\bigl(Z_{I_i}, Z_{I_j}\bigr) \bigr]^{\frac{q-2}{q}}, \qquad \text{and} \qquad \alpha_{\text{int}} = \sum_{i=1}^{n} \bigl[ \alpha\bigl(Z_i, Z_{I_{-i}}\bigr) \bigr]^{\frac{s-1}{s}}. \]


Each term of the external mixing coefficient, \( \alpha(Z_{I_i}, Z_{I_j}) \), captures the dependence between the treatments
affecting unit i and the treatments affecting unit j. Thus, if the dependence between ZIi and ZIj tends to be
weak or rare, αext will be small compared to n. Similarly, if dependence between Zi and ZI−i tends to be
weak or rare, αint will be small relative to n. In this sense, the external and internal mixing coefficients are
direct generalizations of Definitions 7 and 8. Indeed, one can show that αext ∝ bavg and αint ∝ R under
paired randomization where the proportionality constants are given by q and s. We can use these definitions
to generalize the assumptions we made under paired randomization to an arbitrary experimental design.
Assumption 6 (Design mixing). αext = o(n).
Assumption 7 (Design separation). αint = o(n).


Design mixing and separation stipulate that dependence between treatments is sufficiently rare or sufficiently weak (or some combination thereof). This encapsulates and extends the conditions in the previous sections. In particular, complete randomization under bounded interference constitutes a setting where dependence is weak: \( \alpha(Z_{I_i}, Z_{I_j}) \) approaches zero for pairs of units with dij = 0 in that case. Paired randomization under Assumption 4 constitutes a setting where dependence is rare: \( \alpha(Z_{I_i}, Z_{I_j}) \) is equal to 1/4 for some pairs also asymptotically, but those pairs are rare. Complete randomization under Assumption 2 combines the two settings: \( \alpha(Z_{I_i}, Z_{I_j}) \) might be non-negligible asymptotically for some pairs with dij = 0, but such pairs are rare. For all other pairs with dij = 0, the pair-level mixing coefficient will approach zero quickly. A similar comparison can be made for the design separation assumption.
Proposition 5 (Consistency under arbitrary designs). Under our regularity conditions, restricted interference,
design mixing and design separation (Assumptions 1, 2, 6 and 7), the ht and há estimators are consistent for
eate and converge at the following rates:
\[ \hat{\tau}_{\text{ht}} - \tau_{\text{eate}} = O_p\bigl( n^{-1} \alpha_{\text{int}} + n^{-0.5} d_{\text{avg}}^{0.5} + n^{-0.5} \alpha_{\text{ext}}^{0.5} \bigr), \qquad \text{and} \qquad \hat{\tau}_{\text{há}} - \tau_{\text{eate}} = O_p\bigl( n^{-1} \alpha_{\text{int}} + n^{-0.5} d_{\text{avg}}^{0.5} + n^{-0.5} \alpha_{\text{ext}}^{0.5} \bigr). \]

Remark 4. Assumptions 2, 6 and 7 are incompatible unless Assumption 1 holds for some q > 2 and s > 1.
If the regularity conditions only hold for q = 2 and s = 1, the mixing coefficients can be redefined as:
\[ \alpha_{\text{ext}} = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} (1 - d_{ij}) \, \mathbf{1}\bigl[ \alpha\bigl(Z_{I_i}, Z_{I_j}\bigr) > 0 \bigr], \qquad \text{and} \qquad \alpha_{\text{int}} = \sum_{i=1}^{n} \mathbf{1}\bigl[ \alpha\bigl(Z_i, Z_{I_{-i}}\bigr) > 0 \bigr], \]
where 1[·] maps to one if the expression in brackets is true, and zero otherwise.
Remark 5. The convergence results for Bernoulli and paired randomization presented in the previous section
can be proven as consequences of Proposition 5. That is, however, not the case for complete randomization.
The current proposition applied to that design would suggest slower rates of convergence than given by
Proposition 3 (unless the potential outcomes are bounded). This highlights that Proposition 5 provides
worst-case rates for all designs that satisfy Assumptions 1a, 6 and 7. Particular designs might be better
behaved and, thus, ensure that the estimators converge at faster rates. For complete randomization, we can
prove that restricted interference implies a stronger mixing condition than the conditions defined above. In
particular, we can redefine the conditions using the ∗- or ψ-mixing coefficients as introduced by Blum et al.
(1963) and Philipp (1969) rather than the alpha-mixing coefficient. This provides rates of convergence that
are independent of which moment we can bound in our regularity conditions, and Proposition 3 would follow.
We have not encountered other designs that satisfy this stronger mixing condition and have, for this reason,
not explored that route further.
Remark 6. If no units interfere, ZI−i will be constant at zero, and Assumption 7 is trivially satisfied. However,
no-interference does not imply that Assumption 6 holds. Consider a design that restricts all treatments to
be equal, Z1 = Z2 = · · · = Zn . The external mixing coefficient would not be zero in this case; in f