Manajemen | Fakultas Ekonomi Universitas Maritim Raja Ali Haji jbes%2E2009%2E07161

Journal of Business & Economic Statistics

ISSN: 0735-0015 (Print) 1537-2707 (Online) Journal homepage: http://www.tandfonline.com/loi/ubes20

Missing Treatments
Francesca Molinari
To cite this article: Francesca Molinari (2010) Missing Treatments, Journal of Business &
Economic Statistics, 28:1, 82-95, DOI: 10.1198/jbes.2009.07161
To link to this article: http://dx.doi.org/10.1198/jbes.2009.07161

Published online: 01 Jan 2012.

Submit your article to this journal

Article views: 193

View related articles

Citing articles: 1 View citing articles

Full Terms & Conditions of access and use can be found at

http://www.tandfonline.com/action/journalInformation?journalCode=ubes20
Download by: [Universitas Maritim Raja Ali Haji]

Date: 12 January 2016, At: 00:14

Missing Treatments
Francesca M OLINARI
Department of Economics, Cornell University, 492 Uris Hall, Ithaca, NY 14853-7601 (fm72@cornell.edu)
This article analyzes the problem of identifying a treatment effect with imperfect observability of the
treatment received by the population. Imperfect observability may be due to item/survey nonresponse or
to noncompliance with randomly assigned treatments. I derive sharp worst-case bounds that are a function
of the available prior information on the distribution of missing treatments. Under the assumption of
monotone treatment response, I show that prior information on the distribution of missing treatments is
not necessary to get sharp informative bounds. I illustrate the results with an empirical analysis of drug
use and employment using data from the National Longitudinal Survey of Youth.
KEY WORDS: Bounds; Missing data; Partial identification; Treatment effect.

Downloaded by [Universitas Maritim Raja Ali Haji] at 00:14 12 January 2016

1. INTRODUCTION


A few examples may help clarify the extent of this problem.
The National Longitudinal Survey of Youth (NLSY) asks female respondents about their drinking behavior during pregnancy; in the 1984 wave of the survey, the nonresponse rate
across these items was 6%–14%. This leads to unobservability of treatments when studying the effect of drinking during
pregnancy on birth outcomes. Researchers face similar itemnonresponse rates when studying the effect of “problem drinking” or drug abuse on labor market outcomes. The Survey of
Consumer Finances asks respondents about their ownership of
stocks, bonds and businesses, as well as about their income
(from wages, business, or pension) and bank accounts. In 1995,
the nonresponse rates for these items were 6%–20%. The nonresponse rates for similar items in the 1992 wave of the Health
and Retirement Study (HRS92) and in the 1993 wave of the Assets and Health Dynamics Among the Oldest Old were 13%–
33% and 13%–45%, respectively (although in these surveys interval information is obtained from some of the nonrespondents
using unfolding brackets). The nonresponse rate in HRS92 to a
question asking whether the respondent’s children had a family income falling in one of three categories was 13.91%. This
leads to unobservability of treatments when studying the effect
of children’s family income on the probability that they receive
transfers from their parents.
In practice, in applied work it often is assumed that the data
are missing completely at random, and a complete-case (CC)
analysis is conducted (Little and Rubin 1987, chap. 3). In this
article I examine the missing-treatments problem from a “conservative” perspective, in the sense that I first determine the inferences that can be drawn in the absence of assumptions about

the missing-data mechanism and then illustrate the identifying
power of credible assumptions posed on the distribution of the
missing treatments and on the treatment response and selection mechanism. In the setting considered here, there are three
problems preventing point identification of treatment effects:
the usual latent outcome problem, the impossibility of identifying the distribution of received treatments, and the impossibility of matching the unobserved received treatments with the
observed outcomes. I propose a method to jointly address these

Part of the existing literature on program evaluation has examined what can be inferred about a treatment effect of interest
when the observability of some of the relevant variables is imperfect, and when noncompliance to assigned treatments prevents identification in randomized experiments. Little (1992),
Robins, Rotnizky, and Zhao (1994), and Wang et al. (1997)
formulated sets of assumptions strong enough to achieve point
identification for the case of missing covariate data. Imbens and
Pizer (1999) showed that the assumption of covariate data missing at random can be tested in randomized experiments with
complete random assignment of treatment, and proposed models that allow point identification of the parameters of interest.
Horowitz and Manski (2000) studied the problem of missing
outcome and covariate data in randomized experiments, and
derived sharp bounds on the distribution of outcomes conditional on covariates without invoking untestable assumptions
on the missing data mechanism. Angrist, Imbens, and Rubin
(1996) addressed the problem of imperfect compliance in classical randomized experiments. They posed a set of assumptions under which it is possible to identify the treatment effect
within the subpopulation of persons who comply with the assigned treatment. Balke and Pearl (1997) studied similar identification problems under weaker assumptions. They made use

of the statistical independence between the response functions
and the assigned treatments to propose alternative, assumptionfree sharp bounds for assessing the average effect of treatment
over the population as a whole. A different problem was addressed by Hotz, Mullin, and Sanders (1997), who studied what
can be learned about treatment effects when using a contaminated instrumental variable, that is, when a mean-independence
assumption holds in a population of interest, but the observed
population is a mixture of the population of interest and one in
which the assumption does not hold.
A common feature in this literature is the assumption of perfect observability of the received treatment (in both randomized experiments and observational studies), or ignorability of
the data with missing treatments. This assumption often is at
odds with the empirical evidence. Item nonresponse in surveys
can affect the observability of a variable whose effect is under
study, and often it is not plausible to assume that the decision to
respond to a specific question in a survey is random, or that the
distribution of treatments between respondents and nonrespondents is the same.

© 2010 American Statistical Association
Journal of Business & Economic Statistics
January 2010, Vol. 28, No. 1
DOI: 10.1198/jbes.2009.07161
82


Downloaded by [Universitas Maritim Raja Ali Haji] at 00:14 12 January 2016

Molinari: Missing Treatments

problems, and show that significant progress can be made when
the researcher has some prior information on the distribution of
the missing treatments.
The article is organized as follows. Section 2 introduces the
notation and the questions of interest. Section 3 derives worstcase bounds. It shows that when nothing is assumed about the
distribution of the missing treatments, then no information can
be extracted from the observations for which the received treatment is unknown, with the additional degree of underidentification proportional to the fraction of missing data. Section 4.1
shows that under the maintained assumption of monotone treatment response (Manski 1997), such prior knowledge is not
needed to extract information from the observations for which
the received treatment is unknown, but shrinks the bounds significantly when it is available. Section 4.2 derives bounds on
the parameters of interest for the case where treatment selection is exogenous. Section 5 comments on the implications of
the missing-treatments problem. The bounds derived in this article are sharp, in the sense that they exhaust all information
available from the data and the maintained assumptions. In Sections 3–5, to keep the focus on identification, I treat identified
quantities as known. Section 6 illustrates the theoretical results
with an application studying the effect of drug abuse during

work hours on unemployment. It shows that the assumption of
ignorability of the observations with missing treatment data can
be rejected for this problem. I address statistical considerations
following the approach of Beresteanu and Molinari (2008). Section 7 concludes. All proofs are given in the Appendix.
2. SETUP OF THE PROBLEM
Using standard notation (e.g., Neyman 1923), let each member j of a population of interest J be characterized by some
covariates xj ∈ X, be exposed to a set of mutually exclusive and
exhaustive treatments T, and have a specific response function
yj (·) : T → Y mapping treatments t ∈ T into outcomes yj (t) ∈ Y.
If zj ∈ T is the treatment that individual j actually receives, then
yj ≡ yj (zj ) is the realized outcome, whereas yj (t) is a latent outcome for t = zj . Denote by dj a binary variable that takes value
1 if the treatment received by individual j is observed and 0
otherwise, and assume that the population is a measure space
(J, , P), with probability measure P. In the analysis developed in this article I assume perfect observability of realized
outcomes as well as covariates; I also assume that all variables
are measured correctly. Such assumptions are maintained to focus attention on the problem of missing treatments; the results
of Horowitz and Manski (1995, 1998, 2000) can be easily incorporated into the analysis in case of missing or contaminated
outcome data, or missing covariate data. The results of Molinari
(2003) can be applied in the presence of classification error in
the outcome data, the covariate data, or the treatment data.

The researcher learns the distribution P[y, x, d] of realized
outcomes, covariates, and observability of the realized treatments, as well as the distribution P[z|x, y, d = 1] of realized
treatments given covariates, realized outcome, and observability of the realized treatment. The researcher’s problem is to
learn the distribution P[y(·)|x] of response functions to be able
to infer the effect of a treatment.

83

In Sections 3 and 4.1 I study the problem of missing treatments in the context of observational studies with nonexogenous selection of treatment. This problem also can occur in observational studies with exogenous selection of treatment and
in randomized experiments, either because the assigned treatments are partially unobservable or because there is uncertainty
about the degree of individuals’ compliance with the treatments. In Section 4.2 I show that if there is full compliance
and partial unobservability of the assigned treatment (or, in the
context of observational studies, if the treatment is exogenously
assigned), then the problem can be expressed as a case of missing covariates identical to the problem studied by Horowitz and
Manski (1998, 2000), and thus their results apply. If there is
perfect observability of the assigned treatment but uncertainty
as to the degree of compliance, then the problem can be approached by adapting the method of Balke and Pearl (1997),
who considered the problem of an experimental study with random assignment but imperfect compliance (i.e., the treatment
received differs from that assigned). In the case of observability of the received treatments, they derived sharp bounds on the
average treatment effect by solving a complex linear programming problem. The same method can be used in the case of unobservability of the received treatment or uncertainty about the

degree of compliance, as long as d can be perfectly observed.
(If it is only known that a fraction of the agents do not comply
with the assigned treatments, but not who does not comply, then
the method does not apply, because in that case we are facing a
contamination problem.)
To simplify notation, I omit the covariates in what follows.
I assume that the outcome variable y takes values in a bounded
set Y, where K0 ≡ inf Y and K1 ≡ sup Y are known finite numbers. For ease of exposition, I assume that T = {0, 1}; in case
of multiple treatments, all of the results in Sections 3 and 4.2,
and some of the results in Section 4.1 still hold. Because the
focus of this article is the missing-treatments problem, I assume that 0 < Pr(d = 0) < 1. I address the following questions:
(1) What can be learned about E[y(t)], t ∈ T, the average outcome under a mandatory policy? (2) What can be learned about
E[y(1)]−E[y(0)], the average treatment effect (ATE)? (3) What
can be learned about E[y(t)] − E[y], that is, the status quo treatment effect (STE)?
3. WORST–CASE BOUNDS
Here I analyze what can be learned about the treatment effects of interest when nothing is assumed about the distribution
of the missing treatments. The availability of prior information
on this distribution turns out to be crucial; if nothing is known
about Pr[z = 1|d = 0], then no information can be extracted
from the observations for which the treatments are missing. But

if a bound (or point identification) on Pr[z = 1|d = 0] is available, then it is possible to extract information from the observations with missing treatment data.
3.1 Worst-Case Bounds on a Mandatory Policy
and on the STE
Suppose that we are interested in E[y(1)]; by the law of iterated expectations,
E[y(1)] = E[y(1)|d = 1] Pr(d = 1) + E[y(1)|d = 0] Pr(d = 0).
(1)

84

Journal of Business & Economic Statistics, January 2010

From the data, we can learn Pr(d = 1) and Pr(d = 0). Regarding the other two terms on the right side of (1), first focus
attention on E[y(1)|d = 1]:
E[y(1)|d = 1] = E[y|d = 1, z = 1] Pr[z = 1|d = 1]
+ E[y(1)|d = 1, z = 0] Pr[z = 0|d = 1].

(2)

The equality in (2) expresses the usual problem of learning
the distribution of outcomes under a mandatory policy (in this

case assigning treatment 1); the only unobserved quantity is the
counterfactual probability of success under the treatment for
people who actually did not receive it: E[y(1)|d = 1, z = 0].
Now consider E[y(1)|d = 0]:

Downloaded by [Universitas Maritim Raja Ali Haji] at 00:14 12 January 2016

E[y(1)|d = 0] = E[y|d = 0, z = 1] Pr[z = 1|d = 0]
+ E[y(1)|d = 0, z = 0] Pr[z = 0|d = 0].

(3)

The problem arising in the case of missing treatments is that
all quantities in (3) are unknown; not only do we have the usual
problem of latent outcomes, but we also do not know the distribution of treatments when they are unobservable. Moreover,
the data do not reveal how to match the realized outcome with
the received but unobservable treatment. The only thing that we
can learn from the data is the distribution of realized outcomes
under unobserved treatments; that is, we can learn
Q(s) ≡ Pr[y ≤ s|d = 0]

= Pr[y ≤ s|d = 0, z = 1]p + Pr[y ≤ s|d = 0, z = 0](1 − p),
where p ≡ Pr[z = 1|d = 0]. If p is known, then the result of
corollary 4.1 of Horowitz and Manski (1995) can be used to
find a sharp bound on E[y(1)|d = 0] using the information provided by Q (similarly for E[y(0)|d = 0]). But if we do not know
anything about p, then knowledge of Q does not help bound the
outcome distribution under received treatment unobservability.
Indeed, it may be that all treatments that we do not observe are
of type 0, in which case knowledge of Q does not provide any
information about E[y(1)].
Proposition 1 states what can be learned about a mandatory
policy both when p is known and when p is unknown. Before
stating the result, I introduce some additional notation. For any
δ ∈ [0, 1], let r(δ) denote the δ-quantile of Q: r(δ) ≡ inf{s :
Q(s) ≥ δ}. Define probability distributions Lδ and Uδ on ℜ as
follows:

Q(s)
for s < r(δ)
Lδ [−∞, s] ≡
δ
1
for s ≥ r(δ),

0
for s < r(1 − δ)
Uδ [−∞, s] ≡ Q(s) − (1 − δ)
for s ≥ r(1 − δ).
δ
Proposition 1. Given the value of p ∈ [0, 1] and no other information, the sharp bounds on E[y(1)|d = 0] and on E[y(1)]
are given by
E[y(1)|d = 0]
 


∈ p y dLp + (1 − p)K0 , p y dUp + (1 − p)K1 ,

(4)



E[y|d = 1, z = 1] Pr[z = 1, d = 1] + P[z = 0, d = 1]K0

 
+ p y dLp + (1 − p)K0 Pr(d = 0)
≤ E[y(1)]


≤ E[y|d = 1, z = 1] Pr[z = 1, d = 1] + P[z = 0, d = 1]K1

 
(5)
+ p y dUp + (1 − p)K1 Pr(d = 0).
In the absence of knowledge of p, the sharp bounds on
E[y(1)|d = 0] and on E[y(1)] are given by
K0 ≤ E[y(1)|d = 0] ≤ K1 ,
E[y|d = 1, z = 1] Pr[z = 1, d = 1]
+ (P[z = 0, d = 1] + Pr(d = 0))K0
≤ E[y(1)]
≤ E[y|d = 1, z = 1] Pr[z = 1, d = 1]
+ (P[z = 0, d = 1] + Pr(d = 0))K1 .
Whereas the width of the bound on E[y(1)] with no missing
data is equal to Pr[z = 0](K1 − K0 ), with missing treatments it
increases to (Pr[z = 0|d = 1] Pr(d = 1) + Pr(d = 0))(K1 − K0 ).
Now consider the case in which the researcher is interested
in learning the STE, which can be decomposed as follows:
E[y(t)] − E[y]



= E[y(t)|d = 1] − E[y|d = 1] Pr(d = 1)



+ E[y(t)|d = 0] − E[y|d = 0] Pr(d = 0),

t = 0, 1.
(6)

Manski and Nagin (1998) derived sharp worst-case bounds on
the STE under observability of the received treatment, the quantity in curly brackets in the first line of equation (6). Here
we have the additional problem of missing treatments. But the
bounds on the STE can be easily obtained from Proposition 1,
observing that the STE is obtained by subtracting an identified
quantity, E[y], from E[y(t)]. The following corollary (the proof
of which is omitted) gives the results for t = 1.
Corollary 2. Given the value of p ∈ [0, 1] and no other information,
E[y(1)|d = 0] − E[y|d = 0]
 
∈ p y dLp + (1 − p)K0 − E[y|d = 0],
p




y dUp + (1 − p)K1 − E[y|d = 0] .

This bound is informative for any value of p. The sharp bound
on E[y(1)] − E[y] is given by


E[y|d = 1, z = 1] Pr[z = 1, d = 1] + P[z = 0, d = 1]K0

 
+ p y dLp + (1 − p)K0 Pr(d = 0) − E[y]
≤ E[y(1)] − E[y]

Molinari: Missing Treatments

85



≤ E[y|d = 1, z = 1] Pr[z = 1, d = 1] + P[z = 0, d = 1]K1

 
+ p y dUp + (1 − p)K1 Pr(d = 0) − E[y].

The bounds on the STE always cover zero.
3.2 Worst-Case Bounds on the ATE

Now consider the case in which the researcher is interested
in learning the ATE, which we can decompose as follows:

Downloaded by [Universitas Maritim Raja Ali Haji] at 00:14 12 January 2016

E[y(1)] − E[y(0)]



= E[y(1)|d = 1] − E[y(0)|d = 1] Pr(d = 1)



+ E[y(1)|d = 0] − E[y(0)|d = 0] Pr(d = 0). (7)

Manski (1995) derived sharp worst-case bounds on the ATE
under observability of the received treatment, the quantity in
curly brackets in the first line of Equation (7). Proposition 3
states what can be learned about the ATE both when p is known
and when it is unknown.
Proposition 3. Given the value of p ∈ [0, 1] and no other information, the sharp bound on E[y(1)|d = 0] − E[y(0)|d = 0] is
given by


=
p
y
dL
+
(1

p)K

(1

p)
y dU1−p − pK1 ,
LBd=0
p
0
ATE
UBd=0
ATE

=p



y dUp + (1 − p)K1 − (1 − p)



y dL1−p − pK0 .

This bound is informative for any value of p. The sharp bound
on E[y(1)] − E[y(0)] is given by
d=0
LBd=1
ATE · Pr(d = 1) + LBATE · Pr(d = 0)

≤ E[y(1)] − E[y(0)]

d=0
≤ UBd=1
ATE · Pr(d = 1) + UBATE · Pr(d = 0),

(8)

where
LBd=1
ATE = E[y|d = 1, z = 1] Pr[z = 1|d = 1]
+ K0 Pr[z = 0|d = 1]

3.3 Identifying the Power of Assumptions
on Pr[z = 1|d = 0]
The bounds derived in Propositions 1 and 3 and in Corollary 2 are functions of p ≡ Pr[z = 1|d = 0]. I have shown that
as long as p is unknown, no information can be extracted from
the observations with missing treatment data. However, if some
prior information on the distribution of the missing treatments
is available, progress can be made. This is illustrated in the empirical application in Section 6.
Suppose, for example, that p is identified, either because the
results of a validation study are available or because one is willing to assume that the treatments are missing at random (TMR),
that is, d ⊥ z, which implies
TMR :

− K1 Pr[z = 1|d = 1],

UBd=1
ATE = E[y|d = 1, z = 1] Pr[z = 1|d = 1]
+ K1 Pr[z = 0|d = 1]
− E[y|d = 1, z = 0] Pr[z = 0|d = 1]
− K0 Pr[z = 1|d = 1].
In the absence of knowledge of p, the sharp bounds on
E[y(1)|d = 0] − E[y(0)|d = 0] and on E[y(1)] − E[y(0)] are
given by
−(K1 − K0 ) ≤ E[y(1)|d = 0] − E[y(0)|d = 0] ≤ (K1 − K0 ),
(9)
LBd=1
ATE · Pr(d = 1) − (K1 − K0 ) Pr(d = 0)
≤ E[y(1)] − E[y(0)]

(10)

The width of the band in (10) is equal to (K1 − K0 )(1 + Pr(d =
0)).

p = Pr[z = 1|d = 1].

[Note that in this case we are not assuming d ⊥ (y(0), y(1), z).]
Then the foregoing bounds can be evaluated at this value of p.
Now consider the case of survey questions concerning illicit
activities, socially unacceptable behaviors, or any other activity that is socially stigmatized in varying degrees. In that case
there can be reason to be concerned that the decision to nonrespond is motivated by respondents’ reluctance to report that
they engaged in such activities. If this is the case, then it can
be credible to assume that the probability that nonrespondents
engaged in such activities is at least as high as the probability that respondents did. This assumption was used by Pepper
(2001). Formally, if z = 1 denotes engaging in the activity (in
our example, being under the effect of illicit drugs during work
hours), then the assumption of stigma affecting response (SAR)
implies
SAR :

− E[y|d = 1, z = 0] Pr[z = 0|d = 1]

≤ UBd=1
ATE · Pr(d = 1) + (K1 − K0 ) Pr(d = 0).

The bounds on the ATE always cover zero.

Pr[z = 1|d = 0] ≥ Pr[z = 1|d = 1].

The identifying power of this assumption is obviously
weaker than the identifying power of the TMR assumption;
whereas in that case we get point identification of p, here we
simply shrink the range of p to be [Pr[z = 1|d = 1], 1].
More generally, one may learn that p ∈ [p1 , p2 ]. Then the
bound on the ATE can be calculated by taking (respectively) the
infimum value and the supremum value that the lower bound
and the upper bound reported in Proposition 3 achieve for p
ranging in [p1 , p2 ]. Regarding the mandatory policy, the bound
on E[y(1)] can be calculated by evaluating the bound in Proposition 1 at p = p1 (while the information provided by p2 can be
used to bound more tightly E[y(0)]). Another example along
these lines can be found in recent work of Kreider and Hill
(2005), who applied some of the results in this article to study
the problem of learning utilization rates of health services under a hypothetical policy of universal health insurance, when
insurance status is subject to classification error and is unverified for some respondents (the respondents with missing treatments).

Downloaded by [Universitas Maritim Raja Ali Haji] at 00:14 12 January 2016

86

Journal of Business & Economic Statistics, January 2010

4. MISSING TREATMENTS WITH ASSUMPTIONS ON
SELECTION AND RESPONSE

These bounds are informative even when p is unknown (and for
this case they are given in the Appendix).

4.1 Monotonicity Assumptions

The results for the STE can be easily obtained by subtracting E[y] from the lower bound and from the upper bound on
E[y(t)].
The result in Proposition 4 is quite intuitive; if we assume
yj (0) ≤ yj (1), it follows that assigning treatment 0 as a mandatory policy cannot imply a larger outcome than the realized one,
whereas assigning treatment 1 as a mandatory policy cannot imply a smaller outcome than the realized one. This observation
explains the upper bound (lower bound) on E[y(0)] (E[y(1)]).
The other bound is obtained applying a similar argument as in
Proposition 1. If what is known about the probability of treatment is that p ∈ [p1 , p2 ], then the bounds can be calculated
by taking (respectively) the infimum value and the supremum
value that the lower bounds and the upper bounds reported in
Proposition 4 achieve for p ranging in [p1 , p2 ].
Assume now that E[y(t)|z = u] is weakly increasing in u; assume further that this restriction holds for the subpopulations
with complete data and incomplete data. In the case studied
here, because T = {0, 1}, this implies that for each t ∈ T,

Manski (1997) investigated what may be learned about treatment response when it is assumed that response functions
are monotone, semimonotone, or concave monotone and no
assumptions are imposed on the treatment selection process.
He showed that assuming monotone or concave monotone response qualitatively improves the identification problem relative to the worst-case situation in which no prior information
is available. Manski and Pepper (2000) studied the identifying
power of monotone instrumental variable assumptions, in particular the special case of monotone treatment selection (MTS).
They showed that the joint assumption of monotone treatment
selection and response can be tested, and that if such joint assumption is maintained, then informative bounds can be obtained even if Y is unbounded.
In this section I study the identifying power of assuming
monotone treatment response (MTR), and of jointly assuming monotone treatment response and selection (MTR–MTS),
when some of the treatments are missing. I show that under the
maintained assumption of monotone treatment response (alone
or jointly with MTS), one can extract information from the observations for which the treatment data are missing even without any prior knowledge of p ≡ Pr[z = 1|d = 0]. Clearly, prior
information on p shrinks the bounds further.
Assume that the response function is weakly increasing; in
the case studied here, because T = {0, 1}, this implies that for
each j ∈ J,
MTR :

yj (0) ≤ yj (1).

(11)

I return to the interpretation of this assumption in Section 6.
Proposition 4. Suppose that the treatment response function is weakly increasing. Given the value of p ∈ [0, 1] and no
other information, the sharp bounds on E[y(t)], t = 0, 1, and on
E[y(1)] − E[y(0)] are given by
K0 Pr[z = 1, d = 1] + E[y|z = 0, d = 1] Pr[z = 0, d = 1]



+ (1 − p) y dL1−p + pK0 Pr(d = 0)
≤ E[y(0)] ≤ E(y),
≤ E[y|z = 1, d = 1] Pr[z = 1, d = 1]
+ K1 Pr[z = 0, d = 1]
 

+ p y dUp + (1 − p)K1 Pr(d = 0),
0 ≤ E[y(1)] − E[y(0)]
≤ (E[y|z = 1, d = 1] − K0 ) Pr[z = 1, d = 1]
+ (K1 − E[y|z = 0, d = 1]) Pr[z = 0, d = 1]
 

+
p y dUp + (1 − p)K1
− (1 − p)



y dL1−p + pK0



i = 0, 1.

(12)

I return to the interpretation of this assumption in Section 6.
Given (12), the results of Manski and Pepper (2000) can be
used to tighten the bounds on E[y(t)|d = 1], t = 0, 1. Regarding
E[y(t)|d = 0], t = 0, 1, for a given value of p, the following
bounds are obtained.
Proposition 5. Suppose that E[y(t)|z = 1, d = 0] ≥ E[y(t)|
z = 0, d = 0], t = 0, 1. Given the value of p ∈ [0, 1] and no
other information, the sharp bounds on E[y(t)|d = 0], t = 0, 1,
are given by

y dL1−p ≤ E[y(0)|d = 0]
≤ (1 − p)
p





y dU1−p + pK1 ,

y dLp + (1 − p)K0 ≤ E[y(1)|d = 0] ≤



y dUp .

If one is willing to impose the MTR and MTS assumptions
jointly, then the width of the bounds shrinks significantly. The
following proposition gives the result.

E(y) ≤ E[y(1)]



E[y(t)|z = 1, d = i] ≥ E[y(t)|z = 0, d = i],

MTS :

Pr(d = 0).

Proposition 6. Suppose that (11) and (12) jointly hold. Given
the value of p ∈ [0, 1] and no other information, the sharp
bounds on E[y(t)], t = 0, 1, and on E[y(1)] − E[y(0)] are given
by

E[y|z = 0, d = 1] Pr(d = 1) + y dL1−p Pr(d = 0)
≤ E[y(0)] ≤ E(y),

E(y) ≤ E[y(1)] ≤ E[y|z = 1, d = 1] Pr(d = 1)

+ y dUp Pr(d = 0),
0 ≤ E[y(1)] − E[y(0)]

(13)

(14)

Molinari: Missing Treatments

87




≤ E[y|z = 1, d = 1] − E[y|z = 0, d = 1] Pr(d = 1)



y dUp − y dL1−p Pr(d = 0).
+

Downloaded by [Universitas Maritim Raja Ali Haji] at 00:14 12 January 2016

These bounds are informative even when p is unknown (and for
this case they are given in the Appendix).
The results for the STE can be easily obtained by subtracting E[y] from the lower bound and from the upper bound on
E[y(t)].
The joint MTR–MTS assumption is a testable hypothesis that
should be rejected if E[y|z = u] is not weakly increasing in u.
When some of the treatments are missing, it is straightforward
to show that such assumption still can be tested on the subpopulation for which the treatments are observable; however, the
hypothesis cannot be tested on the subpopulation with missing
data or thus on the population as a whole.
4.2 Exogenous Treatment Selection
Consider the case where in an observational study it can be
argued convincingly that there is exogenous treatment selection
(ETS) or in a randomized experiment the assigned treatments
are partially unobservable but there is full compliance. In such
a case it may be credible to assume P[y(t)] = P[y|z = t], t ∈ T.
This assumption can be weakened to
ETS :

E[y(t)] = E[y|z = t],

t ∈ T.

Under the ETS assumption, the parameters of interest become
(1) E[y(t)] = E(y(t)|z = t), t ∈ T; (2) E[y(1)] − E[y(0)] =
E[y(1)|z = 1] − E[y(0)|z = 0]; and (3) E[y(t)] − E(y) =
E[y(t)|z = t] − E(y), t ∈ T.
In this case the missing treatments are the only problem preventing point identification of the parameters of interest. The
exogenous selection of treatments implies that here the results
of Horowitz and Manski (1998, 2000) regarding partial identification with missing covariate data apply. In particular, given
the value of p ∈ [0, 1], let
π1 (p) =
π0 (p) =

Pr[z = 1|d = 1] Pr(d = 1)
,
Pr[z = 1|d = 1] Pr(d = 1) + p Pr(d = 0)

Pr[z = 0|d = 1] Pr(d = 1)
.
Pr[z = 0|d = 1] Pr(d = 1) + (1 − p) Pr(d = 0)

Then
E[y|z = 0, d = 1]π0 (p) + (1 − π0 (p))
≤ E[y(0)]



y dL1−p

≤ E[y|z = 0, d = 1]π0 (p) + (1 − π0 (p))
E[y|z = 1, d = 1]π1 (p) + (1 − π1 (p))
≤ E[y(1)]





y dU1−p , (15)

y dLp

≤ E[y|z = 1, d = 1]π1 (p) + (1 − π1 (p))



y dUp .

(16)

The bounds on the STE are obtained by subtracting E(y) from
both the lower and upper bounds in the foregoing equation. The
bounds on the ATE are

E[y|z = 1, d = 1]π1 (p) + (1 − π1 (p)) y dLp
− E[y|z = 0, d = 1]π0 (p) − (1 − π0 (p))
≤ E[y(1)] − E[y(0)]
≤ E[y|z = 1, d = 1]π1 (p) + (1 − π1 (p))



− E[y|z = 0, d = 1]π0 (p) − (1 − π0 (p))



y dU1−p

y dUp


y dL1−p .

When all that is known about p is that it belongs to an interval
[p1 , p2 ], with 0 ≤ p1 ≤ p2 ≤ 1, the bounds on the parameters
of interest can be calculated by taking (respectively) the infimum value and the supremum value that the foregoing lower
and upper bounds achieve for p ranging in [p1 , p2 ].
5. THE IMPLICATIONS OF MISSING TREATMENTS
The presence of missing treatments adds a layer of underidentification to the selection problem studied in the treatment
effects literature. This additional identification problem stems
from the fact that for the subpopulation with missing treatment
data, the observable outcomes are a mixture of the outcomes experienced under the two treatments of interest, but which treatment each individual receives is unknown. Assumptions on the
probability p of treatment in this subpopulation (or on the treatment response function) allow information to be extracted from
the individuals with missing treatment data as well.
As may be expected, the inference drawn under the assumptions at the base of the CC analysis (i.e., ignoring the missing
treatments) and/or under the ETS assumption lead to bounds
that are a proper subset of the worst-case ones derived in Section 3. In the absence of missing treatments, it also is true that
the ETS assumption leads to point identification of the parameters of interest and that the point-identified quantities lie inside
the worst-case bounds, as well as inside the bounds obtained in
Section 4.1 under the MTR and under the MTR–MTS assumption.
In the presence of missing treatments, if either the MTR or
the joint MTR–MTS assumption is maintained, then if p is unknown, the CC analysis continues to give bounds on the ATE
that are a proper subset of those obtained when also considering
the observations with missing treatment data. But when evaluating the effect of a mandatory policy, the CC analysis does not
necessarily lead to narrower bounds than when considering all
observations, even when nothing is known about p. This is because under the MTR assumption, information can be extracted
from the observations with missing treatments irrespective of p.
The CC analysis ignores this information. In particular, it can
be easily shown that if E[y|d = 1] = E[y|d = 0], then, under
the maintained assumption of MTR, either the lower bound on
E[y(0)] obtained ignoring the observations with missing data is
strictly smaller than that obtained using all of the observations,
or the upper bound on E[y(1)] obtained ignoring the observations with missing data is strictly larger than that obtained using

Downloaded by [Universitas Maritim Raja Ali Haji] at 00:14 12 January 2016

88

Journal of Business & Economic Statistics, January 2010

all of the observations. Moreover, the width of the bound in the
CC analysis is not necessarily narrower than that of the bound
obtained using all of the observations.
In the presence of missing treatments, the bounds for the ATE
obtained under the ETS assumption are not necessarily a subset
of those obtained under the MTR or the MTR–MTS assumption, unless one performs a CC analysis. The same is true for the
bounds on the effect of a mandatory policy. Moreover, the width
of the bounds obtained under the ETS assumption does not
need to be narrower than the width of the bounds obtained under the MTR–MTS assumption. Although this result may seem
counterintuitive, it can be easily explained by comparing Equations (13)–(14) with Equations (15)–(16). For example, in both
cases the lower bounds on E[y(0)] involve E[y|z = 0, d = 1]
and y dL1−p , but these quantities are averaged using different weights. Which weighting leads to the higher lower bound
depends on the specific problem at hand.
6. EMPIRICAL APPLICATION
6.1 The Effect on Unemployment of Drug
Use During Work Hours
Illicit drug and alcohol abuse generally have been associated
with huge economic costs, and such association has been a motivation for drug-related and alcohol-related public policies in
the United States and worldwide. A large share of these costs is
related to reduced labor productivity (Harwood, Fountain, and
Livermore 1998). It may seem logical to expect a negative relationship between the presence and severity of an alcohol and/or
drug abuse problem and the employment status, labor supply,
and wage rate of individuals. But although evidence exists that
“problem drinking” is associated with lower employment rates
and greater unemployment (e.g., Mullahy and Sindelar 1996),
the findings on the effects of drug use on labor market outcomes
remain controversial. For example, Kandel and Davies (1990),
Kaestner (1991), Register and Williams (1992), and Gill and
Michaels (1992) found a positive correlation between drug use
and wages, even when individual characteristics and endogeneity were accounted for. At the same time, Kaestner (1994) and
Buchmueller and Zuvekas (1998) found a significant negative
effect of drug use on employment or labor supply for males, but
an insignificant or even a positive effect for females. Kandel and
Yamaguchi (1987) studied the effect of drug use on job mobility and found that drug use predicted job turnover and decreased
tenure on the job; however, their results suggest that these effects probably reflect the influence of preexisting differences
among individuals who start using drugs, instead of the effects
of drugs themselves. One problem affecting the empirical work
in this area is the relatively high fraction of missing data (see,
e.g., Pepper 2001); people are reluctant to answer questions relative to illicit activities or stigmatized activities, such as drug
abuse and “problem drinking.” In practice, researchers often assume that the data are missing completely at random (MCAR),
and conduct their analysis only on the subpopulation of respondents. I adopt a more conservative approach and study the effect
for the population as a whole of drug use during work hours on
the average number of weeks of unemployment.

Data. I use data from the NLSY. In its base year of 1979,
the NLSY interviewed 12,686 persons age 14–22. The survey
has been updated each year from 1979 to the early 1990s and
every 2 years thereafter. The data contain detailed information
on a respondent’s labor market experience and family and personal background. Approximately half of the total NLSY respondents were randomly sampled, with the remainder selected
to overrepresent certain demographic groups (see BLS 1999).
In what follows I restrict attention to the randomly sampled subpopulation; thus problems connected with sampling design can
be ignored. In 1984 and 1988 the respondents were asked questions about their lifetime and current use of several illicit drugs.
In 1984 a (randomly sampled) group of 1441 respondents who
had been employed either in that year or in the past were asked
whether in their most recent job since the 1983 interview, they
had ever been under the effect of illicit drugs during work hours.
Although self-reports may provide much information about
an individual’s behavior, the validity of such reports is sometimes questioned. In this article I assume that when respondents answer the question on drug abuse, they do so truthfully.
Although this assumption is commonly used, it clearly is a
strong one. I maintain it here to focus on the missing-treatments
problem. Provided that one has some prior information on the
amount of classification error in drug use during work hours,
the assumption of truthful reports can be relaxed, and partial
identification of the parameters of interest accounting for both
missing data and misclassification can be obtained merging the
results in this article with those of Molinari (2008, sec. 5).
Some empirical evidence exists supporting the assumption
of valid reports for drug use during work hours. Mensch and
Kandel (1998) compared the declared illicit drug use in the
1984 youth survey with the reports of other national surveys
of drug use and found underreporting of drug abuse (other than
marijuana). But they also suggested that such underreporting
seemed to be more common in light drug users than in heavy
users. As Gleason, Veum, and Pergamit (1991) argued, it may
be that individuals who use drugs at work are more frequent
users, and as Mensch and Kandel (1998) documented, less
likely to underreport their drug use.
I focus attention on the 1345 (randomly sampled) respondents who were employed in 1983. Thus my empirical analysis concerns the subpopulation of persons who, in the notation introduced in Section 2, have the shared observable covariate x = {employed in 1983} (covariates left implicit in Secs. 3
and 4). Out of this group, 236 persons answered that they used
drugs during work hours, 994 answered that they did not, and
115 refused to answer the question (8.55%). I take the outcome of interest to be the number of weeks that an individual was unemployed during the calendar year 1983–1984. If
the respondent answered the question on drug abuse (dj = 1),
then I observe whether she has been under the effect of drugs
during work hours (zj = 1|dj = 1) or not (zj = 0|dj = 1); if
the respondent skipped the question, the I register the missing
data (dj = 0). In the subpopulation of respondents, the average
number of weeks of unemployment is relatively low regardless of drug use. Abstracting from sample variability, E[y|z =
1, d = 1] = 3.267, and E[y|z = 0, d = 1] = 2.703, whereas
Pr[z = 1|d = 1] = 0.192. If exogenous treatment selection (i.e.,
z ⊥ {y(0), y(1)}) and ignorability of the observations with missing treatment data (i.e. d ⊥ {y(0), y(1), z}) is assumed, then it

Molinari: Missing Treatments

89

Table 1. NLSY descriptive statistics (n = 1345)
Point est.

95% CI

Probability of missing treatments: Pr[d = 0]

0.086

[0.071, 0.100]

Average number of weeks of unemployment in subpopulation for whom
treatments are observed and drugs are used: E[y|z = 1, d = 1]

3.267

[2.317, 4.217]

2.703

[2.303, 3.104]

2.811

[2.440, 3.182]

1.930

[0.914, 2.947]

0.564

[−0.466, 1.593]

−0.108

[−0.654, 0.438]

Probability of drug use given observable treatments: Pr[z = 1|d = 1]

Average number of weeks of unemployment in subpopulation for whom
treatments are observed and drugs are not used: E[y|z = 0, d = 1]

Average number of weeks of unemployment in subpopulation for whom
treatments are observed: E[y|d = 1]

Average number of weeks of unemployment in subpopulation for whom
treatments are not observed: E[y|d = 0]

Downloaded by [Universitas Maritim Raja Ali Haji] at 00:14 12 January 2016

ATE assuming exogenous treatment selection and treatments missing
completely at random: E[y|z = 1, d = 1] − E[y|z = 0, d = 1]
STE assuming exogenous treatment selection and treatments missing
completely at random: E[y|z = 0, d = 1] − E[y|d = 1]

can be concluded that the ATE is equal to 0.564 weeks, and
that the STE comparing a mandatory policy of no illegal drugs
use to the status quo is equal to −0.108 weeks. This implies
that not using drugs decreases the average number of weeks
of unemployment in a calendar year by little more than onehalf of a week compared with using drugs, and by a little more
than one-tenth of a week compared with the status quo. Table 1
summarizes these descriptive statistics, along with their 95%
confidence intervals.
Testing for Missing Completely at Random and for Validity of the Complete-Case Analysis. When facing missing-data
problems as the one described earlier, researchers often conduct “complete-case” analyses, in which observations with any
missing values are simply discarded. In the context of treatment effects, when the treatment data are partially unobservable, this can lead to valid inference if (a) the treatment data are
MCAR, (b) d ⊥ {(y(0), y(1))|x}, and (c) d ⊥ {(y(0), y(1), z)|x}.
Note that assuming that P[y(t)|z, d, x] = P[y(t)|z, x] alone does
not imply validity of the CC analysis. Validity would be ensured if it also is assumed that Pr(z = t|d, x) = Pr(z = t|x).
The assumption d ⊥ {(y(0), y(1))|x}, is not testable and generally does not seem appealing. For example, if Pr(z = t|d, x) =
Pr(z = t|x) and P[y(t)|z, d, x] = P[y(t)|z, x], then it follows
that P[y(t)|x, d = 1] = P[y(t)|x, d = 0]. Similarly, if Pr(z =
 P[y(t)|z, x], then it
t|d, x) = Pr(z = t|x) and P[y(t)|z, d, x] =
again follows that P[y(t)|x, d = 1] =
 P[y(t)|x, d = 0].
On the other hand, a necessary condition for the assumption
d ⊥ {(y(0), y(1), z)|x} and the MCAR assumption to hold can
be tested. In particular, the following result is easy to verify:
Lemma 7. Suppose that d ⊥ {(y(0), y(1), z)|x}. Then
P[y|d = 1, x] = P[y|d = 0, x].

(17)

Clearly, if the observability of the treatments is independent of the response functions and also of the received treatments conditional on the observed covariates, then, conditional
on the observed covariates, the distribution of realized outcomes for the subpopulation for which the received treatments
are observable should be the same as that for the subpopulation with missing treatment data. In practice, before conducting

0.192

[0.170, 0.214]

CC analyses, one can test whether the equality in (17) holds.
Note that if the assumptions P[y(t)|z, d, x] = P[y(t)|z, x] and
P(z|d, x) = P(z|x) are maintained, then condition (17) should
hold, and thus the same test as described in Lemma 7 can be
performed. But in both cases, condition (17) is necessary but
not sufficient for the CC analysis to be valid or for the MCAR
assumption to hold.
Given the NLSY sample, for x = {employed in 1983}, a
Wilcoxon rank-sum test rejects the equality in (17) at the 5%
significance level. Conditioning on additional covariates, the
equality in (17) again can be rejected. For example, if we look at
the group of respondents employed in 1983 who were younger
than 23 and with at least a high school degree, then the null is
rejected at 5% significance level.
6.2 Results
Tables 2–4 report respectively the bounds on the average outcome under a mandatory policy of no illicit drug use (MP(0)),
the bounds on the ATE, and the bounds on the STE comparing MP(0) with the status quo, along with the Beresteanu and
Molinari (2008; BM henceforth) 95% confidence intervals. The
bounds are obtained under different sets of assumptions on
(a) the ignorability of the missing treatments (complete cases
vs all cases), (b) the probability of treatment for the subpopulation with missing treatments, and (c) treatment selection and
treatment response. Before discussing the results, I briefly outline how the confidence intervals are constructed.
Construction of the Confidence Intervals. Confidence intervals which asymptotically cover the identification regions derived in Sections 3 and 4 with a prespecified probability 1 − α
can be obtained using the results of BM (a conceptually different type of confidence regions that asymptotically cover the true
parameter of interest, rather than its identification region, with
probability at least 1 − α can be obtained by using the results
of Imbens and Manski 2004). For the special case of interval
identified parameters, as those studied in this article, Beresteanu
and Molinari (2008, thm. 3.1) established the asymptotic equivalence between these confidence intervals, which are based on

90

Journal of Business & Economic Statistics, January 2010

Table 2. Bounds and BM CIs for E[y(0)]
Assumptions on treatment response and selection

Complete cases
95% CI

No assumptions

MTR

MTR–MTS

ETS

[2.185, 12.162]
[1.2762, 13.0701]

[2.185, 2.811]
[1.8345, 3.1614]

[2.703, 2.811]
[2.3198, 3.1948]

2.703
[2.3025, 3.1038]

[1.998, 2.736]
[1.6647, 3.0691]
[1.998, 2.736]
[1.6647, 3.0691]
[1.998, 2.736]
[1.6647, 3.0691]
[1.999, 2.736]
[1.6658, 3.0696]
[2.163, 2.736]
[1.8290, 3.0699]

[2.472, 2.736]
[2.1155, 3.0927]
[2.472, 2.736]
[2.1155, 3.0927]
[2.472, 2.736]
[2.1155, 3.0927]
[2.474, 2.736]
[2.1106, 3.0994]
[2.637, 2.736]
[2.2780, 3.0952]

[2.473, 2.876]
[2.0996, 3.2500]
2.703
[2.3140, 3.0924]
[2.474, 2.876]
[2.1002, 3.2499]
[2.474, 2.676]
[2.1085, 3.0418]
2.623
[2.2612, 2.9850]

Downloaded by [Universitas Maritim Raja Ali Haji] at 00:14 12 January 2016

All cases, with the following assumptions on p
p ∈ [0, 1]
[1.998, 15.568]
95% CI
[1.0478, 16.5180]
p=1
[1.998, 15.568]
95% CI
[1.0478, 16.5180]
SAR
[1.998, 15.568]
95% CI
[1.0478, 16.5180]
TMR
[1.999, 12.140]
95% CI
[1.0858, 13.0535]
p=0
[2.163, 11.287]
95% CI
[1.3196, 12.1303]

Table 3. Bounds and BM CIs for the ATE: E[y(1)] − E[y(0)]
Assumptions on treatment response and selection
No assumptions

MTR

MTR–MTS

ETS

[−11.535, 40.465]
[−12.6125, 41.5425]

[0, 40.465]
[0, 41.3918]

[0, 0.564]
[0, 1.4001]

0.564
[−0.4659, 1.5933]

All cases, with the following assumptions on p
p ∈ [0, 1]
[−14.995, 41.451]
95% CI
[−16.0626, 42.5191]
p=1
[−14.830, 37.170]
95% CI
[−15.9630, 38.3035]
SAR
[−14.887, 40.760]
95% CI
[−15.9826, 41.8558]
TMR
[−11.567, 40.760]
95% CI
[−12.6443, 41.8378]
p=0
[−10.714, 41.286]
95% CI
[−11.7012, 42.2737]

[0, 41.451]
[0, 42.2874]
[0, 37.170]
[0, 38.1396]
[0, 40.760]
[0, 41.6815]
[0, 40.760]
[0, 41.6815]
[0, 41.286]
[0, 42.1271]

[0, 4.962]
[0, 5.9571]
[0, 0.681]
[0, 1.4408]
[0, 1.366]
[0, 2.2213]
[0, 1.366]
[0, 2.2267]
[0, 4.797]
[0, 5.7687]

[−0.563, 1.389]
[−1.4721, 2.2986]
0.126
[−0.6850, 0.9367]
[−0.563, 1.366]
[−1.4802, 2.2837]
[0.311, 1.366]
[−0.6760, 2.3535]
0.644
[−0.3617, 1.6494]

Complete cases
95% CI

Table 4. Bounds and BM CIs for the STE: E[y(0)] − E[y]
Assumptions on treatment response and selection
No assumptions

MTR

MTR–MTS

ETS

[−0.628, 9.350]
[−1.4891, 10.2127]

[−0.628, 0]
[−0.7808, 0]

[−0.108, 0]
[−0.2696, 0]

−0.108
[−0.6540, 0.4376]

All cases, with the following assumptions on p
p ∈ [0, 1]
[−0.738, 12.832]
95% CI
[−1.6890, 13.7826]
p=1
[−0.738, 12.832]
95% CI
[−1.6890, 13.7826]
SAR
[−0.738, 12.832]
95% CI
[−1.6890, 13.7826]
TMR
[−0.737, 9.404]
95% CI
[−1.6044, 10.2716]
p=0
[−0.573, 8.551]
95% CI
[−1.3897, 9.3674]

[−0.738, 0]
[−0.8989, 0]
[−0.738, 0]
[−0.8989, 0]
[−0.738, 0]
[−0.8989, 0]
[−0.737, 0]
[−0.9000, 0]
[−0.573, 0]
[−0.7145, 0]

[−0.264, 0]
[−0.4265, 0]
[−0.264, 0]
[−0.4265, 0]
[−0.264, 0]
[−0.4265, 0]
[−0.262, 0]
[−0.4287, 0]
[−0.099, 0]
[−0.2469, 0]

[−0.263, 0.140]
[−0.4537, 0.3312]
−0.033
[−0.2503, 0.1846]
[−0.262, 0.140]
[−0.4567, 0.3347]
[−0.262, −0.060]
[−0.4518, 0.1300]
−0.113
[−0.2876, 0.0617]

Complete cases
95% CI

Molinari: Missing Treatments

91

inverting a Wald-type statistic, and the confidence intervals proposed by Chernozhukov, Hong, and Tamer (2007), which are
based on inverting a QLR-type test statistic.
Denoting by ϑˆ L and ϑˆ U the estimated lower and upper bound
for a certain parameter of interest, with ϑL and ϑU their population counterparts, all what is required is that





d
z−1
ϑˆ L − ϑL
N ˆ

∼ N(0, ).
z1
ϑU − ϑ U
The confidence interval is then of the form [ϑˆ L −

cˆ αN ˆ

, ϑU
N

+

Downloaded by [Universitas Maritim Raja Ali Haji] at 00:14 12 January 2016

cˆ αN

], where cˆ αN is an estimator of cα = inf{t : Pr{max{(z−1 )+ ,
N
(z1 )− } > t} ≥ α} obtained through a standard bootstrap proce-

dure. This confidence interval has the property that



cˆ αN
cˆ αN
ˆ
ˆ
lim Pr [ϑL , ϑU ] ⊆