
Journal of Business & Economic Statistics

ISSN: 0735-0015 (Print) 1537-2707 (Online) Journal homepage: http://www.tandfonline.com/loi/ubes20

Examining the Distributional Effects of Military
Service on Earnings: A Test of Initial Dominance
Christopher J. Bennett & Ričardas Zitikis
To cite this article: Christopher J. Bennett & Ričardas Zitikis (2013) Examining the Distributional
Effects of Military Service on Earnings: A Test of Initial Dominance, Journal of Business &
Economic Statistics, 31:1, 1-15, DOI: 10.1080/07350015.2012.741053
To link to this article: http://dx.doi.org/10.1080/07350015.2012.741053

Accepted author version posted online: 06 Nov 2012.



Examining the Distributional Effects of Military
Service on Earnings: A Test of Initial Dominance
Christopher J. BENNETT
Department of Economics, Vanderbilt University, VU Station B #351819, 2301 Vanderbilt Place, Nashville, TN 37235-1819 (chris.bennett@vanderbilt.edu)

Ričardas ZITIKIS


Department of Statistical and Actuarial Sciences, University of Western Ontario, London, Ontario N6A 5B7, Canada (zitikis@stats.uwo.ca)
Existing empirical evidence suggests that the effects of Vietnam veteran status on earnings in the decade-and-a-half following service may be concentrated in the lower tail of the earnings distribution. Motivated by this evidence, we develop a formal statistical procedure that is specifically designed to test for lower-tail dominance in the distributions of earnings. When applied to the same data as in previous studies, the test reveals that the distribution of earnings for veterans is indeed dominated by the distribution of earnings for nonveterans up to $12,600 (in 1978 dollars), thereby indicating that nonveterans experienced higher social welfare and lower poverty in the decade-and-a-half following military service.
KEY WORDS: Causal effect; Crossing point; Hypothesis test; Potential outcome; Stochastic dominance;
Treatment effect.

1. INTRODUCTION

Measuring and analyzing the effects of participation and treatment play important roles in evaluating the impact of various programs and policies, particularly in the health and social sciences (Heckman and Vytlacil 2007; Morgan and Winship 2007).

The social and economic costs of military service, for example, have drawn considerable attention from researchers and policy makers, in part due to continuing military engagements and questions surrounding adequate compensation of veterans (e.g., Angrist and Chen 2008; Chaudhuri and Rose 2009, and references therein). Indeed, a key policy-related question is whether military service tends to reduce earnings over the life cycle.

In this article, we focus specifically on the effect that military service in Vietnam had on subsequent earnings. Existing empirical evidence (Angrist 1990; Abadie 2002), for example, suggests that the effects of veteran status on earnings following service in Vietnam may have been concentrated in the lower tail of the earnings distribution. Aided by a new statistical inferential technique developed in this article, we examine the effect of military service on the overall distribution of earnings, and on the lower tail in particular, in the decade-and-a-half following the end of the war in Vietnam.

Historically, the nonrandom selection for military service has posed a significant challenge for researchers analyzing the causal effects of service. To overcome the selection problem, Angrist (1990) exploited the exogenous variation in the draft lottery to instrument for veteran status, thereby allowing for unbiased estimation of the average causal effect of service on earnings. Angrist (1990) found, for example, that white male veterans experienced roughly a 15% average loss in earnings in the early 1980s. More recently, Angrist and Chen (2008) reported that these losses in earnings dissipated over time and appear to have been close to zero by the year 2000.

Abadie (2002) also exploited the variation in the draft lottery to identify the causal effect of service on earnings, focusing on the effects of Vietnam veteran status over the entire distribution of earnings. Abadie (2002) reported no statistically significant difference between the distributions of earnings for nonveterans and veterans. However, Abadie's (2002) empirical analysis does suggest that the effects on earnings may be isolated to the lower quantiles of the earnings distributions.

Influenced by these findings, here we take a further look at the problem and investigate, roughly speaking, the range of income levels over which there are statistically significant differences in the distributions of potential earnings. A rigorous formulation of the new test, the corresponding statistical inferential theory, and subsequent empirical findings make up the main body of the present article.

To give an indication of our findings, we note at the outset that the empirical evidence points to a distribution of earnings for nonveterans that stochastically dominates the corresponding distribution of earnings for veterans for all income levels up to $12,600. This finding lends support to the aforementioned observation by Abadie (2002) concerning the isolated effects on the distribution of earnings. Furthermore, our findings suggest that, for any poverty line up to $12,600 (in 1978 dollars), there was statistically greater poverty among Vietnam veterans in the early 1980s.

The rest of the article is organized as follows. In Section 2, we formulate hypotheses pertaining to the notion of "initial dominance," explain the intuition behind these hypotheses, develop a large-sample theory for the corresponding test statistic, and illustrate the performance of the resulting test in a simulation study. In Section 3, we apply the new test to reexamine the distributional effects of military service on civilian earnings. Section 4 contains concluding notes. Proofs and other technicalities supporting the new test and its large-sample properties are relegated to Appendices A and B. Monte Carlo simulations of a more detailed nature and other supporting empirical material are given in Appendices C and D.

© 2013 American Statistical Association
Journal of Business & Economic Statistics
January 2013, Vol. 31, No. 1
DOI: 10.1080/07350015.2012.741053


2. A TEST OF INITIAL DOMINANCE

As we noted in the introduction, uncovering relationships
between distributions of potential earnings for nonveterans (F)
and veterans (G) is of considerable interest. For example, if
F ≤ G on [0, ∞), then rather powerful statements can be made
concerning the comparative levels of poverty and social welfare
among the groups. Specifically, if the aforementioned relationship between the two cumulative distribution functions (cdf’s)
F and G were to hold, then income poverty would be greater
for veterans according to any poverty index that is symmetric
and monotonically decreasing in incomes. Similarly, it is well known (see, e.g., Davidson and Duclos 2000) that social welfare as measured by any social welfare function that is symmetric and increasing in incomes would be greater for nonveterans than for veterans.
Such interest in relationships between two cdf’s has given
rise to a large literature on statistical testing procedures, and
tests emerging from such constructions are usually associated
with the names of Kolmogorov, Smirnov, Cramér, Anderson,
and Darling in various combinations. The use of these tests—
whether two sided or one sided—presupposes that a restriction
on the nature of the difference (or lack of difference) between
two cdf’s must hold over their entire supports, or at least over a
prespecified subset of the supports. In other words, the classical
formulations impose global restrictions on the nature of the
difference between the cdf’s.
In many contexts, however, it is useful to learn both if and
where a restriction holds, particularly when the restriction may
hold only over some (unknown) subset of the supports. For
example, suppose that we are interested in comparing poverty
across two income distributions using the headcount ratio, which
is the proportion of the population with incomes at or below a
given poverty line. Because it is often difficult to reach a consensus on a specific value that will be used to demarcate the poverty line, an attractive procedure would be one that identifies the maximal income level x1, and hence the interval [0, x1], over which the poverty ranking implied by the headcount ratio is consistent. If, for example, x1 is found to be sufficiently
large by such a procedure, say so large as to constitute an upper
bound for any reasonable choice of poverty line, then policy
makers may reach a consensus as to the poverty ranking even
without having reached a consensus on the poverty line.
Coming now back to our underlying example of Vietnam
veterans, in the context of comparing earnings distributions for
the veterans and nonveterans, one distribution may not dominate
another over the entire support (Figure 1) and yet the relations
that hold between these two cdf’s may still be sufficient for
establishing similarly powerful statements about poverty and
social welfare rankings.

Figure 1. Plots of the empirical cdf's (top panel) and their difference (bottom panel) of potential earnings of nonveterans and veterans who are compliers. (Definitions of the corresponding population cdf's, denoted by F0C and F1C, will be given by Equation (3.2).)

To tackle such empirical questions, in the following four sections we develop a statistical procedure that will enable us to infer the range over which a set of dominance restrictions between two cdf's is satisfied. The procedure will allow us to
(i) infer dominance with probability tending to 1 as the sample
size tends to infinity whenever dominance is indeed present
in the population, and (ii) consistently estimate the range over
which this dominance holds. Thus, our testing procedure will
help us to identify situations in which F dominates G, and to
also differentiate between situations in which this dominance
holds over the entire support or only over some initial range.

2.1 Hypotheses
Let F and G be two cdf’s, both continuous and with supports
on [0, ∞). Of course, we have in mind the cdf’s of potential
earnings for nonveterans (F) and veterans (G), but the discussion
that follows is for generic F and G, unless specifically noted
otherwise.
To facilitate our discussion of the hypotheses to be tested, we
first formalize the notion of initial dominance along with the
corresponding notion of the maximal point of initial dominance
(MPID).

Definition 2.1. We say that F initially dominates G up to a
point x1 > 0 when F (x) ≤ G(x) for all x ∈ [0, x1 ) with F (x) <
G(x) for some x ∈ (0, x1 ). We call x1 the maximal point of
initial dominance when F initially dominates G up to x1 and
F (x) > G(x) for all x ∈ (x1 , x1 + ǫ) for some ǫ > 0 sufficiently
small.
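To make Definition 2.1 concrete, the MPID can be located mechanically once two cdf's are evaluated on a common grid. The sketch below is ours (the article contains no code); the function name, the grid, and the toy cdf's are illustrative assumptions, with the population-level definition replaced by a finite-grid check:

```python
import numpy as np

def mpid(F_vals, G_vals, grid):
    """Return the maximal point of initial dominance (MPID) of F over G
    on the grid, np.inf if F <= G everywhere with strict inequality
    somewhere, and None when F does not initially dominate G."""
    diff = F_vals - G_vals                       # F(x) - G(x) on the grid
    above = np.where(diff > 0)[0]                # indices where F is above G
    k = above[0] if above.size else len(grid)    # first such index, if any
    # initial dominance: F <= G before k, strictly below at some point
    if k == 0 or not np.any(diff[:k] < 0):
        return None
    return np.inf if k == len(grid) else grid[k]

# toy cdf's: F is below G up to x = 1.8 and above thereafter
grid = np.linspace(0.0, 4.0, 401)
F = np.minimum(grid / 3.0, 1.0)      # cdf of Uniform(0, 3)
G = np.minimum(grid / 2.0, 0.6)      # toy comparison cdf, flat at 0.6
x1_hat = mpid(F, G, grid)            # close to the crossing point 1.8
```

Reversing the roles, mpid(G, F, grid) returns None, since G starts out above F, which is one of the null configurations.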


Figure 2. Illustrations of the null H0 and the alternative H1 . (a) Configuration in the null: F does not initially dominate G. (b) Configuration
in the null: F does not initially dominate G. (c) Configuration in the alternative: F initially dominates G and the maximal point of dominance x1
is finite. (d) Configuration in the alternative: F initially dominates G and the maximal point of dominance x1 is +∞.

Note 2.1. Definition 2.1 includes the possibility that x1 = ∞, in which case the interval (x1, x1 + ǫ) is empty and thus the condition F(x) > G(x) for all x ∈ (x1, x1 + ǫ) is vacuously satisfied.
Our objective in this article is (i) to infer whether F initially
dominates G up to some point; and (ii) in the event that initial
dominance indeed holds, to produce an estimate of the MPID.
Accordingly, we formulate the null and alternative hypotheses
as follows:
H0 : F does not initially dominate G;
H1 : F initially dominates G.
Translating H0 and H1 into formal mathematical statements,
which we shall need later in our construction of statistical tests,
gives rise to:
H0 : F = G or else there exists ǫ > 0 such that F (x) ≥ G(x)
for all x ∈ [0, ǫ) with F (x) > G(x) for some x ∈ (0, ǫ);
H1 : There exist x1 ∈ (0, ∞] and ǫ > 0, such that F (x) ≤
G(x) for all x ∈ [0, x1 ) with some x ∗ ∈ (0, x1 ) such that
F (x ∗ ) < G(x ∗ ), and F (x) > G(x) for all x ∈ (x1 , x1 + ǫ).
The null hypothesis H0 includes all possible configurations
of F and G for which F does not initially dominate G. This occurs, for example, whenever the configurations of F and G
are such that F starts out above G. Two such configurations are
illustrated in panels (a) and (b) of Figure 2. First, in panel (a),
the distribution F starts out above G and then crosses below,
while in panel (b), F once again starts out above G but in this
case it remains there over the entire support.
Also, F does not initially dominate G when equality holds
between the distributions so that
F(x) = G(x) for all x ∈ [0, ∞).    (2.1)

Consequently, the statement of the null hypothesis H0 also includes this configuration, which is at the “boundary” between
the null H0 and the alternative H1 . We will see below that this
“boundary” case plays a special role when calculating critical
values of the new test.
The alternative hypothesis H1 includes all possible configurations of F and G for which F initially dominates G. The
parameter x1 , which appears under the statement of the alternative, corresponds to the MPID as defined above in Definition 2.1.
When F initially dominates G, the MPID x1 , which is generally
unknown, may be anywhere to the right of zero. For example,
when F initially dominates G and x1 is finite, then it must be the
case that the cdf’s cross at x1 (see, e.g., panel (c) of Figure 2).


Alternatively, when F initially dominates G and x1 = ∞, then
initial dominance holds over the entire supports and we have
F dominating G in the classical sense of stochastic dominance.
This latter situation is depicted in panel (d) of Figure 2.
The alternative of initial dominance is of particular interest
in our investigation of earnings and poverty, because finding
initial dominance would, for example, imply that a broad class
of poverty measures agree on a poverty ranking (Davidson and
Duclos 2000). As a concrete example, consider the popular Foster, Greer, and Thorbecke (1984) poverty measure defined by

π_{α,x}(F) = E[g^α(X; x) 1{X ≤ x}],    (2.2)

where X with the cdf F is income, x ≥ 0 is the poverty line, g(y; x) = (x − y)/x is the normalized income shortfall, and α ≥ 0 is an indicator of "poverty aversion." If F initially dominates G, then for every level of poverty aversion, the FGT measure records π_{α,x}(F) ≤ π_{α,x}(G) for all poverty lines 0 ≤ x ≤ x1, with strict inequality holding for some poverty lines 0 ≤ x ≤ x1. Initial dominance thus guarantees that every FGT poverty measure records poverty to be at least as great in G for any poverty line in the interval (0, x1) and strictly greater in G for some poverty lines in the interval (0, x1).
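For a finite sample, the FGT measure in Equation (2.2) is simply the sample average of powered normalized shortfalls. A minimal sketch (the function name is ours, not from the article):

```python
import numpy as np

def fgt(incomes, poverty_line, alpha):
    """FGT poverty measure pi_{alpha,x}: mean of g(y; x)^alpha over
    incomes y at or below the poverty line x, with g(y; x) = (x - y)/x."""
    y = np.asarray(incomes, dtype=float)
    poor = y <= poverty_line                     # the indicator 1{X <= x}
    g = np.where(poor, (poverty_line - y) / poverty_line, 0.0)
    return float(np.mean(np.where(poor, g ** alpha, 0.0)))
```

With alpha = 0 this reduces to the headcount ratio and with alpha = 1 to the normalized poverty gap; for example, fgt([5, 10, 15, 20], 12, 0) equals 0.5.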

2.2 Parametric Reformulation

To construct a test of H0 against H1, we employ a one-dimensional parameter θ such that the hypotheses can be reformulated as follows:

H0 : θ = 0,
H1 : θ > 0.    (2.3)

To define θ, we first introduce an auxiliary function

H(y) = ∫_0^y (F(x) − G(x))_+ dx,

where z_+ = 0 when z ≤ 0 and z_+ = z when z ≥ 0. The function H(y) is nonnegative, takes on the value 0 at y = 0, is nondecreasing, and has a finite limiting value H(∞) as long as both F and G have finite first moments. Next, we define the generalized inverse of H(y),

H^{−1}(t) = inf{y ≥ 0 : H(y) ≥ t},

which is a nondecreasing and left-continuous function that may have jumps (e.g., a jump at t = 0). We then use this generalized inverse to define the point

x1 = lim_{t↓0} H^{−1}(t).    (2.4)

The point x1 will play a pivotal role in our subsequent considerations because when F initially dominates G, then x1 as defined in Equation (2.4) is equivalent to the MPID as introduced in Definition 2.1. Of course, the parameter x1 in Equation (2.4) remains well defined when F does not initially dominate G (i.e., under H0); however, it is no longer interpretable as an MPID in this case.

With this point x1, we define θ by the formula

θ = ∫_0^{x1} (G(x) − F(x))_+ dx.

(Note that the integrand in the definition of θ is the positive part of G(x) − F(x), unlike the integrand in the definition of H(y), which is the positive part of F(x) − G(x).) Under the null H0, we have θ = 0 irrespective of whether x1 is finite or infinite. Under the alternative H1, the parameter θ is (strictly) positive because F(x) dips strictly below G(x) at least at one point x ∈ (0, x1). These are the properties of θ postulated in Equation (2.3).

2.3 Statistics and Their Large Sample Properties

We next construct an estimator of θ. To begin with, let X1, . . . , Xn be independent random variables from F, and let Y1, . . . , Ym be independent random variables from G. These random variables are also assumed to be independent between the samples. (We refer to Linton, Song, and Whang (2010) for techniques designed to deal with correlated samples.) Denote the corresponding empirical cdf's by Fn and Gm, and then define an estimator of H(y) by the formula

Hm,n(y) = ∫_0^y (Fn(x) − Gm(x))_+ dx.

The corresponding generalized inverse is

Hm,n^{−1}(t) = inf{y ≥ 0 : Hm,n(y) ≥ t}.

We next define an estimator xm,n ≡ xm,n(δm,n) of the point x1 by the formula

xm,n = Hm,n^{−1}(δm,n),

where δm,n > 0 is a "tuning" parameter such that

δm,n ↓ 0 and √(nm/(n + m)) δm,n → ∞    (2.5)

when min{m, n} → ∞. Hence, this tuning parameter cannot be overly small, nor too large, and thus in a sense resembles the "bandwidth" choice in the classical kernel density estimator. We shall elaborate on practical choices of δm,n below, particularly when discussing simulation results and analyzing the veteran data. We note that, throughout this article, we let m and n tend to infinity in such a way that

m/(n + m) → η ∈ (0, 1),    (2.6)

meaning that the two sample sizes are "comparable," which is a standard assumption when dealing with two samples.

The estimator xm,n of x1 < ∞ as formulated above is designed to detect the first point after which the sample analogue of F (i.e., Fn) crosses above the sample analogue of G (i.e., Gm). Ideally, this estimator will consistently estimate the parameter x1 < ∞, which is equal to the MPID under H1; under H0, x1 (finite or infinite) remains well defined though no longer interpretable as an MPID. When working with these sample analogues, we thus allow for some "wiggle room" in the sense that Fn is said to cross above Gm at xm,n only if ∫_0^{xm,n} (Fn(z) − Gm(z))_+ dz exceeds the threshold parameter δm,n. The reason why we cannot set δm,n to zero is that the function Fn − Gm is "rough" (think here, very appropriately, of the Brownian bridge) even when the population cdf's F and G are identical on [0, ∞).


The choice of δm,n clearly influences how sensitive the estimator of x1 is to regions over which Fn is observed to cross above Gm. Taking δm,n to be small means that, roughly speaking, the first point in the sample at which Fn crosses above Gm will be picked off as the estimate of x1. On the other hand, when δm,n is chosen to be large, then Fn will be allowed to cross above Gm for some time before the threshold is reached and xm,n is declared. We shall elaborate on this point in the next section.
With the estimator xm,n of x1 in hand, we next define an estimator θm,n of θ by

θm,n = ∫_0^{xm,n} (Gm(x) − Fn(x))_+ dx.
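Because the empirical cdf's are step functions, Hm,n, its generalized inverse evaluated at δm,n, and θm,n all reduce to cumulative sums over the pooled sample points. The sketch below uses our own function name and discretization choices and is not the authors' code:

```python
import numpy as np

def initial_dominance_estimates(x_sample, y_sample, delta):
    """Compute x_{m,n} = H_{m,n}^{-1}(delta) and theta_{m,n} from two
    independent samples, x_sample ~ F and y_sample ~ G."""
    x_sample, y_sample = np.sort(x_sample), np.sort(y_sample)
    grid = np.unique(np.concatenate(([0.0], x_sample, y_sample)))
    Fn = np.searchsorted(x_sample, grid, side="right") / len(x_sample)
    Gm = np.searchsorted(y_sample, grid, side="right") / len(y_sample)
    widths = np.diff(grid)                  # lengths of the grid intervals
    # H_{m,n} at each grid point: integral of (Fn - Gm)_+ from 0
    H = np.concatenate(([0.0], np.cumsum(np.maximum(Fn - Gm, 0.0)[:-1] * widths)))
    crossed = np.where(H >= delta)[0]       # where the threshold is exceeded
    k = crossed[0] if crossed.size else len(grid) - 1
    x_mn = grid[k] if crossed.size else np.inf
    # theta_{m,n}: integral of (Gm - Fn)_+ from 0 up to x_{m,n}
    theta = float(np.sum(np.maximum(Gm - Fn, 0.0)[:k] * widths[:k]))
    return x_mn, theta

# toy samples in which Fn starts out below Gm and crosses above at the end
x_mn, theta = initial_dominance_estimates(
    np.array([2.0, 3.0, 4.0]), np.array([1.0, 1.5, 5.0]), delta=0.01)
```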

The following six theorems make up our large-sample asymptotic theory for the estimators xm,n and θm,n . We begin by investigating the consistency of the estimator xm,n under both the
null and alternative hypotheses.
Theorem 2.1. When F = G on [0, ∞) or, more generally,
when F ≤ G on [0, ∞), then x1 = ∞ and xm,n →P ∞ when
min{m, n} → ∞.

Theorem 2.2. Under the alternative H1 , and also under the
null H0 with the exception of the case when F ≤ G on [0, ∞)
(which is covered by Theorem 2.1), we have that the point x1 is
finite and xm,n →P x1 when min{m, n} → ∞.
We next investigate the asymptotic behavior of the suitably scaled version of the estimator θm,n, namely

Δm,n = √(nm/(n + m)) θm,n.

Theorem 2.3. Suppose that X and Y have 2 + κ finite moments for some κ > 0, no matter how small. When F = G on [0, ∞), then

Δm,n →_d ∫_0^∞ (B(F(x)))_+ dx,    (2.7)

where B denotes the standard Brownian bridge.

Theorem 2.3 assumes a little more than two finite moments of F and G, but this is a mild assumption given the underlying problem.
Theorem 2.4. Under the null H0, with the exception of the case F = G on [0, ∞) that has been covered by Theorem 2.3, we have that

Δm,n →_d ∫_0^{x1} (B(F(z)))_+ dz,    (2.8)

where the point x1, calculated using Equation (2.4), is finite. Hence, the limiting integral in Equation (2.8) is smaller than that in Equation (2.7), which proves the least-favorable nature of the critical values calculated under the (null) sub-hypothesis F = G on [0, ∞).

The remaining two theorems shed further light on the behavior of Δm,n. The first theorem deals with the case when x1 is finite.

Theorem 2.5. Under the alternative H1, and assuming that the point x1 is finite, we have that Δm,n →_P ∞ when min{m, n} → ∞.


The final theorem of this section deals with the case when the
point x1 is infinite.
Theorem 2.6. Suppose that X and Y have 2 + κ finite moments for some κ > 0, no matter how small. When F ≤ G but F ≠ G on the entire [0, ∞), and thus x1 = ∞, then Δm,n →_P ∞ when min{m, n} → ∞.
The proofs of the above theorems are given in Appendix B.
The next section outlines the practical implementation of the
above-developed statistical inferential theory.
2.4 Practical Implementation

In view of the above theory, we retain the null H0 when the value of Δm,n is small and reject when it is large, with "small" and "large" determined by a critical value calculated using the following bootstrap algorithm (cf., e.g., Section 3.7.2 of van der Vaart and Wellner 1996; Barrett and Donald 2003; Horvath, Kokoszka, and Zitikis 2006):
Algorithm 2.1. The following steps produce the bootstrap critical value cm,n(α):

1. Form the pooled distribution Lm,n(x) = (mGm(x) + nFn(x))/(m + n).
2. Generate mutually independent iid samples X1*, . . . , Xn* and Y1*, . . . , Ym* from the cdf Lm,n, assuming that the values X1, . . . , Xn and Y1, . . . , Ym are fixed, as is the case in practice, given data.
3. Compute

Δ*m,n(x*m,n) = √(mn/(m + n)) ∫_0^{x*m,n} (G*m(x) − F*n(x))_+ dx,

where x*m,n is the bootstrap analogue of xm,n based on the bootstrap samples generated in Step 2.
4. Repeat Steps 2 and 3 B times and record {Δ*m,n,b(x*m,n,b), 1 ≤ b ≤ B}.
5. The nominal α level critical value, which we denote by cm,n(α), is then computed as the (1 − α) quantile of the distribution of bootstrap estimates from Step 4:

cm,n(α) = inf{c : (1/B) Σ_{b=1}^{B} 1[Δ*m,n,b(x*m,n,b) ≤ c] ≥ 1 − α}.    (2.9)

With the critical value cm,n(α), the decision rule at the nominal level α is to reject H0 in favor of H1, and thus infer that there is "initial dominance" over the interval [0, xm,n), whenever Δm,n > cm,n(α).
For more details on the practical implementation of this bootstrap procedure, we refer to Barrett and Donald (2003); in our
simulation studies, we have also closely followed this article.
The above bootstrap procedure is designed to estimate the critical value in the least favorable case (i.e., F = G on [0, ∞)) under the null. We note that it is also possible to implement a bootstrap procedure that uses the estimator xm,n in place of x*m,n in Step 3 of Algorithm 2.1. However, implementing this modified bootstrap procedure would require us to incorporate Andrews and Shi's (2010) infinitesimal uniformity factor, or a similar method used by Donald and Hsu (2010), to control the size of the test well, since both the test statistic and the bootstrap critical value based on Δ*m,n(xm,n) would converge to zero whenever x1 = 0. Because the additional complexity (e.g., the introduction of an additional tuning parameter) would invariably cause us to stray from the main contribution of this article, we do not adopt such a procedure here.
Theorem 2.7 shows that the bootstrap procedure outlined in Algorithm 2.1 yields asymptotically valid tests that are also consistent.

Theorem 2.7. Suppose that X and Y have 2 + κ finite moments for some κ > 0, no matter how small. Then we have

lim P[Δm,n > cm,n(α) | H0] ≤ α,    (2.10)

and

lim P[Δm,n > cm,n(α) | H1] = 1,    (2.11)

where the limits are taken with respect to the sample sizes m and n tending to infinity in such a way that conditions (2.5) and (2.6) are satisfied.
We next illustrate the performance of the proposed testing procedure, with more detailed simulation results, exhibiting both the power and the good finite-sample control of the size, relegated to Appendix C. For this, we generate independent log-normal samples X1, . . . , Xn and Y1, . . . , Ym according to Xi = exp(σ1 Z1i + µ1) and Yi = exp(σ2 Z2i + µ2), where the Zki's are independent standard normal random variables. Various choices of the parameter pairs (µi, σi) are explored in the simulation study, and they are specified in the legends of the left-hand panels of Figure 3.
The figure illustrates the performance of the test under two
different scenarios:
Panels (a) and (b): F starts out above G and crosses below
at some later point, which is a null H0 configuration; see
panel (a) of Figure 2.
Panels (c) and (d): F lies below G up to a point and crosses
above thereafter, which is an alternative H1 configuration;
see panel (c) of Figure 2.
Hence, panels (a) and (b) are in the null H0 . Panels (c) and
(d), on the other hand, reflect a configuration of F and G in the
alternative H1 . The two left-hand panels (a) and (c) illustrate
the relationships between the cdf’s under consideration, and the
right-hand panels (b) and (d) illustrate the densities of xm,n for different sample sizes, along with the corresponding empirical rejection probabilities (ERPs). Recall that the point xm,n is computed prior to testing, irrespective of whether we have H0 or H1.

Figure 3. Monte Carlo illustrations based on parent cdf’s (left-hand panels) with the corresponding density plots (right-hand panels) of xm,n
estimates, along with ERPs at the 5% nominal level. (a) A null H0 configuration with F = LN(0.7, 0.8) and G = LN(0.85, 0.6). (b) Density
plots of xm,n estimates. (c) An alternative H1 configuration with F = LN(0.85, 0.6) and G = LN(0.7, 0.8). (d) Density plots of xm,n estimates.


But the point xm,n is referred to as the estimated MPID only
under the alternative H1 , which defines “initial dominance.”
We have conducted 1000 bootstrap resamples in each of the 5000 Monte Carlo replications, setting δm,n = κ((m + n)/(mn))^{1/2−ǫ} with κ = 10^{−6} and ǫ = 0.01. (For more details on the tuning parameter δm,n and its robustness with respect to the choices of κ and ǫ, we refer to Section 3.3 and Appendix D.) The following information can be gleaned from the plots. First, in panels (a) and (b), which correspond to the null configuration in which F is initially dominated by G, we see from the concentration of the density plots about the origin that the initial dominance of G over F is detected in virtually all of the Monte Carlo trials. More importantly, we have not detected a single instance in which the null would be erroneously rejected.
When there is a region of initial dominance as in panel (c) of
Figure 3, it is desirable that our test procedure have appreciable
power to detect the presence of dominance and to also return
a reliable estimate of the region over which initial dominance
holds. Panel (d) of Figure 3 reports that the test has considerable
power to detect the region of initial dominance even in samples
consisting of only 100 observations. As expected, the density
plots show that in such small samples there is a great deal
of variation in the estimated MPID about the true value, and
we also see that the densities of the estimated MPID become
increasingly concentrated around the true value as the sample
sizes are increased.
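One Monte Carlo draw from the panel (c) configuration, together with the tuning value δm,n = κ((m + n)/(mn))^{1/2−ǫ} at κ = 10⁻⁶ and ǫ = 0.01, can be sketched as follows (the seed and sample sizes are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n = m = 100

# H1 configuration of panel (c): F = LN(0.85, 0.6), G = LN(0.7, 0.8)
x = np.exp(0.6 * rng.standard_normal(n) + 0.85)   # sample from F
y = np.exp(0.8 * rng.standard_normal(m) + 0.7)    # sample from G

# delta_{m,n} = kappa * ((m + n)/(m n))^(1/2 - eps): vanishes as the
# samples grow, while sqrt(nm/(n + m)) * delta_{m,n} still diverges,
# so both parts of condition (2.5) are satisfied
kappa, eps = 1e-6, 0.01
delta = kappa * ((m + n) / (m * n)) ** (0.5 - eps)
```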
3. VIETNAM VETERAN DATA REEXAMINED

Here we apply the new test to reexamine distributional effects of Vietnam veteran status on labor earnings. Abadie (2002)
showed that the potential distributions for veterans and nonveterans can be estimated for the subpopulation of compliers, and
he used Kolmogorov-Smirnov type tests to empirically examine
whether the earnings distribution of nonveterans stochastically
dominates that of veterans. Using the same dataset, we shall see
below that the new test allows us to make statistically significant
statements concerning the effect of veteran status on poverty.
3.1 The Potential-Outcomes Model
Let Z be a binary instrument, taking on the values 0 (not
draft eligible) and 1 (draft eligible) with some positive probabilities, and let D(0) and D(1) denote the values of the treatment
indicator D that would be obtained given Z = 0 and Z = 1,
respectively. Both D(0) and D(1) are random, taking on the values 0 (do not serve in the military) and 1 (serve in the military)
with some positive probabilities. The binary nature of both the
instrument and the treatment variables gives rise to four possible
types of individuals in the population:

• Compliers when D(0) = 0 and D(1) = 1
• Always-takers when D(0) = 1 and D(1) = 1
• Never-takers when D(0) = 0 and D(1) = 0
• Defiers when D(0) = 1 and D(1) = 0
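Since the type of an individual is determined entirely by the pair (D(0), D(1)), the taxonomy can be written as a one-line lookup (a toy sketch; the naming is ours):

```python
def respondent_type(d0, d1):
    """Classify an individual by the potential treatment statuses
    (D(0), D(1)) induced by the binary draft-eligibility instrument Z."""
    return {(0, 1): "complier",
            (1, 1): "always-taker",
            (0, 0): "never-taker",
            (1, 0): "defier"}[(d0, d1)]
```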

Because our interest centers on the causal effect of military
service on earnings, let Y (1) denote the potential incomes for
individuals if they served (D = 1) and Y (0) denote the potential


incomes for the same individuals had they not served (D = 0).
In practice, the analyst observes only the realized outcomes
Y = Y(1)D + Y(0)(1 − D),    (3.1)

where D = D(1)Z + D(0)(1 − Z). Within this framework of
potential incomes, the distributional effect of military service
in the general population is captured by the difference between
the cdf’s of Y (0) and Y (1). However, neither of the two cdf’s
is identified because, among other things, the data are (i) uninformative about Y (0) in the subpopulation of always-takers,
and also (ii) uninformative about Y (1) for the subpopulation of
never-takers. Because of this identification problem, it is common in the treatment effects literature to focus the analysis on
the subpopulation of compliers. In the context of examining for
distributional treatment effects, this amounts to comparing the
conditional cdf’s F0C and F1C defined by
FkC(y) = P[Y(k) ≤ y | D(0) = 0, D(1) = 1],    (3.2)

where, naturally, we assume that P[D(0) = 0, D(1) = 1] > 0,
which means that compliers are present in the general population. We shall elaborate on the latter assumption at the end of
the next section.
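The potential-outcomes setup above can be simulated directly. The following Python sketch is our own illustration (the type shares and the log-normal income distributions are assumptions made purely for the demonstration, not quantities from the article); it constructs Z, D, Y(0), and Y(1), and forms the observed outcome exactly as in Equation (3.1):

```python
import numpy as np

# Illustrative simulation of the potential-outcomes model of Section 3.1.
# Type shares and income distributions are assumed, not estimated.
rng = np.random.default_rng(0)
n = 100_000

# Individual types; by assumption there are no defiers.
types = rng.choice(["complier", "always-taker", "never-taker"],
                   size=n, p=[0.25, 0.15, 0.60])
D0 = (types == "always-taker").astype(int)   # D(0): serves even if not eligible
D1 = (types != "never-taker").astype(int)    # D(1): serves when draft eligible

Z = rng.binomial(1, 0.5, size=n)             # draft-eligibility instrument
D = D1 * Z + D0 * (1 - Z)                    # realized treatment indicator
Y1 = rng.lognormal(9.4, 0.6, size=n)         # potential income if serving
Y0 = rng.lognormal(9.5, 0.6, size=n)         # potential income if not serving
Y = Y1 * D + Y0 * (1 - D)                    # observed outcome, Equation (3.1)

# The observed outcome coincides with the potential outcome selected by D.
assert np.allclose(Y, np.where(D == 1, Y1, Y0))
```

In this design, the difference P[D = 1 | Z = 1] − P[D = 1 | Z = 0] estimates the complier share, anticipating part (ii) of Condition 1 of Imbens and Angrist (1994) discussed at the end of this section.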
3.2 An Underlying Theory
When comparing F0C and F1C , we are particularly interested in establishing whether (cf. the alternative H1 of “initial dominance”) there exists a level of income y1 such that
F0C (y) ≤ F1C (y) for all y ∈ [0, y1 ) with the inequality being
strict over some subset of [0, y1 ). Inferring the existence of such
an income level would allow us to make statistically significant
statements concerning poverty and social welfare orderings of
the two distributions.
We note in this regard that any member of the earlier noted
FGT (Foster, Greer, and Thorbecke 1984) class of poverty measures would indicate that poverty is at least as great among those
in the subpopulation of compliers who served in the military,
for any poverty line in the interval [0, y1). Similarly, for any
y∗ ∈ [0, y1), every monotonic utilitarian social welfare function,
when applied to the censored distributions generated by replacing incomes above y∗ by y∗ itself, would rank the social welfare
of nonveterans as equal to or higher than that of veterans. We
refer to, for example, Foster and Shorrocks (1988) for details.
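To make the FGT claim concrete, here is a minimal Python sketch (our illustration; the two log-normal samples are hypothetical stand-ins for the complier earnings distributions, chosen so that one distribution lies below the other in the lower tail):

```python
import numpy as np

def fgt(incomes, z, alpha):
    """Foster-Greer-Thorbecke poverty index P_alpha with poverty line z."""
    y = np.asarray(incomes, dtype=float)
    poor = y < z
    if alpha == 0:
        return poor.mean()                       # headcount ratio
    gaps = np.where(poor, (z - y) / z, 0.0)      # normalized income shortfalls
    return float(np.mean(gaps ** alpha))

rng = np.random.default_rng(0)
nonveterans = rng.lognormal(9.5, 0.6, size=50_000)  # stand-in for F0C (assumed)
veterans = rng.lognormal(9.4, 0.6, size=50_000)     # stand-in for F1C (assumed)

# When F0C <= F1C on [0, y1), every FGT member ranks poverty among those
# who served at least as high, for any poverty line below y1.
for z in (8_000.0, 12_600.0):
    for alpha in (0, 1, 2):
        assert fgt(veterans, z, alpha) >= fgt(nonveterans, z, alpha)
```

The cases alpha = 0, 1, and 2 correspond to the headcount ratio, the poverty gap, and the squared poverty gap, respectively.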
The task of testing for an initial region of dominance (against the alternative H1) among the subpopulation of compliers is simplified by the fact that we do not actually need to estimate the two complier cdf’s; rather, we need only examine the sign of the difference between them. Specifically, we shall show in the next theorem that the relationship of initial dominance, or lack thereof, between F0C and F1C can be investigated in terms of two other cdf’s, namely F0 and F1 defined by

Fk(y) = P[Y ≤ y | Z = k].   (3.3)

Theorem 3.1. Assume that, for z = 0 and also for z = 1, the triplet {Y(1), Y(0), D(z)} is independent of Z. If there are no defiers, then for every y we have that

F0(y) − F1(y) = [F0C(y) − F1C(y)] P[D(0) = 0, D(1) = 1].   (3.4)

Hence, in particular, F0(y) ≤ F1(y) if and only if F0C(y) ≤ F1C(y).
The proof of Theorem 3.1 is relegated to Appendix A.
Hence, by Theorem 3.1, we have that the difference F0 − F1 is
proportional to the difference F0C − F1C , and the proportionality
coefficient is the population proportion P[D(0) = 0, D(1) = 1]
of compliers.
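Equation (3.4) lends itself to a quick numerical check. The sketch below (an illustration under assumed distributions, mirroring the simulation design of Section 3.1) compares both sides of the identity at a few income levels:

```python
import numpy as np

# Monte Carlo check of Theorem 3.1 under assumed (illustrative) distributions.
rng = np.random.default_rng(1)
n = 400_000
types = rng.choice(["complier", "always-taker", "never-taker"],
                   size=n, p=[0.25, 0.15, 0.60])
D0 = (types == "always-taker").astype(int)
D1 = (types != "never-taker").astype(int)
Z = rng.binomial(1, 0.5, size=n)          # instrument, independent of the rest
D = D1 * Z + D0 * (1 - Z)
Y1 = rng.lognormal(9.4, 0.6, size=n)
Y0 = rng.lognormal(9.5, 0.6, size=n)
Y = Y1 * D + Y0 * (1 - D)

comp = types == "complier"
p_comp = comp.mean()                      # P[D(0) = 0, D(1) = 1]
for y in (5_000.0, 12_600.0, 25_000.0):
    lhs = (Y[Z == 0] <= y).mean() - (Y[Z == 1] <= y).mean()   # F0(y) - F1(y)
    rhs = ((Y0[comp] <= y).mean() - (Y1[comp] <= y).mean()) * p_comp
    assert abs(lhs - rhs) < 0.02          # equal up to sampling error
```

The left-hand side uses only observables (Y and Z), whereas the right-hand side uses the latent complier indicator, which is available here only because the data are simulated.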
We conclude this section by noting that even though Theorem 3.1 explicitly imposes only part (i) of Condition 1 of Imbens and Angrist (1994), our earlier assumption P[D(0) = 0, D(1) = 1] > 0 of the presence of compliers in the general population, together with the no-defiers assumption of Theorem 3.1, implies part (ii) of Condition 1 of Imbens and Angrist (1994); the latter condition means that P[D(1) = 1] > P[D(0) = 1]. The proof that the condition holds under the assumptions of the present article takes just a few lines:
P[D(1) = 1] = P[D(1) = 1, D(0) = 0] + P[D(1) = 1, D(0) = 1],
P[D(0) = 1] = P[D(0) = 1, D(1) = 0] + P[D(0) = 1, D(1) = 1].

The right-most probabilities in the two equations are the same, and the absence of defiers makes the middle probability in the second equation vanish. Hence, subtracting the second equation from the first, we obtain the equation

P[D(1) = 1] − P[D(0) = 1] = P[D(0) = 0, D(1) = 1],

which proves that, throughout the present article, we work under the entire Condition 1 of Imbens and Angrist (1994). Hence, our setup is equivalent to that of Abadie (2002) (see Assumption 2.1 therein).

3.3 The Veteran Data and Test Results

In our empirical analysis, we use the CPS extract (Angrist 2011) that was specially prepared for Angrist and Krueger (1992). This same dataset was also used by Abadie (2002). As described in the latter article, the data consist of 11,637 white men, born in 1950–1953, from the March Current Population Surveys of 1979 and 1981–1985. For each individual in the sample, annual labor earnings (in 1978 dollars), Vietnam veteran status, and an indicator of draft eligibility based on the Vietnam-era draft lottery outcome are provided. Following Angrist (1990) and Abadie (2002), we use draft eligibility as an instrument for veteran status. The construction of the draft eligibility variable is described in Appendix C of Abadie (2002). Additionally, a discussion of the validity of draft eligibility as an instrument for veteran status may be found in Angrist (2002).

The empirical counterparts of F0 and F1 are simpler to work with, given the dataset that we are exploring, and are thus adopted in our test of initial dominance. Figure 4 displays the empirical distributions F0,n0 and F1,n1. (Compare this figure with Figure 1, which depicts the empirical counterparts of the complier cdf’s F0C and F1C.)

Applying the test for initial dominance to these data with the conservative choice of parameter values κ = 10^−6 and ε = 10^−2, for example, yields an estimated MPID x1 of approximately $12,600 with a corresponding bootstrap p-value of 0.063. At any significance level α > 0.063, therefore, we would reject H0 and conclude that F0C initially dominates F1C up to $12,600, thereby implying that Vietnam veterans experienced poverty at least as great as nonveterans for any poverty line up to $12,600. We note that this finding is also robust over a reasonably wide range of choices of the tuning parameter δn0,n1. Indeed, an examination of the empirical results (see Appendix D) over the grid of δn0,n1’s computed from combinations of ε ∈ {10^−j, 1 ≤ j ≤ 5} and κ ∈ {10^−j, 1 ≤ j ≤ 6} reveals that the crossing-point estimates never fall below $12,600 and are all associated with p-values no larger than 0.073.

The figure $12,600 is of course just an estimate of the population MPID, and it is therefore natural when interpreting the above statement to ask about the statistical uncertainty attached to this figure. To address this question we would ideally report the standard error attached to the estimate, or provide a confidence interval. However, producing such figures gives rise to a number of difficult problems (see Note B.1 in Appendix B for more details) that are beyond the scope of the current article. First, because the MPID is interpretable only under the alternative and is thus reported only upon rejection of the null hypothesis, determining the sampling distribution of the MPID estimator requires fixing a configuration under the alternative and then computing the distribution conditional on rejection. Second, the variability of the MPID estimator hinges critically on the rate of departure of F from G to the right of the MPID x1, measuring which would require imposing conditions on the densities as well as on higher derivatives of the cdf’s F and G. (See Note B.1 in Appendix B for a further discussion of these points.)

Figure 4. Plots of the empirical cdf’s (top panel) and their difference (bottom panel) of the actual earnings of the nondraft-eligible and the draft-eligible, who may or may not be compliers. (Definitions of the corresponding population cdf’s F0 and F1 are in Equation (3.3).)

Figure 5. Density plots corresponding to estimates of the MPID. The plots are generated using δn0,n1 = κ((n0 + n1)/(n0 n1))^0.49 based on samples generated from log-normal distributions calibrated to the data. The vertical dashed line marks the location of the MPID for the calibrated distributions.
In the absence of a formal theory, however, we have conducted a small simulation exercise to give a general sense of
the statistical uncertainty that might be attached to a point estimate of the MPID such as that reported above. This simulation
exercise involves drawing samples of size n0 and n1 from lognormal distributions that are calibrated to match the first and
second moments of F0,n0 and F1,n1. Figure 5 plots the densities
arising from the use of δn0,n1 = κ((n0 + n1)/(n0 n1))^0.49 with κ
set to 10^−2, 10^−4, and 10^−6 (it suffices to examine the variation
in κ only because the sample sizes are fixed). The plots reveal
that the densities are rather insensitive to variation in κ. Also,
the plots show the mode of the density in each case to be around
$12,000, whereas the true MPID is around $14,800 for our chosen log-normal populations. Lastly, the standard errors in each
case are found to be around $4,000.
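The simulation just described can be sketched as follows. Everything here is illustrative: the moment targets are invented rather than taken from the data, and the crossing-point estimator is our reading of the quantity Hm,n^−1(δm,n) analyzed in Appendix B, namely sup{x : Hm,n(x) ≤ δ} with Hm,n(x) = ∫_0^x (Fn(z) − Gm(z))+ dz:

```python
import numpy as np

rng = np.random.default_rng(2)

def lognormal_params(mean, sd):
    """Convert a target mean and standard deviation into (mu, sigma)."""
    sigma2 = np.log(1.0 + (sd / mean) ** 2)
    return np.log(mean) - 0.5 * sigma2, np.sqrt(sigma2)

# Hypothetical moment targets (not the article's calibrated values).
mu_f, s_f = lognormal_params(16_000.0, 9_000.0)    # nonveterans, F
mu_g, s_g = lognormal_params(15_000.0, 10_000.0)   # veterans, G

n, m = 6_000, 6_000
x = np.sort(rng.lognormal(mu_f, s_f, size=n))      # sample from F
y = np.sort(rng.lognormal(mu_g, s_g, size=m))      # sample from G

grid = np.linspace(0.0, 60_000.0, 6_001)
dz = grid[1] - grid[0]
Fn = np.searchsorted(x, grid, side="right") / n    # empirical cdf of F
Gm = np.searchsorted(y, grid, side="right") / m    # empirical cdf of G

# H_{m,n}(x): integral of (Fn - Gm)_+ from 0 to x, evaluated on the grid.
H = np.cumsum(np.clip(Fn - Gm, 0.0, None)) * dz
kappa = 1e-6
delta = kappa * ((n + m) / (n * m)) ** 0.49
mpid_hat = grid[np.nonzero(H <= delta)[0][-1]]     # sup{x : H(x) <= delta}

# Population crossing point of the two calibrated log-normal cdf's.
t_cross = (s_g * mu_f - s_f * mu_g) / (s_g - s_f)
print(f"estimated MPID: {mpid_hat:,.0f}; population MPID: {np.exp(t_cross):,.0f}")
```

Repeating the draw many times and plotting the density of mpid_hat across replications produces the kind of picture summarized in Figure 5.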
4. CONCLUDING NOTES
In many practical situations, it is unlikely that dominance
relations hold over the entire support of the distributions under
consideration. However, a number of results in the literature
(e.g., Foster and Shorrocks 1988) demonstrate that even when
a dominance relation holds only on a subset of the support, we
may still obtain powerful orderings in terms of poverty or social
welfare. With this in mind, in this article we have developed
a statistical test that enables one to infer whether there exists
an initial region of stochastic dominance, and also to infer the
range over which the dominance relation holds. Notably, the test can also be used to establish dominance over the entire support. This occurs whenever dominance is indicated by our procedure and no crossing occurs in-sample.
Our work on this statistical inference problem has been inspired by existing empirical evidence (Angrist 1990; Abadie
2002) suggesting that the effects of Vietnam veteran status on
earnings are concentrated in the lower tail of the earnings distribution. In particular, the inferential procedure developed in
the present article was specifically designed to test for the existence of lower tail dominance in the distributions of earnings.
When applied to the same data used in previous studies, our
test indicates that the distribution of earnings for veterans (i.e.,
G) is initially dominated by the distribution of earnings for
nonveterans (i.e., F) up to $12,600 in 1978 dollars, thereby suggesting that nonveterans experienced higher social welfare and lower poverty in the decade-and-a-half following military service.
A number of interesting problems remain for future work.
Among them are developing a theory for the standard errors attached to the MPID estimates and establishing asymptotically valid confidence intervals for this parameter of interest. Though interesting, we anticipate these problems to be technically demanding.
APPENDIX A: PROOF OF THEOREM 3.1
We first rewrite the underlying probabilities as expectations
of the corresponding indicator random variables and in this way
have the equation
F0 (y) − F1 (y) = E[1{Y ≤ y}|Z = 0] − E[1{Y ≤ y}|Z = 1].
Next we write each indicator 1{Y ≤ y} as the sum of 1{Y ≤ y}D
and 1{Y ≤ y}(1 − D), and using the additivity of expectations,
we then obtain the equation
F0 (y) − F1 (y) = E[1{Y ≤ y}D|Z = 0]
− E[1{Y ≤ y}D|Z = 1]
+ E[1{Y ≤ y}(1 − D)|Z = 0]
− E[1{Y ≤ y}(1 − D)|Z = 1]. (A.1)
Upon recalling definition (3.1) of the random variable Y, we
have from Equation (A.1) that
F0(y) − F1(y)
= E[1{Y(1)D(0) + Y(0)(1 − D(0)) ≤ y}D(0)|Z = 0]
− E[1{Y(1)D(1) + Y(0)(1 − D(1)) ≤ y}D(1)|Z = 1]
+ E[1{Y(1)D(0) + Y(0)(1 − D(0)) ≤ y}(1 − D(0))|Z = 0]
− E[1{Y(1)D(1) + Y(0)(1 − D(1)) ≤ y}(1 − D(1))|Z = 1].   (A.2)

Now we drop the conditioning from every expectation on the
right-hand side of equation (A.2) because, by our assumption,
for z = 0 and also for z = 1, the triplet {Y (1), Y (0), D(z)} is
independent of Z. In summary, we have that
F0(y) − F1(y) = A(y) + B(y),   (A.3)


where
A(y) = E[1{Y (1)D(0) + Y (0)(1 − D(0)) ≤ y}D(0)]
−E[1{Y (1)D(1) + Y (0)(1 − D(1)) ≤ y}D(1)]
and
B(y) = E[1{Y (1)D(0) + Y (0)(1 − D(0)) ≤ y}(1 − D(0))]
−E[1{Y (1)D(1) + Y (0)(1 − D(1)) ≤ y}(1 − D(1))].

We proceed by splitting each of the expectations making up the definitions of A(y) and B(y) into four parts, according to the subdivision of the sample space into four groups as mentioned at the beginning of Section 3.1. For example, for A(y) we have the equation (with expectations converted back into probabilities)

A(y) = P[Y(1) ≤ y | D(0) = 1, D(1) = 1] × P[D(0) = 1, D(1) = 1]
+ P[Y(1) ≤ y | D(0) = 1, D(1) = 0] × P[D(0) = 1, D(1) = 0]
− F1C(y) × P[D(0) = 0, D(1) = 1]
− P[Y(1) ≤ y | D(0) = 1, D(1) = 1] × P[D(0) = 1, D(1) = 1].

The first and fourth probabilities cancel out. Since we assume that there are no defiers, which means that P[D(0) = 1, D(1) = 0] = 0, we finally arrive at the equation

A(y) = −F1C(y) P[D(0) = 0, D(1) = 1].

Analogously, we obtain that

B(y) = F0C(y) P[D(0) = 0, D(1) = 1].

Plugging the above expressions for A(y) and B(y) into the right-hand side of Equation (A.3), we obtain Equation (3.4). This concludes the proof of Theorem 3.1.

APPENDIX B: ASYMPTOTIC TECHNICALITIES

There are two goals that we accomplish in this section. First, we need to prove the six theorems formulated in Section 2.3, as they comprise the large-sample statistical inferential theory for the test of “initial dominance.” Second, since the asymptotic critical values depend on the unknown underlying distributions, we also need to justify the bootstrap procedure formulated in Section 2.4.

We start with the inferential theory. Throughout, we use the notation

Δm,n(x) = (Fn(x) − F(x)) − (Gm(x) − G(x)).

Proof of Theorem 2.1. Since F ≤ G on [0, ∞) by assumption, the function H is equal to 0 on the entire half-line [0, ∞), and thus x1 = ∞. We shall next show that the estimator xm,n ≡ xm,n(δm,n) converges to ∞ in probability, which means that, for every M > 0,

P[Hm,n^−1(δm,n) ≥ M] → 1.   (B.1)

Since Hm,n^−1(δm,n) ≥ M is equivalent to δm,n ≥ Hm,n(M), we have that

P[Hm,n^−1(δm,n) ≥ M] = P[√(nm/(n + m)) δm,n ≥ √(nm/(n + m)) Hm,n(M)].

In view of assumption (2.5), statement (B.1) holds provided that

√(nm/(n + m)) Hm,n(M) = OP(1),

which is a consequence of

√(nm/(n + m)) ∫_0^M |Δm,n(x)| dx = OP(1)

and

√(nm/(n + m)) ∫_0^M (F(x) − G(x))+ dx = 0.

This concludes the proof of statement (B.1) as well as that of Theorem 2.1.

Proof of Theorem 2.2. Keeping in mind that the cdf’s F and G are not identical throughout this proof, and irrespective of whether we are dealing with the null H0 or the alternative H1, we always have x1 ∈ [0, ∞) and ε > 0 such that

(1) F(x) ≤ G(x) for all x ∈ (0, x1], and
(2) F(x) > G(x) for all x ∈ (x1, x1 + ε).

We want to show that xm,n →P x1, which is equivalent to the statement that P[|Hm,n^−1(δm,n) − x1| > γ] → 0 for every γ > 0. This follows if

P[Hm,n^−1(δm,n) > x1 + γ] → 0   (B.2)

and

P[Hm,n^−1(δm,n) ≤ x1 − γ] → 0.   (B.3)

We first establish statement (B.2):

P[Hm,n^−1(δm,n) > x1 + γ]
= P[δm,n > Hm,n(x1 + γ)]
= P[δm,n > ∫_0^{x1+γ} (Δm,n(z) + (F(z) − G(z)))+ dz]
≤ P[δm,n > ∫_{x1}^{x1+min{γ,ε}} (Δm,n(z) + (F(z) − G(z)))+ dz]
≤ P[δm,n > −∫_{x1}^{x1+min{γ,ε}} |Δm,n(z)| dz + ∫_{x1}^{x1+min{γ,ε}} (F(z) − G(z))+ dz].

The right-hand side converges to 0 because δm,n → 0,

∫_{x1}^{x1+min{γ,ε}} |Δm,n(z)| dz = oP(1),

and

∫_{x1}^{x1+min{γ,ε}} (F(z) − G(z))+ dz > 0.

This concludes the proof of statement (B.2).

To prove statement (B.3), we write:

P[Hm,n^−1(δm,n) ≤ x1 − γ]
= P[√(nm/(n + m)) δm,n ≤ √(nm/(n + m)) Hm,n(x1 − γ)]
= P[√(nm/(n + m)) δm,n ≤ √(nm/(n + m)) ∫_0^{x1−γ} (Δm,n(z) + (F(z) − G(z)))+ dz]
≤ P[√(nm/(n + m)) δm,n ≤ √(nm/(n + m)) ∫_0^{x1−γ} |Δm,n(z)| dz],   (B.4)

where the right-most bound holds because F(z) ≤ G(z) for all z ∈ (0, x1 − γ] irrespective of the value of γ > 0. The right-hand side of bound (B.4) converges to 0 because of assumption (2.5) and the fact that

√(nm/(n + m)) ∫_0^{x1−γ} |Δm,n(z)| dz = OP(1).

This concludes the proof of statement (B.3) as well as that of Theorem 2.2.

Proof of Theorem 2.3. We begin by noting that the integral in statement (2.7) is well defined and finite (almost surely) because the cdf F has 2 + κ finite moments for some κ > 0, no matter how small. By Markov’s inequality, this follows if we verify that the expectation of the integral is finite, and we do this next:

∫_0^∞ E[(B(F(x)))+] dx ≤ ∫_0^∞ E[|B(F(x))|] dx

Note B.1. We cannot theoretically derive confidence intervals that would asymptotically maintain the prespecified level, and the reason for this is twofold: (i) the tuning parameter δm,n influences the normalizing constant needed for establishing the limiting distribution of the appropriately scaled and normalized estimator xm,n, in a way similar to the influence of the bandwidth in the kernel density estimator, and (ii) the limiting distribution depends in a crucial way on the rate of departure of F from G to the