Manajemen | Fakultas Ekonomi Universitas Maritim Raja Ali Haji jbes%2E2009%2E0001
Journal of Business & Economic Statistics
ISSN: 0735-0015 (Print) 1537-2707 (Online) Journal homepage: http://www.tandfonline.com/loi/ubes20
Statistical Inference with Generalized Gini Indices
of Inequality, Poverty, and Welfare
Garry F. Barrett & Stephen G. Donald
To cite this article: Garry F. Barrett & Stephen G. Donald (2009) Statistical Inference with
Generalized Gini Indices of Inequality, Poverty, and Welfare, Journal of Business & Economic
Statistics, 27:1, 1-17, DOI: 10.1198/jbes.2009.0001
To link to this article: http://dx.doi.org/10.1198/jbes.2009.0001
Published online: 01 Jan 2012.
Submit your article to this journal
Article views: 173
View related articles
Full Terms & Conditions of access and use can be found at
http://www.tandfonline.com/action/journalInformation?journalCode=ubes20
Download by: [Universitas Maritim Raja Ali Haji]
Date: 12 January 2016, At: 01:06
Statistical Inference with Generalized Gini
Indices of Inequality, Poverty, and Welfare
Garry F. BARRETT
School of Economics, University of New South Wales, Sydney, NSW, 2052 ([email protected])
Stephen G. DONALD
Department of Economics, University of Texas at Austin, Austin, TX, 78712 ([email protected])
Downloaded by [Universitas Maritim Raja Ali Haji] at 01:06 12 January 2016
This article considers statistical inference for consistent estimators of generalized Gini indices of inequality, poverty, and welfare. Our method does not require grouping the population into a fixed number
of quantiles. The empirical indices are shown to be asymptotically normally distributed using functional
limit theory. Easily computed asymptotic variance expressions are obtained using influence functions.
Inference based on first-order asymptotics is then compared with the grouped method and various
bootstrap methods in simulations and with U.S. income data. The bootstrap-t method based on our
asymptotic theory is found to have superior size and power properties in small samples.
KEY WORDS: Functional delta method; Generalized Gini index; Influence function.
1.
Important exceptions are the work of Cowell (1989) on the
traditional Gini coefficient and Bishop, Formby, and Zheng
(1997) and Xu (2006) on the ‘‘Sen poverty indices,’’ who use
results from the theory of U-statistics to derive the sampling
properties of consistent estimators of these indices. Zitikis and
Gastwirth (2002) and Zitikis (2003) consider an alternative
approach to the asymptotics of Gini inequality indices from
that pursued in Barrett and Donald (2000)—an earlier draft of
this article. Xu (2000) proposed a bootstrap method for testing
differences in the generalized Gini relative indices of
inequality. Since the Gini indices are not asymptotically pivotal, Xu (2000) considered a bias corrected, iterative variant of
the bootstrap. In this article, we derive asymptotic results using
influence functions that generalize the results derived by
Cowell (1989), Bishop et al. (1997), Zitikis and Gastwirth
(2002), and Zitikis (2003) to all members of the generalized
Gini class of inequality, poverty, and welfare indices.
In this article, we examine the asymptotic distribution of a
variety of inequality, poverty, and welfare measures that can be
written as statistical functionals of either the LC or generalized
Lorenz curve (GLC). We use results concerning the delta
method for statistical functionals to derive the asymptotic
properties of the generalized Gini indices. Our approach to
estimating standard errors, constructing confidence intervals
(CIs), and performing inference is, like that of Beach and
Davidson (1983), nonparametric since we impose no structure
on the underlying income distribution beyond weak regularity
conditions. Our approach to inference differs from that of
Beach and Davidson (1983) in that we use the influence
functions for the various objects. The use of the influence
function to study the statistical properties of estimators was
pioneered by Hampel (1974), and one of the first applications
in econometrics was by Cowell and Victoria-Feser (1996), who
examined the sensitivity of summary measures of inequality to
data contamination. As a by-product of our approach, we
INTRODUCTION
The work of Atkinson (1970), Kolm (1969), and Sen (1973)
generated new interest in the theory of inequality measurement.
Their contributions emphasized the duality between measures
of inequality (and poverty) and welfare, and led to a large literature proposing new indices of inequality explicitly derived
from the desired properties of the underlying social welfare
function (SWF) (see Lambert 1993 for an in depth survey).
More recently, researchers have begun examining the statistical
properties of these normative measures of inequality and
poverty so that they can be used for formal statistical inference.
A key development was the work of Beach and Davidson
(1983), who presented the variance-covariance matrix of a
vector of Lorenz curve (LC) ordinates without placing parametric restrictions on the form of the underlying income distribution. Subsequently, researchers have used the results in
Beach and Davidson to derive the sampling distribution of the
Gini-type inequality (Barrett and Pendakur 1995; Rongve and
Beach 1997; Davidson and Duclos 1997) and welfare indices
(Bishop, Chakraborti, and Thistle 1990) and poverty measures
(Xu and Osberg 1998), which can be expressed as functions of
LC ordinates or income quantiles. In a related strand of work,
Cowell (1989) and Thistle (1990) derived the large sample
distribution of the Atkinson and generalized entropy indices,
which are functions of the raw moments of a distribution.
A fundamental limitation of the current approach to deriving
the sampling distribution of the Gini indices is that the indices
are expressed as a function of a small, finite number of LC
ordinates rather than the complete set of ordinates implied by
an empirical distribution function. Consequently, the estimators of the indices are inconsistent. By considering only a small
number of ordinates, inequality within population quantiles is
ignored and an underestimate of the true level of inequality is
obtained. The inconsistency is greater the greater is the level of
aggregation into quantiles (the smaller the set of ordinates) and
the more unequal is the underlying income distribution. This
research embodies a compromise between using a desirable
measure of inequality and being able to undertake formal
statistical inference.
2009 American Statistical Association
Journal of Business & Economic Statistics
January 2009, Vol. 27, No. 1
DOI 10.1198/jbes.2009.0001
1
Downloaded by [Universitas Maritim Raja Ali Haji] at 01:06 12 January 2016
2
Journal of Business & Economic Statistics, January 2009
provide an alternative means for computing the variancecovariance matrix for a vector of LC and GLC ordinates. The
influence function approach has considerable computational
advantages for obtaining consistent estimates of the asymptotic
variances of the Gini-based indices. Further, our approach does
not require the aggregation of the population in a small set
of Lorenz ordinates for the computation of the indices. Our
asymptotic results show that the desirable normative properties
of quantile-based inequality, poverty, and welfare indices need
not be compromised to undertake inferential analysis. We
compare inference procedures based on our asymptotic results,
including the bootstrap-t method, with those for the grouped
Gini estimator and a variety of alternative bootstrap procedures
in a series of Monte Carlo experiments and in an empirical
application. The simple asymptotic methods are found to perform well, while, in line with theory, the bootstrap-t method,
which uses the asymptotic variance estimator to form a pivotal
test statistic, is found to have superior performance in small
samples.
The article is organized as follows. The next section briefly
reviews the set of inequality, poverty, and welfare indices that
can be written as functionals of the LC or GLC. In Section 3,
the sampling distribution of these measures is derived. Section
4 briefly summarizes the alternative approaches to inference
with the Gini indices based on asymptotic and bootstrap
resampling methods. The alternative approaches are compared
in a series of Monte Carlo experiments and their relative performance evaluated. In Section 5, the methods are illustrated
by examining changes in income inequality and poverty in the
United States between 1988 and 1998. Concluding comments
are provided in the final section.
2.
2.1
MEASURES OF INEQUALITY AND POVERTY
Preliminaries
The objective is to undertake statistical inference with the
normative indices of inequality and poverty, which are generalizations of the Gini coefficient. Let Y denote the random variable income, and let the population CDF be given
by F(y), which is assumed to be continuous and differentiable
to at least second order. Also let Q(p) ¼ F1(p) denote the
pth quantile of income. The LC for the distribution is then
given by
Z
1 QðpÞ
LðpÞ ¼
y:dFðyÞ
m 0
ð1Þ
Z
1 p
QðtÞdt;
¼
m 0
where m is the mean level of income. The LC represents the
proportion of total income going to the bottom p proportion of
the population. Closely related to the LC is the GLC,
GðpÞ ¼ pEðYjY # QðpÞÞ
Z QðpÞ
y:dFðyÞ
¼
0
¼
Z
0
p
QðtÞdt
ð2Þ
which is often used for welfare comparisons when income
distributions have unequal means (Shorrocks 1983). These
curves are popular tools in the literature on inequality and
welfare measurement, and, as shown subsequently, a variety of
indices of inequality and poverty can be represented as functionals of either the GLC or the LC.
2.2 S-Gini Indices
The S-Gini relative indices of inequality are given by
Z 1
ð1 pÞd2 LðpÞdðpÞ;
I dR ¼ 1 dðd 1Þ
0
ð3Þ
where 1 < d < ‘. The index is scale free (homogeneous of
degree 0 in y) and S-Concave (Blackorby and Donaldson,
1978). The ‘‘inequality aversion parameter’’ d is set by the
researcher. Higher values of d place progressively greater
social weight on transfers involving individuals ranked toward
the bottom of the distribution. The special case of d ¼ 2 corresponds to the popular Gini coefficient. A fuller discussion of
the properties of the S-Gini indices are given in Donaldson and
Weymark (1980, 1983), Kakwani (1980), and Yitzhaki (1983).
The S-Gini absolute index of inequality is given by
Z 1
I dA ¼ m dðd 1Þ
ð1 pÞd2 GðpÞdp:
ð4Þ
0
Absolute inequality indices are a function of absolute income
differentials rather than income shares and have been interpreted as measures of relative deprivation (Yitzhaki 1979).
The S-Gini inequality indices imply, and are implied by, the
same class of ordinally equivalent social welfare functions
(SWFs). The welfare index underlying the S-Gini inequality
indices can be represented in two ways using integration by
parts and (2)
Z 1
Z 1
W d ¼ dðd 1Þ ð1 pÞd2 GðpÞdp ¼ d ð1 pÞd1 QðpÞdp
0
0
ð5Þ
with the second being useful from the standpoint of computation (Section 3). It is easy to see that once one has obtained the
welfare index, the inequality indexes can be found using
I dA ¼ m W d and I dR ¼ ðm W d Þ=m: The special case of W d
when d ¼ 2 corresponds to the Sen welfare index. The sampling distribution of this particular index was analyzed by
Bishop et al. (1990): the results presented in Section 3.3 generalize their result to consistent estimators of all members (1 <
d < ‘) of the Gini family of welfare indices.
2.3 E-Gini Indices
Another set of generalized Gini indices are the E-Ginis
presented by Chakravarty (1988). The E-Gini relative index is
Z 1
a1
a
a
IR ¼ 2
ðp LðpÞÞ dp ;
ð6Þ
0
where a $ 1 is the inequality aversion parameter. As a
increases, the index is increasingly sensitive to transfers at the
lower end of the income distribution. When a ¼ 1, the index
Barrett and Donald: Statistical Inference with Generalized Gini Indices
Downloaded by [Universitas Maritim Raja Ali Haji] at 01:06 12 January 2016
corresponds to the Gini coefficient, and as a ! ‘, the index
approaches twice the Schutz index (the maximum distance
between the LC and line of equality).
The key normative distinction between the S-Gini and EGini indices is that only the E-Ginis (for a > 1) satisfy the
‘‘principle of diminishing transfers.’’ The S-Ginis are linear in
incomes, and the social weight attached to individuals is a
function of their rank in the ordered distribution. The effect of a
transfer on the level of measured inequality will be a function
of the ranks of the individuals involved in the transfer. However, the E-Ginis are a CES function of the difference between
the LC and the line of equality, and the effect of a transfer on
measured inequality will be sensitive to the differences in
individuals’ income shares (see Chakravarty, 1988, for a proof
and fuller discussion).
The E-Gini absolute index of inequality is given by
I aA
¼2
Z
0
1
a
ðm:p GðpÞÞ dp
a1
:
ð7Þ
Underlying the E-Gini relative and absolute inequality indices
is a common class of SWFs. One cardinalization of the E-Gini
welfare index is
0
1
Z 1
a1
a
W a ¼ 2@ m
ðm:p GðpÞÞ dp A:
ð8Þ
0
Conceptually, W a represents the level of income, which, if
distributed equally to all members of the population, generates
the same level of social welfare as the actual distribution. The
welfare index is equal to (twice) the mean scaled down by the
level of inequality.
3
W d ðyjzÞ ¼ EðYjY # zÞ
¼
when d ¼ 1, where 1(A) represents the indicator function,
which is equal to 1 when A is true (and 0 otherwise). Consequently, using (9), (10), and (11), the S-Gini poverty index is
Z 1
1
P d ¼ FðzÞ dðd 1Þ
ð1 pÞd2 GðpFðzÞÞdp ð12Þ
z
0
when d > 1, and
Pd ¼
Indices of poverty are closely related to measures of
inequality and welfare (Sen 1976). Following Blackorby and
Donaldson (1980), normative poverty indices can be expressed
as a composite function of the proportion of the population
below the poverty line (the head-count ratio), the average
income of the poor, and the level of income inequality among
the poor:
mðzÞð1 I R ðyjy < zÞÞ
P ¼ FðzÞ 1
z
ð9Þ
z W ðyjy < zÞ
;
¼ FðzÞ
z
where z is the income level representing the ‘‘poverty line,’’
IR(y|y < z) is the level of relative inequality among the poor, and
m(z) is the average income of the poor. Equivalently, the generalized Gini poverty indices can be expressed as a composite
function of the proportion of the population below the poverty
line (F(z)) and the normalized Gini welfare indices defined
over the poor W(y|z).
The S-Gini welfare index defined over the poor is given by
Z 1
1
W d ðyjzÞ ¼
dðd 1Þ
ð1 pÞd2 GðpFðzÞÞdp ð10Þ
FðzÞ
0
when d > 1, and
Eððz YÞ:1ðY # zÞÞ
z
ð13Þ
when d ¼ 1. The S-Gini poverty index in the case of d ¼ 1 is
equal to the head-count ratio times the average income shortfall
of the poor divided by the poverty line. This particular index
can be simply estimated using a scaled average, making estimation and inference straightforward. When d ¼ 2, P d corresponds to the well-known Sen index of poverty; the sampling
distribution of this specific index was analyzed by Bishop et al.
(1997) and Xu (2006).
When the welfare index underlying the E-Gini indices is
adopted, the conditional social welfare of the poor is given by
0
1
Z 1
a1
1
a
a
W ¼ 2@mðzÞ
ðmðzÞ:pFðzÞ GðpFðzÞÞÞ dp A
FðzÞ 0
ð14Þ
and, hence, the E-Gini poverty-based index is defined as
1
1
P ¼ FðzÞ GðFðzÞÞþ
z
z
a
2.4 Gini Poverty Indices
ð11Þ
EðY:1ðY # zÞÞ
FðzÞ
Z
0
1
a
ðpGðFðzÞÞGðpFðzÞÞÞ dp
a1
ð15Þ
where we have used the fact that G(F(z)) ¼ m(z)F(z), and a $ 1
parameterizes the aversion to income inequality among the
poor.
3.
3.1
VARIANCE ESTIMATION FOR THE EMPIRICAL
GENERALIZED GINI ESTIMATORS
Definition of Empirical Indices
Our basic approach uses the property that all of the indices
described previously are functionals of the LC or GLC and the
CDF at a particular point in the case of the poverty indices, and
we can consistently estimate these functionals by plugging in
empirical versions of the LC, GLC, and CDF. We assume that
we have a random sample fY i gNi¼1 drawn from an underlying
distribution with CDF F(y). We assume that F:[yl, yu] ! [0, 1],
where (i) 0 < yl < yu < ‘, (ii) F is twice continuously differentiable with density f (y) ¼ F9(y), and (iii) 0 < infy f (y) < supy
f (y) < ‘. The discussion of the empirical estimators and their
asymptotic standard errors in this section assumes an iid
random sample for simplicity. It is straightforward to allow
for independent but non-identically distributed samples by
constructing consistent, weighted estimates of the components,
which form the indices and the variances (Cameron and
4
Journal of Business & Economic Statistics, January 2009
Trivedi 2005). Indeed, most survey datasets used in distributional analysis, including the Current Population Survey
(CPS) data used in Section 5, are weighted samples, where the
weights often represent the inverse probability of selection
from the population. It is also possible to allow for more
complex sampling schemes in the estimation of the Gini
indices and asymptotic variances using the framework presented in Bhattacharya (2005). Biewen (2002) outlines how
bootstrap procedures may be adapted to replicate complex or
dependent sampling schemes to produce interval estimates and
hypothesis tests.
For the S-Gini indices, we have that
Z 1
d
^
^
I R ¼ 1 dðd 1Þ
ð16Þ
ð1 pÞd2 LðpÞdp
Downloaded by [Universitas Maritim Raja Ali Haji] at 01:06 12 January 2016
0
d
^ dðd 1Þ
I^A ¼ m
^ d ¼ dðd 1Þ
W
1
Z
0
Z
1
0
^
ð1 pÞd2 GðpÞdp
^
ð1 pÞd2 GðpÞdp;
^
^ and GðpÞ
where LðpÞ
are the empirical Lorenz and generalized
LCs, respectively. Given the definitions in (1) and (2), these
can be computed according to the ‘‘analogy principle’’ (Manski
1988) as
Z p
^
^
GðpÞ ¼
QðtÞdt
ð19Þ
0
ð20Þ
^
where FðyÞ
is the empirical distribution for the sample defined by
N
X
^ ¼1
FðyÞ
1ðY i # yÞ:
N i¼1
For the E-Gini indices, we have
a
I^R ¼ 2
Z
1
Z
1
a
I^A ¼ 2
0
0
0
a
^
ðp LðpÞÞ
dp
a1
a
^
ð^
m:p GðpÞÞ
dp
^ a ¼ 2@ m
W
^
Z
0
1
a
^
ð^
m:p GðpÞÞ
dp
ð22Þ
a1
1
A;
ð23Þ
while the Gini-based poverty indices can be written as
Z 1
1
d
^ F^ ðzÞÞdp for d > 1
^
^
P ¼ FðzÞ dðd 1Þ
ð1 pÞd2 Gðp
z
0
ð24Þ
N
1 X
d
P^ ¼
ðz Y i Þ:1ðY i # zÞÞ for d ¼ 1
Nz i¼1
ð25Þ
0
a
^ FðzÞÞ
^ FðzÞÞÞ
^
^
ðpGð
Gðp
dp
a1
ð26Þ
yl # y1 < y2 < < yN_ # yu ;
where N_ is the number of distinct values in the sample
with N_ < N if there are ties in the sample. The esti^
mator of F(y), the function FðyÞ,
is a step function with
increments,
p
^j ¼
N
1 X
1ðY i ¼ yj Þ;
N i¼1
^
occurring at each of the yj values. Therefore, the estimator FðyÞ
Pj
takes on the value l¼1 p
^ j ¼ p^j on the interval [yj, yjþ1). We
can then define the quantile function using similar arguments.
In terms of the yj, the quantile estimator is
^
QðpÞ
¼
N_
X
j¼1
1ð^
pj1 < p # p^j Þyj ;
^ 0 Þ [ p^0 [ 0: Thus, over the range of p,
where we define Fðy
where p 2 ð^
pj1 ; p^j (which has length p
^ j ), the quantile function takes on the value yj. For p # p
^ 1, the quantile function is
^1 < p # p
^1 þ p
^ 2 the quantile function is y2, and so
y1, for p
on.
Given the preceding representation, it is straightforward to
calculate the integrals that define the GLC and LC. In particular, for p such that p 2 ½^
pj1 ; p^j Þ, we have
ð21Þ
a1
1
In this section, we consider the issue of computing the
estimates of the inequality indices. A technical appendix to the
article, which is available on each author’s webpage, details
how one can compute the influence curves for all of the indices
considered in the article. Given our treatment of the indices as
statistical functionals of the GLC or LC, it is instructive to first
consider the computation of the GLC and LC. Denote the
ordered, distinct sample values for the Yi by the notation yj so
that
^
where m
^ ¼ Gð1Þ
¼ Y is the sample mean of the Yi and where
^ is the empirical quantile of the distribution of Yi and can be
QðtÞ
defined as
^
^
QðpÞ
¼ inffy : FðyÞ
$ pg;
Z
3.2 Computational Issues
ð17Þ
ð18Þ
^
^ ¼ GðpÞ ;
LðpÞ
m
^
1^ ^
1
a
^
GðFðzÞÞ þ
P^ ¼ FðzÞ
z
z
^
GðpÞ
¼ ðp p^j1 Þyj þ
¼ pyj þ aj ;
j1
X
p
^ l yl
l¼1
where
aj ¼
j1
X
l¼1
p
^ l ðyl yj Þ;
and defining a1 ¼ 0 so that when p < p^1 the function
^
^ is piecewise linear and continuous. For
GðpÞ
¼ py1 . Thus, GðpÞ
simplicity, we assume that yN_ ¼ yu so that by construction for
p ¼ 1 we have
^
Gð1Þ
¼
N_
X
j¼1
p
^ j yj ¼ m
^ ¼ Y;
:
Barrett and Donald: Statistical Inference with Generalized Gini Indices
5
which is the sample mean of the Yi. Given the estimator of the
GLC, it is then straightforward to see that
^
^ ¼ GðpÞ ;
LðpÞ
^
Gð1Þ
which is how we have defined our LC estimator previously.
^
^ ¼ 0 and Lð1Þ
^ ¼ 1: The LC estimate is
Note that Gð0Þ
¼ Lð0Þ
also piecewise linear and continuous on [0, 1].
In computing the S-Gini indices, it is most convenient to use the
representation in terms of quantiles. Use is made of the fact that
Z 1 X
N_ Z p^j
¼
Downloaded by [Universitas Maritim Raja Ali Haji] at 01:06 12 January 2016
0
j¼1
p^j1
where the relevant functions have been defined over the
intervals ð^
pj1 ; p^j : Using the result in (5) we can estimate the
S-Gini indices using the empirical quantiles, which gives an
^ d by
alternative way of calculating W
Z 1
^
^d ¼
dð1 pÞd1 QðpÞdp
W
0
¼
N_
X
j¼1
yj
Z
p^j
p^j1
dð1 pÞ
d1
dp
N_
n
o
X
¼
yj ð1 p^j1 ÞÞd ð1 p^j Þd :
j¼1
Then using (16)–(18) we can compute the S-Gini-based indices
as follows:
^d
W
d
^
IR ¼ 1
m
^
1
a
I^A ¼ 2½T^E a ;
a
I^
a
I^R ¼ A ;
m
^
and
a
^ a ¼ 2^
W
m I^A :
Computation of the Gini poverty indices is very similar to
d
that for the Gini inequality indices. The computation of P^ in
a
the case where d > 1 and P^ only will be considered, since the
d
case of P^ with d ¼ 1 is a simple average. For the S-Gini-based
index we can use a similar result to (5) to show that the
expression in (24) is equivalent to
^ Z 1
FðzÞ
d
^ FðzÞÞdp
^
^
^
dð1 pÞd1 Qðp
P ¼ FðzÞ
z
0
Z FðzÞ
^
1
^
^
^ pÞd1 QðpÞdp;
¼ FðzÞ
dðFðzÞ
^ d1 0
zFðzÞ
where the second line follows after a change of variables. To
estimate this quantity, we define N(z) to be the value for which
^ ¼ pNðzÞ , which is the number of observations that are less
FðzÞ
than or equal to the poverty line z. Then, similar to the estimation of the S-Gini inequality indices, we can compute the
poverty index by
d
^
P^ ¼ FðzÞ
0
To do this, we use the fact that over the interval ½^
pj1 ; p^j Þ
(
)
j1
X
^
p
^ l yl
m
^ :p GðpÞ
¼m
^ :p ðp p^j1 Þyj þ
l¼1
¼ ð^
m yj Þp þ
¼ bj p aj ;
j1
X
l¼1
p
^ l ðyj yl Þ
m yj Þ; which is a linear function of p. Therefore,
where bj ¼ ð^
N_
1 X
1
T^E ¼
½1ð^
m 6¼ yj Þ fðbj p^j aj Þaþ1 ðbj p^j1 aj Þaþ1 g
a þ 1 j¼1
bj
1ð^
m ¼ yj Þaj ð^
pj p^j1 Þ;
where the second term deals with cases where bj ¼ m
^ yj ¼ 0;
in which case
^
m
^ :p GðpÞ
¼ aj
so that the general integral does not apply. With these calculations, we can then compute the indices by
^ d1
zFðzÞ
NðzÞ n
o
X
^ p^j1 ÞÞd ðFðzÞ
^
yj ðFðzÞ
p^j Þd :
j¼1
Similar arguments for the E-Gini-based index suggest estimating the index in the form
d
^ d:
I^A ¼ m
^ W
Next, for the E-Gini indices, it is necessary to calculate the
quantity
Z 1
a
^
T^E ¼
ð^
m:p GðpÞÞ
dp:
1
^ ^
1
a
^ GðFðzÞÞ þ 1 T^Ez a ;
P^ ¼ FðzÞ
z
z
where, using a change of variable and arguments similar to
those used to derive T^E ,
z
T^E ¼
NðzÞ
1 X
1
½1ð^
mðzÞ 6¼ yj Þ fðbj p^j aj Þaþ1ðbj p^j1 aj Þaþ1 g
a þ 1 j¼1
bj
1ð^
mðzÞ ¼ yj Þaj ð^
pj p^j1 Þ
^ FðzÞÞ=
^
^
with m
^ ðzÞ ¼ Gð
FðzÞ.
3.3
Asymptotic Variance Estimation Using
Influence Functions
All of the objects defined previously, such as those in (16)–
(18) and (21)–(26), can be written in the following general form:
^
T^ ¼ c1 ^u1 þ c2 ^u2 þ TðHÞ;
where cj are scalar constants, ^uj are scalar estimators, and
^ is a scalar valued functional of some process
where TðHÞ
^
H that is defined on [0, 1]. For instance, in the case of the SGini, the absolute inequality index c1 ^
u1 is given by m
^, H^ is
^
given by the empirical generalized LC GðpÞ, and the functional
T is given by
Z 1
TðHÞ ¼ dðd 1Þ
ð1 pÞd2 HðpÞdp:
ð27Þ
0
6
Journal of Business & Economic Statistics, January 2009
^
The relevant scalar estimators for the other indexes are FðzÞ
^
^
^
and GðFðzÞÞ. The additional relevant processes are LðpÞ, p
^
^ FðzÞÞ,
^ FðzÞÞ
^ ^
^
^
^
LðpÞ,
m
^ p GðpÞ,
Gðp
and pGð
Fi1
R 1 GðpaFðzÞÞ.
nally, the other relevant functionals are 2½ 0 HðpÞ dpa and
scaled versions of this and the functional in (27).
There are two important elements in the derivation of
the variances. First, it is the case that for the scalars we can
write
Downloaded by [Universitas Maritim Raja Ali Haji] at 01:06 12 January 2016
pffiffiffiffi
1
N ð^
uj uj Þ ¼ pffiffiffiffi
N
N
X
i¼1
f i ð^
uj Þ þ op ð1Þ;
ð28Þ
uj Þ are often referred to as the
where the random variables fi ð^
influence function and give the effect of an observation on the
estimator. In many estimation problems, the fi ð^uj Þ are iid with
finite variance. Thus, the first term on the right-hand side of
(28) satisfies a central limit theorem and implies that the
asymptotic variance of the estimator is equal to the expectation
of the squared influence function. Hence, the asymptotic variance of the estimator can be estimated by taking the sample
average of the squared influence functions. For example, the
^ is given by
influence function for the estimator FðyÞ
fi ðy; FÞ ¼ 1ðY i # yÞ FðyÞ;
since
N
X
pffiffiffiffi
^ FðyÞÞ ¼ p1ffiffiffiffi
N ðFðyÞ
ð1ðY i # yÞ FðyÞÞ:
N i¼1
Due to the fact that the fi(y; F) are iid mean zero with variance
given by
Eðfi ðy; FÞ2 Þ ¼ FðyÞð1 FðyÞÞ;
pffiffiffiffi
^ FðyÞÞ
one can estimate the asymptotic variance of N ðFðyÞ
^
and the standard error of FðyÞ by using, respectively,
N
1 X
^ ðy; FÞ2
^ FðyÞÞ
^
Vð
¼
f
N i¼1 i
sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
^ FðyÞÞ
^
Vð
^
;
s:e:ðFðyÞÞ
¼
N
^ i ðy; FÞ ¼ 1ðY i # yÞ FðyÞ
^
where f
is the estimated influence
function. In addition to their use in variance estimation,
influence functions have also been used in the study of robust
statistics (Reid, 2005).
Given that the influence function fi ð^
uj Þ terms in (28) are iid
and satisfy a central limit theorem, it follows that
pffiffiffiffi
d
ð29Þ
N ð^
uj uj Þ ! Nð0; Eðfi ðuj Þ2 Þ:
The expression in (28) also follows easily for m
^ , since
N
pffiffiffiffi
1 X
d
N ð^
m mÞ ¼ pffiffiffiffi
ðY i mÞ ! Nð0; VðY i ÞÞ;
N i¼1
^ FðzÞÞ
^
while a result for Gð
follows by application of the Delta
method for scalars. The form of the expression in (28) and
the result in (29) leads to a natural estimator of the asymp^ i ð^uj Þ2 , where
totic variance in (29) by taking the average of f
^ i ð^
f
uj Þ is an estimate of f i ð^
uj Þ.
The second key element in our derivation of asymptotic
variances for the indices is to extend the ideas in the previous
^ in general. Specifically, we
paragraph to functionals TðHÞ
show the steps in deriving a similar expansion to (28) for the
^ where H^ is now a function. We assume that
term TðHÞ,
pffiffiffiffi
N ðH^ HÞ0H
ð30Þ
for some Gaussian process H on [0, 1] that takes its values in a
function space H0 and that there exists an expansion such as in
(28) for each point p 2 [0, 1] so that
N
pffiffiffiffi
1 X
^ þ op ð1Þ;
^
fi ðp; HÞ
N ðHðpÞ
HðpÞÞ ¼ pffiffiffiffi
N i¼1
ð31Þ
^ are the iid influence functions. An implication
where fi ðp; HÞ
of this is that
pffiffiffiffi
d
^ 2 ÞÞ:
^
N ðHðpÞ
HðpÞÞ ! Nð0; Eðfi ðp; HÞ
^ we appeal to a
To derive similar results for the functional TðHÞ,
version of the Delta methods that is appropriate for functionals
and requires that the functional T is Hadamard differentiable at
F. The strict definition of Hadamard differentiability given by
Van der Vaart and Wellner (1994) is as follows.
Definition 1: The map T: H D[0, 1] ! V is Hadamard
differentiable at H tangentially to the set H0 if there is a continuous linear map TH0 : D ½0; 1 ! V such that
TðH þ tn hn Þ TðHÞ
TH0 ðhn Þ ! 0
tn
for all converging sequences tn ! 0, hn ! h 2 H0 with H þ tnhn 2 H.
In using this definition, we let H represent the set of functions to which H^ belongs. In the case of the LC process, for
instance, this is the space of continuous functions on [0, 1]. As
a practical matter, the Hadamard derivative of T(F) can be
found by the following calculation:
d
TH0 ðhÞ ¼ TðH þ thÞÞjt¼0 :
dt
Then once Hadamard differentiability is established, the results
in Fernholz (1983, Theorem 4.4.2; also Van der Vaart and
Wellner 1994, Theorem 3.9.4), imply that
!
N
X
pffiffiffiffi
1
^ þ op ð1Þ
^ TðHÞÞ ¼ TH0 pffiffiffiffi
f i ð:; HÞ
N ðTðHÞ
N i¼1
N
1 X
^ þ op ð1Þ
¼ pffiffiffiffi
TH0 f i ð:; HÞ
N i¼1
0TH0 ðHÞ;
where the second line follows by linearity of the functional TH0
^ denotes the function whose value at p is
and where fi ð:; HÞ
^ Linearity of TH0 and normality of H imply normality
fi ðp; HÞ.
of TH0 ðHÞ. Once we identify the functional TH0 and have the
^ we can then derive the influence
influence functions fi ðp; HÞ,
^ by calculating TH0 fi ð:; HÞ
^ . A technical
functions for TðHÞ
appendix available from the authors’ webpages shows how we
can use this basic method to derive results such as (30) and (31)
^
^ FðzÞÞ,
^
^
^
for the processes LðpÞ,
p LðpÞ,
m
^ :p GðpÞ,
Gðp
and
^ ¼ pGð
^ FðzÞÞ
^ FðzÞÞ
^
^
Y
Gðp
that appear as arguments in the
Barrett and Donald: Statistical Inference with Generalized Gini Indices
7
various indices. There we treat each process as a functional of
the underlying empirical distribution function for which it is
well known that results such as (30) and (31) hold.
^ we can then write
Combining the results for ^
uj and TðHÞ,
N
pffiffiffiffi
1 X
d
^ þ op ð1Þ !
^ 2 ÞÞ
N ðT^ TÞ ¼ pffiffiffiffi
fi ðTÞ
Nð0; Eðfi ðTÞ
N i¼1
^ ¼ c1 fi ð^
^
u1 Þ þ c2 fi ð^
u2 Þ þ T9H ðfi ð:; HÞÞ;
fi ðTÞ
Downloaded by [Universitas Maritim Raja Ali Haji] at 01:06 12 January 2016
^ are the influence functions for the index and
where fi ðTÞ
^ 2Þ
are iid with zero mean and finite variance given by Eðfi ðTÞ
^ Our variance estimator for the index is obtained taking
¼ VðTÞ.
^
the sample average of estimates of the fi ðTÞ,
^ i ðz; FÞ
^ i ðP^d Þ ¼ f
^ dðd 1Þ
f
z
Z 1
^ i ðz; FÞ
^ FðzÞÞ
^
^
3 ð1 pÞd2 ðpQðp
f
0
^
^
þ f i ðpFðzÞ;
GÞÞdp
and
2
^ i ðP^d Þ ¼ ðz Y i Þ 1ðY i # zÞÞ P^d ;
f
z
for d ¼ 1, while for the E-Gini-based index, we have,
^ i ðP^a Þ ¼ f
^ i ðz; FÞ
^
^ 1 f i ðFðzÞ;
^
f
GÞ
z
Z 1
1 z 1
^ FðzÞÞ
^
þ ½T^E a1
ðp:Gð
z
0
N
X
^ ðTÞ
^ 2:
^ TÞ
^ ¼1
f
Vð
N i¼1 i
Note that even though the indexes are constructed using all
the ordinates of the various functions, and not just a small
number of ordinates, there is no cost in terms of convergence
rates—all of the indexes are root-N consistent and asymptotically normal.
^ d . This
As an illustrative example, consider the index W
^ As shown in a technical
index is a linear functional of G:
appendix available from the authors’ web pages, G^ has a
pointwise influence function given by
^ i ðp; GÞ
^ ¼ ðpQðpÞ
^ GðpÞÞ
^
^
^ Y i Þ:
f
1ðY i < QðpÞÞð
QðpÞ
Since linear functionals are trivially Hadamard differentiable,
we can then write
pffiffiffiffi d
^ W dÞ
N ðW
!
Z 1
N
1 X
d2
^
^
pffiffiffiffi
fi ðp; GÞ dp þ op ð1Þ
¼ dðd 1Þ ð1 pÞ
N i¼1
0
Z 1
N
1 X
^ i ðp; GÞdp
^
¼ pffiffiffiffi
dðd 1Þ ð1 pÞd2 f
þ op ð1Þ;
N i¼1
0
so that we have the influence function given by
Z 1
^ i ðW
^ i ðp; GÞdp:
^
^ d Þ ¼ dðd 1Þ
f
ð1 pÞd2 f
a1 ^
^
^ FðzÞÞÞ
^
^
Gðp
YÞdp;
f i ðp; FðzÞ;
where
^ ¼ p z Qðp:
^ i ðp; FðzÞ;
^ i ðz; FÞ
^ FðzÞÞ
^
^
^
f
YÞ
f
^ i ðFðzÞ;
^ i ðp:FðzÞ;
^ f
^
^
^
þ p:f
GÞ
GÞ
Z 1
z
a
^ FðzÞÞ
^ FðzÞÞÞ
^
^
T^E ¼
ðpGð
Gðp
dp:
0
With these influence functions, the standard error for an
^ say, is given by
index, T,
sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
N
1 X
^ ðTÞ
^2
f
N i¼1 i
^ ¼
pffiffiffiffi
s:e:ðTÞ
;
N
and statistical inference can be performed using the asymptotic
normal distribution. The technical appendix provides more
details on the influence functions and provides arguments
justifying their use for estimating the variance.
4.
0
Similarly, we have
^ i ðW
^ i ðI^d Þ ¼ ðY i m
^ dÞ
^Þ f
f
A
d
^
^ i ðI^d Þ ¼ 1 f
^ i ðI^d Þ I R ðY i m
f
^ Þ:
R
A
m
^
m
^
For the E-Gini indices, the influence functions are given by
Z 1
a
a1 ^
a ^a 1a
^
^
^
ð^
m:p GðpÞÞ
ðfi ðp; GÞ
fi ðI A Þ ¼ 2 ðI A Þ
0
a
pðY i m
^ ÞÞdp
^ i ðW
^ i ðI^a Þ
^ Þ ¼ 2ðY i m
^Þ þ f
f
A
a
^
^ i ðI^a Þ ¼ 1 f
^ i ðI^a Þ I R ðY i m
f
^ Þ:
A
R
m
^
m
^
For the S-Gini poverty indices, we have for d > 1
4.1
MONTE CARLO COMPARISONS
Alternative Inference Procedures
In this section, we outline alternative approaches to inference with the Gini indices, and assess their relative performance in a series of Monte Carlo experiments. The different
approaches considered are based on asymptotic or bootstrap
methods. The objective of the Monte Carlo experiment is to
illuminate the size and power properties of the asymptotic
methods in small samples, as well as to provide an assessment
of their performance relative to alternative methods available in
the literature. The Monte Carlo comparisons use the CI
approach to hypothesis testing, similar to the approach adopted
by Biewen (2002) in studying the relative performance of
asymptotic and bootstrap approaches to inference with momentbased indices of inequality and poverty.
First, it is useful to define some notation and provide
a general outline of bootstrap resampling. As previously defined, Y ¼ fY i gNi¼1 represents the original sample drawn from
8
Downloaded by [Universitas Maritim Raja Ali Haji] at 01:06 12 January 2016
distribution F(y). Bootstrap computations are performed conditionally on Y. Let a random sample drawn with
replacement
of size N* from Y be denoted as Y ¼ fY i gNi¼1 ; and with this
random sample, we can calculate the Gini index I^ and the
associated asymptotic standard error s
^ . By repeatedly drawing
random samples from Y, say, B times, and calculating
fI^b ; s
^ b gBb¼1 , we can approximate the distribution of I^ in a
variety of ways. Through this process of ‘‘bootstrap resampling,’’ the distribution of the quantity of interest can be
simulated, and the approximations provide a method for conducting statistical inference. With this notation, we now present
the range of asymptotic and commonly-used bootstrap
approaches to inference based on the construction of CIs.
1. First-order asymptotic approximation to the sampling distribution of the Gini indices (ASE). As shown previously, generalized Gini indices are asymptotically normally distributed
with a variance that can be readily calculated using results for
their influence functions. The 100(1 – p)% CI is given by
h
p
p
i
I^ s
^ F1 1 ; I^ þ s
^ F1 1
;
2
2
where I^is the index, s
^ is the asymptotic standard error obtained
using the methods described previously, and F denotes the
standard normal CDF.
2. A simple application of the bootstrap is to substitute the
asymptotic standard error with the bootstrap standard error
(BSE). The corresponding CI is given by
h
p
p
i
I^ s
^ B F1 1 ; I^ þ s
^ B F1 1
;
2
2
where bootstrap resampling is used to obtain an estimate of the
sampling variation of the Gini index, with
!!2
B
B
X
1 X
1
2
I^
I^
s
^B ¼
B 1 b¼1 b
B k¼1 k
0
!2 1
B
B
X
1 @X
1
2
I^ A:
I^ B
¼
B 1 b¼1 b
B j¼1 j
This application of the bootstrap only requires the calculation
of the index with each resample (and not the asymptotic
standard error) and in general does not offer a refinement to the
asymptotic approximation.
3. The bootstrap-t (BOOT-T) approach is based on simulating the distribution of the ‘‘studentized’’ or t-test statistic
^ sb . This method combines the asymptotic vartb ¼ ðI^b IÞ=^
iance estimator derived using the influence functions with
bootstrap resampling. The CI for this implementation of the
bootstrap uses quantiles of the simulated distribution of the t
statistic and is given by
h
p
i
p
^
1
I^ s
^ G1
1
;
I
s
^
G
boott
boott
2
2
where Gboot–t is the simulated distribution of the pivotal
quantity tb*. By simulating the distribution of a pivotal statistic,
this method has superior theoretical properties to the asymptotic approach described previously, as well as to naive bootstrap methods, which we do not consider.
Journal of Business & Economic Statistics, January 2009
4. First-order asymptotic approximation to the sampling
distribution of the grouped Gini (GG) estimator. The GG
estimator, I^Gk , is asymptotically normal with associated
asymptotic standard error s
^ Gk based on k-population quantiles.
The corresponding 100(1 – p)% CI is given by
h
p
p
i
I^Gk s
^ Gk F1 1
^ Gk F1 1 ; I^Gk þ s
2
2
5. The iterated bootstrap CI is referred to as double bootstrap (DB). This method was applied to S-Gini and E-Gini
relative inequality indices in Xu (2000). The practical implementation of this method is discussed in Booth and Hall (1994)
and involves two levels of resampling (i.e., resampling from
a randomly drawn sample of the original sample). As shown
in Hall (1992), the second iteration allows one to obtain a
refined approximation to the distribution of statistics that are
not necessarily pivotal. Thus, Xu (2000) argues that one can
apply the method directly to the indexes so that it is no longer
necessary to estimate standard errors. The iterative nature of the
procedure can be quite costly in terms of computational time.
The simulations should shed light on the small sample performance of the different methods. In particular, we are interested in the relative performance of the two methods that allow
one to construct pivotal statistics (ASE and BSE) and in the
relative performance of the two methods that should provide a
refined approximation to the distributions of the statistics
(BOOT-T and DB). The simulations also enable one to gauge
the bias that one might expect from using grouped data (GG).
Any comparisons should also take into account computational
costs. In the case of the methods that involve bootstrapping,
this will clearly depend on the number of replications used, so
our simulations should shed light on the trade-off between cost
and accuracy. Based on our results, it does appear that for the
DB method to perform well one must use numbers of iterations
that make it the most computationally demanding.
4.2.
Monte Carlo Results
The various inference procedures for the generalized Gini
indices were used to construct CIs and test the hypotheses:
H0 : I ¼ I0
H 1 : I 6¼ I 0 ;
where I represents a specific generalized Gini index. The null is
rejected when the CI does not include I0. The experiments for
the inequality and welfare indices were conducted using sample sizes of 50 and 500 observations. The bootstrap procedures
are based on 200 resamples (and 100 second level resamples
for the double bootstrap) with 500 Monte Carlo iterations
performed. In the subsequent tables, the rejection rates of the
null hypothesis for tests with a nominal size of 5% are reported.
For the grouped Gini procedure, we consider aggregation of the
population into quintiles (k ¼ 5) and deciles (k ¼ 10). A range
of values of the inequality aversion parameter for the different
Gini classes of indices are considered to cover a broad range of
normative positions. Specifically, for the S-Gini inequality
indices, the parameter values of d ¼ (1.5, 2, 3, 5) are considered, and for the E-Gini inequality indices, the parameter
Downloaded by [Universitas Maritim Raja Ali Haji] at 01:06 12 January 2016
Barrett and Donald: Statistical Inference with Generalized Gini Indices
values of a ¼ (1, 2, 4, 6) are presented. For the Gini poverty
indices, the S-Gini parameter values of d ¼ (1, 2, 3, 5) and the
E-Gini parameter values of a ¼ (1, 2, 4, 6) are reported.
The design of each experiment attempts to mimic reality by
using a class of distributions with shapes similar to those that
have been found to work well in empirical studies of income
distributions. We used the lognormal distribution whereby the
data in each simulation was generated using Yi ¼ exp(sZ þ m),
where the Zi is a N(0, 1) random variable and (m, s) are parameters that are varied across the different experiments. The
value of the Gini indices under the null hypothesis were calculated by a drawing a random sample of 10 million observations.
Four different cases were considered, with the first two
examining the size properties of the inference procedures, and the
second two examining power properties. For case 1, the values of
the parameters were set to m1 ¼ 9.85 and s1 ¼ 0.6, which are the
moments of the lognormal estimated from the distribution of
gross individual-equivalent income using the March 1998 CPS.
In case 2, the parameter values are specified as m2 ¼ 6.4 and s2 ¼
0.5. Recent applied work on expenditure inequality has found
that the lognormal is a particularly good fit to the empirical
distribution of household expenditure data. These parameter
values correspond to the middle of the range of values reported
by Attanasio, Battistin, and Ichimura (2004) for the distribution
of per-capita nondurable expenditure based on the U.S. Consumer Expenditure Survey over the period 1984–1998.
In case 3, the Monte Carlo samples are generated using m3 ¼
9.85 and s3 ¼ 0.65 but with the false null values corresponding
to (m ¼ m1 ¼ 9.85, s ¼ s1 ¼ 0.60). The difference in the true
and null values approximates the observed change in the dispersion of U.S. gross individual equivalent income over 1980s.
For case 4, the parameters for the simulated data are set as m4 ¼
6.37 and s4 ¼ 0.48, while the false null values were calculated
assuming (m ¼ m2 ¼ 6.4 and s ¼ s2 ¼ 0.5). The difference
between the true and null values approximates the observed
change in the dispersion of real individual-equivalent nondurable consumption expenditure in the United States over
1980s and 1990s. In both cases 3 and 4, the rejection rates should
be well in excess of the nominal significance level for the tests.
For the poverty indices, the parameter values for the simulated data chosen for cases 1 and 3 were repeated, and the
poverty line was set at z ¼ 8,480, which corresponds to the U.S.
Census Bureau poverty threshold for 1998. With this specification, the poverty head-count ratio in the underlying population is 12.4%. Case 2 was conducted with m2 ¼ 6.4 and s2 ¼
1.0 and a poverty line set at z ¼ 300.9225, which is half the
median income of the population distribution. This experiment
attempts to mimic the per-capita expenditure distribution from
a developing country (Deaton 1997: p.157 for South Africa in
1993 based on World Bank Living Standard Measurement
Survey data). This specification generates a poverty head-count
ratio in the underlying distribution of 24.4%. Case 4 was
conducted with m ¼ 6.4 and s ¼ 0.95, which is comparable to
the magnitude of differences in the observed expenditure distribution between demographic groups in South Africa at a
point in time. The Monte Carlo experiments for the poverty
indices considered sample sizes of N ¼ 250 and N ¼ 1,000.
Table 1 reports the rejection rates for inference procedures
for the S-Gini and E–Gini relative indices of inequality, for
9
tests based on a nominal significance level of 5%. The pattern
of results for the Gini absolute inequality and welfare indices
are very similar to those found for the relative inequality indices.
Tables of results for the full set of inequality, welfare, and
poverty indices are available from the authors upon request.
In Table 1, the first panel reports the rejection rates for case 1
when the sample size is N ¼ 50, which is a very small sample in
the context of empirical studies of income distribution. The
actual size of almost all of the tests exceeded the nominal size
of 0.05. The actual size of each test procedure also tended to be
greater the lower the degree of inequality aversion of the
inequality index. Comparing across the methods, the extent of
over-rejection was greatest for the Grouped Gini procedures.
The poor performance of the GG procedure relative to the other
methods is due to the fact that the GG CI is centered around a
biased point estimate of generalized Gini index. The extent of
over-rejection for the GG procedure was greater the smaller the
number of population quantiles (k), reflecting greater aggregation bias. The ASE procedure performed reasonably well
given the small sample size, though it over-rejected more than
the bootstrap procedures. The performances of the BSE and DB
procedure were very similar, while the BOOT-T method clearly
exhibited superior size properties of all the methods examined.
This finding is consistent with theoretical results, which show
that the BOOT-T approach provides a refinement to the ASE
and other bootstrap procedures, which are not based on pivotal
statistics. When the sample size was increased to N ¼ 500, the
size properties of the consistent inference procedures improves
substantially. However, the GG procedure size characteristics
deteriorated due to the greater degree of aggregation bias (the
GG procedure with k ¼ 20 was also examined; while this
generally resulted in better size properties compared with GG
with k ¼ 10, the qualitative pattern of results and the ranking of
alternative inference procedures was not affected). The ASE
procedure performs well; however, the refinement provided by
the BOOT-T procedure resulted in superior size characteristics.
The case 2 experiment attempts to mimic the distribution of
consumption expenditure, and as a consequence, the degree of
inequality in the underlying population distribution is less than
that considered in case 1, which mimics the income distribution. The pattern of results for case 2 is very similar to that
found for case 1, except that the extent of over-rejection is
marginally less for all procedures. With the small sample size
of N ¼ 50, the ASE procedure performs well, the BSE and DB
perform somewhat better, and the BOOT-T is the most superior.
With the larger N ¼ 500 sample, the size performance of the
consistent inference procedures improved, and the overall
superiority of the BOOT-T procedure remains apparent. As
in case 1, the (inconsistent) GG procedure exhibits very poor
size characteristics. Overall, the Monte Carlo experiments of
cases 1 and 2 provide encouraging results for ASE procedure,
while the BOOT-T procedure has the most desirable size
characteristics.
The next series of experiments are designed to provide a
comparison of the power characteristics of the different inference procedures. The procedures were not corrected for size
distortions. In case 3 with N ¼ 50, all the procedures apart from
DB reject the false null at a rate greater than the nominal size of
the test. The power of the DB procedure improves when the
10
Journal of Business & Economic Statistics, January 2009
Table 1. Monte Carlo rejection rates for generalized Gini relative inequality indices, nominal size ¼ 0.05
N ¼ 50
Downloaded by [Universitas Maritim Raja Ali Haji] at 01:06 12 January 2016
ASE
BSE
BOOT-T
Case 1—size comparisons
S-Gini relative inequality indices
d
1.5
0.162
0.138
2
0.126
0.098
3
0.084
0.066
5
0.086
0.072
E-Gini relative inequality indices
a
1
0.126
0.098
2
0.104
0.078
4
0.092
0.070
6
0.086
0.070
Case 2—size comparison
S-Gini relative inequality indices
d
1.5
0.148
0.120
2
0.106
0.082
3
0.084
0.068
5
0.086
0.074
E-Gini Relative Inequality Indices
a
1
0.100
0.082
2
0.092
0.072
4
0.088
0.068
6
0.084
0.066
Case 3—power comparisons
S-Gini relative inequality indices
d
1.5
0.110
0.090
2
0.088
0.090
3
0.090
0.096
5
0.112
0.094
E-Gini relative inequality indices
a
1
0.088
0.090
2
0.080
0.082
4
0.076
0.078
6
0.076
0.078
Case 4—Power comparisons
S-Gini relative inequality indices
d
1.5
0.214
0.206
2
0.158
0.150
3
0.124
0.108
5
0.124
0.102
E-Gini relative inequality indices
a
1
0.158
0.150
2
0.150
0.138
4
0.138
0.124
6
0.130
0.116
N ¼ 500
DB
GG(10)
GG(5)
ASE
BSE
BOOT-T
DB
GG(10)
GG(5)
0.086
0.054
0.046
0.054
0.136
0.098
0.066
0.054
0.188
0.134
0.094
0.100
0.386
0.230
0.140
0.150
0.070
0.066
0.060
0.054
0.066
0.050
0.038
0.038
0.060
0.056
0.050
0.040
0.048
0.038
0.034
0.042
0.242
0.114
0.074
0.094
0.932
0.590
0.306
0.424
0.054
0.050
0.048
0.048
0.098
0.086
0.084
0.086
0.134
0.106
0.088
0.088
0.230
0.104
0.094
0.088
0.066
0.064
0.064
0.062
0.050
0.044
0.040
0.038
0.056
ISSN: 0735-0015 (Print) 1537-2707 (Online) Journal homepage: http://www.tandfonline.com/loi/ubes20
Statistical Inference with Generalized Gini Indices
of Inequality, Poverty, and Welfare
Garry F. Barrett & Stephen G. Donald
To cite this article: Garry F. Barrett & Stephen G. Donald (2009) Statistical Inference with
Generalized Gini Indices of Inequality, Poverty, and Welfare, Journal of Business & Economic
Statistics, 27:1, 1-17, DOI: 10.1198/jbes.2009.0001
To link to this article: http://dx.doi.org/10.1198/jbes.2009.0001
Published online: 01 Jan 2012.
Submit your article to this journal
Article views: 173
View related articles
Full Terms & Conditions of access and use can be found at
http://www.tandfonline.com/action/journalInformation?journalCode=ubes20
Download by: [Universitas Maritim Raja Ali Haji]
Date: 12 January 2016, At: 01:06
Statistical Inference with Generalized Gini
Indices of Inequality, Poverty, and Welfare
Garry F. BARRETT
School of Economics, University of New South Wales, Sydney, NSW, 2052 ([email protected])
Stephen G. DONALD
Department of Economics, University of Texas at Austin, Austin, TX, 78712 ([email protected])
Downloaded by [Universitas Maritim Raja Ali Haji] at 01:06 12 January 2016
This article considers statistical inference for consistent estimators of generalized Gini indices of inequality, poverty, and welfare. Our method does not require grouping the population into a fixed number
of quantiles. The empirical indices are shown to be asymptotically normally distributed using functional
limit theory. Easily computed asymptotic variance expressions are obtained using influence functions.
Inference based on first-order asymptotics is then compared with the grouped method and various
bootstrap methods in simulations and with U.S. income data. The bootstrap-t method based on our
asymptotic theory is found to have superior size and power properties in small samples.
KEY WORDS: Functional delta method; Generalized Gini index; Influence function.
1.
Important exceptions are the work of Cowell (1989) on the
traditional Gini coefficient and Bishop, Formby, and Zheng
(1997) and Xu (2006) on the ‘‘Sen poverty indices,’’ who use
results from the theory of U-statistics to derive the sampling
properties of consistent estimators of these indices. Zitikis and
Gastwirth (2002) and Zitikis (2003) consider an alternative
approach to the asymptotics of Gini inequality indices from
that pursued in Barrett and Donald (2000)—an earlier draft of
this article. Xu (2000) proposed a bootstrap method for testing
differences in the generalized Gini relative indices of
inequality. Since the Gini indices are not asymptotically pivotal, Xu (2000) considered a bias corrected, iterative variant of
the bootstrap. In this article, we derive asymptotic results using
influence functions that generalize the results derived by
Cowell (1989), Bishop et al. (1997), Zitikis and Gastwirth
(2002), and Zitikis (2003) to all members of the generalized
Gini class of inequality, poverty, and welfare indices.
In this article, we examine the asymptotic distribution of a
variety of inequality, poverty, and welfare measures that can be
written as statistical functionals of either the LC or generalized
Lorenz curve (GLC). We use results concerning the delta
method for statistical functionals to derive the asymptotic
properties of the generalized Gini indices. Our approach to
estimating standard errors, constructing confidence intervals
(CIs), and performing inference is, like that of Beach and
Davidson (1983), nonparametric since we impose no structure
on the underlying income distribution beyond weak regularity
conditions. Our approach to inference differs from that of
Beach and Davidson (1983) in that we use the influence
functions for the various objects. The use of the influence
function to study the statistical properties of estimators was
pioneered by Hampel (1974), and one of the first applications
in econometrics was by Cowell and Victoria-Feser (1996), who
examined the sensitivity of summary measures of inequality to
data contamination. As a by-product of our approach, we
INTRODUCTION
The work of Atkinson (1970), Kolm (1969), and Sen (1973)
generated new interest in the theory of inequality measurement.
Their contributions emphasized the duality between measures
of inequality (and poverty) and welfare, and led to a large literature proposing new indices of inequality explicitly derived
from the desired properties of the underlying social welfare
function (SWF) (see Lambert 1993 for an in depth survey).
More recently, researchers have begun examining the statistical
properties of these normative measures of inequality and
poverty so that they can be used for formal statistical inference.
A key development was the work of Beach and Davidson
(1983), who presented the variance-covariance matrix of a
vector of Lorenz curve (LC) ordinates without placing parametric restrictions on the form of the underlying income distribution. Subsequently, researchers have used the results in
Beach and Davidson to derive the sampling distribution of the
Gini-type inequality (Barrett and Pendakur 1995; Rongve and
Beach 1997; Davidson and Duclos 1997) and welfare indices
(Bishop, Chakraborti, and Thistle 1990) and poverty measures
(Xu and Osberg 1998), which can be expressed as functions of
LC ordinates or income quantiles. In a related strand of work,
Cowell (1989) and Thistle (1990) derived the large sample
distribution of the Atkinson and generalized entropy indices,
which are functions of the raw moments of a distribution.
A fundamental limitation of the current approach to deriving
the sampling distribution of the Gini indices is that the indices
are expressed as a function of a small, finite number of LC
ordinates rather than the complete set of ordinates implied by
an empirical distribution function. Consequently, the estimators of the indices are inconsistent. By considering only a small
number of ordinates, inequality within population quantiles is
ignored and an underestimate of the true level of inequality is
obtained. The inconsistency is greater the greater is the level of
aggregation into quantiles (the smaller the set of ordinates) and
the more unequal is the underlying income distribution. This
research embodies a compromise between using a desirable
measure of inequality and being able to undertake formal
statistical inference.
2009 American Statistical Association
Journal of Business & Economic Statistics
January 2009, Vol. 27, No. 1
DOI 10.1198/jbes.2009.0001
1
Downloaded by [Universitas Maritim Raja Ali Haji] at 01:06 12 January 2016
2
Journal of Business & Economic Statistics, January 2009
provide an alternative means for computing the variancecovariance matrix for a vector of LC and GLC ordinates. The
influence function approach has considerable computational
advantages for obtaining consistent estimates of the asymptotic
variances of the Gini-based indices. Further, our approach does
not require the aggregation of the population in a small set
of Lorenz ordinates for the computation of the indices. Our
asymptotic results show that the desirable normative properties
of quantile-based inequality, poverty, and welfare indices need
not be compromised to undertake inferential analysis. We
compare inference procedures based on our asymptotic results,
including the bootstrap-t method, with those for the grouped
Gini estimator and a variety of alternative bootstrap procedures
in a series of Monte Carlo experiments and in an empirical
application. The simple asymptotic methods are found to perform well, while, in line with theory, the bootstrap-t method,
which uses the asymptotic variance estimator to form a pivotal
test statistic, is found to have superior performance in small
samples.
The article is organized as follows. The next section briefly
reviews the set of inequality, poverty, and welfare indices that
can be written as functionals of the LC or GLC. In Section 3,
the sampling distribution of these measures is derived. Section
4 briefly summarizes the alternative approaches to inference
with the Gini indices based on asymptotic and bootstrap
resampling methods. The alternative approaches are compared
in a series of Monte Carlo experiments and their relative performance evaluated. In Section 5, the methods are illustrated
by examining changes in income inequality and poverty in the
United States between 1988 and 1998. Concluding comments
are provided in the final section.
2.
2.1
MEASURES OF INEQUALITY AND POVERTY
Preliminaries
The objective is to undertake statistical inference with the
normative indices of inequality and poverty, which are generalizations of the Gini coefficient. Let Y denote the random variable income, and let the population CDF be given
by F(y), which is assumed to be continuous and differentiable
to at least second order. Also let Q(p) ¼ F1(p) denote the
pth quantile of income. The LC for the distribution is then
given by
Z
1 QðpÞ
LðpÞ ¼
y:dFðyÞ
m 0
ð1Þ
Z
1 p
QðtÞdt;
¼
m 0
where m is the mean level of income. The LC represents the
proportion of total income going to the bottom p proportion of
the population. Closely related to the LC is the GLC,
GðpÞ ¼ pEðYjY # QðpÞÞ
Z QðpÞ
y:dFðyÞ
¼
0
¼
Z
0
p
QðtÞdt
ð2Þ
which is often used for welfare comparisons when income
distributions have unequal means (Shorrocks 1983). These
curves are popular tools in the literature on inequality and
welfare measurement, and, as shown subsequently, a variety of
indices of inequality and poverty can be represented as functionals of either the GLC or the LC.
2.2 S-Gini Indices
The S-Gini relative indices of inequality are given by
Z 1
ð1 pÞd2 LðpÞdðpÞ;
I dR ¼ 1 dðd 1Þ
0
ð3Þ
where 1 < d < ‘. The index is scale free (homogeneous of
degree 0 in y) and S-Concave (Blackorby and Donaldson,
1978). The ‘‘inequality aversion parameter’’ d is set by the
researcher. Higher values of d place progressively greater
social weight on transfers involving individuals ranked toward
the bottom of the distribution. The special case of d ¼ 2 corresponds to the popular Gini coefficient. A fuller discussion of
the properties of the S-Gini indices are given in Donaldson and
Weymark (1980, 1983), Kakwani (1980), and Yitzhaki (1983).
The S-Gini absolute index of inequality is given by
Z 1
I dA ¼ m dðd 1Þ
ð1 pÞd2 GðpÞdp:
ð4Þ
0
Absolute inequality indices are a function of absolute income
differentials rather than income shares and have been interpreted as measures of relative deprivation (Yitzhaki 1979).
The S-Gini inequality indices imply, and are implied by, the
same class of ordinally equivalent social welfare functions
(SWFs). The welfare index underlying the S-Gini inequality
indices can be represented in two ways using integration by
parts and (2)
Z 1
Z 1
W d ¼ dðd 1Þ ð1 pÞd2 GðpÞdp ¼ d ð1 pÞd1 QðpÞdp
0
0
ð5Þ
with the second being useful from the standpoint of computation (Section 3). It is easy to see that once one has obtained the
welfare index, the inequality indexes can be found using
I dA ¼ m W d and I dR ¼ ðm W d Þ=m: The special case of W d
when d ¼ 2 corresponds to the Sen welfare index. The sampling distribution of this particular index was analyzed by
Bishop et al. (1990): the results presented in Section 3.3 generalize their result to consistent estimators of all members (1 <
d < ‘) of the Gini family of welfare indices.
2.3 E-Gini Indices
Another set of generalized Gini indices are the E-Ginis
presented by Chakravarty (1988). The E-Gini relative index is
Z 1
a1
a
a
IR ¼ 2
ðp LðpÞÞ dp ;
ð6Þ
0
where a $ 1 is the inequality aversion parameter. As a
increases, the index is increasingly sensitive to transfers at the
lower end of the income distribution. When a ¼ 1, the index
Barrett and Donald: Statistical Inference with Generalized Gini Indices
Downloaded by [Universitas Maritim Raja Ali Haji] at 01:06 12 January 2016
corresponds to the Gini coefficient, and as a ! ‘, the index
approaches twice the Schutz index (the maximum distance
between the LC and line of equality).
The key normative distinction between the S-Gini and EGini indices is that only the E-Ginis (for a > 1) satisfy the
‘‘principle of diminishing transfers.’’ The S-Ginis are linear in
incomes, and the social weight attached to individuals is a
function of their rank in the ordered distribution. The effect of a
transfer on the level of measured inequality will be a function
of the ranks of the individuals involved in the transfer. However, the E-Ginis are a CES function of the difference between
the LC and the line of equality, and the effect of a transfer on
measured inequality will be sensitive to the differences in
individuals’ income shares (see Chakravarty, 1988, for a proof
and fuller discussion).
The E-Gini absolute index of inequality is given by
I aA
¼2
Z
0
1
a
ðm:p GðpÞÞ dp
a1
:
ð7Þ
Underlying the E-Gini relative and absolute inequality indices
is a common class of SWFs. One cardinalization of the E-Gini
welfare index is
0
1
Z 1
a1
a
W a ¼ 2@ m
ðm:p GðpÞÞ dp A:
ð8Þ
0
Conceptually, W a represents the level of income, which, if
distributed equally to all members of the population, generates
the same level of social welfare as the actual distribution. The
welfare index is equal to (twice) the mean scaled down by the
level of inequality.
3
W d ðyjzÞ ¼ EðYjY # zÞ
¼
when d ¼ 1, where 1(A) represents the indicator function,
which is equal to 1 when A is true (and 0 otherwise). Consequently, using (9), (10), and (11), the S-Gini poverty index is
Z 1
1
P d ¼ FðzÞ dðd 1Þ
ð1 pÞd2 GðpFðzÞÞdp ð12Þ
z
0
when d > 1, and
Pd ¼
Indices of poverty are closely related to measures of
inequality and welfare (Sen 1976). Following Blackorby and
Donaldson (1980), normative poverty indices can be expressed
as a composite function of the proportion of the population
below the poverty line (the head-count ratio), the average
income of the poor, and the level of income inequality among
the poor:
mðzÞð1 I R ðyjy < zÞÞ
P ¼ FðzÞ 1
z
ð9Þ
z W ðyjy < zÞ
;
¼ FðzÞ
z
where z is the income level representing the ‘‘poverty line,’’
IR(y|y < z) is the level of relative inequality among the poor, and
m(z) is the average income of the poor. Equivalently, the generalized Gini poverty indices can be expressed as a composite
function of the proportion of the population below the poverty
line (F(z)) and the normalized Gini welfare indices defined
over the poor W(y|z).
The S-Gini welfare index defined over the poor is given by
Z 1
1
W d ðyjzÞ ¼
dðd 1Þ
ð1 pÞd2 GðpFðzÞÞdp ð10Þ
FðzÞ
0
when d > 1, and
Eððz YÞ:1ðY # zÞÞ
z
ð13Þ
when d ¼ 1. The S-Gini poverty index in the case of d ¼ 1 is
equal to the head-count ratio times the average income shortfall
of the poor divided by the poverty line. This particular index
can be simply estimated using a scaled average, making estimation and inference straightforward. When d ¼ 2, P d corresponds to the well-known Sen index of poverty; the sampling
distribution of this specific index was analyzed by Bishop et al.
(1997) and Xu (2006).
When the welfare index underlying the E-Gini indices is
adopted, the conditional social welfare of the poor is given by
0
1
Z 1
a1
1
a
a
W ¼ 2@mðzÞ
ðmðzÞ:pFðzÞ GðpFðzÞÞÞ dp A
FðzÞ 0
ð14Þ
and, hence, the E-Gini poverty-based index is defined as
1
1
P ¼ FðzÞ GðFðzÞÞþ
z
z
a
2.4 Gini Poverty Indices
ð11Þ
EðY:1ðY # zÞÞ
FðzÞ
Z
0
1
a
ðpGðFðzÞÞGðpFðzÞÞÞ dp
a1
ð15Þ
where we have used the fact that G(F(z)) ¼ m(z)F(z), and a $ 1
parameterizes the aversion to income inequality among the
poor.
3.
3.1
VARIANCE ESTIMATION FOR THE EMPIRICAL
GENERALIZED GINI ESTIMATORS
Definition of Empirical Indices
Our basic approach uses the property that all of the indices
described previously are functionals of the LC or GLC and the
CDF at a particular point in the case of the poverty indices, and
we can consistently estimate these functionals by plugging in
empirical versions of the LC, GLC, and CDF. We assume that
we have a random sample fY i gNi¼1 drawn from an underlying
distribution with CDF F(y). We assume that F:[yl, yu] ! [0, 1],
where (i) 0 < yl < yu < ‘, (ii) F is twice continuously differentiable with density f (y) ¼ F9(y), and (iii) 0 < infy f (y) < supy
f (y) < ‘. The discussion of the empirical estimators and their
asymptotic standard errors in this section assumes an iid
random sample for simplicity. It is straightforward to allow
for independent but non-identically distributed samples by
constructing consistent, weighted estimates of the components,
which form the indices and the variances (Cameron and
4
Journal of Business & Economic Statistics, January 2009
Trivedi 2005). Indeed, most survey datasets used in distributional analysis, including the Current Population Survey
(CPS) data used in Section 5, are weighted samples, where the
weights often represent the inverse probability of selection
from the population. It is also possible to allow for more
complex sampling schemes in the estimation of the Gini
indices and asymptotic variances using the framework presented in Bhattacharya (2005). Biewen (2002) outlines how
bootstrap procedures may be adapted to replicate complex or
dependent sampling schemes to produce interval estimates and
hypothesis tests.
For the S-Gini indices, we have that
Z 1
d
^
^
I R ¼ 1 dðd 1Þ
ð16Þ
ð1 pÞd2 LðpÞdp
Downloaded by [Universitas Maritim Raja Ali Haji] at 01:06 12 January 2016
0
d
^ dðd 1Þ
I^A ¼ m
^ d ¼ dðd 1Þ
W
1
Z
0
Z
1
0
^
ð1 pÞd2 GðpÞdp
^
ð1 pÞd2 GðpÞdp;
^
^ and GðpÞ
where LðpÞ
are the empirical Lorenz and generalized
LCs, respectively. Given the definitions in (1) and (2), these
can be computed according to the ‘‘analogy principle’’ (Manski
1988) as
Z p
^
^
GðpÞ ¼
QðtÞdt
ð19Þ
0
ð20Þ
^
where FðyÞ
is the empirical distribution for the sample defined by
N
X
^ ¼1
FðyÞ
1ðY i # yÞ:
N i¼1
For the E-Gini indices, we have
a
I^R ¼ 2
Z
1
Z
1
a
I^A ¼ 2
0
0
0
a
^
ðp LðpÞÞ
dp
a1
a
^
ð^
m:p GðpÞÞ
dp
^ a ¼ 2@ m
W
^
Z
0
1
a
^
ð^
m:p GðpÞÞ
dp
ð22Þ
a1
1
A;
ð23Þ
while the Gini-based poverty indices can be written as
Z 1
1
d
^ F^ ðzÞÞdp for d > 1
^
^
P ¼ FðzÞ dðd 1Þ
ð1 pÞd2 Gðp
z
0
ð24Þ
N
1 X
d
P^ ¼
ðz Y i Þ:1ðY i # zÞÞ for d ¼ 1
Nz i¼1
ð25Þ
0
a
^ FðzÞÞ
^ FðzÞÞÞ
^
^
ðpGð
Gðp
dp
a1
ð26Þ
yl # y1 < y2 < < yN_ # yu ;
where N_ is the number of distinct values in the sample
with N_ < N if there are ties in the sample. The esti^
mator of F(y), the function FðyÞ,
is a step function with
increments,
p
^j ¼
N
1 X
1ðY i ¼ yj Þ;
N i¼1
^
occurring at each of the yj values. Therefore, the estimator FðyÞ
Pj
takes on the value l¼1 p
^ j ¼ p^j on the interval [yj, yjþ1). We
can then define the quantile function using similar arguments.
In terms of the yj, the quantile estimator is
^
QðpÞ
¼
N_
X
j¼1
1ð^
pj1 < p # p^j Þyj ;
^ 0 Þ [ p^0 [ 0: Thus, over the range of p,
where we define Fðy
where p 2 ð^
pj1 ; p^j (which has length p
^ j ), the quantile function takes on the value yj. For p # p
^ 1, the quantile function is
^1 < p # p
^1 þ p
^ 2 the quantile function is y2, and so
y1, for p
on.
Given the preceding representation, it is straightforward to
calculate the integrals that define the GLC and LC. In particular, for p such that p 2 ½^
pj1 ; p^j Þ, we have
ð21Þ
a1
1
In this section, we consider the issue of computing the
estimates of the inequality indices. A technical appendix to the
article, which is available on each author’s webpage, details
how one can compute the influence curves for all of the indices
considered in the article. Given our treatment of the indices as
statistical functionals of the GLC or LC, it is instructive to first
consider the computation of the GLC and LC. Denote the
ordered, distinct sample values for the Yi by the notation yj so
that
^
where m
^ ¼ Gð1Þ
¼ Y is the sample mean of the Yi and where
^ is the empirical quantile of the distribution of Yi and can be
QðtÞ
defined as
^
^
QðpÞ
¼ inffy : FðyÞ
$ pg;
Z
3.2 Computational Issues
ð17Þ
ð18Þ
^
^ ¼ GðpÞ ;
LðpÞ
m
^
1^ ^
1
a
^
GðFðzÞÞ þ
P^ ¼ FðzÞ
z
z
^
GðpÞ
¼ ðp p^j1 Þyj þ
¼ pyj þ aj ;
j1
X
p
^ l yl
l¼1
where
aj ¼
j1
X
l¼1
p
^ l ðyl yj Þ;
and defining a1 ¼ 0 so that when p < p^1 the function
^
^ is piecewise linear and continuous. For
GðpÞ
¼ py1 . Thus, GðpÞ
simplicity, we assume that yN_ ¼ yu so that by construction for
p ¼ 1 we have
^
Gð1Þ
¼
N_
X
j¼1
p
^ j yj ¼ m
^ ¼ Y;
:
Barrett and Donald: Statistical Inference with Generalized Gini Indices
5
which is the sample mean of the Yi. Given the estimator of the
GLC, it is then straightforward to see that
^
^ ¼ GðpÞ ;
LðpÞ
^
Gð1Þ
which is how we have defined our LC estimator previously.
^
^ ¼ 0 and Lð1Þ
^ ¼ 1: The LC estimate is
Note that Gð0Þ
¼ Lð0Þ
also piecewise linear and continuous on [0, 1].
In computing the S-Gini indices, it is most convenient to use the
representation in terms of quantiles. Use is made of the fact that
Z 1 X
N_ Z p^j
¼
Downloaded by [Universitas Maritim Raja Ali Haji] at 01:06 12 January 2016
0
j¼1
p^j1
where the relevant functions have been defined over the
intervals ð^
pj1 ; p^j : Using the result in (5) we can estimate the
S-Gini indices using the empirical quantiles, which gives an
^ d by
alternative way of calculating W
Z 1
^
^d ¼
dð1 pÞd1 QðpÞdp
W
0
¼
N_
X
j¼1
yj
Z
p^j
p^j1
dð1 pÞ
d1
dp
N_
n
o
X
¼
yj ð1 p^j1 ÞÞd ð1 p^j Þd :
j¼1
Then using (16)–(18) we can compute the S-Gini-based indices
as follows:
^d
W
d
^
IR ¼ 1
m
^
1
a
I^A ¼ 2½T^E a ;
a
I^
a
I^R ¼ A ;
m
^
and
a
^ a ¼ 2^
W
m I^A :
Computation of the Gini poverty indices is very similar to
d
that for the Gini inequality indices. The computation of P^ in
a
the case where d > 1 and P^ only will be considered, since the
d
case of P^ with d ¼ 1 is a simple average. For the S-Gini-based
index we can use a similar result to (5) to show that the
expression in (24) is equivalent to
^ Z 1
FðzÞ
d
^ FðzÞÞdp
^
^
^
dð1 pÞd1 Qðp
P ¼ FðzÞ
z
0
Z FðzÞ
^
1
^
^
^ pÞd1 QðpÞdp;
¼ FðzÞ
dðFðzÞ
^ d1 0
zFðzÞ
where the second line follows after a change of variables. To
estimate this quantity, we define N(z) to be the value for which
^ ¼ pNðzÞ , which is the number of observations that are less
FðzÞ
than or equal to the poverty line z. Then, similar to the estimation of the S-Gini inequality indices, we can compute the
poverty index by
d
^
P^ ¼ FðzÞ
0
To do this, we use the fact that over the interval ½^
pj1 ; p^j Þ
(
)
j1
X
^
p
^ l yl
m
^ :p GðpÞ
¼m
^ :p ðp p^j1 Þyj þ
l¼1
¼ ð^
m yj Þp þ
¼ bj p aj ;
j1
X
l¼1
p
^ l ðyj yl Þ
m yj Þ; which is a linear function of p. Therefore,
where bj ¼ ð^
N_
1 X
1
T^E ¼
½1ð^
m 6¼ yj Þ fðbj p^j aj Þaþ1 ðbj p^j1 aj Þaþ1 g
a þ 1 j¼1
bj
1ð^
m ¼ yj Þaj ð^
pj p^j1 Þ;
where the second term deals with cases where bj ¼ m
^ yj ¼ 0;
in which case
^
m
^ :p GðpÞ
¼ aj
so that the general integral does not apply. With these calculations, we can then compute the indices by
^ d1
zFðzÞ
NðzÞ n
o
X
^ p^j1 ÞÞd ðFðzÞ
^
yj ðFðzÞ
p^j Þd :
j¼1
Similar arguments for the E-Gini-based index suggest estimating the index in the form
d
^ d:
I^A ¼ m
^ W
Next, for the E-Gini indices, it is necessary to calculate the
quantity
Z 1
a
^
T^E ¼
ð^
m:p GðpÞÞ
dp:
1
^ ^
1
a
^ GðFðzÞÞ þ 1 T^Ez a ;
P^ ¼ FðzÞ
z
z
where, using a change of variable and arguments similar to
those used to derive T^E ,
z
T^E ¼
NðzÞ
1 X
1
½1ð^
mðzÞ 6¼ yj Þ fðbj p^j aj Þaþ1ðbj p^j1 aj Þaþ1 g
a þ 1 j¼1
bj
1ð^
mðzÞ ¼ yj Þaj ð^
pj p^j1 Þ
^ FðzÞÞ=
^
^
with m
^ ðzÞ ¼ Gð
FðzÞ.
3.3
Asymptotic Variance Estimation Using
Influence Functions
All of the objects defined previously, such as those in (16)–
(18) and (21)–(26), can be written in the following general form:
^
T^ ¼ c1 ^u1 þ c2 ^u2 þ TðHÞ;
where cj are scalar constants, ^uj are scalar estimators, and
^ is a scalar valued functional of some process
where TðHÞ
^
H that is defined on [0, 1]. For instance, in the case of the SGini, the absolute inequality index c1 ^
u1 is given by m
^, H^ is
^
given by the empirical generalized LC GðpÞ, and the functional
T is given by
Z 1
TðHÞ ¼ dðd 1Þ
ð1 pÞd2 HðpÞdp:
ð27Þ
0
6
Journal of Business & Economic Statistics, January 2009
^
The relevant scalar estimators for the other indexes are FðzÞ
^
^
^
and GðFðzÞÞ. The additional relevant processes are LðpÞ, p
^
^ FðzÞÞ,
^ FðzÞÞ
^ ^
^
^
^
LðpÞ,
m
^ p GðpÞ,
Gðp
and pGð
Fi1
R 1 GðpaFðzÞÞ.
nally, the other relevant functionals are 2½ 0 HðpÞ dpa and
scaled versions of this and the functional in (27).
There are two important elements in the derivation of
the variances. First, it is the case that for the scalars we can
write
Downloaded by [Universitas Maritim Raja Ali Haji] at 01:06 12 January 2016
pffiffiffiffi
1
N ð^
uj uj Þ ¼ pffiffiffiffi
N
N
X
i¼1
f i ð^
uj Þ þ op ð1Þ;
ð28Þ
uj Þ are often referred to as the
where the random variables fi ð^
influence function and give the effect of an observation on the
estimator. In many estimation problems, the fi ð^uj Þ are iid with
finite variance. Thus, the first term on the right-hand side of
(28) satisfies a central limit theorem and implies that the
asymptotic variance of the estimator is equal to the expectation
of the squared influence function. Hence, the asymptotic variance of the estimator can be estimated by taking the sample
average of the squared influence functions. For example, the
^ is given by
influence function for the estimator FðyÞ
fi ðy; FÞ ¼ 1ðY i # yÞ FðyÞ;
since
N
X
pffiffiffiffi
^ FðyÞÞ ¼ p1ffiffiffiffi
N ðFðyÞ
ð1ðY i # yÞ FðyÞÞ:
N i¼1
Due to the fact that the fi(y; F) are iid mean zero with variance
given by
Eðfi ðy; FÞ2 Þ ¼ FðyÞð1 FðyÞÞ;
pffiffiffiffi
^ FðyÞÞ
one can estimate the asymptotic variance of N ðFðyÞ
^
and the standard error of FðyÞ by using, respectively,
N
1 X
^ ðy; FÞ2
^ FðyÞÞ
^
Vð
¼
f
N i¼1 i
sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
^ FðyÞÞ
^
Vð
^
;
s:e:ðFðyÞÞ
¼
N
^ i ðy; FÞ ¼ 1ðY i # yÞ FðyÞ
^
where f
is the estimated influence
function. In addition to their use in variance estimation,
influence functions have also been used in the study of robust
statistics (Reid, 2005).
Given that the influence function fi ð^
uj Þ terms in (28) are iid
and satisfy a central limit theorem, it follows that
pffiffiffiffi
d
ð29Þ
N ð^
uj uj Þ ! Nð0; Eðfi ðuj Þ2 Þ:
The expression in (28) also follows easily for m
^ , since
N
pffiffiffiffi
1 X
d
N ð^
m mÞ ¼ pffiffiffiffi
ðY i mÞ ! Nð0; VðY i ÞÞ;
N i¼1
^ FðzÞÞ
^
while a result for Gð
follows by application of the Delta
method for scalars. The form of the expression in (28) and
the result in (29) leads to a natural estimator of the asymp^ i ð^uj Þ2 , where
totic variance in (29) by taking the average of f
^ i ð^
f
uj Þ is an estimate of f i ð^
uj Þ.
The second key element in our derivation of asymptotic
variances for the indices is to extend the ideas in the previous
^ in general. Specifically, we
paragraph to functionals TðHÞ
show the steps in deriving a similar expansion to (28) for the
^ where H^ is now a function. We assume that
term TðHÞ,
pffiffiffiffi
N ðH^ HÞ0H
ð30Þ
for some Gaussian process H on [0, 1] that takes its values in a
function space H0 and that there exists an expansion such as in
(28) for each point p 2 [0, 1] so that
N
pffiffiffiffi
1 X
^ þ op ð1Þ;
^
fi ðp; HÞ
N ðHðpÞ
HðpÞÞ ¼ pffiffiffiffi
N i¼1
ð31Þ
^ are the iid influence functions. An implication
where fi ðp; HÞ
of this is that
pffiffiffiffi
d
^ 2 ÞÞ:
^
N ðHðpÞ
HðpÞÞ ! Nð0; Eðfi ðp; HÞ
^ we appeal to a
To derive similar results for the functional TðHÞ,
version of the Delta methods that is appropriate for functionals
and requires that the functional T is Hadamard differentiable at
F. The strict definition of Hadamard differentiability given by
Van der Vaart and Wellner (1994) is as follows.
Definition 1: The map T: H D[0, 1] ! V is Hadamard
differentiable at H tangentially to the set H0 if there is a continuous linear map TH0 : D ½0; 1 ! V such that
TðH þ tn hn Þ TðHÞ
TH0 ðhn Þ ! 0
tn
for all converging sequences tn ! 0, hn ! h 2 H0 with H þ tnhn 2 H.
In using this definition, we let H represent the set of functions to which H^ belongs. In the case of the LC process, for
instance, this is the space of continuous functions on [0, 1]. As
a practical matter, the Hadamard derivative of T(F) can be
found by the following calculation:
d
TH0 ðhÞ ¼ TðH þ thÞÞjt¼0 :
dt
Then once Hadamard differentiability is established, the results
in Fernholz (1983, Theorem 4.4.2; also Van der Vaart and
Wellner 1994, Theorem 3.9.4), imply that
!
N
X
pffiffiffiffi
1
^ þ op ð1Þ
^ TðHÞÞ ¼ TH0 pffiffiffiffi
f i ð:; HÞ
N ðTðHÞ
N i¼1
N
1 X
^ þ op ð1Þ
¼ pffiffiffiffi
TH0 f i ð:; HÞ
N i¼1
0TH0 ðHÞ;
where the second line follows by linearity of the functional TH0
^ denotes the function whose value at p is
and where fi ð:; HÞ
^ Linearity of TH0 and normality of H imply normality
fi ðp; HÞ.
of TH0 ðHÞ. Once we identify the functional TH0 and have the
^ we can then derive the influence
influence functions fi ðp; HÞ,
^ by calculating TH0 fi ð:; HÞ
^ . A technical
functions for TðHÞ
appendix available from the authors’ webpages shows how we
can use this basic method to derive results such as (30) and (31)
^
^ FðzÞÞ,
^
^
^
for the processes LðpÞ,
p LðpÞ,
m
^ :p GðpÞ,
Gðp
and
^ ¼ pGð
^ FðzÞÞ
^ FðzÞÞ
^
^
Y
Gðp
that appear as arguments in the
Barrett and Donald: Statistical Inference with Generalized Gini Indices
7
various indices. There we treat each process as a functional of
the underlying empirical distribution function for which it is
well known that results such as (30) and (31) hold.
^ we can then write
Combining the results for ^
uj and TðHÞ,
N
pffiffiffiffi
1 X
d
^ þ op ð1Þ !
^ 2 ÞÞ
N ðT^ TÞ ¼ pffiffiffiffi
fi ðTÞ
Nð0; Eðfi ðTÞ
N i¼1
^ ¼ c1 fi ð^
^
u1 Þ þ c2 fi ð^
u2 Þ þ T9H ðfi ð:; HÞÞ;
fi ðTÞ
Downloaded by [Universitas Maritim Raja Ali Haji] at 01:06 12 January 2016
^ are the influence functions for the index and
where fi ðTÞ
^ 2Þ
are iid with zero mean and finite variance given by Eðfi ðTÞ
^ Our variance estimator for the index is obtained taking
¼ VðTÞ.
^
the sample average of estimates of the fi ðTÞ,
^ i ðz; FÞ
^ i ðP^d Þ ¼ f
^ dðd 1Þ
f
z
Z 1
^ i ðz; FÞ
^ FðzÞÞ
^
^
3 ð1 pÞd2 ðpQðp
f
0
^
^
þ f i ðpFðzÞ;
GÞÞdp
and
2
^ i ðP^d Þ ¼ ðz Y i Þ 1ðY i # zÞÞ P^d ;
f
z
for d ¼ 1, while for the E-Gini-based index, we have,
^ i ðP^a Þ ¼ f
^ i ðz; FÞ
^
^ 1 f i ðFðzÞ;
^
f
GÞ
z
Z 1
1 z 1
^ FðzÞÞ
^
þ ½T^E a1
ðp:Gð
z
0
N
X
^ ðTÞ
^ 2:
^ TÞ
^ ¼1
f
Vð
N i¼1 i
Note that even though the indexes are constructed using all
the ordinates of the various functions, and not just a small
number of ordinates, there is no cost in terms of convergence
rates—all of the indexes are root-N consistent and asymptotically normal.
^ d . This
As an illustrative example, consider the index W
^ As shown in a technical
index is a linear functional of G:
appendix available from the authors’ web pages, G^ has a
pointwise influence function given by
^ i ðp; GÞ
^ ¼ ðpQðpÞ
^ GðpÞÞ
^
^
^ Y i Þ:
f
1ðY i < QðpÞÞð
QðpÞ
Since linear functionals are trivially Hadamard differentiable,
we can then write
pffiffiffiffi d
^ W dÞ
N ðW
!
Z 1
N
1 X
d2
^
^
pffiffiffiffi
fi ðp; GÞ dp þ op ð1Þ
¼ dðd 1Þ ð1 pÞ
N i¼1
0
Z 1
N
1 X
^ i ðp; GÞdp
^
¼ pffiffiffiffi
dðd 1Þ ð1 pÞd2 f
þ op ð1Þ;
N i¼1
0
so that we have the influence function given by
Z 1
^ i ðW
^ i ðp; GÞdp:
^
^ d Þ ¼ dðd 1Þ
f
ð1 pÞd2 f
a1 ^
^
^ FðzÞÞÞ
^
^
Gðp
YÞdp;
f i ðp; FðzÞ;
where
^ ¼ p z Qðp:
^ i ðp; FðzÞ;
^ i ðz; FÞ
^ FðzÞÞ
^
^
^
f
YÞ
f
^ i ðFðzÞ;
^ i ðp:FðzÞ;
^ f
^
^
^
þ p:f
GÞ
GÞ
Z 1
z
a
^ FðzÞÞ
^ FðzÞÞÞ
^
^
T^E ¼
ðpGð
Gðp
dp:
0
With these influence functions, the standard error for an
^ say, is given by
index, T,
sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
N
1 X
^ ðTÞ
^2
f
N i¼1 i
^ ¼
pffiffiffiffi
s:e:ðTÞ
;
N
and statistical inference can be performed using the asymptotic
normal distribution. The technical appendix provides more
details on the influence functions and provides arguments
justifying their use for estimating the variance.
4.
0
Similarly, we have
^ i ðW
^ i ðI^d Þ ¼ ðY i m
^ dÞ
^Þ f
f
A
d
^
^ i ðI^d Þ ¼ 1 f
^ i ðI^d Þ I R ðY i m
f
^ Þ:
R
A
m
^
m
^
For the E-Gini indices, the influence functions are given by
Z 1
a
a1 ^
a ^a 1a
^
^
^
ð^
m:p GðpÞÞ
ðfi ðp; GÞ
fi ðI A Þ ¼ 2 ðI A Þ
0
a
pðY i m
^ ÞÞdp
^ i ðW
^ i ðI^a Þ
^ Þ ¼ 2ðY i m
^Þ þ f
f
A
a
^
^ i ðI^a Þ ¼ 1 f
^ i ðI^a Þ I R ðY i m
f
^ Þ:
A
R
m
^
m
^
For the S-Gini poverty indices, we have for d > 1
4.1
MONTE CARLO COMPARISONS
Alternative Inference Procedures
In this section, we outline alternative approaches to inference with the Gini indices, and assess their relative performance in a series of Monte Carlo experiments. The different
approaches considered are based on asymptotic or bootstrap
methods. The objective of the Monte Carlo experiment is to
illuminate the size and power properties of the asymptotic
methods in small samples, as well as to provide an assessment
of their performance relative to alternative methods available in
the literature. The Monte Carlo comparisons use the CI
approach to hypothesis testing, similar to the approach adopted
by Biewen (2002) in studying the relative performance of
asymptotic and bootstrap approaches to inference with momentbased indices of inequality and poverty.
First, it is useful to define some notation and provide
a general outline of bootstrap resampling. As previously defined, Y ¼ fY i gNi¼1 represents the original sample drawn from
8
Downloaded by [Universitas Maritim Raja Ali Haji] at 01:06 12 January 2016
distribution F(y). Bootstrap computations are performed conditionally on Y. Let a random sample drawn with
replacement
of size N* from Y be denoted as Y ¼ fY i gNi¼1 ; and with this
random sample, we can calculate the Gini index I^ and the
associated asymptotic standard error s
^ . By repeatedly drawing
random samples from Y, say, B times, and calculating
fI^b ; s
^ b gBb¼1 , we can approximate the distribution of I^ in a
variety of ways. Through this process of ‘‘bootstrap resampling,’’ the distribution of the quantity of interest can be
simulated, and the approximations provide a method for conducting statistical inference. With this notation, we now present
the range of asymptotic and commonly-used bootstrap
approaches to inference based on the construction of CIs.
1. First-order asymptotic approximation to the sampling distribution of the Gini indices (ASE). As shown previously, generalized Gini indices are asymptotically normally distributed
with a variance that can be readily calculated using results for
their influence functions. The 100(1 – p)% CI is given by
h
p
p
i
I^ s
^ F1 1 ; I^ þ s
^ F1 1
;
2
2
where I^is the index, s
^ is the asymptotic standard error obtained
using the methods described previously, and F denotes the
standard normal CDF.
2. A simple application of the bootstrap is to substitute the
asymptotic standard error with the bootstrap standard error
(BSE). The corresponding CI is given by
h
p
p
i
I^ s
^ B F1 1 ; I^ þ s
^ B F1 1
;
2
2
where bootstrap resampling is used to obtain an estimate of the
sampling variation of the Gini index, with
!!2
B
B
X
1 X
1
2
I^
I^
s
^B ¼
B 1 b¼1 b
B k¼1 k
0
!2 1
B
B
X
1 @X
1
2
I^ A:
I^ B
¼
B 1 b¼1 b
B j¼1 j
This application of the bootstrap only requires the calculation
of the index with each resample (and not the asymptotic
standard error) and in general does not offer a refinement to the
asymptotic approximation.
3. The bootstrap-t (BOOT-T) approach is based on simulating the distribution of the ‘‘studentized’’ or t-test statistic
^ sb . This method combines the asymptotic vartb ¼ ðI^b IÞ=^
iance estimator derived using the influence functions with
bootstrap resampling. The CI for this implementation of the
bootstrap uses quantiles of the simulated distribution of the t
statistic and is given by
h
p
i
p
^
1
I^ s
^ G1
1
;
I
s
^
G
boott
boott
2
2
where Gboot–t is the simulated distribution of the pivotal
quantity tb*. By simulating the distribution of a pivotal statistic,
this method has superior theoretical properties to the asymptotic approach described previously, as well as to naive bootstrap methods, which we do not consider.
Journal of Business & Economic Statistics, January 2009
4. First-order asymptotic approximation to the sampling
distribution of the grouped Gini (GG) estimator. The GG
estimator, I^Gk , is asymptotically normal with associated
asymptotic standard error s
^ Gk based on k-population quantiles.
The corresponding 100(1 – p)% CI is given by
h
p
p
i
I^Gk s
^ Gk F1 1
^ Gk F1 1 ; I^Gk þ s
2
2
5. The iterated bootstrap CI is referred to as double bootstrap (DB). This method was applied to S-Gini and E-Gini
relative inequality indices in Xu (2000). The practical implementation of this method is discussed in Booth and Hall (1994)
and involves two levels of resampling (i.e., resampling from
a randomly drawn sample of the original sample). As shown
in Hall (1992), the second iteration allows one to obtain a
refined approximation to the distribution of statistics that are
not necessarily pivotal. Thus, Xu (2000) argues that one can
apply the method directly to the indexes so that it is no longer
necessary to estimate standard errors. The iterative nature of the
procedure can be quite costly in terms of computational time.
The simulations should shed light on the small sample performance of the different methods. In particular, we are interested in the relative performance of the two methods that allow
one to construct pivotal statistics (ASE and BSE) and in the
relative performance of the two methods that should provide a
refined approximation to the distributions of the statistics
(BOOT-T and DB). The simulations also enable one to gauge
the bias that one might expect from using grouped data (GG).
Any comparisons should also take into account computational
costs. In the case of the methods that involve bootstrapping,
this will clearly depend on the number of replications used, so
our simulations should shed light on the trade-off between cost
and accuracy. Based on our results, it does appear that for the
DB method to perform well one must use numbers of iterations
that make it the most computationally demanding.
4.2.
Monte Carlo Results
The various inference procedures for the generalized Gini
indices were used to construct CIs and test the hypotheses:
H0 : I ¼ I0
H 1 : I 6¼ I 0 ;
where I represents a specific generalized Gini index. The null is
rejected when the CI does not include I0. The experiments for
the inequality and welfare indices were conducted using sample sizes of 50 and 500 observations. The bootstrap procedures
are based on 200 resamples (and 100 second level resamples
for the double bootstrap) with 500 Monte Carlo iterations
performed. In the subsequent tables, the rejection rates of the
null hypothesis for tests with a nominal size of 5% are reported.
For the grouped Gini procedure, we consider aggregation of the
population into quintiles (k ¼ 5) and deciles (k ¼ 10). A range
of values of the inequality aversion parameter for the different
Gini classes of indices are considered to cover a broad range of
normative positions. Specifically, for the S-Gini inequality
indices, the parameter values of d ¼ (1.5, 2, 3, 5) are considered, and for the E-Gini inequality indices, the parameter
Downloaded by [Universitas Maritim Raja Ali Haji] at 01:06 12 January 2016
Barrett and Donald: Statistical Inference with Generalized Gini Indices
values of a ¼ (1, 2, 4, 6) are presented. For the Gini poverty
indices, the S-Gini parameter values of d ¼ (1, 2, 3, 5) and the
E-Gini parameter values of a ¼ (1, 2, 4, 6) are reported.
The design of each experiment attempts to mimic reality by
using a class of distributions with shapes similar to those that
have been found to work well in empirical studies of income
distributions. We used the lognormal distribution whereby the
data in each simulation was generated using Yi ¼ exp(sZ þ m),
where the Zi is a N(0, 1) random variable and (m, s) are parameters that are varied across the different experiments. The
value of the Gini indices under the null hypothesis were calculated by a drawing a random sample of 10 million observations.
Four different cases were considered, with the first two
examining the size properties of the inference procedures, and the
second two examining power properties. For case 1, the values of
the parameters were set to m1 ¼ 9.85 and s1 ¼ 0.6, which are the
moments of the lognormal estimated from the distribution of
gross individual-equivalent income using the March 1998 CPS.
In case 2, the parameter values are specified as m2 ¼ 6.4 and s2 ¼
0.5. Recent applied work on expenditure inequality has found
that the lognormal is a particularly good fit to the empirical
distribution of household expenditure data. These parameter
values correspond to the middle of the range of values reported
by Attanasio, Battistin, and Ichimura (2004) for the distribution
of per-capita nondurable expenditure based on the U.S. Consumer Expenditure Survey over the period 1984–1998.
In case 3, the Monte Carlo samples are generated using m3 ¼
9.85 and s3 ¼ 0.65 but with the false null values corresponding
to (m ¼ m1 ¼ 9.85, s ¼ s1 ¼ 0.60). The difference in the true
and null values approximates the observed change in the dispersion of U.S. gross individual equivalent income over 1980s.
For case 4, the parameters for the simulated data are set as m4 ¼
6.37 and s4 ¼ 0.48, while the false null values were calculated
assuming (m ¼ m2 ¼ 6.4 and s ¼ s2 ¼ 0.5). The difference
between the true and null values approximates the observed
change in the dispersion of real individual-equivalent nondurable consumption expenditure in the United States over
1980s and 1990s. In both cases 3 and 4, the rejection rates should
be well in excess of the nominal significance level for the tests.
For the poverty indices, the parameter values for the simulated data chosen for cases 1 and 3 were repeated, and the
poverty line was set at z ¼ 8,480, which corresponds to the U.S.
Census Bureau poverty threshold for 1998. With this specification, the poverty head-count ratio in the underlying population is 12.4%. Case 2 was conducted with m2 ¼ 6.4 and s2 ¼
1.0 and a poverty line set at z ¼ 300.9225, which is half the
median income of the population distribution. This experiment
attempts to mimic the per-capita expenditure distribution from
a developing country (Deaton 1997: p.157 for South Africa in
1993 based on World Bank Living Standard Measurement
Survey data). This specification generates a poverty head-count
ratio in the underlying distribution of 24.4%. Case 4 was
conducted with m ¼ 6.4 and s ¼ 0.95, which is comparable to
the magnitude of differences in the observed expenditure distribution between demographic groups in South Africa at a
point in time. The Monte Carlo experiments for the poverty
indices considered sample sizes of N ¼ 250 and N ¼ 1,000.
Table 1 reports the rejection rates for inference procedures
for the S-Gini and E–Gini relative indices of inequality, for
9
tests based on a nominal significance level of 5%. The pattern
of results for the Gini absolute inequality and welfare indices
are very similar to those found for the relative inequality indices.
Tables of results for the full set of inequality, welfare, and
poverty indices are available from the authors upon request.
In Table 1, the first panel reports the rejection rates for case 1
when the sample size is N ¼ 50, which is a very small sample in
the context of empirical studies of income distribution. The
actual size of almost all of the tests exceeded the nominal size
of 0.05. The actual size of each test procedure also tended to be
greater the lower the degree of inequality aversion of the
inequality index. Comparing across the methods, the extent of
over-rejection was greatest for the Grouped Gini procedures.
The poor performance of the GG procedure relative to the other
methods is due to the fact that the GG CI is centered around a
biased point estimate of generalized Gini index. The extent of
over-rejection for the GG procedure was greater the smaller the
number of population quantiles (k), reflecting greater aggregation bias. The ASE procedure performed reasonably well
given the small sample size, though it over-rejected more than
the bootstrap procedures. The performances of the BSE and DB
procedure were very similar, while the BOOT-T method clearly
exhibited superior size properties of all the methods examined.
This finding is consistent with theoretical results, which show
that the BOOT-T approach provides a refinement to the ASE
and other bootstrap procedures, which are not based on pivotal
statistics. When the sample size was increased to N ¼ 500, the
size properties of the consistent inference procedures improves
substantially. However, the GG procedure size characteristics
deteriorated due to the greater degree of aggregation bias (the
GG procedure with k ¼ 20 was also examined; while this
generally resulted in better size properties compared with GG
with k ¼ 10, the qualitative pattern of results and the ranking of
alternative inference procedures was not affected). The ASE
procedure performs well; however, the refinement provided by
the BOOT-T procedure resulted in superior size characteristics.
The case 2 experiment attempts to mimic the distribution of
consumption expenditure, and as a consequence, the degree of
inequality in the underlying population distribution is less than
that considered in case 1, which mimics the income distribution. The pattern of results for case 2 is very similar to that
found for case 1, except that the extent of over-rejection is
marginally less for all procedures. With the small sample size
of N ¼ 50, the ASE procedure performs well, the BSE and DB
perform somewhat better, and the BOOT-T is the most superior.
With the larger N ¼ 500 sample, the size performance of the
consistent inference procedures improved, and the overall
superiority of the BOOT-T procedure remains apparent. As
in case 1, the (inconsistent) GG procedure exhibits very poor
size characteristics. Overall, the Monte Carlo experiments of
cases 1 and 2 provide encouraging results for ASE procedure,
while the BOOT-T procedure has the most desirable size
characteristics.
The next series of experiments are designed to provide a
comparison of the power characteristics of the different inference procedures. The procedures were not corrected for size
distortions. In case 3 with N ¼ 50, all the procedures apart from
DB reject the false null at a rate greater than the nominal size of
the test. The power of the DB procedure improves when the
10
Journal of Business & Economic Statistics, January 2009
Table 1. Monte Carlo rejection rates for generalized Gini relative inequality indices, nominal size ¼ 0.05
N ¼ 50
Downloaded by [Universitas Maritim Raja Ali Haji] at 01:06 12 January 2016
ASE
BSE
BOOT-T
Case 1—size comparisons
S-Gini relative inequality indices
d
1.5
0.162
0.138
2
0.126
0.098
3
0.084
0.066
5
0.086
0.072
E-Gini relative inequality indices
a
1
0.126
0.098
2
0.104
0.078
4
0.092
0.070
6
0.086
0.070
Case 2—size comparison
S-Gini relative inequality indices
d
1.5
0.148
0.120
2
0.106
0.082
3
0.084
0.068
5
0.086
0.074
E-Gini Relative Inequality Indices
a
1
0.100
0.082
2
0.092
0.072
4
0.088
0.068
6
0.084
0.066
Case 3—power comparisons
S-Gini relative inequality indices
d
1.5
0.110
0.090
2
0.088
0.090
3
0.090
0.096
5
0.112
0.094
E-Gini relative inequality indices
a
1
0.088
0.090
2
0.080
0.082
4
0.076
0.078
6
0.076
0.078
Case 4—Power comparisons
S-Gini relative inequality indices
d
1.5
0.214
0.206
2
0.158
0.150
3
0.124
0.108
5
0.124
0.102
E-Gini relative inequality indices
a
1
0.158
0.150
2
0.150
0.138
4
0.138
0.124
6
0.130
0.116
N ¼ 500
DB
GG(10)
GG(5)
ASE
BSE
BOOT-T
DB
GG(10)
GG(5)
0.086
0.054
0.046
0.054
0.136
0.098
0.066
0.054
0.188
0.134
0.094
0.100
0.386
0.230
0.140
0.150
0.070
0.066
0.060
0.054
0.066
0.050
0.038
0.038
0.060
0.056
0.050
0.040
0.048
0.038
0.034
0.042
0.242
0.114
0.074
0.094
0.932
0.590
0.306
0.424
0.054
0.050
0.048
0.048
0.098
0.086
0.084
0.086
0.134
0.106
0.088
0.088
0.230
0.104
0.094
0.088
0.066
0.064
0.064
0.062
0.050
0.044
0.040
0.038
0.056