
Journal of Business & Economic Statistics

ISSN: 0735-0015 (Print) 1537-2707 (Online) Journal homepage: http://www.tandfonline.com/loi/ubes20

Calculating Comparable Statistics From Incomparable Surveys, With an Application to Poverty in India

Alessandro Tarozzi

To cite this article: Alessandro Tarozzi (2007) Calculating Comparable Statistics From Incomparable Surveys, With an Application to Poverty in India, Journal of Business & Economic Statistics, 25:3, 314-336, DOI: 10.1198/073500106000000233

To link to this article: http://dx.doi.org/10.1198/073500106000000233

Published online: 01 Jan 2012.




Alessandro TAROZZI

Department of Economics, Duke University, Durham, NC 27708 (taroz@econ.duke.edu)

Applied economists are often interested in studying trends in economic indicators such as inequality or poverty; however, comparisons over time can be made impossible by changes in data collection methodology. We describe an easily implemented procedure, based on inverse probability weighting, that allows recovery of comparability of estimated parameters identified implicitly by a moment condition. The validity of the procedure requires the existence of a set of auxiliary variables whose reports are not affected by the different survey design and whose relationship with the main variable of interest is stable over time. We analyze the asymptotic properties of the estimator when data belong to a stratified and clustered survey. The main empirical motivation of the article is provided by a recent controversy regarding the extent of poverty reduction in India in the 1990s. Due to changes in the expenditure questionnaire adopted for data collection in the 1999–2000 round of the Indian National Sample Survey, poverty is likely to be understated relative to previous rounds. We use previous waves of the same survey to provide evidence supporting the plausibility of the identifying assumptions and conclude that most, but not all, of the very large reduction in poverty implied by the official figures appears to be real, not a statistical artifact.

KEY WORDS: India; Inequality; Method of moments; Missing data; Poverty; Survey methods.

1. INTRODUCTION

Applied economists and policy makers are often interested in studying changes over time in important economic indicators, such as inequality, poverty, and consumption measures. Such indicators are routinely calculated using data from household surveys. But whereas comparisons over time are meaningful only insofar as the necessary data are collected consistently across different survey rounds, statistical agencies often introduce questionnaire changes that can raise doubts about their comparability. In fact, the survey literature convincingly shows that questionnaire revisions can affect respondents’ reports in important ways (for an overview, see Deaton and Grosh 2000). For instance, retrospective reports on expenditure appear to be heavily influenced by the choice of recall period (say, 7 or 30 days before the interview), or by the level of disaggregation of the expenditure items whose consumption is being recorded. Hence, in calculating time series of statistics from multiple cross-sections, it is important to detect whether changes in questionnaire design lead to noncomparability issues. If such issues arise, then it is important to identify tools that can help in recovering comparability, or else changes in economic indicators may end up reflecting revisions in the survey rather than real changes in the economic environment. In this article we describe a simple procedure, based on inverse probability weighting, that under specific conditions can be used to recover comparability of estimated parameters that are identified implicitly by a moment condition. The validity of the procedure requires the existence of a set of auxiliary variables whose reports are not affected by the different survey design, and whose relationship with the main variable of interest is stable over time.

The central empirical motivation of this article is the calculation of poverty rates in India, which provides a recent and compelling example of the inconsistencies (and controversies) that may arise because of changes in questionnaire design. Because India still accounts for a large proportion of the world’s poor, the Indian “poverty numbers” are widely discussed by economists and policy makers both nationally and internationally, especially at the World Bank. Much discussion about Indian poverty centers around the head count ratios, calculated as the proportion of the population living in households with expenditure per head below a given poverty line. Expenditure data are collected by the National Sample Survey Organization (NSSO) approximately every 5 years in “quinquennial” large-scale household surveys. The evolution of Indian poverty acquired even greater relevance during the 1990s because of the intellectual and political debates about the consequences of a wide spectrum of liberalizing policy reforms that began in 1991 (see Sachs, Varshney, and Bajpai 1999 for an overview of the reasons and nature of the reforms). Unfortunately, reaching a consensus has been complicated considerably by changes in the survey methodology introduced in the latest quinquennial survey (the 55th NSS round), completed in 1999–2000. The official head counts show an impressive decline in poverty with respect to the previous quinquennial NSS round (the 50th), carried out in 1993–1994: from 37% of the population to 27% in rural areas, and from 33% to 24% in urban areas. However, several researchers have suggested that at least part of this decline is likely a statistical artifact associated with the change in the set of recall periods used in the 55th NSS round to retrieve expenditure data. (For a survey of the whole debate on the Indian poverty numbers, the interested reader is referred to Deaton and Kozel 2005.)

© 2007 American Statistical Association. Journal of Business & Economic Statistics, July 2007, Vol. 25, No. 3. DOI 10.1198/073500106000000233

The simple adjustment procedure described in this article can be easily adapted to estimate poverty counts for India in 1999–2000 that are comparable with those calculated with a uniform methodology from previous NSS rounds. The intuition underlying the adjustment is very simple. Let P(y < z) denote the head count poverty ratio, that is, the fraction of the population whose expenditure per head, y, remains below the poverty line z. Suppose that a change in data collection methodology makes the latest available data on y noncomparable with analogous data collected in previous periods. Suppose also that the probability of being poor conditional on a set of auxiliary (proxy) variables v is stable over time, and that the auxiliary variables are recorded in a consistent way across both the standard and the revised survey methodologies. Then, because by the law of iterated expectations P(y < z) = E[P(y < z | v)], a “comparable” head count for the revised survey can be estimated by merging information on the distribution of the proxy variables in the revised survey itself with information on the conditional probability obtained from a “standard” survey. In other words, identification requires the existence of a set of auxiliary variables whose reports are not affected by the change in survey design and whose ability to predict whether the respondent is poor or not is stable over time.
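The iterated-expectations argument can be illustrated with a minimal sketch. It assumes a single continuous proxy variable, estimates P(y < z | v) from the standard survey by simple cell means, and averages those cell means using the distribution of v in the revised survey. The function name, the use of quantile cells, and the simulated data are all illustrative assumptions, not the estimator analyzed in the article, and survey weights are ignored.

```python
import numpy as np

def adjusted_head_count(v_std, poor_std, v_rev, n_cells=50):
    """Estimate P(y < z) for the revised survey as E[P(y < z | v)].

    P(y < z | v) comes from the standard survey (cell means of the poverty
    indicator); the outer expectation uses the revised survey's v.
    """
    # Cell boundaries: interior quantiles of the pooled auxiliary variable.
    pooled = np.concatenate([v_std, v_rev])
    edges = np.quantile(pooled, np.linspace(0, 1, n_cells + 1)[1:-1])
    cell_std = np.searchsorted(edges, v_std)
    cell_rev = np.searchsorted(edges, v_rev)
    total = 0.0
    for c in range(n_cells):
        share_rev = np.mean(cell_rev == c)   # weight of cell c in revised survey
        if share_rev == 0:
            continue
        in_std = cell_std == c
        if not in_std.any():
            raise ValueError("cell with no standard-survey observations")
        total += share_rev * poor_std[in_std].mean()
    return total

# Simulated data for illustration only (not NSS data).
rng = np.random.default_rng(0)
n, z = 200_000, -0.5
v_std = rng.normal(0.0, 1.0, n)                       # proxy, standard survey
poor_std = (v_std + rng.normal(0.0, 0.5, n)) < z      # poverty indicator observed
v_rev = rng.normal(0.3, 1.0, n)                       # proxy, revised survey
truth = np.mean((v_rev + rng.normal(0.0, 0.5, n)) < z)  # unobserved in practice
estimate = adjusted_head_count(v_std, poor_std, v_rev)
print(round(estimate, 3), round(truth, 3))
```

In the simulation the conditional probability of being poor given v is the same in both samples by construction, which is exactly the stability assumption the argument relies on.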

An analogous identification strategy has been used by Deaton and Drèze (2002) and Deaton (2003), who calculated adjusted poverty counts for India in 1999–2000 using a unique proxy variable, that is, the expenditure in a list of miscellaneous items whose reports were collected using an unchanged recall period across all surveys. The use of a single auxiliary variable allowed those authors to easily calculate revised poverty counts using fully nonparametric methods. For urban areas, their adjusted estimates were very similar to the official figures, but they found that about one-third of the officially measured decline in poverty in rural areas was a statistical artifact. In the empirical section of this article we complement the results of Deaton and Drèze (2002) and Deaton (2003) along four dimensions. First, our simple parametric estimator allows for the inclusion of multiple auxiliary variables. Our results are similar to those of Deaton and Drèze (2002) and Deaton (2003), and provide further support for a large decline in poverty during the 1990s, even if the official poverty numbers do appear to overstate the extent of the reduction in poverty, especially in rural areas. Second, we use a set of recent experimental NSS expenditure surveys to present a battery of formal tests that provide indirect evidence in support of the validity of the assumptions required for the identification of comparable poverty estimates. These experimental surveys are part of smaller (“thin”) NSS rounds completed in the time interval between the two latest quinquennial rounds and have been carried out by assigning randomly either a standard or a revised questionnaire to respondents. Third, we use the thin rounds to analyze the performance of the proposed estimator. In each thin round, poverty can be estimated using either the standard questionnaires or our adjustment procedure to recover a “comparable” estimate from the revised questionnaires. If the estimator performs well, then the two poverty counts should differ only due to sampling error. Overall, the evidence suggests that our estimator is useful for recovering comparability. Finally, we calculate standard errors for the estimates, explicitly taking into account the stratified and clustered survey design of the NSS.

The application to head count poverty rates is just a special case of the methodology developed in the article. Our estimator can be adopted more generally to recover comparability of estimates of parameters that are implicitly identified by a moment condition, as long as appropriate auxiliary variables exist. We show that adjusted estimates can be obtained using a simple two-step estimator, with the first step a parametric estimate of the propensity score (in the sense of Rubin 1976), here interpreted as the probability that an observation belongs to the revised survey conditional on the auxiliary variables. We prove consistency and asymptotic normality of the estimator when data are from surveys whose designs involve stratification and clustering (as in the vast majority of household surveys), and we describe a consistent estimator for the variance.
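A rough sketch of a two-step estimator of this kind follows. It is one common form of inverse probability weighting and is not taken from the article's formal definition, which is not reproduced here. The first step fits a logit for the probability that a pooled observation belongs to the revised survey given the auxiliary variables. The second step reweights standard-survey observations by the estimated odds p/(1 - p), so that their proxy distribution matches that of the revised survey. Function names are hypothetical, the logit specification is an assumption, and sampling weights, stratification, and clustering are ignored.

```python
import numpy as np

def fit_logit(X, d, n_iter=25):
    """Logit coefficients by Newton-Raphson; X must include a constant column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (d - p)                             # score
        hess = (X * (p * (1.0 - p))[:, None]).T @ X      # information matrix
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def ipw_head_count(v_std, poor_std, v_rev):
    """Reweight standard-survey poverty indicators toward the revised survey."""
    v = np.concatenate([v_std, v_rev])
    d = np.concatenate([np.zeros(len(v_std)), np.ones(len(v_rev))])
    X = np.column_stack([np.ones(len(v)), v])
    beta = fit_logit(X, d)                               # step 1: propensity score
    p_std = 1.0 / (1.0 + np.exp(-X[: len(v_std)] @ beta))
    w = p_std / (1.0 - p_std)                            # step 2: odds weights
    return np.sum(w * poor_std) / np.sum(w)

# Simulated data for illustration only (not NSS data).
rng = np.random.default_rng(1)
n, z = 200_000, -0.5
v_std = rng.normal(0.0, 1.0, n)
poor_std = (v_std + rng.normal(0.0, 0.5, n)) < z
v_rev = rng.normal(0.3, 1.0, n)
truth = np.mean((v_rev + rng.normal(0.0, 0.5, n)) < z)   # unobserved in practice
ipw_estimate = ipw_head_count(v_std, poor_std, v_rev)
print(round(ipw_estimate, 3), round(truth, 3))
```

With two normal proxy distributions of equal variance, the log odds of belonging to the revised sample are linear in v, so the logit in this simulation is correctly specified; with real data that is an assumption to be checked.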

The strong relationship between interview responses and questionnaire design has been part of the survey literature for decades (see, e.g., Mahalanobis and Sen 1954; Neter and Waksberg 1964). However, the analysis of its relevance in economics is more recent. Browning, Crossley, and Weber (2003) and Battistin, Miniaci, and Weber (2003) discussed the use of recall versus diary expenditure data for the estimation of expenditure, income, and savings in different household surveys. Battistin (2003) showed how different data collection methodologies within the U.S. Consumer Expenditure Survey lead to very different conclusions when testing the permanent income hypothesis and when evaluating the evolution of inequality in consumption in the United States. This latter topic was also analyzed by Attanasio, Battistin, and Ichimura (2004). The consequences of changes in questionnaire design on the estimation of poverty and inequality in different countries have been studied by Gibson (1999), Gibson, Huang, and Rozelle (2001, 2003), Jolliffe (2001), and Lanjouw and Lanjouw (2001). Others have analyzed the effect of the design of expenditure surveys on the estimation of elasticities (Ghose and Bhattacharya 1995) and of economies of scale at the household level (Gibson 2002). However, none of these authors has proposed tools for recovering comparability over time of statistics calculated using surveys of different designs.

The statistics literature has proposed Bayesian techniques to deal with certain comparability issues arising from changes in data collection methodology. Such issues are treated as special cases of missing-data problems, because some relevant variables as they would have been measured under the old standards are not observed (for a recent survey on statistical analysis with missing data, see Little and Rubin 2002). Clogg, Rubin, Schenker, Schultz, and Weidman (1991) used multiple imputation (MI) to recalibrate industry and occupation codes in 1970 U.S. Census public-use samples to the different 1980 standard. MI (Rubin 1978) is a procedure that replaces each missing value with two or more values imputed based on a missing-data model. Point estimates and standard errors of parameters of interest are then calculated combining the estimates from each completed dataset. (For a discussion and an extended bibliography, see Rubin 1996.) Each imputation is based on random draws from the posterior distribution of the coefficients of logistic models that describe the probability, conditional on a set of proxy variables, that observations with a given 1970 code would have been recorded as belonging to a certain 1980 category if the new standards had been adopted. Identification is allowed by the presence of an auxiliary subsample in which both codes are recorded. The method of Clogg et al. (1991) is specifically designed to allow mapping categorical variables from an old classification system to a new one when the presence of categories with few or no observations causes maximum likelihood estimators to have existence or computational problems.



Although in the work of Clogg et al. (1991) the objects of interest are the population frequencies of categorical variables, the methodology that we propose in the present article allows more generally the estimation of parameters identified by a moment condition. Also, our method is easier to implement, as it does not require the use of multiple draws from a posterior distribution, and it allows the simple calculation of standard errors robust to the presence of complex survey design.

MI has also been used to bridge the transition from single to multiple race reporting in the U.S. Census, after the option of reporting one’s race using multiple categories (such as, say, black-Hispanic) was introduced in 1997 (Schenker and Parker 2003). Schenker (2003) calculated standard errors for the bridged estimates adapting a methodology developed by Schafer and Schenker (2000). Their methodology does not require the availability of MIs, and it can be used for estimators that, with no missing data, can be calculated as smooth functions of means. The standard errors are calculated using first-order approximations to MI with an infinite number of imputations. However, the results of Schafer and Schenker (2000) are derived under the assumption that observations are iid, and that the fraction of missing data is bounded away from one. Both conditions typically do not hold in situations such as the one considered in this article, where data come from household surveys with complex survey design and some of the variables of interest are never observed. Moreover, we do not require that the estimator be a smooth function of sample means, even if this assumption does hold in our empirical application.

The missing-data literature also includes several regression-based procedures in which the missing data are imputed as fitted values of a first-stage regression and then estimates from the resulting complete dataset are obtained with standard methods (see Little and Rubin 2002, chaps. 4 and 5, for an overview). But although imputation would be appropriate if the parameter of interest were, for instance, mean expenditure, inconsistent estimates would generally result if one were to estimate poverty or inequality measures. In fact, even if the first-stage regression model were perfectly specified (indeed, even if the regression coefficients were known), the second stage would estimate a feature of the distribution of the fitted values, not of expenditure itself. This would lead to understating inequality, whereas poverty would generally be under (over) estimated if the poverty line lay to the left (right) of the mode of the distribution.
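The point about fitted values can be checked in a small simulation. The sketch below is purely illustrative: it assumes a correctly specified linear first stage with normal errors, and all numbers are made up. The fitted values reproduce the mean but have a smaller variance than the outcome, so a head count computed from them is too low when the line lies below the mode and too high when it lies above.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
x = rng.normal(size=n)                                  # predictor
y = 1.0 + 0.8 * x + rng.normal(scale=0.6, size=n)       # outcome, e.g. log expenditure

# First-stage OLS regression and fitted-value "imputation".
slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x

def head_count(values, line):
    """Share of observations strictly below the line."""
    return np.mean(values < line)

low_line, high_line = 0.2, 1.8      # below and above the mode (about 1.0)
print("means:    ", y.mean(), y_hat.mean())
print("variances:", y.var(), y_hat.var())
print("low line: ", head_count(y, low_line), head_count(y_hat, low_line))
print("high line:", head_count(y, high_line), head_count(y_hat, high_line))
```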

The rest of the article is organized as follows. Section 2 describes in more detail the empirical problem that motivates this article. Section 3 delineates the general econometric framework and discusses the estimator and its asymptotic properties. A small Monte Carlo simulation analyzes the performance of the estimator. Section 3 also describes how the general setting specializes to the empirical application, which is covered in Section 4. Section 5 concludes, discussing possible alternative applications of the results developed in this article, particularly in regard to the literature on nonclassical measurement error in nonlinear models and on small-area statistics.

2. THE EMPIRICAL FRAMEWORK: POVERTY IN INDIA

In this section we provide a brief overview of the main issues involved in the estimation of poverty indexes in India and of the reasons and consequences of the noncomparability across surveys that represent the empirical motivation of our article. (For a more detailed account, see the collected papers in Deaton and Kozel 2005.)

For decades, the Planning Commission of the Government of India has regularly published “official” head count poverty ratios, calculated as the fraction of the population living in households with per-head consumption below a poverty line. The head counts are routinely presented separately for the rural and the urban “sectors.” The poverty lines have been estimated as the minimum monthly expenditure per head associated on average with a sector-specific minimum calorie intake, recommended by the Indian National Institute of Nutrition. Price changes are taken into account by inflating the lines with sector-specific price indexes: the Consumer Price Index for Agricultural Labourers (CPIAL) for rural areas and the Consumer Price Index for Industrial Workers (CPIIW) for the urban sector (for a more thorough discussion on the Indian poverty lines, see Government of India 1993 or Deaton and Tarozzi 2005, who also proposed alternative price indexes to measure inflation).
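Because the head count refers to individuals rather than households, each household enters the calculation in proportion to the number of people it represents. A minimal sketch, assuming household-level records with per capita expenditure, household size, and a sampling weight (the function and argument names are hypothetical, and the three example households are made up):

```python
import numpy as np

def head_count_ratio(pce, hh_size, weight, poverty_line):
    """Share of individuals living in households with pce below the line.

    Each household counts for weight * hh_size persons.
    """
    pce = np.asarray(pce, dtype=float)
    person_weight = np.asarray(weight, dtype=float) * np.asarray(hh_size, dtype=float)
    poor = pce < poverty_line
    return np.sum(person_weight * poor) / np.sum(person_weight)

# Three hypothetical households and an illustrative line of Rs 205.7 per head:
# the first (5 members) and third (3 members) fall below the line.
example = head_count_ratio(
    pce=[100.0, 300.0, 200.0], hh_size=[5, 2, 3], weight=[1.0, 1.0, 1.0],
    poverty_line=205.7,
)
print(example)
```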

The official head counts are calculated approximately every 5 years using expenditure data collected in large household surveys carried out by the Indian National Sample Survey Organization (NSSO). Each NSS “round” is completed over a 1-year period (from July to June of the following year) and includes responses from approximately 120,000 households sampled independently in each separate round; thus the NSS is not a true longitudinal panel. Each survey contains information on a wide spectrum of socioeconomic variables, but the largest section of the database comprises records of household consumption of a very detailed list of items.

Until the 50th round, carried out in 1993–1994, all NSS surveys adopted a 30-day recall period for all expenditure items. This choice of recall period is unusual; most statistical agencies use a shorter reporting period for items that are typically purchased frequently, like food, and a longer period for more infrequent expenditures, such as clothing, footwear, educational expenses, and durables. Several experimental studies find that expenditure reports for frequently purchased items are on average proportionally lower when the recall period becomes longer (Scott and Amenuvegbe 1990; Deaton and Grosh 2000). The unconventional choice of recall period adopted by the NSSO was a result of a small-scale early experimental study by Mahalanobis and Sen (1954), who found that reports based on a 7-day recall period for a list of staples were too high.

2.1 The Thin Rounds

The uniform 30-day recall period became more controversial in the early 1990s, especially after the first quinquennial NSS round of the decade (the 50th round, completed in 1993–1994) showed little poverty reduction with respect to the previous quinquennial round (the 43rd, 1987–1988), especially in rural areas. This result stood in seeming contrast with National Accounts figures showing rapid growth in consumption (but see Sen 2000).

To explore the consequences of a possible move toward more standard recall periods, the NSSO designed a series of experiments within the smaller (“thin”) NSS rounds that followed the 1993–1994 survey (rounds 51–54). Even if the thin rounds were not specifically designed for poverty monitoring (with doubts remaining as to the comparability of their sampling frames), each wave included an expenditure questionnaire as detailed as those adopted in the larger quinquennial rounds. In each thin round, the NSSO randomly assigned to all households in a given primary stage unit one of two questionnaire types. The first questionnaire type (Schedule 1) was the standard one, with a 30-day recall period for all items. The second type (Schedule 2) included instead a 7-day recall for food, beverages, and a few other items generally bought frequently, and a 365-day recall for durables, clothing, footwear, and some other low-frequency purchases. Even in Schedule 2, however, the 30-day recall was maintained for a list of items that included fuel and light, miscellaneous goods and services, rents and consumer taxes, and certain medical expenses. Table 1 summarizes the recall period used for each item category in all NSS rounds relevant for this article, that is, rounds 50–55.

Table 1. Recall periods by round and item category

Item category | 50th round (1993–1994) and previous (standard) | 51st–54th (“thin”) rounds (a), Schedule 1 (standard) | 51st–54th (“thin”) rounds (a), Schedule 2 (experimental) | 55th round (1999–2000)
Food and other high-frequency items (b) | 30 days | 30 days | 7 days | 7 and 30 days (e)
Miscellaneous items (c) | 30 days | 30 days | 30 days | 30 days
Durables and other low-frequency items (d) | 30 days | 30 days | 365 days | 365 days

(a) Only one of the two schedule types was randomly assigned to each sampled household.
(b) Includes food, beverages, tobacco, and intoxicants.
(c) Includes fuel and light, miscellaneous goods and services, rents and consumer taxes, and certain medical expenses.
(d) Includes footwear and clothing, durables, education, and institutional medical expenses.
(e) Each respondent was asked to report expenditure with both recall periods, and the responses were recorded into two parallel columns, printed next to each other in the questionnaire.

Because the schedule type was assigned completely at random, any systematic difference in estimates between the two subsamples can be attributed to the different questionnaires assigned to each. Consistent with previous findings in the survey literature, the results of the thin rounds showed significantly higher reported expenditure in food when the short 7-day recall period of Schedule 2 was used rather than the standard 30-day recall of Schedule 1. For most durables, the longer (1 year) recall period of Schedule 2 led instead to lower mean expenditure than the standard 30-day recall (Sen 2000; Deaton 2001). Overall, given the large fraction of the budget spent on average in food, the net effect in all rounds and in both sectors was a larger estimate of total per capita expenditure (pce hereinafter) when the experimental Schedule 2 was used. Table 2 contains summary statistics calculated from the thin rounds. The figures were calculated using only the major Indian states, which account for more than 95% of the total population: Andhra Pradesh, Assam, Bihar, Gujarat, Haryana (urban sector only), Jammu and Kashmir, Karnataka, Kerala, Madhya Pradesh, Maharashtra, Orissa, Punjab, Rajasthan, Tamil Nadu, Uttar Pradesh, West Bengal, and Delhi. In all surveys and in both sectors, average total pce was 10–20% higher for households in the experimental group. In all but one case, the differences were significant even using a 1% level. The only exception was the urban sector in the 54th round, where the null was rejected using a 5% level. Row 6 shows that, keeping the poverty line constant, one can “achieve” a 50% drop in poverty simply by changing the survey methodology.
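As a worked illustration of how the ratios and t-ratios in Table 2 relate to the reported means and standard errors, the sketch below uses the rural NSS 51 entries for mean total pce: 273.4 (standard error 13.2) under Schedule 1 and 310.5 (3.5) under Schedule 2. It assumes that the two subsamples can be treated as independent, which is an approximation on our part; small gaps relative to the published values are to be expected because the inputs are rounded.

```python
import math

def ratio_and_t(mean_s1, se_s1, mean_s2, se_s2):
    """Ratio S2/S1 and the t-ratio for H0: equal means, assuming independence."""
    ratio = mean_s2 / mean_s1
    t_ratio = abs(mean_s2 - mean_s1) / math.sqrt(se_s1 ** 2 + se_s2 ** 2)
    return ratio, t_ratio

# Rural sector, NSS 51, mean total pce (values as reported in Table 2).
ratio, t_ratio = ratio_and_t(273.4, 13.2, 310.5, 3.5)
print(f"ratio S2/S1 = {ratio:.2f}, t-ratio = {t_ratio:.2f}")
```

The result is close to the ratio of 1.14 and the t-ratio of 2.73 reported in the table for this cell.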

Several findings emerge from the figures in rows 3–5, which refer to expenditures in those items for which a 30-day recall was maintained in both schedule types (“30-day items” hereinafter). First, mean expenditure in 30-day items differed much less between the two schedules than mean total pce, as the ratio of means ranged from .93 to 1.09. In all but two cases the differences were not significant at standard levels. The two exceptions were the rural sector in round 51 and the urban sector in round 52, but even in these cases the null of equality of means cannot be rejected using a 1% significance level. Note also that even in these smaller surveys the sample sizes were considerable, so that the tests of equality of means have large power even when the alternative is close to the null. Overall, the evidence suggests that expenditure reports on 30-day items were only marginally affected by the different reports for food and durables in the two schedule types. Note also that expenditure in 30-day items is a good predictor of total pce, because it accounts for a large share of total budget. A quartic polynomial of (log) pce in 30-day items explains 50–60% of the total variation in (log) pce in rural areas and about 70–80% in urban areas. These two observations are important for the implementation of our methodology, because they provide some preliminary evidence in support of using expenditure in 30-day items as a proxy variable in calculating comparable poverty estimates for the NSS round 55. But a close examination of the figures in row 3 reveals that in the rural sector mean expenditure in 30-day items was systematically higher (even if only slightly) when computed using 30-day recall data, with the opposite pattern seen in urban areas. This empirical regularity is likely related to differences in consumption patterns and in household characteristics between the two sectors. Indeed, the survey literature shows that the cognitive processes adopted to remember expenditure in a given item are associated with the characteristics of both the item and the respondent (see part II of Deaton and Grosh 2000). This suggests that it might be important to include household characteristics among the auxiliary variables. Many of the household characteristics reported in the NSS, including household size, education, and land holdings, should also be useful predictors, and their reports are unlikely to be affected by changes in the recall periods adopted in an expenditure survey.
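The explanatory power of the 30-day items can be measured with an R² of the kind reported in Table 2, obtained from a regression of (log) total pce on a degree-four polynomial of (log) pce in 30-day items. The sketch below is unweighted and runs on simulated data, so it only illustrates the calculation; the statistics in the article use survey weights, and the function name is hypothetical.

```python
import numpy as np

def quartic_r2(log_pce_30day, log_pce_total):
    """R-squared from an OLS fit of log total pce on a quartic in log 30-day pce."""
    coefs = np.polyfit(log_pce_30day, log_pce_total, deg=4)
    fitted = np.polyval(coefs, log_pce_30day)
    ss_res = np.sum((log_pce_total - fitted) ** 2)
    ss_tot = np.sum((log_pce_total - np.mean(log_pce_total)) ** 2)
    return 1.0 - ss_res / ss_tot

# Simulated logs of expenditure, for illustration only.
rng = np.random.default_rng(3)
log_30 = rng.normal(4.0, 0.5, 20_000)
log_total = 1.5 + 0.8 * log_30 + rng.normal(0.0, 0.3, 20_000)
r2 = quartic_r2(log_30, log_total)
print(round(r2, 3))
```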



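Table 2. Summary statistics, Indian National Sample Survey, rounds 51–54

Columns in each panel give the sector and the schedule (questionnaire type) (a): S1 or S2. Standard errors are in parentheses. Ratios and t-ratios compare S2 with S1 within each sector.

NSS 51 (July 1994–June 1995)
Statistic | Rural S1 | Rural S2 | Urban S1 | Urban S2
1. Sample size (no. of households) | 13,606 | 13,415 | 9,283 | 9,214
2. Mean total pce | 273.4 (13.2) | 310.5 (3.5) | 462 (18.8) | 546 (20.1)
   Ratio S2/S1: rural 1.14, urban 1.18
   t-ratio (H0: S1 = S2): rural 2.73, urban 3.06
3. Mean pce in 30-day items | 54.4 (1.5) | 50.8 (.8) | 131.4 (7.8) | 133.4 (7.6)
   Ratio S2/S1: rural 0.93, urban 1.02
   t-ratio (H0: S1 = S2): rural 2.12, urban .18
4. Mean budget share of 30-day items (b) | 20.1 (.22) | 16.0 (.17) | 26.3 (.40) | 21.7 (.34)
5. R2, OLS regression of (ln) total pce on (ln) pce in 30-day items (c) | .615 | .533 | .760 | .787
6. Head count poverty ratio | 41.8 | 22.7 | 36.3 | 18.5

NSS 52 (July 1995–June 1996)
Statistic | Rural S1 | Rural S2 | Urban S1 | Urban S2
1. Sample size (no. of households) | 12,253 | 12,047 | 8,870 | 8,749
2. Mean total pce | 279 (3.9) | 328 (3.4) | 495 (10.8) | 560 (7.5)
   Ratio S2/S1: rural 1.18, urban 1.13
   t-ratio (H0: S1 = S2): rural 9.53, urban 4.93
3. Mean pce in 30-day items | 57.9 (2.5) | 56.6 (1.1) | 130.4 (2.3) | 138.0 (2.8)
   Ratio S2/S1: rural .98, urban 1.06
   t-ratio (H0: S1 = S2): rural .48, urban 2.10
4. Mean budget share of 30-day items (b) | 20.3 (.16) | 16.4 (.14) | 26.1 (.17) | 22.4 (.18)
5. R2, OLS regression of (ln) total pce on (ln) pce in 30-day items (c) | .567 | .585 | .703 | .742
6. Head count poverty ratio | 38.2 | 18.4 | 30.7 | 15.4

NSS 53 (January–December 1997)
Statistic | Rural S1 | Rural S2 | Urban S1 | Urban S2
1. Sample size (no. of households) | 12,313 | 9,214 | 16,418 | 10,555
2. Mean total pce | 294 (4.59) | 328 (4.53) | 469 (7.55) | 547.4 (8.50)
   Ratio S2/S1: rural 1.12, urban 1.17
   t-ratio (H0: S1 = S2): rural 5.29, urban 6.92
3. Mean pce in 30-day items | 61.9 (1.12) | 58.9 (1.31) | 133.1 (3.61) | 139.7 (3.37)
   Ratio S2/S1: rural .95, urban 1.05
   t-ratio (H0: S1 = S2): rural 1.74, urban 1.34
4. Mean budget share of 30-day items (b) | 21.1 (.23) | 17.3 (.22) | 27.3 (.21) | 23.4 (.21)
5. R2, OLS regression of (ln) total pce on (ln) pce in 30-day items (c) | .606 | .592 | .721 | .743
6. Head count poverty ratio | 35.7 | 21.1 | 33.1 | 17.5

NSS 54 (January–June 1998)
Statistic | Rural S1 | Rural S2 | Urban S1 | Urban S2
1. Sample size (no. of households) | 8,676 | 8,545 | 2,946 | 2,911
2. Mean total pce | 266 (3.0) | 316 (2.8) | 457 (14.4) | 521 (21.4)
   Ratio S2/S1: rural 1.19, urban 1.14
   t-ratio (H0: S1 = S2): rural 12.37, urban 2.46
3. Mean pce in 30-day items | 58.0 (.8) | 57.3 (.8) | 131.8 (4.7) | 143.2 (17.8)
   Ratio S2/S1: rural .99, urban 1.09
   t-ratio (H0: S1 = S2): rural .56, urban .62
4. Mean budget share of 30-day items (b) | 21.6 (.14) | 17.4 (.12) | 28.3 (.39) | 23.0 (.33)
5. R2, OLS regression of (ln) total pce on (ln) pce in 30-day items (c) | .612 | .606 | .732 | .761
6. Head count poverty ratio | 41.8 | 22.4 | 35.3 | 21.3

Source: Author’s computations from the NSS. Robust standard errors are in parentheses. All values are in 1993–1994 rupees. The deflators are state-specific CPIIW for urban sector and CPIAL for rural sector. For NSS 54, we use sector-specific deflators. Only the major Indian states are included. All statistics are weighted using inflation factors. The poverty counts are the proportion of individuals living in households where per capita expenditure (pce) is below the poverty line. The real poverty lines are the official ones for all India published by the Planning Commission for 1993–1994 (Rs 205.7 for the rural sector, and Rs 283.4 for the urban sector).

(a) S1 is the standard questionnaire, with a 30-day reference period for all items. S2 is the experimental questionnaire (see Table 1 for details).
(b) The mean budget shares are averages of household-specific ratios between expenditure in 30-day items and total expenditure.
(c) The R2 is calculated from a regression of (log) total monthly pce on a polynomial of degree four of (log) monthly pce in 30-day items.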



2.2 The 55th Round

The poverty estimates resulting from the four thin rounds did not help in reaching a consensus on poverty trends in India during the 1990s. In fact, the rapid GDP growth measured in the National Accounts was not reflected by poverty declines. The figures in row 6 of Table 2 show no apparent trend in poverty reduction during the period, irrespective of the schedule format. However, the thin rounds were not specifically designed as expenditure surveys. The relatively small samples, coupled with the choice of sampling frames more suited to the different main purposes of these surveys, caused many observers to view these poverty figures with some suspicion and to wait for the next quinquennial expenditure survey, the 55th wave of the NSS, which was completed between July 1999 and June 2000.

But the questionnaire format adopted in the 55th round was different from any other used in previous NSS surveys, combining both sets of recall periods used in the thin rounds (see Table 1). The new questionnaire asked all households to report expenditure in food and other frequently purchased items with both a 30-day and a 7-day recall, and adopted a 365-day recall for durables and other infrequent purchases. As in the thin rounds, a 30-day recall was maintained for the miscellaneous items with intermediate purchase frequency. Of the two sets of reports for high-frequency purchases, only the 30-day recall data were used by the Indian Planning Commission to calculate the official poverty counts. The results showed an impressive reduction in poverty in comparison to the early 1990s; in rural areas the head counts dropped from 37.2% in 1993–1994 to 27.1% 6 years later, whereas in urban areas the head counts dropped from 32.6% to 23.6% in the same period. But the changes in the questionnaire cast serious doubts on the comparability of the more recent figures with previous poverty estimates, especially when considering the results of the thin experimental rounds.

On the one hand, the thin rounds showed that reports on durables are on average lower when a 1-year recall period is used, so that the new questionnaire would overstate poverty. At the same time, more respondents reported some expenditure in durables, with the result that the corresponding distribution is much more spread out when the shorter recall period is used. Keeping the average report constant, this would cause the opposite result of lower poverty estimates when the new questionnaire is used. The two conflicting effects combine with the fact that durables typically account for a small share of the total budget, especially among poor households, making it unlikely that important comparability issues arise as a consequence. On the other hand, the new questionnaire recorded the two separate reports on food expenditure in two parallel columns printed next to one another. Thus this format could be expected to have prompted the respondents (or the interviewers) to reconcile the two different reports. So consumption of food reported with the traditional 30-day recall period would be disproportionately high (because the respondent would tend to avoid large discrepancies with the 7-day reports, which are typically higher) and/or the corresponding reports based on a 7-day recall would be disproportionately low (by a symmetric argument). The plausibility of this argument is strengthened by the fact that in the 55th round, average pce in food as estimated with a 7-day recall exceeded the corresponding figure calculated using the 30-day recall by about 6%, whereas in all of the thin rounds the gap was consistently above 30%. Because for most Indian households food accounts for a very large share of the total budget, these arguments lead to the expectation that the unadjusted (official) figures overstate total expenditure, and thus understate poverty. Consequently, a methodology to estimate poverty rates comparable with previous approaches is needed. The next section describes the general econometric problem and explains how the specific empirical application (which we develop fully in Sec. 4) fits into the general framework.

3. THE MODEL AND THE ESTIMATOR

The population sampled using a revised methodology is referred to as the target population, and the one sampled using a standard questionnaire as the auxiliary population. Target and auxiliary surveys are defined analogously. In the empirical application, NSS round 55 is the target survey, whereas we use different previous rounds as auxiliary surveys. Let $D$ be a binary variable equal to 1 when an observation is drawn from the target population. Throughout the article, bold type denotes vectors and matrices and the superscript “′” indicates transposition. All vectors are defined in column form.

The researcher is interested in estimating a parameter $\phi_0$ in a target population, where $\phi_0$ satisfies the following population moment condition:

$$E[n\,g(y;\phi_0)\mid D=1]=0, \tag{1}$$

where $g(\cdot)$ is a moment function, $n$ is household size, and $y$ is the main variable of interest as measured in a standard questionnaire. In poverty or inequality measurement, $y$ typically measures expenditure or income per head. The population moment condition (1) refers explicitly to the common situation in which the parameter of interest is defined in terms of individuals but data are sampled at the household level. If the sampling unit is the same as the unit in terms of which $\phi_0$ is defined, then all of the results that follow can be obtained as a straightforward special case with $n=1$. In (1) we abstract from issues of intrahousehold allocation of resources, so that each individual within a household is treated equally. (See Deaton 1997, chap. 4, for an overview of the issues involved in welfare evaluation when household scale economies and equivalence scales are taken into account.) The moment condition (1) encompasses a broad set of commonly used poverty and inequality measures. (For an introduction to the theory and practice of poverty measurement, see Deaton 1997, chap. 3, or Ravallion 1992.) For example, if $\phi_0$ represents a Foster–Greer–Thorbecke poverty index and $z$ is a fixed poverty line, then $g(y;\phi_0)=1(y<z)(1-y/z)^{\alpha}-\phi_0$, where $\alpha\ge 0$, and $1(E)$ is an indicator equal to 1 when event $E$ is true. When $\alpha=0$, the index becomes the head count poverty ratio, whereas $\alpha=1$ characterizes the poverty gap ratio. A higher parameter $\alpha$ indicates that large poverty gaps $(1-y/z)$ are given a larger weight in the calculation, so that the poverty index becomes more sensitive to the distribution of $y$ among the poor. Equation (1) also identifies well-known inequality measures, such as the Variance of the Logarithms, if $g(y;\phi_0)=[(\ln y-\phi_{02})^2-\phi_{01}\quad \ln y-\phi_{02}]'$, where $\phi_0=[\phi_{01}\ \ \phi_{02}]'$, or the Theil index, with

$$g(y;\phi_0)=\left[\frac{y}{\phi_{02}}\log\frac{y}{\phi_{02}}-\phi_{01}\quad y-\phi_{02}\right]'.$$
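As an illustration of how moment condition (1) translates into a computation, the following minimal Python sketch solves its sample analog for the Foster–Greer–Thorbecke index. The function name `fgt_index`, its arguments, and the toy numbers are hypothetical and are not taken from the article; households are weighted by size (and by an optional sampling weight) so that the index refers to individuals.

```python
def fgt_index(pce, hh_size, z, alpha=0.0, weights=None):
    """Solve sum_h w_h * n_h * (1(y_h < z) * (1 - y_h/z)**alpha - phi) = 0 for phi.

    pce      per capita expenditure y of each household
    hh_size  household size n
    z        poverty line
    alpha    FGT parameter (0: head count ratio, 1: poverty gap ratio)
    weights  optional sampling weights (assumed equal to 1 if omitted)
    """
    if weights is None:
        weights = [1.0] * len(pce)
    num = 0.0
    den = 0.0
    for y, n, w in zip(pce, hh_size, weights):
        # The gap term is zero for households at or above the poverty line.
        gap = (1.0 - y / z) ** alpha if y < z else 0.0
        num += w * n * gap
        den += w * n
    return num / den


# Toy data: four households, poverty line z = 200.
pce = [100.0, 150.0, 250.0, 300.0]
hh_size = [5, 4, 3, 2]
head_count = fgt_index(pce, hh_size, z=200.0, alpha=0.0)   # 9 of 14 individuals
poverty_gap = fgt_index(pce, hh_size, z=200.0, alpha=1.0)  # (5*0.5 + 4*0.25) / 14
```

Because the moment function is linear in $\phi_0$, the solution is simply a weighted average; for measures such as the Theil index the same logic applies component by component.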




3.1 Identification

Estimation of the parameter $\phi_0$ through the sample analog of (1) is clearly infeasible if the target survey does not include data on $y$. This is precisely the case if the survey questionnaire changed in such a way that the respondents’ reports are no longer comparable with those from previous surveys, so that the researcher observes only a different variable, $\tilde y$, but not $y$. In our empirical setting, $y$ is total expenditure per head when a 30-day recall is used for all items, and $\tilde y$ is the expenditure observed when a revised questionnaire is adopted.

Let $v$ denote a set of auxiliary variables, as recorded by a standard methodology, and let $\tilde v$ denote the same variables as measured using a revised methodology. The set $v$ will include variables that can be used as proxies for the unobserved $y$. Each observation is then characterized by the set of variables $(y,\tilde y,v,\tilde v,D)$, but the econometrician observes only either $(y,v)$, when $D=0$, or $(\tilde y,\tilde v)$, if $D=1$. This makes clear that the parameter $\phi_0$ in (1) is not identified by the sampling process without further assumptions. The following proposition formally describes the fundamental conditions for identification that we assume throughout the article.

Proposition 1. Suppose that there exist a set of auxiliary variables, $v$, that include household size $n$ and are distributed according to $dP(v)$, and assume that the following conditions hold: (A1) $dP(\tilde v\mid D=1)=dP(v\mid D=1)$ a.s.; (A2) $E[g(y;\phi_0)\mid v,D=1]=E[g(y;\phi_0)\mid v,D=0]$ a.s.; (A3) $dP(v\mid D=1)$ is absolutely continuous with respect to $dP(v\mid D=0)$; and (A4) $\operatorname{supp}(v\mid D=1)\subseteq\operatorname{supp}(v\mid D=0)$. Then $\phi_0$ satisfies the following modified population moment condition:

$$E[n\,R(v)\,g(y;\phi_0)\mid D=0]=0, \tag{2}$$

where $R(v)$ is the reweighting function, defined as

$$R(v)=\frac{dP(v\mid D=1)}{dP(v\mid D=0)}=\frac{P(D=1\mid v)\,P(D=0)}{P(D=0\mid v)\,P(D=1)}. \tag{3}$$

Moreover, $R(v)$ is nonparametrically identified by the sampling process.

For the proof see Appendix A.
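The second equality in the definition of the reweighting function follows from Bayes's rule applied to each conditional distribution of $v$. The short derivation below is a sketch added for convenience; it uses only the symbols already defined in Proposition 1.

```latex
\begin{aligned}
dP(v \mid D=d) &= \frac{P(D=d \mid v)\, dP(v)}{P(D=d)}, \qquad d \in \{0,1\},\\[4pt]
R(v) = \frac{dP(v \mid D=1)}{dP(v \mid D=0)}
  &= \frac{P(D=1 \mid v)\, dP(v) / P(D=1)}{P(D=0 \mid v)\, dP(v) / P(D=0)}
   = \frac{P(D=1 \mid v)\, P(D=0)}{P(D=0 \mid v)\, P(D=1)} .
\end{aligned}
```

The marginal $dP(v)$ cancels, so only the conditional probability $P(D=1\mid v)$ and the population shares are needed.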

The last two assumptions, (A3) and (A4), ensure that the reweighting function $R(v)$ exists and is bounded for each value of $v$. [Note that even if (A4) does not hold, one can still estimate bounds for the parameter of interest, treating observations with $v$ outside the common support as missing values, using the setting described in Horowitz and Manski 1995.] Assumption (A1) requires the existence of a set of proxy variables $v$ whose marginal distribution is identified by the sampling process in both the auxiliary and target populations. In other words, $v$ should include variables whose distribution of reports is left unaffected by the change in survey design (note, however, that we do not require that $\tilde v=v$). Variables that satisfy (A1) are likely to be available in most empirical settings, as questionnaire revisions generally leave several questions unchanged. In our empirical application, several household characteristics (such as household size, education, or main economic activity) should be good candidates for inclusion in $v$. Similarly, (A1) may be satisfied by reported expenditure in items for which the 30-day recall was retained in all questionnaire types. Assumption (A2) requires that the conditional expectation of the function $g(y;\phi_0)$ is the same in the target and the auxiliary surveys. This is clearly a crucial and substantive assumption whose credibility depends on the empirical context and should always be carefully scrutinized. When $\phi_0$ is a poverty count, (A2) amounts to assuming that the fraction of households to be counted as poor conditional on $v$ remains constant across the two surveys. In Section 4 we devote considerable space to probe the plausibility of both (A1) and (A2) in our specific empirical setting.

The reweighting function $R(v)$ in (3) transforms the conditional expectation $E[n\,g(y;\phi_0)\mid v]$ from the auxiliary survey into the unconditional expectation in the target survey, down-weighting (up-weighting) households whose auxiliary variables have a relatively high (low) density in the auxiliary survey. The conditional probability, $P(D=1\mid v)$, in (3) can be interpreted as the probability that a household belongs to the target population conditional on observing $v$, if the household is sampled from a population that encompasses both the target and auxiliary populations. The other probabilities are defined accordingly. Note that the function $R(v)$ is identified by the sampling process even if the researcher only observes $\hat v=D\tilde v+(1-D)v$. Indeed, in Appendix A we prove that the assumptions of Proposition 1 are sufficient to ensure that $P(D=1\mid\hat v)=P(D=1\mid v)$ a.s.
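The mechanics of the reweighting function can be seen in a small discrete example. The Python sketch below is illustrative only: the function name `reweighting_function`, the single binary auxiliary variable, and the toy data are assumptions made for the example, and household size is set to 1 for every observation to keep the arithmetic transparent. With a discrete $v$, the probabilities in (3) are simply cell frequencies in the pooled sample.

```python
from collections import Counter


def reweighting_function(aux_v, target_v):
    """Return {cell: R(cell)} with R(v) = P(D=1|v) P(D=0) / (P(D=0|v) P(D=1)).

    aux_v     values of a discrete auxiliary variable in the auxiliary sample (D=0)
    target_v  values of the same variable in the target sample (D=1)
    """
    n_aux, n_tgt = len(aux_v), len(target_v)
    p_d1 = n_tgt / (n_aux + n_tgt)
    p_d0 = 1.0 - p_d1
    aux_counts, tgt_counts = Counter(aux_v), Counter(target_v)
    r = {}
    for cell in aux_counts:  # only cells observed in the auxiliary sample
        pooled = aux_counts[cell] + tgt_counts[cell]
        p_d1_given_v = tgt_counts[cell] / pooled
        p_d0_given_v = aux_counts[cell] / pooled
        r[cell] = (p_d1_given_v * p_d0) / (p_d0_given_v * p_d1)
    return r


# Toy data: v = 0 is common in the auxiliary sample but rare in the target sample.
aux_v = [0, 0, 0, 1]
aux_poor = [1, 1, 0, 0]      # poverty indicator, observed only in the auxiliary sample
target_v = [0, 1, 1, 1]

r = reweighting_function(aux_v, target_v)
weights = [r[v] for v in aux_v]
# Reweighted head count: households with v = 0 are down-weighted, v = 1 up-weighted.
phi_hat = sum(w * g for w, g in zip(weights, aux_poor)) / sum(weights)
```

In this example two thirds of the auxiliary households with $v=0$ are poor and none with $v=1$, while only one quarter of target households have $v=0$, so the reweighted head count equals $0.25\times 2/3=1/6$, compared with an unweighted auxiliary head count of $1/2$.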

This form of inverse probability weighting (IPW) has been used extensively in several settings in statistics and econometrics. (For a textbook treatment of IPW, see Wooldridge 2001.) Horvitz and Thompson (1952) introduced weighting to account for the nonconstant probability of selection of different observations within a sample (see also Wooldridge 2002a). Several contributions in the program evaluation literature use IPW-based estimators for the estimation of mean treatment effects, under the assumption that the propensity score (that is, the probability that an individual participates in a program conditional on observed covariates) is the same for treated and untreated individuals (see, e.g., Hahn 1998; Heckman, Ichimura, and Todd 1998; Hirano, Imbens, and Ridder 2003; Abadie 2005). Several authors have used IPW for estimation with missing data, under the assumption that the probability of an observation being missing depends only on some observed covariates (see, e.g., Robins, Rotnitzky, and Zhao 1994, 1995; Wooldridge 2002b). Chen, Hong, and Tarozzi (2005b) studied IPW estimators in nonlinear models of nonclassical measurement error when auxiliary data are available.

In some cases, a researcher may be interested in recovering an estimate of the whole distribution of a variable $y$ (e.g., income or expenditure per head) that is comparable with estimates obtained before a questionnaire change took place. It is relatively straightforward to describe sufficient conditions analogous to those in Proposition 1 for identifying the density $f(y\mid D=1)$. Clearly, $f(y\mid D=1)$ would also identify poverty and inequality measures based on $y$. As in moment condition (1), we proceed under the assumption that the object of interest is the density of a per capita quantity $y$, but the data are collected at the household level. If everyone within the household is assumed to be treated equally, then the individual-based population density of $y$, denoted by $f_n(y\mid D=1)$, is described by the following expression:

$$f_n(y\mid D=1)=\frac{E[n\,f(y\mid n,D=1)]}{E[n\mid D=1]},$$

where $f(y\mid n,D=1)$ is the density defined over households [it is easy to check that $f_n(y\mid D=1)$ actually integrates to 1]. The following proposition formalizes conditions for identification.

Proposition 2. Suppose that there exist a set of auxiliary variables, $v$, including household size $n$, distributed according to $dP(v)$, such that (A1), (A3), and (A4) hold. Let $v_n$ be a vector of observed variables including all of the variables in $v$ except $n$. Suppose also that (A2b) $f(y\mid v,D=1)=f(y\mid v,D=0)$. Then

$$f(y\mid n,D=1)=f(y\mid n,D=0)\,E[R(v_n)\mid y,n,D=0], \tag{4}$$

where the reweighting function is now defined as

$$R(v_n)=\frac{P(D=1\mid v)\,P(D=0\mid n)}{P(D=0\mid v)\,P(D=1\mid n)}.$$

For the proof see Appendix A.

We do not proceed to analyze the estimation of (4), because we are interested mainly in estimating the parameters identified by a moment condition such as that described in (1). (For an example of reweighting in the context of density estimation see DiNardo, Fortin and Lemieux 1996.)

3.2 Estimation

The modified moment condition (2) suggests that the parameter of interest, $\phi_0$, can be estimated using a two-step procedure. In the first step, the unknown probabilities $P(D=1\mid v)$ and $P(D=0)$ are estimated [indeed, the estimation may be further simplified noting that $\phi_0$ is identified even if $P(D=0)/P(D=1)$ is dropped from the modified moment condition (2)]. In the second step, $\hat\phi_0$ is calculated as the solution of the sample analog of (2), after replacing $R(v)$ with its estimate from the first step. In what follows, we proceed assuming that the sample is drawn from a “superpopulation” that encompasses both the target and the auxiliary population. Then $P(D=0)$ denotes the proportion of households belonging to the auxiliary population, whereas $P(D=1\mid v)$ represents the probability that a household with covariates $v$ sampled at random from the superpopulation belongs to the target population. We emphasize that both probabilities refer to the distribution of households, not individuals, and so must be estimated without inflating observations by household size.

The unconditional probability $P(D=0)$, which we denote by $\theta_{00}$, can be easily estimated as the fraction of households in the sample that belongs to the auxiliary population. For estimating $P(D=1\mid v)$, we assume that the conditional probability is correctly described by a known parametric model, so that $P(D=1\mid v)=P(D=1\mid v;\theta_{10})$, where $\theta_{10}$ is a vector of parameters estimated with maximum likelihood. In the empirical application, we model the conditional probability using a logit model. An analogous strategy was adopted by Wooldridge (2002b), Robins et al. (1995), and Abadie (2005). The latter also analyzes nonparametric first-step estimators, as done by Hahn (1998), Hirano et al. (2003), or Chen et al. (2005b). A flexible functional form can be achieved using polynomials, and in any case we are interested only in obtaining good predictions for the conditional probabilities, and the parameters estimated in the binary variable model will be of little or no intrinsic interest. Moreover, it is well known that the choice of functional form in a binary dependent variable model rarely has important consequences for the predicted probabilities (see, e.g., Amemiya 1985, chap. 9).
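The two-step procedure can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the article's implementation: the names `fit_logit` and `two_step_headcount` are hypothetical, the logit has an intercept and a single covariate fitted by Newton-Raphson, sampling weights are omitted, and the constant $P(D=0)/P(D=1)$ is dropped because it cancels in the normalized second step. As the text requires, the first step is run on households without inflating by household size; household size enters only in the second step.

```python
import math


def fit_logit(x, d, iters=50):
    """First step: ML logit for P(D=1|x) = 1 / (1 + exp(-(a + b*x))) via Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, di in zip(x, d):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            g0 += di - p                 # score with respect to the intercept
            g1 += (di - p) * xi          # score with respect to the slope
            w = p * (1.0 - p)
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det  # Newton step: theta += H^{-1} * score
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def two_step_headcount(aux, target_x, z):
    """Second step: reweighted head count from the auxiliary survey.

    aux       list of (x, y, hh_size) tuples from the auxiliary survey (D=0)
    target_x  list of x values from the target survey (D=1)
    z         poverty line
    """
    xs = [x for x, _, _ in aux] + list(target_x)
    ds = [0] * len(aux) + [1] * len(target_x)
    a, b = fit_logit(xs, ds)
    num = den = 0.0
    for x, y, n in aux:
        p = 1.0 / (1.0 + math.exp(-(a + b * x)))
        r = p / (1.0 - p)  # odds P(D=1|x)/P(D=0|x); the constant factor cancels below
        num += n * r * (1.0 if y < z else 0.0)
        den += n * r
    return num / den


# Toy data: (x, per capita expenditure, household size) for the auxiliary survey.
aux = [(0, 150.0, 4), (0, 180.0, 6), (0, 260.0, 2), (1, 320.0, 3)]
target_x = [0, 1, 1, 1]
phi_hat = two_step_headcount(aux, target_x, z=200.0)
```

In this toy example the logit is saturated, so the fitted probabilities reproduce the cell frequencies (0.25 and 0.75) and the estimate equals $10/39$. In an application the index $a+bx$ would be replaced by a richer specification, for example one including a polynomial in household size.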

If the observations are a simple random sample, then standard errors can be estimated using standard asymptotic theory for two-step method-of-moments estimators (see Newey and McFadden 1994). But virtually all widely used household surveys (including the Indian NSS, the World Bank’s LSMS, the CPS and the PSID in the United States, or the Demographic and Health Surveys) adopt a stratified and clustered design, making the assumption of iid observations untenable. (For an overview of the issues involved in estimation and inference in multistage surveys see Deaton 1997, chap. 1.) In stratified and clustered surveys, the population is first divided into a fixed number of strata, usually defined following geographical and/or socioeconomic criteria. Then a predetermined number of clusters (typically villages or urban blocks) are sampled independently from each stratum. Finally, households are selected independently within each cluster or, as in the NSS, from separate second-stage strata created within clusters. The use of stratification in survey design typically leads to lower standard errors, because, by construction, all possible samples become more similar to each other as a fixed proportion of observations are selected from different areas. Instead, clustering frequently leads to standard errors that are considerably higher than those calculated assuming simple random sampling. This is a consequence of the positive correlation that is common for variables recorded in the same cluster. In most cases, the net effect of clustering and stratification is an increase in standard errors, so that ignoring the multistage design of a survey can lead to seriously misleading inference. Bhattacharya (2005) showed that this is indeed the case when one calculates standard errors for inequality measures in a given cross-section of the Indian NSS. In addition, because in most surveys the sampling scheme is such that the ex ante probability of selection is not the same for each household, consistent estimation of population parameters requires the use of sampling weights.
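To make the role of clustering concrete, the following Python sketch computes a stratified, cluster-robust standard error for a weighted mean using the usual linearization formula. It is a simplified stand-in, not the variance estimator derived in the article: the function name `cluster_se` and the record layout are hypothetical, and the formula ignores the sampling error of any first-step estimates, which the article's asymptotic results take into account.

```python
import math
from collections import defaultdict


def cluster_se(records, phi):
    """Linearization standard error of phi = sum(w*n*g) / sum(w*n).

    records  iterable of (stratum, cluster, weight, hh_size, g) tuples
    phi      the point estimate computed from the same records
    """
    totals = defaultdict(float)   # cluster-level sums of the estimating function
    denom = 0.0
    for s, c, w, n, g in records:
        totals[(s, c)] += w * n * (g - phi)
        denom += w * n
    by_stratum = defaultdict(list)
    for (s, _), t in totals.items():
        by_stratum[s].append(t)
    var = 0.0
    for ts in by_stratum.values():
        k = len(ts)
        if k < 2:
            continue  # assumption: a single-cluster stratum contributes no variance
        mean = sum(ts) / k
        # Between-cluster variation within each stratum, with the usual k/(k-1) factor.
        var += k / (k - 1) * sum((t - mean) ** 2 for t in ts)
    return math.sqrt(var) / denom


# Toy data: one stratum, two clusters with perfectly correlated outcomes within cluster.
records = [(1, "A", 1.0, 1, 1), (1, "A", 1.0, 1, 1),
           (1, "B", 1.0, 1, 0), (1, "B", 1.0, 1, 0)]
se = cluster_se(records, phi=0.5)
```

With outcomes identical within each cluster, the clustered standard error here is 0.5, twice the value $\sqrt{0.25/4}=0.25$ that a naive iid formula would give, which illustrates why ignoring the design can be misleading.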

We estimate standard errors that take into account the presence of a complex survey design, and we derive the asymptotic results letting the total number of clusters grow to infinity, keeping both the number of households selected in a cluster and the proportion of clusters selected in each stratum constant. This setting is appropriate for our purposes, because in the NSS the number of clusters is much higher than the number of households selected per cluster. Because we use observations sampled from two different databases, we need to be explicit about how the sampling from the two populations is done. Here we assume that the first-stage strata are the same across the two subpopulations. Related results have been given by Bhattacharya (2005), who studied asymptotic properties of generalized method-of-moments (GMM) estimators in presence of multistage surveys, in the standard situation in which all observations belong to the same population.

Let $\beta_0=[\phi_0\ \ \theta_{10}'\ \ \theta_{00}]'$ denote the vector containing the true value of the parameters to be estimated (including those estimated in the first step), and let $g(y;v,D;\beta_0)$ denote the set of moments identifying all such parameters. These moments also include the scores corresponding to the conditional likelihood of $D$ as a function of $\theta_{10}$.



Table 8. Adjusted poverty counts, 55th NSS round (1999–2000)

                                   y30 and          y30, household    y30, household
                     Sample        household        size, and basic   size, and all
Auxiliary survey     size          size^a           controls^b        controls^c        Unadjusted^d

Rural
50, 7/93–6/94        118,079       31.8 (.421)      32.6 (.462)       32.3 (.650)       28.4 (.40)
51, 7/94–6/95         73,488       30.7 (1.042)     32.2 (1.189)      32.5 (1.188)
52, 7/95–6/96         71,942       30.4 (.813)      32.4 (1.135)      32.1 (.964)
53, 1/97–12/97        72,327       30.6 (1.114)     31.1 (1.079)      31.2 (1.066)

Urban
50, 7/93–6/94         79,733       25.8 (.591)      26.6 (.684)       25.9 (.636)       24.5 (.58)
51, 7/94–6/95         50,173       24.8 (1.249)     27.0 (1.397)      27.1 (2.214)
52, 7/95–6/96         49,748       24.1 (.701)      25.2 (.749)       24.0 (.751)
53, 1/97–12/97        56,644       26.6 (.757)      27.4 (.772)       27.1 (.768)

Source: Author’s computation from NSS, rounds 50–53 and 55. Robust standard errors are in parentheses. The poverty lines are the official ones for the 50th round (205.67 for the rural sector and 283.44 for the urban sector). All monetary values from subsequent rounds are deflated using state and sector specific official Consumer Price Indexes (CPIAL for households living in rural areas and CPIIW for those living in urban areas). All estimates are computed for all major Indian states: Andhra Pradesh, Assam, Bihar, Gujarat, Haryana (urban only), Karnataka, Kerala, Madhya Pradesh, Maharashtra, Orissa, Punjab, Rajasthan, Tamil Nadu, Uttar Pradesh, West Bengal, and Delhi (urban only). The adjusted poverty counts are estimated using the estimator developed in the text, using a logit first step. In both columns, the logit first step also includes a polynomial in household size.

^a y30 is (log) per capita expenditure in miscellaneous items, in 1993–1994 rupees.

^b Basic controls are categorical variables for education of the household head, main economic activity of the household, whether the household belongs to a “scheduled caste or tribe,” and land ownership.

^c Includes all basic controls in column 2, as well as the following: age and gender of the household head, number of household members in different age groups by gender (under 5 years old, between 5 and 15, between 16 and 60, and over 60), categorical variables for main energy source for cooking and for lighting.

^d Unadjusted poverty counts, calculated only for larger Indian states (excluding Jammu & Kashmir) and using 30-day recall for food and other high frequency items.

The results in Table 8 are somewhat surprising. As discussed in Section 2.2, the fact that expenditure reports for food in 1999–2000 were being recorded using both a 30-day recall and a 7-day recall led to the expectation that the 30-day recall reports would be pulled up toward the 7-day reports (usually proportionally larger) and that the 7-day reports would be pulled down instead. This expectation was also consistent with the observation that the ratio between 7-day and 30-day reports for

Table 9. Propensity score diagnostics

                                          y30 and                    y30, household size,       y30, household size,
                                          household size             and basic controls         and all controls
                                          Interval      Obs.         Interval      Obs.         Interval      Obs.
                     Sample               1st–99th      with         1st–99th      with         1st–99th      with
Auxiliary survey     size        D        percentiles   R>100        percentiles   R>100        percentiles   R>100

Rural
50, 7/93–6/94        118,079     D=1      [.385 .659]   0            [.273 .810]   1            [.259 .857]   1
                                 D=0      [.380 .649]                [.243 .777]                [.230 .784]
51, 7/94–6/95         73,488     D=1      [.678 .923]   2            [.564 .941]   3            [.541 .944]   6
                                 D=0      [.685 .929]                [.678 .923]                [.585 .961]
52, 7/95–6/96         71,942     D=1      [.745 .932]   4            [.628 .941]   2            [.606 .946]   3
                                 D=0      [.745 .923]                [.667 .954]                [.650 .963]
53, 1/97–12/97        72,327     D=1      [.776 .895]   0            [.643 .923]   1            [.605 .930]   0
                                 D=0      [.775 .893]                [.686 .936]                [.669 .944]

Urban
50, 7/93–6/94         79,733     D=1      [.361 .737]   0            [.199 .823]   1            [.175 .853]   2
                                 D=0      [.346 .706]                [.159 .784]                [.117 .818]
51, 7/94–6/95         50,173     D=1      [.632 .898]   0            [.559 .943]   0            [.547 .947]   0
                                 D=0      [.592 .891]                [.466 .929]                [.433 .929]
52, 7/95–6/96         49,748     D=1      [.752 .930]   0            [.681 .946]   0            [.665 .947]   0
                                 D=0      [.751 .922]                [.637 .932]                [.597 .935]
53, 1/97–12/97        56,644     D=1      [.626 .867]   0            [.497 .879]   0            [.536 .902]   0
                                 D=0      [.617 .859]                [.546 .895]                [.459 .887]

Source: Author’s computation from NSS, rounds 50–53 and 55. For a description of the proxy variables included in each cell, see Table 8. Each interval reports the 1st and 99th percentiles of the distribution of the estimated propensity scores, defined here as the conditional probability that an observation belongs to the target survey (the 55th round) given the auxiliary variables. The intervals are reported separately for the target survey (D=1) and for the auxiliary survey (D=0). The column next to each interval reports the number of observations whose reweighting function [calculated as in (3) in the text] is larger than 100. These observations, if any, are dropped before calculating the adjusted poverty counts reported in Table 8.



food, even if still larger than 1, was much smaller than the ratios observed in the thin rounds, where different households were assigned different questionnaires. The use of the 30-day reports for the calculation of official poverty head counts could then have resulted in a significant underestimation of poverty with respect to previous rounds. Instead, the adjusted and unadjusted estimates are rather close, suggesting that most of the reconciliation between 7-day and 30-day reports stems from a bias of the 7-day reports toward the 30-day ones, and not vice versa.
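The diagnostics described in the note to Table 9 (percentile intervals of the estimated propensity scores, and the removal of observations whose reweighting function exceeds 100) can be sketched as follows. The function name `propensity_diagnostics`, its arguments, and the toy scores are hypothetical; the percentile convention used here (the standard library's `inclusive` method) is an assumption and need not match the one used for the published table.

```python
import statistics


def propensity_diagnostics(p_hat, p_d1, r_max=100.0):
    """Summarize estimated propensity scores and flag extreme reweighting values.

    p_hat  estimated P(D=1|v) for each auxiliary-survey household
    p_d1   estimated unconditional P(D=1)
    r_max  threshold above which an observation is dropped (100 in Table 9)
    """
    cuts = statistics.quantiles(p_hat, n=100, method="inclusive")
    interval = (cuts[0], cuts[98])          # 1st and 99th percentiles
    odds_ratio = (1.0 - p_d1) / p_d1        # P(D=0) / P(D=1)
    r = [p / (1.0 - p) * odds_ratio for p in p_hat]
    keep = [ri <= r_max for ri in r]
    return interval, r, keep


# Toy scores: the last household has a propensity score very close to 1.
p_hat = [0.2, 0.5, 0.8, 0.999]
interval, r, keep = propensity_diagnostics(p_hat, p_d1=0.5)
n_dropped = sum(1 for k in keep if not k)
```

Here only the last household, whose reweighting value is 999, would be dropped before the second step.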

In the previous large quinquennial round, carried out in 1993–1994, the proportion of the population in poverty in the states considered here was 33.4% in urban areas and 38.2% in rural areas. Thus, even if the adjustment delivers, as expected, higher head count ratios than the official ones, our estimates confirm a very significant poverty reduction in India during the 1990s. This conclusion is consistent with the results reported by Deaton (2003) and Sundaram and Tendulkar (2003). The latter authors, together with Deaton and Drèze, also argued that such a large decline in poverty is consistent with evidence from employment surveys, the National Accounts, and data on agricultural wages. If one accepts the results presented in Table 8, then the conclusion is that most of the poverty reduction measured by using NSS data is real, not simply a statistical artifact due to a change in the survey design. Even so, the small difference between the official and the adjusted head counts does not diminish the importance of taking comparability issues seriously. The results from the thin rounds, as well as from several other studies, clearly show that changes in questionnaire design can have an enormous impact on the estimated distribution of expenditure. Even for 1999–2000, there were strong a priori reasons to expect the official figures to understate poverty considerably, and in principle the adjustment could have mattered more.

5. CONCLUSIONS AND EXTENSIONS

In this article we have described how an IPW procedure can be used to recover comparability over time for statistics made otherwise incomparable because of changes in data collection methodology. The adjustment can be applied to estimators of parameters identified by a population moment condition of the form $E[g(y;\phi_0)]=0$, where $y$ is the variable measured in a noncomparable way in the different surveys. This framework encompasses a broad set of poverty and inequality measures. The estimator requires the existence of a set of auxiliary variables whose reports are not affected by the different survey design and whose relationship with the main variable of interest is stable across the surveys. The reliability of the adjusted estimates depends crucially on the reliability of the necessary identifying assumptions, which should be carefully evaluated by the researcher on a case-by-case basis. With this caveat, the procedure introduced here should be a very useful tool for a researcher interested in the evolution over time of welfare or aggregate economic indicators, because changes in survey methodology are frequent and can easily lead to noncomparability issues.

We estimated adjusted poverty counts from the 55th round of the Indian NSS, a large expenditure survey carried out in 1999–2000 for which comparability issues arose due to changes in the questionnaire. The identifying assumptions needed for the good performance of our estimator involve unobserved variables and thus cannot be tested directly. However, using previous waves of the NSS, we have provided indirect evidence substantially supporting their validity. According to our estimates, in 1999–2000 the poor accounted for approximately 30–33% of the rural population and 24–27% of the urban population. Almost all of the alternative estimates that we calculate are higher than the official poverty counts, but they still show an impressive poverty decline in the 1990s, because the previous estimates, from the 1993–1994 round of the NSS, were approximately 30% higher.

As a caveat to the empirical application, we stress that our results should not be interpreted to suggest the superiority of one recall period for expenditure over the others. When the identifying assumptions hold, the reweighting procedure will recover statistics that are comparable with others calculated previously using different methodologies. Ascertaining which questionnaire type is more appropriate is an important task we do not address in this article. Another caveat is the fact that the identifying assumptions will typically not hold if the target and auxiliary surveys refer to very different populations. For instance, our methodology is unlikely to be useful if one needs to make cross-country comparisons of welfare indicators, or if the two surveys are separated by a wide temporal gap.

Even if our emphasis on comparability issues over time is justified by the empirical application, the IPW procedure described here can be fruitfully applied to several other missing-data problems. The asymptotic results will be particularly useful when such problems arise in the context of data collected with multistage surveys.

One important application is the estimation of parameters identified by possibly nonlinear moment conditions in nonclassical measurement error models with auxiliary data. This framework was studied by Chen, Hong, and Tamer (2005a) and Chen et al. (2005b). Suppose that the researcher needs to estimate a parameter, $\phi_0$, identified by a moment condition, $E[g(y;\phi_0)]=0$, and suppose also that in the full (primary) sample, $y$ is measured with error. In some cases, the researcher may have available an auxiliary sample that contains both true values of $y$ and other variables $v$ that are also recorded in the primary sample and may include the mismeasured variables $y$. If the conditional distribution of $y$ given $v$ is the same in the primary and the auxiliary databases, then one can use the reweighting procedure described in this article to consistently estimate $\phi_0$, even in the context of arbitrary correlation between $y$ and the measurement error. There are two important cases in which the equality of the conditional expectation across the primary and the auxiliary sample is naturally satisfied: when the auxiliary sample has been obtained with a stratified sampling design, where a nonrandom response-based subsample of the primary data is validated, and when the auxiliary sample is a true validation sample, that is, a random subsample from the full primary dataset. Chen et al. (2005a) proposed an alternative semiparametric two-step estimator in which the first step is a sieve-based nonparametric estimation of the conditional expectation of $g(y;\phi_0)$ given $v$. Chen et al. (2005b) also considered estimation with a parametric or nonparametric inverse probability weighting first step and studied semiparametric efficiency; however, all of their results were obtained under the assumption of iid observations. The estimator described in this article offers a simple parametric alternative in which the standard errors are calculated taking into account the presence of clustering and stratification that characterize virtually all household surveys.

The IPW approach illustrated in this article can also be useful in calculating welfare measures for small areas, such as villages or towns. Household surveys seldom contain sufficient observations from small areas to allow the estimation of poverty or inequality measures with acceptable precision. To recover precise estimates of welfare measures, Elbers, Lanjouw and Lanjouw (2003) proposed merging information on auxiliary variables from a census (which commonly does not record expenditure) with data on the same variables and expenditure from a household survey. But their estimator requires fairly complicated simulation techniques. An IPW estimator provides a simpler alternative without requiring stronger identifying assumptions. One can easily apply the methodology described here using the census as the target population and household surveys as auxiliary data.

ACKNOWLEDGMENTS

The author is deeply grateful to Angus Deaton and Bo Honoré for comments, support, and conversations that led to this project. For insightful comments and useful suggestions thanks are also due to Alberto Abadie, Dwayne Benjamin, Debopam Bhattacharya, Phil Cross, Han Hong, Allen Kelley, Aprajit Mahajan, Robert McMillan, Jonathan Morduch, Jerry Reiter, Barbara Rossi, Elie Tamer, seminar participants at various institutions, and especially to two anonymous referees. Of course, the author is solely responsible for any remaining error or omission.

APPENDIX A: PROOFS OF PROPOSITIONS

Proof of Proposition 1

Using the law of iterated expectations and assumption (A2), we can rewrite the initial moment condition as

$$\begin{aligned}
E[n\,g(y;\phi_0)\mid D=1] &= \int n\,E[g(y;\phi_0)\mid v,D=0]\,dP(v\mid D=1)\\
&= \int n\,E[g(y;\phi_0)\mid v,D=0]\,R(v)\,dP(v\mid D=0) \qquad \text{[by (A3) and (A4)]}\\
&= E[n\,R(v)\,g(y;\phi_0)\mid D=0].
\end{aligned}$$

Note that if $v$ does not include $n$, then (A2) should be modified as

$$E[n\,g(y;\phi_0)\mid v,D=0]=E[n\,g(y;\phi_0)\mid v,D=1].$$

For the identification of $R(v)$, it is sufficient to show that $P(D=1\mid v)$ is identified by the sampling process. First, note that the econometrician observes $\hat v = Dv + (1-D)v$. Thus the sampling process identifies $P(D=1\mid \hat v) = P(D=1,\hat v)/P(\hat v)$. Applying Bayes's theorem, we have

$$
P(D=1\mid \hat v) = \frac{P(v\mid D=1)\,P(D=1)}{P(v\mid D=1)\,P(D=1) + P(v\mid D=0)\,P(D=0)} = P(D=1\mid v) \quad \text{a.s.},
$$

where the last equality follows from (A1).
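The identity established in Proposition 1 can be checked numerically on a small discrete example. The sketch below is only an illustration under assumed values: the support of `v`, the two conditional distributions, `P_D1`, and the conditional means `M_V` are hypothetical numbers that do not come from the article, and household size is set to n = 1 for brevity.

```python
# Hypothetical discrete example; all numbers are illustrative assumptions.
P_D1 = 0.4                            # P(D=1), share of the target population
F_V_D1 = {0: 0.2, 1: 0.3, 2: 0.5}     # P(v | D=1)
F_V_D0 = {0: 0.5, 1: 0.3, 2: 0.2}     # P(v | D=0)
M_V = {0: 0.6, 1: 0.3, 2: 0.1}        # E[g(y) | v], common to D=0 and D=1 as in (A2)


def propensity(v):
    """P(D=1 | v), obtained from Bayes's theorem."""
    num = F_V_D1[v] * P_D1
    return num / (num + F_V_D0[v] * (1.0 - P_D1))


def reweight(v):
    """R(v) = [P(D=1|v) / P(D=0|v)] * [P(D=0) / P(D=1)]."""
    p = propensity(v)
    return p / (1.0 - p) * (1.0 - P_D1) / P_D1


def target_moment_direct():
    """E[g | D=1], integrating E[g | v] against P(v | D=1)."""
    return sum(M_V[v] * F_V_D1[v] for v in M_V)


def target_moment_ipw():
    """E[R(v) g | D=0], using only the auxiliary distribution of v."""
    return sum(reweight(v) * M_V[v] * F_V_D0[v] for v in M_V)
```

In this example the two functions return the same number, because `reweight(v)` equals the ratio `F_V_D1[v] / F_V_D0[v]` at every support point.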

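Proof of Proposition 2

Let $v = [n \;\; v_n]'$; then

$$
\begin{aligned}
f(y\mid n, D=1) &= \int_{v_n} dP(y, v_n\mid n, D=1) \\
&= \int_{v_n} f(y\mid v_n, n, D=1)\,dP(v_n\mid n, D=1) \\
&= \int_{v_n} f(y\mid v_n, n, D=0)\,dP(v_n\mid n, D=1) \quad \text{[by (A2b)]} \\
&= \int_{v_n} R(v_n)\,f(y\mid v_n, n, D=0)\,dP(v_n\mid n, D=0) \\
&= \int_{v_n} R(v_n)\,dP(y, v_n\mid n, D=0) \\
&= f(y\mid n, D=0)\,E[R(v_n)\mid y, n, D=0].
\end{aligned}
$$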

Note that here the reweighting function has a different form, because all probabilities are now conditional. Thus

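$$
\begin{aligned}
R(v_n) &= \frac{dP(v_n, n, D=1)}{dP(n, D=1)}\,\frac{dP(n, D=0)}{dP(v_n, n, D=0)} \\
&= \frac{P(D=1\mid v_n, n)\,dP(v_n, n)}{P(D=0\mid v_n, n)\,dP(v_n, n)}\,\frac{P(D=0\mid n)\,dP(n)}{P(D=1\mid n)\,dP(n)} \\
&= \frac{P(D=1\mid v)\,P(D=0\mid n)}{P(D=0\mid v)\,P(D=1\mid n)}.
\end{aligned}
$$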
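The two forms of the conditional reweighting function can be compared on a toy joint distribution. The following sketch assumes a hypothetical probability table `JOINT` over (v_n, n, D) with two values for each variable; the numbers are illustrative only.

```python
# Hypothetical joint probabilities P(v_n, n, D); keys are (v_n, n, D).
JOINT = {
    (0, 1, 0): 0.10, (1, 1, 0): 0.20, (0, 2, 0): 0.15, (1, 2, 0): 0.05,
    (0, 1, 1): 0.05, (1, 1, 1): 0.15, (0, 2, 1): 0.20, (1, 2, 1): 0.10,
}


def prob(vn=None, n=None, d=None):
    """Marginal probability of the arguments that are not None."""
    return sum(
        p for (a, b, c), p in JOINT.items()
        if (vn is None or a == vn) and (n is None or b == n) and (d is None or c == d)
    )


def ratio_of_conditionals(vn, n):
    """dP(v_n | n, D=1) / dP(v_n | n, D=0)."""
    return (prob(vn, n, 1) / prob(n=n, d=1)) / (prob(vn, n, 0) / prob(n=n, d=0))


def ratio_from_propensities(vn, n):
    """P(D=1|v) P(D=0|n) / [P(D=0|v) P(D=1|n)], with v = (n, v_n)."""
    p_v = prob(vn, n, 1) / prob(vn, n)      # P(D=1 | v_n, n)
    p_n = prob(n=n, d=1) / prob(n=n)        # P(D=1 | n)
    return p_v / (1.0 - p_v) * (1.0 - p_n) / p_n
```

Both functions agree cell by cell, and the weights average to one within each household-size group when taken over the auxiliary distribution of v_n given n.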

The Complete Proposition 3 and Its Proof

First, note that the sampling is done from a population that encompasses both the auxiliary and target surveys. The population is divided into strata that are common to both subpopulations. A random sample of clusters is then drawn from each stratum. This allows us to treat the variable $D_{sih}$, the dummy equal to 1 when a household belongs to the target population, in the same way as all other variables involved in the estimation. Define $z_{sih} = (y_{sih}, v_{sih}, D_{sih}, n_{sih})$. Let $g_i(\beta)$ be defined as in (7),

$$
g_i(\beta) = \sum_{s=1}^{S} 1(i \in s) \sum_{h=1}^{m_{si}} \omega_{sih}\, g(y_{sih}, v_{sih}, D_{sih}; \beta).
$$



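A0. For $s, s' = 1, \ldots, S$, $(z_{sih}, z_{s'i'h'})$ are independent unless $s = s'$ and $i = i'$. For each $s$, $(z_{sih})$ are identically distributed. For $s \neq s'$, $z_s$ and $z_{s'}$ are independent (but not necessarily identically distributed), where $z_s \equiv \{z_{sih}\}_{i=1,\ldots,n_s;\; h=1,\ldots,m_{si}}$.

A1. $g_{ji}(z;\beta)$ is continuous at each $\beta$ with probability 1, for each $j = 1, \ldots, J$.

A2. There exists $d(\cdot)$ with $E[d(\cdot)] < \infty$ such that $|g_{ji}(t;\beta)| \le d(t)$ for each $j = 1, \ldots, J$, for all $t$.

A3. The parameter space is compact.

A4. $\beta_0$ lies in the interior of the parameter space, and $\beta_0$ solves (5) in the text uniquely.

A5. $E[g_i(z;\beta)]$ is continuously differentiable at $\beta_0$ and $\Gamma$ is nonsingular, where

$$
\Gamma = \operatorname{plim} \frac{1}{n} \sum_{i=1}^{n} \frac{\partial}{\partial \beta'} E(g_i(\beta_0)).
$$

A6. The sequence $\nu_n(\beta) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} [g_i(\beta) - E(g_i(\beta))]$ is stochastically equicontinuous.

A7. $\sup_{\beta} E|g_i(\beta)|^3 < \infty$, with the supremum taken over the parameter space.

A8. $\lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} \operatorname{var}(g_i(\beta)) = W < \infty$.

Under assumptions A0–A8,

$$
\sqrt{n}\,(\hat\beta - \beta_0) \xrightarrow{d} N\big(0,\; \Gamma^{-1} W (\Gamma^{-1})'\big).
$$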

For the proof, it is sufficient to note that assumptions A1–A8 correspond to assumptions A0–A7 and A8b of Bhattacharya (2005). Then the conclusion follows from Bhattacharya’s Proposition 2, for the case in which the number of moments is equal to the number of estimated parameters.
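To make the sandwich form of the asymptotic variance concrete, the sketch below applies it to the simplest exactly identified case, a weighted mean with moment ω(y − μ). It is an illustration under stated assumptions, not the article's estimator: equations (7)–(10) are not reproduced in this appendix, so the estimate of W uses a standard between-cluster, within-stratum linearization formula (which requires at least two clusters per stratum), and the records in `RECORDS` are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (stratum, cluster, sampling weight, outcome y).
RECORDS = [
    ("s1", "c1", 1.0, 2.0), ("s1", "c1", 1.0, 4.0),
    ("s1", "c2", 2.0, 3.0), ("s1", "c2", 2.0, 5.0),
    ("s2", "c3", 1.5, 6.0), ("s2", "c3", 1.5, 8.0),
    ("s2", "c4", 1.0, 7.0), ("s2", "c4", 1.0, 9.0),
]


def weighted_mean(records):
    """Solve the sample moment condition sum_i g_i(mu) = 0 for mu."""
    total_w = sum(w for _, _, w, _ in records)
    return sum(w * y for _, _, w, y in records) / total_w


def cluster_moments(records, mu):
    """g_i(mu): weighted sum of (y - mu) over households in each cluster."""
    totals = defaultdict(float)
    for s, c, w, y in records:
        totals[(s, c)] += w * (y - mu)
    return totals


def sandwich_variance(records):
    """Return (mu_hat, estimated variance of mu_hat) as Gamma^-1 W Gamma^-1 / n."""
    mu = weighted_mean(records)
    totals = cluster_moments(records, mu)
    n = len(totals)                                     # number of clusters
    gamma = -sum(w for _, _, w, _ in records) / n       # average derivative of g_i
    by_stratum = defaultdict(list)
    for (s, _), g in totals.items():
        by_stratum[s].append(g)
    w_hat = 0.0
    for gs in by_stratum.values():                      # assumed estimator of W
        ns = len(gs)
        g_bar = sum(gs) / ns
        w_hat += ns / (ns - 1.0) * sum((g - g_bar) ** 2 for g in gs)
    w_hat /= n
    return mu, w_hat / gamma ** 2 / n
```

Because both Γ and W scale with the sampling weights, multiplying every weight by a constant leaves the estimated variance unchanged, which is a quick sanity check on any implementation of this form.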

APPENDIX B: ASYMPTOTIC VARIANCE FOR HEAD COUNTS

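In the estimation of poverty counts, the expression in (2) specializes to

$$
E\big[n\,R(v)\,(1(y<z) - \phi_0) \mid D=0\big] = 0, \tag{B.1}
$$

where $R(v) = [P(D=0\mid v)(1-\theta^0_0)]^{-1} P(D=1\mid v)\,\theta^0_0$ and $\theta^0_0 = P(D=0)$. In the derivation of the variance, it is useful to rewrite the left side of (B.1) as

$$
\begin{aligned}
E\big[n\,R(v)\,(1(y<z) - \phi_0) \mid D=0\big]
&= E\big[n\,R(v)\,1(y<z) \mid D=0\big] - \phi_0\,E[n\,R(v) \mid D=0] \\
&= E\big[n\,R(v)\,1(y<z) - \phi_0\,\eta^1 \mid D=0\big],
\end{aligned}
$$

where $\eta^1 = E[n\mid D=1]$ is average household size in the target survey. The last equality follows as

$$
E[n\,R(v)\mid D=0] = \int n\,R(v)\,dF(v\mid D=0) = \int n\,\frac{dF(v\mid D=1)}{dF(v\mid D=0)}\,dF(v\mid D=0) = E[n\mid D=1].
$$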

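We assume a parametric single-index model for the propensity score. Let $\bar v$ indicate the column vector of all auxiliary variables and their powers and interactions, as entered in the first-step regression (including the constant). Then $P(D=1\mid v) = P(\bar v'\theta^1_0)$. We estimate the parameters $\theta^1_0$ using maximum likelihood. Let $\beta \equiv [\phi \;\; \theta^{1\prime} \;\; \theta^0 \;\; \eta^1]'$. Then $g(y, v, D; \beta_0)$ in (6) becomes

$$
g(y, v, D; \beta_0) =
\begin{bmatrix}
1(D=0)\left[ n\,\dfrac{P(\bar v'\theta^1_0)\,\theta^0_0}{[1 - P(\bar v'\theta^1_0)](1-\theta^0_0)}\,1(y<z) - \phi_0\,\eta^1_0 \right] \\[2ex]
\dfrac{D - P(\bar v'\theta^1_0)}{P(\bar v'\theta^1_0)[1 - P(\bar v'\theta^1_0)]}\,\dfrac{\partial P(\bar v'\theta^1_0)}{\partial(\bar v'\theta^1_0)}\,\bar v \\[2ex]
[(1-D) - \theta^0_0] \\[1ex]
1(D=1)\,[n - \eta^1_0]
\end{bmatrix}. \tag{B.2}
$$

The second row of (B.2) represents the first-order conditions (FOC) from the maximum likelihood estimation of $\theta^1_0$. Note that in (B.2) household size appears within $g(\cdot)$, so that when calculating (9) and (10) one must use sampling weights for households, not individuals. In the empirical application we use a logit model for the propensity score; then (B.2) specializes to

$$
g(y_{sih}, v_{sih}, D_{sih}; \hat\beta) =
\begin{bmatrix}
1(D_{sih}=0)\big[n_{sih}\,R(v_{sih}; \hat\theta^1, \hat\theta^0)\,1(y_{sih}<z) - \hat\phi\,\hat\eta^1\big] \\[1ex]
[D_{sih} - P(\bar v_{sih}'\hat\theta^1)]\,\bar v_{sih} \\[1ex]
[(1-D_{sih}) - \hat\theta^0] \\[1ex]
1(D_{sih}=1)\,[n_{sih} - \hat\eta^1]
\end{bmatrix},
$$

where

$$
R(v_{sih}; \hat\theta^1, \hat\theta^0) = \big[(1 - P(\bar v_{sih}'\hat\theta^1))(1-\hat\theta^0)\big]^{-1} P(\bar v_{sih}'\hat\theta^1)\,\hat\theta^0.
$$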

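Following the estimator in (10), these expressions are used directly to estimate the matrix $W$ in (8). The elements of the matrix $\Gamma$ in (8) are estimated using (9). Then, if we let $g_{dl} = \partial g_d(y_{sih}, v_{sih}, D_{sih}; \hat\beta)/\partial\beta_l$, and dropping the stratum-cluster-household-specific subscripts for simplicity, the nonzero elements of $\hat\Gamma$ are calculated using the following expressions:

$$
\begin{aligned}
g_{11} &= -1(D=0)\,\hat\eta^1, \\
g_{12} &= 1(D=0)\,n\,R(v; \hat\theta^1, \hat\theta^0)\,1(y<z)\,\bar v', \\
g_{13} &= 1(D=0)\,n\,\frac{R(v; \hat\theta^1, \hat\theta^0)}{\hat\theta^0(1-\hat\theta^0)}\,1(y<z), \\
g_{14} &= -1(D=0)\,\hat\phi, \\
g_{22} &= -P(\bar v'\hat\theta^1)\,[1 - P(\bar v'\hat\theta^1)]\,\bar v \bar v', \\
g_{33} &= -1,
\end{aligned}
$$

and

$$
g_{44} = -1(D=1).
$$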
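The derivative expressions for the logit case can be verified against finite differences. The sketch below is a hedged illustration: the function names, the parameter ordering `beta = [phi, theta1..., theta0, eta1]`, and the tuple layout of `obs` are assumptions made for this example, and any numerical inputs passed to it are hypothetical.

```python
import math


def _parts(obs, beta):
    """Unpack one household and compute the logit propensity and R(v)."""
    y, vbar, D, n, z = obs
    k = len(vbar)
    phi, theta1, theta0, eta1 = beta[0], beta[1:1 + k], beta[1 + k], beta[2 + k]
    p = 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(vbar, theta1))))
    R = p * theta0 / ((1.0 - p) * (1.0 - theta0))
    poor = 1.0 if y < z else 0.0
    return vbar, D, n, k, phi, theta0, eta1, p, R, poor


def moments(obs, beta):
    """Stacked logit-case moments for one household (four blocks of rows)."""
    vbar, D, n, k, phi, theta0, eta1, p, R, poor = _parts(obs, beta)
    row1 = (1 - D) * (n * R * poor - phi * eta1)
    row2 = [(D - p) * vk for vk in vbar]
    return [row1] + row2 + [(1 - D) - theta0, D * (n - eta1)]


def analytic_jacobian(obs, beta):
    """Nonzero derivative blocks g11, g12, g13, g14, g22, g33, g44."""
    vbar, D, n, k, phi, theta0, eta1, p, R, poor = _parts(obs, beta)
    dim = k + 3
    J = [[0.0] * dim for _ in range(dim)]
    J[0][0] = -(1 - D) * eta1                                        # g11
    for c in range(k):
        J[0][1 + c] = (1 - D) * n * R * poor * vbar[c]               # g12
    J[0][1 + k] = (1 - D) * n * R * poor / (theta0 * (1 - theta0))   # g13
    J[0][2 + k] = -(1 - D) * phi                                     # g14
    for r in range(k):
        for c in range(k):
            J[1 + r][1 + c] = -p * (1 - p) * vbar[r] * vbar[c]       # g22
    J[1 + k][1 + k] = -1.0                                           # g33
    J[2 + k][2 + k] = -float(D)                                      # g44
    return J


def numeric_jacobian(obs, beta, h=1e-6):
    """Central finite differences of moments() with respect to beta."""
    dim = len(beta)
    J = [[0.0] * dim for _ in range(dim)]
    for c in range(dim):
        up, dn = list(beta), list(beta)
        up[c] += h
        dn[c] -= h
        gu, gd = moments(obs, up), moments(obs, dn)
        for r in range(dim):
            J[r][c] = (gu[r] - gd[r]) / (2.0 * h)
    return J
```

For instance, calling both Jacobian functions on `obs = (3.0, [1.0, 0.5], 0, 4, 5.0)` with `beta = [0.25, -0.3, 0.8, 0.6, 4.5]` should give matrices that agree up to finite-difference error. Averaging the analytic matrix over households with the appropriate sampling weights would then give one way to form an estimate of Γ.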

Finally, in the empirical application we take the states to represent different strata. In the NSS, strata are actually represented by smaller geographical units (usually districts in rural areas and towns of given population in urban areas).

APPENDIX C: DETAILS OF MONTE CARLO SIMULATION

We want to show that $P(y<5.326\mid D=1)$, $P(y<5.326\mid D=0)$, $E(y\mid D=1)$, and $E(y\mid D=0)$ are identified by (11), (12), and (14). We consider the head count ratio for the target population first. This can be rewritten as

$$
\begin{aligned}
P(y<z\mid D=1) &= \int_v P(y<z\mid v)\,f(v\mid D=1)\,dv \\
&= \int_v P(y<z\mid v)\,\frac{P(D=1\mid v)\,f(v)}{P(D=1)}\,dv \\
&= \int_v P(y<z\mid v)\,\frac{P(D=1\mid v)\,f(v)}{\int_s P(D=1\mid s)\,f(s)\,ds}\,dv,
\end{aligned}
$$

where now each element is identified by assumptions on the DGP, and the head count can be recovered by numerical integration. The resulting true value of the head count ratio in the target population is equal to .2256, whereas the head count ratio in the auxiliary population (calculated in an analogous way) is .2759. The mean value of $y$ in the target and auxiliary populations can be calculated in a similar fashion.

$$
E(y\mid D=1) = \int_v E(y\mid v)\,\frac{P(D=1\mid v)\,f(v)}{\int_s P(D=1\mid s)\,f(s)\,ds}\,dv,
$$

where again all elements are identified by the assumptions on the DGP. The resulting values are 8.224 for the target population and 7.600 for the auxiliary population. All integrals are calculated over the interval $[-2, 12]$, using a grid with bin width of .001. Using a wider interval and/or a finer grid leaves the result identical up to the sixth decimal place.
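The numerical integration described above can be sketched as follows. Equations (11)–(14) are not reproduced in this appendix, so the density of v, the outcome equation, and the propensity score used here are hypothetical stand-ins chosen only to make the code runnable; the sketch will therefore not reproduce the values .2256 and .2759 reported in the text. Only the poverty line 5.326, the interval [−2, 12], and the bin width .001 are taken from the article.

```python
import math

Z = 5.326  # poverty line used in the text


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def density_v(v, mean=5.0, sd=1.0):
    """Hypothetical f(v): a normal density (an assumption, not equation (11))."""
    return math.exp(-0.5 * ((v - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def poor_given_v(v, a=1.0, b=1.2):
    """P(y < Z | v) if y = a + b*v + eps with standard logistic eps (assumed)."""
    return logistic(Z - a - b * v)


def propensity(v, c=-2.0, d=0.4):
    """Hypothetical logit model for P(D=1 | v)."""
    return logistic(c + d * v)


def head_counts(lo=-2.0, hi=12.0, width=0.001, c=-2.0, d=0.4):
    """Midpoint-rule approximation of P(y<Z | D=1) and P(y<Z | D=0)."""
    steps = int(round((hi - lo) / width))
    num1 = den1 = num0 = den0 = 0.0
    for i in range(steps):
        v = lo + (i + 0.5) * width
        fv = density_v(v)
        p = propensity(v, c, d)
        num1 += poor_given_v(v) * p * fv          # numerator, target population
        den1 += p * fv                            # approximates P(D=1) / width
        num0 += poor_given_v(v) * (1.0 - p) * fv  # numerator, auxiliary population
        den0 += (1.0 - p) * fv                    # approximates P(D=0) / width
    return num1 / den1, num0 / den0
```

Under these assumed functions the target head count is lower than the auxiliary one, because the propensity score rises with v while the probability of being poor falls with v. Setting `d=0` makes the two populations identical in the distribution of v, so the two head counts coincide.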

Each simulation is completed as follows. First, we draw a sample from the distribution of $v$. Then we generate the corresponding values of $y$ using (13), where we draw the errors, $\epsilon$, from a logistic distribution. Finally, we assign the value of the binary variable $D$ to each observation drawing from a Bernoulli distribution with observation-specific probabilities of success equal to $P(D_i=1\mid v_i)$, where the probabilities are calculated using (14).
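The three simulation steps can be sketched in a few lines. As before, the distribution of v, the outcome equation, and the propensity score are hypothetical stand-ins for (11)–(14), which are not reproduced in this appendix; the function and parameter names are likewise assumptions made for illustration.

```python
import math
import random


def simulate_sample(size, seed=0, a=1.0, b=1.2, c=-2.0, d=0.4):
    """Draw one Monte Carlo sample of (y, v, D) under an assumed DGP."""
    rng = random.Random(seed)
    sample = []
    for _ in range(size):
        v = rng.gauss(5.0, 1.0)                           # step 1: draw v (assumed normal)
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)    # keep u away from 0 and 1
        eps = math.log(u / (1.0 - u))                     # logistic error via inverse CDF
        y = a + b * v + eps                               # step 2: stand-in for (13)
        p = 1.0 / (1.0 + math.exp(-(c + d * v)))          # stand-in for (14)
        D = 1 if rng.random() < p else 0                  # step 3: Bernoulli(p) draw
        sample.append((y, v, D))
    return sample


def head_count(sample, D, z=5.326):
    """Share of observations with y < z in the subsample with the given D."""
    ys = [y for y, _, dd in sample if dd == D]
    return sum(1 for y in ys if y < z) / len(ys)
```

A call such as `simulate_sample(2000, seed=1)` returns a list of `(y, v, D)` tuples, and `head_count(sample, 1)` gives the raw head count in the simulated target sample, which could then be compared with a reweighted estimate based on the auxiliary observations.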

[Received October 2004. Revised January 2005.]

REFERENCES

Abadie, A. (2005), “Semiparametric Difference-in-Differences Estimators,” Review of Economic Studies, 72, 1–19.

Amemiya, T. (1985), Advanced Econometrics, Cambridge, MA: Harvard University Press.

Attanasio, O., Battistin, E., and Ichimura, H. (2004), “What Really Happened to Consumption Inequality in the U.S.?” NBER Working Paper 10338, National Bureau of Economic Research.

Battistin, E. (2003), “Errors in Survey Reports of Consumption Expenditures,” Working Paper W03/07, Institute for Fiscal Studies, London.

Battistin, E., Miniaci, R., and Weber, G. (2003), “What Do We Learn From Recall Consumption Data?” Journal of Human Resources, 38, 354–385.

Bhattacharya, D. (2005), “Asymptotic Inference From Multistage Samples,” Journal of Econometrics, 126, 145–171.

Browning, M., Crossley, T. F., and Weber, G. (2003), “Asking Consumption Questions in General Purpose Surveys,” Economic Journal, 113, F540–F567.

Chen, X., Hong, H., and Tamer, E. (2005a), “Measurement Error Models With Auxiliary Data,” Review of Economic Studies, 72, 343–366.

Chen, X., Hong, H., and Tarozzi, A. (2005b), “Semiparametric Efficiency in GMM Models With Nonclassical Measurement Error,” working paper, Duke University and New York University.

Chow, G. (1983), Econometrics, New York: McGraw-Hill.

Clogg, C., Rubin, D., Schenker, N., Schultz, B., and Weidman, L. (1991), “Multiple Imputation of Industry and Occupation Codes in Census Public-Use Samples Using Bayesian Logistic Regression,” Journal of the American Statistical Association, 86, 68–78.

Deaton, A. (1997), The Analysis of Household Surveys: A Microeconometric Approach to Development Policy, Baltimore: Johns Hopkins University Press (for the World Bank).

Deaton, A. (2001), “Survey Design and Poverty Monitoring in India,” working paper, Research Program in Development Studies, Princeton University.

Deaton, A. (2003), “Adjusted Indian Poverty Estimates for 1999–2000,” Economic and Political Weekly, January 25, 322–326.

Deaton, A., and Drèze, J. (2002), “Poverty and Inequality in India: A Reexamination,” Economic and Political Weekly, September 7, 3729–3748.

Deaton, A., and Grosh, M. (2000), “Consumption,” in Designing Household Survey Questionnaires for Developing Countries: Lessons From 15 Years of the Living Standards Measurement Study, eds. M. Grosh and P. Glewwe, Oxford, U.K.: Oxford University Press (for the World Bank), pp. 91–133.

Deaton, A., and Kozel, V. (eds.) (2005), Data and Dogma: The Great Indian Poverty Debate, New Delhi, India: Macmillan.

Deaton, A., and Tarozzi, A. (2005), “Prices and Poverty in India,” in Data and Dogma: The Great Indian Poverty Debate, eds. A. Deaton and V. Kozel, New Delhi, India: Macmillan, Chap. 16.

Dehejia, R., and Wahba, S. (1999), “Causal Effects in Nonexperimental Studies: Reevaluating the Evaluation of Training Programs,” Journal of the American Statistical Association, 94, 1053–1062.

DiNardo, J., Fortin, N., and Lemieux, T. (1996), “Labor Market Institutions and the Distribution of Wages, 1973–1992: A Semiparametric Approach,” Econometrica, 64, 1001–1044.

Elbers, C., Lanjouw, J., and Lanjouw, P. (2003), “Micro-Level Estimation of Poverty and Inequality,” Econometrica, 71, 355–364.

Fan, J. (1992), “Design-Adaptive Nonparametric Regression,” Journal of the American Statistical Association, 87, 998–1004.

Frölich, M. (2004), “Finite-Sample Properties of Propensity-Score Matching and Weighting Estimators,” Review of Economics and Statistics, 86, 77–90.

Ghose, S., and Bhattacharya, N. (1995), “Effects of Reference Period on Engel Elasticities of Clothing and Other Items: Further Results,” The Indian Journal of Statistics, Ser. B, 57, 433–449.

Gibson, J. (1999), “How Robust Are Poverty Comparisons to Changes in Household Survey Methods? A Test Using Papua New Guinea Data,” mimeo, University of Waikato, Hamilton, New Zealand, Dept. of Economics.

Gibson, J. (2002), “Why Does the Engel Method Work? Food Demand, Economies of Size, and Household Survey Methods,” Oxford Bulletin of Economics and Statistics, 64, 341–359.

Gibson, J., Huang, J., and Rozelle, S. (2001), “Why Is Income Inequality so Low in China Compared to Other Countries? The Effect of Household Survey Methods,” Economic Letters, 71, 329–333.

Gibson, J., Huang, J., and Rozelle, S. (2003), “Improving Estimates of Inequality and Poverty From Urban China’s Household Income and Expenditure Survey,” Review of Income and Wealth, 49, 53–68.

Government of India (1993), “Report of the Expert Group on the Estimation of Proportion and Number of Poor,” discussion paper, Planning Commission, Delhi.

Hahn, J. (1998), “On the Role of the Propensity Score in Efficient Semiparametric Estimation of Average Treatment Effects,” Econometrica, 66, 315–332.

Heckman, J., Ichimura, H., and Todd, P. (1998), “Matching as an Econometric Evaluation Estimator,” Review of Economic Studies, 65, 261–294.

Hirano, K., Imbens, G., and Ridder, G. (2003), “Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score,” Econometrica, 71, 1161–1189.

Horowitz, J., and Manski, C. (1995), “Identification and Robustness With Contaminated and Corrupt Data,” Econometrica, 63, 281–302.

Horvitz, D., and Thompson, D. (1952), “A Generalization of Sampling Without Replacement From a Finite Universe,” Journal of the American Statistical Association, 47, 663–685.

Jolliffe, D. (2001), “Measuring Absolute and Relative Poverty: The Sensitivity of Estimated Household Consumption to Survey Design,” Journal of Economic and Social Measurement, 27, 1–23.

Lanjouw, J. O., and Lanjouw, P. (2001), “How to Compare Apples and Oranges: Poverty Measurement Based on Different Definition of Consumption,” Review of Income and Wealth, 47, 25–42.

Little, R., and Rubin, D. (2002), Statistical Analysis With Missing Data (2nd ed.), New York: Wiley.

Mahalanobis, P. C., and Sen, S. B. (1954), “On Some Aspects of the Indian National Sample Survey,” Bulletin of the International Statistical Institute, 34, 5–14.

Neter, J., and Waksberg, J. (1964), “A Study of Response Errors in Expenditures Data From Household Interviews,” Journal of the American Statistical Association, 59, 18–55.

Newey, W. K., and McFadden, D. (1994), “Large-Sample Estimation and Hypothesis Testing,” in Handbook of Econometrics, eds. R. Engle and D. McFadden, New York: Elsevier Science, pp. 2111–2245.



Rao, J. N. K., and Scott, A. J. (1984), “On Chi-Squared Tests for Multiway Contingency Tables With Cell Proportions Estimated From Survey Data,” The Annals of Statistics, 12, 46–60.

Ravallion, M. (1992), “Poverty Comparisons: A Guide to Concepts and Methods,” Working Paper 88, World Bank.

Robins, J. M., Rotnitzky, A., and Zhao, L. P. (1994), “Estimation of Regression Coefficients When Some Regressors Are Not Always Observed,” Journal of the American Statistical Association, 89, 846–866.

Robins, J. M., Rotnitzky, A., and Zhao, L. P. (1995), “Analysis of Semiparametric Regression Models for Repeated Outcomes in the Presence of Missing Data,” Journal of the American Statistical Association, 90, 106–121.

Rubin, D. (1976), “Inference and Missing Data,” Biometrika, 63, 581–592.

Rubin, D. (1978), “Multiple Imputations in Sample Surveys,” in Proceedings of the Survey Research Methods Section, American Statistical Association, pp. 20–34.

Rubin, D. (1996), “Multiple Imputation After 18+ Years,” Journal of the American Statistical Association, 91, 473–489.

Sachs, J. D., Varshney, A., and Bajpai, N. (1999), India in the Era of Economic Reforms, Oxford, U.K.: Oxford University Press.

Schafer, J., and Schenker, N. (2000), “Inference With Imputed Conditional Means,” Journal of the American Statistical Association, 95, 144–154.

Schenker, N. (2003), “Assessing Variability Due to Race Bridging: Application to Census Counts and Vital Rates for the Year 2000,” Journal of the American Statistical Association, 98, 818–828.

Schenker, N., and Parker, J. (2003), “From Single-Race Reporting to Multiple-Race Reporting: Using Imputation Methods to Bridge the Transition,” Statistics and Medicine, 22, 1571–1587.

Schwarz, G. (1978), “Estimating the Dimension of a Model,” The Annals of Statistics, 6, 461–464.

Scott, C., and Amenuvegbe, B. (1990), “Effects of Recall Duration on Reporting of Household Expenditures: An Experimental Study in Ghana,” Social Dimensions of Adjustment in Sub-Saharan Africa Working Paper 6, World Bank.

Sen, A. (2000), “Estimates of Consumer Expenditure and Its Distribution: Statistical Priorities After the 55th Round,” Economic and Political Weekly, 35, 4499–4518.

Silverman, B. W. (1986), Density Estimation for Statistics and Data Analysis, New York: Chapman & Hall.

Stata Corporation (2003), Survey Data Reference Manual: Release 8.0, College Station, TX: Stata Press.

Sundaram, K., and Tendulkar, S. D. (2003), “Poverty Has Declined in the 1990s: A Resolution of Comparability Problems of NSS on Consumer Expenditure,” Economic and Political Weekly, 322–326.

Wooldridge, J. (2001), Econometrics of Cross Section and Panel Data, Cambridge, MA: MIT Press.

Wooldridge, J. (2002a), “Asymptotic Properties of Weighted M-Estimators for Variable Probability Samples,” Econometrica, 67, 1385–1406.

Wooldridge, J. (2002b), “Inverse Probability Weighted M-Estimators for Sample Selection, Attrition and Stratification,” Portuguese Economic Journal, 1, 117–139.