Model development ALCS 2013 14 Main Report English 20151222

VI.2 Model development

The exercise proceeds in three steps.

First, we identified non-consumption variables common to all surveys, including variables that correlate well with household consumption. The model’s ability to estimate changes in household consumption and poverty depends on changes in the explanatory variables over the same period. If the final model includes only variables that do not change over time, such as type of dwelling and construction material of walls and floor, it will predict the same poverty rate as in the base year. The model must therefore include variables that change more over time, such as the household head’s employment status and the level of conflict.

We reviewed each question across the three surveys to ensure that the common variables remained comparable.⁶⁹ For instance, although the question ‘type of toilet facility used by household’ appears in all three surveys, the list of toilet categories changed in ALCS 2013-14. We therefore selected only the categories within the question that are comparable. Similarly, the labour module covers a different length of time in the different surveys, which alters female labour participation rates across surveys; we therefore restrict labour outcome variables such as employment status and employment type to ‘head of household’ and ‘adult male members between the ages of 25 and 50’, categories not affected by the time range used. Table VI.1 summarises the variables.

Secondly, we developed a model following the SWIFT approach of Yoshida et al. (2015). The model assumes a linear relationship between household consumption and its correlates, together with a projection error.⁷⁰ The equation representing the consumption model is:

$$\ln y_h = x_h'\beta + u_h \qquad (1)$$

where $\ln y_h$ is the log of per capita consumption of household $h$, $x_h$ is a $k \times 1$ vector of poverty correlates of household $h$, $\beta$ is a $k \times 1$ vector of coefficients of the poverty correlates, $k$ is the number of variables and $u_h$ is the projection error.

The explanatory variables on the right-hand side of the model capture variation in household consumption, and thus differentiate poor from non-poor households. We estimate equation (1) on the NRVA 2011-12 survey, which contains consumption data. We then apply the estimated parameters of the model to the ALCS 2013-14 dataset to predict household consumption and the poverty rate.

The SWIFT modelling process includes multiple steps to improve the ability of the formula to project household income or expenditure, by adjusting the coefficients $\beta$ and estimating the distributions of both the coefficients and the projection errors.⁷¹ No formula is perfect, so including the projection error is essential, and estimating its distribution is key for estimating poverty rates and their standard errors.

⁶⁹ Beegle et al. (2011) show empirical evidence of how household responses to questions can change with the questionnaire design.

⁷⁰ This does not mean that SWIFT cannot capture non-linear relationships. SWIFT’s formula is linear in the variables created in the dataset, but since some of these variables can be squares of other variables, the formula can be non-linear in the underlying variables. A typical example is that SWIFT uses both household size and household size squared in a formula.

⁷¹ The approach adopted by the SWIFT team is rather conservative, in that the team did not adopt some approaches discussed at the frontier of research on modelling because it judged the evidence for these approaches to be not yet strong enough. However, the team has been exploring such new techniques and may update the SWIFT modelling process once enough supportive evidence for these methodologies is available.
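The logic of equation (1) and its use for projection can be illustrated with a minimal sketch. This is not the SWIFT team's procedure: the data are synthetic, the variable names (`hh_size`, `head_employed`), the coefficient values, the poverty line set at the 35th percentile of base-year consumption, and the use of normal draws for both the coefficients and the projection error are all assumptions made for illustration. The sketch fits the log-consumption model on a base survey, applies it to a second survey without consumption data, and uses repeated draws to obtain a poverty rate and a simulation-based standard error.

```python
import numpy as np


def make_design(hh_size, head_employed):
    """Build the matrix of correlates: intercept, size, size squared, employment."""
    return np.column_stack([
        np.ones_like(hh_size, dtype=float),
        hh_size,
        hh_size ** 2,          # squared term: linear in variables, non-linear in size
        head_employed,
    ])


def fit_log_consumption(X, ln_y):
    """OLS fit of ln y_h = x_h' beta + u_h.

    Returns the coefficient estimates, their covariance matrix and the
    standard deviation of the residuals (the projection error).
    """
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ ln_y
    resid = ln_y - X @ beta
    sigma2 = resid @ resid / (n - k)
    cov = sigma2 * xtx_inv
    cov = (cov + cov.T) / 2.0  # enforce exact symmetry for the random draws
    return beta, cov, np.sqrt(sigma2)


def simulate_poverty_rate(X_new, beta, beta_cov, sigma, ln_line,
                          n_sims=200, seed=0):
    """Project consumption onto a new survey and return (mean rate, s.e.).

    Each simulation draws one coefficient vector and one projection error
    per household, so the spread across simulations reflects model
    uncertainty only (sampling design is ignored in this sketch).
    """
    rng = np.random.default_rng(seed)
    rates = np.empty(n_sims)
    for s in range(n_sims):
        beta_s = rng.multivariate_normal(beta, beta_cov)
        u_s = rng.normal(0.0, sigma, size=X_new.shape[0])
        ln_y_s = X_new @ beta_s + u_s
        rates[s] = np.mean(ln_y_s < ln_line)
    return rates.mean(), rates.std(ddof=1)


# --- Synthetic "base" survey with consumption data (hypothetical values) ---
rng = np.random.default_rng(42)
n = 5000
true_beta = np.array([8.0, -0.10, 0.003, 0.30])
X_base = make_design(rng.integers(1, 15, n).astype(float),
                     rng.binomial(1, 0.7, n).astype(float))
ln_y_base = X_base @ true_beta + rng.normal(0.0, 0.4, n)

beta_hat, beta_cov, sigma_hat = fit_log_consumption(X_base, ln_y_base)

# --- Synthetic "target" survey without consumption; employment has fallen ---
X_new = make_design(rng.integers(1, 15, n).astype(float),
                    rng.binomial(1, 0.3, n).astype(float))

ln_line = np.quantile(ln_y_base, 0.35)      # hypothetical poverty line
base_rate = np.mean(ln_y_base < ln_line)
rate, se = simulate_poverty_rate(X_new, beta_hat, beta_cov, sigma_hat, ln_line)

print(f"base-year poverty rate: {base_rate:.3f}")
print(f"projected poverty rate: {rate:.3f} (simulation s.e. {se:.3f})")
```

Because the only time-varying correlate in this toy example is the head's employment status, the projected poverty rate moves away from the base-year rate only because employment differs between the two synthetic surveys. If the design matrix contained only variables with unchanged distributions, the projection would reproduce the base-year rate, which is the point made in the first step above.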

VI.3 Model selection: cross-validation