Sampling errors Afghanistan - National Risk and Vulnerability Survey 2011-2012, Living Conditions Survey NRVA 2011 12 report

ANNEX IX QUALITY ASSURANCE AND QUALITY ASSESSMENT

IX.4 Non-sampling errors

IX.4.1 Overview of possible non-sampling errors

Aside from the sampling error associated with the process of selecting a sample, a survey is subject to a wide variety of non-sampling errors. These errors may, and unavoidably do, occur in all stages of the survey process. Non-sampling errors are usually classified into two groups: random errors and systematic errors. Random errors are unpredictable errors that generally cancel out if a large enough sample is used. Since NRVA has a large sample size, random errors are a priori not considered to be an issue of large concern. Systematic errors are those errors that tend to accumulate over the entire sample and may bias the survey results to a considerable extent. This category of non-sampling errors is therefore a principal cause for concern. The following overview elaborates the main types of systematic non-sampling errors.

Coverage errors

Coverage errors occur when households are omitted, duplicated or wrongly included in the population or sample. Such errors are caused by defects in the sampling frame, such as inaccuracy, incompleteness, duplications, inadequacy or obsolescence. Coverage errors may also occur in field procedures, for instance when specific households or persons are omitted.

The sampling frames used for NRVA 2011-12 included the 2003-05 pre-census household listing and the 2003-04 National Multi-sectoral Assessment of Kuchi (NMAK-2004). Both listings were outdated by the time of fieldwork implementation and it is likely that in the intervening period considerable changes occurred with respect to the number and geographic distribution of households. This will have had a particular effect on newly built-up urban areas and squatter settlements, including areas with a high density of internally displaced persons and returnees. Such areas will have been systematically underrepresented in the sample selection.
With regard to the Kuchi coverage, besides the observed but unquantified rate of settlement of Kuchi households and natural population growth, changing migration patterns will have caused a population distribution in 2011-12 that is different from the one represented in the NMAK list.

Non-response errors

There are two types of non-response: unit non-response and item non-response. Unit non-response implies that no information is obtained from a given sample unit, while item non-response refers to a situation where some but not all the information is collected for the unit. Item non-response occurs when respondents provide incomplete information, because of respondents' refusal or incapacity to answer, or omissions by interviewers. Often non-response is not evenly spread across the sample units but is concentrated among sub-groups. As a result, the distribution of the characteristics of subgroups may deviate from that of the selected sample.

Unit non-response in NRVA 2011-12 occurred to the extent that sampled clusters were not visited, or that sampled households in selected clusters were not interviewed. Out of the 2,100 originally scheduled clusters, 150 (7.1 percent) were not visited. For 133 of these non-visited clusters (6.3 percent), replacement clusters were sampled and visited. Although this ensured the approximation of the targeted sample size, it could not avoid the likely introduction of some bias, as the omitted clusters probably have a different profile than the included clusters.

In the visited clusters, including replacement clusters, 797 households (3.8 percent of the total) could not be interviewed, mostly because they were not found or because they refused or were unable to participate. For 779 of these non-response households (3.7 percent of the total), replacement households were sampled and interviewed.
Since the household non-response is low and it can be expected that the replacement households provide a reasonable representation of the non-response households, this non-response error is considered of minor importance. The overall unit non-response rate, including non-visited clusters and non-interviewed households and disregarding replacement, is 10.9 percent.

With regard to item non-response, the close to 800 variables in the NRVA household and Shura questionnaires each reveal different levels of missing values. During the data-processing stages of manual checking, computerised batch editing and final editing, these levels were reduced by edit strategies. For some key variables [2], missing values were filled in for 100 percent. For other variables, missing values were only filled in when convincing evidence could be found for assigning a specific value. Section IX.4.2 gives information about missing values for selected variables. This overview reflects the finding that the percentage of missing values is generally low [3]. For household-level variables the level of missing values is typically well below 0.5 percent. The percentage of missing values in individual-level variables is somewhat higher, but generally below 3 percent. Higher levels are exceptional and relate to misunderstanding of skipping patterns.

[2] All household identification variables: Cluster code (Q1.1), Residence code (Q1.2), Province code (Q1.3), District code (Q1.4), Nahia code (Q1.5), Control and Enumeration area code (Q1.6) and Village code (Q1.7), as well as the individual-level variables Relationship to the head of household (Q3.3), Sex (Q3.4), Age (Q3.5) and Marital status (Q3.6).

Response errors

Response errors result when questions are incorrectly asked, or information is incorrectly provided, received or recorded.
These errors may occur because of inappropriate questionnaire design, inadequate interviewer training, incompetent or irresponsible interviewer behaviour, time pressure, or shortcomings on the side of the respondent, such as misunderstanding, inaccuracy, ignorance, recollection problems or reluctance to provide a correct answer.

The NRVA 2011-12 survey variables showed different rates of response errors, according to the consistency and range checks performed. During manual checking, batch editing and final editing, all key variables (see above) with evidently incorrect values were corrected based on available information, referral back to the field staff or the most plausible inference. Other variables were corrected as far as circumstantial evidence allowed. If this was not possible, incorrect values were converted to missing values.

Data-processing errors

In principle, each of the stages of data processing (manual questionnaire checking, data capture, batch editing and final editing) and general data management can add to the number of errors included in the final dataset. However, the major source of data-processing errors is usually data capture. Elaborate data-checking procedures and data-editing programmes can correct data errors to a significant degree, but no dataset is ever completely error-free.

NRVA 2011-12 used double data entry with independent verification for data capture. As this in principle eliminates any data typing mistakes, the only data errors in the data file are response errors. A series of computerised checks indicated where essential data structure and data integrity problems had to be remedied. In addition, a limited number of consistency and range checks were performed before the raw dataset was delivered. The main thrust of data editing was done as part of the analysis phase. Although apparent data errors still exist in the final dataset, these are very few and statistically insignificant.
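
The unit non-response percentages quoted under "Non-response errors" above can be checked with simple arithmetic. The sketch below is illustrative only. The cluster counts come from the text, but the household base of 21,000 (2,100 clusters of 10 households each) is an assumption that the annex does not state; it is used here because it is consistent with the reported 3.8 and 3.7 percent. Under that assumption, the reported overall rate of 10.9 percent matches the sum of the cluster-level and household-level non-response shares.

```python
# Counts quoted in the non-response discussion of this annex.
SCHEDULED_CLUSTERS = 2100
CLUSTERS_NOT_VISITED = 150
CLUSTERS_REPLACED = 133
HOUSEHOLDS_NOT_INTERVIEWED = 797
HOUSEHOLDS_REPLACED = 779

# ASSUMPTION (not stated in the text): household percentages use a base of
# 21,000 targeted households, i.e. 2,100 clusters x 10 households each.
ASSUMED_HOUSEHOLD_BASE = 21000


def pct(count, base):
    """Return count as a percentage of base, rounded to one decimal."""
    return round(100.0 * count / base, 1)


cluster_nonresponse = pct(CLUSTERS_NOT_VISITED, SCHEDULED_CLUSTERS)
cluster_replaced = pct(CLUSTERS_REPLACED, SCHEDULED_CLUSTERS)
household_nonresponse = pct(HOUSEHOLDS_NOT_INTERVIEWED, ASSUMED_HOUSEHOLD_BASE)
household_replaced = pct(HOUSEHOLDS_REPLACED, ASSUMED_HOUSEHOLD_BASE)

# Overall unit non-response without replacement, taken here as the sum of the
# two shares (an interpretation that reproduces the reported figure).
overall_nonresponse = round(
    100.0 * (CLUSTERS_NOT_VISITED / SCHEDULED_CLUSTERS
             + HOUSEHOLDS_NOT_INTERVIEWED / ASSUMED_HOUSEHOLD_BASE),
    1,
)

print(cluster_nonresponse, cluster_replaced)
print(household_nonresponse, household_replaced)
print(overall_nonresponse)
```

A multiplicative combination of the two rates would give a slightly lower figure, so the additive reading should be treated as a plausible reconstruction rather than the documented method.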

IX.4.2 Missing values

Table IX.3 provides information about the percentage of missing values for selected variables. Variables were purposely selected from all household-questionnaire modules and cover both key and secondary variables.

[3] Missing values also include values that were found to be incorrect, but for which no justifiable valid value could be deduced (see also the section on response errors).

Table IX.3 Percentage missing values for selected variables [a]

Variable                                          Base population   Percent missing values

Household-level variables
Construction material of walls                             20,828                      1.9
Number of rooms in the dwelling                            20,828                      0.2
Main source of cooking fuel                                20,828                      0.4
Type of toilet facility used                               20,828                      0.0
Main source of drinking water                              20,828                      0.1
Household owning livestock                                 20,828                      0.0
Number of goats vaccinated                                  2,969                      0.3
Type of veterinary service provider                         3,071                      0.1
Households owning farm land                                20,828                      0.0
Jeribs of irrigated land owned                             12,604                      0.0
Jeribs of irrigated land cultivated                        10,145                      0.0
Main crop produced on irrigated land                        9,993                      0.5
Amount of most important crop produced                      9,993                      0.3
Number of mobile phones owned                              20,828                      0.4
Value of household debt                                    10,920                      0.1
First household income source                              20,828                      0.0
Income from most important income source                   20,828                      0.4
Expenditure on food at home                                20,828                      0.0
Expenditure on children's clothing                         20,828                      0.0
Reduced drinking water quantity shock                      20,828                      0.1
Male assessment of economic situation                      20,828                      0.5
Sufficiency of food supply in household                    20,828                      0.0
Assessment of economic situation                           20,828                0.8 / 0.4
Household members consuming dinners                        20,828                      0.0
Number of days consumed wheat                              20,828                      0.0
Amount of wheat consumed                                   19,828                      0.2

Individual-level variables
Person worked last week                                    84,023                      1.1
Person worked last month                                   42,986                      0.4
Economic activity status                                   84,023                      1.1
Occupation                                                 40,144                      0.2
Place of birth                                            159,224                      0.2
Place of usual residence in 1983                          117,944                      0.6
Years lived elsewhere                                       7,442                     18.7
Lived elsewhere for seasonal work                          84,023                      1.7
Literacy                                                  124,209                      0.6
Attended formal school                                    124,209                      2.5
Highest education grade completed                          46,181                      0.5
Currently attending school                                 36,595                      1.7
Ever had a live birth [b]                                  24,774                      3.0
Number of girls born [b]                                   21,457                      0.0
Birth attendance [b]                                       21,457                      0.5
Number of children under five [b]                          21,457                      2.1
Children under-five with birth certificate [b]             15,817                      1.2

[a] Based on unweighted observations.
[b] Percentages refer to, respectively, all observation units and to all units minus those from Zabul province, where female interviewers could not administer the survey.
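
The base populations in Table IX.3 differ between variables because many questions apply only to a subset of households or persons (for example, goat vaccination is only relevant for households that own goats). The sketch below illustrates how a missing-value percentage depends on that base. It is a minimal illustration, not the procedure used for NRVA: the function `percent_missing`, the field names and the five toy records are all hypothetical and are not survey data.

```python
def percent_missing(records, variable, in_base=lambda r: True):
    """Return (base size, percentage of base records where `variable` is None).

    `in_base` selects the records that should have answered the question,
    so values skipped by questionnaire design are not counted as missing.
    """
    base = [r for r in records if in_base(r)]
    if not base:
        raise ValueError("empty base population")
    missing = sum(1 for r in base if r.get(variable) is None)
    return len(base), round(100.0 * missing / len(base), 1)


# Hypothetical toy records, NOT survey data.
records = [
    {"owns_goats": True, "goats_vaccinated": 3},
    {"owns_goats": True, "goats_vaccinated": None},   # eligible, value missing
    {"owns_goats": False, "goats_vaccinated": None},  # skipped by design
    {"owns_goats": True, "goats_vaccinated": 0},
    {"owns_goats": True, "goats_vaccinated": 5},
]

# Correct base: only goat-owning households.
base_n, pct_owners = percent_missing(
    records, "goats_vaccinated", in_base=lambda r: r["owns_goats"]
)

# Wrong base: all households, which counts the skipped record as missing.
all_n, pct_all = percent_missing(records, "goats_vaccinated")

print(base_n, pct_owners)
print(all_n, pct_all)
```

Restricting the base to eligible units keeps legitimately skipped questions from inflating the missing-value percentage, which is one way a misapplied skip pattern can show up as an unusually high rate.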