Sampling errors Afghanistan - National Risk and Vulnerability Survey 2011-2012, Living Conditions Survey NRVA 2011 12 report
ANNEX IX QUALITY ASSURANCE AND QUALITY ASSESSMENT
IX.4 Non-sampling errors
IX.4.1 Overview of possible non-sampling errors
Aside from the sampling error associated with the process of selecting a sample, a survey is subject to a wide variety of non-sampling errors. These errors may – and unavoidably do – occur in all stages of the survey process. Non-sampling
errors are usually classified into two groups: random errors and systematic errors. Random errors are unpredictable errors that are generally cancelled out if a large enough sample is used. Since NRVA has a large sample size, random
errors are a priori not considered to be an issue of large concern. Systematic errors are those errors that tend to accumulate over the entire sample and may bias the survey results to a considerable extent. Therefore, this category
of non-sampling errors is a principal cause for concern. The following overview elaborates the main types of systematic non-sampling errors.
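The difference between the two groups can be illustrated with a small simulation. This is a sketch under stated assumptions, not part of the NRVA methodology: the true value, the noise level, the bias of 0.5 and the function name `simulate_mean_error` are all hypothetical.

```python
import random
import statistics


def simulate_mean_error(n, bias=0.0, noise_sd=1.0, seed=0):
    """Error of the sample mean when every recorded value carries
    zero-mean random noise plus a constant systematic bias."""
    rng = random.Random(seed)
    true_value = 10.0  # hypothetical true value, for illustration only
    recorded = [true_value + rng.gauss(0.0, noise_sd) + bias for _ in range(n)]
    return statistics.fmean(recorded) - true_value


# Random errors only: the error of the mean shrinks as the sample grows.
random_only_small = simulate_mean_error(n=50)
random_only_large = simulate_mean_error(n=20000)

# Systematic error: the bias stays in the mean however large the sample is.
with_bias_large = simulate_mean_error(n=20000, bias=0.5)

print(random_only_small, random_only_large, with_bias_large)
```

With a large sample the random component of the error is close to zero, whereas the assumed bias of 0.5 carries over almost unchanged into the estimate.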
Coverage errors
Coverage errors occur when households are omitted, duplicated or wrongly included in the population or sample. Such
errors are caused by defects in the sampling frame, such as inaccuracy, incompleteness, duplications, inadequacy or obsolescence. Coverage errors may also occur in field procedures, for instance when omitting specific households
or persons.
The sampling frames used for NRVA 2011-12 included the 2003-05 pre-census household listing and the 2003-04 National
Multi-sectoral Assessment of Kuchi (NMAK-2004). Both listings were outdated by the time of fieldwork implementation and it is likely that in the intervening period considerable changes occurred with respect to the number and geographic
distribution of households. This will have had particular effect on newly built-up urban areas and squatter settlements, including areas with high density of internally displaced persons and returnees. Such areas will have been systematically
underrepresented in the sample selection. With regard to the Kuchi coverage, besides the observed but unquantified rate of settlement of Kuchi households and natural population growth, changing migration patterns will have caused a
population distribution in 2011-12 that is different to the one represented in the NMAK list.
Non-response errors
There are two types of non-response: unit non-response and item non-response. Unit non-response implies that no
information is obtained from a given sample unit, while item non-response refers to a situation where some but not all the information is collected for the unit. Item non-response occurs when respondents provide incomplete information,
because of respondents’ refusal or incapacity to answer, or omissions by interviewers. Often non-response is not evenly spread across the sample units but is concentrated among sub-groups. As a result, the distribution of the characteristics
of subgroups may deviate from that of the selected sample.
Unit non-response in NRVA 2011-12 occurred to the extent that sampled clusters were not visited, or that sampled households in selected clusters were not interviewed. Out of the 2,100 originally scheduled clusters, 150 (7.1 percent)
were not visited. For 133 of these non-visited clusters (6.3 percent), replacement clusters were sampled and visited. Although this ensured the approximation of the targeted sample size, it could not avoid the likely introduction of some
bias as the omitted clusters probably have a different profile than included clusters.
In the visited clusters – including replacement clusters – 797 households (3.8 percent of the total) could not be
interviewed because – mostly – they were not found or because they refused or were unable to participate. For 779 of these non-response households (3.7 percent of the total), replacement households were sampled and interviewed.
Since the household non-response is low and it can be expected that the replacement households provide a reasonable representation of the non-response households, this non-response error is considered of minor importance. The
overall unit non-response rate – including non-visited clusters and non-interviewed households, without replacement – is 10.9 percent.
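The cluster-level percentages quoted above can be checked with a few lines of arithmetic. The sketch below uses only figures from the text. The household denominator is not stated in this section, so the household rate is taken as reported, and treating the overall rate as the sum of the two rates is an assumption that happens to reproduce the published 10.9 percent, not a confirmed description of how it was calculated.

```python
scheduled_clusters = 2100
clusters_not_visited = 150
replacement_clusters = 133

# Cluster-level rates, as a percentage of the originally scheduled clusters.
cluster_nonresponse_pct = 100 * clusters_not_visited / scheduled_clusters
cluster_replaced_pct = 100 * replacement_clusters / scheduled_clusters

# Clusters actually visited, including replacement clusters.
visited_clusters = scheduled_clusters - clusters_not_visited + replacement_clusters

# Household-level rate as reported in the text (denominator not given here).
household_nonresponse_pct = 3.8

# Assumption: the overall rate is the sum of the two rounded rates.
overall_unit_nonresponse_pct = round(
    round(cluster_nonresponse_pct, 1) + household_nonresponse_pct, 1
)

print(round(cluster_nonresponse_pct, 1), round(cluster_replaced_pct, 1))
print(visited_clusters, overall_unit_nonresponse_pct)
```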
With regard to item non-response, the close to 800 variables in the NRVA household and Shura questionnaires each reveal different levels of missing values. During the data-processing stages of manual checking, computerised batch
editing and final editing these levels were reduced by edit strategies. For some key variables,² missing values were filled in for 100 percent. For other variables, missing values were only filled in when convincing evidence could be found for assigning a specific value. Section X.4.2 gives information about missing values for selected variables. This
overview reflects the finding that the percentage of missing values is generally low.³ For household-level variables the level of missing values is typically well below 0.5 percent. The percentage of missing values in individual-level
variables is somewhat higher, but generally below 3 percent. Higher levels are exceptional and relate to misunderstanding of skip patterns.

² All household identification variables – Cluster code (Q1.1), Residence code (Q1.2), Province code (Q1.3), District code (Q1.4), Nahia code (Q1.5), Control and Enumeration area code (Q1.6) and Village code (Q1.7) – as well as the individual-level variables Relationship to the head of household (Q3.3), Sex (Q3.4), Age (Q3.5) and Marital status (Q3.6).
Response errors
Response errors result when questions are incorrectly asked, or information is incorrectly provided, received or
recorded. These errors may occur because of inappropriate questionnaire design, inadequate interviewer training, incompetence or irresponsible interviewer behaviour, time pressure, or shortcomings on the side of the respondent, such
as misunderstanding, inaccuracy, ignorance, recollection problems or reluctance to provide a correct answer.
The NRVA 2011-12 survey variables showed different rates of response errors, according to consistency and range checks performed. During manual checking and batch- and final editing, all key variables (see above) with evidently
incorrect values were corrected based on available information, referral back to the field staff or most plausible inference. Other variables were corrected as far as circumstantial evidence allowed. If this was not possible, incorrect values were
converted to missing values.
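The last step, converting values that fail a range check to missing values, could look roughly as follows. This is a minimal sketch only: the actual NRVA edit rules are not given in this annex, so the variable names, the valid ranges in `VALID_RANGES` and the function `apply_range_checks` are hypothetical.

```python
# Hypothetical plausible ranges; the real NRVA edit specifications are not
# described in the text.
VALID_RANGES = {
    "age": (0, 110),
    "sex": (1, 2),
}


def apply_range_checks(record):
    """Return a copy of the record in which out-of-range values are set to
    None (missing). Values that are already missing are left untouched."""
    cleaned = dict(record)
    for variable, (low, high) in VALID_RANGES.items():
        value = cleaned.get(variable)
        if value is not None and not (low <= value <= high):
            cleaned[variable] = None
    return cleaned


example = {"age": 250, "sex": 1}
print(apply_range_checks(example))
```

In this sketch the input record is not modified, so the original value remains available if later evidence allows a correction instead of a conversion to missing.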
Data-processing errors
In principle, each of the stages of data processing – manual questionnaire checking, data capture, batch editing and
final editing – and general data management can add to the number of errors included in the final dataset. However, usually the major source of data-processing errors is data capture. Elaborate data-checking procedures and data-editing
programmes can to a significant degree correct data errors, but no dataset is ever completely error-free.
NRVA 2011-12 used double data entry with independent verification for data capture. As this in principle eliminates any
data typing mistakes, the only data errors in the data file are response errors. A series of computerised checks indicated where essential data structure and data integrity problems had to be remedied. In addition, a limited number of
consistency and range checks were performed before the raw dataset was delivered. The main thrust of data editing was done as part of the analysis phase. Although apparent data errors still exist in the final dataset, these are very few
and statistically insignificant.
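The principle of double data entry with independent verification can be sketched as a field-by-field comparison of two separately keyed versions of the same questionnaires. The example below is an illustration under assumptions, not the software used for NRVA 2011-12: the record identifiers, the data layout and the function `compare_entries` are hypothetical, and only the variable codes Q3.4 (Sex) and Q3.5 (Age) come from the questionnaire.

```python
def compare_entries(first_pass, second_pass):
    """Compare two independently keyed versions of the same records.

    Both arguments map a record identifier to a dict of field values.
    Returns a list of (record_id, field, first_value, second_value) for
    every field on which the two passes disagree.
    """
    mismatches = []
    for record_id in sorted(set(first_pass) | set(second_pass)):
        first = first_pass.get(record_id, {})
        second = second_pass.get(record_id, {})
        for field in sorted(set(first) | set(second)):
            if first.get(field) != second.get(field):
                mismatches.append(
                    (record_id, field, first.get(field), second.get(field))
                )
    return mismatches


# Hypothetical example: the age of one person was keyed as 34 and as 43.
entry_a = {"HH001": {"Q3.4": 1, "Q3.5": 34}, "HH002": {"Q3.4": 2, "Q3.5": 27}}
entry_b = {"HH001": {"Q3.4": 1, "Q3.5": 43}, "HH002": {"Q3.4": 2, "Q3.5": 27}}

differences = compare_entries(entry_a, entry_b)
print(differences)
```

Each mismatch would then be resolved against the paper questionnaire. A typing mistake survives only if both operators make the same mistake in the same field.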