are inverse distance weighting (IDW), kernel smoothing, and kriging (Allepuz 2008).
2.4.2 Exploratory Analysis
Exploratory analysis has the specific objective of using a statistical hypothesis-testing framework for the identification of spatial clusters of disease.
The term clusters refers to locations at which disease occurrence is higher or lower than would have been expected if disease were randomly distributed in
space. The statistical methods can be grouped into global and local statistics depending on whether they generate a single statistic for the whole area or
statistics for individual locations within that area (Durr and Gatrell 2004). Clustering of a disease can occur for a variety of reasons including the
infectious spread of disease, the occurrence of disease vectors in specific locations, the clustering of a risk factor or combination of risk factors, or the
existence of potential health hazards such as localized pollution sources scattered throughout a region, each creating an increased risk of disease in its immediate
vicinity. The investigation of possible disease clustering is fundamental to epidemiology, with one of the aims being to determine whether the clustering is
statistically significant and worthy of further investigation, or whether it is likely to be a chance occurrence, or is simply a reflection of the distribution of the
population at risk (Pfeiffer et al. 2008). Statistical tests applied to the detection of spatial clusters can be either
global cluster detection tests, where a summary statistic identifies whether or not clustering is present in a region under investigation, or local cluster detection tests,
which seek to define the spatial location of clusters within a given region. The cluster detection tests that are available differ in that some use complete
population counts to characterize the population at risk whereas others use a sample of controls (Stevenson 2009).
Global cluster detection tests can be carried out using Moran’s I. The Moran’s I statistic gives a formal indication of the degree of linear association between a
vector of observed values and a weighted average of its neighbouring values (Stevenson 2009). Moran’s I is approximately normally distributed and has an
expected value of -1/(N - 1), where N equals the number of area units within a study region, when no correlation exists between neighbouring values. The
expected value of the coefficient therefore approaches zero as N increases. A Moran’s I close to zero is consistent with the null hypothesis of no clustering, a positive
Moran’s I indicates positive spatial autocorrelation, i.e. clustering of areas of similar attribute values, while a negative coefficient indicates negative spatial
autocorrelation, i.e. that neighbouring areas tend to have dissimilar attribute values (Pfeiffer et al. 2008).
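As a quick numerical illustration of how the expected value under no spatial autocorrelation shrinks towards zero, the following evaluates it for two values of N chosen arbitrarily for illustration (they are not taken from any study):

```latex
\[
E[I] = -\frac{1}{N-1}, \qquad
N = 5 \;\Rightarrow\; E[I] = -0.25, \qquad
N = 101 \;\Rightarrow\; E[I] = -0.01
\]
```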
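The Moran’s I statistic is calculated as follows:
\[ I = \frac{n \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (y_i - \bar{y})(y_j - \bar{y})}{\left( \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} \right) \sum_{i=1}^{n} (y_i - \bar{y})^2} \]
Where:
n: the number of polygons in the study area
w_{ij}: the values of the spatial proximity matrix
y_i: the attribute under investigation
\bar{y}: the mean of the attribute under investigation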
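To make the calculation concrete, the following is a minimal Python sketch of Moran’s I in its standard form, together with a simple permutation test for significance. It assumes NumPy is available; the function names (`morans_i`, `permutation_p_value`), the four-area example, and the binary adjacency weights are hypothetical choices for illustration, and the permutation test is one common alternative to the normal approximation rather than the method of any cited author.

```python
import numpy as np


def morans_i(y, w):
    """Moran's I for attribute values y and spatial proximity matrix w."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    n = y.size
    z = y - y.mean()                          # deviations from the mean
    numerator = n * (w * np.outer(z, z)).sum()  # n * sum_ij w_ij z_i z_j
    denominator = w.sum() * (z ** 2).sum()
    return numerator / denominator


def permutation_p_value(y, w, n_perm=999, seed=0):
    """One-sided pseudo p-value for positive spatial autocorrelation."""
    rng = np.random.default_rng(seed)
    observed = morans_i(y, w)
    # Shuffle the attribute values over the areas to mimic spatial randomness.
    sims = np.array([morans_i(rng.permutation(y), w) for _ in range(n_perm)])
    p = (np.sum(sims >= observed) + 1) / (n_perm + 1)
    return observed, p


# Hypothetical example: four areas in a row; neighbours share a border.
w_line = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
i_trend = morans_i([1, 2, 3, 4], w_line)        # similar values adjacent
i_alternating = morans_i([1, 4, 1, 4], w_line)  # dissimilar values adjacent
observed_i, p_value = permutation_p_value([1, 2, 3, 4], w_line, n_perm=199)
```

In this toy example the smoothly increasing values give a positive coefficient and the alternating values give a negative one, matching the interpretation of the sign described in the text.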
Local cluster detection tests can be conducted using Kulldorff’s spatial scan statistic. The spatial scan statistic uses a likelihood ratio test to compare a model in which disease risk is constant across the
study region population (the null hypothesis) to a model that has different disease risk depending on being inside or outside a circular zone
(Stevenson 2009). The test can be used for spatially aggregated data as well as when the exact geographic coordinates are known for each individual. Therefore it
can be used for lattice or point spatial data (Allepuz 2008). When data are aggregated into census districts, the measure will be concentrated at the central
coordinates of those districts. The scan statistic is commonly used to test whether a one-dimensional point process is purely random, or whether any clusters can be detected by
using a variable circular window (Kulldorff 1997). The number of cases is compared to the background population data, and
the expected number of cases in each unit is proportional to the size of the
population at risk. Circle centers are defined either by the case and control population data or by specifying an array of grid coordinates. Secondary
clusters are computed based on the degree of overlap allowed between the cluster circles; the options include no geographical overlap and no cluster centers in
other clusters (Pfeiffer et al. 2008).
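The scanning procedure described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the implementation used by any cited author or software package. It assumes a Poisson model for aggregated data, circle centers placed at district centroids, a maximum window size of 50% of the population at risk, and Monte Carlo simulation for significance. Tied distances are handled naively, and secondary clusters are not computed. All function names and the six-district data set are hypothetical.

```python
import numpy as np


def poisson_llr(c, e, total):
    """Log likelihood ratio for a window with c observed and e expected cases."""
    if c <= e:
        return 0.0  # only windows with elevated risk are of interest here
    inside = c * np.log(c / e)
    outside = (total - c) * np.log((total - c) / (total - e)) if c < total else 0.0
    return inside + outside


def scan_statistic(coords, cases, pop, max_pop_frac=0.5):
    """Return (max LLR, centre index, radius, member indices) of the likeliest cluster."""
    coords = np.asarray(coords, dtype=float)
    cases = np.asarray(cases, dtype=float)
    pop = np.asarray(pop, dtype=float)
    total_cases, total_pop = cases.sum(), pop.sum()
    best = (0.0, None, 0.0, ())
    for i in range(len(coords)):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        order = np.argsort(dist, kind="stable")
        c_in = p_in = 0.0
        # Grow the circle around centroid i one district at a time.
        for k, j in enumerate(order):
            c_in += cases[j]
            p_in += pop[j]
            if p_in > max_pop_frac * total_pop:
                break
            # Expected cases are proportional to the population at risk.
            e_in = total_cases * p_in / total_pop
            llr = poisson_llr(c_in, e_in, total_cases)
            if llr > best[0]:
                members = tuple(sorted(order[: k + 1].tolist()))
                best = (llr, i, float(dist[j]), members)
    return best


def monte_carlo_p(coords, cases, pop, n_sim=99, seed=0):
    """Rank the observed maximum LLR among maxima simulated under the null."""
    rng = np.random.default_rng(seed)
    observed = scan_statistic(coords, cases, pop)[0]
    total = int(np.sum(cases))
    probs = np.asarray(pop, dtype=float) / np.sum(pop)
    sims = [scan_statistic(coords, rng.multinomial(total, probs), pop)[0]
            for _ in range(n_sim)]
    return (1 + sum(s >= observed for s in sims)) / (n_sim + 1)


# Hypothetical data: six district centroids with equal populations.
coords = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
cases = [10, 12, 1, 1, 1, 1]
pop = [100] * 6
llr, centre, radius, members = scan_statistic(coords, cases, pop)
p_value = monte_carlo_p(coords, cases, pop)
```

With these made-up counts the likeliest cluster consists of the first two districts, where most of the cases occur relative to their share of the population.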
2.4.3 Spatial Modeling