Exploratory Analysis Methods for Spatial Analysis of Animal Disease

are inverse distance weighting (IDW), kernel smoothing, and kriging (Allepuz 2008).

2.4.2 Exploratory Analysis

Exploratory analysis has the specific objective of using a statistical hypothesis-testing framework to identify spatial clusters of disease. The term cluster refers to locations at which disease occurrence is higher or lower than would be expected if disease were randomly distributed in space. The statistical methods can be grouped into global and local statistics, depending on whether they generate a single statistic for the whole area or statistics for individual locations within that area (Durr and Gatrell 2004). Clustering of a disease can occur for a variety of reasons. These include the infectious spread of disease, the occurrence of disease vectors in specific locations, and the clustering of a risk factor or combination of risk factors. They also include potential health hazards, such as localized pollution sources scattered throughout a region, each creating an increased risk of disease in its immediate vicinity. The investigation of possible disease clustering is fundamental to epidemiology. One of its aims is to determine whether the clustering is statistically significant and worthy of further investigation, whether it is likely to be a chance occurrence, or whether it simply reflects the distribution of the population at risk (Pfeiffer et al. 2008). Statistical tests for detecting spatial clusters fall into two groups. Global cluster detection tests use a summary statistic to indicate whether or not clustering is present in the region under investigation. Local cluster detection tests seek to identify the spatial location of clusters within a given region. The available tests also differ in that some use complete population counts to characterize the population at risk, whereas others use a sample of controls (Stevenson 2009). Global clustering can be tested using Moran’s I.
Moran’s I statistic gives a formal indication of the degree of linear association between a vector of observed values and a weighted average of its neighbouring values (Stevenson 2009). When no correlation exists between neighbouring values, Moran’s I is approximately normally distributed with an expected value of -1/(n - 1), where n is the number of area units within the study region. The expected value therefore approaches zero as n increases. A Moran’s I close to this expected value is consistent with the null hypothesis of no clustering. A positive Moran’s I indicates positive spatial autocorrelation (i.e. clustering of areas with similar attribute values). A negative Moran’s I indicates negative spatial autocorrelation (i.e. neighbouring areas tend to have dissimilar attribute values) (Pfeiffer et al. 2008). The Moran’s I statistic is calculated as follows:

$$I = \frac{n}{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(y_i - \bar{y})(y_j - \bar{y})}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

Where:
n: the number of polygons (area units) in the study area
w_ij: the element of the spatial proximity (weights) matrix for areas i and j
y_i: the value of the attribute under investigation in area i
ȳ: the mean of the attribute under investigation

Local cluster detection tests can be conducted with Kulldorff’s spatial scan statistic. The spatial scan statistic uses a likelihood ratio test to compare two models (Stevenson 2009). Under the null hypothesis, the number of cases in each part of the study region is proportional to its population. Under the alternative model, disease risk differs between the inside and the outside of a circular zone. The test can be used for spatially aggregated data as well as when the exact geographic coordinates of each individual are known. It can therefore be used for both lattice and point spatial data (Allepuz 2008). When data are aggregated into census districts, the cases and population of each district are assigned to the district’s central coordinates. Scan statistics are commonly used to test whether a one-dimensional point process is purely random or whether clusters can be detected. The spatial scan statistic extends this idea to geographical space by using a circular window of variable size (Kulldorff 1997).
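The Moran’s I statistic described above can be computed directly from its definition. What follows is a minimal sketch in plain Python. It assumes a dense n × n weights matrix given as nested lists, and the function names `morans_i`, `expected_i` and `permutation_p_value` are hypothetical. For significance, the sketch swaps in a Monte Carlo permutation test, a commonly used alternative in which attribute values are randomly reassigned to areas, instead of the normal approximation mentioned in the text. This substitution is an assumption of the sketch, not the method of the cited authors.

```python
import random
from typing import Sequence, Tuple


def morans_i(y: Sequence[float], w: Sequence[Sequence[float]]) -> float:
    """Global Moran's I for attribute values y and an n x n weights matrix w."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]                    # y_i - ybar
    s0 = sum(sum(row) for row in w)                # sum of all weights w_ij
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)                  # sum of squared deviations
    if s0 == 0 or den == 0:
        raise ValueError("need at least one non-zero weight and non-constant y")
    return (n / s0) * (num / den)


def expected_i(n: int) -> float:
    """Expected Moran's I when there is no spatial autocorrelation: -1/(n - 1)."""
    return -1.0 / (n - 1)


def permutation_p_value(y: Sequence[float], w: Sequence[Sequence[float]],
                        n_perm: int = 999, seed: int = 0) -> Tuple[float, float]:
    """Observed I and a one-sided pseudo p-value for positive autocorrelation.

    Assumption: significance is assessed by random relabelling of areas
    (a permutation test), not by the normal approximation.
    """
    rng = random.Random(seed)
    observed = morans_i(y, w)
    values = list(y)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(values)                        # reassign values to areas at random
        if morans_i(values, w) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (n_perm + 1)
```

Consider four areas in a row with binary contiguity weights (w_ij = 1 for adjacent areas, 0 otherwise). Values y = [1, 2, 3, 4] give I = 1/3, well above the expected value of -1/3, which indicates clustering of similar values. The alternating pattern y = [1, 4, 1, 4] gives I = -1, meaning neighbours are dissimilar.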
The number of cases is compared with the background population data, and the expected number of cases in each unit is proportional to the size of its population at risk. Circle centers are defined either by the locations in the case and control (or population) data or by specifying an array of grid coordinates. Secondary clusters are reported according to the degree of overlap allowed between cluster circles. The options include no geographical overlap and no cluster centers in other clusters (Pfeiffer et al. 2008).
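The circular-window procedure described above can be sketched as follows. This is a minimal, purely spatial version with a Poisson model, and it assumes area-level case counts, populations at risk and centroid coordinates. Several details are illustrative assumptions rather than part of the source:

- circle centers are placed at the centroids;
- each circle grows by adding the next-nearest centroid until it would exceed a maximum of 50% of the total population;
- only zones with elevated risk are scored;
- significance comes from a Monte Carlo test in which cases are redistributed in proportion to population.

All function names (`poisson_llr`, `candidate_zones`, `scan`, `secondary_clusters`) are hypothetical, and this is not the SaTScan implementation. `secondary_clusters` corresponds to the "no geographical overlap" option.

```python
import math
import random
from typing import List, Sequence, Tuple

# A candidate zone: (log-likelihood ratio, indices of the areas inside the circle).
Zone = Tuple[float, frozenset]


def poisson_llr(c: int, expected: float, total: int) -> float:
    """Poisson log-likelihood ratio of one zone (elevated risk only)."""
    if expected <= 0 or c <= expected:
        return 0.0  # zones with no excess of cases are not scored
    llr = c * math.log(c / expected)
    if total > c:  # the outside term vanishes when every case is inside the zone
        llr += (total - c) * math.log((total - c) / (total - expected))
    return llr


def candidate_zones(xy: Sequence[Tuple[float, float]], cases: Sequence[int],
                    pop: Sequence[float], max_frac: float = 0.5) -> List[Zone]:
    """Circles centred on each centroid, grown one nearest neighbour at a time."""
    total_cases, total_pop = sum(cases), sum(pop)
    zones: List[Zone] = []
    for centre in xy:
        order = sorted(range(len(xy)), key=lambda k: math.dist(centre, xy[k]))
        members: List[int] = []
        c, p = 0, 0.0
        for k in order:
            if p + pop[k] > max_frac * total_pop:  # assumed 50% population cap
                break
            members.append(k)
            c += cases[k]
            p += pop[k]
            expected = total_cases * p / total_pop  # proportional to population at risk
            zones.append((poisson_llr(c, expected, total_cases), frozenset(members)))
    return zones


def scan(xy, cases, pop, n_sim: int = 99, seed: int = 1, max_frac: float = 0.5):
    """Most likely cluster, its Monte Carlo p-value (assumed method), all zones."""
    zones = candidate_zones(xy, cases, pop, max_frac)
    best = max(zones, key=lambda z: z[0])
    rng = random.Random(seed)
    total = sum(cases)
    exceed = 0
    for _ in range(n_sim):
        # Null hypothesis: each case falls in an area with probability proportional to pop.
        sim = [0] * len(pop)
        for k in rng.choices(range(len(pop)), weights=pop, k=total):
            sim[k] += 1
        if max(z[0] for z in candidate_zones(xy, sim, pop, max_frac)) >= best[0]:
            exceed += 1
    return best, (exceed + 1) / (n_sim + 1), zones


def secondary_clusters(zones: List[Zone], primary: Zone, limit: int = 3) -> List[Zone]:
    """Highest-scoring zones that share no area with any cluster already chosen."""
    chosen = [primary]
    for zone in sorted(zones, key=lambda z: z[0], reverse=True):
        if len(chosen) > limit:
            break
        if zone[0] > 0 and all(zone[1].isdisjoint(m) for _, m in chosen):
            chosen.append(zone)
    return chosen[1:]
```

Take six equally populated areas in a line with cases [10, 10, 1, 1, 1, 1]. The most likely cluster is then the zone made up of the first two areas, and no non-overlapping secondary cluster has an excess of cases. Enumerating circles by adding nearest neighbours one at a time mirrors the variable-radius window. Recomputing the zones for every simulated data set keeps the sketch short but is slow for large maps.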

2.4.3 Spatial Modeling