Hotspot Detection Method Scan Statistics Satscan

categories, which are caused by repeated or cyclical factors and temporary or unpredicted factors.

2.4. Hotspot

A hotspot is defined as something unusual: an anomaly, aberration, outbreak, elevated cluster, or critical area (Patil and Taillie 2004). According to Harran et al. (2006), hotspots are locations or regions that have consistently high levels of occurrence, such as the total number of poor, unemployed, or people who suffer from food scarcity, and that may have characteristics unlike those of surrounding areas. In Song and Kulldorff (2003), hotspot clusters were generated by setting the relative risk in some areas to be larger than one. Furthermore, a poverty hotspot represents an area with certain local characteristics that could also expand and affect neighbouring areas (Betti et al. 2006).

2.5. Hotspot Detection Method

The hotspot detection method contains three components: (a) identifying hotspot candidates, (b) evaluating the statistical significance of the hotspots, and (c) estimating the covariance related to the hotspots. In Indonesia, the most recent method used to identify candidate hotspots is the spatial scan statistic. Bungsu (2006) states that the spatial scan statistic suffers from several limitations; for example, the circles used for the scanning window give low power for detecting arbitrarily shaped clusters. Hence the Upper Level Set (ULS) scan statistic will be used as a comparison to detect arbitrarily shaped hotspots. The likelihood ratio, the relative risk, and hypothesis testing based on Monte Carlo simulation are the techniques used to evaluate a candidate hotspot.
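As an illustration of the Monte Carlo hypothesis test mentioned above, the following minimal sketch computes a rank-based p-value for a candidate hotspot. The function name `monte_carlo_p_value` and the rank formula (observed statistic ranked among the simulated ones, with one added to numerator and denominator) are assumptions used for illustration; the source does not specify the exact form.

```python
def monte_carlo_p_value(observed_stat, simulated_stats):
    """Rank-based Monte Carlo p-value (illustrative assumption).

    observed_stat   -- test statistic of the candidate hotspot on the real data
    simulated_stats -- the same statistic computed on data sets simulated
                       under the null hypothesis of no hotspot
    """
    # Count the replications whose statistic is at least as extreme
    # as the observed one.
    exceed = sum(1 for s in simulated_stats if s >= observed_stat)
    # Adding one treats the observed data set as one more replication,
    # so the p-value can never be exactly zero.
    return (exceed + 1) / (len(simulated_stats) + 1)
```

A candidate hotspot would be declared significant when this p-value falls below the chosen significance level.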

2.6. Scan Statistics (SaTScan)

A scan statistic is a statistical method used to detect clusters in a point process. The spatial scan statistic is used to determine whether a spatial point process contains a localized cluster of points somewhere in a region of interest. It deals with the following situation. A region R of Euclidean space is subdivided into cells, denoted by A. Data are available in the form of a count $N(A)$ on each cell A. In addition, a size value $P(A)$ is associated with each cell. The cell sizes $P(A)$ are assumed to be known and fixed, while the cell counts $N(A)$ are independent random variables. The spatial scan statistic seeks to identify clusters of cells that have an elevated response compared with the rest of the region. Elevated response means large values of the rates, $r(A) = N(A) / P(A)$, rather than of the raw counts $N(A)$. Cell counts are thus adjusted for cell sizes before cell responses are compared.

Kulldorff (1997) presented the following algorithm for a circular window of fixed diameter d on a homogeneous Poisson or Bernoulli process (assuming homogeneous variance):

1. Pick a grid point. Calculate the distance to each population point and sort the population points in increasing order of distance. Store the sorted population points in an array.
2. Repeat step 1 for each grid point.
3. Pick a grid point.
4. Create a circle centred at the grid point and continuously increase its radius. For each population point entering the circle, update the number of cases n and the population $N(W)$ inside the circular area W.
5. Repeat steps 3 and 4 for each grid point. Report the largest likelihood over all $(n, N(W))$ pairs as the scan statistic, where the likelihood is calculated according to the model's likelihood equation.
6. Repeat steps 3 to 5 for each Monte Carlo replication.

The relative risk is a non-negative number representing how much more common a case is in a given location and time period compared to the baseline.
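Steps 1 to 5 of the circular scan listed above can be sketched in Python as follows. This is an illustrative sketch, not the SaTScan implementation: the names `poisson_llr` and `circular_scan`, the `(x, y, cases, population)` tuple layout, and the example data are hypothetical. The likelihood equation is not reproduced in the text, so the commonly cited Poisson log likelihood ratio is assumed here. The Monte Carlo replications of step 6 are omitted, and points at exactly equal distance from a grid point enter the circle one at a time.

```python
import math


def poisson_llr(n, expected, total_cases):
    """Assumed Poisson log likelihood ratio for a zone with n observed cases."""
    # Only zones with more cases than expected count as elevated.
    if expected <= 0 or n <= expected:
        return 0.0
    inside = n * math.log(n / expected)
    outside_cases = total_cases - n
    outside = 0.0
    if outside_cases > 0:
        outside = outside_cases * math.log(outside_cases / (total_cases - expected))
    return inside + outside


def circular_scan(locations, grid_points):
    """Return (best_llr, centre, radius) over all growing circles.

    locations   -- list of (x, y, cases, population) tuples
    grid_points -- list of (x, y) candidate circle centres
    """
    total_cases = sum(loc[2] for loc in locations)
    total_pop = sum(loc[3] for loc in locations)
    best = (0.0, None, 0.0)
    for gx, gy in grid_points:
        # Steps 1-2: sort the population points by distance to the grid point.
        ordered = sorted(locations, key=lambda loc: math.hypot(loc[0] - gx, loc[1] - gy))
        n = 0
        pop = 0
        # Steps 3-4: grow the circle one population point at a time.
        for x, y, cases, population in ordered:
            n += cases
            pop += population
            expected = total_cases * pop / total_pop
            llr = poisson_llr(n, expected, total_cases)
            # Step 5: keep the largest likelihood ratio seen so far.
            if llr > best[0]:
                best = (llr, (gx, gy), math.hypot(x - gx, y - gy))
    return best


# Hypothetical example data: four locations and two grid points.
example_locations = [(0.0, 0.0, 10, 100), (1.0, 0.0, 1, 100),
                     (5.0, 5.0, 1, 100), (6.0, 5.0, 0, 100)]
print(circular_scan(example_locations, [(0.0, 0.0), (5.0, 5.0)]))
```

To assess significance, the same scan would be repeated on data sets simulated under the null hypothesis, and the observed maximum compared with the simulated maxima.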
Setting the relative risk to one is equivalent to not making any adjustment; a value of less than one adjusts for lower risk, and a value greater than one adjusts for increased risk. A cluster with a relative risk (RR) greater than one is defined as a hotspot candidate. A relative risk of zero is used to adjust for missing data for that particular time and location (Kulldorff 2006). The relative risk is calculated by (Kulldorff 2006)

$$RR_Z = \frac{n_Z}{E[c]}$$

where $n_Z$ is the number of observed cases and $E[c]$ is the expected number of cases in a location, which is calculated by

$$E[c] = p \times \frac{C}{P}$$

where p is the population in the cluster of interest, and C and P are the total number of cases and the total population, respectively.

Available scan statistic software is known to have several limitations. First, circles have been used for the scanning window, resulting in low power for detection of irregularly shaped clusters (Figure 1). Second, the response variable has been defined on the cells of a tessellated geographic region, preventing application to responses defined on a network (stream network, water distribution system, highway system, etc.). Third, response distributions have been taken as discrete (specifically, binomial or Poisson). Finally, the traditional scan statistic gives only a cluster estimate for the hotspot and does not attempt to assess estimation uncertainty (Patil 2006).

Figure 1. Scan statistic zonation for circles (left) and space-time cylinders (right).
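As a small worked example of the relative risk calculation in this section, the sketch below computes the expected number of cases and the relative risk for one cluster. The function names and the numbers are hypothetical and serve only to illustrate the arithmetic.

```python
def expected_cases(cluster_population, total_cases, total_population):
    """Expected cases in a cluster: E[c] = p * C / P."""
    return cluster_population * total_cases / total_population


def relative_risk(observed_cases, cluster_population, total_cases, total_population):
    """Relative risk: observed cases divided by expected cases."""
    expected = expected_cases(cluster_population, total_cases, total_population)
    return observed_cases / expected


# Hypothetical numbers: a cluster of 1,000 people in a region of 10,000
# people with 50 cases overall, 10 of which fall inside the cluster.
rr = relative_risk(observed_cases=10, cluster_population=1000,
                   total_cases=50, total_population=10000)
print(rr)
```

With these numbers the expected count is 5 cases, so 10 observed cases give a relative risk of 2, and the cluster would qualify as a hotspot candidate because its relative risk exceeds one.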

2.7. Upper Level Set (ULS) Scan Statistics