ULS Scan Statistic Theoretical Study

expected number of cases given the null hypothesis and 0 otherwise (Kulldorff 2010).

The process of scan statistic hotspot detection is shown in Figure 13. A circle grows when the center of the closest district is merged into it, which produces a new, larger circle. With each district added, the radius of the circle, the number of cases, and the population size all increase. The circle with the highest likelihood ratio value is identified as a potential hotspot.

An associated p-value is computed by Monte Carlo simulation. For each data set simulated under the null hypothesis, the likelihood ratio statistic is computed, and the actual value is compared with the set of simulated values to find the significance probability. The p-value is determined as follows. Suppose x = 9999 data sets are simulated under the null hypothesis. If the likelihood ratio value of a circle from the actual data has rank 400 among the x + 1 = 10000 values, ranked from largest to smallest, then its p-value is 400/(x + 1) = 400/10000 = 0.04.

The method produces a set of hotspots, the relative risk of the event for each hotspot, and a corresponding p-value for each hotspot based on the Monte Carlo simulations. Districts or cells are identified as hotspot clusters if they belong to a cluster with a p-value less than 0.05 (Aamodt, Samuelsen, and Skrondal 2006).

Figure 13 Part of the circle-based hotspot detection process
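The rank-based p-value described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `monte_carlo_p_value` is hypothetical, the simulated values are synthetic numbers chosen so that the observed statistic has rank 400, and the rank is taken as one plus the number of simulated values that are at least as large as the observed one.

```python
def monte_carlo_p_value(observed_llr, simulated_llrs):
    """Rank-based Monte Carlo p-value: rank / (x + 1)."""
    x = len(simulated_llrs)
    # Rank of the observed statistic among all x + 1 values (1 = largest).
    rank = 1 + sum(1 for s in simulated_llrs if s >= observed_llr)
    return rank / (x + 1)


# Synthetic stand-in for x = 9999 simulated likelihood ratio values
# (hypothetical numbers, not the output of a real simulation).
simulated = list(range(1, 10000))   # 1, 2, ..., 9999
observed = 9600.5                   # exactly 399 simulated values exceed it

p_value = monte_carlo_p_value(observed, simulated)
print(p_value)  # 0.04
```

Counting the observed value as one of the x + 1 values means that the p-value can never be smaller than 1/(x + 1).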

3.2.4 ULS Scan Statistic

The ULS (upper level set) scan statistic is another method for detecting hotspots of interest in a study area. The method works as follows: the spatial scan statistic seeks to identify "hotspots" or "clusters" of cells that have an elevated response compared with the rest of the region. Elevated response means large values for the rates or intensities, $G_a = Y_a / A_a$, where $Y_a$ is the number of cases in cell $a$. For the Poisson model, $A_a$ is the size (perhaps area or some adjusted population size) of cell $a$, and $Y_a$ is a realization of a Poisson process with intensity $\lambda_a$ across the cell. The responses $Y_a$ are independent; it is assumed that spatial variability can be accounted for by cell-to-cell variation in the model parameters. Statistically, this is written as $Y_a \sim \mathrm{Poisson}(\lambda_a A_a)$, where $A_a$ is a positive real number and $\lambda_a > 0$ is an unknown parameter attached to cell $a$.

A collection of cells from the tessellation or study area must satisfy several geometric properties before it can be considered a candidate hotspot or cluster (Patil and Taillie 2004).

1. The union of the cells should comprise a geographically connected subset of the region $R$. Such collections of cells are referred to as zones, and the set of all zones is denoted by $\Omega$. A zone $Z \in \Omega$ is a collection of cells that are connected.
2. The zone should not be excessively large. This restriction is generally achieved by limiting the search for hotspots to zones that do not comprise more than fifty percent of the region.

The traditional spatial scan statistic uses expanding circles to determine a reduced list $\Omega_0$ of candidate zones $Z$. By their very construction, these candidate zones tend to be compact in shape and may do a poor job of approximating actual clusters. The circular scan statistic has a reduced parameter space that is determined entirely by the geometry of the tessellation and does not involve the data in any way.
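The two geometric conditions on a candidate zone can be illustrated with a short Python sketch. Everything in it is an assumption made for the example: the five-cell region, the case counts, the function names `is_connected` and `is_candidate_zone`, and in particular the choice to measure "fifty percent of the region" by the summed cell sizes. Intensities are computed as cases divided by cell size.

```python
from collections import deque


def is_connected(cells, adjacency):
    """Return True if `cells` forms one connected set under `adjacency`."""
    cells = set(cells)
    if not cells:
        return False
    start = next(iter(cells))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour in cells and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen == cells


def is_candidate_zone(cells, adjacency, sizes, max_fraction=0.5):
    """Check connectivity and the size limit (assumed: share of total size)."""
    total = sum(sizes.values())
    zone_size = sum(sizes[c] for c in cells)
    return is_connected(cells, adjacency) and zone_size <= max_fraction * total


# Hypothetical region of five cells laid out in a row: 1 - 2 - 3 - 4 - 5.
adjacency = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
sizes = {1: 100.0, 2: 100.0, 3: 100.0, 4: 100.0, 5: 100.0}  # cell sizes A_a
cases = {1: 4, 2: 9, 3: 12, 4: 3, 5: 2}                      # case counts Y_a

# Intensity of each cell: cases divided by size.
intensity = {a: cases[a] / sizes[a] for a in sizes}

print(is_candidate_zone({2, 3}, adjacency, sizes))     # True: connected, 40% of region
print(is_candidate_zone({1, 3}, adjacency, sizes))     # False: not connected
print(is_candidate_zone({2, 3, 4}, adjacency, sizes))  # False: 60% of region
```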
The ULS scan statistic takes an adaptive point of view in which the reduced parameter space $\Omega_0$ depends very much upon the data. The adjusted rates define a piecewise constant surface over the tessellation, and the reduced parameter space $\Omega_0 = \Omega_{ULS}$ consists of all connected components of all upper level sets (ULS) of this surface. The cardinality of $\Omega_{ULS}$ does not exceed the number of cells in the tessellation. Furthermore, $\Omega_{ULS}$ has the structure of a tree under set inclusion, which is useful for visualization and for expressing the uncertainty of cluster determination in the form of a hotspot confidence set on the tree. Since $\Omega_{ULS}$ is data-dependent, this reduced parameter space must be recomputed for each replicate data set when simulating null distributions (Patil and Taillie 2004).

Special features of the ULS connectivity problem permit enhanced efficiency. Cell adjacency is represented by a zero-one adjacency matrix $M$ whose rows and columns are labeled with the cells of the tessellation. Entry $M_{ab}$ equals 1 if cells $a$ and $b$ are the same or are adjacent in the tessellation; otherwise, $M_{ab}$ equals 0. The row and column labels (the cells) are arranged in order of decreasing intensity, so that the adjacency matrix for any upper level set is a square submatrix in the northwest corner of the full adjacency matrix $M$. This reordering of the rows and columns of $M$ is the only data-dependent part of the algorithm. As the level drops, cells are added one after another, and one has to keep track of how the connectivity changes with each addition of a cell (Patil and Taillie 2004). Figure 14 shows a map and its adjacency matrix with cells ordered by decreasing intensity.
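The reordering step can be sketched in Python as follows. The four-cell tessellation, its intensities, and the function names `reorder_by_intensity` and `upper_level_submatrix` are hypothetical and serve only to show that, once rows and columns are sorted by decreasing intensity, the adjacency matrix of the k highest-intensity cells is the k-by-k block in the northwest corner.

```python
def reorder_by_intensity(labels, matrix, intensity):
    """Reorder rows and columns of `matrix` by decreasing intensity."""
    order = sorted(range(len(labels)),
                   key=lambda i: intensity[labels[i]],
                   reverse=True)
    new_labels = [labels[i] for i in order]
    new_matrix = [[matrix[i][j] for j in order] for i in order]
    return new_labels, new_matrix


def upper_level_submatrix(matrix, k):
    """Adjacency matrix of the k highest-intensity cells (northwest corner)."""
    return [row[:k] for row in matrix[:k]]


# Hypothetical tessellation of four cells in a row: A - B - C - D.
labels = ["A", "B", "C", "D"]
matrix = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]
intensity = {"A": 0.02, "B": 0.09, "C": 0.05, "D": 0.07}

ordered_labels, ordered_matrix = reorder_by_intensity(labels, matrix, intensity)
print(ordered_labels)                            # ['B', 'D', 'C', 'A']
print(upper_level_submatrix(ordered_matrix, 2))  # [[1, 0], [0, 1]]
print(upper_level_submatrix(ordered_matrix, 3))  # [[1, 0, 1], [0, 1, 1], [1, 1, 1]]
```

In this example the two highest-intensity cells, B and D, are not adjacent, so the upper level set {B, D} has two connected components; adding C joins them.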
Figure 14 A map and its adjacency matrix. Rows and columns are ordered by decreasing intensity: the column labels read 10, 11, 8, 3, 4, 1, 2, 7, 9, 12, 5, 6, and the intensity column reads 0.160, 0.133, 0.120, 0.084, 0.082, 0.081, 0.077, 0.070, 0.069, 0.052, 0.048, 0.047, 0.043.

Figure 15 shows the steps of ULS hotspot detection based on the adjacency matrix in Figure 14. Panels a, b, and c of Figure 15 show the hotspot growing as cells are added in order of decreasing intensity, as listed in column $G_a$. In panel d, the new cell (cell 3) is not adjacent to the earlier hotspot, which means that there are now two hotspots. In panel e, the next cell in order of intensity (cell 4) joins these two hotspots, and the resulting single hotspot is the result of ULS hotspot detection.

Panel  Cell  G_a    Hotspot detection process
a      10    0.160  {10}
b      11    0.133  {10, 11}
c      8     0.120  {10, 11, 8}
d      3     0.084  {10, 11, 8}, {3}
e      4     0.082  {10, 11, 8, 3, 4}

Figure 15 ULS hotspot detection process (dark color marks the hotspot)
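The sequence in Figure 15 can be reproduced with a small Python sketch that adds cells in order of decreasing intensity and tracks connected components. Two assumptions should be noted. First, the connectivity bookkeeping uses a union-find structure, which is one common way to do this and is not necessarily the procedure used by Patil and Taillie. Second, the adjacency lists below are hypothetical: they were chosen only so that the five highest-intensity cells behave as in Figure 15, and they are not read from the adjacency matrix in Figure 14. The function name `uls_components` is likewise made up for the example.

```python
def uls_components(order, adjacency):
    """Add cells in decreasing-intensity order and record the connected
    components of each successive upper level set."""
    parent = {}

    def find(cell):
        # Follow parent links to the root, shortening the path on the way.
        while parent[cell] != cell:
            parent[cell] = parent[parent[cell]]
            cell = parent[cell]
        return cell

    history = []
    for cell in order:
        parent[cell] = cell
        for neighbour in adjacency.get(cell, ()):
            if neighbour in parent:  # neighbour is already in the level set
                parent[find(neighbour)] = find(cell)
        # Group the cells added so far by their root.
        groups = {}
        for member in parent:
            groups.setdefault(find(member), []).append(member)
        history.append(sorted(groups.values(), key=lambda g: order.index(g[0])))
    return history


# Cells in order of decreasing intensity, as in Figure 15.
order = [10, 11, 8, 3, 4]
# Hypothetical adjacency among these five cells (an assumption).
adjacency = {10: {11, 8}, 11: {10, 8}, 8: {10, 11, 4}, 3: {4}, 4: {8, 3}}

history = uls_components(order, adjacency)
for panel, components in zip("abcde", history):
    print(panel, components)
# a [[10]]
# b [[10, 11]]
# c [[10, 11, 8]]
# d [[10, 11, 8], [3]]
# e [[10, 11, 8, 3, 4]]
```

Each step either starts a new component (panel d) or merges existing ones (panel e), which is what gives the set of candidate zones its tree structure under set inclusion.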

3.3 The Methods