expected number of cases given the null hypothesis, and 0 otherwise (Kulldorff 2010). The process of scan statistic hotspot detection is shown in Figure 13. A circle is centered on a district, and its radius grows until it reaches the center of the next closest district, which yields a new circle. With each district added, the radius of the circle, the number of cases, and the population size increase. The circle with the highest likelihood ratio value is identified as a potential hotspot, and an associated p-value is computed by Monte Carlo simulation. For each data set simulated under the null hypothesis, the likelihood ratio statistic is computed, and the actual value is compared with the set of simulated values to obtain the significance probability. The p-value is determined as follows. Suppose x = 9999 data sets are simulated under the null hypothesis. If the likelihood ratio value of a circle from the actual data has rank 400, then its p-value is 400/(x + 1) = 400/10000 = 0.04. The method produces a set of hotspots, the relative risk of the event for each hotspot, and a corresponding p-value for each hotspot based on the Monte Carlo simulations. Districts or cells are identified as hotspot clusters if they are associated with a cluster whose p-value is less than 0.05 (Aamodt, Samuelsen, and Skrondal 2006).
Figure 13 Part of the circle-based hotspot detection process
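The rank-based p-value described above can be illustrated with a minimal sketch. The function name and the toy list of simulated values are hypothetical, and counting ties against the observed value is an assumption of this sketch rather than something stated in the text.

```python
def monte_carlo_p_value(observed, simulated):
    """Rank of the observed statistic among all values, divided by x + 1."""
    # Rank 1 means the observed value is larger than every simulated value.
    # Ties are counted against the observed value (an assumption of this sketch).
    rank = 1 + sum(1 for value in simulated if value >= observed)
    return rank / (len(simulated) + 1)


# Toy data: x = 9999 simulated statistics, 399 of which exceed the observed one,
# so the observed likelihood ratio has rank 400.
simulated = [10.0] * 399 + [1.0] * 9600
p = monte_carlo_p_value(5.0, simulated)
print(p)  # 0.04
```

With x = 9999 the smallest attainable p-value is 1/10000, which is one reason the number of replicates is usually chosen so that x + 1 is a round number.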
3.2.4 ULS Scan Statistic
The ULS scan statistic is another method for detecting hotspots of interest in a study area. The method works as follows. The spatial scan statistic seeks to identify “hotspots” or “clusters” of cells that have an elevated response compared with the rest of the region. Elevated response means large values for the rates or intensities,

$$G_a = Y_a / A_a,$$

where $Y_a$ is the number of cases in cell $a$. For the Poisson model, $A_a$ is the size (perhaps area or some adjusted population size) of cell $a$, and $Y_a$ is a realization of a Poisson process with intensity $\lambda_a$ across the cell. In each setting, the responses $Y_a$ are independent; it is assumed that spatial variability can be accounted for by cell-to-cell variation in the model parameters. Statistically, this is written as $Y_a \sim \text{Poisson}(\lambda_a A_a)$, where $A_a$ is a positive real number and $\lambda_a > 0$ is an unknown parameter attached to cell $a$.
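As an illustration of the Poisson model, the following sketch draws independent counts with mean equal to intensity times cell size and converts them to rates. The cell names, sizes, and intensities are hypothetical; taking the rate of a cell to be its count divided by its size is an assumption of this sketch, and the sampler is Knuth's multiplication method, which is only suitable for small means.

```python
import math
import random


def poisson_sample(mean, rng):
    """Draw one Poisson variate with Knuth's multiplication method."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    # Multiply uniforms until the running product drops to exp(-mean) or below.
    while product > limit:
        count += 1
        product *= rng.random()
    return count


rng = random.Random(0)
sizes = {"cell_1": 50.0, "cell_2": 80.0, "cell_3": 120.0}        # A_a (hypothetical)
intensities = {"cell_1": 0.10, "cell_2": 0.05, "cell_3": 0.02}   # lambda_a (hypothetical)

# Independent responses: Y_a ~ Poisson(lambda_a * A_a).
counts = {a: poisson_sample(intensities[a] * sizes[a], rng) for a in sizes}
# Observed rate of each cell: count divided by size.
rates = {a: counts[a] / sizes[a] for a in sizes}
```

Because the counts are random, the observed rates scatter around the underlying intensities, which is why a significance test is needed before a high-rate zone is declared a hotspot.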
Several geometric properties should be satisfied by a collection of cells from the tessellation or study area before it can be considered as a candidate for a hotspot or cluster (Patil and Taillie 2004).
(1) The union of the cells should comprise a geographically connected subset of the region R. Such collections of cells are referred to as zones, and the set of all zones is denoted by $\Omega$. A zone $Z \in \Omega$ is thus a collection of cells that are connected.
(2) The zone should not be excessively large. This restriction is generally achieved by limiting the search for hotspots to zones that do not comprise more than fifty percent of the region.
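The two conditions above can be checked with a short sketch. The 2 x 3 grid of cells and the function names are hypothetical, and the size of a zone is measured here by its number of cells, which is an assumption; the text does not say whether the fifty percent limit refers to cells, area, or population.

```python
from collections import deque


def is_connected(cells, neighbors):
    """True if `cells` forms a single connected piece under `neighbors`."""
    cells = set(cells)
    if not cells:
        return False
    start = next(iter(cells))
    seen = {start}
    queue = deque([start])
    # Breadth-first search restricted to the cells of the candidate zone.
    while queue:
        current = queue.popleft()
        for other in neighbors[current]:
            if other in cells and other not in seen:
                seen.add(other)
                queue.append(other)
    return seen == cells


def is_candidate_zone(cells, neighbors, max_fraction=0.5):
    """Condition (1), connectivity, and condition (2), the size limit."""
    not_too_large = len(set(cells)) <= max_fraction * len(neighbors)
    return is_connected(cells, neighbors) and not_too_large


# Hypothetical tessellation laid out as
#   1 2 3
#   4 5 6
# with cells adjacent when they share an edge.
neighbors = {1: {2, 4}, 2: {1, 3, 5}, 3: {2, 6}, 4: {1, 5}, 5: {2, 4, 6}, 6: {3, 5}}

print(is_candidate_zone({1, 2, 3}, neighbors))     # connected, half of the region
print(is_candidate_zone({1, 3}, neighbors))        # not connected
print(is_candidate_zone({1, 2, 3, 4}, neighbors))  # connected but too large
```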
The traditional spatial scan statistic described previously uses expanding circles to determine a reduced list $\Omega_0$ of candidate zones $Z$. By their very construction, these candidate zones tend to be compact in shape and may do a poor job of approximating actual clusters. The circular scan statistic has a reduced parameter space that is determined entirely by the geometry of the tessellation and does not involve the data in any way. The ULS scan statistic takes an adaptive point of view in which $\Omega_0$ depends very much upon the data. The adjusted rates define a piecewise constant surface over the tessellation, and the reduced parameter space $\Omega_0 = \Omega_{ULS}$ consists of all connected components of all upper level sets (ULS) of this surface. The cardinality of $\Omega_{ULS}$ does not exceed the number of cells in the tessellation. Furthermore, $\Omega_{ULS}$ has the structure of a tree under set inclusion, which is useful for visualization purposes and for expressing the uncertainty of cluster determination in the form of a hotspot confidence set on the tree. Since $\Omega_{ULS}$ is data-dependent, this reduced parameter space must be recomputed for each replicate data set when simulating null distributions (Patil and Taillie 2004). Special features of the ULS connectivity problem permit enhanced efficiency in this recomputation.
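For contrast with the data-dependent ULS zones, the geometry-only candidate list of the circular scan can be sketched as follows. The centroid coordinates and the function name are hypothetical, ties in distance are broken by cell label, and the fifty percent limit is applied to the number of cells; all three are assumptions of this sketch.

```python
import math


def circular_zones(centroids, max_fraction=0.5):
    """Nested nearest-neighbor zones around each cell centroid."""
    limit = int(max_fraction * len(centroids))
    zones = set()
    for center, (cx, cy) in centroids.items():
        # Order every cell by its distance from this center; the center comes first.
        ordered = sorted(
            centroids,
            key=lambda cell: (
                math.hypot(centroids[cell][0] - cx, centroids[cell][1] - cy),
                cell,
            ),
        )
        # Growing the circle adds one cell at a time, up to the size limit.
        for size in range(1, limit + 1):
            zones.add(frozenset(ordered[:size]))
    return zones


# Four hypothetical cells whose centroids lie on a line.
centroids = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (2.0, 0.0), "d": (3.5, 0.0)}
zones = circular_zones(centroids)
```

No case counts enter the computation, so the same list of zones is reused for the observed data and for every simulated replicate, whereas the ULS list has to be rebuilt each time.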
Cell adjacency is represented by a zero-one adjacency matrix $M$ whose rows and columns are labeled with the cells of the tessellation. Entry $M_{ab}$ equals 1 if cells $a$ and $b$ are the same or are adjacent in the tessellation; otherwise, $M_{ab}$ equals 0. The row and column labels (the cells) are arranged in order of decreasing intensity, so that the adjacency matrix for any upper level set is a square submatrix in the northwest corner of the full adjacency matrix $M$. This reordering of the rows and columns of $M$ is the only data-dependent part of the algorithm. As the level drops, cells are added one after another, and one has to keep track of how the connectivity changes with each addition of a cell (Patil and Taillie 2004). Figure 14 shows a map and its adjacency matrix, with cells ordered by decreasing intensity.
[Image: map of cells 1 to 12 and their adjacency matrix; rows and columns are ordered by decreasing intensity, from cell 10 (0.160) to cell 6 (0.043).]
Figure 14 A map and its adjacency matrix
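The reordering of the adjacency matrix can be illustrated on a small example. The four cells, their rates, and their neighbor lists below are hypothetical and are not the cells of Figure 14; the function names are likewise illustrative.

```python
def adjacency_matrix(order, neighbors):
    """Zero-one matrix with entry 1 when two cells are equal or adjacent."""
    return [
        [1 if a == b or b in neighbors[a] else 0 for b in order]
        for a in order
    ]


def upper_level_submatrix(matrix, k):
    """Northwest k-by-k corner: adjacency among the k highest-rate cells."""
    return [row[:k] for row in matrix[:k]]


# Hypothetical 2 x 2 tessellation laid out as
#   p q
#   r s
# with cells adjacent when they share an edge.
rates = {"p": 0.02, "q": 0.09, "r": 0.05, "s": 0.07}
neighbors = {"p": {"q", "r"}, "q": {"p", "s"}, "r": {"p", "s"}, "s": {"q", "r"}}

# The only data-dependent step: label rows and columns by decreasing rate.
order = sorted(rates, key=rates.get, reverse=True)
M = adjacency_matrix(order, neighbors)
top_three = upper_level_submatrix(M, 3)
```

Here the order is q, s, r, p, and the 3-by-3 corner describes the upper level set {q, s, r} without any further rearrangement of the matrix.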
The steps of ULS hotspot detection, based on the adjacency matrix in Figure 14, are shown in Figure 15. Parts a, b, and c of Figure 15 show the hotspot growing as cells are added in order of decreasing intensity, listed in column $G_a$. In map d, the new cell (cell 3) is not adjacent to the existing hotspot, so there are two hotspots. These two hotspots are merged when the next cell (cell 4) is added, because a single new cell can join two hotspots only if it is adjacent to both. Map e shows the resulting single connected hotspot, which is the result of ULS hotspot detection.
[Image: maps a to e showing the hotspot at each step]

Cell   G_a     Map   Hotspot detection process
10     0.160   a     {10}
11     0.133   b     {10, 11}
8      0.120   c     {10, 11, 8}
3      0.084   d     {10, 11, 8}, {3}
4      0.082   e     {10, 11, 8, 3, 4}
Figure 15 ULS hotspot detection process (dark color indicates the hotspot)
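The process in Figure 15 can be traced with a small union-find sketch. The rates are the five values listed for Figure 15, but the neighbor lists are hypothetical: they are one adjacency pattern chosen to be consistent with the steps described, not the actual adjacency of Figure 14. The function name is also illustrative, and this is a plain sketch rather than the optimized algorithm of Patil and Taillie.

```python
def uls_trace(rates, neighbors):
    """Add cells by decreasing rate; record the connected components after each step."""
    parent = {}

    def find(cell):
        # Follow parent links to the component representative, halving the path.
        while parent[cell] != cell:
            parent[cell] = parent[parent[cell]]
            cell = parent[cell]
        return cell

    steps = []
    for cell in sorted(rates, key=rates.get, reverse=True):
        parent[cell] = cell
        for other in neighbors[cell]:
            if other in parent:  # neighbor is already in the upper level set
                parent[find(other)] = find(cell)
        groups = {}
        for member in parent:
            groups.setdefault(find(member), set()).add(member)
        # List components starting with the one that holds the highest rate.
        ordered = sorted(groups.values(), key=lambda g: -max(rates[c] for c in g))
        steps.append((cell, ordered))
    return steps


rates = {10: 0.160, 11: 0.133, 8: 0.120, 3: 0.084, 4: 0.082}
# Hypothetical adjacency, restricted to these five cells.
neighbors = {10: {11}, 11: {10, 8}, 8: {11, 4}, 3: {4}, 4: {3, 8}}

steps = uls_trace(rates, neighbors)
for cell, components in steps:
    print(cell, components)
```

Under this assumed adjacency, the recorded components reproduce the last column of the table in Figure 15. Across all steps there are five distinct zones, one per cell added, which agrees with the bound on the number of candidate zones.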
3.3 The Methods