nodes are typically singleton vertices at which the response rate is a local
maximum; the root node consists of all vertices in the abstract graph.
Finding the connected components of an upper level set amounts to determining the transitive closure of the adjacency relation defined by the
edges of the graph. Several generic algorithms for this task are available in the computer science literature.
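As one illustration of such a generic algorithm, the sketch below uses a union-find (disjoint-set) structure to extract the connected components of an upper level set. It assumes the graph is supplied as a dictionary of response rates per vertex plus a list of undirected edges; the function name `upper_level_set_components` and the toy rates are hypothetical and are not taken from the paper.

```python
def upper_level_set_components(rates, edges, level):
    """Connected components of the vertices whose rate is >= level."""
    # Vertices belonging to the upper level set at this threshold.
    members = {v for v, r in rates.items() if r >= level}
    parent = {v: v for v in members}

    def find(v):
        # Walk to the root, halving the path along the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        # Only edges with both endpoints inside the level set matter.
        if a in members and b in members:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb

    components = {}
    for v in members:
        components.setdefault(find(v), set()).add(v)
    return sorted(sorted(c) for c in components.values())


# Toy example (made-up rates on the path graph a-b-c-d-e).
rates = {"a": 0.9, "b": 0.2, "c": 0.7, "d": 0.8, "e": 0.1}
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
components_high = upper_level_set_components(rates, edges, 0.5)
components_all = upper_level_set_components(rates, edges, 0.0)
```

Lowering the level merges components, which is the behaviour the ULS tree records: at level 0.5 the vertices split into two components, while at level 0.0 the whole graph forms the single root component.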
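Figure 3. A confidence set of hotspots on the ULS tree. The different connected components correspond to different hotspot loci, while the nodes within
a connected component correspond to different delineations of that hotspot. Figure labels: MLE Junction Node; Alternative Hotspot Locus; Alternative Hotspot Delineation; Tessellated Region R.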
2.8. Bernoulli Models
Following Kulldorff (1997), let $X$ denote a spatial point process, where $X_A$ is the random number of cases in the set $A \subset G$. As the window moves over the study area, it defines a collection $\mathcal{Z}$ of zones $Z \subset G$. Interchangeably, $Z$ will be used to denote both a subset of $G$ and a set of parameters defining the zone. For the Bernoulli model we consider only measures $N$ such that $N_A$ is an integer for all subsets $A \subset G$. Each unit of measure corresponds to an 'entity' or 'individual' that can be in either one of two states. Individuals in one of these states are counted as cases, and the locations of those individuals constitute the point process. In the model there is exactly one zone $Z \subset G$ such that each individual within that zone has probability $p_1$ of being a case, while the probability for individuals outside the zone is $p_2$. The probability for any one individual is independent of all the others. The null hypothesis is $H_0: p_1 = p_2$ and the alternative hypothesis is $H_1: p_1 > p_2$, $Z \in \mathcal{Z}$. Under $H_0$, $X_A \sim \mathrm{Bin}(N_A, p_1)$ for all $A$. Under $H_1$, $X_A \sim \mathrm{Bin}(N_A, p_1)$ for all $A \subset Z$, and $X_A \sim \mathrm{Bin}(N_A, p_2)$ for all $A \subset Z^c$. Let $n_Z$ denote the observed number of cases in zone $Z$ and $n_G$ the total number of cases; $N_Z$ is the total population in zone $Z$, while $N_G$ is the total population. Hence the likelihood function for the Bernoulli model is expressed as

$$L(Z, p_1, p_2) = p_1^{n_Z} (1 - p_1)^{N_Z - n_Z} \, p_2^{n_G - n_Z} (1 - p_2)^{(N_G - N_Z) - (n_G - n_Z)}.$$

To detect the zone that is most likely to be a cluster, we find the zone that maximizes the likelihood function. We do this in two steps. First we maximize the likelihood function conditioned on $Z$:

$$L(Z) = \sup_{p_1 > p_2} L(Z, p_1, p_2) = \begin{cases} \left(\dfrac{n_Z}{N_Z}\right)^{n_Z} \left(1 - \dfrac{n_Z}{N_Z}\right)^{N_Z - n_Z} \left(\dfrac{n_G - n_Z}{N_G - N_Z}\right)^{n_G - n_Z} \left(1 - \dfrac{n_G - n_Z}{N_G - N_Z}\right)^{(N_G - N_Z) - (n_G - n_Z)} & \text{if } \dfrac{n_Z}{N_Z} > \dfrac{n_G - n_Z}{N_G - N_Z}, \\[2ex] \left(\dfrac{n_G}{N_G}\right)^{n_G} \left(\dfrac{N_G - n_G}{N_G}\right)^{N_G - n_G} & \text{otherwise.} \end{cases}$$
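The maximization over the two probabilities for a fixed zone can be checked directly. The following is a short derivation sketch, assuming the likelihood is the product of two binomial kernels implied by the model, with $p_1$ and $p_2$ used here as labels for the inside-zone and outside-zone probabilities:

```latex
\begin{align*}
\log L(Z, p_1, p_2) &= n_Z \log p_1 + (N_Z - n_Z)\log(1 - p_1) \\
  &\quad + (n_G - n_Z)\log p_2 + \bigl[(N_G - N_Z) - (n_G - n_Z)\bigr]\log(1 - p_2), \\
\frac{\partial \log L}{\partial p_1} &= \frac{n_Z}{p_1} - \frac{N_Z - n_Z}{1 - p_1} = 0
  \;\Longrightarrow\; \hat{p}_1 = \frac{n_Z}{N_Z}, \\
\frac{\partial \log L}{\partial p_2} &= \frac{n_G - n_Z}{p_2} - \frac{(N_G - N_Z) - (n_G - n_Z)}{1 - p_2} = 0
  \;\Longrightarrow\; \hat{p}_2 = \frac{n_G - n_Z}{N_G - N_Z}.
\end{align*}
```

When $\hat{p}_1 > \hat{p}_2$, these unconstrained estimates satisfy the alternative hypothesis and can be substituted into the likelihood. Otherwise the supremum over $p_1 > p_2$ is approached on the boundary $p_1 = p_2$, where the common estimate is $n_G / N_G$.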
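Next we find the solution $\hat{Z} = \{Z : L(Z) \ge L(Z') \;\; \forall Z' \in \mathcal{Z}\}$. We are also interested in making statistical inference. Let

$$L_0 = \sup_{p_1 = p_2} L(Z, p_1, p_2) = \left(\frac{n_G}{N_G}\right)^{n_G} \left(\frac{N_G - n_G}{N_G}\right)^{N_G - n_G},$$

and the likelihood ratio test statistic can be written as

$$\lambda = \frac{\sup_{Z \in \mathcal{Z},\, p_1 > p_2} L(Z, p_1, p_2)}{\sup_{p_1 = p_2} L(Z, p_1, p_2)} = \frac{L(\hat{Z})}{L_0}.$$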
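To make the two-step maximization concrete, here is a minimal Python sketch that evaluates the log of the likelihood ratio for each candidate zone and picks the maximizer. It assumes the binomial product form of the Bernoulli likelihood (following Kulldorff 1997) and works on the log scale to avoid underflow. The function names and the toy counts are hypothetical, and the candidate zones are simply supplied as a list of (cases, population) pairs rather than generated by a scanning window.

```python
import math


def _xlogy(x, y):
    # x * log(y) with the convention 0 * log(0) = 0.
    return 0.0 if x == 0 else x * math.log(y)


def binomial_loglik(n, N):
    """Log-likelihood of n cases among N individuals at the MLE p = n / N."""
    p = n / N
    return _xlogy(n, p) + _xlogy(N - n, 1.0 - p)


def log_likelihood_ratio(n_z, N_z, n_g, N_g):
    """log(L(Z) / L0) for one zone; 0 when the zone rate is not elevated.

    Assumes Z is a proper, non-empty subset, so 0 < N_z < N_g.
    """
    n_out, N_out = n_g - n_z, N_g - N_z
    if n_z / N_z <= n_out / N_out:
        return 0.0  # constrained maximum lies on p1 = p2, so L(Z) = L0
    return (binomial_loglik(n_z, N_z) + binomial_loglik(n_out, N_out)
            - binomial_loglik(n_g, N_g))


def most_likely_zone(zones, n_g, N_g):
    """Return (index, log LR) of the zone maximizing the likelihood ratio."""
    scores = [log_likelihood_ratio(n, N, n_g, N_g) for n, N in zones]
    best = max(range(len(zones)), key=scores.__getitem__)
    return best, scores[best]


# Toy data (made up): 30 cases among 1000 individuals overall.
zones = [(3, 100), (15, 100), (2, 200)]  # (cases, population) per zone
best_index, best_llr = most_likely_zone(zones, n_g=30, N_g=1000)
```

In this toy input only the second zone has an inside rate above its outside rate, so it is the only candidate with a positive log likelihood ratio and is returned as the most likely cluster.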
To find the value of the test statistic, we need a way to calculate the likelihood ratio as it is maximized over the collection of zones in the alternative
hypothesis. This might seem like a daunting task, since the number of zones could easily be infinite. Two properties allow us to reduce it to a finite problem: the
number of observed cases is always finite, and for a fixed number of cases the likelihood decreases as the measure of the moving window increases.
2.9. Monte Carlo-Based Hypothesis Testing