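and the alternative hypothesis is $H_1: p_1 > p_2$, $Z \in \mathcal{Z}$. Under $H_0$, $X_A \sim \mathrm{Bin}(N_A, p_1)$ $\forall A$. Under $H_1$, $X_A \sim \mathrm{Bin}(N_A, p_1)$ $\forall A \subset Z$, and $X_A \sim \mathrm{Bin}(N_A, p_2)$ $\forall A \subset Z^c$. Let $n_Z$ denote the observed number of cases in zone $Z$ and $n_G$ the total number of cases. $N_Z$ is the total population in zone $Z$, while $N_G$ is the total population. Hence the likelihood function for the Bernoulli model is expressed as

$$L(Z, p_1, p_2) = p_1^{n_Z} (1 - p_1)^{N_Z - n_Z} \, p_2^{n_G - n_Z} (1 - p_2)^{(N_G - N_Z) - (n_G - n_Z)}.$$
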
To detect the zone that is most likely to be a cluster, we find the zone that maximizes the likelihood function. We do this in two steps. First we maximize the
likelihood function conditioned on Z.
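$$L(Z) = \sup_{p_1 > p_2} L(Z, p_1, p_2) = \left(\frac{n_Z}{N_Z}\right)^{n_Z} \left(1 - \frac{n_Z}{N_Z}\right)^{N_Z - n_Z} \left(\frac{n_G - n_Z}{N_G - N_Z}\right)^{n_G - n_Z} \left(1 - \frac{n_G - n_Z}{N_G - N_Z}\right)^{(N_G - N_Z) - (n_G - n_Z)}$$

if $\frac{n_Z}{N_Z} > \frac{n_G - n_Z}{N_G - N_Z}$, and otherwise

$$L(Z) = \left(\frac{n_G}{N_G}\right)^{n_G} \left(1 - \frac{n_G}{N_G}\right)^{N_G - n_G}.$$
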
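Next we find the solution $\hat{Z} = \{Z : L(Z) \ge L(Z') \ \forall Z' \in \mathcal{Z}\}$. We are also interested in making statistical inference. Let

$$L_0 = \sup_{p_1 = p_2} L(Z, p_1, p_2) = \frac{n_G^{n_G} (N_G - n_G)^{N_G - n_G}}{N_G^{N_G}},$$

and the likelihood ratio test statistic can be written as

$$\lambda = \frac{\sup_{Z \in \mathcal{Z},\, p_1 > p_2} L(Z, p_1, p_2)}{\sup_{p_1 = p_2} L(Z, p_1, p_2)} = \frac{L(\hat{Z})}{L_0}.$$
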
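As a rough illustration, the log of the likelihood ratio for a single zone can be computed directly from the four counts defined above. The sketch below assumes the standard Bernoulli-model form, in which the rates inside and outside the zone are estimated by cases divided by population, and it returns 0 when the rate inside the zone does not exceed the rate outside. The function and argument names are illustrative and do not come from the text.

```python
import math


def _xlogy(x, y):
    """Return x * log(y), using the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def binomial_max_log_likelihood(cases, population):
    """Log-likelihood of `cases` out of `population` at the estimated rate."""
    if population == 0:
        return 0.0
    rate = cases / population
    return _xlogy(cases, rate) + _xlogy(population - cases, 1.0 - rate)


def log_likelihood_ratio(n_z, N_z, n_g, N_g):
    """log(L(Z) / L_0) for one zone Z under the Bernoulli model.

    n_z, N_z: cases and population inside the zone.
    n_g, N_g: total cases and total population.
    """
    n_out, N_out = n_g - n_z, N_g - N_z
    if N_z == 0 or N_out == 0:
        return 0.0
    # Only zones with a higher rate inside than outside count as clusters.
    if n_z / N_z <= n_out / N_out:
        return 0.0
    log_l_z = (binomial_max_log_likelihood(n_z, N_z)
               + binomial_max_log_likelihood(n_out, N_out))
    log_l_0 = binomial_max_log_likelihood(n_g, N_g)
    return log_l_z - log_l_0
```

Working on the log scale avoids numerical underflow when the populations are large. The most likely cluster is then the zone with the largest returned value.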
In order to find the value of the test statistic, we need a way to calculate the likelihood ratio as it is maximized over the collection of clusters in the alternative hypothesis. This might seem like a daunting task, since the number of clusters could easily be infinite. Two properties allow us to reduce it to a finite problem: the number of observed clusters is always finite, and for a fixed number of clusters the likelihood decreases as the measure of the moving window increases.
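One way to see why the search becomes finite is to enumerate the candidate windows explicitly. The sketch below assumes circular windows centred on each observed location, grown one nearest neighbour at a time, and capped at a fraction of the total population. The window shape, the cap, and all names are assumptions made for illustration and are not specified in the text.

```python
import math


def candidate_zones(coords, populations=None, max_fraction=0.5):
    """Enumerate distinct candidate zones as frozensets of location indices.

    coords: list of (x, y) tuples, one per location.
    populations: population per location (defaults to 1 each).
    max_fraction: largest share of the total population a zone may hold.
    """
    n = len(coords)
    if populations is None:
        populations = [1] * n
    total = sum(populations)
    zones = set()
    for centre in coords:
        # Order all locations by distance from the current centre.
        order = sorted(range(n), key=lambda j: math.dist(centre, coords[j]))
        members, pop = [], 0
        for j in order:
            pop += populations[j]
            if pop > max_fraction * total:
                break  # the window has grown past the allowed size
            members.append(j)
            zones.add(frozenset(members))
    return zones
```

With n locations this yields at most n² distinct zones, so the maximization can be carried out by direct evaluation. Locations at exactly equal distance from a centre are added one at a time here, which is a simplification.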
2.9. Monte Carlo-Based Hypothesis Testing
Simulation is an analytical method meant to imitate a real-life system, especially when other analyses are too mathematically complex or too difficult to reproduce (adithmc02 2008). Monte Carlo simulation can be defined as a method of generating random sample data from some known distribution for numerical experiments (Teknomo 2008). Once the value of the test statistic has been calculated, it is easy to do the inference. We cannot expect to find the distribution of the test statistic in closed analytical form, so we rely instead on Monte Carlo-based hypothesis testing.
With a Monte Carlo test, the significance of an observed test statistic calculated from a data set is assessed by comparing it with a distribution obtained by generating alternative data sets from some assumed model. If the assumed model implies that all data orderings are equally likely, then this amounts to a randomization distribution. According to Kulldorff (1997), Monte Carlo-based hypothesis testing was proposed by Dwass (1957), who pointed out that the probability of falsely rejecting the null hypothesis is exactly equal to the significance level, in spite of the simulation involved. Mantel (1967) proposed its use for spatial cluster processes, while Turnbull et al. (1990) were the first to use it in the context of a multidimensional scan statistic. Monte Carlo hypothesis testing for a scan statistic is a four-step procedure:
1. Calculate the value of the test statistic for the real data.
2. Create a large number of random data sets generated under the null hypothesis.
3. Calculate the value of the test statistic for each of the random replications.
4. Sort the values of the test statistic from the real and random data sets, and note the rank of the one calculated from the real data set. If it is ranked in the highest α percent, then reject the null hypothesis at the α percent significance level.
For example, when we condition on the total number of clusters $n_R$ and use 9999 such replications, the test is significant at the 5 percent level if the value of the test statistic for the real data set is among the 500 highest of the 10000 values obtained from the real data set and the replications.
The p-value is obtained through Monte Carlo hypothesis testing (Kulldorff 1997) by comparing the rank of the maximum likelihood from the real data set with the maximum likelihoods from the random data sets. If this rank is R, then p-value = R / (1 + number of simulations).
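The four steps and the rank-based p-value can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: null data sets are drawn by placing a fixed total number of cases among all individuals at random, and the test statistic is a simple stand-in (the highest local case rate) rather than the scan statistic itself. All function names and the example data are hypothetical.

```python
import random


def random_case_counts(populations, n_cases, rng):
    """Draw one null data set: n_cases placed among all individuals at random.

    Sampling without replacement keeps the total number of cases fixed.
    """
    individuals = [loc for loc, pop in enumerate(populations) for _ in range(pop)]
    counts = [0] * len(populations)
    for loc in rng.sample(individuals, n_cases):
        counts[loc] += 1
    return counts


def monte_carlo_p_value(observed_stat, simulate_stat, n_sim=9999, seed=0):
    """Return R / (1 + n_sim), where R is the rank of the observed statistic."""
    rng = random.Random(seed)
    simulated = [simulate_stat(rng) for _ in range(n_sim)]
    # Rank 1 means the observed value is the largest; ties count against it.
    rank = 1 + sum(stat >= observed_stat for stat in simulated)
    return rank / (1 + n_sim)


def highest_rate(counts, populations):
    """Stand-in test statistic (an assumption): the largest local case rate."""
    return max(c / p for c, p in zip(counts, populations))


if __name__ == "__main__":
    populations = [50, 40, 60, 50]  # hypothetical populations per location
    observed = [12, 3, 3, 2]        # hypothetical observed case counts
    p = monte_carlo_p_value(
        highest_rate(observed, populations),
        lambda rng: highest_rate(
            random_case_counts(populations, sum(observed), rng), populations
        ),
        n_sim=999,
    )
    print(f"p-value = {p:.3f}")
```

Counting ties against the observed value makes the test slightly conservative. In practice the stand-in statistic would be replaced by the maximized likelihood ratio over all candidate zones.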
2.10. Hotspot Evaluation