Monte Carlo-Based Hypothesis Testing

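and the alternative hypothesis is $H_1: p_1 > p_2,\ Z \in \mathcal{Z}$, where $\mathcal{Z}$ is the collection of candidate zones. Under $H_0$, $X_A \sim \mathrm{Bin}(N_A, p_1)\ \forall A$. Under $H_1$, $X_A \sim \mathrm{Bin}(N_A, p_1)\ \forall A \subset Z$, and $X_A \sim \mathrm{Bin}(N_A, p_2)\ \forall A \subset Z^c$. Let $n_Z$ denote the observed number of cases in zone $Z$ and $n_G$ the total number of cases. $N_Z$ is the total population in zone $Z$, while $N_G$ is the total population. Hence the likelihood function for the Bernoulli model is expressed as

$$L(Z, p_1, p_2) = p_1^{n_Z} (1 - p_1)^{N_Z - n_Z}\, p_2^{n_G - n_Z} (1 - p_2)^{(N_G - N_Z) - (n_G - n_Z)}.$$

To detect the zone that is most likely to be a cluster, we find the zone that maximizes the likelihood function. We do this in two steps. First we maximize the likelihood function conditioned on $Z$:

$$L(Z) = \sup_{p_1 > p_2} L(Z, p_1, p_2) =
\begin{cases}
\left(\dfrac{n_Z}{N_Z}\right)^{n_Z} \left(1 - \dfrac{n_Z}{N_Z}\right)^{N_Z - n_Z} \left(\dfrac{n_G - n_Z}{N_G - N_Z}\right)^{n_G - n_Z} \left(1 - \dfrac{n_G - n_Z}{N_G - N_Z}\right)^{(N_G - N_Z) - (n_G - n_Z)} & \text{if } \dfrac{n_Z}{N_Z} > \dfrac{n_G - n_Z}{N_G - N_Z}, \\[2ex]
\left(\dfrac{n_G}{N_G}\right)^{n_G} \left(\dfrac{N_G - n_G}{N_G}\right)^{N_G - n_G} & \text{otherwise.}
\end{cases}$$

Next we find the solution $\hat{Z} = \{Z : L(Z) \ge L(Z')\ \forall Z' \in \mathcal{Z}\}$. We are also interested in making statistical inference. Under the null hypothesis,

$$L_0 = \sup_{p_1 = p_2} L(Z, p_1, p_2) = \frac{n_G^{n_G} (N_G - n_G)^{N_G - n_G}}{N_G^{N_G}},$$

and the likelihood ratio test statistic can be written as

$$\lambda = \frac{\sup_{Z \in \mathcal{Z},\ p_1 > p_2} L(Z, p_1, p_2)}{\sup_{p_1 = p_2} L(Z, p_1, p_2)} = \frac{L(\hat{Z})}{L_0}.$$

To find the value of the test statistic, we need a way to calculate the likelihood ratio as it is maximized over the collection of candidate zones in the alternative hypothesis. This might seem like a daunting task, since the number of zones could easily be infinite. Two properties allow us to reduce it to a finite problem: the number of observed cases is always finite, and for a fixed number of cases inside the moving window, the likelihood decreases as the measure of the window increases.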
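
As a rough illustration of the two-step maximization described above, the following Python sketch evaluates the logarithm of the likelihood ratio for one zone and then picks the zone with the largest value. The function names (`bernoulli_llr`, `most_likely_zone`) and the dictionary layout for the zones are assumptions made for this example only; they do not come from the source or from any particular software package. Working on the log scale is a design choice to avoid numerical underflow for large populations.

```python
import math


def xlogy(x, y):
    """Return x * log(y), using the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def max_log_likelihood(n, N):
    """Maximized Bernoulli log-likelihood for n cases among N individuals."""
    p = n / N  # maximum likelihood estimate of the case probability
    return xlogy(n, p) + xlogy(N - n, 1 - p)


def bernoulli_llr(n_z, N_z, n_g, N_g):
    """Log of L(Z) / L0 for one zone; 0 if the rate inside is not higher."""
    n_out, N_out = n_g - n_z, N_g - N_z
    if N_z == 0 or N_out == 0:
        return 0.0  # degenerate zone: empty, or covering the whole region
    if n_z / N_z <= n_out / N_out:
        return 0.0  # "otherwise" branch: L(Z) equals L0
    log_l_z = max_log_likelihood(n_z, N_z) + max_log_likelihood(n_out, N_out)
    log_l_0 = max_log_likelihood(n_g, N_g)
    return log_l_z - log_l_0


def most_likely_zone(zones, n_g, N_g):
    """Return (zone name, log-likelihood ratio) maximizing over the zones.

    `zones` is a hypothetical mapping: name -> (cases in zone, population in zone).
    """
    best = max(zones, key=lambda z: bernoulli_llr(*zones[z], n_g, N_g))
    return best, bernoulli_llr(*zones[best], n_g, N_g)
```

For instance, with made-up counts, `most_likely_zone({"A": (5, 10), "B": (1, 50)}, 10, 100)` selects zone `"A"`, because only that zone has a higher case rate inside than outside.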

2.9. Monte Carlo-Based Hypothesis Testing

Simulation is an analytical method meant to imitate a real-life system, especially when other analyses are too mathematically complex or too difficult to reproduce (adithmc02 2008). Monte Carlo simulation can be defined as a method of generating random sample data from some known distribution for numerical experiments (Teknomo 2008). Once the value of the test statistic has been calculated, it is easy to do the inference. We cannot expect to find the distribution of the test statistic in closed analytical form. Instead we rely on Monte Carlo-based hypothesis testing to test the hypothesis.

With a Monte Carlo test, the significance of an observed test statistic calculated from a set of data is assessed by comparing it with a distribution obtained by generating alternative sets of data from some assumed model. If the assumed model implies that all data orderings are equally likely, then this amounts to a randomization distribution. As noted in Kulldorff (1997), Monte Carlo-based hypothesis testing was proposed by Dwass (1957), who pointed out that the probability of falsely rejecting the null hypothesis is exactly equal to the significance level, in spite of the simulation involved. Mantel (1967) proposed its use for spatial cluster processes, while Turnbull et al. (1990) were the first to use it in the context of a multidimensional scan statistic.

Monte Carlo hypothesis testing for a scan statistic is a four-step procedure:

1. Calculate the value of the test statistic for the real data.
2. Create a large number of random data sets generated under the null hypothesis.
3. Calculate the value of the test statistic for each of the random replications.
4. Sort the values of the test statistic from the real and random data sets, and note the rank of the one calculated from the real data set. If it is ranked in the highest $\alpha$ percent, then reject the null hypothesis at the $\alpha$ percent significance level.

For example, when we condition on the total number of cases $n_G$, with 9999 such replications the test is significant at the 5 percent level if the value of the test statistic for the real data set is among the 500 highest of the 10,000 values (the real data set plus the 9999 replications). The p-value is obtained through Monte Carlo hypothesis testing (Kulldorff 1997) by ranking the maximum likelihood from the real data set among the maximum likelihoods from the random data sets. If this rank is $R$, then p-value = $R / (1 + \text{number of simulations})$.
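
The ranking procedure described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: `statistic` and `simulate_null` are hypothetical callables supplied by the user (for a scan statistic, `statistic` would return the maximized likelihood ratio and `simulate_null` would draw one data set under the null hypothesis), and ties are counted against the observed value, which is a conservative convention chosen here rather than something stated in the source.

```python
import random


def monte_carlo_p_value(observed, simulated):
    """Rank-based p-value: R / (1 + number of simulations).

    R is the rank of the observed statistic among all values (1 = largest);
    simulated values equal to the observed one are ranked ahead of it.
    """
    rank = 1 + sum(1 for s in simulated if s >= observed)
    return rank / (1 + len(simulated))


def monte_carlo_test(statistic, simulate_null, data, n_sim=9999, alpha=0.05, seed=0):
    """Run the four-step procedure and return (observed, p_value, reject).

    statistic:     hypothetical function mapping a data set to a number
    simulate_null: hypothetical function taking a random.Random instance
                   and returning one data set drawn under the null hypothesis
    """
    rng = random.Random(seed)                       # fixed seed for repeatability
    observed = statistic(data)                      # step 1: real data
    simulated = [statistic(simulate_null(rng))      # steps 2 and 3: replications
                 for _ in range(n_sim)]
    p_value = monte_carlo_p_value(observed, simulated)  # step 4: rank
    return observed, p_value, p_value <= alpha
```

With `n_sim=9999` and `alpha=0.05`, the null hypothesis is rejected exactly when the rank of the observed statistic is at most 500, which matches the example in the text.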

2.10. Hotspot Evaluation