Bernoulli Models LITERATURE REVIEW

nodes are typically singleton vertices at which the response rate is a local maximum; the root node consists of all vertices in the abstract graph. Finding the connected components of an upper level set amounts to determining the transitive closure of the adjacency relation defined by the edges of the graph. Several generic algorithms for this task are available in the computer science literature.

Figure 3. A confidence set of hotspots on the ULS tree. The different connected components correspond to different hotspot loci, while the nodes within a connected component correspond to different delineations of that hotspot. Labels in the figure: MLE, Junction Node, Alternative Hotspot Locus, Alternative Hotspot Delineation, Tessellated Region R.
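As an illustration of the connected-component step, the following is a minimal sketch using breadth-first search, which is one of the generic graph algorithms that can be used for this purpose. The function name `upper_level_set_components`, the dictionary-based graph representation, and the toy data are assumptions made for this example only; they are not taken from the ULS literature.

```python
from collections import deque


def upper_level_set_components(rates, adjacency, threshold):
    """Connected components of the upper level set {v : rates[v] >= threshold}.

    rates:     dict mapping each vertex to its response rate (hypothetical input)
    adjacency: dict mapping each vertex to an iterable of neighbouring vertices
    """
    in_set = {v for v, r in rates.items() if r >= threshold}
    seen = set()
    components = []
    for start in sorted(in_set):
        if start in seen:
            continue
        # Breadth-first search restricted to vertices inside the level set.
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            v = queue.popleft()
            component.add(v)
            for w in adjacency.get(v, ()):
                if w in in_set and w not in seen:
                    seen.add(w)
                    queue.append(w)
        components.append(component)
    return components


# Toy path graph a - b - c - d - e with made-up response rates.
rates = {"a": 0.9, "b": 0.7, "c": 0.2, "d": 0.8, "e": 0.1}
adjacency = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
components = upper_level_set_components(rates, adjacency, threshold=0.5)
```

In this toy example the low-rate vertex c separates the level set at threshold 0.5 into two components, which would correspond to two candidate hotspot loci. Lowering the threshold far enough merges everything into the single root component.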

2.8. Bernoulli Models

Following Kulldorff (1997), let $X$ denote a spatial point process, where $X(A)$ is the random number of points (cases) in the set $A \subset G$. As the window moves over the study area $G$, it defines a collection $\mathcal{Z}$ of zones $Z \subset G$. Interchangeably, $Z$ will be used to denote both a subset of $G$ and a set of parameters defining the zone.

For the Bernoulli model we consider only measures $N$ such that $N(A)$ is an integer for all subsets $A \subset G$. Each unit of measure corresponds to an ‘entity’ or ‘individual’ that can be in either one of two states. Individuals in one of these states are considered as points (cases), and the locations of those individuals constitute the point process. In the model there is exactly one zone $Z \subset G$ such that each individual within that zone has probability $p_1$ of being a case, while the probability for individuals outside the zone is $p_2$. The probability for any one individual is independent of all the others.

The null hypothesis is $H_0: p_1 = p_2$ and the alternative hypothesis is $H_1: p_1 > p_2$, $Z \in \mathcal{Z}$. Under $H_0$, $X(A) \sim \mathrm{Bin}(N(A), p_1)$ for all $A$. Under $H_1$, $X(A) \sim \mathrm{Bin}(N(A), p_1)$ for all $A \subset Z$, and $X(A) \sim \mathrm{Bin}(N(A), p_2)$ for all $A \subset Z^c$.

Let $n_Z$ denote the observed number of cases in zone $Z$ and $n_G$ the total number of cases. $N(Z)$ is the total population in zone $Z$, while $N(G)$ is the total population. Hence the likelihood function for the Bernoulli model is expressed as

$$L(Z, p_1, p_2) = p_1^{n_Z} (1 - p_1)^{N(Z) - n_Z} \, p_2^{n_G - n_Z} (1 - p_2)^{(N(G) - N(Z)) - (n_G - n_Z)}.$$

To detect the zone that is most likely to be a cluster, we find the zone $\hat{Z}$ that maximizes the likelihood function. We do this in two steps. First we maximize the likelihood function conditioned on $Z$:

$$L(Z) = \sup_{p_1 > p_2} L(Z, p_1, p_2) = \left(\frac{n_Z}{N(Z)}\right)^{n_Z} \left(1 - \frac{n_Z}{N(Z)}\right)^{N(Z) - n_Z} \left(\frac{n_G - n_Z}{N(G) - N(Z)}\right)^{n_G - n_Z} \left(1 - \frac{n_G - n_Z}{N(G) - N(Z)}\right)^{(N(G) - N(Z)) - (n_G - n_Z)}$$

if $\frac{n_Z}{N(Z)} > \frac{n_G - n_Z}{N(G) - N(Z)}$, and otherwise

$$L(Z) = \left(\frac{n_G}{N(G)}\right)^{n_G} \left(1 - \frac{n_G}{N(G)}\right)^{N(G) - n_G}.$$

Next we find the solution

$$\hat{Z} = \{ Z : L(Z) \ge L(Z') \ \ \forall Z' \in \mathcal{Z} \}.$$

We are also interested in making statistical inference. Let

$$L_0 = \sup_{p_1 = p_2} L(Z, p_1, p_2) = \left(\frac{n_G}{N(G)}\right)^{n_G} \left(1 - \frac{n_G}{N(G)}\right)^{N(G) - n_G},$$

and the likelihood ratio test statistic can be written as

$$\lambda = \frac{\sup_{Z \in \mathcal{Z},\, p_1 > p_2} L(Z, p_1, p_2)}{\sup_{p_1 = p_2} L(Z, p_1, p_2)} = \frac{L(\hat{Z})}{L_0}.$$

In order to find the value of the test statistic, we need a way to calculate the likelihood ratio as it is maximized over the collection of zones in the alternative hypothesis. This might seem like a daunting task, since the number of zones could easily be infinite. Two properties allow us to reduce it to a finite problem: the number of observed cases is always finite, and for a fixed number of cases the likelihood decreases as the measure of the zone increases.
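To make the two-step maximization concrete, the following is a minimal sketch of the log likelihood ratio for a single zone and of the search over a finite list of candidate zones. It assumes the standard Bernoulli scan likelihood of Kulldorff (1997), written in terms of the case counts n_Z, n_G and the populations N(Z), N(G) defined above. The function names, the dictionary of candidate zones, and the counts are hypothetical and chosen only for illustration.

```python
import math


def _xlogy(x, y):
    """Return x * log(y), using the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def _binomial_loglik(cases, population, p):
    """Log of p^cases * (1 - p)^(population - cases)."""
    return _xlogy(cases, p) + _xlogy(population - cases, 1.0 - p)


def bernoulli_llr(n_z, N_z, n_g, N_g):
    """log L(Z) - log L_0 for one zone.

    Returns 0.0 when the rate inside the zone does not exceed the rate
    outside, because L(Z) then equals L_0.
    """
    if not (0 < N_z < N_g):
        raise ValueError("zone population must satisfy 0 < N(Z) < N(G)")
    p_in = n_z / N_z                      # estimated rate inside the zone
    p_out = (n_g - n_z) / (N_g - N_z)     # estimated rate outside the zone
    if p_in <= p_out:
        return 0.0
    p_all = n_g / N_g                     # common rate under the null hypothesis
    log_l_z = (_binomial_loglik(n_z, N_z, p_in)
               + _binomial_loglik(n_g - n_z, N_g - N_z, p_out))
    log_l_0 = _binomial_loglik(n_g, N_g, p_all)
    return log_l_z - log_l_0


def most_likely_zone(zones, n_g, N_g):
    """Pick the candidate zone with the largest log likelihood ratio.

    zones: dict mapping a zone label to a pair (n_Z, N(Z)).
    """
    scores = {name: bernoulli_llr(n_z, N_z, n_g, N_g)
              for name, (n_z, N_z) in zones.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Made-up counts: 30 cases among 100 individuals, three candidate zones.
zones = {"A": (10, 20), "B": (5, 30), "C": (12, 50)}
best_zone, best_llr = most_likely_zone(zones, n_g=30, N_g=100)
```

The sketch works on the log scale because the raw likelihoods are products of many probabilities and underflow quickly for realistic population sizes. Since the logarithm is monotone, the zone maximizing the log likelihood ratio is the same zone that maximizes L(Z).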

2.9. Monte Carlo-Based Hypothesis Testing