A CELLULAR AUTOMATA MODELING FOR VISUALIZING
AND PREDICTING SPREADING PATTERNS
OF DENGUE FEVER
(Indonesian title: Pemodelan Celular Automata untuk Visualisasi dan Prediksi
Pola Penyebaran Penyakit Demam Berdarah)
PUSPA EOSINA HOSEN
GRADUATE SCHOOL
BOGOR AGRICULTURAL UNIVERSITY
BOGOR
2015
STATEMENT OF THESIS AND SOURCES OF
INFORMATION AND DEVOLUTION COPYRIGHT
Hereby, I state that the thesis entitled “A Cellular Automata Modeling for
Visualizing and Predicting Spreading Patterns of Dengue Fever” is, to the best
of my knowledge, my own work, carried out under the supervision of Dr Eng
Taufik Djatna, STP MSi and Helda Khusun, STP MSc PhD. It has never previously
been published at any university. All information originating from other
published as well as unpublished works is stated clearly in the text as well as
in the references.
Hereby, I devolve the copyright of my thesis to Bogor Agricultural
University.
Bogor, August 2015
Puspa Eosina Hosen
Student ID G651130421
SUMMARY
PUSPA EOSINA HOSEN. A Cellular Automata Modeling for Visualizing and
Predicting Spreading Patterns of Dengue Fever. Supervised by TAUFIK
DJATNA and HELDA KHUSUN.
Modeling is a simplification of a real problem that aims to study and
understand real-world phenomena. In epidemiology, a system modeling approach
is commonly used to view the epidemic process. Moreover, visualization is
required as the first step in epidemiological analysis to understand the
spatial characteristics of a dataset, to identify the disease pattern in a
given geographical area, and to predict the spreading pattern of the disease in
the next period. Unfortunately, Ordinary Differential Equation (ODE) and
statistical models, the most common methods in epidemiological analysis, are
unable to capture spatial patterns and interactions such as those needed to
visualize and predict disease spread. This limitation can be overcome by using
a Cellular Automata (CA) model.
The use of CA models in many problems, such as epidemic process analysis and
spatiotemporal pattern analysis, has shown the power of CA in solving problems
related to spatial patterns. CA is a dynamic system approach that discretizes
time and space. A CA consists of cells arranged in a cellular space, local
connections to other cells, and boundary conditions. Each cell holds a state
that can change at every time-step according to local transition rules, which
generate a new state based on the previous state of the cell and its
neighborhood. Therefore, the concept of neighborhood is very important. The
other important aspect that determines the accuracy of a CA model is the
transition rule f, which can be represented as a deterministic or a
probabilistic function.
In this research, we propose a new approach to developing a CA-based spreading
pattern model of Dengue Hemorrhagic Fever (DHF). We focus in particular on
determining a probabilistic function using the Hidden Markov Model (HMM), which
has not yet been used by researchers for this purpose. HMM is a probabilistic
model that is suitable for problems involving sequential-temporal data. To show
the effectiveness of the proposed model, we applied this approach to the DHF
case. We used a dataset from a limited area, West Bogor, for the period of 2013
and defined the state criteria from this dataset. Moreover, we considered only
the infective state, paying particular attention to the spatial distribution of
infected areas. The evaluation was conducted by comparing the simulation
results of the proposed model with those yielded by the
Susceptible-Infected-Recovered (SIR) model, a classical approach. The
evaluation showed that the CA model was capable of generating patterns similar
to those generated by the SIR model, with a similarity value of 0.95.
Keywords: Cellular Automata, Dengue Fever, HMM, Neighborhood, SIR
RINGKASAN (SUMMARY)
PUSPA EOSINA HOSEN. Cellular Automata Modeling for Visualizing and Predicting
Spreading Patterns of Dengue Fever (Pemodelan Celular Automata untuk
Visualisasi dan Prediksi Pola Penyebaran Penyakit Demam Berdarah). Supervised
by TAUFIK DJATNA and HELDA KHUSUN.
Modeling is the simplification of a real-world problem or phenomenon, with the
aim of studying and understanding it. In epidemiology, modeling is generally
used to observe the epidemic process. Visualization is therefore needed as a
first step in epidemiological analysis, among other things to understand the
spatial character of a dataset, to identify the spreading pattern of a disease
in a particular geographical area, and to predict the spreading pattern in the
next period. Until now, Ordinary Differential Equation (ODE) and statistical
models, the models most commonly used in epidemiological analysis, have been
unable to elaborate the process of spatial patterns and their interactions, as
in the visualization and prediction of disease spread. The Cellular Automata
(CA) model is introduced as a model that can overcome these limitations of ODE
and statistical models.
The use of CA models in many cases, such as epidemic process analysis and
spatiotemporal pattern analysis, shows that CA performs well in solving
problems related to spatial patterns. The CA model is a dynamic system approach
that implements the discretization of space and time. It consists of cells
distributed over a cellular space, local connections linking one cell to
another, and boundary conditions. Each cell holds a particular state value that
can change to another state value at any time-step. The state change is
influenced by the state of the cell itself and of its neighborhood in the
previous period. The concept of neighborhood is therefore important. Another
important aspect that determines the accuracy of a CA model is the transition
rule governing the state change of each cell, which is a probabilistic
function.
In this research, an approach is proposed for developing a CA-based model of
the spread of Dengue Hemorrhagic Fever (DHF). The research focuses on
determining the probabilistic function that serves as the CA rule, using a
Hidden Markov Model (HMM) approach that has not previously been used by other
researchers for CA models. HMM is a probabilistic model suited to problems
involving sequential-temporal data. To show the effectiveness of the proposed
model, the approach was applied to DHF cases in West Bogor in 2013. From this
dataset, several state criteria were defined to model the spatial process of
the infective state of the urban villages (kelurahan) in West Bogor. The
evaluation was conducted by comparing the simulation results of the proposed
model with those of the Susceptible-Infected-Recovered (SIR) model, one of the
classical approaches most often used in epidemiology because of its recognized
accuracy. The evaluation showed that the CA model can produce patterns similar
to those produced by the SIR model, with a similarity value of 0.95.
Keywords: Cellular Automata, Dengue Fever, HMM, Neighborhood, SIR.
© Copyright of This Thesis Belongs to Bogor Agricultural
University (IPB), 2015
All Rights Reserved
It is prohibited to cite this paper in part or in whole without including or
citing the source. Quotation is permitted only for educational purposes,
research, scientific writing, report writing, criticism, or review of a
problem, and such citations must not be detrimental to the interests of IPB.
It is prohibited to publish or reproduce part or all of this publication in any
form without the permission of IPB.
A CELLULAR AUTOMATA MODELING FOR VISUALIZING
AND PREDICTING SPREADING PATTERNS
OF DENGUE FEVER
PUSPA EOSINA HOSEN
Thesis
as partial fulfillment of the requirements for the degree of
Master of Computer Science
in
the Department of Computer Science
GRADUATE SCHOOL
BOGOR AGRICULTURAL UNIVERSITY
BOGOR
2015
Non-committee examiner on thesis examination: Dr. Arya Kekalih, MTI
PREFACE
My deepest gratitude goes first and foremost to my supervisors, Dr. Eng.
Taufik Djatna, S.Tp and Helda Khusun, S.Tp, M.Sc., PhD, who provided me with an
excellent working environment in which to finish my Master's study. All I have
learned from them will be a priceless treasure throughout my career. I wish to
express my gratitude to my thesis examiner, Dr. Aria Kekalih, M.T.I, and to
Toto Haryanto, S. Kom., M. Kom., moderator of my final defence, for their
inspiring and invaluable advice.
I am thankful to all lecturers and staff of the Computer Science Department and
to all my friends in Computer Science. I am lucky to have all of you and to
study in a helpful environment, especially Riva, Halimah, Husnul, Kana, Tengku
Khairil, Pungki, Syaif Usman, Luky, Irma, Peter, Akbar, Pizaini, Rake, and
others who helped and encouraged me. I would like to thank all members of Dr.
Eng. Taufik Djatna's Lab (Computer Laboratory of the Agro-industrial Technology
Department), Aisah, Hety, Novi, Zaki Hadi, Yogha, Rohmah, Yudishtira, Ikhsan
and others. I am glad to be a part of the group. Special thanks go to UIKA,
which financially supported my research.
Last but not least, I would like to thank my beloved parents, the late Jazib
Hosen and Isnaniar, my lovely husband Wisnu Ananta Kusuma, my sixteen-year-old
son Bara Samudra Syuhada, and my brother and sisters Radian Zarathustra, Jaziar
Radianti, Ionia Veritawati, Dewi Ramadani, and Farida Candrasekar for their
endless love and warm support all the way. Before all and after all, the main
thanks should be to the Almighty God, Allah Subhanahu Wa Ta’ala.
Hopefully this research will be useful.
Bogor, August 2015
Puspa Eosina Hosen
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF APPENDICES
GLOSSARY
1 INTRODUCTION
Background
Problem Statement
Objectives
Benefits
Boundaries
2 LITERATURE REVIEW
Epidemiology
Dengue Hemorrhagic Fever (DHF)
SIR Model
Model Cellular Automata (CA)
Composing Neighborhoods
Hidden Markov Model
3 METHODOLOGY
Research Framework
Defining a Spreading Pattern Model Based on CA Model
Defining a Cellular Space and Neighborhood
Defining a Set of States
Data and Model Construction
Collecting the Dataset
A Cellular Space Construction
Neighborhood
A Set of States
Finding a Probabilistic Function
Prediction of Pattern of the Disease Spread
Verification and Validation
4 RESULTS AND DISCUSSION
The Spreading Pattern Model Based on CA
The Cellular Space
The Probabilistic Function as Rule on CA Model
Prediction Spreading Process of DHF using The Proposed Model
Evaluation
5 CONCLUSION AND RECOMMENDATION
Conclusion
Recommendation
REFERENCES
Appendix A List the variables/attributes of DHF factors
Appendix B The interview result
Appendix C Form of the table store of counter result of states changes
Appendix D List of state change affected by neighborhood that should be counted
Appendix E The table store of counter result of states changes affected by neighborhood
BIOGRAPHY
LIST OF TABLES
1 The Transition Probability values
2 The Emission Probability values
3 The Prior Probability values
4 Number of Dengue cases in West Bogor in 2013
5 The number of direct neighborhood of each region
6 State definition of infected area
7 List of the data construction
8 Transition Probabilities Matrix
9 Emission Probabilities Matrix
10 List of a state change
11 The cells representing the region
12 The number of state change based on the data cases DHF in West Bogor
13 The counter result of states changes affected by neighborhood
LIST OF FIGURES
1 The triangle chart of the epidemiology elements
2 SIR model
3 Two-dimensional cellular space
4 Neighborhood
5 The illustration of determining neighborhood technique
6 State Transition Diagram
7 Framework of the research
8 The possibility of composing neighborhood
9 The map of regions in West Bogor
10 The proposed cellular space
11 Von Neumann neighborhood
12 The neighborhood frame
13 State Transition Diagram of proposed model
14 The illustration of algorithm of CA model
15 The Prediction Results of Dengue Spreading Pattern on CA model
16 The Tendency of Graph Number of Infected Area in Bogor Barat
LIST OF APPENDICES
A List the variables/attributes of DHF factors
B The survey forms
C Form of the table store of counter result of states changes
D List of state change affected by neighborhood that should be counted
E The table store of counter result of states changes affected by neighborhood
GLOSSARY
Aedes aegypti is a mosquito that can spread the dengue fever, chikungunya, and
yellow fever viruses, and other diseases.
Emission probability is the conditional distribution of observations given
states.
Epidemic is the rapid spread of an infectious disease to a large number of
people in a given population within a short period of time, usually two weeks
or less.
Epidemiology is the science that studies the patterns, causes, and effects of
health and disease conditions in defined populations. It is the cornerstone of
public health, and informs policy decisions and evidence-based practice by
identifying risk factors for disease and targets for preventive healthcare.
Ergodic HMM is one for which the underlying Markov chain is ergodic, or at
least is irreducible and admits a unique stationary distribution.
Neighborhood is a geographically localized community within a larger city,
town, suburb or rural area.
Spatiotemporal means existing in both space and time.
Transition probability is the probability associated with a state change.
1 INTRODUCTION
Background
Modeling aims to study and understand phenomena by simplifying a real problem.
In epidemiology, a system modeling approach is commonly used to view the
epidemic process (Cuesta 2013). Most models for epidemic simulation are based
on Ordinary Differential Equations (ODE) or statistical models (Pfeiffer 2008,
White et al. 2007, Nishiura 2006). Moreover, visualization is required as the
first step in an epidemiological analysis to understand the spatial
characteristics of a dataset (Pfeiffer 2008). Pfeiffer (2008) also mentioned
that visualization is needed for identifying the disease pattern in a given
geographical area, predicting the spreading pattern of the disease in the next
period, and creating awareness among the target stakeholders based on the
prediction results, and hence helps the clinical management of the disease.
Unfortunately, ODE and statistical models are unable to capture spatial
patterns and interactions such as those needed for the visualization and
prediction of disease spread (White et al. 2007).
In order to overcome these limitations, researchers have used Cellular
Automata (CA) models to involve time and space in epidemic process analysis
(Santos et al. 2011). Several studies have been conducted, such as developing a
mathematical model of disease spread and its simulation using CA (White 2009),
analyzing some scenarios of disease spread (Lopez et al. 2013), applying the CA
approach to the Susceptible-Infective-Recovered (SIR) model of disease spread
by considering birth and death factors and the changes of rules for each state
in the dynamic CA (Athithan et al. 2014), and analyzing the complex
spatiotemporal patterns observed in the transmission of vector-borne infectious
diseases (Santos et al. 2009).
Basically, CA is a dynamic system approach that discretizes time and space
(White et al. 2007, Santos et al. 2011, Elsayed et al. 2013). A CA consists of
cells arranged in a cellular space, local connections to other cells, and
boundary conditions (White 2009). Each cell holds a state that can change at
every time-step according to local transition rules, which generate a new state
based on the previous state of the cell and its neighborhood. Therefore, the
concept of neighborhood is very important. Santos et al. (2011) showed the
effects of neighborhood structures on disease spreading by using the
Susceptible-Infected (SI) epidemic CA model. Moreover, Hagoort et al. (2008)
described the rules of neighborhood that determine the interactions in the
model.
The other important aspect that determines the state changes of a CA model is
the transition rule f. This rule can be represented as a deterministic or a
probabilistic function (Santos et al. 2009, Elsayed et al. 2013). Many methods
for finding the function f as the rule of the CA model have been introduced,
such as Markov Chains (Peng et al. 2011), the differential equations of the
classical model (German et al. 2011), and the Genetic Algorithm (Mitchell
1996). In this research, the Hidden Markov Model (HMM), which has not yet been
used by researchers for this purpose, was employed to find a probabilistic
function that represents the CA transition rule. HMM is a probabilistic model
that is suitable for problems involving sequential-temporal data (Dugat 1996).
To show the effectiveness of the proposed model, this approach was applied to
the Dengue Fever case.
The Dengue Fever case was chosen because it is one of the deadly and
infectious pandemic diseases in Indonesia. This disease, also called Dengue
Hemorrhagic Fever (DHF), is caused by the Dengue virus and is transmitted by
the Aedes aegypti mosquito as a vector. Several studies on monitoring DHF in
Indonesia have been conducted, such as the studies by Saragi (2011) and Octora
(2010) that aimed to forecast the trend of dengue outbreaks. Saragi (2011) used
the Time Series method to show the trend of dengue outbreaks. The study
predicted the number of dengue fever patients for the next four years based on
DHF patient data from the province of North Sumatra from 2005 to 2009. Octora
(2010) compared the Autoregressive Integrated Moving Average (ARIMA) and the
Winters approaches for predicting the number of DHF cases in the next six
months. That research used DHF case data from Surabaya from January 2005 to
June 2010, and applied four models of the Winters method and three models of
the ARIMA method.
This thesis explains how to develop a CA-based spreading pattern model of DHF
that is used for visualizing and predicting the spreading pattern of DHF. The
study focuses on determining a probabilistic function using HMM with a dataset
from a limited area, West Bogor, for the period of 2013. This dataset was used
for defining the state criteria. Moreover, the study considered only the
infective state, paying particular attention to the spatial distribution of
infected areas. The evaluation was conducted by comparing the results of the
proposed model with those yielded by the SIR method, a classical approach.
Problem Statement
Based on the background above, the problems addressed in this study can be
formulated as follows:
1. How to develop a visualization model of the spreading pattern of Dengue
Fever.
2. How to predict the spreading process of Dengue Fever.
3. How to evaluate the performance of the proposed model.
This research did not use real DHF case data for the following year because
such data were not yet complete when data collection was conducted. Therefore,
simulated data were used as the spreading initialization for viewing and
predicting the spread of DHF with the proposed model. The simulated data were
generated randomly with the John von Neumann random generator based on the CA
rule. To evaluate the model, the proposed model was compared with SIR, a
popular and accurate existing prediction model.
Objectives
The objectives of this research are:
1. To develop a visualization model of the spreading pattern of DHF based on
the CA approach supported by HMM.
2. To predict the spreading process of DHF.
3. To evaluate the performance of the model by comparing its results with
those yielded by the SIR method, a classical approach.
Benefits
The contribution of this study is to provide information for researchers in
epidemiology and for the government, especially units under the Department of
Public Health, to support recommendations for controlling and preventing the
spread of Dengue disease.
Boundaries
This study considered only the dataset of DHF cases that occurred in West
Bogor in 2013. Disease spread is commonly affected by factors including birth,
death, the density of mosquitoes, and population movement. In the case of DHF,
the factors of birth and death could be ignored since the number of cases is
quite small compared to the size of the population. Nevertheless, even a single
occurrence of DHF has to be handled. In addition, in the development of the
model, the definition of Susceptible was changed from the number of people in
the population to the number of infected regions. Therefore, it was assumed
that population movement did not have a significant influence on the spread of
DHF, and this factor could be ignored. The other factor that could be
considered is the density of mosquitoes. However, such data were not available
for this research. Thus, by considering only the existing case data for each
period in every region, the HMM was chosen as a suitable method for determining
a function that represents the CA rule, under the assumption that the initial
state of a cell at the beginning of the period is equally probable among all
possible state values. Moreover, this study uses the Von Neumann neighborhood
with r = 1, which represents the concept of an area affected by the state
changes of its surrounding areas.
2 LITERATURE REVIEW
Epidemiology
Epidemiology is the field of study that explores the causative factors of
disease and the distribution of health in a given region (Cuesta 2013). Cuesta
(2013) stated that this field is very important because a plague could harm
people and cause economic losses. Cuesta (2013) depicted the elements of
epidemiology as shown in Figure 1.
[Figure 1: diagram with the elements Agent, Population, Environment, and
Time/Period]
Figure 1 The triangle chart of the epidemiology elements (Cuesta 2013)
An epidemic in a given region is affected by basic elements that are related
to each other, such as pathogens and the susceptible population. In Figure 1,
pathogens are represented by Agent. Pathogens spread in a given population, and
this spread is related to the behavior of the population in a given
environment. The environment is defined as an external condition that may cause
the disease to spread. Environmental factors include geography, demography,
climate, and social customs. The interaction among all elements, namely Agent,
Population, and Environment, over a period of time describes a seasonal
disease.
Dengue Hemorrhagic Fever (DHF)
DHF is one of the deadly epidemic diseases that exist permanently in a given
region with a given population (Cuesta 2013). DHF is caused by the dengue
virus, which is transmitted to humans by Aedes aegypti as a vector through its
bite (Candra 2010). According to Candra (2010), the number of dengue cases has
never decreased and even tends to increase, especially in the tropics and
subtropics. DHF can infect people of any age, especially those who are less
active.
SIR Model
The most common model used in epidemiology to describe an infectious disease
in a population is the SIR model, shown as a state diagram in Figure 2 (Cuesta
2013).
Figure 2 SIR model (Cuesta 2013)
In his book, Cuesta (2013) described the SIR model as a state diagram that
consists of three states: Susceptible (S), Infected (I), and Recovered (R).
State S represents members of a population who are at risk of becoming
infected. Members in state S interact with members of the population who are
infected, I. There are two possible conditions for an individual in state I.
The first is that the individual remains infected during the period of
infection, indicated in Figure 2 by an arrow pointing back to the state itself.
The second is that the individual recovers and moves to state R.
The SIR model is solved mathematically using Ordinary Differential Equations
(ODE). The ODEs of the SIR model are as follows:
$$\frac{dS}{dt} = -\beta S I \tag{1}$$
$$\frac{dI}{dt} = \beta S I - \gamma I \tag{2}$$
$$\frac{dR}{dt} = \gamma I \tag{3}$$
where S = number of susceptible, I = number of infectious, and R = number of
recovered. β represents the transmission probability of the disease, while γ
is the recovery rate, which is determined by the period of infection.
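To make the behaviour of equations (1) to (3) concrete, the following is a minimal sketch that integrates them numerically with the forward Euler method. The function name, the step size, and all parameter values are illustrative assumptions and are not taken from the thesis data; the population is expressed as fractions so that S + I + R = 1.

```python
def simulate_sir(s0, i0, r0, beta, gamma, dt=0.1, steps=1000):
    """Integrate the SIR equations (1)-(3) with the forward Euler method.

    All names and default values here are illustrative assumptions.
    """
    s, i, r = float(s0), float(i0), float(r0)
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i * dt   # flow from S to I during dt
        new_recoveries = gamma * i * dt      # flow from I to R during dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Illustrative values only: population fractions, not thesis data.
trajectory = simulate_sir(s0=0.99, i0=0.01, r0=0.0, beta=0.5, gamma=0.1)
final_s, final_i, final_r = trajectory[-1]
```

Because every amount leaving one compartment enters another, the sum S + I + R stays constant along the trajectory, which is a quick sanity check for an implementation of this kind.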
Model Cellular Automata (CA)
A Cellular Automata (CA) is a discrete model consisting of points or identical
cells, each in one of a finite number of possible states, that change at every
time-step according to a local transition rule (White et al. 2007). Cells are
arranged uniformly in a cellular space that can be one-dimensional,
two-dimensional or three-dimensional. The state of a cell at the next time,
t+1, depends on the states of the surrounding cells, called its neighborhood,
at time t. Mathematically, the CA model is defined as a 4-tuple (C, S, V, f). C
represents the cellular space. S represents the set of possible state values
for each cell in the cellular space. V is the set of neighborhoods around a
focus cell. The function f is the local transition function that represents the
update rule for the state change of each cell (White 2009).
The CA model represents the observed data objects as a grid of cells in which
the cells within a neighborhood influence each other (Maeda 2006). In that
study, Maeda (2006) defined that, in the next period, a cell (initial state)
that interacts with other cells in its neighborhood changes into a new state
(next state). The way in which a neighborhood affects a cell (from its initial
state to its next state) is called the CA rule. The two-dimensional CA is
defined as a model of a discrete dynamic system with objects spread regularly
over a two-dimensional or coordinate space (Figure 3), in which each cell is
assigned an initial state (White 2009). In the two-dimensional case, there are
two basic forms of neighborhood: the Von Neumann neighborhood and the Moore
neighborhood. The Von Neumann neighborhood with r = 1 has a size of 5 and
consists of a center cell and its 4 neighboring cells above, below, to the
left, and to the right (Figure 4 (a)). The Moore neighborhood with r = 1 has a
size of 9 and consists of a center cell and its 8 adjacent cells (Figure 4
(b)).
Figure 3 Two-dimensional cellular space (Elsayed et al. 2013)
Figure 4 (a) Von Neumann neighborhood and (b) Moore neighborhood (Elsayed et al. 2013)
The extent of a neighborhood is determined by the radius parameter (r), the
distance from the center cell to the farthest neighboring cell that may affect
its state change. The size of a Moore neighborhood can be calculated with the
following equation (Maeda 2006):
$$n = (2r + 1)^2 \tag{4}$$
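As an illustration of the two neighborhood forms and of equation (4), the sketch below enumerates the cell offsets of each neighborhood for a given radius. The function names are hypothetical, and the center cell is counted as part of the neighborhood, in line with the sizes 5 and 9 stated above for r = 1.

```python
def von_neumann_offsets(r=1):
    """Offsets (dx, dy) with Manhattan distance <= r, center included."""
    return [(dx, dy)
            for dx in range(-r, r + 1)
            for dy in range(-r, r + 1)
            if abs(dx) + abs(dy) <= r]


def moore_offsets(r=1):
    """Offsets (dx, dy) with Chebyshev distance <= r, center included."""
    return [(dx, dy)
            for dx in range(-r, r + 1)
            for dy in range(-r, r + 1)]


# Sizes for r = 1: 5 cells (Von Neumann) and 9 cells (Moore).
sizes_r1 = (len(von_neumann_offsets(1)), len(moore_offsets(1)))
```

For the Moore neighborhood the number of offsets equals (2r + 1)^2 for any radius, which matches equation (4).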
The change of a cell from one state to another is defined by a rule that can be
either deterministic or probabilistic (Elsayed et al. 2013). In their research,
Elsayed et al. (2013) divided CA models into two cases, uniform and
non-uniform. In the uniform case, the same rule defines how every neighboring
cell affects the center cell. In the non-uniform case, a different rule is
applied to each neighboring cell in influencing the center cell.
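The following sketch shows one synchronous update of a two-dimensional CA with a uniform, probabilistic rule and a Von Neumann neighborhood of r = 1. It is only an illustration under stated assumptions: the two-state encoding (0 = not infected, 1 = infected), the rule `infection_rule`, its parameter `p`, and the treatment of cells outside the grid as absent are all hypothetical and are not the rule derived in this thesis.

```python
import random

VON_NEUMANN = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # r = 1, center excluded


def ca_step(grid, rule, rng):
    """Apply one synchronous update: new states are computed from the old grid."""
    rows, cols = len(grid), len(grid[0])
    new_grid = [row[:] for row in grid]
    for y in range(rows):
        for x in range(cols):
            # Cells outside the grid are simply skipped (assumed boundary condition).
            neighbors = [grid[y + dy][x + dx]
                         for dx, dy in VON_NEUMANN
                         if 0 <= y + dy < rows and 0 <= x + dx < cols]
            new_grid[y][x] = rule(grid[y][x], neighbors, rng)
    return new_grid


def infection_rule(state, neighbors, rng, p=0.3):
    """Hypothetical rule: an infected cell stays infected; a healthy cell
    becomes infected with probability 1 - (1 - p)^k for k infected neighbors."""
    if state == 1:
        return 1
    k = sum(neighbors)
    return 1 if rng.random() < 1 - (1 - p) ** k else 0


rng = random.Random(42)            # fixed seed so that a run is repeatable
grid = [[0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]]
next_grid = ca_step(grid, infection_rule, rng)
```

Passing the rule as a function keeps the update loop independent of how f is obtained, so a deterministic rule or a rule estimated from data could be substituted without changing `ca_step`.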
Composing Neighborhoods
One of the important steps in CA modeling is determining the neighborhood
cells spatially. There are three kinds of spatial neighborhood relations,
namely topological, distance, and direction relations, which may be combined by
logical operators to express a more complex neighborhood relation (Ester et al.
2001). In spatial problems, the most important object is the point. Other
objects such as lines, polygons or polyhedrons are represented by sets of
points.
Topological relations, the first technique for determining spatial
neighborhood relations, are determined by considering the boundaries, interiors
and complements of the two related objects. These relations remain unchanged
under topological transformations, that is, transformations that are
continuous, one-to-one, and onto and whose inverse is continuous. The relations
are: A disjoint B, A meets B, A overlaps B, A equals B, A covers B, A
covered-by B, A contains B, A inside B.
The second technique is distance relations. This technique uses arithmetic
comparison operators to compare the distance between two objects, A and B, with
a given constant. The third technique is direction relations. Figure 5
illustrates the definition of some direction relations using 2D polygons. The
direction relation between two objects is not uniquely defined, but there is
always a smallest direction relation for two objects A and B, called the exact
direction relation of A and B, which is uniquely determined.
Figure 5 The illustration of determining neighborhood technique (Ester et al. 2001)
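A simplified sketch of the distance and direction relations is given below. It assumes that each object is reduced to a single representative point, which is a simplification of the polygon-based definitions of Ester et al. (2001); the function names and the compass labels are hypothetical.

```python
import math
import operator


def distance_relation(a, b, compare, c):
    """True if dist(a, b) <compare> c, e.g. compare=operator.le for 'within c'."""
    return compare(math.dist(a, b), c)


def direction_relation(a, b):
    """Coarse direction of point b as seen from point a (y grows northward)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    ns = "north" if dy > 0 else "south" if dy < 0 else ""
    ew = "east" if dx > 0 else "west" if dx < 0 else ""
    return (ns + ew) or "same"


a, b = (0.0, 0.0), (3.0, 4.0)
# Relations can be combined with logical operators into a complex relation.
near_and_northeast = (distance_relation(a, b, operator.le, 5.0)
                      and direction_relation(a, b) == "northeast")
```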
Hidden Markov Model
A Markov chain is a stochastic process in which the state in one period
depends on the state in the previous period. The Hidden Markov Model (HMM) can
be used for events whose changes cannot be observed directly and must instead
be inferred from observations of other objects. HMM is a probabilistic model
that is suitable for temporally sequential data (Peng et al. 2011). This model
can be expressed as a state transition diagram. There are two types of state
transition models in HMM, namely the Ergodic HMM (Figure 6 (a)) and the
Left-Right HMM (Figure 6 (b)).
[Figure 6: state transition diagrams over the states ST1, ST2, and ST3 for
(a) Ergodic HMM and (b) Left-Right HMM]
Figure 6 State Transition Diagram
The types of problems that can be solved using the HMM approach are:
1. Evaluating. The main objective is to find the probability of a sequence of
observations. This type of problem can be solved using the forward-backward
procedure.
2. Decoding. The aim is to determine the state sequence most likely to have
occurred in the main object given a series of observations, by finding the
state values with maximum probability. This type can be solved using the
Viterbi algorithm.
3. Learning. In this type of problem, the optimal HMM is found by adjusting
the HMM parameters so that the probability of a series of observations is
maximized. This type can be solved using the Baum-Welch algorithm.
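As an illustration of the decoding problem, the following is a minimal sketch of the Viterbi algorithm written with plain dictionaries. The state names, the observation symbols, and every probability value are made up for the example and do not come from the thesis dataset.

```python
def viterbi(observations, states, prior, transition, emission):
    """Return the most probable hidden-state sequence for the observations."""
    # best[t][s]: probability of the best path that ends in state s at time t
    best = [{s: prior[s] * emission[s][observations[0]] for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Best predecessor of state s at the previous time step.
            prev = max(states, key=lambda p: best[-1][p] * transition[p][s])
            scores[s] = best[-1][prev] * transition[prev][s] * emission[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the most probable final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    return path[::-1]


# Illustrative two-state model; all numbers are assumptions.
states = ["ST1", "ST2"]
prior = {"ST1": 0.5, "ST2": 0.5}
transition = {"ST1": {"ST1": 0.9, "ST2": 0.1},
              "ST2": {"ST1": 0.1, "ST2": 0.9}}
emission = {"ST1": {"a": 0.8, "b": 0.2},
            "ST2": {"a": 0.2, "b": 0.8}}

decoded = viterbi(["a", "a", "b", "b"], states, prior, transition, emission)
# decoded == ["ST1", "ST1", "ST2", "ST2"]
```

For long observation sequences the products of probabilities become very small, so a practical implementation would usually work with sums of logarithms instead.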
Mathematically, the HMM is written as:
$$\lambda = (T, E, \pi) \tag{5}$$
where λ is an HMM (Dugat 1996). T is the Transition Probabilities Matrix, which
represents the probabilities of the state transitions of an object. In Figure
6, every arrow in the state diagram represents the probability that the state
of an object changes from one period to the next. The values are stored in T
as shown in Table 1.
Table 1. The Transition Probability values
(n-1)th Period    nth Period
                  ST1       ST2       ST3       Σ
ST1               P(a|a)    P(b|a)    P(c|a)    1
ST2               P(a|b)    P(b|b)    P(c|b)    1
ST3               P(a|c)    P(b|c)    P(c|c)    1
E is the Emission Probabilities Matrix, which represents the probabilities
that an object is in a certain state given the state conditions of the
surrounding observed objects. The values are stored in E as shown in Table 2.
The Prior Matrix, π, holds the probabilities that an object is in a certain
state at the beginning of the sequence of events. The values are stored in π
as shown in Table 3.
Table 2. The Emission Probability values
       An Observed Object (Xi)
       X1                        X2
       a1           b1           a2           b2
ST1    P(a1|ST1)    P(b1|ST1)    P(a2|ST1)    P(b2|ST1)
ST2    P(a1|ST2)    P(b1|ST2)    P(a2|ST2)    P(b2|ST2)
ST3    P(a1|ST3)    P(b1|ST3)    P(a2|ST3)    P(b2|ST3)
Σ      1            1            1            1
Table 3. The Prior Probability values
ST1       ST2       ST3       Σ
P(ST1)    P(ST2)    P(ST3)    1
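To make the triple (T, E, π) concrete, the following is a minimal Python sketch of a toy HMM together with the forward pass used for the evaluating problem. All numbers are illustrative assumptions rather than values from this study, and the names `rows_sum_to_one` and `forward_probability` are hypothetical. The sketch also assumes the common convention in which each row of E (one hidden state) sums to 1, which differs from the column sums printed in Table 2.

```python
# Toy HMM lambda = (T, E, pi): 3 hidden states (ST1..ST3), 2 observation symbols.
# Every number below is an illustrative assumption, not a value from this thesis.
T = [[0.7, 0.2, 0.1],   # T[i][j] = P(state j in period n | state i in period n-1)
     [0.3, 0.5, 0.2],
     [0.2, 0.3, 0.5]]
E = [[0.9, 0.1],        # E[s][o] = P(observation o | state s); rows sum to 1 here
     [0.5, 0.5],
     [0.2, 0.8]]
pi = [0.6, 0.3, 0.1]    # pi[s] = P(state s at the start of the sequence)


def rows_sum_to_one(matrix, tol=1e-9):
    """Check that every row of a probability matrix sums to 1."""
    return all(abs(sum(row) - 1.0) < tol for row in matrix)


def forward_probability(T, E, pi, observations):
    """Evaluating problem: probability of an observation chain under the model."""
    n_states = len(pi)
    # alpha[s] = P(observations so far, current state = s)
    alpha = [pi[s] * E[s][observations[0]] for s in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            E[j][obs] * sum(alpha[i] * T[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
    return sum(alpha)


if __name__ == "__main__":
    print(rows_sum_to_one(T), rows_sum_to_one(E))
    print(forward_probability(T, E, pi, [0, 1, 1]))
```

Only the forward half of the forward-backward procedure is shown, which is sufficient for computing the probability of an observation chain.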
The probability that an object C is in a certain state at time i, denoted by
$P(C_i \mid X_i)$, which is affected by the state condition of the observed
surrounding objects X, can be calculated using Bayes' theorem:
$$P(C_i \mid X_i) = \frac{P(X_i \mid C_i)\, P(C_i)}{P(X_i)} \qquad (6)$$
where $P(C_i)$ is the transition probability of object C changing state from
time i-1 to time i, and $P(X_i \mid C_i)$ is the emission probability of the
state of object X given the state condition of C at that time. In general, the
probability that object C is in a certain chain of states up to time n, denoted
by $P(C_1, C_2, \ldots, C_n \mid X_1, X_2, \ldots, X_n)$, for time moving from
i = 1 to i = n, can be written as follows:
$$P(C_1, C_2, \ldots, C_n \mid X_1, X_2, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid C_i) \cdot \prod_{i=1}^{n} P(C_i \mid C_{i-1}) \qquad (7)$$
where $P(C_i \mid C_{i-1})$ is the transition probability of object C changing
state from time i-1 to time i, for i = 1 to n, and $P(X_i \mid C_i)$ is the
emission probability of the state of object X given the state condition of C at
the same time.
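The product in equation (7) can be evaluated directly for a candidate chain of states. The sketch below is a minimal Python illustration under stated assumptions: the matrices are toy values (not taken from this study), the first transition term is replaced by the prior π because no state precedes the first period, and the helper names `sequence_score` and `best_state_chain` are hypothetical. The brute-force search is only meant to show what the decoding problem asks for; it is not the Viterbi algorithm.

```python
from itertools import product

# Toy parameters (illustrative assumptions, not values from this thesis).
T = [[0.7, 0.2, 0.1],
     [0.3, 0.5, 0.2],
     [0.2, 0.3, 0.5]]
E = [[0.9, 0.1],
     [0.5, 0.5],
     [0.2, 0.8]]
pi = [0.6, 0.3, 0.1]


def sequence_score(states, observations, T, E, pi):
    """Right-hand side of equation (7) for one candidate state chain.

    Assumption: the first transition term P(C1 | C0) is replaced by the prior pi.
    """
    score = 1.0
    previous = None
    for state, obs in zip(states, observations):
        transition = pi[state] if previous is None else T[previous][state]
        score *= E[state][obs] * transition
        previous = state
    return score


def best_state_chain(observations, T, E, pi):
    """Brute-force search over all state chains; feasible only for short chains."""
    n_states = len(pi)
    candidates = product(range(n_states), repeat=len(observations))
    return max(candidates,
               key=lambda chain: sequence_score(chain, observations, T, E, pi))


if __name__ == "__main__":
    print(sequence_score((0, 0), [0, 0], T, E, pi))
    print(best_state_chain([0, 0], T, E, pi))
```

For longer chains the number of candidates grows exponentially, which is why dynamic programming is normally used for decoding.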
3 METHODOLOGY
Research Framework
The objective of this research is to develop a model of disease spread
using the CA method. The contribution of this research is the definition of the
parameters and the rule function used in developing the simulation. Several
stages were carried out to achieve the research objective: defining the CA
model, collecting datasets and constructing the model, finding a probabilistic
function as the CA rule, predicting the spread of disease using the proposed
model, and evaluating the model. The results of this analysis would serve as
basic information for disease prevention.
The main problem of this research is how to find a function that represents
a proper CA rule; the HMM was chosen as the method for determining this
function. Several HMM steps were carried out to find the function: counting the
number of state transitions, counting the number of emissions, determining the
Transition Probabilistic Matrix, and determining the Emission Probabilistic
Matrix. Figure 7 shows all the stages of this study.
Defining a Spreading Pattern Model Based on CA Model
There are four steps for defining the CA model: defining a cellular
space, defining the neighborhood used in the cellular space, defining the
criteria of the possible state values, and determining the probability values of
the function f that represents the CA rule. Function f is required to obtain a
spreading pattern of disease in the CA model.
Defining a Cellular Space and Neighborhood
First, a map of the regions was transformed into rectangular polygons
Ci. Next, the polygons were arranged in the cellular space as cells of the same
size. Each cell holds non-uniform objects and describes the number of disease
cases that occurred in the region. The index i of cell Ci identifies the cell in
the cellular space. Each cell represents a region according to the id of the
cell. The position of a region in the cellular space was determined using a grid
region rule based on a distance approach (Figure 5). This approach was conducted
by calculating the distance to the surrounding regions. The regions at the
shortest distance were taken as the Von Neumann neighborhood. The distance was
calculated using the Manhattan distance formula (Bajracharya and Duboz 2013) as
follows:
$$dist(T_1, T_2) = \sum_{i=1}^{n} |a_i - b_i| \qquad (8)$$
where T1 and T2 are the two regions whose distance is calculated using
equation (8), a represents the position of T1, and b represents the position of
T2. The results of equation (8) were converted into discrete form. Two regions
that share a direct boundary have a distance value of 0. Moreover, two regions
that have one region between them have a distance value of 1.
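As a small illustration of the distance rule described above, the following Python sketch computes the Manhattan distance between two positions and derives a discrete distance from it. The function names are hypothetical, and the conversion to discrete form (Manhattan distance minus one for positions given as grid coordinates) is an assumption chosen so that neighboring cells get 0 and cells separated by one cell get 1; the study does not spell out its exact discretisation.

```python
def manhattan_distance(a, b):
    """Sum of absolute coordinate differences between positions a and b."""
    if len(a) != len(b):
        raise ValueError("positions must have the same number of coordinates")
    return sum(abs(a_i - b_i) for a_i, b_i in zip(a, b))


def regions_between(cell_a, cell_b):
    """Discrete distance: 0 for cells sharing an edge, 1 for one cell in between.

    Assumption: positions are (row, column) grid coordinates, so the count of
    cells in between is taken as the Manhattan distance minus one.
    """
    return max(manhattan_distance(cell_a, cell_b) - 1, 0)


if __name__ == "__main__":
    print(manhattan_distance((0, 0), (1, 2)))
    print(regions_between((0, 0), (0, 1)), regions_between((0, 0), (0, 2)))
```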
Figure 7 Framework of the research (flowchart: Start; defining the CA model:
defining a cellular space, defining neighborhood, defining the criteria of state
values; data and model construction; finding the probabilistic function f as the
CA rule: counting the number of state transitions from the data, counting the
number of emissions from the data, determining the Transition Probabilistic
Matrix, determining the Emission Probabilistic Matrix, determining function f;
predicting the spread of disease; evaluating; End)
This research also defined a rule: if two or more regions have a direct
boundary with one other region, the total distance of every possible
neighborhood composition is counted. The neighborhood composition with the
minimum total distance is chosen as the composition of region positions in the
cellular space. Figure 8 shows an example in which there are two possibilities
for composing the neighborhood. The first composition has a total distance of 1,
while the second composition has a total distance of 0. In this case, the second
composition was chosen since it had the minimum total distance.
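The selection rule can be sketched in a few lines of Python. This is only one possible reading, stated as an assumption: the total distance of a composition is taken as the sum, over every pair of regions that share a border on the map, of the discrete distance between the cells assigned to them. The region ids, bordering pairs, and the two candidate layouts are hypothetical toy values and are not read from Figure 8; `total_distance` and `best_composition` are illustrative names.

```python
def total_distance(layout, bordering_pairs):
    """Sum of discrete grid distances over all pairs of regions that share a border.

    layout maps a region id to its (row, column) cell; a pair placed in
    edge-adjacent cells contributes 0, a pair one cell apart contributes 1.
    """
    total = 0
    for region_a, region_b in bordering_pairs:
        (row_a, col_a), (row_b, col_b) = layout[region_a], layout[region_b]
        manhattan = abs(row_a - row_b) + abs(col_a - col_b)
        total += max(manhattan - 1, 0)
    return total


def best_composition(layouts, bordering_pairs):
    """Return the candidate layout with the minimum total distance."""
    return min(layouts, key=lambda layout: total_distance(layout, bordering_pairs))


# Hypothetical example: four regions, three bordering pairs, two candidate layouts.
bordering_pairs = [(1, 2), (1, 3), (3, 4)]
first = {1: (0, 0), 2: (0, 1), 4: (1, 0), 3: (1, 1)}
second = {1: (0, 0), 2: (0, 1), 3: (1, 0), 4: (1, 1)}

if __name__ == "__main__":
    print(total_distance(first, bordering_pairs))
    print(total_distance(second, bordering_pairs))
    print(best_composition([first, second], bordering_pairs) == second)
```

Enumerating every arrangement of 16 regions is not practical this way, so in practice the candidate layouts would have to be narrowed down first.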
Figure 8 The possibility of composing a neighborhood (the first composition has
a total distance of 1; the second composition has a total distance of 0)
The number of cells in the cellular space does not always have to equal
the number of observed regions. For instance, we could define 20 or 25 cells for
the 16 observed regions by adding a definition of the boundary condition for the
regions that are not among the 16 observed regions (White et al. 2007). A
boundary condition applies when an observed cell does not have a complete set of
neighbors because it is located at a corner or on an edge of the cellular space.
In the proposed model, a cell has an incomplete neighborhood if $i - 4 \le 0$ or
$i - 1 \le 0$ or $i + 1 > 16$ or $i + 4 > 16$. In this study, we assumed null
boundary conditions for the proposed model. The cellular space was defined by
calculating the number of direct neighbors of each region in order to compose
the regions in a cellular space.
Defining a Set of States
Research related to the state changes of a cell in two dimensions has
been performed by Djatna and Morimoto (2008). In that research, the concept of
state change was used for selecting features. The state change was calculated
based on the change in the shape of the geometry that represented the effect of
two-dimensional rules applied to a pair of attributes (Djatna and Morimoto
2008). In the proposed model, the concept of state change was applied to
visualize the spreading pattern of disease. We defined state changes based on
the data content at each location. First, we grouped the values into a number of
categories and set a color for each category. Next, the state changes were seen
as cell color changes in the cellular space.
Data and Model Construction
Collecting the Dataset
In this research, a CA model was applied to DHF cases. Before conducting
the data collection, we studied the data characteristics related to DHF. The
spread of dengue fever is influenced by several factors, including the condition
of the environment, the behavior of the population, and the disease vector (in
this case the mosquito Aedes aegypti). Based on the literature study, we
identified variables/attributes that are important to consider as causative
factors of DHF. These variables/attributes are listed in Appendix A.
We decided to use a dataset collected from Dinas Kesehatan Kota Bogor
(DKK-Bogor), the Bogor City Health Office. The data were collected using an
interview technique. We interviewed the DKK-Bogor Data Officer on July 16, 2014.
DKK-Bogor observes the spread of DHF by collecting reports from each Puskesmas
(community health center) in Bogor city. This process reaches down to the
smallest government entity, called the Kelurahan. The interview result is shown
in Appendix B. In collecting the datasets, we took the following steps:
identifying the geographical study area, conducting a field study for data
collection, deciding on the sample used in this research, and determining the
source of the data. We also decided to focus on West Bogor, which is divided
into 16 regions.
Table 4 Number of Dengue cases in West Bogor in 2013
(Source: Dinas Kesehatan Kota Bogor)
No    Region
1     Menteng
2     Cilendek Timur
3     Cilendek Barat
4     Sindang Barang
5     Bubulak
6     Situgede
7     Margajaya
8     Balumbang Jaya
9     Semplak
10    Curug
11    Curug Mekar
12    Pasir Mulya
13    Loji
14    Gunung Batu
15    Pasir Jaya
16    Pasir Kuda
1
2
5
1
2
2
1
1
0
2
7
1
1
0
7
1
2
2
1
2
4
2
1
0
0
0
0
2
0
1
1
3
0
3
The number of DHF cases per period
3
4
5
6
7
8
9 10
2
1
0
2
6
0
0
1
0
0
1
1
4
0
0
0
3
0
2
1
4
0
4
0
4
0
1
2
6
0
2
2
4
0
1
0
3
0
0
0
0
0
0
1
0
0
0
0
0
2
0
1
0
0
0
0
0
0
1
0
0
0
0
0
1
0
1
0
0
0
1
0
2
3
3
0
1
0
1
0
3
0
1
0
0
0
0
0
0
0
0
0
2
0
0
0
1
0
1
0
0
0
0
0
1
2
0
1
4
0
1
0
0
1
1
0
0
0
0
0
1
1
0
0
1
0
1
0
11
2
1
2
0
0
0
0
0
3
3
2
0
6
1
0
2
12
0
0
0
0
0
0
0
3
1
3
0
0
0
1
0
1
The data attributes used in this research are “name of a region” and
“number of Dengue cases over 12 periods”. We used the dataset containing the
occurrences of DHF cases in West Bogor in 2013, shown in Table 4.
Figure 9 The map of regions in West Bogor
A Cellular Space Construction
In this research, 16 cells in a two-dimensional cellular space were
defined, representing the 16 regions of West Bogor (Figure 9). The number of
direct neighbors of each of the 16 regions is listed in Table 5. Next, the map
was transformed into 4×4 rectangular polygons as shown in Figure 10. The
polygons have the same size as the cells in the cellular space. Each cell holds
non-uniform objects and describes the number of dengue cases that occurred in
the region.
Figure 10 The proposed cellular space
The cellular space is defined as a two-dimensional space in which each
cell represents a region with a number of Dengue cases in each period. West
Bogor has 16 regions in total; thus, we defined 16 cells. Each cell contains
non-uniform objects describing the Dengue cases that occurred in a region during
a certain period.
Table 5 The number of direct neighborhoods of each region
Region            Neighborhoods                                                            Total
Situ Gede         Balumbang Jaya, Semplak, Bubulak                                         3
Balumbang Jaya    Margajaya, Bubulak, Situ Gede                                            3
Margajaya         Sindang Barang, Bubulak, Balumbang Jaya                                  3
Bubulak           Situ Gede, Balumbang Jaya, Margajaya, Sindang Barang, Semplak            5
Semplak           Curug, Curug Mekar, Cilendek Barat, Sindang Barang, Bubulak, Situ Gede   6
Curug             Curug Mekar, Semplak                                                     2
Curug Mekar       Curug, Semplak, Cilendek Barat, Cilendek Timur                           4
Sindang Barang    Margajaya, Bubulak, Semplak, Cilendek Barat, Menteng, Loji               6
Cilendek Barat    Sindang Barang, Menteng, Cilendek Timur, Curug Mekar, Semplak            5
Cilendek Timur    Menteng, Curug Mekar, Cilendek Barat                                     3
Loji              Sindang Barang, Menteng, Gunung Batu                                     3
Menteng           Cilendek Barat, Cilendek Timur, Loji, Sindang Barang, Gunung Batu        5
Gunung Batu       Loji, Menteng, Pasir Mulya, Pasir Jaya                                   4
Pasir Mulya       Gunung Batu, Pasir Jaya, Pasir Kuda                                      3
Pasir Jaya        Pasir Mulya, Gunung Batu, Pasir Kuda                                     3
Pasir Kuda        Pasir Mulya, Pasir Jaya                                                  2
The most important step was to find all possible compositions of polygons
that matched the map of West Bogor, placing the regions into a cellular space
based on the data in Table 5. Next, the total distances of the possible
compositions were calculated using equation (8) to obtain the minimum total
distance.
In data construction, each cell was defined as an element of a
one-dimensional array variable $X = \{X_i \mid i = 1, 2, \ldots, 16\}$. Variable
$X_i$ represents a cell as shown in Figure 10.
Neighborhood
Based on the data in Table 5, no region has 8 directly bordering
neighbors; therefore the 4-cell Von Neumann neighborhood was used in the
proposed model. The Von Neumann neighborhood is a collection of five cells in
which the middle cell is the focus of attention, as shown in Figure 11 (White
2009). The remaining cells are the cells that affect the state change of the
middle cell in subsequent periods. In the proposed model, the neighborhood cells
are represented as a one-dimensional array variable,
$V = \{V_j \mid j = 0, 1, 2, 3, 4\}$, arranged spatially as shown in Figure 11.
        V1
    V4  V0  V2
        V3
Figure 11 Von Neumann-Neighborhoods
The neighborhood frame indicated in Figure 12 moved to each cell in the
cellular space. At each move, the initial state conditions of the cells were
checked. The neighborhood frame moved in the cellular space according to the
equation:
$$V_0 = X_i;\quad V_1 = X_{i-4};\quad V_2 = X_{i+1};\quad V_3 = X_{i+4};\quad V_4 = X_{i-1} \qquad (9)$$
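The moving frame can be illustrated with a short Python sketch. It assumes that cells are indexed 1 to 16 and that V1 to V4 are the cells at index offsets -4, +1, +4 and -1 from the focus cell, and it marks a neighbor as missing only when its index falls outside 1..16, as in the boundary conditions stated for the proposed model. Using `None` for a missing neighbor is an assumption made here to represent the null boundary, and `neighborhood` is a hypothetical helper name.

```python
GRID_CELLS = 16   # 4 x 4 cellular space, cells indexed 1..16
ROW_LENGTH = 4

# Index offsets for V0 (focus cell), V1, V2, V3 and V4.
FRAME_OFFSETS = [0, -ROW_LENGTH, +1, +ROW_LENGTH, -1]


def neighborhood(i, states):
    """Return [V0, V1, V2, V3, V4] for cell i.

    states maps a cell index (1..16) to its state. A neighbor whose index is
    outside 1..16 is returned as None (null boundary, an assumption). Only the
    index range is checked, following the conditions given in the text; row
    ends are not treated as boundaries here.
    """
    frame = []
    for offset in FRAME_OFFSETS:
        j = i + offset
        frame.append(states[j] if 1 <= j <= GRID_CELLS else None)
    return frame


if __name__ == "__main__":
    # Hypothetical states: each cell simply stores its own index.
    states = {k: k for k in range(1, GRID_CELLS + 1)}
    for cell in (1, 6, 16):
        print(cell, neighborhood(cell, states))
```

A stricter geometric reading would also treat the first and last cell of each row as edge cells; that variant is not shown because the text does not state it.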
Figure 12 The neighborhood frame
A Set of States
In the proposed model, we defined state changes based on the data of
Dengue cases in West Bogor in 2013. First, we grouped the values into four
categories and set a color for each category. Next, the state changes were seen
as cell color changes in the cellular space. We defined four state criteria as
$S = \{S_1, S_2, S_3, S_4\}$. The four colors and their state criteria are shown
in Table 6. In data construction, the state values were represented by an array
variable $S = \{S_1, S_2, S_3, S_4\}$.
Table 6 State definition of infected area
State    State definition                                      State colour
S1       All people have recovered, or no one was infected
S2       1-2 people were infected
S3       3-5 people were infected
S4       More than five people were infected
The construction of the other data is shown in Table 7. In this
research, Excel spreadsheets were used as a tool to build a simulation model for
finding the probabilistic function and the spreading pattern. Moreover, the
SciPy module in Python 3.4
(https://github.com/hmcuesta/PDA_Book/tree/master/Chapter9) was used as a tool
for evaluating the proposed model.
Finding a Probabilistic Function
The next step is to determine a function f that represents the CA rule
based on the defined parameters. In detail, our method for defining the CA model
is as follows. Firstly, from the dataset consisting of 16 regions, we defined
the two-dimensional space, put each region into one cell, and set an index for
each cell; we then defined an array variable to represent the 16 indexed cells.
Next, we put the number of cases for each region (number of infected) into an
array variable, in which the data of one period were stored as one time-step.
Finally, using the set of state criteria, we replaced the case data with state
data.
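The last step, replacing case counts with states, can be sketched as follows in Python. The thresholds follow the state definitions of Table 6 (no cases, 1-2, 3-5, more than five); the function names and the sample counts are hypothetical and are not taken from Table 4.

```python
def classify_state(cases):
    """Map a number of Dengue cases in one region and period to a state label."""
    if cases < 0:
        raise ValueError("the number of cases cannot be negative")
    if cases == 0:
        return "S1"   # no one infected, or all have recovered
    if cases <= 2:
        return "S2"   # 1-2 people infected
    if cases <= 5:
        return "S3"   # 3-5 people infected
    return "S4"       # more than five people infected


def cases_to_states(cases_per_period):
    """Replace every case count by its state, one list of cells per time-step."""
    return [[classify_state(c) for c in period] for period in cases_per_period]


if __name__ == "__main__":
    # Hypothetical counts for three cells over two periods.
    print(cases_to_states([[1, 0, 7], [3, 2, 0]]))
```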
The main problem of this research is how to find the function that
represents a proper CA rule. Many methods for finding function f as the rule of
a CA model have been proposed; in this research we used HMM, a method that has
not yet been used by researchers for this purpose. In HMM, the state changes are
described as a state transition diagram. By ignoring the death and birth
factors, and by assuming that the probability of a cell being infected is
affected by the surrounding cells that are considered to have an effective
influence, the HMM approach is suitable for determining a probabilistic function
f.
The CA characteristic can be represented as a Markov process (Knutson 2011).
Since the dataset could be classified as a time series dataset, it was
appropriate to use a probabilistic function found using HMM. HMM is a
probabilistic model that is suitable for solving problems related to
sequential-temporal data (Peng et al. 2011). In the proposed CA model, the
change of a cell from one state to another can be described as a State
Transition Diagram (Figure 13). The State Transition Diagram can be expressed in
the HMM as T. The state change probabilities of a certain area affected by its
neighborhoods, that is, the emission probabilities, can be expressed in the HMM
as E.
Table 7 List of the data construction
Variable    Description
per         Variable of period.
SCij        Two-dimensional array variable for the number of state changes of V0 from Si to Sj from one period to the next. i = 1..4; j = 1..4
SSi         One-dimensional array variable for the sum of state changes of V0 from Si. i = 1..4
Tij         Two-dimensional array variable for the transition probability matrix of V0. $T_{ij} = SC_{ij} / SS_i$; i = 1..4; j = 1..4
V1ij        Two-dimensional array variable for the number of V0 in Si state condition when V1 is in Sj state condition. i = 1..4; j = 1..4
V2ij        Two-dimensional array variable for the number of V0 in Si state condition when V2 is in Sj state condition. i = 1..4; j = 1..4
V3ij        Two-dimensional array variable for the number of V0 in Si state condition when V3 is in Sj state condition. i = 1..4; j = 1..4
V4ij        Two-dimensional array variable for the number of V0 in Si state condition when V4 is in Sj state condition. i = 1..4; j = 1..4
SV1i        One-dimensional array variable for the sum of V0 in Si state condition when V1 is in any state. i = 1..4
SV2i        One-dimensional array variable for the sum of V0 in Si state condition when V2 is in any state. i = 1..4
SV3i        One-dimensional array variable for the sum of V0 in Si state condition when V3 is in any state. i = 1..4
SV4i        One-dimensional array variable for the sum of V0 in Si state condition when V4 is in any state. i = 1..4
E1ij        Two-dimensional array variable for the emission probability matrix of V0 affected by V1. $E1_{ij} = V1_{ij} / SV1_i$; i = 1..4; j = 1..4
E2ij        Two-dimensional array variable for the emission probability matrix of V0 affected by V2. $E2_{ij} = V2_{ij} / SV2_i$; i = 1..4; j = 1..4
E3ij        Two-dimensional array variable for the emission probability matrix of V0 affected by V3. $E3_{ij} = V3_{ij} / SV3_i$; i = 1..4; j = 1..4
E4ij        Two-dimensional array variable for the emission probability matrix of V0 affected by V4. $E4_{ij} = V4_{ij} / SV4_i$; i = 1..4; j = 1..4
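The counting and normalising steps listed in Table 7 can be sketched in Python as shown below. This is an illustrative sketch rather than the spreadsheet procedure used in the study: states S1..S4 are stored as indices 0..3, the sample series of three cells over three periods is hypothetical, and the emission count assumes that the focus cell and its neighbor are read in the same period. The names `count_transitions`, `count_emissions`, `normalise_rows` and `right_neighbour` are hypothetical.

```python
N_STATES = 4  # S1..S4 are stored as indices 0..3


def count_transitions(state_series):
    """SC[i][j]: how often a cell moved from state i to state j between periods."""
    counts = [[0] * N_STATES for _ in range(N_STATES)]
    for before, after in zip(state_series, state_series[1:]):
        for state_from, state_to in zip(before, after):
            counts[state_from][state_to] += 1
    return counts


def count_emissions(state_series, neighbour_of):
    """V[i][j]: how often a cell was in state i while its neighbour was in state j.

    Assumption: both states are read in the same period. neighbour_of(cell)
    returns the neighbour's position in the list, or None at the boundary.
    """
    counts = [[0] * N_STATES for _ in range(N_STATES)]
    for period in state_series:
        for cell, own_state in enumerate(period):
            neighbour = neighbour_of(cell)
            if neighbour is not None:
                counts[own_state][period[neighbour]] += 1
    return counts


def normalise_rows(counts):
    """Divide each row by its sum (SS_i or SVk_i); rows without data stay at zero."""
    result = []
    for row in counts:
        row_sum = sum(row)
        result.append([value / row_sum if row_sum else 0.0 for value in row])
    return result


def right_neighbour(cell, n_cells=3):
    """Hypothetical one-dimensional neighbour rule used only for this example."""
    return cell + 1 if cell + 1 < n_cells else None


# Hypothetical data: 3 cells observed over 3 periods.
state_series = [[0, 1, 1], [1, 1, 2], [1, 2, 3]]
T = normalise_rows(count_transitions(state_series))
E_right = normalise_rows(count_emissions(state_series, right_neighbour))

if __name__ == "__main__":
    print(T)
    print(E_right)
```

In the proposed model the same counting would be repeated for each of the four neighbors V1 to V4, giving E1 to E4.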
Figure 13 State Transition Diagram of proposed model
Table 8.
The use of CA models in many problems such as in epidemic process
analysis and spatiotemporal pattern analysis showed the powerful of CA in
solving the problem that related to the spatial pattern. CA is one of the dynamic
system approaches that implementing discretization of time and space. CA
consists of cells, called cellular space, a local connection to other cells, and
boundary conditions. Each cell, representing a state, could change at every timestep using local transmission rules which generate a new state based on the
previous state of the cell and its neighborhood. Therefore, the concept of
neighborhood is very important. The other important aspect that determines the
accuracy of CA model is the trasmition rule f. This rule is able to be represented
as a deterministic or probabilistic function.
In this research, we proposed a new approach in developing a spreading
pattern model of Dengue Hemorrhagic Fever (DHF) based on CA. We especially
focused on determining a probabilistic function using Hidden Markov Model
(HMM) which has not been used by researchers yet. HMM is a probabilistic
model that is suitable for solving the problem related to the sequential-temporal
data. To show the effectiveness of the proposed model, we implemented this
approach to the DHF case. We used dataset from a limited area such as West
Bogor in the period of 2013 and defined the state criteria from these dataset.
Moreover, we only considered an infective state which was dedicated particular
attention to the spatial distribution of infected areas. The evaluation was
conducted by comparing the results of data simulation of the proposed model to
that of one yielded by the Susceptible-Infected-Recovered (SIR) model, as a
classical approach. The evaluation result showed that the CA model was capable
of generating patterns that similar to the patterns generated by SIR models with a
similarities value of 0.95.
Keywords: Cellular Automata, Dengue Fever, HMM, Neighborhood, SIR
RINGKASAN
PUSPA EOSINA HOSEN. Pemodelan Celular Automata untuk Visualisasi dan
Prediksi Pola Penyebaran Penyakit Demam Berdarah. Supervised by TAUFIK
DJATNA and HELDA KHUSUN.
Pemodelan adalah penyederhaan dari sebuah masalah atau fenomena di
dunia nyata yang bertujuan untuk mempelajari dan memahaminya. Pada
epidemiologi, pemodelan pada umumnya digunakan untuk melihat proses
epidemik. Oleh karena itu, visualisasi diperlukan sebagai langkah awal, antara lain
pada analisis epidemiologi untuk memahami karakter spasial dari dataset,
mengidentifikasi pola penyebaran penyakit pada area geografis tertentu, dan
memprediksi pola penyebaran penyakit pada periode selanjutnya. Selama ini,
Ordinary Diferential Equation (ODE) atau statistik, sebagai model yang paling
umum digunakan pada analisis epidemiologi, tidak dapat mengelaborasi proses
dari pola spasial dan interaksinya, seperti pada kasus visualisasi dan prediksi
penyebaran penyakit. Model Cellular Automata (CA) diperkenalkan sebagai
model yang dapat mengatasi keterbatasan dari model ODE dan statistik tersebut.
Penggunaan model CA pada banyak kasus, seperti analisis proses
epidemik dan analisis pola spatio-temporal memperlihatkan CA cukup baik dalam
memecahkan permalahan yang berkaitan dengan pola spasial. Model CA adalah
pendekatan sistem dinamik yang mengimplementasikan konsep diskritasi ruang
dan waktu. Model ini terdiri atas sel-sel yang disebar pada ruang selular, sebuah
koneksi lokal yang menghubungkan sel yang satu dengan sel yang lain, serta
kondisi batas. Setiap sel berada pada nilai state tertentu yang dapat berubah ke
nilai state lain setiap waktu. Perubahan state dipengaruhi oleh state sel tersebut
dan lingkungannya (neighborhood) pada periode sebelumnya. Oleh karena itu,
konsep neighborhood menjadi penting. Aspek penting lainnya yang menentukan
akurasi model CA adalah aturan trasmisi perubahan state untuk setiap sel yang
merupakan sebuah fungsi yang bersifat probabilistik.
Pada penelitian ini, diusulkan suatu pendekatan dalam mengembangkan
model penyebaran penyakit Demam Berdarah Dengue (DBD) berbasis CA.
Penelitian ini difokuskan pada penentuan fungsi probabilisitik, sebagai CA rule,
menggunakan pendekatan Hidden Markov Model (HMM) yang belum pernah
digunakan oleh peneliti-peneliti sebelumnya untuk model CA. HMM adalah
model probabilistik yang sesuai untuk memecahkan permasalahan data sekuensial
temporal. Untuk memperlihatkan efektivitas model yang diusulkan, pendekatan
ini diimplementasikan pada kasus DBD di wilayah Bogor Barat tahun 2013. Dari
dataset ini didefinisikan beberapa kriteria state untuk memodelkan proses spasial
kondisi terinfeksi (infective state) kelurahan-kelurahan di Bogor Barat. Evaluasi
dilakukan dengan membandingkan hasil simulasi data dari model yang diusulkan
terhadap hasil simulasi data yang diperoleh dari model Susceptible-InfectedRecovered (SIR), yaitu salah satu pendekatan klasik yang paling sering digunakan
dalam bidang epidemiologi karena sudah diakui tingkat keakuratannya. Hasil
evaluasi memperlihatkan bahwa model CA dapat menghasilkan pola yang serupa
dengan pola yang dihasilkan model SIR dengan nilai similaritas sebesar 0.95.
Kata kunci: Cellular Automata, Dengue Fever, HMM, Neighborhood, SIR.
© Copyright of This Thesis Belongs to Bogor Agricultural
University (IPB), 2015
All Rights Reserved
Prohibited citing in part or whole of this paper without include or citing sources.
The quotation is only for educational purposes, research, scientific writing, report
writing, criticism, or review of a problem; and citations are not detrimental to the
interests of IPB.
Prohibited announced and reproduce part or all of this publication in any form
without permission IPB.
A CELLULAR AUTOMATA MODELING FOR VISUALIZING
AND PREDICTING SPREADING PATTERNS
OF DENGUE FEVER
PUSPA EOSINA HOSEN
Thesis
as partial fulfillment of the requirements for the degree of
Master of Computer Science
in
the Department of Computer Science
GRADUATE SCHOOL
BOGOR AGRICULTURAL UNIVERSITY
BOGOR
2015
Non-committee examiner on thesis examination: Dr. Arya Kekalih, MTI
PREFACE
My deepest gratitude goes first and foremost to my supervisor Dr. Eng.
Taufik Djatna, S.Tp and Helda Khusun, S.Tp, M.Sc., PhD who provided me an
excellent working environment to finish my Master study. All I have learned from
them will become priceless treasure throughout my career. I wish to express my
gratitude to my thesis examiner, Dr. Aria Kekalih, M.T.I and Toto Haryanto,
S. Kom., M. Kom. as moderator in my final defence, for their inspiring and
invaluable advice.
I am thankful to all lectures and staff of Computer Science Department, all
my friends in Computer Science. I am lucky to have all of you and study in a
helpful environment, especially for Riva, Halimah, Husnul, Kana, Tengku Khairil,
Pungki, Syaif Usman, Luky, Irma, Peter, Akbar, Pizaini, Rake, and others who
help and encourage me. I would like to thank all members of Dr. Eng. Taufik
Djatna Lab (Laboratory Computer of Agro-industrial Technology Department),
Aisah, Hety, Novi, Zaki Hadi, Yogha, Rohmah, Yudishtira, Ikhsan and others. I
am glad to be a part of group members. Special thanks go to UIKA which
financially supported my research.
Last but not least, I would like to thank my beloved parents Jazib Hosen
alm. and Isnaniar, my lovely hushband Wisnu Ananta Kusuma, and my sixteenyears-old son Bara Samudra Syuhada and my brother and sisters Radian
Zarathustra, Jaziar Radianti, Ionia Veritawati, Dewi Ramadani, Farida
Candrasekar for their endless love and warm support all the way. Before all and
after all the man thanks should be to the Almighty God, Allah Subhanallahu Wa
ta’ala.
Hopefully this research would be useful.
Bogor, August 2015
Puspa Eosina Hosen
TABLE OF CONTENTS
LIST OF TABLES
vii
LIST OF FIGURES
vii
LIST OF APPENDIX
vii
GLOSARY
viii
1 INTRODUCTION
Background
Problem Statement
Objectives
Benefits
Boundaries
1
1
2
2
3
3
2 LITERATURE REVIEW
Epidemiology
Dengue Hemorrhagic Fever (DHF)
SIR Model
Model Cellular Automata (CA)
Composing Neighborhoods
Hidden Markov Model
4
4
4
5
5
7
7
3 METHODOLOGY
Research Framework
Defining a Spreading Pattern Model Based on CA Model
Defining a Cellular Space and Neighborhood
10
10
10
10
Defining a Set of States
12
Data and Model Construction
Collecting the Dataset
13
13
A Cellular Space Construction
14
Neighborhood
15
A Set of States
16
Finding a Probabilistic Function
Prediction of Pattern of the Disease Spread
Verification and Validation
4 RESULTS AND DISCUSSION
The Spreading Pattern Model base on CA
The Cellular Space
The Probabilistic Function as Rule on CA Model
Prediction Spreading Process of DHF using The Proposed Model
Evaluation
17
21
22
23
23
23
23
26
27
5 CONCLUSION AND RECOMMENDATION
Conclusion
Recommendation
27
27
27
REFERENCES
30
Appendix A List the variables/attributes of DHF factors
32
Appendix B. The interview result
33
Appendix C. Form of The table store of counter result of states changes
37
Appendix D List of state change affected by neighborhood that should be
counted
38
Appendix E The table store of counter result of states changes affected by
neighborhood
40
BIOGRAPHY
41
LIST OF TABLES
The Transition Probability values
The Emission Probability values
The Prior Probability values
1
2
3
4
5
6
7
8
9
10
11
12
13
8
9
9
Number of Dengue cases in West Bogor in 2013
13
The number of direct neighborhood of each region
State definition of infected area
List of the data construction
Transition Probabilities Matrix
Emission Probabilities Matrix
List of a state change
The cells representing the region
The number of state change based on the data cases DHF in West Bogor
The counter result of states changes affected by neighborhood
15
17
18
19
19
20
23
24
24
LIST OF FIGURE
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
The triangle cart of the epidemiology elements
SIR model
Two-dimensional cellular space
Neighborhood
The illustration of determining neighborhood technique
4
5
6
6
7
State Transition Diagram
Frame work of the research
8
11
The possibility of composing neighborhood
The map of regions in West Bogor
12
14
The proposed cellular space
Von Neumann neighborhood
The neighborhood frame
State Transition Diagram of proposed model
The illustration of algorithm of CA model
The Prediction Results of Dengue Spreading Pattern on CA model
The Tendency of Graph Number of Infected Area in Bogor Barat
14
16
16
19
21
27
27
LIST OF APPENDIX
A
B
C
D
E
List the variables/attributes of DHF factors
The survey forms
Form of the table store of counter result of states changes
List of state change affected by neighborhood that should be counted
The table store of counter result of states changes affected by neighborhood
30
33
37
38
40
GLOSARY
Aedes aegypti is mosquito that could spread dengue fever, chikungunya, and
yellow fever viruses, and other diseases
Emission probability is conditional distribution of observations given states
Epidemic is the rapid spread of infectious disease to a large number of people in a
given population within a short period of time, usually two weeks or less
Epidemiology is the science that studies the patterns, causes, and effects of health
and disease conditions in defined populations. It is the cornerstone of public
health, and informs policy decisions and evidence-based practice by
identifying risk factors for disease and targets for preventive healthcare.
Ergodic HMM is one for wich underlying Marcov chain is ergodic, or at least is
irreducible and admits a unique stationary distribution.
Neighborhood is is a geographically localized community within a larger city,
town, suburb or rural area
Spatiotemporal is the existing in both space and time.
Transition probability is the probabilities associated with various state changes
1 INTRODUCTION
Background
Modeling aims to study and understand phenomena by simplifying a real
problem. In epidemiology, a system modeling approach is commonly used for
viewing the epidemic process (Cuesta 2013). Most models for epidemic
simulation are based on Ordinary Differential Equations (ODE) or statistical
models (Pfeiffer 2008, White et al. 2007, Nishiura 2006). Moreover, visualization
is required as the first step in an epidemiological analysis to understand the spatial
characteristics of a dataset (Pfeiffer 2008). Pfeiffer (2008) also mentioned that
visualization is needed for identifying the epidemiological pattern of a disease in a
given geographical area, predicting the spreading pattern of the disease in the next
period, and creating awareness among the target stakeholders based on the
prediction results, which in turn helps the clinical management of the disease.
Unfortunately, ODE or statistical models are unable to capture spatial patterns and
interactions, such as those needed in the visualization and prediction of disease
spread (White et al. 2007).
To overcome these limitations, researchers have used Cellular Automata
(CA) models to involve time and space in the analysis of the epidemic process
(Santos et al. 2011). Several studies have been conducted, such as developing a
mathematical model of disease spread and its simulation using CA (White 2009),
analyzing some scenarios of disease spread (Lopez et al. 2013), applying the CA
approach to the Susceptible-Infective-Recovered (SIR) model of disease spread by
considering birth and death factors and the changes of rules for each state in the
dynamic CA (Athithan et al. 2014), and analyzing the complex spatiotemporal
patterns observed in the transmission of vector-borne infectious diseases
(Santos et al. 2009).
Basically, CA is one of the dynamic system approaches that implement the
discretization of time and space (White et al. 2007, Santos et al. 2011, Elsayed et
al. 2013). A CA consists of cells, which together form the cellular space, local
connections to other cells, and boundary conditions (White 2009). Each cell holds
a state that can change at every time-step according to local transition rules, which
generate a new state based on the previous state of the cell and its neighborhood.
Therefore, the concept of neighborhood is very important. Santos et al. (2011)
showed the effects of neighborhood structures on disease spreading by using the
Susceptible-Infected (SI) epidemic CA model. Moreover, Hagoort et al. (2008)
described the neighborhood rules that determine how the model interacts.
The other important aspect that determines the state changes of a CA model
is the transition rule f. This rule can be represented as a deterministic or
probabilistic function (Santos et al. 2009, Elsayed et al. 2013). Many methods for
finding the function f as the rule of the CA model have been introduced, such as
using the Markov Chain (Peng et al. 2011), the differential equations of the
classical model (German et al. 2011), and the Genetic Algorithm (Mitchell 1996).
In this research, the Hidden Markov Model (HMM), which had not yet been used
by researchers for this purpose, was employed to find a probabilistic function that
represents the CA transition rule. HMM is a probabilistic model that is suitable for
solving problems related to sequential-temporal data (Dugat 1996). To show the
effectiveness of the proposed model, this approach was applied to the
Dengue Fever case.
The Dengue Fever case was chosen because it is one of the deadly and
infectious pandemic diseases in Indonesia. This disease, also called
Dengue Hemorrhagic Fever (DHF), is caused by the Dengue virus and is
transmitted by the Aedes aegypti mosquito as a vector. Several studies on
monitoring DHF in Indonesia have been conducted, such as those by Saragi (2011)
and Octora (2010), which aimed to forecast the trend of dengue outbreaks.
Saragi (2011) used the Time Series method to show the trend of dengue outbreaks.
The study predicted the number of dengue fever patients for the next four years
based on DHF patient data from the province of North Sumatra from 2005 to 2009.
Octora (2010) compared the Autoregressive Integrated Moving Average (ARIMA)
and the Winter approach to predict the number of DHF cases in the next six
months. That research used DHF case data from Surabaya from January 2005 to
June 2010 and applied four models of the Winter method and three models of the
ARIMA method.
This study explains how to develop a spreading pattern model of DHF based
on a CA model, which was used for visualizing and predicting the spreading
pattern of DHF. The study focused especially on determining a probabilistic
function using HMM with a dataset from a limited area, namely West Bogor, in the
period of 2013. This dataset was used for defining the state criteria. Moreover, this
study only considered the infective state, with particular attention to the spatial
distribution of infected areas. The evaluation was conducted by comparing the
results of the proposed model with those yielded by the SIR method, as a
classical approach.
Problem Statement
Following the background above, the problems addressed in this study can
be formulated as follows:
1. How to develop a visualization model of the spreading pattern of Dengue
Fever.
2. How to predict the spreading process of Dengue Fever.
3. How to evaluate the performance of the proposed model.
This research did not use real DHF case data for the following year
because those data were not yet complete when data collection was conducted.
Therefore, simulated data were used as the spreading initialization for viewing and
predicting the spread of DHF with the proposed model. The simulated data were
generated randomly with a John von Neumann random generator based on a CA
rule. To evaluate the model, the proposed model was compared with a popular and
accurate existing prediction model, namely SIR.
Objectives
The objectives of this research are:
1. To develop a visualization model of the spreading pattern of DHF based on
the CA approach supported by HMM.
2. To predict the spreading process of DHF.
3. To evaluate the performance of the model by comparing the results of the
obtained model with those yielded by the SIR method, as a classical approach.
Benefits
The contribution of this study is to provide information for researchers in
epidemiology and for the government, especially for units under the Department
of Public Health, in order to create recommendations for controlling and
preventing the spread of Dengue disease.
Boundaries
This study only considered the dataset containing the DHF cases that
occurred in West Bogor in 2013. The spread of a disease is commonly affected by
several factors, including birth, death, the density of mosquitoes, and population
movement. In the case of DHF, the factors of birth and death could be ignored
since the number of cases is quite small compared to the size of the population.
However, even a single occurrence of DHF has to be handled. In addition, in the
development of the model, the definition of Susceptible was changed from the
number of people to the number of infected regions. Therefore, it was assumed
that population movement did not have a significant influence on the spread of
DHF, and this factor could be ignored. The other factor that could be considered is
the density of mosquitoes. However, in this research, such data could not be
provided yet. Thus, by only considering the existing case data for each period in
every region, the HMM was chosen as a suitable method for determining a
function that represents the CA rule, assuming that at the beginning of the period
all possible state values are equally probable as the initial state of a cell. Moreover,
this study uses the Von Neumann neighborhood with r = 1, which represents the
concept of an area being affected by the state changes of its surrounding areas.
2 LITERATURE REVIEW
Epidemiology
Epidemiology is the field of study that focuses on exploring the
causative factors of disease and the distribution of health in a certain region
(Cuesta 2013). Cuesta (2013) stated that this study is very important since a
plague can harm people and cause economic losses. Cuesta (2013) drew the
elements of epidemiology as shown in Figure 1.
Figure 1 The triangle chart of the epidemiology elements: Agent, Population, Environment, and Time/Period (Cuesta 2013)
An epidemic in a certain region is affected by basic elements that are
related to each other, such as pathogens and the susceptible population. In Figure 1,
pathogens are represented by Agent. Pathogens spread in a certain
population. This spread of disease is related to the behavior of the population
in a certain environment. Moreover, the environment is defined as the
external conditions that may cause the disease to spread. Some environmental
factors are geography, demography, climate, and social customs. The interaction
among all elements, namely Agent, Population, and Environment, over a range of
periods describes a seasonal disease.
Dengue Hemorrhagic Fever (DHF)
DHF is one of the deadly epidemic diseases that permanently exist in a
certain region with a certain population (Cuesta 2013). DHF is caused by the
dengue virus, which is transmitted to humans by Aedes aegypti as a vector through
its bite (Candra 2010). According to Candra (2010), the number of dengue cases
has never decreased and even tends to increase, especially in the tropics and
subtropics. DHF can infect people of any age, especially those who are less active.
SIR Model
The most common model used in epidemiology to describe an infectious
disease in a population is the SIR model, shown as a state diagram
in Figure 2 (Cuesta 2013).
Figure 2 SIR model (Cuesta 2013)
In his book, Cuesta (2013) described the SIR model as a state diagram that
consists of three states: Susceptible (S), Infected (I), and Recovered (R).
State S represents members of a population who are at risk of becoming infected.
State S interacts with members of the population who are infected, I. There are two
possible conditions for an individual in state I. In the first condition, the
individual remains infected during the period of infection, indicated in Figure
2 by an arrow pointing back to the state itself. In the second condition, the
individual recovers, R.
The SIR model is solved mathematically using Ordinary Differential
Equations (ODE). The ODE system of the SIR model is as follows:
$$\frac{dS}{dt} = -\beta S I \tag{1}$$
$$\frac{dI}{dt} = \beta S I - \gamma I \tag{2}$$
$$\frac{dR}{dt} = \gamma I \tag{3}$$
where S = number of susceptible, I = number of infectious, and R = number of
recovered. Here $\beta$ represents the transmission probability of the disease, while
$\gamma$ is determined by the period of infection (it is the rate at which infected
individuals recover).
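As an illustration of how equations (1) to (3) behave, the following is a minimal sketch that integrates them with the forward Euler method in plain Python. The parameter names beta and gamma, the step size, and all numeric values are assumptions made for this example; they are not taken from the dataset or from the evaluation tool used in this research.

```python
def simulate_sir(s0, i0, r0, beta, gamma, dt, steps):
    """Integrate the SIR equations (1)-(3) with the forward Euler method.

    beta  : assumed transmission parameter
    gamma : assumed recovery rate
    """
    s, i, r = float(s0), float(i0), float(r0)
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i * dt  # beta * S * I term of (1) and (2)
        recoveries = gamma * i * dt         # gamma * I term of (2) and (3)
        s -= new_infections
        i += new_infections - recoveries
        r += recoveries
        history.append((s, i, r))
    return history


# Hypothetical run: fractions of a population, 1% initially infected.
trajectory = simulate_sir(s0=0.99, i0=0.01, r0=0.0,
                          beta=0.5, gamma=0.1, dt=0.1, steps=1000)
final_s, final_i, final_r = trajectory[-1]
```

Because every quantity removed from one compartment is added to another, S + I + R stays constant throughout the run, which is a quick sanity check for any implementation of the model.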
Model Cellular Automata (CA)
A Cellular Automaton (CA) is a discrete model consisting of points or
identical cells, each in one of a finite number of possible states, that change
according to a local transition rule at each time-step (White et al. 2007). Cells are
arranged uniformly in a cellular space that can be one-dimensional,
two-dimensional or three-dimensional. The state of a cell at the next time, t+1,
depends on the states of the surrounding cells, called its neighborhood, at time t.
Mathematically the CA model is defined as a 4-tuple (C, S, V, f). C represents a
cellular space. S represents a set of possible state values for each cell in the
cellular space. V is a set of neighborhoods around a focus cell. Function f defines
a local transition function that represents an update rule for each state change of
each cell (White 2009).
A CA model represents the observed data objects as a grid of cells in
which the cells in a neighborhood influence each other (Maeda 2006). In that
study, Maeda (2006) defined that, in the next period, a cell (initial state) that
interacts with other cells in a neighborhood changes into a new state
(next state). The way in which a neighborhood affects a cell (from the initial state
to the next state) is called the CA rule. The two-dimensional CA is defined as a
model of a discrete dynamic system with objects regularly spread in a
two-dimensional space or coordinate space (Figure 3), in which each cell is
assigned an initial state (White 2009). In the case of a two-dimensional CA, there
are two basic forms of neighborhood: the Von Neumann neighborhood and the
Moore neighborhood. The Von Neumann neighborhood with r = 1 has a size of 5
and consists of a center cell and 4 neighboring cells: above, below, left, and right
(Figure 4 (a)). The Moore neighborhood with r = 1 has a size of 9 and consists of
a center cell and the 8 cells adjacent to it (Figure 4 (b)).
Figure 3 Two-dimensional cellular space (Elsayed et al. 2013)
Figure 4 (a) Von Neumann neighborhood and (b) Moore neighborhood (Elsayed et al. 2013)
The extent of a neighborhood is determined by the radius parameter (r), the
distance from a cell to the farthest neighbor that may affect its state change.
The size of a Moore neighborhood can be calculated with the
following equation (Maeda 2006):
$$n = (2r + 1)^2 \tag{4}$$
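The neighborhood sizes quoted above can be checked with a short sketch that enumerates the cell offsets of both neighborhood types for a radius r. The function names are hypothetical, and the Von Neumann neighborhood is taken here as all cells within Manhattan distance r of the center, which is the usual definition and reproduces the size of 5 for r = 1.

```python
def moore_offsets(r=1):
    """(row, column) offsets of a Moore neighborhood, center included."""
    return [(dr, dc) for dr in range(-r, r + 1) for dc in range(-r, r + 1)]


def von_neumann_offsets(r=1):
    """Offsets whose Manhattan distance from the center is at most r."""
    return [(dr, dc) for dr, dc in moore_offsets(r) if abs(dr) + abs(dc) <= r]


# The Moore size follows equation (4): n = (2r + 1)^2.
moore_sizes = {r: len(moore_offsets(r)) for r in (1, 2, 3)}
von_neumann_sizes = {r: len(von_neumann_offsets(r)) for r in (1, 2, 3)}
```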
Cell changes from one state to another are defined by a rule that can be
either deterministic or probabilistic (Elsayed et al. 2013). In their research,
Elsayed et al. (2013) divided CA models into two cases, uniform and non-uniform.
In the uniform case, the same rule governs how each neighboring cell affects the
center cell. In the non-uniform case, a different rule is applied to each
neighboring cell in influencing the center cell.
Composing Neighborhoods
One of the important steps in CA modeling is determining the
neighborhood cells spatially. There are three techniques related to spatial
neighborhood relationships, namely topological, distance, and direction relations,
which may be combined by logical operators to express a more complex
neighborhood relation (Ester et al. 2001). In spatial problems, the most important
object is the point. Other objects such as lines, polygons or polyhedrons are
represented by a set of points.
Topological relations, the first technique for determining spatial
neighborhood relationships, are determined by considering the boundaries,
interiors and complements of the two related objects. These relations remain
unchanged when topological transformations are applied, that is, transformations
that are continuous, one-to-one, onto, and whose inverse is
continuous. The relations are: A disjoint B, A meets B, A overlaps B, A equals B,
A covers B, A covered-by B, A contains B, A inside B.
The second technique is distance relations. This technique uses the
arithmetic comparison operators to compare the distance between two objects
with a given constant. In the distance-based technique, a distance
relationship is determined by calculating the distance between two objects, A and
B. The third technique is direction relations. Figure 5 illustrates the definition
of some direction relations using 2D polygons. The direction relations are not
mutually exclusive, but there is always a smallest direction relation for two
objects A and B, called the exact direction relation of A and B, which is uniquely
determined.
Figure 5 The illustration of determining neighborhood technique (Ester et al. 2001)
Hidden Markov Model
A Markov chain is a stochastic process in which the value of the state in a
period depends on the state in the previous period. The Hidden Markov Model
(HMM) can be used for events whose changes cannot be observed directly, or that
depend on the observation of other objects. HMM is a probabilistic model that
is suitable for temporally sequential data (Peng et al. 2011). This
model can be expressed in a State Transition diagram. There are two types of
State Transition models in HMM, namely the Ergodic HMM (Figure 6 (a)) and the
Left-Right HMM (Figure 6 (b)).
Figure 6 State Transition Diagram over the states ST1, ST2, and ST3: (a) Ergodic HMM, (b) Left-Right HMM
The types of problems that can be solved using the HMM approach are:
1. Evaluating. The main objective of evaluating is to find the probability
of a chain of observations. This type of problem can be solved using the
forward-backward procedure.
2. Decoding. The aim is to determine the chain of states that occurs in
the main object given a series of observations, by finding the maximum
probability over the possible state values. This type can be solved using the
Viterbi algorithm.
3. Learning. In this type of problem, the optimal HMM is found by
changing the HMM parameters so that the probability of a series of
observations is maximized. This type can be solved using the
Baum-Welch algorithm.
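To make the decoding problem concrete, the following is a minimal sketch of the Viterbi algorithm in plain Python. The two states, the two observation symbols, and all probability values are hypothetical and serve only to illustrate the procedure; they do not come from the dataset of this research.

```python
def viterbi(observations, states, prior, transition, emission):
    """Return the most probable state sequence for a list of observations."""
    # best[t][s]: probability of the best path that ends in state s at time t
    best = [{s: prior[s] * emission[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] * transition[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score * emission[s][observations[t]]
            back[t][s] = prev
    # Trace back from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))


# Hypothetical two-state model (values chosen for illustration only).
states = ["ST1", "ST2"]
prior = {"ST1": 0.6, "ST2": 0.4}
transition = {"ST1": {"ST1": 0.7, "ST2": 0.3},
              "ST2": {"ST1": 0.4, "ST2": 0.6}}
emission = {"ST1": {"a": 0.9, "b": 0.1},
            "ST2": {"a": 0.2, "b": 0.8}}

decoded = viterbi(["a", "a", "b"], states, prior, transition, emission)
```

For long observation sequences the products become very small, so practical implementations usually work with logarithms of the probabilities instead.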
Mathematically, the HMM is written as:
$$\lambda = (T, E, \pi) \tag{5}$$
where λ is an HMM (Dugat 1996). T is the Transition Probability Matrix, which
represents the probabilities of the state transitions of an object. In Figure 6, every
arrow in the state diagram represents the probability that the state of an object
changes from one period to the next. The values are stored in T
as shown in Table 1.
Table 1. The Transition Probability values
                        nth Period
(n-1)th Period    ST1       ST2       ST3       Σ
ST1               P(a|a)    P(b|a)    P(c|a)    1
ST2               P(a|b)    P(b|b)    P(c|b)    1
ST3               P(a|c)    P(b|c)    P(c|c)    1
E is the Emission Probability Matrix, which represents the probabilities of
an object being in a certain state as affected by the state conditions of the
other, surrounding observed objects. The values are stored in E as shown in
Table 2. The Prior Matrix, π, is the matrix of probabilities that an object is in a
certain state at the beginning of the sequence of events. The values are stored in π
as shown in Table 3.
Table 2. The Emission Probability values
          An Observed Object (Xi)
          X1                          X2
State     a1            b1            a2            b2
ST1       P(a1|ST1)     P(b1|ST1)     P(a2|ST1)     P(b2|ST1)
ST2       P(a1|ST2)     P(b1|ST2)     P(a2|ST2)     P(b2|ST2)
ST3       P(a1|ST3)     P(b1|ST3)     P(a2|ST3)     P(b2|ST3)
Σ         1             1             1             1
Table 3. The Prior Probability values
ST1       ST2       ST3       Σ
P(ST1)    P(ST2)    P(ST3)    1
The probability that an object C is in a certain state at time i,
denoted by $P(C_i \mid X_i)$, given the state of the observed surrounding objects X,
can be calculated using Bayes' theorem:
$$P(C_i \mid X_i) = \frac{P(X_i \mid C_i)\, P(C_i)}{P(X_i)} \tag{6}$$
where $P(C_i)$ is the transition probability of the state change of object C from
time i-1 to i, and $P(X_i \mid C_i)$ is the emission probability of the state of
object X given the state of C at that time. In general, the probability of the
sequence of states of object C for time moving from i = 1 to i = n, denoted by
$P(C_1, C_2, \ldots, C_n \mid X_1, X_2, \ldots, X_n)$, can be written, up to
normalization by $P(X_1, X_2, \ldots, X_n)$, as follows:
$$P(C_1, C_2, \ldots, C_n \mid X_1, X_2, \ldots, X_n) \propto \prod_{i=1}^{n} P(X_i \mid C_i) \cdot \prod_{i=1}^{n} P(C_i \mid C_{i-1}) \tag{7}$$
where $P(C_i \mid C_{i-1})$ is the transition probability of the state changes of
object C over times i = 1 to i = n, and $P(X_i \mid C_i)$ is the emission probability
of the state of object X given the state of C at the same
time.
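The product form of equation (7) can be sketched as a small function that scores one candidate state sequence against one observation sequence. This is an illustrative sketch only: the model values are hypothetical, and the first transition factor P(C1 | C0) is taken from the prior π, which is an assumption of this example.

```python
def sequence_score(state_seq, obs_seq, prior, transition, emission):
    """Unnormalized score of a state sequence, following equation (7)."""
    score = 1.0
    previous = None
    for state, obs in zip(state_seq, obs_seq):
        # Transition factor P(C_i | C_{i-1}); the prior is used for i = 1.
        score *= prior[state] if previous is None else transition[previous][state]
        # Emission factor P(X_i | C_i).
        score *= emission[state][obs]
        previous = state
    return score


# Hypothetical two-state model (values chosen for illustration only).
prior = {"ST1": 0.6, "ST2": 0.4}
transition = {"ST1": {"ST1": 0.7, "ST2": 0.3},
              "ST2": {"ST1": 0.4, "ST2": 0.6}}
emission = {"ST1": {"a": 0.9, "b": 0.1},
            "ST2": {"a": 0.2, "b": 0.8}}

score = sequence_score(["ST1", "ST1", "ST2"], ["a", "a", "b"],
                       prior, transition, emission)
```

Comparing the scores of different candidate sequences for the same observations shows which sequence is the most probable; dividing by the sum over all candidates would give the normalized probability.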
3 METHODOLOGY
Research Framework
The objective of this research is to develop a model of disease spread
using the CA method. The contribution of this research lies in how the parameters
and the rule function used in the simulation are defined. Several stages were
carried out to achieve the research objective: defining the CA model, collecting
the datasets and constructing the model, finding a probabilistic function as the CA
rule, predicting the spread of disease using the proposed model, and evaluating the
model. The results of this analysis provide basic information for the prevention of
disease.
The main problem of this research is how to find a function that represents
a proper CA rule, and the HMM was chosen as the method for determining such a
function. Several HMM steps were carried out to find the function: counting the
number of state transitions, counting the number of emissions, determining the
Transition Probabilistic Matrix, and determining the
Emission Probabilistic Matrix. Figure 7 shows all the stages of this study.
Defining a Spreading Pattern Model Based on CA Model
There are four steps for defining the CA model: defining a cellular
space, defining the neighborhood used in the cellular space, defining the criteria
of the possible state values, and determining the probability values of the function
f that represents the CA rule. Function f is required to obtain the spreading pattern
of disease in the CA model.
Defining a Cellular Space and Neighborhood
First, a map of the regions was transformed into rectangular polygons
Ci. Next, the polygons were arranged in the cellular space as cells of the same size.
Each cell holds non-uniform objects and describes the number of disease cases
that occurred in the region. The index i of cell Ci identifies the cell in the cellular
space, and each cell represents a region according to the id of the cell. The
position of a region in the cellular space was determined using a grid region rule
based on the distance approach (Figure 5). This approach was conducted by
calculating the distance to the surrounding regions. The regions at the shortest
distance were taken as the Von Neumann neighborhood. The distance was
calculated using the Manhattan distance formula (Bajracharya and Duboz 2013) as
follows:
$$dist(T_1, T_2) = \sum_{i=1}^{n} |a_i - b_i| \tag{8}$$
where T1 and T2 are the two regions whose distance is calculated, a represents the
position of T1, b represents the position of T2, and n is the number of coordinates
of a position. The results of equation 8 were converted
into a discrete form. Two regions that share a direct boundary have a distance
value of 0, and two regions that have one region between them have a
distance value of 1.
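A minimal sketch of this distance calculation is given below. The function names and the grid coordinates are hypothetical. The discrete form is interpreted here as the number of regions lying between two cells, that is, the Manhattan distance between grid positions minus one; this interpretation is an assumption of the sketch.

```python
def manhattan(a, b):
    """Manhattan distance between two positions, as in equation (8)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def discrete_distance(cell_a, cell_b):
    """Assumed discrete form: 0 for cells that share a border,
    1 when one cell lies between them, and so on."""
    return max(manhattan(cell_a, cell_b) - 1, 0)


# Hypothetical (row, column) positions in a cellular space.
d_far = manhattan((0, 0), (2, 1))
d_adjacent = discrete_distance((0, 0), (0, 1))
d_one_between = discrete_distance((0, 0), (0, 2))
```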
Figure 7 Framework of the research. The flowchart runs from Start to End through the following stages: Defining the CA model (defining a cellular space, defining the neighborhood, defining the criteria of state values); Data and Model Construction; Finding the probabilistic function f as the CA rule (counting the number of state transitions from the data, counting the number of emissions from the data, determining the Transition Probabilistic Matrix, determining the Emission Probabilistic Matrix, determining function f); Predicting the spread of disease; Evaluating.
This research also defined a rule for the case in which two or more regions
share a direct boundary with one other region: the total distance of every
possible neighborhood composition is counted, and the composition with the
minimum total distance is chosen as the composition of region
positions in the cellular space. Figure 8 shows an example in which
there are two possibilities for composing a neighborhood. The first composition
has a total distance of 1, while the second composition has a total distance of 0. In
this case, the second composition was chosen since it had the minimum total
distance.
Figure 8 The possibilities of composing a neighborhood for four regions (1 to 4): the first composition has a total distance of 1, and the second composition has a total distance of 0
The number of cells in the cellular space does not always have to be the
same as the number of observed regions. For instance, we can define 20
or 25 cells for the 16 observed regions by adding the definition of a
boundary condition for the cells that are not included in the 16 observed
regions (White et al. 2007). A boundary condition applies when an
observed cell does not have a complete set of neighbors because it lies
in a corner or on the edge of the cellular space. In the proposed
model, a cell has an incomplete neighborhood if $i - 4 \le 0$ or $i - 1 \le 0$ or
$i + 1 > 16$ or $i + 4 > 16$. In this study, we assumed null boundary
conditions for the proposed model. The cellular space was defined by counting the
number of direct neighbors of each region in order to arrange the regions in the
cellular space.
Defining a Set of States
Research related to the state changes of a cell in two dimensions
has been performed by Djatna and Morimoto (2008). In that research, the concept
of state change was used for selecting features. The state change was calculated
based on the change of the shape of the geometry, which represents the result
of applying two-dimensional rules to a pair of attributes (Djatna
and Morimoto 2008). In the proposed model, the concept of state change was
applied to visualize the spreading pattern of disease. We defined state changes
based on the data content of each location. First, we divided the values into a
number of categories and set a color for each category. Next, the state changes
were seen as cell color changes in the cellular space.
Data and Model Construction
Collecting the Dataset
In this research, a CA model was applied to DHF cases. Before conducting
the data collection, we studied the data characteristics related to DHF. The spread
of dengue fever is influenced by several factors, including the condition of the
environment, the behavior of the population, and the agent or disease vector (in
this case the mosquito Aedes aegypti). Based on the literature study, we identified
several variables/attributes that are important to consider as causative
factors of DHF. The variables/attributes are listed in Appendix A.
We decided to use a dataset collected from Dinas Kesehatan Kota Bogor
(DKK-Bogor). The data were collected using an interview technique. We
interviewed the DKK-Bogor Data Officer on July 16, 2014. The city health office
(Dinas Kesehatan tingkat Kota) in Bogor observes the spread of DHF by
collecting reports from each Puskesmas (community health center) in Bogor city.
This process is carried out down to the smallest government entity, called the
Kelurahan. The interview result is shown in Appendix B. In collecting the
datasets, we took the following steps: identifying the geographical study area,
conducting a field study for data collection, deciding on the sample used in this
research, and determining the source of the data. We also decided to focus on
West Bogor, which is divided into 16 regions.
Table 4 Number of Dengue cases in West Bogor in 2013
(Source: Dinas Kesehatan Kota Bogor)
No
Region
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
Menteng
Cilendek Timur
Cilendek Barat
Sindang Barang
Bubulak
Situgede
Margajaya
Balumbang Jaya
Semplak
Curug
Curug Mekar
Pasir Mulya
Loji
Gunung Batu
Pasir Jaya
Pasir Kuda
1
2
5
1
2
2
1
1
0
2
7
1
1
0
7
1
2
2
1
2
4
2
1
0
0
0
0
2
0
1
1
3
0
3
The number of DHF cases per period
3
4
5
6
7
8
9 10
2
1
0
2
6
0
0
1
0
0
1
1
4
0
0
0
3
0
2
1
4
0
4
0
4
0
1
2
6
0
2
2
4
0
1
0
3
0
0
0
0
0
0
1
0
0
0
0
0
2
0
1
0
0
0
0
0
0
1
0
0
0
0
0
1
0
1
0
0
0
1
0
2
3
3
0
1
0
1
0
3
0
1
0
0
0
0
0
0
0
0
0
2
0
0
0
1
0
1
0
0
0
0
0
1
2
0
1
4
0
1
0
0
1
1
0
0
0
0
0
1
1
0
0
1
0
1
0
11
2
1
2
0
0
0
0
0
3
3
2
0
6
1
0
2
12
0
0
0
0
0
0
0
3
1
3
0
0
0
1
0
1
The data attributes used in this research are: “name of a region” and “number
of Dengue cases in 12 periods”. In this research, we used the dataset containing
the occurrences of DHF cases in West Bogor in 2013, shown in Table 4.
Figure 9 The map of regions in West Bogor
A Cellular Space Construction
In this research, 16 cells in a two-dimensional cellular space (Table 5) were
defined to represent the 16 regions in West Bogor (Figure 9). The total number of
direct neighborhoods of the 16 regions is listed in Table 5. Next, the map was
transformed into 4x4 rectangular polygons as shown in Figure 10. The polygons
have the same size as the cells in the cellular space. Each cell holds non-uniform
objects and describes the number of dengue cases that occurred in the region.
Figure 10 The proposed cellular space
The cellular space is defined as a two-dimensional space in which each
cell represents a region with a number of Dengue cases in each period. West Bogor
has 16 regions in total; thus, we defined 16 cells. Each cell contains
non-uniform objects that describe the Dengue cases that occurred in a region
during a certain period.
Table 5 The number of direct neighborhoods of each region
Region            Neighborhoods                                                              Total
Situ Gede         Balumbang Jaya, Semplak, Bubulak                                           3
Balumbang Jaya    Margajaya, Bubulak, Situ Gede                                              3
Margajaya         Sindang Barang, Bubulak, Balumbang Jaya                                    3
Bubulak           Situ Gede, Balumbang Jaya, Margajaya, Sindang Barang, Semplak              5
Semplak           Curug, Curug Mekar, Cilendek Barat, Sindang Barang, Bubulak, Situ Gede     6
Curug             Curug Mekar, Semplak                                                       2
Curug Mekar       Curug, Semplak, Cilendek Barat, Cilendek Timur                             4
Sindang Barang    Margajaya, Bubulak, Semplak, Cilendek Barat, Menteng, Loji                 6
Cilendek Barat    Sindang Barang, Menteng, Cilendek Timur, Curug Mekar, Semplak              5
Cilendek Timur    Menteng, Curug Mekar, Cilendek Barat                                       3
Loji              Sindang Barang, Menteng, Gunung Batu                                       3
Menteng           Cilendek Barat, Cilendek Timur, Loji, Sindang Barang, Gunung Batu          5
Gunung Batu       Loji, Menteng, Pasir Mulya, Pasir Jaya                                     4
Pasir Mulya       Gunung Batu, Pasir Jaya, Pasir Kuda                                        3
Pasir Jaya        Pasir Mulya, Gunung Batu, Pasir Kuda                                       3
Pasir Kuda        Pasir Mulya, Pasir Jaya                                                    2
The most important step was to find all possible compositions of polygons
that matched the map of West Bogor, representing the regions in a cellular space
based on the data in Table 5. Next, the total distances of the possible compositions
were calculated using equation (8) to obtain the minimum total distance.
In data construction, each cell was defined as an element of a
one-dimensional array variable $X = \{X_i \mid i = 1, 2, \ldots, 16\}$. Variable Xi
represents a cell as shown in Figure 10.
Neighborhood
Based on the data in Table 5, no region has 8 directly bordering
neighbors; therefore, the 4-cell Von Neumann neighborhood was used in the
proposed model. The Von Neumann neighborhood is a collection of five cells in
which the middle cell is the focus of attention, as shown in Figure 11 (White 2009).
The remaining cells are the cells that affect the state change of the middle cell in
subsequent periods. In the proposed model, the neighborhood cells are represented
as a one-dimensional array variable, $V = \{V_j \mid j = 0, 1, 2, 3, 4\}$, whose spatial
arrangement is shown in Figure 11.
Figure 11 Von Neumann neighborhood: V0 is the center cell, with V1 above, V2 to the right, V3 below, and V4 to the left
The neighborhood frame indicated in Figure 12 moves to each cell in
the cellular space. At every move, the initial state conditions of the cells in the
frame are checked. The neighborhood frame moves in the cellular space according
to the equation:
$$V_0 = X_i;\quad V_1 = X_{i-4};\quad V_2 = X_{i+1};\quad V_3 = X_{i+4};\quad V_4 = X_{i-1} \tag{9}$$
Figure 12 The neighborhood frame
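The movement of the neighborhood frame over the 4x4 cellular space can be sketched as follows. The function name and the test values are hypothetical. The sketch takes V1, V2, V3, and V4 as the cells above, to the right of, below, and to the left of Xi, and returns a null state for any position outside the grid (null boundary). It also checks the column of a cell, so that the last cell of one row is not treated as the left neighbor of the first cell of the next row; this check is an assumption of the sketch beyond the index conditions stated in the text.

```python
WIDTH, HEIGHT = 4, 4  # 4 x 4 cellular space, cells X1..X16 in row-major order


def neighborhood_frame(i, states, null_state=None):
    """Return [V0, V1, V2, V3, V4] for cell Xi (1-based index i)."""
    row, col = divmod(i - 1, WIDTH)

    def state_at(r, c):
        # Null boundary: positions outside the grid yield null_state.
        if 0 <= r < HEIGHT and 0 <= c < WIDTH:
            return states[r * WIDTH + c]
        return null_state

    return [
        state_at(row, col),      # V0 = X_i
        state_at(row - 1, col),  # V1 = X_(i-4), the cell above
        state_at(row, col + 1),  # V2 = X_(i+1), the cell to the right
        state_at(row + 1, col),  # V3 = X_(i+4), the cell below
        state_at(row, col - 1),  # V4 = X_(i-1), the cell to the left
    ]


# Hypothetical content: each cell simply stores its own index.
cells = list(range(1, 17))
inner_frame = neighborhood_frame(6, cells)
corner_frame = neighborhood_frame(1, cells)
```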
A Set of States
In the proposed model, we defined state changes based on the data of
Dengue cases in West Bogor in 2013. First, we divided the values into
four categories and set a color for each category. Next, the state changes were
seen as cell color changes in the cellular space. In this research, we defined four
state criteria as $S = \{S_1, S_2, S_3, S_4\}$. The four colors and their state criteria
are shown in Table 6. In data construction, the state values were represented by an
array variable $S = \{S_1, S_2, S_3, S_4\}$.
Table 6 State definition of infected area
State    State definition                                            State colour
S1       all people have recovered, or no one was infected
S2       1-2 people were infected
S3       3-5 people were infected
S4       more than five people were infected
The construction of the other data is shown in Table 7. In this
research, Excel spreadsheets were used as a tool to build a simulation model for
finding a probabilistic function and a spreading pattern. Moreover, the Scipy
module in Python 3.4 (https://github.com/hmcuesta/PDA_Book/tree/master/Chapter9)
was used as a tool for evaluating the proposed model.
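The state criteria of Table 6 can be expressed as a small classification function, sketched below. The function name and the sample case counts are hypothetical; the thresholds follow the state definitions in Table 6.

```python
def classify_state(cases):
    """Map the number of DHF cases in a region to a state of Table 6."""
    if cases < 0:
        raise ValueError("the number of cases cannot be negative")
    if cases == 0:
        return "S1"  # all people have recovered, or no one was infected
    if cases <= 2:
        return "S2"  # 1-2 people were infected
    if cases <= 5:
        return "S3"  # 3-5 people were infected
    return "S4"      # more than five people were infected


# Hypothetical case counts for one period (not taken from Table 4).
sample_counts = [0, 1, 2, 3, 5, 6, 11]
sample_states = [classify_state(c) for c in sample_counts]
```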
Finding a Probabilistic Function
The next step is to determine a function f that represents the CA rule
based on the defined parameters. In detail, our method for defining the CA model
is as follows. First, from the dataset consisting of 16 regions, we
defined the two-dimensional space, put each region into one cell, and set an
index for each cell; then we defined an array variable to represent the 16 indexed
cells. Next, we put the number of cases for each
region (number of infected) into an array variable, in which the data of one period
are stored in one array, with the periods running as time-steps. Finally, using the
set of state criteria, we replaced the case data with state data.
The main problem of this research is how to find the function that
represents a proper CA rule. Many methods for finding the function f as the rule
of a CA model have been applied, and in this research we used HMM, a method
that had not yet been used by researchers for this purpose. In HMM, the state
changes are described as a state transition diagram. By ignoring the death and
birth factors, and by assuming that the probability of a cell being infected is
affected by the surrounding cells that are considered to have an effective
influence, the HMM approach is suitable for determining a probabilistic
function f.
The CA characteristic can be represented as a Markov process (Knutson 2011).
Since the dataset can be classified as a time series dataset, it was appropriate
to use a probabilistic function found using HMM. HMM is a probabilistic model
that is suitable for solving problems related to sequential-temporal data
(Peng et al. 2011). In the proposed CA model, the change of a cell from one state
to another can be described by a State Transition Diagram (Figure 10). The State
Transition Diagram can be expressed in the HMM as T. The probabilities of a state
change in a given area as affected by its neighborhood, that is, the emission
probabilities, can be expressed in the HMM as E.
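As an illustration of how T could be obtained from the state data, the sketch below counts the state changes between consecutive periods and normalizes each row, following the definitions of SCij, SSi and Tij in Table 7. It assumes that the counts are pooled over all cells and that a row with no observations is left as zeros; both choices, the function name transition_matrix and the small example data are assumptions for illustration only.

```python
def transition_matrix(states, n_states=4):
    """Estimate T from states[t][k], the state (1..n_states) of cell k at period t."""
    # sc[i][j]: number of changes from S(i+1) to S(j+1) between consecutive periods
    sc = [[0] * n_states for _ in range(n_states)]
    for current, following in zip(states, states[1:]):
        for s_from, s_to in zip(current, following):
            sc[s_from - 1][s_to - 1] += 1
    # ss[i]: total number of changes that start from S(i+1)
    ss = [sum(row) for row in sc]
    # t[i][j] = sc[i][j] / ss[i]; rows without observations stay at zero (assumption)
    t = [[sc[i][j] / ss[i] if ss[i] else 0.0 for j in range(n_states)]
         for i in range(n_states)]
    return sc, ss, t


# Hypothetical states of three cells over three periods (not thesis data)
states = [
    [1, 2, 4],
    [2, 2, 3],
    [1, 3, 3],
]
sc, ss, t = transition_matrix(states)
print(ss)
for row in t:
    print(row)
```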
Table 7 List of the data construction
Variable  Description
per       Variable of period.
SCij      Two-dimensional array variable for the number of state changes of V0
          from Si to Sj from one period to the next. i = 1..4; j = 1..4
SSi       One-dimensional array variable for the sum of state changes of V0
          from Si. i = 1..4
Tij       Two-dimensional array variable for the transition probability matrix
          of V0. Tij = SCij / SSi; i = 1..4; j = 1..4
V1ij      Two-dimensional array variable for the number of times V0 is in state
          Si when V1 is in state Sj. i = 1..4; j = 1..4
V2ij      Two-dimensional array variable for the number of times V0 is in state
          Si when V2 is in state Sj. i = 1..4; j = 1..4
V3ij      Two-dimensional array variable for the number of times V0 is in state
          Si when V3 is in state Sj. i = 1..4; j = 1..4
V4ij      Two-dimensional array variable for the number of times V0 is in state
          Si when V4 is in state Sj. i = 1..4; j = 1..4
SV1i      One-dimensional array variable for the number of times V0 is in state
          Si when V1 is in any state. i = 1..4
SV2i      One-dimensional array variable for the number of times V0 is in state
          Si when V2 is in any state. i = 1..4
SV3i      One-dimensional array variable for the number of times V0 is in state
          Si when V3 is in any state. i = 1..4
SV4i      One-dimensional array variable for the number of times V0 is in state
          Si when V4 is in any state. i = 1..4
E1ij      Two-dimensional array variable for the emission probability matrix
          of V0 affected by V1. E1ij = V1ij / SV1i; i = 1..4; j = 1..4
E2ij      Two-dimensional array variable for the emission probability matrix
          of V0 affected by V2. E2ij = V2ij / SV2i; i = 1..4; j = 1..4
E3ij      Two-dimensional array variable for the emission probability matrix
          of V0 affected by V3. E3ij = V3ij / SV3i; i = 1..4; j = 1..4
E4ij      Two-dimensional array variable for the emission probability matrix
          of V0 affected by V4. E4ij = V4ij / SV4i; i = 1..4; j = 1..4
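The emission entries of Table 7 can be illustrated for one neighbor in the same counting style. The sketch below computes V1ij, SV1i and E1ij from paired observations of V0 and V1. It assumes that the two states are observed in the same period and that a row with no observations is left as zeros; the thesis text here does not fix either point. The function name emission_matrix and the example sequences are hypothetical.

```python
def emission_matrix(center_states, neighbor_states, n_states=4):
    """Estimate E for one neighbor from paired states (1..n_states) of V0 and that neighbor."""
    if len(center_states) != len(neighbor_states):
        raise ValueError("both sequences must have the same length")
    # v[i][j]: number of observations with V0 in S(i+1) while the neighbor is in S(j+1)
    v = [[0] * n_states for _ in range(n_states)]
    for s0, sk in zip(center_states, neighbor_states):
        v[s0 - 1][sk - 1] += 1
    # sv[i]: number of observations with V0 in S(i+1), whatever the neighbor state
    sv = [sum(row) for row in v]
    # e[i][j] = v[i][j] / sv[i]; rows without observations stay at zero (assumption)
    e = [[v[i][j] / sv[i] if sv[i] else 0.0 for j in range(n_states)]
         for i in range(n_states)]
    return v, sv, e


# Hypothetical paired observations of V0 and its neighbor V1 (not thesis data)
v0 = [1, 2, 2, 3, 4, 2]
v1 = [1, 1, 2, 3, 3, 2]
V1, SV1, E1 = emission_matrix(v0, v1)
print(SV1)
for row in E1:
    print(row)
```

Calling the same function with the observations of V2, V3 and V4 would give E2, E3 and E4 under the same assumptions.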
Figure 13 State Transition Diagram of proposed model
Table 8.
