
Detection Model of Landslide-Potential Areas based on
Local-Learning using Iterative Dichotomiser Three
Algorithm

Thesis
Submitted to
the Faculty of Information Technology
to Obtain the Degree of Master of Computer Science

By:
Yerymia Alfa Susetyo
Student ID (NIM): 972014001

Master of Information Systems Study Program
Faculty of Information Technology
Universitas Kristen Satya Wacana
Salatiga
November 2016


Preface
Praise and thanks be to the Lord Jesus Christ for the grace that has been
given, so that the author had the strength and enthusiasm to complete this
final project.
The author would like to thank those who have helped with this research:

1. Dr. Dharmaputra Palekahelu, M.Pd., Dean of the Faculty of Information
Technology, Universitas Kristen Satya Wacana.
2. Prof. Ir. Danny Manongga, M.Sc, Ph.D, Head of the Master of Information
Systems Study Program and first supervisor, for the guidance, direction,
and encouragement given throughout the work on this thesis.
3. Prof. Dr. Ir. Wiranto Herry Utomo, M.Kom, second supervisor, for the
guidance and input given during the writing of this thesis.
4. Badan Nasional Penanggulangan Bencana (BNPB), the Indonesian National
Disaster Prevention Agency, for the landslide incident data provided,
which proved useful in this research.
5. All lecturers and staff of the Faculty of Information Technology,
Universitas Kristen Satya Wacana, who gave the author much help during
the author's studies at FTI UKSW.
6. All family, relatives, and friends, who gave their support throughout
our time together.
7. Everyone who has helped with this research, directly or indirectly.
Finally, the author hopes that this research will be useful to the
government and the public. The author apologizes for any errors in this
research; suggestions and feedback will be very helpful.
Salatiga, 5 October 2016
Yerymia Alfa Susetyo

Table of Contents

Title Page
Statement of Non-Plagiarism
Statement of Access Approval
Supervisor Approval Sheet
Endorsement Sheet
Preface
Table of Contents
List of Figures
List of Tables
Abstract
I. Introduction
II. Related Works
III. Proposed Method
IV. Results and Tests
V. Discussion
VI. Conclusion
References
Appendix

List of Figures

Fig 1. Stages of Detection Model of Landslide-Potential Areas
Fig 2. Spatial Data of Landslide Incidents in Java Island
Fig 3. Model of Detection System of Landslide-Potential Areas

List of Tables

Table 1 Table Structure of Detection Model of Landslide
Table 2 Discrete Value of Triggering Attributes of Landslide
Table 3 Confusion Matrix
Table 4 Entropy Value and Gain Value in the First Iteration
Table 5 Rules Generated from Model
Table 6 Confusion Matrix Results
Table 7 Comparison of Accuracy Levels


(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 14, No.09, September 2016

Detection Model of Landslide-Potential Areas based
on Local-Learning using Iterative Dichotomiser Three
Algorithm

Yeremia A. Susetyo
Faculty of Information Technology
Satya Wacana Christian University
Salatiga, Indonesia

Daniel F. H. Manongga
Faculty of Information Technology
Satya Wacana Christian University
Salatiga, Indonesia

Wiranto H. Utomo
Faculty of Information Technology
Satya Wacana Christian University
Salatiga, Indonesia

Abstract: Landslides are among the most destructive natural disasters
because they cause severe environmental and socioeconomic damage. Java,
Indonesia, is the most populous island in the world and one of the most
densely populated. High population density and careless land conversion
lead to frequent landslides, which are the most frequent natural disaster
in Indonesia. This research aims to develop an early warning model of
landslide-potential areas, based on local-learning that suits local
geographical conditions, using Iterative Dichotomiser Three (ID3) in Java
as the most landslide-prone area in Indonesia. We analyze and map
landslide data together with climate and soil characteristics using the
ID3 algorithm. We use four landslide-causing attributes: area slope,
rainfall, soil type, and land cover. The research produces a decision
tree with 36 leaf nodes, of which 19 indicate “Landslide-potential” and
17 indicate “Not Landslide-potential”. The model reaches an accuracy of
92.37%, and land cover is the main attribute that triggers landslides.
Keywords: ID3 Algorithm; Landslide; Land Use;
Learning Algorithm; Local Geographic

I. INTRODUCTION

Landslides are among the most destructive natural disasters because they
cause severe environmental and socioeconomic damage [1]. They occur
frequently in many parts of the world, especially in developing
countries, owing to poor land use planning, expanding human settlement
areas, careless land conversion practices, and climate change [2]. As a
developing country, Indonesia suffers frequent natural disasters, and
landslides are among the most frequent. Data from the Indonesian National
Disaster Prevention Agency show that landslides were the most frequent
natural disaster in December 2014. More specifically, there were 111
landslide incidents, considerably more than floods, the second most
frequent disaster (86 incidents). Twelve provinces suffered landslides,
with Central Java, West Java, and East Java recording the highest numbers
of incidents [3].
A landslide-potential area is defined as an area that shows landslide
tendencies. The occurrence of a landslide in a particular area can be
related to similarities in terrain and climate characteristics with other
areas that have had landslides. An early-warning system developed for
landslide-prone areas is therefore expected to help identify other areas
with similar climate and soil physical characteristics as
landslide-potential areas [4].
One model that can detect landslide-prone areas is the decision tree, a
learning algorithm from machine learning [5]. The rules of a decision
tree are appropriate tools for predicting a condition from various
variables [6]. The method can model relations among variables without
being bound to rules about data distribution or weighting. It also
imposes no specific rules on data format: data can take the form of
numbers or scales [7]. Iterative Dichotomiser Three (ID3) is a decision
tree algorithm that can handle continuous attributes once they have gone
through a discretization process [8].
Developing a detection model of landslide-potential areas requires
combining various factors that represent the physiographic and climate
conditions of areas [1]. Indonesia's climate and, especially, its
physiographic conditions vary widely between areas. These differences
have to be taken into account when determining the local geographical
condition of an area. It is therefore necessary to have an algorithm that
can learn the local geographical characteristics of each area. In this
case, a learning algorithm helps detect the landslide potential of each
area based on local-learning, without relying on a global weighting
process.

II. RELATED WORKS

The soft computing literature has produced mapping and detection models
of landslide-potential areas by combining the triggering factors of
landslides. Statistical methods are the most commonly used to predict
landslides. In Trabzon, Turkey, two methods, Multi-Criteria Decision
Making (MCDM) and Support Vector Regression (SVR), were combined to
predict landslides. In MCDM, one first has to determine the weight of
each attribute that triggers landslides. The attribute with the highest
weight is the most influential [1]. However, statistical methods have a
serious flaw in how the weight of each attribute is determined. More
specifically, statistical methods are likely to produce different results
as the assigned weights vary [5].
Decision-tree learning is another model that can be used to detect
natural disasters. In research conducted in Kelantan, Malaysia, this
method predicted floods with an accuracy as high as 87%. The same study
claims that learning algorithms outperform statistical methods in the
sense that they do not rely on statistical assumptions and can handle
differences in weighting scale [5].
Other landslide detection research, conducted in Penang, Malaysia,
compares the accuracy of four learning algorithms: CHAID, Exhaustive
CHAID, CRT, and QUEST. These four methods produce considerably high
accuracy levels of 74%-82% [4]. Meanwhile, the statistical methods (MCDM,
SVR, and logistic regression, LR) in the Trabzon study only produce
accuracy levels of 69%-77% [1].
Iterative Dichotomiser Three (ID3) is one such learning algorithm.
Previous research indicates that ID3 works well on non-continuous, or
discrete, data [13]. The data on landslide-triggering attributes released
by the Indonesian government are discrete or interval spatial data. Based
on these arguments, this research aims to develop an early warning model
of landslide-potential areas based on local-learning using Iterative
Dichotomiser Three (ID3) in Indonesia, especially in Java as the area
with the most frequent landslides.
III. PROPOSED METHOD

Figure 1 shows the stages of the Detection Model of Landslide-Potential
Areas based on Local-Learning using the ID3 algorithm. This research
consists of four stages: (a) collecting spatial data on landslide
incidents and on the triggering attributes of landslides; (b)
preprocessing to convert the spatial data into discrete data; (c)
developing the model that learns landslide-potential areas with ID3; and
(d) testing the model accuracy using a confusion matrix.

Fig. 1. Stages of Detection Model of Landslide-Potential Areas based on
Local-Learning using ID3 Algorithm

A. Data Collection
Based on decree No 22/PRT/M/2007 of the Minister of Public Works of the
Republic of Indonesia on guidelines for land use planning in
landslide-potential areas, we use four variables, or attributes, that
trigger landslides: land slope, rainfall, land cover, and land type (soil
type) [9]. We obtain the data from the related authoritative agencies in
the form of spatial data. Additionally, we use data on landslide
incidents in the three Java provinces from 2011 to 2015 from the
Indonesian National Disaster Prevention Agency. The spatial data of
landslide incidents in the three Java provinces are shown in Figure 2.

B. Data Preprocessing
This stage cleans the data of unnecessary characteristics so that they
are better suited to data mining [10]. Before developing the landslide
detection model with ID3, we first define the necessary table structure.
This consists of determining the attributes, the data type of each, and
the role given to each attribute. The commonly used roles are attribute
and label. Table 1 shows the table structure of the landslide detection
model.

Fig. 2. Spatial Data of Landslide Incidents in Java from 2011 – 2015 Issued by Indonesian National Disaster Prevention Agency

TABLE 1
TABLE STRUCTURE OF DETECTION MODEL OF LANDSLIDE
Attribute        Data Type      Role
Location Code    Integer        Id
Slope            Polynominal    Regular Attribute
Land Cover       Polynominal    Regular Attribute
Rainfall         Polynominal    Regular Attribute
Land Type        Polynominal    Regular Attribute
Incident Type    Polynominal    Label

Next, the values of the numeric attributes in the spatial data are
converted into interval or discrete labels (discretization). We
discretize the rainfall and slope data using interval widths of 1000
(mm/year) and 15 (%), respectively. We do not convert the land cover
data. Meanwhile, we group the land type data by land damage potential
according to the Ministry of Environment of the Republic of Indonesia
[11]. Table 2 shows the discrete values of the triggering attributes of
landslides.

TABLE 2
DISCRETE VALUE OF TRIGGERING ATTRIBUTES OF LANDSLIDE
Attribute     Value                               Discrete
Rainfall      < 2000 mm/year                      Low
              2000 – 3000 mm/year                 Medium
              > 3000 mm/year                      High
Slope         < 15%                               Flat
              15% - 30%                           Corrugated
              > 30%                               Steep
Soil Type     Vertisol, Oxisol                    Light
              Alfisol, Mollisol, Ultisols         Medium
              Inceptisols, Entisols, Histosols,   High
              Spodosol, Andisol
Land Cover    Forest                              Forest
              Wet Field                           Field
              Farming Estate                      Farming Estate
              Dry Farming                         Dry Farming
              Human Settlement                    Settlement

The last step of data preprocessing splits the data into two groups:
training data and testing data. The training data, 80% of the overall
data, are used to build the model, while the testing data, the remaining
20%, are used to test the model accuracy.

C. Detection Model of Landslide-Potential Areas with ID3
Iterative Dichotomiser Three (ID3) is a learning algorithm that builds a
classification tree model. It is based on entropy: it evaluates every
available attribute to identify how strongly each one influences the
classification of the data samples, using a measure commonly known as
information gain [12].
Entropy is a parameter that measures the heterogeneity of a set of data
samples. The more heterogeneous the set, the higher the entropy value
[13]. Mathematically, entropy can be formulated as follows:

$$\mathrm{Entropy}(S) = \sum_{i=1}^{c} -p_i \log_2 p_i \qquad (1)$$

where c is the number of values of the target attribute and p_i is the
proportion of samples in S that belong to class i. Once the entropy of a
set of sample data is known, the influence, or effectiveness, of an
attribute in classifying the data can be measured. This effectiveness
measure is called information gain [13]. Mathematically, the information
gain of an attribute A can be formulated as follows:

$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v) \qquad (2)$$
In (2), the symbols have the following meanings:
A            : attribute
v            : a possible value of attribute A
Values (A)   : the set of possible values of attribute A
|S_v|        : number of samples with value v
|S|          : total number of samples
Entropy (S)  : entropy of the sample data

To develop a detection model of landslide-potential areas, the ID3
algorithm can be implemented as a recursive function (a function that
calls itself). The ID3 algorithm for detecting landslide potential is as
follows:

Algorithm 1 Detection model of landslide-potential areas by ID3 algorithm

ID3 (Z, Attributes, Target)
1. p = createNode ()
2. label (p) = mostCommonClass (Z, Target)
3. IF ∀(x, c(x)) ∈ Z : c(x) = c THEN return (p) ENDIF
4. IF Attributes = ∅ THEN return (p) ENDIF
5. K* = argmax_{K ∈ Attributes} (informationGain (Z, K))
6. FOREACH a ∈ values (K*) DO
     Z_a = {(x, c(x)) ∈ Z : x|K* = a}
     IF Z_a = ∅ THEN
       p' = createNode ()
       label (p') = mostCommonClass (Z, Target)
       createEdge (p, a, p')
     ELSE
       createEdge (p, a, ID3 (Z_a, Attributes \ {K*}, Target))
     ENDIF
   ENDDO
7. return (p)

In this algorithm, lines 1 and 2 initialize a node of the decision tree
together with its label. Step 3 checks whether all data in the set carry
the same label (“Landslide-potential” or “non Landslide-potential”); if
so, the node is returned as a leaf. Step 4 indicates that if no attribute
is left, the decision tree ends with a single node. Otherwise, step 5
seeks the best classifier by finding the attribute with the highest
information gain. Step 6 then iterates over the values of the best
classifier to form the branches below it. This step also checks whether
samples exist for each attribute value. If a value has no samples, the
iteration stops for that branch and a final node (leaf) with its decision
(label) is formed. Otherwise, the ID3 function is called recursively on
the samples of that branch.

D. Testing the Accuracy of Detection Model of Landslide-potential Areas
To test the accuracy of classification and prediction in the model, we
first build a confusion matrix [14]. The confusion matrix measures the
performance of a two-class decision model, as shown in Table 3. In Table
3, TP (True Positive) is the number of data predicted to be YES that are
in fact YES, FN (False Negative) is the number of data predicted to be NO
that are in fact YES, FP (False Positive) is the number of data predicted
to be YES that are in fact NO, and TN (True Negative) is the number of
data predicted to be NO that are in fact NO.

TABLE 3
CONFUSION MATRIX
                  Actual YES    Actual NO
Prediction YES    TP            FP
Prediction NO     FN            TN

We define accuracy as the ratio of correctly classified data to total
data. The accuracy level is determined as follows [14]:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \qquad (3)$$

where TP, FP, TN, and FN are taken from the confusion matrix.

IV. RESULTS AND TESTS

Our results, as shown in Figure 3, consist of three parts: the data
preprocessing results, the decision tree for detecting
landslide-potential areas with ID3, and the model accuracy test.

Fig. 3. Model of Detection System of Landslide-Potential Areas

To develop a decision tree of landslide disasters based on local-learning
with the Iterative Dichotomiser Three (ID3) learning algorithm, we
classify our research location points into two groups: areas with
landslide incidents and areas without landslide incidents. Our
observation years are 2011-2015, and we focus on three provinces: Central
Java, Jogjakarta Special Region, and West Java. We obtain 590 location
points, of which 429 experienced landslide incidents and 161 did not.
Next, we classify data into

two groups, i.e. training data and testing data. We use 471 records from
January 2011 to December 2014 (354 with landslides and 117 without
landslides) as training data. As testing data, we use 20% of the total
sample data, comprising 118 records (75 with landslides and 43 without
landslides). We use landslide incident data from January 2015 to June
2015 for testing.
We use the training data to develop the decision tree. First, we measure
the information gain, or effectiveness level, of each attribute with
respect to landslide incidents. We use the entropy and information gain
of each attribute to determine the best classifier, which becomes the
root of the decision tree, as shown in Table 4. The overall entropy of
the training data follows from (1):

$$\mathrm{Entropy}(S) = -\frac{354}{471}\log_2\frac{354}{471} - \frac{117}{471}\log_2\frac{117}{471} = 0.808745$$

TABLE 4
ENTROPY VALUE AND GAIN VALUE OF EACH ATTRIBUTE
IN THE FIRST ITERATION
Attribute     Entropy Value                      Gain Value
Rainfall      E(S Low)               0.838008    Gain(S, Rainfall) = 0.152526
              E(S Medium)            0.733302
              E(S High)              0.454701
Slope         E(S Flat)              0.991526    Gain(S, Slope) = 0.091311
              E(S Corrugated)        0.836641
              E(S Steep)             0.477429
Soil Type     E(S Light)             0.994030    Gain(S, Soil Type) = 0.024706
              E(S Medium)            0.622896
              E(S High)              0.852682
Land Cover    E(S Forest)            0.439497    Gain(S, Land Cover) = 0.284734
              E(S Field)             0.937963
              E(S Farming Estate)    0.739481
              E(S Dry Farming)       0.277289
              E(S Settlement)        0.543564

As shown in Table 4, Land Cover has the highest information gain
(0.284734), so it is the best classifier and is positioned at the root of
the decision tree. In the second iteration, the best classifiers are
placed below the Land Cover node, one for each branch value. Rainfall
(gain 0.208) is placed below the Forest value, while Slope (gain 0.177)
is placed below Farming Estate. Further, Rainfall (gain 0.121) is placed
below the Dry Farming value and Slope (gain 0.701) below the Field value.
After performing the fourth iteration and fully developing the decision
tree structure, we generate the rules for detecting landslides shown in
Table 5. This research produces 36 leaf nodes, of which 19 are
“Landslide-potential” and the rest are “non Landslide-potential”.

TABLE 5
RULES GENERATED FROM LANDSLIDE DETECTION MODEL BASED ON
LOCAL-LEARNING WITH ID3 ALGORITHM IN JAVA ISLAND

Node  Landslide-Potential Rules Produced
1     IF (Land Cover='Forest' AND Rainfall='High' AND Slope='Corrugated' AND Soil Type='High') THEN 'Landslide-Potential'
2     IF (Land Cover='Farming Estate' AND Slope='Corrugated' AND Rainfall='Medium') THEN 'Landslide-Potential'
3     IF (Land Cover='Farming Estate' AND Slope='Corrugated' AND Rainfall='High' AND Soil Type='Medium') THEN 'Landslide-Potential'
4     IF (Land Cover='Farming Estate' AND Slope='Steep' AND Rainfall='Medium') THEN 'Landslide-Potential'
5     IF (Land Cover='Farming Estate' AND Slope='Steep' AND Rainfall='High') THEN 'Landslide-Potential'
6     IF (Land Cover='Farming Estate' AND Slope='Flat' AND Rainfall='High' AND Soil Type='Medium') THEN 'Landslide-Potential'
7     IF (Land Cover='Farming Estate' AND Slope='Flat' AND Rainfall='High' AND Soil Type='High') THEN 'Landslide-Potential'
8     IF (Land Cover='Settlement' AND Rainfall='Low' AND Slope='Corrugated') THEN 'Landslide-Potential'
9     IF (Land Cover='Settlement' AND Rainfall='Low' AND Slope='Steep') THEN 'Landslide-Potential'
10    IF (Land Cover='Settlement' AND Rainfall='Medium') THEN 'Landslide-Potential'
11    IF (Land Cover='Settlement' AND Rainfall='High') THEN 'Landslide-Potential'
12    IF (Land Cover='Dry Farming' AND Rainfall='Low' AND Soil Type='Light' AND Slope='Steep') THEN 'Landslide-Potential'
13    IF (Land Cover='Dry Farming' AND Rainfall='Low' AND Soil Type='High') THEN 'Landslide-Potential'
14    IF (Land Cover='Dry Farming' AND Rainfall='Medium') THEN 'Landslide-Potential'
15    IF (Land Cover='Dry Farming' AND Rainfall='High' AND Soil Type='Medium') THEN 'Landslide-Potential'
16    IF (Land Cover='Dry Farming' AND Rainfall='High' AND Soil Type='High') THEN 'Landslide-Potential'
17    IF (Land Cover='Field' AND Slope='Corrugated' AND Rainfall='High') THEN 'Landslide-Potential'
18    IF (Land Cover='Field' AND Slope='Steep') THEN 'Landslide-Potential'
19    IF (Land Cover='Field' AND Slope='Flat' AND Rainfall='High' AND Soil Type='High') THEN 'Landslide-Potential'
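Rules such as those in Table 5 correspond to root-to-leaf paths of the
decision tree. The Python sketch below grows a small ID3 tree and reads
the rules off its paths. It is an illustration under assumptions, not the
implementation used in this research: the names `id3`, `gain`, and
`extract_rules`, the nested-dict tree representation, and the six toy
records are hypothetical, and branches are created only for attribute
values that occur in the data.

```python
import math
from collections import Counter


def entropy(labels):
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())


def gain(records, attribute, target):
    subsets = {}
    for r in records:
        subsets.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(s) / len(records) * entropy(s)
                    for s in subsets.values())
    return entropy([r[target] for r in records]) - remainder


def id3(records, attributes, target):
    """Return a leaf label (str) or a node {"attr": ..., "branches": ...}."""
    labels = [r[target] for r in records]
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the node is pure or no attribute is left to split on.
    if len(set(labels)) == 1 or not attributes:
        return majority
    best = max(attributes, key=lambda a: gain(records, a, target))
    rest = [a for a in attributes if a != best]
    branches = {}
    for value in sorted({r[best] for r in records}):
        subset = [r for r in records if r[best] == value]
        branches[value] = id3(subset, rest, target)
    return {"attr": best, "branches": branches}


def extract_rules(tree, conditions=()):
    """Turn every root-to-leaf path into a (conditions, label) pair."""
    if isinstance(tree, str):
        return [(list(conditions), tree)]
    rules = []
    for value, child in tree["branches"].items():
        rules += extract_rules(child, conditions + ((tree["attr"], value),))
    return rules


POT, NOT = "Landslide-Potential", "Not Landslide-Potential"
# Hypothetical toy records (not the research data).
toy = [
    {"Land Cover": "Forest", "Slope": "Flat", "label": NOT},
    {"Land Cover": "Forest", "Slope": "Flat", "label": NOT},
    {"Land Cover": "Forest", "Slope": "Steep", "label": POT},
    {"Land Cover": "Settlement", "Slope": "Flat", "label": POT},
    {"Land Cover": "Settlement", "Slope": "Flat", "label": POT},
    {"Land Cover": "Settlement", "Slope": "Steep", "label": POT},
]
tree = id3(toy, ["Land Cover", "Slope"], "label")
for conds, label in extract_rules(tree):
    clause = " AND ".join(f"{a}='{v}'" for a, v in conds)
    print(f"IF ({clause}) THEN '{label}'")
```

On the toy records, Land Cover has the higher information gain and
becomes the root, and the tree yields three rules, one per leaf.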

We use the testing data from the previous step to test the accuracy of
the model. Table 6 shows the confusion matrix results. Of the 118 testing
records evaluated with the decision tree rules, 72 are predicted to be
landslide-potential and did in fact experience landslides. Only 6 are
predicted to be landslide-potential but did not experience landslides.
Further, only 3 are predicted to be not landslide-potential but did
experience landslides, and 37 are predicted to be not landslide-potential
and indeed had no landslide.
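To illustrate how a testing record is evaluated against the decision tree
rules, the following Python sketch discretizes raw rainfall and slope
values with the thresholds of Table 2 and then checks the four Table 5
rules whose land cover is Settlement. It is a hedged illustration: the
function names, the handling of values that fall exactly on a threshold,
and the default of “Not Landslide-Potential” when no encoded rule matches
are assumptions, and because only the Settlement rules are encoded the
sketch is meaningful only for Settlement records.

```python
def discretize_rainfall(mm_per_year):
    """Table 2: < 2000 Low, 2000 to 3000 Medium, > 3000 High."""
    if mm_per_year < 2000:
        return "Low"
    if mm_per_year <= 3000:  # assumption: both boundaries count as Medium
        return "Medium"
    return "High"


def discretize_slope(percent):
    """Table 2: < 15% Flat, 15% to 30% Corrugated, > 30% Steep."""
    if percent < 15:
        return "Flat"
    if percent <= 30:  # assumption: both boundaries count as Corrugated
        return "Corrugated"
    return "Steep"


# The four Table 5 rules whose land cover is Settlement.
SETTLEMENT_RULES = [
    {"Land Cover": "Settlement", "Rainfall": "Low", "Slope": "Corrugated"},
    {"Land Cover": "Settlement", "Rainfall": "Low", "Slope": "Steep"},
    {"Land Cover": "Settlement", "Rainfall": "Medium"},
    {"Land Cover": "Settlement", "Rainfall": "High"},
]


def classify(record, rules=SETTLEMENT_RULES):
    """Return 'Landslide-Potential' if all conditions of any rule hold."""
    for rule in rules:
        if all(record.get(attr) == value for attr, value in rule.items()):
            return "Landslide-Potential"
    return "Not Landslide-Potential"  # assumption: default when no rule fires


# Hypothetical site: settlement, 1800 mm/year of rain, 22% slope.
site = {
    "Land Cover": "Settlement",
    "Rainfall": discretize_rainfall(1800),
    "Slope": discretize_slope(22),
}
print(site["Rainfall"], site["Slope"], classify(site))
# Low Corrugated Landslide-Potential
```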

TABLE 6
CONFUSION MATRIX RESULTS
                  Actual Yes    Actual No
Prediction Yes    72            6
Prediction No     3             37
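The way such a confusion matrix is tallied from paired actual and
predicted labels can be sketched in Python as follows. This is a minimal
illustration, not the tooling used in this research: the function names
`confusion_counts` and `accuracy` are assumptions, and the two label
lists are synthetic, arranged only so that they reproduce the counts of
Table 6.

```python
def confusion_counts(actual, predicted, positive="Yes"):
    """Count TP, FP, FN, TN for paired actual/predicted labels."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Accuracy (3): correctly classified records over all records."""
    return (tp + tn) / (tp + fp + tn + fn)


# Synthetic label lists arranged to reproduce the counts of Table 6.
actual = ["Yes"] * 72 + ["No"] * 6 + ["Yes"] * 3 + ["No"] * 37
predicted = ["Yes"] * 72 + ["Yes"] * 6 + ["No"] * 3 + ["No"] * 37

counts = confusion_counts(actual, predicted)
print(counts)                             # (72, 6, 3, 37)
print(round(accuracy(*counts) * 100, 2))  # 92.37
```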

From the confusion matrix, we can measure the accuracy of our model,
which is 0.9237, or 92.37%. We compute it with (3):

$$\mathrm{Accuracy} = \frac{72 + 37}{72 + 6 + 37 + 3} = 0.9237$$

V. DISCUSSION

ID3 is an appropriate algorithm for building a decision tree to detect
landslides because the tree it constructs is based on data about the
triggering attributes of landslides. For this research, we use discrete
and interval data from government agencies. This is consistent with
previous research suggesting that the ID3 algorithm works well on
non-continuous, or discrete, data [13].
The algorithm maps previous landslide incidents into a decision tree
structure. A landslide incident in a particular area can be related to
similar soil or climate characteristics in other areas that previously
experienced landslides [4]. Consequently, when the decision tree's
decision is “Landslide” based on previous incidents, this research
converts the decision into “potential”, meaning that the area may
experience a landslide in the future. Likewise, the decision “no
Landslide” is converted into “not potential”, meaning that the area has a
low potential for landslides in the future.
A decision tree can also be used to analyze the relationship between
attributes and landslide incidents [4]. The tree presents the results of
the learning algorithm in a hierarchical structure in which the attribute
at the highest level is the most important in influencing landslide
potential [4] [15]. We determine the attribute at the highest level (the
root) by the highest information gain in the first iteration [16]. Table
4 shows the information gain, or influence value, of each attribute in
the first iteration. Land Cover has the highest influence, as indicated
by its information gain of 0.284734. Consequently, Land Cover is located
at the root, the highest position, as the attribute with the most
influence on landslide incidents. Our results suggest that poor land use
planning in Java is the main reason for various natural disasters such as
sedimentation, erosion, landslides, and diminishing water availability.
Land use planning practices often neglect environmental sustainability
[17].
The second iteration of the decision tree yields Rainfall and Slope as
the most influential attributes after Land Cover. High rainfall in a
particular area considerably influences landslide potential, especially
in settlement areas. High rainfall can trigger landslides if the
settlement area is located on corrugated or steep slopes. Meanwhile, in
field and farming estate areas, slope is sufficiently influential in
explaining landslide incidents, especially corrugated and steep slopes.
Land Cover is the only attribute that can be controlled by social
activities or by government policy through regulation. Globally, many
areas frequently experience landslides, especially in developing
countries. The main causes of these incidents include poor land and
spatial use planning, expanding human settlement areas, careless land
conversion practices, and climate change [2]. Among the various
attributes, land cover is the most sensitive to environmental change and
human activities. Therefore, the management of landslide-potential areas
has to take land cover structure into account [1].
Our results could serve as an early warning for society, government, and
the private sector (developers), especially in planning land use in three
provinces that are among the most densely populated and most prone to
landslides: West Java, Central Java, and Jogjakarta Special Region [3].
The ID3 algorithm can learn the local conditions of an area. Compared
with similar research that relies on decision trees, our results show
some differences. Research on landslide modelling using the Chi-square
Automatic Interaction Detector (CHAID) decision tree in Penang, Malaysia,
suggests that slope is the most influential attribute [4]. This result is
understandable since Penang is dominated by areas with corrugated and
steep slopes: only 43.28% of Penang's area is flat. Further, Penang has
significant portions of forest and (fruit) farming estate areas, much
higher than in our research areas. Forest can retain a significant amount
of water, thus regulating water flow and sustaining slope stability [18].
Other research that uses a decision tree, in Hoa Binh, Vietnam, shows
that the distance between landslide points and streets is the most
influential attribute of landslide disasters [15]. These results are
similar to ours since street development is a part of land cover. Street
and settlement land covers could disrupt the natural topography and
affect slope stability [15].
Further, the ID3 method has an accuracy of 92.37%. Table 7 shows a
comparison of accuracy between the ID3 algorithm and other methods in
similar research.
TABLE 7
COMPARISON OF ACCURACY LEVELS OF VARIOUS LANDSLIDE DETECTION
METHODS
Method Groups        Method Name            Accuracy
Decision Tree        ID3 (This research)    92.37 %
                     CHAID                  81.90 %
                     Exhaustive CHAID       82.00 %
                     CRT                    75.60 %
                     QUEST                  74.00 %
Non-Decision Tree    GIS-MCDA               77.49 %
                     SVR                    75.12 %
                     LR                     69.41 %

Table 7 suggests that methods in the decision tree group
has higher accuracy level than methods in non-decision tree
because the decision tree methods are not based on statistical

6

(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 14, No.09, September 2016

REFERENCES

weighting but local-learning of previous incidents in a
particular area. In this table, excluding ID3 method, the
Exhaustive CHAID method has the highest accuracy (82%)
for Decision Tree method [4], while for Non-Decision Tree
methods, the GIS-MCDA method has the highest accuracy of
77.49% [1]. Meanwhile, our research shows that the ID3
method has the accuracy level of 92.37%, implying that this
method performs better in developing landslide detection
model as indicated by its highest accuracy level.
VI.

[1]

[2]

[3]

CONCLUSION

[4]

This research models landslide detection based on locallearning using ID3 algorithm in three provinces of Central
Java, West Java, and Jogjakarta Special Region in Indonesia.
The accuracy level of this research is 92.37%, far better than
other methods that are weighting-based. It then can be
concluded that previous landslide incidents can be used as a
warning to be alert on potential landslide in the future.
Landslides in different areas may have different
triggering factor. Different geographical condition may
explain the differences. It is claimed that ID3 algorithm based
on local-learning from previous incidents could
accommodate differences in geographical conditions in
different areas.
This research suggests that Land Cover is the main
triggering attribute of landslides in Java, Indonesia. Land use
and land conversion, as triggering attributes of landslides, are
heavily influenced by societal dynamics and government policy.
Therefore, it is expected that our research could inform
government land use planning, especially on Java Island.
Planning for settlement areas must also consider other
attributes, such as rainfall, slope, and soil type. Settlements
should be sited in areas without landslide potential, taking
climate into account, particularly areas with low rainfall (less
than 2000 mm/year) and flat slopes (less than 15%). Meanwhile,
areas with high rainfall (more than 3000 mm/year) and steep
slopes (more than 30%) should not serve as settlement areas
but must be preserved as natural forest.
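
The rainfall and slope thresholds above can be written as a simple screening rule. The sketch below is an illustration under stated assumptions: only the four threshold values come from the text, while the function name `screen_settlement_site`, the return labels, and the choice to send every intermediate case to "needs further assessment" are hypothetical.

```python
# Threshold values taken from the text; constant names are illustrative.
LOW_RAINFALL_MM = 2000    # below this: low rainfall (mm/year)
HIGH_RAINFALL_MM = 3000   # above this: high rainfall (mm/year)
FLAT_SLOPE_PCT = 15       # below this: flat slope (%)
STEEP_SLOPE_PCT = 30      # above this: steep slope (%)


def screen_settlement_site(rainfall_mm_per_year: float, slope_percent: float) -> str:
    """Screen a site using only rainfall and slope.

    Assumption: cases the text does not cover (for example moderate
    rainfall, or high rainfall on flat land) are not decided here.
    """
    if rainfall_mm_per_year < 0 or slope_percent < 0:
        raise ValueError("rainfall and slope must be non-negative")
    if rainfall_mm_per_year < LOW_RAINFALL_MM and slope_percent < FLAT_SLOPE_PCT:
        return "candidate for settlement"
    if rainfall_mm_per_year > HIGH_RAINFALL_MM and slope_percent > STEEP_SLOPE_PCT:
        return "preserve as natural forest"
    return "needs further assessment"


if __name__ == "__main__":
    print(screen_settlement_site(1500, 10))
    print(screen_settlement_site(3500, 40))
    print(screen_settlement_site(2500, 20))
```

The rule deliberately ignores land cover and soil type, which the text says must also enter settlement planning, so it is a first filter rather than a substitute for the decision tree.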
It is expected that future research could add other
landslide triggering attributes. In addition, one could use a
more detailed scale in the discretization process or use another
decision tree method such as C4.5. Further, similar research
in other areas is welcome since each area has its distinctive
geographical characteristics.

ACKNOWLEDGMENT

The authors wish to thank the Indonesian National Board for
Disaster Management (BNPB), which provided the landslide data
for the analysis.

REFERENCES

[1] T. Kavzoglu, E.K. Sahin, I. Colkesen, “Landslide Susceptibility
    Mapping using GIS-Based Multi-Criteria Decision Analysis, Support
    Vector Machines, and Logistic Regression”, Landslides, vol. 11 no. 3,
    pp. 425-439, June 2014.
[2] C. Yilmaz, T. Topal, M.L. Suzen, “GIS-Based Landslide Susceptibility
    Mapping using Bivariate Statistical Analysis in Devrek
    (Zonguldak-Turkey)”, Environmental Earth Science, vol. 65,
    pp. 2161-2178, July 2011.
[3] Indonesia Disaster Information on December 2014, BNPB, Jakarta,
    2014.
[4] M.S. Alkhasawneh, U.K. Ngah, “Modeling and Testing Landslide
    Hazard Using Decision Tree”, Journal of Applied Mathematics,
    vol. 2014, pp. 1-9, February 2014.
[5] M.S. Tehrany, M.N. Jebur, B. Pradhan, “Spatial Prediction of Flood
    Susceptible Areas using Rule Based Decision Tree (DT) and a Novel
    Ensemble Bivariate and Multivariate Statistical Models in GIS”,
    Journal of Hydrology, vol. 504, pp. 69-79, September 2013.
[6] A.J. Myles, “An Introduction to Decision Tree Modeling”,
    J. Chemometrics, 2004.
[7] R.B. Kheir, “Spatial Soil Zinc Content Distribution from Terrain
    Parameters: a GIS-Based Decision-Tree Model in Lebanon”,
    Environmental Pollution, vol. 158, pp. 520-528, 2010.
[8] O.O. Adeyemo, T.O. Adeyeye, “Comparative Study of ID3/C4.5
    Decision tree and Multilayer Perceptron Algorithms for the Prediction
    of Typhoid Fever”, African Journal of Computing and ICT, vol. 8 no. 1,
    pp. 103-112, March 2015.
[9] The Law of Public Work’s Ministry Republic of Indonesia No.
    22/PRT/M/2007 on Disaster Management, Departemen Pekerjaan
    Umum Republik Indonesia, Jakarta, 2007.
[10] H. Bhalekar, S. Kumbhar, “Pre-processing data using ID3 classifier”,
    International Journal of Engineering and Techniques, vol. 1 no. 3,
    pp. 68-73, June 2015.
[11] The Rule of Soil Damage Mapping, Kementerian Lingkungan Hidup
    Republik Indonesia, Jakarta, 2009.
[12] K. Adhatrao, A. Gaykar, “Predicting Students’ Performance Using ID3
    and C4.5 Classification Algorithms”, International Journal of Data
    Mining and Knowledge Management Process (IJDKP), vol. 3 no. 5,
    pp. 39-52, September 2013.
[13] M. Slocum, “Decision Making Using ID3 Algorithm”, InSight: Rivier
    Academic Journal, vol. 8 no. 2, pp. 1-12, 2012.
[14] D.L. Gupta, A.K. Malviya, “Performance Analysis of Classification
    Tree Learning Algorithms”, International Journal of Computer
    Applications, vol. 55 no. 6, pp. 39-44, October 2012.
[15] D.T. Bui, “Landslide Susceptibility Assessment in Vietnam Using
    Support Vector Machines, Decision Tree, and Naive Bayes Models”,
    Mathematical Problems in Engineering, vol. 2012, pp. 1-26, April
    2012.
[16] D.M. Farid, L. Zhang, “Hybrid decision tree and naïve Bayes
    classifiers for multi-class classification tasks”, Expert Systems with
    Applications, vol. 41, pp. 1937-1946, 2014.
[17] Maridi, A. Saputra, “Role of Vegetation for Water and Soil
    Conservation in Watershed: Case Study in 3 Sub-Watershed of
    Bengawan Solo (Keduang, Dengkeng, dan Samin)”, in National
    Seminar on Conservation and Utilization of Natural Resources,
    Surakarta, Indonesia, 2015.
[18] K.C. Devkota, A.D. Regmi, “Landslide susceptibility mapping using
    certainty factor, index of entropy and logistic regression models and
    their comparison at a landslide prone area in Nepal Himalaya”, Natural
    Hazards, vol. 65 no. 1, pp. 135-165, 2013.

AUTHORS’ INFORMATION
Yeremia A. Susetyo is a Master's student in the Faculty of
Information Technology, Satya Wacana Christian University,
Salatiga. Holds a Bachelor's degree in information technology
with a focus on artificial intelligence and Geographic
Information Systems.

Daniel F. H. Manongga is a professor and Head of the Master
Program in Information Systems, Faculty of Information
Technology, Satya Wacana Christian University, Indonesia.
Received a B.Eng (Electronics) from Satya Wacana Christian
University, Indonesia, an MSc (Information Technology) from
Queen Mary College, University of London, and a PhD
(Management Sciences) from the University of East Anglia, UK.
Research interests include operations research and business
intelligence.

Wiranto H. Utomo is a professor in the Faculty of Information
Technology, Satya Wacana Christian University, Indonesia.
Received an M.Com and a Doctorate (Computer Science) from
Gadjah Mada University. Research interests include software
engineering, web services, cloud computing, and big data.
