Performance of Logistic Regression Model for Predicting Deforestation, Case Study: Cikepuh Wildlife Reserve and Cibanteng Natural Reserve


PERFORMANCE OF LOGISTIC REGRESSION MODEL

AND SPATIAL METHOD

(Case: Predicting of Deforestation in

Cikepuh Wildlife Reserve

and Cibanteng Natural Reserve)


GRADUATE SCHOOL

BOGOR AGRICULTURAL UNIVERSITY

2006


BONIE FAJAR DEWANTARA

A thesis submitted for the degree of Master of Science of Bogor Agricultural University

GRADUATE SCHOOL

BOGOR AGRICULTURAL UNIVERSITY

September 2006


STATEMENT

I, Bonie Fajar Dewantara, state that this thesis entitled:

Performance of Logistic Regression Model and Spatial Method (Case: Predicting of Deforestation in Cikepuh Wildlife Reserve and Cibanteng Natural Reserve)

is the result of my own work during the period January 2005 – September 2006 and that it has not been published before. The contents of the thesis have been examined by the advising committee and an external examiner.

Bogor, September 2006


ACKNOWLEDGEMENT

Alhamdulillah, thanks to God, this thesis has at last been finished successfully. I would like to thank all the people who helped and assisted me in finishing it. There are many people I should thank in regard to this work, and no doubt I will not be able to mention them one by one; for that I can but beg forgiveness.

I deeply appreciate the efforts of my supervisor, Dr. Ir. Lilik B. Prasetyo, M.Sc, and my co-supervisor, Idung Risdiyanto, S.Si, M.Sc, and thank them for their guidance, technical comments, and constructive criticism throughout the months of my research. My special gratitude also goes to Dr. Ir. I Nengah Surati Jaya, M.Sc, as the external examiner, and Dr. Ir. Hatrisari Hardjomidjojo, DEA, as the seminar and examination chairman (moderator), for their positive ideas, inputs, and criticism. My special gratitude also goes to all my teachers and lecturers for sharing their knowledge and experience.

I would like to thank the SEAMEO-BIOTROP management and staff, especially Dr. Ir. Tania June, M.Sc, and the MIT management and staff for their technical support and facilities, especially Devi, Uma, and Bambang; Pak Jejen, who went with me to the field to collect ground truth data; and Pak Asep in Ciracap for the home stay. I also thank PPLH (Pusat Penelitian Lingkungan Hidup / Environmental Research Center), Bogor Agricultural University, for the image data, and Baplan (Badan Planologi / Forestry Planning Agency), Ministry of Forestry, for the digital map data. I would like to thank Conservation International – Indonesia for the basic idea of the image processing methodology and Wildlife Conservation Society – Indonesia Program for the use of the ERDAS Imagine 8.7 and ArcView licenses. To my friends in MIT, especially those in the same 2002 batch, I really appreciate our togetherness, our 24-hours-a-day work, and the way we supported each other to finish our assignments and studies on time.


Finally, I feel deeply indebted to my dear wife, Frida Yuliyanti, S.Hut, for her moral support and patience during the course, and especially to both my sons, Ariodanie Fudhail Hanif and Ariq Maulana Malik Ibrahim; my parents, Adnan Hanif (alm) and Hj. Chadidjah; and all my family. I dedicate this thesis to the glory of knowledge and science in Indonesia.

Bonie Adnan September 2006


CURRICULUM VITAE

Bonie Fajar Dewantara was born in Belawan – Medan, North Sumatera, Indonesia, on January 1st, 1971. He received his undergraduate diploma from Bogor Agricultural University in 1996, from the Forest Product Technology Department of the Forestry Faculty.

From 1996 to 1999 he worked at Risjad Salim International Bank, from 1999 to 2002 at Carrefour Indonesia as Department Head, and from 2002 to 2004 at the retail and logistics consultant PT. Wira Prima Abadi. He then moved to the consulting firm PT. Explorer Indonesia, where he was Head of the Forestry Division from 2004 to 2005. Since 2005 he has been working at Wildlife Conservation Society – Indonesia Program as GIS and Remote Sensing Analyst.

In 2002, he registered as a postgraduate student at Bogor Agricultural University in the Master of Science in Information Technology for Natural Resources Management study program, and received his postgraduate diploma in 2006 with the thesis titled “Performance of Logistic Regression Model and Spatial Method (Case: Predicting of Deforestation in Cikepuh Wildlife Reserve and Cibanteng Natural Reserve)”.


ABSTRACT

BONIE FAJAR DEWANTARA (2006). Performance of Logistic Regression Model for Predicting Deforestation, Case Study: Cikepuh Wildlife Reserve and Cibanteng Natural Reserve. Under the supervision of LILIK BUDI PRASETYO and IDUNG RISDIYANTO.

Cikepuh Wildlife Reserve and Cibanteng Natural Reserve, established in 1973 and 1925, have been facing complex problems caused by land use change, deforestation, illegal hunting, forest fire, and so on. Deforestation itself is a complex socio-economic, cultural, and political event. This thesis focuses on what factors affect the rate of deforestation, by considering some common driving forces of deforestation and using logistic regression to predict deforestation. It is clearly important to know where deforestation is likely to occur. The objectives of the thesis are to quantify the contribution of each deforestation driving factor, such as distance from center of population, aspect, slope, distance from the shoreline, distance from existing roads, and elevation, and to produce a spatial projection of future deforestation trends based on the probability of deforestation given by the logistic regression equation.

The methodology uses the Stacking Method from CI (Conservation International) CABS (Center for Applied Biodiversity Science), developed together with WCS IP (Wildlife Conservation Society – Indonesia Program). Two images from different dates, representing one period, were stacked and analyzed by visual comparison of both images. Signature areas were extracted from the stacked images using shapefile polygons for the forest to forest, forest to non-forest, and non-forest to non-forest classes, as well as water, cloud, and shadow. Each signature area should represent a certain spectral characteristic, so to obtain as many classes as possible, a 16-bit data type can be used instead of 8 bits.

Classification was supervised and was carried out with the CART ERDAS Imagine plug-in tool and See5, a stand-alone decision-tree based classification program. The result of the classification is a thematic raster image with a forest change attribute. The analysis was done in a single attribute table of polygon vector cells (PVC), created using Edit Tool Vector Grid, an extension for ArcView 3.3. The attributes of all independent variables, as well as the probability calculated by the logistic regression, are stored in these square-shaped polygons, called PVC.

Each independent variable is divided into two binary categories, 0 and 1. A value of 1 denotes a condition under which deforestation tends to occur, such as a distance of less than 1 km from a road. A value of 0 denotes a stable condition, in which there is no change from forest to non-forest. The results show that where the distance to a road is less than 1 km, deforestation tends to occur 3 times as often as where the distance is greater than or equal to 1 km. The smallest contribution to the possibility of deforestation came from the predictor distance of 1 km from a river, which has almost no effect on the occurrence of deforestation.

The logistic regression equation in this thesis can predict deforestation significantly, although some polygon vector cell processes could not assign the attribute data of the independent variables to the polygon vector cells exactly. The logistic regression model could predict deforestation better if the independent variables that are assumed to favor deforestation were distributed evenly over the entire study area.


Research Title : Performance of Logistic Regression Model and Spatial Method (Case: Predicting of Deforestation in Cikepuh Wildlife Reserve and Cibanteng Natural Reserve)

Name : Bonie Fajar Dewantara

Student ID : G.051020051

Study Program : Master of Science in Information Technology for Natural Resources Management

Approved by,
Advisory Board

Dr. Ir. Lilik Budi Prasetyo, M.Sc (Supervisor)
Idung Risdiyanto, S.Si, M.Sc (Co-supervisor)

Endorsed by,

Dr. Ir. Tania June, M.Sc (Program Coordinator)
Prof. Dr. Ir. Khairil A. Notodiputro, MS (Dean of Graduate School)

TABLE OF CONTENT

Table of Content
List of Figure
List of Table
List of Appendix

I INTRODUCTION
1.1. Background
1.2. Objectives
1.3. Hypothesis

II LITERATURE REVIEW
2.1. Logistic Regression Model
2.1.1. Logistic Regression Equation
2.1.2. Significance Test for Parameter Predictors
2.1.3. Model Interpretation
2.1.4. Logistic Regression Coefficient and Correlation
2.2. Remote Sensing, GIS and Change Detection
2.2.1. Remote Sensing
2.2.2. Change Detection
2.2.3. Geographical Information System
2.3. Deforestation

III MATERIALS AND METHODS
3.1. Time and Location
3.2. Data Sources
3.3. Supporting Tools / Program
3.4. Methodology
3.4.1. Image Preprocessing

b. Geo-Referencing
3.4.2. Image Processing
a. Image Stacking
b. Signature Area
c. ERDAS Imagine – CART Classification
3.4.3. Vector Processing
a. Creating Cell Vector
b. Extracting Variables Data
3.4.4. Logistic Regression Model
3.5. Assumption of Research Study

IV RESULTS AND DISCUSSION
4.1. Image Processing
4.1.1. Period 1990 - 1997
4.1.2. Period 1997 - 2001
4.2. Vector Processing
4.2.1. Creating Vector Cell
4.2.2. Data Extracting of Contour SRTM Data Image
4.2.3. Data Extracting of River Buffer Process
4.2.4. Data Extracting of Road Buffer Process
4.2.5. Data Extracting of Shoreline Buffer Area Process
4.2.6. Data Extracting of Population Center Buffer Area Process
4.2.7. Data Extracting of Aspect Area Image
4.2.8. Data Extracting of Slope Area Image
4.3. Logistic Regression
4.3.1. Logistic Regression Equation
4.3.2. Significance Test of Model and Predictors
4.3.3. Logistic Coefficient and Correlation
4.4. Validation and Accuracy Assessment
4.4.1. Validation

V CONCLUSION AND RECOMMENDATION
5.1. Conclusion
5.2. Recommendation

REFERENCES

Appendix 1. Vector Map of Study Area
Appendix 2. Illustration of attribute table in spatial processing
Appendix 3. SPSS Output


LIST OF FIGURE

Figure 3.1. Study area, comprising Ciemas and Ciracap Subdistricts of Sukabumi District

Figure 3.2. Flow chart of research activities and procedures

Figure 4.1. Stacking process of Landsat images 1990 and 1997

Figure 4.2. Defining a training site with one polygon; the spectral characteristics inside that polygon must be similar on both dates

Figure 4.3. Observing one class of stable forest by signature mean plot, which shows the similarity of spectral characteristics

Figure 4.4. The result of the classification process using ERDAS Imagine 8.7, CART and See5 for period 1990 - 1997

Figure 4.5. The result of the classification process using ERDAS Imagine 8.7, CART and See5 for period 1997 - 2001

Figure 4.6. ERDAS Imagine 8.7 Model Maker; the model clips the deforestation class of the first period to the raster of the second period

Figure 4.7. The result of the clipping process: the deforestation class in Period 1990 - 1997 becomes non-forest in Period 1997 - 2001

Figure 4.8. The result of the clipping process between the boundary polygon of the study area and the square-shaped cells

Figure 4.9. SRTM (Shuttle Radar Topography Mission) topography data, obtained from the GLCF website and displayed in ERDAS Imagine 8.7

Figure 4.10. The result of assigning data by location (spatial join) of contour or altitude data (LogR_Alt), where yellow cells are altitude < 250 m (1) and light blue cells are ≥ 250 m (0)

Figure 4.11. The result of assigning data by location (spatial join) of river buffer (LogR_Riv) 1,000 m, where yellow cells are river or group of rivers < 1,000 m (1) and light blue cells are ≥ 1,000 m (0)

Figure 4.12. The result of assigning data by location (spatial join) of road buffer (LogR_Road) 1,000 m, where yellow cells are road or road network < 1,000 m (1) and light blue cells are ≥ 1,000 m (0)

Figure 4.13. The result of assigning data by location (spatial join) of shoreline (LogR_SL) 1,000 m, where yellow cells are shoreline buffered < 1,000 m (1) and light blue cells are ≥ 1,000 m (0)

Figure 4.14. The result of assigning data by location (spatial join) of population center (LogR_CP) 1,000 m, where yellow cells are population center buffered < 10,000 m (1) and light blue cells are ≥ 10,000 m (0)

Figure 4.15. The result of assigning data by location (spatial join) of aspect (LogR_COM), where yellow cells are East, West, and flat areas and light blue cells are the remaining compass directions

Figure 4.16. The result of assigning data by location (spatial join) of slope (LogR_Slp), where yellow cells are less than 25 degrees and light blue cells are 25 - 90 degrees

Figure 4.17. Classification plot (ClassPlot), another very useful piece of information for assessing the goodness of fit of the model

Figure 4.18. Comparison between predicted deforestation and actual deforestation in 2001


LIST OF TABLE

Table 2.1. References of logistic regression method for prediction model

Table 3.1. Binary data and categorization of variables as the factors of deforestation

Table 3.2. Recapitulation table as the result of SPSS calculation of logistic regression

Table 3.3. Correlation matrix defining the correlation among the variables

Table 4.1. Variables not in the Equation

Table 4.2. Variables in the Equation

Table 4.3. Variables in the Equation of null model

Table 4.4. Omnibus Tests of Model Coefficients

Table 4.5. Hosmer and Lemeshow Test

Table 4.6. Contingency Table for Hosmer and Lemeshow Test

Table 4.7. Classification Table

Table 4.8. Variables in the Equation and Wald test

Table 4.9. Recapitulation of raster classification process

Table 4.10. Recapitulation of polygon vector cell process

Table 4.11. Error matrix resulting from classifying logistic regression model


LIST OF APPENDIX

Appendix 1. Vector Map of Study Area

Appendix 2. Illustration of attribute table in spatial processing

Appendix 3. SPSS Output


I. INTRODUCTION

1.1. Background

A wildlife reserve is a kind of protected area that possesses unique characteristic species and/or biodiversity and whose habitat is managed for the sustainability of their life (Act (UU) No 5, 1990). Cikepuh Wildlife Reserve is a conservation area located in the southern part of Sukabumi District, West Java Province. It is bordered to the north by Cibanteng Natural Reserve, which consists of forest and natural grassland and is suitable as wildlife habitat.

Cikepuh Wildlife Reserve and Cibanteng Natural Reserve, established in 1973 and 1925, have been facing complex problems caused by land use change, deforestation, illegal hunting, forest fire, and so on. Sahardjo (2000) indicated that Cikepuh Wildlife Reserve had suffered from forest degradation reaching 80%. Most of this degradation has been caused by illegal logging for wood and for paddy fields. Deforestation itself is a complex socio-economic, cultural, and political event. Concern over the rate of deforestation has given rise to a literature that quantifies the impact of the forces that drive deforestation. The literature has focused on two questions: (1) What factors affect the location of deforestation? and (2) What factors affect the rate of deforestation? It is clearly important to know where deforestation is likely to occur (Cropper, Puri, Griffiths 2001).

This thesis focuses on the first question above by considering some common driving forces of deforestation and using a spatial model to look at the broader condition. A logistic regression model is proposed as an effective framework for modeling and predicting forest and non-forest cover categories associated with the spatial pattern and rate of deforestation.

The logistic regression model is a special type of regression model used to study the probability of membership in two contradictory classes or categories. It should be noted that logistic regression can be used to determine the probability of either of the two categories identically, such as possible deforestation or stable forest. In the application of logistic regression, each “observation” is a cell.
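As a minimal sketch of what one such cell observation might look like, the following Python snippet converts raw measurements for a cell into binary predictors. The field names (`LogR_Road`, `LogR_Riv`, and so on) and the thresholds are taken from the figure captions in the List of Figure. The `Cell` class, its attribute names, the aspect labels, and the coding of the aspect and slope classes as 1 are assumptions made only for illustration, not the thesis implementation.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """One observation: a grid cell with raw driving-factor measurements (hypothetical layout)."""
    road_dist_m: float
    river_dist_m: float
    shore_dist_m: float
    pop_center_dist_m: float
    altitude_m: float
    slope_deg: float
    aspect: str  # assumed labels, e.g. "N", "E", "S", "W", "flat"


def to_binary_predictors(cell: Cell) -> dict:
    """Code each driving factor as 1 (assumed to favor deforestation) or 0."""
    return {
        "LogR_Road": int(cell.road_dist_m < 1000),
        "LogR_Riv": int(cell.river_dist_m < 1000),
        "LogR_SL": int(cell.shore_dist_m < 1000),
        "LogR_CP": int(cell.pop_center_dist_m < 10000),
        "LogR_Alt": int(cell.altitude_m < 250),
        "LogR_Slp": int(cell.slope_deg < 25),
        "LogR_COM": int(cell.aspect in {"E", "W", "flat"}),
    }


# Example cell with made-up measurements
example = to_binary_predictors(
    Cell(road_dist_m=500, river_dist_m=2000, shore_dist_m=1500,
         pop_center_dist_m=8000, altitude_m=300, slope_deg=10, aspect="E")
)
```

Keeping every predictor binary means each cell reduces to a short row of 0/1 flags, which is the form of input a binary logistic regression over cells would take.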

Recent developments in GIS (Geographic Information System) technology enhance the analytical power needed for the study of land use and land cover change. Remote sensing is a science that records and analyses the radiation reflected or emitted by objects on the earth's surface. The nature of the object or land cover type (e.g. forest) determines what proportion of the radiation in a specific part of the electromagnetic spectrum (at different wavelengths) will be reflected and recorded by the satellite sensor. Changes of land use and land cover due to natural and human activities can be observed using current and historical remotely sensed data available from archives.

1.2. Objectives

The objective of this study is to translate the complexity of deforestation processes into a simple model by using the statistical method of logistic regression, which analyzes the probability of deforestation in each single cell. The purpose of this analysis is to measure the probability of changed forest (deforestation) or unchanged forest based on predictors, the variable factors of its driving forces, such as distance from center of population, aspect, slope, distance from the shoreline, distance from existing roads, and elevation.

The main objective of this study is to observe the performance of the logistic regression model and spatial method, and the specific objectives are:

1. to quantify the forest cover and deforestation.


3. to elaborate a spatial projection of future deforestation trends based on the probability of deforestation predicted by logistic regression.

1.3. Hypothesis

Logistic regression does not assume a linear relationship between the dependent and the independent variables, nor does it require linear relationships among the independent variables. In this study, the hypothesis for the logistic regression analysis is therefore:

The coefficient of at least one of the independent variables (distance from river, road, shoreline, and center of population, altitude, aspect, or slope) is not equal to zero, so that the variable can be used for predicting deforestation with the logistic regression equation.
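Written formally, the hypothesis can be sketched as follows. The symbols are assumptions introduced here for illustration: $p$ is the probability that a cell is deforested, $x_1, \dots, x_k$ are the independent variables (here $k = 7$), and $\beta_0, \dots, \beta_k$ are the logistic regression coefficients.

```latex
\[
\ln\!\left(\frac{p}{1-p}\right) = \beta_0 + \sum_{j=1}^{k} \beta_j x_j,
\qquad
p = P(Y = 1 \mid x_1, \dots, x_k)
\]
\[
H_0:\ \beta_1 = \beta_2 = \dots = \beta_k = 0
\qquad\text{versus}\qquad
H_1:\ \beta_j \neq 0 \ \text{for at least one } j
\]
```

Rejecting $H_0$ would mean that at least one driving factor carries information about whether a cell is deforested.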


II. LITERATURE REVIEW

2.1. Logistic Regression Model

A model can be interpreted as a simplification of a system, while a system is the regular representation of a process or several processes (sub-systems). A model depicts only some aspects of a system and does not have to express every process that happens in the system. The more processes that are explained, the more complex the model becomes and the more inputs it requires. For this reason, the primary factor in any particular model is the goal for which the model is created. Based on that goal, models can be divided into three kinds: (1) for understanding a process, (2) for prediction, and (3) for management purposes (Handoko 1994).

There are two distinct approaches to the modeling of systems: (1) statistical models and (2) structural models. In a statistical model, a relationship between the observed output and the known input of a system is established by postulating a general relationship and estimating its parameters by adjusting them to best fit the empirical data. Regression and correlation analysis are examples of this widely used method. A structural model, in contrast, attempts to describe the structure of the system that is responsible for its behavior (Bossol 1986).

The logistic regression model is used to analyze the relationship between the explanatory variables and the outcome or response. The outcome variable is categorized as success or failure, coded as zero or one. It is assumed that each observation is independent of the others, so that the number of success events follows a binomial distribution (Sutisna 2002). Logistic regression is also a technique used to analyze data whose response variable is on a binary or dichotomous scale (Hosmer and Lemeshow 1989 in Amanati 2001), while the independent variables can be continuous or categorical (Amanati 2001).
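To make the binary-outcome setting concrete, here is a minimal sketch for the simplest case of one binary predictor, where the maximum-likelihood coefficients can be read directly from a 2x2 table of counts. The counts are hypothetical and were chosen only so that the odds ratio comes out at 3; they are not data from this study, and the function names are illustrative.

```python
import math


def fit_single_binary_predictor(def_x1, stable_x1, def_x0, stable_x0):
    """Closed-form fit of logit(p) = b0 + b1 * x for a single binary predictor x.

    Arguments are counts of deforested / stable cells with x = 1 and x = 0.
    All four counts must be greater than zero.
    """
    b0 = math.log(def_x0 / stable_x0)        # log-odds of deforestation when x = 0
    b1 = math.log(def_x1 / stable_x1) - b0   # log of the odds ratio (x = 1 vs x = 0)
    return b0, b1


def predict(b0, b1, x):
    """Probability of deforestation for predictor value x (0 or 1)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


# Hypothetical counts: 60 deforested / 40 stable cells near a road (x = 1),
# 50 deforested / 100 stable cells far from a road (x = 0).
b0, b1 = fit_single_binary_predictor(60, 40, 50, 100)
odds_ratio = math.exp(b1)
```

With several predictors there is no such closed form and the coefficients are estimated iteratively, which is what a statistical package does; the single-predictor case is shown only because it makes the link between a coefficient and an odds ratio easy to see.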

According to Saadi and Abolfazi (2003), a logistic regression model is a statistical model in which the relation between a phenomenon (the dependent variable) and some of its factors (the independent variables) is defined on the basis of some observations. These observations are in fact a set of values measured or observed for the dependent and independent variables. Once the model has been specified and calibrated, the unknown value of the phenomenon can be calculated and predicted on the basis of known values of its factors.
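The prediction step can be sketched as follows, assuming a model that has already been calibrated. The intercept and coefficients below are hypothetical placeholders, not the values estimated in this thesis, and the 0.5 cutoff is likewise an assumption.

```python
import math

# Hypothetical calibrated model (illustration only, not the thesis results)
INTERCEPT = -2.0
COEFFICIENTS = {"LogR_Road": 1.1, "LogR_Riv": 0.05, "LogR_Slp": 0.4}


def deforestation_probability(cell):
    """Predicted probability that a cell is deforested, given its binary factors."""
    z = INTERCEPT + sum(COEFFICIENTS[name] * cell[name] for name in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-z))


def classify(cell, cutoff=0.5):
    """Label a cell 1 (deforested) if its probability reaches the cutoff, else 0."""
    return int(deforestation_probability(cell) >= cutoff)


near_road = {"LogR_Road": 1, "LogR_Riv": 0, "LogR_Slp": 1}
remote = {"LogR_Road": 0, "LogR_Riv": 0, "LogR_Slp": 0}

p_near = deforestation_probability(near_road)
p_remote = deforestation_probability(remote)
```

Applying the same function to every cell in a grid would give a probability surface that can be mapped and compared with the observed change.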

Often, the spatial phenomenon under investigation can only be described by a categorical variable, for example bird distribution indicating the presence or absence of birds (Anonim 2004) or a forest area being either stable or destroyed (Saadi and Abolfazi 2003). In other words, Saadi and Abolfazi (2003) describe the logistic regression model as a special type of regression model used to study the probability of membership in two contradictory classes. It should be noted that logistic regression can be used to determine the probability of either of the two possibilities (categories) identically. Conventional regression techniques are not suitable because the dependent variable is neither interval nor ratio (Anonim 2004).

Table 2.1 shows references to logistic regression models used to predict deforestation and for other purposes.

Table 2.1. References of logistic regression method for prediction model.
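The statement that either category can be modeled identically can be illustrated with a short derivation. The notation is an assumption introduced here: $Y = 1$ for one category, $Y = 0$ for the other, and $\beta_0 + \sum_j \beta_j x_j$ for the linear predictor.

```latex
\[
P(Y = 1 \mid x) = \frac{1}{1 + e^{-\left(\beta_0 + \sum_j \beta_j x_j\right)}},
\qquad
P(Y = 0 \mid x) = 1 - P(Y = 1 \mid x) = \frac{1}{1 + e^{\,\beta_0 + \sum_j \beta_j x_j}}
\]
\[
\ln\frac{P(Y = 0 \mid x)}{P(Y = 1 \mid x)} = -\Bigl(\beta_0 + \sum_j \beta_j x_j\Bigr)
\]
```

Modeling the other category therefore only reverses the sign of every coefficient; the fitted probabilities are unchanged.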

No Authors Research Title Methods Results

Chengling Xie Bo Huang

Christophe Claramunt Magesh Chnadramouli

Spatial logistic regression and GIS to model rural-urban land conversion